Difference between revisions of "Vt"
Revision as of 16:55, 17 December 2013
Introduction
vt is a variant tool set that discovers short variants from Next Generation Sequencing data. Features are being rolled out to GitHub as a major rewrite is under way.
Installation
The source files are hosted on GitHub. vt uses htslib, and a copy of a development freeze is stored in the vt repository to ensure compatibility.
To install, perform the following steps:
#this will create a directory named vt in the directory where you clone the repository
1. git clone https://github.com/atks/vt.git
#change directory to vt
2. cd vt
#run make, note that compilers need to support the c++0x standard
3. make
Building has been tested on Linux and Mac systems with gcc 4.3 and above and with clang 3.4.
Common options
-i multiple intervals in <seq>:<start>-<end> format delimited by commas.
-I multiple intervals in <seq>:<start>-<end> format listed in a text file line by line.
-o defines the output file; the default is STDOUT. You may modify the STDOUT to output the binary version of the format.
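As a small sketch of the two interval options, the snippet below writes an interval list of the kind -I reads and derives from it the comma-delimited string that -i takes. The file name intervals.txt and the coordinates are made up for illustration, and the vt commands at the end are left as comments so the snippet runs without vt installed.

```shell
#!/bin/sh
# Hypothetical interval list, one <seq>:<start>-<end> entry per line (for -I).
intervals_file="intervals.txt"
printf '%s\n' '20:1000000-2000000' '21:500000-600000' > "$intervals_file"

# Join the lines with commas to get the form taken by -i.
intervals_csv=$(paste -s -d ',' "$intervals_file")
echo "$intervals_csv"

# Either form could then be passed to a vt command, for example:
#   vt view -I intervals.txt mills.bcf
#   vt view -i 20:1000000-2000000,21:500000-600000 mills.bcf
```

Keeping the list in a file and deriving the -i string from it avoids maintaining the same regions in two places.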
Alternate headers
As BCF is a restrictive format of VCF in which all metadata must be present in the header, vt provides a mechanism to read an alternative header for VCF files that do not have a complete header. Simply provide a header file stub named <vcf-file>.hdr, and vt will automatically read it instead of the original header in <vcf-file>.
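The naming convention can be sketched as follows. This is only an illustration: mills.vcf is borrowed from the examples on this page, the contig and INFO lines are placeholders, and it is an assumption that the stub should be a complete VCF header including the #CHROM column line.

```shell
#!/bin/sh
# Hypothetical input file; the stub must be named <vcf-file>.hdr.
vcf="mills.vcf"
hdr="${vcf}.hdr"

# Write an illustrative header stub. The contig and INFO lines are
# placeholders; replace them with the metadata your VCF actually uses.
{
  printf '##fileformat=VCFv4.1\n'
  printf '##contig=<ID=20,length=63025520>\n'
  printf '##INFO=<ID=AC,Number=A,Type=Integer,Description="Allele count">\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
} > "$hdr"

echo "wrote $hdr"
```

With mills.vcf.hdr sitting next to mills.vcf, a command such as vt view mills.vcf would be expected to pick up the stub in place of the header embedded in the file.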
VCF Manipulation
View
Views a VCF or VCF.GZ or BCF file.
#views mills.bcf and outputs to standard out
vt view mills.bcf
#views mills.bcf and locally sorts it in a 10000bp window and outputs to out.bcf
vt view -w 10000 mills.bcf -o out.bcf
usage : vt view [options] <in.vcf>
options : -o  output VCF/VCF.GZ/BCF file [-]
          -w  local sorting window size [0]
          -p  print options and summary []
          -I  file containing list of intervals []
          -i  intervals []
Index
Indexes a VCF.GZ or BCF file.
#indexes mills.bcf
vt index mills.bcf
usage : vt index [options] <in.vcf>
options : -p  print options and summary []
          --  ignores the rest of the labeled arguments following this flag
          -h  displays help
Sorting
Local sorting can be done using view by setting the -w option to a nonzero value.
Normalization
Normalize variants in a VCF file.
Normalized variants may have their positions changed; in such cases, the normalized variants are reordered and output in an ordered fashion. The local reordering takes place over a window of 10000 base pairs.
#normalize variants and write out to mills.normalized.vcf
vt normalize mills.vcf -r seq.fa -o mills.normalized.vcf
#normalize variants, send to standard out and remove duplicates
vt normalize mills.vcf -r seq.fa | vt merge_duplicate_variants - -o mills.normalized.merged.vcf
usage : vt normalize [options] <in.vcf>
options : -o  output VCF file [-]
          -I  file containing list of intervals []
          -i  intervals []
          -r  reference sequence fasta file []
          --  ignores the rest of the labeled arguments following this flag
          -h  displays help
Merge duplicate variants
Merges duplicate variants by position, with the option of considering alleles. (This simply discards the duplicate variant that appears later in the VCF file.)
#merge duplicate variants and save output in mills.merged.vcf
vt mergedups mills.vcf -o mills.merged.vcf
usage : vt mergedups [options] <in.vcf>
options : -o  output VCF file [-]
          -p  merge by position [false]
Peek
Summarizes the variants in a VCF file.
#summarizes the variants in mills.vcf
vt peek mills.vcf
usage : vt peek [options] <in.vcf>
options : -o  output VCF file [-]
          -I  file containing list of intervals []
          -i  intervals []
          -r  reference sequence fasta file []
          --  ignores the rest of the labeled arguments following this flag
          -h  displays help
Variant Calling
Discover
Discovers variants from reads in a BAM file.
#discover variants from NA12878.bam and write to stdout
vt discover -b NA12878.bam -s NA12878 -r hs37d5.fa -i 20 -v snps,indels,mnps
usage : vt discover [options]
options : -b  input BAM file
          -v  variant types [snps,mnps,indels]
          -f  fractional evidence cutoff for candidate allele [0.1]
          -e  evidence count cutoff for candidate allele [2]
          -q  base quality cutoff for bases [13]
          -m  MAPQ cutoff for alignments [20]
          -s  sample ID
          -r  reference sequence fasta file []
          -o  output VCF file [-]
          -I  file containing list of intervals []
          -i  intervals []
          --  ignores the rest of the labeled arguments following this flag
          -h  displays help
Merge candidate variants
Merge candidate variants across samples. Each VCF file is required to have the FORMAT flags E and N and should have exactly one sample.
#merge candidate variants from VCFs listed in candidates.txt and output in candidate.sites.vcf
vt merge_candidate_variants candidates.txt -o candidate.sites.vcf
usage : vt merge_candidate_variants [options]
options : -L  file containing list of input VCF files
          -o  output VCF file [-]
          -I  file containing list of intervals []
          -i  intervals
          --  ignores the rest of the labeled arguments following this flag
          -h  displays help
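The list of input VCF files could be assembled along these lines. This is a sketch under assumptions: the directory name candidates, the sample names other than NA12878, and the one-path-per-line layout of the list file are all hypothetical, and the empty placeholder files exist only so the snippet runs on its own.

```shell
#!/bin/sh
# Hypothetical directory holding one single-sample candidate VCF per sample.
vcf_dir="candidates"
mkdir -p "$vcf_dir"

# Placeholder files so the sketch is self-contained; in practice these
# would be the per-sample VCF files produced by vt discover.
for sample in NA12878 NA12891 NA12892; do
  : > "$vcf_dir/$sample.sites.vcf"
done

# Write one path per line (an assumption about the list-file layout).
printf '%s\n' "$vcf_dir"/*.vcf > candidates.txt
cat candidates.txt
```

The resulting candidates.txt can then be handed to vt merge_candidate_variants as in the example above.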
Construct Probes
Construct probes for genotyping a variant.
#construct probes from candidates.sites.bcf and output to standard out
vt construct_probes candidates.sites.bcf -r ref.fa
usage : vt construct_probes [options] <in.vcf>
options : -o  output VCF file [-]
          -f  minimum flank length [20]
          -r  reference sequence fasta file []
          -I  file containing list of intervals []
          -i  intervals []
          --  ignores the rest of the labeled arguments following this flag
          -h  displays help
Genotype
Genotypes variants for each sample.
#genotypes variants found in candidates.sites.vcf from sample.bam
vt genotype -r seq.fa -b sample.bam -i candidates.sites.vcf -o sample.sites.vcf
usage : vt genotype [options]
options : -r  reference sequence fasta file []
          -s  sample ID []
          -o  output VCF file [-]
          -b  input BAM file []
          -i  input candidate VCF file []
          --  ignores the rest of the labeled arguments following this flag
          -h  displays help
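The variant calling commands above can be chained into one workflow. The following is a dry-run sketch that only prints the commands. It uses the subcommands and flags listed on this page, but the order of steps, the second sample NA12891, the output file names, and the run helper are assumptions made for illustration.

```shell
#!/bin/sh
# Dry-run sketch: print the vt commands instead of executing them.
# Set DRY_RUN=0 to execute them (requires vt and the input files).
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

ref="hs37d5.fa"              # reference used in the discover example
samples="NA12878 NA12891"    # NA12891 is a hypothetical second sample

: > candidates.txt
for s in $samples; do
  # 1. Discover candidate variants per sample.
  run vt discover -b "$s.bam" -s "$s" -r "$ref" -i 20 -v snps,indels,mnps -o "$s.candidates.vcf"
  echo "$s.candidates.vcf" >> candidates.txt
done

# 2. Merge candidates across samples into one site list.
run vt merge_candidate_variants candidates.txt -o candidates.sites.vcf

for s in $samples; do
  # 3. Genotype each sample at the merged sites.
  run vt genotype -r "$ref" -b "$s.bam" -i candidates.sites.vcf -o "$s.sites.vcf"
done
```

Printing the commands first makes it easy to check file names and flags before committing to a long run.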
Resource Bundle
Maintained by
This page is maintained by Adrian