| + | === Introduction === |
| | | |
| + | vt is a tool set that calls, genotypes and filters short variants. It provides profiling of variants to aid in QC. |
| + | |
| + | |
| + | === Location === |
| + | |
| + | Internal usage |
| + | |
| + | binaries |
| + | /net/fantasia/home/atks/programs/vt |
| + | |
| + | test data |
| + | /net/fantasia/home/atks/programs/vt/test |
| + | |
| + | scripts |
| + | /net/fantasia/home/atks/programs/vt/scripts |
| + | |
| + | External usage |
| + | |
| + |   Download from SourceForge/GitHub |
| + | |
| + | == Common option patterns == |
| + | |
| + |   -i defines the input file and is a required parameter by default. |
| + |   You may set it to '-' to read from STDIN, which is assumed to be |
| + |   uncompressed by default. |
| + | |
| + |   -o defines the output file and defaults to STDOUT. |
| + |   Use the option -c to write the binary version of the format, |
| + |   e.g. BCF, to STDOUT. |
| + | |
| + | == Major Workflows == |
| + | |
| + | === Discovery === |
| + | |
| + | Discovery is performed per sample. The evidence site lists for each sample are then merged and site discovery statistics are computed. |
| + | The user then chooses cutoffs to create an initial candidate site list. |
| + | |
| + | Generate a site list with INFO fields E and N. |
| + | |
| + |   vt discover -b NA12878.bam -o NA12878.sites.vcf -g hs37d5.fa |
| + | |
| + | Normalize (including left aligning) variants. This is required because left alignment of insertions and/or deletions within a read is sometimes insufficient to ensure complete left alignment. |
| + | |
| + |   vt normalize -i NA12878.sites.vcf -o NA12878.normalized.sites.vcf -g hs37d5.fa |
| + | |
| + | Evidence site lists are combined across samples and split by sites to allow for parallelization. |
| + | |
| + | vt merge_and_split_sample_vcf -i NA12878.sites.vcf,NA12879.sites.vcf,NA12880.sites.vcf -l 5000 |
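For many samples, the comma-delimited -i list above is tedious to type by hand. A minimal shell sketch, assuming each sample's evidence list is named <sample>.sites.vcf as in the examples above. The variable names samples and inputs are illustrative, and the final command is only printed, not run:

```shell
# Sample IDs taken from the examples above; replace with your own.
samples="NA12878 NA12879 NA12880"

# Emit one file name per sample, then join the lines with commas.
inputs=$(for s in $samples; do printf '%s.sites.vcf\n' "$s"; done | paste -sd, -)

# Print the command for inspection instead of executing it.
echo "vt merge_and_split_sample_vcf -i $inputs -l 5000"
```

Printing the command first makes it easy to check the file list before running it on a cluster.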
| + | |
| + | Discovery statistics are computed. These statistics allow you to choose a suitable cutoff for creating a candidate site list. |
| + | |
| + | vt compute_discovery_stats -i 1-1000000.sites.vcf -o 1-1000000.annotated.sites.vcf |
| + | |
| + | Merge site lists. |
| + | |
| + | vt merge -i 1000000.sites.vcf,2000000.sites.vcf,3000000.sites.vcf -o candidate.sites.vcf |
| + | |
| + | Plot charts to help with candidate list selection criteria. |
| + | |
| + | vt plot_discovery -i candidate.sites.vcf |
| + | |
| + | |
| + | A calling pipeline implemented in a make file is available here. |
| + | |
| + | === Genotyping === |
| + | |
| + | Each individual is genotyped at a set of sites. |
| + | |
| + |   vt genotype -b NA12878.bam -i candidate.sites.vcf -o NA12878.sites.vcf -g hs37d5.fa |
| + | |
| + | Genotype sample VCFs are combined across samples and split by sites. |
| + | |
| + | vt merge_and_split_sample_vcf -i NA12878.sites.vcf,NA12879.sites.vcf,NA12880.sites.vcf -o 1-1000000.sites.vcf |
| + | |
| + | Features are computed. |
| + | |
| + | vt compute_features -i 1-1000000.sites.vcf -o 1-1000000.annotated.sites.vcf |
| + | |
| + | A genotyping pipeline implemented in a make file is available here. |
| + | |
| + | === Filtering === |
| + | |
| + | Requires a set of features AND an installed copy of SVMLight. |
| + | |
| + | vt filter NA12878.bam -i NA12878.sites.vcf -o NA12878.svm.sites.vcf --pos positive.sites.vcf --neg negative.sites.vcf |
| + | |
| + | A filtering pipeline implemented in a make file is available here. |
| + | |
| + | == Generation == |
| + | |
| + | === Discovery === |
| + | |
| + | Discovers variants from bams. |
| + | |
| + | Options: |
| + | -b, --input-bam-file : Input BAM file |
| + | -o, --output-vcf-file : Output VCF file |
| + | -v, --variant-type : Variant Types, takes on any combinations of |
| + | the values snps,mnps,indels comma delimited |
| + | [snps,mnps,indels] |
| + | -q, --q-cutoff : BASE Cutoff, only bases with |
| + | QUAL/BAQ >= baseq are considered [13] |
| + | -m, --mapq-cutoff : MAPQ Cutoff, only alignments with |
| + | map quality >= mapq are considered [20] |
| + | -g, --genome-fa-file : Genome FASTA file |
| + | -s, --sample-id : Sample ID |
| + | |
| + | Example: |
| + | e.g. vt discover -b in.bam -o - -g ref.fa -v snps,indels -s HG0001 |
| + |     e.g. bam mergeBam --in a.bam --in b.bam -o - | |
| + |          vt discover -b - -o - -g ref.fa -v all -s HG0001 | |
| + |          vt left_align -i - | vt merge_duplicate_variants -i - -o out.sites.vcf |
| + | |
| + | === Genotyping === |
| + | |
| + | Genotypes variants for each sample. |
| + | |
| + | Options: |
| + | -b, --input-bam-file : Input BAM file |
| + | -i, --input-candidate-vcf : Input Candidate VCF file |
| + | -o, --output-vcf-file : Output VCF file |
| + | -v, --variant-type : Variant Types, takes on any combinations |
| + | of the values snps,mnps,indels comma |
| + | delimited [snps,mnps,indels] |
| + | -g, --genome-fa-file : Genome FASTA file |
| + | -s, --sample-id : Sample ID |
| + | |
| + | Example: |
| + | e.g. vt genotype -b in.bam -i candidate.sites.vcf -o - -g ref.fa -s HG0001 |
| + | |
| + | == Annotation == |
| + | |
| + | === Make Probes === |
| + | |
| + | Populates the info field with REFPROBE, ALTPROBE and PLEN tags for genotyping. |
| + | |
| + | Options: |
| + | -i, --input-vcf <string> : Input VCF file |
| + | -o, --output-vcf <string> : Output VCF file [-] |
| + | -g, --genome-fa : Genome FASTA file [/net/fantasia/home/atks/ref/genome/human.g1k.v37.fa] |
| + | -f, --flank-length <integer> : Minimum Flank Length [20] |
| + | |
| + | Example: |
| + | e.g. vt make_probes -i 8904indels.dups.genotypes.vcf -o probes.sites.vcf -g ref.fa |
| + | |
| + | === Compute Feature === |
| + | |
| + | Compute feature of variant. |
| + | |
| + | vt compute_feature -i mills.vcf |
| + | |
| + | === Compute Allele balance === |
| + | |
| + | Compute [http://genome.sph.umich.edu/wiki/Genotype_Likelihood_Based_Allele_Balance allele balance]. Outputs allele balance, allele frequency, genotype frequency. |
| + | |
| + | vt compute_ab -i mills.vcf |
| + | |
| + | === Compute Allele Frequency === |
| + | |
| + | Compute [http://genome.sph.umich.edu/wiki/Genotype_Likelihood_based_Allele_Frequency allele frequency]. Outputs allele frequency and genotype frequency. |
| + | |
| + | vt compute_af -i mills.vcf |
| + | |
| + | === Compute Inbreeding Coefficient === |
| + | |
| + | Compute [http://genome.sph.umich.edu/wiki/Genotype_Likelihood_based_Inbreeding_Coefficient inbreeding coefficient]. Outputs inbreeding coefficient based on genotype likelihoods. |
| + | |
| + | vt compute_fic -i mills.vcf |
| + | |
| + | === Compute HWE === |
| + | |
| + | Compute [http://genome.sph.umich.edu/wiki/Genotype_Likelihood_based_Hardy-Weinberg_Test Hardy-Weinberg equilibrium statistic]. Outputs PHRED scaled HWE Test p-values for biallelic as well as multiallelic variants. |
| + | |
| + | vt compute_hwe -i mills.vcf |
| + | |
| + | === Compute Mendelian Error === |
| + | |
| + | Compute Mendelian error statistics. Outputs allele frequency and genotype frequency. |
| + | |
| + | vt compute_mendel -i mills.vcf |
| + | |
| + | === Compute features === |
| + | |
| + | vt compute_<feature1>_<feature2>_ ... _<feature n> -i mills.vcf |
| + | |
| + | == Modification == |
| + | |
| + | === Left Alignment === |
| + | |
| + | [http://genome.sph.umich.edu/wiki/Variant_Normalization Left aligns] indel type variants in a VCF file. This differs from normalization in that it only left aligns and left trims a variant. This affects indels only. |
| + | |
| + | vt left_align -i mills.vcf -o mills.leftaligned.vcf |
| + | |
| + | === Normalization === |
| + | |
| + | [http://genome.sph.umich.edu/wiki/Variant_Normalization Normalize] variants in a VCF file. |
| + | |
| + | vt normalize -i mills.vcf -o mills.normalized.vcf |
| + | |
| + | === Merge duplicate variants === |
| + | |
| + | Merges duplicate variants by position with the option of considering alleles. (This just discards the duplicate variant that appears later in the VCF file.) |
| + | |
| + | Options: |
| + | -i, --input-vcf <string> : Input VCF file |
| + | -o, --output-vcf <string> : Output VCF file [-] |
| + | -p, --merge-by-position : Merge by position [false] |
| + | |
| + | Example: |
| + | e.g. vt merge_duplicate_variants -i 8904indels.dups.genotypes.vcf -o out.vcf |
| + | e.g. vt merge_duplicate_variants -p -i 8904indels.dups.genotypes.vcf -o out.vcf |
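The "discard the later duplicate" behaviour described above can be illustrated with plain awk. This is a sketch of the idea only, not vt's implementation. It assumes tab-delimited records with CHROM, POS, ID, REF and ALT in the first five columns, and it assumes (from our reading of the option description) that -p keys on position alone while the default also considers alleles:

```shell
# Tiny headerless site list: CHROM POS ID REF ALT (tab-delimited).
sites=$(printf '1\t100\t.\tA\tG\n1\t100\t.\tA\tG\n1\t100\t.\tA\tT\n1\t200\t.\tC\tCT\n')

# Key on CHROM, POS, REF and ALT: only exact repeats are dropped.
by_allele=$(printf '%s\n' "$sites" | awk -F'\t' '/^#/ || !seen[$1 FS $2 FS $4 FS $5]++')

# Key on CHROM and POS only: later records at the same position are dropped.
by_position=$(printf '%s\n' "$sites" | awk -F'\t' '/^#/ || !seen[$1 FS $2]++')

printf 'by allele:\n%s\n' "$by_allele"
printf 'by position:\n%s\n' "$by_position"
```

In both cases the first record seen for a key is kept, which matches the note that the duplicate appearing later in the file is the one discarded.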
| + | |
| + | == Profiling == |
| + | |
| + | A standard procedure is as follows: |
| + | |
| + |   zcat dataset.vcf.gz | vt normalize -i - | vt merge_duplicate_variants -i - > dataset.normalized.vcf |
| + | |
| + |   cut -f1-8 dataset.normalized.vcf > dataset.normalized.sites.vcf |
| + | |
| + | cat dataset.normalized.sites.vcf | vt profile_snps -i - > snps.summary.log |
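The cut -f1-8 step above works because the first eight VCF columns (CHROM through INFO) describe the site, while FORMAT and the per-sample columns follow. A small self-contained sketch on an inline VCF built for illustration, assuming the meta lines contain no tab characters:

```shell
# A tiny VCF with one sample column, built inline for illustration.
vcf=$(printf '##fileformat=VCFv4.1\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tNA12878\n1\t100\t.\tA\tG\t.\tPASS\t.\tGT\t0/1\n')

# Keep the 8 fixed site columns; FORMAT and sample columns are dropped.
# Meta lines (##...) contain no tab, so cut passes them through unchanged.
sites=$(printf '%s\n' "$vcf" | cut -f1-8)

printf '%s\n' "$sites"
```

The resulting sites-only file is much smaller than the genotype VCF, which helps when only site-level profiling is needed.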
| + | |
| + | |
| + | |
| + | === Profile SNPs === |
| + | |
| + | Profile SNPs. |
| + | |
| + | * ts/tv ratio |
| + | * overlap analyses |
| + | |
| + | vt profile_snps -i mills.snps.sites.vcf |
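For readers unfamiliar with the ts/tv ratio: transitions are A<->G and C<->T changes, and every other single-base change is a transversion. The awk sketch below shows the arithmetic on four made-up biallelic SNPs. It is not how vt computes the ratio, and the tab-delimited column layout (REF in column 4, ALT in column 5) is assumed from the VCF format:

```shell
# Four biallelic SNPs (CHROM POS ID REF ALT): three transitions, one transversion.
sites=$(printf '1\t100\t.\tA\tG\n1\t200\t.\tC\tT\n1\t300\t.\tG\tA\n1\t400\t.\tA\tC\n')

tstv=$(printf '%s\n' "$sites" | awk -F'\t' '
  /^#/ { next }                                       # skip header lines
  $4 !~ /^[ACGT]$/ || $5 !~ /^[ACGT]$/ { next }       # biallelic SNPs only
  { pair = $4 $5 }
  pair == "AG" || pair == "GA" || pair == "CT" || pair == "TC" { ts++; next }
  { tv++ }
  END { if (tv > 0) printf "%.2f", ts / tv; else print "NA" }
')

echo "ts/tv = $tstv"
```

Records with multiple ALT alleles or non-SNP alleles are skipped here for simplicity.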
| + | |
| + | === Profile Indels === |
| + | |
| + | Profile indels. |
| + | |
| + | * Overlap analyses with known data sets |
| + | * FS/NFS annotation |
| + | |
| + |   vt profile_indels -i mills.indels.sites.vcf |
| + | |
| + | === Profile MNPs === |
| + | |
| + | Profile MNPs. |
| + | |
| + | vt profile_mnps -i mills.mnps.sites.vcf |
| + | |
| + | === Summarize Variants === |
| + | |
| + | Summarizes variants present in VCF file. |
| + | |
| + | vt peek -i mills.vcf |
| + | |
| + | == Plotting == |
| + | |
| + | === Allele Frequency Spectrum === |
| + | |
| + | Plots Allele Frequency Spectrum of variants found in VCF file. |
| + | |
| + | vt plot_afs -i mills.xml |
| + | |
| + | === Genotype Likelihood Concordance === |
| + | |
| + | Plots Genotype Likelihood Concordance graph. |
| + | |
| + | vt plot_gl -i mills.xml |
| + | |
| + | === Allele Balance Spectrum === |
| + | |
| + | Plots Allele Balance graph of variants in the VCF file. |
| + | |
| + | vt plot_ab -i mills.xml |
| + | |
| + | = VCF File Manipulation = |
| + | |
| + | === Sort === |
| + | |
| + | Sort variants according to contig lists in header. |
| + | |
| + | vt sort -i mills.sites.vcf |
| + | |
| + | === Split by variant === |
| + | |
| + | Split VCF files by variant type. |
| + | |
| + | vt split_by_variant -i mills.sites.vcf |
| + | |
| + | = Resource Files = |
| + | |
| + | dbSNP |
| + | OMNI 1000G |
| + | Mills |
| + | HAPMAP |
| + | |
| + | = Maintained by = |
| + | |
| + | This page is maintained by [mailto:atks@umich.edu Adrian] |