Difference between revisions of "Vt"
Revision as of 16:33, 12 March 2013
Introduction
vt is a tool set that calls, genotypes and filters short variants. It provides profiling of variants to aid in QC.
Location
Internal usage
/net/fantasia/home/atks/programs/vtools/vt
External usage
Download from SourceForge/GitHub.
Discovery
Discovery is performed at the per-sample level. The evidence site lists for each sample are then merged and site discovery statistics are computed. The user then chooses cutoffs to create an initial candidate site list.
Generates site list with info fields E and N.
vt discover -i NA12878.bam -o NA12878.sites.vcf -g hs37d5.fa
Left align variants. This is required because left alignment of insertions and/or deletions within a read is sometimes insufficient to ensure complete left alignment.
vt left_align -i NA12878.bam -o NA12878.leftaligned.sites.vcf -g hs37d5.fa
Evidence site lists are combined across samples and split by sites to allow for parallelization.
vt merge_and_split_sample_vcf -i NA12878.sites.vcf,NA12879.sites.vcf,NA12880.sites.vcf -l 5000
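As a rough illustration of what combining and splitting involves, the Python sketch below merges per-sample site lists and partitions the result into fixed-width position windows that could be processed in parallel. It is not vt's implementation: the function names, the (chrom, pos, ref, alt) site key and the window width are all assumptions made for this example.

```python
from collections import defaultdict

def merge_sites(per_sample_sites):
    """Union the site lists of all samples, counting how many samples report each site."""
    counts = defaultdict(int)
    for sites in per_sample_sites:
        for site in set(sites):  # a site counts once per sample
            counts[site] += 1
    return dict(counts)

def split_by_window(sites, width):
    """Group sites into windows of `width` bases, keyed by (chrom, start, end)."""
    windows = defaultdict(list)
    for chrom, pos, ref, alt in sorted(sites):
        start = (pos - 1) // width * width + 1  # 1-based window start
        windows[(chrom, start, start + width - 1)].append((chrom, pos, ref, alt))
    return dict(windows)

# Hypothetical toy data: each site is (chrom, pos, ref, alt).
na12878 = [("20", 500, "A", "G"), ("20", 1500000, "CT", "C")]
na12879 = [("20", 500, "A", "G")]

merged = merge_sites([na12878, na12879])
chunks = split_by_window(merged, 1000000)
```

Each key of `chunks` describes one interval, which mirrors the interval-style file names (such as 1-1000000.sites.vcf) used in the commands on this page.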
Discovery statistics are computed. These statistics allow you to choose a suitable cutoff for creating a candidate site list.
vt compute_discovery_stats -i 1-1000000.sites.vcf -o 1-1000000.annotated.sites.vcf
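The page does not define the E and N info fields, so the following Python sketch rests on an assumption: that E counts reads showing evidence for the variant and N counts reads covering the site. Under that assumption, it shows one way per-sample values might be pooled and a cutoff applied. The function names and thresholds are hypothetical, and this is not how vt computes its statistics.

```python
def discovery_stats(records):
    """Sum E and N per site over samples and derive the pooled fraction E/N.

    `records` holds (site, E, N) tuples, one per sample and site.
    """
    totals = {}
    for site, e, n in records:
        te, tn, ns = totals.get(site, (0, 0, 0))
        totals[site] = (te + e, tn + n, ns + 1)
    return {
        site: {"E": te, "N": tn, "samples": ns, "frac": te / tn if tn else 0.0}
        for site, (te, tn, ns) in totals.items()
    }

def select_candidates(stats, min_e, min_frac):
    """Keep sites whose pooled evidence passes both (hypothetical) cutoffs."""
    return sorted(
        site for site, v in stats.items()
        if v["E"] >= min_e and v["frac"] >= min_frac
    )

# Hypothetical toy data: (site, E, N) with site = (chrom, pos, ref, alt).
records = [
    (("20", 500, "A", "G"), 6, 20),
    (("20", 500, "A", "G"), 4, 20),
    (("20", 900, "C", "T"), 1, 30),
]
stats = discovery_stats(records)
candidates = select_candidates(stats, min_e=2, min_frac=0.1)
```

In practice the cutoffs would be chosen by inspecting the distribution of the statistics rather than fixed in advance.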
Merge site lists.
vt merge -i 1000000.sites.vcf,2000000.sites.vcf,3000000.sites.vcf -o candidate.sites.vcf
Plot charts to help with candidate list selection criteria.
vt plot_discovery -i candidate.sites.vcf
A calling pipeline implemented in a make file is available here.
Genotyping
Each individual is genotyped at a set of sites.
vt genotype -i NA12878.bam -o NA12878.sites.vcf -g hs37d5.fa
Genotype sample VCFs are combined across samples and split by sites.
vt merge_and_split_sample_vcf -i NA12878.sites.vcf,NA12879.sites.vcf,NA12880.sites.vcf -o 1-1000000.sites.vcf
Features are computed.
vt compute_features -i 1-1000000.sites.vcf -o 1-1000000.annotated.sites.vcf
A genotyping pipeline implemented in a make file is available here.
Filtering
Requires a set of features.
vt svm NA12878.bam -i NA12878.sites.vcf -o NA12878.svm.sites.vcf --pos positive.sites.vcf --neg negative.sites.vcf
A filtering pipeline implemented in a make file is available here.
Left Alignment
Left align indel type variants in a VCF file.
vt leftalign -i mills.vcf -o mills.leftaligned.vcf
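To make the idea of left alignment concrete, here is a minimal Python sketch of a commonly used procedure: shared trailing bases are trimmed, an allele that becomes empty is padded with the preceding reference base, and redundant leading bases are removed at the end. The function name and arguments are assumptions for illustration, and the code is not taken from vt.

```python
def left_align(ref_seq, pos, ref, alt):
    """Left align one variant; `pos` is the 1-based position of `ref` in `ref_seq`."""
    ref, alt = ref.upper(), alt.upper()
    changed = True
    while changed:
        changed = False
        # Drop a shared trailing base.
        if ref and alt and ref[-1] == alt[-1]:
            ref, alt = ref[:-1], alt[:-1]
            changed = True
        # An empty allele is padded with the preceding reference base.
        if (not ref or not alt) and pos > 1:
            pos -= 1
            base = ref_seq[pos - 1]
            ref, alt = base + ref, base + alt
            changed = True
    # Drop shared leading bases, keeping at least one base per allele.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt

# Hypothetical reference with a CA repeat; the deletion is reported at its right end.
reference = "GGGCACACACAGGG"
aligned = left_align(reference, 8, "CAC", "C")
```

In this toy example the two-base deletion inside the repeat is shifted to the leftmost equivalent position, so the same event reported at different offsets by different reads or callers collapses to a single representation.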
Profile SNPs
Profile SNPs.
vt profile_snps -i mills.snps.sites.vcf
Profile Indels
Profile indels.
vt profile_indels -i mills.indels.sites.vcf
Profile MNPs
Profile MNPs.
vt profile_mnps -i mills.mnps.sites.vcf
Sort
Sort variants according to contig lists in header.
vt sort -i mills.sites.vcf
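Sorting by the contig list in the header means that contigs follow the order of the ##contig lines rather than alphabetical order. The Python sketch below illustrates that ordering on plain text lines. The helper names are hypothetical, records on contigs missing from the header are placed last by assumption, and the code does not reflect how vt itself sorts.

```python
def contig_order(header_lines):
    """Map contig ID to its rank, following the order of ##contig header lines."""
    order = {}
    for line in header_lines:
        if line.startswith("##contig=<"):
            fields = line[len("##contig=<"):].rstrip(">\n").split(",")
            for field in fields:
                if field.startswith("ID="):
                    order.setdefault(field[3:], len(order))
    return order

def sort_records(header_lines, records):
    """Sort tab-separated VCF records by header contig rank, then by position."""
    order = contig_order(header_lines)

    def key(record):
        chrom, pos = record.split("\t")[:2]
        # Assumption: contigs absent from the header sort after all known ones.
        return (order.get(chrom, len(order)), int(pos))

    return sorted(records, key=key)

# Hypothetical toy input.
header = [
    "##fileformat=VCFv4.1",
    "##contig=<ID=2,length=243199373>",
    "##contig=<ID=10,length=135534747>",
]
records = ["10\t100\t.\tA\tG", "2\t500\t.\tC\tT", "2\t50\t.\tG\tA"]
sorted_records = sort_records(header, records)
```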
Split by variant
Split VCF files by variant type.
vt split_by_variant -i mills.sites.vcf
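One simple way to think about splitting by variant type is to classify each record from the lengths of its REF and ALT alleles. The Python sketch below does this for biallelic records only. The classification rule and function names are assumptions for illustration, and vt's own definitions of SNP, MNP and indel may differ.

```python
from collections import defaultdict

def variant_type(ref, alt):
    """Classify a biallelic variant by allele lengths (a simplified rule)."""
    if len(ref) == 1 and len(alt) == 1:
        return "snp"
    if len(ref) == len(alt):
        return "mnp"
    return "indel"

def split_by_type(records):
    """Group tab-separated VCF records by variant type."""
    groups = defaultdict(list)
    for record in records:
        ref, alt = record.split("\t")[3:5]
        groups[variant_type(ref, alt)].append(record)
    return dict(groups)

# Hypothetical toy records: CHROM, POS, ID, REF, ALT.
records = [
    "20\t100\t.\tA\tG",
    "20\t200\t.\tAC\tGT",
    "20\t300\t.\tCT\tC",
    "20\t400\t.\tG\tGTT",
]
by_type = split_by_type(records)
```

Each group could then be written to its own file, for example as input to the SNP, MNP and indel profiling commands above.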
Compute Feature
Compute feature of variant.
vt compute_feature -i mills.vcf
Plot Type
Plot based on type.
vt plot_<type> -i mills.xml
vt plot_<type> -i mills.xml,hgdp.xml,um.xml
Resource Files
dbSNP, OMNI, 1000G, Mills, HAPMAP