|
|
Line 1: |
Line 1: |
− | === Introduction ===
| |
| | | |
− | vt is a tool set that calls, genotypes and filters short variants. It provides profiling of variants to aid in QC.
| |
− |
| |
− |
| |
− | === Location ===
| |
− |
| |
− | Internal usage
| |
− |
| |
− | /net/fantasia/home/atks/programs/vtools/vt
| |
− |
| |
− | External usage
| |
− |
| |
− | download from sourceforge/github
| |
− |
| |
− | === Discovery ===
| |
− |
| |
− | Discovery is performed at the per-sample level. The evidence site lists for each sample are then merged and site discovery statistics are computed.
| |
− | The user then chooses cutoffs to create an initial candidate site list.
| |
− |
| |
− | Generate a site list with INFO fields E and N.
| |
− |
| |
− | vt discover -i NA12878.bam -o NA12878.sites.vcf -g hs37d5.fa
| |
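Because discovery runs once per sample, the same command is repeated for every BAM file. A minimal sketch of building those calls is shown here; it reuses only the `-i`, `-o` and `-g` options from the example above, and the helper names `discover_command` and `discover_commands` are hypothetical, not part of vt.

```python
from pathlib import PurePosixPath


def discover_command(bam, reference):
    """Build the argument list for one per-sample `vt discover` call.

    The output name follows the <sample>.sites.vcf pattern used in the
    example above; this naming rule is an assumption of the sketch.
    """
    sample = PurePosixPath(bam).stem  # "NA12878.bam" -> "NA12878"
    return ["vt", "discover", "-i", bam, "-o", f"{sample}.sites.vcf", "-g", reference]


def discover_commands(bams, reference):
    """One command per sample, in the order the BAM files were given."""
    return [discover_command(bam, reference) for bam in bams]
```

Each returned list can be handed to `subprocess.run(cmd, check=True)` or written out as a makefile rule.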
− |
| |
− | Left align variants. This is required because left alignment of insertions and/or deletions within a read is sometimes insufficient to ensure complete left alignment.
| |
− |
| |
− | vt left_align -i NA12878.sites.vcf -o NA12878.leftaligned.sites.vcf -g hs37d5.fa
| |
− |
| |
− | Evidence site lists are combined across samples and split by sites to allow for parallelization.
| |
− |
| |
− | vt merge_and_split_sample_vcf -i NA12878.sites.vcf,NA12879.sites.vcf,NA12880.sites.vcf -l 5000
| |
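The idea of combining evidence across samples and then splitting by site can be sketched in a few lines. This is an illustration only, not vt's implementation: sites are assumed to be `(chrom, pos, ref, alt)` tuples, the per-site value is simply the number of samples showing the site, and `chunk_size` is a hypothetical parameter that is not claimed to correspond to the `-l` option above.

```python
from collections import Counter


def merge_sites(per_sample_sites):
    """Merge per-sample site lists and count the samples supporting each site."""
    counts = Counter()
    for sites in per_sample_sites:
        counts.update(set(sites))  # a sample counts at most once per site
    # Sorted by (chrom, pos, ref, alt); chrom is compared as a plain string here.
    return sorted(counts.items())


def split_sites(merged, chunk_size):
    """Cut the merged list into consecutive chunks that can be processed in parallel."""
    return [merged[i:i + chunk_size] for i in range(0, len(merged), chunk_size)]
```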
− |
| |
− | Discovery statistics are computed. These statistics help you choose a suitable cutoff for creating the candidate site list.
| |
− |
| |
− | vt compute_discovery_stats -i 1-1000000.sites.vcf -o 1-1000000.annotated.sites.vcf
| |
− |
| |
− | Merge site lists.
| |
− |
| |
− | vt merge -i 1000000.sites.vcf,2000000.sites.vcf,3000000.sites.vcf -o candidate.sites.vcf
| |
− |
| |
− | Plot charts to help with candidate list selection criteria.
| |
− |
| |
− | vt plot_discovery -i candidate.sites.vcf
| |
− |
| |
− |
| |
− | A calling pipeline implemented in a makefile is available here.
| |
− |
| |
− | === Genotyping ===
| |
− |
| |
− | Each individual is genotyped at a set of sites.
| |
− |
| |
− | vt genotype -i NA12878.bam -o NA12878.sites.vcf -g hs37d5.fa
| |
− |
| |
− | Per-sample genotype VCFs are combined across samples and split by sites.
| |
− |
| |
− | vt merge_and_split_sample_vcf -i NA12878.sites.vcf,NA12879.sites.vcf,NA12880.sites.vcf -o 1-1000000.sites.vcf
| |
− |
| |
− | Features are computed.
| |
− |
| |
− | vt compute_features -i 1-1000000.sites.vcf -o 1-1000000.annotated.sites.vcf
| |
− |
| |
− | A genotyping pipeline implemented in a makefile is available here.
| |
− |
| |
− | === Filtering ===
| |
− |
| |
− | Filtering requires a set of computed features and an installed copy of SVMLight.
| |
− |
| |
− | vt filter NA12878.bam -i NA12878.sites.vcf -o NA12878.svm.sites.vcf --pos positive.sites.vcf --neg negative.sites.vcf
| |
− |
| |
− | A filtering pipeline implemented in a makefile is available here.
| |
− |
| |
− | === Left Alignment ===
| |
− |
| |
− | Left align indel-type variants in a VCF file.
| |
− |
| |
− | vt left_align -i mills.vcf -o mills.leftaligned.vcf
| |
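To make the operation concrete, here is a minimal sketch of the usual left-align-and-trim procedure for a single biallelic variant. It is not taken from vt's source: the function name `left_align`, the 0-based `pos`, and the in-memory `reference` string are assumptions made for illustration.

```python
def left_align(reference, pos, ref, alt):
    """Shift an indel as far left as the reference allows.

    reference: reference sequence as a string
    pos:       0-based index of the first base of `ref` in `reference`
    ref, alt:  alleles in VCF style (they share an anchor base)
    Returns the left-aligned (pos, ref, alt).
    """
    changed = True
    while changed:
        changed = False
        # Drop a shared trailing base; it carries no information.
        if ref and alt and ref[-1] == alt[-1]:
            ref, alt = ref[:-1], alt[:-1]
            changed = True
        # If an allele became empty, pull in one reference base on the left.
        if (not ref or not alt) and pos > 0:
            pos -= 1
            base = reference[pos]
            ref, alt = base + ref, base + alt
            changed = True
    # Trim redundant shared leading bases, keeping one anchor base.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt
```

For example, a `CA` deletion reported at the right end of a `CACACACA` repeat is moved to the start of the repeat, while a SNP is returned unchanged.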
− |
| |
− | === Profile SNPs ===
| |
− |
| |
− | Profile SNPs.
| |
− |
| |
− | vt profile_snps -i mills.snps.sites.vcf
| |
− |
| |
− | === Profile Indels ===
| |
− |
| |
− | Profile indels.
| |
− |
| |
− | vt profile_indels -i mills.indels.sites.vcf
| |
− |
| |
− | === Profile MNPs ===
| |
− |
| |
− | Profile MNPs.
| |
− |
| |
− | vt profile_mnps -i mills.mnps.sites.vcf
| |
− |
| |
− | === Sort ===
| |
− |
| |
− | Sort variants according to the contig order listed in the VCF header.
| |
− |
| |
− | vt sort -i mills.sites.vcf
| |
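Sorting by header order differs from a plain lexical sort, where contig `10` would precede contig `2`. The sketch below shows the idea under simple assumptions: the header contains `##contig=<ID=...>` lines, records are tuples beginning with `(chrom, pos)`, and every record's contig appears in the header. The function names are hypothetical.

```python
import re


def contig_order(header_lines):
    """Map each contig ID to its rank, in order of appearance in the header."""
    order = {}
    for line in header_lines:
        match = re.match(r"##contig=<ID=([^,>]+)", line)
        if match:
            order.setdefault(match.group(1), len(order))
    return order


def sort_records(header_lines, records):
    """Sort records by header contig rank, then by position."""
    order = contig_order(header_lines)
    # Raises KeyError for a contig that is missing from the header.
    return sorted(records, key=lambda rec: (order[rec[0]], rec[1]))
```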
− |
| |
− | === Split by variant ===
| |
− |
| |
− | Split VCF files by variant type.
| |
− |
| |
− | vt split_by_variant -i mills.sites.vcf
| |
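The variant types used throughout this page (SNPs, MNPs and indels) can be told apart from allele lengths alone. The following sketch assumes biallelic records given as `(chrom, pos, ref, alt)` tuples; the classification rule and function names are illustrative and may differ from what vt does internally.

```python
from collections import defaultdict


def variant_type(ref, alt):
    """Classify a biallelic variant by comparing allele lengths."""
    if len(ref) == len(alt):
        return "snp" if len(ref) == 1 else "mnp"
    return "indel"


def split_by_type(records):
    """Group records by variant type, preserving input order within each group."""
    groups = defaultdict(list)
    for chrom, pos, ref, alt in records:
        groups[variant_type(ref, alt)].append((chrom, pos, ref, alt))
    return dict(groups)
```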
− |
| |
− | === Compute Feature ===
| |
− |
| |
− | Compute feature of variant.
| |
− |
| |
− | vt compute_feature -i mills.vcf
| |
− |
| |
− | === Summarize Variants ===
| |
− |
| |
− | Summarize the variants present in a VCF file.
| |
− |
| |
− | vt peek -i mills.vcf
| |
− |
| |
− | === Plot Type ===
| |
− |
| |
− | Plot based on type. One or more XML files can be supplied as a comma-separated list.
| |
− |
| |
− | vt plot_<type> -i mills.xml
| |
− |
| |
− | vt plot_<type> -i mills.xml,hgdp.xml,um.xml
| |
− |
| |
− | === Resource Files ===
| |
− |
| |
− | dbSNP
| |
− | OMNI 1000G
| |
− | Mills
| |
− | HapMap
| |