Difference between revisions of "Vt"

From Genome Analysis Wiki
Line 66:

 Requires a set of features AND an installed copy of SVMLight.

- vt svm NA12878.bam -i NA12878.sites.vcf -o NA12878.svm.sites.vcf --pos positive.sites.vcf --neg negative.sites.vcf
+ vt filter NA12878.bam -i NA12878.sites.vcf -o NA12878.svm.sites.vcf --pos positive.sites.vcf --neg negative.sites.vcf

 A filtering pipeline implemented in a make file is available here.

Revision as of 16:58, 12 March 2013

Introduction

vt is a tool set that calls, genotypes and filters short variants. It provides profiling of variants to aid in QC.


Location

Internal usage

 /net/fantasia/home/atks/programs/vtools/vt

External usage

 download from sourceforge/github

Discovery

Discovery is performed at the per-sample level. The evidence site lists for each sample are then merged, and site discovery statistics are computed. The user then chooses cutoffs to create an initial candidate site list.

Generate a site list with INFO fields E and N.

vt discover -i NA12878.bam -o NA12878.sites.vcf -g hs37d5.fa
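
Because discovery runs once per sample, the command above is typically repeated over a sample list. The following is a minimal dry-run sketch that only prints the commands; the helper name discover_cmds and the <sample>.bam / <sample>.sites.vcf naming are assumptions taken from the example above, not something vt requires.

```shell
#!/bin/sh
# Hypothetical helper: print one "vt discover" command per sample.
# $1 is the reference FASTA; the remaining arguments are sample IDs.
# File names follow the <sample>.bam / <sample>.sites.vcf pattern of the
# example above (an assumed convention).
discover_cmds() {
    ref=$1
    shift
    for sample in "$@"; do
        echo "vt discover -i ${sample}.bam -o ${sample}.sites.vcf -g ${ref}"
    done
}

# Dry run: commands are printed, not executed, so vt need not be installed.
discover_cmds hs37d5.fa NA12878 NA12879 NA12880
```

Piping the printed lines to sh, or writing them to a job list, would run them once the commands have been checked.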

Left align variants. This is required because left alignment of insertions and/or deletions within a read is sometimes insufficient to ensure complete left alignment.

vt left_align -i NA12878.bam -o NA12878.leftaligned.sites.vcf -g hs37d5.fa

Evidence site lists are combined across samples and split by sites to allow for parallelization.

vt merge_and_split_sample_vcf -i NA12878.sites.vcf,NA12879.sites.vcf,NA12880.sites.vcf -l 5000
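
With many samples, the comma-separated -i list is easier to build than to type. This is a small sketch under the assumption that each sample's site list is named <sample>.sites.vcf, as in the example above; join_site_vcfs is a hypothetical helper, and the command is printed rather than run.

```shell
#!/bin/sh
# Hypothetical helper: join sample IDs into a comma-separated list of
# per-sample site VCFs (assumed naming: <sample>.sites.vcf).
join_site_vcfs() {
    list=""
    for sample in "$@"; do
        # ${list:+$list,} adds a comma only when list is already non-empty.
        list="${list:+$list,}${sample}.sites.vcf"
    done
    printf '%s\n' "$list"
}

inputs=$(join_site_vcfs NA12878 NA12879 NA12880)

# Dry run: print the merge command with the same options as the example above.
echo "vt merge_and_split_sample_vcf -i $inputs -l 5000"
```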

Discovery statistics are computed. These statistics help you choose a suitable cutoff for creating the candidate site list.

vt compute_discovery_stats -i 1-1000000.sites.vcf -o 1-1000000.annotated.sites.vcf 

Merge site lists.

vt merge -i 1000000.sites.vcf,2000000.sites.vcf,3000000.sites.vcf -o candidate.sites.vcf 

Plot charts to help with candidate list selection criteria.

vt plot_discovery -i candidate.sites.vcf 


A calling pipeline implemented in a make file is available here.

Genotyping

Each individual is genotyped at a set of sites.

vt genotype -i NA12878.bam -o NA12878.sites.vcf -g hs37d5.fa

Genotype sample VCFs are combined across samples and split by sites.

vt merge_and_split_sample_vcf -i NA12878.sites.vcf,NA12879.sites.vcf,NA12880.sites.vcf -o 1-1000000.sites.vcf

Features are computed.

vt compute_features -i 1-1000000.sites.vcf -o 1-1000000.annotated.sites.vcf 

A genotyping pipeline implemented in a make file is available here.

Filtering

Requires a set of features AND an installed copy of SVMLight.

vt filter NA12878.bam -i NA12878.sites.vcf -o NA12878.svm.sites.vcf --pos positive.sites.vcf --neg negative.sites.vcf
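
Since filtering depends on SVMLight, a quick preflight check can save a failed run. The sketch below assumes the SVMLight executables svm_learn and svm_classify are expected on PATH; how vt actually locates SVMLight is not stated here, so treat that as an assumption. The helper name have_tools is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if every named executable is on PATH.
have_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || return 1
    done
    return 0
}

# svm_learn and svm_classify are the programs shipped with SVMLight;
# looking for them on PATH is an assumption about the local setup.
if have_tools svm_learn svm_classify; then
    echo "SVMLight found on PATH"
else
    echo "SVMLight not found on PATH" >&2
fi
```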

A filtering pipeline implemented in a make file is available here.

Left Alignment

Left align indel type variants in a VCF file.

vt leftalign -i mills.vcf -o mills.leftaligned.vcf

Profile SNPs

Profile SNPs.

vt profile_snps -i mills.snps.sites.vcf

Profile Indels

Profile indels.

vt profile_indels -i mills.indels.sites.vcf

Profile MNPs

Profile MNPs.

vt profile_mnps -i mills.mnps.sites.vcf

Sort

Sort variants according to the order of the contigs listed in the header.

vt sort -i mills.sites.vcf

Split by variant

Split VCF files by variant type.

vt split_by_variant -i mills.sites.vcf

Compute Feature

Compute feature of variant.

vt compute_feature -i mills.vcf

Summarize Variants

Summarizes variants present in VCF file.

vt peek -i mills.vcf

Plot Type

Plot based on type.

vt plot_<type> -i mills.xml
vt plot_<type> -i mills.xml,hgdp.xml,um.xml

Resource Files

dbSNP, OMNI, 1000G, Mills, HAPMAP