Difference between revisions of "Vt"

Revision as of 16:33, 12 March 2013

Introduction

vt is a tool set that calls, genotypes, and filters short variants. It also profiles variants to aid quality control (QC).

Location

Internal usage

 /net/fantasia/home/atks/programs/vtools/vt

External usage

 Download from SourceForge or GitHub.

Discovery

Discovery is performed at the per-sample level. The evidence site lists for each sample are then merged, and site discovery statistics are computed. The user then chooses cutoffs to create an initial candidate site list.

Generate a site list with INFO fields E and N.

vt discover -i NA12878.bam -o NA12878.sites.vcf -g hs37d5.fa
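
Because discovery runs once per sample, the same command is repeated with only the sample name changing. The following is a minimal dry-run sketch of that loop, not part of vt itself: the function name `discover_cmds` and the sample list are assumptions for illustration, and the flags are only those shown in the command above. It prints the commands rather than running them.

```shell
# Dry run: print one discovery command per sample instead of executing it.
# discover_cmds is a hypothetical helper; the sample names are illustrative.
discover_cmds() {
  for sample in "$@"; do
    echo "vt discover -i $sample.bam -o $sample.sites.vcf -g hs37d5.fa"
  done
}

discover_cmds NA12878 NA12879 NA12880
```

Once the printed commands look right, the output could be piped to `sh` to run them, assuming `vt` is on the PATH.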

Left align variants. This is required because left alignment of insertions and/or deletions within a read is sometimes insufficient to ensure complete left alignment.

vt left_align -i NA12878.bam -o NA12878.leftaligned.sites.vcf -g hs37d5.fa

Evidence site lists are combined across samples and split by sites to allow for parallelization.

vt merge_and_split_sample_vcf -i NA12878.sites.vcf,NA12879.sites.vcf,NA12880.sites.vcf -l 5000
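
The `-i` option above takes the per-sample site lists as a single comma-separated value. For more than a handful of samples it may be easier to build that value from the sample names. The sketch below is one way to do so in POSIX shell; `join_sites` is a hypothetical helper, and the command is only printed, not run.

```shell
# Build a comma-separated list of per-sample site files from sample names.
# join_sites is a hypothetical helper, not a vt command.
join_sites() {
  list=""
  for sample in "$@"; do
    # Prepend a comma only when the list already has an entry.
    list="${list:+$list,}$sample.sites.vcf"
  done
  printf '%s\n' "$list"
}

# Dry run: print the merge-and-split command with the generated input list.
echo "vt merge_and_split_sample_vcf -i $(join_sites NA12878 NA12879 NA12880) -l 5000"
```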

Discovery statistics are computed. These statistics allow you to choose a suitable cutoff for creating a candidate site list.

vt compute_discovery_stats -i 1-1000000.sites.vcf -o 1-1000000.annotated.sites.vcf

Merge site lists.

vt merge -i 1000000.sites.vcf,2000000.sites.vcf,3000000.sites.vcf -o candidate.sites.vcf

Plot charts to help with candidate list selection criteria.

vt plot_discovery -i candidate.sites.vcf

A calling pipeline implemented in a make file is available here.

Genotyping

Each individual is genotyped at a set of sites.

vt genotype -i NA12878.bam -o NA12878.sites.vcf -g hs37d5.fa

Genotype sample VCFs are combined across samples and split by sites.

vt merge_and_split_sample_vcf -i NA12878.sites.vcf,NA12879.sites.vcf,NA12880.sites.vcf -o 1-1000000.sites.vcf

Features are computed.

vt compute_features -i 1-1000000.sites.vcf -o 1-1000000.annotated.sites.vcf
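
The three genotyping steps run in a fixed order: per-sample genotyping, merging and splitting, then feature computation. As a rough sketch of that ordering, the hypothetical helper `genotype_plan` below prints the commands for a given set of samples. The file names and flags are copied from the examples above; the helper itself and the choice to print rather than execute are assumptions for illustration.

```shell
# Dry run: print the genotyping steps in order for the given samples.
# genotype_plan is a hypothetical helper, not a vt command.
genotype_plan() {
  inputs=""
  for sample in "$@"; do
    # Step 1: genotype each sample at the candidate sites.
    echo "vt genotype -i $sample.bam -o $sample.sites.vcf -g hs37d5.fa"
    inputs="${inputs:+$inputs,}$sample.sites.vcf"
  done
  # Step 2: combine across samples and split by sites.
  echo "vt merge_and_split_sample_vcf -i $inputs -o 1-1000000.sites.vcf"
  # Step 3: compute features on the merged region.
  echo "vt compute_features -i 1-1000000.sites.vcf -o 1-1000000.annotated.sites.vcf"
}

genotype_plan NA12878 NA12879 NA12880
```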

A genotyping pipeline implemented in a make file is available here.

Filtering

Filtering requires a set of features.

vt svm NA12878.bam -i NA12878.sites.vcf -o NA12878.svm.sites.vcf --pos positive.sites.vcf --neg negative.sites.vcf

A filtering pipeline implemented in a make file is available here.

Left Alignment

Left align indel type variants in a VCF file.

vt leftalign -i mills.vcf -o mills.leftaligned.vcf

Profile SNPs

Profile SNPs.

vt profile_snps -i mills.snps.sites.vcf

Profile Indels

Profile indels.

vt profile_indels -i mills.indels.sites.vcf

Profile MNPs

Profile MNPs.

vt profile_mnps -i mills.mnps.sites.vcf

Sort

Sort variants according to the contig order listed in the header.

vt sort -i mills.sites.vcf

Split by variant

Split VCF files by variant type.

vt split_by_variant -i mills.sites.vcf

Compute Feature

Compute feature of variant.

vt compute_feature -i mills.vcf

Plot Type

Plot based on type.

vt plot_<type> -i mills.xml

vt plot_<type> -i mills.xml,hgdp.xml,um.xml

Resource Files

dbSNP, OMNI, 1000G, Mills, HapMap

@@ Line 16: / Line 16: @@
 === Discovery ===
-Discovery is performed at per sample level, the evidence sites lists for each sample is then merged and site discovery statistics are computed.
+Discovery is performed at per sample level, the evidence sites lists for each sample are then merged and site discovery statistics are computed.
-The user then makes a decision on cut offs to make to create an initial site list.
+The user then makes a decision on cut offs to make to create an initial candidate site list.
 Generates site list with info fields E and N.
@@ Line 23: / Line 23: @@
   vt discover -i NA12878.bam -o NA12878.sites.vcf -g hs37d5.fa
-Left align variants.
+Left align variants.  This is required as left alignment if insertions and/or deletions within a read is sometimes insufficient to ensure complete left alignment.
   vt left_align -i NA12878.bam -o NA12878.leftaligned.sites.vcf -g hs37d5.fa
-Evidence site lists are combined across samples and split by sites.
+Evidence site lists are combined across samples and split by sites to allow for parallelization.
-  vt merge_and_split_sample_vcf -i NA12878.sites.vcf,NA12879.sites.vcf,NA12880.sites.vcf -o 1-1000000.sites.vcf
+  vt merge_and_split_sample_vcf -i NA12878.sites.vcf,NA12879.sites.vcf,NA12880.sites.vcf -l 5000
-Discovery statistics are computed.
+Discovery statistics are computed.  These statistics will allow you to choose a suitable cut off for creating a suitable candidate site list.
   vt compute_discovery_stats -i 1-1000000.sites.vcf -o 1-1000000.annotated.sites.vcf
+Merge site lists.
+ vt merge -i 1000000.sites.vcf,2000000.sites.vcf,3000000.sites.vcf -o candidate.sites.vcf
+Plot charts to help with candidate list selection criteria.
+ vt plot_discovery -i candidate.sites.vcf
 A calling pipeline implemented in a make file is available here.
 === Genotyping ===