Vt

From Genome Analysis Wiki

Revision as of 16:18, 12 March 2013

Introduction

vt is a tool set that calls, genotypes and filters short variants. It also profiles variants to aid quality control (QC).

Discovery

Discovery is performed per sample. The evidence site lists for all samples are then merged, and site discovery statistics are computed. The user then decides on cutoffs to create an initial site list.

vt discover generates a site list with the INFO fields E and N.

vt discover -i NA12878.bam -o NA12878.sites.vcf -g hs37d5.fa

Left align variants.

vt left_align -i NA12878.sites.vcf -o NA12878.leftaligned.sites.vcf -g hs37d5.fa

Evidence site lists are combined across samples and split by sites.

vt merge_and_split_sample_vcf -i NA12878.sites.vcf,NA12879.sites.vcf,NA12880.sites.vcf -o 1-1000000.sites.vcf

Discovery statistics are computed.

vt compute_discovery_stats -i 1-1000000.sites.vcf -o 1-1000000.annotated.sites.vcf 
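The merge-and-cutoff logic described above can be sketched as follows. This is a hypothetical illustration, not vt's implementation: it assumes each per-sample site list has already been parsed into (chrom, pos, ref, alt) tuples, uses "number of samples reporting the site" as a stand-in for the discovery statistics (whose exact definitions, like those of E and N, are not given here), and applies a user-chosen minimum as the cutoff.

```python
from collections import Counter

# Hypothetical site representation: (chrom, pos, ref, alt).
Site = tuple[str, int, str, str]


def merge_site_lists(per_sample: dict[str, list[Site]]) -> Counter:
    """Count, for every site, how many samples reported evidence for it."""
    counts: Counter = Counter()
    for sites in per_sample.values():
        # set() so a sample listing the same site twice is counted once
        counts.update(set(sites))
    return counts


def initial_site_list(counts: Counter, min_samples: int) -> list[Site]:
    """Apply the user's cutoff and return the surviving sites in sorted order."""
    return sorted(site for site, n in counts.items() if n >= min_samples)


per_sample = {
    "NA12878": [("20", 100, "A", "G"), ("20", 250, "CT", "C")],
    "NA12879": [("20", 100, "A", "G")],
    "NA12880": [("20", 100, "A", "G"), ("20", 400, "G", "T")],
}
counts = merge_site_lists(per_sample)
print(initial_site_list(counts, min_samples=2))  # [('20', 100, 'A', 'G')]
```

Choosing `min_samples` plays the role of the cutoff decision the user makes after inspecting the statistics.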


A calling pipeline implemented as a makefile is available here.

Genotyping

Each individual is genotyped at a set of sites.

vt genotype -i NA12878.bam -o NA12878.sites.vcf -g hs37d5.fa

The per-sample genotype VCFs are then combined across samples and split by sites.

vt merge_and_split_sample_vcf -i NA12878.sites.vcf,NA12879.sites.vcf,NA12880.sites.vcf -o 1-1000000.sites.vcf

Features are computed.

vt compute_features -i 1-1000000.sites.vcf -o 1-1000000.annotated.sites.vcf 

A genotyping pipeline implemented as a makefile is available here.

Filtering

Filtering requires a set of features. Positive and negative example sites are supplied with --pos and --neg.

vt svm NA12878.bam -i NA12878.sites.vcf -o NA12878.svm.sites.vcf --pos positive.sites.vcf --neg negative.sites.vcf
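To illustrate how positive and negative sites can drive a filter, here is a minimal linear SVM trained by subgradient descent on the regularised hinge loss (a Pegasos-style update). It is a sketch under stated assumptions, not vt's implementation: the two-dimensional feature vectors are hypothetical, and vt's actual features, kernel and training procedure may differ.

```python
def train_linear_svm(xs, ys, lam=0.01, epochs=100):
    """Pegasos-style subgradient descent on the L2-regularised hinge loss.

    xs: feature vectors (hypothetical per-site features).
    ys: +1 for positive sites, -1 for negative sites.
    A constant 1.0 is appended to each vector, so the last weight acts as
    a (regularised) bias term.
    """
    data = [(list(x) + [1.0], y) for x, y in zip(xs, ys)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        for x, y in data:  # cyclic order keeps the sketch deterministic
            t += 1
            eta = 1.0 / (lam * t)
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            # Shrink every weight (gradient of the L2 regulariser) ...
            w = [(1.0 - eta * lam) * wi for wi in w]
            # ... and step towards y * x if this example violates the margin.
            if margin < 1.0:
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def predict(w, x):
    """Return +1 (keep the site) or -1 (filter it) from the linear score."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


# Hypothetical feature vectors for labelled sites.
positives = [(2.0, 2.0), (3.0, 2.0), (2.0, 3.0)]
negatives = [(-2.0, -2.0), (-3.0, -2.0), (-2.0, -3.0)]
w = train_linear_svm(positives + negatives, [1] * 3 + [-1] * 3)
print(predict(w, (2.5, 2.5)), predict(w, (-2.5, -2.5)))  # 1 -1
```

A linear model is the simplest choice for a sketch; a kernel SVM would follow the same positive/negative training setup.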

A filtering pipeline implemented as a makefile is available here.

Left Alignment

Left-align indels in a VCF file.

vt leftalign -i mills.vcf -o mills.leftaligned.vcf
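The idea behind left alignment can be sketched with the widely used left-normalisation procedure: repeatedly drop a trailing base shared by all alleles, prepend the preceding reference base whenever an allele becomes empty, then trim shared leading bases. This is an illustration, not necessarily vt's code; it assumes the contig's reference sequence is available as a Python string, positions are 1-based, and the alleles list holds REF followed by the ALT alleles.

```python
def left_align(ref_seq: str, pos: int, alleles: list[str]) -> tuple[int, list[str]]:
    """Left-align and trim a variant against the reference.

    ref_seq: the contig's reference sequence; 1-based position p is ref_seq[p - 1].
    pos:     1-based position of the first base of every allele.
    alleles: REF first, then the ALT alleles, all non-empty.
    """
    alleles = [a.upper() for a in alleles]
    changed = True
    while changed:
        changed = False
        # Right-trim: drop a trailing base shared by every allele.
        if all(alleles) and len({a[-1] for a in alleles}) == 1:
            alleles = [a[:-1] for a in alleles]
            changed = True
        # If an allele is now empty, prepend the preceding reference base to all alleles.
        if not all(alleles):
            if pos == 1:
                raise ValueError("cannot extend the variant past the start of the contig")
            pos -= 1
            alleles = [ref_seq[pos - 1].upper() + a for a in alleles]
            changed = True
    # Left-trim: drop a shared leading base while every allele keeps at least two bases.
    while all(len(a) >= 2 for a in alleles) and len({a[0] for a in alleles}) == 1:
        alleles = [a[1:] for a in alleles]
        pos += 1
    return pos, alleles


# Deleting "CA" from the CACACA repeat in GCACACAT, written at position 5,
# moves to the leftmost equivalent representation at position 1.
print(left_align("GCACACAT", 5, ["ACA", "A"]))  # (1, ['GCA', 'G'])
```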

Profile SNPs

Profile the SNPs in a VCF file.

vt profile_snps -i mills.snps.sites.vcf
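One statistic commonly used when profiling SNPs for QC is the transition/transversion (Ts/Tv) ratio. The sketch below computes it from (REF, ALT) pairs; it is an illustration only, and which statistics profile_snps actually reports is not stated on this page.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def ts_tv_ratio(snps):
    """Return (transitions, transversions, Ts/Tv) for (REF, ALT) pairs."""
    ts = tv = 0
    for ref, alt in snps:
        ref, alt = ref.upper(), alt.upper()
        if ref == alt or not {ref, alt} <= BASES:
            continue  # skip non-SNPs and ambiguous bases such as N
        # A transition stays within purines (A<->G) or pyrimidines (C<->T).
        if {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES:
            ts += 1
        else:
            tv += 1
    return ts, tv, (ts / tv if tv else float("inf"))


snps = [("A", "G"), ("C", "T"), ("A", "C"), ("G", "T"), ("T", "C")]
print(ts_tv_ratio(snps))  # (3, 2, 1.5)
```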

Profile Indels

Profile the indels in a VCF file.

vt profile_indels -i mills.indels.sites.vcf
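An indel profile typically summarises the balance of insertions and deletions and their lengths. The sketch below is illustrative only; whether profile_indels reports these exact quantities is an assumption. It takes (REF, ALT) pairs for biallelic, VCF-style indels and records the signed net length change of each.

```python
from collections import Counter


def indel_profile(indels):
    """Return (insertions, deletions, ins/del ratio, Counter of net length changes)."""
    insertions = deletions = 0
    lengths = Counter()
    for ref, alt in indels:
        diff = len(alt) - len(ref)  # > 0 insertion, < 0 deletion
        if diff > 0:
            insertions += 1
        elif diff < 0:
            deletions += 1
        else:
            continue  # equal-length alleles are not indels
        lengths[diff] += 1
    ratio = insertions / deletions if deletions else float("inf")
    return insertions, deletions, ratio, lengths


calls = [("A", "AT"), ("CTG", "C"), ("G", "GAA"), ("TA", "T")]
ins, dels, ratio, lengths = indel_profile(calls)
print(ins, dels, ratio)  # 2 2 1.0
```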

Profile MNPs

Profile the MNPs in a VCF file.

vt profile_mnps -i mills.mnps.sites.vcf

sort

Sort variants according to the order of the contigs listed in the VCF header.

vt sort -i mills.sites.vcf
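Sorting by header order means ranking each record's contig by where its ##contig line appears in the header, then by position. A minimal sketch over in-memory VCF lines follows; it assumes every record's contig is declared in the header, and is not vt's implementation.

```python
import re

CONTIG_ID = re.compile(r"^##contig=<ID=([^,>]+)")


def sort_vcf_lines(lines):
    """Return header lines unchanged, then records sorted by (header contig order, POS)."""
    header, records = [], []
    for line in lines:
        (header if line.startswith("#") else records).append(line)
    # Rank contigs by the order of their ##contig lines.
    order = {}
    for h in header:
        m = CONTIG_ID.match(h)
        if m:
            order.setdefault(m.group(1), len(order))

    def key(record):
        chrom, pos = record.split("\t", 2)[:2]
        if chrom not in order:
            raise ValueError(f"contig {chrom!r} is not declared in the header")
        return (order[chrom], int(pos))

    return header + sorted(records, key=key)


lines = [
    "##fileformat=VCFv4.1",
    "##contig=<ID=2,length=100>",
    "##contig=<ID=1,length=100>",
    "#CHROM\tPOS\tID\tREF\tALT",
    "1\t5\t.\tA\tG",
    "2\t9\t.\tC\tT",
    "1\t3\t.\tG\tA",
    "2\t4\t.\tT\tC",
]
# Records come out in header contig order: 2:4, 2:9, 1:3, 1:5.
for line in sort_vcf_lines(lines):
    print(line)
```

Because the header here lists contig 2 before contig 1, its records sort first even though "1" < "2" lexically.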

split_by_variant

Split a VCF file by variant type.

vt split_by_variant -i mills.sites.vcf
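Splitting by variant type needs a rule that classifies each record. The sketch below uses a simple hypothetical rule that may not match vt's own classes: equal-length alleles differing at one position are SNPs, at several positions MNPs, alleles of different lengths are indels, symbolic alleles are "other", and records whose ALT alleles fall into different classes are "mixed".

```python
from collections import defaultdict


def allele_type(ref: str, alt: str) -> str:
    """Classify one REF/ALT pair (hypothetical rule, see lead-in)."""
    if alt.startswith("<") or alt == "*":
        return "other"  # symbolic or spanning-deletion allele
    if len(ref) != len(alt):
        return "indel"
    mismatches = sum(r != a for r, a in zip(ref.upper(), alt.upper()))
    if mismatches == 0:
        raise ValueError(f"ALT {alt!r} is identical to REF")
    return "snp" if mismatches == 1 else "mnp"


def split_by_variant(records):
    """Group tab-separated VCF record lines by variant type."""
    groups = defaultdict(list)
    for line in records:
        fields = line.rstrip("\n").split("\t")
        ref, alts = fields[3], fields[4].split(",")
        types = {allele_type(ref, alt) for alt in alts}
        groups[types.pop() if len(types) == 1 else "mixed"].append(line)
    return dict(groups)


records = [
    "1\t10\t.\tA\tG",
    "1\t20\t.\tAC\tGT",
    "1\t30\t.\tA\tAT",
    "1\t40\t.\tA\tG,AT",
    "1\t50\t.\tAC\tAT",
]
groups = split_by_variant(records)
print(sorted(groups))  # ['indel', 'mixed', 'mnp', 'snp']
```

Under this rule a padded pair such as AC/AT counts as a SNP, since only one position differs.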

compute_<feature>

Compute a feature for each variant.

vt compute_feature -i mills.vcf


Resource Files

dbSNP
OMNI 1000G
Mills
HAPMAP