Vt

From Genome Analysis Wiki

Introduction

vt is a tool set that calls, genotypes and filters short variants. It provides profiling of variants to aid in QC.

Discovery

Discovery is performed at the per-sample level. The evidence site lists for each sample are then merged, and site discovery statistics are computed. The user then chooses cutoffs to create an initial site list.

Generate a site list with INFO fields E and N.

vt discover -i NA12878.bam -o NA12878.sites.vcf -g hs37d5.fa

Left align variants.

vt left_align -i NA12878.sites.vcf -o NA12878.leftaligned.sites.vcf -g hs37d5.fa

Evidence site lists are combined across samples and split by sites.

vt merge_and_split_sample_vcf -i NA12878.sites.vcf,NA12879.sites.vcf,NA12880.sites.vcf -o 1-1000000.sites.vcf

Discovery statistics are computed.

vt compute_discovery_stats -i 1-1000000.sites.vcf -o 1-1000000.annotated.sites.vcf 
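
The merge and cutoff steps above can be illustrated with a minimal Python sketch. It is not vt's implementation: the data layout (a dict mapping sample names to lists of `(chrom, pos, ref, alt)` tuples), the function names, and the cutoff statistic (number of samples with evidence for a site) are all assumptions made for illustration.

```python
from collections import Counter

def merge_site_lists(per_sample_sites):
    """Merge per-sample site lists and count the samples supporting each site.

    per_sample_sites maps a sample name to an iterable of
    (chrom, pos, ref, alt) tuples; this layout is a hypothetical one.
    """
    support = Counter()
    for sites in per_sample_sites.values():
        # A set ensures each sample contributes at most once per site.
        support.update(set(sites))
    return support

def apply_cutoff(support, min_samples):
    """Keep sites seen in at least min_samples samples, in sorted order."""
    return sorted(site for site, n in support.items() if n >= min_samples)

# Toy evidence lists for the three samples named in the commands above.
sites = {
    "NA12878": [("1", 1000, "A", "G"), ("1", 2000, "CT", "C")],
    "NA12879": [("1", 1000, "A", "G")],
    "NA12880": [("1", 1000, "A", "G"), ("1", 3000, "G", "T")],
}
support = merge_site_lists(sites)
initial_sites = apply_cutoff(support, min_samples=2)
print(initial_sites)
```

In practice the cutoff would be chosen after inspecting the computed discovery statistics rather than fixed in advance.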


A calling pipeline implemented in a make file is available here.

Genotyping

Each individual is genotyped at a set of sites.

vt genotype -i NA12878.bam -o NA12878.sites.vcf -g hs37d5.fa

Genotype sample VCFs are combined across samples and split by sites.

vt merge_and_split_sample_vcf -i NA12878.sites.vcf,NA12879.sites.vcf,NA12880.sites.vcf -o 1-1000000.sites.vcf

Features are computed.

vt compute_features -i 1-1000000.sites.vcf -o 1-1000000.annotated.sites.vcf 

A genotyping pipeline implemented in a make file is available here.

Filtering

Filtering requires a set of computed features.

vt svm NA12878.bam -i NA12878.sites.vcf -o NA12878.svm.sites.vcf --pos positive.sites.vcf --neg negative.sites.vcf

A filtering pipeline implemented in a make file is available here.

Left Alignment

Left-align indel-type variants in a VCF file.

vt leftalign -i mills.vcf -o mills.leftaligned.vcf
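
To make the idea of left alignment concrete, here is a minimal Python sketch of the general technique for a single biallelic variant. It is an illustration only and not vt's code: the function name `left_align` is hypothetical, and it assumes the contig sequence is held in memory as a string and that positions are 1-based as in VCF.

```python
def left_align(ref_seq, pos, ref, alt):
    """Left-align and trim a biallelic variant.

    ref_seq is the contig sequence; pos is the 1-based position of the
    first base of ref, as in VCF.
    """
    changed = True
    while changed:
        changed = False
        # Drop a shared trailing base; this shifts the variant leftwards.
        if ref and alt and ref[-1] == alt[-1]:
            ref, alt = ref[:-1], alt[:-1]
            changed = True
        # An empty allele is padded with the reference base to its left.
        if (not ref or not alt) and pos > 1:
            pos -= 1
            base = ref_seq[pos - 1]
            ref, alt = base + ref, base + alt
            changed = True
    # Trim redundant shared leading bases, keeping one anchor base.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt

# A CA deletion written at the right end of a CA repeat.
contig = "GGGCACACACAGGG"
aligned = left_align(contig, 9, "ACA", "A")
print(aligned)
```

The same deletion can be written at several positions inside the repeat; left alignment picks the leftmost one so that equivalent records compare equal.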

Profile SNPs

Profile SNPs.

vt profile_snps -i mills.snps.sites.vcf
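
This page does not say which statistics profile_snps reports. As an example of the kind of summary used for SNP QC, the sketch below computes the transition/transversion (Ts/Tv) ratio in Python. The function name and the `(ref, alt)` input layout are assumptions, and the metric is offered as a common one rather than as vt's documented output.

```python
# Transitions are purine<->purine or pyrimidine<->pyrimidine changes.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}

def ts_tv_ratio(snps):
    """Return the Ts/Tv ratio for an iterable of (ref, alt) SNP alleles."""
    snps = list(snps)
    ts = sum(1 for pair in snps if pair in TRANSITIONS)
    tv = len(snps) - ts
    # With no transversions the ratio is undefined; report infinity.
    return ts / tv if tv else float("inf")

snps = [("A", "G"), ("C", "T"), ("G", "A"), ("A", "C"), ("T", "C"), ("G", "T")]
ratio = ts_tv_ratio(snps)
print(ratio)
```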

Profile Indels

Profile indels.

vt profile_indels -i mills.indels.sites.vcf

Profile MNPs

Profile MNPs.

vt profile_mnps -i mills.mnps.sites.vcf

sort

Sort variants according to the contig list in the header.

vt sort -i mills.sites.vcf
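
Sorting by header contig order, rather than alphabetically, can be sketched in Python as follows. This is an illustration under stated assumptions, not vt's implementation: the helper names are hypothetical, records are simplified to `(chrom, pos, ref, alt)` tuples, and only `##contig=<ID=...>` header lines are parsed.

```python
def contig_order(header_lines):
    """Map each contig ID to its rank in the ##contig header lines."""
    order = {}
    prefix = "##contig=<"
    for line in header_lines:
        if line.startswith(prefix):
            fields = line[len(prefix):].rstrip(">\n").split(",")
            for field in fields:
                if field.startswith("ID="):
                    order[field[3:]] = len(order)
    return order

def sort_records(header_lines, records):
    """Sort records by header contig rank, then by position."""
    order = contig_order(header_lines)
    return sorted(records, key=lambda rec: (order[rec[0]], rec[1]))

header = [
    "##fileformat=VCFv4.1",
    "##contig=<ID=1,length=249250621>",
    "##contig=<ID=2,length=243199373>",
    "##contig=<ID=X,length=155270560>",
]
records = [
    ("X", 100, "A", "T"),
    ("2", 500, "G", "C"),
    ("1", 900, "C", "T"),
    ("1", 200, "T", "G"),
]
sorted_records = sort_records(header, records)
print(sorted_records)
```

A record on a contig missing from the header raises a `KeyError` in this sketch, which makes an incomplete header visible instead of silently misordering records.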

split_by_variant

Split VCF files by variant type.

vt split_by_variant -i mills.sites.vcf
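
Splitting by variant type depends on classifying each record. The Python sketch below uses a deliberately simple rule based on allele lengths, with the SNP, MNP and indel categories taken from the profiling commands on this page. The function names and the rule itself are assumptions; vt's own classification may differ, for example for multiallelic or complex records.

```python
def variant_type(ref, alt):
    """Classify a biallelic variant by comparing allele lengths."""
    if len(ref) == 1 and len(alt) == 1:
        return "snp"
    if len(ref) == len(alt):
        return "mnp"
    return "indel"

def split_by_type(records):
    """Group (chrom, pos, ref, alt) records by variant type."""
    groups = {"snp": [], "mnp": [], "indel": []}
    for rec in records:
        groups[variant_type(rec[2], rec[3])].append(rec)
    return groups

records = [
    ("1", 100, "A", "G"),    # single-base substitution
    ("1", 200, "AC", "GT"),  # equal-length multi-base substitution
    ("1", 300, "CT", "C"),   # deletion
    ("1", 400, "G", "GA"),   # insertion
]
groups = split_by_type(records)
print({name: len(recs) for name, recs in groups.items()})
```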

compute_<feature>

Compute a feature for each variant.

vt compute_feature -i mills.vcf


Resource Files

dbSNP, OMNI, 1000G, Mills, HapMap