Revision as of 16:25, 21 October 2013

Introduction

vt is a tool set for discovering short variants from next-generation sequencing (NGS) data. Features are being rolled out to GitHub gradually while a major rewrite is under way.

Location

You may pull it from GitHub:

 git clone https://github.com/atks/vt.git

Common options

   -i  multiple intervals in <seq>:start-end format

   -o  output file; defaults to STDOUT.
       The STDOUT output may be set to the binary version of the format.
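To make the <seq>:start-end interval format concrete, here is a minimal shell sketch that splits one interval string into its three parts. The function name parse_interval is hypothetical and is not part of vt; how vt itself parses intervals is not described on this page.

```shell
# Hypothetical helper, not part of vt: split "<seq>:start-end" into
# tab-separated seq, start and end using POSIX parameter expansion.
parse_interval() {
    interval="$1"
    seq="${interval%:*}"      # text before the last ':'
    range="${interval##*:}"   # text after the last ':' (start-end)
    start="${range%-*}"       # text before the '-'
    end="${range#*-}"         # text after the '-'
    printf '%s\t%s\t%s\n' "$seq" "$start" "$end"
}
```

For example, parse_interval 20:1000-2000 prints the sequence name 20 followed by the coordinates 1000 and 2000.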

Normalization

Normalize variants in a VCF file.

vt normalize -i mills.vcf -o mills.normalized.vcf

Merge duplicate variants

Merges duplicate variants by position, with the option of also considering alleles. Merging here simply means that the duplicate variant appearing later in the VCF file is discarded.

  Options:
  -i,  --input-vcf <string>  : Input VCF file
  -o,  --output-vcf <string> : Output VCF file [-]
  -p,  --merge-by-position   : Merge by position [false]

  Example:
  e.g. vt merge_duplicate_variants -i 8904indels.dups.genotypes.vcf -o out.vcf
  e.g. vt merge_duplicate_variants -p -i 8904indels.dups.genotypes.vcf -o out.vcf
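The behaviour described above (keep the first record, discard later duplicates) can be illustrated with a short awk sketch. This is not vt's implementation: the function name dedup_vcf and its mode argument are hypothetical, and the sketch assumes that -p compares only chromosome and position while the default also compares the REF and ALT columns.

```shell
# Hypothetical helper, not part of vt: keep the first record for each key
# and drop later records with the same key.
# Usage: dedup_vcf [position|alleles] < in.vcf > out.vcf
dedup_vcf() {
    mode="${1:-alleles}"
    awk -F'\t' -v mode="$mode" '
        /^#/ { print; next }                   # pass header lines through
        {
            key = $1 SUBSEP $2                 # CHROM and POS
            if (mode != "position")
                key = key SUBSEP $4 SUBSEP $5  # also REF and ALT
            if (!(key in seen)) { seen[key] = 1; print }
        }
    '
}
```

Under these assumptions, dedup_vcf position plays the role of the -p example and dedup_vcf alleles that of the default; records are kept in their original order.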

Maintained by

This page is maintained by Adrian

Difference between revisions of "Vt"

@@ Line 1: / Line 1: @@
 === Introduction ===
-vt is a tool set that calls, genotypes and filters short variants.  It provides profiling of variants to aid in QC.
+vt is a variant tool set that discovers short variants from Next Generation Sequencing data.  The features are being rolled out to github as major rewriting is being undertaken.
 === Location ===
-Internal usage
+You may pull it from github:
-   binaries
-  /net/fantasia/home/atks/programs/vt
-  test data
-  /net/fantasia/home/atks/programs/vt/test
-  scripts
-  /net/fantasia/home/atks/programs/vt/scripts
-External usage
-   download from sourceforge/github
+   git clone https://github.com/atks/vt.git
-== Common options patterns ==
+== Common options ==
-     -i defines the input file and by default, this is a require parameter,
+     -i  multiple intervals in <seq>:start-end format
-       however, you may set it as '-' to accept STDIN which by default is
-       assumed to be a non compressed format.
      -o defines the out file which and has the STDOUT set as the default.
-        You may modify the STDOUT to output the binary version of the format,
+        You may modify the STDOUT to output the binary version of the format.
-       e.g. BCF. with the option -c
-== Major Workflows ==
-=== Discovery ===
-Discovery is performed at per sample level, the evidence sites lists for each sample are then merged and site discovery statistics are computed.
-The user then makes a decision on cut offs to make to create an initial candidate site list.
-Generates site list with info fields E and N.
- vt discover -i NA12878.bam -o NA12878.sites.vcf -g hs37d5.fa
-Normalize(including left aligning) variants.  This is required as left alignment of insertions and/or deletions within a read is sometimes insufficient to ensure complete left alignment.
- vt normalize -i NA12878.bam -o NA12878.normalized.sites.vcf -g hs37d5.fa
-Evidence site lists are combined across samples and split by sites to allow for parallelization.
- vt merge_and_split_sample_vcf -i NA12878.sites.vcf,NA12879.sites.vcf,NA12880.sites.vcf -l 5000
-Discovery statistics are computed.  These statistics will allow you to choose a suitable cut off for creating a suitable candidate site list.
- vt compute_discovery_stats -i 1-1000000.sites.vcf -o 1-1000000.annotated.sites.vcf
-Merge site lists.
- vt merge -i 1000000.sites.vcf,2000000.sites.vcf,3000000.sites.vcf -o candidate.sites.vcf
-Plot charts to help with candidate list selection criteria.
- vt plot_discovery -i candidate.sites.vcf
-A calling pipeline implemented in a make file is available here.
-=== Genotyping ===
-Each individual is genotyped at a set of sites.
- vt genotype -i NA12878.bam -o NA12878.sites.vcf -g hs37d5.fa
-Genotype sample VCFs are combined across samples and split by sites.
- vt merge_and_split_sample_vcf -i NA12878.sites.vcf,NA12879.sites.vcf,NA12880.sites.vcf -o 1-1000000.sites.vcf
-Features are computed.
- vt compute_features -i 1-1000000.sites.vcf -o 1-1000000.annotated.sites.vcf
-A  genotyping pipeline implemented in a make file is available here.
-=== Filtering ===
-Requires a set of features AND an installed copy of SVMLight.
- vt filter NA12878.bam -i NA12878.sites.vcf -o NA12878.svm.sites.vcf --pos positive.sites.vcf --neg negative.sites.vcf
-A filtering pipeline implemented in a make file is available here.
-== Generation ==
-=== Discovery ===
-Discovers variants from bams.
-   Options:
-   -b,  --input-bam-file    : Input BAM file
-   -o,  --output-vcf-file   : Output VCF file
-   -v,  --variant-type      : Variant Types, takes on any combinations of
-                              the values snps,mnps,indels comma delimited
-                              [snps,mnps,indels]
-   -q,  --q-cutoff          : BASE Cutoff, only bases with
-                              QUAL/BAQ >= baseq are considered [13]
-   -m,  --mapq-cutoff       : MAPQ Cutoff, only alignments with
-                              map quality >= mapq are considered [20]
-   -g,  --genome-fa-file    : Genome FASTA file
-   -s,  --sample-id         : Sample ID
-   Example:
-   e.g. vt discover -b in.bam -o - -g ref.fa -v snps,indels -s HG0001
-   e.g. bam mergeBam --in a.bam --in b.bam -o - |
-        vt discover -b - -o out.sites.vcf -g ref.fa -v all -s HG0001 |
-        vt left_align -i - | vt merge_duplicate_variants
-=== Genotyping ===
-Genotypes variants for each sample.
-   Options:
-   -b,  --input-bam-file        : Input BAM file
-   -i,  --input-candidate-vcf   : Input Candidate VCF file
-   -o,  --output-vcf-file       : Output VCF file
-   -v,  --variant-type          : Variant Types, takes on any combinations
-                                  of the values snps,mnps,indels comma
-                                  delimited [snps,mnps,indels]
-   -g,  --genome-fa-file        : Genome FASTA file
-   -s,  --sample-id             : Sample ID
-   Example:
-   e.g. vt genotype -b in.bam -i candidate.sites.vcf -o - -g ref.fa -s HG0001
-== Annotation ==
-=== Make Probes ===
-Populates the info field with REFPROBE, ALTPROBE and PLEN tags for genotyping.
-   Options:
-   -i,  --input-vcf <string>      : Input VCF file
-   -o,  --output-vcf <string>     : Output VCF file [-]
-   -g,  --genome-fa               : Genome FASTA file [/net/fantasia/home/atks/ref/genome/human.g1k.v37.fa]
-   -f,  --flank-length <integer>  : Minimum Flank Length [20]
-   Example:
-   e.g. vt make_probes -i 8904indels.dups.genotypes.vcf -o probes.sites.vcf -g ref.fa
-=== Compute Feature ===
-Compute feature of variant.
- vt compute_feature -i mills.vcf
-=== Compute Allele balance ===
-Compute [http://genome.sph.umich.edu/wiki/Genotype_Likelihood_Based_Allele_Balance allele balance].  Outputs allele balance, allele frequency, genotype frequency.
- vt compute_ab -i mills.vcf
-=== Compute Allele Frequency ===
-Compute [http://genome.sph.umich.edu/wiki/Genotype_Likelihood_based_Allele_Frequency allele frequency].  Outputs  allele frequency and genotype frequency.
- vt compute_af -i mills.vcf
-=== Compute Inbreeding Coefficient ===
-Compute [http://genome.sph.umich.edu/wiki/Genotype_Likelihood_based_Inbreeding_Coefficient inbreeding coefficient].  Outputs inbreeding coefficient based on genotype likelihoods.
- vt compute_fic -i mills.vcf
-=== Compute HWE ===
-Compute [http://genome.sph.umich.edu/wiki/Genotype_Likelihood_based_Hardy-Weinberg_Test Hardy-Weinberg equilibrium statistic].  Outputs  PHRED scaled HWE Test p-values for biallelic as well as multiallelic variants.
- vt compute_hwe -i mills.vcf
-=== Compute Mendelian Error ===
-Compute mendelian error  statistics.  Outputs  allele frequency and genotype frequency.
- vt compute_mendel -i mills.vcf
-=== Compute features ===
- vt compute_<feature1>_<feature2>_ ... _<feature n> -i mills.vcf
-== Modification ==
-=== Left Alignment ===
-[http://genome.sph.umich.edu/wiki/Variant_Normalization Left aligns] indel type variants in a VCF file.  This differs from normalization in that it only left aligns and left trims a variant.  This affects Indels only.
- vt left_align -i mills.vcf -o mills.leftaligned.vcf
 === Normalization ===
@@ Line 198: / Line 20: @@
 [http://genome.sph.umich.edu/wiki/Variant_Normalization Normalize] variants in a VCF file.
-  vt normalize mills.vcf -r seq.fa -o mills.normalized.vcf
+  vt normalize -i mills.vcf -o mills.normalized.vcf
 === Merge duplicate variants ===
@@ Line 213: / Line 35: @@
     e.g. vt merge_duplicate_variants -p -i 8904indels.dups.genotypes.vcf -o out.vcf
-== Profiling ==
-A standard procedure is as follows:
-  zcat dataset.vcf.gx | vt normalize -i - | vt merge_duplicate_variants -i - > dataset.normalized.vcf
-  cut -f1-8 dataset.normalized.vcf > dataset.sites.vcf
-  cat dataset.normalized.sites.vcf | vt profile_snps -i - > snps.summary.log
-=== Profile SNPs ===
-Profile SNPs.
-* ts/tv ratio
-* overlap analyses
- vt profile_snps -i mills.snps.sites.vcf
-=== Profile Indels ===
-Profile indels.
-* Overlap analyses with known data sets
-* FS/NFS annotation
- vt profile_indels mills.indels.sites.vcf
-=== Profile MNPs ===
-Profile MNPs.
- vt profile_mnps -i mills.mnps.sites.vcf
-=== Summarize Variants ===
-Summarizes variants present in VCF file.
- vt peek -i mills.vcf
-== Plotting ==
-=== Allele Frequency Spectrum ===
-Plots Allele Frequency Spectrum of variants found in VCF file
- vt plot_afs -i mills.xml
-=== Genotype Likelihood Concordance ===
-Plots Genotype Likelihood Concordance graph.
- vt plot_gl -i mills.xml
-=== Allele Balance Spectrum===
-Plots Allele Balance graph of variants in the VCF file.
- vt plot_ab -i mills.xml
-= VCF File Manipulation =
-=== Sort ===
-Sort variants according to contig lists in header.
- vt sort -i mills.sites.vcf
-=== Split by variant ===
-Split VCF files by variant type.
- vt split_by_variant -i mills.sites.vcf
-= Resource Files =
-dbSNP
-OMNI 1000G
-Mills
-HAPMAP
 = Maintained by =
 This page is maintained by  [mailto:atks@umich.edu Adrian]