Changes

From Genome Analysis Wiki
7,518 bytes removed ,  16:25, 21 October 2013
no edit summary
Line 1: Line 1:  
=== Introduction ===
 
   −
vt is a tool set that calls, genotypes and filters short variants.  It provides profiling of variants to aid in QC.
+
vt is a variant tool set that discovers short variants from Next Generation Sequencing data. The features are being rolled out to GitHub while a major rewrite is under way.
 
      
=== Location ===
 
   −
Internal usage
+
You may pull it from GitHub:
 
  −
  binaries
  −
  /net/fantasia/home/atks/programs/vt
  −
 
  −
  test data
  −
  /net/fantasia/home/atks/programs/vt/test
  −
 
  −
  scripts
  −
  /net/fantasia/home/atks/programs/vt/scripts
  −
 
  −
External usage
     −
   download from sourceforge/github
+
   git clone https://github.com/atks/vt.git
   −
== Common options patterns ==
+
== Common options ==
   −
     -i defines the input file and, by default, is a required parameter;
+
     -i multiple intervals in <seq>:start-end format
      however, you may set it as '-' to accept STDIN which by default is
  −
      assumed to be an uncompressed format
      
     -o defines the output file, which defaults to STDOUT.
       You may modify the STDOUT to output the binary version of the format,
+
       You may modify the STDOUT to output the binary version of the format.
      e.g. BCF. with the option -c
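
The <seq>:start-end interval format mentioned for -i can be illustrated with a small parser. This is only a sketch, not vt's own code: the names Interval, parse_interval and parse_intervals are hypothetical, and the comma delimiter between multiple intervals and the 1-based inclusive coordinates are assumptions, since the text above does not state them.

```python
from typing import List, NamedTuple


class Interval(NamedTuple):
    """Hypothetical container for one <seq>:start-end interval."""
    seq: str
    start: int
    end: int


def parse_interval(text: str) -> Interval:
    """Parse a single '<seq>:start-end' string."""
    # Split on the last colon so sequence names containing ':' still work.
    seq, colon, span = text.strip().rpartition(":")
    if not colon or not seq:
        raise ValueError(f"expected <seq>:start-end, got {text!r}")
    start_text, dash, end_text = span.partition("-")
    if not dash:
        raise ValueError(f"expected start-end after ':', got {span!r}")
    start, end = int(start_text), int(end_text)
    # Assumption: coordinates are 1-based and inclusive.
    if start < 1 or end < start:
        raise ValueError(f"invalid coordinates in {text!r}")
    return Interval(seq, start, end)


def parse_intervals(text: str) -> List[Interval]:
    """Parse several intervals; the comma delimiter is an assumption."""
    return [parse_interval(part) for part in text.split(",") if part.strip()]
```

For instance, parse_intervals("20:1000-2000,X:5-10") yields two Interval records, one per region.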
  −
 
  −
== Major Workflows ==
  −
 
  −
=== Discovery ===
  −
 
  −
Discovery is performed at the per-sample level; the evidence site lists for each sample are then merged and site discovery statistics are computed.
  −
The user then chooses cutoffs to create an initial candidate site list.
  −
 
  −
Generates site list with info fields E and N.
  −
 
  −
vt discover -i NA12878.bam -o NA12878.sites.vcf -g hs37d5.fa
  −
 
  −
Normalize (including left aligning) variants.  This is required because left alignment of insertions and/or deletions within a read is sometimes insufficient to ensure complete left alignment.
  −
 
  −
vt normalize -i NA12878.bam -o NA12878.normalized.sites.vcf -g hs37d5.fa
  −
 
  −
Evidence site lists are combined across samples and split by sites to allow for parallelization.
  −
 
  −
vt merge_and_split_sample_vcf -i NA12878.sites.vcf,NA12879.sites.vcf,NA12880.sites.vcf -l 5000
  −
 
  −
Discovery statistics are computed.  These statistics help you choose a suitable cutoff for creating a candidate site list.
  −
 
  −
vt compute_discovery_stats -i 1-1000000.sites.vcf -o 1-1000000.annotated.sites.vcf
  −
 
  −
Merge site lists.
  −
 
  −
vt merge -i 1000000.sites.vcf,2000000.sites.vcf,3000000.sites.vcf -o candidate.sites.vcf
  −
 
  −
Plot charts to help with candidate list selection criteria.
  −
 
  −
vt plot_discovery -i candidate.sites.vcf
  −
 
  −
 
  −
A calling pipeline implemented in a make file is available here.
  −
 
  −
=== Genotyping ===
  −
 
  −
Each individual is genotyped at a set of sites.
  −
 
  −
vt genotype -i NA12878.bam -o NA12878.sites.vcf -g hs37d5.fa
  −
 
  −
Genotype sample VCFs are combined across samples and split by sites.
  −
 
  −
vt merge_and_split_sample_vcf -i NA12878.sites.vcf,NA12879.sites.vcf,NA12880.sites.vcf -o 1-1000000.sites.vcf
  −
 
  −
Features are computed.
  −
 
  −
vt compute_features -i 1-1000000.sites.vcf -o 1-1000000.annotated.sites.vcf
  −
 
  −
A genotyping pipeline implemented in a make file is available here.
  −
 
  −
=== Filtering ===
  −
 
  −
Requires a set of features AND an installed copy of SVMLight.
  −
 
  −
vt filter NA12878.bam -i NA12878.sites.vcf -o NA12878.svm.sites.vcf --pos positive.sites.vcf --neg negative.sites.vcf
  −
 
  −
A filtering pipeline implemented in a make file is available here.
  −
 
  −
== Generation ==
  −
 
  −
=== Discovery ===
  −
 
  −
Discovers variants from bams.
  −
 
  −
  Options:
  −
  -b,  --input-bam-file    : Input BAM file
  −
  -o,  --output-vcf-file  : Output VCF file
  −
  -v,  --variant-type      : Variant Types, takes on any combinations of
  −
                              the values snps,mnps,indels comma delimited
  −
                              [snps,mnps,indels]
  −
  -q,  --q-cutoff          : BASE Cutoff, only bases with
  −
                              QUAL/BAQ >= baseq are considered [13]
  −
  -m,  --mapq-cutoff      : MAPQ Cutoff, only alignments with
  −
                              map quality >= mapq are considered [20]
  −
  -g,  --genome-fa-file    : Genome FASTA file
  −
  -s,  --sample-id        : Sample ID
  −
 
  −
  Example:
  −
  e.g. vt discover -b in.bam -o - -g ref.fa -v snps,indels -s HG0001
  −
  e.g. bam mergeBam --in a.bam --in b.bam -o - |
  −
        vt discover -b - -o out.sites.vcf -g ref.fa -v all -s HG0001 |
  −
        vt left_align -i - | vt merge_duplicate_variants
  −
 
  −
=== Genotyping ===
  −
 
  −
Genotypes variants for each sample.
  −
 
  −
  Options:
  −
  -b,  --input-bam-file        : Input BAM file
  −
  -i,  --input-candidate-vcf  : Input Candidate VCF file
  −
  -o,  --output-vcf-file      : Output VCF file
  −
  -v,  --variant-type          : Variant Types, takes on any combinations
  −
                                  of the values snps,mnps,indels comma
  −
                                  delimited [snps,mnps,indels]
  −
  -g,  --genome-fa-file        : Genome FASTA file
  −
  -s,  --sample-id            : Sample ID
  −
 
  −
  Example:
  −
  e.g. vt genotype -b in.bam -i candidate.sites.vcf -o - -g ref.fa -s HG0001
  −
 
  −
== Annotation ==
  −
 
  −
=== Make Probes ===
  −
 
  −
Populates the info field with REFPROBE, ALTPROBE and PLEN tags for genotyping.
  −
 
  −
  Options:
  −
  -i,  --input-vcf <string>      : Input VCF file
  −
  -o,  --output-vcf <string>    : Output VCF file [-]
  −
  -g,  --genome-fa              : Genome FASTA file [/net/fantasia/home/atks/ref/genome/human.g1k.v37.fa]
  −
  -f,  --flank-length <integer>  : Minimum Flank Length [20]
  −
 
  −
  Example:
  −
  e.g. vt make_probes -i 8904indels.dups.genotypes.vcf -o probes.sites.vcf -g ref.fa
  −
 
  −
=== Compute Feature ===
  −
 
  −
Compute feature of variant.
  −
 
  −
vt compute_feature -i mills.vcf
  −
 
  −
=== Compute Allele balance ===
  −
 
  −
Compute [http://genome.sph.umich.edu/wiki/Genotype_Likelihood_Based_Allele_Balance allele balance].  Outputs allele balance, allele frequency, genotype frequency.
  −
 
  −
vt compute_ab -i mills.vcf
  −
 
  −
=== Compute Allele Frequency ===
  −
 
  −
Compute [http://genome.sph.umich.edu/wiki/Genotype_Likelihood_based_Allele_Frequency allele frequency].  Outputs  allele frequency and genotype frequency.
  −
 
  −
vt compute_af -i mills.vcf
  −
 
  −
=== Compute Inbreeding Coefficient ===
  −
 
  −
Compute [http://genome.sph.umich.edu/wiki/Genotype_Likelihood_based_Inbreeding_Coefficient inbreeding coefficient].  Outputs inbreeding coefficient based on genotype likelihoods.
  −
 
  −
vt compute_fic -i mills.vcf
  −
 
  −
=== Compute HWE ===
  −
 
  −
Compute [http://genome.sph.umich.edu/wiki/Genotype_Likelihood_based_Hardy-Weinberg_Test Hardy-Weinberg equilibrium statistic].  Outputs  PHRED scaled HWE Test p-values for biallelic as well as multiallelic variants.
  −
 
  −
vt compute_hwe -i mills.vcf
  −
 
  −
=== Compute Mendelian Error ===
  −
 
  −
Compute Mendelian error statistics.  Outputs allele frequency and genotype frequency.
  −
 
  −
vt compute_mendel -i mills.vcf
  −
 
  −
=== Compute features ===
  −
 
  −
vt compute_<feature1>_<feature2>_ ... _<feature n> -i mills.vcf
  −
 
  −
== Modification ==
  −
 
  −
=== Left Alignment ===
  −
 
  −
[http://genome.sph.umich.edu/wiki/Variant_Normalization Left aligns] indel type variants in a VCF file.  This differs from normalization in that it only left aligns and left trims a variant.  This affects Indels only.
  −
 
  −
vt left_align -i mills.vcf -o mills.leftaligned.vcf
      
=== Normalization ===
 
Line 198: Line 20:  
[http://genome.sph.umich.edu/wiki/Variant_Normalization Normalize] variants in a VCF file.
 
   −
  vt normalize mills.vcf -r seq.fa -o mills.normalized.vcf
+
  vt normalize -i mills.vcf -o mills.normalized.vcf
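
To make the idea of normalization concrete, the following is a minimal sketch of left aligning and trimming a single biallelic variant. It follows the commonly described procedure (trim shared trailing bases, extend to the left when an allele becomes empty, then trim shared leading bases) and is not taken from vt's source. The function name normalize_variant, its signature and the in-memory reference string are assumptions made for illustration.

```python
from typing import Tuple


def normalize_variant(ref_seq: str, pos: int, ref: str, alt: str) -> Tuple[int, str, str]:
    """Left align and trim one biallelic variant.

    ref_seq is the reference sequence of the contig and pos is 1-based.
    """
    if ref == alt:
        raise ValueError("REF and ALT are identical; nothing to normalize")
    alleles = [ref, alt]
    changed = True
    while changed:
        changed = False
        # Drop the last base while every allele is non-empty and ends the same way.
        if all(alleles) and len({a[-1] for a in alleles}) == 1:
            alleles = [a[:-1] for a in alleles]
            changed = True
        # An empty allele must be padded with the preceding reference base.
        if any(len(a) == 0 for a in alleles):
            if pos == 1:
                raise ValueError("cannot extend left of the contig start")
            pos -= 1
            base = ref_seq[pos - 1]
            alleles = [base + a for a in alleles]
            changed = True
    # Drop shared leading bases, keeping at least one base per allele.
    while all(len(a) >= 2 for a in alleles) and len({a[0] for a in alleles}) == 1:
        alleles = [a[1:] for a in alleles]
        pos += 1
    return pos, alleles[0], alleles[1]
```

With the reference GGGCACACACAGGG, a deletion written as position 8, CAC to C, normalizes to position 3, GCA to G, which is the leftmost representation of the same change.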
    
=== Merge duplicate variants ===
 
Line 213: Line 35:  
   e.g. vt merge_duplicate_variants -p -i 8904indels.dups.genotypes.vcf -o out.vcf
 
   −
== Profiling ==
  −
  −
A standard procedure is as follows:
  −
  −
  zcat dataset.vcf.gz | vt normalize -i - | vt merge_duplicate_variants -i - > dataset.normalized.vcf
  −
  −
  cut -f1-8 dataset.normalized.vcf > dataset.normalized.sites.vcf
  −
  −
  cat dataset.normalized.sites.vcf | vt profile_snps -i - > snps.summary.log
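
The cut -f1-8 step in the procedure above keeps only the eight fixed VCF columns (CHROM through INFO), which turns a genotype VCF into a sites-only file. A rough Python equivalent is sketched below; the function name keep_site_columns is hypothetical and the sketch assumes tab-delimited, uncompressed input.

```python
from typing import Iterable, Iterator


def keep_site_columns(lines: Iterable[str], n_columns: int = 8) -> Iterator[str]:
    """Yield each VCF line reduced to its first n_columns tab-separated fields.

    Meta lines starting with '##' contain no tabs and pass through unchanged,
    which matches what cut does with lines that lack the delimiter.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        yield "\t".join(fields[:n_columns])
```

Note that the header line loses its FORMAT and sample columns in the same way as the records, so the result stays consistent.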
  −
  −
  −
  −
=== Profile SNPs ===
  −
  −
Profile SNPs.
  −
  −
* ts/tv ratio
  −
* overlap analyses
  −
  −
vt profile_snps -i mills.snps.sites.vcf
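
The ts/tv ratio listed above is the number of transitions (A<->G, C<->T) divided by the number of transversions (all other single-base changes). The sketch below shows the arithmetic only; it is not vt's implementation, and the function name ts_tv_ratio and the (ref, alt) pair input are assumptions.

```python
from typing import Iterable, Tuple

# Purine<->purine and pyrimidine<->pyrimidine substitutions.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def ts_tv_ratio(snps: Iterable[Tuple[str, str]]) -> float:
    """Return transitions / transversions for (ref, alt) pairs.

    Pairs that are not simple A/C/G/T substitutions are skipped.
    """
    ts = tv = 0
    for ref, alt in snps:
        ref, alt = ref.upper(), alt.upper()
        if len(ref) != 1 or len(alt) != 1 or ref == alt:
            continue
        if ref not in "ACGT" or alt not in "ACGT":
            continue
        if (ref, alt) in TRANSITIONS:
            ts += 1
        else:
            tv += 1
    if tv == 0:
        raise ValueError("no transversions observed; ratio is undefined")
    return ts / tv
```

For example, two transitions and one transversion give a ratio of 2.0; indel records passed in by mistake are ignored rather than counted.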
  −
  −
=== Profile Indels ===
  −
  −
Profile indels.
  −
  −
* Overlap analyses with known data sets
  −
* FS/NFS annotation
  −
  −
vt profile_indels -i mills.indels.sites.vcf
  −
  −
=== Profile MNPs ===
  −
  −
Profile MNPs.
  −
  −
vt profile_mnps -i mills.mnps.sites.vcf
  −
  −
=== Summarize Variants ===
  −
  −
Summarizes variants present in VCF file.
  −
  −
vt peek -i mills.vcf
  −
  −
== Plotting ==
  −
  −
=== Allele Frequency Spectrum ===
  −
  −
Plots Allele Frequency Spectrum of variants found in VCF file
  −
  −
vt plot_afs -i mills.xml
  −
  −
=== Genotype Likelihood Concordance ===
  −
  −
Plots Genotype Likelihood Concordance graph.
  −
  −
vt plot_gl -i mills.xml
  −
  −
=== Allele Balance Spectrum ===
  −
  −
Plots Allele Balance graph of variants in the VCF file.
  −
  −
vt plot_ab -i mills.xml
  −
  −
= VCF File Manipulation =
  −
  −
=== Sort ===
  −
  −
Sort variants according to contig lists in header.
  −
  −
vt sort -i mills.sites.vcf
  −
  −
=== Split by variant ===
  −
  −
Split VCF files by variant type.
  −
  −
vt split_by_variant -i mills.sites.vcf
  −
  −
= Resource Files =
  −
  −
dbSNP
  −
OMNI 1000G
  −
Mills
  −
HAPMAP
      
= Maintained by =
 
    
This page is maintained by  [mailto:atks@umich.edu Adrian]
 