From Genome Analysis Wiki
3,649 bytes added, 17:35, 15 June 2014
=Motivation=

This wiki page details some standard indel analyses for the sequencing workshop, using the example indel data set. It should help the group understand the issues and perform the analyses quickly without reinventing the wheel.

=Tools=

This walkthrough requires [[vt|vt]]. Some working knowledge of Perl is useful for tasks that vt does not support.

=Analyses=

The file generated by the indel calling is in [http://www.1000genomes.org/wiki/analysis/variant-call-format/bcf-binary-vcf-version-2 BCFv2.1] format, the binary version of the Variant Call Format (VCF). The binary version

== Anatomy of file ==

You can view the header by running the command:

 vt view -H all.genotypes.bcf

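You can also read the header text directly from the BCF file, without vt. The Python sketch below assumes the BCF2 layout from the specification linked above. That layout is BGZF-compressed data that starts with the magic bytes `BCF`, then a major and a minor version byte, then a little-endian 32-bit header length, then the NUL-terminated header text. `read_bcf_header` is a hypothetical helper for illustration and is not part of vt.

```python
import gzip
import struct


def read_bcf_header(path):
    """Return the VCF-style header text stored at the start of a BCF2 file.

    Sketch only: assumes the BCF2 layout (magic b"BCF", major version 2,
    a minor version byte, uint32 little-endian header length, header text).
    """
    # BGZF is a series of concatenated gzip members, which gzip reads transparently.
    with gzip.open(path, "rb") as fh:
        magic = fh.read(5)
        if len(magic) != 5 or magic[:3] != b"BCF" or magic[3] != 2:
            raise ValueError("not a BCF2 file: %r" % magic)
        (l_text,) = struct.unpack("<I", fh.read(4))
        text = fh.read(l_text)
    # The stored text ends with a NUL terminator; drop it before decoding.
    return text.rstrip(b"\0").decode("utf-8")
```

For example, `print(read_bcf_header("all.genotypes.bcf"), end="")` should print essentially the same header text that `vt view -H` reports.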
The header is as follows:

 ##fileformat=VCFv4.2
 ##FILTER=<ID=PASS,Description="All filters passed">
 ##contig=<ID=22,length=51304566>
 ##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
 ##FORMAT=<ID=PL,Number=G,Type=Integer,Description="Normalized, Phred-scaled likelihoods for genotypes">
 ##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Depth">
 ##FORMAT=<ID=AD,Number=3,Type=Integer,Description="Allele Depth">
 ##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
 ##INFO=<ID=AC,Number=A,Type=Integer,Description="Alternate Allele Counts">
 ##INFO=<ID=AN,Number=1,Type=Integer,Description="Total Number Allele Counts">
 ##INFO=<ID=NS,Number=1,Type=Integer,Description="Number of Samples With Data">
 ##INFO=<ID=AF,Number=A,Type=Float,Description="Alternate Allele Frequency">
 ##INFO=<ID=GC,Number=G,Type=Integer,Description="Genotype Counts">
 ##INFO=<ID=GN,Number=1,Type=Integer,Description="Total Number of Genotypes Counts">
 ##INFO=<ID=GF,Number=G,Type=Float,Description="Genotype Frequency">
 ##INFO=<ID=HWEAF,Number=A,Type=Float,Description="Genotype likelihood based MLE Allele Frequency assuming HWE">
 ##INFO=<ID=HWEGF,Number=G,Type=Float,Description="Genotype likelihood based MLE Genotype Frequency assuming HWE">
 ##INFO=<ID=MLEAF,Number=A,Type=Float,Description="Genotype likelihood based MLE Allele Frequency">
 ##INFO=<ID=MLEGF,Number=G,Type=Float,Description="Genotype likelihood based MLE Genotype Frequency">
 ##INFO=<ID=HWE_LLR,Number=1,Type=Float,Description="Genotype likelihood based Hardy Weinberg ln(Likelihood Ratio)">
 ##INFO=<ID=HWE_LPVAL,Number=1,Type=Float,Description="Genotype likelihood based Hardy Weinberg Likelihood Ratio Test Statistic ln(p-value)">
 ##INFO=<ID=HWE_DF,Number=1,Type=Integer,Description="Degrees of freedom for Genotype likelihood based Hardy Weinberg Likelihood Ratio Test Statistic">
 ##INFO=<ID=FIC,Number=1,Type=Float,Description="Genotype likelihood based Inbreeding Coefficient">
 ##INFO=<ID=AB,Number=1,Type=Float,Description="Genotype likelihood based Allele Balance">
 ##FILTER=<ID=TPASS,Description="Temporary pass">
 ##FILTER=<ID=overlap,Description="Overlapping variant">

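Each ##INFO and ##FORMAT line declares an ID, a Number and a Type. In VCF 4.2, Number is either a fixed integer or one of these codes:
* A: one value per ALT allele
* R: one value per allele, including REF
* G: one value per possible genotype
* .: unknown or variable

The Python sketch below parses such a line and works out how many values a field should carry at a site. The helper names are hypothetical. The parser assumes the simple double-quoted Description values seen in this header.

```python
import math
import re

# key=value pairs inside <...>; a value is either a double-quoted string
# (which may contain commas) or a run of characters up to the next comma.
_PAIR = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|[^,]*)')


def parse_meta_line(line):
    """Parse '##INFO=<ID=...,...>' into ('INFO', {'ID': ..., ...})."""
    kind, _, body = line.strip().lstrip("#").partition("=")
    if not (body.startswith("<") and body.endswith(">")):
        raise ValueError("not a structured header line: %r" % line)
    return kind, {
        key: value[1:-1] if value.startswith('"') else value
        for key, value in _PAIR.findall(body[1:-1])
    }


def expected_count(number, n_alt, ploidy=2):
    """Number of values a field should hold at a site with n_alt ALT alleles.

    Returns None for '.', whose length is not fixed.
    """
    if number == "A":   # one value per ALT allele
        return n_alt
    if number == "R":   # one value per allele, REF included
        return n_alt + 1
    if number == "G":   # one value per unordered genotype of the given ploidy
        return math.comb(n_alt + 1 + ploidy - 1, ploidy)
    if number == ".":   # variable length
        return None
    return int(number)  # fixed count such as 1 or 3
```

For the PL line of this header, `expected_count("G", 1)` gives 3. That matches the three PL values per sample (for example `0,9,108`) in a biallelic diploid record.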
==File Preparation==

To convert a VCF file to BCF format, which vt processes much faster:

 vt view mills.vcf -o mills.bcf

Example indel records:

 22 36990877 . GGT G . TPASS AC=32;AN=116;AF=0.275862;GC=32,20,6;GN=58;GF=0.551724,0.344828,0.103448;NS=58;HWEAF=0.275797;HWEGF=0.52447,0.399466,0.0760642;MLEAF=0.27366;MLEGF=0.494275,0.464129,0.0415952;HWE_LLR=-0.453098;HWE_LPVAL=-1.0755;HWE_DF=1;FIC=-0.0718807;AB=0.6129 GT:PL:DP:AD:GQ 0/0:0,9,108:9:3,0,6:10
 22 36991203 . TGAG T . TPASS AC=5;AN=124;AF=0.0403226;GC=58,3,1;GN=62;GF=0.935484,0.0483871,0.016129;NS=62;HWEAF=0.0355594;HWEGF=0.930145,0.0685899,0.00126447;MLEAF=0.0353706;MLEGF=0.929259,0.0707412,5.94815e-11;HWE_LLR=-0.0443401;HWE_LPVAL=-0.266754;HWE_DF=1;FIC=-0.0109029;AB=0.562243 GT:PL:DP:AD:GQ 0/0:0,12,155:6:4,0,2:12
 22 36995311 . GA G . TPASS AC=61;AN=124;AF=0.491935;GC=21,21,20;GN=62;GF=0.33871,0.33871,0.322581;NS=62;HWEAF=0.492227;HWEGF=0.257834,0.499879,0.242287;MLEAF=0.492028;MLEGF=0.298019,0.419905,0.282076;HWE_LLR=-0.605122;HWE_LPVAL=-1.30459;HWE_DF=1;FIC=0.0444598;AB=0.53981 GT:PL:DP:AD:GQ 0/1:55,0,24:3:1,2,0:24
 22 36995329 . GA G . TPASS AC=2;AN=124;AF=0.016129;GC=60,2,0;GN=62;GF=0.967742,0.0322581,0;NS=62;HWEAF=0.0164696;HWEGF=0.967332,0.0323966,0.000271246;MLEAF=0.0165148;MLEGF=0.96697,0.0330296,7.28675e-44;HWE_LLR=-0.0171385;HWE_LPVAL=-0.158856;HWE_DF=1;FIC=-0.00275028;AB=0.339746 GT:PL:DP:AD:GQ 0/0:0,9,97:5:3,0,2:10
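Several INFO fields in these records can be derived from one another. The Python sketch below recomputes them for a biallelic, diploid record and reports any key that disagrees. It checks the following relationships:
* AC, AN and AF from the genotype counts GC.
* GF as GC/GN.
* HWEGF as p², 2pq, q² evaluated at q = HWEAF.
* HWE_LPVAL as the natural log of the upper tail of a chi-square distribution with one degree of freedom (HWE_DF=1) at -2·HWE_LLR.

These relationships are inferred from the header descriptions and from the example values. They are not taken from vt's source. `check_record` is a hypothetical helper.

```python
import math


def parse_info(info):
    """Split a VCF INFO string into {key: value}; flag keys map to True."""
    out = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        out[key] = value if sep else True
    return out


def as_floats(value):
    return [float(x) for x in value.split(",")]


def close(a, b, tol):
    return abs(a - b) <= tol


def check_record(line, tol=1e-4):
    """Return INFO keys whose values disagree with a recomputation.

    Assumes one ALT allele, diploid genotypes and GC ordered as
    hom-ref, het, hom-alt (the VCF order for Number=G fields).
    """
    info = parse_info(line.split()[7])  # column 8 of a VCF record is INFO
    hom_ref, het, hom_alt = (int(x) for x in info["GC"].split(","))
    gn, ac, an = int(info["GN"]), int(info["AC"]), int(info["AN"])
    bad = []

    # Genotype counts determine the allele counts and frequencies.
    if gn != hom_ref + het + hom_alt:
        bad.append("GN")
    if ac != het + 2 * hom_alt:
        bad.append("AC")
    if an != 2 * gn:
        bad.append("AN")
    if not close(float(info["AF"]), ac / an, tol):
        bad.append("AF")
    counts = (hom_ref, het, hom_alt)
    if not all(close(f, c / gn, tol) for f, c in zip(as_floats(info["GF"]), counts)):
        bad.append("GF")

    # Under HWE the genotype frequencies are p^2, 2pq, q^2 with q = HWEAF.
    q = float(info["HWEAF"])
    p = 1.0 - q
    expected = (p * p, 2 * p * q, q * q)
    if not all(close(f, e, tol) for f, e in zip(as_floats(info["HWEGF"]), expected)):
        bad.append("HWEGF")

    # LRT with 1 degree of freedom: statistic x = -2 * ln(LR), and
    # P(chi2_1 > x) = erfc(sqrt(x / 2)) = erfc(sqrt(-ln(LR))).
    llr = float(info["HWE_LLR"])
    tail = math.erfc(math.sqrt(max(0.0, -llr)))
    # Skip the comparison if the tail probability underflows to 0.0.
    if tail > 0.0 and not close(math.log(tail), float(info["HWE_LPVAL"]), 1e-3):
        bad.append("HWE_LPVAL")
    return bad
```

Running `check_record` on each of the four example records above should return an empty list. Changing a single value, such as AF, should make that key appear in the result.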