Changes

3,294 bytes removed, 18:12, 15 June 2014
Line 1: Line 1:  
=Motivation=
 
   −
This wiki page details some standard Indel analyses for the sequencing workshop in the example indel data set.
+
This wiki page details some standard indel analyses that will hopefully help the group understand the issues and perform the analyses quickly without reinventing the wheel.
    
=Tools=
 
Line 9: Line 9:  
=Analyses=
 
   −
The file generated from the indel calling is a binary version [[http://www.1000genomes.org/wiki/analysis/variant-call-format/bcf-binary-vcf-version-2 BCFv2.1]] of the Variant Call Format (VCF).  The binary version
+
The file generated from the indel calling is a binary version [[http://www.1000genomes.org/wiki/analysis/variant-call-format/bcf-binary-vcf-version-2 BCFv2.1]] of the Variant Call Format (VCF). BCFv2.1 is more efficient to process because the data is already stored on disk in a machine-readable binary format. It is, however, not necessarily more compact than VCFv4.2, especially when the FORMAT fields are rich in detail.
== Anatomy of file ==
  −
 
  −
You can access the header by running the command:
  −
 
  −
  vt view -H all.genotypes.bcf
  −
 
  −
The header is as follows:
  −
 
  −
  ##fileformat=VCFv4.2
  −
  ##FILTER=<ID=PASS,Description="All filters passed">
  −
  ##contig=<ID=22,length=51304566>
  −
  ##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
  −
  ##FORMAT=<ID=PL,Number=G,Type=Integer,Description="Normalized, Phred-scaled likelihoods for genotypes">
  −
  ##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Depth">
  −
  ##FORMAT=<ID=AD,Number=3,Type=Integer,Description="Allele Depth">
  −
  ##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
  −
  ##INFO=<ID=AC,Number=A,Type=Integer,Description="Alternate Allele Counts">
  −
  ##INFO=<ID=AN,Number=1,Type=Integer,Description="Total Number Allele Counts">
  −
  ##INFO=<ID=NS,Number=1,Type=Integer,Description="Number of Samples With Data">
  −
  ##INFO=<ID=AF,Number=A,Type=Float,Description="Alternate Allele Frequency">
  −
  ##INFO=<ID=GC,Number=G,Type=Integer,Description="Genotype Counts">
  −
  ##INFO=<ID=GN,Number=1,Type=Integer,Description="Total Number of Genotypes Counts">
  −
  ##INFO=<ID=GF,Number=G,Type=Float,Description="Genotype Frequency">
  −
  ##INFO=<ID=HWEAF,Number=A,Type=Float,Description="Genotype likelihood based MLE Allele Frequency assuming HWE">
  −
  ##INFO=<ID=HWEGF,Number=G,Type=Float,Description="Genotype likelihood based MLE Genotype Frequency assuming HWE">
  −
  ##INFO=<ID=MLEAF,Number=A,Type=Float,Description="Genotype likelihood based MLE Allele Frequency">
  −
  ##INFO=<ID=MLEGF,Number=G,Type=Float,Description="Genotype likelihood based MLE Genotype Frequency">
  −
  ##INFO=<ID=HWE_LLR,Number=1,Type=Float,Description="Genotype likelihood based Hardy Weinberg ln(Likelihood Ratio)">
  −
  ##INFO=<ID=HWE_LPVAL,Number=1,Type=Float,Description="Genotype likelihood based Hardy Weinberg Likelihood Ratio Test Statistic ln(p-value)">
  −
  ##INFO=<ID=HWE_DF,Number=1,Type=Integer,Description="Degrees of freedom for Genotype likelihood based Hardy Weinberg Likelihood Ratio Test Statistic">
  −
  ##INFO=<ID=FIC,Number=1,Type=Float,Description="Genotype likelihood based Inbreeding Coefficient">
  −
  ##INFO=<ID=AB,Number=1,Type=Float,Description="Genotype likelihood based Allele Balance">
  −
  ##FILTER=<ID=TPASS,Description="Temporary pass">
  −
  ##FILTER=<ID=overlap,Description="Overlapping variant">
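The structured header lines above (<code>##INFO</code>, <code>##FORMAT</code>, <code>##FILTER</code>, <code>##contig</code>) all follow the same <code>##KIND=<key=value,...></code> pattern. As a minimal sketch of how such lines could be read programmatically, the Python snippet below splits one line into its kind and its key/value pairs. The function name <code>parse_meta_line</code> is hypothetical and not part of vt; the snippet only assumes that commas inside double-quoted descriptions must not be treated as separators.

```python
import re

def parse_meta_line(line):
    """Split a structured VCF header line into (kind, fields).

    Hypothetical helper for illustration; not part of vt.
    Returns None for unstructured lines such as ##fileformat=VCFv4.2.
    """
    match = re.match(r'##(\w+)=<(.*)>$', line.strip())
    if match is None:
        return None
    kind, body = match.groups()
    fields = {}
    # A value is either a double-quoted string (which may contain commas)
    # or a run of characters up to the next comma.
    for key, value in re.findall(r'(\w+)=("[^"]*"|[^,]*)', body):
        fields[key] = value.strip('"')
    return kind, fields

if __name__ == "__main__":
    # Lines copied from the header shown above.
    header = [
        '##fileformat=VCFv4.2',
        '##contig=<ID=22,length=51304566>',
        '##FORMAT=<ID=PL,Number=G,Type=Integer,Description="Normalized, Phred-scaled likelihoods for genotypes">',
        '##INFO=<ID=AC,Number=A,Type=Integer,Description="Alternate Allele Counts">',
    ]
    for line in header:
        print(parse_meta_line(line))
```

The <code>Number</code> field is the part most relevant to the analyses: <code>A</code> means one value per alternate allele and <code>G</code> means one value per possible genotype.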
      
==File Preparation==
 
Line 51: Line 17:  
To convert to BCF format which will work fast with vt:
 
   −
22 36990877 . GGT G . TPASS AC=32;AN=116;AF=0.275862;GC=32,20,6;GN=58;GF=0.551724,0.344828,0.103448;NS=58;HWEAF=0.275797;HWEGF=0.52447,0.399466,0.0760642;MLEAF=0.27366;MLEGF=0.494275,0.464129,0.0415952;HWE_LLR=-0.453098;HWE_LPVAL=-1.0755;HWE_DF=1;FIC=-0.0718807;AB=0.6129 GT:PL:DP:AD:GQ 0/0:0,9,108:9:3,0,6:10
+
 
22 36991203 . TGAG T . TPASS AC=5;AN=124;AF=0.0403226;GC=58,3,1;GN=62;GF=0.935484,0.0483871,0.016129;NS=62;HWEAF=0.0355594;HWEGF=0.930145,0.0685899,0.00126447;MLEAF=0.0353706;MLEGF=0.929259,0.0707412,5.94815e-11;HWE_LLR=-0.0443401;HWE_LPVAL=-0.266754;HWE_DF=1;FIC=-0.0109029;AB=0.562243 GT:PL:DP:AD:GQ 0/0:0,12,155:6:4,0,2:12
  −
22 36995311 . GA G . TPASS AC=61;AN=124;AF=0.491935;GC=21,21,20;GN=62;GF=0.33871,0.33871,0.322581;NS=62;HWEAF=0.492227;HWEGF=0.257834,0.499879,0.242287;MLEAF=0.492028;MLEGF=0.298019,0.419905,0.282076;HWE_LLR=-0.605122;HWE_LPVAL=-1.30459;HWE_DF=1;FIC=0.0444598;AB=0.53981 GT:PL:DP:AD:GQ 0/1:55,0,24:3:1,2,0:24
  −
22 36995329 . GA G . TPASS AC=2;AN=124;AF=0.016129;GC=60,2,0;GN=62;GF=0.967742,0.0322581,0;NS=62;HWEAF=0.0164696;HWEGF=0.967332,0.0323966,0.000271246;MLEAF=0.0165148;MLEGF=0.96697,0.0330296,7.28675e-44;HWE_LLR=-0.0171385;HWE_LPVAL=-0.158856;HWE_DF=1;FIC=-0.00275028;AB=0.339746 GT:PL:DP:AD:GQ 0/0:0,9,97:5:3,0,2:10
      
 
   vt view mills.vcf -o mills.bcf
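The example records in this diff carry several INFO fields that are arithmetically related: for a biallelic site, <code>AF</code> should equal <code>AC/AN</code>, the genotype counts in <code>GC</code> should sum to <code>GN</code>, and <code>AC</code> should equal the heterozygote count plus twice the homozygous-alternate count. The Python sketch below checks these relations on the INFO string of the first record (position 36990877). The helper names are hypothetical, and the code assumes that <code>GC</code> is ordered hom-ref, het, hom-alt, which is the usual ordering for <code>Number=G</code> fields.

```python
def parse_info(info):
    """Turn a VCF INFO string such as 'AC=32;AN=116' into a dict of strings."""
    fields = {}
    for item in info.split(';'):
        key, _, value = item.partition('=')
        fields[key] = value
    return fields

def check_counts(info, tol=1e-6):
    """Check count/frequency consistency for a biallelic record.

    Assumes GC is ordered hom-ref, het, hom-alt (an assumption, see text).
    """
    ac = int(info['AC'])
    an = int(info['AN'])
    af = float(info['AF'])
    gc = [int(x) for x in info['GC'].split(',')]
    gn = int(info['GN'])
    return {
        'af_matches': abs(af - ac / an) < tol,      # AF == AC / AN
        'gn_matches': sum(gc) == gn,                # genotype counts sum to GN
        'ac_matches': gc[1] + 2 * gc[2] == ac,      # het + 2 * hom-alt == AC
    }

if __name__ == "__main__":
    # Subset of the INFO column of the record at 22:36990877 shown above.
    info = parse_info("AC=32;AN=116;AF=0.275862;GC=32,20,6;GN=58;NS=58")
    print(check_counts(info))
```

A check like this is a cheap sanity test after any step that subsets samples, since all of these fields need to be recomputed together.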