Line 1: |
Line 1: |
| =Motivation= | | =Motivation= |
| | | |
− | This wiki page details some standard Indel analyses which hopefully can help the group in understanding the issues and perform the analyses quickly without reinventing the wheel. | + | This wiki page details some standard indel analyses for the sequencing workshop, using the example indel data set. |
| | | |
| =Tools= | | =Tools= |
| | | |
− | You can download [[vt|vt]] and have some working knowledge of PERL to do stuff that vt does not support.
| + | This walkthrough requires [[vt|vt]]. |
| | | |
| =Analyses= | | =Analyses= |
| + | |
| + | The file generated from the indel calling is a binary version [[http://www.1000genomes.org/wiki/analysis/variant-call-format/bcf-binary-vcf-version-2 BCFv2.1]] of the Variant Call Format (VCF). The binary version |
| + | == Anatomy of file == |
| + | |
| + | You can access the header by running the command: |
| + | |
| + | vt view -H all.genotypes.bcf |
| + | |
| + | The header is as follows: |
| + | |
| + | ##fileformat=VCFv4.2 |
| + | ##FILTER=<ID=PASS,Description="All filters passed"> |
| + | ##contig=<ID=22,length=51304566> |
| + | ##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype"> |
| + | ##FORMAT=<ID=PL,Number=G,Type=Integer,Description="Normalized, Phred-scaled likelihoods for genotypes"> |
| + | ##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Depth"> |
| + | ##FORMAT=<ID=AD,Number=3,Type=Integer,Description="Allele Depth"> |
| + | ##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality"> |
| + | ##INFO=<ID=AC,Number=A,Type=Integer,Description="Alternate Allele Counts"> |
| + | ##INFO=<ID=AN,Number=1,Type=Integer,Description="Total Number Allele Counts"> |
| + | ##INFO=<ID=NS,Number=1,Type=Integer,Description="Number of Samples With Data"> |
| + | ##INFO=<ID=AF,Number=A,Type=Float,Description="Alternate Allele Frequency"> |
| + | ##INFO=<ID=GC,Number=G,Type=Integer,Description="Genotype Counts"> |
| + | ##INFO=<ID=GN,Number=1,Type=Integer,Description="Total Number of Genotypes Counts"> |
| + | ##INFO=<ID=GF,Number=G,Type=Float,Description="Genotype Frequency"> |
| + | ##INFO=<ID=HWEAF,Number=A,Type=Float,Description="Genotype likelihood based MLE Allele Frequency assuming HWE"> |
| + | ##INFO=<ID=HWEGF,Number=G,Type=Float,Description="Genotype likelihood based MLE Genotype Frequency assuming HWE"> |
| + | ##INFO=<ID=MLEAF,Number=A,Type=Float,Description="Genotype likelihood based MLE Allele Frequency"> |
| + | ##INFO=<ID=MLEGF,Number=G,Type=Float,Description="Genotype likelihood based MLE Genotype Frequency"> |
| + | ##INFO=<ID=HWE_LLR,Number=1,Type=Float,Description="Genotype likelihood based Hardy Weinberg ln(Likelihood Ratio)"> |
| + | ##INFO=<ID=HWE_LPVAL,Number=1,Type=Float,Description="Genotype likelihood based Hardy Weinberg Likelihood Ratio Test Statistic ln(p-value)"> |
| + | ##INFO=<ID=HWE_DF,Number=1,Type=Integer,Description="Degrees of freedom for Genotype likelihood based Hardy Weinberg Likelihood Ratio Test Statistic"> |
| + | ##INFO=<ID=FIC,Number=1,Type=Float,Description="Genotype likelihood based Inbreeding Coefficient"> |
| + | ##INFO=<ID=AB,Number=1,Type=Float,Description="Genotype likelihood based Allele Balance"> |
| + | ##FILTER=<ID=TPASS,Description="Temporary pass"> |
| + | ##FILTER=<ID=overlap,Description="Overlapping variant"> |
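The structured header lines listed above all follow the pattern `##KIND=<key=value,...>`, so they can be split into a kind and a dictionary of attributes. The following is only a minimal sketch in Python, not part of vt: the helper name `parse_meta_line` is hypothetical, the three input lines are copied from the header shown above, and the regular expression assumes that quoted descriptions contain no embedded double quotes.

```python
import re

# Three header lines copied from the listing above.
header_lines = [
    '##FORMAT=<ID=PL,Number=G,Type=Integer,'
    'Description="Normalized, Phred-scaled likelihoods for genotypes">',
    '##INFO=<ID=AC,Number=A,Type=Integer,Description="Alternate Allele Counts">',
    '##FILTER=<ID=TPASS,Description="Temporary pass">',
]

# A key followed by either a double-quoted value or a bare token up to the next comma.
PAIR = re.compile(r'(\w+)=("[^"]*"|[^,]*)')


def parse_meta_line(line):
    """Hypothetical helper: split '##KIND=<k=v,...>' into (kind, attribute dict)."""
    kind, body = line[2:].split("=<", 1)   # drop the leading '##'
    body = body.rstrip().rstrip(">")       # drop the closing '>'
    fields = {key: value.strip('"') for key, value in PAIR.findall(body)}
    return kind, fields


for line in header_lines:
    kind, fields = parse_meta_line(line)
    print(kind, fields["ID"], fields.get("Number", "-"), fields["Description"])
```

Matching the quoted alternative first keeps the comma inside the PL description from being treated as a field separator.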
| | | |
| ==File Preparation== | | ==File Preparation== |
Line 14: |
Line 50: |
| | | |
| To convert to BCF format, which works fast with vt: | | To convert to BCF format, which works fast with vt: |
| + | |
| + | 22 36990877 . GGT G . TPASS AC=32;AN=116;AF=0.275862;GC=32,20,6;GN=58;GF=0.551724,0.344828,0.103448;NS=58;HWEAF=0.275797;HWEGF=0.52447,0.399466,0.0760642;MLEAF=0.27366;MLEGF=0.494275,0.464129,0.0415952;HWE_LLR=-0.453098;HWE_LPVAL=-1.0755;HWE_DF=1;FIC=-0.0718807;AB=0.6129 GT:PL:DP:AD:GQ 0/0:0,9,108:9:3,0,6:10 |
| + | 22 36991203 . TGAG T . TPASS AC=5;AN=124;AF=0.0403226;GC=58,3,1;GN=62;GF=0.935484,0.0483871,0.016129;NS=62;HWEAF=0.0355594;HWEGF=0.930145,0.0685899,0.00126447;MLEAF=0.0353706;MLEGF=0.929259,0.0707412,5.94815e-11;HWE_LLR=-0.0443401;HWE_LPVAL=-0.266754;HWE_DF=1;FIC=-0.0109029;AB=0.562243 GT:PL:DP:AD:GQ 0/0:0,12,155:6:4,0,2:12 |
| + | 22 36995311 . GA G . TPASS AC=61;AN=124;AF=0.491935;GC=21,21,20;GN=62;GF=0.33871,0.33871,0.322581;NS=62;HWEAF=0.492227;HWEGF=0.257834,0.499879,0.242287;MLEAF=0.492028;MLEGF=0.298019,0.419905,0.282076;HWE_LLR=-0.605122;HWE_LPVAL=-1.30459;HWE_DF=1;FIC=0.0444598;AB=0.53981 GT:PL:DP:AD:GQ 0/1:55,0,24:3:1,2,0:24 |
| + | 22 36995329 . GA G . TPASS AC=2;AN=124;AF=0.016129;GC=60,2,0;GN=62;GF=0.967742,0.0322581,0;NS=62;HWEAF=0.0164696;HWEGF=0.967332,0.0323966,0.000271246;MLEAF=0.0165148;MLEGF=0.96697,0.0330296,7.28675e-44;HWE_LLR=-0.0171385;HWE_LPVAL=-0.158856;HWE_DF=1;FIC=-0.00275028;AB=0.339746 GT:PL:DP:AD:GQ 0/0:0,9,97:5:3,0,2:10 |
| | | |
| vt view mills.vcf -o mills.bcf | | vt view mills.vcf -o mills.bcf |
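The relationship between the INFO fields in the sample records shown earlier can be checked by hand. The sketch below is an illustration in Python rather than anything vt provides: it takes the first record (position 36990877) with its INFO column trimmed to the first seven keys for brevity, splits on whitespace instead of tabs, and uses a hypothetical helper `parse_info`. It confirms that AF equals AC/AN and that each GF value equals the matching GC value divided by GN.

```python
# First sample record from the listing, INFO trimmed to its first seven keys.
record = (
    "22 36990877 . GGT G . TPASS "
    "AC=32;AN=116;AF=0.275862;GC=32,20,6;GN=58;"
    "GF=0.551724,0.344828,0.103448;NS=58 "
    "GT:PL:DP:AD:GQ 0/0:0,9,108:9:3,0,6:10"
)


def parse_info(info):
    """Hypothetical helper: turn 'K1=V1;K2=V2' into a dict of strings."""
    pairs = (item.partition("=") for item in info.split(";"))
    return {key: value for key, _, value in pairs}


chrom, pos, vid, ref, alt, qual, flt, info, fmt, sample = record.split()
info_d = parse_info(info)

ac, an, af = int(info_d["AC"]), int(info_d["AN"]), float(info_d["AF"])
gc = [int(x) for x in info_d["GC"].split(",")]
gn = int(info_d["GN"])
gf = [float(x) for x in info_d["GF"].split(",")]

# Allele frequency is the alternate allele count over the total allele count.
assert abs(ac / an - af) < 1e-6
# Genotype frequencies are genotype counts over the number of genotypes.
assert all(abs(c / gn - f) < 1e-6 for c, f in zip(gc, gf))
# The alternate allele count is hets plus twice the homozygous alternates.
assert ac == gc[1] + 2 * gc[2]

# Pair the FORMAT keys with the sample's values.
genotype = dict(zip(fmt.split(":"), sample.split(":")))
deleted_bases = len(ref) - len(alt)
print(chrom, pos, f"deletion of {deleted_bases} bp", genotype["GT"], genotype["PL"])
```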