Line 1: |
Line 1: |
− | =Motivation= | + | =Introduction= |
| | | |
− | This wiki page details some standard Indel analyses which hopefully can help the group in understanding the issues and perform the analyses quickly without reinventing the wheel. | + | This wiki page details some standard Indel analyses for the sequencing workshop in the example indel data set. |
| | | |
| + | =Viewing the BCF file= |
| | | |
| + | The file generated from the indel calling is a binary version [[http://www.1000genomes.org/wiki/analysis/variant-call-format/bcf-binary-vcf-version-2 BCFv2.1]] of the Variant Call Format (VCF). BCFv2.1 is more efficient to process as the data is already stored in computer readable format on the hard disk. It is however not necessarily more compact than VCF4.2 especially when the format fields are rich in details. |
| | | |
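As a rough illustration of what "binary" means here, the sketch below reads the first five bytes of a file and reports the BCF version they encode. It assumes the file starts with the magic string `BCF` followed by a major and a minor version byte, and that it is either BGZF/gzip-compressed or uncompressed. The function name `bcf_version` is hypothetical and is not part of vt.

```python
import gzip


def bcf_version(path):
    """Return (major, minor) if the file starts with the BCF magic, else None.

    Assumption: a BCF file begins with b"BCF" plus two version bytes.
    BGZF is a series of gzip members, so the gzip module can read it.
    """
    # Peek at the first two bytes to detect gzip/BGZF compression.
    with open(path, "rb") as raw:
        compressed = raw.read(2) == b"\x1f\x8b"

    opener = gzip.open if compressed else open
    with opener(path, "rb") as handle:
        magic = handle.read(5)

    if len(magic) == 5 and magic[:3] == b"BCF":
        return magic[3], magic[4]
    return None
```

Calling `bcf_version("all.genotypes.bcf")` on the workshop file should return a version tuple if the assumptions above hold; a plain-text VCF returns `None`.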
− | This wiki page details some standard Indel analyses for the sequencing workshop in the example indel data set.
| + | ==Header== |
− | | |
− | == Anatomy of all.genotypes.bcf == | |
| | | |
| You can access the header by running the command: | | You can access the header by running the command: |
Line 13: |
Line 13: |
| vt view -H all.genotypes.bcf | | vt view -H all.genotypes.bcf |
| | | |
− |
| |
− | The file generated from the indel calling is a binary version [[http://www.1000genomes.org/wiki/analysis/variant-call-format/bcf-binary-vcf-version-2 BCFv2.1]] of the Variant Call Format (VCF). BCFv2.1 is more efficient to process as the data is already stored in computer readable format on the hard disk. It is however not necessarily more compact than VCF4.2 especially when the format fields are rich in details.
| |
| The header is as follows: | | The header is as follows: |
| | | |
Line 44: |
Line 42: |
| ##FILTER=<ID=overlap,Description="Overlapping variant"> | | ##FILTER=<ID=overlap,Description="Overlapping variant"> |
| | | |
− | === body === | + | ==Records== |
| + | |
| + | To view the records: |
| | | |
− | To view some of the records:
| + | vt view all.genotypes.bcf |
| | | |
| 22 36990877 . GGT G . TPASS AC=32;AN=116;AF=0.275862;GC=32,20,6;GN=58;GF=0.551724,0.344828,0.103448;NS=58; | | 22 36990877 . GGT G . TPASS AC=32;AN=116;AF=0.275862;GC=32,20,6;GN=58;GF=0.551724,0.344828,0.103448;NS=58; |
Line 52: |
Line 52: |
| MLEGF=0.494275,0.464129,0.0415952;HWE_LLR=-0.453098;HWE_LPVAL=-1.0755;HWE_DF=1;FIC=-0.0718807;AB=0.6129 | | MLEGF=0.494275,0.464129,0.0415952;HWE_LLR=-0.453098;HWE_LPVAL=-1.0755;HWE_DF=1;FIC=-0.0718807;AB=0.6129 |
| GT:PL:DP:AD:GQ 0/0:0,9,108:9:3,0,6:10 | | GT:PL:DP:AD:GQ 0/0:0,9,108:9:3,0,6:10 |
− | 22 36991203 . TGAG T . TPASS AC=5;AN=124;AF=0.0403226;GC=58,3,1;GN=62;GF=0.935484,0.0483871,0.016129;NS=62;HWEAF=0.0355594;HWEGF=0.930145,0.0685899,0.00126447;MLEAF=0.0353706;MLEGF=0.929259,0.0707412,5.94815e-11;HWE_LLR=-0.0443401;HWE_LPVAL=-0.266754;HWE_DF=1;FIC=-0.0109029;AB=0.562243 GT:PL:DP:AD:GQ 0/0:0,12,155:6:4,0,2:12 | + | 22 36991203 . TGAG T . TPASS AC=5;AN=124;AF=0.0403226;GC=58,3,1;GN=62;GF=0.935484,0.0483871,0.016129;NS=62;HWEAF=0.0355594;HWEGF=0.930145,0.0685899,0.00126447;MLEAF=0.0353706;MLEGF=0.929259,0.0707412,5.94815e-11;HWE_LLR=-0.0443401;HWE_LPVAL=-0.266754;HWE_DF=1;FIC=-0.0109029;AB=0.562243 |
− | 22 36995311 . GA G . TPASS AC=61;AN=124;AF=0.491935;GC=21,21,20;GN=62;GF=0.33871,0.33871,0.322581;NS=62;HWEAF=0.492227;HWEGF=0.257834,0.499879,0.242287;MLEAF=0.492028;MLEGF=0.298019,0.419905,0.282076;HWE_LLR=-0.605122;HWE_LPVAL=-1.30459;HWE_DF=1;FIC=0.0444598;AB=0.53981 GT:PL:DP:AD:GQ 0/1:55,0,24:3:1,2,0:24 | + | GT:PL:DP:AD:GQ 0/0:0,12,155:6:4,0,2:12 |
− | 22 36995329 . GA G . TPASS AC=2;AN=124;AF=0.016129;GC=60,2,0;GN=62;GF=0.967742,0.0322581,0;NS=62;HWEAF=0.0164696;HWEGF=0.967332,0.0323966,0.000271246;MLEAF=0.0165148;MLEGF=0.96697,0.0330296,7.28675e-44;HWE_LLR=-0.0171385;HWE_LPVAL=-0.158856;HWE_DF=1;FIC=-0.00275028;AB=0.339746 GT:PL:DP:AD:GQ 0/0:0,9,97:5:3,0,2:10
| + | |
| | | |
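The INFO column of each record is a semicolon-separated list of `KEY=value` pairs. The sketch below parses the first few fields of the first record shown above and checks how they relate to one another. The header definitions are not reproduced in this excerpt, so the interpretation is an assumption: AC and AN are read as the alternate allele count and total allele number (the usual VCF convention), and GC, GN and GF as genotype counts, number of genotypes and genotype frequencies (inferred from the values). The helper name `parse_info` is hypothetical.

```python
def parse_info(info):
    """Split a VCF INFO string into a dict of key -> string value.

    Keys without '=' (flags) are stored as True.
    """
    fields = {}
    for item in info.strip().split(";"):
        if not item:
            continue  # skip the empty piece left by a trailing ';'
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


# Start of the INFO column of the record at 22:36990877 shown above.
info = "AC=32;AN=116;AF=0.275862;GC=32,20,6;GN=58;GF=0.551724,0.344828,0.103448;NS=58;"
fields = parse_info(info)

# Assumed relationship: AF = AC / AN.
ac, an = int(fields["AC"]), int(fields["AN"])
af = ac / an

# Assumed relationship: GF[i] = GC[i] / GN.
gc = [int(x) for x in fields["GC"].split(",")]
gn = int(fields["GN"])
gf = [count / gn for count in gc]

print(f"AF computed = {af:.6f}, AF reported = {fields['AF']}")
print("GF computed =", ", ".join(f"{x:.6f}" for x in gf))
```

Under these assumptions the computed values agree with the reported AF and GF to six decimal places, and the genotype counts sum to GN.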
| =Tools= | | =Tools= |