Line 39: |
Line 39: |
| ##INFO=<ID=FIC,Number=1,Type=Float,Description="Genotype likelihood based Inbreeding Coefficient"> | | ##INFO=<ID=FIC,Number=1,Type=Float,Description="Genotype likelihood based Inbreeding Coefficient"> |
| ##INFO=<ID=AB,Number=1,Type=Float,Description="Genotype likelihood based Allele Balance"> | | ##INFO=<ID=AB,Number=1,Type=Float,Description="Genotype likelihood based Allele Balance"> |
− | ##FILTER=<ID=TPASS,Description="Temporary pass"> | + | ##FILTER=<ID=PASS,Description="Temporary pass"> |
| ##FILTER=<ID=overlap,Description="Overlapping variant"> | | ##FILTER=<ID=overlap,Description="Overlapping variant"> |
| | | |
Line 124: |
Line 124: |
| no. of observed variants : 720 | | no. of observed variants : 720 |
| | | |
− | The variants have filter labels TPASS meaning a temporary pass and overlap, meaning that the variants are overlapping with another variant, implying multiallelicity. | + | The variants have filter labels PASS meaning a temporary pass and overlap, meaning that the variants are overlapping with another variant, implying multiallelicity. |
| We can count the number of variants with the following commands. | | We can count the number of variants with the following commands. |
| | | |
− | vt peek all.genotypes.bcf -f "FILTER.TPASS" | + | vt peek all.genotypes.bcf -f "FILTER.PASS" |
| | | |
| stats: no. of samples : 62 | | stats: no. of samples : 62 |
Line 140: |
Line 140: |
| no. of chromosomes : 1 <br> | | no. of chromosomes : 1 <br> |
| no. Indels : 136 | | no. Indels : 136 |
− | 2 alleles (ins/del) : 136 (1.89) [89/47] #notice the difference insertion deletion ratios differences | + | 2 alleles (ins/del) : 136 (1.89) [89/47] #notice the difference in insertion deletion ratios |
| >=3 alleles (ins/del) : 0 (-nan) [0/0] | | >=3 alleles (ins/del) : 0 (-nan) [0/0] |
| | | |
| #passed singletons only | | #passed singletons only |
− | vt peek all.genotypes.bcf -f "FILTER.TPASS&&INFO.AC==1" | + | vt peek all.genotypes.bcf -f "FILTER.PASS&&INFO.AC==1" |
| | | |
| #passed indels of length 1 only | | #passed indels of length 1 only |
− | vt peek all.genotypes.bcf -f "FILTER.TPASS&&LEN==1" | + | vt peek all.genotypes.bcf -f "FILTER.PASS&&LEN==1" |
| | | |
| #passed indels of length >1 | | #passed indels of length >1 |
− | vt peek all.genotypes.bcf -f "FILTER.TPASS&&LEN>1" | + | vt peek all.genotypes.bcf -f "FILTER.PASS&&LEN>1" |
| | | |
| #passed indels of length 4 or insertions of length 3 | | #passed indels of length 4 or insertions of length 3 |
− | vt peek all.genotypes.bcf -f "FILTER.TPASS&&(LEN==4||DLEN==3)" | + | vt peek all.genotypes.bcf -f "FILTER.PASS&&(LEN==4||DLEN==3)" |
| | | |
| == Comparison with other data sets == | | == Comparison with other data sets == |
Line 159: |
Line 159: |
| It is usually useful to examine the call sets against known data sets for the passed variants. | | It is usually useful to examine the call sets against known data sets for the passed variants. |
| | | |
− | vt profile_indels -g /net/fantasia/home/atks/ref/vt/grch37/indel.reference.txt -r /net/fantasia/home/atks/ref/vt/grch37/hs37d5.fa run/final/all.genotypes.bcf -i 22:36000000-37000000 -f "PASS" | + | vt profile_indels -g indel.reference.txt -r hs37d5.fa all.genotypes.bcf -i 22:36000000-37000000 -f "PASS" |
| | | |
| data set | | data set |
Line 198: |
Line 198: |
| We perform the same analysis for the failed variants; the relatively low overlap with known data sets implies a reasonable tradeoff between sensitivity and specificity. | | We perform the same analysis for the failed variants; the relatively low overlap with known data sets implies a reasonable tradeoff between sensitivity and specificity. |
| | | |
− | vt profile_indels -g /net/fantasia/home/atks/ref/vt/grch37/indel.reference.txt -r /net/fantasia/home/atks/ref/vt/grch37/hs37d5.fa run/final/all.genotypes.bcf -i 22:36000000-37000000 -f "~PASS" | + | vt profile_indels -g indel.reference.txt -r hs37d5.fa all.genotypes.bcf -i 22:36000000-37000000 -f "~PASS" |
| | | |
| data set | | data set |
Line 321: |
Line 321: |
| To normalize and remove duplicate variants: | | To normalize and remove duplicate variants: |
| | | |
− | vt normalize mills.genotypes.bcf -r ~/ref/vt/grch37/hs37d5.fa | vt mergedups - -o mills.normalized.genotypes.bcf | + | vt normalize mills.genotypes.bcf -r hs37d5.fa | vt mergedups - -o mills.normalized.genotypes.bcf |
| | | |
| and you will observe that 3994 variants had to be left-aligned and 1092 variants were removed. | | and you will observe that 3994 variants had to be left-aligned and 1092 variants were removed. |
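
As a sanity check on the peek output shown earlier, the figure in parentheses on the "2 alleles (ins/del)" line appears to be the insertion count divided by the deletion count from the bracketed pair, so [89/47] gives 1.89. The sketch below reproduces that arithmetic. The helper name ins_del_ratio is hypothetical and not part of vt, and printing "nan" when there are no deletions is an assumption made here to mirror the "-nan" that vt reports for [0/0].

```shell
#!/bin/sh
# Hypothetical helper, not part of vt: insertion/deletion ratio to 2 decimals.
# Arguments: $1 = number of insertions, $2 = number of deletions.
ins_del_ratio() {
    awk -v ins="$1" -v del="$2" 'BEGIN {
        # Assumption: report "nan" when the ratio is undefined (no deletions).
        if (del == 0) { print "nan"; exit }
        printf "%.2f\n", ins / del
    }'
}

ins_del_ratio 89 47   # counts taken from the [89/47] pair in the peek output
ins_del_ratio 0 0     # the >=3 alleles case, where the ratio is undefined
```

Comparing this ratio between the passed and failed subsets is one quick way to see the difference in insertion/deletion ratios that the peek output comment points out.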