Line 499: |
Line 499: |
| - a loss of about 11% of the variants thought to be distinct. | | - a loss of about 11% of the variants thought to be distinct. |
| | | |
− | To normalize and remove duplicate variants:
| + | For chromosome 22, there are 121 variants, but only 106 distinct indels after normalization |
| + | - a loss of about 12% of the variants thought to be distinct. |
| | | |
− | ${GC}/bin/vt normalize ${VTREF}/mills_indels_hg19.sites.bcf -r ${VTREF}/hs37d5.fa | ${GC}/bin/vt mergedups - -o ${OUT}/mills.normalized.genotypes.bcf
| + | To normalize and remove duplicate variants for chromosome 22: |
| | | |
− | and you will observe that 3994 variants had to be left aligned and 1092 variants were removed.
| + | ${GC}/bin/vt normalize ${SS}/ref22/mills_indels_hg19.22.sites.bcf -r ${SS}/ref22/human.g1k.v37.chr22.fa | ${GC}/bin/vt mergedups - -o ${OUT}/mills.22.normalized.genotypes.bcf |
| | | |
− | stats: biallelic
| + | and you will observe that 43 variants had to be left aligned and 15 variants were removed. |
| + | |
| + | <pre> |
| + | stats: biallelic |
| no. left trimmed : 0 | | no. left trimmed : 0 |
| no. left trimmed and left aligned : 0 | | no. left trimmed and left aligned : 0 |
| no. left trimmed and right trimmed : 0 | | no. left trimmed and right trimmed : 0 |
− | no. left aligned : 3994 | + | no. left aligned : 43 |
− | no. right trimmed : 0 <br> | + | no. right trimmed : 0 |
− | multiallelic
| + | |
| + | total no. biallelic normalized : 43 |
| + | |
| + | multiallelic |
| no. left trimmed : 0 | | no. left trimmed : 0 |
| no. left trimmed and left aligned : 0 | | no. left trimmed and left aligned : 0 |
| no. left trimmed and right trimmed : 0 | | no. left trimmed and right trimmed : 0 |
| no. left aligned : 0 | | no. left aligned : 0 |
− | no. right trimmed : 0 <br> | + | no. right trimmed : 0 |
− | no. variants observed : 9996 <br> | + | |
− | <br>
| + | total no. multiallelic normalized : 0 |
− | stats: Total number of observed variants 9996
| + | |
− | Total number of unique variants 8904 <br>
| + | total no. variants normalized : 43 |
| + | total no. variants observed : 121 |
| + | |
| + | |
| + | stats: Total number of observed variants 121 |
| + | Total number of unique variants 106 |
| + | </pre> |
| | | |
− | Let's look for a variant that was normalized. | + | Let's look at the last two variants that were normalized. |
− | ${GC}/bin/vt view ${OUT}/mills.normalized.genotypes.bcf | grep OLD_VARIANT |head -1 | + | ${GC}/bin/vt view ${OUT}/mills.22.normalized.genotypes.bcf | grep OLD_VARIANT |tail -2 |
| | | |
| Results: | | Results: |
− | *The position has changed - it was: | + | *The positions have changed - they were: |
− | ** 18293100 (as seen after OLD_VARIANT) | + | ** 48831918 & 50012616 (as seen after OLD_VARIANT) |
− | *Now it is | + | *Now they are: |
− | ** 18293097 | + | ** 48831915 & 50012614 |
| [[File:indelNormalize.png|800px]] | | [[File:indelNormalize.png|800px]] |
| | | |
| | | |
| UMICH's algorithm for normalization has been adopted by Petr Danecek in bcftools and is also used in GKNO. | | UMICH's algorithm for normalization has been adopted by Petr Danecek in bcftools and is also used in GKNO. |
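
The "about 12%" figure for chromosome 22 can be checked directly from the observed and unique counts that vt mergedups reports. The following is only a minimal sketch: <code>dup_loss</code> is a hypothetical helper written for this illustration and is not part of vt.

```shell
# Hypothetical helper (not part of vt): given the observed and unique
# variant counts from the mergedups stats, report how many duplicates
# were removed and what fraction of the observed variants that is.
dup_loss() {
  observed=$1
  unique=$2
  awk -v o="$observed" -v u="$unique" \
    'BEGIN { printf "%d duplicates removed (%.1f%% of %d observed)\n", o - u, 100 * (o - u) / o, o }'
}

# Chromosome 22 counts taken from the stats output above.
dup_loss 121 106
```

With the chromosome 22 counts this should report 15 duplicates removed, roughly 12.4% of the 121 observed variants, which agrees with the 15 removed variants and the rounded 12% quoted above.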