From Genome Analysis Wiki
359 bytes added, 13:52, 17 June 2014
Line 434:
  To normalize and remove duplicate variants:

− ${GC}/bin/vt normalize mills.genotypes.bcf -r hs37d5.fa | ${GC}/bin/vt mergedups - -o mills.normalized.genotypes.bcf
+ ${GC}/bin/vt normalize ${VTREF}/mills_indels_hg19.sites.bcf -r ${VTREF}/hs37d5.fa | ${GC}/bin/vt mergedups - -o ${OUT}/mills.normalized.genotypes.bcf

  and you will observe that 3994 variants had to be left aligned and 1092 variants were removed.
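The normalize-then-deduplicate step can be sketched in Python. This is a minimal illustration, not vt's source code. `normalize` follows the trim-and-left-extend procedure usually described for `vt normalize`: drop a trailing base shared by all alleles, pad any emptied allele with the preceding reference base, repeat until nothing changes, then drop shared leading bases. The names `normalize`, `normalize_and_dedup`, `fetch`, `REF_BASES` and the tuple record type are hypothetical. The dedup rule, which keeps the first record for each normalized (chromosome, POS, REF, ALT), is an assumption; `vt mergedups` may choose among duplicates differently. The "POS changed" counter is only a rough analogue of vt's "left aligned" count. The test data reuse the chr1 insertion examined later in this revision (`OLD_VARIANT=1:18293100:C/CCTC`). Its reference bases, TCTC at 18293097-18293100, are inferred from its two equivalent representations and were not read from `hs37d5.fa`.

```python
from typing import Callable, List, Tuple

# (chrom, 1-based POS, REF, ALTs): a simplified stand-in for a VCF record.
Record = Tuple[str, int, str, Tuple[str, ...]]
Fetch = Callable[[str, int], str]


def normalize(rec: Record, fetch: Fetch) -> Record:
    """Trim shared bases and shift an indel left; fetch(chrom, pos) gives one reference base."""
    chrom, pos, ref, alts = rec
    alleles = [ref, *alts]
    if len(set(alleles)) == 1:
        raise ValueError("REF and ALT alleles are identical")
    changed = True
    while changed:
        changed = False
        # All alleles end in the same base: drop it from each of them.
        if all(alleles) and len({a[-1] for a in alleles}) == 1:
            alleles = [a[:-1] for a in alleles]
            changed = True
        # Some allele is now empty: prepend the reference base to the left.
        if not all(alleles):
            pos -= 1
            base = fetch(chrom, pos)
            alleles = [base + a for a in alleles]
            changed = True
    # Drop a shared leading base while every allele keeps at least one base.
    while all(len(a) >= 2 for a in alleles) and len({a[0] for a in alleles}) == 1:
        alleles = [a[1:] for a in alleles]
        pos += 1
    return chrom, pos, alleles[0], tuple(alleles[1:])


def normalize_and_dedup(records: List[Record], fetch: Fetch) -> Tuple[List[Record], int, int]:
    """Normalize each record and keep the first copy of each normalized record.

    Returns (kept records, number whose POS changed, number dropped as duplicates).
    """
    seen = set()
    kept: List[Record] = []
    moved = removed = 0
    for rec in records:
        norm = normalize(rec, fetch)
        if norm[1] != rec[1]:
            moved += 1
        if norm in seen:
            removed += 1
        else:
            seen.add(norm)
            kept.append(norm)
    return kept, moved, removed


# Reference bases at 1:18293097-18293100, inferred from the tutorial's two
# equivalent representations of one insertion (not read from hs37d5.fa).
REF_BASES = {("1", 18293097 + i): base for i, base in enumerate("TCTC")}


def fetch(chrom: str, pos: int) -> str:
    return REF_BASES[(chrom, pos)]


records = [
    ("1", 18293100, "C", ("CCTC",)),  # representation before normalization
    ("1", 18293097, "T", ("TCTC",)),  # same insertion, already left-aligned
]
print(normalize_and_dedup(records, fetch))
# ([('1', 18293097, 'T', ('TCTC',))], 1, 1)
```

The first record slides left one base at a time through the CTC repeat and stops at 18293097, because the base there (T) no longer matches the inserted sequence's last base. At that point it is identical to the second record, which is therefore dropped as a duplicate.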
Line 455:
  Total number of unique variants 8904 <br>

+ Let's look for a variant that was normalized.
+ ${GC}/bin/vt view ${OUT}/mills.normalized.genotypes.bcf | grep OLD_VARIANT |head -1
+
+ Results:
+ 1 18293097 . T TCTC . PASS VC=INDEL;AC=263;AF=0.84;AN=314;OLD_VARIANT=1:18293100:C/CCTC
+
+ The position has changed - it was:
+ * 18293100 (as seen after OLD_VARIANT)
+ Now it is
+ * 18293097

  UMICH's algorithm for normalization has been adopted by Petr Danecek in bcftools and is also used in GKNO.
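The OLD_VARIANT record returned by `grep` in this revision can also be checked programmatically. The sketch below uses the hypothetical helpers `parse_info` and `position_shift`. It splits the line into VCF columns, reads the INFO column, and compares the current POS with the position stored in OLD_VARIANT. It assumes OLD_VARIANT has the `chrom:pos:REF/ALT` layout seen in this one record. Other records, for example multiallelic ones, may be formatted differently.

```python
from typing import Dict, Tuple


def parse_info(info: str) -> Dict[str, str]:
    """Split a VCF INFO column into a key -> value dict (flag keys map to '')."""
    fields: Dict[str, str] = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def position_shift(vcf_line: str) -> Tuple[int, int, int]:
    """Return (old POS, new POS, bases moved left) for a record carrying OLD_VARIANT."""
    cols = vcf_line.split()  # split() accepts both tab- and space-separated columns
    new_pos = int(cols[1])  # column 2 is POS
    old = parse_info(cols[7])["OLD_VARIANT"]  # column 8 is INFO
    # Assumed layout chrom:pos:REF/ALT with no ':' inside the alleles.
    _chrom, old_pos, _alleles = old.rsplit(":", 2)
    return int(old_pos), new_pos, int(old_pos) - new_pos


line = ("1 18293097 . T TCTC . PASS "
        "VC=INDEL;AC=263;AF=0.84;AN=314;OLD_VARIANT=1:18293100:C/CCTC")
print(position_shift(line))
# (18293100, 18293097, 3)
```

For this record the insertion moved 3 bp to the left, from 18293100 to 18293097.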