Changes

From Genome Analysis Wiki
Line 499: Line 499:  
- a loss of about 11% of the variants thought to be distinct.
 
- a loss of about 11% of the variants thought to be distinct.
   −
To normalize and remove duplicate variants:
+
For chromosome 22, there are 121 variants, but only 106 distinct Indels after normalization
 +
- a loss of about 12% of the variants thought to be distinct.
   −
  ${GC}/bin/vt normalize ${VTREF}/mills_indels_hg19.sites.bcf -r ${VTREF}/hs37d5.fa  | ${GC}/bin/vt mergedups - -o ${OUT}/mills.normalized.genotypes.bcf
+
To normalize and remove duplicate variants for chromosome 22:
   −
and you will observe that 3994 variants had to be left aligned and 1092 variants were removed.
+
  ${GC}/bin/vt normalize  ${SS}/ref22/mills_indels_hg19.22.sites.bcf -r ${SS}/ref22/human.g1k.v37.chr22.fa | ${GC}/bin/vt mergedups - -o ${OUT}/mills.22.normalized.genotypes.bcf
   −
  stats: biallelic
+
and you will observe that 43 variants had to be left aligned and 15 variants were removed.
 +
 
 +
<pre>
 +
stats: biallelic
 
           no. left trimmed                      : 0
 
           no. left trimmed                      : 0
 
           no. left trimmed and left aligned    : 0
 
           no. left trimmed and left aligned    : 0
 
           no. left trimmed and right trimmed    : 0
 
           no. left trimmed and right trimmed    : 0
           no. left aligned                      : 3994
+
           no. left aligned                      : 43
           no. right trimmed                    : 0 <br>
+
           no. right trimmed                    : 0
        multiallelic
+
 
 +
      total no. biallelic normalized          : 43
 +
 
 +
      multiallelic
 
           no. left trimmed                      : 0
 
           no. left trimmed                      : 0
 
           no. left trimmed and left aligned    : 0
 
           no. left trimmed and left aligned    : 0
 
           no. left trimmed and right trimmed    : 0
 
           no. left trimmed and right trimmed    : 0
 
           no. left aligned                      : 0
 
           no. left aligned                      : 0
           no. right trimmed                    : 0 <br>
+
           no. right trimmed                    : 0
       no. variants observed                   : 9996 <br>
+
 
  <br>
+
      total no. multiallelic normalized        : 0
  stats: Total number of observed variants  9996
+
 
        Total number of unique variants    8904 <br>
+
      total no. variants normalized            : 43
 +
       total no. variants observed             : 121
 +
 
 +
 
 +
stats: Total number of observed variants  121
 +
      Total number of unique variants    106
 +
</pre>
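The counts reported for chromosome 22 can be cross-checked with a little shell arithmetic. This is only a minimal sketch: the two numbers are copied by hand from the vt output above, and the variable names are illustrative rather than anything vt defines.

```shell
# Counts copied from the chromosome 22 vt output (observed vs. unique variants).
observed=121
unique=106

# Variants dropped by mergedups.
removed=$((observed - unique))

# Shell arithmetic is integer only, so awk computes the percentage.
pct=$(awk -v r="$removed" -v o="$observed" 'BEGIN { printf "%.1f", 100 * r / o }')

echo "removed=${removed} pct=${pct}%"
```

The difference of 15 agrees with the number of variants reported as removed, and the ratio matches the loss of roughly 12% quoted for chromosome 22.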
   −
Let's look for a variant that was normalized.
+
Let's look at the last two variants that were normalized.
  ${GC}/bin/vt view ${OUT}/mills.normalized.genotypes.bcf | grep OLD_VARIANT |head -1
+
  ${GC}/bin/vt view ${OUT}/mills.22.normalized.genotypes.bcf | grep OLD_VARIANT |tail -2
    
Results:
 
Results:
*The position has changed - it was:
+
*The positions have changed - they were:
** 18293100 (as seen after OLD_VARIANT)
+
** 48831918 & 50012616 (as seen after OLD_VARIANT)
*Now it is
+
*Now they are:
** 18293097
+
** 48831915 & 50012614
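To list the old and new positions side by side instead of reading them off each record, a small awk filter can be put on the end of the vt view pipe. This is a rough sketch, not part of vt: the function name old_vs_new is hypothetical, and it assumes the INFO tag has the form OLD_VARIANT=chrom:pos:ref/alt, which should be checked against the actual output first.

```shell
# Print "old_pos new_pos shift" for each VCF record that carries an OLD_VARIANT tag.
# Assumption: the tag value looks like chrom:pos:ref/alt.
old_vs_new() {
  awk -F'\t' '!/^#/ && match($8, /OLD_VARIANT=[^;]+/) {
    # Drop the 12-character "OLD_VARIANT=" prefix to keep only the value.
    tag = substr($8, RSTART + 12, RLENGTH - 12)
    split(tag, f, ":")            # f[1]=chrom, f[2]=old position, f[3]=ref/alt
    print f[2], $2, f[2] - $2     # old position, new position, bases shifted left
  }'
}
```

It would be used as `${GC}/bin/vt view ${OUT}/mills.22.normalized.genotypes.bcf | old_vs_new | tail -2`. If the tag format assumption holds, the last column for the two variants above should be 3 and 2, matching the positions listed.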
 
[[File:indelNormalize.png|800px]]
 
[[File:indelNormalize.png|800px]]
       
UMICH's algorithm for normalization has been adopted by Petr Danecek in bcftools and is also used in GKNO.
 
UMICH's algorithm for normalization has been adopted by Petr Danecek in bcftools and is also used in GKNO.
