Changes

From Genome Analysis Wiki
Line 5: Line 5:  
= Normalization =
   −
[[Image:normalization_mnp.png|none|500px|This figure shows multiple representations of a CA tandem repeat. The left shows five possible representations differentiated by color. The right shows the corresponding representation in VCF. The last representation represents the left aligned and parsimonious representation of the Indel.]]
+
Normalization of a variant representation is divided into 2 parts, parsimony and left alignment.
This figure shows multiple representations of a CA tandem repeat. The left shows five possible representations differentiated by color. The right shows the corresponding representation in VCF. The last representation represents the left aligned and parsimonious representation of the Indel.
      
== Parsimony ==
   −
This refers to the representation of a variant in as few nucleotides as possible. For simple insertions, a representation with more than a single nucleotide from the reference sequence is not considered parsimonious. For simple deletions, a representation of the alternate allele with more than a single reference sequence nucleotide is not considered parsimonious. For the insertion in the figure above, the blue representation is not parsimonious because the second CA can be inferred from the reference sequence. The maroon representation is parsimonious as it contains only a single base from the reference sequence.
+
[[Image:normalization_mnp.png|none|500px|This figure shows multiple representations of an MNP. The left shows 4 possible representations differentiated by color. The right shows the corresponding representation in VCF. The last representation represents the parsimonious representation of the MNP.]]
 +
This figure shows multiple representations of an MNP. The left shows 4 possible representations differentiated by color. The right shows the corresponding representation in VCF. The last representation represents the parsimonious representation of the MNP.
 +
 
 +
We would like to represent a variant in as few nucleotides as possible. Taking the example above, the MNP is represented superfluously in the first 3 representations and parsimoniously in the 4th representation. When a variant has superfluous nucleotides on the left side, we say that it needs to be left trimmed; likewise, superfluous nucleotides on the right side mean that it needs to be right trimmed.
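The trimming described above can be sketched in a few lines of Python. This is a minimal illustration, not the code of any particular tool: the function name <code>trim_alleles</code>, the 1-based <code>pos</code> argument, and the example alleles are all assumptions made for this sketch. It only removes nucleotides shared by every allele while each allele keeps at least one nucleotide, so it suits MNPs; indels in repeats additionally need the reference sequence, as discussed under left alignment.

```python
def trim_alleles(pos, alleles):
    """Trim nucleotides shared by all alleles from both ends.

    pos     -- 1-based position of the first nucleotide of the alleles
               (hypothetical convention chosen for this sketch)
    alleles -- non-empty strings, reference allele first
    Returns the adjusted position and the trimmed alleles.
    """
    alleles = list(alleles)

    # Right trim: drop the last nucleotide while every allele ends with the
    # same nucleotide and no allele would become empty.
    while min(len(a) for a in alleles) > 1 and len({a[-1] for a in alleles}) == 1:
        alleles = [a[:-1] for a in alleles]

    # Left trim: drop the first nucleotide under the same conditions and
    # move the position one step to the right each time.
    while min(len(a) for a in alleles) > 1 and len({a[0] for a in alleles}) == 1:
        alleles = [a[1:] for a in alleles]
        pos += 1

    return pos, alleles


if __name__ == "__main__":
    # An MNP CA -> TG padded with a shared G on the left and TG on the right.
    print(trim_alleles(10, ["GCATG", "GTGTG"]))
```

In this made-up example the shared trailing TG and leading G are removed, leaving the two-nucleotide MNP CA/TG at position 11.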
    
== Left alignment ==
   −
Left alignment is usually a concept associated with Indels.  We define an indel to be left aligned when the variant cannot be shifted to the left any further while ensuring that the indel represented is consistent and that no alleles are represented with an empty string (empty allele).  The orange representation is not left aligned while the blue representation is.
+
[[Image:normalization_str.png|none|500px|This figure shows multiple representations of a CA tandem repeat. The left shows five possible representations differentiated by color. The right shows the corresponding representation in VCF. The last representation represents the left aligned and parsimonious representation of the Indel.]]
 +
This figure shows multiple representations of a CA tandem repeat. The left shows five possible representations differentiated by color. The right shows the corresponding representation in VCF. The last representation represents the left aligned and parsimonious representation of the Indel.
 +
 
 +
Left alignment is usually a concept associated with Indels.  We define an indel to be left aligned when the variant cannot be shifted to the left any further while ensuring that the indel represented is consistent and that no alleles are represented with an empty string (empty allele).  An indel representation can also be non-parsimonious, as shown in the figure.
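One way to make the definition concrete is the sketch below, which shifts an indel to the left one nucleotide at a time and then trims it. It is an illustrative implementation under stated assumptions, not necessarily how any given tool does it: the names <code>left_align</code>, <code>ref_seq</code> and <code>pos</code> are hypothetical, <code>pos</code> is taken to be 1-based, and <code>ref_seq</code> is a toy reference held entirely in memory.

```python
def left_align(ref_seq, pos, alleles):
    """Left align and trim a variant against a reference sequence.

    ref_seq -- reference sequence as a string (toy, in-memory assumption)
    pos     -- 1-based position of the first nucleotide of the alleles
    alleles -- non-empty strings, reference allele first
    """
    alleles = list(alleles)
    if len(set(alleles)) < 2:
        raise ValueError("need at least two distinct alleles")

    # While every allele ends with the same nucleotide, that nucleotide is
    # superfluous on the right side and can be dropped.
    while len({a[-1] for a in alleles}) == 1:
        if min(len(a) for a in alleles) == 1:
            # Dropping it would leave an empty allele, so first extend all
            # alleles by one reference nucleotide to the left.
            if pos == 1:
                break  # start of the sequence, cannot shift any further
            pos -= 1
            alleles = [ref_seq[pos - 1] + a for a in alleles]
        alleles = [a[:-1] for a in alleles]

    # Left trim shared leading nucleotides, keeping every allele non-empty.
    while min(len(a) for a in alleles) > 1 and len({a[0] for a in alleles}) == 1:
        alleles = [a[1:] for a in alleles]
        pos += 1

    return pos, alleles


if __name__ == "__main__":
    ref = "GGGCACACACAGGG"  # toy reference with a CA tandem repeat
    # A CA deletion written in the middle of the repeat.
    print(left_align(ref, 9, ["ACA", "A"]))
    # The same deletion written with superfluous nucleotides on the right.
    print(left_align(ref, 3, ["GCACA", "GCA"]))
```

With this toy reference, both inputs describe the same CA deletion and both reduce to position 3 with alleles GCA and G, which cannot be shifted further left because the next nucleotide to the left is not part of the repeat.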
    
=== How to observe that a variant is not left aligned or parsimonious on the right side? ===
Line 20: Line 25:  
If every allele ends with the same nucleotide, the variant is either not left aligned or not parsimonious on the right side.
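This check is simple enough to express directly. The snippet below is a small sketch; the function name <code>shares_last_nucleotide</code> is made up for illustration, and the alleles are assumed to be non-empty strings.

```python
def shares_last_nucleotide(alleles):
    """Return True if every allele ends with the same nucleotide.

    A True result means the variant is not left aligned or not
    parsimonious on the right side. Alleles are assumed non-empty.
    """
    return len({a[-1] for a in alleles}) == 1


if __name__ == "__main__":
    print(shares_last_nucleotide(["CAC", "C"]))  # both alleles end in C
    print(shares_last_nucleotide(["GCA", "G"]))  # A versus G
```

The first call reports a shared last nucleotide, so that representation still needs work; the second does not, so the right side is already settled.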
   −
=== Proof of left alignment ===
+
=== Proof of left alignment property ===
    
Suppose an indel is already left aligned. To shift the variant to the right, we must be able to truncate the leftmost nucleotide of each allele without any loss of information (i.e. we can reconstruct the original alleles from the right-shifted version of the variant given the reference genome). To guarantee this, the leftmost nucleotide of every allele must be the same nucleotide (in other words, the same as the reference nucleotide).