Changes

From Genome Analysis Wiki
Line 104: Line 104:     
Thus A and B have to be at the same position and have the same length and variant normalization is unique.
 
Thus A and B have to be at the same position and have the same length and variant normalization is unique.
  −
= Here is an example where this normalization algorithm fails =
  −
  −
We distinguish the concepts of normalization and decomposition/reconstruction of variants as follows:
  −
  −
  −
Normalization involves reducing representations of a variant to a canonical representation. Normalization can be applied to biallelic variants or multiallelic variants. The problem of normalization is solvable and there exists a unique representation that is left aligned and parsimonious. A mathematical proof has been published. [http://bioinformatics.oxfordjournals.org/content/early/2015/03/22/bioinformatics.btv112]
  −
  −
  −
Decomposition of variants involves breaking down a variant record into multiple records. It may be done vertically, as in multiallelics becoming biallelics, or horizontally, as in a cluster of indels and SNPs represented as a complex variant being split up into several records. Horizontal decompositions in general do not have a unique solution.  Similarly, reconstruction combines several variant records into a single record.
  −
  −
  −
If your example contains the decomposition or reconstruction of variants, then it is probable that you can find inconsistencies. 
  −
  −
  −
It is important to distinguish between normalization and decomposition/reconstruction.  The notion of normalization implies that a variant can be reduced to a standardized form.  If you include decomposition and reconstruction in your notion of normalization, you are bound to find inconsistencies simply due to the inherent issues of identifiability.
  −
  −
  −
[https://github.com/atks/vt/issues/16 An example of inconsistent variant representation due to using vt normalize]
      
= Implementation =
 
= Implementation =
Line 236: Line 217:  
*GATK v3.1-1-g07a4bf8
 
*GATK v3.1-1-g07a4bf8
 
*vt normalize v0.5
 
*vt normalize v0.5
 +
 +
= Here is an example where this normalization algorithm fails =
 +
 +
We distinguish the concepts of normalization and decomposition/reconstruction of variants as follows:
 +
 +
 +
Normalization involves reducing representations of a variant to a canonical representation. Normalization can be applied to biallelic variants or multiallelic variants. The problem of normalization is solvable and there exists a unique representation that is left aligned and parsimonious. A mathematical proof has been published. [http://bioinformatics.oxfordjournals.org/content/suppl/2015/02/19/btv112.DC1/VtNormApplicationNote_supp_20141113_1346.pdf]
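As a rough illustration of what "left aligned and parsimonious" means in practice, here is a minimal Python sketch of the trimming and left-extension procedure described in the cited paper. It is not the vt implementation: the function name <code>normalize</code>, the toy reference sequence and the three example records are assumptions made up for this illustration, and positions are taken to be 1-based with the first allele being REF.

```python
def normalize(pos, alleles, ref_seq):
    """Return (pos, alleles) left aligned and parsimonious.

    pos     -- 1-based position of the first base of the alleles
    alleles -- [REF, ALT1, ...]; at least two alleles must differ
    ref_seq -- reference sequence as a string (hypothetical toy input)
    """
    alleles = list(alleles)
    if len(set(alleles)) < 2:
        raise ValueError("need at least two distinct alleles")

    changed = True
    while changed:
        changed = False
        # Truncate the rightmost base while every allele ends with the same base.
        if all(alleles) and len({a[-1] for a in alleles}) == 1:
            alleles = [a[:-1] for a in alleles]
            changed = True
        # If an allele became empty, extend every allele one base to the left.
        if any(len(a) == 0 for a in alleles):
            if pos == 1:
                raise ValueError("cannot extend left of the first reference base")
            pos -= 1
            left_base = ref_seq[pos - 1]
            alleles = [left_base + a for a in alleles]
            changed = True

    # Truncate the leftmost base while all alleles share it and stay non-empty.
    while all(len(a) >= 2 for a in alleles) and len({a[0] for a in alleles}) == 1:
        alleles = [a[1:] for a in alleles]
        pos += 1

    return pos, alleles


# Three representations of the same CA deletion in a toy reference.
REF_SEQ = "GGGCACACACAGGG"
not_left_aligned = normalize(6, ["CAC", "C"], REF_SEQ)
right_padded = normalize(3, ["GCACA", "GCA"], REF_SEQ)
left_padded = normalize(2, ["GGCA", "GG"], REF_SEQ)
```

Under these assumptions all three calls reduce to the same record, position 3 with alleles GCA and G, which is the uniqueness property stated above.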
 +
 +
 +
Decomposition of variants involves breaking down a variant record into multiple records. It may be done vertically, as in multiallelics becoming biallelics, or horizontally, as in a cluster of indels and SNPs represented as a complex variant being split up into several records. Horizontal decompositions in general do not have a unique solution.  Similarly, reconstruction combines several variant records into a single record and can also be done vertically or horizontally. Vertical decomposition of a multiallelic variant into a set of biallelic records is a many-to-one function.  Reconstruction of a multiallelic record from a set of biallelic variants is not unique, as you need to consider all possible permutations of the haplotypes containing your alleles.
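To make the non-uniqueness of horizontal decomposition concrete, the following Python sketch applies three different sets of records to the same toy reference and obtains the same haplotype each time. The helper <code>apply_variants</code>, the reference <code>ACGT</code> and the records are hypothetical and chosen only for illustration. Records are (position, REF, ALT) tuples with 1-based positions and are assumed not to overlap.

```python
def apply_variants(ref_seq, records):
    """Apply non-overlapping (pos, ref, alt) records to ref_seq; pos is 1-based."""
    pieces = []
    cursor = 0  # 0-based index of the next reference base not yet copied
    for pos, ref, alt in sorted(records):
        start = pos - 1
        if start < cursor:
            raise ValueError("records overlap")
        if ref_seq[start:start + len(ref)] != ref:
            raise ValueError("REF allele does not match the reference")
        pieces.append(ref_seq[cursor:start])  # untouched reference bases
        pieces.append(alt)                    # substituted allele
        cursor = start + len(ref)
    pieces.append(ref_seq[cursor:])
    return "".join(pieces)


TOY_REF = "ACGT"

# One complex record ...
as_complex = apply_variants(TOY_REF, [(2, "CG", "GC")])
# ... decomposed into two SNPs ...
as_snps = apply_variants(TOY_REF, [(2, "C", "G"), (3, "G", "C")])
# ... or into a deletion plus an insertion.
as_indels = apply_variants(TOY_REF, [(1, "AC", "A"), (3, "G", "GC")])
```

All three record sets describe the haplotype AGCT, so nothing in the haplotype alone tells you which decomposition is the "right" one.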
 +
 +
 +
If your example contains the decomposition or reconstruction of variants, then it is probable that you can find inconsistencies. 
 +
 +
 +
It is important to distinguish between normalization and decomposition/reconstruction.  The notion of normalization implies that a variant can be reduced to a standardized form.  If you include decomposition and reconstruction in your notion of normalization, you are bound to find inconsistencies simply due to the inherent issues of identifiability.
 +
 +
 +
When performing decomposition and reconstruction, I think the following factors should be considered:
 +
 +
* Are your variants describing just a single individual or a population?
 +
* Are the genotypes (if any) in your individual(s) phased?
 +
 +
Depending on the context, you will obtain different answers.
 +
 +
[https://github.com/atks/vt/issues/16 An example of inconsistent variant representation due to using vt normalize]
    
= Citation =
 
= Citation =
   −
[http://bioinformatics.oxfordjournals.org/content/early/2015/02/19/bioinformatics.btv112.abstract?keytype=ref&ijkey=2kB1TkBGzkoP1gd Adrian Tan, Gonçalo R. Abecasis and Hyun Min Kang. (2015)  Unified Representation of Genetic Variants. Bioinformatics. doi: 10.1093/bioinformatics/btv112 ]
+
[http://bioinformatics.oxfordjournals.org/content/31/13/2202 Adrian Tan, Gonçalo R. Abecasis and Hyun Min Kang. (2015)  Unified Representation of Genetic Variants. Bioinformatics.]
 +
 
 +
= Translations =
 +
 
 +
A Mandarin translation can be found [http://www.lyon0804.com/fan-yi-variant-normalization.html here].
    
= Maintained by =
 
= Maintained by =
    
This page is maintained by  [mailto:atks@umich.edu Adrian].
 
This page is maintained by  [mailto:atks@umich.edu Adrian].