Line 104: |
Line 104: |
| | | |
| Thus A and B have to be at the same position and have the same length and variant normalization is unique. | | Thus A and B have to be at the same position and have the same length and variant normalization is unique. |
− |
| |
− | = Here is an example where this normalization algorithm fails =
| |
− |
| |
− | We distinguish the concepts of normalization and decomposition/reconstruction of variants as follows:
| |
− |
| |
− |
| |
− | Normalization involves reducing representations of a variant to a canonical representation. Normalization can be applied to biallelic variants or multiallelic variants. The problem of normalization is solvable and there exists a unique representation that is left aligned and parsimonious. Mathematical proof is published. [http://bioinformatics.oxfordjournals.org/content/early/2015/03/22/bioinformatics.btv112]
| |
− |
| |
− |
| |
− | Decomposition of variants involves breaking a variant record down into multiple records. It may be done vertically, as when a multiallelic record becomes several biallelic records, or horizontally, as when a cluster of indels and SNPs represented as a complex variant is split up into several records. Horizontal decompositions in general do not have a unique solution. Similarly, reconstruction combines several variant records into a single record and can be done vertically and horizontally too. Vertical decomposition of a multiallelic variant to a set of biallelic records is a many to one function. Construction of a set of biallelic variants into a multiallelic record is not unique as you need to consider all possible permutations of the haplotypes containing your alleles.
| |
− |
| |
− |
| |
− | If your example contains the decomposition or reconstruction of variants, then it is probable that you can find inconsistencies.
| |
− |
| |
− |
| |
− | It is important to distinguish the difference between normalization and decomposition/reconstruction. The notion of normalization implies that a variant can be reduced to a standardized form. If you were to include decomposition and reconstruction in your notion of normalization, you are bound to find inconsistencies simply due to the inherent issues of identifiability.
| |
− |
| |
− |
| |
− | When performing decomposition and construction, I think the following factors should be considered:
| |
− |
| |
− | * Are your variants describing just a single individual or a population?
| |
− | * Are the genotypes (if any) in your individual(s) phased?
| |
− |
| |
− | Depending on the context, you will obtain different answers.
| |
− |
| |
− | [https://github.com/atks/vt/issues/16 An example of inconsistent variant representation due to using vt normalize]
| |
| | | |
| = Implementation = | | = Implementation = |
Line 243: |
Line 217: |
| *GATK v3.1-1-g07a4bf8 | | *GATK v3.1-1-g07a4bf8 |
| *vt normalize v0.5 | | *vt normalize v0.5 |
| + | |
| + | = Here is an example where this normalization algorithm fails = |
| + | |
| + | We distinguish the concepts of normalization and decomposition/reconstruction of variants as follows: |
| + | |
| + | |
| + | Normalization involves reducing representations of a variant to a canonical representation. Normalization can be applied to biallelic variants or multiallelic variants. The normalization problem is solvable: there exists a unique representation that is left aligned and parsimonious. A mathematical proof has been published. [http://bioinformatics.oxfordjournals.org/content/suppl/2015/02/19/btv112.DC1/VtNormApplicationNote_supp_20141113_1346.pdf] |
| + | |
| + | |
| + | Decomposition of variants involves breaking a variant record down into multiple records. It may be done vertically, as when a multiallelic record becomes several biallelic records, or horizontally, as when a cluster of indels and SNPs represented as a complex variant is split up into several records. Horizontal decompositions in general do not have a unique solution. Similarly, reconstruction combines several variant records into a single record and can be done vertically and horizontally too. Vertical decomposition of a multiallelic variant to a set of biallelic records is a many to one function. Construction of a set of biallelic variants into a multiallelic record is not unique as you need to consider all possible permutations of the haplotypes containing your alleles. |
| + | |
| + | |
| + | If your example contains the decomposition or reconstruction of variants, then it is probable that you can find inconsistencies. |
| + | |
| + | |
| + | It is important to distinguish between normalization and decomposition/reconstruction. The notion of normalization implies that a variant can be reduced to a standardized form. If you include decomposition and reconstruction in your notion of normalization, you are bound to find inconsistencies simply because of the inherent issues of identifiability. |
| + | |
| + | |
| + | When performing decomposition and construction, I think the following factors should be considered: |
| + | |
| + | * Are your variants describing just a single individual or a population? |
| + | * Are the genotypes (if any) in your individual(s) phased? |
| + | |
| + | Depending on the context, you will obtain different answers. |
| + | |
| + | [https://github.com/atks/vt/issues/16 An example of inconsistent variant representation due to using vt normalize] |
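The distinction drawn above between normalization and vertical decomposition can be illustrated with a minimal Python sketch. It assumes the trim-and-left-extend procedure as commonly described for left-aligned, parsimonious representations; it is not the vt source code. The function names `normalize` and `decompose_vertically`, the 0-based `pos` convention, and the toy reference string are all hypothetical choices made for this illustration, and genotypes are ignored entirely.

```python
def normalize(ref_seq, pos, alleles):
    """Return a left-aligned, parsimonious (pos, alleles) pair.

    ref_seq : reference sequence as a string (toy example, assumption)
    pos     : 0-based index in ref_seq of the first base of alleles[0]
    alleles : [REF, ALT1, ...]; alleles must be pairwise distinct
    """
    alleles = list(alleles)
    if len(set(alleles)) != len(alleles):
        raise ValueError("alleles must be distinct")

    changed = True
    while changed:
        changed = False
        # Right-trim: drop the last base while every allele is non-empty
        # and all alleles end with the same nucleotide.
        if all(alleles) and len({a[-1] for a in alleles}) == 1:
            alleles = [a[:-1] for a in alleles]
            changed = True
        # Left-extend: if any allele became empty, prepend the preceding
        # reference base to every allele.
        if any(len(a) == 0 for a in alleles):
            if pos == 0:
                raise ValueError("cannot extend past the start of the reference")
            pos -= 1
            alleles = [ref_seq[pos] + a for a in alleles]
            changed = True

    # Left-trim: drop shared leading bases while every allele keeps >= 1 base.
    while min(len(a) for a in alleles) >= 2 and len({a[0] for a in alleles}) == 1:
        alleles = [a[1:] for a in alleles]
        pos += 1
    return pos, alleles


def decompose_vertically(ref_seq, pos, alleles):
    """Split a multiallelic record into normalized biallelic records.

    Genotypes are not handled here, which is exactly where the
    identifiability issues discussed above arise.
    """
    ref, alts = alleles[0], alleles[1:]
    return [normalize(ref_seq, pos, [ref, alt]) for alt in alts]


# Toy reference: GGG + CACACACA + GGG
REF = "GGGCACACACAGGG"

# Three different representations of the same CA deletion.
a = normalize(REF, 7, ["CAC", "C"])
b = normalize(REF, 3, ["CA", ""])
c = normalize(REF, 2, ["GCACA", "GCA"])
print(a, b, c)
```

All three inputs describe the same deletion and reduce to the same record, which is the uniqueness property of normalization. `decompose_vertically`, by contrast, discards which alleles occurred together, so going back from its output to a multiallelic record is not uniquely determined.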
| | | |
| = Citation = | | = Citation = |
| | | |
− | [http://bioinformatics.oxfordjournals.org/content/early/2015/03/22/bioinformatics.btv112 Adrian Tan, Gonçalo R. Abecasis and Hyun Min Kang. (2015) Unified Representation of Genetic Variants. Bioinformatics.] | + | [http://bioinformatics.oxfordjournals.org/content/31/13/2202 Adrian Tan, Gonçalo R. Abecasis and Hyun Min Kang. (2015) Unified Representation of Genetic Variants. Bioinformatics.] |
| + | |
| + | = Translations = |
| + | |
| + | A Mandarin translation can be found [http://www.lyon0804.com/fan-yi-variant-normalization.html here]. |
| | | |
| = Maintained by = | | = Maintained by = |
| | | |
| This page is maintained by [mailto:atks@umich.edu Adrian]. | | This page is maintained by [mailto:atks@umich.edu Adrian]. |