Changes

Variant Normalization (view source)

Revision as of 23:46, 26 March 2015

51 bytes removed, 23:46, 26 March 2015

→‎I can find an example where the normalization algorithm fails

Line 105: Line 105:

Thus A and B have to be at the same position and have the same length and variant normalization is unique.

−

= ~~I can find~~ an example where ~~the~~ normalization algorithm fails =

+

= Here is an example where this normalization algorithm fails =

−

~~Hi Terry,~~

+

We distinguish the concepts of normalization and decomposition/reconstruction of variants as follows:

−

~~Thanks for the report. This is an interesting example.~~

−

~~But before I begin, I would like~~ to ~~distinguish the difference between~~ normalization and ~~decomposition of variants (as we defined it)~~

+

Normalization involves reducing the representations of a variant to a canonical representation. Normalization can be applied to biallelic or multiallelic variants. The problem of normalization is solvable: there exists a unique representation that is left aligned and parsimonious. A mathematical proof has been published. [http://bioinformatics.oxfordjournals.org/content/early/2015/03/22/bioinformatics.btv112]
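To make "left aligned and parsimonious" concrete, here is a minimal Python sketch of the trim-and-extend procedure described in the paper cited above. It is an illustration only, not vt's actual implementation: the function name <code>normalize</code>, its signature, and the use of an in-memory reference string are assumptions made for this example.

```python
def normalize(pos, alleles, ref_seq):
    """Left align and trim one variant record (illustrative sketch).

    pos     -- 1-based position of the first base of the REF allele
    alleles -- [REF, ALT1, ALT2, ...], all non-empty strings
    ref_seq -- reference sequence as a string (hypothetical stand-in
               for a lookup into a FASTA file)
    """
    alleles = list(alleles)

    # Step 1: drop the last base while every allele ends with the same base.
    # If an allele becomes empty, extend all alleles one base to the left.
    while True:
        changed = False
        can_trim = pos > 1 or min(len(a) for a in alleles) > 1
        if can_trim and len({a[-1] for a in alleles}) == 1:
            alleles = [a[:-1] for a in alleles]
            changed = True
        if any(a == "" for a in alleles):
            pos -= 1
            left_base = ref_seq[pos - 1]  # convert 1-based pos to 0-based index
            alleles = [left_base + a for a in alleles]
            changed = True
        if not changed:
            break

    # Step 2: drop the first base while every allele starts with the same
    # base and every allele keeps at least one base afterwards.
    while len({a[0] for a in alleles}) == 1 and min(len(a) for a in alleles) >= 2:
        alleles = [a[1:] for a in alleles]
        pos += 1

    return pos, alleles
```

In this sketch, three different spellings of the same CA deletion in the reference <code>GGGCACACACAGGG</code>, namely <code>normalize(6, ["CAC", "C"], ...)</code>, <code>normalize(3, ["GCACA", "GCA"], ...)</code> and <code>normalize(2, ["GGCA", "GG"], ...)</code>, all reduce to position 3 with alleles <code>GCA</code> and <code>G</code>. The number of records is unchanged: one record in, one record out.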

−

Decomposition of variants involves the breaking down of a variant record into multiple records. It may be done vertically - as in multiallelics becoming biallelics or it can be done horizontally - a cluster of indels and SNPs represented as a complex variant being splitted up into several records. Horizontal decompositions in general do not have a unique solution.

−

~~Normalization~~ involves ~~ensuring~~ the ~~representation~~ of a variant record ~~is left aligned and parsimonious and does not increase or decrease the number of~~ records ~~representing that variant~~. ~~Normalization~~ can be ~~applied to biallelic variants or multiallelic variants. The problem~~ of ~~normalization is solvable~~ and ~~there exists~~ a unique ~~representation that is left aligned and parsimonious~~. ~~Mathematical proof is published. [http://bioinformatics.oxfordjournals.org/content/early/2015/03/22/bioinformatics~~.~~btv112]~~

+

Decomposition of variants involves breaking a variant record down into multiple records. It may be done vertically, as when a multiallelic record becomes several biallelic records, or horizontally, as when a cluster of indels and SNPs represented as a complex variant is split up into several records. Horizontal decompositions in general do not have a unique solution. Similarly, reconstruction combines several variant records into a single record.
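As a rough illustration of vertical decomposition, the Python sketch below splits one multiallelic record into one biallelic record per alternate allele. The function name <code>decompose_vertically</code> and the tuple layout are hypothetical, and the sketch ignores genotype and INFO fields, which a real tool would also have to rewrite.

```python
def decompose_vertically(pos, ref, alts):
    """Split a multiallelic record into biallelic records (illustrative sketch).

    pos  -- 1-based position of the record
    ref  -- REF allele
    alts -- list of ALT alleles
    Returns one (pos, ref, alt) tuple per ALT allele.
    """
    return [(pos, ref, alt) for alt in alts]


# One record with two ALT alleles becomes two records.
records = decompose_vertically(3, "GCA", ["G", "GCACA"])
```

Unlike normalization, this changes the number of records. Each resulting biallelic record may no longer be parsimonious on its own, so it would typically be normalized afterwards.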

−

Supporting haplotype reconstruction is actually not the goal of vt's normalization, it is meant for allowing one to compare the alleles of variant call sets from different variant callers applied possibly to multiple samples.

−

~~The notion of normalization that you described involves reconstruction of haplotypes, and you are right to say that there should be some inbuilt collision detection mechanism. It was a really nice~~ example ~~and in~~ the ~~context~~ of ~~a single sample and the assumption that the alternate alleles must occur on the same haplotype~~, it is ~~correct~~.

+

If your example contains the decomposition or reconstruction of variants, then it is probable that you can find inconsistencies.

+

It is important to distinguish between normalization and decomposition/reconstruction. The notion of normalization implies that a variant can be reduced to a standardized form. If you include decomposition and reconstruction in your notion of normalization, you are bound to find inconsistencies simply because of the inherent issues of identifiability.

+

[https://github.com/atks/vt/issues/16 An example of inconsistent variant representation due to using vt normalize]

= Implementation =

Atks (1,102 edits)
