Changes

From Genome Analysis Wiki
743 bytes added, 15:15, 19 February 2014
Line 21: Line 21:  
Indel representation is not unique; you should normalize indels and remove duplicates.
    +
The following table shows the number of variants that had to be normalized and the ensuing number of duplicate variants found for some of the 1000 Genomes Trio High Coverage call sets.
 +
Although left alignment seems to be a trivial concept, it is easily overlooked and remains a common mistake.  Another example is the Mills et al. data set, which followed up 10004 Indels
 +
for validation. Of the 9996 variants that passed, only 8904 distinct Indels remained after normalization, a loss of about 11% of the variants thought to be distinct.
 +
 +
Variant normalization is implemented in [[vt#Normalization|vt]]. The [[Variant_Normalization|Variant Normalization]] page explains the algorithm and provides a simple proof of correctness.
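
To illustrate why normalization exposes duplicates, here is a minimal Python sketch of left alignment with parsimonious trimming for a biallelic variant. It is not vt's implementation. The function name `normalize`, the toy `reference` string, and the use of a plain in-memory string for the chromosome sequence are all assumptions made for illustration. Positions are 1-based, as in VCF.

```python
def normalize(pos, ref, alt, reference):
    """Left-align and trim one biallelic variant (illustrative sketch only).

    pos       -- 1-based position of the first base of ref
    ref, alt  -- non-empty allele strings, assumed to differ
    reference -- chromosome sequence as a plain string (hypothetical input)
    """
    if not ref or not alt or ref == alt:
        raise ValueError("expected two distinct, non-empty alleles")

    # Step 1: while both alleles end in the same base, drop that base.
    # If an allele becomes empty, extend both alleles one base to the left.
    while ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
        if not ref or not alt:
            if pos == 1:
                raise ValueError("cannot extend left of the first base")
            pos -= 1
            base = reference[pos - 1]  # convert 1-based pos to 0-based index
            ref, alt = base + ref, base + alt

    # Step 2: while both alleles start with the same base and both have
    # at least two bases, drop the leading base and shift the position.
    while len(ref) >= 2 and len(alt) >= 2 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1

    return pos, ref, alt


# Toy reference with a CA repeat; three ways to write the same CA deletion.
reference = "GGGCACACACAGGG"
calls = [(8, "CAC", "C"), (3, "GCACA", "GCA"), (2, "GGCA", "GG")]
distinct = {normalize(pos, ref, alt, reference) for pos, ref, alt in calls}
```

In this toy example the three input records collapse to a single normalized variant, so `distinct` contains one entry. Counting records before and after such a step is how a duplicate tally like the one in the table can be obtained.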
    
{| class="wikitable"
 
{| class="wikitable"