Changes

From Genome Analysis Wiki
Line 189: Line 189:     
==Normalization==
 
==Normalization==
 +
 +
A slight digression: when analyzing indels, it is important to normalize them.  Although normalization is a simple concept,
 +
it is hardly standardized.  The call set here had already been normalized, but we feel that this is an important
 +
concept, so we discuss it briefly here.
    
Indel representation is not unique; you should normalize indels and remove duplicates.
 
Indel representation is not unique; you should normalize indels and remove duplicates.
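
To illustrate why indel representation is not unique, here is a minimal Python sketch of the commonly described left-align-and-trim procedure, followed by duplicate removal. It is not vt's implementation: the function names `normalize` and `merge_duplicates`, the in-memory `genome` string (standing in for an indexed reference FASTA), and the toy sequence are all assumptions made for illustration.

```python
def normalize(pos, ref, alts, genome):
    """Left-align and trim one variant (illustrative sketch only).

    pos    -- 1-based position of the first base of `ref`
    ref    -- reference allele
    alts   -- alternate alleles, each assumed to differ from `ref`
    genome -- chromosome sequence as a plain string (hypothetical stand-in
              for a lookup in an indexed reference FASTA)
    """
    alleles = [ref] + list(alts)
    changed = True
    while changed:
        changed = False
        # Right-trim: drop the last base while every allele ends with the same base.
        if len({a[-1] for a in alleles}) == 1:
            alleles = [a[:-1] for a in alleles]
            changed = True
        # If an allele became empty, extend every allele one base to the left.
        if any(len(a) == 0 for a in alleles):
            if pos == 1:
                raise ValueError("cannot extend left of the first reference base")
            pos -= 1
            base = genome[pos - 1]
            alleles = [base + a for a in alleles]
            changed = True
    # Left-trim: drop shared leading bases, keeping at least one base per allele.
    while all(len(a) >= 2 for a in alleles) and len({a[0] for a in alleles}) == 1:
        alleles = [a[1:] for a in alleles]
        pos += 1
    return pos, alleles[0], tuple(alleles[1:])


def merge_duplicates(variants, genome):
    """Normalize each (pos, ref, alts) record and keep the first of each distinct result."""
    seen, unique = set(), []
    for pos, ref, alts in variants:
        key = normalize(pos, ref, alts, genome)
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


# Toy reference: a CA repeat flanked by G's (made-up sequence).
genome = "GGGCACACACAGGG"
# Three different ways of writing the same 2-bp deletion inside the repeat.
records = [(8, "CAC", ["C"]), (4, "CACA", ["CA"]), (2, "GGCA", ["GG"])]
print(merge_duplicates(records, genome))
```

In this toy example all three records describe the same deletion within the repeat, so they collapse to a single left-aligned, trimmed record after normalization.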
Line 240: Line 244:  
| 0
 
| 0
 
| 374
 
| 374
|  
+
| 0
|  
+
| 0
 
|-
 
|-
 
| Left aligned
 
| Left aligned
Line 301: Line 305:  
   Time elapsed: 0.13s
 
   Time elapsed: 0.13s
   −
The following will be slightly faster: + denotes use of an uncompressed BCF stream. 
+
  vt normalize  mills.genotypes.bcf -r ~/ref/vt/grch37/hs37d5.fa -o + | vt mergedups + -o mills.normalized.genotypes.bcf
   −
  vt normalize  mills.genotypes.bcf -r ~/ref/vt/grch37/hs37d5.fa -o + | vt mergedups + -o mills.normalized.genotypes.bcf
     −
Also remember to index this file and extract the sites.
+
UMICH's algorithm for normalization has been adopted by Petr Danecek in bcftools and is also used in GKNO.
    
==to document==
 
==to document==