From Genome Analysis Wiki
235 bytes added, 21:58, 15 June 2014
Line 189: |
Line 189: |
| | | |
| ==Normalization== | | ==Normalization== |
| + | |
| + | A slight digression: when analyzing indels, it is important to normalize them. While normalization is a simple concept, |
| + | it is hardly standardized. The call set here has already been normalized, but we feel this is an important |
| + | concept, so we discuss it briefly here. |
| | | |
| Indel representation is not unique; you should normalize indels and remove duplicates. | | Indel representation is not unique; you should normalize indels and remove duplicates. |
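To make the non-uniqueness concrete, here is a minimal Python sketch of left-aligning and trimming a biallelic indel. It follows the commonly described trim-and-extend procedure and is not taken from vt's source; the function name `normalize`, the `genome` string, and the toy sequence are all hypothetical and chosen only for illustration.

```python
def normalize(pos, ref, alt, genome):
    """Left-align and trim a biallelic variant.

    pos is 1-based; genome is the chromosome sequence as a plain string.
    This is an illustrative sketch, not vt's implementation.
    """
    if ref == alt:
        raise ValueError("REF and ALT are identical; not a variant")
    alleles = [ref, alt]
    changed = True
    while changed:
        changed = False
        # Drop the last base while both alleles end with the same base.
        if all(alleles) and len({a[-1] for a in alleles}) == 1:
            alleles = [a[:-1] for a in alleles]
            changed = True
        # If an allele became empty, extend both one base to the left.
        if any(len(a) == 0 for a in alleles):
            if pos == 1:
                raise ValueError("cannot extend left of the first base")
            pos -= 1
            alleles = [genome[pos - 1] + a for a in alleles]
            changed = True
    # Drop shared leading bases while both alleles keep at least one base.
    while all(len(a) >= 2 for a in alleles) and len({a[0] for a in alleles}) == 1:
        alleles = [a[1:] for a in alleles]
        pos += 1
    return pos, alleles[0], alleles[1]


# Toy reference (assumed): a CA repeat flanked by G's.
genome = "GGGCACACACAGGG"

# Four ways of writing the same deletion of one CA unit.
records = [(3, "GCA", "G"), (8, "CAC", "C"), (10, "CAG", "G"), (2, "GGCA", "GG")]
normalized = {normalize(p, r, a, genome) for p, r, a in records}
print(normalized)
```

All four input records describe the same deletion, so after normalization they collapse to a single record, which is why duplicate removal should follow normalization.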
Line 240: |
Line 244: |
| | 0 | | | 0 |
| | 374 | | | 374 |
− | | | + | | 0 |
− | | | + | | 0 |
| |- | | |- |
| | Left aligned | | | Left aligned |
Line 301: |
Line 305: |
| Time elapsed: 0.13s | | Time elapsed: 0.13s |
| | | |
− | The following will be slightly faster; + denotes the use of an uncompressed BCF stream.
| + | vt normalize mills.genotypes.bcf -r ~/ref/vt/grch37/hs37d5.fa -o + | vt mergedups + -o mills.normalized.genotypes.bcf |
| | | |
− | vt normalize mills.genotypes.bcf -r ~/ref/vt/grch37/hs37d5.fa -o + | vt mergedups + -o mills.normalized.genotypes.bcf
| |
| | | |
− | Also remember to index this file and extract the sites.
| + | UMICH's algorithm for normalization has been adopted by Petr Danecek in bcftools and is also used in GKNO. |
| | | |
| ==to document== | | ==to document== |