Difference between revisions of "Regions of high linkage disequilibrium (LD)"
From Genome Analysis Wiki
Revision as of 10:39, 25 January 2012
There are regions of long-range, high linkage disequilibrium in the human genome [1][2]. These regions should be excluded when performing certain analyses, such as principal component analysis, on genotype data.
Chr | Start | Stop | ID |
---|---|---|---|
1 | 48060567 | 52060567 | hild1 |
2 | 85941853 | 100407914 | hild2 |
2 | 134382738 | 137882738 | hild3 |
2 | 182882739 | 189882739 | hild4 |
3 | 47500000 | 50000000 | hild5 |
3 | 83500000 | 87000000 | hild6 |
3 | 89000000 | 97500000 | hild7 |
5 | 44500000 | 50500000 | hild8 |
5 | 98000000 | 100500000 | hild9 |
5 | 129000000 | 132000000 | hild10 |
5 | 135500000 | 138500000 | hild11 |
6 | 25500000 | 33500000 | hild12 |
6 | 57000000 | 64000000 | hild13 |
6 | 140000000 | 142500000 | hild14 |
7 | 55193285 | 66193285 | hild15 |
8 | 8000000 | 12000000 | hild16 |
8 | 43000000 | 50000000 | hild17 |
8 | 112000000 | 115000000 | hild18 |
10 | 37000000 | 43000000 | hild19 |
11 | 46000000 | 57000000 | hild20 |
11 | 87500000 | 90500000 | hild21 |
12 | 33000000 | 40000000 | hild22 |
12 | 109521663 | 112021663 | hild23 |
20 | 32000000 | 34500000 | hild24 |
23 | 14150264 | 16650264 | hild25 |
23 | 25650264 | 28650264 | hild26 |
23 | 33150264 | 35650264 | hild27 |
23 | 55133704 | 60500000 | hild28 |
23 | 65133704 | 67633704 | hild29 |
23 | 71633704 | 77580511 | hild30 |
23 | 80080511 | 86080511 | hild31 |
23 | 100580511 | 103080511 | hild32 |
23 | 125602146 | 128102146 | hild33 |
23 | 129102146 | 131602146 | hild34 |
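Excluding Regions With PLINK
Assuming the regions listed above are stored in a file named "high-ld.txt" (one region per line: chromosome, start, stop, and ID, without the header row), you can remove them from a PED file using the following PLINK commands. The first command writes the SNPs that fall in each region to hild.set; the second excludes those SNPs and writes the trimmed data set.
 plink --file mydata --make-set high-ld.txt --write-set --out hild
 plink --file mydata --exclude hild.set --recode --out mydatatrimmed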
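To make the region lookup concrete, here is a minimal Python sketch of the same idea: given rows in the "chromosome start stop ID" layout and the lines of a PLINK .map file, it lists the SNP IDs that fall inside a region. It is an illustration, not a replacement for the PLINK commands. The function names and the sample SNP lines are hypothetical, only three rows of the table are included for brevity, and treating the region boundaries as inclusive is an assumption.

```python
# Illustrative sketch only: the three region rows are copied from the table,
# but the function names and the sample .map lines are hypothetical.

HIGH_LD_TEXT = """\
1 48060567 52060567 hild1
6 25500000 33500000 hild12
8 8000000 12000000 hild16
"""


def parse_regions(text):
    """Parse 'chr start stop id' rows into a dict: chr -> [(start, stop, id)]."""
    regions = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        chrom, start, stop, name = fields
        regions.setdefault(chrom, []).append((int(start), int(stop), name))
    return regions


def find_region(chrom, pos, regions):
    """Return the ID of the region containing (chrom, pos), or None.

    Boundaries are treated as inclusive (an assumption of this sketch).
    """
    for start, stop, name in regions.get(str(chrom), []):
        if start <= pos <= stop:
            return name
    return None


def snps_to_exclude(map_lines, regions):
    """Return IDs of SNPs in .map lines (chr, SNP ID, cM, bp) inside any region."""
    excluded = []
    for line in map_lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # not a valid .map line
        chrom, snp_id, _cm, bp = fields[:4]
        if find_region(chrom, int(bp), regions) is not None:
            excluded.append(snp_id)
    return excluded


REGIONS = parse_regions(HIGH_LD_TEXT)
SAMPLE_MAP = [
    "1 snpA 0 1000000",   # outside hild1
    "6 snpB 0 30000000",  # inside hild12
    "8 snpC 0 12000000",  # on the upper boundary of hild16
    "8 snpD 0 12000001",  # one base past hild16
]
EXCLUDED = snps_to_exclude(SAMPLE_MAP, REGIONS)
print(EXCLUDED)
```

Writing the returned IDs one per line would give a plain SNP list of the kind that --exclude accepts.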
References
1. Price et al. (2008) Long-Range LD Can Confound Genome Scans in Admixed Populations. Am. J. Hum. Genet. 83, 132-135
2. Weale M. (2010) Quality Control for Genome-Wide Association Studies. In: Michael R. Barnes and Gerome Breen (eds.), Genetic Variation: Methods and Protocols, Methods in Molecular Biology, vol. 628. Springer Science+Business Media, LLC. DOI 10.1007/978-1-60327-367-1_19