Regions of high linkage disequilibrium (LD)
From Genome Analysis Wiki
Latest revision as of 22:38, 10 October 2021
There are regions of long-range, high linkage disequilibrium (LD) in the human genome [1][2]. These regions should be excluded when performing certain analyses on genotype data, such as principal component analysis (PCA), because long-range LD can confound the results [1].
Here is a list of positions for GRCh38 (build 38). These positions are distributed with the plinkQC R package (https://github.com/cran/plinkQC/blob/master/inst/extdata/high-LD-regions-hg38-GRCh38.bed) and derive from Anderson et al. (2010) [3].
Chr | Start | Stop |
---|---|---|
chr1 | 47761740 | 51761740 |
chr1 | 125169943 | 125170022 |
chr1 | 144106678 | 144106709 |
chr1 | 181955019 | 181955047 |
chr2 | 85919365 | 100517106 |
chr2 | 87416141 | 87416186 |
chr2 | 87417804 | 87417863 |
chr2 | 87418924 | 87418981 |
chr2 | 89917298 | 89917322 |
chr2 | 135275091 | 135275210 |
chr2 | 182427027 | 189427029 |
chr2 | 207609786 | 207609808 |
chr3 | 47483505 | 49987563 |
chr3 | 83368158 | 86868160 |
chr5 | 44464140 | 51168409 |
chr5 | 129636407 | 132636409 |
chr6 | 25391792 | 33424245 |
chr6 | 26726947 | 26726981 |
chr6 | 57788603 | 58453888 |
chr6 | 61109122 | 61357029 |
chr6 | 61424410 | 61424451 |
chr6 | 139637169 | 142137170 |
chr7 | 54964812 | 66897578 |
chr7 | 62182500 | 62277073 |
chr8 | 8105067 | 12105082 |
chr8 | 43025699 | 48924888 |
chr8 | 47303500 | 47317337 |
chr8 | 110918594 | 113918595 |
chr9 | 40365644 | 40365693 |
chr9 | 64198500 | 64200392 |
chr9 | 88958735 | 88959017 |
chr10 | 36671065 | 43184546 |
chr10 | 41693521 | 41885273 |
chr11 | 88127183 | 91127184 |
chr12 | 32955798 | 41319931 |
chr12 | 34639034 | 34639084 |
chr14 | 87391719 | 87391996 |
chr14 | 94658026 | 94658080 |
chr17 | 43159541 | 43159574 |
chr20 | 4031884 | 4032441 |
chr20 | 33948532 | 36438183 |
chr22 | 30060084 | 30060162 |
chr22 | 42980497 | 42980522 |
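Several of the short build 38 entries lie inside longer ones (for example, chr2 87416141-87416186 falls within chr2 85919365-100517106). The following is a minimal Python sketch, assuming the table has been loaded as (chromosome, start, stop) tuples, of how such nested or overlapping intervals could be merged before use. The function name `merge_regions` is illustrative and is not part of plinkQC or PLINK.

```python
from collections import defaultdict


def merge_regions(regions):
    """Merge overlapping or nested (chrom, start, stop) intervals per chromosome."""
    by_chrom = defaultdict(list)
    for chrom, start, stop in regions:
        by_chrom[chrom].append((start, stop))

    merged = []
    # Chromosomes are emitted in the order they first appear in the input.
    for chrom, spans in by_chrom.items():
        spans.sort()
        cur_start, cur_stop = spans[0]
        for start, stop in spans[1:]:
            if start <= cur_stop:
                # Overlapping or nested interval: extend the current one.
                cur_stop = max(cur_stop, stop)
            else:
                merged.append((chrom, cur_start, cur_stop))
                cur_start, cur_stop = start, stop
        merged.append((chrom, cur_start, cur_stop))
    return merged


# A few chr2 rows copied from the build 38 table.
chr2 = [
    ("chr2", 85919365, 100517106),
    ("chr2", 87416141, 87416186),
    ("chr2", 89917298, 89917322),
    ("chr2", 135275091, 135275210),
]
print(merge_regions(chr2))
```

With these four rows, the two short entries are absorbed by the first interval and two regions remain.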
Here is a list of positions for GRCh37 (build 37).
Chr | Start | Stop |
---|---|---|
1 | 48000000 | 52000000 |
2 | 86000000 | 100500000 |
2 | 134500000 | 138000000 |
2 | 183000000 | 190000000 |
3 | 47500000 | 50000000 |
3 | 83500000 | 87000000 |
3 | 89000000 | 97500000 |
5 | 44500000 | 50500000 |
5 | 98000000 | 100500000 |
5 | 129000000 | 132000000 |
5 | 135500000 | 138500000 |
6 | 25000000 | 35000000 |
6 | 57000000 | 64000000 |
6 | 140000000 | 142500000 |
7 | 55000000 | 66000000 |
8 | 7000000 | 13000000 |
8 | 43000000 | 50000000 |
8 | 112000000 | 115000000 |
10 | 37000000 | 43000000 |
11 | 46000000 | 57000000 |
11 | 87500000 | 90500000 |
12 | 33000000 | 40000000 |
12 | 109500000 | 112000000 |
20 | 32000000 | 34500000 |
These positions are for NCBI build 36 (hg18).
Chr | Start | Stop | ID |
---|---|---|---|
1 | 48060567 | 52060567 | hild1 |
2 | 85941853 | 100407914 | hild2 |
2 | 134382738 | 137882738 | hild3 |
2 | 182882739 | 189882739 | hild4 |
3 | 47500000 | 50000000 | hild5 |
3 | 83500000 | 87000000 | hild6 |
3 | 89000000 | 97500000 | hild7 |
5 | 44500000 | 50500000 | hild8 |
5 | 98000000 | 100500000 | hild9 |
5 | 129000000 | 132000000 | hild10 |
5 | 135500000 | 138500000 | hild11 |
6 | 25500000 | 33500000 | hild12 |
6 | 57000000 | 64000000 | hild13 |
6 | 140000000 | 142500000 | hild14 |
7 | 55193285 | 66193285 | hild15 |
8 | 8000000 | 12000000 | hild16 |
8 | 43000000 | 50000000 | hild17 |
8 | 112000000 | 115000000 | hild18 |
10 | 37000000 | 43000000 | hild19 |
11 | 46000000 | 57000000 | hild20 |
11 | 87500000 | 90500000 | hild21 |
12 | 33000000 | 40000000 | hild22 |
12 | 109521663 | 112021663 | hild23 |
20 | 32000000 | 34500000 | hild24 |
X | 14150264 | 16650264 | hild25 |
X | 25650264 | 28650264 | hild26 |
X | 33150264 | 35650264 | hild27 |
X | 55133704 | 60500000 | hild28 |
X | 65133704 | 67633704 | hild29 |
X | 71633704 | 77580511 | hild30 |
X | 80080511 | 86080511 | hild31 |
X | 100580511 | 103080511 | hild32 |
X | 125602146 | 128102146 | hild33 |
X | 129102146 | 131602146 | hild34 |
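To illustrate how a region list like this is applied, here is a minimal Python sketch that checks whether a variant position falls inside any listed region. It assumes the rows are held as (chromosome, start, stop, ID) tuples and treats both bounds as inclusive, which is an assumption of this sketch; the name `in_high_ld` is hypothetical.

```python
def in_high_ld(chrom, pos, regions):
    """Return the ID of the first region containing (chrom, pos), else None.

    Start and stop are treated as inclusive (an assumption of this sketch).
    """
    for r_chrom, start, stop, region_id in regions:
        if r_chrom == chrom and start <= pos <= stop:
            return region_id
    return None


# Three rows copied from the build 36 table.
build36 = [
    ("6", 25500000, 33500000, "hild12"),
    ("8", 8000000, 12000000, "hild16"),
    ("X", 14150264, 16650264, "hild25"),
]

print(in_high_ld("6", 30000000, build36))  # inside hild12
print(in_high_ld("6", 50000000, build36))  # outside every listed region
```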
Excluding Regions With PLINK
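You can remove these regions from a PED file using the following PLINK commands, assuming the region list is stored in a file named "high-ld.txt". The first command writes the SNPs that fall inside the regions to hild.set; the second excludes those SNPs and writes the trimmed data set.
plink --file mydata --make-set high-ld.txt --write-set --out hild
plink --file mydata --exclude hild.set --recode --out mydatatrimmed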
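`--make-set` reads a whitespace-delimited file with four columns: chromosome, start, stop and set name. The build 37 and build 38 tables above have no ID column, so one has to be added before they can be used as "high-ld.txt". A minimal Python sketch, assuming the regions are held as (chromosome, start, stop) tuples; the function name `format_set_file` is hypothetical and the `hild` prefix simply mirrors the IDs in the build 36 table.

```python
def format_set_file(regions, prefix="hild"):
    """Return set-file text: chromosome, start, stop, set name on each line."""
    lines = []
    for i, (chrom, start, stop) in enumerate(regions, start=1):
        # Number the sets hild1, hild2, ... in input order.
        lines.append(f"{chrom}\t{start}\t{stop}\t{prefix}{i}")
    return "\n".join(lines) + "\n"


# The first three rows of the build 37 table.
build37 = [
    ("1", 48000000, 52000000),
    ("2", 86000000, 100500000),
    ("2", 134500000, 138000000),
]

# Write the file name used by the PLINK commands above.
with open("high-ld.txt", "w") as handle:
    handle.write(format_set_file(build37))
```

The chromosome codes in this file need to match the naming used in the data set's MAP file. PLINK 1.9 also documents a `range` modifier for `--exclude` that takes a file in this format directly; check the documentation for your PLINK version before relying on it.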
References
- [1] Price A.L. et al. (2008) Long-Range LD Can Confound Genome Scans in Admixed Populations. Am. J. Hum. Genet. 83, 132-135.
- [2] Weale M. (2010) Quality Control for Genome-Wide Association Studies. In: Michael R. Barnes and Gerome Breen (eds.), Genetic Variation: Methods and Protocols, Methods in Molecular Biology, vol. 628. Springer. DOI 10.1007/978-1-60327-367-1_19.
- [3] Anderson C.A. et al. (2010) Data quality control in genetic case-control association studies. Nature Protocols 5(9), 1564-1573.