Regions of high linkage disequilibrium (LD)
From Genome Analysis Wiki
There are regions of long-range, high linkage disequilibrium in the human genome [1][2]. These regions should be excluded when performing certain analyses, such as principal component analysis, on genotype data.
Here is a list of positions for GRCh Build 38. These positions are provided by the plinkQC R package and originate from Anderson et al. (2010) [3].
Chr | Start | Stop |
---|---|---|
chr1 | 47761740 | 51761740 |
chr1 | 125169943 | 125170022 |
chr1 | 144106678 | 144106709 |
chr1 | 181955019 | 181955047 |
chr2 | 85919365 | 100517106 |
chr2 | 87416141 | 87416186 |
chr2 | 87417804 | 87417863 |
chr2 | 87418924 | 87418981 |
chr2 | 89917298 | 89917322 |
chr2 | 135275091 | 135275210 |
chr2 | 182427027 | 189427029 |
chr2 | 207609786 | 207609808 |
chr3 | 47483505 | 49987563 |
chr3 | 83368158 | 86868160 |
chr5 | 44464140 | 51168409 |
chr5 | 129636407 | 132636409 |
chr6 | 25391792 | 33424245 |
chr6 | 26726947 | 26726981 |
chr6 | 57788603 | 58453888 |
chr6 | 61109122 | 61357029 |
chr6 | 61424410 | 61424451 |
chr6 | 139637169 | 142137170 |
chr7 | 54964812 | 66897578 |
chr7 | 62182500 | 62277073 |
chr8 | 8105067 | 12105082 |
chr8 | 43025699 | 48924888 |
chr8 | 47303500 | 47317337 |
chr8 | 110918594 | 113918595 |
chr9 | 40365644 | 40365693 |
chr9 | 64198500 | 64200392 |
chr9 | 88958735 | 88959017 |
chr10 | 36671065 | 43184546 |
chr10 | 41693521 | 41885273 |
chr11 | 88127183 | 91127184 |
chr12 | 32955798 | 41319931 |
chr12 | 34639034 | 34639084 |
chr14 | 87391719 | 87391996 |
chr14 | 94658026 | 94658080 |
chr17 | 43159541 | 43159574 |
chr20 | 4031884 | 4032441 |
chr20 | 33948532 | 36438183 |
chr22 | 30060084 | 30060162 |
chr22 | 42980497 | 42980522 |
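Several rows in the Build 38 table are nested inside larger rows on the same chromosome (for example, chr2 87416141-87416186 lies within chr2 85919365-100517106). If a downstream tool expects non-overlapping intervals, the list could be collapsed first. The following is a minimal Python sketch, not part of plinkQC or PLINK; the function name `merge_regions` is hypothetical, and coordinates are assumed to be inclusive.

```python
from collections import defaultdict


def merge_regions(regions):
    """Merge overlapping (chrom, start, stop) intervals per chromosome.

    Assumes inclusive coordinates; this helper is illustrative only.
    """
    by_chrom = defaultdict(list)
    for chrom, start, stop in regions:
        by_chrom[chrom].append((start, stop))

    merged = []
    for chrom, spans in by_chrom.items():
        spans.sort()
        cur_start, cur_stop = spans[0]
        for start, stop in spans[1:]:
            if start <= cur_stop:
                # The next span begins inside the current one: extend it.
                cur_stop = max(cur_stop, stop)
            else:
                merged.append((chrom, cur_start, cur_stop))
                cur_start, cur_stop = start, stop
        merged.append((chrom, cur_start, cur_stop))
    return merged


# A few rows copied from the Build 38 table.
example = [
    ("chr2", 85919365, 100517106),
    ("chr2", 87416141, 87416186),
    ("chr2", 89917298, 89917322),
    ("chr2", 135275091, 135275210),
]
print(merge_regions(example))
```

With these four rows, the two short chr2 intervals are absorbed into the large one, leaving two intervals.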
Here is a list of positions for GRCh Build 37.
Chr | Start | Stop |
---|---|---|
1 | 48000000 | 52000000 |
2 | 86000000 | 100500000 |
2 | 134500000 | 138000000 |
2 | 183000000 | 190000000 |
3 | 47500000 | 50000000 |
3 | 83500000 | 87000000 |
3 | 89000000 | 97500000 |
5 | 44500000 | 50500000 |
5 | 98000000 | 100500000 |
5 | 129000000 | 132000000 |
5 | 135500000 | 138500000 |
6 | 25000000 | 35000000 |
6 | 57000000 | 64000000 |
6 | 140000000 | 142500000 |
7 | 55000000 | 66000000 |
8 | 7000000 | 13000000 |
8 | 43000000 | 50000000 |
8 | 112000000 | 115000000 |
10 | 37000000 | 43000000 |
11 | 46000000 | 57000000 |
11 | 87500000 | 90500000 |
12 | 33000000 | 40000000 |
12 | 109500000 | 112000000 |
20 | 32000000 | 34500000 |
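When variants are held in memory rather than in PLINK files, the same exclusion can be applied by testing each variant's position against the table. The following Python sketch is one possible approach under stated assumptions: the helper names and the variant IDs are hypothetical, region bounds are treated as inclusive, and a leading "chr" is stripped so that both chromosome naming styles used on this page compare equal.

```python
def normalize_chrom(chrom):
    """Return the chromosome label as a string without a leading 'chr'."""
    chrom = str(chrom)
    return chrom[3:] if chrom.startswith("chr") else chrom


def in_high_ld(chrom, pos, regions):
    """Return True if (chrom, pos) falls inside any region (inclusive bounds)."""
    chrom = normalize_chrom(chrom)
    return any(
        normalize_chrom(region_chrom) == chrom and start <= pos <= stop
        for region_chrom, start, stop in regions
    )


# Two rows copied from the Build 37 table.
BUILD37_SUBSET = [
    ("6", 25000000, 35000000),
    ("8", 7000000, 13000000),
]

# Hypothetical variants as (id, chromosome, position) on Build 37.
variants = [
    ("var_a", "6", 30000000),
    ("var_b", "chr8", 20000000),
    ("var_c", 1, 500),
]

kept = [vid for vid, chrom, pos in variants if not in_high_ld(chrom, pos, BUILD37_SUBSET)]
print(kept)
```

The positions must be on the same genome build as the region table being used.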
These positions are for GRCh Build 36.
Chr | Start | Stop | ID |
---|---|---|---|
1 | 48060567 | 52060567 | hild1 |
2 | 85941853 | 100407914 | hild2 |
2 | 134382738 | 137882738 | hild3 |
2 | 182882739 | 189882739 | hild4 |
3 | 47500000 | 50000000 | hild5 |
3 | 83500000 | 87000000 | hild6 |
3 | 89000000 | 97500000 | hild7 |
5 | 44500000 | 50500000 | hild8 |
5 | 98000000 | 100500000 | hild9 |
5 | 129000000 | 132000000 | hild10 |
5 | 135500000 | 138500000 | hild11 |
6 | 25500000 | 33500000 | hild12 |
6 | 57000000 | 64000000 | hild13 |
6 | 140000000 | 142500000 | hild14 |
7 | 55193285 | 66193285 | hild15 |
8 | 8000000 | 12000000 | hild16 |
8 | 43000000 | 50000000 | hild17 |
8 | 112000000 | 115000000 | hild18 |
10 | 37000000 | 43000000 | hild19 |
11 | 46000000 | 57000000 | hild20 |
11 | 87500000 | 90500000 | hild21 |
12 | 33000000 | 40000000 | hild22 |
12 | 109521663 | 112021663 | hild23 |
20 | 32000000 | 34500000 | hild24 |
X | 14150264 | 16650264 | hild25 |
X | 25650264 | 28650264 | hild26 |
X | 33150264 | 35650264 | hild27 |
X | 55133704 | 60500000 | hild28 |
X | 65133704 | 67633704 | hild29 |
X | 71633704 | 77580511 | hild30 |
X | 80080511 | 86080511 | hild31 |
X | 100580511 | 103080511 | hild32 |
X | 125602146 | 128102146 | hild33 |
X | 129102146 | 131602146 | hild34 |
Excluding Regions With PLINK
You can remove these regions from a PED file using the following PLINK commands, assuming the regions are stored in a file named "high-ld.txt":
plink --file mydata --make-set high-ld.txt --write-set --out hild
plink --file mydata --exclude hild.set --recode --out mydatatrimmed
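The "high-ld.txt" file used by `--make-set` is, as far as the PLINK documentation describes it, a whitespace-delimited text file with one region per row: chromosome, start, stop, and a set name. The Build 36 table already carries an ID column; for the other builds an ID would have to be supplied. The following Python sketch formats rows in that layout. The function name `format_set_file` is hypothetical, and the four-column layout should be checked against the PLINK version in use.

```python
def format_set_file(regions):
    """Format (chrom, start, stop, set_id) rows as whitespace-delimited text.

    The four-column layout is an assumption about what --make-set expects.
    """
    return "".join(
        f"{chrom} {start} {stop} {set_id}\n"
        for chrom, start, stop, set_id in regions
    )


# Two rows copied from the Build 36 table.
BUILD36_SUBSET = [
    ("1", 48060567, 52060567, "hild1"),
    ("2", 85941853, 100407914, "hild2"),
]

print(format_set_file(BUILD36_SUBSET), end="")
```

Redirecting the printed output to a file named "high-ld.txt" gives an input for the first command. The chromosome labels should match those in the genotype data, and the coordinates should match its genome build.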
References
1. Price et al. (2008) Long-Range LD Can Confound Genome Scans in Admixed Populations. Am. J. Hum. Genet. 83, 132-135.
2. Weale M. (2010) Quality Control for Genome-Wide Association Studies. In: Michael R. Barnes and Gerome Breen (eds.), Genetic Variation: Methods and Protocols, Methods in Molecular Biology, vol. 628. Springer Science+Business Media, LLC. DOI 10.1007/978-1-60327-367-1_19
3. Anderson, Carl A., et al. (2010) Data quality control in genetic case-control association studies. Nature Protocols 5(9), 1564-1573.