Regions of high linkage disequilibrium (LD)

From Genome Analysis Wiki

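There are regions of long-range, high linkage disequilibrium (LD) in the human genome [1][2]. These regions should be excluded when performing certain analyses, such as principal component analysis (PCA) on genotype data; otherwise the leading components may reflect local LD structure rather than genome-wide ancestry.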


Here is a list of positions for GRCh37 (build 37):

Chr Start Stop
1 48000000 52000000
2 86000000 100500000
2 134500000 138000000
2 183000000 190000000
3 47500000 50000000
3 83500000 87000000
3 89000000 97500000
5 44500000 50500000
5 98000000 100500000
5 129000000 132000000
5 135500000 138500000
6 25000000 35000000
6 57000000 64000000
6 140000000 142500000
7 55000000 66000000
8 7000000 13000000
8 43000000 50000000
8 112000000 115000000
10 37000000 43000000
11 46000000 57000000
11 87500000 90500000
12 33000000 40000000
12 109500000 112000000
20 32000000 34500000
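
PLINK range files expect a fourth column holding a label for each region, which the build 37 table above does not have. The following is a minimal sketch of one way to add it, assuming the three-column rows have been saved without the header line in a file called regions-b37.txt (a hypothetical name). Only the first three rows are written here for brevity, and the hildN labels are an illustrative choice, not part of the source table.

```shell
# Hypothetical input: the first three build 37 rows from the table above, no header.
cat > regions-b37.txt <<'EOF'
1 48000000 52000000
2 86000000 100500000
2 134500000 138000000
EOF

# Append a label column (hild1, hild2, ...) so each line reads: chr start stop label.
awk '{ print $1, $2, $3, "hild" NR }' regions-b37.txt > high-ld.txt
```

In practice regions-b37.txt would hold all rows of the table for the genome build that matches your genotype data.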


These positions are for NCBI build 36 (hg18):

Chr Start Stop ID
1 48060567 52060567 hild1
2 85941853 100407914 hild2
2 134382738 137882738 hild3
2 182882739 189882739 hild4
3 47500000 50000000 hild5
3 83500000 87000000 hild6
3 89000000 97500000 hild7
5 44500000 50500000 hild8
5 98000000 100500000 hild9
5 129000000 132000000 hild10
5 135500000 138500000 hild11
6 25500000 33500000 hild12
6 57000000 64000000 hild13
6 140000000 142500000 hild14
7 55193285 66193285 hild15
8 8000000 12000000 hild16
8 43000000 50000000 hild17
8 112000000 115000000 hild18
10 37000000 43000000 hild19
11 46000000 57000000 hild20
11 87500000 90500000 hild21
12 33000000 40000000 hild22
12 109521663 112021663 hild23
20 32000000 34500000 hild24
X 14150264 16650264 hild25
X 25650264 28650264 hild26
X 33150264 35650264 hild27
X 55133704 60500000 hild28
X 65133704 67633704 hild29
X 71633704 77580511 hild30
X 80080511 86080511 hild31
X 100580511 103080511 hild32
X 125602146 128102146 hild33
X 129102146 131602146 hild34

Excluding Regions With PLINK

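You can remove these regions from a PED file using the following PLINK commands, assuming the regions are stored in a file named "high-ld.txt" with four columns per line (chromosome, start, stop, ID). The first command writes the markers that fall inside each region to "hild.set"; the second excludes those markers and writes the trimmed data set.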

  plink --file mydata --make-set high-ld.txt --write-set --out hild
  plink --file mydata --exclude hild.set --recode --out mydatatrimmed
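
As a sanity check, you may want to confirm that no markers inside the regions remain after trimming. The sketch below is one way to do that with awk, not a PLINK feature. It assumes a region file in the four-column layout described above and a MAP file with the usual four columns (chromosome, marker ID, genetic distance, base-pair position). The function name count_in_regions is hypothetical.

```shell
# count_in_regions REGIONS MAP
# Print how many markers in MAP (chr, id, cM, bp) fall inside any range
# listed in REGIONS (chr, start, stop, label). Range ends are inclusive.
count_in_regions() {
  awk 'NR == FNR { chr[NR] = $1; lo[NR] = $2; hi[NR] = $3; n = NR; next }
       {
         for (i = 1; i <= n; i++)
           if ($1 == chr[i] && $4 >= lo[i] && $4 <= hi[i]) { hits++; break }
       }
       END { print hits + 0 }' "$1" "$2"
}

# Example call (file names follow the commands above):
#   count_in_regions high-ld.txt mydatatrimmed.map
```

A count of 0 for mydatatrimmed.map suggests the exclusion worked. Running the same function on mydata.map shows how many markers were in the regions to begin with.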

References

  1. Price et al. (2008) Long-Range LD Can Confound Genome Scans in Admixed Populations. Am. J. Hum. Genet. 83, 132-135
  2. Weale M. (2010) Quality Control for Genome-Wide Association Studies. In: Michael R. Barnes and Gerome Breen (eds.), Genetic Variation: Methods and Protocols, Methods in Molecular Biology, vol. 628. Springer. DOI 10.1007/978-1-60327-367-1_19