Changes

From Genome Analysis Wiki
== Convert Genotype Data to Build 37 ==
 
Current releases of 1000 Genomes Project data use NCBI genome build 37 (hg19). Before you start imputation, make sure that all your genotypes are reported using build 37 coordinates and on the forward strand of the reference genome.
The online [http://genome.ucsc.edu/cgi-bin/hgLiftOver LiftOver tool] can convert data from earlier genome builds to build 37. This tool re-maps coordinates only; it does not update SNP identifiers. Before using the tool, you may have to look up a [ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606/database/organism_data/RsMergeArch.bcp.gz dbSNP merge table] ([http://www.ncbi.nlm.nih.gov/SNP/snp_db_table_description.cgi?t=RsMergeArch table description on the NCBI website]) to account for any changes in SNP rs# between builds.
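
The rs# update described above can be sketched in a few lines of Python. This is only an illustration, not part of LiftOver or dbSNP tooling: the function names are hypothetical, and the code assumes the merge table is tab-delimited with the retired rs number in the first column and the current rs number in the seventh column. Check the table description linked above before relying on that layout.

```python
def parse_merge_table(lines, old_col=0, current_col=6):
    """Build {retired rsID: current rsID} from tab-delimited merge-table rows.

    ASSUMPTION: old_col holds the retired rs number and current_col the
    current one, both stored without the "rs" prefix.
    """
    merged = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= max(old_col, current_col):
            continue  # skip short or malformed rows
        merged["rs" + fields[old_col]] = "rs" + fields[current_col]
    return merged


def update_rsid(rsid, merged):
    """Follow merge records until the ID no longer appears as retired."""
    seen = set()
    while rsid in merged and rsid not in seen:
        seen.add(rsid)  # guard against accidental cycles in the table
        rsid = merged[rsid]
    return rsid
```

To read the compressed table directly, the rows could be supplied with <code>gzip.open(path, "rt")</code> from the standard library. IDs that are absent from the table are returned unchanged.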
It is normal for a few SNPs to fail LiftOver. Some of these fail because they cannot be mapped unambiguously by NCBI (for example rs1131012); these should be dropped from imputation. Sometimes, a few of these failed SNPs can be rescued by manually looking up their coordinates but, because the number of affected SNPs is typically very small, this manual rescue step is not recommended.
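
One possible way to prepare markers for LiftOver and to drop the ones that fail is sketched below. It assumes the genotypes come with a PLINK-style .map file (chromosome, SNP ID, genetic distance, base-pair position) and that LiftOver is given, and returns, 4-column BED rows. The function names are hypothetical, and the chromosome handling (simply prefixing "chr") is an assumption that only suits labels such as 1 to 22, X and Y.

```python
def map_to_bed(map_lines):
    """Turn PLINK .map rows (chrom, id, cM, bp) into BED rows for LiftOver."""
    bed = []
    for line in map_lines:
        if not line.strip():
            continue
        chrom, snp, _cm, pos = line.split()
        pos = int(pos)
        # BED starts are 0-based and ends are exclusive, so a SNP at
        # 1-based position p covers the interval [p - 1, p).
        bed.append(f"chr{chrom}\t{pos - 1}\t{pos}\t{snp}")
    return bed


def read_lifted(bed_lines):
    """Return {snp: (chrom, 1-based position)} from the converted BED rows."""
    lifted = {}
    for line in bed_lines:
        if not line.strip() or line.startswith("#"):
            continue
        chrom, _start, end, snp = line.split()[:4]
        if chrom.startswith("chr"):
            chrom = chrom[3:]
        lifted[snp] = (chrom, int(end))
    return lifted


def split_by_lift(snp_ids, lifted):
    """Separate SNPs that received new coordinates from those that did not."""
    kept = [s for s in snp_ids if s in lifted]
    dropped = [s for s in snp_ids if s not in lifted]
    return kept, dropped
```

The <code>dropped</code> list is the set of SNPs to exclude from imputation; the <code>kept</code> SNPs take their build 37 positions from <code>lifted</code>.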
After LiftOver, it is important to ensure that all genotypes are reported on the forward strand. This strand flipping can be facilitated by tools such as [http://pngu.mgh.harvard.edu/~purcell/plink/ PLINK] and [http://www.well.ox.ac.uk/~cfreeman/software/gwas/gtool.html GTOOL].
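
To make the idea of strand flipping concrete, the sketch below compares the alleles observed for a SNP with its forward-strand alleles and decides whether the genotypes can stay as they are or need to be complemented. It is not how PLINK or GTOOL work internally; the function names and category labels are hypothetical, and alleles are assumed to be coded as A, C, G or T only (no missing-allele codes).

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def flip(alleles):
    """Return the complementary alleles, i.e. the opposite-strand reading."""
    return tuple(COMPLEMENT[a] for a in alleles)


def strand_action(observed, reference):
    """Classify a SNP given observed alleles and forward-strand alleles."""
    ref = set(reference)
    if ref == set(flip(reference)):
        # A/T and C/G SNPs look identical on both strands, so the
        # alleles alone cannot reveal which strand was genotyped.
        return "ambiguous"
    if set(observed) <= ref:
        return "keep"
    if set(flip(observed)) <= ref:
        return "flip"
    return "mismatch"


print(strand_action(("T", "C"), ("A", "G")))  # prints "flip"
```

SNPs classified as "flip" could then be written to a list and passed to a tool such as PLINK, which accepts a file of SNP IDs through its <code>--flip</code> option. SNPs classified as "ambiguous" or "mismatch" need separate checking, for example against allele frequencies.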
