LiftOver is a necessary step to bring all genetic analyses to the same reference build.
In particular, our current data are mainly in either NCBI build 36 (UCSC hg18) or NCBI build 37 (UCSC hg19). Although lift over can go from a higher build to a lower build, we always recommend lifting from the lower build to the higher/current build.
LiftOver is not hard. The easiest way is to use the UCSC liftOver tool to lift a [http://genome.ucsc.edu/FAQ/FAQformat.html#format1 BED format] file to another BED format file. With additional steps, we can also lift Merlin and PLINK data files.
Besides lifting genomic positions, this page also covers lifting SNP rs numbers.
== Lift over using BED files ==
=== Binary liftOver tool ===
Download the [http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/liftOver liftOver binary] from UCSC and the [http://hgdownload.cse.ucsc.edu/goldenPath/hg18/liftOver/hg18ToHg19.over.chain.gz hg18 to hg19 chain file].
Provide a BED format file (input.bed).
NOTE: Use the 'chr' prefix before each chromosome name.
liftOver input.bed hg18ToHg19.over.chain.gz output.bed unlifted.bed
The unlifted file (unlifted.bed) will contain all genomic positions that cannot be lifted. The reasons vary; see [[#Various reasons that lift over could fail | Various reasons that lift over could fail]].
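To see why positions ended up in the unlifted file, it can help to tally the failure reasons. The following is a minimal Python sketch, assuming the unlifted file interleaves a reason line starting with '#' before each failed record (as liftOver output typically does). The function name <code>parse_unlifted</code> and the example records are made up for illustration.

```python
from collections import Counter


def parse_unlifted(lines):
    """Pair each unlifted BED record with the '#' reason line preceding it."""
    records = []
    reason = None
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("#"):
            # Remember the most recent reason line, e.g. "#Deleted in new".
            reason = line.lstrip("#").strip()
        else:
            records.append((reason, line.split()))
    return records


# Made-up example content; in practice, pass open("unlifted.bed") instead.
example = [
    "#Deleted in new\n",
    "chr1\t1000\t1001\trs0001\n",
    "#Partially deleted in new\n",
    "chr2\t5000\t5100\tregionA\n",
]
parsed = parse_unlifted(example)
reason_counts = Counter(reason for reason, _ in parsed)
print(reason_counts)
```

The resulting counts give a quick overview before looking at individual positions.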
=== Web interface ===
be lifted if you click "Explain failure messages"
== Lift Merlin format ==
PLINK format and [http://www.sph.umich.edu/csg/abecasis/Merlin/tour/input_files.html Merlin format] are nearly identical.
to obtain Merlin .map file.
== Lift PLINK format ==
[http://pngu.mgh.harvard.edu/~purcell/plink/data.shtml PLINK] format usually refers to .ped and .map files.
We recommend splitting the job into several steps:
(1) convert .map to .bed file
By rearranging the columns of the .map file, we obtain a standard BED format file.
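The column rearrangement in step (1) can be sketched in Python as follows. This is an illustration under stated assumptions, not a definitive converter: it assumes a 4-column .map file (chromosome, marker ID, genetic distance, base-pair position), treats each marker as a single-base SNP, and maps PLINK chromosome codes 23/24/25/26 to X/Y/X/M. The function names and example markers are hypothetical.

```python
# PLINK codes for non-autosomes mapped to UCSC-style names.
# Assumption: the pseudo-autosomal code 25 (XY) is placed on chrX.
PLINK_TO_UCSC = {"23": "X", "24": "Y", "25": "X", "XY": "X", "26": "M", "MT": "M"}


def map_line_to_bed(line):
    """Convert one 4-column PLINK .map line to a 4-column BED line."""
    chrom, snp_id, _genetic_dist, bp = line.split()
    if chrom.lower().startswith("chr"):
        chrom = chrom[3:]
    code = chrom.upper()
    chrom = "chr" + PLINK_TO_UCSC.get(code, code)
    pos = int(bp)
    # BED is 0-based and half-open: a SNP at 1-based position p spans [p-1, p).
    return f"{chrom}\t{pos - 1}\t{pos}\t{snp_id}"


def map_to_bed(map_lines):
    """Convert all non-empty .map lines, keeping their order."""
    return [map_line_to_bed(line) for line in map_lines if line.strip()]


# Made-up example markers.
example_map = ["1 rs0001 0 1000", "23 rs0002 0 2500"]
bed_lines = map_to_bed(example_map)
print("\n".join(bed_lines))
```

Keeping the marker ID in the fourth BED column makes it possible to match lifted and unlifted records back to the original markers.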
(2) liftOver .bed file
Use the method mentioned above to convert the .bed file from one build to another.
(3) convert lifted .bed file back to .map file
Rearrange the columns of the lifted .bed file to obtain a .map file in the new build.
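The reverse rearrangement in step (3) might look like the sketch below. It assumes the lifted BED file has the marker ID in its fourth column and one base per marker, so the BED end coordinate equals the 1-based position. The genetic distance is not carried through BED, so it is set to 0 here; that is an assumption, and the function name <code>bed_line_to_map</code> is hypothetical.

```python
def bed_line_to_map(line):
    """Convert one lifted 4-column BED line back to a PLINK .map line."""
    chrom, _start, end, snp_id = line.split()[:4]
    if chrom.startswith("chr"):
        chrom = chrom[3:]
    # BED end equals the 1-based SNP position; genetic distance is set to 0.
    return f"{chrom}\t{snp_id}\t0\t{end}"


# Made-up lifted records.
lifted_bed = ["chr1\t1099\t1100\trs0001", "chrX\t2599\t2600\trs0002"]
map_lines = [bed_line_to_map(line) for line in lifted_bed]
print("\n".join(map_lines))
```

The sketch keeps the lines in the order given, which matters because the marker order in the .map file has to match the genotype column order in the .ped file.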
(4) modify .ped file
A .ped file has many columns. By convention, the first six columns are family_id, person_id, father_id, mother_id, sex, and phenotype.
Starting from the 7th column, each pair of letters/digits represents a genotype at a given marker. In step (2), as some genomic positions cannot be lifted, the genotype columns of those markers need to be dropped from the .ped file so that it stays consistent with the lifted .map file.
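Dropping the genotype columns of unlifted markers from a .ped line can be sketched in Python as below. It assumes the two alleles of each genotype are whitespace-separated, that the marker order comes from the original .map file, and that the IDs of unlifted markers were collected from the unlifted file. The function name and all example values are hypothetical.

```python
def drop_unlifted_genotypes(ped_line, marker_ids, unlifted_ids):
    """Remove the allele pairs of unlifted markers from one .ped line."""
    fields = ped_line.split()
    if len(fields) != 6 + 2 * len(marker_ids):
        raise ValueError("column count does not match the number of markers")
    # First six columns: family_id, person_id, father_id, mother_id, sex, phenotype.
    kept = fields[:6]
    for i, marker in enumerate(marker_ids):
        if marker not in unlifted_ids:
            # Marker i occupies columns 6 + 2*i and 7 + 2*i (0-based).
            kept.extend(fields[6 + 2 * i : 8 + 2 * i])
    return " ".join(kept)


marker_ids = ["rs0001", "rs0002", "rs0003"]  # order from the original .map file
unlifted_ids = {"rs0002"}                    # markers that failed to lift
ped_line = "FAM1 IND1 0 0 1 2 A A C T G G"
new_line = drop_unlifted_genotypes(ped_line, marker_ids, unlifted_ids)
print(new_line)
```

The same marker list, minus the unlifted IDs, then gives the marker order expected in the lifted .map file.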
(5) (optionally) change the rs number in the .map file
Similar to the human reference build, dbSNP also has different versions. Depending on your needs, you may consider changing rs numbers from the old dbSNP version to the new dbSNP version.
Such steps are described in [[#Lift dbSNP rs numbers | Lift dbSNP rs numbers]].
(6) (optionally) additional method to lift dbSNP positions
The NCBI dbSNP team has provided a provisional map for converting the genomic positions of a large set of dbSNP entries from NCBI build 36 to NCBI build 37.
In the second step, we obtained the unlifted genomic positions, so we can try to use this table to convert those unlifted dbSNPs.
After this step, there are still some SNPs that cannot be lifted, and they are mostly located on non-reference chromosomes.
== Lift dbSNP rs numbers ==
rs numbers are released by dbSNP. UCSC also makes its own copy of each dbSNP version. Be aware that the same dbSNP version from these two centers is not identical. When converting rs numbers from a lower version to a higher version, there are practically two ways.
=== Use RsMergeArch and SNPHistory ===
In short,
# when different rs numbers are found to refer to the same SNP, the higher rs number is merged into the lower rs number, and the merge is recorded in RsMergeArch.bcp.gz.
# when an rs number has to be retracted, it is recorded in SNPHistory.bcp.gz.
So we need to combine these two tables to obtain the mapping between old and new rs numbers. Luckily, we have a script for internal use. See liftRsNumber.py.
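The idea of combining the two tables can be illustrated with a small Python sketch. This is not the internal liftRsNumber.py script, whose contents are not shown here. It assumes the merge records have already been loaded into a dictionary from the higher (merged) rs number to the lower rs number, and the retracted rs numbers into a set. Loading the .bcp.gz files is left out on purpose, since the column layout should be checked against the dbSNP schema. All names and rs numbers below are made up.

```python
def resolve_rs(rs, merged_into, retracted):
    """Follow merge records until reaching an rs number that is not merged further.

    Returns the current rs number, or None if that rs number was retracted.
    """
    seen = set()
    while rs in merged_into:
        if rs in seen:
            raise ValueError(f"merge cycle detected at rs{rs}")
        seen.add(rs)
        rs = merged_into[rs]
    return None if rs in retracted else rs


# Hypothetical tables: rs300 was merged into rs200, which was merged into rs100.
merged_into = {300: 200, 200: 100}
retracted = {400}

print(resolve_rs(300, merged_into, retracted))  # follows 300 -> 200 -> 100
print(resolve_rs(400, merged_into, retracted))  # retracted, so None
print(resolve_rs(500, merged_into, retracted))  # untouched, stays 500
```

Following the chain repeatedly matters because an rs number may have been merged more than once across dbSNP builds.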
== Various reasons that lift over could fail ==
Thus it is probably not very useful to lift this SNP.
=== Cannot find rs number in newer dbSNP build ===
It is possible that a new dbSNP build does not contain certain rs numbers.
When dbSNP releases a new build, higher rs numbers may be merged into lower rs numbers because those rs numbers actually refer to the same SNP.
* NCBI provisional map [ftp://ftp.ncbi.nih.gov:/snp/organisms/human_9606/misc/exchange/Remap_36_3_37_1.txt.gz file] and [ftp://ftp.ncbi.nih.gov:/snp/organisms/human_9606/misc/exchange/Remap_36_3_37_1.info info]
* NCBI RsMergeArch file and [http://www.ncbi.nlm.nih.gov/SNP/snp_db_table_description.cgi?t=RsMergeArch schema]