From Genome Analysis Wiki
LiftOver has three use cases:
(1) [[#Lift genomic positions | convert genomic positions from one genome assembly to another genome assembly]]
In most scenarios, we have known genomic positions in NCBI build 36 (UCSC hg18) and hope to lift them over to NCBI build 37 (UCSC hg19).
(2) [[#Lift dbSNP rs numbers | convert dbSNP rs numbers from one build to another]]
(3) [[#Lift Merlin/PLINK format | convert both genomic positions and dbSNP rs numbers between versions]]
Data of this type are commonly stored in Merlin/PLINK format.
With our customized scripts, we can also lift rs numbers and Merlin/PLINK data files.
== Lift genomic positions ==
Genomic positions are best represented in [http://genome.ucsc.edu/FAQ/FAQformat.html#format1 BED format]. UCSC provides tools to convert a BED file from one genome assembly to another.
=== Binary liftOver tool ===
liftOver input.bed hg18ToHg19.over.chain.gz output.bed unlifted.bed
The unlifted.bed file will contain all genomic positions that cannot be lifted. The reasons vary; see [[#Various reasons that lift over could fail | Various reasons that lift over could fail]].
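To get a quick overview of why positions were not lifted, the unlifted file can be tallied by failure reason. The following is a minimal sketch, not part of the UCSC tools: it assumes that each unlifted record is preceded by a comment line starting with <code>#</code> that states the reason (for example "Deleted in new"), and the function name <code>summarize_unlifted</code> and the sample records are hypothetical.

```python
from collections import Counter


def summarize_unlifted(lines):
    """Count unlifted BED records per failure reason.

    Assumption: every record follows a '#'-prefixed line giving the reason.
    Records that appear before any reason line are counted as 'unknown'.
    """
    counts = Counter()
    reason = "unknown"
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Remember the most recent reason line.
            reason = line.lstrip("#").strip()
        else:
            counts[reason] += 1
    return counts


# Hypothetical content of an unlifted.bed file (placeholder SNP names).
example = [
    "#Deleted in new",
    "chr1\t100\t101\tsnpA",
    "#Partially deleted in new",
    "chr2\t200\t201\tsnpB",
    "#Deleted in new",
    "chr3\t300\t301\tsnpC",
]
summary = summarize_unlifted(example)
```

For a real file, the same function can be called on an open file handle, e.g. <code>summarize_unlifted(open("unlifted.bed"))</code>.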
=== Web interface ===
Alternatively, you can lift over a BED file in the web interface at: [http://genome.ucsc.edu/cgi-bin/hgLiftOver Link]
The web interface can tell you why some genomic positions cannot be lifted if you click "Explain failure messages".
== Lift Merlin/PLINK format ==
In Merlin/PLINK .map files, each line contains both a genomic position and a dbSNP rs number. Our goal here is to use both pieces of information to lift over as many positions as possible. There are three methods, and we recommend the first two. The first method is common and lifts most genomic positions; however, it does not reflect the dbSNP build change. The second method is more robust in the sense that each lifted rs number has a valid genomic position, as it uses dbSNP as its data source. The third method is not straightforward, so we mention it only briefly.
=== Lift Merlin format ===
(2) LiftOver .bed file
Use the method mentioned [[#Lift genomic positions | above]] to convert the .bed file from one build to another.
(3) Convert lifted .bed file back to .map file
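The conversion between .map and .bed lines can be sketched as below. This is an illustrative sketch rather than the wiki's own script. It assumes a four-column PLINK-style .map line (chromosome, rs number, genetic distance, base-pair position); Merlin .map files use a different column layout and would need their own parser. It also assumes 1-based .map positions, which become 0-based, half-open BED intervals. The function names and the chromosome-code table are hypothetical.

```python
# Assumed PLINK numeric codes for the sex chromosomes; extend as needed.
PLINK_TO_UCSC = {"23": "X", "24": "Y"}


def map_line_to_bed(line):
    """Turn one 4-column .map line into a BED line named by its rs number."""
    chrom, snp_id, _genetic_dist, pos = line.split()
    chrom = PLINK_TO_UCSC.get(chrom, chrom)
    pos = int(pos)
    # BED is 0-based and half-open: a 1-based site p becomes [p - 1, p).
    return f"chr{chrom}\t{pos - 1}\t{pos}\t{snp_id}"


def bed_line_to_map(line):
    """Turn one lifted BED line back into a .map line.

    The genetic distance is not carried through BED, so it is written as 0.
    """
    chrom, _start, end, snp_id = line.split()[:4]
    if chrom.startswith("chr"):
        chrom = chrom[3:]
    return f"{chrom}\t{snp_id}\t0\t{end}"


bed = map_line_to_bed("1 rs123 0 1000")
back = bed_line_to_map(bed)
```

Keeping the rs number in the BED name column is what lets the lifted positions be matched back to their markers in step (3).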
.ped files have many columns. By convention, the first six columns are family_id, person_id, father_id, mother_id, sex, and phenotype.
From the 7th column onward, each marker's genotype is represented by two letters/digits. Since some genomic positions cannot be lifted to the new version in step (2), we need to drop their corresponding columns from the .ped file to keep it consistent with the .map file. You can use PLINK's --exclude option to remove those SNPs; see [http://pngu.mgh.harvard.edu/~purcell/plink/dataman.shtml#exclude Remove a subset of SNPs].
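The list of SNPs to exclude can be derived by comparing the original .map file with the lifted .bed file. The sketch below assumes a four-column PLINK-style .map file and a lifted BED file whose fourth column holds the rs number; the function name <code>snps_to_exclude</code> and the sample records are hypothetical.

```python
def snps_to_exclude(map_lines, lifted_bed_lines):
    """Return rs numbers present in the .map file but absent after liftOver."""
    lifted = {
        line.split()[3]
        for line in lifted_bed_lines
        if line.strip()
    }
    missing = []
    for line in map_lines:
        if not line.strip():
            continue
        snp_id = line.split()[1]  # second .map column is the marker name
        if snp_id not in lifted:
            missing.append(snp_id)
    return missing


# Hypothetical records: rs2 was not lifted.
old_map = ["1 rs1 0 100", "1 rs2 0 200", "2 rs3 0 300"]
lifted_bed = ["chr1\t149\t150\trs1", "chr2\t349\t350\trs3"]
to_drop = snps_to_exclude(old_map, lifted_bed)
```

Writing the returned names to a text file, one per line, gives a list that PLINK's --exclude option accepts.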
==== Method 3 ====
The NCBI dbSNP team has provided a [[#Resources | provisional map]] for converting the genomic positions of a large set of dbSNP entries from NCBI build 36 to NCBI build 37. In the second step, we obtained the unlifted genomic positions, so we can try to use this table to convert those unlifted dbSNP entries.
After this step, there are still some SNPs that cannot be lifted, as they are mostly located on non-reference chromosomes.
Note: due to the limitations of the provisional map, some SNPs can have multiple locations.
(2) Use provisional map to update .map file
By joining the .map file with this provisional map, we can obtain the genomic positions in the new build.
Note: the provisional map uses 1-based chromosomal coordinates. Things get trickier if we want to lift a non-single-site SNP, e.g. AA/GG.
Since the provisional map provides a range in this case, it is necessary to know the genomic position of the single base given in the .map file before we can look it up in the table, so this is not straightforward.
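The join itself can be sketched as a lookup by rs number. This is an illustrative sketch only: it does not parse the provisional map file, whose column layout is described in the accompanying .info file. Instead it assumes the table has already been reduced to hypothetical (rs number, new chromosome, new position) tuples for single-site SNPs, and it sets aside any SNP with zero or multiple locations.

```python
from collections import defaultdict


def update_positions(map_records, remap_records):
    """Join .map records with new-build locations by rs number.

    map_records:   iterable of (chrom, rs, genetic_dist, pos) tuples.
    remap_records: iterable of (rs, new_chrom, new_pos) tuples
                   (hypothetical pre-parsed form of the provisional map).
    Returns (updated, unresolved), where unresolved lists rs numbers that
    have no location or more than one location in the new build.
    """
    locations = defaultdict(set)
    for rs, new_chrom, new_pos in remap_records:
        locations[rs].add((new_chrom, new_pos))

    updated, unresolved = [], []
    for _chrom, rs, dist, _pos in map_records:
        hits = locations.get(rs, set())
        if len(hits) == 1:
            new_chrom, new_pos = next(iter(hits))
            updated.append((new_chrom, rs, dist, new_pos))
        else:
            unresolved.append(rs)
    return updated, unresolved


# Hypothetical records: rs2 maps to two places, rs3 is not in the table.
old_records = [("1", "rs1", "0", 100), ("1", "rs2", "0", 200), ("2", "rs3", "0", 300)]
remap = [("rs1", "1", 150), ("rs2", "1", 250), ("rs2", "5", 900)]
updated, unresolved = update_positions(old_records, remap)
```

Skipping ambiguous SNPs is a conservative choice made for this sketch; another policy, such as preferring the location on a reference chromosome, may suit some data sets better.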
== Various reasons that lift over could fail ==
=== Genomic position cannot be lifted ===
When a SNP resides in a contig that exists only in the older reference build, liftOver cannot give it a new genomic position.
You can try the following SNP (in BED format) in UCSC online liftOver site:
=== SNPs in the higher build are located in a non-reference assembly ===
Some SNPs are not on autosomes or sex chromosomes in NCBI build 37, and dbSNP does not include them.
You cannot use the dbSNP database to look up their genomic positions by rs number.
Take rs1006094 as an example:
== Resources ==
* liftRsNumber.py [[Media: liftRsNumber.py]]
* liftMap.py [[Media: liftMap.py]]
* NCBI provisional map [ftp://ftp.ncbi.nih.gov:/snp/organisms/human_9606/misc/exchange/Remap_36_3_37_1.txt.gz file] and [ftp://ftp.ncbi.nih.gov:/snp/organisms/human_9606/misc/exchange/Remap_36_3_37_1.info info]
* NCBI RsMergeArch file and [http://www.ncbi.nlm.nih.gov/SNP/snp_db_table_description.cgi?t=RsMergeArch schema]
* NCBI SNPHistory file and [http://www.ncbi.nlm.nih.gov/SNP/snp_db_table_description.cgi?t=SNPHistory schema]
* How UCSC dbSNP differs from NCBI dbSNP [http://genomewiki.ucsc.edu/index.php/DbSNP_Track_Notes UCSC dbSNP track note]
* The dbSNP mapping process [http://www.ncbi.nlm.nih.gov/books/NBK44455/ link]