LiftOver
LiftOver is a necessary step to bring all genetic analyses onto the same reference build.
In particular, our current data are mainly in either NCBI build 36 (UCSC hg18) or NCBI build 37 (UCSC hg19).
Although liftOver can also go from a higher build to a lower build, we always recommend lifting from the lower build to the higher (current) build.
LiftOver is not hard. The easiest way is to use the UCSC liftOver tool to lift a [http://genome.ucsc.edu/FAQ/FAQformat.html#format1 BED format] file to another BED format file.
With additional steps, we can also lift Merlin and PLINK format files.
== Lift over using BED files ==
1.1 Binary liftOver tool
Download the [http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/liftOver liftOver binary] from UCSC, together with the [http://hgdownload.cse.ucsc.edu/goldenPath/hg18/liftOver/hg18ToHg19.over.chain.gz hg18 to hg19 chain file].
Prepare the input in BED format. NOTE: each chromosome name must carry the 'chr' prefix, for example:
 chr1 743267 743268 rs3115860
 chr1 766408 766409 rs12124819
 chr1 773885 773886 rs17160939
Then run liftOver with the input file, the chain file, the output file, and a file that collects the positions that could not be lifted:
 liftOver input.bed hg18ToHg19.over.chain.gz output.bed unlifted.bed
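The 'chr' requirement in the note above is easy to miss when the BED file was produced from another format. The following is a minimal sketch, not part of the UCSC tooling: `add_chr_prefix` and `run_liftover` are hypothetical helper names, and the default binary path `./liftOver` is an assumption about where the downloaded binary was saved. The liftOver arguments are passed in the same order as in the command shown above.

```python
import subprocess


def add_chr_prefix(bed_line: str) -> str:
    """Return the BED line with 'chr' prepended to the chromosome if it is missing."""
    fields = bed_line.rstrip("\n").split()
    if not fields:
        return bed_line  # leave blank lines untouched
    if not fields[0].startswith("chr"):
        fields[0] = "chr" + fields[0]
    return "\t".join(fields)


def run_liftover(input_bed, chain_file, output_bed, unlifted_bed, binary="./liftOver"):
    """Call the liftOver binary; the binary location is an assumption."""
    subprocess.run([binary, input_bed, chain_file, output_bed, unlifted_bed], check=True)
```

For instance, `add_chr_prefix("1 743267 743268 rs3115860")` yields a tab-separated line whose first field is `chr1`.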
1.2 Web interface
Alternatively, you can lift over a BED file using the UCSC liftOver web interface.
The web interface also reports why a given genomic position cannot be lifted.
2. Lift Merlin format
PLINK format and Merlin format are nearly identical, except the
3. Lift PLINK format
PLINK format usually refers to .ped and .map files.
We recommend splitting the job into several steps:
(1) convert the .map file to a .bed file
(2) liftOver the .bed file
(3) convert the lifted .bed file back to a .map file
(4) modify the .ped file
(5) (optionally) change the rs numbers in the .map file to the newer version
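Steps (1), (3) and (4) can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a finished converter: the function names are hypothetical, the .map file is assumed to have four columns (chromosome, marker id, genetic distance, base-pair position), the .ped file is assumed to have six leading columns followed by two allele columns per marker, and chromosome codes are assumed to be names such as 1 to 22, X or Y (numeric PLINK codes such as 23 are not translated). Markers that failed to lift are dropped from both the .map and the .ped output.

```python
PED_LEADING_COLUMNS = 6  # family id, individual id, father, mother, sex, phenotype


def map_to_bed(map_lines):
    """Step (1): turn .map rows into BED rows (0-based start, 1-based end)."""
    bed = []
    for line in map_lines:
        chrom, snp, _cm, pos = line.split()
        pos = int(pos)
        bed.append(f"chr{chrom}\t{pos - 1}\t{pos}\t{snp}")
    return bed


def bed_to_map(original_map_lines, lifted_bed_lines):
    """Step (3): rebuild the .map rows in the original marker order.

    Returns the new rows and the indices of the markers that were lifted.
    """
    lifted = {}
    for line in lifted_bed_lines:
        chrom, _start, end, snp = line.split()[:4]
        lifted[snp] = (chrom[3:] if chrom.startswith("chr") else chrom, end)
    new_map, kept = [], []
    for i, line in enumerate(original_map_lines):
        _chrom, snp, cm, _pos = line.split()
        if snp in lifted:
            chrom, pos = lifted[snp]
            new_map.append(f"{chrom}\t{snp}\t{cm}\t{pos}")
            kept.append(i)
    return new_map, kept


def filter_ped_line(ped_line, kept):
    """Step (4): keep the leading columns plus the allele pairs of lifted markers."""
    fields = ped_line.split()
    out = fields[:PED_LEADING_COLUMNS]
    for i in kept:
        start = PED_LEADING_COLUMNS + 2 * i
        out.extend(fields[start:start + 2])
    return " ".join(out)
```

Keeping the original marker order in `bed_to_map` means the allele columns in the .ped file stay aligned with the .map rows, at the cost of a .map file that may no longer be sorted by position in the new build.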
4. Lift RS id numbers
RS numbers are released by dbSNP. UCSC also makes its own copy of each dbSNP release. However, the same dbSNP build from these two centers is not identical.
4.1 Use the dbSNP-provided exchange file
4.2 Use the combination of RsMergeArch.bcp.gz and SNPHistory.bcp.gz
5. Why can some SNPs not be lifted?
5.1 Genomic position cannot be lifted
This can happen when a SNP position exists in the old build but not in the new build.
For example, the following SNP (BED format) cannot be lifted:
 chr20 56737667 56737668 rs1073519
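When many positions fail, it helps to tabulate the file of unlifted records written by the liftOver command. The sketch below assumes that this file interleaves comment lines starting with '#' (giving a reason) with the BED records they apply to; that layout is an assumption to check against your own output, and `parse_unlifted` is a hypothetical helper name.

```python
def parse_unlifted(lines):
    """Pair each unlifted BED record with the most recent '#' reason line."""
    reason = None
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            reason = line[1:].strip()  # remember the reason for the records that follow
        else:
            records.append((reason, line.split()))
    return records
```

Each returned item is a `(reason, fields)` pair, so the records can be grouped or counted by reason.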
5.2 RS number cannot be lifted between builds
When dbSNP releases a new build, some higher rs numbers may be merged into lower rs numbers because they turn out to refer to the same SNP.
This merge process can be complicated. For example, rs3001 has been merged into rs2032.
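Because a merged rs number can itself be merged again in a later build, updating an rs number means following the merge records until they stop. The sketch below illustrates that idea only. It assumes a tab-separated merge table whose first two columns are the higher (retired) rs number and the lower (surviving) rs number, both without the "rs" prefix; this column layout is an assumption about the dbSNP merge file and should be checked against its schema. The function names are hypothetical.

```python
def load_merges(lines):
    """Build a {retired rs: surviving rs} mapping from tab-separated merge rows."""
    merged = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Assumed layout: column 1 = higher rs number, column 2 = lower rs number.
        if len(fields) >= 2 and fields[0].isdigit() and fields[1].isdigit():
            merged["rs" + fields[0]] = "rs" + fields[1]
    return merged


def resolve_rs(rs_id, merged_into):
    """Follow merge records until reaching an rs number that was not merged further."""
    seen = set()
    current = rs_id
    while current in merged_into and current not in seen:
        seen.add(current)  # guard against accidental cycles in the table
        current = merged_into[current]
    return current
```

With a table containing the merge mentioned above, `resolve_rs("rs3001", merges)` would return `"rs2032"`, and an rs number that was never merged is returned unchanged.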
5.3 SNPs in the higher build are located on a non-reference assembly
For example, rs1006094: in NCBI dbSNP, this SNP is reported as "Mapped unambiguously on non-reference assembly only". Thus it is probably not very useful to lift this SNP.
5.4 Different dbSNP databases
NCBI released dbSNP132 in VCF format, and UCSC also has its own version of dbSNP132 in plain text format. The two database files differ not only in file format but in content as well. For example, rs1054140:
NCBI dbSNP website (shows 1 location): http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?rs=1054140
UCSC genome browser website (shows 2 locations): http://genome.ucsc.edu/cgi-bin/hgTracks?clade=mammal&org=Human&db=hg19&position=rs1054140&hgt.suggest=&hgt.suggestTrack=knownGene&pix=800&Submit=submit&hgsid=205770459&hgt.newJQuery=1
NCBI dbSNP VCF file: no record
UCSC genome browser (2 locations):
 721 chr10 17842693 17842694 rs1054140 0 + T T A/T genomic single by-cluster,by-submitter 0.5 0 unknown exact 2 MultipleAlignments 8 ABI,BCM-HGSC-SUB,HUMANGENOME_JCVI,ILLUMINA,KRIBB_YJKIM,LEE,SEQUENOM,WI_SSAHASNP, 2 T,A, 1.000000,1.000000, 0.500000,0.500000, maf-5-some-pop,maf-5-all-pops
 723 chr10 18089681 18089682 rs1054140 0 + T T A/T genomic single by-cluster,by-submitter 0.5 0 untranslated-3 exact 2 MultipleAlignments 8 ABI,BCM-HGSC-SUB,HUMANGENOME_JCVI,ILLUMINA,KRIBB_YJKIM,LEE,SEQUENOM,WI_SSAHASNP, 2 T,A, 1.000000,1.000000, 0.500000,0.500000, maf-5-some-pop,maf-5-all-pops
For example, rs3115860 appears twice in UCSC dbSNP 132 but does not appear in NCBI dbSNP 132.
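SNPs that map to more than one location in the UCSC file can be flagged before lifting. The following sketch assumes whitespace-separated rows whose first five columns are bin, chromosome, start, end and rs number, as in the rs1054140 records quoted in this section; `multi_location_snps` is a hypothetical helper name.

```python
from collections import defaultdict


def multi_location_snps(ucsc_rows):
    """Return {rs id: sorted locations} for rs ids seen at more than one location."""
    locations = defaultdict(set)
    for row in ucsc_rows:
        fields = row.split()
        if len(fields) < 5:
            continue  # skip blank or truncated rows
        _bin, chrom, start, end, name = fields[:5]
        locations[name].add((chrom, int(start), int(end)))
    return {name: sorted(locs) for name, locs in locations.items() if len(locs) > 1}
```

Whether to drop such SNPs or keep one of the locations is a separate decision that depends on the analysis.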
BED format:
NCBI exchange file and schema:
NCBI RsMergeArch file and schema:
NCBI SNPHistory file and schema: