
From Genome Analysis Wiki
Created 17:11, 8 August 2011
LiftOver

LiftOver is a necessary step to bring all genetic analyses back to the same reference build.
In particular, our current data are mostly in either NCBI build 36 (UCSC hg18) or NCBI build 37 (UCSC hg19).
Although liftOver can convert positions from a newer build to an older one, we always recommend lifting from the older build to the newer (current) build.

LiftOver is not hard. The easiest way is to use the UCSC liftOver tool to lift a [http://genome.ucsc.edu/FAQ/FAQformat.html#format1 BED format] file to another BED format file.
With additional steps, we can also lift Merlin and PLINK format files.

== 1. Lift over using BED files ==

=== 1.1 Binary liftOver tool ===
Download the [http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/liftOver liftOver binary] from UCSC and the [http://hgdownload.cse.ucsc.edu/goldenPath/hg18/liftOver/hg18ToHg19.over.chain.gz hg18 to hg19 chain file].

Prepare a BED format file (input.bed), for example:

NOTE: Use the 'chr' prefix before each chromosome name (e.g. chr1, not 1).
chr1 743267 743268 rs3115860
chr1 766408 766409 rs12124819
chr1 773885 773886 rs17160939
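A minimal Python sketch of how such lines can be produced from SNP positions, for example those in a PLINK .map file. It assumes the input positions are 1-based; BED starts are 0-based and ends are exclusive, which is why each example line has end = start + 1. The helper names `to_bed_line` and `write_bed` are hypothetical.

```python
def to_bed_line(chrom, pos_1based, rsid):
    """Format one SNP as a single-base BED record.

    BED starts are 0-based and ends are exclusive, so a SNP at 1-based
    position p becomes the interval [p - 1, p).
    """
    chrom = str(chrom)
    # Add the 'chr' prefix UCSC expects, unless it is already present.
    if not chrom.startswith("chr"):
        chrom = "chr" + chrom
    return f"{chrom}\t{pos_1based - 1}\t{pos_1based}\t{rsid}"


def write_bed(snps, path):
    """Write (chrom, 1-based position, rsid) tuples to a tab-separated BED file."""
    with open(path, "w") as out:
        for chrom, pos, rsid in snps:
            out.write(to_bed_line(chrom, pos, rsid) + "\n")
```

For instance, `to_bed_line(1, 743268, "rs3115860")` produces the first example line above (tab-separated).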

Run liftOver:

liftOver input.bed hg18ToHg19.over.chain.gz output.bed unlifted.bed
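liftOver writes the records it could not map to unlifted.bed. A minimal Python sketch for summarizing that file, assuming its usual layout in which each failed record is preceded by a comment line giving the reason (for example `#Deleted in new`). The function name `parse_unlifted` is hypothetical.

```python
def parse_unlifted(lines):
    """Return (reason, fields) pairs for the records liftOver failed to map.

    Assumes each failed record is preceded by a '#reason' comment line.
    """
    failures = []
    reason = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith("#"):
            reason = line[1:].strip()  # applies to the next record
        else:
            failures.append((reason, line.split()))
            reason = None
    return failures
```

Passing an open unlifted.bed file handle to `parse_unlifted` and tallying the first element of each pair with `collections.Counter` gives a quick overview of why records failed.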

=== 1.2 Web interface ===
Alternatively, you can lift over a BED file using the web interface
at: [http://genome.ucsc.edu/cgi-bin/hgLiftOver]
The web interface can also tell you why some genomic positions cannot
be lifted.

== 2. Lift Merlin format ==
PLINK format and Merlin format are nearly identical, except for the
.map file.

== 3. Lift PLINK format ==
PLINK format usually refers to a pair of .ped and .map files.
We recommend splitting the job into several steps:
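(1) convert the .map file to a .bed file

(2) run liftOver on the .bed file

(3) convert the lifted .bed file back to a .map file

(4) modify the .ped file: remove the markers that could not be lifted, so that the genotype columns in the .ped file still match the rows of the new .map file

(5) (optionally) update the rs numbers in the .map file to a newer dbSNP version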
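Steps (1), (3) and (4) above can be sketched in Python as follows. This is a minimal illustration, not a tested pipeline. It assumes the standard four-column PLINK .map layout (chromosome, SNP id, genetic distance, base-pair position) and PLINK's numeric codes 23 = X, 24 = Y, 26 = MT. Markers on chromosome 0 or 25 (XY), markers with non-positive positions, and lifted records on alternative or random contigs (names containing '_') are simply dropped. Genetic distances are copied from a supplied dictionary, otherwise set to 0 (an assumption). All function names are hypothetical.

```python
# PLINK chromosome codes that UCSC names differently.
PLINK_TO_UCSC = {"23": "X", "24": "Y", "26": "M", "MT": "M"}
UCSC_TO_PLINK = {"X": "23", "Y": "24", "M": "26"}
SKIPPED = {"0", "25", "XY"}  # unplaced and pseudo-autosomal (assumption)


def map_to_bed(map_lines):
    """Step (1): PLINK .map lines -> BED lines (0-based start, exclusive end)."""
    bed = []
    for line in map_lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # blank or malformed line
        chrom, snp, _distance, bp = fields[:4]
        pos = int(bp)
        if chrom in SKIPPED or pos <= 0:
            continue  # negative positions mark excluded markers in PLINK
        chrom = PLINK_TO_UCSC.get(chrom, chrom)
        bed.append(f"chr{chrom}\t{pos - 1}\t{pos}\t{snp}")
    return bed


def bed_to_map(bed_lines, distances=None):
    """Step (3): lifted BED lines -> PLINK .map lines (1-based positions)."""
    distances = distances or {}
    out = []
    for line in bed_lines:
        fields = line.split()
        if len(fields) < 4:
            continue
        chrom, start, _end, snp = fields[:4]
        chrom = chrom[3:] if chrom.startswith("chr") else chrom
        if "_" in chrom:
            continue  # e.g. alternative haplotypes or random contigs
        chrom = UCSC_TO_PLINK.get(chrom, chrom)
        out.append(f"{chrom}\t{snp}\t{distances.get(snp, '0')}\t{int(start) + 1}")
    return out


def unlifted_ids(old_map_lines, new_map_lines):
    """Step (4) helper: SNP ids in the old .map that are missing from the new one."""
    kept = {line.split()[1] for line in new_map_lines if line.strip()}
    return [line.split()[1] for line in old_map_lines
            if line.strip() and line.split()[1] not in kept]
```

The ids returned by `unlifted_ids` can be written one per line and passed to PLINK's `--exclude` option to drop the same markers from the .ped file; check that the remaining markers are in the same order as the new .map rows before combining them.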


== 4. Lift rs ID numbers ==
RS numbers are released by dbSNP. UCSC also makes its own copy of each dbSNP release. However, the same dbSNP build obtained from these two centers is not identical.

=== 4.1 Use the dbSNP-provided exchange file ===

=== 4.2 Use the combination of RsMergeArch.bcp.gz and SNPHistory.bcp.gz ===
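A minimal Python sketch of approach 4.2: follow rsHigh to rsLow merges until reaching an rs number that was not merged further, then treat it as unliftable if it appears in a withdrawn list. It assumes RsMergeArch rows are tab-separated with rsHigh and rsLow as the first two columns (bare numbers without the 'rs' prefix), and that the ids taken from SNPHistory are withdrawn SNPs; check the dbSNP schema for your release. The function names are hypothetical.

```python
def load_merges(lines):
    """Build an rsHigh -> rsLow mapping from RsMergeArch rows.

    Assumes tab-separated rows whose first two columns are rsHigh and rsLow
    as bare numbers; rows that do not match are ignored.
    """
    merges = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2 and fields[0].isdigit() and fields[1].isdigit():
            merges[int(fields[0])] = int(fields[1])
    return merges


def current_rs(rs, merges, withdrawn=frozenset()):
    """Follow merges to the final rs number, or return None if it was withdrawn.

    `withdrawn` is assumed to hold rs numbers taken from SNPHistory.
    """
    seen = set()
    while rs in merges and rs not in seen:  # `seen` guards against cycles
        seen.add(rs)
        rs = merges[rs]
    return None if rs in withdrawn else rs
```

With the example from 5.2, `current_rs(3001, {3001: 2032})` returns 2032.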

== 5. Why you cannot lift ==
=== 5.1 Genomic position cannot be lifted ===
Possible reason: the SNP position exists in the old build but not in the new build.
For example, the following SNP (BED format) cannot be lifted:
chr20 56737667 56737668 rs1073519

=== 5.2 rs number cannot be lifted between builds ===
Possible reason:
When dbSNP releases a new build, some higher rs numbers may be merged into lower rs
numbers because they actually refer to the same SNP.
This merge process can be complicated. For details, see:
http://www.ncbi.nlm.nih.gov/books/NBK44395/#FTP.do_you_have_a_table_of_merged_snps_s
For example:
rs3001 has been merged into rs2032.

=== 5.3 SNP in the newer build is located on a non-reference assembly ===
For example:
rs1006094
In NCBI dbSNP, this SNP is reported as "Mapped unambiguously on
non-reference assembly only"
Thus it is probably not very useful to lift this SNP.

=== 5.4 Different dbSNP databases ===
NCBI released dbSNP 132 in VCF format, and UCSC also has its own version of
dbSNP 132 as a plain text table.
The two database files differ not only in file format but also in content.
For example:
rs1054140

NCBI dbSNP website (shows 1 location):
http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?rs=1054140
UCSC Genome Browser website (shows 2 locations):
http://genome.ucsc.edu/cgi-bin/hgTracks?clade=mammal&org=Human&db=hg19&position=rs1054140&hgt.suggest=&hgt.suggestTrack=knownGene&pix=800&Submit=submit&hgsid=205770459&hgt.newJQuery=1
NCBI dbSNP VCF file (no record)
UCSC genome browser (2 locations):
721 chr10 17842693 17842694 rs1054140 0 + T T A/T genomic single by-cluster,by-submitter 0.5 0 unknown exact 2 MultipleAlignments 8 ABI,BCM-HGSC-SUB,HUMANGENOME_JCVI,ILLUMINA,KRIBB_YJKIM,LEE,SEQUENOM,WI_SSAHASNP, 2 T,A, 1.000000,1.000000, 0.500000,0.500000, maf-5-some-pop,maf-5-all-pops
723 chr10 18089681 18089682 rs1054140 0 + T T A/T genomic single by-cluster,by-submitter 0.5 0 untranslated-3 exact 2 MultipleAlignments 8 ABI,BCM-HGSC-SUB,HUMANGENOME_JCVI,ILLUMINA,KRIBB_YJKIM,LEE,SEQUENOM,WI_SSAHASNP, 2 T,A, 1.000000,1.000000, 0.500000,0.500000, maf-5-some-pop,maf-5-all-pops

Another example: rs3115860 appears twice in UCSC dbSNP 132 but does not appear in NCBI dbSNP 132.
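Such differences can be checked directly. A minimal Python sketch, assuming the UCSC table is a tab-separated dump whose fifth column is the rs id (as in the rows above) and the NCBI file is a standard VCF whose third column holds the ID (possibly several ids separated by ';'). It reports rs ids that UCSC maps to more than one location and UCSC rs ids that are absent from the VCF. The function names are hypothetical.

```python
from collections import Counter


def ucsc_rs_counts(snp_table_lines):
    """Count how many rows each rs id has in a UCSC snpNNN table dump."""
    counts = Counter()
    for line in snp_table_lines:
        fields = line.split("\t")
        if len(fields) > 4:
            counts[fields[4]] += 1  # columns: bin, chrom, chromStart, chromEnd, name
    return counts


def vcf_rs_ids(vcf_lines):
    """Collect the ids in the VCF ID column (3rd column), skipping header lines."""
    ids = set()
    for line in vcf_lines:
        if line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) > 2:
            ids.update(fields[2].split(";"))
    return ids


def compare(snp_table_lines, vcf_lines):
    """Return (rs ids with several UCSC locations, UCSC rs ids missing from the VCF)."""
    counts = ucsc_rs_counts(snp_table_lines)
    in_vcf = vcf_rs_ids(vcf_lines)
    multi = sorted(rs for rs, n in counts.items() if n > 1)
    missing = sorted(rs for rs in counts if rs not in in_vcf)
    return multi, missing
```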

== 6. Resources ==

BED format:
NCBI exchange file and schema:
NCBI RsMergeArch file and schema:
NCBI SNPHistory file and schema: