Changes

LiftOver (view source)

Revision as of 11:24, 10 August 2011

123 bytes added , 11:24, 10 August 2011

no edit summary

Line 2: Line 2:

LiftOver can have three use cases:

−

(1) [[#Lift ~~genomic~~ positions | convert ~~genomic~~ position from one genome assembly to another genome assembly]]

+

(1) [[#Lift genome positions | convert genome position from one genome assembly to another genome assembly]]

−

In most scenarios, we have known ~~genomic~~ positions in NCBI build 36 (UCSC hg 18) and hope to lift them over to NCBI build 37 (UCSC hg19).

+

In most scenarios, we have known genome positions in NCBI build 36 (UCSC hg18) and want to lift them over to NCBI build 37 (UCSC hg19).

(2) [[#Lift dbSNP rs numbers | convert dbSNP rs number from one build to another]]

−

(3) [[#Lift Merlin/PLINK format |convert both ~~genomic~~ position and dbSNP rs number over different versions]]

+

(3) [[#Lift Merlin/PLINK format |convert both genome position and dbSNP rs number over different versions]]

Such data are typically found in Merlin/PLINK format.

Line 17: Line 17:

With our customized scripts, we can also lift rsNumber and Merlin/PLINK data files.

−

== Lift ~~genomic~~ positions ==

+

== Lift genome positions ==

−

~~Genomic~~ positions are best represented in [http://genome.ucsc.edu/FAQ/FAQformat.html#format1 BED format]. UCSC provides tools to convert BED file from one genome assembly to another.

+

Genome positions are best represented in [http://genome.ucsc.edu/FAQ/FAQformat.html#format1 BED format]. UCSC provides tools to convert a BED file from one genome assembly to another.

=== Binary liftOver tool ===

Line 35: Line 35:

liftOver input.bed hg18ToHg19.over.chain.gz output.bed unlifted.bed

−

unlifted.bed file will contain all ~~genomic~~ positions that cannot be lifted. The reason for that varies. See [[#Various reasons that lift over could fail | Various reasons that lift over could fail]]

+

The unlifted.bed file will contain all genome positions that cannot be lifted. The reasons vary. See [[#Various reasons that lift over could fail | Various reasons that lift over could fail]]
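To get a quick overview of why positions ended up in unlifted.bed, the failure reasons can be tallied. The following is a minimal sketch, assuming each failed record is preceded by a comment line starting with "#" (for example "#Deleted in new"); the function name count_unlifted_reasons is hypothetical and not part of liftOver.

```python
from collections import Counter
from typing import Iterable


def count_unlifted_reasons(lines: Iterable[str]) -> Counter:
    """Count failure reasons in a liftOver 'unlifted' file.

    Assumption: every failed BED record is preceded by a comment line
    such as '#Deleted in new'. Records without a preceding comment are
    counted under 'unknown'.
    """
    reasons = Counter()
    current = "unknown"
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Remember the reason for the record that follows.
            current = line.lstrip("#").strip()
        else:
            reasons[current] += 1
            current = "unknown"
    return reasons


if __name__ == "__main__":
    # Made-up example records, for illustration only.
    example = [
        "#Deleted in new",
        "chr1\t100\t101\trs1",
        "#Partially deleted in new",
        "chr2\t5\t6\trs2",
    ]
    for reason, n in count_unlifted_reasons(example).items():
        print(reason, n)
```

For a real file, pass an open file handle, e.g. count_unlifted_reasons(open("unlifted.bed")).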

=== Web interface ===

Alternatively, you can lift over a BED file in the web interface

at: [http://genome.ucsc.edu/cgi-bin/hgLiftOver Link]

−

Web interface can tell you why some ~~genomic~~ position cannot

+

The web interface can tell you why some genome positions cannot

be lifted if you click "Explain failure messages"

Line 80: Line 80:

== Lift Merlin/PLINK format ==

−

In Merlin/PLINK .map files, each line contains both ~~genomic~~ position and dbSNP rs number. Our goal here is to use both information to liftOver as many position as possible.

+

In Merlin/PLINK .map files, each line contains both a genome position and a dbSNP rs number. Our goal here is to use both pieces of information to lift over as many positions as possible.

−

There are 3 methods to liftOver and we recommend the first 2 method. The first method is common, and it lifts most ~~genomic~~ positions, however, it does not reflect the dbSNP build change. The second method is more robust in the sense that each lifted rs number has valid ~~genomic~~ position, as its uses dbSNP as data source. The third method is not straigtforward, and we just briefly mention it.

+

There are 3 methods to liftOver, and we recommend the first 2. The first method is common and lifts most genome positions; however, it does not reflect the dbSNP build change. The second method is more robust in the sense that each lifted rs number has a valid genome position, as it uses dbSNP as its data source. The third method is not straightforward, and we mention it only briefly.

=== Lift Merlin format ===

Line 107: Line 107:

(2) LiftOver .bed file

−

Use method mentioned [[#Lift ~~genomic~~ positions | above]] to convert .bed file from one build to another.

+

Use method mentioned [[#Lift genome positions | above]] to convert .bed file from one build to another.

(3) Convert lifted .bed file back to .map file
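The conversion between .map and .bed in the steps above can be sketched as follows. This is a minimal sketch, not the liftMap.py script itself. It assumes the 4-column PLINK .map layout (chromosome, rs number, genetic distance, 1-based base-pair position) and plain chromosome names such as 1 or X; the function names are hypothetical. BED start coordinates are 0-based, so the position is shifted by one on the way in and read back from the end column on the way out.

```python
def map_line_to_bed(line: str) -> str:
    """Turn one PLINK .map line into a 4-column BED line.

    Assumption: columns are chromosome, SNP id, genetic distance,
    1-based base-pair position.
    """
    chrom, snp_id, _genetic_dist, pos = line.split()
    bp = int(pos)
    # BED uses a 0-based start and an exclusive end.
    return f"chr{chrom}\t{bp - 1}\t{bp}\t{snp_id}"


def bed_line_to_map(line: str, genetic_dist: str = "0") -> str:
    """Turn one lifted 4-column BED line back into a PLINK .map line.

    The genetic distance is not carried through BED, so a placeholder
    value is written unless the caller supplies one.
    """
    chrom, _start, end, snp_id = line.split()[:4]
    if chrom.startswith("chr"):
        chrom = chrom[3:]
    return f"{chrom}\t{snp_id}\t{genetic_dist}\t{end}"


if __name__ == "__main__":
    bed = map_line_to_bed("1 rs123 0 1000")
    print(bed)
    print(bed_line_to_map(bed))
```

Numeric PLINK chromosome codes for sex chromosomes (for example 23 for X) would need an extra translation step that this sketch leaves out.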

Line 116: Line 116:

A .ped file has many columns. By convention, the first six columns are family_id, person_id, father_id, mother_id, sex, and phenotype.

−

From the 7th column, there are two letters/digits representing a genotype at the certain marker. In step (2), as some ~~genomic~~ positions cannot

+

Starting from the 7th column, each pair of letters/digits represents the genotype at one marker. In step (2), as some genome positions cannot

be lifted to the new version, we need to drop their corresponding columns from the .ped file to keep it consistent with the .map file. You can use PLINK's --exclude option to remove those SNPs,

see [http://pngu.mgh.harvard.edu/~purcell/plink/dataman.shtml#exclude Remove a subset of SNPs].
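For illustration, here is what dropping the genotype columns of unlifted markers amounts to. This is a minimal sketch, assuming each marker occupies two whitespace-separated allele columns after the six leading columns; the function name drop_markers is hypothetical. PLINK's --exclude remains the more practical route for real data.

```python
from typing import List, Set


def drop_markers(ped_fields: List[str], drop: Set[int]) -> List[str]:
    """Remove the genotype columns of selected markers from one .ped row.

    ped_fields: one .ped line split on whitespace.
    drop: 0-based marker indices, in the order the markers appear in
          the .map file.
    Assumption: each marker takes exactly two allele columns.
    """
    kept = ped_fields[:6]          # family, person, father, mother, sex, phenotype
    genotypes = ped_fields[6:]
    for i in range(0, len(genotypes), 2):
        if i // 2 not in drop:
            kept.extend(genotypes[i:i + 2])
    return kept


if __name__ == "__main__":
    row = "F1 P1 0 0 1 2 A A G C T T".split()
    # Drop the second marker (index 1), which could not be lifted.
    print(" ".join(drop_markers(row, {1})))
```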

Line 145: Line 145:

==== Method 3 ====

−

NCBI dbSNP team has provided a [#Resources | provisional map] for converting the ~~genomic~~ position of a larget set dbSNP from NCBI build 36 to NCBI build 37.

+

The NCBI dbSNP team has provided a [[#Resources | provisional map]] for converting the genome positions of a large set of dbSNP entries from NCBI build 36 to NCBI build 37.

−

In the second step, we have obtained unlifted ~~genomic~~ positions, so we can try to use the table to convert those unlfted dbSNPs.

+

In the second step, we have obtained unlifted genome positions, so we can try to use the table to convert those unlifted dbSNPs.

After this step, there are still some SNPs that cannot be lifted, as they are mostly located on non-reference chromosomes.

Note: due to the limitations of the provisional map, some SNPs can have multiple locations.

Line 165: Line 165:

(2) Use provisional map to update .map file

−

By joining .map file and this provisional map, we can obtain the new ~~genomic~~ position in the new build.

+

By joining .map file and this provisional map, we can obtain the new genome position in the new build.

Note: the provisional map uses a 1-based chromosomal index. Things get trickier if we want to lift a SNP that spans more than a single site, e.g. AA/GG.

−

Since provisional map provides a range in this case, it is necessary to know the ~~genomic~~ position of that single base provided in the .map file,

+

Since the provisional map provides a range in this case, it is necessary to know the genome position of that single base provided in the .map file,

and only then can we look it up in the table, so it is not straightforward.
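The join between the .map file and the provisional map can be sketched as a dictionary lookup keyed by rs number. This is a minimal sketch under stated assumptions: the provisional map has already been parsed into a dictionary from rs number to a single (chromosome, 1-based position) pair, which sidesteps its actual column layout as well as the multiple-location and range cases mentioned above. The names update_map and new_pos are hypothetical.

```python
from typing import Dict, List, Tuple

MapRow = Tuple[str, str, str, int]  # chromosome, rs number, genetic distance, position


def update_map(
    map_rows: List[MapRow],
    new_pos: Dict[str, Tuple[str, int]],
) -> Tuple[List[MapRow], List[MapRow]]:
    """Replace positions in .map rows with those from a lookup table.

    new_pos is assumed to hold exactly one new (chromosome, position)
    per rs number. Rows whose rs number is missing from the table are
    returned separately and left unchanged.
    """
    lifted: List[MapRow] = []
    unlifted: List[MapRow] = []
    for chrom, snp_id, dist, pos in map_rows:
        if snp_id in new_pos:
            new_chrom, new_bp = new_pos[snp_id]
            lifted.append((new_chrom, snp_id, dist, new_bp))
        else:
            unlifted.append((chrom, snp_id, dist, pos))
    return lifted, unlifted


if __name__ == "__main__":
    # Made-up rows and positions, for illustration only.
    rows = [("1", "rsA", "0", 1000), ("2", "rsB", "0", 2000)]
    table = {"rsA": ("1", 1100)}
    lifted, unlifted = update_map(rows, table)
    print(lifted)
    print(unlifted)
```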

Line 177: Line 177:

== Various reasons that lift over could fail ==

−

=== ~~Genomic~~ position cannot be lifted ===

+

=== Genome position cannot be lifted ===

−

When a SNP resides in a contig that only exists in older reference build, liftOver cannot give it new ~~genomic~~.

+

When a SNP resides in a contig that exists only in the older reference build, liftOver cannot give it a new genome position.

You can try the following SNP (in BED format) in UCSC online liftOver site:

Line 186: Line 186:

=== SNP in higher build are located in non-reference assembly ===

Some SNPs are not on autosomes or sex chromosomes in NCBI build 37. dbSNP does not include them.

−

You cannot use dbSNP database to lookup its ~~genomic~~ position by rs number.

+

You cannot use the dbSNP database to look up their genome positions by rs number.

Take rs1006094 as an example:

Line 234: Line 234:

== Resources ==

−

* liftRsNumber.py [~~[Media: liftRsNumber.py]~~]

+

* liftRsNumber.py []

−

* liftMap.py [~~[Media: liftMap.py~~]

+

* liftMap.py []

* NCBI provisional map [ftp://ftp.ncbi.nih.gov:/snp/organisms/human_9606/misc/exchange/Remap_36_3_37_1.txt.gz file] and [ftp://ftp.ncbi.nih.gov:/snp/organisms/human_9606/misc/exchange/Remap_36_3_37_1.info info]

−

* NCBI RgMergeArch file and [http://www.ncbi.nlm.nih.gov/SNP/snp_db_table_description.cgi?t=RsMergeArch schema]

+

* NCBI RsMergeArch [ftp://ftp.ncbi.nih.gov:/snp/organisms/human_9606/database/organism_data/RsMergeArch.bcp.gz file] and [http://www.ncbi.nlm.nih.gov/SNP/snp_db_table_description.cgi?t=RsMergeArch schema]

−

* NCBI SNPHistory file and [http://www.ncbi.nlm.nih.gov/SNP/snp_db_table_description.cgi?t=SNPHistory schema]

+

* NCBI SNPHistory [ftp://ftp.ncbi.nih.gov:/snp/organisms/human_9606/database/organism_data/SNPHistory.bcp.gz file] and [http://www.ncbi.nlm.nih.gov/SNP/snp_db_table_description.cgi?t=SNPHistory schema]

* How UCSC dbSNP differs from NCBI dbSNP [http://genomewiki.ucsc.edu/index.php/DbSNP_Track_Notes UCSC dbSNP track note]

* The dbSNP mapping process [http://www.ncbi.nlm.nih.gov/books/NBK44455/ link]

Zhanxw

255

edits
