Changes

From Genome Analysis Wiki
489 bytes added ,  23:37, 23 September 2011
no edit summary
Line 55: Line 55:     
With the above in mind, we can combine these two tables to obtain the mapping from older rs numbers to new rs numbers.
We have developed a script (for internal use), named liftRsNumber.py [/net /dumbo/net/dumbo/home/zhanxw/amd/analyze/verifyBamID/] for lifting rs numbers between builds.
+
We have developed a script (for internal use), named [http://genome.sph.umich.edu/wiki/LiftRsNumber.py liftRsNumber.py] for lifting rs numbers between builds.
 
This script requires RsMergeArch.bcp.gz and SNPHistory.bcp.gz, which can be found in [[#Resources | Resources]].
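The idea of combining the two tables can be sketched as follows. This is a minimal illustration, not the liftRsNumber.py script itself. It assumes that RsMergeArch is tab-separated with the retired rs number in column 1, the rs number it was merged into in column 2, and the current rs number in column 7, and that the first column of SNPHistory lists rs numbers that have been withdrawn. The function names are hypothetical; please check the column layout against the schema pages in Resources before relying on it.

```python
def load_merges(lines):
    """Map each retired rs number to the rs number it now points to.

    Assumed columns (0-based): 0 = rsHigh, 1 = rsLow, 6 = rsCurrent.
    """
    merges = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        rs_high, rs_low = fields[0], fields[1]
        # Prefer rsCurrent when present, otherwise fall back to rsLow.
        rs_current = fields[6] if len(fields) > 6 and fields[6] else rs_low
        merges[rs_high] = rs_current
    return merges


def load_withdrawn(lines):
    """Collect rs numbers from the first column of SNPHistory (assumed withdrawn)."""
    return {line.split("\t", 1)[0].strip() for line in lines if line.strip()}


def lift_rs(rs, merges, withdrawn):
    """Return the lifted rs number, or None if it is assumed withdrawn."""
    number = rs[2:] if rs.lower().startswith("rs") else rs
    seen = set()
    # Follow the merge chain; `seen` guards against accidental cycles.
    while number in merges and number not in seen:
        seen.add(number)
        number = merges[number]
    if number in withdrawn:
        return None
    return "rs" + number
```

For the real files, the lines could be read with `gzip.open("RsMergeArch.bcp.gz", "rt")` and `gzip.open("SNPHistory.bcp.gz", "rt")` and passed to the two loaders.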
   Line 81: Line 81:  
== Lift Merlin/PLINK format ==
 
In Merlin/PLINK .map files, each line contains both a genome position and a dbSNP rs number. Our goal here is to use both pieces of information to liftOver as many positions as possible.
There are 3 methods to liftOver and we recommend the first 2 methods. The first method is common, and it lifts most genome positions; however, it does not reflect the dbSNP build change. The second method is more robust in the sense that each lifted rs number has a valid genome position, as it uses dbSNP as the data source. The third method is not straightforward, and we mention it only briefly.
+
There are 3 methods to liftOver and we recommend the first 2 methods. The first method is common and applicable in most cases, and in our observations it lifts the most genome positions; however, it does not reflect the rs number changes between different dbSNP builds. The second method is more robust in the sense that each lifted rs number has a valid genome position, as it lifts over the old rs numbers as the first step by using dbSNP data. The third method is not straightforward, and we mention it only briefly.
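As an illustration of the first method, the sketch below converts .map lines into BED lines that a liftOver-style tool can take as input, and reads the lifted BED back into a lookup table. It assumes the four-column PLINK .map layout (chromosome, rs number, genetic distance, base-pair position) and 1-based positions; the function names and the chromosome code table are hypothetical, and this is not the liftMap.py script listed in Resources.

```python
# Assumed PLINK numeric codes for non-autosomal chromosomes.
PLINK_CHROM_NAMES = {"23": "X", "24": "Y", "26": "M"}


def map_line_to_bed(line):
    """Turn one PLINK .map line into a 4-column BED line.

    BED intervals are 0-based and half-open, so a 1-based position p
    becomes the interval [p - 1, p).
    """
    chrom, rs_id, _genetic_distance, pos = line.split()
    chrom = PLINK_CHROM_NAMES.get(chrom, chrom)
    pos = int(pos)
    return f"chr{chrom}\t{pos - 1}\t{pos}\t{rs_id}"


def read_lifted_bed(lines):
    """Map rs number -> (chromosome, 1-based position) from lifted BED lines."""
    lifted = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        chrom, _start, end, rs_id = fields[:4]
        lifted[rs_id] = (chrom, int(end))
    return lifted
```

Markers in the original .map file that are missing from the table returned by `read_lifted_bed` are the ones that could not be lifted by position.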
    
=== Lift Merlin format ===
Line 123: Line 123:     
Similar to the human reference build, dbSNP also has different versions. You may consider changing rs numbers from the old dbSNP version to the new dbSNP version
depending on your needs Such steps are described in [#Lift dbSNP rs numbers | Lift dbSNP rs numbers].
+
depending on your needs. Such steps are described in [[#Lift dbSNP rs numbers | Lift dbSNP rs numbers]].
    
==== Method 2 ====
Line 145: Line 145:     
==== Method 3 ====
The NCBI dbSNP team has provided a [#Resources | provisional map] for converting the genome positions of a large set of dbSNP entries from NCBI build 36 to NCBI build 37.
+
The NCBI dbSNP team has provided a [[ #Resources | provisional map ]] for converting the genome positions of a large set of dbSNP entries from NCBI build 36 to NCBI build 37.
 
In the second step, we obtained the unlifted genome positions, so we can try to use this table to convert those unlifted dbSNP entries.
 
After this step, there are still some SNPs that cannot be lifted, as they are mostly located on non-reference chromosomes.
Line 175: Line 175:  
Accordingly, we need to delete the genotypes of the SNPs that cannot be lifted.
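Removing the genotypes of unlifted SNPs could look like the sketch below. It assumes the PLINK .ped layout, with six leading columns (family ID, individual ID, father, mother, sex, phenotype) followed by two allele columns per marker in .map order. Merlin pedigree files can carry extra columns described by the .dat file, so the column arithmetic would need adjusting there. The function name and the `keep_mask` argument are hypothetical.

```python
def drop_unlifted_genotypes(ped_line, keep_mask, n_leading=6):
    """Keep only the allele pairs of markers flagged True in keep_mask.

    keep_mask has one boolean per marker, in the same order as the .map file.
    """
    fields = ped_line.split()
    head, alleles = fields[:n_leading], fields[n_leading:]
    if len(alleles) != 2 * len(keep_mask):
        raise ValueError("allele columns do not match the number of markers")
    kept = []
    for i, keep in enumerate(keep_mask):
        if keep:
            # Each marker occupies two consecutive allele columns.
            kept.extend(alleles[2 * i:2 * i + 2])
    return " ".join(head + kept)
```

The mask can be built by checking, for each marker in the original .map file, whether it appears among the successfully lifted markers.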
   −
== Various reasons that lift over could fail ==
+
== Various reasons that lift over can fail ==
    
=== Genome position cannot be lifted ===
Line 195: Line 195:  
It is possible that a new dbSNP build does not contain certain rs numbers.
 
When dbSNP releases a new build, a higher rs number may be merged into a lower rs number because both rs numbers actually refer to the same SNP.
This merge process can be complicated. For a short description, see [#Use RsMergeArch and SNPHistory | Use RsMergeArch and SNPHistory].
+
This merge process can be complicated. For a short description, see [[ #Use RsMergeArch and SNPHistory | Use RsMergeArch and SNPHistory ]].
 
For details, see:
   Line 215: Line 215:  
* SNPs that are not mapped on the reference genome (GRCh37)
   −
For UCSC release, see [#Resources | UCSC dbSNP track note]
+
For UCSC release, see [[ #Resources | UCSC dbSNP track note ]]
    
Use rs1054140 as an example:
Line 234: Line 234:     
== Resources ==
* liftRsNumber.py [[liftRsNumber.py]]
+
* liftRsNumber.py [[liftRsNumber.py]] and its internal location: /net/dumbo/net/dumbo/home/zhanxw/amd/analyze/verifyBamID/liftRsNumber.py
 
* liftMap.py [[liftMap.py]]
 
* NCBI provisional map [ftp://ftp.ncbi.nih.gov:/snp/organisms/human_9606/misc/exchange/Remap_36_3_37_1.txt.gz file] and [ftp://ftp.ncbi.nih.gov:/snp/organisms/human_9606/misc/exchange/Remap_36_3_37_1.info info]
 
* NCBI RsMergeArch [ftp://ftp.ncbi.nih.gov:/snp/organisms/human_9606/database/organism_data/RsMergeArch.bcp.gz file] and [http://www.ncbi.nlm.nih.gov/SNP/snp_db_table_description.cgi?t=RsMergeArch schema]
 
* NCBI SNPHistory [ftp://ftp.ncbi.nih.gov:/snp/organisms/human_9606/database/organism_data/SNPHistory.bcp.gz file] and [http://www.ncbi.nlm.nih.gov/SNP/snp_db_table_description.cgi?t=SNPHistory schema]
 +
* NCBI SNPChrPosOnRef build 132 [ftp://ftp.ncbi.nih.gov:/snp/organisms/human_9606/database/b132_archive/organism_data/b132_SNPChrPosOnRef_37_1.bcp.gz file] and [http://www.ncbi.nlm.nih.gov/SNP/snp_db_table_description.cgi?t=SNPChrPosOnRef schema]
 
* How UCSC dbSNP differs from NCBI dbSNP [http://genomewiki.ucsc.edu/index.php/DbSNP_Track_Notes UCSC dbSNP track note]
 
* The dbSNP mapping process [http://www.ncbi.nlm.nih.gov/books/NBK44455/ link]
Line 254: Line 255:     
Please contact [mailto:zhanxw@umich.edu Xiaowei Zhan].
 +
 +
 +