Changes

From Genome Analysis Wiki
Jump to navigationJump to search
no edit summary
Line 7: Line 7:  
=== Your Own Data ===
 
=== Your Own Data ===
   −
To get started, you will need to store your data in [[Merlin]] format pedigree and data files, one per chromosome. For details, of the Merlin file format, see the [http://www.sph.umich.edu/csg/abecasis/Merlin/tour/input_files.html Merlin tutorial].
+
To get started, you will need to store your data in [[Merlin]] format pedigree and data files, one per chromosome. For details, of the Merlin file format, see the [http://csg.sph.umich.edu/abecasis/Merlin/tour/input_files.html Merlin tutorial].
    
Within each file, markers should be stored by chromosome position. Alleles should be stored in the forward strand and can be encoded as 'A', 'C', 'G' or 'T' (there is no need to use numeric identifiers for each allele).  
 
Within each file, markers should be stored by chromosome position. Alleles should be stored in the forward strand and can be encoded as 'A', 'C', 'G' or 'T' (there is no need to use numeric identifiers for each allele).  
   −
The latest reference panel generated by the 1000 Genomes project uses NCBI Build 37 (HG 19). Make sure that your data is on Build 37 (or Minimac may ignore genotyped markers whose names have changed in build 37). If you are trying to convert your data from an earlier genome build to build 37, you'll probably find the [ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606/database/organism_data/RsMergeArch.bcp.gz dbSNP merge table] ([http://www.ncbi.nlm.nih.gov/SNP/snp_db_table_description.cgi?t=RsMergeArch table description on the NCBI website]), which logs rs# changes between dbSNP builds, and the UCSC online [http://genome.ucsc.edu/cgi-bin/hgLiftOver liftOver tool], which converts genome positions between different genome builds, to be quite useful. We have also documented ([[LiftOver | Link]]) a general procedure to convert genome positions and rs number between builds.
+
The latest reference panel generated by the 1000 Genomes project uses NCBI Build 37 (HG 19). Make sure that your data is on Build 37 (or Minimac may ignore genotyped markers whose names have changed in Build 37). If you are trying to convert your data from an earlier genome build to Build 37, you'll probably find the [ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606/database/organism_data/RsMergeArch.bcp.gz dbSNP merge table] ([http://www.ncbi.nlm.nih.gov/SNP/snp_db_table_description.cgi?t=RsMergeArch table description on the NCBI website]), which logs rs# changes between dbSNP builds, and the UCSC online [http://genome.ucsc.edu/cgi-bin/hgLiftOver liftOver tool], which converts genome positions between different genome builds, to be quite useful. We have also documented ([[LiftOver | Link]]) a general procedure to convert genome positions and rs number between builds.
   −
If you are planning to use imputation with the MetaboChip, you might find a list of SNPs whose order varies between NCBI genome build 36 and 37 convenient. Here it is: [http://www.sph.umich.edu/csg/cfuchsb/metab_order_changed.txt List of Metabochip SNPs Whose Order Changes With Build]
+
If you are planning to use imputation with the MetaboChip, you might find a list of SNPs whose order varies between NCBI genome Build 36 and 37 convenient. Here it is: [http://csg.sph.umich.edu/cfuchsb/metab_order_changed.txt List of Metabochip SNPs Whose Order Changes With Build]
    
Note: For the most recent reference panel in VCF format by default GWAS SNPs are expected to be in the chr:pos format e.g. 1:1000; otherwise, for GWAS SNPs in the rs format you have to set the --rs flag
 
Note: For the most recent reference panel in VCF format by default GWAS SNPs are expected to be in the chr:pos format e.g. 1:1000; otherwise, for GWAS SNPs in the rs format you have to set the --rs flag
Line 42: Line 42:  
=== Reference Haplotypes ===
 
=== Reference Haplotypes ===
   −
Reference haplotypes generated by the 1000 Genomes project and formatted so that they are ready for analysis are available from the [http://www.sph.umich.edu/csg/abecasis/MaCH/download/ MaCH download page]. In our hands, it is ideal to always use the most recent release since generation of additional sequence data, improvements in variant discovery, genotyping and haplotyping strategies typically result in noticeable improvements in data quality. When this page was written, the most recent release of 1000 Genomes project haplotypes was the [http://www.sph.umich.edu/csg/abecasis/MACH/download/1000G.2012-03-14.html Integrated Phase I release], including 1092 individuals. We recommend using the reduced GIANT all population panel (no monomorphic and singleton sites)
+
Reference haplotypes generated by the 1000 Genomes project and formatted so that they are ready for analysis are available from the [http://csg.sph.umich.edu/abecasis/MaCH/download/ MaCH download page]. In our hands, it is ideal to always use the most recent release since generation of additional sequence data, improvements in variant discovery, genotyping and haplotyping strategies typically result in noticeable improvements in data quality. When this page was written, the most recent release of 1000 Genomes project haplotypes was the [http://www.sph.umich.edu/csg/abecasis/MACH/download/1000G.2012-03-14.html Integrated Phase I release], including 1092 individuals. We recommend using the reduced GIANT all population panel (no monomorphic and singleton sites)
 
[ftp://share.sph.umich.edu/1000genomes/fullProject/2012.03.14/GIANT.phase1_release_v3.20101123.snps_indels_svs.genotypes.refpanel.ALL.vcf.gz.tgz GIANT.phase1_release_v3.20101123.snps_indels_svs.genotypes.refpanel.ALL.vcf.gz.tgz]
 
[ftp://share.sph.umich.edu/1000genomes/fullProject/2012.03.14/GIANT.phase1_release_v3.20101123.snps_indels_svs.genotypes.refpanel.ALL.vcf.gz.tgz GIANT.phase1_release_v3.20101123.snps_indels_svs.genotypes.refpanel.ALL.vcf.gz.tgz]
   Line 57: Line 57:  
   mach1 -d chr1.dat -p chr1.ped --rounds 20 --states 200 --phase --interim 5 --sample 5 --prefix chr$chr.haps
 
   mach1 -d chr1.dat -p chr1.ped --rounds 20 --states 200 --phase --interim 5 --sample 5 --prefix chr$chr.haps
   −
This will request that MaCH estimate haplotypes for your sample, using 20 iterations of its Markov sampler and conditioning each update on up to 200 haplotypes. A summary description of these parameters follows (but for a more complete description, you should go to the [http://www.sph.umich.edu/csg/abecasis/MaCH/ MaCH website]):
+
This will request that MaCH estimate haplotypes for your sample, using 20 iterations of its Markov sampler and conditioning each update on up to 200 haplotypes. A summary description of these parameters follows (but for a more complete description, you should go to the [http://csg.sph.umich.edu/abecasis/MaCH/ MaCH website]):
    
{| class="wikitable" border="1" cellpadding="2"
 
{| class="wikitable" border="1" cellpadding="2"
Line 65: Line 65:  
|-  
 
|-  
 
|style=white-space:nowrap|<code>-d sample.dat</code>
 
|style=white-space:nowrap|<code>-d sample.dat</code>
| Data file in [http://www.sph.umich.edu/csg/abecasis/Merlin/tour/input_files.html Merlin format]. Markers should be listed according to their order along the chromosome.
+
| Data file in [http://csg.sph.umich.edu/abecasis/Merlin/tour/input_files.html Merlin format]. Markers should be listed according to their order along the chromosome.
 
|-  
 
|-  
 
| <code>-p sample.ped</code>
 
| <code>-p sample.ped</code>
| Pedigree file in [http://www.sph.umich.edu/csg/abecasis/Merlin/tour/input_files.html Merlin format]. Alleles should be labeled on the forward strand.
+
| Pedigree file in [http://csg.sph.umich.edu/abecasis/Merlin/tour/input_files.html Merlin format]. Alleles should be labeled on the forward strand.
 
|-
 
|-
 
| <code>--states 200</code>
 
| <code>--states 200</code>
Line 120: Line 120:  
|-  
 
|-  
 
| <code>--refHaps ref.hap.gz </code>  
 
| <code>--refHaps ref.hap.gz </code>  
| Reference haplotypes (e.g. from [http://www.sph.umich.edu/csg/abecasis/MACH/download/1000G-2010-06.html MaCH download page])
+
| Reference haplotypes (e.g. from [http://csg.sph.umich.edu/abecasis/MACH/download/1000G-2010-06.html MaCH download page])
 
|-
 
|-
 
| <code>--vcfReference </code>  
 
| <code>--vcfReference </code>  
550

edits

Navigation menu