
From Genome Analysis Wiki
428 bytes added, 17:36, 13 January 2012
This is the new two-step procedure we recommend, particularly for people who perform imputation multiple times (for example, using HapMap as the reference, or using updated releases of the 1000 Genomes data as the reference). <br>
 
The first step is a pre-phasing step using MaCH. This step does not require an external reference panel. It is time-consuming, but it is a one-time investment. For computational reasons, we recommend breaking the genome into small overlapping segments ([http://genome.sph.umich.edu/wiki/MaCH_FAQ#Divide_and_Conquer Divide-and-Conquer]) for this step. In general, we recommend an overlapping region of more than 500 kb on each side. For example, for the Affymetrix 6.0 panel, a 10 Mb core region with a 1 Mb flanking (overlapping) region on each side corresponds to ~3,500 SNPs in the core region and ~350 SNPs in each flank, or ~4,200 SNPs per job. For 2,000 individuals, one such job run with --states 200 and -r 50 takes ~40 hours. For other combinations, use the following link to estimate computing time: [http://www.sph.umich.edu/csg/yli/MaCH-Admix/runtime.php#est runtime estimate]. <br>
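The divide-and-conquer layout described in the paragraph above can be sketched as follows. This is a minimal Python illustration, not part of MaCH: the helper names `make_chunks` and `snps_in_window`, the 1-based inclusive coordinates, and the clipping of flanks at the chromosome ends are assumptions made for the example. It tiles a chromosome into 10 Mb cores padded by 1 Mb flanks (the example values quoted above) and counts how many markers each job would contain, given a sorted list of SNP positions. Extracting the genotypes for each window from the input files is not shown.

```python
from bisect import bisect_left, bisect_right
from typing import List, NamedTuple, Sequence


class Chunk(NamedTuple):
    """One hypothetical divide-and-conquer job (1-based, inclusive coordinates)."""
    core_start: int  # first base pair of the core region
    core_end: int    # last base pair of the core region
    start: int       # core_start minus the flank, clipped at the chromosome start
    end: int         # core_end plus the flank, clipped at the chromosome end


def make_chunks(chrom_length: int,
                core: int = 10_000_000,
                flank: int = 1_000_000) -> List[Chunk]:
    """Tile a chromosome with non-overlapping cores, each padded by flanks.

    Away from the chromosome ends, adjacent jobs overlap by 2 * flank bp,
    because every core is extended by `flank` bp on both sides.
    """
    if chrom_length <= 0 or core <= 0 or flank < 0:
        raise ValueError("chrom_length and core must be positive, flank non-negative")
    chunks: List[Chunk] = []
    core_start = 1
    while core_start <= chrom_length:
        core_end = min(core_start + core - 1, chrom_length)
        chunks.append(Chunk(core_start=core_start,
                            core_end=core_end,
                            start=max(1, core_start - flank),
                            end=min(chrom_length, core_end + flank)))
        core_start = core_end + 1
    return chunks


def snps_in_window(positions: Sequence[int], start: int, end: int) -> int:
    """Count sorted SNP positions that fall in [start, end]."""
    return bisect_right(positions, end) - bisect_left(positions, start)


if __name__ == "__main__":
    # Made-up 25 Mb chromosome with one SNP every 2 kb.
    positions = list(range(1, 25_000_001, 2_000))
    for c in make_chunks(25_000_000):
        total = snps_in_window(positions, c.start, c.end)
        in_core = snps_in_window(positions, c.core_start, c.core_end)
        print(f"{c.start}-{c.end} (core {c.core_start}-{c.core_end}): "
              f"{total} SNPs, {in_core} in core")
```

The evenly spaced positions in the demo are invented; with real Affymetrix 6.0 positions, the per-job count for an interior segment should be roughly the ~4,200 SNPs quoted above (about 3,500 in the core plus about 350 in each flank).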
 
The second step is the actual imputation step using minimac. This step can be run on whole chromosomes. Imputing one million markers for 1,000 individuals using 100 reference haplotypes takes ~1 hour, and computing time increases linearly with each of these three parameters (number of markers, number of individuals and number of reference haplotypes). See [http://genome.sph.umich.edu/wiki/Minimac minimac] for details.
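The scaling rule in the paragraph above can be turned into a back-of-the-envelope estimate. The sketch below assumes that "increases linearly" means run time is directly proportional to each of the three quantities, with no fixed overhead; the function name `estimate_minimac_hours` is hypothetical, and the 1-hour baseline is simply the figure quoted on this page, so actual times will depend on hardware and minimac version.

```python
# Reference point quoted on this page (assumed here to scale proportionally):
# 1,000,000 markers x 1,000 individuals x 100 reference haplotypes ~ 1 hour.
BASE_MARKERS = 1_000_000
BASE_INDIVIDUALS = 1_000
BASE_REF_HAPLOTYPES = 100
BASE_HOURS = 1.0


def estimate_minimac_hours(markers: int, individuals: int, ref_haplotypes: int) -> float:
    """Rough minimac run time, scaled linearly in each of the three inputs.

    Hypothetical helper: ignores start-up costs, I/O and hardware
    differences, so treat the result as an order-of-magnitude guide only.
    """
    for name, value in (("markers", markers),
                        ("individuals", individuals),
                        ("ref_haplotypes", ref_haplotypes)):
        if value <= 0:
            raise ValueError(f"{name} must be positive, got {value}")
    return (BASE_HOURS
            * (markers / BASE_MARKERS)
            * (individuals / BASE_INDIVIDUALS)
            * (ref_haplotypes / BASE_REF_HAPLOTYPES))


if __name__ == "__main__":
    # Hypothetical job: 2.5 million markers, 2,000 individuals, 500 reference haplotypes.
    print(estimate_minimac_hours(2_500_000, 2_000, 500))  # 25.0
```

Under the same assumption, doubling the number of reference haplotypes in that example from 500 to 1,000 would double the estimate to ~50 hours.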
 
=== MaCH-Admix ===
If you are imputing only once (think twice about whether this is really true) or are under immediate time pressure, you can use MaCH-Admix, which does not require pre-phased data and takes ~1/7 of the computing time of the original MaCH. For large datasets, we recommend breaking the genome into small overlapping segments ([http://genome.sph.umich.edu/wiki/MaCH_FAQ#Divide_and_Conquer Divide-and-Conquer]).
    
=== Divide and Conquer ===
 