Changes

From Genome Analysis Wiki
no edit summary
Line 2: Line 2:  
* [[MaCH]] (concurrent phasing approach).
 
* [[MaCH]] (concurrent phasing approach).
 
OR
 
OR
−
* [[Minimac]] (pre-phasing approach).
+
* [[Minimac]] (pre-phasing / 2-step approach).  
      Line 32: Line 32:     
Reference haplotypes generated by the 1000 Genomes Project and formatted so that they are ready for analysis are available from the [http://www.sph.umich.edu/csg/abecasis/MaCH/download/1000G-2010-08.html MaCH download page]. In our hands, this August 2010 release is substantially better than previous 1000 Genomes Project genotype call sets.
 
Reference haplotypes generated by the 1000 Genomes Project and formatted so that they are ready for analysis are available from the [http://www.sph.umich.edu/csg/abecasis/MaCH/download/1000G-2010-08.html MaCH download page]. In our hands, this August 2010 release is substantially better than previous 1000 Genomes Project genotype call sets.
 +
    
== MaCH Imputation ==
 
== MaCH Imputation ==
      
=== Estimating Model Parameters ===
 
=== Estimating Model Parameters ===
Line 119: Line 119:
−
== minimac Imputation ==
+
== Pre-phasing / 2-Step Imputation ==
    
=== Pre-phasing - MaCH ===
 
=== Pre-phasing - MaCH ===
   −
A typical MaCH command line to estimate phased haplotypes might look like this:
+
Pre-phasing / 2-Step imputation starts with the pre-phasing of your genotypes using MaCH. A typical MaCH command line to estimate phased haplotypes might look like this:
   −
   mach1 -d sample.dat -p sample.ped --rounds 20 --states 200 --phase --interim 5 --sample 5 --compact
+
   mach1 -d sample.dat -p sample.ped --rounds 20 --states 200 --phase --interim 5 --sample 5
   −
This will request that MaCH estimate haplotypes for your sample, using 20 iterations of its Markov sampler and conditioning each update on up to 200 haplotypes. A summary description of these parameters follows (but for a more complete description, you should go to the MaCH website):
+
This will request that MaCH estimate haplotypes for your sample, using 20 iterations of its Markov sampler and conditioning each update on up to 200 haplotypes.  
 +
A summary description of these parameters follows (but for a more complete description, you should go to the MaCH website):
    
{| class="wikitable" border="1" cellpadding="2"
 
{| class="wikitable" border="1" cellpadding="2"
Line 141: Line 142:  
|-
 
|-
 
| <code>--states 200</code>
 
| <code>--states 200</code>
−
| Number of haplotypes to consider during each update. Increasing this value will typically lead to better haplotypes, but can dramatically increase computing time and memory use. A value of 100 - 400 is typical.
+
| Number of haplotypes to consider during each update. Increasing this value will typically lead to better haplotypes, but can dramatically increase computing time and memory use. A value of 200 - 400 is typical.  
 
|-
 
|-
 
| <code>--rounds 20</code>
 
| <code>--rounds 20</code>
−
| Iterations of the Markov sampler to use for haplotyping. Typically, using 20 - 100 rounds should give good results. To obtain better results, it is usually better to increase the <code>--states</code> parameter.
+
| Iterations of the Markov sampler to use for haplotyping. Typically, using 20-30 rounds should give good results. To obtain better results, it is usually better to increase the <code>--states</code> parameter.
 
|-
 
|-
 
| <code>--interim 5</code>
 
| <code>--interim 5</code>
Line 156: Line 157:  
|-
 
|-
 
| <code>--compact</code>
 
| <code>--compact</code>
−
| Reduce memory use at the cost of approximately doubling runtime. This option is recommended for most GWAS scale datasets and computing platforms.
+
| Reduce memory use at the cost of approximately doubling runtime.
 
|}
 
|}
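
The pre-phasing command line and the options in the table above can be assembled by a small shell helper, sketched below. The function name <code>build_mach_cmd</code> and its argument order are hypothetical and are not part of MaCH; only the flags shown in the command line and table above are used. The helper prints the command instead of running it, so it can be checked without MaCH installed.

```shell
#!/bin/sh
# Hypothetical helper (not part of MaCH): assembles a pre-phasing command
# from the flags documented above and prints it instead of executing it.
#   $1 = data file         (e.g. sample.dat)
#   $2 = pedigree file     (e.g. sample.ped)
#   $3 = --rounds value    (iterations of the Markov sampler)
#   $4 = --states value    (haplotypes considered during each update)
#   $5 = optional; pass "compact" to append --compact
build_mach_cmd() {
    cmd="mach1 -d $1 -p $2 --rounds $3 --states $4 --phase --interim 5 --sample 5"
    # --compact reduces memory use at the cost of roughly doubling runtime
    if [ "${5:-}" = "compact" ]; then
        cmd="$cmd --compact"
    fi
    printf '%s\n' "$cmd"
}

# Print the command for the example dataset used on this page
build_mach_cmd sample.dat sample.ped 20 200
```

Once the printed command looks right, it can be pasted into a terminal or passed to a job scheduler. Passing <code>compact</code> as the fifth argument appends <code>--compact</code> for cases where memory is the limiting resource.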
  