Changes

817 bytes removed ,  02:37, 16 September 2014
no edit summary
Line 16: Line 16:  
A binary Linux (64 bit) version of minimac2 is available [  from here] and source code [http://www.sph.umich.edu/csg/cfuchsb/minimac.src.tgz  from here]
 
A binary Linux (64 bit) version of minimac2 is available [  from here] and source code [http://www.sph.umich.edu/csg/cfuchsb/minimac.src.tgz  from here]
   −
The current version of minimac should be stamped  - if your version shows a different version number or date stamp when it runs, it is not current.  
+
The current version of minimac2 should be stamped 2014.9.15. If your version shows a different version number or date stamp when it runs, it is not current.
    
If you use this version, please be sure to stop by the [http://www.sph.umich.edu/csg/abecasis/MaCH/download/ MaCH download page] and fill out the registration form, so that we can let you know when an official release is available and keep you updated with respect to any bug fixes.  
 
If you use this version, please be sure to stop by the [http://www.sph.umich.edu/csg/abecasis/MaCH/download/ MaCH download page] and fill out the registration form, so that we can let you know when an official release is available and keep you updated with respect to any bug fixes.  
Line 22: Line 22:  
== Multiprocessor Version ==
 
== Multiprocessor Version ==
   −
The current version of minimac comes in two flavours, <code>minimac</code> and <code>minimac-omp</code>. The latter version uses the [[OpenMP]] API to support multi-threading, resulting in faster throughput.
+
Minimac2 comes in two flavours, <code>minimac2</code> and <code>minimac2-omp</code>. The latter version uses the [[OpenMP]] API to support multi-threading, resulting in faster throughput.
   −
BE AWARE: because this version of minimac runs in parallel, the order of samples in the output files (*dose, *haps, ...) will vary between runs. Chunks therefore have to be merged by sample ID, not by row position.
+
BE AWARE: because this version of minimac2 runs in parallel, the order of samples in the output files (*dose, *haps, ...) will vary between runs. Chunks therefore have to be merged by sample ID, not by row position.
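Merging by sample ID rather than by row position can be sketched as follows. This is a minimal illustration in Python and not part of minimac2: it assumes each chunk has already been parsed into a mapping from sample ID to a list of dosages, and the helper name <code>merge_chunks_by_sample</code> and the sample IDs are hypothetical.

```python
def merge_chunks_by_sample(chunks):
    """Concatenate per-chunk dosages for each sample, keyed by sample ID.

    `chunks` is a list of dicts, one per chunk in genomic order, each
    mapping sample ID -> list of dosages. Row order inside a chunk does
    not matter because every lookup is by ID. (Hypothetical helper.)
    """
    if not chunks:
        return {}
    sample_ids = set(chunks[0])
    # Refuse to merge if any chunk is missing a sample or has extras.
    for i, chunk in enumerate(chunks[1:], start=1):
        if set(chunk) != sample_ids:
            raise ValueError(f"chunk {i} has a different set of sample IDs")
    merged = {}
    for sample in sorted(sample_ids):
        merged[sample] = []
        for chunk in chunks:
            merged[sample].extend(chunk[sample])
    return merged


# Two chunks whose samples appear in different orders (made-up values).
chunk1 = {"FAM1->IND1": [0.1, 1.9], "FAM2->IND2": [1.0, 0.0]}
chunk2 = {"FAM2->IND2": [2.0], "FAM1->IND1": [0.5]}
merged = merge_chunks_by_sample([chunk1, chunk2])
```

The check on the set of sample IDs is a deliberate choice: silently padding or dropping samples would hide a chunk that failed or was run on a different sample list.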
    
== Change log ==
 
== Change log ==
Line 34: Line 34:     
= Performance =
 
= Performance =
  −
== Pre-phasing ==
  −
For the pre-phasing step, cost increases quadratically with the number of states and linearly with the number of rounds. The following table provides a simple example.
  −
  −
{| class="wikitable" border="1" cellpadding="2"
  −
|- bgcolor="lightgray"
  −
! States
  −
! Cost per round
  −
|-
  −
| 100 states
  −
| 3 min
  −
|-
  −
| 200 states
  −
| 12 min = (3 min * 2<sup>2</sup>)
  −
|-
  −
| 400 states
  −
| 48 min = (3 min * 4<sup>2</sup>)
  −
|-
  −
| 500 states
  −
| 75 min = (3 min * 5<sup>2</sup>)
  −
|}
  −
  −
So, in this case running haplotyping with 500 states and 10 rounds would require 75 min * 10 = 750 min.
  −
  −
Typically, haplotype quality improves rapidly with the number of states but only slowly with the number of rounds. We recommend running ~20 rounds of the MaCH haplotyper and selecting a number of states as high as your patience will allow (but ideally greater than 200).
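The scaling rule in the table above can be written out as a short calculation. This is a rough sketch in Python, not a benchmark: the baseline of 3 minutes per round at 100 states is taken from the example table and will differ on other data sets and hardware, and the function name <code>prephasing_minutes</code> is hypothetical.

```python
def prephasing_minutes(states, rounds, base_states=100, base_minutes=3.0):
    """Estimate pre-phasing time in minutes.

    Cost is quadratic in the number of states and linear in the number
    of rounds. The defaults (3 min per round at 100 states) come from
    the example table, not from a measurement on your data.
    """
    per_round = base_minutes * (states / base_states) ** 2
    return per_round * rounds


# 500 states, 10 rounds: 3 min * 5^2 * 10 rounds
total = prephasing_minutes(500, 10)
```

With the table's baseline, 500 states and 10 rounds give 750 minutes, matching the worked example.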
      
== Imputation ==
 
== Imputation ==
      −
A good rule of thumb is that minimac2 should take about 1 hour to impute 1,000,000 markers in 1,000 individuals using a reference panel with 1,000 haplotypes. Performance should scale linearly with respect to all these factors. So, your approximate computing time in hours should be about:
+
A good rule of thumb is that minimac2 should take about 1 hour to impute 1,000,000 markers in 1,000 individuals using a reference panel with 1,000 haplotypes (>10x faster than [[minimac]]). Performance should scale linearly with respect to all these factors. So, your approximate computing time in hours should be about:
    
:<math>
 
:<math>
Line 135: Line 110:  
== Imputation into Phased Haplotypes ==
 
== Imputation into Phased Haplotypes ==
   −
Imputing genotypes using '''minimac''' is an easy and straightforward process: after selecting a set of reference haplotypes, plugging in the target haplotypes from the previous step and setting the number of rounds to use for the model parameter estimation, imputation should proceed rapidly.
+
Imputing genotypes using '''minimac2''' is an easy and straightforward process: after selecting a set of reference haplotypes, plugging in the target haplotypes from the previous step and setting the number of rounds to use for the model parameter estimation, imputation should proceed rapidly.
    
=== Creating SNP List File ===
 
=== Creating SNP List File ===
   −
'''Minimac''' requires a file listing the markers in the haplotype file. This file can be easily generated by extracting the second column from the .dat file. On a standard Unix system, a command like this should do:
+
'''Minimac2''' requires a file listing the markers in the haplotype file. This file can be easily generated by extracting the second column from the .dat file. On a standard Unix system, a command like this should do:
    
   cut -f 2 -d " " sample.dat > target.snps
 
   cut -f 2 -d " " sample.dat > target.snps
   −
=== Running Minimac ===
+
=== Running Minimac2 ===
   −
A typical minimac command line might look like this:
+
A typical minimac2 command line might look like this:
    
==== using a VCF reference panel  ====
 
==== using a VCF reference panel  ====
   −
   minimac --vcfReference --refHaps ref.vcf.gz --haps target.hap.gz --snps target.snps.gz --rounds 5 --states 200 --prefix results
+
   minimac2 --vcfReference --refHaps ref.vcf.gz --haps target.hap.gz --snps target.snps.gz --rounds 5 --states 200 --prefix results
    
'''Note''': GWAS SNPs (file --snps target.snps.gz) are by default expected to be in the chr:pos format e.g. 1:1000 and on build37/hg19; otherwise, please set the --rs flag
 
'''Note''': GWAS SNPs (file --snps target.snps.gz) are by default expected to be in the chr:pos format e.g. 1:1000 and on build37/hg19; otherwise, please set the --rs flag
   −
A detailed description of all minimac options is available [[Minimac Command Reference|elsewhere]]. Here is a brief description of the above parameters:
+
A detailed description of all minimac2 options is available [[Minimac Command Reference|elsewhere]]. Here is a brief description of the above parameters:
    
{| class="wikitable" border="1" cellpadding="2"
 
{| class="wikitable" border="1" cellpadding="2"
Line 201: Line 176:     
=== Imputation quality evaluation ===
 
=== Imputation quality evaluation ===
−
To evaluate imputation quality, Minimac hides data for each genotyped SNP in turn and calculates 3 statistics:
+
To evaluate imputation quality, Minimac2 hides data for each genotyped SNP in turn and calculates 3 statistics:
 
* looRSQ - this is the estimated rsq for that SNP (as if SNP weren't typed).  
 
* looRSQ - this is the estimated rsq for that SNP (as if SNP weren't typed).  
 
* empR - this is the empirical correlation between true and imputed genotypes for the SNP. If this is negative, the SNP is probably flipped.  
 
* empR - this is the empirical correlation between true and imputed genotypes for the SNP. If this is negative, the SNP is probably flipped.  
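The idea behind empR can be illustrated with a small calculation. This is a sketch in Python under stated assumptions, not minimac2's own code: it assumes the "empirical correlation" is the Pearson correlation between the true genotypes (coded 0, 1, 2) and the imputed dosages, and the function name <code>empirical_r</code> and the example values are hypothetical.

```python
import math


def empirical_r(true_genotypes, imputed_dosages):
    """Pearson correlation between true genotypes and imputed dosages.

    Assumed definition for illustration only. Returns NaN when either
    input has zero variance, since the correlation is then undefined.
    """
    n = len(true_genotypes)
    if n != len(imputed_dosages) or n < 2:
        raise ValueError("need two equal-length inputs with at least 2 values")
    mean_t = sum(true_genotypes) / n
    mean_d = sum(imputed_dosages) / n
    cov = sum((t - mean_t) * (d - mean_d)
              for t, d in zip(true_genotypes, imputed_dosages))
    var_t = sum((t - mean_t) ** 2 for t in true_genotypes)
    var_d = sum((d - mean_d) ** 2 for d in imputed_dosages)
    if var_t == 0 or var_d == 0:
        return float("nan")
    return cov / math.sqrt(var_t * var_d)


# Made-up example: dosages that track the opposite allele.
r = empirical_r([0, 1, 2, 2], [1.9, 1.1, 0.2, 0.1])
possibly_flipped = r < 0
```

A negative value means the imputed dosages rise as the true allele count falls, which is what one would expect if the alleles of the SNP are swapped between the study data and the reference panel.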