Line 16: |
Line 16: |
| A binary Linux (64 bit) version of minimac2 is available [ from here] and source code [http://www.sph.umich.edu/csg/cfuchsb/minimac.src.tgz from here] | | A binary Linux (64 bit) version of minimac2 is available [ from here] and source code [http://www.sph.umich.edu/csg/cfuchsb/minimac.src.tgz from here] |
| | | |
− | The current version of minimac should be stamped - if your version shows a different version number or date stamp when it runs, it is not current. | + | The current version of minimac2 should be stamped 2014.9.15 - if your version shows a different version number or date stamp when it runs, it is not current. |
| | | |
| If you use this version, please be sure to stop by the [http://www.sph.umich.edu/csg/abecasis/MaCH/download/ MaCH download page] and fill out the registration form, so that we can let you know when an official release is available and keep you updated with respect to any bug fixes. | | If you use this version, please be sure to stop by the [http://www.sph.umich.edu/csg/abecasis/MaCH/download/ MaCH download page] and fill out the registration form, so that we can let you know when an official release is available and keep you updated with respect to any bug fixes. |
Line 22: |
Line 22: |
| == Multiprocessor Version == | | == Multiprocessor Version == |
| | | |
− | The current version of minimac comes in two flavours, <code>minimac</code> and <code>minimac-omp</code>. The latter uses [[OpenMP]] to support multi-threading, resulting in faster throughput. | + | Minimac2 comes in two flavours, <code>minimac2</code> and <code>minimac2-omp</code>. The latter uses [[OpenMP]] to support multi-threading, resulting in faster throughput. |
| | | |
− | BE AWARE: because this version of minimac runs in parallel, the order of samples in the output files (*dose, *haps, ...) will vary between runs. Chunks therefore have to be merged by sample ID, not by row position. | + | BE AWARE: because this version of minimac2 runs in parallel, the order of samples in the output files (*dose, *haps, ...) will vary between runs. Chunks therefore have to be merged by sample ID, not by row position. |
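As a minimal sketch of merging by sample ID rather than by row order, the Python below joins per-chunk rows on their first field. It assumes each chunk has already been read into rows of the form <code>[sample_id, value_1, value_2, ...]</code>; the function name <code>merge_chunks_by_sample</code> and this in-memory layout are hypothetical and not part of minimac2.

```python
from typing import Dict, List, Sequence


def merge_chunks_by_sample(
    chunks: Sequence[Sequence[Sequence[str]]],
) -> List[List[str]]:
    """Join chunk rows on sample ID (first field), ignoring row order.

    Hypothetical helper: each chunk is a list of rows, and each row is
    [sample_id, value_1, value_2, ...]. Output order follows the first chunk.
    """
    if not chunks:
        return []

    merged: Dict[str, List[str]] = {}
    order: List[str] = []
    for row in chunks[0]:
        sample_id = row[0]
        if sample_id in merged:
            raise ValueError(f"duplicate sample ID in first chunk: {sample_id}")
        merged[sample_id] = list(row[1:])
        order.append(sample_id)

    for chunk in chunks[1:]:
        # Index this chunk by sample ID so its row order does not matter.
        by_id = {row[0]: row[1:] for row in chunk}
        if len(by_id) != len(chunk) or set(by_id) != set(merged):
            raise ValueError("chunks do not contain the same set of samples")
        for sample_id in order:
            merged[sample_id].extend(by_id[sample_id])

    return [[sample_id] + merged[sample_id] for sample_id in order]
```

Raising an error when the sample sets differ is a design choice for this sketch: silently dropping or padding samples would hide exactly the kind of mismatch the warning above is about.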
| | | |
| == Change log == | | == Change log == |
Line 34: |
Line 34: |
| | | |
| = Performance = | | = Performance = |
− |
| |
− | == Pre-phasing ==
| |
− | For the pre-phasing step, cost increases quadratically with the number of states and linearly with the number of rounds. The following table provides a simple example.
| |
− |
| |
− | {| class="wikitable" border="1" cellpadding="2"
| |
− | |- bgcolor="lightgray"
| |
− | ! States
| |
− | ! Cost per round
| |
− | |-
| |
− | | 100 states
| |
− | | 3 min
| |
− | |-
| |
− | | 200 states
| |
− | | 12 min = (3 min * 2<sup>2</sup>)
| |
− | |-
| |
− | | 400 states
| |
− | | 48 min = (3 min * 4<sup>2</sup>)
| |
− | |-
| |
− | | 500 states
| |
− | | 75 min = (3 min * 5<sup>2</sup>)
| |
− | |}
| |
− |
| |
− | So, in this case running haplotyping with 500 states and 10 rounds would require 75 min * 10 = 750 min.
| |
− |
| |
− | Typically, haplotype quality improves rapidly with the number of states but only slowly with the number of rounds. We recommend running ~20 rounds of the MaCH haplotyper and selecting a number of states as high as your patience will allow (but ideally greater than 200).
| |
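The arithmetic in the pre-phasing table can be sketched as a small Python function. It takes the 100-state, 3-minute row of the table as its baseline; the function name and the default arguments are illustrative assumptions, and real timings will depend on the data set and hardware.

```python
def prephasing_minutes(
    states: int,
    rounds: int,
    base_states: int = 100,
    base_minutes: float = 3.0,
) -> float:
    """Estimate pre-phasing time in minutes.

    Cost per round grows quadratically with the number of states and the
    total grows linearly with the number of rounds. The baseline
    (100 states -> 3 min per round) is the example from the table above.
    """
    per_round = base_minutes * (states / base_states) ** 2
    return per_round * rounds


# 500 states and 10 rounds, as in the example above.
print(prephasing_minutes(500, 10))
```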
| | | |
| == Imputation == | | == Imputation == |
| | | |
| | | |
− | A good rule of thumb is that minimac2 should take about 1 hour to impute 1,000,000 markers in 1,000 individuals using a reference panel with 1,000 haplotypes. Run time should scale linearly with each of these factors, so your approximate computing time in hours should be about: | + | A good rule of thumb is that minimac2 should take about 1 hour to impute 1,000,000 markers in 1,000 individuals using a reference panel with 1,000 haplotypes (>10x faster than [[minimac]]). Run time should scale linearly with each of these factors, so your approximate computing time in hours should be about: |
| | | |
| :<math> | | :<math> |
Line 135: |
Line 110: |
| == Imputation into Phased Haplotypes == | | == Imputation into Phased Haplotypes == |
| | | |
− | Imputing genotypes using '''minimac''' is a straightforward process: after selecting a set of reference haplotypes, plugging in the target haplotypes from the previous step and setting the number of rounds to use for the model parameter estimation, imputation should proceed rapidly. | + | Imputing genotypes using '''minimac2''' is a straightforward process: after selecting a set of reference haplotypes, plugging in the target haplotypes from the previous step and setting the number of rounds to use for the model parameter estimation, imputation should proceed rapidly. |
| | | |
| === Creating SNP List File === | | === Creating SNP List File === |
| | | |
− | '''Minimac''' requires a file listing markers in the haplotype file. This file can be easily generated by extracting the second column from the .dat file. In a standard Unix system, a command like this should do: | + | '''Minimac2''' requires a file listing markers in the haplotype file. This file can be easily generated by extracting the second column from the .dat file. In a standard Unix system, a command like this should do: |
| | | |
| cut -f 2 -d " " sample.dat > target.snps | | cut -f 2 -d " " sample.dat > target.snps |
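Where <code>cut</code> is not available, the same extraction can be sketched in Python. This is only an illustration: the function name <code>second_column</code> is hypothetical, and unlike <code>cut</code>, which passes through lines containing no delimiter, this sketch skips lines that have fewer than two space-separated fields.

```python
from typing import Iterable, List


def second_column(lines: Iterable[str]) -> List[str]:
    """Return the second space-delimited field of each line.

    Lines with fewer than two fields are skipped (a deliberate difference
    from `cut`, which would print such lines unchanged).
    """
    names: List[str] = []
    for line in lines:
        fields = line.rstrip("\n").split(" ")
        if len(fields) >= 2:
            names.append(fields[1])
    return names
```

To mirror the command above, the function could be applied to the open <code>sample.dat</code> file and the returned names written one per line to <code>target.snps</code>.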
| | | |
− | === Running Minimac === | + | === Running Minimac2 === |
| | | |
− | A typical minimac command line might look like this: | + | A typical minimac2 command line might look like this: |
| | | |
| ==== using a VCF reference panel ==== | | ==== using a VCF reference panel ==== |
− | minimac --vcfReference --refHaps ref.vcf.gz --haps target.hap.gz --snps target.snps.gz --rounds 5 --states 200 --prefix results | + | minimac2 --vcfReference --refHaps ref.vcf.gz --haps target.hap.gz --snps target.snps.gz --rounds 5 --states 200 --prefix results |
| | | |
| '''Note''': GWAS SNPs (file --snps target.snps.gz) are expected by default to be in chr:pos format (e.g. 1:1000) and on build37/hg19; otherwise, please set the --rs flag. | | '''Note''': GWAS SNPs (file --snps target.snps.gz) are expected by default to be in chr:pos format (e.g. 1:1000) and on build37/hg19; otherwise, please set the --rs flag. |
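Before running imputation it can help to check which marker names do not look like chr:pos. The Python below is a rough sketch of such a check; the regular expression (a chromosome label without colons or whitespace, a colon, then an integer position) and the function name <code>not_chr_pos</code> are assumptions made for illustration, and the check says nothing about whether positions are on build37/hg19.

```python
import re
from typing import Iterable, List

# Assumed shape of a chr:pos name: <chromosome label>:<integer position>.
CHR_POS = re.compile(r"[^:\s]+:[0-9]+")


def not_chr_pos(ids: Iterable[str]) -> List[str]:
    """Return the marker names that do not match the assumed chr:pos shape."""
    return [name for name in ids if not CHR_POS.fullmatch(name.strip())]
```

If this returns names such as rs identifiers, that suggests either renaming the markers or setting the --rs flag mentioned in the note.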
| | | |
− | A detailed description of all minimac options is available [[Minimac Command Reference|elsewhere]]. Here is a brief description of the above parameters: | + | A detailed description of all minimac2 options is available [[Minimac Command Reference|elsewhere]]. Here is a brief description of the above parameters: |
| | | |
| {| class="wikitable" border="1" cellpadding="2" | | {| class="wikitable" border="1" cellpadding="2" |
Line 201: |
Line 176: |
| | | |
| === Imputation quality evaluation === | | === Imputation quality evaluation === |
− | To evaluate imputation quality, Minimac hides data for each genotyped SNP in turn and calculates 3 statistics: | + | To evaluate imputation quality, Minimac2 hides data for each genotyped SNP in turn and calculates 3 statistics: |
| * looRSQ - this is the estimated rsq for that SNP (as if the SNP weren't typed). | | * looRSQ - this is the estimated rsq for that SNP (as if the SNP weren't typed). |
| * empR - this is the empirical correlation between true and imputed genotypes for the SNP. If this is negative, the SNP is probably flipped. | | * empR - this is the empirical correlation between true and imputed genotypes for the SNP. If this is negative, the SNP is probably flipped. |