Changes

Minimac2 (view source)

Revision as of 07:19, 4 September 2014

2,866 bytes removed, 07:19, 4 September 2014

no edit summary

Line 1: Line 1:

−

'''minimac2''' is ~~a low memory, computationally efficient implementation~~ of ~~the MaCH algorithm for genotype imputation~~. It is designed to work on phased genotypes and can handle very large reference panels with hundreds or thousands of haplotypes. The name has two parts. The first, "mini", refers to the modest amount of computational resources it requires. The second, "mac", is shorthand for [[MaCH]], our widely used algorithm for genotype imputation.

+

'''minimac2''' is an improved version of minimac. It is designed to work on phased genotypes and can handle very large reference panels with hundreds or thousands of haplotypes. The name has two parts. The first, "mini", refers to the modest amount of computational resources it requires. The second, "mac", is shorthand for [[MaCH]], our widely used algorithm for genotype imputation.

−

There are several minimac related pages on this wiki. The major ones are:

+

There are several minimac(2) related pages on this wiki. The major ones are:

* [[Minimac]] - This page, the main minimac page.

Line 14: Line 14:

= Download =

−

A binary Linux (64-bit) version of minimac is available [http://www.sph.umich.edu/csg/cfuchsb/minimac-beta-2013.7.17.tgz from here], and the source code is available [http://www.sph.umich.edu/csg/cfuchsb/minimac.src.tgz from here].

−

~~The current version of minimac should be stamped 2013.7.17 - if your version shows a different version number or date stamp when it runs, it is not current.~~

−

If you use this beta version, please be sure to stop by the [http://www.sph.umich.edu/csg/abecasis/MaCH/download/ MaCH download page] and fill out the registration form, so that we can let you know when an official release is available and keep you updated with respect to any bug fixes.

== Multiprocessor Version ==

−

The current version of minimac comes in two flavours, <code>minimac</code> and <code>minimac-omp</code>. The latter version uses the [[OpenMP]] protocol to support multi-threading, resulting in faster throughput.

−

~~BE AWARE: because this version of minimac runs in parallel, the order of samples in the output files (*dose, *haps, ...) can vary between runs. Output from separate runs, such as chunks, therefore has to be merged by sample ID rather than by row position.~~
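
The warning above about sample order can be illustrated with a minimal sketch. It assumes each chunk has already been parsed into a dictionary mapping sample ID to that chunk's list of dosages. The function name <code>merge_chunks_by_sample_id</code> and this in-memory layout are hypothetical and are not part of minimac or its file formats.

```python
def merge_chunks_by_sample_id(chunks):
    """Concatenate per-chunk dosages for each sample, matching rows by sample ID.

    `chunks` is a list of dicts (hypothetical layout, not a minimac format):
    each dict maps a sample ID to the list of dosages for that chunk's markers.
    Row order inside each chunk is ignored; only the IDs are used for matching.
    """
    if not chunks:
        return {}

    # Use the sample order of the first chunk as the output order.
    sample_ids = list(chunks[0])
    expected = set(sample_ids)
    merged = {sample_id: [] for sample_id in sample_ids}

    for index, chunk in enumerate(chunks):
        # Refuse to merge if a chunk is missing samples or has extra ones.
        if set(chunk) != expected:
            raise ValueError(f"chunk {index} has a different set of sample IDs")
        for sample_id in sample_ids:
            merged[sample_id].extend(chunk[sample_id])

    return merged


if __name__ == "__main__":
    # Two chunks whose samples appear in different orders.
    chunk_a = {"S1": [0.1, 1.9], "S2": [1.0, 0.0]}
    chunk_b = {"S2": [2.0], "S1": [0.5]}
    print(merge_chunks_by_sample_id([chunk_a, chunk_b]))
```

Keying on the sample ID means that a different row order in any one chunk cannot shift dosages onto the wrong individual, which is the failure that a purely positional merge would risk.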

== Change log ==

−

~~2013.7.17~~

−

~~- minor bug fixes~~

−

~~-- all variants (SNPs, InDels, SVs) in the reference VCF will be imputed, regardless of the FILTER column setting~~

−

~~2012.11.16~~

−

~~- minor bug fixes~~

−

~~2012.10.9~~

−

~~- added: improved support for [http://www.shapeit.fr ShapeIT] phased haplotypes~~

−

~~2012.10.3~~

−

~~- added: full support for reference panel based chunking~~

−

~~2012.9.22~~

−

~~- fixed: chunk chromosome bug~~

−

~~2012.8.6 (early adopter)~~

−

~~- added: chromosome X support~~

−

~~2012.3.14~~

−

~~- fixed: problem with --startposition~~

−

~~2012.2.29~~

−

~~- added: VCF support~~

−

~~- added: IDR (Insertion, Deletion, Reference) support~~

== Questions and Comments ==

Line 94: Line 52:

Typically, haplotype quality improves rapidly with the number of states but only slowly with the number of rounds. We recommend running ~20 rounds of the MaCH haplotyper and selecting a number of states as high as your patience will allow (but ideally greater than 200).

−

~~== Imputation ==~~

−

A good rule of thumb is that minimac should take about 1 hour to impute 1,000,000 markers in 1,000 individuals using a reference panel with 100 haplotypes. Performance should scale linearly with respect to all these factors. So, your approximate computing time in hours should be about:

−

~~:<math>~~

−

~~E(\mbox{Run Time in Hours}) = N_{markers} * N_{individuals} * N_{haplotypes} * 10^{-11}~~

−

~~</math>~~

−

~~These statistics refer to a single core of a modern Intel CPU and, although your mileage will vary, most modern CPUs should be no more than a few times faster (or slower) than that.~~

−

~~If you are estimating model parameters at the same time as imputing missing genotypes, you can account for the time needed for parameter estimation with the following formula:~~

−

~~:<math>~~

−

~~E(\mbox{Run Time in Hours}) = N_{markers} * ({N_{individuals} + N_{rounds} * N_{states} * 0.75 }) * N_{haplotypes} * 10^{-11}~~

−

~~</math>~~

−

In this updated formula, N<sub>rounds</sub> represents the number of iterations used for parameter refinement and N<sub>states</sub> represents the maximum number of reference and target haplotypes considered for each update.
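
The two run time formulas above can be evaluated with a short sketch. The function name <code>estimated_run_time_hours</code> and its parameter names are illustrative assumptions; only the arithmetic comes from the formulas, and the result is the same rough single-core estimate described in the text.

```python
def estimated_run_time_hours(n_markers, n_individuals, n_haplotypes,
                             n_rounds=0, n_states=0):
    """Rule-of-thumb run time in hours, following the formulas in the text.

    With n_rounds == 0 or n_states == 0 this reduces to the imputation-only
    formula: N_markers * N_individuals * N_haplotypes * 1e-11.
    Otherwise the parameter estimation term N_rounds * N_states * 0.75 is
    added to N_individuals before multiplying.
    """
    effective_individuals = n_individuals + n_rounds * n_states * 0.75
    return n_markers * effective_individuals * n_haplotypes * 1e-11


if __name__ == "__main__":
    # Imputation only: the rule-of-thumb case from the text.
    print(estimated_run_time_hours(1_000_000, 1_000, 100))

    # Imputation plus parameter estimation (20 rounds and 200 states are
    # example values chosen for illustration).
    print(estimated_run_time_hours(1_000_000, 1_000, 100,
                                   n_rounds=20, n_states=200))
```

For the rule-of-thumb case (1,000,000 markers, 1,000 individuals, 100 haplotypes) the estimate is about 1 hour. With 20 rounds and 200 states the estimation term adds 20 × 200 × 0.75 = 3,000 to the 1,000 individuals, for an estimate of about 4 hours.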

= Getting Started =

Cfuchsb (550 edits)
