Changes

Minimac: 1000 Genomes Imputation Cookbook

Revision as of 22:47, 10 February 2011

19 bytes added, 22:47, 10 February 2011

no edit summary

Line 2: Line 2:

* [[MaCH]] (concurrent phasing approach).

OR

−

* [[Minimac]] (pre-phasing approach).

+

* [[Minimac]] (pre-phasing / 2-step approach).

Line 32: Line 32:

Reference haplotypes generated by the 1000 Genomes Project and formatted so that they are ready for analysis are available from the [http://www.sph.umich.edu/csg/abecasis/MaCH/download/1000G-2010-08.html MaCH download page]. In our hands, this August 2010 release is substantially better than previous 1000 Genomes Project genotype call sets.

+

== MaCH Imputation ==

−

=== Estimating Model Parameters ===

Line 119: Line 119:

−

== ~~minimac~~ Imputation ==

+

== Pre-phasing / 2-Step Imputation ==

=== Pre-phasing - MaCH ===

−

A typical MaCH command line to estimate phased haplotypes might look like this:

+

Pre-phasing / 2-Step imputation starts with the pre-phasing of your genotypes using MaCH. A typical MaCH command line to estimate phased haplotypes might look like this:

−

mach1 -d sample.dat -p sample.ped --rounds 20 --states 200 --phase --interim 5 --sample 5 ~~--compact~~

+

mach1 -d sample.dat -p sample.ped --rounds 20 --states 200 --phase --interim 5 --sample 5

−

This will request that MaCH estimate haplotypes for your sample, using 20 iterations of its Markov sampler and conditioning each update on up to 200 haplotypes. A summary description of these parameters follows (but for a more complete description, you should go to the MaCH website):

+

This will request that MaCH estimate haplotypes for your sample, using 20 iterations of its Markov sampler and conditioning each update on up to 200 haplotypes.

+

A summary description of these parameters follows (but for a more complete description, you should go to the MaCH website):

{| class="wikitable" border="1" cellpadding="2"

Line 141: Line 142:

|-

| <code>--states 200</code>

−

| Number of haplotypes to consider during each update. Increasing this value will typically lead to better haplotypes, but can dramatically increase computing time and memory use. A value of ~~100~~ - 400 is typical.

+

| Number of haplotypes to consider during each update. Increasing this value will typically lead to better haplotypes, but can dramatically increase computing time and memory use. A value of 200 - 400 is typical.

|-

| <code>--rounds 20</code>

−

| Iterations of the Markov sampler to use for haplotyping. Typically, using 20 - ~~100~~ rounds should give good results. To obtain better results, it is usually better to increase the <code>--states</code> parameter.

+

| Iterations of the Markov sampler to use for haplotyping. Typically, using 20-30 rounds should give good results. To obtain better results, it is usually better to increase the <code>--states</code> parameter.

|-

| <code>--interim 5</code>

Line 156: Line 157:

|-

| <code>--compact</code>

−

| Reduce memory use at the cost of approximately doubling runtime~~. This option is recommended for most GWAS scale datasets and computing platforms~~.

+

| Reduce memory use at the cost of approximately doubling runtime.

|}
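The pre-phasing command in the new revision can be sketched as a small shell helper. This is a minimal sketch, not part of the cookbook itself: it only assembles and prints the <code>mach1</code> command line using the flags shown above, with <code>sample.dat</code> and <code>sample.ped</code> as the placeholder file names from the text. The variable names and the <code>COMPACT</code> toggle are assumptions introduced for illustration.

```shell
#!/bin/sh
# Sketch only: assemble the MaCH pre-phasing command from the parameters
# described in the table. Nothing is executed; the command is printed.

DAT=sample.dat     # placeholder data file name from the text
PED=sample.ped     # placeholder pedigree file name from the text
ROUNDS=20          # iterations of the Markov sampler (text suggests 20-30)
STATES=200         # haplotypes considered per update (text suggests 200 - 400)
INTERIM=5          # value used in the example command
SAMPLE=5           # value used in the example command
COMPACT=no         # hypothetical toggle: "yes" appends --compact

build_mach_cmd() {
    cmd="mach1 -d $DAT -p $PED --rounds $ROUNDS --states $STATES --phase --interim $INTERIM --sample $SAMPLE"
    # --compact reduces memory use at roughly double the runtime,
    # so it is only appended when explicitly requested.
    if [ "$COMPACT" = "yes" ]; then
        cmd="$cmd --compact"
    fi
    printf '%s\n' "$cmd"
}

build_mach_cmd
```

Printing rather than running the command makes it easy to check the settings first. When memory is limited, setting <code>COMPACT=yes</code> restores the flag that this revision dropped from the default example.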

Cfuchsb (550 edits)
