Minimac Command Reference

From Genome Analysis Wiki
Revision as of 13:25, 18 October 2013 by Cfuchsb (talk | contribs)

Minimac runs are controlled by a series of command-line parameters that specify input file names, an initial model for recombination and error rates, an optional series of model refinement iterations using either Monte-Carlo methods or an Expectation-Maximization algorithm, and output file names.

Reference Haplotype Set

--refSNPs [filename]
This option specifies the name of a text file listing markers in the reference haplotype set, one per line.
--refHaps [filename]
This option specifies the name of a text file listing reference haplotypes. This file is typically generated by MaCH.
Each line in the reference haplotype file starts with a haplotype label. This is followed by a series of alleles (one character per marker), optionally separated by whitespace to improve readability. Acceptable allele labels are "A", "C", "G", "T", "I", "D", "R" (I = insertion, D = deletion, R = reference); upper and lower case letters are treated identically. The digits "1", "2", "3", "4" are also acceptable and are treated as aliases for "A", "C", "G", "T".
--vcfReference
This option specifies that the --refHaps file is in VCF format. In this case, no --refSNPs file is needed.
--vcfstart
This option specifies the start position for chunk-based imputation.
--vcfend
This option specifies the end position for chunk-based imputation.
--vcfwindow
This option specifies the size of the buffer region (in bp) to add on each side of the chunk during chunk-based imputation to avoid edge effects. These buffers are not included in the output files.
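The chunk options above can be illustrated with a short Python sketch that splits a region into consecutive chunks and builds the corresponding arguments. It assumes that --vcfstart, --vcfend and --vcfwindow each take an integer base-pair value, which the descriptions suggest but do not state outright. The helper names, the chunk size and the clamping of the buffered start to position 1 are hypothetical choices made for illustration only.

```python
def chunk_args(start, end, chunk_size, window):
    """Return one argument list per chunk covering [start, end].

    Hypothetical helper: assumes each option takes an integer value in bp.
    """
    if chunk_size <= 0 or window < 0 or end < start:
        raise ValueError("invalid region, chunk size or window")
    chunks = []
    pos = start
    while pos <= end:
        chunk_end = min(pos + chunk_size - 1, end)
        chunks.append([
            "--vcfstart", str(pos),
            "--vcfend", str(chunk_end),
            "--vcfwindow", str(window),
        ])
        pos = chunk_end + 1
    return chunks


def buffered_region(start, end, window):
    """Region actually analysed: the chunk plus a buffer on each side.

    Clamping the start to 1 is an assumption of this sketch.
    """
    return max(1, start - window), end + window


if __name__ == "__main__":
    for args in chunk_args(1, 10_000_000, 5_000_000, 500_000):
        print(" ".join(args))
```

Each printed line holds the three chunk options for one run; the remaining options (reference panel, target haplotypes, output prefix) would be the same for every chunk.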


--snpAliases [filename]
This option points to a file listing mappings between alternate marker names. Each row should include two columns. The first column lists a previous, commonly used name for a marker (perhaps from an earlier version of dbSNP) and the second column lists the current preferred name for the marker. For an example input file, which maps ids from the 1000 Genomes Project and earlier versions of dbSNP to dbSNP build 134, see dbsnp134-merges.txt.gz.
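As a rough illustration of the two-column alias format, the following Python sketch loads the mappings into a dictionary and resolves a marker name to its preferred form. It assumes the columns are separated by whitespace, which the description does not specify. The function names and the marker ids in the example are made up.

```python
def load_aliases(lines):
    """Build an {old_name: preferred_name} mapping from alias rows.

    Assumes whitespace-separated columns (not confirmed by the description).
    """
    aliases = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or incomplete rows
        old_name, new_name = fields[0], fields[1]
        aliases[old_name] = new_name
    return aliases


def preferred_name(marker, aliases):
    """Return the preferred name, or the marker itself if it has no alias."""
    return aliases.get(marker, marker)


if __name__ == "__main__":
    # Hypothetical rows; real files list actual marker ids.
    rows = ["old_id_1 new_id_1", "old_id_2\tnew_id_2", ""]
    table = load_aliases(rows)
    print(preferred_name("old_id_1", table))
    print(preferred_name("unlisted_id", table))
```

A gzip-compressed alias file can be passed in line by line by opening it with Python's `gzip.open(path, "rt")`.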

Target Haplotype Set

MaCH Format

--snps [filename]
This option specifies the name of a text file listing markers in the target haplotypes. Only markers that are included in both the reference panel and the target haplotype set will be used to identify shared haplotype stretches.
--haps [filename]
This option specifies the name of a text file listing target haplotypes. This file is typically generated by MaCH. The file should be formatted just like the reference haplotype file, described in the previous section.
--rs
In combination with --vcfReference, this option allows GWAS SNPs to be identified by their rs identifiers.
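The haplotype line format shared by the --refHaps and --haps files can be sketched as a small Python parser. This is an illustration of the rules as described, not Minimac's own reader. It assumes that the label is the first whitespace-delimited field and that everything after it consists of alleles, so files that carry extra label fields would need adjusting. The function and constant names are hypothetical.

```python
DIGIT_ALIASES = {"1": "A", "2": "C", "3": "G", "4": "T"}
VALID_ALLELES = set("ACGTIDR")


def parse_haplotype_line(line):
    """Split a haplotype line into (label, list of normalised alleles).

    Assumes a single-field label followed only by allele characters.
    """
    fields = line.split()
    if not fields:
        raise ValueError("empty haplotype line")
    label = fields[0]
    alleles = []
    # Whitespace between alleles is optional, so join the fields first.
    for ch in "".join(fields[1:]):
        allele = DIGIT_ALIASES.get(ch, ch.upper())  # digits alias A, C, G, T
        if allele not in VALID_ALLELES:
            raise ValueError(f"unexpected allele label: {ch!r}")
        alleles.append(allele)
    return label, alleles


if __name__ == "__main__":
    label, alleles = parse_haplotype_line("HAP1 ACgt 12 34 idr")
    print(label, "".join(alleles))
```

Because every allele is a single character, the number of alleles returned should equal the number of markers listed in the matching marker file.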

ShapeIT Format

--sample [filename]
Sample list in ShapeIT format.
--shape_haps [filename]
ShapeIT phased haplotypes where missing genotypes will be imputed.
--chr [integer]
Chromosome for which imputation will be carried out. This is needed when ShapeIT input is used with chr:pos marker identifiers (the default setting).

Model Refinement

--rounds [integer]
Number of iterations of the Monte-Carlo or Expectation-Maximization model refinement algorithm.
--states [integer]
Maximum number of reference and target haplotypes to consider during model refinement.
--em
Instead of performing Monte-Carlo updates to parameter values, use an Expectation-Maximization algorithm.
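The three refinement options combine as shown in the following Python sketch, which builds the corresponding argument list. The helper name and the numeric values are hypothetical and are not recommended settings; only the option names come from this page.

```python
def refinement_args(rounds, states, use_em=False):
    """Build the model refinement portion of a command line.

    Hypothetical helper: values are placeholders, not recommendations.
    """
    if rounds < 0 or states <= 0:
        raise ValueError("rounds must be >= 0 and states must be > 0")
    args = ["--rounds", str(rounds), "--states", str(states)]
    if use_em:
        # Switch from Monte-Carlo updates to Expectation-Maximization.
        args.append("--em")
    return args


if __name__ == "__main__":
    print(" ".join(refinement_args(5, 200, use_em=True)))
```

The resulting list can be appended to the input and output options when assembling a full command.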

Output Parameters

--prefix [label]
Use requested prefix for all output files.
--phased
Output dosages and most likely alleles for each haplotype separately.
--probs
Output probabilities for each genotype. Each row in the output will include two columns per marker. The first of these columns denotes the probability of a homozygote for allele 1. The second column denotes the probability of a heterozygote.
--gzip
Compress output files on the fly, reducing disk space requirements.
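The two-columns-per-marker layout produced by --probs can be unpacked as in the following Python sketch. It assumes that any leading identifier columns have already been removed from the row, and that the probability of a homozygote for allele 2 is whatever remains after the two reported probabilities; neither point is stated on this page. The function names are hypothetical.

```python
def genotype_probs(values):
    """Turn a flat row of probabilities into (p11, p12, p22) per marker.

    Assumes two values per marker and that p22 = 1 - p11 - p12.
    """
    if len(values) % 2 != 0:
        raise ValueError("expected two probability columns per marker")
    markers = []
    for i in range(0, len(values), 2):
        p11, p12 = values[i], values[i + 1]
        # Clamp at zero to absorb rounding in the reported values.
        p22 = max(0.0, 1.0 - p11 - p12)
        markers.append((p11, p12, p22))
    return markers


def allele1_dosage(p11, p12):
    """Expected number of copies of allele 1 implied by the probabilities."""
    return 2.0 * p11 + p12


if __name__ == "__main__":
    for p11, p12, p22 in genotype_probs([0.5, 0.25, 0.25, 0.5]):
        print(p11, p12, p22, allele1_dosage(p11, p12))
```

When --gzip is also used, the output file can be read line by line with `gzip.open(path, "rt")` before splitting each row into numbers.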