Difference between revisions of "Minimac Command Reference"

From Genome Analysis Wiki

Revision as of 23:47, 21 September 2011

Minimac runs are controlled by a series of command line parameters that specify input file names, an initial model for recombination and error rates, an optional series of iterations of model refinement using either Monte-Carlo methods or an Expectation-Maximization algorithm, and output file names.
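As an illustration of how these parameters combine, the following Python sketch assembles a command line from the options documented on this page. The executable name "minimac", the helper name build_minimac_command, and all file names and numeric values are assumptions made for the example; only the option names come from this reference.

```python
import shlex


def build_minimac_command(ref_snps, ref_haps, snps, haps, prefix,
                          rounds=None, states=None, em=False,
                          phased=False, probs=False, gzip=False,
                          executable="minimac"):
    """Return an argument list using only options listed on this page.

    The executable name is an assumption; adjust it to the local install.
    """
    cmd = [executable,
           "--refSNPs", ref_snps,
           "--refHaps", ref_haps,
           "--snps", snps,
           "--haps", haps]
    if rounds is not None:
        cmd += ["--rounds", str(rounds)]
    if states is not None:
        cmd += ["--states", str(states)]
    if em:
        cmd.append("--em")
    cmd += ["--prefix", prefix]
    if phased:
        cmd.append("--phased")
    if probs:
        cmd.append("--probs")
    if gzip:
        cmd.append("--gzip")
    return cmd


if __name__ == "__main__":
    # Hypothetical file names and arbitrary numeric values, for illustration only.
    example = build_minimac_command("ref.snps", "ref.haps",
                                    "target.snps", "target.haps",
                                    prefix="imputed",
                                    rounds=5, states=200, gzip=True)
    # Print a shell-quoted version of the command without running it.
    print(shlex.join(example))
```

Building the command as a list rather than a single string avoids quoting problems if it is later passed to subprocess.run.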

Reference Haplotype Set

--refSNPs [filename]
This option specifies the name of a text file listing markers in the reference haplotype set, one per line.
--refHaps [filename]
This option specifies the name of a text file listing reference haplotypes. This file is typically generated by MaCH.
Each line in the reference haplotype file starts with a haplotype label. This is followed by a series of alleles (one character per marker), optionally separated by whitespace to improve readability. Acceptable allele labels are "A", "C", "G", "T"; upper and lower case letters are treated identically. The digits "1", "2", "3", "4" are also acceptable and treated as aliases for "A", "C", "G", "T".
--snpAliases [filename]
This option points to a file listing mappings between alternate marker names. Each row should include two columns. The first column lists a previous, commonly used name for a marker (perhaps from an earlier version of dbSNP) and the second column lists the current preferred name for the marker. For an example input file, which maps ids from the 1000 Genomes Project and earlier versions of dbSNP to dbSNP build 134, see dbsnp134-merges.txt.gz.
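The file formats described in this section can be checked before a run with a short script. The Python sketch below is not part of Minimac; the function names are hypothetical. It assumes that the haplotype label is the first whitespace-delimited token on a line, that alias files are whitespace-delimited, and that any character other than the listed allele labels should be rejected (Minimac itself may accept additional codes).

```python
# Digits are treated as aliases for nucleotide labels, as described above.
DIGIT_ALIASES = {"1": "A", "2": "C", "3": "G", "4": "T"}
VALID_ALLELES = set("ACGT")


def parse_haplotype_line(line):
    """Split one haplotype line into (label, alleles).

    Assumption: the label is the first whitespace-delimited token; all
    remaining tokens are concatenated, one character per marker.
    """
    fields = line.split()
    if not fields:
        raise ValueError("empty haplotype line")
    label = fields[0]
    alleles = []
    for ch in "".join(fields[1:]):
        allele = DIGIT_ALIASES.get(ch, ch.upper())
        if allele not in VALID_ALLELES:
            # Assumption: anything outside A/C/G/T/1-4 is treated as an error here.
            raise ValueError("unexpected allele label: %r" % ch)
        alleles.append(allele)
    return label, "".join(alleles)


def load_aliases(lines):
    """Build a dict mapping previous marker names to preferred names."""
    aliases = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or incomplete rows
        aliases[fields[0]] = fields[1]
    return aliases


def apply_aliases(markers, aliases):
    """Replace each marker name with its preferred name when one is listed."""
    return [aliases.get(name, name) for name in markers]
```

A compressed alias file can be passed to load_aliases by opening it with gzip.open(path, "rt") from the Python standard library.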

Target Haplotype Set

--snps [filename]
This option specifies the name of a text file listing markers in the target haplotypes. Only markers that are included in both the reference panel and the target haplotype set will be used to identify shared haplotype stretches.
--haps [filename]
This option specifies the name of a text file listing target haplotypes. This file is typically generated by MaCH. The file should be formatted just like the reference haplotype file, described in the previous section.
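Because only markers present in both panels are used to identify shared haplotype stretches, it can be useful to check the overlap between the two marker lists before a run. The following Python sketch is a hypothetical helper, not part of Minimac, and assumes the marker names have already been read from the two files, one name per line.

```python
def shared_markers(ref_markers, target_markers):
    """Return markers present in both lists, in reference-panel order."""
    target = set(target_markers)
    return [name for name in ref_markers if name in target]
```

A very small overlap relative to the size of the target marker list may indicate a naming mismatch between the two panels, which is the situation the --snpAliases option is meant to address.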

Model Refinement

--rounds [integer]
Number of iterations of the Monte-Carlo or Expectation-Maximization model refinement algorithm.
--states [integer]
Maximum number of reference and target haplotypes to consider during model refinement.
--em
Instead of performing Monte-Carlo updates to parameter values, use an Expectation-Maximization algorithm.

Output Parameters

--prefix [label]
Use requested prefix for all output files.
--phased
Output dosages and most likely alleles for each haplotype separately.
--probs
Output probabilities for each genotype. Each row in the output will include two columns per marker. The first of these columns denotes the probability of a homozygote for allele 1. The second column denotes the probability of a heterozygote.
--gzip
Compress output files on the fly, reducing disk space requirements.
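Since the --probs output stores only two probabilities per marker, the probability of a homozygote for the other allele is implied as the remainder. The Python sketch below shows that arithmetic. It is a hypothetical helper, not part of Minimac, and it assumes the caller has already stripped any leading identifier columns from the row, since their layout is not described here. The expected allele 1 count it computes is plain arithmetic on the two probabilities and is not claimed to match any particular Minimac output file.

```python
def expand_genotype_probs(values):
    """Expand pairs (P(hom allele 1), P(het)) into per-marker tuples.

    Each returned tuple is (p_hom1, p_het, p_hom2, expected_allele1_count).
    """
    if len(values) % 2 != 0:
        raise ValueError("expected two probability columns per marker")
    expanded = []
    for i in range(0, len(values), 2):
        p_hom1 = values[i]
        p_het = values[i + 1]
        # The three genotype probabilities sum to 1; clamp rounding noise at 0.
        p_hom2 = max(0.0, 1.0 - p_hom1 - p_het)
        # Expected number of copies of allele 1 carried by the individual.
        expected_allele1 = 2.0 * p_hom1 + p_het
        expanded.append((p_hom1, p_het, p_hom2, expected_allele1))
    return expanded
```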