Minimac3 Usage

From Genome Analysis Wiki

Introduction

Minimac3 is a lower-memory, more computationally efficient implementation of minimac2. It is an algorithm for genotype imputation that works on phased genotypes (e.g., from MaCH) and is designed to handle very large reference panels efficiently with no loss of accuracy.

This wiki page gives users a detailed explanation of Minimac3 usage.

Command Line Options

A typical Minimac3 command line can use the following options (default values, where defined, are shown in square brackets):

Command Line Options:
   Reference Haplotypes : --refHaps [], --passOnly, --rsid
      Target Haplotypes : --haps []
      Output Parameters : --prefix [Minimac3.Output], --processReference,
                          --updateModel, --nobgzip, --vcfOutput [ON],
                          --doseOutput, --hapOutput, --format [GT,DS],
                          --allTypedSites
      Subset Parameters : --chr [], --start, --end, --window
    Starting Parameters : --rec [], --err []
  Estimation Parameters : --rounds [5], --states [200]
       Other Parameters : --log, --lowMemory, --help, --cpus [1], --params
              PhoneHome : --noPhoneHome, --phoneHomeThinning [50]
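Here is a minimal Python sketch that assembles a basic imputation command from the options listed above as an argument list. The executable name Minimac3, the helper name build_minimac3_command and all file names are hypothetical placeholders. Only the flags themselves come from the option list.

```python
import shlex


def build_minimac3_command(ref_haps, haps, prefix="Minimac3.Output",
                           binary="Minimac3", extra=None):
    """Assemble a Minimac3 argument list from the documented options.

    `binary` is a hypothetical executable name; point it at your own build.
    `extra` is an optional list of further flags, e.g. ["--doseOutput"].
    """
    cmd = [binary, "--refHaps", ref_haps, "--haps", haps, "--prefix", prefix]
    if extra:
        cmd.extend(extra)  # append any additional handles verbatim
    return cmd


# Hypothetical file names, used only for illustration.
cmd = build_minimac3_command("ref.chr20.m3vcf.gz", "study.chr20.phased.vcf.gz",
                             prefix="study.chr20.imputed",
                             extra=["--doseOutput"])
print(shlex.join(cmd))  # human-readable form of the command line
```

Once the binary and input files exist, you can pass the list to subprocess.run(cmd, check=True). Keeping the arguments as a list avoids shell-quoting problems with unusual file names.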

Detailed Usage

The available options of Minimac3 are explained in detail below. See the wiki pages on Examples and Full list of Options for more details. There is also a wiki page called Minimac3 Imputation Cookbook, which is recommended for new users.

Reference Haplotypes

"--refHaps" specifies the reference panel file, which can be either a VCF file or an M3VCF file. No handle is needed to declare the file type; the program detects it automatically.

Minimac3 accepts both VCF and M3VCF files as input for the reference panel. M3VCF files are customized files created by Minimac3 (possibly in a previous run). They store large reference panels in a compact form, which saves the memory and computation time needed to read large files. See the wiki page on M3VCF files for further details. Users can download commonly used reference panels in both VCF and M3VCF format from Reference Panels.
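Minimac3 detects the file type internally, so users never need to do this themselves. For readers curious how the two formats can be told apart, the sketch below reads the first header line. It assumes that M3VCF files start with a ##fileformat=M3VCF line and VCF files with ##fileformat=VCFv4.x. This is an illustration, not Minimac3's actual detection code.

```python
import gzip


def read_first_line(path):
    """Return the first line of a plain or gzip/bgzip-compressed text file."""
    with open(path, "rb") as fh:
        magic = fh.read(2)
    # gzip (and bgzip) files start with the magic bytes 1f 8b.
    opener = gzip.open if magic == b"\x1f\x8b" else open
    with opener(path, "rt") as fh:
        return fh.readline().rstrip("\n")


def reference_format(path):
    """Guess whether a reference panel is M3VCF or VCF from its header.

    Assumption: M3VCF files begin with '##fileformat=M3VCF' and VCF files
    with '##fileformat=VCF...'. This is not Minimac3's own logic.
    """
    first = read_first_line(path)
    if first.startswith("##fileformat=M3VCF"):
        return "M3VCF"
    if first.startswith("##fileformat=VCF"):
        return "VCF"
    raise ValueError(f"unrecognised reference file header: {first!r}")
```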

Target Haplotypes

"--haps" specifies the target (GWAS) file, which must be in VCF format, either uncompressed or gzipped. These files typically use the .vcf or .vcf.gz extension, but the extension itself is not mandatory.

Minimac3 accepts only VCF files as input for the GWAS data (see the page on Converting Files to VCF). Note that input VCF files are automatically assumed to be pre-phased (see the page on Pre-Phasing). Markers that are present in the target panel but NOT in the reference panel are excluded from the output files. Users must merge these extra markers back into the original data in order to analyze them.
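Because target-only markers are dropped from the output, it can help to list them before imputation so you can merge them back afterwards. The sketch below compares the data lines of two VCFs. It matches markers on chromosome, position, REF and ALT, which is an assumption made here for illustration and may not match Minimac3's own rules. The function names are hypothetical.

```python
def variant_keys(vcf_lines):
    """Collect (CHROM, POS, REF, ALT) keys from the data lines of a VCF."""
    keys = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip meta-information and header lines
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        keys.add((chrom, int(pos), ref, alt))
    return keys


def target_only_markers(target_lines, reference_lines):
    """Markers typed in the target but absent from the reference panel.

    Matching on chromosome, position and alleles is a simplifying
    assumption; Minimac3's own matching rules may differ.
    """
    return sorted(variant_keys(target_lines) - variant_keys(reference_lines))
```

For gzipped inputs, gzip.open(path, "rt") yields lines that can be passed straight to these functions.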

Output Files

"--prefix" denotes the prefix for the output files (default: Minimac3.Output).

Minimac3 can output files in both VCF format and .dose format (the usual minimac output format). By default, Minimac3 outputs only VCF files. To output in .dose format, users must add the handle --doseOutput; to output dosage data in phased format, they must add --hapOutput. The handle --format (by default: --format GT,DS) controls which of the following fields are stored in output VCF files:

  • DS : Estimated alternate allele dosage (default).
  • GT : Estimated most likely genotype (default).
  • GP : Estimated posterior genotype probabilities (use handle --format GP).
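The three fields are closely related. The sketch below shows one way to derive DS, GP and a phased best-guess GT from two numbers. It assumes each haplotype comes with a probability of carrying the alternate allele and that the two haplotypes are independent. It is illustrative only and is not taken from the Minimac3 source; the function name is hypothetical.

```python
def genotype_summaries(h1, h2):
    """Derive DS, GP and GT from two haplotype alternate-allele dosages.

    h1 and h2 are the probabilities that haplotypes 1 and 2 carry the
    alternate allele. Treating them as independent is an assumption of
    this sketch.
    """
    ds = h1 + h2  # expected number of alternate alleles (0..2)
    gp = (
        (1 - h1) * (1 - h2),            # P(0 alternate alleles)
        h1 * (1 - h2) + (1 - h1) * h2,  # P(1 alternate allele)
        h1 * h2,                        # P(2 alternate alleles)
    )
    # Most likely allele on each haplotype, written as a phased GT string.
    gt = f"{int(h1 > 0.5)}|{int(h2 > 0.5)}"
    return ds, gp, gt


ds, gp, gt = genotype_summaries(0.9, 0.2)
print(round(ds, 3), [round(p, 3) for p in gp], gt)
```

Under these assumptions, DS equals the expected number of alternate alleles under GP, namely GP[1] + 2*GP[2]. This makes a handy sanity check when reading output files.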

The handle --processReference is used ONLY to convert reference panels from VCF format to M3VCF format (and to save parameter estimates). NO imputation is performed, so NO target/GWAS haplotypes are required. By default, however, parameter estimation is still carried out on the reference panel and the estimates are saved in the M3VCF file. To skip parameter estimation and only compress the reference panel into an M3VCF file, users should add --rounds 0. See the wiki page on Examples for further details.
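The sketch below writes both reference-conversion variants as argument lists. The first is the default, which also estimates parameters. The second is the compression-only form with --rounds 0. The binary name, helper name and file names are hypothetical. Note that neither variant passes a --haps argument.

```python
def reference_conversion_command(ref_vcf, prefix, estimate=True,
                                 binary="Minimac3"):
    """Argument list for converting a VCF reference panel to M3VCF.

    With estimate=False, '--rounds 0' is appended so that only compression
    is performed and parameter estimation is skipped.
    """
    cmd = [binary, "--refHaps", ref_vcf, "--processReference",
           "--prefix", prefix]
    if not estimate:
        cmd += ["--rounds", "0"]
    return cmd


# Hypothetical file names, used only for illustration.
with_estimates = reference_conversion_command("ref.chr20.vcf.gz", "ref.chr20")
compress_only = reference_conversion_command("ref.chr20.vcf.gz", "ref.chr20",
                                             estimate=False)
```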

[NOTE: During imputation, if parameter estimates are found in the M3VCF file, Minimac3 uses them automatically. Users can add the handle --updateModel to update the parameter estimates using the target/GWAS panel as well. However, this is NOT necessary in most cases, unless the user has strong reasons to believe that it might increase imputation accuracy.]

Remaining Parameters

This sub-section explains the remaining parameters available.

  • Subset Parameters: The subset parameters are needed when the user wishes to impute into a particular region of a chromosome rather than the whole chromosome (typically when running imputation in chunks). For example, to analyze chromosome 6 from position 1000000 to position 2000000 with a 500000 base-pair buffer on either side, use --chr 6 --start 1000000 --end 2000000 --window 500000. When the subset parameters are used, a default window of 500 Kbp is applied on either side unless the user specifies otherwise. Variants in the buffer region are used only for imputation and are not reported in the final output.
  • Starting Parameters: The starting parameters are used if the user wishes to reuse previously created parameter estimate files to save time on parameter estimation (.recom and .erate files can be passed with --rec and --err, respectively).
  • Estimation Parameters: The estimation parameters specify the number of iterations (--rounds [5]) and the number of states (--states [200]) used by the Hidden Markov Model during parameter estimation. The defaults of 5 and 200 generally give sufficiently accurate estimates and need not be increased unless the user has strong reasons to do so.
  • Other Parameters: These parameters have varying uses. --help prints a brief documentation of Minimac3 and its usage. --cpus 5 lets the user run on 5 processors in parallel (this option is only available in Minimac3-omp). --params prints the current values of the usage parameters. --lowMemory runs a lower-memory version of Minimac3 that requires 33% less memory but 10% more time (for the HRC panel).
  • PhoneHome: By default, this option sends a message to a University of Michigan database about the success or failure of the run (and the kind of failure, if one occurred). No information about the data, files or file names is sent. Users can opt out with the handle --noPhoneHome, or use --phoneHomeThinning 50 to send the message with a 50% chance (typically used when running many command lines).
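For chunked imputation, the subset parameters are usually generated once per chunk. The sketch below splits a chromosome into consecutive, non-overlapping core regions and emits --chr/--start/--end/--window arguments for each. It does not fold the buffer into the core region and leaves it to --window instead. Because buffer variants are not reported, each position then appears in the output of only one chunk. This assumes that --start and --end are 1-based and inclusive, and that you supply the chromosome length yourself. The helper name is hypothetical.

```python
def chunk_arguments(chrom, chrom_length, chunk_size, window=500_000):
    """Split a chromosome into consecutive regions for chunked imputation.

    Each chunk covers [start, end] (assumed 1-based and inclusive). The
    buffer on either side is requested via --window, so it is not added
    to start/end here.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    chunks = []
    start = 1
    while start <= chrom_length:
        end = min(start + chunk_size - 1, chrom_length)
        chunks.append(["--chr", str(chrom), "--start", str(start),
                       "--end", str(end), "--window", str(window)])
        start = end + 1  # next chunk begins right after this one
    return chunks


# Hypothetical chromosome length, used only for illustration.
chunks = chunk_arguments(6, 2_500_000, 1_000_000)
```

Each argument list can be appended to a base command containing --refHaps, --haps and a chunk-specific --prefix.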

Download

Minimac3 is available as an undocumented release version. The source files (and binary executable) are available for download from Source Files, and commonly used reference panels in VCF and M3VCF format are available from Reference Panels.

Useful Wiki Pages

There are a few pages in this Wiki that may be useful to Minimac3 users. Here are links to some of them:

  • Minimac3 Usage and Documentation

Contact

For any questions or bug reports, please contact Sayantan Das.