Minimac3 Usage


Introduction

Minimac3 is a lower-memory, more computationally efficient implementation of minimac2. It is an algorithm for genotype imputation that works on phased genotypes (for example, from MaCH) and is designed to handle very large reference panels efficiently with no loss of accuracy.

This wiki page gives users a detailed explanation of Minimac3 usage.

Command Line Options

A typical Minimac3 command line would have the following parameter options:

Command Line Options:
   Reference Haplotypes : --refHaps [], --passOnly
      Target Haplotypes : --haps []
      Output Parameters : --processReference, --prefix [Minimac3.Output],
                          --updateModel, --nobgzip, --doseOutput, --hapOutput,
                          --format [GT,DS]
      Subset Parameters : --chr [], --start, --end, --window
    Starting Parameters : --rec [], --err []
  Estimation Parameters : --rounds [5], --states [200]
       Other Parameters : --help, --cpus [1], --params
              PhoneHome : --noPhoneHome, --phoneHomeThinning [50]

Detailed Usage

The available options of Minimac3 are explained in detail below. See wiki page on Examples and Full list of available options for more details.

Reference Haplotypes

"--refHaps" denotes the main input reference file, which can be either a VCF file or an M3VCF file. No handle is needed to indicate the file type; the program detects it automatically.

M3VCF files are customized files created by Minimac3 (possibly in a previous run) that store large reference panels in a compact form, saving the memory and computation time involved in reading large files. See the wiki page on M3VCF files for further details. Users can download commonly used reference panels in both VCF and M3VCF format from Reference Panels.

Target Haplotypes

"--haps" denotes the main input GWAS file, which has to be a VCF file (.vcf or .vcf.gz). The file extension is not mandatory.

Minimac3 accepts only VCF files as input for the target/GWAS data. Input VCF files are automatically assumed to be pre-phased. Markers that are in the target panel but NOT in the reference panel are excluded from the output files. Users must merge these extra markers back from the original data in order to analyze them.
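
A basic imputation run combines the reference and target options described above. The sketch below assumes the executable is available on the PATH as Minimac3 and uses hypothetical file names (refPanel.vcf, targetStudy.vcf, testRun). It only assembles and prints the command, so it can be inspected before anything is run.

```shell
# Hypothetical file names; replace them with your own data.
REF=refPanel.vcf          # reference panel (VCF or M3VCF, detected automatically)
TARGET=targetStudy.vcf    # pre-phased target/GWAS haplotypes (VCF only)
PREFIX=testRun            # prefix for all output files

# Assemble the command line. The executable name "Minimac3" is an assumption.
CMD="Minimac3 --refHaps $REF --haps $TARGET --prefix $PREFIX"

# Dry run: print the command. To execute it, run: eval "$CMD"
echo "$CMD"
```

Printing the command first is a deliberate choice here: it makes it easy to check paths and options before starting a long imputation job.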

Output Files

"--prefix" denotes the prefix for the output files (By default: Minimac3.Output)

Minimac3 can output files in both VCF format and .dose format (the usual minimac output format). By default, Minimac3 outputs only in VCF format; users must use the handle --doseOutput to output in .dose format, or the handle --hapOutput to output dosage data in phased format. Output VCF files can store dosage data only in the following formats, selected with the handle --format (by default: --format GT,DS):

  • DS : Estimated alternate allele dosage (default).
  • GT : Estimated most likely genotype (default).
  • GP : Estimated posterior genotype probabilities (use handle --format GP).
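
As an illustration of the output handles, the following sketch requests all three VCF fields plus the optional .dose and phased outputs. The file names and the executable name Minimac3 are assumptions; only the handles themselves come from the option list above. The command is printed rather than executed.

```shell
# Fields to write in the output VCF: the defaults (GT,DS) plus GP.
FORMAT=GT,DS,GP

# Optional extra outputs: .dose file and phased output.
EXTRA_OUTPUTS="--doseOutput --hapOutput"

# Hypothetical input and prefix names; "Minimac3" is assumed to be on PATH.
CMD="Minimac3 --refHaps refPanel.vcf --haps targetStudy.vcf --prefix testRun --format $FORMAT $EXTRA_OUTPUTS"

# Dry run: print the command. To execute it, run: eval "$CMD"
echo "$CMD"
```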

The handle --processReference is used ONLY to convert reference panels from VCF format to M3VCF format (and save parameter estimates). NO imputation is performed, and thus NO target/GWAS haplotypes are required. By default, however, parameter estimation is done using the reference panel and the estimates are saved in the M3VCF file. Users should use --rounds 0 to opt out of parameter estimation and only compress the reference panel into an M3VCF file. See the wiki page on Examples for further details.

[NOTE: While doing imputation, if parameter estimates are found in M3VCF files, Minimac3 will automatically use them for imputation. Users should use handle --updateModel in order to update the parameter estimates using the target/gwas panel as well. However, this is NOT necessary in most cases, unless the user has strong reasons to believe that this might increase the imputation accuracy.]
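
The two-step workflow described above (convert the reference panel once, then impute from the M3VCF file) might look like the sketch below. All file names are hypothetical, the executable name Minimac3 is assumed, and the name of the M3VCF file written by the first step (refPanel.m3vcf.gz) is an assumption that should be checked against the actual output. The commands are printed, not executed.

```shell
# Step 1: convert the reference VCF to M3VCF. Parameter estimation runs by default.
STEP1="Minimac3 --refHaps refPanel.vcf --processReference --prefix refPanel"

# Variant of step 1 that skips parameter estimation and only compresses the panel.
STEP1_NO_EST="$STEP1 --rounds 0"

# Assumed name of the M3VCF file produced by step 1; verify before use.
M3VCF=refPanel.m3vcf.gz

# Step 2: impute using the M3VCF reference. Saved estimates are used automatically.
STEP2="Minimac3 --refHaps $M3VCF --haps targetStudy.vcf --prefix testRun"

# Optional variant of step 2: also update the saved estimates with the target panel.
STEP2_UPDATE="$STEP2 --updateModel"

# Dry run: print every command for inspection.
echo "$STEP1"
echo "$STEP1_NO_EST"
echo "$STEP2"
echo "$STEP2_UPDATE"
```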

Subset Parameters

Starting Parameters

Estimation Parameters

Other Parameters

PhoneHome

Full List of Options

The following table gives a brief description of all the parameters of Minimac3. A detailed description will be available soon.

Parameter               Description
--refHaps filename      VCF file or M3VCF file containing haplotype data for the reference panel.
--passOnly              If ON, only variants with FILTER=PASS will be read from the reference VCF file (does NOT work on M3VCF files yet).
--haps filename         File containing haplotype data for target (GWAS) samples. Must be a VCF file.
--processReference      Only converts an input VCF file to M3VCF format (for example, for a later imputation run). If this option is ON, no imputation is performed and all other parameters are ignored (except the Reference Haplotypes and Subset parameters). This option also estimates parameters from the reference panel and saves them in the M3VCF file (the estimation can be skipped with --rounds 0).
--prefix output         Prefix for all output files generated. By default: [Minimac3.Output]
--updateModel           If ON, saved parameter estimates read from an M3VCF file will be further updated using the GWAS samples. Ignored if the reference file is a VCF file. [Default: OFF]
--nobgzip               If ON, output files will NOT be bgzipped.
--doseOutput            If ON, imputed data will be output as a dosage file as well [Default: OFF].
--hapOutput             If ON, phased imputed data will be output as well [Default: OFF].
--format                Specifies which fields to output for the FORMAT field in the output VCF file. Available handles: GT,DS,GP [Default: GT,DS].
--chr 22                Chromosome number for which imputation will be carried out.
--start 100000          Start position for imputation by chunking. Does not work without the --chr option.
--end 200000            End position for imputation by chunking. Does not work without the --chr option.
--window 5000           Length of buffer region on either side of --start and --end. By default = 0.
--rec                   Recombination file from a previous run of Minimac/Minimac3. (The --err parameter must also be provided when using this handle.)
--err                   Error file from a previous run of Minimac/Minimac3. (The --rec parameter must also be provided when using this handle.)
--rounds 5              Rounds of optimization for model parameters, which describe population recombination rates and per-SNP error rates. By default = 5.
--states 200            Maximum number of reference (or target) haplotypes to be examined during parameter optimization. By default = 200.
--help                  A short help on options.
--cpus 5                Number of CPUs for parallel computing. Works only with Minimac3-omp.
--noPhoneHome           If ON, the code will NOT send a SUCCESS/FAILURE status of the execution to the home server.
--phoneHomeThinning 50  Percentage probability of sending the SUCCESS/FAILURE status of the execution to the home server [Default: 50%]
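
The subset options --chr, --start, --end and --window can be combined to impute a chromosome in chunks. The loop below is a minimal sketch that prints one command per chunk. The chunk length, region boundaries, file names and the executable name Minimac3 are all hypothetical values chosen for illustration; only the handles come from the table above.

```shell
CHR=22           # chromosome to impute
WINDOW=5000      # buffer on either side of each chunk
CHUNK=100000     # hypothetical chunk length in base pairs
START=100000     # hypothetical first position of the region
LAST=299999      # hypothetical last position of the region

N=0
while [ "$START" -le "$LAST" ]; do
    END=$((START + CHUNK - 1))   # inclusive end, so chunks do not overlap
    N=$((N + 1))
    # Each chunk gets its own output prefix so results are not overwritten.
    CMD="Minimac3 --refHaps refPanel.vcf --haps targetStudy.vcf --chr $CHR --start $START --end $END --window $WINDOW --prefix testRun.chunk$N"
    echo "$CMD"                  # dry run; replace with: eval "$CMD"
    START=$((END + 1))
done
```

With these values the loop prints two commands, covering positions 100000 to 199999 and 200000 to 299999.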

Download

Minimac3 is available as an undocumented release version. The source files (and binary executable) are available for download from Source Files, and commonly used reference panels in VCF and M3VCF formats are available for download from Reference Panels.

Contact

For any queries or bug reports, please contact Sayantan Das.