Difference between revisions of "Minimac3 Usage"

From Genome Analysis Wiki

Revision as of 00:27, 29 January 2015

Introduction

Minimac3 is a lower-memory, more computationally efficient implementation of minimac2. It is an algorithm for genotype imputation that works on phased genotypes (for example, from MaCH) and is designed to handle very large reference panels more efficiently with no loss of accuracy.

This wiki page gives users a detailed explanation of Minimac3 usage.

Download

Minimac3 is available as an undocumented release version. The source files are available for download here and commonly used reference panels in M3VCF format are available for download in Reference Panels.

  • To Download Minimac3

Description                                        Download Link
Minimac3 Executable                                UNIX Users
Minimac3-omp Executable (for parallel computing)   UNIX Users
Minimac3 Source Files                              UNIX Users

A typical Minimac3 command line would have the following parameter options:

Command Line Options:
   Reference Haplotypes : --refHaps [], --passOnly
      Target Haplotypes : --haps []
      Output Parameters : --processReference, --prefix [Minimac3.Output],
                          --updateModel, --nobgzip, --doseOutput, --hapOutput,
                          --format [GT,DS]
      Subset Parameters : --chr [], --start, --end, --window
    Starting Parameters : --rec [], --err []
  Estimation Parameters : --rounds [5], --states [200]
       Other Parameters : --help, --cpus [1], --params
              PhoneHome : --noPhoneHome, --phoneHomeThinning [50]

Detailed Usage

The most commonly used parameter options are explained below. See the wiki page on Examples and the subsection below for a detailed list of available options.

Reference Haplotypes

"--refHaps" denotes the main input reference file, which can be either a VCF file or an M3VCF file. No handle is needed to indicate the file type; the program detects it automatically.

M3VCF files are customized files created by Minimac3 that store large reference panels in a compact form, saving the memory and computation time involved in reading large files. They must be generated in a previous run of Minimac3 and can then be saved and reused in later runs for faster loading of data. See the section on M3VCF files and the examples below for how to use them.

Target Haplotypes

"--haps" denotes the main input target file, which must be a VCF file (.vcf or .vcf.gz). The file extensions are not mandatory.

Minimac3 accepts only VCF files as input for the target/GWAS data. Input VCF files are automatically assumed to be pre-phased. Markers that are in the target panel but NOT in the reference panel are excluded from the output files; users must merge these extra markers back in from the original data in order to analyze them. See examples below.
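To see in advance which target markers would be left out of the output, the two marker lists can be compared before running imputation. The following is a minimal sketch, not part of Minimac3: the function names are hypothetical, and matching markers on CHROM, POS, REF and ALT is an assumption, since this page does not state how Minimac3 matches markers between panels.

```python
def variant_keys(vcf_lines):
    """Yield (CHROM, POS, REF, ALT) for each data line of a VCF.

    Assumption: a marker is identified by these four columns.
    """
    for line in vcf_lines:
        # Skip blank lines and header lines (## meta lines and the #CHROM line).
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, _marker_id, ref, alt = fields[:5]
        yield (chrom, int(pos), ref, alt)


def target_only_markers(target_lines, reference_lines):
    """Return target markers that are absent from the reference panel."""
    reference = set(variant_keys(reference_lines))
    return [key for key in variant_keys(target_lines) if key not in reference]
```

Both arguments can be any iterable of text lines, for example an open file handle on an uncompressed VCF or one returned by gzip.open(path, "rt") for a .vcf.gz file. The returned markers are the ones that would need to be merged back in after imputation.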

Output Files

"--prefix" denotes the prefix for the output files (default: Minimac3.Output).

Minimac3 can output files in both VCF format and .dose format (the usual minimac output format). By default, Minimac3 outputs only in VCF format; users must use the handle --doseOutput to output in .dose format, or the handle --hapOutput to output dosage data in phased format. VCF files can store dosage data only in the following formats:

  • DS : Estimated alternate allele dosage (default).
  • GT : Estimated most likely genotype (default).
  • GP : Estimated posterior genotype probabilities (use handle --format GP).
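To illustrate how the three fields relate, here is a minimal sketch that assumes a biallelic diploid site and the usual VCF convention that GP lists the probabilities of genotypes 0/0, 0/1 and 1/1 in that order. The function names are hypothetical, and this is not necessarily how Minimac3 derives the values internally (for instance, the sketch ignores phase).

```python
def dosage_from_gp(gp):
    """Expected number of alternate alleles (DS) from genotype probabilities."""
    _p_hom_ref, p_het, p_hom_alt = gp
    # A heterozygote carries one alternate allele, an alt homozygote carries two.
    return p_het + 2.0 * p_hom_alt


def most_likely_genotype(gp):
    """Unphased genotype with the highest posterior probability (GT)."""
    labels = ("0/0", "0/1", "1/1")
    best = max(range(3), key=lambda i: gp[i])
    return labels[best]
```

For example, a GP of (0.0, 0.5, 0.5) gives a dosage of 1.5, which shows why DS carries more information than the single hard call in GT.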

The handle --processReference is used ONLY to convert reference panels from VCF format to M3VCF format (and save parameter estimates). NO imputation is performed, so NO target/GWAS haplotypes are required. By default, however, parameters are estimated from the reference panel and the estimates are saved in the M3VCF file. Users should use --rounds 0 to opt out of parameter estimation and only compress the reference panel and save it as an M3VCF file. See examples and the list of options below for further details.
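The conversion described above can be scripted. The sketch below only assembles the argument list from options listed on this page; the helper name, the default executable name "Minimac3" and any file names passed in are assumptions for illustration.

```python
def process_reference_command(ref_vcf, prefix, estimate_parameters=True,
                              executable="Minimac3"):
    """Build a command that converts a VCF reference panel to M3VCF.

    No --haps argument is included, because no imputation is performed.
    """
    cmd = [executable,
           "--refHaps", ref_vcf,
           "--processReference",
           "--prefix", prefix]
    if not estimate_parameters:
        # --rounds 0 skips parameter estimation and only compresses the panel.
        cmd += ["--rounds", "0"]
    return cmd
```

The resulting list can be passed to subprocess.run(cmd, check=True) on a machine where the executable is installed, or joined with spaces to obtain the equivalent shell command.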

[NOTE: If parameter estimates are found in M3VCF files, Minimac3 will automatically use them for imputation. Users should use the handle --updateModel to update the parameter estimates using the target/GWAS panel as well. However, this is NOT necessary in most cases, unless the user has strong reasons to believe that it might increase the imputation accuracy.]
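An imputation run against a reference panel can be assembled in the same way. This is a hedged sketch that uses only options listed on this page; the helper name, the default executable name "Minimac3" and the file names passed in are hypothetical.

```python
def imputation_command(ref_panel, target_vcf, prefix="Minimac3.Output",
                       update_model=False, dose_output=False,
                       executable="Minimac3"):
    """Build a command that imputes a pre-phased target VCF.

    ref_panel may be a VCF or an M3VCF file; no option is needed to say which.
    """
    cmd = [executable,
           "--refHaps", ref_panel,
           "--haps", target_vcf,
           "--prefix", prefix]
    if update_model:
        # Re-estimate parameters with the target panel; rarely necessary.
        cmd.append("--updateModel")
    if dose_output:
        # Request .dose output; VCF is the default output format.
        cmd.append("--doseOutput")
    return cmd
```

Leaving update_model at its default reflects the note above: estimates stored in the M3VCF file are used as they are unless there is a strong reason to update them.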