Minimac3 Usage

From Genome Analysis Wiki

Latest revision as of 19:32, 6 June 2016

Introduction

Minimac3 is a lower-memory, more computationally efficient implementation of minimac2. It is an algorithm for genotype imputation that works on phased genotypes (for example, from MaCH) and is designed to handle very large reference panels efficiently with no loss of accuracy.

This wiki page gives users a detailed explanation of Minimac3 usage.

Command Line Options

A Minimac3 command line can include the following parameter options:

Command Line Options:
  Reference Haplotypes : --refHaps [], --passOnly, --rsid
      Target Haplotypes : --haps []
      Output Parameters : --prefix [Minimac3.Output], --processReference,
                          --updateModel, --nobgzip, --vcfOutput [ON],
                          --doseOutput, --hapOutput, --format [GT,DS],
                          --allTypedSites
      Subset Parameters : --chr [], --start, --end, --window
    Starting Parameters : --rec [], --err []
  Estimation Parameters : --rounds [5], --states [200]
       Other Parameters : --log, --lowMemory, --help, --cpus [1], --params
              PhoneHome : --noPhoneHome, --phoneHomeThinning [50]
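For scripting, it can help to assemble these options programmatically. The following is a minimal Python sketch, not an official wrapper: the executable names Minimac3 and Minimac3-omp are assumed, the function name minimac3_command is hypothetical, and only flags from the list above are used.

```python
import shlex
from typing import List


def minimac3_command(ref_haps: str, haps: str,
                     prefix: str = "Minimac3.Output",
                     cpus: int = 1,
                     executable: str = "Minimac3") -> List[str]:
    """Return an argv list for a basic imputation run (hypothetical helper)."""
    argv = [executable, "--refHaps", ref_haps, "--haps", haps, "--prefix", prefix]
    if cpus > 1:
        # Per this page, --cpus is only available with the OpenMP build, Minimac3-omp.
        argv += ["--cpus", str(cpus)]
    return argv


if __name__ == "__main__":
    cmd = minimac3_command("ref.m3vcf.gz", "target.vcf.gz", prefix="chr20.imputed")
    # shlex.join (Python 3.8+) quotes each argument for safe pasting into a shell.
    print(shlex.join(cmd))
```

Passing the list to subprocess.run(cmd, check=True) would then launch the run, assuming the executable is on the PATH.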

Detailed Usage

The available options of Minimac3 are explained in detail below. See the wiki pages on Examples and Full List of Options for more details. New users are also encouraged to read the wiki page Minimac3 Imputation Cookbook.

Reference Haplotypes

"--refHaps" denotes the main input reference file, which can be either a VCF file or an M3VCF file. No handle is needed to specify the file type; the program detects it automatically.

Minimac3 accepts either VCF or M3VCF files as input for the reference panel. M3VCF files are custom files created by Minimac3 (possibly in a previous run) that store large reference panels in a compact form, saving the memory and computation time involved in reading large files. See the wiki page on M3VCF files for further details. Commonly used reference panels can be downloaded in both VCF and M3VCF format from Reference Panels.
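Because the file type is detected automatically and extensions are not mandatory, a program has to inspect file contents rather than names. As a hedged illustration only (not Minimac3's actual detection logic), the sketch below recognises gzip/bgzip compression from the two magic bytes 0x1f 0x8b and reads the ##fileformat= value from the first header line. It assumes that M3VCF files declare a value different from the VCFv4.x of ordinary VCF files; the helper names are hypothetical.

```python
import gzip


def open_text(path: str):
    """Open a plain or gzip/bgzip file as text, decided by content, not extension."""
    with open(path, "rb") as fh:
        magic = fh.read(2)
    if magic == b"\x1f\x8b":  # gzip (and BGZF, which is gzip-compatible) magic bytes
        return gzip.open(path, "rt")
    return open(path, "r")


def file_format(path: str) -> str:
    """Return the value of a first-line ##fileformat= header, or '' if absent."""
    with open_text(path) as fh:
        first = fh.readline().rstrip("\n")
    key = "##fileformat="
    return first[len(key):] if first.startswith(key) else ""
```

A caller could then branch on whether the returned value starts with "VCF".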

Target Haplotypes

"--haps" denotes the main input GWAS file, which must be a VCF file (.vcf or .vcf.gz). The file extensions are not mandatory.

Minimac3 accepts only VCF files as input for the GWAS data (see the page on Converting Files to VCF). Input VCF files are automatically assumed to be pre-phased (see the page on Pre-Phasing). Markers that are in the target panel but NOT in the reference panel are excluded from the output files; users must merge these markers back in from the original data in order to analyze them.
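Before running imputation, it can be useful to know which target markers will be dropped and whether the input really is phased. The sketch below is hypothetical: it matches markers on (CHROM, POS, REF, ALT), which may not be exactly the rule Minimac3 applies, and treats a GT value containing "/" as unphased, following the VCF convention that "|" marks phased genotypes.

```python
from typing import Iterable, List, Set, Tuple

Key = Tuple[str, str, str, str]  # (CHROM, POS, REF, ALT)


def marker_keys(lines: Iterable[str]) -> Set[Key]:
    """Collect marker keys from VCF text lines, skipping headers and blank lines."""
    keys = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        keys.add((f[0], f[1], f[3], f[4]))
    return keys


def target_only(target_lines: Iterable[str], reference_lines: Iterable[str]) -> Set[Key]:
    """Target markers absent from the reference, i.e. those missing from the output."""
    return marker_keys(target_lines) - marker_keys(reference_lines)


def unphased_samples(line: str) -> List[int]:
    """Indices of samples whose GT on one VCF data line uses '/' (unphased)."""
    f = line.rstrip("\n").split("\t")
    gt_index = f[8].split(":").index("GT")
    return [i for i, s in enumerate(f[9:]) if "/" in s.split(":")[gt_index]]
```

The markers returned by target_only are the ones that would need to be merged back from the original data.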

Output Files

"--prefix" denotes the prefix for the output files (default: Minimac3.Output).

Minimac3 can write output in both VCF format and .dose format (the usual minimac output format). By default, Minimac3 writes only VCF output; users must add the handle --doseOutput to output in .dose format, or the handle --hapOutput to output dosage data in phased format. Output VCF files can store dosage data only in the following formats, selected with the handle --format (default: --format GT,DS):

  • DS : Estimated alternate allele dosage (default).
  • GT : Estimated most likely genotype (default).
  • GP : Estimated posterior genotype probabilities (use handle --format GP).
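To make the DS, GT and GP definitions concrete, the sketch below computes them for one biallelic marker in one sample from the two haplotypes' alternate-allele probabilities p1 and p2. It ASSUMES the two haplotypes are independent and picks GT as the most probable GP category; it illustrates the definitions and is not Minimac3's code, whose GT output may be derived differently (for example, per haplotype).

```python
from typing import Tuple


def genotype_summaries(p1: float, p2: float) -> Tuple[float, Tuple[float, float, float], str]:
    """Return (DS, GP, GT) for one sample, assuming independent haplotypes."""
    # GP: posterior probabilities of carrying 0, 1 or 2 alternate alleles.
    gp = ((1 - p1) * (1 - p2),
          p1 * (1 - p2) + (1 - p1) * p2,
          p1 * p2)
    # DS: expected alternate-allele count = 0*P(0) + 1*P(1) + 2*P(2), equal to p1 + p2.
    ds = gp[1] + 2 * gp[2]
    # GT: most likely genotype, taken here as the GP category with the highest probability.
    best = max(range(3), key=lambda k: gp[k])
    gt = ("0/0", "0/1", "1/1")[best]
    return ds, gp, gt
```

For p1 = 0.9 and p2 = 0.2 this gives GP of approximately (0.08, 0.74, 0.18), DS of approximately 1.1 and GT 0/1.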

The handle --processReference is used ONLY to convert reference panels from VCF format to M3VCF format (and save parameter estimates). NO imputation is performed, so NO target/GWAS haplotypes are required. By default, parameter estimation is still carried out on the reference panel and the estimates are saved in the M3VCF file. Users should add --rounds 0 to skip parameter estimation and only compress the reference panel into an M3VCF file. See the wiki page on Examples for further details.
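The two ways of running --processReference described above differ only in the --rounds setting. A minimal Python sketch, assuming the executable is called Minimac3 (the function name and its estimate_parameters argument are hypothetical):

```python
from typing import List


def process_reference_command(ref_vcf: str, prefix: str,
                              estimate_parameters: bool = True) -> List[str]:
    """Argv for converting a VCF reference panel to M3VCF (no --haps needed)."""
    argv = ["Minimac3", "--refHaps", ref_vcf, "--processReference", "--prefix", prefix]
    if not estimate_parameters:
        # --rounds 0 skips parameter estimation, so the run only compresses the panel.
        argv += ["--rounds", "0"]
    return argv
```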

[NOTE: During imputation, if parameter estimates are found in the M3VCF file, Minimac3 automatically uses them. Users can add the handle --updateModel to also update the parameter estimates using the target/GWAS panel. However, this is NOT necessary in most cases, unless the user has strong reasons to believe that it will increase imputation accuracy.]

Remaining Parameters

This sub-section explains the remaining parameters available.

  • Subset Parameters: The subset parameters are required if the user wishes to impute a particular region of the chromosome rather than the whole chromosome (typically when running imputation in chunks). For example, to analyze chromosome 6 from position 1000000 to position 2000000 with a buffer of 500000 base positions on either side, use --chr 6 --start 1000000 --end 2000000 --window 500000. If the subset parameters are used, a default window of 500 Kbp is applied on either side unless otherwise specified. Variants in the buffer region are used only for imputation and are not reported in the final output.
  • Starting Parameters: The starting parameters are used if the user wishes to reuse previously created parameter estimate files to save time on parameter estimation (.recom and .erate files can be supplied with --rec and --err, respectively).
  • Estimation Parameters: The estimation parameters specify the number of iterations (--rounds [5]) and the number of states (--states [200]) used by the Hidden Markov Model during parameter estimation. The defaults of 5 and 200 generally give sufficiently accurate estimates and need not be increased unless the user has strong reasons to do so.
  • Other Parameters: These parameters have varying usage. --help prints a brief documentation of Minimac3 and its usage; --cpus [1] allows the user to use multiple processors in parallel (this option is only available when running Minimac3-omp); --params prints the current values of the usage parameters; --lowMemory runs a lower-memory version of Minimac3 that requires 33% less memory but 10% more time (for the HRC panel).
  • PhoneHome: By default, Minimac3 sends a message to a University of Michigan database about the success or failure of the analysis run (and, in case of failure, what kind of failure occurred). No information about the data, files or file names is sent. Users can add the handle --noPhoneHome to opt out, or --phoneHomeThinning 50 to send the message with a 50% chance (typically used when running many command lines).
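When imputing in chunks, the subset flags are typically generated for each region in turn. The sketch below is illustrative only: the chunk size, the helper names and the assumption that --start and --end are inclusive 1-based positions are not from this page. Flag names follow the Command Line Options block, and the default window of 500000 matches the 500 Kbp default stated above.

```python
from typing import List, Tuple


def chunks(chrom_length: int, chunk_size: int) -> List[Tuple[int, int]]:
    """Non-overlapping (start, end) pairs covering positions 1..chrom_length."""
    out = []
    start = 1
    while start <= chrom_length:
        end = min(start + chunk_size - 1, chrom_length)
        out.append((start, end))
        start = end + 1
    return out


def subset_flags(chrom: str, start: int, end: int, window: int = 500_000) -> List[str]:
    # --window is the buffer on each side. Buffer variants are used for imputation
    # but not reported, so (assuming inclusive bounds) non-overlapping chunks
    # should give non-overlapping output.
    return ["--chr", chrom, "--start", str(start), "--end", str(end),
            "--window", str(window)]
```

For instance, subset_flags("6", 1000000, 2000000) produces the flags for the chromosome 6 region used in the Subset Parameters example.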

Download

Minimac3 is available as an undocumented release version. The source files (and binary executable) can be downloaded from Source Files, and commonly used reference panels in VCF and M3VCF formats can be downloaded from Reference Panels.

Useful Wiki Pages

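Several pages in this wiki may be useful for Minimac3 users, including:

  • Minimac3 Usage and Documentation
  • Minimac3 - Full List of Options
  • Minimac3 Imputation Cookbook (recommended for new users)
  • Pre-Phasing
  • Converting Files to VCF
  • Minimac3 Examples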

Contact

For any queries or bug reports, please contact Sayantan Das.