Minimac3 Usage

From Genome Analysis Wiki

Revision as of 20:20, 29 January 2015

Introduction

Minimac3 is a lower-memory, more computationally efficient implementation of minimac2. It is an algorithm for genotype imputation that works on phased genotypes (for example, from MaCH) and is designed to handle very large reference panels efficiently with no loss of accuracy.

This wiki page explains Minimac3 usage in detail.

Command Line Options

A typical Minimac3 command line would have the following parameter options:

Command Line Options:
   Reference Haplotypes : --refHaps [], --passOnly
      Target Haplotypes : --haps []
      Output Parameters : --processReference, --prefix [Minimac3.Output],
                          --updateModel, --nobgzip, --doseOutput, --hapOutput,
                          --format [GT,DS]
      Subset Parameters : --chr [], --start, --end, --window
    Starting Parameters : --rec [], --err []
  Estimation Parameters : --rounds [5], --states [200]
       Other Parameters : --help, --cpus [1], --params
              PhoneHome : --noPhoneHome, --phoneHomeThinning [50]
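
As a minimal sketch of how these options combine, the Python snippet below assembles an argument list for a basic imputation run. Only flags listed in the summary above are used; the file names (ref.m3vcf.gz, target.vcf.gz), the output prefix, and the helper name build_imputation_command are hypothetical placeholders, and the binary is assumed to be on the PATH as Minimac3.

```python
import shlex


def build_imputation_command(ref_haps, haps, prefix="Minimac3.Output",
                             fmt=("GT", "DS"), dose_output=False,
                             cpus=None, binary="Minimac3"):
    """Assemble a Minimac3 argument list (hypothetical helper, not part of Minimac3)."""
    cmd = [binary,
           "--refHaps", ref_haps,      # reference panel: VCF or M3VCF
           "--haps", haps,             # target (gwas) haplotypes: VCF only
           "--prefix", prefix,         # prefix for all output files
           "--format", ",".join(fmt)]  # FORMAT fields written to the output VCF
    if dose_output:
        cmd.append("--doseOutput")     # also write a .dose file
    if cpus is not None:
        # Per the options table, --cpus only works with Minimac3-omp.
        cmd += ["--cpus", str(cpus)]
    return cmd


# Placeholder file names; substitute real paths before running.
cmd = build_imputation_command("ref.m3vcf.gz", "target.vcf.gz",
                               prefix="chr22.imputed", dose_output=True)
print(shlex.join(cmd))
```

The list form can be handed to subprocess.run(cmd, check=True) unchanged, which avoids shell quoting problems with unusual file names.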

Detailed Usage

The available options of Minimac3 are explained in detail below. See the wiki page on Examples, and the Full List of Options section below, for the complete set of options.

Reference Haplotypes

"--refHaps" denotes the main input reference file, which can be either a VCF file or an M3VCF file. No handle is needed to specify the file type; the program detects it automatically.

Minimac3 accepts both VCF files and M3VCF files as input for the reference panel. M3VCF files are customized files that store large reference panels in a compact form, saving the memory and computation time involved in reading large files. They must be generated in a previous run of Minimac3 and can then be saved and reused in later runs for faster loading of data. See the wiki page on M3VCF files for further details.

Target Haplotypes

"--haps" denotes the main input GWAS file, which must be a VCF file (.vcf or .vcf.gz). The file extensions are not mandatory.

Minimac3 accepts only VCF files as input for the target/gwas data. Input VCF files are automatically assumed to be pre-phased. Markers that are in the target panel but NOT in the reference panel are excluded from the output files. Users who want to analyze these extra markers must merge them back in from the original data.
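
To see in advance which target markers would be left out of the output, one option is to compare the marker sets of the two VCF files. The sketch below is only an illustration: it assumes markers are matched on chromosome, position, and REF/ALT alleles, which this page does not state, and it works on small in-memory lists of made-up VCF lines rather than real files.

```python
def marker_keys(vcf_lines):
    """Collect (chrom, pos, ref, alt) keys from VCF text lines.

    The matching key is an assumption for illustration; Minimac3's own
    matching rule is not described on this page.
    """
    keys = set()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header lines and blank lines
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        keys.add((chrom, int(pos), ref, alt))
    return keys


# Made-up example records (first five VCF columns only).
target_lines = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT",
    "22\t100\trs1\tA\tG",
    "22\t300\trs3\tG\tA",
]
reference_lines = [
    "#CHROM\tPOS\tID\tREF\tALT",
    "22\t100\trs1\tA\tG",
    "22\t200\trs2\tC\tT",
]

# Markers present in the target panel but absent from the reference panel.
dropped = sorted(marker_keys(target_lines) - marker_keys(reference_lines))
print(dropped)
```

For real data the same function can be fed an open file handle (for example from gzip.open(path, "rt")), since it only iterates over lines.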

Output Files

"--prefix" denotes the prefix for the output files (By default: Minimac3.Output)

Minimac3 can output files in both VCF format and .dose format (the usual minimac output format). By default, Minimac3 outputs only VCF format; use the handle --doseOutput to also output in .dose format, or the handle --hapOutput to output dosage data in phased format. Output VCF files can store dosage data only in the following formats, which are managed by the handle --format (by default: --format DS,GT):

  • DS : Estimated alternate allele dosage (default).
  • GT : Estimated most likely genotype (default).
  • GP : Estimated posterior genotype probabilities (use handle --format GP).
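
The three fields are related in a simple way for a diploid sample at a biallelic variant: the alternate allele dosage is the expected number of alternate alleles under the genotype probabilities, and the most likely genotype is the one with the largest probability. The sketch below illustrates that general relationship only. It is not Minimac3's internal computation, the function name is hypothetical, and the unphased "0/1" notation is an assumption made for the example.

```python
def dosage_and_genotype(gp):
    """Derive DS and GT from GP = (P(0/0), P(0/1), P(1/1)).

    Illustrative only: shows how the fields relate in general,
    not how Minimac3 computes them.
    """
    p_hom_ref, p_het, p_hom_alt = gp
    if abs(p_hom_ref + p_het + p_hom_alt - 1.0) > 1e-6:
        raise ValueError("genotype probabilities must sum to 1")
    # Expected count of alternate alleles: 0*P(0/0) + 1*P(0/1) + 2*P(1/1).
    ds = p_het + 2.0 * p_hom_alt
    # Most likely genotype: index of the largest probability.
    best = max(range(3), key=lambda i: gp[i])
    gt = ("0/0", "0/1", "1/1")[best]
    return ds, gt


ds, gt = dosage_and_genotype((0.05, 0.70, 0.25))
print(f"DS={ds:.2f} GT={gt}")
```

Note that DS keeps the uncertainty that GT discards: here the dosage of 1.20 reflects the 25% chance of a homozygous alternate genotype even though the hard call is heterozygous.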

The handle --processReference is used ONLY to convert reference panels from VCF format to M3VCF format (and save parameter estimates). NO imputation is performed, so NO target/gwas haplotypes are required. By default, however, parameters are estimated from the reference panel and the estimates are saved in the M3VCF files. Use --rounds 0 to opt out of parameter estimation and only compress the reference panel into an M3VCF file. See the wiki page on Examples for further details.

[NOTE: During imputation, if parameter estimates are found in the M3VCF files, Minimac3 automatically uses them. Use the handle --updateModel to additionally update the parameter estimates using the target/gwas panel. This is NOT necessary in most cases, unless the user has strong reasons to believe it might increase imputation accuracy.]
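
The two-step workflow described above (convert the reference panel once, then impute against the saved M3VCF file) might be scripted roughly as follows. This is a sketch under stated assumptions: the helper names are hypothetical, all file names are placeholders, and the name of the M3VCF file written by the first step is not given on this page, so it is passed in explicitly rather than derived from the prefix.

```python
def build_process_reference_command(ref_vcf, prefix, estimate_parameters=True,
                                    binary="Minimac3"):
    """Step 1: convert a VCF reference panel to M3VCF (no imputation)."""
    cmd = [binary, "--refHaps", ref_vcf, "--processReference", "--prefix", prefix]
    if not estimate_parameters:
        # --rounds 0 skips parameter estimation and only compresses the panel.
        cmd += ["--rounds", "0"]
    return cmd


def build_impute_command(ref_m3vcf, target_vcf, prefix, update_model=False,
                         binary="Minimac3"):
    """Step 2: impute against a previously saved M3VCF reference panel."""
    cmd = [binary, "--refHaps", ref_m3vcf, "--haps", target_vcf, "--prefix", prefix]
    if update_model:
        # Optional: refine the saved estimates using the target/gwas panel.
        cmd.append("--updateModel")
    return cmd


# Placeholder paths; the M3VCF file name below is hypothetical.
step1 = build_process_reference_command("ref.vcf.gz", "ref.compressed",
                                        estimate_parameters=False)
step2 = build_impute_command("ref.compressed.m3vcf.gz", "target.vcf.gz",
                             "chr22.imputed")
print(" ".join(step1))
print(" ".join(step2))
```

Leaving update_model at its default mirrors the note above: saved estimates are reused as they are unless there is a specific reason to update them.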

Subset Parameters

Starting Parameters

Estimation Parameters

Other Parameters

PhoneHome

Full List of Options

The following table gives a brief description of all the parameters of Minimac3. A detailed description will be available soon.

Parameter : Description
--refHaps filename : VCF file or M3VCF file containing haplotype data for reference panel.
--passOnly : If ON, only variants with FILTER=PASS will be recorded from reference VCF file (does NOT work on M3VCF files yet).
--haps filename : File containing haplotype data for target (gwas) samples. Must be a VCF file.
--processReference : This option will only convert an input VCF file to M3VCF format (for example, for a later run of imputation). If this option is ON, no imputation is performed and all other parameters are ignored (except for parameters on Reference Haplotypes and Subsetting Options). This option also does parameter estimation using the reference panel and saves the estimates in the M3VCF file (the estimation can be skipped with --rounds 0).
--prefix output : Prefix for all output files generated. By default: [Minimac3.Output]
--updateModel : If ON, saved parameter estimates read from a M3VCF file will be further updated using the gwas samples. Ignored if the reference file is a VCF file. [Default: OFF]
--nobgzip : If ON, output files will NOT be bgzipped.
--doseOutput : If ON, imputed data will be output as dosage file as well [Default: OFF].
--hapOutput : If ON, phased imputed data will be output as well [Default: OFF].
--format : Specifies which fields to output for the FORMAT field in output VCF file. Available handles: GT,DS,GP [Default: GT,DS].
--chr 22 : Chromosome number for which we will carry out imputation.
--start 100000 : Start position for imputation by chunking. Does not work without the --chr option.
--end 200000 : End position for imputation by chunking. Does not work without the --chr option.
--window 5000 : Length of buffer region on either side of --start and --end. By default = 0.
--rec : Recombination File from previous run of Minimac/Minimac3. (--err parameter must also be provided, if using this handle)
--err : Error File from previous run of Minimac/Minimac3. (--rec parameter must also be provided, if using this handle)
--rounds 5 : Rounds of optimization for model parameters, which describe population recombination rates and per SNP error rates. By default = 5.
--states 200 : Maximum number of reference (or target) haplotypes to be examined during parameter optimization. By default = 200.
--help : A short help on options.
--cpus 5 : Number of cpus for parallel computing. Works only with Minimac3-omp.
--noPhoneHome : If ON, code will NOT send a SUCCESS/FAILURE status of the execution to home server.
--phoneHomeThinning 50 : Percentage probability of sending SUCCESS/FAILURE status of the execution to home server [Default: 50%]
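
Several rows in the table describe dependencies between options: --start and --end need --chr, and --rec and --err must be given together. A small pre-flight check can catch these before a long run is launched. The sketch below is hypothetical helper code, not part of Minimac3. It also computes the region covered once the --window buffer is added, assuming that positions and the window are in the same units and that the lower bound should not drop below 1 (neither detail is stated on this page).

```python
def check_options(opts):
    """Return a list of problems in a dict of options (keys without '--')."""
    problems = []
    if ("start" in opts or "end" in opts) and "chr" not in opts:
        problems.append("--start/--end require --chr")
    if ("rec" in opts) != ("err" in opts):
        problems.append("--rec and --err must be given together")
    return problems


def buffered_region(start, end, window=0):
    """Region spanned by --start/--end plus a --window buffer on each side.

    Clamping the lower bound at 1 is an assumption made for this sketch.
    """
    return max(start - window, 1), end + window


print(check_options({"start": 100000, "end": 200000}))
print(check_options({"chr": "22", "start": 100000, "end": 200000}))
print(buffered_region(100000, 200000, window=5000))
```

With the example values from the table, the buffered region runs from 95000 to 205000, that is, 5000 positions beyond the chunk on each side.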

Download

Minimac3 is available as an undocumented release version. The source files (and binary executable) are available for download from Source Files, and commonly used reference panels in VCF and M3VCF formats are available for download from Reference Panels.

Contact

For any queries or bug reports, please contact Sayantan Das.