Difference between revisions of "Minimac3 Info File"

From Genome Analysis Wiki
Line 6: Line 6:
 
This wiki page is designed to give users '''a detailed explanation of the info file produced by Minimac3'''.
 
  
= Detailed Usage =
+
= Info File Descriptors =
  
 
The columns of the Minimac3 info file are explained in detail below. See the wiki pages [[Minimac3 Examples|Examples]] and [[Minimac3 - Full List of Options |Full list of Options]] for more details. The [[Minimac3 Imputation Cookbook]] wiki page is also recommended for new users.
 
  
==Reference Haplotypes==
+
==='''SNP'''===
  
<font face=Courier>"--refHaps"</font> denotes the main input reference file, which can be either a VCF file or an <font face=Courier>M3VCF</font> file. No handle is needed to specify the file type; the program detects it automatically.
+
The SNP identifier for the variant. This is usually in the form chr:position, but it is the rsid of the variant if the user selected <code>--rsid</code> during the Minimac3 run.
  
Minimac3 can handle either VCF files or <font face=Courier>M3VCF</font> files as input for the reference panel. The program identifies the file type itself, so no handle is needed for that. <font face=Courier>M3VCF</font> files are customized files created by Minimac3 (possibly in a previous run) that store large reference panels in a compact form, saving the memory and computation time involved in reading large files. See the wiki page on [[M3VCF Files| <font face=Courier>M3VCF</font> files]] for further details. Users can download commonly used reference panels in both VCF and <font face=Courier>M3VCF</font> format from [[Minimac3#Reference Panels for Download |Reference Panels]].
+
==='''REF''', '''ALT'''===
  
==Target Haplotypes==
+
These are the reference and alternate alleles for the variant, as imported from the reference panel file (either VCF or M3VCF).
  
<font face=Courier>"--haps"</font> denotes the main input GWAS file which has to be a VCF file (<font face=Courier>.vcf</font> or <font face=Courier>.vcf.gz</font>). The extensions are not mandatory.
+
==='''Major''', '''Minor'''===
  
Minimac3 can handle only VCF files as input for the GWAS data (see the page on [[Minimac3 Cookbook : Converting Files to VCF|Converting Files to VCF]]). Note that input VCF files are automatically assumed to be pre-phased (see the page on [[Minimac3 Cookbook : Pre-Phasing|Pre-Phasing]]). Markers that are in the target panel but NOT in the reference panel are excluded from the output files. Users who want to analyze these extra markers must merge them back in from the original data.
+
These are the major and minor alleles for the variant, determined from the allele frequency in the reference panel (and NOT from the imputed dosage frequency).
  
==Output Files==
+
==='''DoseMAF'''===
  
<font face=Courier>"--prefix"</font> denotes the prefix for the output files (By default: <font face=Courier>Minimac3.Output</font>)
+
This is the minor allele frequency of the variant in the imputed dosage data (and NOT in the reference panel). [NOTE: DoseMAF does NOT always correspond to the frequency of the listed minor allele, because the minor allele in the imputed data is sometimes different from the minor allele in the reference panel.]
 
 
Minimac3 can output files in both <font face=Courier>VCF</font> format and <font face=Courier>.dose</font> format (the usual [http://genome.sph.umich.edu/wiki/Minimac minimac] output format). By default, Minimac3 outputs only <font face=Courier>VCF</font> format; users must use the handle <font face=Courier>--doseOutput</font> to output in <font face=Courier>.dose</font> format, or the handle <font face=Courier>--hapOutput</font> to output dosage data in phased format. Output VCF files can store dosage data only in the following formats, selected with the handle <font face=Courier>--format</font> (by default: <font face=Courier>--format DS,GT</font>):
 
 
 
* '''DS''' : Estimated alternate allele dosage (default).
 
* '''GT''' : Estimated most likely genotype (default).
 
* '''GP''' : Estimated posterior genotype probabilities (use handle <font face=Courier>--format GP</font>).
 
 
 
The handle <font face=Courier>--processReference</font> is used to ONLY convert reference panels from <font face=Courier>VCF</font> format to [[M3VCF Files|<font face=Courier>M3VCF</font>]] format (and save parameter estimates). NO imputation will be performed and thus NO target/gwas haplotypes are required. However, by default, parameter estimation will be done using the reference panel and the estimates will be saved in the <font face=Courier>M3VCF</font> files. Users should use <font face=Courier>--rounds  0</font> in order to opt out of parameter estimation and only compress the reference panel and save it as a <font face=Courier>M3VCF</font> file. See wiki page on [[Minimac3 Examples|Examples]] for further details.
 
 
 
[NOTE: While doing imputation, if parameter estimates are found in <font face=Courier>M3VCF</font> files, Minimac3 will automatically use them for imputation.  Users should use handle <font face=Courier>--updateModel</font> in order to update the parameter estimates using the target/gwas panel as well. However, this is NOT necessary in most cases, unless the user has strong reasons to believe that this might increase the imputation accuracy.]
 
 
 
== Remaining Parameters ==
 
 
 
This sub-section explains the remaining parameters available.
 
 
 
* '''Subset Parameters:''' The subset parameters are required if the user wishes to impute into a particular region of the chromosome rather than the whole chromosome (typically used when running imputation in chunks). For example, to analyze chromosome 6 from position 1000000 to position 2000000 with 500000 base positions on either side as a buffer, one must use <font face=Courier>--chr 6 --from 1000000  --to 2000000 --window 500000 </font>. If using the subset parameters, a default window of 1Mbp is applied on either side, unless otherwise specified by the user. Variants from the buffer region are only used for imputation and not reported in the final output.
 
 
 
* '''Starting Parameters:''' The starting parameters are used if the user wishes to reuse previously created parameter estimate files to save time on parameter estimation (<font face=Courier>.recom</font> and <font face=Courier>.erate</font> files can be used with <font face=Courier>--rec</font> and <font face=Courier>--err</font> respectively).
 
 
 
* '''Estimation Parameters:''' The estimation parameters specify the number of iterations (<font face=Courier>--rounds [5]</font>) and number of states (<font face=Courier>--states [200]</font>) to consider while implementing the Hidden Markov Model for parameter estimation. Default values of 5 and 200 are used (these would generally give accurate enough estimates and need not be increased unless the user has strong reasons to do so).
 
 
 
* '''Other Parameters:''' These parameters have varying uses. <font face=Courier>--help</font> prints brief documentation of Minimac3 and its usage; <font face=Courier>--cpus [5]</font> allows the user to use multiple processors when running in parallel (this option is only available when running Minimac3-omp); <font face=Courier>--params</font> prints the current values of the usage parameters.
 
 
 
* '''PhoneHome:''' This option (on by default) sends a message to a University of Michigan database about the success or failure of the analysis run (and, on failure, what kind of failure occurred). No information about the data, file or file name is sent back. Users should use the handle <font face=Courier>--noPhoneHome</font> to opt out of this option, or <font face=Courier>--phoneHomeThinning 50</font> to send back a message with 50% chance (typically used when running many command lines).
 
  
 
= Download =
 

Revision as of 12:21, 18 September 2015

Introduction

Minimac3 is a lower-memory and more computationally efficient implementation of minimac2. It is an algorithm for genotype imputation that works on phased genotypes (for example, from MaCH) and is designed to handle very large reference panels more efficiently with no loss of accuracy.

This wiki page is designed to give users a detailed explanation of the info file produced by Minimac3.

Info File Descriptors

The columns of the Minimac3 info file are explained in detail below. See the wiki pages Examples and Full list of Options for more details. The Minimac3 Imputation Cookbook wiki page is also recommended for new users.

SNP

The SNP identifier for the variant. This is usually in the form chr:position, but it is the rsid of the variant if the user selected --rsid during the Minimac3 run.
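When post-processing the info file, it can help to tell the two identifier styles apart. The following is a minimal Python sketch, not part of Minimac3: the function name parse_snp_id and the returned dictionary layout are hypothetical, and the pattern assumes that chr:position identifiers may or may not carry a "chr" prefix.

```python
import re

# Hypothetical helper (not part of Minimac3): classify a value from the SNP column.
# Assumption: chr:position identifiers look like "6:1000000" or "chr6:1000000".
CHR_POS = re.compile(r"^(?:chr)?([0-9A-Za-z]+):(\d+)$")


def parse_snp_id(snp_id):
    """Return a dict describing whether snp_id is chr:position, an rsid, or other."""
    match = CHR_POS.match(snp_id)
    if match:
        return {
            "kind": "chr:position",
            "chrom": match.group(1),
            "pos": int(match.group(2)),
        }
    # rsids are "rs" followed by digits, e.g. "rs12345".
    if snp_id.startswith("rs") and snp_id[2:].isdigit():
        return {"kind": "rsid", "rsid": snp_id}
    # Anything else is passed through untouched rather than guessed at.
    return {"kind": "other", "raw": snp_id}
```

Identifiers that match neither pattern are returned as "other", so that unexpected formats are not silently misread.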

REF, ALT

These are the reference and alternate alleles for the variant, as imported from the reference panel file (either VCF or M3VCF).

Major, Minor

These are the major and minor alleles for the variant, determined from the allele frequency in the reference panel (and NOT from the imputed dosage frequency).
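As an illustration of the rule described above, the sketch below assigns the major and minor labels from the ALT allele frequency in the reference panel. It is a hedged example only: the function name major_minor is hypothetical, and the handling of a frequency of exactly 0.5 (REF is treated as major) is an assumption, since this page does not say how Minimac3 breaks ties.

```python
def major_minor(ref, alt, ref_panel_alt_freq):
    """Return (major, minor) alleles from the ALT frequency in the reference panel.

    Hypothetical helper, not part of Minimac3. Assumption: at a frequency of
    exactly 0.5 the REF allele is reported as the major allele.
    """
    if not 0.0 <= ref_panel_alt_freq <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    if ref_panel_alt_freq > 0.5:
        # ALT is the more common allele in the reference panel.
        return alt, ref
    return ref, alt
```

For example, a variant with REF A, ALT G and a reference panel ALT frequency of 0.8 would be labelled with major allele G and minor allele A.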

DoseMAF

This is the minor allele frequency of the variant in the imputed dosage data (and NOT in the reference panel). [NOTE: DoseMAF does NOT always correspond to the frequency of the listed minor allele, because the minor allele in the imputed data is sometimes different from the minor allele in the reference panel.]
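The note above can be made concrete with a small Python sketch. It assumes diploid samples whose estimated ALT allele dosages lie between 0 and 2, and it computes a minor allele frequency in the usual way; the function names are hypothetical and this is not taken from the Minimac3 source code.

```python
def dose_alt_freq(alt_dosages):
    """ALT allele frequency implied by per-sample ALT dosages (assumed diploid, 0..2)."""
    if not alt_dosages:
        raise ValueError("at least one dosage is required")
    return sum(alt_dosages) / (2.0 * len(alt_dosages))


def dose_maf(alt_dosages):
    """Minor allele frequency in the imputed data: the smaller of the two allele frequencies."""
    alt_freq = dose_alt_freq(alt_dosages)
    return min(alt_freq, 1.0 - alt_freq)


def imputed_minor_allele(ref, alt, alt_dosages):
    """Allele that is minor in the imputed data (assumption: ALT is reported on a 0.5 tie)."""
    return alt if dose_alt_freq(alt_dosages) <= 0.5 else ref


# Made-up dosages for four samples: the imputed ALT frequency is about 0.55,
# so REF is the minor allele in the imputed data.
example_dosages = [2.0, 1.0, 1.0, 0.4]
example_maf = dose_maf(example_dosages)
example_minor = imputed_minor_allele("A", "G", example_dosages)
```

If ALT (G) were the minor allele in the reference panel, the Minor column would list G, while the DoseMAF of roughly 0.45 in this example would describe the frequency of A in the imputed data. This is the mismatch the note warns about.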

Download

Minimac3 is available as an undocumented release version. The source files (and binary executable) can be downloaded from Source Files, and commonly used reference panels in VCF and M3VCF formats can be downloaded from Reference Panels.

Useful Wiki Pages

There are a few pages in this Wiki that may be useful for Minimac3 users. Here are links to a few:

Contact

For queries or bug reports, please contact Sayantan Das.