Minimac3 Info (Older Version)
If you downloaded a Minimac3 version earlier than 0.1.13 (you can check the LOG file; this covers source code downloaded before Oct 15, 2015, or imputed data downloaded from the Imputation Server), please see this page for the info file descriptors. If you are using version 0.1.13 or above, please see the new version of the Minimac3 Info page.
P.S. There is NO analytical difference between the two versions of the info file. The new version of the Minimac3 info file just makes it easier for users to determine the minor allele in their imputed data.
In the earlier versions, we reported only the minor allele frequency (MAF) of the imputed data in the info file, while the major and minor alleles reported in the info file were defined with respect to the reference panel allele frequency and NOT the imputed data allele frequency. As a result, users had the MAF of their imputed data but NO way of knowing which allele was the minor allele in the imputed data. Most of the time the major/minor alleles are the same in the reference and GWAS data, but sometimes they get swapped, especially when the MAF is close to 0.5.
In the new version, we report the REF and ALT alleles and the ALT allele frequency along with the MAF; by comparing the MAF with the alternate allele frequency, one can easily determine the minor allele in the imputed data.
For users with the earlier version of the info file, there is NO need to re-run the imputation. The imputed data is the same; only the info files have slightly different formats. If you need to determine the minor allele in your imputed data, you just need to calculate the MAF on your side from the imputed data. Please contact Sayantan Das with any queries. We, the authors of Minimac3, apologize for any inconvenience caused by these updates.
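For users who need to calculate the MAF from the imputed data themselves, the step can be sketched roughly as follows in Python. This is a minimal illustration, not part of Minimac3: it assumes you have already extracted the per-sample ALT allele dosages (values between 0 and 2) for one variant into a list, and the function names are hypothetical.

```python
def alt_allele_frequency(dosages):
    """Mean ALT allele frequency from per-sample ALT dosages in [0, 2]."""
    if not dosages:
        raise ValueError("need at least one dosage")
    # Each diploid sample contributes two alleles.
    return sum(dosages) / (2 * len(dosages))


def minor_allele_from_dosages(ref, alt, dosages):
    """Return (minor_allele, maf) as observed in the imputed data.

    Assumption: a frequency of exactly 0.5 is reported with ALT as minor.
    """
    alt_af = alt_allele_frequency(dosages)
    if alt_af <= 0.5:
        return alt, alt_af
    return ref, 1.0 - alt_af


if __name__ == "__main__":
    # Hypothetical dosages for four samples at one variant.
    print(minor_allele_from_dosages("A", "G", [0.1, 0.0, 1.9, 0.2]))
    print(minor_allele_from_dosages("A", "G", [2.0, 2.0, 1.8, 1.5]))
```

In the second call the ALT allele is the common one in the imputed data, so the REF allele is returned as the minor allele.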
Minimac3 is a lower-memory and more computationally efficient implementation of minimac2. It is an algorithm for genotype imputation that works on phased genotypes (say from MaCH) and is designed to handle very large reference panels more efficiently with no loss of accuracy.
This wiki page gives users a detailed explanation of the info file produced by Minimac3.
Info File Descriptors
The available column descriptors for typical Minimac3 output are as follows:
The SNP identifier for the variant. This is usually in the form chr:position, but it can be the rsid of the variant if the user selected --rsid during the Minimac3 run (provided the input reference panel has the rsid in the INFO column).
These are the reference and alternate alleles for the variant as imported from the reference panel file (either VCF or M3VCF).
These are the major and minor alleles for the variant based on the reference panel allele frequency (and NOT the imputed dosage frequency).
DoseMAF is the minor allele frequency of the variant in the imputed dosage data (and NOT in the reference panel). [NOTE: DoseMAF does NOT always correspond to the frequency of the Minor allele above, since that allele is defined by the reference panel allele frequency, and sometimes the imputed minor allele differs from the reference minor allele.]
RefAF is the alternate allele frequency of the variant in the reference panel (NOT the minor allele frequency). If RefAF is less than 0.5, the ALT and Minor alleles are the same; if it is greater than 0.5, they are swapped (ALT is the Major allele).
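The RefAF rule above can be sketched in a few lines of Python. This is only an illustration with hypothetical function names; the text does not say what happens when RefAF is exactly 0.5, so the sketch assumes ALT is kept as the minor allele in that case.

```python
def major_minor_from_ref_af(ref, alt, ref_af):
    """Return (major, minor) alleles implied by the reference panel ALT frequency."""
    if not 0.0 <= ref_af <= 1.0:
        raise ValueError("ref_af must be between 0 and 1")
    # Assumption: the tie at exactly 0.5 keeps ALT as the minor allele.
    if ref_af <= 0.5:
        return ref, alt
    return alt, ref


def imputed_minor_differs(ref_af, dose_alt_af):
    """True when the minor allele in the imputed data differs from the reference one.

    dose_alt_af is the ALT allele frequency computed from the imputed dosages
    (a value the older info file does not report, so it must be computed separately).
    """
    return (ref_af <= 0.5) != (dose_alt_af <= 0.5)


if __name__ == "__main__":
    print(major_minor_from_ref_af("A", "G", 0.12))  # ALT is the minor allele
    print(major_minor_from_ref_af("A", "G", 0.83))  # alleles swapped
    print(imputed_minor_differs(0.48, 0.53))        # swap near MAF 0.5
```

The last call illustrates the case the NOTE warns about: near a frequency of 0.5, the reference panel and the imputed data can disagree on which allele is the minor one.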
Rsq is the estimated value of the squared correlation between imputed genotypes and true, unobserved genotypes. Since true genotypes are not available, this calculation is based on the idea that poorly imputed genotype counts will shrink towards their expectations based on population allele frequencies alone; specifically 2p, where p is the frequency of the allele being imputed.
Currently, Minimac3 uses the following definition:
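As a rough illustration of the shrinkage idea, the Python sketch below computes a ratio-of-variances estimate: the observed variance of the imputed haplotype-level allele probabilities divided by p(1 - p), the variance expected if every allele were called with certainty. This specific form, the haplotype-level input, and the function name are assumptions made for illustration and are not taken from this page.

```python
def estimated_rsq(hap_dosages):
    """Ratio of observed to expected variance of imputed allele probabilities.

    hap_dosages: one imputed ALT allele probability in [0, 1] per haplotype
    (two values per diploid sample). Assumed input layout, for illustration.
    """
    n = len(hap_dosages)
    if n == 0:
        raise ValueError("need at least one haplotype dosage")
    p = sum(hap_dosages) / n          # estimated allele frequency
    expected_var = p * (1.0 - p)      # variance if alleles were known exactly
    if expected_var == 0.0:
        return 0.0                    # monomorphic site: no information
    observed_var = sum((d - p) ** 2 for d in hap_dosages) / n
    return observed_var / expected_var


if __name__ == "__main__":
    print(estimated_rsq([0.0, 1.0, 0.0, 1.0]))   # confident calls
    print(estimated_rsq([0.5, 0.5, 0.5, 0.5]))   # fully shrunk to the mean
    print(estimated_rsq([0.2, 0.8, 0.3, 0.7]))   # partially shrunk
```

Confidently imputed alleles keep the full variance and give a value of 1, while dosages that have collapsed onto the allele frequency give 0.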
This column is an indicator of whether the variant was "Imputed" or "Genotyped".
LooRsq can only be provided for genotyped sites. It is similar to the estimated Rsq above, but the imputed dosages used in the calculation are obtained by hiding all known genotypes for the given SNP (these are called LooDosage).
While the LooRsq statistic completely ignores experimental genotypes, EmpR is the correlation between the true genotyped values and the imputed dosages obtained by hiding all known genotypes for the given SNP (LooDosage). A negative correlation between imputed and experimental genotypes can indicate allele flips. This statistic can also only be provided for genotyped sites. EmpRsq is the square of this correlation.
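The EmpR and EmpRsq description above can be sketched in Python as a plain Pearson correlation. This is a minimal illustration, not Minimac3's own code: it assumes the true genotypes (coded 0, 1, 2) and the matching LooDosage values for one genotyped site are already available as two lists, and the function names are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def emp_r_and_rsq(true_genotypes, loo_dosages):
    """Return (EmpR, EmpRsq) for one genotyped site."""
    r = pearson_r(true_genotypes, loo_dosages)
    return r, r * r


if __name__ == "__main__":
    genotypes = [0, 1, 2, 1]
    loo = [0.1, 0.9, 1.8, 1.2]
    print(emp_r_and_rsq(genotypes, loo))
    # Dosages counted against the opposite allele: the sign of EmpR reverses.
    flipped = [2.0 - d for d in loo]
    print(emp_r_and_rsq(genotypes, flipped))
```

The second call shows why a negative EmpR is a useful flag: flipping the allele being counted reverses the sign of the correlation while leaving EmpRsq unchanged.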
Minimac3 is available as an undocumented release version. The source files (and binary executable) are available for download in Source Files, and commonly used reference panels in VCF and M3VCF formats are available for download in Reference Panels.
Useful Wiki Pages
There are a few pages in this Wiki that may be useful for Minimac3 users. Here are links to a few:
- Minimac3 Imputation Cookbook (Recommended for New Users!!)
For any queries or bug reports, please contact Sayantan Das.