Minimac3 Info (Older Version)

From Genome Analysis Wiki

If you are using a Minimac3 version earlier than 0.1.13 (you can check the LOG file; this applies to source code downloaded before Oct 15, 2015, or to imputed data downloaded from the Imputation Server), please see this page for the Info File descriptors. If you are using version 0.1.13 or above, please see the new version of the Minimac3 Info page.

P.S. There is NO analytical difference between the two versions of the info file. The new version of the Minimac3 info file just makes it easier for users to determine the minor allele in their imputed data.

In the earlier versions, we reported only the minor allele frequency (MAF) of the imputed data in the info file, while the major and minor alleles reported in the info file were defined with respect to the reference panel allele frequency and NOT the imputed data allele frequency. As a result, users had the MAF of their imputed data in their info files but NO way of knowing which allele was the minor allele in the imputed data. Most of the time the major and minor alleles are the same in the reference and GWAS data, but sometimes they are swapped, especially when the MAF is close to 0.5. In the new version, we report the REF and ALT alleles and the ALT allele frequency along with the MAF; by comparing the MAF with the ALT allele frequency, one can easily determine the minor allele in the imputed data.

For users with the earlier version of the info file, there is NO need to re-run the imputation. The imputed data are the same; only the info files have slightly different formats. If you need to determine the minor allele in your imputed data, you just need to calculate the MAF yourself from the imputed data. Please contact Sayantan Das with any queries. We, the authors of Minimac3, apologize for any inconvenience caused by these updates.

Introduction

Minimac3 is a lower-memory and more computationally efficient implementation of minimac2. It is a genotype imputation algorithm that works on phased genotypes (say, from MaCH) and is designed to handle very large reference panels with no loss of accuracy.

This wiki page gives users a detailed explanation of the info file produced by Minimac3.

Info File Descriptors

The available column descriptors for typical Minimac3 output are as follows:

SNP

The SNP identifier for the variant. This is usually in the form chr:position, but it can be the rsid of the variant if the user selected --rsid during the Minimac3 run (provided the input reference panel has the rsid in the INFO column).

REF, ALT

These are the reference and alternate alleles for the variant as imported from the reference panel file (either VCF or M3VCF).

Major, Minor

These are the major and minor alleles for the variant based on the reference panel allele frequency (and NOT the imputed dosage frequency).

DoseMAF

This is the minor allele frequency of the variant in the imputed dosage data (and NOT in the reference panel). NOTE: The DoseMAF does NOT always correspond to the frequency of the Minor allele listed above, since that allele is defined by the reference panel allele frequency, and sometimes the minor allele in the imputed data differs from the minor allele in the reference panel.
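As a rough illustration of how the minor allele in the imputed data could be recovered independently of the info file, the sketch below computes the ALT allele frequency from a list of dosages. It assumes diploid dosages in the range 0 to 2 that count copies of the ALT allele; the function names and the example values are hypothetical and are not part of Minimac3.

```python
def alt_allele_frequency(dosages):
    """ALT allele frequency, assuming diploid dosages in [0, 2] that count ALT copies."""
    return sum(dosages) / (2 * len(dosages))


def imputed_minor_allele(ref, alt, dosages):
    """Return (minor_allele, maf) as seen in the imputed data."""
    alt_af = alt_allele_frequency(dosages)
    # Treating a frequency of exactly 0.5 as "ALT is minor" is an arbitrary choice.
    if alt_af <= 0.5:
        return alt, alt_af
    return ref, 1.0 - alt_af


# Made-up dosages for five samples at one variant with REF=A, ALT=G.
dosages = [1.9, 2.0, 1.2, 0.9, 2.0]
minor, maf = imputed_minor_allele("A", "G", dosages)
print(minor, round(maf, 3))
```

In this made-up example the ALT allele is the common one, so the minor allele in the imputed data is REF even if the info file lists a different Minor allele.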

RefAF

This is the alternate allele frequency of the variant in the reference panel (NOT the minor allele frequency). If the RefAF is less than 0.5, then the ALT and Minor alleles are the same; if it is greater than 0.5, then ALT is the Major allele and REF is the Minor allele.
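The rule above can be written as a small helper. This is only a sketch: the function name is hypothetical, and the handling of a RefAF of exactly 0.5 (which the description does not cover) is an arbitrary assumption.

```python
def reference_minor_major(ref, alt, ref_af):
    """Return (minor, major) in the reference panel, where ref_af is the ALT frequency."""
    # A RefAF of exactly 0.5 is not covered by the description; ALT is treated as minor here.
    if ref_af <= 0.5:
        return alt, ref
    return ref, alt


print(reference_minor_major("A", "G", 0.12))
print(reference_minor_major("A", "G", 0.87))
```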

AvgCall

-

Rsq

This is the estimated value of the squared correlation between imputed genotypes and true, unobserved genotypes. Since true genotypes are not available, this calculation is based on the idea that poorly imputed genotype counts will shrink towards their expectations based on population allele frequencies alone, specifically 2p, where p is the frequency of the allele being imputed.

Currently, Minimac3 uses the following definition:

\hat{r}^2 = {{Var(\mbox{Estimated Counts})}\over{\hat{p}(1-\hat{p})}}
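A minimal sketch of this ratio is given below. It assumes the "estimated counts" are per-haplotype estimated allele counts in the range 0 to 1, so that a perfectly imputed variant has variance equal to p(1-p); this is an assumption about the inputs, not a description of the Minimac3 source code. The function name and the choice to return 0 for monomorphic variants are hypothetical.

```python
def estimated_rsq(hap_counts):
    """Var(estimated counts) / (p_hat * (1 - p_hat)) for per-haplotype counts in [0, 1]."""
    n = len(hap_counts)
    p_hat = sum(hap_counts) / n
    if p_hat <= 0.0 or p_hat >= 1.0:
        return 0.0  # monomorphic in the imputed data; returning 0 is an arbitrary choice
    variance = sum((c - p_hat) ** 2 for c in hap_counts) / n
    return variance / (p_hat * (1.0 - p_hat))


# Confidently imputed haplotypes: counts sit at 0 or 1, so the ratio is 1.
print(estimated_rsq([0.0, 1.0, 0.0, 1.0]))
# Uninformative imputation: every count shrinks to the allele frequency, so the ratio is 0.
print(estimated_rsq([0.5, 0.5, 0.5, 0.5]))
```

The two calls illustrate the shrinkage argument: the less the estimated counts vary around the allele frequency, the lower the estimated Rsq.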

Genotyped

This column is an indicator of whether the variant was "Genotyped", "Imputed" or "Genotyped_Only".

LooRsq

This statistic can only be provided for genotyped sites. It is similar to the estimated Rsq above, but the imputed dosages used in the calculation are obtained by hiding all known genotypes for the given SNP (these dosages are called LooDosage).

EmpR, EmpRsq

While the LooRsq statistic completely ignores experimental genotypes, EmpR is the correlation between the true genotyped values and the imputed dosages that were calculated by hiding all known genotypes for the given SNP (called LooDosage). A negative correlation between imputed and experimental genotypes can indicate allele flips. This statistic can also only be provided for genotyped sites. EmpRsq is the square of this correlation.
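The sketch below shows the kind of calculation described here, using a plain Pearson correlation between experimental genotypes and leave-one-out dosages. The values are made up, the function name is hypothetical, and the code only illustrates the idea rather than reproducing how Minimac3 computes EmpR internally.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # correlation is undefined without variation
    return sxy / math.sqrt(sxx * syy)


genotypes = [0, 1, 2, 1, 0]              # made-up experimental genotypes
loo_dosages = [0.1, 0.9, 1.8, 1.2, 0.2]  # made-up LooDosage values

emp_r = pearson_r(genotypes, loo_dosages)
emp_rsq = emp_r ** 2

# Simulate an allele flip by counting the other allele in the dosages.
flipped_r = pearson_r(genotypes, [2 - d for d in loo_dosages])
print(emp_r > 0, flipped_r < 0)
```

The flipped dosages give a correlation of the same magnitude but opposite sign, which is why a negative EmpR is a useful hint that alleles may be flipped.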

Dose1

-

Dose2

-
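The columns described in this section can be loaded into simple records for downstream filtering. The sketch below assumes a whitespace-delimited file with a single header line; the sample content, the subset of columns shown, the function name and the Rsq cutoff of 0.3 are all hypothetical and chosen only for illustration.

```python
import io


def read_info(handle):
    """Read a whitespace-delimited info file with one header line into a list of dicts."""
    header = handle.readline().split()
    rows = []
    for line in handle:
        fields = line.split()
        if fields:  # skip blank lines
            rows.append(dict(zip(header, fields)))
    return rows


# Made-up content using a subset of the column names described on this page.
sample = io.StringIO(
    "SNP\tREF\tALT\tRsq\tGenotyped\n"
    "1:1000\tA\tG\t0.95\tImputed\n"
    "1:2000\tC\tT\t0.20\tImputed\n"
    "1:3000\tG\tA\t0.99\tGenotyped\n"
)

rows = read_info(sample)
# Keep variants whose estimated Rsq passes an illustrative cutoff.
well_imputed = [row["SNP"] for row in rows if float(row["Rsq"]) >= 0.3]
print(well_imputed)
```

For a real file, `read_info` would be called on an opened file handle instead of the in-memory sample.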

Download

Minimac3 is available as an undocumented release version. The source files (and binary executable) are available for download in Source Files, and commonly used reference panels in VCF and M3VCF formats are available for download in Reference Panels.

Useful Wiki Pages

There are a few pages in this Wiki that may be useful for Minimac3 users. Here are links to a few:

Contact

For any queries or bug reports, please contact Sayantan Das.