Minimac3 Info File
For versions earlier than 0.1.13 (downloaded before Oct 15, 2015, or obtained from the Imputation Server), please see the older version of the Minimac3 Info page.
Minimac3 is a lower-memory, more computationally efficient implementation of minimac2. It is an algorithm for genotype imputation that works on phased genotypes (for example, from MaCH) and is designed to handle very large reference panels with no loss of accuracy.
This wiki page gives users a detailed explanation of the info file produced by Minimac3.
Info File Descriptors
The available column descriptors for typical Minimac3 output are as follows.
The SNP identifier for the variant. This is usually in the form chr:position, but it can be the rsid of the variant if the user selected --rsid during the Minimac3 run (provided the input reference panel has the rsid in the INFO column).
These are the reference and alternate alleles for the variant as imported from the reference panel file (either VCF or M3VCF). The dosage value (see Dosage) in the .dose file is the alternate allele dosage and NOT the major allele dosage as in earlier versions of minimac. Specifically, the dosage is the expected number of alternate alleles, P(REF,ALT) + 2*P(ALT,ALT).
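As a small illustration of the expression above, the following Python sketch computes the alternate allele dosage from three genotype probabilities. The function name `alt_dosage` and the example probabilities are hypothetical and are not part of Minimac3.

```python
def alt_dosage(p_ref_ref, p_ref_alt, p_alt_alt):
    """Expected number of ALT alleles given the three genotype probabilities.

    p_ref_alt is the total heterozygous probability, P(REF,ALT).
    """
    total = p_ref_ref + p_ref_alt + p_alt_alt
    if abs(total - 1.0) > 1e-6:
        raise ValueError("genotype probabilities must sum to 1")
    return p_ref_alt + 2.0 * p_alt_alt


# Hypothetical sample that is most likely homozygous for ALT.
print(round(alt_dosage(0.01, 0.18, 0.81), 2))  # 1.8
```

The dosage always lies between 0 (certainly REF/REF) and 2 (certainly ALT/ALT).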
This is the allele frequency of the alternate (ALT) allele in the imputed dosage data (see Dosage).
This is the minor allele frequency of the variant in the imputed dosage data. Comparing MAF to ALT_Frq identifies the minor allele: if the two values are equal, the alternate allele is the minor allele; otherwise the reference allele is.
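The relationship between the two frequency columns can be sketched as follows. This assumes diploid samples and that the alternate allele frequency is the mean genotype dosage divided by 2; the function names and the dosage values are hypothetical.

```python
def alt_freq_from_dosages(dosages):
    """Mean ALT genotype dosage divided by 2 (diploid samples assumed)."""
    return sum(dosages) / (2.0 * len(dosages))


def maf_and_minor_allele(alt_frq, ref, alt):
    """Return (MAF, minor allele). A tie at 0.5 is reported as ALT in this sketch."""
    if alt_frq <= 0.5:
        return alt_frq, alt
    return 1.0 - alt_frq, ref


dosages = [1.95, 1.80, 2.00, 1.10, 1.75]  # hypothetical genotype dosages
alt_frq = alt_freq_from_dosages(dosages)
maf, minor = maf_and_minor_allele(alt_frq, ref="A", alt="G")
print(round(alt_frq, 2), round(maf, 2), minor)  # 0.86 0.14 A
```

Here the alternate allele is common, so MAF differs from the alternate allele frequency and the reference allele is the minor allele.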
This is the allele frequency of the alternate (ALT) allele computed from hard-call genotypes (see Hard Genotype), that is, using the allele with the maximum posterior probability instead of the probability of the alternate allele.
This is the estimated value of the squared correlation between imputed genotypes and true, unobserved genotypes. Since true genotypes are not available, this calculation is based on the idea that poorly imputed genotype counts will shrink towards their expectations based on population allele frequencies alone; specifically, the observed variance of the imputed dosages is compared with the variance expected from $p$ alone, where $p$ is the frequency of the allele being imputed.
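Currently, Minimac3 uses the following definition (where $\hat{p}$ is the alternate allele frequency, $D_i$ is the imputed alternate allele probability at the $i$-th haplotype (see Dosage), and $n$ is the number of GWAS samples):
$$\hat{r}^2 = \frac{\frac{1}{2n}\sum_{i=1}^{2n}\left(D_i - \hat{p}\right)^2}{\hat{p}\left(1 - \hat{p}\right)}$$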
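A minimal sketch of this statistic is shown below. It assumes the variance-ratio form, that is, the variance of the haplotype-level alternate allele probabilities divided by the variance expected from the allele frequency alone. The function name `estimated_rsq` and the handling of monomorphic sites are assumptions made for illustration and are not taken from the Minimac3 source.

```python
def estimated_rsq(hap_dosages):
    """Variance-ratio estimate of imputation quality.

    hap_dosages: imputed ALT allele probability for each of the 2n haplotypes.
    """
    m = len(hap_dosages)
    p_hat = sum(hap_dosages) / m  # estimated ALT allele frequency
    if p_hat <= 0.0 or p_hat >= 1.0:
        return 0.0  # ratio undefined at monomorphic sites; this sketch returns 0
    observed_var = sum((d - p_hat) ** 2 for d in hap_dosages) / m
    return observed_var / (p_hat * (1.0 - p_hat))


# Confident imputation: every probability is 0 or 1, so the ratio is 1.
print(estimated_rsq([1.0, 0.0, 1.0, 0.0]))  # 1.0
# Uninformative imputation: every probability equals the frequency, so the ratio is 0.
print(estimated_rsq([0.5, 0.5, 0.5, 0.5]))  # 0.0
```

The two extremes show the shrinkage idea: the less informative the imputation, the closer each haplotype probability sits to the allele frequency and the smaller the ratio becomes.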
This column is an indicator of whether the variant was "Imputed" or "Genotyped".
This statistic can only be provided for genotyped sites. It is similar to the estimated Rsq above, but the imputed dosages used for the comparison are calculated by hiding all known genotypes for the given SNP (see LooDosage).
While the LooRsq statistic completely ignores experimental genotypes, EmpR is the correlation between the true genotyped values and the imputed dosages calculated by hiding all known genotypes for the given SNP (see LooDosage). A negative correlation between imputed and experimental genotypes can indicate allele flips. This statistic can also only be provided for genotyped sites. EmpRsq is the square of this correlation.
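The following sketch illustrates EmpR and EmpRsq as a Pearson correlation between known genotypes and leave-one-out dosages, and shows how swapped alleles produce a negative EmpR. The helper `pearson` and all data values are hypothetical. Working at the genotype level (values from 0 to 2) is an assumption made here.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # correlation undefined when one side is constant
    return sxy / math.sqrt(sxx * syy)


true_genotypes = [2, 0, 1, 1, 0, 2]             # known ALT allele counts
loo_dosages = [1.9, 0.1, 1.2, 0.8, 0.2, 1.7]    # dosages imputed with the site hidden

emp_r = pearson(true_genotypes, loo_dosages)
emp_rsq = emp_r ** 2

# If REF and ALT were swapped in the study data, the sign of EmpR reverses.
flipped_genotypes = [2 - g for g in true_genotypes]
emp_r_flipped = pearson(flipped_genotypes, loo_dosages)
print(emp_r > 0, emp_r_flipped < 0)  # True True
```

Because EmpRsq discards the sign, a flipped site can still show a high EmpRsq, so EmpR is the column to inspect when looking for allele flips.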
Average LooDosage at haplotypes with the alternate allele at this site. A value of 0.97 denotes that, out of all the haplotypes with an alternate allele at this site, 97% would be imputed accurately to the alternate allele if this site were assumed to be not genotyped. The closer the value is to 1.0, the more accurately the site has been imputed.
One minus the average LooDosage at haplotypes with the reference allele at this site. A value of 0.03 denotes that, out of all the haplotypes with a reference allele at this site, 3% would be imputed inaccurately to the alternate allele if this site were assumed to be not genotyped. The closer the value is to 0.0, the more accurately the site has been imputed.
Minimac3 estimates the imputed dosage at the haplotype level by finding the posterior probability of the alternate allele at that site. The genotype dosage is then evaluated as the sum of the dosages of the two haplotypes. For example, if the estimated posterior probability of the alternate allele is 0.98 and 0.97 in the two haplotypes, the genotype dosage is output as 0.98 + 0.97 = 1.95.
Minimac3 uses a maximum likelihood estimator for hard-call genotypes. For each haplotype, the allele with the maximum posterior probability is assigned, and the final genotype call is obtained from the hard-call haplotypes. For example, if the estimated posterior probability of the alternate allele is 0.56 and 0.60 in the two haplotypes, then the alternate allele is assigned to each haplotype and the final hard-call genotype is output as 1|1. Note that the hard-call genotype is NOT the MLE from the estimated genotype probabilities but from the estimated haplotype probabilities. In this example, the heterozygous genotype has the highest posterior probability (0.56 × 0.40 + 0.44 × 0.60 = 0.488, against 0.336 for 1|1), but the output hard genotype is 1|1 rather than heterozygous, because each haplotype individually has more than 50% probability of carrying the alternate allele. Such cases only arise when a site is not imputed well.
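The example above can be checked numerically. The sketch below makes the hard call per haplotype and, separately, computes unphased genotype probabilities under the assumption that the two haplotypes are independent. The function names and the tie-breaking rule at exactly 0.5 are hypothetical and not taken from Minimac3.

```python
def hard_call(p_alt_hap1, p_alt_hap2):
    """Phased hard call: each haplotype gets its most probable allele.

    A probability of exactly 0.5 is called as reference in this sketch.
    """
    a1 = 1 if p_alt_hap1 > 0.5 else 0
    a2 = 1 if p_alt_hap2 > 0.5 else 0
    return f"{a1}|{a2}"


def genotype_probs(p_alt_hap1, p_alt_hap2):
    """Unphased genotype probabilities, assuming independent haplotypes."""
    return {
        "hom_ref": (1 - p_alt_hap1) * (1 - p_alt_hap2),
        "het": p_alt_hap1 * (1 - p_alt_hap2) + (1 - p_alt_hap1) * p_alt_hap2,
        "hom_alt": p_alt_hap1 * p_alt_hap2,
    }


probs = genotype_probs(0.56, 0.60)
print(hard_call(0.56, 0.60))        # 1|1
print(max(probs, key=probs.get))    # het
print(round(probs["het"], 3))       # 0.488
```

The haplotype-level call and the most probable genotype disagree only when both haplotype probabilities are close to 0.5, which is why the discrepancy is confined to poorly imputed sites.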
Minimac3 uses an ad hoc method to estimate imputation accuracy at sites that were genotyped in the study sample. For each such genotyped site, Minimac3 hides all known genotypes for that site and calculates an imputed dosage (in addition to the usual alternate allele dosage calculated assuming the genotypes are known at the site). This special imputed value is called the Leave-One-Out dosage (LooDosage) and is only available for genotyped sites. LooDosage is used to calculate the empirical R-square statistics (EmpR, EmpRsq) by directly computing the Pearson correlation coefficient between LooDosage and the known genotypes. It is also used to estimate LooRsq.
Minimac3 is available as an undocumented release version. The source files (and binary executable) are available for download in Source Files, and commonly used reference panels in VCF and M3VCF formats are available for download in Reference Panels.
Useful Wiki Pages
There are a few pages in this Wiki that may be useful for Minimac3 users. Here are links to a few:
- Minimac3 Imputation Cookbook (Recommended for New Users!!)
For questions or bug reports, please contact Sayantan Das.