Minimac3 Imputation Cookbook
Introduction
Minimac3 is a lower-memory and more computationally efficient implementation of minimac2. It is a genotype imputation algorithm that works on phased genotypes and is designed to handle very large reference panels with no loss of accuracy.
This wiki page gives users a detailed step-by-step description of how to run a typical GWAS imputation experiment.
Imputation Cookbook
This section briefly summarizes the steps required to run imputation on typical GWAS samples.
Preliminary Data Quality Control
Before pre-phasing and imputation, users must ensure that their data have been quality controlled. Standard marker-level filters exclude markers with a high missingness rate, large deviations from Hardy-Weinberg equilibrium, high discordance rates (if duplicate copies are available), excess Mendelian inconsistencies, etc. Standard sample-level filters remove samples with a high missingness rate, unusual heterozygosity, a high inbreeding coefficient, clear evidence of being genetic ancestry outliers, evidence of relatedness, etc. All of these steps can be carried out using PLINK. With older genotyping platforms, low-frequency SNPs are also often excluded because they are hard to genotype accurately. With more modern genotyping arrays, the accuracy of genotype calls for low-frequency SNPs is less of a concern.
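As a rough illustration of the missingness and Hardy-Weinberg filters described above, the following sketch builds a PLINK 1.9 command line. The file prefixes `study` and `study.qc` and all thresholds are placeholder assumptions, not recommendations from this page; suitable values depend on the genotyping platform and the study design.

```shell
#!/bin/sh
# Hypothetical input prefix (study.bed/.bim/.fam) and illustrative thresholds.
IN=study
OUT=study.qc
GENO=0.05    # exclude markers with more than 5% missing calls
MIND=0.05    # remove samples with more than 5% missing calls
HWE=1e-6     # exclude markers with a Hardy-Weinberg p-value below 1e-6
QC_CMD="plink --bfile $IN --geno $GENO --mind $MIND --hwe $HWE --make-bed --out $OUT"

if command -v plink >/dev/null 2>&1 && [ -f "$IN.bed" ]; then
    $QC_CMD
else
    # Dry run: only print the command when PLINK or the input files are absent.
    echo "would run: $QC_CMD"
fi
```

Heterozygosity, relatedness and Mendelian inconsistencies are not covered by this command; PLINK's `--het`, `--genome` and `--mendel` reports are one possible starting point for those checks.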
Pre-Phasing the GWAS data
Once a quality-controlled dataset is available, the data must be pre-phased before imputation. Pre-phasing can be done using either MaCH or SHAPEIT, the most commonly used tools. See our wiki page on Pre-Phasing for further details on pre-phasing GWAS of different sample sizes.
Convert Files to VCF
After pre-phasing, we can begin the imputation. Before that, however, the phased GWAS panel files (obtained above) must be converted to VCF format, since Minimac3 can only use VCF format files. If pre-phased data is already available in VCF format, users can skip this step. Otherwise, see our wiki page on Converting to VCF for further details and tools for converting files to VCF.
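Because Minimac3 works on phased genotypes, it can be worth confirming that a converted VCF is actually phased before running imputation. In VCF, phased genotypes use `|` between alleles and unphased genotypes use `/`. The sketch below counts unphased calls in a toy file; the file name `toy.phased.vcf` and its contents are made up for illustration.

```shell
#!/bin/sh
# Toy phased VCF so the check is runnable; replace with your own converted file.
VCF=toy.phased.vcf
printf '##fileformat=VCFv4.1\n' > "$VCF"
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\n' >> "$VCF"
printf '20\t1000\trs1\tA\tG\t.\tPASS\t.\tGT\t0|1\t1|1\n' >> "$VCF"
printf '20\t2000\trs2\tC\tT\t.\tPASS\t.\tGT\t0|0\t1|0\n' >> "$VCF"

# Count sample fields (column 10 onward) whose GT entry contains "/" (unphased).
# Assumes GT is the first FORMAT entry, as the VCF specification requires when GT is present.
UNPHASED=$(awk -F'\t' '
    !/^#/ { for (i = 10; i <= NF; i++) { split($i, f, ":"); if (f[1] ~ /\//) n++ } }
    END   { print n + 0 }
' "$VCF")
echo "unphased genotype calls: $UNPHASED"
```

A count of zero suggests the file is fully phased. For a gzip-compressed VCF, the same awk program can read from `gzip -dc file.vcf.gz` instead of a plain file.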
Chromosome X Imputation
Chromosome X has a pseudo-autosomal region (PAR) which can be imputed for males and females together. Imputing the PAR on chromosome X is the same as usual imputation, since both males and females are diploid at these sites. However, the non-pseudo-autosomal region (non-PAR) needs to be imputed for males and females separately, as males are haploid while females are diploid. The PAR and non-PAR regions also need to be imputed separately from each other.
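One way to separate the PAR from the non-PAR is to split a chromosome X VCF by position. The sketch below assumes GRCh37/hg19 coordinates, where PAR1 ends at X:2,699,520 and PAR2 starts at X:154,931,044; other genome builds use different boundaries. All file names are hypothetical, and the toy input exists only to make the snippet runnable.

```shell
#!/bin/sh
# Assumed GRCh37/hg19 PAR boundaries on chromosome X; adjust for other builds.
PAR1_END=2699520
PAR2_START=154931044

# Toy chromosome X VCF; replace with your own file.
IN=toy.chrX.vcf
printf '##fileformat=VCFv4.1\n' > "$IN"
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\n' >> "$IN"
printf 'X\t100000\trs1\tA\tG\t.\tPASS\t.\tGT\t0|1\n' >> "$IN"
printf 'X\t5000000\trs2\tC\tT\t.\tPASS\t.\tGT\t1|0\n' >> "$IN"
printf 'X\t155000000\trs3\tG\tA\t.\tPASS\t.\tGT\t0|0\n' >> "$IN"

# Header lines are copied to both outputs; records are routed by position.
awk -F'\t' -v p1="$PAR1_END" -v p2="$PAR2_START" '
    /^#/                   { print > "toy.chrX.PAR.vcf"; print > "toy.chrX.nonPAR.vcf"; next }
    ($2 <= p1 || $2 >= p2) { print > "toy.chrX.PAR.vcf"; next }
                           { print > "toy.chrX.nonPAR.vcf" }
' "$IN"

PAR_N=$(grep -vc '^#' toy.chrX.PAR.vcf)
NONPAR_N=$(grep -vc '^#' toy.chrX.nonPAR.vcf)
echo "PAR records: $PAR_N, non-PAR records: $NONPAR_N"
```

The same split would need to be applied to the reference panel, so that each run sees only PAR or only non-PAR markers in both inputs.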
The following example illustrates imputation on the non-PAR of chromosome X for males and females separately (files available in the Minimac3/test/ directory):
Male Samples (Non-PAR)
../bin/Minimac3 --refHaps refPanelChrX.Non.Auto.vcf --haps targetStudyChrX.males.vcf --prefix testRun
Female Samples (Non-PAR)
../bin/Minimac3 --refHaps refPanelChrX.Non.Auto.vcf --haps targetStudyChrX.females.vcf --prefix testRun
NOTE: When imputing the non-PAR of chromosome X, users must analyze male and female samples separately; otherwise the program will crash. Users should also ensure that the reference panel contains only the PAR or only the non-PAR of chromosome X; otherwise the program will crash.
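The separate male and female inputs used in the non-PAR commands above have to be prepared beforehand. The sketch below shows one possible way to derive per-sex sample lists from a PLINK-style .fam file (sex code 1 = male, 2 = female) and then run the two non-PAR imputations in a loop. The names `toy.fam`, `males.txt`, `females.txt` and the per-sex output prefixes are assumptions for illustration; distinct prefixes are used here so that the two runs do not write to the same output names.

```shell
#!/bin/sh
# Toy .fam file (columns: FID IID PID MID SEX PHENO); replace with your own.
FAM=toy.fam
printf 'F1 S1 0 0 1 -9\nF2 S2 0 0 2 -9\nF3 S3 0 0 1 -9\n' > "$FAM"

# One sample ID per line, split by recorded sex.
awk '$5 == 1 { print $2 }' "$FAM" > males.txt
awk '$5 == 2 { print $2 }' "$FAM" > females.txt

# The lists could then be used to subset a combined phased VCF, for example with
# bcftools (an assumption, not part of this page):
#   bcftools view -S males.txt -o targetStudyChrX.males.vcf targetStudyChrX.vcf

# Run the non-PAR imputation once per sex, using only the options shown above.
for SEX in males females; do
    CMD="../bin/Minimac3 --refHaps refPanelChrX.Non.Auto.vcf --haps targetStudyChrX.$SEX.vcf --prefix testRun.$SEX"
    if [ -x ../bin/Minimac3 ] && [ -f "targetStudyChrX.$SEX.vcf" ]; then
        $CMD
    else
        # Dry run when the binary or input file is not present.
        echo "would run: $CMD"
    fi
done
```

Sample IDs in a converted VCF may not match the IID column exactly (some converters join family and individual IDs), so the lists should be checked against the VCF header before subsetting.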
Download
Minimac3 is available as an undocumented release version. The source files (and binary executable) are available for download in Source Files, and commonly used reference panels in VCF and M3VCF formats are available for download in Reference Panels.
Useful Wiki Pages
There are a few pages in this Wiki that may be useful for Minimac3 users. Here are links to a few:
- Minimac3 Imputation Cookbook (Recommended for New Users!!)
Contact
For any queries or bug reports, please contact Sayantan Das.