Minimac3 Imputation Cookbook

From Genome Analysis Wiki

Introduction

Minimac3 is a lower-memory, more computationally efficient implementation of minimac2. It is an algorithm for genotype imputation that works on phased genotypes and is designed to handle very large reference panels with no loss of accuracy.

This wiki page gives users a detailed step-by-step description of how to run a typical GWAS imputation experiment.

Imputation Cookbook

This section gives a brief summary of the steps required to run imputation on typical GWAS samples.

Preliminary Data Quality Control

Before pre-phasing and imputation, users must ensure that their data have been quality controlled. Standard marker-level filters include excluding markers with a high missingness rate, strong deviation from Hardy-Weinberg equilibrium, high discordance rates (if duplicate copies are available), or excess Mendelian inconsistencies. Standard sample-level filters include removing samples with a high missingness rate, unusual heterozygosity, a high inbreeding coefficient, clear evidence of being genetic ancestry outliers, or evidence of relatedness. All of these steps can be carried out using PLINK. With older genotyping platforms, low-frequency SNPs are also often excluded because they are hard to genotype accurately. With more modern genotyping arrays, the accuracy of genotype calls for low-frequency SNPs is less of a concern.
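
As a rough illustration of the marker and sample filters described above, the sketch below prints a single PLINK command that applies missingness and Hardy-Weinberg filters. The file prefixes (gwas_raw, gwas_qc) and the thresholds (0.05, 0.05, 1e-6) are assumptions chosen for the example, not recommendations from this page. The command is printed rather than executed so that it can be reviewed first.

```shell
#!/bin/sh
# Hypothetical file prefixes; replace with your own PLINK binary fileset.
IN=${IN:-gwas_raw}     # expects gwas_raw.bed / .bim / .fam
OUT=${OUT:-gwas_qc}

# Print a PLINK command that applies basic QC filters:
#   --geno 0.05  drop markers with more than 5% missing genotypes
#   --mind 0.05  drop samples with more than 5% missing genotypes
#   --hwe 1e-6   drop markers with Hardy-Weinberg test p-value below 1e-6
# All three thresholds are illustrative assumptions.
qc_command() {
    printf 'plink --bfile %s --geno 0.05 --mind 0.05 --hwe 1e-6 --make-bed --out %s\n' "$1" "$2"
}

qc_command "$IN" "$OUT"
```

Piping the printed line to sh runs it. Heterozygosity, relatedness, and ancestry checks are separate analyses and are not covered by this one command.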

Pre-Phasing the GWAS data

Once a quality-controlled dataset is available, we pre-phase the data and then impute. Pre-phasing can be done using MaCH or SHAPEIT, the most commonly used tools. See our wiki page on Pre-Phasing for further details on pre-phasing GWAS of different sample sizes.

Convert Files to VCF

After pre-phasing, we can begin imputation. First, however, we need to convert the phased GWAS panel files (obtained above) to VCF format, since Minimac3 only accepts the GWAS panel in VCF format. If pre-phased data are already available in VCF format, users can skip this step. Otherwise, see our wiki page on Converting to VCF for further details and tools for converting files to VCF.
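
Before imputing, it can be worth confirming that the converted VCF really contains phased genotypes. In VCF, phased genotypes separate alleles with "|" and unphased genotypes use "/". The sketch below is a hypothetical helper (count_unphased is not part of Minimac3) that counts unphased genotype entries in a VCF read from standard input. It assumes an uncompressed, tab-delimited VCF with GT as the first FORMAT subfield.

```shell
#!/bin/sh
# Count genotype entries that use "/" (unphased) in a VCF read from stdin.
# Assumes GT is the first FORMAT subfield of each sample column.
count_unphased() {
    awk -F'\t' '
        /^#/ { next }                      # skip header lines
        {
            for (i = 10; i <= NF; i++) {   # sample columns start at column 10
                split($i, f, ":")          # GT is the first subfield
                if (f[1] ~ /\//) n++       # "/" marks an unphased genotype
            }
        }
        END { print n + 0 }
    '
}

# Tiny inline demo: one phased (0|1) and one unphased (0/1) genotype.
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\n20\t100\t.\tA\tG\t.\tPASS\t.\tGT\t0|1\t0/1\n' | count_unphased   # prints 1
```

On real data the helper would be called as count_unphased < Gwas.Chr20.Phased.Output.VCF.format.vcf, and a result of 0 suggests every genotype is phased. Missing genotypes written as "./." are counted as unphased.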

Download Reference Panel

Commonly used reference panels include 1000 Genomes Phase 3 (2,535 samples), 1000 Genomes Phase 1 (1,094 samples), HapMap2 (269 samples), and the Haplotype Reference Consortium (32,914 samples). Users are advised to use either 1000 Genomes Phase 3 or the Haplotype Reference Consortium panel. The latter cannot be shared publicly because of data privacy issues, but it can be used for imputation remotely through the imputation server set up at the University of Michigan. Reference panels for different versions of 1000 Genomes, in both VCF and M3VCF format, are available for download in Reference Panels.

Impute Samples

The final step is to run Minimac3 to perform the imputation analysis. Now that we have the pre-phased GWAS panel (in VCF format) and the appropriate reference panel (in VCF or M3VCF format), we can run Minimac3 as follows. The first example uses a VCF file as the reference (obtained as explained above), and the second uses an M3VCF file (which might have been downloaded from Reference Panels or created on a previous run of Minimac3).

# Example 1: reference panel in VCF format
../bin/Minimac3 --refHaps ReferencePanel.Chr20.1000Genomes.vcf \
                --haps Gwas.Chr20.Phased.Output.VCF.format.vcf \
                --prefix Gwas.Chr20.Imputed.Output

# Example 2: reference panel in M3VCF format
../bin/Minimac3 --refHaps ReferencePanel.Chr20.1000Genomes.m3vcf \
                --haps Gwas.Chr20.Phased.Output.VCF.format.vcf \
                --prefix Gwas.Chr20.Imputed.Output
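
The commands above impute a single chromosome. A genome-wide run usually repeats the same command once per autosome. The sketch below prints one Minimac3 command per chromosome, using only the three options shown above. The per-chromosome file-name pattern is an assumption extrapolated from the chromosome 20 example, so adjust it to match your own files.

```shell
#!/bin/sh
# Print the Minimac3 command for one chromosome, using only the
# --refHaps, --haps and --prefix options from the examples above.
# The file-name pattern is hypothetical, modelled on the Chr20 example.
minimac3_command() {
    printf '../bin/Minimac3 --refHaps ReferencePanel.Chr%s.1000Genomes.m3vcf --haps Gwas.Chr%s.Phased.Output.VCF.format.vcf --prefix Gwas.Chr%s.Imputed.Output\n' "$1" "$1" "$1"
}

# Emit one command per autosome (chromosomes 1 to 22).
c=1
while [ "$c" -le 22 ]; do
    minimac3_command "$c"
    c=$((c + 1))
done
```

The printed lines can be reviewed, piped to sh to run one after another, or submitted as separate jobs to a cluster scheduler, since each chromosome is imputed independently.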

Chromosome X Imputation

Chromosome X has a pseudo-autosomal region (PAR) which can be imputed for males and females together. Imputing the PAR on chromosome X is the same as usual imputation, since both males and females are diploid at these sites. However, the non-pseudo-autosomal region needs to be imputed for males and females separately, as males are haploid while females are diploid. The PAR and non-PAR regions therefore also need to be imputed separately from each other. See our wiki page on Chromosome X Imputation for details on imputing chromosome X.
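
To split chromosome X markers into PAR and non-PAR sets, each position has to be compared against the PAR boundaries. The sketch below is a hypothetical helper (chrx_region is not part of Minimac3) that classifies a position. It assumes GRCh37/hg19 coordinates, where PAR1 spans X:60,001-2,699,520 and PAR2 spans X:154,931,044-155,260,560. This page does not state a genome build, so check which build your data use, because other builds have different boundaries.

```shell
#!/bin/sh
# Classify a chromosome X base-pair position as PAR1, PAR2 or nonPAR.
# Boundaries are the GRCh37/hg19 definitions (an assumption about the build).
chrx_region() {
    if [ "$1" -ge 60001 ] && [ "$1" -le 2699520 ]; then
        echo PAR1
    elif [ "$1" -ge 154931044 ] && [ "$1" -le 155260560 ]; then
        echo PAR2
    else
        echo nonPAR
    fi
}

chrx_region 1000000      # prints PAR1
chrx_region 50000000     # prints nonPAR
```

Markers labelled PAR1 or PAR2 can be imputed for all samples together, while nonPAR markers go into the separate male and female runs.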

Download

Minimac3 is currently available as a pre-release. The source files (and binary executable) are available for download in Source Files, and commonly used reference panels in VCF and M3VCF formats are available for download in Reference Panels.

Useful Wiki Pages

There are a few pages in this wiki that may be useful for Minimac3 users. Here are links to a few:

  • Minimac3 Imputation Cookbook (Recommended for New Users!!)

Contact

For any queries or bug reports, please contact Sayantan Das.