Minimac3 Cookbook: Chromosome X Imputation
Minimac3 is a lower-memory, more computationally efficient implementation of minimac2. It is a genotype imputation algorithm that works on phased genotypes and is designed to handle very large reference panels efficiently with no loss of accuracy.
This wiki page gives users a detailed, step-by-step description of imputing chromosome X.
Chromosome X Imputation
Chromosome X has a pseudo-autosomal region (PAR) that can be imputed for males and females together. Imputing the PAR is the same as usual imputation, since both males and females are diploid at these sites. However, the non-pseudo-autosomal region (non-PAR) must be imputed separately for males and females, because males are haploid there while females are diploid. The PAR and non-PAR regions must also be imputed separately from each other. The steps involved in imputing chromosome X are as follows.
- Convert files to VCF Format: Start by converting the unphased, quality-controlled data set into VCF format. See our wiki page on Converting to VCF for details on how to convert.
- Split the data into PAR and non-PAR: Separate the pseudo-autosomal and non-pseudo-autosomal parts into separate files. On build hg19, PAR1 ends at chrX:2699520 and PAR2 begins at chrX:154931044, so the non-PAR is located on chrX:2699521-154931043. For VCF files, the split can be done with vcftools as follows:

# Non-PAR sites of chromosome X
vcftools --gzvcf gwas.data.vcf.gz \
    --chr X \
    --from-bp 2699521 \
    --to-bp 154931043 \
    --recode \
    --out Non.PAR.gwas.data

# All remaining sites (PAR1 and PAR2)
vcftools --gzvcf gwas.data.vcf.gz \
    --exclude-positions Non.PAR.gwas.data.recode.vcf \
    --recode \
    --out PAR.gwas.data
NOTE: After this step, please verify that the male samples have only one haplotype (haploid calls such as 0 or 1) in Non.PAR.gwas.data.recode.vcf and two haplotypes (diploid calls such as 0|1) in PAR.gwas.data.recode.vcf.
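The NOTE above can be checked without any extra tools. The following is a minimal sketch, assuming an uncompressed VCF (such as the `.recode.vcf` files vcftools writes) with GT as the first FORMAT field; the function name `vcf_sample_ploidy` is hypothetical. For every sample it prints which allele counts (1 = haploid, 2 = diploid) occur in its GT calls.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report the ploidy seen in each sample's GT calls.
# ASSUMPTIONS: uncompressed VCF, GT is the first FORMAT field.
# Output: "SAMPLE<TAB>PLOIDIES", e.g. "M1<TAB>1" or "X1<TAB>1,2" (mixed).
# Note: a missing call "." counts as haploid and "./." as diploid.
vcf_sample_ploidy() {
    awk -F'\t' '
        /^##/ { next }                              # skip meta-information lines
        /^#CHROM/ {                                 # header: remember sample names
            for (i = 10; i <= NF; i++) name[i] = $i
            nsamp = NF
            next
        }
        {
            for (i = 10; i <= NF; i++) {
                split($i, fmt, ":")                 # GT is the first ":" field
                n = split(fmt[1], alleles, "[/|]")  # "0" -> 1, "0|1" -> 2
                seen[i, n] = 1
            }
        }
        END {
            for (i = 10; i <= nsamp; i++) {
                out = ""
                for (n = 1; n <= 2; n++)
                    if ((i, n) in seen) out = out (out == "" ? "" : ",") n
                print name[i] "\t" out
            }
        }
    ' "$1"
}
```

For example, `vcf_sample_ploidy Non.PAR.gwas.data.recode.vcf | grep -wFf male.sample.list` should show `1` for every male; a male showing `2` or `1,2` points to diploid-coded male calls or to sites outside the true non-PAR.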
- Split the non-PAR data by sex: Separate the non-PAR data by sex, which can also be done with vcftools as follows. Note that PAR.gwas.data.recode.vcf does NOT need to be split, since both males and females are diploid there.

# Males
vcftools --vcf Non.PAR.gwas.data.recode.vcf \
    --keep male.sample.list \
    --recode \
    --out Male.Non.PAR.gwas.data

# Females
vcftools --vcf Non.PAR.gwas.data.recode.vcf \
    --keep female.sample.list \
    --recode \
    --out Female.Non.PAR.gwas.data
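vcftools `--keep` expects a file with one sample ID per line. If sex is recorded in a PLINK-style .fam file, the two lists can be built with a sketch like the one below. This is an assumption about your input, not part of the original workflow: the function name `make_sex_lists` is hypothetical, and it assumes column 2 of the .fam file (IID) matches the VCF sample IDs and column 5 codes sex as 1 = male, 2 = female, 0 = unknown.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build male/female sample lists for `vcftools --keep`.
# ASSUMPTIONS: whitespace-separated PLINK-style .fam file; column 2 = IID,
# column 5 = sex (1 = male, 2 = female, anything else = unknown); the VCF
# sample IDs are exactly the IIDs (adjust the printed ID if, e.g., your VCF
# uses FID_IID names).
make_sex_lists() {
    local fam="$1" outdir="${2:-.}"
    # Create (or truncate) both lists so IDs from a previous run never survive.
    : > "$outdir/male.sample.list"
    : > "$outdir/female.sample.list"
    awk -v out="$outdir" '
        NF == 0 { next }                                    # ignore blank lines
        $5 == 1 { print $2 > (out "/male.sample.list"); next }
        $5 == 2 { print $2 > (out "/female.sample.list"); next }
        { unknown++ }                                       # sex missing/unknown
        END {
            if (unknown)
                printf "warning: skipped %d sample(s) with unknown sex\n", unknown > "/dev/stderr"
        }
    ' "$fam"
}
```

Samples with unknown sex are left out of both lists, so they are excluded from non-PAR imputation until their sex is resolved.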
- Pre-phase PAR data and female non-PAR data: Of the three resulting data sets, only the PAR data and the female non-PAR data have two haplotypes per sample and therefore need to be phased; the male non-PAR data is haploid and does not need to be phased. See our wiki page on Pre-Phasing and Converting to VCF for further details on pre-phasing and converting files back to VCF format.
- Impute Data: The following example illustrates how to impute the phased PAR data (males and females together), the phased female non-PAR data, and the haploid male non-PAR data (as obtained directly from splitting the non-PAR data by sex):
# Phased All Samples (PAR)
../bin/Minimac3 --refHaps refPanelChrX.Auto.vcf \
    --haps Phased.PAR.gwas.data.vcf \
    --prefix testRun.All.PAR

# Phased Female Samples (Non-PAR)
../bin/Minimac3 --refHaps refPanelChrX.Non.Auto.vcf \
    --haps Phased.Female.Non.PAR.gwas.data.vcf \
    --prefix testRun.females.Non.PAR

# Haploid Male Samples (Non-PAR)
../bin/Minimac3 --refHaps refPanelChrX.Non.Auto.vcf \
    --haps Male.Non.PAR.gwas.data.recode.vcf \
    --prefix testRun.males.Non.PAR
- NOTE: When imputing the non-PAR of chromosome X, users must analyze male and female samples separately; otherwise the program will crash. Users should also ensure that the reference panel contains only the PAR or only the non-PAR region of chromosome X; otherwise the program will crash.
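The reference-panel requirement in the NOTE above can be checked before running Minimac3. The sketch below (hypothetical function `chrx_region_summary`) counts how many sites of a chromosome X VCF fall in the PAR versus the non-PAR and flags files that mix the two. It assumes hg19 coordinates (PAR1 ends at 2699520, PAR2 starts at 154931044), a file containing only chromosome X, and counts every position outside the non-PAR interval as PAR.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify chrX sites as PAR or non-PAR (hg19 coordinates).
# ASSUMPTIONS: the VCF contains only chromosome X; PAR1 ends at 2699520 and
# PAR2 starts at 154931044; every position outside 2699521-154931043 is PAR.
# Prints "STATUS<TAB>PAR=n<TAB>nonPAR=m", where STATUS is PAR, NON-PAR,
# MIXED (both regions present) or EMPTY. Pass "-" to read from stdin.
chrx_region_summary() {
    awk -F'\t' '
        /^#/ { next }                                   # skip header lines
        {
            if ($2 <= 2699520 || $2 >= 154931044) par++
            else nonpar++
        }
        END {
            if (par && nonpar) status = "MIXED"
            else if (par)      status = "PAR"
            else if (nonpar)   status = "NON-PAR"
            else               status = "EMPTY"
            printf "%s\tPAR=%d\tnonPAR=%d\n", status, par, nonpar
        }
    ' "$1"
}
```

For example, `chrx_region_summary refPanelChrX.Non.Auto.vcf` should report `NON-PAR`; for a gzipped file, `zcat panel.vcf.gz | chrx_region_summary -` (the file name is illustrative). The same check can be run on the target files produced by the split.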
Minimac3 is currently available as a pre-release. The source files (and binary executable) can be downloaded from Source Files, and commonly used reference panels in VCF and M3VCF formats can be downloaded from Reference Panels.
Useful Wiki Pages
There are a few pages in this wiki that may be useful for Minimac3 users. Here are links to some of them:
- Minimac3 Imputation Cookbook (Recommended for New Users!!)
For any questions or bug reports, please contact Sayantan Das.