Minimac3
Introduction
Minimac3 is a lower-memory, more computationally efficient implementation of minimac2. It is an algorithm for genotype imputation that works on phased genotypes (for example, from MaCH). Minimac3 is designed to handle very large reference panels more efficiently with no loss of accuracy. The algorithm analyzes only the unique sets of haplotypes within small genomic segments, which reduces both run time and memory use without reducing accuracy.
Besides performing imputation, Minimac3 creates M3VCF files (a customized Minimac3 VCF format) that store reference panel information compactly, reducing the memory and time required to read large data sets. Users can run the executable either to convert VCF files to M3VCF files only, or to perform imputation as well. Minimac3 can also take a previously generated M3VCF file as the reference panel input. M3VCF files can additionally store pre-calculated estimates of recombination fraction and error, which can be reused in later imputation runs. The latest version of Minimac3 can also write its output as VCF files for easier data manipulation in downstream analysis.
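The idea of analyzing only the unique haplotypes within small genomic segments can be illustrated with a minimal Python sketch. It is not Minimac3's implementation and does not reproduce the M3VCF format: the fixed-width segments, the string encoding of haplotypes, and the function names `compress_block` and `compress_panel` are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def compress_block(haplotypes: List[str]) -> Tuple[List[str], List[int]]:
    """Collapse identical haplotypes within one genomic segment.

    Returns the unique haplotypes (in first-seen order) and, for every
    original haplotype, the index of its unique representative.
    """
    unique: List[str] = []
    index_of: Dict[str, int] = {}
    mapping: List[int] = []
    for hap in haplotypes:
        if hap not in index_of:
            index_of[hap] = len(unique)
            unique.append(hap)
        mapping.append(index_of[hap])
    return unique, mapping


def compress_panel(haplotypes: List[str], block_size: int):
    """Cut the panel into fixed-width segments (an assumption of this
    sketch) and compress each segment independently."""
    n_markers = len(haplotypes[0])
    blocks = []
    for start in range(0, n_markers, block_size):
        segment = [h[start:start + block_size] for h in haplotypes]
        blocks.append(compress_block(segment))
    return blocks


# Toy panel: 6 haplotypes x 6 markers (0 = reference allele, 1 = alternate).
panel = ["000110", "000111", "010110", "000110", "010111", "000111"]
blocks = compress_panel(panel, block_size=3)
for unique, mapping in blocks:
    print(len(unique), "unique of", len(mapping), "haplotypes:", unique, mapping)
```

In this toy panel each 3-marker segment contains only two distinct haplotypes, so any per-segment computation can run over two patterns instead of six and be mapped back to the original haplotypes through the index list.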
Download
Minimac3 is available as an undocumented release version. The source files can be downloaded from the links below, and commonly used reference panels in M3VCF format are available for download in Reference Panels. The authors would appreciate it if users tried Minimac3 on their own data sets and reported any bugs that need fixing.
- To Download Minimac3
Description | Download Link |
---|---|
Minimac3 Executable | UNIX Users |
Minimac3-omp Executable (for parallel computing) | UNIX Users |
Minimac3 Source Files | UNIX Users |
Usage
Users who downloaded the source files should follow the steps below to compile Minimac3; users who downloaded the binary executable can skip them.
## EXTRACT MINIMAC3 AND COMPILE
tar -xzvf Minimac3.v1.0.0.tar.gz
cd Minimac3/
make
A typical Minimac3 command line for imputation is as follows:
../bin/Minimac3 --refHaps refPanel.vcf \
                --haps targetStudy.vcf \
                --prefix testRun
Here refPanel.vcf is the reference panel in VCF format (e.g. 1000 Genomes), targetStudy.vcf is the phased GWAS data in VCF format, and testRun is the prefix for the output files. Some commonly used reference panels are available for download in Reference Panels. See the wiki pages on Detailed Usage and the Imputation Cookbook for further details on using Minimac3 for imputation analysis.
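When imputation is driven from a script, the same command can be assembled programmatically. The following is a minimal Python sketch that uses only the three options shown above (`--refHaps`, `--haps`, `--prefix`). The helper name `build_minimac3_command` and the default executable path are assumptions for illustration, not part of Minimac3.

```python
import shlex
from typing import List


def build_minimac3_command(ref_haps: str, haps: str, prefix: str,
                           executable: str = "../bin/Minimac3") -> List[str]:
    """Return the argument list for a basic imputation run.

    Only the options documented on this page are used; the executable
    path is an assumed default and should point at the compiled binary.
    """
    return [executable,
            "--refHaps", ref_haps,
            "--haps", haps,
            "--prefix", prefix]


cmd = build_minimac3_command("refPanel.vcf", "targetStudy.vcf", "testRun")
# Print the command as a shell-safe string for inspection or logging.
print(shlex.join(cmd))
# To launch it, pass the list to subprocess.run(cmd, check=True).
```

Keeping the command as a list rather than a single string avoids shell quoting problems when file names contain spaces.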
Users can always type the following for further support:
../bin/Minimac3 --help
Chromosome X Imputation
Chromosome X has a pseudo-autosomal region (PAR) that can be imputed for males and females together. Imputing the PAR on chromosome X is the same as ordinary imputation, since both males and females are diploid at these sites. The non-pseudo-autosomal region (non-PAR), however, must be imputed for males and females separately, because males are haploid there while females are diploid. The PAR and non-PAR regions must also be imputed separately from each other.
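Because the PAR and non-PAR regions are imputed separately, chromosome X records have to be partitioned by position beforehand. The following Python sketch shows one way to do this for VCF text lines. The boundary coordinates are the commonly cited GRCh37/hg19 values and are an assumption here; they must be replaced if the data use a different genome build. The function names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

# Assumed GRCh37/hg19 PAR boundaries on chromosome X (1-based, inclusive).
PAR_REGIONS_GRCH37: Dict[str, Tuple[int, int]] = {
    "PAR1": (60001, 2699520),
    "PAR2": (154931044, 155260560),
}


def classify_chrx_position(pos: int,
                           par_regions: Dict[str, Tuple[int, int]] = PAR_REGIONS_GRCH37) -> str:
    """Return the PAR name containing pos, or "non-PAR" if none does."""
    for name, (start, end) in par_regions.items():
        if start <= pos <= end:
            return name
    return "non-PAR"


def split_chrx_vcf(lines: Iterable[str],
                   par_regions: Dict[str, Tuple[int, int]] = PAR_REGIONS_GRCH37) -> Dict[str, List[str]]:
    """Partition VCF lines by region; header lines are copied to every part."""
    header: List[str] = []
    records: Dict[str, List[str]] = {name: [] for name in par_regions}
    records["non-PAR"] = []
    for line in lines:
        if line.startswith("#"):
            header.append(line)
            continue
        pos = int(line.split("\t", 2)[1])  # POS is the second VCF column
        records[classify_chrx_position(pos, par_regions)].append(line)
    return {region: header + recs for region, recs in records.items()}


toy_vcf = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1",
    "X\t100000\t.\tA\tG\t.\tPASS\t.\tGT\t0|1",
    "X\t5000000\t.\tC\tT\t.\tPASS\t.\tGT\t1|1",
    "X\t155000000\t.\tG\tA\t.\tPASS\t.\tGT\t0|0",
]
parts = split_chrx_vcf(toy_vcf)
for region, part in parts.items():
    print(region, len(part) - 2, "record(s)")
```

Each resulting part can then be written to its own file and imputed against a reference panel restricted to the same region.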
The following example illustrates imputation on the non-PAR of chromosome X for males and females separately (files available in the Minimac3/test/ directory):
Male Samples (Non-PAR)
../bin/Minimac3 --refHaps refPanelChrX.Non.Auto.vcf --haps targetStudyChrX.males.vcf --prefix testRun
Female Samples (Non-PAR)
../bin/Minimac3 --refHaps refPanelChrX.Non.Auto.vcf --haps targetStudyChrX.females.vcf --prefix testRun
NOTE: When imputing the non-PAR of chromosome X, users must analyze male and female samples separately; otherwise the program will crash. Users should also ensure that the reference panel contains only the PAR or only the non-PAR region of chromosome X; otherwise the program will crash.
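Since mixing haploid and diploid samples in one non-PAR run is a failure case, it can help to check a target VCF before launching Minimac3. The sketch below is an illustrative pre-flight check, not part of Minimac3. It assumes that GT is the first entry of each sample field, as the VCF specification requires when GT is present, and that haploid calls are written with a single allele. The function names are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def genotype_ploidy(sample_field: str) -> int:
    """Count the alleles in the GT entry of one sample field, e.g. '0|1:35' -> 2."""
    gt = sample_field.split(":", 1)[0]
    return len(gt.replace("|", "/").split("/"))


def sample_ploidies(vcf_lines: Iterable[str]) -> Dict[str, Set[int]]:
    """Map each sample name to the set of ploidies seen across all records."""
    samples: List[str] = []
    ploidies: Dict[str, Set[int]] = {}
    for line in vcf_lines:
        if line.startswith("##"):
            continue
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample columns start after FORMAT
            ploidies = {s: set() for s in samples}
            continue
        for sample, field in zip(samples, fields[9:]):
            ploidies[sample].add(genotype_ploidy(field))
    return ploidies


def has_uniform_ploidy(vcf_lines: Iterable[str]) -> bool:
    """True if every genotype in the file has the same number of alleles."""
    observed = set().union(*sample_ploidies(vcf_lines).values())
    return len(observed) <= 1


header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\t"
males_only = [header + "M1\tM2", "X\t5000000\t.\tC\tT\t.\tPASS\t.\tGT\t0\t1"]
mixed = [header + "M1\tF1", "X\t5000000\t.\tC\tT\t.\tPASS\t.\tGT\t0\t0|1"]
print(has_uniform_ploidy(males_only), has_uniform_ploidy(mixed))
```

A file that fails the check can be split by sex into separate male and female VCF files before running the two commands shown above.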
Reference Panels for Download
Some commonly used reference panels are listed for download below. [NOTE: Chromosome X will be available soon.]
Reference Panel | Format | Download Link | Internal CSG Copy Link |
---|---|---|---|
1000 Genomes Phase 3 | VCF Files | Coming Soon | /net/fantasia/home/sayantan/DATABASE/1000G/PHASE_3/FOR_UPLOAD/G1K_P3/VCF_Files/ |
1000 Genomes Phase 3 | M3VCF Files (With Parameter Estimates) | Coming Soon | /net/fantasia/home/sayantan/DATABASE/1000G/PHASE_3/FOR_UPLOAD/G1K_P3/M3VCF_Files_With_Estimates/ |
1000 Genomes Phase 3 | M3VCF Files (Without Parameter Estimates) | Coming Soon | /net/fantasia/home/sayantan/DATABASE/1000G/PHASE_3/FOR_UPLOAD/G1K_P3/M3VCF_Files_No_Estimates/ |
1000 Genomes Phase 1 | VCF Files | Coming Soon | /net/fantasia/home/sayantan/DATABASE/1000G/PHASE_1_V3/FOR_UPLOAD/G1K_P1/VCF_Files/ |
1000 Genomes Phase 1 | M3VCF Files (With Parameter Estimates) | Coming Soon | /net/fantasia/home/sayantan/DATABASE/1000G/PHASE_1_V3/FOR_UPLOAD/G1K_P1/M3VCF_Files_With_Estimates/ |
1000 Genomes Phase 1 | M3VCF Files (Without Parameter Estimates) | Coming Soon | /net/fantasia/home/sayantan/DATABASE/1000G/PHASE_1_V3/FOR_UPLOAD/G1K_P1/M3VCF_Files_No_Estimates/ |
Contact
For questions or bug reports, please contact Sayantan Das.