Minimac4

From Genome Analysis Wiki
Latest revision as of 23:29, 19 July 2019

Introduction

Minimac4 is the latest version in the series of genotype imputation software, preceded by Minimac3 (2015), Minimac2 (2014), minimac (2012) and MaCH (2010). Minimac4 is a lower-memory, more computationally efficient implementation of the original algorithms with comparable imputation quality.

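The Minimac3 mailing list has been renamed the Minimac4 mailing list (https://groups.google.com/forum/embed/?place=forum/minimac4-help&umich.edu). If you were already a member, there is no need to re-join. If not, please join the mailing list to get updates about future releases and to report possible bugs, or email bug reports to Ketian Yu (yukt@umich.edu) or Sayantan Das (sayantan@umich.edu).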

Installation

Minimac4 (version 1.0.0, released 14 February 2018) is available on GitHub: https://github.com/Santy-8128/Minimac4

The easiest way to install Minimac4 and its dependencies is to run the provided install.sh script:

git clone https://github.com/statgen/Minimac4.git
cd Minimac4
bash install.sh

Please see the Minimac4 GitHub repository for full installation instructions.

Commonly used reference panels in M3VCF format are available for download (see Reference Panels for Download).

What's New

The input file format, output file formats and typical command lines in Minimac4 are the same as in Minimac3. The main new features are summarized below:

  • Improved Speed - Minimac4 is approximately 6 times faster for the 1000 Genomes Phase 1 and Phase 3 reference panels and 2 times faster for the HRC reference panel at comparable accuracy (accuracy was assessed by imputing into 10 European samples). The speed can be improved further by tuning the approximation parameters (see below), but we recommend using the default values.
  • Automated Chunking - Minimac4 automatically splits each chromosome into overlapping chunks, imputes the chunks sequentially and then concatenates the imputed chunks. This caps memory usage regardless of chromosome length, because the memory requirement depends on chunk size rather than chromosome size. The chunk length and the overlap can be controlled with the --chunkLengthMb and --chunkLengthOverlapMb options, although we recommend the default values of 20 Mb and 3 Mb, respectively.
  • Approximations - Minimac4 uses some simple approximations to speed up imputation. The level of approximation can be controlled with the --probThreshold, --diffThreshold and --topThreshold parameters (see Minimac4 Usage for details). Higher levels of approximation reduce compute time at the cost of a marginal reduction in imputation accuracy. We recommend using the default values (0.01).
  • Improved Chromosome X/Y Support - Minimac4 can handle samples with different ploidies in the same VCF file when imputing sex chromosomes. For example, in the non-PAR region of chromosome X, males and females can be imputed together, whether males are coded as haploid or diploid. However, each sample must have a fixed ploidy throughout the input, so PAR and non-PAR regions still need to be imputed separately. Please see Chromosome X Imputation for more details.
  • Other Helpful Features
    • We introduced a new option, --memUsage, that estimates and reports the memory required by Minimac4. This is useful for users running jobs on a compute cluster that requires a memory specification.
    • We introduced additional FORMAT options for the output dosage data, allowing users to retrieve haplotype dosages, genotype probabilities, genotype dosages or other summary measures as needed.
    • We fixed a bug related to FILTER=GENOTYPED and FILTER=GENOTYPED_ONLY that caused bcftools to crash.
  • Obsolete Features
    • In Minimac4, we removed the --doseOutput and --hapOutput options. Please use DosageConvertor to convert your files to MaCH or PLINK dosage format.
    • Currently, Minimac4 can only read reference panels in M3VCF format. If your reference panel is in VCF format, please use Minimac3 to convert it to M3VCF (with parameter estimation) and then use that M3VCF file for imputation with Minimac4. The same applies to the --processReference option: although the option is accepted, it will only be implemented in a later version.
    • Parameters such as --rounds, --states, --rec and --err are deactivated until parameter estimation is implemented in Minimac4.
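The automated chunking described above can be illustrated with a small bash function that splits a chromosome into fixed-length windows whose neighbours share a fixed overlap. This is a hypothetical sketch: the function name chunk_windows and its arguments are illustrative, positions are assumed to be whole megabases, and Minimac4's own boundary rules (for example, how the overlap is placed around each chunk) may differ.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of overlapping chunking, NOT Minimac4's exact algorithm.
# Prints "start-end" windows (in Mb) covering a chromosome of length total_mb.
# Each window is at most chunk_mb long; consecutive windows share overlap_mb.
chunk_windows() {
    local total_mb=$1 chunk_mb=$2 overlap_mb=$3
    # The window start must advance, so the overlap must be shorter than a chunk.
    if (( overlap_mb >= chunk_mb )); then
        echo "overlap must be smaller than chunk length" >&2
        return 1
    fi
    local start=0 end
    while :; do
        end=$(( start + chunk_mb ))
        # Clip the last window at the end of the chromosome.
        (( end > total_mb )) && end=$total_mb
        echo "${start}-${end}"
        (( end >= total_mb )) && break
        # Step back by the overlap so neighbouring windows share overlap_mb.
        start=$(( end - overlap_mb ))
    done
}
```

For example, chunk_windows 50 20 3 prints 0-20, 17-37 and 34-50. Each window spans at most 20 Mb however long the chromosome is, which is the reason memory tracks chunk size rather than chromosome size.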

Usage

A typical Minimac4 command line for imputation is as follows:

minimac4 --refHaps refPanel.m3vcf \
         --haps targetStudy.vcf \
         --prefix testRun

Here refPanel.m3vcf is the reference panel in M3VCF format (e.g. 1000 Genomes), targetStudy.vcf contains the phased GWAS data in VCF format, and testRun is the prefix for the output files.

Full List of Options

Please see Minimac4 Documentation for a detailed explanation of all available options.

To print the full list of available options, run:

minimac4 --help

Convert VCF to M3VCF

If the reference panel is in VCF format, please use Minimac3 to convert it into M3VCF format first.

../bin/Minimac3 --refHaps refPanel.vcf \
                --processReference \
                --prefix refPanel

Multi-Threading

The following example shows the same analysis as above, but using 5 threads:

minimac4 --refHaps refPanel.m3vcf \
         --haps targetStudy.vcf \
         --prefix testRun \
         --cpus 5
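When the reference panel and target data are split by chromosome, the command above can be wrapped in a loop. The bash sketch below assumes hypothetical per-chromosome file names (refPanel.chr<N>.m3vcf, targetStudy.chr<N>.vcf); only the options --refHaps, --haps, --prefix and --cpus come from this page. The function impute_chrom and the DRY_RUN variable are illustrative: with DRY_RUN set, each command is printed instead of run.

```shell
#!/usr/bin/env bash
# Hypothetical file layout: one M3VCF reference panel and one phased target
# VCF per chromosome, e.g. refPanel.chr20.m3vcf and targetStudy.chr20.vcf.
impute_chrom() {
    local chrom=$1 cpus=${2:-5}
    local -a run=()
    # With DRY_RUN set, prefix the command with echo so it is printed, not run.
    [[ -n ${DRY_RUN:-} ]] && run=(echo)
    "${run[@]}" minimac4 --refHaps "refPanel.chr${chrom}.m3vcf" \
                         --haps "targetStudy.chr${chrom}.vcf" \
                         --prefix "testRun.chr${chrom}" \
                         --cpus "$cpus"
}

# Example: impute autosomes 1-22 one after another with 5 threads each.
# for chrom in $(seq 1 22); do impute_chrom "$chrom" 5 || exit 1; done
```

Running the chromosomes sequentially keeps peak memory at that of a single job; chromosome X is left out of the loop because its PAR and non-PAR regions need separate runs.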

Reference Panels for Download

Some commonly used reference panels are available for download here:

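Reference Panel                     Samples   File Format   Parameter Estimates   Chromosomes   Link
1000 Genomes Phase 3 (version 5)    2,504     VCF           -                     1-22,X        ftp://share.sph.umich.edu/minimac3/G1K_P3_VCF_Files.tar.gz
1000 Genomes Phase 3 (version 5)    2,504     M3VCF         YES                   1-22,X        ftp://share.sph.umich.edu/minimac3/G1K_P3_M3VCF_FILES_WITH_ESTIMATES.tar.gz
1000 Genomes Phase 3 (version 5)    2,504     M3VCF         NO                    1-22,X        ftp://share.sph.umich.edu/minimac3/G1K_P3_M3VCF_FILES_NO_ESTIMATES.tar.gz
1000 Genomes Phase 3 (version 5)    2,504     VCF, M3VCF    YES                   X             ftp://share.sph.umich.edu/minimac3/G1K_P3_CHR_X_VCF_M3VCF_FILES.tar.gz
1000 Genomes Phase 1 (version 3)    1,092     VCF           -                     1-22,X        ftp://share.sph.umich.edu/minimac3/G1K_P1_VCF_Files.tar.gz
1000 Genomes Phase 1 (version 3)    1,092     M3VCF         YES                   1-22,X        ftp://share.sph.umich.edu/minimac3/G1K_P1_M3VCF_FILES_WITH_ESTIMATES.tar.gz
1000 Genomes Phase 1 (version 3)    1,092     M3VCF         NO                    1-22,X        ftp://share.sph.umich.edu/minimac3/G1K_P1_M3VCF_FILES_NO_ESTIMATES.tar.gz
1000 Genomes Phase 1 (version 3)    1,092     VCF, M3VCF    YES                   X             ftp://share.sph.umich.edu/minimac3/G1K_P1_CHR_X_VCF_M3VCF_FILES.tar.gz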
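The panels above are distributed as .tar.gz archives. A minimal bash sketch for fetching and unpacking one of them, assuming wget and tar are installed and that the FTP server is still reachable (neither is guaranteed). The function name fetch_panel and the DRY_RUN variable are illustrative; with DRY_RUN set, the commands are printed instead of run.

```shell
#!/usr/bin/env bash
# Base FTP location taken from the download table above.
base_url="ftp://share.sph.umich.edu/minimac3"

fetch_panel() {
    local archive=$1   # e.g. G1K_P3_M3VCF_FILES_WITH_ESTIMATES.tar.gz
    local -a run=()
    # With DRY_RUN set, print each command instead of executing it.
    [[ -n ${DRY_RUN:-} ]] && run=(echo)
    # Download the archive, then extract it into the current directory.
    "${run[@]}" wget "${base_url}/${archive}" && "${run[@]}" tar -xzf "$archive"
}
```

For example, fetch_panel G1K_P3_M3VCF_FILES_WITH_ESTIMATES.tar.gz would fetch the Phase 3 M3VCF files with parameter estimates, which can be passed directly to minimac4 via --refHaps.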


Useful Wiki Pages

There are a few pages in this Wiki that may be useful for Minimac4 users:

  • Minimac4 Overview Page
  • Minimac4 Documentation
  • M3VCF Files