Difference between revisions of "Minimac3"

From Genome Analysis Wiki
(Add category tag to improve findability)
 
(98 intermediate revisions by 2 users not shown)
Line 1: Line 1:
 +
* '''New Version [[Minimac4]] available ! Please Check out !!!'''
 +
 +
* '''Please join our NEW [https://groups.google.com/forum/embed/?place=forum/minimac4-help#!forum/minimac4-help mailing list] to get updates about future releases, bug fixes or post queries.'''
 +
 +
* '''No further development on Minimac3 !!!''' See [[Minimac3 ChangeLog | ChangeLog ]] for details !!!
 +
 +
= Useful Wiki Pages =
 +
 +
There are a few pages in this Wiki that may be useful for '''Minimac3''' users. Here are links to a few:
 +
 +
* [[Minimac3| Minimac3 Overview Page]]
 +
 +
* [[Minimac3 Usage | Minimac3 Usage and Documentation]]
 +
 +
* [[Minimac3 Imputation Cookbook]] ('''Recommended for New Users!!''')
 +
 +
* [[Minimac3 Cookbook : Chromosome X Imputation| Chromosome X Imputation]]
 +
 +
* [[Minimac3 Examples| Minimac3 Examples]]
 +
 +
* [[Minimac3 Info File| Minimac3 Info File]]
 +
 +
* [[Minimac3 ChangeLog | Minimac3 ChangeLog ]]
 +
 +
* [[M3VCF Files| M3VCF Files]]
 +
 
= Introduction =
 
= Introduction =
  
'''Minimac3 ''' is a lower memory and more computationally efficient implementation of [http://genome.sph.umich.edu/wiki/Minimac2 minimac2]. It is an algorithm for genotypic imputation that works on phased genotypes (say from [http://genome.sph.umich.edu/wiki/MaCH MaCH]). minimac3 is designed to handle very large reference panels in a more computationally efficient way with no loss of accuracy. This algorithm analyzes only the unique sets of haplotypes in small genomic segments, thereby saving on time-complexity, computational memory but no loss in degree of accuracy.
+
'''Minimac3''' is a lower memory and more computationally efficient implementation of the genotype imputation algorithms in [[Minimac|minimac]] and [[Minimac2|minimac2]]. '''Minimac3''' is designed to handle very large reference panels with no loss of accuracy. It accomplishes this by identifying repeat haplotype patterns and using these to simplify the underlying calculations.
  
Minimac3, apart from performing imputation, also creates [[#M3VCF Files|<font face=Courier>M3VCF</font> files]] (customized minimac3 VCF files) which are able to store reference panel information in a compact form, thus saving on memory and time required to read large datasets. User will have an option to use the binary code to either just convert VCF files to <font face=Courier>M3VCF</font> files or to perform imputation as well. The code can also take a previously generated <font face=Courier>M3VCF</font> file as input for the reference panel. <font face=Courier>M3VCF</font> files can also store pre-calculated estimates of recombination fraction and error, which can be used for later runs of imputation.  The latest version of Minimac3 also allows output in the form of VCF files for easier data manipulation in downstream analysis.
+
Minimac3 uses [[M3VCF Files|<font face=Courier>M3VCF</font> files]] (customized minimac3 VCF files) to store reference panel information in a compact form, thus saving on memory and time required to read large datasets. Users can use Minimac3 to convert standard VCF files to <font face=Courier>M3VCF</font> files. <font face=Courier>M3VCF</font> files can also store pre-calculated estimates of recombination fraction and error, which speeds up later rounds of imputation.  Minimac3 outputs results in the form of standard VCF files for easy data manipulation in downstream analysis.
  
 
= Download =
 
= Download =
  
'''Minimac3 ''' is available as an undocumented release version. The source files are available for download here and commonly used reference panels in <code>M3VCF</code> format are available for download in [[#Reference Panels for Download | Reference Panels]]. The authors would really appreciate if users would use it on their data set and let us know of possible bugs to be fixed.  
+
'''Minimac3 ''' is currently available as a release version. Commonly used reference panels in <font face=Courier>M3VCF</font> format are available for download in [[#Reference Panels for Download | Reference Panels]].  
 +
 
 +
'''Please join our NEW [https://groups.google.com/forum/embed/?place=forum/minimac4-help#!forum/minimac4-help mailing list] to get updates about future releases or to report possible bugs, or email bug reports to [mailto:sayantan@umich.edu Sayantan Das].'''
 +
 
 +
'''VERSION: 2.0.1 !!! (Updated 6.6.2016) !!!'''
 +
 
 +
'''GitHub Repo:''' Users can also clone from the GitHub repository: [https://github.com/Santy-8128/Minimac3 Minimac3 Github]
  
* To Download Minimac3
+
'''Cloning from GitHub is recommended so that updates can be easily pulled !!!'''
  
{| class="wikitable" border="1" cellpadding="2"
+
{| class="wikitable" style="text-align:center"  border="1" cellpadding="2"
 
|- bgcolor="lightgray"
 
|- bgcolor="lightgray"
 
! Description
 
! Description
 
! Download Link
 
! Download Link
 
|-  
 
|-  
| Minimac3 Executable
+
| Source Files
| [[Media : Minimac3.v1.0.0.binary.tar.gz | UNIX Users ]]
+
| [ftp://share.sph.umich.edu/minimac3/Minimac3.v2.0.1.tar.gz UNIX Users ]
 
|-  
 
|-  
| Minimac3-omp Executable (for parallel computing)
+
| Binary Executable <sup>&#8224;</sup>
| [[Media : Minimac3.v1.0.0.binary-OMP.tar.gz | UNIX Users ]]
+
| [ftp://share.sph.umich.edu/minimac3/Minimac3Executable.tar.gz UNIX Users ]
|-
 
| Minimac3 Source Files
 
| [[Media : Minimac3.v1.0.0.tar.gz | UNIX Users ]]
 
 
|}
 
|}
 +
 +
'''<sup>&#8224;</sup>''' Binary executables are NOT guaranteed to run on every LINUX machine. Please compile from source files if you have trouble with the executable, or else contact the author [mailto:sayantan@umich.edu Sayantan Das].
  
 
= Usage=
 
= Usage=
Line 30: Line 61:
 
Users should follow the following steps to compile '''Minimac3''' (if they downloaded the source files) or should skip them (if they downloaded the binary executable).
 
Users should follow the following steps to compile '''Minimac3''' (if they downloaded the source files) or should skip them (if they downloaded the binary executable).
  
  ## EXTRACT MINIMAC3 AND COMPILE
+
  ## DOWNLOAD, EXTRACT MINIMAC3 AND COMPILE
 
  &nbsp;
 
  &nbsp;
  tar -xzvf Minimac3.v1.0.0.tar.gz
+
wget ftp://share.sph.umich.edu/minimac3/Minimac3.v2.0.1.tar.gz
 +
  tar -xzvf Minimac3.v2.0.1.tar.gz
 
  cd Minimac3/
 
  cd Minimac3/
 
  make
 
  make
Line 42: Line 74:
 
                 --prefix testRun
 
                 --prefix testRun
  
Here <code>refPanel.vcf</code> is the reference panel used in VCF format (e.g. 1000 Genomes), <code>targetStudy.vcf</code> is the phased GWAS data in VCF format, and <code>testRun</code> is the prefix for the output files. Some commonly used reference panels are available for download in [[#Reference Panels for Download| Reference Panels]]. See wiki page on [[Minimac3 Usage| Detailed Usage]] and [[Minimac3 Imputation Cookbook|Imputation Cookbook]] for further details on using '''Minimac3''' for imputation analysis.
+
Here <font face=Courier>refPanel.vcf</font> is the reference panel in VCF format (e.g. 1000 Genomes), <font face=Courier>targetStudy.vcf</font> is the phased GWAS data in VCF format, and <font face=Courier>testRun</font> is the prefix for the output files. Some commonly used reference panels are available for download in [[Minimac3 Imputation Cookbook#Reference Panels for Download| Reference Panels]]. See the wiki pages on [[Minimac3 Usage| Detailed Usage]] and the [[Minimac3 Imputation Cookbook|Imputation Cookbook]] for further details on using '''Minimac3''' for imputation analysis.
 
   
 
   
 
Users can always type the following for further support:
 
Users can always type the following for further support:
Line 48: Line 80:
 
   /bin/Minimac3 --help
 
   /bin/Minimac3 --help
  
= Examples =
+
= Reference Panels for Download =  
  
 +
Some commonly used reference panels are available for download here:
  
To look at the examples, the folder of '''Minimac3 ''' needs to be copied to the users local directory first. Then move to the folder <font face=Courier>LocalDirectory/test/</font>
+
'''Chr X Haplotypes for 1000 Genomes Phase 3 were updated on Oct 20 to also include multi-allelic variants (split into bi-allelic variants) !!!'''
  
  cp -r /net/fantasia/home/sayantan/Softwares/Minimac3/ LocalDirectoryMinimac3/
+
{| class="wikitable" style="text-align:center" border="1" cellpadding="2"
  cd LocalDirectoryMinimac3/test/
 
 
 
The following example uses a VCF reference file <font face=Courier>[refPanel.vcf]</font> and a VCF target sample file <font face=Courier>[targetStudy.vcf]</font>
 
 
 
  ../bin/Minimac3 --refHaps refPanel.vcf \
 
                  --haps targetStudy.vcf \
 
                  --prefix testRun
 
 
 
The following example is same as above but uses minimac3-omp (which is implemented using openMP programming enabling parallel computing).
 
 
 
  ../bin/Minimac3-omp --refHaps refPanel.vcf \
 
                      --haps targetStudy.vcf \
 
                      --prefix testRun \
 
                      --cpus 5
 
 
 
The following example converts a VCF reference file into <font face=Courier>M3VCF</font> (only). It also does parameter estimation based on the reference panel using leave-one-out method and saves them in the <font face=Courier>M3VCF</font> file. The parameter estimation can be skipped with "<font face=Courier>--rounds = 0</font>". If the option "<font face=Courier>--processReference</font>" is ON, no imputation will be done, only compression of file from VCF to <font face=Courier>M3VCF</font> format will be done.
 
 
 
../bin/Minimac3 --refHaps refPanel.vcf \
 
                --processReference \
 
                --prefix testRun
 
 
 
The following example uses a <font face=Courier>M3VCF</font> file (which was created in the previous example) and VCF target sample files (<font face=Courier>targetStudy.vcf</font>) for imputation.
 
 
 
../bin/Minimac3 --refHaps testRun.m3vcf.gz \
 
                --haps targetStudy.vcf \
 
                --prefix testRun
 
 
 
[NOTE: In the example above, if <code>testRun.m3vcf.gz</code> was created with <code>rounds = 0</code>, it would contain no parameter estimates. Note that the program works with the saved estimates when available (as in the example above), whereas it does parameter estimation when the estimates are NOT available (as in the example below which is created with <code>rounds = 0</code>)]
 
 
 
../bin/Minimac3 --refHaps refPanel.vcf \
 
                --processReference \
 
                --rounds 0 \
 
                --prefix testRun
 
../bin/Minimac3 --refHaps testRun.m3vcf.gz \
 
                --haps targetStudy.vcf \
 
                --prefix testRun
 
 
 
The following example also uses a <font face=Courier>M3VCF</font> reference file <font face=Courier>[refPanel.m3vcf.gz]</font> and a VCF target sample file <font face=Courier>[targetStudy.vcf]</font>. However, it only analyzes chromosome 6 from position 505988 to 873131 (allowing a buffer of 100 bp on either side). It also outputs a phased haplotype file (using <code>--hapOutput,</code> option) and the usual dosage file (using <code>--doseOutput,</code> option)
 
 
 
../bin/Minimac3 --refHaps testRun.m3vcf.gz \
 
                --haps targetStudy.vcf \
 
                --chr 6 \
 
                --start 505988 \
 
                --end 873131 \
 
                --window 100 \
 
                --prefix testRun \
 
                --hapOutput \
 
                --doseOutput
 
 
 
For examples on imputation of chromosome X, see [[#Chromosome X Imputation|Chromosome X Imputation]]
 
 
 
= Imputation Cookbook =
 
 
 
This section gives a brief summary of the steps required to go through an experiment of imputation on typical GWAS samples. Before pre-phasing and imputation, users must ensure that their data is quality controlled. Standard quality control filters involve excluding markers with high missingness rate, high deviations from Hardy-Weinberg equilibrium, high discordance rates (if duplicate copies available), excess Mendelian inconsistencies etc. and removing samples with high missingness rate, unusual heterozygosity, high inbreeding coefficient, clear evidence of being genetic ancestry outliers, evidence of relatedness etc. All of these steps can be easily carried out using [http://pngu.mgh.harvard.edu/~purcell/plink/plink2.shtml PLINK]. With older genotyping platforms, low frequency SNPs are also often excluded because they are hard to genotype accurately. With more modern genotyping arrays, the accuracy of genotype calls for low frequency SNPs is less of a concern.
 
 
 
Once a quality controlled dataset is available we need to pre-phase the data followed by imputation. The steps are explained below.
 
 
 
== Pre-Phasing the GWAS data ==
 
 
 
Pre-Phasing can be done using either [http://www.sph.umich.edu/csg/abecasis/MaCH/ MaCH] or [https://mathgen.stats.ox.ac.uk/genetics_software/shapeit/shapeit.html SHAPEIT].
 
 
 
=== MaCH ===
 
 
 
'''MaCH''' is a Markov Chain based haplotyper. It can resolve long haplotypes in samples of unrelated individuals. The source code is available for download [http://www.sph.umich.edu/csg/abecasis/MaCH/download/ here]. Check out their [http://www.sph.umich.edu/csg/abecasis/MaCH/ home-page] for further details.
 
 
 
A typical command line to phase using MaCH looks like this (<code>Gwas.chr20.Unphased.dat</code> and <code>Gwas.chr20.Unphased.ped </code> is the quality controlled GWAS data set in [http://www.sph.umich.edu/csg/abecasis/Merlin/ Merlin] format)
 
 
 
mach1 -d Gwas.chr20.Unphased.dat \
 
      -p Gwas.chr20.Unphased.ped \
 
      --rounds 20 \
 
      --states 200 \
 
      --phase \
 
      --interim 5 \
 
      --sample 5 \
 
      --prefix Gwas.Chr20.Phased.Output
 
 
 
=== SHAPEIT===
 
 
 
'''SHAPEIT''' is a fast and accurate method for estimation of haplotypes (phasing) from genotype or sequencing data. The source code is available for download [https://mathgen.stats.ox.ac.uk/genetics_software/shapeit/shapeit.html#download here]. Check out their [https://mathgen.stats.ox.ac.uk/genetics_software/shapeit/shapeit.html home-page] for further details. It can be used to phase a small number of samples (a reference panel required) as well as a large number of samples (NO reference panel required). The reference panels and genetic map files required by SHAPEIT are available for download [https://mathgen.stats.ox.ac.uk/impute/impute_v2.html#reference here].
 
 
 
* The following example shows a typical SHAPEIT command line to phase a LARGE number (>200) of GWAS samples (<code>Gwas.chr20.Unphased.vcf</code> is the quality controlled GWAS data set in VCF format).
 
 
 
shapeit -V Gwas.chr20.Unphased.vcf \
 
        -M genetic_map_chr20.txt \
 
        -O Gwas.Chr20.Phased.Output
 
 
 
* The following example shows a typical SHAPEIT command line to phase a SMALL number (<200) of GWAS samples (<code>Gwas.chr20.Unphased.vcf</code> is the quality controlled GWAS data set in VCF format).
 
 
 
## The following step splits out variants mis-aligned between the reference and gwas panel
 
shapeit -check \
 
        -V Gwas.chr20.Unphased.vcf\
 
        -M genetic_map_chr20.txt \
 
        --input-ref reference.haplotypes.gz reference.legend.gz reference.sample \
 
        --output-log gwas.alignments
 
 
## The following step phases gwas panel using the reference panel while excluding the markers found in the step above.
 
shapeit -B gwas \
 
        -V Gwas.chr20.Unphased.vcf \
 
        --input-ref reference.haplotypes.gz reference.legend.gz reference.sample \
 
        --exclude-snp gwas.alignments.strand.exclude \
 
        -O Gwas.Chr20.Phased.Output
 
 
 
== Running Imputation ==
 
 
 
After the pre-phasing has been done, we can begin to run the imputation. But before that,we need to convert our phased GWAS panel files (obtained above) to VCF format (since Minimac3 can only use VCF format files) and also download the reference panels required for imputation. Consequently, we would have the following steps.
 
 
 
===Convert GWAS Panel Files into VCF ===
 
 
 
If pre-phased GWAS data is available in VCF format, users can skip this step. Otherwise, the following steps show how to convert other format files to VCF format.
 
 
 
* '''PLINK:''' Use PLINK2 (available [https://www.cog-genomics.org/plink2 here]) as follows:
 
 
 
plink --bfile Gwas.Chr20.Phased.Output \
 
      --recode vcf \
 
      --out Gwas.Chr20.Phased.Output.VCF.format
 
 
 
* '''MaCH:''' Use Mach2VCF (coming soon) as follows:
 
 
 
mach2VCF --haps Gwas.Chr20.Phased.Output.hap \
 
          --snps Gwas.Chr20.Phased.Output.snps \
 
          --prefix Gwas.Chr20.Phased.Output.VCF.format
 
 
 
* '''SHAPEIT:''' Use SHAPEIT (available [https://mathgen.stats.ox.ac.uk/genetics_software/shapeit/shapeit.html#download here]) as follows:
 
 
 
shapeit -convert \
 
        --input-haps Gwas.Chr20.Phased.Output \
 
        --output-vcf Gwas.Chr20.Phased.Output.VCF.format.vcf
 
 
 
=== Download Reference Panel ===
 
 
 
Commonly used reference panels are 1000 Genomes Phase 3 (2,535 samples), 1000 Genomes Phase 1 (1,094 samples), HapMap2 (269 samples), Haplotype Reference Consortium (32,914 samples) etc. Users are advised to use either 1000 Genomes Phase 3 (available for download in [[#Reference Panels for Download |Reference Panels ]]) or the Haplotype Reference Consortium (which due to data privacy issues cannot be shared publicly but can be used for imputation remotely on a server through a [http://imputationserver.sph.umich.edu/ imputation server] setup at University of Michigan). Reference panels for different versions of 1000 Genomes, in both VCF and <code>M3VCF</code> format, are available for download in [[#Reference Panels for Download |Reference Panels]].
 
 
 
=== Impute Samples ===
 
 
 
The final step for imputation involves running '''Minimac3''' to perform the imputation analysis. Now that we have the pre-phased GWAS panel (in VCF format) and the appropriate reference panel (in VCF or <code>M3VCF</code> format), we can run Minimac3 as follows. In the following examples, the first one uses a VCF file for reference (that can be obtained as explained above) and the second example uses a <code>M3VCF</code> file (that might have been downloaded from the links [[#Reference Panels for Download|below]] or created on a previous run of Minimac3).
 
 
 
../bin/Minimac3 --refHaps ReferencePanel.Chr20.1000Genomes.vcf \
 
                --haps Gwas.Chr20.Phased.Output.VCF.format.vcf \
 
                --prefix Gwas.Chr20.Imputed.Output
 
 
 
../bin/Minimac3 --refHaps ReferencePanel.Chr20.1000Genomes.m3vcf \
 
                --haps Gwas.Chr20.Phased.Output.VCF.format.vcf \
 
                --prefix Gwas.Chr20.Imputed.Output
 
 
 
= Chromosome X Imputation =
 
 
 
Chromosome X has a pseudo-autosomal region (PAR) which can be imputed for males and females together. Imputing the PAR on chromosome X is same as usual imputation, since both males and females are diploids at these sites. However, the non pseudo-autosomal region needs to be imputed for males and females separately, as males are haploids while females are diploids. Of course, the PAR and non-PAR regions need to be imputed separately.
 
 
 
The following example illustrates imputation on the non-PAR of chromosome X for males and females separately (files available in <code>Minimac3/test/</code> directory)
 
 
 
Male Samples (Non-PAR)
 
  ../bin/Minimac3 --refHaps refPanelChrX.Non.Auto.vcf --haps targetStudyChrX.males.vcf --prefix testRun
 
 
 
Female Samples (Non-PAR)
 
  ../bin/Minimac3 --refHaps refPanelChrX.Non.Auto.vcf --haps targetStudyChrX.females.vcf --prefix testRun
 
 
 
NOTE: For imputing non-PAR of chromosome X, user must analyze male and female samples separately, otherwise program would crash. User should also ensure that the reference panel consists of only PAR or non-PAR region of chromosome X, otherwise program would crash.
 
 
 
= Reference Panels for Download =
 
 
 
Some commonly used reference panels are available for download here. [NOTE: Chromosome X will be available soon]
 
 
 
{| class="wikitable" border="1" cellpadding="2"
 
 
|- bgcolor="lightgray"
 
|- bgcolor="lightgray"
! Reference Panel
+
! width="150px" |Reference Panel
! Format
+
! width="100px" |Number <br> of Samples
! Download Link
+
! width="100px" |File Format
! Internal CSG Copy Link
+
! width="100px" |Parameter <br>  Estimates <br> Available
 +
! width="120px" |Chromosomes
 +
! width="80px" |Link
 +
|-
 +
| rowspan=4 | '''1000 Genomes''' <br>
 +
'''Phase 3''' <br>
 +
(version 5)
 +
| rowspan=4  style="text-align:center" | '''2,504'''
 +
| '''VCF'''
 +
| -
 +
| 1-22,X
 +
| [ftp://share.sph.umich.edu/minimac3/G1K_P3_VCF_Files.tar.gz Download] <!--[ftp://share.sph.umich.edu/minimac3/G1K_P3_VCF_Files.tar.gz Download]-->
 +
|-
 +
| rowspan=2  style="text-align:center" | '''M3VCF'''
 +
| YES
 +
| 1-22,X
 +
|  [ftp://share.sph.umich.edu/minimac3/G1K_P3_M3VCF_FILES_WITH_ESTIMATES.tar.gz Download] <!-- [ftp://share.sph.umich.edu/minimac3/G1K_P3_M3VCF_FILES_WITH_ESTIMATES.tar.gz Download]-->
 
|-  
 
|-  
| 1000 Genomes Phase 3
+
|NO
| VCF Files
+
| 1-22,X
| Coming Soon
+
| [ftp://share.sph.umich.edu/minimac3/G1K_P3_M3VCF_FILES_NO_ESTIMATES.tar.gz Download] <!--[ftp://share.sph.umich.edu/minimac3/G1K_P3_M3VCF_FILES_NO_ESTIMATES.tar.gz Download]-->
| <code>/net/fantasia/home/sayantan/DATABASE/1000G/PHASE_3/FOR_UPLOAD/G1K_P3/VCF_Files/</code>
 
 
|-  
 
|-  
| 1000 Genomes Phase 3
+
| '''VCF''','''M3VCF'''
| M3VCF Files (With Parameter Estimates)
+
| YES
| Coming Soon
+
| X
| <code>/net/fantasia/home/sayantan/DATABASE/1000G/PHASE_3/FOR_UPLOAD/G1K_P3/M3VCF_Files_With_Estimates/</code>
+
| [ftp://share.sph.umich.edu/minimac3/G1K_P3_CHR_X_VCF_M3VCF_FILES.tar.gz Download] <!--[ftp://share.sph.umich.edu/minimac3/G1K_P3_CHR_X_VCF_M3VCF_FILES.tar.gz Download]-->
 
|-  
 
|-  
| 1000 Genomes Phase 3
+
| rowspan=4 |  '''1000 Genomes''' <br>
| M3VCF Files (Without Parameter Estimates)
+
'''Phase 1''' <br>
| Coming Soon
+
(version 3)
| <code>/net/fantasia/home/sayantan/DATABASE/1000G/PHASE_3/FOR_UPLOAD/G1K_P3/M3VCF_Files_No_Estimates/</code>
+
| rowspan=4  | '''1,092'''
 +
| '''VCF'''
 +
| -
 +
| 1-22,X
 +
| [ftp://share.sph.umich.edu/minimac3/G1K_P1_VCF_Files.tar.gz Download]
 
|-  
 
|-  
| 1000 Genomes Phase 1
+
| rowspan=2  style="text-align:center" | '''M3VCF'''
| VCF Files
+
| YES
| Coming Soon
+
| 1-22,X
| <code>/net/fantasia/home/sayantan/DATABASE/1000G/PHASE_1_V3/FOR_UPLOAD/G1K_P1/VCF_Files/</code>
+
| [ftp://share.sph.umich.edu/minimac3/G1K_P1_M3VCF_FILES_WITH_ESTIMATES.tar.gz Download]
 
|-  
 
|-  
| 1000 Genomes Phase 1
+
| NO
| M3VCF Files (With Parameter Estimates)
+
| 1-22,X
| Coming Soon
+
| [ftp://share.sph.umich.edu/minimac3/G1K_P1_M3VCF_FILES_NO_ESTIMATES.tar.gz Download]
| <code>/net/fantasia/home/sayantan/DATABASE/1000G/PHASE_1_V3/FOR_UPLOAD/G1K_P1/M3VCF_Files_With_Estimates/</code>
 
 
|-  
 
|-  
| 1000 Genomes Phase 1
+
| '''VCF''','''M3VCF'''
| M3VCF Files (Without Parameter Estimates)
+
| YES
| Coming Soon
+
| X
| <code>/net/fantasia/home/sayantan/DATABASE/1000G/PHASE_1_V3/FOR_UPLOAD/G1K_P1/M3VCF_Files_No_Estimates/</code>
+
| [ftp://share.sph.umich.edu/minimac3/G1K_P1_CHR_X_VCF_M3VCF_FILES.tar.gz Download] <!--[ftp://share.sph.umich.edu/minimac3/G1K_P1_CHR_X_VCF_M3VCF_FILES.tar.gz Download]-->
 +
|}
  
|}
+
= Reference =
 +
 
 +
If you use [[minimac3]] please cite:
 +
 
 +
''Das S, Forer L, Schönherr S, Sidore C, Locke AE'' et al. Next-generation genotype imputation service and methods. Nature Genetics 48, 1284–1287 (2016). [http://www.nature.com/ng/journal/v48/n10/full/ng.3656.html doi:10.1038/ng.3656]
  
 
= Contact =
 
= Contact =
  
 
In case of any queries and bugs please contact [mailto:sayantan@umich.edu Sayantan Das].
 
In case of any queries and bugs please contact [mailto:sayantan@umich.edu Sayantan Das].
 +
 +
[[Category:Software]]

Latest revision as of 15:40, 18 October 2022

  • New Version Minimac4 available ! Please Check out !!!
  • Please join our NEW mailing list to get updates about future releases, bug fixes or post queries.
  • No further development on Minimac3 !!! See ChangeLog for details !!!

Useful Wiki Pages

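There are a few pages in this Wiki that may be useful for Minimac3 users. Here are links to a few:

  • Minimac3 Overview Page
  • Minimac3 Usage and Documentation
  • Minimac3 Imputation Cookbook (Recommended for New Users!!)
  • Chromosome X Imputation
  • Minimac3 Examples
  • Minimac3 Info File
  • Minimac3 ChangeLog
  • M3VCF Files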

Introduction

Minimac3 is a lower memory and more computationally efficient implementation of the genotype imputation algorithms in minimac and minimac2. Minimac3 is designed to handle very large reference panels with no loss of accuracy. It accomplishes this by identifying repeat haplotype patterns and using these to simplify the underlying calculations.

Minimac3 uses M3VCF files (customized minimac3 VCF files) to store reference panel information in a compact form, thus saving on memory and time required to read large datasets. Users can use Minimac3 to convert standard VCF files to M3VCF files. M3VCF files can also store pre-calculated estimates of recombination fraction and error, which speeds up later rounds of imputation. Minimac3 outputs results in the form of standard VCF files for easy data manipulation in downstream analysis.
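
As a rough sketch of the VCF to M3VCF conversion described above, the snippet below only prints the command it would run (a dry run). It assumes that the `--processReference` option described in the Minimac3 usage documentation is available in your build and that the binary sits at `../bin/Minimac3`. The file name `refPanel.vcf`, the prefix `refPanel.converted`, and the helper `m3vcf_convert_cmd` are hypothetical placeholders, not part of Minimac3.

```shell
#!/bin/sh
# Dry-run sketch: print (do not execute) a VCF -> M3VCF conversion command.
# Assumption: the binary lives at ../bin/Minimac3 and accepts --processReference.
MINIMAC3="../bin/Minimac3"

# m3vcf_convert_cmd REF_VCF PREFIX
# Hypothetical helper that formats the command line for one reference panel.
m3vcf_convert_cmd() {
    printf '%s --refHaps %s --processReference --prefix %s\n' \
        "$MINIMAC3" "$1" "$2"
}

# Print the command for a placeholder reference panel.
m3vcf_convert_cmd refPanel.vcf refPanel.converted
```

Once the printed command looks right, it can be pasted into a terminal or run with `eval`; check `--help` first to confirm the option names for your version.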

Download

Minimac3 is currently available as a release version. Commonly used reference panels in M3VCF format are available for download in Reference Panels.

Please join our NEW mailing list to get updates about future releases or to report possible bugs, or email bug reports to Sayantan Das.

VERSION: 2.0.1 !!! (Updated 6.6.2016) !!!

GitHub Repo: Users can also clone from the GitHub repository: Minimac3 Github

Cloning from GitHub is recommended so that updates can be easily pulled !!!

Description            Download Link
Source Files           UNIX Users
Binary Executable †    UNIX Users

† Binary executables are NOT guaranteed to run on every LINUX machine. Please compile from source files if you have trouble with the executable, or else contact the author Sayantan Das.

Usage

Users who downloaded the source files should follow the steps below to compile Minimac3; users who downloaded the binary executable can skip them.

## DOWNLOAD, EXTRACT MINIMAC3 AND COMPILE
 
wget ftp://share.sph.umich.edu/minimac3/Minimac3.v2.0.1.tar.gz
tar -xzvf Minimac3.v2.0.1.tar.gz
cd Minimac3/
make
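
After `make` finishes, it can help to confirm that an executable was actually produced. The check below is a minimal sketch: it assumes the build places the binary at `bin/Minimac3` inside the `Minimac3/` directory (as the `../bin/Minimac3` path used in the commands on this page suggests), and `check_binary` is a hypothetical helper, not something shipped with Minimac3.

```shell
#!/bin/sh
# check_binary PATH
# Hypothetical helper: report whether an executable file exists at PATH.
check_binary() {
    if [ -x "$1" ]; then
        echo "found: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# Assumed location of the binary relative to the Minimac3/ source directory.
check_binary bin/Minimac3 || echo "build may have failed; re-run make and inspect its output"
```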

A typical Minimac3 command line for imputation is as follows:

../bin/Minimac3 --refHaps refPanel.vcf \
                --haps targetStudy.vcf \
                --prefix testRun

Here refPanel.vcf is the reference panel in VCF format (e.g. 1000 Genomes), targetStudy.vcf is the phased GWAS data in VCF format, and testRun is the prefix for the output files. Some commonly used reference panels are available for download in Reference Panels. See the wiki pages on Detailed Usage and the Imputation Cookbook for further details on using Minimac3 for imputation analysis.
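
Imputation is commonly run one chromosome at a time. The loop below is a dry-run sketch that prints one command per chromosome using only the three options shown above (`--refHaps`, `--haps`, `--prefix`). The per-chromosome file-name patterns and the helper `build_cmd` are assumptions made for illustration; adjust them to match how your own files are named.

```shell
#!/bin/sh
# Dry-run sketch: print one Minimac3 command per chromosome.
# The file-name patterns below are hypothetical placeholders.
MINIMAC3="../bin/Minimac3"

# build_cmd CHR
# Formats the imputation command line for a single chromosome.
build_cmd() {
    printf '%s --refHaps %s --haps %s --prefix %s\n' \
        "$MINIMAC3" \
        "refPanel.chr${1}.vcf" \
        "targetStudy.chr${1}.vcf" \
        "testRun.chr${1}"
}

# Print the commands for a few example chromosomes.
for chr in 20 21 22; do
    build_cmd "$chr"
done
```

Printing the commands first makes it easy to review them, or to hand them to a job scheduler, before any long-running imputation starts.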

Users can always type the following for further support:

 ../bin/Minimac3 --help

Reference Panels for Download

Some commonly used reference panels are available for download here:

Chr X Haplotypes for 1000 Genomes Phase 3 were updated on Oct 20 to also include multi-allelic variants (split into bi-allelic variants) !!!

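Reference Panel                   Number of Samples  File Format  Parameter Estimates Available  Chromosomes  Link
1000 Genomes Phase 3 (version 5)  2,504              VCF          -                              1-22,X       Download
1000 Genomes Phase 3 (version 5)  2,504              M3VCF        YES                            1-22,X       Download
1000 Genomes Phase 3 (version 5)  2,504              M3VCF        NO                             1-22,X       Download
1000 Genomes Phase 3 (version 5)  2,504              VCF,M3VCF    YES                            X            Download
1000 Genomes Phase 1 (version 3)  1,092              VCF          -                              1-22,X       Download
1000 Genomes Phase 1 (version 3)  1,092              M3VCF        YES                            1-22,X       Download
1000 Genomes Phase 1 (version 3)  1,092              M3VCF        NO                             1-22,X       Download
1000 Genomes Phase 1 (version 3)  1,092              VCF,M3VCF    YES                            X            Download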
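
The archive names behind the download links follow a regular pattern, so a small lookup can save retyping them. The sketch below builds the URLs from the link targets recorded in this page's wiki source; the server may no longer be reachable, and `panel_url` with its `PHASE`/`KIND` keys is a hypothetical helper introduced only for illustration.

```shell
#!/bin/sh
# Base location of the archives, as given by the download links in the table.
BASE_URL="ftp://share.sph.umich.edu/minimac3"

# panel_url PHASE KIND
#   PHASE: P3 (1000 Genomes Phase 3) or P1 (1000 Genomes Phase 1)
#   KIND:  vcf | m3vcf_est | m3vcf_noest | chrx
# Hypothetical helper: prints the archive URL, or fails on unknown input.
panel_url() {
    case "$1" in
        P1|P3) ;;
        *) echo "unknown phase: $1" >&2; return 1 ;;
    esac
    case "$2" in
        vcf)         echo "$BASE_URL/G1K_${1}_VCF_Files.tar.gz" ;;
        m3vcf_est)   echo "$BASE_URL/G1K_${1}_M3VCF_FILES_WITH_ESTIMATES.tar.gz" ;;
        m3vcf_noest) echo "$BASE_URL/G1K_${1}_M3VCF_FILES_NO_ESTIMATES.tar.gz" ;;
        chrx)        echo "$BASE_URL/G1K_${1}_CHR_X_VCF_M3VCF_FILES.tar.gz" ;;
        *) echo "unknown kind: $2" >&2; return 1 ;;
    esac
}

# Print the URL of the Phase 3 M3VCF archive that includes parameter estimates.
panel_url P3 m3vcf_est
```

To fetch and unpack an archive, the printed URL can be passed to `wget` and the resulting file to `tar -xzvf`, in the same way as the source download shown in the Usage section.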

Reference

If you use Minimac3, please cite:

Das S, Forer L, Schönherr S, Sidore C, Locke AE et al. Next-generation genotype imputation service and methods. Nature Genetics 48, 1284–1287 (2016). doi:10.1038/ng.3656

Contact

For queries or bug reports, please contact Sayantan Das.