Difference between revisions of "Minimac3"

From Genome Analysis Wiki
(Add category tag to improve findability)
 
(126 intermediate revisions by 2 users not shown)
= Introduction =
+
* '''New Version [[Minimac4]] available ! Please Check out !!!'''
  
'''Minimac3''' is a lower-memory and more computationally efficient implementation of [http://genome.sph.umich.edu/wiki/Minimac2 minimac2]. It is an algorithm for genotype imputation that works on phased genotypes (for example, from [http://genome.sph.umich.edu/wiki/MaCH MaCH]). Minimac3 is designed to handle very large reference panels efficiently with no loss of accuracy. The algorithm analyzes only the unique haplotypes in small genomic segments, which reduces both running time and memory use without reducing accuracy.
+
* '''Please join our NEW [https://groups.google.com/forum/embed/?place=forum/minimac4-help#!forum/minimac4-help mailing list] to get updates about future releases, bug fixes or post queries.'''
  
Minimac3, apart from performing imputation, also creates [[#M3VCF Files|<font face=Courier>M3VCF</font> files]] (customized minimac3 VCF files) which are able to store reference panel information in a compact form, thus saving on memory and time required to read large datasets. User will have an option to use the binary code to either just convert VCF files to <font face=Courier>M3VCF</font> files or to perform imputation as well. The code can also take a previously generated <font face=Courier>M3VCF</font> file as input for the reference panel. <font face=Courier>M3VCF</font> files can also store pre-calculated estimates of recombination fraction and error, which can be used for later runs of imputation.  The latest version of Minimac3 also allows output in the form of VCF files for easier data manipulation in downstream analysis.
+
* '''No further development on Minimac3 !!!''' See [[Minimac3 ChangeLog | ChangeLog ]] for details !!!
  
= Download =
+
= Useful Wiki Pages =
  
'''Minimac3 ''' is available as an undocumented release version. The source files and commonly used reference panels in <code>M3VCF</code> format will be available for download here. The authors would really appreciate if users could test it on their data set and let us know of possible bugs to be fixed.
+
There are a few pages in this Wiki that may be useful for '''Minimac3''' users. Here are links to a few:
  
You can either copy the directory from fantasia OR download it from the link below.
+
* [[Minimac3| Minimac3 Overview Page]]
  
* To copy from fantasia:
+
* [[Minimac3 Usage | Minimac3 Usage and Documentation]]
 
  cp -r /net/fantasia/home/sayantan/Softwares/Minimac3/ LocalDirectoryMinimac3/
 
  
* To Download
+
* [[Minimac3 Imputation Cookbook]] ('''Recommended for New Users!!''')
  
{| class="wikitable" border="1" cellpadding="2"
+
* [[Minimac3 Cookbook : Chromosome X Imputation| Chromosome X Imputation]]
|- bgcolor="lightgray"
 
! Description
 
! Download Link
 
|-
 
| Minimac3 Executable
 
| [[Media : Minimac3.v1.0.0.binary.tar.gz  | UNIX Users ]]
 
|-
 
| Minimac3-omp Executable (for parallel computing)
 
| [[Media : Minimac3.v1.0.0.binary-OMP.tar.gz | UNIX Users ]]
 
|-
 
| Minimac3 Source Files
 
| [[Media : Minimac3.v1.0.0.tar.gz  | UNIX Users ]]
 
|}
 
  
= Usage=
+
* [[Minimac3 Examples| Minimac3 Examples]]
  
Users can always type the following for further support:
+
* [[Minimac3 Info File| Minimac3 Info File]]
  
  ../bin/Minimac3 --help
+
* [[Minimac3 ChangeLog | Minimac3 ChangeLog ]]
  
A typical Minimac3 command line would have the following parameter options:
+
* [[M3VCF Files| M3VCF Files]]
  
Command Line Options:
+
= Introduction =
    Reference Haplotypes : --refHaps [], --passOnly
 
      Target Haplotypes : --haps []
 
      Output Parameters : --processReference, --prefix [Minimac3.Output],
 
                          --updateModel, --nobgzip, --doseOutput, --hapOutput,
 
                          --format [GT,DS]
 
      Subset Parameters : --chr [], --start, --end, --window
 
    Starting Parameters : --rec [], --err []
 
  Estimation Parameters : --rounds [5], --states [200]
 
        Other Parameters : --help, --cpus [1], --params
 
              PhoneHome : --noPhoneHome, --phoneHomeThinning [50]
 
  
 +
'''Minimac3''' is a lower-memory and more computationally efficient implementation of the genotype imputation algorithms in [[Minimac|minimac]] and [[Minimac2|minimac2]]. '''Minimac3''' is designed to handle very large reference panels efficiently with no loss of accuracy. It accomplishes this by identifying repeated haplotype patterns and using them to simplify the underlying calculations.
  
The most typically used parameter options are explained below with [[#Examples|examples]]. See subsection below for detailed [[#List of Options |list of available options]].
+
Minimac3 uses [[M3VCF Files|<font face=Courier>M3VCF</font> files]] (customized minimac3 VCF files) to store reference panel information in a compact form, thus saving on memory and time required to read large datasets. Users can use Minimac3 to convert standard VCF files to <font face=Courier>M3VCF</font> files. <font face=Courier>M3VCF</font> files can also store pre-calculated estimates of recombination fraction and error, which speeds up later rounds of imputation.  Minimac3 outputs results in the form of standard VCF files for easy data manipulation in downstream analysis.
  
==Reference Haplotypes==
+
= Download =
 
 
<font face=Courier>"--refHaps"</font> specifies the main reference input file, which can be either a VCF file or an <font face=Courier>M3VCF</font> file. No handle is needed to indicate the file type; the program detects it automatically.
 
 
 
Minimac3 accepts either VCF files or <font face=Courier>M3VCF</font> files as input for the reference panel. The program identifies the file type itself, so no handle is needed for that. <font face=Courier>M3VCF</font> files are customized files created by Minimac3 that store large reference panels in a compact form, saving the memory and computation time involved in reading large files. They must be generated in a previous run of Minimac3 and can then be saved and reused in later runs for faster loading of data. See the section on [[#M3VCF Files| <font face=Courier>M3VCF</font>]] files and the [[#Examples|examples]] below for how to use them.
 
 
 
 
 
 
 
==Target Haplotypes==
 
 
 
<font face=Courier>"--haps"</font> denotes the main input target file which has to be a VCF file (<font face=Courier>.vcf</font> or <font face=Courier>.vcf.gz</font>). The extensions are not mandatory.
 
 
 
Minimac3 accepts only VCF files as input for the target/GWAS data. Input VCF files are automatically assumed to be pre-phased. Markers that are in the target panel but NOT in the reference panel are excluded from the output files. Users must merge these extra markers back into the original data in order to analyze them. See the [[#Examples|examples]] below.
 
 
 
 
==Output Files==
 
 
 
<font face=Courier>"--prefix"</font> denotes the prefix for the output files (By default: <code>Minimac3.Output</code>)
 
 
 
Minimac3 can output files in both <code>VCF</code> format and <code>.dose</code> format (usual [http://genome.sph.umich.edu/wiki/Minimac minimac] output format). By default, Minimac3 will only output in <code>VCF</code> format and users must use the handle <code>--doseOutput</code> to output in <code>.dose</code> format or the handle <code>--hapOutput</code> to output dosage data in phased format. VCF files can store dosage data only in the following formats:
 
 
 
* '''DS''' : Estimated alternate allele dosage (default).
 
* '''GT''' : Estimated most likely genotype (default).
 
* '''GP''' : Estimated posterior genotype probabilities (use handle <code>--format GP</code>).
 
 
 
See [[#Examples|examples]] and the [[#List of Options|list of options]] below for further details.
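
As an illustration of the <code>DS</code> field described above, the following is a minimal sketch, not part of Minimac3, that pulls the dosage value of every sample out of a VCF stream. The function name <code>vcf_dosages</code> is hypothetical; the only assumption about the data is the standard VCF layout (tab-separated, key names in the 9th column, samples from the 10th column onward).

```shell
# Hypothetical helper (not shipped with Minimac3): print CHROM, POS and the
# DS value of every sample for each record of a VCF read from stdin or files.
vcf_dosages() {
    awk -F'\t' '
        /^#/ { next }                      # skip header lines
        {
            n = split($9, keys, ":")       # FORMAT keys, e.g. GT:DS
            ds = 0
            for (i = 1; i <= n; i++) if (keys[i] == "DS") ds = i
            if (ds == 0) next              # record carries no DS field
            line = $1 "\t" $2
            for (s = 10; s <= NF; s++) {
                split($s, vals, ":")       # per-sample values, same order as keys
                line = line "\t" vals[ds]
            }
            print line
        }' "$@"
}

# Small self-contained demonstration with one made-up record and two samples.
printf '6\t73924\trs1\tA\tG\t.\tPASS\t.\tGT:DS\t0|1:1.02\t0|0:0.01\n' | vcf_dosages
```

Since the output is bgzipped unless <code>--nobgzip</code> is given, a compressed file would first need to be decompressed (bgzipped files can be read with <code>gzip -dc</code>) and piped into the function.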
 
 
 
= Examples =
 
 
 
 
 
To look at the examples, the folder of '''Minimac3''' needs to be copied to the user's local directory first. Then move to the folder <font face=Courier>LocalDirectoryMinimac3/test/</font>
 
 
 
  cp -r /net/fantasia/home/sayantan/Softwares/Minimac3/ LocalDirectoryMinimac3/
 
  cd LocalDirectoryMinimac3/test/
 
 
 
The following example uses a VCF reference file <font face=Courier>[refPanel.vcf]</font> and a VCF target sample file <font face=Courier>[targetStudy.vcf]</font>
 
 
 
  ../bin/Minimac3 --refHaps refPanel.vcf --haps targetStudy.vcf --prefix testRun
 
 
 
The following example is the same as above but uses Minimac3-omp (which is implemented with OpenMP, enabling parallel computing).
 
 
 
  ../bin/Minimac3-omp --refHaps refPanel.vcf --haps targetStudy.vcf --prefix testRun --cpus 5
 
 
 
The following example only converts a VCF reference file into <font face=Courier>M3VCF</font>. It also estimates parameters from the reference panel using a leave-one-out method and saves them in the <font face=Courier>M3VCF</font> file. The parameter estimation can be skipped with "<font face=Courier>--rounds 0</font>". If the option "<font face=Courier>--processReference</font>" is ON, no imputation is done; the file is only compressed from VCF to <font face=Courier>M3VCF</font> format.
 
 
 
../bin/Minimac3 --refHaps refPanel.vcf --processReference --prefix testRun
 
 
 
The following example uses a <font face=Courier>M3VCF</font> file (which was created in the previous example) and VCF target sample files (<font face=Courier>targetStudy.vcf</font>) for imputation.
 
 
 
../bin/Minimac3 --refHaps testRun.m3vcf.gz --haps targetStudy.vcf --prefix testRun
 
 
 
[NOTE: In the example above, if <code>testRun.m3vcf.gz</code> had been created with <code>--rounds 0</code>, it would contain no parameter estimates. The program uses the saved estimates when they are available (as in the example above) and estimates the parameters itself when they are NOT available (as in the example below, where the file is created with <code>--rounds 0</code>).]
 
 
 
../bin/Minimac3 --refHaps refPanel.vcf --processReference --rounds 0 --prefix testRun
 
../bin/Minimac3 --refHaps testRun.m3vcf.gz --haps targetStudy.vcf --prefix testRun
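
The convert-once, reuse-later workflow shown in the examples above can be sketched as a small dry-run helper. This is only an illustration under stated assumptions: the function name <code>plan_imputation</code> is hypothetical, it only prints the commands rather than running them, and it relies on the <code>prefix.m3vcf.gz</code> file name used in the examples above.

```shell
# Hypothetical dry-run helper: print the Minimac3 commands that would be
# needed, converting the reference panel to M3VCF only when the compressed
# file is not already present in the current directory.
plan_imputation() {
    ref_vcf="$1"; target_vcf="$2"; prefix="$3"
    m3vcf="${prefix}.m3vcf.gz"             # name taken from the examples above
    if [ ! -f "$m3vcf" ]; then
        echo "../bin/Minimac3 --refHaps $ref_vcf --processReference --prefix $prefix"
    fi
    echo "../bin/Minimac3 --refHaps $m3vcf --haps $target_vcf --prefix $prefix"
}

plan_imputation refPanel.vcf targetStudy.vcf testRun
```

Printing the commands first makes it easy to check them before piping the output to a shell or a job scheduler.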
 
 
 
The following example also uses an <font face=Courier>M3VCF</font> reference file <font face=Courier>[testRun.m3vcf.gz]</font> and a VCF target sample file <font face=Courier>[targetStudy.vcf]</font>. However, it only analyzes chromosome 6 from position 505988 to 873131 (allowing a buffer of 100 bp on either side). It also outputs a phased haplotype file (using the <code>--hapOutput</code> option) and the usual dosage file (using the <code>--doseOutput</code> option).
 
 
 
../bin/Minimac3 --refHaps testRun.m3vcf.gz --chr 6 --start 505988 --end 873131 --window 100 --haps targetStudy.vcf --prefix testRun --hapOutput --doseOutput
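
When a region is too large to impute in one run, the <code>--chr</code>, <code>--start</code>, <code>--end</code> and <code>--window</code> options used above can be applied chunk by chunk. The following is a minimal sketch, assuming fixed-size chunks; the function name <code>chunk_cmds</code>, the chunk size and the output prefix pattern are illustrative choices, not Minimac3 conventions. The commands are only printed, not executed.

```shell
# Hypothetical dry-run helper: split a region into consecutive chunks and
# print one Minimac3 command per chunk.
# Usage: chunk_cmds CHR START END CHUNK_SIZE WINDOW
chunk_cmds() {
    chr="$1"; start="$2"; end="$3"; size="$4"; window="$5"
    s="$start"
    while [ "$s" -le "$end" ]; do
        e=$((s + size - 1))
        if [ "$e" -gt "$end" ]; then
            e="$end"                       # last chunk stops at the region end
        fi
        echo "../bin/Minimac3 --refHaps testRun.m3vcf.gz --haps targetStudy.vcf --chr $chr --start $s --end $e --window $window --prefix testRun.chr$chr.$s-$e"
        s=$((e + 1))
    done
}

# The region from the example above, in chunks of 200000 bp.
chunk_cmds 6 505988 873131 200000 100
```

Each chunk gets its own output prefix so that the results of different chunks do not overwrite each other.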
 
 
 
For examples on imputation of chromosome X, see [[#Chromosome X Imputation|Chromosome X Imputation]]
 
  
= Chromosome X Imputation =
+
'''Minimac3 ''' is currently available as a release version. Commonly used reference panels in <font face=Courier>M3VCF</font> format are available for download in [[#Reference Panels for Download | Reference Panels]].
  
Chromosome X has a pseudo-autosomal region (PAR) which can be imputed for males and females together. Imputing the PAR on chromosome X is the same as usual imputation, since both males and females are diploid at these sites. However, the non-pseudo-autosomal region needs to be imputed for males and females separately, as males are haploid while females are diploid. Of course, the PAR and non-PAR regions need to be imputed separately.
+
'''Please join our NEW [https://groups.google.com/forum/embed/?place=forum/minimac4-help#!forum/minimac4-help mailing list] to get updates about future releases or report possible bugs or email them to  [mailto:sayantan@umich.edu Sayantan Das].'''
  
The following example illustrates imputation on the non-PAR of chromosome X for males and females separately (files available in <code>Minimac3/test/</code> directory)
+
'''VERSION: 2.0.1 !!! (Updated 6.6.2016) !!!'''
  
Male Samples (Non-PAR)
+
'''Github Repo:''' Users can clone from github repository as well : [https://github.com/Santy-8128/Minimac3 Minimac3 Github]
  ../bin/Minimac3 --refHaps refPanelChrX.Non.Auto.vcf --haps targetStudyChrX.males.vcf --prefix testRun
 
  
Female Samples (Non-PAR)
+
'''Cloning from GitHub is recommended so that updates can be easily pulled !!!'''
  ../bin/Minimac3 --refHaps refPanelChrX.Non.Auto.vcf --haps targetStudyChrX.females.vcf --prefix testRun
 
  
NOTE: For imputing non-PAR of chromosome X, user must analyze male and female samples separately, otherwise program would crash. User should also ensure that the reference panel consists of only PAR or non-PAR region of chromosome X, otherwise program would crash.
+
{| class="wikitable"  style="text-align:center"  border="1" cellpadding="2"
 
 
= List of Options =
 
 
 
 
 
The following table gives a brief description of all the parameters of '''Minimac3'''. A detailed description will be available soon.
 
 
 
{| border="1" cellpadding="2"  
 
 
|- bgcolor="lightgray"
 
|- bgcolor="lightgray"
! scope="col" width="200px" | Parameter
+
! Description
! scope="col" width="1225px" | Description
+
! Download Link
 
|-  
 
|-  
| <code>--refHaps filename </code>
+
| Source Files
| VCF file or <code>M3VCF</code> file containing haplotype data for reference panel.
+
| [ftp://share.sph.umich.edu/minimac3/Minimac3.v2.0.1.tar.gz UNIX Users ]
 
|-  
 
|-  
| <code>--passOnly</code>  
+
| Binary Executable <sup>&#8224;</sup>  
| If ON, only variants with FILTER=PASS will be recorded from the reference VCF file (does NOT work on <code>M3VCF</code> files yet). 
+
| [ftp://share.sph.umich.edu/minimac3/Minimac3Executable.tar.gz UNIX Users ]
|-
 
| <code>--haps filename </code>
 
| File containing haplotype data for target (gwas) samples. Must be a VCF file.
 
|-
 
| <code>--processReference</code>
 
| This option only converts an input VCF file to M3VCF format (for example, for a later run of imputation). If this option is ON, no imputation is performed and all other parameters are ignored (except for the parameters on Reference Haplotypes and Subsetting Options). This option also estimates parameters from the reference panel and saves them in the M3VCF file (the estimation can be skipped with <code>--rounds 0</code>).
 
|-
 
| <code>--prefix output </code>
 
| Prefix for all output files generated. By default: <code>[Minimac3.Output]</code>
 
|- 
 
| <code>--updateModel</code>
 
| If ON, saved parameter estimates read from a M3VCF file will be further updated using the gwas samples. Ignored if the reference panel is a VCF file. [Default: OFF]
 
|-
 
| <code>--nobgzip</code>
 
| If ON, output files will NOT be bgzipped.
 
|-
 
| <code>--doseOutput</code>
 
| If ON, imputed data will be output as dosage file as well [Default: OFF].
 
|-
 
| <code>--hapOutput</code>
 
| If ON, phased imputed data will be output as well [Default: OFF].
 
|-
 
| <code>--format</code>
 
| Specifies which fields to output for the FORMAT field in output VCF file. Available handles: <code>GT,DS,GP </code>[Default: <code>GT,DS</code>].
 
|-
 
| <code>--chr 22</code>
 
| Chromosome number for which we will carry out imputation.
 
|-
 
| <code>--start 100000</code>
 
| Start position for imputation by chunking. Would not work without <code>--chr</code> option.
 
|-
 
| <code>--end 200000</code>
 
| End position for imputation by chunking. Would not work without <code>--chr</code> option.
 
|-
 
| <code>--window 5000</code>
 
| Length of buffer region on either side of <code>--start</code> and <code>--end</code>. By default = 0.
 
|-
 
| <code>--rec</code>
 
| Recombination File from previous run of Minimac/Minimac3. (<code>--err</code> parameter must also be provided, if using this handle)
 
|-
 
| <code>--err</code>
 
| Error File from previous run of Minimac/Minimac3. (<code>--rec</code> parameter must also be provided, if using this handle)
 
|-
 
| <code>--rounds 5</code>
 
| Rounds of optimization for model parameters, which describe population recombination rates and per SNP error rates. By default = 5.
 
|-
 
| <code>--states 200</code>
 
| Maximum number of reference (or target) haplotypes to be examined during parameter optimization. By default = 200.
 
|-
 
| <code>--help</code>
 
| A short help on options.
 
|-
 
| <code>--cpus 5</code>
 
| Number of cpus for parallel computing. Would work only with Minimac3-omp.
 
|-
 
| <code>--noPhoneHome</code>
 
| If ON, code will NOT send a SUCCESS/FAILURE status of the execution to home server.
 
|-
 
| <code>--phoneHomeThinning 50</code>
 
| Percentage probability of sending SUCCESS/FAILURE status of the execution to home server [Default: 50%]
 
 
|}
 
|}
  
= M3VCF Files =
+
'''<sup>&#8224;</sup>''' Binary executables are NOT guaranteed to run on every LINUX machine. Please compile from source files if you have trouble with the executable, or else contact the author [mailto:sayantan@umich.edu Sayantan Das].
  
<code>M3VCF</code> stands for "'''M'''inimac'''3''' '''VCF'''". These files store data on large reference panels compactly, thereby reducing the memory required. They are built on the same idea as the imputation method itself: in small genomic segments, the number of unique haplotypes is much smaller than the total number of haplotypes, so it is enough to store the unique representatives instead of all the haplotypes. <code>M3VCF</code> files are a convenient way to save large reference panels as compared to VCF files because:
+
= Usage=
* They require less space than VCF files. The compression ratio for a panel of 50K samples and 337K markers is '''~1200x''' (unzipped) and '''~4x''' (zipped).
 
* They are faster to read when importing data. The reference panel mentioned above was read '''20x''' faster when imported as an <code>M3VCF</code> file (as compared to a VCF file).
 
* They are already stored in a way that attains optimal computational complexity during imputation.
 
  
 +
Users who downloaded the source files should follow the steps below to compile '''Minimac3'''. Users who downloaded the binary executable can skip them.
  
<code>M3VCF</code> files ''somewhat'' follow the structure of a VCF file. An example is shown below. The first few lines are header lines and contain the number of haplotypes, the number of markers and the number of genomic segments. After these, each genomic segment is defined (denoted by <code><BLOCK:*-*></code>), followed by the markers contained in that segment (denoted by their original marker IDs). In the example below, a reference panel of 6 samples (12 haplotypes) and 8 markers is reduced to two genomic segments (<code><BLOCK:0-5></code> and <code><BLOCK:5-7></code>). The first block runs from marker 0 to 5 (6 variants) and the next one from 5 to 7 (3 variants). Note that two consecutive blocks must overlap at their common marker. In the example, the <code>INFO</code> column of each block line stores the number of markers in the segment (<code>VARIANTS</code>) and the number of unique haplotypes in that segment (<code>REPS</code>). The sample columns that follow give, for each haplotype, the index of the unique haplotype representative it matches in that genomic segment. The unique haplotypes themselves are stored in the rows that follow, one row per marker.
+
## DOWNLOAD, EXTRACT MINIMAC3 AND COMPILE
 +
&nbsp;
 +
wget ftp://share.sph.umich.edu/minimac3/Minimac3.v2.0.1.tar.gz
 +
tar -xzvf Minimac3.v2.0.1.tar.gz
 +
cd Minimac3/
 +
make
  
 +
A typical '''Minimac3''' command line for imputation is as follows
  
In the rows followed by the block identification, the details of the variants are stored (like in a usual VCF file) along with the unique haplotypes (under the <code>FORMAT</code> column). For the <code><BLOCK:0-5></code>, we have 4 unique haplotypes (given by the variable <code>REPS</code>) which are the four sub-columns (of 0's and 1's) under the <code>FORMAT</code> column. Similarly, the 2 unique haplotypes for <code><BLOCK:5-7></code> are shown in the <code>FORMAT</code> column for its three markers.
+
../bin/Minimac3 --refHaps refPanel.vcf \
 +
                --haps targetStudy.vcf \
 +
                --prefix testRun
  
 +
Here <font face=Courier>refPanel.vcf</font> is the reference panel used in VCF format (e.g. 1000 Genomes), <font face=Courier>targetStudy.vcf</font> is the phased GWAS data in VCF format, and <font face=Courier>testRun</font> is the prefix for the output files. Some commonly used reference panels are available for download in [[Minimac3 Imputation Cookbook#Reference Panels for Download| Reference Panels]]. See wiki page on [[Minimac3 Usage| Detailed Usage]] and [[Minimac3 Imputation Cookbook|Imputation Cookbook]] for further details on using '''Minimac3''' for imputation analysis.
 +
 +
Users can always type the following for further support:
  
##fileformat=M3VCF
+
   ../bin/Minimac3 --help
##version=1.1
 
##compression=block
 
##n_blocks=2
 
##n_haps=12
 
##n_markers=8
 
##<Note=This is NOT a VCF File and cannot be read by vcftools>
 
#CHROM  POS    ID              REF    ALT    QUAL    FILTER  INFO                FORMAT    A1    A2    B1    B2    C1    C2    D1    D2    E1    E2    F1    F2
 
6      73924   '''<BLOCK:0-5>'''    .      .      .      .      B1;'''VARIANTS'''=6;'''REPS'''=4 .        0    1    3    0    0    0    1    0    3    1    0    3
 
6      73924  chr6:73924:D    AAGAG  A      .      .      B1.M1;R=7;A=5        0000
 
6      89919  chr6:89919      T      G      .      .      B1.M;R=4;A=3        0100
 
6      89921  chr6:89921      C      T      .      .      B1.M3;R=2;A=4        0000
 
6      89932  chr6:89932      A      G      .      .      B1.M4;R=1;A=3        0000
 
6      89949  chr6:89949      G      A      .      .      B1.M5;R=3;A=1        0010
 
6      100116  chr6:100116    C      A      .      .      B1.M6;R=2;A=1        0001
 
6      100116  '''<BLOCK:5-7>'''    .      .      .      .      B2;'''VARIANTS'''=3;'''REPS'''=2  .        0    1    0    0    0    0    1    0    1    1    0    1
 
6      100116  chr6:100116    T      A      .      .      B1.M8;R=4;A=1        00
 
6      132285  chr6:132285    T      A      .      .      B1.M9;R=4;A=1        01
 
6      148689  chr6:148689    TAA    T      .      .      B1.M9;R=4;A=1        01
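
To make the block structure concrete, the following is a minimal sketch, not part of Minimac3, that lists each block of an uncompressed M3VCF stream together with its <code>VARIANTS</code> and <code>REPS</code> values. The function name <code>m3vcf_blocks</code> is hypothetical, and the sketch assumes the layout of the example above: whitespace-separated columns, the block label in the 3rd column and the <code>VARIANTS</code>/<code>REPS</code> pairs in the 8th column.

```shell
# Hypothetical helper: print "<BLOCK:a-b> VARIANTS REPS" for every block line
# of an M3VCF read from stdin or from the files given as arguments.
m3vcf_blocks() {
    awk '$3 ~ /^<BLOCK:/ {
        n = split($8, kv, ";")             # e.g. B1;VARIANTS=6;REPS=4
        variants = ""; reps = ""
        for (i = 1; i <= n; i++) {
            if (kv[i] ~ /^VARIANTS=/) variants = substr(kv[i], 10)
            if (kv[i] ~ /^REPS=/)     reps = substr(kv[i], 6)
        }
        print $3, variants, reps
    }' "$@"
}

# Demonstration on the first block line and first marker line of the example.
printf '%s\n' \
    '6 73924 <BLOCK:0-5> . . . . B1;VARIANTS=6;REPS=4 . 0 1 3 0 0 0 1 0 3 1 0 3' \
    '6 73924 chr6:73924:D AAGAG A . . B1.M1;R=7;A=5 0000' \
    | m3vcf_blocks
# prints: <BLOCK:0-5> 6 4
```

A compressed <code>.m3vcf.gz</code> file would need to be decompressed first, for example with <code>gzip -dc</code>, and piped into the function.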
 
  
 +
= Reference Panels for Download =
  
 +
Some commonly used reference panels are available for download here:
  
= Reference Panels for Download =
+
'''Chr X Haplotypes for 1000 Genomes Phase 3 have been updated on Oct 20 to include multi-allelic variants as well (split as bi-allelic variants) !!!'''
  
Some commonly used reference panels are available for download here. [NOTE: Chromosome X will be available soon]
+
{| class="wikitable" style="text-align:center" border="1" cellpadding="2"
 
 
{| class="wikitable" border="1" cellpadding="2"
 
 
|- bgcolor="lightgray"
 
|- bgcolor="lightgray"
! Reference Panel
+
! width="150px" |Reference Panel
! Format
+
! width="100px" |Number <br> of Samples
! Download Link
+
! width="100px" |File Format
! Internal CSG Copy Link
+
! width="100px" |Parameter <br>  Estimates <br> Available
 +
! width="120px" |Chromosomes
 +
! width="80px" |Link
 
|-  
 
|-  
| 1000 Genomes Phase 3
+
| rowspan=4 | '''1000 Genomes''' <br>
| VCF Files
+
'''Phase 3''' <br>
 +
(version 5)
 +
| rowspan=4  style="text-align:center" | '''2,504'''
 +
| '''VCF'''
 
| -
 
| -
| <code>/net/fantasia/home/sayantan/DATABASE/1000G/PHASE_3/FOR_UPLOAD/G1K_P3/VCF_Files/</code>
+
| 1-22,X
 +
| [ftp://share.sph.umich.edu/minimac3/G1K_P3_VCF_Files.tar.gz Download] <!--[ftp://share.sph.umich.edu/minimac3/G1K_P3_VCF_Files.tar.gz Download]-->
 +
|-
 +
| rowspan=2  style="text-align:center" | '''M3VCF'''
 +
| YES
 +
| 1-22,X
 +
|  [ftp://share.sph.umich.edu/minimac3/G1K_P3_M3VCF_FILES_WITH_ESTIMATES.tar.gz Download] <!-- [ftp://share.sph.umich.edu/minimac3/G1K_P3_M3VCF_FILES_WITH_ESTIMATES.tar.gz Download]-->
 +
|-
 +
|NO
 +
| 1-22,X
 +
|  [ftp://share.sph.umich.edu/minimac3/G1K_P3_M3VCF_FILES_NO_ESTIMATES.tar.gz Download] <!--[ftp://share.sph.umich.edu/minimac3/G1K_P3_M3VCF_FILES_NO_ESTIMATES.tar.gz Download]-->
 
|-  
 
|-  
| 1000 Genomes Phase 3
+
| '''VCF''','''M3VCF'''
| M3VCF Files (With Parameter Estimates)
+
| YES
| -
+
| X
| <code>/net/fantasia/home/sayantan/DATABASE/1000G/PHASE_3/FOR_UPLOAD/G1K_P3/M3VCF_Files_With_Estimates/</code>
+
| [ftp://share.sph.umich.edu/minimac3/G1K_P3_CHR_X_VCF_M3VCF_FILES.tar.gz Download] <!--[ftp://share.sph.umich.edu/minimac3/G1K_P3_CHR_X_VCF_M3VCF_FILES.tar.gz Download]-->
 
|-  
 
|-  
| 1000 Genomes Phase 3
+
| rowspan=4 |  '''1000 Genomes''' <br>
| M3VCF Files (Without Parameter Estimates)
+
'''Phase 1''' <br>
 +
(version 3)
 +
| rowspan=4  | '''1,092'''
 +
| '''VCF'''
 
| -
 
| -
| <code>/net/fantasia/home/sayantan/DATABASE/1000G/PHASE_3/FOR_UPLOAD/G1K_P3/M3VCF_Files_No_Estimates/</code>
+
| 1-22,X
 +
| [ftp://share.sph.umich.edu/minimac3/G1K_P1_VCF_Files.tar.gz Download]
 
|-  
 
|-  
| 1000 Genomes Phase 1
+
| rowspan=2  style="text-align:center" | '''M3VCF'''
| VCF Files
+
| YES
| -
+
| 1-22,X
| <code>/net/fantasia/home/sayantan/DATABASE/1000G/PHASE_1_V3/FOR_UPLOAD/G1K_P1/VCF_Files/</code>
+
| [ftp://share.sph.umich.edu/minimac3/G1K_P1_M3VCF_FILES_WITH_ESTIMATES.tar.gz Download]
 
|-  
 
|-  
| 1000 Genomes Phase 1
+
| NO
| M3VCF Files (With Parameter Estimates)
+
| 1-22,X
| -
+
| [ftp://share.sph.umich.edu/minimac3/G1K_P1_M3VCF_FILES_NO_ESTIMATES.tar.gz Download]
| <code>/net/fantasia/home/sayantan/DATABASE/1000G/PHASE_1_V3/FOR_UPLOAD/G1K_P1/M3VCF_Files_With_Estimates/</code>
 
 
|-  
 
|-  
| 1000 Genomes Phase 1
+
| '''VCF''','''M3VCF'''
| M3VCF Files (Without Parameter Estimates)
+
| YES
| -
+
| X
| <code>/net/fantasia/home/sayantan/DATABASE/1000G/PHASE_1_V3/FOR_UPLOAD/G1K_P1/M3VCF_Files_No_Estimates/</code>
+
| [ftp://share.sph.umich.edu/minimac3/G1K_P1_CHR_X_VCF_M3VCF_FILES.tar.gz Download] <!--[ftp://share.sph.umich.edu/minimac3/G1K_P1_CHR_X_VCF_M3VCF_FILES.tar.gz Download]-->
 +
|}
 +
 
 +
= Reference =
 +
 
 +
If you use [[minimac3]] please cite:
  
|}
+
''Das S, Forer L, Schönherr S, Sidore C, Locke AE'' et al. Next-generation genotype imputation service and methods. Nature Genetics 48, 1284–1287 (2016). doi:10.1038/ng.3656 [http://www.nature.com/ng/journal/v48/n10/full/ng.3656.html]
  
 
= Contact =
 
  
 
In case of any queries and bugs please contact [mailto:sayantan@umich.edu Sayantan Das].
 
 +
 +
[[Category:Software]]

Latest revision as of 15:40, 18 October 2022

  • New Version Minimac4 available ! Please Check out !!!
  • Please join our NEW mailing list to get updates about future releases, bug fixes or post queries.
  • No further development on Minimac3 !!! See ChangeLog for details !!!

Useful Wiki Pages

There are a few pages in this Wiki that may be useful for Minimac3 users. Here are links to a few:

Introduction

Minimac3 is a lower-memory and more computationally efficient implementation of the genotype imputation algorithms in minimac and minimac2. Minimac3 is designed to handle very large reference panels efficiently with no loss of accuracy. It accomplishes this by identifying repeated haplotype patterns and using them to simplify the underlying calculations.

Minimac3 uses M3VCF files (customized minimac3 VCF files) to store reference panel information in a compact form, thus saving on memory and time required to read large datasets. Users can use Minimac3 to convert standard VCF files to M3VCF files. M3VCF files can also store pre-calculated estimates of recombination fraction and error, which speeds up later rounds of imputation. Minimac3 outputs results in the form of standard VCF files for easy data manipulation in downstream analysis.

Download

Minimac3 is currently available as a release version. Commonly used reference panels in M3VCF format are available for download in Reference Panels.

Please join our NEW mailing list to get updates about future releases or report possible bugs or email them to Sayantan Das.

VERSION: 2.0.1 !!! (Updated 6.6.2016) !!!

Github Repo: Users can clone from github repository as well : Minimac3 Github

Cloning from GitHub is recommended so that updates can be easily pulled !!!

Description       | Download Link
Source Files      | UNIX Users
Binary Executable | UNIX Users

Binary executables are NOT guaranteed to run on every LINUX machine. Please compile from source files if you have trouble with the executable, or else contact the author Sayantan Das.

Usage

Users who downloaded the source files should follow the steps below to compile Minimac3. Users who downloaded the binary executable can skip them.

## DOWNLOAD, EXTRACT MINIMAC3 AND COMPILE
 
wget ftp://share.sph.umich.edu/minimac3/Minimac3.v2.0.1.tar.gz
tar -xzvf Minimac3.v2.0.1.tar.gz
cd Minimac3/
make

A typical Minimac3 command line for imputation is as follows

../bin/Minimac3 --refHaps refPanel.vcf \
                --haps targetStudy.vcf \
                --prefix testRun

Here refPanel.vcf is the reference panel used in VCF format (e.g. 1000 Genomes), targetStudy.vcf is the phased GWAS data in VCF format, and testRun is the prefix for the output files. Some commonly used reference panels are available for download in Reference Panels. See wiki page on Detailed Usage and Imputation Cookbook for further details on using Minimac3 for imputation analysis.
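
The typical command above can be repeated for several chromosomes with a small wrapper. The following is a minimal sketch under stated assumptions: the function name minimac3_cmd, the MINIMAC3 environment variable and the per-chromosome file names (refPanel.chrN.vcf, targetStudy.chrN.vcf) are hypothetical and only illustrate the pattern. The commands are printed, not executed.

```shell
# Hypothetical helper: print the typical Minimac3 command for one chromosome.
# MINIMAC3 may point to the executable; it defaults to ../bin/Minimac3.
minimac3_cmd() {
    chr="$1"
    printf '%s --refHaps %s --haps %s --prefix %s\n' \
        "${MINIMAC3:-../bin/Minimac3}" \
        "refPanel.chr${chr}.vcf" \
        "targetStudy.chr${chr}.vcf" \
        "testRun.chr${chr}"
}

# Dry run for three chromosomes; inspect the output before executing it.
for chr in 20 21 22; do
    minimac3_cmd "$chr"
done
```

Only the options shown in the typical command are used, so the wrapper stays valid for both the plain and the OpenMP executable.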

Users can always type the following for further support:

 ../bin/Minimac3 --help

Reference Panels for Download

Some commonly used reference panels are available for download here:

Chr X Haplotypes for 1000 Genomes Phase 3 have been updated on Oct 20 to include multi-allelic variants as well (split as bi-allelic variants) !!!

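Reference Panel                  | Number of Samples | File Format | Parameter Estimates Available | Chromosomes | Link
1000 Genomes Phase 3 (version 5) | 2,504             | VCF         | -                             | 1-22,X      | Download
1000 Genomes Phase 3 (version 5) | 2,504             | M3VCF       | YES                           | 1-22,X      | Download
1000 Genomes Phase 3 (version 5) | 2,504             | M3VCF       | NO                            | 1-22,X      | Download
1000 Genomes Phase 3 (version 5) | 2,504             | VCF,M3VCF   | YES                           | X           | Download
1000 Genomes Phase 1 (version 3) | 1,092             | VCF         | -                             | 1-22,X      | Download
1000 Genomes Phase 1 (version 3) | 1,092             | M3VCF       | YES                           | 1-22,X      | Download
1000 Genomes Phase 1 (version 3) | 1,092             | M3VCF       | NO                            | 1-22,X      | Download
1000 Genomes Phase 1 (version 3) | 1,092             | VCF,M3VCF   | YES                           | X           | Download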

Reference

If you use minimac3 please cite:

Das S, Forer L, Schönherr S, Sidore C, Locke AE et al. Next-generation genotype imputation service and methods. Nature Genetics 48, 1284–1287 (2016). doi:10.1038/ng.3656

Contact

In case of any queries and bugs please contact Sayantan Das.