Difference between revisions of "Minimac3 ChangeLog"

From Genome Analysis Wiki
Latest revision as of 22:24, 6 June 2016

Introduction

Minimac3 is a lower-memory and more computationally efficient implementation of minimac2. It is an algorithm for genotype imputation that works on phased genotypes (say from MaCH) and is designed to handle very large reference panels more efficiently with no loss of accuracy.

This wiki page gives users a summary of changes over the versions.

Change Log Summary

This section gives details of the changes over the different versions.


Minimac3 Changes:


Ver 2.0.1 (June 6, 2016)

Version : 2.0.1 Author : Sayantan Das Date : June 6, 2016

       Major Changes:
       1. Reduced memory usage significantly (by 35% for HRC panel)
       2. Reduced run time significantly (by 20% for HRC panel)
       3. Added a --lowMemory version that reduces the memory usage
          further by 33% (but takes 10% more time).

       Minor Changes:
       1. Fixed a minor bug in parameter estimation (--processReference)
       2. Added a draft.m3vcf file output after file compression and
          before parameter estimation.
       3. Changed definition of Estimated R-square so that no estimates
          are over 1.0.
       4. Added <contig> lines to the VCF header so that bcftools can read the output.
       5. Added FILTER column for typed sites and only typed sites.
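
As an illustration of minor change 4, the sketch below inserts `##contig` lines into a VCF header just before the `#CHROM` line. The function name `add_contig_lines` and the example header are hypothetical and are not taken from the Minimac3 source; only the `##contig=<ID=...>` line format comes from the VCF specification.

```python
def add_contig_lines(header_lines, contigs):
    """Return a copy of header_lines with one ##contig line per contig,
    inserted just before the #CHROM column-header line.

    Hypothetical helper for illustration; not Minimac3 code.
    """
    contig_lines = ["##contig=<ID={}>".format(c) for c in contigs]
    out = []
    inserted = False
    for line in header_lines:
        if line.startswith("#CHROM") and not inserted:
            out.extend(contig_lines)
            inserted = True
        out.append(line)
    if not inserted:
        # No #CHROM line found: append the contig lines at the end.
        out.extend(contig_lines)
    return out


header = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT",
]
for line in add_contig_lines(header, ["20"]):
    print(line)
```

Tools such as bcftools rely on these header lines to know which chromosomes a file may contain, which is presumably why the change was needed.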

Ver 1.0.14 (April 15, 2016)

Version : 1.0.14 Author : Sayantan Das Date : April 15, 2016

       Fixed a bug in the makefile that was missing the g++ -O4
       optimization flag. As a result, the openmp version was 
       3-4 times slower than expected. Updated LibStatGen Library
       files to remove some warnings/errors.

Ver 1.0.13 (October 15, 2015)

Version : 1.0.13 Author : Sayantan Das Date : October 5, 2015

        Updated the format of the info file to report only REF and ALT
        alleles instead of Major and Minor alleles. The ALT_Frq and
        MAF columns can be compared to find the minor allele. The usual
        .dose and .hapDose files have been updated to report alternate
        allele dosage and NOT major allele dosage as in earlier
        versions of minimac. Also mended a bug related to variants
        that were typed only, while using the --allTypedVariants option.
        Finally, added acknowledgements to David Hinds for his
        immense help in making the code faster.
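
The comparison of ALT_Frq and MAF mentioned above can be sketched as follows. This is a minimal illustration, not Minimac3 code: the function names, the tolerance `tol`, and the assumption that diploid dosages run from 0 to 2 (use `ploidy=1` for haplotype dosages) are all assumptions made for this example.

```python
def minor_allele(ref, alt, alt_frq, maf, tol=1e-6):
    """Pick the minor allele by comparing ALT_Frq with MAF.

    If ALT_Frq equals MAF (within tol), ALT is the minor allele;
    otherwise ALT_Frq is 1 - MAF and REF is the minor allele.
    At a frequency of exactly 0.5 this returns ALT.
    """
    if abs(alt_frq - maf) <= tol:
        return alt
    return ref


def major_allele_dosage(alt_dosage, alt_frq, ploidy=2):
    """Convert an alternate-allele dosage to the older major-allele
    convention: unchanged if ALT is the major allele, else ploidy - dosage.
    """
    if alt_frq > 0.5:
        return alt_dosage
    return ploidy - alt_dosage


print(minor_allele("A", "G", alt_frq=0.12, maf=0.12))
print(major_allele_dosage(0.25, alt_frq=0.12))
```

Scripts written for the older major-allele dosage files would need a conversion of this kind before reusing them on the newer output.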

Ver 1.0.12 (September 5, 2015)

Version : 1.0.12 Author : Sayantan Das Date : September 5, 2015

       Added the --allTypedVariants option, which adds variants that were
       typed ONLY to the imputed output files and imputes any missing
       values in such variants to the MAF. Also fixed a bug with
       male-only samples on chromosome X. Also added the --rsid option
       to use rs IDs of variants. Renamed RefMAF to RefAF in the info
       file since it gives the alternate AF and NOT the minor AF.

Ver 1.0.11 (April 21, 2015)

Version : 1.0.11 Author : Sayantan Das Date : April 21, 2015

       Implemented parallel threading for processReference. Moved all
       haplotype data from char vector to bool vector. Implemented
       David's patch and other similar division-removal techniques.
       Also added legacy parameter unphased for unphased output.

Ver 1.0.10 (February 27, 2015)

Version : 1.0.10 Author : Sayantan Das Date : February 27, 2015

       Added parameter --log, which will output a log file (BUT not print
       on the screen). Also added legacy parameter --myChromosome, which
       will enable users to use special chromosomes (apart from 1-22
       and X).


Ver 1.0.9 (February 20, 2015)

Version : 1.0.8, 1.0.9 Author : Sayantan Das Date : February 20, 2015

       Removed dependencies on the boost directory. Converted all double to
       float. Had to change the main imputation formula to work well for
       large values. Also updated the Rsq calculation formula to avoid
       underflow as a result of using float. Made minor changes to memory
       allocation so as to remove all 2-d or 3-d look-ups. Updated a loop
       to remove repeat allocation, similar to minimac2. Now it is faster
       and uses much less memory.

Ver 1.0.7 (January 31, 2015)

Version : 1.0.7 Author : Sayantan Das Date : January 31, 2015

       Fixed minor bugs and made things ready for final release.

Ver 1.0.6 (November 4, 2014)

Version : 1.0.6 Author : Sayantan Das Date : November 4, 2014

       Mended some minor bugs.

Ver 1.0.5 (October 4, 2014)

Version : 1.0.5 Author : Sayantan Das Date : October 3, 2014

       Implemented chromosome X imputation (both PAR and non-PAR). Also
       added bgzip option to enable tabix options.

Ver 1.0.4 (September 19, 2014)

Version : 1.0.4 Author : Sayantan Das Date : September 19, 2014

       Improved the chunking option for the compressed files. Updated name
       OPTM to M3VCF (Minimac3 VCF). Changed handle to --processReference.
       Added the option to add parameter estimates within M3VCF files, which
       can be used for later runs. Added a new handle --updateModel
       that will allow users to update these estimates using the target
       panel as well, if and when they think necessary.

Ver 1.0.2 (August 28, 2014)

Version : 1.0.2 Author : Sayantan Das Date : August 28, 2014

       Mended a bug related to the --vcfoutput option when used with the
       --gzip option. Also added an option --format to manage handles in
       the FORMAT field for output VCF files.

Ver 1.0.1 (August 6, 2014)

Version : 1.0.1 Author : Sayantan Das Date : August 6, 2014

       Added the --window option that will allow a buffer region around
       chunks. Also implemented the --start, --end and --window options
       on OPTM files. Minimac3 will thus be able to handle chunks on
       OPTM files as well. However, it will not perform the optimal
       allocation again but will just extract the chunk from the given file.
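
The relationship between --start, --end and --window can be sketched as below. This is only an illustration of a chunk with a flanking buffer; the function names are hypothetical, and how Minimac3 itself treats variants that fall inside the buffer is not described in this changelog.

```python
def buffered_region(start, end, window):
    """Return (lo, hi): the chunk [start, end] extended by `window` bases
    on each side, with the lower bound clamped at position 1."""
    if start > end:
        raise ValueError("start must not exceed end")
    return max(1, start - window), end + window


def split_by_region(positions, start, end, window):
    """Separate variant positions into the core chunk and the flanking buffer."""
    lo, hi = buffered_region(start, end, window)
    core = [p for p in positions if start <= p <= end]
    flank = [p for p in positions if lo <= p <= hi and not (start <= p <= end)]
    return core, flank


print(buffered_region(1000, 2000, 500))
print(split_by_region([400, 600, 1500, 2400, 2600], 1000, 2000, 500))
```

A buffer of this kind gives the imputation model haplotype context on both sides of the chunk boundaries.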

Ver 1.0.0 (July 21, 2014)

Version : 1.0.0 Author : Sayantan Das Date : July 21, 2014

       Added the --vcfOutput option that will output the dosage in VCF
       format as well. Improved it by flushing the VCF output to file
       after every 200 samples to save memory. Also implemented OpenMP
       to allow parallel computing.
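
The idea of flushing output after every 200 samples can be illustrated with a generic batching pattern. This is a sketch under assumptions, not the Minimac3 implementation: the function name `write_in_batches` is hypothetical, and the real program writes per-sample VCF data rather than the simple text lines used here.

```python
import io


def write_in_batches(sample_lines, out, batch_size=200):
    """Write lines to `out`, holding at most `batch_size` lines in memory.

    Returns the number of flushes performed.
    """
    batch = []
    flushes = 0
    for line in sample_lines:
        batch.append(line)
        if len(batch) == batch_size:
            out.write("".join(batch))
            batch.clear()
            flushes += 1
    if batch:
        # Write whatever is left over after the last full batch.
        out.write("".join(batch))
        flushes += 1
    return flushes


buf = io.StringIO()
n = write_in_batches(("sample{}\n".format(i) for i in range(450)), buf)
print(n)  # 3 flushes: 200 + 200 + 50
```

Peak memory then depends on the batch size rather than on the total number of samples.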

Ver 0.7.1 (June 8, 2014)

Version : 0.7.1 Author : Sayantan Das Date : June 8, 2014

       Added the --golden option that can calculate the actual R-square
       if the true/hard genotypes are provided in a VCF file. Made minor
       edits for debugging.
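
A minimal sketch of an "actual R-square" calculation for one variant follows. It assumes that the quantity is the squared Pearson correlation between imputed dosages and true genotypes, which is a common definition but is not spelled out in this changelog; the function name is hypothetical.

```python
def squared_correlation(dosages, genotypes):
    """Squared Pearson correlation between imputed dosages and true genotypes.

    Returns 0.0 when either input has no variation (correlation undefined).
    """
    n = len(dosages)
    if n == 0 or n != len(genotypes):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_x = sum(dosages) / n
    mean_y = sum(genotypes) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, genotypes))
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    syy = sum((y - mean_y) ** 2 for y in genotypes)
    if sxx == 0 or syy == 0:
        return 0.0
    return (sxy * sxy) / (sxx * syy)


# Imputed dosages versus true genotype counts for five samples.
print(squared_correlation([0.1, 0.9, 1.8, 0.2, 1.1], [0, 1, 2, 0, 1]))
```

Unlike the estimated R-square, this measure requires known genotypes, so it is mainly useful for benchmarking.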

Ver 0.6.2 (June 8, 2014)

Version : 0.6.2 Author : Sayantan Das Date : June 8, 2014

       Fixed a bug in the VCF packing part that was giving slightly less
       optimized configurations.

Ver 0.6.1 (June 7, 2014)

Version : 0.6.1 Author : Sayantan Das Date : June 7, 2014

       Worked on an important bug that gave memory overflow in large
       samples. Name changed to Minimac3. Added a utility for VCF output
       format that enables easy data manipulation for users.

Ver 0.5.1 (April 9, 2014)

Version : 0.5.1 Author : Sayantan Das Date : April 9, 2014

       Removed further bugs. Modified to use less memory and run faster.
       The Unique.cpp file was updated according to Goncalo's packVcf
       software. Target data is being read more efficiently. Scaffolding
       done locally during imputation. Changed name to Minimac2.

Ver 0.4.1 (March 14, 2014)

Version : 0.4.1 Author : Sayantan Das Date : March 14, 2014

       Removed bug with --sample. Added lots of other options like --chr,
       --start, --end, --rounds, --states etc. Added data structure for
       variant type. Added structure so as NOT to use haplotype data after
       reading it once. Now the code uses the reduced haplotype structure
       once it has read the data. This way it uses less memory.

Ver 0.2.1 (January 29, 2014)

Version : 0.2.1 Author : Sayantan Das Date : January 29, 2014

       Improved the speed of the parameter estimation part by NOT
       calculating the optimal allocation every time, and instead updating
       it from the original allocation. Updated to use --refSnps for
       reference and --vcfTarget for target files.
   

Ver 0.1.1 (May 25, 2013)

Version : 0.1.1 Author : Sayantan Das Date : May 25, 2013

       Started working on the basis of the minimac code. Implemented the
       State Space Reduction Method for HMM calculations. Implemented the
       Parameter Estimation part and Optimal Allocation part.

Download

Minimac3 is available as an undocumented release version. The source files (and binary executable) are available for download in Source Files, and commonly used reference panels in VCF and M3VCF formats are available for download in Reference Panels.

Useful Wiki Pages

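There are a few pages in this Wiki that may be useful for Minimac3 users. Here are links to a few:

* Minimac3 Overview Page
* Minimac3 Usage and Documentation
* Minimac3 - Full List of Options
* Minimac3 Imputation Cookbook (Recommended for New Users!!)
* Pre-Phasing
* Converting Files to VCF
* Minimac3 Examples
* M3VCF Files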

Contact

For any queries or bug reports, please contact Sayantan Das (sayantan@umich.edu).