Difference between revisions of "M3vcftools"

From Genome Analysis Wiki
 
(9 intermediate revisions by the same user not shown)
Line 5: Line 5:
 
= Useful Wiki Pages =
 
= Useful Wiki Pages =
  
There are a few pages in this Wiki that may be useful for '''Minimac3''' users. Here are links to a few:
+
There are a few pages in this Wiki that may be useful for '''m3vcftools''' users. Here are links to a few:
  
 
* [[M3vcftools| m3vcftools Overview Page]]
 
* [[M3vcftools| m3vcftools Overview Page]]
  
* [[M3vcftools Usage | m3vcftools Usage and Documentation]]
+
* [[M3vcftools Usage | m3vcftools Usage Options]]
  
 
* [[M3vcftools Examples| m3vcftools Examples]]
 
* [[M3vcftools Examples| m3vcftools Examples]]
Line 19: Line 19:
 
= Introduction =
 
= Introduction =
  
'''Minimac3''' is a lower-memory and more computationally efficient implementation of the genotype imputation algorithms in [[Minimac|minimac]] and [[Minimac2|minimac2]]. '''Minimac3''' is designed to handle very large reference panels with no loss of accuracy. It accomplishes this by identifying repeated haplotype patterns and using these to simplify the underlying calculations.
+
'''m3vcftools''' is a tool modeled on [http://vcftools.sourceforge.net/ '''vcftools'''] that runs much faster and can be used for simple data queries and basic summaries such as allele frequency (AF) and linkage disequilibrium (LD) r2 calculations. '''m3vcftools''' is ideal for mega reference panels like the [http://www.haplotype-reference-consortium.org/ '''Haplotype Reference Consortium (HRC)'''] with over 32,000 samples, where '''m3vcftools is 70-90 times faster than vcftools (18 mins vs 28 hours)''' in calculating AF or LD summaries. For panels like '''[[Minimac3#Reference Panels for Download | 1000 Genomes Phase 3]], m3vcftools is 12-15 times faster'''.
  
Minimac3 uses [[M3VCF Files|<font face=Courier>M3VCF</font> files]] (customized minimac3 VCF files) to store reference panel information in a compact form, thus saving on memory and time required to read large datasets. Users can use Minimac3 to convert standard VCF files to <font face=Courier>M3VCF</font> files. <font face=Courier>M3VCF</font> files can also store pre-calculated estimates of recombination fraction and error, which speeds up later rounds of imputation.  Minimac3 outputs results in the form of standard VCF files for easy data manipulation in downstream analysis.
+
The command line format for m3vcftools is intended to be exactly the same as that of vcftools, so users do not have to learn a new tool or change their existing pipelines. The only difference is that m3vcftools usually takes M3VCF files as input instead of VCF files. The tool itself can also convert a VCF file into M3VCF format, which can then be used for fast data queries and summary statistics calculations.
  
 
= Download =
 
= Download =
  
'''Minimac3 ''' is currently available as a release version. Commonly used reference panels in <font face=Courier>M3VCF</font> format are available for download in [[#Reference Panels for Download | Reference Panels]].  
+
'''m3vcftools''' is currently available as a pilot version. Please see our wiki page on [[M3vcftools Usage | m3vcftools Usage Options]] for more details on available options.  
  
'''Please join our NEW [https://groups.google.com/forum/embed/?place=forum/minimac3-help&umich.edu| mailing list] to get updates about future releases or report possible bugs or email them to  [mailto:sayantan@umich.edu Sayantan Das].'''
+
The software is under active development and new options will be added regularly. '''Please join our NEW [https://groups.google.com/forum/embed/?place=forum/m3vcftools-help&umich.edu mailing list]''' to get updates about future releases or to send us bug reports, remarks on documentation, '''or requests for new features that you might want early'''. Users can also email their queries/feedback to [mailto:sayantan@umich.edu Sayantan Das].
  
'''VERSION: 1.0.13 !!! (Updated 10.15.2015) !!!'''
+
'''VERSION: 1.0.1 !!! (Updated 10.25.2015) !!!'''
  
'''Github Repo:''' Users can clone from github repository as well : [https://github.com/Santy-8128/Minimac3 Minimac3 Github]  
+
'''GitHub Repo:''' Users can also clone from the GitHub repository: [https://github.com/Santy-8128/m3vcftools m3vcftools GitHub]
  
 
'''Cloning from GitHub is recommended so that updates can be easily pulled !!!'''
 
'''Cloning from GitHub is recommended so that updates can be easily pulled !!!'''
Line 40: Line 40:
 
! Download Link
 
! Download Link
 
|-  
 
|-  
| Minimac3 Source Files  
+
| m3vcftools Source Files  
| [ftp://share.sph.umich.edu/minimac3/Minimac3.v1.0.13.tar.gz UNIX Users ]
+
| [ftp://share.sph.umich.edu/minimac3/m3vcftools.v1.0.1.tar.gz UNIX Users ]
  
 
|}
 
|}
Line 47: Line 47:
 
= Usage=
 
= Usage=
  
Users should follow the following steps to compile '''Minimac3''' (if they downloaded the source files) or should skip them (if they downloaded the binary executable).
+
Users who downloaded the source files should follow these steps to compile '''m3vcftools''':
  
  ## EXTRACT MINIMAC3 AND COMPILE
+
  ## EXTRACT M3VCFTOOLS AND COMPILE
 
  &nbsp;
 
  &nbsp;
  tar -xzvf Minimac3.v1.0.13.tar.gz
+
  tar -xzvf m3vcftools.v1.0.1.tar.gz
  cd Minimac3/
+
  cd m3vcftools/
 
  make
 
  make
  
A typical '''Minimac3''' command line for imputation is as follows
+
For example, to calculate LD, the command is: <code>../bin/m3vcftools --vcf INPUT.m3vcf.gz --hap-r2 --ld-window-bp 10000 --min-r2 0.4 --out PREFIX</code>
  
../bin/Minimac3 --refHaps refPanel.vcf \
+
For example, to convert a VCF file into M3VCF format, the command is: <code>../bin/m3vcftools --vcf INPUT.vcf.gz --recode --out PREFIX</code>
                --haps targetStudy.vcf \
 
                --prefix testRun
 
 
 
Here <font face=Courier>refPanel.vcf</font> is the reference panel used in VCF format (e.g. 1000 Genomes), <font face=Courier>targetStudy.vcf</font> is the phased GWAS data in VCF format, and <font face=Courier>testRun</font> is the prefix for the output files. Some commonly used reference panels are available for download in [[Minimac3 Imputation Cookbook#Reference Panels for Download| Reference Panels]]. See wiki page on [[Minimac3 Usage| Detailed Usage]] and [[Minimac3 Imputation Cookbook|Imputation Cookbook]] for further details on using '''Minimac3''' for imputation analysis.
 
 
Users can always type the following for further support:
 
 
 
   /bin/Minimac3 --help
 
 
 
= Reference Panels for Download =
 
 
 
Some commonly used reference panels are available for download here:
 
 
 
{| class="wikitable" style="text-align:center" border="1" cellpadding="2"
 
|- bgcolor="lightgray"
 
! width="150px" |Reference Panel
 
! width="100px" |Number <br> of Samples
 
! width="100px" |File Format
 
! width="100px" |Parameter <br>  Estimates <br> Available
 
! width="120px" |Chromosomes
 
! width="80px" |Link
 
|-
 
| rowspan=4 | '''1000 Genomes''' <br>
 
'''Phase 3''' <br>
 
(version 5)
 
| rowspan=4  style="text-align:center" | '''2,504'''
 
| '''VCF'''
 
| -
 
| 1-22,X
 
| [ftp://share.sph.umich.edu/minimac3/G1K_P3_VCF_Files.tar.gz Download] <!--[ftp://share.sph.umich.edu/minimac3/G1K_P3_VCF_Files.tar.gz Download]-->
 
|-
 
| rowspan=2  style="text-align:center" | '''M3VCF'''
 
| YES
 
| 1-22,X
 
|  [ftp://share.sph.umich.edu/minimac3/G1K_P3_M3VCF_FILES_WITH_ESTIMATES.tar.gz Download] <!-- [ftp://share.sph.umich.edu/minimac3/G1K_P3_M3VCF_FILES_WITH_ESTIMATES.tar.gz Download]-->
 
|-
 
|NO
 
| 1-22,X
 
|  [ftp://share.sph.umich.edu/minimac3/G1K_P3_M3VCF_FILES_NO_ESTIMATES.tar.gz Download] <!--[ftp://share.sph.umich.edu/minimac3/G1K_P3_M3VCF_FILES_NO_ESTIMATES.tar.gz Download]-->
 
|-
 
| '''VCF''','''M3VCF'''
 
| YES
 
| X
 
|  [ftp://share.sph.umich.edu/minimac3/G1K_P3_CHR_X_VCF_M3VCF_FILES.tar.gz Download] <!--[ftp://share.sph.umich.edu/minimac3/G1K_P3_CHR_X_VCF_M3VCF_FILES.tar.gz Download]-->
 
|-
 
| rowspan=4 |  '''1000 Genomes''' <br>
 
'''Phase 1''' <br>
 
(version 3)
 
| rowspan=4  | '''1,092'''
 
| '''VCF'''
 
| -
 
| 1-22,X
 
| [ftp://share.sph.umich.edu/minimac3/G1K_P1_VCF_Files.tar.gz Download]
 
|-
 
|  rowspan=2  style="text-align:center" | '''M3VCF'''
 
| YES
 
| 1-22,X
 
| [ftp://share.sph.umich.edu/minimac3/G1K_P1_M3VCF_FILES_WITH_ESTIMATES.tar.gz Download]
 
|-  
 
| NO
 
| 1-22,X
 
| [ftp://share.sph.umich.edu/minimac3/G1K_P1_M3VCF_FILES_NO_ESTIMATES.tar.gz Download]
 
|-
 
| '''VCF''','''M3VCF'''
 
| YES
 
| X
 
|  [ftp://share.sph.umich.edu/minimac3/G1K_P1_CHR_X_VCF_M3VCF_FILES.tar.gz Download] <!--[ftp://share.sph.umich.edu/minimac3/G1K_P1_CHR_X_VCF_M3VCF_FILES.tar.gz Download]-->
 
|}
 
  
 
= Contact =
 
= Contact =
  
 
In case of any queries and bugs please contact [mailto:sayantan@umich.edu Sayantan Das].
 
In case of any queries and bugs please contact [mailto:sayantan@umich.edu Sayantan Das].

Latest revision as of 21:57, 10 July 2016

  • Download Pilot Version 1.0.1 !!! (Updated Oct 2015) !!! See ChangeLog for details !!!
  • Please join our NEW mailing list to get updates about future releases, bug fixes or post queries.

Useful Wiki Pages

There are a few pages in this Wiki that may be useful for m3vcftools users. Here are links to a few:

  • m3vcftools Overview Page
  • m3vcftools Usage Options
  • m3vcftools Examples

Introduction

m3vcftools is a tool modeled on vcftools that runs much faster and can be used for simple data queries and basic summaries such as allele frequency (AF) and linkage disequilibrium (LD) r2 calculations. m3vcftools is ideal for mega reference panels like the Haplotype Reference Consortium (HRC) with over 32,000 samples, where m3vcftools is 70-90 times faster than vcftools (18 mins vs 28 hours) in calculating AF or LD summaries. For panels like 1000 Genomes Phase 3, m3vcftools is 12-15 times faster.

The command line format for m3vcftools is intended to be exactly the same as that of vcftools, so users do not have to learn a new tool or change their existing pipelines. The only difference is that m3vcftools usually takes M3VCF files as input instead of VCF files. The tool itself can also convert a VCF file into M3VCF format, which can then be used for fast data queries and summary statistics calculations.

Download

m3vcftools is currently available as a pilot version. Please see our wiki page on m3vcftools Usage Options for more details on available options.

The software is under active development and new options will be added regularly. Please join our NEW mailing list to get updates about future releases or to send us bug reports, remarks on documentation, or requests for new features that you might want early. Users can also email their queries/feedback to Sayantan Das.

VERSION: 1.0.1 !!! (Updated 10.25.2015) !!!

GitHub Repo: Users can also clone from the GitHub repository: m3vcftools GitHub

Cloning from GitHub is recommended so that updates can be easily pulled !!!

Description               Download Link
m3vcftools Source Files   UNIX Users (ftp://share.sph.umich.edu/minimac3/m3vcftools.v1.0.1.tar.gz)

Usage

Users who downloaded the source files should follow these steps to compile m3vcftools:

## EXTRACT M3VCFTOOLS AND COMPILE

tar -xzvf m3vcftools.v1.0.1.tar.gz
cd m3vcftools/
make
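
Since cloning from GitHub is the recommended route, the same build can be sketched from a clone. This is only an illustration: it assumes the repository root builds with a plain make in the same way as the tarball, which this page does not confirm. DRY_RUN and the two function names are hypothetical helpers, not part of m3vcftools. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: build m3vcftools from a GitHub clone instead of the tarball.
# Assumption: the repository root has the same Makefile layout as the tarball.

REPO_URL="https://github.com/Santy-8128/m3vcftools"
DRY_RUN="${DRY_RUN:-1}"   # hypothetical switch: 1 = print only, 0 = execute

clone_and_build() {
    # $1 = target directory for the clone
    if [ "$DRY_RUN" = "0" ]; then
        git clone "$REPO_URL" "$1" && (cd "$1" && make)
    else
        echo "git clone $REPO_URL $1"
        echo "cd $1 && make"
    fi
}

update_and_rebuild() {
    # $1 = existing clone; pull upstream changes, then rebuild
    if [ "$DRY_RUN" = "0" ]; then
        (cd "$1" && git pull && make)
    else
        echo "cd $1 && git pull && make"
    fi
}

clone_and_build m3vcftools
update_and_rebuild m3vcftools
```

Setting DRY_RUN=0 runs the commands for real; later updates then only need the second function.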

For example, to calculate LD, the command is: ../bin/m3vcftools --vcf INPUT.m3vcf.gz --hap-r2 --ld-window-bp 10000 --min-r2 0.4 --out PREFIX

For example, to convert a VCF file into M3VCF format, the command is: ../bin/m3vcftools --vcf INPUT.vcf.gz --recode --out PREFIX
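
The two commands above are typically used together: convert a VCF once, then run repeated queries against the M3VCF file. The wrapper below is a rough sketch of that workflow using only the flags quoted on this page. The M3VCFTOOLS and DRY_RUN variables, the function names, and the file names are illustrative assumptions. In particular, this page does not state what the converted file is called, so INPUT.m3vcf.gz is a placeholder. By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch: convert a VCF to M3VCF once, then reuse it for LD queries.
# Only flags quoted on this page are used; everything else is illustrative.

M3VCFTOOLS="${M3VCFTOOLS:-../bin/m3vcftools}"   # assumed path to the binary
DRY_RUN="${DRY_RUN:-1}"                         # 1 = print only, 0 = execute

run() {
    # Print the command; execute it only when DRY_RUN is 0.
    echo "$*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

convert_to_m3vcf() {
    # $1 = input VCF, $2 = output prefix
    run "$M3VCFTOOLS" --vcf "$1" --recode --out "$2"
}

ld_summary() {
    # $1 = input M3VCF, $2 = output prefix, $3 = window in bp, $4 = minimum r2
    run "$M3VCFTOOLS" --vcf "$1" --hap-r2 --ld-window-bp "$3" --min-r2 "$4" --out "$2"
}

convert_to_m3vcf INPUT.vcf.gz PREFIX
# Placeholder input name: substitute the file actually written by the conversion step.
ld_summary INPUT.m3vcf.gz PREFIX.ld 10000 0.4
```

Keeping the window and r2 threshold as arguments makes it easy to rerun the LD query with different settings without repeating the conversion.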

Contact

For any queries or bug reports, please contact Sayantan Das.