DosageConvertor

From Genome Analysis Wiki
Revision as of 18:06, 11 July 2017

  • Download/Re-Clone Release Version 1.0.4 (Updated July 2017) !

Introduction

DosageConvertor is a C++ tool to convert dosage files (in VCF format) from Minimac3/4 to other formats such as MaCH or PLINK.

Download

VERSION: 1.0.4 (Updated 7.12.2017) !

[NOTE: Cloning from GitHub is recommended so that updates can be easily pulled later]

Description          Download Link
Github Repository    DosageConvertor - Github
Source Files         UNIX Users
Binary Executable    UNIX Users

Binary executables are NOT guaranteed to run on every Linux machine. If you have trouble with the executable, please compile from the source files or clone from the github repository. Otherwise, contact the author Sayantan Das.

Installation

If you downloaded the source files, follow these steps to compile DosageConvertor.

## DOWNLOAD, EXTRACT AND COMPILE DOSAGECONVERTOR
 
wget ftp://share.sph.umich.edu/minimac3/DosageConvertor/DosageConvertor.v1.0.3.tar.gz
tar -xzvf DosageConvertor.v1.0.3.tar.gz
cd DosageConvertor/
make


Usage

Convert to PLINK Files

The following command converts an input VCF dosage file to a PLINK dosage file that can be tested for association in PLINK1.9 or PLINK2.0.

# --info is optional, --type plink is the default, and --format also accepts 2 or 3
./DosageConvertor         --vcfDose      TestDataImputedVCF.dose.vcf.gz \
                          --info         TestDataImputedVCF.info \
                          --prefix       OutPrefix \
                          --type         plink \
                          --format       1

This command creates three files: OutPrefix.plink.dosage.gz, OutPrefix.fam and OutPrefix.map. The .fam and .map formats are described here. The --format parameter can take the values 1, 2 and 3, each corresponding to one of the three formats available for PLINK dosage files (details on PLINK dosage files are given here). Note that the generated OutPrefix.fam does NOT contain any phenotype information, so the phenotype column must be edited manually before PLINK can run association testing. OutPrefix.fam also does NOT contain sex information unless chromosome X is available; see Converting Chromosome X Files below for details.
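Because the phenotype column of OutPrefix.fam must be filled in before association testing, the bash/awk sketch below shows one possible approach. It assumes two things: a hypothetical whitespace-delimited file pheno.txt with the columns FID, IID and phenotype, and the standard six-column PLINK .fam layout, with the phenotype in column 6. add_phenotype is an illustrative helper and is not part of DosageConvertor.

```shell
# Hypothetical helper: fill column 6 (phenotype) of a PLINK .fam file.
#   $1 = phenotype file with columns FID IID PHENO (whitespace-delimited, assumed)
#   $2 = .fam file written by DosageConvertor
# Samples that are missing from the phenotype file keep their original value.
add_phenotype() {
  awk 'NR == FNR { pheno[$1 " " $2] = $3; next }   # first file: remember phenotypes
       { key = $1 " " $2
         if (key in pheno) $6 = pheno[key]          # second file: overwrite column 6
         print }' "$1" "$2"
}
```

Usage: add_phenotype pheno.txt OutPrefix.fam > OutPrefix.pheno.fam. Edited lines are rewritten with single spaces between fields, which PLINK accepts as whitespace. The NR == FNR idiom assumes the phenotype file is not empty.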

Convert to MaCH Files

The following command converts an input VCF dosage file to a MaCH/minimac dosage file (the format used by earlier versions of minimac). The generated dosage files can be tested for association using mach2dat.

# --info is optional, and --format also accepts 2
./DosageConvertor         --vcfDose      TestDataImputedVCF.dose.vcf.gz \
                          --info         TestDataImputedVCF.info \
                          --prefix       OutPrefix \
                          --type         mach \
                          --format       1

When --type mach is used, --format can only take the values 1 and 2.

With 1, the tool generates OutPrefix.mach.dose.gz and OutPrefix.info. OutPrefix.mach.dose.gz contains the expected alternate allele count (one value per sample per marker).

With 2, it generates OutPrefix.mach.gprob.gz and OutPrefix.info. OutPrefix.mach.gprob.gz contains the genotype probabilities for the reference homozygote and the heterozygote (two values per sample per marker).

Note that --info is optional. However, if no info file is provided, the output OutPrefix.info will have some empty columns. Thus, if available, the info file from the imputation run should be provided along with the VCF file as input.
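When the three genotype probabilities sum to one, the two MaCH outputs carry the same dosage information. The expected alternate allele count is P(RA) + 2*P(AA), which equals 2 - 2*P(RR) - P(RA), where RR is the reference homozygote and RA the heterozygote.

The bash/awk sketch below recomputes dose values from gprob pairs, for example to cross-check the two outputs. gprob_to_dose is a hypothetical helper. It assumes each pair is ordered as P(RR), P(RA), as described above.

```shell
# Hypothetical helper: convert (P(RR), P(RA)) pairs into expected alternate
# allele counts, using dose = 2 - 2*P(RR) - P(RA).
# Arguments: P(RR) P(RA) [P(RR) P(RA) ...], one pair per sample.
gprob_to_dose() {
  awk 'BEGIN {
    for (i = 1; i < ARGC; i += 2) {
      rr = ARGV[i]; ra = ARGV[i + 1]
      printf "%s%.3f", (i > 1 ? " " : ""), 2 - 2 * rr - ra
    }
    print ""
  }' "$@"
}
```

For example, gprob_to_dose 0.1 0.3 1 0 prints 1.500 0.000. Because the awk program has only a BEGIN block, awk does not try to open the arguments as input files.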

Converting Chromosome X Files

No extra parameter is necessary for converting chromosome X in the pseudo-autosomal regions (PAR). For the non-PAR region:

  • If your input VCF dosage file has males coded as diploids, just add the --samePoidy flag. This will not generate sex information in the output PLINK .fam file.
  • If your input VCF dosage file has males coded as haploids and also has GT information, the tool will automatically determine the sex of each sample and report it in the output .fam file.
  • If your input VCF file does NOT have GT information, or has males coded as diploids, and you would still like the sex column in the .fam file to be filled in, supply a sex file using --sexFile SomeFile. SomeFile has two columns: the sample names as found in the VCF file in the first column, and M or F in the second column.
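A --sexFile needs exactly two columns on each line: a sample name, then M or F. A quick check before conversion can catch malformed lines.

The sketch below assumes the file is whitespace-delimited. check_sex_file is a hypothetical helper, not a DosageConvertor option.

```shell
# Hypothetical helper: validate a sex file for --sexFile.
# Each line must have exactly two fields, and the second must be M or F.
# Bad lines are reported on stderr; the exit status is 1 if any line is bad.
check_sex_file() {
  awk 'NF != 2 || ($2 != "M" && $2 != "F") {
         printf "line %d: expected <sample> <M|F>, got: %s\n", NR, $0 > "/dev/stderr"
         bad = 1
       }
       END { exit (bad ? 1 : 0) }' "$1"
}
```

Usage: check_sex_file SomeFile && ./DosageConvertor ... --sexFile SomeFile.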

Command Line Options

The command options for DosageConvertor are explained below.

  • "--vcfDose" is a mandatory parameter specifying the input VCF file.
  • "--info" denotes the info file from the same imputation run. This parameter is optional, but if NO info file is provided, the output MaCH info file will have some missing columns.
  • "--prefix" denotes the output file prefix (default value: Converted.Dosage).
  • "--type" denotes the output file format (accepted values: mach (default) and plink).
  • "--format" decides whether to import imputed values from the dosage (DS) or genotype probability (GP) field of the input VCF file (accepted values: DS (default) and GP).
  • "--buffer" denotes the number of markers to import at a time (valid only for MaCH format; default value 10000).
  • "--idDelimiter" denotes the delimiter used to split each VCF sample ID into FID and IID for PLINK format (default value _).
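To inspect the values that --format DS refers to, the sketch below pulls the DS value for every sample out of a VCF dosage file. It follows the VCF convention that column 9 holds the colon-separated FORMAT keys and that sample columns start at column 10.

extract_ds is a hypothetical helper for looking at input files. It is not how DosageConvertor parses VCFs internally.

```shell
# Hypothetical helper: read a VCF on stdin and print, for each marker,
# the ID column followed by the DS value of every sample (tab-separated).
extract_ds() {
  awk -F'\t' '
    /^#/ { next }                        # skip meta-information and header lines
    {
      n = split($9, keys, ":")           # FORMAT column, e.g. GT:DS:GP
      ds = 0
      for (i = 1; i <= n; i++) if (keys[i] == "DS") ds = i
      if (!ds) next                      # marker without a DS field
      out = $3
      for (s = 10; s <= NF; s++) {       # one column per sample
        split($s, vals, ":")
        out = out "\t" vals[ds]
      }
      print out
    }'
}
```

Usage: zcat TestDataImputedVCF.dose.vcf.gz | extract_ds | head.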
Usage: ./DosageConvertor  --vcfDose      TestDataImputedVCF.dose.vcf.gz
                          --info         TestDataImputedVCF.info
                          --prefix       OutputFilePrefix
                          --type         plink OR mach   // depending on output format
                          --format       DS or GP        // based on if you want to output
                                                         // dosage (DS) or genotype prob (GP)
                          --buffer       10000           // Number of Markers to import and
                                                         // print at a time (valid only for
                                                         // MaCH format)
                          --idDelimiter  _               // Delimiter to Split VCF Sample ID into
                                                         // FID and IID for PLINK format
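The --idDelimiter option splits each VCF sample ID into the PLINK FID and IID. The exact splitting rule DosageConvertor applies is not stated on this page.

The bash sketch below makes two assumptions. First, the ID is split at the first occurrence of the delimiter. Second, an ID without the delimiter is used as both FID and IID. split_sample_id is a hypothetical helper.

```shell
# Hypothetical helper: split a VCF sample ID into FID and IID (tab-separated).
#   $1 = sample ID, $2 = delimiter (defaults to "_", like --idDelimiter)
split_sample_id() {
  local id="$1" delim="${2:-_}"
  if [[ "$id" == *"$delim"* ]]; then
    # Assumption: FID = text before the first delimiter, IID = the rest
    printf '%s\t%s\n' "${id%%"$delim"*}" "${id#*"$delim"}"
  else
    # Assumption: without a delimiter, reuse the whole ID for both fields
    printf '%s\t%s\n' "$id" "$id"
  fi
}
```

For example, split_sample_id FAM1_IND_1 gives FAM1 and IND_1.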

Contact

For questions or bug reports, please contact Sayantan Das.