DosageConvertor
- Download/Re-Clone Release Version 1.0.4 (Updated July 2017) !
Introduction
DosageConvertor is a C++ tool to convert dosage files (in VCF format) from Minimac3/4 to other formats such as MaCH or PLINK.
Download
VERSION: 1.0.4 (Updated 7.12.2017) !
[NOTE: Cloning from GitHub is recommended so that updates can easily be pulled.]
Description | Download Link |
---|---|
GitHub Repository | |
Source Files | |
Binary Executable † (Ubuntu 4.8.4) | |
† Binary executables are NOT guaranteed to run on every Linux machine. If you have trouble with the executable, please compile from the source files or clone the GitHub repository. Otherwise, contact the author, Sayantan Das.
Installation
If you downloaded the source files, compile DosageConvertor with the following steps.
```shell
## EXTRACT DOSAGECONVERTOR AND COMPILE
wget ftp://share.sph.umich.edu/minimac3/DosageConvertor/DosageConvertor.v1.0.3.tar.gz
tar -xzvf DosageConvertor.v1.0.3.tar.gz
cd DosageConvertor/
make
```
Usage
Convert to PLINK Files
The following command converts an input VCF dosage file to a PLINK dosage file that can be tested for association in PLINK 1.9 or PLINK 2.0.
```shell
# --info is optional; --type plink is the default; --format accepts 1, 2 or 3
./DosageConvertor --vcfDose TestDataImputedVCF.dose.vcf.gz \
                  --info TestDataImputedVCF.info \
                  --prefix OutPrefix \
                  --type plink \
                  --format 1
```
This command creates three files: OutPrefix.plink.dosage.gz, OutPrefix.fam and OutPrefix.map. The .fam and .map formats are described in the PLINK documentation. The --format parameter can take the values 1, 2 and 3, which correspond to the three formats available for PLINK dosage files (see the PLINK documentation on dosage data for details). Note that the generated OutPrefix.fam does NOT contain any phenotype information, which must be added manually before PLINK can run association testing. It also does NOT contain sex information unless chromosome X is available; see Converting Chromosome X Files below for details.
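By default, PLINK takes each sample's phenotype from the sixth column of the .fam file. The shell sketch below shows one way to fill that column after conversion. It assumes a hypothetical whitespace-separated phenotype file (one "IID value" pair per line) and that OutPrefix.fam uses the standard PLINK column order (FID, IID, paternal ID, maternal ID, sex, phenotype). fill_fam_pheno is an illustrative helper, not part of DosageConvertor.

```shell
#!/bin/sh
# Hypothetical helper (not part of DosageConvertor): copy a PLINK .fam file,
# replacing column 6 (phenotype) with values looked up by IID (column 2).
# Assumed phenotype file format: whitespace-separated "IID value" lines.
# Samples absent from the phenotype file get PLINK's missing code, -9.
fill_fam_pheno() {
  fam=$1      # input .fam, e.g. OutPrefix.fam
  pheno=$2    # hypothetical phenotype file, e.g. pheno.txt
  out=$3      # output .fam with the phenotype column filled in
  awk 'FILENAME == ARGV[1] { ph[$1] = $2; next }   # first file: IID -> value
       { $6 = (($2 in ph) ? ph[$2] : -9); print }' \
      "$pheno" "$fam" > "$out"
}
```

For example, fill_fam_pheno OutPrefix.fam pheno.txt OutPrefix.pheno.fam writes a filled-in copy and leaves the original untouched. Because awk rebuilds every line it modifies, the output is single-space separated, which PLINK accepts; if a line has only five columns, assigning column 6 simply appends it.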
Convert to MaCH Files
The following command converts an input VCF dosage file to a MaCH/minimac dosage file (the format produced by earlier versions of minimac). The generated dosage files can be tested for association using mach2dat.
```shell
# --info is optional; with --type mach, --format accepts 1 or 2
./DosageConvertor --vcfDose TestDataImputedVCF.dose.vcf.gz \
                  --info TestDataImputedVCF.info \
                  --prefix OutPrefix \
                  --type mach \
                  --format 1
```
When --type mach is used, the --format parameter can only take the values 1 and 2. With 1, the tool generates OutPrefix.mach.dose.gz and OutPrefix.info, where OutPrefix.mach.dose.gz contains the expected alternate allele count (one value per sample per marker). With 2, it generates OutPrefix.mach.gprob.gz and OutPrefix.info, where OutPrefix.mach.gprob.gz contains the genotype probabilities for the reference homozygote and the heterozygote (two values per sample per marker). Note that the input --info file is NOT mandatory. However, if it is NOT provided, the output OutPrefix.info file will have some empty columns, so if the info file generated by the imputation is available, provide it along with the VCF file.
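The two MaCH outputs are related. Writing P(RR) and P(RA) for the two probabilities in the .gprob file, and assuming the three genotype probabilities sum to 1 so that P(AA) = 1 - P(RR) - P(RA), the expected alternate allele count is P(RA) + 2 * P(AA). The sketch below works this out for a single probability pair; gprob_to_dose is a hypothetical helper for illustration, not how DosageConvertor itself computes dosages.

```shell
#!/bin/sh
# Hypothetical helper: expected alternate-allele count from a MaCH-style
# probability pair, assuming P(RR) + P(RA) + P(AA) = 1.
#   $1 = P(RR), reference homozygote
#   $2 = P(RA), heterozygote
gprob_to_dose() {
  awk -v prr="$1" -v pra="$2" 'BEGIN {
    paa = 1 - prr - pra                # implied alternate-homozygote probability
    printf "%.3f\n", pra + 2 * paa     # E[alt count] = 0*P(RR) + 1*P(RA) + 2*P(AA)
  }'
}

gprob_to_dose 0.1 0.3   # prints 1.500
gprob_to_dose 1 0       # prints 0.000
```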
Converting Chromosome X Files
For a minimac3/4 output file containing the pseudo-autosomal region (PAR) on chromosome X, no extra parameter is necessary. For files containing the non-PAR region, please ensure the following:
- If your input VCF dosage file has males as diploids, just add the handle --samePoidy. This will NOT generate sex information in the output PLINK .fam file.
  - If you still need the sex column in the .fam file to be correctly updated, supply a sex file using --sexFile SomeFile, where SomeFile has two columns: the first contains the sample names as they appear in the VCF file, and the second contains M (for males) or F (for females).
- If your input VCF dosage file has males as haploids and also has GT information, the tool will automatically determine the sex of the samples and report it in the output .fam file. No extra parameters are required.
  - If GT tags are NOT available, you need to supply the sex file described above; otherwise the tool may crash.
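The sex file passed to --sexFile is a plain two-column file. The sketch below writes one for two hypothetical samples and checks that every line has exactly two fields with M or F in the second. It assumes whitespace-separated columns; check_sex_file is an illustrative helper, not a DosageConvertor feature, and the sample names must match the VCF header exactly.

```shell
#!/bin/sh
# Write a sex file for --sexFile. SAMPLE1/SAMPLE2 are hypothetical names;
# they must match the sample IDs in the VCF header exactly.
# Assumption: columns are whitespace-separated (space or tab).
cat > sex.txt <<'EOF'
SAMPLE1 M
SAMPLE2 F
EOF

# Hypothetical helper: exit with status 1 if any line does not have exactly
# two fields with M or F in the second field; report offending lines on stderr.
check_sex_file() {
  awk 'NF != 2 || ($2 != "M" && $2 != "F") {
         bad = 1
         print "line " NR ": " $0 > "/dev/stderr"
       }
       END { exit bad }' "$1"
}

check_sex_file sex.txt && echo "sex.txt looks valid"
```

Once the check passes, pass the file with --sexFile sex.txt alongside the other options.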
Command Line Options
The command-line options for DosageConvertor are explained below.
- "--vcfDose" is a mandatory parameter specifying the input VCF file.
- "--info" denotes the info file from the same imputation output. This parameter is NOT mandatory, but if NO info file is provided, the output MaCH info file will have some missing columns.
- "--prefix" denotes the output file prefix (default value: Converted.Dosage).
- "--type" denotes the output file format (available handles: mach (default) and plink).
- "--format" decides whether to import imputed values from the dosage (DS) or the genotype probabilities (GP) of the input VCF file (available handles: DS (default) and GP).
- "--buffer" denotes the number of markers to import at a time (valid only for MaCH format; default value: 10000).
- "--idDelimiter" denotes the delimiter used to split the VCF sample ID into FID and IID for PLINK format (default value: _).
```
Usage: ./DosageConvertor --vcfDose     TestDataImputedVCF.dose.vcf.gz
                         --info        TestDataImputedVCF.info
                         --prefix      OutputFilePrefix
                         --type        plink OR mach  // depending on output format
                         --format      DS or GP       // based on if you want to output
                                                      // dosage (DS) or genotype prob (GP)
                         --buffer      10000          // Number of Markers to import and
                                                      // print at a time (valid only for
                                                      // MaCH format)
                         --idDelimiter _              // Delimiter to Split VCF Sample ID into
                                                      // FID and IID for PLINK format
```
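--idDelimiter controls how each VCF sample ID is split into the PLINK FID and IID. This page does not say which occurrence of the delimiter is used or what happens when it is absent, so the sketch below assumes a split at the first occurrence and falls back to using the whole ID for both fields. split_sample_id is a hypothetical helper for previewing IDs before conversion, not DosageConvertor's own code.

```shell
#!/bin/sh
# Hypothetical preview of an FID/IID split (assumed behaviour, not
# DosageConvertor's implementation):
#   - split at the FIRST occurrence of the delimiter (default "_")
#   - if the delimiter is absent, use the whole ID as both FID and IID
split_sample_id() {
  id=$1
  delim=${2:-_}
  case $id in
    *"$delim"*)
      fid=${id%%"$delim"*}   # text before the first delimiter
      iid=${id#*"$delim"}    # text after the first delimiter
      ;;
    *)
      fid=$id
      iid=$id
      ;;
  esac
  printf '%s %s\n' "$fid" "$iid"
}

split_sample_id FAM01_IND07      # prints: FAM01 IND07
split_sample_id FAM01-IND07 -    # prints: FAM01 IND07
split_sample_id A_B_C            # prints: A B_C
```

If your sample IDs contain the delimiter more than once, check the actual FID and IID columns in the generated .fam file rather than relying on this assumed rule.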
Contact
For any queries or bug reports, please contact Sayantan Das.