Changes

378 bytes removed ,  14:01, 18 January 2019
no edit summary
Line 4: Line 4:  
= Introduction =
 
= Introduction =
 
DosageConvertor is a C++ tool to convert dosage files (in VCF format) from [[Minimac4| Minimac3/4]] to other formats such as MaCH or PLINK.
 
DosageConvertor is a C++ tool to convert dosage files (in VCF format) from [[Minimac4| Minimac3/4]] to other formats such as MaCH or PLINK.
 +
 +
[Please note that this tool CANNOT handle missing values in the input files and may NOT work appropriately for non-Minimac3/4 VCF files]
 +
    
= Download =
 
= Download =
Line 19: Line 22:  
|  
 
|  
 
[https://github.com/Santy-8128/DosageConvertor DosageConvertor - Github]  
 
[https://github.com/Santy-8128/DosageConvertor DosageConvertor - Github]  
|-
  −
| Source Files
  −
|
  −
[ftp://share.sph.umich.edu/minimac3/DosageConvertor/DosageConvertor.v1.0.3.tar.gz  UNIX Users ]
  −
|-
  −
| Binary Executable <sup>&#8224;</sup>
  −
(Ubuntu 4.8.4)
  −
|
  −
[ftp://share.sph.umich.edu/minimac3/DosageConvertor/DosageConvertorExecutable.tar.gz  UNIX Users ]
      
|}
 
|}
  −
'''<sup>&#8224;</sup>''' Binary executables are NOT guaranteed to run on every LINUX machine. Please compile from source files if you have trouble with the executable, or clone from the [https://github.com/Santy-8128/DosageConvertor github repository]. Else contact the author [mailto:sayantan@umich.edu Sayantan Das].
      
= Installation =
 
= Installation =
Line 39: Line 31:  
  ## EXTRACT M3VCFTOOLS AND COMPILE
 
  ## EXTRACT M3VCFTOOLS AND COMPILE
 
  &nbsp;
 
  &nbsp;
  wget ftp://share.sph.umich.edu/minimac3/DosageConvertor/DosageConvertor.v1.0.3.tar.gz
+
  git clone https://github.com/Santy-8128/DosageConvertor
tar -xzvf DosageConvertor.v1.0.3.tar.gz
   
  cd DosageConvertor/
 
  cd DosageConvertor/
 
  make
 
  make
Line 55: Line 46:  
                           --format      1                                (or 2,3)
 
                           --format      1                                (or 2,3)
   −
This command will create three files :  <code> OutPrefix.plink.dosage.gz, OutPrefix.fam, OutPrefix.map</code>. The <code>.fam</code> and <code>.map</code> formats are described [http://zzz.bwh.harvard.edu/plink/data.shtml#map here]. The <code>--format</code> parameter can take values 1, 2 and 3. Each of these values correspond to the three different formats available for PLINK dosage files (details on PLINK dosage files are given [http://www.cog-genomics.org/plink/1.9/assoc#dosage here]). Note that the generated <code>OutPrefix.map</code> does NOT contain any phenotype information (which will need to be manually edited before PLINK can perform association tests). The <code>OutPrefix.fam</code> will NOT contain sex information unless chromosome X is available. See [[#Converting Chromosome X Files | Converting Chromosome X Files]] for details.
+
This command will create three files: <code>OutPrefix.plink.dosage.gz</code>, <code>OutPrefix.fam</code>, and <code>OutPrefix.map</code>. The <code>.fam</code> and <code>.map</code> formats are described [http://zzz.bwh.harvard.edu/plink/data.shtml#map here]. The <code>--format</code> parameter can take values 1, 2, or 3. Each value corresponds to one of the three PLINK dosage file formats (details on PLINK dosage files are given [http://www.cog-genomics.org/plink/1.9/assoc#dosage here]). Note that the generated <code>OutPrefix.map</code> does NOT contain any phenotype information (which will need to be manually edited before PLINK can perform association tests). The <code>OutPrefix.fam</code> will NOT contain sex information unless chromosome X is available. See [[#Converting Chromosome X Files | Converting Chromosome X Files]] for details.
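As a quick sanity check after a PLINK conversion, the three output names listed above can be derived from the prefix and tested for existence. The sketch below is not part of DosageConvertor; the function names <code>expected_plink_outputs</code> and <code>missing_outputs</code> are hypothetical helpers, and the only facts taken from this page are the three file suffixes.

```python
from pathlib import Path


def expected_plink_outputs(prefix):
    """Return the three paths this page says a PLINK conversion creates."""
    # Plain string concatenation is used because a prefix may itself
    # contain dots (for example the default "Converted.Dosage").
    suffixes = (".plink.dosage.gz", ".fam", ".map")
    return [Path(prefix + suffix) for suffix in suffixes]


def missing_outputs(prefix):
    """Return the expected output paths that do not exist on disk."""
    return [path for path in expected_plink_outputs(prefix) if not path.exists()]
```

Calling <code>missing_outputs("OutPrefix")</code> after the conversion should return an empty list if all three files were written.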
    
== Convert to MaCH Files ==
 
== Convert to MaCH Files ==
   −
The following command will convert an input VCF dosage file to a MaCH/minimac dosage file (the format for previous versions of [[Minimac | minimac]]). The generated dosage files can be tested for association using [http://genome.sph.umich.edu/wiki/Mach2dat:_Association_with_MACH_output mach2dat].  
+
The following command will convert an input VCF dosage file to a MaCH/minimac dosage file (the format for previous versions of [[Minimac | minimac]]). The generated dosage files can be tested for association using tools like [http://genome.sph.umich.edu/wiki/Mach2dat:_Association_with_MACH_output mach2dat] or [http://www.genabel.org/packages/ProbABEL ProbABEL].  
    
  ./DosageConvertor        --vcfDose      TestDataImputedVCF.dose.vcf.gz
 
  ./DosageConvertor        --vcfDose      TestDataImputedVCF.dose.vcf.gz
Line 67: Line 58:  
                           --format      1                              (or 2)
 
                           --format      1                              (or 2)
   −
When <code>--type mach</code> is used, the <code>--format</code> parameter can only take values 1 and 2.  
+
When <code>--type mach</code> is used, the <code>--format</code> parameter can only take values 1 or 2.  
    
*If the value is 1, the code generates <code>OutPrefix.mach.dose.gz</code> and <code>OutPrefix.info</code>, where <code>OutPrefix.mach.dose.gz</code> contains the expected alternate allele count (one value per sample per marker).  
 
*If the value is 1, the code generates <code>OutPrefix.mach.dose.gz</code> and <code>OutPrefix.info</code>, where <code>OutPrefix.mach.dose.gz</code> contains the expected alternate allele count (one value per sample per marker).  
Line 76: Line 67:  
== Converting Chromosome X Files ==
 
== Converting Chromosome X Files ==
   −
For a minimac3/4 output file containing the pseudo-autosomal region (PAR) on chromosome X, no extra parameter is necessary. For files containing the non-PAR region, please ensure the following:
+
For conversion of X chromosome files from minimac3/4, please ensure the following:
   −
* If your input VCF dosage file has '''males as diploids''', then just add handle <code>--allDiploid</code>. This will NOT generate sex information in the output PLINK <code>.fam</code> file.
+
* If your input VCF dosage file has '''males as diploids''', then just add the <code>--allDiploid</code> option. This will NOT generate sex information in the output PLINK <code>.fam</code> file. 
** If you still need the sex column in <code>.fam</code> file to be correctly updated, then supply a sex file using <code>--sexFile SomeFile</code> where <code>SomeFile</code> has two columns: the first column has the sample names as found in the VCF file, and the second columns has M (for males) or F (for females).
+
** If you still need the sex column in <code>.fam</code> file to be correctly updated, then supply a sex file using <code>--sexFile SomeFile</code> where <code>SomeFile</code> has two columns:
* If your input VCF dosage file has '''males as haploids''' and '''also has GT information''', the tool with automatically determine the sex of the samples and report them in the output <code>.fam</code> file. No extra parameters are required.
+
*** the first column has the sample names as found in the VCF file
** If GT tags are NOT available, you would need to supply the sex file as described above. Otherwise it will throw an error.
+
*** the second column has M (for males) or F (for females). 
** '''NOTE''': If your VCF file has males as haploids, do NOT use <code>--allDiploid</code> as the code would NOT throw any error, but the output results would be erroneous.
+
* If your input VCF dosage file has '''males as haploids''' and '''also has GT information''', the tool will automatically determine the sex of the samples from their ploidy and report it in the output <code>.fam</code> file. No extra parameters are required.
 +
** If '''GT''' tags are NOT available, you would need to supply the <code>--sexFile</code> as described above. Otherwise it will throw an error.
 +
** '''NOTE''': If your VCF file has males as haploids, do NOT use <code>--allDiploid</code>, as the output results would be erroneous (although the code would NOT throw any error).
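The two-column sex file described above can be generated from a sample list. The sketch below is only an illustration: the function name <code>write_sex_file</code> is hypothetical, and the use of a tab as the column separator is an assumption, since this page only states that the file has two columns (sample name, then M or F).

```python
def write_sex_file(path, sample_sex):
    """Write one line per sample: sample name, separator, then M or F.

    sample_sex is an iterable of (sample_name, sex) pairs. The sample names
    should match those in the VCF header. The tab separator is an assumption.
    """
    rows = list(sample_sex)
    # Validate everything first so that no partial file is written.
    for sample, sex in rows:
        if sex not in ("M", "F"):
            raise ValueError(f"sex for {sample!r} must be 'M' or 'F', got {sex!r}")
        if any(ch.isspace() for ch in sample):
            raise ValueError(f"sample name {sample!r} contains whitespace")
    with open(path, "w") as handle:
        for sample, sex in rows:
            handle.write(f"{sample}\t{sex}\n")
```

The resulting file would then be passed to the tool with <code>--sexFile SomeFile</code>, as described in the list above.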
    
= Command Line Options =
 
= Command Line Options =
Line 88: Line 81:  
The command options for DosageConvertor are explained below.  
 
The command options for DosageConvertor are explained below.  
   −
{| class="wikitable"  style="text-align:center"  border="1" cellpadding="2"
+
{| class="wikitable"  style="text-align:left"  border="1" cellpadding="2"
 
|- bgcolor="white"
 
|- bgcolor="white"
 
! Option
 
! Option
Line 105: Line 98:  
| <code>--prefix</code>  
 
| <code>--prefix</code>  
 
|  
 
|  
sets the prefix for output files (default value: <code>Converted.Dosage</code>)
+
sets the prefix for output files (default: <code>Converted.Dosage</code>)
 
|-  
 
|-  
 
| <code>--type</code>
 
| <code>--type</code>
Line 113: Line 106:  
| <code>--tag</code>
 
| <code>--tag</code>
 
|  
 
|  
indicates whether to import imputed values from dosages (<code>DS</code>: default), genotype probabilities (<code>GP</code>), or hard genotype calls (<code>GT</code>) from the input VCF file
+
indicates the FORMAT tag of the VCF file from which to import the imputed dosages:
 +
*<code>DS</code>: imputed values from dosages (default)
 +
*<code>GP</code>: genotype probabilities
 +
*<code>GT</code>: hard genotype calls
 
|-  
 
|-  
 
| <code>--format</code>
 
| <code>--format</code>
Line 119: Line 115:  
sets the format of the converted output file:
 
sets the format of the converted output file:
   −
*If <code>--type plink</code> is used, <code>--format</code> can take values 1, 2, or 3. Each of these values correspond to the three different formats available for PLINK dosage files (details given [http://www.cog-genomics.org/plink/1.9/assoc#dosage here])
+
*If <code>--type plink</code> is used, <code>--format</code> can take values 1, 2, or 3.  
*If <code>--type mach</code> is used, <code>--format</code> can take values 1 or 2. Details are given in [[#Convert to MaCH Files| Convert to MaCH Files]]  
+
 
 +
Each of these values corresponds to one of the three formats available for PLINK dosage files (details given [http://www.cog-genomics.org/plink/1.9/assoc#dosage here])
 +
*If <code>--type mach</code> is used, <code>--format</code> can take values 1 or 2.  
 +
 
 +
Details are given in [[#Convert to MaCH Files| Convert to MaCH Files]]  
 
|-  
 
|-  
 
| <code>--buffer</code>
 
| <code>--buffer</code>
Line 132: Line 132:  
| <code>--allDiploid</code>
 
| <code>--allDiploid</code>
 
|  
 
|  
indicates whether to assume all samples are diploid (necessary for chromosome X). If this option is active, the output PLINK <code>.fam</code> will NOT contain any sex information
+
indicates whether to assume all samples are diploid (necessary for chromosome X).  
 +
 
 +
If this option is active, the output PLINK <code>.fam</code> will NOT contain any sex information
 
|-  
 
|-  
 
| <code>--sexFile</code>
 
| <code>--sexFile</code>
 
|  
 
|  
indicates a file containing sample sex information, which requires two columns: the first column contains the sample names as found in the VCF file, and the second columns contains either M (for males) or F (for females)
+
indicates a file containing sample sex information, which requires two columns:  
 +
*the first column contains the sample names as found in the VCF file
 +
*the second column contains either M (for males) or F (for females)
 
|-  
 
|-  
| <code>--TrimAlleles</code>
+
| <code>--trimNames</code>
 
|  
 
|  
indicates whether to trim alleles and variants IDs to 100 characters. Since PLINK does not allow variant IDs longer than 16,000 characters, this option can be used if variant names are too long
+
indicates whether to trim variant IDs to 100 characters
 +
 
 +
Since PLINK does not allow variant IDs longer than 16,000 characters, this option can be used if variant names are too long.
 +
|-
 +
| <code>--trimLength</code>
 +
|
 +
number (<16000) giving the length to which variant IDs are trimmed (default: 100)
 
|}
 
|}
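To decide whether the ID-trimming options in the table above are needed, one can scan the input VCF for long variant IDs before converting. The sketch below is a standalone helper, not part of DosageConvertor; the function name <code>long_variant_ids</code> is hypothetical, and it relies only on the standard VCF layout, in which the ID is the third tab-separated column of each record.

```python
import gzip


def long_variant_ids(vcf_path, max_length=100):
    """Return the variant IDs in a VCF that are longer than max_length.

    Handles plain-text and gzip-compressed files (chosen by the .gz suffix).
    """
    opener = gzip.open if str(vcf_path).endswith(".gz") else open
    too_long = []
    with opener(vcf_path, "rt") as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # skip meta-information and header lines
            # Only the first three columns (CHROM, POS, ID) are needed.
            fields = line.rstrip("\n").split("\t", 3)
            if len(fields) < 3:
                continue  # skip blank or malformed lines
            variant_id = fields[2]
            if len(variant_id) > max_length:
                too_long.append(variant_id)
    return too_long
```

If the returned list is non-empty for the chosen <code>max_length</code>, the trimming options described in the table may be worth enabling.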
  