Changes

From Genome Analysis Wiki
123 bytes removed ,  14:01, 18 January 2019
no edit summary
Line 4: Line 4:  
= Introduction =
 
= Introduction =
 
DosageConvertor is a C++ tool to convert dosage files (in VCF format) from [[Minimac4| Minimac3/4]] to other formats such as MaCH or PLINK.
 
DosageConvertor is a C++ tool to convert dosage files (in VCF format) from [[Minimac4| Minimac3/4]] to other formats such as MaCH or PLINK.
 +
 +
[Please note that this tool CANNOT handle missing values in the input files and may NOT work correctly for VCF files that were not produced by Minimac3/4]
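Since missing values are not handled, it may help to screen the input before conversion. The following is a minimal sketch, not part of DosageConvertor: the function name <code>count_missing_records</code> is hypothetical, and it assumes that a missing value appears as a FORMAT subfield equal to <code>.</code>, <code>./.</code> or <code>.|.</code> in the sample columns of a gzip-compressed VCF file.

```shell
# Hypothetical helper (not part of DosageConvertor): count VCF records in which
# at least one sample has a missing subfield.
# Assumption: "missing" means a FORMAT subfield equal to ".", "./." or ".|.".
count_missing_records() {
    # $1 = gzip-compressed VCF file
    gzip -dc "$1" | awk -F'\t' '
        /^#/ { next }                       # skip header lines
        {
            for (i = 10; i <= NF; i++) {    # sample columns start at column 10
                n = split($i, f, ":")
                for (j = 1; j <= n; j++) {
                    if (f[j] == "." || f[j] == "./." || f[j] == ".|.") {
                        bad++               # count this record once
                        next                # and move on to the next record
                    }
                }
            }
        }
        END { print bad + 0 }'
}
```

Running <code>count_missing_records TestDataImputedVCF.dose.vcf.gz</code> prints the number of affected records; a value of 0 suggests that the file passes this particular check, but it does not guarantee that the file is otherwise compatible.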
 +
    
= Download =
 
= Download =
Line 19: Line 22:  
|  
 
|  
 
[https://github.com/Santy-8128/DosageConvertor DosageConvertor - Github]  
 
[https://github.com/Santy-8128/DosageConvertor DosageConvertor - Github]  
|-
  −
| Source Files
  −
|
  −
[ftp://share.sph.umich.edu/minimac3/DosageConvertor/DosageConvertor.v1.0.3.tar.gz  UNIX Users ]
  −
|-
  −
| Binary Executable <sup>&#8224;</sup>
  −
(Ubuntu 4.8.4)
  −
|
  −
[ftp://share.sph.umich.edu/minimac3/DosageConvertor/DosageConvertorExecutable.tar.gz  UNIX Users ]
      
|}
 
|}
  −
'''<sup>&#8224;</sup>''' Binary executables are NOT guaranteed to run on every LINUX machine. Please compile from source files if you have trouble with the executable, or clone from the [https://github.com/Santy-8128/DosageConvertor github repository]. Else contact the author [mailto:sayantan@umich.edu Sayantan Das].
      
= Installation =
 
= Installation =
Line 39: Line 31:  
  ## EXTRACT M3VCFTOOLS AND COMPILE
 
  ## EXTRACT M3VCFTOOLS AND COMPILE
 
  &nbsp;
 
  &nbsp;
  wget ftp://share.sph.umich.edu/minimac3/DosageConvertor/DosageConvertor.v1.0.3.tar.gz
+
  git clone https://github.com/Santy-8128/DosageConvertor
tar -xzvf DosageConvertor.v1.0.3.tar.gz
   
  cd DosageConvertor/
 
  cd DosageConvertor/
 
  make
 
  make
Line 59: Line 50:  
== Convert to MaCH Files ==
 
== Convert to MaCH Files ==
   −
The following command will convert an input VCF dosage file to a MaCH/minimac dosage file (the format for previous versions of [[Minimac | minimac]]). The generated dosage files can be tested for association using [http://genome.sph.umich.edu/wiki/Mach2dat:_Association_with_MACH_output mach2dat].  
+
The following command will convert an input VCF dosage file to a MaCH/minimac dosage file (the format for previous versions of [[Minimac | minimac]]). The generated dosage files can be tested for association using tools like [http://genome.sph.umich.edu/wiki/Mach2dat:_Association_with_MACH_output mach2dat] or [http://www.genabel.org/packages/ProbABEL ProbABEL].  
    
  ./DosageConvertor        --vcfDose      TestDataImputedVCF.dose.vcf.gz
 
  ./DosageConvertor        --vcfDose      TestDataImputedVCF.dose.vcf.gz
Line 76: Line 67:  
== Converting Chromosome X Files ==
 
== Converting Chromosome X Files ==
   −
For a minimac3/4 output file containing the pseudo-autosomal region (PAR) on chromosome X, no extra parameter is necessary. For files containing the non-PAR region, please ensure the following:
+
For conversion of X chromosome files from minimac3/4, please ensure the following:
   −
* If your input VCF dosage file has '''males as diploids''', then just add handle <code>--allDiploid</code>. This will NOT generate sex information in the output PLINK <code>.fam</code> file. You would have to update it manually.
+
* If your input VCF dosage file has '''males as diploids''', then just add handle <code>--allDiploid</code>. This will NOT generate sex information in the output PLINK <code>.fam</code> file.  
* If your input VCF dosage file has '''males as haploids''' and '''also has GT information''', the tool will automatically determine the sex of the samples and report them in the output <code>.fam</code> file. No extra parameters are required.
+
** If you still need the sex column in the <code>.fam</code> file to be set correctly, then supply a sex file using <code>--sexFile SomeFile</code> where <code>SomeFile</code> has two columns:
** If GT tags are NOT available, you would need to supply the sex file as described above. Otherwise it will throw an error.
+
*** the first column has the sample names as found in the VCF file
** '''NOTE''': If your VCF file has males as haploids, do NOT use <code>--allDiploid</code> as the code would NOT throw any error, but the output results would be erroneous.
+
*** the second column has M (for males) or F (for females). 
 +
* If your input VCF dosage file has '''males as haploids''' and '''also has GT information''', the tool will automatically determine the sex of the samples from their ploidy and report them in the output <code>.fam</code> file. No extra parameters are required.
 +
** If '''GT''' tags are NOT available, you would need to supply a sex file using <code>--sexFile</code> as described above. Otherwise the tool will throw an error.
 +
** '''NOTE''': If your VCF file has males as haploids, do NOT use <code>--allDiploid</code> as the output results would be erroneous (although the code would NOT throw any error).
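The two-column sex file described above could be derived from an existing PLINK <code>.fam</code> file, where column 5 encodes sex as 1 (male) or 2 (female). The following is a minimal sketch under stated assumptions: the function name <code>make_sex_file</code> is hypothetical, the individual IDs in column 2 of the <code>.fam</code> file are assumed to match the sample names in the VCF file exactly, and a single space is assumed to be an acceptable column separator.

```shell
# Hypothetical helper (not part of DosageConvertor): build a two-column sex file
# (sample name, then M or F) from a PLINK .fam file.
# Assumptions: column 2 (individual ID) matches the VCF sample names, and
# space-separated columns are accepted by --sexFile.
make_sex_file() {
    # $1 = input .fam file, $2 = output sex file
    # Samples with unknown sex (any code other than 1 or 2) are skipped.
    awk '$5 == 1 { print $2, "M" }
         $5 == 2 { print $2, "F" }' "$1" > "$2"
}
```

For example, <code>make_sex_file study.fam SomeFile</code> (with <code>study.fam</code> as a placeholder name) writes one line per sample with known sex, and the result can then be passed as <code>--sexFile SomeFile</code>. Samples skipped because of unknown sex would need to be added by hand.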
    
= Command Line Options =
 
= Command Line Options =
Line 112: Line 106:  
| <code>--tag</code>
 
| <code>--tag</code>
 
|  
 
|  
indicates the genotype information to import from the input VCF file:
+
indicates the FORMAT tag of the VCF file from which to import the imputed dosages:
 
*<code>DS</code>: imputed values from dosages (default)
 
*<code>DS</code>: imputed values from dosages (default)
 
*<code>GP</code>: genotype probabilities
 
*<code>GP</code>: genotype probabilities
Line 148: Line 142:  
*the second column contains either M (for males) or F (for females)
 
*the second column contains either M (for males) or F (for females)
 
|-  
 
|-  
| <code>--TrimAlleles</code>
+
| <code>--trimNames</code>
 
|  
 
|  
indicates whether to trim alleles and variant IDs to 100 characters
+
indicates whether to trim variant IDs to 100 characters
    
Since PLINK does not allow variant IDs longer than 16,000 characters, this option can be used if variant names are too long.
 
Since PLINK does not allow variant IDs longer than 16,000 characters, this option can be used if variant names are too long.
 +
|-
 +
| <code>--trimLength</code>
 +
|
 +
number (<16000) indicating the length to which variant IDs are trimmed (default value: 100)
 
|}
 
|}
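To judge whether trimming is needed at all, the longest variant ID in the input can be measured first. The following is a minimal sketch, not part of DosageConvertor: the function name <code>max_id_length</code> is hypothetical, and it assumes a gzip-compressed, tab-delimited VCF file with the variant ID in column 3.

```shell
# Hypothetical helper (not part of DosageConvertor): print the length of the
# longest variant ID (column 3) in a gzip-compressed VCF file.
max_id_length() {
    # $1 = gzip-compressed VCF file
    gzip -dc "$1" | awk -F'\t' '
        /^#/ { next }                          # skip header lines
        { n = length($3); if (n > max) max = n }
        END { print max + 0 }'                 # prints 0 if there are no records
}
```

If the value printed by <code>max_id_length TestDataImputedVCF.dose.vcf.gz</code> is comfortably below the PLINK limit mentioned above, the trimming options are probably unnecessary for that file.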
  