VcfRefGen

From Genome Analysis Wiki
''vcfRefGen'' is a tool for generating VCF reference panels for [http://genome.sph.umich.edu/wiki/minimac minimac imputation]. It reduces a VCF file by removing the INFO field, keeping only the GT genotype field (plus any additional genotype fields specified with <code>--keepGT</code>), and removing every record in which any kept sample is unphased or has a missing genotype.
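The per-record reduction described above can be sketched in Python for a single VCF data line. This is a minimal illustration under stated assumptions, not vcfRefGen's source. The function name <code>reduce_record</code> is hypothetical. It assumes that a genotype is unphased if its GT contains <code>/</code>, that it is missing if any allele is <code>.</code>, and that haploid calls (a single allele) count as phased. The optional <code>keep_fields</code> argument stands in for <code>--keepGT</code>, and <code>keep_samples</code> (0-based indices into the sample columns) stands in for <code>--sampleSubset</code>. Header lines are not handled.

```python
# Hypothetical sketch of the record reduction (not vcfRefGen's actual code).
# Assumptions: GT is unphased if it contains '/', missing if any allele is '.',
# and haploid calls such as "0" are treated as phased.


def reduce_record(line, keep_fields=(), keep_samples=None):
    """Return the reduced VCF data line, or None if the record is dropped."""
    cols = line.rstrip("\n").split("\t")
    fixed, fmt_keys, samples = cols[:8], cols[8].split(":"), cols[9:]
    if keep_samples is not None:
        samples = [samples[i] for i in keep_samples]

    # GT first, then any extra requested fields that FORMAT actually contains.
    out_keys = ["GT"] + [k for k in keep_fields if k != "GT" and k in fmt_keys]
    idx = [fmt_keys.index(k) for k in out_keys]
    gt_idx = fmt_keys.index("GT")

    new_samples = []
    for sample in samples:
        values = sample.split(":")
        gt = values[gt_idx]
        alleles = gt.replace("|", "/").split("/")
        if "/" in gt or "." in alleles:
            return None  # unphased or missing genotype: drop the whole record
        # Trailing FORMAT fields may be omitted in VCF; fill them with '.'.
        new_samples.append(":".join(values[i] if i < len(values) else "." for i in idx))

    fixed[7] = "."  # remove the INFO field
    return "\t".join(fixed + [":".join(out_keys)] + new_samples)
```

Under these assumptions, <code>reduce_record(line, keep_fields=["DS"])</code> turns a record with FORMAT <code>GT:DS:GQ</code> into one with <code>GT:DS</code>, and a single <code>0/1</code> call among the kept samples drops the record.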
  
 
= Download =

Source code: [[Media:vcfRefGen.0.1.4.tgz|VcfRefGen.0.1.4.tgz]] - Released 09/04/2014

Older versions:

* [[Media:vcfRefGen.0.1.3.tgz|VcfRefGen.0.1.3.tgz]] - Released 01/29/2013

= Parameters =

{|
! Parameter
! Description
|-
|style=white-space:nowrap|<code>--in <filename></code>
| Input VCF filename. The 1000 Genomes Phase 1 release files can be found [ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/release/20110521/phase1_integrated_calls.20101123.ALL.panel here].
|-
| <code>--out <filename></code>
| Output VCF filename.
|-
| <code>--allfields</code>
| Keep the INFO field and all genotype fields, and do not filter out records with unphased or missing genotypes.
|-
| <code>--uncompress</code>
| Write an uncompressed VCF output file.
|-
| <code>--sampleSubset <filename></code>
| File listing the IDs of the samples to keep, one sample ID per line.
|-
| <code>--minAC <count></code>
| Minimum minor allele count required to keep a record.
|-
| <code>--filterList <filename></code>
| File containing the regions to include.<br>Format: <code>start end</code><br>Start and end are 1-based, inclusive positions; for a SNP, start = end.
|-
| <code>--keepGT <fields></code>
| Comma-separated list of genotype fields to keep in addition to the GT field.
|-
| <code>--params</code>
| Print the parameter settings.
|}
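A <code>--filterList</code> file holds <code>start end</code> pairs of 1-based, inclusive positions. The Python sketch below shows one way such a file could be loaded and queried; it is an illustration, not vcfRefGen's implementation. The helper names <code>load_regions</code> and <code>in_regions</code> are hypothetical, and the sketch assumes one pair per line and that a record is included when its VCF <code>POS</code> falls inside any listed region (how the tool treats indels that cross a region boundary is not documented here).

```python
# Hypothetical helpers for a --filterList region file. Assumed format: one
# "start end" pair per line, 1-based inclusive positions, start == end for a SNP.
# Assumption: a VCF record is kept when its POS falls inside any region.
import bisect


def load_regions(lines):
    """Parse 'start end' lines into sorted, merged (start, end) intervals."""
    intervals = []
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        intervals.append((int(parts[0]), int(parts[1])))
    intervals.sort()

    merged = []
    for start, end in intervals:
        # Merge overlapping or directly adjacent regions (ends are inclusive).
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def in_regions(pos, merged):
    """True if the 1-based position pos lies inside any merged interval."""
    starts = [start for start, _ in merged]
    i = bisect.bisect_right(starts, pos) - 1  # last interval starting at or before pos
    return i >= 0 and pos <= merged[i][1]
```

Merging the regions first lets a single binary search answer each lookup; when checking many records, the <code>starts</code> list could be built once instead of on every call.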

= Example =

Below is the command we used to generate MetaboChip-specific reference panels from 1000 Genomes data:

<source lang="text">
  foreach chr (`seq 1 22`)
    runon -m 1024 vcfRefGen --in ALL.chr$chr.phase1_release_v3.20101123.snps_indels_svs.genotypes.vcf.gz \
              --out chr$chr.metabo.phase1_release_v3.20101123.snps_indels_svs.genotypes.refpanel.ALL.vcf.gz \
              --minAC 2 --filterList chr$chr.filter.regions
  end
</source>
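The example above combines <code>--minAC 2</code> with <code>--filterList</code>. As a hedged illustration of the allele-count filter, the Python sketch below assumes that the minor allele count is computed from the GT calls of the kept samples as the number of called alleles minus the count of the most frequent allele (the rarer allele's count at a biallelic site), and that <code>--minAC N</code> keeps a record when that count is at least N. Neither detail is confirmed by this page, and the function names are hypothetical.

```python
# Hypothetical minor-allele-count filter in the spirit of --minAC.
# Assumptions: counts come from the GT strings of the kept samples,
# MAC = called alleles - count of the most common allele, and a record
# passes when MAC >= min_ac.
from collections import Counter


def minor_allele_count(genotypes):
    """Minor allele count from GT strings such as '0|1' or '1|1'."""
    counts = Counter()
    for gt in genotypes:
        for allele in gt.replace("/", "|").split("|"):
            if allele != ".":
                counts[allele] += 1
    if not counts:
        return 0  # no called alleles at all
    return sum(counts.values()) - max(counts.values())


def passes_min_ac(genotypes, min_ac):
    """True if the record would be kept under the assumed --minAC rule."""
    return minor_allele_count(genotypes) >= min_ac
```

At a singleton site (a single alternate allele across all kept haplotypes), <code>minor_allele_count</code> returns 1, so under these assumptions <code>--minAC 2</code> would drop that record.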
 +
== Questions and Comments ==

Please contact [mailto:cfuchsb@umich.edu Christian Fuchsberger] or [mailto:mktrost@umich.edu Mary Kate Wing].

[[Category:Software]]

Latest revision as of 15:53, 4 September 2014
