Difference between revisions of "VcfRefGen"

From Genome Analysis Wiki
Line 3: Line 3:
  = Download =
  Source code can be found [http://www.sph.umich.edu/csg/cfuchsb/ here] .
- 
  = Parameter=
Line 13: Line 12:
  |-
  |style=white-space:nowrap|<code>--in <filename></code>
- | Input VCF file. The latest 1000 Genomes files can be found [ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/release/20110521/phase1_integrated_calls.20101123.ALL.panel here].
+ | Input VCF filename. The latest 1000 Genomes files can be found [ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/release/20110521/phase1_integrated_calls.20101123.ALL.panel here].
  |-
  | <code>--out <filename></code>
Line 22: Line 21:
  |-
  | <code>--sampleSubset <filename></code>
- | file with samples IDs to keep (one sample ID per line).
+ | filename of file with samples IDs to keep (one sample ID per line).
  |-
  | <code>--minAC</code>
Line 28: Line 27:
  |-
  | <code>--filterList <filename></code>
- | filename of file containing regions to include. <br>format: start end <br> start & end positions should be 1-based inclusive positions.
+ | filename of file containing regions to include. <br>format: start end <br> start & end positions should be 1-based inclusive positions <br> for SNPs start=end position
  |-
  | <code>--params</code>
Revision as of 15:17, 29 August 2012

vcfRefGen is a tool for generating VCF reference panels for minimac imputation. It reduces VCF files by removing the INFO field, keeping only the GT (genotype) field, and dropping every record in which any kept sample is unphased or has a missing genotype.
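
The reduction described above can be illustrated with a short Python sketch. This is not vcfRefGen's source code; the function name `reduce_record`, the `keep_idx` argument, and the choice to write `.` as the INFO placeholder are assumptions made for illustration. It also assumes tab-separated VCF data lines in which `|` marks phased and `/` marks unphased genotypes.

```python
def reduce_record(line, keep_idx=None):
    """Return a reduced VCF data line, or None if the record must be dropped.

    Hypothetical helper: keep_idx is an optional list of sample column
    indices (0-based, relative to the first sample column) to retain.
    """
    fields = line.rstrip("\n").split("\t")
    fixed, fmt, samples = fields[:8], fields[8], fields[9:]
    if keep_idx is not None:
        samples = [samples[i] for i in keep_idx]

    # Locate GT inside the FORMAT column and extract it for every kept sample.
    gt_pos = fmt.split(":").index("GT")
    gts = [sample.split(":")[gt_pos] for sample in samples]

    # Drop the record if any kept genotype is unphased or has a missing allele.
    for gt in gts:
        if "/" in gt or "." in gt.split("|"):
            return None

    fixed[7] = "."  # assumption: the INFO content is replaced by a placeholder
    return "\t".join(fixed + ["GT"] + gts)


if __name__ == "__main__":
    record = "1\t100\trs1\tA\tG\t50\tPASS\tAC=2\tGT:DP\t0|1:12\t1|1:9"
    print(reduce_record(record))
```

Only the kept samples are checked, so a record with an unphased genotype in a discarded sample still passes.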

Download

Source code can be found [http://www.sph.umich.edu/csg/cfuchsb/ here].

Parameter

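{|
! Parameter
! Description
|-
|style=white-space:nowrap|<code>--in <filename></code>
| Input VCF filename. The latest 1000 Genomes files can be found [ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/release/20110521/phase1_integrated_calls.20101123.ALL.panel here].
|-
| <code>--out <filename></code>
| Output VCF filename.
|-
| <code>--uncompress</code>
| Write an uncompressed VCF output file.
|-
| <code>--sampleSubset <filename></code>
| File listing the sample IDs to keep (one sample ID per line).
|-
| <code>--minAC</code>
| Minimum minor allele count for a record to be kept.
|-
| <code>--filterList <filename></code>
| File listing the regions to include. <br>Format: start end <br>Start and end positions are 1-based and inclusive. <br>For SNPs, start equals end.
|-
| <code>--params</code>
| Print the parameter settings.
|}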
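
To make the `--filterList` and `--minAC` semantics concrete, here is a minimal Python sketch. It is not taken from vcfRefGen; the helper names `parse_regions`, `in_regions`, and `minor_allele_count` are hypothetical. It assumes whitespace-separated `start end` lines, biallelic sites, and phased genotypes written with `|`.

```python
def parse_regions(text):
    """Parse 'start end' lines into (start, end) tuples (1-based, inclusive)."""
    regions = []
    for raw in text.splitlines():
        parts = raw.split()
        if not parts:
            continue  # skip blank lines
        regions.append((int(parts[0]), int(parts[1])))
    return regions


def in_regions(pos, regions):
    """True if a 1-based position lies inside any region, bounds included."""
    return any(start <= pos <= end for start, end in regions)


def minor_allele_count(gts):
    """Count copies of the less frequent allele (assumes biallelic, phased GTs)."""
    alleles = [allele for gt in gts for allele in gt.split("|")]
    alt = sum(1 for allele in alleles if allele != "0")
    return min(alt, len(alleles) - alt)


if __name__ == "__main__":
    regions = parse_regions("100 200\n350 350\n")  # second line is a single SNP
    print(in_regions(200, regions), in_regions(201, regions))
    print(minor_allele_count(["0|1", "0|0", "0|0"]))
```

Because both bounds are inclusive, a single SNP is written with the same value for start and end, as in the `350 350` line.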