<p>Vcf2geno - Revision history. Page created by Zhanxw on 2013-02-19T21:01:09Z.</p>
<p><b>New page</b></p><div>= vcf2geno =<br />
<br />
Converts VCF files to a genotype file and a site file.<br />
<br />
== Input File ==<br />
<br />
Vcf2geno takes VCF files. They can be in plain text or GZIP/BGZIP compressed formats.<br />
<br />
== Outputs ==<br />
<br />
Vcf2geno generates two files: prefix.geno and prefix.site, where ''prefix'' is the value given to ''--out''.<br />
<br />
A .geno file is shown below:<br />
<br />
<pre>X1 X1 0 -9 0 0 0 2 2 2 2 2 2<br />
X2 X2 -9 -9 0 0 0 2 2 2 2 2 2<br />
X3 X3 0 -9 0 0 0 2 2 2 2 2 2<br />
X4 X4 0 -9 0 0 0 2 2 2 2 2 2<br />
X5 X5 0 -9 0 0 0 2 2 1 2 2 2<br />
X6 X6 0 -9 0 0 0 2 2 2 2 2 -9</pre><br />
The first and second columns are sample IDs copied from the header of the VCF file. Columns 3 through the last hold the individual-level genotypes converted from the VCF file, one column per site.<br />
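<br />
To illustrate the layout described above, here is a minimal Python sketch that reads .geno lines into sample IDs and per-sample genotype lists. The function name read_geno is hypothetical and not part of vcf2geno; the sketch assumes whitespace-separated columns as in the example and does not interpret the numeric codes.<br />

```python
def read_geno(lines):
    """Parse .geno lines: two sample-ID columns, then one integer per site.

    Hypothetical helper, not part of vcf2geno. Assumes whitespace-separated
    columns and leaves the genotype codes uninterpreted.
    """
    sample_ids = []
    genotypes = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        values = fields[2:]
        if not values:
            raise ValueError("expected two ID columns followed by genotypes")
        sample_ids.append(fields[0])  # the first two columns repeat the sample ID
        genotypes.append([int(value) for value in values])
    return sample_ids, genotypes


example = [
    "X1 X1 0 -9 0 0 0 2 2 2 2 2 2",
    "X5 X5 0 -9 0 0 0 2 2 1 2 2 2",
]
ids, geno = read_geno(example)
print(ids, len(geno[0]))
```

For a file on disk, the same function can be given an open file handle, e.g. read_geno(open("test.geno")), where test.geno is the output of a run with --out test.<br />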
<br />
A .site file is shown below:<br />
<br />
<pre>CHROM POS ID REF ALT<br />
1 10 1:10 A T<br />
1 20 1:20 G C<br />
1 30 1:30 C A<br />
1 40 1:40 A C<br />
1 10000 1:10000 G C<br />
1 20000 1:20000 T A<br />
4 5000 4:5000 A T<br />
4 6000 4:6000 C T<br />
X 800 X:800 A C<br />
X 900 X:900 A T<br />
X 1000 X:1000 T G</pre><br />
The .site file begins with a header line. Each subsequent line gives the chromosome, position, ID, reference allele and alternative allele of one site.<br />
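<br />
The .site layout can be read in a similar way. The following Python sketch turns each site line into a dictionary keyed by the header names; read_site is a hypothetical helper name, and the sketch assumes whitespace-separated columns as in the example.<br />

```python
def read_site(lines):
    """Parse .site lines into one dict per site, keyed by the header names.

    Hypothetical helper, not part of vcf2geno. Assumes the first line is the
    header and all columns are whitespace-separated.
    """
    rows = iter(lines)
    header = next(rows).split()
    records = []
    for line in rows:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != len(header):
            raise ValueError(f"expected {len(header)} columns, got {len(fields)}")
        records.append(dict(zip(header, fields)))
    return records


example = [
    "CHROM POS ID REF ALT",
    "1 10 1:10 A T",
    "X 800 X:800 A C",
]
sites = read_site(example)
print(sites[1]["CHROM"], sites[1]["POS"], sites[1]["ALT"])
```

All values are kept as strings, so chromosome names such as X need no special handling; positions can be converted with int() where needed.<br />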
<br />
== Options ==<br />
<br />
Vcf2geno provides sample selection options and range selection options.<br />
<br />
There are four options to include or exclude samples:<br />
--peopleIncludeID, --peopleExcludeID: specify which samples are included in or excluded from the conversion, e.g. --peopleIncludeID X1,X2,X3 converts only these three samples, provided the input VCF file contains them.<br />
--peopleIncludeFile, --peopleExcludeFile: specify a file listing the samples to include or exclude. Each line of the file should be a sample ID.<br />
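<br />
As a sketch of how a sample file for the file-based options above might be prepared, the Python snippet below writes one sample ID per line and assembles (without running) a command line. The helper names, the file name samples.txt and the bare executable name vcf2geno are assumptions for illustration; only the --inVcf, --peopleIncludeFile and --out flags come from this page.<br />

```python
import os
import tempfile


def write_sample_file(path, sample_ids):
    """Write one sample ID per line, the layout the sample-file options expect."""
    with open(path, "w") as handle:
        for sample_id in sample_ids:
            handle.write(f"{sample_id}\n")


def build_command(vcf_path, sample_file, out_prefix):
    """Assemble an argument list; the executable name is an assumption."""
    return [
        "vcf2geno",
        "--inVcf", vcf_path,
        "--peopleIncludeFile", sample_file,
        "--out", out_prefix,
    ]


with tempfile.TemporaryDirectory() as workdir:
    sample_path = os.path.join(workdir, "samples.txt")  # hypothetical file name
    write_sample_file(sample_path, ["X1", "X2", "X3"])
    with open(sample_path) as handle:
        written = handle.read()

print(written, end="")
print(" ".join(build_command("example.vcf.gz", "samples.txt", "test")))
```

The returned list could be passed to subprocess.run once the path to the vcf2geno executable is known.<br />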
<br />
There are two options to specify regions. You can use them to convert part of the VCF file; however, your input file must be indexed by TABIX.<br />
--rangeList: specify a range by hand, e.g. --rangeList 1:100-200. Note that the chromosome name on the command line should be consistent with the content of the VCF file (e.g. neither has a 'chr' prefix).<br />
--rangeFile: specify ranges in a file. Each line of the file should specify a range, e.g. '1:100-200' or alternatively three columns '1 100 200'.<br />
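<br />
The two range notations mentioned above can be checked before they are handed to vcf2geno. The following Python sketch parses either form into a (chromosome, start, end) tuple; parse_range is a hypothetical helper name, and the sketch does not reproduce how vcf2geno itself parses ranges.<br />

```python
def parse_range(line):
    """Parse '1:100-200' or '1 100 200' into (chromosome, start, end).

    Hypothetical helper for validating range lines; not part of vcf2geno.
    Raises ValueError when the line matches neither form.
    """
    fields = line.split()
    if len(fields) == 3:
        chrom, start, end = fields
    elif len(fields) == 1:
        chrom, _, span = fields[0].partition(":")
        start, _, end = span.partition("-")
    else:
        raise ValueError(f"unrecognized range: {line!r}")
    if not chrom:
        raise ValueError(f"missing chromosome name: {line!r}")
    return chrom, int(start), int(end)  # int() rejects empty or non-numeric bounds


print(parse_range("1:100-200"))
print(parse_range("1 100 200"))
```

The chromosome name is returned unchanged, so it can be compared against the names used in the VCF file to catch a mismatched 'chr' prefix.<br />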
<br />
== Example ==<br />
<br />
Under the &quot;exampleVCF&quot; folder, you can find example.vcf.gz. This is an indexed VCF file. Basic usage, extracting all samples across all regions:<br />
<br />
<pre>../vcf2geno --inVcf example.vcf.gz --out test</pre><br />
Convert sample X1 from range 1:20-30:<br />
<br />
<pre>../vcf2geno --inVcf example.vcf.gz --rangeList 1:20-30 --peopleIncludeID X1 --out test</pre></div>Zhanxw