Line 33: |
Line 33: |
| // Open the vcf file for reading. | | // Open the vcf file for reading. |
| VcfFileReader reader; | | VcfFileReader reader; |
− | VcfFileHeader header; | + | VcfHeader header; |
| reader.open("vcfFileName.vcf", header); | | reader.open("vcfFileName.vcf", header); |
| </source> | | </source> |
Line 42: |
Line 42: |
| // Open the vcf file for reading. | | // Open the vcf file for reading. |
| VcfFileReader reader; | | VcfFileReader reader; |
− | VcfFileHeader header; | + | VcfHeader header; |
| // Subset 1 is delimited by new lines, '\n'. | | // Subset 1 is delimited by new lines, '\n'. |
| reader.open("vcfFileName.vcf", header, "subsetFile1.txt"); | | reader.open("vcfFileName.vcf", header, "subsetFile1.txt"); |
Line 49: |
Line 49: |
| // Open the vcf file for reading. | | // Open the vcf file for reading. |
| VcfFileReader reader; | | VcfFileReader reader; |
− | VcfFileHeader header; | + | VcfHeader header; |
| // Subset 2 is delimited by ';' | | // Subset 2 is delimited by ';' |
| reader.open("vcfFileName.vcf", header, "subsetFile2.txt", ';'); | | reader.open("vcfFileName.vcf", header, "subsetFile2.txt", ';'); |
Line 97: |
Line 97: |
| </source> | | </source> |
| and discards reads that do not have <code>PASS</code> in the <code>FILTER</code> field and reads that have a genotype that is not phased or have no <code>GT</code> in the <code>FORMAT</code> fields. | | and discards reads that do not have <code>PASS</code> in the <code>FILTER</code> field and reads that have a genotype that is not phased or have no <code>GT</code> in the <code>FORMAT</code> fields. |
| + | |
| + | ==== Read only Certain Sections of the File / Using a VCF Index (TABIX) File ==== |
| + | The VCF library has the capability of using a tabix file to allow random access in the VCF file. |
| + | |
| + | To use the INDEX File to read sections of the bam file: |
| + | # Open the VCF file using: <code>VcfFileReader::open</code> |
| + | # Open the VCF Index file using: <code>VcfFileReader::readVcfIndex</code> |
| + | # Set the Section to be read: <code>VcfFileReader::setReadSection</code> or <code>VcfFileReader::set1BasedReadSection</code> |
| + | # Read records from the section: <code>VcfFileReader::readRecord</code> |
| + | |
| + | <code>VcfFileReader::readVcfIndex</code> returns true if the index file was successfully read and false if not. |
| + | <source lang="cpp"> |
| + | bool readVcfIndex(const char * filename); |
| + | bool readVcfIndex(); |
| + | </source> |
| + | If you specify a filename, it looks for the index file with that path/name. If you do not specify the index file name, it will attempt to find it. First it searches for a file with your VCF FileName + a ".tbi" extension. If that isn't found, it removes the ".vcf" if found in your VCF FileName and looks for a file with that name and a ".tbi" extension. |
| + | |
| + | After opening both the VCF File and the Index file, set the section that should be read next using <code>setReadSection</code> or <code>set1BasedReadSection</code>. |
| + | <source lang="cpp"> |
| + | bool setReadSection(const char* chromName); |
| + | bool set1BasedReadSection(const char* chromName, |
| + | int32_t start, int32_t end); |
| + | </source> |
| + | Both methods currently always return true, since in the current implementation this can't fail. It currently allows the VcfIndex file to be read after this call. |
| + | |
| + | The read section is reset when a new file is opened, so do not set this prior to opening the file. |
| + | |
| + | <code>setReadSection</code> will set the code to read the entire specified chromosome. On the first <code>readRecord</code> call it will jump to the beginning of the chromosome. It will continue to read the chromosome for each <code>readRecord</code> call made until it has read the entire chromosome. Once the whole chromosome has been read, <code>readRecord</code> will return false until a new read section is specified. |
| + | |
| + | <code>set1BasedReadSection</code> will set the code to read the specified chromosome starting at the specified 1-based position up to, but not including, the specified 1-based end position. On the first <code>readRecord</code> call it will jump the file to the beginning of this section. It will continue to read the section for each <code>readRecord</code> call made until it has read the entire section. Once the entire section has been read, <code>readRecord</code> will return false until a new read section is specified. |
| + | |
| + | Implementation NOTE: internally it may read extra records prior to the section, but <code>readRecord</code> will keep reading and will not return until it finds a record in the section or until the section has been passed (no records in the section). |
| + | |
| + | <source lang="cpp"> |
| + | // Open the vcf file for reading. |
| + | VcfFileReader reader; |
| + | VcfHeader header; |
| + | VcfRecord record |
| + | reader.open("vcfFileName.vcf", header); |
| + | |
| + | // Index File name is "vcfFileName.vcf.tbi" or "vcfFileName.tbi" |
| + | reader.readVcfIndex(); |
| + | |
| + | reader.set1BasedReadSection("1", 16384, 32767); |
| + | while(reader.readRecord(record)) |
| + | { |
| + | // Process a record from this section. |
| + | } |
| + | |
| + | // Set another section, chromosome "X" |
| + | reader.setReadSection("X"); |
| + | while(reader.readRecord(record)) |
| + | { |
| + | // Process record on chromosome X |
| + | } |
| + | </source> |
| + | |
| + | Alternatively, the index file name can be specified. |
| + | <source lang="cpp"> |
| + | // Index File name is specified |
| + | reader.readVcfIndex("myIndex.tbi"); |
| + | </source> |
| | | |
| ==== Check if EOF was reached ==== | | ==== Check if EOF was reached ==== |
Line 105: |
Line 167: |
| True is returned if the End of the File has been reached, false if not. | | True is returned if the End of the File has been reached, false if not. |
| | | |
− | ==== See How Many Files Were Read ==== | + | ==== See How Many Records Were Read ==== |
| To see the total number of records in the file that were read, including any that were discarded, use <code>getNumRecords</code>. | | To see the total number of records in the file that were read, including any that were discarded, use <code>getNumRecords</code>. |
| | | |
Line 113: |
Line 175: |
| << "but only " << reader.getNumKeptRecords() << " were processed\n."; | | << "but only " << reader.getNumKeptRecords() << " were processed\n."; |
| </source> | | </source> |
− |
| |
| | | |
| ==== Closing the VCF File ==== | | ==== Closing the VCF File ==== |
Line 121: |
Line 182: |
| reader.close(); | | reader.close(); |
| </source> | | </source> |
| + | |
| | | |
| === <code>VcfRecord</code> === | | === <code>VcfRecord</code> === |