Changes

From Genome Analysis Wiki
Jump to navigationJump to search
3,715 bytes added ,  15:51, 18 April 2012
no edit summary
Line 33: Line 33:  
     // Open the vcf file for reading.
 
     // Open the vcf file for reading.
 
     VcfFileReader reader;
 
     VcfFileReader reader;
     VcfFileHeader header;
+
     VcfHeader header;
 
     reader.open("vcfFileName.vcf", header);
 
     reader.open("vcfFileName.vcf", header);
 
</source>
 
</source>
Line 42: Line 42:  
     // Open the vcf file for reading.
 
     // Open the vcf file for reading.
 
     VcfFileReader reader;
 
     VcfFileReader reader;
     VcfFileHeader header;
+
     VcfHeader header;
 
     // Subset 1 is delimited by new lines, '\n'.
 
     // Subset 1 is delimited by new lines, '\n'.
 
     reader.open("vcfFileName.vcf", header, "subsetFile1.txt");
 
     reader.open("vcfFileName.vcf", header, "subsetFile1.txt");
Line 49: Line 49:  
     // Open the vcf file for reading.
 
     // Open the vcf file for reading.
 
     VcfFileReader reader;
 
     VcfFileReader reader;
     VcfFileHeader header;
+
     VcfHeader header;
 
     // Subset 2 is delimited by ';'
 
     // Subset 2 is delimited by ';'
 
     reader.open("vcfFileName.vcf", header, "subsetFile2.txt", ';');
 
     reader.open("vcfFileName.vcf", header, "subsetFile2.txt", ';');
Line 97: Line 97:  
</source>
 
</source>
 
and discards reads that do not have <code>PASS</code> in the <code>FILTER</code> field and reads that have a genotype that is not phased or have no <code>GT</code> in the <code>FORMAT</code> fields.
 
and discards reads that do not have <code>PASS</code> in the <code>FILTER</code> field and reads that have a genotype that is not phased or have no <code>GT</code> in the <code>FORMAT</code> fields.
 +
 +
==== Read only Certain Sections of the File / Using a VCF Index (TABIX) File ====
 +
The VCF library has the capability of using a tabix file to allow random access in the VCF file.
 +
 +
To use the INDEX File to read sections of the bam file:
 +
# Open the VCF file using: <code>VcfFileReader::open</code>
 +
# Open the VCF Index file using: <code>VcfFileReader::readVcfIndex</code>
 +
# Set the Section to be read: <code>VcfFileReader::setReadSection</code> or <code>VcfFileReader::set1BasedReadSection</code>
 +
# Read records from the section: <code>VcfFileReader::readRecord</code>
 +
 +
<code>VcfFileReader::readVcfIndex</code> returns true if the index file was successfully read and false if not.
 +
<source lang="cpp">
 +
    bool readVcfIndex(const char * filename);
 +
    bool readVcfIndex();
 +
</source>
 +
If you specify a filename, it looks for the index file with that path/name.  If you do not specify the index file name, it will attempt to find it.  First it searches for a file with your VCF FileName + a ".tbi" extension.  If that isn't found, it removes the ".vcf" if found in your VCF FileName and looks for a file with that name and a ".tbi" extension.
 +
 +
After opening both the VCF File and the Index file, set the section that should be read next using <code>setReadSection</code> or <code>set1BasedReadSection</code>.
 +
<source lang="cpp">
 +
    bool setReadSection(const char* chromName);
 +
    bool set1BasedReadSection(const char* chromName,
 +
                              int32_t start, int32_t end);
 +
</source>
 +
Both methods currently always return true, since in the current implementation this can't fail.  It currently allows the VcfIndex file to be read after this call.
 +
 +
The read section is reset when a new file is opened, so do not set this prior to opening the file.
 +
 +
<code>setReadSection</code> will set the code to read the entire specified chromosome.  On the first <code>readRecord</code> call it will jump to the beginning of the chromosome.  It will continue to read the chromosome for each <code>readRecord</code> call made until it has read the entire chromosome.  Once the whole chromosome has been read, <code>readRecord</code> will return false until a new read section is specified.
 +
 +
<code>set1BasedReadSection</code> will set the code to read the specified chromosome starting at the specified 1-based position up to, but not including, the specified 1-based end position.  On the first <code>readRecord</code> call it will jump the file to the beginning of this section.  It will continue to read the section for each <code>readRecord</code> call made until it has read the entire section.  Once the entire section has been read, <code>readRecord</code> will return false until a new read section is specified.
 +
 +
Implementation NOTE: internally it may read extra records prior to the section, but <code>readRecord</code> will keep reading and will not return until it finds a record in the section or until the section has been passed (no records in the section).
 +
 +
<source lang="cpp">
 +
    // Open the vcf file for reading.
 +
    VcfFileReader reader;
 +
    VcfHeader header;
 +
    VcfRecord record
 +
    reader.open("vcfFileName.vcf", header);
 +
 +
    // Index File name is "vcfFileName.vcf.tbi" or "vcfFileName.tbi"
 +
    reader.readVcfIndex();
 +
 +
    reader.set1BasedReadSection("1", 16384, 32767);
 +
    while(reader.readRecord(record))
 +
    {
 +
        // Process a record from this section.
 +
    }
 +
 +
    // Set another section, chromosome "X"
 +
    reader.setReadSection("X");
 +
    while(reader.readRecord(record))
 +
    {
 +
        // Process record on chromosome X
 +
    }
 +
</source>
 +
 +
Alternatively, the index file name can be specified.
 +
<source lang="cpp">
 +
    // Index File name is specified
 +
    reader.readVcfIndex("myIndex.tbi");
 +
</source>
    
==== Check if EOF was reached ====
 
==== Check if EOF was reached ====
Line 105: Line 167:  
True is returned if the End of the File has been reached, false if not.
 
True is returned if the End of the File has been reached, false if not.
   −
==== See How Many Files Were Read ====
+
==== See How Many Records Were Read ====
 
To see the total number of records in the file that were read, including any that were discarded, use <code>getNumRecords</code>.
 
To see the total number of records in the file that were read, including any that were discarded, use <code>getNumRecords</code>.
   Line 113: Line 175:  
                   << "but only " << reader.getNumKeptRecords() << " were processed\n.";
 
                   << "but only " << reader.getNumKeptRecords() << " were processed\n.";
 
</source>
 
</source>
      
==== Closing the VCF File ====
 
==== Closing the VCF File ====
Line 121: Line 182:  
     reader.close();
 
     reader.close();
 
</source>
 
</source>
 +
    
=== <code>VcfRecord</code> ===
 
=== <code>VcfRecord</code> ===

Navigation menu