Changes

From Genome Analysis Wiki
1,444 bytes removed, 16:20, 6 April 2010
no edit summary
Line 1: Line 1:  
== Reading/Writing SAM/BAM Files In Your Program ==
 
== Reading/Writing SAM/BAM Files In Your Program ==
 
The '''SamFile''' class allows a user to easily read/write a SAM/BAM file.
 
The '''SamFile''' class allows a user to easily read/write a SAM/BAM file.
 +
 +
The '''SamFile''' class contains additional functionality that allows a user to read specific sections of sorted & indexed BAM files.  In order to take advantage of this capability, the index file must be read prior to setting the read section.  This logic saves the time of having to read the entire file and takes advantage of the seeking capability of BGZF files.
 +
 +
'''Future Enhancements:''' Add the ability to read alignments that match a given start, end position for a specific reference sequence.
 +
    
=== Class Methods ===
 
=== Class Methods ===
 +
 
{| style="margin: 1em 1em 1em 0; background-color: #f9f9f9; border: 1px #aaa solid; border-collapse: collapse;" border="1"
 
{| style="margin: 1em 1em 1em 0; background-color: #f9f9f9; border: 1px #aaa solid; border-collapse: collapse;" border="1"
 
|-style="background: #f2f2f2; text-align: center;"  '''SamFile Class Methods'''
 
|-style="background: #f2f2f2; text-align: center;"  '''SamFile Class Methods'''
Line 9: Line 15:  
| <code>bool SamFile::IsEOF()</code>
 
| <code>bool SamFile::IsEOF()</code>
 
| bool: true if the end of file has been reached, false if not.
 
| bool: true if the end of file has been reached, false if not.
 +
Be careful using this method when you are only reading a specific section, since you may reach the end of your section without hitting the end of the file.
 +
 
|-
 
|-
 
| <code>bool SamFile::OpenForRead(const char* filename)</code>
 
| <code>bool SamFile::OpenForRead(const char* filename)</code>
Line 19: Line 27:  
Opens as BAM file if the specified filename ends in .bam.  Otherwise it is opened as a SAM file.
 
Opens as BAM file if the specified filename ends in .bam.  Otherwise it is opened as a SAM file.
 
Returns true if successfully opened for writing, false if not.
 
Returns true if successfully opened for writing, false if not.
 +
|-
 +
| <code>bool SamFile::ReadBamIndex(const char * filename)</code>
 +
| bool: true if the bam index file was successfully read, false if not.
 +
Reads the specified bam index file.  It must be read prior to setting a read section, for seeking and reading portions of a bam file.
 
|-
 
|-
 
| <code>void SamFile::Close()</code>
 
| <code>void SamFile::Close()</code>
Line 32: Line 44:  
|-   
 
|-   
 
| <code>bool SamFile::ReadRecord(SamFileHeader& header, SamRecord& record)</code>
 
| <code>bool SamFile::ReadRecord(SamFileHeader& header, SamRecord& record)</code>
| Reads the next record from the file and stores it in the passed in record.
+
| Reads the next record from the file and stores it in the passed in record.
 +
 
 +
If it is an indexed BAM file and SetReadSection was called, only alignments in the section specified by SetReadSection are read.  If they all have already been read, this method returns false.
 +
 
 
Validates that the record is sorted according to the value set by <code>setSortedValidation</code>.  No sorting validation is done if specified to be unsorted, or <code>setSortedValidation</code> was never called.
 
Validates that the record is sorted according to the value set by <code>setSortedValidation</code>.  No sorting validation is done if specified to be unsorted, or <code>setSortedValidation</code> was never called.
   Line 43: Line 58:  
Returns false if the record was not properly sorted or not successfully written.  Returns true if properly sorted and successfully written.
 
Returns false if the record was not properly sorted or not successfully written.  Returns true if properly sorted and successfully written.
 
|-
 
|-
| <code>void setSortedValidation(SortedType sortType)</code>
+
| <code>void SamFile::setSortedValidation(SortedType sortType)</code>
 
| Set the flag to validate that the file is sorted as it is read/written.  Must be called after the file has been opened.
 
| Set the flag to validate that the file is sorted as it is read/written.  Must be called after the file has been opened.
 
sortType specifies the type of sort to be checked for.
 
sortType specifies the type of sort to be checked for.
 
|-
 
|-
| <code>uint32_t GetCurrentRecordCount()</code>
+
| <code>uint32_t SamFile::GetCurrentRecordCount()</code>
 
| Return the number of records that have been read/written so far.
 
| Return the number of records that have been read/written so far.
 
|-
 
|-
| <code>SamStatus::Status GetFailure()</code>
+
| <code>SamStatus::Status SamFile::GetFailure()</code>
 
| Get the type of failure that occurred on a method failure.
 
| Get the type of failure that occurred on a method failure.
 +
|-
 +
| <code>bool SamFile::SetReadSection(int32_t refID)</code>
 +
| Tell the class which reference ID should be read from the BAM file.  This is the index into the BAM Index list of reference information: 0 - #references.  The records for that reference id will be retrieved on each ReadRecord call.  When all records have been retrieved for the specified reference id, ReadRecord will return false until a new read section is set.
 +
Pass in -1 in order to read the section of the bam file not associated with any reference ID.
 +
Returns true if the read section was successfully set, false if not.  False is returned if the BAM Index File has not yet been read or if a BAM file is not open for reading.
 
|}
 
|}
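
The call order described in the table (read the index, set a read section, then call ReadRecord until it returns false) can be sketched as follows. This is only an illustration: <code>MockSamFile</code>, <code>MockRecord</code>, <code>countSection</code> and <code>demo</code> are hypothetical in-memory stand-ins invented here so that the sketch compiles without the library, and their simplified signatures are assumptions, not the real '''SamFile''' API. The point it shows is that the loop ends on the return value of ReadRecord rather than on IsEOF.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record type: just a reference ID and a read name.
struct MockRecord {
    int32_t refID;
    std::string name;
};

// Hypothetical in-memory stand-in for SamFile. Only the call order
// (ReadBamIndex, then SetReadSection, then ReadRecord) mirrors the table.
class MockSamFile {
public:
    explicit MockSamFile(std::vector<MockRecord> records)
        : myRecords(std::move(records)) {}

    // Stand-in for reading the index file; the mock has nothing to load.
    bool ReadBamIndex() { myIndexRead = true; return true; }

    // Fails unless the index was read first, as the table describes.
    bool SetReadSection(int32_t refID)
    {
        if (!myIndexRead) return false;
        mySection = refID;
        myNext = 0;
        return true;
    }

    // Returns the next record of the current section, or false once the
    // section is exhausted. (Simplification: the mock always filters.)
    bool ReadRecord(MockRecord& record)
    {
        while (myNext < myRecords.size()) {
            const MockRecord& candidate = myRecords[myNext++];
            if (candidate.refID == mySection) {
                record = candidate;
                return true;
            }
        }
        return false;
    }

private:
    std::vector<MockRecord> myRecords;
    std::size_t myNext = 0;
    int32_t mySection = -1;
    bool myIndexRead = false;
};

// Counts the records in one section, looping on ReadRecord's return value.
// Returns -1 if the read section could not be set.
int countSection(MockSamFile& file, int32_t refID)
{
    if (!file.SetReadSection(refID)) return -1;
    MockRecord record{};
    int count = 0;
    while (file.ReadRecord(record)) ++count;
    return count;
}

// Small demonstration on four in-memory records.
void demo()
{
    std::vector<MockRecord> records = {{0, "a"}, {0, "b"}, {1, "c"}, {-1, "d"}};
    MockSamFile file(records);
    assert(countSection(file, 0) == -1);  // index not read yet
    const bool indexRead = file.ReadBamIndex();
    assert(indexRead);
    (void)indexRead;
    assert(countSection(file, -1) == 1);  // records with no reference ID
    assert(countSection(file, 0) == 2);
    assert(countSection(file, 1) == 1);
}
```

With the real class, the same pattern would presumably use <code>ReadBamIndex(const char * filename)</code> and <code>ReadRecord(SamFileHeader& header, SamRecord& record)</code> as listed in the table.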
   Line 76: Line 96:  
=== Usage Example ===
 
=== Usage Example ===
 
The following example reads in a sam/bam file and writes it out as a sam/bam file.  The file format of the input sam/bam is determined by the SamFile class based on reading the type from the file.  The file format of the output sam/bam file is determined by the '''SamFile''' class based on the extension of the output file.  A ".bam" extension indicates a BAM file.  All other extensions indicate SAM files.
 
The following example reads in a sam/bam file and writes it out as a sam/bam file.  The file format of the input sam/bam is determined by the SamFile class based on reading the type from the file.  The file format of the output sam/bam file is determined by the '''SamFile''' class based on the extension of the output file.  A ".bam" extension indicates a BAM file.  All other extensions indicate SAM files.
 +
 
<source lang="cpp">
 
<source lang="cpp">
 
int main(int argc, char ** argv)
 
int main(int argc, char ** argv)
Line 149: Line 170:  
</source>
 
</source>
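
The extension rule described for the output file (a ".bam" extension means BAM, anything else means SAM) can be illustrated with a small helper. This is a minimal sketch: <code>hasBamExtension</code> is a hypothetical function written for this page, not part of the '''SamFile''' class, and the case-sensitive comparison is an assumption.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: true if the filename ends in ".bam".
// Assumption: the comparison is case-sensitive.
bool hasBamExtension(const std::string& filename)
{
    const std::string extension = ".bam";
    if (filename.size() < extension.size()) return false;
    return filename.compare(filename.size() - extension.size(),
                            extension.size(), extension) == 0;
}
```

Under this rule a name such as <code>out.bam</code> would be written as BAM, while <code>out.sam</code> or <code>out.txt</code> would be written as SAM.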
   −
== Reading Indexed (and Sorted) BAM Files ==
  −
The '''IndexedBamReader''' class allows a user to easily read BAM files that are sorted and indexed.
  −
This class allows a user to read only alignments for a specific reference sequence.  This saves the time of having to read the entire file.
  −
It takes advantage of the seeking capability of BGZF files, using the BAM Index file to determine where in the BAM file to seek to.
     −
'''Future Enhancements''': Add the ability to read alignments that match a given start, end position for a specific reference sequence.
  −
  −
=== Class Methods ===
  −
{| style="margin: 1em 1em 1em 0; background-color: #f9f9f9; border: 1px #aaa solid; border-collapse: collapse;" border="1"
  −
|-style="background: #f2f2f2; text-align: center;"  '''SamFile Class Methods'''
  −
! Method Name !!  Description
  −
|- 
  −
| <code>bool OpenForRead(const char* bamFilename, const char* bamIndexFilename)</code>
  −
| Opens the bam file for reading and reads in the corresponding index file.
  −
Returns true if successfully opened, false if not.
  −
|-
  −
| <code>bool SetReadSection(int32_t refID)</code>
  −
| Tell the class which reference ID should be read from the BAM file.  This is the index into the BAM Index list of reference information: 0 - #references.  The records for that reference id will be retrieved on each ReadRecord call.  When all records have been retrieved for the specified reference id, ReadRecord will return false until a new read section is set.
  −
Pass in -1 in order to read the section of the bam file not associated with any reference ID.
  −
Returns true if the read section was successfully set, false if not.
  −
|-
  −
| <code>bool ReadRecord(SamFileHeader& header, SamRecord& record)</code>
  −
| Reads the next record from the file and stores it in the passed in record.  Only alignments in the section specified by SetReadSection are read.  If they have all already been read, this method returns false.  If SetReadSection has not been called, then the entire file is read.
  −
Returns true if successfully read, false if not.
  −
|-
  −
| <code>bool IsEOF()</code>
  −
| While this is available, if you are only reading a specific reference sequence, you may never hit the end of the file, so be careful using this method.
  −
bool: true if the end of file has been reached, false if not.
  −
|-
  −
| <code>bool OpenForRead(const char* filename)</code>
  −
| This method exists, but does not do anything and just returns false.  You cannot open a file without an associated index file.
  −
Returns false.
  −
|-
  −
| <code>bool OpenForWrite(const char * filename)</code>
  −
| This method exists, but does not do anything and just returns false.  This class is only for reading.
  −
Returns false.
  −
|-
  −
| <code>void Close()</code>
  −
| Close the file if there is one open.
  −
|-
  −
| <code>bool ReadHeader(SamFileHeader& header)</code>
  −
| Reads the header section from the file and stores it in the passed in header.
  −
Returns true if successfully read, false if not.
  −
|-
  −
| <code>bool WriteHeader(SamFileHeader& header)</code>
  −
| This method exists, but does not do anything and just returns false.  This class is only for reading.
  −
Returns false.
  −
|-
  −
| <code>bool WriteRecord(SamFileHeader& header, SamRecord& record)</code>
  −
| This method exists, but does not do anything and just returns false.  This class is only for reading.
  −
Returns false.
  −
|}
  −
  −
  −
=== Usage Example ===
   
This example reads in the inputFilename bam file and writes it back out section by section to the specified outputFilename, starting with section -1.  It also prints a count of the number of records in each section.
 
This example reads in the inputFilename bam file and writes it back out section by section to the specified outputFilename, starting with section -1.  It also prints a count of the number of records in each section.
 
<source lang="cpp">
 
<source lang="cpp">
