Difference between revisions of "C++ Class: SamFile"

From Genome Analysis Wiki
Jump to navigationJump to search
Line 2: Line 2:
 
The '''SamFile''' class allows a user to easily read/write a SAM/BAM file.
 
The '''SamFile''' class allows a user to easily read/write a SAM/BAM file.
  
=== Reading/Writing Standard SAM/BAM Files ===
+
=== Class Methods ===
The methods found in this class are:
 
 
{| style="margin: 1em 1em 1em 0; background-color: #f9f9f9; border: 1px #aaa solid; border-collapse: collapse;" border="1"
 
{| style="margin: 1em 1em 1em 0; background-color: #f9f9f9; border: 1px #aaa solid; border-collapse: collapse;" border="1"
 
|-style="background: #f2f2f2; text-align: center;"  '''SamFile Class Methods'''
 
|-style="background: #f2f2f2; text-align: center;"  '''SamFile Class Methods'''
Line 42: Line 41:
  
  
==== Usage Example ====
+
=== Usage Example ===
 
The following example reads in a sam/bam file and writes it out as a sam/bam file.  The file format of the input sam/bam is determined by the SamFile class based on reading the type from the file.  The file format of the output sam/bam file is determined by the '''SamFile''' class based on the extension of the output file.  A ".bam" extension indicates a BAM file.  All other extensions indicate SAM files.
 
The following example reads in a sam/bam file and writes it out as a sam/bam file.  The file format of the input sam/bam is determined by the SamFile class based on reading the type from the file.  The file format of the output sam/bam file is determined by the '''SamFile''' class based on the extension of the output file.  A ".bam" extension indicates a BAM file.  All other extensions indicate SAM files.
 
<source lang="cpp">
 
<source lang="cpp">
Line 96: Line 95:
  
  
=== Reading Indexed (and Sorted) BAM Files ===
+
== Reading Indexed (and Sorted) BAM Files ==
 
The '''IndexedBamReader''' class allows a user to easily read BAM files that are sorted and indexed.
 
The '''IndexedBamReader''' class allows a user to easily read BAM files that are sorted and indexed.
 
This class allows a user to read only alignments for specific reference sequence.  This saves the time of having to read the entire file.
 
This class allows a user to read only alignments for specific reference sequence.  This saves the time of having to read the entire file.
Line 102: Line 101:
  
 
'''Future Enhancements''': Add the ability to read alignments that match a given start, end position for a specific reference sequence.
 
'''Future Enhancements''': Add the ability to read alignments that match a given start, end position for a specific reference sequence.
The methods found in this class are:
+
 
{| class="wikitable" style="width:100%" border="1"
+
=== Class Methods ===
|+ style="font-size:150%"|'''SamFile Class Methods'''
+
{| style="margin: 1em 1em 1em 0; background-color: #f9f9f9; border: 1px #aaa solid; border-collapse: collapse;" border="1"
! width=""|Method Name
+
|-style="background: #f2f2f2; text-align: center;" '''SamFile Class Methods'''
width=""|Description
+
! Method Name !!  Description
 
|-   
 
|-   
| bool OpenForRead(const char* bamFilename, const char* bamIndexFilename)
+
| <code>bool OpenForRead(const char* bamFilename, const char* bamIndexFilename)</code>
 
| Opens the bam file for reading and reads in the corresponding index file.
 
| Opens the bam file for reading and reads in the corresponding index file.
 
Returns true if successfully opened, false if not.
 
Returns true if successfully opened, false if not.
 
|-
 
|-
| bool SetReadSection(int32_t refID)
+
| <code>bool SetReadSection(int32_t refID)</code>
 
| Tell the class which reference ID should be read from the BAM file.  This is the index into the BAM Index list of reference information: 0 - #references.  The records for that reference id will be retrieved on each ReadRecord call.  When all records have been retrieved for the specified reference id, ReadRecord will return false until a new read section is set.
 
| Tell the class which reference ID should be read from the BAM file.  This is the index into the BAM Index list of reference information: 0 - #references.  The records for that reference id will be retrieved on each ReadRecord call.  When all records have been retrieved for the specified reference id, ReadRecord will return false until a new read section is set.
 
Pass in -1 in order to read the section of the bam file not associated with any reference ID.
 
Pass in -1 in order to read the section of the bam file not associated with any reference ID.
 
Returns true if the read section was successfully set, false if not.
 
Returns true if the read section was successfully set, false if not.
 
|-
 
|-
| bool ReadRecord(SamFileHeader& header, SamRecord& record)
+
| <code>bool ReadRecord(SamFileHeader& header, SamRecord& record)</code>
 
| Reads the next record from the file and stores it in the passed in record.  Only alignments in the section specified by SetReadSection are read.  If they have all already been read, this method returns false.  If SetReadSection has not been called, then the entire file is read.
 
| Reads the next record from the file and stores it in the passed in record.  Only alignments in the section specified by SetReadSection are read.  If they have all already been read, this method returns false.  If SetReadSection has not been called, then the entire file is read.
 
Returns true if successfully read, false if not.
 
Returns true if successfully read, false if not.
 
|-
 
|-
| bool IsEOF()
+
| <code>bool IsEOF()</code>
 
| While this is available, if you are only reading a specific reference sequence, you may never hit the end of the file, so be careful using this method.
 
| While this is available, if you are only reading a specific reference sequence, you may never hit the end of the file, so be careful using this method.
 
bool: true if the end of file has been reached, false if not.
 
bool: true if the end of file has been reached, false if not.
 
|-
 
|-
| bool OpenForRead(const char* filename)
+
| <code>bool OpenForRead(const char* filename)</code>
 
| This method exists, but does not do anything and just returns false.  You cannot open a file without an associated index file.
 
| This method exists, but does not do anything and just returns false.  You cannot open a file without an associated index file.
 
Returns false.
 
Returns false.
 
|-
 
|-
| bool OpenForWrite(const char * filename)
+
| <code>bool OpenForWrite(const char * filename)</code>
 
| This method exists, but does not do anything and just returns false.  This class is only for reading.
 
| This method exists, but does not do anything and just returns false.  This class is only for reading.
 
Returns false.
 
Returns false.
 
|-
 
|-
| void Close()
+
| <code>void Close()</code>
 
| Close the file if there is one open.
 
| Close the file if there is one open.
 
|-
 
|-
| bool ReadHeader(SamFileHeader& header)
+
| <code>bool ReadHeader(SamFileHeader& header)</code>
 
| Reads the header section from the file and stores it in the passed in header.
 
| Reads the header section from the file and stores it in the passed in header.
 
Returns true if successfully read, false if not.
 
Returns true if successfully read, false if not.
 
|-  
 
|-  
| bool WriteHeader(SamFileHeader& header)
+
| <code>bool WriteHeader(SamFileHeader& header)</code>
 
| This method exists, but does not do anything and just returns false.  This class is only for reading.
 
| This method exists, but does not do anything and just returns false.  This class is only for reading.
 
Returns false.
 
Returns false.
 
|-  
 
|-  
| bool WriteRecord(SamFileHeader& header, SamRecord& record)
+
| <code>bool WriteRecord(SamFileHeader& header, SamRecord& record)</code>
 
| This method exists, but does not do anything and just returns false.  This class is only for reading.
 
| This method exists, but does not do anything and just returns false.  This class is only for reading.
 
Returns false.
 
Returns false.

Revision as of 10:04, 6 April 2010

Reading/Writing SAM/BAM Files In Your Program

The SamFile class allows a user to easily read/write a SAM/BAM file.

Class Methods

Method Name Description
bool SamFile::IsEOF() bool: true if the end of file has been reached, false if not.
bool SamFile::OpenForRead(const char* filename) Opens the specified file for reading.

Determines if it is a BAM/SAM file by reading the beginning of the file. Returns true if successfully opened reading, false if not.

bool SamFile::OpenForWrite(const char * filename) bool: true if successfully opened, false if not.

Opens as BAM file if the specified filename ends in .bam. Otherwise it is opened as a SAM file. Returns true if successfully opened for writing, false if not.

void SamFile::Close() Close the file if there is one open.
bool SamFile::ReadHeader(SamFileHeader& header) Reads the header section from the file and stores it in the passed in header.

Returns true if successfully read, false if not.

bool SamFile::WriteHeader(SamFileHeader& header) Writes the specified header into the file.

Returns true if successfully written, false if not.

bool SamFile::ReadRecord(SamFileHeader& header, SamRecord& record) Reads the next record from the file and stores it in the passed in record.

Returns true if successfully read, false if not.

bool SamFile::WriteRecord(SamFileHeader& header, SamRecord& record) Writes the specified record into the file.

Returns true if successfully written, false if not.


Usage Example

The following example reads in a sam/bam file and writes it out as a sam/bam file. The file format of the input sam/bam is determined by the SamFile class based on reading the type from the file. The file format of the output sam/bam file is determined by the SamFile class based on the extension of the output file. A ".bam" extension indicates a BAM file. All other extensions indicate SAM files.

int main(int argc, char ** argv)
{
   if(argc != 3)
   {
      printf("./bam <inputFile> <outputFile.sam/bam>\n");
      exit(-1);
   }


   SamFile samIn;
      
   samIn.OpenForRead(argv[1]);

   SamFile samOut;

   samOut.OpenForWrite(argv[2]);

   // Read the sam header.
   SamFileHeader samHeader;
   samIn.ReadHeader(samHeader);

   samOut.WriteHeader(samHeader);

   // Read the first sam record.
   SamRecord samRecord;

   // Keep reading records until the end of the file is reached.
   int numValidRecords = 0;
   int numRecords = 0;
   while(!samIn.IsEOF())
   {
      if(samIn.ReadRecord(samHeader, samRecord) == true)
      {
         numRecords++;
         // Successfully read a record from the file, so check to see
         // if it is valid.
         if(samRecord.isValid(samHeader))
         {
            //  It is valid, so write it.
            numValidRecords++;
            samOut.WriteRecord(samHeader, samRecord);
         }
      }
      
   }
   std::cout << "Number of records = " << numRecords << std::endl;
   std::cout << "Number of valid records = " << numValidRecords << std::endl;
}


Reading Indexed (and Sorted) BAM Files

The IndexedBamReader class allows a user to easily read BAM files that are sorted and indexed. This class allows a user to read only alignments for specific reference sequence. This saves the time of having to read the entire file. It takes advantage of the seeking capability of BGZF files, using the BAM Index file to determine where in the BAM file to seek to.

Future Enhancements: Add the ability to read alignments that match a given start, end position for a specific reference sequence.

Class Methods

Method Name Description
bool OpenForRead(const char* bamFilename, const char* bamIndexFilename) Opens the bam file for reading and reads in the corresponding index file.

Returns true if successfully opened, false if not.

bool SetReadSection(int32_t refID) Tell the class which reference ID should be read from the BAM file. This is the index into the BAM Index list of reference information: 0 - #references. The records for that reference id will be retrieved on each ReadRecord call. When all records have been retrieved for the specified reference id, ReadRecord will return false until a new read section is set.

Pass in -1 in order to read the section of the bam file not associated with any reference ID. Returns true if the read section was successfully set, false if not.

bool ReadRecord(SamFileHeader& header, SamRecord& record) Reads the next record from the file and stores it in the passed in record. Only alignments in the section specified by SetReadSection are read. If they have all already been read, this method returns false. If SetReadSection has not been called, then the entire file is read.

Returns true if successfully read, false if not.

bool IsEOF() While this is available, if you are only reading a specific reference sequence, you may never hit the end of the file, so be careful using this method.

bool: true if the end of file has been reached, false if not.

bool OpenForRead(const char* filename) This method exists, but does not do anything and just returns false. You cannot open a file without an associated index file.

Returns false.

bool OpenForWrite(const char * filename) This method exists, but does not do anything and just returns false. This class is only for reading.

Returns false.

void Close() Close the file if there is one open.
bool ReadHeader(SamFileHeader& header) Reads the header section from the file and stores it in the passed in header.

Returns true if successfully read, false if not.

bool WriteHeader(SamFileHeader& header) This method exists, but does not do anything and just returns false. This class is only for reading.

Returns false.

bool WriteRecord(SamFileHeader& header, SamRecord& record) This method exists, but does not do anything and just returns false. This class is only for reading.

Returns false.


Usage Example

This example reads in the inputFilename bam file and writes it back out section by section to the specified outputFilename, starting with section -1. It also prints a count of the number of records in each section.

int ReadIndexedBam(const char* inputFilename,
                   const char* outputFilename,
                   const char* indexFilename)
{
   IndexedBamReader samIn;
      
   samIn.OpenForRead(inputFilename, indexFilename);

   SamFile samOut;

   samOut.OpenForWrite(outputFilename);

   // Read the sam header.
   SamFileHeader samHeader;
   samIn.ReadHeader(samHeader);
   // Write the sam header.
   samOut.WriteHeader(samHeader);

   SamRecord samRecord;
   
   int numValidRecords = 0;
   int numRecords = 0;

   // Loop through each Reference.
   for(int i = -1; i < 23; i++)
   {
      int numSectionRecords = 0;
      samIn.SetReadSection(i);
      // Keep reading records until they aren't read successfully.
      while(samIn.ReadRecord(samHeader, samRecord) == true)
      {
         numSectionRecords++;
         numRecords++;
         // Successfully read a record from the file, so check to see
         // if it is valid.
         if(samRecord.isValid(samHeader))
         {
            //  It is valid, so write it.
            numValidRecords++;
            samOut.WriteRecord(samHeader, samRecord);
         }
      }
      std::cout << "Reference ID " << i << " has " << numSectionRecords 
                << " records" << std::endl;
   }
      
   std::cout << "Number of records = " << numRecords << std::endl;
   std::cout << "Number of valid records = " << numValidRecords << std::endl;

   return(0);
}