LibStatGen: BAM

From Genome Analysis Wiki
Revision as of 14:58, 16 March 2010 by Mktrost (talk | contribs)

SAM/BAM File

Reading/Writing SAM/BAM Files

The SamFile class allows a user to easily read and write SAM/BAM files. The methods found in this class are:

SamFile Class Methods
Method Name Description
bool OpenForRead(const char* filename) Opens the specified file for reading.

Determines whether it is a BAM or SAM file by reading the beginning of the file. Returns true if successfully opened for reading, false if not.

bool OpenForWrite(const char * filename) Opens the specified file for writing.

Opens as BAM file if the specified filename ends in .bam. Otherwise it is opened as a SAM file. Returns true if successfully opened for writing, false if not.

bool ReadHeader(SamFileHeader& header) Reads the header section from the file and stores it in the passed in header.

Returns true if successfully read, false if not.

bool WriteHeader(const SamFileHeader& header) Writes the specified header into the file.

Returns true if successfully written, false if not.

bool ReadRecord(SamFileHeader& header, SamRecord& record) Reads the next record from the file and stores it in the passed in record.

Returns true if successfully read, false if not.

bool WriteRecord(SamFileHeader& header, SamRecord& record) Writes the specified record into the file.

Returns true if successfully written, false if not.
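
The two format rules described for OpenForRead and OpenForWrite can be sketched as below. This is only an illustration, not the SamFile implementation: both helper names are hypothetical, and the content check assumes that looking for the gzip magic bytes (0x1f 0x8b) is enough, which holds for BAM because BAM files are BGZF-compressed, but would also match a gzip-compressed SAM file.

```cpp
#include <string>

// Hypothetical helper: true when the filename ends in ".bam" (case-sensitive),
// mirroring the rule described for OpenForWrite.
bool hasBamExtension(const std::string& filename)
{
    const std::string ext = ".bam";
    return filename.size() >= ext.size() &&
           filename.compare(filename.size() - ext.size(), ext.size(), ext) == 0;
}

// Hypothetical helper: true when the first bytes of a file are the gzip
// magic number. `head` holds the leading bytes read from the file.
bool startsWithGzipMagic(const std::string& head)
{
    return head.size() >= 2 &&
           static_cast<unsigned char>(head[0]) == 0x1f &&
           static_cast<unsigned char>(head[1]) == 0x8b;
}
```

A caller would pass the output filename to hasBamExtension, and the first few bytes of the input file to startsWithGzipMagic.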

Usage Example

The following example reads in a sam/bam file and writes it out as a sam/bam file. The file format of the input sam/bam is determined by the SamFile class based on reading the type from the file. The file format of the output sam/bam file is determined by the SamFile class based on the extension of the output file. A ".bam" extension indicates a BAM file. All other extensions indicate SAM files.

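#include <stdio.h>
#include <stdlib.h>
#include "SamFile.h"   // Assumed header name; expected to declare SamFile, SamFileHeader and SamRecord.

int main(int argc, char ** argv)
{
   if(argc != 3)
   {
      printf("./bam <inputFile> <outputFile.sam/bam>\n");
      exit(-1);
   }

   // Open the input file; SAM or BAM is detected from the file contents.
   SamFile samIn;
   if(!samIn.OpenForRead(argv[1]))
   {
      fprintf(stderr, "Failed to open %s for reading\n", argv[1]);
      exit(-1);
   }

   // Open the output file; a ".bam" extension selects BAM, anything else SAM.
   SamFile samOut;
   if(!samOut.OpenForWrite(argv[2]))
   {
      fprintf(stderr, "Failed to open %s for writing\n", argv[2]);
      exit(-1);
   }

   // Read the sam header and copy it to the output file.
   SamFileHeader samHeader;
   if(!samIn.ReadHeader(samHeader) || !samOut.WriteHeader(samHeader))
   {
      fprintf(stderr, "Failed to copy the header\n");
      exit(-1);
   }

   // Record object that is reused for every read.
   SamRecord samRecord;

   // Keep reading records until it fails.
   int recordCount = 0;
   while(samIn.ReadRecord(samHeader, samRecord))
   {
      recordCount++;
      samOut.WriteRecord(samHeader, samRecord);
   }
   printf("RecordCount = %d\n", recordCount);
   return 0;
}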


Setting fields in a SAM/BAM Header

The SamFileHeader class contains accessors to set the header lines of a SAM/BAM header. Once the header has been set up with these set methods, its values can be pulled back out using the get accessors, or the header can later be written to a SAM/BAM file. The methods found in the SamFileHeader class for setting fields are:

SamFileHeader Class Methods
Method Name Description
bool addHeaderLine(const char* type, const char* tag, int value) Adds the type, tag, and integer value to the header.

Returns true if successfully added, false if not.

bool addHeaderLine(const char* type, const char* tag, const char* value) Adds the type, tag, and const char* value to the header.

Returns true if successfully added, false if not.

bool addHeaderLine(const char* headerLine) Adds the already setup/formatted headerLine to the header. It is assumed that the line does not contain a “\n”.

Returns true if successfully added, false if not.
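
To show what a type, tag, and value combine into, here is a minimal sketch of a formatter for a single-tag header line in the SAM text layout (@TYPE, a tab, then TAG:VALUE). The function formatHeaderLine is hypothetical and not part of SamFileHeader, and it assumes the type is given without the leading "@"; the class itself may expect something different.

```cpp
#include <string>

// Hypothetical helper, not part of SamFileHeader: builds "@TYPE\tTAG:VALUE".
// Assumes `type` is the two-letter record type without the leading '@'.
std::string formatHeaderLine(const std::string& type,
                             const std::string& tag,
                             const std::string& value)
{
    return "@" + type + "\t" + tag + ":" + value;
}

// Integer overload, mirroring the int variant of addHeaderLine.
std::string formatHeaderLine(const std::string& type,
                             const std::string& tag,
                             int value)
{
    return formatHeaderLine(type, tag, std::to_string(value));
}
```

The result contains no trailing newline, which matches the form that addHeaderLine(const char* headerLine) is described as accepting.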


Setting fields in a SAM/BAM Record

The SamRecord class contains accessors to set the fields of a SAM/BAM record. They are used for creating a record that is not read from a SAM/BAM file. Once the record has been set up with these set methods, its fields can be pulled back out using the get accessors, or the record can later be written as either a SAM or a BAM record. The methods found in the SamRecord class for setting fields are:

SamRecord Class Methods
Method Name Description
bool setReadName(const char* readName) Sets QNAME to the passed in name.

Returns true if successfully set, false if not.

bool setFlag(int flag) Sets the bitwise FLAG to the passed in value.

Returns true if successfully set, false if not.

bool setReferenceID(int referenceID) Sets the reference sequence id. The reference name is not currently stored; it has to be looked up in the header using this id (which is done when writing a SAM file). This is an opportunity for improvement.

Returns true if successfully set, false if not.

bool set1BasedPosition(int position) Sets the leftmost position. The value passed in is 1-based (SAM formatted). Internal processing handles switching between SAM/BAM formats when read/written.

Returns true if successfully set, false if not.

bool set0BasedPosition(int position) Sets the leftmost position. The value passed in is 0-based (BAM formatted). Internal processing handles switching between SAM/BAM formats when read/written.

Returns true if successfully set, false if not.

bool setMapQuality(int mapQuality) Sets the mapping quality.

Returns true if successfully set, false if not.

bool setCigar(const char* cigar) Sets the cigar string to the passed in CIGAR. This is a SAM formatted CIGAR string. Internal processing handles switching between SAM/BAM formats when read/written.

Returns true if successfully set, false if not.

bool setMateReferenceID(int mateReferenceID) Sets the mate reference sequence id. The mate reference name is not currently stored; it has to be looked up in the header using this id (which is done when writing a SAM file). This is an opportunity for improvement.

Returns true if successfully set, false if not.

bool set1BasedMatePosition(int matePosition) Sets the leftmost mate position. The value passed in is 1-based (SAM formatted). Internal processing handles switching between SAM/BAM formats when read/written.

Returns true if successfully set, false if not.

bool set0BasedMatePosition(int matePosition) Sets the leftmost mate position. The value passed in is 0-based (BAM formatted). Internal processing handles switching between SAM/BAM formats when read/written.

Returns true if successfully set, false if not.

bool setInsertSize(int insertSize) Sets the inferred insert size.

Returns true if successfully set, false if not.

bool setSequence(const char* seq) Sets the sequence string to the passed in string. This is a SAM formatted sequence string. Internal processing handles switching between SAM/BAM formats when read/written.

Returns true if successfully set, false if not.

bool setQuality(const char* quality) Sets the quality string to the passed in string. This is a SAM formatted quality string. Internal processing handles switching between SAM/BAM formats when read/written.

Returns true if successfully set, false if not.

bool addTag(const char* tag, char vtype, const char* value) Adds a tag to the record with the specified tag, vtype, and value. The vtype can be either a SAM or a BAM vtype. Internal processing handles switching between SAM/BAM vtypes when read/written.

Returns true if successfully set, false if not.
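
Several setters above mention that internal processing switches between the SAM and BAM representations. The sketch below illustrates two such conversions as laid out in the SAM specification: the 1-based/0-based position shift, and packing a SAM CIGAR string into BAM's 32-bit operations (length in the upper 28 bits, operation code in the lower 4). All function names are hypothetical, and this is not the SamRecord implementation.

```cpp
#include <cctype>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// SAM text positions are 1-based; BAM binary positions are 0-based.
// An unset SAM position (0) therefore maps to -1 in BAM.
int toZeroBased(int oneBasedPosition) { return oneBasedPosition - 1; }
int toOneBased(int zeroBasedPosition) { return zeroBasedPosition + 1; }

// Hypothetical helper: converts a SAM CIGAR string such as "10M2I5D" into
// BAM-style packed operations, each stored as (length << 4) | opCode.
std::vector<std::uint32_t> encodeCigar(const std::string& cigar)
{
    // Operation codes 0..8 follow the order of this string.
    static const std::string ops = "MIDNSHP=X";
    std::vector<std::uint32_t> encoded;
    if (cigar == "*") return encoded;  // "*" means no CIGAR is available.

    std::uint32_t length = 0;
    bool haveLength = false;
    for (char c : cigar) {
        if (std::isdigit(static_cast<unsigned char>(c))) {
            length = length * 10 + static_cast<std::uint32_t>(c - '0');
            haveLength = true;
        } else {
            const std::string::size_type op = ops.find(c);
            if (op == std::string::npos || !haveLength)
                throw std::invalid_argument("malformed CIGAR: " + cigar);
            encoded.push_back((length << 4) | static_cast<std::uint32_t>(op));
            length = 0;
            haveLength = false;
        }
    }
    // Digits with no operation after them are an error.
    if (haveLength) throw std::invalid_argument("malformed CIGAR: " + cigar);
    return encoded;
}
```

For example, "10M2I5D" packs into three values, one per operation, while set1BasedPosition(100) and set0BasedPosition(99) would describe the same location.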

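The value passed to setFlag is a bitwise combination of the flag bits defined by the SAM specification. The sketch below names those bits and tests for them; the enum and the helper hasFlag are hypothetical names chosen for illustration and are not taken from the SamRecord class.

```cpp
// Flag bits as defined by the SAM specification.
// The enumerator names are illustrative only.
enum SamFlagBit {
    SAM_FLAG_PAIRED         = 0x1,
    SAM_FLAG_PROPER_PAIR    = 0x2,
    SAM_FLAG_UNMAPPED       = 0x4,
    SAM_FLAG_MATE_UNMAPPED  = 0x8,
    SAM_FLAG_REVERSE        = 0x10,
    SAM_FLAG_MATE_REVERSE   = 0x20,
    SAM_FLAG_FIRST_IN_PAIR  = 0x40,
    SAM_FLAG_SECOND_IN_PAIR = 0x80,
    SAM_FLAG_SECONDARY      = 0x100,
    SAM_FLAG_QC_FAIL        = 0x200,
    SAM_FLAG_DUPLICATE      = 0x400
};

// Hypothetical helper: true when the given bit is set in a FLAG value.
inline bool hasFlag(int flag, SamFlagBit bit)
{
    return (flag & bit) != 0;
}
```

A properly paired, reverse-strand first read would combine SAM_FLAG_PAIRED, SAM_FLAG_PROPER_PAIR, SAM_FLAG_REVERSE and SAM_FLAG_FIRST_IN_PAIR with bitwise OR before the result is passed to setFlag.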

Retrieving fields from a SAM/BAM Record

The SamRecord class contains accessors to access the fields of a SAM/BAM record. They assume that the class has already been populated, either by using the set commands or by calling SamFile::ReadRecord.