LibStatGen: BAM
SAM/BAM File
Reading/Writing SAM/BAM Files
The SamFile class allows a user to easily read/write a SAM/BAM file. The methods found in this class are:
Method Name | Description |
---|---|
bool OpenForRead(const char* filename) | Opens the specified file for reading. Determines whether it is a BAM or SAM file by reading the beginning of the file. Returns true if successfully opened for reading, false if not. |
bool OpenForWrite(const char* filename) | Opens the specified file for writing. Opens it as a BAM file if the filename ends in .bam; otherwise opens it as a SAM file. Returns true if successfully opened for writing, false if not. |
bool ReadHeader(SamFileHeader& header) | Reads the header section from the file and stores it in the passed-in header. Returns true if successfully read, false if not. |
bool WriteHeader(const SamFileHeader& header) | Writes the specified header into the file. Returns true if successfully written, false if not. |
bool ReadRecord(SamFileHeader& header, SamRecord& record) | Reads the next record from the file and stores it in the passed-in record. Returns true if successfully read, false if not. |
bool WriteRecord(SamFileHeader& header, SamRecord& record) | Writes the specified record into the file. Returns true if successfully written, false if not. |
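The page does not say how OpenForRead inspects the beginning of the file. As a minimal sketch of one way such a check could work (an assumption, not the library's code): BAM files are BGZF-compressed, so they start with the gzip magic bytes 0x1f 0x8b, while SAM files are plain text. The helper name `looksLikeBam` is hypothetical.

```cpp
#include <cstddef>

// Hypothetical helper, not part of SamFile: guesses the format from the
// first bytes of a file. BAM files are BGZF (gzip-compatible) streams and
// therefore begin with the bytes 0x1f 0x8b; SAM files are plain text.
bool looksLikeBam(const unsigned char* buffer, std::size_t length)
{
    return length >= 2 && buffer[0] == 0x1f && buffer[1] == 0x8b;
}
```

Note that a gzip-compressed SAM file would also pass this check, so a stricter test would have to decompress the first block and look for the BAM magic string inside it.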
Usage Example
The following example reads in a SAM/BAM file and writes it out as a SAM/BAM file. The SamFile class determines the format of the input file by reading the type from the file itself, and the format of the output file from its extension: a ".bam" extension indicates a BAM file, and all other extensions indicate SAM files.
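    #include <stdio.h>
    #include <stdlib.h>
    // Header name assumed; it should declare SamFile, SamFileHeader, and SamRecord.
    #include "SamFile.h"

    int main(int argc, char ** argv)
    {
        if(argc != 3)
        {
            printf("./bam <inputFile> <outputFile.sam/bam>\n");
            exit(-1);
        }

        // Open the input file; the format is detected from its contents.
        SamFile samIn;
        if(!samIn.OpenForRead(argv[1]))
        {
            printf("Failed to open %s for reading\n", argv[1]);
            exit(-1);
        }

        // Open the output file; the format is chosen from its extension.
        SamFile samOut;
        if(!samOut.OpenForWrite(argv[2]))
        {
            printf("Failed to open %s for writing\n", argv[2]);
            exit(-1);
        }

        // Read the sam header and copy it to the output.
        SamFileHeader samHeader;
        samIn.ReadHeader(samHeader);
        samOut.WriteHeader(samHeader);

        // Record object that is reused for every read.
        SamRecord samRecord;

        // Keep reading records until it fails.
        int recordCount = 0;
        while (samIn.ReadRecord(samHeader, samRecord) == true)
        {
            recordCount++;
            samOut.WriteRecord(samHeader, samRecord);
        }

        printf("RecordCount = %d\n", recordCount);
    }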
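The output-format rule described above (a ".bam" extension means BAM, anything else means SAM) can be sketched as a small standalone check. This is an illustration only, not the code used by SamFile; the helper name `hasBamExtension` is hypothetical, and the comparison is assumed to be case-sensitive because the page does not say otherwise.

```cpp
#include <string>

// Hypothetical helper, not part of SamFile: returns true when the filename
// ends in ".bam" (case-sensitive, which is an assumption).
bool hasBamExtension(const std::string& filename)
{
    const std::string suffix = ".bam";
    return filename.size() >= suffix.size() &&
           filename.compare(filename.size() - suffix.size(),
                            suffix.size(), suffix) == 0;
}
```

Under this rule, `hasBamExtension("out.bam")` is true, while `"out.sam"` and `"out.bam.txt"` would both be written as SAM.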
Setting fields in a SAM/BAM Header
The SamFileHeader class contains accessors to set the header lines of a SAM/BAM header. Once the header has been set up with these set methods, its values can be pulled back out using the get accessors, or the header can later be written to a SAM/BAM file. The methods found in the SamFileHeader class for setting fields are:
Method Name | Description |
---|---|
bool addHeaderLine(const char* type, const char* tag, int value) | Adds the type, tag, and integer value to the header. Returns true if successfully added, false if not. |
bool addHeaderLine(const char* type, const char* tag, const char* value) | Adds the type, tag, and const char* value to the header. Returns true if successfully added, false if not. |
bool addHeaderLine(const char* headerLine) | Adds the already setup/formatted headerLine to the header. It is assumed that the line does not contain a "\n". Returns true if successfully added, false if not. |
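For orientation, a SAM header line with a single tag has the form `@TYPE<TAB>TAG:VALUE`. The sketch below shows how a type, tag, and value combine into such a line, which is also the kind of pre-formatted string the single-argument addHeaderLine expects (without a trailing newline). The function `formatHeaderLine` is hypothetical and is not the library's implementation; it assumes the type is passed without the leading "@", which this page does not specify.

```cpp
#include <string>

// Hypothetical helper: builds a single-tag SAM header line from a record
// type code (e.g. "SQ"), a tag (e.g. "SN"), and a string value.
// Assumes the type is given without the leading '@'.
std::string formatHeaderLine(const std::string& type,
                             const std::string& tag,
                             const std::string& value)
{
    return "@" + type + "\t" + tag + ":" + value;
}

// Overload for integer values, mirroring the two addHeaderLine variants.
std::string formatHeaderLine(const std::string& type,
                             const std::string& tag,
                             int value)
{
    return formatHeaderLine(type, tag, std::to_string(value));
}
```

For example, `formatHeaderLine("SQ", "LN", 1000)` yields `@SQ`, a tab, and `LN:1000`.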
Setting fields in a SAM/BAM Record
The SamRecord class contains accessors to set the fields of a SAM/BAM record. They are used for creating a record that is not read from a SAM/BAM file. Once the record has been set up with these set methods, its values can be pulled back out using the get accessors, or the record can later be written as either a SAM or a BAM record. The methods found in the SamRecord class for setting fields are:
Method Name | Description |
---|---|
bool setReadName(const char* readName) | Sets QNAME to the passed-in name. Returns true if successfully set, false if not. |
bool setFlag(int flag) | Sets the bitwise FLAG to the passed-in value. Returns true if successfully set, false if not. |
bool setReferenceID(int referenceID) | Sets the reference sequence id. The reference name is not currently stored; it has to be looked up in the header (which is done when writing a SAM file). This is an opportunity for improvement. Returns true if successfully set, false if not. |
bool set1BasedPosition(int position) | Sets the leftmost position. The value passed in is 1-based (SAM formatted). Internal processing handles switching between SAM/BAM formats when read/written. Returns true if successfully set, false if not. |
bool set0BasedPosition(int position) | Sets the leftmost position. The value passed in is 0-based (BAM formatted). Internal processing handles switching between SAM/BAM formats when read/written. Returns true if successfully set, false if not. |
bool setMapQuality(int mapQuality) | Sets the mapping quality. Returns true if successfully set, false if not. |
bool setCigar(const char* cigar) | Sets the CIGAR string to the passed-in value. This is a SAM formatted CIGAR string. Internal processing handles switching between SAM/BAM formats when read/written. Returns true if successfully set, false if not. |
bool setMateReferenceID(int mateReferenceID) | Sets the mate reference sequence id. The mate reference name is not currently stored; it has to be looked up in the header (which is done when writing a SAM file). This is an opportunity for improvement. Returns true if successfully set, false if not. |
bool set1BasedMatePosition(int matePosition) | Sets the leftmost mate position. The value passed in is 1-based (SAM formatted). Internal processing handles switching between SAM/BAM formats when read/written. Returns true if successfully set, false if not. |
bool set0BasedMatePosition(int matePosition) | Sets the leftmost mate position. The value passed in is 0-based (BAM formatted). Internal processing handles switching between SAM/BAM formats when read/written. Returns true if successfully set, false if not. |
bool setInsertSize(int insertSize) | Sets the inferred insert size. Returns true if successfully set, false if not. |
bool setSequence(const char* seq) | Sets the sequence string to the passed-in string. This is a SAM formatted sequence string. Internal processing handles switching between SAM/BAM formats when read/written. Returns true if successfully set, false if not. |
bool setQuality(const char* quality) | Sets the quality string to the passed-in string. This is a SAM formatted quality string. Internal processing handles switching between SAM/BAM formats when read/written. Returns true if successfully set, false if not. |
bool addTag(const char* tag, char vtype, const char* value) | Adds a tag to the record with the specified tag, vtype, and value. The vtype can be either a SAM or a BAM vtype. Internal processing handles switching between SAM/BAM vtypes when read/written. Returns true if successfully added, false if not. |
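Several setters above mention that internal processing switches between the SAM and BAM representations. The sketch below illustrates what two of those conversions look like according to the SAM/BAM format specification: positions are 1-based in SAM and 0-based in BAM, and each CIGAR operation is packed in BAM as `length << 4 | op`, with the operations `MIDNSHP=X` numbered 0 through 8. The function names are hypothetical and this is not the library's internal code.

```cpp
#include <cctype>
#include <cstdint>
#include <string>
#include <vector>

// Convert between the 1-based SAM coordinate and the 0-based BAM coordinate.
int samToBamPosition(int oneBased) { return oneBased - 1; }
int bamToSamPosition(int zeroBased) { return zeroBased + 1; }

// Hypothetical helper: encodes a SAM CIGAR string such as "3M1I2D" into the
// packed BAM form, one 32-bit value per operation (length << 4 | op code).
// Returns false and leaves 'encoded' empty if the string is malformed.
bool encodeCigar(const std::string& cigar, std::vector<std::uint32_t>& encoded)
{
    static const std::string ops = "MIDNSHP=X";  // op codes 0..8
    encoded.clear();
    if (cigar == "*")
    {
        return true;  // "*" means the CIGAR is unavailable: zero operations
    }

    std::uint32_t length = 0;
    bool haveLength = false;
    for (char c : cigar)
    {
        if (std::isdigit(static_cast<unsigned char>(c)))
        {
            // Accumulate the decimal operation length.
            length = length * 10 + static_cast<std::uint32_t>(c - '0');
            haveLength = true;
            continue;
        }
        const std::string::size_type op = ops.find(c);
        if (op == std::string::npos || !haveLength)
        {
            encoded.clear();
            return false;  // unknown operation or missing length
        }
        encoded.push_back((length << 4) | static_cast<std::uint32_t>(op));
        length = 0;
        haveLength = false;
    }
    if (haveLength)
    {
        encoded.clear();
        return false;  // trailing digits with no operation
    }
    return true;
}
```

With this encoding, "3M1I2D" becomes the three values 48, 17, and 34. The sketch does not guard against lengths that overflow the 28 bits BAM reserves for them.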
Retrieving fields from a SAM/BAM Record
The SamRecord class contains accessors to access the fields of a SAM/BAM record. They assume that the class has already been populated, either by using the set commands or by calling SamFile::ReadRecord.