Difference between revisions of "LibStatGen: BAM"
Revision as of 17:08, 16 March 2010
SAM/BAM File
Reading/Writing SAM/BAM Files
The SamFile class allows a user to easily read/write a SAM/BAM file. The methods found in this class are:
Method Name | Description |
---|---|
bool OpenForRead(const char* filename) | Opens the specified file for reading. Determines whether it is a BAM or a SAM file by reading the beginning of the file. Returns true if successfully opened for reading, false if not. |
bool OpenForWrite(const char* filename) | Opens the specified file for writing. Opens it as a BAM file if the specified filename ends in .bam; otherwise it is opened as a SAM file. Returns true if successfully opened for writing, false if not. |
bool ReadHeader(SamFileHeader& header) | Reads the header section from the file and stores it in the passed-in header. Returns true if successfully read, false if not. |
bool WriteHeader(const SamFileHeader& header) | Writes the specified header into the file. Returns true if successfully written, false if not. |
bool ReadRecord(SamFileHeader& header, SamRecord& record) | Reads the next record from the file and stores it in the passed-in record. Returns true if successfully read, false if not. |
bool WriteRecord(SamFileHeader& header, SamRecord& record) | Writes the specified record into the file. Returns true if successfully written, false if not. |
Usage Example
The following example reads in a SAM/BAM file and writes it out as a SAM/BAM file. The SamFile class determines the format of the input file by reading the type from the file itself. It determines the format of the output file from the extension of the output filename: a ".bam" extension indicates a BAM file, and all other extensions indicate SAM files.
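<pre>
#include <stdio.h>
#include <stdlib.h>
#include "SamFile.h"  // Assumed header name declaring SamFile, SamFileHeader, and SamRecord.

int main(int argc, char ** argv)
{
   if(argc != 3)
   {
      printf("./bam <inputFile> <outputFile.sam/bam>\n");
      exit(-1);
   }

   // Open the input and output files, stopping if either open fails.
   SamFile samIn;
   SamFile samOut;
   if(!samIn.OpenForRead(argv[1]) || !samOut.OpenForWrite(argv[2]))
   {
      printf("Failed to open the input or output file.\n");
      exit(-1);
   }

   // Read the sam header and write it to the output file.
   SamFileHeader samHeader;
   samIn.ReadHeader(samHeader);
   samOut.WriteHeader(samHeader);

   // Record object that is refilled by each call to ReadRecord.
   SamRecord samRecord;

   // Keep reading records until it fails.
   int recordCount = 0;
   while(samIn.ReadRecord(samHeader, samRecord))
   {
      recordCount++;
      samOut.WriteRecord(samHeader, samRecord);
   }

   printf("RecordCount = %d\n", recordCount);
   return 0;
}
</pre>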
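The output-format rule described above can be sketched on its own. This is a minimal illustration, not the library's code: the helper name `outputFormatFor` is hypothetical, and the case-sensitive comparison is an assumption, since this page does not say how SamFile treats an extension such as ".BAM".

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not part of the library: applies the rule that a
// filename ending in ".bam" is written as BAM and anything else as SAM.
// Assumption: the extension comparison is case-sensitive.
std::string outputFormatFor(const std::string& filename)
{
    // An empty filename is treated as a caller error in this sketch.
    assert(!filename.empty());

    const std::string ext = ".bam";
    if(filename.size() >= ext.size() &&
       filename.compare(filename.size() - ext.size(), ext.size(), ext) == 0)
    {
        return "BAM";
    }
    return "SAM";
}
```

Under these assumptions, a name such as "out.sam" or "out.txt" falls through to SAM, which matches the "all other extensions" wording above.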
Setting fields in a SAM/BAM Header
The SamFileHeader class contains accessors to set the header lines of a SAM/BAM header. Once the header has been set up with these set methods, its values can be pulled back out using the get accessors, or the header can later be written to a SAM/BAM file. The methods found in the SamFileHeader class for setting fields are:
Method Name | Description |
---|---|
bool addHeaderLine(const char* type, const char* tag, int value) | Adds the type, tag, and integer value to the header. Returns true if successfully added, false if not. |
bool addHeaderLine(const char* type, const char* tag, const char* value) | Adds the type, tag, and const char* value to the header. Returns true if successfully added, false if not. |
bool addHeaderLine(const char* headerLine) | Adds the already setup/formatted headerLine to the header. It is assumed that the line does not contain a "\n". Returns true if successfully added, false if not. |
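To show how a type, tag, and value relate to an already formatted header line, here is a minimal sketch based on the SAM text format, in which a header line is "@" plus a two-letter record type followed by tab-separated TAG:VALUE fields. The helper `formatHeaderLine` is hypothetical and not part of the library, and it is an assumption that the type is given without the leading "@"; this page does not state which form addHeaderLine expects.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not part of the library: builds the text of a SAM
// header line holding a single TAG:VALUE field.
// Assumption: type is the two-letter record type without the leading '@'.
std::string formatHeaderLine(const std::string& type,
                             const std::string& tag,
                             const std::string& value)
{
    // SAM record types and tags are two characters long.
    assert(type.size() == 2 && tag.size() == 2);

    // No trailing "\n" is appended, matching the single-argument
    // addHeaderLine, which assumes the line does not contain one.
    return "@" + type + "\t" + tag + ":" + value;
}
```

A real header line often carries several TAG:VALUE fields; the sketch handles only one to keep the mapping visible.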
Setting fields in a SAM/BAM Record
The SamRecord class contains accessors to set the fields of a SAM/BAM record. They are used for creating a record that is not read from a SAM/BAM file. Once the record has been set up with these set methods, its values can be pulled back out using the get accessors, or the record can later be written as either a SAM or a BAM record. The methods found in the SamRecord class for setting fields are:
Method Name | Description |
---|---|
bool setReadName(const char* readName) | Sets QNAME to the passed-in name. Returns true if successfully set, false if not. |
bool setFlag(int flag) | Sets the bitwise FLAG to the passed-in value. Returns true if successfully set, false if not. |
bool setReferenceID(int referenceID) | Sets the reference sequence id. The reference name is not currently stored; it has to be looked up through the header (which is done when writing a SAM file). This is an opportunity for improvement. Returns true if successfully set, false if not. |
bool set1BasedPosition(int position) | Sets the leftmost position. The value passed in is 1-based (SAM formatted). Internal processing handles switching between SAM/BAM formats when read/written. Returns true if successfully set, false if not. |
bool set0BasedPosition(int position) | Sets the leftmost position. The value passed in is 0-based (BAM formatted). Internal processing handles switching between SAM/BAM formats when read/written. Returns true if successfully set, false if not. |
bool setMapQuality(int mapQuality) | Sets the mapping quality. Returns true if successfully set, false if not. |
bool setCigar(const char* cigar) | Sets the CIGAR string to the passed-in CIGAR. This is a SAM formatted CIGAR string. Internal processing handles switching between SAM/BAM formats when read/written. Returns true if successfully set, false if not. |
bool setMateReferenceID(int mateReferenceID) | Sets the mate reference sequence id. The mate reference name is not currently stored; it has to be looked up through the header (which is done when writing a SAM file). This is an opportunity for improvement. Returns true if successfully set, false if not. |
bool set1BasedMatePosition(int matePosition) | Sets the leftmost mate position. The value passed in is 1-based (SAM formatted). Internal processing handles switching between SAM/BAM formats when read/written. Returns true if successfully set, false if not. |
bool set0BasedMatePosition(int matePosition) | Sets the leftmost mate position. The value passed in is 0-based (BAM formatted). Internal processing handles switching between SAM/BAM formats when read/written. Returns true if successfully set, false if not. |
bool setInsertSize(int insertSize) | Sets the inferred insert size. Returns true if successfully set, false if not. |
bool setSequence(const char* seq) | Sets the sequence string to the passed-in string. This is a SAM formatted sequence string. Internal processing handles switching between SAM/BAM formats when read/written. Returns true if successfully set, false if not. |
bool setQuality(const char* quality) | Sets the quality string to the passed-in string. This is a SAM formatted quality string. Internal processing handles switching between SAM/BAM formats when read/written. Returns true if successfully set, false if not. |
bool addTag(const char* tag, char vtype, const char* value) | Adds a tag to the record with the specified tag, vtype, and value. The vtype can be either a SAM or a BAM vtype. Internal processing handles switching between SAM/BAM vtypes when read/written. Returns true if successfully set, false if not. |
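The relationship between the 1-based (SAM) and 0-based (BAM) position setters can be sketched as a pair of conversions. The helpers below are hypothetical and only illustrate the offset; they are not the library's internal code, which this page does not show.

```cpp
#include <cassert>

// Hypothetical helpers, not part of the library.

// Converts a 1-based (SAM formatted) position to a 0-based (BAM formatted) one.
// In SAM, a position of 0 means "unavailable", which maps to -1 in BAM.
int toZeroBased(int oneBasedPosition)
{
    assert(oneBasedPosition >= 0);
    return oneBasedPosition - 1;
}

// Converts a 0-based (BAM formatted) position to a 1-based (SAM formatted) one.
int toOneBased(int zeroBasedPosition)
{
    assert(zeroBasedPosition >= -1);
    return zeroBasedPosition + 1;
}
```

Under this sketch, setting a position of 100 with the 1-based setter and a position of 99 with the 0-based setter would describe the same base.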
Retrieving fields from a SAM/BAM Record
The SamRecord class contains accessors to retrieve the fields of a SAM/BAM record. They assume that the record has already been populated, either by using the set methods or by calling SamFile::ReadRecord. Not all of the values that can be retrieved using these get accessors have set methods, because some values are calculated internally when they are not read from a file.
The methods found in the SamRecord class for retrieving fields are:
Method Name | Description |
---|---|
int getBlockSize() | Returns the BAM block size of the record. |
int getReferenceID() | Returns the reference sequence id (BAM format). |
int get1BasedPosition() | Returns the 1-based (SAM formatted) leftmost position. |
int get0BasedPosition() | Returns the 0-based (BAM formatted) leftmost position. |
int getReadNameLength() | Returns the length of the ReadName (QNAME). |
int getMapQuality() | Returns the map quality. |
int getBin() | Returns the BAM bin for the record. |
int getCigarLength() | Returns the length of the CIGAR in BAM format. |
int getFlag() | Returns the flag. |
int getReadLength() | Returns the length of the read. |
int getMateReferenceID() | Returns the mate reference sequence id (BAM format). |
int get1BasedMatePosition() | Returns the 1-based (SAM formatted) mate leftmost position. |
int get0BasedMatePosition() | Returns the 0-based (BAM formatted) mate leftmost position. |
int getInsertSize() | Returns the insert size. |
const char* getReadName() | Returns the SAM formatted Read Name (QNAME). |
const char* getCigar() | Returns the SAM formatted CIGAR string. |
const char* getSequence() | Returns the SAM formatted Sequence string. |
const char* getQuality() | Returns the SAM formatted Quality string. |
bool getNextSamTag(char* tag, char& vtype, void** value) | Returns true if a tag was read, false if there are no more tags. For a true return value, tag is set to the name of the tag, vtype is set to the vtype of the tag, and value points to the value of the tag. The caller then needs to check vtype and cast value to int, double, char, or String accordingly. |
bool isIntegerType(char vtype) | Returns true if the passed-in vtype is of integer ('c', 'C', 's', 'S', 'i', 'I') type. |
bool isDoubleType(char vtype) | Returns true if the passed-in vtype is of double ('f') type. |
bool isCharType(char vtype) | Returns true if the passed-in vtype is of char ('A') type. |
bool isStringType(char vtype) | Returns true if the passed-in vtype is of String ('Z') type. |
Example of using getNextSamTag:
<pre>
// record is a previously setup SamRecord.
String recordString = "";
char tag[3];
char vtype;
void* value;

// While there are more tags, write them to the recordString.
while(record.getNextSamTag(tag, vtype, &value))
{
   recordString += "\t";
   recordString += tag;
   recordString += ":";
   recordString += vtype;
   recordString += ":";
   if(record.isIntegerType(vtype))
   {
      recordString += *(int*)value;
   }
   else if(record.isDoubleType(vtype))
   {
      recordString += *(double*)value;
   }
   else if(record.isCharType(vtype))
   {
      recordString += *(char*)value;
   }
   else
   {
      // String type.
      recordString += *(String*)value;
   }
}

recordString += "\n";
</pre>
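The vtype checks used in the example can be sketched without the library, using only the vtype characters listed in the table above. The function names below are hypothetical stand-ins for the SamRecord methods, and `castTargetFor` is an illustrative addition that names the type to which value would be cast.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-alone versions of the vtype checks, not the library's code.
bool isIntegerVtype(char vtype)
{
    switch(vtype)
    {
        case 'c': case 'C': case 's': case 'S': case 'i': case 'I':
            return true;
        default:
            return false;
    }
}

bool isDoubleVtype(char vtype) { return vtype == 'f'; }
bool isCharVtype(char vtype)   { return vtype == 'A'; }
bool isStringVtype(char vtype) { return vtype == 'Z'; }

// Names the type that the void* value should be cast to for a given vtype.
std::string castTargetFor(char vtype)
{
    if(isIntegerVtype(vtype)) return "int";
    if(isDoubleVtype(vtype))  return "double";
    if(isCharVtype(vtype))    return "char";
    if(isStringVtype(vtype))  return "String";
    return "unknown";
}
```

Unlike the example above, which treats every remaining vtype as a String in its final else branch, this sketch reports an unrecognized vtype as "unknown" so that an unexpected value is not cast blindly.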