Difference between revisions of "C++ Class: SamRecord"
Revision as of 11:13, 6 April 2010
Setting fields in a SAM/BAM Record
The SamRecord class contains accessors to set the fields of a SAM/BAM record. They are used for creating a record that is not read from a SAM/BAM file. Once a record has been set up with these set methods, its fields can be read back using the get accessors, or the record can later be written out as either a SAM or a BAM record. The methods found in the SamRecord class for setting fields are:
Method Name | Description |
---|---|
void resetRecord() | Resets the record to be an empty record. This is not necessary when you are reading a SAM/BAM file, but if you are setting fields, it is a good idea to clear a record before reusing it. Clearing it means you do not have to set any fields that should stay empty. |
bool setReadName(const char* readName) | Sets QNAME to the passed-in name. Returns true if successfully set, false if not. |
bool setFlag(int16_t flag) | Sets the bitwise FLAG to the passed-in value. Returns true if successfully set, false if not. |
bool setReferenceName(SamFileHeader& header, const char* referenceName) | Sets the reference sequence name. The reference ID is calculated using the header. Returns true if successfully set, false if not. |
bool set1BasedPosition(int32_t position) | Sets the leftmost position. The value passed in is 1-based (SAM formatted). Internal processing handles switching between SAM/BAM formats when read/written. Returns true if successfully set, false if not. |
bool set0BasedPosition(int32_t position) | Sets the leftmost position. The value passed in is 0-based (BAM formatted). Internal processing handles switching between SAM/BAM formats when read/written. Returns true if successfully set, false if not. |
bool setMapQuality(int8_t mapQuality) | Sets the mapping quality. Returns true if successfully set, false if not. |
bool setCigar(const char* cigar) | Sets the CIGAR string to the passed-in CIGAR. This is a SAM formatted CIGAR string. Internal processing handles switching between SAM/BAM formats when read/written. Returns true if successfully set, false if not. |
bool setMateReferenceName(SamFileHeader& header, const char* referenceName) | Sets the mate reference sequence name. The mate reference ID is calculated using the header. Returns true if successfully set, false if not. |
bool set1BasedMatePosition(int32_t matePosition) | Sets the leftmost mate position. The value passed in is 1-based (SAM formatted). Internal processing handles switching between SAM/BAM formats when read/written. Returns true if successfully set, false if not. |
bool set0BasedMatePosition(int32_t matePosition) | Sets the leftmost mate position. The value passed in is 0-based (BAM formatted). Internal processing handles switching between SAM/BAM formats when read/written. Returns true if successfully set, false if not. |
bool setInsertSize(int32_t insertSize) | Sets the inferred insert size. Returns true if successfully set, false if not. |
bool setSequence(const char* seq) | Sets the sequence string to the passed-in string. This is a SAM formatted sequence string. Internal processing handles switching between SAM/BAM formats when read/written. Returns true if successfully set, false if not. |
bool setQuality(const char* quality) | Sets the quality string to the passed-in string. This is a SAM formatted quality string. Internal processing handles switching between SAM/BAM formats when read/written. Returns true if successfully set, false if not. |
bool addTag(const char* tag, char vtype, const char* value) | Adds a tag to the record with the specified tag, vtype, and value. The vtype can be either a SAM or a BAM vtype. Internal processing handles switching between SAM/BAM vtypes when read/written. Returns true if successfully set, false if not. |
When set, SAM fields are validated against: SAM Validation Criteria
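The 1-based and 0-based position setters describe the same coordinate in two conventions. The following is a minimal sketch of that relationship, not the SamRecord implementation: `PositionSketch` is a hypothetical stand-in class, and storing the value internally as 0-based is an assumption made only for illustration.

```cpp
#include <cstdint>

// Hypothetical stand-in for illustration only; this is not the SamRecord class.
// It keeps a single 0-based position and converts on the way in and out.
class PositionSketch
{
public:
    // Store a 1-based (SAM formatted) position.
    bool set1BasedPosition(int32_t position)
    {
        my0BasedPosition = position - 1;
        return true;
    }

    // Store a 0-based (BAM formatted) position.
    bool set0BasedPosition(int32_t position)
    {
        my0BasedPosition = position;
        return true;
    }

    // Read the same position back in either convention.
    int32_t get1BasedPosition() const { return my0BasedPosition + 1; }
    int32_t get0BasedPosition() const { return my0BasedPosition; }

private:
    // -1 is the 0-based value that corresponds to an unset SAM position of 0.
    int32_t my0BasedPosition = -1;
};
```

With this sketch, calling `set1BasedPosition(100)` and then `get0BasedPosition()` yields 99, which is the kind of switching between formats the table describes.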
Retrieving fields from a SAM/BAM Record
The SamRecord class contains accessors to retrieve the fields of a SAM/BAM record. They assume that the record has already been populated, either by using the set methods or by calling SamFile::ReadRecord. Not all of the values that can be retrieved using these get accessors have set methods, because some values are calculated internally when they are not read from a file.
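The BAM bin is one example of a value that can be calculated rather than set. The sketch below reproduces the `reg2bin` calculation given in the SAM format specification. Whether SamRecord computes its bin with exactly this code is an assumption; the function here is a standalone illustration.

```cpp
#include <cassert>
#include <cstdint>

// Bin calculation as given in the SAM specification (reg2bin).
// beg is the 0-based start; end is the 0-based position one past the last base.
int32_t reg2bin(int32_t beg, int32_t end)
{
    assert(beg >= 0 && beg < end);
    --end;  // switch to an inclusive end before comparing bin levels
    if (beg >> 14 == end >> 14) return ((1 << 15) - 1) / 7 + (beg >> 14);
    if (beg >> 17 == end >> 17) return ((1 << 12) - 1) / 7 + (beg >> 17);
    if (beg >> 20 == end >> 20) return ((1 << 9) - 1) / 7 + (beg >> 20);
    if (beg >> 23 == end >> 23) return ((1 << 6) - 1) / 7 + (beg >> 23);
    if (beg >> 26 == end >> 26) return ((1 << 3) - 1) / 7 + (beg >> 26);
    return 0;  // the region spans the top-level bin
}
```

Because `end` is exclusive here, a caller holding a 0-based inclusive alignment end would pass that value plus one.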
The methods found in the SamRecord class for retrieving fields are:
Method Name | Description |
---|---|
bool isValid(SamFileHeader& header) | Returns true if the record is valid. This performs validation steps. TODO: the method exists, but it does not yet perform any checks, so it just returns true. |
int32_t getBlockSize() | Returns the BAM block size of the record. |
const char* getReferenceName(SamFileHeader& header) | Returns the reference sequence name (SAM format). |
int32_t getReferenceID() | Returns the reference sequence ID (BAM format). |
int32_t get1BasedPosition() | Returns the 1-based (SAM formatted) leftmost position. |
int32_t get0BasedPosition() | Returns the 0-based (BAM formatted) leftmost position. |
int8_t getReadNameLength() | Returns the length of the ReadName (QNAME). |
int8_t getMapQuality() | Returns the map quality. |
int16_t getBin() | Returns the BAM bin for the record. |
int16_t getCigarLength() | Returns the length of the CIGAR in BAM format. |
int16_t getFlag() | Returns the flag. |
int32_t getReadLength() | Returns the length of the read. |
const char* getMateReferenceName(SamFileHeader& header) | Returns the mate reference sequence name (SAM format), even if it is the same as the reference sequence name. |
const char* getMateReferenceNameOrEqual(SamFileHeader& header) | Returns the mate reference sequence name (SAM format), unless it is the same as the reference sequence name, in which case "=" is returned. |
int32_t getMateReferenceID() | Returns the mate reference sequence ID (BAM format). |
int32_t get1BasedMatePosition() | Returns the 1-based (SAM formatted) mate leftmost position. |
int32_t get0BasedMatePosition() | Returns the 0-based (BAM formatted) mate leftmost position. |
int32_t getInsertSize() | Returns the insert size. |
int32_t get0BasedAlignmentEnd() | Returns the 0-based inclusive right-most position of the clipped sequence. |
int32_t get1BasedAlignmentEnd() | Returns the 1-based inclusive right-most position of the clipped sequence. |
int32_t get0BasedUnclippedStart() | Returns the 0-based inclusive left-most position adjusted for clipped bases. |
int32_t get1BasedUnclippedStart() | Returns the 1-based inclusive left-most position adjusted for clipped bases. |
int32_t get0BasedUnclippedEnd() | Returns the 0-based inclusive right-most position adjusted for clipped bases. |
int32_t get1BasedUnclippedEnd() | Returns the 1-based inclusive right-most position adjusted for clipped bases. |
const char* getReadName() | Returns the SAM formatted Read Name (QNAME). |
const char* getCigar() | Returns the SAM formatted CIGAR string. |
const char* getSequence() | Returns the SAM formatted Sequence string. |
const char* getQuality() | Returns the SAM formatted Quality string. |
bool getNextSamTag(char* tag, char& vtype, void** value) | Returns true if a tag was read, false if there are no more tags. For a true return value, tag is set to the name of the tag, vtype is set to the vtype of the tag, and value is a pointer to the value of the tag. You will then need to use a switch to cast value to int, double, char, or String. |
bool isIntegerType(char vtype) | Returns true if the passed-in vtype is of integer ('c', 'C', 's', 'S', 'i', 'I') type. |
bool isDoubleType(char vtype) | Returns true if the passed-in vtype is of double ('f') type. |
bool isCharType(char vtype) | Returns true if the passed-in vtype is of char ('A') type. |
bool isStringType(char vtype) | Returns true if the passed-in vtype is of String ('Z') type. |
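The alignment end and unclipped start/end accessors in the table describe positions that depend on the CIGAR. The following is a rough sketch of that arithmetic, not the library's implementation. It assumes a well-formed CIGAR in which clips appear only at the ends, and it assumes that both soft (S) and hard (H) clips count as clipped bases. `AlignmentSpan` and `computeSpan` are hypothetical names.

```cpp
#include <cctype>
#include <cstdint>
#include <string>

// Hypothetical helper type for illustration; not part of SamRecord.
struct AlignmentSpan
{
    int32_t alignmentEnd;    // 0-based inclusive end of the clipped sequence
    int32_t unclippedStart;  // 0-based start with leading clips added back
    int32_t unclippedEnd;    // 0-based end with trailing clips added back
};

// Derive the three positions from a 0-based start and a SAM formatted CIGAR.
AlignmentSpan computeSpan(int32_t start0Based, const std::string& cigar)
{
    int32_t refLength = 0;      // reference bases covered (M, D, N, =, X)
    int32_t leadingClips = 0;   // S/H before the first non-clip operation
    int32_t trailingClips = 0;  // S/H after a non-clip operation
    bool seenNonClip = false;
    int32_t count = 0;

    for (char c : cigar)
    {
        if (std::isdigit(static_cast<unsigned char>(c)))
        {
            count = count * 10 + (c - '0');  // accumulate the operation length
            continue;
        }
        if (c == 'S' || c == 'H')
        {
            if (seenNonClip) trailingClips += count;
            else leadingClips += count;
        }
        else
        {
            seenNonClip = true;
            if (c == 'M' || c == 'D' || c == 'N' || c == '=' || c == 'X')
            {
                refLength += count;  // only these operations consume reference
            }
        }
        count = 0;
    }

    AlignmentSpan span;
    span.alignmentEnd = start0Based + refLength - 1;
    span.unclippedStart = start0Based - leadingClips;
    span.unclippedEnd = span.alignmentEnd + trailingClips;
    return span;
}
```

The 1-based variants of each position would simply be these values plus one.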
Example of using getNextSamTag:
    // record is a previously set up SamRecord.
    String recordString = "";
    char tag[3];
    char vtype;
    void* value;

    // While there are more tags, write them to the recordString.
    while(record.getNextSamTag(tag, vtype, &value))
    {
        // Each tag is written as a tab followed by TAG:VTYPE:VALUE.
        recordString += "\t";
        recordString += tag;
        recordString += ":";
        recordString += vtype;
        recordString += ":";

        // Cast value according to the vtype before appending it.
        if(record.isIntegerType(vtype))
        {
            recordString += (int)*(int*)value;
        }
        else if(record.isDoubleType(vtype))
        {
            recordString += (double)*(double*)value;
        }
        else if(record.isCharType(vtype))
        {
            recordString += (char)*(char*)value;
        }
        else
        {
            // String type.
            recordString += (String)*(String*)value;
        }
    }
    recordString += "\n";
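The description of getNextSamTag mentions using a switch to cast value. A standalone sketch of that dispatch follows, under stated assumptions: `formatTagValue` is a hypothetical helper that is not part of SamRecord, `std::string` stands in for the library's String class, and the pointed-to types (int, double, char, string) mirror the casts in the example above rather than any confirmed internal layout.

```cpp
#include <string>

// Hypothetical helper, not part of SamRecord. Assumes value points at an int,
// a double, a char, or a std::string, depending on vtype.
std::string formatTagValue(char vtype, const void* value)
{
    switch (vtype)
    {
        // Integer vtypes listed for isIntegerType.
        case 'c': case 'C': case 's': case 'S': case 'i': case 'I':
            return std::to_string(*static_cast<const int*>(value));
        // Double vtype listed for isDoubleType.
        case 'f':
            return std::to_string(*static_cast<const double*>(value));
        // Char vtype listed for isCharType.
        case 'A':
            return std::string(1, *static_cast<const char*>(value));
        // String vtype listed for isStringType.
        case 'Z':
            return *static_cast<const std::string*>(value);
        default:
            return std::string();  // unrecognized vtype
    }
}
```

Unlike the if/else chain, the switch makes an unrecognized vtype explicit instead of treating everything else as a string.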