C++ Class: SamRecord

From Genome Analysis Wiki
 
This class is part of [[C++ Library: libStatGen]].
 
== Getting/Setting fields in a SAM/BAM Record ==
The '''SamRecord''' class contains accessors to "set" and "get" the fields of a SAM/BAM record.

The methods found in the '''SamRecord''' class for setting fields are:
 
{| style="margin: 1em 1em 1em 0; background-color: #f9f9f9; border: 1px #aaa solid; border-collapse: collapse;" border="1"
 
|+ '''SamRecord Class Methods'''
|-style="background: #f2f2f2; text-align: center;"
 
! Method Name !!  Description
 
|-
 
| <code>void SamRecord::resetRecord()</code>
 
| Resets the record to an empty record.  This is not necessary when reading a SAM/BAM file, but when setting fields it is a good idea to clear a record before reusing it, so that any field you do not set is left empty instead of keeping a value from the previous record.
 
|-
 
| <code>bool SamRecord::setReadName(const char* readName)</code>
 
| Sets QNAME to the passed in name.
 
Returns true if successfully set, false if not.
 
|-
 
| <code>bool SamRecord::setFlag(uint16_t flag)</code>
 
| Sets the bitwise FLAG to the passed in value.
 
Returns true if successfully set, false if not.
 
|-
 
| <code>bool SamRecord::setReferenceName(SamFileHeader& header, const char* referenceName)</code>
 
| Sets the reference sequence name.  The reference id is calculated using the header.
 
Returns true if successfully set, false if not.
 
|-
 
| <code>bool SamRecord::set1BasedPosition(int32_t position)</code>
 
| Sets the leftmost position.  The value passed in is 1-based (SAM formatted).  Internal processing handles switching between SAM/BAM formats when read/written.
 
Returns true if successfully set, false if not.
 
|-
 
| <code>bool SamRecord::set0BasedPosition(int32_t position)</code>
 
| Sets the leftmost position.  The value passed in is 0-based (BAM formatted).  Internal processing handles switching between SAM/BAM formats when read/written.
 
Returns true if successfully set, false if not.
 
|-
 
| <code>bool SamRecord::setMapQuality(int8_t mapQuality)</code>
 
| Sets the mapping quality.
 
Returns true if successfully set, false if not.
 
|-
 
| <code>bool SamRecord::setCigar(const char* cigar)</code>
 
| Sets the cigar string to the passed in CIGAR.  This is a SAM formatted CIGAR string.  Internal processing handles switching between SAM/BAM formats when read/written.
 
Returns true if successfully set, false if not.
 
|-
 
| <code>bool SamRecord::setMateReferenceName(SamFileHeader& header, const char* referenceName)</code>
 
| Sets the mate reference sequence name.  The mate reference id is calculated using the header.
 
Returns true if successfully set, false if not.
 
|-
 
| <code>bool SamRecord::set1BasedMatePosition(int32_t matePosition)</code>
 
| Sets the leftmost mate position.  The value passed in is 1-based (SAM formatted).  Internal processing handles switching between SAM/BAM formats when read/written.
 
Returns true if successfully set, false if not.
 
|-
 
| <code>bool SamRecord::set0BasedMatePosition(int32_t matePosition)</code>
 
| Sets the leftmost mate position.  The value passed in is 0-based (BAM formatted).  Internal processing handles switching between SAM/BAM formats when read/written.
 
Returns true if successfully set, false if not.
 
|-
 
| <code>bool SamRecord::setInsertSize(int32_t insertSize)</code>
 
| Sets the inferred insert size.
 
Returns true if successfully set, false if not.
 
|-
 
| <code>bool SamRecord::setSequence(const char* seq)</code>
 
| Sets the sequence string to the passed in string.  This is a SAM formatted sequence string.  Internal processing handles switching between SAM/BAM formats when read/written.
 
Returns true if successfully set, false if not.
 
|-
 
| <code>bool SamRecord::setQuality(const char* quality)</code>
 
| Sets the quality string to the passed in string.  This is a SAM formatted quality string.  Internal processing handles switching between SAM/BAM formats when read/written.
 
Returns true if successfully set, false if not.
 
|-
 
| <code>bool SamRecord::addTag(const char* tag, char vtype, const char* value)</code>
 
| Adds a tag to the record with the specified tag, vtype, and value.  The vtype can be either a SAM or a BAM vtype; internal processing handles switching between SAM/BAM vtypes when the record is read or written.
 
Returns true if successfully set, false if not.
 
|-
 
| <code>SamStatus::Status SamRecord::setBufferFromFile(IFILE filePtr, SamFileHeader& header)</code>
 
| Reads a BAM record from the specified BAM file.
 
|-
 
| <code>SamStatus::Status SamRecord::setBuffer(const char* fromBuffer, uint32_t fromBufferSize, SamFileHeader& header)</code>
 
| Sets the SamRecord to contain the BAM record contents found in fromBuffer.
 
|}
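The value passed to setFlag is the bitwise FLAG field, built by OR-ing together individual flag bits. The sketch below names the bit values defined in the SAM specification and adds a small helper for testing them; the namespace, the constant names, and the hasFlagBits helper are hypothetical illustrations, not part of libStatGen.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names for the FLAG bits defined by the SAM specification;
// they illustrate the bit values and are not part of libStatGen.
namespace SamFlagBits
{
    const uint16_t PAIRED        = 0x001;  // template has multiple segments
    const uint16_t PROPER_PAIR   = 0x002;  // each segment properly aligned
    const uint16_t UNMAPPED      = 0x004;  // this segment is unmapped
    const uint16_t MATE_UNMAPPED = 0x008;  // next segment is unmapped
    const uint16_t REVERSE       = 0x010;  // SEQ is reverse complemented
    const uint16_t MATE_REVERSE  = 0x020;  // next segment's SEQ is reverse complemented
    const uint16_t FIRST_SEGMENT = 0x040;  // first segment in the template
    const uint16_t LAST_SEGMENT  = 0x080;  // last segment in the template
    const uint16_t SECONDARY     = 0x100;  // secondary alignment
    const uint16_t QC_FAIL       = 0x200;  // fails platform/vendor quality checks
    const uint16_t DUPLICATE     = 0x400;  // PCR or optical duplicate
}

// Returns true if every bit in 'bits' is set in 'flag'.
inline bool hasFlagBits(uint16_t flag, uint16_t bits)
{
    assert(bits != 0);  // an empty mask would trivially match every flag
    return (flag & bits) == bits;
}
```

For example, a read that is the first segment of a properly paired template could be given SamFlagBits::PAIRED | SamFlagBits::PROPER_PAIR | SamFlagBits::FIRST_SEGMENT, a FLAG of 67, and that value passed to setFlag.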
 
  
The "set" accessors are used to build a record that was not read from a SAM/BAM file.  Once a record has been set up with these methods, its fields can be read back with the "get" accessors, or the record can be written out as either a SAM or a BAM record.
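Because positions can be set and read back in either coordinate system, it helps to see how the two differ. Assuming the SAM specification's conventions (SAM POS is 1-based with 0 meaning no position; BAM pos is 0-based with -1 meaning no position), converting between them is a plain offset of one. The helper names below are hypothetical and are not libStatGen's internal code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers (not libStatGen code) converting between the SAM
// 1-based POS and the BAM 0-based pos. SAM uses 0 and BAM uses -1 for
// "no position", so the same offset of one covers that case too.
inline int32_t samPosToBamPos(int32_t samPos)
{
    assert(samPos >= 0);   // SAM POS is never negative
    return samPos - 1;
}

inline int32_t bamPosToSamPos(int32_t bamPos)
{
    assert(bamPos >= -1);  // BAM pos is -1 when there is no position
    return bamPos + 1;
}
```

Under these conventions, a record set with set1BasedPosition(100) would be expected to report 99 from get0BasedPosition(), and the mate position accessors would follow the same rule.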
  
== Retrieving fields from a SAM/BAM Record ==
The "get" accessors assume that the record has already been populated, either with the "set" methods or by calling SamFile::ReadRecord.  Not every value that can be retrieved with a "get" accessor has a corresponding "set" method, because some values are read from the file and others are calculated internally.
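The BAM bin returned by getBin is one such internally calculated value. The SAM/BAM specification defines it with a reg2bin function over the alignment's 0-based start and exclusive end. The sketch below is a transcription of that specification function, not libStatGen's source, so details such as the handling of unmapped reads may differ in the library.

```cpp
#include <cassert>
#include <cstdint>

// BAM bin for the 0-based, half-open region [beg, end), transcribed from the
// reg2bin function in the SAM/BAM specification. It returns the smallest bin
// in the binning scheme that fully contains the region.
inline int reg2bin(int32_t beg, int32_t end)
{
    assert(beg >= 0 && end > beg);  // this sketch expects a non-empty mapped region
    --end;  // make the end inclusive
    if (beg >> 14 == end >> 14) return ((1 << 15) - 1) / 7 + (beg >> 14);  // 16 kb bins
    if (beg >> 17 == end >> 17) return ((1 << 12) - 1) / 7 + (beg >> 17);  // 128 kb bins
    if (beg >> 20 == end >> 20) return ((1 << 9) - 1) / 7 + (beg >> 20);   // 1 Mb bins
    if (beg >> 23 == end >> 23) return ((1 << 6) - 1) / 7 + (beg >> 23);   // 8 Mb bins
    if (beg >> 26 == end >> 26) return ((1 << 3) - 1) / 7 + (beg >> 26);   // 64 Mb bins
    return 0;  // the single 512 Mb bin
}
```

For example, a read aligned to [0, 1) falls in bin 4681, the first of the smallest (16 kb) bins, while a region that crosses a 16 kb boundary moves up to the 128 kb level.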
 
  
See: http://www.sph.umich.edu/csg/mktrost/doxygen/current/classSamFileHeader.html for documentation.

The methods found in the '''SamRecord''' class for retrieving fields are:
{| style="margin: 1em 1em 1em 0; background-color: #f9f9f9; border: 1px #aaa solid; border-collapse: collapse;" border="1"
 
|+ '''SamRecord Class Methods'''
|-style="background: #f2f2f2; text-align: center;"
 
! Method Name !!  Description
 
|-
 
| <code>bool SamRecord::isValid(SamFileHeader& header)</code>
 
| Runs validation checks on the record and returns true if it is valid, false otherwise.
 
|-
 
| <code>const void* SamRecord::getRecordBuffer()</code>
 
| Returns a const pointer to the buffer that is the BAM representation of the record.
 
|-
 
| <code>SamStatus::Status SamRecord::writeRecordBuffer(IFILE filePtr)</code>
 
| Writes the BAM representation of the record to the specified, already-opened IFILE and returns the status of the write.
 
|-
 
| <code>int32_t SamRecord::getBlockSize()</code>
 
| Returns the BAM block size of the record.
 
|-
 
| <code>const char* SamRecord::getReferenceName()</code>
 
| Returns the reference sequence name (SAM format).
 
|-
 
| <code>int32_t SamRecord::getReferenceID()</code>
 
| Returns the reference sequence ID (BAM format).
 
|-
 
| <code>int32_t SamRecord::get1BasedPosition()</code>
 
| Returns the 1-based (SAM formatted) leftmost position.
 
|-
 
| <code>int32_t SamRecord::get0BasedPosition()</code>
 
| Returns the 0-based (BAM formatted) leftmost position.
 
|-
 
| <code>int8_t SamRecord::getReadNameLength()</code>
 
| Returns the length of the ReadName (QNAME).
 
|-
 
| <code>int8_t SamRecord::getMapQuality()</code>
 
| Returns the map quality.
 
|-
 
| <code>int16_t SamRecord::getBin()</code>
 
| Returns the BAM bin for the record.
 
|-
 
| <code>int16_t SamRecord::getCigarLength()</code>
 
| Returns the length of the CIGAR in BAM format.
 
|-
 
| <code>uint16_t SamRecord::getFlag()</code>
 
| Returns the flag.
 
|-
 
| <code>int32_t SamRecord::getReadLength()</code>
 
| Returns the length of the read.
 
|-
 
| <code>const char* SamRecord::getMateReferenceName()</code>
 
| Returns the mate reference sequence name (SAM format), even if it is the same as the reference sequence name.
 
|-
 
| <code>const char* SamRecord::getMateReferenceNameOrEqual()</code>
 
| Returns the mate reference sequence name (SAM format).  If the mate reference name is the same as the reference sequence name, "=" is returned instead, unless the name is "*", in which case "*" is returned.
 
|-
 
| <code>int32_t SamRecord::getMateReferenceID()</code>
 
| Returns the mate reference sequence id (BAM format).
 
|-
 
| <code>int32_t SamRecord::get1BasedMatePosition()</code>
 
| Returns the 1-based (SAM formatted) mate leftmost position.
 
|-
 
| <code>int32_t SamRecord::get0BasedMatePosition()</code>
 
| Returns the 0-based (BAM formatted) mate leftmost position.
 
|-
 
| <code>int32_t SamRecord::getInsertSize()</code>
 
| Returns the insert size.
 
|-
 
| <code>int32_t SamRecord::get0BasedAlignmentEnd()</code>
 
| Returns the 0-based inclusive right-most position of the clipped sequence.
 
|-
 
| <code>int32_t SamRecord::get1BasedAlignmentEnd()</code>
 
| Returns the 1-based inclusive right-most position of the clipped sequence.
 
|-
 
| <code>int32_t SamRecord::getAlignmentLength()</code>
 
| Returns the length of the clipped sequence (returns 0 if the cigar is '*').
 
|-
 
| <code>int32_t SamRecord::get0BasedUnclippedStart()</code>
 
| Returns the 0-based inclusive left-most position adjusted for clipped bases.
 
|-
 
| <code>int32_t SamRecord::get1BasedUnclippedStart()</code>
 
| Returns the 1-based inclusive left-most position adjusted for clipped bases.
 
|-
 
| <code>int32_t SamRecord::get0BasedUnclippedEnd()</code>
 
| Returns the 0-based inclusive right-most position adjusted for clipped bases.
 
|-
 
| <code>int32_t SamRecord::get1BasedUnclippedEnd()</code>
 
| Returns the 1-based inclusive right-most position adjusted for clipped bases.
 
|-
 
| <code>const char* SamRecord::getReadName()</code>
 
| Returns the SAM formatted Read Name (QNAME).
 
|-
 
| <code>const char* SamRecord::getCigar()</code>
 
| Returns the SAM formatted CIGAR string.
 
|-
 
| <code>const char* SamRecord::getSequence()</code>
 
| Returns the SAM formatted Sequence string.
 
|-
 
| <code>const char* SamRecord::getQuality()</code>
 
| Returns the SAM formatted Quality string.
 
|-
 
| <code>char SamRecord::getSequence(int index)</code>
 
| Returns the sequence base at the specified index into the sequence (valid range: 0 to readLength - 1).
 
|-
 
| <code>char SamRecord::getQuality(int index)</code>
 
| Returns the quality character at the specified index into the sequence (valid range: 0 to readLength - 1).
 
|-
 
|<code>Cigar* SamRecord::getCigarInfo()</code>
 
| Returns a pointer to the Cigar object associated with this record.  This object is essentially read-only; it only allows modifications made for lazy evaluation.
 
|-
 
| <code>uint32_t SamRecord::getNumOverlaps(int32_t start, int32_t end)</code>
 
| Returns the number of bases in this read that overlap the passed-in region.
 
Matches and mismatches between the read and the reference are counted as overlaps, but insertions, deletions, skips, clips, and pads are not counted.
 
  
start is the 0-based inclusive starting reference position of the region.
 
  
end is the 0-based exclusive ending reference position of the region.
|-
 
| <code>bool SamRecord::getFields(bamRecordStruct& recStruct, String& readName, String& cigar, String& sequence, String& quality)</code>
 
| Returns true if the passed-in fields were successfully set, otherwise false.

The bamRecordStruct that is set does not contain the values of the variable-length fields, and tags are not returned by this method.
 
|-
 
| <code>uint32_t SamRecord::getTagLength()</code>
 
| Returns the length of the tags in BAM format.
 
|-
 
| <code>bool SamRecord::getNextSamTag(char* tag, char& vtype, void** value)</code>
 
| Returns true if a tag was read, false if there are no more tags.
 
For a true return value, tag is set to the tag's name, vtype is set to the tag's vtype, and value is set to point to the tag's value.  Use the vtype (for example, with isIntegerType, isDoubleType, isCharType, and isStringType) to decide whether to cast value to an int, double, char, or String.
 
|-
 
| <code>void SamRecord::resetTagIter()</code>
 
| Resets the iterator that loops through the tags to the beginning of the tags.  The iterator is automatically reset when a new record is read.  This method is only necessary if you want to iterate over a set of tags multiple times.
 
|-
 
| <code>bool SamRecord::isIntegerType(char vtype)</code>
 
| Returns true if the passed in vtype is of integer ('c', 'C', 's', 'S', 'i', 'I') type.
 
|-
 
| <code>bool SamRecord::isDoubleType(char vtype)</code>
 
| Returns true if the passed in vtype is of double ('f') type.
 
|-
 
| <code>bool SamRecord::isCharType(char vtype)</code>
 
| Returns true if the passed in vtype is of char ('A') type.
 
|-
 
| <code>bool SamRecord::isStringType(char vtype)</code>
 
| Returns true if the passed in vtype is of String ('Z') type.
 
|-
 
|}
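Several getters above (getAlignmentLength, get0BasedAlignmentEnd, get0BasedUnclippedStart, get0BasedUnclippedEnd, and getNumOverlaps) are derived from the leftmost position and the CIGAR. The sketch below computes the same quantities from a 0-based start and a SAM CIGAR string. It assumes the SAM specification's operation semantics (M, D, N, = and X consume reference bases; S and H are clips; M, = and X pair read bases with reference bases) and uses hypothetical names; it is an independent illustration, not libStatGen's Cigar class, and the library may treat edge cases differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// One CIGAR operation, e.g. {'M', 10} for "10M". Hypothetical type.
typedef std::pair<char, int32_t> CigarOp;

// Parses a SAM CIGAR string such as "2S8M1D5M"; "*" yields no operations.
inline std::vector<CigarOp> parseCigar(const std::string& cigar)
{
    std::vector<CigarOp> ops;
    if (cigar == "*") return ops;
    int32_t len = 0;
    for (char c : cigar)
    {
        if (std::isdigit(static_cast<unsigned char>(c)))
            len = len * 10 + (c - '0');
        else
        {
            ops.push_back(CigarOp(c, len));
            len = 0;
        }
    }
    assert(len == 0);  // a well-formed CIGAR ends with an operation letter
    return ops;
}

// M, D, N, = and X advance along the reference.
inline bool consumesReference(char op)
{
    return op == 'M' || op == 'D' || op == 'N' || op == '=' || op == 'X';
}

// Number of reference bases spanned by the alignment (0 for "*").
inline int32_t alignmentLength(const std::vector<CigarOp>& ops)
{
    int32_t len = 0;
    for (const CigarOp& op : ops)
        if (consumesReference(op.first)) len += op.second;
    return len;
}

// 0-based inclusive right-most reference position of the clipped alignment.
inline int32_t alignmentEnd0(int32_t start0, const std::vector<CigarOp>& ops)
{
    return start0 + alignmentLength(ops) - 1;
}

// Total soft (S) and hard (H) clip length at the front or back of the CIGAR.
inline int32_t clippedBases(const std::vector<CigarOp>& ops, bool fromFront)
{
    int32_t clipped = 0;
    for (std::size_t i = 0; i < ops.size(); ++i)
    {
        const CigarOp& op = fromFront ? ops[i] : ops[ops.size() - 1 - i];
        if (op.first != 'S' && op.first != 'H') break;
        clipped += op.second;
    }
    return clipped;
}

inline int32_t unclippedStart0(int32_t start0, const std::vector<CigarOp>& ops)
{
    return start0 - clippedBases(ops, true);
}

inline int32_t unclippedEnd0(int32_t start0, const std::vector<CigarOp>& ops)
{
    return alignmentEnd0(start0, ops) + clippedBases(ops, false);
}

// Counts read bases aligned as a match or mismatch (M, =, X) to reference
// positions inside the 0-based half-open region [regionStart, regionEnd).
inline uint32_t numOverlaps(int32_t start0, const std::vector<CigarOp>& ops,
                            int32_t regionStart, int32_t regionEnd)
{
    uint32_t count = 0;
    int32_t refPos = start0;
    for (const CigarOp& op : ops)
    {
        int32_t opEnd = refPos + op.second;
        if (op.first == 'M' || op.first == '=' || op.first == 'X')
        {
            int32_t lo = std::max(refPos, regionStart);
            int32_t hi = std::min(opEnd, regionEnd);
            if (hi > lo) count += static_cast<uint32_t>(hi - lo);
        }
        if (consumesReference(op.first)) refPos = opEnd;
    }
    return count;
}
```

The 1-based variants would simply add one to the 0-based results. For the CIGAR 2S8M1D5M at 0-based position 100, the sketch gives an alignment length of 14, an alignment end of 113, and an unclipped start of 98.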
 
 
 
 
 
Example of using getNextSamTag:

<source lang="cpp">
   // record is a previously set-up SamRecord.
   String recordString = "";
   char tag[3];
   char vtype;
   void* value;

   // While there are more tags, append each one to recordString as TAG:VTYPE:VALUE.
   while(record.getNextSamTag(tag, vtype, &value))
   {
      recordString += "\t";
      recordString += tag;
      recordString += ":";
      recordString += vtype;
      recordString += ":";
      if(record.isIntegerType(vtype))
      {
         recordString += *(int*)value;
      }
      else if(record.isDoubleType(vtype))
      {
         recordString += *(double*)value;
      }
      else if(record.isCharType(vtype))
      {
         recordString += *(char*)value;
      }
      else
      {
         // String type.
         recordString += *(String*)value;
      }
   }

   recordString += "\n";
</source>

Revision as of 16:49, 6 September 2011