Changes

From Genome Analysis Wiki
9,183 bytes removed ,  11:10, 6 April 2010
no edit summary
Line 23: Line 23:     
Documentation on the '''SamFile''' class can be found at [[C++ Class: SamFile]].
 
 +
    
== SAM/BAM Header ==
 
Line 29: Line 30:       −
== Setting fields in a SAM/BAM Record ==
+
== SAM/BAM Record ==
The '''SamRecord''' class contains accessors to set the fields of a SAM/BAM record.  They are used for creating a record that is not read from a SAM/BAM file.  Once the record has been set up with these set methods, the fields can be read back with the get accessors, or the record can later be written as either a SAM or a BAM record.
  −
The methods found in the '''SamRecord''' class for setting fields are:
  −
{| class="wikitable" style="width:100%" border="1"
  −
|+ style="font-size:150%"|'''SamRecord Class Methods'''
  −
!  width=""|Method Name
  −
!  width=""|Description
  −
|-
  −
| void resetRecord()
  −
| Resets the record to an empty record.  This is not necessary when reading a SAM/BAM file, but when setting fields it is a good idea to clear a record before reusing it.  Clearing it means you do not have to explicitly set the fields that should be empty.
  −
|-
  −
| bool setReadName(const char* readName)
  −
| Sets QNAME to the passed in name.
  −
Returns true if successfully set, false if not.
  −
|-
  −
| bool setFlag(int16_t flag)
  −
| Sets the bitwise FLAG to the passed in value.
  −
Returns true if successfully set, false if not.
  −
|-
  −
| bool setReferenceName(SamFileHeader& header, const char* referenceName)
  −
| Sets the reference sequence name.  The reference id is calculated using the header.
  −
Returns true if successfully set, false if not.
  −
|-
  −
| bool set1BasedPosition(int32_t position)
  −
| Sets the leftmost position.  The value passed in is 1-based (SAM formatted).  Internal processing handles switching between SAM/BAM formats when read/written.
  −
Returns true if successfully set, false if not.
  −
|-
  −
| bool set0BasedPosition(int32_t position)
  −
| Sets the leftmost position.  The value passed in is 0-based (BAM formatted).  Internal processing handles switching between SAM/BAM formats when read/written.
  −
Returns true if successfully set, false if not.
  −
|-
  −
| bool setMapQuality(int8_t mapQuality)
  −
| Sets the mapping quality.
  −
Returns true if successfully set, false if not.
  −
|-
  −
| bool setCigar(const char* cigar)
  −
| Sets the cigar string to the passed in CIGAR.  This is a SAM formatted CIGAR string.  Internal processing handles switching between SAM/BAM formats when read/written.
  −
Returns true if successfully set, false if not.
  −
|-
  −
| bool setMateReferenceName(SamFileHeader& header, const char* referenceName)
  −
| Sets the mate reference sequence name.  The mate reference id is calculated using the header.
  −
Returns true if successfully set, false if not.
  −
|-
  −
| bool set1BasedMatePosition(int32_t matePosition)
  −
| Sets the leftmost mate position.  The value passed in is 1-based (SAM formatted).  Internal processing handles switching between SAM/BAM formats when read/written.
  −
Returns true if successfully set, false if not.
  −
|-
  −
| bool set0BasedMatePosition(int32_t matePosition)
  −
| Sets the leftmost mate position.  The value passed in is 0-based (BAM formatted).  Internal processing handles switching between SAM/BAM formats when read/written.
  −
Returns true if successfully set, false if not.
  −
|-
  −
| bool setInsertSize(int32_t insertSize)
  −
| Sets the inferred insert size.
  −
Returns true if successfully set, false if not.
  −
|-
  −
| bool setSequence(const char* seq)
  −
| Sets the sequence string to the passed in string.  This is a SAM formatted sequence string.  Internal processing handles switching between SAM/BAM formats when read/written.
  −
Returns true if successfully set, false if not.
  −
|-
  −
| bool setQuality(const char* quality)
  −
| Sets the quality string to the passed in string.  This is a SAM formatted quality string.  Internal processing handles switching between SAM/BAM formats when read/written.
  −
Returns true if successfully set, false if not.
  −
|-
  −
| bool addTag(const char* tag, char vtype, const char* value)
  −
| Adds a tag to the record with the specified tag, vtype, and value.  The vtype can be either a SAM or a BAM vtype.  Internal processing handles switching between SAM/BAM vtypes when read/written.
  −
Returns true if successfully set, false if not.
  −
|}
  −
 
  −
When set, SAM fields are validated against: [[SAM Validation Criteria]]
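
The 1-based and 0-based position setters in the table above describe the same coordinate in two conventions. The following is a minimal sketch of that relationship, not the '''SamRecord''' implementation: the class name <code>PositionSketch</code> is hypothetical, and storing the value 0-based internally is an assumption made only for illustration.

```cpp
#include <cstdint>

// Hypothetical stand-in for the position handling described above.
// It is NOT the SamRecord class; internal 0-based storage is an assumption.
class PositionSketch
{
public:
    // SAM text uses 1-based positions, so subtract 1 before storing.
    void set1BasedPosition(int32_t position) { myZeroBased = position - 1; }

    // BAM uses 0-based positions, so the value is stored unchanged.
    void set0BasedPosition(int32_t position) { myZeroBased = position; }

    int32_t get1BasedPosition() const { return myZeroBased + 1; }
    int32_t get0BasedPosition() const { return myZeroBased; }

private:
    // -1 (0-based) corresponds to a SAM POS of 0, meaning no position.
    int32_t myZeroBased = -1;
};
```

Whichever setter is used, both getters stay consistent: a position set as 100 in 1-based form reads back as 99 in 0-based form.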
  −
 
  −
 
  −
== Retrieving fields from a SAM/BAM Record ==
  −
The '''SamRecord''' class contains accessors to retrieve the fields of a SAM/BAM record.  They assume that the record has already been populated, either by using the set methods or by calling SamFile::ReadRecord.  Not all of the values that can be retrieved with these get accessors have set methods, because some of them are calculated internally when they are not read from a file.
  −
 
  −
The methods found in the '''SamRecord''' class for retrieving fields are:
  −
{| class="wikitable" style="width:100%" border="1"
  −
|+ style="font-size:150%"|'''SamRecord Class Get Methods'''
  −
!  width=""|Method Name
  −
!  width=""|Description
  −
|-
  −
| bool isValid(SamFileHeader& header)
  −
| Returns true if the record is valid.  This is intended to perform validation steps.  TODO: the method exists, but it does not yet perform any checks, so it always returns true.
  −
|-
  −
| int32_t getBlockSize()
  −
| Returns the BAM block size of the record.
  −
|-
  −
| const char* getReferenceName(SamFileHeader& header)
  −
| Returns the reference sequence name (SAM format).
  −
|-
  −
| int32_t getReferenceID()
  −
| Returns the reference sequence ID (BAM format).
  −
|-
  −
| int32_t get1BasedPosition()
  −
| Returns the 1-based (SAM formatted) leftmost position.
  −
|-
  −
| int32_t get0BasedPosition()
  −
| Returns the 0-based (BAM formatted) leftmost position.
  −
|-
  −
| int8_t getReadNameLength()
  −
| Returns the length of the ReadName (QNAME).
  −
|-
  −
| int8_t getMapQuality()
  −
| Returns the map quality.
  −
|-
  −
| int16_t getBin()
  −
| Returns the BAM bin for the record.
  −
|-
  −
| int16_t getCigarLength()
  −
| Returns the length of the CIGAR in BAM format.
  −
|-
  −
| int16_t getFlag()
  −
| Returns the flag.
  −
|-
  −
| int32_t getReadLength()
  −
| Returns the length of the read.
  −
|-
  −
| const char* getMateReferenceName(SamFileHeader& header)
  −
| Returns the mate reference sequence name (SAM format).  Returns the mate reference sequence name even if it is the same as the reference sequence name.
  −
|-
  −
| const char* getMateReferenceNameOrEqual(SamFileHeader& header)
  −
| Returns the mate reference sequence name (SAM format), unless it is the same as the reference sequence name, in which case "=" is returned.
  −
|-
  −
| int32_t getMateReferenceID()
  −
| Returns the mate reference sequence id (BAM format).
  −
|-
  −
| int32_t get1BasedMatePosition()
  −
| Returns the 1-based (SAM formatted) mate leftmost position.
  −
|-
  −
| int32_t get0BasedMatePosition()
  −
| Returns the 0-based (BAM formatted) mate leftmost position.
  −
|-
  −
| int32_t getInsertSize()
  −
| Returns the insert size.
  −
|-
  −
| int32_t get0BasedAlignmentEnd()
  −
| Returns the 0-based inclusive right-most position of the clipped sequence.
  −
|-
  −
| int32_t get1BasedAlignmentEnd()
  −
| Returns the 1-based inclusive right-most position of the clipped sequence.
  −
|-
  −
| int32_t get0BasedUnclippedStart()
  −
| Returns the 0-based inclusive left-most position adjusted for clipped bases.
  −
|-
  −
| int32_t get1BasedUnclippedStart()
  −
| Returns the 1-based inclusive left-most position adjusted for clipped bases.
  −
|-
  −
| int32_t get0BasedUnclippedEnd()
  −
| Returns the 0-based inclusive right-most position adjusted for clipped bases.
  −
|-
  −
| int32_t get1BasedUnclippedEnd()
  −
| Returns the 1-based inclusive right-most position adjusted for clipped bases.
  −
|-
  −
| const char* getReadName()
  −
| Returns the SAM formatted Read Name (QNAME).
  −
|-
  −
| const char* getCigar()
  −
| Returns the SAM formatted CIGAR string.
  −
|-
  −
| const char* getSequence()
  −
| Returns the SAM formatted Sequence string.
  −
|-
  −
| const char* getQuality()
  −
| Returns the SAM formatted Quality string.
  −
|-
  −
| bool getNextSamTag(char* tag, char& vtype, void** value)
  −
| Returns true if a tag was read, false if there are no more tags.
  −
For a true return value, tag is set to the name of the tag, vtype is set to the vtype of the tag, and value is a pointer to the value of the tag.  You will then need to check the vtype and cast value to int, double, char, or String accordingly.
  −
|-
  −
| bool isIntegerType(char vtype)
  −
| Returns true if the passed in vtype is of integer ('c', 'C', 's', 'S', 'i', 'I') type.
  −
|-
  −
| bool isDoubleType(char vtype)
  −
| Returns true if the passed in vtype is of double ('f') type.
  −
|-
  −
| bool isCharType(char vtype)
  −
| Returns true if the passed in vtype is of char ('A') type.
  −
|-
  −
| bool isStringType(char vtype)
  −
| Returns true if the passed in vtype is of String ('Z') type.
  −
|-
  −
|}
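
The alignment end and unclipped start/end values in the table above are derived values rather than stored fields. The sketch below shows one conventional way to derive them from a 0-based start position and a SAM-formatted CIGAR string. It is an illustration under assumptions, not the '''SamRecord''' implementation: <code>AlignmentBounds</code>, <code>parseCigar</code>, and <code>computeBounds</code> are hypothetical names, and the CIGAR is assumed to be valid and to contain at least one reference-consuming operation.

```cpp
#include <cctype>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical result type; none of these names are SamRecord members.
struct AlignmentBounds
{
    int32_t alignmentEnd;    // 0-based inclusive end of the aligned bases
    int32_t unclippedStart;  // 0-based start, moved left by leading clips
    int32_t unclippedEnd;    // 0-based inclusive end, moved right by trailing clips
};

// Split a CIGAR string such as "3S10M2D5M4S" into (length, operation) pairs.
inline std::vector<std::pair<int32_t, char>> parseCigar(const std::string& cigar)
{
    std::vector<std::pair<int32_t, char>> ops;
    int32_t length = 0;
    for (char c : cigar)
    {
        if (std::isdigit(static_cast<unsigned char>(c)))
        {
            length = length * 10 + (c - '0');
        }
        else
        {
            ops.emplace_back(length, c);
            length = 0;
        }
    }
    return ops;
}

inline AlignmentBounds computeBounds(int32_t zeroBasedStart, const std::string& cigar)
{
    int32_t referenceLength = 0;  // bases consumed on the reference
    int32_t leadingClip = 0;      // S/H bases before the first non-clip operation
    int32_t trailingClip = 0;     // S/H bases after it
    bool seenNonClip = false;

    for (const auto& op : parseCigar(cigar))
    {
        const char type = op.second;
        if (type == 'S' || type == 'H')
        {
            (seenNonClip ? trailingClip : leadingClip) += op.first;
            continue;
        }
        seenNonClip = true;
        // M, D, N, = and X consume reference bases; I and P do not.
        if (type == 'M' || type == 'D' || type == 'N' || type == '=' || type == 'X')
        {
            referenceLength += op.first;
        }
    }

    AlignmentBounds bounds;
    bounds.alignmentEnd = zeroBasedStart + referenceLength - 1;
    bounds.unclippedStart = zeroBasedStart - leadingClip;
    bounds.unclippedEnd = bounds.alignmentEnd + trailingClip;
    return bounds;
}
```

Adding 1 to each result gives the corresponding 1-based value.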
  −
 
  −
 
  −
Example of using getNextSamTag:
  −
<source lang="cpp">
  −
  // record is a previously setup SamRecord.
  −
  String recordString = "";
  −
  char tag[3];
  −
  char vtype;
  −
  void* value;
  −
 
  −
  // While there are more tags, write them to the recordString.
  −
  while(record.getNextSamTag(tag, vtype, &value) != false)
  −
  {
  −
      recordString += "\t";
  −
      recordString += tag;
  −
      recordString += ":";
  −
      recordString += vtype;
  −
      recordString += ":";
  −
      if(record.isIntegerType(vtype))
  −
      {
  −
        recordString += (int)*(int*)value;
  −
      }
  −
      else if(record.isDoubleType(vtype))
  −
      {
  −
        recordString += (double)*(double*)value;
  −
      }
  −
      else if(record.isCharType(vtype))
  −
      {
  −
        recordString += (char)*(char*)value;
  −
      }
  −
      else
  −
      {
  −
        // String type.
  −
        recordString += (String)*(String*)value;
  −
      }
  −
  }
  −
 
  −
  recordString += "\n";
  −
</source>
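
The vtype checks used in the loop above follow the groupings listed in the get methods table. As a self-contained illustration, the sketch below reproduces that grouping with a hypothetical free function, <code>describeVtype</code>, which is not part of '''SamRecord'''; the vtype characters themselves are taken from the table.

```cpp
#include <string>

// Hypothetical helper (not a SamRecord member): map a vtype character to the
// category named in the table, so a caller knows how to cast the tag value.
inline std::string describeVtype(char vtype)
{
    // Integer vtypes listed in the table: 'c', 'C', 's', 'S', 'i', 'I'.
    if (std::string("cCsSiI").find(vtype) != std::string::npos)
    {
        return "integer";
    }
    if (vtype == 'f')
    {
        return "double";
    }
    if (vtype == 'A')
    {
        return "char";
    }
    if (vtype == 'Z')
    {
        return "string";
    }
    // Anything else is outside the groupings documented above.
    return "unknown";
}
```

A caller could switch on the returned category before casting the value pointer, instead of assuming that any unrecognized vtype is a string.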
      +
Documentation on reading/writing a SAM/BAM Record can be found at [[C++ Class: SamRecord]].
    
== Suggested Improvements/Features ==
 
* Add optional user flag for checking if a file is sorted when it is read.  It would report an error if it is unsorted.
 
