Difference between revisions of "C++ Class: SamFileHeader"

From Genome Analysis Wiki
Line 9: Line 9:
  
 
=== Sam Header Basics ===
 
The SamFileHeader is comprised of multiple [[C++ Class: SamHeaderRecord|SamHeaderRecords]].
+
The SamFileHeader is comprised of multiple [http://www.sph.umich.edu/csg/mktrost/doxygen/current/classSamHeaderRecord.html SamHeaderRecords].
  
 
There are 4 types of SAM Header Records:
 
Line 23: Line 23:
 
The '''SamFileHeader''' also contains Comments, type CO.  They are not included as part of the '''SamHeaderRecord''' class since they do not contain Tag/Value pairs.
 
  
=== Setting fields in the Header ===
+
See: http://www.sph.umich.edu/csg/mktrost/doxygen/current/classSamFileHeader.html for documentation.
The '''SamFileHeader''' class contains accessors to set the header lines of a SAM/BAM header.  Once the header has been set up with these set methods, the values can be read back with the get accessors, or the header can be written to a SAM/BAM file.
 
 
 
==== Copying a Header ====
 
These three methods are ways of copying the contents of one header into another one.
 
{| style="margin: 1em 1em 1em 0; background-color: #f9f9f9; border: 1px #aaa solid; border-collapse: collapse;" border="1"
 
|-style="background: #f2f2f2; text-align: center;"
 
! Method Name !!  Description
 
|-
 
| <code>bool SamFileHeader::copy(const SamFileHeader& header)</code>
 
| Copy method copies the specified header into this one.
 
|-
 
|<code>SamFileHeader::SamFileHeader(const SamFileHeader& header)</code>
 
| Copy constructor copies the specified header into this one.
 
|-
 
|<code>SamFileHeader & SamFileHeader::operator = (const SamFileHeader& header)</code>
 
| operator= copies the specified header into this one.
 
|}
 
 
 
==== Adding an entire Header Line ====
 
The methods found in the '''SamFileHeader''' class for setting an entire header record are:
 
{| style="margin: 1em 1em 1em 0; background-color: #f9f9f9; border: 1px #aaa solid; border-collapse: collapse;" border="1"
 
|-style="background: #f2f2f2; text-align: center;"  '''SamFileHeader Class Methods'''
 
! Method Name !!  Description
 
|-
 
| <code>bool SamFileHeader::addHeaderLine(const char* type, const char* tag, const char* value)</code>
 
| Adds a header line built from the specified type, tag, and value.
 
Returns true if successfully added, false if not.
 
 
 
NOTE: currently, this method adds only one tag per line.  If a line of that type needs multiple tags, the whole line must be added at once.
 
|-
 
| <code>bool SamFileHeader::addHeaderLine(const char* headerLine)</code>
 
| Adds the already setup/formatted headerLine to the header.  It is assumed that the line does not contain a "\n".
 
Returns true if successfully added, false if not.
 
|}
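As a rough illustration of what an "already formatted" line looks like, the sketch below assembles one SAM header line from a record type and its tag/value pairs. <code>formatHeaderLine</code> is a hypothetical helper and not part of libStatGen. It assumes the usual SAM layout of "@", the two-character type, then tab-separated TAG:VALUE fields, and it leaves off the trailing "\n".

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not a libStatGen function): builds one SAM header
// line of the form "@TY<TAB>TG:value<TAB>TG:value" with no trailing newline.
std::string formatHeaderLine(
    const std::string& type,
    const std::vector<std::pair<std::string, std::string> >& tags) {
    assert(type.size() == 2);  // record type codes are two characters, e.g. "SQ"
    std::string line = "@" + type;
    for (const auto& tag : tags) {
        line += '\t';
        line += tag.first;   // two-character tag, e.g. "SN"
        line += ':';
        line += tag.second;  // value for that tag
    }
    return line;
}
```

The resulting string could then be handed to the single-argument <code>addHeaderLine</code> through <code>c_str()</code>, assuming that method expects the leading "@".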
 
 
 
==== Set/Add/Remove a Single Tag ====
 
The passed in tag should be the two character SAM tag as defined in the SAM spec.
 
 
 
A tag is removed from the header record by setting it to "". For the SQ and RG header types, the key tags (SN for SQ and ID for RG) may not be modified or removed once set. These values are used as the lookup key for the header record, so removing or changing a key means removing the entire record.
 
 
 
{| style="margin: 1em 1em 1em 0; background-color: #f9f9f9; border: 1px #aaa solid; border-collapse: collapse;" border="1"
 
|-style="background: #f2f2f2; text-align: center;"  '''SamFileHeader Class Methods'''
 
! Method Name !!  Description
 
|-
 
| <code>bool SamFileHeader::setHDTag(const char* tag, const char* value)</code>
 
| Set the specified tag to the specified value in the HD header.  A tag is removed by passing in value = "".
 
Returns true if the tag was successfully set.
 
|-
 
| <code>bool SamFileHeader::setPGTag(const char* tag, const char* value, const char* id)</code>
 
| Set the specified tag to the specified value in the PG header with the specified id.  If the header does not yet exist, the header is added and so is the ID tag with the value set to the passed in id.  A tag is removed by passing in value = "".  The ID tag may not be modified or removed after it is set.
 
Returns true if the tag was successfully set.
 
|-
 
| <code>bool SamFileHeader::setSQTag(const char* tag, const char* value, const char* name)</code>
 
| Set the specified tag to the specified value in the SQ header with the specified name.  If the header does not yet exist, the header is added and so is the SN tag with the value set to the passed in name.  A tag is removed by passing in value = "".  The SN tag may not be modified or removed after it is set.
 
Returns true if the tag was successfully set.
 
|-
 
| <code>bool SamFileHeader::setRGTag(const char* tag, const char* value, const char* id)</code>
 
| Set the specified tag to the specified value in the RG header with the specified id.  If the header does not yet exist, the header is added and so is the ID tag with the value set to the passed in id.  A tag is removed by passing in value = "".  The ID tag may not be modified or removed after it is set.
 
Returns true if the tag was successfully set.
 
|}
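The set/remove rules described above can be sketched with a toy record. <code>ToyRecord</code> is a hypothetical stand-in, not the libStatGen class. It models only three behaviours: an empty value removes a tag, any other value sets it, and the key tag is locked once the record exists. Returning true when removing a tag that was never set is an assumption of this sketch.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy stand-in for one keyed header record, e.g. an SQ record keyed by SN.
class ToyRecord {
public:
    ToyRecord(const std::string& keyTag, const std::string& keyValue)
        : myKeyTag(keyTag) {
        assert(keyTag.size() == 2);  // SAM tags are two characters
        myTags[keyTag] = keyValue;
    }

    // Sets a tag, or removes it when value is "".
    bool setTag(const std::string& tag, const std::string& value) {
        if (tag == myKeyTag) return false;  // key tag may not be modified or removed
        if (value.empty()) {
            myTags.erase(tag);              // "" means remove the tag
            return true;
        }
        myTags[tag] = value;
        return true;
    }

    // Returns "" when the tag is not present.
    std::string getTagValue(const std::string& tag) const {
        std::map<std::string, std::string>::const_iterator it = myTags.find(tag);
        if (it == myTags.end()) return std::string();
        return it->second;
    }

private:
    std::string myKeyTag;
    std::map<std::string, std::string> myTags;
};
```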
 
 
 
==== Add an Already Setup SamHeaderRecord ====
 
NOTE: These methods add a pointer to the passed in record.  The header record will be deleted when it is cleaned up from this header.
 
 
 
NOTE: Do NOT delete the passed in record, the SamFileHeader class takes care of that itself.
 
 
 
{| style="margin: 1em 1em 1em 0; background-color: #f9f9f9; border: 1px #aaa solid; border-collapse: collapse;" border="1"
 
|-style="background: #f2f2f2; text-align: center;"  '''SamFileHeader Class Methods'''
 
! Method Name !!  Description
 
|-
 
| <code>bool SamFileHeader::addHD(SamHeaderHD* hd)</code>
 
| Add the HD record to the header.
 
Returns true if the record was successfully added.
 
|-
 
| <code> bool SamFileHeader::addPG(SamHeaderPG* pg)</code>
 
| Add the PG record to the header.
 
Returns true if the record was successfully added.
 
|-
 
| <code>bool SamFileHeader::addSQ(SamHeaderSQ* sq)</code>
 
| Add the SQ record to the header.
 
Returns true if the record was successfully added.
 
|-
 
| <code>bool SamFileHeader::addRG(SamHeaderRG* rg)</code>
 
| Add the RG record to the header.
 
Returns true if the record was successfully added.
 
|}
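The ownership rule in the notes above (the header deletes the records it was given) can be illustrated with a toy pair of classes. <code>ToyRecord</code> and <code>ToyHeader</code> are hypothetical stand-ins, not the libStatGen types. The sketch only shows why the caller must not delete a record after handing it over.

```cpp
#include <cassert>
#include <vector>

// Toy record that counts how many instances are currently alive.
struct ToyRecord {
    static int alive;
    ToyRecord() { ++alive; }
    ~ToyRecord() { --alive; }
};
int ToyRecord::alive = 0;

// Toy header that takes ownership of every pointer passed to add().
class ToyHeader {
public:
    ToyHeader() {}
    ~ToyHeader() {
        for (ToyRecord* rec : myRecords) delete rec;  // header cleans up its records
    }
    ToyHeader(const ToyHeader&) = delete;             // avoid double deletes via copies
    ToyHeader& operator=(const ToyHeader&) = delete;

    bool add(ToyRecord* rec) {
        if (rec == nullptr) return false;
        myRecords.push_back(rec);  // stores the pointer itself, not a copy
        return true;
    }

private:
    std::vector<ToyRecord*> myRecords;
};

// Returns how many records are still alive after the header goes out of scope.
int aliveAfterScope() {
    {
        ToyHeader header;
        header.add(new ToyRecord());  // caller does NOT delete this record
        assert(ToyRecord::alive == 1);
    }
    return ToyRecord::alive;
}
```

A caller that also deleted the record would free it twice, which is why the notes say to leave cleanup to the header.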
 
 
 
==== Remove an Entire Header Record ====
 
{| style="margin: 1em 1em 1em 0; background-color: #f9f9f9; border: 1px #aaa solid; border-collapse: collapse;" border="1"
 
|-style="background: #f2f2f2; text-align: center;"  '''SamFileHeader Class Methods'''
 
! Method Name !!  Description
 
|-
 
| <code>bool SamFileHeader::removeHD()</code>
 
| Remove the HD record.
 
Returns true if successfully removed or if it didn't exist in the first place.  Returns false if the record still exists.
 
|-
 
| <code>bool SamFileHeader::removePG(const char* id)</code>
 
| Remove the PG record associated with the specified id.
 
Returns true if successfully removed or if it didn't exist in the first place.  Returns false if the record still exists.
 
|-
 
| <code>bool SamFileHeader::removeSQ(const char* name)</code>
 
| Remove the SQ record associated with the specified name.
 
Returns true if successfully removed or if it didn't exist in the first place.  Returns false if the record still exists.
 
|-
 
| <code>bool SamFileHeader::removeRG(const char* id)</code>
 
| Remove the RG record associated with the specified id.
 
Returns true if successfully removed or if it didn't exist in the first place.  Returns false if the record still exists.
 
|}
 
 
 
==== Add a Comment ====
 
{| style="margin: 1em 1em 1em 0; background-color: #f9f9f9; border: 1px #aaa solid; border-collapse: collapse;" border="1"
 
|-style="background: #f2f2f2; text-align: center;"  '''SamFileHeader Class Methods'''
 
! Method Name !!  Description
 
|-
 
| <code>bool SamFileHeader::addComment(const char* comment)</code>
 
| Add the specified comment to the header.  The comment should NOT include the "@CO" or "\n".
 
Returns true if successfully added.
 
|}
 
 
 
 
 
=== Getting fields from the Header ===
 
The following sets of methods are accessors to pull information from the SAM/BAM Header.
 
 
 
==== Get the Entire Header ====
 
Get the entire header as a single string.
 
 
 
{| style="margin: 1em 1em 1em 0; background-color: #f9f9f9; border: 1px #aaa solid; border-collapse: collapse;" border="1"
 
|-style="background: #f2f2f2; text-align: center;"  '''SamFileHeader Class Methods'''
 
! Method Name !!  Description
 
|-
 
| <code>bool SamFileHeader::getHeaderString(std::string& header)</code>
 
| Sets the passed in string to the entire header string, clearing its current contents.

Returns true if successfully set (even if set to "").
 
|}
 
 
 
==== Get the Header Record/Line by Record/Line ====
 
The following methods are for iterating through the header.
 
 
 
{| style="margin: 1em 1em 1em 0; background-color: #f9f9f9; border: 1px #aaa solid; border-collapse: collapse;" border="1"
 
|-style="background: #f2f2f2; text-align: center;"  '''SamFileHeader Class Methods'''
 
! Method Name !!  Description
 
|-
 
| <code>SamHeaderRecord* SamFileHeader::getNextHeaderRecord()</code>
 
| Returns the next header record.  After all headers have been retrieved, NULL is returned until a reset is called.
 
Does not return the Comment lines.
 
 
 
NOTE: both getNextHeaderRecord and getNextHeaderLine increment the same iterator.
 
|-
 
| <code>bool SamFileHeader::getNextHeaderLine(std::string& headerLine)</code>
 
|  Set the passed in string to the next header line.  The passed in string will be overwritten. If there are no more header lines or there is an error, false is returned and the passed in string is set to ""; this continues until a reset is called.  Comment lines are also returned.
 
 
 
NOTE: both getNextHeaderRecord and getNextHeaderLine increment the same iterator.
 
|-
 
| <code>void SamFileHeader::resetHeaderRecordIter()</code>
 
| Resets to the beginning of the header records so the next call to getNextHeaderRecord/getNextHeaderLine returns the first header line.
 
|-
 
| <code>SamHeaderRecord* SamFileHeader::getNextSQRecord()</code>
 
| Get the next SQ header record.  After all SQ headers have been retrieved, NULL is returned until a reset is called for the SQ Record Iterator.  Independent from getNextHeaderRecord, getNextHeaderLine and the other getNextXXRecord methods and the associated reset methods.
 
|-
 
| <code>SamHeaderRecord* SamFileHeader::getNextRGRecord()</code>
 
| Get the next RG header record.  After all RG headers have been retrieved, NULL is returned until a reset is called for the RG Record Iterator.  Independent from getNextHeaderRecord, getNextHeaderLine and the other getNextXXRecord methods and the associated reset methods.
 
|-
 
| <code>SamHeaderRecord* SamFileHeader::getNextPGRecord()</code>
 
| Get the next PG header record.  After all PG headers have been retrieved, NULL is returned until a reset is called for the PG Record Iterator.  Independent from getNextHeaderRecord, getNextHeaderLine and the other getNextXXRecord methods and the associated reset methods.
 
|-
 
| <code>void SamFileHeader::resetSQRecordIter()</code>
 
| Reset the SQ Record iterator to the beginning of the header records so the next call to getNextSQRecord returns the first SQ header record.
 
|-
 
| <code>void SamFileHeader::resetRGRecordIter()</code>
 
| Reset the RG Record iterator to the beginning of the header records so the next call to getNextRGRecord returns the first RG header record.
 
|-
 
| <code>void SamFileHeader::resetPGRecordIter()</code>
 
| Reset the PG Record iterator to the beginning of the header records so the next call to getNextPGRecord returns the first PG header record.
 
|}
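A minimal sketch of the iterate-then-reset pattern follows. <code>collectHeaderLines</code> is a hypothetical helper written as a template, so it accepts any type that provides <code>getNextHeaderLine</code> and <code>resetHeaderRecordIter</code> with the signatures listed in the table. <code>StubHeader</code> is only a small stand-in so the example compiles on its own, and it does not reproduce the real class.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Stand-in exposing the two documented signatures; not the libStatGen class.
class StubHeader {
public:
    explicit StubHeader(const std::vector<std::string>& lines)
        : myLines(lines), myNext(0) {}

    bool getNextHeaderLine(std::string& headerLine) {
        if (myNext >= myLines.size()) {
            headerLine = "";  // documented: string is set to "" when false is returned
            return false;
        }
        headerLine = myLines[myNext++];
        return true;
    }
    void resetHeaderRecordIter() { myNext = 0; }

private:
    std::vector<std::string> myLines;
    std::size_t myNext;
};

// Collects every header line, then rewinds the shared iterator.
template <typename Header>
std::vector<std::string> collectHeaderLines(Header& header) {
    std::vector<std::string> lines;
    std::string line;
    header.resetHeaderRecordIter();  // start at the first line regardless of earlier calls
    while (header.getNextHeaderLine(line)) {
        lines.push_back(line);
    }
    assert(line.empty());            // the failed call leaves the string as ""
    header.resetHeaderRecordIter();  // leave the iterator rewound for the next caller
    return lines;
}
```

Resetting before and after the loop is a design choice of this sketch: because getNextHeaderRecord and getNextHeaderLine share one iterator, an earlier partial walk would otherwise cause lines to be skipped.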
 
 
 
==== Get a Specific Tag ====
 
These methods return the value associated with the specified tag.  If the tag does not exist in the record, "" is returned.
 
 
 
For SQ and RG the value returned is for the tag associated with the specified key (name/id).  If a record with that key does not exist or if the tag does not exist for the record with that key, "" is returned.
 
 
 
{| style="margin: 1em 1em 1em 0; background-color: #f9f9f9; border: 1px #aaa solid; border-collapse: collapse;" border="1"
 
|-style="background: #f2f2f2; text-align: center;"  '''SamFileHeader Class Methods'''
 
! Method Name !!  Description
 
|-
 
| <code>const char* SamFileHeader::getHDTagValue(const char* tag)</code>
 
| Returns the value associated with the specified HD tag.  Returns "" if the tag does not exist in the header.
 
|-
 
| <code>const char* SamFileHeader::getPGTagValue(const char* tag, const char* id)</code>
 
| Returns the value associated with the specified tag on the PG line with the specified id.  Returns "" if the tag does not exist in the specified line or if the specified line does not exist.
 
|-
 
| <code>const char* SamFileHeader::getSQTagValue(const char* tag, const char* name)</code>
 
| Returns the value associated with the specified tag on the SQ line with the specified sequence name.  Returns "" if the tag does not exist in the specified line or if the specified line does not exist.
 
|-
 
| <code>const char* SamFileHeader::getRGTagValue(const char* tag, const char* id)</code>
 
| Returns the value associated with the specified tag on the RG line with the specified id.  Returns "" if the tag does not exist in the specified line or if the specified line does not exist.
 
|}
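Because a missing record and a missing tag both come back as "", callers usually need to check for the empty string before using the value. The sketch below reads the LN tag of an SQ record as a number. <code>sequenceLength</code> is a hypothetical helper, and its -1 return value for "not available" is an assumption of this example. <code>StubHeader</code> is a stand-in that models only the LN tag so the code compiles on its own.

```cpp
#include <cassert>
#include <cstdlib>
#include <map>
#include <string>

// Stand-in mirroring the documented getSQTagValue signature; LN only.
class StubHeader {
public:
    void setLength(const std::string& name, const std::string& length) {
        myLengths[name] = length;
    }
    const char* getSQTagValue(const char* tag, const char* name) {
        assert(tag != nullptr && name != nullptr);
        if (std::string(tag) != "LN") return "";
        std::map<std::string, std::string>::const_iterator it = myLengths.find(name);
        return it == myLengths.end() ? "" : it->second.c_str();
    }

private:
    std::map<std::string, std::string> myLengths;
};

// Returns the LN value for the named sequence, or -1 when it is not available.
template <typename Header>
long sequenceLength(Header& header, const char* name) {
    const char* value = header.getSQTagValue("LN", name);
    if (value == nullptr || value[0] == '\0') {
        return -1;  // "" signals a missing record or a missing tag
    }
    return std::strtol(value, nullptr, 10);
}
```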
 
 
 
==== Get a Specific Header Record ====
 
These methods return a pointer to the specific record that was requested.  They return NULL if that record does not exist in the header.
 
 
 
The returned record can be modified to add/remove some tags.  Since the returned pointer refers to the record held by the header, the SamFileHeader automatically reflects these changes.
 
 
 
{| style="margin: 1em 1em 1em 0; background-color: #f9f9f9; border: 1px #aaa solid; border-collapse: collapse;" border="1"
 
|-style="background: #f2f2f2; text-align: center;"  '''SamFileHeader Class Methods'''
 
! Method Name !!  Description
 
|-
 
| <code>SamHeaderHD* SamFileHeader::getHD()</code>
 
| Get the HD object.  Returns NULL if there is no HD record.
 
|-
 
| <code>SamHeaderPG* SamFileHeader::getPG(const char* id)</code>
 
| Get the PG object with the specified id.  Returns NULL if there is no PG record with that key.
 
|-
 
| <code>SamHeaderSQ* SamFileHeader::getSQ(const char* name)</code>
 
| Get the SQ object with the specified sequence name.  Returns NULL if there is no SQ object with that key.
 
|-
 
| <code>SamHeaderRG* SamFileHeader::getRG(const char* id)</code>
 
| Get the RG object with the specified id.  Returns NULL if there is no RG object with that key.
 
|}
 
 
 
==== Get Just Comments ====
 
These methods just access the header comments.
 
 
 
{| style="margin: 1em 1em 1em 0; background-color: #f9f9f9; border: 1px #aaa solid; border-collapse: collapse;" border="1"
 
|-style="background: #f2f2f2; text-align: center;"  '''SamFileHeader Class Methods'''
 
! Method Name !!  Description
 
|-
 
| <code>const char* SamFileHeader::getNextComment()</code>
 
| Returns the text of the next comment line.  Does not include the "@CO" or the "\n".
 
Returns "" if all comment lines have been returned, until resetCommentIter is called.
 
|-
 
| <code>void SamFileHeader::resetCommentIter()</code>
 
| Resets to the beginning of the comments so getNextComment returns the first comment.
 
|}
 
 
 
==== Additional Accessors ====
 
{| style="margin: 1em 1em 1em 0; background-color: #f9f9f9; border: 1px #aaa solid; border-collapse: collapse;" border="1"
 
|-style="background: #f2f2f2; text-align: center;"  '''SamFileHeader Class Methods'''
 
! Method Name !!  Description
 
|-
 
| <code>const char* SamFileHeader::getSortOrder()</code>
 
| Return the Sort Order value that is set in the Header.  If this field does not exist, "" is returned.
 
|-
 
| <code>int SamFileHeader::GetReferenceID(const String & referenceName)</code>
 
| Returns the reference ID associated with the specified reference name (chromosome).
 
|-
 
| <code>int SamFileHeader::GetReferenceID(const char* referenceName)</code>
 
| Returns the reference ID associated with the specified reference name (chromosome).
 
|-
 
| <code>const String& SamFileHeader::GetReferenceLabel(int id)</code>
 
| Returns the reference name (chromosome) associated with the specified reference ID.
 
|}
 
 
 
  
 
==== Additional Proposed Accessors ====
 

Revision as of 18:05, 24 August 2011


SamFileHeader

This class allows a user to get/set the fields in a SAM/BAM Header.

This class is part of C++ Library: libStatGen.

Sam Header Basics

The SamFileHeader is comprised of multiple SamHeaderRecords.

There are 4 types of SAM Header Records:

  1. HD - Header
  2. SQ - Sequence Dictionary
  3. RG - Read Group
  4. PG - Program

A SAM Header Record is comprised of Tag/Value pairs. Each tag only appears once within a specific record.

A SAM Header can have 0 or 1 HD records, 0 or more PG records, 0 or more SQ Records, and 0 or more RG records. The PG records are keyed off of the ID tag. The SQ records are keyed off of the SN, Sequence Name, tag. The RG records are keyed off of the ID, Unique Read Group Identifier, tag. The keys must be unique for that record type within the file.
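The cardinality and key rules above can be sketched as a toy container. ToyHeader and TagMap are hypothetical names for illustration only and are not the libStatGen classes. The sketch accepts at most one HD record and rejects any SQ, RG, or PG record whose key (SN or ID) is missing or already present. It uses std::map, so it does not preserve the order in which records were added.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

typedef std::map<std::string, std::string> TagMap;  // tag -> value for one record

// Toy model of the header rules: 0 or 1 HD, and unique keys per record type.
class ToyHeader {
public:
    ToyHeader() : myHasHD(false) {}

    bool addHD(const TagMap& tags) {
        if (myHasHD) return false;  // only one HD record is allowed
        myHD = tags;
        myHasHD = true;
        return true;
    }
    bool addSQ(const TagMap& tags) { return addKeyed(mySQ, tags, "SN"); }
    bool addRG(const TagMap& tags) { return addKeyed(myRG, tags, "ID"); }
    bool addPG(const TagMap& tags) { return addKeyed(myPG, tags, "ID"); }

private:
    typedef std::map<std::string, TagMap> RecordMap;  // key value -> record

    static bool addKeyed(RecordMap& records, const TagMap& tags,
                         const std::string& keyTag) {
        assert(keyTag.size() == 2);  // SAM tags are two characters
        TagMap::const_iterator key = tags.find(keyTag);
        if (key == tags.end()) return false;  // record has no key tag
        // insert() reports false when a record with this key already exists
        return records.insert(std::make_pair(key->second, tags)).second;
    }

    bool myHasHD;
    TagMap myHD;
    RecordMap mySQ;
    RecordMap myRG;
    RecordMap myPG;
};
```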

The SamFileHeader also contains Comments, type CO. They are not included as part of the SamHeaderRecord class since they do not contain Tag/Value pairs.

See: http://www.sph.umich.edu/csg/mktrost/doxygen/current/classSamFileHeader.html for documentation.

Additional Proposed Accessors

  • HD
    • getVersion - returns the VN field (will only be one)
  • SQ
    • getRefSequenceCount - count of the number of SQ entries in the header
    • getRefSequenceName - gets the next reference sequence name.
    • getRefSequenceLength - gets the length associated with the specified reference sequence.
  • RG
    • getSampleID - for a specified Read Group....???? but SampleID is the key...maybe passing in a record?
    • getReadGroup - pass in record, return a read group structure?
    • getLibrary - for a given read group
    • getSample - for a given read group
    • getTechnology - for a given read group
    • getPlatformUnit - for a given read group

NOTE: More Get Accessors will be coming. Let me know if you need a specific one, and I can add that first.