Difference between revisions of "C++ Class: SamFileHeader"

From Genome Analysis Wiki
Revision as of 17:03, 9 June 2010


SamFileHeader

This class allows a user to get/set the fields in a SAM/BAM Header.

This class is part of libbam.

Sam Header Basics

The SamFileHeader is comprised of multiple SamHeaderRecords.

There are 4 types of SAM Header Records:

  1. HD - Header
  2. SQ - Sequence Dictionary
  3. RG - Read Group
  4. PG - Program

A SAM Header Record is comprised of Tag/Value pairs. Each tag only appears once within a specific record.

A SAM Header can have 0 or 1 HD records, 0 or more PG records, 0 or more SQ Records, and 0 or more RG records. The PG records are keyed off of the ID tag. The SQ records are keyed off of the SN, Sequence Name, tag. The RG records are keyed off of the ID, Unique Read Group Identifier, tag. The keys must be unique for that record type within the file.
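The record structure described above can be sketched in a few lines. This is only an illustration, not the library's parser: `ParsedRecord` and `parseHeaderLine` are hypothetical names, and the sketch assumes the usual tab-delimited SAM layout of `@TYPE` followed by `TAG:value` fields.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical stand-in for one header record: a type code plus tag/value pairs.
struct ParsedRecord
{
    std::string type;                         // "HD", "SQ", "RG", or "PG"
    std::map<std::string, std::string> tags;  // each tag appears at most once
};

// Split a tab-delimited line such as "@SQ\tSN:chr1\tLN:1000" into its parts.
// Returns false if the line is malformed or a tag is repeated.
bool parseHeaderLine(const std::string& line, ParsedRecord& record)
{
    record.type.clear();
    record.tags.clear();
    if(line.size() < 3 || line[0] != '@') return false;

    std::istringstream fields(line);
    std::string field;
    std::getline(fields, field, '\t');   // first field is "@XX"
    if(field.size() != 3) return false;
    record.type = field.substr(1);

    while(std::getline(fields, field, '\t'))
    {
        // Each remaining field is "TG:value" with a two-character tag.
        if(field.size() < 3 || field[2] != ':') return false;
        std::string tag = field.substr(0, 2);
        if(!record.tags.insert(std::make_pair(tag, field.substr(3))).second)
        {
            return false;   // tag already present in this record
        }
    }
    return true;
}

// Usage: parse an SQ line and read back its key tag (SN).
void exampleParseHeaderLine()
{
    ParsedRecord record;
    bool ok = parseHeaderLine("@SQ\tSN:chr1\tLN:1000", record);
    assert(ok);
    assert(record.type == "SQ");
    assert(record.tags["SN"] == "chr1");
}
```

The `std::map` makes the "each tag only appears once" rule explicit: a repeated tag is rejected rather than silently overwritten.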

The SamFileHeader also contains Comments, type CO. They are not included as part of the SamHeaderRecord class since they do not contain Tag/Value pairs.

Setting fields in the Header

The SamFileHeader class contains accessors to set the header lines of a SAM/BAM header. Once the header has been set up with these set methods, its values can be read back with the get accessors, or the header can later be written to a SAM/BAM file.

Copying a Header

These three methods are ways of copying the contents of one header into another one.

Method Name Description
bool SamFileHeader::copy(const SamFileHeader& header) Copy method copies the specified header into this one.
SamFileHeader::SamFileHeader(const SamFileHeader& header) Copy constructor copies the specified header into this one.
SamFileHeader & SamFileHeader::operator = (const SamFileHeader& header) operator= copies the specified header into this one.
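One common way to offer these three entry points is to have the copy constructor and operator= delegate to a single copy method. The sketch below shows that pattern with a stand-in class, `HeaderSketch`, which is hypothetical; whether SamFileHeader is implemented this way is an assumption, not something this page states.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Stand-in class (not SamFileHeader) showing three copy entry points
// that all share one implementation.
class HeaderSketch
{
public:
    HeaderSketch() {}
    HeaderSketch(const HeaderSketch& other) { copy(other); }
    HeaderSketch& operator=(const HeaderSketch& other)
    {
        copy(other);
        return *this;
    }

    // Copy the other header's lines into this one; safe for self-copy.
    bool copy(const HeaderSketch& other)
    {
        if(this != &other) myLines = other.myLines;
        return true;
    }

    void addLine(const std::string& line) { myLines.push_back(line); }
    std::size_t size() const { return myLines.size(); }

private:
    std::vector<std::string> myLines;
};

// Usage: all three paths end with the same contents.
void exampleCopy()
{
    HeaderSketch original;
    original.addLine("@HD\tVN:1.0");

    HeaderSketch viaConstructor(original);
    HeaderSketch viaAssignment;
    viaAssignment = original;
    HeaderSketch viaMethod;
    bool copied = viaMethod.copy(original);

    assert(copied);
    assert(viaConstructor.size() == 1);
    assert(viaAssignment.size() == 1);
    assert(viaMethod.size() == 1);
}
```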

Adding an entire Header Line

The methods found in the SamFileHeader class for setting an entire header record are:

Method Name Description
bool SamFileHeader::addHeaderLine(const char* type, const char* tag, const char* value) Adds a header line of the specified type containing the specified tag and value.

Returns true if successfully added, false if not.

NOTE: Currently, this method handles only one tag per header line. If a line of the given type needs multiple tags, the whole line must be added at once using the method below.

bool SamFileHeader::addHeaderLine(const char* headerLine) Adds the already formatted headerLine to the header. The line is assumed not to contain a "\n".

Returns true if successfully added, false if not.
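When a line needs several tags, it has to be formatted before it is passed to the single-argument addHeaderLine. A minimal sketch of such formatting follows; `buildHeaderLine` and `TagList` are hypothetical helpers, not part of SamFileHeader, and the sketch assumes the standard tab-delimited SAM layout.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Ordered list of tag/value pairs for one header line.
typedef std::vector<std::pair<std::string, std::string> > TagList;

// Hypothetical helper: join a type and its tag/value pairs into one
// tab-delimited line, with no trailing "\n".
std::string buildHeaderLine(const std::string& type, const TagList& tags)
{
    std::string line = "@" + type;
    for(TagList::const_iterator it = tags.begin(); it != tags.end(); ++it)
    {
        line += "\t" + it->first + ":" + it->second;
    }
    return line;
}

// Usage: build an SQ line with two tags.
void exampleBuildHeaderLine()
{
    TagList tags;
    tags.push_back(std::make_pair(std::string("SN"), std::string("chr1")));
    tags.push_back(std::make_pair(std::string("LN"), std::string("1000")));
    std::string line = buildHeaderLine("SQ", tags);
    assert(line == "@SQ\tSN:chr1\tLN:1000");
    // The result could then be handed to addHeaderLine(line.c_str()).
}
```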

Set/Add/Remove a Single Tag

The passed in tag should be the two character SAM tag as defined in the SAM spec.

A tag is removed from the header record by setting it to "". For the keyed header types, the key tag (SN for SQ, ID for RG and PG) may not be modified or removed once set. These values are used as the lookup key for the header record, so changing a key requires removing the entire record.

Method Name Description
bool SamFileHeader::setHDTag(const char* tag, const char* value) Set the specified tag to the specified value in the HD header. A tag is removed by passing in value = "".

Returns true if the tag was successfully set.

bool SamFileHeader::setPGTag(const char* tag, const char* value, const char* id) Set the specified tag to the specified value in the PG header with the specified id. If the header does not yet exist, the header is added and so is the ID tag with the value set to the passed in id. A tag is removed by passing in value = "". The ID tag may not be modified or removed after it is set.

Returns true if the tag was successfully set.

bool SamFileHeader::setSQTag(const char* tag, const char* value, const char* name) Set the specified tag to the specified value in the SQ header with the specified name. If the header does not yet exist, the header is added and so is the SN tag with the value set to the passed in name. A tag is removed by passing in value = "". The SN tag may not be modified or removed after it is set.

Returns true if the tag was successfully set.

bool SamFileHeader::setRGTag(const char* tag, const char* value, const char* id) Set the specified tag to the specified value in the RG header with the specified id. If the header does not yet exist, the header is added and so is the ID tag with the value set to the passed in id. A tag is removed by passing in value = "". The ID tag may not be modified or removed after it is set.

Returns true if the tag was successfully set.
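The set/remove rules above can be modelled with a small stand-in class. `KeyedRecord` is hypothetical and is not the library's record class; in particular, returning false when the key tag is targeted is an assumption made for this sketch.

```cpp
#include <cassert>
#include <map>
#include <string>

// Stand-in for a keyed record (SQ, RG, or PG); not the library's class.
class KeyedRecord
{
public:
    KeyedRecord(const std::string& keyTag, const std::string& keyValue)
        : myKeyTag(keyTag)
    {
        myTags[keyTag] = keyValue;
    }

    // Set a tag; an empty value removes it. The key tag is read-only
    // (returning false for it is an assumption of this sketch).
    bool setTag(const std::string& tag, const std::string& value)
    {
        if(tag == myKeyTag) return false;
        if(value.empty()) myTags.erase(tag);
        else myTags[tag] = value;
        return true;
    }

    // Return the tag's value, or "" if the tag is not present.
    std::string getTag(const std::string& tag) const
    {
        std::map<std::string, std::string>::const_iterator it = myTags.find(tag);
        if(it == myTags.end()) return std::string();
        return it->second;
    }

private:
    std::string myKeyTag;
    std::map<std::string, std::string> myTags;
};

// Usage: set a tag, remove it with "", and try to change the key.
void exampleSetTag()
{
    KeyedRecord sq("SN", "chr1");
    bool setOk = sq.setTag("LN", "1000");
    assert(setOk && sq.getTag("LN") == "1000");

    bool removeOk = sq.setTag("LN", "");        // "" removes the tag
    assert(removeOk && sq.getTag("LN") == "");

    bool keyChanged = sq.setTag("SN", "chr2");  // key tag is protected
    assert(!keyChanged && sq.getTag("SN") == "chr1");
}
```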

Add an Already Setup SamHeaderRecord

NOTE: These methods add a pointer to the passed in record. The header record will be deleted when it is cleaned up from this header.

NOTE: Do NOT delete the passed in record, the SamFileHeader class takes care of that itself.

Method Name Description
bool SamFileHeader::addHD(SamHeaderHD* hd) Add the HD record to the header.

Returns true if the record was successfully added.

bool SamFileHeader::addPG(SamHeaderPG* pg) Add the PG record to the header.

Returns true if the record was successfully added.

bool SamFileHeader::addSQ(SamHeaderSQ* sq) Add the SQ record to the header.

Returns true if the record was successfully added.

bool SamFileHeader::addRG(SamHeaderRG* rg) Add the RG record to the header.

Returns true if the record was successfully added.
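The ownership rule in the notes above (the header deletes records that were added to it) can be illustrated with stand-in types. `Record` and `OwningHeader` are hypothetical and only mimic the documented behaviour; they are not the library's classes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in record that counts live instances so ownership can be observed.
struct Record
{
    static int ourLiveCount;
    Record() { ++ourLiveCount; }
    ~Record() { --ourLiveCount; }
};
int Record::ourLiveCount = 0;

// Stand-in header that takes ownership of every record added to it.
class OwningHeader
{
public:
    OwningHeader() {}
    ~OwningHeader()
    {
        for(std::size_t i = 0; i < myRecords.size(); ++i) delete myRecords[i];
    }

    bool add(Record* record)
    {
        if(record == NULL) return false;
        myRecords.push_back(record);
        return true;
    }

private:
    OwningHeader(const OwningHeader&);             // copying disabled in this sketch
    OwningHeader& operator=(const OwningHeader&);
    std::vector<Record*> myRecords;
};

// Usage: allocate on the heap, hand over the pointer, and never delete it.
void exampleOwnership()
{
    {
        OwningHeader header;
        Record* rg = new Record();
        header.add(rg);                  // header now owns rg
        assert(Record::ourLiveCount == 1);
        // No "delete rg;" here: that would lead to a double delete.
    }
    assert(Record::ourLiveCount == 0);   // freed by the header's destructor
}
```

Because the header deletes the record, the record has to be heap-allocated; passing the address of a stack object would be an error under this ownership model.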

Remove an Entire Header Record

Method Name Description
bool SamFileHeader::removeHD() Remove the HD record.

Returns true if successfully removed or if it didn't exist in the first place. Returns false if the record still exists.

bool SamFileHeader::removePG(const char* id) Remove the PG record associated with the specified id.

Returns true if successfully removed or if it didn't exist in the first place. Returns false if the record still exists.

bool SamFileHeader::removeSQ(const char* name) Remove the SQ record associated with the specified name.

Returns true if successfully removed or if it didn't exist in the first place. Returns false if the record still exists.

bool SamFileHeader::removeRG(const char* id) Remove the RG record associated with the specified id.

Returns true if successfully removed or if it didn't exist in the first place. Returns false if the record still exists.
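The return convention of the remove methods (true when the record is gone afterwards, whether or not it existed) is what makes repeated removal harmless. A minimal sketch with a stand-in container follows; `removeRecord` and `TagMap` are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <string>

typedef std::map<std::string, std::string> TagMap;

// Stand-in for one keyed section of a header (for example the RG records,
// keyed by ID). Returns true when no record with that key remains.
bool removeRecord(std::map<std::string, TagMap>& records, const std::string& key)
{
    records.erase(key);                       // no-op if the key is absent
    return records.find(key) == records.end();
}

// Usage: removing the same record twice succeeds both times.
void exampleRemove()
{
    std::map<std::string, TagMap> readGroups;
    readGroups["rg1"]["SM"] = "sample1";

    bool first = removeRecord(readGroups, "rg1");
    bool second = removeRecord(readGroups, "rg1");
    assert(first && second);
    assert(readGroups.empty());
}
```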

Add a Comment

Method Name Description
bool SamFileHeader::addComment(const char* comment) Add the specified comment to the header. The comment should NOT include the "@CO" or "\n".

Returns true if successfully added.


Getting fields from the Header

The following sets of methods are accessors to pull information from the SAM/BAM Header.

Get the Entire Header

Get the entire header as a single string.

Method Name Description
bool SamFileHeader::getHeaderString(std::string& header) Sets the passed in string to the entire header string, clearing its current contents.

Returns true if successfully set (even if set to "").

Get the Header Record/Line by Record/Line

The following methods are for iterating through the header.

Method Name Description
SamHeaderRecord* SamFileHeader::getNextHeaderRecord() Returns the next header record. After all headers have been retrieved, NULL is returned until a reset is called.

Does not return the Comment lines.

NOTE: both getNextHeaderRecord and getNextHeaderLine increment the same iterator.

const char* SamFileHeader::getNextHeaderLine() Returns the string version of the next header record. After all headers have been retrieved, "" is returned until a reset is called.

Unlike getNextHeaderRecord, this method returns the comment lines.

NOTE: both getNextHeaderRecord and getNextHeaderLine increment the same iterator.

void SamFileHeader::resetHeaderRecordIter() Resets to the beginning of the header records so the next call to getNextHeaderRecord/getNextHeaderLine returns the first header line.
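The iterate-until-sentinel pattern used by getNextHeaderLine and resetHeaderRecordIter can be sketched with a stand-in iterator. `LineIterator` and `collectAll` are hypothetical; they only mimic the documented contract ("" after the last line, until a reset).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Stand-in for the header's internal line iterator; not the library's class.
class LineIterator
{
public:
    explicit LineIterator(const std::vector<std::string>& lines)
        : myLines(lines), myIndex(0) {}

    // Returns the next line, or "" once every line has been returned.
    const char* getNext()
    {
        if(myIndex >= myLines.size()) return "";
        return myLines[myIndex++].c_str();
    }

    void reset() { myIndex = 0; }

private:
    std::vector<std::string> myLines;
    std::size_t myIndex;
};

// Collect every line by resetting first, then looping until "" appears.
std::vector<std::string> collectAll(LineIterator& iter)
{
    std::vector<std::string> result;
    iter.reset();
    for(const char* line = iter.getNext(); line[0] != '\0'; line = iter.getNext())
    {
        result.push_back(line);
    }
    return result;
}

// Usage: a full pass, then the sentinel, then a reset.
void exampleIteration()
{
    std::vector<std::string> lines;
    lines.push_back("@HD\tVN:1.0");
    lines.push_back("@SQ\tSN:chr1\tLN:1000");
    LineIterator iter(lines);

    std::vector<std::string> all = collectAll(iter);
    assert(all.size() == 2);
    assert(std::string(iter.getNext()) == "");   // exhausted until reset
    iter.reset();
    assert(std::string(iter.getNext()) == lines[0]);
}
```

Resetting before the loop matters when one iterator is shared between two accessors, since an earlier call to either accessor may already have advanced it.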

Get a Specific Tag

These methods return the value associated with the specified tag. If the tag does not exist in the record "" is returned.

For SQ and RG the value returned is for the tag associated with the specified key (name/id). If a record with that key does not exist or if the tag does not exist for the record with that key, "" is returned.

Method Name Description
const char* SamFileHeader::getHDTagValue(const char* tag) Returns the value associated with the specified HD tag. Returns "" if the tag does not exist in the header.
const char* SamFileHeader::getPGTagValue(const char* tag, const char* id) Returns the value associated with the specified tag on the PG line with the specified id. Returns "" if the tag does not exist in the specified line or if the specified line does not exist.
const char* SamFileHeader::getSQTagValue(const char* tag, const char* name) Returns the value associated with the specified tag on the SQ line with the specified sequence name. Returns "" if the tag does not exist in the specified line or if the specified line does not exist.
const char* SamFileHeader::getRGTagValue(const char* tag, const char* id) Returns the value associated with the specified tag on the RG line with the specified id. Returns "" if the tag does not exist in the specified line or if the specified line does not exist.

Get a Specific Record

These methods return a pointer to the specific record that was requested. They return NULL if that record does not exist in the header.

The returned record can be modified to add or remove tags. Since the pointer refers to the record held by the header, the SamFileHeader automatically reflects these changes.

Method Name Description
SamHeaderHD* SamFileHeader::getHD() Get the HD object. Returns NULL if there is no HD record.
SamHeaderPG* SamFileHeader::getPG(const char* id) Get the PG object with the specified id. Returns NULL if there is no PG record with that key.
SamHeaderSQ* SamFileHeader::getSQ(const char* name) Get the SQ object with the specified sequence name. Returns NULL if there is no SQ object with that key.
SamHeaderRG* SamFileHeader::getRG(const char* id) Get the RG object with the specified id. Returns NULL if there is no RG object with that key.

Get Just Comments

These methods just access the header comments.

Method Name Description
const char* SamFileHeader::getNextComment() Returns the text of the next comment line. Does not include the "@CO" or the "\n".

Returns "" if all comment lines have been returned, until resetCommentIter is called.

void SamFileHeader::resetCommentIter() Resets to the beginning of the comments so getNextComment returns the first comment.

Additional Accessors

Method Name Description
const char* SamFileHeader::getTagSO() Return the value of the SO tag in the HD record. If the field does not exist, "unsorted" is returned.


Additional Proposed Accessors

  • HD
    • getVersion - returns the VN field (will only be one)
  • SQ
    • getRefSequenceCount - count of the number of SQ entries in the header
    • getRefSequenceName - gets the next reference sequence name.
    • getRefSequenceLength - gets the length associated with the specified reference sequence.
  • RG
    • getSampleID - for a specified Read Group....???? but SampleID is the key...maybe passing in a record?
    • getReadGroup - pass in record, return a read group structure?
    • getLibrary - for a given read group
    • getSample - for a given read group
    • getTechnology - for a given read group
    • getPlatformUnit - for a given read group

NOTE: More get accessors will be added. Let me know if you need a specific one, and I can add that first.