C++ Class: SamFileHeader

From Genome Analysis Wiki
Revision as of 18:41, 20 April 2010 by Mktrost

SamFileHeader

This class allows a user to get and set the fields of a SAM/BAM header.

This class is part of libbam.

SAM Header Basics

The SamFileHeader consists of multiple SamHeaderRecords.

There are 4 types of SAM Header Records:

  1. HD - Header
  2. SQ - Sequence Dictionary
  3. RG - Read Group
  4. PG - Program

A SAM header record consists of tag/value pairs. Each tag appears at most once within a given record.

A SAM header can have 0 or 1 HD records, 0 or 1 PG records, 0 or more SQ records, and 0 or more RG records. SQ records are keyed by the SN (Sequence Name) tag, which must be unique across SQ records. RG records are keyed by the ID (Unique Read Group Identifier) tag, which must be unique across RG records.

The SamFileHeader also contains comments (type CO). Comments are not part of the SamHeaderRecord class because they do not contain tag/value pairs.
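To make these rules concrete, here is a minimal C++17 sketch of how the header's contents could be modeled in memory. It is an illustration under assumptions, not libbam's implementation: `HeaderRecord`, `HeaderModel`, and their members are hypothetical names. A `std::map` per record enforces that each tag appears at most once, the SQ and RG maps are keyed by SN and ID so duplicate keys are rejected, and comments are kept as plain strings.

```cpp
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical in-memory model of the header layout (not libbam's code).
// A std::map per record enforces "each tag at most once per record".
using HeaderRecord = std::map<std::string, std::string>;

struct HeaderModel {
    std::optional<HeaderRecord> hd;          // 0 or 1 HD record
    std::optional<HeaderRecord> pg;          // 0 or 1 PG record
    std::map<std::string, HeaderRecord> sq;  // 0 or more SQ records, keyed by SN
    std::map<std::string, HeaderRecord> rg;  // 0 or more RG records, keyed by ID
    std::vector<std::string> comments;       // CO lines: free text, no tag/value pairs

    bool addSQ(const HeaderRecord& rec) { return addKeyed(sq, rec, "SN"); }
    bool addRG(const HeaderRecord& rec) { return addKeyed(rg, rec, "ID"); }

private:
    // Stores rec under the value of its key tag. Fails if the key tag is
    // missing or empty, or if another record already uses the same key.
    static bool addKeyed(std::map<std::string, HeaderRecord>& records,
                         const HeaderRecord& rec, const std::string& keyTag) {
        auto it = rec.find(keyTag);
        if (it == rec.end() || it->second.empty()) return false;
        return records.emplace(it->second, rec).second;
    }
};
```

Keying the SQ and RG collections by their identifying tag is what makes the uniqueness rule cheap to enforce: a second record with the same SN or ID simply fails to insert.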

Setting fields in the Header

The SamFileHeader class contains accessors for setting the header lines of a SAM/BAM header. Once the header has been set up with these methods, its values can be read back with the get accessors, or the header can later be written to a SAM/BAM file.

Adding an entire Header Line

The methods found in the SamFileHeader class for setting an entire header record are:

Method Name Description
bool SamFileHeader::addHeaderLine(const char* type, const char* tag, const char* value) Adds a header line of the specified type containing the specified tag and value.

Returns true if successfully added, false if not.

NOTE: Currently, this method adds only one tag per line. If a header line needs multiple tags, the whole line must be added at once using addHeaderLine(const char* headerLine).

bool SamFileHeader::addHeaderLine(const char* headerLine) Adds the already set-up, formatted headerLine to the header. The line is assumed not to contain a newline ("\n").

Returns true if successfully added, false if not.
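A SAM header line starts with `@` and a two-letter record type, followed by tab-separated TAG:VALUE fields. The sketch below shows that layout with two hypothetical helpers: `formatHeaderLine`, roughly what the one-tag `addHeaderLine` overload has to produce, and `parseHeaderLine`, roughly what accepting a preformatted line involves. Neither is part of libbam, and the sketch does not handle CO comment lines, whose text is free-form.

```cpp
#include <map>
#include <sstream>
#include <string>

// Hypothetical helpers (not libbam API) for the SAM header line layout:
//   @<TYPE>\t<TAG>:<VALUE>\t<TAG>:<VALUE>...

// Builds a one-tag header line, e.g. ("SQ", "SN", "chr1") -> "@SQ\tSN:chr1".
std::string formatHeaderLine(const std::string& type, const std::string& tag,
                             const std::string& value) {
    return "@" + type + "\t" + tag + ":" + value;
}

// Parses a preformatted line (no trailing '\n') into its type and tag/value
// pairs. Returns false if the line is malformed or repeats a tag.
bool parseHeaderLine(const std::string& line, std::string& type,
                     std::map<std::string, std::string>& tags) {
    if (line.size() < 3 || line[0] != '@') return false;
    std::istringstream fields(line.substr(1));
    std::string field;
    // The first tab-separated field is the two-letter record type.
    if (!std::getline(fields, type, '\t') || type.size() != 2) return false;
    tags.clear();
    while (std::getline(fields, field, '\t')) {
        std::string::size_type colon = field.find(':');
        if (colon == std::string::npos) return false;
        // A tag may appear only once within a record.
        if (!tags.emplace(field.substr(0, colon), field.substr(colon + 1)).second)
            return false;
    }
    return true;
}
```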

Set/Add/Remove a Single Tag

NOTE: A tag is removed by setting it to "".

NOTE: For the SQ header type, the SN tag may not be modified once it is set, and for the RG header type, the ID tag may not be modified once it is set, because these values are used as lookup keys for the header records.

Method Name Description
bool SamFileHeader::setHDTag(const char* tag, const char* value) Set the specified tag to the specified value in the HD header. A tag is removed by passing in value = "".

Returns true if the tag was successfully set.

bool SamFileHeader::setPGTag(const char* tag, const char* value) Set the specified tag to the specified value in the PG header. A tag is removed by passing in value = "".

Returns true if the tag was successfully set.
bool SamFileHeader::setSQTag(const char* tag, const char* value, const char* name) Set the specified tag to the specified value in the SQ header with the specified name. If the header does not yet exist, the header is added and so is the SN tag with the value set to the passed in name. A tag is removed by passing in value = "". The SN tag may not be modified or removed after it is set.

Returns true if the tag was successfully set.

bool SamFileHeader::setRGTag(const char* tag, const char* value, const char* id) Set the specified tag to the specified value in the RG header with the specified id. If the header does not yet exist, the header is added and so is the ID tag with the value set to the passed in id. A tag is removed by passing in value = "". The ID tag may not be modified or removed after it is set.

Returns true if the tag was successfully set.
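The rules for setSQTag (and, with ID in place of SN, setRGTag) can be sketched as follows. `setSQTagSketch` is a hypothetical stand-in working on a plain map from sequence name to tag/value pairs; it is not the library's code. One behavior is an assumption: any request that targets SN itself is rejected outright, since the documentation only says SN may not be modified or removed.

```cpp
#include <map>
#include <string>

// Hypothetical sketch (not libbam's code) of the setSQTag rules,
// using a map of SN -> (tag -> value).
using Record = std::map<std::string, std::string>;
using SQRecords = std::map<std::string, Record>;

bool setSQTagSketch(SQRecords& sq, const std::string& tag,
                    const std::string& value, const std::string& name) {
    // Assumption: any request that targets SN is rejected, because SN is
    // the lookup key and may not be modified or removed.
    if (tag == "SN" || name.empty()) return false;

    // Create the record, with SN set to the passed-in name, if needed.
    Record& rec = sq[name];
    rec["SN"] = name;

    if (value.empty()) {
        rec.erase(tag);    // an empty value removes the tag
    } else {
        rec[tag] = value;  // otherwise add or overwrite the tag
    }
    return true;
}
```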

Add an Already Setup SamHeaderRecord

NOTE: These methods store a pointer to the passed-in record rather than a copy. The record will be deleted when it is cleaned up from this header.

NOTE: Do NOT delete the passed-in record; the SamFileHeader class takes care of that itself.

Method Name Description
bool SamFileHeader::addHD(SamHeaderHD* hd) Add the HD record to the header.

Returns true if the record was successfully added.

bool SamFileHeader::addPG(SamHeaderPG* pg) Add the PG record to the header.

Returns true if the record was successfully added.

bool SamFileHeader::addSQ(SamHeaderSQ* sq) Add the SQ record to the header.

Returns true if the record was successfully added.

bool SamFileHeader::addRG(SamHeaderRG* rg) Add the RG record to the header.

Returns true if the record was successfully added.
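The ownership rule for these methods mirrors what `std::unique_ptr` expresses in modern C++: the container adopts a raw pointer and deletes it later. The sketch below uses hypothetical `FakeRecord` and `RecordOwner` types (not libbam classes) to show why the caller must not delete the record after handing it over; the instance counter exists only to make the cleanup observable.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a header record that counts live instances.
struct FakeRecord {
    inline static int live = 0;
    std::string type;
    explicit FakeRecord(std::string t) : type(std::move(t)) { ++live; }
    FakeRecord(const FakeRecord&) = delete;             // keep the count honest
    FakeRecord& operator=(const FakeRecord&) = delete;
    ~FakeRecord() { --live; }
};

// Hypothetical owner that, like the addXX methods, takes a raw pointer and
// becomes responsible for deleting it.
class RecordOwner {
public:
    bool add(FakeRecord* rec) {
        if (rec == nullptr) return false;
        // Wrap first so the record is freed even if push_back throws.
        records_.push_back(std::unique_ptr<FakeRecord>(rec));
        return true;
    }
    std::size_t size() const { return records_.size(); }

private:
    std::vector<std::unique_ptr<FakeRecord>> records_;  // deleted with the owner
};
```

With this pattern, calling `delete` on a pointer after passing it to `add` would free it twice, which is exactly the mistake the NOTE above warns against.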

Remove an Entire Header Record

Method Name Description
bool SamFileHeader::removeHD() Remove the HD record.

Returns true if successfully removed or if it didn't exist in the first place. Returns false if the record still exists.

bool SamFileHeader::removePG() Remove the PG record.

Returns true if successfully removed or if it didn't exist in the first place. Returns false if the record still exists.

bool SamFileHeader::removeSQ(const char* name) Remove the SQ record associated with the specified name.

Returns true if successfully removed or if it didn't exist in the first place. Returns false if the record still exists.

bool SamFileHeader::removeRG(const char* id) Remove the RG record associated with the specified id.

Returns true if successfully removed or if it didn't exist in the first place. Returns false if the record still exists.
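The return convention for the remove methods treats success as "the record is absent afterwards", so removing a record that never existed still returns true. The hypothetical helper `removeKeyedRecord`, operating on a map keyed by SN or ID, illustrates the idea; it is a sketch, not the library's implementation.

```cpp
#include <map>
#include <string>

using Record = std::map<std::string, std::string>;

// Hypothetical sketch of the removeSQ/removeRG return convention:
// success means the record is not in the header afterwards.
bool removeKeyedRecord(std::map<std::string, Record>& records,
                       const std::string& key) {
    records.erase(key);  // erasing a missing key is a no-op
    return records.find(key) == records.end();
}
```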


Getting fields from the Header

Method Name Description
bool SamFileHeader::getHeaderString(std::string& header) Set the passed-in string to the entire header, replacing its current contents.

Returns true if successfully set (even if set to "").

const char* SamFileHeader::getTagSO() Return the value of the SO tag. If the field does not exist, "unsorted" is returned.
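The two getters can be sketched with hypothetical helpers: `buildHeaderString` replaces the caller's string with the header lines, and `tagSO` falls back to "unsorted" when the HD record has no SO tag. As assumptions, the sketch ends every line with '\n' and returns `std::string` rather than the `const char*` that getTagSO returns; neither helper is part of libbam.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical sketch: overwrite the caller's string with the header text.
// Assumption: each header line is terminated by '\n'.
bool buildHeaderString(const std::vector<std::string>& lines, std::string& header) {
    header.clear();  // discard whatever the string held before
    for (const std::string& line : lines) {
        header += line;
        header += '\n';
    }
    return true;  // succeeds even when there are no lines (header == "")
}

// Hypothetical sketch: the SO value from the HD tags, or "unsorted" if absent.
std::string tagSO(const std::map<std::string, std::string>& hd) {
    auto it = hd.find("SO");
    return it == hd.end() ? "unsorted" : it->second;
}
```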


Proposed Accessors

  • HD
    • getVersion - returns the VN tag value (there is only one)
  • SQ
    • getRefSequenceCount - count of the number of SQ entries in the header
    • getRefSequenceName - gets the next reference sequence name.
    • getRefSequenceLength - gets the length associated with the specified reference sequence.
  • RG
    • getSampleID - for a specified read group (undecided: SampleID is the key, so perhaps this should take a record instead?)
    • getReadGroup - pass in record, return a read group structure?
    • getLibrary - for a given read group
    • getSample - for a given read group
    • getTechnology - for a given read group
    • getPlatformUnit - for a given read group
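Since these accessors are only proposed, the following is a speculative sketch of how the SQ ones could behave, not a description of any existing API. `RefSequenceView` and `SQEntry` are hypothetical; getRefSequenceName is written as an iterator that returns the next name and then `nullptr`, and getRefSequenceLength returns -1 for an unknown name. Both of those conventions are assumptions.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical SQ entry: the SN and LN tag values of one SQ record.
struct SQEntry {
    std::string name;  // SN tag
    long length;       // LN tag
};

// Hypothetical view over SQ records in header order (not libbam API).
class RefSequenceView {
public:
    explicit RefSequenceView(std::vector<SQEntry> entries)
        : entries_(std::move(entries)) {}

    int getRefSequenceCount() const { return static_cast<int>(entries_.size()); }

    // Returns the next reference name, or nullptr once all have been returned.
    const char* getRefSequenceName() {
        if (next_ >= entries_.size()) return nullptr;
        return entries_[next_++].name.c_str();
    }

    // Returns the LN value for the named sequence, or -1 if it is unknown.
    long getRefSequenceLength(const std::string& name) const {
        for (const SQEntry& e : entries_)
            if (e.name == name) return e.length;
        return -1;
    }

private:
    std::vector<SQEntry> entries_;
    std::size_t next_ = 0;  // position of the next name to return
};
```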

NOTE: More get accessors will be coming. Let me know if you need a specific one, and I can add that first.