C++ Class: SamFileHeader

From Genome Analysis Wiki


SamFileHeader

This class allows a user to get/set the fields in a SAM/BAM Header.

This class is part of C++ Library: libStatGen.

Sam Header Basics

A SamFileHeader is composed of multiple SamHeaderRecords.

There are 4 types of SAM Header Records:

  1. HD - Header
  2. SQ - Sequence Dictionary
  3. RG - Read Group
  4. PG - Program

A SAM Header Record is composed of Tag/Value pairs. Each tag appears at most once within a given record.
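To make the Tag/Value structure concrete, here is a minimal, self-contained sketch that splits one tab-delimited header line into its record type and tags and rejects a repeated tag. The names HeaderRecord and parseHeaderLine are hypothetical and are not part of libStatGen; the class itself may parse and store records differently.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a header record; NOT the libStatGen SamHeaderRecord class.
struct HeaderRecord {
    std::string type;                         // "HD", "SQ", "RG", or "PG"
    std::map<std::string, std::string> tags;  // each tag appears at most once
};

// Parse one tab-delimited header line such as "@SQ\tSN:chr1\tLN:1000".
// Throws std::invalid_argument on a malformed line or a repeated tag.
HeaderRecord parseHeaderLine(const std::string& line) {
    std::istringstream fields(line);
    std::string field;
    std::getline(fields, field, '\t');        // first field is "@XX"
    if (field.size() != 3 || field[0] != '@') {
        throw std::invalid_argument("expected '@' followed by a two-letter type");
    }
    HeaderRecord record;
    record.type = field.substr(1);
    while (std::getline(fields, field, '\t')) {
        // Each remaining field is TAG:VALUE; the value may itself contain ':'.
        if (field.size() < 3 || field[2] != ':') {
            throw std::invalid_argument("expected TAG:VALUE, got: " + field);
        }
        const std::string tag = field.substr(0, 2);
        const std::string value = field.substr(3);
        if (!record.tags.emplace(tag, value).second) {
            throw std::invalid_argument("tag appears twice in one record: " + tag);
        }
    }
    return record;
}

// Small self-check of the sketch above.
void parseHeaderLineExample() {
    const HeaderRecord sq = parseHeaderLine("@SQ\tSN:chr1\tLN:1000");
    assert(sq.type == "SQ");
    assert(sq.tags.at("SN") == "chr1");
    assert(sq.tags.at("LN") == "1000");
}
```

Comment (CO) lines are deliberately not handled here, since they carry free text rather than Tag/Value pairs.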

A SAM Header can have 0 or 1 HD records and 0 or more SQ, RG, and PG records. The SQ records are keyed on the SN (Sequence Name) tag. The RG records are keyed on the ID (Unique Read Group Identifier) tag. The PG records are keyed on the ID tag. Each key must be unique among the records of its type within the file.
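The cardinality and key rules can be sketched with a small container model. This is an illustration only: HeaderModel and its methods are hypothetical names, not the SamFileHeader interface, and the sketch assumes that a record missing its key tag should be rejected.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical model of the header rules; names are illustrative, not libStatGen API.
using TagMap = std::map<std::string, std::string>;

class HeaderModel {
public:
    // At most one HD record: returns false if one is already present.
    bool setHD(const TagMap& tags) {
        if (hasHD_) return false;
        hd_ = tags;
        hasHD_ = true;
        return true;
    }

    // SQ records are keyed on SN; RG and PG records are keyed on ID.
    bool addSQ(const TagMap& tags) { return addKeyed(sq_, tags, "SN"); }
    bool addRG(const TagMap& tags) { return addKeyed(rg_, tags, "ID"); }
    bool addPG(const TagMap& tags) { return addKeyed(pg_, tags, "ID"); }

    std::size_t sqCount() const { return sq_.size(); }
    std::size_t rgCount() const { return rg_.size(); }
    std::size_t pgCount() const { return pg_.size(); }

private:
    // Returns false if the key tag is missing or the key is already used
    // by another record of the same type.
    static bool addKeyed(std::map<std::string, TagMap>& records,
                         const TagMap& tags, const std::string& keyTag) {
        const auto key = tags.find(keyTag);
        if (key == tags.end()) return false;
        return records.emplace(key->second, tags).second;
    }

    bool hasHD_ = false;
    TagMap hd_;
    std::map<std::string, TagMap> sq_, rg_, pg_;
};

// Small self-check: keys only need to be unique within one record type.
void headerModelExample() {
    HeaderModel header;
    assert(header.addRG({{"ID", "x"}, {"SM", "s1"}}));
    assert(header.addPG({{"ID", "x"}, {"PN", "aligner"}}));  // same ID, other type: allowed
    assert(!header.addRG({{"ID", "x"}, {"SM", "s2"}}));      // duplicate RG key: rejected
}
```

A std::map keeps records sorted by key rather than in file order, so a model that must preserve the original order would need an additional ordered container.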

The SamFileHeader also contains comments (type CO). They are not represented by the SamHeaderRecord class because they do not contain Tag/Value pairs.

See: http://csg.sph.umich.edu//mktrost/doxygen/current/classSamFileHeader.html for documentation.

Additional Proposed Accessors

  • HD
    • getVersion - returns the VN field (will only be one)
  • SQ
    • getRefSequenceCount - returns the number of SQ entries in the header
    • getRefSequenceName - gets the next reference sequence name
    • getRefSequenceLength - gets the length associated with the specified reference sequence
  • RG
    • getSampleID - for a specified read group (open question: SampleID is the key, so this may need to take a record instead)
    • getReadGroup - pass in a record and return a read group structure (open question)
    • getLibrary - for a given read group
    • getSample - for a given read group
    • getTechnology - for a given read group
    • getPlatformUnit - for a given read group
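As a rough illustration of how some of the proposed accessors might behave, the sketch below implements a few of them on a toy struct. The page lists accessor names only, so every signature, return type, and "not found" convention here is an assumption; ToyHeader is hypothetical and is not the SamFileHeader class. The RG tags used (SM, LB, PL, PU) are the standard SAM tags for sample, library, platform, and platform unit.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy header used only to illustrate the proposed accessor names.
// All signatures are guesses; the proposal gives names, not prototypes.
struct ToyHeader {
    std::string version;                                     // HD VN value
    std::vector<std::pair<std::string, long>> refSequences;  // SQ entries as (SN, LN)
    std::map<std::string, std::map<std::string, std::string>> readGroups;  // RG ID -> tags

    const std::string& getVersion() const { return version; }
    std::size_t getRefSequenceCount() const { return refSequences.size(); }

    // Returns -1 when the reference name is not in the sequence dictionary.
    long getRefSequenceLength(const std::string& name) const {
        for (const auto& sq : refSequences) {
            if (sq.first == name) return sq.second;
        }
        return -1;
    }

    // Each RG accessor returns "" when the read group or the tag is missing.
    std::string getSample(const std::string& id) const { return rgTag(id, "SM"); }
    std::string getLibrary(const std::string& id) const { return rgTag(id, "LB"); }
    std::string getTechnology(const std::string& id) const { return rgTag(id, "PL"); }
    std::string getPlatformUnit(const std::string& id) const { return rgTag(id, "PU"); }

private:
    std::string rgTag(const std::string& id, const std::string& tag) const {
        const auto rg = readGroups.find(id);
        if (rg == readGroups.end()) return "";
        const auto value = rg->second.find(tag);
        return value == rg->second.end() ? "" : value->second;
    }
};

// Small self-check of the sketch above.
void toyHeaderExample() {
    ToyHeader header;
    header.refSequences.push_back({"chr1", 1000L});
    header.readGroups["rg1"]["SM"] = "sampleA";
    assert(header.getRefSequenceCount() == 1);
    assert(header.getRefSequenceLength("chr1") == 1000L);
    assert(header.getSample("rg1") == "sampleA");
}
```

Looking up read group fields by ID is one possible answer to the open question above; passing in a record instead would avoid the second lookup.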

NOTE: More get accessors will be coming. Let me know if you need a specific one, and I can add it first.