Difference between revisions of "C++ Class: SamFileHeader"

From Genome Analysis Wiki
 
(17 intermediate revisions by 2 users not shown)
Line 1: Line 1:
 +
[[Category:C++]]
 +
[[Category:libStatGen]]
 +
[[Category:libStatGen BAM]]
 +
 
== SamFileHeader ==
 
 
This class allows a user to get/set the fields in a SAM/BAM Header.
 
  
=== Setting fields in the Header ===
+
This class is part of [[C++ Library: libStatGen]].
The '''SamFileHeader''' class contains accessors to set the header lines of a SAM/BAM header. By using these set methods to setup the header, they can be pulled back out using the get accessors or the header can be later written to a SAM/BAM file.
+
 
The methods found in the '''SamFileHeader''' class for setting fields are:
+
=== Sam Header Basics ===
{| style="margin: 1em 1em 1em 0; background-color: #f9f9f9; border: 1px #aaa solid; border-collapse: collapse;" border="1"
+
The SamFileHeader is comprised of multiple [http://csg.sph.umich.edu//mktrost/doxygen/current/classSamHeaderRecord.html SamHeaderRecords].
|-style="background: #f2f2f2; text-align: center;"  '''SamFileHeader Class Methods'''
+
 
! Method Name !! Description
+
There are 4 types of SAM Header Records:
|-
+
# HD - Header
| <code>bool SamFileHeader::addHeaderLine(const char* type, const char* tag, int value)</code>
+
# SQ - Sequence Dictionary
| Adds the type, tag, and integer value to the header.
+
# RG - Read Group
Returns true if successfully added, false if not.
+
# PG - Program
NOTE: currently, this method will only do one tag per type on a line.  If a type has multiple tags, then the whole line needs to be added at once.
+
 
|-
+
A SAM Header Record is comprised of Tag/Value pairs.  Each tag only appears once within a specific record.
| <code>bool SamFileHeader::addHeaderLine(const char* type, const char* tag, const char* value)</code>
 
| Adds the type, tag, and const char* value to the header.
 
Returns true if successfully added, false if not.
 
NOTE: currently, this method will only do one tag per type on a line.  If a type has multiple tags, then the whole line needs to be added at once.
 
|-
 
| <code>bool SamFileHeader::addHeaderLine(const char* headerLine)</code>
 
| Adds the already setup/formatted headerLine to the header.  It is assumed that the line does not contain a “\n”.
 
Returns true if successfully added, false if not.
 
|-
 
|}
 
  
 +
A SAM Header can have 0 or 1 HD records, 0 or more PG records, 0 or more SQ Records, and 0 or more RG records.  The PG records are keyed off of the ID tag.  The SQ records are keyed off of the SN, Sequence Name, tag.  The RG records are keyed off of the ID, Unique Read Group Identifier, tag.  The keys must be unique for that record type within the file.
  
=== Getting fields from the Header ===
+
The '''SamFileHeader''' also contains Comments, type CO.  They are not included as part of the '''SamHeaderRecord''' class since they do not contain Tag/Value pairs.
  
{| style="margin: 1em 1em 1em 0; background-color: #f9f9f9; border: 1px #aaa solid; border-collapse: collapse;" border="1"
+
See: http://csg.sph.umich.edu//mktrost/doxygen/current/classSamFileHeader.html for documentation.
|-style="background: #f2f2f2; text-align: center;"  '''SamFileHeader Class Methods'''
 
! Method Name !!  Description
 
|-
 
| <code>const char* SamFileHeader::getTagSO()</code>
 
| Return the value of the SO tag. If the field does not exist, "unsorted" is returned.
 
|}
 
  
 +
==== Additional Proposed Accessors ====
 +
* HD
 +
** getVersion - returns the VN field (will only be one)
 +
* SQ
 +
** getRefSequenceCount - count of the number of SQ entries in the header
 +
** getRefSequenceName - gets the next reference sequence name.
 +
** getRefSequenceLength - gets the length associated with the specified reference sequence.
 +
* RG
 +
** getSampleID - for a specified Read Group....???? but SampleID is the key...maybe passing in a record?
 +
** getReadGroup - pass in record, return a read group structure?
 +
** getLibrary - for a given read group
 +
** getSample - for a given read group
 +
** getTechnology - for a given read group
 +
** getPlatformUnit - for a given read group
 
'''NOTE: More Get Accessors will be coming.  Let me know if you need a specific one, and I can add that first'''
 

Latest revision as of 11:05, 2 February 2017


SamFileHeader

This class lets a user get and set the fields of a SAM/BAM header.

This class is part of C++ Library: libStatGen.

Sam Header Basics

A SamFileHeader is composed of multiple SamHeaderRecords.

There are four types of SAM header records:

  1. HD - Header
  2. SQ - Sequence Dictionary
  3. RG - Read Group
  4. PG - Program

A SAM header record consists of Tag/Value pairs. Each tag appears only once within a given record.
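
The "each tag appears once per record" rule can be illustrated with a small standalone sketch. The type below, HeaderRecordSketch, and its addTag/getTag members are hypothetical names chosen for illustration; this is not the libStatGen SamHeaderRecord class, whose actual interface is described in the Doxygen documentation.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for one SAM header record (assumption: not the
// libStatGen SamHeaderRecord class, only a model of the rule above).
struct HeaderRecordSketch {
    std::string type;                         // "HD", "SQ", "RG", or "PG"
    std::map<std::string, std::string> tags;  // map keys are unique, so a tag cannot repeat

    // Adds a Tag/Value pair. Returns false and leaves the record unchanged
    // if the tag is already present in this record.
    bool addTag(const std::string& tag, const std::string& value) {
        return tags.emplace(tag, value).second;
    }

    // Returns the value stored for a tag, or an empty string if the tag is absent.
    std::string getTag(const std::string& tag) const {
        auto it = tags.find(tag);
        return it == tags.end() ? std::string() : it->second;
    }
};
```

A std::map keeps the sketch short but sorts tags alphabetically; a real implementation that writes records back out would probably need to preserve the original tag order.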

A SAM header can have 0 or 1 HD records and 0 or more each of PG, SQ, and RG records. PG records are keyed on the ID tag, SQ records on the SN (Sequence Name) tag, and RG records on the ID (Unique Read Group Identifier) tag. Each key must be unique among the records of its type within the file.
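
The cardinality and key rules above can be modeled in a few lines. This is a minimal sketch under stated assumptions: HeaderSketch, addRecord, and count are hypothetical names, and the class only enforces the rules described here. It does not reproduce the libStatGen SamFileHeader interface.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

using TagMap = std::map<std::string, std::string>;

// Hypothetical container that enforces the record rules described above
// (assumption: an illustration only, not the libStatGen SamFileHeader class).
class HeaderSketch {
public:
    // Adds a record of the given type. Returns false if the type is unknown,
    // the key tag is missing, or the rule for that type would be violated.
    bool addRecord(const std::string& type, const TagMap& tags) {
        if (type == "HD") {
            if (hasHD_) return false;  // a header holds 0 or 1 HD records
            hasHD_ = true;
            hd_ = tags;
            return true;
        }
        const char* keyTag = keyTagFor(type);
        if (keyTag == nullptr) return false;   // not SQ, RG, or PG
        auto key = tags.find(keyTag);
        if (key == tags.end()) return false;   // the key tag must be present
        // emplace refuses a key that already exists for this record type.
        return keyed_[type].emplace(key->second, tags).second;
    }

    // Number of records currently stored for a type.
    std::size_t count(const std::string& type) const {
        if (type == "HD") return hasHD_ ? 1 : 0;
        auto it = keyed_.find(type);
        return it == keyed_.end() ? 0 : it->second.size();
    }

private:
    // SQ is keyed on SN; RG and PG are keyed on ID.
    static const char* keyTagFor(const std::string& type) {
        if (type == "SQ") return "SN";
        if (type == "RG" || type == "PG") return "ID";
        return nullptr;
    }

    bool hasHD_ = false;
    TagMap hd_;
    std::map<std::string, std::map<std::string, TagMap>> keyed_;  // type -> key -> tags
};
```

Keys are tracked per record type, so in this sketch an RG record and a PG record may share the same ID value, while two RG records may not.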

The SamFileHeader also contains comments (type CO). They are not represented by the SamHeaderRecord class because they do not contain Tag/Value pairs.
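
The difference between comment lines and the other record types shows up when parsing a header line. The following sketch assumes the usual SAM text layout of a line (an '@', a two-letter type, then tab-separated TAG:VALUE fields with two-character tags). ParsedHeaderLine and parseHeaderLine are hypothetical names, not part of libStatGen.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Hypothetical parse result (assumption: not a libStatGen type).
struct ParsedHeaderLine {
    std::string type;                         // two-letter type code without the leading '@'
    std::map<std::string, std::string> tags;  // filled for HD, SQ, RG, and PG lines
    std::string comment;                      // filled for CO lines only
};

// Parses one header line that has no trailing newline.
// Returns false on malformed input; 'out' may be partially filled in that case.
bool parseHeaderLine(const std::string& line, ParsedHeaderLine& out) {
    if (line.size() < 3 || line[0] != '@') return false;
    out = ParsedHeaderLine();
    out.type = line.substr(1, 2);
    if (line.size() > 3 && line[3] != '\t') return false;
    const std::string rest = line.size() > 4 ? line.substr(4) : std::string();

    if (out.type == "CO") {
        out.comment = rest;  // free text: tabs and colons carry no meaning here
        return true;
    }

    std::istringstream fields(rest);
    std::string field;
    while (std::getline(fields, field, '\t')) {
        // Each field is TAG:VALUE with a two-character tag and a non-empty value.
        if (field.size() < 4 || field[2] != ':') return false;
        // A tag may appear only once within a record.
        if (!out.tags.emplace(field.substr(0, 2), field.substr(3)).second) return false;
    }
    return true;
}
```

Because a CO line is kept as a single string, text such as "made by: me" is stored verbatim instead of being mistaken for a Tag/Value pair.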

See http://csg.sph.umich.edu//mktrost/doxygen/current/classSamFileHeader.html for the API documentation.

Additional Proposed Accessors

  • HD
    • getVersion - returns the VN field (there is only one, since a header has at most one HD record)
  • SQ
    • getRefSequenceCount - count of the number of SQ entries in the header
    • getRefSequenceName - gets the next reference sequence name.
    • getRefSequenceLength - gets the length associated with the specified reference sequence.
  • RG
    • getSampleID - for a specified read group (open question: SampleID is the key, so this may need to take a record instead)
    • getReadGroup - pass in record, return a read group structure?
    • getLibrary - for a given read group
    • getSample - for a given read group
    • getTechnology - for a given read group
    • getPlatformUnit - for a given read group
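
As a rough illustration of how the proposed SQ accessors might behave, here is a standalone sketch. The accessor names come from the list above, but the signatures, the return conventions (nullptr when the names are exhausted, -1 for an unknown sequence), and the helpers addSQ and resetRefSequenceIter are all assumptions, since the proposal does not specify them.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of the proposed SQ accessors (assumption: signatures and
// return conventions are invented for illustration, not taken from libStatGen).
class RefSequenceSketch {
public:
    // Helper for the sketch: records one SQ entry (SN name, LN length).
    void addSQ(const std::string& name, long length) {
        sq_.emplace_back(name, length);
    }

    // Count of the SQ entries in the header.
    std::size_t getRefSequenceCount() const { return sq_.size(); }

    // Returns the next reference sequence name in insertion order,
    // or nullptr once every name has been returned.
    const char* getRefSequenceName() {
        if (next_ >= sq_.size()) return nullptr;
        return sq_[next_++].first.c_str();
    }

    // Helper for the sketch: restarts the name iteration.
    void resetRefSequenceIter() { next_ = 0; }

    // Returns the length of the named reference sequence, or -1 if it is unknown.
    long getRefSequenceLength(const std::string& name) const {
        for (const auto& entry : sq_) {
            if (entry.first == name) return entry.second;
        }
        return -1;
    }

private:
    std::vector<std::pair<std::string, long>> sq_;  // (SN, LN) in insertion order
    std::size_t next_ = 0;                          // position of the next name to return
};
```

The pointer returned by getRefSequenceName in this sketch stays valid only until another SQ entry is added, which is one reason a real accessor might return a copy or a record reference instead.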

NOTE: More get accessors are coming. Let me know if you need a specific one, and I can add that first.