C++ Class: CigarRoller

From Genome Analysis Wiki
Revision as of 18:05, 23 August 2011 by Mktrost (talk | contribs)


Cigar

This class is part of C++ Library: libStatGen.

This class provides utilities for processing CIGARs. Its methods are read-only: they do not modify the object, other than for internal lazy evaluation.

See: http://www.sph.umich.edu/csg/mktrost/doxygen/current/classCigar.html for documentation.

The static methods help determine information about a given CIGAR operator.

See Mapping Between Reference and Read/Query for a more detailed explanation, with examples, of how the mapping between the reference and the read/query works.

See Determining the Number of Reference and Read/Query Overlaps for a more detailed explanation, with examples, of how overlaps are determined.


CigarRoller

This class is part of libcsg.

This class provides accessors for setting, updating, and modifying the CIGAR object. It is a child class of Cigar.

Public Methods

Method Name | Description
CigarRoller::CigarRoller() | Default constructor; initializes the object as a CIGAR with no operations.
CigarRoller::CigarRoller(const char *cigarString) | Constructor that initializes the object with the specified cigarString.
CigarRoller & CigarRoller::operator += (CigarRoller &rhs) | Appends the contents of the specified CigarRoller to this object.
CigarRoller & CigarRoller::operator += (const CigarOperator &rhs) | Appends the specified cigar operation to this object.
CigarRoller & CigarRoller::operator = (CigarRoller &rhs) | Sets this object to be equal to the specified CigarRoller.
void CigarRoller::Add(Operation operation, int count) | Adds the specified operation with the specified count to this object.
void CigarRoller::Add(const char *cigarString) | Adds the specified cigarString to this object.
void CigarRoller::Add(CigarRoller &rhs) | Adds the specified CIGAR to this object.
void CigarRoller::Set(const char *cigarString) | Sets this object to the specified cigarString.
void CigarRoller::Set(const uint32_t* cigarBuffer, uint16_t bufferLen) | Sets this object to the BAM-formatted cigar found at the beginning of the specified buffer, which is bufferLen long.
int CigarRoller::getMatchPositionOffset() | DEPRECATED. DO NOT USE.
const char * CigarRoller::getString() | Returns the string representation of this CIGAR object.
void CigarRoller::clear() | Clears this object so that it has 0 cigar operations.
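
To make the BAM-formatted input accepted by Set(const uint32_t*, uint16_t) more concrete, the following is a minimal sketch of how a BAM-packed CIGAR can be turned into its text form. In the SAM/BAM specification each operation is one 32-bit value holding the length in the upper 28 bits and an operation code (0 to 8, standing for MIDNSHP=X) in the lower 4 bits. The function name decodeBamCigar is hypothetical and is not part of the library, and the sketch assumes the length argument counts 32-bit entries rather than bytes.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of the library): converts a BAM-packed CIGAR
// into its text form. Each 32-bit entry holds (length << 4) | opCode, and the
// op codes 0..8 stand for "MIDNSHP=X" as defined by the SAM/BAM specification.
// Assumption: numOps counts 32-bit entries, not bytes.
std::string decodeBamCigar(const uint32_t* cigarBuffer, uint16_t numOps)
{
    static const char opChars[] = "MIDNSHP=X";
    std::string text;
    for (uint16_t i = 0; i < numOps; ++i)
    {
        uint32_t opCode = cigarBuffer[i] & 0xF;  // low 4 bits: operation
        uint32_t length = cigarBuffer[i] >> 4;   // high 28 bits: count
        if (opCode > 8)
        {
            throw std::invalid_argument("unknown CIGAR op code");
        }
        text += std::to_string(length);
        text += opChars[opCode];
    }
    return text;
}
```

For instance, a buffer holding (4 << 4) | 0 followed by (10 << 4) | 3 decodes to the string 4M10N.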

Overloaded Streaming Operators

Method Name | Description
std::ostream &operator << (std::ostream &stream, const CigarRoller& roller) | Writes all of the cigar operations contained in this roller to the passed in stream.


Mapping Between Reference and Read/Query

int32_t Cigar::getRefOffset(int32_t queryIndex) and int32_t Cigar::getQueryIndex(int32_t refOffset) are used to map between the reference and the read.

The queryIndex is the index in the read - from 0 to (read length - 1). The refOffset is the offset into the reference from the starting position of the read.

For example:

Reference: ACTGAACCTTGGAAACTGCCGGGGACT
Read: ACTGACTGAAACCATT
CIGAR: 4M10N4M3I2M4D3M
POS: 5

This means it aligns:

Reference: ACTGAACCTTGGAAACTG   CCGGGGACT
Read:      ACTG          ACTGAAACC    ATT
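
The gapped layout above can be reproduced mechanically from the CIGAR. The following is a minimal, self-contained sketch and not the library's code: parseCigar and alignRows are hypothetical names, the reference string is assumed to begin at the read's alignment position, and only the M, =, X, I, D and N operations are handled.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical parser: splits a string such as "4M10N" into (operation, count) pairs.
std::vector<std::pair<char, int>> parseCigar(const std::string& cigar)
{
    std::vector<std::pair<char, int>> ops;
    int count = 0;
    for (char c : cigar)
    {
        if (std::isdigit(static_cast<unsigned char>(c)))
        {
            count = count * 10 + (c - '0');
        }
        else
        {
            ops.emplace_back(c, count);
            count = 0;
        }
    }
    return ops;
}

// Returns the gapped reference row (first) and read row (second).
// Assumptions: 'reference' starts at the read's alignment position and both
// strings are long enough for the CIGAR; other operations are ignored.
std::pair<std::string, std::string> alignRows(const std::string& reference,
                                              const std::string& read,
                                              const std::string& cigar)
{
    std::string refRow, readRow;
    std::size_t refPos = 0, readPos = 0;
    for (const auto& op : parseCigar(cigar))
    {
        std::size_t n = static_cast<std::size_t>(op.second);
        switch (op.first)
        {
            case 'M': case '=': case 'X':  // present in reference and read
                refRow += reference.substr(refPos, n);
                readRow += read.substr(readPos, n);
                refPos += n;
                readPos += n;
                break;
            case 'I':                      // present in the read only
                refRow.append(n, ' ');
                readRow += read.substr(readPos, n);
                readPos += n;
                break;
            case 'D': case 'N':            // present in the reference only
                refRow += reference.substr(refPos, n);
                readRow.append(n, ' ');
                refPos += n;
                break;
            default:                       // not handled in this sketch
                break;
        }
    }
    return {refRow, readRow};
}
```

Calling alignRows with the reference, read and CIGAR from the example returns the two rows shown above: the reference row gets 3 spaces for the insertion, and the read row gets 10 and 4 spaces for the skipped and deleted reference bases.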

Adding the position:

RefPos:     5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22          23 24 25 26 27 28 29 30 31
Reference:  A  C  T  G  A  A  C  C  T  T  G  G  A  A  A  C  T  G           C  C  G  G  G  G  A  C  T
Read:       A  C  T  G                                A  C  T  G  A  A  A  C  C              A  T  T

Adding the offsets:

RefPos:     5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22          23 24 25 26 27 28 29 30 31
refOffset:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17          18 19 20 21 22 23 24 25 26
Reference:  A  C  T  G  A  A  C  C  T  T  G  G  A  A  A  C  T  G           C  C  G  G  G  G  A  C  T
Read:       A  C  T  G                                A  C  T  G  A  A  A  C  C              A  T  T
queryIndex: 0  1  2  3                                4  5  6  7  8  9 10 11 12             13 14 15

The results of a call to getRefOffset for each value passed in (where NA stands for INDEX_NA):

queryIndex: 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16(and any value over 16)
Return:     0  1  2  3 14 15 16 17 NA NA NA 18 19 24 25 26 NA

The results of a call to getQueryIndex for each value passed in (where NA stands for INDEX_NA):

refOffset:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27(and any value over 27)
Return:     0  1  2  3 NA NA NA NA NA NA NA NA NA NA  4  5  6  7 11 12 NA NA NA NA 13 14 15 NA

The results of a call to getRefPosition passing in start position 5 (where NA stands for INDEX_NA):

queryIndex: 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16(and any value over 16)
Return:     5  6  7  8 19 20 21 22 NA NA NA 23 24 29 30 31 NA

The results of a call to getQueryIndex using refPosition and start position 5 (where NA stands for INDEX_NA):

refPosition:5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32(and any value over 32)
Return:     0  1  2  3 NA NA NA NA NA NA NA NA NA NA  4  5  6  7 11 12 NA NA NA NA 13 14 15 NA
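
The tables above can be derived by walking the CIGAR once and recording, for each query index and each reference offset, the matching position on the other side. The following is a minimal sketch of that idea, not the library's implementation. All names in it (CigarMap, buildCigarMap, refOffsetFor, queryIndexFor, refPositionFor) are hypothetical, INDEX_NA is assumed to be -1 for the purposes of the sketch, and soft clips are assumed to behave like insertions.

```cpp
#include <cassert>
#include <cctype>
#include <cstdint>
#include <string>
#include <vector>

const int32_t INDEX_NA = -1;  // assumed sentinel value for this sketch

struct CigarMap
{
    std::vector<int32_t> queryToRef;  // indexed by queryIndex
    std::vector<int32_t> refToQuery;  // indexed by refOffset
};

// Walks the CIGAR string one base at a time and fills both lookup tables.
CigarMap buildCigarMap(const std::string& cigar)
{
    CigarMap map;
    int32_t count = 0;
    for (char c : cigar)
    {
        if (std::isdigit(static_cast<unsigned char>(c)))
        {
            count = count * 10 + (c - '0');
            continue;
        }
        for (int32_t i = 0; i < count; ++i)
        {
            int32_t q = static_cast<int32_t>(map.queryToRef.size());
            int32_t r = static_cast<int32_t>(map.refToQuery.size());
            if (c == 'M' || c == '=' || c == 'X')   // in both
            {
                map.queryToRef.push_back(r);
                map.refToQuery.push_back(q);
            }
            else if (c == 'I' || c == 'S')          // in the read only
            {
                map.queryToRef.push_back(INDEX_NA);
            }
            else if (c == 'D' || c == 'N')          // in the reference only
            {
                map.refToQuery.push_back(INDEX_NA);
            }
            // H and P appear in neither the read nor the reference.
        }
        count = 0;
    }
    return map;
}

int32_t refOffsetFor(const CigarMap& map, int32_t queryIndex)
{
    if (queryIndex < 0 || queryIndex >= static_cast<int32_t>(map.queryToRef.size()))
    {
        return INDEX_NA;
    }
    return map.queryToRef[queryIndex];
}

int32_t queryIndexFor(const CigarMap& map, int32_t refOffset)
{
    if (refOffset < 0 || refOffset >= static_cast<int32_t>(map.refToQuery.size()))
    {
        return INDEX_NA;
    }
    return map.refToQuery[refOffset];
}

// Reference position = start position of the read + reference offset.
int32_t refPositionFor(const CigarMap& map, int32_t queryIndex, int32_t startPos)
{
    int32_t offset = refOffsetFor(map, queryIndex);
    return (offset == INDEX_NA) ? INDEX_NA : startPos + offset;
}
```

With the CIGAR 4M10N4M3I2M4D3M, refOffsetFor returns 14 for queryIndex 4 and INDEX_NA for queryIndex 8, and refPositionFor with start position 5 returns 19 for queryIndex 4, in line with the tables above.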


Determining the Number of Reference and Read/Query Overlaps

A useful concept is determining the number of bases that overlap between the reference and the read in a given region.

To do this, use getNumOverlaps, passing in the reference start and end positions for the region as well as the reference position where the read begins. start is inclusive, while end is exclusive. As the examples below show, passing -1 for start or end leaves that side of the region unbounded.

Using the above example:

RefPos:     5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22          23 24 25 26 27 28 29 30 31
refOffset:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17          18 19 20 21 22 23 24 25 26
Reference:  A  C  T  G  A  A  C  C  T  T  G  G  A  A  A  C  T  G           C  C  G  G  G  G  A  C  T
Read:       A  C  T  G                                A  C  T  G  A  A  A  C  C              A  T  T
queryIndex: 0  1  2  3                                4  5  6  7  8  9 10 11 12             13 14 15
getNumOverlaps(5,32,5) = 13 - [5, 32) covers the whole read - 13 cigar positions are "M" (found in both the reference and the read)
getNumOverlaps(5,31,5) = 12 - skips the last overlapping position
getNumOverlaps(0,100,5) = 13 - covers the whole read.
getNumOverlaps(-1, -1,5) = 13 - covers the whole read.
getNumOverlaps(-1,10,5) = 4
getNumOverlaps(10,-1,5) = 9
getNumOverlaps(9,19,5) = 0 - all skipped
getNumOverlaps(9,20,5) = 1
getNumOverlaps(9,6,5) = 0 - end is before start
getNumOverlaps(0,5,5) = 0 - outside of read
getNumOverlaps(32,40,5) = 0 - outside of read
getNumOverlaps(0,5,1) = 4 - with a different start position, this range overlaps the read with 4 bases
getNumOverlaps(32,40,32) = 4 - with a different start position, this range overlaps the read with 4 bases
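
The counts above follow from intersecting each match block of the CIGAR with the half-open region [start, end). The following is a minimal sketch of that calculation, not the library's implementation. The function name countOverlaps is hypothetical, and treating -1 as "unbounded" for start or end is an assumption inferred from the examples above.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstdint>
#include <string>

// Hypothetical stand-in for the overlap count: returns how many bases that are
// in both the read and the reference (M, =, X) fall inside [start, end).
// Assumption: start == -1 or end == -1 leaves that side of the region unbounded.
int32_t countOverlaps(const std::string& cigar, int32_t start, int32_t end,
                      int32_t readStartPos)
{
    int32_t overlaps = 0;
    int32_t refPos = readStartPos;  // reference position of the current block
    int32_t count = 0;
    for (char c : cigar)
    {
        if (std::isdigit(static_cast<unsigned char>(c)))
        {
            count = count * 10 + (c - '0');
            continue;
        }
        bool inBoth = (c == 'M' || c == '=' || c == 'X');
        bool inRef = inBoth || c == 'D' || c == 'N';
        if (inBoth)
        {
            // Intersect the block [refPos, refPos + count) with [start, end).
            int32_t lo = (start == -1) ? refPos : std::max(refPos, start);
            int32_t hi = (end == -1) ? refPos + count
                                     : std::min(refPos + count, end);
            if (hi > lo)
            {
                overlaps += hi - lo;
            }
        }
        if (inRef)
        {
            refPos += count;  // only reference-consuming operations advance
        }
        count = 0;
    }
    return overlaps;
}
```

With the CIGAR 4M10N4M3I2M4D3M, countOverlaps(cigar, 5, 32, 5) gives 13 and countOverlaps(cigar, 9, 20, 5) gives 1, the same values listed for getNumOverlaps above.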