Difference between revisions of "C++ Class: CigarRoller"

From Genome Analysis Wiki

Revision as of 11:51, 3 August 2010

CigarRoller

This class is part of libcsg.

The purpose of this class is to provide utilities for creating and processing CIGAR strings.

Public Methods

Method Name Description
CigarRoller::CigarRoller() Default constructor initializes as a CIGAR with no operations.
CigarRoller::CigarRoller(const char *cigarString) Constructor that initializes the object with the specified cigarString.
CigarRoller & CigarRoller::operator += (CigarRoller &rhs) Add the contents of the specified CigarRoller to this object.
CigarRoller & CigarRoller::operator += (CigarOperator &rhs) Append the specified cigar operation to this object.
void CigarRoller::Add(Operation operation, int count) Adds the specified operation with the specified count to this object.
void CigarRoller::Add(const char *cigarString) Adds the specified cigarString to this object.
void CigarRoller::Set(const char *cigarString) Sets this object to the specified cigarString.
DEPRECATED int CigarRoller::getMatchPositionOffset() DO NOT USE.
const char * CigarRoller::getString() Returns the string representation of this CIGAR object.
void CigarRoller::getExpandedString(std::string &s) Sets the specified string to a string of characters that represent this cigar with no digits (a CIGAR of "3M" would return "MMM")
void CigarRoller::clear() Clear this object so that it has 0 Cigar Operations.
CigarOperator & CigarRoller::operator [] (int i) Return the Cigar Operation at the specified index (starting at 0).
bool CigarRoller::operator == (CigarRoller &rhs) Returns true if two Cigar Rollers are the same (the same operations of the same sizes)
int CigarRoller::size() Return the number of cigar operations in this object.
void CigarRoller::Dump() Write this object as a string to cout.
int CigarRoller::getExpectedQueryBaseCount() Returns the expected read length (the number of query bases described by this CIGAR).
int CigarRoller::getExpectedReferenceBaseCount() Returns how many bases in the reference are spanned by this CIGAR.
int32_t CigarRoller::getRefOffset(int32_t queryIndex) Return the reference offset associated with the specified query index or INDEX_NA based on this cigar.

See Mapping Between Reference and Read/Query for a more detailed explanation with examples as to how it works.

int32_t CigarRoller::getQueryIndex(int32_t refOffset) Return the query index associated with the specified reference offset or INDEX_NA based on this cigar.

See Mapping Between Reference and Read/Query for a more detailed explanation with examples as to how it works.

int32_t CigarRoller::getRefPosition(int32_t queryIndex, int32_t queryStartPos) Return the reference position associated with the specified query index or INDEX_NA based on this cigar and the specified queryStartPos.

queryStartPos is the leftmost mapping position of the first matching base in the query.

See Mapping Between Reference and Read/Query for a more detailed explanation with examples as to how it works.

int32_t CigarRoller::getQueryIndex(int32_t refPosition, int32_t queryStartPos) Return the query index associated with the specified reference position and queryStartPos or INDEX_NA based on this cigar.

queryStartPos is the leftmost mapping position of the first matching base in the query.

See Mapping Between Reference and Read/Query for a more detailed explanation with examples as to how it works.
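The string-based methods above (Set, getExpandedString, getExpectedQueryBaseCount, getExpectedReferenceBaseCount) can be illustrated with a small standalone sketch. It does not use libcsg: CigarOps, parseCigar, expandCigar, expectedQueryBaseCount and expectedReferenceBaseCount are hypothetical names introduced here, and the rule that M, I and S consume query bases while M, D and N consume reference bases is assumed from the usual CIGAR semantics rather than taken from the class source.

```cpp
#include <cctype>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a parsed CIGAR: (operation character, count) pairs.
using CigarOps = std::vector<std::pair<char, uint32_t>>;

// Parse a CIGAR string such as "4M10N4M" into (operation, count) pairs.
CigarOps parseCigar(const std::string &cigar)
{
    CigarOps ops;
    uint32_t count = 0;
    bool haveDigits = false;
    for (char c : cigar)
    {
        if (std::isdigit(static_cast<unsigned char>(c)))
        {
            count = count * 10 + static_cast<uint32_t>(c - '0');
            haveDigits = true;
        }
        else
        {
            if (!haveDigits)
                throw std::invalid_argument("CIGAR operation without a count");
            ops.emplace_back(c, count);
            count = 0;
            haveDigits = false;
        }
    }
    if (haveDigits)
        throw std::invalid_argument("CIGAR ends with a count but no operation");
    return ops;
}

// Expand the CIGAR so that it contains no digits: "3M2I" becomes "MMMII".
std::string expandCigar(const CigarOps &ops)
{
    std::string expanded;
    for (const auto &op : ops)
        expanded.append(op.second, op.first);
    return expanded;
}

// Assumption: M, I and S are the operations that consume bases of the read.
int expectedQueryBaseCount(const CigarOps &ops)
{
    int total = 0;
    for (const auto &op : ops)
        if (op.first == 'M' || op.first == 'I' || op.first == 'S')
            total += static_cast<int>(op.second);
    return total;
}

// Assumption: M, D and N are the operations that span bases of the reference.
int expectedReferenceBaseCount(const CigarOps &ops)
{
    int total = 0;
    for (const auto &op : ops)
        if (op.first == 'M' || op.first == 'D' || op.first == 'N')
            total += static_cast<int>(op.second);
    return total;
}
```

For the CIGAR 4M10N4M3I2M4D3M used in the mapping example on this page, this sketch counts 16 query bases (the length of the read ACTGACTGAAACCACT) and 27 reference bases (the length of the reference ACTGAACCTTGGAAACTGCCGGGGACT).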


Overloaded Streaming Operators

Method Name Description
std::ostream &operator << (std::ostream &stream, const CigarRoller& roller) Writes all of the cigar operations contained in this roller to the passed in stream.
std::ostream &operator << (std::ostream &stream, const CigarRoller::CigarOperator& o) Writes the specified cigar operation to the specified stream as <count><char> (3M).


Public Enums

enum SPACE_TYPE
Enum Value Description
none No operation has been specified
match The query sequence and the reference sequence bases are the same for the bases associated with this cigar operation.

Both match and mismatch are associated with CIGAR Operation "M"

mismatch The query sequence and the reference sequence bases are different for the bases associated with this cigar operation, but bases exist in both the query and the reference.

Both match and mismatch are associated with CIGAR Operation "M"

insert Insertion to the reference (the query sequence contains bases that have no corresponding base in the reference).

Associated with CIGAR Operation "I"

del Deletion from the reference (the reference contains bases that have no corresponding base in the query sequence).

Associated with CIGAR Operation "D"

skip Skipped region from the reference (the reference contains bases that have no corresponding base in the query sequence).

Associated with CIGAR Operation "N"

softClip Soft clip on the read (clipped sequence present in the query sequence)

Associated with CIGAR Operation "S"

hardClip Hard clip on the read (clipped sequence not present in the query sequence)

Associated with CIGAR Operation "H"

pad Padding (silent deletion from the padded reference sequence)

Associated with CIGAR Operation "P"


Public Constants

Constant Value Description
INDEX_NA -1 Value associated with an index that is not applicable/does not exist.

Used for converting between query and reference indexes/offsets when an associated index/offset does not exist.


Nested Class

CigarOperator

Public Methods

Method Name Description
CigarOperator::CigarOperator(Operation operation, uint32_t count) Set the cigar operator with the specified operation and count length.
char CigarOperator::getChar() Returns the character code (M, I, D, N, S, H, or P) associated with this operation.
bool CigarOperator::operator == (CigarOperator &rhs) Returns true if the passed in operator is the same as this operator, false if not.
bool CigarOperator::operator != (CigarOperator &rhs) Returns true if the passed in operator is not the same as this operator, false if they are the same.
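As a rough illustration of how an operation enum, a character code and the <count><char> streaming format fit together, here is a standalone sketch. Op, OpRun and toString are hypothetical names, not the library's declarations. Returning '?' for an unspecified operation and comparing both operation and count for equality are assumptions made for this sketch.

```cpp
#include <cstdint>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical mirror of the enum values documented above.
enum class Op { none, match, mismatch, insert, del, skip, softClip, hardClip, pad };

// Hypothetical stand-in for a single CIGAR operation with its count.
struct OpRun
{
    Op operation;
    uint32_t count;

    // Character code for this operation; match and mismatch both map to 'M'.
    char getChar() const
    {
        switch (operation)
        {
            case Op::match:
            case Op::mismatch: return 'M';
            case Op::insert:   return 'I';
            case Op::del:      return 'D';
            case Op::skip:     return 'N';
            case Op::softClip: return 'S';
            case Op::hardClip: return 'H';
            case Op::pad:      return 'P';
            case Op::none:     break;
        }
        return '?'; // assumption: placeholder when no operation is specified
    }
};

// Assumption: two runs are the same when both operation and count agree.
bool operator==(const OpRun &lhs, const OpRun &rhs)
{
    return lhs.operation == rhs.operation && lhs.count == rhs.count;
}

bool operator!=(const OpRun &lhs, const OpRun &rhs)
{
    return !(lhs == rhs);
}

// Write the run as <count><char>, for example "3M".
std::ostream &operator<<(std::ostream &stream, const OpRun &run)
{
    return stream << run.count << run.getChar();
}

// Convenience helper for this sketch: format a run through the streaming operator.
std::string toString(const OpRun &run)
{
    std::ostringstream out;
    out << run;
    return out.str();
}
```

With these definitions, streaming OpRun{Op::match, 3} produces 3M, and OpRun{Op::skip, 10} produces 10N.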


Mapping Between Reference and Read/Query

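int32_t CigarRoller::getRefOffset(int32_t queryIndex) and int32_t CigarRoller::getQueryIndex(int32_t refOffset) map between the read and the reference using offsets. int32_t CigarRoller::getRefPosition(int32_t queryIndex, int32_t queryStartPos) and int32_t CigarRoller::getQueryIndex(int32_t refPosition, int32_t queryStartPos) perform the same mapping using reference positions.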

The queryIndex is the index in the read - from 0 to (read length - 1). The refOffset is the offset into the reference from the starting position of the read.

For Example:

Reference: ACTGAACCTTGGAAACTGCCGGGGACT
Read: ACTGACTGAAACCACT
CIGAR: 4M10N4M3I2M4D3M
POS: 5

This means it aligns:

Reference: ACTGAACCTTGGAAACTG   CCGGGGACT
Read:      ACTG          ACTGAAACC    ACT

Adding the position:

RefPos:     5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22          23 24 25 26 27 28 29 30 31
Reference:  A  C  T  G  A  A  C  C  T  T  G  G  A  A  A  C  T  G           C  C  G  G  G  G  A  C  T
Read:       A  C  T  G                                A  C  T  G  A  A  A  C  C              A  C  T

Adding the offsets:

RefPos:     5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22          23 24 25 26 27 28 29 30 31
refOffset:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17          18 19 20 21 22 23 24 25 26
Reference:  A  C  T  G  A  A  C  C  T  T  G  G  A  A  A  C  T  G           C  C  G  G  G  G  A  C  T
Read:       A  C  T  G                                A  C  T  G  A  A  A  C  C              A  C  T
queryIndex: 0  1  2  3                                4  5  6  7  8  9 10 11 12             13 14 15

The results of a call to getRefOffset for each value passed in (where NA stands for INDEX_NA):

queryIndex: 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16(and any value over 16)
Return:     0  1  2  3 14 15 16 17 NA NA NA 18 19 24 25 26 NA

The results of a call to getQueryIndex for each value passed in (where NA stands for INDEX_NA):

refOffset:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27(and any value over 27)
Return:     0  1  2  3 NA NA NA NA NA NA NA NA NA NA  4  5  6  7 11 12 NA NA NA NA 13 14 15 NA
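The two offset tables above can be reproduced with a short standalone sketch that walks the CIGAR once and fills two lookup vectors. This is not the CigarRoller implementation: CigarMapping, buildMapping, refOffsetFor and queryIndexFor are hypothetical names, and treating soft clips like insertions (present in the read, no reference offset) is an assumption.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

const int32_t INDEX_NA = -1;

// Hypothetical lookup tables built from a CIGAR string.
struct CigarMapping
{
    std::vector<int32_t> queryToRef; // indexed by queryIndex, holds refOffset or INDEX_NA
    std::vector<int32_t> refToQuery; // indexed by refOffset, holds queryIndex or INDEX_NA
};

CigarMapping buildMapping(const std::string &cigar)
{
    CigarMapping m;
    uint32_t count = 0;
    for (char c : cigar)
    {
        if (std::isdigit(static_cast<unsigned char>(c)))
        {
            count = count * 10 + static_cast<uint32_t>(c - '0');
            continue;
        }
        for (uint32_t i = 0; i < count; ++i)
        {
            const int32_t q = static_cast<int32_t>(m.queryToRef.size());
            const int32_t r = static_cast<int32_t>(m.refToQuery.size());
            switch (c)
            {
                case 'M': // base present in both read and reference
                    m.queryToRef.push_back(r);
                    m.refToQuery.push_back(q);
                    break;
                case 'I': // base only in the read
                case 'S': // assumption: soft clips behave like insertions here
                    m.queryToRef.push_back(INDEX_NA);
                    break;
                case 'D': // base only in the reference
                case 'N':
                    m.refToQuery.push_back(INDEX_NA);
                    break;
                default:  // H and P touch neither the read nor the reference
                    break;
            }
        }
        count = 0;
    }
    return m;
}

// Reference offset for a query index, or INDEX_NA if out of range or unaligned.
int32_t refOffsetFor(const CigarMapping &m, int32_t queryIndex)
{
    if (queryIndex < 0 || static_cast<std::size_t>(queryIndex) >= m.queryToRef.size())
        return INDEX_NA;
    return m.queryToRef[static_cast<std::size_t>(queryIndex)];
}

// Query index for a reference offset, or INDEX_NA if out of range or unaligned.
int32_t queryIndexFor(const CigarMapping &m, int32_t refOffset)
{
    if (refOffset < 0 || static_cast<std::size_t>(refOffset) >= m.refToQuery.size())
        return INDEX_NA;
    return m.refToQuery[static_cast<std::size_t>(refOffset)];
}
```

Building the mapping for 4M10N4M3I2M4D3M gives, for example, refOffsetFor(m, 4) == 14, refOffsetFor(m, 8) == INDEX_NA, queryIndexFor(m, 18) == 11 and queryIndexFor(m, 27) == INDEX_NA, in agreement with the tables.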

The results of a call to getRefPosition passing in start position 5 (where NA stands for INDEX_NA):

queryIndex: 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16(and any value over 16)
Return:     5  6  7  8 19 20 21 22 NA NA NA 23 24 29 30 31 NA

The results of a call to getQueryIndex using refPosition and start position 5 (where NA stands for INDEX_NA):

refPosition:5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32(and any value over 32)
Return:     0  1  2  3 NA NA NA NA NA NA NA NA NA NA  4  5  6  7 11 12 NA NA NA NA 13 14 15 NA
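Comparing the position tables with the offset tables suggests that the position-based calls are the offset-based calls shifted by the start position (for example, refOffset 14 with start position 5 gives refPosition 19). A minimal sketch of that conversion, assuming this relationship holds, follows. offsetToPosition and positionToOffset are hypothetical helper names, not part of the class.

```cpp
#include <cstdint>

const int32_t INDEX_NA = -1;

// Convert a reference offset to a reference position; INDEX_NA stays INDEX_NA.
int32_t offsetToPosition(int32_t refOffset, int32_t queryStartPos)
{
    return (refOffset == INDEX_NA) ? INDEX_NA : refOffset + queryStartPos;
}

// Convert a reference position to a reference offset.
// Positions before the start of the read have no offset, so they yield INDEX_NA.
int32_t positionToOffset(int32_t refPosition, int32_t queryStartPos)
{
    if (refPosition == INDEX_NA || refPosition < queryStartPos)
        return INDEX_NA;
    return refPosition - queryStartPos;
}
```

Under this assumption, a position lookup amounts to converting the position to an offset, performing the offset-based lookup, and leaving INDEX_NA results untouched. With start position 5, offset 24 corresponds to position 29 and position 22 corresponds to offset 17.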