Difference between revisions of "C++ Class: CigarRoller"

From Genome Analysis Wiki
[[Category:C++]]
[[Category:libStatGen]]
[[Category:libStatGen general]]
 
= Cigar =

This class is part of [[libStatGen: general]].
 
 
The purpose of this class is to provide utilities for processing CIGARs. Its operations are read-only: they do not modify the object, apart from internal state that is evaluated lazily.
 
 
 
== Static Methods ==

These methods determine properties of a given CIGAR operation.
 
 
 
{| style="margin: 1em 1em 1em 0; background-color: #f9f9f9; border: 1px #aaa solid; border-collapse: collapse;" border="1"
 
|-style="background: #f2f2f2; text-align: center;"
 
! Method Name !!  Description
 
|-
 
| <code>bool Cigar::foundInQuery(Operation op)</code>
 
| Return true if the specified operation is found in the query sequence, false if not.
 
|-
 
| <code>bool Cigar::isClip(Operation op)</code>
 
| Return true if the specified operation is a clipping operation, false if not.
 
|-
 
|}
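The two static checks can be illustrated with a minimal, self-contained sketch. It is not the libStatGen source: the <code>Operation</code> enum below only mirrors the values listed under Public Enums (its ordering is assumed), and <code>consumesReference</code> is a hypothetical helper added for comparison.

```cpp
#include <cassert>

// Mirrors the enum values listed under "Public Enums" (ordering assumed).
enum Operation { none, match, mismatch, insert, del, skip, softClip, hardClip, pad };

// Sketch of foundInQuery: true for operations whose bases are present in the
// read/query sequence (M as match or mismatch, I, and S).
bool foundInQuery(Operation op) {
    switch (op) {
        case match:
        case mismatch:
        case insert:
        case softClip:
            return true;
        default:
            return false;
    }
}

// Sketch of isClip: true only for soft and hard clips.
bool isClip(Operation op) {
    return op == softClip || op == hardClip;
}

// Hypothetical helper for comparison (not documented on this page):
// true for operations that span reference bases (M, D, N).
bool consumesReference(Operation op) {
    switch (op) {
        case match:
        case mismatch:
        case del:
        case skip:
            return true;
        default:
            return false;
    }
}
```

For example, both soft and hard clips are clips, but only the soft-clipped bases are found in the query, because hard-clipped bases are not stored in the read.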
 
 
 
== Public Methods ==
 
{| style="margin: 1em 1em 1em 0; background-color: #f9f9f9; border: 1px #aaa solid; border-collapse: collapse;" border="1"
 
|-style="background: #f2f2f2; text-align: center;"
 
! Method Name !!  Description
 
|-
 
| <code>Cigar::Cigar()</code>
 
| Default constructor initializes as a CIGAR with no operations.
 
|-
 
| <code>void Cigar::getCigarString(String& cigarString)</code>
 
| Sets the passed in String to the string representation of this CIGAR object.
 
|-
 
| <code>void Cigar::getCigarString(std::string& cigarString)</code>
 
| Sets the passed in std::string to the string representation of this CIGAR object.
 
|-
 
| <code>void Cigar::getExpandedString(std::string &s)</code>
 
| Sets the specified string to the expanded form of this cigar, which has one character per position and no digits (for example, a CIGAR of "3M" produces "MMM").
 
|-
 
| <code>CigarOperator & Cigar::operator [] (int i)</code>
 
| Return the Cigar Operation at the specified index (starting at 0).
 
|-
 
| <code>CigarOperator & Cigar::getOperator(int i)</code>
 
| Return the Cigar Operation at the specified index (starting at 0).
 
|-
 
| <code>bool Cigar::operator == (CigarRoller &rhs)</code>
 
| Returns true if two Cigars are the same (the same operations of the same sizes)
 
|-
 
| <code>int Cigar::size()</code>
 
| Return the number of cigar operations in this object.
 
|-
 
| <code>void Cigar::Dump()</code>
 
| Write this object as a string to cout.
 
|-
 
| <code>int Cigar::getExpectedQueryBaseCount()</code>
 
| Return the expected read length (the number of query bases implied by this cigar).
 
|-
 
| <code>int Cigar::getExpectedReferenceBaseCount()</code>
 
| Return how many bases in the reference are spanned by this cigar.
 
|-
 
| <code>int Cigar::getNumBeginClips()</code>
 
| Return the number of clips that are at the beginning of the cigar.
 
|-
 
| <code>int Cigar::getNumEndClips()</code>
 
| Return the number of clips that are at the end of the cigar.
 
|-
 
| <code>int32_t Cigar::getRefOffset(int32_t queryIndex)</code>
 
|Return the reference offset associated with the specified query index or INDEX_NA based on this cigar.
 
See [[C++ Class: CigarRoller#Mapping Between Reference and Read/Query|Mapping Between Reference and Read/Query]] for a more detailed explanation with examples as to how it works.
 
|-
 
| <code>int32_t Cigar::getQueryIndex(int32_t refOffset)</code>
 
| Return the query index associated with the specified reference offset or INDEX_NA based on this cigar.
 
See [[C++ Class: CigarRoller#Mapping Between Reference and Read/Query|Mapping Between Reference and Read/Query]] for a more detailed explanation with examples as to how it works.
 
|-
 
| <code>int32_t Cigar::getRefPosition(int32_t queryIndex, int32_t queryStartPos)</code>
 
| Return the reference position associated with the specified query index, or INDEX_NA, based on this cigar and the specified queryStartPos.

queryStartPos is the leftmost mapping position of the first matching base in the query.
 
 
 
See [[C++ Class: CigarRoller#Mapping Between Reference and Read/Query|Mapping Between Reference and Read/Query]] for a more detailed explanation with examples as to how it works.
 
|-
 
| <code>int32_t Cigar::getQueryIndex(int32_t refPosition, int32_t queryStartPos)</code>
 
| Return the query index associated with the specified reference position and queryStartPos or INDEX_NA based on this cigar.
 
queryStartPos is the leftmost mapping position of the first matching base in the query.
 
 
 
See [[C++ Class: CigarRoller#Mapping Between Reference and Read/Query|Mapping Between Reference and Read/Query]] for a more detailed explanation with examples as to how it works.
 
|-
 
| <code>uint32_t Cigar::getNumOverlaps(int32_t start, int32_t end, int32_t queryStartPos)</code>
 
| Return the number of bases within the specified region where the reference and the read associated with this cigar overlap.

start : inclusive start position (reference position) of the region in which to check for overlaps. (-1 indicates to start at the beginning of the reference.)


end : exclusive end position (reference position) of the region in which to check for overlaps. (-1 indicates to go to the end of the reference.)
 
 
 
queryStartPos : leftmost mapping position of the first matching base in the query.
 
 
 
NOTE: ensure that start, end, and queryStartPos are all in the same base (0 or 1).
 
 
 
See [[C++ Class: CigarRoller#Determining the Number of Reference and Read/Query Overlaps|Determining the Number of Reference and Read/Query Overlaps]] for a more detailed explanation with examples as to how it works.
 
 
 
|}
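As a rough illustration of what <code>getExpectedQueryBaseCount</code> and <code>getExpectedReferenceBaseCount</code> compute, the sketch below sums run lengths directly from a CIGAR string. The function names are hypothetical and this is not the libStatGen implementation; the rules for which operations consume query bases and which consume reference bases follow the SAM specification, which also defines "=" and "X" (not listed on this page).

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Operations whose bases appear in the read (SAM: M, I, S, =, X).
bool consumesQuery(char op) {
    return op == 'M' || op == 'I' || op == 'S' || op == '=' || op == 'X';
}

// Operations that span reference bases (SAM: M, D, N, =, X).
bool consumesReference(char op) {
    return op == 'M' || op == 'D' || op == 'N' || op == '=' || op == 'X';
}

// Calls visit(runLength, opChar) for each operation in a CIGAR string.
template <typename Visit>
void forEachOperation(const std::string& cigar, Visit visit) {
    int runLength = 0;
    for (char c : cigar) {
        if (std::isdigit(static_cast<unsigned char>(c))) {
            runLength = runLength * 10 + (c - '0');
        } else {
            visit(runLength, c);
            runLength = 0;
        }
    }
}

// Hypothetical counterpart of getExpectedQueryBaseCount.
int expectedQueryBaseCount(const std::string& cigar) {
    int total = 0;
    forEachOperation(cigar, [&total](int n, char op) {
        if (consumesQuery(op)) total += n;
    });
    return total;
}

// Hypothetical counterpart of getExpectedReferenceBaseCount.
int expectedReferenceBaseCount(const std::string& cigar) {
    int total = 0;
    forEachOperation(cigar, [&total](int n, char op) {
        if (consumesReference(op)) total += n;
    });
    return total;
}
```

For the CIGAR 4M10N4M3I2M4D3M used in the mapping example on this page, this gives 16 query bases (the read length) and 27 reference bases.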
 
 
 
== Overloaded Streaming Operators ==
 
{| style="margin: 1em 1em 1em 0; background-color: #f9f9f9; border: 1px #aaa solid; border-collapse: collapse;" border="1"
 
|-style="background: #f2f2f2; text-align: center;"
 
! Method Name !!  Description
 
|-
 
| <code> std::ostream &operator << (std::ostream &stream, const Cigar& cigar)</code>
 
| Writes all of the cigar operations contained in the cigar to the passed in stream.
 
|-
 
| <code> std::ostream &operator << (std::ostream &stream, const Cigar::CigarOperator& o)</code>
 
| Writes the specified cigar operation to the specified stream as <count><char> (for example, 3M).
 
|}
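The <count><char> output format, and the digit-free expanded string returned by <code>getExpandedString</code>, can be sketched with a simple stand-in type. <code>CigarOp</code> and <code>expanded</code> are hypothetical names used only for this illustration, not part of libStatGen.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for Cigar::CigarOperator: one CIGAR character plus
// its run length.
struct CigarOp {
    char op;
    unsigned count;
};

// Writes a single operation as <count><char>, e.g. "3M".
std::ostream& operator<<(std::ostream& out, const CigarOp& o) {
    return out << o.count << o.op;
}

// Writes all operations back to back, e.g. "4M3I2M".
std::ostream& operator<<(std::ostream& out, const std::vector<CigarOp>& ops) {
    for (const CigarOp& o : ops) out << o;
    return out;
}

// Digit-free expanded form: each run becomes `count` copies of its
// character, so "3M" becomes "MMM".
std::string expanded(const std::vector<CigarOp>& ops) {
    std::string s;
    for (const CigarOp& o : ops) s.append(o.count, o.op);
    return s;
}
```

Streaming the operations 4M, 3I, 2M into a <code>std::ostringstream</code> yields "4M3I2M", while the expanded form of the same operations is "MMMMIIIMM".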
 
 
 
 
 
  
== Public Enums ==
{| style="margin: 1em 1em 1em 0; background-color: #f9f9f9; border: 1px #aaa solid; border-collapse: collapse;" border="1"
 
|-style="background: #f2f2f2; text-align: center;"
 
! colspan="2"| enum Operation
 
|-
 
! Enum Value !!  Description
 
|-
 
| none
 
| No operation has been specified
 
|-
 
| match
 
| The query sequence and the reference sequence bases are the same for the bases associated with this cigar operation.
 
Both <code>match</code> and <code>mismatch</code> are associated with CIGAR Operation "M"
 
|-
 
| mismatch
 
|  The query sequence and the reference sequence bases are different for the bases associated with this cigar operation, but bases exist in both the query and the reference.
 
Both <code>match</code> and <code>mismatch</code> are associated with CIGAR Operation "M"
 
|-
 
| insert
 
| Insertion to the reference (the query sequence contains bases that have no corresponding base in the reference).
 
Associated with CIGAR Operation "I"
 
|-
 
| del
 
|Deletion from the reference (the reference contains bases that have no corresponding base in the query sequence).
 
Associated with CIGAR Operation "D"
 
|-
 
| skip
 
| Skipped region from the reference (the reference contains bases that have no corresponding base in the query sequence).
 
Associated with CIGAR Operation "N"
 
|-
 
| softClip
 
| Soft clip on the read (clipped sequence present in the query sequence)
 
Associated with CIGAR Operation "S"
 
|-
 
| hardClip
 
| Hard clip on the read (clipped sequence not present in the query sequence)
 
Associated with CIGAR Operation "H"
 
|-
 
|pad
 
| Padding (silent deletion from the padded reference sequence)
 
Associated with CIGAR Operation "P"
 
|}
 
  
See: http://csg.sph.umich.edu//mktrost/doxygen/current/classCigar.html for documentation.
  
== Public Constants ==
{| style="margin: 1em 1em 1em 0; background-color: #f9f9f9; border: 1px #aaa solid; border-collapse: collapse;" border="1"
 
|-style="background: #f2f2f2; text-align: center;"
 
! Constant !! Value !! Description
 
|-
 
| INDEX_NA
 
| -1
 
| Value associated with an index that is not applicable/does not exist.
 
Used for converting between query and reference indexes/offsets when an associated index/offset does not exist.
 
|}
 
  
See [[C++ Class: CigarRoller#Mapping Between Reference and Read/Query|Mapping Between Reference and Read/Query]] for a more detailed explanation with examples as to how the mapping between the read/query works.
  
== Nested Class ==
 
 
=== CigarOperator ===
 
 
 
==== Public Methods ====
 
{| style="margin: 1em 1em 1em 0; background-color: #f9f9f9; border: 1px #aaa solid; border-collapse: collapse;" border="1"
 
|-style="background: #f2f2f2; text-align: center;"
 
! Method Name !!  Description
 
|-
 
| <code>CigarOperator::CigarOperator(Operation operation, uint32_t count)</code>
 
| Construct a cigar operator with the specified operation and count (length).
 
|-
 
| <code>char CigarOperator::getChar()</code>
 
| Returns the character code (M, I, D, N, S, H, or P) associated with this operation.
 
|-
 
| <code>bool CigarOperator::operator == (const CigarOperator &rhs)</code>
 
| Returns true if the passed in operator is the same as this operator, false if not.
 
|-
 
| <code>bool CigarOperator::operator != (const CigarOperator &rhs)</code>
 
| Returns true if the passed in operator is not the same as this operator, false if they are the same.
 
|}
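A minimal stand-in for the nested operator class shows how <code>getChar</code> and the comparison operators fit together. This is a hypothetical sketch, not the libStatGen source: the enum ordering is assumed, and the equality rule (same operation and same count) is an assumption consistent with the description above.

```cpp
#include <cassert>
#include <cstdint>

// Mirrors the values listed under "Public Enums" (ordering assumed).
enum Operation { none, match, mismatch, insert, del, skip, softClip, hardClip, pad };

// Hypothetical stand-in for Cigar::CigarOperator, not the libStatGen source.
struct CigarOperator {
    Operation operation;
    uint32_t count;

    CigarOperator(Operation op, uint32_t n) : operation(op), count(n) {}

    // match and mismatch share the letter 'M'. 'none' has no CIGAR letter;
    // returning '?' for it is an arbitrary choice made for this sketch.
    char getChar() const {
        switch (operation) {
            case match:
            case mismatch: return 'M';
            case insert:   return 'I';
            case del:      return 'D';
            case skip:     return 'N';
            case softClip: return 'S';
            case hardClip: return 'H';
            case pad:      return 'P';
            default:       return '?';
        }
    }

    // Assumed equality rule: same operation and same count.
    bool operator==(const CigarOperator& rhs) const {
        return operation == rhs.operation && count == rhs.count;
    }
    bool operator!=(const CigarOperator& rhs) const { return !(*this == rhs); }
};
```

Defining <code>!=</code> in terms of <code>==</code> keeps the two comparisons consistent if the equality rule ever changes.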
 
  
 
= CigarRoller =

This class is part of [[libStatGen: general]].
  
 
The purpose of this class is to provide accessors for setting, updating, and modifying the CIGAR object. It is a child class of Cigar.
  
== Public Methods ==
See: http://csg.sph.umich.edu//mktrost/doxygen/current/classCigarRoller.html for documentation.
{| style="margin: 1em 1em 1em 0; background-color: #f9f9f9; border: 1px #aaa solid; border-collapse: collapse;" border="1"
 
|-style="background: #f2f2f2; text-align: center;"
 
! Method Name !!  Description
 
|-
 
| <code>CigarRoller::CigarRoller()</code>
 
| Default constructor initializes as a CIGAR with no operations.
 
|-
 
| <code>CigarRoller::CigarRoller(const char *cigarString)</code>
 
| Constructor that initializes the object with the specified cigarString.
 
|-
 
| <code>CigarRoller & CigarRoller::operator += (CigarRoller &rhs)</code>
 
| Add the contents of the specified CigarRoller to this object.
 
|-
 
| <code>CigarRoller & CigarRoller::operator += (const CigarOperator &rhs)</code>
 
| Append the specified cigar operation to this object.
 
|-
 
| <code>CigarRoller & CigarRoller::operator = (CigarRoller &rhs)</code>
 
| Set this object to a copy of the specified CigarRoller.
 
|-
 
| <code>void CigarRoller::Add(Operation operation, int count)</code>
 
| Adds the specified operation with the specified count to this object.
 
|-
 
| <code>void CigarRoller::Add(const char *cigarString)</code>
 
| Adds the specified cigarString to this object.
 
|-
 
| <code>void CigarRoller::Add(CigarRoller &rhs)</code>
 
| Adds the specified CIGAR to this object.
 
|-
 
| <code>void CigarRoller::Set(const char *cigarString)</code>
 
| Sets this object to the specified cigarString.
 
|-
 
| <code>void CigarRoller::Set(const uint32_t* cigarBuffer, uint16_t bufferLen)</code>
 
| Sets this object to the BAM formatted cigar found at the beginning of the specified buffer which is bufferLen long.
 
|-
 
| ''' DEPRECATED''' <code>int CigarRoller::getMatchPositionOffset()</code>
 
| DO NOT USE.
 
|-
 
| <code>const char * CigarRoller::getString()</code>
 
| Returns the string representation of this CIGAR object.
 
|-
 
| <code>void CigarRoller::clear()</code>
 
| Clear this object so that it has 0 Cigar Operations.
 
|}
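The "rolling" idea suggested by the class name can be sketched as a builder that extends the previous run when the same operation is appended again, together with decoding of a BAM binary CIGAR of the kind accepted by <code>Set(const uint32_t* cigarBuffer, uint16_t bufferLen)</code>. <code>RollingCigar</code> and its methods are hypothetical: this page does not say whether <code>CigarRoller::Add</code> merges adjacent runs, and <code>bufferLen</code> is assumed here to be the number of 32-bit CIGAR words (BAM's n_cigar_op). The bit layout (run length in the upper 28 bits, operation code 0 to 8 for "MIDNSHP=X" in the lower 4 bits) comes from the SAM/BAM specification.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical builder: appending an operation equal to the last one extends
// that run instead of adding a new entry (an assumption for illustration).
class RollingCigar {
public:
    void add(char op, uint32_t count) {
        if (count == 0) return;
        if (!ops_.empty() && ops_.back().op == op) {
            ops_.back().count += count;  // extend the previous run
        } else {
            ops_.push_back({op, count});
        }
    }

    // Decodes a BAM binary CIGAR: each 32-bit word holds the run length in
    // the upper 28 bits and an index into "MIDNSHP=X" in the lower 4 bits.
    // bufferLen is assumed to be the number of words, not bytes.
    void setFromBam(const uint32_t* buffer, uint16_t bufferLen) {
        static const char kCodes[] = "MIDNSHP=X";
        ops_.clear();
        for (uint16_t i = 0; i < bufferLen; ++i) {
            const uint32_t code = buffer[i] & 0xFu;
            const char op = code < 9 ? kCodes[code] : '?';
            add(op, buffer[i] >> 4);
        }
    }

    // String form, e.g. "4M3I2M".
    std::string str() const {
        std::string s;
        for (const Run& r : ops_) s += std::to_string(r.count) + r.op;
        return s;
    }

    void clear() { ops_.clear(); }

private:
    struct Run {
        char op;
        uint32_t count;
    };
    std::vector<Run> ops_;
};
```

With this design, adding 3M and then 2M produces a single 5M run, which keeps the CIGAR in its shortest form without a separate normalization pass.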
 
 
 
== Overloaded Streaming Operators ==
 
{| style="margin: 1em 1em 1em 0; background-color: #f9f9f9; border: 1px #aaa solid; border-collapse: collapse;" border="1"
 
|-style="background: #f2f2f2; text-align: center;"
 
! Method Name !!  Description
 
|-
 
| <code> std::ostream &operator << (std::ostream &stream, const CigarRoller& roller)</code>
 
| Writes all of the cigar operations contained in this roller to the passed in stream.
 
|}
 
 
 
  
 
 
 
[[Category:C++]]
 
[[Category:libcsg]]
 

Latest revision as of 12:00, 2 February 2017


Cigar

This class is part of libStatGen: general.

The purpose of this class is to provide utilities for processing CIGARs. Its operations are read-only: they do not modify the object, apart from internal state that is evaluated lazily.

See: http://csg.sph.umich.edu//mktrost/doxygen/current/classCigar.html for documentation.

The static methods determine properties of a given CIGAR operation.

See Mapping Between Reference and Read/Query for a more detailed explanation with examples as to how the mapping between the read/query works.

See Determining the Number of Reference and Read/Query Overlaps for a more detailed explanation with examples as to how determining overlaps works.

CigarRoller

This class is part of libStatGen: general.

The purpose of this class is to provide accessors for setting, updating, and modifying the CIGAR object. It is a child class of Cigar.

See: http://csg.sph.umich.edu//mktrost/doxygen/current/classCigarRoller.html for documentation.

Mapping Between Reference and Read/Query

int32_t Cigar::getRefOffset(int32_t queryIndex) and int32_t Cigar::getQueryIndex(int32_t refOffset) are used to map between the reference and the read.

The queryIndex is the index in the read - from 0 to (read length - 1). The refOffset is the offset into the reference from the starting position of the read.

For example:

Reference: ACTGAACCTTGGAAACTGCCGGGGACT
Read: ACTGACTGAAACCATT
CIGAR: 4M10N4M3I2M4D3M
POS: 5

This means it aligns:

Reference: ACTGAACCTTGGAAACTG   CCGGGGACT
Read:      ACTG          ACTGAAACC    ATT

Adding the position:

RefPos:     5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22          23 24 25 26 27 28 29 30 31
Reference:  A  C  T  G  A  A  C  C  T  T  G  G  A  A  A  C  T  G           C  C  G  G  G  G  A  C  T
Read:       A  C  T  G                                A  C  T  G  A  A  A  C  C              A  T  T

Adding the offsets:

RefPos:     5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22          23 24 25 26 27 28 29 30 31
refOffset:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17          18 19 20 21 22 23 24 25 26
Reference:  A  C  T  G  A  A  C  C  T  T  G  G  A  A  A  C  T  G           C  C  G  G  G  G  A  C  T
Read:       A  C  T  G                                A  C  T  G  A  A  A  C  C              A  T  T
queryIndex: 0  1  2  3                                4  5  6  7  8  9 10 11 12             13 14 15

The results of a call to getRefOffset for each value passed in (where NA stands for INDEX_NA):

queryIndex: 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16(and any value over 16)
Return:     0  1  2  3 14 15 16 17 NA NA NA 18 19 24 25 26 NA

The results of a call to getQueryIndex for each value passed in (where NA stands for INDEX_NA):

refOffset:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27(and any value over 27)
Return:     0  1  2  3 NA NA NA NA NA NA NA NA NA NA  4  5  6  7 11 12 NA NA NA NA 13 14 15 NA

The results of a call to getRefPosition passing in start position 5 (where NA stands for INDEX_NA):

queryIndex: 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16(and any value over 16)
Return:     5  6  7  8 19 20 21 22 NA NA NA 23 24 29 30 31 NA

The results of a call to getQueryIndex using refPosition and start position 5 (where NA stands for INDEX_NA):

refPosition:5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32(and any value over 32)
Return:     0  1  2  3 NA NA NA NA NA NA NA NA NA NA  4  5  6  7 11 12 NA NA NA NA 13 14 15 NA
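The four lookups above can be reproduced with a small sketch that builds a query-to-reference table and a reference-to-query table in one pass over the CIGAR. This is a hypothetical illustration, not the libStatGen implementation: CigarMaps, buildMaps, refOffsetOf, queryIndexOfOffset, refPositionOf and queryIndexOfPosition are made-up names, and the handling of "=", "X", "H" and "P" follows the SAM specification rather than anything stated on this page.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

const int32_t INDEX_NA = -1;  // same value as Cigar's INDEX_NA constant

// Hypothetical pair of lookup tables, built once from a CIGAR string.
struct CigarMaps {
    std::vector<int32_t> queryToRef;  // indexed by queryIndex, holds refOffset
    std::vector<int32_t> refToQuery;  // indexed by refOffset, holds queryIndex
};

// M/=/X advance both sequences, I/S advance only the query, D/N advance only
// the reference, and H/P advance neither.
CigarMaps buildMaps(const std::string& cigar) {
    CigarMaps m;
    uint32_t runLength = 0;
    for (char c : cigar) {
        if (c >= '0' && c <= '9') {
            runLength = runLength * 10 + static_cast<uint32_t>(c - '0');
            continue;
        }
        for (uint32_t i = 0; i < runLength; ++i) {
            const int32_t q = static_cast<int32_t>(m.queryToRef.size());
            const int32_t r = static_cast<int32_t>(m.refToQuery.size());
            switch (c) {
                case 'M': case '=': case 'X':
                    m.queryToRef.push_back(r);
                    m.refToQuery.push_back(q);
                    break;
                case 'I': case 'S':
                    m.queryToRef.push_back(INDEX_NA);
                    break;
                case 'D': case 'N':
                    m.refToQuery.push_back(INDEX_NA);
                    break;
                default:  // H and P consume neither sequence
                    break;
            }
        }
        runLength = 0;
    }
    return m;
}

// Like getRefOffset(queryIndex): INDEX_NA when out of range or unaligned.
int32_t refOffsetOf(const CigarMaps& m, int32_t queryIndex) {
    if (queryIndex < 0 || queryIndex >= static_cast<int32_t>(m.queryToRef.size()))
        return INDEX_NA;
    return m.queryToRef[queryIndex];
}

// Like getQueryIndex(refOffset).
int32_t queryIndexOfOffset(const CigarMaps& m, int32_t refOffset) {
    if (refOffset < 0 || refOffset >= static_cast<int32_t>(m.refToQuery.size()))
        return INDEX_NA;
    return m.refToQuery[refOffset];
}

// Like getRefPosition(queryIndex, queryStartPos): the offset shifted by the start.
int32_t refPositionOf(const CigarMaps& m, int32_t queryIndex, int32_t queryStartPos) {
    const int32_t offset = refOffsetOf(m, queryIndex);
    return offset == INDEX_NA ? INDEX_NA : offset + queryStartPos;
}

// Like getQueryIndex(refPosition, queryStartPos).
int32_t queryIndexOfPosition(const CigarMaps& m, int32_t refPosition, int32_t queryStartPos) {
    return queryIndexOfOffset(m, refPosition - queryStartPos);
}
```

Building the tables once and answering each lookup by indexing is one way to realize the lazy evaluation mentioned for the Cigar class; for 4M10N4M3I2M4D3M with start position 5, refPositionOf(m, 11, 5) gives 23, matching the getRefPosition table.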


Determining the Number of Reference and Read/Query Overlaps

It is often useful to determine the number of bases that overlap between the reference and the read in a given region.

To do this, use getNumOverlaps, passing in the reference start and end positions for the region as well as the reference position where the read begins. start is inclusive, while end is exclusive.

Using the above example:

RefPos:     5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22          23 24 25 26 27 28 29 30 31
refOffset:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17          18 19 20 21 22 23 24 25 26
Reference:  A  C  T  G  A  A  C  C  T  T  G  G  A  A  A  C  T  G           C  C  G  G  G  G  A  C  T
Read:       A  C  T  G                                A  C  T  G  A  A  A  C  C              A  T  T
queryIndex: 0  1  2  3                                4  5  6  7  8  9 10 11 12             13 14 15
getNumOverlaps(5,32,5) = 13 - [5, 32) covers the whole read - 13 cigar positions are "M" (found in both the reference and the read)
getNumOverlaps(5,31,5) = 12 - skips the last overlapping position
getNumOverlaps(0,100,5) = 13 - covers the whole read.
getNumOverlaps(-1, -1,5) = 13 - covers the whole read.
getNumOverlaps(-1,10,5) = 4
getNumOverlaps(10,-1,5) = 9
getNumOverlaps(9,19,5) = 0 - all skipped
getNumOverlaps(9,20,5) = 1
getNumOverlaps(9,6,5) = 0 - end is before start
getNumOverlaps(0,5,5) = 0 - outside of read
getNumOverlaps(32,40,5) = 0 - outside of read
getNumOverlaps(0,5,1) = 4 - with a different start position, this range overlaps the read with 4 bases
getNumOverlaps(32,40,32) = 4 - with a different start position, this range overlaps the read with 4 bases
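The examples above can be reproduced with a short sketch that walks the CIGAR along the reference and counts each aligned (M) base whose reference position falls in [start, end). numOverlaps is a hypothetical name, not the libStatGen implementation; -1 for start or end is treated as an open bound, as described for getNumOverlaps, and "=" and "X" are counted like "M" following the SAM specification.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Counts reference positions in [start, end) that are aligned to a base of
// the read (M, =, X). start == -1 means "from the beginning" and end == -1
// means "to the end". queryStartPos is the reference position of the first
// aligned base; start, end, and queryStartPos must use the same base (0 or 1).
uint32_t numOverlaps(const std::string& cigar, int32_t start, int32_t end,
                     int32_t queryStartPos) {
    uint32_t overlaps = 0;
    int32_t refPos = queryStartPos;  // reference position of the next ref base
    int32_t runLength = 0;
    for (char c : cigar) {
        if (c >= '0' && c <= '9') {
            runLength = runLength * 10 + (c - '0');
            continue;
        }
        const bool inBoth = (c == 'M' || c == '=' || c == 'X');
        const bool refOnly = (c == 'D' || c == 'N');
        // I, S, H and P do not advance the reference position.
        if (inBoth || refOnly) {
            for (int32_t i = 0; i < runLength; ++i, ++refPos) {
                const bool afterStart = (start == -1 || refPos >= start);
                const bool beforeEnd = (end == -1 || refPos < end);
                if (inBoth && afterStart && beforeEnd) ++overlaps;
            }
        }
        runLength = 0;
    }
    return overlaps;
}
```

Deletions and skips still advance the reference position but are not counted, which is why getNumOverlaps(9,19,5) is 0 even though the region lies between the read's first and last aligned bases.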