Changes

From Genome Analysis Wiki
2,531 bytes removed ,  12:00, 2 February 2017
Line 1: Line 1: −
= CigarRoller=
+
[[Category:C++]]
This class is part of [[C++ Library: libcsg|libcsg]].
+
[[Category:libStatGen]]
 +
[[Category:libStatGen general]]
   −
The purpose of this class is to provide utilities for creating and processing CIGAR strings.
+
= Cigar=
 +
This class is part of [[libStatGen: general]].
   −
== Public Methods ==
+
The purpose of this class is to provide utilities for processing CIGARs. It has read-only operators that do not allow modification to the class other than for lazy-evaluation.
{| style="margin: 1em 1em 1em 0; background-color: #f9f9f9; border: 1px #aaa solid; border-collapse: collapse;" border="1"
  −
|-style="background: #f2f2f2; text-align: center;"
  −
! Method Name !!  Description
  −
|-
  −
| <code>CigarRoller::CigarRoller()</code>
  −
| Default constructor initializes as a CIGAR with no operations.
  −
|-
  −
| <code>CigarRoller::CigarRoller(const char *cigarString)</code>
  −
| Constructor that initializes the object with the specified cigarString.
  −
|-
  −
| <code>CigarRoller & CigarRoller::operator += (CigarRoller &rhs)</code>
  −
| Add the contents of the specified CigarRoller to this object.
  −
|-
  −
| <code>CigarRoller & CigarRoller::operator += (CigarOperator &rhs)</code>
  −
| Append the specified cigar operation to this object.
  −
|-
  −
| <code>void CigarRoller::Add(Operation operation, int count)</code>
  −
| Adds the specified operation with the specified count to this object.
  −
|-
  −
| <code>void CigarRoller::Add(const char *cigarString)</code>
  −
| Adds the specified cigarString to this object.
  −
|-
  −
| <code>void CigarRoller::Set(const char *cigarString)</code>
  −
| Sets this object to the specified cigarString.
  −
|-
  −
| ''' DEPRECATED''' <code>int CigarRoller::getMatchPositionOffset()</code>
  −
| DO NOT USE.
  −
|-
  −
| <code>const char * CigarRoller::getString()</code>
  −
| Returns the string representation of this CIGAR object.
  −
|-
  −
| <code>void CigarRoller::getExpandedString(std::string &s)</code>
  −
| Sets the specified string to a string of characters that represent this cigar with no digits (a CIGAR of "3M" would return "MMM")
  −
|-
  −
| <code>void CigarRoller::clear()</code>
  −
| Clear this object so that it has 0 Cigar Operations.
  −
|-
  −
| <code>CigarOperator & CigarRoller::operator [] (int i)</code>
  −
| Return the Cigar Operation at the specified index (starting at 0).
  −
|-
  −
| <code>bool CigarRoller::operator == (CigarRoller &rhs)</code>
  −
| Returns true if two Cigar Rollers are the same (the same operations of the same sizes)
  −
|-
  −
| <code>int CigarRoller::size()</code>
  −
| Return the number of cigar operations in this object.
  −
|-
  −
| <code>void CigarRoller::Dump()</code>
  −
| Write this object as a string to cout.
  −
|-
  −
| <code>int CigarRoller::getExpectedQueryBaseCount()</code>
  −
| Returns the expected read length
  −
|-
  −
| <code>int CigarRoller::getExpectedReferenceBaseCount()</code>
  −
| Return how many bases in the reference are spanned by the given CIGAR string
  −
|-
  −
| <code>int32_t CigarRoller::getRefOffset(int32_t queryIndex)</code>
  −
|Return the reference offset associated with the specified query index or INDEX_NA based on this cigar.
  −
|-
  −
| <code>int32_t CigarRoller::getQueryIndex(int32_t refOffset)</code>
  −
| Return the query index associated with the specified reference offset or INDEX_NA based on this cigar.
  −
|}
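The expansion and length queries described in the table above can be illustrated with a minimal, self-contained sketch. It does not use libStatGen: <code>expandCigar</code>, <code>queryBaseCount</code>, and <code>refBaseCount</code> are hypothetical names, and the choice of operations counted toward each length (M, I, S for the query; M, D, N for the reference) is an assumption drawn from the operation descriptions on this page.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Expand a CIGAR string so each operation character is repeated by its
// count, e.g. "3M" becomes "MMM". Hypothetical helper, not the library API.
std::string expandCigar(const std::string& cigar) {
    std::string expanded;
    std::string::size_type count = 0;
    for (char c : cigar) {
        if (std::isdigit(static_cast<unsigned char>(c))) {
            count = count * 10 + static_cast<std::string::size_type>(c - '0');
        } else {
            expanded.append(count, c);
            count = 0;
        }
    }
    return expanded;
}

// Count the expanded operations whose character appears in `ops`.
int countOps(const std::string& cigar, const std::string& ops) {
    int total = 0;
    for (char c : expandCigar(cigar)) {
        if (ops.find(c) != std::string::npos) {
            ++total;
        }
    }
    return total;
}

// Assumption: M, I and S contribute bases to the read (query).
int queryBaseCount(const std::string& cigar) { return countOps(cigar, "MIS"); }

// Assumption: M, D and N span bases of the reference.
int refBaseCount(const std::string& cigar) { return countOps(cigar, "MDN"); }
```

Under these assumptions, the CIGAR <code>4M10N4M3I2M4D3M</code> corresponds to a read of 16 bases spanning 27 reference bases.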
      +
See: http://csg.sph.umich.edu//mktrost/doxygen/current/classCigar.html for documentation.
   −
== Overloaded Streaming Operators ==
+
The static methods are helpful for determining information about the operator.
{| style="margin: 1em 1em 1em 0; background-color: #f9f9f9; border: 1px #aaa solid; border-collapse: collapse;" border="1"
  −
|-style="background: #f2f2f2; text-align: center;"
  −
! Method Name !!  Description
  −
|-
  −
| <code> std::ostream &operator << (std::ostream &stream, const CigarRoller& roller)</code>
  −
| Writes all of the cigar operations contained in this roller to the passed in stream.
  −
|-
  −
| <code> std::ostream &operator << (std::ostream &stream, const CigarRoller::CigarOperator& o)</code>
  −
| Writes the specified cigar operation to the specified stream as <count><char> (3M).
  −
|}
      +
See [[C++ Class: CigarRoller#Mapping Between Reference and Read/Query|Mapping Between Reference and Read/Query]] for a more detailed explanation, with examples, of how the mapping between the reference and the read/query works.
    +
See [[C++ Class: CigarRoller#Determining the Number of Reference and Read/Query Overlaps|Determining the Number of Reference and Read/Query Overlaps]] for a more detailed explanation, with examples, of how overlaps are determined.
   −
== Public Enums ==
+
= CigarRoller=
{| style="margin: 1em 1em 1em 0; background-color: #f9f9f9; border: 1px #aaa solid; border-collapse: collapse;" border="1"
+
This class is part of [[libStatGen: general]].
|-style="background: #f2f2f2; text-align: center;"
  −
! colspan="2"| enum SPACE_TYPE
  −
|-
  −
! Enum Value !!  Description
  −
|-
  −
| none
  −
| No operation has been specified
  −
|-
  −
| match
  −
| The query sequence and the reference sequence bases are the same for the bases associated with this cigar operation.
  −
Both <code>match</code> and <code>mismatch</code> are associated with CIGAR Operation "M"
  −
|-
  −
| mismatch
  −
|  The query sequence and the reference sequence bases are different for the bases associated with this cigar operation, but bases exist in both the query and the reference.
  −
Both <code>match</code> and <code>mismatch</code> are associated with CIGAR Operation "M"
  −
|-
  −
| insert
  −
| Insertion to the reference (the query sequence contains bases that have no corresponding base in the reference).
  −
Associated with CIGAR Operation "I"
  −
|-
  −
| del
  −
|Deletion from the reference (the reference contains bases that have no corresponding base in the query sequence).
  −
Associated with CIGAR Operation "D"
  −
|-
  −
| skip
  −
| Skipped region from the reference  (the reference contains bases that have no corresponding base in the query sequence).
  −
Associated with CIGAR Operation "N"
  −
|-
  −
| softClip
  −
| Soft clip on the read (clipped sequence present in the query sequence)
  −
Associated with CIGAR Operation "S"
  −
|-
  −
| hardClip
  −
| Hard clip on the read (clipped sequence not present in the query sequence)
  −
Associated with CIGAR Operation "H"
  −
|-
  −
|pad
  −
| Padding (silent deletion from the padded reference sequence)
  −
Associated with CIGAR Operation "P"
  −
|}
  −
 
  −
 
  −
== Public Constants ==
  −
{| style="margin: 1em 1em 1em 0; background-color: #f9f9f9; border: 1px #aaa solid; border-collapse: collapse;" border="1"
  −
|-style="background: #f2f2f2; text-align: center;"
  −
! Constant !! Value !! Description
  −
|-
  −
| INDEX_NA
  −
| -1
  −
| Value associated with an index that is not applicable/does not exist.
  −
Used for converting between query and reference indexes/offsets when an associated index/offset does not exist.
  −
|}
  −
 
  −
 
  −
== Nested Class ==
  −
 
  −
=== CigarOperation ===
     −
==== Public Methods ====
+
The purpose of this class is to provide accessors for setting, updating, and modifying the CIGAR object. It is a child class of Cigar.
{| style="margin: 1em 1em 1em 0; background-color: #f9f9f9; border: 1px #aaa solid; border-collapse: collapse;" border="1"
  −
|-style="background: #f2f2f2; text-align: center;"
  −
! Method Name !!  Description
  −
|-
  −
| <code>CigarOperator::CigarOperator(Operation operation, uint32_t count)</code>
  −
| Set the cigar operator with the specified operation and count length.
  −
|-
  −
| <code>char CigarOperator::getChar()</code>
  −
| Returns the character code (M, I, D, N, S, H, or P) associated with this operation.
  −
|-
  −
| <code>bool CigarOperator::operator == (CigarOperator &rhs)</code>
  −
| Returns true if the passed in operator is the same as this operator, false if not.
  −
|-
  −
| <code>bool CigarOperator::operator != (CigarOperator &rhs)</code>
  −
| Returns true if the passed in operator is not the same as this operator, false if they are the same.
  −
|}
      +
See: http://csg.sph.umich.edu//mktrost/doxygen/current/classCigarRoller.html for documentation.
   −
== Mapping Between Reference and Read/Query ==
+
= Mapping Between Reference and Read/Query =
<code>int32_t CigarRoller::getRefOffset(int32_t queryIndex)</code> and <code>int32_t CigarRoller::getQueryIndex(int32_t refOffset)</code> are used to map between the reference and the read.
+
<code>int32_t Cigar::getRefOffset(int32_t queryIndex)</code> and <code>int32_t Cigar::getQueryIndex(int32_t refOffset)</code> are used to map between the reference and the read.
    
The queryIndex is the index in the read - from 0 to (read length - 1).
 
Line 168: Line 30:     
For Example:
 
  Reference: ACTGAACCTTGGAAACTG
+
  Reference: ACTGAACCTTGGAAACTGCCGGGGACT
  Read: ACTGACTG
+
  Read: ACTGACTGAAACCATT
  CIGAR: 4M10N4M
+
  CIGAR: 4M10N4M3I2M4D3M
 
  POS: 5
 
    
This means it aligns:
 
  Reference: ACTGAACCTTGGAAACTG
+
  Reference: ACTGAACCTTGGAAACTG   CCGGGGACT
  Read:      ACTG          ACTG
+
  Read:      ACTG          ACTGAAACC    ATT
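The gapped layout above can be reproduced mechanically from the three inputs. The following is a minimal sketch, not part of libStatGen: <code>renderAlignment</code> is a hypothetical helper, and it assumes the reference string starts at the read's first aligned position (POS). Soft-clipped read bases are skipped rather than displayed, which is a simplification of this sketch.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <utility>

// Build the two gapped display lines (reference, read) for an alignment.
// Hypothetical helper; `reference` must begin at the read's aligned start.
std::pair<std::string, std::string> renderAlignment(const std::string& reference,
                                                    const std::string& read,
                                                    const std::string& cigar) {
    std::string refLine;
    std::string readLine;
    std::string::size_type refPos = 0;
    std::string::size_type readPos = 0;
    std::string::size_type count = 0;
    for (char c : cigar) {
        if (std::isdigit(static_cast<unsigned char>(c))) {
            count = count * 10 + static_cast<std::string::size_type>(c - '0');
            continue;
        }
        for (std::string::size_type i = 0; i < count; ++i) {
            switch (c) {
            case 'M':  // base present in both reference and read
                refLine += reference.at(refPos++);
                readLine += read.at(readPos++);
                break;
            case 'I':  // base only in the read: gap in the reference line
                refLine += ' ';
                readLine += read.at(readPos++);
                break;
            case 'D':
            case 'N':  // base only in the reference: gap in the read line
                refLine += reference.at(refPos++);
                readLine += ' ';
                break;
            case 'S':  // soft clip: consume the read base without showing it
                ++readPos;
                break;
            default:   // H and P touch neither sequence
                break;
            }
        }
        count = 0;
    }
    return {refLine, readLine};
}
```

Calling it with the reference, read, and CIGAR of this example yields the same two lines shown in the aligned view, with three spaces in the reference line for the insertion and gaps of ten and four spaces in the read line for the skip and the deletion.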
    
Adding the position:
 
  RefPos:    5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22
+
  RefPos:    5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22         23 24 25 26 27 28 29 30 31
  Reference:  A  C  T  G  A  A  C  C  T  T  G  G  A  A  A  C  T  G
+
  Reference:  A  C  T  G  A  A  C  C  T  T  G  G  A  A  A  C  T  G           C  C  G  G  G  G  A  C  T
  Read:      A  C  T  G                                A  C  T  G
+
  Read:      A  C  T  G                                A  C  T  G A  A  A  C  C              A  T  T
    
Adding the offsets:
 
  RefPos:    5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22
+
  RefPos:    5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22         23 24 25 26 27 28 29 30 31
  refOffset:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
+
  refOffset:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17         18 19 20 21 22 23 24 25 26
  Reference:  A  C  T  G  A  A  C  C  T  T  G  G  A  A  A  C  T  G
+
  Reference:  A  C  T  G  A  A  C  C  T  T  G  G  A  A  A  C  T  G           C  C  G  G  G  G  A  C  T
  Read:      A  C  T  G                                A  C  T  G
+
  Read:      A  C  T  G                                A  C  T  G A  A  A  C  C              A  T  T
  queryIndex: 0  1  2  3                                4  5  6  7
+
  queryIndex: 0  1  2  3                                4  5  6  7 8  9 10 11 12            13 14 15
    
The results of a call to getRefOffset for each value passed in (where NA stands for INDEX_NA):
 
  queryIndex: 0  1  2  3  4  5  6  7  8(and any value over 8)
+
  queryIndex: 0  1  2  3  4  5  6  7  8 9 10 11 12 13 14 15 16(and any value over 16)
  Return:    0  1  2  3 14 15 16 17 NA
+
  Return:    0  1  2  3 14 15 16 17 NA NA NA 18 19 24 25 26 NA
    
The results of a call to getQueryIndex for each value passed in (where NA stands for INDEX_NA):
 
  refOffset:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18(and any value over 18)
+
  refOffset:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27(and any value over 27)
  Return:    0  1  2  3 NA NA NA NA NA NA NA NA NA NA  4  5  6  7 NA
+
Return:    0  1  2  3 NA NA NA NA NA NA NA NA NA NA  4  5  6  7 11 12 NA NA NA NA 13 14 15 NA
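The two result tables above can be derived by walking the CIGAR once and recording, for every query index and reference offset, its partner on the other side. The sketch below is a stand-alone illustration, not the libStatGen implementation: <code>Mapping</code>, <code>buildMapping</code>, <code>refOffsetFor</code>, <code>queryIndexFor</code>, and <code>kIndexNA</code> (standing in for INDEX_NA) are hypothetical names, and treating soft-clipped bases as having no reference offset is an assumption.

```cpp
#include <cassert>
#include <cctype>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for the INDEX_NA constant (-1).
const int32_t kIndexNA = -1;

struct Mapping {
    std::vector<int32_t> queryToRef;  // indexed by queryIndex
    std::vector<int32_t> refToQuery;  // indexed by refOffset
};

// Walk the CIGAR and fill both lookup tables in a single pass.
Mapping buildMapping(const std::string& cigar) {
    Mapping m;
    int32_t count = 0;
    for (char c : cigar) {
        if (std::isdigit(static_cast<unsigned char>(c))) {
            count = count * 10 + (c - '0');
            continue;
        }
        for (int32_t i = 0; i < count; ++i) {
            if (c == 'M') {
                // The next query index pairs with the next reference offset.
                int32_t refOffset = static_cast<int32_t>(m.refToQuery.size());
                int32_t queryIndex = static_cast<int32_t>(m.queryToRef.size());
                m.queryToRef.push_back(refOffset);
                m.refToQuery.push_back(queryIndex);
            } else if (c == 'I' || c == 'S') {
                m.queryToRef.push_back(kIndexNA);  // read base without a reference base
            } else if (c == 'D' || c == 'N') {
                m.refToQuery.push_back(kIndexNA);  // reference base without a read base
            }
            // H and P consume neither sequence.
        }
        count = 0;
    }
    return m;
}

// Look up a value, returning kIndexNA for anything outside the table.
int32_t lookup(const std::vector<int32_t>& table, int32_t index) {
    if (index < 0 || index >= static_cast<int32_t>(table.size())) {
        return kIndexNA;
    }
    return table[static_cast<std::vector<int32_t>::size_type>(index)];
}

int32_t refOffsetFor(const Mapping& m, int32_t queryIndex) {
    return lookup(m.queryToRef, queryIndex);
}

int32_t queryIndexFor(const Mapping& m, int32_t refOffset) {
    return lookup(m.refToQuery, refOffset);
}
```

For <code>4M10N4M3I2M4D3M</code>, the inserted bases at query indexes 8 to 10 have no reference offset, and the skipped and deleted reference offsets 4 to 13 and 20 to 23 have no query index, matching the NA entries in the tables.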
 +
 
 +
The results of a call to getRefPosition passing in start position 5 (where NA stands for INDEX_NA):
 +
queryIndex: 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16(and any value over 16)
 +
Return:    5  6  7  8 19 20 21 22 NA NA NA 23 24 29 30 31 NA
 +
 
 +
The results of a call to getQueryIndex using refPosition and start position 5 (where NA stands for INDEX_NA):
 +
refPosition:5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32(and any value over 32)
 +
  Return:    0  1  2  3 NA NA NA NA NA NA NA NA NA NA  4  5  6  7 11 12 NA NA NA NA 13 14 15 NA
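The position-based results differ from the offset-based ones only by the read's start position. A minimal sketch of that relationship follows; it walks the CIGAR operation by operation instead of building lookup tables. It is not the libStatGen code: <code>Op</code>, <code>parseCigar</code>, <code>refPositionFor</code>, <code>queryIndexForPosition</code>, and <code>kIndexNA</code> are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <cctype>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for the INDEX_NA constant (-1).
const int32_t kIndexNA = -1;

struct Op {
    char type;
    int32_t length;
};

// Split a CIGAR string such as "4M10N" into (type, length) pairs.
std::vector<Op> parseCigar(const std::string& cigar) {
    std::vector<Op> ops;
    int32_t count = 0;
    for (char c : cigar) {
        if (std::isdigit(static_cast<unsigned char>(c))) {
            count = count * 10 + (c - '0');
        } else {
            ops.push_back(Op{c, count});
            count = 0;
        }
    }
    return ops;
}

bool usesQuery(char t) { return t == 'M' || t == 'I' || t == 'S'; }
bool usesRef(char t) { return t == 'M' || t == 'D' || t == 'N'; }

// Reference position aligned to queryIndex for a read starting at startPos.
int32_t refPositionFor(const std::string& cigar, int32_t queryIndex, int32_t startPos) {
    if (queryIndex < 0) return kIndexNA;
    int32_t query = 0;       // first query index covered by the current op
    int32_t ref = startPos;  // first reference position covered by the current op
    for (const Op& op : parseCigar(cigar)) {
        if (usesQuery(op.type) && queryIndex < query + op.length) {
            return usesRef(op.type) ? ref + (queryIndex - query) : kIndexNA;
        }
        if (usesQuery(op.type)) query += op.length;
        if (usesRef(op.type)) ref += op.length;
    }
    return kIndexNA;  // past the end of the read
}

// Query index aligned to refPosition for a read starting at startPos.
int32_t queryIndexForPosition(const std::string& cigar, int32_t refPosition, int32_t startPos) {
    if (refPosition < startPos) return kIndexNA;
    int32_t query = 0;
    int32_t ref = startPos;
    for (const Op& op : parseCigar(cigar)) {
        if (usesRef(op.type) && refPosition < ref + op.length) {
            return usesQuery(op.type) ? query + (refPosition - ref) : kIndexNA;
        }
        if (usesQuery(op.type)) query += op.length;
        if (usesRef(op.type)) ref += op.length;
    }
    return kIndexNA;  // past the end of the alignment
}
```

With start position 5, query index 4 maps to reference position 19 and reference position 29 maps back to query index 13, in line with the tables above.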
 +
 
 +
 
 +
== Determining the Number of Reference and Read/Query Overlaps ==
 +
 
 +
A useful concept is determining the number of bases that overlap between the reference and the read in a given region.
 +
 
 +
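To do this, use <code>getNumOverlaps</code>, passing in the reference start and end positions for the region as well as the reference position where the read begins. The start position is inclusive and the end position is exclusive. In the examples below, passing -1 for start or end leaves that side of the region unbounded.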
 +
 
 +
Using the above example:
 +
RefPos:    5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22          23 24 25 26 27 28 29 30 31
 +
refOffset:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17          18 19 20 21 22 23 24 25 26
 +
Reference:  A  C  T  G  A  A  C  C  T  T  G  G  A  A  A  C  T  G          C  C  G  G  G  G  A  C  T
 +
Read:      A  C  T  G                                A  C  T  G  A  A  A  C  C              A  T  T
 +
queryIndex: 0  1  2  3                                4  5  6  7  8  9 10 11 12            13 14 15
 +
 
 +
getNumOverlaps(5,32,5) = 13 - [5, 32) covers the whole read - 13 cigar positions are "M" (found in both the reference and the read)
 +
getNumOverlaps(5,31,5) = 12 - skips the last overlapping position
 +
getNumOverlaps(0,100,5) = 13 - covers the whole read.
 +
getNumOverlaps(-1, -1,5) = 13 - covers the whole read.
 +
getNumOverlaps(-1,10,5) = 4
 +
getNumOverlaps(10,-1,5) = 9
 +
getNumOverlaps(9,19,5) = 0 - all skipped
 +
getNumOverlaps(9,20,5) = 1
 +
getNumOverlaps(9,6,5) = 0 - start is after end
 +
getNumOverlaps(0,5,5) = 0 - outside of read
 +
getNumOverlaps(32,40,5) = 0 - outside of read
 +
getNumOverlaps(0,5,1) = 4 - with a different start position, this range overlaps the read with 4 bases
 +
getNumOverlaps(32,40,32) = 4 - with a different start position, this range overlaps the read with 4 bases
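The counts listed above can be reproduced by intersecting each "M" block of the CIGAR with the requested region. The following is a minimal sketch under stated assumptions, not the library's implementation: <code>numOverlaps</code> is a hypothetical free function that takes the CIGAR as a string, and treating -1 as an unbounded start or end is inferred from the examples on this page.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstdint>
#include <string>

// Count reference positions in [start, end) that are aligned to a read base
// (CIGAR "M"). readStart is the reference position where the read begins.
// Assumption: a start or end of -1 means that side is unbounded.
int32_t numOverlaps(const std::string& cigar, int32_t start, int32_t end,
                    int32_t readStart) {
    int32_t overlaps = 0;
    int32_t refPos = readStart;  // reference position at the start of the current op
    int32_t count = 0;
    for (char c : cigar) {
        if (std::isdigit(static_cast<unsigned char>(c))) {
            count = count * 10 + (c - '0');
            continue;
        }
        if (c == 'M') {
            // Intersect this block [refPos, refPos + count) with [start, end).
            int32_t lo = (start == -1) ? refPos : std::max(refPos, start);
            int32_t hi = (end == -1) ? refPos + count : std::min(refPos + count, end);
            if (hi > lo) {
                overlaps += hi - lo;
            }
        }
        if (c == 'M' || c == 'D' || c == 'N') {
            refPos += count;  // only these operations advance along the reference
        }
        count = 0;
    }
    return overlaps;
}
```

Because insertions do not advance the reference position and skips and deletions contribute no "M" positions, a region that covers only skipped reference bases, such as [9, 19) here, yields 0.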