Changes

From Genome Analysis Wiki
1,616 bytes added ,  12:00, 2 February 2017
Line 1: Line 1:
+
[[Category:C++]]
 +
[[Category:libStatGen]]
 +
[[Category:libStatGen general]]
 +
 +
= Cigar =
 +
This class is part of [[libStatGen: general]].
 +
 +
The purpose of this class is to provide utilities for processing CIGARs. Its operations are read-only: they do not modify the object, apart from internal state that is filled in by lazy evaluation.
 +
 +
See: http://csg.sph.umich.edu//mktrost/doxygen/current/classCigar.html for documentation.
 +
 +
The static methods are helpful for determining information about the operator.
 +
 +
See [[C++ Class: CigarRoller#Mapping Between Reference and Read/Query|Mapping Between Reference and Read/Query]] for a more detailed explanation, with examples, of how the mapping between the reference and the read/query works.
 +
 +
See [[C++ Class: CigarRoller#Determining the Number of Reference and Read/Query Overlaps|Determining the Number of Reference and Read/Query Overlaps]] for a more detailed explanation, with examples, of how overlaps are determined.
 +
 
= CigarRoller=
 
−
This class is part of [[C++ Library: libcsg|libcsg]].
+
This class is part of [[libStatGen: general]].
 +
 
 +
The purpose of this class is to provide accessors for setting, updating, and modifying the CIGAR object. It is a child class of Cigar.
 +
 
 +
See: http://csg.sph.umich.edu//mktrost/doxygen/current/classCigarRoller.html for documentation.
 +
 
 +
= Mapping Between Reference and Read/Query =
 +
<code>int32_t Cigar::getRefOffset(int32_t queryIndex)</code> and <code>int32_t Cigar::getQueryIndex(int32_t refOffset)</code> are used to map between the reference and the read.
 +
 
 +
The queryIndex is the 0-based index into the read, from 0 to (read length - 1).
+
The refOffset is the 0-based offset into the reference, counted from the position where the read starts.
 +
 
 +
For example:
 +
Reference: ACTGAACCTTGGAAACTGCCGGGGACT
 +
Read: ACTGACTGAAACCATT
 +
CIGAR: 4M10N4M3I2M4D3M
 +
POS: 5
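The read length and the reference span of this example follow directly from the CIGAR string. The block below is a minimal standalone sketch, not the libStatGen implementation: <code>cigarLengths</code>, <code>consumesQuery</code>, and <code>consumesReference</code> are hypothetical helper names, and the operation classes follow the usual SAM convention (M, I, S, =, X consume read bases; M, D, N, =, X consume reference bases).

```cpp
#include <cctype>
#include <cstdint>
#include <string>

// Lengths implied by a CIGAR string.
struct CigarLengths {
    int32_t queryLength;      // bases present in the read
    int32_t referenceLength;  // reference bases spanned by the alignment
};

// Operations that consume read bases (SAM convention, assumed here).
bool consumesQuery(char op) {
    return op == 'M' || op == 'I' || op == 'S' || op == '=' || op == 'X';
}

// Operations that consume reference bases (SAM convention, assumed here).
bool consumesReference(char op) {
    return op == 'M' || op == 'D' || op == 'N' || op == '=' || op == 'X';
}

// Hypothetical helper: sum the run lengths on each side of the alignment.
CigarLengths cigarLengths(const std::string& cigar) {
    CigarLengths lengths{0, 0};
    int32_t count = 0;
    for (char c : cigar) {
        if (std::isdigit(static_cast<unsigned char>(c))) {
            count = count * 10 + (c - '0');  // accumulate the run length
            continue;
        }
        if (consumesQuery(c)) lengths.queryLength += count;
        if (consumesReference(c)) lengths.referenceLength += count;
        count = 0;  // start the next element
    }
    return lengths;
}
```

For <code>4M10N4M3I2M4D3M</code> this gives a read length of 16 and a reference span of 27, so with POS 5 the read covers reference positions 5 through 31.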
   −
The purpose of this class is to provide utilities for creating and processing CIGAR strings.
+
This means it aligns:
 +
Reference: ACTGAACCTTGGAAACTG   CCGGGGACT
 +
Read:      ACTG          ACTGAAACC    ATT
−
== Public Methods ==
−
{| style="margin: 1em 1em 1em 0; background-color: #f9f9f9; border: 1px #aaa solid; border-collapse: collapse;" border="1"
−
|-style="background: #f2f2f2; text-align: center;"
−
! Method Name !! Description
−
|-
−
| void BaseAsciiMap::setBaseMapType(SPACE_TYPE spaceType)
−
| Set the base type based on the passed in option.
−
|}
+
Adding the position:
+
RefPos:      5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22          23 24 25 26 27 28 29 30 31
+
Reference:   A  C  T  G  A  A  C  C  T  T  G  G  A  A  A  C  T  G           C  C  G  G  G  G  A  C  T
+
Read:        A  C  T  G                                A  C  T  G  A  A  A  C  C              A  T  T
      +
Adding the offsets:
 +
RefPos:      5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22          23 24 25 26 27 28 29 30 31
+
refOffset:   0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17          18 19 20 21 22 23 24 25 26
+
Reference:   A  C  T  G  A  A  C  C  T  T  G  G  A  A  A  C  T  G           C  C  G  G  G  G  A  C  T
+
Read:        A  C  T  G                                A  C  T  G  A  A  A  C  C              A  T  T
+
queryIndex:  0  1  2  3                                4  5  6  7  8  9 10 11 12             13 14 15
−
== Overloaded Streaming Operators ==
−
{| style="margin: 1em 1em 1em 0; background-color: #f9f9f9; border: 1px #aaa solid; border-collapse: collapse;" border="1"
−
|-style="background: #f2f2f2; text-align: center;"
−
! Method Name !! Description
−
|-
−
| <code> std::ostream &operator << (std::ostream &stream, const CigarRoller& roller)</code>
−
| Writes all of the cigar operations contained in this roller to the passed in stream.
−
|}
+
The results of a call to getRefOffset for each value passed in (where NA stands for INDEX_NA):
+
queryIndex:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 (and any value over 16)
+
Return:      0  1  2  3 14 15 16 17 NA NA NA 18 19 24 25 26 NA
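The getRefOffset results above can be reproduced by walking the CIGAR string once. This is a minimal standalone sketch under stated assumptions, not the library's code: <code>refOffsetForQueryIndex</code> and the two <code>consumes...</code> helpers are hypothetical names, the operation classes follow the SAM convention, and INDEX_NA is taken to be -1.

```cpp
#include <cctype>
#include <cstdint>
#include <string>

// Assumed value for "no corresponding index/offset".
constexpr int32_t INDEX_NA = -1;

// Operations that consume read bases (SAM convention, assumed here).
bool consumesQuery(char op) {
    return op == 'M' || op == 'I' || op == 'S' || op == '=' || op == 'X';
}

// Operations that consume reference bases (SAM convention, assumed here).
bool consumesReference(char op) {
    return op == 'M' || op == 'D' || op == 'N' || op == '=' || op == 'X';
}

// Hypothetical helper: map a 0-based read index to a 0-based reference
// offset, or INDEX_NA when the base has no reference counterpart.
int32_t refOffsetForQueryIndex(const std::string& cigar, int32_t queryIndex) {
    if (queryIndex < 0) return INDEX_NA;
    int32_t query = 0;  // read index at the start of the current element
    int32_t ref = 0;    // reference offset at the start of the current element
    int32_t count = 0;  // run length being parsed
    for (char c : cigar) {
        if (std::isdigit(static_cast<unsigned char>(c))) {
            count = count * 10 + (c - '0');
            continue;
        }
        const bool inQuery = consumesQuery(c);
        const bool inRef = consumesReference(c);
        if (inQuery && queryIndex < query + count) {
            // The requested base lies inside this element.
            return inRef ? ref + (queryIndex - query) : INDEX_NA;
        }
        if (inQuery) query += count;
        if (inRef) ref += count;
        count = 0;
    }
    return INDEX_NA;  // index is past the end of the read
}
```

With <code>4M10N4M3I2M4D3M</code>, read index 4 maps to offset 14 (the skip is jumped over), indexes 8 through 10 fall in the insertion and return INDEX_NA, and index 16 is past the end of the read.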
      +
The results of a call to getQueryIndex for each value passed in (where NA stands for INDEX_NA):
 +
refOffset:   0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 (and any value over 27)
+
Return:      0  1  2  3 NA NA NA NA NA NA NA NA NA NA  4  5  6  7 11 12 NA NA NA NA 13 14 15 NA
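The reverse lookup can be sketched by expanding the CIGAR string into a table with one entry per reference offset. This is an illustrative standalone sketch, not the library's implementation: <code>buildRefToQuery</code> and <code>queryIndexForRefOffset</code> are hypothetical names, the operation classes follow the SAM convention, and INDEX_NA is taken to be -1.

```cpp
#include <cctype>
#include <cstdint>
#include <string>
#include <vector>

// Assumed value for "no corresponding index/offset".
constexpr int32_t INDEX_NA = -1;

// Hypothetical helper: one entry per reference offset, holding the read
// index aligned to it, or INDEX_NA for skipped or deleted reference bases.
std::vector<int32_t> buildRefToQuery(const std::string& cigar) {
    std::vector<int32_t> refToQuery;
    int32_t query = 0;  // next read index to hand out
    int32_t count = 0;  // run length being parsed
    for (char c : cigar) {
        if (std::isdigit(static_cast<unsigned char>(c))) {
            count = count * 10 + (c - '0');
            continue;
        }
        // SAM convention (assumed): which side each operation consumes.
        const bool inQuery = c == 'M' || c == 'I' || c == 'S' || c == '=' || c == 'X';
        const bool inRef = c == 'M' || c == 'D' || c == 'N' || c == '=' || c == 'X';
        for (int32_t i = 0; i < count; ++i) {
            if (inRef) refToQuery.push_back(inQuery ? query : INDEX_NA);
            if (inQuery) ++query;
        }
        count = 0;
    }
    return refToQuery;
}

// Hypothetical helper: bounds-checked lookup into the table.
int32_t queryIndexForRefOffset(const std::vector<int32_t>& refToQuery,
                               int32_t refOffset) {
    if (refOffset < 0 || refOffset >= static_cast<int32_t>(refToQuery.size())) {
        return INDEX_NA;
    }
    return refToQuery[refOffset];
}
```

For <code>4M10N4M3I2M4D3M</code> the table has 27 entries and matches the Return row above; any offset of 27 or more falls outside the table and yields INDEX_NA.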
    +
The results of a call to getRefPosition passing in start position 5 (where NA stands for INDEX_NA):
 +
queryIndex:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 (and any value over 16)
+
Return:      5  6  7  8 19 20 21 22 NA NA NA 23 24 29 30 31 NA
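Comparing these values with the getRefOffset results, each position is the offset plus the start position, with INDEX_NA passed through unchanged. The sketch below shows that conversion in both directions. The helper names are hypothetical, INDEX_NA is taken to be -1, and this is an assumption about the relationship rather than a copy of the library code.

```cpp
#include <cstdint>

// Assumed value for "no corresponding index/offset/position".
constexpr int32_t INDEX_NA = -1;

// Hypothetical helper: reference offset -> reference position.
// INDEX_NA stays INDEX_NA rather than being shifted by the start position.
int32_t offsetToPosition(int32_t refOffset, int32_t readStartPos) {
    return (refOffset == INDEX_NA) ? INDEX_NA : readStartPos + refOffset;
}

// Hypothetical helper: reference position -> offset from the read start.
// Positions before the read start have no offset.
int32_t positionToOffset(int32_t refPosition, int32_t readStartPos) {
    return (refPosition < readStartPos) ? INDEX_NA : refPosition - readStartPos;
}
```

With start position 5, offset 14 becomes position 19, and position 19 converts back to offset 14 before any offset-based lookup.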
−
== Public Enums ==
−
{| style="margin: 1em 1em 1em 0; background-color: #f9f9f9; border: 1px #aaa solid; border-collapse: collapse;" border="1"
−
|-style="background: #f2f2f2; text-align: center;"
−
! colspan="2"| enum SPACE_TYPE
−
|-
−
! Enum Value !! Description
−
|-
−
| none
−
| No operation has been specified
−
|-
−
| match
−
| The query sequence and the reference sequence bases are the same for the bases associated with this cigar operation.
−
Both <code>match</code> and <code>mismatch</code> are associated with CIGAR Operation "M"
−
|-
−
| mismatch
−
| The query sequence and the reference sequence bases are different for the bases associated with this cigar operation, but bases exist in both the query and the reference.
−
Both <code>match</code> and <code>mismatch</code> are associated with CIGAR Operation "M"
−
|-
−
| insert
−
| Insertion to the reference (the query sequence contains bases that have no corresponding base in the reference).
−
Associated with CIGAR Operation "I"
−
|-
−
| del
−
| Deletion from the reference (the reference contains bases that have no corresponding base in the query sequence).
−
Associated with CIGAR Operation "D"
−
|-
−
| skip
−
| Skipped region from the reference (the reference contains bases that have no corresponding base in the query sequence).
−
Associated with CIGAR Operation "N"
−
|-
−
| softClip
−
| Soft clip on the read (clipped sequence present in the query sequence)
−
Associated with CIGAR Operation "S"
−
|-
−
| hardClip
−
| Hard clip on the read (clipped sequence not present in the query sequence)
−
Associated with CIGAR Operation "H"
−
|-
−
| pad
−
| Padding (silent deletion from the padded reference sequence)
−
Associated with CIGAR Operation "P"
−
|}
+
The results of a call to getQueryIndex using refPosition and start position 5 (where NA stands for INDEX_NA):
+
refPosition: 5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 (and any value over 32)
+
Return:      0  1  2  3 NA NA NA NA NA NA NA NA NA NA  4  5  6  7 11 12 NA NA NA NA 13 14 15 NA
−
== Public Constants ==
−
{| style="margin: 1em 1em 1em 0; background-color: #f9f9f9; border: 1px #aaa solid; border-collapse: collapse;" border="1"
−
|-style="background: #f2f2f2; text-align: center;"
−
! Constant !! Value !! Description
−
|-
−
| INDEX_NA
−
| -1
−
| Value associated with an index that is not applicable/does not exist.
−
Used for converting between query and reference indexes/offsets when an associated index/offset does not exist.
−
|}
+
== Determining the Number of Reference and Read/Query Overlaps ==
      +
It is often useful to determine the number of bases that overlap between the reference and the read in a given region.
−
== Nested Class ==
−
=== CigarOperation ===
−
==== Public Methods ====
−
{| style="margin: 1em 1em 1em 0; background-color: #f9f9f9; border: 1px #aaa solid; border-collapse: collapse;" border="1"
−
|-style="background: #f2f2f2; text-align: center;"
−
! Method Name !! Description
−
|-
−
| <code>CigarOperator::CigarOperator(Operation operation, uint32_t count)</code>
−
| Set the cigar operator with the specified operation and count length.
−
|-
−
| <code>char CigarOperator::getChar()</code>
−
| Returns the character code (M, I, D, N, S, H, or P) associated with this operation.
−
|-
−
| <code>bool CigarOperator::operator == (CigarOperator &rhs)</code>
−
| Returns true if the passed in operator is the same as this operator, false if not.
−
|-
−
| <code>bool CigarOperator::operator != (CigarOperator &rhs)</code>
−
| Returns true if the passed in operator is not the same as this operator, false if they are the same.
−
|}
+
To do this, use <code>getNumOverlaps</code>, passing in the reference start and end positions for the region as well as the reference position where the read begins. The start is inclusive, while the end is exclusive. In the examples below, passing -1 for start or end leaves that side of the region unbounded.
+
Using the above example:
+
RefPos:      5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22          23 24 25 26 27 28 29 30 31
+
refOffset:   0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17          18 19 20 21 22 23 24 25 26
+
Reference:   A  C  T  G  A  A  C  C  T  T  G  G  A  A  A  C  T  G           C  C  G  G  G  G  A  C  T
+
Read:        A  C  T  G                                A  C  T  G  A  A  A  C  C              A  T  T
+
queryIndex:  0  1  2  3                                4  5  6  7  8  9 10 11 12             13 14 15
+
getNumOverlaps(5,32,5) = 13 - [5, 32) covers the whole read - 13 cigar positions are "M" (found in both the reference and the read)
+
getNumOverlaps(5,31,5) = 12 - skips the last overlapping position
+
getNumOverlaps(0,100,5) = 13 - covers the whole read.
+
getNumOverlaps(-1,-1,5) = 13 - covers the whole read.
+
getNumOverlaps(-1,10,5) = 4
+
getNumOverlaps(10,-1,5) = 9
+
getNumOverlaps(9,19,5) = 0 - all skipped
+
getNumOverlaps(9,20,5) = 1
+
getNumOverlaps(9,6,5) = 0 - end is before start
+
getNumOverlaps(0,5,5) = 0 - outside of read
+
getNumOverlaps(32,40,5) = 0 - outside of read
+
getNumOverlaps(0,5,1) = 4 - with a different start position, this range overlaps the read with 4 bases
+
getNumOverlaps(32,40,32) = 4 - with a different start position, this range overlaps the read with 4 bases
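The getNumOverlaps examples above can be checked with a short standalone sketch that counts the aligned bases (operations that consume both the read and the reference) whose reference position falls in [start, end). This is not the library's implementation: <code>countOverlaps</code> is a hypothetical name, the treatment of -1 as "unbounded" is inferred from the examples, and the operation classes follow the SAM convention.

```cpp
#include <cctype>
#include <cstdint>
#include <string>

// Hypothetical helper: number of bases present in both the read and the
// reference whose reference position lies in [start, end).
// A start or end of -1 is treated as unbounded (assumption from the examples).
int32_t countOverlaps(const std::string& cigar, int32_t start, int32_t end,
                      int32_t readStartPos) {
    int32_t overlaps = 0;
    int32_t refPos = readStartPos;  // reference position of the next ref base
    int32_t count = 0;              // run length being parsed
    for (char c : cigar) {
        if (std::isdigit(static_cast<unsigned char>(c))) {
            count = count * 10 + (c - '0');
            continue;
        }
        // SAM convention (assumed): which side each operation consumes.
        const bool inQuery = c == 'M' || c == 'I' || c == 'S' || c == '=' || c == 'X';
        const bool inRef = c == 'M' || c == 'D' || c == 'N' || c == '=' || c == 'X';
        if (inRef) {
            for (int32_t i = 0; i < count; ++i, ++refPos) {
                const bool afterStart = (start == -1) || (refPos >= start);
                const bool beforeEnd = (end == -1) || (refPos < end);
                // Only bases found in both read and reference count.
                if (inQuery && afterStart && beforeEnd) ++overlaps;
            }
        }
        count = 0;
    }
    return overlaps;
}
```

Calling <code>countOverlaps("4M10N4M3I2M4D3M", 9, 20, 5)</code> returns 1, because positions 9 through 18 are skipped and only position 19 is aligned.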
 