Line 1: |
| + | [[Category:C++]] |
| + | [[Category:libStatGen]] |
| + | [[Category:libStatGen general]] |
| + | |
| + | = Cigar= |
| + | This class is part of [[libStatGen: general]]. |
| + | |
| + | The purpose of this class is to provide utilities for processing CIGARs. Its methods are read-only: they do not modify the object except to support lazy evaluation. |
| + | |
| + | See: http://csg.sph.umich.edu//mktrost/doxygen/current/classCigar.html for documentation. |
| + | |
| + | Its static methods are helpful for determining information about a CIGAR operator. |
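The sketch below illustrates the kind of per-operation information such static methods capture. It classifies a SAM CIGAR operation character by whether it consumes reference bases, read/query bases, or both, following the SAM specification. The function names (`consumesReference`, `consumesQuery`, `isInBoth`) are hypothetical and are not the Cigar class's API. Consult the doxygen page above for the actual static methods.

```cpp
#include <cassert>

// Hypothetical helpers (not the libStatGen API) that classify one SAM CIGAR
// operation character according to the SAM specification.

// True if the operation consumes reference bases: M, D, N, =, X.
bool consumesReference(char op)
{
    switch (op)
    {
        case 'M': case 'D': case 'N': case '=': case 'X':
            return true;
        default:
            return false;
    }
}

// True if the operation consumes read/query bases: M, I, S, =, X.
bool consumesQuery(char op)
{
    switch (op)
    {
        case 'M': case 'I': case 'S': case '=': case 'X':
            return true;
        default:
            return false;
    }
}

// True if bases of this operation are found in both the reference and the
// read (match/mismatch operations).
bool isInBoth(char op)
{
    return consumesReference(op) && consumesQuery(op);
}
```

For example, `consumesReference('N')` is true while `consumesQuery('N')` is false, which is why skipped regions advance the reference offset but not the query index.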
| + | |
| + | See [[C++ Class: CigarRoller#Mapping Between Reference and Read/Query|Mapping Between Reference and Read/Query]] for a more detailed explanation, with examples, of how the mapping between the reference and the read/query works. |
| + | |
| + | See [[C++ Class: CigarRoller#Determining the Number of Reference and Read/Query Overlaps|Determining the Number of Reference and Read/Query Overlaps]] for a more detailed explanation, with examples, of how overlaps are determined. |
| + | |
| = CigarRoller= | | = CigarRoller= |
− | This class is part of [[C++ Library: libcsg|libcsg]]. | + | This class is part of [[libStatGen: general]]. |
| + | |
| + | The purpose of this class is to provide methods for setting, updating, and modifying the CIGAR object. It is a subclass of Cigar. |
| + | |
| + | See: http://csg.sph.umich.edu//mktrost/doxygen/current/classCigarRoller.html for documentation. |
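A CIGAR can be viewed as a list of (operation, count) pairs. The minimal sketch below is hypothetical and independent of the CigarRoller API. It parses the string form into such pairs, appends operations, and serializes the result back to a string. Two parts are design choices of this sketch rather than documented CigarRoller behavior: `appendOp` merges a new operation into the previous one when both have the same type, and malformed strings are rejected.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <utility>
#include <vector>

// Hypothetical representation: a CIGAR as a list of (operation, count) pairs.
using CigarOps = std::vector<std::pair<char, int>>;

// Parse a CIGAR string such as "4M10N4M3I2M4D3M" into (op, count) pairs.
// Returns false for malformed input (an operation with no count, or a
// trailing count with no operation).
bool parseCigar(const std::string& text, CigarOps& ops)
{
    ops.clear();
    int count = 0;
    bool haveDigits = false;
    for (char c : text)
    {
        if (std::isdigit(static_cast<unsigned char>(c)))
        {
            count = count * 10 + (c - '0');
            haveDigits = true;
        }
        else
        {
            if (!haveDigits)
            {
                return false;
            }
            ops.emplace_back(c, count);
            count = 0;
            haveDigits = false;
        }
    }
    return !haveDigits;
}

// Append an operation, merging it into the previous one when the operation
// type is the same (e.g. 3M followed by 2M becomes 5M).
void appendOp(CigarOps& ops, char op, int count)
{
    if (!ops.empty() && ops.back().first == op)
    {
        ops.back().second += count;
    }
    else
    {
        ops.emplace_back(op, count);
    }
}

// Serialize (op, count) pairs back into CIGAR string form.
std::string toCigarString(const CigarOps& ops)
{
    std::string result;
    for (const auto& op : ops)
    {
        result += std::to_string(op.second);
        result += op.first;
    }
    return result;
}
```

Parsing `"4M10N4M3I2M4D3M"` yields seven pairs, and serializing them reproduces the original string.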
| + | |
| + | = Mapping Between Reference and Read/Query = |
| + | <code>int32_t Cigar::getRefOffset(int32_t queryIndex)</code> and <code>int32_t Cigar::getQueryIndex(int32_t refOffset)</code> are used to map between the reference and the read. |
| + | |
| + | The queryIndex is the index in the read - from 0 to (read length - 1). |
| + | The refOffset is the offset into the reference from the starting position of the read. |
| + | |
| + | For example: |
| + | Reference: ACTGAACCTTGGAAACTGCCGGGGACT |
| + | Read: ACTGACTGAAACCATT |
| + | CIGAR: 4M10N4M3I2M4D3M |
| + | POS: 5 |
| | | |
− | This purpose of this class is to provide utilities for creating and processing CIGAR strings. | + | This means it aligns: |
| + | Reference: ACTGAACCTTGGAAACTG   CCGGGGACT |
| + | Read:      ACTG          ACTGAAACC    ATT |
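The aligned view above can be produced mechanically by walking the CIGAR operations. The following is a minimal sketch that assumes the reference string begins at POS (reference offset 0 is its first character) and that the CIGAR is well formed. `alignRows` and `parseCigar` are illustrative names, not libStatGen functions. Soft-clipped bases are dropped, and hard clips and padding are ignored.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <utility>
#include <vector>

// Parse a well-formed CIGAR such as "4M10N4M3I2M4D3M" into (op, count) pairs.
std::vector<std::pair<char, int>> parseCigar(const std::string& cigar)
{
    std::vector<std::pair<char, int>> ops;
    int count = 0;
    for (char c : cigar)
    {
        if (std::isdigit(static_cast<unsigned char>(c)))
        {
            count = count * 10 + (c - '0');
        }
        else
        {
            ops.emplace_back(c, count);
            count = 0;
        }
    }
    return ops;
}

// Build two display rows in which aligned bases share a column.
// Inserted read bases get a blank in the reference row; deleted or skipped
// reference bases get a blank in the read row.
std::pair<std::string, std::string> alignRows(const std::string& ref,
                                              const std::string& read,
                                              const std::string& cigar)
{
    std::string refRow;
    std::string readRow;
    std::size_t r = 0;  // next reference offset
    std::size_t q = 0;  // next query index
    for (const auto& op : parseCigar(cigar))
    {
        for (int i = 0; i < op.second; ++i)
        {
            switch (op.first)
            {
                case 'M': case '=': case 'X':
                    refRow += ref.at(r++);
                    readRow += read.at(q++);
                    break;
                case 'I':
                    refRow += ' ';
                    readRow += read.at(q++);
                    break;
                case 'D': case 'N':
                    refRow += ref.at(r++);
                    readRow += ' ';
                    break;
                case 'S':
                    ++q;  // clipped read base: consumed but not displayed
                    break;
                default:  // 'H', 'P': no bases to display
                    break;
            }
        }
    }
    return {refRow, readRow};
}
```

Calling `alignRows("ACTGAACCTTGGAAACTGCCGGGGACT", "ACTGACTGAAACCATT", "4M10N4M3I2M4D3M")` reproduces the two rows shown above. Using `std::string::at` makes an inconsistent CIGAR throw `std::out_of_range` instead of reading past the end.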
| | | |
− | == Public Methods ==
| + | Adding the position: |
− | {| style="margin: 1em 1em 1em 0; background-color: #f9f9f9; border: 1px #aaa solid; border-collapse: collapse;" border="1"
| + | RefPos:       5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22          23 24 25 26 27 28 29 30 31 |
− | |-style="background: #f2f2f2; text-align: center;"
| + | Reference:    A  C  T  G  A  A  C  C  T  T  G  G  A  A  A  C  T  G           C  C  G  G  G  G  A  C  T |
− | ! Method Name !! Description
| + | Read:         A  C  T  G                                A  C  T  G  A  A  A  C  C              A  T  T |
− | |-
| |
− | | void BaseAsciiMap::setBaseMapType(SPACE_TYPE spaceType)
| |
− | | Set the base type based on the passed in option.
| |
− | |}
| |
| | | |
| + | Adding the offsets: |
| + | RefPos:       5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22          23 24 25 26 27 28 29 30 31 |
| + | refOffset:    0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17          18 19 20 21 22 23 24 25 26 |
| + | Reference:    A  C  T  G  A  A  C  C  T  T  G  G  A  A  A  C  T  G           C  C  G  G  G  G  A  C  T |
| + | Read:         A  C  T  G                                A  C  T  G  A  A  A  C  C              A  T  T |
| + | queryIndex:   0  1  2  3                                4  5  6  7  8  9 10 11 12             13 14 15 |
| | | |
− | == Overloaded Streaming Operators ==
| + | The results of a call to getRefOffset for each value passed in (where NA stands for INDEX_NA): |
− | {| style="margin: 1em 1em 1em 0; background-color: #f9f9f9; border: 1px #aaa solid; border-collapse: collapse;" border="1"
| + | queryIndex:   0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 (and any value over 16) |
− | |-style="background: #f2f2f2; text-align: center;"
| + | Return:       0  1  2  3 14 15 16 17 NA NA NA 18 19 24 25 26 NA |
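The getRefOffset results above follow from a single pass over the CIGAR operations. Match/mismatch operations advance both the read and the reference, I and S advance only the read, and D and N advance only the reference. The sketch below is a hypothetical stand-in for `Cigar::getRefOffset`, not its implementation. `INDEX_NA` is defined locally as -1 to mirror the INDEX_NA value documented on this page. Returning INDEX_NA for soft-clipped bases is an assumption, since the example contains no clipping.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <utility>
#include <vector>

// Local stand-in for the INDEX_NA value (-1) described on this page.
const int INDEX_NA = -1;

// Parse a well-formed CIGAR such as "4M10N4M3I2M4D3M" into (op, count) pairs.
std::vector<std::pair<char, int>> parseCigar(const std::string& cigar)
{
    std::vector<std::pair<char, int>> ops;
    int count = 0;
    for (char c : cigar)
    {
        if (std::isdigit(static_cast<unsigned char>(c)))
        {
            count = count * 10 + (c - '0');
        }
        else
        {
            ops.emplace_back(c, count);
            count = 0;
        }
    }
    return ops;
}

// Hypothetical stand-in for getRefOffset: the offset (from the read's start
// position) of the reference base aligned to queryIndex, or INDEX_NA if that
// read base is inserted, soft-clipped, or beyond the end of the read.
int refOffsetForQuery(const std::string& cigar, int queryIndex)
{
    if (queryIndex < 0)
    {
        return INDEX_NA;
    }
    int refOffset = 0;  // reference bases consumed so far
    int queryPos = 0;   // read bases consumed so far
    for (const auto& op : parseCigar(cigar))
    {
        const int len = op.second;
        const bool inThisOp = queryIndex >= queryPos && queryIndex < queryPos + len;
        switch (op.first)
        {
            case 'M': case '=': case 'X':  // consumes read and reference
                if (inThisOp)
                {
                    return refOffset + (queryIndex - queryPos);
                }
                refOffset += len;
                queryPos += len;
                break;
            case 'I': case 'S':            // consumes read only
                if (inThisOp)
                {
                    return INDEX_NA;
                }
                queryPos += len;
                break;
            case 'D': case 'N':            // consumes reference only
                refOffset += len;
                break;
            default:                       // 'H', 'P': consume neither
                break;
        }
    }
    return INDEX_NA;
}
```

Each call is linear in the number of CIGAR operations. That cost is negligible for typical reads, but it adds up if every base of a read is looked up this way.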
− | ! Method Name !! Description
| |
− | |-
| |
− | | <code> std::ostream &operator << (std::ostream &stream, const CigarRoller& roller)</code>
| |
− | | Writes all of the cigar operations contained in this roller to the passed in stream.
| |
− | |}
| |
| | | |
| + | The results of a call to getQueryIndex for each value passed in (where NA stands for INDEX_NA): |
| + | refOffset:    0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 (and any value over 27) |
| + | Return:       0  1  2  3 NA NA NA NA NA NA NA NA NA NA  4  5  6  7 11 12 NA NA NA NA 13 14 15 NA |
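The reverse lookup walks the same operations but tests the reference offset instead. A match/mismatch operation yields the corresponding query index, and an offset inside a D or N operation yields INDEX_NA. As with the previous sketch, this is a hypothetical stand-in rather than the Cigar implementation, and `INDEX_NA` is defined locally as -1.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <utility>
#include <vector>

// Local stand-in for the INDEX_NA value (-1) described on this page.
const int INDEX_NA = -1;

// Parse a well-formed CIGAR such as "4M10N4M3I2M4D3M" into (op, count) pairs.
std::vector<std::pair<char, int>> parseCigar(const std::string& cigar)
{
    std::vector<std::pair<char, int>> ops;
    int count = 0;
    for (char c : cigar)
    {
        if (std::isdigit(static_cast<unsigned char>(c)))
        {
            count = count * 10 + (c - '0');
        }
        else
        {
            ops.emplace_back(c, count);
            count = 0;
        }
    }
    return ops;
}

// Hypothetical stand-in for getQueryIndex(refOffset): the read index aligned
// to the given reference offset, or INDEX_NA if that reference base is
// deleted, skipped, or outside the aligned region.
int queryIndexForRefOffset(const std::string& cigar, int refOffset)
{
    if (refOffset < 0)
    {
        return INDEX_NA;
    }
    int refPos = 0;      // reference bases consumed so far
    int queryIndex = 0;  // read bases consumed so far
    for (const auto& op : parseCigar(cigar))
    {
        const int len = op.second;
        const bool inThisOp = refOffset >= refPos && refOffset < refPos + len;
        switch (op.first)
        {
            case 'M': case '=': case 'X':  // consumes read and reference
                if (inThisOp)
                {
                    return queryIndex + (refOffset - refPos);
                }
                refPos += len;
                queryIndex += len;
                break;
            case 'D': case 'N':            // consumes reference only
                if (inThisOp)
                {
                    return INDEX_NA;
                }
                refPos += len;
                break;
            case 'I': case 'S':            // consumes read only
                queryIndex += len;
                break;
            default:                       // 'H', 'P': consume neither
                break;
        }
    }
    return INDEX_NA;
}
```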
| | | |
| + | The results of a call to getRefPosition passing in start position 5 (where NA stands for INDEX_NA): |
| + | queryIndex:   0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 (and any value over 16) |
| + | Return:       5  6  7  8 19 20 21 22 NA NA NA 23 24 29 30 31 NA |
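getRefPosition is the start position plus the reference offset. Instead of re-walking the CIGAR on every call, this sketch builds a query-index-to-offset table once and looks values up in it. Caching such a table is one plausible use of the lazy evaluation mentioned for Cigar, but that is an assumption. `buildQueryToRefOffsets` and `refPositionForQuery` are hypothetical names, and `INDEX_NA` is defined locally as -1.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <utility>
#include <vector>

// Local stand-in for the INDEX_NA value (-1) described on this page.
const int INDEX_NA = -1;

// Parse a well-formed CIGAR such as "4M10N4M3I2M4D3M" into (op, count) pairs.
std::vector<std::pair<char, int>> parseCigar(const std::string& cigar)
{
    std::vector<std::pair<char, int>> ops;
    int count = 0;
    for (char c : cigar)
    {
        if (std::isdigit(static_cast<unsigned char>(c)))
        {
            count = count * 10 + (c - '0');
        }
        else
        {
            ops.emplace_back(c, count);
            count = 0;
        }
    }
    return ops;
}

// Element q is the reference offset aligned to query index q, or INDEX_NA
// for inserted or soft-clipped read bases.
std::vector<int> buildQueryToRefOffsets(const std::string& cigar)
{
    std::vector<int> table;
    int refOffset = 0;
    for (const auto& op : parseCigar(cigar))
    {
        for (int i = 0; i < op.second; ++i)
        {
            switch (op.first)
            {
                case 'M': case '=': case 'X':
                    table.push_back(refOffset++);
                    break;
                case 'I': case 'S':
                    table.push_back(INDEX_NA);
                    break;
                case 'D': case 'N':
                    ++refOffset;
                    break;
                default:  // 'H', 'P': consume neither
                    break;
            }
        }
    }
    return table;
}

// Hypothetical stand-in for getRefPosition: start position + reference
// offset, or INDEX_NA when the read base has no reference position.
int refPositionForQuery(const std::vector<int>& table, int queryIndex, int queryStartPos)
{
    if (queryIndex < 0 || queryIndex >= static_cast<int>(table.size()))
    {
        return INDEX_NA;
    }
    const int refOffset = table[queryIndex];
    return refOffset == INDEX_NA ? INDEX_NA : queryStartPos + refOffset;
}
```

The table trades a little memory (one `int` per read base) for constant-time lookups after the first build.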
| | | |
− | == Public Enums ==
| + | The results of a call to getQueryIndex using refPosition and start position 5 (where NA stands for INDEX_NA): |
− | {| style="margin: 1em 1em 1em 0; background-color: #f9f9f9; border: 1px #aaa solid; border-collapse: collapse;" border="1"
| + | refPosition:   5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 (and any value over 32) |
− | |-style="background: #f2f2f2; text-align: center;"
| + | Return:        0  1  2  3 NA NA NA NA NA NA NA NA NA NA  4  5  6  7 11 12 NA NA NA NA 13 14 15 NA |
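Mapping a reference position back to a query index amounts to subtracting the start position and indexing a table keyed by reference offset. This is a minimal, hypothetical sketch (`buildRefToQueryIndexes` and `queryIndexForRefPosition` are illustrative names), with `INDEX_NA` defined locally as -1.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <utility>
#include <vector>

// Local stand-in for the INDEX_NA value (-1) described on this page.
const int INDEX_NA = -1;

// Parse a well-formed CIGAR such as "4M10N4M3I2M4D3M" into (op, count) pairs.
std::vector<std::pair<char, int>> parseCigar(const std::string& cigar)
{
    std::vector<std::pair<char, int>> ops;
    int count = 0;
    for (char c : cigar)
    {
        if (std::isdigit(static_cast<unsigned char>(c)))
        {
            count = count * 10 + (c - '0');
        }
        else
        {
            ops.emplace_back(c, count);
            count = 0;
        }
    }
    return ops;
}

// Element r is the query index aligned to reference offset r, or INDEX_NA
// for deleted or skipped reference bases.
std::vector<int> buildRefToQueryIndexes(const std::string& cigar)
{
    std::vector<int> table;
    int queryIndex = 0;
    for (const auto& op : parseCigar(cigar))
    {
        for (int i = 0; i < op.second; ++i)
        {
            switch (op.first)
            {
                case 'M': case '=': case 'X':
                    table.push_back(queryIndex++);
                    break;
                case 'D': case 'N':
                    table.push_back(INDEX_NA);
                    break;
                case 'I': case 'S':
                    ++queryIndex;
                    break;
                default:  // 'H', 'P': consume neither
                    break;
            }
        }
    }
    return table;
}

// Hypothetical stand-in for getQueryIndex(refPosition, queryStartPos).
int queryIndexForRefPosition(const std::vector<int>& table, int refPosition, int queryStartPos)
{
    const int refOffset = refPosition - queryStartPos;
    if (refOffset < 0 || refOffset >= static_cast<int>(table.size()))
    {
        return INDEX_NA;
    }
    return table[refOffset];
}
```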
− | ! colspan="2"| enum SPACE_TYPE
| |
− | |-
| |
− | ! Enum Value !! Description
| |
− | |-
| |
− | | none
| |
− | | No operation has been specified
| |
− | |-
| |
− | | match
| |
− | | The query sequence and the reference sequence bases are the same for the bases associated with this cigar operation.
| |
− | Both <code>match</code> and <code>mismatch</code> are associated with CIGAR Operation "M"
| |
− | |-
| |
− | | mismatch
| |
− | | The query sequence and the reference sequence bases are different for the bases associated with this cigar operation, but bases exist in both the query and the reference.
| |
− | Both <code>match</code> and <code>mismatch</code> are associated with CIGAR Operation "M"
| |
− | |-
| |
− | | insert
| |
− | | Insertion to the reference (the query sequence contains bases that have no corresponding base in the reference).
| |
− | Associated with CIGAR Operation "I"
| |
− | |-
| |
− | | del
| |
− | |Deletion from the reference (the reference contains bases that have no corresponding base in the query sequence).
| |
− | Associated with CIGAR Operation "D"
| |
− | |-
| |
− | | skip
| |
− | | Skipped region from the reference (the reference contains bases that have no corresponding base in the query sequence).
| |
− | Associated with CIGAR Operation "N"
| |
− | |-
| |
− | | softClip
| |
− | | Soft clip on the read (clipped sequence present in the query sequence)
| |
− | Associated with CIGAR Operation "S"
| |
− | |-
| |
− | | hardClip
| |
− | | Hard clip on the read (clipped sequence not present in the query sequence)
| |
− | Associated with CIGAR Operation "H"
| |
− | |-
| |
− | |pad
| |
− | | Padding (silent deletion from the padded reference sequence)
| |
− | Associated with CIGAR Operation "P"
| |
− | |}
| |
| | | |
| | | |
− | == Public Constants == | + | == Determining the Number of Reference and Read/Query Overlaps == |
− | {| style="margin: 1em 1em 1em 0; background-color: #f9f9f9; border: 1px #aaa solid; border-collapse: collapse;" border="1"
| |
− | |-style="background: #f2f2f2; text-align: center;"
| |
− | ! Constant !! Value !! Description
| |
− | |-
| |
− | | INDEX_NA
| |
− | | -1
| |
− | | Value associated with an index that is not applicable/does not exist.
| |
− | Used for converting between query and reference indexes/offsets when an associated index/offset does not exist.
| |
− | |}
| |
| | | |
| + | It is often useful to determine the number of bases that overlap between the reference and the read in a given region. |
| | | |
− | == Nested Class ==
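| + | To do this, use <code>getNumOverlaps</code>, passing in the reference start and end positions of the region as well as the reference position where the read begins. The start position is inclusive and the end position is exclusive. The examples below also use -1 for start and/or end to leave that side of the region open. |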
| | | |
− | === CigarOperation ===
| + | Using the above example: |
| + | RefPos:       5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22          23 24 25 26 27 28 29 30 31 |
| + | refOffset:    0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17          18 19 20 21 22 23 24 25 26 |
| + | Reference:    A  C  T  G  A  A  C  C  T  T  G  G  A  A  A  C  T  G           C  C  G  G  G  G  A  C  T |
| + | Read:         A  C  T  G                                A  C  T  G  A  A  A  C  C              A  T  T |
| + | queryIndex:   0  1  2  3                                4  5  6  7  8  9 10 11 12             13 14 15 |
| | | |
− | ==== Public Methods ==== | + | getNumOverlaps(5,32,5) = 13 - [5, 32) covers the whole read - 13 cigar positions are "M" (found in both the reference and the read) |
− | {| style="margin: 1em 1em 1em 0; background-color: #f9f9f9; border: 1px #aaa solid; border-collapse: collapse;" border="1"
| + | getNumOverlaps(5,31,5) = 12 - skips the last overlapping position |
− | |-style="background: #f2f2f2; text-align: center;"
| + | getNumOverlaps(0,100,5) = 13 - covers the whole read. |
− | ! Method Name !! Description
| + | getNumOverlaps(-1,-1,5) = 13 - covers the whole read. |
− | |-
| + | getNumOverlaps(-1,10,5) = 4 |
− | | <code>CigarOperator::CigarOperator(Operation operation, uint32_t count)</code>
| + | getNumOverlaps(10,-1,5) = 9 |
− | | Set the cigar operator with the specified operation and count length.
| + | getNumOverlaps(9,19,5) = 0 - all skipped |
− | |-
| + | getNumOverlaps(9,20,5) = 1 |
− | | <code>char CigarOperator::getChar()</code>
| + | getNumOverlaps(9,6,5) = 0 - start is after end |
− | | Returns the character code (M, I, D, N, S, H, or P) associated with this operation.
| + | getNumOverlaps(0,5,5) = 0 - outside of read |
− | |-
| + | getNumOverlaps(32,40,5) = 0 - outside of read |
− | | <code>bool CigarOperator::operator == (CigarOperator &rhs)</code>
| + | getNumOverlaps(0,5,1) = 4 - with a different start position, this range overlaps the read with 4 bases |
− | | Returns true if the passed in operator is the same as this operator, false if not.
| + | getNumOverlaps(32,40,32) = 4 - with a different start position, this range overlaps the read with 4 bases |
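getNumOverlaps can be understood as intersecting the region [start, end) with each run of reference positions covered by a match/mismatch operation and summing the lengths of the intersections. The sketch below is hypothetical (`numOverlaps` is an illustrative name, not the Cigar API). Treating -1 for start or end as "unbounded" is inferred from the examples above rather than taken from the API documentation. The accompanying checks cover every example in the list.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>
#include <utility>
#include <vector>

// Parse a well-formed CIGAR such as "4M10N4M3I2M4D3M" into (op, count) pairs.
std::vector<std::pair<char, int>> parseCigar(const std::string& cigar)
{
    std::vector<std::pair<char, int>> ops;
    int count = 0;
    for (char c : cigar)
    {
        if (std::isdigit(static_cast<unsigned char>(c)))
        {
            count = count * 10 + (c - '0');
        }
        else
        {
            ops.emplace_back(c, count);
            count = 0;
        }
    }
    return ops;
}

// Hypothetical stand-in for getNumOverlaps(start, end, queryStartPos):
// counts reference positions in [start, end) whose base is present in both
// the reference and the read. Assumption: -1 means "no bound" on that side.
int numOverlaps(const std::string& cigar, int start, int end, int queryStartPos)
{
    int total = 0;
    int refPos = queryStartPos;  // reference position of the next reference base
    for (const auto& op : parseCigar(cigar))
    {
        const int len = op.second;
        switch (op.first)
        {
            case 'M': case '=': case 'X':
            {
                // Intersect [refPos, refPos + len) with [start, end).
                const int lo = (start == -1) ? refPos : std::max(start, refPos);
                const int hi = (end == -1) ? refPos + len : std::min(end, refPos + len);
                if (hi > lo)
                {
                    total += hi - lo;
                }
                refPos += len;
                break;
            }
            case 'D': case 'N':  // reference-only bases never overlap the read
                refPos += len;
                break;
            default:             // I, S, H, P do not consume reference
                break;
        }
    }
    return total;
}
```

An empty or reversed region, such as start 9 with end 6, produces no positive intersection and therefore returns 0.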
− | |-
| |
− | | <code>bool CigarOperator::operator != (CigarOperator &rhs)</code>
| |
− | | Returns true if the passed in operator is not the same as this operator, false if they are the same.
| |
− | |}
| |