C++ Class: CigarRoller
Latest revision as of 12:00, 2 February 2017
Cigar
This class is part of libStatGen: general.
This class provides utilities for processing CIGARs. Its operations are read-only: they do not modify the object, other than for lazy evaluation.
See: http://csg.sph.umich.edu//mktrost/doxygen/current/classCigar.html for documentation.
The static methods help determine information about a given CIGAR operator.
See Mapping Between Reference and Read/Query for a more detailed explanation, with examples, of how the mapping between the reference and the read/query works.
See Determining the Number of Reference and Read/Query Overlaps for a more detailed explanation, with examples, of how overlaps are determined.
CigarRoller
This class is part of libStatGen: general.
This class provides accessors for setting, updating, and modifying the CIGAR object. It is a child class of Cigar.
See: http://csg.sph.umich.edu//mktrost/doxygen/current/classCigarRoller.html for documentation.
Mapping Between Reference and Read/Query
int32_t Cigar::getRefOffset(int32_t queryIndex) and int32_t Cigar::getQueryIndex(int32_t refOffset) are used to map between the reference and the read.
The queryIndex is the index in the read - from 0 to (read length - 1). The refOffset is the offset into the reference from the starting position of the read.
For example:
Reference: ACTGAACCTTGGAAACTGCCGGGGACT
Read:      ACTGACTGAAACCATT
CIGAR:     4M10N4M3I2M4D3M
POS:       5
This means it aligns:
Reference: ACTGAACCTTGGAAACTG   CCGGGGACT
Read:      ACTG          ACTGAAACC    ATT
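The gapped layout above can be reproduced mechanically by walking the CIGAR. The following is a minimal standalone sketch, not libStatGen code: the function name gapAlign is hypothetical, and it assumes that only the M, =, X, I, D and N operations need to be drawn (any other operation is ignored).

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical helper (not part of libStatGen): returns the reference and the
// read padded with spaces so that aligned bases share a column.
std::pair<std::string, std::string> gapAlign(const std::string& ref,
                                             const std::string& read,
                                             const std::string& cigar) {
    std::string outRef, outRead;
    std::size_t refIdx = 0, readIdx = 0, count = 0;
    for (char op : cigar) {
        if (std::isdigit(static_cast<unsigned char>(op))) {
            count = count * 10 + static_cast<std::size_t>(op - '0');
            continue;
        }
        for (std::size_t i = 0; i < count; ++i) {
            if (op == 'M' || op == '=' || op == 'X') {
                // Base present in both the reference and the read.
                outRef += ref.at(refIdx++);
                outRead += read.at(readIdx++);
            } else if (op == 'I') {
                // Insertion: present in the read only.
                outRef += ' ';
                outRead += read.at(readIdx++);
            } else if (op == 'D' || op == 'N') {
                // Deletion or skipped region: present in the reference only.
                outRef += ref.at(refIdx++);
                outRead += ' ';
            }
        }
        count = 0;
    }
    return {outRef, outRead};
}
```

Calling gapAlign("ACTGAACCTTGGAAACTGCCGGGGACT", "ACTGACTGAAACCATT", "4M10N4M3I2M4D3M") yields the two padded strings shown in the alignment above.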
Adding the position:
RefPos:      5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22          23 24 25 26 27 28 29 30 31
Reference:   A  C  T  G  A  A  C  C  T  T  G  G  A  A  A  C  T  G           C  C  G  G  G  G  A  C  T
Read:        A  C  T  G                                A  C  T  G  A  A  A  C  C              A  T  T
Adding the offsets:
RefPos:      5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22          23 24 25 26 27 28 29 30 31
refOffset:   0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17          18 19 20 21 22 23 24 25 26
Reference:   A  C  T  G  A  A  C  C  T  T  G  G  A  A  A  C  T  G           C  C  G  G  G  G  A  C  T
Read:        A  C  T  G                                A  C  T  G  A  A  A  C  C              A  T  T
queryIndex:  0  1  2  3                                4  5  6  7  8  9 10 11 12             13 14 15
The results of a call to getRefOffset for each value passed in (where NA stands for INDEX_NA):
queryIndex:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 (and any value over 16)
Return:      0  1  2  3 14 15 16 17 NA NA NA 18 19 24 25 26 NA
The results of a call to getQueryIndex for each value passed in (where NA stands for INDEX_NA):
refOffset:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 (and any value over 27)
Return:     0  1  2  3 NA NA NA NA NA NA NA NA NA NA  4  5  6  7 11 12 NA NA NA NA 13 14 15 NA
The results of a call to getRefPosition passing in start position 5 (where NA stands for INDEX_NA):
queryIndex:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 (and any value over 16)
Return:      5  6  7  8 19 20 21 22 NA NA NA 23 24 29 30 31 NA
The results of a call to getQueryIndex using refPosition and start position 5 (where NA stands for INDEX_NA):
refPosition:  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 (and any value over 32)
Return:       0  1  2  3 NA NA NA NA NA NA NA NA NA NA  4  5  6  7 11 12 NA NA NA NA 13 14 15 NA
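The lookup tables above follow directly from a single pass over the CIGAR. The sketch below is a standalone illustration, not the library's implementation: CigarMaps, buildMaps, refOffsetFor, queryIndexFor and SKETCH_INDEX_NA are hypothetical names, SKETCH_INDEX_NA only stands in for INDEX_NA (whose actual value may differ), and treating soft clips (S) like insertions is an assumption.

```cpp
#include <cassert>
#include <cctype>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for INDEX_NA; the library's actual value may differ.
const int32_t SKETCH_INDEX_NA = -1;

// Both lookup tables for one CIGAR string.
struct CigarMaps {
    std::vector<int32_t> queryToRef;  // indexed by queryIndex
    std::vector<int32_t> refToQuery;  // indexed by refOffset
};

// Walk the CIGAR once and fill both tables.
CigarMaps buildMaps(const std::string& cigar) {
    CigarMaps maps;
    int32_t count = 0;
    for (char op : cigar) {
        if (std::isdigit(static_cast<unsigned char>(op))) {
            count = count * 10 + (op - '0');
            continue;
        }
        for (int32_t i = 0; i < count; ++i) {
            if (op == 'M' || op == '=' || op == 'X') {
                // In both: each side records the other's current index.
                const int32_t refOffset = static_cast<int32_t>(maps.refToQuery.size());
                const int32_t queryIndex = static_cast<int32_t>(maps.queryToRef.size());
                maps.queryToRef.push_back(refOffset);
                maps.refToQuery.push_back(queryIndex);
            } else if (op == 'I' || op == 'S') {
                // In the read only (S handling is an assumption).
                maps.queryToRef.push_back(SKETCH_INDEX_NA);
            } else if (op == 'D' || op == 'N') {
                // In the reference only.
                maps.refToQuery.push_back(SKETCH_INDEX_NA);
            }
        }
        count = 0;
    }
    return maps;
}

// Bounds-checked lookup shared by both directions.
static int32_t lookup(const std::vector<int32_t>& table, int32_t index) {
    if (index < 0 || index >= static_cast<int32_t>(table.size())) {
        return SKETCH_INDEX_NA;
    }
    return table[static_cast<std::size_t>(index)];
}

int32_t refOffsetFor(const CigarMaps& maps, int32_t queryIndex) {
    return lookup(maps.queryToRef, queryIndex);
}

int32_t queryIndexFor(const CigarMaps& maps, int32_t refOffset) {
    return lookup(maps.refToQuery, refOffset);
}
```

With maps = buildMaps("4M10N4M3I2M4D3M"), refOffsetFor(maps, 4) gives 14 and queryIndexFor(maps, 18) gives 11, matching the tables. A reference position is then the read's start position plus the refOffset (for example 5 + 14 = 19).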
Determining the Number of Reference and Read/Query Overlaps
It is often useful to determine the number of bases that overlap between the reference and the read in a given region.
To do this, use getNumOverlaps, passing in the reference start and end positions for the region as well as the reference position where the read begins. The start is inclusive, while the end is exclusive.
Using the above example:
RefPos:      5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22          23 24 25 26 27 28 29 30 31
refOffset:   0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17          18 19 20 21 22 23 24 25 26
Reference:   A  C  T  G  A  A  C  C  T  T  G  G  A  A  A  C  T  G           C  C  G  G  G  G  A  C  T
Read:        A  C  T  G                                A  C  T  G  A  A  A  C  C              A  T  T
queryIndex:  0  1  2  3                                4  5  6  7  8  9 10 11 12             13 14 15
getNumOverlaps(5,32,5) = 13 - [5, 32) covers the whole read - 13 cigar positions are "M" (found in both the reference and the read)
getNumOverlaps(5,31,5) = 12 - skips the last overlapping position
getNumOverlaps(0,100,5) = 13 - covers the whole read
getNumOverlaps(-1,-1,5) = 13 - covers the whole read
getNumOverlaps(-1,10,5) = 4
getNumOverlaps(10,-1,5) = 9
getNumOverlaps(9,19,5) = 0 - all skipped
getNumOverlaps(9,20,5) = 1
getNumOverlaps(9,6,5) = 0 - start is after end
getNumOverlaps(0,5,5) = 0 - outside of read
getNumOverlaps(32,40,5) = 0 - outside of read
getNumOverlaps(0,5,1) = 4 - with a different start position, this range overlaps the read with 4 bases
getNumOverlaps(32,40,32) = 4 - with a different start position, this range overlaps the read with 4 bases
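The counts above can be checked with a short standalone sketch. It is not the library's implementation: countOverlaps is a hypothetical name that takes the CIGAR as a string, and treating -1 as "no bound" for start or end is an assumption inferred from the examples.

```cpp
#include <cassert>
#include <cctype>
#include <cstdint>
#include <string>

// Hypothetical helper (not part of libStatGen): counts the positions that are
// present in both the read and the reference within [start, end).
// Assumption: a start or end of -1 means that side of the region is unbounded.
int32_t countOverlaps(const std::string& cigar, int32_t start, int32_t end,
                      int32_t readStartPos) {
    int32_t refPos = readStartPos;  // reference position of the next base
    int32_t overlaps = 0;
    int32_t count = 0;
    for (char op : cigar) {
        if (std::isdigit(static_cast<unsigned char>(op))) {
            count = count * 10 + (op - '0');
            continue;
        }
        if (op == 'M' || op == '=' || op == 'X') {
            // Present in both: count each position that falls in the region.
            for (int32_t i = 0; i < count; ++i, ++refPos) {
                const bool afterStart = (start == -1) || (refPos >= start);
                const bool beforeEnd = (end == -1) || (refPos < end);
                if (afterStart && beforeEnd) {
                    ++overlaps;
                }
            }
        } else if (op == 'D' || op == 'N') {
            // Reference only: advance the position without counting.
            refPos += count;
        }
        // Insertions and clips do not advance the reference position.
        count = 0;
    }
    return overlaps;
}
```

For the example read, countOverlaps("4M10N4M3I2M4D3M", 9, 20, 5) returns 1, because position 19 is the only "M" position in [9, 20).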