From Genome Analysis Wiki
Revision as of 17:11, 30 September 2010
'''How is the quality threshold checked?'''
:Easy: loop through the alignment, finding the mismatches, and add up the base qualities of those mismatched bases. If the sum is greater than the quality threshold, mark the read as unmapped.
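A minimal Python sketch of this check, assuming the read bases and the reference bases they align to are given as equal-length strings (no insertions or deletions) and the base qualities as a list of Phred scores. The function `exceeds_quality_threshold` and its parameters are hypothetical names for illustration, not the tool's actual code.

```python
from typing import Sequence


def exceeds_quality_threshold(read: str, ref: str, quals: Sequence[int],
                              quality_threshold: int) -> bool:
    """Return True if the summed quality of mismatched bases exceeds the threshold.

    Assumes read, ref and quals line up position by position (no indels).
    """
    if not (len(read) == len(ref) == len(quals)):
        raise ValueError("read, ref and quals must have the same length")
    # Add up the base quality of every position where read and reference differ.
    mismatch_quality_sum = sum(q for r, f, q in zip(read, ref, quals) if r != f)
    # Strictly greater than, as described above.
    return mismatch_quality_sum > quality_threshold


# Two mismatches (quality 30 and 40) sum to 70, which is greater than 60.
print(exceeds_quality_threshold("ACGTAC", "ACCTAA", [30, 30, 30, 30, 30, 40], 60))  # True
```

When it returns True, the read would be marked as unmapped.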

'''How is the mismatch threshold checked?'''
:This requires a bit more logic, and thus gets its own section.

==== Mismatch Threshold ====

The mismatch threshold is compared against the ratio:
:<math>{NumMismatches \over NumMismatches + NumMatches}</math>
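The same ratio as a small Python helper. `mismatch_ratio` is a hypothetical name, and returning 0 when no bases have been counted yet is an assumption made here to avoid dividing by zero.

```python
def mismatch_ratio(num_mismatches: int, num_matches: int) -> float:
    """NumMismatches / (NumMismatches + NumMatches) for the bases counted so far."""
    total = num_mismatches + num_matches
    if total == 0:
        # Assumed convention: nothing counted yet, so nothing to compare.
        return 0.0
    return num_mismatches / total


print(mismatch_ratio(1, 0))  # 1.0
print(mismatch_ratio(0, 1))  # 0.0
print(mismatch_ratio(1, 4))  # 0.2
```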

Is that the total number of mismatches in the entire alignment?
:No.

For the mismatch threshold, the logic walks in from each end of the read one base (match or mismatch) at a time and, after each base, checks the ratio for the bases counted so far from that end against the threshold.
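A sketch of the running ratio as the walk moves in from each end, before any clipping is applied. It assumes a hypothetical encoding of the aligned bases as a string with '=' for a match and 'X' for a mismatch (borrowing the SAM CIGAR symbols; indels are ignored), and `running_ratios` is an illustrative name.

```python
from typing import List


def running_ratios(ops: str) -> List[float]:
    """Mismatch ratio after each base, walking in from the start of `ops`.

    '=' is a match and 'X' a mismatch (hypothetical encoding).
    No threshold check or clipping is done here.
    """
    ratios = []
    num_matches = num_mismatches = 0
    for op in ops:
        if op == "X":
            num_mismatches += 1
        else:
            num_matches += 1
        ratios.append(num_mismatches / (num_mismatches + num_matches))
    return ratios


ops = "X==X=="
print(running_ratios(ops))        # walking in from the front
print(running_ratios(ops[::-1]))  # walking in from the back
```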

If the first base is a mismatch, the ratio is
:<math>{1 \over 1 + 0} = {1 \over 1} = 1</math>
This base will be clipped since 1 is greater than the mismatch threshold (assuming the threshold was not set to 1).

If the first base is a match, the ratio is
:<math>{0 \over 0 + 1} = {0 \over 1} = 0</math>
This base will not be clipped since 0 is not greater than the mismatch threshold.

When a clip occurs, NumMatches and NumMismatches are reset to 0 (the clipped bases no longer count as matches or mismatches).
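Walking in from a single end with this reset can be sketched as below, assuming the aligned bases are encoded as a string of '=' (match) and 'X' (mismatch), a hypothetical encoding that ignores indels. `clip_from_one_end` is an illustrative name; it ignores the other end of the read, so it walks the whole read.

```python
def clip_from_one_end(ops: str, mismatch_threshold: float) -> int:
    """Return how many bases to clip when walking in from the start of `ops`."""
    num_matches = num_mismatches = 0
    clipped = 0
    for i, op in enumerate(ops):
        if op == "X":
            num_mismatches += 1
        else:
            num_matches += 1
        if num_mismatches / (num_mismatches + num_matches) > mismatch_threshold:
            clipped = i + 1                   # clip everything up to and including this base
            num_matches = num_mismatches = 0  # clipped bases no longer count
    return clipped


print(clip_from_one_end("==X=======", 0.1))  # 3
print(clip_from_one_end("X=X=======", 0.1))  # 3
```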

To minimize the number of bases that are clipped, the logic keeps the number of bases counted from the front (NumMatches + NumMismatches) and the number counted from the back within 1 of each other.
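A sketch of walking in from both ends at once. It assumes the same hypothetical '='/'X' encoding (match/mismatch, no indels) and reads "within 1 of each other" as: always advance the end whose current NumMatches + NumMismatches is smaller, with ties going to the front. Because a clip resets one end's counts to 0, the counts can briefly differ by more than 1 until that end catches up. The names are illustrative, not taken from the actual implementation.

```python
from typing import List, Tuple


def _step(counts: List[int], op: str, threshold: float) -> bool:
    """Count one base into [NumMatches, NumMismatches].

    Returns True (after resetting both counts to 0) if the ratio
    NumMismatches / (NumMismatches + NumMatches) exceeds the threshold.
    """
    if op == "X":
        counts[1] += 1
    else:
        counts[0] += 1
    if counts[1] / (counts[0] + counts[1]) > threshold:
        counts[0] = counts[1] = 0
        return True
    return False


def clip_both_ends(ops: str, mismatch_threshold: float) -> Tuple[int, int]:
    """Return (bases clipped from the front, bases clipped from the back)."""
    front, back = 0, len(ops) - 1  # next unread base from each end
    front_counts = [0, 0]          # [NumMatches, NumMismatches] for the front
    back_counts = [0, 0]           # [NumMatches, NumMismatches] for the back
    front_clip = back_clip = 0

    while front <= back:
        # Advance the end that has counted fewer bases (ties go to the front).
        if sum(front_counts) <= sum(back_counts):
            op = ops[front]
            front += 1
            if _step(front_counts, op, mismatch_threshold):
                front_clip = front               # clip ops[:front]
        else:
            op = ops[back]
            back -= 1
            if _step(back_counts, op, mismatch_threshold):
                back_clip = len(ops) - 1 - back  # clip ops[back + 1:]
    return front_clip, back_clip


print(clip_both_ends("==X====X==", 0.1))  # (3, 3)
```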
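
If the mismatch threshold is 10%, none of the first or last 9 bases of the updated read will be a mismatch: a mismatch at position k, counting in from the end of the read (or from the last clip), gives a ratio of at least 1/k, which is greater than 0.1 for every k below 10. A lone mismatch at position 10 gives exactly 1/10, which is not greater than the threshold.

If one of the first 9 bases is a mismatch, it is clipped along with the bases before it.

If one of the next 9 bases after that clip is a mismatch, it is also clipped, because the counts were reset at the clip.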
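A quick check of where the 10% boundary falls under the strictly-greater-than comparison used in the examples above, assuming a lone mismatch at position k (counted from the read end or from the last clip) preceded only by matches, so that the ratio is 1/k:

```python
# A lone mismatch at position k, preceded only by matches, gives a ratio of 1/k.
mismatch_threshold = 0.10
clipped_positions = [k for k in range(1, 21) if 1 / k > mismatch_threshold]
print(clipped_positions)  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```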