From Genome Analysis Wiki

BamUtil: filter

1,364 bytes added, 16:11, 30 September 2010
'''How is the quality threshold checked?'''
: This is straightforward: loop through the alignment, find the mismatches, and add up the base quality of each mismatch. If the sum is greater than the quality threshold, mark the read as unmapped.
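A minimal Python sketch of this check follows. It is not BamUtil's implementation. The function name `exceeds_quality_threshold` is hypothetical. The sketch assumes that the read and reference bases are already paired position by position (no insertions or deletions) and that base qualities are Phred scores stored as ASCII characters with an offset of 33, as in SAM files.

```python
def exceeds_quality_threshold(read, ref, qual, threshold, offset=33):
    """Return True if the summed quality of mismatched bases exceeds threshold.

    Hypothetical helper, not part of BamUtil.
    read, ref: aligned bases of equal length (indels assumed already removed).
    qual:      quality string for the read bases, ASCII-encoded with `offset`.
    """
    if not (len(read) == len(ref) == len(qual)):
        raise ValueError("read, ref and qual must have the same length")
    total = 0
    for r, g, q in zip(read, ref, qual):
        if r.upper() != g.upper():      # this base mismatches the reference
            total += ord(q) - offset    # add its Phred quality to the running sum
    return total > threshold            # True -> the read would be marked unmapped
```

For example, `exceeds_quality_threshold("ACGTA", "ACCTT", "IIIII", 60)` returns `True`. Each of the two mismatched bases has quality 40 (`'I'` is ASCII 73, minus 33), so the sum is 80, which is greater than 60.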
'''How is the mismatch threshold checked?'''
: This requires a bit more logic, and thus gets its own section.

==== Mismatch Threshold ====
The mismatch threshold is applied to the ratio:
:<math>\frac{NumMismatches}{NumMismatches + NumMatches}</math>
But is that the total number of mismatches in the entire alignment?
: No.
For the mismatch threshold, the logic steps in from each end of the read one match/mismatch base at a time and, after each base, checks the ratio accumulated so far against the threshold.

If the first base is a mismatch, the ratio is:
:<math>\frac{1}{1 + 0} = \frac{1}{1} = 1</math>
This base will be clipped, since 1 is greater than the mismatch threshold (assuming the threshold was not set to 1).

If the first base is a match, the ratio is:
:<math>\frac{0}{0 + 1} = \frac{0}{1} = 0</math>
At this point, this base will not be clipped, since 0 is not greater than the mismatch threshold.

When a clip occurs, NumMatches and NumMismatches are reset to 0 (the clipped bases no longer count as matches or mismatches).

To minimize the number of bases that are clipped, the logic keeps the number of bases processed from the front (its NumMatches + NumMismatches) within 1 of the number processed from the back.

If the mismatch threshold is 10%, none of the first or last 10 bases of the updated read will be a mismatch. If one of the first 10 bases is a mismatch, it will be clipped; if one of the next 10 bases after that clip is a mismatch, it will also be clipped.
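The front/back clipping described above can be sketched in Python as follows. This is a minimal illustration, not BamUtil's code. The function name `mismatch_clip` is hypothetical, and so are the following assumptions: the input is a list of booleans with one entry per aligned match/mismatch base (indels already removed), and the tie-breaking rule processes the front whenever it has seen no more bases than the back. The function returns how many aligned bases to clip from the front and from the back.

```python
def mismatch_clip(is_mismatch, threshold):
    """Return (front_clip, back_clip): aligned bases to clip from each end.

    Hypothetical sketch, not BamUtil's implementation.
    is_mismatch: one bool per aligned match/mismatch base, in read order
                 (True = mismatch).
    threshold:   maximum allowed NumMismatches / (NumMismatches + NumMatches).
    """
    n = len(is_mismatch)
    front, back = 0, n - 1                  # next unprocessed index at each end
    front_clip = back_clip = 0              # bases clipped so far from each end
    f_match = f_mis = b_match = b_mis = 0   # running counts since the last clip

    while front <= back:
        # Work on whichever end has processed fewer bases (front on ties), so the
        # two counts stay within 1 of each other until a clip resets one of them.
        if f_match + f_mis <= b_match + b_mis:
            if is_mismatch[front]:
                f_mis += 1
            else:
                f_match += 1
            front += 1
            if f_mis / (f_mis + f_match) > threshold:
                front_clip = front          # clip up to and including this base
                f_match = f_mis = 0         # clipped bases no longer count
        else:
            if is_mismatch[back]:
                b_mis += 1
            else:
                b_match += 1
            back -= 1
            if b_mis / (b_mis + b_match) > threshold:
                back_clip = n - 1 - back    # clip from this base to the end
                b_match = b_mis = 0
    return front_clip, back_clip
```

For example, `mismatch_clip([False] * 4 + [True] + [False] * 15, 0.1)` returns `(5, 0)`. The mismatch at the 5th base gives a ratio of 1/5 = 20%, so the first five bases are clipped. The sketch uses a strict greater-than comparison, so a lone mismatch at exactly the 10th base gives 1/10 = 10%, which is not greater than the threshold, and that base is kept.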
