BamUtil: filter

16:29, 30 September 2010
But is that the total number of mismatches in the entire alignment?
:No. During processing, it is just the count for the bases that have been read so far.
For the mismatch threshold, the logic processes one match/mismatch base at a time from each end of the read and checks the mismatch fraction of the bases processed so far from that end against the threshold.
If one of the next 10 bases from that clip is a mismatch, it will also be clipped.

For example, consider the following string of matches and mismatches (M indicates a match, X indicates a mismatch):
MMMXXXMXXM
Use a mismatch threshold of .50.
The logic alternates between the two ends of the read. In the marker lines below, '-' indicates bases processed from the front, '|' bases processed from the back, '+' bases processed from both directions, and '*' clipped bases.
M M M X X X M X X M
-

Match was found from the front, so basesFromFront = 1, basesFromBack = 0, so now read from the back.

M M M X X X M X X M
-                 |
Match was found from the back, so basesFromFront = 1, basesFromBack = 1, so read again from the back.
M M M X X X M X X M
-               | |
From back, 1/2 = .5, so is not over the .5 threshold; basesFromFront = 1, basesFromBack = 2, so read from the front.
M M M X X X M X X M
- -             | |
From front, 0/2 < .5, so is not over the .5 threshold; basesFromFront = 2, basesFromBack = 2, so read again from the front.
M M M X X X M X X M
- - -           | |
From front, 0/3 < .5, so is not over the .5 threshold; basesFromFront = 3, basesFromBack = 2, so read from the back.
M M M X X X M X X M
- - -         | | |
From back, 2/3 > .5, so over the .5 threshold; need to clip...basesFromFront = 3, basesFromBack = 0, so read from the back.
M M M X X X M X X M
- - -       | * * *
From back, 0/1 < .5, so is not over the .5 threshold; basesFromFront = 3, basesFromBack = 1, so read from the back.
M M M X X X M X X M
- - -     | | * * *
From back, 1/2 = .5, so is not over the .5 threshold; basesFromFront = 3, basesFromBack = 2, so read from the back.
M M M X X X M X X M
- - -   | | | * * *
From back, 2/3 > .5, so over the .5 threshold; need to clip...basesFromFront = 3, basesFromBack = 0, so read from the back.
M M M X X X M X X M
- - - | * * * * * *
From back, 1/1 > .5, so over the .5 threshold; need to clip...basesFromFront = 3, basesFromBack = 0, so read from the back.
M M M X X X M X X M
- - + * * * * * * *
From back, 0/1 < .5, so is not over the .5 threshold; basesFromFront = 3, basesFromBack = 1, so read from the back.
M M M X X X M X X M
- + + * * * * * * *
From back, 0/2 < .5, so is not over the .5 threshold; basesFromFront = 3, basesFromBack = 2, so read from the back.
M M M X X X M X X M
+ + + * * * * * * *
From back, 0/3 < .5, so is not over the .5 threshold; basesFromFront = 3, basesFromBack = 3, so read from the back.

In this example, there is no more to read from the front or the back.
The new CIGAR is 3M7S.
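
The walk-through above can be replayed with a short Python sketch. This is not BamUtil's actual code: the function names clip_by_mismatch and clipped_cigar are hypothetical, the rule "read from the end with fewer counted bases, and on a tie keep reading in the same direction" is inferred from the steps shown, and the original alignment is assumed to be a single run of M operations. The "next 10 bases" rule mentioned earlier is not modeled.

```python
def clip_by_mismatch(bases, threshold):
    """Return (front_clip, back_clip) for a string of 'M'/'X' characters.

    Sketch only: the direction rule is inferred from the worked example.
    """
    n = len(bases)
    front_pos, back_pos = 0, n - 1   # next index to read from each end
    front_clip = back_clip = 0       # number of bases clipped at each end
    front_bases = front_mm = 0       # counts since the last front clip
    back_bases = back_mm = 0         # counts since the last back clip
    from_front = True                # direction of the current read

    while True:
        can_front = front_pos < n - back_clip
        can_back = back_pos >= front_clip
        if not (can_front or can_back):
            break
        # Read from the end with fewer counted bases; on a tie keep direction.
        if front_bases != back_bases:
            from_front = front_bases < back_bases
        # Fall back to the other end when the preferred one is exhausted.
        if from_front and not can_front:
            from_front = False
        elif not from_front and not can_back:
            from_front = True

        if from_front:
            front_mm += bases[front_pos] == "X"
            front_bases += 1
            front_pos += 1
            if front_mm > threshold * front_bases:   # strictly over threshold
                front_clip = front_pos               # clip everything read so far
                front_bases = front_mm = 0
        else:
            back_mm += bases[back_pos] == "X"
            back_bases += 1
            back_pos -= 1
            if back_mm > threshold * back_bases:     # strictly over threshold
                back_clip = n - (back_pos + 1)       # clip everything read so far
                back_bases = back_mm = 0
    return front_clip, back_clip


def clipped_cigar(bases, threshold):
    """Build a CIGAR string, assuming the original CIGAR was all M."""
    front_clip, back_clip = clip_by_mismatch(bases, threshold)
    kept = len(bases) - front_clip - back_clip
    if kept <= 0:
        return f"{len(bases)}S"
    parts = [(front_clip, "S"), (kept, "M"), (back_clip, "S")]
    return "".join(f"{count}{op}" for count, op in parts if count)


print(clipped_cigar("MMMXXXMXXM", 0.5))
```

The comparison uses mismatches > threshold * bases rather than a division, so a fraction exactly equal to the threshold (such as 1/2 at .5) does not trigger a clip, matching the steps above.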