From Genome Analysis Wiki
670 bytes added, 02:48, 19 December 2009
Line 7: |
Line 7: |
| Then, the mapping quality is defined as: | | Then, the mapping quality is defined as: |
| | | |
− | MapQuality = SumBaseQual(best) / (Sigma_i (SumBaseQual(i))
| + | <math> |
| + | MAPPING\_QUALITY = -10 \log_{10} \left ({1.0 - \frac {10^{-SUM\_BASE\_Q(best)/10}} {\sum_i 10^{-SUM\_BASE\_Q(i)/10}}} \right ) |
| + | </math> |
| | | |
| + | The quantity <math>10^{-SUM\_BASE\_Q(i)/10}</math> approximates the probability of generating a particular read when alignment ''i'' is used as the template. For example, with a single mismatch at base quality 20, the probability of sampling the read is approximated as ~0.01; with two mismatches, each at base quality 20, the approximation becomes ~0.0001. Because this quantity is effectively zero for most possible alignments, only a small subset of them (those with few mismatches) needs to be considered when evaluating the denominator. |
| | | |
| For paired end reads, we calculate SUM_BASE_Q as the sum of base quality scores at mismatched bases for both reads. | | For paired end reads, we calculate SUM_BASE_Q as the sum of base quality scores at mismatched bases for both reads. |
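The calculation described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the aligner's actual code: the function names, the `cap` value of 255, and the Phred-style scaling (dividing the summed qualities by 10 and multiplying the logarithm by -10) are assumptions, chosen so that a single Q20 mismatch gives a read probability of about 0.01, as in the example in the text.

```python
import math


def read_likelihood(mismatch_quals):
    """Approximate P(read | alignment) as 10^(-SUM_BASE_Q / 10).

    mismatch_quals holds the base qualities at mismatched positions only.
    For paired-end reads, pass the mismatch qualities of both reads together.
    """
    return 10.0 ** (-sum(mismatch_quals) / 10.0)


def mapping_quality(candidates, cap=255.0):
    """Phred-scaled probability that the best candidate alignment is wrong.

    candidates: one list of mismatch base qualities per candidate alignment.
    cap: hypothetical upper bound returned when no competing alignment exists.
    """
    likelihoods = [read_likelihood(q) for q in candidates]
    best_idx = max(range(len(likelihoods)), key=likelihoods.__getitem__)
    best = likelihoods[best_idx]
    # Sum the competing alignments directly to avoid cancellation in 1 - best/total.
    others = sum(l for i, l in enumerate(likelihoods) if i != best_idx)
    p_wrong = others / (others + best)
    if p_wrong <= 0.0:
        return cap
    return min(cap, -10.0 * math.log10(p_wrong))


# One candidate with a single Q20 mismatch, one with two Q20 mismatches.
mq = mapping_quality([[20], [20, 20]])
```

Only candidates with few mismatches need to appear in `candidates`, since the remaining alignments contribute almost nothing to the denominator.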