# Difference between revisions of "Mapping Quality Scores"

| Line 8: | Line 8: |
|---|---|
| `<math>` | `<math>` |
| − `MAPPING\_QUALITY = log_{10} \left ({1.0 - \frac {10^{-SUM\_BASE\_Q(best)}} {\sum_i 10^{-SUM\_BASE\_Q(i)}}} \right )` | + `MAPPING\_QUALITY = - log_{10} \left ({1.0 - \frac {10^{-SUM\_BASE\_Q(best)}} {\sum_i 10^{-SUM\_BASE\_Q(i)}}} \right )` |
| `</math>` | `</math>` |
## Latest revision as of 03:21, 18 February 2010

**Mapping Quality Scores** quantify the probability that a read is misplaced. They were introduced by Heng Li and Richard Durbin in their paper describing MAQ and are usually reported on a Phred scale.

## Calculating a Mapping Quality Score

For a particular short sequence read, consider its best alignment in the genome. For this alignment, sum the base quality scores at the mismatched bases and call the result *SUM_BASE_Q(best)*. Then consider every other possible alignment of the read: for alignment *i*, define *SUM_BASE_Q(i)* as the sum of base quality scores at the mismatched bases of that alignment.
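As a minimal sketch of the quantity just defined, the function below sums base qualities at mismatched positions. It assumes an ungapped alignment in which the read and the reference segment have equal length, and integer base qualities; the name `sum_base_q` and its parameters are hypothetical and are not taken from MAQ.

```python
def sum_base_q(read, reference, base_qualities):
    """Sum the base qualities at positions where read and reference differ.

    Assumes an ungapped alignment: read, reference and base_qualities
    must all have the same length.
    """
    if not (len(read) == len(reference) == len(base_qualities)):
        raise ValueError("read, reference and base_qualities must have equal length")
    return sum(
        quality
        for read_base, ref_base, quality in zip(read, reference, base_qualities)
        if read_base != ref_base
    )


# One mismatch (G vs C at the third base) with base quality 20.
print(sum_base_q("ACGT", "ACCT", [30, 30, 20, 30]))  # 20
```

A perfect match contributes nothing, so its sum is 0.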

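Then the mapping quality is defined as:

<math>
MAPPING\_QUALITY = - \log_{10} \left( 1.0 - \frac{10^{-SUM\_BASE\_Q(best)}}{\sum_i 10^{-SUM\_BASE\_Q(i)}} \right)
</math>

Here the sum in the denominator runs over all candidate alignments *i*, including the best one.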

Each term <math>10^{-SUM\_BASE\_Q(i)}</math> approximates the probability of generating the read when alignment *i* is used as the template. For example, taking base qualities on the Phred scale (a quality of 20 corresponds to an error probability of 10^(-20/10) = 0.01), a single mismatch at a base with quality 20 gives a probability of about 0.01 of sampling the read; two such mismatches give about 0.0001. Because this quantity is effectively zero for most possible alignments, only the small subset of alignments with few mismatches needs to be considered when evaluating the denominator.
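The definition can be sketched in a few lines of Python. This is an illustration under stated assumptions, not MAQ's implementation: the function name and parameters are hypothetical, the denominator is assumed to include the best alignment, and the `phred_scaled` flag (an assumption chosen to match the worked example, where quality 20 maps to 0.01) divides each sum by 10 before exponentiating. The sketch computes the misplacement probability as the share of the denominator contributed by the other alignments, which is algebraically the same as 1 minus the best alignment's share.

```python
import math


def mapping_quality(sum_base_q_best, sum_base_q_others, phred_scaled=True):
    """Return -log10 of the approximate probability that the read is misplaced.

    sum_base_q_best   -- SUM_BASE_Q for the best alignment
    sum_base_q_others -- SUM_BASE_Q values for the other candidate alignments
    phred_scaled      -- assumption: sums are Phred scores, so the exponent is -Q/10
    """
    scale = 10.0 if phred_scaled else 1.0
    best = 10.0 ** (-sum_base_q_best / scale)
    others = sum(10.0 ** (-q / scale) for q in sum_base_q_others)
    # Equivalent to 1.0 - best / (best + others), without the subtraction.
    p_misplaced = others / (best + others)
    if p_misplaced == 0.0:
        return math.inf  # no competing alignment was considered
    return -math.log10(p_misplaced)


# Perfect best hit versus one alternative with a single Q20 mismatch.
print(mapping_quality(0, [20]))  # about 2.0
```

Two equally good alignments give a misplacement probability of 0.5, so the score drops to about 0.3. The sketch does not guard against floating-point underflow for very large sums.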

For paired end reads, we calculate SUM_BASE_Q as the sum of base quality scores at mismatched bases for both reads.
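For illustration, the paired-end rule amounts to adding the mismatch qualities of the two mates. The sketch below assumes ungapped alignments and represents each mate as a hypothetical `(read, reference, base_qualities)` tuple; none of these names come from MAQ.

```python
def paired_sum_base_q(mates):
    """Sum base qualities at mismatched bases across both reads of a pair.

    mates -- iterable of (read, reference, base_qualities) tuples, one per mate,
             each assumed to describe an ungapped alignment of equal lengths.
    """
    total = 0
    for read, reference, base_qualities in mates:
        if not (len(read) == len(reference) == len(base_qualities)):
            raise ValueError("each mate needs equal-length read, reference and qualities")
        total += sum(
            quality
            for read_base, ref_base, quality in zip(read, reference, base_qualities)
            if read_base != ref_base
        )
    return total


pair = [
    ("ACGT", "ACCT", [30, 30, 20, 30]),  # one mismatch, quality 20
    ("TTGA", "TTGC", [25, 25, 25, 15]),  # one mismatch, quality 15
]
print(paired_sum_base_q(pair))  # 35
```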

## Reference

Li H, Ruan J, Durbin R. (2008) Mapping short DNA sequencing reads and calling variants using mapping quality scores. *Genome Research* **18**:1851-8.