Mapping Quality Scores

From Genome Analysis Wiki

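Mapping Quality Scores quantify the probability that a read is misplaced, that is, aligned to the wrong location in the genome. They were introduced by Heng Li, Jue Ruan and Richard Durbin in their paper describing MAQ and are usually reported on a Phred scale, where a score Q corresponds to a misplacement probability of 10^{-Q/10} (for example, Q = 30 means a 1 in 1,000 chance that the read is misplaced).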
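As a small illustration of the Phred convention, the helpers below convert between a misplacement probability p and a Phred-scaled score Q = -10 log10(p). The function names `phred_from_prob` and `prob_from_phred` are hypothetical and are not part of MAQ or any other aligner.

```python
import math


def phred_from_prob(p_wrong: float) -> float:
    """Convert a misplacement probability to a Phred-scaled score, Q = -10 * log10(p)."""
    if not 0.0 < p_wrong <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -10.0 * math.log10(p_wrong)


def prob_from_phred(q: float) -> float:
    """Convert a Phred-scaled score back to a probability, p = 10^(-Q/10)."""
    return 10.0 ** (-q / 10.0)


q = phred_from_prob(0.001)  # a 1 in 1,000 chance of misplacement
p = prob_from_phred(20)     # a Phred score of 20
```

Each additional 10 Phred units corresponds to a tenfold drop in the probability that the read is misplaced.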

Calculating a Mapping Quality Score

For a particular short sequence read, consider its best alignment in the genome, the one with the smallest total quality at mismatched bases. For this alignment, sum the base quality scores at the mismatched bases and call the result SUM_BASE_Q(best). Then consider every possible alignment of the read, including the best one. For alignment i, define SUM_BASE_Q(i) as the sum of base quality scores at the mismatched bases of that alignment.
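The sketch below computes SUM_BASE_Q(i) for every candidate placement of a read on a small reference. It assumes ungapped alignments on the forward strand only (no indels, no reverse complement), and the names `sum_base_q` and `candidate_sums` are hypothetical; a real aligner would only score a handful of candidate loci rather than every offset.

```python
from typing import Dict, Sequence


def sum_base_q(read: str, quals: Sequence[int], ref_window: str) -> int:
    """Sum the base qualities at positions where the read disagrees with the reference.

    Assumes an ungapped alignment, so ref_window has the same length as the read.
    """
    if not (len(read) == len(quals) == len(ref_window)):
        raise ValueError("read, qualities and reference window must have equal length")
    return sum(q for base, q, ref in zip(read, quals, ref_window)
               if base.upper() != ref.upper())


def candidate_sums(read: str, quals: Sequence[int], reference: str) -> Dict[int, int]:
    """SUM_BASE_Q(i) for every ungapped placement i of the read on the forward strand."""
    n = len(read)
    return {pos: sum_base_q(read, quals, reference[pos:pos + n])
            for pos in range(len(reference) - n + 1)}


sums = candidate_sums("ACGT", [30, 30, 20, 30], "ACGTTTACCT")
best_pos = min(sums, key=sums.get)  # placement with the smallest SUM_BASE_Q
```

In this toy example, position 0 matches exactly (SUM_BASE_Q = 0), position 6 differs only at the Q20 base (SUM_BASE_Q = 20), and every other placement accumulates 80 or more.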

Then, the mapping quality is defined as:


MAPPING\_QUALITY = -10 \log_{10} \left( 1 - \frac{10^{-SUM\_BASE\_Q(best)/10}}{\sum_i 10^{-SUM\_BASE\_Q(i)/10}} \right)

The quantity 10^{-SUM\_BASE\_Q(i)/10} approximates the probability of generating the observed read when alignment i is used as the template. For example, a single mismatch with base quality 20 gives a probability of about 10^{-20/10} = 0.01; two mismatches with base quality 20 give about 10^{-40/10} = 0.0001. Because this quantity is effectively zero for most possible alignments, only a small subset of them (those with few mismatches) needs to be considered when evaluating the denominator.
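A minimal sketch of the mapping-quality calculation, assuming the Phred convention 10^(-SUM_BASE_Q/10) for each alignment's likelihood, as the worked examples imply (one Q20 mismatch giving about 0.01). The function name `mapping_quality` and the `cap` parameter are hypothetical: the cap is only an illustrative ceiling used when a read has no competing alignment, where the formula would otherwise diverge. Likelihoods are rescaled relative to the best alignment, which leaves the ratio unchanged but avoids underflow and the precision loss of computing 1 minus a number very close to 1.

```python
import math
from typing import Sequence


def mapping_quality(sums: Sequence[float], cap: float = 60.0) -> float:
    """Phred-scaled mapping quality from SUM_BASE_Q of every candidate alignment.

    `sums` must include the best (smallest) alignment itself.
    `cap` is a hypothetical ceiling applied when no competitor exists.
    """
    if not sums:
        raise ValueError("need at least one candidate alignment")
    ordered = sorted(sums)
    best = ordered[0]
    # Relative likelihood of each competing alignment; the best one has likelihood 1.
    others = sum(10.0 ** (-(s - best) / 10.0) for s in ordered[1:])
    # 1 - L_best / sum_i L_i, rewritten as others / (1 + others).
    p_wrong = others / (1.0 + others)
    if p_wrong == 0.0:
        return cap
    return min(cap, -10.0 * math.log10(p_wrong))


unique_hit = mapping_quality([0, 20])        # one competitor with a single Q20 mismatch
repeat_hit = mapping_quality([0, 0])         # two equally good placements
far_extra = mapping_quality([0, 20, 200])    # adds a candidate with many mismatches
```

Two equally good placements give about 3 (a coin flip), while a single competitor one Q20 mismatch away gives about 20. Adding a candidate with SUM_BASE_Q = 200 leaves the result essentially unchanged, which is why distant alignments can be ignored in the denominator.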

For paired-end reads, SUM_BASE_Q is calculated as the sum of base quality scores at mismatched bases across both reads of the pair.
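A minimal sketch of the paired-end rule, summing mismatch qualities over both mates. The function name `pair_sum_base_q` and the tuple layout are hypothetical, and each mate is assumed to have an ungapped alignment against a reference window already taken from the strand it aligned to.

```python
from typing import Sequence, Tuple

# One mate's alignment: (read bases, Phred base qualities, reference window it is placed on).
MateAlignment = Tuple[str, Sequence[int], str]


def pair_sum_base_q(mate1: MateAlignment, mate2: MateAlignment) -> int:
    """SUM_BASE_Q for a read pair: mismatch qualities summed over both mates."""
    total = 0
    for read, quals, ref_window in (mate1, mate2):
        if not (len(read) == len(quals) == len(ref_window)):
            raise ValueError("each mate needs equal-length read, qualities and reference window")
        total += sum(q for base, q, ref in zip(read, quals, ref_window)
                     if base.upper() != ref.upper())
    return total


pair_total = pair_sum_base_q(("ACGT", [30, 30, 20, 30], "ACCT"),
                             ("GGCA", [25, 25, 25, 25], "GGCT"))
```

Here the first mate contributes 20 and the second 25; the pair total then takes the place of SUM_BASE_Q(i) in the mapping-quality formula.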

Reference

Li H, Ruan J, Durbin R. (2008) Mapping short DNA sequencing reads and calling variants using mapping quality scores. Genome Research 18:1851-8.