Difference between revisions of "Mapping Quality Scores"

From Genome Analysis Wiki
 
Line 8:
  <math>
− MAPPING\_QUALITY = log_{10} \left ({1.0 - \frac {10^{-SUM\_BASE\_Q(best)}} {\sum_i 10^{-SUM\_BASE\_Q(i)}}} \right )
+ MAPPING\_QUALITY = - log_{10} \left ({1.0 - \frac {10^{-SUM\_BASE\_Q(best)}} {\sum_i 10^{-SUM\_BASE\_Q(i)}}} \right )
  </math>

Latest revision as of 03:21, 18 February 2010

Mapping Quality Scores quantify the probability that a read is misplaced. They were introduced by Li, Ruan and Durbin (2008) in their paper describing MAQ and are usually reported on a Phred scale.

Calculating a Mapping Quality Score

For a particular short sequence read, consider its best alignment to the genome. For this alignment, sum the base quality scores at the mismatched bases and call the result SUM_BASE_Q(best). Then consider all other possible alignments of the read. For each such alignment i, define SUM_BASE_Q(i) in the same way, as the sum of base quality scores at the mismatched bases of that alignment.
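As an illustration of the definition above, SUM_BASE_Q for a single alignment could be computed along the following lines. This is only a minimal sketch: the function name `sum_base_q` and its arguments are hypothetical, and it assumes an ungapped alignment in which the read, the aligned reference segment, and the list of numeric base qualities all have the same length.

```python
def sum_base_q(read, reference, base_qualities):
    """Sum the base qualities at positions where the read mismatches the reference.

    Assumes an ungapped alignment: read, reference and base_qualities
    must all have the same length (an assumption of this sketch).
    """
    if not (len(read) == len(reference) == len(base_qualities)):
        raise ValueError("read, reference and base_qualities must have equal length")
    return sum(
        quality
        for read_base, ref_base, quality in zip(read, reference, base_qualities)
        if read_base != ref_base
    )


# One mismatch (last base) with base quality 20.
print(sum_base_q("ACGT", "ACGA", [30, 30, 30, 20]))  # 20
```

An alignment with no mismatches gives SUM_BASE_Q = 0 under this sketch.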

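Then, the mapping quality is defined as:

<math>
MAPPING\_QUALITY = - \log_{10} \left ({1.0 - \frac {10^{-SUM\_BASE\_Q(best)}} {\sum_i 10^{-SUM\_BASE\_Q(i)}}} \right )
</math>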

The quantity <math>10^{-SUM\_BASE\_Q(i)}</math> approximates the probability of generating the read when alignment i is used as the template. For example, if an alignment has a single mismatch at a base with quality 20, the probability of sampling the read is approximately 0.01; with two mismatches at bases with quality 20, it is approximately 0.0001. These figures follow the Phred convention, in which a base quality Q corresponds to an error probability of <math>10^{-Q/10}</math>, which suggests that SUM_BASE_Q enters the exponent divided by 10. Because this quantity is effectively zero for most possible alignments, only the small subset of alignments with few mismatches needs to be considered when evaluating the denominator.
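The calculation can be sketched in a few lines of Python. This is an illustrative sketch rather than MAQ's implementation, and it makes two assumptions that are not spelled out in the formula as written: each SUM_BASE_Q is divided by 10 in the exponent (so that a base quality of 20 corresponds to 0.01, as in the example above), and the result is multiplied by 10 so that it is reported on the Phred scale. The function name `mapping_quality` is hypothetical.

```python
import math


def mapping_quality(sum_base_q_values):
    """Phred-scaled mapping quality from the SUM_BASE_Q of each candidate alignment.

    Assumptions of this sketch: qualities are Phred-scaled (10^(-Q/10)),
    the best alignment is the one with the smallest SUM_BASE_Q, and the
    result is multiplied by 10 to put it on the Phred scale.
    """
    if not sum_base_q_values:
        raise ValueError("at least one alignment is required")
    ordered = sorted(sum_base_q_values)
    best = ordered[0]
    # Likelihood of every other alignment relative to the best one;
    # working relative to the best avoids underflow for large sums.
    others = sum(10.0 ** (-(q - best) / 10.0) for q in ordered[1:])
    # Probability that the read is misplaced: 1 - best / total.
    p_misplaced = others / (1.0 + others)
    if p_misplaced == 0.0:
        return math.inf  # no competing alignment contributes
    return -10.0 * math.log10(p_misplaced)


# Perfect best hit, one alternative with a single Q20 mismatch.
print(round(mapping_quality([0, 20]), 2))   # 20.04
# Two equally good alignments: the read is misplaced half the time.
print(round(mapping_quality([20, 20]), 2))  # 3.01
```

Only alignments with few mismatches need to be passed in, since the others contribute almost nothing to the sum.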

For paired-end reads, we calculate SUM_BASE_Q as the sum of base quality scores at mismatched bases across both reads of the pair.
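For a read pair, the same sum is simply taken over both mates. The sketch below illustrates this under the same assumptions as before (ungapped alignments, numeric base qualities); `sum_base_q` and `paired_sum_base_q` are hypothetical helper names.

```python
def sum_base_q(read, reference, base_qualities):
    """Sum of base qualities at mismatched positions of one ungapped alignment."""
    if not (len(read) == len(reference) == len(base_qualities)):
        raise ValueError("read, reference and base_qualities must have equal length")
    return sum(q for r, g, q in zip(read, reference, base_qualities) if r != g)


def paired_sum_base_q(mate1, mate2):
    """SUM_BASE_Q for a pair: each mate is a (read, reference, base_qualities) tuple."""
    return sum_base_q(*mate1) + sum_base_q(*mate2)


mate1 = ("ACGT", "ACGA", [30, 30, 30, 20])  # one mismatch, quality 20
mate2 = ("TTGC", "TAGC", [25, 15, 25, 25])  # one mismatch, quality 15
print(paired_sum_base_q(mate1, mate2))  # 35
```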

Reference

Li H, Ruan J, Durbin R. (2008) Mapping short DNA sequencing reads and calling variants using mapping quality scores. Genome Research 18:1851-8.