Changes

1,393 bytes added , 15:05, 2 February 2010

== Grouping ==

When evaluating read mappers, we should always focus on well-defined sets of reads:

* Reads with no polymorphisms.
* Reads with 1, 2, 3 or more SNPs.
* Reads with specific types of short indels (<10bp).
* Reads with larger structural variants (>100bp).

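SNPs and sequencing errors should be treated separately because SNPs can produce mismatches at high-quality bases, whereas sequencing errors are expected mostly at low-quality bases. In addition to summarizing results within each of the groups above, we could stratify results by the number of sequencing errors in each read.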
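The grouping above can be sketched in a few lines of Python. This is only an illustration: the <code>SimulatedRead</code> record and its field names are hypothetical, reads carrying an indel are grouped by the indel even if they also contain SNPs, and variants of 10 to 100bp (which the list does not cover) go into a separate catch-all group.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class SimulatedRead:
    # Hypothetical record of what the simulator placed in one read.
    name: str
    n_snps: int = 0
    max_indel_bp: int = 0  # length of the longest indel or variant, 0 if none
    n_errors: int = 0      # number of simulated sequencing errors


def variant_group(read):
    """Assign a read to one of the evaluation groups."""
    # Assumption: an indel or structural variant takes priority over SNPs.
    if read.max_indel_bp > 100:
        return "structural variant (>100bp)"
    if 0 < read.max_indel_bp < 10:
        return "short indel (<10bp)"
    if read.max_indel_bp > 0:
        return "other indel"  # 10-100bp, not covered by the list above
    if read.n_snps == 0:
        return "no polymorphisms"
    if read.n_snps >= 3:
        return "3+ SNPs"
    return "1 SNP" if read.n_snps == 1 else "2 SNPs"


def stratify(reads):
    """Count reads per (variant group, number of sequencing errors)."""
    return Counter((variant_group(r), r.n_errors) for r in reads)
```

Each cell of the resulting table can then be evaluated separately, so that a mapper's behaviour on error-free reads with SNPs is not averaged together with its behaviour on error-rich reads.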

== Bulk Statistics ==

* Speed (millions of reads per hour)
* Memory requirements
* Size of output files
* Raw count of mapped reads
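
As a rough sketch, the speed and output-size figures could be derived as follows. The function names are illustrative, and the elapsed time is assumed to be wall-clock seconds measured by whatever harness runs the mapper.

```python
import os


def millions_of_reads_per_hour(n_reads, elapsed_seconds):
    """Convert a read count and a wall-clock time into millions of reads per hour."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return (n_reads / 1e6) / (elapsed_seconds / 3600.0)


def total_output_bytes(paths):
    """Sum the on-disk size of all output files written by the mapper."""
    return sum(os.path.getsize(p) for p in paths)
```

For example, 5 million reads mapped in 30 minutes corresponds to 10 million reads per hour.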

== Mapping Accuracy ==

The key quantities are:

* How many reads were not mapped at all?
* How many reads were mapped incorrectly? '''This is the least desirable outcome'''.
* How many reads were mapped correctly?

Mapping correctness should be defined at the following levels:

* Most stringent: matches the simulated location and CIGAR string.
* Less stringent: overlaps the simulated location at the base-pair level; the CIGAR string and end positions may differ.
* Incorrect: does not overlap the simulated location.
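
The three levels, plus the unmapped case, could be implemented along these lines. This is a minimal sketch under stated assumptions: the <code>Placement</code> tuple is hypothetical, coordinates are 0-based half-open intervals on the reference, and strand is ignored.

```python
from typing import NamedTuple, Optional


class Placement(NamedTuple):
    # Hypothetical description of where a read sits on the reference.
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive
    cigar: str


def classify(truth: Placement, reported: Optional[Placement]) -> str:
    """Compare a reported alignment with the simulated (true) placement."""
    if reported is None:
        return "unmapped"
    if reported.chrom != truth.chrom:
        return "incorrect"
    # Most stringent: same start position and identical CIGAR string.
    if reported.start == truth.start and reported.cigar == truth.cigar:
        return "exact"
    # Less stringent: the two intervals share at least one base.
    if reported.start < truth.end and truth.start < reported.end:
        return "overlap"
    return "incorrect"
```

Counting the four outcomes per read group gives the unmapped, correctly mapped, and incorrectly mapped totals listed above, with "exact" and "overlap" both counting as correct under the less stringent definition.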

== Mapping Qualities ==

We should evaluate mapping qualities by counting, for each mapping quality threshold, how many reads are assigned that quality or greater and, among those, how many map correctly and how many map incorrectly. Plotting the number of correctly mapped reads against the number of mismapped reads at each threshold gives a Heng Li graph.
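
The cumulative counts behind such a graph could be computed as in the sketch below. It assumes each mapped read has already been reduced to a <code>(mapq, is_correct)</code> pair under one of the correctness definitions; the function name is illustrative.

```python
from collections import Counter


def mapq_curve(results):
    """Return (threshold, n_correct, n_incorrect) for reads with MAPQ >= threshold.

    `results` is an iterable of (mapq, is_correct) pairs for mapped reads.
    Thresholds are reported from the highest observed MAPQ downwards.
    """
    correct = Counter()
    wrong = Counter()
    for mapq, is_correct in results:
        if is_correct:
            correct[mapq] += 1
        else:
            wrong[mapq] += 1

    curve = []
    n_correct = n_wrong = 0
    for q in sorted(set(correct) | set(wrong), reverse=True):
        # Lowering the threshold to q admits every read with MAPQ exactly q.
        n_correct += correct[q]
        n_wrong += wrong[q]
        curve.append((q, n_correct, n_wrong))
    return curve
```

Plotting <code>n_incorrect</code> on the x-axis against <code>n_correct</code> on the y-axis, one point per threshold, gives one curve per mapper; a curve that rises steeply before moving right indicates well-calibrated mapping qualities.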

Goncalo


Evaluating a Read Mapper on Simulated Data
