Changes

139 bytes added , 23:19, 8 September 2010

no edit summary

Line 1: Line 1:

−

== Grouping ==

+

== Grouping ==

−

When evaluating read mappers, we should always focus on well defined sets of reads:

+

When evaluating read mappers, we should always focus on well defined sets of reads:

−

* Reads with no polymorphisms.

+

*Reads with no polymorphisms.

−

* Reads with 1, 2, 3 or more SNPs.

+

*Reads with 1, 2, 3 or more SNPs.

−

* Reads with specific types of short indels (<10bp).

+

*Reads with specific types of short indels (<10bp).

−

* Reads with larger structural variants (>100bp).

+

*Reads with larger structural variants (>100bp).

−

SNPs and errors are different because SNPs can lead to mismatches in high-quality bases. In addition to integrating according to the metrics above, we could separate results by the number of errors in each read.

+

SNPs and errors are different because SNPs can lead to mismatches in high-quality bases. In addition to integrating according to the metrics above, we could separate results by the number of errors in each read.

−

== Bulk Statistics ==

+

Results should also be grouped according to whether reads are '''paired-end''' or '''single-end''' and according to '''read length'''.

−

* Speed (millions of reads per hour)

+

== Bulk Statistics ==

−

* Memory requirements

−

* Size of output files

−

* Raw count of mapped reads

−

== Mapping Accuracy ==

+

*Speed (millions of reads per hour)

+

*Memory requirements

+

*Size of output files

+

*Raw count of mapped reads

−

The key quantities are:

+

== Mapping Accuracy ==

−

* How many reads were not mapped at all?

+

The key quantities are:

−

* How many reads were mapped incorrectly? '''This is the least desirable outcome'''.

−

* How many reads were mapped correctly?

−

Correct mapping should be defined as:

+

*How many reads were not mapped at all?

+

*How many reads were mapped incorrectly? '''This is the least desirable outcome'''.

+

*How many reads were mapped correctly?

−

* Most stringent: matches simulated location and CIGAR string.

+

Correct mapping should be defined as:

−

* Less stringent: overlaps simulated location at base-pair level, CIGAR string and end positions may differ.

−

* Incorrect: Doesn't overlap simulated location.

−

== Mapping Qualities ==

+

*Most stringent: matches simulated location and CIGAR string.

+

*Less stringent: overlaps simulated location at base-pair level, CIGAR string and end positions may differ.

+

*Incorrect: Doesn't overlap simulated location.
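
The three levels of correctness listed above can be sketched as a small classifier. This is only an illustration under stated assumptions: the <code>Alignment</code> record, its field names, the 0-based half-open coordinates and the category labels are hypothetical and not part of any mapper's output format, and strand is ignored for brevity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Alignment:
    """Hypothetical record for a simulated or reported alignment."""
    chrom: str
    start: int  # 0-based, inclusive (assumed convention)
    end: int    # 0-based, exclusive (assumed convention)
    cigar: str


def classify(mapped: Optional[Alignment], truth: Alignment) -> str:
    """Return 'unmapped', 'exact', 'overlap' or 'incorrect'."""
    if mapped is None:
        return "unmapped"
    same_chrom = mapped.chrom == truth.chrom
    # Most stringent: same start position and same CIGAR string.
    if same_chrom and mapped.start == truth.start and mapped.cigar == truth.cigar:
        return "exact"
    # Less stringent: at least one base pair shared with the simulated location.
    if same_chrom and mapped.start < truth.end and truth.start < mapped.end:
        return "overlap"
    # Anything else does not overlap the simulated location.
    return "incorrect"
```

Counting the returned labels over all reads in a group gives the number of unmapped, correctly mapped and incorrectly mapped reads for that group.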

+

== Mapping Qualities ==

We should evaluate mapping qualities by counting, for each mapping quality threshold, how many reads are assigned that quality or greater and, among those, how many map correctly and how many map incorrectly. This gives a Heng Li graph, which plots the number of correctly mapped reads against the number of mismapped reads.
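
The cumulative counts behind such a graph could be computed roughly as follows. This is a minimal sketch: the function name <code>mapq_curve</code> and the input format (one <code>(mapq, is_correct)</code> pair per mapped read) are assumptions made for illustration.

```python
def mapq_curve(results):
    """Cumulative correct/incorrect counts per mapping quality threshold.

    results: iterable of (mapq, is_correct) pairs, one per mapped read.
    Returns a list of (threshold, n_correct, n_incorrect) tuples, ordered
    from the highest threshold to the lowest, where the counts cover all
    reads with mapping quality >= threshold.
    """
    # Tally [correct, incorrect] counts for each distinct mapping quality.
    by_quality = {}
    for mapq, is_correct in results:
        counts = by_quality.setdefault(mapq, [0, 0])
        counts[0 if is_correct else 1] += 1

    # Accumulate from the highest quality downwards.
    curve = []
    n_correct = n_incorrect = 0
    for mapq in sorted(by_quality, reverse=True):
        n_correct += by_quality[mapq][0]
        n_incorrect += by_quality[mapq][1]
        curve.append((mapq, n_correct, n_incorrect))
    return curve
```

Plotting <code>n_correct</code> against <code>n_incorrect</code> for each returned tuple gives one curve per mapper and per read group.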

Goncalo


Evaluating a Read Mapper on Simulated Data (view source)

Revision as of 23:19, 8 September 2010
