== Grouping ==

When evaluating read mappers, we should always focus on well-defined sets of reads:

*Reads with no polymorphisms.
*Reads with 1, 2, 3 or more SNPs.
*Reads with specific types of short indels (<10bp).
*Reads with larger structural variants (>100bp).

SNPs and sequencing errors are different because SNPs can lead to mismatches in high-quality bases. In addition to grouping results by the categories above, we could separate results by the number of errors in each read.

Results should also be grouped according to whether reads are '''paired-end''' or '''single-end''' and according to '''read length'''.

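As a minimal sketch of the grouping above, the following assigns each simulated read to a group key. The record type and all field names (<code>n_snps</code>, <code>largest_variant</code>, and so on) are hypothetical and would need to be adapted to whatever the read simulator reports. Two choices here are assumptions rather than part of the scheme above: indels and structural variants take precedence over SNPs when a read carries both, and variants of 10-100bp go into a separate catch-all class.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class SimulatedRead:
    # Hypothetical record; field names are illustrative only.
    name: str
    n_snps: int           # number of SNPs overlapping the read
    largest_variant: int  # length (bp) of the largest indel/SV, 0 if none
    paired: bool
    length: int           # read length in bp


def variant_class(read):
    """Assign a read to one of the variant categories from the list above."""
    if read.largest_variant > 100:
        return "structural variant (>100bp)"
    if 0 < read.largest_variant < 10:
        return "short indel (<10bp)"
    if read.largest_variant > 0:
        return "other indel (10-100bp)"  # assumption: not in the list above
    if read.n_snps == 0:
        return "no polymorphisms"
    if read.n_snps >= 3:
        return "3+ SNPs"
    return f"{read.n_snps} SNP" + ("s" if read.n_snps > 1 else "")


def group_key(read):
    """Full group: variant class, library type and read length."""
    library = "paired-end" if read.paired else "single-end"
    return (variant_class(read), library, read.length)


def group_sizes(reads):
    """Count how many reads fall into each group."""
    return Counter(group_key(r) for r in reads)
```

Results for each mapper could then be reported per group key rather than as a single pooled number.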
== Bulk Statistics ==

*Speed (millions of reads per hour)
*Memory requirements
*Size of output files
*Raw count of mapped reads

== Mapping Accuracy ==

The key quantities are:

*How many reads were not mapped at all?
*How many reads were mapped incorrectly? '''This is the least desirable outcome'''.
*How many reads were mapped correctly?

Correct mapping should be defined as:

*Most stringent: matches the simulated location and CIGAR string.
*Less stringent: overlaps the simulated location at the base-pair level; the CIGAR string and end positions may differ.
*Incorrect: does not overlap the simulated location.

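The definitions above can be sketched as a small classifier. This is an illustration under stated assumptions, not a reference implementation: <code>Placement</code> is a hypothetical record holding a chromosome name, a 1-based leftmost position and a CIGAR string, and strand is ignored because the definitions above do not mention it. The reference span is derived from the CIGAR operations that consume reference bases (M, D, N, = and X).

```python
import re
from typing import NamedTuple, Optional

_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
_REF_CONSUMING = set("MDN=X")  # operations that advance along the reference


class Placement(NamedTuple):
    # Hypothetical record for a simulated or reported alignment.
    chrom: str
    pos: int    # 1-based leftmost reference position
    cigar: str


def ref_span(p):
    """Half-open reference interval [start, end) covered by a placement."""
    length = sum(int(n) for n, op in _CIGAR_OP.findall(p.cigar)
                 if op in _REF_CONSUMING)
    return p.pos, p.pos + length


def classify(simulated, mapped: Optional[Placement]):
    """Return 'unmapped', 'correct (stringent)', 'correct (overlap)' or 'incorrect'."""
    if mapped is None:
        return "unmapped"
    if mapped == simulated:
        # Same chromosome, same position and same CIGAR string.
        return "correct (stringent)"
    if mapped.chrom == simulated.chrom:
        s0, s1 = ref_span(simulated)
        m0, m1 = ref_span(mapped)
        if m0 < s1 and s0 < m1:
            # At least one reference base is shared.
            return "correct (overlap)"
    return "incorrect"
```

Counting the four outcomes over all reads in a group gives the key quantities listed under Mapping Accuracy.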
== Mapping Qualities ==

We should evaluate mapping qualities by counting how many reads are assigned each mapping quality (or greater) and, among those, how many map correctly or incorrectly. This gives a Heng Li graph, where one plots the number of correctly mapped reads vs. the number of mismapped reads.
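
The cumulative counts behind such a graph could be computed roughly as follows. The input format, an iterable of <code>(mapq, is_correct)</code> pairs for mapped reads, is an assumption for illustration; the correctness flag would come from whichever definition of correct mapping is in use.

```python
def mapq_curve(reads):
    """Cumulative correct/mismapped counts per mapping-quality threshold.

    reads: iterable of (mapq, is_correct) pairs, one per mapped read.
    Returns a list of (threshold, n_correct, n_mismapped) tuples, ordered
    from the highest threshold to the lowest, where the counts cover all
    reads with mapping quality >= threshold.
    """
    # Tally correct and mismapped reads at each exact mapping quality.
    per_q = {}
    for mapq, ok in reads:
        counts = per_q.setdefault(mapq, [0, 0])
        counts[0 if ok else 1] += 1

    # Accumulate from the highest quality downwards.
    curve, n_ok, n_bad = [], 0, 0
    for q in sorted(per_q, reverse=True):
        n_ok += per_q[q][0]
        n_bad += per_q[q][1]
        curve.append((q, n_ok, n_bad))
    return curve


example = [(60, True), (60, True), (30, True), (30, False), (0, False)]
print(mapq_curve(example))  # [(60, 2, 0), (30, 3, 1), (0, 3, 2)]
```

Each tuple supplies one point of the graph: the number of mismapped reads and the number of correctly mapped reads at that threshold.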