Line 2: |
Line 2: |
| | | |
| (1) Empirical vs reported Phred score: | | (1) Empirical vs reported Phred score: |
| + | |
| Conditioning on the reported base quality, we count the number of bases that match or do not match the reference genome, and calculate the empirical quality as: -10 * log10 (total # of mismatched bases / total # of bases). In the following cases, bases are not used for calculating empirical qualities: | | Conditioning on the reported base quality, we count the number of bases that match or do not match the reference genome, and calculate the empirical quality as: -10 * log10 (total # of mismatched bases / total # of bases). In the following cases, bases are not used for calculating empirical qualities: |
| | | |
Line 13: |
Line 14: |
| | | |
| (2) Empirical Phred score by cycle: | | (2) Empirical Phred score by cycle: |
| + | |
| Conditioning on read cycle (e.g. first base, second base, and so on; be cautious with quality-trimmed or bar-coded reads, as the real cycle may differ), we calculate empirical quality as above. | | Conditioning on read cycle (e.g. first base, second base, and so on; be cautious with quality-trimmed or bar-coded reads, as the real cycle may differ), we calculate empirical quality as above. |
| If --region is specified, only bases falling in the target regions are counted. | | If --region is specified, only bases falling in the target regions are counted. |
| | | |
| (3) Mean depth vs. GC | | (3) Mean depth vs. GC |
| + | |
| We count depth over the whole genome or over the specified region (--region). | | We count depth over the whole genome or over the specified region (--region). |
| Default GC window size is 100. | | Default GC window size is 100. |
Line 24: |
Line 27: |
| | | |
| (4) Insert size | | (4) Insert size |
| + | |
| For mapped paired-end reads, the insert size distribution is plotted. Otherwise, this graph will be empty. | | For mapped paired-end reads, the insert size distribution is plotted. Otherwise, this graph will be empty. |
| Specifying --region will not affect this graph. | | Specifying --region will not affect this graph. |
| | | |
| (5) Empirical Q20 bases count by cycle | | (5) Empirical Q20 bases count by cycle |
| + | |
| We count the number of Q20 bases (bases whose quality is larger than 20), conditioning on cycle number. | | We count the number of Q20 bases (bases whose quality is larger than 20), conditioning on cycle number. |
| If specifying --regions, only bases in the target regions are counted. In that case, some reads will have their head and tail outside of the region, so you will likely see a parabolic shape. | | If specifying --regions, only bases in the target regions are counted. In that case, some reads will have their head and tail outside of the region, so you will likely see a parabolic shape. |
| | | |
| (6) Flag stats | | (6) Flag stats |
| + | |
| We count the number of reads in these categories: total, mapped, paired, proper paired, duplicated, QC failed. | | We count the number of reads in these categories: total, mapped, paired, proper paired, duplicated, QC failed. |
| These categories are determined by the FLAG field of the reads in each BAM file. | | These categories are determined by the FLAG field of the reads in each BAM file. |
| | | |
| (7) Mean depth of sequencing | | (7) Mean depth of sequencing |
− | Total mapped bases divided by total number of positions that are covered by at least one base. | + | |
| + | Total mapped bases divided by total number of positions that are covered by at least one base. The y-axis percentage is calculated as the number of sites divided by the total number of sites (e.g. for whole genome, the total genome length; for targeted sequencing, the total length of all targeted regions). |
| | | |
| (8) Empirical Q20 count | | (8) Empirical Q20 count |
| + | |
| We examine each base by its reported base quality; if that reported base quality corresponds to an empirical base quality better than Phred score 20, then we count it once as a Q20 base. | | We examine each base by its reported base quality; if that reported base quality corresponds to an empirical base quality better than Phred score 20, then we count it once as a Q20 base. |
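
The empirical quality in (1) and the empirical Q20 count in (8) can be illustrated with a small sketch. This is not the tool's own code. It assumes the standard Phred definition, Q = -10 * log10(error rate), with the error rate taken as mismatched bases divided by total bases for each reported quality. The function names, the `(reported_quality, is_mismatch)` input format, and the strict `>` comparison against 20 are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def empirical_phred(mismatches, total):
    """Standard Phred score for an observed mismatch rate.

    Returns None when there are no bases, and infinity when no
    mismatch was observed (the rate is zero, so the log is undefined).
    """
    if total == 0:
        return None
    if mismatches == 0:
        return math.inf
    return -10.0 * math.log10(mismatches / total)


def empirical_by_reported(observations):
    """Map each reported quality to its empirical quality.

    `observations` is a hypothetical input format: an iterable of
    (reported_quality, is_mismatch) pairs, one per usable base.
    """
    counts = defaultdict(lambda: [0, 0])  # reported Q -> [mismatches, total]
    for reported_q, is_mismatch in observations:
        counts[reported_q][0] += 1 if is_mismatch else 0
        counts[reported_q][1] += 1
    return {q: empirical_phred(m, t) for q, (m, t) in counts.items()}


def empirical_q20_count(observations, threshold=20.0):
    """Count bases whose reported quality maps to an empirical quality
    above `threshold`. The strict comparison is an assumption."""
    observations = list(observations)
    empirical = empirical_by_reported(observations)
    return sum(1 for reported_q, _ in observations
               if empirical[reported_q] > threshold)
```

With 1000 bases reported at Q30 of which one mismatches, the Q30 bin has an empirical quality of about 30, so all 1000 bases count toward the empirical Q20 total. A bin reported at Q10 with one mismatch in ten bases has an empirical quality of about 10 and contributes nothing.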