Changes

194 bytes added, 16:20, 21 November 2011

no edit summary

Line 2: Line 2:

(1) Empirical vs reported Phred score:

+

Conditioning on the reported base quality, we count the total number of bases that match or do not match the reference genome, and calculate the empirical quality as: -10 * log10(total # of mismatched bases / total # of bases). In the following cases, a base is not used for calculating empirical qualities:

Line 13: Line 14:

(2) Empirical Phred score by cycle:

+

Conditioning on read cycle (e.g. first base, second base, and so on), we calculate the empirical quality as above. Be cautious with quality-trimmed or bar-coded reads, as the real cycle may differ.

If --region is specified, only bases falling in the target regions are counted.
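The grouping described in (1) and (2) can be sketched as below. This is a minimal illustration, not QPLOT's implementation: it assumes the standard Phred definition (-10 * log10 of the mismatch rate), and the function names and the `(key, is_mismatch)` input format are hypothetical.

```python
import math
from collections import defaultdict

def empirical_phred(mismatches, total):
    """Phred-scaled mismatch rate; None when no bases were observed."""
    if total == 0:
        return None
    if mismatches == 0:
        return float("inf")  # no observed errors, so the quality is unbounded
    return -10 * math.log10(mismatches / total)

def empirical_by_key(observations):
    """observations: iterable of (key, is_mismatch) pairs.

    The key is a reported base quality for plot (1) or a read cycle for
    plot (2). This input format is an assumption made for illustration.
    """
    counts = defaultdict(lambda: [0, 0])  # key -> [mismatches, total]
    for key, is_mismatch in observations:
        counts[key][0] += int(is_mismatch)
        counts[key][1] += 1
    return {k: empirical_phred(m, t) for k, (m, t) in counts.items()}
```

Bases that should be excluded, or that fall outside the target regions, would simply be left out of `observations` before calling `empirical_by_key`.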

(3) Mean depth vs. GC

+

We count depth over the whole genome or over the specified regions (--region).

Default GC window size is 100.
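One way to picture the depth vs. GC calculation is the sketch below. It assumes non-overlapping windows and GC content rounded to a whole percent; both choices, and the function name `mean_depth_by_gc`, are assumptions for illustration and may differ from what QPLOT actually does.

```python
from collections import defaultdict

def mean_depth_by_gc(reference, depth, window=100):
    """Mean depth of windows grouped by their GC percentage.

    reference: reference sequence as a string.
    depth: per-position depths, same length as reference.
    window: GC window size (100 matches the default mentioned above).
    """
    sums = defaultdict(lambda: [0.0, 0])  # GC percent -> [sum of window depths, windows]
    for start in range(0, len(reference) - window + 1, window):
        seq = reference[start:start + window].upper()
        gc_percent = round(100 * sum(base in "GC" for base in seq) / window)
        window_depth = sum(depth[start:start + window]) / window
        sums[gc_percent][0] += window_depth
        sums[gc_percent][1] += 1
    return {gc: total / n for gc, (total, n) in sorted(sums.items())}
```

For a region-restricted run, only the windows inside the target regions would be passed in.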

Line 24: Line 27:

(4) Insert size

+

For mapped paired-end reads, the insert size distribution is plotted. Otherwise, this graph is empty.

Specifying --region will not affect this graph.
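A rough sketch of how such a distribution could be collected from the FLAG and TLEN fields of SAM/BAM records follows. The flag bit values come from the SAM format; the `(flag, tlen)` input format, the function name, and the choice to count each pair once via its positive TLEN are assumptions for illustration.

```python
from collections import Counter

PAIRED, UNMAPPED, MATE_UNMAPPED = 0x1, 0x4, 0x8  # SAM FLAG bits

def insert_size_histogram(records):
    """records: iterable of (flag, tlen) tuples from the FLAG and TLEN fields."""
    hist = Counter()
    for flag, tlen in records:
        # Skip single-end reads and pairs where either mate is unmapped.
        if not flag & PAIRED or flag & (UNMAPPED | MATE_UNMAPPED):
            continue
        if tlen > 0:  # count each pair once, from the mate with positive TLEN
            hist[tlen] += 1
    return hist
```

With single-end input every record is skipped and the histogram stays empty, which matches the empty graph described above.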

(5) Empirical Q20 bases count by cycle

+

We count the number of Q20 bases (bases whose quality is larger than 20), conditioning on cycle number.

If --regions is specified, only bases in the target regions are counted. In that case, some reads will have their head and tail outside of the region, so you will likely see a parabolic shape.
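The per-cycle count can be sketched as follows. The input format (one list of reported base qualities per read, in cycle order) and the function name are hypothetical; the "larger than 20" comparison follows the wording above.

```python
from collections import Counter

def q20_count_by_cycle(reads, threshold=20):
    """reads: iterable of per-base quality lists; index 0 is cycle 1."""
    counts = Counter()
    for quals in reads:
        for cycle, quality in enumerate(quals, start=1):
            if quality > threshold:
                counts[cycle] += 1
    return counts
```

When restricting to target regions, bases outside the regions would be dropped before counting, which is why the middle cycles tend to accumulate more Q20 bases than the first and last ones.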

(6) Flag stats

+

We count the number of reads in these categories: total, mapped, paired, properly paired, duplicated, QC failed.

These categories are determined from the FLAG field of the reads in each BAM file.
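As an illustration of how these categories map onto FLAG bits, here is a small sketch. The bit values are those defined by the SAM format; the function name and the plain list-of-integers input are assumptions.

```python
from collections import Counter

# FLAG bits as defined by the SAM format
FLAG_PAIRED = 0x1
FLAG_PROPER_PAIR = 0x2
FLAG_UNMAPPED = 0x4
FLAG_QC_FAIL = 0x200
FLAG_DUPLICATE = 0x400

def flag_stats(flags):
    """flags: iterable of integer FLAG values, one per read."""
    stats = Counter(total=0, mapped=0, paired=0, proper_paired=0,
                    duplicated=0, qc_failed=0)
    for flag in flags:
        stats["total"] += 1
        stats["mapped"] += int(not flag & FLAG_UNMAPPED)
        stats["paired"] += int(bool(flag & FLAG_PAIRED))
        stats["proper_paired"] += int(bool(flag & FLAG_PROPER_PAIR))
        stats["duplicated"] += int(bool(flag & FLAG_DUPLICATE))
        stats["qc_failed"] += int(bool(flag & FLAG_QC_FAIL))
    return stats
```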

(7) Mean depth of sequencing

−

Total mapped bases divided by total number of positions that are covered by at least one base.

+

Total mapped bases divided by the total number of positions that are covered by at least one base. The percentage on the y-axis is the number of sites divided by the total number of sites (e.g. for whole genome, the total genome length; for targeted sequencing, the total length of all targeted regions).
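The two quantities can be sketched as below. The interpretation of the y-axis as "percentage of all sites at each depth" is an assumption, as are the function names and the per-position depth list used as input.

```python
from collections import Counter

def mean_covered_depth(depth):
    """Total mapped bases divided by the number of positions with depth >= 1."""
    covered = [d for d in depth if d > 0]
    return sum(covered) / len(covered) if covered else 0.0

def depth_percentages(depth, total_sites):
    """Percentage of all sites observed at each non-zero depth.

    total_sites: genome length, or total length of the targeted regions.
    """
    counts = Counter(d for d in depth if d > 0)
    return {d: 100 * n / total_sites for d, n in counts.items()}
```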

(8) Empirical Q20 count

+

We examine each base by its reported base quality: if that reported base quality corresponds to an empirical base quality better than Phred score 20, then we count it once as a Q20 base.
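A minimal sketch of this counting rule, assuming the standard Phred definition for the empirical quality and a hypothetical `{reported_quality: (mismatches, total)}` table as input:

```python
import math

def empirical_q20_count(per_quality_counts, threshold=20):
    """Count bases whose reported quality bin has empirical quality above threshold.

    per_quality_counts: {reported_quality: (mismatches, total)}; this layout
    is an assumption made for illustration.
    """
    q20_bases = 0
    for mismatches, total in per_quality_counts.values():
        if total == 0:
            continue
        # A bin with no mismatches has unbounded empirical quality.
        if mismatches == 0 or -10 * math.log10(mismatches / total) > threshold:
            q20_bases += total
    return q20_bases
```

Every base in a qualifying bin is counted once, regardless of whether its own reported quality is above or below 20.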

Zhanxw

255

edits

Talk:QPLOT

Revision as of 16:20, 21 November 2011
