Changes

From Genome Analysis Wiki
194 bytes added ,  16:20, 21 November 2011
no edit summary
Line 2: Line 2:     
(1) Empirical vs reported Phred score:

Conditioning on the reported base quality, we count the number of bases that match or do not match the reference genome, and calculate the empirical quality as: -10 * log10(total # of mismatched bases / total bases). In the following cases, bases are not used for calculating empirical qualities:
   Line 13: Line 14:     
(2) Empirical Phred score by cycle:

Conditioning on read cycle (e.g. first base, second base, ...), we calculate empirical quality as above. Be cautious with quality-trimmed or bar-coded reads, as the real cycle may differ.

If --region is specified, only bases falling in the target regions are counted.
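As a rough illustration of the per-cycle calculation, here is a minimal Python sketch. It assumes the standard Phred definition, i.e. empirical quality = -10 * log10(mismatched bases / total bases), and it assumes the caller has already decided which bases to include. The function names and the (cycle, is_mismatch) input format are hypothetical and are not taken from the tool itself.

```python
import math
from collections import defaultdict


def empirical_quality(mismatches, total):
    """Phred-scaled empirical quality: -10 * log10(mismatch rate)."""
    if total == 0 or mismatches == 0:
        # Undefined: no data, or no observed mismatches (assumption: report None).
        return None
    return -10.0 * math.log10(mismatches / total)


def empirical_quality_by_cycle(observations):
    """observations: iterable of (cycle, is_mismatch) pairs, one per base."""
    mismatches = defaultdict(int)
    totals = defaultdict(int)
    for cycle, is_mismatch in observations:
        totals[cycle] += 1
        if is_mismatch:
            mismatches[cycle] += 1
    return {c: empirical_quality(mismatches[c], totals[c]) for c in sorted(totals)}
```

The same empirical_quality helper applies when conditioning on the reported base quality instead of the cycle; only the grouping key changes.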
    
(3) Mean depth vs. GC

We count depth for the whole genome or for the specified region (--region).

Default GC window size is 100.
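One possible way to relate depth to GC content is sketched below in Python. This is only an illustration under stated assumptions: windows are non-overlapping, GC content is rounded to an integer percentage, and a trailing partial window is dropped. None of these choices are confirmed by this page, and the function name mean_depth_by_gc is hypothetical.

```python
from collections import defaultdict


def mean_depth_by_gc(reference, depth, window=100):
    """Group non-overlapping windows by GC percentage and average their depth.

    reference: reference sequence as a string
    depth: per-position depth, same length as reference
    """
    depth_sum = defaultdict(float)
    n_bases = defaultdict(int)
    # Assumption: a trailing window shorter than `window` is ignored.
    for start in range(0, len(reference) - window + 1, window):
        seq = reference[start:start + window].upper()
        gc_percent = round(100 * (seq.count("G") + seq.count("C")) / window)
        depth_sum[gc_percent] += sum(depth[start:start + window])
        n_bases[gc_percent] += window
    return {gc: depth_sum[gc] / n_bases[gc] for gc in sorted(n_bases)}
```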
Line 24: Line 27:     
(4) Insert size

For mapped paired-end reads, the insert size distribution will be plotted. Otherwise, this graph will be empty.

Specifying --region will not affect this graph.
    
(5) Empirical Q20 bases count by cycle

We count the number of Q20 bases (base qualities that are larger than 20), conditioning on cycle number.

If --regions is specified, only bases in the target regions are counted. In that case, some reads will have their head and tail outside of the region, so you will likely see a parabolic shape.
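The per-cycle count can be sketched in Python as follows. This is a simplified illustration: it assumes each read is given as a list of base qualities in cycle order, and it applies the "larger than 20" rule directly to whatever quality values are supplied. The function name and input format are hypothetical.

```python
def q20_count_by_cycle(reads, threshold=20):
    """reads: iterable of per-read lists of base qualities, in cycle order."""
    counts = {}
    for qualities in reads:
        for cycle, q in enumerate(qualities, start=1):
            counts.setdefault(cycle, 0)
            if q > threshold:  # strictly larger than the threshold
                counts[cycle] += 1
    return counts
```

When only bases inside a target region are passed in, the first and last cycles of reads that overhang the region contribute fewer bases, which is consistent with the shape described above.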
    
(6) Flag stats

We count the number of reads in these categories: total, mapped, paired, properly paired, duplicated, QC failed.

These categories are determined by the FLAG field of the records in each BAM file.
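The categories map onto bits of the FLAG field as defined in the SAM specification (0x1 paired, 0x2 properly paired, 0x4 unmapped, 0x200 QC failed, 0x400 duplicate). A minimal Python sketch of such a tally is shown below. The function name flag_stats and the plain list of integer flags are assumptions for illustration; how the tool itself treats combinations of flags is not described here.

```python
# FLAG bits from the SAM specification.
PAIRED = 0x1
PROPER_PAIR = 0x2
UNMAPPED = 0x4
QC_FAIL = 0x200
DUPLICATE = 0x400


def flag_stats(flags):
    """Tally read categories from an iterable of integer FLAG values."""
    stats = {"total": 0, "mapped": 0, "paired": 0,
             "proper_paired": 0, "duplicated": 0, "qc_failed": 0}
    for flag in flags:
        stats["total"] += 1
        if not flag & UNMAPPED:
            stats["mapped"] += 1
        if flag & PAIRED:
            stats["paired"] += 1
        if flag & PROPER_PAIR:
            stats["proper_paired"] += 1
        if flag & DUPLICATE:
            stats["duplicated"] += 1
        if flag & QC_FAIL:
            stats["qc_failed"] += 1
    return stats
```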
    
(7) Mean depth of sequencing

Total mapped bases divided by the total number of positions that are covered by at least one base. The y-axis percentage is calculated as the number of sites divided by the total number of sites (e.g. for whole genome, the total genome length; for targeted sequencing, the total length of all targeted regions).
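A small Python sketch of these two quantities follows. It assumes a per-position depth list is already available, and it reads the y-axis as "percentage of sites at each depth", which is one interpretation of the description rather than a confirmed detail. The function names are hypothetical.

```python
from collections import Counter


def mean_depth(depth):
    """Total mapped bases divided by the number of positions with depth >= 1."""
    covered = [d for d in depth if d >= 1]
    if not covered:
        return 0.0
    return sum(covered) / len(covered)


def depth_percentages(depth, total_sites):
    """Percentage of sites observed at each depth, relative to total_sites.

    total_sites: genome length for whole genome, or the summed length of
    all targeted regions for targeted sequencing.
    """
    counts = Counter(depth)
    return {d: 100.0 * n / total_sites for d, n in sorted(counts.items())}
```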
    
(8) Empirical Q20 count

We examine each base by its reported base quality; if that reported base quality corresponds to an empirical base quality better than Phred score 20, then we count it once as a Q20 base.
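The counting rule can be sketched in Python as below. The sketch assumes the standard Phred definition for empirical quality (-10 * log10 of the mismatch rate), takes per-reported-quality (mismatches, total) tallies as input, and treats a reported quality with zero observed mismatches as passing the threshold. That last choice, the input format, and the function name are assumptions for illustration only.

```python
import math


def empirical_q20_count(counts_by_reported_q, threshold=20):
    """counts_by_reported_q: {reported_quality: (mismatches, total_bases)}."""
    q20_bases = 0
    for reported_q, (mismatches, total) in counts_by_reported_q.items():
        if total == 0:
            continue
        if mismatches == 0:
            empirical = math.inf  # assumption: no mismatches counts as passing
        else:
            empirical = -10.0 * math.log10(mismatches / total)
        if empirical > threshold:
            # Every base carrying this reported quality counts once.
            q20_bases += total
    return q20_bases
```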