From Genome Analysis Wiki


87 bytes added, 23:01, 11 May 2010
It reads the SAM/BAM file record by record. Guided by each record's CIGAR string, it compares the aligned read to the reference genome base by base and records match and mismatch counts, grouped by the reported base quality.
The output lists the observed quality (reported by the Illumina machine) and the empirical quality, both as Phred quality scores. The empirical quality is derived from Prob(mismatch | base quality Q) = (total number of mismatched bases with base quality Q) / (total number of bases with base quality Q). For convenience, you can pipe the output to '| Rscript --vanilla -' to obtain a graph.
By default, soft clips, insertions, and deletions are omitted from the counts.
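As an illustration of the formula above, the following Python sketch converts per-quality mismatch counts into an empirical Phred score using the standard definition Q = -10 * log10(P). The function name, the counts layout, and the handling of empty or mismatch-free bins are assumptions made for this example, not the tool's actual code.

```python
import math

def empirical_quality(mismatches, total):
    """Phred-scaled empirical quality for one observed-quality bin.

    Hypothetical helper: P = mismatches / total, Q = -10 * log10(P).
    """
    if total == 0:
        return None            # no bases observed at this quality
    if mismatches == 0:
        return float("inf")    # assumption: no observed errors, score unbounded
    return -10.0 * math.log10(mismatches / total)

# Made-up example counts: observed quality -> (mismatched bases, total bases)
example_counts = {10: (100, 1000), 20: (10, 1000), 30: (1, 1000)}

for observed_q, (mm, tot) in sorted(example_counts.items()):
    print(observed_q, empirical_quality(mm, tot))
```

In this made-up data, each bin's empirical quality equals its observed quality, which is what a well-calibrated machine would produce.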
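The CIGAR-guided comparison described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the tool's implementation: `tally_read`, its arguments, the 0-based `ref_start`, and the in-memory reference string are all hypothetical. Only aligned bases (CIGAR operations M, = and X) are counted; soft clips, insertions, and deletions are skipped.

```python
import re
from collections import defaultdict

# One CIGAR element: a length followed by an operation code.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

def tally_read(cigar, seq, quals, ref, ref_start, counts):
    """Add one read's matches/mismatches to counts[q] = [mismatches, total].

    Hypothetical helper; ref_start is assumed to be a 0-based offset into ref.
    """
    read_pos = 0
    ref_pos = ref_start
    for length, op in CIGAR_RE.findall(cigar):
        n = int(length)
        if op in "M=X":                      # aligned: consumes read and reference
            for i in range(n):
                q = quals[read_pos + i]
                counts[q][1] += 1
                if seq[read_pos + i].upper() != ref[ref_pos + i].upper():
                    counts[q][0] += 1
            read_pos += n
            ref_pos += n
        elif op in "IS":                     # insertion / soft clip: read only, skipped
            read_pos += n
        elif op in "DN":                     # deletion / skipped region: reference only
            ref_pos += n
        # H and P consume neither the read nor the reference

# Made-up example: 2 soft-clipped bases, 1 insertion, 1 deletion, 1 mismatch.
ref = "ACGTACGTAC"
seq = "TTACGTTCGT"
quals = [2, 2, 30, 30, 30, 20, 30, 20, 30, 30]
counts = defaultdict(lambda: [0, 0])
tally_read("2S3M1I2M1D2M", seq, quals, ref, 0, counts)
print(dict(counts))
```

The quality-2 bases fall in the soft clip and the quality-20 base at read position 5 is an insertion, so neither contributes to the tally.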
