BaseQualityCheck

From Genome Analysis Wiki
Revision as of 22:42, 11 May 2010 by Zhanxw

Base Quality Check

(May 11, 2010 - Paul, Xiaowei)


Location: $(repository)/baseQualityCheck/baseQualityCheck

Algorithm:

The program reads a SAM/BAM file one record at a time. For each alignment, it follows the CIGAR string to compare the read to the reference genome base by base, then tallies matches and mismatches grouped by the reported base quality. For each observed quality (as reported by the Illumina sequencer), the output gives the empirical quality, derived from

Prob(mismatch | base quality Q) = (number of mismatched bases with quality Q) / (total number of bases with quality Q)

Soft-clipped bases, insertions and deletions are skipped.
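A minimal Python sketch of the counting step described above. It assumes the read bases, integer Phred qualities (the SAM QUAL characters with 33 subtracted), the 0-based alignment start, the CIGAR string and the reference sequence have already been extracted from each record. The function names are hypothetical, and expressing the mismatch rate as a Phred-scaled empirical quality, -10 log10(p), is an assumption about the output format rather than something taken from the baseQualityCheck source.

```python
import math
import re
from collections import defaultdict

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def tally_record(read_seq, quals, pos, cigar, ref, counts):
    """Add one alignment to counts, a dict mapping quality -> [total, mismatches].

    read_seq: read bases; quals: integer Phred qualities, one per read base;
    pos: 0-based leftmost reference position; ref: reference sequence string.
    """
    read_i, ref_i = 0, pos
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op in "M=X":
            # Aligned bases: compare each read base with the reference base.
            for k in range(n):
                entry = counts[quals[read_i + k]]
                entry[0] += 1
                if read_seq[read_i + k].upper() != ref[ref_i + k].upper():
                    entry[1] += 1
            read_i += n
            ref_i += n
        elif op in "IS":
            # Insertions and soft clips consume read bases only; not counted.
            read_i += n
        elif op in "DN":
            # Deletions and skipped regions consume reference bases only.
            ref_i += n
        # H (hard clip) and P (padding) consume neither sequence.
    return counts


def empirical_quality(counts):
    """Map each observed quality to (total, mismatches, empirical quality)."""
    table = {}
    for q, (total, mismatches) in sorted(counts.items()):
        if total == 0:
            continue
        p = mismatches / total
        # Assumed Phred scaling; zero mismatches gives an unbounded quality.
        table[q] = (total, mismatches, -10 * math.log10(p) if p > 0 else math.inf)
    return table


counts = defaultdict(lambda: [0, 0])
tally_record("ACGAACG", [30] * 7, 0, "7M", "ACGTACGTAC", counts)
tally_record("TTACGT", [20] * 6, 0, "2S4M", "ACGTACGTAC", counts)
for q, (total, mism, emp) in empirical_quality(counts).items():
    print(q, total, mism, f"{emp:.2f}")
```

Bases under M, = and X operations are compared; S and I advance only the read, and D and N advance only the reference, so soft clips, insertions and deletions never enter the counts.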

Syntax:

baseQualityCheck [-c maxRecordCount] [-q minimumMapQuality] [-r reference] [-s dbSNPFile] [-v]
-c -> process only the first maxRecordCount records.
-q -> alignments with mapping quality below minimumMapQuality are not counted.
-r -> reference genome (in KARMA format).
-s -> load SNP positions from the given file. It may be either a text file with one chromosome/index pair per line, or a binary memory-mapped file created with mkgenomevector. For NCBI build 37, a sample dbSNP file is located at /home/bingshan/data/db/dbSNP130.UCSC.coordinates.tbl.
-v -> output the SAM records that contain mismatched bases.
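The sketch below illustrates how the -c, -q and -s options could shape the input to the counting step. It assumes the SNP text file holds whitespace-separated chromosome/index pairs, that each record exposes its mapping quality under a hypothetical "mapq" key, and that loaded SNP positions are excluded from the counts. These are assumptions, and none of the helper names come from baseQualityCheck itself. Whether the index is 0- or 1-based is not stated, so positions should be compared in whatever convention the file uses.

```python
def load_snp_positions(lines):
    """Parse the text form of the -s file: one chromosome/index pair per line."""
    snps = set()
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:  # skip blank lines and lines with fewer than two fields
            snps.add((fields[0], int(fields[1])))
    return snps


def select_records(records, max_count=None, min_map_quality=0):
    """Yield records honoring -c (stop after the first max_count) and -q."""
    for n, rec in enumerate(records):
        if max_count is not None and n >= max_count:
            break
        if rec["mapq"] < min_map_quality:
            continue  # below the minimum mapping quality: not counted
        yield rec


def is_counted_site(chrom, index, snps):
    """Assumed -s behavior: bases at known SNP positions are left out."""
    return (chrom, index) not in snps


snps = load_snp_positions(["chr1 100", "chr2\t200", ""])
records = [{"mapq": 10}, {"mapq": 40}, {"mapq": 50}, {"mapq": 60}]
kept = list(select_records(records, max_count=3, min_map_quality=20))
print(len(kept), is_counted_site("chr1", 100, snps))
```

In this sketch the record limit is applied before the mapping-quality filter, following the wording "only process the first" records; the program itself may count differently.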

Thanks to Bingshan for his qPlot program and for his input in finishing this program.