Genome Analysis Wiki β


BamUtil: recab

717 bytes added, 15:38, 18 September 2012
= Overview of the <code>recab</code> function of <code>[[bamUtil]]</code> =
The <code>recab</code> option of [[bamUtil]] recalibrates a SAM/BAM file.
Recalibration can also be called as an option of [[bamUtil: dedup]]. This will perform the recalibration and the deduping in the same set of steps, increasing processing speed.
==Handling Recalibration/Implementation Notes==
Reads Not Recalibrated:
* Duplicates
* Unmapped
* Mapping Quality = 0
* Mapping Quality = 255
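The read-level exclusions above can be sketched as a small predicate. This is only an illustration, not bamUtil's code: the function name <code>is_recalibrated</code> is hypothetical, and the flag constants are the standard SAM FLAG bits for "unmapped" and "duplicate".

```python
# Standard SAM FLAG bits (from the SAM specification).
FLAG_UNMAPPED = 0x4
FLAG_DUPLICATE = 0x400


def is_recalibrated(flag: int, mapq: int) -> bool:
    """Return True if a read with this FLAG and MAPQ would be recalibrated.

    Hypothetical helper mirroring the list above: duplicates, unmapped
    reads, and reads with mapping quality 0 or 255 are skipped.
    """
    if flag & FLAG_DUPLICATE:
        return False
    if flag & FLAG_UNMAPPED:
        return False
    if mapq in (0, 255):
        return False
    return True


if __name__ == "__main__":
    print(is_recalibrated(flag=0, mapq=60))      # ordinary mapped read
    print(is_recalibrated(flag=0x400, mapq=60))  # duplicate
```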
Recalibration is a 2-step process that loops through the file twice:
# Build Recalibration Table
# Apply Recalibration Table
The Recalibration Table groups bases based on a set of covariates:
* Read Group
* Quality (either from the quality string or from a tag)
* Cycle (reverse complement for reverse strands)
* 1st/2nd read in pair
* Previous Cycle's Base (reverse complement for reverse strands)
* This Cycle's Base (reverse complement for reverse strands)

The Recalibration Table tracks the number of matches/mismatches for each set of covariates. Only bases meeting all of the following criteria are used to build the Recalibration Table:
* Read criteria
** not a duplicate
** mapped
** mapping quality != 0
** mapping quality != 255
* Base criteria
** match/mismatch (not an insertion/deletion/skip/clip)
** not a dbSNP position
** base quality > minBaseQual (5 by default)
* Additional criteria for cycle != 1 (can be turned off via flags)
** previous base is a CIGAR Match/Mismatch
** previous base position is not a dbSNP position

The Recalibration Table is applied to all bases meeting all of the following criteria:
* base quality > minBaseQual (5 by default)

The Recalibrated Quality is calculated using:

<math>-10 * \log \frac{mismatches + 1}{mismatches + matches + 1}</math>

If the Recalibration Table has no matches & no mismatches for a set of covariates, the original base quality is kept.
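As a rough illustration of the table and formula described above, the sketch below counts matches/mismatches per covariate tuple and computes the recalibrated quality. It is not bamUtil's implementation. The names <code>RecalTable</code> and <code>base_usable_for_build</code> are hypothetical. The logarithm is assumed to be base 10 (the Phred convention), and no rounding is applied because the text does not specify any. The previous-base criteria for cycle != 1 are left out for brevity.

```python
import math
from collections import defaultdict


def base_usable_for_build(cigar_op, is_dbsnp, base_qual, min_base_qual=5):
    """Base criteria from the list above (hypothetical helper).

    cigar_op is the CIGAR operation covering the base; M, = and X are
    treated as match/mismatch, everything else (I, D, N, S, H, ...) is not.
    """
    return (cigar_op in ("M", "=", "X")
            and not is_dbsnp
            and base_qual > min_base_qual)


class RecalTable:
    """Match/mismatch counts keyed by a covariate tuple (illustrative only)."""

    def __init__(self):
        # key -> [matches, mismatches]
        self.counts = defaultdict(lambda: [0, 0])

    def observe(self, key, is_match):
        """Record one base that matched or mismatched the reference."""
        self.counts[key][0 if is_match else 1] += 1

    def recalibrated_quality(self, key):
        """Return the recalibrated quality, or None if the key has no counts."""
        matches, mismatches = self.counts.get(key, (0, 0))
        if matches + mismatches == 0:
            return None  # caller keeps the original base quality
        error_rate = (mismatches + 1) / (mismatches + matches + 1)
        # Assumption: log base 10, as in the Phred quality scale.
        return -10 * math.log10(error_rate)


if __name__ == "__main__":
    table = RecalTable()
    # (read group, quality, cycle, is 2nd read in pair, previous base, this base)
    key = ("RG1", 30, 5, False, "A", "C")
    for _ in range(999):
        table.observe(key, is_match=True)
    print(table.recalibrated_quality(key))
```

With 999 matches and no mismatches the ratio is 1/1000, so the recalibrated quality is about 30 under the base-10 assumption.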
If the Recalibrated Quality is greater than maxBaseQual, the updated quality is set to maxBaseQual.

Optionally, the previous quality can be stored in a tag.
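The apply step can be sketched as follows. This is an illustration under assumptions, not bamUtil's code: the function names are hypothetical, the <code>max_base_qual</code> default of 50 is only a placeholder (the text gives no default), and the tag name <code>OQ</code> in the usage example is an illustrative choice. A recalibrated quality of <code>None</code> stands for a covariate set with no matches and no mismatches.

```python
def updated_quality(orig_qual, recal_qual, min_base_qual=5, max_base_qual=50):
    """Return the quality to write for one base (hypothetical helper).

    max_base_qual=50 is a placeholder value, not a documented default.
    """
    # Only bases with quality > minBaseQual are recalibrated.
    if orig_qual <= min_base_qual:
        return orig_qual
    # No matches and no mismatches for this covariate set: keep the original.
    if recal_qual is None:
        return orig_qual
    # Cap the recalibrated quality at maxBaseQual.
    return min(recal_qual, max_base_qual)


def apply_to_read(quals, recal_quals, tags, store_tag=None,
                  min_base_qual=5, max_base_qual=50):
    """Recalibrate one read's qualities, optionally saving the old ones.

    tags is a plain dict standing in for the read's optional fields.
    """
    if store_tag is not None:
        tags[store_tag] = list(quals)  # keep the previous qualities
    return [updated_quality(q, r, min_base_qual, max_base_qual)
            for q, r in zip(quals, recal_quals)]


if __name__ == "__main__":
    tags = {}
    new_quals = apply_to_read([3, 30, 30, 30], [40, None, 60, 22], tags,
                              store_tag="OQ")
    print(new_quals)
    print(tags)
```

In the usage example, the first base is below the minimum and is left alone, the second has no table entry, the third is capped, and the fourth takes the recalibrated value.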
The current recalibration logic was designed for recalibrating ILLUMINA data.
== How to use it ==