Changes

BamUtil: recab (view source)

Revision as of 16:38, 18 September 2012

717 bytes added, 16:38, 18 September 2012

no edit summary

Line 5: Line 5:

= Overview of the <code>recab</code> function of <code>[[bamUtil]]</code> =

The <code>recab</code> option of [[bamUtil]] recalibrates a SAM/BAM file.

+

Recalibration can also be called as an option of [[bamUtil: dedup]]. This will perform the recalibration and the deduping in the same set of steps, increasing processing speed.

==Handling Recalibration/Implementation Notes==

−

~~Reads Not Recalibrated:~~

−

* Duplicates

−

* Unmapped

−

* Mapping Quality = 0

−

* Mapping Quality = 255

Recalibration is a 2-step process that loops through the file twice:

Line 18: Line 14:

# Apply Recalibration Table

−

Recalibration ~~is done by grouping~~ bases based on a set of covariates:

+

The Recalibration Table groups bases based on a set of covariates:

* Read Group

−

* Cycle

+

* Quality (either from the quality string or from a tag)

+

* Cycle (reverse complement for reverse strands)

* 1st/2nd read in pair

−

* Previous Cycle's Base

+

* Previous Cycle's Base (reverse complement for reverse strands)

−

* This Cycle's Base

+

* This Cycle's Base (reverse complement for reverse strands)

+

The Recalibration Table tracks the number of matches/mismatches for each set of covariates.
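The covariate grouping described above can be sketched in Python. This is only an illustration under stated assumptions, not bamUtil's actual code: the names <code>covariate_keys</code>, <code>Counts</code>, and <code>record</code> are hypothetical, and the read is assumed to be given as stored in the SAM/BAM record (reverse-strand reads already reverse complemented), so the sketch undoes that orientation to recover the sequencing cycle and bases.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical helper table; not part of bamUtil.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


@dataclass
class Counts:
    matches: int = 0
    mismatches: int = 0


def covariate_keys(read_group, bases, quals, is_reverse, is_first_in_pair):
    """Yield (index, key) for each base of a read given in SAM orientation.

    The key holds the covariates listed above: read group, quality,
    cycle, 1st/2nd read in pair, previous cycle's base, this cycle's base.
    """
    n = len(bases)
    for i in range(n):
        if is_reverse:
            # SAM stores the reverse complement, so the last stored base
            # was sequenced first and each base must be complemented.
            cycle = n - i
            cur = COMPLEMENT[bases[i]]
            prev = COMPLEMENT[bases[i + 1]] if i + 1 < n else None
        else:
            cycle = i + 1
            cur = bases[i]
            prev = bases[i - 1] if i > 0 else None
        yield i, (read_group, quals[i], cycle, is_first_in_pair, prev, cur)


def record(table, key, is_match):
    """Count one observed match or mismatch for a covariate set."""
    if is_match:
        table[key].matches += 1
    else:
        table[key].mismatches += 1


# One Counts entry per distinct covariate set.
table = defaultdict(Counts)
```

A plain dictionary keyed by the covariate tuple is used here for readability; a real implementation would likely use a more compact structure.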

+

Only bases meeting all of the following criteria are used to Build the Recalibration Table:

+

* Read criteria

+

** not a duplicate

+

** mapped

+

** mapping quality != 0

+

** mapping quality != 255

+

* Base criteria

+

** match/mismatch (not an insertion/deletion/skip/clip)

+

** not a dbSNP position

+

** base quality > minBaseQual (5 by default)

+

* Additional criteria for cycle != 1 (can be turned off via flags)

+

** previous base is a CIGAR Match/Mismatch

+

** previous base position is not a dbSNP position
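The criteria above can be expressed as two predicates. The following Python sketch is illustrative only: the <code>Read</code> class, the function names, and the single <code>check_previous</code> switch are assumptions made for this example (the actual tool exposes its own flags for turning off the previous-base checks), and CIGAR operations are assumed to be passed as single SAM characters.

```python
from dataclasses import dataclass

# CIGAR operations that align a read base to a reference base.
MATCH_OPS = {"M", "=", "X"}


@dataclass
class Read:  # hypothetical minimal read record for this sketch
    is_duplicate: bool
    is_unmapped: bool
    mapping_quality: int


def read_is_eligible(read):
    """Read criteria: not a duplicate, mapped, mapping quality not 0 or 255."""
    return (
        not read.is_duplicate
        and not read.is_unmapped
        and read.mapping_quality not in (0, 255)
    )


def base_is_eligible(cigar_op, is_dbsnp, base_qual, cycle,
                     prev_cigar_op=None, prev_is_dbsnp=False,
                     min_base_qual=5, check_previous=True):
    """Base criteria, plus the previous-base criteria when cycle != 1."""
    if cigar_op not in MATCH_OPS:      # insertion/deletion/skip/clip
        return False
    if is_dbsnp:                       # known variant position
        return False
    if base_qual <= min_base_qual:     # must be strictly greater
        return False
    if check_previous and cycle != 1:
        if prev_cigar_op not in MATCH_OPS or prev_is_dbsnp:
            return False
    return True
```

Only bases for which both predicates return true would contribute to the match/mismatch counts.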

+

The Recalibration Table is applied to all bases meeting all of the following criteria:

+

* base quality > minBaseQual (5 by default)

+

The Recalibrated Quality is calculated using: <math>-10 \log_{10} \frac{\text{mismatches} + 1}{\text{mismatches} + \text{matches} + 1}</math>

+

If the Recalibration Table has no matches and no mismatches for a set of covariates, the original base quality is kept.

−

~~For Reverse Strands~~, the ~~reverse complement of the SAM/BAM~~ is ~~used for the cycle, previous cycle's base, and current cycle's base~~.

+

If the Recalibrated Quality is greater than maxBaseQual, the updated quality is set to maxBaseQual.
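The quality update rules above can be combined into one small function. This Python sketch makes several assumptions that the text does not state: the logarithm is taken as base 10 (the usual Phred convention), the result is rounded to the nearest integer, and the function name is hypothetical. <code>max_base_qual</code> is a required argument because no default for maxBaseQual is given here.

```python
import math


def recalibrated_quality(matches, mismatches, original_qual,
                         max_base_qual, min_base_qual=5):
    """Return the updated base quality for one covariate set."""
    # The table is only applied to bases above the minimum quality.
    if original_qual <= min_base_qual:
        return original_qual
    # No observations for this covariate set: keep the original quality.
    if matches == 0 and mismatches == 0:
        return original_qual
    # Empirical error rate with a +1 pseudocount, as in the formula above.
    error_rate = (mismatches + 1) / (mismatches + matches + 1)
    qual = -10 * math.log10(error_rate)   # assumes base-10 (Phred scale)
    # Rounding is an assumption; the cap at maxBaseQual is from the text.
    return min(round(qual), max_base_qual)
```

For example, a covariate set with 999 matches and no mismatches gives an error rate of 1/1000, which corresponds to a quality of 30.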

−

~~Not all bases are used for building~~ the ~~Recalibration table~~. ~~Only bases meeting the following criteria are used:~~

+

Optionally, the previous quality can be stored in a tag.

−

* Base is a CIGAR Match/Mismatch

−

* Previous base is a CIGAR Match/Mismatch or it is the first cycle

−

* Base position is not a dbSNP position

−

* Previous base position is not a dbSNP position (if not first cycle)

−

* Base quality > 5 (or the configurable minimum)

−

The ~~Recalibration Table is applied on all bases in the read sequence (ignoring the alignment/CIGAR) unless the base quality is < 5 (or the configurable minimum)~~

+

The current recalibration logic was designed for recalibrating ILLUMINA data.

−

~~This recalibration logic was designed for recalibrating ILLUMINA data.~~

== How to use it ==

Mktrost

Administrators

3,045

edits
