Genome Analysis Wiki β

Changes

BamUtil: recab

2,549 bytes added, 17:42, 18 September 2012
** base quality > [[#Minimum Recalibration Base Quality (--minBaseQual)|minBaseQual (5 by default)]]
* Additional criteria for cycle != 1 (can be turned off via flags)
** previous base is a CIGAR Match/Mismatch (use [[#Allow Previous Base Non-Match/Mismatch (--keepPrevNonAdjacent)|<code>--keepPrevNonAdjacent</code>]] to disable)
** previous base position is not a [[#DBSNP File (--dbsnp)|dbSNP position]] (use [[#Allow Previous Base DBSNP (--keepPrevDbsnp)|<code>--keepPrevDbsnp</code>]] to disable)
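The criteria visible in this list can be sketched as a simple predicate. This is only an illustration, not BamUtil's code: the function and argument names are hypothetical, cycles are assumed to be numbered from 1, and any criteria not shown in this list are left out.

```python
def passes_listed_criteria(base_qual, cycle, prev_is_cigar_match, prev_in_dbsnp,
                           min_base_qual=5,
                           keep_prev_non_adjacent=False,
                           keep_prev_dbsnp=False):
    """Hypothetical helper: True if a base passes the criteria listed above."""
    # Base quality must be strictly greater than minBaseQual (5 by default).
    if base_qual <= min_base_qual:
        return False
    # The additional checks apply only when cycle != 1.
    if cycle != 1:
        # Previous base must be a CIGAR Match/Mismatch unless --keepPrevNonAdjacent.
        if not keep_prev_non_adjacent and not prev_is_cigar_match:
            return False
        # Previous base must not be a dbSNP position unless --keepPrevDbsnp.
        if not keep_prev_dbsnp and prev_in_dbsnp:
            return False
    return True
```

Passing <code>keep_prev_non_adjacent=True</code> or <code>keep_prev_dbsnp=True</code> mirrors turning the corresponding check off with the command-line flag.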
Recalibrated Quality is: <math>-10 * \log_{10} \frac{mismatches + 1}{mismatches + matches + 1}</math>
Alternatively, [[#Logistic Regression (--useLogReg)|logistic regression]] can be used for calculating the new quality.
If the Recalibrated Quality is greater than [[#Maximum Recalibration Base Quality (--maxBaseQual)|maxBaseQual]], the updated quality is set to maxBaseQual.
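As a worked illustration of the formula and the maxBaseQual cap, here is a minimal sketch. It assumes the logarithm is base 10 (the usual Phred convention, and the log10 form that appears elsewhere on this page). The function name is hypothetical, and rounding to an integer quality is left out because this page does not specify it.

```python
import math

def recalibrated_quality(mismatches, matches, max_base_qual=50):
    """Hypothetical helper: empirical quality from mismatch/match counts."""
    # -10 * log10((mismatches + 1) / (mismatches + matches + 1))
    error_rate = (mismatches + 1) / (mismatches + matches + 1)
    quality = -10.0 * math.log10(error_rate)
    # Qualities above maxBaseQual are set to maxBaseQual.
    return min(quality, max_base_qual)

# 0 mismatches in 999 observations gives a quality of about 30.
q_typical = recalibrated_quality(0, 999)
# With very many matches the raw value exceeds 50, so the cap applies.
q_capped = recalibrated_quality(0, 10**7)
```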
= Usage =
./bam recab (options) --in <InputBamFile> --out <OutputFile> [--log <logFile>] [--verbose] [--noeof] [--params] --refFile <ReferenceFile> [--dbsnp <dbsnpFile>] [--minBaseQual <minBaseQual>] [--maxBaseQual <maxBaseQual>] [--blended <weight>] [--fitModel] [--fast] [--keepPrevDbsnp] [--keepPrevNonAdjacent] [--useLogReg]
= Parameters =
--maxBaseQual <maxBaseQual> : maximum recalibrated base quality (default: 50)
--blended <weight> : blended model weight
--fitModel : check if the logistic regression model fits the data
overridden by fast, but automatically applied by useLogReg
--fast : use a compact representation that only allows:
* at most 256 Read Groups
* maximum quality 63
* at most 127 cycles
overrides fitModel, but is overridden by useLogReg
uses up to about 2.25G more memory than running without --fast.
--keepPrevDbsnp : do not exclude entries where the previous base is in dbsnp when
By default they are excluded from the table (except the first cycle).
--useLogReg : use logistic regression calculated quality for the new quality
automatically applies fitModel and overrides fast.
--qualField <quality tag> : tag to get the starting base quality
(default is to get it from the Quality field)
== Reference File (<code>--refFile</code>) ==
The reference file is a required parameter. It is used for comparing read bases to the reference.
== DBSNP File (<code>--dbsnp</code>) ==
The dbSNP file specifies positions to skip when recalibrating. It is a tab-delimited file with the chromosome in the first column and the 1-based position in the second column.
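A file in this layout could be read into a lookup set along the following lines. This is a sketch with a hypothetical function name, not how BamUtil parses the file. Skipping blank lines and lines starting with <code>#</code> is an assumption that this page does not confirm.

```python
import io

def load_dbsnp_positions(handle):
    """Hypothetical helper: read (chromosome, 1-based position) pairs."""
    positions = set()
    for line in handle:
        line = line.rstrip("\r\n")
        # Assumption: blank lines and '#' comment lines carry no positions.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 2:
            continue
        # Column 1 is the chromosome, column 2 the 1-based position.
        positions.add((fields[0], int(fields[1])))
    return positions

# Usage with an in-memory example instead of a real file.
example = io.StringIO("1\t12345\n1\t67890\nX\t42\n")
known_sites = load_dbsnp_positions(example)
is_known = ("X", 42) in known_sites
```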
 
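== Minimum Recalibration Base Quality (<code>--minBaseQual</code>) ==
Bases must have a base quality greater than this value to meet the criteria listed above. It defaults to 5.
== Maximum Recalibration Base Quality (<code>--maxBaseQual</code>) ==
This value sets the maximum phred base quality assigned to a base after recalibrating. Any qualities above this value will be set to this value. It defaults to 50.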
 
== Blended Model Weight (<code>--blended</code>) ==
 
<span style="color:red">TBD - this parameter is not yet implemented.</span>
 
== Fit Model (<code>--fitModel</code>) ==
 
Check if the logistic regression model fits the data.
 
This option does NOT set the new qualities to the logistic regression calculated qualities; it only checks the fit. To apply the logistic regression qualities, see [[#Logistic Regression (--useLogReg)|<code>--useLogReg</code>]].
This option is overridden by [[#Fast Recalibration (--fast)|<code>--fast</code>]], but is automatically applied when [[#Logistic Regression (--useLogReg)|<code>--useLogReg</code>]] is specified.
 
== Fast Recalibration (<code>--fast</code>) ==
 
Use a compact representation of the Recalibration Table that only allows:
* at most 256 Read Groups
* maximum quality 63
* at most 127 cycles
 
This option will run faster than the default recalibration, but uses up to about 2.25G more memory than running without --fast.
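Before choosing <code>--fast</code>, the limits listed above can be checked against what is known about the input. The sketch below is a hypothetical pre-flight check, not part of BamUtil. The constant and function names are invented for illustration, and the caller is assumed to already know the counts.

```python
# Limits of the compact representation, as listed above.
FAST_MAX_READ_GROUPS = 256
FAST_MAX_QUALITY = 63
FAST_MAX_CYCLES = 127

def fits_fast_limits(num_read_groups, max_quality, num_cycles):
    """Hypothetical check: True if the data stays within the --fast limits."""
    return (num_read_groups <= FAST_MAX_READ_GROUPS
            and max_quality <= FAST_MAX_QUALITY
            and num_cycles <= FAST_MAX_CYCLES)

# Example: 4 read groups, qualities up to 41, 100 cycles per read.
ok_for_fast = fits_fast_limits(4, 41, 100)
```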
 
This option overrides [[#Fit Model (--fitModel)|<code>--fitModel</code>]], but is overridden by [[#Logistic Regression (--useLogReg)|<code>--useLogReg</code>]].
 
== Allow Previous Base DBSNP (<code>--keepPrevDbsnp</code>) ==
 
By default bases where the previous base is in DBSNP are excluded from the Recalibration Table.
 
This option includes these bases in the building of the Recalibration Table.
 
== Allow Previous Base Non-Match/Mismatch (<code>--keepPrevNonAdjacent</code>) ==
 
By default bases where the previous base is not a CIGAR Match/Mismatch are excluded from the Recalibration Table.
 
This option includes these bases in the building of the Recalibration Table.
 
== Logistic Regression (<code>--useLogReg</code>) ==
Use the logistic regression empirical qualities for setting the new base qualities instead of the default formula: -10 * log10((#mismatches+1)/(#total+1)). This option automatically enables [[#Fit Model (--fitModel)|<code>--fitModel</code>]] and disables [[#Fast Recalibration (--fast)|<code>--fast</code>]].
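The precedence among <code>--fitModel</code>, <code>--fast</code> and <code>--useLogReg</code> described on this page can be summarized in a small sketch. The function below is hypothetical and only models the documented overrides. It is not taken from the BamUtil source.

```python
def resolve_model_flags(fit_model=False, fast=False, use_log_reg=False):
    """Hypothetical helper: effective settings after the documented overrides."""
    if use_log_reg:
        # useLogReg automatically applies fitModel and overrides fast.
        return {"fitModel": True, "fast": False, "useLogReg": True}
    if fast:
        # fast overrides fitModel.
        return {"fitModel": False, "fast": True, "useLogReg": False}
    # Otherwise fitModel is used exactly as requested.
    return {"fitModel": fit_model, "fast": False, "useLogReg": False}

# Requesting all three leaves useLogReg and fitModel on, with fast off.
effective = resolve_model_flags(fit_model=True, fast=True, use_log_reg=True)
```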
== Read the quality from a tag (<code>--qualField</code>) ==