From Genome Analysis Wiki
= Overview of the <code>recab</code> function of <code>[[bamUtil]]</code> =
The <code>recab</code> option of [[bamUtil]] recalibrates a SAM/BAM file.
Recalibration can also be called as an option of [[bamUtil: dedup]]. This will perform the recalibration and the deduping in the same set of steps, increasing processing speed.
==Handling Recalibration/Implementation Notes==
# Build Recalibration Table
# Apply Recalibration Table
The Recalibration Table groups bases based on a set of covariates:
* Read Group
* Quality (either from the quality string or [[#Read the quality from a tag (--qualField)|from a tag]])
* Cycle (reverse complement for reverse strands)
* 1st/2nd read in pair
* Previous Cycle's Base (reverse complement for reverse strands)
* This Cycle's Base (reverse complement for reverse strands)

The Recalibration Table tracks the number of matches/mismatches for each set of covariates.

Only bases meeting all of the following criteria are used to build the Recalibration Table:
* Read criteria
** not a duplicate
** mapped
** mapping quality != 0
** mapping quality != 255
* Base criteria
** match/mismatch (not an insertion/deletion/skip/clip)
** not a [[#DBSNP File (--dbsnp)|dbSNP position]]
** base quality > [[#Minimum Recalibration Base Quality (--minBaseQual)|minBaseQual (5 by default)]]
* Additional criteria for cycle != 1 (can be turned off via flags)
** previous base is a CIGAR Match/Mismatch (Use [[#Allow Previous Base Non-Match/Mismatch (--keepPrevNonAdjacent)|<code>--keepPrevNonAdjacent</code>]] to disable)
** previous base position is not a [[#DBSNP File (--dbsnp)|dbSNP position]] (Use [[#Allow Previous Base DBSNP (--keepPrevDbsnp)|<code>--keepPrevDbsnp</code>]] to disable)

The Recalibration Table is applied to all bases meeting all of the following criteria (even if they were not used for creating the table):
* base quality > [[#Minimum Recalibration Base Quality (--minBaseQual)|minBaseQual (5 by default)]]
* at least 1 match or mismatch for the set of covariates

Recalibrated Quality is:

<math>-10 \log_{10} \frac{mismatches + 1}{mismatches + matches + 1}</math>

Alternatively, [[#Logistic Regression (--useLogReg)|logistic regression]] can be used for calculating the new quality.

If the Recalibrated Quality is greater than [[#Maximum Recalibration Base Quality (--maxBaseQual)|maxBaseQual]], the updated quality is set to maxBaseQual.
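The build/apply steps and the default quality formula can be sketched in a few lines of Python. This is an illustrative sketch only, not bamUtil's implementation: the covariate tuple layout and the names <code>tally</code>, <code>recalibrated_quality</code>, and <code>apply_table</code> are hypothetical. It also assumes a base-10 (Phred) logarithm and that bases not meeting the apply criteria keep their old quality. Conversion to an integer quality is left out.

```python
import math
from collections import defaultdict

# Hypothetical covariate key (layout is an assumption for illustration):
# (read_group, quality, cycle, is_second_in_pair, prev_base, cur_base)


def new_table():
    # Maps a covariate key to [matches, mismatches].
    return defaultdict(lambda: [0, 0])


def tally(table, key, is_match):
    """Build step: count one eligible base as a match or a mismatch."""
    table[key][0 if is_match else 1] += 1


def recalibrated_quality(matches, mismatches, max_base_qual=50):
    """Default formula: -10 * log10((mm + 1) / (mm + m + 1)), capped."""
    error_rate = (mismatches + 1) / (mismatches + matches + 1)
    return min(-10 * math.log10(error_rate), max_base_qual)


def apply_table(table, key, old_qual, min_base_qual=5, max_base_qual=50):
    """Apply step: recalibrate only if quality > minBaseQual and the
    covariate set has at least one observation; otherwise (assumed)
    the old quality is kept."""
    matches, mismatches = table.get(key, (0, 0))
    if old_qual <= min_base_qual or matches + mismatches == 0:
        return old_qual
    return recalibrated_quality(matches, mismatches, max_base_qual)
```

For example, a covariate set with 999 matches and 0 mismatches gives an error rate of 1/1000, which corresponds to a quality of 30.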
== How to use it ==
The input SAM/BAM file ([[#input File (--in)|--in]]), the output SAM/BAM file ([[#output File (--out)|--out]]), and the reference file ([[#Reference File (--refFile)|--refFile]]) are required inputs.
Recommended usage with Deduper:
/usr/cluster/bin/bam dedup --recab --in ${INPUT}.bam --out ${OUTPUT}.bam --force --refFile ${REF} --dbsnp ${DBSNP} --oneChrom --storeQualTag OQ --maxBaseQual 40
Recommended usage without Deduper:
/usr/cluster/bin/bam recab --in ${INPUT}.bam --out ${OUTPUT}.bam --refFile ${REF} --dbsnp ${DBSNP} --storeQualTag OQ --maxBaseQual 40
= Usage =
./bam recab (options) --in <InputBamFile> --out <OutputFile> [--log <logFile>] [--verbose] [--noeof] [--params] --refFile <ReferenceFile> [--dbsnp <dbsnpFile>] [--minBaseQual <minBaseQual>] [--maxBaseQual <maxBaseQual>] [--blended <weight>] [--fitModel] [--fast] [--keepPrevDbsnp] [--keepPrevNonAdjacent] [--useLogReg] [--qualField <tag>] [--storeQualTag <tag>] [--buildExcludeFlags <flag>] [--applyExcludeFlags <flag>]
= Parameters =
<pre>
Required General Parameters :
Recab Specific Required Parameters
</pre>
{{PhoneHomeParamDesc}}
== Required Generic Parameters ==
{{inBAMInputFile|noStdin=1}}
{{outBAMOutputFile}}
== Optional Generic Parameters ==
=== Output log & Summary Statistics FileName (<code>--log</code>) ===
Output file name for writing logs & summary statistics.
If this parameter is not specified, logs are written to the output file name specified in <code>--out</code> + ".log". If the output BAM is written to stdout (<code>--out</code> starts with '-'), the logs are written to stderr instead. If the filename given to <code>--log</code> starts with '-', the logs are written to stderr.
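The rules for choosing the log destination can be summarized as a small decision function. This is only a sketch of the behavior described above, not bamUtil code: the function name <code>resolve_log_target</code> and the <code>"stderr"</code> marker string are hypothetical.

```python
def resolve_log_target(out, log=None):
    """Return "stderr" or the log file name implied by --out and --log.

    out: value passed to --out
    log: value passed to --log, or None if --log was not given
    """
    if log is not None:
        # An explicit --log starting with '-' means stderr.
        return "stderr" if log.startswith("-") else log
    if out.startswith("-"):
        # Output BAM goes to stdout, so logs go to stderr.
        return "stderr"
    # Default: output file name + ".log".
    return out + ".log"
```

With this sketch, <code>resolve_log_target("sample.bam")</code> yields <code>"sample.bam.log"</code>.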
=== Turn on Verbose Mode (<code>--verbose</code>) ===
Turn on verbose logging to get more log messages in the log and to stderr.
{{paramsParameter}}
=== DBSNP File (<code>--dbsnp</code>) ===
=== Minimum Recalibration Base Quality (<code>--minBaseQual</code>) ===
When recalibrating reads, only positions with a base quality greater than this minimum phred quality will be recalibrated. If <code>--minBaseQual</code> is not specified, it defaults to 5.
The ILLUMINA specs indicate that any quality below 5 can be used as an error indicator so we do not want to recalibrate those.
=== Maximum Recalibration Base Quality (<code>--maxBaseQual</code>) ===
This value sets the maximum phred base quality assigned to a base after recalibrating. Any recalibrated quality above this value will be set to this value. If <code>--maxBaseQual</code> is not specified, it defaults to 50.
=== Blended Model Weight (<code>--blended</code>) ===
<span style="color:red">TBD - this parameter is not yet implemented.</span>

=== Fit Model (<code>--fitModel</code>) ===
Check if the logistic regression model fits the data. This option does NOT set the new qualities to the logistic regression calculated qualities; it only checks the fit. To apply the logistic regression qualities, see [[#Logistic Regression (--useLogReg)|<code>--useLogReg</code>]].

<code>--fitModel</code> is automatically applied when <code>--useLogReg</code> is specified. It cannot be used in conjunction with [[#Fast Recalibration (--fast)|<code>--fast</code>]]: if both are specified, <code>--fast</code> overrides <code>--fitModel</code>.

=== Fast Recalibration (<code>--fast</code>) ===
Use a compact representation of the Recalibration Table that only allows:
* at most 256 Read Groups
* maximum quality 63
* at most 127 cycles

This option will run faster than the default recalibration, but uses up to about 2.25G more memory than running without <code>--fast</code>.

This option cannot be used in conjunction with [[#Fit Model (--fitModel)|<code>--fitModel</code>]] or [[#Logistic Regression (--useLogReg)|<code>--useLogReg</code>]]: it overrides <code>--fitModel</code>, but is overridden by <code>--useLogReg</code>.

=== Allow Previous Base DBSNP (<code>--keepPrevDbsnp</code>) ===
By default, bases where the previous base is in DBSNP are excluded from the Recalibration Table. This option includes these bases in the building of the Recalibration Table.

=== Allow Previous Base Non-Match/Mismatch (<code>--keepPrevNonAdjacent</code>) ===
By default, bases where the previous base is not a CIGAR Match/Mismatch are excluded from the Recalibration Table. This option includes these bases in the building of the Recalibration Table.
=== Logistic Regression (<code>--useLogReg</code>) ===
Use the logistic regression empirical qualities for setting the new base qualities instead of the default formula. This option automatically enables [[#Fit Model (--fitModel)|<code>--fitModel</code>]] and disables [[#Fast Recalibration (--fast)|<code>--fast</code>]].

=== Read the quality from a tag (<code>--qualField</code>) ===
If this parameter is set, then read the quality string from the specified tag name. If the tag is not found, the quality is read from the quality field.
=== Store the original quality (<code>--storeQualTag</code>) ===
If this parameter is set, the original quality will be stored as a string in the specified tag.
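The interaction of <code>--qualField</code> and <code>--storeQualTag</code> can be illustrated with a toy record. This is a sketch under stated assumptions, not bamUtil's code: the record is modeled as a plain dictionary with hypothetical <code>"qual"</code> and <code>"tags"</code> keys, the helper names are invented for illustration, and "original quality" is assumed to mean the record's quality field before it is overwritten.

```python
def input_quality(record, qual_field=None):
    """Quality string to recalibrate: the tag named by --qualField if
    present, otherwise the record's quality field."""
    if qual_field is not None and qual_field in record["tags"]:
        return record["tags"][qual_field]
    return record["qual"]


def write_quality(record, new_qual, store_qual_tag=None):
    """Overwrite the quality field, first saving the original as a
    string in the tag named by --storeQualTag (if given)."""
    if store_qual_tag is not None:
        record["tags"][store_qual_tag] = record["qual"]  # assumed source
    record["qual"] = new_qual


# Toy usage: no qualField tag present, so the quality field is read.
rec = {"qual": "IIII", "tags": {}}
old = input_quality(rec, qual_field="OQ")
write_quality(rec, "FFFF", store_qual_tag="OQ")
```

After the toy usage runs, the record carries the new quality string in its quality field and the previous one in the <code>OQ</code> tag.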
=== Skip Records with any of the Specified Flags (<code>--buildExcludeFlags</code>, <code>--applyExcludeFlags</code>) ===
Use <code>--buildExcludeFlags</code> to skip records with any of the specified flags set when building the recalibration table. The default value is 0xF04.
By default, when building the recalibration table reads with any of the following flags set are skipped:
* unmapped
* secondary alignment
* fails QC checks
* duplicate
* supplementary alignment
Use <code>--applyExcludeFlags</code> to skip records with any of the specified flags set when applying the recalibration table. The default value is 0x000, which does not skip any reads.
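The default of 0xF04 is the bitwise OR of the five SAM flag bits listed above. The sketch below shows how the mask is composed and how a record's flag would be tested against it. The constant and function names are hypothetical; only the flag bit values come from the SAM format.

```python
# SAM flag bits for the five conditions skipped by default.
UNMAPPED = 0x004
SECONDARY = 0x100
QC_FAIL = 0x200
DUPLICATE = 0x400
SUPPLEMENTARY = 0x800

DEFAULT_BUILD_EXCLUDE = (
    UNMAPPED | SECONDARY | QC_FAIL | DUPLICATE | SUPPLEMENTARY
)  # 0xF04
DEFAULT_APPLY_EXCLUDE = 0x000  # skip nothing


def is_skipped(flag, exclude_flags):
    """True if the record has ANY of the excluded flag bits set."""
    return (flag & exclude_flags) != 0
```

A paired, properly mapped read with flag 99 passes the default build filter, while the same read marked as a duplicate (flag 99 + 0x400) is skipped.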
= Return Value =
Returns -1 if input parameters are invalid.
Returns the SamStatus for the reads/writes (0 on success, non-0 on failure).