Line 37: |
Line 37: |
| ** base quality > [[#Minimum Recalibration Base Quality (--minBaseQual)|minBaseQual (5 by default)]] | | ** base quality > [[#Minimum Recalibration Base Quality (--minBaseQual)|minBaseQual (5 by default)]] |
| * Additional criteria for cycle != 1 (can be turned off via flags) | | * Additional criteria for cycle != 1 (can be turned off via flags) |
− | ** previous base is a CIGAR Match/Mismatch | + | ** previous base is a CIGAR Match/Mismatch (Use [[#Allow Previous Base Non-Match/Mismatch (--keepPrevNonAdjacent)|<code>--keepPrevNonAdjacent</code>]] to disable) |
− | ** previous base position is not a [[#DBSNP File (--dbsnp)|dbSNP position]] | + | ** previous base position is not a [[#DBSNP File (--dbsnp)|dbSNP position]] (Use [[#Allow Previous Base DBSNP (--keepPrevDbsnp)|<code>--keepPrevDbsnp</code>]] to disable) |
| | | |
| | | |
Line 48: |
Line 48: |
| Recalibrated Quality is: <math>-10 * \log_{10} \frac{mismatches + 1}{mismatches + matches + 1}</math> | | Recalibrated Quality is: <math>-10 * \log_{10} \frac{mismatches + 1}{mismatches + matches + 1}</math> |
| | | |
| + | Alternatively, [[#Logistic Regression (--useLogReg)|logistic regression]] can be used for calculating the new quality. |
| | | |
| If the Recalibrated Quality is greater than [[#Maximum Recalibration Base Quality (--maxBaseQual)|maxBaseQual]], the updated quality is set to maxBaseQual. | | If the Recalibrated Quality is greater than [[#Maximum Recalibration Base Quality (--maxBaseQual)|maxBaseQual]], the updated quality is set to maxBaseQual. |
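The default calculation above can be illustrated with a minimal Python sketch. The function name <code>recalibrated_quality</code> and its arguments are hypothetical and are not part of the tool; the sketch assumes a base-10 logarithm and the default maxBaseQual of 50, and it does not model how the tool rounds the result.

```python
import math

def recalibrated_quality(mismatches, matches, max_base_qual=50):
    """Phred-scaled empirical quality from table counts, capped at max_base_qual.

    Hypothetical helper for illustration only; not the tool's own code.
    """
    # Add 1 to numerator and denominator so zero mismatches stays finite.
    error_rate = (mismatches + 1) / (mismatches + matches + 1)
    quality = -10 * math.log10(error_rate)
    # Qualities above the maximum are set to the maximum.
    return min(quality, max_base_qual)

# 9 mismatches and 990 matches give an error rate of 10/1000,
# which corresponds to a quality of about 20.
print(recalibrated_quality(9, 990))
```

With very few mismatches and many matches the raw value exceeds the cap, so the sketch returns <code>max_base_qual</code> instead.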
Line 65: |
Line 66: |
| | | |
| = Usage = | | = Usage = |
− | ./bam recab (options) --in <InputBamFile> --out <OutputFile> [--log <logFile>] [--verbose] [--noeof] [--params] --refFile <ReferenceFile> [--dbsnp <dbsnpFile>] [--minBaseQual <minBaseQual>] [--maxBaseQual <maxBaseQual>] [--blended <weight>] [--skipFitModel] [--fast] [--keepPrevDbsnp] [--keepPrevNonAdjacent] [--useLogReg] | + | ./bam recab (options) --in <InputBamFile> --out <OutputFile> [--log <logFile>] [--verbose] [--noeof] [--params] --refFile <ReferenceFile> [--dbsnp <dbsnpFile>] [--minBaseQual <minBaseQual>] [--maxBaseQual <maxBaseQual>] [--blended <weight>] [--fitModel] [--fast] [--keepPrevDbsnp] [--keepPrevNonAdjacent] [--useLogReg] |
| | | |
| = Parameters = | | = Parameters = |
Line 85: |
Line 86: |
| --maxBaseQual <maxBaseQual> : maximum recalibrated base quality (default: 50) | | --maxBaseQual <maxBaseQual> : maximum recalibrated base quality (default: 50) |
| --blended <weight> : blended model weight | | --blended <weight> : blended model weight |
− | --skipFitModel : do not check if the logistic regression model fits the data | + | --fitModel : check if the logistic regression model fits the data |
| + | overridden by fast, but automatically applied by useLogReg |
| --fast : use a compact representation that only allows: | | --fast : use a compact representation that only allows: |
| * at most 256 Read Groups | | * at most 256 Read Groups |
| * maximum quality 63 | | * maximum quality 63 |
| * at most 127 cycles | | * at most 127 cycles |
− | automatically enables skipFitModel, but is overridden by useLogReg | + | overrides fitModel, but is overridden by useLogReg |
| uses up to about 2.25G more memory than running without --fast. | | uses up to about 2.25G more memory than running without --fast. |
| --keepPrevDbsnp : do not exclude entries where the previous base is in dbsnp when | | --keepPrevDbsnp : do not exclude entries where the previous base is in dbsnp when |
Line 99: |
Line 101: |
| By default they are excluded from the table (except the first cycle). | | By default they are excluded from the table (except the first cycle). |
| --useLogReg : use logistic regression calculated quality for the new quality | | --useLogReg : use logistic regression calculated quality for the new quality |
− | ignores setting of skipFitModel and fast. | + | automatically applies fitModel and overrides fast. |
| --qualField <quality tag> : tag to get the starting base quality | | --qualField <quality tag> : tag to get the starting base quality |
| (default is to get it from the Quality field) | | (default is to get it from the Quality field) |
Line 123: |
Line 125: |
| == Reference File (<code>--refFile</code>) == | | == Reference File (<code>--refFile</code>) == |
| | | |
− | The reference file to use for comparing read bases to the reference. | + | The reference file is a required parameter used for comparing read bases to the reference. |
| | | |
| == DBSNP File (<code>--dbsnp</code>) == | | == DBSNP File (<code>--dbsnp</code>) == |
| | | |
| The dbsnp file that specifies positions to skip recalibrating. Tab delimited file with the chromosome in the first column and the 1-based position in the 2nd column. | | The dbsnp file that specifies positions to skip recalibrating. Tab delimited file with the chromosome in the first column and the 1-based position in the 2nd column. |
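As a rough illustration of the file layout described above, the following Python sketch reads a tab-delimited dbsnp file into a set of (chromosome, position) pairs. The function name <code>load_dbsnp_positions</code> and the sample positions are hypothetical; the sketch only assumes the two-column layout stated above and is not the tool's own parser.

```python
import io

def load_dbsnp_positions(handle):
    """Collect (chromosome, 1-based position) pairs from a tab-delimited file.

    Hypothetical helper for illustration only.
    """
    positions = set()
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        fields = line.split("\t")
        # Column 1 is the chromosome, column 2 is the 1-based position.
        positions.add((fields[0], int(fields[1])))
    return positions

# Made-up example content: two positions on chromosome 1.
example = io.StringIO("1\t10583\n1\t10611\n")
known = load_dbsnp_positions(example)
print(("1", 10583) in known)
```

Keeping the positions in a set makes the "is this position in dbsnp" lookup a constant-time check per base.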
− |
| |
− | == Blended Model Weight (<code>--blended</code>) ==
| |
− |
| |
− | <span style="color:red">TBD - this parameter is not yet implemented.</span>
| |
| | | |
| == Minimum Recalibration Base Quality (<code>--minBaseQual</code>) == | | == Minimum Recalibration Base Quality (<code>--minBaseQual</code>) == |
Line 142: |
Line 140: |
| | | |
| This value sets the maximum phred base quality assigned to a base after recalibrating. Any quality above this value is set to this value. The default is 50. | | This value sets the maximum phred base quality assigned to a base after recalibrating. Any quality above this value is set to this value. The default is 50. |
| + | |
| + | == Blended Model Weight (<code>--blended</code>) == |
| + | |
| + | <span style="color:red">TBD - this parameter is not yet implemented.</span> |
| + | |
| + | == Fit Model (<code>--fitModel</code>) == |
| + | |
| + | Check if the logistic regression model fits the data. |
| + | |
| + | This option does NOT set the new qualities to the logistic regression calculated qualities; it only checks the fit. To apply the logistic regression qualities, see [[#Logistic Regression (--useLogReg)|<code>--useLogReg</code>]]. <code>--fitModel</code> is automatically applied when <code>--useLogReg</code> is specified. |
| + | |
| + | This option cannot be combined with [[#Fast Recalibration (--fast)|<code>--fast</code>]]: if both are specified, <code>--fast</code> overrides it. It is automatically applied by [[#Logistic Regression (--useLogReg)|<code>--useLogReg</code>]]. |
| + | |
| + | == Fast Recalibration (<code>--fast</code>) == |
| + | |
| + | Use a compact representation of the Recalibration Table that only allows: |
| + | * at most 256 Read Groups |
| + | * maximum quality 63 |
| + | * at most 127 cycles |
| + | |
| + | This option will run faster than the default recalibration, but uses up to about 2.25G more memory than running without --fast. |
| + | |
| + | This option cannot be combined with [[#Fit Model (--fitModel)|<code>--fitModel</code>]] or [[#Logistic Regression (--useLogReg)|<code>--useLogReg</code>]]: <code>--fast</code> overrides <code>--fitModel</code>, but is itself overridden by <code>--useLogReg</code>. |
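The limits of the compact representation can be expressed as a simple pre-check. The sketch below is a hypothetical Python helper, not something the tool provides; it only encodes the three limits listed above.

```python
# Limits of the compact (--fast) representation, as listed above.
FAST_MAX_READ_GROUPS = 256
FAST_MAX_QUALITY = 63
FAST_MAX_CYCLES = 127

def fits_fast_table(num_read_groups, max_quality, num_cycles):
    """Return True if the data stays within the --fast limits.

    Hypothetical helper for illustration only.
    """
    return (
        num_read_groups <= FAST_MAX_READ_GROUPS
        and max_quality <= FAST_MAX_QUALITY
        and num_cycles <= FAST_MAX_CYCLES
    )

# Example: 4 read groups, qualities up to 41, 100 cycles fits;
# 150 cycles exceeds the 127-cycle limit.
print(fits_fast_table(4, 41, 100))
print(fits_fast_table(4, 41, 150))
```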
| + | |
| + | == Allow Previous Base DBSNP (<code>--keepPrevDbsnp</code>) == |
| + | |
| + | By default, bases where the previous base is in DBSNP are excluded from the Recalibration Table. |
| + | |
| + | This option includes these bases in the building of the Recalibration Table. |
| + | |
| + | == Allow Previous Base Non-Match/Mismatch (<code>--keepPrevNonAdjacent</code>) == |
| + | |
| + | By default, bases where the previous base is not a CIGAR Match/Mismatch are excluded from the Recalibration Table. |
| + | |
| + | This option includes these bases in the building of the Recalibration Table. |
| + | |
| | | |
| == Logistic Regression (<code>--useLogReg</code>) == | | == Logistic Regression (<code>--useLogReg</code>) == |
| | | |
− | Use the logistic regression empirical qualities for setting the new base qualities instead of the default formula: -10 * log10((#mismatches+1)/(#total+1)) | + | Use the logistic regression empirical qualities for setting the new base qualities instead of the default formula. |
| + | |
| + | This option automatically enables [[#Fit Model (--fitModel)|<code>--fitModel</code>]] and disables [[#Fast Recalibration (--fast)|<code>--fast</code>]]. |
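The precedence among <code>--fitModel</code>, <code>--fast</code>, and <code>--useLogReg</code> described on this page can be summarized in a short Python sketch. The function <code>resolve_modes</code> and its return format are hypothetical and only restate the documented rules; they do not reflect how the tool implements them.

```python
def resolve_modes(fit_model=False, fast=False, use_log_reg=False):
    """Return the effective settings after applying the documented precedence.

    Hypothetical helper for illustration only.
    """
    if use_log_reg:
        # --useLogReg automatically applies --fitModel and overrides --fast.
        return {"fitModel": True, "fast": False, "useLogReg": True}
    if fast:
        # --fast overrides --fitModel.
        return {"fitModel": False, "fast": True, "useLogReg": False}
    return {"fitModel": fit_model, "fast": False, "useLogReg": False}

# All three flags given: --useLogReg wins.
print(resolve_modes(fit_model=True, fast=True, use_log_reg=True))
# --fitModel together with --fast: --fast wins.
print(resolve_modes(fit_model=True, fast=True))
```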
| | | |
| == Read the quality from a tag (<code>--qualField</code>) == | | == Read the quality from a tag (<code>--qualField</code>) == |