Line 10: |
Line 10: |
| ==Handling Recalibration/Implementation Notes== | | ==Handling Recalibration/Implementation Notes== |
| | | |
− | Recalibration is a 2-step process that loops through the file twice: | + | Recalibration is a 2-step process that loops through the file twice (stdin is not supported as input): |
| # Build Recalibration Table | | # Build Recalibration Table |
| # Apply Recalibration Table | | # Apply Recalibration Table |
Line 59: |
Line 59: |
| | | |
| NOTE: GATK ignores/skips adapters, but our logic does not. | | NOTE: GATK ignores/skips adapters, but our logic does not. |
− |
| |
| | | |
| == How to use it == | | == How to use it == |
Line 78: |
Line 77: |
| | | |
| = Usage = | | = Usage = |
− | ./bam recab (options) --in <InputBamFile> --out <OutputFile> [--log <logFile>] [--verbose] [--noeof] [--params] --refFile <ReferenceFile> [--dbsnp <dbsnpFile>] [--minBaseQual <minBaseQual>] [--maxBaseQual <maxBaseQual>] [--blended <weight>] [--fitModel] [--fast] [--keepPrevDbsnp] [--keepPrevNonAdjacent] [--useLogReg] | + | ./bam recab (options) --in <InputBamFile> --out <OutputFile> [--log <logFile>] [--verbose] [--noeof] [--params] --refFile <ReferenceFile> [--dbsnp <dbsnpFile>] [--minBaseQual <minBaseQual>] [--maxBaseQual <maxBaseQual>] [--blended <weight>] [--fitModel] [--fast] [--keepPrevDbsnp] [--keepPrevNonAdjacent] [--useLogReg] [--qualField <tag>] [--storeQualTag <tag>] [--buildExcludeFlags <flag>] [--applyExcludeFlags <flag>] |
| | | |
| = Parameters = | | = Parameters = |
| <pre> | | <pre> |
| Required General Parameters : | | Required General Parameters : |
− | --in <infile> : input BAM file name
| + | --in <infile> : input BAM file name |
− | --out <outfile> : output recalibration file name
| + | --out <outfile> : output recalibration file name |
− | Optional General Parameters : | + | Optional General Parameters : |
− | --log <logfile> : log and summary statistics (default: [outfile].log)
| + | --log <logfile> : log and summary statistics (default: [outfile].log) |
− | --verbose : Turn on verbose mode
| + | --verbose : Turn on verbose mode |
− | --noeof : do not expect an EOF block on a bam file.
| + | --noeof : do not expect an EOF block on a bam file. |
− | --params : print the parameter settings
| + | --params : print the parameter settings |
| | | |
| Recab Specific Required Parameters | | Recab Specific Required Parameters |
− | --refFile <reference file> : reference file name
| + | --refFile <reference file> : reference file name |
− | Recab Specific Optional Parameters : | + | Recab Specific Optional Parameters : |
− | --dbsnp <known variance file> : dbsnp file of positions
| + | --dbsnp <known variance file> : dbsnp file of positions |
− | --minBaseQual <minBaseQual> : minimum base quality of bases to recalibrate (default: 5)
| + | --minBaseQual <minBaseQual> : minimum base quality of bases to recalibrate (default: 5) |
− | --maxBaseQual <maxBaseQual> : maximum recalibrated base quality (default: 50)
| + | --maxBaseQual <maxBaseQual> : maximum recalibrated base quality (default: 50) |
− | --blended <weight> : blended model weight
| + | qualities over this value will be set to this value. |
− | --fitModel : check if the logistic regression model fits the data
| + | This setting is applied after binning (if applicable). |
− | overridden by fast, but automatically applied by useLogReg
| + | --blended <weight> : blended model weight |
− | --fast : use a compact representation that only allows:
| + | --fitModel : check if the logistic regression model fits the data |
− | * at most 256 Read Groups
| + | overridden by fast, but automatically applied by useLogReg |
− | * maximum quality 63
| + | --fast : use a compact representation that only allows: |
− | * at most 127 cycles
| + | * at most 256 Read Groups |
− | overrides fitModel, but is overridden by useLogReg
| + | * maximum quality 63 |
− | uses up to about 2.25G more memory than running without --fast.
| + | * at most 127 cycles |
− | --keepPrevDbsnp : do not exclude entries where the previous base is in dbsnp when
| + | overrides fitModel, but is overridden by useLogReg |
− | building the recalibration table
| + | uses up to about 2.25G more memory than running without --fast. |
− | By default they are excluded from the table.
| + | --keepPrevDbsnp : do not exclude entries where the previous base is in dbsnp when |
− | --keepPrevNonAdjacent : do not exclude entries where the previous base is not adjacent
| + | building the recalibration table |
− | (not a Cigar M/X/=) when building the recalibration table
| + | By default they are excluded from the table. |
− | By default they are excluded from the table (except the first cycle).
| + | --keepPrevNonAdjacent : do not exclude entries where the previous base is not adjacent |
− | --useLogReg : use logistic regression calculated quality for the new quality
| + | (not a Cigar M/X/=) when building the recalibration table |
− | automatically applies fitModel and overrides fast.
| + | By default they are excluded from the table (except the first cycle). |
− | --qualField <quality tag> : tag to get the starting base quality
| + | --useLogReg : use logistic regression calculated quality for the new quality |
− | (default is to get it from the Quality field)
| + | automatically applies fitModel and overrides fast. |
− | --storeQualTag <quality tag> : tag to store the previous quality into
| + | --qualField <quality tag> : tag to get the starting base quality |
| + | (default is to get it from the Quality field) |
| + | --storeQualTag <quality tag> : tag to store the previous quality into |
| + | --buildExcludeFlags <flag> : exclude reads with any of these flags set when building the |
| + | recalibration table. Default is 0xF04 |
| + | --applyExcludeFlags <flag> : do not apply the recalibration table to any reads with any of these flags set |
| + | Quality Binning Parameters (optional): |
| + | Bin qualities by phred score, into the ranges specified by binQualS or binQualF (both cannot be used) |
| + | Ranges are specified by comma separated minimum phred score for the bin, example: 1,17,20,30,40,50,70 |
| + | The first bin always starts at 0, so does not need to be specified. |
| + | By default, the bin value is the low end of the range. |
| + | --binQualS : Bin the Qualities as specified (phred): minQualOfBin2, minQualofBin3... |
| + | --binQualF : Bin the Qualities based on the specified file |
| + | --binMid : Use the mid point of the quality bin range for the quality value of the bin. |
| + | --binHigh : Use the high end of the quality bin range for the quality value of the bin. |
| + | |
| </pre> | | </pre> |
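The quality binning rules listed above can be illustrated with a small sketch. This is not the tool's implementation: the function name `make_binner`, the `max_qual` upper bound for the last bin, the "next bin start minus one" definition of a bin's high end, and the integer midpoint are all assumptions made for illustration. Only the comma-separated range format, the implicit first bin at 0, and the low-end default come from the parameter description.

```python
from bisect import bisect_right


def make_binner(spec, mode="low", max_qual=93):
    """Build a function that maps a phred quality to its bin value.

    spec     : comma-separated minimum phred score of each bin after the first,
               e.g. "1,17,20,30,40,50,70" (the first bin always starts at 0).
    mode     : "low" (default), "mid" or "high", loosely mirroring the default
               behaviour, --binMid and --binHigh.
    max_qual : ASSUMPTION: upper bound used as the high end of the last bin.
    """
    starts = [0] + [int(x) for x in spec.split(",")]

    def bin_quality(q):
        if q < 0:
            raise ValueError("quality must be non-negative")
        # Index of the last bin whose start is <= q.
        i = bisect_right(starts, q) - 1
        low = starts[i]
        # ASSUMPTION: a bin ends one below the start of the next bin.
        high = starts[i + 1] - 1 if i + 1 < len(starts) else max_qual
        if mode == "low":
            return low
        if mode == "high":
            return high
        if mode == "mid":
            # ASSUMPTION: integer midpoint, rounded down.
            return (low + high) // 2
        raise ValueError("mode must be 'low', 'mid' or 'high'")

    return bin_quality


low_bin = make_binner("1,17,20,30,40,50,70")
mid_bin = make_binner("1,17,20,30,40,50,70", mode="mid")
high_bin = make_binner("1,17,20,30,40,50,70", mode="high")
```

Under these assumptions a quality of 18 falls in the bin that starts at 17, so `low_bin(18)` returns the low end of that range.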
| + | {{PhoneHomeParamDesc}} |
| | | |
− | {{inBAMInputFile}} | + | == Required Generic Parameters == |
| + | {{inBAMInputFile|noStdin=1}} |
| {{outBAMOutputFile}} | | {{outBAMOutputFile}} |
| | | |
− | == Output log & Summary Statistics FileName (<code>--log</code>) == | + | == Optional Generic Parameters == |
| + | === Output log & Summary Statistics FileName (<code>--log</code>) === |
| | | |
| Output file name for writing logs & summary statistics. | | Output file name for writing logs & summary statistics. |
Line 128: |
Line 145: |
| If this parameter is not specified, it will write to the output file specified in <code>--out</code> + ".log". Or if the output bam is written to stdout (<code>--out</code> starts with '-'), the logs will be written to stderr. If the filename after --log starts with '-' it will write to stderr. | | If this parameter is not specified, it will write to the output file specified in <code>--out</code> + ".log". Or if the output bam is written to stdout (<code>--out</code> starts with '-'), the logs will be written to stderr. If the filename after --log starts with '-' it will write to stderr. |
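The rules above for choosing the log destination can be sketched as follows. The function name `resolve_log_destination` is hypothetical, and giving an explicit `--log` value precedence over the `--out` check is an assumption; the text only describes the individual cases.

```python
def resolve_log_destination(out_name, log_name=None):
    """Return "stderr" or the log file path for the given --out / --log values.

    ASSUMPTION: an explicit --log value is checked before falling back to
    the rules based on --out.
    """
    if log_name is not None:
        # A --log filename starting with '-' means "write to stderr".
        return "stderr" if log_name.startswith("-") else log_name
    if out_name.startswith("-"):
        # Output BAM goes to stdout, so logs go to stderr.
        return "stderr"
    # Default: the --out file name with ".log" appended.
    return out_name + ".log"
```

For example, `resolve_log_destination("sample.recal.bam")` yields `"sample.recal.bam.log"`, where the file name is only an illustrative value.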
| | | |
− | == Turn on Verbose Mode (<code>--verbose</code>) == | + | === Turn on Verbose Mode (<code>--verbose</code>) === |
| | | |
| Turn on verbose logging to get more log messages in the log and to stderr. | | Turn on verbose logging to get more log messages in the log and to stderr. |
Line 135: |
Line 152: |
| {{paramsParameter}} | | {{paramsParameter}} |
| | | |
− | == Reference File (<code>--refFile</code>) == | + | {{PhoneHomeParameters}} |
| + | |
| + | == Required Recalibration Parameters == |
| + | === Reference File (<code>--refFile</code>) === |
| | | |
| The reference file is a required parameter used for comparing read bases to the reference. | | The reference file is a required parameter used for comparing read bases to the reference. |
| | | |
− | == DBSNP File (<code>--dbsnp</code>) == | + | == Optional Recalibration Parameters == |
| + | |
| + | === DBSNP File (<code>--dbsnp</code>) === |
| | | |
| The dbsnp file that specifies positions to skip recalibrating. Tab delimited file with the chromosome in the first column and the 1-based position in the 2nd column. | | The dbsnp file that specifies positions to skip recalibrating. Tab delimited file with the chromosome in the first column and the 1-based position in the 2nd column. |
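A minimal sketch of reading a file in the format described above (tab-delimited, chromosome in the first column, 1-based position in the second). The function name `load_dbsnp_positions` is hypothetical, skipping blank lines is an assumption, and the positions in the example are made-up illustrative values rather than real dbSNP entries.

```python
import io


def load_dbsnp_positions(handle):
    """Collect (chromosome, 1-based position) pairs from a tab-delimited file."""
    positions = set()
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            # ASSUMPTION: blank lines are ignored.
            continue
        fields = line.split("\t")
        chrom = fields[0]
        pos = int(fields[1])  # 1-based position, kept as-is
        positions.add((chrom, pos))
    return positions


# Illustrative input only; these are not real dbSNP positions.
example = io.StringIO("1\t10583\n1\t10611\nX\t60020\n")
known_sites = load_dbsnp_positions(example)
```

A lookup such as `("1", 10583) in known_sites` then tells whether a position should be skipped.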
| | | |
− | == Minimum Recalibration Base Quality (<code>--minBaseQual</code>) == | + | === Minimum Recalibration Base Quality (<code>--minBaseQual</code>) === |
| | | |
| When recalibrating reads, only positions with a base quality greater than this minimum phred quality will be recalibrated. If <code>--minBaseQual</code> is not specified, it defaults to 5. | | When recalibrating reads, only positions with a base quality greater than this minimum phred quality will be recalibrated. If <code>--minBaseQual</code> is not specified, it defaults to 5. |
Line 149: |
Line 171: |
| The ILLUMINA specs indicate that any quality below 5 can be used as an error indicator so we do not want to recalibrate those. | | The ILLUMINA specs indicate that any quality below 5 can be used as an error indicator so we do not want to recalibrate those. |
| | | |
− | == Maximum Recalibration Base Quality (<code>--maxBaseQual</code>) == | + | === Maximum Recalibration Base Quality (<code>--maxBaseQual</code>) === |
| | | |
| This value sets the maximum phred base quality assigned to a base after recalibrating. Any qualities above this value will be set to this value. The default is 50. | | This value sets the maximum phred base quality assigned to a base after recalibrating. Any qualities above this value will be set to this value. The default is 50. |
| | | |
− | == Blended Model Weight (<code>--blended</code>) == | + | === Blended Model Weight (<code>--blended</code>) === |
| | | |
| <span style="color:red">TBD - this parameter is not yet implemented.</span> | | <span style="color:red">TBD - this parameter is not yet implemented.</span> |
| | | |
− | == Fit Model (<code>--fitModel</code>) == | + | === Fit Model (<code>--fitModel</code>) === |
| | | |
| Check if the logistic regression model fits the data. | | Check if the logistic regression model fits the data. |
Line 165: |
Line 187: |
| This option cannot be used in conjunction with [[#Fast Recalibration (--fast)|<code>--fast</code>]] and is overridden by <code>--fast</code>, but automatically applied by <code>--useLogReg</code>. | | This option cannot be used in conjunction with [[#Fast Recalibration (--fast)|<code>--fast</code>]] and is overridden by <code>--fast</code>, but automatically applied by <code>--useLogReg</code>. |
| | | |
− | == Fast Recalibration (<code>--fast</code>) == | + | === Fast Recalibration (<code>--fast</code>) === |
| | | |
| Use a compact representation of the Recalibration Table that only allows: | | Use a compact representation of the Recalibration Table that only allows: |
Line 176: |
Line 198: |
| This option cannot be used in conjunction with [[#Fit Model (--fitModel)|<code>--fitModel</code>]], or [[#Logistic Regression (--useLogReg)|<code>--useLogReg</code>]] and overrides [[#Fit Model (--fitModel)|<code>--fitModel</code>]], but is overridden by [[#Logistic Regression (--useLogReg)|<code>--useLogReg</code>]]. | | This option cannot be used in conjunction with [[#Fit Model (--fitModel)|<code>--fitModel</code>]], or [[#Logistic Regression (--useLogReg)|<code>--useLogReg</code>]] and overrides [[#Fit Model (--fitModel)|<code>--fitModel</code>]], but is overridden by [[#Logistic Regression (--useLogReg)|<code>--useLogReg</code>]]. |
| | | |
− | == Allow Previous Base DBSNP (<code>--keepPrevDbsnp</code>) == | + | === Allow Previous Base DBSNP (<code>--keepPrevDbsnp</code>) === |
| | | |
| By default bases where the previous base is in DBSNP are excluded from the Recalibration Table. | | By default bases where the previous base is in DBSNP are excluded from the Recalibration Table. |
Line 182: |
Line 204: |
| This option includes these bases in the building of the Recalibration Table. | | This option includes these bases in the building of the Recalibration Table. |
| | | |
− | == Allow Previous Base Non-Match/Mismatch (<code>--keepPrevNonAdjacent</code>) == | + | === Allow Previous Base Non-Match/Mismatch (<code>--keepPrevNonAdjacent</code>) === |
| | | |
| By default bases where the previous base is not a CIGAR Match/Mismatch are excluded from the Recalibration Table. | | By default bases where the previous base is not a CIGAR Match/Mismatch are excluded from the Recalibration Table. |
Line 189: |
Line 211: |
| | | |
| | | |
− | == Logistic Regression (<code>--useLogReg</code>) == | + | === Logistic Regression (<code>--useLogReg</code>) === |
| | | |
| Use the logistic regression empirical qualities for setting the new base qualities instead of the default formula. | | Use the logistic regression empirical qualities for setting the new base qualities instead of the default formula. |
Line 195: |
Line 217: |
| This option automatically enables [[#Fit Model (--fitModel)|<code>--fitModel</code>]] and disables [[#Fast Recalibration (--fast)|<code>--fast</code>]]. | | This option automatically enables [[#Fit Model (--fitModel)|<code>--fitModel</code>]] and disables [[#Fast Recalibration (--fast)|<code>--fast</code>]]. |
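The precedence among <code>--fitModel</code>, <code>--fast</code> and <code>--useLogReg</code> described in the sections above can be summarised in a short sketch. The function `resolve_model_options` and its dictionary return value are hypothetical and only restate the documented rules; they do not reflect how the tool stores its settings.

```python
def resolve_model_options(fit_model=False, fast=False, use_log_reg=False):
    """Return the effective settings after applying the documented precedence."""
    if use_log_reg:
        # --useLogReg automatically applies --fitModel and overrides --fast.
        return {"fitModel": True, "fast": False, "useLogReg": True}
    if fast:
        # --fast overrides --fitModel.
        return {"fitModel": False, "fast": True, "useLogReg": False}
    return {"fitModel": fit_model, "fast": False, "useLogReg": False}
```

So requesting all three options together ends up with logistic regression and model fitting enabled and the compact representation disabled.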
| | | |
− | == Read the quality from a tag (<code>--qualField</code>) == | + | === Read the quality from a tag (<code>--qualField</code>) === |
| | | |
| If this parameter is set, then read the quality string from the specified tag name. If the tag is not found, the quality is read from the quality field. | | If this parameter is set, then read the quality string from the specified tag name. If the tag is not found, the quality is read from the quality field. |
| | | |
− | == Store the original quality (<code>--storeQualTag</code>) == | + | === Store the original quality (<code>--storeQualTag</code>) === |
| | | |
| If this parameter is set, the original quality will be stored as a string in the specified tag. | | If this parameter is set, the original quality will be stored as a string in the specified tag. |
| + | |
| + | === Skip Records with any of the Specified Flags (<code>--buildExcludeFlags</code>, <code>--applyExcludeFlags</code>) === |
| + | Use <code>--buildExcludeFlags</code> to skip records with any of the specified flags set when building the recalibration table, default 0xF04. |
| + | |
| + | By default, when building the recalibration table reads with any of the following flags set are skipped: |
| + | * unmapped |
| + | * secondary alignment |
| + | * fails QC checks |
| + | * duplicate |
| + | * supplementary alignment |
| + | |
| + | Use <code>--applyExcludeFlags</code> to skip records with any of the specified flags set when applying the recalibration table. The default value is 0x000 (no reads are skipped). |
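The default of 0xF04 is the bitwise OR of the five SAM flags listed above. The sketch below shows that composition and the "any of these flags set" test; the constant and function names are illustrative and are not taken from the tool's source.

```python
# SAM flag bits for the categories skipped by default when building the table.
UNMAPPED = 0x4
SECONDARY = 0x100
QC_FAIL = 0x200
DUPLICATE = 0x400
SUPPLEMENTARY = 0x800

# Combining the five bits gives the documented default, 0xF04.
DEFAULT_BUILD_EXCLUDE = UNMAPPED | SECONDARY | QC_FAIL | DUPLICATE | SUPPLEMENTARY
DEFAULT_APPLY_EXCLUDE = 0x000  # nothing is skipped when applying


def is_excluded(read_flag, exclude_flags):
    """True if the read has ANY of the exclude flags set."""
    return (read_flag & exclude_flags) != 0
```

With these defaults, a read flagged as a duplicate (0x400) is left out when building the table but is still recalibrated when the table is applied.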
| | | |
| = Return Value = | | = Return Value = |
Line 207: |
Line 241: |
| Returns -1 if input parameters are invalid. | | Returns -1 if input parameters are invalid. |
| | | |
− | Returns the SamStatus for the reads/writes (0 on success). | + | Returns the SamStatus for the reads/writes (0 on success, non-0 on failure). |