BamUtil: recab
Revision as of 11:02, 15 June 2012
COMING SOON, June, 2012
Overview of the recab function of bamUtil
The recab option of bamUtil recalibrates a SAM/BAM file.
Handling Recalibration
Reads Not Recalibrated:
- Duplicates
- Unmapped
- Mapping Quality = 0
- Mapping Quality = 255
Covariates Notes
How to use it
When recab is invoked without any arguments, the usage information is displayed as described below under Usage.
The input SAM/BAM file (--in), the output SAM/BAM file (--out), and the reference file (--refFile) are required inputs.
Usage
./bam recab --in <InputBamFile> --out <OutputFile> [--log <logFile>] [--verbose] [--noeof] [--params] --refFile <ReferenceFile> [--dbsnp <dbsnpFile>] [--blended <weight>]
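As an illustration, a concrete invocation might look like the sketch below. The file names are placeholders, and recab_command is a hypothetical helper rather than part of bamUtil. It only prints the command line built from the three required inputs plus --verbose; remove the echo to actually run it.

```shell
# Hypothetical helper (not part of bamUtil): prints a recab command line
# built from the three required inputs.
recab_command() {
    # $1 = input SAM/BAM, $2 = output SAM/BAM, $3 = reference file
    echo ./bam recab --in "$1" --out "$2" --refFile "$3" --verbose
}

# Example file names are placeholders, not files shipped with bamUtil.
recab_command sample.bam sample.recab.bam reference.fa
```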
Parameters
Required General Parameters :
    --in <infile>   : input BAM file name
    --out <outfile> : output recalibration file name
Optional General Parameters :
    --log <logfile> : log and summary statistics (default: [outfile].log)
    --verbose       : Turn on verbose mode
    --noeof         : do not expect an EOF block on a bam file.
    --params        : print the parameter settings
Recab Specific Required Parameters
    --refFile <reference file> : reference file name
Recab Specific Optional Parameters :
    --dbsnp <known variance file> : dbsnp file of positions
    --blended <weight>            : blended model weight
Input File (--in)
Use --in followed by your file name to specify the SAM/BAM input file.
The program automatically determines whether your input file is SAM, BAM, or uncompressed BAM without any input from the user other than the file name, unless your input file is stdin.
A - indicates that the input is read from stdin, and the extension determines the file type (no extension indicates SAM).
SAM/BAM/Uncompressed BAM from file | --in yourFileName |
SAM from stdin | --in - |
BAM from stdin | --in -.bam |
Uncompressed BAM from stdin | --in -.ubam |
Note: Uncompressed BAM is compressed using compression level 0 (so it is not an entirely uncompressed file). This matches the samtools implementation, so pipes between our tools and samtools are supported.
Output File (--out)
Use --out followed by your file name to specify the SAM/BAM output file.
The file extension determines whether SAM, BAM, or uncompressed BAM is written. A - indicates stdout, and the extension determines the file type (no extension indicates SAM).
SAM to file | --out yourFileName.sam |
BAM to file | --out yourFileName.bam |
Uncompressed BAM to file | --out yourFileName.ubam |
SAM to stdout | --out - |
BAM to stdout | --out -.bam |
Uncompressed BAM to stdout | --out -.ubam |
Note: Uncompressed BAM is compressed using compression level 0 (so it is not an entirely uncompressed file). This matches the samtools implementation, so pipes between our tools and samtools are supported.
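The extension rules for --out can be summarized in a small shell sketch. out_format is a hypothetical helper written for illustration, not a bamUtil command, and it covers only the cases listed in the table above.

```shell
# Hypothetical helper (not part of bamUtil): reports the format that the
# documented rules associate with a given --out value.
out_format() {
    case $1 in
        *.ubam)  echo "uncompressed BAM" ;;
        *.bam)   echo "BAM" ;;
        *.sam|-) echo "SAM" ;;   # a bare "-" means SAM to stdout
        *)       echo "not covered by the table" ;;
    esac
}

out_format yourFileName.bam
out_format -.ubam
out_format -
```

Any other extension is reported as not covered, because the behavior for such names is not described here.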
Output log & Summary Statistics FileName (--log)
Output file name for writing logs & summary statistics.
If this parameter is not specified, the logs are written to the file named by --out with ".log" appended. If the output BAM is written to stdout (--out starts with '-'), the logs are written to stderr. If the filename after --log starts with '-', the logs are also written to stderr.
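The rules for the log destination can be restated as a short shell sketch. resolve_log_dest is a hypothetical helper used only to illustrate the documented behavior; it is not part of bamUtil and does not inspect how recab is actually implemented.

```shell
# Hypothetical helper (not part of bamUtil): prints where the logs would go
# according to the documented rules.
# $1 = value given to --out, $2 = value given to --log (optional)
resolve_log_dest() {
    out_file=$1
    log_file=${2-}
    if [ -n "$log_file" ]; then
        case $log_file in
            -*) echo "stderr" ;;          # --log starting with '-' means stderr
            *)  echo "$log_file" ;;
        esac
        return 0
    fi
    case $out_file in
        -*) echo "stderr" ;;              # output on stdout, so logs go to stderr
        *)  echo "$out_file.log" ;;       # default: output file name + ".log"
    esac
}

resolve_log_dest out.bam
resolve_log_dest -.bam
```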
Turn on Verbose Mode (--verbose)
Turn on verbose logging to get more log messages in the log and to stderr.
Do not require BGZF EOF block (--noeof)
Use --noeof if you do not expect a trailing EOF block in your BGZF file.
By default, the trailing empty block is expected and checked for.
Print the Program Parameters (--params)
Use --params to print the parameters for your program to stderr.
Reference File (--refFile)
The reference file to use for comparing read bases to the reference.
DBSNP File (--dbsnp)
The dbsnp file specifies positions to skip when recalibrating. It is a tab-delimited file with the chromosome in the first column and the 1-based position in the second column.
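A file in this two-column layout can be produced and sanity-checked with standard shell tools. In the sketch below, check_dbsnp_format is a hypothetical helper (not part of bamUtil), and the chromosome names and positions are made up for illustration.

```shell
# Hypothetical check (not part of bamUtil): succeeds only if every line has
# exactly two tab-separated fields and the second is a positive integer.
check_dbsnp_format() {
    awk -F '\t' 'NF != 2 || $2 !~ /^[1-9][0-9]*$/ { bad = 1 } END { exit bad }' "$1"
}

# Write a two-line example file; the entries are invented placeholders.
example_file=$(mktemp)
printf '1\t10583\n1\t10611\n' > "$example_file"
check_dbsnp_format "$example_file" && echo "format looks OK"
rm -f "$example_file"
```

The check covers only the layout described above. It does not confirm that the chromosome names match those in the reference file.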
Blended Model Weight (--blended)
TBD - this parameter is not yet implemented.
Minimum Recalibration Quality (--minRecabQual)
When recalibrating reads, only positions with a base quality greater than this minimum will be recalibrated. If --minRecabQual is not specified, the default is TBD; this parameter is not yet implemented.
Return Value
Returns -1 if input parameters are invalid.
Returns the SamStatus for the reads/writes (0 on success).
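In a shell script, these return values would surface as the exit status of the process. The sketch below assumes that bam passes the return value through as its exit status; on POSIX systems a value of -1 is then seen by the shell as 255. describe_recab_status is a hypothetical helper, and the individual nonzero SamStatus codes are not decoded because they are not listed here.

```shell
# Hypothetical helper (not part of bamUtil): interprets an exit status
# under the assumption that the return value becomes the exit status.
describe_recab_status() {
    case $1 in
        0)   echo "success" ;;
        255) echo "invalid input parameters (-1 seen as 255 by the shell)" ;;
        *)   echo "read/write failure, SamStatus code $1" ;;
    esac
}

# Typical use after a real run (file names are placeholders):
#   ./bam recab --in in.bam --out out.bam --refFile ref.fa
#   describe_recab_status $?
describe_recab_status 0
```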