BamUtil: recab
COMING SOON, June, 2012
Overview of the recab function of bamUtil
The recab option of bamUtil recalibrates a SAM/BAM file.
Handling Recalibration
Reads Not Recalibrated:
- Duplicates
- Unmapped
- Mapping Quality = 0
- Mapping Quality = 255
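The exclusion rules listed above can be sketched as a small predicate. This is an illustrative Python sketch, not bamUtil's implementation: the function name is hypothetical, and the flag bits (0x4 for unmapped, 0x400 for duplicate) are taken from the SAM specification rather than from this page.

```python
# SAM FLAG bits as defined by the SAM specification.
FLAG_UNMAPPED = 0x4
FLAG_DUPLICATE = 0x400


def is_skipped_by_recab(flag: int, mapq: int) -> bool:
    """Hypothetical helper: True if a read falls in one of the listed categories."""
    if flag & FLAG_DUPLICATE:
        return True
    if flag & FLAG_UNMAPPED:
        return True
    # Mapping quality 0 and mapping quality 255 are both excluded.
    return mapq in (0, 255)
```

For example, a properly mapped, non-duplicate read with mapping quality 60 would not be skipped under these rules, while the same read with mapping quality 0 would be.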
Covariates Notes
Coming Soon
How to use it
When recab is invoked without any arguments, the usage information is displayed as described below under Usage.
The input SAM/BAM file (--in), the output SAM/BAM file (--out), and the reference file (--refFile) are required inputs.
Usage
./bam recab --in <InputBamFile> --out <OutputFile> [--log <logFile>] [--verbose] [--noeof] [--params] --refFile <ReferenceFile> [--dbsnp <dbsnpFile>] [--blended <weight>]
Parameters
    Required General Parameters :
        --in <infile>   : input BAM file name
        --out <outfile> : output recalibration file name
    Optional General Parameters :
        --log <logfile> : log and summary statistics (default: [outfile].log)
        --verbose       : Turn on verbose mode
        --noeof         : do not expect an EOF block on a bam file.
        --params        : print the parameter settings
    Recab Specific Required Parameters
        --refFile <reference file> : reference file name
    Recab Specific Optional Parameters :
        --dbsnp <known variance file> : dbsnp file of positions
        --blended <weight>            : blended model weight
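When recab is driven from a script, it can help to assemble the argument list in one place. The following Python sketch uses only flags listed in the usage above. The helper name, its keyword arguments, and any file names passed to it are hypothetical, and it only builds the list; actually running it requires a bamUtil executable at the given path.

```python
from typing import List, Optional


def build_recab_command(in_file: str, out_file: str, ref_file: str,
                        dbsnp: Optional[str] = None,
                        log: Optional[str] = None,
                        verbose: bool = False,
                        bam_exe: str = "./bam") -> List[str]:
    """Hypothetical helper: assemble an argument list for `bam recab`."""
    # The three required inputs always come first.
    cmd = [bam_exe, "recab", "--in", in_file, "--out", out_file,
           "--refFile", ref_file]
    # Optional flags are appended only when requested.
    if dbsnp is not None:
        cmd += ["--dbsnp", dbsnp]
    if log is not None:
        cmd += ["--log", log]
    if verbose:
        cmd.append("--verbose")
    return cmd
```

The resulting list can be passed to subprocess.run from the Python standard library.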
Input File (--in)
Use --in followed by your file name to specify the SAM/BAM input file.
The program automatically determines whether your input file is SAM, BAM, or uncompressed BAM. You only need to supply the file name, unless your input is stdin.
A - indicates that input is read from stdin. In that case the extension after the - determines the file type (no extension indicates SAM).
SAM/BAM/Uncompressed BAM from file | --in yourFileName |
SAM from stdin | --in - |
BAM from stdin | --in -.bam |
Uncompressed BAM from stdin | --in -.ubam |
Note: Uncompressed BAM is compressed using compression level 0 (so it is not an entirely uncompressed file). This matches the samtools implementation, so pipes between our tools and samtools are supported.
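The note about compression level 0 can be illustrated with a short Python sketch. BGZF is a gzip-compatible block format; this sketch uses the standard gzip module rather than BGZF itself, so it shows only the general idea that level 0 output is still a wrapped stream, not raw data. The sample record is made up.

```python
import gzip

# A made-up SAM-like record used only as sample payload.
payload = b"read_1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\n"

# Level 0 stores the data without shrinking it, but still wraps it in a
# gzip container (header plus checksum trailer).
stored = gzip.compress(payload, compresslevel=0)

has_gzip_magic = stored[:2] == b"\x1f\x8b"     # gzip magic bytes
round_trips = gzip.decompress(stored) == payload
overhead = len(stored) - len(payload)           # extra container bytes
```

Because the container is still present, a reader must treat such a stream as compressed input even though the payload was not reduced in size.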
Output File (--out)
Use --out followed by your file name to specify the SAM/BAM output file.
The file extension determines whether SAM, BAM, or uncompressed BAM is written. A - indicates that output is written to stdout, and the extension after the - determines the file type (no extension is SAM).
SAM to file | --out yourFileName.sam |
BAM to file | --out yourFileName.bam |
Uncompressed BAM to file | --out yourFileName.ubam |
SAM to stdout | --out - |
BAM to stdout | --out -.bam |
Uncompressed BAM to stdout | --out -.ubam |
Note: Uncompressed BAM is compressed using compression level 0 (so it is not an entirely uncompressed file). This matches the samtools implementation, so pipes between our tools and samtools are supported.
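The extension rules in the output table can be sketched as a small lookup. This is an illustrative Python sketch, not bamUtil's code: the function name is hypothetical, and falling back to SAM for any name that does not end in .bam or .ubam is an assumption of the sketch (the page only states that no extension means SAM).

```python
from typing import Tuple


def resolve_output(out_arg: str) -> Tuple[str, str]:
    """Hypothetical helper: return (destination, format) for an --out value."""
    # A leading '-' selects stdout; anything else is a file name.
    destination = "stdout" if out_arg.startswith("-") else "file"
    if out_arg.endswith(".ubam"):
        fmt = "uncompressed BAM"
    elif out_arg.endswith(".bam"):
        fmt = "BAM"
    else:
        # Assumption: every other name is treated as SAM.
        fmt = "SAM"
    return destination, fmt
```

Each row of the table then corresponds to one call, for example resolve_output("-.ubam") for uncompressed BAM on stdout.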
Output log & Summary Statistics FileName (--log)
Output file name for writing logs & summary statistics.
If this parameter is not specified, logs are written to the file name given in --out with ".log" appended. If the output BAM is written to stdout (--out starts with '-'), the logs are written to stderr. If the file name after --log starts with '-', the logs are also written to stderr.
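The log destination rules above can be written out as a short decision function. This Python sketch follows the rules as stated on this page; the function name and the "stderr" return token are hypothetical.

```python
from typing import Optional


def resolve_log_destination(out_arg: str, log_arg: Optional[str] = None) -> str:
    """Hypothetical helper: return a log file name, or the token "stderr"."""
    if log_arg is not None:
        # An explicit --log value starting with '-' means stderr.
        return "stderr" if log_arg.startswith("-") else log_arg
    if out_arg.startswith("-"):
        # Output goes to stdout, so logs go to stderr.
        return "stderr"
    # Default: the output file name with ".log" appended.
    return out_arg + ".log"
```

With --out result.bam and no --log, the sketch yields result.bam.log; with --out -.bam and no --log, it yields stderr.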
Turn on Verbose Mode (--verbose)
Turn on verbose logging to get more log messages in the log and to stderr.
Do not require BGZF EOF block (--noeof)
Use --noeof if you do not expect a trailing EOF block in your BGZF file.
By default, the trailing empty block is expected and checked for.
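To check ahead of time whether a file carries the trailing empty block, the last bytes of the file can be compared against the 28-byte EOF marker defined in the SAM/BAM specification. This is a minimal Python sketch of such a check, with a hypothetical function name. It is not how bamUtil itself performs the check.

```python
import os

# The 28-byte empty BGZF block that marks end-of-file (SAM/BAM specification).
BGZF_EOF = bytes.fromhex(
    "1f 8b 08 04 00 00 00 00 00 ff 06 00 42 43 "
    "02 00 1b 00 03 00 00 00 00 00 00 00 00 00"
)


def has_bgzf_eof(path: str) -> bool:
    """Return True if the file ends with the empty BGZF EOF block."""
    if os.path.getsize(path) < len(BGZF_EOF):
        return False
    with open(path, "rb") as handle:
        # Read only the final 28 bytes of the file.
        handle.seek(-len(BGZF_EOF), os.SEEK_END)
        return handle.read() == BGZF_EOF
```

If this returns False for a BAM file you still want to process, --noeof is the option described above for that case.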
Print the Program Parameters (--params)
Use --params to print the parameters for your program to stderr.
Reference File (--refFile)
The reference file to use for comparing read bases to the reference.
DBSNP File (--dbsnp)
The dbsnp file specifies positions to skip when recalibrating. It is a tab-delimited file with the chromosome in the first column and the 1-based position in the second column.
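The two-column layout described above can be read with a few lines of Python. This is a sketch of a reader for that layout, not bamUtil's parser: the function name is hypothetical, ignoring blank lines is an assumption, and the positions in the example are made up for illustration.

```python
import io
from typing import Iterable, Set, Tuple


def load_dbsnp_positions(lines: Iterable[str]) -> Set[Tuple[str, int]]:
    """Hypothetical helper: collect (chromosome, 1-based position) pairs."""
    positions = set()
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # assumption: blank lines are ignored
        fields = line.split("\t")
        # Column 1 is the chromosome, column 2 the 1-based position.
        positions.add((fields[0], int(fields[1])))
    return positions


# Made-up example content: chromosome <TAB> position, one site per line.
example = io.StringIO("1\t10583\n1\t10611\nX\t60020\n")
known = load_dbsnp_positions(example)
```

A membership test such as ("1", 10583) in known then tells whether a position would be excluded from recalibration under this file.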
Blended Model Weight (--blended)
TBD - this parameter is not yet implemented.
Minimum Recalibration Quality (--minRecabQual)
When recalibrating reads, only positions with a base quality greater than this minimum will be recalibrated. The default used when --minRecabQual is not specified is TBD; this parameter is not yet implemented.
Return Value
Returns -1 if input parameters are invalid.
Returns the SamStatus for the reads/writes (0 on success).