COMING SOON, June, 2012

Overview of the recab function of bamUtil

The recab option of bamUtil recalibrates a SAM/BAM file.

Handling Recalibration

Reads Not Recalibrated:

  • Duplicates
  • Unmapped
  • Mapping Quality = 0
  • Mapping Quality = 255
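
The exclusion rules above can be sketched as a small predicate. This is only an illustration, not bamUtil's code: the function name is hypothetical, and the flag bits are the standard SAM FLAG values for "unmapped" and "duplicate".

```python
# SAM FLAG bits as defined in the SAM specification.
FLAG_UNMAPPED = 0x4
FLAG_DUPLICATE = 0x400


def is_recalibrated(flag: int, mapq: int) -> bool:
    """Return True if a read with this FLAG and MAPQ would be recalibrated.

    Hypothetical helper that mirrors the four exclusions listed above.
    """
    if flag & FLAG_DUPLICATE:
        return False  # duplicates are skipped
    if flag & FLAG_UNMAPPED:
        return False  # unmapped reads are skipped
    if mapq in (0, 255):
        return False  # mapping quality 0 or 255 is skipped
    return True


if __name__ == "__main__":
    print(is_recalibrated(flag=0, mapq=60))      # mapped, non-duplicate read
    print(is_recalibrated(flag=0x400, mapq=60))  # duplicate read
```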

Covariates Notes

Duplicates are determined by checking for matching keys.

The Key consists of:

  1. Chromosome
  2. Orientation (forward/reverse)
  3. Unclipped Start(forward)/End(reverse)
  4. Library
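
As a rough sketch of how such a key could be built from SAM fields, the code below derives the unclipped start and end from POS and CIGAR. All names here (DupKey, unclipped_span, make_key) are hypothetical, and the library is assumed to be passed in by the caller (for example, looked up from the read group).

```python
import re
from typing import NamedTuple, Tuple

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # operations that advance along the reference
CLIPS = set("SH")             # soft and hard clips


class DupKey(NamedTuple):
    chromosome: str
    reverse: bool
    position: int  # unclipped start (forward) or unclipped end (reverse)
    library: str


def unclipped_span(pos: int, cigar: str) -> Tuple[int, int]:
    """Return (unclipped start, unclipped end) for a 1-based POS and CIGAR."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    lead = 0
    for n, op in ops:
        if op not in CLIPS:
            break
        lead += n
    trail = 0
    for n, op in reversed(ops):
        if op not in CLIPS:
            break
        trail += n
    ref_len = sum(n for n, op in ops if op in REF_CONSUMING)
    return pos - lead, pos + ref_len - 1 + trail


def make_key(chrom: str, pos: int, cigar: str, reverse: bool, library: str) -> DupKey:
    """Build the four-part key described above."""
    start, end = unclipped_span(pos, cigar)
    return DupKey(chrom, reverse, end if reverse else start, library)
```

With this sketch, a forward read at position 100 with CIGAR 5S10M and a forward read at position 95 with CIGAR 15M produce the same key, because both have an unclipped start of 95.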


  • Skip Unmapped Reads; they are not marked as duplicate
  • Mark a Single-End Read Duplicate (or remove it if configured to do so) if:
    1. A paired-end record has the same key (even if the pair is not proper/the mate is unmapped/the mate is not found)
    2. A single-end record has the same key and a higher base quality sum (sum of all base qualities in the record)
  • Mark both Paired-End Reads Duplicate if:
    1. Another paired-end pair has the same set of keys and has a higher base quality sum.
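
The single-end rule above might look like the following sketch. It is not the tool's implementation: the function names and the (name, key, qual) record shape are hypothetical, base qualities are assumed to be Phred+33 encoded as in a SAM QUAL field, and ties are resolved by keeping the first record seen, which the text above does not specify.

```python
from typing import Hashable, Iterable, Set, Tuple


def base_quality_sum(qual: str) -> int:
    """Sum of all base qualities in a Phred+33 encoded QUAL string."""
    return sum(ord(c) - 33 for c in qual)


def mark_single_end_duplicates(
    reads: Iterable[Tuple[str, Hashable, str]],
    paired_keys: Set[Hashable],
) -> Set[str]:
    """Return the names of single-end reads that would be marked duplicate.

    reads: (name, key, qual) tuples for single-end records.
    paired_keys: keys that are also held by paired-end records.
    """
    reads = list(reads)
    best = {}  # key -> (name, score) of the highest-scoring single-end read
    for name, key, qual in reads:
        score = base_quality_sum(qual)
        if key not in best or score > best[key][1]:
            best[key] = (name, score)

    duplicates = set()
    for name, key, _qual in reads:
        # Rule 1: any paired-end record with the same key wins.
        # Rule 2: a single-end record with a higher quality sum wins.
        if key in paired_keys or best[key][0] != name:
            duplicates.add(name)
    return duplicates
```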

This code assumes that at most 1000 bases are clipped at the start of a read.

How to use it

When recab is invoked without any arguments, the usage information shown below is displayed.

The input SAM/BAM file is required (see Input File (--in)) and must be sorted by coordinate.

The output SAM/BAM file is also required (see Output File (--out)).


./bam recab --in <InputBamFile> --out <OutputFile> [--log <logFile>] [--verbose] [--noeof] [--params] --refFile <ReferenceFile> [--dbsnp <dbsnpFile>] [--blended <weight>] 


Required General Parameters :
	--in <infile>   : input BAM file name
	--out <outfile> : output recalibration file name
Optional General Parameters : 
	--log <logfile> : log and summary statistics (default: [outfile].log)
	--verbose       : Turn on verbose mode
	--noeof         : do not expect an EOF block on a bam file.
	--params        : print the parameter settings

Recab Specific Required Parameters
	--refFile <reference file>    : reference file name
Recab Specific Optional Parameters : 
	--dbsnp <known variance file> : dbsnp file of positions
	--blended <weight>            : blended model weight
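
For scripted runs, the argument list can be assembled programmatically. The helper below is a hypothetical sketch that uses only the flags from the usage text above; the file names in the example are placeholders, and the function only builds the list without running anything.

```python
from typing import List, Optional


def recab_command(
    in_file: str,
    out_file: str,
    ref_file: str,
    dbsnp: Optional[str] = None,
    blended: Optional[str] = None,
    log: Optional[str] = None,
    verbose: bool = False,
    noeof: bool = False,
    params: bool = False,
    bam: str = "./bam",
) -> List[str]:
    """Build an argument list for 'bam recab' from the documented options."""
    cmd = [bam, "recab", "--in", in_file, "--out", out_file, "--refFile", ref_file]
    if log is not None:
        cmd += ["--log", log]
    if verbose:
        cmd.append("--verbose")
    if noeof:
        cmd.append("--noeof")
    if params:
        cmd.append("--params")
    if dbsnp is not None:
        cmd += ["--dbsnp", dbsnp]
    if blended is not None:
        cmd += ["--blended", blended]
    return cmd


if __name__ == "__main__":
    # Placeholder file names, for illustration only.
    print(" ".join(recab_command("in.bam", "out.bam", "ref.fa", verbose=True)))
```

The resulting list can be passed to subprocess.run to invoke the tool.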

Input File (--in)

Use --in followed by your file name to specify the SAM/BAM input file.

The program automatically determines if your input file is SAM/BAM/uncompressed BAM without any input other than a filename from the user, unless your input file is stdin.

A - is used to indicate to read from stdin and the extension is used to determine the file type (no extension indicates SAM).

SAM/BAM/Uncompressed BAM from file --in yourFileName
SAM from stdin --in -
BAM from stdin --in -.bam
Uncompressed BAM from stdin --in -.ubam

Note: Uncompressed BAM is compressed using compression level-0 (so it is not an entirely uncompressed file). This matches the samtools implementation so pipes between our tools and samtools are supported.
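
To see why a level-0 file is "not entirely uncompressed", the sketch below uses Python's plain gzip module (not BGZF, which adds its own block structure) to show that level 0 still wraps the data in gzip framing. The payload is an arbitrary example.

```python
import gzip

payload = b"ACGT" * 1000

# Level 0 stores the bytes without deflating them, but still writes
# the gzip header, block headers, and trailer around the data.
stored = gzip.compress(payload, compresslevel=0)
packed = gzip.compress(payload, compresslevel=6)

if __name__ == "__main__":
    print("raw bytes:     ", len(payload))
    print("level 0 bytes: ", len(stored))
    print("level 6 bytes: ", len(packed))
    print("gzip magic:    ", stored[:2] == b"\x1f\x8b")
```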

Output File (--out)

Use --out followed by your file name to specify the SAM/BAM output file.

The file extension is used to determine whether to write SAM/BAM/uncompressed BAM. A - is used to indicate stdout and the extension for file type (no extension is SAM).

SAM to file --out yourFileName.sam
BAM to file --out yourFileName.bam
Uncompressed BAM to file --out yourFileName.ubam
SAM to stdout --out -
BAM to stdout --out -.bam
Uncompressed BAM to stdout --out -.ubam
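
The naming convention in the table above can be expressed as a small lookup. This is a hypothetical sketch, not the library's code; treating any unrecognized extension as SAM is an assumption that goes beyond the "no extension is SAM" rule stated for stdout.

```python
from typing import Tuple


def output_format(name: str) -> Tuple[str, str]:
    """Return (destination, format) for a value passed to --out."""
    destination = "stdout" if name.startswith("-") else "file"
    if name.endswith(".ubam"):
        fmt = "uncompressed BAM"
    elif name.endswith(".bam"):
        fmt = "BAM"
    else:
        fmt = "SAM"  # assumption: anything else is written as SAM
    return destination, fmt


if __name__ == "__main__":
    for name in ("yourFileName.sam", "yourFileName.bam", "-", "-.bam", "-.ubam"):
        print(name, "->", output_format(name))
```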

Note: Uncompressed BAM is compressed using compression level-0 (so it is not an entirely uncompressed file). This matches the samtools implementation so pipes between our tools and samtools are supported.

When recalibrating reads, only positions with a base quality greater than the --minQual minimum are recalibrated. If --minQual is not specified, the default is TBD.

Output log & Summary Statistics FileName (--log)

Output file name for writing logs & summary statistics.

If this parameter is not specified, logs are written to the file named by --out with ".log" appended. If the output BAM is written to stdout (--out starts with '-'), logs are written to stderr instead. If the file name given to --log starts with '-', logs are also written to stderr.
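
The fallback rules for the log destination can be summarized in a few lines. The function name is hypothetical, and the sketch only restates the rules in this section.

```python
from typing import Optional


def log_destination(out: str, log: Optional[str] = None) -> str:
    """Return the log file name, or 'stderr', for the given --out and --log."""
    if log is not None:
        # An explicit --log starting with '-' means stderr.
        return "stderr" if log.startswith("-") else log
    if out.startswith("-"):
        # Output goes to stdout, so logs go to stderr.
        return "stderr"
    return out + ".log"


if __name__ == "__main__":
    print(log_destination("out.bam"))            # out.bam.log
    print(log_destination("-.bam"))              # stderr
    print(log_destination("out.bam", "my.log"))  # my.log
    print(log_destination("out.bam", "-"))       # stderr
```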

Turn on Verbose Mode (--verbose)

Turn on verbose logging to get more log messages in the log and to stderr.

Do not require BGZF EOF block (--noeof)

Use --noeof if you do not expect a trailing EOF block in your BGZF file.

By default, the trailing empty block is expected and checked for.
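
To check a file yourself before deciding whether --noeof is needed, you can compare its last 28 bytes with the empty BGZF block that the SAM/BAM specification defines as the EOF marker. The helper names below are hypothetical; this is a sketch, not the check bamUtil performs internally.

```python
import os

# The 28-byte empty BGZF block used as the EOF marker (SAM/BAM specification).
BGZF_EOF = bytes.fromhex(
    "1f8b0804" "00000000" "00ff" "0600" "42430200"
    "1b00" "0300" "00000000" "00000000"
)


def ends_with_bgzf_eof(data: bytes) -> bool:
    """Return True if the byte string ends with the BGZF EOF marker."""
    return data.endswith(BGZF_EOF)


def file_has_bgzf_eof(path: str) -> bool:
    """Return True if the file at 'path' ends with the BGZF EOF marker."""
    if os.path.getsize(path) < len(BGZF_EOF):
        return False
    with open(path, "rb") as handle:
        handle.seek(-len(BGZF_EOF), os.SEEK_END)
        return handle.read(len(BGZF_EOF)) == BGZF_EOF
```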

Print the Program Parameters (--params)

Use --params to print the parameters for your program to stderr.

Return Value

Returns -1 if input parameters are invalid.

Returns the SamStatus for the reads/writes (0 on success).