BamUtil: recab

From Genome Analysis Wiki
Revision as of 11:02, 15 June 2012 by Mktrost (talk | contribs)


COMING SOON, June, 2012

Overview of the recab function of bamUtil

The recab option of bamUtil recalibrates a SAM/BAM file.

Handling Recalibration

Reads Not Recalibrated:

  • Duplicates
  • Unmapped
  • Mapping Quality = 0
  • Mapping Quality = 255
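
The exclusion rules above can be sketched as a predicate over the SAM FLAG and MAPQ fields. This is an illustration only, not bamUtil's actual code; the helper name should_skip_read is hypothetical. The flag bits 0x4 (unmapped) and 0x400 (duplicate) are the ones defined by the SAM specification.

```python
# Illustrative sketch only; should_skip_read is a hypothetical helper,
# not a bamUtil function.
FLAG_UNMAPPED = 0x4      # SAM spec: segment unmapped
FLAG_DUPLICATE = 0x400   # SAM spec: PCR or optical duplicate


def should_skip_read(flag, mapq):
    """Return True if a read belongs to one of the 'not recalibrated' groups."""
    if flag & FLAG_DUPLICATE:
        return True
    if flag & FLAG_UNMAPPED:
        return True
    # Mapping qualities 0 and 255 are both excluded.
    return mapq in (0, 255)
```

For example, a properly paired read with FLAG 0x3 and MAPQ 60 would be recalibrated, while the same read with MAPQ 0 would be left untouched.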


Covariates Notes

Coming Soon

How to use it

When recab is invoked without any arguments, it displays the usage information described under Usage below.

The input SAM/BAM file (--in), the output SAM/BAM file (--out), and the reference file (--refFile) are required inputs.

Usage

./bam recab --in <InputBamFile> --out <OutputFile> [--log <logFile>] [--verbose] [--noeof] [--params] --refFile <ReferenceFile> [--dbsnp <dbsnpFile>] [--blended <weight>] 
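
When recab is driven from a script, the usage line above can be assembled as an argument list. The following is a minimal sketch, assuming the executable is reachable as ./bam; the function name build_recab_command and all file names are hypothetical. Only flags listed on this page are used, and --blended is left out because it is not yet implemented.

```python
# Sketch only: build_recab_command is a hypothetical helper that assembles
# the documented flags into an argument list. It does not run anything.
def build_recab_command(in_file, out_file, ref_file, log_file=None,
                        dbsnp_file=None, verbose=False, noeof=False,
                        params=False, exe="./bam"):
    # --in, --out and --refFile are the three required inputs.
    cmd = [exe, "recab", "--in", in_file, "--out", out_file,
           "--refFile", ref_file]
    if log_file is not None:
        cmd += ["--log", log_file]
    if dbsnp_file is not None:
        cmd += ["--dbsnp", dbsnp_file]
    if verbose:
        cmd.append("--verbose")
    if noeof:
        cmd.append("--noeof")
    if params:
        cmd.append("--params")
    return cmd
```

The resulting list can be passed to subprocess.run from the Python standard library.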

Parameters

Required General Parameters :
	--in <infile>   : input BAM file name
	--out <outfile> : output recalibration file name
Optional General Parameters : 
	--log <logfile> : log and summary statistics (default: [outfile].log)
	--verbose       : Turn on verbose mode
	--noeof         : do not expect an EOF block on a bam file.
	--params        : print the parameter settings

Recab Specific Required Parameters
	--refFile <reference file>    : reference file name
Recab Specific Optional Parameters : 
	--dbsnp <known variance file> : dbsnp file of positions
	--blended <weight>            : blended model weight

Input File (--in)

Use --in followed by your file name to specify the SAM/BAM input file.

When reading from a file, the program determines automatically whether the input is SAM, BAM, or uncompressed BAM; the user supplies only the filename. Automatic detection does not apply when the input is stdin.

Use - to read from stdin. In that case the extension following the - determines the file type (no extension indicates SAM).

	SAM/BAM/Uncompressed BAM from file : --in yourFileName
	SAM from stdin                     : --in -
	BAM from stdin                     : --in -.bam
	Uncompressed BAM from stdin        : --in -.ubam


Note: Uncompressed BAM is written using compression level 0, so it is not an entirely uncompressed file. This matches the samtools implementation, so pipes between our tools and samtools are supported.
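
Content-based detection of a file's type can be sketched as follows. This is an assumption about how such a check might work, not a description of bamUtil's implementation; the function name sniff_alignment_format is hypothetical. It relies on two facts from the SAM/BAM specification: BAM files are BGZF (gzip-compatible) streams, and their decompressed content begins with the bytes "BAM\1". Because level-0 BAM is still gzip-framed, this check treats compressed and uncompressed BAM alike.

```python
import gzip
import io

GZIP_MAGIC = b"\x1f\x8b"   # first two bytes of any gzip/BGZF stream
BAM_MAGIC = b"BAM\x01"     # first four bytes of decompressed BAM data


def sniff_alignment_format(data):
    """Guess 'BAM' or 'SAM' from the leading bytes of an alignment file.

    Hypothetical helper; `data` should hold at least the first complete
    compressed block of the file.
    """
    if data[:2] == GZIP_MAGIC:
        with gzip.GzipFile(fileobj=io.BytesIO(data)) as stream:
            magic = stream.read(4)
        return "BAM" if magic == BAM_MAGIC else "unknown"
    # Anything that is not gzip-framed is assumed to be text SAM.
    return "SAM"
```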

Output File (--out)

Use --out followed by your file name to specify the SAM/BAM output file.

The file extension determines whether SAM, BAM, or uncompressed BAM is written. Use - to write to stdout; the extension following the - determines the file type (no extension indicates SAM).

	SAM to file                : --out yourFileName.sam
	BAM to file                : --out yourFileName.bam
	Uncompressed BAM to file   : --out yourFileName.ubam
	SAM to stdout              : --out -
	BAM to stdout              : --out -.bam
	Uncompressed BAM to stdout : --out -.ubam


Note: Uncompressed BAM is written using compression level 0, so it is not an entirely uncompressed file. This matches the samtools implementation, so pipes between our tools and samtools are supported.
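
The extension rule for --out can be written as a small lookup. This is a sketch under stated assumptions, not bamUtil code: the function name output_format is hypothetical, a file name without any extension is assumed to follow the same "no extension is SAM" rule as stdout, and extensions not listed on this page return None.

```python
import os


def output_format(out_arg):
    """Map an --out value to (destination, format) using the documented extensions.

    Hypothetical helper for illustration only.
    """
    to_stdout = out_arg.startswith("-")
    if to_stdout:
        # Everything after the leading '-' is treated as the extension.
        ext = out_arg[1:]
    else:
        ext = os.path.splitext(out_arg)[1]
    formats = {
        "": "SAM",            # assumption for plain file names; documented for stdout
        ".sam": "SAM",
        ".bam": "BAM",
        ".ubam": "uncompressed BAM",
    }
    return ("stdout" if to_stdout else "file", formats.get(ext))
```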

Output Log & Summary Statistics File Name (--log)

Output file name for writing logs & summary statistics.

If this parameter is not specified, logs are written to the file name given by --out with ".log" appended. If the output is written to stdout (--out starts with '-'), logs are written to stderr instead. If the file name given to --log starts with '-', logs are likewise written to stderr.
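
The three cases above can be summarized in a few lines. This is a sketch of the documented rules only; the function name log_destination is hypothetical, and the string "stderr" stands in for the standard error stream.

```python
def log_destination(out_arg, log_arg=None):
    """Return where logs go for the given --out and optional --log values."""
    if log_arg is not None:
        # An explicit --log starting with '-' means stderr.
        return "stderr" if log_arg.startswith("-") else log_arg
    if out_arg.startswith("-"):
        # Output goes to stdout, so logs default to stderr.
        return "stderr"
    # Default: the --out file name with ".log" appended.
    return out_arg + ".log"
```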

Turn on Verbose Mode (--verbose)

Turns on verbose logging, which writes additional messages to the log and to stderr.

Do not require BGZF EOF block (--noeof)

Use --noeof if you do not expect a trailing EOF block in your BGZF file.

By default, the trailing empty block is expected and checked for.
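
To see whether a BAM file carries the trailing empty block before deciding on --noeof, its last 28 bytes can be compared with the EOF marker given in the SAM/BAM specification. The sketch below is independent of bamUtil; the function names are hypothetical.

```python
import os

# The 28-byte empty BGZF block that the SAM/BAM specification defines
# as the end-of-file marker.
BGZF_EOF = bytes.fromhex(
    "1f 8b 08 04 00 00 00 00 00 ff 06 00 42 43 02 00"
    "1b 00 03 00 00 00 00 00 00 00 00 00"
)


def ends_with_bgzf_eof(data):
    """Return True if the given bytes end with the BGZF EOF marker."""
    return data.endswith(BGZF_EOF)


def file_has_bgzf_eof(path):
    """Check only the tail of a file so large BAMs are not read in full."""
    if os.path.getsize(path) < len(BGZF_EOF):
        return False
    with open(path, "rb") as handle:
        handle.seek(-len(BGZF_EOF), os.SEEK_END)
        return ends_with_bgzf_eof(handle.read())
```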

Print the Program Parameters (--params)

Use --params to print the program's parameter settings to stderr.

Reference File (--refFile)

The reference file to use for comparing read bases to the reference.

DBSNP File (--dbsnp)

The dbSNP file specifies positions to skip when recalibrating. It is a tab-delimited file with the chromosome in the first column and the 1-based position in the second column.
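
A file in this layout can be read into a set of (chromosome, position) pairs for quick lookup. The sketch below is not bamUtil's parser; the function name read_skip_positions is hypothetical, and skipping blank lines and ignoring any columns after the second are assumptions.

```python
def read_skip_positions(lines):
    """Collect (chromosome, 1-based position) pairs from tab-delimited lines."""
    positions = set()
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # assumption: blank lines are ignored
        fields = line.split("\t")
        # Column 1 is the chromosome, column 2 the 1-based position;
        # further columns, if any, are ignored here (assumption).
        positions.add((fields[0], int(fields[1])))
    return positions
```

An open file handle can be passed directly as the lines argument, since iterating over it yields one line at a time.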

Blended Model Weight (--blended)

TBD - this parameter is not yet implemented.

Minimum Recalibration Quality (--minRecabQual)

When recalibrating reads, only positions with a base quality greater than this minimum are recalibrated. The default used when --minRecabQual is not specified is TBD; this parameter is not yet implemented.

Return Value

Returns -1 if input parameters are invalid.

Otherwise, returns the SamStatus for the reads/writes (0 on success).
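
A calling script might interpret the exit status along these lines. This is a hedged sketch: the function name describe_recab_exit is hypothetical, and it assumes a POSIX system, where a program that returns -1 is reported to the caller as exit status 255. The meaning of individual non-zero SamStatus values is not covered on this page, so they are reported only by number.

```python
def describe_recab_exit(status):
    """Describe an exit status (0..255) as reported by the shell."""
    if status == 0:
        return "success"
    if status == 255:
        # A return value of -1 is seen as 255 by the caller on POSIX systems.
        return "invalid input parameters"
    return "read/write failure (SamStatus %d)" % status
```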