BamUtil: splitBam

From Genome Analysis Wiki

Overview of the splitBam function of bamUtil

The splitBam option of the bamUtil executable splits a BAM file into multiple BAM files, one per ReadGroup, as follows.

  1. Creates one output file named [outprefix].[RGID].bam for each ReadGroup ID (RGID) present in the BAM file.
  2. Copies the original header into each output file, removing the @RG and @PG headers whose IDs match the other ReadGroup IDs.
  3. Copies each BAM record from the original file to the output file whose ReadGroup ID matches.
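
The three steps above can be sketched in Python. This is a simplified illustration, not bamUtil's implementation: records are plain dictionaries with a hypothetical "RG" key instead of real BAM records, header lines are SAM-style text, only the @RG filtering is shown (the @PG handling is left out), and records without a read group are simply skipped, which is an assumption of this sketch. The function name split_by_read_group is made up for the example.

```python
from collections import defaultdict


def split_by_read_group(header_lines, records, out_prefix):
    """Group records by read group, mimicking splitBam's output layout.

    Returns {output_filename: (filtered_header_lines, records)}.
    Records without an "RG" key are skipped (an assumption of this sketch).
    """
    # Collect every read group ID declared in the header.
    rg_ids = []
    for line in header_lines:
        fields = line.split("\t")
        if fields[0] == "@RG":
            for field in fields[1:]:
                if field.startswith("ID:"):
                    rg_ids.append(field[3:])

    # Bucket the records by their read group ID.
    by_rg = defaultdict(list)
    for rec in records:
        if "RG" in rec:
            by_rg[rec["RG"]].append(rec)

    outputs = {}
    for rg_id in rg_ids:
        # Keep every header line except @RG lines that belong to other groups.
        header = [
            line for line in header_lines
            if not line.startswith("@RG\t") or f"ID:{rg_id}" in line.split("\t")
        ]
        outputs[f"{out_prefix}.{rg_id}.bam"] = (header, by_rg[rg_id])
    return outputs
```

Each key of the returned dictionary corresponds to one output file that splitBam would create for the given prefix.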


Usage

./bam splitBam [-v] -i <inputBAMFile> -o <outPrefix> [-L logFile]
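
When splitBam is driven from a script, the usage line above can be assembled programmatically. The helper below is a minimal sketch that only builds the argument list and does not run anything; the function name, the default executable path ./bam, and the file names are assumptions made for illustration, while the flags are the ones shown in the usage line.

```python
def build_split_bam_command(in_bam, out_prefix, log_file=None, verbose=False,
                            bam_exe="./bam"):
    """Return the argv list for a splitBam run (does not execute it)."""
    cmd = [bam_exe, "splitBam"]
    if verbose:
        cmd.append("-v")
    cmd += ["-i", in_bam, "-o", out_prefix]
    if log_file is not None:
        cmd += ["-L", log_file]
    return cmd


# Hypothetical file names, shown only to illustrate the resulting command.
print(" ".join(build_split_bam_command("sample.bam", "split/sample", verbose=True)))
```

The returned list can be passed directly to subprocess.run, which avoids shell quoting problems with unusual file names.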


Parameters

Required arguments:
  -i/--in [inputBAMFile] : Original BAM file containing readGroup info
  -o/--out [outPrefix] : prefix of output bam files of [outprefix].[RGID].bam
Optional arguments:
  -L/--log [logFile]  : log file name. default is listFile.log
  -v/--verbose : turn on verbose mode
  -n/--noeof : turn off the check for an EOF block at the end of a bam file
	PhoneHome:
		--noPhoneHome       : disable PhoneHome (default enabled)
		--phoneHomeThinning : adjust the PhoneHome thinning parameter (default 50)

Required Parameters

Input File (--in)

Use --in followed by your file name to specify the SAM/BAM input file.

The program automatically determines whether the input file is SAM, BAM, or uncompressed BAM; you only need to supply the filename, unless the input comes from stdin.

Use - to read from stdin; the extension appended to it determines the file type (no extension indicates SAM).

SAM/BAM/Uncompressed BAM from file    --in yourFileName
SAM from stdin                        --in -
BAM from stdin                        --in -.bam
Uncompressed BAM from stdin           --in -.ubam


Note: Uncompressed BAM is written with compression level 0, so it is not an entirely uncompressed file. This matches the samtools implementation, so pipes between our tools and samtools are supported.
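
The stdin naming rule above amounts to a small lookup. The following Python sketch restates it; the function and constant names are hypothetical, and the function returns None for a regular filename because the format is then detected from the file itself.

```python
# Values of --in that select stdin, and the format each one implies.
STDIN_FORMATS = {
    "-": "SAM",
    "-.bam": "BAM",
    "-.ubam": "uncompressed BAM",
}


def stdin_input_type(in_arg):
    """Return the format implied by a stdin specifier, or None for a filename."""
    return STDIN_FORMATS.get(in_arg)
```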

Output File Basename (--out)

Use --out followed by the base output filename (no extension) to specify the BAM basename to use for the output files.

The read group ID and .bam are appended to the specified basename, giving [outprefix].[RGID].bam.
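
To anticipate which files a run will produce, the naming rule can be applied to the @RG lines of the header. The Python sketch below assumes the header is available as SAM-format text (for example, as printed by samtools view -H); the function name is hypothetical.

```python
def expected_output_files(sam_header_text, out_prefix):
    """List the [outprefix].[RGID].bam names implied by a SAM header."""
    names = []
    for line in sam_header_text.splitlines():
        fields = line.split("\t")
        if fields[0] != "@RG":
            continue
        # The ID field of an @RG line holds the read group ID.
        for field in fields[1:]:
            if field.startswith("ID:"):
                names.append(f"{out_prefix}.{field[3:]}.bam")
                break
    return names
```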

Optional Parameters

Specify Log Filename (--log)

Use --log followed by a filename to specify the log file. The default is the output file basename with a .log extension.

Verbose (--verbose)

Use --verbose to turn on verbose mode.

Do not require BGZF EOF block (--noeof)

Use --noeof if you do not expect a trailing EOF block in your BGZF file.

By default, the trailing empty block is expected and checked for.
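
To find out in advance whether a file carries the trailing block, you can compare its last 28 bytes with the empty BGZF block defined in the SAM/BAM specification. The Python sketch below does that; it is an independent check written for illustration, not the code bamUtil uses, and the function name is hypothetical.

```python
import os

# The 28-byte empty BGZF block that marks end-of-file (from the SAM/BAM spec).
BGZF_EOF = bytes.fromhex(
    "1f8b0804" "00000000" "00ff0600" "42430200"
    "1b000300" "00000000" "00000000"
)


def has_bgzf_eof(path):
    """Return True if the file ends with the BGZF EOF marker block."""
    if os.path.getsize(path) < len(BGZF_EOF):
        return False
    with open(path, "rb") as handle:
        # Read only the tail of the file, where the marker must sit.
        handle.seek(-len(BGZF_EOF), os.SEEK_END)
        return handle.read() == BGZF_EOF
```

If this returns False for a file you still want to process, --noeof is the option that skips the check.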

PhoneHome Parameters

See PhoneHome for more information on how PhoneHome works and what it does.

Turn off PhoneHome (--noPhoneHome)

Use the --noPhoneHome option to completely disable PhoneHome. PhoneHome is enabled by default based on the thinning parameter.

Adjust the Frequency of PhoneHome (--phoneHomeThinning)

Use --phoneHomeThinning to modify the percentage of the time that PhoneHome will run (0-100).

  • By default, --phoneHomeThinning is set to 50, running 50% of the time.
  • PhoneHome will only occur if the run's random number modulo 100 is less than the --phoneHomeThinning value.
  • N/A if --noPhoneHome is set.
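
The thinning rule in the list above can be written as a one-line predicate. The Python sketch below restates it under the assumption that the run's random number is a non-negative integer; the function and parameter names are hypothetical.

```python
def phone_home_runs(random_number, thinning=50, no_phone_home=False):
    """Return True if PhoneHome would run for this random number."""
    if no_phone_home:
        # --noPhoneHome disables PhoneHome regardless of the thinning value.
        return False
    return random_number % 100 < thinning
```

With the default of 50, exactly half of the possible remainders (0 through 49) trigger PhoneHome; a value of 0 never triggers it and a value of 100 always does.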

Return Value

  • 0: Success.
  • non-0: Failure.
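
A calling script can rely on this convention to stop a pipeline when a run fails. The Python sketch below wraps an arbitrary command; the function name is hypothetical, and the command list is whatever you would otherwise type on the command line.

```python
import subprocess


def run_checked(cmd):
    """Run a command and raise if it reports failure (non-zero exit status)."""
    result = subprocess.run(cmd)
    if result.returncode != 0:
        raise RuntimeError(
            f"{cmd[0]} failed with exit status {result.returncode}"
        )
    return result.returncode
```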