BamUtil: splitChromosome

Overview of the splitChromosome function of bamUtil

The splitChromosome option on the bamUtil executable splits an indexed BAM file into multiple files based on the chromosome (reference name).

The output files all share the same base name, with the chromosome name and ".bam" or ".sam" appended.

Usage

./bam splitChromosome --in <inputFilename>  --out <outputFileBaseName> [--noeof] [--bamout|--samout] [--params]

Parameters

    Required Parameters:
        --in       : the BAM file to be split
        --out      : the base filename for the SAM/BAM files to write into.  Does not include the extension.
                     CHROM.bam or CHROM.sam will be appended to the basename where CHROM is the chromosome name.
    Optional Parameters:
        --noeof  : do not expect an EOF block on a bam file.
        --bamout : write the output files in BAM format (default).
        --samout : write the output files in SAM format.
        --params : print the parameter settings
    PhoneHome:
        --noPhoneHome       : disable PhoneHome (default enabled)
        --phoneHomeThinning : adjust the PhoneHome thinning parameter (default 50)
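
As a rough illustration of how the options above combine, the following Python sketch assembles a splitChromosome argument list. The function name `build_split_command`, its keyword arguments, and the file names `sample.bam` and `sample.chr` are hypothetical and are not part of bamUtil. Only the flags listed above are used.

```python
def build_split_command(in_file, out_base, sam_output=False, no_eof=False,
                        show_params=False, bam_exe="./bam"):
    """Return the argument list for a splitChromosome run (hypothetical helper)."""
    cmd = [bam_exe, "splitChromosome", "--in", in_file, "--out", out_base]
    if no_eof:
        cmd.append("--noeof")
    # --bamout is the documented default, so only SAM output needs a flag.
    if sam_output:
        cmd.append("--samout")
    if show_params:
        cmd.append("--params")
    return cmd


# Hypothetical file names, shown only to illustrate the resulting command line.
print(" ".join(build_split_command("sample.bam", "sample.chr", sam_output=True)))
```

The resulting list can be passed to any process launcher. Building it as a list rather than a single string avoids shell quoting problems with unusual file names.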

Required Parameters

Input File (--in)

Use --in followed by your file name to specify the SAM/BAM input file.

The program automatically determines if your input file is SAM/BAM/uncompressed BAM without any input other than a filename from the user, unless your input file is stdin.

Use - to read from stdin. In that case the extension determines the file type, and no extension indicates SAM.

    SAM/BAM/Uncompressed BAM from file    --in yourFileName
    SAM from stdin                        --in -
    BAM from stdin                        --in -.bam
    Uncompressed BAM from stdin           --in -.ubam


Note: Uncompressed BAM is written using compression level 0, so it is not an entirely uncompressed file. This matches the samtools implementation, so pipes between our tools and samtools are supported.
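
The --in conventions above can be summarized in a small Python sketch. The names `STDIN_ARGUMENTS` and `input_argument` are hypothetical. The mapping only restates the documented values and says nothing about how bamUtil detects file types internally.

```python
# Documented --in values for reading from stdin, keyed by input format.
STDIN_ARGUMENTS = {"sam": "-", "bam": "-.bam", "ubam": "-.ubam"}


def input_argument(file_name=None, stdin_format="sam"):
    """Return the --in arguments for a named file or, if none is given, for stdin."""
    if file_name is not None:
        # For a real file the type is detected automatically, so the name is enough.
        return ["--in", file_name]
    return ["--in", STDIN_ARGUMENTS[stdin_format]]


print(input_argument("yourFileName"))
print(input_argument(stdin_format="ubam"))
```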

Output File Basename (--out)

Use --out followed by the base output filename (no extension) to specify the SAM/BAM basename to use for the output files.

The chromosome name and the appropriate extension (sam/bam) will be appended to the specified basename.
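
To make the naming rule concrete, here is a minimal Python sketch that predicts an output file name. It assumes the chromosome name is concatenated directly onto the base name with no separator, which is how "appended" is read here. The function name and the base name `sample.chr` are hypothetical.

```python
def output_file_name(out_base, chromosome, sam_output=False):
    """Predict the output file name for one chromosome (assumes plain concatenation)."""
    extension = ".sam" if sam_output else ".bam"
    return out_base + chromosome + extension


for chrom in ["1", "2", "3"]:
    print(output_file_name("sample.chr", chrom))
```

Under this assumption, a base name that should be visually separated from the chromosome name needs to end with its own separator, such as a dot or underscore.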

Optional Parameters

Do not require BGZF EOF block (--noeof)

Use --noeof if you do not expect a trailing EOF block in your BGZF file.

By default, the trailing empty block is expected and checked for.
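
To decide in advance whether --noeof is needed, one option is to check the input file for the trailing block yourself. The Python sketch below compares the last 28 bytes of a file against the empty BGZF block defined as the end-of-file marker in the SAM/BAM format specification. This is an independent check, not bamUtil's own code, and the name `has_bgzf_eof` is hypothetical.

```python
import os

# The 28-byte empty BGZF block used as the EOF marker (from the SAM/BAM specification).
BGZF_EOF = bytes.fromhex(
    "1f 8b 08 04 00 00 00 00 00 ff 06 00 42 43 02 00"
    " 1b 00 03 00 00 00 00 00 00 00 00 00"
)


def has_bgzf_eof(path):
    """Return True if the file at path ends with the BGZF EOF marker block."""
    if os.path.getsize(path) < len(BGZF_EOF):
        return False
    with open(path, "rb") as handle:
        # Jump to the final 28 bytes and compare them with the marker.
        handle.seek(-len(BGZF_EOF), os.SEEK_END)
        return handle.read() == BGZF_EOF
```

If this returns False for a BAM file that is otherwise intact, running with --noeof skips the default check described above.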

Output a BAM File (--bamout)

Use --bamout to write the output files in BAM format. This is the default.

Output a SAM File (--samout)

Use --samout to write the output files in SAM format.

Print the Program Parameters (--params)

Use --params to print the parameters for your program to stderr.

PhoneHome Parameters

See PhoneHome for more information on how PhoneHome works and what it does.

Turn off PhoneHome (--noPhoneHome)

Use the --noPhoneHome option to completely disable PhoneHome. PhoneHome is enabled by default based on the thinning parameter.

Adjust the Frequency of PhoneHome (--phoneHomeThinning)

Use --phoneHomeThinning to modify the percentage of the time that PhoneHome will run (0-100).

  • By default, --phoneHomeThinning is set to 50, running 50% of the time.
  • PhoneHome will only occur if the run's random number modulo 100 is less than the --phoneHomeThinning value.
  • N/A if --noPhoneHome is set.
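
The thinning rule in the list above can be sketched in Python as follows. How bamUtil draws its random number is not described here, so the number is passed in as a parameter. The function name `should_phone_home` is hypothetical.

```python
import random


def should_phone_home(random_number, thinning=50, no_phone_home=False):
    """Apply the documented rule: run only if random_number modulo 100 < thinning."""
    if no_phone_home:
        # --noPhoneHome disables PhoneHome regardless of the thinning value.
        return False
    return random_number % 100 < thinning


# Illustrative draw; the range is an arbitrary assumption.
print(should_phone_home(random.randrange(1000000)))
```

With this rule a thinning value of 0 never triggers PhoneHome and a value of 100 always does, since the remainder is always between 0 and 99.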

Return Value

  • 0: all records are successfully read and written.
  • non-0: at least one record was not successfully read or written.
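
A calling script can use the return value to decide whether the split files are safe to use. The Python sketch below runs a command and treats any non-zero return value as a failure. The helper name `run_and_check` is hypothetical, and it makes no assumption about which non-zero values the program may return.

```python
import subprocess
import sys


def run_and_check(command):
    """Run command (an argument list) and return True only if it returned 0."""
    result = subprocess.run(command)
    if result.returncode != 0:
        print("command failed with return value %d" % result.returncode,
              file=sys.stderr)
    return result.returncode == 0
```

For example, `run_and_check(["./bam", "splitChromosome", "--in", "sample.bam", "--out", "sample.chr"])` would report whether every record was read and written, where `sample.bam` and `sample.chr` are placeholder names.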

Example Output

Reference Name: 1 has 5 records
Reference Name: 2 has 2 records
Reference Name: 3 has 1 records
Reference Name: * has 2 records
Number of records = 10
Returning: 0 (SUCCESS)
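
If the summary needs to be processed by a script, the lines can be parsed with a couple of regular expressions. The Python sketch below is written against the example output shown above only. The patterns and the name `parse_summary` are assumptions, and other versions of the tool may format these lines differently.

```python
import re

RECORD_LINE = re.compile(r"^Reference Name: (\S+) has (\d+) records$")
TOTAL_LINE = re.compile(r"^Number of records = (\d+)$")


def parse_summary(text):
    """Return (per-reference counts, reported total) parsed from the summary text."""
    per_reference = {}
    total = None
    for line in text.splitlines():
        line = line.strip()
        match = RECORD_LINE.match(line)
        if match:
            per_reference[match.group(1)] = int(match.group(2))
            continue
        match = TOTAL_LINE.match(line)
        if match:
            total = int(match.group(1))
    return per_reference, total


EXAMPLE = """Reference Name: 1 has 5 records
Reference Name: 2 has 2 records
Reference Name: 3 has 1 records
Reference Name: * has 2 records
Number of records = 10
Returning: 0 (SUCCESS)"""

counts, total = parse_summary(EXAMPLE)
# In this example the per-reference counts add up to the reported total.
print(counts, total, sum(counts.values()) == total)
```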