BamUtil: asp

From Genome Analysis Wiki
Revision as of 17:08, 27 January 2012 by Mktrost

Overview of the asp function of bamUtil

The asp option on the bamUtil executable generates a pileup in ASP format from the specified BAM file.

ASP is a new format that is still under development, so this tool is not yet available for public release.


	./bam asp --in <inputFile> --out <outputFile> --refFile <referenceFilename> [--bamIndex <bamIndexFile>] [--regionList <regFileName>] [--gapSize <gapSize>] [--noeof] [--params]


	Required Parameters:
		--in       : the SAM/BAM file to calculate asp for
		--out      : the output file to write
		--refFile  : the reference file
	Optional Parameters:
		--bamIndex    : The path/name of the bam index file
		                (if required and not specified uses the --in value + ".bai")
		--regionList  : File containing the regions to be processed: chr<tab>start_pos<tab>end_pos.
		                Positions are 0-based and end_pos is not included in the region.
		                Uses bamIndex.
		--gapSize     : Gap Size threshold such that position gaps less than this size have an
		                empty record written, while gaps larger than this size have a new
		                chrom/position header written. Default = 100.
		--noeof       : Do not expect an EOF block on a bam file.
		--params      : Print the parameter settings

Input File (--in)

Use --in followed by your file name to specify the SAM/BAM input file.

The program automatically determines whether your input file is SAM, BAM, or uncompressed BAM without requiring anything from the user other than the file name, unless the input is stdin.

Use - to read from stdin. In that case, the extension appended to the - determines the file type (no extension indicates SAM).

  • SAM/BAM/Uncompressed BAM from a file: --in yourFileName
  • SAM from stdin: --in -
  • BAM from stdin: --in -.bam
  • Uncompressed BAM from stdin: --in -.ubam

Note: "Uncompressed" BAM is actually written with compression level 0, so it keeps the BGZF block structure and is not an entirely uncompressed file. This matches the samtools implementation, so piping data between these tools and samtools is supported.
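The stdin naming convention can be summarized as a small lookup. This is a minimal sketch covering only the three stdin forms listed above; the helper name `stdin_input_type` is hypothetical, and how bamUtil treats any other extension after "-" is not documented here, so the sketch rejects it rather than guessing.

```python
# Hypothetical helper illustrating the documented stdin convention;
# bamUtil's own argument handling may differ.
STDIN_TYPES = {
    "-": "SAM",                  # no extension -> SAM
    "-.bam": "BAM",
    "-.ubam": "uncompressed BAM",
}


def stdin_input_type(in_value: str) -> str:
    """Return the input type implied by a stdin --in value such as "-.bam"."""
    if in_value != "-" and not in_value.startswith("-."):
        # A regular file name: the program detects its type automatically.
        raise ValueError(f"{in_value!r} is not a stdin value")
    if in_value not in STDIN_TYPES:
        # Only the three forms listed in the documentation are handled here.
        raise ValueError(f"undocumented stdin form: {in_value!r}")
    return STDIN_TYPES[in_value]


for value in ("-", "-.bam", "-.ubam"):
    print(value, "->", stdin_input_type(value))
```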

Output File (--out)

Use --out followed by your file name to specify the ASP file that the pileup is written to.

To compress the output, specify a filename with a .gz extension.

Reference File (--refFile)

Use --refFile followed by the reference file name to specify the reference sequence file.

BAM Index File (--bamIndex)

Use --bamIndex followed by your file name to specify the BAM index file to use for reading the BAM file.

If an index is required (for example, when --regionList is used) but --bamIndex is not specified, the input file name with ".bai" appended is used.
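The default index name can be expressed in one line. This is a minimal sketch with a hypothetical helper name; it shows that ".bai" is appended to the whole --in value rather than replacing its ".bam" extension.

```python
from typing import Optional


def resolve_bam_index(in_file: str, bam_index: Optional[str] = None) -> str:
    """Return the --bamIndex value if given, else the --in value + ".bai"."""
    # ".bai" is appended to the full input name, e.g. "sample.bam" -> "sample.bam.bai".
    return bam_index if bam_index else in_file + ".bai"


print(resolve_bam_index("sample.bam"))
```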

Region List (--regionList)

Use the --regionList option if you only want to pile up specific regions instead of the entire BAM file. The region list file has one region on each line.

Format of each line:

	chr<tab>start_pos<tab>end_pos

The positions are 0-based, and end_pos is not included in the region.

This option uses the BAM index file (see --bamIndex) to jump between regions.

If a position is covered by multiple regions, the position will be piled up multiple times (once for each region).
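The region semantics (tab-separated chrom, start_pos, end_pos; 0-based; end_pos excluded; overlapping regions visited once per region) can be sketched as follows. The function names are hypothetical, and skipping blank lines and ignoring extra columns are assumptions of this sketch, not documented bamUtil behavior.

```python
from typing import Iterable, Iterator, List, Tuple

Region = Tuple[str, int, int]  # (chrom, start_pos, end_pos), 0-based, end exclusive


def parse_region_list(lines: Iterable[str]) -> List[Region]:
    """Parse chr<tab>start_pos<tab>end_pos lines into (chrom, start, end) tuples."""
    regions = []
    for line_no, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line.strip():
            continue  # assumption: blank lines are skipped
        fields = line.split("\t")
        if len(fields) < 3:
            raise ValueError(f"line {line_no}: expected chr<tab>start_pos<tab>end_pos")
        # assumption: any columns after the third are ignored
        regions.append((fields[0], int(fields[1]), int(fields[2])))
    return regions


def positions_to_pile_up(regions: List[Region]) -> Iterator[Tuple[str, int]]:
    """Yield every (chrom, pos) visited; overlapping regions repeat positions."""
    for chrom, start, end in regions:
        for pos in range(start, end):  # end_pos itself is not included
            yield chrom, pos
```

For example, the regions "1 10 13" and "1 12 14" visit positions 10, 11, 12, 12, 13 on chromosome 1: position 12 is piled up twice because both regions cover it.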

Gap Size (--gapSize)

When writing an ASP file, there are two ways to skip positions that do not have any data (records/bases) associated with them.

  1. Write an Empty record indicating no data for that position.
  2. Write a new position record indicating the next position that has data.

The --gapSize option sets the threshold at which a Position record is written instead of Empty records. If the gap between two positions that have data is larger than the gap size, a Position record is written. Otherwise, an Empty record is written for each position up to the next position that has data.

The default gap size is 100.
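The decision between Empty and Position records can be sketched as below. This is an illustration under stated assumptions, not bamUtil's implementation: the "gap" is taken to be the number of positions without data between two positions that have data, a change of chromosome (or the first position) always gets a Position record, and the record labels ("POSITION", "EMPTY", "DATA") are hypothetical placeholders rather than actual ASP record encodings.

```python
from typing import Iterable, List, Tuple

DEFAULT_GAP_SIZE = 100


def plan_records(data_positions: Iterable[Tuple[str, int]],
                 gap_size: int = DEFAULT_GAP_SIZE) -> List[tuple]:
    """Sketch the Empty-vs-Position choice for sorted (chrom, pos) pairs with data."""
    records: List[tuple] = []
    prev_chrom, prev_pos = None, None
    for chrom, pos in data_positions:
        if chrom != prev_chrom:
            # First position or new chromosome: write a chrom/position header.
            records.append(("POSITION", chrom, pos))
        else:
            gap = pos - prev_pos - 1  # assumption: count of positions without data
            if gap > gap_size:
                records.append(("POSITION", chrom, pos))
            else:
                records.extend([("EMPTY",)] * gap)  # one Empty record per skipped position
        records.append(("DATA", chrom, pos))  # placeholder for the real pileup record
        prev_chrom, prev_pos = chrom, pos
    return records
```

With the default gap size, data at positions 5, 8, and 200 on one chromosome yields two Empty records between 5 and 8, but a new Position record before 200.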

Do not require BGZF EOF block (--noeof)

Use --noeof if your BAM (BGZF) file is not expected to end with the trailing EOF block.

By default, the trailing empty block is expected and checked for.
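The trailing block is the 28-byte empty BGZF block defined in the SAM/BAM format specification. The sketch below shows conceptually what checking for it involves; it is not bamUtil's code, and the helper name `has_bgzf_eof` is hypothetical.

```python
import os

# The 28-byte empty BGZF block used as the end-of-file marker in BAM files.
BGZF_EOF = bytes.fromhex("1f8b08040000000000ff0600424302001b0003000000000000000000")


def has_bgzf_eof(path: str) -> bool:
    """Return True if the file ends with the standard empty BGZF EOF block."""
    size = os.path.getsize(path)
    if size < len(BGZF_EOF):
        return False
    with open(path, "rb") as fh:
        fh.seek(size - len(BGZF_EOF))  # read only the last 28 bytes
        return fh.read() == BGZF_EOF
```

A file produced by a tool that omits this block would fail such a check, which is the situation --noeof is meant for.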

Print the Program Parameters (--params)

Use --params to print the parameter settings to stderr.

Return Value

  • 0: the file was processed successfully.
  • non-0: the file was not processed successfully.


An ASP file is written containing the pileup for the specified BAM file.
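When calling the tool from a script, the return value is the simplest success check. The sketch below assembles a command line from the options in the usage line and treats exit status 0 as success; the helper names, the "./bam" default path, and passing --gapSize as a separate argument (like the other valued options) are assumptions.

```python
import subprocess
from typing import List, Optional


def build_asp_command(in_file: str, out_file: str, ref_file: str,
                      bam_index: Optional[str] = None,
                      region_list: Optional[str] = None,
                      gap_size: Optional[int] = None,
                      noeof: bool = False,
                      params: bool = False,
                      bam_exe: str = "./bam") -> List[str]:
    """Assemble an argv list for "bam asp" using only the documented options."""
    cmd = [bam_exe, "asp", "--in", in_file, "--out", out_file, "--refFile", ref_file]
    if bam_index is not None:
        cmd += ["--bamIndex", bam_index]
    if region_list is not None:
        cmd += ["--regionList", region_list]
    if gap_size is not None:
        cmd += ["--gapSize", str(gap_size)]
    if noeof:
        cmd.append("--noeof")
    if params:
        cmd.append("--params")
    return cmd


def run_asp(cmd: List[str]) -> bool:
    """Run the command; exit status 0 means the file was processed successfully."""
    return subprocess.run(cmd).returncode == 0
```

Building the argument list separately from running it keeps the command easy to log or inspect before execution.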