BamUtil: asp



Overview of the asp function of bamUtil

The asp option on the bamUtil executable generates a pileup in ASP format from the specified BAM file.

ASP is a new format that is currently in production, so this tool is not yet available for public release.


Rules

Dealing with 'N' Bases

  • If the reference is 'N':
    • Do not write REF_ONLY or DETAILED records
    • Either write EMPTY or no record (depending on Gap Size and the next data record)
  • If all reads at this position are 'N':
    • Either write EMPTY or no record (depending on Gap Size and the next data record)
  • If some reads are 'N' and the rest are the reference (not 'N'):
    • Write a REF_ONLY record but do not include the 'N's in the numBases
  • If some reads are 'N' and some are non-reference (not 'N'):
    • DEFAULT: Write a DETAILED record and include the 'N's in the numBases
    • OPTIONAL: Write a DETAILED record but do not include the 'N's in the numBases
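
The rules above can be sketched as a small decision function. This is only an illustration, not bamUtil's implementation: the function name, the `count_n_in_detailed` flag, and the `EMPTY_OR_NONE` label are hypothetical, and the handling of positions with no 'N' reads at all (all-reference gives REF_ONLY, anything else gives DETAILED) is an assumption that the rules do not state.

```python
def classify_position(ref_base, read_bases, count_n_in_detailed=True):
    """Return (record_type, num_bases) for one pileup position.

    record_type is 'EMPTY_OR_NONE', 'REF_ONLY', or 'DETAILED'.
    'EMPTY_OR_NONE' means: write an EMPTY record or nothing, depending on
    the gap size and the next data record (not decided here).
    """
    ref = ref_base.upper()
    bases = [b.upper() for b in read_bases]
    non_n = [b for b in bases if b != "N"]

    # Reference is 'N', or every read is 'N' (or there are no reads).
    if ref == "N" or not non_n:
        return ("EMPTY_OR_NONE", 0)

    # Every non-'N' read matches the reference: 'N's are not counted.
    if all(b == ref for b in non_n):
        return ("REF_ONLY", len(non_n))

    # At least one non-reference read: 'N's are counted by default.
    num_bases = len(bases) if count_n_in_detailed else len(non_n)
    return ("DETAILED", num_bases)
```

For instance, a position with reference 'A' and reads A, N, C would be DETAILED with numBases 3 by default, or 2 when the optional behavior is selected.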


Usage

	./bam asp --in <inputFile> --out <outputFile> --refFile <referenceFilename> [--bamIndex <bamIndexFile>] [--regionList <regFileName>] [--gapSize <gapSize>] [--noeof] [--params]
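
When the tool is driven from a script, the command line can be assembled programmatically. The following is a minimal sketch that uses only the options documented on this page (including --gapSize from the Parameters section); the helper name `build_asp_command`, its keyword arguments, and the default `./bam` executable path are assumptions made for illustration.

```python
def build_asp_command(in_file, out_file, ref_file, bam_index=None,
                      region_list=None, gap_size=None, noeof=False,
                      params=False, executable="./bam"):
    """Return the argument list for a `bam asp` invocation."""
    # Required parameters.
    cmd = [executable, "asp",
           "--in", in_file,
           "--out", out_file,
           "--refFile", ref_file]
    # Optional parameters are added only when requested.
    if bam_index is not None:
        cmd += ["--bamIndex", bam_index]
    if region_list is not None:
        cmd += ["--regionList", region_list]
    if gap_size is not None:
        cmd += ["--gapSize", str(gap_size)]
    if noeof:
        cmd.append("--noeof")
    if params:
        cmd.append("--params")
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd)`; a `returncode` of 0 indicates that the file was processed successfully.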


Parameters

	Required Parameters:
		--in       : the SAM/BAM file to calculate asp for
		--out      : the output file to write
		--refFile  : the reference file
	Optional Parameters:
		--bamIndex    : The path/name of the bam index file
		                (if required and not specified uses the --in value + ".bai")
		--regionList  : File containing the regions to be processed chr<tab>start_pos<tab>end_pos.
		                Positions are 0 based and the end_pos is not included in the region.
		                Uses bamIndex.
		--gapSize     : Gap Size threshold such that position gaps less than this size have an
		                empty record written, while gaps larger than this size have a new
		                chrom/position header written, Default = 100.
		--noeof       : Do not expect an EOF block on a bam file.
		--params      : Print the parameter settings

Required Parameters

Input File (--in)

Use --in followed by your file name to specify the SAM/BAM input file.

The program automatically determines if your input file is SAM/BAM/uncompressed BAM without any input other than a filename from the user, unless your input file is stdin.

Use - to read from stdin. In that case the extension following the - determines the file type (no extension indicates SAM).

SAM/BAM/Uncompressed BAM from file    --in yourFileName
SAM from stdin                        --in -
BAM from stdin                        --in -.bam
Uncompressed BAM from stdin           --in -.ubam


Note: Uncompressed BAM is written with compression level 0, so it is not an entirely uncompressed file. This matches the samtools implementation, so pipes between our tools and samtools are supported.
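
The stdin naming convention can be summarized as a lookup. This is a sketch of the convention as described here, not of how bamUtil parses the option; the names `STDIN_FORMATS` and `stdin_format` are hypothetical.

```python
# Mapping of the special stdin values of --in to the format they imply.
STDIN_FORMATS = {
    "-": "SAM",
    "-.bam": "BAM",
    "-.ubam": "Uncompressed BAM",
}

def stdin_format(in_value):
    """Return the format implied by a stdin-style --in value.

    Returns None for any other value, which is treated as a regular
    file name whose type the program determines on its own.
    """
    return STDIN_FORMATS.get(in_value)
```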

Output File (--out)

Use --out followed by your file name to specify the ASP file to write from the pileup.

To compress the output, specify a filename with a .gz extension.

Reference File (--refFile)

Use --refFile followed by the reference file name to specify the reference sequence file.

Optional Parameters

BAM Index File (--bamIndex)

Use --bamIndex followed by your file name to specify the BAM index file to use for reading the BAM file.

If this file is required but not specified, it will use the input file name + ".bai".

Region List (--regionList)

Use the --regionList option if you only want to pile up specific regions instead of the entire BAM file. The region list file has one region on each line.

Format of each line:

chr<tab>start_pos<tab>end_pos

The positions are 0 based and the end_pos is not included in the region.

This option uses a bamIndex file for jumping between the regions.

If a position is covered by multiple regions, the position will be piled up multiple times (once for each region).
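
Because the positions are 0 based with an exclusive end, converting from 1-based inclusive coordinates is a common source of off-by-one errors. The sketch below shows the conversion and how overlapping regions lead to a position being piled up more than once. All function names are hypothetical, and the sketch only models the file format described above.

```python
def to_region_line(chrom, start_1based, end_1based_inclusive):
    """Format a 1-based inclusive interval as a region list line.

    The 0-based start is start - 1; the exclusive 0-based end equals
    the 1-based inclusive end.
    """
    return f"{chrom}\t{start_1based - 1}\t{end_1based_inclusive}"

def parse_region_line(line):
    """Parse 'chr<tab>start_pos<tab>end_pos' into (chrom, start, end)."""
    chrom, start, end = line.rstrip("\n").split("\t")
    return (chrom, int(start), int(end))

def times_piled_up(regions, chrom, pos0):
    """Count the regions that contain the 0-based position pos0."""
    return sum(1 for c, start, end in regions
               if c == chrom and start <= pos0 < end)

# Two overlapping regions on chromosome 1 (1-based inclusive input).
lines = [to_region_line("1", 101, 200), to_region_line("1", 151, 250)]
regions = [parse_region_line(line) for line in lines]
```

Here the 0-based position 160 falls in both regions, so it would be piled up twice, while position 200 falls only in the second region because the end of the first region is not included.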

Gap Size (--gapSize)

When writing an ASP file, there are two ways to skip positions that do not have any data (records/bases) associated with them.

  1. Write an Empty record indicating no data for that position.
  2. Write a new position record indicating the next position that has data.

The --gapSize option specifies at what point a Position record should be written instead of an Empty record. If the space between two positions that have data is larger than the gap size, then a Position record is written. Otherwise Empty records are written until the next position that has data.

The default gap size is 100.
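
The choice between the two record types can be sketched as follows. The function name and return values are hypothetical, and the sketch assumes that "the space between two positions" means the number of positions without data strictly between them; the tool may measure the gap differently.

```python
def records_for_gap(prev_pos, next_pos, gap_size=100):
    """Return the filler records written between two data positions.

    Assumption: the gap is the count of positions without data
    strictly between prev_pos and next_pos.
    """
    gap = next_pos - prev_pos - 1
    if gap <= 0:
        # Adjacent positions: nothing to fill in.
        return []
    if gap > gap_size:
        # Large gap: one Position record names the next data position.
        return ["POSITION"]
    # Small gap: one Empty record per position without data.
    return ["EMPTY"] * gap
```

Under this assumption, data at positions 10 and 14 is separated by three Empty records, while data at positions 10 and 500 is separated by a single Position record with the default gap size.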

Do not require BGZF EOF block (--noeof)

Use --noeof if you do not expect a trailing EOF block in your BGZF file.

By default, the trailing empty block is expected and checked for.

Print the Program Parameters (--params)

Use --params to print the parameters for your program to stderr.


Return Value

  • 0: the file was processed successfully.
  • non-0: the file was not processed successfully.

Output

An ASP file is written containing the pileup for the specified BAM file. ASP files are by default compressed using BGZF.

The number of each type of record is output to stderr.

For example:

Number of Position Records = 6
Number of Empty Records = 39
Number of Reference Only Records = 12
Number of Detailed Records = 29
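
When the tool runs inside a pipeline, these counts can be collected from the captured stderr text. The following is a minimal sketch that assumes the lines have exactly the form shown in the example above; the function name `parse_record_counts` is hypothetical.

```python
import re

# Matches lines such as "Number of Reference Only Records = 12".
COUNT_LINE = re.compile(r"^Number of (.+?) Records = (\d+)[ \t]*$", re.MULTILINE)

def parse_record_counts(stderr_text):
    """Return a dict mapping record type name to its count."""
    return {name: int(count) for name, count in COUNT_LINE.findall(stderr_text)}

example = """Number of Position Records = 6
Number of Empty Records = 39
Number of Reference Only Records = 12
Number of Detailed Records = 29
"""
counts = parse_record_counts(example)
```

For the example text, `counts` holds 6 Position, 39 Empty, 12 Reference Only, and 29 Detailed records.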