BamUtil: asp

From Genome Analysis Wiki
Revision as of 17:06, 27 January 2012 by Mktrost


Overview of the asp function of bamUtil

The asp option on the bamUtil executable generates a pileup in ASP format from the specified BAM file.

ASP is a new format that is currently in production, so this tool is not yet available for public release.


Usage

	./bam asp --in <inputFile> --out <outputFile> --refFile <referenceFilename> [--bamIndex <bamIndexFile>] [--regionList <regFileName>] [--gapSize <gapSize>] [--noeof] [--params]


Parameters

	Required Parameters:
		--in       : the SAM/BAM file to calculate asp for
		--out      : the output file to write
		--refFile  : the reference file
	Optional Parameters:
		--bamIndex    : The path/name of the bam index file
		                (if required and not specified uses the --in value + ".bai")
		--regionList  : File containing the regions to be processed chr<tab>start_pos<tab>end_pos.
		                Positions are 0 based and the end_pos is not included in the region.
		                Uses bamIndex.
		--gapSize     : Gap Size threshold such that position gaps less than this size have an
		                empty record written, while gaps larger than this size have a new
		                chrom/position header written, Default = 100.
		--noeof       : Do not expect an EOF block on a bam file.
		--params      : Print the parameter settings

Input File (--in)

Use --in followed by your file name to specify the SAM/BAM input file.

The program determines automatically whether the input file is SAM, BAM, or uncompressed BAM; the user supplies only the filename. The exception is input read from stdin.

To read from stdin, specify - as the filename. In that case the extension determines the file type, and no extension indicates SAM.

	Input                                   Option
	SAM/BAM/Uncompressed BAM from file      --in yourFileName
	SAM from stdin                          --in -
	BAM from stdin                          --in -.bam
	Uncompressed BAM from stdin             --in -.ubam


Note: Uncompressed BAM is compressed using compression level-0 (so it is not an entirely uncompressed file). This matches the samtools implementation so pipes between our tools and samtools are supported.
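
The stdin naming convention above can be summarized in a few lines of Python. This is only an illustrative sketch of the mapping described in this section, not code from bamUtil; the function name describe_input is hypothetical.

```python
# Mapping of the special --in values for stdin to the file type they select,
# as listed in the table above.
STDIN_TYPES = {
    "-": "SAM",
    "-.bam": "BAM",
    "-.ubam": "uncompressed BAM",
}


def describe_input(in_value):
    """Return (source, file_type) for a --in value.

    Hypothetical helper: for regular file names the tool detects the type
    itself, so the sketch only reports "auto-detected".
    """
    if in_value in STDIN_TYPES:
        return ("stdin", STDIN_TYPES[in_value])
    return ("file", "auto-detected")


if __name__ == "__main__":
    for value in ("yourFileName", "-", "-.bam", "-.ubam"):
        print(value, describe_input(value))
```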

Output File (--out)

Use --out followed by your file name to specify the ASP file that the pileup is written to.

To compress the output, specify a filename with a .gz extension.

Reference File (--refFile)

Use --refFile followed by the reference file name to specify the reference sequence file.

Bam Index File (--bamIndex)

Use --bamIndex followed by your file name to specify the BAM index file to use for reading the BAM file.

If this file is required but not specified, it will use the input file name + ".bai".

Region List (--regionList)

Use the --regionList option if you only want to pile up specific regions instead of the entire BAM file. The region list file has one region on each line.

Format of each line:

chr<tab>start_pos<tab>end_pos

The positions are 0 based and the end_pos is not included in the region.

This option uses a bamIndex file for jumping between the regions.

If a position is covered by multiple regions, the position will be piled up multiple times (once for each region).
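
Because the positions are 0 based with an excluded end, intervals that are commonly quoted as 1-based and inclusive need converting before they go into the region list. The sketch below shows one way to do that in Python and also counts how many regions cover a position, which is how many times that position would be piled up. The helper names are hypothetical and are not part of bamUtil.

```python
def to_region_line(chrom, first_1based, last_1based):
    """Format a 1-based inclusive interval as chr<tab>start_pos<tab>end_pos.

    start_pos is 0 based and end_pos is not included in the region.
    """
    if first_1based < 1 or last_1based < first_1based:
        raise ValueError("invalid interval")
    return f"{chrom}\t{first_1based - 1}\t{last_1based}"


def times_piled_up(chrom, pos_0based, region_lines):
    """Count the regions that contain a 0-based position.

    A position covered by several regions is piled up once per region.
    """
    count = 0
    for line in region_lines:
        region_chrom, start, end = line.split("\t")
        if region_chrom == chrom and int(start) <= pos_0based < int(end):
            count += 1
    return count


if __name__ == "__main__":
    # Example intervals are made up for illustration.
    lines = [to_region_line("1", 101, 200), to_region_line("1", 151, 300)]
    print("\n".join(lines))
    print(times_piled_up("1", 160, lines))
```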

Gap Size (--gapSize)

When writing an ASP file, there are two ways to skip positions that do not have any data (records/bases) associated with them.

  1. Write an Empty record indicating no data for that position.
  2. Write a new position record indicating the next position that has data.

The --gapSize option specifies at what point a Position record should be written instead of an Empty record. If the space between two positions that have data is larger than the gap size, then a Position record is written. Otherwise Empty records are written until the next position that has data.

The default gap size is 100.
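
The choice between Empty and Position records can be sketched as follows. This is a minimal illustration rather than the bamUtil implementation, and it makes two assumptions that the text above does not settle: the gap is measured as the number of positions without data between two positions that have data, and a gap exactly equal to the gap size is filled with Empty records.

```python
def skip_records(prev_pos, next_pos, gap_size=100):
    """Return the records written between two positions that have data.

    Assumptions (not confirmed by the documentation): the gap is the count
    of data-free positions, and a gap equal to gap_size yields Empty records.
    """
    gap = next_pos - prev_pos - 1
    if gap <= 0:
        # Adjacent positions: nothing to skip.
        return []
    if gap > gap_size:
        # One Position record pointing at the next position that has data.
        return [("POS", next_pos)]
    # One Empty record for every position without data.
    return [("EMPTY", pos) for pos in range(prev_pos + 1, next_pos)]


if __name__ == "__main__":
    print(skip_records(10, 13))
    print(skip_records(10, 500))
```

A larger gap size trades a few extra Empty records for fewer Position records; a smaller one does the opposite.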

Do not require BGZF EOF block (--noeof)

Use --noeof if you do not expect a trailing EOF block in your BGZF file.

By default, the trailing empty block is expected and checked for.

Print the Program Parameters (--params)

Use --params to print the parameters for your program to stderr.


Asp File Name (--asp)

Use --asp followed by the file name of the ASP file that you want to read.

Only print Data Records (--dataOnly)

The --dataOnly option tells the tool to print only Reference Only and Detailed records. Any Empty and Position records are not printed.

Return Value

  • 0: the file was processed successfully.
  • non-0: the file was not processed successfully.
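
When the tool is driven from a script, the return value can be checked after the call. The following Python sketch assembles the command line from the options documented on this page and treats a return value of 0 as success. The function names and the file names in the example are hypothetical, and the default executable path ./bam is taken from the Usage line.

```python
import subprocess


def build_asp_command(in_file, out_file, ref_file, bam_index=None,
                      region_list=None, gap_size=None, noeof=False,
                      params=False, executable="./bam"):
    """Assemble the argument list for the asp option (hypothetical helper)."""
    cmd = [executable, "asp", "--in", in_file, "--out", out_file,
           "--refFile", ref_file]
    if bam_index is not None:
        cmd += ["--bamIndex", bam_index]
    if region_list is not None:
        cmd += ["--regionList", region_list]
    if gap_size is not None:
        cmd += ["--gapSize", str(gap_size)]
    if noeof:
        cmd.append("--noeof")
    if params:
        cmd.append("--params")
    return cmd


def run_asp(**options):
    """Run the command and report success (return value 0) as True."""
    return subprocess.run(build_asp_command(**options)).returncode == 0


if __name__ == "__main__":
    # File names are placeholders; this only prints the command.
    print(" ".join(build_asp_command("in.bam", "out.asp.gz", "ref.fa",
                                     gap_size=50)))
```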

Output

Each ASP record is printed on one line with each field separated by a tab.

The 1st field in the row is the chromosomeID and 0-based position separated by a ':'.

The 2nd field is the record type: POS, EMPTY, REF_ONLY, or DETAILED.

POS and EMPTY records have no additional columns.

REF_ONLY records have 3 additional fields:

  1. numBases - the number of bases at this position
  2. GLH - the GLH for this position
  3. GLA - the GLA for this position

DETAILED records have 6 additional fields:

  1. numBases - the number of bases at this position
  2. bases - the bases at this position. String of ACTGND characters that is numBases long. ('D' represents a deletion)
  3. qualities - the qualities at this position. String of characters representing the qualities that is numBases long. (' ' represents the quality of a deletion)
  4. cycles - the cycles for this position. There are numBases cycles, separated by a ':'. (-1 represents the cycle of a deletion)
  5. strands - the strands for this position. Sequence of numBases 0's and 1's. 0 represents forward strand and 1 represents reverse strand.
  6. mqs - the mapping qualities for this position. There are numBases mapping qualities, separated by a ':'.
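
A line in this layout can be split back into its fields with a short parser. The Python sketch below follows the field descriptions above; it is an illustration, not part of bamUtil. It assumes that the strands field is an unseparated string of 0 and 1 characters, and it keeps GLH and GLA as raw strings because their encoding is not described here. Only the trailing newline is stripped, since a space in the qualities field is meaningful (the quality of a deletion).

```python
def parse_asp_line(line):
    """Parse one tab-separated ASP output line into a dictionary."""
    fields = line.rstrip("\n").split("\t")
    # First field is chromosomeID:position; split on the last ':'.
    chrom, _, pos = fields[0].rpartition(":")
    record = {"chrom": chrom, "pos": int(pos), "type": fields[1]}
    extra = fields[2:]
    if record["type"] == "REF_ONLY":
        num_bases, glh, gla = extra
        record.update(numBases=int(num_bases), GLH=glh, GLA=gla)
    elif record["type"] == "DETAILED":
        num_bases, bases, qualities, cycles, strands, mqs = extra
        record.update(
            numBases=int(num_bases),
            bases=bases,
            qualities=qualities,
            cycles=[int(c) for c in cycles.split(":")],
            # Assumption: one character per base, 0 = forward, 1 = reverse.
            strands=[int(s) for s in strands],
            mqs=[int(m) for m in mqs.split(":")],
        )
    # POS and EMPTY records have no additional columns.
    return record


if __name__ == "__main__":
    # Made-up line for illustration; 'D' with quality ' ' and cycle -1
    # stands for a deletion.
    print(parse_asp_line("1:100\tDETAILED\t2\tAD\tI \t5:-1\t01\t60:37\n"))
```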

Sample Output