From Genome Analysis Wiki
4,784 bytes added, 17:06, 27 January 2012
[[Category:BamUtil|asp]]
[[Category:BAM Software]]
[[Category:Software]]

= Overview of the <code>asp</code> function of <code>bamUtil</code> =
The <code>asp</code> option on the [[bamUtil]] executable generates a pileup in [[LibStatGen: ASP|ASP]] format from the specified BAM file.

<span style="color:#D2691E">ASP is a new format that is currently in production, so this tool is not yet available for public release.</span>


= Usage =
./bam asp --in <inputFile> --out <outputFile> --refFile <referenceFilename> [--bamIndex <bamIndexFile>] [--regionList <regFileName>] [--gapSize <gapSize>] [--noeof] [--params]
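
When the tool is driven from a script, it can help to assemble the argument list programmatically. The following is a minimal sketch, not part of bamUtil: the helper name <code>build_asp_command</code> and the file names in the usage note are hypothetical, and only the flags documented on this page are used.

```python
def build_asp_command(in_file, out_file, ref_file, bam_index=None,
                      region_list=None, gap_size=None, noeof=False,
                      params=False, executable="./bam"):
    """Return the argument list for a `bam asp` call (hypothetical helper)."""
    # Required parameters, in the order shown in the usage line.
    cmd = [executable, "asp", "--in", in_file, "--out", out_file,
           "--refFile", ref_file]
    # Optional parameters are appended only when the caller sets them,
    # so the tool's own defaults apply otherwise.
    if bam_index is not None:
        cmd += ["--bamIndex", bam_index]
    if region_list is not None:
        cmd += ["--regionList", region_list]
    if gap_size is not None:
        cmd += ["--gapSize", str(gap_size)]
    if noeof:
        cmd.append("--noeof")
    if params:
        cmd.append("--params")
    return cmd
```

For example, <code>build_asp_command("sample.bam", "sample.asp.gz", "ref.fa", gap_size=50)</code> returns a list that can be passed to <code>subprocess.run</code>; the resulting <code>returncode</code> is 0 on success, as described under Return Value.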


= Parameters =
<pre>
Required Parameters:
--in : the SAM/BAM file to calculate asp for
--out : the output file to write
--refFile : the reference file
Optional Parameters:
--bamIndex : The path/name of the bam index file
(if required and not specified uses the --in value + ".bai")
--regionList : File containing the regions to be processed chr<tab>start_pos<tab>end_pos.
Positions are 0 based and the end_pos is not included in the region.
Uses bamIndex.
--gapSize : Gap Size threshold such that position gaps less than this size have an
empty record written, while gaps larger than this size have a new
chrom/position header written, Default = 100.
--noeof : Do not expect an EOF block on a bam file.
--params : Print the parameter settings
</pre>

{{inBAMInputFile}}

== Output File <code>(--out)</code> ==

Use <code>--out</code> followed by a file name to specify the ASP file that the pileup is written to.

To compress the output, specify a filename with a .gz extension.

{{RefFile}}
{{BamIndex}}

== Region List <code>(--regionList)</code> ==
Use the <code>--regionList</code> option if you want to pile up only specific regions instead of the entire BAM file. The region list file has one region on each line.

Format of each line:
<pre>
chr<tab>start_pos<tab>end_pos
</pre>
The positions are 0 based and the end_pos is not included in the region.

This option uses a bamIndex file for jumping between the regions.

If a position is covered by multiple regions, the position will be piled up multiple times (once for each region).
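
Because positions are 0-based with an exclusive end, intervals given in the common 1-based inclusive convention need converting before they go into a region list, and overlapping regions can be merged first if duplicate pileups are not wanted. The sketch below illustrates both steps; the helper names <code>region_line</code> and <code>merge_overlaps</code> are hypothetical and not part of bamUtil. The merge sorts chromosome names lexically, which may differ from the order in the BAM header; this page does not say whether the tool requires a particular order.

```python
def region_line(chrom, first_1based, last_1based):
    """Format a 1-based inclusive interval as a region list line."""
    if first_1based < 1 or last_1based < first_1based:
        raise ValueError("expected 1 <= first <= last")
    # 1-based inclusive [first, last] -> 0-based, end-exclusive [first-1, last)
    return f"{chrom}\t{first_1based - 1}\t{last_1based}"


def merge_overlaps(regions):
    """Merge overlapping or adjacent (chrom, start, end) regions.

    Regions use the 0-based, end-exclusive convention of the region list.
    """
    merged = []
    for chrom, start, end in sorted(regions):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Same chromosome and touching or overlapping: extend the last one.
            last = merged[-1]
            merged[-1] = (chrom, last[1], max(last[2], end))
        else:
            merged.append((chrom, start, end))
    return merged
```

For instance, the 1-based interval 101 to 200 on chromosome 1 becomes the line <code>1&lt;tab&gt;100&lt;tab&gt;200</code>, which covers exactly 100 positions.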

== Gap Size <code>(--gapSize)</code> ==
When writing an ASP file, there are two ways to skip positions that do not have any data (records/bases) associated with them.
# Write an Empty record indicating no data for that position.
# Write a new position record indicating the next position that has data.

The <code>--gapSize</code> option specifies at what point a Position record should be written instead of an Empty record. If the space between two positions that have data is larger than the gap size, then a Position record is written. Otherwise Empty records are written until the next position that has data.

The default gap size is 100.
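
The choice between Empty and Position records can be illustrated with a small simulation. This is only a sketch of the rule described above, not the tool's implementation, and it rests on assumptions that this page does not confirm: the gap is measured as the number of positions without data between two positions that have data, a gap exactly equal to the gap size produces Empty records, and the first position with data is preceded by a Position record. The function name <code>plan_records</code> and the label <code>DATA</code> (standing in for a Reference Only or Detailed record) are hypothetical.

```python
def plan_records(data_positions, gap_size=100):
    """Yield (kind, position) pairs for sorted 0-based positions with data."""
    previous = None
    for pos in data_positions:
        if previous is None:
            # Assumption: the first data position gets a Position record.
            yield ("POS", pos)
        else:
            gap = pos - previous - 1  # positions without data in between
            if gap > gap_size:
                # Large gap: jump ahead with a single Position record.
                yield ("POS", pos)
            else:
                # Small gap: fill every skipped position with an Empty record.
                for empty in range(previous + 1, pos):
                    yield ("EMPTY", empty)
        yield ("DATA", pos)
        previous = pos
```

Under these assumptions a larger gap size trades many small Empty records for fewer Position records, and a smaller gap size does the opposite.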

{{noeofBGZFParameter}}
{{paramsParameter}}


== ASP File Name <code>(--asp)</code> ==

Use <code>--asp</code> followed by the file name of the ASP file that you want to read.

== Only Print Data Records <code>(--dataOnly)</code> ==
The <code>--dataOnly</code> option tells the tool to print only Reference Only and Detailed records. Any Empty and Position records are not printed.


= Return Value =
* 0: the file was processed successfully.
* non-0: the file was not processed successfully.

=Output=
Each ASP record is printed on one line with each field separated by a <code>tab</code>.

The 1st field in each row is the chromosome ID and the 0-based position, separated by a ':'.

The 2nd field is the record type: <code>POS</code>, <code>EMPTY</code>, <code>REF_ONLY</code>, or <code>DETAILED</code>.

<code>POS</code> and <code>EMPTY</code> records have no additional columns.

<code>REF_ONLY</code> records have 3 additional fields:
# numBases - the number of bases at this position
# GLH - the GLH for this position
# GLA - the GLA for this position

<code>DETAILED</code> records have 6 additional fields:
# numBases - the number of bases at this position
# bases - the bases at this position. String of ACTGND characters that is numBases long. ('D' represents a deletion)
# qualities - the qualities at this position. String of characters representing the qualities that is numBases long. (' ' represents the quality of a deletion)
# cycles - the cycles for this position. There are numBases cycles, separated by a ':'. (-1 represents the cycle of a deletion)
# strands - the strands for this position. Sequence of numBases 0's and 1's. 0 represents forward strand and 1 represents reverse strand.
# mqs - the mapping qualities for this position. There are numBases mapping qualities, separated by a ':'.
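
The field layout above can be read back with a few lines of code. The following is a minimal sketch based only on the description on this page; the function name <code>parse_asp_line</code> is hypothetical, GLH and GLA are kept as strings because their representation is not described here, and cycles and mapping qualities are assumed to be printed as integers.

```python
def parse_asp_line(line):
    """Parse one tab-separated line of the printed output into a dict."""
    # Split on tabs only: the quality of a deletion is a space character,
    # so splitting on general whitespace would corrupt the qualities field.
    fields = line.rstrip("\r\n").split("\t")
    # The first field is "chromosomeID:position"; split at the last ':'.
    chrom, _, pos = fields[0].rpartition(":")
    record = {"chrom": chrom, "pos": int(pos), "type": fields[1]}
    extra = fields[2:]
    if record["type"] in ("POS", "EMPTY"):
        return record  # no additional columns
    if record["type"] == "REF_ONLY":
        num_bases, glh, gla = extra
        record.update(numBases=int(num_bases), GLH=glh, GLA=gla)
    elif record["type"] == "DETAILED":
        num_bases, bases, quals, cycles, strands, mqs = extra
        record.update(
            numBases=int(num_bases),
            bases=bases,
            qualities=quals,
            cycles=[int(c) for c in cycles.split(":")],  # -1 marks a deletion
            strands=[int(s) for s in strands],           # 0 forward, 1 reverse
            mqs=[int(m) for m in mqs.split(":")],
        )
    else:
        raise ValueError(f"unknown record type: {record['type']}")
    return record
```

As a made-up illustration (not real tool output), the line <code>1:100&lt;tab&gt;DETAILED&lt;tab&gt;3&lt;tab&gt;ACD&lt;tab&gt;I# &lt;tab&gt;5:17:-1&lt;tab&gt;010&lt;tab&gt;60:60:37</code> describes three bases, the third of which is a deletion with a space for its quality and -1 for its cycle.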

==Sample Output==
