Changes

From Genome Analysis Wiki

BamUtil: clipOverlap

1,110 bytes added, 14:52, 18 November 2011
Update clipOverlap
--storeOrig : Store the original cigar in the specified tag.
--readName : Original file is sorted by Read Name instead of coordinate.
--noeof : Do not expect an EOF block on a bam file.
--params : Print the parameter settings
Clipping By Coordinate Optional Parameters:
--poolSize : Maximum number of records the program is allowed to allocate
             for clipping on Coordinate sorted files. (Default: 5000)
--poolSkipClip : Skip clipping reads to free of usable records when the
                 poolSize is hit. The default action is to just clip the
                 first read in a pair to free up the record.
</pre>
== Set the SAM/BAM record buffer size (<code>--poolSize</code>) ==
To handle coordinate sorted files, SAM/BAM records are buffered until it is known that all following records will have a later start position. To prevent the program from running away with memory, a limit is set on the number of records that can be buffered (defaults to 5000). If the poolSize is exhausted, the code will write the earliest record awaiting its overlapping mate and any previous records that are being buffered. Depending on whether or not <code>--poolSkipClip</code> is set, it will either clip the end of the read at the position where the mate is supposed to start, or it will not clip either read. An error message is written to stderr to indicate that one of these has happened, and an unsuccessful return value is returned (2: NO_MORE_RECS). The resulting file will still be sorted by coordinate.

== Skip Clipping Coordinate Sorted Files When Out of Records (<code>--poolSkipClip</code>) ==
When clipping coordinate sorted SAM/BAM files, we can run out of buffers available in the pool (<code>--poolSize</code>). When we run out of pooled records, we can no longer read in new records, so instead we release some of the stored records. We do this by writing out the first record that is being held awaiting its mate.
This record can either be:
* Clipped starting at its mate's start position until the end of the read (DEFAULT)
* Left as is with no clipping, leaving the mates overlapping (specify <code>--poolSkipClip</code>)
With either option, the resulting file will still be sorted by coordinate, but not all overlapping mates will have been clipped.
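The pool behavior described above can be illustrated with a small Python model. This is only a sketch under stated assumptions, not the clipOverlap implementation: the <code>Read</code> class, <code>clip_overlaps</code>, and <code>example_reads</code> are hypothetical names, reads are reduced to a name, start, exclusive end, and mate start, the sketch always clips the earlier read of a pair, and pairs whose mates start at the same position are not handled.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Read:
    name: str          # pair name shared by both mates
    start: int         # start position of this read
    end: int           # end position (exclusive) of this read
    mate_start: int    # start position recorded for the mate
    clipped: bool = False


def clip_overlaps(reads, pool_size=5000, pool_skip_clip=False):
    """Toy model of clipping a coordinate-sorted stream with a bounded pool.

    Returns (output_reads, pool_exhausted). Mutates the Read objects.
    """
    if pool_size < 1:
        raise ValueError("pool_size must be at least 1")
    pool = deque()   # buffered reads, in input (coordinate) order
    waiting = {}     # pair name -> buffered read still waiting for its mate
    out = []
    pool_exhausted = False

    def flush():
        # Write reads from the front until one is still waiting for its mate.
        while pool and waiting.get(pool[0].name) is not pool[0]:
            out.append(pool.popleft())

    for read in reads:
        first = waiting.pop(read.name, None)
        if first is not None:
            # The mate arrived in time: clip the held read where the mate starts.
            first.end = min(first.end, read.start)
            first.clipped = True
            flush()
        if len(pool) >= pool_size:
            # Pool exhausted: release the earliest read still awaiting its mate.
            forced = pool[0]
            waiting.pop(forced.name, None)
            if not pool_skip_clip:
                # Default action: clip at the position where the mate should start.
                forced.end = min(forced.end, forced.mate_start)
                forced.clipped = True
            pool_exhausted = True
            flush()
        pool.append(read)
        if read.start < read.mate_start < read.end:
            waiting[read.name] = read   # an overlapping mate is still to come
        flush()

    out.extend(pool)  # mates that never arrived are written unclipped
    return out, pool_exhausted


def example_reads():
    # Three overlapping pairs, sorted by start position (made-up data).
    return [
        Read("a", 0, 100, 50), Read("b", 10, 110, 60), Read("c", 20, 120, 70),
        Read("a", 50, 150, 0), Read("b", 60, 160, 10), Read("c", 70, 170, 20),
    ]
```

Calling <code>clip_overlaps(example_reads(), pool_size=2)</code> forces the pool to run out, and adding <code>pool_skip_clip=True</code> leaves the released reads unclipped. In both cases the output stays in coordinate order, and the returned flag plays the role of the error message and unsuccessful return value described above.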
