Changes

From Genome Analysis Wiki
1,110 bytes added, 14:52, 18 November 2011
Update clipOverlap
Line 92 (old), Line 92 (new):
  --storeOrig  : Store the original cigar in the specified tag.
  --readName    : Original file is sorted by Read Name instead of coordinate.
− --poolSize    : Maximum number of records the program is allowed to allocate
−                 for clipping on Coordinate sorted files. (Default: 500)
  --noeof      : Do not expect an EOF block on a bam file.
  --params      : Print the parameter settings
+ Clipping By Coordinate Optional Parameters:
+ --poolSize    : Maximum number of records the program is allowed to allocate
+                 for clipping on Coordinate sorted files. (Default: 5000)
+ --poolSkipClip : Skip clipping reads to free up usable records when the
+                 poolSize is hit. The default action is to just clip the
+                 first read in a pair to free up the record.
  </pre>
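The revised option list can be mirrored in a small Python argument parser, for example in a wrapper script that checks settings before calling the tool. This is a hypothetical sketch, not bamUtil's own parser: the program name `clipOverlap-sketch` and the function `build_parser` are made up, and only the flags and defaults shown in the list are used (`--poolSize` defaulting to 5000, `--poolSkipClip` as an on/off switch).

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical mirror of the revised clipOverlap option list (not bamUtil's parser)."""
    parser = argparse.ArgumentParser(prog="clipOverlap-sketch")
    # General options that appear as unchanged context in the diff.
    parser.add_argument("--storeOrig", metavar="TAG",
                        help="store the original CIGAR in the specified tag")
    parser.add_argument("--readName", action="store_true",
                        help="input is sorted by read name instead of coordinate")
    parser.add_argument("--noeof", action="store_true",
                        help="do not expect an EOF block on a BAM file")
    parser.add_argument("--params", action="store_true",
                        help="print the parameter settings")
    # Options grouped under the new "Clipping By Coordinate" heading.
    coord = parser.add_argument_group("Clipping By Coordinate Optional Parameters")
    coord.add_argument("--poolSize", type=int, default=5000,
                       help="maximum records buffered for coordinate-sorted input")
    coord.add_argument("--poolSkipClip", action="store_true",
                       help="when the pool is full, release the waiting read "
                            "unclipped instead of clipping it at its mate's start")
    return parser


# Parse an explicit argument list so the example does not depend on sys.argv.
args = build_parser().parse_args(["--poolSkipClip"])
print(args.poolSize, args.poolSkipClip)
```

Grouping the two pool options in their own argument group reproduces the new "Clipping By Coordinate Optional Parameters" heading in the generated `--help` text.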
Line 118 (old), Line 122 (new):
  == Set the SAM/BAM record buffer size (<code>--poolSize</code>) ==
− To handle coordinate sorted files, SAM/BAM records are buffered until it is known that all following records will have a later start position.  To prevent the program from running away with memory, a limit is set on the number of records that can be buffered (defaults to 500).
+ To handle coordinate sorted files, SAM/BAM records are buffered until it is known that all following records will have a later start position.  To prevent the program from running away with memory, a limit is set on the number of records that can be buffered (defaults to 5000).
+
+ If the poolSize is exhausted, the code will write the earliest record awaiting its overlapping mate and any previous records that are being buffered.
+
+ If <code>--poolSkipClip</code> is not set, the end of that read is clipped at the position where its mate starts; if it is set, neither read is clipped.  In both cases an error message is written to stderr to indicate what happened, and an unsuccessful return value is returned (2: NO_MORE_RECS).
+
+ The resulting file will still be sorted by coordinate.
+
+
+ == Skip Clipping Coordinate Sorted Files When Out of Records (<code>--poolSkipClip</code>) ==
+
+ When clipping coordinate sorted SAM/BAM files, we can run out of the records available in the pool (<code>--poolSize</code>).
+
+ When we run out of pooled records, we can no longer read in new records, so instead we release some of the stored records.  We do this by writing out the first record that is being held awaiting its mate.
− If the poolSize is exhausted, the code will write the earliest record awaiting its overlapping mate and any previous records that are being buffered.  This record and its mate will NOT be clipped since it cannot be held onto any longer.  An error message is written to stderr to indicate that this happened.
+ This record can either be:
+ * Clipped starting at its mate's start position through the end of the read (DEFAULT)
+ * Left as is with no clipping, leaving the mates overlapping (specify <code>--poolSkipClip</code>)
− The resulting file will still be sorted by coordinate, but not all overlapping mates will have been clipped.
+ With either option, the resulting file will still be sorted by coordinate.
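The pool behaviour described in these sections can be sketched as a toy Python model. Everything here is an assumption for illustration: the `Read` record (1-based start, ungapped alignment, clipping expressed as soft clipping in a CIGAR string) and the functions `clip_at_mate_start` and `clip_coordinate_sorted` are hypothetical, not bamUtil code. When a mate arrives normally, the sketch simply clips the earlier mate at the later mate's start; the text does not say how the real tool picks which read to clip in that case. What does follow the text: records are buffered in coordinate order, the pool never holds more than `pool_size` records, and when it is full the earliest record awaiting its mate is written (clipped from the mate's start by default, or left as is when `skip_clip`, standing in for `--poolSkipClip`, is true) and the status becomes 2 (NO_MORE_RECS).

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Iterable, List, Set, Tuple

SUCCESS = 0
NO_MORE_RECS = 2  # unsuccessful return value named in the text


@dataclass
class Read:
    """Hypothetical minimal record: 1-based start, ungapped alignment."""
    name: str
    pos: int          # leftmost aligned position
    length: int       # aligned length (assumes a plain "<n>M" CIGAR)
    mate_pos: int     # leftmost aligned position of the mate
    clipped: int = 0  # bases soft-clipped from the right end

    @property
    def end(self) -> int:
        return self.pos + self.length - 1

    @property
    def cigar(self) -> str:
        kept = self.length - self.clipped
        if self.clipped == 0:
            return f"{self.length}M"
        return f"{kept}M{self.clipped}S" if kept else f"{self.clipped}S"


def clip_at_mate_start(read: Read) -> None:
    """Soft-clip from the mate's start position through the end of the read."""
    if read.pos <= read.mate_pos <= read.end:
        read.clipped = read.end - read.mate_pos + 1


def clip_coordinate_sorted(reads: Iterable[Read], pool_size: int,
                           skip_clip: bool = False) -> Tuple[List[Read], int]:
    if pool_size < 1:
        raise ValueError("pool_size must be at least 1")
    out: List[Read] = []
    buffer: Deque[Read] = deque()  # pooled records, still in coordinate order
    waiting: Set[str] = set()      # names whose first mate is held for its mate
    released: Set[str] = set()     # names whose first mate was forced out early
    status = SUCCESS

    def drain_ready() -> None:
        # Write from the front until reaching a record that still awaits its mate.
        while buffer and buffer[0].name not in waiting:
            out.append(buffer.popleft())

    for read in reads:
        if len(buffer) >= pool_size:
            # Pool exhausted. drain_ready() already wrote everything before the
            # front record, so the front is the earliest record awaiting its mate.
            status = NO_MORE_RECS
            first = buffer.popleft()
            waiting.discard(first.name)
            released.add(first.name)
            if not skip_clip:
                clip_at_mate_start(first)  # default; skip_clip leaves it as is
            out.append(first)
            drain_ready()

        if read.name in waiting:
            # Mate arrived. Simplification: clip the earlier mate at this start.
            first = next(r for r in buffer if r.name == read.name)
            clip_at_mate_start(first)
            waiting.discard(read.name)
        elif read.name in released:
            released.discard(read.name)  # its mate was already written
        elif read.pos <= read.mate_pos <= read.end:
            waiting.add(read.name)       # overlapping mate starts later: hold this read
        buffer.append(read)
        drain_ready()

    # End of input: records whose mates never arrived are written unchanged.
    out.extend(buffer)
    return out, status


def make_reads() -> List[Read]:
    return [Read("a", 100, 50, 130), Read("b", 105, 50, 300),
            Read("c", 110, 50, 120), Read("c", 120, 50, 110),
            Read("a", 130, 50, 100)]


for skip in (False, True):
    result, code = clip_coordinate_sorted(make_reads(), pool_size=2, skip_clip=skip)
    print(skip, code, [(r.name, r.pos, r.cigar) for r in result])
```

With `pool_size=2`, reading the third record forces pair `a` out early. In default mode its first mate becomes `30M20S` (clipped from position 130 onward); with `skip_clip=True` it stays `50M`. In both runs the output positions stay in ascending order, matching the statement that the file remains sorted by coordinate.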
     
