Changes

From Genome Analysis Wiki
3,079 bytes added, 11:58, 15 November 2011
no edit summary
The <code>clipOverlap</code> option on the [[bamUtil]] executable clips overlapping read pairs.
 
The input and output files are sorted by coordinate (or by read name if <code>--readName</code> is specified).

When a read is clipped from the front:
* the read's start position is updated to reflect the clipping;
* the mate's mate start position is updated to reflect the record's new position;
* the record is placed in the output file at the correct location for its updated position.
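The three updates above can be sketched as follows. This is a minimal Python illustration, not bamUtil's code: the `Record` class, its fields and `apply_front_clip` are hypothetical, positions are 1-based as in SAM, and the number of reference bases removed by the clip is assumed to be known already (how it is chosen is governed by the clipping rules, not shown here).

```python
import bisect
from dataclasses import dataclass, field


@dataclass(order=True)
class Record:
    # Hypothetical, simplified SAM record; sorting compares only `pos`.
    pos: int                                # 1-based leftmost mapping position
    name: str = field(compare=False)        # read name
    mate_pos: int = field(compare=False)    # 1-based start position of the mate


def apply_front_clip(record, mate, ref_bases_clipped, buffer):
    """Clip `ref_bases_clipped` reference bases from the front of `record`.

    `buffer` is a list of records kept sorted by start position.
    """
    # 1. The read's start position moves right by the clipped reference bases.
    record.pos += ref_bases_clipped
    # 2. The mate's "mate start position" must point at the record's new start.
    mate.mate_pos = record.pos
    # 3. Insert the record at the location matching its updated position.
    bisect.insort(buffer, record)
```

For example, clipping 15 reference bases from a read that starts at 110 moves it to 125, sets its mate's mate position to 125, and inserts it between buffered records starting at 120 and 130.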
 
To handle coordinate-sorted files, SAM/BAM records are buffered until it is known that all following records will have a later start position. To keep memory use bounded, the number of records that can be buffered is limited; see [[#Set the SAM/BAMs record buffer size (--poolSize)|<code>--poolSize</code>]] for more information.
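One way to express "buffer until all following records start later" is a min-heap keyed on start position. The sketch below is a hypothetical illustration, not bamUtil's implementation: `CoordinateBuffer` is an invented name, records are reduced to `(start, name)`, and the caller is assumed to know the smallest start position any not-yet-written record could still have (the current input position, or the start of the earliest record still awaiting its mate if that is smaller).

```python
import heapq


class CoordinateBuffer:
    """Hypothetical min-heap of records whose final position is known."""

    def __init__(self):
        self._heap = []   # entries: (start, insertion_number, name)
        self._seq = 0     # tie-breaker so equal starts keep insertion order

    def add(self, start, name):
        heapq.heappush(self._heap, (start, self._seq, name))
        self._seq += 1

    def flush_up_to(self, safe_pos):
        """Pop, in coordinate order, every record starting at or before safe_pos."""
        out = []
        while self._heap and self._heap[0][0] <= safe_pos:
            start, _, name = heapq.heappop(self._heap)
            out.append((start, name))
        return out

    def __len__(self):
        return len(self._heap)
```

Because the input is coordinate sorted and clipping from the front only moves a start position to the right, no record written by `flush_up_to` can later be overtaken by a record with an earlier start, so the output stays sorted.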
== ASSUMPTIONS/RESTRICTIONS ==

*Assumes the file is sorted by coordinate (or by read name if using the <code>--readName</code> option).
*Assumes only 2 reads have matching read names.
**Reads are matched in pairs: if 3 reads share a name, the first 2 are matched and compared, but the 3rd is not. If 4 reads share a name, the first 2 are matched and compared, and then the last 2 are matched and compared.
*Only mapped reads will be clipped.
*Assumes the mate information in the records is accurate.
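The pairing behaviour in the nested bullet above can be modelled with a dictionary of read names still waiting for a partner. This is only a model of the documented behaviour: `match_in_pairs` is a hypothetical helper, and it matches on read name alone, ignoring the mate-position checks a real coordinate-sorted run would also apply.

```python
def match_in_pairs(names):
    """Pair records by read name, in input order.

    Returns (pairs, unmatched): pairs are (i, j) index tuples. A name's 1st
    and 2nd occurrences form a pair, its 3rd and 4th form another, and so on;
    a leftover odd occurrence stays unmatched.
    """
    waiting = {}   # read name -> index of the occurrence awaiting a partner
    pairs = []
    for i, name in enumerate(names):
        if name in waiting:
            pairs.append((waiting.pop(name), i))
        else:
            waiting[name] = i
    unmatched = sorted(waiting.values())
    return pairs, unmatched
```

With three records named `r1`, this returns the pair `(0, 1)` and leaves index 2 unmatched; with four, it returns `(0, 1)` and `(2, 3)`.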
    
= Rules for Clipping =

= Usage =

./bam clipOverlap --in <inputFile> --out <outputFile> [--storeOrig <tag>] [--readName] [--poolSize <numRecords allowed to allocate>] [--noeof] [--params]

= Parameters =
<pre>
Required Parameters:
--in          : the SAM/BAM file to clip overlapping read pairs for
--out         : the SAM/BAM file to be written

Optional Parameters:
--storeOrig   : Store the original cigar in the specified tag.
--readName    : Original file is sorted by Read Name instead of coordinate.
--poolSize    : Maximum number of records the program is allowed to allocate
                for clipping on Coordinate sorted files. (Default: 500)
--noeof       : Do not expect an EOF block on a bam file.
--params      : Print the parameter settings
</pre>
{{outBAMOutputFile}}
{{noeofBAMParameter}}
{{paramsParameter}}
== Store the original cigar string in a tag (<code>--storeOrig</code>) ==
Use <code>--storeOrig</code> followed by a two-character tag name to store the original CIGAR string. It is stored under the specified tag with the "Z" (string) type.
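In SAM text form, storing the original CIGAR under a tag amounts to appending one more `TAG:Z:VALUE` optional field. The sketch below shows only that formatting step on a list of optional-field strings; `store_original_cigar` and the example tag `XC` are hypothetical choices for illustration (the SAM specification reserves tags beginning with X, Y or Z for local use).

```python
import re

# SAM specification: a tag name is a letter followed by a letter or digit.
TAG_RE = re.compile(r"[A-Za-z][A-Za-z0-9]")


def store_original_cigar(optional_fields, tag, original_cigar):
    """Append the original CIGAR as a SAM optional field of type Z.

    `optional_fields` is the list of TAG:TYPE:VALUE strings of a SAM text line.
    """
    if not TAG_RE.fullmatch(tag):
        raise ValueError(f"invalid two-character tag: {tag!r}")
    optional_fields.append(f"{tag}:Z:{original_cigar}")
    return optional_fields
```

For a record originally aligned as `100M`, `store_original_cigar(["NM:i:0"], "XC", "100M")` yields `["NM:i:0", "XC:Z:100M"]`.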
== Work on SAM/BAMs sorted by Read Name instead of by coordinate (<code>--readName</code>) ==
If your file is sorted by read name rather than by coordinate, specify <code>--readName</code>. The resulting file will still be sorted by read name.
== Set the SAM/BAMs record buffer size (<code>--poolSize</code>) ==
To handle coordinate-sorted files, SAM/BAM records are buffered until it is known that all following records will have a later start position. To keep memory use bounded, the number of records that can be buffered is limited (default: 500).

If the pool is exhausted, the program writes out the earliest record that is still awaiting its overlapping mate, along with any earlier records in the buffer. That record and its mate are NOT clipped, because the record cannot be held any longer. An error message is written to stderr to indicate that this happened.

The resulting file is still sorted by coordinate, but not all overlapping mates will have been clipped.
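The pool-exhaustion behaviour described above can be modelled as below. This is a hypothetical sketch, not bamUtil's implementation: the buffer is assumed to be a list of `(start, name, awaiting_mate)` tuples sorted by start, `enforce_pool_limit` and the `given_up` set are invented for illustration, and the stderr message text is made up because the real wording is not given on this page.

```python
import sys


def enforce_pool_limit(buffer, pool_size, given_up, err=sys.stderr):
    """Release records while the buffer holds more than `pool_size` entries.

    Returns the records to write now, in coordinate order. Names added to
    `given_up` mark pairs that will be written without clipping, so the
    mate is also left unclipped when it arrives.
    """
    released = []
    while len(buffer) > pool_size:
        # Earliest record still waiting for its overlapping mate.
        idx = next((i for i, rec in enumerate(buffer) if rec[2]), None)
        if idx is None:
            break  # assumption: nothing awaits a mate, so nothing is given up
        given_up.add(buffer[idx][1])
        print(f"pool exhausted; {buffer[idx][1]} will not be clipped", file=err)
        # Write that record plus every earlier buffered record.
        released.extend(buffer[: idx + 1])
        del buffer[: idx + 1]
    return released
```

Because records are only released from the front of a sorted buffer, the written output remains coordinate sorted even when clipping is skipped.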
       
= Return Value =
Returns -1 if the input parameters are invalid; otherwise, returns the SamStatus for the reads/writes.
= Output =
The number of records that are expected to overlap with a mate (based on the mate information) but could not be matched up with that mate (based on mate positions and read names) is printed to stderr after the run has completed.

When processing has been completed, "Completed ClipOverlap." is printed to stderr.
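The end-of-run summary can be sketched as below. The two summary strings match the example output on this page; `expects_overlap` is only a guess at what "expected to overlap based on the mate information" means (mate on the same reference and starting inside this read's aligned span, 1-based and inclusive), and both function names are hypothetical.

```python
import sys


def expects_overlap(pos, end, rnext, pnext):
    """Hypothetical test: the mate is on the same reference ('=') and
    starts within this read's aligned span [pos, end]."""
    return rnext == "=" and pos <= pnext <= end


def summary_lines(num_unmatched):
    """Summary lines in the format shown under Example Output."""
    return [
        f"Failed to find expected overlapping mates for {num_unmatched} records.",
        "Completed ClipOverlap.",
    ]


def report(num_unmatched, err=sys.stderr):
    # Both lines go to stderr once processing has finished.
    for line in summary_lines(num_unmatched):
        print(line, file=err)
```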
== Example Output ==
<pre>
Failed to find expected overlapping mates for 2 records.
Completed ClipOverlap.
</pre>
