Changes

From Genome Analysis Wiki
Line 6: Line 6:  
The <code>clipOverlap</code> option on the [[bamUtil]] executable clips overlapping read pairs.
 
The <code>clipOverlap</code> option on the [[bamUtil]] executable clips overlapping read pairs.
   −
The input file and resulting output file is sorted by coordinate (or readName if specified in the options).
+
The input file and resulting output file are sorted by coordinate (or readName if specified in the options).
    
When a read is clipped from the front:
 
When a read is clipped from the front:
   −
* the read start position is updated to reflect the clipping
+
* the read start position is updated to reflect the clipping.
 
* the mate's mate start position is updated to reflect the record's new position.
 
* the mate's mate start position is updated to reflect the record's new position.
 
* the record is placed in the output file in the correct location based on the updated position.
 
* the record is placed in the output file in the correct location based on the updated position.
   −
To handle coordinate sorted files, SAM/BAM records are buffered up until it is known that all following records will have a later start position.  To prevent the program from running away with memory, a limit is set to the number of records that can be buffered, see [[#Set the SAM/BAMs record buffer size (--poolSize)|<code>--poolSize</code>]] for more information.
+
To handle coordinate-sorted files, SAM/BAM records are buffered up until it is known that all following records will have a later start position.  To prevent the program from running away with memory, a limit is set to the number of records that can be buffered, see [[#Set the SAM/BAMs record buffer size (--poolSize)|<code>--poolSize</code>]] for more information.
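The buffering described above can be sketched roughly as follows. This is an illustrative sketch only, not bamUtil's implementation: the function name <code>reorder_after_clipping</code>, the <code>(orig_start, new_start, name)</code> tuples, and the behavior when the pool fills up (here the earliest buffered record is simply written out) are all assumptions made for the example.

```python
import heapq

def reorder_after_clipping(records, pool_size=1000000):
    """Return record names in coordinate order after front clipping.

    records: iterable of (orig_start, new_start, name), sorted by orig_start,
             where new_start >= orig_start (front clipping only moves a
             start position later). This layout is hypothetical.
    pool_size: maximum number of records held in the buffer (assumed default).
    """
    heap = []    # min-heap keyed on the updated start position
    output = []
    for seq, (orig_start, new_start, name) in enumerate(records):
        # Every later input record starts at or after orig_start, so any
        # buffered record at or before that position is safe to write.
        while heap and heap[0][0] <= orig_start:
            output.append(heapq.heappop(heap)[2])
        # seq breaks ties so equal positions keep their input order.
        heapq.heappush(heap, (new_start, seq, name))
        # Assumption: when the buffer is over its limit, write the earliest
        # record anyway. The real tool may handle this case differently.
        if len(heap) > pool_size:
            output.append(heapq.heappop(heap)[2])
    # End of input: flush whatever is still buffered, in position order.
    while heap:
        output.append(heapq.heappop(heap)[2])
    return output

example = [
    (100, 100, "a"),
    (105, 130, "b"),  # clipped from the front: start moves from 105 to 130
    (110, 110, "c"),
    (120, 120, "d"),
    (140, 140, "e"),
]
ordered = reorder_after_clipping(example)
```

In this example record <code>b</code> is held back until records <code>c</code> and <code>d</code> have been written, because its updated start position now falls after theirs.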
   −
When two mates overlap, this tool will clip the record's whose clipped region would has the lowest average quality.
+
When two mates overlap, this tool will clip the record's whose clipped region would have the lowest average quality.
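As a rough illustration of the quality comparison, the sketch below picks which mate to clip from the average base quality of each mate's overlapping region. It is a simplified sketch under stated assumptions, not the tool's code: the function name <code>choose_mate_to_clip</code> is hypothetical, the overlap is assumed to be the last <code>overlap_len</code> bases of the first mate and the first <code>overlap_len</code> bases of the second mate (no indels or existing clips), and the tie-breaking choice is arbitrary.

```python
def choose_mate_to_clip(first_quals, second_quals, overlap_len):
    """Return "first" or "second": the mate whose overlapping region has
    the lower average base quality.

    first_quals / second_quals: lists of integer base qualities.
    overlap_len: number of overlapping bases (simplified, no indels).
    """
    if overlap_len <= 0:
        raise ValueError("overlap_len must be positive")
    if overlap_len > len(first_quals) or overlap_len > len(second_quals):
        raise ValueError("overlap_len exceeds a read length")

    # Assumption: the first mate overlaps at its end, the second at its start.
    first_region = first_quals[-overlap_len:]
    second_region = second_quals[:overlap_len]

    first_avg = sum(first_region) / overlap_len
    second_avg = sum(second_region) / overlap_len

    # Tie-breaking toward the second mate is arbitrary in this sketch.
    return "first" if first_avg < second_avg else "second"

clipped = choose_mate_to_clip([30, 30, 10, 10], [40, 40, 30, 30], 2)
```

Here the first mate's overlapping bases average 10 and the second mate's average 40, so the first mate is the one selected for clipping. The quality values themselves are not modified.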
   −
It also checks strand. If a forward strand extends past the end of a reverse strand, that will be clipped.  Similarly, if a reverse strand starts before the forward strand, the region prior to the forward strand will be clipped. If the reverse strand occurs entirely before the forward strand, both strands will be entirely clipped.  If the [[#Mark entirely clipped reads as unmapped (--unmapped)|<code>--unmapped</code>]] option is specified rather than clipping an entire read, it will be marked as unmapped.
+
It also checks strand. If a forward strand extends past the end of a reverse strand, that will be clipped.  Similarly, if a reverse strand starts before the forward strand, the region prior to the forward strand will be clipped. If the reverse strand occurs entirely before the forward strand, both strands will be entirely clipped.  If the [[#Mark entirely clipped reads as unmapped (--unmapped)|<code>--unmapped</code>]] option is specified, then rather than clipping an entire read, it will be marked as unmapped.
    
The qualities on the two strands remain unchanged even with clipping.
 
The qualities on the two strands remain unchanged even with clipping.
Line 28: Line 28:  
**It matches in pairs, so if there are 3, the first 2 will be matched and compared, but the 3rd won't.  If there are 4, the first 2 will be matched and the last 2 will be matched and compared.
 
**It matches in pairs, so if there are 3, the first 2 will be matched and compared, but the 3rd won't.  If there are 4, the first 2 will be matched and the last 2 will be matched and compared.
 
*Only mapped reads will be clipped
 
*Only mapped reads will be clipped
   −
*Mate information in records are accurate
+
*Assumes that mate information in records are accurate
    
= Rules for Clipping =
 
= Rules for Clipping =