Changes

From Genome Analysis Wiki
Line 6: Line 6:  
The <code>clipOverlap</code> option on the [[bamUtil]] executable clips overlapping read pairs.
 
The input file and resulting output file are sorted by coordinate (or readName if specified in the options).
    
When a read is clipped from the front:
 
* the read start position is updated to reflect the clipping.
* the mate start position stored in the mate's record is updated to reflect this record's new position.
* the record is placed in the output file in the correct location based on the updated position.

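
The position bookkeeping above can be sketched in Python. This is a minimal illustration, not bamUtil's implementation: the <code>Record</code> class and <code>clip_front</code> function are hypothetical, and the sketch assumes soft clipping of a read whose CIGAR is a single match operation (e.g. <code>100M</code>), so that each clipped base moves the start by exactly one reference position.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical minimal stand-in for a SAM record."""
    name: str
    pos: int       # 1-based leftmost mapped position (SAM POS)
    mate_pos: int  # 1-based position of the mate (SAM PNEXT)
    cigar: str     # assumed to be a single match operation, e.g. "100M"


def clip_front(record: Record, mate: Record, n_bases: int) -> None:
    """Soft-clip n_bases from the front of an all-match alignment."""
    if not (record.cigar.endswith("M") and record.cigar[:-1].isdigit()):
        raise ValueError("sketch only handles CIGARs like '100M'")
    length = int(record.cigar[:-1])
    if not 0 < n_bases < length:
        raise ValueError("n_bases must leave at least one aligned base")
    record.cigar = f"{n_bases}S{length - n_bases}M"
    # The read now starts n_bases further along the reference.
    record.pos += n_bases
    # Keep the mate's copy of this record's position in sync.
    mate.mate_pos = record.pos


read = Record("read1", pos=1000, mate_pos=1050, cigar="100M")
mate = Record("read1", pos=1050, mate_pos=1000, cigar="100M")
clip_front(read, mate, 20)
print(read.cigar, read.pos, mate.mate_pos)  # 20S80M 1020 1020
```

Because the start moves later, the record may belong further down a coordinate-sorted output than where it was read, which is why the output placement step is needed.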
To handle coordinate-sorted files, SAM/BAM records are buffered until it is known that all following records will have a later start position. To keep memory use bounded, a limit is set on the number of records that can be buffered; see [[#Set the SAM/BAMs record buffer size (--poolSize)|<code>--poolSize</code>]] for more information.

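
The buffering idea can be sketched with a min-heap. The sketch assumes that clipping only ever moves a record's start position later (clipping from the front), never earlier, so once the input has reached a given original start position, no later record can start before it and any buffered record at or before that position can be written. The <code>reorder</code> function and its behavior when the pool is full (writing the earliest buffered record early) are hypothetical, not a description of how <code>--poolSize</code> is actually enforced.

```python
import heapq
from itertools import count
from typing import Iterable, Iterator, List, Tuple


def reorder(stream: Iterable[Tuple[int, int, str]],
            pool_size: int) -> Iterator[Tuple[int, str]]:
    """Re-sort records whose start may have moved later after clipping.

    stream yields (original_pos, clipped_pos, name), sorted by original_pos.
    Yields (clipped_pos, name) in coordinate order.
    """
    heap: List[Tuple[int, int, str]] = []
    arrival = count()  # tie-breaker keeps input order for equal positions
    for orig_pos, clipped_pos, name in stream:
        # Every record still to come starts at or after orig_pos,
        # so buffered records at or before it are safe to write.
        while heap and heap[0][0] <= orig_pos:
            pos, _, n = heapq.heappop(heap)
            yield pos, n
        heapq.heappush(heap, (clipped_pos, next(arrival), name))
        # Hypothetical overflow policy: write the earliest record early.
        if len(heap) > pool_size:
            pos, _, n = heapq.heappop(heap)
            yield pos, n
    while heap:  # end of input: flush everything in order
        pos, _, n = heapq.heappop(heap)
        yield pos, n


records = [(100, 130, "a"), (110, 110, "b"), (120, 120, "c"), (140, 140, "d")]
print(list(reorder(records, pool_size=10)))
# [(110, 'b'), (120, 'c'), (130, 'a'), (140, 'd')]
```

Record "a" was clipped by 30 bases, so it is held back until the input passes position 130. The heap keeps the buffered records ordered by their updated start, so each flush only looks at the smallest entry.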
When two mates overlap, this tool will clip the record whose clipped region would have the lower average quality.

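
A minimal sketch of that choice, assuming Phred+33 quality strings (the SAM encoding) and that the caller has already extracted the quality characters each read has within the overlap. The function names <code>mean_phred</code> and <code>read_to_clip</code> are hypothetical, and breaking ties toward the second read is an arbitrary choice of this sketch.

```python
def mean_phred(quals: str, offset: int = 33) -> float:
    """Average Phred score of a SAM quality string (Phred+33 assumed)."""
    return sum(ord(c) - offset for c in quals) / len(quals)


def read_to_clip(overlap_quals_1: str, overlap_quals_2: str) -> int:
    """Return 1 or 2: the read whose overlapping bases have the lower mean quality.

    Ties go to read 2 (an arbitrary choice of this sketch).
    """
    if mean_phred(overlap_quals_1) < mean_phred(overlap_quals_2):
        return 1
    return 2


print(read_to_clip("IIII", "####"))  # 2: Q40 vs Q2, so read 2 is clipped
print(read_to_clip("+++", "555"))    # 1: Q10 vs Q20, so read 1 is clipped
```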
It also checks strand orientation. If the forward-strand read extends past the end of the reverse-strand read, the part that extends past it is clipped. Similarly, if the reverse-strand read starts before the forward-strand read, the region before the start of the forward-strand read is clipped. If the reverse-strand read occurs entirely before the forward-strand read, both reads are entirely clipped. If the [[#Mark entirely clipped reads as unmapped (--unmapped)|<code>--unmapped</code>]] option is specified, then rather than clipping an entire read, the tool marks it as unmapped.
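
The strand checks can be sketched on 1-based, inclusive reference intervals. This is an illustration under simplifying assumptions: the <code>strand_clips</code> function and <code>StrandClip</code> type are hypothetical, and the result counts reference positions rather than read bases (the two differ when the clipped region contains insertions or deletions).

```python
from typing import NamedTuple


class StrandClip(NamedTuple):
    fwd_back: int   # reference positions to clip from the end of the forward read
    rev_front: int  # reference positions to clip from the start of the reverse read
    clip_both_entirely: bool


def strand_clips(fwd_start: int, fwd_end: int,
                 rev_start: int, rev_end: int) -> StrandClip:
    """Apply the strand rules to 1-based, inclusive reference intervals."""
    if rev_end < fwd_start:
        # The reverse read lies entirely before the forward read.
        return StrandClip(0, 0, True)
    # Part of the forward read past the end of the reverse read.
    fwd_back = max(0, fwd_end - rev_end)
    # Part of the reverse read before the start of the forward read.
    rev_front = max(0, fwd_start - rev_start)
    return StrandClip(fwd_back, rev_front, False)


print(strand_clips(100, 199, 150, 249))  # normal overlap: nothing to clip here
print(strand_clips(100, 260, 90, 249))   # forward overhangs by 11, reverse by 10
print(strand_clips(300, 399, 100, 199))  # reverse entirely before forward
```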
    
The qualities on the two strands remain unchanged even with clipping.
 
Line 25: Line 25:     
*Assumes the file is sorted by coordinate (or by read name if using the <code>--readName</code> option)
*Assumes only 2 reads have matching read names (supplementary and secondary reads are skipped by default, so they will not cause a problem)
**It matches in pairs, so if there are 3, the first 2 will be matched and compared, but the 3rd won't. If there are 4, the first 2 will be matched and compared, and then the last 2 will be matched and compared.
*Only mapped reads will be clipped
*Assumes that mate information in the records is accurate
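
The pairing behavior described in these assumptions can be sketched as follows. The <code>match_pairs</code> function is hypothetical; it only illustrates matching records two at a time by read name in input order, so a third record with the same name stays unmatched unless a fourth one arrives.

```python
from typing import Dict, Iterable, List, Tuple


def match_pairs(names: Iterable[str]) -> List[Tuple[int, int]]:
    """Pair record indices by read name, two at a time, in input order."""
    pending: Dict[str, int] = {}  # read name -> index of the unmatched record
    pairs: List[Tuple[int, int]] = []
    for i, name in enumerate(names):
        if name in pending:
            # Second record with this name: close the pair.
            pairs.append((pending.pop(name), i))
        else:
            # First record (or third, fifth, ...): wait for its mate.
            pending[name] = i
    return pairs


print(match_pairs(["r1", "r1", "r1"]))        # [(0, 1)]; the 3rd is not compared
print(match_pairs(["r1", "r1", "r1", "r1"]))  # [(0, 1), (2, 3)]
```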
    
= Rules for Clipping =
 
Line 97: Line 97:  
--stats        : Print some statistics on the overlaps.
 
--overlapsOnly : Only output overlapping read pairs
 
--excludeFlags : Skip records with any of the specified flags set, default 0xF0C
 
                 --unmapped    : Mark records that would be completely clipped as unmapped
 
  --noeof        : Do not expect an EOF block on a bam file.
 
Line 154: Line 154:     
=== Skip Records with any of the Specified Flags (<code>--excludeFlags</code>)===
 
Skip records with any of the specified flags set (default: 0xF0C).
    
By default, records with any of the following flags set are skipped:
 
Line 162: Line 162:  
* fails QC checks
* duplicate
* supplementary
This parameter was added in version 1.0.10.
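
The default mask can be decoded with the FLAG bit values from the SAM specification (the names below are abbreviated). This is an illustrative sketch; <code>decode</code> and <code>is_excluded</code> are hypothetical helpers. It also shows that 0xF0C is 0x70C with the supplementary bit (0x800) added.

```python
from typing import List

# FLAG bit values from the SAM specification (names abbreviated).
FLAG_NAMES = {
    0x1: "paired", 0x2: "proper pair", 0x4: "unmapped",
    0x8: "mate unmapped", 0x10: "reverse strand", 0x20: "mate reverse strand",
    0x40: "first in pair", 0x80: "second in pair", 0x100: "secondary",
    0x200: "fails QC checks", 0x400: "duplicate", 0x800: "supplementary",
}


def decode(mask: int) -> List[str]:
    """Names of the flag bits set in mask, lowest bit first."""
    return [name for bit, name in sorted(FLAG_NAMES.items()) if mask & bit]


def is_excluded(flag: int, exclude_mask: int = 0xF0C) -> bool:
    """True if the record's FLAG shares any bit with the exclude mask."""
    return (flag & exclude_mask) != 0


print(decode(0xF0C))
# ['unmapped', 'mate unmapped', 'secondary', 'fails QC checks', 'duplicate', 'supplementary']
print(decode(0xF0C & ~0x70C))           # ['supplementary']
print(is_excluded(0x1 | 0x40 | 0x800))  # True: supplementary
print(is_excluded(0x1 | 0x2 | 0x40))    # False
```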
      
=== Mark entirely clipped reads as unmapped (<code>--unmapped</code>)===
 
