Changes

85 bytes added, 22:31, 8 April 2019
Line 10: Line 10:  
NOTE: This tool does not properly work on templates that have more than 2 segments.  It does not properly match reads when more than 2 reads have the same read name.
 
NOTE: This tool does not properly work on templates that have more than 2 segments.  It does not properly match reads when more than 2 reads have the same read name.
    +
NOTE: Dedup cannot read from stdin since it reads the input file twice.
    
Potential future features:
 
Potential future features:
Line 38: Line 39:  
Rules:
 
Rules:
 
* Skip Unmapped Reads, they are not marked as duplicate
 
* Skip Unmapped Reads, they are not marked as duplicate
 +
* Reads whose mate is unmapped are treated as single-end
 
* Mark a Single-End Read Duplicate (or remove it if configured to do so) if:
 
* Mark a Single-End Read Duplicate (or remove it if configured to do so) if:
*# A paired-end record has the same key (even if the pair is not proper/the mate is unmapped/the mate is not found)<br/>-OR-
+
*# A paired-end record has the same key (even if the pair is not proper/the mate is not found)<br/>-OR-
 
*# A single-end record has the same key and a higher base quality sum (sum of all base qualities in the record above [[#Minimum Quality for Quality Calculations (--minQual)|<code>--minBaseQual</code>]])
 
*# A single-end record has the same key and a higher base quality sum (sum of all base qualities in the record above [[#Minimum Quality for Quality Calculations (--minQual)|<code>--minBaseQual</code>]])
 
* Mark both Paired-End Reads Duplicate if:
 
* Mark both Paired-End Reads Duplicate if:
Line 45: Line 47:  
   
 
   
 
This code assumes that at most 1000 bases are clipped at the start of a read.
 
This code assumes that at most 1000 bases are clipped at the start of a read.
 +
 +
 +
Deduping requires two passes through the file, so cannot read from stdin.
    
==Handling Recalibration==
 
==Handling Recalibration==
Line 99: Line 104:     
== Required Parameters ==
 
== Required Parameters ==
{{inBAMInputFile}}
+
{{inBAMInputFile|noStdin=1}}
    
Note: The input file must be sorted by coordinate.
 
Note: The input file must be sorted by coordinate.
Line 138: Line 143:  
* supplementary reads
 
* supplementary reads
   −
The Deduper will not work if secondary or supplementary reads are not excluded.  It will not properly find the mates since there will be more than 2 reads with the same mate information.
+
Secondary (0x100) and Supplementary (0x800) reads currently must be excluded.
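
As an illustration of the two flag bits named above, the following minimal Python sketch tests a SAM flag value for the secondary (0x100) and supplementary (0x800) bits. It is not taken from the Deduper source; the helper name <code>must_be_excluded</code> and the sample flag values are hypothetical and only show which records the note refers to.

```python
# SAM flag bits named in the note above.
FLAG_SECONDARY = 0x100      # secondary alignment
FLAG_SUPPLEMENTARY = 0x800  # supplementary alignment


def must_be_excluded(flag: int) -> bool:
    """Return True if the flag marks a secondary or supplementary record.

    Hypothetical helper for illustration only; not part of the tool.
    """
    return bool(flag & (FLAG_SECONDARY | FLAG_SUPPLEMENTARY))


# Sample flag values (assumed for the example):
# 99 and 147 are an ordinary mapped pair, 355 has the secondary bit set,
# 2147 has the supplementary bit set.
flags = [99, 147, 355, 2147]
kept = [f for f in flags if not must_be_excluded(f)]
```

Only the records in <code>kept</code> would be left for duplicate marking under the rule above, so each read name has at most two records.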
    
=== Turn on Verbose Mode (<code>--verbose</code>) ===
 
=== Turn on Verbose Mode (<code>--verbose</code>) ===
