Changes

From Genome Analysis Wiki
271 bytes added, 22:31, 8 April 2019
Line 10: Line 10:  
NOTE: This tool does not work properly on templates that have more than 2 segments. It does not properly match reads when more than 2 reads share the same read name.
    +
NOTE: Dedup cannot read from stdin since it reads the input file twice.
    
Potential future features:
Line 38: Line 39:  
Rules:
* Skip unmapped reads; they are not marked as duplicate
 +
* Reads whose mate is unmapped are treated as single-end
 
* Mark a Single-End Read Duplicate (or remove it if configured to do so) if:
*# A paired-end record has the same key (even if the pair is not proper/the mate is unmapped/the mate is not found)<br/>-OR-
+
*# A paired-end record has the same key (even if the pair is not proper/the mate is not found)<br/>-OR-
 
*# A single-end record has the same key and a higher base quality sum (sum of all base qualities in the record above [[#Minimum Quality for Quality Calculations (--minQual)|<code>--minBaseQual</code>]])
 
* Mark both Paired-End Reads Duplicate if:
Line 45: Line 47:  
   
 
   
 
This code assumes that at most 1000 bases are clipped at the start of a read.
 +
 +
 +
Deduping requires two passes through the file, so it cannot read from stdin.
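The two-pass constraint can be illustrated with a minimal Python sketch. It is not Dedup's implementation: the <code>key qual</code> line layout and the function names are hypothetical, chosen only to show why the input has to be rewound between passes.

```python
import io


def best_quality_per_key(lines):
    """Pass 1: remember the highest quality sum seen for each key."""
    best = {}
    for line in lines:
        key, qual = line.split()
        best[key] = max(best.get(key, -1), int(qual))
    return best


def mark_duplicates(stream):
    """Two passes over a seekable stream of hypothetical 'key qual' lines.

    Returns a list of (key, qual, is_duplicate) tuples in input order.
    """
    if not stream.seekable():
        # A pipe such as stdin cannot be rewound for the second pass.
        raise ValueError("input must be a seekable file, not a pipe")
    best = best_quality_per_key(stream)
    stream.seek(0)  # rewind before pass 2
    kept = set()
    result = []
    for line in stream:
        key, qual = line.split()
        # Duplicate if a better record exists, or an equally good one was already kept.
        is_dup = int(qual) < best[key] or key in kept
        if not is_dup:
            kept.add(key)
        result.append((key, int(qual), is_dup))
    return result


records = io.StringIO("a 10\na 30\nb 5\na 30\n")
marked = mark_duplicates(records)
```

In this sketch the first pass decides which record wins for each key and the second pass emits every record with its flag, which is only possible when the stream supports <code>seek</code>.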
    
==Handling Recalibration==
Line 88: Line 93:  
                  and exit when trying to run on an already mark-duplicated BAM
--excludeFlags <flag>    : exclude reads with any of these flags set when determining or marking duplicates
                          by default (0x304): exclude unmapped, secondary reads, and QC failures
+
                          by default (0xB04): exclude unmapped, secondary reads, QC failures, and supplementary reads
 
--verbose      : Turn on verbose mode
--noeof        : Do not expect an EOF block on a bam file.
Line 99: Line 104:     
== Required Parameters ==
{{inBAMInputFile}}
+
{{inBAMInputFile|noStdin=1}}
    
Note: The input file must be sorted by coordinate.
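As a rough illustration of the sort requirement, the following Python sketch checks that a sequence of records is in coordinate order. The <code>(ref_index, pos)</code> tuple layout and the function name are assumptions made for the example, and it considers mapped records only; it does not read BAM files and is not part of Dedup.

```python
def is_coordinate_sorted(records):
    """Return True if (ref_index, pos) tuples never decrease.

    Hypothetical layout: ref_index is the reference's position in the
    header, pos is the leftmost mapped position. Mapped records only.
    """
    previous = None
    for current in records:
        if previous is not None and current < previous:
            return False
        previous = current
    return True


in_order = is_coordinate_sorted([(0, 100), (0, 100), (0, 250), (1, 5)])
out_of_order = is_coordinate_sorted([(0, 100), (1, 5), (0, 250)])
```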
Line 130: Line 135:     
=== Skip Records with any of the Specified Flags (<code>--excludeFlags</code>)===
Skip records with any of the specified flags set, default 0x304
+
Skip records with any of the specified flags set, default 0xB04
    
By default skips reads with any of the following flags set:
Line 136: Line 141:  
* secondary alignment
* fails QC checks
 +
* supplementary reads
   −
This parameter was added in version 1.0.10.
+
Secondary (0x100) and Supplementary (0x800) reads currently must be excluded.
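
The default mask can be checked by combining the individual SAM flag bits. The short Python sketch below is illustrative only: the constant and function names are made up for this example, while the bit values are those defined by the SAM specification.

```python
# SAM flag bits for the categories listed above (values from the SAM specification).
UNMAPPED = 0x4
SECONDARY = 0x100
QC_FAIL = 0x200
SUPPLEMENTARY = 0x800

OLD_DEFAULT = UNMAPPED | SECONDARY | QC_FAIL           # 0x304
NEW_DEFAULT = OLD_DEFAULT | SUPPLEMENTARY              # 0xB04


def is_excluded(flag, exclude_flags=NEW_DEFAULT):
    """Return True if the record's flag has any of the excluded bits set."""
    return (flag & exclude_flags) != 0


# A supplementary alignment is skipped under the new default but not the old one.
supp_new = is_excluded(0x800)
supp_old = is_excluded(0x800, OLD_DEFAULT)
```

Because the test is "any bit set", a record is skipped as soon as one of the listed conditions applies.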
    
=== Turn on Verbose Mode (<code>--verbose</code>) ===
