From Genome Analysis Wiki
85 bytes added
, 22:31, 8 April 2019
Line 10: |
Line 10: |
| NOTE: This tool does not work properly on templates that have more than 2 segments. It does not match reads properly when more than 2 reads have the same read name. | | NOTE: This tool does not work properly on templates that have more than 2 segments. It does not match reads properly when more than 2 reads have the same read name. |
| | | |
| + | NOTE: Dedup cannot read from stdin since it reads the input file twice. |
| | | |
| Potential future features: | | Potential future features: |
Line 38: |
Line 39: |
| Rules: | | Rules: |
| * Skip Unmapped Reads, they are not marked as duplicate | | * Skip Unmapped Reads, they are not marked as duplicate |
| + | * Reads whose mate is unmapped are treated as single-end |
| * Mark a Single-End Read Duplicate (or remove it if configured to do so) if: | | * Mark a Single-End Read Duplicate (or remove it if configured to do so) if: |
− | *# A paired-end record has the same key (even if the pair is not proper/the mate is unmapped/the mate is not found)<br/>-OR- | + | *# A paired-end record has the same key (even if the pair is not proper/the mate is not found)<br/>-OR- |
| *# A single-end record has the same key and a higher base quality sum (sum of all base qualities in the record above [[#Minimum Quality for Quality Calculations (--minQual)|<code>--minQual</code>]]) | | *# A single-end record has the same key and a higher base quality sum (sum of all base qualities in the record above [[#Minimum Quality for Quality Calculations (--minQual)|<code>--minQual</code>]]) |
| * Mark both Paired-End Reads Duplicate if: | | * Mark both Paired-End Reads Duplicate if: |
Line 45: |
Line 47: |
| | | |
| This code assumes that at most 1000 bases are clipped at the start of a read. | | This code assumes that at most 1000 bases are clipped at the start of a read. |
| + | |
| + | |
| + | Deduping requires two passes through the file, so it cannot read from stdin. |
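The two-pass requirement can be illustrated with a minimal Python sketch. This is not bamUtil's implementation: the record layout `(name, key, quals)`, the function names, the threshold of 15, reading "above" as "at or above", and keeping the first record on a tie are all assumptions made for illustration. The first pass finds the highest base quality sum for each key, and the second pass marks every other record with that key as a duplicate, which is why the input must be readable twice.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical record layout for illustration: (read name, dedup key, base qualities).
Record = Tuple[str, str, List[int]]


def base_quality_sum(quals: List[int], min_qual: int) -> int:
    """Sum the base qualities that are at or above min_qual (assumed reading of 'above')."""
    return sum(q for q in quals if q >= min_qual)


def mark_single_end_duplicates(
    open_records: Callable[[], Iterable[Record]], min_qual: int = 15
) -> List[Tuple[str, bool]]:
    """Return (read name, is_duplicate) for every record, in input order."""
    # Pass 1: remember the first record with the highest quality sum for each key.
    best: Dict[str, Tuple[int, int]] = {}
    for index, (_, key, quals) in enumerate(open_records()):
        score = base_quality_sum(quals, min_qual)
        if key not in best or score > best[key][0]:
            best[key] = (score, index)

    # Pass 2: re-read the input; any record that is not the best for its key is a duplicate.
    return [
        (name, best[key][1] != index)
        for index, (name, key, _) in enumerate(open_records())
    ]


# Made-up records: readA and readB share a key, readC has its own.
records = [
    ("readA", "chr1:100:+", [30, 30, 10]),
    ("readB", "chr1:100:+", [40, 40, 40]),
    ("readC", "chr1:250:-", [20, 20, 20]),
]
result = mark_single_end_duplicates(lambda: iter(records))
```

The input is passed as a callable that reopens the records because the function iterates over them twice. A stream that can only be consumed once, such as a pipe, would be empty on the second pass.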
| | | |
| ==Handling Recalibration== | | ==Handling Recalibration== |
Line 99: |
Line 104: |
| | | |
| == Required Parameters == | | == Required Parameters == |
− | {{inBAMInputFile}} | + | {{inBAMInputFile|noStdin=1}} |
| | | |
| Note: The input file must be sorted by coordinate. | | Note: The input file must be sorted by coordinate. |
Line 138: |
Line 143: |
| * supplementary reads | | * supplementary reads |
| | | |
− | The Deduper will not work if secondary or supplementary reads are not excluded. It will not properly find the mates since there will be more than 2 reads with the same mate information. | + | Secondary (0x100) and Supplementary (0x800) reads currently must be excluded. |
| | | |
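As a rough illustration of what excluding these two flag bits means, the following Python sketch tests a record's SAM flag against a combined mask. The names `EXCLUDE_MASK` and `is_excluded` and the sample flag values are hypothetical and are not part of the tool.

```python
SECONDARY = 0x100      # SAM flag bit: secondary alignment
SUPPLEMENTARY = 0x800  # SAM flag bit: supplementary alignment
EXCLUDE_MASK = SECONDARY | SUPPLEMENTARY  # 0x900


def is_excluded(flag: int, exclude_mask: int = EXCLUDE_MASK) -> bool:
    """Return True if any bit of exclude_mask is set in the record's SAM flag."""
    return (flag & exclude_mask) != 0


# Made-up flag values: two primary paired reads, one secondary, one supplementary.
flags = [99, 147, 355, 2145]
kept = [f for f in flags if not is_excluded(f)]
```

Here `kept` contains only the flags with neither bit set, so each read name is left with at most its two primary records.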
| === Turn on Verbose Mode (<code>--verbose</code>) === | | === Turn on Verbose Mode (<code>--verbose</code>) === |