Line 10: |
Line 10: |
| NOTE: This tool does not properly work on templates that have more than 2 segments. It does not properly match reads when more than 2 reads have the same read name. | | NOTE: This tool does not properly work on templates that have more than 2 segments. It does not properly match reads when more than 2 reads have the same read name. |
| | | |
| + | NOTE: Dedup cannot read from stdin since it reads the input file twice. |
| | | |
| Potential future features: | | Potential future features: |
Line 38: |
Line 39: |
| Rules: | | Rules: |
| * Skip Unmapped Reads, they are not marked as duplicate | | * Skip Unmapped Reads, they are not marked as duplicate |
| + | * Reads whose mate is unmapped are treated as single-end |
| * Mark a Single-End Read Duplicate (or remove it if configured to do so) if: | | * Mark a Single-End Read Duplicate (or remove it if configured to do so) if: |
− | *# A paired-end record has the same key (even if the pair is not proper/the mate is unmapped/the mate is not found)<br/>-OR- | + | *# A paired-end record has the same key (even if the pair is not proper/the mate is not found)<br/>-OR- |
| *# A single-end record has the same key and a higher base quality sum (sum of all base qualities in the record above [[#Minimum Quality for Quality Calculations (--minQual)|<code>--minQual</code>]]) | | *# A single-end record has the same key and a higher base quality sum (sum of all base qualities in the record above [[#Minimum Quality for Quality Calculations (--minQual)|<code>--minQual</code>]]) |
| * Mark both Paired-End Reads Duplicate if: | | * Mark both Paired-End Reads Duplicate if: |
Line 45: |
Line 47: |
| | | |
| This code assumes that at most 1000 bases are clipped at the start of a read. | | This code assumes that at most 1000 bases are clipped at the start of a read. |
| + | |
| + | |
| + | Deduping requires two passes through the file, so it cannot read from stdin. |
| | | |
| ==Handling Recalibration== | | ==Handling Recalibration== |
Line 70: |
Line 75: |
| | | |
| = Usage = | | = Usage = |
− | ./bam dedup --in <InputBamFile> --out <OutputBamFile> [--minQual <minPhred>] [--log <logFile>] [--oneChrom] [--rmDups] [--force] [--verbose] [--noeof] [--params] [--recab] | + | ./bam dedup --in <InputBamFile> --out <OutputBamFile> [--minQual <minPhred>] [--log <logFile>] [--oneChrom] [--rmDups] [--force] [--excludeFlags <flag>] [--verbose] [--noeof] [--params] [--recab] |
| | | |
| Additional Recalibration Usage is documented at [[BamUtil: recab#Usage|BamUtil: recab -> Usage]] | | Additional Recalibration Usage is documented at [[BamUtil: recab#Usage|BamUtil: recab -> Usage]] |
Line 87: |
Line 92: |
| duplicates and apply this duplicate marking logic. Default is to throw errors | | duplicates and apply this duplicate marking logic. Default is to throw errors |
| and exit when trying to run on an already mark-duplicated BAM | | and exit when trying to run on an already mark-duplicated BAM |
| + | --excludeFlags <flag> : exclude reads with any of these flags set when determining or marking duplicates |
| + | by default (0xB04): exclude unmapped, secondary reads, QC failures, and supplementary reads |
| --verbose : Turn on verbose mode | | --verbose : Turn on verbose mode |
| --noeof : Do not expect an EOF block on a bam file. | | --noeof : Do not expect an EOF block on a bam file. |
Line 92: |
Line 99: |
| --recab : Recalibrate in addition to deduping | | --recab : Recalibrate in addition to deduping |
| </pre> | | </pre> |
| + | {{PhoneHomeParamDesc}} |
| | | |
| Additional Recalibration Parameters are documented at [[BamUtil: recab#Parameters|BamUtil: recab -> Parameters]] | | Additional Recalibration Parameters are documented at [[BamUtil: recab#Parameters|BamUtil: recab -> Parameters]] |
| | | |
− | {{inBAMInputFile}} | + | == Required Parameters == |
| + | {{inBAMInputFile|noStdin=1}} |
| | | |
| Note: The input file must be sorted by coordinate. | | Note: The input file must be sorted by coordinate. |
| {{outBAMOutputFile}} | | {{outBAMOutputFile}} |
| | | |
− | == Minimum Quality for Quality Calculations (<code>--minQual</code>) == | + | == Optional Parameters== |
| + | === Minimum Quality for Quality Calculations (<code>--minQual</code>) === |
| | | |
| When duplicate reads are encountered, the read with the highest quality is kept. | | When duplicate reads are encountered, the read with the highest quality is kept. |
Line 106: |
Line 116: |
| To determine the quality of a read, all of the phred base quality scores above the <code>--minQual</code> value are added together. If <code>--minQual</code> is not specified, it defaults to 15. | | To determine the quality of a read, all of the phred base quality scores above the <code>--minQual</code> value are added together. If <code>--minQual</code> is not specified, it defaults to 15. |
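
The quality sum described above can be sketched in Python. This is an illustration only, not bamUtil's code: the function name <code>base_quality_sum</code> is hypothetical, the qualities are assumed to arrive as a Phred+33 string, and counting scores exactly equal to <code>--minQual</code> (an inclusive threshold) is an assumption.

```python
def base_quality_sum(qual_string, min_qual=15):
    """Sum the Phred scores that reach min_qual.

    Assumption: the threshold is inclusive; the text only says "above".
    """
    total = 0
    for ch in qual_string:
        phred = ord(ch) - 33  # Phred+33 ASCII encoding used in SAM text
        if phred >= min_qual:
            total += phred
    return total


# Two hypothetical reads sharing the same key: the higher sum would be kept.
read_a = "IIII##"   # four bases at Q40, two at Q2 (ignored)
read_b = "5555555"  # seven bases at Q20
kept = max((read_a, read_b), key=base_quality_sum)
```

Here <code>read_a</code> wins because its two low-quality bases fall below the threshold and contribute nothing, while its four Q40 bases outweigh the seven Q20 bases of <code>read_b</code>.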
| | | |
− | == Output log & Summary Statistics FileName (<code>--log</code>) == | + | === Output log & Summary Statistics FileName (<code>--log</code>) === |
| | | |
| Output file name for writing logs & summary statistics. | | Output file name for writing logs & summary statistics. |
Line 112: |
Line 122: |
| If this parameter is not specified, it will write to the output file specified in <code>--out</code> + ".log". Or if the output bam is written to stdout (<code>--out</code> starts with '-'), the logs will be written to stderr. If the filename after --log starts with '-' it will write to stderr. | | If this parameter is not specified, it will write to the output file specified in <code>--out</code> + ".log". Or if the output bam is written to stdout (<code>--out</code> starts with '-'), the logs will be written to stderr. If the filename after --log starts with '-' it will write to stderr. |
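
The log destination rules above can be summarized as a small Python sketch. The function name <code>resolve_log_destination</code> and the use of the string <code>"stderr"</code> as a stand-in for the standard error stream are hypothetical; the sketch only restates the rules in this section and is not taken from the tool's source.

```python
def resolve_log_destination(out_name, log_name=None):
    """Return a log file name, or "stderr", following the rules described above."""
    if log_name is not None:
        # An explicit --log value starting with '-' means stderr.
        return "stderr" if log_name.startswith("-") else log_name
    if out_name.startswith("-"):
        # Output BAM goes to stdout, so logs go to stderr.
        return "stderr"
    # Default: the --out file name plus ".log".
    return out_name + ".log"
```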
| | | |
− | == Treat Reads with Mates On Different Chromosomes As Single-Ended (<code>--oneChrom</code>) == | + | === Treat Reads with Mates On Different Chromosomes As Single-Ended (<code>--oneChrom</code>) === |
| | | |
| If a read's mate is not found it will not be used for duplicate marking. If you are running on a single chromosome, all reads whose mates are on different chromosomes will not be used for duplicate marking. The <code>--oneChrom</code> option will treat reads with mates on a different chromosome as single-ended. | | If a read's mate is not found it will not be used for duplicate marking. If you are running on a single chromosome, all reads whose mates are on different chromosomes will not be used for duplicate marking. The <code>--oneChrom</code> option will treat reads with mates on a different chromosome as single-ended. |
| | | |
− | == Remove Duplicates (<code>--rmDups</code>) == | + | === Remove Duplicates (<code>--rmDups</code>) === |
| | | |
| Instead of marking a read as duplicate in the flag, the <code>--rmDups</code> option will remove it from the output BAM file. | | Instead of marking a read as duplicate in the flag, the <code>--rmDups</code> option will remove it from the output BAM file. |
| | | |
− | == Ignore Previous Duplicate Marking (<code>--force</code>) == | + | === Ignore Previous Duplicate Marking (<code>--force</code>) === |
| | | |
| By default, the deduper will throw an error and stop if a read is already marked as duplicate. The <code>--force</code> option removes any previous duplicate marking and marks the reads from scratch. The resulting output file will only have reads determined by the deduper marked as duplicates. | | By default, the deduper will throw an error and stop if a read is already marked as duplicate. The <code>--force</code> option removes any previous duplicate marking and marks the reads from scratch. The resulting output file will only have reads determined by the deduper marked as duplicates. |
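
As a rough illustration of the behavior described above, the following Python sketch operates on a SAM flag value. The duplicate bit (0x400) comes from the SAM specification; the function name <code>prepare_flag</code> and the choice of <code>ValueError</code> are hypothetical and do not reflect how the tool is implemented.

```python
DUPLICATE = 0x400  # SAM flag bit: PCR or optical duplicate


def prepare_flag(flag, force=False):
    """Clear an existing duplicate mark when force is set, otherwise reject it."""
    if flag & DUPLICATE:
        if not force:
            raise ValueError("record is already marked duplicate; use --force")
        flag &= ~DUPLICATE  # drop the previous marking before re-marking
    return flag
```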
| | | |
− | == Turn on Verbose Mode (<code>--verbose</code>) == | + | === Skip Records with any of the Specified Flags (<code>--excludeFlags</code>)=== |
| + | Skip records with any of the specified flags set (default: 0xB04). |
| + | |
| + | By default skips reads with any of the following flags set: |
| + | * unmapped |
| + | * secondary alignment |
| + | * fails QC checks |
| + | * supplementary reads |
| + | |
| + | Secondary (0x100) and Supplementary (0x800) reads currently must be excluded. |
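
The default mask 0xB04 is the bitwise OR of the four flags listed above. A minimal Python sketch of the check follows; the flag bit values come from the SAM specification, while the constant and function names are hypothetical.

```python
UNMAPPED = 0x4
SECONDARY = 0x100
QC_FAIL = 0x200
SUPPLEMENTARY = 0x800

# 0x4 | 0x100 | 0x200 | 0x800 == 0xB04
DEFAULT_EXCLUDE_FLAGS = UNMAPPED | SECONDARY | QC_FAIL | SUPPLEMENTARY


def is_excluded(flag, exclude_flags=DEFAULT_EXCLUDE_FLAGS):
    """True if the record has any of the excluded flag bits set."""
    return (flag & exclude_flags) != 0
```

For example, a properly paired first-of-pair read with flag 99 is not excluded, while the same read carrying the supplementary bit (flag 2147) is.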
| + | |
| + | === Turn on Verbose Mode (<code>--verbose</code>) === |
| | | |
| Turn on verbose logging to get more log messages in the log and to stderr. | | Turn on verbose logging to get more log messages in the log and to stderr. |
Line 136: |
Line 157: |
| | | |
| See [[BamUtil: recab]] for recalibration details. | | See [[BamUtil: recab]] for recalibration details. |
| + | |
| + | {{PhoneHomeParameters}} |
| | | |
| = Return Value = | | = Return Value = |
Line 141: |
Line 164: |
| Returns -1 if input parameters are invalid. | | Returns -1 if input parameters are invalid. |
| | | |
− | Returns the SamStatus for the reads/writes (0 on success). | + | Returns the SamStatus for the reads/writes (0 on success, non-0 on failure). |