Line 18: |
Line 18: |
| | | |
| === Covariates Notes === | | === Covariates Notes === |
− | Duplicates are determined by checking for matching keys.
| |
− |
| |
− | The key consists of:
| |
− | # Chromosome
| |
− | # Orientation (forward/reverse)
| |
− | # Unclipped start (forward) / unclipped end (reverse)
| |
− | # Library
| |
− |
| |
− | Rules:
| |
− | * Skip unmapped reads; they are not marked as duplicates.
| |
− | * Mark a single-end read as a duplicate (or remove it, if configured to do so) if:
| |
− | *# A paired-end record has the same key (even if the pair is not proper, the mate is unmapped, or the mate is not found)<br/>-OR-
| |
− | *# A single-end record has the same key and a higher base quality sum (the sum of all base qualities in the record)
| |
− | * Mark both reads of a paired-end pair as duplicates if:
| |
− | *# Another paired-end pair has the same set of keys and a higher base quality sum.
| |
− |
| |
− | This code assumes that at most 1000 bases are clipped at the start of a read.
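The key and rules above can be sketched in Python as follows. This is a simplified, in-memory illustration, not the deduper's actual implementation: the <code>Read</code> record, its field names, and all helper functions are hypothetical; both mates of every pair are assumed to be mapped; and ties in base quality sum are broken by keeping the first record seen, which is an assumption because the rules do not say how ties are handled. The 1000-base clipping limit presumably bounds how far back a streaming pass over a coordinate-sorted file must look; this sketch holds everything in memory and does not need it.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass
class Read:
    """Hypothetical, simplified alignment record (field names are illustrative)."""
    name: str
    chrom: str
    pos: int                      # 1-based leftmost aligned position
    cigar: List[Tuple[str, int]]  # e.g. [("S", 5), ("M", 95)]
    reverse: bool
    library: str
    quals: List[int]              # Phred base qualities
    unmapped: bool = False


def ref_span(cigar: List[Tuple[str, int]]) -> int:
    # Only these CIGAR operations consume reference bases.
    return sum(n for op, n in cigar if op in "MDN=X")


def unclipped_start(r: Read) -> int:
    lead = 0
    for op, n in r.cigar:  # leading soft/hard clips
        if op not in "SH":
            break
        lead += n
    return r.pos - lead


def unclipped_end(r: Read) -> int:
    trail = 0
    for op, n in reversed(r.cigar):  # trailing soft/hard clips
        if op not in "SH":
            break
        trail += n
    return r.pos + ref_span(r.cigar) - 1 + trail


def dup_key(r: Read) -> Tuple[str, bool, int, str]:
    # Key = (chromosome, orientation, unclipped start/end, library).
    coord = unclipped_end(r) if r.reverse else unclipped_start(r)
    return (r.chrom, r.reverse, coord, r.library)


def qual_sum(reads) -> int:
    return sum(sum(r.quals) for r in reads)


def mark_duplicates(single_end: List[Read], pairs: List[Tuple[Read, Read]]) -> Set[str]:
    """Return the names of reads that would be marked as duplicates."""
    dups: Set[str] = set()
    paired_keys = {dup_key(r) for pair in pairs for r in pair}

    best_single: Dict[tuple, Read] = {}
    for r in single_end:
        if r.unmapped:
            continue  # unmapped reads are never marked
        k = dup_key(r)
        if k in paired_keys:
            dups.add(r.name)  # a paired-end record shares this key
        elif k not in best_single:
            best_single[k] = r
        elif qual_sum([r]) > qual_sum([best_single[k]]):
            dups.add(best_single[k].name)  # previous best loses
            best_single[k] = r
        else:
            dups.add(r.name)  # ties keep the first read seen (assumption)

    best_pair: Dict[tuple, Tuple[Read, Read]] = {}
    for pair in pairs:
        k = tuple(sorted(dup_key(r) for r in pair))  # the pair's set of keys
        if k not in best_pair:
            best_pair[k] = pair
        elif qual_sum(pair) > qual_sum(best_pair[k]):
            dups.update(r.name for r in best_pair[k])
            best_pair[k] = pair
        else:
            dups.update(r.name for r in pair)
    return dups
```

Using sorted key tuples rather than a plain set keeps the comparison order-independent while still distinguishing a pair whose two reads happen to share the same key.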
| |
| | | |
| == How to use it == | | == How to use it == |
| | | |
− | When <code>dedup</code> is invoked without any arguments the usage information is displayed as described below under [[#Usage|Usage]]. | + | When <code>recab</code> is invoked without any arguments the usage information is displayed as described below under [[#Usage|Usage]]. |
− | | |
− | The input SAM/BAM file, [[#input File (--in)|input File (--in)]], is required and must be sorted by coordinate.
| |
| | | |
− | The output SAM/BAM file is also required, [[#output File (--out)|output File (--out)]]. | + | The input SAM/BAM file ([[#input File (--in)|--in]]), the output SAM/BAM file ([[#output File (--out)|--out]]), and the reference file ([[#Reference File (--refFile)|--refFile]]) are required inputs. |
| | | |
| = Usage = | | = Usage = |
Line 67: |
Line 48: |
| {{inBAMInputFile}} | | {{inBAMInputFile}} |
| {{outBAMOutputFile}} | | {{outBAMOutputFile}} |
− |
| |
− | == Minimum Recalibration Quality (<code>--minRecabQual</code>) ==
| |
− |
| |
− | When recalibrating reads, only positions with a base quality greater than this minimum are recalibrated. If <code>--minRecabQual</code> is not specified, it defaults to <span style="color:red">TBD</span>.
| |
| | | |
| == Output log & Summary Statistics FileName (<code>--log</code>) == | | == Output log & Summary Statistics FileName (<code>--log</code>) == |
Line 78: |
Line 55: |
| If this parameter is not specified, the log is written to the filename given by <code>--out</code> with ".log" appended. If the output BAM is written to stdout (<code>--out</code> starts with '-'), the log is written to stderr instead. If the filename given to <code>--log</code> starts with '-', the log is written to stderr. | | If this parameter is not specified, the log is written to the filename given by <code>--out</code> with ".log" appended. If the output BAM is written to stdout (<code>--out</code> starts with '-'), the log is written to stderr instead. If the filename given to <code>--log</code> starts with '-', the log is written to stderr. |
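A minimal sketch of how the log destination could be resolved from <code>--out</code> and <code>--log</code>. It reads the rules above as: an explicit <code>--log</code> value always takes precedence, and the stdout case applies only when <code>--log</code> is omitted; that reading is an assumption. The function name <code>log_destination</code> is hypothetical, and it returns the string "stderr" as a stand-in for the stream.

```python
from typing import Optional


def log_destination(out: str, log: Optional[str] = None) -> str:
    """Return the log target: a filename, or "stderr" (hypothetical helper)."""
    if log is not None:
        # An explicit --log value starting with '-' means stderr.
        return "stderr" if log.startswith("-") else log
    if out.startswith("-"):
        # Output BAM goes to stdout, so the log goes to stderr.
        return "stderr"
    # Default: the --out filename with ".log" appended.
    return out + ".log"
```

For example, <code>log_destination("sample.bam")</code> gives <code>"sample.bam.log"</code>.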
| | | |
− | == Treat Reads with Mates On Different Chromosomes As Single-Ended (<code>--oneChrom</code>) == | + | == Turn on Verbose Mode (<code>--verbose</code>) == |
| | | |
− | If a read's mate is not found, the read is not used for duplicate marking. When running on a single chromosome, this means that reads whose mates are on a different chromosome are not used for duplicate marking. The <code>--oneChrom</code> option treats reads with mates on a different chromosome as single-ended.
| + | Turn on verbose logging to get more log messages in the log and to stderr. |
| | | |
− | == Recalibrate (<code>--recab</code>) ==
| + | {{noeofBGZFParameter}} |
| + | {{paramsParameter}} |
| | | |
− | This option will recalibrate the input file in addition to deduping.
| + | == Reference File (<code>--refFile</code>) == |
| | | |
− | == Remove Duplicates (<code>--rmDups</code>) ==
| + | The reference file to use for comparing read bases to the reference. |
| | | |
− | Instead of marking a read as duplicate in the flag, the <code>--rmDups</code> option will remove it from the output BAM file.
| + | == DBSNP File (<code>--dbsnp</code>) == |
| | | |
− | == Ignore Previous Duplicate Marking (<code>--force</code>) ==
| + | The dbSNP file specifying positions to skip when recalibrating: a tab-delimited file with the chromosome in the first column and the 1-based position in the second column. |
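The skip-file format described above can be read with a few lines of Python. This is a sketch, not the tool's parser: the function names are hypothetical, and ignoring blank lines and lines starting with '#' is an assumption the format description does not state. Positions are kept 1-based, so a caller holding a 0-based coordinate would add 1 before the lookup.

```python
from typing import Dict, Iterable, Set


def load_dbsnp_positions(lines: Iterable[str]) -> Dict[str, Set[int]]:
    """Parse tab-delimited lines of (chromosome, 1-based position)."""
    skip: Dict[str, Set[int]] = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # assumption: blank and '#' lines are ignored
        fields = line.split("\t")
        chrom, pos = fields[0], int(fields[1])
        skip.setdefault(chrom, set()).add(pos)
    return skip


def should_skip(skip: Dict[str, Set[int]], chrom: str, pos1: int) -> bool:
    # pos1 must be 1-based to match the file's convention.
    return pos1 in skip.get(chrom, set())
```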
| | | |
− | By default, the deduper throws an error and stops if a read is already marked as a duplicate. The <code>--force</code> option removes any previous duplicate marking and marks the reads from scratch, so the only reads marked as duplicates in the output file are those identified by the deduper.
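The flag handling behind <code>--force</code> and <code>--rmDups</code> can be sketched with the SAM FLAG duplicate bit (0x400, "PCR or optical duplicate" in the SAM specification). The function names below are hypothetical and only illustrate the described behavior; returning <code>None</code> stands in for dropping the record from the output.

```python
from typing import Optional

DUPLICATE_FLAG = 0x400  # SAM FLAG bit: PCR or optical duplicate


def clear_previous_marking(flag: int, force: bool) -> int:
    """Reject or clear an existing duplicate mark before deduping."""
    if flag & DUPLICATE_FLAG:
        if not force:
            raise ValueError("read already marked as duplicate (use --force)")
        flag &= ~DUPLICATE_FLAG  # --force: start from scratch
    return flag


def apply_decision(flag: int, is_duplicate: bool, rm_dups: bool) -> Optional[int]:
    """Return the output flag, or None if the record should be removed."""
    if not is_duplicate:
        return flag
    if rm_dups:
        return None  # --rmDups: drop the record instead of flagging it
    return flag | DUPLICATE_FLAG
```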
| + | == Blended Model Weight (<code>--blended</code>) == |
| | | |
− | == Turn on Verbose Mode (<code>--verbose</code>) == | + | <span style="color:red">TBD - this parameter is not yet implemented.</span> |
| | | |
− | Turn on verbose logging to get more log messages in the log and to stderr.
| + | == Minimum Recalibration Quality (<code>--minRecabQual</code>) == |
| | | |
− | {{noeofBGZFParameter}}
| + | When recalibrating reads, only positions with a base quality greater than this minimum are recalibrated. If <code>--minRecabQual</code> is not specified, it defaults to <span style="color:red">TBD - this parameter is not yet implemented</span>. |
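A short sketch of the minimum-quality filter: only read offsets whose base quality is strictly greater than the threshold are eligible for recalibration. The function names are hypothetical; decoding the SAM QUAL string as Phred+33 follows the SAM specification. Because the default value is still undecided, the threshold is passed explicitly here.

```python
from typing import List


def phred_from_sam(qual_string: str) -> List[int]:
    # SAM QUAL field: each character is the Phred score + 33.
    return [ord(c) - 33 for c in qual_string]


def recalibration_offsets(quals: List[int], min_recab_qual: int) -> List[int]:
    """0-based read offsets with quality strictly above the minimum."""
    return [i for i, q in enumerate(quals) if q > min_recab_qual]
```

For example, the QUAL string <code>#+?&I</code> decodes to qualities 2, 10, 30, 5, 40, and a minimum of 5 keeps offsets 1, 2 and 4.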
− | {{paramsParameter}}
| |
| | | |
| = Return Value = | | = Return Value = |