Changes

From Genome Analysis Wiki
935 bytes added ,  22:31, 8 April 2019
Line 10: Line 10:  
NOTE: This tool does not work properly on templates that have more than 2 segments.  It cannot correctly match reads when more than 2 reads share the same read name.
    +
NOTE: Dedup cannot read from stdin since it reads the input file twice.
    
Potential future features:
 
Line 38: Line 39:  
Rules:
 
 
* Skip Unmapped Reads, they are not marked as duplicate
 
 +
* Reads whose mate is unmapped are treated as single-end
 
* Mark a Single-End Read Duplicate (or remove it if configured to do so) if:
 
*# A paired-end record has the same key (even if the pair is not proper/the mate is unmapped/the mate is not found)<br/>-OR-
+
*# A paired-end record has the same key (even if the pair is not proper/the mate is not found)<br/>-OR-
 
*# A single-end record has the same key and a higher base quality sum (sum of all base qualities in the record above [[#Minimum Quality for Quality Calculations (--minQual)|<code>--minQual</code>]])
 
* Mark both Paired-End Reads Duplicate if:
 
Line 45: Line 47:  
   
 
   
 
This code assumes that at most 1000 bases are clipped at the start of a read.
 
 +
 +
 +
Deduping requires two passes through the file, so it cannot read from stdin.
    
==Handling Recalibration==
 
Line 70: Line 75:     
= Usage =
 
  ./bam dedup --in <InputBamFile> --out <OutputBamFile> [--minQual <minPhred>] [--log <logFile>] [--oneChrom] [--rmDups] [--force] [--verbose] [--noeof] [--params] [--recab]  
+
  ./bam dedup --in <InputBamFile> --out <OutputBamFile> [--minQual <minPhred>] [--log <logFile>] [--oneChrom] [--rmDups] [--force] [--excludeFlags <flag>] [--verbose] [--noeof] [--params] [--recab]
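The usage line above can be assembled programmatically. The following is only a minimal sketch: the helper name <code>build_dedup_command</code> and its keyword arguments are hypothetical, and it covers just a few of the documented options. It also assumes that an input name starting with '-' denotes stdin, mirroring the convention documented for <code>--out</code> and <code>--log</code>.

```python
from typing import List, Optional


def build_dedup_command(in_bam: str, out_bam: str,
                        min_qual: Optional[int] = None,
                        log: Optional[str] = None,
                        rm_dups: bool = False,
                        force: bool = False,
                        bam_exe: str = "./bam") -> List[str]:
    """Assemble an argument list for `bam dedup` (hypothetical helper)."""
    # Assumption: a name starting with '-' means stdin. Dedup reads its
    # input twice, so such an input is rejected before anything is run.
    if in_bam.startswith("-"):
        raise ValueError("dedup cannot read its input from stdin")

    cmd = [bam_exe, "dedup", "--in", in_bam, "--out", out_bam]
    if min_qual is not None:
        cmd += ["--minQual", str(min_qual)]
    if log is not None:
        cmd += ["--log", log]
    if rm_dups:
        cmd.append("--rmDups")
    if force:
        cmd.append("--force")
    return cmd
```

The returned list can be handed to <code>subprocess.run</code>; building a list instead of a single string avoids shell quoting problems with file names.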
    
Additional Recalibration Usage is documented at [[BamUtil: recab#Usage|BamUtil: recab -> Usage]]
 
Line 87: Line 92:  
                  duplicates and apply this duplicate marking logic.  Default is to throw errors
 
 
                  and exit when trying to run on an already mark-duplicated BAM
 
 +
--excludeFlags <flag>    : exclude reads with any of these flags set when determining or marking duplicates
 +
                          by default (0xB04): exclude unmapped, secondary reads, QC failures, and supplementary reads
 
--verbose      : Turn on verbose mode
 
 
--noeof        : Do not expect an EOF block on a bam file.
 
Line 92: Line 99:  
--recab        : Recalibrate in addition to deduping
 
 
</pre>
 
</pre>
 +
{{PhoneHomeParamDesc}}
    
Additional Recalibration Parameters are documented at [[BamUtil: recab#Parameters|BamUtil: recab -> Parameters]]
 
   −
{{inBAMInputFile}}
+
== Required Parameters ==
 +
{{inBAMInputFile|noStdin=1}}
    
Note: The input file must be sorted by coordinate.
 
 
{{outBAMOutputFile}}
 
   −
== Minimum Quality for Quality Calculations (<code>--minQual</code>) ==
+
== Optional Parameters ==
 +
=== Minimum Quality for Quality Calculations (<code>--minQual</code>) ===
    
When duplicate reads are encountered, the read with the highest quality is kept.
 
Line 106: Line 116:  
To determine the quality of a read, all of the phred base quality scores above the <code>--minQual</code> value are added together.  If <code>--minQual</code> is not specified, it defaults to 15.
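The quality calculation described above can be sketched as follows. This is an illustration only, not the tool's actual code: the function names are hypothetical, the quality string is assumed to be Phred+33 encoded, and "above" is read literally as strictly greater than the threshold (the tool may treat a score equal to <code>--minQual</code> differently).

```python
def phred33_to_scores(quality_string):
    """Decode a Phred+33 quality string into integer scores (assumed encoding)."""
    return [ord(ch) - 33 for ch in quality_string]


def base_quality_sum(scores, min_qual=15):
    """Sum the base qualities that are above min_qual.

    Assumption: 'above' means strictly greater than the threshold.
    """
    return sum(q for q in scores if q > min_qual)


def pick_read_to_keep(reads, min_qual=15):
    """Among duplicate reads, return the name with the highest quality sum.

    `reads` maps a read name to its Phred+33 quality string.
    Tie-breaking is not described in the text; max() simply keeps the
    first of the tied entries.
    """
    return max(reads, key=lambda name: base_quality_sum(
        phred33_to_scores(reads[name]), min_qual))
```

For example, the quality string <code>II5#</code> decodes to the scores 40, 40, 20 and 2; with the default threshold of 15 the last base is ignored, so the quality sum is 100.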
   −
== Output log & Summary Statistics FileName (<code>--log</code>) ==
+
=== Output log & Summary Statistics FileName (<code>--log</code>) ===
    
Output file name for writing logs & summary statistics.
 
Line 112: Line 122:  
If this parameter is not specified, the log is written to the file named by <code>--out</code> with ".log" appended.  If the output bam is written to stdout (<code>--out</code> starts with '-'), the logs are written to stderr.  If the filename after <code>--log</code> starts with '-', the logs are also written to stderr.
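The rules for choosing the log destination can be summarized in a few lines. This is a sketch of the documented behavior, not the tool's code; the function name <code>resolve_log_target</code> and the use of the string <code>"stderr"</code> as a marker are hypothetical.

```python
def resolve_log_target(out, log=None):
    """Return the log destination: a file name, or "stderr".

    `out` is the value given to --out; `log` is the value given to
    --log, or None when --log was not specified.
    """
    if log is not None:
        # An explicit --log name starting with '-' means stderr.
        return "stderr" if log.startswith("-") else log
    if out.startswith("-"):
        # Output BAM goes to stdout, so the log goes to stderr.
        return "stderr"
    # Default: the output file name with ".log" appended.
    return out + ".log"
```

With <code>--out sample.dedup.bam</code> and no <code>--log</code>, the sketch yields <code>sample.dedup.bam.log</code>.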
   −
== Treat Reads with Mates On Different Chromosomes As Single-Ended (<code>--oneChrom</code>) ==
+
=== Treat Reads with Mates On Different Chromosomes As Single-Ended (<code>--oneChrom</code>) ===
    
If a read's mate is not found, the read will not be used for duplicate marking.  If you are running on a single chromosome, all reads whose mates are on different chromosomes will not be used for duplicate marking.  The <code>--oneChrom</code> option will treat reads with mates on a different chromosome as single-ended.
   −
== Remove Duplicates (<code>--rmDups</code>) ==
+
=== Remove Duplicates (<code>--rmDups</code>) ===
    
Instead of marking a read as duplicate in the flag, the <code>--rmDups</code> option will remove it from the output BAM file.   
 
   −
== Ignore Previous Duplicate Marking (<code>--force</code>) ==
+
=== Ignore Previous Duplicate Marking (<code>--force</code>) ===
    
By default the deduper will throw an error and stop if a read is already marked as duplicate.  The <code>--force</code> option removes any previous duplicate marking and marks the reads from scratch.  In the resulting output file, only reads that the deduper itself identified are marked as duplicates.
   −
== Turn on Verbose Mode (<code>--verbose</code>) ==
+
=== Skip Records with any of the Specified Flags (<code>--excludeFlags</code>) ===
 +
Skip records that have any of the specified flags set.  The default is 0xB04.
 +
 
 +
By default skips reads with any of the following flags set:
 +
* unmapped
 +
* secondary alignment
 +
* fails QC checks
 +
* supplementary reads
 +
 
 +
Secondary (0x100) and Supplementary (0x800) reads currently must be excluded.
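The default mask 0xB04 is the bitwise OR of the four flags listed above (0x4, 0x100, 0x200 and 0x800 in the SAM flag field). The sketch below shows how such a mask can be decoded and applied; the function names are hypothetical and the check is an illustration of the bit arithmetic, not the tool's implementation.

```python
# SAM flag bits named in the text above.
FLAG_NAMES = {
    0x4: "unmapped",
    0x100: "secondary alignment",
    0x200: "fails QC checks",
    0x800: "supplementary",
}

DEFAULT_EXCLUDE_FLAGS = 0xB04  # 0x4 | 0x100 | 0x200 | 0x800


def describe_flags(mask):
    """List the names of the known flag bits set in `mask`."""
    return [name for bit, name in sorted(FLAG_NAMES.items()) if mask & bit]


def is_excluded(record_flag, exclude_mask=DEFAULT_EXCLUDE_FLAGS):
    """True if the record has any of the excluded flag bits set."""
    return (record_flag & exclude_mask) != 0


def mask_is_supported(exclude_mask):
    """Secondary (0x100) and supplementary (0x800) must both be excluded."""
    required = 0x100 | 0x800
    return (exclude_mask & required) == required
```

A properly paired, mapped read with flag 99 has none of these bits set and is kept for duplicate marking, whereas the same read flagged as a secondary alignment (99 + 256 = 355) is skipped.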
 +
 
 +
=== Turn on Verbose Mode (<code>--verbose</code>) ===
    
Turn on verbose logging to get more log messages in the log and to stderr.
 
Line 136: Line 157:     
See [[BamUtil: recab]] for recalibration details.
 
 +
 +
{{PhoneHomeParameters}}
    
= Return Value =
 
Line 141: Line 164:  
Returns -1 if input parameters are invalid.
 
   −
Returns the SamStatus for the reads/writes (0 on success).
+
Returns the SamStatus for the reads/writes (0 on success, non-0 on failure).
