Changes

BamUtil: dedup (view source)

Revision as of 17:49, 5 June 2012

2,279 bytes added, 17:49, 5 June 2012

no edit summary

Line 22: Line 22:

The deduper assumes that duplicates in the input BAM file are not marked.

−

When the deduper detects a marked duplicate in the input BAM file, it will throw an error and stop. To override this behavior, use the --force option; in this mode, alignments that are marked as duplicates in the input file are unmarked before the deduper begins its detection algorithm. The result is that only duplicates detected by the deduper will be marked in or removed from the output file.

+

When the deduper detects a marked duplicate in the input BAM file, it will throw an error and stop. To override this behavior, use the [[#Ignore Previous Duplicate Marking (--force)|<code>--force</code>]] option; in this mode, alignments that are marked as duplicates in the input file are unmarked before the deduper begins its detection algorithm. The result is that only duplicates detected by the deduper will be marked in or removed from the output file.

−

The handling of paired-end reads assumes that the mate information in the SAM/BAM records is accurate. If a mate is not found at the expected position, an error message is printed (once per file) indicating this error. Paired-end reads whose mate cannot be found are not marked duplicate and are not used for duplicate marking of other paired-end reads. Single-end reads with the same key as paired-end reads whose mate cannot be found are still marked as duplicate. If this error is encountered, you may want to fix the mate information and reprocess the file through the deduper. Use the <code>--oneChrom</code> option to treat reads with a mate on a different chromosome as single-ended. This option is useful if you are running the deduper on just a single chromosome.

+

The handling of paired-end reads assumes that the mate information in the SAM/BAM records is accurate. If a mate is not found at the expected position, an error message is printed (once per file) indicating this error. Paired-end reads whose mate cannot be found are not marked duplicate and are not used for duplicate marking of other paired-end reads. Single-end reads with the same key as paired-end reads whose mate cannot be found are still marked as duplicate. If this error is encountered, you may want to fix the mate information and reprocess the file through the deduper. Use the [[#Treat Reads with Mates On Different Chromosomes As Single-Ended (--oneChrom)|<code>--oneChrom</code>]] option to treat reads with a mate on a different chromosome as single-ended. This option is useful if you are running the deduper on just a single chromosome.

Line 33: Line 33:

# Chromosome

# Orientation (forward/reverse)

−

# unclipped start(forward)/end(reverse)

+

# Unclipped Start(forward)/End(reverse)

# Library
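As a rough illustration of how the four fields listed here could combine into a duplicate key, the following Python sketch groups reads by chromosome, orientation, unclipped position, and library. The <code>Read</code> tuple and <code>dedup_key</code> function are hypothetical names used only for this example; they are not part of bamUtil, and the real key may include more fields than the ones shown in this excerpt.

```python
from collections import namedtuple

# Hypothetical, simplified read record; the unclipped coordinates are
# assumed to have been computed already from the alignment and CIGAR.
Read = namedtuple(
    "Read", "name chrom reverse unclipped_start unclipped_end library"
)


def dedup_key(read):
    """Build a key from the four fields listed above.

    Forward reads use the unclipped start; reverse reads use the
    unclipped end.
    """
    position = read.unclipped_end if read.reverse else read.unclipped_start
    return (read.chrom, read.reverse, position, read.library)


forward_a = Read("a", "1", False, 100, 175, "lib1")
forward_b = Read("b", "1", False, 100, 180, "lib1")
reverse_c = Read("c", "1", True, 100, 175, "lib1")

# Two forward reads with the same unclipped start share a key.
same_key = dedup_key(forward_a) == dedup_key(forward_b)
# A reverse read is keyed on its unclipped end, so it differs.
different_key = dedup_key(forward_a) != dedup_key(reverse_c)
```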

Line 62: Line 62:

= Usage =

−

./bam dedup (options) --in <InputBamFile> --out <OutputBamFile> [--log <logFile>] [--oneChrom] [--rmDups] [--force] [--verbose] [--noeof] [--params] [--recab] --refFile <ReferenceFile> [--dbsnp <dbsnpFile>] [--blended <weight>]

+

./bam dedup (options) --in <InputBamFile> --out <OutputBamFile> [--minQual <minPhred>] [--log <logFile>] [--oneChrom] [--rmDups] [--force] [--verbose] [--noeof] [--params] [--recab] --refFile <ReferenceFile> [--dbsnp <dbsnpFile>] [--blended <weight>]

= Parameters =

<pre>

−

Required parameters :

+

Required parameters :

--in <infile> : input BAM file name (must be sorted)

--out <outfile> : output BAM file name (same order with original file)

Optional parameters : (see SAM format specification for details)

−

--log <logfile> : log and summary statistics (default: [outfile].log)

+

--minQual <int> : only add scores over this phred quality when determining a read's quality (default: 15)

+

--log <logfile> : log and summary statistics (default: [outfile].log, or stderr if --out starts with '-')

--oneChrom : Treat reads with mates on different chromosomes as single-ended.

--rmDups : Remove duplicates (default is to mark duplicates)

Line 92: Line 93:

+

== Minimum Base Quality Counted Toward a Read's Quality (<code>--minQual</code>) ==

+

When duplicate reads are encountered, the read with the highest quality is kept.

+

To determine the quality of a read, all of its phred base quality scores above the <code>--minQual</code> value are added together. If <code>--minQual</code> is not specified, it defaults to 15.
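The calculation can be sketched in a few lines of Python. This is an illustration only, not the bamUtil implementation: <code>read_quality</code> and <code>pick_kept_read</code> are hypothetical names, base qualities are assumed to be already decoded to integers, and treating the threshold as strictly "above" (rather than "at or above") is an assumption based on the wording here.

```python
def read_quality(base_quals, min_qual=15):
    """Sum the phred base qualities that are above min_qual.

    Assumption: a score exactly equal to min_qual is not counted.
    """
    return sum(q for q in base_quals if q > min_qual)


def pick_kept_read(duplicates, min_qual=15):
    """Return the base-quality list of the highest-quality duplicate."""
    return max(duplicates, key=lambda quals: read_quality(quals, min_qual))


few_good_bases = [30, 30, 2]
many_fair_bases = [20, 20, 20, 16]

# With the default threshold, the second read has the larger sum (76 vs 60).
kept_default = pick_kept_read([few_good_bases, many_fair_bases])
# With a threshold of 25, only the 30s count, so the first read wins.
kept_strict = pick_kept_read([few_good_bases, many_fair_bases], min_qual=25)
```

Raising the threshold can change which read is kept, because scores at or below it contribute nothing to the sum.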

+

== Output log & Summary Statistics FileName (<code>--log</code>) ==

+

Output file name for writing logs & summary statistics.

+

If this parameter is not specified, the log is written to the file named by <code>--out</code> with ".log" appended. If the output BAM is written to stdout (<code>--out</code> starts with '-'), the log is written to stderr. If the filename given to <code>--log</code> starts with '-', the log is also written to stderr.
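The rules for choosing the log destination can be summarized as a small decision function. The Python below is a sketch of the behavior described here, not code from bamUtil; <code>resolve_log_target</code> is a hypothetical name, and the string "stderr" stands in for the standard error stream.

```python
def resolve_log_target(out, log=None):
    """Return the log destination for the given --out and --log values."""
    if log is not None:
        # An explicit --log value starting with '-' means stderr.
        return "stderr" if log.startswith("-") else log
    if out.startswith("-"):
        # BAM goes to stdout, so the log goes to stderr.
        return "stderr"
    # Default: the output file name with ".log" appended.
    return out + ".log"


default_target = resolve_log_target("sample.dedup.bam")
stdout_target = resolve_log_target("-.bam")
explicit_target = resolve_log_target("sample.dedup.bam", log="run.log")
```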

+

== Treat Reads with Mates On Different Chromosomes As Single-Ended (<code>--oneChrom</code>) ==

+

If a read's mate is not found, the read is not used for duplicate marking. If you are running on a single chromosome, all reads whose mates are on different chromosomes are therefore excluded from duplicate marking. The <code>--oneChrom</code> option treats reads with mates on a different chromosome as single-ended instead.
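One way to picture the effect of <code>--oneChrom</code> is as a check made before a read enters the paired-end logic. The following Python sketch is an assumption about how such a check might look, with the hypothetical function name <code>treat_as_paired</code>; it ignores other cases (such as unmapped mates) that a real implementation would have to handle.

```python
def treat_as_paired(is_paired, chrom, mate_chrom, one_chrom=False):
    """Decide whether a read goes through paired-end duplicate handling."""
    if not is_paired:
        return False
    if one_chrom and mate_chrom != chrom:
        # With --oneChrom, a mate on another chromosome is not searched
        # for; the read is handled as if it were single-ended.
        return False
    return True


# Without the option, the read is still handled as paired-end.
without_option = treat_as_paired(True, "20", "7")
# With the option, the same read is handled as single-ended.
with_option = treat_as_paired(True, "20", "7", one_chrom=True)
```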

+

== Recalibrate (<code>--recab</code>) ==

+

This option will recalibrate the input file in addition to deduping.

+

== Remove Duplicates (<code>--rmDups</code>) ==

+

Instead of marking a read as duplicate in the flag, the <code>--rmDups</code> option will remove it from the output BAM file.
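The difference between marking and removing can be shown with the SAM duplicate flag bit (0x400, "PCR or optical duplicate" in the SAM specification). The Python below is a minimal sketch, not the bamUtil code: records are simplified to hypothetical (name, flag) tuples, and <code>write_duplicates</code> is an illustrative name.

```python
DUPLICATE_FLAG = 0x400  # SAM flag bit: PCR or optical duplicate


def write_duplicates(records, duplicate_names, rm_dups=False):
    """Return output records, marking or dropping the given duplicates."""
    output = []
    for name, flag in records:
        if name in duplicate_names:
            if rm_dups:
                continue  # --rmDups: leave the read out entirely
            flag |= DUPLICATE_FLAG  # default: set the duplicate bit
        output.append((name, flag))
    return output


records = [("r1", 0), ("r2", 16)]
marked = write_duplicates(records, {"r2"})
removed = write_duplicates(records, {"r2"}, rm_dups=True)
```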

+

== Ignore Previous Duplicate Marking (<code>--force</code>) ==

+

By default, the deduper throws an error and stops if a read is already marked as duplicate. The <code>--force</code> option removes any previous duplicate marking and marks the reads from scratch. In the resulting output file, only reads that the deduper itself identified are marked as duplicates.
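The default and <code>--force</code> behaviors can be sketched as a pre-pass over the SAM flags. This Python example only illustrates the behavior described here; <code>prepare_flags</code> and <code>AlreadyMarkedError</code> are hypothetical names, and the actual error message and control flow in bamUtil may differ.

```python
DUPLICATE_FLAG = 0x400  # SAM flag bit: PCR or optical duplicate


class AlreadyMarkedError(Exception):
    """Raised when the input already contains a marked duplicate."""


def prepare_flags(flags, force=False):
    """Clear existing duplicate marks when forced, otherwise refuse them."""
    cleaned = []
    for flag in flags:
        if flag & DUPLICATE_FLAG:
            if not force:
                raise AlreadyMarkedError("input contains a marked duplicate")
            flag &= ~DUPLICATE_FLAG  # unmark before detection starts
        cleaned.append(flag)
    return cleaned


forced = prepare_flags([0, 0x410], force=True)

try:
    prepare_flags([0, 0x410])
    stopped = False
except AlreadyMarkedError:
    stopped = True
```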

+

== Turn on Verbose Mode (<code>--verbose</code>) ==

+

Turn on verbose logging to get more log messages in the log and to stderr.

Mktrost

Administrators

3,045

edits
