Changes

BamUtil: recab

Revision as of 11:02, 15 June 2012

1,341 bytes removed, 11:02, 15 June 2012

no edit summary

Line 18: Line 18:

=== Covariates Notes ===

−

~~Duplicates are determined by checking for matching keys.~~

−

~~The Key is comprised of:~~

−

~~# Chromosome~~

−

~~# Orientation (forward/reverse)~~

−

~~# Unclipped Start(forward)/End(reverse)~~

−

~~# Library~~

−

~~Rules:~~

−

* Skip Unmapped Reads, they are not marked as duplicate

−

* Mark a Single-End Read Duplicate (or remove it if configured to do so) if:

−

*# A paired-end record has the same key (even if the pair is not proper/the mate is unmapped/the mate is not found) -OR-

−

*# A single-end record has the same key and a higher base quality sum (sum of all base qualities in the record)

−

* Mark both Paired-End Reads Duplicate if:

−

~~# Another paired-end pair has the same set of keys and has a higher base quality sum.~~

−

~~This code assumes that at most 1000 bases are clipped at the start of a read.~~

== How to use it ==

−

When <code>~~dedup~~</code> is invoked without any arguments the usage information is displayed as described below under [[#Usage|Usage]].

+

When <code>recab</code> is invoked without any arguments, the usage information is displayed, as described below under [[#Usage|Usage]].

−

~~The input SAM/BAM file is required, [[#input File (--in)|input File (--in)]], and must be sorted by coordinate~~.

−

The output SAM/BAM file ~~is also required,~~ [[#output File (--out)|~~output~~ File (--~~out~~)]].

+

The input SAM/BAM file ([[#input File (--in)|--in]]), the output SAM/BAM file ([[#output File (--out)|--out]]), and the reference file ([[#Reference File (--refFile)|--refFile]]) are all required.
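The three required options can be assembled programmatically before launching the tool. The following is a minimal sketch only: the executable name <code>bam</code>, the helper name <code>build_recab_command</code>, and the file names are assumptions made for illustration, while <code>--in</code>, <code>--out</code>, and <code>--refFile</code> are the options named above.

```python
import shlex


def build_recab_command(in_bam, out_bam, ref_file, executable="bam"):
    """Return an argument list for recab with the three required options.

    The executable name "bam" is an assumption for this sketch; adjust it
    to match the local installation.
    """
    required = (("--in", in_bam), ("--out", out_bam), ("--refFile", ref_file))
    for option, value in required:
        if not value:
            raise ValueError(f"{option} is required")
    return [executable, "recab",
            "--in", in_bam,
            "--out", out_bam,
            "--refFile", ref_file]


# Hypothetical file names, shown only to illustrate the call.
cmd = build_recab_command("sample.bam", "sample.recab.bam", "reference.fa")
print(shlex.join(cmd))
```

The resulting list can be passed to <code>subprocess.run</code> as-is, which avoids shell quoting problems with unusual file names.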

= Usage =

Line 67: Line 48:

−

~~== BAM File Is Sorted By Read Name (<code>--minRecabQual</code>) ==~~

−

When recalibrating reads, only positions with a base quality greater than this minimum will be recalibrated. If <code>--minQual</code> is not specified, it is defaulted to TBD.

== Output log & Summary Statistics FileName (<code>--log</code>) ==

Line 78: Line 55:

If this parameter is not specified, the log is written to the file name given in <code>--out</code> with ".log" appended. If the output BAM is written to stdout (<code>--out</code> starts with '-'), the log is written to stderr instead. If the file name after <code>--log</code> starts with '-', the log is also written to stderr.
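The rules for choosing the log destination can be summarized in a few lines. This is an illustrative sketch of the behavior described above, not the tool's actual code; the function name <code>resolve_log_destination</code> and the use of the string "stderr" as a marker are assumptions.

```python
def resolve_log_destination(out, log=None):
    """Return the log file name, or the marker "stderr".

    out: value given to --out
    log: value given to --log, or None when --log was not specified
    """
    if log is not None:
        # An explicit --log value starting with '-' means stderr.
        return "stderr" if log.startswith("-") else log
    if out.startswith("-"):
        # Output goes to stdout, so the log goes to stderr.
        return "stderr"
    # Default: the output file name with ".log" appended.
    return out + ".log"


for out, log in [("out.bam", None), ("-.bam", None), ("out.bam", "-"), ("out.bam", "run.log")]:
    print(out, log, resolve_log_destination(out, log))
```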

−

== ~~Treat Reads with Mates On Different Chromosomes As Single-Ended~~ (<code>--~~oneChrom~~</code>) ==

+

== Turn on Verbose Mode (<code>--verbose</code>) ==

−

~~If a read's mate is not found it will not be used for duplicate marking. If you are running~~ on a single chromosome, all reads whose mates are on different chromosomes will not be used for duplicate marking. The <code>--oneChrom</code> option will treat reads with mates on a different chromosome as single-ended.

+

Turn on verbose logging to get more log messages in the log and to stderr.

−

~~== Recalibrate (<code>--recab</code>) ==~~

+

−

~~This option will recalibrate the input file in addition to deduping.~~

+

== Reference File (<code>--refFile</code>) ==

−

~~== Remove Duplicates (<code>--rmDups</code>) ==~~

+

The reference file to use for comparing read bases to the reference.

−

~~Instead of marking a read as duplicate in the flag, the~~ <code>--~~rmDups~~</code> ~~option will remove it from the output BAM file.~~

+

== DBSNP File (<code>--dbsnp</code>) ==

−

~~== Ignore Previous Duplicate Marking (<code>--force</code>) ==~~

+

The dbSNP file that specifies positions to skip when recalibrating. It is a tab-delimited file with the chromosome in the first column and the 1-based position in the second column.
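To make the file layout concrete, the sketch below reads such a file into a set of (chromosome, position) pairs. It assumes there are no header or comment lines, which the description above does not address, and the function name <code>read_skip_positions</code> and the sample data are hypothetical.

```python
import io


def read_skip_positions(handle):
    """Read tab-delimited lines of chromosome and 1-based position.

    Assumes no header or comment lines; blank lines are ignored.
    Returns a set of (chromosome, position) tuples.
    """
    skip = set()
    for line in handle:
        line = line.rstrip("\n")
        if not line.strip():
            continue
        fields = line.split("\t")
        # Column 1 is the chromosome name, column 2 the 1-based position.
        skip.add((fields[0], int(fields[1])))
    return skip


# Hypothetical sample content standing in for a real file.
example = io.StringIO("1\t1000\n1\t2050\nX\t77\n")
positions = read_skip_positions(example)
print(("1", 2050) in positions)
```

Keeping the chromosome as a string avoids losing names such as X or MT, and a set gives constant-time lookups when checking each position of a read.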

−

~~By default the deduper will throw an error and stop if a read is already marked as duplicate. The~~ <code>--~~force~~</code> ~~option will remove any previous duplicate marking and mark the reads from scratch. The resulting output file will only have reads determined by the deduper marked as duplicates.~~

+

== Blended Model Weight (<code>--blended</code>) ==

−

~~== Turn on Verbose Mode (<code>--verbose</code>) ==~~

+

TBD - this parameter is not yet implemented.

−

~~Turn on verbose logging to get more log messages in the log and to stderr.~~

+

== Minimum Recalibration Quality (<code>--minRecabQual</code>) ==

−

~~{{noeofBGZFParameter}}~~

+

When recalibrating reads, only positions with a base quality greater than this minimum will be recalibrated. If <code>--minRecabQual</code> is not specified, the default is TBD; this parameter is not yet implemented.

−

~~{{paramsParameter}}~~

= Return Value =

Mktrost

Administrators

3,045

edits
