
From Genome Analysis Wiki
4,470 bytes added ,  10:34, 15 June 2012
[[Category:BamUtil|recab]]
[[Category:BAM Software]]
[[Category:Software]]

='''COMING SOON, June, 2012'''=

= Overview of the <code>recab</code> function of <code>[[bamUtil]]</code> =
The <code>recab</code> option of [[bamUtil]] recalibrates a SAM/BAM file.

==Handling Recalibration==

Reads Not Recalibrated:
* Duplicates
* Unmapped
* Mapping Quality = 0
* Mapping Quality = 255
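
The skip list above can be sketched as a small predicate. This is only an illustration, not the bamUtil implementation: the function name <code>is_skipped</code> is hypothetical, and the flag bits are the standard SAM values for "unmapped" (0x4) and "duplicate" (0x400).

```python
FLAG_UNMAPPED = 0x4     # SAM flag bit: read is unmapped
FLAG_DUPLICATE = 0x400  # SAM flag bit: read is marked as a duplicate


def is_skipped(flag: int, mapq: int) -> bool:
    """Return True if a read with this flag and mapping quality is not recalibrated.

    Hypothetical helper mirroring the four skip conditions listed above.
    """
    if flag & FLAG_DUPLICATE:
        return True
    if flag & FLAG_UNMAPPED:
        return True
    # Mapping quality 0 and 255 are both excluded.
    return mapq in (0, 255)
```

For example, <code>is_skipped(0x400, 30)</code> is true because the duplicate bit is set, while <code>is_skipped(0x10, 60)</code> is false.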


=== Covariates Notes ===
Duplicates are determined by checking for matching keys.

The key consists of:
# Chromosome
# Orientation (forward/reverse)
# Unclipped Start(forward)/End(reverse)
# Library

Rules:
* Skip Unmapped Reads; they are not marked as duplicates
* Mark a Single-End Read Duplicate (or remove it if configured to do so) if:
*# A paired-end record has the same key (even if the pair is not proper/the mate is unmapped/the mate is not found)<br/>-OR-
*# A single-end record has the same key and a higher base quality sum (sum of all base qualities in the record)
* Mark both Paired-End Reads Duplicate if:
*# Another paired-end pair has the same set of keys and has a higher base quality sum.

This code assumes that at most 1000 bases are clipped at the start of a read.
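
A minimal sketch of how such a key and the base quality sum could be computed is shown below. It is not the bamUtil code. It assumes a 1-based leftmost position, a textual CIGAR string, Phred+33 quality strings, and that both soft and hard clips count as clipping; the names <code>DupKey</code>, <code>make_key</code>, and <code>base_quality_sum</code> are hypothetical.

```python
import re
from typing import NamedTuple

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


class DupKey(NamedTuple):
    chrom: str
    reverse: bool   # orientation: False = forward, True = reverse
    coord: int      # unclipped start (forward) or unclipped end (reverse)
    library: str


def _clipped(ops):
    """Count clipped bases (S or H) at the beginning of an operation list."""
    total = 0
    for length, op in ops:
        if op not in "SH":
            break
        total += length
    return total


def make_key(chrom: str, pos: int, cigar: str, reverse: bool, library: str) -> DupKey:
    """Build the four-part duplicate key described above (illustrative only)."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    if not reverse:
        # Forward read: move the start back over any leading clips.
        return DupKey(chrom, False, pos - _clipped(ops), library)
    # Reverse read: aligned end plus any trailing clips.
    ref_len = sum(n for n, op in ops if op in "MDN=X")
    end = pos + ref_len - 1
    return DupKey(chrom, True, end + _clipped(list(reversed(ops))), library)


def base_quality_sum(qual: str, offset: int = 33) -> int:
    """Sum of all base qualities in the record (Phred+33 assumed)."""
    return sum(ord(c) - offset for c in qual)
```

Under these assumptions, a forward read at position 100 with CIGAR <code>5S45M</code> and a forward read at position 95 with CIGAR <code>50M</code> from the same library produce the same key, so the one with the lower base quality sum would be the duplicate.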

== How to use it ==

When <code>recab</code> is invoked without any arguments, the usage information is displayed as described below under [[#Usage|Usage]].

The input SAM/BAM file is required, [[#input File (--in)|input File (--in)]], and must be sorted by coordinate.

The output SAM/BAM file is also required, [[#output File (--out)|output File (--out)]].

= Usage =
./bam recab --in <InputBamFile> --out <OutputFile> [--log <logFile>] [--verbose] [--noeof] [--params] --refFile <ReferenceFile> [--dbsnp <dbsnpFile>] [--blended <weight>]
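
As a sketch of how the usage line maps onto an actual invocation, the helper below assembles the argument list from the flags shown above. The function name <code>build_recab_command</code> and the file names in the example are hypothetical; only the flags themselves come from the usage line. The list it returns could be passed to <code>subprocess.run</code>.

```python
from typing import List, Optional


def build_recab_command(in_bam: str, out_bam: str, ref_file: str,
                        log: Optional[str] = None,
                        dbsnp: Optional[str] = None,
                        blended: Optional[float] = None,
                        verbose: bool = False,
                        noeof: bool = False,
                        params: bool = False,
                        bam_exe: str = "./bam") -> List[str]:
    """Assemble a 'bam recab' command line (illustrative helper, not part of bamUtil)."""
    # Required parameters: input, output, and reference file.
    cmd = [bam_exe, "recab", "--in", in_bam, "--out", out_bam, "--refFile", ref_file]
    # Optional parameters that take a value.
    if log is not None:
        cmd += ["--log", log]
    if dbsnp is not None:
        cmd += ["--dbsnp", dbsnp]
    if blended is not None:
        cmd += ["--blended", str(blended)]
    # Optional switches.
    if verbose:
        cmd.append("--verbose")
    if noeof:
        cmd.append("--noeof")
    if params:
        cmd.append("--params")
    return cmd


# Example with placeholder file names:
example = build_recab_command("sample.bam", "sample.recab.bam", "ref.fa", verbose=True)
```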

= Parameters =
<pre>
Required General Parameters :
--in <infile> : input BAM file name
--out <outfile> : output recalibration file name
Optional General Parameters :
--log <logfile> : log and summary statistics (default: [outfile].log)
--verbose : Turn on verbose mode
--noeof : do not expect an EOF block on a bam file.
--params : print the parameter settings

Recab Specific Required Parameters
--refFile <reference file> : reference file name
Recab Specific Optional Parameters :
--dbsnp <known variance file> : dbsnp file of positions
--blended <weight> : blended model weight
</pre>

{{inBAMInputFile}}
{{outBAMOutputFile}}

== Minimum Base Quality to Recalibrate (<code>--minRecabQual</code>) ==

When recalibrating reads, only positions with a base quality greater than this minimum are recalibrated. If <code>--minRecabQual</code> is not specified, the default is <span style="color:red">TBD</span>.

== Output log & Summary Statistics FileName (<code>--log</code>) ==

Output file name for writing logs & summary statistics.

If this parameter is not specified, logs are written to the file name given in <code>--out</code> with ".log" appended. If the output BAM is written to stdout (<code>--out</code> starts with '-'), the logs are written to stderr instead. If the file name given to <code>--log</code> starts with '-', the logs are also written to stderr.
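
The rules for choosing the log destination can be sketched as follows. The function <code>resolve_log_target</code> is a hypothetical helper written for illustration; it returns the string <code>"stderr"</code> or a file name and does not reflect the actual bamUtil code.

```python
from typing import Optional


def resolve_log_target(out: str, log: Optional[str] = None) -> str:
    """Return 'stderr' or the log file name, following the rules described above."""
    if log is not None:
        # An explicit --log value starting with '-' means stderr.
        return "stderr" if log.startswith("-") else log
    if out.startswith("-"):
        # Output goes to stdout, so logs go to stderr.
        return "stderr"
    # Default: output file name plus ".log".
    return out + ".log"
```

For instance, <code>resolve_log_target("out.bam")</code> gives <code>out.bam.log</code>, and <code>resolve_log_target("-")</code> gives <code>stderr</code>.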

== Treat Reads with Mates On Different Chromosomes As Single-Ended (<code>--oneChrom</code>) ==

If a read's mate is not found, it will not be used for duplicate marking. If you are running on a single chromosome, all reads whose mates are on different chromosomes will not be used for duplicate marking. The <code>--oneChrom</code> option will treat reads with mates on a different chromosome as single-ended.

== Recalibrate (<code>--recab</code>) ==

This option will recalibrate the input file in addition to deduping.

== Remove Duplicates (<code>--rmDups</code>) ==

Instead of marking a read as duplicate in the flag, the <code>--rmDups</code> option will remove it from the output BAM file.

== Ignore Previous Duplicate Marking (<code>--force</code>) ==

By default, the deduper will throw an error and stop if a read is already marked as duplicate. The <code>--force</code> option removes any previous duplicate marking and marks the reads from scratch. The resulting output file will only have reads determined by the deduper marked as duplicates.

== Turn on Verbose Mode (<code>--verbose</code>) ==

Turn on verbose logging to get more log messages in the log and to stderr.

{{noeofBGZFParameter}}
{{paramsParameter}}

= Return Value =

Returns -1 if input parameters are invalid.

Returns the SamStatus for the reads/writes (0 on success).
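
A calling script might interpret the exit status along these lines. This is a sketch under one assumption not stated above: on POSIX systems a process return value of -1 is reported to the caller as 255, so both values are treated as "invalid parameters". The function name <code>interpret_exit_status</code> is hypothetical.

```python
def interpret_exit_status(status: int) -> str:
    """Map a 'bam recab' exit status to a short description (illustrative only)."""
    if status == 0:
        return "success"
    # A return value of -1 is typically seen by the caller as 255 on POSIX.
    if status in (-1, 255):
        return "invalid input parameters"
    # Any other value is assumed to be a non-zero SamStatus from reading or writing.
    return f"SamStatus error code {status}"
```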
