Changes

From Genome Analysis Wiki
1,341 bytes removed, 11:02, 15 June 2012
no edit summary
Line 18: Line 18:     
=== Covariates Notes ===
 
− Duplicates are determined by checking for matching keys. 
−
− The Key is comprised of:
− # Chromosome
− # Orientation (forward/reverse)
− # Unclipped Start(forward)/End(reverse)
− # Library
−
− Rules:
− * Skip Unmapped Reads, they are not marked as duplicate
− * Mark a Single-End Read Duplicate (or remove it if configured to do so) if:
− *# A paired-end record has the same key (even if the pair is not proper/the mate is unmapped/the mate is not found)<br/>-OR-
− *# A single-end record has the same key and a higher base quality sum (sum of all base qualities in the record)
− * Mark both Paired-End Reads Duplicate if:
− # Another paired-end pair has the same set of keys and has a higher base quality sum.
−
− This code assumes that at most 1000 bases are clipped at the start of a read.
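The duplicate-marking rules in the removed Covariates Notes lines can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the deduper's own code: the <code>Read</code> class, <code>dedup_key</code>, and <code>mark_duplicates</code> are hypothetical names, CIGAR strings are parsed with a regular expression, ties in base quality sum keep the first record seen, and pairs with an unmapped mate contribute their mapped read's key but are not compared as pairs. Because it works on an in-memory list, it does not need the 1000-base clipping limit, which a streaming pass over coordinate-sorted input presumably relies on to bound how far before POS an unclipped start can fall.

```python
import re
from dataclasses import dataclass, field

# Splits a CIGAR string into (length, op) pairs: "5S95M" -> [("5", "S"), ("95", "M")]
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
# CIGAR operations that consume reference bases.
REF_OPS = set("MDN=X")


@dataclass
class Read:
    chrom: str
    pos: int          # SAM POS: 1-based leftmost aligned (post-clipping) position
    cigar: str
    reverse: bool     # True if the read maps to the reverse strand
    library: str
    quals: list = field(default_factory=list)  # Phred base qualities
    unmapped: bool = False
    duplicate: bool = False


def _leading_clip(ops):
    """Total soft (S) and hard (H) clipping before the first non-clip operation."""
    total = 0
    for length, op in ops:
        if op not in "SH":
            break
        total += int(length)
    return total


def dedup_key(read):
    """Key = (chromosome, orientation, unclipped start (fwd) / end (rev), library)."""
    ops = CIGAR_RE.findall(read.cigar)
    if read.reverse:
        ref_len = sum(int(n) for n, op in ops if op in REF_OPS)
        # Trailing clips are the leading clips of the reversed op list.
        coord = read.pos + ref_len - 1 + _leading_clip(reversed(ops))
    else:
        coord = read.pos - _leading_clip(ops)
    return (read.chrom, read.reverse, coord, read.library)


def qual_sum(reads):
    """Sum of all base qualities over one or more records."""
    return sum(sum(r.quals) for r in reads)


def mark_duplicates(single_ends, pairs):
    """single_ends: list of Read; pairs: list of (Read, Read) mate tuples."""
    # Any mapped paired-end record claims its key, proper pair or not.
    paired_keys = {dedup_key(r) for pair in pairs for r in pair if not r.unmapped}

    best_single = {}
    for read in single_ends:
        if read.unmapped:
            continue  # unmapped reads are never marked
        key = dedup_key(read)
        if key in paired_keys:
            read.duplicate = True
            continue
        kept = best_single.get(key)
        if kept is None:
            best_single[key] = read
        elif qual_sum([read]) > qual_sum([kept]):
            kept.duplicate = True
            best_single[key] = read
        else:
            read.duplicate = True  # assumption: on a tie the first read seen is kept

    best_pair = {}
    for pair in pairs:
        if any(r.unmapped for r in pair):
            continue  # assumption: pairs with an unmapped mate are not compared as pairs
        key = tuple(sorted(dedup_key(r) for r in pair))
        kept = best_pair.get(key)
        if kept is None:
            best_pair[key] = pair
        elif qual_sum(pair) > qual_sum(kept):
            for r in kept:
                r.duplicate = True
            best_pair[key] = pair
        else:
            for r in pair:
                r.duplicate = True  # same tie assumption as above
```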
      
== How to use it ==
 
− When <code>dedup</code> is invoked without any arguments the usage information is displayed as described below under [[#Usage|Usage]].
+ When <code>recab</code> is invoked without any arguments the usage information is displayed as described below under [[#Usage|Usage]].
 
− The input SAM/BAM file is required, [[#input File (--in)|input File (--in)]], and must be sorted by coordinate.
− The output SAM/BAM file is also required, [[#output File (--out)|output File (--out)]].
+ The input SAM/BAM file ([[#input File (--in)|--in]]), the output SAM/BAM file ([[#output File (--out)|--out]]), and the reference file ([[#Reference File (--refFile)|--refFile]]) are required inputs.
    
= Usage =
 
Line 67: Line 48:  
{{inBAMInputFile}}
 
 
{{outBAMOutputFile}}
 
− == BAM File Is Sorted By Read Name (<code>--minRecabQual</code>) ==
−
− When recalibrating reads, only positions with a base quality greater than this minimum will be recalibrated.  If <code>--minQual</code> is not specified, it is defaulted to <span style="color:red">TBD</span>.
      
== Output log & Summary Statistics FileName (<code>--log</code>) ==
 
Line 78: Line 55:  
If this parameter is not specified, the log is written to the <code>--out</code> filename with ".log" appended, or to stderr if the output BAM is written to stdout (<code>--out</code> starts with '-'). If the filename given to <code>--log</code> starts with '-', the log is written to stderr.
 
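The log-file rules for <code>--log</code> amount to a small decision procedure. A minimal sketch, where <code>log_destination</code> is a hypothetical helper and the string "stderr" stands in for the standard error stream:

```python
def log_destination(out, log=None):
    """Resolve where the log goes: a file name, or "stderr".

    out -- value of --out; log -- value of --log, or None if not given.
    """
    if log is not None:
        # An explicit --log value starting with '-' means stderr.
        return "stderr" if log.startswith("-") else log
    if out.startswith("-"):
        # Output BAM goes to stdout, so the log falls back to stderr.
        return "stderr"
    # Default: the --out file name with ".log" appended.
    return out + ".log"


print(log_destination("sample.recab.bam"))   # sample.recab.bam.log
print(log_destination("-"))                  # stderr
print(log_destination("out.bam", "-"))       # stderr
```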
− == Treat Reads with Mates On Different Chromosomes As Single-Ended (<code>--oneChrom</code>) ==
+ == Turn on Verbose Mode (<code>--verbose</code>) ==
− If a read's mate is not found, it will not be used for duplicate marking.  If you are running on a single chromosome, all reads whose mates are on different chromosomes will not be used for duplicate marking.  The <code>--oneChrom</code> option will treat reads with mates on a different chromosome as single-ended.
+ Turn on verbose logging to get more log messages in the log and to stderr.
− == Recalibrate (<code>--recab</code>) ==
+ {{noeofBGZFParameter}}
+ {{paramsParameter}}
− This option will recalibrate the input file in addition to deduping.
+ == Reference File (<code>--refFile</code>) ==
− == Remove Duplicates (<code>--rmDups</code>) ==
+ The reference file to use for comparing read bases to the reference.
− Instead of marking a read as duplicate in the flag, the <code>--rmDups</code> option will remove it from the output BAM file.
+ == DBSNP File (<code>--dbsnp</code>) ==
− == Ignore Previous Duplicate Marking (<code>--force</code>) ==
+ The dbsnp file that specifies positions to skip when recalibrating: a tab-delimited file with the chromosome in the first column and the 1-based position in the second column.
− By default the deduper will throw an error and stop if a read is already marked as duplicate.  The <code>--force</code> option removes any previous duplicate marking and marks the reads from scratch, so the output file will only have the reads determined by the deduper marked as duplicates.
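A minimal sketch of reading the <code>--dbsnp</code> skip list, assuming blank lines and lines starting with '#' may be ignored (the format description mentions neither). <code>load_dbsnp</code> and <code>skip_position</code> are hypothetical names; the point shown is converting a 0-based position, as used by many BAM libraries, to the file's 1-based convention before the lookup.

```python
import io


def load_dbsnp(lines):
    """Read a tab-delimited dbsnp file into a set of (chromosome, 1-based position)."""
    skip = set()
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # assumption: blank lines and '#' comment lines are ignored
        fields = line.split("\t")
        skip.add((fields[0], int(fields[1])))  # any extra columns are ignored
    return skip


def skip_position(skip, chrom, pos0):
    """True if a 0-based reference position is listed (the file is 1-based)."""
    return (chrom, pos0 + 1) in skip


dbsnp = load_dbsnp(io.StringIO("chr1\t100\nchr1\t250\trs123\n"))
print(skip_position(dbsnp, "chr1", 99))   # True: 0-based 99 is 1-based 100
print(skip_position(dbsnp, "chr1", 100))  # False
```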
+ == Blended Model Weight (<code>--blended</code>) ==
− == Turn on Verbose Mode (<code>--verbose</code>) ==
+ <span style="color:red">TBD - this parameter is not yet implemented.</span>
− Turn on verbose logging to get more log messages in the log and to stderr.
+ == Minimum Recalibration Quality (<code>--minRecabQual</code>) ==
− {{noeofBGZFParameter}}
+ When recalibrating reads, only positions with a base quality greater than this minimum will be recalibrated.  If <code>--minRecabQual</code> is not specified, the default is <span style="color:red">TBD - this parameter is not yet implemented.</span>
{{paramsParameter}}
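The <code>--minRecabQual</code> filter can be sketched as below. This assumes qualities come from the SAM QUAL field (Phred+33 encoded, with "*" meaning absent) and, since the default is still TBD, takes the minimum as an explicit argument; <code>recalibrated_offsets</code> is a hypothetical name. The comparison is strictly greater than, as the parameter description states.

```python
def recalibrated_offsets(qual, min_recab_qual):
    """Read offsets whose base quality is strictly greater than the minimum.

    qual -- SAM QUAL string (Phred+33 encoded); "*" means qualities are absent.
    """
    if qual == "*":
        return []
    return [i for i, ch in enumerate(qual) if ord(ch) - 33 > min_recab_qual]


print(recalibrated_offsets("5?I", 20))  # [1, 2]  ('5' = Q20, '?' = Q30, 'I' = Q40)
```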
      
= Return Value =
 
