Changes

From Genome Analysis Wiki
2,549 bytes added, 17:42, 18 September 2012
no edit summary
Line 37: Line 37:
** base quality > [[#Minimum Recalibration Base Quality (--minBaseQual)|minBaseQual (5 by default)]]
* Additional criteria for cycle != 1 (can be turned off via flags)
− ** previous base is a CIGAR Match/Mismatch
+ ** previous base is a CIGAR Match/Mismatch (Use [[#Allow Previous Base Non-Match/Mismatch (--keepPrevNonAdjacent)|<code>--keepPrevNonAdjacent</code>]] to disable)
− ** previous base position is not a [[#DBSNP File (--dbsnp)|dbSNP position]]
+ ** previous base position is not a [[#DBSNP File (--dbsnp)|dbSNP position]] (Use [[#Allow Previous Base DBSNP (--keepPrevDbsnp)|<code>--keepPrevDbsnp</code>]] to disable)
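The inclusion criteria in this hunk can be sketched as a single predicate. This is a minimal illustration, not bam recab's code: the `BaseInfo` record, its field names and `counts_toward_table` are hypothetical, "CIGAR Match/Mismatch" is assumed to mean the SAM operations `M`, `=` and `X`, and only the criteria visible in this diff excerpt are modeled (criteria listed earlier on the page are not shown here).

```python
from dataclasses import dataclass
from typing import Optional, Set, Tuple

# SAM CIGAR operations that align a read base to a reference base.
MATCH_MISMATCH_OPS = {"M", "=", "X"}


@dataclass
class BaseInfo:
    """Hypothetical per-base record; field names are illustrative only."""
    cycle: int                                      # 1-based position of the base within the read
    quality: int                                    # phred quality of this base
    prev_cigar_op: Optional[str] = None             # CIGAR op of the previous read base
    prev_ref_pos: Optional[Tuple[str, int]] = None  # (chrom, 1-based pos) of the previous base


def counts_toward_table(base: BaseInfo, dbsnp: Set[Tuple[str, int]],
                        min_base_qual: int = 5,
                        keep_prev_non_adjacent: bool = False,
                        keep_prev_dbsnp: bool = False) -> bool:
    """Apply the inclusion criteria shown in this hunk to one base."""
    if base.quality <= min_base_qual:  # base quality must exceed minBaseQual
        return False
    if base.cycle != 1:                # the extra checks only apply after the first cycle
        if not keep_prev_non_adjacent and base.prev_cigar_op not in MATCH_MISMATCH_OPS:
            return False
        if not keep_prev_dbsnp and base.prev_ref_pos in dbsnp:
            return False
    return True
```

The two keyword flags play the roles of `--keepPrevNonAdjacent` and `--keepPrevDbsnp`: each one simply skips its corresponding check.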
Line 48: Line 48:
Recalibrated Quality is: <math>-10 \log_{10} \frac{mismatches + 1}{mismatches + matches + 1}</math>
+ Alternatively, [[#Logistic Regression (--useLogReg)|logistic regression]] can be used for calculating the new quality.
If the Recalibrated Quality is greater than [[#Maximum Recalibration Base Quality (--maxBaseQual)|maxBaseQual]], the updated quality is set to maxBaseQual.
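As a worked example of the default formula and the maxBaseQual cap, the sketch below computes the recalibrated quality for one table entry. It is illustrative only: the function name is hypothetical, and whether bam recab rounds or truncates the result to an integer is not stated here, so the sketch returns a float.

```python
import math


def recalibrated_quality(mismatches: int, matches: int, max_base_qual: float = 50) -> float:
    """Default formula: -10 * log10((mismatches + 1) / (mismatches + matches + 1)), capped."""
    quality = -10.0 * math.log10((mismatches + 1) / (mismatches + matches + 1))
    return min(quality, max_base_qual)  # qualities above maxBaseQual are set to maxBaseQual
```

For example, 9 mismatches and 990 matches give -10 log10(10 / 1000) = 20, while an entry with no mismatches among a billion matches would score about 90 and is capped at the default maxBaseQual of 50. The +1 terms keep the logarithm finite when an entry has no mismatches.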
Line 65: Line 66:
= Usage =
−  ./bam recab (options) --in <InputBamFile> --out <OutputFile> [--log <logFile>] [--verbose] [--noeof] [--params] --refFile <ReferenceFile> [--dbsnp <dbsnpFile>] [--minBaseQual <minBaseQual>] [--maxBaseQual <maxBaseQual>] [--blended <weight>] [--skipFitModel] [--fast] [--keepPrevDbsnp] [--keepPrevNonAdjacent] [--useLogReg]
+  ./bam recab (options) --in <InputBamFile> --out <OutputFile> [--log <logFile>] [--verbose] [--noeof] [--params] --refFile <ReferenceFile> [--dbsnp <dbsnpFile>] [--minBaseQual <minBaseQual>] [--maxBaseQual <maxBaseQual>] [--blended <weight>] [--fitModel] [--fast] [--keepPrevDbsnp] [--keepPrevNonAdjacent] [--useLogReg]
= Parameters =
Line 85: Line 86:
--maxBaseQual <maxBaseQual>  : maximum recalibrated base quality (default: 50)
--blended <weight>            : blended model weight
− --skipFitModel                : do not check if the logistic regression model fits the data
+ --fitModel                    : check if the logistic regression model fits the data
+                                 overridden by fast, but automatically applied by useLogReg
--fast                        : use a compact representation that only allows:
                                  * at most 256 Read Groups
                                  * maximum quality 63
                                  * at most 127 cycles
−                                 automatically enables skipFitModel, but is overridden by useLogReg
+                                 overrides fitModel, but is overridden by useLogReg
                                uses up to about 2.25G more memory than running without --fast.
--keepPrevDbsnp              : do not exclude entries where the previous base is in dbsnp when
Line 99: Line 101:
                                By default they are excluded from the table (except the first cycle).
--useLogReg                  : use logistic regression calculated quality for the new quality
−                                 ignores setting of skipFitModel and fast.
+                                 automatically applies fitModel and overrides fast.
--qualField <quality tag>    : tag to get the starting base quality
                                (default is to get it from the Quality field)
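For `--qualField`, the following is a minimal sketch of reading the starting qualities either from the QUAL column or from a named tag of a SAM text record. Assumptions, not bam recab behavior: the tag holds a Phred+33 string of type `Z` (as the standard `OQ` original-quality tag does), and the function name `base_qualities` and the choice to raise `KeyError` when the tag is missing are hypothetical.

```python
from typing import List, Optional


def base_qualities(sam_line: str, qual_field: Optional[str] = None) -> List[int]:
    """Return phred qualities from a SAM text record.

    qual_field=None reads the QUAL column (11th field); otherwise the optional
    tag named qual_field (e.g. "OQ") is read, assumed to be of type Z.
    """
    fields = sam_line.rstrip("\n").split("\t")
    if qual_field is None:
        qual = fields[10]
    else:
        prefix = qual_field + ":Z:"
        for tag in fields[11:]:  # optional TAG:TYPE:VALUE fields follow the 11 mandatory ones
            if tag.startswith(prefix):
                qual = tag[len(prefix):]
                break
        else:
            raise KeyError(f"tag {qual_field} not found")  # hypothetical error handling
    if qual == "*":  # '*' means qualities are absent
        return []
    return [ord(c) - 33 for c in qual]  # Phred+33 decoding
```

With `OQ:Z:5555` on a record whose QUAL is `IIII`, the default call returns `[40, 40, 40, 40]` and `base_qualities(line, "OQ")` returns `[20, 20, 20, 20]`.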
Line 123: Line 125:
== Reference File (<code>--refFile</code>) ==
− The reference file to use for comparing read bases to the reference.
+ The reference file is a required parameter used for comparing read bases to the reference.
== DBSNP File (<code>--dbsnp</code>) ==
The dbSNP file specifies positions to skip when recalibrating.  It is a tab-delimited file with the chromosome in the first column and the 1-based position in the second column.
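A minimal sketch of loading the dbSNP file format described above into a lookup set. Assumptions: blank lines and lines starting with `#` are skipped (the page does not say whether comments are allowed), columns after the second are ignored, and the names `load_dbsnp` and `is_dbsnp` are hypothetical. Because BAM records store 0-based positions while this file is 1-based, the lookup adds 1.

```python
from typing import Iterable, Set, Tuple


def load_dbsnp(lines: Iterable[str]) -> Set[Tuple[str, int]]:
    """Collect (chromosome, 1-based position) pairs from tab-delimited lines."""
    positions: Set[Tuple[str, int]] = set()
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):  # assumption: skip blanks and '#' comments
            continue
        cols = line.split("\t")
        positions.add((cols[0], int(cols[1])))  # columns beyond the 2nd are ignored
    return positions


def is_dbsnp(positions: Set[Tuple[str, int]], chrom: str, pos0: int) -> bool:
    """Look up a 0-based position (as stored in BAM records) in the 1-based set."""
    return (chrom, pos0 + 1) in positions
```

An open text file can be passed directly, e.g. `with open("dbsnp.txt") as f: snps = load_dbsnp(f)` (the filename is hypothetical).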
− == Blended Model Weight (<code>--blended</code>) ==
− <span style="color:red">TBD - this parameter is not yet implemented.</span>
== Minimum Recalibration Base Quality (<code>--minBaseQual</code>) ==
Line 142: Line 140:
This value sets the maximum phred base quality assigned to a base after recalibrating.  Any quality above this value is set to this value.  The default is 50.
+ == Blended Model Weight (<code>--blended</code>) ==
+ <span style="color:red">TBD - this parameter is not yet implemented.</span>
+ == Fit Model (<code>--fitModel</code>) ==
+ Check if the logistic regression model fits the data.
+ This option does NOT set the new qualities to the logistic regression calculated qualities; it only checks the fit.  To apply the logistic regression qualities, see [[#Logistic Regression (--useLogReg)|<code>--useLogReg</code>]].  <code>--fitModel</code> is automatically applied when <code>--useLogReg</code> is specified.
+ This option cannot be used in conjunction with [[#Fast Recalibration (--fast)|<code>--fast</code>]] and is overridden by <code>--fast</code>, but is automatically applied by <code>--useLogReg</code>.
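The precedence rules among `--fitModel`, `--fast` and `--useLogReg` stated in this revision can be written as one pure function. This is a sketch of the documented rules only; `RecabMode` and `resolve_flags` are hypothetical names, and whether bam recab warns about conflicting flags is not described here.

```python
from typing import NamedTuple


class RecabMode(NamedTuple):
    fit_model: bool
    fast: bool
    use_log_reg: bool


def resolve_flags(fit_model: bool = False, fast: bool = False,
                  use_log_reg: bool = False) -> RecabMode:
    """Return the effective settings after applying the documented precedence."""
    if use_log_reg:
        # --useLogReg automatically applies --fitModel and overrides --fast.
        return RecabMode(fit_model=True, fast=False, use_log_reg=True)
    if fast:
        # --fast overrides --fitModel.
        return RecabMode(fit_model=False, fast=True, use_log_reg=False)
    return RecabMode(fit_model=fit_model, fast=False, use_log_reg=False)
```

Checking `--useLogReg` first encodes that it wins over both other flags; `--fast` is then checked before `--fitModel` because it overrides it.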
+ == Fast Recalibration (<code>--fast</code>) ==
+ Use a compact representation of the Recalibration Table that only allows:
+ * at most 256 Read Groups
+ * maximum quality 63
+ * at most 127 cycles
+ This option runs faster than the default recalibration, but uses up to about 2.25G more memory than running without <code>--fast</code>.
+ This option cannot be used in conjunction with [[#Fit Model (--fitModel)|<code>--fitModel</code>]] or [[#Logistic Regression (--useLogReg)|<code>--useLogReg</code>]]: it overrides [[#Fit Model (--fitModel)|<code>--fitModel</code>]], but is overridden by [[#Logistic Regression (--useLogReg)|<code>--useLogReg</code>]].
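One way the `--fast` limits could translate into fixed bit widths: 256 read groups fit in 8 bits, qualities 0 to 63 in 6 bits, and cycles 1 to 127 in 7 bits. The packed layout below is a hypothetical illustration of why such limits make a compact table possible; the page does not describe bam recab's actual table layout, which may include other covariates.

```python
RG_BITS, QUAL_BITS, CYCLE_BITS = 8, 6, 7  # 256 read groups, quality 0-63, cycles 1-127


def fast_key(read_group: int, quality: int, cycle: int) -> int:
    """Pack (read-group index, quality, cycle) into one integer (hypothetical layout)."""
    if not 0 <= read_group < (1 << RG_BITS):
        raise ValueError("--fast allows at most 256 read groups")
    if not 0 <= quality < (1 << QUAL_BITS):
        raise ValueError("--fast allows a maximum quality of 63")
    if not 1 <= cycle < (1 << CYCLE_BITS):
        raise ValueError("--fast allows at most 127 cycles")
    return (read_group << (QUAL_BITS + CYCLE_BITS)) | (quality << CYCLE_BITS) | cycle


def unpack_fast_key(key: int):
    """Invert fast_key, returning (read_group, quality, cycle)."""
    cycle = key & ((1 << CYCLE_BITS) - 1)
    quality = (key >> CYCLE_BITS) & ((1 << QUAL_BITS) - 1)
    read_group = key >> (QUAL_BITS + CYCLE_BITS)
    return read_group, quality, cycle
```

With this layout every key is below 2**21, so a flat, preallocated array could replace a hash map; that trade of memory for speed is the general idea behind a compact representation, though the exact figures quoted above are bam recab's own.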
+ == Allow Previous Base DBSNP (<code>--keepPrevDbsnp</code>) ==
+ By default, bases where the previous base is in dbSNP are excluded from the Recalibration Table.
+ This option includes these bases when building the Recalibration Table.
+ == Allow Previous Base Non-Match/Mismatch (<code>--keepPrevNonAdjacent</code>) ==
+ By default, bases where the previous base is not a CIGAR Match/Mismatch are excluded from the Recalibration Table.
+ This option includes these bases when building the Recalibration Table.
    
== Logistic Regression (<code>--useLogReg</code>) ==
− Use the logistic regression empirical qualities for setting the new base qualities instead of the default formula -10 * log10((#mismatches+1)/(#total+1))
+ Use the logistic regression empirical qualities for setting the new base qualities instead of the default formula.
+ This option automatically enables [[#Fit Model (--fitModel)|<code>--fitModel</code>]] and disables [[#Fast Recalibration (--fast)|<code>--fast</code>]].
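A minimal sketch of the logistic-regression alternative: fit logit(P(mismatch)) = b0 + b1 * x on grouped match/mismatch counts, then convert the fitted probability to a phred quality. Assumptions: the single covariate `x` (for example, the reported quality bin) is hypothetical, since this page does not list the model's covariates or describe how the fit check works; the fitter is plain Newton-Raphson with step-halving, not bam recab's implementation; and the result is not capped at maxBaseQual here.

```python
import math


def _softplus(eta: float) -> float:
    # Numerically stable log(1 + exp(eta)).
    return max(eta, 0.0) + math.log1p(math.exp(-abs(eta)))


def _loglik(b0: float, b1: float, groups) -> float:
    # Binomial log-likelihood (up to a constant) of the grouped counts.
    return sum(k * (b0 + b1 * x) - n * _softplus(b0 + b1 * x) for x, k, n in groups)


def fit_logistic(groups, iterations: int = 50, tol: float = 1e-10):
    """Fit logit(P(mismatch)) = b0 + b1 * x by Newton-Raphson on grouped counts.

    groups: (x, mismatches, total) tuples; needs at least two distinct x values
    and an overall mismatch rate strictly between 0 and 1.
    """
    k_all = sum(k for _, k, _ in groups)
    n_all = sum(n for _, _, n in groups)
    b0, b1 = math.log(k_all / (n_all - k_all)), 0.0  # start at the overall rate
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, k, n in groups:
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            r, w = k - n * p, n * p * (1.0 - p)  # score term, information weight
            g0, g1 = g0 + r, g1 + r * x
            h00, h01, h11 = h00 + w, h01 + w * x, h11 + w * x * x
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det  # Newton step: inverse 2x2 information times score
        d1 = (h00 * g1 - h01 * g0) / det
        step, old = 1.0, _loglik(b0, b1, groups)
        while _loglik(b0 + step * d0, b1 + step * d1, groups) < old and step > 1e-8:
            step /= 2.0  # step-halving keeps the log-likelihood from decreasing
        b0, b1 = b0 + step * d0, b1 + step * d1
        if abs(step * d0) + abs(step * d1) < tol:
            break
    return b0, b1


def logistic_quality(b0: float, b1: float, x: float) -> float:
    """Phred quality -10 * log10(p) from the fitted mismatch probability."""
    p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
    return -10.0 * math.log10(p)
```

With only two covariate values and two parameters the model is saturated, so the fit reproduces the empirical mismatch rates exactly. For example, 10 of 1000 mismatches at x = 20 and 1 of 1000 at x = 30 give fitted qualities of 20 and 30. With more bins, the fitted line smooths across them, which is the practical difference from the per-entry formula.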
    
== Read the quality from a tag (<code>--qualField</code>) ==
