From Genome Analysis Wiki
Latest revision as of 10:53, 29 October 2014

GotCloud Filters

GotCloud uses multiple filters throughout the GotCloud Pipeline.

The primary filters are applied during the snpcall step of Variant Calling.

There are two phases of filters:

  • Hard Filters
  • SVM Filters

Hard Filters

See Hard Filtering Options for the configuration settings that control hard filters.

In GotCloud:

{|border="1" cellspacing="0" cellpadding="2"
! Filter Prefix !! Filter Description !! Filter if !! Default Value
|-
| q || Phred-scaled quality score || QUAL < FILTER_MIN_QUAL > 0 || 5
|-
| INDEL || || FILTER_WIN_INDEL > 0 || 5
|-
| m || Root Mean Squared Mapping Quality || INFO:MQ < FILTER_MIN_MQ > 0 || 20
|-
| dp || Total Depth at Site || INFO:DP < #samples * FILTER_MIN_SAMPLE_DP > 0 || #samples*1
|-
| DP || Total Depth at Site || INFO:DP > #samples * FILTER_MAX_SAMPLE_DP < INT_MAX || #samples*1000
|-
| ns || Number of Samples With Coverage || INFO:NS < (FILTER_MIN_NS or FILTER_MIN_NS_FRAC * #samples) > 0 || .5*#samples
|-
| AB || Allele Balance in Heterozygotes || INFO:AB > FILTER_MAX_ABL/100. < 1 || 70,65
|-
| STR || Strand Bias Pearson's Correlation || INFO:STR > FILTER_MAX_STR/100. < 1 || 20,10
|-
| str || Strand Bias Pearson's Correlation || INFO:STR < FILTER_MIN_STR/100. > -1 || -20,-10
|-
| stz || Strand Bias z-score || INFO:STZ < FILTER_MIN_STZ > INT_MIN || -5,-10
|-
| STZ || Strand Bias z-score || INFO:STZ > FILTER_MAX_STZ < INT_MAX || 5,10
|-
| fic || || INFO:FIC < FILTER_MIN_FIC/100. > INT_MIN || -20,-10
|-
| CBR || Cycle Bias Pearson's Correlation || INFO:CBR > FILTER_MAX_CBR/100. < 1 || 20,10
|-
| LQR || || INFO:LQR > FILTER_MAX_LQR/100. < 1 || 30,20
|-
| AOI || Alternate allele inflation score || INFO:AOI > FILTER_MAX_AOI < INT_MAX || 5
|-
| MQ0_ || Fraction of bases with mapQ=0 || INFO:MQ0 > FILTER_MAX_MQ0/100. < 1 || 10
|-
| IOR || Ratio of base-quality inflation || INFO:IOR > FILTER_MAX_IOR < INT_MAX || off
|-
| AOZ || Alternate allele quality z-score || INFO:AOZ > FILTER_MAX_AOZ < INT_MAX || off
|}
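
One possible reading of the "Filter if" column is sketched below. It assumes that each entry combines two conditions: a comparison of the INFO value against the threshold, and a condition on the threshold itself that marks the filter as enabled. This interpretation is not confirmed by the table. The function names and the `INT_MAX` stand-in are hypothetical, chosen for illustration only.

```python
# Assumption: INT_MAX is modelled as the largest 32-bit signed integer.
INT_MAX = 2**31 - 1


def fails_max_filter(value, threshold, disabled_at=INT_MAX):
    """Read 'INFO:X > THRESHOLD < LIMIT' as: the filter is enabled when
    THRESHOLD < LIMIT, and the site is filtered when INFO:X > THRESHOLD."""
    return threshold < disabled_at and value > threshold


def fails_min_filter(value, threshold, disabled_at):
    """Read 'INFO:X < THRESHOLD > LIMIT' as: the filter is enabled when
    THRESHOLD > LIMIT, and the site is filtered when INFO:X < THRESHOLD."""
    return threshold > disabled_at and value < threshold


# STZ row: INFO:STZ > FILTER_MAX_STZ < INT_MAX, with FILTER_MAX_STZ = 5
print(fails_max_filter(6.2, 5))
# m row: INFO:MQ < FILTER_MIN_MQ > 0, with FILTER_MIN_MQ = 20
print(fails_min_filter(15, 20, 0))
# STR row: the configured value 20 is divided by 100 before the comparison
print(fails_max_filter(0.35, 20 / 100.0, 1))
```

Under this reading, rows with a `/100.` in the condition take a configured percentage and compare it as a fraction.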

To remove a filter, set it to blank or "off" in your user configuration file.

The values of these filters must be numbers (or a comma/space separated list of numbers).

The following rules apply to these filters:

  • Specifying 1 value in the filter will turn that filter on and use that value.
  • Specifying 2 values in the filter (separated by ',' and/or ' ') turns on the filter.
    • Use the 1st value if the number of samples is below FILTER_FORMULA_MIN_SAMPLES
    • Use the 2nd value if the number of samples is above FILTER_FORMULA_MAX_SAMPLES
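    • If the number of samples is between the MIN and MAX, the value is interpolated on a log scale:
      (minVal - maxVal) * (log(maxSamples) - log(numSamples)) / (log(maxSamples) - log(minSamples)) + maxVal

Here minVal and maxVal correspond to the 1st and 2nd values (the formula returns minVal at minSamples and maxVal at maxSamples), with: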

FILTER_FORMULA_MIN_SAMPLES = 100

FILTER_FORMULA_MAX_SAMPLES = 1000
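
The two-value rule can be sketched in Python as follows. This is a minimal illustration, not GotCloud's own code. It assumes that minVal and maxVal in the formula are the 1st and 2nd configured values, and the function name `scaled_threshold` is hypothetical.

```python
import math

# Values quoted in the text above.
FILTER_FORMULA_MIN_SAMPLES = 100
FILTER_FORMULA_MAX_SAMPLES = 1000


def scaled_threshold(first_val, second_val, num_samples,
                     min_samples=FILTER_FORMULA_MIN_SAMPLES,
                     max_samples=FILTER_FORMULA_MAX_SAMPLES):
    """Pick the filter threshold for a two-value setting such as '20,10'."""
    # At or below the minimum sample count, use the 1st value.
    # (The formula gives the same result exactly at min_samples.)
    if num_samples <= min_samples:
        return first_val
    # At or above the maximum sample count, use the 2nd value.
    if num_samples >= max_samples:
        return second_val
    # In between, interpolate on a log scale, term by term as in the formula.
    frac = ((math.log(max_samples) - math.log(num_samples))
            / (math.log(max_samples) - math.log(min_samples)))
    return (first_val - second_val) * frac + second_val


# Example with the STR default '20,10' at several sample counts.
for n in (50, 100, 300, 1000, 5000):
    print(n, scaled_threshold(20, 10, n))
```

With the default `20,10`, the threshold stays at 20 up to 100 samples, falls toward 10 as the sample count grows, and stays at 10 from 1000 samples onward.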

To add additional filters, set FILTER_ADDITIONAL with the --min/max specified below and the appropriate value, like:

FILTER_ADDITIONAL = --maxSTP 5 --minLQZ 5

{|border="1" cellspacing="0" cellpadding="2"
! Filter Prefix !! Filter Description !! Filter if !! Default Value
|-
| FFRQ || || TBD winFFRQ, maxFFRQ both > 0 ||
|-
| STP || || INFO:STP > maxSTP < INT_MAX ||
|-
| TTT || || INFO:TTT > maxTTT < INT_MAX ||
|-
| ttt || || INFO:TTT < minTTT > INT_MIN ||
|-
| LQZ || || INFO:LQZ > maxLQZ < INT_MAX ||
|-
| lqz || || INFO:LQZ < minLQZ > INT_MIN ||
|-
| RBZ || || INFO:RBZ > maxRBZ < INT_MAX ||
|-
| rbz || || INFO:RBZ < minRBZ > INT_MIN ||
|-
| CBZ || Cycle Bias z-score || INFO:CBZ > maxCBZ < INT_MAX ||
|-
| cbr || || INFO:CBR < minCBR/100. > -1 ||
|-
| QBR || || INFO:QBR > maxQBR/100. < 1 ||
|-
| qbr || || INFO:QBR < minQBR/100. > -1 ||
|-
| CSR || Cycle-Strand Pearson's Correlation || INFO:CSR > maxCSR/100. < 1 ||
|-
| csr || Cycle-Strand Pearson's Correlation || INFO:CSR < minCSR/100. > -1 ||
|-
| IOZ || Base quality inflation z-score || INFO:IOZ > maxIOZ < INT_MAX ||
|-
| ior || Ratio of base-quality inflation || INFO:IOR < minIOR/100. > INT_MIN/100. ||
|-
| MQ10_ || Fraction of bases with mapQ<=10 || INFO:MQ10 > maxMQ10/100. < 1 ||
|-
| MQ20_ || Fraction of bases with mapQ<=20 || INFO:MQ20 > maxMQ20/100. < 1 ||
|-
| ABE || || INFO:ABE > maxABE/100. < 1 ||
|-
| abe || || INFO:ABE < minABE/100. > -1 ||
|-
| MBR || || INFO:MBR > maxMBR/100. < 1 ||
|-
| mbr || || INFO:MBR < minMBR/100. > -1 ||
|-
| ABZ || || INFO:ABZ > maxABZ < INT_MAX ||
|-
| abz || || INFO:ABZ < minABZ > INT_MIN ||
|-
| BCS || || INFO:BCS > maxBCS < INT_MAX ||
|}
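
As a small convenience, the FILTER_ADDITIONAL line can be assembled from the option names in the table above. The helper below is a hypothetical sketch, not part of GotCloud. It assumes that option names begin with `min`, `max`, or `win` (as in `minLQZ`, `maxSTP`, `winFFRQ`) and that each option is written as `--name value`, matching the example given earlier.

```python
def filter_additional_line(options):
    """Build a 'FILTER_ADDITIONAL = ...' configuration line.

    options: dict mapping option names (e.g. 'maxSTP') to numeric values.
    """
    parts = []
    for name, value in options.items():
        # Assumption: only min*/max*/win* names from the table are valid.
        if not name.startswith(("min", "max", "win")):
            raise ValueError("unexpected option name: " + name)
        parts.append("--{} {}".format(name, value))
    return "FILTER_ADDITIONAL = " + " ".join(parts)


# Reproduces the example from the text.
print(filter_additional_line({"maxSTP": 5, "minLQZ": 5}))
```

For options whose condition contains `/100.`, such as `maxMQ10`, the table indicates that the configured number is divided by 100 before the comparison, so the value would be given as a percentage.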