GotCloud: Filters


GotCloud Filters

GotCloud uses multiple filters throughout the GotCloud Pipeline.

The primary filters are applied during the snpcall step of Variant Calling.

There are two phases of filters:

  • Hard Filters
  • SVM Filters

Hard Filters

See Hard Filtering Options for the configuration settings related to hard filters.

In GotCloud:

| Filter Prefix | Filter Description | Filter if | Default Value |
|---|---|---|---|
| q | phred-scaled quality score | QUAL < FILTER_MIN_QUAL > 0 | 5 |
| INDEL | | FILTER_WIN_INDEL > 0 | 5 |
| m | Root Mean Squared Mapping Quality | INFO:MQ < FILTER_MIN_MQ > 0 | 20 |
| dp | Total Depth at Site | INFO:DP < #samples * FILTER_MIN_SAMPLE_DP > 0 | #samples*1 |
| DP | Total Depth at Site | INFO:DP > #samples * FILTER_MAX_SAMPLE_DP < INT_MAX | #samples*1000 |
| ns | Number of Samples With Coverage | INFO:NS < (FILTER_MIN_NS or FILTER_MIN_NS_FRAC * #samples) > 0 | .5*#samples |
| AB | Allele Balance in Heterozygotes | INFO:AB > FILTER_MAX_ABL/100. < 1 | 70,65 |
| STR | Strand Bias Pearson's Correlation | INFO:STR > FILTER_MAX_STR/100. < 1 | 20,10 |
| str | Strand Bias Pearson's Correlation | INFO:STR < FILTER_MIN_STR/100. > -1 | -20,-10 |
| stz | Strand Bias z-score | INFO:STZ < FILTER_MIN_STZ > INT_MIN | -5,-10 |
| STZ | Strand Bias z-score | INFO:STZ > FILTER_MAX_STZ < INT_MAX | 5,10 |
| fic | | INFO:FIC < FILTER_MIN_FIC/100. > INT_MIN | -20,-10 |
| CBR | Cycle Bias Pearson's Correlation | INFO:CBR > FILTER_MAX_CBR/100. < 1 | 20,10 |
| LQR | | INFO:LQR > FILTER_MAX_LQR/100. < 1 | 30,20 |
| AOI | Alternate allele inflation score | INFO:AOI > FILTER_MAX_AOI < INT_MAX | 5 |
| MQ0_ | Fraction of bases with mapQ=0 | INFO:MQ0 > FILTER_MAX_MQ0/100. < 1 | 10 |
| IOR | Ratio of base-quality inflation | INFO:IOR > FILTER_MAX_IOR < INT_MAX | off |
| AOZ | Alternate allele quality z-score | INFO:AOZ > FILTER_MAX_AOZ < INT_MAX | off |
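To make the "Filter if" column concrete, here is a minimal Python sketch that evaluates a few of the rows above for a single site. It is not GotCloud's own code: the function name, the dictionary-based inputs, and the use of sys.maxsize for INT_MAX are all assumptions made for illustration. It also assumes that a condition such as "QUAL < FILTER_MIN_QUAL > 0" means "the setting is greater than 0 and QUAL is below it", which happens to be exactly how a chained comparison reads in Python.

```python
import sys

INT_MAX = sys.maxsize  # stand-in for the table's INT_MAX bound (an assumption)


def failed_hard_filters(qual, info, n_samples, cfg):
    """Return the prefixes of the hard filters that one site fails.

    qual      -- the site's QUAL value
    info      -- dict of INFO fields, e.g. {"MQ": 35, "DP": 50, "AB": 0.82}
    n_samples -- number of samples (#samples in the table)
    cfg       -- dict of filter settings; a missing key means the filter is off
    """
    failed = []

    # q: QUAL < FILTER_MIN_QUAL > 0
    min_qual = cfg.get("FILTER_MIN_QUAL")
    if min_qual is not None and qual < min_qual > 0:
        failed.append("q")

    # m: INFO:MQ < FILTER_MIN_MQ > 0
    min_mq = cfg.get("FILTER_MIN_MQ")
    if min_mq is not None and "MQ" in info and info["MQ"] < min_mq > 0:
        failed.append("m")

    # dp: INFO:DP < #samples * FILTER_MIN_SAMPLE_DP > 0
    min_dp = cfg.get("FILTER_MIN_SAMPLE_DP")
    if min_dp is not None and "DP" in info and info["DP"] < n_samples * min_dp > 0:
        failed.append("dp")

    # DP: INFO:DP > #samples * FILTER_MAX_SAMPLE_DP < INT_MAX
    max_dp = cfg.get("FILTER_MAX_SAMPLE_DP")
    if max_dp is not None and "DP" in info and info["DP"] > n_samples * max_dp < INT_MAX:
        failed.append("DP")

    # AB: INFO:AB > FILTER_MAX_ABL/100. < 1 (the setting is given in percent)
    max_ab = cfg.get("FILTER_MAX_ABL")
    if max_ab is not None and "AB" in info and info["AB"] > max_ab / 100.0 < 1:
        failed.append("AB")

    return failed


# Settings taken from the Default Value column (single-value form).
defaults = {
    "FILTER_MIN_QUAL": 5,
    "FILTER_MIN_MQ": 20,
    "FILTER_MIN_SAMPLE_DP": 1,
    "FILTER_MAX_SAMPLE_DP": 1000,
    "FILTER_MAX_ABL": 70,
}
print(failed_hard_filters(3, {"MQ": 35, "DP": 50, "AB": 0.82}, 10, defaults))
# → ['q', 'AB']
```

The remaining rows follow the same pattern: a lower-bound filter compares the INFO value with "<", an upper-bound filter with ">", and settings shown with "/100." are divided by 100 before the comparison.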

To remove a filter, set it to blank or "off" in your user configuration file.

The values of these filters must be numbers (or a comma/space-separated list of numbers).

The following rules apply to these filters:

  • Specifying 1 value in the filter will turn that filter on and use that value.
  • Specifying 2 values in the filter (separated by ',' and/or ' ') turns on the filter.
    • Use the 1st value if the number of samples is below FILTER_FORMULA_MIN_SAMPLES
    • Use the 2nd value if the number of samples is above FILTER_FORMULA_MAX_SAMPLES
    • If the number of samples is between the MIN and MAX, the value is interpolated on a log scale, where minVal is the 1st value and maxVal is the 2nd value:
      (minVal - maxVal) * (log(maxSamples) - log(numSamples)) / (log(maxSamples) - log(minSamples)) + maxVal

with:

FILTER_FORMULA_MIN_SAMPLES = 100

FILTER_FORMULA_MAX_SAMPLES = 1000
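
The value rules above can be sketched in Python as follows. This is an illustrative reading of the rules, not GotCloud's implementation: the function names are hypothetical, treating only the exact strings "" and "off" as "filter removed" is an assumption, and so is the handling of a sample count exactly equal to the MIN or MAX (the formula gives the same result there either way).

```python
import math
import re

FILTER_FORMULA_MIN_SAMPLES = 100
FILTER_FORMULA_MAX_SAMPLES = 1000


def parse_filter_value(raw):
    """Parse a filter setting into None (filter off) or a list of 1 or 2 numbers."""
    text = raw.strip()
    if text in ("", "off"):
        return None
    # Values may be separated by ',' and/or ' '.
    values = [float(tok) for tok in re.split(r"[,\s]+", text) if tok]
    if len(values) not in (1, 2):
        raise ValueError("expected 1 or 2 numbers, got: %r" % raw)
    return values


def resolve_threshold(values, num_samples,
                      min_samples=FILTER_FORMULA_MIN_SAMPLES,
                      max_samples=FILTER_FORMULA_MAX_SAMPLES):
    """Pick the threshold to use for a given number of samples."""
    if len(values) == 1:
        return values[0]
    min_val, max_val = values  # 1st value, 2nd value
    if num_samples <= min_samples:
        return min_val
    if num_samples >= max_samples:
        return max_val
    # Log-scale interpolation between the two values.
    frac = (math.log(max_samples) - math.log(num_samples)) / (
        math.log(max_samples) - math.log(min_samples))
    return (min_val - max_val) * frac + max_val


ab = parse_filter_value("70,65")      # default value of the AB filter
print(resolve_threshold(ab, 50))      # below the MIN: 1st value is used
print(resolve_threshold(ab, 300))     # between MIN and MAX: interpolated
print(resolve_threshold(ab, 5000))    # above the MAX: 2nd value is used
```

Because the interpolation is logarithmic, the threshold moves from the 1st value toward the 2nd faster at small sample counts than at large ones.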

To add additional filters, set FILTER_ADDITIONAL to one or more of the --min/--max options listed in the table below, each followed by its value. For example:

FILTER_ADDITIONAL = --maxSTP 5 --minLQZ 5
| Filter Prefix | Filter Description | Filter if | Default Value |
|---|---|---|---|
| FFRQ | TBD | winFFRQ, maxFFRQ both > 0 | |
| STP | | INFO:STP > maxSTP < INT_MAX | |
| TTT | | INFO:TTT > maxTTT < INT_MAX | |
| ttt | | INFO:TTT < minTTT > INT_MIN | |
| LQZ | | INFO:LQZ > maxLQZ < INT_MAX | |
| lqz | | INFO:LQZ < minLQZ > INT_MIN | |
| RBZ | | INFO:RBZ > maxRBZ < INT_MAX | |
| rbz | | INFO:RBZ < minRBZ > INT_MIN | |
| CBZ | Cycle Bias z-score | INFO:CBZ > maxCBZ < INT_MAX | |
| cbr | | INFO:CBR < minCBR/100. > -1 | |
| QBR | | INFO:QBR > maxQBR/100. < 1 | |
| qbr | | INFO:QBR < minQBR/100. > -1 | |
| CSR | Cycle-Strand Pearson's Correlation | INFO:CSR > maxCSR/100. < 1 | |
| csr | Cycle-Strand Pearson's Correlation | INFO:CSR < minCSR/100. > -1 | |
| IOZ | Base quality inflation z-score | INFO:IOZ > maxIOZ < INT_MAX | |
| ior | Ratio of base-quality inflation | INFO:IOR < minIOR/100. > INT_MIN/100. | |
| MQ10_ | Fraction of bases with mapQ<=10 | INFO:MQ10 > maxMQ10/100. < 1 | |
| MQ20_ | Fraction of bases with mapQ<=20 | INFO:MQ20 > maxMQ20/100. < 1 | |
| ABE | | INFO:ABE > maxABE/100. < 1 | |
| abe | | INFO:ABE < minABE/100. > -1 | |
| MBR | | INFO:MBR > maxMBR/100. < 1 | |
| mbr | | INFO:MBR < minMBR/100. > -1 | |
| ABZ | | INFO:ABZ > maxABZ < INT_MAX | |
| abz | | INFO:ABZ < minABZ > INT_MIN | |
| BCS | | INFO:BCS > maxBCS < INT_MAX | |
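
As a rough illustration of how a FILTER_ADDITIONAL value breaks down, the Python sketch below splits the setting into (direction, INFO field, value) triples. The function name is hypothetical, and the parsing rules are assumptions: that options come in "--minXXX value" / "--maxXXX value" pairs as in the example above, and that each value is a single number. It is meant for sanity-checking a configuration line, not as a description of how GotCloud reads it.

```python
import re


def parse_additional_filters(spec):
    """Split e.g. '--maxSTP 5 --minLQZ 5' into (direction, field, value) triples."""
    tokens = spec.split()
    if len(tokens) % 2 != 0:
        raise ValueError("each option needs exactly one value: %r" % spec)
    parsed = []
    for option, value in zip(tokens[0::2], tokens[1::2]):
        match = re.fullmatch(r"--(min|max)([A-Za-z0-9]+)", option)
        if match is None:
            raise ValueError("not a --min/--max option: %r" % option)
        direction, field = match.groups()
        parsed.append((direction, field, float(value)))
    return parsed


for direction, field, value in parse_additional_filters("--maxSTP 5 --minLQZ 5"):
    # 'max' options filter when the INFO value is above the setting,
    # 'min' options when it is below (see the "Filter if" column).
    op = ">" if direction == "max" else "<"
    print("filter if INFO:%s %s %g" % (field, op, value))
```

For the fields whose "Filter if" entry shows "/100.", the value would additionally be divided by 100 before the comparison.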