GotCloud: Filters

GotCloud Filters

GotCloud uses multiple filters throughout the GotCloud Pipeline.

The primary filters are applied during the snpcall step of Variant Calling.

There are two phases of filters:

  • Hard Filters
  • SVM Filters

Hard Filters

See Hard Filtering Options for the configuration settings related to hard filters.

In GotCloud:

Filter Prefix | Filter Description | Filter if | Default Value
q | Phred-scaled quality score | QUAL < FILTER_MIN_QUAL > 0 | 5
INDEL | | FILTER_WIN_INDEL > 0 | 5
m | Root Mean Squared Mapping Quality | INFO:MQ < FILTER_MIN_MQ > 0 | 20
dp | Total Depth at Site | INFO:DP < #samples * FILTER_MIN_SAMPLE_DP > 0 | #samples*1
DP | Total Depth at Site | INFO:DP > #samples * FILTER_MAX_SAMPLE_DP < INT_MAX | #samples*1000
ns | Number of Samples With Coverage | INFO:NS < (FILTER_MIN_NS or FILTER_MIN_NS_FRAC * #samples) > 0 | .5*#samples
AB | Allele Balance in Heterozygotes | INFO:AB > FILTER_MAX_ABL/100. < 1 | 70,65
STR | Strand Bias Pearson's Correlation | INFO:STR > FILTER_MAX_STR/100. < 1 | 20,10
str | Strand Bias Pearson's Correlation | INFO:STR < FILTER_MIN_STR/100. > -1 | -20,-10
stz | Strand Bias z-score | INFO:STZ < FILTER_MIN_STZ > INT_MIN | -5,-10
STZ | Strand Bias z-score | INFO:STZ > FILTER_MAX_STZ < INT_MAX | 5,10
fic | | INFO:FIC < FILTER_MIN_FIC/100. > INT_MIN | -20,-10
CBR | Cycle Bias Pearson's Correlation | INFO:CBR > FILTER_MAX_CBR/100. < 1 | 20,10
LQR | | INFO:LQR > FILTER_MAX_LQR/100. < 1 | 30,20
AOI | Alternate allele inflation score | INFO:AOI > FILTER_MAX_AOI < INT_MAX | 5
MQ0_ | Fraction of bases with mapQ=0 | INFO:MQ0 > FILTER_MAX_MQ0/100. < 1 | 10
IOR | Ratio of base-quality inflation | INFO:IOR > FILTER_MAX_IOR < INT_MAX | off
AOZ | Alternate allele quality z-score | INFO:AOZ > FILTER_MAX_AOZ < INT_MAX | off
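
To make the "Filter if" column concrete, the Python sketch below evaluates a few of the rows above for a single site. It is an illustration only, not GotCloud's implementation: the function name `hard_filter_prefixes`, the dictionary layout, and reporting the bare prefix for each failed filter are all assumptions. Thresholds are taken as single numbers, `None` (or a missing key) stands in for a filter that is off, and the trailing bound in each row (for example `> 0`) is not modelled.

```python
def hard_filter_prefixes(qual, info, n_samples, cfg):
    """Return the prefixes of the hard filters that one site fails.

    qual      -- the site's QUAL value
    info      -- dict of INFO statistics, e.g. {"MQ": 35, "DP": 5, "STR": 0.5}
    n_samples -- number of samples in the call set
    cfg       -- dict of threshold settings (hypothetical layout);
                 None or a missing key means the filter is off
    """
    failed = []

    def check(prefix, value, limit, fail_if_below):
        # Skip the rule when the statistic is absent or the filter is off.
        if value is None or limit is None:
            return
        if (value < limit) if fail_if_below else (value > limit):
            failed.append(prefix)

    def per_sample(name):
        # Rows written as "#samples * SETTING".
        v = cfg.get(name)
        return None if v is None else v * n_samples

    def percent(name):
        # Rows written as "SETTING/100.".
        v = cfg.get(name)
        return None if v is None else v / 100.0

    check("q", qual, cfg.get("FILTER_MIN_QUAL"), True)
    check("m", info.get("MQ"), cfg.get("FILTER_MIN_MQ"), True)
    check("dp", info.get("DP"), per_sample("FILTER_MIN_SAMPLE_DP"), True)
    check("DP", info.get("DP"), per_sample("FILTER_MAX_SAMPLE_DP"), False)
    check("STR", info.get("STR"), percent("FILTER_MAX_STR"), False)
    check("str", info.get("STR"), percent("FILTER_MIN_STR"), True)
    return failed


# Single-value thresholds taken from the "Default Value" column above.
defaults = {
    "FILTER_MIN_QUAL": 5,
    "FILTER_MIN_MQ": 20,
    "FILTER_MIN_SAMPLE_DP": 1,
    "FILTER_MAX_SAMPLE_DP": 1000,
    "FILTER_MAX_STR": 20,
    "FILTER_MIN_STR": -20,
}
site = {"MQ": 35, "DP": 5, "STR": 0.5}
print(hard_filter_prefixes(3, site, 10, defaults))
```

With 10 samples, the example site fails on quality (3 < 5), on minimum depth (5 < 10 * 1), and on strand bias (0.5 > 20/100).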

To remove a filter, set it to blank or "off" in your user configuration file.

The values of these filters must be numbers (or a comma/space separated list of numbers).
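
The accepted value formats can be sketched as a small Python parser. This is not GotCloud's own parsing code: the function name `parse_filter_value`, the choice to return an empty list for a disabled filter, and rejecting more than two numbers are assumptions made for illustration.

```python
import re


def parse_filter_value(raw):
    """Turn a filter setting string into a list of one or two numbers.

    Returns an empty list when the filter is disabled (blank or "off").
    """
    text = raw.strip()
    if text == "" or text == "off":
        return []
    # Values may be separated by commas, whitespace, or both.
    parts = [p for p in re.split(r"[,\s]+", text) if p]
    if not parts:
        raise ValueError("no numbers found in %r" % raw)
    values = [float(p) for p in parts]  # raises ValueError on non-numbers
    if len(values) > 2:
        # Assumption: only one- and two-value settings are described.
        raise ValueError("expected 1 or 2 values, got %d" % len(values))
    return values


print(parse_filter_value("70,65"))
print(parse_filter_value("-5, -10"))
print(parse_filter_value("off"))
```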

The following rules apply to these filters:

  • Specifying 1 value in the filter will turn that filter on and use that value.
  • Specifying 2 values in the filter (separated by ',' and/or ' ') turns on the filter.
    • Use the 1st value if the number of samples is below FILTER_FORMULA_MIN_SAMPLES
    • Use the 2nd value if the number of samples is above FILTER_FORMULA_MAX_SAMPLES
    • If the number of samples is between the MIN and MAX, the value is interpolated on a log scale:
      (minVal - maxVal) * (log(maxSamples) - log(numSamples)) / (log(maxSamples) - log(minSamples)) + maxVal

with:

FILTER_FORMULA_MIN_SAMPLES = 100

FILTER_FORMULA_MAX_SAMPLES = 1000
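
The one-value and two-value rules above can be written out as a short Python function. This is a sketch of the stated formula, not code taken from GotCloud; the function name `effective_threshold` and its argument layout are assumptions, and the natural logarithm is used (the ratio is the same for any log base).

```python
import math

FILTER_FORMULA_MIN_SAMPLES = 100
FILTER_FORMULA_MAX_SAMPLES = 1000


def effective_threshold(values, num_samples,
                        min_samples=FILTER_FORMULA_MIN_SAMPLES,
                        max_samples=FILTER_FORMULA_MAX_SAMPLES):
    """Pick the threshold to use for a call set of num_samples samples.

    values -- [v] for a single-value filter, or [min_val, max_val] where
              min_val applies to small call sets and max_val to large ones
    """
    if len(values) == 1:
        return values[0]
    min_val, max_val = values
    if num_samples <= min_samples:
        return min_val
    if num_samples >= max_samples:
        return max_val
    # Log-scale interpolation between the two values.
    frac = ((math.log(max_samples) - math.log(num_samples))
            / (math.log(max_samples) - math.log(min_samples)))
    return (min_val - max_val) * frac + max_val


for n in (50, 316, 5000):
    print(n, effective_threshold([70, 65], n))
```

For the AB default of 70,65, a call set of 50 samples uses 70, one of 5000 samples uses 65, and one of 316 samples (roughly the geometric midpoint of 100 and 1000) uses about 67.5.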

To add additional filters, set FILTER_ADDITIONAL to the --min.../--max... options listed below, each followed by its value, for example:

FILTER_ADDITIONAL = --maxSTP 5 --minLQZ 5

Filter Prefix | Filter Description | Filter if | Default Value
FFRQ | TBD | winFFRQ, maxFFRQ both > 0 |
STP | | INFO:STP > maxSTP < INT_MAX |
TTT | | INFO:TTT > maxTTT < INT_MAX |
ttt | | INFO:TTT < minTTT > INT_MIN |
LQZ | | INFO:LQZ > maxLQZ < INT_MAX |
lqz | | INFO:LQZ < minLQZ > INT_MIN |
RBZ | | INFO:RBZ > maxRBZ < INT_MAX |
rbz | | INFO:RBZ < minRBZ > INT_MIN |
CBZ | Cycle Bias z-score | INFO:CBZ > maxCBZ < INT_MAX |
cbr | | INFO:CBR < minCBR/100. > -1 |
QBR | | INFO:QBR > maxQBR/100. < 1 |
qbr | | INFO:QBR < minQBR/100. > -1 |
CSR | Cycle-Strand Pearson's Correlation | INFO:CSR > maxCSR/100. < 1 |
csr | Cycle-Strand Pearson's Correlation | INFO:CSR < minCSR/100. > -1 |
IOZ | Base quality inflation z-score | INFO:IOZ > maxIOZ < INT_MAX |
ior | Ratio of base-quality inflation | INFO:IOR < minIOR/100. > INT_MIN/100. |
MQ10_ | Fraction of bases with mapQ<=10 | INFO:MQ10 > maxMQ10/100. < 1 |
MQ20_ | Fraction of bases with mapQ<=20 | INFO:MQ20 > maxMQ20/100. < 1 |
ABE | | INFO:ABE > maxABE/100. < 1 |
abe | | INFO:ABE < minABE/100. > -1 |
MBR | | INFO:MBR > maxMBR/100. < 1 |
mbr | | INFO:MBR < minMBR/100. > -1 |
ABZ | | INFO:ABZ > maxABZ < INT_MAX |
abz | | INFO:ABZ < minABZ > INT_MIN |
BCS | | INFO:BCS > maxBCS < INT_MAX |
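
As an illustration of how a FILTER_ADDITIONAL setting relates to the rows above, the Python sketch below splits the option string into name/value pairs and applies a few of the unscaled rows (those without the /100. factor) to one site. It is not GotCloud's code: the function names, the `RULES` table, and the assumption that every option takes exactly one numeric value are hypothetical and chosen for illustration.

```python
import shlex

# option name -> (INFO field, filter prefix, True if "filter when above").
# Only a subset of the rows above; rows with a /100. factor are left out.
RULES = {
    "maxSTP": ("STP", "STP", True),
    "maxTTT": ("TTT", "TTT", True),
    "minTTT": ("TTT", "ttt", False),
    "maxLQZ": ("LQZ", "LQZ", True),
    "minLQZ": ("LQZ", "lqz", False),
    "maxRBZ": ("RBZ", "RBZ", True),
    "minRBZ": ("RBZ", "rbz", False),
}


def parse_additional_filters(setting):
    """Split e.g. '--maxSTP 5 --minLQZ 5' into {'maxSTP': 5.0, 'minLQZ': 5.0}."""
    tokens = shlex.split(setting)
    if len(tokens) % 2 != 0:
        raise ValueError("expected '--option value' pairs: %r" % setting)
    options = {}
    for flag, value in zip(tokens[0::2], tokens[1::2]):
        if not flag.startswith("--"):
            raise ValueError("not an option: %r" % flag)
        options[flag[2:]] = float(value)
    return options


def additional_filter_prefixes(info, options):
    """Return the prefixes of the additional filters that one site fails."""
    failed = []
    for name, limit in options.items():
        if name not in RULES:
            continue  # option not covered by this sketch
        field, prefix, fail_if_above = RULES[name]
        value = info.get(field)
        if value is None:
            continue  # statistic not present for this site
        if (value > limit) if fail_if_above else (value < limit):
            failed.append(prefix)
    return failed


opts = parse_additional_filters("--maxSTP 5 --minLQZ 5")
print(opts)
print(additional_filter_prefixes({"STP": 7.2, "LQZ": 6.0}, opts))
```

In this example the site fails only the STP filter: 7.2 exceeds maxSTP, while an LQZ of 6.0 is not below minLQZ.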