GotCloud: Filters
GotCloud Filters
GotCloud uses multiple filters throughout the GotCloud Pipeline.
The primary filters are applied during the snpcall step of Variant Calling.
There are two phases of filters:
- Hard Filters
- SVM Filters
Hard Filters
GotCloud applies the following hard filters:
Filter Prefix | Filter Description | Filter if | Default Value |
---|---|---|---|
q | Phred-scaled quality score | QUAL < FILTER_MIN_QUAL > 0 | 5 |
INDEL | | FILTER_WIN_INDEL > 0 | 5 |
m | Root Mean Squared Mapping Quality | INFO:MQ < FILTER_MIN_MQ > 0 | 20 |
dp | Total Depth at Site | INFO:DP < #samples * FILTER_MIN_SAMPLE_DP > 0 | #samples*1 |
DP | Total Depth at Site | INFO:DP > #samples * FILTER_MAX_SAMPLE_DP < INT_MAX | #samples*1000 |
ns | Number of Samples With Coverage | INFO:NS < (FILTER_MIN_NS or FILTER_MIN_NS_FRAC * #samples) > 0 | .5*#samples |
AB | Allele Balance in Heterozygotes | INFO:AB > FILTER_MAX_ABL/100. < 1 | 70,65 |
STR | Strand Bias Pearson's Correlation | INFO:STR > FILTER_MAX_STR/100. < 1 | 20,10 |
str | Strand Bias Pearson's Correlation | INFO:STR < FILTER_MIN_STR/100. > -1 | -20,-10 |
stz | Strand Bias z-score | INFO:STZ < FILTER_MIN_STZ > INT_MIN | -5,-10 |
STZ | Strand Bias z-score | INFO:STZ > FILTER_MAX_STZ < INT_MAX | 5,10 |
fic | | INFO:FIC < FILTER_MIN_FIC/100. > INT_MIN | -20,-10 |
CBR | Cycle Bias Pearson's Correlation | INFO:CBR > FILTER_MAX_CBR/100. < 1 | 20,10 |
LQR | | INFO:LQR > FILTER_MAX_LQR/100. < 1 | 30,20 |
AOI | Alternate allele inflation score | INFO:AOI > FILTER_MAX_AOI < INT_MAX | 5 |
MQ0_ | Fraction of bases with mapQ=0 | INFO:MQ0 > FILTER_MAX_MQ0/100. < 1 | 10 |
IOR | Ratio of base-quality inflation | INFO:IOR > FILTER_MAX_IOR < INT_MAX | off |
AOZ | Alternate allele quality z-score | INFO:AOZ > FILTER_MAX_AOZ < INT_MAX | off |
To remove a filter, set it to blank or "off" in your user configuration file.
The values of these filters must be numbers (or a comma- and/or space-separated list of numbers).
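As a rough illustration of the value formats described above, the following Python sketch reads one filter setting. The function name `parse_filter_value` is hypothetical and not part of GotCloud; treating "off" as an exact, lowercase match and rejecting more than two numbers are assumptions of this sketch, not documented behavior.

```python
import re
from typing import List, Optional


def parse_filter_value(raw: str) -> Optional[List[float]]:
    """Parse one filter setting (hypothetical helper, not GotCloud code).

    Returns None when the filter is disabled (blank or "off"),
    otherwise a list holding the one or two numeric thresholds.
    """
    text = raw.strip()
    if text == "" or text == "off":
        return None  # filter removed
    # Values may be separated by commas and/or spaces.
    values = [float(tok) for tok in re.split(r"[,\s]+", text) if tok]
    if len(values) not in (1, 2):
        # Assumption: only one or two values are meaningful.
        raise ValueError(f"expected 1 or 2 numbers, got {len(values)}: {raw!r}")
    return values
```

For example, `parse_filter_value("70,65")` yields two thresholds, while `parse_filter_value("off")` yields `None`.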
The following rules apply to these filters:
- Specifying 1 value in the filter will turn that filter on and use that value.
- Specifying 2 values in the filter (separated by ',' and/or ' ') turns the filter on and scales the threshold with the number of samples:
  - The 1st value is used if the number of samples is below FILTER_FORMULA_MIN_SAMPLES.
  - The 2nd value is used if the number of samples is above FILTER_FORMULA_MAX_SAMPLES.
  - If the number of samples is between the MIN and MAX, a log scale is used:
    (minVal - maxVal) * (log(maxSamples) - log(numSamples)) / (log(maxSamples) - log(minSamples)) + maxVal
Here minVal and maxVal are the 1st and 2nd values, and minSamples and maxSamples are set by:
FILTER_FORMULA_MIN_SAMPLES = 100
FILTER_FORMULA_MAX_SAMPLES = 1000
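The two-value rule above can be sketched in Python as follows. The function name `scaled_threshold` and its parameter names are hypothetical; the sample-count bounds are the values shown above. This is only a sketch of the stated formula, not GotCloud's actual code.

```python
import math

FILTER_FORMULA_MIN_SAMPLES = 100
FILTER_FORMULA_MAX_SAMPLES = 1000


def scaled_threshold(min_val: float, max_val: float, num_samples: float,
                     min_samples: int = FILTER_FORMULA_MIN_SAMPLES,
                     max_samples: int = FILTER_FORMULA_MAX_SAMPLES) -> float:
    """Pick the threshold for a two-value filter (hypothetical helper).

    min_val is the 1st configured value, max_val the 2nd.
    """
    if num_samples <= min_samples:
        return min_val  # few samples: use the 1st value
    if num_samples >= max_samples:
        return max_val  # many samples: use the 2nd value
    # In between: interpolate on a log scale, as in the formula above.
    frac = (math.log(max_samples) - math.log(num_samples)) / (
        math.log(max_samples) - math.log(min_samples))
    return (min_val - max_val) * frac + max_val
```

With the AB default of `70,65`, a 50-sample run would use 70, a 5000-sample run would use 65, and a run at the geometric midpoint of 100 and 1000 samples (about 316) would use roughly 67.5.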
To add additional filters, set FILTER_ADDITIONAL to the --min/--max options listed below, each followed by its value, for example:
FILTER_ADDITIONAL = --maxSTP 5 --minLQZ 5
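To make the structure of that setting concrete, here is a small Python sketch that splits a FILTER_ADDITIONAL string into (bound, statistic, value) entries. The function `parse_additional_filters` is hypothetical and is not how GotCloud itself reads the option; it handles only the `--min…`/`--max…` form shown in the example and does not cover other options such as `winFFRQ`.

```python
from typing import List, Tuple


def parse_additional_filters(spec: str) -> List[Tuple[str, str, float]]:
    """Split e.g. '--maxSTP 5 --minLQZ 5' into (bound, statistic, value).

    Hypothetical helper for illustration; assumes each option is
    followed by exactly one numeric value.
    """
    tokens = spec.split()
    if len(tokens) % 2 != 0:
        raise ValueError("each option needs exactly one value")
    parsed = []
    for option, value in zip(tokens[0::2], tokens[1::2]):
        if not option.startswith("--"):
            raise ValueError(f"not an option: {option!r}")
        name = option[2:]
        bound, statistic = name[:3], name[3:]  # 'max' + 'STP'
        if bound not in ("min", "max") or not statistic:
            raise ValueError(f"expected --min<STAT> or --max<STAT>: {option!r}")
        parsed.append((bound, statistic, float(value)))
    return parsed
```

Applied to the example above, this yields one `max` bound on STP and one `min` bound on LQZ, each with the value 5.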
Filter Prefix | Filter Description | Filter if | Default Value |
---|---|---|---|
FFRQ | | TBD winFFRQ, maxFFRQ both > 0 | |
STP | | INFO:STP > maxSTP < INT_MAX | |
TTT | | INFO:TTT > maxTTT < INT_MAX | |
ttt | | INFO:TTT < minTTT > INT_MIN | |
LQZ | | INFO:LQZ > maxLQZ < INT_MAX | |
lqz | | INFO:LQZ < minLQZ > INT_MIN | |
RBZ | | INFO:RBZ > maxRBZ < INT_MAX | |
rbz | | INFO:RBZ < minRBZ > INT_MIN | |
CBZ | Cycle Bias z-score | INFO:CBZ > maxCBZ < INT_MAX | |
cbr | | INFO:CBR < minCBR/100. > -1 | |
QBR | | INFO:QBR > maxQBR/100. < 1 | |
qbr | | INFO:QBR < minQBR/100. > -1 | |
CSR | Cycle-Strand Pearson's Correlation | INFO:CSR > maxCSR/100. < 1 | |
csr | Cycle-Strand Pearson's Correlation | INFO:CSR < minCSR/100. > -1 | |
IOZ | Base quality inflation z-score | INFO:IOZ > maxIOZ < INT_MAX | |
ior | Ratio of base-quality inflation | INFO:IOR < minIOR/100. > INT_MIN/100. | |
MQ10_ | Fraction of bases with mapQ<=10 | INFO:MQ10 > maxMQ10/100. < 1 | |
MQ20_ | Fraction of bases with mapQ<=20 | INFO:MQ20 > maxMQ20/100. < 1 | |
ABE | | INFO:ABE > maxABE/100. < 1 | |
abe | | INFO:ABE < minABE/100. > -1 | |
MBR | | INFO:MBR > maxMBR/100. < 1 | |
mbr | | INFO:MBR < minMBR/100. > -1 | |
ABZ | | INFO:ABZ > maxABZ < INT_MAX | |
abz | | INFO:ABZ < minABZ > INT_MIN | |
BCS | | INFO:BCS > maxBCS < INT_MAX | |