Too stupid for quantile_over_time


Daniel Trüssel

Mar 31, 2020, 2:55:20 AM
to Prometheus Users
Hey

alert config:

avg_over_time(probe_duration_seconds{env="Prod",job=~"yyy|ccc",service!="vvv"}[15m])
>1

See the screenshots in the attachment for peaks > 5 sec; most of the calls are fast.

I want to filter out those peaks for night alerting.
I searched with DDG for quantile_over_time examples, but I do not
understand the basic math behind it.

kind regards
Daniel
08Ga3Wp.png
66ojspa.png

Brian Candler

Mar 31, 2020, 4:56:14 AM
to Prometheus Users
quantile_over_time(0.5, foo[t]) returns some value X, where 50% of the values in foo over the time range are below X and 50% are above X (i.e. "the median value").  Think of it as: sort all the values from low to high, and pick the middle one.

quantile_over_time(0.95, foo[t]) returns some value X, where 95% of the values in foo over the time range are below X and 5% are above X.  Think of it as: discard the highest 5% of values, then pick the highest remaining.

Therefore:

quantile_over_time(0.95, foo[t]) > 1 will alert if 5% or more of the samples have a duration of more than 1 second.
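
Applied to the alert expression from the original post, this could be sketched as below. The selector and the 15m window are copied from that post; the quantiles (0.95 and 0.5) and the 1-second threshold are only illustrative assumptions and would need tuning against the real data.

```promql
# Fires when roughly 5% or more of the probes in the last 15 minutes
# took longer than 1 second; a few isolated peaks are ignored.
quantile_over_time(0.95, probe_duration_seconds{env="Prod",job=~"yyy|ccc",service!="vvv"}[15m])
> 1

# Stricter variant (assumed, e.g. for night alerting): uses the median,
# so it fires only when about half of the probes or more were slow.
quantile_over_time(0.5, probe_duration_seconds{env="Prod",job=~"yyy|ccc",service!="vvv"}[15m])
> 1
```

Unlike avg_over_time, where a single 5-second peak among fast calls pulls the whole average up, a quantile only looks at the rank of the samples, so the size of an outlier above the cut-off does not matter.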