How to alert on sudden changes in the value of a metric?


JI Ioannidis

Jul 24, 2017, 11:41:47 PM
to Prometheus Users
How can I express an alert rule such as "if metric X becomes 2 times what its average has been for the last week, and stays there for ten minutes"? I tried:

ALERT suddenjump
  IF fufutos > 2 * avg_over_time(fufutos [7d])
    FOR 10m

(fufutos is my metric, obviously, and I want a different alert for every label combination, hence no labels are selected). 

It seems to work, but is this the "proper" way of setting up such an alert, or is there some catch that I'm missing / a better way? I assume I can use any of the foo_over_time functions?

Thanks,

/ji
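
To spell out what the rule above evaluates for each series, here is a minimal Python sketch. It is not Prometheus code: the helper name `sudden_jump_fires`, the sorted `(timestamp, value)` sample layout, and the one-minute evaluation step are assumptions made for illustration, and the `FOR` handling is a simplification of how Prometheus tracks pending alerts.

```python
from typing import List, Tuple

WEEK = 7 * 24 * 3600      # look-back of avg_over_time(fufutos[7d]), in seconds
FOR_DURATION = 10 * 60    # the FOR 10m clause, in seconds


def sudden_jump_fires(samples: List[Tuple[int, float]], now: int,
                      step: int = 60, factor: float = 2.0) -> bool:
    """Hypothetical helper: True if value > factor * 7d average held at every
    evaluation from now - FOR_DURATION through now.

    Assumes `samples` is sorted by timestamp (seconds).
    """
    def condition(t: int) -> bool:
        window = [v for ts, v in samples if t - WEEK < ts <= t]
        if not window:
            return False                      # no data, nothing to compare
        current = window[-1]                  # most recent sample at time t
        average = sum(window) / len(window)   # note: includes the current value
        return current > factor * average

    return all(condition(t) for t in range(now - FOR_DURATION, now + 1, step))
```

One thing the sketch makes visible: the weekly average includes the elevated samples themselves, so a jump that persists slowly raises its own baseline.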

Ben Kochie

Jul 25, 2017, 1:22:54 AM
to JI Ioannidis, Prometheus Users
That seems like the right way to do what you're looking for.

But the "proper" thing to do is to follow the "symptoms-based alerting" guidelines.



What you're trying to do is going to be problematic on multiple levels:
* What does 2x the last week mean for your users? You could just set a threshold.
* What if last week was bad? It's going to ruin your alerting for this week.
* What if the problem is getting worse slowly? This will likely miss it.
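
The last two points can be checked with a little arithmetic. The following Python sketch uses made-up numbers (a baseline of 100, an incident week averaging 180, a 5% daily growth rate), and `relative_rule` is a hypothetical stand-in for the `> 2 * avg_over_time(...)` comparison, not anything Prometheus provides.

```python
def relative_rule(current: float, weekly_avg: float, factor: float = 2.0) -> bool:
    """True when the current value exceeds factor times the weekly average."""
    return current > factor * weekly_avg


# Case 1: a healthy baseline of 100, then a jump to 250. 250 > 200, so it triggers.
healthy_week_avg = 100.0
jump_after_healthy_week = relative_rule(250.0, healthy_week_avg)

# Case 2: last week contained an incident that pushed the average to 180,
# so the same jump to 250 no longer exceeds 2 * 180 = 360.
bad_week_avg = 180.0
jump_after_bad_week = relative_rule(250.0, bad_week_avg)

# Case 3: the value grows 5% per day for 60 days. The trailing 7-day average
# grows along with it, so the current value never reaches twice the average.
daily = [100.0 * 1.05 ** day for day in range(60)]
slow_drift_alerts = [
    relative_rule(daily[d], sum(daily[d - 6:d + 1]) / 7) for d in range(6, 60)
]
```

In the third case the metric ends up more than ten times its starting value without the relative rule ever becoming true, which is the situation a fixed threshold would catch.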

--
You received this message because you are subscribed to the Google Groups "Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to prometheus-users+unsubscribe@googlegroups.com.
To post to this group, send email to prometheus-users@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-users/4168794b-6050-4e1a-a94b-471317ebb61a%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Matthias Rampke

Jul 25, 2017, 5:51:53 AM
to Ben Kochie, JI Ioannidis, Prometheus Users
… of course, "it unexpectedly doubled / halved" can be a symptom :)

Other forms of this that I've used:

* Comparing the value to the one exactly a week ago (if you have a daily/weekly usage cycle): some_metric < 0.9 * (some_metric offset 7d)
* The same, but slightly smoothed: avg_over_time(some_metric[1h]) < 0.9 * avg_over_time(some_metric[1h] offset 7d)
* Ditto, but subtracting last week from now and imposing an absolute threshold: avg_over_time(some_metric[1h]) - avg_over_time(some_metric[1h] offset 7d) > 9000
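
For comparison, the three expressions above can be mirrored in plain Python roughly as follows. This is only a sketch of the arithmetic: the function names, the `{timestamp: value}` dictionary layout, and the assumption that a sample exists at exactly `now` and `now - 7d` are all illustrative, whereas Prometheus itself picks the most recent sample within its lookback window.

```python
from typing import Dict

WEEK = 7 * 24 * 3600   # the "offset 7d" shift, in seconds
HOUR = 3600            # the [1h] smoothing window, in seconds


def avg_over(series: Dict[int, float], end: int, window: int = HOUR) -> float:
    """Mean of samples with end - window < timestamp <= end.

    Assumes the window contains at least one sample.
    """
    values = [v for ts, v in series.items() if end - window < ts <= end]
    return sum(values) / len(values)


def dropped_vs_last_week(series: Dict[int, float], now: int,
                         ratio: float = 0.9) -> bool:
    """Mirrors: some_metric < 0.9 * (some_metric offset 7d)"""
    return series[now] < ratio * series[now - WEEK]


def smoothed_drop(series: Dict[int, float], now: int,
                  ratio: float = 0.9) -> bool:
    """Mirrors the smoothed variant with 1h averages on both sides."""
    return avg_over(series, now) < ratio * avg_over(series, now - WEEK)


def absolute_increase(series: Dict[int, float], now: int,
                      threshold: float = 9000.0) -> bool:
    """Mirrors the absolute-difference variant with a fixed threshold."""
    return avg_over(series, now) - avg_over(series, now - WEEK) > threshold
```

The first two variants are relative and so scale with the metric, while the third only reacts once the difference is large in absolute terms, which avoids noise on series with small values.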

I've lately been experimenting with the holt_winters function, having it create a smoothed prediction based on the last 7d and then alerting on the difference between that and reality, but I don't actually know if that …means anything?
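
As a rough illustration of that idea, here is a textbook double exponential smoothing (Holt's linear method) in Python, followed by a check of how far the latest value sits from the smoothed one. It is a sketch under assumptions: the function names, the initialization, and the `tolerance` parameter are hypothetical, and Prometheus's `holt_winters` may differ in its details.

```python
from typing import Sequence


def double_exponential_smoothing(values: Sequence[float],
                                 sf: float, tf: float) -> float:
    """Return the final smoothed level of `values`.

    sf is the smoothing factor for the level, tf for the trend (both in 0..1).
    """
    if len(values) < 2:
        raise ValueError("need at least two samples")
    level = values[0]
    trend = values[1] - values[0]          # simple initial trend estimate
    for x in values[1:]:
        previous_level = level
        level = sf * x + (1 - sf) * (level + trend)
        trend = tf * (level - previous_level) + (1 - tf) * trend
    return level


def deviates(history: Sequence[float], current: float,
             tolerance: float, sf: float = 0.5, tf: float = 0.5) -> bool:
    """True if `current` is more than `tolerance` away from the smoothed history."""
    smoothed = double_exponential_smoothing(history, sf, tf)
    return abs(current - smoothed) > tolerance
```

Whether such a deviation is meaningful depends on the metric; the smoothing and trend factors and the tolerance all have to be chosen by hand, which is part of why the result is hard to interpret.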

/MR

