Inhibit alerts that already fired in the last 24 hours


Atanas Filyanov

Mar 15, 2023, 11:08:24 AM
to Prometheus Users
Hi,

I am trying to have an alert that can only fire once per 24 hours. That means that if the alert fires now, resolves after, e.g., 10 minutes, and gets into a firing state again after another 10, it won't send a second notification, because it already fired in the last 24 hours.

I thought of using inhibit_rules together with ALERTS_FOR_STATE[24h], but I do not think these can be combined.

Does anyone have an idea of how this could be achieved?

Thanks,
Atanas

Julius Volz

Mar 21, 2023, 6:21:05 AM
to Atanas Filyanov, Prometheus Users
Hi,

I guess the nature of the alert is such that the various notification throttling intervals in the Alertmanager ("repeat_interval" etc.) are not sufficient for what you want?

Inhibit rules can only refer to source alerts that are currently firing, and while you could *theoretically* write something like this:

        $MY_ALERTING_EXPRESSION
    unless ignoring(alertstate, alertname, $OTHER_ALERTING_RULE_LABELS)
        present_over_time(ALERTS{alertstate="firing",alertname="$MY_ALERT_NAME"}[24h])

...the problem with that is that it would really only generate the alert on a single rule evaluation cycle and then immediately resolve it (because in the second evaluation it sees that the alert already fired in the last 24h, so the expression returns nothing anymore and the alert resolves). That will not yield anything stable for you, and you might even miss the alert completely in case the communication between Prometheus and Alertmanager is interrupted exactly at that initial time. So it could be interesting to learn more about what your actual use case is to understand the underlying requirements for this better.

Caveat: Prometheus 2.42.0 (https://github.com/prometheus/prometheus/releases/tag/v2.42.0) introduced a "keep_firing_for" field for alerting rules that can be used to keep the alert firing for a configurable number of minutes or hours after that single first firing cycle. Still, it would auto-resolve after that time window, even if the underlying problem has not been addressed/fixed.
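A minimal sketch of a rule using that field (the expression and names below are placeholders, not your actual rule):

```yaml
groups:
  - name: example
    rules:
      - alert: MyAlertName
        # Placeholder expression - substitute your real alerting expression.
        expr: my_metric > 0.95
        # keep_firing_for (Prometheus >= 2.42.0) keeps the alert in the firing
        # state for 24h after the expression last returned a result, then it
        # auto-resolves.
        keep_firing_for: 24h
        labels:
          severity: warning
```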

Regards,
Julius



--
Julius Volz
PromLabs - promlabs.com

Atanas Filyanov

Mar 21, 2023, 7:43:34 AM
to Julius Volz, Prometheus Users
Hi Julius,

Thanks for the reply and the suggestions. My use case is as follows: I am monitoring the price movements of certain coins, and if a price drops by more than, e.g., 5%, I fire an alert. But once the alert has fired, I don't want it to fire again for the next 24 hours. I already use repeat_interval, which works to some extent, but the problem is that as the price of the coin changes, it can rise again and resolve the alert, then drop under the threshold again a few hours later, and the alert will fire again, which I don't want. I have defined rules for other threshold levels, such as 10%, 15%, etc., and want the same behavior for those too.
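For illustration, my 5% rule looks roughly like this (the coin_price_usd metric name and the 1h comparison window are simplified placeholders):

```yaml
groups:
  - name: coin-price-alerts
    rules:
      - alert: CoinPriceDrop5Percent
        # Fires when the price has dropped more than 5% compared to 1h ago.
        expr: |
          (coin_price_usd - coin_price_usd offset 1h)
            / (coin_price_usd offset 1h)
          < -0.05
        labels:
          severity: warning
```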

Does that make sense?

I understand the problems you describe with the above expression: it will fire and then resolve on the next evaluation cycle, after my evaluation interval, rather than when the value actually rises back above the threshold, which would be the expected behavior.

Thanks,
Atanas

Atanas Filyanov

Mar 21, 2023, 8:04:56 AM
to Julius Volz, Prometheus Users
I was also thinking of adding a label to the alert, such as:

labels:
  severity: warning
  mute: '{{ printf `changes(ALERTS_FOR_STATE{alertname="MYALERTNAME", name="%s"}[24h])` $labels.name | query | first | value }}'

and then only forwarding to the receiver if the "mute" value is 0. I still need to verify that the expression does the job correctly.

Brian Candler

Mar 21, 2023, 1:06:34 PM
to Prometheus Users
> But once the alert is fired, I don't want it to fire for the next 24 hours

I think the new "keep_firing_for: 24h" feature may do what you need.  The alert won't resolve until there has been a solid 24-hour period with no further price drops.  However, that does mean that if there are small price drops every 20 hours (say), it won't resolve at all, and you won't get new alerts either.

Combining this with "repeat_interval: 48h" in alertmanager would mean you at least get an alert every 2 days, if the price keeps falling.
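Sketched out, the combination might look like this (rule names, the expression, and the receiver are placeholders):

```yaml
# Prometheus alerting rule (placeholder expression):
groups:
  - name: coin-alerts
    rules:
      - alert: CoinPriceDrop
        expr: coin_price_change_pct < -5
        keep_firing_for: 24h

# Corresponding Alertmanager route fragment:
# route:
#   routes:
#     - matchers:
#         - alertname = CoinPriceDrop
#       repeat_interval: 48h
#       receiver: coin-alerts
```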
