Muting Prometheus alerts older than 1 day

shubra bhattacharyya

Oct 25, 2021, 7:35:53 PM
to Prometheus Users
Hi
Is there an option to stop the Prometheus Alertmanager from sending alerts older than 1 day to Rocket.Chat?
In our use case we are using Alertmanager to send notifications about failed Argo workflows to Rocket.Chat.
We want to see only the workflows that failed in the last 24 hours.
However, we keep getting all the previous alerts from Alertmanager as well.
Any help is appreciated.

Regards
Shubra
 

Brian Candler

Oct 26, 2021, 3:38:28 AM
to Prometheus Users
You can turn off periodic resending entirely by setting a very large repeat interval: e.g.

  repeat_interval: 10y

However, when a new alert comes along which is grouped with older alerts, the new notification will include *all* the currently firing alerts in that group.
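
For context, here is a minimal sketch of where that setting lives in the Alertmanager route. The receiver name `rocketchat` and the label names in `group_by` are assumptions for illustration, not taken from the original setup:

```yaml
route:
  receiver: rocketchat                   # hypothetical receiver name
  # Alerts sharing these label values are batched into one notification,
  # so a new alert re-sends every alert still firing in its group.
  group_by: ['alertname', 'namespace']   # hypothetical labels
  repeat_interval: 10y                   # effectively disables periodic resends

receivers:
  - name: rocketchat                     # hypothetical; webhook settings omitted
```

If the grouping behaviour is the main problem, Alertmanager also accepts the special value `group_by: ['...']`, which groups by all labels so that each alert gets its own notification. Whether that suits a given setup depends on alert volume.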

You can change the alerting rule "expr" to have a condition which stops alerting if it's been active for more than 24 hours, perhaps along the lines of

    expr: up == 0 unless up offset 24h == 0  # also down exactly 24 hours ago mutes the alert
    # or, alternatively:
    expr: up == 0 unless min_over_time(up[24h] offset 24h) == 0  # any fail between 24 and 48 hours ago mutes the alert
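
As a rough sketch, the second expression could sit in a rule file like this. The group name, alert name, label and annotation are made up for illustration, and `up` stands in for whatever metric actually reports the failed workflows:

```yaml
groups:
  - name: example-muting          # hypothetical group name
    rules:
      - alert: TargetDown         # hypothetical alert name
        # Fire while the target is down, but drop the alert if the target
        # was also down at any point between 24 and 48 hours ago.
        expr: up == 0 unless min_over_time(up[24h] offset 24h) == 0
        labels:
          severity: warning       # hypothetical label
        annotations:
          summary: "{{ $labels.instance }} is down"
```

With this shape the alert fires for roughly the first 24 hours of a continuous failure and then stops, because the 24 to 48 hour window starts to contain failing samples.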

But ISTM that time series and alerting are being used in an inappropriate way here. If a failure condition can't be resolved, then it's not actionable. Would it be possible instead to mute the alert at source, e.g. by deleting the Argo workflow once you've gathered enough info about the failure and are satisfied that the "failure" is resolved?