AlertManager & Prometheus Events API v2 will re-trigger events


Russ Robinson

Mar 9, 2023, 4:17:24 PM
to Prometheus Users
  I have Alertmanager configured to send "critical" alerts to PagerDuty over the Events API v2.  If a Prometheus rule has an alert that lasts longer than 20 minutes or so, the PagerDuty alert is resolved and then a new event is triggered.

  I have tried disabling grouping (with "group_by [...]").  The PagerDuty alert's log just says: "Resolved through the integration API."

  However, the alert still shows in Alertmanager.  In addition, I have messages going to Slack.  The alert message shows up there, but a resolved message never does.

  Any ideas why Alertmanager would close/resolve the PagerDuty incident and then re-trigger/open one again?

Brian Candler

Mar 10, 2023, 3:34:57 AM
to Prometheus Users
I think the starting point is to look at your alerting expressions, how they change over time in the PromQL GUI (graph view), and the synthetic metric "ALERTS".

If an alert expression drops out for even a single rule evaluation interval, then the alert is immediately resolved, and it will re-fire on the next cycle (or after the "for:" period, if present).

There is a change in prometheus-2.42.0 which may help address this:
  • [FEATURE] Add 'keep_firing_for' field to alerting rules. #11827
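
As a rough sketch of how that field might be used (the group name, alert name, expression, and durations below are all made up for illustration and would need adapting to your own rules), a rule file could look something like this:

```yaml
groups:
  - name: example-alerts            # hypothetical group name
    rules:
      - alert: HighErrorRate        # hypothetical alert name
        # Hypothetical expression; substitute your own alerting expression.
        expr: job:request_errors:rate5m > 0.05
        # The condition must hold this long before the alert starts firing.
        for: 5m
        # Requires Prometheus 2.42.0 or later. Keeps the alert firing for this
        # long after the expression stops returning a result, so a brief gap
        # does not resolve the alert and then re-trigger it.
        keep_firing_for: 10m
        labels:
          severity: critical
```

The value of keep_firing_for is a judgment call: it should be long enough to cover the gaps you see in the graph view, since it also delays genuine resolution by the same amount.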
