Setting a stability filter for alerts in Prometheus


shivakumar sajjan

Mar 15, 2023, 3:56:36 AM
to Prometheus Users
I have a service that pushes forecast metrics to Prometheus every minute through a push gateway, and I have configured an alert rule for it in Prometheus.

Requirements:

  • 5 minutes is the stability filter.
  • If forecasts are not published over the last 5 minutes, then trigger a firing alert.
  • If forecasts are published over the last 5 minutes, then resolve the alert.

The reason we need a stability filter: sometimes the service cannot push metrics because the push gateway service is down for about 1 minute and recovers within 2 minutes. We do not want to send firing alerts in that scenario.

Prometheus configurations:

evaluation_interval: 1m

scrape_interval: 30s

Alert rule:

- alert: forecaster
  expr: rate(forecasts_published_counter{job="metrics_job", module_name="forecaster"}[5m]) <= 0
  for: 5m

I experimented with implementing the stability filter using the FOR clause. It works for firing alerts, but it does not work for resolving alerts.

When the service has not published forecasts for over 5 minutes:

  • The time taken to change the alert state from inactive to pending is 1m, i.e. the evaluation interval.
  • The time taken to change the alert state from pending to firing is between (FOR clause interval) and (evaluation_interval + scrape_interval + FOR clause interval), i.e. between 5m and 6m 30s (1m + 30s + 5m).

When the service has been publishing forecasts over the last 5 minutes:

  • The time taken to change the alert state from firing to inactive (resolved) is 1m, i.e. the evaluation interval.

I can change the evaluation interval to 5m but it affects other services. So I do not want to change it.

Is there any other way to set a 5m stability filter in Prometheus for the transition from firing to inactive (resolved)?



Thanks,

Shivakumar Sajjan

Brian Candler

Mar 15, 2023, 5:19:33 AM
to Prometheus Users
> I can change the evaluation interval to 5m but it affects other services. So I do not want to change it.

Evaluation intervals can be set at the level of each Rule Group, so you just put your rule into its own Rule Group and you can give it whatever evaluation interval you like.
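
As a rough sketch of what that could look like in a rules file (the group name forecaster_alerts is made up for illustration; the rule itself is copied from the question, and the 5m interval is just the value mentioned there):

```yaml
groups:
  - name: forecaster_alerts   # hypothetical group name
    interval: 5m              # overrides the global evaluation_interval for this group only
    rules:
      - alert: forecaster
        expr: rate(forecasts_published_counter{job="metrics_job", module_name="forecaster"}[5m]) <= 0
        for: 5m
```

Rules in other groups keep using the global evaluation_interval, so other services are not affected. Note that a longer interval also slows down the inactive-to-pending and pending-to-firing transitions for this rule.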

> I experimented stability filter using FOR clause, it works for firing alerts but it does not work for resolving alerts.

There's a new feature in Prometheus v2.42.0 to give the equivalent behaviour for resolving alerts:
  • [FEATURE] Add 'keep_firing_for' field to alerting rules. #11827
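
A minimal sketch of how the rule from the question might use it, assuming Prometheus v2.42.0 or later (the group name forecaster_alerts is hypothetical, and the 5m value is chosen only to match the stated requirement):

```yaml
groups:
  - name: forecaster_alerts   # hypothetical group name
    rules:
      - alert: forecaster
        expr: rate(forecasts_published_counter{job="metrics_job", module_name="forecaster"}[5m]) <= 0
        for: 5m               # condition must hold this long before the alert fires
        keep_firing_for: 5m   # alert keeps firing this long after the condition stops holding
```

With this, the global evaluation_interval can stay at 1m.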