help with alert rule


dc3o

Jun 24, 2021, 4:39:12 AM
to Prometheus Users
I'm using the blackbox exporter to monitor internal apps. In non-production environments I would like the alerting rule to skip raising the alert if the monitored endpoint has been down for more than a few days. My main concern is that an alert rule like:

     probe_success{job="blackbox"} != 1 and avg_over_time(probe_success[3d]) *100 > 10

could miss some issues in prod environments.


Marcelo Magallón

Jun 24, 2021, 11:08:59 AM
to dc3o, Prometheus Users
My first thought when I read this is to use inhibit rules in Alertmanager: define an alert that fires once the endpoint has been down for more than that number of days, and use it as the source alert. The bit I'm not sure about is the concern you have. I would expect that you have labels to tell non-prod apart from prod, so you can inhibit the non-prod alerts and leave the prod alerts alone.
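
A rough sketch of what that could look like, not a tested config. It assumes the targets carry an `env` label that separates prod from non-prod, and the alert names `EndpointDown` and `EndpointDownLongTerm` are made up for illustration. The `source_matchers`/`target_matchers` fields need Alertmanager 0.22 or later.

```yaml
# Prometheus rule file (fragment). Alert names and the `env` label are assumptions.
groups:
  - name: blackbox
    rules:
      # Regular "endpoint is down" alert, for all environments.
      - alert: EndpointDown
        expr: probe_success{job="blackbox"} == 0
        for: 5m
      # Source alert: non-prod endpoint that has not had a single
      # successful probe in the last 3 days.
      - alert: EndpointDownLongTerm
        expr: max_over_time(probe_success{job="blackbox", env!="prod"}[3d]) == 0
---
# alertmanager.yml (fragment)
inhibit_rules:
  - source_matchers:
      - alertname = EndpointDownLongTerm
    target_matchers:
      - alertname = EndpointDown
    # Only inhibit when source and target refer to the same endpoint.
    equal: [job, instance]
```

Because the source alert only ever exists for `env!="prod"` series, prod `EndpointDown` alerts are never inhibited. You would probably also want to route `EndpointDownLongTerm` itself to a receiver that doesn't notify anyone.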

Does that make sense?

--
Marcelo Magallón

Nemanja Delic

Jun 24, 2021, 11:50:14 AM
to Marcelo Magallón, Prometheus Users
Alert inhibition only prevents the notifications from being sent. I'd like to ignore these endpoints entirely so they don't show up as alerts at all. What I did is just use a different expression for non-prod environments.
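
For anyone landing here later, splitting the rule by environment might look roughly like this. It is only a sketch of the idea, not the exact rules used: the `env` label, the alert name, and the `for` durations are assumptions, and the non-prod expression is the one from the first message with the selector repeated on both sides.

```yaml
# Prometheus rule file (fragment). The `env` label is an assumption.
groups:
  - name: blackbox-availability
    rules:
      # Prod: alert whenever the probe fails.
      - alert: EndpointDown
        expr: probe_success{job="blackbox", env="prod"} == 0
        for: 5m
      # Non-prod: alert only if the endpoint is down now but was up
      # for more than 10% of the last 3 days, so endpoints that have
      # been down for days never become alerts at all.
      - alert: EndpointDown
        expr: |
          probe_success{job="blackbox", env!="prod"} == 0
          and
          avg_over_time(probe_success{job="blackbox", env!="prod"}[3d]) * 100 > 10
        for: 5m
```

With the selectors scoped this way, the relaxed condition can't hide anything in prod, which was the original concern.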