Inhibit_rule (mute alert per target)


Александр Админ

Aug 1, 2023, 10:48:16 AM
to Prometheus Users
Hello, 
we have a blackbox exporter which is probing around 50 targets.
We also have an alert rule: up == 0 
so we know when certain targets are not being scraped.

The problem is: if the blackbox exporter goes down, we receive 50 alerts, one for each "down" target (instance). We would like to receive only a single alert.
I thought an inhibit_rule might solve the problem, but it does not.
Here is the config:
inhibit_rules: 
  - equal: 
      - instance 
    source_matchers: 
      - alertname="Prometheus target state" 
    target_matchers: 
      - alertname="Prometheus target state" 
      - job="Blackbox exporter job"

Does anyone know how we can solve that problem? Thanks!

Brian Candler

Aug 1, 2023, 11:46:29 AM
to Prometheus Users
Note that unless blackbox_exporter itself fails, you should never see up{job="blackbox"} == 0; a target that is really down shows up as probe_success == 0.

Hence I suggest separate expressions:

expr: up{job!="blackbox"} == 0    # other targets are down

expr: probe_success == 0    # blackbox target has failed

expr: min by (job) (up{job="blackbox"}) == 0    # a single alert, for "blackbox_exporter itself has failed" (which alerts if *any* scrape to blackbox fails)
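
Put together as a Prometheus rules file, the three expressions might look like the sketch below. The group name, alert names, "for" durations and annotation text are illustrative assumptions, and the job label "blackbox" should be replaced with whatever your scrape config actually uses (for example "Blackbox exporter job").

```yaml
groups:
  - name: target-state              # hypothetical group name
    rules:
      # Ordinary (non-blackbox) scrape targets that are down.
      - alert: TargetDown           # hypothetical alert name
        expr: up{job!="blackbox"} == 0
        for: 5m                     # assumed duration, tune as needed
        annotations:
          summary: "Target {{ $labels.instance }} is not being scraped"

      # A probed endpoint has failed; one alert per probed target.
      - alert: ProbeFailed          # hypothetical alert name
        expr: probe_success == 0
        for: 5m                     # assumed duration
        annotations:
          summary: "Probe of {{ $labels.instance }} failed"

      # blackbox_exporter itself is unreachable. Aggregating by job
      # collapses the ~50 per-target series into a single alert.
      - alert: BlackboxExporterDown # hypothetical alert name
        expr: min by (job) (up{job="blackbox"}) == 0
        for: 5m                     # assumed duration
        annotations:
          summary: "Scrapes via blackbox_exporter are failing (job {{ $labels.job }})"
```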

I think this is cleaner than inhibits. Using inhibits you'd still need an expression like max(up{job="blackbox"}) == 0 to be used to inhibit all the other blackbox_exporter up==0 alerts.
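
If you do want to go the inhibit route, a rough sketch is below. It assumes an extra alerting rule with the hypothetical name BlackboxExporterDown, written as max by (job) (up{job="blackbox"}) == 0 so that the resulting alert still carries a job label to compare on, and it assumes the per-target alert keeps the name "Prometheus target state" from the original config. Adjust the job value to match your own setup.

```yaml
inhibit_rules:
  # While the single "exporter is down" alert fires, mute the
  # per-target up == 0 alerts that belong to the same job.
  - source_matchers:
      - alertname="BlackboxExporterDown"    # hypothetical source alert
    target_matchers:
      - alertname="Prometheus target state"
      - job="blackbox"                      # assumed job label value
    equal:
      - job                                 # source and target must share the job label
```

The source and target here are different alerts. In the config from the question both sides match the same alertname, and Alertmanager does not let an alert that matches both sides of a rule be inhibited by another alert that also matches both sides, so that rule never mutes anything.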

Or, you could have a routing rule which matches job="blackbox", and under that have group_by: ['job'].  Then you'll receive one message, with up to 50 separate alerts listed within it. However, it will also group together all other blackbox alerts like probe_success == 0, unless you add a specific label and test for that in your routing rule.
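
A minimal sketch of such a routing rule, assuming a receiver named "default" (hypothetical) and the job label "blackbox":

```yaml
route:
  receiver: default                 # hypothetical receiver name
  routes:
    # All alerts from the blackbox job are grouped into one notification.
    - matchers:
        - job="blackbox"            # assumed job label value
        # To keep probe_success alerts out of this group, add a second
        # matcher on a label that only the up == 0 alert carries, e.g.:
        # - alertname="Prometheus target state"
      group_by: ['job']

receivers:
  - name: default
```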
