alert rules: multiple rules for general & special cases

Mario Cornaccini

Jan 18, 2023, 8:00:37 AM
to Prometheus Users
hi,

for the same metric, i want to have multiple alerting rules, with a longer "for:" duration for some special cases.

the way i do this now is:
alert1 # general
probe_success{somelabel!~"specialcase1|specialcase2"}
alert2 # special
probe_success{somelabel =~"specialcase1|specialcase2"}
.. which is obviously hard to maintain, ugly, and won't scale.



so i got this idea, what would happen if i did this:

in prometheus rules :
alert1 # handles the special case
ie. probe_success{somelabel="XYZ"}
labels:
   someswitch: true

alert2 # handles the general case
probe_success{}

and in alert manager:
define an inhibit rule which mutes the general alert if there is also a special case one, based on the someswitch label. would that work?

any help/pointers/comments greatly appreciated,
cheers,
mario

Julius Volz

Jan 18, 2023, 8:48:59 AM
to Mario Cornaccini, Prometheus Users
If your special cases have a *longer* "for" duration than the general ones, then they won't be useful for inhibiting the general ones, since the special-case alerts will start firing too late to inhibit them. I guess you could introduce a copy of each special-case alert without any "for" duration (or with a shorter one) that you don't route anywhere and that is only used for inhibitions, plus a second version of it with the longer "for" duration that is actually routed?
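A minimal sketch of that idea, as two separate files shown as two YAML documents. The alert names, the "for" durations, the "null" receiver, and the use of "instance" as the shared label are all assumptions for illustration, not something from the original setup:

```yaml
# Prometheus rules file (names and durations are hypothetical)
groups:
  - name: probe-alerts
    rules:
      # General case: short "for" duration, routed normally.
      - alert: ProbeFailed
        expr: probe_success == 0
        for: 5m

      # Special case with the longer "for" duration, routed normally.
      - alert: ProbeFailedSpecial
        expr: probe_success{somelabel=~"specialcase1|specialcase2"} == 0
        for: 30m

      # Copy of the special case without "for": only used as an inhibition source.
      - alert: ProbeFailedSpecialInhibitor
        expr: probe_success{somelabel=~"specialcase1|specialcase2"} == 0
---
# Alertmanager config fragment (receiver names are hypothetical)
route:
  receiver: default
  routes:
    # Send the helper alert to a receiver with no integrations, so nobody is notified.
    - matchers:
        - alertname = ProbeFailedSpecialInhibitor
      receiver: "null"
receivers:
  - name: default
  - name: "null"
inhibit_rules:
  # Mute the general alert while the helper alert fires for the same target.
  - source_matchers:
      - alertname = ProbeFailedSpecialInhibitor
    target_matchers:
      - alertname = ProbeFailed
    # Assumes both alerts carry the same "instance" label value.
    equal: [instance]
```

With this layout the general alert still becomes pending and firing in Prometheus for the special targets. Only its notifications are muted, so those targets notify solely through the longer-"for" rule.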

Whether that's more maintainable than going for the !~ and =~ regex matchers you described is a good question, though. Rather than distinguishing each special case in the alerting rules themselves, maybe you can attach a single new label to your targets that differentiates the general ones from the longer "for" duration ones, so you can just use that one label for filtering in the rules?
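A sketch of the single-label variant, assuming probe_success comes from a blackbox_exporter job with static targets. The label name "alert_profile", its value "slow", the target URLs, the exporter address, and the durations are hypothetical:

```yaml
# prometheus.yml fragment: tag the special targets once, at scrape time
scrape_configs:
  - job_name: blackbox
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
      - targets: [https://example.com]
      - targets: [https://slow.example.com]
        labels:
          alert_profile: slow   # hypothetical marker label for special cases
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 127.0.0.1:9115   # assumed blackbox_exporter address
---
# Rules file: both rules filter on the one marker label
groups:
  - name: probe-alerts
    rules:
      # != also matches series that do not have the label at all.
      - alert: ProbeFailed
        expr: probe_success{alert_profile!="slow"} == 0
        for: 5m
      - alert: ProbeFailedSlow
        expr: probe_success{alert_profile="slow"} == 0
        for: 30m
```

Adding a new special case then only means labelling the target. The rules themselves stay unchanged.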

--
Julius Volz
PromLabs - promlabs.com

Brian Candler

Jan 18, 2023, 9:36:15 AM
to Prometheus Users
If you refactor the rules a bit, you may find them easier to maintain:

alert1:
  expr: probe_success{somelabel="XYZ"} == 0
  labels:
    someswitch: foo

alert2:
  expr: probe_success{somelabel="ABC"} == 0
  labels:
    someswitch: bar

alert3:
  expr: |
    probe_success == 0
    unless probe_success{somelabel="XYZ"} == 0
    unless probe_success{somelabel="ABC"} == 0

The 'special cases' alert1 and alert2 have particular rules; alert3 has the generic catch-all rule with 'unless' blocks to suppress the alert1 and alert2 cases, but using identical expressions.

I think this approach is easier to reason about than having to generate new expressions with inverted logic. The expressions do have to return the same label sets (which, in the case of the same metric, should be true).
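Written out as a complete rules file, the same structure might look like the sketch below. The group name, alert names, and "for" durations are placeholders chosen for illustration:

```yaml
groups:
  - name: probe-alerts
    rules:
      # Special cases: each has its own selector and its own (longer) "for".
      - alert: ProbeFailedXYZ
        expr: probe_success{somelabel="XYZ"} == 0
        for: 30m
        labels:
          someswitch: foo

      - alert: ProbeFailedABC
        expr: probe_success{somelabel="ABC"} == 0
        for: 15m
        labels:
          someswitch: bar

      # Catch-all: every failing probe except those the special rules cover.
      # "unless" drops left-hand series whose label set matches a right-hand series.
      - alert: ProbeFailed
        expr: |
          probe_success == 0
          unless probe_success{somelabel="XYZ"} == 0
          unless probe_success{somelabel="ABC"} == 0
        for: 5m
```

Adding another special case means adding one rule plus one matching "unless" line, with the expression copied verbatim.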