Alert continues to first receiver but not beyond in Prometheus


realElonMusk

Jul 18, 2023, 1:42:13 PM
to Prometheus Users
Hello,

I'm experiencing an issue with the continue: true option in my Prometheus alert routing setup. Here's the configuration:

route:
  routes:
    - matchers: [ owner = middleEarth ]
      receiver: 'middleEarth-alerts'
      routes:
        - matchers: [ alertname = MordorThreatens ]
          receiver: 'middleEarth-alerts-prod-critical'
          routes:
            - matchers: [ realm =~ 'middleEarth-.*.middle-earth.com' ]
              receiver: 'middleEarth-alerts-prod-critical'
              continue: true
            - matchers: [ realm = 'middleEarth-rohan.middle-earth.com' ]
              receiver: 'rohan-alerts'
              continue: true
            - matchers: [ realm = 'middleEarth-rivendell.middle-earth.com' ]
              receiver: 'rivendell-alerts'
              continue: true
            - matchers: [ realm = 'middleEarth-shire.middle-earth.com' ]
              receiver: 'shire-alerts'
              continue: true
            - matchers: [ realm = 'middleEarth-moria.middle-earth.com' ]
              receiver: 'moria-alerts'
              continue: true

In this setup, when an alert with realm='middleEarth-shire.middle-earth.com' is triggered, it successfully matches against the first route and is routed to the 'middleEarth-alerts-prod-critical' receiver as expected. However, the routing doesn't continue to the next matchers. Specifically, the alert is never matched against realm='middleEarth-shire.middle-earth.com' to be sent to the 'shire-alerts' receiver, even though continue: true is set.

I've verified the labels and they seem to be correct. Why does the alert routing not continue after the first match? Any insights on how to resolve this issue would be greatly appreciated.

Thank you.


Brian Candler

Jul 18, 2023, 2:25:59 PM
to Prometheus Users
Quite possibly *none* of your sub-matchers are matching, and it's falling back to the default receiver 'middleEarth-alerts-prod-critical' which is at the same level as routes:

route:
  routes:
    - matchers: [ owner = middleEarth ]
      receiver: 'middleEarth-alerts'
      routes:
        - matchers: [ alertname = MordorThreatens ]
          receiver: 'middleEarth-alerts-prod-critical'    # this is used if *none* of the routes below match
          routes: ...

But without seeing your actual alert labels and conditions I can't give any more help.
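
The fallback described above can be sketched in a few lines of Python. This is a simplified model of route-tree traversal, not Alertmanager's actual code: matchers are reduced to exact label equality, the `match` dictionary key and the root receiver `default` are hypothetical names chosen for the sketch, and only two of the realm routes are included.

```python
def match_route(route, labels):
    """Return the receivers an alert would be delivered to (simplified model)."""
    # A route only applies if all of its (equality-only) matchers hold.
    if not all(labels.get(k) == v for k, v in route.get("match", {}).items()):
        return []
    receivers = []
    for child in route.get("routes", []):
        found = match_route(child, labels)
        receivers.extend(found)
        # Stop at the first matching child unless it sets continue: true.
        if found and not child.get("continue", False):
            break
    # If no child matched, the route's own receiver is used as the fallback.
    if not receivers:
        receivers.append(route["receiver"])
    return receivers


tree = {
    "receiver": "default",  # hypothetical root receiver, not in the original config
    "routes": [{
        "match": {"owner": "middleEarth"},
        "receiver": "middleEarth-alerts",
        "routes": [{
            "match": {"alertname": "MordorThreatens"},
            "receiver": "middleEarth-alerts-prod-critical",
            "routes": [
                {"match": {"realm": "middleEarth-rohan.middle-earth.com"},
                 "receiver": "rohan-alerts", "continue": True},
                {"match": {"realm": "middleEarth-shire.middle-earth.com"},
                 "receiver": "shire-alerts", "continue": True},
            ],
        }],
    }],
}

base = {"owner": "middleEarth", "alertname": "MordorThreatens"}
shire = match_route(tree, {**base, "realm": "middleEarth-shire.middle-earth.com"})
other = match_route(tree, {**base, "realm": "no-such-realm"})
print(shire, other)
```

In this model, an alert whose realm matches no child route is delivered only to 'middleEarth-alerts-prod-critical', which would look the same from the outside as "the first route matched but routing did not continue".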

There is an alert route testing tool you can use online, and you can paste in your *real* labels and matchers:

realElonMusk

Jul 18, 2023, 3:28:32 PM
to Prometheus Users
Hello Brian,

Thank you for your response and for suggesting the use of the tool. After using the tool with my labels, I observed that all routes under alertname = MordorThreatens are highlighted. This result is not what I expected based on my understanding of the routing configuration.

My expectation was that only the route with realm = 'middleEarth-shire.middle-earth.com' should be highlighted given the label configuration I am testing with. Could you help me understand why all routes under alertname = MordorThreatens are highlighted instead?

Here are the labels I used for testing:

{owner="middleEarth", alertname="MordorThreatens", realm="middleEarth-shire.middle-earth.com"}

Any further insights would be greatly appreciated.

Brian Candler

Jul 19, 2023, 3:36:06 AM
to Prometheus Users
It looks like you're using the wrong sort of quotes in your "matchers".

            - matchers: [ 'realm = middleEarth-moria.middle-earth.com' ]  # this works
            - matchers: [ realm = "middleEarth-moria.middle-earth.com" ]  # this works
            - matchers: [ realm = 'middleEarth-moria.middle-earth.com' ]  # THIS SILENTLY MATCHES EVERYTHING!
            - matchers: [ "realm = 'middleEarth-moria.middle-earth.com'" ]  # SO DOES THIS!

For me, it works correctly, at least in the web tool, if you rewrite to the following:

route:
  routes:
    - matchers: [ 'owner = middleEarth' ]
      receiver: 'middleEarth-alerts'
      routes:
        - matchers: [ 'alertname = MordorThreatens' ]
          receiver: 'middleEarth-alerts-prod-critical'
          routes:
            - matchers: [ 'realm =~ "middleEarth-.*.middle-earth.com"' ]
              receiver: 'middleEarth-alerts-prod-critical'
              continue: true
            - matchers: [ 'realm = middleEarth-rohan.middle-earth.com' ]
              receiver: 'rohan-alerts'
              continue: true
            - matchers: [ 'realm = middleEarth-rivendell.middle-earth.com' ]
              receiver: 'rivendell-alerts'
              continue: true
            - matchers: [ 'realm = middleEarth-shire.middle-earth.com' ]
              receiver: 'shire-alerts'
              continue: true
            - matchers: [ 'realm = middleEarth-moria.middle-earth.com' ]
              receiver: 'moria-alerts'
              continue: true

(Note that I've also quoted the entire matcher expression, to be sure that it doesn't get broken up by YAML parsing)

I don't know what's going on under the hood, and perhaps alertmanager ought to give some sort of error if it sees a rule it doesn't understand, rather than silently passing the test.

Brian Candler

Jul 19, 2023, 4:47:52 AM
to Prometheus Users
What version of alertmanager are you using? FWIW, I was unable to reproduce this with alertmanager 0.25.0

route:
  receiver: r0
  routes:
    - receiver: r1
      matchers: [ 'instance =~ ".+"' ]
      continue: true
    - receiver: r2
      matchers: [ "instance = 'bar'" ]
      continue: true

If I raise an alert with instance="foo", it only sends to r1.

However, the web-based routing tree editor *is* confused by the example above.  So unfortunately, I think this is just a symptom of the web tool being different from the real thing.
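
One possible reading that would reconcile the test above with the original report is sketched below in Python. It rests on an assumption the thread does not confirm: that the matcher parser treats only double quotes as string delimiters, so a single-quoted value keeps its quote characters as part of the value. The regular expression and the function names `parse_matcher` and `matches` are illustrative, not Alertmanager's API.

```python
import re

# Simplified matcher grammar: label name, operator, raw value.
MATCHER_RE = re.compile(
    r'^\s*([a-zA-Z_:][a-zA-Z0-9_:]*)\s*(=~|!=|!~|=)\s*(.*?)\s*$', re.DOTALL
)


def parse_matcher(text):
    """Split a matcher string into (name, operator, value)."""
    m = MATCHER_RE.match(text)
    if m is None:
        raise ValueError(f"bad matcher format: {text!r}")
    name, op, value = m.groups()
    # Assumption: only surrounding double quotes are stripped.
    if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
        value = value[1:-1]
    return name, op, value


def matches(matcher, labels):
    """Evaluate one parsed matcher against a label set."""
    name, op, value = matcher
    actual = labels.get(name, "")
    if op == "=":
        return actual == value
    if op == "!=":
        return actual != value
    found = re.fullmatch(value, actual) is not None  # regexes are anchored
    return found if op == "=~" else not found


labels = {"realm": "middleEarth-shire.middle-earth.com"}
single = matches(parse_matcher("realm = 'middleEarth-shire.middle-earth.com'"), labels)
double = matches(parse_matcher('realm = "middleEarth-shire.middle-earth.com"'), labels)
bare = matches(parse_matcher("realm = middleEarth-shire.middle-earth.com"), labels)
regex_single = matches(parse_matcher("realm =~ 'middleEarth-.*.middle-earth.com'"), labels)
print(single, double, bare, regex_single)
```

Under that assumption, none of the single-quoted realm matchers in the original configuration would match, so the alert would fall back to the parent receiver 'middleEarth-alerts-prod-critical' instead of reaching 'shire-alerts'.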

realElonMusk

Jul 19, 2023, 8:59:03 PM
to Prometheus Users
Thanks a lot Brian.
This solved my issue.

Brian Candler

Jul 20, 2023, 2:46:51 AM
to Prometheus Users
Could you say what version of alertmanager you're using?

If it's current (0.25.0) and the issue is reproducible, then I think it's worth raising as a ticket (if only because the behaviour is confusing).  But I'd be interested to know what it is that makes it reproducible in your case, but not by my simplified version.

realElonMusk

Jul 20, 2023, 7:46:51 AM
to Prometheus Users
Hey, Brian, sorry I wasn't clear.
Using the right quotes in the matchers solved the issue.
We're using 0.23.0 in production, which is where the issue was first noticed. I have yet to verify the fix there.
However, the issue with the original configuration (without the right quotes) is reproducible on both 0.23.0 and 0.25.0. I'm using 0.25.0 in my local environment.