Alertmanager configuration: routes

Thomas Schneider

Sep 2, 2021, 6:52:20 AM
to Prometheus Users
Hello,

can you please advise what is represented by a "service" in the Alertmanager configuration, e.g.
routes:
  # All alerts with service=mysql or service=cassandra
  # are dispatched to the database pager.
  - receiver: 'database-pager'
    group_wait: 10s
    matchers:
      - service=~"mysql|cassandra"

Where do I find the service in the rules or in Prometheus -> Alerts?

THX

Brian Candler

Sep 2, 2021, 7:58:11 AM
to Prometheus Users
It looks like "service" is a label that you have set in the prometheus alerting rule.
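
For example, a hypothetical rule that sets such a label might look something like this (the alert name, expression and label values here are made up for illustration, not taken from your setup):

groups:
- name: mysql.rules
  rules:
  - alert: MySQLDown
    expr: mysql_up == 0
    for: 5m
    labels:
      # this is the label that the service=~"mysql|cassandra" matcher would match on
      service: mysql
      severity: critical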

Thomas Schneider

Sep 2, 2021, 9:05:22 AM
to Prometheus Users
Hello,

I have defined several rule files, e.g. this general.rules.yml:
groups:
- name: general.rules
  rules:
  - alert: TargetDown
    annotations:
      message: '{{ printf "%.4g" $value }}% of the {{ $labels.job }}/{{ $labels.instance
        }} instances are down.'
    expr: 100 * (count(up == 0) BY (job, instance) / count(up) BY (job,
      instance)) > 10
    for: 10m
    labels:
      severity: warning

However, I don't see the correlation to service.

Brian Candler

Sep 2, 2021, 9:22:55 AM
to Prometheus Users
Correct, that expression will only give "job" and "instance" labels.

I don't think your alertmanager rule will ever match on this alert.

Thomas Schneider

Sep 2, 2021, 10:48:57 AM
to Prometheus Users
What should be the configuration in alertmanager.yml to match this rule?

Brian Candler

Sep 2, 2021, 1:18:37 PM
to Prometheus Users
Remove the match on service=~"mysql|cassandra" in your routing rule.

I'm not saying with 100% certainty that your alert *doesn't* have a service=xxx label; it's possible that it was added via other means, such as external_labels or alert_relabel_configs.  If you go into the prometheus or alertmanager web interface, you can see active alerts and their labels, so you'll know what you have.
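
For example, a hypothetical prometheus.yml fragment like the following would add a "service" label to every alert that this Prometheus sends, even though it never appears in the alerting rule itself:

global:
  external_labels:
    service: mysql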

There was a nice web-based interface for testing alertmanager routing rules here:
https://prometheus.io/webtools/alerting/routing-tree-editor/
but it doesn't seem to work properly any more.

Thomas Schneider

Sep 3, 2021, 3:20:49 AM
to Prometheus Users
It's clear that the config
- service=~"mysql|cassandra"
does not match the rule.
This was just an example.

But this question is still open:
What must the alertmanager config be for this rule?
groups:
- name: general.rules
  rules:
  - alert: TargetDown
    annotations:
      message: '{{ printf "%.4g" $value }}% of the {{ $labels.job }}/{{ $labels.instance
        }} instances are down.'
    expr: 100 * (count(up == 0) BY (job, instance) / count(up) BY (job,
      instance)) > 10
    for: 10m
    labels:
      severity: warning

Brian Candler

Sep 3, 2021, 4:09:56 AM
to Prometheus Users
The only labels you can match on from that rule are "severity: warning", and the "job" and "instance" labels.


> What must the alertmanager config be for this rule?

You don't need *any* matching rules in alertmanager.  At simplest, you can just have

route:
  receiver: default

receivers:
- name: default
  email_configs:
  - to: 'alerts@example.com'    # placeholder recipient; use your own address
    send_resolved: true

Any more than that, and it depends on your business requirements.  Do you want all alerts with severity "warning" to be treated differently?  Use a routing rule (in the "routes" section under "route").  Do you want a certain subset of targets to be handled by a particular team? Then either add a label in the alerting rules themselves, or ensure that those targets already have a particular label in their scrape config, and match that label in the "routes" section.
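
As a rough sketch of what that could look like (the receiver names and the "team" label are invented for illustration, and each receiver would still need its email_configs, pagerduty_configs or similar filled in):

route:
  receiver: default
  routes:
  # alerts carrying severity="warning" go to a separate receiver
  - matchers:
    - severity="warning"
    receiver: warnings-mail
  # alerts labelled team="oncall" (via the rule or the scrape config) go to the pager
  - matchers:
    - team="oncall"
    receiver: oncall-pager

receivers:
- name: default
- name: warnings-mail
- name: oncall-pager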

Brian Candler

Sep 3, 2021, 4:13:24 AM
to Prometheus Users
Note that an "alertname" label is added automatically, so you could match on alertname="TargetDown" if you want.  Doesn't scale very well, but with a small number of rules that approach will get you started.
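
A minimal sketch of such a route (the receiver name "targetdown-mail" is invented):

route:
  receiver: default
  routes:
  - matchers:
    - alertname="TargetDown"
    receiver: targetdown-mail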

If you go to your prometheus web interface, at prometheus:9090, and click on the "Alerts" tab at the top, then you can see firing alerts, including all the labels on them.

Thomas Schneider

Sep 3, 2021, 4:26:35 AM
to Prometheus Users
This means that "alert" in the Prometheus rules config is equal to "service" in the Alertmanager config?

Brian Candler

Sep 3, 2021, 1:47:22 PM
to Prometheus Users
No, definitely not. There is no such thing as "service" built into the Prometheus or Alertmanager configuration.

But if you wish, you can have a label on your timeseries called "service", or called "environment", or anything you like.  You can add labels at scrape time:

scrape_configs:
  - job_name: node
    scrape_interval: 1m
    static_configs:
      - targets:
          - bar:9100
          - baz:9100
        # these labels are added to every timeseries scraped from those targets
        labels:
          environment: prod

(note that "job" and "instance" labels are also added automatically as part of the scrape; the remaining labels come from the exporter).

Or you can add a label in your alerting rule:

groups:
- name: UpDown
  rules:
  - alert: UpDown
    expr: up == 0
    for: 3m
    # these labels are added to every alert generated from this rule
    labels:
      environment: prod

Note: it would be unusual to add label "environment: prod" in an alerting rule, but adding a label like "severity: critical" or "team: oncall" is more common - something which is specific to that alert, rather than the server.

In either of these cases, the alert which arrives at alertmanager will have the given labels on it.  Hence you can match on it in alertmanager, to decide how to route the alert.
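
Tying those together, a route that picks up the "environment" label added above might look like this (the receiver name "prod-team" is invented):

routes:
- matchers:
  - environment="prod"
  receiver: prod-team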

Brian Candler

Sep 3, 2021, 2:05:37 PM
to Prometheus Users
And I forgot to say: given an alerting rule like

  - alert: UpDown
    expr: up == 0
    for: 3m

then the label alertname="UpDown" is also added automatically (similar to how "job" and "instance" labels are added automatically at scrape time).

So at the end, you have a mixture of labels from the exporter, plus system-generated labels like "job" and "instance" and "alertname", plus any labels you've chosen to add yourself.  The "matchers" in alertmanager can match any of these.
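
Purely as an illustration (continuing the hypothetical "prod-team" receiver from above), a single route can require several of those labels at once - all matchers in the list must match:

- matchers:
  - alertname="UpDown"       # system-generated
  - job="node"               # added at scrape time
  - environment="prod"       # added by you in the rule or scrape config
  receiver: prod-team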

Thomas Schneider

Sep 10, 2021, 7:53:47 AM
to Prometheus Users
Thanks for this information.

If my understanding is correct, the alert name, specified in file rules.yml with parameter -alert: <alertname>, must be used in file alertmanager.yml with parameter -service=<alertname>. Optionally one could add labels in rules.yml, e.g. team: oncall, and then use this with -service="team: oncall".

Is this correct?

Brian Candler

Sep 10, 2021, 10:37:35 AM
to Prometheus Users
Sorry, I don't know what you mean by parameter -service=<alertname> or parameter -service="team: oncall".

I still don't know where you're getting "service" from here.  There are only labels, which are name/value pairs.  Each alert can carry one or more labels, e.g.

alertname="foo"    # label "alertname" with value "foo"
team="oncall"     # label "team" with value "oncall"
service="web"    # label "service" with value "web"

Something like "service=team=oncall" doesn't make any sense.

As for the dash, I think you may be confused by YAML syntax.  A dash starts a member of a list.  For example:

colours:
  - red
  - green
  - blue

which can also be written in YAML as

colours: [red, green, blue]

and is equivalent to the following JSON:

{"colors": ["red", "green", "blue"]}

If a list contains objects, then the start of each object is marked by a dash. e.g.

shirts:
  - colour: red
    size: small
  - colour: red
    size: medium
  - colour: green
    size: medium

(Note how important getting the alignment right is!)

This equates to JSON:

{"shirts": [
  {"color":"red", "size":"small"},
  {"color":"red", "size":"medium"},
  {"color":"green", "size":"medium"}
]}

(which doesn't care about alignment because there are explicit opening and closing braces and brackets)

Your alerting rules are similar to this: each rule starts with a dash, and then there are one or more settings below it.  Formatting your original example properly:

- receiver: 'database-pager'
  group_wait: 10s
  matchers:
    - service=~"mysql|cassandra"

This is a single rule within a list of rules (the first "-" marks it as a list element).
This rule has three settings: receiver, group_wait, and matchers.
matchers is itself a list.
There is one element in this list (also marked by a dash).
The content of that element is a string:
    service=~"mysql|cassandra"

Each element under "matchers" is a label matcher, written in a PromQL-style syntax.  In this case, it matches the label "service" against that regular expression, which matches either the value "mysql" or "cassandra".

Therefore: that rule matches alerts which have a label "service" with value "mysql" or "cassandra".  If the rule matches, then the alert is delivered to "database-pager" with group_wait of 10 seconds.  If the alert doesn't match, then it moves onto the next rule.  And if none of the rules match, it falls back to the default receiver selected elsewhere in the config.
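
Put in context, a fuller sketch might look like this (the "web-team" route is invented; only "database-pager" comes from your example):

route:
  receiver: 'default'          # fallback when no child route matches
  routes:
  - receiver: 'database-pager'
    group_wait: 10s
    matchers:
    - service=~"mysql|cassandra"
  - receiver: 'web-team'
    matchers:
    - service="web"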

It looks like I'm not expressing myself clearly, so perhaps someone else might be able to explain it more clearly than me.