Alertmanager configuration: routes

Thomas Schneider

Sep 2, 2021, 6:52:20 AM
to Prometheus Users
Hello,

can you please advise what is represented by a "service" in the Alertmanager configuration, e.g.
routes:
  # All alerts with service=mysql or service=cassandra
  # are dispatched to the database pager.
  - receiver: 'database-pager'
    group_wait: 10s
    matchers:
      - service=~"mysql|cassandra"

Where do I find the service in the rules or in Prometheus -> Alerts?

THX

Brian Candler

Sep 2, 2021, 7:58:11 AM
to Prometheus Users
It looks like "service" is a label that you have set in the prometheus alerting rule.
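
For example, a hypothetical rule that sets such a label might look something like this (the alert name, expression and label values here are made up for illustration, not taken from your setup):

groups:
- name: mysql.rules
  rules:
  - alert: MySQLDown
    expr: mysql_up == 0
    for: 5m
    labels:
      # this is the label that the service=~"mysql|cassandra" matcher would match on
      service: mysql
      severity: critical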

Thomas Schneider

Sep 2, 2021, 9:05:22 AM
to Prometheus Users
Hello,

I have defined several rule files, e.g. this general.rules.yml:
groups:
- name: general.rules
  rules:
  - alert: TargetDown
    annotations:
      message: '{{ printf "%.4g" $value }}% of the {{ $labels.job }}/{{ $labels.instance
        }} instances are down.'
    expr: 100 * (count(up == 0) BY (job, instance) / count(up) BY (job,
      instance)) > 10
    for: 10m
    labels:
      severity: warning

However, I don't see the correlation to service.

Brian Candler

Sep 2, 2021, 9:22:55 AM
to Prometheus Users
Correct, that expression will only give "job" and "instance" labels.

I don't think your alertmanager rule will ever match on this alert.

Thomas Schneider

Sep 2, 2021, 10:48:57 AM
to Prometheus Users
What should be the configuration in alertmanager.yml to match this rule?

Brian Candler

Sep 2, 2021, 1:18:37 PM
to Prometheus Users
Remove the match on service=~"mysql|cassandra" in your routing rule.

I'm not saying with 100% certainty that your alert *doesn't* have a service=xxx label; it's possible that it was added via other means, such as external_labels or alert_relabel_configs.  If you go into the prometheus or alertmanager web interface, you can see active alerts and their labels, so you'll know what you have.
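
For example, a hypothetical prometheus.yml fragment like the following would add a "service" label to every alert that this Prometheus sends, even though it never appears in the alerting rule itself:

global:
  external_labels:
    service: mysql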

There was a nice web-based interface for testing alertmanager routing rules here:
https://prometheus.io/webtools/alerting/routing-tree-editor/
but it doesn't seem to work properly any more.

Thomas Schneider

Sep 3, 2021, 3:20:49 AM
to Prometheus Users
It's clear that the config
- service=~"mysql|cassandra"
does not match the rule.
This was just an example.

But this question is still open:
What must the alertmanager config be for this rule?
groups:
- name: general.rules
  rules:
  - alert: TargetDown
    annotations:
      message: '{{ printf "%.4g" $value }}% of the {{ $labels.job }}/{{ $labels.instance
        }} instances are down.'
    expr: 100 * (count(up == 0) BY (job, instance) / count(up) BY (job,
      instance)) > 10
    for: 10m
    labels:
      severity: warning

Brian Candler

Sep 3, 2021, 4:09:56 AM
to Prometheus Users
The only labels you can match on from that rule are "severity: warning", and the "job" and "instance" labels.


> What must the alertmanager config be for this rule?

You don't need *any* matching rules in alertmanager.  At simplest, you can just have

route:
  receiver: default

receivers:
- name: default
  email_configs:
  - to: 'alerts@example.com'    # placeholder recipient; use your own address
    send_resolved: true

Any more than that, and it depends on your business requirements.  Do you want all alerts with severity "warning" to be treated differently?  Use a routing rule (in the "routes" section under "route").  Do you want a certain subset of targets to be handled by a particular team? Then either add a label in the alerting rules themselves, or ensure that those targets already have a particular label in their scrape config, and match that label in the "routes" section.
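
As a rough sketch of what that could look like (the receiver names and the "team" label are invented for illustration, and each receiver would still need its email_configs, pagerduty_configs or similar filled in):

route:
  receiver: default
  routes:
  # alerts carrying severity="warning" go to a separate receiver
  - matchers:
    - severity="warning"
    receiver: warnings-mail
  # alerts labelled team="oncall" (via the rule or the scrape config) go to the pager
  - matchers:
    - team="oncall"
    receiver: oncall-pager

receivers:
- name: default
- name: warnings-mail
- name: oncall-pager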

Brian Candler

Sep 3, 2021, 4:13:24 AM
to Prometheus Users
Note that an "alertname" label is added automatically, so you could match on alertname="TargetDown" if you want.  Doesn't scale very well, but with a small number of rules that approach will get you started.
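
A minimal sketch of such a route (the receiver name "targetdown-mail" is invented):

route:
  receiver: default
  routes:
  - matchers:
    - alertname="TargetDown"
    receiver: targetdown-mail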

If you go to your prometheus web interface, at prometheus:9090, and click on the "Alerts" tab at the top, then you can see firing alerts, including all the labels on them.

Thomas Schneider

Sep 3, 2021, 4:26:35 AM
to Prometheus Users
This means that "alert" in the Prometheus rules config is equal to "service" in the Alertmanager config?

Brian Candler

Sep 3, 2021, 1:47:22 PM
to Prometheus Users
No, definitely not. There is no such thing as "service" built into the Prometheus or Alertmanager configuration.

But if you wish, you can have a label on your timeseries called "service", or called "environment", or anything you like.  You can add labels at scrape time:

scrape_configs:
  - job_name: node
    scrape_interval: 1m
    static_configs:
      - targets:
          - bar:9100
          - baz:9100
        # these labels are added to every timeseries scraped from those targets
        labels:
          environment: prod

(note that "job" and "instance" labels are also added automatically as part of the scrape; the remaining labels come from the exporter).

Or you can add a label in your alerting rule:

groups:
- name: UpDown
  rules:
  - alert: UpDown
    expr: up == 0
    for: 3m
    # these labels are added to every alert generated from this rule
    labels:
      environment: prod

Note: it would be unusual to add label "environment: prod" in an alerting rule, but adding a label like "severity: critical" or "team: oncall" is more common - something which is specific to that alert, rather than the server.

In either of these cases, the alert which arrives at alertmanager will have the given labels on it.  Hence you can match on it in alertmanager, to decide how to route the alert.
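
Tying those together, a route that picks up the "environment" label added above might look like this (the receiver name "prod-team" is invented):

routes:
- matchers:
  - environment="prod"
  receiver: prod-team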

Brian Candler

Sep 3, 2021, 2:05:37 PM
to Prometheus Users
And I forgot to say: given an alerting rule like

  - alert: UpDown
    expr: up == 0
    for: 3m

then the label alertname="UpDown" is also added automatically (similar to how "job" and "instance" labels are added automatically at scrape time).

So at the end, you have a mixture of labels from the exporter, plus system-generated labels like "job" and "instance" and "alertname", plus any labels you've chosen to add yourself.  The "matchers" in alertmanager can match any of these.
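
Purely as an illustration (continuing the hypothetical "prod-team" receiver from above), a single route can require several of those labels at once - all matchers in the list must match:

- matchers:
  - alertname="UpDown"       # system-generated
  - job="node"               # added at scrape time
  - environment="prod"       # added by you in the rule or scrape config
  receiver: prod-team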

Thomas Schneider

Sep 10, 2021, 7:53:47 AM
to Prometheus Users
Thanks for this information.

If my understanding is correct, the alert name, specified in file rules.yml with parameter -alert: <alertname>, must be used in file alertmanager.yml with parameter -service=<alertname>. Optionally one could add labels in rules.yml, e.g. team: oncall, and then use this with -service="team: oncall".

Is this correct?

Brian Candler

Sep 10, 2021, 10:37:35 AM
to Prometheus Users
Sorry, I don't know what you mean by parameter -service=<alertname> or parameter -service="team: oncall".

I still don't know where you're getting "service" from here.  There are only labels, which are name/value pairs.  Each alert can carry one or more labels, e.g.

alertname="foo"    # label "alertname" with value "foo"
team="oncall"     # label "team" with value "oncall"
service="web"    # label "service" with value "web"

Something like "service=team=oncall" doesn't make any sense.

As for the dash, I think you may be confused by YAML syntax.  A dash starts a member of a list.  For example:

colours:
  - red
  - green
  - blue

which can also be written in YAML as

colours: [red, green, blue]

and is equivalent to the following JSON:

{"colors": ["red", "green", "blue"]}

If a list contains objects, then the start of each object is marked by a dash. e.g.

shirts:
  - colour: red
    size: small
  - colour: red
    size: medium
  - colour: green
    size: medium

(Note how important getting the alignment right is!)

This equates to JSON:

{"shirts": [
  {"color":"red", "size":"small"},
  {"color":"red", "size":"medium"},
  {"color":"green", "size":"medium"}
]}

(which doesn't care about alignment because there are explicit opening and closing braces and brackets)

Your alerting rules are similar to this: each rule starts with a dash, and then there are one or more settings below it.  Formatting your original example properly:

- receiver: 'database-pager'
  group_wait: 10s
  matchers:
    - service=~"mysql|cassandra"

This is a single rule within a list of rules (the first "-" marks it as a list element).
This rule has three settings: receiver, group_wait, and matchers.
matchers is itself a list.
There is one element in this list (also marked by a dash).
The content of that element is a string:
    service=~"mysql|cassandra"

Each element under "matchers" is a label matcher, written in a PromQL-style syntax.  In this case, it matches the label "service" against that regular expression, which matches either the value "mysql" or "cassandra".

Therefore: that rule matches alerts which have a label "service" with value "mysql" or "cassandra".  If the rule matches, then the alert is delivered to "database-pager" with group_wait of 10 seconds.  If the alert doesn't match, then it moves onto the next rule.  And if none of the rules match, it falls back to the default receiver selected elsewhere in the config.
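
Put in context, a fuller sketch might look like this (the "web-team" route is invented; only "database-pager" comes from your example):

route:
  receiver: 'default'          # fallback when no child route matches
  routes:
  - receiver: 'database-pager'
    group_wait: 10s
    matchers:
    - service=~"mysql|cassandra"
  - receiver: 'web-team'
    matchers:
    - service="web"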

It looks like I'm not expressing myself clearly, so perhaps someone else might be able to explain it more clearly than me.