restrict (respectively silence) alert rules to/for certain instances


Christoph Anton Mitterer

Apr 24, 2023, 9:59:54 PM
to Prometheus Users
Hey.

I have some trouble understanding how to do things right™ with respect
to alerting.

In principle I'd like to do two things:

a) have certain alert rules run only for certain instances
(though that may in practice actually be less needed, if only the
respective nodes generate the respective metrics - not sure yet
whether this will be the case)
b) silence certain (or all) alerts for a given set of instances
e.g. these may be nodes where I'm not an admin who can take action
on an incident, but just view the time series graphs to see what's
going on


As an example, I'll take an alert that fires when the root fs has >85%
usage:
groups:
  - name: node_alerts
    rules:
      - alert: node_free_fs_space
        expr: 100 - ((node_filesystem_avail_bytes{mountpoint="/",fstype!="rootfs"} * 100) / node_filesystem_size_bytes{mountpoint="/",fstype!="rootfs"}) >= 85



With respect to (a):
I could of course add yet another matcher like:
instance=~"someRegexThatDescribesMyInstances"
to each time series selector, but when that regex gets more complex,
everything becomes quite unreadable and it's quite error prone to forget
a place (assuming one has many alerts) when the regex changes.

Is there some way of defining host groups or similar? I.e. a central
place where I could define the list of hosts (or a regex for them), and
then just use the name of that definition in the actual alert rules?


With respect to (b):
Similarly to the above: if I had various instances for which I never
want to see any alerts, I could of course add a regex to all my
alerts.
But it seems quite ugly to clutter up all the rules with a potentially
long list/regex of things I don't want to see anyway.

Another idea I had was to do the filtering/silencing in the
alertmanager config at route level:
e.g. by adding an "ignore" route that matches via regex on all the
instances I'd like to silence (and has a mute_time_interval set to
24/7), before any other routes match.
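Roughly what I have in mind (names, the regex and the time interval are
just placeholders, field names as in recent Alertmanager versions):

route:
  receiver: mail
  routes:
    # catch the instances I want to silence, before any other route matches
    - matchers:
        - 'instance =~ "someRegexThatDescribesMyInstances"'
      mute_time_intervals: [always]

time_intervals:
  - name: always
    time_intervals:
      - weekdays: ['monday:sunday']   # i.e. 24/7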

But AFAIU this would only suppress the notification (e.g. mail); the
alert would still show up in the alertmanager web pages/etc. as firing.



Not sure whether anything can be done better via adding labels at some
stage.
- Doing external_labels: in the prometheus config doesn't seem to help
  here (only static values?).
- Same for labels: in <static_config> in the prometheus config.
- Setting some "noalerts" label via <relabel_config> in the prometheus
  config would also store that label in the TSDB, right?
  That I'd rather avoid.

- Maybe using:
    alerting:
      alert_relabel_configs:
        - <relabel_config>
  would work? Like matching hostnames on instance and replacing with
  e.g. "yes" in some "noalerts" target label?
  And then somehow using that in the alert rules...

But also sounds a bit ugly, TBH.


So... what's the proper way to do this? :-)


Thanks,
Chris.


btw: Is there any difference between:
1) alerting:
     alert_relabel_configs:
       - <relabel_config>
and
2) the relabel_configs: in <alertmanager_config>?

Brian Candler

Apr 25, 2023, 3:59:12 AM
to Prometheus Users
On Tuesday, 25 April 2023 at 02:59:54 UTC+1 Christoph Anton Mitterer wrote:
In principle I'd like to do two things:

a) have certain alert rules run only for certain instances
(though that may in practice actually be less needed, if only the
respective nodes generate the respective metrics - not sure yet
whether this will be the case)
b) silence certain (or all) alerts for a given set of instances
e.g. these may be nodes where I'm not an admin who can take action
on an incident, but just view the time series graphs to see what's
going on

"Silence" has a special meaning in Prometheus: it means a temporary override to sending out alerts in alertmanager (typically for maintenance periods).

So really I'd divide the possibilities 3 ways:

1. Prevent the alert being generated from prometheus in the first place, by writing the expr in such a way that it filters out conditions that you don't want to alert on

2. Let the alert arrive at alertmanager, but permanently prevent it from sending out notifications for certain instances

3. Apply a temporary silence in alertmanager for certain alerts or groups of alerts

(1) is done by writing your 'expr' to match only specific instances or to exclude specific instances

(2) is done by matching on labels in your alertmanager routing rules (and if necessary, by adding extra labels in your 'expr')

(3) is done by creating a silence in alertmanager through its UI or API (or a frontend like karma or alerta.io)


 

As an example, I'll take an alert that fires when the root fs has >85%
usage:
groups:
  - name: node_alerts
    rules:
      - alert: node_free_fs_space
        expr: 100 - ((node_filesystem_avail_bytes{mountpoint="/",fstype!="rootfs"} * 100) / node_filesystem_size_bytes{mountpoint="/",fstype!="rootfs"}) >= 85



With respect to (a):
I could of course add yet another matcher like:
instance=~"someRegexThatDescribesMyInstances"
to each time series selector, but when that regex gets more complex,
everything becomes quite unreadable and it's quite error prone to forget
a place (assuming one has many alerts) when the regex changes.

If you want to apply a threshold to only certain filesystems, and/or to have different thresholds per filesystem, then it's possible to put the thresholds in their own set of static timeseries:
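The general shape is something like this (just a sketch - the threshold metric name and the values are made up):

groups:
  - name: fs_thresholds
    rules:
      # one static "threshold" series per host/filesystem
      - record: fs_used_percent_threshold
        expr: vector(85)
        labels:
          instance: serverA
          mountpoint: /
      - record: fs_used_percent_threshold
        expr: vector(90)
        labels:
          instance: serverB
          mountpoint: /
  - name: fs_alerts
    rules:
      - alert: node_free_fs_space
        # only hosts/filesystems with a threshold series are alerted on,
        # and the threshold value comes from that series
        expr: |
          100 * (1 - node_filesystem_avail_bytes{fstype!="rootfs"} / node_filesystem_size_bytes)
            > on(instance, mountpoint) group_left fs_used_percent_threshold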


But I don't recommend this, and I find such alerts are brittle.  It helps to rethink exactly what you should be alerting on:

For the majority of cases: "alert on symptoms, rather than causes".  That is, alert when a service isn't *working* (which you always need to know about), and in those alerts you can include potential cause-based information (e.g. CPU load is high, RAM is full, database is down etc).

Now, there are also some things you want to know about *before* they become a problem, like "disk is nearly full".  But the trouble with static alerts is, they are a pain to manage.  Suppose you have a threshold at 85%, and you have one server which is consistently at 86% but not growing - you know this is the case, you have no need to grow the filesystem, so you end up tweaking thresholds per instance.

I would suggest two alternatives:

1. Check dashboards daily.  If you want automatic notifications then don't send the sort of alert which gets someone out of bed, but an "FYI" notification to something like Slack or Teams.

2. Write dynamic alerts, e.g. have alerting rules which identify disk usage which is growing rapidly and likely to fill in the next few hours or days.

- name: DiskRate10m
  interval: 1m
  rules:
  # Warn if rate of growth over last 10 minutes means filesystem will fill in 2 hours
  - alert: DiskFilling10m
    expr: |
        node_filesystem_avail_bytes / (node_filesystem_avail_bytes -
        (predict_linear(node_filesystem_avail_bytes{fstype!~"fuse.*|nfs.*"}[10m], 7200) < 0)) * 7200
    for: 20m
    labels:
      severity: critical
    annotations:
      summary: 'Filesystem will be full in {{ $value | humanizeDuration }} at current 10m growth rate'

- name: DiskRate3h
  interval: 10m
  rules:
  # Warn if rate of growth over last 3 hours means filesystem will fill in 2 days
  - alert: DiskFilling3h
    expr: |
        node_filesystem_avail_bytes / (node_filesystem_avail_bytes -
        (predict_linear(node_filesystem_avail_bytes{fstype!~"fuse.*|nfs.*"}[3h], 172800) < 0)) * 172800
    for: 6h
    labels:
      severity: warning
    annotations:
      summary: 'Filesystem will be full in {{ $value | humanizeDuration }} at current 3h growth rate'



Not sure whether anything can be done better via adding labels at some
stage.

As well as target labels, you can set labels in the alerting rules themselves, for when an alert fires. That doesn't help you filter the alert expr itself, but it can be useful when deciding how to route the notification in alertmanager.

Target labels are a decent way to classify machines, e.g. target labels for "development" and "production" mean that you can easily alert or dispatch alerts differently for those two environments.  But you should beware of changing them frequently, because every time the set of labels on a metric changes, it becomes a new timeseries.  This makes it hard to follow the history of the metric.
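For example, with static_configs a target label is attached like this (hostnames and label values are made up):

scrape_configs:
  - job_name: node
    static_configs:
      - targets: ['prod1:9100', 'prod2:9100']
        labels:
          environment: production
      - targets: ['dev1:9100']
        labels:
          environment: development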

If you want to do really clever stuff like classifying hosts dynamically, then you can do it by having *separate* timeseries for those classifications:


Again, unless you really need it, this is arguably getting "too clever" - and it will make the actual alerting rules more complex.

Christoph Anton Mitterer

Apr 25, 2023, 8:04:26 PM
to Prometheus Users
Hey Brian

On Tuesday, April 25, 2023 at 9:59:12 AM UTC+2 Brian Candler wrote:
So really I'd divide the possibilities 3 ways:

1. Prevent the alert being generated from prometheus in the first place, by writing the expr in such a way that it filters out conditions that you don't want to alert on

2. Let the alert arrive at alertmanager, but permanently prevent it from sending out notifications for certain instances

3. Apply a temporary silence in alertmanager for certain alerts or groups of alerts

(1) is done by writing your 'expr' to match only specific instances or to exclude specific instances

(2) is done by matching on labels in your alertmanager routing rules (and if necessary, by adding extra labels in your 'expr')

I think in my case (where I want to simply get no alerts at all for a certain group of instances) it would be (1) or (2), with (1) probably being the cleaner one.

I guess with (2) you also meant having a route which is then permanently muted?


If you want to apply a threshold to only certain filesystems, and/or to have different thresholds per filesystem, then it's possible to put the thresholds in their own set of static timeseries:


But I don't recommend this, and I find such alerts are brittle.

That also sounds like a bit of an over-engineered solution to me.
Thanks, but I'm not sure whether the above applies to my scenario.

For me it's really like this:
My Prometheus instance monitors:
- my "own" instances, where I need to react on things like >85% usage on root filesystem (and thus want to get an alert)
- "foreign" instances, where I just get the node exporter data and show e.g. CPU usage, IO usage, and so on as a convenience to users of our cluster - but any alert conditions wouldn't cause any further action on my side (and the guys in charge of those servers have their own monitoring)

So in the end it just boils down to my desire to keep my alert rules small/simple/readable.
   expr: 100 - ((node_filesystem_avail_bytes{mountpoint="/",fstype!="rootfs"} * 100) / node_filesystem_size_bytes{mountpoint="/",fstype!="rootfs"}) >= 85
=> would fire for all nodes, bad

   expr: 100 - ((node_filesystem_avail_bytes{mountpoint="/",fstype!="rootfs",instance=~"someRegexThatMatchesTheRighHosts"} * 100) / node_filesystem_size_bytes{mountpoint="/",fstype!="rootfs",instance=~"someRegexThatMatchesTheRighHosts"}) >= 85
=> would work, I guess, but seems really ugly to read/maintain


 
Not sure whether anything can be done better via adding labels at some
stage.

As well as target labels, you can set labels in the alerting rules themselves, for when an alert fires. That doesn't help you filter the alert expr itself, but it can be useful when deciding how to route the notification in alertmanager.

Those (target labels) are the ones that would get saved in the TSDB, right?
 
Target labels are a decent way to classify machines, e.g. target labels for "development" and "production" mean that you can easily alert or dispatch alerts differently for those two environments.  But you should beware of changing them frequently, because every time the set of labels on a metric changes, it becomes a new timeseries.  This makes it hard to follow the history of the metric.

Which is why I would rather not want to use them (for that purpose).

 
If you want to do really clever stuff like classifying hosts dynamically, then you can do it by having *separate* timeseries for those classifications:


Again, unless you really need it, this is arguably getting "too clever" - and it will make the actual alerting rules more complex.

Which (making the rules complex) is just what I want to avoid... plus, new time series means more storage usage.


From all that it seems to me that the "best" solution is either:
a) simply making more complex and error prone alert rules, that filter out the instances in the first place, like in:
   expr: 100 - ((node_filesystem_avail_bytes{mountpoint="/",fstype!="rootfs",instance=~"someRegexThatMatchesTheRighHosts"} * 100) / node_filesystem_size_bytes{mountpoint="/",fstype!="rootfs",instance=~"someRegexThatMatchesTheRighHosts"}) >= 85

b) The idea that I had above:
- using <alert_relabel_configs> to filter on the instances and add a label if it should be silenced
- use only that label in the expr instead of the full regex
But would that even work?
Because the documentation says "Alert relabeling is applied to alerts before they are sent to the Alertmanager."... but the alert rules have already been evaluated by then, right?


Thanks,
Chris.

Brian Candler

Apr 26, 2023, 3:14:35 AM
to Prometheus Users
> I guess with (2) you also meant having a route which is then permanently muted?

I'd use a route with a null receiver (i.e. a receiver which has no <transport>_configs under it)
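i.e. something along these lines (receiver names and the matcher are placeholders):

receivers:
  - name: mail
    email_configs:
      - to: 'admins@example.org'
  - name: blackhole      # the "null" receiver: no <transport>_configs at all

route:
  receiver: mail
  routes:
    # alerts from these instances go to the null receiver, so nothing is ever sent
    - matchers:
        - 'instance =~ "foreign-.*"'
      receiver: blackhole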

> b) The idea that I had above:
> - using <alert_relabel_configs> to filter on the instances and add a label if it should be silenced
> - use only that label in the expr instead of the full regex
> But would that even work?

No, because as far as I know alert_relabel_configs is done *after* the alert is generated from the alerting rule. It's only used to add extra labels before sending the generated alert to alertmanager. (It occurs to me that it *might* be possible to use 'drop' rules here to discard alerts; that would be a very confusing config IMO)
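For completeness, such a drop rule would look roughly like this (the regex is just a placeholder) - the alerts would then never reach alertmanager at all, though they would still show as firing in Prometheus itself:

alerting:
  alert_relabel_configs:
    # drop any alert whose instance matches, before it is sent to alertmanager
    - source_labels: [instance]
      regex: 'foreign-.*'
      action: drop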

> For me it's really like this:
> My Prometheus instance monitors:
> - my "own" instances, where I need to react on things like >85% usage on root filesystem (and thus want to get an alert)
> - "foreign" instances, where I just get the node exporter data and show e.g. CPU usage, IO usage, and so on as a convenience to users of our cluster - but any alert conditions wouldn't cause any further action on my side (and the guys in charge of those servers have their own monitoring)

In this situation, and if you are using static_configs or file_sd_configs to identify the hosts, then I would simply use a target label (e.g. "owner") to distinguish which targets are yours and which are foreign; or I would use two different scrape jobs for self and foreign (which means the "job" label can be used to distinguish them)
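The two-jobs variant would be roughly this (job names and targets are made up), after which the alert exprs can simply filter on job="node_self":

scrape_configs:
  - job_name: node_self        # my own machines
    static_configs:
      - targets: ['mynode1:9100', 'mynode2:9100']
  - job_name: node_foreign     # machines I only graph, never alert on
    static_configs:
      - targets: ['theirnode1:9100']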

The storage cost of having extra labels in the TSDB is essentially zero, because it's the unique combination of labels that identifies the timeseries - the bag of labels is mapped to an integer ID I believe.  So the only problem is if this label changes often, and to me it sounds like a 'local' or 'foreign' instance remains this way indefinitely.

If you really want to keep these labels out of the metrics, then having a separate timeseries with metadata for each instance is the next-best option. Suppose you have a bunch of metrics with an 'instance' label, e.g.

node_filesystem_free_bytes{instance="bar", ...}
node_filesystem_size_bytes{instance="bar", ...}
...

as the actual metrics you're monitoring, then you create one extra static timeseries per host (instance) like this:

meta{instance="bar",owner="self",site="london"} 1

(aside: TSDB storage for this will be almost zero, because of the delta-encoding used). These can be created by scraping a static webserver, or by using recording rules.
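As a sketch, the recording-rules variant could look like this (one rule per instance; all label values are made up):

groups:
  - name: meta
    rules:
      - record: meta
        expr: vector(1)
        labels:
          instance: bar
          owner: self
          site: london
      - record: meta
        expr: vector(1)
        labels:
          instance: baz
          owner: foreign
          site: munich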

Then your alerting rules can be like this:

expr: |
  (
     ... normal rule here ...
  ) * on(instance) group_left(site) meta{owner="self"}

The join will:
* Limit alerting to those hosts which have a corresponding 'meta' timeseries (matched on 'instance') and which has label owner="self"
* Add the "site" label to the generated alerts
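Applied to your example rule, that would be something like:

- alert: node_free_fs_space
  expr: |
    (
      100 - ((node_filesystem_avail_bytes{mountpoint="/",fstype!="rootfs"} * 100)
               / node_filesystem_size_bytes{mountpoint="/",fstype!="rootfs"}) >= 85
    ) * on(instance) group_left(site) meta{owner="self"}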

Beware that:

1. this will suppress alerts for any host which does not have a corresponding 'meta' timeseries. It's possible to work around this to default to sending rather than not sending alerts, but makes the expressions more complex:

2.  the "instance" labels must match exactly. So for example, if you're currently scraping with the default label instance="foo:9100" then you'll need to change this to instance="foo" (which is good practice anyway).  See

(I use some relabel_configs tricks for this; examples posted in this group previously)
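The general shape of that trick is something like this (a sketch, not necessarily the exact rules from those posts):

scrape_configs:
  - job_name: node
    relabel_configs:
      # use the bare hostname (without the port) as the instance label
      - source_labels: [__address__]
        regex: '([^:]+)(?::\d+)?'
        target_label: instance
        replacement: '$1'
    static_configs:
      - targets: ['foo:9100', 'bar:9100']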

> From all that it seems to me that the "best" solution is either:
> a) simply making more complex and error prone alert rules, that filter out the instances in the first place, like in:
>    expr: 100 - ((node_filesystem_avail_bytes{mountpoint="/",fstype!="rootfs",instance=~"someRegexThatMatchesTheRighHosts"} * 100) / node_filesystem_size_bytes{mountpoint="/",fstype!="rootfs",instance=~"someRegexThatMatchesTheRighHosts"}) >= 85

That's not great, because as you observe it will become more and more complex over time; and in any case won't work if you want to treat certain combinations of labels differently (e.g. stop alerting on a specific *filesystem* on a specific host)

If you really don't want to use either of the solutions I've given above, then another way is to write some code to preprocess your alerting rules, i.e. expand a single template rule into a bunch of separate rules, based on your own templates and data sources.

HTH,

Brian.

Brian Candler

Apr 26, 2023, 5:29:24 AM
to Prometheus Users
P.S. Your expression

>    expr: 100 - ((node_filesystem_avail_bytes{mountpoint="/",fstype!="rootfs",instance=~"someRegexThatMatchesTheRighHosts"} * 100) / node_filesystem_size_bytes{mountpoint="/",fstype!="rootfs",instance=~"someRegexThatMatchesTheRighHosts"}) >= 85

can be simplified to:

>    expr: 100 - ((node_filesystem_avail_bytes{mountpoint="/",fstype!="rootfs",instance=~"someRegexThatMatchesTheRighHosts"} * 100) / node_filesystem_size_bytes) >= 85

That's because the result instant vector for an expression like "foo / bar" only includes entries where the label sets match on left and right hand sides.  Any others are dropped silently.  (This form may be slightly less efficient, but I wouldn't expect it to be a problem unless you have hundreds of thousands of filesystems)

I would be inclined to simplify it further to:

>    expr: node_filesystem_avail_bytes{mountpoint="/",fstype!="rootfs",instance=~"someRegexThatMatchesTheRighHosts"} / node_filesystem_size_bytes < 0.15

You can use {{ $value | humanizePercentage }} in your alert annotations to show readable percentages.
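e.g. something like (a sketch):

- alert: node_free_fs_space
  expr: node_filesystem_avail_bytes{mountpoint="/",fstype!="rootfs"} / node_filesystem_size_bytes < 0.15
  annotations:
    summary: 'Only {{ $value | humanizePercentage }} free on {{ $labels.mountpoint }} at {{ $labels.instance }}'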

Christoph Anton Mitterer

Apr 27, 2023, 7:16:41 PM
to Prometheus Users
On Wednesday, April 26, 2023 at 9:14:35 AM UTC+2 Brian Candler wrote:
> I guess with (2) you also meant having a route which is then permanently muted?

I'd use a route with a null receiver (i.e. a receiver which has no <transport>_configs under it)

Ah, interesting. It wasn't even clear to me from the documentation that this works, but as you say - it does.

Nevertheless, it only suppresses the notifications; e.g. within the Alertmanager the alerts would still show up as firing (as expected).

 
> b) The idea that I had above:
> - using <alert_relabel_configs> to filter on the instances and add a label if it should be silenced
> - use only that label in the expr instead of the full regex
> But would that even work?

No, because as far as I know alert_relabel_configs is done *after* the alert is generated from the alerting rule.

I had already assumed so from the documentation... thanks for confirming.

 
It's only used to add extra labels before sending the generated alert to alertmanager. (It occurs to me that it *might* be possible to use 'drop' rules here to discard alerts; that would be a very confusing config IMO)

What do you mean by drop rules?

 
> For me it's really like this:
> My Prometheus instance monitors:
> - my "own" instances, where I need to react on things like >85% usage on root filesystem (and thus want to get an alert)
> - "foreign" instances, where I just get the node exporter data and show e.g. CPU usage, IO usage, and so on as a convenience to users of our cluster - but any alert conditions wouldn't cause any further action on my side (and the guys in charge of those servers have their own monitoring)

In this situation, and if you are using static_configs or file_sd_configs to identify the hosts, then I would simply use a target label (e.g. "owner") to distinguish which targets are yours and which are foreign; or I would use two different scrape jobs for self and foreign (which means the "job" label can be used to distinguish them)

I had thought about that too, but the downside would be that I have to "hardcode" this into the labels within the TSDB. Even if storage is not a concern, what might happen sometimes is that a formerly "foreign" server moves into my responsibility.
Then I think things would get messy.

In general, TBH, it's also not really clear to me what the best practice is in terms of scrape jobs:

At one time I planned to use them to "group" servers that somehow belong together, e.g. in the case of a job for data from the node exporter, I would have made node_storage_servers, node_compute_servers or something like that.
But then I felt this could actually cause trouble later on, e.g. when I want to filter time series based on the job (or, as above, when a server changes its role).

So right now I put everything (from one exporter) in one job.
Not really sure whether this is stupid or not ;-)

 
The storage cost of having extra labels in the TSDB is essentially zero, because it's the unique combination of labels that identifies the timeseries - the bag of labels is mapped to an integer ID I believe.  So the only problem is if this label changes often, and to me it sounds like a 'local' or 'foreign' instance remains this way indefinitely.

Arguably, for that particular use case, it would be quite rare for it to change.
But for the node_storage_servers vs. node_compute_servers case... it would actually happen quite often in my environment.
 

If you really want to keep these labels out of the metrics, then having a separate timeseries with metadata for each instance is the next-best option. Suppose you have a bunch of metrics with an 'instance' label, e.g.

node_filesystem_free_bytes{instance="bar", ...}
node_filesystem_size_bytes{instance="bar", ...}
...

as the actual metrics you're monitoring, then you create one extra static timeseries per host (instance) like this:

meta{instance="bar",owner="self",site="london"} 1

(aside: TSDB storage for this will be almost zero, because of the delta-encoding used). These can be created by scraping a static webserver, or by using recording rules.

Then your alerting rules can be like this:

expr: |
  (
     ... normal rule here ...
  ) * on(instance) group_left(site) meta{owner="self"}

The join will:
* Limit alerting to those hosts which have a corresponding 'meta' timeseries (matched on 'instance') and which has label owner="self"
* Add the "site" label to the generated alerts

Beware that:

1. this will suppress alerts for any host which does not have a corresponding 'meta' timeseries. It's possible to work around this to default to sending rather than not sending alerts, but makes the expressions more complex:

2.  the "instance" labels must match exactly. So for example, if you're currently scraping with the default label instance="foo:9100" then you'll need to change this to instance="foo" (which is good practice anyway).  See

That's a pretty neat idea. So meta would basically serve as in-metric encoded information about instance grouping/ownership/etc.?

This should go into some howto or BCP document.
 

(I use some relabel_configs tricks for this; examples posted in this group previously)

For the port removal, I came up with my own solution (based on some others I had found) ... and asked for inclusion in the documentation:

Not sure if you'd find mine proper ;-)

 
> From all that it seems to me that the "best" solution is either:
> a) simply making more complex and error prone alert rules, that filter out the instances in the first place, like in:
>    expr: 100 - ((node_filesystem_avail_bytes{mountpoint="/",fstype!="rootfs",instance=~"someRegexThatMatchesTheRighHosts"} * 100) / node_filesystem_size_bytes{mountpoint="/",fstype!="rootfs",instance=~"someRegexThatMatchesTheRighHosts"}) >= 85

That's not great, because as you observe it will become more and more complex over time; and in any case won't work if you want to treat certain combinations of labels differently (e.g. stop alerting on a specific *filesystem* on a specific host)

If you really don't want to use either of the solutions I've given above, then another way is to write some code to preprocess your alerting rules, i.e. expand a single template rule into a bunch of separate rules, based on your own templates and data sources.

It would be nice if Prometheus had some more built-in means for things like this. I mean, your solution with meta above is nice, but it also adds some complexity (which someone reading my config would first need to understand)... plus there are, as you said, pitfalls, like your point (1) above.
 
HTH,

It did, and was greatly appreciated :-)

Thanks,
Chris.

PS: Thanks also for your hints on how to improve the expression (which I had merely copied and pasted from the full node exporter Grafana dashboard). O:-)
Message has been deleted

Brian Candler

Apr 29, 2023, 4:33:16 AM
to Prometheus Users
Does anybody know why Google Groups occasionally deletes messages? The previous reply I wrote now shows as "Message has been deleted". And as I wrote it in the web interface, I don't have a copy :-(