Different disk full thresholds for alerts


R. Diez

Nov 20, 2021, 5:20:11 PM
to Prometheus Users
Hi all:

I have a number of computers, and each has a number of disks. The metrics I am interested in are like this:

windows_logical_disk_free_bytes{instance="PC1",volume="HarddiskVolume1"}
windows_logical_disk_free_bytes{instance="PC1",volume="HarddiskVolume2"}
windows_logical_disk_free_bytes{instance="PC1",volume="HarddiskVolume3"}
windows_logical_disk_free_bytes{instance="PC2",volume="HarddiskVolume1"}
windows_logical_disk_free_bytes{instance="PC2",volume="HarddiskVolume2"}
windows_logical_disk_free_bytes{instance="PC2",volume="HarddiskVolume3"}
...

I am using an alert like this, which I found on the Internet:

  - alert: DiskSpaceUsageOnWindows
    expr: 100.0 - 100 * (windows_logical_disk_free_bytes / windows_logical_disk_size_bytes) > 50
    for: 10m
    labels:
      severity: warning
    annotations:
      summary: "Disk Space Usage (instance {{ $labels.instance }})"

The trouble is, I want a different alert threshold depending on the disk, and the thresholds can be pretty arbitrary.

What is the best way to achieve that?

If using arbitrary values is hard in Prometheus, I can live with a few predefined groups like these:

alarm_threshold_for_disk_full_a: 50%
alarm_threshold_for_disk_full_b: 90%

 I could then add particular labels to specific disk metrics. For example, I could add "alarm_threshold_for_disk_full_a" to disks {"PC1", "HarddiskVolume2"} and {"PC2", "HarddiskVolume3"}.

Then I can write an alert rule for each label.
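
For illustration, I imagine a rule for one of those groups would look roughly like this (untested; I use a single label name "alarm_threshold_for_disk_full" with values "a"/"b" instead of one label name per group, and I assume the label only gets added to the free-bytes metric, hence the ignoring()):

  - alert: DiskSpaceUsageOnWindowsThresholdA
    expr: |
      100.0 - 100 * (
        windows_logical_disk_free_bytes{alarm_threshold_for_disk_full="a"}
          / ignoring(alarm_threshold_for_disk_full)
        windows_logical_disk_size_bytes
      ) > 50
    for: 10m
    labels:
      severity: warning
    annotations:
      summary: "Disk Space Usage (instance {{ $labels.instance }}, volume {{ $labels.volume }})"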

I have seen a way to add a label to all metrics of a particular target, but how do I add a label to a particular metric in a particular target? Do I have to resort to some "metric_relabel_configs" magic? Could someone provide an example on how to do that?

Thanks in advance,
  rdiez

Brian Candler

Nov 21, 2021, 5:15:51 AM
to Prometheus Users
On Saturday, 20 November 2021 at 22:20:11 UTC rdie...@gmail.com wrote:
The trouble is, I want a different alert threshold depending on the disk, and the thresholds can be pretty arbitrary.

What is the best way to achieve that?


However, I've found it's better not to have static alert thresholds anyway.  The problem is: a volume hits 90% full, but it's working fine,  and isn't growing, and nobody wants to mess with data just to bring it back down to 89% to silence the alert.  You obviously don't want the alert firing forever, so what do you do?  Move the threshold to 91%, and repeat the whole thing later?

Instead, I have two sets of disk space alerts.

* A critical alert for "the disk is full, or near as dammit" (less than 100MB free), for any filesystem whose capacity is more than 120MB. This fires almost immediately.

  - alert: DiskFull
    expr: |
      node_filesystem_avail_bytes{fstype!~"fuse.*|nfs.*"} < 100000000 unless node_filesystem_size_bytes < 120000000
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: 'Filesystem full or less than 100MB free space'

* Warning alerts for "the disk is filling up, and at this rate is going to be full soon"

- name: DiskRate10m
  interval: 1m
  rules:
  # Warn if rate of growth over last 10 minutes means filesystem will fill in 2 hours
  - alert: DiskFilling
    expr: |
      predict_linear(node_filesystem_avail_bytes{fstype!~"fuse.*|nfs.*"}[10m], 7200) < 0
    for: 20m
    labels:
      severity: warning
    annotations:
      summary: 'Filesystem will be full in less than 2h at current 10m growth rate'

- name: DiskRate3h
  interval: 10m
  rules:
  # Warn if rate of growth over last 3 hours means filesystem will fill in 2 days
  - alert: DiskFilling
    expr: |
      predict_linear(node_filesystem_avail_bytes{fstype!~"fuse.*|nfs.*"}[3h], 2*86400) < 0
    for: 6h
    labels:
      severity: warning
    annotations:
      summary: 'Filesystem will be full in less than 2d at current 3h growth rate'

- name: DiskRate12h
  interval: 1h
  rules:
  # Warn if rate of growth over last 12 hours means filesystem will fill in 7 days
  - alert: DiskFilling
    expr: |
      predict_linear(node_filesystem_avail_bytes{fstype!~"fuse.*|nfs.*"}[12h], 7*86400) < 0
    for: 24h
    labels:
      severity: warning
    annotations:
      summary: 'Filesystem will be full in less than 1w at current 12h growth rate'

These are evaluated over different time periods and with different "for" periods, to reduce noise from filesystems which have a regular filling and emptying pattern.  For example, I see some systems where the disk space grows and shrinks in an hourly or daily pattern.

I would like to rework those expressions so that they return the estimated time-until-full (i.e. work out where the zero crossing takes place), but I never got round to it.

In practice I also have to exclude a few noisy systems by applying more label filters:
node_filesystem_avail_bytes{fstype!~"...",instance!~"...",mountpoint!~"..."}

HTH,

Brian.

Brian Candler

Nov 21, 2021, 11:49:23 AM
to Prometheus Users
I *think* these calculate the time-to-full:

- name: DiskRate10m
  interval: 1m
  rules:
  # Warn if rate of growth over last 10 minutes means filesystem will fill in 2 hours
  - alert: DiskFilling10m
    expr: |
        node_filesystem_avail_bytes / (node_filesystem_avail_bytes -
        (predict_linear(node_filesystem_avail_bytes{fstype!~"fuse.*|nfs.*"}[10m], 7200) < 0)) * 7200
    for: 20m
    labels:
      severity: warning
    annotations:
      summary: 'Filesystem will be full in {{ $value | humanizeDuration }} at current 10m growth rate'

- name: DiskRate3h
  interval: 10m
  rules:
  # Warn if rate of growth over last 3 hours means filesystem will fill in 2 days
  - alert: DiskFilling3h
    expr: |
        node_filesystem_avail_bytes / (node_filesystem_avail_bytes -
        (predict_linear(node_filesystem_avail_bytes{fstype!~"fuse.*|nfs.*"}[3h], 172800) < 0)) * 172800
    for: 6h
    labels:
      severity: warning
    annotations:
      summary: 'Filesystem will be full in {{ $value | humanizeDuration }} at current 3h growth rate'

- name: DiskRate12h
  interval: 1h
  rules:
  # Warn if rate of growth over last 12 hours means filesystem will fill in 7 days
  - alert: DiskFilling12h
    expr: |
        node_filesystem_avail_bytes / (node_filesystem_avail_bytes -
        (predict_linear(node_filesystem_avail_bytes{fstype!~"fuse.*|nfs.*"}[12h], 604800) < 0)) * 604800
    for: 24h
    labels:
      severity: warning
    annotations:
      summary: 'Filesystem will be full in {{ $value | humanizeDuration }} at current 12h growth rate'

R. Diez

Nov 21, 2021, 3:20:24 PM
to Prometheus Users
First of all, thanks for your answers.


I do not understand how that article can help me. If I understood that correctly, it seems to describe a way for different teams to define their own thresholds.

But I have no teams, I am a lone fighter. 8-) I just want to assign different thresholds to different disks.

> [...]
> You obviously don't want the alert firing forever, so what do you do?
> Move the threshold to 91%, and repeat the whole thing later?

Say I have 3 thresholds:
- only 100 MB left free
- 90 % full
- 95 % full

If a disk hits 90 %, I may then decide to move it to the 95 % group.

Initially, I thought I could create one alarm per threshold, and each alarm would only trigger for its particular threshold label. I could then add a label to each computer, in order to place all of its disks in one of the threshold groups.

But later on I realised that I would need to add a label to each disk (to each disk metric). For example, some computers have several disks. The thresholds for the system disks (where the OS is installed) are generally different from the thresholds for user data disks.

At that point I thought I could add the corresponding threshold label to each separate disk, but that's what I am struggling to do in Prometheus. Do I have to resort to some "metric_relabel_configs" magic? Could you provide an example on how to do that?

Best regards,
  rdiez

Brian Candler

Nov 21, 2021, 4:36:03 PM
to Prometheus Users
On Sunday, 21 November 2021 at 20:20:24 UTC rdie...@gmail.com wrote:
First of all, thanks for your answers.


I do not understand how that article can help me. If I understood that correctly, it seems to describe a way for different teams to define their own thresholds.

It shows the general idea of using one timeseries as the threshold for another.  You can forget the "teams" idea entirely.  Just create a new timeseries called, say, "windows_logical_disk_used_threshold_percent".  Give it the same set of labels as "windows_logical_disk_free_bytes" - at least 'instance' and 'volume'.  Then alert on:

100.0 - 100 * (windows_logical_disk_free_bytes / windows_logical_disk_size_bytes) > windows_logical_disk_used_threshold_percent

That's it.  If the label sets are not exactly the same, in particular if windows_logical_disk_free_bytes has some extra labels which you want to ignore, then use ignoring(...) on the extra labels.
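
For example, if the threshold series were scraped under a different job (just an assumed example of a mismatching label), that would be:

100.0 - 100 * (windows_logical_disk_free_bytes / windows_logical_disk_size_bytes)
  > ignoring(job) windows_logical_disk_used_threshold_percent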
 
You can generate the static timeseries by using node_exporter textfile collector, or by putting up a static web page somewhere containing all the threshold metrics and scraping it.  Either way you'll be overriding the "instance" label, so you'll need the "honor_labels" setting in the scrape job, or else some relabelling.
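
For instance (a sketch only; the host name, port and path are made up), the static page would just contain text-format metrics like:

windows_logical_disk_used_threshold_percent{instance="PC1",volume="HarddiskVolume1"} 50
windows_logical_disk_used_threshold_percent{instance="PC1",volume="HarddiskVolume2"} 90
windows_logical_disk_used_threshold_percent{instance="PC2",volume="HarddiskVolume3"} 90

and the scrape job would look something like:

  - job_name: 'disk_thresholds'
    honor_labels: true        # keep the instance/volume labels from the page
    metrics_path: /thresholds.prom
    static_configs:
      - targets: ['some-web-server.internal:8080']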


At that point I thought I could add the corresponding threshold label to each separate disk, but that's what I am struggling to do in Prometheus. Do I have to resort to some "metric_relabel_configs" magic? Could you provide an example on how to do that?


I think that's a poor way to do it.  Firstly, you're hard-coding information on *how to alert* into the metrics themselves.  The metrics should purely represent the information collected, since they may be used by different systems for different purposes.  Secondly, if you change which threshold a disk uses, the labelsets will change, which means they become different timeseries.

R. Diez

Nov 21, 2021, 5:03:36 PM
to Prometheus Users
> Just create a new timeseries called, say, "windows_logical_disk_used_threshold_percent".
> Give it the same set of labels as "windows_logical_disk_free_bytes" - at least 'instance' and 'volume'.

I understand now. The trick is that the labels match between the timeseries. Labels 'instance' and 'volume' would match between disk metric and threshold metric, so each disk gets mapped to its threshold.

It looks workable, and is probably cleaner in a theoretical way. But it looks complicated. You have to learn about 'ignoring' and 'honor_labels'. You have to scrape an artificial metric, and worry about details such as how to get a warning if you forget one of the disk volumes in the threshold metric.

I think that adding a label per disk metric is simpler, at least for a very small network like mine. I guess it would be very simple then to write an alert for disks where you forgot a threshold label.
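
Something like this, I guess (untested, and assuming the threshold label from my earlier example):

  - alert: DiskMissingThresholdLabel
    expr: windows_logical_disk_free_bytes{alarm_threshold_for_disk_full=""}
    for: 1h
    labels:
      severity: warning
    annotations:
      summary: "No disk-full threshold assigned (instance {{ $labels.instance }}, volume {{ $labels.volume }})"

An empty matcher like that selects the series where the label is missing.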

> [...]
> Secondly, if you change the thresholds to use, the labelsets will change,
> which means they become different timeseries.

I don't think that becoming a different timeseries because a label has changed is really a problem in practice, at least for a simple monitoring system like mine.

Anyway, even if it is not a good solution, I would still like to know the best way to add a label to just one metric inside a scraping target. That may be useful in other situations.

Is there a nice way to do that? Or do I have to resort to some "metric_relabel_configs" magic?

Best regards,
  rdiez

Brian Candler

Nov 22, 2021, 3:33:04 AM
to Prometheus Users
>  You have to scrape an artificial metric, and worry about details such as how to get a warning if you forget one of the disk volumes in the threshold metric.

You can also write the rule to use a default value if the threshold metric doesn't exist.  That's helpful if most of the drives use the same threshold, and only a few need to be different.  The article I linked to shows how to do that.
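
A sketch of one way to write that (assuming a default of 90% used, and that the threshold series carries at least the 'instance' and 'volume' labels):

expr: |
  100.0 - 100 * (windows_logical_disk_free_bytes / windows_logical_disk_size_bytes)
    > on(instance, volume) group_left()
      (
        windows_logical_disk_used_threshold_percent
          or on(instance, volume)
        (windows_logical_disk_free_bytes * 0 + 90)
      )

The "or on(instance, volume)" part supplies the 90% default for every volume which has no explicit threshold series.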

> to add a label to just one metric ... do I have to resort to some "metric_relabel_configs" magic?

Yes.  You can add a label to all metrics from a given target by using the target labels at scraping time.  If you want to add that label to certain timeseries only, then you'll have to add it post-scraping using metric_relabel_configs.  You'd match the metric name and labels, and conditionally add new labels.

I'm not going to write this for you, because:
(1) as I said, it's a bad way to build a monitoring system
(2) in the limit, you will end up with a separate rewriting rule for every instance+volume combination
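
For illustration only (an untested sketch, and not something I recommend), each such rewriting rule would look roughly like this, inside the scrape job:

    metric_relabel_configs:
      # Match on metric name + instance + volume (joined with ';' by default)
      # and add a threshold-group label to that one series.
      - source_labels: [__name__, instance, volume]
        regex: 'windows_logical_disk_free_bytes;PC1;HarddiskVolume2'
        target_label: alarm_threshold_for_disk_full
        replacement: a
      - source_labels: [__name__, instance, volume]
        regex: 'windows_logical_disk_free_bytes;PC2;HarddiskVolume3'
        target_label: alarm_threshold_for_disk_full
        replacement: a

(This assumes the target's 'instance' label is already attached when metric relabelling runs.)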

If you're going to do that, you may as well simply write separate alerting rules for each volume:

expr: windows_logical_disk_free_bytes{instance="PC1",volume="HarddiskVolume1"} / windows_logical_disk_size_bytes < 0.5

expr: windows_logical_disk_free_bytes{instance="PC1",volume="HarddiskVolume2"} / windows_logical_disk_size_bytes < 0.1

expr: windows_logical_disk_free_bytes{instance="PC1",volume="HarddiskVolume3"} / windows_logical_disk_size_bytes < 0.05
... etc

This doesn't scale to thousands of alerting rules, but neither does metric relabeling with thousands of rules.

R. Diez

Nov 22, 2021, 12:25:41 PM
to Prometheus Users
> [...]
> (1) as I said, it's a bad way to build a monitoring system

It is of course cleaner, in a theoretical way, to place the thresholds in a separate location, and not change the disk metrics every time you relabel them in order to move disks to a different threshold alert.

But I am not convinced that your solution is better in practice, especially for small networks like mine.

What you are suggesting is actually a work-around. It feels like Prometheus is missing an easy way to assign alerts to an arbitrary set of metrics, so you have to simulate metrics to provide thresholds. Then you can use existing PromQL syntax to check those thresholds against their disks.

If I understood the idea correctly, these virtual metrics would just provide the same values for all timestamps, because only the disk instance is relevant. That is an indication that the concept is not clean, just a work-around. Those "virtual" metrics are going to waste data space, because they are real time-series as far as Prometheus is concerned. They are going to double the number of windows_logical_disk_free_bytes time-series, because each disk instance metric will need a threshold counterpart. If you have thousands of disks, you can argue that this solution does not scale well either.

Associating disks with alert thresholds on an arbitrary basis is a very common requirement. I think that is going to happen all over the place. For example, you may have many thermometers measuring temperatures in the same way, but each temperature may require its own alert threshold. I am surprised that Prometheus makes this hard to achieve.

Your solution seems to be designed to assign an independent threshold per disk, but the most common scenario is that you will only have a small number of thresholds. For example, all Windows system disks (normally C:) would need an alert threshold, all Linux system disks will have another threshold. There will probably be a small number of data disk categories, say log disks, photo disks and document disks, and each one will need a separate alert threshold. But it is improbable that every disk will need a custom threshold. Similarly, if you are alerting based on temperatures, you will probably have groups too, like ambient temperature, fridge temperature and freezer temperatures. Not every thermometer will need a custom alert threshold.

So you need one alarm per threshold, and then a way to assign arbitrary disks or thermometers to one alarm. The easiest way now is probably to use labels. But in fact you are looking for a switch statement:

switch ( computer-instance, disk-volume )
{
  case PC1, Volume1: Assign to Alert A.
  case PC3, Volume3: Assign to Alert K.
  default:           Assign to Alert M.
}


> (2) in the limit, you will end up with a separate rewriting rule for every instance+volume combination

That's not too bad. The rewriting rules are just adding a label to each disk. It is perhaps rather verbose, as Prometheus syntax tends to be, but those rewriting rules are (or can be) kept close to the disks they apply to. After all, you have to decide which threshold to apply to each disk, and where exactly you do that, or how verbose it is, does not make much difference in my opinion.



> This doesn't scale to thousands of alerting rules, but neither does metric relabeling with thousands of rules.

- If you solve this problem with alarms, you have to write or modify an alarm per disk you add. You may end up with many alarms.
- If you solve this problem with relabeling, you have to create or modify a label rewriting rule per disk you add. You may end up with many rewriting rules.
- If you solve this problem with virtual threshold metrics, you have to create a virtual metric per disk you add. You may end up with many metrics.

The difference in scalability is not great, as far as I can see (with my rather limited Prometheus knowledge).

Regards,
  rdiez

Brian Candler

Nov 22, 2021, 3:25:26 PM
to Prometheus Users
>  If you solve this problem with virtual threshold metrics, you have to create a virtual metric per disk you add. You may end up with many metrics.

That's the point: Prometheus *does* scale to millions of metrics, easily.

Personally, I'd also prefer that prometheus was able to have "virtual" timeseries for thresholds, where the value is whatever it is right now, and assumed to be the same for all points forwards and backwards in history and not saved.  The idea has been raised, and has been rejected.  If you want to build this yourself, you could probably do so using the Remote Read protocol.

But you'll find your life much easier if you just do what everyone recommends, which is to scrape new timeseries for the thresholds.  In practice, the amount of disk space used is minuscule, because Prometheus compresses so well: adjacent threshold values in the same timeseries are identical, so the difference between them is zero.

> switch ( computer-instance, disk-volume )
> {
>   case PC1, Volume1: Assign to Alert A.
>   case PC3, Volume3: Assign to Alert K.
>   default:           Assign to Alert M.
> }

You can write that switch statement as a series of alerting rules too.  It's no different.  Expand your alerting rules from a templating language of your choice.