How to display the remaining disk space in alerts by taking the node_filesystem_avail_bytes metric and dividing it by 1024 / 1024 (to convert to MB)

Станислав Кузнецов

Aug 16, 2023, 3:31:13 AM

to Prometheus Users

Good afternoon!

I have a problem. I want my alerts to show the remaining disk space, taking the value of the node_filesystem_avail_bytes metric and dividing it by 1024 / 1024 to convert it to MB.

Now my rule looks like this:
    - alert: HostOutOfDiskSpace_test
      expr: (node_filesystem_avail_bytes * 100) / node_filesystem_size_bytes > 0 and ON (instance, device, mountpoint) node_filesystem_readonly == 0 and {instance=~"10.22.22.18.*"}
      for: 1s
      labels:
        severity: high
      annotations:
        summary: "Host {{ $labels.instance }} out of memory \n - Device: {{ $labels.device }} \n - Mountpoint - {{ $labels.mountpoint }}"
        description: "Node memory is filling up. Value = {{ $value | printf `%.2f` }} ({{ printf \"node_filesystem_avail_bytes{mountpoint='%s'}\" .Labels.mountpoint | query | first | value|humanize1024}})"

However, the value is displayed incorrectly. Prometheus shows the correct values for the node_filesystem_avail_bytes metric, but the value in the notification comes out wrong. Here are some example notifications from Telegram:

🔥 [PROBLEM] HostOutOfDiskSpace_test_1
Severity: high
Summary: Host 10.22.22.181:9100 out of memory
- Device: /dev/sda1
- Mountpoint - /boot
Description: Node memory is filling up. Value = 72.62 (736.4Mi)
Starts at: 2023-08-15 16:19:52.25 +0300 MSK
____________________

🔥 [PROBLEM] HostOutOfDiskSpace_test_1
Severity: high
Summary: Host 10.22.22.181:9100 out of memory
- Device: tmpfs
- Mountpoint - /run
Description: Node memory is filling up. Value = 98.14 (298.2Mi)
Starts at: 2023-08-15 16:19:52.25 +0300 MSK
____________________

🔥 [PROBLEM] HostOutOfDiskSpace_test_1
Severity: high
Summary: Host 10.22.22.181:9100 out of memory
- Device: tmpfs
- Mountpoint - /run/user/1000
Description: Node memory is filling up. Value = 100.00 (298.8Mi)
Starts at: 2023-08-15 16:19:52.25 +0300 MSK
____________________

🔥 [PROBLEM] HostOutOfDiskSpace_test_1
Severity: high
Summary: Host 10.22.22.181:9100 out of memory
- Device: /dev/mapper/rhel-root
- Mountpoint - /
Description: Node memory is filling up. Value = 87.08 (31.48Gi)
Starts at: 2023-08-15 16:19:52.25 +0300 MSK
____________________

At the same time, Prometheus shows the following values for node_filesystem_avail_bytes / 1024 / 1024 (i.e. in MiB):
{device="/dev/mapper/rhel-root", fstype="xfs", instance="10.22.22.181:9100", mountpoint="/"} 15146.296875
{device="/dev/sda1", fstype="xfs", instance="10.22.22.181:9100", mountpoint="/boot"} 736.39453125
{device="tmpfs", fstype="tmpfs", instance="10.22.22.181:9100", mountpoint="/run"} 872.640625
{device="tmpfs", fstype="tmpfs", instance="10.22.22.181:9100", mountpoint="/run/user/1000"} 177.83984375

Here is the output from the OS:
    Filesystem             Size  Used Avail Use% Mounted on
    devtmpfs               871M     0  871M   0% /dev
    tmpfs                  890M     0  890M   0% /dev/shm
    tmpfs                  890M   17M  873M   2% /run
    tmpfs                  890M     0  890M   0% /sys/fs/cgroup
    /dev/mapper/rhel-root   17G  2.2G   15G  13% /
    /dev/sda1             1014M  278M  737M  28% /boot
    tmpfs                  178M     0  178M   0% /run/user/1000

After reading the official documentation on prometheus.io, I concluded that the problem is in how the value is converted. I need a single rule that works for all devices and mount points. The construct {{ printf \"node_filesystem_avail_bytes{mountpoint='%s'}\" .Labels.mountpoint | query | first | value | humanize1024 }} is structured the way I want, but the value it looks up for the mountpoint is not converted correctly by humanize1024. The plain humanize function gives values that are very far from the real ones, so I am not considering it.
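Working through the numbers suggests the unit conversion itself may not be the problem. humanize1024 divides by 1024 once per prefix step (ki, Mi, Gi, ...) and prints about four significant digits, so applying it to the byte values behind the Prometheus figures above should give the following (the expected strings are computed by hand from those figures):

$$
\begin{aligned}
\texttt{/boot}:&\quad 736.39453125\,\text{MiB} \times 1024^2 = 772\,165\,632\,\text{B} \;\Rightarrow\; \texttt{736.4Mi} \quad (\text{notification: } \texttt{736.4Mi})\\
\texttt{/run}:&\quad 872.640625\,\text{MiB} \;\Rightarrow\; \texttt{872.6Mi} \quad (\text{notification: } \texttt{298.2Mi})\\
\texttt{/run/user/1000}:&\quad 177.83984375\,\text{MiB} \;\Rightarrow\; \texttt{177.8Mi} \quad (\text{notification: } \texttt{298.8Mi})\\
\texttt{/}:&\quad 15146.296875\,\text{MiB} / 1024 = 14.79\ldots\,\text{GiB} \;\Rightarrow\; \texttt{14.79Gi} \quad (\text{notification: } \texttt{31.48Gi})
\end{aligned}
$$

Only /boot matches. The other notifications show numbers that do not correspond to this host's filesystems at all, which points at the query returning a different series than expected rather than at humanize1024.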

Has anyone come across this? How can I display the node_filesystem_avail_bytes value for a specific device and mount point without humanize1024, simply dividing by 1024 / 1024 (or using some other conversion to MB or GB)?

Thank you for your responses!

Brian Candler

Aug 16, 2023, 4:04:40 AM

to Prometheus Users

If it were me, I would forget all the printf and query stuff, and just make the value of the alerting expression be the value you want.

expr: node_filesystem_avail_bytes and (.... some other trigger expression ...)

This returns the value of the LHS wherever there is a matching series on the RHS (exactly the same set of labels); the value on the RHS is ignored.

e.g.

expr: node_filesystem_avail_bytes and (node_filesystem_avail_bytes / node_filesystem_size_bytes < 0.1)

Then I would use {{ $value | humanize1024 }} in the annotation. Alternatively, you could do the division in the expr to get MiB:

expr: (node_filesystem_avail_bytes/1024/1024) and (node_filesystem_avail_bytes / node_filesystem_size_bytes < 0.1)
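Assembled into a complete rule, the approach above might look like the sketch below. The expressions come from this message and from the first one (the read-only filter); the group name, alert name, 10% threshold, `for: 5m` and annotation wording are illustrative assumptions, not recommendations.

```yaml
groups:
  - name: disk-space-example            # hypothetical group name
    rules:
      - alert: HostLowDiskSpace          # hypothetical alert name
        # LHS: free space in MiB; this is what ends up in $value.
        # RHS: only decides which filesystems fire (< 10% free, not read-only).
        # `and` keeps the LHS value and ignores the RHS values.
        expr: |
          (node_filesystem_avail_bytes / 1024 / 1024)
            and (node_filesystem_avail_bytes / node_filesystem_size_bytes < 0.1)
            and on (instance, device, mountpoint) node_filesystem_readonly == 0
        for: 5m
        labels:
          severity: high
        annotations:
          summary: "Host {{ $labels.instance }} is low on disk space ({{ $labels.mountpoint }})"
          # $value is already in MiB because of the division in expr.
          description: "Free space: {{ $value | printf `%.0f` }} MiB"
```

If the expr returned raw bytes instead (no division), the description would use `{{ $value | humanize1024 }}` as suggested above.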

However, if you want both the percentage *and* the raw MiB value in the annotation, then yes, you have to jump through hoops. I've never used the "query" filter, but if it takes regular PromQL then you could do the arithmetic there:

printf \"node_filesystem_avail_bytes{mountpoint='%s'}/1024/1024\" .Labels.mountpoint | query | first | value

Станислав Кузнецов

Aug 16, 2023, 10:17:55 AM

to Brian Candler, Prometheus Users

I solved the problem with help from a colleague on another forum: I added a match on instance as well as on mountpoint, and now everything works as it should.

Working result: {{ printf \"node_filesystem_avail_bytes{mountpoint='%s', instance='%s'}\" .Labels.mountpoint .Labels.instance | query | first | value | humanize1024 }}
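Without the instance matcher, a lookup such as node_filesystem_avail_bytes{mountpoint='/run'} matches that mount point on every scraped host, and `first` simply returns the first series in the result, which need not belong to the host that fired. Combining the working lookup with the suggestion earlier in the thread to do the arithmetic inside the query gives the plain division the original question asked for, with no humanize1024. A minimal sketch of the `annotations` block, assuming the same rule as in the first message; the `description_mib` and `description_gib` keys are hypothetical (annotation keys are free-form), and the `%.0f` / `%.2f` precision is an arbitrary choice.

```yaml
annotations:
  # Working version from this thread: match on BOTH mountpoint and instance,
  # so `first` can only pick the series that belongs to the alerting host.
  description: "Node memory is filling up. Value = {{ $value | printf `%.2f` }} ({{ printf \"node_filesystem_avail_bytes{mountpoint='%s', instance='%s'}\" .Labels.mountpoint .Labels.instance | query | first | value | humanize1024 }})"
  # Hypothetical variant: divide in PromQL, then format the plain number as MiB.
  description_mib: "Free space: {{ printf \"node_filesystem_avail_bytes{mountpoint='%s', instance='%s'} / 1024 / 1024\" .Labels.mountpoint .Labels.instance | query | first | value | printf `%.0f` }} MiB"
  # Same idea in GiB.
  description_gib: "Free space: {{ printf \"node_filesystem_avail_bytes{mountpoint='%s', instance='%s'} / 1024 / 1024 / 1024\" .Labels.mountpoint .Labels.instance | query | first | value | printf `%.2f` }} GiB"
```

Doing the division in PromQL keeps the template simple: `value` returns a plain number, and `printf` only controls how many decimals are shown.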

On Wed, 16 Aug 2023 at 11:04, Brian Candler <b.ca...@pobox.com> wrote:

