I have a problem. I want to display the remaining filesystem space in alerts by taking the value of the node_filesystem_avail_bytes metric and dividing it by 1024 / 1024 (to convert it to MB).
Now my rule looks like this:
- alert: HostOutOfDiskSpace_test
  expr: (node_filesystem_avail_bytes * 100) / node_filesystem_size_bytes > 0 and ON (instance, device, mountpoint) node_filesystem_readonly == 0 and {instance=~"10.22.22.18.*"}
  for: 1s
  labels:
    severity: high
  annotations:
    summary: "Host {{ $labels.instance }} out of memory \n - Device: {{ $labels.device }} \n - Mountpoint - {{ $labels.mountpoint }}"
    description: "Node memory is filling up. Value = {{ $value | printf `%.2f` }} ({{ printf \"node_filesystem_avail_bytes{mountpoint='%s'}\" .Labels.mountpoint | query | first | value | humanize1024 }})"
But the value is displayed incorrectly. Prometheus shows the correct values for the node_filesystem_avail_bytes metric, but the value in the notification is not calculated correctly. Here are examples of notifications from Telegram:
🔥 [PROBLEM] HostOutOfDiskSpace_test_1
Severity: high
Summary: Host 10.22.22.181:9100 out of memory
- Device: /dev/sda1
- Mountpoint - /boot
Description: Node memory is filling up. Value = 72.62 (736.4Mi)
Starts at: 2023-08-15 16:19:52.25 +0300 MSK
____________________
🔥 [PROBLEM] HostOutOfDiskSpace_test_1
Severity: high
Summary: Host 10.22.22.181:9100 out of memory
- Device: tmpfs
- Mountpoint - /run
Description: Node memory is filling up. Value = 98.14 (298.2Mi)
Starts at: 2023-08-15 16:19:52.25 +0300 MSK
____________________
🔥 [PROBLEM] HostOutOfDiskSpace_test_1
Severity: high
Summary: Host 10.22.22.181:9100 out of memory
- Device: tmpfs
- Mountpoint - /run/user/1000
Description: Node memory is filling up. Value = 100.00 (298.8Mi)
Starts at: 2023-08-15 16:19:52.25 +0300 MSK
____________________
🔥 [PROBLEM] HostOutOfDiskSpace_test_1
Severity: high
Summary: Host 10.22.22.181:9100 out of memory
- Device: /dev/mapper/rhel-root
- Mountpoint - /
Description: Node memory is filling up. Value = 87.08 (31.48Gi)
Starts at: 2023-08-15 16:19:52.25 +0300 MSK
____________________
At the same time, Prometheus shows the following values:
{device="/dev/mapper/rhel-root", fstype="xfs", instance="10.22.22.181:9100", mountpoint="/"} 15146.296875
{device="/dev/sda1", fstype="xfs", instance="10.22.22.181:9100", mountpoint="/boot"} 736.39453125
{device="tmpfs", fstype="tmpfs", instance="10.22.22.181:9100", mountpoint="/run"} 872.640625
{device="tmpfs", fstype="tmpfs", instance="10.22.22.181:9100", mountpoint="/run/user/1000"} 177.83984375
Here is the output from the OS:
Filesystem             Size  Used Avail Use% Mounted on
devtmpfs               871M     0  871M   0% /dev
tmpfs                  890M     0  890M   0% /dev/shm
tmpfs                  890M   17M  873M   2% /run
tmpfs                  890M     0  890M   0% /sys/fs/cgroup
/dev/mapper/rhel-root   17G  2.2G   15G  13% /
/dev/sda1             1014M  278M  737M  28% /boot
tmpfs                  178M     0  178M   0% /run/user/1000
After reading the official documentation on the prometheus.io website, I concluded that the problem is in how the value is converted. I need one rule that works for different devices and mount points. The construct {{ printf \"node_filesystem_avail_bytes{mountpoint='%s'}\" .Labels.mountpoint | query | first | value | humanize1024 }} works the way I want it to, but the value returned for a given mountpoint does not come out correctly after humanize1024. The plain humanize function gives results that are very far from the real values, so I am not considering it.
Has anyone come across this? How can I display the node_filesystem_avail_bytes value for a specific device and mount point without the humanize1024 function, simply dividing by 1024 / 1024 or using some other conversion to MB or GB?
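To make the goal concrete, here is an untested sketch of the description annotation I am aiming for. It assumes the division can be done inside the PromQL string passed to query, with the result formatted by printf instead of humanize1024. The instance and device matchers are my own addition: the current query selects by mountpoint only, so first may pick a series with the same mountpoint from another host or device.

```yaml
annotations:
  # Untested sketch: divide in PromQL rather than relying on humanize1024.
  # The instance and device matchers are an assumption added here so that
  # `first` returns the series of the filesystem that fired the alert.
  description: "Node memory is filling up. Value = {{ $value | printf `%.2f` }} ({{ printf \"node_filesystem_avail_bytes{instance='%s',device='%s',mountpoint='%s'} / 1024 / 1024\" $labels.instance $labels.device $labels.mountpoint | query | first | value | printf \"%.2f\" }} MiB)"
```

If this behaves as I expect, the / mountpoint above should show roughly 15146.30 MiB, matching what Prometheus reports, but I have not confirmed that.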
Thank you for your responses!