We use a Prometheus alert (and node-exporter) to check whether we are running out of memory on a node.
Issue: in many cases I get an alert whose $value is below the threshold value in the expression. The expression is:
alert: GettingOutOfMemory
expr: max(sum by (instance) ((((node_memory_MemTotal_bytes) - (node_memory_MemFree_bytes + node_memory_Buffers_bytes + node_memory_Cached_bytes)) / (node_memory_MemTotal_bytes)) * 100)) >= 90
for: 5m
labels:
  severity: warning
annotations:
  description: Docker Swarm node {{ $labels.instance }} memory usage is at {{ humanize $value }}%.
  summary: Memory is getting low for Swarm node '{{ $labels.node_name }}'
I get messages saying that we ran out of memory at, e.g., 63%. So that is the $value, and it is clearly below the 90% threshold.
Why do I get this alert even though the $value is below the threshold?
How can I repair this Prometheus alert rule so that I only get alerts when the $value is above the threshold?
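For reference, the arithmetic the expression performs for a single node can be sketched as follows (the byte counts below are invented sample values, not real node-exporter output):

```python
# Memory-usage percentage as computed inside the alert expression, for one node.
# All byte values here are made-up sample numbers for illustration only.
mem_total = 16 * 1024**3   # node_memory_MemTotal_bytes
mem_free = 2 * 1024**3     # node_memory_MemFree_bytes
buffers = 1 * 1024**3      # node_memory_Buffers_bytes
cached = 3 * 1024**3       # node_memory_Cached_bytes

# (MemTotal - (MemFree + Buffers + Cached)) / MemTotal * 100
used_pct = (mem_total - (mem_free + buffers + cached)) / mem_total * 100
print(round(used_pct, 1))  # 62.5 -> below the 90 threshold, so no alert expected
```

With these sample numbers the result is 62.5%, which would not satisfy the `>= 90` condition.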
--
You received this message because you are subscribed to the Google Groups "Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to prometheus-use...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-users/6a2cc11f-b733-4a66-87b5-55e8355d7ebb%40googlegroups.com.
Is there a way to print the values of the parts of the expression as well, e.g. the value of node_memory_MemTotal_bytes? Or is it possible to have the evaluation steps logged?
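One way to make each part of the expression visible (so it can be graphed or inspected in the expression browser) is to record the computed percentage as its own series with a recording rule. A minimal sketch, assuming the same node-exporter metric names; the group and record names below are invented for illustration:

```yaml
groups:
  - name: memory-debug
    rules:
      # Record the per-instance used-memory percentage as its own series,
      # so it can be queried and graphed independently of the alert.
      - record: instance:node_memory_used_percent
        expr: >
          (node_memory_MemTotal_bytes
            - (node_memory_MemFree_bytes + node_memory_Buffers_bytes + node_memory_Cached_bytes))
          / node_memory_MemTotal_bytes * 100
```

The individual metrics (e.g. node_memory_MemTotal_bytes) can also be queried directly in the Prometheus expression browser to see their current values per instance.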
On Friday, 24 April 2020 at 11:18:28 UTC+2, J Houer wrote:
To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-users/4d2c74d4-89c8-47dc-b858-f651ec8d9d9b%40googlegroups.com.