False Positive Alerts for node CPU Usage for one node

James S

Jul 8, 2021, 9:31:24 AM

to Prometheus Users

We keep getting false-positive CPU alerts for one node only; the other nodes do not have this issue.

The alerting rule we configured for CPU usage is:

alert: NodeCPUUtilWar
expr: instance:node_cpu_utilisation:rate1m > 0.8
for: 5m

It is based on this recording rule:

record: instance:node_cpu_utilisation:rate1m
expr: 1 - avg without (cpu, mode) (rate(node_cpu_seconds_total{job="node_exporter", mode="idle"}[1m]))
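
One way to check whether the recording rule sees the affected node the same way as the others is to look at the series it averages over. This is only a minimal sketch, not part of the rule set above; it assumes the same job="node_exporter" label, and the two expressions are meant to be run separately. The count should match the number of CPUs on each machine.

```promql
# Number of per-CPU idle series behind each instance's average.
count without (cpu, mode) (node_cpu_seconds_total{job="node_exporter", mode="idle"})

# The recorded value itself, highest first, to compare the affected node with the rest.
sort_desc(instance:node_cpu_utilisation:rate1m)
```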

Stuart Clark

Jul 8, 2021, 9:50:37 AM

to James S, Prometheus Users

What makes you say it is a false positive? What does the graph of that
metric show?

--
Stuart Clark

James S

Jul 8, 2021, 11:07:43 AM

to Prometheus Users

We do not see any stress on the cluster, and GCP Cloud Monitoring does not show this behavior.

Stuart Clark

Jul 8, 2021, 12:13:23 PM

to James S, Prometheus Users

On 2021-07-08 16:07, James S wrote:
> We do not see any stress on the cluster, and GCP Cloud Monitoring does
> not show this behavior.
>

What does the graph of the metric look like?

Is this a single-CPU or multi-CPU machine?

James S

Jul 8, 2021, 12:48:05 PM

to Prometheus Users

It is a 4-CPU machine.

Grafana graph: [image]

GCP monitoring: [image]

James S

Jul 8, 2021, 12:49:54 PM

to Prometheus Users

GCP monitoring CPU usage for the node: [image]

James S

Jul 8, 2021, 2:54:55 PM

to Prometheus Users

I have changed the query to:

sum(rate(node_cpu_seconds_total{mode!="idle"}[5m])) by (node) / sum(kube_node_status_capacity_cpu_cores) by (node)

But the result is the same; my problem is not fixed.
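
The two forms may agree because, assuming the per-mode rates for each CPU add up to roughly 1, the non-idle share is just 1 minus the idle share that the original rule computes. A sketch of the same ratio using only node_exporter data follows; the job label and the grouping by instance are assumptions, not taken from the query above.

```promql
# Non-idle CPU seconds per second, summed over all CPUs and modes of an instance,
# divided by the number of CPUs (one idle series exists per CPU).
  sum by (instance) (rate(node_cpu_seconds_total{job="node_exporter", mode!="idle"}[5m]))
/
  count by (instance) (node_cpu_seconds_total{job="node_exporter", mode="idle"})
```

If this agrees with instance:node_cpu_utilisation:rate1m (apart from the longer 5m window), the rule is probably computing what it was designed to compute, and the difference from GCP monitoring would lie in what each tool counts as usage.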

Laurent Dumont

Jul 9, 2021, 6:19:19 AM

to James S, Prometheus Users

I don't know how GCP calculates its CPU metrics, but node_cpu_seconds_total appears to contain separate statistics for each mode (user, kernel, interrupt, etc.). Maybe you can graph each of those separately and see whether one is much higher than the rest (https://www.robustperception.io/understanding-machine-cpu-usage).
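
A per-mode breakdown along those lines could look like the sketch below. The instance value is a placeholder to be replaced with the affected node, and the job label is assumed from the rule earlier in the thread.

```promql
# Fraction of CPU time spent in each non-idle mode, averaged across the node's CPUs.
avg without (cpu) (
  rate(node_cpu_seconds_total{job="node_exporter", instance="<affected-node>", mode!="idle"}[5m])
)
```

Graphing this yields one series per mode (user, system, iowait, steal, and so on), which shows which mode accounts for the high value.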
