Node Exporter - Load Average greater than CPU Count?


Lucas Siba

May 27, 2022, 12:24:10 PM
to Prometheus Users
Does anyone know an easy way to get a list of instances from Node Exporter where the load average is greater than the number of CPUs on that instance?

I can do this to get the CPU Count per instance: (count by (instance)(node_cpu_seconds_total{mode="idle"}))
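The following minimal Python model shows why this query yields a CPU count. It is not Prometheus code. It treats each series as a (labels, value) pair. The instance names `nuc1`/`nuc2` come from the reply below. The CPU counts, the `job` value, the set of modes and the sample values are invented for illustration.

```python
from collections import Counter


def cpu_series(instance, n_cpus):
    """Hypothetical node_cpu_seconds_total series: one per (cpu, mode) pair."""
    return [
        ({"__name__": "node_cpu_seconds_total", "instance": instance,
          "job": "node", "cpu": str(cpu), "mode": mode}, 100.0)
        for cpu in range(n_cpus)
        for mode in ("idle", "user", "system")
    ]


def count_by_instance(samples, mode="idle"):
    """Toy model of count by (instance)(node_cpu_seconds_total{mode="idle"}).

    The mode filter leaves exactly one series per CPU, so counting the
    surviving series per instance gives the CPU count. Values are ignored.
    """
    counts = Counter(
        labels["instance"] for labels, _ in samples if labels.get("mode") == mode
    )
    return dict(counts)


series = cpu_series("nuc1", 4) + cpu_series("nuc2", 2)
cpu_count = count_by_instance(series)
print(cpu_count)  # {'nuc1': 4, 'nuc2': 2}
```
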

But I can't use this, because the left side isn't grouped by instance: node_load5{} > (count by (instance)(node_cpu_seconds_total{mode="idle"}))

I could technically do this, but it seems misleading: sum(node_load5{}) by (instance) > (count by (instance)(node_cpu_seconds_total{mode="idle"}))
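Here is a toy Python model of `sum by (instance)`, using invented sample data (not Prometheus's implementation). It drops every label except `instance` and adds up whatever series remain. That leaves both sides with the same `{instance=...}` label set. With exactly one `node_load5` series per instance, the "sum" is just the load value itself. It would only double-count if a hypothetical second job (`node2` below) scraped the same host.

```python
from collections import defaultdict

# Hypothetical node_load5 samples, one series per instance.
load5 = [
    ({"__name__": "node_load5", "instance": "nuc1", "job": "node"}, 5.0),
    ({"__name__": "node_load5", "instance": "nuc2", "job": "node"}, 1.5),
]


def sum_by_instance(samples):
    """Toy model of sum by (instance)(...): keep only the instance label and add values."""
    totals = defaultdict(float)
    for labels, value in samples:
        totals[labels["instance"]] += value
    return dict(totals)


print(sum_by_instance(load5))  # {'nuc1': 5.0, 'nuc2': 1.5}

# Invented second scrape job for the same host: now the sum is no longer the load.
load5_dup = load5 + [({"__name__": "node_load5", "instance": "nuc1", "job": "node2"}, 5.0)]
print(sum_by_instance(load5_dup))  # {'nuc1': 10.0, 'nuc2': 1.5}
```
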

Any suggestions?

Brian Candler

May 27, 2022, 3:31:27 PM
to Prometheus Users
https://prometheus.io/docs/prometheus/latest/querying/operators/#many-to-one-and-one-to-many-vector-matches

(count by (instance)(node_cpu_seconds_total{mode="idle"})) has labels like:
{instance="nuc1"}
{instance="nuc2"}

node_load5 has labels like:
node_load5{instance="nuc1", job="node"}
node_load5{instance="nuc2", job="node"}

Since there's a one-to-one correspondence on the instance label, you can simply do:
node_load5 > on (instance) (count by (instance)(node_cpu_seconds_total{mode="idle"}))
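This toy Python model contrasts the two queries. It is a sketch of one-to-one vector matching, not Prometheus's implementation, and the sample values are invented.

By default, elements are paired on all labels except the metric name. `{instance, job}` never equals `{instance}`, so nothing matches. `on (instance)` restricts the pairing key to `instance`.

For simplicity, the model reports only the instance and the left-hand value rather than Prometheus's exact output label set. It also checks only the right-hand side for duplicate keys.

```python
# Hypothetical samples shaped like the label sets listed above.
load5 = [
    ({"__name__": "node_load5", "instance": "nuc1", "job": "node"}, 5.0),
    ({"__name__": "node_load5", "instance": "nuc2", "job": "node"}, 1.5),
]
cpu_count = [
    ({"instance": "nuc1"}, 4.0),
    ({"instance": "nuc2"}, 2.0),
]


def signature(labels, on=None):
    """Matching key: all labels except __name__ by default, or only the on(...) labels."""
    if on is None:
        return frozenset((k, v) for k, v in labels.items() if k != "__name__")
    return frozenset((k, labels.get(k, "")) for k in on)


def greater_than(lhs, rhs, on=None):
    """Toy one-to-one '>' filter: keep lhs elements whose matched rhs value is smaller."""
    rhs_by_sig = {}
    for labels, value in rhs:
        sig = signature(labels, on)
        if sig in rhs_by_sig:
            raise ValueError("one-to-one match needs at most one rhs element per key")
        rhs_by_sig[sig] = value
    kept = []
    for labels, value in lhs:
        sig = signature(labels, on)
        # Unmatched elements and failed comparisons are both dropped.
        if sig in rhs_by_sig and value > rhs_by_sig[sig]:
            kept.append((labels["instance"], value))
    return kept


print(greater_than(load5, cpu_count))                   # []
print(greater_than(load5, cpu_count, on=["instance"]))  # [('nuc1', 5.0)]
```
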

If there were multiple values on the right for the same instance (a 1:N relationship), then you could do:
node_load5 > on (instance) group_right() (count by (instance)(node_cpu_seconds_total{mode="idle"}))
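Here is a toy Python sketch of the one-to-many case, under stated assumptions rather than as Prometheus's code. With `group_right()`, the left side is the "one" side and may have only one element per matching key. Each right-hand ("many") element is compared with its single partner.

`count by (instance)(...)` can never produce several series per instance. The right-hand data below therefore uses an invented `pool` label purely for illustration. The survivors are reported by their right-hand labels, roughly mirroring how the many side's labels carry into the result.

```python
load5 = [
    ({"__name__": "node_load5", "instance": "nuc1", "job": "node"}, 5.0),
    ({"__name__": "node_load5", "instance": "nuc2", "job": "node"}, 1.5),
]
# Hypothetical right-hand side with several series per instance (invented "pool" label).
thresholds = [
    ({"instance": "nuc1", "pool": "a"}, 4.0),
    ({"instance": "nuc1", "pool": "b"}, 8.0),
    ({"instance": "nuc2", "pool": "a"}, 2.0),
]


def greater_than_group_right(lhs, rhs, on):
    """Toy model of: lhs > on(<on>) group_right() rhs."""
    # The left is the "one" side: at most one element per matching key.
    one_side = {}
    for labels, value in lhs:
        key = tuple(labels.get(k, "") for k in on)
        if key in one_side:
            raise ValueError("many-to-many matching: duplicate key on the left ('one') side")
        one_side[key] = value
    # The right is the "many" side: each element is compared with its single partner.
    kept = []
    for labels, value in rhs:
        key = tuple(labels.get(k, "") for k in on)
        if key in one_side and one_side[key] > value:
            kept.append(labels)
    return kept


print(greater_than_group_right(load5, thresholds, on=["instance"]))
# [{'instance': 'nuc1', 'pool': 'a'}]
```
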

Lucas Siba

May 27, 2022, 3:40:51 PM
to Prometheus Users
Thanks! That works.