CPU usage by monitoring system


denik.ka...@gmail.com

Jun 25, 2018, 5:35:59 AM
to Prometheus Users


Hi!
I have a virtual machine running Prometheus, Grafana and node_exporter. I want to know what share of the total system and user CPU time is used by my monitoring. The resulting graph shows monitoring taking more CPU than all processes combined. But I thought node_exporter collects data for all processes, so the value of the metric "node_cpu_seconds_total" should be greater than the sum of the time used by the 3 monitoring processes. What am I doing wrong?



Ben Kochie

Jun 25, 2018, 5:39:43 AM
to denik.ka...@gmail.com, Prometheus Users
Prometheus components expose their own CPU use as "process_cpu_seconds_total".

You can use those metrics to see the per-process use.


denik.ka...@gmail.com

Jun 25, 2018, 5:58:24 AM
to Prometheus Users
I thought the query shown on the chart would be enough, but I was wrong. Here it is:

avg(sum(irate(process_cpu_seconds_total[5m]))) / avg(sum (irate(node_cpu_seconds_total{job="node", mode=~"(system|user)"}[5m])))

On Monday, June 25, 2018 at 14:39:43 UTC+5, Ben Kochie wrote:

Ben Kochie

Jun 25, 2018, 6:45:57 AM
to denik.ka...@gmail.com, Prometheus Users
Sorry, I didn't fully read the graph query. I think I understand what you're trying to do. But I'm not sure that query is going to produce the right answer. I need to think about it a bit.

One problem is, you're using irate(), which can produce very odd results for a query like that.

Another issue, purely cosmetic, is that the inner sum() makes the avg() a no-op: sum() without a grouping clause already returns a single value, so avg() has only that one value to average.
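
To illustrate the cosmetic point only: with the redundant avg() wrappers removed, the query from the previous message reduces to the form below. This is the same expression, not a fix; it still uses irate(), which is the separate problem mentioned above.

```
# Equivalent to the original query: avg() of a single-element result is that element.
sum(irate(process_cpu_seconds_total[5m]))
/
sum(irate(node_cpu_seconds_total{job="node", mode=~"(system|user)"}[5m]))
```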

denik.ka...@gmail.com

Jul 12, 2018, 10:36:09 AM
to Prometheus Users
The question is still relevant. I would like to know how to calculate the CPU usage of the monitoring system as a percentage. If irate() is not suitable for this, what should I use?

On Monday, June 25, 2018 at 15:45:57 UTC+5, Ben Kochie wrote:

Matthias Rampke

Jul 13, 2018, 7:48:02 AM
to denik.ka...@gmail.com, Prometheus Users
Use `rate()` with a suitably long interval (at least 4x, better 10x, the scrape interval). The longer the interval the smoother. I think a large part of what you're seeing (and why it's so regular) is interference patterns between the times when `process_cpu_seconds_total` and `node_cpu_seconds_total` are being scraped.

/MR
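
A minimal sketch of what that suggestion might look like as a query, not a confirmed answer from the thread. It assumes a scrape interval of around 15s to 30s (so a 5m window is at least 10x the interval), that all three monitoring processes run on the same VM as the node_exporter, and that they are scraped under the hypothetical job names "prometheus", "grafana" and "node". Adjust the job matcher to whatever your scrape config actually uses.

```
# Share of user+system CPU time consumed by the monitoring processes, in percent.
# process_cpu_seconds_total counts user and system CPU time of each scraped process.
# The job names "prometheus" and "grafana" are assumptions; "node" is from the original query.
100 *
  sum(rate(process_cpu_seconds_total{job=~"prometheus|grafana|node"}[5m]))
/
  sum(rate(node_cpu_seconds_total{job="node", mode=~"system|user"}[5m]))
```

Restricting the numerator to specific jobs matters if the Prometheus server also scrapes processes on other machines, since an unfiltered sum would count their CPU time against this VM's total. Because the two metrics are still scraped at different moments, short spikes above the expected range remain possible; a longer window smooths them further.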
