How to get number of CPUs within Prometheus PromQL


yasi...@gmail.com

Sep 20, 2018, 9:37:56 AM
to Prometheus Users
Hi,

I want to create an alert for high load average. Currently I have the following alert configuration:


groups:
- name: node-alerts  # group name is illustrative
  rules:
  - alert: High Load
    expr: avg_over_time(node_load1[4m]) > 0.8
    labels:
      severity: Major
    annotations:
      description: '{{ $labels.instance }} of job {{ $labels.job }} is under high
        load.'
      summary: Instance {{ $labels.instance }} under high load

The problem is that node_load1 scales with the number of CPUs: on a host with 8 CPUs, the per-CPU load average is node_load1 / 8. How can I get the number of CPUs dynamically using PromQL?


Thanks,
Yasir.
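To address the original question directly, here is a minimal sketch, assuming node_exporter 0.16 or later, which exposes one node_cpu_seconds_total series per CPU and mode (older releases call the metric node_cpu). Counting the "idle" series of an instance gives its CPU count, and dividing the load average by that count gives a per-CPU load. The group name is hypothetical.

```yaml
groups:
- name: node-load-normalized   # hypothetical group name
  rules:
  - alert: High Load
    # Average node_load1 over 4 minutes, divided by the number of CPUs.
    # There is one mode="idle" series per CPU, so counting them per target
    # yields the CPU count. "without (cpu, mode)" keeps instance and job.
    expr: |
      avg_over_time(node_load1[4m])
        / count without (cpu, mode) (node_cpu_seconds_total{mode="idle"})
        > 0.8
    labels:
      severity: Major
    annotations:
      summary: 'Instance {{ $labels.instance }} under high load'
```

Because the count keeps the same target labels (instance, job) as node_load1, the division pairs each load series with the CPU count of the same target without needing an on() clause. The division binds more tightly than the comparison, so the threshold applies to the per-CPU value.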

Ben Kochie

Sep 20, 2018, 10:44:00 AM
to yasi...@gmail.com, Prometheus Users
Rather than looking at load1, which measures run queue length rather than CPU usage, I would look at the real CPU utilization with these recording rules:
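The referenced recording rules are not reproduced in this copy of the thread. As a hedged sketch only, assuming node_exporter 0.16+ metric names, a recording rule that produces a series named instance:node_cpu_utilization:ratio could look like this; the group name and the 5m rate window are assumptions, and the rules originally referenced may differ.

```yaml
groups:
- name: node-cpu-recording   # hypothetical group name
  rules:
  # Per-instance CPU utilization as a ratio from 0.0 to 1.0:
  # the per-second rate of idle time for each CPU is the idle fraction,
  # averaging across CPUs gives the instance's idle fraction,
  # and subtracting from 1 gives the busy fraction.
  - record: instance:node_cpu_utilization:ratio
    expr: |
      1 - avg without (cpu, mode) (
        rate(node_cpu_seconds_total{mode="idle"}[5m])
      )
```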


Then you can alert like this:

- alert: High Load
  expr: avg_over_time(instance:node_cpu_utilization:ratio[4m]) * 100 > 80

I added the * 100 so that the ratio can be shown as a percentage in the description with {{ $value | printf "%.2f" }}.
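Put together, a complete alerting rule using that template might look like the following sketch. The group name and annotation wording are illustrative; $value holds the value of the alerting sample, which here is the percentage computed by the expression.

```yaml
groups:
- name: node-cpu-alerts   # hypothetical group name
  rules:
  - alert: High Load
    # Fires when the 4-minute average CPU utilization exceeds 80%.
    expr: avg_over_time(instance:node_cpu_utilization:ratio[4m]) * 100 > 80
    labels:
      severity: Major
    annotations:
      summary: 'Instance {{ $labels.instance }} under high load'
      # printf "%.2f" rounds the percentage to two decimal places.
      description: 'CPU utilization on {{ $labels.instance }} is {{ $value | printf "%.2f" }}%.'
```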

Yasir Saleem

Sep 26, 2018, 1:20:55 AM
to Prometheus Users
Hi Ben,
Thanks for the help. I do not understand the grammar of this expression; it does not seem to be PromQL. Is there a separate language or grammar for alerting rule expressions? Where can I get more help on this? Also, how can I test these expressions?

Thanks,

Ben Kochie

Sep 26, 2018, 2:36:59 AM
to Yasir Saleem, Prometheus Users

Yasir Saleem

Sep 27, 2018, 9:49:36 AM
to Prometheus Users
I am confused about the syntax of these recording rules. This document says that 'expr:' is a PromQL expression, but when I test 'avg_over_time(instance:node_cpu_utilization:ratio[4m]) * 100 > 80' in the Prometheus expression browser, it does not work. I also cannot find any documentation that explains the 'instance:node_cpu_utilization:ratio' syntax. Where can I get help on this specific format?

Thanks.
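For readers with the same question: instance:node_cpu_utilization:ratio is not special syntax but an ordinary metric name. Colons are legal in Prometheus metric names and are, by convention, reserved for recording rules, following the level:metric:operations naming pattern. The series only exists after a recording rule with that name has been loaded and evaluated, so before that the query returns no data in the expression browser. As a hedged sketch (assuming node_exporter 0.16+ metric names and a hypothetical group name), the alert can instead compute utilization inline, and its expr string can be pasted into the expression browser as-is:

```yaml
groups:
- name: node-cpu-alerts-inline   # hypothetical group name
  rules:
  - alert: High Load
    # Utilization computed inline instead of from a recorded series.
    # The idle rate over a 4m window approximates
    # avg_over_time(instance:node_cpu_utilization:ratio[4m]); the two are
    # close but not identical.
    expr: |
      (1 - avg without (cpu, mode) (rate(node_cpu_seconds_total{mode="idle"}[4m]))) * 100 > 80
    labels:
      severity: Major
```

To check a rule file's syntax before Prometheus loads it, promtool check rules FILE can be run against it.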