combine 2 queries


Alan Miller

Dec 17, 2020, 5:06:13 PM
to Prometheus Users
I'm trying to create a Prometheus rule that will detect when CPU usage is above 95%,
but only on hostnames matching a string (e.g. "^as(.*)"), which are my application servers.

The problem is that my "instance" fields are IP address:port (eg: 10.123.5.5:9182).

So this query returns the instances and what looks like % cpu utilization:
  ( 100 - (avg by (instance) (irate(windows_cpu_time_total{mode="idle"}[5m])) * 100) > 95)

And this query gets me the hostnames I'm looking for:
  windows_cs_hostname{hostname=~".*as.*"}

So how do I combine them so that I get the CPU utilization value for ONLY hostnames
starting with "as"? (Here again, the instance fields are the same ipaddress:port pairs.)

Regards,
Alan

Christian Hoffmann

Dec 17, 2020, 5:58:21 PM
to Alan Miller, Prometheus Users
Hi,

On 2020-12-17 23:06, Alan Miller wrote:
> The problem is that my "instance" fields are IP address:port (eg:
> 10.123.5.5:9182).
The best solution would be to fix exactly this. ;)
https://www.robustperception.io/controlling-the-instance-label

> So this query returns the instances and what looks like % cpu utilization:
>   ( 100 - (avg by (instance)
> (irate(windows_cpu_time_total{mode="idle"}[5m])) * 100) > 95)
>
> And this query gets me the hostnames I'm looking for:
>   windows_cs_hostname{hostname=~".*as.*"}
>
> So how do I combine them so that I get the CPU utilization value for
> ONLY hostnames
> starting with "as" (here again, the instance fields are the same
> ipaddress:port pairs.


Try something like:

( 100 - (avg by (instance) (irate(windows_cpu_time_total{mode="idle"}[5m])) * 100) > 95)
and on(instance) windows_cs_hostname{hostname=~".*as.*"}
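
A note on the matcher, as a hedged sketch rather than a tested rule: Prometheus regex matchers are fully anchored, so ".*as.*" keeps any hostname that contains "as" anywhere, while "as.*" keeps only hostnames that begin with "as". Matching is also case-sensitive. Assuming the application servers really are named with a lowercase "as" prefix, the variant would look like this:

```promql
# Sketch only: same query as above, with the hostname matcher narrowed.
# "as.*" is implicitly anchored, i.e. equivalent to "^as.*$".
# The `and on(instance)` keeps a CPU series only if a hostname series
# with the same instance label exists on the right-hand side.
(100 - (avg by (instance) (irate(windows_cpu_time_total{mode="idle"}[5m])) * 100) > 95)
  and on(instance) windows_cs_hostname{hostname=~"as.*"}
```

The result keeps the labels of the left-hand side, so it still carries only the instance label (IP address:port), not the hostname.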


Kind regards,
Christian

Alan Miller

Dec 21, 2020, 5:52:21 PM
to Prometheus Users
Thank you Christian,

For now, assuming I don't/can't fix the labels, this should give me 
CPU usage > 90% on "non-SQL servers" (by hostname convention)

         +-------------- percentage of time the machine is idle --------------+
         |                                                                    |
         v                                                                    v
 ((100 - (avg by (instance) (irate(wmi_cpu_time_total{mode="idle"}[5m])) * 100)) > 90) and on(instance) wmi_cs_hostname{hostname!~".*SQL.*"}
 ^^                                                                            ^     ^     ^                                               ^
 ||                                                                            |     |     |                                               |
 |+--------------------- percentage of time the machine is not idle -----------+     |     |                                               |
 |                                                                                   |     |                                               |
 +---------------  percentage of time the machine is above 90% utilization ----------+ and +---- hostname NOT an SQL server ---------------+
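
If the hostname should also show up in the result (for example so an alert can name the machine), a join with group_left is one possible variation. This is a sketch under assumptions: it assumes wmi_cs_hostname is an info-style series whose value is 1 and that there is exactly one such series per instance, so multiplying by it leaves the CPU percentage unchanged.

```promql
# Sketch only: copy the hostname label onto the CPU usage series.
# Assumption: wmi_cs_hostname has value 1 and one series per instance.
(
  (100 - (avg by (instance) (irate(wmi_cpu_time_total{mode="idle"}[5m])) * 100))
  * on(instance) group_left(hostname)
  wmi_cs_hostname{hostname!~".*SQL.*"}
) > 90
```

Instances without a matching wmi_cs_hostname series drop out of the result, so the hostname filter still applies, and each remaining series carries both the instance and hostname labels.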
