Efficient alert rules for instance-specific and common thresholds across 1000+ instances?


swe...@gmail.com

Sep 9, 2020, 9:59:33 AM
to Prometheus Users
Hi,

I looked for a suitable solution in this group but could not find one.

I am monitoring over 1000 AWS instances, and I have 3 alert rules: CPU, memory, and disk.

There is a common threshold for all instances, something like:
CPU: 0.7, Memory: 0.7, Disk: 0.7
For example, the common rules (for all instances):
- alert: CPU
  expr: CPU > 0.7
  ...
- alert: MEM
  expr: MEM > 0.7
  ...

Now I need to add a different CPU threshold for specific instances:
- alert: CPU
  expr: CPU{instance="AAA"} > 0.8

The simplest approach is shown below, but as more instances need their own thresholds, the file will become complex.
# For specific one
- alert: CPU
  expr: CPU{instance="AAA"} > 0.8
- alert: CPU
  expr: CPU{instance="BBB"} > 0.9
# For common
- alert: CPU
  expr: CPU{instance!~"AAA|BBB"} > 0.7
- alert: MEM
  expr: MEM > 0.7

In this case, is there a more efficient way to write rules for both specific and common thresholds?

Thanks for your answer in advance.
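
For illustration, one commonly used pattern is to store the per-instance thresholds as a time series and fall back to the common value with `or`, so a single alert rule covers both cases. This is only a minimal sketch under assumptions: the group name and the `cpu_threshold` metric name are hypothetical, and `CPU` is taken to be the same metric as in the rules above, carrying an `instance` label.

```yaml
groups:
  - name: threshold-overrides        # hypothetical group name
    rules:
      # Per-instance overrides, recorded as a series (hypothetical metric name).
      - record: cpu_threshold
        expr: vector(0.8)
        labels:
          instance: AAA
      - record: cpu_threshold
        expr: vector(0.9)
        labels:
          instance: BBB
      # One alert for all instances: use the override where one exists,
      # otherwise fall back to the common threshold of 0.7.
      - alert: CPU
        expr: |
          CPU > on (instance) group_left ()
            (
              cpu_threshold
              or on (instance)
              (count by (instance) (CPU) * 0 + 0.7)
            )
```

With this layout, adding another special case means adding one `cpu_threshold` entry; the alert expression and the `instance!~"..."` exclusion list no longer need to be edited.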

Brian Candler

Sep 9, 2020, 1:26:13 PM
to Prometheus Users

Ben Kochie

Sep 10, 2020, 3:40:02 AM
to swe...@gmail.com, Prometheus Users


swe...@gmail.com

Sep 10, 2020, 9:06:48 AM
to Prometheus Users
Thanks for your reply. I only understand part of it so far, but I will read it more carefully.

On Thursday, September 10, 2020 at 2:26:13 AM UTC+9, b.ca...@pobox.com wrote:

swe...@gmail.com

Sep 10, 2020, 9:08:12 AM
to Prometheus Users
Wow, this is helpful information for me. Thanks for your reply!

On Thursday, September 10, 2020 at 4:40:02 PM UTC+9, sup...@gmail.com wrote:

swe...@gmail.com

Sep 11, 2020, 6:35:01 AM
to Prometheus Users
Yes, I checked that post and it is exactly what I want.
I don't understand the PromQL 100%, but it does what I want.

Thanks a lot!

On Thursday, September 10, 2020 at 2:26:13 AM UTC+9, b.ca...@pobox.com wrote: