Query for composite metric

Igor Barsukov

May 31, 2018, 8:08:26 AM
to Prometheus Users
I'm trying to create a composite metric for the overall health status of a microservice.
I want to calculate it from 3 other metrics: the service's CPU usage, memory usage, and heap usage, each in %.
Basically, if the CPU, memory, or heap usage of the service is > 50%, I need to show a 'Warning' health status (on a Grafana dashboard);
if the CPU, memory, or heap usage of the service is > 80%, I need to show a 'Critical' health status.

I've decided to use Prometheus recording rules to implement it.
I've created the following rules to calculate the basic metrics:

service:cpu_usage:percent, 
service:memory_usage:percent, 
service:heap_usage:percent

and based on them I've created the following rules:
service:health_warning:bool = ((service:cpu_usage:percent > bool 50 and service:cpu_usage:percent < 80) or (service:memory_usage:percent > bool 50 and service:memory_usage:percent < bool 80) or (service:heap_usage:percent > bool 50 and service:heap_usage:percent < bool 80))
service:health_critical:bool = (service:cpu_usage:percent > bool 80 or service:memory_usage:percent > bool 80 or service:heap_usage:percent > bool 80)

But I can't come up with the final step: how can I combine all the results in a single recording rule?
It would be great if I could express 'health_critical' and 'health_warning' numerically, e.g. 'health_warning' takes the values 0 ('not warning') and 1 ('warning'), and 'health_critical' takes the values 0 ('not critical') and 2 ('critical'). Then I could simply sum 'health_warning' and 'health_critical' and get a suitable result.

But I'm not sure whether it is possible to implement my idea. Or maybe I've chosen the wrong approach and my task could be implemented differently?

Brian Brazil

May 31, 2018, 8:18:30 AM
to Igor Barsukov, Prometheus Users
You're most of the way there; https://www.robustperception.io/booleans-logic-and-math/ covers how to do a boolean OR via "a + b > bool 0".

Brian
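
For illustration, the "a + b > bool 0" pattern might be applied to the critical rule roughly as sketched below. The reason for using arithmetic is that PromQL's `and` and `or` select series by label set rather than combining their 0/1 values, so they do not behave as logical operators on `bool` results. The sketch assumes all three usage series carry identical label sets per service: `+` matches series one-to-one, and a service missing any one input produces no result at all.

```promql
# Expression for service:health_critical:bool.
# Each comparison yields 0 or 1 per service; the sum counts how many
# of the three limits are exceeded, and "> bool 0" maps that back to 0/1.
(
    (service:cpu_usage:percent    > bool 80)
  + (service:memory_usage:percent > bool 80)
  + (service:heap_usage:percent   > bool 80)
) > bool 0
```

The inner parentheses matter, because `+` binds more tightly than the comparison operators. A warning rule would presumably follow the same shape with a threshold of 50.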
 


Igor Barsukov

May 31, 2018, 9:09:24 AM
to Prometheus Users
Brian, thanks for the reply.
I'm familiar with this page and I've been thinking in that direction.
But I guess I can't solve the problem with boolean algebra alone, because my final metric should take 3 possible values (e.g. 0 for 'ok' status, 1 for 'warning', 2 for 'critical'), not 2.
Maybe you have some other suggestions?
Thank you in advance.



Brian Brazil

May 31, 2018, 9:11:20 AM
to Igor Barsukov, Prometheus Users
A tri-state value is a bit hard to work with in Prometheus, but you could do something like: (critical == 1) * 2 or warning
where critical and warning are bools.

Brian
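
Spelled out with the rule names from earlier in the thread, the suggestion could look like the sketch below. It assumes both rules produce 0/1 values with identical label sets per service, and `service:health_status` is a hypothetical name for the combined rule, not one from the thread. `== 1` keeps only the services that are critical, `* 2` turns their value into 2, and `or` fills in every remaining service with its warning value (0 or 1).

```promql
# Expression for a combined rule, e.g. service:health_status (hypothetical name).
# Resulting values: 2 = critical, 1 = warning, 0 = ok
(service:health_critical:bool == 1) * 2
  or
service:health_warning:bool
```

Because the critical branch takes priority, the warning rule should not need an upper bound of 80 with this approach: a service over 80% reports 2 regardless of its warning value.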
 

Igor Barsukov

May 31, 2018, 3:18:20 PM
to Prometheus Users
Thank you, that really helped!
