PromQL: understanding the and operator


Puneet Singh

Feb 22, 2024, 2:28:52 PM
to Prometheus Users
Hi All,
I have a metric called go_service_status, and I use the "sum without" aggregation to determine whether a service is up or down on a server. The service can be down on 2 master servers simultaneously, and I am unable to figure out a PromQL query that detects that situation. Example:

go_service_status{SERVICETYPE="grade1",SERVER_CATEGORY="db1",instance=~"server1:7878"}
and it can have 2 possible series -
go_service_status{HOSTNAME="server1", SERVER_CATEGORY="db1", SERVICETYPE="grade1", USER="admin", instance="server1:7878", job="customprocessexporter01"} 0
go_service_status{HOSTNAME="server1", SERVER_CATEGORY="db1", SERVICETYPE="grade1", USER="root", instance="server1:7878", job="customprocessexporter01"} 1

and in the same way
go_service_status{SERVICETYPE="grade1",SERVER_CATEGORY="db1",instance=~"server2:7878"}
and it can have 2 possible series -
go_service_status{HOSTNAME="server2", SERVER_CATEGORY="db1", SERVICETYPE="grade1", USER="admin", instance="server2:7878", job="customprocessexporter01"} 0
go_service_status{HOSTNAME="server2", SERVER_CATEGORY="db1", SERVICETYPE="grade1", USER="root", instance="server2:7878", job="customprocessexporter01"} 0  


Here's the query I use to determine the status of the service on server1:

(sum without (USER) (go_service_status{HOSTNAME="server1",SERVER_CATEGORY="db1",SERVICETYPE="grade1"}) < 1)
[Attached graph: Untitled.png]

So server1's service is only momentarily 0,


and server2's service is always down, for example:
(sum without (USER) (go_service_status{HOSTNAME="server2",SERVER_CATEGORY="db1",SERVICETYPE="grade1"}) < 1)
[Attached graph: Untitled.png]


Now I tried to find the time ranges during which the service was down (0) on server1 and server2 simultaneously:
(sum without (USER) (go_service_status{HOSTNAME="server1",SERVER_CATEGORY="db1",SERVICETYPE="grade1"}) < 1) and (sum without (USER) (go_service_status{HOSTNAME="server2",SERVER_CATEGORY="db1",SERVICETYPE="grade1"}) < 1)


I was expecting a graph similar to the one for server2, but I got:
[Attached graph: Untitled.png]

I think I need to ignore the HOSTNAME label, but I am unable to figure out how to ignore it in combination with the "sum without" clause.

Any help or hint to improve this query would be very useful for understanding how the "and" operator behaves in the context of a "sum without" clause.

Thanks,
Puneet

Puneet Singh

Feb 22, 2024, 2:30:55 PM
to Prometheus Users
*Correction: "I was expecting a graph similar to the one for server2, but I got:"
should be: "I was expecting a graph similar to the one for server1, but I got:"

Puneet Singh

Feb 22, 2024, 3:58:08 PM
to Prometheus Users
Okay, so I think this should be the correct way to perform the "and" operation?
(sum without (USER, HOSTNAME, instance) (go_service_status{HOSTNAME="server1",SERVER_CATEGORY="db1",SERVICETYPE="grade1"}) < 1) and (sum without (USER, HOSTNAME, instance) (go_service_status{HOSTNAME="server2",SERVER_CATEGORY="db1",SERVICETYPE="grade1"}) < 1)

Regards
P



Alexander Wilke

Feb 23, 2024, 11:45:28 AM
to Prometheus Users
In Grafana I create query A and query B, then an expression C of type "Math", where I can compare them like $A > 0 && $B > 0.
Maybe there is also a "Transform Data" step with a calculation option.

Alexander Wilke

Feb 23, 2024, 1:00:57 PM
to Prometheus Users
Another possibility could be

QueryA + QueryB == 0  # both down

Or the other way
QueryA + QueryB == 2  # both up



Brian Candler

Feb 23, 2024, 8:52:03 PM
to Prometheus Users
On Friday 23 February 2024 at 02:28:52 UTC+7 Puneet Singh wrote:
Now i tried to find the time duration where both these service were simultaneously down / 0 on both server1 and server2 :
(sum without (USER) (go_service_status{HOSTNAME="server1",SERVER_CATEGORY="db1",SERVICETYPE="grade1"}) < 1) and (sum without (USER) (go_service_status{HOSTNAME="server2",SERVER_CATEGORY="db1",SERVICETYPE="grade1"}) < 1)


I was expecting a graph similar to the once for server2 , but i got :
[Attached graph: Untitled.png]

I think i need to ignore the HOSTNAME label , but unable to figure out the way to ignore the HOSTNAME label in combination with sum without clause.

You've got exactly the right idea. It's not the "sum without" that needs modifying, it's the "and":

(....) and ignoring (HOSTNAME) (....)
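
Spelled out with the selectors from the question, this might look like the sketch below. One assumption worth stating: in the sample series the instance label also differs between the two servers (server1:7878 vs. server2:7878), so it most likely has to be ignored together with HOSTNAME, otherwise the two sides still won't match. Label names are case-sensitive, so it has to be HOSTNAME, not hostname.

```promql
# Returns the server1 series only at times when the server2 expression
# also returns a series, i.e. when both sums are below 1.
  (sum without (USER) (go_service_status{HOSTNAME="server1",SERVER_CATEGORY="db1",SERVICETYPE="grade1"}) < 1)
and ignoring (HOSTNAME, instance)
  (sum without (USER) (go_service_status{HOSTNAME="server2",SERVER_CATEGORY="db1",SERVICETYPE="grade1"}) < 1)
```

The result carries the labels of the left-hand side, so the graph would show a series labelled with server1, present only while both are down.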


In this particular example, there are other ways to do this which might end up with a more compact expression. You could have an outer sum over the inner sums, but then I think the whole expression simplifies to just

sum without (USER, HOSTNAME, instance) (go_service_status{HOSTNAME=~"server1|server2",SERVER_CATEGORY="db1",SERVICETYPE="grade1"}) < 1

Brian Candler

Feb 23, 2024, 10:19:50 PM
to Prometheus Users
On Saturday 24 February 2024 at 01:00:57 UTC+7 Alexander Wilke wrote:
Another possibility could be

QueryA + queryB == 0  #both down

No, that doesn't work, for exactly the same reason that "QueryA and QueryB" doesn't work.

With a binary expression like "foo + bar", each side is a vector, and each element of the vector has a different label set.

The result only combines values from the left and right hand sides with *exactly* matching label sets.  Therefore, an element in the LHS with {HOSTNAME="server1"} does not match an element in the RHS with {HOSTNAME="server2"}.  Elements in the LHS which don't match any element in the RHS (and vice versa) are dropped.

But you can modify that logic, using for example "foo + ignoring(HOSTNAME) bar"

In this case, the HOSTNAME label is ignored when matching the LHS and RHS. But if an element on the LHS then matches multiple elements on the RHS, or vice versa, there will be an error.  N:1 or 1:N matches can be made to work by adding group_left or group_right clauses. If multiple elements on the LHS match multiple elements on the RHS (a many-to-many match), that doesn't work for arithmetic operators.
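
Applied to the queries in this thread, a sketch of the "+" variant could look like the following. It assumes each side returns exactly one series after the sum without (USER), which should hold as long as each selector pins a single HOSTNAME, and that the status values are only ever 0 or 1. The instance label is ignored as well because it differs between the two servers in the sample data.

```promql
# Label sets after "sum without (USER)" (sum also drops the metric name):
#   LHS: {HOSTNAME="server1", SERVER_CATEGORY="db1", SERVICETYPE="grade1", instance="server1:7878", job="customprocessexporter01"}
#   RHS: {HOSTNAME="server2", SERVER_CATEGORY="db1", SERVICETYPE="grade1", instance="server2:7878", job="customprocessexporter01"}
# HOSTNAME and instance differ, so both are ignored to get a one-to-one match.
  sum without (USER) (go_service_status{HOSTNAME="server1",SERVER_CATEGORY="db1",SERVICETYPE="grade1"})
+ ignoring (HOSTNAME, instance)
  sum without (USER) (go_service_status{HOSTNAME="server2",SERVER_CATEGORY="db1",SERVICETYPE="grade1"})
== 0
```

Since "+" binds more tightly than "==", this compares the combined sum against 0, so a series is returned only while both servers report 0.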