Don't understand the result of a PromQL expression

Rico

Apr 16, 2020, 4:30:38 AM
to Prometheus Users
Hello,
I have a question about a query. When I run this PromQL expression:
avg(irate(http_server_requests_duration_seconds_sum{application="database-api", namespace="front-search-engine",path!="/ping",path!="/health",path!="/metrics",path!="/healthcheck",status="200"}[1m]) > 0
/
irate(http_server_requests_duration_seconds_count{application="database-api", namespace="front-search-engine",path!="/ping",path!="/health",path!="/metrics",path!="/healthcheck",status="200"}[1m]) > 0)
I get this result:
0.6882829615020314
And when I remove the > 0 from the dividend, I get a different result and I cannot understand why.
Expression:
avg(irate(http_server_requests_duration_seconds_sum{application="database-api", namespace="front-search-engine",path!="/ping",path!="/health",path!="/metrics",path!="/healthcheck",status="200"}[1m])
/
irate(http_server_requests_duration_seconds_count{application="database-api", namespace="front-search-engine",path!="/ping",path!="/health",path!="/metrics",path!="/healthcheck",status="200"}[1m]) > 0)
Result:
0.025240713195708097
Does anyone here know why? Thanks for your support :+1:

Rico

Apr 16, 2020, 6:04:57 AM
to Prometheus Users
We have found the source of the problem.
The PromQL query is wrong because / binds more tightly than >, so the > 0 is not applied to the dividend before the division.

Example:

For this query:
irate(http_server_requests_duration_seconds_sum{application="database-api", namespace="front-search-engine",path!="/ping",path!="/health",path!="/metrics",path!="/healthcheck",status="200"}[1m]) 
> 0 / irate(http_server_requests_duration_seconds_count{application="database-api", namespace="front-search-engine",path!="/ping",path!="/health",path!="/metrics",path!="/healthcheck",status="200"}[1m]) > 0
PromQL splits this query into parts, because the division is evaluated before the comparisons.
First part:
0 / irate(http_server_requests_duration_seconds_count{application="database-api", namespace="front-search-engine",path!="/ping",path!="/health",path!="/metrics",path!="/healthcheck",status="200"}[1m])
Result: 0 # because 0/xxxx always returns 0 (or NaN when xxxx is itself 0)
And the next step is effectively:
irate(http_server_requests_duration_seconds_sum{application="database-api", namespace="front-search-engine",path!="/ping",path!="/health",path!="/metrics",path!="/healthcheck",status="200"}[1m]) 
> 0
The trailing > 0 is then applied once more to that result.
The result is just the value of the irate of the sum wherever it is greater than 0; it is never divided by the count.
So the correct PromQL query must be:
(irate(http_server_requests_duration_seconds_sum{application="database-api", namespace="front-search-engine",path!="/ping",path!="/health",path!="/metrics",path!="/healthcheck",status="200"}[1m]) 
> 0) / (irate(http_server_requests_duration_seconds_count{application="database-api", namespace="front-search-engine",path!="/ping",path!="/health",path!="/metrics",path!="/healthcheck",status="200"}[1m]) > 0)
And then the result is OK :beers:
So the first query I posted must look like this:
avg((irate(http_server_requests_duration_seconds_sum{application="database-api", namespace="front-search-engine",path!="/ping",path!="/health",path!="/metrics",path!="/healthcheck",status="200"}[1m]) > 0)
/ (irate(http_server_requests_duration_seconds_count{application="database-api", namespace="front-search-engine",path!="/ping",path!="/health",path!="/metrics",path!="/healthcheck",status="200"}[1m]) > 0))
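For anyone who wants to see the precedence effect in numbers, here is a minimal Python sketch. The per-series values and the pod-a/pod-b/pod-c labels are made up, and the helper functions (scalar_div, gt_scalar, gt_vec, div_vec, avg) are hypothetical stand-ins for PromQL's filtering comparison and vector division, not anything taken from Prometheus itself.

```python
# Toy model of PromQL instant vectors: {series label: value}.
# All numbers below are invented for illustration.
sum_rate = {"pod-a": 0.50, "pod-b": 0.02, "pod-c": 0.0}    # irate(..._sum[1m])
count_rate = {"pod-a": 10.0, "pod-b": 1.0, "pod-c": 0.0}   # irate(..._count[1m])


def scalar_div(scalar, vec):
    """scalar / vector, per series. Only used with scalar 0 here, where 0/0 is NaN."""
    return {k: (scalar / v if v != 0 else float("nan")) for k, v in vec.items()}


def gt_scalar(vec, scalar):
    """vector > scalar: keep series whose value exceeds the scalar, values unchanged."""
    return {k: v for k, v in vec.items() if v > scalar}


def gt_vec(lhs, rhs):
    """vector > vector: keep lhs series that have a matching rhs series and exceed it."""
    return {k: v for k, v in lhs.items() if k in rhs and v > rhs[k]}


def div_vec(lhs, rhs):
    """vector / vector: divide the series that are present on both sides."""
    return {k: lhs[k] / rhs[k] for k in lhs if k in rhs}


def avg(vec):
    return sum(vec.values()) / len(vec)


# a > 0 / b > 0 is read as (a > (0 / b)) > 0, since / binds more tightly than >.
unparenthesized = gt_scalar(gt_vec(sum_rate, scalar_div(0, count_rate)), 0)

# (a > 0) / (b > 0) is what was intended.
parenthesized = div_vec(gt_scalar(sum_rate, 0), gt_scalar(count_rate, 0))

print("without parentheses:", unparenthesized, "avg =", avg(unparenthesized))
print("with parentheses:   ", parenthesized, "avg =", avg(parenthesized))
```

Without the parentheses the output is just the positive sum rates, so the average is much larger than the real per-request latency, which matches the kind of gap seen between the two results above.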
Best regards

Julius Volz

Apr 16, 2020, 7:25:05 AM
to Rico, Prometheus Users
On Thu, Apr 16, 2020 at 12:05 PM Rico <AYMERI...@gmail.com> wrote:
So the first query I posted must look like this:
avg((irate(http_server_requests_duration_seconds_sum{application="database-api", namespace="front-search-engine",path!="/ping",path!="/health",path!="/metrics",path!="/healthcheck",status="200"}[1m]) > 0)
/ (irate(http_server_requests_duration_seconds_count{application="database-api", namespace="front-search-engine",path!="/ping",path!="/health",path!="/metrics",path!="/healthcheck",status="200"}[1m]) > 0))

One word of caution here: if I see this correctly, you are taking the average of multiple averages, which is usually incorrect: https://math.stackexchange.com/questions/95909/why-is-an-average-of-an-average-usually-incorrect

If your goal is to calculate the overall average latency of your service, you will want to just sum up over all sums and counts before doing the division (and then you can omit the outer avg()). You probably don't need the > 0 bit either then, so the whole expression would just become:

  sum(irate(xxx_sum{...}[1m]))
/
  sum(irate(xxx_count{...}[1m]))
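
To make the difference concrete, here is a small Python sketch with invented numbers (two hypothetical series, pod-a and pod-b, with very different traffic). It only illustrates the arithmetic and is not meant to reflect your real data.

```python
# Invented per-series rates: seconds of request time per second, and requests per second.
sum_rate = {"pod-a": 0.50, "pod-b": 0.02}
count_rate = {"pod-a": 10.0, "pod-b": 1.0}

# avg(sum_rate / count_rate): every series counts equally, regardless of traffic.
per_series = {k: sum_rate[k] / count_rate[k] for k in sum_rate}
avg_of_avgs = sum(per_series.values()) / len(per_series)

# sum(sum_rate) / sum(count_rate): every request counts equally.
overall = sum(sum_rate.values()) / sum(count_rate.values())

print(f"average of averages: {avg_of_avgs:.4f}")
print(f"overall average:     {overall:.4f}")
```

With these made-up values the average of averages comes out near 0.035 s while the overall average is near 0.047 s, because the busy series carries ten times the requests but only half the weight in the first calculation.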

Btw., you can also abbreviate:

    path!="/ping",path!="/health",path!="/metrics",path!="/healthcheck"

...to just:

    path!~"/(ping|health|metrics|healthcheck)"
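
Note that PromQL regex matchers are fully anchored, so every excluded path has to appear as its own alternative. A quick way to sanity-check such a pattern is a Python sketch like the one below, using re.fullmatch as a stand-in for the anchored match. Python's re module is not the RE2 engine Prometheus uses, but for a simple alternation like this the behavior should be the same. The "/search" path and the kept_by_negative_matcher helper are hypothetical, added only for illustration.

```python
import re

# PromQL anchors the whole regex, so re.fullmatch is the closest analogue.
PATTERN = re.compile(r"/(ping|health|metrics|healthcheck)")


def kept_by_negative_matcher(path):
    """True if path!~PATTERN would keep a series with this path label value."""
    return PATTERN.fullmatch(path) is None


# "/search" is a made-up application path that should survive the filter.
paths = ["/ping", "/health", "/metrics", "/healthcheck", "/search"]
for p in paths:
    print(p, "kept" if kept_by_negative_matcher(p) else "excluded")
```

Because of the anchoring, "healthcheck" in the alternation does not cover "/health": leaving "health" out of the list would let the /health series back into the result.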

Rico

Apr 16, 2020, 9:37:11 AM
to Prometheus Users
Thanks for your quick answer and for the tips and tricks! :)