sum_over_time from a query where the result is a bool function.

Luciano Polo

Apr 23, 2021, 11:17:42 AM

to Prometheus Users

Hi,

I cannot get this query to produce accurate results. Hopefully someone can help me out.

This query works fine. It returns 1 if the result is less than 98:
100*sum(http_service_api{apiCode="0"}) by (method) / ignoring (apiCode) group_left() sum(http_service_api{} ) by (method) < bool 98
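To make the arithmetic concrete, here is a minimal Python sketch of what that expression computes per method at a single evaluation time. The sample values and the helper name are made up for illustration; only the 98 threshold and the apiCode="0" selector come from the query.

```python
# Hypothetical samples at one evaluation time: (method, apiCode) -> value.
samples = {
    ("GET", "0"): 990.0,
    ("GET", "500"): 10.0,
    ("POST", "0"): 90.0,
    ("POST", "500"): 10.0,
}

def below_threshold(samples, threshold=98.0):
    """Mimic `100 * sum(ok) by (method) / sum(all) by (method) < bool threshold`.

    Assumes every method has a non-zero total.
    """
    ok, total = {}, {}
    for (method, api_code), value in samples.items():
        total[method] = total.get(method, 0.0) + value
        if api_code == "0":
            ok[method] = ok.get(method, 0.0) + value
    # `< bool` yields 1 or 0 per series instead of filtering series out.
    return {m: int(100.0 * ok[m] / total[m] < threshold) for m in ok}

flags = below_threshold(samples)
print(flags)
```

With these made-up numbers GET sits at 99% and POST at 90%, so only POST is flagged.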

This is the part I am having issues with. I need to add up all the ones within a period of time, that is, count how often the result was lower than 98.
So the query is:

sum_over_time(sum(100*sum(http_service_api{apiCode="0"}) by (method) / ignoring (apiCode) group_left() sum(http_service_api{} ) by (method) < bool 98) by (method)[1d:1m])

I would expect the number to increase every time the result is less than 98. I do see this, but I also see it decreasing, and I have not been able to figure out why.

Any idea is appreciated.

Thanks in advance

Julius Volz

Apr 23, 2021, 12:00:34 PM

to Luciano Polo, Prometheus Users

Hi,

There are at least a couple of things off in that query:

* http_service_api looks like it's probably a counter metric (although it doesn't have a "_total" suffix), so you need to apply rate() to it first, before applying a sum(). Otherwise you are looking at the absolute value of the counter, which has accumulated over a long time, which doesn't tell you much.

* Due to your "by(method)" your output labels are just "method" on both sides of the binop, so you don't need the "ignoring()" and also not the "group_left()".
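As a rough illustration of the first point, here is a small Python sketch comparing a ratio of raw counter values with a ratio of their recent increases. The counter values are invented, and the sketch ignores counter resets and the extrapolation that rate() performs; it only shows why the absolute counter value hides recent behavior.

```python
# Hypothetical cumulative counters sampled one minute apart.
# The service was healthy for a long time, then degraded in the last minute.
ok_counter = [100_000.0, 100_050.0]     # successful requests so far
total_counter = [100_100.0, 100_200.0]  # all requests so far

# Ratio of the raw counters: dominated by the long healthy history.
raw_pct = 100.0 * ok_counter[-1] / total_counter[-1]

# Ratio of the per-minute increases: what happened in the last minute.
# rate() would also divide each increase by the window length in seconds,
# but that factor cancels out in the ratio.
ok_increase = ok_counter[-1] - ok_counter[-2]
total_increase = total_counter[-1] - total_counter[-2]
recent_pct = 100.0 * ok_increase / total_increase

print(round(raw_pct, 2), recent_pct)
```

Here the raw ratio stays well above 98 even though only half of the most recent requests succeeded, so a "< bool 98" check on raw counters would not fire.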

Maybe you mean something closer to this? https://demo.promlens.com/?l=14PEjXeJocK (changed the metric and label names to ones that have data in the demo setup)

sum_over_time(sum by(method) (100*sum by (method) (rate(demo_api_request_duration_seconds_count{status="200"}[1m])) / sum by (method) (rate(demo_api_request_duration_seconds_count[1m])) < bool 98)[1d:1m])

I guess in the end it should give you something close to "for how many minutes over the last day did each method have a percentage of <98 of apiCode="0"?"
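Note that "over the last day" means a trailing window, so the value is a moving count rather than an ever-increasing counter: it goes up when a new minute below 98 enters the window and goes down when an old one drops out of the far end. A minimal Python sketch of that behavior, with made-up per-minute flags and a 3-minute window standing in for the 1d range:

```python
def sliding_sum(flags, window):
    """Sum the last `window` flags at each step, like sum_over_time over a trailing range."""
    return [sum(flags[max(0, i - window + 1): i + 1]) for i in range(len(flags))]

# Hypothetical per-minute 0/1 flags (1 = percentage was below 98 in that minute).
flags = [1, 1, 0, 0, 0, 1, 0, 0]

# A 3-minute window instead of 1 day, to keep the example short.
result = sliding_sum(flags, window=3)
print(result)
```

The result drops at the fourth and fifth steps even though the flag is 0 there, simply because the earlier ones have left the window. That is probably what shows up as a "decreasing" graph.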

Regards,

Julius

--

Julius Volz

PromLabs - promlabs.com

Luciano Polo

Apr 23, 2021, 12:35:48 PM

to Prometheus Users

Thanks for your observations.
I tried the approach you suggested, but I am not getting the expected result, which is an incrementing counter. I see the result increasing when the percentage is less than 98, but I also sometimes see it decreasing when the percentage is not less than 98. It does not make any sense to me.
So I tried a workaround with a recording rule that registers the times when the result is less than 98. The recording rule is based on this query:

record: service:http_service:availability:1m

expr:

sum by(method) (100*sum by (method) (rate(demo_api_request_duration_seconds_count{status="200"}[1m])) / sum by (method) (rate(demo_api_request_duration_seconds_count[1m])) < bool 98)

It works fine. Then I just do:
sum_over_time(service:agent_dispatcher_api:availability:1m[1d:1m])

I solved my problem this way, but I still cannot figure out why I am getting decreasing results from the query in question.

Thanks
Lp
