Counts from rates

Daniel Sabsay

Jun 29, 2023, 2:22:43 AM
to Prometheus Users
Is it possible to accurately calculate original counts from pre-recorded rates?
My experiments suggest the answer is no. But I’m curious to get other perspectives to see if I’ve overlooked something or if there is a more effective way to approach this.

My full question, reason for the question, and experiment is here: https://github.com/dsabsay/prom_rates_and_counters/blob/main/README.md

Thanks!

Brian Candler

Jun 29, 2023, 3:27:43 AM
to Prometheus Users
Not in the presence of counter resets, no.


> My full question, reason for the question, and experiment is here: https://github.com/dsabsay/prom_rates_and_counters/blob/main/README.md

"In other words, can one get the equivalent of increase(some_metric[1d]) by using the output of a recording rule like rate(some_metric[5m])?

I think that increase(...) doesn't work in the way you think it does.

increase(...) and rate(...) are the same thing, only differing by a factor of the time window.  That is: increase(some_metric[1d]) is *exactly* the same as rate(some_metric[1d])*86400.
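
As a quick sanity check (a sketch with made-up numbers): if some_metric rose from 100 to 1,000,000 over the day with no resets, increase(some_metric[1d]) gives roughly 999,900, and rate(some_metric[1d]) gives roughly 999,900 / 86,400 ≈ 11.57 per second; multiplying the latter by 86,400 recovers the former.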

To find the exact difference between a metric now and 1d ago, you can use
some_metric - some_metric offset 1d

However that does not work across counter resets, for obvious reasons.

The underlying requirement you have is, I think: given a collection of recorded rate[5m] values, can you turn them into a rate[1d]? I think the avg_over_time() of those rates is the best you can do. If these are 5m rates, then you'd want 5-minute steps in the subquery: avg_over_time(...[xx:5m]), where xx is the overall range you want to cover.
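
For instance, if the recording rule stores that 5m rate under a name like job:requests:rate5m (an illustrative name, not from your repo), the day's count could be approximated as

    avg_over_time(job:requests:rate5m[1d]) * 86400

which reads the already-recorded samples instead of recomputing the rate on the fly, but it is the same averaging idea.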

But you are testing this over very short time periods (10m) and therefore it's not going to be exact. In particular, rate([5m]) takes the rate between the first and last data points in a 5 minute window. This means that if you are scraping at 1 minute intervals, you're actually calculating a rate over a 4 minute period.
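
For example, with 1-minute scrapes the first and last samples inside a [5m] window are only 4 minutes (240 s) apart, so the rate is effectively computed from 240 s of data rather than 300 s, and over a short 10m test that error is proportionally large.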

Daniel Sabsay

Jun 29, 2023, 7:28:34 PM
to Prometheus Users
Thanks for the reply.


> The underlying requirement you have is, I think, given a collection of recorded rate[5m] values, can you turn this into a rate[1d] ?

Yes, that’s correct.

Regarding the 5m vs. 4m: I tried to adjust for that by increasing the rate window by one scrape interval (in this case 15s). The result is still quite a ways off:

avg_over_time(rate(requests_total[315s])[10m:5m])*(10*60) => 1820 (expected 2153.84)

I also tried this, to include more “slices” in the subquery. Still off:

avg_over_time(rate(requests_total[75s])[10m:1m])*(10*60) => 2045.45 (expected 2153.84)

I suspect there is some mathematical property I can’t articulate yet that means this won’t ever be 100% accurate, even if we adjust for what PromQL is doing. I interpreted your last comment as suggesting that the short time period and choice of rate windows was the problem. While those things do affect the result (see above), I can’t find a combination that produces the expected result. Could you elaborate on what you meant?

Brian Candler

Jun 30, 2023, 2:30:32 AM
to Prometheus Users
You can break down your query into parts to find out what's happening.

If you go to the PromQL web interface, you can enter a range query like

    requests_total[10m15s]

and you'll see the raw, timestamped data points in that time window. (You must be in the "Table" view rather than the "Graph" view).  Similarly you can do subqueries like

    rate(requests_total[5m15s])[10m:5m]

And finally you can average those values by hand, and compare to running avg_over_time on that expression. 

Of course, set the query evaluation time to be a fixed point in time, so they all align.

What I'd expect to see is:
- 41 data points in the first range query: let's call them x0 to x40 (from oldest to newest)
- a rate calculated between x0 and x20, which I'd expect equals (x20-x0)/300 in the absence of counter resets
- a rate calculated between x20 and x40, which I'd expect equals (x40-x20)/300

If you average these two rates you should get ((x20-x0)/300 + (x40-x20)/300)/2 = (x40-x0)/600, which is the rate you're looking for.
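
(The same algebra generalizes, assuming no counter resets: when the averaged windows are back-to-back and exactly tile the overall range, the sum telescopes to (x_last - x_first) / (total seconds); any gap or overlap between the windows, such as the 5m-vs-4m effect above, is what pulls the averaged result away from the true overall rate.)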

By doing each of the steps by hand, you should be able to work out where your assumptions are falling down.