Re: [prometheus-users] Counter metric resets

Stuart Clark

Apr 7, 2022, 10:52:33 AM
to Yaron B, Prometheus Users
On 07/04/2022 14:04, Yaron B wrote:
> Hello,
> We have a counter metric that is incremented each time a pod performs a
> specific action.
> I need to count how many times the pods of a certain deployment (summed
> across all of its pods) performed the action over 24 hours.
> The problem is that the pods run on spot instances, and when a pod gets
> restarted its counter resets. The metric might be 20 at 1:00 but 3 at
> 2:00, so when I try delta or sum over time, I get wrong results.
> Any ideas how I can get the real increase for the action over a 24-hour range?

Look at using rate(), which handles counter resets. If you multiply the
value it produces by the length of the time range in seconds, you get the
number of actions that occurred. Note that this will only ever be an
estimate (for example, a pod might be destroyed before its last scrape, so
some actions are never recorded) and will most likely not be an integer
(due to the way rate() extrapolates to the edges of the range).
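
To illustrate what "handles counter resets" means, here is a minimal Python sketch of the idea, not Prometheus's actual implementation. It assumes a reset shows up as a sample lower than the previous one and that the counter restarted from 0. It leaves out the extrapolation that rate() applies, and the function name and sample values are made up for the example (20 followed by 3 mirrors the question).

```python
def increase_with_resets(samples):
    """Total increase of one counter series, treating any drop as a reset.

    Simplified sketch: no extrapolation, and a reset is assumed to
    restart the counter at 0.
    """
    total = 0.0
    for prev, cur in zip(samples, samples[1:]):
        if cur < prev:
            # Counter went down: the pod restarted, so everything counted
            # since the restart (cur) is new.
            total += cur
        else:
            total += cur - prev
    return total


# Hypothetical scrapes of one pod; the drop from 20 to 3 is a restart.
samples = [5, 12, 20, 3, 9]

naive_delta = samples[-1] - samples[0]            # ignores the reset
reset_aware = increase_with_resets(samples)       # accounts for the reset
```

Here the naive last-minus-first delta undercounts because of the restart, while the reset-aware version adds up the increases on both sides of it. Anything the pod counted after its last scrape and before it was destroyed is still lost, which is why the result remains an estimate.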

--
Stuart Clark

Julius Volz

Apr 7, 2022, 12:04:49 PM
to Stuart Clark, Yaron B, Prometheus Users
On Thu, Apr 7, 2022 at 4:52 PM Stuart Clark <stuart...@jahingo.com> wrote:
> Look at using rate() which handles counter resets. If you multiply the
> value produced by the time period it is over you would get the number of
> actions that occurred.

That sounds equivalent to just using increase(). increase() is identical to rate(), except that it does not convert the result to a per-second value; it keeps it as the total over whatever time range you specified.

But yep, with metrics and resets, this is only ever going to be an estimate, and both rate() and increase() do some extrapolation. See also https://promlabs.com/blog/2021/01/29/how-exactly-does-promql-calculate-rates.
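
To make the rate()/increase() relationship concrete, here is a rough Python sketch under simplifying assumptions: no extrapolation, and a reset is any sample lower than the previous one. The pod names and sample values are hypothetical. It computes the reset-aware increase per pod, sums across pods, and shows that the per-second rate is just that total divided by the window length.

```python
WINDOW_SECONDS = 24 * 60 * 60  # the 24-hour range from the question


def raw_increase(samples):
    """Reset-aware increase of one series, without extrapolation."""
    total = 0.0
    for prev, cur in zip(samples, samples[1:]):
        # A drop means the counter restarted from 0, so cur is all new.
        total += cur if cur < prev else cur - prev
    return total


# Hypothetical per-pod samples within the window; pod-a restarts once.
pods = {
    "pod-a": [0, 10, 20, 3],
    "pod-b": [4, 9, 15, 15],
}

# Handle resets per pod first, then sum across the deployment.
total_increase = sum(raw_increase(s) for s in pods.values())

# The per-second rate is the same quantity divided by the window length,
# so multiplying it back by the window recovers the increase.
per_second_rate = total_increase / WINDOW_SECONDS
```

In PromQL terms this roughly corresponds to something like `sum(increase(my_action_total[24h]))`, where `my_action_total` is a placeholder metric name, not one from this thread.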

Yaron Bialik

Apr 7, 2022, 12:12:23 PM
to Julius Volz, Stuart Clark, Prometheus Users
Thanks! 