Need a query function which returns an actual increase in counter values within a time interval and handles counter reset cases

anantha sai ram

Oct 18, 2022, 1:53:47 AM
to Prometheus Users

Is there a function that returns the difference between counter sample values within a time interval, handling counter resets but without extrapolation?

We tried the increase() function, but it returns an extrapolated result, and we have observed that the gap between the actual increase and the extrapolated result is considerable.

Example:
Suppose we calculate the increase in the metric "node_disk_read_bytes_total" every 5 minutes, with the Prometheus scrape interval set to 1 minute:

  • Consider the following sample values for "node_disk_read_bytes_total" within one 5-minute interval, where (F) marks the first sample and (L) the last:
    [23758450955264 (F), 23758499419136, 23758518625280, 23758519292928, 23758519870464 (L)]

Result of the increase function:

  • Extrapolated value returned by increase function over 5 mins: 86144000
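
The figure above is consistent with increase() scaling the observed first-to-last difference from the span actually covered by samples up to the full range. A minimal Python sketch of that arithmetic, assuming the five samples are evenly spaced one minute apart (so they cover 4 minutes of the 5-minute range) and that extrapolation reaches both range boundaries; the variable names are illustrative only:

```python
# Samples from the example above, assumed to be scraped 60 s apart.
samples = [
    23758450955264,  # first (F)
    23758499419136,
    23758518625280,
    23758519292928,
    23758519870464,  # last (L)
]
scrape_interval_s = 60
range_s = 5 * 60

# Raw first-to-last difference (there is no counter reset in this window).
raw_increase = samples[-1] - samples[0]

# Five samples cover only four scrape intervals.
sampled_span_s = (len(samples) - 1) * scrape_interval_s

# Scale the raw difference up to the full range (simplified extrapolation).
extrapolated = raw_increase * range_s / sampled_span_s

print(raw_increase)  # 68915200
print(extrapolated)  # 86144000.0
```

Under these assumptions the extrapolated value is exactly 5/4 of the raw difference, which matches the two numbers in this example.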

Our requirement is a function which:

  • Handles counter resets the way the increase function does
  • Returns the actual difference between the values of the first sample (F) and the last sample (L) within the specified interval: 68915200

Thanks!

Brian Candler

Oct 18, 2022, 5:21:10 AM
to Prometheus Users
The closest would be something like this:

    node_disk_read_bytes_total - node_disk_read_bytes_total offset 5m

The difference is that "node_disk_read_bytes_total offset 5m" uses the most recent data point *at or before* 5 minutes ago (by default looking back up to a further 5 minutes, i.e. anywhere between 5 and 10 minutes ago), rather than the first data point *within* the 5-minute range.  (Oddly, there is last_over_time() but no first_over_time() for a range vector.)
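
To make that distinction concrete, here is a small Python sketch that mimics the two selection rules on made-up data. It is a simplification, not Prometheus code: the sample values, the evaluation time and the helper names are all hypothetical, and boundary handling is treated as inclusive for simplicity.

```python
LOOKBACK_S = 300  # default lookback mentioned above (5 minutes)

def instant_value(samples, t, lookback=LOOKBACK_S):
    """Most recent (timestamp, value) at or before t, within the lookback window."""
    candidates = [s for s in samples if t - lookback <= s[0] <= t]
    return max(candidates, key=lambda s: s[0]) if candidates else None

def first_in_range(samples, t, range_s):
    """Earliest (timestamp, value) inside [t - range_s, t]."""
    candidates = [s for s in samples if t - range_s <= s[0] <= t]
    return min(candidates, key=lambda s: s[0]) if candidates else None

# Hypothetical scrapes every 60 s: (timestamp in seconds, counter value).
samples = [(30, 100), (90, 130), (150, 170), (210, 180), (270, 200), (330, 260)]
t = 340  # evaluation time

now = instant_value(samples, t)              # (330, 260)
offset_5m = instant_value(samples, t - 300)  # (30, 100): older than the 5m range
first_5m = first_in_range(samples, t, 300)   # (90, 130): first sample in the range

print(now[1] - offset_5m[1])  # 160, spans 300 s of samples
print(now[1] - first_5m[1])   # 130, spans 240 s of samples
```

With these numbers the offset-based difference spans a full five scrape intervals, while the first-to-last difference inside the range spans only four.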

You can avoid returning any result when there's a counter reset.  This is the cheap and approximate way:

    (node_disk_read_bytes_total - node_disk_read_bytes_total offset 5m) >= 0

A more robust expression:

    (node_disk_read_bytes_total - node_disk_read_bytes_total offset 5m) and resets(node_disk_read_bytes_total[5m]) == 0

Or combine both, for belt-and-braces.

    (node_disk_read_bytes_total - node_disk_read_bytes_total offset 5m) >= 0 and resets(node_disk_read_bytes_total[5m]) == 0
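
Why the ">= 0" filter is only approximate can be seen in a short Python sketch. It assumes a made-up window in which the counter resets and then climbs past its earlier value, and it treats the first value as the "offset" sample; count_resets is a hypothetical helper that only imitates the idea behind resets().

```python
def count_resets(values):
    """Count drops between consecutive samples, which is what a reset looks like."""
    return sum(1 for prev, cur in zip(values, values[1:]) if cur < prev)

# Hypothetical window: the counter resets after 120 but ends above where it started.
values = [100, 120, 5, 150]

difference = values[-1] - values[0]
passes_sign_check = difference >= 0              # the ">= 0" filter
passes_resets_check = count_resets(values) == 0  # the "resets(...) == 0" filter

print(difference)           # 50
print(passes_sign_check)    # True: the reset goes unnoticed
print(passes_resets_check)  # False: the reset is detected
```

In this case the sign check alone would return 50 even though the true increase is unknown, whereas the resets-based check suppresses the result.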

> Is there a function that returns the difference between counter sample values within a time interval, handling counter resets but without extrapolation?


No, and it's unclear what it would do anyway.  Suppose you had the following points in the time range:

    [   10,  25,  40,   5,  15  ]
       @t0  @t1  @t2  @t3  @t4


What result would you expect it to return? Obviously there was a counter reset between 40 and 5, but you have no idea how much the counter changed across the reset (e.g. it might have gone from 40 to 45 before resetting).  This invalidates the total sum over the range. Ignoring the reset interval (i.e. adding nothing for it) would give (40-10) + (15-5) = 40, which would imply this is how much the counter increased, when in fact it almost certainly increased by more.  Even if you assume that the counter reset all the way to zero, (40-10) + (15-0) = 45 may still be lower than the truth.  Either way, the returned value says that the counter increased by X, when in fact X is just a lower bound for how much the counter increased in that period.
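
The two candidate totals above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic; lower_bounds is a hypothetical helper, and any drop between consecutive samples is assumed to be a counter reset.

```python
def lower_bounds(values):
    """Return (ignore_gap, assume_zero) increases for a series with possible resets."""
    ignore_gap = 0   # skip the interval in which the reset happened
    assume_zero = 0  # treat the counter as restarting from 0 at the reset
    for prev, cur in zip(values, values[1:]):
        if cur >= prev:
            ignore_gap += cur - prev
            assume_zero += cur - prev
        else:
            # Growth from 0 up to the first sample seen after the reset.
            assume_zero += cur
    return ignore_gap, assume_zero

print(lower_bounds([10, 25, 40, 5, 15]))  # (40, 45)
```

Neither number accounts for whatever was counted between the last sample before the reset and the reset itself, which is why both are lower bounds.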

I believe that the rate() and increase() functions treat every drop in value as a reset to zero, so the known increase here is (40-10) + (15-0) = 45.  That figure is then extrapolated to cover the whole range, and rate() divides it by the range duration.  You're still assuming that nothing was counted between the last sample before the reset and the reset itself, but that's the best you can do.
