my counters start at zero

Mike Spreitzer

Jul 29, 2020, 12:25:04 AM
to Prometheus Users
Suppose I have a counter metric; let's name it `foo`.  Suppose foo first shows up with a value of 0 in a scrape at time t0, shows up with a value of 10 in a scrape at time t0+10s, and has value 10 in all subsequent scrapes.  What will the PromQL expression `rate(foo[60s])` get me?  I suppose nothing until time t0+60s; some non-zero value from t0+60s to t0+70s; and zero from t0+70s onward.  Is that right?  If not, what will I get?

Now suppose instead that foo first shows up in a scrape at time t0 with a value of 10, and in every scrape after that the value of foo is also 10.  What will `rate(foo[60s])` give me?  If I understand correctly, it will give me nothing until time t0+60s, and from then on it will give me zero.  Have I got this right?  That is a rather disappointing answer.  This counter really did start at zero, and got 10 increments before the first scrape.  It would be gratifying to have a PromQL query that shows this blip of activity.  Can I write a different PromQL query that will get this result?  While retaining all the other smarts of `rate`?

Thanks,
Mike

Brian Candler

Jul 29, 2020, 3:27:03 AM
to Prometheus Users
rate() calculates the rate between the first and last available samples in the given time window, as long as there are at least two samples.

irate() calculates the rate between the last two samples in the given time window.
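A rough Python model of the two functions as described above may help. It is a sketch, not Prometheus's implementation: it assumes the samples have already been selected from the range window as (timestamp, value) pairs, uses hypothetical helper names (`counter_increase`, `simple_rate`, `simple_irate`), and leaves out the extrapolation towards the window edges that the real rate() performs. The real rate() also divides by the window length rather than by the span between samples, so its numbers will differ somewhat. Like rate(), the model treats a drop in value as a counter reset.

```python
from typing import List, Optional, Tuple

# A sample is (timestamp in seconds, counter value).
Sample = Tuple[float, float]


def counter_increase(samples: List[Sample]) -> float:
    """Sum of increases between consecutive samples; a drop is treated as a counter reset."""
    total = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # After a reset the counter restarted from zero, so the whole new value counts.
        total += cur - prev if cur >= prev else cur
    return total


def simple_rate(samples: List[Sample]) -> Optional[float]:
    """Increase between the first and last sample divided by the time between them.

    Simplified model: no extrapolation to the window edges, and the divisor is
    the span between samples rather than the window length.
    """
    if len(samples) < 2:
        return None  # fewer than two samples: no result, as with rate()
    span = samples[-1][0] - samples[0][0]
    return counter_increase(samples) / span


def simple_irate(samples: List[Sample]) -> Optional[float]:
    """Rate between the last two samples only."""
    return simple_rate(samples[-2:])


if __name__ == "__main__":
    window = [(0.0, 0.0), (10.0, 10.0), (20.0, 10.0)]
    print(simple_rate(window))   # 10 / 20 s = 0.5
    print(simple_irate(window))  # 0 / 10 s = 0.0
```

The difference between the two shows up as soon as the counter stops moving: rate() still averages over the earlier increase, while irate() only sees the last, flat, step.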

On Wednesday, 29 July 2020 05:25:04 UTC+1, Mike Spreitzer wrote:
> Now suppose instead that foo first shows up in a scrape at time t0 with a value of 10, and in every scrape after that the value of foo is also 10.  What will `rate(foo[60s])` give me?  If I understand correctly, it will give me nothing until time t0+60s, and from then on it will give me zero.  Have I got this right?

It will show a rate of 0 as soon as two values are available, that is, from t0+10s onwards.
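Both of Mike's scenarios can be replayed against the same simplified model to see this answer play out. This sketch assumes a scrape every 10 s starting at t0 = 0 and a 60 s window containing the samples with timestamps in (t - 60, t]; that boundary choice, the helper names `rate_at` and `scrapes`, and the omission of extrapolation are all assumptions, so the exact numbers are not what Prometheus would return. The pattern is the point: a result exists as soon as two samples are in the window, and it is zero once the window only holds equal values.

```python
from typing import List, Optional, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, counter value)


def rate_at(series: List[Sample], t: float, window: float = 60.0) -> Optional[float]:
    """Simplified rate(): first/last samples in (t - window, t], no extrapolation."""
    picked = [(ts, v) for ts, v in series if t - window < ts <= t]
    if len(picked) < 2:
        return None
    increase = 0.0
    for (_, prev), (_, cur) in zip(picked, picked[1:]):
        increase += cur - prev if cur >= prev else cur  # a drop means a counter reset
    return increase / (picked[-1][0] - picked[0][0])


def scrapes(values: List[float], interval: float = 10.0) -> List[Sample]:
    """Turn a list of scraped values into samples at t = 0, 10, 20, ..."""
    return [(i * interval, v) for i, v in enumerate(values)]


# Scenario 1: the first scrape sees 0, every later scrape sees 10.
first = scrapes([0.0] + [10.0] * 12)
# Scenario 2: the very first scrape already sees 10.
second = scrapes([10.0] * 13)

if __name__ == "__main__":
    for t in (0, 10, 30, 50, 60, 70):
        print(t, rate_at(first, t), rate_at(second, t))
```

In this model scenario 1 yields 1.0 at t = 10 s, falling to 0.2 by t = 50 s and to 0 once the initial zero sample leaves the window; scenario 2 yields nothing at t = 0 and 0 from t = 10 s onwards.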

If a new counter appears with value 10, it tells you nothing about the rate just before the counter appeared.  It may be that scraping was broken, and the counter had value 10 for the last year.  It could be that the counter had been going 1-2-3-4-5-6-7-8-9-10 at intervals of 10 seconds.  Or at intervals of 1 week.

As a real-world example, it is very common to start polling an SNMP device and find its interface byte counters already at huge values, reflecting how much traffic has been carried in total by that interface since the device was powered on.  It would be completely wrong to have an enormous blip which effectively compresses months or years of traffic into one sample interval.

Mike Spreitzer

Jul 31, 2020, 2:38:21 AM
to Brian Candler, Prometheus Users
I have a specific scenario. I have counters that start at zero when the scraped process starts; they are counting something that happens in the scraped process. If a counter first appears with a non-zero value, I know all those counts happened since the previous scrape. I am not asserting that `rate()` should be changed for everybody. Is there a PromQL query I can write that will behave similarly to `rate()` but will recognize that an initial non-zero count is due to increments since the previous scrape of the same process (yes, restricted to the situations where the process has been scraped before)?

Thanks,
Mike

Brian Candler

Jul 31, 2020, 3:05:34 AM
to Prometheus Users
On Friday, 31 July 2020 07:38:21 UTC+1, Mike Spreitzer wrote:
> Is there a PromQL query I can write that will behave similarly to `rate()` but will recognize that an initial non-zero count is due to increments since the previous scrape of the same process (yes, restricted to the situations where the process has been scraped before)?

rate(foo[60s]) or min_over_time(foo[5m]) / 10

There are a couple of fundamental issues:

- Prometheus only looks back 5 minutes to find a previous value of a timeseries.  You can't distinguish between "this counter has just appeared" and "this counter went away for >5 mins and came back".

- how to assign a timestamp to the zero value.  As I said before: rate() calculates the rate between the first and last available samples in the given time window.  If there are two values, it takes the difference between the values and divides by the difference between the timestamps.  A query like rate(foo[60s]) gives no hint that the data points are being scraped at nominally 10-second intervals.  That's why I have to hard-code "10" in the query above.

But it means the initial rate will almost certainly be wrong.  Consider for example that the process starts with value 0 at time (t-2s) and the value of the counter is 10 at scrape time (t).  The rate will be calculated as if the process started at time (t-10s), so will be 1/5th of the correct value.

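More seriously: if the scrape fails for 5 minutes, and then comes back, you will get a stupidly high spike, because the fallback then divides the counter's entire accumulated value by the hard-coded 10 seconds.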
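The fallback query `rate(foo[60s]) or min_over_time(foo[5m]) / 10` can be modelled in a few lines to check both problems just described. This is a hedged sketch: it uses the same simplified rate() as a stand-in (no extrapolation), assumes the nominal 10 s scrape interval that the hard-coded 10 represents, models `or` for a single series as "use the rate if there is one, otherwise the fallback", and the names `rate_or_fallback` and `ASSUMED_SCRAPE_INTERVAL` are hypothetical. For a process that starts 2 s before a scrape that sees 10, the true rate is 10 / 2 = 5 per second, but the fallback reports 10 / 10 = 1 per second, one fifth of that.

```python
from typing import List, Optional, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, counter value)

ASSUMED_SCRAPE_INTERVAL = 10.0  # the hard-coded "10" in the query


def simple_rate(picked: List[Sample]) -> Optional[float]:
    """Simplified rate(): increase between first and last sample over their time span."""
    if len(picked) < 2:
        return None
    increase = 0.0
    for (_, prev), (_, cur) in zip(picked, picked[1:]):
        increase += cur - prev if cur >= prev else cur  # a drop means a counter reset
    return increase / (picked[-1][0] - picked[0][0])


def rate_or_fallback(series: List[Sample], t: float) -> Optional[float]:
    """Model of: rate(foo[60s]) or min_over_time(foo[5m]) / 10, for one series."""
    in_60s = [s for s in series if t - 60 < s[0] <= t]
    rate = simple_rate(in_60s)
    if rate is not None:
        return rate
    in_5m = [v for ts, v in series if t - 300 < ts <= t]
    if not in_5m:
        return None
    return min(in_5m) / ASSUMED_SCRAPE_INTERVAL


if __name__ == "__main__":
    # Process started at t = 98 s with the counter at 0; the first scrape at t = 100 s sees 10.
    fresh = [(100.0, 10.0)]
    print(rate_or_fallback(fresh, 100.0))  # 1.0, although the true rate was 10 / 2 s = 5.0

    # A long-running counter whose scrapes failed for more than 5 minutes, then recovered.
    gap = [(0.0, 50_000.0), (400.0, 50_100.0)]
    print(rate_or_fallback(gap, 400.0))  # 5010.0: the whole accumulated count lands in one 10 s step
```

The second case is the spike: over the 400 s between the two samples the counter only grew by 100, but the fallback has no way to tell that from a fresh process.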

Brian Candler

Jul 31, 2020, 8:00:44 AM
to Prometheus Users
If you want to take this to extremes, you could export a metric which is the absolute timestamp when your process started.  As long as the clocks on your processing node and the Prometheus server are properly synced, you could then use something like

rate(foo[60s]) or (foo / (timestamp(foo) - process_start_time_seconds))

possibly with some label matching ('on' or 'ignoring', etc.) if the label set of process_start_time_seconds is not the same as the label set of foo.

In the above, the LHS is the rate over 60 seconds, and the RHS is the average rate since the process started, which only needs a single data point.
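The right-hand side is simply the counter divided by the process's age at scrape time. A minimal sketch of that arithmetic, assuming the start time is exported in Unix seconds (as process_start_time_seconds is) and that the clocks agree; the function names are hypothetical, and `or` is again modelled for a single series as "use the 60 s rate when it exists". Applied to the earlier example of a process that started 2 s before a scrape seeing 10, it gives the correct 5 per second, where the fixed-interval fallback gave 1.

```python
from typing import Optional


def average_rate_since_start(
    counter_value: float, sample_timestamp: float, process_start_time: float
) -> Optional[float]:
    """Model of: foo / (timestamp(foo) - process_start_time_seconds).

    Both times are Unix seconds; this assumes the target's clock and the
    Prometheus server's clock agree.
    """
    age = sample_timestamp - process_start_time
    if age <= 0:
        return None  # clock skew or bad data: no meaningful average
    return counter_value / age


def rate_or_average(
    rate_60s: Optional[float],
    counter_value: float,
    sample_timestamp: float,
    process_start_time: float,
) -> Optional[float]:
    """Model of the `or`: prefer the 60 s rate, else the average since start."""
    if rate_60s is not None:
        return rate_60s
    return average_rate_since_start(counter_value, sample_timestamp, process_start_time)


if __name__ == "__main__":
    start = 1_596_000_000.0  # hypothetical process start time (Unix seconds)
    print(rate_or_average(None, 10.0, start + 2.0, start))  # 5.0: only one sample so far
    print(rate_or_average(0.5, 10.0, start + 20.0, start))  # 0.5: the rate() result wins
```

Unlike the fixed-interval fallback, this does not need to know the scrape interval, but it inherits any clock skew between the target and the Prometheus server directly into the first value.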