Function for detecting a change in data


iono sphere

Oct 27, 2020, 11:14:43 AM
to Prometheus Users
Currently I have a metric that simply increases by 1 every once in a while. Let's call it Parcel. I am using the idelta function to see the difference over time, so that each change shows up on a graph as a spike. The reason is that I would like Grafana to alert me whenever Parcel changes.

On Grafana, this is what I got right now:


The yellow line is Parcel, and it correctly goes from 1 to 2 here, but the green line (a little hard to see, as it overlaps Grafana's red alert line at y = 0) stays at 0 and does not spike. The query used for the green line is idelta(parcel[20s]).

My prometheus.yml has the following scrape interval:
  scrape_interval: 5s
  evaluation_interval: 5s

So what am I doing wrong here?

Iono

Brian Candler

Oct 27, 2020, 11:39:53 AM
to Prometheus Users
(You posted a blank image!)

idelta(metric[period]) only gives the difference between the last two points in a range.  Suppose the time period covered has the values (0, 0, 1, 1).  The last two values are (1, 1) and idelta will give zero.

You say you're scraping data at 5 second intervals, and you're doing idelta(parcel[20s]).  That looks only at the last two points in a 20 second range, which holds about four samples.

Whether or not you see any blip on your graph depends on the graph resolution.  Say you zoom out in your graphing frontend, such that the X axis only shows one point every 10 seconds.  It's quite possible you'll skip over the increase. It might see:

(0, 0, 0, 0)   at time 0
(0, 0, 1, 1)   at time 10 seconds
(1, 1, 1, 1)   at time 20 seconds

and hence the idelta value for all three points will be zero.  You should use delta not idelta, with a time window big enough to cover at least two data points.
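To make the three windows above concrete, here is a minimal Python sketch. It is a simplified model, not Prometheus's implementation: the function names idelta_model and delta_model are made up for illustration, and delta_model ignores the extrapolation to the window edges that the real delta() performs.

```python
# Simplified models of idelta() and delta(); both names are hypothetical.
# The real delta() also extrapolates to the window edges, ignored here.

def idelta_model(window):
    """Difference between the last two samples in the window."""
    return window[-1] - window[-2]


def delta_model(window):
    """Difference between the last and the first sample in the window."""
    return window[-1] - window[0]


# The three windows a graph with a 10-second step might evaluate.
windows = [(0, 0, 0, 0), (0, 0, 1, 1), (1, 1, 1, 1)]

idelta_values = [idelta_model(w) for w in windows]
delta_values = [delta_model(w) for w in windows]

print(idelta_values)  # [0, 0, 0]
print(delta_values)   # [0, 1, 0]
```

In this model only the first-to-last difference picks up the change in the middle window, which is why the wider comparison is less likely to miss it.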

Note: since this is a counter, you should be using "increase", "irate" or "rate", not "(i)delta"

- "rate" calculates the per-second increase, between the first and last data points in the time window (but skipping ranges where the counter resets, i.e. goes down).  This is the normal one to use.
- "irate" gives the rate between the last two data points in the time window.  It suffers from the same problem you saw with idelta.
- "increase" is "rate" multiplied by the time window - instead of "units per second" you get "units per 5 minutes" or whatever

iono sphere

Oct 28, 2020, 10:07:20 AM
to Prometheus Users
(Blank image, indeed!)

I see. I have tried playing around with rate and increase some more. For a simple detection, I could make the app alert if the rate of Parcel is > 0, and I do get Grafana Alerts for that. Thank you very much.

One more thing, though: rate and increase give me the per-second increase and "units per x mins", which come out as decimal values < 1. Is it possible to get the exact change in value at each scraped point? For example, suppose I have data coming in like this every 5 seconds:

[ 10, 20, 10, 23, 12, 5, 1050, 1200, 1100, 1050, 1030, 20 ]

Is it possible to ask Prometheus to turn this above into this below exactly?

[ 0, 10, -10, 13, -11, -7, 1045, 150, -100, -50, -20, -1010 ]

This is so that I could alert when a newly scraped value has increased by exactly x.

Brian Candler

Oct 29, 2020, 6:43:14 AM
to Prometheus Users
You can do:

foo - foo offset 5s

to compare foo with the value it had 5 seconds ago.  Note that if the scrapes are not *exactly* 5 seconds apart, there's a small risk that this will cover 2 scrapes, or no scrapes.
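As a rough illustration of what that subtraction yields, here is a small Python model applied to the series from the question. It assumes the scrapes land exactly on a 5-second grid, which real scrapes will not do precisely; the variable names are invented for this sketch.

```python
# Model of `foo - foo offset 5s` evaluated once per scrape.
# Assumption: samples sit exactly on a 5-second grid (a simplification).
values = [10, 20, 10, 23, 12, 5, 1050, 1200, 1100, 1050, 1030, 20]
step = 5

# Map each timestamp (in seconds) to the value scraped at that time.
samples = {i * step: v for i, v in enumerate(values)}

diffs = []
for t in sorted(samples):
    previous = samples.get(t - step)
    if previous is None:
        # No sample 5 seconds earlier, so there is nothing to subtract.
        continue
    diffs.append(samples[t] - previous)

print(diffs)
# [10, -10, 13, -11, -7, 1045, 150, -100, -50, -20, -1010]
```

The first point produces no result rather than 0, because there is no earlier value to compare it with.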

rate() and increase() don't work this way because they are designed to calculate rates and to handle counter resets properly.

In your case, you have a scrape interval of 5s and are looking at a window of 20s.  The window will look like this:

[..10....15....20....23..]
    <--- 15 seconds -->

The time interval between the first and last data points is 15 seconds.  rate() uses the first and last data points, so it calculates the rate as (23-10)/15 = 0.867  (units per second)

increase() scales this to the whole 20 second period, so it calculates 0.867*20 = 17.33.  It doesn't actually see the values at the start and end of the window, but it assumes the rate stays the same, so it extrapolates the expected increase over that period.  As you can see, this can give a non-integer value.
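The arithmetic above can be checked with a few lines of Python. This is only a sketch of the calculation as described here: the timestamps are assumed to be exactly 5 seconds apart, and the real implementation applies further limits on how far it extrapolates, which this ignores.

```python
# Worked example of the rate/increase arithmetic described above.
# Assumption: four samples exactly 5 seconds apart in a 20-second window.
samples = [(0, 10), (5, 15), (10, 20), (15, 23)]  # (seconds, value)
window_seconds = 20

(first_t, first_v), (last_t, last_v) = samples[0], samples[-1]

# Per-second rate between the first and last samples: (23 - 10) / 15.
rate = (last_v - first_v) / (last_t - first_t)

# Scale the rate to the full window, as if it held steady throughout.
increase = rate * window_seconds

print(round(rate, 3), round(increase, 2))  # 0.867 17.33
```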

If the counter resets it's a bit more complicated:

[..10....15....10....23..]
   <--A--> <-B-> <-C->

We have no idea what happened in time period B; the counter reset.  It could have gone 15-->100-->0-->10 or 15-->16-->0-->10.  So that period is excluded.

This means you have an increase of 5 for period A, and an increase of 13 for period C, giving a total known counter increase of 18, over a total time period of 10 seconds, giving a rate of 1.8.
