How to show delta between all data points in the last 1 minute? (Gauge)


tomeri

May 24, 2020, 10:05:36 AM
to Prometheus Users
Hi,

I run an application that exports metrics about the total number of lines of each file in my directory, basically the output of `wc -l`.
On each interval (every 1m) my app counts the total number of lines and then updates the Gauge metric.

For example:
First iteration: 1000 total lines
Second iteration: 1300 total lines
Third iteration: 1900 total lines
Fourth iteration: 2400 total lines
...

Prometheus scrapes my app's metrics every 15s/25s (depending on the env).
What I want to plot is a graph that shows the rate per minute: how many lines were produced in each file in the last 1 minute.

No matter what I tried, I couldn't make the graph show the correct results.

Julius Volz

May 24, 2020, 10:39:02 AM
to tomeri, Prometheus Users
Since it's a gauge (and at least theoretically line counts can decrease, not only increase), you'll want either the delta() (https://prometheus.io/docs/prometheus/2.18/querying/functions/#delta) function over a 1m range, or the deriv() (https://prometheus.io/docs/prometheus/2.18/querying/functions/#deriv) function multiplied by 60 (to get from per-second to per-minute). Note that only deriv() returns a per-second value; delta() already returns the change over the whole range, so it does not need the multiplication.

Note that both functions can give you non-integer results even if the line counts only change by integer increments/decrements, as delta() extrapolates the observed slope to the edges of the provided time window, and deriv() does a linear regression to estimate how fast a gauge is going up or down.

Another thing you could do (if you care about integer results) is:

    my_lines_total - my_lines_total offset 1m

...to give you the absolute difference between the last sample value seen 1m ago and the currently last-seen sample value. Note that while this returns you an integer result, it might be further away from the "true" rate due to the lack of extrapolation, because the two samples you will be comparing will not be exactly 1m apart.
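To make the difference between the two approaches above concrete, here is a rough Python sketch. It is a simplified model, not Prometheus's actual implementation: the real delta() applies extra limits on how far it extrapolates, and `offset` lookups are subject to staleness handling. The function names and the sample data (15s scrapes, with the gauge jumping from 1900 to 2400) are made up for illustration.

```python
from bisect import bisect_right

# (timestamp in seconds, gauge value); hypothetical scrapes every 15s.
SAMPLES = [(-10, 1900), (5, 1900), (20, 1900), (35, 2400), (50, 2400)]


def extrapolated_delta(samples, start, end):
    """Simplified delta(): scale the slope between the first and last
    sample inside [start, end] up to the full window length."""
    window = [(t, v) for t, v in samples if start <= t <= end]
    if len(window) < 2:
        return None  # not enough samples to compute a slope
    (t_first, v_first), (t_last, v_last) = window[0], window[-1]
    slope = (v_last - v_first) / (t_last - t_first)
    return slope * (end - start)


def value_at(samples, t):
    """Value of the most recent sample at or before time t."""
    timestamps = [ts for ts, _ in samples]
    i = bisect_right(timestamps, t)
    return samples[i - 1][1] if i else None


def offset_difference(samples, now, offset):
    """Simplified `metric - metric offset <offset>`."""
    return value_at(samples, now) - value_at(samples, now - offset)


# The samples in the window span only 45s, so the 500-line jump is
# scaled by 60/45 and is no longer an integer (roughly 666.7).
print(extrapolated_delta(SAMPLES, 0, 60))
# The plain difference stays an integer, but the two samples being
# compared are the ones at t=50 and t=-10, not exactly at t=60 and t=0.
print(offset_difference(SAMPLES, 60, 60))
```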


--
You received this message because you are subscribed to the Google Groups "Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to prometheus-use...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-users/4d971809-72e0-4f36-b0cf-0e9914a2d251%40googlegroups.com.


--
Julius Volz
PromLabs - promlabs.com

tomeri

May 24, 2020, 11:02:39 AM
to Prometheus Users
Hi Julius,

Thanks for the fast response.

Do you think Gauge is the right metric kind for this purpose?
How do I handle resets? I see drops with deriv/delta.

I tried both deriv and delta with a 1m time range multiplied by 60 and got wrong results. Here is my query:
sum by(datacenter)(delta(files_total_lines_gauge{datacenter="xxx"}[1m])) * 60 > 0

I use > 0 to work around the drops.

I'm basically trying to get the same result as if I were to use Graphite with nonNegativeDerivative.

Julius Volz

May 24, 2020, 1:42:12 PM
to tomeri, Prometheus Users
On Sun, May 24, 2020 at 5:02 PM tomeri <tom...@liveperson.com> wrote:
> Hi Julius,
>
> Thanks for the fast response.
>
> Do you think Gauge is the right metric kind for this purpose?

A gauge tracks anything that can naturally go up or down. The number of lines in a file can in principle go up or down, so normally that would be a gauge.

If you have files that are basically append-only (like log files), and you want to track the rate at which they grow, then it would be better to think of the metric conceptually not as "how many lines does this file have", but "how many lines have I added to this file" (can only go up, and you count lines as they are being added, not just proxying some third-party line count), which would be more fitting for a counter metric.
 
> How do I handle resets? I see drops with deriv/delta.

If it's actually a counter (see above) where drops truly indicate a reset, then you will need to use rate() or increase(). In your case "increase(my_lines_total[1m])". This will handle counter resets, but will still have the same extrapolation caveat as I mentioned earlier.
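As a rough illustration of what reset handling means here, the following Python sketch sums the increases between consecutive samples and treats any drop as a counter restart from zero. It is a simplified model that deliberately leaves out the extrapolation that rate() and increase() also perform; the function name and sample values are made up.

```python
def reset_aware_increase(values):
    """Total increase across consecutive counter samples.

    A sample lower than its predecessor is treated as a counter reset:
    the counter is assumed to have restarted from zero, so the new
    value is counted in full instead of producing a negative step.
    """
    total = 0
    for previous, current in zip(values, values[1:]):
        if current >= previous:
            total += current - previous
        else:
            total += current  # reset detected
    return total


# Hypothetical samples: the process restarts between 1300 and 200.
samples = [1000, 1300, 200, 700]
print(samples[-1] - samples[0])       # naive difference is negative
print(reset_aware_increase(samples))  # steps of 300, 200 and 500
```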
 
> I tried both deriv and delta with a 1m time range multiplied by 60 and got wrong results. Here is my query:
> sum by(datacenter)(delta(files_total_lines_gauge{datacenter="xxx"}[1m])) * 60 > 0
>
> I use > 0 to work around the drops.
>
> I'm basically trying to get the same result as if I were to use Graphite with nonNegativeDerivative.

The closest thing in PromQL is indeed "increase(foo[1m])", although it sounds like the Graphite function doesn't do the extrapolation bit.
 

Ben Kochie

May 24, 2020, 4:38:55 PM
to Julius Volz, tomeri, Prometheus Users
And for just counting log lines in files, there are tools like mtail that are very good at this.


tomeri

May 25, 2020, 7:48:17 AM
to Prometheus Users
You're right, I care about "how many lines have I added to this file", and yes, I actually want to check the growth rate of log files (append-only).
Let's say a counter is the best fit here. On the code side, how do I get the diff between the current total value and the previous sample? Do I need to hold it in my app's memory, or is it something that Prometheus can solve?

Ben Kochie

May 25, 2020, 8:14:28 AM
to tomeri, Prometheus Users
In your code, you just add deltas to a counter metric. This tracks the local known state of how many lines it's seen. Prometheus tracks this over time, allowing you to use rate() and increase() functions to calculate the deltas over time.
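A minimal sketch of "adding deltas to a counter", assuming the app is written in Python (the thread does not say which language it uses). The previous total per file is kept in the app's memory, and only the difference is passed to the counter. `LineDeltaTracker`, `fake_inc`, and the file name are hypothetical; `fake_inc` stands in for a real client library's counter so the example runs on its own.

```python
from typing import Callable, Dict


class LineDeltaTracker:
    """Turns periodic `wc -l` style totals into counter increments."""

    def __init__(self, inc: Callable[[str, int], None]) -> None:
        self._inc = inc  # callback that increments the real counter
        self._last_total: Dict[str, int] = {}

    def observe(self, path: str, total_lines: int) -> int:
        previous = self._last_total.get(path)
        self._last_total[path] = total_lines
        if previous is None:
            # First sighting: only record a baseline, so lines that
            # existed before startup do not show up as a spike.
            return 0
        if total_lines < previous:
            # File was truncated or rotated: count the new file's lines.
            delta = total_lines
        else:
            delta = total_lines - previous
        if delta > 0:
            self._inc(path, delta)
        return delta


# Stand-in for a client library counter with a per-file label.
counter_values: Dict[str, int] = {}


def fake_inc(path: str, amount: int) -> None:
    counter_values[path] = counter_values.get(path, 0) + amount


tracker = LineDeltaTracker(fake_inc)
# Totals from the first message in this thread, one per iteration.
deltas = [tracker.observe("app.log", t) for t in (1000, 1300, 1900, 2400)]
print(deltas, counter_values)
```

With the official Python client (`prometheus_client`), the callback could wrap something like `counter.labels(file=path).inc(amount)` on a `Counter` declared with a `file` label. Whether to count pre-existing lines on first sighting is a design choice; this sketch skips them.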

If you're looking to observe log files, I highly recommend mtail. It's a well tested and reliable option for producing metrics about and from log files.


tomeri

May 25, 2020, 8:20:18 AM
to Prometheus Users
Hi Ben,

I looked into mtail. It looks cool, but it probably doesn't fit my needs.
I run a pod (DaemonSet) in k8s that measures the log production rate of its peer pods (same host).
