Simple increase - sum of the metrics in time range with dynamic metric count

Michał Idzikowski

Sep 22, 2023, 9:32:19 AM
to Prometheus Users
Hello!
I'm fighting hard to get a correct result. The problem is that I need to sum the data processed by a service. The metrics are counters, and instances are sometimes replaced by newer ones, so most of the time they don't outlive the time range.

I've tried multiple combinations of increase, sum_over_time, sum(increase), and increase(sum), and I even tried VictoriaMetrics. Each time I got a result where the final sum drops in multiple places, and as you may imagine, there is no un-processing of the data.


[Attached images: metryki_acc1.jpg, metryki_acc2.jpg, metryki_acc.jpg]

Brian Candler

Sep 22, 2023, 9:58:07 AM
to Prometheus Users
You haven't shown any examples of the metrics, nor the queries relating to each of those graphs.

Speaking generally though, given that counters can reset, I think the best you can do is to use increase(foo[time]), which gives you an *estimate* of how much the counter has increased over the given time window. (It calculates the rate, compensating for counter resets, and scales it up to cover the whole window period, so it may give a non-integer result.) You should then be able to sum() over that: sum(increase(foo[time])).
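A rough Python model of the two ideas above, reset compensation and scaling up to the window, may make the "estimate" part concrete. The sample data is hypothetical, and Prometheus's real extrapolation is more involved (it limits how far it extrapolates toward the window edges), so treat this as a simplified sketch rather than the actual implementation.

```python
def reset_adjusted_delta(samples):
    """Total growth of one counter; samples is a time-sorted list of (timestamp_s, value)."""
    total = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        if cur < prev:
            # The counter went down: treat it as a reset to 0, so the new value is all new work.
            total += cur
        else:
            total += cur - prev
    return total


def approx_increase(samples, window_s):
    """Simplified increase(): scale the growth seen between first and last sample to the full window."""
    if len(samples) < 2:
        return None  # not enough samples to compute anything
    sampled_span = samples[-1][0] - samples[0][0]
    return reset_adjusted_delta(samples) * window_s / sampled_span


# Hypothetical scrapes every 15s inside a 60s window, with a reset between t=15 and t=30.
samples = [(0, 10), (15, 20), (30, 5), (45, 15)]
print(reset_adjusted_delta(samples))  # 10 + 5 + 10
print(approx_increase(samples, 60))   # scaled from 45s of data up to 60s, so not an integer
```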

Note that it will give you the amount of processing done in that specified time window, not since you started monitoring.

You said you tried sum(increase), and maybe it's one of the graphs you showed.  Suppose you made a graph of sum(increase(foo[24h])); then each point on that graph represents the amount of work done *in the 24 hours up to that point*. This value will of course go up and down, since the amount of work done in any given 1 day period may go up and down.
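To see why such a graph dips even though no work is ever undone, here is a small sketch with hypothetical per-step work amounts: each point is the total over a trailing window, like sum(increase(foo[24h])), and it falls whenever older, busier steps slide out of the window.

```python
def trailing_window_totals(work_per_step, window_steps):
    """For each step, total the work done in the last `window_steps` steps (inclusive)."""
    totals = []
    for i in range(len(work_per_step)):
        start = max(0, i - window_steps + 1)
        totals.append(sum(work_per_step[start:i + 1]))
    return totals


# Hypothetical work done per step; a window of 2 steps stands in for "24h".
print(trailing_window_totals([5, 5, 0, 0, 10], 2))  # goes up and down, never negative work
```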

You can't possibly know the total amount of work done since the beginning of time if the counters arbitrarily reset. You would have to create a persistent, non-resetting counter.

increase(sum) is wrong because it can't handle the counter resets properly: see https://www.robustperception.io/rate-then-sum-never-sum-then-rate (rate and increase are essentially the same function, except increase scales its output by the width of the window)
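A small sketch of why the order matters, using two hypothetical instances where instance B is replaced and its counter restarts from 0. Increasing each series first and then summing counts the real work; summing first produces a combined series whose drop (60 to 20) gets misread as a single reset, which gives the wrong total.

```python
def reset_adjusted_delta(values):
    """Growth of one counter series, treating any decrease as a reset to 0."""
    total = 0
    for prev, cur in zip(values, values[1:]):
        total += cur if cur < prev else cur - prev
    return total


instance_a = [0, 10, 20, 30]
instance_b = [0, 50, 0, 5]  # hypothetical: replaced before the third sample, counter restarts

# "rate then sum": handle resets per series, then add the results.
rate_then_sum = reset_adjusted_delta(instance_a) + reset_adjusted_delta(instance_b)

# "sum then rate": add the raw counters first, then try to handle resets.
summed = [a + b for a, b in zip(instance_a, instance_b)]
sum_then_rate = reset_adjusted_delta(summed)

print(rate_then_sum, sum_then_rate)  # the two disagree; only the first matches the work done
```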

sum_over_time is very wrong: it would add together all the values within the time window.
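A tiny illustration with hypothetical counter samples: summing the raw values of a counter inside the window has nothing to do with how much it grew.

```python
# Hypothetical scrapes of one counter within the window (no resets).
counter_samples = [100, 110, 120, 130]

sum_over_time_result = sum(counter_samples)               # what sum_over_time adds up
actual_increase = counter_samples[-1] - counter_samples[0]  # the work actually done

print(sum_over_time_result, actual_increase)
```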

Michał Idzikowski

Sep 24, 2023, 1:43:52 PM
to Prometheus Users
Graphs and their queries:
raw_data: processed_bytes
increase: increase(processed_bytes)
sum increase: sum(increase(processed_bytes))
sum increase range: sum(increase(processed_bytes[$__range]))

What I want to see is ~530G at the end, starting from 0. I still need to check manually, because this graph (sum increase range) may have the correct final value and just a misleading shape that falls a few times.
[Attached images: graphs1.jpg, graph2.jpg]

Aliaksandr Valialkin

Sep 26, 2023, 8:22:33 AM
to Michał Idzikowski, Prometheus Users


On Sun, Sep 24, 2023, 19:43 Michał Idzikowski <creati...@gmail.com> wrote:
What I want to see is ~530G at the end, starting from 0.

In VictoriaMetrics you can use the following MetricsQL query to build a graph of the total increase across multiple counter time series, starting from zero on the left side:

running_sum(sum(increase(metric_name)))

Note that you don't need to specify the lookbehind window in square brackets in increase(...), since VictoriaMetrics automatically sets it to the interval between points shown on the graph (aka the step query arg that Grafana automatically passes to /api/v1/query_range; see https://prometheus.io/docs/prometheus/latest/querying/api/#range-queries ).
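A rough Python model of what running_sum(sum(increase(...))) computes when the window equals the graph step: a reset-aware increase per step for each series, a sum across series, then a cumulative sum over time. The instance data is hypothetical, and this sketch assumes a series' first visible sample is only a baseline (it contributes nothing); VictoriaMetrics' actual handling of newly appearing series may differ.

```python
from itertools import accumulate
from typing import List, Optional

# One value per graph step; None means the instance does not exist at that step.
Series = List[Optional[float]]


def per_step_increase(series: Series) -> List[float]:
    """Reset-aware increase between consecutive steps of one counter series."""
    out = []
    prev = None
    for cur in series:
        if cur is None or prev is None:
            out.append(0.0)  # no previous sample to compare with: count nothing (assumption)
        elif cur < prev:
            out.append(cur)  # counter reset: the new value is all new work
        else:
            out.append(cur - prev)
        prev = cur
    return out


def running_total(all_series: List[Series]) -> List[float]:
    per_series = [per_step_increase(s) for s in all_series]
    summed = [sum(step) for step in zip(*per_series)]  # like sum(...) across instances
    return list(accumulate(summed))                    # like running_sum(...) over time


old_instance = [0, 100, 200, None, None]  # hypothetical instance that gets replaced
new_instance = [None, None, 50, 120, 300]  # hypothetical replacement
print(running_total([old_instance, new_instance]))  # starts at 0 and never decreases
```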


Colin Kelley

Oct 12, 2023, 5:00:54 AM
to Prometheus Users
Hello,

We have run into similar confusion and surprising results with Prometheus `rate` and `increase`, and have a proposal addressing them, which I just submitted here: https://github.com/prometheus/prometheus/issues/12967

We have been running a fork that implements that proposal for about 6 months with great results.

-Colin
