Add all values into a single value

Gergely Brautigam

Aug 20, 2020, 9:40:33 AM
to Prometheus Users
Hello everyone.

I was wondering if I could get some help on a query I'm building.

I have a dashboard that shows server failure restarts over the period set by Grafana's time range selector.

It's a counter, so it shows something like this:

count
  3 |
  2 |            2
  1 |  1    1         1
    +---------------------
       1    2    3    4   days

That is the number of restarts on each day. Is it possible to get a single number that sums these values, for example 5 (1 + 1 + 2 + 1)?

I would like to display that in a Grafana stat panel as a single number, to see something like: in total there were 5 restarts in the past 4 days.

Another difficulty is that this needs to be aggregated without instance and pod (the environment is Kubernetes), to remove duplicate reports. I think. :)

Thanks for ANY advice on this. I know that PromQL is a sliding query engine, so what I'm asking might not be possible at all. In that case, please tell me and I'll look for another solution.

Thanks!
Gergely.

Brian Candler

Aug 20, 2020, 9:56:11 AM
to Prometheus Users
Try sum_over_time(metric[4d])

If you are getting separate values per instance or pod, then sum() over all the timeseries.
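
A minimal sketch of how these two steps might combine, assuming a metric named server_restarts with a result label, and assuming the duplicate-producing labels are named instance and pod (the label names are assumptions, not confirmed here):

```promql
# Add up every sample in the last 4 days, per series, then
# collapse the per-instance and per-pod series into one.
sum without (instance, pod) (
  sum_over_time(server_restarts{result="failed"}[4d])
)

# If the metric is a cumulative counter, sum_over_time() would add up
# the running total at every scrape; increase() measures growth instead.
sum without (instance, pod) (
  increase(server_restarts{result="failed"}[4d])
)
```

Note that increase() extrapolates to the edges of the window, so it can return non-integer values.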

Gergely Brautigam

Aug 20, 2020, 10:03:45 AM
to Prometheus Users


On Thursday, August 20, 2020 at 3:56:11 PM UTC+2 b.ca...@pobox.com wrote:
> Try sum_over_time(metric[4d])

Hi!

Yep, tried that. Something like this:

sum(sum(sum_over_time(server_restarts{result="failed"}[1h])) without (instance))

But unfortunately this doesn't give an accurate number at all. If I select the last 7 days in Grafana while the range here is 1h, the result is badly off. To be honest, I don't fully understand how the range in PromQL relates to the Grafana time range. My understanding is that the `[1h]` range works like a sample rate. I could change it to 7d, but that makes things even worse. :D

As far as I can tell, I would need a variable that ALSO has to be adjusted whenever I change the time frame in Grafana. I guess that could work.

Ben Kochie

Aug 20, 2020, 11:08:41 AM
to Gergely Brautigam, Prometheus Users
Usually what I recommend is [$__interval] with the "min step" set to 1h. This makes sum_over_time() work the way you would expect: the range of each point in the chart matches the width of the step.
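
As a rough sketch, assuming the server_restarts{result="failed"} metric from earlier in the thread and assuming label names instance and pod, the query might look like the first one below. The second variant is an assumption beyond the suggestion above: it uses Grafana's $__range variable, which expands to the full dashboard time range, to produce one total instead of one point per step.

```promql
# One point per chart step: the window width follows Grafana's step.
sum without (instance, pod) (
  sum_over_time(server_restarts{result="failed"}[$__interval])
)

# One total for the whole dashboard range (run as an instant query).
sum without (instance, pod) (
  sum_over_time(server_restarts{result="failed"}[$__range])
)
```

With the second form, changing the time range in Grafana changes the window automatically, so no manually maintained variable is needed.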

--
You received this message because you are subscribed to the Google Groups "Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to prometheus-use...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-users/32608f79-363f-4be1-9eb2-513fc5226e6bn%40googlegroups.com.

Gergely Brautigam

Aug 20, 2020, 11:18:06 AM
to Prometheus Users
Nope... That did not work as expected. :( In the last 3 hours I had 4 restarts; using $__interval displayed a whole bunch of nonsense, like 3.4 or 1.1. :/
This happens especially when selecting a range in the graph. I think the metric is correct, in that I can verify there was a server restart failure at the time the metric says 1.
But it's flimsy, because it's coming from multiple instances. I tried using without (instance), but that isn't working with sum_over_time.

Julien Pivotto

Aug 20, 2020, 11:21:14 AM
to Gergely Brautigam, Prometheus Users
changes(process_start_time_seconds[1h]) is usually how you do this.
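
A sketch of how that might be summed across instances, assuming the server processes expose process_start_time_seconds (a standard metric in the official client libraries); job="server" is a hypothetical selector used only for illustration:

```promql
# changes() counts how often the start timestamp changed in the window,
# which is one change per process restart on a given series.
# job="server" is a hypothetical selector; adjust to the real labels.
sum(changes(process_start_time_seconds{job="server"}[1h]))
```

This counts every restart rather than only failed ones, and a process that comes back under a new pod name appears as a new series rather than as a change.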

--
Julien Pivotto
@roidelapluie

Gergely Brautigam

Aug 20, 2020, 11:31:49 AM
to Prometheus Users
On Thursday, August 20, 2020 at 5:21:14 PM UTC+2 Julien Pivotto wrote:
> changes(process_start_time_seconds[1h]) is usually how you do this.

This actually might work. :) 