Prometheus / Grafana - How to best display counters in a Singlestat panel?


mru...@gmail.com

Jul 19, 2017, 3:17:40 PM
to Prometheus Users
Prometheus: v1.5.0
Grafana: v4.2.0

We're collecting a counter metric for the number of calls that each endpoint has. This is scraped every 60s.

One common request I'm receiving from some of our teams is whether we can display the total number of calls over a specified period of time in Grafana through a Singlestat panel.

Currently, I have the Grafana query set up as follows. We take the increase of the counter and then sum it across all instances of that endpoint. We're using the $__interval variable provided by Grafana to handle the varying intervals. However, the numbers we receive from these queries don't appear to be correct, and I'm fairly sure this is due to how the interval and the resolution/step interact.
  • sum(increase(app_callscount{endpoint="/services/exampleService"}[$__interval]))
Below I'll show the queries being performed by Grafana and then the results. The results are the summed total of the returned values matrix. This is done by Grafana using "total" under Options -> Value -> Stat field.
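
One possible explanation for the mismatch, sketched in Python under simplifying assumptions: if the query step Grafana uses differs from the $__interval window, the per-point increases either overlap or leave gaps, so their total drifts away from the true count. The function name, the constant-rate counter, and the numbers are illustrative assumptions rather than values from the dashboards in this post, and Prometheus extrapolation is ignored.

```python
def total_of_points(rate_per_s, range_s, step_s, window_s):
    """Total of increase(counter[window_s]) evaluated once every step_s seconds.

    Models a counter growing at a constant rate, so each evaluation
    returns rate_per_s * window_s. Ignores Prometheus extrapolation.
    """
    points = range_s // step_s  # number of evaluations in the time range
    return points * rate_per_s * window_s


true_total = 2 * 3600  # 2 calls/s over one hour

matched = total_of_points(2, 3600, step_s=60, window_s=60)   # window == step
gaps = total_of_points(2, 3600, step_s=120, window_s=60)     # window < step
overlap = total_of_points(2, 3600, step_s=30, window_s=60)   # window > step

print(true_total, matched, gaps, overlap)
```

In this toy model the "total" stat is only right when the window and the step are equal; a step twice the window halves the result, and a step half the window doubles it.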

[Screenshots of the queries and results for four time ranges: 30 Min Period, Last 7 Days, Last 60 Days, This Year]
Should we let Grafana auto-adjust the interval based on the selected time range rather than statically providing it a value? 

As you can see above, the range "This Year" provides a result nearly half of what is provided by "Last 60 Days" which doesn't make sense when people are viewing the dashboard.

Has anyone had experience with this and can provide recommendations or best practices?

Regards,
Matt

mru...@gmail.com

Jul 27, 2017, 1:52:40 PM
to Prometheus Users, mru...@gmail.com
To follow-up on this, the solution we've implemented for the past week is: 
  • sum(app_callscount{endpoint="/services/exampleService"}) - to get the total calls of all instances of the endpoint
  • Let Grafana handle the "increase" calculation by using "Stat: delta" under the Options -> Value tab.
So far, we've had more confidence in this approach as the results have been accurate.

Matthias Rampke

Jul 28, 2017, 3:26:19 AM
to mru...@gmail.com, Prometheus Users

Be careful with sum(something_total) – this is only valid if all instances reset their counters (restart) at the same time.

The best way would be

sum(increase(app_callscount{endpoint="/services/exampleService"}[$duration]))

and using the current value. Not sure how to do that in Grafana though, this doesn't seem to be available as a variable. Maybe you can trick Grafana into using exactly one interval for the ministat so that $__interval == the duration you are interested in?
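
To illustrate the caveat about sum(something_total): a minimal Python sketch with made-up sample values for two instances, one of which restarts partway through. The `increase` helper is a simplified, reset-aware stand-in for the PromQL function (no extrapolation), not Prometheus's actual implementation.

```python
def increase(samples):
    """Reset-aware increase over a list of raw counter samples."""
    total = 0
    for prev, cur in zip(samples, samples[1:]):
        # A drop means the counter restarted from zero, so the whole
        # current value counts as new increase.
        total += cur - prev if cur >= prev else cur
    return total


instance_a = [100, 110, 120, 130, 140]
instance_b = [500, 510, 5, 15, 25]  # restarted between the 2nd and 3rd sample

# sum(counter): add the raw counters first, then look at the change.
summed = [a + b for a, b in zip(instance_a, instance_b)]
delta_of_sum = summed[-1] - summed[0]
increase_of_sum = increase(summed)

# sum(increase(counter)): handle resets per series, then add.
sum_of_increases = increase(instance_a) + increase(instance_b)

print(delta_of_sum, increase_of_sum, sum_of_increases)
```

Only the last figure reflects the calls that actually happened. Once the series are summed, a single instance's restart shows up as a partial drop in the total, which can no longer be corrected: a plain delta goes negative, and reset detection on the sum overcounts.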

/MR


--
You received this message because you are subscribed to the Google Groups "Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to prometheus-use...@googlegroups.com.
To post to this group, send email to promethe...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-users/9d401c47-74a4-4130-83a5-40f510f8e7d3%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

mru...@gmail.com

Aug 1, 2017, 8:30:25 AM
to Prometheus Users, mru...@gmail.com
I agree with you. That was the path we initially took. It seems that the $__interval variable passed by Grafana doesn't exactly match up to the actual duration of the selected time range which makes the results inaccurate.

Unfortunately, this was a compromise that we had to make. Our use cases require having the ability for users to choose varying time ranges.

pavel...@gmail.com

Jan 12, 2018, 11:03:55 AM
to Prometheus Users
Hi, did you find a solution for that? We've run into the same problem and don't know what to do.

Ben Kochie

Jan 12, 2018, 12:34:11 PM
to pavel...@gmail.com, Prometheus Users
I don't remember the exact timing, but the $__interval values and behavior in Grafana were improved in the last couple of releases.


wangchao...@gmail.com

Sep 19, 2018, 8:19:14 AM
to Prometheus Users
Why is sum(something_total) only valid if all instances reset their counters at the same time?

I just did some tests on Grafana 5.2.1 using "sum(increase(app_callscount{endpoint="/services/exampleService"}[$__interval]))", and found the result with time range "today so far" is bigger than the result with "today", and the result with "this week so far" is bigger than "This week". It's really confusing.

On Friday, July 28, 2017 at 3:26:19 PM UTC+8, Matthias Rampke wrote:

fran...@gmail.com

Jan 31, 2019, 10:50:23 PM
to Prometheus Users
Range functions such as increase() extrapolate the samples to cover the full time window. This gave us issues when trying to get or show exact values.
For example:
1. An app is polled every 60 seconds for metrics
2. We want to use an increase function over a 2 minute span
3. The samples inside that span show an increase of 10, but they only cover a 60 second period
4. That leaves 60 seconds without data, which Prometheus extrapolates (makes a best guess)
5. The result is an estimated value of 20 over the 2 minutes

NOTE: Don’t use range vector functions if needing exact values
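
The arithmetic in that example can be reproduced with a simplified Python sketch modeled on how Prometheus extrapolates increase(). It assumes all samples lie inside the window and contain no counter resets, and it leaves out the extra limit Prometheus applies so a counter is never extrapolated below zero. The function name and the sample values are illustrative.

```python
def extrapolated_increase(samples, window_start, window_end):
    """samples: sorted (timestamp_s, value) pairs inside the window."""
    if len(samples) < 2:
        return None  # a single sample gives no increase at all
    first_t, first_v = samples[0]
    last_t, last_v = samples[-1]
    raw = last_v - first_v                  # increase actually observed
    sampled = last_t - first_t              # time the samples cover
    avg_interval = sampled / (len(samples) - 1)
    threshold = avg_interval * 1.1

    # Extend to each window edge if the nearest sample is close enough,
    # otherwise extend by only half an average sample interval.
    extended = sampled
    for gap in (first_t - window_start, window_end - last_t):
        extended += gap if gap < threshold else avg_interval / 2
    return raw * extended / sampled


# Two scrapes 60 s apart inside a 120 s window, counter went up by 10.
result = extrapolated_increase([(30, 100), (90, 110)], 0, 120)
print(result)
```

The observed increase of 10 is scaled by 120/60, giving the estimated 20 described above.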

Costi Muraru

Jun 5, 2020, 10:03:43 AM
to Prometheus Users
Grafana introduced $__range for Prometheus. 
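
A small Python sketch of what that enables, reusing the query from earlier in the thread. The `render_range` helper is hypothetical and only mimics the substitution; it assumes Grafana replaces $__range with the selected dashboard range expressed in seconds.

```python
QUERY = 'sum(increase(app_callscount{endpoint="/services/exampleService"}[$__range]))'


def render_range(query, from_s, to_s):
    """Replace $__range with the dashboard range (assumed format: '<n>s')."""
    return query.replace("$__range", f"{to_s - from_s}s")


# A "Last 7 days" dashboard range, given as epoch-style seconds.
rendered = render_range(QUERY, 0, 7 * 24 * 3600)
print(rendered)
```

With the window tied to the whole selected range and the panel showing the current value, the query asks for a single increase over the period, which is the form suggested earlier in the thread.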
