integral() function proposal


cbe...@openai.com

Aug 2, 2018, 7:53:10 PM
to Prometheus Developers
Following up on Github #1335, I'd like to propose adding an integral() function to PromQL. This would be similar to sum_over_time(), except that points would not have equal weight, and instead the time-integral would be computed using the delta between points. Ideally, a max delta would be configurable to account for resets in a gauge metric. This would be similar to InfluxDB's integral() function (http://docs.influxdata.com/influxdb/v1.6/query_language/functions/#integral).

Use case: compute the integral of a gauge metric, such as kube_pod_container_resource_requests_cpu_cores, to understand the total resource usage of a Pod in Kubernetes.
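To make this concrete, here is a rough sketch of the computation I have in mind. This is plain Go rather than actual Prometheus internals, the trapezoidal weighting is just one possible interpretation, and the configurable max delta is left out:

package main

import "fmt"

// sample is one (timestamp, value) point from a gauge series.
type sample struct {
    t float64 // timestamp in seconds
    v float64 // gauge value
}

// integral returns the time-weighted area under the series: each pair of
// adjacent samples contributes the average of their two values multiplied
// by the time delta between them, unlike sum_over_time, which gives every
// point equal weight.
func integral(samples []sample) float64 {
    var total float64
    for i := 1; i < len(samples); i++ {
        dt := samples[i].t - samples[i-1].t
        total += (samples[i].v + samples[i-1].v) / 2 * dt
    }
    return total
}

func main() {
    // A gauge that sits at 2 for 30 seconds integrates to 2 * 30 = 60.
    fmt.Println(integral([]sample{{0, 2}, {10, 2}, {30, 2}})) // prints 60
}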

Brian Brazil

Aug 2, 2018, 8:02:04 PM
to cbe...@openai.com, Prometheus Developers
On 3 August 2018 at 00:53, <cbe...@openai.com> wrote:
Following up on Github #1335, I'd like to propose adding an integral() function to PromQL. This would be similar to sum_over_time(), except that points would not have equal weight, and instead the time-integral would be computed using the delta between points.

I'm unclear on what exactly you're proposing. Can you give an example of how this would work?
 
Ideally, a max delta would be configurable to account for resets in a gauge metric. This would be similar to InfluxDB's integral() function (http://docs.influxdata.com/influxdb/v1.6/query_language/functions/#integral).

Use case: compute the integral of a gauge metric, such as kube_pod_container_resource_requests_cpu_cores, to understand the total resource usage of a Pod in Kubernetes.


avg_over_time should cover that.


Christopher Berner

Aug 2, 2018, 8:05:17 PM
to Brian Brazil, Prometheus Developers
Yep, for example, suppose we have a metric with points:
t=0, value=2
t=10, value=2
t=30, value=2

integral() would result in 60

Unless I misunderstand how avg_over_time() works, it would return 2
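(Working it out: a gauge holding the value 2 across a 30-second span integrates to 2 * 30 = 60, while avg_over_time is just the unweighted mean of the three samples, (2 + 2 + 2) / 3 = 2.)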

Brian Brazil

Aug 3, 2018, 3:27:05 AM
to Christopher Berner, Prometheus Developers
On 3 August 2018 at 01:05, Christopher Berner <cbe...@openai.com> wrote:
Yep, for example, suppose we have a metric with points:
t=0, value=2
t=10, value=2
t=30, value=2

integral() would result in 60

Unless I misunderstand how avg_over_time() works, it would return 2

Yes, which you could multiply by the time period to get 60.

It also occurred to me that what you probably want here is a Counter, which is resilient to failed scrapes and is also not dependent on a high scrape frequency to catch all the peaks and dips in order to provide you with a good average.
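(For example, if something exported a counter of requested core-seconds, say a hypothetical pod_requested_core_seconds_total, you could simply query increase(pod_requested_core_seconds_total[24h]); the metric name here is made up for illustration.)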


 

On Thu, Aug 2, 2018 at 5:02 PM Brian Brazil <brian.brazil@robustperception.io> wrote:
On 3 August 2018 at 00:53, <cbe...@openai.com> wrote:
Following up on Github #1335, I'd like to propose adding an integral() function to PromQL. This would be similar to sum_over_time(), except that points would not have equal weight, and instead the time-integral would be computed using the delta between points.

I'm unclear on what exactly you're proposing. Can you give an example of how this would work?
 
Ideally, a max delta would be configurable to account for resets in a gauge metric.

Gauges don't reset, only Counters do that.

Brian

 
This would be similar to InfluxDB's integral() function (http://docs.influxdata.com/influxdb/v1.6/query_language/functions/#integral).

Use case: compute the integral of a gauge metric, such as kube_pod_container_resource_requests_cpu_cores, to understand the total resource usage of a Pod in Kubernetes.


avg_over_time should cover that.



Lucas Serven Marin

Aug 3, 2018, 3:58:09 AM
to Brian Brazil, Christopher Berner, Prometheus Developers
Note: in this toy example avg_over_time multiplied by the period generates the same value as the proposed integral function but only because the values were all the same. Since the avg_over_time function does not calculate a weighted average, it differs meaningfully from an integral.

Consider the following counter example:

t=0, value=2
t=1, value=2
t=10, value=5

avg_over_time * period = 30
integral = 47


Brian Brazil

Aug 3, 2018, 4:02:39 AM
to Lucas Serven Marin, Christopher Berner, Prometheus Developers
On 3 August 2018 at 08:57, Lucas Serven Marin <lse...@redhat.com> wrote:
Note: in this toy example avg_over_time multiplied by the period generates the same value as the proposed integral function but only because the values were all the same. Since the avg_over_time function does not calculate a weighted average, it differs meaningfully from an integral.

It was decided not to weight the over_time functions for simplicity, and because in the real world it is unlikely to make much of a difference.
 

Consider the following counter example:

t=0, value=2
t=1, value=2
t=10, value=5

The target here was down for 8 of the 11 scrapes; you've got bigger problems.


avg_over_time * period = 30
integral = 47

How are you getting 47 out of that, and how do you know that 47 is correct? There are multiple possible interpolations of that data.


I think you might be confusing gauges and counters.

Brian
 


Lucas Serven Marin

Aug 3, 2018, 4:23:20 AM
to Brian Brazil, Christopher Berner, Prometheus Developers
On Fri, Aug 3, 2018 at 10:02, Brian Brazil <brian....@robustperception.io> wrote:
On 3 August 2018 at 08:57, Lucas Serven Marin <lse...@redhat.com> wrote:
Note: in this toy example avg_over_time multiplied by the period generates the same value as the proposed integral function but only because the values were all the same. Since the avg_over_time function does not calculate a weighted average, it differs meaningfully from an integral.

It was decided not to weight the over_time functions for simplicity, and because in the real world it is unlikely to make much of a difference.
No one is arguing that avg_over_time should be weighted, only pointing out that it isn't weighted, which makes it behave differently from an integral.
 

Consider the following counter example:

t=0, value=2
t=1, value=2
t=10, value=5

The target here was down for 8 of 11 scrapes, you've got bigger problems here.
Sure, but that's entirely beside the point. The functions calculate different values.


avg_over_time * period = 30
integral = 47

How are you getting 47 out of that, and how do you know that 47 is correct? There's multiple possible interpolations of that data.
Yes, there are multiple possible correct ways to calculate an integral depending on your definition, but no matter how many examples come out equal, all it takes is one counterexample to demonstrate that it is different from avg_over_time, and this example does that.
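(For instance, treating each sample's value as holding until the next point gives 2*1 + 5*9 = 47, while linearly interpolating between points gives 2*1 + ((2+5)/2)*9 = 33.5; either way the result differs from avg_over_time times the period, which is 30.)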


I think you might be confusing gauges and counters.
Not confused; I haven't spoken about gauges or counters at all. I am just providing an example for the original author of the post that shows how an integral is inherently different from avg_over_time.


Brian Brazil

Aug 3, 2018, 4:40:57 AM
to Lucas Serven Marin, Christopher Berner, Prometheus Developers
On 3 August 2018 at 09:23, Lucas Serven Marin <lse...@redhat.com> wrote:



On Fri, Aug 3, 2018 at 10:02, Brian Brazil <brian.brazil@robustperception.io> wrote:
On 3 August 2018 at 08:57, Lucas Serven Marin <lse...@redhat.com> wrote:
Note: in this toy example avg_over_time multiplied by the period generates the same value as the proposed integral function but only because the values were all the same. Since the avg_over_time function does not calculate a weighted average, it differs meaningfully from an integral.

It was decided not to weight the over_time functions for simplicity, and because in the real world it is unlikely to make much of a difference.
No one is arguing that avg_over_time should be weighted, only pointing out that it isn't weighted, which makes it behave differently from an integral.
 

Consider the following counter example:

t=0, value=2
t=1, value=2
t=10, value=5

The target here was down for 8 of the 11 scrapes; you've got bigger problems.
Sure, but that's entirely beside the point. The functions calculate different values.

It's relevant. Prometheus is for monitoring systems that are scraped regularly, not a generic statistics tool for arbitrary time series data. It is not our aim to support every possible math function, but to provide a set of functionality that is good enough.
 


avg_over_time * period = 30
integral = 47

How are you getting 47 out of that, and how do you know that 47 is correct? There are multiple possible interpolations of that data.
Yes, there are multiple possible correct ways to calculate an integral depending on your definition, but no matter how many examples come out equal, all it takes is one counterexample to demonstrate that it is different from avg_over_time, and this example does that.

It's different, sure, but does it matter? I don't think we should have mildly different variants of our functions to cover niche cases, as that causes confusion for users, particularly when the case being presented is one where there are more serious problems with the scenario that need resolution.


I'd like to help, but I'm still not getting what the use case is here.

Brian
 


Christopher Berner

Aug 3, 2018, 11:47:56 AM
to Brian Brazil, Lucas Serven Marin, Prometheus Developers
What function returns the time period? For my specific example, avg_over_time multiplied by the period would probably be OK, but I couldn't find a function that returns the period.



Brian Brazil

Aug 3, 2018, 11:49:33 AM
to Christopher Berner, Lucas Serven Marin, Prometheus Developers
On 3 August 2018 at 16:47, Christopher Berner <cbe...@openai.com> wrote:
What function returns the time period? For my specific example, avg_over_time multiplied by the period would probably be OK, but I couldn't find a function that returns the period.

The time period is whatever range you pass to avg_over_time, so avg_over_time(x[1m]) * 60 would be one example.

Brian
 




Christopher Berner

Aug 3, 2018, 1:03:10 PM
to Brian Brazil, Lucas Serven Marin, Prometheus Developers
Ah, so here's a more complete example where they won't work: I'd like to create a query for the most expensive (in terms of requested CPU) Pods in our Kubernetes cluster over the last day. With an integral function, I would use integral(kube_pod_container_resource_requests_cpu_cores[24h]). I can't use avg_over_time(kube_pod_container_resource_requests_cpu_cores[24h]) * 86400, because many Pods won't have been running for the entire 24 hours (our Kube cluster is used for a lot of batch jobs).



Brian Brazil

Aug 3, 2018, 1:04:57 PM
to Christopher Berner, Lucas Serven Marin, Prometheus Developers
On 3 August 2018 at 18:02, Christopher Berner <cbe...@openai.com> wrote:
Ah, so here's a more complete example where they won't work: I'd like to create a query for the most expensive (in terms of requested CPU) Pods in our Kubernetes cluster over the last day. With an integral function, I would use integral(kube_pod_container_resource_requests_cpu_cores[24h]). I can't use avg_over_time(kube_pod_container_resource_requests_cpu_cores[24h]) * 86400, because many Pods won't have been running for the entire 24 hours (our Kube cluster is used for a lot of batch jobs).

For that usage, a plain sum_over_time should do it.

Brian
 



Christopher Berner

Aug 3, 2018, 1:19:25 PM
to Brian Brazil, Lucas Serven Marin, Prometheus Developers
It's directionally correct, yes, but the units are unclear, which would be confusing for people looking at the results: we'd expect to see core-seconds or core-hours used, but instead the units are core-polling-intervals.



Brian Brazil

Aug 3, 2018, 1:27:06 PM
to Christopher Berner, Lucas Serven Marin, Prometheus Developers
On 3 August 2018 at 18:19, Christopher Berner <cbe...@openai.com> wrote:
It's directionally correct, yes, but the units are unclear, which would be confusing for people looking at the results: we'd expect to see core-seconds or core-hours used, but instead the units are core-polling-intervals.

If you know what your interval is (which you should) you can multiply by that.
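For example, assuming a 30-second scrape interval (substitute whatever yours actually is), sum_over_time(kube_pod_container_resource_requests_cpu_cores[24h]) * 30 would give an approximate total in core-seconds per Pod over the last day.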

Brian
 

