Calculate Max over time on Sum function


Darshil Saraiya

Jul 13, 2017, 1:53:51 PM
to Prometheus Users

I am running Prometheus in my Kubernetes cluster.

My setup in Kubernetes is as follows: I have 4 nodes and I want to track free memory. At each point in time I want the sum of free memory across the four nodes, and then the maximum of that sum over 1 day. So, for example,


at time=t1
node1: 500 MB
node2: 600 MB
node3: 200 MB
node4: 300 MB
Total = 1700 MB


at time=t2
node1: 400 MB
node2: 700 MB
node3: 100 MB
node4: 200 MB
Total = 1300 MB


at time=t3
node1: 600 MB
node2: 800 MB
node3: 1200 MB
node4: 1300 MB
Total = 3900 MB


at time=t4
node1: 100 MB
node2: 200 MB
node3: 300 MB
node4: 400 MB
Total = 1000 MB


So the answer to my query should be 3900 MB. I am not able to apply max_over_time to the sum. I tried the following, which did not work:

max_over_time(sum(node_memory_MemFree)[2m])


Ben Kochie

Jul 13, 2017, 1:57:59 PM
to Darshil Saraiya, Prometheus Users
You need to use the range vector function first.

sum(max_over_time(node_memory_MemFree[2m]))


--
You received this message because you are subscribed to the Google Groups "Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to prometheus-users+unsubscribe@googlegroups.com.
To post to this group, send email to prometheus-users@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-users/76a9f208-e725-491f-9079-5aa628675756%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Darshil Saraiya

Jul 13, 2017, 2:32:36 PM
to Prometheus Users
I want to take the sum across all the nodes first, at each point in time: the sum at t1, the sum at t2, the sum at t3, the sum at t4. Then I want to find the maximum among those sums. What sum(max_over_time(node_memory_MemFree[2m])) does is find the maximum for each node first, so the answer is the sum of the per-node maxima, which I don't want. It would give me:

for node1: max 500
for node2: max 800
for node3: max 1200
for node4: max 1300
total: 3800

PS: I have changed the value for node3, because it had the maximum value at every timestamp.

On Thursday, 13 July 2017 10:57:59 UTC-7, Ben Kochie wrote:
You need to use the range vector function first.

sum(max_over_time(node_memory_MemFree[2m]))


jer...@doupe.com

Jul 13, 2017, 3:18:58 PM
to Prometheus Users
You'll need to use a recording rule to record the sum() into a new metric; then you'll be able to use max_over_time() on that new metric.
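A minimal sketch of that approach, in the rule-file syntax current at the time of this thread (the rule name cluster:memory_free_sum:bytes is just an illustrative choice):

```
# Recording rule: evaluate the cluster-wide sum once per evaluation interval,
# storing it as a regular time series.
cluster:memory_free_sum:bytes = sum(node_memory_MemFree)

# Query: max_over_time now works, because the operand is a plain time series.
# max_over_time(cluster:memory_free_sum:bytes[1d])
```

Note that the range ([1d] here) only sees samples recorded since the rule was created, at the rule group's evaluation interval.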

Ben Kochie

Jul 13, 2017, 4:23:50 PM
to Darshil Saraiya, Prometheus Users
I'm sorry, but Prometheus metrics will not work that way. Scheduling of scrapes is intentionally spread over the scrape interval; there is explicitly no alignment of timestamps. On the contrary, we avoid timestamp correlation. What you are trying to do is explicitly not going to work.

If you want the max over one day, you simply ask PromQL for the max over the day:

sum(max_over_time(node_memory_MemFree[1d]))


Tobias Schmidt

Jul 13, 2017, 5:14:42 PM
to Ben Kochie, Darshil Saraiya, Prometheus Users
While it's true that it's not possible to collect the value of a Gauge at the exact same timestamp across all targets, I believe that wasn't the requirement here. It is possible with Prometheus to collect the maximum of a sum evaluated at a given evaluation timestamp a.k.a "step" in the query_range API. The expression sum(max(...)) is not equivalent to max(sum(...)).

As Jeremy has described, that requires recording the sum in a new time series with a recording rule. PromQL currently only supports constructing a range vector from a time series, not from the result of an expression. Recording a new time series is a workaround for that for now. Here is the longstanding issue for that limitation: https://github.com/prometheus/prometheus/issues/1227
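Since this thread, Prometheus 2.7 added subqueries, which close exactly this gap by allowing a range selector over the result of an instant expression. The query from this thread can then be written directly (the 1m resolution step is an assumed choice; pick one at or above your scrape interval):

```
# Subquery: evaluate sum(node_memory_MemFree) at 1m resolution over the
# last day, then take the maximum of those sums.
max_over_time(sum(node_memory_MemFree)[1d:1m])
```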

--
You received this message because you are subscribed to the Google Groups "Prometheus Users" group.

Darshil Saraiya

Jul 14, 2017, 2:24:49 PM
to Prometheus Users
I did find the solution to this.

I created a recording rule:

cluster:memory_used:bytes =
  sum by (cluster) (node_memory_MemTotal)
  - sum by (cluster) (node_memory_MemFree)
  - sum by (cluster) (node_memory_Buffers)
  - sum by (cluster) (node_memory_Cached)

And then I just wrote the query:

max_over_time(cluster:memory_used:bytes[2m])
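For reference, the same rule in the YAML rule-file format introduced with Prometheus 2.0 would look roughly like this (the group name is an arbitrary choice; on node_exporter 0.16+ the memory metrics carry a _bytes suffix, e.g. node_memory_MemFree_bytes):

```yaml
groups:
  - name: cluster_memory
    rules:
      - record: cluster:memory_used:bytes
        expr: |
          sum by (cluster) (node_memory_MemTotal)
          - sum by (cluster) (node_memory_MemFree)
          - sum by (cluster) (node_memory_Buffers)
          - sum by (cluster) (node_memory_Cached)
```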
