Rate function gives wrong value when data is present.

Shubham Choudhary

Apr 13, 2020, 2:56:04 PM

to Prometheus Users

I am using a Prometheus recording rule to capture CPU usage over 5 minutes, and I later want to use it to get the maximum 5-minute CPU usage over the last 30 days.

Recording Rule

- expr: 1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))
  record: instance:node_cpu_usage:rate5m

Maximum 5-minute CPU usage over the last 30 days

max_over_time(instance:node_cpu_usage:rate5m[30d])

Now, imagine that Prometheus starts scraping at 00:00:00 with an interval of 15 seconds:

1. The 1st pull is at 00:00:00. Here, instance:node_cpu_usage:rate5m is not calculated, since the rate function needs a minimum of 2 data points.
2. The 2nd pull is at 00:00:15. Here, instance:node_cpu_usage:rate5m is calculated as the difference (2nd pull - 1st pull) divided by 300 seconds. This is the issue: why is the rate function dividing by 300 instead of 15?
3. The 3rd pull is at 00:00:30. Again, the same scenario occurs.
4. Wrong data keeps being saved in instance:node_cpu_usage:rate5m until the 5th minute.
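
To make the numbers in this scenario concrete, here is a minimal Python sketch of how a rate over a mostly empty 5-minute window could come out. It is a simplified model of rate()'s extrapolation as it is commonly described, not Prometheus's actual code. The function name approx_rate, the sample values, and the assumption of a fully idle CPU (the idle counter grows by 1 per second) are all made up for illustration.

```python
def approx_rate(samples, range_start, range_end):
    """Simplified, hypothetical model of a per-second counter rate.

    samples: list of (timestamp_seconds, counter_value), sorted by time.
    range_start / range_end: boundaries of the lookback window in seconds.
    Returns None when fewer than two samples fall in the window.
    """
    if len(samples) < 2:
        return None

    first_t, first_v = samples[0]
    last_t, last_v = samples[-1]
    sampled = last_t - first_t
    if sampled <= 0:
        return None

    # Total increase, adding back the pre-reset value on each counter reset.
    increase = last_v - first_v
    prev = first_v
    for _, value in samples[1:]:
        if value < prev:
            increase += prev
        prev = value

    avg_gap = sampled / (len(samples) - 1)
    to_start = first_t - range_start
    to_end = range_end - last_t

    # A counter cannot be extrapolated back past the point where it was zero.
    if increase > 0 and first_v >= 0:
        to_start = min(to_start, sampled * first_v / increase)

    # Extend to a window boundary only if it is close to a sample;
    # otherwise extend by half of the average sample gap.
    threshold = avg_gap * 1.1
    extrapolated = sampled
    extrapolated += to_start if to_start < threshold else avg_gap / 2
    extrapolated += to_end if to_end < threshold else avg_gap / 2

    # The scaled increase is divided by the full window length (300 s here).
    return increase * (extrapolated / sampled) / (range_end - range_start)


# Two scrapes, 15 s apart, evaluated at t=15 with a 5-minute window.
two = [(0.0, 1000.0), (15.0, 1015.0)]
early = approx_rate(two, range_start=-285.0, range_end=15.0)

# A full window of scrapes every 15 s on the same idle CPU.
full = [(float(t), 1000.0 + t) for t in range(0, 301, 15)]
steady = approx_rate(full, range_start=0.0, range_end=300.0)

print(early, steady)  # roughly 0.075 and 1.0
```

Under this model, the two-sample case yields an idle rate of about 0.075 instead of 1.0, so 1 - rate would report roughly 92.5% CPU usage for a machine that is actually idle.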

So max_over_time(instance:node_cpu_usage:rate5m[30d]) will almost certainly return that first value, which is wrong.

Also, if a server crashes or there is a network issue, the rate gives wrong data.

How can I overcome this outlier?

Julien Pivotto

Apr 13, 2020, 3:00:58 PM

to Shubham Choudhary, Prometheus Users

On 13 Apr 11:56, Shubham Choudhary wrote:
> I am using the Prometheus recording rule to capture the CPU usage over 5
> mins and later want to use it to get maximum CPU used for 5 mins in the
> last 30 days.
>
> Recording Rule
>
> - expr: 1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))
>   record: instance:node_cpu_usage:rate5m
>
>
> Max. CPU used in last 30 days over 5 mins
>
> max_over_time(instance:node_cpu_usage:rate5m[30d])
>
>
> Now, let's imagine a situation where Prometheus starts to pull data with an
> interval of 15sec which started at 00:00:00 so
>

> 1. 1st pull was at 00:00:00. Here, the instance:node_cpu_usage:rate5m is
> not calculated since the rate function needs a minimum of 2 data points.
> 2. 2nd pull was at 00:00:15. *Here, the instance:node_cpu_usage:rate5m
> is calculated as the difference of 2nd pull - 1st pull data divided by 300
> seconds. Now, this is the issue of why rate function dividing by 300
> instead of 15.*
> 3. 3rd pull was at 00:00:30. Again, the same scenario occurs.
> 4. The wrong data is being saved in instance:node_cpu_usage:rate5m till
> 5th minute.

Please provide the raw data:

node_cpu_seconds_total{mode="idle"}[1h] in the "table" view.

Thanks.

>
> [image: prometheus_rate_less_data.png]
>
>
> *So, doing a max_over_time(instance:node_cpu_usage:rate5m[30d]) will for
> sure give me the first value which is wrong.*
>
> Also if there are server crashes or network issue then the rate gives wrong
> data.
>
> How can I overcome this outlier?
>

> --
> You received this message because you are subscribed to the Google Groups "Prometheus Users" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to prometheus-use...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-users/995498ad-5ac2-4b97-a53f-0ae3f1e97c18%40googlegroups.com.

--
(o- Julien Pivotto
//\ Open-Source Consultant
V_/_ Inuits - https://www.inuits.eu

Shubham Choudhary

Apr 13, 2020, 3:18:31 PM

to Prometheus Users

Hi, I have attached the data in prometheus_rate_1h.txt.

Also, see the graph.

prometheus_rate_1h.txt

Ben Kochie

Apr 13, 2020, 3:46:39 PM

to Shubham Choudhary, Prometheus Users

That seems pretty normal to me.

Remember that by specifying a [1h] rate, you are asking Prometheus to look back a full hour from each point on the graph. Your graph window is only 2 hours wide, and the step is 28 seconds, so the lookback window of each point overlaps that of the previous point by 59:32.

If you want to see a graph with no overlap, you need to match the graph step to the range interval of the rate.

For example: https://prometheus.demo.do.prometheus.io/graph?g0.range_input=2h&g0.step_input=300&g0.expr=rate(node_cpu_seconds_total%7Bmode%3D%22idle%22%7D%5B5m%5D)&g0.tab=0
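
The overlap arithmetic above can be checked with a small Python sketch. The helper names window_overlap_seconds and mmss are made up for illustration; the only inputs are the range and step values mentioned in this thread.

```python
def window_overlap_seconds(range_seconds: int, step_seconds: int) -> int:
    """Seconds shared by the lookback windows of two consecutive graph points."""
    return max(0, range_seconds - step_seconds)


def mmss(seconds: int) -> str:
    """Format a number of seconds as minutes:seconds."""
    minutes, secs = divmod(seconds, 60)
    return f"{minutes}:{secs:02d}"


# A [1h] range with a 28-second step: consecutive windows share almost the whole hour.
print(mmss(window_overlap_seconds(3600, 28)))  # 59:32

# A [5m] range with a 300-second step: consecutive windows do not overlap.
print(window_overlap_seconds(300, 300))  # 0
```

With the step equal to the range, each point on the graph is computed from its own separate slice of samples.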
