Rate function gives wrong value when data is present.

Shubham Choudhary

Apr 13, 2020, 2:56:04 PM
to Prometheus Users
I am using a Prometheus recording rule to capture CPU usage over 5-minute windows, and I later want to use it to find the maximum 5-minute CPU usage over the last 30 days.

Recording Rule

- expr: 1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))
  record: instance:node_cpu_usage:rate5m

Max. CPU used in last 30 days over 5 mins

max_over_time(instance:node_cpu_usage:rate5m[30d])
 
Now, imagine that Prometheus starts scraping at 00:00:00 with a 15-second interval:
  1. The 1st scrape is at 00:00:00. Here, instance:node_cpu_usage:rate5m is not calculated, since the rate function needs at least 2 data points.
  2. The 2nd scrape is at 00:00:15. Here, instance:node_cpu_usage:rate5m is calculated as (2nd sample - 1st sample) divided by 300 seconds. This is the issue: why does the rate function divide by 300 instead of 15?
  3. The 3rd scrape is at 00:00:30. The same thing happens again.
  4. Wrong data keeps being saved in instance:node_cpu_usage:rate5m until the 5th minute.
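
To make the arithmetic in the steps above concrete, here is a minimal Python sketch of the calculation as this post describes it. It only models the description given here (counter difference divided by either the 300-second window or the 15 seconds actually elapsed). It is not Prometheus's actual rate() implementation, which also applies extrapolation, and the sample values are made up for illustration.

```python
# Assumed setup taken from the post: 15 s scrape interval, 5 m rate window.
SCRAPE_INTERVAL_S = 15
WINDOW_S = 300


def naive_rate(first_value, last_value, divisor_s):
    """Counter increase divided by a chosen number of seconds."""
    return (last_value - first_value) / divisor_s


# Hypothetical idle-counter samples: the CPU was idle for 12 of the
# 15 seconds between the first and second scrape (80% idle).
first_sample = 1000.0
second_sample = 1012.0

# Dividing by the full 5 m window, as the post describes.
idle_rate_over_window = naive_rate(first_sample, second_sample, WINDOW_S)
# Dividing by the time actually covered by the two samples.
idle_rate_over_elapsed = naive_rate(first_sample, second_sample, SCRAPE_INTERVAL_S)

# The recording rule computes usage as 1 - idle rate.
usage_over_window = 1 - idle_rate_over_window
usage_over_elapsed = 1 - idle_rate_over_elapsed

print(f"usage if divided by {WINDOW_S} s: {usage_over_window:.2f}")
print(f"usage if divided by {SCRAPE_INTERVAL_S} s: {usage_over_elapsed:.2f}")
```

With these assumed numbers, dividing by 300 reports roughly 96% usage where dividing by 15 reports roughly 20%, which is the kind of inflated early value described above.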

prometheus_rate_less_data.png


So max_over_time(instance:node_cpu_usage:rate5m[30d]) will certainly return that first value, which is wrong.
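
As a rough illustration of why a maximum over the whole range picks up the early values, here is a small Python sketch. The series is hypothetical: it assumes a steady 20% usage and models the first 5 minutes the way this post describes them (counter increase so far divided by 300 seconds). The helper name and the numbers are assumptions, not Prometheus output.

```python
WINDOW_S = 300
IDLE_SECONDS_PER_SCRAPE = 12  # assumed: 80% idle with a 15 s scrape interval

# Warm-up: after k scrape intervals the idle counter has grown by 12 * k
# seconds, but (in the post's model) it is still divided by 300 seconds.
warmup = [1 - (k * IDLE_SECONDS_PER_SCRAPE) / WINDOW_S for k in range(1, 20)]
# Steady state: once the window is full, usage settles at 20%.
steady = [0.2] * 40

recorded = warmup + steady


def max_of_series(samples):
    """Plain maximum over a list of recorded values."""
    return max(samples)


max_with_warmup = max_of_series(recorded)
max_without_warmup = max_of_series(recorded[len(warmup):])

print(f"max including warm-up samples: {max_with_warmup:.2f}")
print(f"max excluding warm-up samples: {max_without_warmup:.2f}")
```

Under these assumptions the maximum is dominated by the very first recorded value, even though the steady-state usage never exceeds 20%.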

Also, if a server crashes or there is a network issue, the rate gives wrong data.

How can I overcome this outlier?

Julien Pivotto

Apr 13, 2020, 3:00:58 PM
to Shubham Choudhary, Prometheus Users
On 13 Apr 11:56, Shubham Choudhary wrote:
> I am using the Prometheus recording rule to capture the CPU usage over 5
> mins and later want to use it to get maximum CPU used for 5 mins in the
> last 30 days.
>
> Recording Rule
>
> - expr: 1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))
>   record: instance:node_cpu_usage:rate5m
>
>
> Max. CPU used in last 30 days over 5 mins
>
> max_over_time(instance:node_cpu_usage:rate5m[30d])
>
>
> Now, let's imagine a situation where Prometheus starts to pull data with an
> interval of 15sec which started at 00:00:00 so
>
> 1. 1st pull was at 00:00:00. Here, the instance:node_cpu_usage:rate5m is
> not calculated since the rate function needs a minimum of 2 data points.
> 2. 2nd pull was at 00:00:15. *Here, the instance:node_cpu_usage:rate5m
> is calculated as the difference of 2nd pull - 1st pull data divided by 300
> seconds. Now, this is the issue of why rate function dividing by 300
> instead of 15.*
> 3. 3rd pull was at 00:00:30. Again, the same scenario occurs.
> 4. The wrong data is being saved in instance:node_cpu_usage:rate5m till
> 5th minute.


Please provide the raw data:

node_cpu_seconds_total{mode="idle"}[1h] in the "table" view.

Thanks.

>
> [image: prometheus_rate_less_data.png]
>
>
> *So, doing a max_over_time(instance:node_cpu_usage:rate5m[30d]) will for
> sure give me the first value which is wrong.*
> Also if there are server crashes or network issue then the rate gives wrong
> data.
>
> How can I overcome this outlier?



--
(o- Julien Pivotto
//\ Open-Source Consultant
V_/_ Inuits - https://www.inuits.eu

Shubham Choudhary

Apr 13, 2020, 3:18:31 PM
to Prometheus Users


Hi, I have attached the data in prometheus_rate_1h.txt.


promethesu-rate-1hr.png
prometheus_rate_1h.txt

Ben Kochie

Apr 13, 2020, 3:46:39 PM
to Shubham Choudhary, Prometheus Users
That seems pretty normal to me.

You have to remember that by specifying a [1h] rate, you are asking Prometheus to look back a full hour from each point on the graph. Your graph window is only 2 hours wide, and the step is 28 seconds, so each point's lookback window overlaps the previous point's by 59:32.

If you want to see a graph with no overlap, you need to match your step to your interval.
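
The overlap figure above can be checked with a little arithmetic. This is a minimal Python sketch with hypothetical helper names; it only reproduces the numbers quoted in this reply (a 1-hour range and a 28-second graph step).

```python
def overlap_seconds(range_s, step_s):
    """Seconds of lookback window shared by two consecutive graph points."""
    return max(range_s - step_s, 0)


def as_min_sec(seconds):
    """Format a number of seconds as M:SS."""
    minutes, secs = divmod(seconds, 60)
    return f"{minutes}:{secs:02d}"


RANGE_S = 3600  # the [1h] range from the reply
STEP_S = 28     # the graph step from the reply

overlap = overlap_seconds(RANGE_S, STEP_S)
print(as_min_sec(overlap))

# When the step equals the range, consecutive points share no lookback window.
no_overlap = overlap_seconds(RANGE_S, RANGE_S)
print(as_min_sec(no_overlap))
```

The first value matches the 59:32 quoted above; the second shows that setting the step equal to the range leaves no overlap between points.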

