PromQL stat functions return identical values


mohan garden

Sep 24, 2024, 9:55:00 AM
to Prometheus Users

I am trying to analyse the memory usage of a server for 2 specific months using Grafana and Prometheus, but it seems the _over_time functions are returning unexpected results.

Here is the data for the duration:
[screenshot: memory usage graph for the period]
The summary table shows the expected values:
[screenshot: summary table]
Query:
(( node_memory_MemAvailable_bytes{instance="$node",job="$job"}[$__rate_interval]) * 100 ) / node_memory_MemTotal_bytes{instance="$node",job="$job"}[$__rate_interval]


Issue: when I try to create similar stats using my own PromQL, I fail to get the same values. For example:

[screenshot: the three gauges showing identical values]

( avg_over_time(node_memory_MemAvailable_bytes{instance="$node",job="$job"}[$__rate_interval]) / avg_over_time(node_memory_MemTotal_bytes{instance="$node",job="$job"}[$__rate_interval]) ) * 100

( min_over_time(node_memory_MemAvailable_bytes{instance="$node",job="$job"}[$__rate_interval]) / min_over_time(node_memory_MemTotal_bytes{instance="$node",job="$job"}[$__rate_interval]) ) * 100

( max_over_time(node_memory_MemAvailable_bytes{instance="$node",job="$job"}[$__rate_interval]) / max_over_time(node_memory_MemTotal_bytes{instance="$node",job="$job"}[$__rate_interval]) ) * 100

So you can see that the avg|min|max_over_time functions return identical values, which doesn't make much sense. I was using the following settings:
[screenshot: panel query settings]

When I change the query type from range to instant, I see similar values:
[screenshot: instant query results]

Where do I need to make modifications in the PromQL so the gauges report the correct min/max/avg values, as in the summary table, for a specific duration, say:
[screenshot: dashboard time range]

Please advise.



Brian Candler

Sep 24, 2024, 10:42:29 AM
to Prometheus Users
$__rate_interval is (roughly speaking) the interval between 2 adjacent points in the graph, with a minimum of 4 times the configured scrape interval. It's not the entire period over which Grafana is drawing the graph. You probably want $__range or $__range_s. See the Grafana documentation on global variables.
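
For example, keeping your dashboard variables, the stat over the whole displayed range would be something like this (a sketch using $__range; adjust to your dashboard):

( avg_over_time(node_memory_MemAvailable_bytes{instance="$node",job="$job"}[$__range]) / avg_over_time(node_memory_MemTotal_bytes{instance="$node",job="$job"}[$__range]) ) * 100

and similarly with min_over_time and max_over_time for the other two gauges.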

However, questions about Grafana would be better off asked in the Grafana community. Prometheus is not Grafana, and those variables are Grafana-specific.

> So you can see that the avg|min|max_over_time functions return identical values, which doesn't make much sense.

It makes sense when you realise that the time period you're querying over is very small; hence for a value that doesn't change rapidly, the min/max/average over such a short time range will all be roughly the same.
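
As a worked example: with a 15s scrape interval, $__rate_interval bottoms out at 4 × 15s = 1m, so each _over_time stat is computed from only a handful of samples; for a slowly-changing value like available memory, min ≈ max ≈ avg over a single minute.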

mohan garden

Sep 27, 2024, 12:59:22 AM
to Prometheus Users
Thank you for the response, Brian.

I removed the $__ variables and tried viewing disk usage metrics for the past 1 hour in the Prometheus UI. I was expecting a value of ~97% for the past hour, but the table view reports 48%.

[screenshot: max_over_time query in the Prometheus UI]

I am not sure if I missed something in the query. I am under the impression that max works across multiple series, while the _over_time functions generate stats from the values within a single series over the range.

Please advise.


Brian Candler

Sep 27, 2024, 5:40:41 AM
to Prometheus Users
Perform the two halves of the query separately, i.e.

max_over_time(node_filesystem_avail_bytes{...}[1h])
max_over_time(node_filesystem_size_bytes{...}[1h])

and then you'll see why they divide to give 48% instead of 97%.

I expect node_filesystem_size_bytes doesn't change much, so max_over_time doesn't do much for that. But max_over_time(node_filesystem_avail_bytes) will show the *largest* available space over that 1 hour window, and therefore you'll get the value for when the disk was *least full*. If you want to know the value when it was *most full* then it would be min_over_time(node_filesystem_avail_bytes).
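
Putting that together, the most-full point over the last hour would be something like this (a sketch; the mountpoint label, and the assumption that the filesystem size is constant, are mine):

min_over_time(node_filesystem_avail_bytes{mountpoint="/"}[1h]) / node_filesystem_size_bytes{mountpoint="/"}

That gives the lowest available fraction over the window; peak usage is 1 minus that value.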

Note that you showed a graph, rather than a table. When you're graphing, you're repeating the same query at different evaluation times. So where the time axis shows 04:00, the data point on the graph is for the 1 hour period from 03:00 to 04:00. Where the time axis is 04:45, the result is of your query covering the 1 hour from 03:45 to 04:45.

Aside: in general, I'd advise keeping percentage queries simple by removing the factor of 100, so you get a fraction between 0 and 1 instead. This can be represented as a human-friendly percentage when rendered (e.g. Grafana can quite happily render 0-1 as 0-100%).
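
For example, the earlier memory query without the * 100: avg_over_time(node_memory_MemAvailable_bytes{...}[$__range]) / avg_over_time(node_memory_MemTotal_bytes{...}[$__range]) returns a 0-1 fraction, and Grafana's "Percent (0.0-1.0)" unit will render it as a percentage.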

Brian Candler

Sep 27, 2024, 5:46:42 AM
to Prometheus Users
> e.g. Grafana can quite happily render 0-1 as 0-100%

and in alerting rules:

- expr: blah > 0.9
  annotations:
    summary: 'filesystem usage is high: {{ $value | humanizePercentage }}'
