PromQL stat functions return identical values


mohan garden

Sep 24, 2024, 9:55:00 AM
to Prometheus Users

I am trying to analyse the memory usage of a server for 2 specific months using Grafana and Prometheus, but it seems the _over_time functions are returning unexpected results.

Here is the data for the duration:
[screenshot: memory usage graph for the period]
The summary table shows the expected values:
[screenshot: summary table]
Query:
(( node_memory_MemAvailable_bytes{instance="$node",job="$job"}[$__rate_interval]) * 100 ) / node_memory_MemTotal_bytes{instance="$node",job="$job"}[$__rate_interval]


Issue: when I try to create similar stats using my own PromQL, I fail to get the same values. For example:

[screenshot: the three gauges showing identical values]

( avg_over_time(node_memory_MemAvailable_bytes{instance="$node",job="$job"}[$__rate_interval]) / avg_over_time(node_memory_MemTotal_bytes{instance="$node",job="$job"}[$__rate_interval]) ) * 100

( min_over_time(node_memory_MemAvailable_bytes{instance="$node",job="$job"}[$__rate_interval]) / min_over_time(node_memory_MemTotal_bytes{instance="$node",job="$job"}[$__rate_interval]) ) * 100

( max_over_time(node_memory_MemAvailable_bytes{instance="$node",job="$job"}[$__rate_interval]) / max_over_time(node_memory_MemTotal_bytes{instance="$node",job="$job"}[$__rate_interval]) ) * 100

So you can see that the avg|min|max_over_time functions return identical values, which doesn't make much sense. I was using the following settings:
[screenshot: panel query settings]

When I change the query type from range to instant, I see similar values:
[screenshot: instant query results]

Where do I need to make modifications in the PromQL so the gauges report the correct min/max/avg values, as in the summary table, for a specific duration, say:
[screenshot: dashboard time range]

Please advise.



Brian Candler

Sep 24, 2024, 10:42:29 AM
to Prometheus Users
$__rate_interval is (roughly speaking) the interval between 2 adjacent points in the graph, with a minimum of 4 times the configured scrape interval. It's not the entire period over which Grafana is drawing the graph. You probably want $__range or $__range_s. See the Grafana documentation on global variables.
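
For example, keeping your dashboard variables, the stat over the whole displayed range would be something like this (a sketch using $__range; adjust to your dashboard):

( avg_over_time(node_memory_MemAvailable_bytes{instance="$node",job="$job"}[$__range]) / avg_over_time(node_memory_MemTotal_bytes{instance="$node",job="$job"}[$__range]) ) * 100

and similarly with min_over_time and max_over_time for the other two gauges.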

However, questions about Grafana would be better off asked in the Grafana community. Prometheus is not Grafana, and those variables are Grafana-specific.

> So you can see that the avg|min|max_over_time functions return identical values, which doesn't make much sense.

It makes sense when you realise that the time period you're querying over is very small; hence for a value that doesn't change rapidly, the min/max/average over such a short time range will all be roughly the same.
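
As a worked example: with a 15s scrape interval, $__rate_interval bottoms out at 4 × 15s = 1m, so each _over_time stat is computed from only a handful of samples; for a slowly-changing value like available memory, min ≈ max ≈ avg over a single minute.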

mohan garden

Sep 27, 2024, 12:59:22 AM
to Prometheus Users
Thank you for the response, Brian.

I removed the $__ variables and tried viewing disk usage metrics for the past 1 hour in the Prometheus UI. I was expecting a value of ~97% for the past hour, but the table view reports 48%.

[screenshot: max_over_time query in the Prometheus UI]

I am not sure if I missed something in the query. I am under the impression that max works across multiple series, while the _over_time functions generate stats from the values within a single series over the range.

Please advise.


Brian Candler

Sep 27, 2024, 5:40:41 AM
to Prometheus Users
Perform the two halves of the query separately, i.e.

max_over_time(node_filesystem_avail_bytes{...}[1h])
max_over_time(node_filesystem_size_bytes{...}[1h])

and then you'll see why they divide to give 48% instead of 97%.

I expect node_filesystem_size_bytes doesn't change much, so max_over_time doesn't do much for that. But max_over_time(node_filesystem_avail_bytes) will show the *largest* available space over that 1 hour window, and therefore you'll get the value for when the disk was *least full*. If you want to know the value when it was *most full* then it would be min_over_time(node_filesystem_avail_bytes).
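
Putting that together, the most-full point over the last hour would be something like this (a sketch; the mountpoint label, and the assumption that the filesystem size is constant, are mine):

min_over_time(node_filesystem_avail_bytes{mountpoint="/"}[1h]) / node_filesystem_size_bytes{mountpoint="/"}

That gives the lowest available fraction over the window; peak usage is 1 minus that value.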

Note that you showed a graph, rather than a table. When you're graphing, you're repeating the same query at different evaluation times. So where the time axis shows 04:00, the data point on the graph is for the 1 hour period from 03:00 to 04:00. Where the time axis is 04:45, the result is of your query covering the 1 hour from 03:45 to 04:45.

Aside: in general, I'd advise keeping percentage queries simple by removing the factor of 100, so you get a fraction between 0 and 1 instead. This can be represented as a human-friendly percentage when rendered (e.g. Grafana can quite happily render 0-1 as 0-100%).
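
For example, the earlier memory query without the * 100: avg_over_time(node_memory_MemAvailable_bytes{...}[$__range]) / avg_over_time(node_memory_MemTotal_bytes{...}[$__range]) returns a 0-1 fraction, and Grafana's "Percent (0.0-1.0)" unit will render it as a percentage.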

Brian Candler

Sep 27, 2024, 5:46:42 AM
to Prometheus Users
> e.g. Grafana can quite happily render 0-1 as 0-100%

and in alerting rules:

- expr: blah > 0.9
  annotations:
    summary: 'filesystem usage is high: {{ $value | humanizePercentage }}'
