PromQL: comparison to custom metric and/or static threshold

Justin W

May 11, 2020, 11:47:23 AM

to Prometheus Users

Hi all,

I'm trying to compare the calculated filesystem usage to either a static threshold or, if a certain metric is present, a custom threshold. What I have so far is:

((avg(node_filesystem_used_bytes{mountpoint=~".*foo.*"}) by (site)/avg(node_filesystem_size_bytes{mountpoint=~".*foo.*"}) by (site)) > my_custom_threshold)
or
((avg(node_filesystem_used_bytes{mountpoint=~".*foo.*"}) by (site)/avg(node_filesystem_size_bytes{mountpoint=~".*foo.*"}) by (site)) > 0.7)

What I want this query to do is "If the my_custom_threshold is present for the (site), compare against that, else compare against 70%".

But if, for example, my_custom_threshold is 0.8 for a particular site, a series still shows up when the percentage used is greater than 70% and less than 80%.

How do I modify this query to ask "greater than 0.7 if my_custom_threshold is not present"?

Thanks!

Brian Candler

May 11, 2020, 1:18:35 PM

There's an example of this in

https://www.robustperception.io/using-time-series-as-alert-thresholds

"You could also provide a default, so only those teams wishing to override it need to configure a threshold. Here the default is 42:"

The trick is to get some other timeseries which has the same labels as the set you're alerting on, and use that to provide a fixed default if the custom threshold doesn't exist.
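Applied to the query in this thread, the trick might look roughly like the sketch below. It is an adaptation, not the exact expression from the linked article. It assumes that my_custom_threshold carries a site label, and it uses count by (site) over node_filesystem_size_bytes as the "other time series" that supplies a 0.7 default for every site being alerted on.

```promql
# Per-site usage ratio compared with a per-site threshold; sites without
# my_custom_threshold fall back to 0.7 (sketch, labels assumed as above).
(
    avg by (site) (node_filesystem_used_bytes{mountpoint=~".*foo.*"})
  /
    avg by (site) (node_filesystem_size_bytes{mountpoint=~".*foo.*"})
)
>
(
    # Custom threshold, collapsed to exactly one series per site.
    max by (site) (my_custom_threshold)
  or
    # Default: one series per site with value 0.7, built from a metric
    # that exists for every site on the left-hand side.
    count by (site) (node_filesystem_size_bytes{mountpoint=~".*foo.*"}) * 0 + 0.7
)
```

Because `or` only adds right-hand elements whose label set is missing on the left, the 0.7 default is used only for sites that have no my_custom_threshold. Using `max by (site)` is an assumption about how duplicate thresholds for one site should be handled; `min by (site)` would be an equally reasonable choice.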

Harald Koch

May 11, 2020, 1:28:56 PM

On Mon, May 11, 2020, at 11:47, Justin W wrote:

What I want this query to do is "If the my_custom_threshold is present for the (site), compare against that, else compare against 70%".

I wrote a long and incoherent article about this:

https://www.haraldkoch.ca/blog/index.php/2020/03/14/prometheus-alerting-rules-and-metadata/

Suggestions welcome!!

--

Harald

Brian Candler

May 11, 2020, 3:57:26 PM

Here are a couple of things to help understand why the original expression doesn't work as expected.

1. The comparison operators are filters: they return a vector of zero or more elements, which is a subset of the input time series, not a boolean value (unless the bool modifier is used).

(foo > my_custom_threshold) compares an instant vector with another instant vector. It only gives results where foo and my_custom_threshold have exactly matching label sets (apart from the metric name) *and* the value of foo is greater than the corresponding value of my_custom_threshold. The values in the result vector are the values of the LHS.

(foo > 0.7) compares an instant vector with a scalar. It gives results for *every* series of foo whose value is > 0.7.
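A sketch of both cases using the ratio from the original question, where each expression is a separate query. It hypothetically assumes that my_custom_threshold is exported with labels other than site (for example instance and job); the thread doesn't say which labels it has. The last query adds the standard bool modifier, which nobody in the thread used, to contrast filtering with a true/false result.

```promql
# Vector vs vector: after avg by (site) the LHS has only a site label, so a
# plain ">" finds no partner if my_custom_threshold has extra labels.
# "on (site)" restricts matching to site (needs at most one threshold per site).
avg by (site) (node_filesystem_used_bytes{mountpoint=~".*foo.*"})
  / avg by (site) (node_filesystem_size_bytes{mountpoint=~".*foo.*"})
> on (site) my_custom_threshold

# Vector vs scalar: a filter. Only sites whose ratio exceeds 0.7 are returned,
# and each keeps its ratio as the value.
avg by (site) (node_filesystem_used_bytes{mountpoint=~".*foo.*"})
  / avg by (site) (node_filesystem_size_bytes{mountpoint=~".*foo.*"})
> 0.7

# With bool: every site is returned, with value 1 if the ratio exceeds 0.7, else 0.
avg by (site) (node_filesystem_used_bytes{mountpoint=~".*foo.*"})
  / avg by (site) (node_filesystem_size_bytes{mountpoint=~".*foo.*"})
> bool 0.7
```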

2. (foo > x) OR (foo > y) isn't a boolean expression; it's a set (union) operation.

(foo > x) gives all those timeseries whose metric name is foo and value is > x

(foo > y) gives all those timeseries whose metric name is foo and value is > y

(foo > x) OR (foo > y)

will give you all the metrics from the LHS, *plus* all metrics on the RHS which don't have any matching label set on the LHS.

In effect, this behaves like "foo > min(x,y)" [although that is not valid PromQL]. This is why you'll always get an alert for any value over 0.7: if you set my_custom_threshold higher, so that the LHS doesn't give a result, the RHS fills in the gap.
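One way to get the intended "custom threshold if present, else 0.7" behaviour without building a default series is to remove, from the RHS of the `or`, every site that has a custom threshold. A sketch, assuming my_custom_threshold has a site label and at most one series per site:

```promql
# Sites that have a custom threshold: compare against it.
(
    avg by (site) (node_filesystem_used_bytes{mountpoint=~".*foo.*"})
  /
    avg by (site) (node_filesystem_size_bytes{mountpoint=~".*foo.*"})
  > on (site) my_custom_threshold
)
or
# All other sites: compare against 0.7. "unless on (site)" drops every site
# that has a my_custom_threshold series, so the RHS can no longer fill the gap.
(
    avg by (site) (node_filesystem_used_bytes{mountpoint=~".*foo.*"})
  /
    avg by (site) (node_filesystem_size_bytes{mountpoint=~".*foo.*"})
  > 0.7
  unless on (site) my_custom_threshold
)
```

With a threshold of 0.8 for a site, a ratio of 0.75 is dropped by the first branch and removed from the second by `unless`, so that site no longer appears.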

I find the expression browser in the Prometheus web interface very useful for graphing expressions and subexpressions. Once you realise that "foo > 0.7" is basically just the graph of foo, with gaps wherever its value drops to 0.7 or below, things suddenly become a lot clearer.
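For example, pasting these subexpressions one at a time into the expression browser's Graph tab shows each step separately. They use only the metrics from the original question.

```promql
# 1. The raw ratio per site: one continuous line per site.
avg by (site) (node_filesystem_used_bytes{mountpoint=~".*foo.*"})
  / avg by (site) (node_filesystem_size_bytes{mountpoint=~".*foo.*"})

# 2. The same ratio filtered by "> 0.7": the same lines, with gaps
#    wherever the value is 0.7 or below.
avg by (site) (node_filesystem_used_bytes{mountpoint=~".*foo.*"})
  / avg by (site) (node_filesystem_size_bytes{mountpoint=~".*foo.*"})
> 0.7

# 3. The custom threshold on its own, to see which sites have one
#    and which labels it carries.
my_custom_threshold
```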