Null value in alerts

Sebastian Glock

Dec 9, 2022, 2:31:32 AM

to Prometheus Users

Hi,

I'm having trouble setting up an alert that sends a notification when a value is different from 0 or when the value is missing (i.e. null).

expression:

windows_mscluster_resourcegroup_state {name!~"Available Storage"} != 0 or on() vector(0)

The alert goes off non-stop. How can I set the metric to send an alert when the value is different from 0 and is null?

I also tried sum(), but that doesn't work either:

sum(windows_mscluster_resourcegroup_state {name!~"Available Storage"} != 0) or on() vector(0)

Thanks for replies!

sebag...@gmail.com

Dec 9, 2022, 3:49:48 AM

to Matthias Rampke, Prometheus Users

Thanks for the advice.

So in this case I just need to use absent() in the alert, like this?

- alert: Resource group in cluster is down
  expr: absent(windows_mscluster_resourcegroup_state {name!~"Available Storage"}) == 1
  for: 10s
  labels:
    severity: "[Cluster]"
  annotations:
    summary: "Resource group in cluster is down!"
    description: "{{ humanize $value }}"

Will this one send a message when the metric is missing?

From: Matthias Rampke <matt...@prometheus.io>
Sent: Friday, December 9, 2022 8:57 AM
To: Sebastian Glock <sebag...@gmail.com>
Cc: Prometheus Users <promethe...@googlegroups.com>
Subject: Re: [prometheus-users] Null value in alerts

When you say "the value is missing", what condition exactly do you want to alert on?

To detect that there is *no* metric matching your selector, you can use the absent(…) function. It returns 1 when … returns nothing.

It gets more complicated if you want to detect that a single series has disappeared. In this case, you need to be very specific in telling Prometheus which series *should* exist. Common ways to do this are:

- listing them all out with separate absent(x) clauses and specific positive matchers

- comparing to a previous time (x offset 15m unless x)

- using some other metric that lets you determine what should be there

- generating recording rules to create such a metric

The fundamental challenge here is to distinguish between "this went missing" and "this went away because of expected changes".

In general, I prefer splitting "metric indicates there is a problem" and "metric is missing" into two different alerts with separate names and descriptions. To the person investigating, the difference matters. Additionally, using absent() often results in different label sets, because it cannot know the labels of a time series that is absent. This causes trouble with templating, which you sidestep by using separate alert definitions to begin with.

/MR


Stuart Clark

Dec 9, 2022, 4:00:27 AM

to sebag...@gmail.com, Matthias Rampke, Prometheus Users

On 09/12/2022 08:49, sebag...@gmail.com wrote:

Thanks for the advice.

So in this case I just need to use absent() in the alert, like this?

- alert: Resource group in cluster is down

expr: absent(windows_mscluster_resourcegroup_state {name!~"Available Storage"}) == 1

You aren't selecting a specific series here, because you are using !~. With absent() you need to make sure you only use = matchers on the labels.

-- 
Stuart Clark

Brian Candler

Dec 9, 2022, 4:02:30 AM

to Prometheus Users

On Friday, 9 December 2022 at 07:31:32 UTC sebag...@gmail.com wrote:

expression:
windows_mscluster_resourcegroup_state {name!~"Available Storage"} != 0 or on() vector(0)

The alert goes off non-stop.

Yes, that's correct.

PromQL expressions don't work like normal boolean expressions. They return the presence or absence of values, not a true or false value. The presence of *any* value will trigger an alert, and vector(0) generates a value all of the time.

For example, suppose you have 5 timeseries for the metric "node_filesystem_avail_bytes".

The PromQL expression "node_filesystem_avail_bytes" returns an instant vector containing 5 values.

The PromQL expression "node_filesystem_avail_bytes < 10000000" returns an instant vector containing between 0 and 5 values; you have filtered down to just those timeseries whose values are less than the threshold.

If you use this as an alerting expression, then if the instant vector is not empty, i.e. if 1 or more machines have a value less than the threshold, then an alert is generated.
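As a rough sketch of what that looks like in a rule file (the group name, alert name, `for` duration and severity below are made up for illustration; only the expression comes from the example above):

```yaml
groups:
  - name: filesystem-example            # hypothetical group name
    rules:
      - alert: FilesystemLowOnSpace      # hypothetical alert name
        # The comparison acts as a filter: only series whose value is below
        # the threshold survive, and each surviving series becomes one alert.
        # If no series survives, the result is empty and nothing fires.
        expr: node_filesystem_avail_bytes < 10000000
        for: 5m                          # assumed hold time
        labels:
          severity: warning              # assumed severity label
        annotations:
          summary: "Low free space on {{ $labels.instance }} ({{ $labels.mountpoint }})"
          description: "{{ humanize1024 $value }}B available"
```

Note that there is no "else" branch anywhere: an empty result is simply the "everything is fine" case.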

How can I set the metric to send an alert when the value is different from 0 and is null?

There is no concept of "null" in PromQL. (Well, you can store a floating point value of "NaN" in a timeseries, but that's not what we're discussing here).

Either a timeseries is present, or it is not.

Hence I'm not really sure what you're trying to alert on. What do your metrics look like?

My guess is that they look something like this:

windows_mscluster_resourcegroup_state{instance="foo",name="Available Storage"} 123
windows_mscluster_resourcegroup_state{instance="foo",name="Broken Storage"} 0
windows_mscluster_resourcegroup_state{instance="bar",name="Available Storage"} 0
windows_mscluster_resourcegroup_state{instance="bar",name="Broken Storage"} 4

Now, this alerting expression:

windows_mscluster_resourcegroup_state {name!~"Available Storage"} != 0

will only alert on the last one of these: it first keeps only the series whose name label is not "Available Storage", then keeps only those whose value is not 0, and only the fourth series shown satisfies both conditions.

Similarly, "or" works differently to what you might expect.

foo or bar

will return a union of:

- all timeseries with metric name "foo", PLUS:

- all those timeseries with metric name "bar" which *don't* have exactly the same label sets as the timeseries on the LHS (foo)

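Since vector(0) has no labels and your expression uses on(), which matches on no labels at all, vector(0) is dropped whenever the LHS returns at least one series, and is included whenever the LHS returns nothing. Either way the result set is never empty, and therefore the expression will *always* generate alerts.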
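For the "value is different from 0" half of the question, one possible shape is simply the original expression without the vector(0) fallback. This is only a sketch: the group name, alert name and `for` duration are invented, and it assumes the series carry `name` and `instance` labels as in the guessed metrics above.

```yaml
groups:
  - name: mscluster-example              # hypothetical group name
    rules:
      - alert: ResourceGroupStateNonZero  # hypothetical alert name
        # No "or on() vector(0)" here, so an empty result means "no alert".
        # name!="..." is used instead of name!~"..." because the value is a
        # plain string rather than a regular expression.
        expr: windows_mscluster_resourcegroup_state{name!="Available Storage"} != 0
        for: 1m                           # assumed hold time
        annotations:
          summary: "Resource group {{ $labels.name }} on {{ $labels.instance }} is in state {{ $value }}"
```

This rule says nothing about missing series; that case needs its own expression.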

The question is, what sort of "missing" values do you want to look for?

For example, are you trying to alert on instance "baz", which doesn't generate *any* values for windows_mscluster_resourcegroup_state ? If so, you either need to alert explicitly on this absence, or you need to cross-reference to some other timeseries which refers to "baz" (such a timeseries is often "up"). Otherwise, the PromQL expression for windows_mscluster_resourcegroup_state has no way of knowing that you *expect* a value for baz, but there isn't one.

So one possibility is:

absent(windows_mscluster_resourcegroup_state{instance="baz",name="Available Storage"})

which will alert explicitly if there is no timeseries with that metric name and those particular labels. But you've hard-coded the existence of a machine called "baz" into your alerting rules.
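Wrapped into a rule, that could look roughly like the sketch below. The group name, alert name and `for` duration are assumptions; "baz" is the same made-up instance as in the text.

```yaml
groups:
  - name: mscluster-absent-example       # hypothetical group name
    rules:
      - alert: ResourceGroupMetricAbsent  # hypothetical alert name
        # absent() returns a single series with value 1 when the selector
        # matches nothing. Its labels are taken from the equality matchers,
        # so instance and name are available for templating here.
        expr: absent(windows_mscluster_resourcegroup_state{instance="baz", name="Available Storage"})
        for: 10m                          # assumed hold time
        annotations:
          summary: "No windows_mscluster_resourcegroup_state series for {{ $labels.name }} on {{ $labels.instance }}"
```

Labels that appear only in non-equality matchers (!=, =~, !~) are not copied to the result, which is one reason to stick to = inside absent().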

Or are you trying to alert on any node which is being scraped by scrape job "windows_exporter" but is not returning windows_mscluster_resourcegroup_state with a particular label? The "up" metric tells you whether something is being scraped, so the expression might be along the lines of "... or on (instance) up"
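One way to express that cross-reference is with "unless" rather than "or", as in the sketch below. It assumes the scrape job really is called "windows_exporter", and the resource group name "Cluster Group" is a placeholder, not something taken from your metrics; the group name, alert name and `for` duration are likewise invented.

```yaml
groups:
  - name: mscluster-missing-example          # hypothetical group name
    rules:
      - alert: ResourceGroupMetricNotExported # hypothetical alert name
        # Start from every target that is being scraped successfully, then
        # remove those instances that do export the expected series.
        # What is left are targets that are up but lack the metric.
        expr: |
          up{job="windows_exporter"} == 1
            unless on (instance)
          windows_mscluster_resourcegroup_state{name="Cluster Group"}
        for: 10m                              # assumed hold time
        annotations:
          summary: "{{ $labels.instance }} is up but exports no state for the expected resource group"
```

The result carries the labels of the "up" series, so only job and instance are available in the annotations.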

If you show the *actual* metrics you are scraping (including the full label sets), and an example of an *actual* condition you are trying to catch, then we can help you write the expression.

For more hints:

https://www.robustperception.io/absent-alerting-for-jobs/

https://www.robustperception.io/existential-issues-with-metrics/

https://www.robustperception.io/staleness-and-promql/

https://www.robustperception.io/functions-to-avoid/

Matthias Rampke

Dec 11, 2022, 12:50:24 PM

to Sebastian Glock, Prometheus Users

When you say "the value is missing", what condition exactly do you want to alert on?

To detect that there is *no* metric matching your selector, you can use the absent(…) function. It returns 1 when … returns nothing.

It gets more complicated if you want to detect that a single series has disappeared. In this case, you need to be very specific in telling Prometheus which series *should* exist. Common ways to do this are:

- listing them all out with separate absent(x) clauses and specific positive matchers

- comparing to a previous time (x offset 15m unless x)

- using some other metric that lets you determine what should be there

- generating recording rules to create such a metric
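The "comparing to a previous time" option from the list above might be sketched as follows for the metric in this thread. The group name, alert name and `for` duration are assumptions, and 15m is just the window from the example.

```yaml
groups:
  - name: mscluster-disappeared-example   # hypothetical group name
    rules:
      - alert: ResourceGroupSeriesVanished # hypothetical alert name
        # Returns every series that existed 15 minutes ago and has no series
        # with an identical label set right now.
        expr: |
          windows_mscluster_resourcegroup_state offset 15m
            unless
          windows_mscluster_resourcegroup_state
        for: 5m                            # assumed; must be shorter than the offset window
        annotations:
          summary: "Series for {{ $labels.name }} on {{ $labels.instance }} has disappeared"
```

A rule like this can only fire for roughly as long as the offset window, because after that the series is missing from the left-hand side as well. It also cannot tell an outage apart from a resource group that was removed on purpose.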

The fundamental challenge here is to distinguish between "this went missing" and "this went away because of expected changes".

In general, I prefer splitting "metric indicates there is a problem" and "metric is missing" into two different alerts with separate names and descriptions. To the person investigating, the difference matters. Additionally, using absent() often results in different label sets, because it cannot know the labels of a time series that is absent. This causes trouble with templating, which you sidestep by using separate alert definitions to begin with.

/MR


Yashaswini K

Dec 17, 2022, 4:05:20 AM

to Prometheus Users

Hi Team
