Absence rules and target down


Federico Buti

Aug 5, 2020, 4:32:20 AM
to Prometheus Users
Hi all.

A few months ago we introduced target-down rules to keep track of targets that were missing. The rules are relatively simple, for example:

alert: target_down_slower_scraping_jobs
expr: up{job=~"monitoring-scripts-5m|monitoring-scripts-hourly"} == 0
for: 13m
labels:
  severity: average
annotations:
  # annotations here

A few days ago we wanted to introduce absence rules, and we added them for both targets and metrics. That works, but with a side effect we didn't consider: a metric-absent alert will of course also fire if the corresponding target is down. Looking into it, I found this blog post proposing the unless binary operator, but I'm not sure I've understood its usage and implications.

unless returns the series on its left-hand side unless there is a matching series on its right-hand side. If I write something like

expr: up{job="node"} == 1 unless absent(check_success{check="xxxxx",stack="yyy",environment="zzz"})

I'm just going to return the upness whenever everything is fine with the node. Isn't that wrong? I mean, that would result in an alert because the node is up, which is not what we want. Even changing the comparison to 0 would not solve the problem, since we would still return the absence. Maybe change it to 0 and swap the two operands? But then wouldn't I get duplicate alerts for the upness?

Is there a way to make sure absent rules take targets that are down into account? Or should I approach the issue in some other way that I'm not considering now?

Thanks in advance,
F.
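
For illustration, one way to read the "swap the two operands" idea from the question above is sketched below. It is only a sketch under assumptions, not the blog post's method: the alert name, the `for` duration and the severity are hypothetical, and `on()` is used because the result of absent() carries the check, stack and environment labels while up carries job and instance, so the two sides share no labels to match on.

```yaml
- alert: check_success_absent   # hypothetical alert name
  # Fires when the metric is missing, but is suppressed while any
  # target of the "node" job is reported down (up == 0).
  expr: |
    absent(check_success{check="xxxxx",stack="yyy",environment="zzz"})
      unless on() (up{job="node"} == 0)
  for: 13m              # assumed, copied from the target-down rule
  labels:
    severity: average   # assumed
```

With this form the existing target-down rule still reports the outage, so a down target should produce one alert rather than two. It does not cover a target that disappears from service discovery altogether, because then up has no series for it and the right-hand side of unless is empty.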

Brian Brazil

Aug 5, 2020, 4:46:59 AM
to Federico Buti, Prometheus Users
On Wed, 5 Aug 2020 at 09:32, Federico Buti <baca...@gmail.com> wrote:
A few days ago we wanted to introduce absence rules, and we added them for both targets and metrics. That works, but with a side effect we didn't consider: a metric-absent alert will of course also fire if the corresponding target is down.

That doesn't sound right. Make sure your absent rules are on "up".

Brian
 

Federico Buti

Aug 5, 2020, 4:56:20 AM
to Brian Brazil, Prometheus Users
Hi.

Thanks for the reply Brian.
So one should not alert on the absence of a metric? Never ever? Just on the upness of the targets?

---
Federico Buti

Brian Brazil

Aug 5, 2020, 5:02:44 AM
to Federico Buti, Prometheus Users
On Wed, 5 Aug 2020 at 09:56, Federico Buti <baca...@gmail.com> wrote:
So one should not alert on absence of a metric? Never ever? Just on the upness of the targets?

Generally you should alert on the absence of up, as that indicates something has either gone wrong with service discovery or the service has disappeared and no longer exists.

Alerting on absence of scraped metrics only really applies in niche cases where a target has a bug that'd result in it sometimes not exposing metrics. The blackbox exporter doesn't have such a bug that I'm aware of.

Brian
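
A minimal sketch of what alerting on the absence of up could look like, assuming one of the job names from the first message. The group name, alert name, `for` duration and severity are hypothetical. absent() returns a single series with value 1 when nothing matches the selector, and it copies labels only from equality matchers, so this sketch uses one rule per job instead of a regex matcher.

```yaml
groups:
  - name: target_absence   # hypothetical group name
    rules:
      - alert: target_absent_monitoring_scripts_5m   # hypothetical alert name
        # No up series at all for this job: service discovery found no
        # targets, or the service no longer exists.
        expr: absent(up{job="monitoring-scripts-5m"})
        for: 13m              # assumed, copied from the target-down rule
        labels:
          severity: average   # assumed
```

This complements the `up == 0` rule rather than replacing it: `up == 0` covers targets that are known but failing to scrape, while `absent(up{...})` covers the case where the job has no targets left at all.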

Federico Buti

Aug 5, 2020, 5:12:03 AM
to Brian Brazil, Prometheus Users
Hi again.

Well, thinking about it, that makes sense. I guess I'll revisit our setup in light of this aspect.
Thanks a ton!

---
Federico Buti
