Hi all.
A few months ago we introduced target-down rules to keep track of targets that go missing. The rules are relatively simple, e.g.:
- alert: target_down_slower_scraping_jobs
  expr: up{job=~"monitoring-scripts-5m|monitoring-scripts-hourly"} == 0
  for: 13m
  labels:
    severity: average
  annotations:
    # annotations here
A few days ago we decided to introduce absence rules, and we added them for both targets and metrics. That works, but it has a side effect we didn't consider: a metric-absent alert will, of course, also fire when the corresponding target is down. Looking into it, I found this blog post proposing to use the unless binary operator, but I'm not sure I've understood its usage and implications.
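For reference, an absence rule of the kind described above might look roughly like the sketch below. The alert name, the for duration and the severity are assumptions; the selector is the placeholder one used later in this post. Note that absent() returns a single series whose labels are taken from the equality matchers in the selector (here check, stack and environment), with no instance or job label, so it never shares a label set with up.

```yaml
# Hypothetical absence rule: name, duration and severity are placeholders.
- alert: check_success_absent
  # Returns {check="xxxxx",stack="yyy",environment="zzz"} = 1 when no
  # matching check_success series exists, and nothing otherwise.
  expr: absent(check_success{check="xxxxx",stack="yyy",environment="zzz"})
  for: 13m
  labels:
    severity: average
```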
As I understand it, unless returns the series of the first (left-hand) vector for which there is no series with a matching label set in the second. If I write something like
expr: up{job="node"} == 1 unless absent(check_success{check="xxxxx",stack="yyy",environment="zzz"})
I'm just going to get the up series back whenever everything is fine with the node. Isn't that wrong? That would fire an alert precisely because the node is up, which is not what we want. Even changing the comparison to == 0 would not solve the problem, since the absence would still be returned. Maybe change it to 0 and swap the two operands? But then wouldn't I get duplicate alerts for the target being down?
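A sketch of the swapped-operand variant, under the assumption that the check depends on the targets of job "node": put absent() on the left and the down targets on the right, matching on an empty label set. Since the two sides have no labels in common, plain unless would never match them; on() makes any series on the right suppress the left. The alert name and other fields are hypothetical.

```yaml
# Sketch only: alert on the missing metric unless some "node" target is down.
- alert: check_success_absent
  # on() compares an empty label set, so ANY series returned by the
  # right-hand side removes the absent() result, despite differing labels.
  expr: |
    absent(check_success{check="xxxxx",stack="yyy",environment="zzz"})
    unless on() (up{job="node"} == 0)
  for: 13m
  labels:
    severity: average
```

With this form, any single down target in the job suppresses the absence alert; narrowing the right-hand selector (for example with an instance matcher) would limit the suppression to the target the check actually depends on.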
Is there a way to make absence rules take down targets into account? Or should I approach the issue in some other way I'm not considering?
Thanks in advance,
F.