probe_success == 0 for more than 24 hours (PromQL)


Amit Das

Mar 16, 2021, 3:59:50 PM
to Prometheus Users
Hi,
I would like to get alerts if my blackbox exporter targets are down for more than 24hrs.

I have used
"for 24h"
which does not seem to be correct. I am looking for an alert rule that fires when

probe_success{job="abc"} == 0 for more than 24h.

Can someone correct the syntax?

Thanks

Matthias Rampke

Mar 16, 2021, 6:55:31 PM
to Amit Das, Prometheus Users
Use the fact that probe_success is a number (1 when the probe succeeds, 0 when it fails), so you can take the maximum of all its values over the last 24 hours:

max_over_time(probe_success{job="abc"}[24h]) == 0

No "for" clause is needed in this case.
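Because each probe_success sample is either 1 or 0, max_over_time over a window is 0 only if every sample in that window was 0, i.e. the target failed continuously for the whole window. A minimal sketch, assuming the job label is "abc" as above; the 1h window is only an illustrative value for checking the expression quickly before switching back to 24h:

```promql
# One series per target whose probes ALL failed during the last hour.
# A single successful probe in the window raises the maximum to 1,
# so that target is filtered out by "== 0".
max_over_time(probe_success{job="abc"}[1h]) == 0
```

One consequence of this approach: after a target first goes down, it only appears in the result once the entire window is covered by failed probes, so with [24h] the alert fires roughly 24 hours after the outage began.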

/MR


Amit Das

Mar 17, 2021, 1:30:24 PM
to Prometheus Users
Hi, thank you for your reply.

I tried this query in the alerting rules as well as in the Prometheus UI (the Alerts page on port 9090).
No alert fired, even after I changed it to max_over_time(probe_success{job="abc"}[5m]) == 0.

Other alerts, such as slow probes, are firing. I am not sure I used the correct query. There are instances that have been down for more than 24 hours, so in my case an alert should have fired.

Will the alert wait until 24 hours have passed and then fire?

Matthias Rampke

Mar 17, 2021, 6:25:13 PM
to Amit Das, Prometheus Users
Try the query interactively (in the expression browser or Grafana's Explore). Strip out parts like the == 0 filter and the max_over_time to understand the shape of the data. You may have to vary exactly how you query.
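For example, the expression from this thread can be taken apart into the queries below, each run on its own in the expression browser, building up from the raw data to the full alert condition. This is a sketch that assumes the job label is really "abc"; if the second query returns nothing, the selector matches no series and the alert cannot fire either.

```promql
# Run each expression separately, one at a time.

# 1. Which job labels actually carry probe_success, and how many series each has.
count by (job) (probe_success)

# 2. Raw series for the job: 1 = most recent probe succeeded, 0 = it failed.
probe_success{job="abc"}

# 3. Maximum over the window: 0 only for targets that failed every probe in it.
max_over_time(probe_success{job="abc"}[24h])

# 4. Full alert condition: keep only the targets whose maximum is 0.
max_over_time(probe_success{job="abc"}[24h]) == 0
```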

My main point is that you can use the fact that the metric value is a number, and do calculations on it. We often use avg_over_time to alert when (roughly) a certain fraction of probes fails.
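As a sketch of that pattern: since each probe contributes a 1 or a 0, avg_over_time over a window gives (roughly, assuming evenly spaced probes) the fraction of probes that succeeded in it. The job label, the 1h window, and the 0.5 threshold below are illustrative assumptions, not values from this thread:

```promql
# Fraction of successful probes per target over the last hour (0.0 to 1.0);
# keep only the targets where fewer than half of the probes succeeded.
avg_over_time(probe_success{job="abc"}[1h]) < 0.5
```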

/MR
