blackbox exporter: instance down since what time


vinay Hegde

Dec 3, 2021, 2:43:00 AM
to Prometheus Users
Hi Team,
I need help getting an instance's status together with timing information when using the blackbox exporter.
I am using the ICMP module (IPv4) to get the instance up/down (1/0) status from the 'probe_success' metric, and that part works correctly.
However, when an instance is down, I also need to find out since what time, or for how long, it has been down.
I have not been able to work out the right query for this. Kindly help.

Below is the content of blackbox.yml:
modules:
  icmp:
    prober: icmp
    timeout: 5s
    icmp:
      preferred_ip_protocol: "ip4"

The query I have used is: probe_success{job="job_name"}

I need a similar query to get 'since when the system is down' when it is down.

Regards
Vinay Hegde

vinay Hegde

Dec 13, 2021, 3:12:42 AM
to Prometheus Users
Re-posting as I did not get any response.

Thanks & Regards
Vinay Hegde

ee1

Dec 22, 2021, 7:03:41 PM
to Prometheus Users
Here is one way. This query looks back 1h and returns the number of minutes the instance you care about was down during that hour.

((240 - sum_over_time(probe_success{instance="192.168.1.71"}[1h])) * 15) / 60

In the above case, Prometheus is polling every 15s, so every hour has 3600 / 15 = 240 samples. We add up all the 1s returned by the probe_success metric, take the difference from 240 to get the number of failed probes, and multiply by the 15s polling interval to get seconds of downtime. Then we divide by 60 to get minutes. Adjust the math to your desired "look back" interval and polling rate.
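
To sanity-check the constants before changing the look-back window or polling rate, the same arithmetic can be sketched in Python. The function name and parameters here are hypothetical, not part of Prometheus, and the sketch assumes no scrapes were missed, so the window holds exactly window / interval samples.

```python
def downtime_minutes(success_sum, window_seconds=3600, scrape_interval_seconds=15):
    """Estimate total downtime in minutes over a look-back window.

    success_sum stands in for the value of
    sum_over_time(probe_success[...]) over the same window.
    """
    # Number of samples expected in the window, e.g. 3600 / 15 = 240.
    expected_samples = window_seconds // scrape_interval_seconds
    # Every sample that was not a 1 is counted as a failed probe.
    failed_samples = expected_samples - success_sum
    # Failed probes * interval gives seconds; divide by 60 for minutes.
    return failed_samples * scrape_interval_seconds / 60


# 200 successful probes out of 240: 40 failures * 15s = 600s
print(downtime_minutes(200))  # 10.0
```

For a 30-minute window scraped every 30s, the call would be downtime_minutes(success_sum, 1800, 30), which corresponds to replacing 240 with 60 and 15 with 30 in the query.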

It's not perfect: if your instance flapped several times in that hour, this computes the total downtime in the window, not the time since it last went down and stayed down. It should be good enough, though.
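
The difference between total downtime and "down since" can be illustrated with a small Python sketch over a made-up series of probe_success samples (oldest first). The function name and the sample values are hypothetical and only serve to show the flapping case.

```python
def total_and_current_downtime(samples, scrape_interval_seconds=15):
    """Return (total downtime, current outage length) in seconds.

    samples is a list of probe_success values (1 = up, 0 = down),
    ordered from oldest to newest.
    """
    # Total downtime: every failed probe in the window counts.
    total_failed = sum(1 for s in samples if s == 0)

    # Current outage: only the unbroken run of failures at the end counts.
    trailing_failed = 0
    for s in reversed(samples):
        if s != 0:
            break
        trailing_failed += 1

    return (total_failed * scrape_interval_seconds,
            trailing_failed * scrape_interval_seconds)


# The instance flapped once, recovered, then went down and stayed down.
samples = [1, 0, 1, 1, 0, 0, 0]
print(total_and_current_downtime(samples))  # (60, 45)
```

Here the sum-based approach reports 60s of downtime, while the instance has actually been continuously down for 45s.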

vinay Hegde

Jul 13, 2022, 3:35:37 AM
to Prometheus Users
Thank you.