My Blackbox exporter is already running in debug log mode, and I still don't see any probe-failed logs for that period.



--
You received this message because you are subscribed to a topic in the Google Groups "Prometheus Users" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/prometheus-users/oKAPrUljkU0/unsubscribe.
To unsubscribe from this group and all its topics, send an email to prometheus-use...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-users/90269156-0db5-4c57-bcdf-7c1f310ad7b0o%40googlegroups.com.

What does min_over_time(up{job="Ping-All-Servers"}[5m]) show? In other words, is it the scrape to BBE which is failing, or the BBE probe? (I think the latter).
Try looking at scrape_duration_seconds{job="Ping-All-Servers"}. Maybe it's borderline to the scrape interval.
>> That's interesting. Here are the top 20 scrape_duration_seconds maxed for last 1 hour by instance. Close to 5 seconds. Can this lead to some issue?

icmp_prober:
  prober: icmp
  timeout: 30s
  icmp:
    preferred_ip_protocol: ip4

On Saturday, 7 November 2020 13:35:47 UTC, Yagyansh S. Kumar wrote:
Try looking at scrape_duration_seconds{job="Ping-All-Servers"}. Maybe it's borderline to the scrape interval.
>> That's interesting. Here are the top 20 scrape_duration_seconds maxed for last 1 hour by instance. Close to 5 seconds. Can this lead to some issue?

Possibly. Maybe the scrape timeout handling has changed slightly between those versions of Prometheus. I would in any case be concerned about the scrape duration being so close to the scrape interval, although failed scrapes should still show as "up == 0".
However, I note that the scrape.yml you posted shows the Ping-All-Servers job with a scrape interval of 10s, not 5s.
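For reference, here is a rough sketch of where the interval and timeout for such a job live in a Prometheus scrape config. It is not the scrape.yml from this thread: the target, the exporter address and the scrape_timeout value are assumptions for illustration, and the relabelling follows the usual Blackbox exporter pattern.

```yaml
scrape_configs:
  - job_name: Ping-All-Servers
    scrape_interval: 10s          # the value noted above for the posted scrape.yml
    scrape_timeout: 10s           # assumed; must not exceed scrape_interval
    metrics_path: /probe
    params:
      module: [icmp_prober]
    static_configs:
      - targets:
          - server1.example.com   # hypothetical target
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 127.0.0.1:9115   # assumed BBE address (9115 is its default port)
```

If scrape_duration_seconds regularly approaches the scrape_timeout in effect for the job, the scrape is close to being cut off.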


I also notice your module config has:

icmp_prober:
  prober: icmp
  timeout: 30s
  icmp:
    preferred_ip_protocol: ip4

I *think* the timeout is clipped to just under the scrape interval, so it should work, but I'd be inclined to set it lower anyway (say 3s); if you don't get a reply within 3s, you're unlikely to get one.
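A minimal sketch of the module with the lower timeout suggested above, assuming it sits under the usual top-level `modules:` key of the Blackbox exporter config. The 3s value is just the example figure from this message, not a tuned recommendation.

```yaml
modules:
  icmp_prober:
    prober: icmp
    timeout: 3s                   # lowered from 30s; example value only
    icmp:
      preferred_ip_protocol: ip4
```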
>> Yes, I agree. To eliminate its role, I'll change it. I also noticed that the effective timeout for my icmp module is 4.5 seconds (because scrape is 5s).
Since this test only does one ping, I would *expect* it to fail from time to time, and hence the alert to go into "pending" state until the "for: 1m" has run its course.
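To illustrate the "pending" behaviour, here is a sketch of what such a rule might look like. The actual rule is not reproduced in this message, so the group name, alert name and exact expression are assumptions; only the job name and the "for: 1m" come from the thread.

```yaml
groups:
  - name: ping-alerts              # hypothetical group name
    rules:
      - alert: PingProbeFailed     # hypothetical alert name
        expr: probe_success{job="Ping-All-Servers"} == 0
        for: 1m                    # alert is "pending" until the condition has held for 1m
```

With a rule of this shape, a single failed ping puts the alert into "pending", and it only fires if probe_success stays at 0 on every evaluation for the full minute.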
I don't think it's a false alert. If it's the rule you showed, then the only way you can get an alert is if the metric probe_success has value zero. You should try to understand why BBE is returning zero; if necessary use tcpdump or wireshark to capture the HTTP traffic to and from it.

But you also need to resolve the issue with BBE panicking. Does it log a backtrace when it does this? If so, showing the backtrace could help identify what's going on.
I'll try to get a backtrace and post it here. But the question still remains: if BBE is returning probe_success 0, why is it doing so only for 2.20.1 🙄.
>> Pretty scary result for 2.20.1.