Monitoring >500 hosts ICMP

Muhammad Banyu Mubarok

Mar 9, 2022, 3:37:42 AM
to Prometheus Users
Hi,

I currently need to monitor hundreds of hosts with blackbox_exporter's ICMP module, with every target host listed via file_sd_configs.

But the result I get for almost every host is '0' / 'down', even though the host is up and running and responds to ping. Is there something wrong with my configuration?

Here is my config example.

prometheus config prometheus.yml:

scrape_configs:
  - job_name: "rtu_status"
    scrape_interval: 10m
    metrics_path: /probe
    params:
      module: [icmp_ipv4]
    file_sd_configs:
    - files:
      - 'ping/*.json'
    relabel_configs:
    - source_labels: [__address__]
      target_label: __param_target
    - source_labels: [__param_target]
      target_label: ip_rtu
    - target_label: __address__
      replacement: localhost:9115


target config hosts.json:
[
  {
    "targets": ["10.23.45.223"],
    "labels": {"rtu_sname": "RTU00-D1-ARK"}
  }
]

blackbox config blackbox.yml:
modules:
  icmp_ipv4:
    prober: icmp
    # timeout: 1s
    icmp:
      preferred_ip_protocol: ip4

blackbox daemon config:
[Unit]
Description=blackbox_exporter
Wants=network-online.target
After=network-online.target

[Service]
User=blackbox
AmbientCapabilities=CAP_NET_RAW
ExecStart=/opt/blackbox-exporter/blackbox_exporter --config.file=/opt/blackbox-exporter/blackbox.yml
StandardOutput=syslog
StandardError=syslog
LimitNOFILE=8192

Brian Candler

Mar 9, 2022, 11:49:57 AM
to Prometheus Users
A scrape interval of 10m is too long. The recommended maximum is 2m, because Prometheus considers timeseries which have not been updated for 5m to be stale. However, that would make the timeseries vanish, not report a status of 0, so it doesn't explain what you are seeing.

I suggest you take Prometheus out of the loop entirely, and just test blackbox_exporter by itself:
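Something along these lines should do; it's only a sketch, assuming the exporter listens on localhost:9115 and using the module name and target IP from your configs. The helper names probe_url and probe_once are made up for illustration and are not part of blackbox_exporter.

```shell
#!/bin/sh
# probe_url and probe_once are hypothetical helper names, not part of
# blackbox_exporter. Defaults are taken from the configs posted in this thread.

# Print the /probe URL for a given exporter address, module, and target.
probe_url() {
  exporter="${1:-localhost:9115}"
  module="${2:-icmp_ipv4}"
  target="${3:-10.23.45.223}"
  printf 'http://%s/probe?module=%s&target=%s\n' "$exporter" "$module" "$target"
}

# Ask the exporter directly (no Prometheus involved) and keep only the
# probe_success sample line: 1 means the probe succeeded, 0 means it failed.
probe_once() {
  curl -s "$(probe_url "$@")" | grep '^probe_success'
}
```

Run probe_once with no arguments to test the example host, or pass an exporter address, module, and target to test another one. Appending &debug=true to the URL should make the exporter include its probe log in the response, which can help when the result is 0.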


Does it return probe_success 0 or 1?  If 0, then look at tcpdump in another window:

tcpdump -i <devname> -nn -s0 -X host 10.23.45.223

This is a long shot, but are you running blackbox_exporter or the target host in a KVM virtual machine by any chance?  There are some cases where the virtual NIC doesn't do UDP or ICMP checksums properly, I think when connected to a virtual bridge.

Also, check these issues:
https://github.com/prometheus/blackbox_exporter/issues/360

BTW, it's perfectly OK to replace the "instance" label in cases like this, e.g.

relabel_configs:
    - source_labels: [__address__]
      target_label: __param_target
    - source_labels: [__address__]
      target_label: instance

    - target_label: __address__
      replacement: localhost:9115

If you don't set the instance label then it gets set to __address__, which here is always localhost:9115 (hence not very interesting).