blackbox exporter ICMP probe fails for instances on another network interface


Yashar Nesabian

May 5, 2020, 2:24:43 PM
to Prometheus Users
We want to health-check our servers with the Blackbox exporter's ICMP probe. The Blackbox exporter server and the Prometheus server both have two interfaces:
172.16.76.0/22
172.20.0.0/22

The problem is that the Blackbox exporter can ping servers whose network interface is on 172.16.76.0/22, but it cannot ping servers whose network interface is on 172.20.0.0/22. However, I can ping those servers manually from both the Blackbox exporter host and the Prometheus server:
PING 172.20.3.29 (172.20.3.29) 56(84) bytes of data.
64 bytes from 172.20.3.29: icmp_seq=1 ttl=63 time=0.187 ms
64 bytes from 172.20.3.29: icmp_seq=2 ttl=63 time=0.294 ms
64 bytes from 172.20.3.29: icmp_seq=3 ttl=63 time=0.218 ms
64 bytes from 172.20.3.29: icmp_seq=4 ttl=63 time=0.211 ms
^C
--- 172.20.3.29 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3002ms
rtt min/avg/max/mdev = 0.187/0.227/0.294/0.042 ms
Here is some additional information:
Kernel on the Blackbox exporter host and the Prometheus server:
4.4.0-176-generic #206-Ubuntu SMP Fri Feb 28 05:02:04 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

Blackbox exporter version:
blackbox_exporter, version 0.16.0 (branch: HEAD, revision: 991f898)
build user: root@64f600555645
build date: 20191111-16:27:24
go version: go1.13.4


blackbox.yml module config:
  server_health_check:
    prober: icmp
    timeout: 15s
    icmp:
      preferred_ip_protocol: "ip4"
      source_ip_address: "172.16.76.147"


prometheus.yml scrape config:
  - job_name: blackbox_healthcheck
    scrape_interval: 30s
    scrape_timeout: 15s
    metrics_path: /probe
    params:
      module: [server_health_check]
    file_sd_configs:
      - files:
        - 'file_sd/opennebula_vms.yml'
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: plugins-01.tool.x.y.z:9115
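To make the relabeling above concrete, here is a simplified Python model of what the three relabel_configs rules do to one file_sd target, and of the probe URL Prometheus ends up scraping. It is a sketch, not Prometheus's actual implementation: it models only the plain copy/replace behavior these particular rules rely on and assumes the default http scheme. The function names relabel and probe_url are hypothetical.

```python
from urllib.parse import urlencode

# Values copied from the scrape config above.
EXPORTER = "plugins-01.tool.x.y.z:9115"
METRICS_PATH = "/probe"
PARAMS = {"module": ["server_health_check"]}


def relabel(target_labels):
    """Apply the job's three relabel rules to one file_sd target (simplified)."""
    labels = dict(target_labels)
    # Rule 1: copy __address__ into __param_target (becomes ?target=...).
    labels["__param_target"] = labels["__address__"]
    # Rule 2: copy __param_target into the instance label.
    labels["instance"] = labels["__param_target"]
    # Rule 3: point __address__ at the Blackbox exporter itself.
    labels["__address__"] = EXPORTER
    return labels


def probe_url(labels):
    """Build the scrape URL: static params plus any __param_* labels."""
    query = {name: values[0] for name, values in PARAMS.items()}
    for key, value in labels.items():
        if key.startswith("__param_"):
            query[key[len("__param_"):]] = value
    return f"http://{labels['__address__']}{METRICS_PATH}?{urlencode(query)}"


labels = relabel({"__address__": "172.20.3.29"})
print(labels["instance"])
print(probe_url(labels))
```

So each target in opennebula_vms.yml keeps its own address as the instance label, while the actual HTTP request goes to the exporter with the target passed as a query parameter.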

Logs for the probe, after adding &debug=true to the probe URL:
ts=2020-05-05T16:07:39.322818925Z caller=main.go:304 module=server_health_check target=172.20.3.29 level=info msg="Beginning probe" probe=icmp timeout_seconds=15
ts=2020-05-05T16:07:39.322941208Z caller=icmp.go:82 module=server_health_check target=172.20.3.29 level=info msg="Resolving target address" ip_protocol=ip4
ts=2020-05-05T16:07:39.322962788Z caller=icmp.go:82 module=server_health_check target=172.20.3.29 level=info msg="Resolved target address" ip=172.20.3.29
ts=2020-05-05T16:07:39.322975038Z caller=main.go:119 module=server_health_check target=172.20.3.29 level=info msg="Using source address" srcIP=172.16.76.147
ts=2020-05-05T16:07:39.322990936Z caller=main.go:119 module=server_health_check target=172.20.3.29 level=info msg="Creating socket"
ts=2020-05-05T16:07:39.323043107Z caller=main.go:119 module=server_health_check target=172.20.3.29 level=info msg="Creating ICMP packet" seq=33370 id=29272
ts=2020-05-05T16:07:39.323060805Z caller=main.go:119 module=server_health_check target=172.20.3.29 level=info msg="Writing out packet"
ts=2020-05-05T16:07:39.323151009Z caller=main.go:119 module=server_health_check target=172.20.3.29 level=info msg="Waiting for reply packets"
ts=2020-05-05T16:07:54.322978818Z caller=main.go:119 module=server_health_check target=172.20.3.29 level=warn msg="Timeout reading from socket" err="read ip4 172.16.76.147: i/o timeout"
ts=2020-05-05T16:07:54.323073248Z caller=main.go:304 module=server_health_check target=172.20.3.29 level=error msg="Probe failed" duration_seconds=15.000189909
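The log shows the probe sending from the forced source address 172.16.76.147. One way to compare that with the address the kernel would pick on its own is to connect() a UDP socket to the target and read the local address back with getsockname(); connecting a UDP socket only selects a route, it does not send a packet. This is a minimal diagnostic sketch, assuming it runs on the Blackbox exporter host; kernel_source_ip is a hypothetical helper, not part of the exporter.

```python
import socket


def kernel_source_ip(target, port=9):
    """Return the local IPv4 address the kernel's routing picks for target.

    connect() on a UDP socket sends nothing; it only makes the kernel
    select a route and bind a local address, which getsockname() reports.
    The port is arbitrary because no datagram is ever written.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.connect((target, port))
        return s.getsockname()[0]


def main():
    configured = "172.16.76.147"  # source_ip_address from blackbox.yml
    target = "172.20.3.29"        # the target whose probe times out
    try:
        chosen = kernel_source_ip(target)
    except OSError as err:  # e.g. no route to the target at all
        print(f"no route to {target}: {err}")
        return
    print(f"kernel would use {chosen}; blackbox.yml forces {configured}")


main()
```

If the printed address differs from 172.16.76.147, the configured source_ip_address is overriding the source the kernel would otherwise use for that target.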

What I get when I probe an instance on the 172.16.76.0/22 subnet:
probe_success 1

What I get when I probe an instance on the 172.20.0.0/22 subnet:
probe_success 0


Brian Candler

May 5, 2020, 5:13:37 PM
to Prometheus Users
I notice you have this in your blackbox config: source_ip_address: "172.16.76.147"

I think this is likely the problem, and I expect you will find the same issue with ping. That is, if you do

    ping -I 172.16.76.147 172.20.3.29

(that's a capital 'i', for source interface), I'd guess you don't get a response.

The solution is most likely to remove the source_ip_address setting.
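That a single fixed source address affects only one group of targets is consistent with the addressing, which Python's ipaddress module can check: 172.16.76.147 belongs to 172.16.76.0/22, while the failing target 172.20.3.29 belongs to 172.20.0.0/22. The sketch below assumes a hypothetical address, 172.20.0.10, for the exporter host's second interface (the thread does not give it); the helper same_subnet_source and the sample targets 172.16.77.5 and 10.0.0.1 are likewise hypothetical. A result of None corresponds to leaving source_ip_address unset and letting the kernel choose.

```python
import ipaddress

# Interface addresses of the Blackbox exporter host, with prefix length.
# 172.16.76.147/22 comes from blackbox.yml; 172.20.0.10/22 is a HYPOTHETICAL
# placeholder for the host's address on the second interface.
LOCAL_INTERFACES = [
    ipaddress.ip_interface("172.16.76.147/22"),
    ipaddress.ip_interface("172.20.0.10/22"),
]


def same_subnet_source(target):
    """Return the local address whose subnet contains target, or None."""
    addr = ipaddress.ip_address(target)
    for iface in LOCAL_INTERFACES:
        if addr in iface.network:
            return str(iface.ip)
    return None


# Hypothetical sample targets: one per /22, plus one outside both.
for target in ["172.16.77.5", "172.20.3.29", "10.0.0.1"]:
    print(target, "->", same_subnet_source(target) or "let the kernel choose")
```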