blackbox exporter: ICMP probes fail continually after short DNS outages, until the blackbox-exporter container is manually restarted

Tomáš Bartek

Apr 6, 2020, 9:31:20 AM

to Prometheus Users

Hi everybody.

Brian asked us to move this issue here, as this is a more appropriate place to discuss it.

We have the following issue with blackbox exporter.

We run blackbox-exporter inside a Docker container. Suddenly, without any changes on the host machine or in the container, the ping probe starts failing for one or more targets, while other targets remain OK.

But when I manually run the ping tool inside the Docker container and on the host OS outside the container, both succeed.

When we restart the Docker container, the issue disappears, but it occurs again after some time.

We experienced this behavior for two of our internal IP targets simultaneously (both in the same datacenter) and later for other public targets: 8.8.8.8 and 1.1.1.1.

I examined the problem with tcpdump, and it shows only request packets (no reply packets):

tcpdump -i eth0 -nn -s0 -X icmp and host 8.8.8.8
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
15:28:48.734661 IP 172.17.0.5 > 8.8.8.8: ICMP echo request, id 33313, seq 41979, length 36
	0x0000:  4500 0038 f40e 4000 4001 8a90 ac11 0005  E..8..@.@.......
	0x0010:  0808 0808 0800 7648 8221 a3fb 5072 6f6d  ......vH.!..Prom
	0x0020:  6574 6865 7573 2042 6c61 636b 626f 7820  etheus.Blackbox.
	0x0030:  4578 706f 7274 6572                      Exporter
15:28:48.977456 IP 172.17.0.5 > 8.8.8.8: ICMP echo request, id 33313, seq 41982, length 36
	0x0000:  4500 0038 f41d 4000 4001 8a81 ac11 0005  E..8..@.@.......
	0x0010:  0808 0808 0800 7645 8221 a3fe 5072 6f6d  ......vE.!..Prom
	0x0020:  6574 6865 7573 2042 6c61 636b 626f 7820  etheus.Blackbox.
	0x0030:  4578 706f 7274 6572                      Exporter

This is the tcpdump output when I start ping manually inside the container, alongside blackbox-exporter (the blackbox-exporter packets have id 33313):

root @ /
 [4] 🐳  →  tcpdump -i eth0 icmp and host 1.1.1.1
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
14:48:50.599214 IP a382643a1270 > one.one.one.one: ICMP echo request, id 33313, seq 50421, length 36
14:48:51.384085 IP a382643a1270 > one.one.one.one: ICMP echo request, id 35072, seq 5, length 64
14:48:51.392669 IP one.one.one.one > a382643a1270: ICMP echo reply, id 35072, seq 5, length 64
14:48:51.599289 IP a382643a1270 > one.one.one.one: ICMP echo request, id 33313, seq 50435, length 36
14:48:52.384292 IP a382643a1270 > one.one.one.one: ICMP echo request, id 35072, seq 6, length 64
14:48:52.393031 IP one.one.one.one > a382643a1270: ICMP echo reply, id 35072, seq 6, length 64
14:48:52.599559 IP a382643a1270 > one.one.one.one: ICMP echo request, id 33313, seq 50449, length 36
14:48:53.384517 IP a382643a1270 > one.one.one.one: ICMP echo request, id 35072, seq 7, length 64
14:48:53.396626 IP one.one.one.one > a382643a1270: ICMP echo reply, id 35072, seq 7, length 64
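
The pattern in the capture above (requests with id 33313 get no reply, requests with id 35072 do) can also be checked mechanically. The following is a minimal Python sketch, not part of blackbox exporter. The function name `unanswered_by_id` and the regular expression are assumptions, written against the tcpdump line format shown above.

```python
import re
from collections import defaultdict

# Matches the summary lines tcpdump prints for ICMP echo traffic, e.g.
# "... ICMP echo request, id 33313, seq 50421, length 36"
ECHO_RE = re.compile(r"ICMP echo (request|reply), id (\d+), seq (\d+)")


def unanswered_by_id(lines):
    """Return {icmp_id: sorted seq numbers of requests that got no reply}."""
    requests = defaultdict(set)
    replies = defaultdict(set)
    for line in lines:
        m = ECHO_RE.search(line)
        if not m:
            continue  # skip banners, hex dump rows, and other output
        kind, icmp_id, seq = m.group(1), int(m.group(2)), int(m.group(3))
        (requests if kind == "request" else replies)[icmp_id].add(seq)
    result = {}
    for icmp_id, seqs in requests.items():
        missing = seqs - replies[icmp_id]
        if missing:
            result[icmp_id] = sorted(missing)
    return result


# The nine summary lines from the capture above.
capture = """\
14:48:50.599214 IP a382643a1270 > one.one.one.one: ICMP echo request, id 33313, seq 50421, length 36
14:48:51.384085 IP a382643a1270 > one.one.one.one: ICMP echo request, id 35072, seq 5, length 64
14:48:51.392669 IP one.one.one.one > a382643a1270: ICMP echo reply, id 35072, seq 5, length 64
14:48:51.599289 IP a382643a1270 > one.one.one.one: ICMP echo request, id 33313, seq 50435, length 36
14:48:52.384292 IP a382643a1270 > one.one.one.one: ICMP echo request, id 35072, seq 6, length 64
14:48:52.393031 IP one.one.one.one > a382643a1270: ICMP echo reply, id 35072, seq 6, length 64
14:48:52.599559 IP a382643a1270 > one.one.one.one: ICMP echo request, id 33313, seq 50449, length 36
14:48:53.384517 IP a382643a1270 > one.one.one.one: ICMP echo request, id 35072, seq 7, length 64
14:48:53.396626 IP one.one.one.one > a382643a1270: ICMP echo reply, id 35072, seq 7, length 64
""".splitlines()

print(unanswered_by_id(capture))  # {33313: [50421, 50435, 50449]}
```

On a live capture, the most recent request may simply not have been answered yet, so the last sequence number of each id should be treated with care.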

I also checked whether the ID field in the IP header is zero-filled, as was discussed in a very similar issue (#360), but that is not the case for us.

The only correlation we found in Grafana is with very short connection outages from the blackbox-exporter machine to some of our internal DNS servers, which are monitored by the same blackbox-exporter (the spikes occur at the same time as the probes start failing).

I would dig deeper into what is going on, but I have no idea where to look now.

Do you have any suggestions for what else to check or how to debug it?

Kind regards,

Tomáš Bartek

Apr 7, 2020, 8:40:44 AM

to Prometheus Users

To be sure about our network, we checked the ICMP traffic on our border router (the last router between our intranet and the Internet).

It looks exactly the same as on the blackbox exporter machine: ICMP request packets are going out, but no reply packets are coming back (with the exception of a manually started ping to 8.8.8.8).

It seems to me that the blackbox exporter request packets are dropped somewhere on the Internet, and that this is done on the basis of their content, because a standard ping is not filtered out.

Any hint what could be wrong?
Thanks in advance.

Ciao, Tom.

On Monday, 6 April 2020 15:31:20 UTC+2, Tomáš Bartek wrote:

Brian Candler

Apr 7, 2020, 8:58:22 AM

to Prometheus Users

In your first example, which shows a failing blackbox_exporter ping in hex: can you show this both when it's working and after it fails? It might show something different in the packets being sent out.

And in your second example (ping from the command line working, whilst blackbox_exporter ping not working), can you show the full packet in hex too?

Brian Candler

Apr 7, 2020, 9:24:08 AM

to Prometheus Users

15:28:48.734661 IP 172.17.0.5 > 8.8.8.8: ICMP echo request, id 33313, seq 41979, length 36
	0x0000:  4500 0038 f40e 4000 4001 8a90 ac11 0005  E..8..@.@.......
	0x0010:  0808 0808 0800 7648 8221 a3fb 5072 6f6d  ......vH.!..Prom
	0x0020:  6574 6865 7573 2042 6c61 636b 626f 7820  etheus.Blackbox.
	0x0030:  4578 706f 7274 6572                      Exporter

FWIW, I've recalculated the IP and ICMP checksums, and both appear to be correct.
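
For anyone who wants to repeat that recalculation, here is a minimal Python sketch. It assumes the standard Internet checksum (RFC 1071) and the `tcpdump -X` row layout shown above. The helper names `parse_tcpdump_hex` and `internet_checksum` are made up for this example.

```python
import re

# One row of `tcpdump -X` output: offset, hex words, then the ASCII column.
ROW_RE = re.compile(r"^\s*0x[0-9a-f]{4}:\s+((?:[0-9a-f]{2,4} )*[0-9a-f]{2,4})")

DUMP = """\
15:28:48.734661 IP 172.17.0.5 > 8.8.8.8: ICMP echo request, id 33313, seq 41979, length 36
    0x0000:  4500 0038 f40e 4000 4001 8a90 ac11 0005  E..8..@.@.......
    0x0010:  0808 0808 0800 7648 8221 a3fb 5072 6f6d  ......vH.!..Prom
    0x0020:  6574 6865 7573 2042 6c61 636b 626f 7820  etheus.Blackbox.
    0x0030:  4578 706f 7274 6572                      Exporter
"""


def parse_tcpdump_hex(dump):
    """Collect the hex column of the dump rows into raw packet bytes."""
    out = bytearray()
    for row in dump.splitlines():
        m = ROW_RE.match(row)
        if m:  # summary lines do not match and are skipped
            out += bytes.fromhex(m.group(1).replace(" ", ""))
    return bytes(out)


def internet_checksum(data):
    """Ones' complement of the ones' complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = sum(int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


packet = parse_tcpdump_hex(DUMP)
header_len = (packet[0] & 0x0F) * 4  # IHL is counted in 32-bit words
ip_header, icmp = packet[:header_len], packet[header_len:]

# Data that already carries a correct checksum yields 0 when checked this way.
print("IP header checksum valid:", internet_checksum(ip_header) == 0)
print("ICMP checksum valid:", internet_checksum(icmp) == 0)
```

Running the check over a block that includes its own checksum field avoids having to zero the field first: a valid block gives 0.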

You're definitely sure the packets are leaving the machine? Can you try a blackbox test of some remote host or VM on the Internet that you control, and run tcpdump there to check for incoming ICMP?

Tomáš Bartek

Apr 7, 2020, 10:53:24 AM

to Prometheus Users

Hi Brian,

Sure, here is the hex output with ping and the blackbox exporter running simultaneously:

sudo tcpdump -i ens160 -n -X icmp and host 8.8.8.8
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ens160, link-type EN10MB (Ethernet), capture size 262144 bytes
15:51:57.237795 IP 192.168.80.21 > 8.8.8.8: ICMP echo request, id 33313, seq 2136, length 36
	0x0000:  4500 0038 ea2e 4000 3f01 30c9 c0a8 5015  E..8..@.?.0...P.
	0x0010:  0808 0808 0800 11ec 8221 0858 5072 6f6d  .........!.XProm
	0x0020:  6574 6865 7573 2042 6c61 636b 626f 7820  etheus.Blackbox.
	0x0030:  4578 706f 7274 6572                      Exporter
15:51:57.396007 IP 192.168.80.21 > 8.8.8.8: ICMP echo request, id 16978, seq 42826, length 64
	0x0000:  4500 0054 2aa5 4000 4001 ef36 c0a8 5015  E..T*.@.@..6..P.
	0x0010:  0808 0808 0800 8da1 4252 a74a 7d85 8c5e  ........BR.J}..^
	0x0020:  0000 0000 b20a 0600 0000 0000 1011 1213  ................
	0x0030:  1415 1617 1819 1a1b 1c1d 1e1f 2021 2223  .............!"#
	0x0040:  2425 2627 2829 2a2b 2c2d 2e2f 3031 3233  $%&'()*+,-./0123
	0x0050:  3435 3637                                4567
15:51:57.398327 IP 8.8.8.8 > 192.168.80.21: ICMP echo reply, id 16978, seq 42826, length 64
	0x0000:  4520 0054 0000 0000 3401 65bc 0808 0808  E..T....4.e.....
	0x0010:  c0a8 5015 0000 95a1 4252 a74a 7d85 8c5e  ..P.....BR.J}..^
	0x0020:  0000 0000 b20a 0600 0000 0000 1011 1213  ................
	0x0030:  1415 1617 1819 1a1b 1c1d 1e1f 2021 2223  .............!"#
	0x0040:  2425 2627 2829 2a2b 2c2d 2e2f 3031 3233  $%&'()*+,-./0123
	0x0050:  3435 3637                                4567
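
To compare what the two senders put on the wire, the ICMP part of each request above can be decoded field by field. This is a small Python sketch, assuming the usual 8-byte ICMP echo header (type, code, checksum, identifier, sequence number). `decode_icmp_echo` is a hypothetical helper, and the byte strings are copied from the dump above.

```python
import struct


def decode_icmp_echo(icmp):
    """Split an ICMP echo message into its header fields and payload."""
    icmp_type, code, checksum, ident, seq = struct.unpack("!BBHHH", icmp[:8])
    return {
        "type": icmp_type,
        "code": code,
        "checksum": checksum,
        "id": ident,
        "seq": seq,
        "payload": icmp[8:],
    }


# ICMP part (everything after the 20-byte IP header) of the two requests.
blackbox = bytes.fromhex("080011ec82210858") + b"Prometheus Blackbox Exporter"
ping = (
    bytes.fromhex("08008da14252a74a")
    # First 16 payload bytes as captured (assumed to be ping's own timestamp).
    + bytes.fromhex("7d858c5e00000000b20a060000000000")
    # Remaining bytes are the counting pattern 0x10..0x37 seen in the dump.
    + bytes(range(0x10, 0x38))
)

for name, raw in (("blackbox", blackbox), ("ping", ping)):
    msg = decode_icmp_echo(raw)
    print(name, "id", msg["id"], "seq", msg["seq"], "payload bytes", len(msg["payload"]))
```

Apart from the identifier, sequence number, and checksum, the two requests differ in payload length and payload content.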

To get a tcpdump of packets from before the blackbox exporter ping probe fails, I have to restart the blackbox exporter container and wait for some time (maybe hours) until the probe starts failing again (the only coincidence at those times is short spikes of DNS outages in our network).

For now, I have created a droplet on DigitalOcean and added its IP address as a blackbox exporter probe target. The probe is working OK now and, I expect, will fail after some time like the others. Here are the tcpdumps:

tcpdump on blackbox exporter machine:

sudo tcpdump -i ens160 -n -X icmp and host 104.248.242.37
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ens160, link-type EN10MB (Ethernet), capture size 262144 bytes
16:37:30.997577 IP 192.168.80.21 > 104.248.242.37: ICMP echo request, id 33313, seq 7358, length 36
	0x0000:  4500 0038 7e9d 4000 3f01 514c c0a8 5015  E..8~.@.?.QL..P.
	0x0010:  68f8 f225 0800 fd85 8221 1cbe 5072 6f6d  h..%.....!..Prom
	0x0020:  6574 6865 7573 2042 6c61 636b 626f 7820  etheus.Blackbox.
	0x0030:  4578 706f 7274 6572                      Exporter
16:37:31.013151 IP 104.248.242.37 > 192.168.80.21: ICMP echo reply, id 33313, seq 7358, length 36
	0x0000:  4520 0038 a8d3 0000 2f01 76f6 68f8 f225  E..8..../.v.h..%
	0x0010:  c0a8 5015 0000 0586 8221 1cbe 5072 6f6d  ..P......!..Prom
	0x0020:  6574 6865 7573 2042 6c61 636b 626f 7820  etheus.Blackbox.
	0x0030:  4578 706f 7274 6572                      Exporter

tcpdump on remote machine (DigitalOcean):

root@ubuntu-s-1vcpu-1gb-fra1-01:~# tcpdump -i any -X icmp 
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked), capture size 262144 bytes
14:38:51.013616 IP 109.183.26.126 > ubuntu-s-1vcpu-1gb-fra1-01: ICMP echo request, id 33313, seq 7518, length 36
	0x0000:  4500 0038 a44c 4000 3001 c325 6db7 1a7e  E..8.L@.0..%m..~
	0x0010:  68f8 f225 0800 fce5 8221 1d5e 5072 6f6d  h..%.....!.^Prom
	0x0020:  6574 6865 7573 2042 6c61 636b 626f 7820  etheus.Blackbox.
	0x0030:  4578 706f 7274 6572                      Exporter
14:38:51.013648 IP ubuntu-s-1vcpu-1gb-fra1-01 > 109.183.26.126: ICMP echo reply, id 33313, seq 7518, length 36
	0x0000:  4500 0038 d6c0 0000 4001 c0b1 68f8 f225  E..8....@...h..%
	0x0010:  6db7 1a7e 0000 04e6 8221 1d5e 5072 6f6d  m..~.....!.^Prom
	0x0020:  6574 6865 7573 2042 6c61 636b 626f 7820  etheus.Blackbox.
	0x0030:  4578 706f 7274 6572                      Exporter

Thank you very much for the advice.

I will post the remaining output when the new probe starts to fail.

Regards,

Tom

On Tuesday, 7 April 2020 15:24:08 UTC+2, Brian Candler wrote:

Brian Candler

Apr 7, 2020, 11:08:24 AM

to Prometheus Users

On Tuesday, 7 April 2020 15:53:24 UTC+1, Tomáš Bartek wrote:

sudo tcpdump -i ens160 -n -X icmp and host 8.8.8.8
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ens160, link-type EN10MB (Ethernet), capture size 262144 bytes
15:51:57.237795 IP 192.168.80.21 > 8.8.8.8: ICMP echo request, id 33313, seq 2136, length 36
	0x0000:  4500 0038 ea2e 4000 3f01 30c9 c0a8 5015  E..8..@.?.0...P.
	0x0010:  0808 0808 0800 11ec 8221 0858 5072 6f6d  .........!.XProm
	0x0020:  6574 6865 7573 2042 6c61 636b 626f 7820  etheus.Blackbox.
	0x0030:  4578 706f 7274 6572                      Exporter
15:51:57.396007 IP 192.168.80.21 > 8.8.8.8: ICMP echo request, id 16978, seq 42826, length 64
	0x0000:  4500 0054 2aa5 4000 4001 ef36 c0a8 5015  E..T*.@.@..6..P.
	0x0010:  0808 0808 0800 8da1 4252 a74a 7d85 8c5e  ........BR.J}..^
	0x0020:  0000 0000 b20a 0600 0000 0000 1011 1213  ................
	0x0030:  1415 1617 1819 1a1b 1c1d 1e1f 2021 2223  .............!"#
	0x0040:  2425 2627 2829 2a2b 2c2d 2e2f 3031 3233  $%&'()*+,-./0123
	0x0050:  3435 3637                                4567

4500 = IP version 4, header length 5 words (20 bytes), TOS 0
0038 = total length (56 bytes)
ea2e = identification
4000 = flags (DF), zero fragment offset
3f01 = TTL 63, protocol 1 (ICMP)
30c9 = header checksum
c0 a8 50 15 = source IP (192.168.80.21)
08 08 08 08 = dest IP (8.8.8.8)
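
The same field-by-field reading can be reproduced with a few lines of Python. This is only a sketch, assuming a 20-byte IPv4 header without options. `decode_ipv4_header` is a hypothetical helper name, and the hex string is the first 20 bytes of the blackbox exporter packet quoted above.

```python
import socket
import struct

# First 20 bytes of the blackbox exporter request quoted above.
HEADER = bytes.fromhex("45000038ea2e40003f0130c9c0a8501508080808")


def decode_ipv4_header(raw):
    """Decode a 20-byte IPv4 header (no options) into a dict of fields."""
    (ver_ihl, tos, total_len, ident, flags_frag,
     ttl, proto, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", raw[:20])
    return {
        "version": ver_ihl >> 4,
        "header_len": (ver_ihl & 0x0F) * 4,  # IHL is in 32-bit words
        "tos": tos,
        "total_len": total_len,
        "id": ident,
        "dont_fragment": bool(flags_frag & 0x4000),
        "more_fragments": bool(flags_frag & 0x2000),
        "frag_offset": flags_frag & 0x1FFF,
        "ttl": ttl,
        "protocol": proto,
        "checksum": checksum,
        "src": socket.inet_ntoa(src),
        "dst": socket.inet_ntoa(dst),
    }


for field, value in decode_ipv4_header(HEADER).items():
    print(f"{field:15} {value}")
```

A nonzero `id` here matches the earlier observation that the zero-filled ID problem from issue #360 does not apply to these requests.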

Looks sound to me, but let's see what happens now that you have the receiver on the droplet, when it next fails.
