Hello,
I am running 24 smokeping_prober instances in different parts of my datacenter, each located behind a different firewall. All 24 probers ping one specific target. The ping interval is 0.2s, so each prober sends 5 pings per second to the target, for a total of 5 x 24 = 120 pings per second.
The target reboots every night, and the reboot takes around 30s.
For some unknown reason, none of the 24 smokeping_prober instances has been able to reach this target since a reboot 14 days ago. They did not recover automatically. The only way to recover is to restart the smokeping_prober service.
On another smokeping_prober instance I ran tcpdump on the next-hop device and noticed that smokeping_prober is not sending any ICMP echo requests to this specific target. However, the same smokeping_prober instance pings other targets successfully, and this host (RedHat 8) answers ICMP requests from other smokeping_prober instances.
TL;DR:
- 1 target is pinged by 24 smokeping_prober 0.8.1 instances on RHEL 8
- The target reboots once a day and is down for around 30s
- Since 14 days ago, none of the 24 instances can reach this target anymore. tcpdump confirms they do NOT send any ICMP requests to this specific target anymore (other targets are unaffected)
- systemctl restart smokeping_prober.service recovers the instance and target can be reached.
Any ideas why, and how to investigate why, 1 target can no longer be reached by all 24 smokeping_prober instances at the same time?
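One check that might help narrow this down, independent of tcpdump, is to compare two scrapes of the prober's own metrics and see whether its per-target request counter still increases. The following is only a minimal sketch under assumptions: the metric name `smokeping_requests_total`, the `host` label, and the URL `http://localhost:9374/metrics` are what I believe this exporter uses by default and should be verified against the actual /metrics output; the function names are made up for illustration.

```python
import re
import urllib.request

# One Prometheus text-format sample line: name{labels} value [timestamp]
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
# One label pair inside the braces: key="value"
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def fetch(url="http://localhost:9374/metrics"):
    """Download the metrics page as text (URL and port are assumptions)."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.read().decode("utf-8")


def counter_by_host(metrics_text, metric="smokeping_requests_total", label="host"):
    """Return {label value: counter value} for one metric (names are assumptions)."""
    result = {}
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        m = SAMPLE_RE.match(line)
        if not m or m.group("name") != metric:
            continue
        labels = dict(LABEL_RE.findall(m.group("labels") or ""))
        if label in labels:
            result[labels[label]] = float(m.group("value"))
    return result


def stalled_targets(before, after):
    """Targets present in both scrapes whose counter did not increase."""
    return sorted(h for h, v in after.items() if h in before and v <= before[h])
```

Usage would be roughly: call `counter_by_host(fetch())`, wait a few seconds, call it again, and pass both results to `stalled_targets`. If the affected target shows up there while the others do not, that would suggest the prober itself stopped issuing requests for it, rather than the packets being dropped somewhere on the path.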
PS:
Of course, running "ping" on the RHEL 8 system where smokeping_prober is installed can reach the target. The issue is that smokeping_prober itself stopped sending pings to this target for some reason.