Hello!
I'm setting up a K8s cluster to host several services for a project. The whole thing runs on a "one-box-wonder": a Rocky 9 server acting as a KVM hypervisor (diagram attached).
Each K8s VM has 2 network interfaces:
- eth0 - private NAT'ed vnet from the hypervisor configured to use 192.168.123.0/24
- eth1 - plumbed to the hypervisor's bridge interface (no IP configured, as I only have a /28 for this project)
The server is in a colo facility so I don't have access to the network equipment.
I have configured two IPAddressPools (both seem to be working as expected) and am using L2 advertisement.
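For readers who want to picture the setup, two pools with L2 advertisement in MetalLB's CRD-based configuration would look roughly like the sketch below. This is not a copy of the actual manifests: the resource names, the `metallb-system` namespace, the address slice taken from the private network, and the public range (a documentation prefix standing in for the real /28) are all assumptions made for illustration.

```yaml
# Illustrative sketch only: names, namespace, and ranges are placeholders.
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: private-pool            # hypothetical name
  namespace: metallb-system     # assumes the default install namespace
spec:
  addresses:
    - 192.168.123.200-192.168.123.220   # hypothetical slice of the private NAT'ed vnet
---
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: public-pool             # hypothetical name
  namespace: metallb-system
spec:
  addresses:
    - 203.0.113.16/28           # documentation range standing in for the real /28
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: l2-adv                  # hypothetical name
  namespace: metallb-system
spec:
  ipAddressPools:               # announce addresses from both pools via ARP
    - private-pool
    - public-pool
```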
I believe I can see the announcer responding as expected when I arping the LoadBalancer external IP:
[root@prod01 ~]# arping <ipaddr>
ARPING <ipaddr> from <ipaddr> bridge0
Unicast reply from <ipaddr> [52:54:00:CE:9B:A9] 0.723ms
Unicast reply from <ipaddr> [52:54:00:CE:9B:A9] 0.694ms
Unicast reply from <ipaddr> [52:54:00:CE:9B:A9] 0.685ms
Unicast reply from <ipaddr> [52:54:00:CE:9B:A9] 0.675ms
Unicast reply from <ipaddr> [52:54:00:CE:9B:A9] 0.688ms
Unicast reply from <ipaddr> [52:54:00:CE:9B:A9] 0.664ms
Sent 6 probes (1 broadcast(s))
Received 6 response(s)
Where [52:54:00:CE:9B:A9] is the MAC of the worker that is advertising.
Similarly, I can see the requests come in when I run tcpdump on the announcing node:
13:14:51.001486 ARP, Request who-has 66.85.73.221 (Broadcast) tell 66.85.73.210, length 28
13:14:52.001485 ARP, Request who-has 66.85.73.221 (52:54:00:ce:9b:a9) tell 66.85.73.210, length 28
13:14:53.001486 ARP, Request who-has 66.85.73.221 (52:54:00:ce:9b:a9) tell 66.85.73.210, length 28
13:14:54.001495 ARP, Request who-has 66.85.73.221 (52:54:00:ce:9b:a9) tell 66.85.73.210, length 28
13:14:55.001492 ARP, Request who-has 66.85.73.221 (52:54:00:ce:9b:a9) tell 66.85.73.210, length 28
However, when I try to access my test deployment, the connection just times out. Looking at a tcpdump on the worker, I can see the SYNs arriving and being retransmitted, but it would appear that no response is coming through:
[root@kube-wrkr-2 ~]# tcpdump -n -i eth1 src host <ipaddr>
dropped privs to tcpdump
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on eth1, link-type EN10MB (Ethernet), snapshot length 262144 bytes
13:55:23.140415 IP <hypervisor IP>.36456 > <lb IP>.irisa: Flags [S], seq 2141134184, win 32120, options [mss 1460,sackOK,TS val 2395428550 ecr 0,nop,wscale 7], length 0
13:55:24.202968 IP <hypervisor IP>.36456 > <lb IP>.irisa: Flags [S], seq 2141134184, win 32120, options [mss 1460,sackOK,TS val 2395429613 ecr 0,nop,wscale 7], length 0
13:55:26.251991 IP <hypervisor IP>.36456 > <lb IP>.irisa: Flags [S], seq 2141134184, win 32120, options [mss 1460,sackOK,TS val 2395431662 ecr 0,nop,wscale 7], length 0
13:55:30.282973 IP <hypervisor IP>.36456 > <lb IP>.irisa: Flags [S], seq 2141134184, win 32120, options [mss 1460,sackOK,TS val 2395435693 ecr 0,nop,wscale 7], length 0
13:55:38.666971 IP <hypervisor IP>.36456 > <lb IP>.irisa: Flags [S], seq 2141134184, win 32120, options [mss 1460,sackOK,TS val 2395444077 ecr 0,nop,wscale 7], length 0
13:55:55.050972 IP <hypervisor IP>.36456 > <lb IP>.irisa: Flags [S], seq 2141134184, win 32120, options [mss 1460,sackOK,TS val 2395460461 ecr 0,nop,wscale 7], length 0
I'm unsure if this is a routing issue. I added the gateway for my /28 to the bridged network interface, but that had no impact (and I don't see why its absence would prevent the announcing node from responding to incoming traffic).
I don't see anything out of sorts with the MetalLB deployment, and when I deploy with the private (192.168.123.0/24) network, things work as expected from within the virtual network.
I'm open to any and all suggestions around what I may have done to bork this up. I'm wondering if I need to go the BGP announce route rather than L2.
Thanks in advance to the community for looking at this long post and for any suggestions / troubleshooting recommendations.
Best,
Arthur