BGP mode via non-default route transfer net?


Chris S

Jan 24, 2020, 12:12:06 PM
to metallb-users
Hey there,

I'm currently trying to make some public IPs available via LBs on an otherwise "private" Kubernetes cluster.
For this I added an isolated transfer network/VLAN containing the Kubernetes nodes and our EdgeRouter, and set up BGP peering over it. This generally works for incoming traffic, but the responses never reach the client.
After setting the test service to externalTrafficPolicy: Local, it became apparent that I had misunderstood MetalLB's BGP routing: the responses are getting routed to the default gateway on the internal network.
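
For reference, the MetalLB config looks roughly like this, using the ConfigMap format MetalLB reads (the peer address, ASNs, and address pool below are placeholders, not our real values):

    kubectl apply -f - <<EOF
    apiVersion: v1
    kind: ConfigMap
    metadata:
      namespace: metallb-system
      name: config
    data:
      config: |
        peers:
        - peer-address: 192.0.2.1    # the edgerouter on the transfer vlan
          peer-asn: 64512
          my-asn: 64513
        address-pools:
        - name: public
          protocol: bgp
          addresses:
          - 203.0.113.0/28
    EOF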

Now my question is: is it possible to make this scenario work at all? One idea was to somehow rewrite the gateway for traffic coming from the LB IPs, but I have no idea how to achieve that in a Kubernetes setting... or is it somehow possible to route the traffic back through MetalLB?

Davy Priem

May 9, 2020, 3:34:45 PM
to metallb-users
Hi,

I have a similar issue in L2 mode. My worker nodes have two NICs: one for the private net and one for the public net. On the worker node itself there is a policy-based route keyed on source address (traffic from a private_net IP goes to the private_net router, traffic from a public IP goes to the public router; these are two different router devices). The default gateway points to the private net.

When a public IP X is assigned directly to the NIC on the worker, traffic flows correctly and services running on the node itself are reachable from all networks. When MetalLB assigns a public IP Y to a container, the policy-based routing is ignored and all traffic is sent to the default gateway. This causes asymmetric routing, which is not allowed in our network. On many distributions the Linux kernel will also drop this traffic because of the rp_filter (see https://access.redhat.com/solutions/53031).

I think the policy-based routing does not work with k8s because traffic gets routed before it is source-NATed. If, for example, I create a PBR rule based on the IP address of the pod, I can send traffic to a different gateway.
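
Roughly, the policy-based routing on the nodes looks like this (the interface name, table name, and addresses are placeholders, not our real values):

    # dedicated routing table for the public network
    echo "100 pubnet" >> /etc/iproute2/rt_tables

    # traffic sourced from a public address consults the pubnet table...
    ip rule add from 198.51.100.0/24 table pubnet priority 100

    # ...whose default route points at the public router instead of
    # the node's default gateway on the private net
    ip route add default via 198.51.100.1 dev eth1 table pubnet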

Best regards
Davy Priem

On Friday, January 24, 2020 at 18:12:06 UTC+1, Chris S wrote:

Chris S

May 9, 2020, 4:07:45 PM
to metallb-users
Hey Davy,

I managed to solve this issue by changing my network plugin to kube-router, which uses BGP internally and is therefore a bit more predictable. That way there is less magic going on with iptables IP rewrites (at least at the points that matter here) and you can actually do source-based routing. IIRC I still had to disable rp_filter on the Kubernetes nodes themselves, though.
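
The rp_filter part boils down to something like this on each node (whether the loose mode, value 2, is enough or you need to disable it fully with 0 depends on your setup; eth1 is a placeholder for the transfer-net interface):

    # relax reverse-path filtering so replies leaving via the transfer
    # net aren't dropped; persist via /etc/sysctl.d/ as needed
    sysctl -w net.ipv4.conf.all.rp_filter=2
    sysctl -w net.ipv4.conf.eth1.rp_filter=2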

Best Regards
Chris

Fawzi Masri

Aug 27, 2020, 1:44:40 AM
to metallb-users
Hi Chris,
Can you point me to any info on how kube-router works with MetalLB in BGP mode? The kube-router website does not have any info on this, only a broken link (https://www.kube-router.io/docs/user-guide/#bgp-configuration).
Regards,
Fawzi

Chris S

Aug 27, 2020, 5:49:40 AM
to metallb-users
Hey Fawzi,
In my setup kube-router and MetalLB don't directly interact with each other. The switch to kube-router simply meant that I was able to configure source-based routing (quick example post: https://tldp.org/HOWTO/Adv-Routing-HOWTO/lartc.rpdb.simple.html), because kube-router does "real" routing via BGP instead of opaque iptables rewriting all over the place, as e.g. Weave does.

If you want to make the internal Kubernetes network reachable from outside the cluster, you'll have to peer kube-router with an external BGP router. MetalLB isn't able to receive routes, it can only announce them, and there is no reason to announce the LB routes to kube-router since AFAIK it is aware of the LB IPs and handles them internally when they are called from within the cluster.

If you want both the LB IPs and the internal network reachable from outside, you'll have to configure both individually to peer with the external BGP router(s), though from what I can see there is little use in doing both, since they are two ways to achieve the same goal: making internal cluster resources available to the outside network.
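
If you go that way, the external peering is configured on kube-router itself, e.g. via its daemonset flags (the router IP and ASN below are placeholders; check the kube-router docs for the flags your version supports):

    # excerpt from the kube-router container args: peer every node
    # with the external router and advertise the pod network
    --run-router=true
    --peer-router-ips=192.0.2.1
    --peer-router-asns=64512
    --advertise-pod-cidr=true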

Fawzi Masri

Aug 27, 2020, 3:18:51 PM
to metallb-users
Thanks Chris,

Ok, I see... so kube-router will use BGP to network the cluster internally.

I am trying to build an HA cluster with replicated masters and workers in 3 separate buildings, and I am not able to understand how BGP will reroute traffic for public users. Specifically, if I am on the public network accessing my Nextcloud, which happens to be running on node1, I would go to e.g. cloud.mycompany.com. But if node1 has a power outage, who would update the DNS record so that cloud.mycompany.com would go to node2 or node3? Is this where a gateway router with BGP support comes in, to specify a different hop for reaching my domain name?

regards,
