Networking Workshop for Contributor Summit (KubeCon Paris)


Antonio Ojea

Feb 15, 2024, 2:38:42 PM
to kubernetes-sig-network
Hi all,

Following up on the discussion in today's meeting, I've proposed a session/workshop with Dan Winship for the next contributor summit in Paris with the title "Future of k8s networking". 

The main motivation is that there are different groups and projects working on different solutions that may have some overlap: CNI, CRI, KNI, Multi-Network, DRA, ... All of those projects have a considerable level of complexity and a high risk of scope creep, and this is causing confusion and frustration in the community.

The goal of the workshop is to clearly define the problems so we can evaluate what the required changes in the project are to enable these new workloads in a sustainable way. We aim to follow the current extensible and pluggable model, to build the right interfaces and abstractions in Kubernetes that allow these new environments and workloads to thrive independently, with Gateway API as an example of a success story.

It will help a lot if we do some homework and write down the problems we want to solve, so we can be more productive during the workshop.

I've created this doc that can be used for that purpose.

Regards,
Antonio Ojea





Michael Zappa

Feb 21, 2024, 10:06:40 AM
to kubernetes-sig-network
Hello,

Thanks for getting this going. Unfortunately, several of us are unable to attend and the CNCF won't allow remote participation. Are you able to expand on "all those projects have a considerable level of complexity and high risk of scope creep, and this is causing confusion and frustration in the community"? I am looking for specifics of the who, what, where and when, especially the last part.

Michael Zappa

Feb 21, 2024, 11:12:57 AM
to kubernetes-sig-network
To expand,

Maciej, myself, Shane, and Patrick (DRA) are syncing on Friday to iron out some of these details. Only Shane will be in Paris, so he can bring the discussion points to this workshop.

Antonio Ojea

Feb 21, 2024, 11:18:08 AM
to Michael Zappa, kubernetes-sig-network

This is a personal observation, so it is a subjective opinion, and I will be super happy to be wrong, but I think the comments on the different PRs of those KEPs show some evidence of this claim. I'm also spending a lot of time talking with people, and this is the feedback I'm receiving from people I hold in high regard.

The KEPs are still missing the user stories; they describe "networking use cases": I want to use two interfaces, I want to connect this to that, ... but there is no mention of the user problems that require those "networking solutions". The best example is this discussion: https://github.com/kubernetes/enhancements/pull/4477#discussion_r1493830431

Just a personal reflection: Kubernetes is able to deploy a web application and expose it with autoscaling and automatic rolling updates just by doing "kubectl apply" ... and this is just a Deployment and a Service object ... you don't need to configure an LB Pod that connects to the subnet B isolated by the Router Pod ... As part of a community that is writing code on the evolution of networking, I want to think that we can keep innovating and find new and elegant solutions to the existing problems.
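To make the comparison concrete, here is a minimal sketch (illustrative only, not from this thread; names and the image are placeholders) of the two objects mentioned above, which is all a user has to "kubectl apply":

# Minimal illustrative sketch: a web app deployed and exposed in-cluster.
# Autoscaling can be layered on top with a HorizontalPodAutoscaler.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: example.com/web:1.0   # placeholder image
        ports:
        - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  selector:
    app: web
  ports:
  - port: 80
    targetPort: 8080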



Antonio Ojea

Mar 29, 2024, 8:53:57 AM
to Benjamin Leggett, kubernetes-sig-network

One of the outcomes of the workshop was to understand these user stories. We had a lot of useful conversations, and all the people I could talk with about storage, Multus, and all the multi-network scenarios were always referring to external networks. The user story was "I want to add an additional interface to a Pod that allows me to connect the Pod to an EXTERNAL network", whether because my Pod can act as a router or tunnel endpoint, or because some servers have a VLAN that connects to my storage backends.

The concept of an external network here is important: this is a network that is outside the Kubernetes network domain and also has a different persona, a "network administrator" that is independent of the "cluster administrator". Kubernetes is not an infra provider; it is not that Kubernetes will now start configuring routers and switches across the datacenter, or start creating VPC networks and subnets, ... We cannot model these networks declaratively if Kubernetes is not authoritative in that domain ... remember we removed one API from core for this very same reason: https://github.com/kubernetes/kubernetes/pull/121229

The developer stories I heard once we clarified this point were clearer; we moved from "I need Services" to "I need to be able to get these additional interfaces' IPs from the API so I can implement my discovery on these secondary networks, as I use technology foo that is an appliance on this external network".
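To make this concrete, one pattern that exists today is a Multus-style attachment, where the secondary-interface addresses are published back through the Pod's annotations in the API. A rough, illustrative sketch (the attachment name, addresses, and image are placeholders, and the annotations are Multus conventions, not core Kubernetes APIs):

# Illustrative only: requesting an extra interface on an external VLAN and
# reading its IPs back from the API.
apiVersion: v1
kind: Pod
metadata:
  name: appliance-client
  annotations:
    # Request an additional interface defined by a NetworkAttachmentDefinition.
    k8s.v1.cni.cncf.io/networks: storage-vlan
spec:
  containers:
  - name: app
    image: example.com/app   # placeholder image
# After the Pod starts, Multus records the attachment result in a status
# annotation that a discovery controller can read from the API, e.g.:
#
#   k8s.v1.cni.cncf.io/network-status: |
#     [{
#       "name": "default/storage-vlan",
#       "interface": "net1",
#       "ips": ["192.168.50.12"]
#     }]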

So, I could confirm that my assumption was right and the user stories were missing, and now they are clearer ... much, much clearer.

On Fri, 29 Mar 2024 at 13:31, 'Benjamin Leggett' via kubernetes-sig-network <kubernetes-...@googlegroups.com> wrote:
Users don't configure networking; vendors, CNIs, and service meshes do - so you will always struggle to find user stories from end users around this. Users just expect the networking to be there and work.

That doesn't mean it's not a problem that needs solving, just that you're probably asking the wrong people (end users).

Antonio Ojea

Apr 3, 2024, 7:09:39 AM
to Maciej Skrocki, Benjamin Leggett, kubernetes-sig-network
I don't follow this comparison. Kubelet runs in the VM and exposes it as the Node object; kubelet literally owns the VM. The Node object does not have many fields in Spec for users to declaratively change the VM; it mostly exposes the VM resources (IPs, memory, CPU, devices, ...) so the scheduler can send Pods to consume those resources.
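For reference, this is roughly the shape of a Node object (values are placeholders): spec is nearly empty, while status reports what kubelet discovered.

apiVersion: v1
kind: Node
metadata:
  name: worker-1
spec:
  podCIDR: 10.244.1.0/24        # one of the few spec fields
status:
  addresses:
  - type: InternalIP
    address: 10.0.0.11
  capacity:                     # resources the scheduler can place Pods against
    cpu: "8"
    memory: 32Gi
    pods: "110"
  allocatable:
    cpu: "7800m"
    memory: 30Gi
    pods: "110"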

If people want to build higher-level abstractions to model external networks, that is OK, but not in a core API. We cannot have a core API that depends on external systems and personas and impacts the whole system without any chance of reconciliation. Assuming that users will need to reflect the external network configuration is error prone and breaks cluster bootstrapping, and assuming that this external network is never going to change once configured is simply not true.

On Wed, 3 Apr 2024 at 00:37, Maciej Skrocki <maciej...@google.com> wrote:
Following the authoritative argument, we should not have a Node object either.
I agree that it is the network administrator that configures the network outside the cluster, but then we still need a way to represent that network inside K8s, same as we do for Node.

Maciej Skrocki

Apr 5, 2024, 9:07:19 AM
to Antonio Ojea, Benjamin Leggett, kubernetes-sig-network
Following the authoritative argument, we should not have a Node object either.
I agree that it is the network administrator that configures the network outside the cluster, but then we still need a way to represent that network inside K8s, same as we do for Node.

On Fri, Mar 29, 2024 at 5:53 AM Antonio Ojea <antonio.o...@gmail.com> wrote:

Costin Manolache

Apr 5, 2024, 9:07:20 AM
to Antonio Ojea, Benjamin Leggett, kubernetes-sig-network
Is there a list of use cases and requests from users? "I want to add an interface to a pod" sounds like an implementation choice, not a user requirement like "I want the pod to be directly connected to a network so it can communicate without NAT for some protocol".

We did many hacks around CNI for mesh - a common use case is "communicate across multiple k8s clusters with different CNIs and overlapping CIDR ranges for pods/services". 

There are quite a few options involving user-space proxies as sidecars and iptables - creating overlay networks without having an actual second interface in the Pod - and we are having a lot of problems making them work reliably without core support. I think we can extract some good requirements and use cases from the mesh implementations.
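For context, a rough sketch of the kind of hack being described (not any particular mesh's actual manifest; the names, ports, and images are placeholders): an init container installs an iptables rule so outbound traffic is redirected to a user-space proxy sidecar, which carries the overlay without adding a second interface to the Pod.

apiVersion: v1
kind: Pod
metadata:
  name: app-with-mesh-sidecar
spec:
  initContainers:
  - name: init-redirect
    image: example.com/iptables-init   # placeholder image with iptables installed
    securityContext:
      capabilities:
        add: ["NET_ADMIN"]
    command: ["sh", "-c"]
    args:
    # Redirect all outbound TCP, except traffic from the proxy itself (uid 1337),
    # to the proxy's local port. No extra interface is created in the Pod.
    - >
      iptables -t nat -A OUTPUT -p tcp -m owner ! --uid-owner 1337
      -j REDIRECT --to-ports 15001
  containers:
  - name: app
    image: example.com/app             # placeholder application
  - name: mesh-proxy                   # user-space proxy that tunnels onto the overlay
    image: example.com/proxy
    securityContext:
      runAsUser: 1337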

Costin

Sandor Szuecs

Apr 5, 2024, 6:35:05 PM
to Costin Manolache, Antonio Ojea, Benjamin Leggett, kubernetes-sig-network
Hi!

Maybe glueing multiple clusters via mesh is just wrong. We want reliable clusters, and that already sounds pretty complex and the opposite of reliable.
Why not say that clusters should communicate through strong authnz paths via public networks or whatever cloud features you can use (VPC peering, VPC endpoint services, or Private Link)?

Sandor Szücs | 418 I'm a teapot



Costin Manolache

Apr 5, 2024, 7:33:24 PM
to Sandor Szuecs, Antonio Ojea, Benjamin Leggett, kubernetes-sig-network


On Fri, Apr 5, 2024, 15:35 Sandor Szuecs <sandor...@zalando.de> wrote:
Hi!

Maybe glueing multiple clusters via mesh is just wrong.

I would not say only 'multiple clusters' - but VMs and devices too. Mesh is not k8s specific.

And workloads are already 'glued' via internet - with standard DNS for name discovery and well established TLS ( with client certs or not ), JWT, etc 

There is a need to also support private VPCs in a consistent manner - also including non-k8s workloads - and to represent this in the Pod, both in the CR status and as an interface too.

'Mesh' is a vague term - what used to be called 'intranet', and distinct from the public Internet, but larger than a single k8s cluster. 

Sandor Szuecs

Apr 6, 2024, 12:03:39 PM
to Costin Manolache, Antonio Ojea, Benjamin Leggett, kubernetes-sig-network
On Sat, 6 Apr 2024 at 01:33, Costin Manolache <cos...@google.com> wrote:


On Fri, Apr 5, 2024, 15:35 Sandor Szuecs <sandor...@zalando.de> wrote:
Hi!

Maybe glueing multiple clusters via mesh is just wrong.

I would not say only 'multiple clusters' - but VMs and devices too. Mesh is not k8s specific.

And workloads are already 'glued' via internet - with standard DNS for name discovery and well established TLS ( with client certs or not ), JWT, etc 

That's what we do and it works well. There is no coupling between clusters, and the pods do not know about it.
 

There is a need to also support private VPCs in a consistent manner - also including non-k8s workloads - and to represent this in the Pod, both in the CR status and as an interface too.

1) I don't see a problem, because we do exactly the same without a mesh and without the pod having to have more than one network.
2) You can also use a CoreDNS template and use it to point to a VPCEndpoint, a udp/tcp/wireguard/mesh proxy, or an HTTP router (e.g. Skipper, which routes to VPCEndpoints and the rest) that does the gluing without the pod needing to know about multiple interfaces with multiple networks (see the sketch below). We do this in >200 clusters with 150k pods for internal ingress.
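As a rough sketch of point 2 (illustrative only; the zone, address, and the way the snippet is wired into the cluster's CoreDNS config are placeholders and vary by setup), the CoreDNS "template" plugin can answer queries for an internal zone with the address of a proxy or VPC endpoint, so pods keep a single interface:

apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns-custom          # placeholder; distributions wire this in differently
  namespace: kube-system
data:
  internal.server: |
    internal.example.org {
      # Answer every A query in this zone with the proxy / VPC endpoint address.
      template IN A internal.example.org {
        answer "{{ .Name }} 60 IN A 10.100.0.10"
      }
    }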
 

'Mesh' is a vague term - what used to be called 'intranet', and distinct from the public Internet, but larger than a single k8s cluster. 

The term I don't mind, but needing multiple IPs or, worse, multiple devices in a pod is what makes things complicated.
There are ways you can do this without it.
I was responsible for production data centers with the same network split (storage vs LB net), but TBH it was a mistake.
Better to have only one LACP bond (pod network routed there) and another one for iLO (no pod network access, admin only).



Best, sandor

Costin Manolache

Apr 6, 2024, 2:25:09 PM
to Sandor Szuecs, Antonio Ojea, Benjamin Leggett, kubernetes-sig-network


On Sat, Apr 6, 2024, 09:03 Sandor Szuecs <sandor...@zalando.de> wrote:


On Sat, 6 Apr 2024 at 01:33, Costin Manolache <cos...@google.com> wrote:


On Fri, Apr 5, 2024, 15:35 Sandor Szuecs <sandor...@zalando.de> wrote:
Hi!

Maybe glueing multiple clusters via mesh is just wrong.

I would not say only 'multiple clusters' - but VMs and devices too. Mesh is not k8s specific.

And workloads are already 'glued' via internet - with standard DNS for name discovery and well established TLS ( with client certs or not ), JWT, etc 

That's what we do and it works well. There is no coupling between clusters, and the pods do not know about it.
 

There is a need to also support private VPCs in a consistent manner - also including non-k8s workloads - and to represent this in the Pod, both in the CR status and as an interface too.

1) I don't see a problem, because we do exactly the same without a mesh and without the pod having to have more than one network.

Right now pod networking is set by CNI - and it's up to the CNI to decide how many interfaces or networks will be set. Yes, the pod gets one 'main' address that can be used to access other pods and services in the same cluster - but via either additional explicit interfaces or obscure iptables/eBPF, pods manage to use other networks that don't follow K8s boundaries or rules.

Mesh is just a complicated way to attach to an overlay network that goes outside of the cluster but is still private.


2) You can also use a CoreDNS template and use it to point to a VPCEndpoint, a udp/tcp/wireguard/mesh proxy, or an HTTP router (e.g. Skipper, which routes to VPCEndpoints and the rest) that does the gluing without the pod needing to know about multiple interfaces with multiple networks. We do this in >200 clusters with 150k pods for internal ingress.
 

I don't think CoreDNS is relevant here - it's just a reference implementation to use in the k8s cluster. A private network's DNS can't be controlled by RBAC or configs in individual clusters - the network spans VMs and other things too, and likely has its own private DNS.

Of course CoreDNS can be used as a VPC DNS - if someone really wants to, and knows how to set all the security features properly, control which clusters can set records, etc. Or another existing DNS server can be used - most clouds and enterprise/private networks were using DNS for many years before K8s, and it still works.

My  problem with CoreDNS is that implementation details are getting treated as part of the 'contract' and many of the security practices common in DNS are not used. Having many DNS implementations is great - so happy to see CoreDNS grow, but not happy if anything gets coupled to a single DNS implementation.



'Mesh' is a vague term - what used to be called 'intranet', and distinct from the public Internet, but larger than a single k8s cluster. 

The term I don't mind, but needing multiple IPs or, worse, multiple devices in a pod is what makes things complicated.
There are ways you can do this without it.
I was responsible for production data centers with the same network split (storage vs LB net), but TBH it was a mistake.
Better to have only one LACP bond (pod network routed there) and another one for iLO (no pod network access, admin only).

Meshes today don't use multiple IPs - but hacks to create overlay networks. 

Multiple devices in a VM or host are common and may be a cleaner, faster approach - also likely more secure since obscure magic can be avoided. Having an explicit IP on the cross-cluster+VM network with its security behavior, maybe an IP(v6) on the public internet without NAT or other tricks, plus the K8S cluster local interface is not really that bad.

Costin