KNI [Kubernetes Networking reImagined]


Michael Zappa

Jan 16, 2024, 12:51:19 PM
to kubernetes-sig-network

Hello SIG Network!

 

Today, I am thrilled to share with you a concept we've been working on in the community to create a new standard for lower-level networking within Kubernetes.

 

The concept is called KNI (Kubernetes Networking reImagined [Interface], a term originally coined by Tim Hockin back in 2016). Today, networking is set up and torn down by the container runtime. With KNI, networking is separated into a new, modular, Kubernetes-specific KNI server called the network runtime, and the CRI functions that trigger networking are replaced by a new KNI gRPC API in the kubelet. This makes the code cleaner and simpler, and it lets cluster implementors replace the entire networking implementation in a modular, flexible way without changing the container runtime or core Kubernetes. The KNI is more flexible than the CRI and so will also simplify future enhancements to Kubernetes networking capabilities.
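
To make the shape of this more concrete, here is a minimal Go sketch of what such a KNI surface could look like. This is illustration only: the AttachNetwork/DetachNetwork RPC names come from the PoC discussion later in this thread, while the request/response fields (PodSandboxID, NetnsPath, and so on) are assumptions rather than the actual KNI proto.

package kni

import "context"

// AttachNetworkRequest is a hypothetical request carrying what a network
// runtime minimally needs: the pod identity and a pointer to the pod's
// network namespace (which is still created by the container runtime).
type AttachNetworkRequest struct {
	PodSandboxID string
	PodName      string
	PodNamespace string
	NetnsPath    string // bind-mounted netns path, as returned by PodSandboxStatus
}

// AttachNetworkResponse is likewise assumed; e.g. the IPs assigned to the pod.
type AttachNetworkResponse struct {
	IPs []string
}

type DetachNetworkRequest struct {
	PodSandboxID string
	NetnsPath    string
}

type DetachNetworkResponse struct{}

// NetworkRuntime is what a KNI "network runtime" server would implement and
// the KNI client would call over gRPC, in place of the CNI ADD/DEL that the
// container runtime performs today.
type NetworkRuntime interface {
	AttachNetwork(ctx context.Context, req *AttachNetworkRequest) (*AttachNetworkResponse, error)
	DetachNetwork(ctx context.Context, req *DetachNetworkRequest) (*DetachNetworkResponse, error)
}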

 

Links:

KNI Presentation (links to the PoC code changes are included)

KNI post by Doug Smith (this explains how to set up the demo; thanks to Tomo for making this easier)

 

Hopeful next steps:

KEP + establish working group + community sync

 

Your feedback and thoughts are invaluable to us as we explore this path and figure out whether it helps Kubernetes grow and serve more use cases. We'll be talking about it at next week's SIG Network meeting; stay tuned for more updates, and feel free to reach out to us here on the mailing list or in #sig-network!

 

Hope you're having a great new year so far!

 

Michael Brown

Jan 17, 2024, 3:15:50 AM
to kubernetes-sig-network
Saw this before bed... couldn't sleep :-O   This is great... a lot to parse. A WG is a very good idea.

Initial thoughts/questions maybe for the WG :-)

1) CRI currently has an image service and a runtime service; runtimes may implement one or both; ... adding a CNI++/k8s-focused network service to the CRI services list makes a lot of sense IMO.
2) I take it the runtime service is still responsible for network namespace creation and tear down, more particularly when running pod networking (not host networking), with/without user namespace isolation?
3) Decoupling initial networking for a pod from run pod might be difficult due to existing patterns/scenarios reliant on network setup prior to pod ready. For example, http probes and third-party cloud provider networking setups.
4) Decoupling tear down might also be difficult; for example, calling CRI stop pod requires the runtime integration to stop any running containers and tear down the pod resources it created, such as by calling CNI to request the same...
5) NRI integration and/or OTel tracing will need further thought..
6) Runtime handlers for VMs, confidential containers, and new sandbox implementations (see the containerd refactoring around sandboxing) will need some additional thought..
7) Given the above thoughts, on the runtime service's run pod call you could add the socket to use as an override to any embedded networking implementation. Additionally, the CRI network service could act as a proxy to the runtime handler for any VM scenarios. IOW, run pod receives the current host/pod/user ns arguments passed today and, if the runtime is not implementing a KNI/CRI network service, can also receive a socket where the network service is implemented, e.g. a Kubernetes network server created by the kubelet (rough sketch of this idea below).
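
To illustrate the idea in (7): one could imagine the run pod call optionally carrying a socket for an external network service. A tiny Go sketch of what such an override could look like; these fields do not exist in today's CRI API and the names are purely hypothetical.

package criext

// RunPodSandboxNetworkOverride is hypothetical only: today's CRI
// RunPodSandboxRequest has no such field. The idea is that run pod could
// optionally carry a socket for an external network service, overriding any
// networking embedded in the runtime.
type RunPodSandboxNetworkOverride struct {
	// Unix socket where a KNI-style network service is served, e.g. one
	// created by the kubelet. Empty means the runtime keeps handling
	// networking itself (current behavior, backwards compatible).
	NetworkServiceSocket string
}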

Michael Zappa

Jan 17, 2024, 4:21:57 AM
to kubernetes-sig-network
@mikebrown

1. This would not be added to the CRI-API; KNI would be a separate gRPC API. A previous email from last January gave a few reasons why people did not want to go down the route of adding CNI to the CRI-API.
2. The runtime service would still be creating/deleting the network namespaces.
3. The current pattern actually made the KNI demo quite easy. The AttachNetwork RPC is called after RunPodSandbox and before the CreateContainer/StartContainer RPCs, so the containers come up with a working network. The failure behavior is the same as well: if the network setup fails, the pod won't come up.
4. Same as bullet point 3: the current pattern made this easy. The StopContainers RPC is called first, then DetachNetwork, so we don't pull the network out from under running containers.
5. The NRI should function just the same, and if the container runtime implements a KNI server this would still work.
6. KNI should not impact this; however, it would require me to give it a quick look to verify.
7. See bullet point 1; KNI is not implemented in the cri-api.

I am happy to go through the POC code to show you the flow for setup/tear down if that would be helpful to address a few of the bullet points (3, 4, 6 ... and others). Please reach out, as others want to do a deep dive on the current implementation. The links to the code are in the KNI presentation. Sorry for the brief responses, it's 2 am.
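
To make the ordering in bullets 3 and 4 easier to picture, here is a rough Go-flavored sketch of the teardown side. The function names are placeholders standing in for the corresponding CRI/KNI RPCs, not real kubelet or PoC code.

package main

import "context"

// Stubs standing in for the real CRI / KNI RPCs; names are placeholders.
func criStopContainers(ctx context.Context, podSandboxID string) error { return nil }
func kniDetachNetwork(ctx context.Context, podSandboxID string) error  { return nil }
func criStopPodSandbox(ctx context.Context, podSandboxID string) error { return nil }

// tearDownPod mirrors the ordering described in bullet 4: stop the containers
// first, then detach the network, then tear down the sandbox, so the network
// is never pulled out from under running containers.
func tearDownPod(ctx context.Context, podSandboxID string) error {
	if err := criStopContainers(ctx, podSandboxID); err != nil {
		return err
	}
	if err := kniDetachNetwork(ctx, podSandboxID); err != nil {
		return err
	}
	return criStopPodSandbox(ctx, podSandboxID)
}

func main() { _ = tearDownPod(context.Background(), "example-sandbox-id") }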

Antonio Ojea

Jan 17, 2024, 4:46:11 AM
to Michael Zappa, kubernetes-sig-network
Interesting concept; I have some questions about the architecture (slide 8 of the presentation):

- If KNI provides the network, how does the container runtime know about it and know not to use the CNI?
- How can you make this change backwards compatible?



Shane Utt

Jan 17, 2024, 6:06:54 AM
to kubernetes-sig-network
This is cool, thank you for your effort on this. I'm excited for us to continue talking about this and see how it develops!

Casey Callendrello

Jan 17, 2024, 8:26:42 AM
to Michael Zappa, kubernetes-sig-network
(Full disclosure: Mike and I have been going back-and-forth on KNI design as part of the CNI meeting for some time :-). So I've had some time to digest this)

One thing that is unclear between Doug's blog post and your presentation is the role of the kubelet in all of this. Is the kubelet expected to be the KNI client, or is the container runtime?

If the kubelet is the KNI client, then we have to be very careful around lifecycle. One cannot add a network to a PodSandbox until after the PodSandbox is created (unlike, say, mounts or devices). Likewise, both a PodSandbox and network attachment can arbitrarily enter a failed state. They have tightly bound lifecycles, though -- a failed PodSandbox will, by definition, need a new network attachment, as one has to assume the network namespace has gone away and thus all interfaces have been cleaned up.

Separately, the KNI daemon needs to be tightly coupled with the isolation domain. Network plugins are, obviously, isolation aware and need to be able to jump between network namespaces. The kubelet, however, is not isolation aware, having delegated this entirely to the runtime. Somehow, a pointer to the isolation domain (i.e. path to the bind-mounted network namespace) needs to travel up via the CRI, through the kubelet, then down to the KNI. And, all of this needs to be lifecycle aware.

I do think we need a formal CRI API to manage network configurations; I've slowly played with a proposal but I've not made it particularly public. I was hoping that it would make progress as part of the multi-networking effort.

--Casey C.


jay vyas

Jan 17, 2024, 10:16:08 AM
to Casey Callendrello, Michael Zappa, kubernetes-sig-network
Hiya Mike, thanks.

What other projects would change if this were to be officially supported... like...

...would k8s then have a default impl of this that replaced some of the common logic in the CRI?


Michael Zappa

Jan 17, 2024, 10:51:48 AM
to kubernetes-sig-network
@casey thanks for the response!

To clarify, the kubelet would be the KNI client, at least in the current POC/demo. The container runtime is responsible for creating/deleting the network namespaces. The container runtime could implement a KNI server/plugin, though. My hope is to get the container runtime KNI plugin done in the next week or so. The standalone version of the KNI server is rather quick to implement.


As for the flow: RunPodSandbox needs to complete, then PodSandboxStatus needs to complete (this is what brings back the network namespace path), and then AttachNetwork is executed, all synchronously. If AttachNetwork fails, the behavior is exactly the same as if a CNI ADD were to fail. CreateContainer/StartContainer are executed after a successful AttachNetwork RPC. Please reach out if you want a deep dive into the current flow; I am happy to provide one!
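
For anyone who wants the flow at a glance without opening the PoC, here is a rough sketch of that setup ordering in Go. The function names are placeholders for the CRI and KNI RPCs (this is not the actual kubelet code), but the ordering and failure behavior match what is described above.

package main

import (
	"context"
	"fmt"
)

// Stubs standing in for the real CRI / KNI RPCs; names, signatures and
// return values are placeholders for illustration only.
func criRunPodSandbox(ctx context.Context) (podSandboxID string, err error) { return "sbx-123", nil }
func criPodSandboxStatus(ctx context.Context, id string) (netnsPath string, err error) {
	return "/var/run/netns/cni-0000", nil
}
func kniAttachNetwork(ctx context.Context, id, netnsPath string) error { return nil }
func criCreateAndStartContainers(ctx context.Context, id string) error { return nil }

// setUpPod mirrors the synchronous ordering described above:
// RunPodSandbox -> PodSandboxStatus (returns the netns path) ->
// AttachNetwork -> CreateContainer/StartContainer. If AttachNetwork fails,
// the pod does not come up, just as with a failed CNI ADD.
func setUpPod(ctx context.Context) error {
	id, err := criRunPodSandbox(ctx)
	if err != nil {
		return err
	}
	netns, err := criPodSandboxStatus(ctx, id)
	if err != nil {
		return err
	}
	if err := kniAttachNetwork(ctx, id, netns); err != nil {
		return fmt.Errorf("network attach failed, pod will not start: %w", err)
	}
	return criCreateAndStartContainers(ctx, id)
}

func main() { _ = setUpPod(context.Background()) }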

Michael Zappa

Jan 17, 2024, 10:56:12 AM
to kubernetes-sig-network
@antonio thanks for the response!

The container runtime in the POC/demo is completely uninvolved in the CNI execution (ADD/DEL) and has a feature flag disabling the CNI. The container runtime's responsibility is to create/delete the network namespace. However, it is very possible for the KNI server to be implemented as a containerd plugin, and in fact I have that queued up to do. The demo KNI server (network runtime) actually executes the CNI plugins as a way to be backwards compatible. With the demo, the changes are completely transparent to the end user.
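
As a sketch of that backwards-compatible path (an assumption about the mechanics, not the demo's actual code): a standalone network runtime could keep executing the node's existing CNI configuration itself, for example via the reference libcni library, so nothing changes from the CNI plugins' point of view. Paths and interface name below are the conventional defaults, not anything mandated by KNI.

package main

import (
	"context"
	"log"

	"github.com/containernetworking/cni/libcni"
)

// attachViaCNI shows how a KNI network runtime could stay backwards
// compatible by executing the node's installed CNI configuration itself,
// instead of the container runtime doing it.
func attachViaCNI(ctx context.Context, containerID, netnsPath string) error {
	cniConfig := libcni.NewCNIConfig([]string{"/opt/cni/bin"}, nil)

	confList, err := libcni.ConfListFromFile("/etc/cni/net.d/10-mynet.conflist")
	if err != nil {
		return err
	}

	rt := &libcni.RuntimeConf{
		ContainerID: containerID,
		NetNS:       netnsPath,
		IfName:      "eth0",
	}

	// Equivalent of a CNI ADD; a DetachNetwork handler would call
	// DelNetworkList with the same runtime config.
	result, err := cniConfig.AddNetworkList(ctx, confList, rt)
	if err != nil {
		return err
	}
	log.Printf("CNI result: %v", result)
	return nil
}

func main() {
	if err := attachViaCNI(context.Background(), "sbx-123", "/var/run/netns/cni-0000"); err != nil {
		log.Fatal(err)
	}
}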

Antonio Ojea

Jan 17, 2024, 3:01:46 PM
to Michael Zappa, kubernetes-sig-network

I like the concept, but I agree with Casey's comments: the object here is the Pod, the "network" is a part of the Pod, and the Pod lifecycle is handled by the container runtime. Splitting responsibilities in terms of functionality in the kubelet makes the solution more complicated.

If we are going to try to solve these problems, I think we should solve them only once. Enhancing the CRI API and making KNI part of the runtimes sounds like a more viable solution and easier to make backwards compatible; I'd like to keep this option on the table if possible.



Michael Zappa

Jan 17, 2024, 3:08:31 PM
to Antonio Ojea, kubernetes-sig-network

Something to keep in mind is that the CRI-API is not just for K8s, and KNI has a mission to be k8s specific. One of the things people enjoyed about KNI was that it had the flexibility to be decoupled from the container runtime. It would be possible to do if KNI were a different service in the cri-api; however, I'm not certain we gain anything with that?

Michael Zappa

Jan 17, 2024, 3:10:17 PM
to Michael Zappa, Antonio Ojea, kubernetes-sig-network

This mail from last year has some thoughts on the consolidation.

 

Should the CRI include networking? Should it be the CNI? (google.com)


Michael Zappa

Jan 17, 2024, 3:48:38 PM
to Antonio Ojea, kubernetes-sig-network
Something we could consider, and I'll toss it onto the table:
Fork/rename the cri-api to kri-api, make it k8s specific, and add a network service that could be implemented by the container runtime, or, if the end user wants, by a standalone network service that isn't a container runtime plugin.


Mike Morris

Jan 18, 2024, 3:40:10 PM
to kubernetes-sig-network
Tiny nit: the colors on slide 8 seem slightly off. KNI flows appear to be labeled with both a greenish cyan and a light blue (the key in the legend and also, confusingly, GarbageCollect; this leaves the RemovePodSandbox call in CRI purple, which I think the whole flow should be?)

Shane Utt

Jan 19, 2024, 2:12:56 PM
to Mike Morris, kubernetes-sig-network
The interest in this project has been very strong, and beyond this mailing list there have been a lot of different conversations going on about this topic. Mike and I have been discussing how best to serve the community with this idea and make sure we get everyone coordinated and connected, so for the moment we wanted to suggest a Zoom meeting for this topic on the SIG Network calendar. We've created a thread here for those interested to let us know what times might work for them:


Please let us know!

Shane Utt

Jan 24, 2024, 12:16:03 PM
to kubernetes-sig-network
In a Zoom sync today among people working on Blixt (a Gateway API-adjacent kubernetes-sigs project with a current focus on L4 support), we discussed the potential for Blixt to help support the KNI project by implementing KNI and becoming a sort of "playground KNI implementation", much like it already is for Gateway API.

We posted a discussion thread about it here for those who might be interested: https://github.com/kubernetes-sigs/blixt/discussions/174

Benjamin Elder

Jan 24, 2024, 7:52:22 PM
to Shane Utt, kubernetes-sig-network
> Something to keep note is that CRI-API is not just for K8s and KNI has a mission to be k8s specific

CRI-API is actually Kubernetes specific and the docs clearly state this now [1].


----

[1]: https://github.com/kubernetes/cri-api#purpose

"CRI is a plugin interface which enables kubelet to use a wide variety of container runtimes, without the need to recompile."

"The CRI API is defined in kubernetes/kubernetes repository and is only intended to be used for kubelet to container runtime interactions, or for node-level troubleshooting using a tool such as crictl. It is not a common purpose container runtime API for general use, and is intended to be Kubernetes-centric."




Antonio Ojea

Jan 25, 2024, 2:00:23 AM
to Benjamin Elder, Shane Utt, kubernetes-sig-network
My thoughts on this:

- There is a Pod lifecycle that is handled by the container runtime: the container runtime creates namespaces, calls CNI, pulls images, creates containers, ... this can be any project, not necessarily CRI-O or containerd; that is what the CRI API offers: "CRI is a plugin interface which enables kubelet to use a wide variety of container runtimes".

- The kubelet cannot do network things on the Pods/Containers created by the runtime, per the isolation principle Casey refers to; also, this will not work if the runtime runs Pods as VMs, since the kubelet will not be able to access them directly.

- Adding a new component/API that interacts with the container runtime in parallel through a new communication channel is the same decision that OpenStack took with Neutron [1], and I think this is clearly something we don't want.

My conclusion is that the kubelet communicates with the container runtime through CRI, and the network is configured at one point in time by the container runtime, so CRI is the only channel of communication we have with the network plugin. Creating an out-of-band channel by using CRDs or consuming the kubelet API is how projects are solving this gap these days, and this is racy and hard to troubleshoot, since a Pod creation call requires the network configuration step to perform new queries against the kubelet or apiserver when this information should be part of the same operation.

One of the main problems I see is that there is no concept of a network device in the OCI spec [2], only of block devices; if this concept existed, it would be very simple [3] to just use a device plugin [4] and the Container Device Interface [5].

I think we should think holistically about this. "The Network" is not a thing; Pods and Nodes are things that have lifecycles, and Pods have network interfaces to communicate. How these network interfaces are connected to each other is not the problem we are solving, because it is not a Kubernetes problem (Kubernetes does not manage infrastructure; ClusterAPI does [6]). The problem we need to solve is how to configure these network interfaces so that projects don't have to keep using out-of-band communication.



Michael Zappa

Jan 25, 2024, 10:25:03 AM
to Antonio Ojea, Benjamin Elder, Shane Utt, kubernetes-sig-network

Hello, my responses are inline:

 

- there is a Pod lifecycle that is handled by the container runtime: container runtime create namespaces, calls CNI, pull images, create containers, ... this can be any project, not necessarily crio or containerd, that is what CRI API offers "CRI is a plugin interface which enables kubelet to use a wide variety of container runtimes"

  

[zappa] Are you proposing a new container runtime? KNI aims to be responsible for networking, not everything else. Networking for K8s can be done in a single place.

 

- Kubelet can not do network things on the Pods/Containers created by the runtime per the isolation principle Casey refers to, also this will not work if the runtime runs Pods as VMs as kubelet will not be able to access them directly.

 

[zappa] The kubelet can execute RPC calls, and it does. Are you assuming the KNI has native network objects? Are you saying that KNI cannot ever work with VMs? Networking should be a first-class citizen.

 

- Adding a new component/API that will interact with the controller runtime in parallel through a new communication channel is the same decision that Openstack took with Neutron [1], and I think this is clearly something we don't want.

 

[zappa] What specifically did not work with Neutron? Are you assuming the CRI/KNI RPCs are moving in parallel? Where is this happening? When you say "we", who is that?

 

My conclusion is that Kubelet communicates with the container runtime through CRI, and the network is configured at one point in time by the controller-runtime, so CRI is the only channel of communication we have with the network plugin, creating an out of bound channel by using CRDs or consuming the Kubelet API is how projects are solving this gap these days, and this is racy and hard to troubleshoot since a Pod creation call, requires the network configuration step to perform new queries to kubelet or apiserver, when this information should be part of the same operation.

 

[zappa] This makes some assumptions. The flow is RunPodSandbox -> PodSandboxStatus -> AttachNetwork -> CreateContainers -> StartContainers. What is your specific concern here? Are you worried that a container could come up with no network? If AttachNetwork fails, the process stops.

 

One of the main problems I see is that there is no concept of network device in the OCI spec [2], only of block devices, if this concept exists, it will be very simple [3] just to use a device plugin [4] and Containerd Device Interface [5].

 

[zappa] For the OCI spec, wouldn't that be for the low-level runtime, aka runc/kata? I am not certain what is being proposed here. Is this to support VMs? We need to have further discussions around how to support VMs.

 

I think we should think holistically about this, "the Network" is not a thing, Pods and Nodes are things that have life cycles, Pods have network interfaces to communicate, how these network interfaces are connected between them is not the problem we are solving because is not a kubernetes problem, Kubernetes is not managing infrastructure, ClusterAPI does it [6], the problem we need to solve is how to configure these network interfaces so projects don't have to keep using out-of-band communication.

 

[zappa] Networking should be a first-class citizen. I would like to get your specific concerns here, since I believe assumptions are being made.

Benjamin Leggett

Jan 25, 2024, 10:55:56 AM
to kubernetes-sig-network
In Istio ambient, we have for some time been hitting up against inherent shortcomings in the CNI spec (which has evolved considerably over time as the assumptions around what users and vendors actually want to do with networking have changed, as Casey can attest), and I'm excited for the KNI proposal.

Right now, Istio ambient actually works by extending your primary CNI (as supplied by a managed vendor, Calico, etc.), which means the cluster is already configured and workloads are running, and then after that we want to come in and extend the networking for (at least) new containers. Others (Cilium) also do this; it's pretty common for a service mesh to have to come in and operate or manage networking side-by-side with something else, invariably managed or owned by a third party or cloud provider. The only way this ends up being achievable is with the use of extensibility mechanisms like CNI plugins: we can drop one on your node, with permission, and extend your node networking.

We're actually in the process of adding/revising a "CNI agent" running on the node that establishes a UDS/gRPC channel with our own chained CNI plugin in order to regulate pod scheduling based on whether our plugin+agent happens to be established on the node or not, which is effectively a perverse variation on the proposed KNI pattern. Being able to rip some/all of that out in favor of KNI is an appealing thought.

How all this might work for Istio in KNI is a bit TBD of course (plugins, etc.), but in general I have a strong preference for a component that is entirely separate from the CRI and can evolve separately from it: Istio does not, in general, want to be in the business of monkeypatching people's in-use container runtimes. People (and vendors) invariably want to do odd things with networking in Kubernetes contexts specifically, and giving them clean, minimal lifecycle hooks entirely separate from the rest of the orchestration and container stacks seems like a convenient way to end all the arguments about where networking belongs.