[sig-network] Kubernetes Networking: sustainable development

Antonio Ojea

May 12, 2023, 10:24:11 AM
to kubernetes-sig-network
Hi all,

Kubernetes is more "stable" these days and new ecosystems are flourishing on top of Kubernetes. To keep the community thriving we need well defined APIs and specifications to avoid fragmentation and improve the user experience.

As in all OSS and fast-paced projects, implementation details sometimes leak into the APIs, so we need to make an effort to differentiate between reference implementation and reference architecture. kube-proxy and Services are a good example of this problem.

The tool the project has to help in this regard is the Conformance test suite. As I commented in yesterday's sig-net meeting, I plan to promote new e2e tests for the features that recently graduated and for behaviors that are considered mandatory for some of the APIs, so we can start aligning all the different projects that implement Kubernetes networking APIs before it is too late or too painful for our users.

In addition, I'd like to encourage the community to start working towards a better specification of the sig-network APIs and features. I personally think that a good solution to this problem will be to remove ambiguity: improve our APIs and specifications, and write e2e tests so external projects and implementations can test against the expected behavior.

I think that DNS and Gateway API are good examples in this regard, with multiple Conformance and e2e tests and detailed APIs and/or documented specifications https://github.com/kubernetes/dns/blob/master/docs/specification.md.

Some work has already been done for Services, for example this document from Dan Winship, https://github.com/kubernetes-sigs/kpng/blob/master/doc/service-proxy.md, but we still have a lot of gaps to cover. Just to highlight some of them:
- Service/Endpoint lifecycle and connection draining: https://github.com/kubernetes/kubernetes/issues/108523
- NodePorts addresses (and localhost): https://github.com/kubernetes/kubernetes/issues/111840

Another area that I feel needs some love is Pod networking. As the project evolved, the CNI stopped being part of the core and became part of the container runtimes, leaving the Pod network in limbo, without consistency.
This lack of a feedback loop between Kubernetes and the pod network is a source of problems for projects and operators. It forces projects to use "hacky" or "out-of-band" solutions that are suboptimal, and it forces us to take decisions like the one to reorder the IPs returned to the Kubelet.
One possible solution is to define this network relation in the CRI API https://hackmd.io/@squeed/cri-cni and bring the pod network lifecycle back to the kubelet, so we can be more opinionated about it and bring some democratization ...

Sorry for the long email, but these are some thoughts that I think will help keep the core stable and allow us to keep adding new features in the Kubernetes networking area.

Regards,
Antonio Ojea


Shane Utt

May 12, 2023, 2:40:43 PM
to Antonio Ojea, kubernetes-sig-network
Thank you very much Antonio for this follow up from yesterday.

> we can start aligning all the different projects that implement kubernetes networking APIs before it is too late or too painful for our users.

I really appreciate that you're thinking about this. In recent years I've had a growing concern about the disparity between implementations resulting in a confusing ecosystem for users. I want to avoid schism in implementations and I think these tests are a very important part of the solution.

> In addition, I'd like to encourage the community to start working towards a better specification of the sig-network APIs and features.
> I personally think that a good solution to this problem will be to remove ambiguity: improve our APIs and specifications, and write e2e tests so external projects and implementations can test against the expected behavior.

This is a good callout and something we should be thinking about on a recurring basis. I've added an agenda item for the next meeting to discuss this further and to work on deriving some relevant action items.

jay vyas

May 13, 2023, 11:43:30 AM
to kubernetes-sig-network
thx antonio. yeah shane brought this up as well this wk at kpng so ... we figured we'd start fixing our backlog of skiplists.

what about for services? (i.e. the recent issue from dan: https://github.com/kubernetes/kubernetes/issues/114210)

1) I'm wondering if we should add service conformance to the mix? i feel like it's tricky to think about the different semantics of services and when they work wrt other services, what mutations are possible, internalTrafficPolicy, externalTrafficPolicy, etc?

2) As of now... are we missing some coverage for service mutation (i know we have some good tests for some of the conversions, in https://github.com/kubernetes/kubernetes/blob/master/test/e2e/network/service.go .... but do we have all of em)? i always get confused with things like - can i convert nodeports to clusterips (if so what happens to the nodeport), and back? ... actually sometimes i wonder whether in k8s 2.0 they should just say "ok, that's it... services are immutable from now on:)"
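
To make the mutation question concrete, here is a minimal sketch that enumerates every ordered pair of Service types, which is one way to see how many type conversions a complete test matrix would have to consider. The four names are the Service `type` values in the Kubernetes API; the helper name `type_transitions` is made up for illustration, and the sketch says nothing about which conversions the API server actually accepts or what happens to allocated node ports.

```python
from itertools import permutations

# The four Service types defined by the Kubernetes API.
SERVICE_TYPES = ["ClusterIP", "NodePort", "LoadBalancer", "ExternalName"]


def type_transitions(types=SERVICE_TYPES):
    """Return every ordered (from_type, to_type) pair with from_type != to_type.

    Hypothetical helper: it only enumerates candidates for a test matrix,
    it does not check whether a given conversion is valid.
    """
    return list(permutations(types, 2))


if __name__ == "__main__":
    for src, dst in type_transitions():
        print(f"{src} -> {dst}")
```

With four types this yields twelve directed conversions, before even considering fields such as internalTrafficPolicy or externalTrafficPolicy.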

Antonio Ojea

May 13, 2023, 1:48:38 PM
to jay vyas, kubernetes-sig-network
We do have Conformance tests for Services ... but we did not keep adding Conformance tests for the new Services features. That is why this topic started: I'm planning to start promoting more of these tests, and this is a heads up for projects that implement Services or Kubernetes networking features, so they can check that they are not going to have problems, or flag tests that make wrong assumptions.

I always hear from people who want to contribute but do not know how. Exploring, documenting, and adding tests for all these Services mutations is a good example ... but unfortunately much of the interest is in adding features to the core, which is complex at this stage of the project, where the bar for stability and the required experience with the codebase are very high.

jay vyas

May 15, 2023, 3:54:49 PM
to kubernetes-sig-network
I guess we have conformance tests for services, but iirc, not that many?

1) When you say "we have conformance tests for services"... IIRC there's only 3 conformance tests for services...? (this might be out of date but ... iirc this was the list...)

      "[sig-network] Services should be able to switch session affinity for service with type clusterIP [LinuxOnly] [Conformance]",

      "[sig-network] Services should be able to change the type from ExternalName to ClusterIP [Conformance]",

      "[sig-network] Services should be able to change the type from NodePort to ExternalName [Conformance]"


2) But we have a bunch of other tests, like tests that mutate services, and other tests that specifically look at particular connectivity patterns - Are we going to add a lot of these existing sig-network tests to conformance? (I think that would be awesome ... but... it would make the conformance tests wayyy longer) but i think that's ok...

  should function for client IP based session affinity: http [LinuxOnly]
  should function for client IP based session affinity: udp [LinuxOnly]
  should function for endpoint-Service: http
  should function for endpoint-Service: udp
  should function for node-Service: http
  should function for node-Service: udp
  should function for pod-Service: http
  should function for pod-Service: udp
  should implement service.kubernetes.io/headless
  should implement service.kubernetes.io/service-proxy-name
  should preserve source pod IP for traffic thru service cluster IP [LinuxOnly]
  should prevent Ingress creation if more than 1 IngressClass marked as default [Serial]
  should prevent NodePort collisions



Anyways this is good news.  
Will make it easier to do acceptance testing for customers: A lot of times right now we have to give them multiple ginkgo incantations to run w/ sonobuoy - but if it's all in conformance, we can just run conformance, and nothing else.

I guess this would fundamentally change the "Definition" of Kubernetes, since iirc the definition of k8s used to be "the conformance tests"......... so, is it expected to take a long time to get these merged? 

Antonio Ojea

May 15, 2023, 4:59:15 PM
to jay vyas, kubernetes-sig-network

How are you getting that list? There are 16 Conformance tests for Services: https://testgrid.k8s.io/conformance-kind#conformance,%20master%20(dev)%20%5Bnon-serial%5D&include-filter-by-regex=Services

Kubernetes e2e suite.[It] [sig-network] Services should be able to create a functioning NodePort service [Conformance]
Kubernetes e2e suite.[It] [sig-network] Services should be able to switch session affinity for NodePort service [LinuxOnly] [Conformance]
Kubernetes e2e suite.[It] [sig-network] Services should be able to change the type from ClusterIP to ExternalName [Conformance]
Kubernetes e2e suite.[It] [sig-network] Services should be able to change the type from ExternalName to ClusterIP [Conformance]
Kubernetes e2e suite.[It] [sig-network] Services should be able to change the type from ExternalName to NodePort [Conformance]
Kubernetes e2e suite.[It] [sig-network] Services should be able to change the type from NodePort to ExternalName [Conformance]
Kubernetes e2e suite.[It] [sig-network] Services should be able to switch session affinity for service with type clusterIP [LinuxOnly] [Conformance]
Kubernetes e2e suite.[It] [sig-network] Services should complete a service status lifecycle [Conformance]
Kubernetes e2e suite.[It] [sig-network] Services should delete a collection of services [Conformance]
Kubernetes e2e suite.[It] [sig-network] Services should find a service from listing all namespaces [Conformance]
Kubernetes e2e suite.[It] [sig-network] Services should have session affinity work for NodePort service [LinuxOnly] [Conformance]
Kubernetes e2e suite.[It] [sig-network] Services should have session affinity work for service with type clusterIP [LinuxOnly] [Conformance]
Kubernetes e2e suite.[It] [sig-network] Services should provide secure master service [Conformance]
Kubernetes e2e suite.[It] [sig-network] Services should serve a basic endpoint from pods [Conformance]
Kubernetes e2e suite.[It] [sig-network] Services should serve multiport endpoints from pods [Conformance]
Kubernetes e2e suite.[It] [sig-network] Services should test the lifecycle of an Endpoint [Conformance]
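
A list like this can be derived by filtering test names on their tags. Here is a minimal sketch, assuming the test names are available as plain strings. The first four sample names are copied from the list above, the last one is included as an illustrative non-Conformance name, and the helper names `conformance_service_tests` and `covered_type_changes` are made up for this example rather than part of any Kubernetes tooling.

```python
import re

# Sample input: four names from the list above plus one name
# without the [Conformance] tag, added for illustration.
TEST_NAMES = [
    "[sig-network] Services should be able to change the type from ClusterIP to ExternalName [Conformance]",
    "[sig-network] Services should be able to change the type from ExternalName to NodePort [Conformance]",
    "[sig-network] Services should serve a basic endpoint from pods [Conformance]",
    "[sig-network] Services should have session affinity work for NodePort service [LinuxOnly] [Conformance]",
    "[sig-network] Services should prevent NodePort collisions",
]

# A Services test that carries the [Conformance] tag.
SERVICES_CONFORMANCE = re.compile(r"\[sig-network\] Services .*\[Conformance\]")
# Captures the source and target type in "change the type from X to Y".
TYPE_CHANGE = re.compile(r"change the type from (\w+) to (\w+)")


def conformance_service_tests(names):
    """Keep only the Services tests tagged [Conformance]."""
    return [n for n in names if SERVICES_CONFORMANCE.search(n)]


def covered_type_changes(names):
    """Return the (from_type, to_type) pairs exercised by Conformance tests."""
    pairs = set()
    for name in conformance_service_tests(names):
        match = TYPE_CHANGE.search(name)
        if match:
            pairs.add((match.group(1), match.group(2)))
    return pairs


if __name__ == "__main__":
    print(len(conformance_service_tests(TEST_NAMES)))
    print(sorted(covered_type_changes(TEST_NAMES)))
```

Run against the full list, the second helper would show which Service type conversions already have a Conformance test and which do not.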



jay vyas

May 15, 2023, 5:43:35 PM
to Antonio Ojea, kubernetes-sig-network
Ah yeah. By "service" I was thinking of "tests that fail if the svc proxy is broken or becomes broken after cluster creation."... as a way of determining "kube proxy"  conformance for third party proxy impls.... 

In any case... I can try to recalculate that empirically.... but in general the q I had is...


1) How many sig-net tests are you going to promote ? 

Only curious bc i know conformance promotion is kinda a big deal.... bc it changes the definition of "certified k8s".... 
Which then has impacts in vendors who ship k8s in non standard ways......

Antonio Ojea

May 15, 2023, 6:10:34 PM
to jay vyas, kubernetes-sig-network
The ones on the agenda of yesterday's sig-net meeting.