
[KCE] Kubernetes Certified Extensions (Tentative Name) - Proposal


Shane Utt

Jul 3, 2024, 11:51:23 AM
to kubernetes-sig-network

Hello my fellow SIG Network folks!

As many of you know, with the Gateway API project, we pioneered a path for building effectively "official" extension APIs using CRDs that are hosted and supported upstream, but remain optional and not part of core.

The Gateway API has demonstrated the benefits of this approach, and since then, other projects like Network Policy have followed suit, using it as a model for their own APIs. Recently, the Multi-Network project has also decided to move out-of-tree and adopt a similar strategy.

While it's possible that APIs developed this way could eventually be integrated back into core, it's clear that a trend is emerging. Given this trend, I believe it is crucial to consider the impact on users: at a minimum, we should create a standardized method for developing, delivering, and using these kinds of APIs to ensure a consistent user experience.

Therefore, I propose that we develop either an official standard or a guide, drawing from the work already established by the Gateway API and Network Policy projects. This could potentially include a template repository with "start here" documentation for new projects. However, I think it's more important to align on the "what" and "why" before diving into the "how."

If you agree with the following:

  • What: Creating a standard or guide for developing APIs in a similar way to what we have done organically with Gateway API and Network Policy.
  • Why: Because a trend for this has emerged, and a consistent user experience is important.

Please show your support! If you have concerns or do not agree, please share your thoughts.

I recognize that this initiative may extend beyond just SIG Network, but I am starting here. If there is general support, my intention is to draft a KEP which covers the "how" and share it broadly across the organization for further feedback.

Looking forward to your feedback!

Shane

Mike Morris

Jul 3, 2024, 2:58:39 PM
to kubernetes-sig-network
+1

I don't think we've figured out a _perfect_ pattern for doing this in Gateway API yet, but aiming for a consistent approach across similarly-situated out-of-tree projects (thinking of SIG-Multicluster's MCS API as a potential next step beyond SIG-Network) could be a big benefit for end users.

-Mike

Costin Manolache

Jul 3, 2024, 3:44:53 PM
to Mike Morris, kubernetes-sig-network
While the Gateway API does define this pattern - I am not sure it is at all clear how it impacts users and if it is something that can be generalized.

Even for Gateway, there are big differences between implementations, and I don't think we've seen a lot of feedback from users who attempt
to move from one implementation to another, or who write Helm charts or manifests that include optional CRDs/features and attempt to use them
on different clusters with different implementations and feature sets.

So it would be great to be very very clear about the operational and security risks before marketing this as a 
best practice, in particular the core/optional fields and features in a CRD that may be used for security or critical operations, with 
a model of accepting the resource and expecting users to check the status to find which fields worked and which didn't. 
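
The accept-then-check-status model mentioned above can be sketched roughly as follows. This is a minimal Python illustration and not something from the thread: the `failed_conditions` helper, the sample object, and its message text are hypothetical, and the sample only mimics the `status.parents[].conditions` layout that Gateway API routes report.

```python
def failed_conditions(resource):
    """Collect (parent, type, reason, message) for every condition whose status is not "True"."""
    problems = []
    for parent in resource.get("status", {}).get("parents", []):
        parent_name = parent.get("parentRef", {}).get("name", "<unknown>")
        for cond in parent.get("conditions", []):
            if cond.get("status") != "True":
                problems.append(
                    (parent_name, cond.get("type"), cond.get("reason"), cond.get("message", ""))
                )
    return problems


# Illustrative object (assumption): a route that the API server stored,
# but that the implementation behind "example-gateway" did not fully honor.
route = {
    "kind": "HTTPRoute",
    "status": {
        "parents": [
            {
                "parentRef": {"name": "example-gateway"},
                "conditions": [
                    {
                        "type": "Accepted",
                        "status": "False",
                        "reason": "UnsupportedValue",
                        "message": "filter type not supported by this implementation",
                    },
                    {"type": "ResolvedRefs", "status": "True", "reason": "ResolvedRefs"},
                ],
            }
        ]
    },
}

for parent_name, cond_type, reason, message in failed_conditions(route):
    print(f"{parent_name}: {cond_type} is not True ({reason}): {message}")
```

The point of the sketch is that creating the object succeeds either way; whether an optional field actually took effect is only visible afterwards in the status, which is the operational risk being raised here.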

Costin

--
You received this message because you are subscribed to the Google Groups "kubernetes-sig-network" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kubernetes-sig-ne...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/kubernetes-sig-network/8bfe76a7-44a3-4dcc-a593-2ba76dbd1aa4n%40googlegroups.com.

Shane Utt

Jul 3, 2024, 4:20:06 PM
to kubernetes-sig-network
Thank you for highlighting the importance of considering the downsides and risks associated with this approach, Costin. These will definitely need to be detailed in our larger proposal. The community is aware, and has been sharing problems and issues with the CRD approach, particularly since Gateway API went GA. Somewhat ironically, I myself have several reservations about the CRD approach and still have some lingering desire to see more in-tree development in the future. That said, and in spite of my own issues and reservations, I am committed to ensuring that what we have here and now works as well as is feasible. Hopefully the high-level motivation itself sounds good, even if there are more details to iron out. We'd appreciate having you participate in a larger proposal and help us enumerate the downsides and risks.

Flynn -

Jul 3, 2024, 4:52:56 PM
to Shane Utt, kubernetes-sig-network
I definitely agree that exploring the idea is a great plan, including taking a good hard look at the downsides. ;)
-- Flynn

Benjamin Elder

Jul 3, 2024, 7:05:42 PM
to Shane Utt, kubernetes-sig-network
I think if you're going to call it anything like "Kubernetes Certified" this really should be going through SIG Arch as this will have overlap with approved APIs and the conformance program.

I do think this is a topic we should be talking about, just in a broader scope than sig-net.

+kubernetes-sig-architecture 

Costin Manolache

Jul 3, 2024, 7:31:01 PM
to Benjamin Elder, Shane Utt, kubernetes-sig-network
Very good point. How to test conformance across all permutations of optional fields, and the general usability and complexity of never knowing what you'll get,
are some of the serious problems.

I think Gateway is solving this by only testing the 'core' APIs - which are required, and vendors can implement any mix of optional APIs they want
and document it somewhere. Some core APIs also have optional fields - and it is not unreasonable - just the magnitude is different. I mostly
gave up on Gateway and am starting to appreciate Istio APIs far more...

Costin



Nick Young

Jul 3, 2024, 8:49:42 PM
to Costin Manolache, Benjamin Elder, Shane Utt, kubernetes-sig-network
I definitely agree that folks are starting to use Gateway API patterns (or at least derivations of them) in other spaces, so it's a good time to stop and think about what's worked and what hasn't. Definitely also agree that SIG-Arch should be deeply involved.

Costin said:
> I think Gateway is solving this by only testing the 'core' APIs - which are required, and vendors can implement any mix of optional APIs they want
> and document it somewhere.  Some core APIs also have optional fields - and it is not unreasonable - just the magnitude is different.

This is incorrect. Gateway API has three levels of API or field - Core, Extended, and Implementation-Specific. 

Core is required of all implementations, with behavior enforced with conformance tests.

Extended is optional, but has behavior enforced with conformance tests.

Implementation Specific is an acknowledgement that some behaviors will not be able to be common enough to be Extended (or Core), but still need to be mentioned in the spec, so that users of the API can make informed decisions.
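
To make the three levels concrete, here is a minimal Python sketch of how they affect which conformance tests apply. It is only an illustration of the rules described above: the feature names, the `FEATURES` catalogue, and the `required_tests` helper are hypothetical and are not taken from the Gateway API conformance suite.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Set


class SupportLevel(Enum):
    CORE = "core"
    EXTENDED = "extended"
    IMPLEMENTATION_SPECIFIC = "implementation-specific"


@dataclass(frozen=True)
class Feature:
    name: str
    level: SupportLevel


# Hypothetical feature catalogue, for illustration only.
FEATURES = [
    Feature("basic-http-routing", SupportLevel.CORE),
    Feature("header-matching", SupportLevel.CORE),
    Feature("request-mirroring", SupportLevel.EXTENDED),
    Feature("custom-filter", SupportLevel.IMPLEMENTATION_SPECIFIC),
]


def required_tests(claimed: Set[str]) -> List[str]:
    """Return the features whose conformance tests an implementation must pass."""
    tests = []
    for feature in FEATURES:
        if feature.level is SupportLevel.CORE:
            # Core: required of every implementation, always tested.
            tests.append(feature.name)
        elif feature.level is SupportLevel.EXTENDED and feature.name in claimed:
            # Extended: optional, but tested once an implementation claims it.
            tests.append(feature.name)
        # Implementation-specific: mentioned in the spec, no shared tests.
    return tests


print(required_tests(set()))
print(required_tests({"request-mirroring", "custom-filter"}))
```

An implementation that claims nothing extra is still held to every Core test, and claiming an Extended feature adds that feature's tests; claiming an Implementation-Specific feature adds none.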

This is exactly the sort of thing that we need to make clearer, evaluate, and educate people about - especially since even someone as familiar with the project as Costin clearly doesn't understand something so fundamental to how Gateway API works. Which is why I strongly support talking about what we have done in Gateway API more widely.

Rob, Shane and I have been attempting to talk about this in venues like the Gateway API meetings and Contrib Summit for some time, but it would be very useful to be able to do this in an environment that's more accessible to more people.

Nick

Evan Jones

Jul 3, 2024, 9:56:30 PM
to kubernetes-sig-network
+1 here. I'm particularly interested in lessons learned and pitfalls, though that level of detail is probably impractical for a "how to" guide.

Over in the Serving WG there's a nascent proposal to build a "cloud native AI gateway" to route, secure, and monitor traffic for self-hosted and 3rd party LLMs. Some folks have proposed building on top of Envoy with the help of plugins. Others, namely Benjamin, who I see here, have raised questions about whether such a proposal is even in scope for the group. Regardless of appropriateness or the route taken, this sort of guide and the broader discussion invoked by its creation would be very helpful for steering that proposal in the right direction before it gets too far along.

Mattia Lavacca

Jul 4, 2024, 5:32:06 AM
to kubernetes-sig-network
Big +1. In the Gateway API project we learned many valuable lessons on how to develop a CRD-based out-of-tree API, and creating a standard model to centralize such information has great value in my opinion. As others have already pointed out, I think a cross-SIG discussion is worth it. 

Mattia

Adrian Moisey

Jul 6, 2024, 8:59:48 AM
to kubernetes-sig-network
+1 to this idea.

As an end-user the thought of having more "extra things I need to
manually install and maintain" is a scary one, but I totally see the
value that out-of-tree extensions bring to Kubernetes.

I'll definitely get involved with this KEP to see if I can help ensure
that we strike the right balance between being able to iterate and
move quickly, but also provide the stability and quality that users
expect.

Surya Seetharaman

Jul 6, 2024, 8:59:48 AM
to kubernetes-sig-network, Shane Utt
Huge +1.
Having a certain degree of consistency for out-of-tree APIs, with a common pattern for releasing the APIs, adding new features, graduating the features, etc., would be awesome.

Thanks Shane for starting this discussion.

Best Regards,
-SS



jay vyas

Jul 6, 2024, 3:37:03 PM
to Surya Seetharaman, kubernetes-sig-network, Shane Utt
Hola senor Shane and folks! 

I've been curious about this also. This whole extending and customizing of things is a big SW engineering question.... Maybe the CNCF can make it more standardized.

So, to that end: 

Is your problem statement 

1) "We have too many potential approaches to building extensions to the core K8s API"? 

OR is it 

2) "We have too many networking-related API extension mechanisms"?

Depending on the *problem statement* I think I have a follow on question.....

Jen Gao

Jul 7, 2024, 8:16:43 AM
to kubernetes-sig-network
Hi all,

This is a nice idea. I am very new to this area but am super excited to see the changes.  Please let me know if there is anything I can do. :)

My question is, if an organization has already created many CRDs in their infrastructure and those CRDs have the same names as the ones you've defined, what would happen? What are the implications if defining CRDs becomes a widespread practice?

Best Regards,

Jen

Nick Young

Jul 7, 2024, 10:36:42 PM
to Jen Gao, kubernetes-sig-network
Hi Jen and welcome!

> This is a nice idea. I am very new to this area but am super excited to see the changes.  Please let me know if there is anything I can do. :)

The absolute best thing you can do, _whatever_ your level of experience in the area, is to share your experiences and feedback. Folks who've been working in an area for a while can find it pretty hard to recreate the experience of being new, so feedback from new folks and people who are learning is absolutely invaluable!

> My question is, if an organization has already created many CRDs in their infrastructure and those CRDs have the same names as the ones you've defined, what would happen? What are the implications if defining CRDs becomes a widespread practice?

This is definitely a question to add to any group's to-do list.

The current state is that every CRD has to declare an API group as well as a Kind and Resource, and the combination of those must be distinct, or the CRD won't be able to be applied to the cluster. If two CRDs use the same Kind in different API groups, then users will need to disambiguate using the API group as well (so, something like `gateways.gateway.networking.k8s.io` instead of just `gateways`, for example).
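
As a rough illustration of that rule, the Python sketch below models two CRDs that share a Kind. It relies on the fact that a CRD's `metadata.name` has the form `<plural>.<group>`; the `CRD` tuple, both helper functions, and the `networking.example.com` group are hypothetical names used only for this example.

```python
from collections import defaultdict
from typing import NamedTuple


class CRD(NamedTuple):
    group: str
    kind: str
    plural: str

    @property
    def name(self) -> str:
        # A CRD's metadata.name is "<plural>.<group>".
        return f"{self.plural}.{self.group}"


def name_conflicts(crds):
    """CRD names that occur more than once; such CRDs cannot coexist in one cluster."""
    seen, dupes = set(), set()
    for crd in crds:
        if crd.name in seen:
            dupes.add(crd.name)
        seen.add(crd.name)
    return sorted(dupes)


def ambiguous_kinds(crds):
    """Kinds served by more than one API group; users must qualify these with the group."""
    groups = defaultdict(set)
    for crd in crds:
        groups[crd.kind].add(crd.group)
    return {kind: sorted(g) for kind, g in groups.items() if len(g) > 1}


crds = [
    CRD("gateway.networking.k8s.io", "Gateway", "gateways"),
    CRD("networking.example.com", "Gateway", "gateways"),  # hypothetical in-house CRD
]

print(name_conflicts(crds))
print(ambiguous_kinds(crds))
```

Here both CRDs can be installed side by side because their full names differ, but the bare Kind `Gateway` no longer identifies one of them, so manifests and commands need the group-qualified form.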

Definitely a big gotcha and something that folks need to be careful about!

Nick


Nick Young

Jul 7, 2024, 10:40:26 PM
to jay vyas, Surya Seetharaman, kubernetes-sig-network, Shane Utt
> Is your problem statement 
> 1) "We have too many potential approaches to building extensions to the core K8s API"? 
> OR is it 
> 2) "We have too many  networking related API extension mechanisms"?

tbh I think it's more like "Gateway API is the first Kubernetes API to be fully out of tree, other folks are starting to use Gateway API patterns, should we stop and think about which patterns work and which don't?"

Maybe a shorter way to say that is "What's the best way to build an official Kubernetes extension using CRDs?"

I don't think it's really correct to say there are too many approaches, because we've really only had one CRD-based extension that's found wide adoption to this point.

Nick

Costin Manolache

Jul 8, 2024, 1:50:20 PM
to Nick Young, Benjamin Elder, Shane Utt, kubernetes-sig-network
On Wed, Jul 3, 2024 at 5:49 PM Nick Young <ino...@gmail.com> wrote:

> Costin said:
> > I think Gateway is solving this by only testing the 'core' APIs - which are required, and vendors can implement any mix of optional APIs they want
> > and document it somewhere.  Some core APIs also have optional fields - and it is not unreasonable - just the magnitude is different.
>
> This is incorrect. Gateway API has three levels of API or field - Core, Extended, and Implementation-Specific.
>
> Core is required of all implementations, with behavior enforced with conformance tests.
>
> Extended is optional, but has behavior enforced with conformance tests.

Sorry for the confusing statement - I meant that as a user, the only guarantee is that the
'core' API is present and working. Of course, everything (core, optional, extended, and any other API or feature) is expected to have tests, and any 'open' API/feature/protocol (i.e. with multiple implementers) should have conformance testing; that's a given.

There is nothing wrong with having 'optional' features - or features that are blocked by 
policies, K8S and most protocols have those.

The fundamental problem with the Gateway API is that it's covering a huge surface - with
a wide variety of features and implementations. Pretty much all of the Internet is in scope, with no clear bounds - and a very low bar on adding more APIs (even adding APIs that are not compatible with a majority of existing implementations, or APIs that exist only in one implementation - usually Envoy). I am referring to the current requirement to have 3 implementations for an extended API.

There are other problems - probably will end up writing a doc - just because something may be done by a Gateway, like TCP, telemetry or authn/z -  doesn't mean it is in scope and should be provided as extended APIs, in particular when the use cases are far broader than gateways.


> I don't think it's really correct to say there are too many approaches, because we've really only had one CRD-based extension that's found wide adoption to this point.

I don't have the data, but I'm pretty sure there are MANY CRD extensions with wider adoption, and for Gateway it's the 'core' that has wide adoption - many of the 'optional' features (including GAMMA) are far from that and for good reasons.  CRD is the official way to extend K8S, and nothing is unique about Gateway except that it is considered 'core', is using k8s.io namespace and is governed by the k8s project.

Wide adoption is obviously not sufficient for something to be a 'certified extension' or be endorsed/adopted/allowed to use k8s.io branding - the owner of the CRD must want to
give ownership/governance to K8S, and the K8S WG must accept it based on its criteria.

Having 'best practices' around security (namespace isolation, etc.), status, etc. is great, and
some of the Gateway approaches are very good (but not perfect). Providing more guidance and review - and maybe a certification/conformance for all CRD providers - would be great too, but I think it would be pretty bad if Gateway or another 'certified' or 'core' WG started duplicating popular CRDs just because they're not 'core' or don't follow a particular model.

Costin



Daman Arora

Jul 9, 2024, 6:17:17 AM
to kubernetes-sig-network
+1

Looking forward to this becoming a de facto standard for out-of-tree API management.

Regards,
Daman Arora



Nick Young

Jul 10, 2024, 12:16:54 AM
to Costin Manolache, Benjamin Elder, Shane Utt, kubernetes-sig-network
On Tue, 9 Jul 2024 at 03:50, Costin Manolache <cos...@google.com> wrote:
> The fundamental problem with the Gateway API is that it's covering a huge surface - with
> a wide variety of features and implementations. Pretty much all of the Internet is in scope, with no clear bounds - and a very low bar on adding more APIs (even adding APIs that are not compatible with a majority of existing implementations, or APIs that exist only in one implementation - usually Envoy). I am referring to the current requirement to have 3 implementations for an extended API.

Yes, we acknowledge that Gateway API is covering a huge surface,
that's why we have the varying levels and only require three
implementations for something to be Extended. That means that features
that are useful to _some_ of the community can be included in the API
without having to spend the time and effort to ensure that things work
for _every possible use case_.

We started this whole thing with the idea of wanting to improve and
replace two things: Ingress and Service. We joined those two efforts
together because of the level of commonality between them.

I don't understand what you want us to do differently here. Should we
say "Oh, those features we told you were in scope four years ago are
now out of scope, lots of luck building your own thing?". At this
point, we've made commitments to include certain functionality - like
Layer 4 forwarding, that people are _expecting_ and _waiting for_.
Turning around now and saying "these things are now out of scope"
would lead people to (rightly) question what else we're going to
decide to cut.

> There are other problems - probably will end up writing a doc - just because something may be done by a Gateway, like TCP, telemetry or authn/z - doesn't mean it is in scope and should be provided as extended APIs, in particular when the use cases are far broader than gateways.

What we have learned from Ingress, in particular, is that if a thing
_can_ be done on a resource, someone will go ahead and do it anyway.
We can say "that's not in scope of the API" until we are blue in the
face, but people _want_ and _need_ those features you've outlined, and
someone _will_ build it in the API. If we don't track the things
people are asking for in the upstream API and work on doing them, then
we will end up back in the Ingress situation, where there are 100
implementations and 100 different ways of configuring the same thing.

TCPRoute in particular is a great example. Ingress was _never_
designed to handle this, but _many_ implementations have configuration
available in unstructured, untyped annotations to configure your
Ingress to forward TCP. So we can change our mind, break our promise
and rule this out of scope, and everyone will end up doing individual
solutions with no portability or standardisation at all.

For better or worse, Gateway API is where people expect to see these
features, and that is not changeable now without a _lot_ of effort.

>
> > I don't think it's really correct to say there are too many approaches, because we've really only had one CRD-based extension that's found wide adoption to this point.
>
> I don't have the data, but I'm pretty sure there are MANY CRD extensions with wider adoption, and for Gateway it's the 'core' that has wide adoption - many of the 'optional' features (including GAMMA) are far from that and for good reasons. CRD is the official way to extend K8S, and nothing is unique about Gateway except that it is considered 'core', is using k8s.io namespace and is governed by the k8s project.

I guess I mistyped here.

I _meant_ to say "we've really only had one CRD-based extension in the
`k8s.io` API group, with full Kubernetes API review processes
in place". The very reason Shane raised this to begin with is that, as
you say "it is considered 'core', is using k8s.io namespace and is
governed by the k8s project."

Would this proposed working group have the right to go to Istio and
say "you have to stop doing what you are doing"? Of course not, that's
ridiculous. What it could, and I would argue should, do is to say
"Hey, if you want to develop a k8s.io CRD extension with Kubernetes
API review, and have your project as a Kubernetes subproject, here are
the things you should do". What are those things? Some may be similar
to what we do in Gateway API, some may be improvements. I don't know,
that's why a Working Group is important.

> Having 'best practices' around security (namespace isolation, etc.), status, etc. is great, and
> some of the Gateway approaches are very good (but not perfect). Providing more guidance and review - and maybe a certification/conformance for all CRD providers - would be great too, but I think it would be pretty bad if Gateway or another 'certified' or 'core' WG started duplicating popular CRDs just because they're not 'core' or don't follow a particular model.

I would argue that if there exists an upstream CRD that duplicates the
functionality of some existing implementation-specific CRD, then
that's probably a win for that implementation in the long term,
because once you support it and get your users to migrate over there's
less friction to moving to your implementation. (Less friction moving
from it as well, but that encourages implementations to compete on
features and performance rather than using implementation-specific
things that generate lock-in).

As always, I appreciate the feedback, but I think at this point, it's
probably better that we both just show up for the meetings of this
working group once it starts, and leave this discussion for a
higher-bandwidth arena.

Nick

Shane Utt

Jul 10, 2024, 11:41:37 AM
to kubernetes-sig-network
Thanks everyone for your feedback. So far the feedback appears to be very positive overall, and I've discussed the topic in SIG Network community syncs where people seemed very supportive as well.

You'll note that in the original posting here, I used the word "guide". That language was very intentional: I think given the overwhelming support so far it makes sense to move forward and codify this, but I'm convinced now that the first step should be positioned more like a "guide" to get us started, and not a "standard" (at least for the first iteration). Essentially: "best practices" documentation which explains what's worked well in the past (and what's not worked well) for other projects so far, and includes acknowledgments of roads that haven't even been traveled yet (i.e. moving from CRD to core).

I'll give this a little bit more thought and soak time and then propose a written guide which all of us can collaborate on. Please do keep the feedback coming though in the meantime!

Jen Gao

Jul 22, 2024, 8:46:38 AM
to kubernetes-sig-network
Thanks Nick,

For the feedback: As a new member, I would love to find something I can start working on (aka break, lol) and build my own understanding, but I often find it challenging to find a suitable issue. For example, while I managed to set up the development environment, I struggled to identify an appropriate issue to tackle. I searched for issues labeled 'good first issue' in the Gateway API project, but they were all already assigned.

Jen

Tim Hockin

Jul 22, 2024, 10:59:41 AM
to Jen Gao, kubernetes-sig-network
Hi all,

I think I've been transparent about my "dream" of a Kubernetes where all of the APIs are CRDs. In my opinion, CRD is not complete until the majority of, or maybe all of, the core APIs can be represented as CRDs. So to that end, I endorse the continuing evolution of our understanding of how to do CRDs "correctly".

I want to be careful about using the word "standard" though. It has so much implied meaning and baggage. Capturing all of the lessons from Gateway and other pioneer APIs into a form that other projects can use, whether those are standard or entirely private, seems like a win.

To the topic of keeping past promises, obviously we should try to do that, and I'm not making any statement about things like TCP in particular. But. Every decision should be considered in the context of what we know right now, and sometimes that may mean we change our mind about what the scope really is. APIs are forever, and it seems downright human to evolve our thinking as we understand a problem space better.

Tim


Shane Utt

Aug 5, 2024, 8:27:29 AM
to kubernetes-sig-network
Given the support in this thread, I'm gathering folks who are interested in contributing to work on a guide. As others and I have noted, actually making a "standard" is something we might consider later, but for the moment the goal is to simply collect the guidance and best practices in a kubernetes/community document. If you're interested in collaborating on this, please reach out to me @shane on Kubernetes Slack!

Shane Utt

Oct 10, 2024, 6:30:48 AM
to kubernetes-sig-network
I've created https://github.com/kubernetes/community/pull/8104 as a follow-up to this thread. It proposes a guide doc for CRD-based implementations in our SIG Network community documentation, so we can continue this conversation there.

It's in draft with several TODOs so it's not actually ready for general review, but I'm interested in collaboration from others on it so please do feel free to reach out to me if you're interested and/or feel free to send PRs into my fork!