What if we eliminate the beta stage for new features?


de...@redhat.com

Aug 29, 2024, 1:33:49 PM
to kubernetes-sig-architecture

From the perspective of people outside the core project, I think this is the situation we’re already in.  Our beta is simply GA by a different name.


Currently, kube features (other than new API types) are enabled by default in production when they reach beta.  This property has a few notable repercussions:

  1. Testing must be completed prior to promotion to beta.

  2. The ecosystem starts relying on them in beta, not GA.

  3. Because the ecosystem starts relying on them, kube maintainers are very hesitant to change behavior.

  4. Because the ecosystem starts relying on them in beta, the kube maintainers almost never remove them.

  5. Because features are available in production clusters at beta, even well meaning contributors find it harder to justify efforts to complete capabilities not critical to their own use-cases.


Features that have complete testing, are enabled in production, are not subject to change, and are not removed are indistinguishable from GA.  We may continue to extend these features with new capabilities, but there isn’t a reason to expect the ecosystem or cluster-admins to treat our beta features differently from our GA features.


Let’s go directly from alpha to GA

We could go directly from alpha to GA, with the requirement that in the first release in which a new feature is enabled by default in production clusters, it must be possible to disable it via a feature gate.  The feature gate can then be locked to true (finished) in the first release in which the feature has zero changes.
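A minimal Go sketch of how gate resolution could behave under this proposal, assuming three hypothetical stages (`Alpha`, `GA`, `GALocked`). The `FeatureSpec` type and its `Enabled` method are illustrative only and are not the `k8s.io/component-base/featuregate` API.

```go
package main

import "fmt"

// Stage is a hypothetical lifecycle stage for the alpha -> GA model
// (there is deliberately no beta stage).
type Stage int

const (
	Alpha    Stage = iota // off by default; admins may opt in
	GA                    // on by default; admins may still opt out for one release
	GALocked              // on by default; the gate is locked to true
)

// FeatureSpec is an illustrative stand-in, not the real
// k8s.io/component-base/featuregate type.
type FeatureSpec struct {
	Name  string
	Stage Stage
}

// Enabled resolves a gate's effective value. A nil override means the
// admin did not set the gate, so the stage's default applies.
func (f FeatureSpec) Enabled(override *bool) (bool, error) {
	def := f.Stage != Alpha
	if override == nil {
		return def, nil
	}
	if f.Stage == GALocked && !*override {
		// Locked gates ignore attempts to disable them and report an error.
		return def, fmt.Errorf("feature %q is locked to true and cannot be disabled", f.Name)
	}
	return *override, nil
}
```

Under these assumptions, an alpha gate resolves to off unless an admin opts in, a newly GA gate resolves to on but still honors `false` during its first release, and a locked gate stays on and rejects any attempt to turn it off.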


Benefits

  1. For API types, when the feature is eligible for default enablement in production clusters, the API serialization will be stable at the same time.

  2. Features can be delivered faster since there isn’t an additional release for beta.

  3. The stage will match what we’re actually willing to do with it (enablement, breaking changes, removal).

  4. Features need to be complete before being enabled by default in production.
    I suspect our ecosystem and cluster-admins expect this to be the case today.  They expect testing to be complete and the capabilities of the feature to be present.


Downsides

  1. For API types, if we mess up things like defaulting or nesting or something else, we cannot fix it.
    This is not a real downside because we’re already in this state since we don’t notice we’ve messed up defaulting until the API is enabled in production by default.

  2. Features need to be complete before being enabled by default in production.  This makes it “harder” to add capability to a beta feature.

    We can introduce features that build on top of existing features.  This is what we do today, but today the additional capabilities added under a beta feature are enabled in production immediately, without any alpha stage.  If the new capabilities extending a beta feature break, a cluster-admin has no choice but to disable the entire feature, even the pieces that are stable and critical to the ecosystem.  That option is still available if we eliminate beta (immediate addition to a GA feature), but this also forces consideration of adding a new feature gate for the capability to selectively enable/disable new capabilities for users.


Do the benefits outweigh the downsides?

han...@google.com

Aug 29, 2024, 2:24:24 PM
to kubernetes-sig-architecture
It's an interesting idea, but couldn't we achieve the same thing with stricter rules regarding BETA features (i.e. mandate BETA features have stable API serialization)? Then the distinction between BETA and GA becomes whether a feature is locked to true or not, which is a meaningful difference to people who are "outside the core project".

Also, I'm not sure features are delivered faster, since today you basically get the feature when it goes to Beta, which is 1 release after Alpha (most of the time), which would be no different than when it goes to GA (in this proposal), 1 release after Alpha.

Shane Utt

Aug 29, 2024, 3:20:36 PM
to han...@google.com, kubernetes-sig-architecture
For a bit of context, and perhaps a little bit of precedent: in Gateway API in SIG Network we made a change to drop beta some time back for similar reasons: https://gateway-api.sigs.k8s.io/concepts/versioning/#rationale

From my perspective it has not been a source of trouble, but indeed seems to have provided some simplification.

--
You received this message because you are subscribed to the Google Groups "kubernetes-sig-architecture" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kubernetes-sig-arch...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/kubernetes-sig-architecture/989b6df4-1ff9-4d7d-b4b3-407590c4d401n%40googlegroups.com.

Brendan Burns

Aug 29, 2024, 3:20:37 PM
to de...@redhat.com, kubernetes-sig-architecture
I don't think that this is a good idea. While the behaviors (Beta is enabled by default, people are hesitant to change Beta) are true, there is also general hesitancy amongst users to depend too heavily on Beta APIs during the beta period (at least for the first few releases).

So while these APIs get used, they are not relied on at large scale for production usage by many customers (many enterprises forbid the use of Beta APIs for example).

We need a stage where there is additional mileage on the APIs for stabilization, without the "let's fling the doors wide open to usage" of GA.

Very few people enable Alpha APIs, so going directly from Alpha to GA is going to eliminate an important signal to users to "try, but don't buy" the beta APIs.

--brendan


Taahir Ahmed

Aug 29, 2024, 3:20:37 PM
to de...@redhat.com, kubernetes-sig-architecture
+1

Whenever I have thought about this, I think features boil down to two categories:
  • Features that are opt-in, for people to test.
  • Features that are on by default, and can no longer be changed without heavy breakage.
----

For what it's worth, this somewhat matches the GCP position on APIs.  We used to have a complicated, multistage process for API evolution, with alpha, beta, and GA versions of each API.  Several years ago, this was simplified into: all features go into the "GA" version by default.  A feature is considered "preview", and OK to change in a backwards-incompatible way, if you have placed some sort of technical control (typically an allowlist) to restrict the number of customers who can use it.  Once a feature has a large number of customers using it, even if it is still behind an allowlist, backwards-incompatible changes are no longer possible without running a formal deprecation process.

It does make things slower, I think, but it eliminates a lot of customer pain.

(I wasn't involved in any of these policies, I just have to live by them as someone implementing features)


Alvaro Aleman

Aug 29, 2024, 7:04:40 PM
to Shane Utt, han...@google.com, kubernetes-sig-architecture
> I don't think that this is a good idea. While the behaviors (Beta is enabled by default, people are hesitant to change Beta) are true, there is also general hesitancy amongst users to depend too heavily on Beta APIs during the beta period (at least for the first few releases).

Is there? Do you have some data that backs this up? In my personal and obviously anecdotal experience, APIs get used as soon as they are available and no one really cares that there is a `beta` in the group version.

Sandor Szuecs

Aug 30, 2024, 4:20:20 AM
to Shane Utt, han...@google.com, kubernetes-sig-architecture
We also had the Ingress case, which changed significantly in its details between beta and GA, so that it is now better typed and we get fewer errors at apply time in production environments.
Another case was the extensions API group.

So in general I am not in favor of dropping the beta phase.

Best, 

Sandor Szücs | 418 I'm a teapot



Antonio Ojea

Aug 30, 2024, 5:44:21 AM
to Sandor Szuecs, Shane Utt, han...@google.com, kubernetes-sig-architecture
I think that most of us agree that we have a problem with the current model, but I personally don't know what the best solution is.

IMHO the new model has to solve the problem of the duality of feature gates and APIs. We are conflating the two, and this causes problems: if your feature depends on a new API, both have to move at the same time. Since beta APIs cannot be enabled by default, the feature also has to stay disabled, and you have to go to GA without ever having had your feature enabled by default. Graduating the API to GA while the feature gate is still beta sounds like a really bad option to me. And if something goes wrong and we need to downgrade a feature from GA to alpha, that is going to be a big problem too... Do we leave the APIs in GA?

On the execution side, I have questions about how PRR (production readiness review) will work, about testing, and about the impact on upgrades/downgrades and skew policies... and about how we are going to deal with technical debt. We have accumulated a considerable number of features that are stuck or moving very slowly; before moving to faster graduation we need to decide what we do with existing features, or we'll end up with two problems instead of one. See https://docs.google.com/document/d/1K7Rt4w497VWZlVHJfJ89L6oxQzPET4NvzJ6Jd7XZFtM/edit?usp=sharing for some previous analysis I did around this topic at the beginning of this year.

I think that the rigidity of the current model is preventing us from breaking ourselves. Customers want new features, but also smooth upgrades, and absolutely not at the cost of regressing in quality or performance... This is definitely a topic we should discuss heavily, but I see us far from being able to start implementing something... maybe a good unconference topic?

Wojciech Tyczyński

Aug 30, 2024, 5:50:19 PM
to de...@redhat.com, kubernetes-sig-architecture
I'm a little bit sceptical that what you're proposing here is an actual change. My mental model for it is:
(a) Beta is about quality: we already provide a level of quality that is production ready
(b) GA is about confidence: we already have some production experience, we have proved it's working, we have adjusted monitoring, playbooks, ... and we now have full confidence in the feature

You mentioned another thing, which is "completeness". For me it's an orthogonal thing. GA is not about completeness. Of course GA has to provide some real value (so it needs to solve some use case), and we need confidence that the extensions we can already predict will be needed can be done in a backward-compatible way, but having them done is not a GA requirement, IMHO.
In general, I'm a fan of an iterative approach, so starting with some MVP (matching the criteria above) and then extending it with "subfeatures" is what in my opinion works best.

Going back to Beta/GA and your proposal, I think that with your proposal we're not getting away from the distinction I described above:
(a) Beta = production readiness - it's what you're calling GA
(b) GA = confidence - it's where you're saying that we lock to default [locking to default is exactly the evidence of our confidence]

So I think that this proposal is pretty much a "naming change" and it won't change much in terms of what is actually happening.

 thanks
wojtek


han...@google.com

Aug 30, 2024, 6:32:23 PM
to kubernetes-sig-architecture
I mostly agree with Wojtek's points about naming. 

I'm also a bit concerned about another implication of going straight to GA. Wouldn't an implication of this proposal be that any feature that wants to be default enabled would now just end up never being able to be removed? Our deprecation policy states "Features can be removed at any point in the life cycle prior to GA", which seems to imply then that we cannot remove any feature once it becomes default enabled if this proposal lands. 

Patrick Ohly

Sep 2, 2024, 4:47:04 AM
to de...@redhat.com, kubernetes-sig-architecture
"de...@redhat.com" <de...@redhat.com> writes:
> Downsides
>
> 1. For API types, if we mess up things like defaulting or nesting or
>    something else, we cannot fix it.
>    This is not a real downside because we’re already in this state since we
>    don’t notice we’ve messed up defaulting until the API is enabled in
>    production by default.

We might not notice while in alpha or beta because a feature doesn't get
used enough, but I don't agree that we cannot improve a feature with the
current model. We can do a v1beta2 to change defaulting and nesting
while preserving compatibility with v1beta1 through conversion and thus
can continue to support both. It's more work, but it's doable.

Then once the feature and API is GA, both beta APIs can (eventually...)
get removed. There are some problems (storage version...), but at least
conceptually it can be done.

> 2. Features need to be complete before being enabled by default in
>    production. This makes it “harder” to add capability to a beta feature.
>
>    We can introduce features that build on top of existing features. This
>    is what we do today, but today the additional capabilities added under a
>    beta feature are enabled in production immediately, without any alpha
>    stage.

Really? Extensions have their own feature gates, and the apiserver drops
the additional fields during create or update. So even if the base
feature is enabled by default, extensions typically aren't.
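A minimal sketch of the field-dropping pattern described above, assuming a toy `PodSpec` with a hypothetical gated field `Abracadabra`. The real apiserver registry code is more involved; this only shows the shape of the rule: drop the field when its gate is off, unless the stored object already uses it.

```go
package main

// PodSpec is a toy stand-in for an existing GA type that gained a new,
// separately gated field. The field name Abracadabra is hypothetical.
type PodSpec struct {
	Containers  []string
	Abracadabra *string // extension guarded by its own feature gate
}

// dropDisabledFields clears the gated field on create (oldSpec == nil) or
// update unless the gate is enabled or the existing object already used
// the field, so that updates do not silently wipe previously stored data.
func dropDisabledFields(newSpec, oldSpec *PodSpec, gateEnabled bool) {
	if gateEnabled || inUse(oldSpec) {
		return
	}
	newSpec.Abracadabra = nil
}

// inUse reports whether a (possibly nil) existing spec sets the gated field.
func inUse(spec *PodSpec) bool {
	return spec != nil && spec.Abracadabra != nil
}
```

The "already in use" check is what lets a cluster turn a gate off without breaking updates to objects written while it was on.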

I agree with Wojtek that it is better to deliver features
incrementally. Defining what it means to be "complete" may be possible
for very simple features, but not for complex ones. Even if it were
possible, the implementation would take longer, and during that time users
who would already be perfectly happy with a subset of the final
functionality would be kept waiting.

I prefer keeping beta. If we want to speed up development and adoption,
then treating beta more like GA by enabling beta API groups by default
seems better to me than skipping beta entirely.

This would imply reverting
https://github.com/kubernetes/enhancements/tree/master/keps/sig-architecture/3136-beta-apis-off-by-default

--
Best Regards

Patrick Ohly
Cloud Software Architect

Tim Bannister

Sep 3, 2024, 4:36:45 AM
to kubernetes-sig-architecture
I think beta is a valuable stage to include. For alpha APIs, the stability promises are right there in the manifest. Same for new APIs that reach beta. The hard part is actually new fields.

For example, v1.31 updated the Service API (not a new API; Service is enabled by default) to have a trafficDistribution field (now there by default, but you can disable it). If the transition were alpha to stable, we'd have to be really sure we got the alpha right first go. Once we mark any field generally available, we can't yank it without either bumping the major version of Kubernetes (lots of work) or breaking promises we made (lots of arguments).

Existing tooling may already have the logic around our stability guarantees coded in; existing humans may not read a note about a policy change around field stability.

Yes, new APIs are off by default - but we could make selective and time-limited exceptions. Take the IPAddress API; it's beta, so it's off by default. But we could enable it by default for v1.32 with the constraint that the subsequent minor release will either graduate it or make a new beta API version (which would either carry the same constraint or be off by default again).
We might also have an API server configuration option that controls sets of APIs, so that if you run with --live-dangerously=false then new beta APIs are off even if the default for that API is on.


If equipped with a time machine, we could require that writing an alpha or beta field for an object is only allowed when certain metadata fields are set (e.g. if .metadata.allowUnstableFieldWrites is true). We should have done that much earlier if that's what we wanted; it's still possible, provided we find a user-friendly path to get there - and time to staff the enhancement work. The benefit is that people using a stable API and an on-by-default beta field get a bit more of a warning, and perhaps a Warning: response header, about what they are letting themselves in for.

Tim Hockin

Sep 3, 2024, 1:44:36 PM
to Tim Bannister, kubernetes-sig-architecture
Patrick and Tim, 

Here's the problem as I see it. If beta is on by default, then people WILL use it. Some will know that they're using it, and some will not. Some will do it accidentally because they copied something from the Internet, and some will do it with intention. It doesn't matter why or how, they will use it.

If we subsequently change it, or rip it away, that causes pain. Pain. I once thought that this was reasonable - they opted into it after all.  But years of experience with real customers tell me that I was wrong. Breaking or EOLing betas is excessively painful and tedious for our dear users. So much so that we, as a project, have taken pain on ourselves to prevent them from using beta, and when they do, we very rarely make breaking changes, even if we like to pretend that we could.

TL;DR - everything we call beta is either off by default, effectively alpha, or on by default, effectively GA. 

Tim






Lubomir I. Ivanov

Sep 3, 2024, 1:44:36 PM
to de...@redhat.com, kubernetes-sig-architecture
beta could be recommended but optional, based on decisions during KEP review.

lubomir

Tim Hockin

Sep 3, 2024, 1:44:36 PM
to Brendan Burns, de...@redhat.com, kubernetes-sig-architecture
I am, perhaps not surprisingly, in pretty strong agreement with this
proposal, or some variant thereof. Almost 2 years ago I wrote
https://docs.google.com/document/d/1roVAHyF7eWZAccmCKw7MXYUNgx4BCDSXF2IMS8h10oY/edit?pli=1&resourcekey=0-x6Tw2qz1SpCIPhbec6Qa2A&tab=t.0
which I know David cited in Slack. I made many of the same arguments.

Since I wrote that doc, we've made a little progress on normalizing
gates, but the core thesis stands. It is too easy to accidentally use
beta features, to the extent that beta is almost always "scary GA".

Collected replies here.

> there is also general hesitancy amongst users to depend too heavily on Beta APIs

This does not match my experience. We do not require "enthusiastic
consent" from a user to engage a beta feature, and so many people use
beta things sometimes without knowing. Sometimes they use a thing
that claims to be GA, which itself uses beta APIs -- so end users
don't even know.

This is what I think David meant with "Because the ecosystem starts
relying on them, kube maintainers are very hesitant to change
behavior" and "Because the ecosystem starts relying on them in beta,
the kube maintainers almost never remove them". The risk-reward ratio
is poor and we rarely take the risk.

> We had also the ingress case which changed from beta to ga significantly

Ingress is a wonderful example of how NOT to run things. I don't
think it particularly supports or refutes anything here. It was just
SNAFU.

> I'm also a bit concerned about another implication of going straight to GA. Wouldn't an implication of this proposal be that any feature that wants to be default enabled would now just end up never being able to be removed?

IMO, yes. That's exactly the promise this proposal makes.


Tim Allclair

Sep 3, 2024, 3:08:08 PM
to Tim Hockin, Brendan Burns, de...@redhat.com, kubernetes-sig-architecture
+1 to Antonio - we should be separating out the feature gate & API discussions. Here is my proposed solution:

Feature gates don't have a beta: they are off-by-default (alpha) or on-by-default (GA). On by default means fully production ready.

APIs still have a beta, and a beta API requires a GA feature gate. The semantics of beta APIs then becomes:
- The feature is usable in production
- We are committed to carrying the feature and its behavior forward, i.e. we won't rip it out or drop functionality
- The _shape_ of the API is still subject to change (across API versions): fields might move, validation can change, defaults can change, etc.  I'm not sure how to reconcile this with field-level api changes though (i.e. adding a beta field to a GA API).
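One way to make the proposed pairing rule mechanically checkable, sketched in Go. `GateStage`, `apiLevel`, and `checkPairing` are hypothetical names, and deriving the stability level by substring match on the version string is a simplifying assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// GateStage models the proposal's two feature-gate states (no beta).
type GateStage string

const (
	GateAlpha GateStage = "alpha" // off by default
	GateGA    GateStage = "ga"    // on by default, production ready
)

// apiLevel derives a stability level from a version such as "v1",
// "v1beta2", or "v2alpha1" (simplified substring check).
func apiLevel(version string) string {
	switch {
	case strings.Contains(version, "alpha"):
		return "alpha"
	case strings.Contains(version, "beta"):
		return "beta"
	default:
		return "ga"
	}
}

// checkPairing enforces only the rule as stated in the proposal:
// a beta API version must be backed by a GA feature gate.
func checkPairing(gate GateStage, version string) error {
	if level := apiLevel(version); level == "beta" && gate != GateGA {
		return fmt.Errorf("%s API %q requires a GA feature gate, got %s", level, version, gate)
	}
	return nil
}
```

This covers the static pairing only; it does not address how code that depends on the API behaves when the gate is on but the beta API group is not being served.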

Brendan Burns

Sep 3, 2024, 4:45:28 PM
to Tim Allclair, Tim Hockin, de...@redhat.com, kubernetes-sig-architecture
I'm very supportive of Tim Allclair's solution.

The key points for me are:
  1. Can I as a cluster administrator easily restrict people from using Beta APIs? This is straightforward if APIs go through a v1betaN stage, because I can use policy to prevent APIs with that prefix.
  2. Do we make it clear to end-users which APIs have significant production mileage on them vs. things that just became live and have had minimal alpha production usage? Again, having APIs go through a beta phase makes this clear.
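The policy check described in point 1 could look roughly like this minimal Go sketch. `isBetaAPIVersion`, `admit`, and the namespace allowlist are assumptions for illustration; in a real cluster this logic would live in admission control (a webhook or policy engine), not in a standalone function.

```go
package main

import (
	"regexp"
	"strings"
)

// betaVersion matches Kubernetes-style beta version names such as
// "v1beta1" or "v2beta3".
var betaVersion = regexp.MustCompile(`^v[0-9]+beta[0-9]+$`)

// isBetaAPIVersion reports whether an apiVersion string such as
// "batch/v1beta1" names a beta version. Only the part after the last
// "/" is examined, so a bare version like "v1" also works.
func isBetaAPIVersion(apiVersion string) bool {
	version := apiVersion
	if i := strings.LastIndex(apiVersion, "/"); i >= 0 {
		version = apiVersion[i+1:]
	}
	return betaVersion.MatchString(version)
}

// admit is a hypothetical policy hook: reject beta API versions unless
// the cluster admin has allowlisted the namespace.
func admit(apiVersion, namespace string, allowlist map[string]bool) bool {
	return !isBetaAPIVersion(apiVersion) || allowlist[namespace]
}
```

A check like this only sees the `apiVersion`; a beta field added to a GA API (for example a new field on `Service` under `v1`) passes straight through, which is the gap several replies in this thread point out.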
I think the perspective expressed by Tim H. and David is biased towards the API reviewers' or Kubernetes community's perspective and doesn't consider the managed service provider or cluster administrator persona. Clearly the minute something is on by default, people will start using it in earnest, file bugs etc. so for maintainers, there's not much difference between beta and GA. However, it's important to realize that while beta APIs start getting used immediately, that usage is not monolithic across the Kubernetes user community; some people start using those APIs but many will wait until they get significantly more mileage. Eliminating the switch from beta API to GA API forces every user who wants to wait for mileage to keep track of a myriad of different APIs and how many releases they've been available for in order to understand what has been battle tested and what is a newly minted API.

--brendan



Tim Hockin

Sep 3, 2024, 7:10:29 PM
to Brendan Burns, Tim Allclair, de...@redhat.com, kubernetes-sig-architecture
If we want to break the problem apart, let's do it more completely.
We have not two but THREE categories of features (that come to mind,
maybe more)

1) Net new APIs, distinguished by the use of a v1alpha* or v1beta* api group
2) Extensions on existing APIs, which add one or more fields to one or
more GA API groups
3) Non-API features

One problem here is that type #1 and type #2 are managed differently,
but perceived by users to be the same "risk".

For features which are purely type #1, you can perhaps argue that
users opted-in to using a beta API because it says "beta" in the
`apiVersion`. I don't really buy this argument, but let's put that
aside for now.

We have a much higher volume of type #2 features (or a combo of type
#1 + type #2) than of pure type #1. For those features I will
re-phrase my main point: If a gate is enabled by default, then people
will use it. Some will know that they're using it, and some will not.
Some will do it accidentally because they copied something from the
Internet, and some will do it with intention. It doesn't matter why or
how, they WILL use it. As such, we are generally VERY reluctant to
abandon or schematically change these features. Once they are enabled
by default, they are effectively GA. While they are not enabled by
default, all alpha/beta represents is the risk of explosion. That's
not a meaningless thing to represent, but realistically the number of
people who enable off-by-default features is minuscule. In theory, if
we had a way for users to not ACCIDENTALLY use non-GA features, this
would matter a whole lot more. It would give cluster admins the
uber-power to decide if they want to allow a feature at all, and give
regular cluster users the ability to (more) safely and explicitly
engage the features.

Type #3 features are just like type #2 in that when they are enabled,
people will end up using them. They don't, however, have a clear way
to slowly opt-in, so I think that it's clear that once they are
enabled by default, they are GA.

Back to type #1. Few "features" are JUST a new API. They usually
have controllers or other logic that needs to be enabled, and often
have linkages into existing APIs (type #2). I don't personally see
how we could call a "whole feature" GA if it has a beta API. So if
the supposition is that an API could be beta while the rest of the
integration was "preview", I guess maybe. The problem is the same -
if lots of people use it, it is effectively "GA". If few people use
it, it may as well be "preview".

I understand the desire to want to get people to use a new API and
still reserve the right to change it. IMO, that's what preview is
for.

Benjamin Elder

Sep 3, 2024, 8:21:16 PM
to Tim Hockin, Brendan Burns, Tim Allclair, de...@redhat.com, kubernetes-sig-architecture
> +1 to Antonio - we should be separating out the feature gate & API discussions. [...]

+1

> [...] Here is my proposed solution:
>
> Feature gates don't have a beta: they are off-by-default (alpha) or on-by-default (GA). On by default means fully production ready.
>
> APIs still have a beta, and a beta API requires a GA feature gate. The semantics of beta APIs then becomes:
> - The feature is usable in production
> - We are committed to carrying the feature and its behavior forward, i.e. we won't rip it out or drop functionality
> - The _shape_ of the API is still subject to change (across API versions): fields might move, validation can change, defaults can change, etc.  I'm not sure how to reconcile this with field-level api changes though (i.e. adding a beta field to a GA API).

I think this makes sense.

----


> We have not two but THREE categories of features (that come to mind,
> maybe more)
>
> 1) Net new APIs, distinguished by the use of a v1alpha* or v1beta* api group
> 2) Extensions on existing APIs, which add one or more fields to one or
> more GA api group
> 3) Non-API features

IMHO we have at least one more somewhere between 2) and 3) regarding non-user-facing APIs such as CRI, where we need to coordinate the availability of new functionality between Kubernetes components and additional components which may or may not also relate to a new user-facing API. For example EventedPLEG, which doesn't have a user-facing API but does have CRI-API changes and a feature gate.

In the cases where we interact with external implementations of the feature, availability is sort of dual-gated on the Kubernetes feature gate AND the external implementation ...






Patrick Ohly

Sep 4, 2024, 1:38:45 AM
to Tim Allclair, Tim Hockin, Brendan Burns, de...@redhat.com, kubernetes-sig-architecture
Tim Allclair <timal...@gmail.com> writes:

> +1 to Antonio - we should be separating out the feature gate & API
> discussions. Here is my proposed solution:
>
> Feature gates don't have a beta: they are off-by-default (alpha) or
> on-by-default (GA). On by default means fully production ready.
>
> APIs still have a beta, and a beta API requires a GA feature gate.

How is that supposed to work when the feature gate is on (GA) but the
API is beta (off)?

Some code which sets up informers based on the feature gate will fail
when the API is not enabled.

Tim Bannister

Sep 4, 2024, 4:34:55 AM
to kubernetes-sig-architecture
Other projects require that you set a request header when using a non-GA (or similar jargon) API. What's our appetite for that?

As well as many people who'll use things without knowing, we have cohorts who would like to opt in. Some are using managed Kubernetes; others aren't but don't get to manage the control plane or nodes.

Tim (sftim)

David Eads

Sep 4, 2024, 8:32:57 AM
to Tim Hockin, Brendan Burns, Tim Allclair, kubernetes-sig-architecture
> Type #3 features are just like type #2 in that when they are enabled,
> people will end up using them.  They don't, however, have a clear way
> to slowly opt-in, so I think that it's clear that once they are
> enabled by default, they are GA.

This is a key point for me regarding type 2 and type 3 features.  Once they are enabled by default, I see the ecosystem building around them and us being extremely reluctant to break or make significant changes.  That state is indistinguishable from GA.


David Eads

Sep 4, 2024, 8:55:29 AM
to Tim Hockin, Brendan Burns, Tim Allclair, kubernetes-sig-architecture
It's been a lively discussion so far.  I've added this topic to the sig-arch agenda for tomorrow, Thursday, September 5 at 2pm eastern. Link to agenda with connection info here.

Tim Hockin

Sep 4, 2024, 11:46:03 AM
to David Eads, Brendan Burns, Tim Allclair, kubernetes-sig-architecture
> Other projects require that you set a request header when using a non-GA (or similar jargon) API. What's our appetite for that?

I'd call that "enthusiastic consent" and I think it would very much
change the equation, at least for type #1 and #2 features. We still
have type #3, where there is no API surface.

Hypothetically, suppose we add a new field `abracadabra` to PodSpec,
and we designate it "beta". Today, as long as the "Abracadabra" gate
is enabled (and it's beta so many/most installations have it on by
default), any user can specify `Pod.spec.abracadabra`. Suppose
instead we require such consent:

```
$ kubectl apply -f mything.yaml
error: pod.spec.abracadabra: forbidden: this is a beta field, and
requires explicit activation to use

$ kubectl --enable-beta apply -f mything.yaml
pod/mything created

$ kubectl get pod mything -o json | jq '.metadata.betaEnabled'
"true"
```

At least now people can't copy something from the internet and use a
beta field without knowing. Now if we want to get rid of it or change
the schema behind it (modulo other rules about not changing the type
of a given field name, which is a different discussion) we can at
least say "you knew it was beta".

There's still a risk that people use an operator or script which does
this without asking them, of course.
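The `abracadabra` example could be sketched on the server side roughly like this: find the beta fields a request sets, refuse them without consent, and stamp the stored object when consent was given. `BETA_FIELDS`, the `beta_enabled` argument (standing in for whatever `kubectl --enable-beta` would send) and the `metadata.betaEnabled` marker are hypothetical, taken from or invented for the example above; none of them exist in Kubernetes.

```python
import copy
from typing import Any, Dict, List, Set

# Hypothetical registry of beta field paths; "spec.abracadabra" is the
# made-up field from the example above.
BETA_FIELDS: Set[str] = {"spec.abracadabra"}


def beta_fields_used(obj: Dict[str, Any], prefix: str = "") -> List[str]:
    """Return dotted paths of beta fields that are set in obj.

    Only nested dicts are walked; lists (e.g. containers) are skipped
    to keep the sketch short.
    """
    found: List[str] = []
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if path in BETA_FIELDS and value is not None:
            found.append(path)
        elif isinstance(value, dict):
            found.extend(beta_fields_used(value, path))
    return found


def admit(obj: Dict[str, Any], beta_enabled: bool) -> Dict[str, Any]:
    """Reject beta fields without consent; otherwise return the object to store."""
    used = beta_fields_used(obj)
    kind = str(obj.get("kind", "object")).lower()
    if used and not beta_enabled:
        raise PermissionError(
            f"{kind}.{used[0]}: forbidden: this is a beta field, "
            "and requires explicit activation to use"
        )
    stored = copy.deepcopy(obj)
    if used:
        # Hypothetical marker recording that the user opted in.
        stored.setdefault("metadata", {})["betaEnabled"] = True
    return stored
```

Recording consent on the object is what would let maintainers later say "you knew it was beta"; it does not stop an operator or script from opting in on the user's behalf.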

>> > APIs still have a beta, and a beta API requires a GA feature gate.
>
>How is that supposed to work when the feature gate is on (GA) but the API is beta (off)?

Yeah, I agree. The distinction between beta api groups and beta
features is 100% about us. To our users, it's a distinction without a
difference. We should fix that, too.

Brendan Burns

Sep 4, 2024, 12:16:03 PM9/4/24
to Tim Hockin, David Eads, Tim Allclair, kubernetes-sig-architecture
As the maintainer of at least two client libraries, I think that an opt-in header would be painful. Everyone who has ever built a Kubernetes client would have to add support for that header in some way, which would force a rebuild of every client tool that needs to use the new header.

If we want to go down this road, we're way better off looking at tools like policy enforcement (aka admission control) than hacking something into the HTTP protocol.

--brendan 




Brendan Burns

Sep 4, 2024, 12:19:22 PM9/4/24
to Tim Hockin, David Eads, Brendan Burns, Tim Allclair, kubernetes-sig-architecture
If you wanted to implement the gate client-side in kubectl, that could work, if you assume that people who use custom clients are 'advanced' users.




Tim Hockin

Sep 4, 2024, 1:07:57 PM9/4/24
to Brendan Burns, David Eads, Tim Allclair, kubernetes-sig-architecture
I find the "but all the clients would need to be updated" part of the
appeal. We don't want people accidentally using beta features because
it too often ends in tears.

Brendan Burns

Sep 4, 2024, 1:35:12 PM9/4/24
to Tim Hockin, David Eads, Tim Allclair, kubernetes-sig-architecture
I think the trouble is (for example) "why did my GitHub Actions suddenly stop working?" "oh, just add this new --beta-is-ok flag" is going to cause a lot of pain for people.

Additionally, for tools like centralized GitOps where the tool necessarily spans a bunch of different users (who may want/need different flag settings) it's going to be even more painful.


--brendan


Jordan Liggitt

Sep 4, 2024, 2:23:43 PM9/4/24
to Brendan Burns, Tim Hockin, David Eads, Tim Allclair, kubernetes-sig-architecture
On Wed, Sep 4, 2024 at 1:35 PM 'Brendan Burns' via kubernetes-sig-architecture <kubernetes-si...@googlegroups.com> wrote:
> I think the trouble is (for example) "why did my GitHub Actions suddenly stop working?" "oh, just add this new --beta-is-ok flag" is going to cause a lot of pain for people.

I would expect any change we make to only be for future beta things, not existing ones, so I wouldn't expect anything currently working to break over this.

> Additionally, for tools like centralized GitOps where the tool necessarily spans a bunch of different users (who may want/need different flag settings) it's going to be even more painful.

My observation is that any opt-in mechanism controlled by the caller to ensure they know they are depending on unstable things is often opted into by generic pipeline layers so that anything a user throws into the pipeline will work.

Tim Hockin

Sep 4, 2024, 2:37:16 PM9/4/24
to Brendan Burns, David Eads, Tim Allclair, kubernetes-sig-architecture
> I think the trouble is (for example) "why did my GitHub Actions suddenly stop working?"

Oh, we can't break EXISTING users, just directional for new.


Tim Bannister

Sep 4, 2024, 3:01:29 PM9/4/24
to kubernetes-sig-architecture
For everything except the “type 3” features, we've also got the option to warn people when an API or field is on by default but still beta. For example, we could bring these about:

Warning: detected change to field .spec.imageSignature which is beta, but client did not specify beta changes are OK

or

Warning: request used beta API group apps/v2beta1, but client did not specify beta changes are OK

and maybe:

Warning: this collection includes objects that use beta field .spec.imageSignature


Downside: people might not see the warnings
Upside: bigger providers might not feel the need to suppress them.
Upside: people can try out the new shiny
Upside: we can say we did warn them

Swirling round my mind is the idea of a kubernetes.io/behavior-on-beta-use annotation; I don't like it though because the likes of Helm (and its charts, not Helm itself) will make it too easy to hide away from the people who might actually care.

I would be cautious about introducing warnings for anything that's already beta, though we might be able to do that even so, especially if the client is kubectl making a request that explicitly signals to the API server that it'd prefer to be warned!
Overall, though, I strongly align with only changing behavior for future beta things. I'm confident that erroring on beta use that previously worked will not go down well at all.

Tim (sftim)
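The warning-based alternative could look something like the sketch below, which builds the messages suggested above when a client has not opted in. `BETA_GROUP_VERSIONS`, `BETA_FIELDS`, `.spec.imageSignature` and `apps/v2beta1` are hypothetical, taken from the examples. The one real detail is the transport: the Kubernetes API server already returns warnings to clients as HTTP `Warning` headers with code 299, which is the format `as_warning_headers` produces.

```python
from typing import Any, Dict, List, Set

# Hypothetical registries, mirroring the example messages above.
BETA_GROUP_VERSIONS: Set[str] = {"apps/v2beta1"}
BETA_FIELDS: Set[str] = {".spec.imageSignature"}

SUFFIX = "but client did not specify beta changes are OK"


def _set_paths(obj: Any, prefix: str = "") -> List[str]:
    """List every dotted path (".spec.x") present in a nested dict."""
    paths: List[str] = []
    if isinstance(obj, dict):
        for key, value in obj.items():
            path = f"{prefix}.{key}"
            paths.append(path)
            paths.extend(_set_paths(value, path))
    return paths


def beta_warnings(api_version: str, obj: Dict[str, Any], client_opted_in: bool) -> List[str]:
    """Warn (rather than reject) when beta APIs or fields are used without opt-in."""
    if client_opted_in:
        return []
    warnings: List[str] = []
    if api_version in BETA_GROUP_VERSIONS:
        warnings.append(f"request used beta API group {api_version}, {SUFFIX}")
    # Simplification: any field present counts as a "change"; a real
    # implementation would diff against the stored object on update.
    for path in _set_paths(obj):
        if path in BETA_FIELDS:
            warnings.append(f"detected change to field {path} which is beta, {SUFFIX}")
    return warnings


def as_warning_headers(warnings: List[str]) -> List[str]:
    """Format as Warning header values (assumes no double quotes in the text)."""
    return [f'299 - "{w}"' for w in warnings]
```

Unlike the consent sketch, nothing is ever refused here, which is exactly the weakness raised in the replies: the request succeeds whether or not anyone reads the warning.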

Tim Hockin

Sep 4, 2024, 3:13:04 PM9/4/24
to Tim Bannister, kubernetes-sig-architecture
Don't get me wrong - I love the API warnings functionality, but it is
fundamentally weak, by design.

> Upside: we can say we did warn them

That may make _US_ feel good, but it doesn't help the fact that the
warnings are invisible to a lot of users.

> My observation is that any opt-in mechanism controlled by the caller to ensure they know they are depending on unstable things is often opted into by generic pipeline layers so that anything a user throws into the pipeline will work.

I agree that it's possible, even likely. We can't cover everything,
but we can try to raise the significance and awareness.

```
kubectl --allow-unstable-beta-stuff-which-will-jeopardize-your-cluster \
  apply -f danger_danger_danger.yaml
```

Brendan Burns

Sep 4, 2024, 3:40:28 PM9/4/24
to Tim Bannister, Tim Hockin, kubernetes-sig-architecture
I think you overestimate the degree to which people read the command line when they cut and paste things.

Once you subtract the people using 'automation' that opts in on their behalf (which, by the way, necessarily has to include tools like Terraform and Helm, among others) and the people cutting and pasting those warning flags from web pages, I think the number of people who actually see these warnings is pretty small.

Furthermore, protecting a cluster is the job of the cluster administrator, who can do this today via mechanisms like AdmissionControl and policy.

If we really want to go down this road, I think that testing it via a client-side feature in kubectl is a great way to get some mileage on it while minimizing the risk that we break anyone's automation or other tooling. It's easier to roll back that change also.

--brendan


Tim Hockin

Sep 4, 2024, 3:59:24 PM9/4/24
to Brendan Burns, Tim Bannister, kubernetes-sig-architecture
What sort of admission policy are you envisioning that would understand whether a request includes non-GA fields?

That information just is not represented anywhere in the request, only the backend really understands that.

Assuming there was a way to know that, I still feel pretty strongly that safe by default is better than open by default. If cluster admins want to allow beta features, they can.

Tim Bannister

Sep 4, 2024, 4:02:14 PM9/4/24
to kubernetes-sig-architecture
💭  this might tie into a really different topic: add-on management.

If you set up a cluster, you probably use tools rather than doing it the hard way. But few cluster setup tools offer a cloud-native (declarative, repeatable, low toil, good separation of concerns) way to specify and bring up add-ons. Even fewer help you change your mind once the cluster is in place.
  • A typical cluster comes with zero ValidatingAdmissionPolicies; I'd love to see a selection of these be a part of, or managed by, a default-on add on
  • A typical cluster comes with RBAC roles and rules baked in; I'd really prefer for those to be a default-on add on [mostly a separate story!]
  • You're likely to want CoreDNS; also a default-on add on?
  • A small menu of obvious but not mandatory add-ons would be champion
    • for example: (per current discussion) “beta APIs are forbidden by default”, perhaps with a helpful error message
    • for example: metrics-server
    • for example: descheduler; configured to deschedule nothing, by default(?)
    • [your obvious add-on here]
  • and, mostly out of scope for this message, there's things like network plugins to think of too
If the challenge is for cluster administrators to warn users, to detect when alpha / beta APIs are used, or to outright block which APIs are used where, we can make it easier to do the right thing. We don't have to build a complete solution and shouldn't try. And people who don't like it would be very welcome to set up a cluster that doesn't come with these opinions.

Without giving people an easy, really easy way to bootstrap a cluster that includes the opinions we collectively hold, people aren't likely to invest time and effort into ensuring their cluster does those things. Instead, we will see asks for in-tree changes that provide equivalent behavior, or that solve the same problem, because getting things in-tree is the path of least resistance right now.

If changing that story makes it easier to handle alpha → beta → stable graduations, I'd definitely push for more improvements in that area. If we don't think it helps, no worries. It's then a nice idea whose time hasn't come.

Tim (sftim)


Taahir Ahmed

Sep 6, 2024, 2:24:43 PM9/6/24
to Tim Bannister, kubernetes-sig-architecture
To give a concrete example, consider the ClusterTrustBundle type.  This is a new type (similar to a cluster-scoped ConfigMap), with some kube-apiserver-specific features and a Kubelet integration.  The kube-apiserver and kubelet features are controlled by separate feature gates.
  • In 1.27, it was added to certificates.k8s.io/v1alpha1, and both feature gates defaulted to off
  • In 1.32, we are migrating it to certificates.k8s.io/v1beta1, and both feature gates will default to on (I think).  Additionally, kube-controller-manager will write some default ClusterTrustBundle objects.
  • In 1.33(?), we will move it to certificates.k8s.io/v1, and both feature gates will default to on.
My friction points are:
  • I don't think we benefited from using the v1alpha1 API group.  Perhaps two or three people have kicked the tires on it using Kind.  It would have been the same if we directly added it to v1beta1 and defaulted both feature gates to off.
  • I'm worried about smoothly transitioning from the v1beta1 to v1 API groups.  I think it will have to actually be staged over at least three Kubernetes releases, where in release N we enable the kube-apiserver feature gate, then in release N+1 we move the kubelet and kube-controller-manager to use v1 types, then in N+2 we can remove the v1beta1 group.
The multiple-stage migration seems excessive, especially since no one anticipates changing the type between v1beta1 and v1, even in a backwards-compatible way.  Additionally, the forced migration from v1beta1 to v1 will effectively punish any external projects who want to integrate with the feature and help us get usage experience.
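The three-release staging can be checked with a toy model. The sketch below encodes which API versions the apiserver serves and which version clients (kubelet, kube-controller-manager) use in each release, then checks every client/apiserver pairing within an assumed skew window of one release in either direction. That window is a deliberate simplification, not the real Kubernetes version skew policy. Under it, the N / N+1 / N+2 plan passes and either compressed plan fails, which matches the staging described above.

```python
from typing import Callable, Set, Tuple

# Simplifying assumption for this sketch only (not the official Kubernetes
# version skew policy): during an upgrade, a component at release c may talk
# to an apiserver at release c - 1, c, or c + 1.
SKEW = 1

Plan = Tuple[Callable[[int], Set[str]], Callable[[int], str]]


def make_plan(serve_v1_at: int, switch_clients_at: int, drop_beta_at: int) -> Plan:
    """Build (served, used) functions for a v1beta1 -> v1 migration.

    Releases are integers; 0 stands for release N.
    """
    def served(release: int) -> Set[str]:
        versions: Set[str] = set()
        if release < drop_beta_at:
            versions.add("v1beta1")
        if release >= serve_v1_at:
            versions.add("v1")
        return versions

    def used(release: int) -> str:
        return "v1" if release >= switch_clients_at else "v1beta1"

    return served, used


def plan_is_safe(plan: Plan, releases: range) -> bool:
    """True if every client can reach every apiserver within the skew window."""
    served, used = plan
    for client in releases:
        for server in range(client - SKEW, client + SKEW + 1):
            if used(client) not in served(server):
                return False
    return True
```

For instance, `make_plan(0, 1, 2)` (serve v1 in N, switch clients in N+1, drop v1beta1 in N+2) is safe, while `make_plan(0, 0, 1)` fails because a release-N client using v1 can still reach a release N-1 apiserver that only serves v1beta1.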

In my perfect world, I would have added this type directly to v1, and committed to only making backwards-compatible changes to it.  Then, I would have used the feature gates to control access to it.  "Alpha" would have been when the feature gates were default-off, and adventurous souls could kick the tires.  "Beta" would have been when the feature gates were default-on (but could still be switched off in case of problems).  "GA" would have been when we removed the feature gates.
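This "perfect world" lifecycle can be modeled as a feature gate whose state alone determines the stage, while the type itself stays at `v1` throughout. `FeatureGate`, `stage`, `feature_enabled` and the gate name `ExampleFeature` are hypothetical; this is a model of the proposal, not how Kubernetes implements feature gates.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class FeatureGate:
    name: str
    default: bool


def stage(gate: Optional[FeatureGate]) -> str:
    """Alpha: default off. Beta: default on, can be switched off. GA: gate removed."""
    if gate is None:
        return "GA"
    return "beta" if gate.default else "alpha"


def feature_enabled(
    gates: Dict[str, FeatureGate], overrides: Dict[str, bool], name: str
) -> bool:
    """Effective state of a feature given registered gates and admin overrides."""
    gate = gates.get(name)
    if gate is None:
        # Gate removed: the feature is unconditionally on.
        return True
    # While the gate exists, cluster admins can flip it either way.
    return overrides.get(name, gate.default)
```

The API version never appears in the model because, in this proposal, it never changes; graduating is only a change to the gate's default and, eventually, its removal.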

Taahir





Davanum Srinivas

Sep 6, 2024, 5:31:16 PM9/6/24
to Taahir Ahmed, Tim Bannister, kubernetes-sig-architecture
Folks,

For those who could not make it to the sig-arch call yesterday, please see the recording (thanks for uploading it so quickly, John!).

https://youtu.be/gexIa6rnh08?list=PL69nYSiGNLP2m6198LaLN6YahX7EEac5g

thanks,
Dims

