Compatibility Versions in k8s and features graduation

Sergey Kanzhelev

Oct 15, 2024, 3:34:46 PM
to jpb...@google.com, han...@google.com, lig...@google.com, jy...@google.com, apri...@google.com, siz...@google.com, kubernetes-si...@googlegroups.com, kubernete...@googlegroups.com

Hi,


As part of KEP-4330: Compatibility Versions in Kubernetes, k8s is now changing the way we graduate KEPs. Can you please share updated instructions on the expected way to graduate KEPs?


In SIG Node we like to graduate KEPs early in the code development process to leave enough reviewer cycles for new KEP development. If there are no instructions yet, when are they planned to be created?


Specific questions raised during the discussion of this in SIG Node:


  1. What is the plan to support e2e_node tests running in version emulation mode? Not all SIG Node features can be tested with the e2e/node tests.

  2. Can you give an example of how to convert tests that previously tested the feature gate disablement behavior? Our understanding is that since the feature gate is locked, those tests need to somehow be converted into emulation mode tests. Best practices and some test framework helper methods would be greatly appreciated.

  3. If you have a good example of a KEP being GA-ed under the new process, please share. If not, we can collaborate on one of the SIG Node KEPs. For example, these two:

    1. https://github.com/kubernetes/kubernetes/pull/126981#discussion_r1799779745

    2. https://github.com/kubernetes/kubernetes/pull/128046#discussion_r1799759192

  4. Are there any changes needed from the SIG Node side to better enforce the proper version skew? We envision much more confusion around this. At a minimum, it must be very well explained in the docs so we can link anybody to that explanation.


One question we discussed during the SIG Node meeting today is whether to keep the “dead code” around in kubelet. Since kubelet has no plans to support emulated versions, we can potentially delete the code from it on feature GA. However, to simplify reviews we will likely just keep the code in place. This creates a liability of “dead untested code” being present, as well as a risk that 3 versions down the road we break the feature that was GA-ed. But the risk is similar to what we have with other components. Once you have the general instructions up, we will update them specifically for kubelet development.


Again, the lack of clear documentation is now stalling a couple of PRs, and we would appreciate clear instructions that can be widely shared.


Sergey and SIG Node leads

Jeffrey Ying

Oct 15, 2024, 5:50:35 PM
to Sergey Kanzhelev, jpb...@google.com, han...@google.com, lig...@google.com, apri...@google.com, siz...@google.com, kubernetes-si...@googlegroups.com, kubernete...@googlegroups.com
Hi Sergey,

Thank you for bringing this up. We are in the process of updating the instructions for feature graduation, are tracking it via this issue: https://github.com/kubernetes/community/issues/8082, and aim to have them finalized this week.

To answer your questions:
1. Since kubelet does not support compatibility versions, no changes are planned for e2e_node tests.
2. Feature gate disablement tests for GA features are only required for unit and integration tests involving control plane components (components that support compatibility versions can emulate a version where the feature is disabled). Link for example.
3. We have two examples of GA PRs (CustomResourceFieldSelectors, LoadBalancerIPMode). SIG Node KEPs are a bit different since the kubelet is also affected, and collaborating on those two to set an example would be great!
4. We will update the documentation to clarify this.

Again, we would like to ensure a smooth process for feature developers and aim to have the updated documentation ready as soon as possible. If anyone runs into difficulties with compatibility versions, please don't hesitate to reach out.

Jeffrey

Tim Hockin

Oct 18, 2024, 1:20:19 PM
to Jeffrey Ying, Sergey Kanzhelev, jpb...@google.com, han...@google.com, lig...@google.com, apri...@google.com, siz...@google.com, kubernetes-si...@googlegroups.com, kubernete...@googlegroups.com
This came up in a discussion today.

We are asserting that kubelet is not (yet?) covered by version-compat,
right? As such, it seems legitimate to follow "old" practices for
gates which are node-only.

BUT....

In my experience, gates are generally configured the same across
components - end users should not really care which kube binaries need
to know about which gates. So if we remove a gate which WE know is
kubelet-only, we can still break users. I think.

E.g. For a gate's graduation:

* kube version X: Gate "FooBar" (only affects Kubelet) exists as BETA
and on by default.
* version X+1: Gate becomes GA and locked-on. Compat requires
emulation of X, X-1, X-2, so the gate is retained. Because kubelet is
out of scope, the gated logic can be removed.
* version X+2: Compat requires emulation of X+1, X, X-1, and the gate
was toggleable in X-1, so the gate is retained.
* version X+3: Compat requires emulation of X+2, X+1, X, and the gate
was toggleable in X, so the gate is retained.
* version X+4: Compat requires emulation of X+3, X+2, X+1, and the
gate was NOT toggleable in X+1, so the gate can be removed.
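
The retention rule in the graduation timeline above can be sketched as a small calculation. This is only an illustration, assuming minor versions are plain integers, that the emulation window is the three previous minors as described in this thread, and that the gate was toggleable in every version up to and including X. The names `emulatedVersions` and `canRemoveGate` are hypothetical and are not Kubernetes functions.

```go
package main

import "fmt"

// emulationWindow is the number of previous minor versions a binary is
// expected to emulate, per the thread (X+1 emulates X, X-1, X-2).
const emulationWindow = 3

// emulatedVersions returns the older minor versions that a binary at minor
// version v must be able to emulate, newest first. (Hypothetical helper.)
func emulatedVersions(v int) []int {
	out := make([]int, 0, emulationWindow)
	for i := 1; i <= emulationWindow; i++ {
		out = append(out, v-i)
	}
	return out
}

// canRemoveGate reports whether a gate may be dropped at minor version v,
// given the last minor version in which it was still toggleable.
func canRemoveGate(v, lastToggleable int) bool {
	for _, e := range emulatedVersions(v) {
		if e <= lastToggleable {
			return false // an emulated version could still toggle the gate
		}
	}
	return true
}

func main() {
	const x = 30 // arbitrary stand-in for "version X" (gate is beta, toggleable)
	for v := x + 1; v <= x+4; v++ {
		fmt.Printf("X+%d emulates %v, removable=%v\n",
			v-x, emulatedVersions(v), canRemoveGate(v, x))
	}
}
```

Under these assumptions the gate first becomes removable at X+4, which matches the last step of the timeline.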

E.g. For a deprecation:
* kube version X: Deprecation announced. Gate "FooBar" (only affects
Kubelet) is added as DEPRECATED but is on by default.
* version Y=X+N: Gate is changed to be off by default, but not locked.
* version Y+1: Wait.
* version Y+2: Wait.
* version Y+3: Prior to compat-version, this is where we would remove
the gate, right? But compat requires emulation of Y+2, Y+1, Y, and
the gate was toggleable in Y, so the gate must be retained. But
it's kubelet-only, so is it safe to remove?

For a control-plane gate it's obvious that we can lock the gate at Y+3
(and maybe sooner, different discussion) but we need to keep it until
Y+6 (for compat with Y+5, Y+4, Y+3, none of which are toggleable).

If we remove the gate at Y+3 (because kubelet is out of scope), we
still break compat because users pass the same set of gate-names to
all components. They may have, for example:

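```
// Gate settings a user might pass to every component, keyed by kube version.
var gates = map[KubeVersion]string{
	kubever(Y):   "foobar:true",
	kubever(Y+1): "foobar:false",
	kubever(Y+2): "foobar:false",
	kubever(Y+3): "", // gate removed, so nothing is passed
}
```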

If they assert compat-version Y+2, they will pass "foobar:false" to
kube-apiserver (*we* know it is meaningless, but they might not and
should not need to). apiserver will fail with "unknown gate: foobar".
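
A toy model of that failure, for illustration only: it assumes a component rejects any gate name it no longer declares, and it parses the informal "name:bool" notation used in this message rather than the real `--feature-gates` syntax. `applyGates` is a hypothetical helper, not the Kubernetes feature-gate API, and the error text simply mirrors the wording above.

```go
package main

import (
	"fmt"
	"strings"
)

// applyGates applies a comma-separated "name:bool" list to the set of gates
// a binary still declares. Unknown names are rejected. (Hypothetical helper.)
func applyGates(declared map[string]bool, spec string) error {
	if spec == "" {
		return nil
	}
	for _, item := range strings.Split(spec, ",") {
		name, value, ok := strings.Cut(item, ":")
		if !ok {
			return fmt.Errorf("malformed gate %q", item)
		}
		if value != "true" && value != "false" {
			return fmt.Errorf("invalid value %q for gate %s", value, name)
		}
		if _, known := declared[name]; !known {
			return fmt.Errorf("unknown gate: %s", name)
		}
		declared[name] = value == "true"
	}
	return nil
}

func main() {
	// A binary that dropped the declaration at Y+3.
	removed := map[string]bool{}
	fmt.Println(applyGates(removed, "foobar:false"))

	// A binary that kept the declaration even though the gated code is gone.
	retained := map[string]bool{"foobar": false}
	fmt.Println(applyGates(retained, "foobar:false"))
}
```

The first call returns an error and the second succeeds, which is the difference between removing the gate declaration and merely removing the gated code.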

So I think that ALL deprecations, even node-only, need to AT LEAST
retain the gate declaration. We can debate whether gated code
should be removed at Y+3 or Y+6.

* kube version X: Deprecation announced. Gate "FooBar" is added as
DEPRECATED but is on by default.
* version Y=X+N: Gate is changed to be off by default, but not locked.
* version Y+1: Wait.
* version Y+2: Wait.
* version Y+3: Gate becomes locked-off. Compat requires emulation of
Y+2, Y+1, Y, and the gate was toggleable in Y, so the gate is
retained.
* version Y+4: Compat requires emulation of Y+3, Y+2, Y+1, and the
gate was toggleable in Y+1, so the gate is retained.
* version Y+5: Compat requires emulation of Y+4, Y+3, Y+2, and the
gate was toggleable in Y+2, so the gate is retained.
* version Y+6: Compat requires emulation of Y+5, Y+4, Y+3, and the
gate was NOT toggleable in Y+3, so the gate can be removed.

Is that analysis right?


Siyuan Zhang

Oct 22, 2024, 12:33:01 PM
to Tim Hockin, Jeffrey Ying, Sergey Kanzhelev, jpb...@google.com, han...@google.com, lig...@google.com, apri...@google.com, kubernetes-si...@googlegroups.com, kubernete...@googlegroups.com
Thanks Tim for the analysis. I agree we should keep the gate declaration for at least 3 versions after the locked version, except for alpha gates, and we can add a check in `verify-featuregates.sh` to ensure that.
Other than that, there should be no change in the sig-node development, since kubelet does not support the `--emulated-version` flag.
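
The kind of check described above could look roughly like the following. This is a sketch under stated assumptions, not the contents of `verify-featuregates.sh`: the `gate` record, its fields, and `violations` are hypothetical, and minor versions are modeled as plain integers.

```go
package main

import "fmt"

// gate is a hypothetical record of a feature gate's history; the data that
// the real verification script works with may look different.
type gate struct {
	Name      string
	Alpha     bool // alpha gates are exempt from the retention rule
	LockedAt  int  // minor version in which the gate became locked
	RemovedAt int  // minor version that dropped the declaration; 0 if still present
}

// retainAfterLock is the minimum distance between the locked version and the
// version that removes the declaration, per the rule stated in this thread.
const retainAfterLock = 3

// violations returns the names of gates whose declaration was removed fewer
// than retainAfterLock minor versions after they were locked.
func violations(gates []gate) []string {
	var bad []string
	for _, g := range gates {
		if g.Alpha || g.RemovedAt == 0 {
			continue
		}
		if g.RemovedAt-g.LockedAt < retainAfterLock {
			bad = append(bad, g.Name)
		}
	}
	return bad
}

func main() {
	gates := []gate{
		{Name: "FooBar", LockedAt: 31, RemovedAt: 34},   // declared in 31, 32, 33
		{Name: "TooEarly", LockedAt: 31, RemovedAt: 32}, // removed one version after lock
		{Name: "AlphaOnly", Alpha: true, RemovedAt: 32}, // exempt
	}
	fmt.Println(violations(gates))
}
```

With this sample data only `TooEarly` is reported; a gate locked at Y+3 and removed at Y+6 passes, consistent with the deprecation timeline earlier in the thread.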