Feature gates of CPU Manager Policy Options


Swati Sehgal

Sep 23, 2021, 11:37:48 AM
to kubernetes-sig-architecture, Francesco Romani

Hi all,


This discussion started in the context of KEP-2625 [1] graduating to beta. It was discussed on 2021-09-22 in the SIG Arch Production Readiness subproject group, where it was suggested that we move it to this forum.


The questions boil down to:

  1. When adding new fields to a Beta/stable kubelet API, should each new field be guarded by its own feature gate?


  2. Can we make sure that new API fields go through the proper graduation process while avoiding the need to add a new feature gate for every new API field? Managing so many fine-grained feature gates is cumbersome for both developers and cluster operators.


Some context: KEP-2625 [1] wants to add cpumanager policy options. The cpumanager policy options:

  • Will behave like new API flags:

    • "[the value of cpuManagerPolicyOption is] A set of key=value CPU Manager policy options to use, to fine tune their behaviour. If not supplied, keep the default behaviour."

  • Affect only the kubelet configuration

  • Are meant to be set by the cluster admin

  • Are meant to be composable, meaning you can have any combination of them.


We added the infrastructure supporting them and the first policy option in 1.22. In 1.23, we are starting to add more [3]. We expect to add more in the near future.


Possible approaches:

  1. Creating a feature gate per policy option.

This aligns with the current process of any new feature introduced and the well known graduation process. 

However, this can be cumbersome from a sys admin's perspective, as they would have to configure a feature gate in addition to the policy option in the kubelet config. Furthermore, extra effort is needed to track the feature gates and the maturity level of each cpumanager option.

  2. Introducing feature gates gating groups of policy options.

It was proposed to add CPUManagerPolicyExperimentalOptions that would be gating all the experimental options.

This way, we gate groups of cpumanager options per their maturity level.

This means anyone who wants to experiment with the newly introduced policy options would have to enable only one feature gate, i.e. CPUManagerPolicyExperimentalOptions, to gain access to them. Individual policy options must still be explicitly enabled in the configuration.
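As a rough illustration of approach 2 (a sketch only, not code from the KEP or the kubelet), the check could look something like the Go below. The option name, the `validateOptions` helper, and the error text are all hypothetical; the only idea taken from the proposal is that one group gate guards every experimental option.

```go
package main

import (
	"fmt"
	"sort"
)

// experimentalOptions lists the options hidden behind the single group gate.
// The option name is a hypothetical placeholder.
var experimentalOptions = map[string]bool{
	"hypothetical-experimental-option": true,
}

// validateOptions rejects experimental options unless the group gate is
// enabled. Options outside the experimental group are always accepted here.
func validateOptions(requested map[string]string, groupGateEnabled bool) error {
	names := make([]string, 0, len(requested))
	for name := range requested {
		names = append(names, name)
	}
	sort.Strings(names) // deterministic error reporting
	for _, name := range names {
		if experimentalOptions[name] && !groupGateEnabled {
			return fmt.Errorf("option %q requires the experimental options feature gate", name)
		}
	}
	return nil
}

func main() {
	cfg := map[string]string{"hypothetical-experimental-option": "true"}
	fmt.Println(validateOptions(cfg, false)) // rejected: the gate is off
	fmt.Println(validateOptions(cfg, true))  // accepted: prints <nil>
}
```

Note that the gate only decides whether an option may be requested; the option still has to appear in the configuration to have any effect.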


[1] CPU Manager Policy Options with a policy option full-pcpus-only to reject non SMT-aligned workload (KEP, k/k PR)

[2] KEP-2625: Update CPU Manager Policy Options 1.23 Beta #2933

[3] KEP-2902: Add CPUManager policy option to distribute CPUs across NUMA nodes instead of packing them #2904


--

Kind Regards

Swati Sehgal and Francesco Romani,

Red Hat






John Belamaric

Sep 23, 2021, 1:39:35 PM
to Swati Sehgal, kubernetes-sig-architecture, Francesco Romani
Thanks Swati.

The question here is simply whether to deviate from the standard feature gate practice to avoid a proliferation of feature gates. For most features, the cluster operator enables a feature gate, and then ordinary users are empowered to consume the feature. That is, the feature gate turns OFF the functionality for OTHER users.

In this case, the feature gate controls availability of option values for a kubelet flag. So, for the most part it is the same user enabling the feature gate as the one consuming the feature. Since it's the same user, they can turn OFF the feature by simply not using the option. So in that sense, a feature gate is not really necessary. However, there is value to the feature gate even in this case; it is simply to ensure the cluster administrator is positively acknowledging that they are enabling a potentially unstable option. Without the feature gate, they may use an option that they are not aware is unstable. This value, however, does not require a separate feature gate for each and every option value that comes along. Instead, this can be achieved with a single feature gate that hides all "unstable" options.

The downside is that this is different from every other feature gate, which has its own potential for confusion.

John



Derek Carr

Sep 23, 2021, 2:11:00 PM
to John Belamaric, Swati Sehgal, kubernetes-sig-architecture, Francesco Romani
Thanks Swati and John.  I know I had raised this pattern as a concern during the KEP review, but I think the explanation provided is a reasonable one.

David Eads

Sep 23, 2021, 2:46:38 PM
to Derek Carr, John Belamaric, Swati Sehgal, kubernetes-sig-architecture, Francesco Romani
The approach seems workable, but I think it's reasonable to have a FeatureGate for beta-level options as well.  FeatureGates are discoverable, and it is possible to disable all beta features using this information to ensure that only GA features are being used.  Adding a `CPUManagerPolicyBetaOptions` flag will allow that same assurance of, "I didn't accidentally use a beta feature".

Wojciech Tyczyński

Sep 23, 2021, 2:52:54 PM
to David Eads, Derek Carr, John Belamaric, Swati Sehgal, kubernetes-sig-architecture, Francesco Romani
 I agree with David.

 During yesterday's call I wasn't fully convinced about the idea of hiding a couple of options behind a single FeatureGate, but the argument that it is effectively the same person (the cluster admin) who both enables or disables the feature gate and sets the flag value convinced me.

 But I think that, in the same way we want Alpha to be disabled by default, we may want to be able to disable all beta feature gates. Disabling beta features is getting a bit more traction, and ensuring that I won't use a beta option when I explicitly disable all beta features is a requirement that we should have.
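To make the alpha/beta split concrete, here is a minimal Go sketch of two aggregated gates with the usual defaults (alpha off, beta on). The `gates` type, its field names, and the `available` method are hypothetical stand-ins for illustration, not the kubelet's actual feature gate machinery.

```go
package main

import "fmt"

// maturity is the graduation level of a policy option.
type maturity int

const (
	alpha maturity = iota
	beta
	stable
)

// gates models one aggregated gate per maturity level.
// Field names are hypothetical.
type gates struct {
	AlphaOptions bool
	BetaOptions  bool
}

// defaultGates follows the usual convention: alpha off, beta on.
func defaultGates() gates {
	return gates{AlphaOptions: false, BetaOptions: true}
}

// available reports whether options at the given maturity level may be used.
func (g gates) available(m maturity) bool {
	switch m {
	case alpha:
		return g.AlphaOptions
	case beta:
		return g.BetaOptions
	default:
		return true // stable options need no gate
	}
}

func main() {
	g := defaultGates()
	fmt.Println(g.available(alpha), g.available(beta), g.available(stable))

	// An operator who wants GA-only behaviour turns the beta group off.
	g.BetaOptions = false
	fmt.Println(g.available(beta), g.available(stable))
}
```

With the beta group gate off, no beta option can be selected by accident, which is the assurance being asked for here.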

John Belamaric

Sep 24, 2021, 9:18:03 AM
to Swati Sehgal, David Eads, Derek Carr, Francesco Romani, Wojciech Tyczyński, kubernetes-sig-architecture
Sounds good to me.

On Fri, Sep 24, 2021 at 3:07 AM Swati Sehgal <swse...@redhat.com> wrote:
Thanks everyone for chiming in on this.

Adding `CPUManagerPolicyBetaOptions` to provide the ability to disable Beta options seems reasonable to me. Given the feature gate name `CPUManagerPolicyBetaOptions`, I was thinking it would probably be more appropriate to rename `CPUManagerExperimentalPolicyOptions` to `CPUManagerPolicyAlphaOptions`. It would align with the known behaviour of alpha and beta features and avoid potential confusion.

What do you think?

Regards,
Swati

Swati Sehgal

Sep 24, 2021, 10:07:33 AM
to Wojciech Tyczyński, David Eads, Derek Carr, John Belamaric, kubernetes-sig-architecture, Francesco Romani
Thanks everyone for chiming in on this.

Adding `CPUManagerPolicyBetaOptions` to provide the ability to disable Beta options seems reasonable to me. Given the feature gate name `CPUManagerPolicyBetaOptions`, I was thinking it would probably be more appropriate to rename `CPUManagerExperimentalPolicyOptions` to `CPUManagerPolicyAlphaOptions`. It would align with the known behaviour of alpha and beta features and avoid potential confusion.

What do you think?

Regards,
Swati


David Eads

Sep 24, 2021, 11:29:27 AM
to John Belamaric, Swati Sehgal, Derek Carr, Francesco Romani, Wojciech Tyczyński, kubernetes-sig-architecture
Sounds good to me too.

Derek Carr

Sep 24, 2021, 11:33:29 AM
to David Eads, John Belamaric, Swati Sehgal, Francesco Romani, Wojciech Tyczyński, kubernetes-sig-architecture
Sounds good to me as well.

Davanum Srinivas

Sep 24, 2021, 11:36:24 AM
to Derek Carr, David Eads, John Belamaric, Swati Sehgal, Francesco Romani, Wojciech Tyczyński, kubernetes-sig-architecture

Swati Sehgal

Sep 24, 2021, 12:23:44 PM
to Davanum Srinivas, Derek Carr, David Eads, John Belamaric, Francesco Romani, Wojciech Tyczyński, kubernetes-sig-architecture
Great, Thanks!

We'll update the PR based on the discussion here.

Regards
Swati

Jordan Liggitt

Jan 14, 2022, 11:54:54 AM
to kubernetes-sig-architecture
Sorry to chime in late, I just ran across this thread. Are aggregated alpha and beta feature gates reasonable? The main reason we have them is to be able to disable problematic behavior easily, right?

I'm wondering how the following scenario would play out with aggregated gates:
* If there's some beta CPU manager feature, to start using it, I would have to enable all beta CPU manager features.
* If some other CPU manager feature is later promoted to beta and is problematic for some reason, the only way I can disable it may be to disable all beta CPU manager features, including the one I was already successfully using, which may be disruptive to disable?

John Belamaric

Jan 14, 2022, 12:31:19 PM
to Jordan Liggitt, kubernetes-sig-architecture
+sig arch

Accidentally just replied to Jordan. 

On Fri, Jan 14, 2022 at 9:22 AM John Belamaric <jbela...@google.com> wrote:
These aren’t arbitrary feature gates, but rather sets of valid options. These options only go into effect if used. Additionally, IIRC these are kubelet flag options, so not open to the general user. The feature gate here is really more of a positive acknowledgment by the admin that they know they are using an alpha or beta feature. 

SWATI SEHGAL

Jan 14, 2022, 1:14:38 PM
to kubernetes-sig-architecture
Just to add to John's comment, the two feature gates (CPUManagerPolicyAlphaOptions and CPUManagerPolicyBetaOptions) are indicators of the maturity level of policy options. Enabling or disabling a feature gate makes the corresponding policy options available or unavailable for use in the CPUManagerPolicyOptions kubelet option. To answer your questions more specifically:
 
I'm wondering how the following scenario would play out with aggregated gates:
* if there's some beta CPU manager feature, to start using it, I would have to enable all beta CPU Manager features
When enabled (which it is by default), the CPUManagerPolicyBetaOptions feature gate makes all beta options available, but to enable an individual option the sys admin still has to specify it explicitly using the CPUManagerPolicyOptions kubelet option.
* If some other CPU manager feature is later promoted to beta and is problematic for some reason, the only way I can disable it may be to disable all beta CPU manager features, including the one I was already successfully using and is maybe disruptive to disable?
Disabling all beta CPU manager options (by disabling the CPUManagerPolicyBetaOptions feature gate) is not necessary: if we don't want to use an option, all we have to do is remove it from the CPUManagerPolicyOptions kubelet option.
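The scenario above can be sketched in Go as follows. This is an illustration under stated assumptions rather than kubelet code: `option-a` and `option-b` are hypothetical beta options, and `activeOptions` is a made-up helper. It shows that the gate decides what may be configured, while the configured list decides what is actually in use.

```go
package main

import "fmt"

// betaOptions is the set of options in the beta maturity group.
// Both names are hypothetical placeholders.
var betaOptions = map[string]bool{"option-a": true, "option-b": true}

// activeOptions returns the options that actually take effect.
// The gate decides what MAY be configured; the list decides what IS used.
func activeOptions(configured []string, betaGateEnabled bool) ([]string, error) {
	active := []string{}
	for _, name := range configured {
		if betaOptions[name] && !betaGateEnabled {
			return nil, fmt.Errorf("option %q needs the beta options gate", name)
		}
		active = append(active, name)
	}
	return active, nil
}

func main() {
	// The gate is on, but only option-a is listed, so option-b stays inactive.
	active, err := activeOptions([]string{"option-a"}, true)
	fmt.Println(active, err) // [option-a] <nil>
}
```

If `option-b` later turned out to be problematic, leaving it out of the list (or removing it) would be enough; the gate and `option-a` would stay untouched.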

Hope this answers your questions,
Regards
Swati