[RFC] Installing CRDs at cluster startup


Nabarun Pal

May 4, 2021, 1:45:45 PM5/4/21
to kubernetes-sig...@googlegroups.com, kubernetes-si...@googlegroups.com, yuva...@gmail.com

Hi folks,

We hope that you all are safe and doing well.

Recently, we started exploring the problem of having certain CustomResourceDefinitions available for consumption right after the cluster becomes healthy. This was widely discussed around the first half of 2019 as the CRD Installation problem [1][2]. More recently, sig-multicluster presented at the sig-architecture meeting on 2021-03-25 [3], discussing the prospect of having a ClusterProperty type [4] either as a k/k builtin type or as a CRD. This could be solved if there were a way to make a certain set of CRDs available to cluster consumers as soon as the apiserver becomes healthy.

We have explored the problem space and come up with a POC [5] implementation.

The Proof of Concept takes in two different sources for the CRDs to be installed at startup. 

  1. A builtin set of CRDs packaged along with k/k using go:embed. These manifests follow the same release cadence as k/k.

  2. An optional user-provided directory located on the disk of the machine where the apiserver runs. This is exposed as a command-line flag on both kube-apiserver and apiextensions-apiserver.

Note that the installer takes any standard CRD spec, allowing us to reuse existing CRD manifests.

The following logic is executed as a PostStartHook of the apiextensions-apiserver if the Feature Gate `InstallCRDsAtStartup` is set to `true` (a rough Go sketch follows the list below).

  1. Prepare a reader for the manifests that are embedded in the kube-apiserver binary (Source #1).

  2. Prepare a reader for the manifests located on disk (only if the user specifies a directory through the flag) (Source #2).

  3. Read all manifests through the Readers and return a list of unstructured objects.

  4. Initialize an Installer from the PostStartHookContext.

  5. Ensure that all the necessary GVRs are up. (In the current POC, we just check for the apiextensions.k8s.io group to become available.)

  6. Install all the objects read in Step 3 and, depending on whether they existed before or not, we:

    1. Patch them, if they exist

    2. Create them, if they don’t exist
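
For illustration, here is a rough sketch of what the hook body could look like, assuming go:embed for Source #1 and the apiextensions clientset. Names such as installStartupCRDs and the manifests/ path are placeholders for this sketch, not the POC's actual identifiers, and the on-disk Source #2 is elided.

// Sketch only: creates or patches each embedded CRD, returning any error so
// that the PostStartHook fails and the apiserver reports unhealthy.
package startupcrds

import (
    "context"
    "embed"
    "fmt"

    apierrors "k8s.io/apimachinery/pkg/api/errors"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/apimachinery/pkg/types"
    "sigs.k8s.io/yaml"

    apiextensionsv1 "k8s.io/apiextensions-apiserver/pkg/apis/apiextensions/v1"
    apiextensionsclient "k8s.io/apiextensions-apiserver/pkg/client/clientset/clientset"
)

//go:embed manifests/*.yaml
var builtinManifests embed.FS // Source #1: manifests packaged with k/k

func installStartupCRDs(ctx context.Context, client apiextensionsclient.Interface) error {
    entries, err := builtinManifests.ReadDir("manifests")
    if err != nil {
        return err
    }
    for _, entry := range entries {
        data, err := builtinManifests.ReadFile("manifests/" + entry.Name())
        if err != nil {
            return err
        }
        crd := &apiextensionsv1.CustomResourceDefinition{}
        if err := yaml.Unmarshal(data, crd); err != nil {
            return fmt.Errorf("decoding %s: %w", entry.Name(), err)
        }
        // Step 6: create if absent, patch with the desired manifest if present.
        _, err = client.ApiextensionsV1().CustomResourceDefinitions().Create(ctx, crd, metav1.CreateOptions{})
        if apierrors.IsAlreadyExists(err) {
            jsonData, convErr := yaml.YAMLToJSON(data)
            if convErr != nil {
                return convErr
            }
            _, err = client.ApiextensionsV1().CustomResourceDefinitions().Patch(
                ctx, crd.Name, types.MergePatchType, jsonData, metav1.PatchOptions{})
        }
        if err != nil {
            return fmt.Errorf("installing CRD %s: %w", crd.Name, err)
        }
    }
    return nil
}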

At any point in the above logic, if there's an error, we surface it, ensuring that, if the installation of CRDs at startup fails, the apiserver shows as unhealthy and the error is propagated up to the cluster admins.

While we were investigating the problem, there were certain questions that we wanted to discuss with the community. 

  1. Do we differentiate CRDs installed through this mechanism and the ones installed by users after cluster startup? If yes, how?

  2. What happens if a user deletes the CRDs installed at startup time? Shall we reconcile them and ensure they exist as long as the cluster is running?

  3. In the POC, we have included the capability to install any kind of resource which is either in the embedded location or provided by the user. We understand this approach can raise different concerns and wanted to get the community's opinion. We can also gate the installable resources to only CRDs, or a combination of CRDs and CRs, if installing other resource types doesn't sound like a good idea.

  4. What should be the considerations when we upgrade/downgrade highly available (HA) clusters? For a non-HA cluster, whenever the kube-apiserver starts, it will ensure that the CRDs packaged with that version of kube-apiserver exist, covering upgrade/downgrade scenarios for such a cluster.

We would like to demo the POC and discuss more in the upcoming sig-api-machinery bi-weekly meeting (May 5 2021) to receive feedback on the proposal. After the discussion, we will draft a KEP incorporating the comments and suggestions from the community.

[1]: https://docs.google.com/document/d/1P2Eiy7L-TJqG1pU9So-yrf8SftLCQwlMbWq4qfulyuA/edit
[2]: https://goo.gl/2cW8zQ
[3]: http://bit.ly/sig-architecture
[4]: https://docs.google.com/presentation/d/1-GUWYPMpfTXdPCyxFgnjpnzc_coY21h0D-QocyCIgY0/edit
[5]: https://github.com/kubernetes/kubernetes/pull/101729

Best,
Nabarun, Yuvaraj

David Eads

May 4, 2021, 2:25:45 PM5/4/21
to Nabarun Pal, K8s API Machinery SIG, kubernetes-si...@googlegroups.com, yuva...@gmail.com
If you ever want to add a CRD manifest to this list (on an upgrade for instance), all clients have to handle the case where the resource isn't yet present even though the kube-apiserver is ready. This being the case, what is the distinction between embedding this controller inside the kube-apiserver with your post-start-hook logic and simply running an external controller that reconciles a set of CRDs?



Daniel Smith

May 4, 2021, 6:47:35 PM5/4/21
to Nabarun Pal, Vivek Bagade, K8s API Machinery SIG, kubernetes-si...@googlegroups.com, yuva...@gmail.com
Hi, +Vivek and I, along with some other folks here at Google, have been considering a similar mechanism but haven't gotten a proposal out yet.

As stated, the proposal here doesn't handle upgrade/downgrade conditions well (apiservers will fight). (different sets of user manifests on different apiservers also cause a fight).

Also, it's awkward to use this mechanism to delete a resource that it previously installed.

Also, it's unclear what should happen if a user modifies one of these controlled resources.

My current thinking is that it's probably not good to (essentially) combine apiserver & addon manager as in this approach.

The alternative that we had in mind is more like this:

1. Add a second authz webhook, which runs in front of RBAC.

The idea is that an authz webhook can run co-located with apiserver, and effectively implement a "platform admin" concept.

So, it could for example block all users other than those in a platform-admin group when the cluster starts up. Then the platform admin (or a binary with sufficient credentials) adds in the desired startup objects. Then the authz webhook recognizes that the cluster has been initialized, and permits other traffic.

An advantage of this mechanism is that you can make the authz webhook arbitrarily smart (i.e. should ordinary users be messing with these initialization objects?)
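
For concreteness, a minimal sketch of the decision such a webhook might return, using the standard SubjectAccessReview types; the "platform-admin" group name and the clusterInitialized check are placeholders, not a worked-out design.

package platformadmin

import authorizationv1 "k8s.io/api/authorization/v1"

// decide answers a single SubjectAccessReview sent to the webhook.
func decide(sar *authorizationv1.SubjectAccessReview, clusterInitialized bool) authorizationv1.SubjectAccessReviewStatus {
    // Once the startup objects are in place, return "no opinion" so the
    // authorizers behind this webhook (e.g. RBAC) make the real decision.
    if clusterInitialized {
        return authorizationv1.SubjectAccessReviewStatus{}
    }
    // While initializing, only platform admins (and system components) get through.
    for _, g := range sar.Spec.Groups {
        if g == "platform-admin" || g == "system:masters" {
            return authorizationv1.SubjectAccessReviewStatus{}
        }
    }
    return authorizationv1.SubjectAccessReviewStatus{
        Denied: true,
        Reason: "cluster is still initializing; only platform admins may access it",
    }
}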

I'm not sure I'm ready to talk about this at tomorrow's SIG meeting, can we do it on the following one?





Yuvaraj Balaji

May 5, 2021, 2:08:22 PM5/5/21
to Daniel Smith, Nabarun Pal, Vivek Bagade, K8s API Machinery SIG, kubernetes-si...@googlegroups.com
Hi Daniel,

Thank you for your thoughts. Replying inline.


On Tue, May 4, 2021 at 11:16 AM Daniel Smith <dbs...@google.com> wrote:
As stated, the proposal here doesn't handle upgrade/downgrade conditions well (apiservers will fight). (different sets of user manifests on different apiservers also cause a fight).

Also, it's awkward to use this mechanism to delete a resource that it previously installed.

Also, it's unclear what should happen if a user modifies one of these controlled resources.
We recognize these issues and as you probably already saw these are along the same lines of the open questions we had in our minds. 
We are brainstorming about these open questions to come up with a reasonable solution.
 
The alternative that we had in mind is more like this:

1. Add a second authz webhook, which runs in front of RBAC.

The idea is that an authz webhook can run co-located with apiserver, and effectively implement a "platform admin" concept.

So, it could for example block all users other than those in a platform-admin group when the cluster starts up. Then the platform admin (or a binary with sufficient credentials) adds in the desired startup objects. Then the authz webhook recognizes that the cluster has been initialized, and permits other traffic.

An advantage of this mechanism is that you can make the authz webhook arbitrarily smart (i.e. should ordinary users be messing with these initialization objects?)
The "platform admin" concept sounds interesting and seems to be a good approach to solve the problem of users modifying these initialized objects. 
Also, can you elaborate on how the above approach solves the issue of initializing the objects?

I'm not sure I'm ready to talk about this at tomorrow's SIG meeting, can we do it on the following one?
That sounds like a good idea. We can discuss in more detail during the next sig-api-machinery call but in the meantime we are also happy to follow-up here or on the github PR.


Regards,
Yuvaraj & Nabarun

Yuvaraj Balaji

May 5, 2021, 2:26:00 PM5/5/21
to David Eads, Nabarun Pal, K8s API Machinery SIG, kubernetes-si...@googlegroups.com
Hi David,

Replying inline

On Tue, May 4, 2021 at 11:25 AM David Eads <de...@redhat.com> wrote:
If you ever want to add a CRD manifest to this list (on an upgrade for instance), all clients have to handle the case where the resource isn't yet present even though the kube-apiserver is ready. This being the case, what is the distinction between embedding this controller inside the kube-apiserver with your post-start-hook logic and simply running an external controller that reconciles a set of CRDs?
 Just to be clear, the post-start-hook logic does not run a controller/reconcile loop/long running routine. It is a one-off task that initializes the CRDs. 

Regardless, having an additional component run on the cluster to initialize CRDs would add an additional step to cluster bootstrappers, like kubeadm. Since the CRDs that are built-in in the proposed approach are non-optional definitions that need to be present on each cluster, delegating this responsibility to the cluster bootstrapper could risk creating inconsistencies across clusters created through different mechanisms.

Regards,
Yuvaraj & Nabarun

Daniel Smith

May 5, 2021, 5:36:39 PM5/5/21
to Yuvaraj Balaji, Nabarun Pal, Vivek Bagade, K8s API Machinery SIG, kubernetes-si...@googlegroups.com
On Wed, May 5, 2021 at 11:08 AM Yuvaraj Balaji <yuva...@gmail.com> wrote:
Hi Daniel,

Thank you for your thoughts. Replying inline.


On Tue, May 4, 2021 at 11:16 AM Daniel Smith <dbs...@google.com> wrote:
As stated, the proposal here doesn't handle upgrade/downgrade conditions well (apiservers will fight). (different sets of user manifests on different apiservers also cause a fight).

Also, it's awkward to use this mechanism to delete a resource that it previously installed.

Also, it's unclear what should happen if a user modifies one of these controlled resources.
We recognize these issues and as you probably already saw these are along the same lines of the open questions we had in our minds. 
We are brainstorming about these open questions to come up with a reasonable solution.
 
The alternative that we had in mind is more like this:

1. Add a second authz webhook, which runs in front of RBAC.

The idea is that an authz webhook can run co-located with apiserver, and effectively implement a "platform admin" concept.

So, it could for example block all users other than those in a platform-admin group when the cluster starts up. Then the platform admin (or a binary with sufficient credentials) adds in the desired startup objects. Then the authz webhook recognizes that the cluster has been initialized, and permits other traffic.

An advantage of this mechanism is that you can make the authz webhook arbitrarily smart (i.e. should ordinary users be messing with these initialization objects?)
The "platform admin" concept sounds interesting and seems to be a good approach to solve the problem of users modifying these initialized objects. 
Also, can you elaborate on how the above approach solves the issue of initializing the objects?

Today, without any changes, you can write a component that runs at startup and adds (maintains) your initial objects to the cluster. One possibility actually already exists ("add-on manager").

The thing you can't do today is prevent users from interacting with a partially-populated cluster. You could do that with an authz webhook--if it were first in the authz chain.

Basically, I don't want apiserver to be opinionated about the initial contents of the cluster, beyond what is absolutely necessary to make the first requests (e.g. accounts, RBAC roles, APF objects). It is not what apiserver is for, and the fact that apiserver is a distributed system (*not* sharded, *not* leader-elected) makes it very hard for apiserver to have a coherent opinion on this.

Initial population of objects could be done via adding a controller in controller-manager; that's not my preferred approach but I wouldn't complain (much).

Daniel Smith

May 5, 2021, 5:45:37 PM5/5/21
to Yuvaraj Balaji, David Eads, Nabarun Pal, K8s API Machinery SIG, kubernetes-si...@googlegroups.com
On Wed, May 5, 2021 at 11:26 AM Yuvaraj Balaji <yuva...@gmail.com> wrote:
Hi David,

Replying inline

On Tue, May 4, 2021 at 11:25 AM David Eads <de...@redhat.com> wrote:
If you ever want to add a CRD manifest to this list (on an upgrade for instance), all clients have to handle the case where the resource isn't yet present even though the kube-apiserver is ready. This being the case, what is the distinction between embedding this controller inside the kube-apiserver with your post-start-hook logic and simply running an external controller that reconciles a set of CRDs?
 Just to be clear, the post-start-hook logic does not run a controller/reconcile loop/long running routine. It is a one-off task that initializes the CRDs. 

If you are worried about inconsistencies across clusters (as below) then you also should probably be worried about inconsistencies over time in one cluster. Which is a longwinded way of saying that a one-off task is probably insufficient...
 

Regardless, having an additional component run on the cluster to initialize CRDs would add an additional step to cluster bootstrappers, like kubeadm. Since the CRDs that are built-in in the proposed approach are non-optional definitions that need to be present on each cluster, delegating this responsibility to the cluster bootstrapper could risk creating inconsistencies across clusters created through different mechanisms.

If there is a mandatory new component, and k8s fails the conformance tests without it, then cluster bootstrappers will have to adapt. I don't want people to do extra work, but I also don't think apiserver is suited to be the cluster initializer.

If the component is somehow composable (e.g. bootstrappers can add their own objects) then it is probably not too objectionable. Alternatively, if all we care about is that certain objects are present, and the bootstrappers already have a way of doing that, maybe we don't need to be opinionated about *how* the objects get added.
 

Regards,
Yuvaraj & Nabarun


Noah Kantrowitz

May 5, 2021, 5:49:50 PM5/5/21
to K8s API Machinery SIG, kubernetes-si...@googlegroups.com
Just because I haven't seen this mentioned so far, are we sure "prevent
clients from interacting until a given CRD is available" is a thing we
actually want? The alternative approach of "if you get a type not found
error, retry your request until it works" seems like it will encourage
less brittle systems and covers a lot of the tricky edge cases being
discussed in this thread. Eventual consistency and promise theory are
cooooool.
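
A rough sketch of that client-side pattern, polling discovery until the type
is served; the interval and timeout are arbitrary placeholders.

package retryuntilserved

import (
    "time"

    "k8s.io/apimachinery/pkg/runtime/schema"
    "k8s.io/apimachinery/pkg/util/wait"
    "k8s.io/client-go/discovery"
)

// waitForResource blocks until the given GroupVersionResource shows up in discovery.
func waitForResource(dc discovery.DiscoveryInterface, gvr schema.GroupVersionResource) error {
    return wait.PollImmediate(2*time.Second, 5*time.Minute, func() (bool, error) {
        list, err := dc.ServerResourcesForGroupVersion(gvr.GroupVersion().String())
        if err != nil {
            return false, nil // group/version not served yet; keep retrying
        }
        for _, r := range list.APIResources {
            if r.Name == gvr.Resource {
                return true, nil
            }
        }
        return false, nil
    })
}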

--Noah


Daniel Smith

May 5, 2021, 5:55:04 PM5/5/21
to Noah Kantrowitz, K8s API Machinery SIG, kubernetes-si...@googlegroups.com
I personally agree, however the people who want things like security policies, resource quotas, or even APF configurations, want those things there before users can interact with the cluster at all.

Not to mention admission webhook configurations...


Noah Kantrowitz

May 5, 2021, 6:01:52 PM5/5/21
to K8s API Machinery SIG, kubernetes-si...@googlegroups.com
Stuff like that definitely sounds more reasonable than just focusing on
CRDs, but that also grows the scope a good bit if we're talking about
arbitrary content now that needs to be both loaded and in some kind of
Ready state. That smells more like the "early webhook" kind of thing
where we want to delegate the logic of "it's ready" to a user service or
script somewhere, though that then has its own bootstrapping problem :-/
Still, maybe focus more on the "prevent interaction" rather than the
content loading unless we think that can be made fully generic somehow.

--Noah

Daniel Smith wrote on 5/5/21 2:54 PM:
> I personally agree, however the people who want things like security
> policies, resource quotas, or even APF configurations, want those things
> there before users can interact with the cluster at all.
>
> Not to mention admission webhook configurations...

Daniel Smith

May 5, 2021, 6:11:35 PM5/5/21
to Noah Kantrowitz, K8s API Machinery SIG, kubernetes-si...@googlegroups.com
On Wed, May 5, 2021 at 3:01 PM Noah Kantrowitz <no...@coderanger.net> wrote:
Stuff like that definitely sounds more reasonable than just focusing on
CRDs, but that also grows the scope a good bit if we're talking about
arbitrary content now that needs to be both loaded and in some kind of
Ready state. That smells more like the "early webhook" kind of thing

Well, I didn't say anything about requiring it to be in a "Ready" state! :sweat_smile:

But really, the point of opening up an authz plugin is to permit that kind of logic if the provider really wants to do it; especially if the alternative is putting it in upstream where I/David have to worry about whether it's *right* and *everyone* has run it.
 

Vivek Bagade

May 6, 2021, 1:02:52 PM5/6/21
to Daniel Smith, Yuvaraj Balaji, Nabarun Pal, K8s API Machinery SIG, kubernetes-si...@googlegroups.com
Hey everyone

Adding to the conversation, we've started an issue (https://github.com/kubernetes/kubernetes/issues/101762) to add support for multiple authorization webhooks in the kube-apiserver. This will help in implementing the mechanism that Daniel laid out earlier.
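
For illustration only, the ordering question is roughly this, sketched with the existing union authorizer; how each webhook is constructed is elided, and none of this reflects the actual design in the issue.

package authzchain

import (
    "k8s.io/apiserver/pkg/authorization/authorizer"
    "k8s.io/apiserver/pkg/authorization/union"
)

// buildChain places the platform-admin webhook ahead of RBAC, followed by any
// remaining authorizers; the union authorizer consults them in order.
func buildChain(platformAdminWebhook, rbac authorizer.Authorizer, rest ...authorizer.Authorizer) authorizer.Authorizer {
    chain := append([]authorizer.Authorizer{platformAdminWebhook, rbac}, rest...)
    return union.New(chain...)
}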

Tim Hockin

Jun 8, 2021, 4:12:02 PM6/8/21
to Vivek Bagade, Daniel Smith, Yuvaraj Balaji, Nabarun Pal, K8s API Machinery SIG, kubernetes-sig-multicluster
Hi All.

I keep running into this decision point - the newest case is a revised
form of node IPAM that needs a resource to carry config(s). I feel
obligated to push a CRD solution, but absent a solution here, it feels
disingenuous to do so.

Revisiting my own doc, I think it does not go far enough. Here's what
I think now, 2 years later.

1) We want to define a (growing) set of "part of the project" APIs in
terms of CRDs.
2) Defining a k8s API necessarily includes:
a) a versioned schema
b) create and update validation
c) default values for fields (including "default on read")
d) version conversion rules
3) Our goal should be that (eventually) all of the above are
declarative and built into CRDs (for some large portion of the most
common needs). We are not "done" until this works without running pods
in-cluster.
4) We should treat API updates and kubernetes versions like a database
schema update (update schema, then update controllers).
5) We should consider implementing API-side protections that prevent
accidental breakages, but we probably can't prevent intentional
breakage.
6) Because these APIs are "part of the project" it's OK to have
controllers implemented in places like kube-controller-manager.

If I were re-tooling the proposal, I'd focus on those. "Preventing
usage of the cluster until policies are in place" is a different
problem.

Now, I'm first to admit that I just described a LOT of work, but I
think it's important to understand where we want to arrive at,
eventually.
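
For a sense of how much of 2a-2c is already expressible declaratively today,
here's a rough sketch of a CRD written as a Go value; the group, kind, and
field names are placeholders, and 2d (conversion) is the piece that still
needs a webhook.

package exampletypes

import (
    apiextensionsv1 "k8s.io/apiextensions-apiserver/pkg/apis/apiextensions/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func int64Ptr(i int64) *int64 { return &i }

var clusterPropertyCRD = &apiextensionsv1.CustomResourceDefinition{
    ObjectMeta: metav1.ObjectMeta{Name: "clusterproperties.example.x-k8s.io"},
    Spec: apiextensionsv1.CustomResourceDefinitionSpec{
        Group: "example.x-k8s.io",
        Scope: apiextensionsv1.ClusterScoped,
        Names: apiextensionsv1.CustomResourceDefinitionNames{
            Plural:   "clusterproperties",
            Singular: "clusterproperty",
            Kind:     "ClusterProperty",
        },
        Versions: []apiextensionsv1.CustomResourceDefinitionVersion{{
            Name:    "v1alpha1",
            Served:  true,
            Storage: true, // (a) versioned schema
            Schema: &apiextensionsv1.CustomResourceValidation{
                OpenAPIV3Schema: &apiextensionsv1.JSONSchemaProps{
                    Type: "object",
                    Properties: map[string]apiextensionsv1.JSONSchemaProps{
                        "value": {
                            Type:      "string",
                            MaxLength: int64Ptr(128),                            // (b) validation
                            Default:   &apiextensionsv1.JSON{Raw: []byte(`""`)}, // (c) defaulting
                        },
                    },
                },
            },
        }},
        // (d) version conversion is the remaining gap: today it is either None
        // or a conversion webhook, not declarative.
        Conversion: &apiextensionsv1.CustomResourceConversion{
            Strategy: apiextensionsv1.NoneConverter,
        },
    },
}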

Tim


Daniel Smith

Jun 8, 2021, 4:34:13 PM6/8/21
to Tim Hockin, Vivek Bagade, Yuvaraj Balaji, Nabarun Pal, K8s API Machinery SIG, kubernetes-sig-multicluster
There was a fairly long discussion at our sig meeting last week (recording).

On this topic, the only thing that we kinda-sorta established is that a CRD installing controller should run from controller manager, not as part of apiserver.

One more thought below

It is not a completely different problem; IIRC, RuntimeClass couldn't use a CRD because there was a hard dependency from kubelet (?) that the type exist on startup. CRDs installed by an installer *will* show up some time after cluster start-up.

Tim Hockin

Jun 8, 2021, 5:13:07 PM6/8/21
to Daniel Smith, Vivek Bagade, Yuvaraj Balaji, Nabarun Pal, K8s API Machinery SIG, kubernetes-sig-multicluster
On Tue, Jun 8, 2021 at 1:34 PM Daniel Smith <dbs...@google.com> wrote:
>
> There was a fairly long discussion at our sig meeting last week (recording).
>
> On this topic, the only thing that we kinda-sorta established is that a CRD installing controller should run from controller manager, not as part of apiserver.

I wonder if it shouldn't be a distinct thing entirely. Upgrade the
schema BEFORE you upgrade the cluster.

>> If I were re-tooling the proposal, I'd focus on those. "Preventing
>> usage of the cluster until policies are in place" is a different
>> problem.
>
> It is not a completely different problem; IIRC, RuntimeClass couldn't use a CRD because there was a hard dependency from kubelet (?) that the type exist on startup. CRDs installed by an installer *will* show up some time after cluster start-up.

I'm not convinced. If kubelet REQUIRES runtimeclass, then what
prevents someone from unloading that type? AFAICT nothing, so either
it's OK for kubelet to crashloop the absence of the type (in which
case, just do that until the type is loaded) or it's not OK to
crashloop (in which case it has to handle the absence of the type).
If you are a cluster admin and you don't want to allow new pods to run
(at all?) in some conditions, isn't that a general policy problem?

Daniel Smith

Jun 8, 2021, 5:42:22 PM6/8/21
to Tim Hockin, Vivek Bagade, Yuvaraj Balaji, Nabarun Pal, K8s API Machinery SIG, kubernetes-sig-multicluster
On Tue, Jun 8, 2021 at 2:13 PM Tim Hockin <tho...@google.com> wrote:
On Tue, Jun 8, 2021 at 1:34 PM Daniel Smith <dbs...@google.com> wrote:
>
> There was a fairly long discussion at our sig meeting last week (recording).
>
> On this topic, the only thing that we kinda-sorta established is that a CRD installing controller should run from controller manager, not as part of apiserver.

I wonder if it shouldn't be a distinct thing entirely.  Upgrade the
schema BEFORE you upgrade the cluster.

Adding a new mandatory component is super unpopular with folks who have tooling to start clusters.
 
>> If I were re-tooling the proposal, I'd focus on those.  "Preventing
>> usage of the cluster until policies are in place" is a different
>> problem.
>
> It is not a completely different problem; IIRC, RuntimeClass couldn't use a CRD because there was a hard dependency from kubelet (?) that the type exist on startup. CRDs installed by an installer *will* show up some time after cluster start-up.

I'm not convinced.  If kubelet REQUIRES runtimeclass, then what
prevents someone from unloading that type?  AFAICT nothing, so either
it's OK for kubelet to crashloop the absence of the type (in which
case, just do that until the type is loaded) or it's not OK to
crashloop (in which case it has to handle the absence of the type).
If you are a cluster admin and you don't want to allow new pods to run
(at all?) in some conditions, isn't that a general policy problem?

I agree with you and I don't remember why we lost this argument at the time, it was a few years ago. My argument here is that other people *feel* blocked by this, not that they are necessarily actually blocked.
 

Tim Hockin

Jun 8, 2021, 5:47:46 PM6/8/21
to Daniel Smith, Vivek Bagade, Yuvaraj Balaji, Nabarun Pal, K8s API Machinery SIG, kubernetes-sig-multicluster
On Tue, Jun 8, 2021 at 2:42 PM Daniel Smith <dbs...@google.com> wrote:
>
>
>
> On Tue, Jun 8, 2021 at 2:13 PM Tim Hockin <tho...@google.com> wrote:
>>
>> On Tue, Jun 8, 2021 at 1:34 PM Daniel Smith <dbs...@google.com> wrote:
>> >
>> > There was a fairly long discussion at our sig meeting last week (recording).
>> >
>> > On this topic, the only thing that we kinda-sorta established is that a CRD installing controller should run from controller manager, not as part of apiserver.
>>
>> I wonder if it shouldn't be a distinct thing entirely. Upgrade the
>> schema BEFORE you upgrade the cluster.
>
>
> Adding a new mandatory component is super unpopular with folks who have tooling to start clusters.

I'm not even sure it's a component as much as a process during cluster
turnup. Or both? Before you start an upgrade of the cluster control
plane, you should upgrade the API schema (CRDs). Existing controllers
must tolerate this. Then you can upgrade the control plane and
controllers. I guess at runtime you want to re-assert that same
schema, which seems fine inside controller manager (I think).

>> >> If I were re-tooling the proposal, I'd focus on those. "Preventing
>> >> usage of the cluster until policies are in place" is a different
>> >> problem.
>> >
>> > It is not a completely different problem; IIRC, RuntimeClass couldn't use a CRD because there was a hard dependency from kubelet (?) that the type exist on startup. CRDs installed by an installer *will* show up some time after cluster start-up.
>>
>> I'm not convinced. If kubelet REQUIRES runtimeclass, then what
>> prevents someone from unloading that type? AFAICT nothing, so either
>> it's OK for kubelet to crashloop the absence of the type (in which
>> case, just do that until the type is loaded) or it's not OK to
>> crashloop (in which case it has to handle the absence of the type).
>> If you are a cluster admin and you don't want to allow new pods to run
>> (at all?) in some conditions, isn't that a general policy problem?
>
>
> I agree with you and I don't remember why we lost this argument at the time, it was a few years ago. My argument here is that other people *feel* blocked by this, not that they are necessarily actually blocked.

ACK.

Daniel Smith

Jun 8, 2021, 5:51:45 PM6/8/21
to Tim Hockin, Vivek Bagade, Yuvaraj Balaji, Nabarun Pal, K8s API Machinery SIG, kubernetes-sig-multicluster
On Tue, Jun 8, 2021 at 2:47 PM Tim Hockin <tho...@google.com> wrote:
On Tue, Jun 8, 2021 at 2:42 PM Daniel Smith <dbs...@google.com> wrote:
>
>
>
> On Tue, Jun 8, 2021 at 2:13 PM Tim Hockin <tho...@google.com> wrote:
>>
>> On Tue, Jun 8, 2021 at 1:34 PM Daniel Smith <dbs...@google.com> wrote:
>> >
>> > There was a fairly long discussion at our sig meeting last week (recording).
>> >
>> > On this topic, the only thing that we kinda-sorta established is that a CRD installing controller should run from controller manager, not as part of apiserver.
>>
>> I wonder if it shouldn't be a distinct thing entirely.  Upgrade the
>> schema BEFORE you upgrade the cluster.
>
>
> Adding a new mandatory component is super unpopular with folks who have tooling to start clusters.

I'm not even sure it's a component as much as a process during cluster
turnup.  Or both?  Before you start an upgrade of the cluster control
plane, you should upgrade the API schema (CRDs).  Existing controllers
must tolerate this.  Then you can upgrade the control plane and
controllers.  I guess at runtime you want to re-assert that same
schema, which seems fine inside controller manager (I think).

That makes it even worse, though, because we generally have no other imperative commands that you run during upgrade/downgrade...

Tim Hockin

Jun 8, 2021, 5:55:05 PM6/8/21
to Daniel Smith, Vivek Bagade, Yuvaraj Balaji, Nabarun Pal, K8s API Machinery SIG, kubernetes-sig-multicluster
On Tue, Jun 8, 2021 at 2:51 PM Daniel Smith <dbs...@google.com> wrote:
>
>
>
> On Tue, Jun 8, 2021 at 2:47 PM Tim Hockin <tho...@google.com> wrote:
>>
>> On Tue, Jun 8, 2021 at 2:42 PM Daniel Smith <dbs...@google.com> wrote:
>> >
>> >
>> >
>> > On Tue, Jun 8, 2021 at 2:13 PM Tim Hockin <tho...@google.com> wrote:
>> >>
>> >> On Tue, Jun 8, 2021 at 1:34 PM Daniel Smith <dbs...@google.com> wrote:
>> >> >
>> >> > There was a fairly long discussion at our sig meeting last week (recording).
>> >> >
>> >> > On this topic, the only thing that we kinda-sorta established is that a CRD installing controller should run from controller manager, not as part of apiserver.
>> >>
>> >> I wonder if it shouldn't be a distinct thing entirely. Upgrade the
>> >> schema BEFORE you upgrade the cluster.
>> >
>> >
>> > Adding a new mandatory component is super unpopular with folks who have tooling to start clusters.
>>
>> I'm not even sure it's a component as much as a process during cluster
>> turnup. Or both? Before you start an upgrade of the cluster control
>> plane, you should upgrade the API schema (CRDs). Existing controllers
>> must tolerate this. Then you can upgrade the control plane and
>> controllers. I guess at runtime you want to re-assert that same
>> schema, which seems fine inside controller manager (I think).
>
>
> That makes it even worse, though, because we generally have no other imperative commands that you run during upgrade/downgrade...

...and this has been a nagging problem FOREVER. No clean way to touch
types to force storage-version updates, etc.

But we can still position this declaratively - first you have to
update the schema payload. Then you can update API servers. Then you
can update controllers. (We already document the
API-before-controllers sequence, I think).

Daniel Smith

Jun 8, 2021, 6:07:24 PM6/8/21
to Tim Hockin, Vivek Bagade, Yuvaraj Balaji, Nabarun Pal, K8s API Machinery SIG, kubernetes-sig-multicluster
On Tue, Jun 8, 2021 at 2:55 PM Tim Hockin <tho...@google.com> wrote:
On Tue, Jun 8, 2021 at 2:51 PM Daniel Smith <dbs...@google.com> wrote:
>
>
>
> On Tue, Jun 8, 2021 at 2:47 PM Tim Hockin <tho...@google.com> wrote:
>>
>> On Tue, Jun 8, 2021 at 2:42 PM Daniel Smith <dbs...@google.com> wrote:
>> >
>> >
>> >
>> > On Tue, Jun 8, 2021 at 2:13 PM Tim Hockin <tho...@google.com> wrote:
>> >>
>> >> On Tue, Jun 8, 2021 at 1:34 PM Daniel Smith <dbs...@google.com> wrote:
>> >> >
>> >> > There was a fairly long discussion at our sig meeting last week (recording).
>> >> >
>> >> > On this topic, the only thing that we kinda-sorta established is that a CRD installing controller should run from controller manager, not as part of apiserver.
>> >>
>> >> I wonder if it shouldn't be a distinct thing entirely.  Upgrade the
>> >> schema BEFORE you upgrade the cluster.
>> >
>> >
>> > Adding a new mandatory component is super unpopular with folks who have tooling to start clusters.
>>
>> I'm not even sure it's a component as much as a process during cluster
>> turnup.  Or both?  Before you start an upgrade of the cluster control
>> plane, you should upgrade the API schema (CRDs).  Existing controllers
>> must tolerate this.  Then you can upgrade the control plane and
>> controllers.  I guess at runtime you want to re-assert that same
>> schema, which seems fine inside controller manager (I think).
>
>
> That makes it even worse, though, because we generally have no other imperative commands that you run during upgrade/downgrade...

...and this has been a nagging problem FOREVER.  No clean way to touch
types to force storage-version updates, etc.

Actually the storage migrator will do this touch for you, but it's currently optional and not every environment runs it.
 
But we can still position this declaratively - first you have to
update the schema payload.  Then you can update API servers.  Then you
can update controllers.  (We already document the
API-before-controllers sequence, I think).

Yes, but that's optional guidance, in case people are self-hosting; upgrading an entire control plane VM at a time--everything on it at once--is much more common, I think.

Tim Hockin

Jun 8, 2021, 6:16:09 PM6/8/21
to Daniel Smith, Vivek Bagade, Yuvaraj Balaji, Nabarun Pal, K8s API Machinery SIG, kubernetes-sig-multicluster
On Tue, Jun 8, 2021 at 3:07 PM Daniel Smith <dbs...@google.com> wrote:
>
>
>
> On Tue, Jun 8, 2021 at 2:55 PM Tim Hockin <tho...@google.com> wrote:
>>
>> On Tue, Jun 8, 2021 at 2:51 PM Daniel Smith <dbs...@google.com> wrote:
>> >
>> >
>> >
>> > On Tue, Jun 8, 2021 at 2:47 PM Tim Hockin <tho...@google.com> wrote:
>> >>
>> >> On Tue, Jun 8, 2021 at 2:42 PM Daniel Smith <dbs...@google.com> wrote:
>> >> >
>> >> >
>> >> >
>> >> > On Tue, Jun 8, 2021 at 2:13 PM Tim Hockin <tho...@google.com> wrote:
>> >> >>
>> >> >> On Tue, Jun 8, 2021 at 1:34 PM Daniel Smith <dbs...@google.com> wrote:
>> >> >> >
>> >> >> > There was a fairly long discussion at our sig meeting last week (recording).
>> >> >> >
>> >> >> > On this topic, the only thing that we kinda-sorta established is that a CRD installing controller should run from controller manager, not as part of apiserver.
>> >> >>
>> >> >> I wonder if it shouldn't be a distinct thing entirely. Upgrade the
>> >> >> schema BEFORE you upgrade the cluster.
>> >> >
>> >> >
>> >> > Adding a new mandatory component is super unpopular with folks who have tooling to start clusters.
>> >>
>> >> I'm not even sure it's a component as much as a process during cluster
>> >> turnup. Or both? Before you start an upgrade of the cluster control
>> >> plane, you should upgrade the API schema (CRDs). Existing controllers
>> >> must tolerate this. Then you can upgrade the control plane and
>> >> controllers. I guess at runtime you want to re-assert that same
>> >> schema, which seems fine inside controller manager (I think).
>> >
>> >
>> > That makes it even worse, though, because we generally have no other imperative commands that you run during upgrade/downgrade...
>>
>> ...and this has been a nagging problem FOREVER. No clean way to touch
>> types to force storage-version updates, etc.
>
>
> Actually the storage migrator will do this touch for you, but it's currently optional and not every environment runs it.

And it took us HOW LONG to get there? :)

>> But we can still position this declaratively - first you have to
>> update the schema payload. Then you can update API servers. Then you
>> can update controllers. (We already document the
>> API-before-controllers sequence, I think).
>
>
> Yes, but that's optional guidance, in case people are self-hosting; upgrading an entire control plane VM at a time--everything on it at once--is much more common, I think.

That's true. Maybe my point about pre-upgrading isn't so important as the rest.

Clayton Coleman

Jun 8, 2021, 8:21:13 PM6/8/21
to Tim Hockin, Daniel Smith, Vivek Bagade, Yuvaraj Balaji, Nabarun Pal, K8s API Machinery SIG, kubernetes-sig-multicluster
And in practice anyone upgrading hosted components like KCM and the scheduler is probably not scaling to zero before rolling out the new code, so they are already at risk of the new controller code running, then losing leader status, then the old controller code running. We haven't really hit a serious issue here because we're so obsessive about not allowing incompatible schema changes... which, of course, are a thing CRDs don't have solved.

One part of me just wants to say we should implement the types directly because a) CRDs are already successful and b) I'd rather invest in improvements to CRDs vs. gatekeeping a bunch of optional features to solve this particular ordering problem. The "we should do CRDs in core because we make people out of core use them" argument is not particularly convincing when it comes to conversion and long term support.
 


Tim Hockin

Jun 9, 2021, 12:57:04 PM6/9/21
to Clayton Coleman, Daniel Smith, Vivek Bagade, Yuvaraj Balaji, Nabarun Pal, K8s API Machinery SIG, kubernetes-sig-multicluster
We have had cases (I fail to recall the details) where it was assumed
that an upgraded controller will never hit a non-upgraded apiserver.

> One part of me just wants to say we should implement the types directly because a) CRDs are already successful and b) I'd rather invest in improvements to CRDs vs. gatekeeping a bunch of optional features to solve this particular ordering problem. The "we should do CRDs in core because we make people out of core use them" argument is not particularly convincing when it comes to conversion and long term support.

I hear that, and it may be (again) the near-term solution, but it
really nags at me that CRDs are "good enough for thee, but not good
enough for me". The work to do this properly will pay off for people
doing CRDs in all contexts.

If that's what we want to do, I could rally some PRs to add
ClusterProperty, ServiceCIDR and ServiceIP, NodePortRange and
NodePort, and PodCIDR (and that's just off the top of my head). It would
certainly unstick some efforts...

Clayton Coleman

Jun 9, 2021, 1:21:08 PM6/9/21
to Tim Hockin, Daniel Smith, Vivek Bagade, Yuvaraj Balaji, Nabarun Pal, K8s API Machinery SIG, kubernetes-sig-multicluster
Right, I'm kind of taking the counter here because a) we haven't reexamined the assumption critically and b) we might make faster progress if we approached it differently.

One option would be "could we make it easier to define the schema in code", which results in a CRD OR an in-tree registry. That tooling helps controller-runtime / extenders. That tooling works for the LCD of CRD + apiserver. We then try to start moving in-tree types into this form (so they have to work for CRD), which clearly identifies which features we're missing (like the fact that pods depend on strategic merge patch in kubectl and CRDs don't support it) as well as encourages / prioritizes equivalence features. In the short term, the "new things that should be CRDs" use this interface but go to internal types. This is kind of co-opting Solly's IDL proposal with a more concrete benefit (rather than solve for in-tree, we solve for the LCD of CRD). We want that layer to be thin (so we could force everyone to define a CRD, but generate an apiserver impl for it, or simply instantiate a CRD controller that takes its CRD as code, not calculated by the controller).

I.e.:

1. Write a CRD
2. Stick it in a variable
3. Add glue code that defines it as a CRD api endpoint (uses crd registry storage impl)
4. Fix other bits that assume CRDs exist
5. Have some gen code that spits out all the CRDs
6. Add the non-core types the same way, and force them to be decorated as CRD + extra rules in code initializing them
7. Be able to generate openapi only from CRDs, remove in tree generation of openapi from go types
8. etc etc etc

?

David Eads

Jun 9, 2021, 1:21:38 PM6/9/21
to Tim Hockin, Clayton Coleman, Daniel Smith, Vivek Bagade, Yuvaraj Balaji, Nabarun Pal, K8s API Machinery SIG, kubernetes-sig-multicluster
Since new types are treated as optional (they must be, since beta (and even some GA) APIs are not required for conformance) and new fields on existing types are optional (they have to be in order for us to keep our backwards compatibility guarantees), why do we need to have the new type established before a given component starts?  Even the kubelet, which explicitly indicates that it must not be newer than a kube-apiserver, must not fail in the absence of a beta type and should not fail in the absence of a non-conformance-required GA type.

That being the case, it seems like a kube-controller-manager (or similar) binary could reconcile these types. Even if we find unexpected bumps, it seems more generally helpful to have the individual binaries react within some reasonable timeframe to changes in the available types and fields.


Tim Hockin

Jun 9, 2021, 3:43:47 PM6/9/21
to Clayton Coleman, Daniel Smith, Vivek Bagade, Yuvaraj Balaji, Nabarun Pal, K8s API Machinery SIG, kubernetes-sig-multicluster
I am 100% sure that it would be faster/easier to just do these as core
APIs. Having a "north star" of making CRDs powerful enough doesn't
actually make them powerful enough TODAY. That said, CRD defaulting
and validation DO cover a good set of use-cases. If we can solve the
loading problem (is there a concrete plan yet?) then I'd still like to
try to make it work.

> One option would be "could we make it easier to define the schema in code", which results in a CRD OR an in-tree registry. That tooling helps controller-runtime / extenders. That tooling works for the LCD of CRD + apiserver. We then try to start moving in-tree types into this form (so they have to work for CRD), which clearly identifies which features we're missing (like the fact that pods depend on strategic merge patch in kubectl and CRDs don't support it) as well as encourages / prioritizes equivalence features. In the short term, the "new things that should be CRDs" use this interface but go to internal types. This is kind of co-opting Solly's IDL proposal with a more concrete benefit (rather than solve for in-tree, we solve for the LCD of CRD). We want that layer to be thin (so we could force everyone to define a CRD, but generate an apiserver impl for it, or simply instantiate a CRD controller that takes its CRD as code, not calculated by the controller).

I read this paragraph as a nice incremental approach, but it needs a
fairly significant injection of "activation energy". The way I
envisioned it is different than what you wrote below (or I
misunderstood what you wrote below), though. ACK that it feels like a
step towards IDL, which I am still (more than ever, actually)
convinced we need. What I am afraid of now is saying to these efforts
"you need to wait for $someone to write a code-generator that converts
CRDs into builtin APIs with defaulting and validation logic".

I wish I had the time to tackle these things myself - they sound like
incredibly FUN projects.

Clayton Coleman

Jun 9, 2021, 4:01:44 PM6/9/21
to Tim Hockin, Daniel Smith, Vivek Bagade, Yuvaraj Balaji, Nabarun Pal, K8s API Machinery SIG, kubernetes-sig-multicluster
Hypothetically, if the API had Go objects and a CRD, there might be a minimal bit of boilerplate we could add that leveraged the in-tree code and registered the CRD object as a built-in type. If that was relatively small, not invasive, and could potentially become that bigger thing later, I would be more interested. I agree: I am not interested in the "we need an IDL to do this in tree" approach or a "let's build a framework for this" approach; it's more a matter of keeping the source of truth in CRD-ish form in code. If that could be faster than solving "install CRDs from KCM" and not too much slower than "just do these as core APIs", it might create the right incentive.

Tim Hockin

Jun 11, 2021, 2:03:36 PM6/11/21
to Clayton Coleman, Daniel Smith, Vivek Bagade, Yuvaraj Balaji, Nabarun Pal, K8s API Machinery SIG, kubernetes-sig-multicluster
To be clear - are you proposing to literally install a CRD in the
cluster that represents built-in types? Or just use the structure of
a CRD to approximate an IDL?

Daniel Smith

Jun 11, 2021, 2:32:55 PM6/11/21
to Tim Hockin, Clayton Coleman, Vivek Bagade, Yuvaraj Balaji, Nabarun Pal, K8s API Machinery SIG, kubernetes-sig-multicluster
A new controller in controller manager that installs some CRDs is like two days of effort. The KEP would be more effort.

Getting everyone to agree on it is significantly more effort than that!

I don't think schemas or IDL frameworks or anything like that has anything to do with this, it's unrelated.

If core starts using CRDs, we'll have some additional problems, but they can all be fixed over time and with motivation.

* A place to host webhooks for validation (planned additions to CRDs won't be sufficient for everything)
* Binary transport format for scalability
* Easier way to generate a well-formed schema

All of these should be solved after we have a way to install CRDs, not before. We don't really have these problems yet. We have to fix problems in the order in which they occur, otherwise we don't add value for a long time.

Clayton Coleman

Jun 11, 2021, 2:40:17 PM6/11/21
to Tim Hockin, Daniel Smith, Vivek Bagade, Yuvaraj Balaji, Nabarun Pal, K8s API Machinery SIG, kubernetes-sig-multicluster
> On Jun 11, 2021, at 2:03 PM, Tim Hockin <tho...@google.com> wrote:
>
> To be clear - are you proposing to literally install a CRD in the
> cluster that represents built-in types? Or just use the structure of
> a CRD to approximate an IDL?

Use a CRD in code (take a v1.CRD object) and then just serve it on that
apiserver as a built-in type.

Clayton Coleman

Jun 11, 2021, 2:42:32 PM6/11/21
to Daniel Smith, Tim Hockin, Vivek Bagade, Yuvaraj Balaji, Nabarun Pal, K8s API Machinery SIG, kubernetes-sig-multicluster


On Jun 11, 2021, at 2:32 PM, Daniel Smith <dbs...@google.com> wrote:


A new controller in controller manager that installs some CRDs is like two days of effort. The KEP would be more effort.

Getting everyone to agree on it is significantly more effort than that!

I don't think schemas or IDL frameworks or anything like that has anything to do with this, it's unrelated.

Don't agree :). What I'm trying to highlight is that we're fixated on mechanisms vs. APIs. To a consumer, CRDs are no different than built-ins. The machinery that takes a CRD object and exposes a built-in could be equally straightforward, and it gives us a path to put in straitjackets that ensure CRDs develop features without leaking the mechanism to distributions.

Daniel Smith

Jun 11, 2021, 2:44:49 PM6/11/21
to Clayton Coleman, Tim Hockin, Vivek Bagade, Yuvaraj Balaji, Nabarun Pal, K8s API Machinery SIG, kubernetes-sig-multicluster
On Fri, Jun 11, 2021 at 11:42 AM Clayton Coleman <ccol...@redhat.com> wrote:


On Jun 11, 2021, at 2:32 PM, Daniel Smith <dbs...@google.com> wrote:


A new controller in controller manager that installs some CRDs is like two days of effort. The KEP would be more effort.

Getting everyone to agree on it is significantly more effort than that!

I don't think schemas or IDL frameworks or anything like that has anything to do with this, it's unrelated.

Don't agree :). What I'm trying to highlight is that we're fixated on mechanisms vs. APIs. To a consumer, CRDs are no different than built-ins. The machinery that takes a CRD object and exposes a built-in could be equally straightforward, and it gives us a path to put in straitjackets that ensure CRDs develop features without leaking the mechanism to distributions.

I don't think such machinery would be straightforward at all.

I think it's backwards to focus on hamstringing built-ins rather than letting those who can switch to CRDs, do so.

Daniel Smith

Jun 11, 2021, 2:45:38 PM6/11/21
to Gari Singh, Tim Hockin, Clayton Coleman, Vivek Bagade, Yuvaraj Balaji, Nabarun Pal, K8s API Machinery SIG, kubernetes-sig-multicluster


On Fri, Jun 11, 2021 at 11:44 AM Gari Singh <gari...@google.com> wrote:
So place/bundle CRD yamls in some known location (or location from config option) which controller manager ingests and loads?

I would start by embedding them in source code at compile time, not by dynamically loading from disk, that seems more risky.
 
  


Gari Singh

Jun 11, 2021, 2:45:56 PM6/11/21
to Daniel Smith, Tim Hockin, Clayton Coleman, Vivek Bagade, Yuvaraj Balaji, Nabarun Pal, K8s API Machinery SIG, kubernetes-sig-multicluster
So place/bundle CRD yamls in some known location (or location from config option) which controller manager ingests and loads?  


Clayton Coleman

Jun 11, 2021, 2:56:02 PM6/11/21
to Daniel Smith, Gari Singh, Tim Hockin, Vivek Bagade, Yuvaraj Balaji, Nabarun Pal, K8s API Machinery SIG, kubernetes-sig-multicluster
On Fri, Jun 11, 2021 at 2:45 PM Daniel Smith <dbs...@google.com> wrote:


On Fri, Jun 11, 2021 at 11:44 AM Gari Singh <gari...@google.com> wrote:
So place/bundle CRD yamls in some known location (or location from config option) which controller manager ingests and loads?

I would start by embedding them in source code at compile time, not by dynamically loading from disk, that seems more risky.

I would start by embedding them in the source code of the API server and simply serving them as builtins.

I think at this point we can agree to disagree, but I'd like to actually find a forum to engage more deeply to resolve it.  I'd be more in favor of "do builtins" than "controller manager installing CRDs yet" (but note, I am far more in favor of that than "let this be someone else's problem" or "don't solve"), even if the (to me) best option is "align builtins and CRDs".

Daniel Smith

Jun 11, 2021, 4:47:44 PM6/11/21
to Clayton Coleman, Gari Singh, Tim Hockin, Vivek Bagade, Yuvaraj Balaji, Nabarun Pal, K8s API Machinery SIG, kubernetes-sig-multicluster
On Fri, Jun 11, 2021 at 11:56 AM Clayton Coleman <ccol...@redhat.com> wrote:


On Fri, Jun 11, 2021 at 2:45 PM Daniel Smith <dbs...@google.com> wrote:


On Fri, Jun 11, 2021 at 11:44 AM Gari Singh <gari...@google.com> wrote:
So place/bundle CRD yamls in some known location (or location from config option) which controller manager ingests and loads?

I would start by embedding them in source code at compile time, not by dynamically loading from disk, that seems more risky.

I would start by embedding them in the source code of the API server and simply serving them as builtins.

I think at this point we can agree to disagree, but I'd like to actually find a forum to engage more deeply to resolve it.  I'd be more in favor of "do builtins"

Happy to get in a higher bandwidth situation. I really don't know what you mean by "do builtins" or "embedding [CRDs] in the source code of the API server and simply serving them as builtins", that doesn't sound to me like a thing that can just "be done".

Tim Hockin

Jun 11, 2021, 5:32:55 PM6/11/21
to Daniel Smith, Clayton Coleman, Gari Singh, Vivek Bagade, Yuvaraj Balaji, Nabarun Pal, K8s API Machinery SIG, kubernetes-sig-multicluster
I have no context in the CRD codepaths, so I can't assess easy or
hard. As a sig-arch person, I want to see the project inch towards
CRDs being as powerful as builtin types, and I want to make less
builtins. As a sig-net and sig-multicluster person, I want to make
progress, and not take on side-quests.

I suspect that for some/most of my use-cases "auto-load" is all I need
(in addition to existing CRD capabilities). I boldly assume that
auto-load means "and revert any manually-made changes" (re-apply).

I haven't yet heard how a controller in KCM will choose which version
of CRDs to load - has that been worked out?

How to proceed? I wanted to encourage the folks investigating
auto-load, but this took a bad turn...

Clayton Coleman

Jun 11, 2021, 5:48:07 PM
to Tim Hockin, Daniel Smith, Gari Singh, Vivek Bagade, Yuvaraj Balaji, Nabarun Pal, K8s API Machinery SIG, kubernetes-sig-multicluster
> On Jun 11, 2021, at 5:32 PM, Tim Hockin <tho...@google.com> wrote:
>
> I have no context in the CRD codepaths, so I can't assess easy or
> hard. As a sig-arch person, I want to see the project inch towards
> CRDs being as powerful as builtin types, and I want to make less
> builtins. As a sig-net and sig-multicluster person, I want to make
> progress, and not take on side-quests.
>
> I suspect that for some/most of my use-cases "auto-load" is all I need
> (in addition to existing CRD capabilities). I boldly assume that
> auto-load means "and revert any manually-made changes" (re-apply).
>
> I haven't yet heard how a controller in KCM will choose which version
> of CRDs to load - has that been worked out?
>
> How to proceed? I wanted to encourage the folks investigating
> auto-load, but this took a bad turn...

Important, because we’re weighing lots of long-term advantages but not
considering externalities. Is there a forum where we can solicit some
tradeoffs (I’m happy to articulate what Dan is doubtful of, and maybe we
can contrast costs more effectively)? Perhaps some time next week?

Daniel Smith

Jun 11, 2021, 5:50:50 PM
to Tim Hockin, Clayton Coleman, Gari Singh, Vivek Bagade, Yuvaraj Balaji, Nabarun Pal, K8s API Machinery SIG, kubernetes-sig-multicluster
On Fri, Jun 11, 2021 at 2:32 PM Tim Hockin <tho...@google.com> wrote:
I have no context in the CRD codepaths, so I can't assess easy or
hard.  As a sig-arch person, I want to see the project inch towards
CRDs being as powerful as builtin types, and I want to make less
builtins.  As a sig-net and sig-multicluster person, I want to make
progress, and not take on side-quests.

I suspect that for some/most of my use-cases "auto-load" is all I need
(in addition to existing CRD capabilities).  I boldly assume that
auto-load means "and revert any manually-made changes" (re-apply).

Sure. I'd probably use SSA + force conflicts; there's no need to fight with users over e.g. labels and stuff.
 
I haven't yet heard how a controller in KCM will choose which version
of CRDs to load - has that been worked out?

I'd make it go with whatever version is compiled into the KCM which is the leader. I can imagine at least one cute trick to keep it from flopping back and forth if the cluster is in a split-version configuration (over an upgrade or downgrade). It shouldn't have to be that complicated.
 
How to proceed?  I wanted to encourage the folks investigating
auto-load, but this took a bad turn...

A KEP would be logical, or a feature request with some details, or a prototype.
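
For concreteness, a rough sketch of the "SSA + force conflicts" re-apply Daniel describes above, using the dynamic client; the field-manager name and error handling are placeholders, not part of any agreed design:

package crdinstall

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/dynamic"
	"sigs.k8s.io/yaml"
)

var crdGVR = schema.GroupVersionResource{
	Group:    "apiextensions.k8s.io",
	Version:  "v1",
	Resource: "customresourcedefinitions",
}

// forceApplyCRD server-side-applies a CRD manifest, taking ownership of any
// conflicting fields so that manual edits to managed fields are reverted,
// while user-added labels and annotations are left alone.
func forceApplyCRD(ctx context.Context, client dynamic.Interface, manifest []byte) error {
	obj := &unstructured.Unstructured{}
	if err := yaml.Unmarshal(manifest, &obj.Object); err != nil {
		return err
	}
	data, err := obj.MarshalJSON()
	if err != nil {
		return err
	}
	force := true
	_, err = client.Resource(crdGVR).Patch(ctx, obj.GetName(), types.ApplyPatchType, data,
		metav1.PatchOptions{FieldManager: "crd-installer", Force: &force})
	return err
}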

David Eads

Jun 14, 2021, 7:51:06 AM
to Daniel Smith, Tim Hockin, Clayton Coleman, Gari Singh, Vivek Bagade, Yuvaraj Balaji, Nabarun Pal, K8s API Machinery SIG, kubernetes-sig-multicluster
I haven't yet heard how a controller in KCM will choose which version
of CRDs to load - has that been worked out?
I'd make it go with whatever version is compiled into the KCM which is the leader. I can imagine at least one cute trick to keep it from flopping back and forth if the cluster is in a split-version configuration (over an upgrade or downgrade). It shouldn't have to be that complicated.

Given how our general skew story works, I think we're actually ok in cases where the schema transitions from vN-1 to vN to vN-1 to vN until the cluster completes the upgrade and the vN-1 KCM is retired.  Clients already have to deal with the cases where their APIs are missing because features have been disabled and they have to handle cases where new fields have no data in them.

I don't think I would have the vN-1 KCM delete CRDs that only exist in vN, since that is both slightly more difficult and more destructive.
 

Shawn Hurley

Jun 14, 2021, 10:41:30 AM
to David Eads, Daniel Smith, Tim Hockin, Clayton Coleman, Gari Singh, Vivek Bagade, Yuvaraj Balaji, Nabarun Pal, K8s API Machinery SIG, kubernetes-sig-multicluster

Hello All,

I don’t know if it makes sense as part of this discussion/feature or just something that requires another forum or discussion, but how would the conversion utilities in kubectl work with this? As far as I know, it uses built-in types and conversion functions. Would we need a way to handle this for all CRDs?

Thanks,

Shawn Hurley

Tim Hockin

Jun 14, 2021, 12:10:50 PM
to David Eads, Daniel Smith, Clayton Coleman, Gari Singh, Vivek Bagade, Yuvaraj Balaji, Nabarun Pal, K8s API Machinery SIG, kubernetes-sig-multicluster
On Mon, Jun 14, 2021 at 4:51 AM David Eads <de...@redhat.com> wrote:
>>
>> I haven't yet heard how a controller in KCM will choose which version
>>
>> of CRDs to load - has that been worked out?
>> I'd make it go with whatever version is compiled into the KCM which is the leader. I can imagine at least one cute trick to keep it from flopping back and forth if the cluster is in a split-version configuration (over an upgrade or downgrade). It shouldn't have to be that complicated.
>
>
> Given how our general skew story works, I think we're actually ok in cases where the schema transitions from vN-1 to vN to vN-1 to vN until the cluster completes the upgrade and the vN-1 KCM is retired. Clients already have to deal with the cases where their APIs are missing because features have been disabled and they have to handle cases where new fields have no data in them.

This seems problematic if a controller sees a new field, uses it, and then
that field gets nuked?

Daniel Smith

Jun 14, 2021, 12:11:47 PM
to David Eads, Tim Hockin, Clayton Coleman, Gari Singh, Vivek Bagade, Yuvaraj Balaji, Nabarun Pal, K8s API Machinery SIG, kubernetes-sig-multicluster
On Mon, Jun 14, 2021 at 4:51 AM David Eads <de...@redhat.com> wrote:
I haven't yet heard how a controller in KCM will choose which version
of CRDs to load - has that been worked out?
I'd make it go with whatever version is compiled into the KCM which is the leader. I can imagine at least one cute trick to keep it from flopping back and forth if the cluster is in a split-version configuration (over an upgrade or downgrade). It shouldn't have to be that complicated.

Given how our general skew story works, I think we're actually ok in cases where the schema transitions from vN-1 to vN to vN-1 to vN until the cluster completes the upgrade and the vN-1 KCM is retired.  Clients already have to deal with the cases where their APIs are missing because features have been disabled and they have to handle cases where new fields have no data in them.

I don't think I would have the vN-1 KCM delete CRDs that only exist in vN, since that is both slightly more difficult and more destructive.

Sure, the easiest thing to do is just never downgrade the CRDs, but that leaves an upgraded-then-downgraded cluster in an untested configuration.

2nd easiest thing is just "leader wins" and not worry about flip-flopping a bit around upgrades/downgrades. Ugly but should actually be fine.

With a tiny bit of work we can improve on that, IMO.

Jordan Liggitt

Jun 14, 2021, 12:15:15 PM
to Daniel Smith, David Eads, Tim Hockin, Clayton Coleman, Gari Singh, Vivek Bagade, Yuvaraj Balaji, Nabarun Pal, K8s API Machinery SIG, kubernetes-sig-multicluster
CRDs that change the stored version cannot be downgraded to a CRD that no longer defines that version

Initial CRD:
  • versions: [v1beta1, v1beta2]
  • stored version: v1beta2

Upgraded CRD:
  • versions: [v1beta1, v1beta2, v1]
  • stored version: v1

Once the upgraded CRD is present, applying the initial CRD and dropping v1 is no longer permitted



Daniel Smith

Jun 14, 2021, 12:17:20 PM
to Jordan Liggitt, David Eads, Tim Hockin, Clayton Coleman, Gari Singh, Vivek Bagade, Yuvaraj Balaji, Nabarun Pal, K8s API Machinery SIG, kubernetes-sig-multicluster
On Mon, Jun 14, 2021 at 9:15 AM Jordan Liggitt <lig...@google.com> wrote:
CRDs that change the stored version cannot be downgraded to a CRD that no longer defines that version

Initial CRD:
  • versions: [v1beta1, v1beta2]
  • stored version: v1beta2

Upgraded CRD:
  • versions: [v1beta1, v1beta2, v1]
  • stored version: v1

Once the upgraded CRD is present, applying the initial CRD and dropping v1 is no longer permitted

You can't add v1 and change the storage version in the same release, if you want to follow the rules we have for built-ins.

The permitted CRD state changes will all let the system go up and down.

Tim Hockin

Jun 14, 2021, 12:20:22 PM
to Daniel Smith, Jordan Liggitt, David Eads, Clayton Coleman, Gari Singh, Vivek Bagade, Yuvaraj Balaji, Nabarun Pal, K8s API Machinery SIG, kubernetes-sig-multicluster
On Mon, Jun 14, 2021 at 9:17 AM Daniel Smith <dbs...@google.com> wrote:
> You can't add v1 and change the storage version in the same release, if you want to follow the rules we have for built-ins.

"in theory" or "the apiserver won't let you"

Daniel Smith

Jun 14, 2021, 12:21:59 PM
to Shawn Hurley, David Eads, Tim Hockin, Clayton Coleman, Gari Singh, Vivek Bagade, Yuvaraj Balaji, Nabarun Pal, K8s API Machinery SIG, kubernetes-sig-multicluster
Offline kubectl convert won't be able to work for CRs.

Online (with access to a cluster), kubectl convert doesn't work now AFAIK; we could perhaps make it work by having kubectl call the conversion webhook directly, otherwise we'd need to add something to the apiserver.
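
As a rough illustration of what "call the conversion webhook directly" would involve (not something kubectl does today): the helper below only builds the request body; discovering the webhook endpoint and CA from spec.conversion.webhook.clientConfig, and handling the response, are omitted:

package main

import (
	"encoding/json"

	apiextensionsv1 "k8s.io/apiextensions-apiserver/pkg/apis/apiextensions/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime"
	"k8s.io/apimachinery/pkg/types"
)

// buildConversionReview assembles the payload a client would POST to a CRD's
// conversion webhook to convert raw custom resources to desiredVersion.
func buildConversionReview(desiredVersion string, rawObjects [][]byte) ([]byte, error) {
	review := apiextensionsv1.ConversionReview{
		TypeMeta: metav1.TypeMeta{
			APIVersion: "apiextensions.k8s.io/v1",
			Kind:       "ConversionReview",
		},
		Request: &apiextensionsv1.ConversionRequest{
			UID:               types.UID("example-uid"), // any unique value
			DesiredAPIVersion: desiredVersion,
		},
	}
	for _, raw := range rawObjects {
		review.Request.Objects = append(review.Request.Objects, runtime.RawExtension{Raw: raw})
	}
	return json.Marshal(review)
}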

Jordan Liggitt

Jun 14, 2021, 12:22:28 PM
to Tim Hockin, Daniel Smith, David Eads, Clayton Coleman, Gari Singh, Vivek Bagade, Yuvaraj Balaji, Nabarun Pal, K8s API Machinery SIG, kubernetes-sig-multicluster
On Mon, Jun 14, 2021 at 12:20 PM Tim Hockin <tho...@google.com> wrote:
On Mon, Jun 14, 2021 at 9:17 AM Daniel Smith <dbs...@google.com> wrote:

> You can't add v1 and change the storage version in the same release, if you want to follow the rules we have for built-ins.

"in theory" or "the apiserver won't let you"

The API server will let you add v1 and change the storage version in a single update on the way up. It won't let you drop v1 on the way down unless you also remove v1 from status.storedVersions (which means you have ensured stored v1 items were migrated back out of v1)
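
To make the downgrade constraint concrete, a hedged sketch of the check an installer could perform before applying an older CRD manifest; the helper name is illustrative only:

package crdinstall

import (
	apiextensionsv1 "k8s.io/apiextensions-apiserver/pkg/apis/apiextensions/v1"
)

// canApplyOlderCRD reports whether applying `older` (e.g. the vN-1 copy of a
// CRD) would drop a version still recorded in status.storedVersions of the
// CRD currently in the cluster; that is the update the API server rejects.
func canApplyOlderCRD(current, older *apiextensionsv1.CustomResourceDefinition) bool {
	defined := map[string]bool{}
	for _, v := range older.Spec.Versions {
		defined[v.Name] = true
	}
	for _, stored := range current.Status.StoredVersions {
		if !defined[stored] {
			// A stored version (e.g. v1) is missing from the older manifest;
			// stored objects would have to be migrated and storedVersions
			// pruned before this downgrade could be applied.
			return false
		}
	}
	return true
}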

Daniel Smith

Jun 14, 2021, 12:23:03 PM
to Tim Hockin, Jordan Liggitt, David Eads, Clayton Coleman, Gari Singh, Vivek Bagade, Yuvaraj Balaji, Nabarun Pal, K8s API Machinery SIG, kubernetes-sig-multicluster
On Mon, Jun 14, 2021 at 9:20 AM Tim Hockin <tho...@google.com> wrote:
> You can't add v1 and change the storage version in the same release, if you want to follow the rules we have for built-ins.

"in theory" or "the apiserver won't let you"

apiserver doesn't enforce that your schema will do the transitions right. But we, the api reviewers, will do so with CRDs that are in-tree.

Daniel Smith

Jun 14, 2021, 12:24:08 PM
to Jordan Liggitt, Tim Hockin, David Eads, Clayton Coleman, Gari Singh, Vivek Bagade, Yuvaraj Balaji, Nabarun Pal, K8s API Machinery SIG, kubernetes-sig-multicluster
The constraint apiserver enforces is "the data in etcd is always interpretable".
 

Tim Hockin

Jun 14, 2021, 12:30:13 PM
to Daniel Smith, Jordan Liggitt, David Eads, Clayton Coleman, Gari Singh, Vivek Bagade, Yuvaraj Balaji, Nabarun Pal, K8s API Machinery SIG, kubernetes-sig-multicluster
On Mon, Jun 14, 2021 at 9:23 AM Daniel Smith <dbs...@google.com> wrote:
> apiserver doesn't enforce that your schema will do the transitions right. But we, the api reviewers, will do so with CRDs that are in-tree.

Should it? Like, maybe an override to allow unsafe updates (or vice-versa)?

Daniel Smith

Jun 14, 2021, 12:59:36 PM
to Tim Hockin, Jordan Liggitt, David Eads, Clayton Coleman, Gari Singh, Vivek Bagade, Yuvaraj Balaji, Nabarun Pal, K8s API Machinery SIG, kubernetes-sig-multicluster
On Mon, Jun 14, 2021 at 9:30 AM Tim Hockin <tho...@google.com> wrote:
> apiserver doesn't enforce that your schema will do the transitions right. But we, the api reviewers, will do so with CRDs that are in-tree.

Should it? Like, maybe an override to allow unsafe updates (or vice-versa)?

a) Up/down safety involves a lot of factors, I'm not sure we can ensure it programmatically yet, certainly not for CRDs that don't have accurate schemas
b) Especially during CRD development, there are many reasons you might want to do a transition that would be forbidden by upstream compatibility rules

Tim Hockin

Jun 15, 2021, 11:58:57 AM
to Daniel Smith, Jordan Liggitt, David Eads, Clayton Coleman, Gari Singh, Vivek Bagade, Yuvaraj Balaji, Nabarun Pal, K8s API Machinery SIG, kubernetes-sig-multicluster
On Mon, Jun 14, 2021 at 9:59 AM Daniel Smith <dbs...@google.com> wrote:
> a) Up/down safety involves a lot of factors, I'm not sure we can ensure it programmatically yet, certainly not for CRDs that don't have accurate schemas
> b) Especially during CRD development, there are many reasons you might want to do a transition that would be forbidden by upstream compatibility rules

That's why I suggested it as an override. E.g. validation for CRD
says "you can't change the storage version to a version introduced in
the same update". It's easily worked-around (just do 2 updates!), and
maybe we need more depth, but I bet it prevents some mistakes. Then
if you pass a URL param or header (do we do that anywhere?) we bypass
that validation check. (`...?allow_scary_crd_updates=true`).

Anyway, step 1 is documenting.
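
Purely to illustrate the kind of check Tim is suggesting (hypothetical, not existing apiserver validation; the override flag is imaginary):

package crdinstall

import (
	"fmt"

	apiextensionsv1 "k8s.io/apiextensions-apiserver/pkg/apis/apiextensions/v1"
)

// validateStorageVersionTransition rejects CRD updates that move the storage
// version to a version that did not exist in the old object, unless the caller
// explicitly opted in (e.g. via an imaginary ?allow_scary_crd_updates=true).
func validateStorageVersionTransition(oldCRD, newCRD *apiextensionsv1.CustomResourceDefinition, allowScaryUpdates bool) error {
	oldVersions := map[string]bool{}
	for _, v := range oldCRD.Spec.Versions {
		oldVersions[v.Name] = true
	}
	for _, v := range newCRD.Spec.Versions {
		if v.Storage && !oldVersions[v.Name] {
			if allowScaryUpdates {
				return nil // explicit opt-in to the unsafe transition
			}
			return fmt.Errorf("storage version %q was introduced in this update; add it as served-only first, then switch storage in a later update", v.Name)
		}
	}
	return nil
}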

Tim Hockin

Jun 15, 2021, 12:06:04 PM
to Daniel Smith, Clayton Coleman, Gari Singh, Vivek Bagade, Yuvaraj Balaji, Nabarun Pal, K8s API Machinery SIG, kubernetes-sig-multicluster
Bringing this thread back.

Let's assume I have a type for which a CRD (with no changes, no webhooks,
etc.) is sufficient.

Do we, in the very near term:
a) define a mechanism for loading such CRDs (where they are defined,
stored, are they re-asserted, how are they upgraded, etc)
b) define a mechanism for loading such CRDs AND associated resources
(like webhooks)
c) focus on making CRD more complete
d) focus on a new CRD->builtin codegen
e) just send more email, accomplish nothing
?

I'd like to argue for (a). All the rest can come in time.

If so, can we nominate a reviewer for a to-be-written KEP? Daniel and
David seem like obvious candidates. Clayton and Jordan are qualified,
too. I am not (both because I'm sponsoring the work in progress and
because I'm not clueful enough about CRD internals).

Or should we just punt (again)? The folks doing the CRD-loading
thinking have been quiet on this thread - care to speak up?

Tim

Tim Hockin

Jun 15, 2021, 12:44:24 PM
to Gari Singh, Daniel Smith, Clayton Coleman, Vivek Bagade, Yuvaraj Balaji, Nabarun Pal, K8s API Machinery SIG, kubernetes-sig-multicluster
On Tue, Jun 15, 2021 at 9:10 AM Gari Singh <gari...@google.com> wrote:
>
> I think we should minimally take a stab at a).
> But I'd also add that maybe we should also "define/declare" the intended user(s) of this mechanism? (e.g. is it intended for people who build their own distros and want to bake in their own CRDs, is it intended for in-tree submissions, etc). Perhaps I just lost track.

That is a good question. I think it's INTENDED for in-project types
(we need a word for this - it's not an extension, but it uses the
extension mechanism) but we can't prevent downstream projects from
extending. The bigger question is do we make it moderately hard to
extend (fork code, build a custom binary) or easy (use upstream
binary, put a file somewhere). I think the former (hard) is OK as a
first stab.

And this is now solidly in "write a KEP" territory.

Daniel Smith

Jun 15, 2021, 1:08:04 PM
to Tim Hockin, Gari Singh, Clayton Coleman, Vivek Bagade, Yuvaraj Balaji, Nabarun Pal, K8s API Machinery SIG, kubernetes-sig-multicluster
I would review a KEP for a mechanism:
* targeted at in-tree use cases (using a CRD instead of a built-in), not targeted at end users (cluster admins) or distributors
* only loads CRDs, not necessarily other supporting types
* runs in controller manager
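
A hedged sketch of what "runs in controller manager" could look like: a loop that re-asserts the managed CRDs when they are edited or deleted. apply() would be a force server-side apply as sketched earlier, and managed() is whatever criterion the KEP settles on; both are placeholders, and work queues, tombstone handling, and retries are omitted:

package crdinstall

import (
	"context"
	"time"

	apiextensionsclientset "k8s.io/apiextensions-apiserver/pkg/client/clientset/clientset"
	apiextensionsinformers "k8s.io/apiextensions-apiserver/pkg/client/informers/externalversions"
	"k8s.io/client-go/tools/cache"
)

// RunCRDReassertLoop watches CustomResourceDefinitions and re-applies the
// managed ones whenever they are modified or deleted.
func RunCRDReassertLoop(ctx context.Context, client apiextensionsclientset.Interface,
	managed func(name string) bool, apply func(ctx context.Context, name string) error) {

	factory := apiextensionsinformers.NewSharedInformerFactory(client, 10*time.Minute)
	informer := factory.Apiextensions().V1().CustomResourceDefinitions().Informer()

	reassert := func(obj interface{}) {
		if crd, ok := obj.(interface{ GetName() string }); ok && managed(crd.GetName()) {
			_ = apply(ctx, crd.GetName()) // real code would queue and retry with backoff
		}
	}
	informer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		UpdateFunc: func(_, newObj interface{}) { reassert(newObj) },
		DeleteFunc: reassert,
	})
	factory.Start(ctx.Done())
	factory.WaitForCacheSync(ctx.Done())
	<-ctx.Done()
}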

Davanum Srinivas

Jun 15, 2021, 1:13:55 PM
to Daniel Smith, Tim Hockin, Gari Singh, Clayton Coleman, Vivek Bagade, Yuvaraj Balaji, Nabarun Pal, K8s API Machinery SIG, kubernetes-sig-multicluster
Thanks for outlining what's acceptable, Daniel!

Nikhita/Nabarun, wanna do this?

-- Dims

Clayton Coleman

Jun 15, 2021, 1:41:14 PM
to Davanum Srinivas, Daniel Smith, Tim Hockin, Gari Singh, Vivek Bagade, Yuvaraj Balaji, Nabarun Pal, K8s API Machinery SIG, kubernetes-sig-multicluster
On Tue, Jun 15, 2021 at 1:14 PM Davanum Srinivas <dav...@gmail.com> wrote:
Thanks for outlining what's acceptable Daniel! 

Nikhita/Nabarun, wanna do this?

-- Dims

On Tue, Jun 15, 2021 at 1:08 PM 'Daniel Smith' via K8s API Machinery SIG <kubernetes-sig...@googlegroups.com> wrote:
I would review a KEP for a mechanism:
* targeted at in-tree use cases (using a CRD instead of a built-in), not targeted at end users (cluster admins) or distributors
* only loads CRDs, not necessarily other supporting types
* runs in controller manager

I'm still not in direct agreement here, but we can take it in the KEP.  I have a ton of reservations around CRD fiddling in core given long history of what is considered correct for rollback, which CRD fiddling does not necessarily solve (which Jordan's comments were highlighting).

Daniel Smith

Jun 15, 2021, 2:20:13 PM
to Clayton Coleman, Davanum Srinivas, Tim Hockin, Gari Singh, Vivek Bagade, Yuvaraj Balaji, Nabarun Pal, K8s API Machinery SIG, kubernetes-sig-multicluster
On Tue, Jun 15, 2021 at 10:41 AM Clayton Coleman <ccol...@redhat.com> wrote:


I'm still not in direct agreement here, but we can take it in the KEP.  I have a ton of reservations around CRD fiddling in core given long history of what is considered correct for rollback, which CRD fiddling does not necessarily solve (which Jordan's comments were highlighting).

I promised to review, not to merge :)

Gari Singh

Jun 15, 2021, 2:22:08 PM
to Tim Hockin, Daniel Smith, Clayton Coleman, Vivek Bagade, Yuvaraj Balaji, Nabarun Pal, K8s API Machinery SIG, kubernetes-sig-multicluster
I think we should minimally take a stab at a).
But I'd also add that maybe we should also "define/declare" the intended user(s) of this mechanism? (e.g. is it intended for people who build their own distros and want to bake in their own CRDs, is it intended for in-tree submissions, etc).  Perhaps I just lost track.

David Eads

Jun 15, 2021, 3:57:08 PM
to Daniel Smith, Clayton Coleman, Davanum Srinivas, Tim Hockin, Gari Singh, Vivek Bagade, Yuvaraj Balaji, Nabarun Pal, K8s API Machinery SIG, kubernetes-sig-multicluster
I'm also interested in reading a KEP that attempts to do this in a non-kube-apiserver process (kube-controller-manager is my pick).

Even if we don't create the supporting types, I'd like to see the KEP explore the limitations produced by a lack of conversion webhooks and whether those limitations are fatal to the plan.

Yuvaraj Balaji

Jun 21, 2021, 12:06:37 PM
to K8s API Machinery SIG
From all the discussion, the general consensus is to start with a KEP and continue the discussion there.
This initial KEP will outline a mechanism where CRDs are loaded by a controller/installer that runs in the kube-controller-manager.
The initial KEP will limit its focus to in-tree use cases and will only load CRDs, not other supporting types; other resources, such as webhooks, will be out of scope for the initial KEP.

The KEP will also cover some of the rejected alternatives and list the drawbacks of each.

Nabarun and I will start working on a KEP that covers the above topics and get back to the SIG for more feedback. :) 

- Yuvaraj