Hi folks,
We hope that you all are safe and doing well.
Recently, we started exploring the problem of having certain CustomResourceDefinitions available for consumption right after the cluster becomes healthy. This was widely discussed around the first half of 2019 as the CRD installation problem [1][2]. More recently, sig-multicluster presented at the sig-architecture meeting on 2021-03-25 [3], discussing the prospect of having a ClusterProperty type [4] either as a k/k built-in type or as a CRD. That problem could potentially be solved if there were a way to make a set of CRDs available to cluster consumers as soon as the apiserver becomes healthy.
We have explored the problem space and come up with a POC [5] implementation.
The proof of concept takes the CRDs to be installed at startup from two different sources:

1. A built-in set of CRDs packaged along with k/k using go:embed. These manifests follow the same release cadence as k/k.
2. An optional user-provided directory on the disk of the machine where the apiserver runs. This is exposed via a command-line flag on both kube-apiserver and apiextensions-apiserver.

Note that the installer accepts any standard CRD spec, allowing us to reuse existing CRD manifests.
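To make the two sources concrete, here is a rough sketch of how the embedded manifests and the optional on-disk directory could be read. This is not the actual POC code; the package name, the "manifests" directory, and the readAll helper are made up for illustration:

package bootstrapcrds

import (
	"embed"
	"os"
	"path/filepath"
)

// Built-in manifests compiled into the binary (the directory name is illustrative).
//
//go:embed manifests/*.yaml
var builtinManifests embed.FS

// readAll returns the raw bytes of every embedded manifest (source #1) plus,
// if userDir is non-empty, every *.yaml file from the user-provided directory
// on the apiserver host (source #2).
func readAll(userDir string) ([][]byte, error) {
	var out [][]byte

	entries, err := builtinManifests.ReadDir("manifests")
	if err != nil {
		return nil, err
	}
	for _, e := range entries {
		b, err := builtinManifests.ReadFile("manifests/" + e.Name())
		if err != nil {
			return nil, err
		}
		out = append(out, b)
	}

	if userDir == "" {
		return out, nil
	}
	userEntries, err := os.ReadDir(userDir)
	if err != nil {
		return nil, err
	}
	for _, e := range userEntries {
		if e.IsDir() || filepath.Ext(e.Name()) != ".yaml" {
			continue
		}
		b, err := os.ReadFile(filepath.Join(userDir, e.Name()))
		if err != nil {
			return nil, err
		}
		out = append(out, b)
	}
	return out, nil
}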
The following logic is executed as a PostStartHook of the apiextensions-apiserver if the `InstallCRDsAtStartup` feature gate is set to `true`:

1. Prepare a reader for the manifests that are embedded in the kube-apiserver binary (source #1).
2. Prepare a reader for the manifests located on disk, only if the user specified a directory through the flag (source #2).
3. Read all manifests through the readers and return a list of unstructured objects.
4. Initialize an installer from the PostStartHookContext.
5. Ensure that all the necessary GVRs are up. (In the current POC, we only check that the apiextensions.k8s.io group has become available.)
6. Install all the objects read in step 3: patch them if they already exist, or create them if they don't.

If any step fails, we surface the error, ensuring that the apiserver reports itself as unhealthy when CRD installation at startup fails and that the error is propagated up to the cluster admins.
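To give a feel for the install step, here is a heavily simplified sketch. It is not the POC code: it is restricted to CRD objects, uses an update rather than the POC's patch, and the installCRDs name and decoding details are invented for illustration:

package bootstrapcrds

import (
	"bytes"
	"context"
	"fmt"

	apiextensionsv1 "k8s.io/apiextensions-apiserver/pkg/apis/apiextensions/v1"
	apiextensionsclient "k8s.io/apiextensions-apiserver/pkg/client/clientset/clientset"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/yaml"
)

// installCRDs decodes every manifest and creates the CRD, or updates it if it
// already exists. Any error is returned so that the PostStartHook fails and
// the apiserver reports itself unhealthy.
func installCRDs(ctx context.Context, client apiextensionsclient.Interface, manifests [][]byte) error {
	for _, m := range manifests {
		crd := &apiextensionsv1.CustomResourceDefinition{}
		if err := yaml.NewYAMLOrJSONDecoder(bytes.NewReader(m), 4096).Decode(crd); err != nil {
			return fmt.Errorf("decoding manifest: %w", err)
		}

		_, err := client.ApiextensionsV1().CustomResourceDefinitions().Create(ctx, crd, metav1.CreateOptions{})
		if apierrors.IsAlreadyExists(err) {
			existing, getErr := client.ApiextensionsV1().CustomResourceDefinitions().Get(ctx, crd.Name, metav1.GetOptions{})
			if getErr != nil {
				return getErr
			}
			crd.ResourceVersion = existing.ResourceVersion
			_, err = client.ApiextensionsV1().CustomResourceDefinitions().Update(ctx, crd, metav1.UpdateOptions{})
		}
		if err != nil {
			return fmt.Errorf("installing CRD %q: %w", crd.Name, err)
		}
	}
	return nil
}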
While investigating the problem, a few questions came up that we wanted to discuss with the community:

1. Do we differentiate between CRDs installed through this mechanism and the ones installed by users after cluster startup? If yes, how?
2. What happens if a user deletes the CRDs installed at startup time? Should we reconcile them and ensure they exist as long as the cluster is running?
3. In the POC, we have included the capability to install any kind of resource that is either in the embedded location or provided by the user. We understand this approach can raise different concerns and wanted to get the community's opinion. We can also gate the installable resources to only CRDs, or a combination of CRDs and CRs, if installing other resource types doesn't sound like a good idea.
4. What should be the considerations when we upgrade/downgrade highly available (HA) clusters? For a non-HA cluster, whenever the kube-apiserver starts, it ensures that the CRDs packaged with that version of kube-apiserver exist, which covers the upgrade/downgrade scenarios for a non-HA cluster.
We would like to demo the POC and discuss more in the upcoming sig-api-machinery bi-weekly meeting (May 5 2021) to receive feedback on the proposal. After the discussion, we will draft a KEP incorporating the comments and suggestions from the community.
[1]: https://docs.google.com/document/d/1P2Eiy7L-TJqG1pU9So-yrf8SftLCQwlMbWq4qfulyuA/edit
[2]: https://goo.gl/2cW8zQ
[3]: http://bit.ly/sig-architecture
[4]: https://docs.google.com/presentation/d/1-GUWYPMpfTXdPCyxFgnjpnzc_coY21h0D-QocyCIgY0/edit
[5]: https://github.com/kubernetes/kubernetes/pull/101729
--
As stated, the proposal here doesn't handle upgrade/downgrade conditions well (apiservers will fight). (Different sets of user manifests on different apiservers also cause a fight.) Also, it's awkward to use this mechanism to delete a resource that it previously installed. Also, it's unclear what should happen if a user modifies one of these controlled resources.
The alternative that we had in mind is more like this:

1. Add a second authz webhook, which runs in front of RBAC.

The idea is that an authz webhook can run co-located with the apiserver and effectively implement a "platform admin" concept. So, it could for example block all users other than those in a platform-admin group when the cluster starts up. Then the platform admin (or a binary with sufficient credentials) adds in the desired startup objects. Then the authz webhook recognizes that the cluster has been initialized, and permits other traffic.

An advantage of this mechanism is that you can make the authz webhook arbitrarily smart (i.e. should ordinary users be messing with these initialization objects?)
I'm not sure I'm ready to talk about this at tomorrow's SIG meeting; can we do it at the following one?
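Purely to illustrate the shape of the authz-webhook alternative above (not an agreed design; the group name, the initialized signal, and the handler name are all assumptions), a front-of-RBAC authorization webhook could look roughly like this:

package platformadmin

import (
	"encoding/json"
	"net/http"
	"sync/atomic"

	authorizationv1 "k8s.io/api/authorization/v1"
)

// initialized flips to true once the desired startup objects have been
// applied; how that is detected is out of scope for this sketch.
var initialized atomic.Bool

// platformAdminGroup is an assumed group name.
const platformAdminGroup = "system:platform-admins"

// serveSAR handles SubjectAccessReview requests. Until the cluster is
// initialized, only platform admins get through; afterwards the webhook
// returns neither Allowed nor Denied, so RBAC (the next authorizer) decides.
func serveSAR(w http.ResponseWriter, r *http.Request) {
	var sar authorizationv1.SubjectAccessReview
	if err := json.NewDecoder(r.Body).Decode(&sar); err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}

	if !initialized.Load() && !hasGroup(sar.Spec.Groups, platformAdminGroup) {
		sar.Status = authorizationv1.SubjectAccessReviewStatus{
			Denied: true,
			Reason: "cluster is still being initialized",
		}
	}

	w.Header().Set("Content-Type", "application/json")
	json.NewEncoder(w).Encode(&sar)
}

func hasGroup(groups []string, g string) bool {
	for _, x := range groups {
		if x == g {
			return true
		}
	}
	return false
}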
If you ever want to add a CRD manifest to this list (on an upgrade for instance), all clients have to handle the case where the resource isn't yet present even though the kube-apiserver is ready. This being the case, what is the distinction between embedding this controller inside the kube-apiserver with your post-start-hook logic and simply running an external controller that reconciles a set of CRDs?
Hi Daniel,

Thank you for your thoughts. Replying inline.

On Tue, May 4, 2021 at 11:16 AM Daniel Smith <dbs...@google.com> wrote:
> As stated, the proposal here doesn't handle upgrade/downgrade conditions well (apiservers will fight). (Different sets of user manifests on different apiservers also cause a fight.) Also, it's awkward to use this mechanism to delete a resource that it previously installed. Also, it's unclear what should happen if a user modifies one of these controlled resources.

We recognize these issues and, as you probably already saw, they are along the same lines as the open questions we had in mind. We are brainstorming about these open questions to come up with a reasonable solution.

> The alternative that we had in mind is more like this:
> 1. Add a second authz webhook, which runs in front of RBAC.
> The idea is that an authz webhook can run co-located with the apiserver and effectively implement a "platform admin" concept. So, it could for example block all users other than those in a platform-admin group when the cluster starts up. Then the platform admin (or a binary with sufficient credentials) adds in the desired startup objects. Then the authz webhook recognizes that the cluster has been initialized, and permits other traffic.
> An advantage of this mechanism is that you can make the authz webhook arbitrarily smart (i.e. should ordinary users be messing with these initialization objects?)

The "platform admin" concept sounds interesting and seems to be a good approach to the problem of users modifying these initialized objects. Also, can you elaborate on how the above approach solves the issue of initializing the objects?
Hi David,

Replying inline.

On Tue, May 4, 2021 at 11:25 AM David Eads <de...@redhat.com> wrote:
> If you ever want to add a CRD manifest to this list (on an upgrade for instance), all clients have to handle the case where the resource isn't yet present even though the kube-apiserver is ready. This being the case, what is the distinction between embedding this controller inside the kube-apiserver with your post-start-hook logic and simply running an external controller that reconciles a set of CRDs?

Just to be clear, the post-start-hook logic does not run a controller/reconcile loop/long-running routine. It is a one-off task that initializes the CRDs.
Regardless, having an additional component run on the cluster to initialize CRDs would add an extra step for cluster bootstrappers such as kubeadm. Since the CRDs that are built in under the proposed approach are non-optional definitions that need to be present on every cluster, delegating this responsibility to the cluster bootstrapper could risk creating inconsistencies across clusters created through different mechanisms.
Regards,
Yuvaraj & Nabarun
Stuff like that definitely sounds more reasonable than just focusing on
CRDs, but that also grows the scope a good bit if we're talking about
arbitrary content now that needs to be both loaded and in some kind of
Ready state. That smells more like the "early webhook" kind of thing
On Tue, Jun 8, 2021 at 1:34 PM Daniel Smith <dbs...@google.com> wrote:
>
> There was a fairly long discussion at our sig meeting last week (recording).
>
> On this topic, the only thing that we kinda-sorta established is that a CRD installing controller should run from controller manager, not as part of apiserver.
I wonder if it shouldn't be a distinct thing entirely. Upgrade the
schema BEFORE you upgrade the cluster.
>> If I were re-tooling the proposal, I'd focus on those. "Preventing
>> usage of the cluster until policies are in place" is a different
>> problem.
>
> It is not a completely different problem; IIRC, RuntimeClass couldn't use a CRD because there was a hard dependency from kubelet (?) that the type exist on startup. CRDs installed by an installer *will* show up some time after cluster start-up.
I'm not convinced. If kubelet REQUIRES runtimeclass, then what
prevents someone from unloading that type? AFAICT nothing, so either
it's OK for kubelet to crashloop the absence of the type (in which
case, just do that until the type is loaded) or it's not OK to
crashloop (in which case it has to handle the absence of the type).
If you are a cluster admin and you don't want to allow new pods to run
(at all?) in some conditions, isn't that a general policy problem?
On Tue, Jun 8, 2021 at 2:42 PM Daniel Smith <dbs...@google.com> wrote:
>
>
>
> On Tue, Jun 8, 2021 at 2:13 PM Tim Hockin <tho...@google.com> wrote:
>>
>> On Tue, Jun 8, 2021 at 1:34 PM Daniel Smith <dbs...@google.com> wrote:
>> >
>> > There was a fairly long discussion at our sig meeting last week (recording).
>> >
>> > On this topic, the only thing that we kinda-sorta established is that a CRD installing controller should run from controller manager, not as part of apiserver.
>>
>> I wonder if it shouldn't be a distinct thing entirely. Upgrade the
>> schema BEFORE you upgrade the cluster.
>
>
> Adding a new mandatory component is super unpopular with folks who have tooling to start clusters.
I'm not even sure it's a component as much as a process during cluster
turnup. Or both? Before you start an upgrade of the cluster control
plane, you should upgrade the API schema (CRDs). Existing controllers
must tolerate this. Then you can upgrade the control plane and
controllers. I guess at runtime you want to re-assert that same
schema, which seems fine inside controller manager (I think).
On Tue, Jun 8, 2021 at 2:51 PM Daniel Smith <dbs...@google.com> wrote:
>
>
>
> On Tue, Jun 8, 2021 at 2:47 PM Tim Hockin <tho...@google.com> wrote:
>>
>> On Tue, Jun 8, 2021 at 2:42 PM Daniel Smith <dbs...@google.com> wrote:
>> >
>> >
>> >
>> > On Tue, Jun 8, 2021 at 2:13 PM Tim Hockin <tho...@google.com> wrote:
>> >>
>> >> On Tue, Jun 8, 2021 at 1:34 PM Daniel Smith <dbs...@google.com> wrote:
>> >> >
>> >> > There was a fairly long discussion at our sig meeting last week (recording).
>> >> >
>> >> > On this topic, the only thing that we kinda-sorta established is that a CRD installing controller should run from controller manager, not as part of apiserver.
>> >>
>> >> I wonder if it shouldn't be a distinct thing entirely. Upgrade the
>> >> schema BEFORE you upgrade the cluster.
>> >
>> >
>> > Adding a new mandatory component is super unpopular with folks who have tooling to start clusters.
>>
>> I'm not even sure it's a component as much as a process during cluster
>> turnup. Or both? Before you start an upgrade of the cluster control
>> plane, you should upgrade the API schema (CRDs). Existing controllers
>> must tolerate this. Then you can upgrade the control plane and
>> controllers. I guess at runtime you want to re-assert that same
>> schema, which seems fine inside controller manager (I think).
>
>
> That makes it even worse, though, because we generally have no other imperative commands that you run during upgrade/downgrade...
...and this has been a nagging problem FOREVER. No clean way to touch
types to force storage-version updates, etc.
But we can still position this declaratively - first you have to
update the schema payload. Then you can update API servers. Then you
can update controllers. (We already document the
API-before-controllers sequence, I think).
That's true. Maybe my point about pre-upgrading isn't so important as the rest.
>> >> >> >> If I were re-tooling the proposal, I'd focus on those. "Preventing
>> >> >> >> usage of the cluster until policies are in place" is a different
>> >> >> >> problem.
>> >> >> >
>> >> >> > It is not a completely different problem; IIRC, RuntimeClass couldn't use a CRD because there was a hard dependency from kubelet (?) that the type exist on startup. CRDs installed by an installer *will* show up some time after cluster start-up.
>> >> >>
>> >> >> I'm not convinced. If kubelet REQUIRES runtimeclass, then what
>> >> >> prevents someone from unloading that type? AFAICT nothing, so either
>> >> >> it's OK for kubelet to crashloop the absence of the type (in which
>> >> >> case, just do that until the type is loaded) or it's not OK to
>> >> >> crashloop (in which case it has to handle the absence of the type).
>> >> >> If you are a cluster admin and you don't want to allow new pods to run
>> >> >> (at all?) in some conditions, isn't that a general policy problem?
>> >> >
>> >> >
>> >> > I agree with you and I don't remember why we lost this argument at the time, it was a few years ago. My argument here is that other people *feel* blocked by this, not that they are necessarily actually blocked.
>> >>
>> >> ACK.
A new controller in controller manager that installs some CRDs is like two days of effort. The KEP would be more effort. Getting everyone to agree on it is significantly more effort than that! I don't think schemas or IDL frameworks or anything like that has anything to do with this, it's unrelated.
On Jun 11, 2021, at 2:32 PM, Daniel Smith <dbs...@google.com> wrote:
> A new controller in controller manager that installs some CRDs is like two days of effort. The KEP would be more effort. Getting everyone to agree on it is significantly more effort than that! I don't think schemas or IDL frameworks or anything like that has anything to do with this, it's unrelated.

Don't agree :). What I'm trying to highlight is that we're fixated on mechanisms vs. APIs. To a consumer, CRDs are no different than built-ins. The machinery that takes a CRD object and exposes a built-in could be equally straightforward, and gives us a path to put in straitjackets that ensure CRDs develop features without leaking the mechanism to distributions.
So place/bundle CRD yamls in some known location (or location from config option) which controller manager ingests and loads?
On Fri, Jun 11, 2021 at 11:44 AM Gari Singh <gari...@google.com> wrote:
> So place/bundle CRD yamls in some known location (or location from config option) which controller manager ingests and loads?

I would start by embedding them in source code at compile time, not by dynamically loading from disk, that seems more risky.
On Fri, Jun 11, 2021 at 2:45 PM Daniel Smith <dbs...@google.com> wrote:
> On Fri, Jun 11, 2021 at 11:44 AM Gari Singh <gari...@google.com> wrote:
>> So place/bundle CRD yamls in some known location (or location from config option) which controller manager ingests and loads?
>
> I would start by embedding them in source code at compile time, not by dynamically loading from disk, that seems more risky.

I would start by embedding them in the source code of the API server and simply serving them as builtins. I think at this point we can agree to disagree, but I'd like to actually find a forum to engage more deeply to resolve it. I'd be more in favor of "do builtins".
I have no context in the CRD codepaths, so I can't assess easy or
hard. As a sig-arch person, I want to see the project inch towards
CRDs being as powerful as builtin types, and I want to make less
builtins. As a sig-net and sig-multicluster person, I want to make
progress, and not take on side-quests.
I suspect that for some/most of my use-cases "auto-load" is all I need
(in addition to existing CRD capabilities). I boldly assume that
auto-load means "and revert any manually-made changes" (re-apply).
I haven't yet heard how a controller in KCM will choose which version
of CRDs to load - has that been worked out?
How to proceed? I wanted to encourage the folks investigating
auto-load, but this took a bad turn...
> I haven't yet heard how a controller in KCM will choose which version of CRDs to load - has that been worked out?
I'd make it go with whatever version is compiled into the KCM which is the leader. I can imagine at least one cute trick to keep it from flopping back and forth if the cluster is in a split-version configuration (over an upgrade or downgrade). It shouldn't have to be that complicated.
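If the controller-manager route is taken, the "leader re-asserts whatever it has compiled in" behaviour could be as simple as a periodic apply loop. A rough sketch only, with assumed names and an arbitrary one-minute resync; the apply function would be a closure over the compiled-in manifests (e.g. the create-or-update helper sketched earlier in this thread):

package crdinstaller

import (
	"context"
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/klog/v2"
)

// Run periodically re-asserts the CRD manifests compiled into this KCM
// binary, so the leader's schema wins and manual edits or deletions of these
// CRDs are reverted. It is expected to be started only after leader election.
func Run(ctx context.Context, apply func(context.Context) error) {
	wait.UntilWithContext(ctx, func(ctx context.Context) {
		if err := apply(ctx); err != nil {
			// Log and retry on the next tick rather than crashing the KCM.
			klog.ErrorS(err, "failed to re-assert built-in CRDs")
		}
	}, time.Minute)
}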
Hello All,
I don’t know if it makes sense as part of this discussion/feature or just something that requires another forum or discussion, but how would the conversion utilities in kubectl work with this? As far as I know, it uses built-in types and conversion functions. Would we need a way to handle this for all CRDs?
Thanks,
Shawn Hurley
>> I haven't yet heard how a controller in KCM will choose which version of CRDs to load - has that been worked out?
>
> I'd make it go with whatever version is compiled into the KCM which is the leader. I can imagine at least one cute trick to keep it from flopping back and forth if the cluster is in a split-version configuration (over an upgrade or downgrade). It shouldn't have to be that complicated.

Given how our general skew story works, I think we're actually OK in cases where the schema transitions from vN-1 to vN to vN-1 to vN until the cluster completes the upgrade and the vN-1 KCM is retired. Clients already have to deal with cases where their APIs are missing because features have been disabled, and they have to handle cases where new fields have no data in them.

I don't think I would have the vN-1 KCM delete CRDs that only exist in vN, since that is both slightly more difficult and more destructive.
CRDs that change the stored version cannot be downgraded to a CRD that no longer defines that version.

Initial CRD:
- versions: [v1beta1, v1beta2]
- stored version: v1beta2

Upgraded CRD:
- versions: [v1beta1, v1beta2, v1]
- stored version: v1

Once the upgraded CRD is present, applying the initial CRD and dropping v1 is no longer permitted.
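Restating Jordan's example with the apiextensions/v1 Go types (an illustrative fragment only; schemas and the rest of the CRD object are elided):

package example

import (
	apiextensionsv1 "k8s.io/apiextensions-apiserver/pkg/apis/apiextensions/v1"
)

// Initial CRD: v1beta2 is the storage version.
var initialVersions = []apiextensionsv1.CustomResourceDefinitionVersion{
	{Name: "v1beta1", Served: true},
	{Name: "v1beta2", Served: true, Storage: true},
}

// Upgraded CRD: v1 is added and becomes the storage version. The apiserver
// records it in status.storedVersions, and a later update that drops v1 from
// spec.versions is rejected until a storage migration removes it from that
// list; that is why applying the initial CRD again is no longer permitted.
var upgradedVersions = []apiextensionsv1.CustomResourceDefinitionVersion{
	{Name: "v1beta1", Served: true},
	{Name: "v1beta2", Served: true},
	{Name: "v1", Served: true, Storage: true},
}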
On Mon, Jun 14, 2021 at 9:17 AM Daniel Smith <dbs...@google.com> wrote:
>
>
>
> On Mon, Jun 14, 2021 at 9:15 AM Jordan Liggitt <lig...@google.com> wrote:
>>
>> CRDs that change the stored version cannot be downgraded to a CRD that no longer defines that version
>>
>> Initial CRD:
>>
>> versions: [v1beta1, v1beta2]
>> stored version: v1beta2
>>
>>
>> Upgraded CRD:
>>
>> versions: [v1beta1, v1beta2, v1]
>> stored version: v1
>>
>>
>> Once the upgraded CRD is present, applying the initial CRD and dropping v1 is no longer permitted
>
>
> You can't add v1 and change the storage version in the same release, if you want to follow the rules we have for built-ins.
"in theory" or "the apiserver won't let you"
On Mon, Jun 14, 2021 at 9:23 AM Daniel Smith <dbs...@google.com> wrote:
>
>
>
> On Mon, Jun 14, 2021 at 9:20 AM Tim Hockin <tho...@google.com> wrote:
>>
>> On Mon, Jun 14, 2021 at 9:17 AM Daniel Smith <dbs...@google.com> wrote:
>> >
>> >
>> >
>> > On Mon, Jun 14, 2021 at 9:15 AM Jordan Liggitt <lig...@google.com> wrote:
>> >>
>> >> CRDs that change the stored version cannot be downgraded to a CRD that no longer defines that version
>> >>
>> >> Initial CRD:
>> >>
>> >> versions: [v1beta1, v1beta2]
>> >> stored version: v1beta2
>> >>
>> >>
>> >> Upgraded CRD:
>> >>
>> >> versions: [v1beta1, v1beta2, v1]
>> >> stored version: v1
>> >>
>> >>
>> >> Once the upgraded CRD is present, applying the initial CRD and dropping v1 is no longer permitted
>> >
>> >
>> > You can't add v1 and change the storage version in the same release, if you want to follow the rules we have for built-ins.
>>
>> "in theory" or "the apiserver won't let you"
>
>
> apiserver doesn't enforce that your schema will do the transitions right. But we, the api reviewers, will do so with CRDs that are in-tree.
Should it? Like, maybe an override to allow unsafe updates (or vice-versa)?
Thanks for outlining what's acceptable, Daniel!

Nikhita/Nabarun, wanna do this?

-- Dims

On Tue, Jun 15, 2021 at 1:08 PM 'Daniel Smith' via K8s API Machinery SIG <kubernetes-sig...@googlegroups.com> wrote:
> I would review a KEP for a mechanism:
> * targeted at in-tree use cases (using a CRD instead of a built-in), not targeted at end users (cluster admins) or distributors
> * only loads CRDs, not necessarily other supporting types
> * runs in controller manager
On Tue, Jun 15, 2021 at 1:14 PM Davanum Srinivas <dav...@gmail.com> wrote:
> Thanks for outlining what's acceptable Daniel!
> Nikhita/Nabarun, wanna do this?
> -- Dims
>
> On Tue, Jun 15, 2021 at 1:08 PM 'Daniel Smith' via K8s API Machinery SIG <kubernetes-sig...@googlegroups.com> wrote:
>> I would review a KEP for a mechanism:
>> * targeted at in-tree use cases (using a CRD instead of a built-in), not targeted at end users (cluster admins) or distributors
>> * only loads CRDs, not necessarily other supporting types
>> * runs in controller manager

I'm still not in direct agreement here, but we can take it in the KEP. I have a ton of reservations around CRD fiddling in core, given the long history of what is considered correct for rollback, which CRD fiddling does not necessarily solve (which Jordan's comments were highlighting).