kubeadm operator update


Paco Xu

Aug 7, 2022, 9:54:56 PM
to kubernetes-sig-cluster-lifecycle
Hi all,

This is an update on the kubeadm operator for `day2` operations with kubeadm.

I am not sure if this is the right place to discuss the kubeadm operator. There are some earlier threads in https://github.com/kubernetes/kubeadm/issues/2317 and [kubernetes/enhancements#2505](https://github.com/kubernetes/enhancements/issues/2505).

I wrote a simple [kubelet-reloader](https://github.com/pacoxu/kubelet-reloader) as a companion tool for the kubeadm operator:

* kubelet-reloader watches `/usr/bin/kubelet-new`.
* Once a different version of kubelet appears there, the reloader replaces `/usr/bin/kubelet` and restarts the kubelet (a minimal sketch of this loop follows the list).
* TODO: verify the kubelet configuration and version before replacing the binary.
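For illustration, here is a minimal sketch in Go of the watch-and-swap loop described above. It is an assumption-laden sketch, not the actual kubelet-reloader code: it polls with a checksum comparison instead of using inotify, and it assumes a systemd-managed kubelet.

```go
// Sketch of a kubelet reloader loop (hypothetical; the real
// kubelet-reloader may use inotify and extra validation).
package main

import (
	"bytes"
	"crypto/sha256"
	"io"
	"log"
	"os"
	"os/exec"
	"time"
)

const (
	currentPath = "/usr/bin/kubelet"
	newPath     = "/usr/bin/kubelet-new"
)

// checksum returns the SHA-256 of the file at path.
func checksum(path string) ([]byte, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()
	h := sha256.New()
	if _, err := io.Copy(h, f); err != nil {
		return nil, err
	}
	return h.Sum(nil), nil
}

func main() {
	for range time.Tick(10 * time.Second) {
		newSum, err := checksum(newPath)
		if err != nil {
			continue // no candidate binary staged yet
		}
		curSum, _ := checksum(currentPath)
		if bytes.Equal(newSum, curSum) {
			continue // same binary; nothing to do
		}
		// TODO (per the reloader's own todo): verify the kubelet
		// version and configuration before swapping.
		if err := os.Rename(newPath, currentPath); err != nil {
			log.Printf("swap failed: %v", err)
			continue
		}
		// Restart the kubelet so the new binary takes effect
		// (assumes systemd manages the kubelet).
		if err := exec.Command("systemctl", "restart", "kubelet").Run(); err != nil {
			log.Printf("restart failed: %v", err)
		}
	}
}
```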

Currently, [kubeadm-operator v0.1.0](https://github.com/pacoxu/kubeadm-operator/releases/tag/v0.1.0) supports [upgrades across versions](https://github.com/pacoxu/kubeadm-operator/pull/73), e.g. v1.22 to v1.24.

* The kubeadm operator downloads `kubectl`/`kubelet`/`kubeadm` and performs the upgrade. (The current logic downloads the binaries directly; I am not sure if `yum upgrade`/`apt upgrade` would be better. A sketch of the download step follows the list.)
* The new kubelet binary is placed at `/usr/bin/kubelet-new` for the kubelet reloader.
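As a hedged illustration of the download step (not the operator's actual code), the binary can be fetched from the official dl.k8s.io release mirror and staged at `/usr/bin/kubelet-new`; the version string and error handling here are placeholders:

```go
// Hypothetical sketch: download a kubelet release binary and stage it
// where the kubelet reloader expects it.
package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
)

// stageKubelet fetches the kubelet binary for version (e.g. "v1.24.3")
// from the official release mirror and writes it to /usr/bin/kubelet-new.
func stageKubelet(version string) error {
	url := fmt.Sprintf("https://dl.k8s.io/release/%s/bin/linux/amd64/kubelet", version)
	resp, err := http.Get(url)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("unexpected status %s for %s", resp.Status, url)
	}
	// 0755 so the staged binary is executable once moved into place.
	out, err := os.OpenFile("/usr/bin/kubelet-new", os.O_CREATE|os.O_WRONLY|os.O_TRUNC, 0o755)
	if err != nil {
		return err
	}
	defer out.Close()
	_, err = io.Copy(out, resp.Body)
	return err
}

func main() {
	if err := stageKubelet("v1.24.3"); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```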

See [quick-start](https://github.com/pacoxu/kubeadm-operator/#quick-start).

Some thoughts on next steps:

* [Add CRD: to define the version we want (pacoxu/kubeadm-operator#88)](https://github.com/pacoxu/kubeadm-operator/issues/88): a kubeadm operator CRD carrying the target version of the cluster, from which the controller can create operations automatically (a hypothetical sketch of such a CRD follows the list).
* [Offline install support (pacoxu/kubeadm-operator#87)](https://github.com/pacoxu/kubeadm-operator/issues/87)
* ["yum/apt install" instead of downloading binaries (pacoxu/kubeadm-operator#86)](https://github.com/pacoxu/kubeadm-operator/issues/86)

My version, https://github.com/pacoxu/kubeadm-operator, is based on Fabrizio's first implementation (https://github.com/kubernetes/kubeadm/pull/2342), which follows the KEP: https://github.com/kubernetes/enhancements/tree/master/keps/sig-cluster-lifecycle/kubeadm/2505-Kubeadm-operator.
BTW, https://github.com/chendave/kubeadm-operator is a similar project to mine.


I hope to receive your feedback, suggestions, or requirements for the kubeadm operator.

Best regards,
Paco

Kevin Fox

Aug 8, 2022, 3:31:46 PM
to kubernetes-sig-cluster-lifecycle
I'm a bit curious how this might play with the Cluster API. Is this something that fits in with what they are doing, or does it compete with it?

Thanks,
Kevin

Paco Xu

Aug 9, 2022, 5:29:19 AM
to kubernetes-sig-cluster-lifecycle
Hi Kevin,

Thanks for your feedback.

Here are some relevant excerpts from the Cluster API documentation (an illustrative sketch of the version bump follows the list):
  • How to upgrade the Kubernetes control plane version
  • To upgrade the Kubernetes control plane version make a modification to the KubeadmControlPlane resource’s Spec.Version field. This will trigger a rolling upgrade of the control plane and, depending on the provider, also upgrade the underlying machine image.
  • Some infrastructure providers, such as AWS, require that if a specific machine image is specified, it has to match the Kubernetes version specified in the KubeadmControlPlane spec. In order to only trigger a single upgrade, the new MachineTemplate should be created first and then both the Version and InfrastructureTemplate should be modified in a single transaction.
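As a hedged illustration of the flow those excerpts describe (not code from any provider), one could bump `spec.version` on a KubeadmControlPlane with a controller-runtime client; the object name and namespace are placeholders:

```go
// Hypothetical sketch: trigger a CAPI control plane rolling upgrade by
// bumping spec.version on a KubeadmControlPlane object.
package main

import (
	"context"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/apimachinery/pkg/types"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/client/config"
)

func main() {
	c, err := client.New(config.GetConfigOrDie(), client.Options{})
	if err != nil {
		panic(err)
	}

	// Use an unstructured object so no CAPI types need to be vendored.
	kcp := &unstructured.Unstructured{}
	kcp.SetGroupVersionKind(schema.GroupVersionKind{
		Group:   "controlplane.cluster.x-k8s.io",
		Version: "v1beta1",
		Kind:    "KubeadmControlPlane",
	})

	ctx := context.Background()
	// "my-control-plane"/"default" are placeholder names.
	if err := c.Get(ctx, types.NamespacedName{Namespace: "default", Name: "my-control-plane"}, kcp); err != nil {
		panic(err)
	}
	// Setting spec.version is what triggers the rolling upgrade.
	if err := unstructured.SetNestedField(kcp.Object, "v1.24.3", "spec", "version"); err != nil {
		panic(err)
	}
	if err := c.Update(ctx, kcp); err != nil {
		panic(err)
	}
}
```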

With a stable kubeadm-operator, there could be a kubeadm-operator provider for Cluster API that runs on any infrastructure, including bare metal.


Best regards,

Paco Xu

det...@gmail.com

Aug 9, 2022, 11:27:58 AM
to kubernetes-sig-cluster-lifecycle
Paco,

While I haven't been too involved with the Cluster API project for a while now, in the past the project had discussed the use of a "kubeadm operator" to enable a subset of configuration changes that should not necessarily require a disruptive upgrade rollout, but currently do.

Based on my experience building and supporting Kubernetes lifecycle management tooling, I would not recommend trying to switch to an in-place upgrade model for the kubelet and/or OS-level changes, because it introduces a level of indeterminism around the upgrade process. The one exception I would make to this is for Linux distributions that support atomic-style upgrades/rollbacks.

The Cluster API project specifically chose the model of using pre-baked images for specific Kubernetes versions because it allows all image version updates (kernel versions, OS package dependencies, container runtimes, kubelet versions, etc.) to be tested and validated for conformance prior to upgrade. This prevents version-compatibility issues between the kernel, OS-level package dependencies, container runtime versions, kubelet, etc. from surfacing during upgrade operations. It also avoids potentially intermittent issues that can crop up related to pulling specific binaries from the internet, not to mention enabling a safer path to roll back an upgrade (to the extent possible, at least).

As an aside, I even recommend against in-place upgrades for bare-metal environments. https://github.com/tinkerbell/cluster-api-provider-tinkerbell is a bare-metal approach to Cluster API-managed Kubernetes clusters that supports pre-baked images and provides a cloud-native approach to managing Kubernetes clusters in the datacenter.

--
Jason DeTiberus

Lubomir I. Ivanov

Aug 10, 2022, 5:26:50 AM
to det...@gmail.com, kubernetes-sig-cluster-lifecycle
Hi Jason, I think CAPI already has a number of requests for in-place upgrades, and in a recent meeting the maintainers discussed that something is planned.

As far as immutability of CAPI machines goes, I don't think it will be possible to rotate the cluster CA (one of the potential kubeadm operator tasks) without treating the machines as mutable.

Overall, aligning the kubeadm operator design with the needs of CAPI sounds like a prerequisite. Paco, it might be a good idea to log an issue in the CAPI repo to gather a list of use cases from the CAPI side.

lubomir
--



Paco Xu

Aug 10, 2022, 5:47:18 AM
to kubernetes-sig-cluster-lifecycle
> While I haven't been too involved with the Cluster API project for a while now, in the past the project had discussed the use of a "kubeadm operator" to enable a subset of configuration changes that should not necessarily require a disruptive upgrade rollout, but currently do.

- `upgrade` may lead to problems, so dry-run should be supported. (Apparent issues should be handled by kubeadm upgrade preflight checks or dry runs.)
- certs rotation
- syncing cluster configuration changes is also a main part of the kubeadm operator.

> Based on my experience building and supporting Kubernetes lifecycle management tooling, I would not recommend trying to switch to an in-place upgrade model for the kubelet and/or OS level changes because it introduces a level of indeterminism around the upgrade process. The one exception I would make to this is for Linux Distributions that support atomic-style upgrades/rollbacks.
> The Cluster API project specifically chose the model of using pre-baked images for specific Kubernetes versions because it allows all image version updates (Kernel versions, OS package dependencies, container runtimes, kubelet versions, etc) to be tested and validated for conformance prior to upgrading. This prevents issues related to version compatibility between the kernel, os-level package dependencies, container runtime versions, kubelet, etc from potentially causing issues during upgrade operations. It also prevents potentially intermittent issues that can crop up related to pulling specific binaries from the internet, not to mention enabling a safer path to roll back an upgrade (to the extent possible, at least).

I agree that pre-baked images would be a better choice.

> As an aside, I even recommend against in-place upgrades for bare metal style environments. https://github.com/tinkerbell/cluster-api-provider-tinkerbell is a bare metal approach to Cluster API managed Kubernetes clusters that support pre-baked images and provides a cloud-native approach to managing Kubernetes clusters in the data-center.

This is a nice project for bare metal. I came across Equinix Metal when I looked into kubespray. This can help a lot with bare-metal cluster management.

Best regards
Paco

Paco Xu

Aug 10, 2022, 5:56:48 AM
to kubernetes-sig-cluster-lifecycle

Good idea, Lubomir. I opened https://github.com/kubernetes-sigs/cluster-api/issues/7044

Best regards,
Paco

neol...@gmail.com

Aug 31, 2022, 12:59:48 PM
to kubernetes-sig-cluster-lifecycle
Paco, we spoke about this in the kubeadm office hours.

Some points:
- the imperative vs. declarative design of the operator is unclear; perhaps a combination of both is the right way to go. Currently it's imperative, driven by the Operation CR.
- the immutable-node topic in CAPI is complicated, and there is no clear way to tell whether this operator will ever be used in CAPI. Use cases are not clear.
- the next step is to discuss it at the SIG level (SIG Cluster Lifecycle Zoom call) to see what more people think and whether we can set a good design direction.
- we think you can continue to work on the operator in your repository, as it is, and use it if it suits your use cases.
- if we reach a good design direction, we can update the KEP and proceed with implementation as something the SIG can host under kubernetes-sigs.

Please watch the VOD for more details.
I will update this thread again once we have more discussion.

lubomir
--

Paco Xu

Sep 14, 2022, 6:14:51 AM
to kubernetes-sig-cluster-lifecycle
Hi Lubomir,

Many thanks for your introduction in the kubeadm office hours and the SIG meeting.
- This was discussed in the SIG Cluster Lifecycle meeting: https://www.youtube.com/watch?v=-F3ak-BVaLI.
- Thanks for the discussion and comments on the kubeadm operator.

1. The scope is the most important:
- certs rotation
- cluster upgrade
- re-configuration of apiserver/kube-controller-manager/kubelet...
  - as kubeadm supports `--patches`, this is doable, I think.
- package management (apt/yum):
  - the current implementation uses binary overwrite; I think package management should be included.
- node management: kernel, container runtime
  - out of scope, in my opinion

2. Use cases
- Cluster API uses providers to allow different IaaS implementations to be managed.
-- Reason 1 for us not to use Cluster API (correct me if I'm wrong, as I am not sure about the details of Cluster API):
i. We investigated bare-metal use cases. We tried cluster-api-provider-maas and `Equinix Metal`, which were not that convenient at the time.
ii. migration problems and clusters that span multiple platforms.
iii. Does Cluster API support re-configuration of apiserver/kube-controller-manager/kubelet? Could Cluster API use kubeadm-operator to do it (or could we just implement it in Cluster API)? Or is this out of scope for Cluster API?
-- Reason 2 to make a kubeadm-operator:
i. Another user uses kubespray on bare metal but hit some problems. They want an operator that can run commands in a pod (not in Ansible).

BTW, the current implementation in https://github.com/pacoxu/kubeadm-operator is essentially running commands in jobs/pods (instead of using Ansible) to manage the cluster; a hedged sketch of this pattern appears below. I hope for more feedback (there seems to be none so far).
- If Cluster API providers can fulfill your requirements, I think they are the better choice for their convenience and declarative API: easy to scale and manage.
- If not, kubespray/Ansible is a good choice for clusters across platforms.
At DaoCloud, we have also started another project, https://github.com/kubean-io/kubean, to build something like a "kubespray operator" (not an accurate name).
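For readers unfamiliar with that pattern, here is a hedged sketch of what "running commands in jobs/pods" can look like with client-go. The image, command, and nsenter trick are illustrative assumptions, not the operator's actual manifests:

```go
// Hypothetical sketch: run `kubeadm upgrade node` on one node via a Job
// whose pod breaks out into the host namespaces.
package main

import (
	"context"

	batchv1 "k8s.io/api/batch/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// upgradeNodeJob builds a Job pinned to the given node.
func upgradeNodeJob(node string) *batchv1.Job {
	privileged := true
	return &batchv1.Job{
		ObjectMeta: metav1.ObjectMeta{Name: "kubeadm-upgrade-" + node, Namespace: "kube-system"},
		Spec: batchv1.JobSpec{
			Template: corev1.PodTemplateSpec{
				Spec: corev1.PodSpec{
					NodeName:      node, // pin to the node being upgraded
					RestartPolicy: corev1.RestartPolicyNever,
					HostPID:       true,
					Containers: []corev1.Container{{
						Name:  "upgrade",
						Image: "registry.example.com/kubeadm-agent:latest", // placeholder image
						// nsenter into PID 1's mount namespace so kubeadm
						// runs against the real host.
						Command:         []string{"nsenter", "--target", "1", "--mount", "--", "kubeadm", "upgrade", "node"},
						SecurityContext: &corev1.SecurityContext{Privileged: &privileged},
					}},
				},
			},
		},
	}
}

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	cs := kubernetes.NewForConfigOrDie(cfg)
	if _, err := cs.BatchV1().Jobs("kube-system").Create(
		context.Background(), upgradeNodeJob("worker-1"), metav1.CreateOptions{}); err != nil {
		panic(err)
	}
}
```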

3. Imperative vs. declarative
I am not sure if I am right:
- The current imperative design is more like Ansible.
- Making it declarative needs a re-design, as I commented before.

4. A personal project or a SIG project
- Before we have a clear roadmap within the SIG, this should remain a personal project.


Again, thanks for all your time and comments on this topic.

Best regards,
Paco

Lubomir I. Ivanov

Sep 14, 2022, 9:47:02 AM
to Paco Xu, kubernetes-sig-cluster-lifecycle


On Wed, Sep 14, 2022, 13:14 Paco Xu <roolli...@gmail.com> wrote:
> Hi Lubomir,
>
> Many thanks for your introduction in the kubeadm office hours and the SIG meeting.
> - This was discussed in the SIG Cluster Lifecycle meeting: https://www.youtube.com/watch?v=-F3ak-BVaLI.
> - Thanks for the discussion and comments on the kubeadm operator.


No problem. We did not make any decisions, but at least we opened the discussion.


> 1. The scope is the most important:
> - certs rotation
> - cluster upgrade
> - re-configuration of apiserver/kube-controller-manager/kubelet...
>   - as kubeadm supports `--patches`, this is doable, I think.
> - package management (apt/yum):
>   - the current implementation uses binary overwrite; I think package management should be included.
> - node management: kernel, container runtime
>   - out of scope, in my opinion

Agreed on the scope.


> 2. Use cases
> - Cluster API uses providers to allow different IaaS implementations to be managed.
> -- Reason 1 for us not to use Cluster API (correct me if I'm wrong, as I am not sure about the details of Cluster API):
> i. We investigated bare-metal use cases. We tried cluster-api-provider-maas and `Equinix Metal`, which were not that convenient at the time.
> ii. migration problems and clusters that span multiple platforms.
> iii. Does Cluster API support re-configuration of apiserver/kube-controller-manager/kubelet? Could Cluster API use kubeadm-operator to do it (or could we just implement it in Cluster API)? Or is this out of scope for Cluster API?

Currently CAPI does not allow in-place reconfiguration, since that disagrees with the immutable-machine principles. Following these principles, CAPI machines that are reconfigured would have to be rolled.

> -- Reason 2 to make a kubeadm-operator:
> i. Another user uses kubespray on bare metal but hit some problems. They want an operator that can run commands in a pod (not in Ansible).

I was expecting that kubespray might be interested, but we have not heard comments from them yet.


> BTW, the current implementation in https://github.com/pacoxu/kubeadm-operator is essentially running commands in jobs/pods (instead of using Ansible) to manage the cluster. I hope for more feedback (there seems to be none so far).
> - If Cluster API providers can fulfill your requirements, I think they are the better choice for their convenience and declarative API: easy to scale and manage.
> - If not, kubespray/Ansible is a good choice for clusters across platforms.
> At DaoCloud, we have also started another project, https://github.com/kubean-io/kubean, to build something like a "kubespray operator" (not an accurate name).

As mentioned in the SIG meeting, we currently don't quite know how to proceed in terms of the wider SIG picture. Our response is that CAPI might eventually cover all the requested use cases, in which case there would be no SIG-hosted kubeadm operator per se. Yet we don't really know for sure... and that is why this needs more discussion.


> 3. Imperative vs. declarative
> I am not sure if I am right:
> - The current imperative design is more like Ansible.
> - Making it declarative needs a re-design, as I commented before.


True.


> 4. A personal project or a SIG project
> - Before we have a clear roadmap within the SIG, this should remain a personal project.


Agreed.




> Again, thanks for all your time and comments on this topic.


Thank you as well. If you are going to KubeCon NA, you could try organizing the interested parties to participate in a discussion panel (off schedule).


> Best regards,
> Paco


chad...@gmail.com

Sep 19, 2022, 4:16:42 PM
to kubernetes-sig-cluster-lifecycle
On Wednesday, September 14, 2022 at 8:47:02 AM UTC-5 neol...@gmail.com wrote:
> I was expecting that kubespray might be interested, but we have not heard comments from them yet.

I started a new Slack thread in #kubespray-dev to get some attention and input from the kubespray community. I have only recently become active in kubespray again, with some minor contributions after a long absence, but I am trying to find more time to get back into it.

My thoughts:
A SIG-CL-backed kubeadm operator with a declarative API could simplify and potentially improve the performance of the day-2 lifecycle steps kubespray manages imperatively with Ansible: Kubernetes cluster version upgrades, cluster scaling, etc. There is a lot of operational experience built into kubespray's lifecycle management, and those patterns might help influence the design of a kubeadm operator.

It has been said in the past that kubespray is the "tool that uses the other tools". In other words, kubespray provides a single entrypoint (and the glue) to create and manage clusters in-place by abstracting the other SIG CL projects and lifecycle-related tools. I think kubespray could replace some of its Ansible-managed day-2 cluster lifecycle tasks by integrating with a kubeadm operator. kubespray's abstractions are certainly easier to maintain when invoking tools with declarative APIs, and this works even better when the APIs are backed by Kubernetes controllers used in common across the larger community. We've had a lot of success in wrapping kubeadm, ClusterConfiguration, KubeletConfiguration, and KubeProxyConfiguration with kubespray.

Compared to something like kubean, which wraps kubespray, the kubeadm operator approach seems easier to maintain (kubespray's Ansible interface changes frequently and lacks a real API), more useful to the larger k8s community, and more consistent with the existing model where kubespray is the entrypoint that wraps the other tools. Though, truly, I would be excited if either approach could help open the door for CAPI provider management of day-2 (and beyond) operations of kubespray clusters.

Lastly, a kubespray operator is something that has been suggested in the past, but those efforts never saw much progress or adoption. It's good to see that others have not given up on related ideas :)

- Chad Swenson

Moshiur Rahman

Apr 3, 2023, 7:40:18 AM
to kubernetes-sig-cluster-lifecycle
Hi Paco,

I would like to know about the progress on CAPI compatibility of kubeadm-operator. I am curious to know the answers to the following questions about kubeadm-operator.

 1. Can we test certificate rotation standalone, not as part of an upgrade?

 2. Let's assume the CA was not created by kubeadm but was added manually. Would kubeadm-operator respect that and use that CA?

Kind regards
Moshiur