NMO helm packaging


Hai Wu

May 7, 2024, 8:00:09 AM
to medik8s
It seems the recommended way to install NMO is via OLM, following the
instructions at
https://operatorhub.io/operator/node-maintenance-operator or
https://olm.operatorframework.io/docs/getting-started/, and there is no
way to install it with Helm.

Is it possible for me to build a Helm chart from the NMO source code? I
am not familiar with the OLM tooling, and Helm packaging is currently
the de facto industry standard for third-party add-ons. I am surprised
that there is no Helm support for NMO.

Thanks,
Hai

Marc Sluiter

May 8, 2024, 4:34:50 AM
to Hai Wu, medik8s
Hello Hai,

That's correct, all our operators are built with operator-sdk and depend on OLM for deployment.
Currently we don't have plans or the resources to add, maintain, and test Helm support.

This issue has some suggestions on how to generate a Helm chart for operator-sdk based projects: https://github.com/operator-framework/operator-sdk/issues/4930#issuecomment-847372698
The last comment points to the "helmify" project, which looks promising IMHO: https://github.com/arttor/helmify?tab=readme-ov-file#integrate-to-your-operator-sdkkubebuilder-project
Maybe you can give that one a try? Please let us know if that works, so we can add it to our docs at least.
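
For a quick ad-hoc try, the helmify README essentially boils down to piping the rendered kustomize output through helmify (a sketch only; 'chart' is just an example output directory name, not something from the NMO repo):

kustomize build config/default | helmify chart

That should create a ./chart directory containing the generated Helm chart, which you can then adjust by hand.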

Thanks and regards,

Marc

Marc Sluiter

He / Him / His

Principal Software Engineer

Red Hat

mslu...@redhat.com





hai wu

May 10, 2024, 7:56:07 PM
to Marc Sluiter, medik8s
Found this in its source repo:

It seems helmify did not capture this one. Normally this would be auto-created if we defined some ingress object with cert-manager. There's no ingress for this one.




hai wu

May 10, 2024, 7:56:07 PM
to Marc Sluiter, medik8s
Thanks Marc! I used helmify, and it does create a new Helm chart.

After installing the new Helm chart, creating the pod 'node-maintenance-operator-controller-manager-777694c5c6-lt745' failed with this error:

"  Warning  FailedMount  25s (x7 over 56s)  kubelet            MountVolume.SetUp failed for volume "cert" : secret "webhook-server-cert" not found"

This secret 'webhook-server-cert' only shows up once, in the file `deployment.yaml` of the generated Helm chart. I'm not sure what the purpose of this secret is, could you please explain? I'm also not sure how to create it properly.

Thanks,
Hai


Marc Sluiter

May 13, 2024, 11:50:51 AM
to hai wu, medik8s
Hi, sorry for the late reply due to public holidays and PTO.

Ah yes, the certificates are an issue indeed. When the operator is installed by OLM, OLM provides the webhook certificates and mounts them into the pod automatically at a well-known path.
That location is hardcoded in the operator code for reading them. However, the code doesn't fail when it doesn't find the certs at that location; it falls back to the default location used by operator-sdk / kubebuilder / controller-runtime, as far as I remember.
See https://github.com/medik8s/node-maintenance-operator/blob/d320cba0a821387ff506476e09e4f2a60ababfdd/main.go#L188

It has been a very long while since I tried to enable certs with cert-manager. IIRC you have to uncomment multiple places in the config dir.
No clue if helmify can handle this?
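
For reference, this is roughly what the cert-manager pieces look like in a standard kubebuilder/operator-sdk scaffold (a sketch only; the names below, selfsigned-issuer, serving-cert, webhook-service and the 'system' namespace placeholder, come from the default scaffold and may differ in the NMO repo). The Certificate is what makes cert-manager create the 'webhook-server-cert' secret that the controller-manager deployment mounts as the "cert" volume; in config/default/kustomization.yaml the '../certmanager' resource and the related cert-manager patches/vars have to be uncommented for this to be included:

apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: selfsigned-issuer
  namespace: system
spec:
  selfSigned: {}
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: serving-cert
  namespace: system
spec:
  dnsNames:
  # the service name and namespace are normally substituted by kustomize vars
  - webhook-service.system.svc
  - webhook-service.system.svc.cluster.local
  issuerRef:
    kind: Issuer
    name: selfsigned-issuer
  # cert-manager writes the key pair into this secret, which the pod mounts
  secretName: webhook-server-cert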

BR, Marc

Andrew Beekhof

May 15, 2024, 12:01:42 AM
to medik8s
We might want an overlay for creating a cert-manager version of the bundle, and point helmify at that

Hai Wu

Jun 17, 2024, 3:50:15 AM
to medik8s
Not sure how to get an overlay to work with this.

Even without using an overlay, after manually un-commenting all relevant lines to enable cert-manager, I hit this error after running `make helm`:

Any advice?

Andrew Beekhof

Jun 18, 2024, 12:52:31 AM
to Hai Wu, medik8s
I think I'm missing some context... I don't see a "helm" target in https://github.com/medik8s/node-maintenance-operator/blob/main/Makefile
Where are you running this from?


hai wu

Jun 18, 2024, 7:50:12 AM
to Andrew Beekhof, medik8s
That's from the helmify docs at https://github.com/arttor/helmify?tab=readme-ov-file#integrate-to-your-operator-sdkkubebuilder-project. I manually added the following to the Makefile per its notes:

# With operator-sdk version >= v1.23.0
HELMIFY ?= $(LOCALBIN)/helmify

.PHONY: helmify
helmify: $(HELMIFY) ## Download helmify locally if necessary.
$(HELMIFY): $(LOCALBIN)
	test -s $(LOCALBIN)/helmify || GOBIN=$(LOCALBIN) go install github.com/arttor/helmify/cmd/helmify@latest

helm: manifests kustomize helmify
	$(KUSTOMIZE) build config/default | $(HELMIFY)

Andrew Beekhof

Jun 18, 2024, 11:49:40 PM
to hai wu, medik8s
Does the error happen when generating the Helm chart, or when trying to apply it?
It seems to generate fine here on a fresh Fedora 39 machine.

hai wu

Jun 19, 2024, 2:49:34 AM
to Andrew Beekhof, medik8s
It generates OK, but it does not work.

Andrew Beekhof

Jun 19, 2024, 5:57:35 PM
to hai wu, medik8s
You've not actually mentioned anywhere what type of k8s cluster you're trying to load this into.
OpenShift, kind, upstream k8s, other?

Also, did you use the -crd-dir option for helmify?

hai wu

Jun 19, 2024, 6:15:44 PM
to Andrew Beekhof, medik8s
It is upstream k8s, the 1.27 or 1.28 release.

No, I did not use the '-crd-dir' option. I'm not sure how this option would help in this case.

Andrew Beekhof

Jun 19, 2024, 7:32:13 PM
to hai wu, medik8s
I had hoped it would help Helm define the types before trying to create instances of them.
But looking deeper into the generated charts, I see that's not the case.

Andrew Beekhof

Jun 19, 2024, 7:48:40 PM
to hai wu, medik8s
OK, so the error you showed is a patch failing to be applied because the thing it was asked to patch doesn't exist.
Yet that's a kustomize feature, not a Helm one, so I'm really unclear on what you're doing.

I would suggest three things:
1. Make changes in a copy of default (i.e. create a new overlay):
   rsync -a --delete ./config/default/ ./config/vanilla/
   This would create the possibility of merging the changes upstream.
2. Push these changes to a public repo somewhere, so we can see exactly what you're doing.
3. Include the full series of commands and outputs here, so we have the whole context.

hai wu

Jun 25, 2024, 12:42:47 AM
to Andrew Beekhof, medik8s
I couldn't do the above easily from the air-gapped network.

I basically cloned ./config/default, as you mentioned, to ./config/vanilla, cloned ./config/crd to ./config/vanilla_crd, uncommented all related lines for cert-manager and webhook, and added the helmify lines to the Makefile to ensure 'make helm' would use ./config/vanilla instead, and got the exact same error.

And if we look at the file 'https://github.com/medik8s/node-maintenance-operator/blob/main/config/default/webhookcainjection_patch.yaml', we can see that it has both 'MutatingWebhookConfiguration' and 'ValidatingWebhookConfiguration'. But in the file 'https://github.com/medik8s/node-maintenance-operator/blob/main/config/webhook/manifests.yaml', only 'ValidatingWebhookConfiguration' is defined.

It seems the above 'ValidatingWebhookConfiguration' would be auto-generated from 'https://github.com/medik8s/node-maintenance-operator/blob/main/api/v1beta1/nodemaintenance_webhook.go' via the line '//+kubebuilder:webhook:path=/validate-nodemaintenance-medik8s-io-v1beta1-nodemaintenance,mutating=false,failurePolicy=fail,sideEffects=None,groups=nodemaintenance.medik8s.io,resources=nodemaintenances,verbs=create;update,versions=v1beta1,name=vnodemaintenance.kb.io,admissionReviewVersions=v1', and it has 'mutating=false'.

Does this mean the code should not have 'MutatingWebhookConfiguration' in 'webhookcainjection_patch.yaml'?

Marc Sluiter

Jun 25, 2024, 3:24:00 AM
to hai wu, Andrew Beekhof, medik8s
On Tue, Jun 25, 2024 at 6:42 AM hai wu <haiw...@gmail.com> wrote:
Does this mean the code should not have 'MutatingWebhookConfiguration' in 'webhookcainjection_patch.yaml'?

Correct, NMO doesn't have a MutatingWebhook; you can safely delete it in webhookcainjection_patch.yaml.
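
For reference, with the MutatingWebhookConfiguration removed, webhookcainjection_patch.yaml would only carry the CA-injection annotation for the validating webhook, roughly like this (a sketch based on the standard kubebuilder scaffold; the annotation value is normally filled in by kustomize vars):

apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: validating-webhook-configuration
  annotations:
    # tells cert-manager to inject the CA bundle from the serving certificate
    cert-manager.io/inject-ca-from: CERTIFICATE_NAMESPACE/CERTIFICATE_NAME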

BR, Marc

 

hai wu

Jun 25, 2024, 9:32:57 AM
to Marc Sluiter, Andrew Beekhof, medik8s
Cool. I just created and installed the Helm chart on a test cluster, and tried to use it to put a node into maintenance mode.

The YAML is almost the same as this one:
$ cat config/samples/nodemaintenance_v1beta1_nodemaintenance.yaml
apiVersion: nodemaintenance.medik8s.io/v1beta1
kind: NodeMaintenance
metadata:
  name: nodemaintenance-sample
spec:
  nodeName: node02
  reason: "Test node maintenance"

It failed with these messages:

W0625 12:34:09.677774       1 reflector.go:539] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:208: failed to list *v1.Namespace: namespaces is forbidden: User "system:serviceaccount:default:nmo-controller-manager" cannot list resource "namespaces" in API group "" at the cluster scope
E0625 12:34:09.677809       1 reflector.go:147] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:208: Failed to watch *v1.Namespace: failed to list *v1.Namespace: namespaces is forbidden: User "system:serviceaccount:default:nmo-controller-manager" cannot list resource "namespaces" in API group "" at the cluster scope...

In config/rbac/role.yaml, it has:
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: manager-role
rules:
- apiGroups:
  - ""
  resources:
  - namespaces
  verbs:
  - create
  - get
- apiGroups:
  - ""
  resources:
.....

It seems this rule would be auto-generated from the following marker (I added 'list' to it, and now it is no longer hitting that error message):
//+kubebuilder:rbac:groups="",resources=namespaces,verbs=get;create
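
For reference, a sketch of what the adjusted marker and the regenerated rule in config/rbac/role.yaml would look like after re-running 'make manifests' (only 'list' added at this point; verbs listed alphabetically, matching the style of the existing rules):

//+kubebuilder:rbac:groups="",resources=namespaces,verbs=get;list;create

- apiGroups:
  - ""
  resources:
  - namespaces
  verbs:
  - create
  - get
  - list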

But I still got this:
WARNING: ignoring DaemonSet-managed Pods ...

Thus there are still daemonset-managed pods running on the node that is in maintenance mode.

Also got this message: 'E0625 13:26:15.925524       1 reflector.go:147] sigs.k8s.io/controller-runtime/pkg/cache/internal/informers.go:208: Failed to watch *v1.Namespace: unknown (get namespaces)'.

So it seems we also need to add 'watch'?


hai wu

Jun 25, 2024, 12:28:30 PM
to Marc Sluiter, Andrew Beekhof, medik8s
I went ahead and added 'watch'. I also did the following in the file 'controllers/nodemaintenance_controller.go':

-//+kubebuilder:rbac:groups="coordination.k8s.io",resources=leases,verbs=get;list;update;patch;watch;create
+//+kubebuilder:rbac:groups="coordination.k8s.io",resources=leases,verbs=get;list;update;patch;watch;create;delete
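
(As a side note, config/rbac/role.yaml only picks up a changed marker after 'make manifests' is re-run, and the Helm chart has to be regenerated and re-installed before the ClusterRole in the cluster actually changes. A sketch of the regenerated leases rule, verbs listed alphabetically:)

- apiGroups:
  - coordination.k8s.io
  resources:
  - leases
  verbs:
  - create
  - delete
  - get
  - list
  - patch
  - update
  - watch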

But I am still getting this error when trying to delete the NodeMaintenance via kubectl:
User "system:serviceaccount:default:nmo-controller-manager" cannot delete resource "leases" in API group "coordination.k8s.io" in the namespace "medik8s-leases"

I also updated the following in file 'controllers/nodemaintenance_controller.go':
- drainer.IgnoreAllDaemonSets = true
+ drainer.IgnoreAllDaemonSets = false

But it is not working; it did not evict the daemonset-managed pods.

Has this been tested successfully with vanilla k8s? It does not seem to work properly.

Marc Sluiter

Jun 26, 2024, 2:54:28 AM
to hai wu, Andrew Beekhof, medik8s
Hi.

The missing namespace permission: unfortunately there is an OLM feature which adds some namespace permissions automatically, which is why we didn't run into this issue.
The missing lease permission: we have an e2e test which covers lease deletion; we need to investigate why it didn't catch this. I will create an issue for it.
The IgnoreAllDaemonSets setting: we need to be careful with changing such a setting; according to the comment, there was a reason for it, at least in the past. We might want to re-evaluate.
But since daemonset pods aren't rescheduled, the use case for draining them is limited, isn't it? At least no one else has complained about this so far.

BR, Marc

Mohammed Safadi

Feb 11, 2025, 1:53:05 AM
to medik8s
Hi, I tried to install it without using OLM at all; is there any way to do that?

Installing with 'make install' and 'make deploy' doesn't work.