reboot cycle

38 views
Skip to first unread message

Stanislav

unread,
Jan 24, 2024, 7:45:34 AM1/24/24
to medik8s

Hi!

I'm a novice in using self-node-remediation. I'm currently testing this operator in a cluster, and noticed the following issue. I've added the RegistryIsNotAvailable condition to the NodeHealthCheck CRD for testing. When this condition arises in a node, it successfully reboots. However, if this condition continues to persist on the node after the reboot, the taints are removed and shortly after, the node is sent to reboot again.

Could you please advise on the expected behavior of self-node-remediation in a case where a reboot does not resolve the issue?

Thanks,

Stanislav.

Michael Shitrit

unread,
Jan 25, 2024, 3:54:36 AM1/25/24
to Stanislav, medik8s
Hi Stanislav,

Thanks for reaching out.
In general this is an unexpected behavior, taints are to be removed only when the node becomes healthy again.
My best guess is that this is caused by the node shortly becoming healthy. 

Can you please add NHC / SNR logs, so we may investigate further ?


--
You received this message because you are subscribed to the Google Groups "medik8s" group.
To unsubscribe from this group and stop receiving emails from it, send an email to medik8s+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/medik8s/a72cb7f6-c481-4e5c-a85e-0b12e0a90637n%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


--

Michael Shitrit

Principal Software Engineer

Red Hat

Stanislav

unread,
Jan 26, 2024, 7:30:37 AM1/26/24
to Michael Shitrit, med...@googlegroups.com
Hello, Michael! 

Thanks for the fast response.

I have already run tests on several clusters. Here are my logs from the following actions:

• NHC and SNR are monitoring three working nodes where the Node Problem Detector is running (docker image from a private repository: node-problem-detector:v1)
• I put tag v1 on the docker image with NPD build where the RegistryIsNotAvailable check is broken
• I restart NPD on one node, as a result of which node has a RegistryIsNotAvailable condition
• After 120s selfnoderemediations.self-node-remediation.medik8s.io is created for this node
• The node reboots (I intentionnaly gave SNR wrong path to watchdog as far as we have bug in watchdog driver, but software reboot works as expected)
• After rebooting, NPD with a broken RegistryIsNotAvailable check starts on the node
• Despite this, taints are removed from it, and selfnoderemediations.self-node-remediation.medik8s.io is created again
• While the node is rebooting, I tag v1 on the docker image with the normal NPD build
• Taints are removed from the node and, since this time it doesn't have the RegistryIsNotAvailable condition, it continues to work

Maybe I'm doing the test wrongly, but it currently looks like SNR ignores the condition after rebooting and removes taints in any case, which leads to a reboot cycle.

kubectl describe node

Events:
 Type Reason Age From Message
 ---- ------ ---- ---- -------
 Warning RegistryIsNotAvailable 10m registry-custom-plugin-monitor Node condition RegistryIsNotAvailable is now: True, reason: RegistryIsNotAvailable, message: "TEST! INTENTIONALLY BROKEN (actually 200)"
 Normal NodeNotSchedulable 8m5s (x3 over 2d13h) kubelet Node <NODE_NAME> status is now: NodeNotSchedulable
 Normal Starting 5m43s kube-proxy Starting kube-proxy.
 Normal Starting 5m33s kubelet Starting kubelet.
 Normal NodeAllocatableEnforced 5m33s kubelet Updated Node Allocatable limit across pods
 Normal NodeHasNoDiskPressure 5m32s (x2 over 5m33s) kubelet Node <NODE_NAME> status is now: NodeHasNoDiskPressure
 Normal NodeHasSufficientMemory 5m32s (x2 over 5m33s) kubelet Node <NODE_NAME> status is now: NodeHasSufficientMemory
 Normal NodeHasSufficientPID 5m32s (x2 over 5m33s) kubelet Node <NODE_NAME> status is now: NodeHasSufficientPID
 Warning Rebooted 5m32s kubelet Node <NODE_NAME> has been rebooted, boot id: e04613d4-0d7c-489d-8a74-297ecbe49856
 Normal NodeReady 5m32s kubelet Node <NODE_NAME> status is now: NodeReady
 Warning RegistryIsNotAvailable 5m29s registry-custom-plugin-monitor Node condition RegistryIsNotAvailable is now: True, reason: RegistryIsNotAvailable, message: "TEST! INTENTIONALLY BROKEN (actually 200)"
 Normal NodeSchedulable 5m22s kubelet Node <NODE_NAME> status is now: NodeSchedulable
 Warning RegistryIsNotAvailable 5m19s registry-custom-plugin-monitor Node condition RegistryIsNotAvailable is now: True, reason: RegistryIsNotAvailable, message: "TEST! INTENTIONALLY BROKEN (actually 200)"
 Normal NodeNotSchedulable 3m11s (x2 over 5m32s) kubelet Node <NODE_NAME> status is now: NodeNotSchedulable
 Normal Starting 64s kube-proxy Starting kube-proxy.
 Normal Starting 54s kubelet Starting kubelet.
 Normal NodeAllocatableEnforced 54s kubelet Updated Node Allocatable limit across pods
 Normal NodeHasSufficientMemory 53s (x2 over 54s) kubelet Node <NODE_NAME> status is now: NodeHasSufficientMemory
 Normal NodeHasNoDiskPressure 53s (x2 over 54s) kubelet Node <NODE_NAME> status is now: NodeHasNoDiskPressure
 Normal NodeHasSufficientPID 53s (x2 over 54s) kubelet Node <NODE_NAME> status is now: NodeHasSufficientPID
 Warning Rebooted 53s kubelet Node <NODE_NAME> has been rebooted, boot id: d12599aa-126b-42db-b0de-9c775d7f6213
 Normal NodeReady 53s kubelet Node <NODE_NAME> status is now: NodeReady
 Normal NodeNotSchedulable 53s kubelet Node <NODE_NAME> status is now: NodeNotSchedulable
 Normal NodeSchedulable 22s kubelet Node <NODE_NAME> status is now: NodeSchedulable


kubectl describe nodehealthchecks.remediation.medik8s.io

Spec:
 Min Healthy: 51%
 Remediation Template:
 API Version: self-node-remediation.medik8s.io/v1alpha1
 Kind: SelfNodeRemediationTemplate
 Name: self-node-remediation-resource-deletion-template
 Namespace: operators
 Selector:
 Match Expressions:
 Key: node.kubernetes.io/role
 Operator: NotIn
 Values:
 master
 Unhealthy Conditions:
 Duration: 300s
 Status: False
 Type: Ready
 Duration: 300s
 Status: Unknown
 Type: Ready
 Duration: 120s
 Status: True
 Type: RegistryIsNotAvailable
Status:
 Conditions:
 Last Transition Time: 2023-12-29T12:44:00Z
 Message: No issues found, NodeHealthCheck is enabled.
 Reason: NodeHealthCheckEnabled
 Status: False
 Type: Disabled
 Healthy Nodes: 3
 Last Update Time: 2024-01-26T11:15:56Z
 Observed Nodes: 3
 Phase: Enabled
 Reason: NHC is enabled, no ongoing remediation
Events:
 Type Reason Age From Message
 ---- ------ ---- ---- -------
 Normal RemediationCreated 3m57s (x6 over 7d20h) NodeHealthCheck Created remediation object for node <NODE_NAME>
 Normal RemediationRemoved 70s (x6 over 4d1h) NodeHealthCheck Deleted remediation CR of kind SelfNodeRemediation for node <NODE_NAME>
```

logs of SNR pod on NODE_NAME

I0126 11:07:16.813449 46624 request.go:690] Waited for 1.039674561s due to client-side throttling, not priority and fairness, request: GET:https://10.108.192.1:443/apis/cert-manager.io/v1beta1?timeout=32s

2024-01-26T11:07:17.515844816Z INFO controller-runtime.metrics Metrics server is starting to listen {"addr": ":8080"}

2024-01-26T11:07:17.515992993Z INFO setup Starting as a self node remediation agent that should run as part of the daemonset

2024-01-26T11:07:17.516010663Z ERROR watchdog watchdog file path couldn't be accessed {"error": "watchdog device not found: stat /dev/watchdog15: no such file or directory"}

github.com/medik8s/self-node-remediation/pkg/watchdog.NewLinux

/workspace/pkg/watchdog/linux.go:74

main.initSelfNodeRemediationAgent

/workspace/main.go:219

main.main

/workspace/main.go:113

runtime.main

/root/sdk/go1.20/src/runtime/proc.go:250

2024-01-26T11:07:17.516044895Z INFO watchdog trying to enable softdog

2024-01-26T11:07:17.517978707Z INFO watchdog auto detected softdog path {"path": "/dev/watchdog0"}

2024-01-26T11:07:17.534410431Z INFO setup init grpc server

2024-01-26T11:07:17.534470352Z INFO setup starting manager

2024-01-26T11:07:17.534617656Z INFO Starting server {"path": "/metrics", "kind": "metrics", "addr": "[::]:8080"}

2024-01-26T11:07:17.53465909Z INFO Starting server {"kind": "health probe", "addr": "[::]:8081"}

2024-01-26T11:07:17.534691222Z INFO watchdog watchdog started

2024-01-26T11:07:17.534783347Z INFO api-check api connectivity check started

2024-01-26T11:07:17.534783762Z INFO Starting EventSource {"controller": "selfnoderemediation", "controllerGroup": "self-node-remediation.medik8s.io", "controllerKind": "SelfNodeRemediation", "source": "kind source: *v1alpha1.SelfNodeRemediation"}

2024-01-26T11:07:17.534807792Z INFO Starting Controller {"controller": "selfnoderemediation", "controllerGroup": "self-node-remediation.medik8s.io", "controllerKind": "SelfNodeRemediation"}

2024-01-26T11:07:17.635325961Z INFO safe-time-calculator calculated minTimeToAssumeNodeRebooted is: {"minTimeToAssumeNodeRebooted": "2m45s"}

2024-01-26T11:07:17.635348025Z INFO Starting workers {"controller": "selfnoderemediation", "controllerGroup": "self-node-remediation.medik8s.io", "controllerKind": "SelfNodeRemediation", "worker count": 1}

2024-01-26T11:07:17.635502721Z INFO peers peers started

2024-01-26T11:07:17.636600854Z INFO peerhealth.server peer health server started

2024-01-26T11:08:18.044142406Z INFO controllers.SelfNodeRemediation pre-reboot not completed yet, prepare for rebooting {"selfnoderemediation": "operators/<NODE_NAME>"}

2024-01-26T11:08:18.16185926Z INFO controllers.SelfNodeRemediation pre-reboot not completed yet, prepare for rebooting {"selfnoderemediation": "operators/<NODE_NAME>"}

2024-01-26T11:08:18.182010913Z INFO controllers.SelfNodeRemediation NoExecute taint added {"selfnoderemediation": "operators/<NODE_NAME>", "new taints": [{"key":"dbaas.pool","value":"default","effect":"NoSchedule"},{"key":"pool","value":"default","effect":"NoSchedule"},{"key":"medik8s.io/remediation","value":"self-node-remediation","effect":"NoExecute","timeAdded":"2024-01-26T11:08:18Z";}]}

2024-01-26T11:08:18.182039984Z INFO controllers.SelfNodeRemediation Marking node as unschedulable {"selfnoderemediation": "operators/<NODE_NAME>", "node name": "<NODE_NAME>"}

2024-01-26T11:08:18.199657738Z INFO controllers.SelfNodeRemediation pre-reboot not completed yet, prepare for rebooting {"selfnoderemediation": "operators/<NODE_NAME>"}

2024-01-26T11:08:18.19978848Z INFO controllers.SelfNodeRemediation waiting for unschedulable taint to appear {"selfnoderemediation": "operators/<NODE_NAME>", "node name": "<NODE_NAME>"}

2024-01-26T11:08:19.162873178Z INFO controllers.SelfNodeRemediation pre-reboot not completed yet, prepare for rebooting {"selfnoderemediation": "operators/<NODE_NAME>"}

2024-01-26T11:08:19.162997623Z INFO controllers.SelfNodeRemediation updating snr with node backup and updating time to assume node has been rebooted {"selfnoderemediation": "operators/<NODE_NAME>", "node name": "<NODE_NAME>"}

2024-01-26T11:08:19.163016468Z INFO safe-time-calculator calculated minTimeToAssumeNodeRebooted is: {"minTimeToAssumeNodeRebooted": "2m45s"}

2024-01-26T11:08:19.171288414Z INFO controllers.SelfNodeRemediation node reboot not completed yet, start rebooting {"selfnoderemediation": "operators/<NODE_NAME>"}

2024-01-26T11:08:19.171308848Z INFO rebooter watchdog feeding has stopped, waiting for reboot to commence

2024-01-26T11:08:49.176826894Z INFO controllers.SelfNodeRemediation node reboot not completed yet, start rebooting {"selfnoderemediation": "operators/<NODE_NAME>"}

2024-01-26T11:08:49.176850931Z INFO rebooter watchdog is triggered, waiting for watchdog reboot to commence

2024-01-26T11:08:49.176855292Z INFO rebooter watchdog reboot is stuck, too long has passed since last feed time {"time passed since last feed": "31.64071577s"}

2024-01-26T11:08:49.176881619Z INFO rebooter about to try software reboot


logs of node-healthcheck-controller-manager

2024-01-26T11:06:17.828317864Z INFO adding NHC to reconcile queue for handling node {"node": "<NODE_NAME>", "NHC": "nodehealthcheck-test"}
2024-01-26T11:06:17.828410254Z INFO adding NHC to reconcile queue for handling node {"node": "<NODE_NAME>", "NHC": "nodehealthcheck-test"}
2024-01-26T11:06:17.828471726Z INFO controllers.NodeHealthCheck reconciling {"NodeHealthCheck name": "nodehealthcheck-test"}
2024-01-26T11:06:17.83559766Z INFO controllers.NodeHealthCheck handling healthy node {"NodeHealthCheck name": "nodehealthcheck-test", "node": "<NODE_NAME>"}
time="2024-01-26T11:06:17Z" level=info msg="invalidating lease"
time="2024-01-26T11:06:17Z" level=info msg="getting lease"
2024-01-26T11:06:17.837810966Z INFO controllers.NodeHealthCheck handling healthy node {"NodeHealthCheck name": "nodehealthcheck-test", "node": "avi-ef-kub-free08"}
time="2024-01-26T11:06:17Z" level=info msg="invalidating lease"
time="2024-01-26T11:06:17Z" level=info msg="getting lease"
2024-01-26T11:06:17.839885617Z INFO controllers.NodeHealthCheck handling healthy node {"NodeHealthCheck name": "nodehealthcheck-test", "node": "avi-ef-kub252"}
time="2024-01-26T11:06:17Z" level=info msg="invalidating lease"
time="2024-01-26T11:06:17Z" level=info msg="getting lease"
2024-01-26T11:06:17.84207031Z INFO controllers.NodeHealthCheck reconcile end {"NodeHealthCheck name": "nodehealthcheck-test", "error": null, "requeue": false, "requeuAfter": "2m0.166850924s"}
2024-01-26T11:08:18.009811502Z INFO controllers.NodeHealthCheck reconciling {"NodeHealthCheck name": "nodehealthcheck-test"}
2024-01-26T11:08:18.01697544Z INFO controllers.NodeHealthCheck handling healthy node {"NodeHealthCheck name": "nodehealthcheck-test", "node": "avi-ef-kub-free08"}
time="2024-01-26T11:08:18Z" level=info msg="invalidating lease"
time="2024-01-26T11:08:18Z" level=info msg="getting lease"
2024-01-26T11:08:18.019413649Z INFO controllers.NodeHealthCheck handling healthy node {"NodeHealthCheck name": "nodehealthcheck-test", "node": "avi-ef-kub252"}
time="2024-01-26T11:08:18Z" level=info msg="invalidating lease"
time="2024-01-26T11:08:18Z" level=info msg="getting lease"
2024-01-26T11:08:18.021558573Z INFO controllers.NodeHealthCheck handling unhealthy node {"NodeHealthCheck name": "nodehealthcheck-test", "node": "<NODE_NAME>"}
2024-01-26T11:08:18.027384941Z INFO controllers.NodeHealthCheck.resource manager Attempting to obtain Node Lease {"Node name": "<NODE_NAME>"}
time="2024-01-26T11:08:18Z" level=info msg="request lease"
time="2024-01-26T11:08:18Z" level=info msg="getting lease"
time="2024-01-26T11:08:18Z" level=info msg="create lease"
2024-01-26T11:08:18.034437689Z INFO controllers.NodeHealthCheck.resource manager Creating a remediation CR {"CR name": "<NODE_NAME>", "CR kind": "SelfNodeRemediation", "namespace": "operators"}
2024-01-26T11:08:18.043713213Z DEBUG events Created remediation object for node <NODE_NAME> {"type": "Normal", "object": {"kind":"NodeHealthCheck","name":"nodehealthcheck-test","uid":"a05edcfd-fd55-44ef-89b0-d5a88bce7fa4","apiVersion":"remediation.medik8s.io/v1alpha1","resourceVersion":"207083880";}, "reason": "RemediationCreated"}
2024-01-26T11:08:18.046531385Z INFO controllers.NodeHealthCheck Patching NHC status {"NodeHealthCheck name": "nodehealthcheck-test", "new status": {"observedNodes":3,"healthyNodes":2,"unhealthyNodes":[{"name":"<NODE_NAME>","remediations":[{"resource":{"kind":"SelfNodeRemediation","namespace":"operators","name":"<NODE_NAME>","uid":"55717f7d-0ff9-4349-8b18-4867446518e9","apiVersion":"self-node-remediation.medik8s.io/v1alpha1";},"started":"2024-01-26T11:08:18Z"}]}],"inFlightRemediations":{"<NODE_NAME>":"2024-01-26T11:08:18Z"},"conditions":[{"type":"Disabled","status":"False","lastTransitionTime":"2023-12-29T12:44:00Z","reason":"NodeHealthCheckEnabled","message":"No issues found, NodeHealthCheck is enabled."}],"phase":"Remediating","reason":"NHC is remediating 1 nodes","lastUpdateTime":"2024-01-23T21:27:15Z"}, "patch": "{\"status\":{\"healthyNodes\":2,\"inFlightRemediations\":{\"<NODE_NAME>\":\"2024-01-26T11:08:18Z\"},\"phase\":\"Remediating\",\"reason\":\"NHC is remediating 1 nodes\",\"unhealthyNodes\":[{\"name\":\"<NODE_NAME>\",\"remediations\":[{\"resource\":{\"apiVersion\":\"self-node-remediation.medik8s.io/v1alpha1\",\"kind\":\"SelfNodeRemediation\",\"name\":\"<NODE_NAME>\",\"namespace\":\"operators\",\"uid\":\"55717f7d-0ff9-4349-8b18-4867446518e9\";},\"started\":\"2024-01-26T11:08:18Z\"}]}]}}"}
2024-01-26T11:08:18.155992902Z INFO controllers.NodeHealthCheck mapper: found NHC for remediation CR {"NHC Name": "nodehealthcheck-test", "Remediation CR Name": "<NODE_NAME>", "Remediation CR Kind": "SelfNodeRemediation"}
2024-01-26T11:08:18.156021131Z INFO controllers.NodeHealthCheck mapper: found NHC for remediation CR {"NHC Name": "nodehealthcheck-test", "Remediation CR Name": "<NODE_NAME>", "Remediation CR Kind": "SelfNodeRemediation"}
2024-01-26T11:08:18.165543146Z INFO controllers.NodeHealthCheck mapper: found NHC for remediation CR {"NHC Name": "nodehealthcheck-test", "Remediation CR Name": "<NODE_NAME>", "Remediation CR Kind": "SelfNodeRemediation"}
2024-01-26T11:08:18.16556025Z INFO controllers.NodeHealthCheck mapper: found NHC for remediation CR {"NHC Name": "nodehealthcheck-test", "Remediation CR Name": "<NODE_NAME>", "Remediation CR Kind": "SelfNodeRemediation"}
2024-01-26T11:08:18.190420591Z INFO controllers.NodeHealthCheck mapper: found NHC for remediation CR {"NHC Name": "nodehealthcheck-test", "Remediation CR Name": "<NODE_NAME>", "Remediation CR Kind": "SelfNodeRemediation"}
2024-01-26T11:08:18.190439812Z INFO controllers.NodeHealthCheck mapper: found NHC for remediation CR {"NHC Name": "nodehealthcheck-test", "Remediation CR Name": "<NODE_NAME>", "Remediation CR Kind": "SelfNodeRemediation"}
2024-01-26T11:08:18.253303834Z INFO controllers.NodeHealthCheck reconcile end {"NodeHealthCheck name": "nodehealthcheck-test", "error": null, "requeue": false, "requeuAfter": "10m0s"}
2024-01-26T11:08:18.253406488Z INFO controllers.NodeHealthCheck reconciling {"NodeHealthCheck name": "nodehealthcheck-test"}
2024-01-26T11:08:18.262271516Z INFO controllers.NodeHealthCheck found condition {"NodeHealthCheck name": "nodehealthcheck-test", "type": "Succeeded", "status": "Unknown", "reason": "", "message": "", "lastTransition": "2024-01-26T11:08:18Z"}
2024-01-26T11:08:18.262299588Z INFO controllers.NodeHealthCheck handling healthy node {"NodeHealthCheck name": "nodehealthcheck-test", "node": "avi-ef-kub-free08"}
time="2024-01-26T11:08:18Z" level=info msg="invalidating lease"
time="2024-01-26T11:08:18Z" level=info msg="getting lease"
2024-01-26T11:08:18.266527097Z INFO controllers.NodeHealthCheck handling healthy node {"NodeHealthCheck name": "nodehealthcheck-test", "node": "avi-ef-kub252"}
time="2024-01-26T11:08:18Z" level=info msg="invalidating lease"
time="2024-01-26T11:08:18Z" level=info msg="getting lease"
2024-01-26T11:08:18.270073765Z INFO controllers.NodeHealthCheck handling unhealthy node {"NodeHealthCheck name": "nodehealthcheck-test", "node": "<NODE_NAME>"}
2024-01-26T11:08:18.276169992Z INFO controllers.NodeHealthCheck.resource manager external remediation CR already exists {"CR name": "<NODE_NAME>", "kind": "SelfNodeRemediation", "namespace": "operators"}
time="2024-01-26T11:08:18Z" level=info msg="getting lease"
2024-01-26T11:08:18.276239545Z INFO controllers.NodeHealthCheck.nhc lease manager managing lease - about to try to acquire/extended the lease {"NodeHealthCheck name": "nodehealthcheck-test", "lease name": "node-<NODE_NAME>", "NHC is lease owner": true, "lease expiration time": "2024-01-26T11:38:18.027461Z"}
2024-01-26T11:08:18.279777221Z INFO controllers.NodeHealthCheck reconcile end {"NodeHealthCheck name": "nodehealthcheck-test", "error": null, "requeue": false, "requeuAfter": "10m0s"}
2024-01-26T11:08:19.170968176Z INFO controllers.NodeHealthCheck mapper: found NHC for remediation CR {"NHC Name": "nodehealthcheck-test", "Remediation CR Name": "<NODE_NAME>", "Remediation CR Kind": "SelfNodeRemediation"}
2024-01-26T11:08:19.171000966Z INFO controllers.NodeHealthCheck mapper: found NHC for remediation CR {"NHC Name": "nodehealthcheck-test", "Remediation CR Name": "<NODE_NAME>", "Remediation CR Kind": "SelfNodeRemediation"}
2024-01-26T11:08:19.171059758Z INFO controllers.NodeHealthCheck reconciling {"NodeHealthCheck name": "nodehealthcheck-test"}
2024-01-26T11:08:19.178356722Z INFO controllers.NodeHealthCheck found condition {"NodeHealthCheck name": "nodehealthcheck-test", "type": "Succeeded", "status": "Unknown", "reason": "", "message": "", "lastTransition": "2024-01-26T11:08:18Z"}
2024-01-26T11:08:19.178378255Z INFO controllers.NodeHealthCheck handling healthy node {"NodeHealthCheck name": "nodehealthcheck-test", "node": "avi-ef-kub-free08"}
time="2024-01-26T11:08:19Z" level=info msg="invalidating lease"
time="2024-01-26T11:08:19Z" level=info msg="getting lease"
2024-01-26T11:08:19.18167916Z INFO controllers.NodeHealthCheck handling healthy node {"NodeHealthCheck name": "nodehealthcheck-test", "node": "avi-ef-kub252"}
time="2024-01-26T11:08:19Z" level=info msg="invalidating lease"
time="2024-01-26T11:08:19Z" level=info msg="getting lease"
2024-01-26T11:08:19.184521336Z INFO controllers.NodeHealthCheck handling unhealthy node {"NodeHealthCheck name": "nodehealthcheck-test", "node": "<NODE_NAME>"}
2024-01-26T11:08:19.189642749Z INFO controllers.NodeHealthCheck.resource manager external remediation CR already exists {"CR name": "<NODE_NAME>", "kind": "SelfNodeRemediation", "namespace": "operators"}
time="2024-01-26T11:08:19Z" level=info msg="getting lease"
2024-01-26T11:08:19.189697741Z INFO controllers.NodeHealthCheck.nhc lease manager managing lease - about to try to acquire/extended the lease {"NodeHealthCheck name": "nodehealthcheck-test", "lease name": "node-<NODE_NAME>", "NHC is lease owner": true, "lease expiration time": "2024-01-26T11:38:18.027461Z"}
2024-01-26T11:08:19.193113057Z INFO controllers.NodeHealthCheck reconcile end {"NodeHealthCheck name": "nodehealthcheck-test", "error": null, "requeue": false, "requeuAfter": "10m0s"}
2024-01-26T11:09:24.209842886Z INFO adding NHC to reconcile queue for handling node {"node": "<NODE_NAME>", "NHC": "nodehealthcheck-test"}
2024-01-26T11:09:24.209928105Z INFO adding NHC to reconcile queue for handling node {"node": "<NODE_NAME>", "NHC": "nodehealthcheck-test"}
2024-01-26T11:09:24.209989002Z INFO controllers.NodeHealthCheck reconciling {"NodeHealthCheck name": "nodehealthcheck-test"}
2024-01-26T11:09:24.219270317Z INFO controllers.NodeHealthCheck found condition {"NodeHealthCheck name": "nodehealthcheck-test", "type": "Succeeded", "status": "Unknown", "reason": "", "message": "", "lastTransition": "2024-01-26T11:08:18Z"}
2024-01-26T11:09:24.219296617Z INFO controllers.NodeHealthCheck handling healthy node {"NodeHealthCheck name": "nodehealthcheck-test", "node": "avi-ef-kub-free08"}
time="2024-01-26T11:09:24Z" level=info msg="invalidating lease"
time="2024-01-26T11:09:24Z" level=info msg="getting lease"
2024-01-26T11:09:24.222925481Z INFO controllers.NodeHealthCheck handling healthy node {"NodeHealthCheck name": "nodehealthcheck-test", "node": "avi-ef-kub252"}
time="2024-01-26T11:09:24Z" level=info msg="invalidating lease"
time="2024-01-26T11:09:24Z" level=info msg="getting lease"
2024-01-26T11:09:24.225887984Z INFO controllers.NodeHealthCheck handling unhealthy node {"NodeHealthCheck name": "nodehealthcheck-test", "node": "<NODE_NAME>"}
2024-01-26T11:09:24.23109878Z INFO controllers.NodeHealthCheck.resource manager external remediation CR already exists {"CR name": "<NODE_NAME>", "kind": "SelfNodeRemediation", "namespace": "operators"}
time="2024-01-26T11:09:24Z" level=info msg="getting lease"
2024-01-26T11:09:24.231153095Z INFO controllers.NodeHealthCheck.nhc lease manager managing lease - about to try to acquire/extended the lease {"NodeHealthCheck name": "nodehealthcheck-test", "lease name": "node-<NODE_NAME>", "NHC is lease owner": true, "lease expiration time": "2024-01-26T11:38:18.027461Z"}
time="2024-01-26T11:09:24Z" level=info msg="request lease"
time="2024-01-26T11:09:24Z" level=info msg="getting lease"
time="2024-01-26T11:09:24Z" level=info msg="renew lease owned by NodeHealthCheck-nodehealthcheck-test setAcquireTime=false"
2024-01-26T11:09:24.244231019Z INFO controllers.NodeHealthCheck reconcile end {"NodeHealthCheck name": "nodehealthcheck-test", "error": null, "requeue": false, "requeuAfter": "10m0s"}
2024-01-26T11:10:55.775946225Z INFO adding NHC to reconcile queue for handling node {"node": "<NODE_NAME>", "NHC": "nodehealthcheck-test"}
2024-01-26T11:10:55.77603465Z INFO adding NHC to reconcile queue for handling node {"node": "<NODE_NAME>", "NHC": "nodehealthcheck-test"}
2024-01-26T11:10:55.776092086Z INFO controllers.NodeHealthCheck reconciling {"NodeHealthCheck name": "nodehealthcheck-test"}
2024-01-26T11:10:55.78330685Z INFO controllers.NodeHealthCheck found condition {"NodeHealthCheck name": "nodehealthcheck-test", "type": "Succeeded", "status": "Unknown", "reason": "", "message": "", "lastTransition": "2024-01-26T11:08:18Z"}
2024-01-26T11:10:55.783334639Z INFO controllers.NodeHealthCheck handling healthy node {"NodeHealthCheck name": "nodehealthcheck-test", "node": "avi-ef-kub-free08"}
time="2024-01-26T11:10:55Z" level=info msg="invalidating lease"
time="2024-01-26T11:10:55Z" level=info msg="getting lease"
2024-01-26T11:10:55.78671796Z INFO controllers.NodeHealthCheck handling healthy node {"NodeHealthCheck name": "nodehealthcheck-test", "node": "avi-ef-kub252"}
time="2024-01-26T11:10:55Z" level=info msg="invalidating lease"
time="2024-01-26T11:10:55Z" level=info msg="getting lease"
2024-01-26T11:10:55.790399614Z INFO controllers.NodeHealthCheck handling unhealthy node {"NodeHealthCheck name": "nodehealthcheck-test", "node": "<NODE_NAME>"}
2024-01-26T11:10:55.795906213Z INFO controllers.NodeHealthCheck.resource manager external remediation CR already exists {"CR name": "<NODE_NAME>", "kind": "SelfNodeRemediation", "namespace": "operators"}
time="2024-01-26T11:10:55Z" level=info msg="getting lease"
2024-01-26T11:10:55.795960559Z INFO controllers.NodeHealthCheck.nhc lease manager managing lease - about to try to acquire/extended the lease {"NodeHealthCheck name": "nodehealthcheck-test", "lease name": "node-<NODE_NAME>", "NHC is lease owner": true, "lease expiration time": "2024-01-26T11:38:18.027461Z"}
time="2024-01-26T11:10:55Z" level=info msg="request lease"
time="2024-01-26T11:10:55Z" level=info msg="getting lease"
time="2024-01-26T11:10:55Z" level=info msg="renew lease owned by NodeHealthCheck-nodehealthcheck-test setAcquireTime=false"
2024-01-26T11:10:55.803662786Z INFO controllers.NodeHealthCheck reconcile end {"NodeHealthCheck name": "nodehealthcheck-test", "error": null, "requeue": false, "requeuAfter": "10m0s"}
2024-01-26T11:10:57.9282935Z INFO adding NHC to reconcile queue for handling node {"node": "<NODE_NAME>", "NHC": "nodehealthcheck-test"}
2024-01-26T11:10:57.928356552Z INFO adding NHC to reconcile queue for handling node {"node": "<NODE_NAME>", "NHC": "nodehealthcheck-test"}
2024-01-26T11:10:57.928401186Z INFO controllers.NodeHealthCheck reconciling {"NodeHealthCheck name": "nodehealthcheck-test"}
2024-01-26T11:10:57.939813491Z INFO controllers.NodeHealthCheck found condition {"NodeHealthCheck name": "nodehealthcheck-test", "type": "Succeeded", "status": "Unknown", "reason": "", "message": "", "lastTransition": "2024-01-26T11:08:18Z"}
2024-01-26T11:10:57.939836369Z INFO controllers.NodeHealthCheck handling healthy node {"NodeHealthCheck name": "nodehealthcheck-test", "node": "<NODE_NAME>"}
2024-01-26T11:10:57.954666899Z INFO controllers.NodeHealthCheck deleted remediation CR {"NodeHealthCheck name": "nodehealthcheck-test", "name": "<NODE_NAME>"}
2024-01-26T11:10:57.954684861Z INFO controllers.NodeHealthCheck handling healthy node {"NodeHealthCheck name": "nodehealthcheck-test", "node": "avi-ef-kub-free08"}
2024-01-26T11:10:57.954846976Z INFO controllers.NodeHealthCheck mapper: found NHC for remediation CR {"NHC Name": "nodehealthcheck-test", "Remediation CR Name": "<NODE_NAME>", "Remediation CR Kind": "SelfNodeRemediation"}
2024-01-26T11:10:57.954874669Z INFO controllers.NodeHealthCheck mapper: found NHC for remediation CR {"NHC Name": "nodehealthcheck-test", "Remediation CR Name": "<NODE_NAME>", "Remediation CR Kind": "SelfNodeRemediation"}
time="2024-01-26T11:10:57Z" level=info msg="invalidating lease"
time="2024-01-26T11:10:57Z" level=info msg="getting lease"
2024-01-26T11:10:57.957764694Z INFO controllers.NodeHealthCheck handling healthy node {"NodeHealthCheck name": "nodehealthcheck-test", "node": "avi-ef-kub252"}
time="2024-01-26T11:10:57Z" level=info msg="invalidating lease"
time="2024-01-26T11:10:57Z" level=info msg="getting lease"
2024-01-26T11:10:57.960505072Z INFO controllers.NodeHealthCheck Patching NHC status {"NodeHealthCheck name": "nodehealthcheck-test", "new status": {"observedNodes":3,"healthyNodes":2,"unhealthyNodes":[{"name":"<NODE_NAME>","remediations":[{"resource":{"kind":"SelfNodeRemediation","namespace":"operators","name":"<NODE_NAME>","uid":"55717f7d-0ff9-4349-8b18-4867446518e9","apiVersion":"self-node-remediation.medik8s.io/v1alpha1";},"started":"2024-01-26T11:08:18Z"}],"conditionsHealthyTimestamp":"2024-01-26T11:10:57Z"}],"inFlightRemediations":{"<NODE_NAME>":"2024-01-26T11:08:18Z"},"conditions":[{"type":"Disabled","status":"False","lastTransitionTime":"2023-12-29T12:44:00Z","reason":"NodeHealthCheckEnabled","message":"No issues found, NodeHealthCheck is enabled."}],"phase":"Remediating","reason":"NHC is remediating 1 nodes","lastUpdateTime":"2024-01-26T11:08:18Z"}, "patch": "{\"status\":{\"unhealthyNodes\":[{\"conditionsHealthyTimestamp\":\"2024-01-26T11:10:57Z\",\"name\":\"<NODE_NAME>\",\"remediations\":[{\"resource\":{\"apiVersion\":\"self-node-remediation.medik8s.io/v1alpha1\",\"kind\":\"SelfNodeRemediation\",\"name\":\"<NODE_NAME>\",\"namespace\":\"operators\",\"uid\":\"55717f7d-0ff9-4349-8b18-4867446518e9\";},\"started\":\"2024-01-26T11:08:18Z\"}]}]}}"}
2024-01-26T11:10:58.167584834Z INFO controllers.NodeHealthCheck reconcile end {"NodeHealthCheck name": "nodehealthcheck-test", "error": null, "requeue": false, "requeuAfter": "11s"}
2024-01-26T11:10:58.167655006Z INFO controllers.NodeHealthCheck reconciling {"NodeHealthCheck name": "nodehealthcheck-test"}
2024-01-26T11:10:58.178094464Z INFO controllers.NodeHealthCheck handling healthy node {"NodeHealthCheck name": "nodehealthcheck-test", "node": "avi-ef-kub-free08"}
time="2024-01-26T11:10:58Z" level=info msg="invalidating lease"
time="2024-01-26T11:10:58Z" level=info msg="getting lease"
2024-01-26T11:10:58.181963185Z INFO controllers.NodeHealthCheck handling healthy node {"NodeHealthCheck name": "nodehealthcheck-test", "node": "avi-ef-kub252"}
time="2024-01-26T11:10:58Z" level=info msg="invalidating lease"
time="2024-01-26T11:10:58Z" level=info msg="getting lease"
2024-01-26T11:10:58.184896717Z INFO controllers.NodeHealthCheck handling healthy node {"NodeHealthCheck name": "nodehealthcheck-test", "node": "<NODE_NAME>"}
2024-01-26T11:10:58.198346791Z INFO controllers.NodeHealthCheck reconcile end {"NodeHealthCheck name": "nodehealthcheck-test", "error": null, "requeue": false, "requeuAfter": "11s"}
2024-01-26T11:10:58.920308673Z INFO adding NHC to reconcile queue for handling node {"node": "<NODE_NAME>", "NHC": "nodehealthcheck-test"}
2024-01-26T11:10:58.920368454Z INFO adding NHC to reconcile queue for handling node {"node": "<NODE_NAME>", "NHC": "nodehealthcheck-test"}
2024-01-26T11:10:58.920413139Z INFO controllers.NodeHealthCheck reconciling {"NodeHealthCheck name": "nodehealthcheck-test"}
2024-01-26T11:10:58.926383662Z INFO controllers.NodeHealthCheck handling healthy node {"NodeHealthCheck name": "nodehealthcheck-test", "node": "avi-ef-kub-free08"}
time="2024-01-26T11:10:58Z" level=info msg="invalidating lease"
time="2024-01-26T11:10:58Z" level=info msg="getting lease"
2024-01-26T11:10:58.92968194Z INFO controllers.NodeHealthCheck handling healthy node {"NodeHealthCheck name": "nodehealthcheck-test", "node": "avi-ef-kub252"}
time="2024-01-26T11:10:58Z" level=info msg="invalidating lease"
time="2024-01-26T11:10:58Z" level=info msg="getting lease"
2024-01-26T11:10:58.933404487Z INFO controllers.NodeHealthCheck handling healthy node {"NodeHealthCheck name": "nodehealthcheck-test", "node": "<NODE_NAME>"}
2024-01-26T11:10:58.939783828Z INFO controllers.NodeHealthCheck reconcile end {"NodeHealthCheck name": "nodehealthcheck-test", "error": null, "requeue": false, "requeuAfter": "11s"}
2024-01-26T11:11:05.01390928Z INFO controllers.NodeHealthCheck mapper: found NHC for remediation CR {"NHC Name": "nodehealthcheck-test", "Remediation CR Name": "<NODE_NAME>", "Remediation CR Kind": "SelfNodeRemediation"}
2024-01-26T11:11:05.013962413Z INFO controllers.NodeHealthCheck mapper: found NHC for remediation CR {"NHC Name": "nodehealthcheck-test", "Remediation CR Name": "<NODE_NAME>", "Remediation CR Kind": "SelfNodeRemediation"}
2024-01-26T11:11:05.014028208Z INFO controllers.NodeHealthCheck reconciling {"NodeHealthCheck name": "nodehealthcheck-test"}
2024-01-26T11:11:05.02131988Z INFO controllers.NodeHealthCheck handling healthy node {"NodeHealthCheck name": "nodehealthcheck-test", "node": "avi-ef-kub252"}
time="2024-01-26T11:11:05Z" level=info msg="invalidating lease"
time="2024-01-26T11:11:05Z" level=info msg="getting lease"
2024-01-26T11:11:05.024719172Z INFO controllers.NodeHealthCheck handling healthy node {"NodeHealthCheck name": "nodehealthcheck-test", "node": "<NODE_NAME>"}
2024-01-26T11:11:05.031639608Z INFO controllers.NodeHealthCheck handling healthy node {"NodeHealthCheck name": "nodehealthcheck-test", "node": "avi-ef-kub-free08"}
time="2024-01-26T11:11:05Z" level=info msg="invalidating lease"
time="2024-01-26T11:11:05Z" level=info msg="getting lease"
2024-01-26T11:11:05.034804329Z INFO controllers.NodeHealthCheck reconcile end {"NodeHealthCheck name": "nodehealthcheck-test", "error": null, "requeue": false, "requeuAfter": "11s"}
2024-01-26T11:11:05.577829581Z INFO controllers.NodeHealthCheck mapper: found NHC for remediation CR {"NHC Name": "nodehealthcheck-test", "Remediation CR Name": "<NODE_NAME>", "Remediation CR Kind": "SelfNodeRemediation"}
2024-01-26T11:11:05.577858422Z INFO controllers.NodeHealthCheck mapper: found NHC for remediation CR {"NHC Name": "nodehealthcheck-test", "Remediation CR Name": "<NODE_NAME>", "Remediation CR Kind": "SelfNodeRemediation"}
2024-01-26T11:11:05.577902296Z INFO controllers.NodeHealthCheck reconciling {"NodeHealthCheck name": "nodehealthcheck-test"}
2024-01-26T11:11:05.584542307Z INFO controllers.NodeHealthCheck handling healthy node {"NodeHealthCheck name": "nodehealthcheck-test", "node": "avi-ef-kub252"}
time="2024-01-26T11:11:05Z" level=info msg="invalidating lease"
time="2024-01-26T11:11:05Z" level=info msg="getting lease"
2024-01-26T11:11:05.587661587Z INFO controllers.NodeHealthCheck handling healthy node {"NodeHealthCheck name": "nodehealthcheck-test", "node": "<NODE_NAME>"}
2024-01-26T11:11:05.593544416Z INFO controllers.NodeHealthCheck handling healthy node {"NodeHealthCheck name": "nodehealthcheck-test", "node": "avi-ef-kub-free08"}
time="2024-01-26T11:11:05Z" level=info msg="invalidating lease"
time="2024-01-26T11:11:05Z" level=info msg="getting lease"
2024-01-26T11:11:05.596925783Z INFO controllers.NodeHealthCheck reconcile end {"NodeHealthCheck name": "nodehealthcheck-test", "error": null, "requeue": false, "requeuAfter": "11s"}
2024-01-26T11:11:06.047628904Z INFO controllers.NodeHealthCheck mapper: found NHC for remediation CR {"NHC Name": "nodehealthcheck-test", "Remediation CR Name": "<NODE_NAME>", "Remediation CR Kind": "SelfNodeRemediation"}
2024-01-26T11:11:06.047695765Z INFO controllers.NodeHealthCheck reconciling {"NodeHealthCheck name": "nodehealthcheck-test"}
2024-01-26T11:11:06.052386858Z INFO controllers.NodeHealthCheck handling healthy node {"NodeHealthCheck name": "nodehealthcheck-test", "node": "avi-ef-kub-free08"}
time="2024-01-26T11:11:06Z" level=info msg="invalidating lease"
time="2024-01-26T11:11:06Z" level=info msg="getting lease"
2024-01-26T11:11:06.054142764Z INFO controllers.NodeHealthCheck handling healthy node {"NodeHealthCheck name": "nodehealthcheck-test", "node": "avi-ef-kub252"}
time="2024-01-26T11:11:06Z" level=info msg="invalidating lease"
time="2024-01-26T11:11:06Z" level=info msg="getting lease"
2024-01-26T11:11:06.056360562Z INFO controllers.NodeHealthCheck handling healthy node {"NodeHealthCheck name": "nodehealthcheck-test", "node": "<NODE_NAME>"}
time="2024-01-26T11:11:06Z" level=info msg="invalidating lease"
time="2024-01-26T11:11:06Z" level=info msg="getting lease"
2024-01-26T11:11:06.063515523Z DEBUG events Deleted remediation CR of kind SelfNodeRemediation for node <NODE_NAME> {"type": "Normal", "object": {"kind":"NodeHealthCheck","name":"nodehealthcheck-test","uid":"a05edcfd-fd55-44ef-89b0-d5a88bce7fa4","apiVersion":"remediation.medik8s.io/v1alpha1","resourceVersion":"209651621";}, "reason": "RemediationRemoved"}
2024-01-26T11:11:06.063657698Z INFO controllers.NodeHealthCheck Patching NHC status {"NodeHealthCheck name": "nodehealthcheck-test", "new status": {"observedNodes":3,"healthyNodes":3,"conditions":[{"type":"Disabled","status":"False","lastTransitionTime":"2023-12-29T12:44:00Z","reason":"NodeHealthCheckEnabled","message":"No issues found, NodeHealthCheck is enabled."}],"phase":"Enabled","reason":"NHC is enabled, no ongoing remediation","lastUpdateTime":"2024-01-26T11:10:57Z"}, "patch": "{\"status\":{\"healthyNodes\":3,\"inFlightRemediations\":null,\"phase\":\"Enabled\",\"reason\":\"NHC is enabled, no ongoing remediation\",\"unhealthyNodes\":null}}"}
2024-01-26T11:11:06.270147886Z INFO controllers.NodeHealthCheck reconcile end {"NodeHealthCheck name": "nodehealthcheck-test", "error": null, "requeue": false, "requeuAfter": "1m52.949788431s"}
2024-01-26T11:11:08.42106474Z INFO adding NHC to reconcile queue for handling node {"node": "<NODE_NAME>", "NHC": "nodehealthcheck-test"}
2024-01-26T11:11:08.421142981Z INFO adding NHC to reconcile queue for handling node {"node": "<NODE_NAME>", "NHC": "nodehealthcheck-test"}
2024-01-26T11:11:08.421186196Z INFO controllers.NodeHealthCheck reconciling {"NodeHealthCheck name": "nodehealthcheck-test"}
2024-01-26T11:11:08.427751587Z INFO controllers.NodeHealthCheck handling healthy node {"NodeHealthCheck name": "nodehealthcheck-test", "node": "<NODE_NAME>"}
time="2024-01-26T11:11:08Z" level=info msg="invalidating lease"
time="2024-01-26T11:11:08Z" level=info msg="getting lease"
2024-01-26T11:11:08.431108651Z INFO controllers.NodeHealthCheck handling healthy node {"NodeHealthCheck name": "nodehealthcheck-test", "node": "avi-ef-kub-free08"}
time="2024-01-26T11:11:08Z" level=info msg="invalidating lease"
time="2024-01-26T11:11:08Z" level=info msg="getting lease"
2024-01-26T11:11:08.43371295Z INFO controllers.NodeHealthCheck handling healthy node {"NodeHealthCheck name": "nodehealthcheck-test", "node": "avi-ef-kub252"}
time="2024-01-26T11:11:08Z" level=info msg="invalidating lease"
time="2024-01-26T11:11:08Z" level=info msg="getting lease"
2024-01-26T11:11:08.436062715Z INFO controllers.NodeHealthCheck reconcile end {"NodeHealthCheck name": "nodehealthcheck-test", "error": null, "requeue": false, "requeuAfter": "0s"}
2024-01-26T11:11:09.16870641Z INFO controllers.NodeHealthCheck reconciling {"NodeHealthCheck name": "nodehealthcheck-test"}
2024-01-26T11:11:09.175221113Z INFO controllers.NodeHealthCheck handling healthy node {"NodeHealthCheck name": "nodehealthcheck-test", "node": "avi-ef-kub-free08"}
time="2024-01-26T11:11:09Z" level=info msg="invalidating lease"
time="2024-01-26T11:11:09Z" level=info msg="getting lease"
2024-01-26T11:11:09.1777251Z INFO controllers.NodeHealthCheck handling healthy node {"NodeHealthCheck name": "nodehealthcheck-test", "node": "avi-ef-kub252"}
time="2024-01-26T11:11:09Z" level=info msg="invalidating lease"
time="2024-01-26T11:11:09Z" level=info msg="getting lease"
2024-01-26T11:11:09.180211763Z INFO controllers.NodeHealthCheck handling healthy node {"NodeHealthCheck name": "nodehealthcheck-test", "node": "<NODE_NAME>"}
time="2024-01-26T11:11:09Z" level=info msg="invalidating lease"
time="2024-01-26T11:11:09Z" level=info msg="getting lease"
2024-01-26T11:11:09.182299808Z INFO controllers.NodeHealthCheck reconcile end {"NodeHealthCheck name": "nodehealthcheck-test", "error": null, "requeue": false, "requeuAfter": "0s"}
2024-01-26T11:11:09.230578049Z INFO adding NHC to reconcile queue for handling node {"node": "<NODE_NAME>", "NHC": "nodehealthcheck-test"}
2024-01-26T11:11:09.23064881Z INFO adding NHC to reconcile queue for handling node {"node": "<NODE_NAME>", "NHC": "nodehealthcheck-test"}
2024-01-26T11:11:09.230699946Z INFO controllers.NodeHealthCheck reconciling {"NodeHealthCheck name": "nodehealthcheck-test"}
2024-01-26T11:11:09.240158038Z INFO controllers.NodeHealthCheck handling healthy node {"NodeHealthCheck name": "nodehealthcheck-test", "node": "<NODE_NAME>"}
time="2024-01-26T11:11:09Z" level=info msg="invalidating lease"
time="2024-01-26T11:11:09Z" level=info msg="getting lease"
2024-01-26T11:11:09.244700149Z INFO controllers.NodeHealthCheck handling healthy node {"NodeHealthCheck name": "nodehealthcheck-test", "node": "avi-ef-kub-free08"}
time="2024-01-26T11:11:09Z" level=info msg="invalidating lease"
time="2024-01-26T11:11:09Z" level=info msg="getting lease"
2024-01-26T11:11:09.247202716Z INFO controllers.NodeHealthCheck handling healthy node {"NodeHealthCheck name": "nodehealthcheck-test", "node": "avi-ef-kub252"}
time="2024-01-26T11:11:09Z" level=info msg="invalidating lease"
time="2024-01-26T11:11:09Z" level=info msg="getting lease"
2024-01-26T11:11:09.250310339Z INFO controllers.NodeHealthCheck reconcile end {"NodeHealthCheck name": "nodehealthcheck-test", "error": null, "requeue": false, "requeuAfter": "1m59.763530493s"}
2024-01-26T11:13:09.014616977Z INFO controllers.NodeHealthCheck reconciling {"NodeHealthCheck name": "nodehealthcheck-test"}
2024-01-26T11:13:09.025091838Z INFO controllers.NodeHealthCheck handling healthy node {"NodeHealthCheck name": "nodehealthcheck-test", "node": "avi-ef-kub252"}
time="2024-01-26T11:13:09Z" level=info msg="invalidating lease"
time="2024-01-26T11:13:09Z" level=info msg="getting lease"
2024-01-26T11:13:09.02764994Z INFO controllers.NodeHealthCheck handling healthy node {"NodeHealthCheck name": "nodehealthcheck-test", "node": "avi-ef-kub-free08"}
time="2024-01-26T11:13:09Z" level=info msg="invalidating lease"
time="2024-01-26T11:13:09Z" level=info msg="getting lease"
2024-01-26T11:13:09.029769579Z INFO controllers.NodeHealthCheck handling unhealthy node {"NodeHealthCheck name": "nodehealthcheck-test", "node": "<NODE_NAME>"}
2024-01-26T11:13:09.033901896Z INFO controllers.NodeHealthCheck.resource manager Attempting to obtain Node Lease {"Node name": "<NODE_NAME>"}
time="2024-01-26T11:13:09Z" level=info msg="request lease"
time="2024-01-26T11:13:09Z" level=info msg="getting lease"
time="2024-01-26T11:13:09Z" level=info msg="create lease"
2024-01-26T11:13:09.039372096Z INFO controllers.NodeHealthCheck.resource manager Creating a remediation CR {"CR name": "<NODE_NAME>", "CR kind": "SelfNodeRemediation", "namespace": "operators"}
2024-01-26T11:13:09.047717942Z DEBUG events Created remediation object for node <NODE_NAME> {"type": "Normal", "object": {"kind":"NodeHealthCheck","name":"nodehealthcheck-test","uid":"a05edcfd-fd55-44ef-89b0-d5a88bce7fa4","apiVersion":"remediation.medik8s.io/v1alpha1","resourceVersion":"209651878";}, "reason": "RemediationCreated"}
2024-01-26T11:13:09.050422773Z INFO controllers.NodeHealthCheck Patching NHC status {"NodeHealthCheck name": "nodehealthcheck-test", "new status": {"observedNodes":3,"healthyNodes":2,"unhealthyNodes":[{"name":"<NODE_NAME>","remediations":[{"resource":{"kind":"SelfNodeRemediation","namespace":"operators","name":"<NODE_NAME>","uid":"747547b8-a049-4671-85d3-87fa01554752","apiVersion":"self-node-remediation.medik8s.io/v1alpha1";},"started":"2024-01-26T11:13:09Z"}]}],"inFlightRemediations":{"<NODE_NAME>":"2024-01-26T11:13:09Z"},"conditions":[{"type":"Disabled","status":"False","lastTransitionTime":"2023-12-29T12:44:00Z","reason":"NodeHealthCheckEnabled","message":"No issues found, NodeHealthCheck is enabled."}],"phase":"Remediating","reason":"NHC is remediating 1 nodes","lastUpdateTime":"2024-01-26T11:11:06Z"}, "patch": "{\"status\":{\"healthyNodes\":2,\"inFlightRemediations\":{\"<NODE_NAME>\":\"2024-01-26T11:13:09Z\"},\"phase\":\"Remediating\",\"reason\":\"NHC is remediating 1 nodes\",\"unhealthyNodes\":[{\"name\":\"<NODE_NAME>\",\"remediations\":[{\"resource\":{\"apiVersion\":\"self-node-remediation.medik8s.io/v1alpha1\",\"kind\":\"SelfNodeRemediation\",\"name\":\"<NODE_NAME>\",\"namespace\":\"operators\",\"uid\":\"747547b8-a049-4671-85d3-87fa01554752\";},\"started\":\"2024-01-26T11:13:09Z\"}]}]}}"}
2024-01-26T11:13:09.055424502Z INFO controllers.NodeHealthCheck mapper: found NHC for remediation CR {"NHC Name": "nodehealthcheck-test", "Remediation CR Name": "<NODE_NAME>", "Remediation CR Kind": "SelfNodeRemediation"}
2024-01-26T11:13:09.055450931Z INFO controllers.NodeHealthCheck mapper: found NHC for remediation CR {"NHC Name": "nodehealthcheck-test", "Remediation CR Name": "<NODE_NAME>", "Remediation CR Kind": "SelfNodeRemediation"}
2024-01-26T11:13:09.061593313Z INFO controllers.NodeHealthCheck mapper: found NHC for remediation CR {"NHC Name": "nodehealthcheck-test", "Remediation CR Name": "<NODE_NAME>", "Remediation CR Kind": "SelfNodeRemediation"}
2024-01-26T11:13:09.061609107Z INFO controllers.NodeHealthCheck mapper: found NHC for remediation CR {"NHC Name": "nodehealthcheck-test", "Remediation CR Name": "<NODE_NAME>", "Remediation CR Kind": "SelfNodeRemediation"}
2024-01-26T11:13:09.091822625Z INFO controllers.NodeHealthCheck mapper: found NHC for remediation CR {"NHC Name": "nodehealthcheck-test", "Remediation CR Name": "<NODE_NAME>", "Remediation CR Kind": "SelfNodeRemediation"}
2024-01-26T11:13:09.091836181Z INFO controllers.NodeHealthCheck mapper: found NHC for remediation CR {"NHC Name": "nodehealthcheck-test", "Remediation CR Name": "<NODE_NAME>", "Remediation CR Kind": "SelfNodeRemediation"}
2024-01-26T11:13:09.108077456Z INFO controllers.NodeHealthCheck mapper: found NHC for remediation CR {"NHC Name": "nodehealthcheck-test", "Remediation CR Name": "<NODE_NAME>", "Remediation CR Kind": "SelfNodeRemediation"}
2024-01-26T11:13:09.108092058Z INFO controllers.NodeHealthCheck mapper: found NHC for remediation CR {"NHC Name": "nodehealthcheck-test", "Remediation CR Name": "<NODE_NAME>", "Remediation CR Kind": "SelfNodeRemediation"}
2024-01-26T11:13:09.258291149Z INFO controllers.NodeHealthCheck reconcile end {"NodeHealthCheck name": "nodehealthcheck-test", "error": null, "requeue": false, "requeuAfter": "10m0s"}
2024-01-26T11:13:09.25835713Z INFO controllers.NodeHealthCheck reconciling {"NodeHealthCheck name": "nodehealthcheck-test"}
2024-01-26T11:13:09.265323504Z INFO controllers.NodeHealthCheck found condition {"NodeHealthCheck name": "nodehealthcheck-test", "type": "Succeeded", "status": "Unknown", "reason": "", "message": "", "lastTransition": "2024-01-26T11:13:09Z"}
2024-01-26T11:13:09.265338645Z INFO controllers.NodeHealthCheck handling healthy node {"NodeHealthCheck name": "nodehealthcheck-test", "node": "avi-ef-kub-free08"}
time="2024-01-26T11:13:09Z" level=info msg="invalidating lease"
time="2024-01-26T11:13:09Z" level=info msg="getting lease"
2024-01-26T11:13:09.269544441Z INFO controllers.NodeHealthCheck handling healthy node {"NodeHealthCheck name": "nodehealthcheck-test", "node": "avi-ef-kub252"}
time="2024-01-26T11:13:09Z" level=info msg="invalidating lease"
time="2024-01-26T11:13:09Z" level=info msg="getting lease"
2024-01-26T11:13:09.274188625Z INFO controllers.NodeHealthCheck handling unhealthy node {"NodeHealthCheck name": "nodehealthcheck-test", "node": "<NODE_NAME>"}
2024-01-26T11:13:09.279581621Z INFO controllers.NodeHealthCheck.resource manager external remediation CR already exists {"CR name": "<NODE_NAME>", "kind": "SelfNodeRemediation", "namespace": "operators"}
time="2024-01-26T11:13:09Z" level=info msg="getting lease"
2024-01-26T11:13:09.279633914Z INFO controllers.NodeHealthCheck.nhc lease manager managing lease - about to try to acquire/extended the lease {"NodeHealthCheck name": "nodehealthcheck-test", "lease name": "node-<NODE_NAME>", "NHC is lease owner": true, "lease expiration time": "2024-01-26T11:43:09.033961Z"}
2024-01-26T11:13:09.283964833Z INFO controllers.NodeHealthCheck reconcile end {"NodeHealthCheck name": "nodehealthcheck-test", "error": null, "requeue": false, "requeuAfter": "10m0s"}
2024-01-26T11:14:04.213078217Z INFO adding NHC to reconcile queue for handling node {"node": "avi-ef-kub-free08", "NHC": "nodehealthcheck-test"}
2024-01-26T11:14:04.213162504Z INFO adding NHC to reconcile queue for handling node {"node": "avi-ef-kub-free08", "NHC": "nodehealthcheck-test"}
2024-01-26T11:14:04.213217359Z INFO controllers.NodeHealthCheck reconciling {"NodeHealthCheck name": "nodehealthcheck-test"}
2024-01-26T11:14:04.220755121Z INFO controllers.NodeHealthCheck found condition {"NodeHealthCheck name": "nodehealthcheck-test", "type": "Succeeded", "status": "Unknown", "reason": "", "message": "", "lastTransition": "2024-01-26T11:13:09Z"}
2024-01-26T11:14:04.220804024Z INFO controllers.NodeHealthCheck handling healthy node {"NodeHealthCheck name": "nodehealthcheck-test", "node": "avi-ef-kub-free08"}
time="2024-01-26T11:14:04Z" level=info msg="invalidating lease"
time="2024-01-26T11:14:04Z" level=info msg="getting lease"
2024-01-26T11:14:04.227017106Z INFO controllers.NodeHealthCheck handling healthy node {"NodeHealthCheck name": "nodehealthcheck-test", "node": "avi-ef-kub252"}
time="2024-01-26T11:14:04Z" level=info msg="invalidating lease"
time="2024-01-26T11:14:04Z" level=info msg="getting lease"
2024-01-26T11:14:04.23161739Z INFO controllers.NodeHealthCheck handling unhealthy node {"NodeHealthCheck name": "nodehealthcheck-test", "node": "<NODE_NAME>"}
2024-01-26T11:14:04.239268634Z INFO controllers.NodeHealthCheck.resource manager external remediation CR already exists {"CR name": "<NODE_NAME>", "kind": "SelfNodeRemediation", "namespace": "operators"}
time="2024-01-26T11:14:04Z" level=info msg="getting lease"
2024-01-26T11:14:04.239322303Z INFO controllers.NodeHealthCheck.nhc lease manager managing lease - about to try to acquire/extended the lease {"NodeHealthCheck name": "nodehealthcheck-test", "lease name": "node-<NODE_NAME>", "NHC is lease owner": true, "lease expiration time": "2024-01-26T11:43:09.033961Z"}
2024-01-26T11:14:04.251960714Z INFO controllers.NodeHealthCheck reconcile end {"NodeHealthCheck name": "nodehealthcheck-test", "error": null, "requeue": false, "requeuAfter": "10m0s"}
2024-01-26T11:14:19.350546438Z INFO adding NHC to reconcile queue for handling node {"node": "<NODE_NAME>", "NHC": "nodehealthcheck-test"}
2024-01-26T11:14:19.350612348Z INFO adding NHC to reconcile queue for handling node {"node": "<NODE_NAME>", "NHC": "nodehealthcheck-test"}
2024-01-26T11:14:19.350653984Z INFO controllers.NodeHealthCheck reconciling {"NodeHealthCheck name": "nodehealthcheck-test"}
2024-01-26T11:14:19.362415669Z INFO controllers.NodeHealthCheck found condition {"NodeHealthCheck name": "nodehealthcheck-test", "type": "Succeeded", "status": "Unknown", "reason": "", "message": "", "lastTransition": "2024-01-26T11:13:09Z"}
2024-01-26T11:14:19.362446651Z INFO controllers.NodeHealthCheck handling healthy node {"NodeHealthCheck name": "nodehealthcheck-test", "node": "avi-ef-kub-free08"}
time="2024-01-26T11:14:19Z" level=info msg="invalidating lease"
time="2024-01-26T11:14:19Z" level=info msg="getting lease"
2024-01-26T11:14:19.366880684Z INFO controllers.NodeHealthCheck handling healthy node {"NodeHealthCheck name": "nodehealthcheck-test", "node": "avi-ef-kub252"}
time="2024-01-26T11:14:19Z" level=info msg="invalidating lease"
time="2024-01-26T11:14:19Z" level=info msg="getting lease"
2024-01-26T11:14:19.369927682Z INFO controllers.NodeHealthCheck handling unhealthy node {"NodeHealthCheck name": "nodehealthcheck-test", "node": "<NODE_NAME>"}
2024-01-26T11:14:19.393391481Z INFO controllers.NodeHealthCheck.resource manager external remediation CR already exists {"CR name": "<NODE_NAME>", "kind": "SelfNodeRemediation", "namespace": "operators"}
time="2024-01-26T11:14:19Z" level=info msg="getting lease"
2024-01-26T11:14:19.39348805Z INFO controllers.NodeHealthCheck.nhc lease manager managing lease - about to try to acquire/extended the lease {"NodeHealthCheck name": "nodehealthcheck-test", "lease name": "node-<NODE_NAME>", "NHC is lease owner": true, "lease expiration time": "2024-01-26T11:43:09.033961Z"}
time="2024-01-26T11:14:19Z" level=info msg="request lease"
time="2024-01-26T11:14:19Z" level=info msg="getting lease"
time="2024-01-26T11:14:19Z" level=info msg="renew lease owned by NodeHealthCheck-nodehealthcheck-test setAcquireTime=false"
2024-01-26T11:14:19.402180571Z INFO controllers.NodeHealthCheck reconcile end {"NodeHealthCheck name": "nodehealthcheck-test", "error": null, "requeue": false, "requeuAfter": "10m0s"}
2024-01-26T11:14:49.209949738Z INFO adding NHC to reconcile queue for handling node {"node": "avi-ef-kub-free08", "NHC": "nodehealthcheck-test"}
2024-01-26T11:14:49.210046482Z INFO adding NHC to reconcile queue for handling node {"node": "avi-ef-kub-free08", "NHC": "nodehealthcheck-test"}
2024-01-26T11:14:49.210099482Z INFO controllers.NodeHealthCheck reconciling {"NodeHealthCheck name": "nodehealthcheck-test"}
2024-01-26T11:14:49.216247697Z INFO controllers.NodeHealthCheck found condition {"NodeHealthCheck name": "nodehealthcheck-test", "type": "Succeeded", "status": "Unknown", "reason": "", "message": "", "lastTransition": "2024-01-26T11:13:09Z"}
2024-01-26T11:14:49.216275218Z INFO controllers.NodeHealthCheck handling healthy node {"NodeHealthCheck name": "nodehealthcheck-test", "node": "avi-ef-kub-free08"}
time="2024-01-26T11:14:49Z" level=info msg="invalidating lease"
time="2024-01-26T11:14:49Z" level=info msg="getting lease"
2024-01-26T11:14:49.219744029Z INFO controllers.NodeHealthCheck handling healthy node {"NodeHealthCheck name": "nodehealthcheck-test", "node": "avi-ef-kub252"}
time="2024-01-26T11:14:49Z" level=info msg="invalidating lease"
time="2024-01-26T11:14:49Z" level=info msg="getting lease"
2024-01-26T11:14:49.222439386Z INFO controllers.NodeHealthCheck handling unhealthy node {"NodeHealthCheck name": "nodehealthcheck-test", "node": "<NODE_NAME>"}
2024-01-26T11:14:49.228711036Z INFO controllers.NodeHealthCheck.resource manager external remediation CR already exists {"CR name": "<NODE_NAME>", "kind": "SelfNodeRemediation", "namespace": "operators"}
time="2024-01-26T11:14:49Z" level=info msg="getting lease"
2024-01-26T11:14:49.228769926Z INFO controllers.NodeHealthCheck.nhc lease manager managing lease - about to try to acquire/extended the lease {"NodeHealthCheck name": "nodehealthcheck-test", "lease name": "node-<NODE_NAME>", "NHC is lease owner": true, "lease expiration time": "2024-01-26T11:43:09.033961Z"}
2024-01-26T11:14:49.2324699Z INFO controllers.NodeHealthCheck reconcile end {"NodeHealthCheck name": "nodehealthcheck-test", "error": null, "requeue": false, "requeuAfter": "10m0s"}
2024-01-26T11:15:34.52653851Z INFO adding NHC to reconcile queue for handling node {"node": "<NODE_NAME>", "NHC": "nodehealthcheck-test"}
2024-01-26T11:15:34.526633729Z INFO adding NHC to reconcile queue for handling node {"node": "<NODE_NAME>", "NHC": "nodehealthcheck-test"}
2024-01-26T11:15:34.526686396Z INFO controllers.NodeHealthCheck reconciling {"NodeHealthCheck name": "nodehealthcheck-test"}
2024-01-26T11:15:34.533045973Z INFO controllers.NodeHealthCheck found condition {"NodeHealthCheck name": "nodehealthcheck-test", "type": "Succeeded", "status": "Unknown", "reason": "", "message": "", "lastTransition": "2024-01-26T11:13:09Z"}
2024-01-26T11:15:34.533073809Z INFO controllers.NodeHealthCheck handling healthy node {"NodeHealthCheck name": "nodehealthcheck-test", "node": "avi-ef-kub-free08"}
time="2024-01-26T11:15:34Z" level=info msg="invalidating lease"
time="2024-01-26T11:15:34Z" level=info msg="getting lease"
2024-01-26T11:15:34.536677124Z INFO controllers.NodeHealthCheck handling healthy node {"NodeHealthCheck name": "nodehealthcheck-test", "node": "avi-ef-kub252"}
time="2024-01-26T11:15:34Z" level=info msg="invalidating lease"
time="2024-01-26T11:15:34Z" level=info msg="getting lease"
2024-01-26T11:15:34.539932967Z INFO controllers.NodeHealthCheck handling unhealthy node {"NodeHealthCheck name": "nodehealthcheck-test", "node": "<NODE_NAME>"}
2024-01-26T11:15:34.54436734Z INFO controllers.NodeHealthCheck.resource manager external remediation CR already exists {"CR name": "<NODE_NAME>", "kind": "SelfNodeRemediation", "namespace": "operators"}
time="2024-01-26T11:15:34Z" level=info msg="getting lease"
2024-01-26T11:15:34.544426727Z INFO controllers.NodeHealthCheck.nhc lease manager managing lease - about to try to acquire/extended the lease {"NodeHealthCheck name": "nodehealthcheck-test", "lease name": "node-<NODE_NAME>", "NHC is lease owner": true, "lease expiration time": "2024-01-26T11:43:09.033961Z"}
time="2024-01-26T11:15:34Z" level=info msg="request lease"
time="2024-01-26T11:15:34Z" level=info msg="getting lease"
time="2024-01-26T11:15:34Z" level=info msg="renew lease owned by NodeHealthCheck-nodehealthcheck-test setAcquireTime=false"
2024-01-26T11:15:34.552192062Z INFO controllers.NodeHealthCheck reconcile end {"NodeHealthCheck name": "nodehealthcheck-test", "error": null, "requeue": false, "requeuAfter": "10m0s"}
2024-01-26T11:15:38.407782224Z INFO adding NHC to reconcile queue for handling node {"node": "<NODE_NAME>", "NHC": "nodehealthcheck-test"}
2024-01-26T11:15:38.40784314Z INFO adding NHC to reconcile queue for handling node {"node": "<NODE_NAME>", "NHC": "nodehealthcheck-test"}
2024-01-26T11:15:38.40788494Z INFO controllers.NodeHealthCheck reconciling {"NodeHealthCheck name": "nodehealthcheck-test"}
2024-01-26T11:15:38.682276206Z INFO controllers.NodeHealthCheck found condition {"NodeHealthCheck name": "nodehealthcheck-test", "type": "Succeeded", "status": "Unknown", "reason": "", "message": "", "lastTransition": "2024-01-26T11:13:09Z"}
2024-01-26T11:15:38.682308267Z INFO controllers.NodeHealthCheck handling healthy node {"NodeHealthCheck name": "nodehealthcheck-test", "node": "<NODE_NAME>"}
2024-01-26T11:15:39.030682958Z INFO controllers.NodeHealthCheck mapper: found NHC for remediation CR {"NHC Name": "nodehealthcheck-test", "Remediation CR Name": "<NODE_NAME>", "Remediation CR Kind": "SelfNodeRemediation"}
2024-01-26T11:15:39.030724314Z INFO controllers.NodeHealthCheck mapper: found NHC for remediation CR {"NHC Name": "nodehealthcheck-test", "Remediation CR Name": "<NODE_NAME>", "Remediation CR Kind": "SelfNodeRemediation"}
2024-01-26T11:15:39.031068881Z INFO controllers.NodeHealthCheck deleted remediation CR {"NodeHealthCheck name": "nodehealthcheck-test", "name": "<NODE_NAME>"}
2024-01-26T11:15:39.031083765Z INFO controllers.NodeHealthCheck handling healthy node {"NodeHealthCheck name": "nodehealthcheck-test", "node": "avi-ef-kub-free08"}
time="2024-01-26T11:15:39Z" level=info msg="invalidating lease"
time="2024-01-26T11:15:39Z" level=info msg="getting lease"
2024-01-26T11:15:39.034670051Z INFO controllers.NodeHealthCheck handling healthy node {"NodeHealthCheck name": "nodehealthcheck-test", "node": "avi-ef-kub252"}
time="2024-01-26T11:15:39Z" level=info msg="invalidating lease"
time="2024-01-26T11:15:39Z" level=info msg="getting lease"
2024-01-26T11:15:39.038388059Z INFO controllers.NodeHealthCheck Patching NHC status {"NodeHealthCheck name": "nodehealthcheck-test", "new status": {"observedNodes":3,"healthyNodes":2,"unhealthyNodes":[{"name":"<NODE_NAME>","remediations":[{"resource":{"kind":"SelfNodeRemediation","namespace":"operators","name":"<NODE_NAME>","uid":"747547b8-a049-4671-85d3-87fa01554752","apiVersion":"self-node-remediation.medik8s.io/v1alpha1";},"started":"2024-01-26T11:13:09Z"}],"conditionsHealthyTimestamp":"2024-01-26T11:15:39Z"}],"inFlightRemediations":{"<NODE_NAME>":"2024-01-26T11:13:09Z"},"conditions":[{"type":"Disabled","status":"False","lastTransitionTime":"2023-12-29T12:44:00Z","reason":"NodeHealthCheckEnabled","message":"No issues found, NodeHealthCheck is enabled."}],"phase":"Remediating","reason":"NHC is remediating 1 nodes","lastUpdateTime":"2024-01-26T11:13:09Z"}, "patch": "{\"status\":{\"unhealthyNodes\":[{\"conditionsHealthyTimestamp\":\"2024-01-26T11:15:39Z\",\"name\":\"<NODE_NAME>\",\"remediations\":[{\"resource\":{\"apiVersion\":\"self-node-remediation.medik8s.io/v1alpha1\",\"kind\":\"SelfNodeRemediation\",\"name\":\"<NODE_NAME>\",\"namespace\":\"operators\",\"uid\":\"747547b8-a049-4671-85d3-87fa01554752\";},\"started\":\"2024-01-26T11:13:09Z\"}]}]}}"}
2024-01-26T11:15:39.246550983Z INFO controllers.NodeHealthCheck reconcile end {"NodeHealthCheck name": "nodehealthcheck-test", "error": null, "requeue": false, "requeuAfter": "11s"}
2024-01-26T11:15:39.246648142Z INFO controllers.NodeHealthCheck reconciling {"NodeHealthCheck name": "nodehealthcheck-test"}
2024-01-26T11:15:39.254185762Z INFO controllers.NodeHealthCheck handling healthy node {"NodeHealthCheck name": "nodehealthcheck-test", "node": "avi-ef-kub-free08"}
time="2024-01-26T11:15:39Z" level=info msg="invalidating lease"
time="2024-01-26T11:15:39Z" level=info msg="getting lease"
2024-01-26T11:15:39.258011487Z INFO controllers.NodeHealthCheck handling healthy node {"NodeHealthCheck name": "nodehealthcheck-test", "node": "avi-ef-kub252"}
time="2024-01-26T11:15:39Z" level=info msg="invalidating lease"
time="2024-01-26T11:15:39Z" level=info msg="getting lease"
2024-01-26T11:15:39.261048564Z INFO controllers.NodeHealthCheck handling healthy node {"NodeHealthCheck name": "nodehealthcheck-test", "node": "<NODE_NAME>"}
2024-01-26T11:15:39.273863098Z INFO controllers.NodeHealthCheck reconcile end {"NodeHealthCheck name": "nodehealthcheck-test", "error": null, "requeue": false, "requeuAfter": "11s"}
2024-01-26T11:15:50.247610124Z INFO controllers.NodeHealthCheck reconciling {"NodeHealthCheck name": "nodehealthcheck-test"}
2024-01-26T11:15:50.25531297Z INFO controllers.NodeHealthCheck handling healthy node {"NodeHealthCheck name": "nodehealthcheck-test", "node": "avi-ef-kub-free08"}
time="2024-01-26T11:15:50Z" level=info msg="invalidating lease"
time="2024-01-26T11:15:50Z" level=info msg="getting lease"
2024-01-26T11:15:50.258594703Z INFO controllers.NodeHealthCheck handling healthy node {"NodeHealthCheck name": "nodehealthcheck-test", "node": "avi-ef-kub252"}
time="2024-01-26T11:15:50Z" level=info msg="invalidating lease"
time="2024-01-26T11:15:50Z" level=info msg="getting lease"
2024-01-26T11:15:50.262283718Z INFO controllers.NodeHealthCheck handling healthy node {"NodeHealthCheck name": "nodehealthcheck-test", "node": "<NODE_NAME>"}
2024-01-26T11:15:50.269503292Z INFO controllers.NodeHealthCheck reconcile end {"NodeHealthCheck name": "nodehealthcheck-test", "error": null, "requeue": false, "requeuAfter": "11s"}
2024-01-26T11:15:55.0137636Z INFO controllers.NodeHealthCheck mapper: found NHC for remediation CR {"NHC Name": "nodehealthcheck-test", "Remediation CR Name": "<NODE_NAME>", "Remediation CR Kind": "SelfNodeRemediation"}
2024-01-26T11:15:55.01383323Z INFO controllers.NodeHealthCheck mapper: found NHC for remediation CR {"NHC Name": "nodehealthcheck-test", "Remediation CR Name": "<NODE_NAME>", "Remediation CR Kind": "SelfNodeRemediation"}
2024-01-26T11:15:55.013895834Z INFO controllers.NodeHealthCheck reconciling {"NodeHealthCheck name": "nodehealthcheck-test"}
2024-01-26T11:15:55.023346455Z INFO controllers.NodeHealthCheck handling healthy node {"NodeHealthCheck name": "nodehealthcheck-test", "node": "avi-ef-kub-free08"}
time="2024-01-26T11:15:55Z" level=info msg="invalidating lease"
time="2024-01-26T11:15:55Z" level=info msg="getting lease"
2024-01-26T11:15:55.026094099Z INFO controllers.NodeHealthCheck handling healthy node {"NodeHealthCheck name": "nodehealthcheck-test", "node": "avi-ef-kub252"}
time="2024-01-26T11:15:55Z" level=info msg="invalidating lease"
time="2024-01-26T11:15:55Z" level=info msg="getting lease"
2024-01-26T11:15:55.029404498Z INFO controllers.NodeHealthCheck handling healthy node {"NodeHealthCheck name": "nodehealthcheck-test", "node": "<NODE_NAME>"}
2024-01-26T11:15:55.036447082Z INFO controllers.NodeHealthCheck reconcile end {"NodeHealthCheck name": "nodehealthcheck-test", "error": null, "requeue": false, "requeuAfter": "11s"}
2024-01-26T11:15:55.374379158Z INFO controllers.NodeHealthCheck mapper: found NHC for remediation CR {"NHC Name": "nodehealthcheck-test", "Remediation CR Name": "<NODE_NAME>", "Remediation CR Kind": "SelfNodeRemediation"}
2024-01-26T11:15:55.374408788Z INFO controllers.NodeHealthCheck mapper: found NHC for remediation CR {"NHC Name": "nodehealthcheck-test", "Remediation CR Name": "<NODE_NAME>", "Remediation CR Kind": "SelfNodeRemediation"}
2024-01-26T11:15:55.374482406Z INFO controllers.NodeHealthCheck reconciling {"NodeHealthCheck name": "nodehealthcheck-test"}
2024-01-26T11:15:55.381511997Z INFO controllers.NodeHealthCheck handling healthy node {"NodeHealthCheck name": "nodehealthcheck-test", "node": "<NODE_NAME>"}
2024-01-26T11:15:55.388145619Z INFO controllers.NodeHealthCheck handling healthy node {"NodeHealthCheck name": "nodehealthcheck-test", "node": "avi-ef-kub-free08"}
time="2024-01-26T11:15:55Z" level=info msg="invalidating lease"
time="2024-01-26T11:15:55Z" level=info msg="getting lease"
2024-01-26T11:15:55.391376695Z INFO controllers.NodeHealthCheck handling healthy node {"NodeHealthCheck name": "nodehealthcheck-test", "node": "avi-ef-kub252"}
time="2024-01-26T11:15:55Z" level=info msg="invalidating lease"
time="2024-01-26T11:15:55Z" level=info msg="getting lease"
2024-01-26T11:15:55.394379894Z INFO controllers.NodeHealthCheck reconcile end {"NodeHealthCheck name": "nodehealthcheck-test", "error": null, "requeue": false, "requeuAfter": "11s"}
2024-01-26T11:15:56.044142841Z INFO controllers.NodeHealthCheck mapper: found NHC for remediation CR {"NHC Name": "nodehealthcheck-test", "Remediation CR Name": "<NODE_NAME>", "Remediation CR Kind": "SelfNodeRemediation"}
2024-01-26T11:15:56.044205683Z INFO controllers.NodeHealthCheck reconciling {"NodeHealthCheck name": "nodehealthcheck-test"}
2024-01-26T11:15:56.051234868Z INFO controllers.NodeHealthCheck handling healthy node {"NodeHealthCheck name": "nodehealthcheck-test", "node": "<NODE_NAME>"}
time="2024-01-26T11:15:56Z" level=info msg="invalidating lease"
time="2024-01-26T11:15:56Z" level=info msg="getting lease"
2024-01-26T11:15:56.083958628Z INFO controllers.NodeHealthCheck handling healthy node {"NodeHealthCheck name": "nodehealthcheck-test", "node": "avi-ef-kub-free08"}
2024-01-26T11:15:56.0840354Z DEBUG events Deleted remediation CR of kind SelfNodeRemediation for node <NODE_NAME> {"type": "Normal", "object": {"kind":"NodeHealthCheck","name":"nodehealthcheck-test","uid":"a05edcfd-fd55-44ef-89b0-d5a88bce7fa4","apiVersion":"remediation.medik8s.io/v1alpha1","resourceVersion":"209655320";}, "reason": "RemediationRemoved"}
time="2024-01-26T11:15:56Z" level=info msg="invalidating lease"
time="2024-01-26T11:15:56Z" level=info msg="getting lease"
2024-01-26T11:15:56.086224424Z INFO controllers.NodeHealthCheck handling healthy node {"NodeHealthCheck name": "nodehealthcheck-test", "node": "avi-ef-kub252"}
time="2024-01-26T11:15:56Z" level=info msg="invalidating lease"
time="2024-01-26T11:15:56Z" level=info msg="getting lease"
2024-01-26T11:15:56.088160044Z INFO controllers.NodeHealthCheck Patching NHC status {"NodeHealthCheck name": "nodehealthcheck-test", "new status": {"observedNodes":3,"healthyNodes":3,"conditions":[{"type":"Disabled","status":"False","lastTransitionTime":"2023-12-29T12:44:00Z","reason":"NodeHealthCheckEnabled","message":"No issues found, NodeHealthCheck is enabled."}],"phase":"Enabled","reason":"NHC is enabled, no ongoing remediation","lastUpdateTime":"2024-01-26T11:15:39Z"}, "patch": "{\"status\":{\"healthyNodes\":3,\"inFlightRemediations\":null,\"phase\":\"Enabled\",\"reason\":\"NHC is enabled, no ongoing remediation\",\"unhealthyNodes\":null}}"}
2024-01-26T11:15:56.293703234Z INFO controllers.NodeHealthCheck reconcile end {"NodeHealthCheck name": "nodehealthcheck-test", "error": null, "requeue": false, "requeuAfter": "0s"}
2024-01-26T11:16:01.270595978Z INFO controllers.NodeHealthCheck reconciling {"NodeHealthCheck name": "nodehealthcheck-test"}
2024-01-26T11:16:01.276129042Z INFO controllers.NodeHealthCheck handling healthy node {"NodeHealthCheck name": "nodehealthcheck-test", "node": "avi-ef-kub-free08"}
time="2024-01-26T11:16:01Z" level=info msg="invalidating lease"
time="2024-01-26T11:16:01Z" level=info msg="getting lease"
2024-01-26T11:16:01.278520075Z INFO controllers.NodeHealthCheck handling healthy node {"NodeHealthCheck name": "nodehealthcheck-test", "node": "avi-ef-kub252"}
time="2024-01-26T11:16:01Z" level=info msg="invalidating lease"
time="2024-01-26T11:16:01Z" level=info msg="getting lease"
2024-01-26T11:16:01.281560776Z INFO controllers.NodeHealthCheck handling healthy node {"NodeHealthCheck name": "nodehealthcheck-test", "node": "<NODE_NAME>"}
time="2024-01-26T11:16:01Z" level=info msg="invalidating lease"
time="2024-01-26T11:16:01Z" level=info msg="getting lease"
2024-01-26T11:16:01.28397972Z INFO controllers.NodeHealthCheck reconcile end {"NodeHealthCheck name": "nodehealthcheck-test", "error": null, "requeue": false, "requeuAfter": "0s"}
```

logs of 
```
2024-01-26T11:08:18.038117606Z DEBUG controller-runtime.webhook.webhooks received request {"webhook": "/validate-self-node-remediation-medik8s-io-v1alpha1-selfnoderemediation", "UID": "49e97f03-61d1-4724-a2a5-b4a4b82c30ca", "kind": "self-node-remediation.medik8s.io/v1alpha1, Kind=SelfNodeRemediation", "resource": {"group":"self-node-remediation.medik8s.io","version":"v1alpha1","resource":"selfnoderemediations"}}
2024-01-26T11:08:18.03824363Z INFO selfnoderemediation-resource validate create {"name": "<NODE_NAME>"}
2024-01-26T11:08:18.038348037Z DEBUG controller-runtime.webhook.webhooks wrote response {"webhook": "/validate-self-node-remediation-medik8s-io-v1alpha1-selfnoderemediation", "code": 200, "reason": "", "UID": "49e97f03-61d1-4724-a2a5-b4a4b82c30ca", "allowed": true}
2024-01-26T11:08:18.151609929Z DEBUG controller-runtime.webhook.webhooks received request {"webhook": "/validate-self-node-remediation-medik8s-io-v1alpha1-selfnoderemediation", "UID": "6306e772-63b8-4083-b867-f0f6bc106ade", "kind": "self-node-remediation.medik8s.io/v1alpha1, Kind=SelfNodeRemediation", "resource": {"group":"self-node-remediation.medik8s.io","version":"v1alpha1","resource":"selfnoderemediations"}}
2024-01-26T11:08:18.15171697Z INFO selfnoderemediation-resource validate update {"name": "<NODE_NAME>"}
2024-01-26T11:08:18.151736631Z DEBUG controller-runtime.webhook.webhooks wrote response {"webhook": "/validate-self-node-remediation-medik8s-io-v1alpha1-selfnoderemediation", "code": 200, "reason": "", "UID": "6306e772-63b8-4083-b867-f0f6bc106ade", "allowed": true}
2024-01-26T11:08:18.153843725Z DEBUG controller-runtime.webhook.webhooks received request {"webhook": "/validate-self-node-remediation-medik8s-io-v1alpha1-selfnoderemediation", "UID": "fd4db0c5-2d33-4c4e-89cf-6a7e5755e67f", "kind": "self-node-remediation.medik8s.io/v1alpha1, Kind=SelfNodeRemediation", "resource": {"group":"self-node-remediation.medik8s.io","version":"v1alpha1","resource":"selfnoderemediations"}}
2024-01-26T11:08:18.153877544Z DEBUG controller-runtime.webhook.webhooks received request {"webhook": "/validate-self-node-remediation-medik8s-io-v1alpha1-selfnoderemediation", "UID": "75ecd221-1416-495a-b457-8f4c1f8e4a05", "kind": "self-node-remediation.medik8s.io/v1alpha1, Kind=SelfNodeRemediation", "resource": {"group":"self-node-remediation.medik8s.io","version":"v1alpha1","resource":"selfnoderemediations"}}
2024-01-26T11:08:18.153938819Z INFO selfnoderemediation-resource validate update {"name": "<NODE_NAME>"}
2024-01-26T11:08:18.153956314Z DEBUG controller-runtime.webhook.webhooks wrote response {"webhook": "/validate-self-node-remediation-medik8s-io-v1alpha1-selfnoderemediation", "code": 200, "reason": "", "UID": "fd4db0c5-2d33-4c4e-89cf-6a7e5755e67f", "allowed": true}
2024-01-26T11:08:18.15397457Z INFO selfnoderemediation-resource validate update {"name": "<NODE_NAME>"}
2024-01-26T11:08:18.1539906Z DEBUG controller-runtime.webhook.webhooks wrote response {"webhook": "/validate-self-node-remediation-medik8s-io-v1alpha1-selfnoderemediation", "code": 200, "reason": "", "UID": "75ecd221-1416-495a-b457-8f4c1f8e4a05", "allowed": true}
2024-01-26T11:09:24.238920772Z INFO controllers.SelfNodeRemediationConfig Syncing certs
2024-01-26T11:09:24.238959486Z INFO controllers.SelfNodeRemediationConfig Cert secret already exists
2024-01-26T11:09:24.238964046Z INFO controllers.SelfNodeRemediationConfig.syncConfigDaemonset Start to sync config daemonset
2024-01-26T11:09:24.239624613Z INFO controllers.SelfNodeRemediationConfig Updating DS tolerations
2024/01/26 11:09:24 reconciling (apps/v1, Kind=DaemonSet) operators/self-node-remediation-ds
2024/01/26 11:09:24 update was successful
2024-01-26T11:09:24.259841422Z INFO controllers.SelfNodeRemediationConfig Syncing certs
2024-01-26T11:09:24.259869381Z INFO controllers.SelfNodeRemediationConfig Cert secret already exists
2024-01-26T11:09:24.259874184Z INFO controllers.SelfNodeRemediationConfig.syncConfigDaemonset Start to sync config daemonset
2024-01-26T11:09:24.260422687Z INFO controllers.SelfNodeRemediationConfig Updating DS tolerations
2024/01/26 11:09:24 reconciling (apps/v1, Kind=DaemonSet) operators/self-node-remediation-ds
2024/01/26 11:09:24 update was successful
2024-01-26T11:10:55.804287335Z INFO controllers.SelfNodeRemediationConfig Syncing certs
2024-01-26T11:10:55.804337841Z INFO controllers.SelfNodeRemediationConfig Cert secret already exists
2024-01-26T11:10:55.804342912Z INFO controllers.SelfNodeRemediationConfig.syncConfigDaemonset Start to sync config daemonset
2024-01-26T11:10:55.805037628Z INFO controllers.SelfNodeRemediationConfig Updating DS tolerations
2024/01/26 11:10:55 reconciling (apps/v1, Kind=DaemonSet) operators/self-node-remediation-ds
2024/01/26 11:10:55 update was successful
2024-01-26T11:10:55.825904206Z INFO controllers.SelfNodeRemediationConfig Syncing certs
2024-01-26T11:10:55.825932056Z INFO controllers.SelfNodeRemediationConfig Cert secret already exists
2024-01-26T11:10:55.825937058Z INFO controllers.SelfNodeRemediationConfig.syncConfigDaemonset Start to sync config daemonset
2024-01-26T11:10:55.826510376Z INFO controllers.SelfNodeRemediationConfig Updating DS tolerations
2024/01/26 11:10:55 reconciling (apps/v1, Kind=DaemonSet) operators/self-node-remediation-ds
2024/01/26 11:10:55 update was successful
2024-01-26T11:11:02.538311589Z INFO controllers.SelfNodeRemediationConfig Syncing certs
2024-01-26T11:11:02.538357787Z INFO controllers.SelfNodeRemediationConfig Cert secret already exists
2024-01-26T11:11:02.53836251Z INFO controllers.SelfNodeRemediationConfig.syncConfigDaemonset Start to sync config daemonset
2024-01-26T11:11:02.539027722Z INFO controllers.SelfNodeRemediationConfig Updating DS tolerations
2024/01/26 11:11:02 reconciling (apps/v1, Kind=DaemonSet) operators/self-node-remediation-ds
2024/01/26 11:11:02 update was successful
2024-01-26T11:11:02.560146495Z INFO controllers.SelfNodeRemediationConfig Syncing certs
2024-01-26T11:11:02.560182399Z INFO controllers.SelfNodeRemediationConfig Cert secret already exists
2024-01-26T11:11:02.560187257Z INFO controllers.SelfNodeRemediationConfig.syncConfigDaemonset Start to sync config daemonset
2024-01-26T11:11:02.560845982Z INFO controllers.SelfNodeRemediationConfig Updating DS tolerations
2024/01/26 11:11:02 reconciling (apps/v1, Kind=DaemonSet) operators/self-node-remediation-ds
2024/01/26 11:11:02 update was successful
2024-01-26T11:11:05.258007227Z INFO controllers.SelfNodeRemediationConfig Syncing certs
2024-01-26T11:11:05.25805296Z INFO controllers.SelfNodeRemediationConfig Cert secret already exists
2024-01-26T11:11:05.258058552Z INFO controllers.SelfNodeRemediationConfig.syncConfigDaemonset Start to sync config daemonset
2024-01-26T11:11:05.258763919Z INFO controllers.SelfNodeRemediationConfig Updating DS tolerations
2024/01/26 11:11:05 reconciling (apps/v1, Kind=DaemonSet) operators/self-node-remediation-ds
2024/01/26 11:11:05 update was successful
2024-01-26T11:11:05.279697801Z INFO controllers.SelfNodeRemediationConfig Syncing certs
2024-01-26T11:11:05.27972412Z INFO controllers.SelfNodeRemediationConfig Cert secret already exists
2024-01-26T11:11:05.279728814Z INFO controllers.SelfNodeRemediationConfig.syncConfigDaemonset Start to sync config daemonset
2024-01-26T11:11:05.280295082Z INFO controllers.SelfNodeRemediationConfig Updating DS tolerations
2024/01/26 11:11:05 reconciling (apps/v1, Kind=DaemonSet) operators/self-node-remediation-ds
2024/01/26 11:11:05 update was successful
2024-01-26T11:11:06.037225446Z DEBUG controller-runtime.webhook.webhooks received request {"webhook": "/validate-self-node-remediation-medik8s-io-v1alpha1-selfnoderemediation", "UID": "f8944e1d-1753-4961-928d-894c35040cb0", "kind": "self-node-remediation.medik8s.io/v1alpha1, Kind=SelfNodeRemediation", "resource": {"group":"self-node-remediation.medik8s.io","version":"v1alpha1","resource":"selfnoderemediations"}}
2024-01-26T11:11:06.03742526Z INFO selfnoderemediation-resource validate update {"name": "<NODE_NAME>"}
2024-01-26T11:11:06.037462245Z DEBUG controller-runtime.webhook.webhooks wrote response {"webhook": "/validate-self-node-remediation-medik8s-io-v1alpha1-selfnoderemediation", "code": 200, "reason": "", "UID": "f8944e1d-1753-4961-928d-894c35040cb0", "allowed": true}
2024-01-26T11:11:16.54186446Z INFO controllers.SelfNodeRemediationConfig Syncing certs
2024-01-26T11:11:16.541901357Z INFO controllers.SelfNodeRemediationConfig Cert secret already exists
2024-01-26T11:11:16.541905999Z INFO controllers.SelfNodeRemediationConfig.syncConfigDaemonset Start to sync config daemonset
2024-01-26T11:11:16.542584527Z INFO controllers.SelfNodeRemediationConfig Updating DS tolerations
2024/01/26 11:11:16 reconciling (apps/v1, Kind=DaemonSet) operators/self-node-remediation-ds
2024/01/26 11:11:16 update was successful
2024-01-26T11:11:16.56318888Z INFO controllers.SelfNodeRemediationConfig Syncing certs
2024-01-26T11:11:16.563209099Z INFO controllers.SelfNodeRemediationConfig Cert secret already exists
2024-01-26T11:11:16.563213972Z INFO controllers.SelfNodeRemediationConfig.syncConfigDaemonset Start to sync config daemonset
2024-01-26T11:11:16.563785998Z INFO controllers.SelfNodeRemediationConfig Updating DS tolerations
2024/01/26 11:11:16 reconciling (apps/v1, Kind=DaemonSet) operators/self-node-remediation-ds
2024/01/26 11:11:16 update was successful
2024-01-26T11:13:09.042763839Z DEBUG controller-runtime.webhook.webhooks received request {"webhook": "/validate-self-node-remediation-medik8s-io-v1alpha1-selfnoderemediation", "UID": "0fe24e03-ddce-4d6e-b3b3-6e8ff8ba8836", "kind": "self-node-remediation.medik8s.io/v1alpha1, Kind=SelfNodeRemediation", "resource": {"group":"self-node-remediation.medik8s.io","version":"v1alpha1","resource":"selfnoderemediations"}}
2024-01-26T11:13:09.042879715Z INFO selfnoderemediation-resource validate create {"name": "<NODE_NAME>"}
2024-01-26T11:13:09.042904272Z DEBUG controller-runtime.webhook.webhooks wrote response {"webhook": "/validate-self-node-remediation-medik8s-io-v1alpha1-selfnoderemediation", "code": 200, "reason": "", "UID": "0fe24e03-ddce-4d6e-b3b3-6e8ff8ba8836", "allowed": true}
2024-01-26T11:13:09.05256542Z DEBUG controller-runtime.webhook.webhooks received request {"webhook": "/validate-self-node-remediation-medik8s-io-v1alpha1-selfnoderemediation", "UID": "7094ed4a-1fc8-4413-a12f-35b4daf829b1", "kind": "self-node-remediation.medik8s.io/v1alpha1, Kind=SelfNodeRemediation", "resource": {"group":"self-node-remediation.medik8s.io","version":"v1alpha1","resource":"selfnoderemediations"}}
2024-01-26T11:13:09.052650445Z INFO selfnoderemediation-resource validate update {"name": "<NODE_NAME>"}
2024-01-26T11:13:09.052664489Z DEBUG controller-runtime.webhook.webhooks wrote response {"webhook": "/validate-self-node-remediation-medik8s-io-v1alpha1-selfnoderemediation", "code": 200, "reason": "", "UID": "7094ed4a-1fc8-4413-a12f-35b4daf829b1", "allowed": true}
2024-01-26T11:13:09.054270698Z DEBUG controller-runtime.webhook.webhooks received request {"webhook": "/validate-self-node-remediation-medik8s-io-v1alpha1-selfnoderemediation", "UID": "31fc59cb-c948-4450-a678-ddad7bbe5ad4", "kind": "self-node-remediation.medik8s.io/v1alpha1, Kind=SelfNodeRemediation", "resource": {"group":"self-node-remediation.medik8s.io","version":"v1alpha1","resource":"selfnoderemediations"}}
2024-01-26T11:13:09.054347424Z INFO selfnoderemediation-resource validate update {"name": "<NODE_NAME>"}
2024-01-26T11:13:09.054359734Z DEBUG controller-runtime.webhook.webhooks wrote response {"webhook": "/validate-self-node-remediation-medik8s-io-v1alpha1-selfnoderemediation", "code": 200, "reason": "", "UID": "31fc59cb-c948-4450-a678-ddad7bbe5ad4", "allowed": true}
2024-01-26T11:14:19.379676522Z INFO controllers.SelfNodeRemediationConfig Syncing certs
2024-01-26T11:14:19.379723789Z INFO controllers.SelfNodeRemediationConfig Cert secret already exists
2024-01-26T11:14:19.379728545Z INFO controllers.SelfNodeRemediationConfig.syncConfigDaemonset Start to sync config daemonset
2024-01-26T11:14:19.380451291Z INFO controllers.SelfNodeRemediationConfig Updating DS tolerations
2024/01/26 11:14:19 reconciling (apps/v1, Kind=DaemonSet) operators/self-node-remediation-ds
2024/01/26 11:14:19 update was successful
2024-01-26T11:14:19.401643586Z INFO controllers.SelfNodeRemediationConfig Syncing certs
2024-01-26T11:14:19.401676718Z INFO controllers.SelfNodeRemediationConfig Cert secret already exists
2024-01-26T11:14:19.401681869Z INFO controllers.SelfNodeRemediationConfig.syncConfigDaemonset Start to sync config daemonset
2024-01-26T11:14:19.402638937Z INFO controllers.SelfNodeRemediationConfig Updating DS tolerations
2024/01/26 11:14:19 reconciling (apps/v1, Kind=DaemonSet) operators/self-node-remediation-ds
2024/01/26 11:14:19 update was successful
2024-01-26T11:15:34.553825691Z INFO controllers.SelfNodeRemediationConfig Syncing certs
2024-01-26T11:15:34.553865936Z INFO controllers.SelfNodeRemediationConfig Cert secret already exists
2024-01-26T11:15:34.553870293Z INFO controllers.SelfNodeRemediationConfig.syncConfigDaemonset Start to sync config daemonset
2024-01-26T11:15:34.554573392Z INFO controllers.SelfNodeRemediationConfig Updating DS tolerations
2024/01/26 11:15:34 reconciling (apps/v1, Kind=DaemonSet) operators/self-node-remediation-ds
2024/01/26 11:15:34 update was successful
2024-01-26T11:15:34.575971237Z INFO controllers.SelfNodeRemediationConfig Syncing certs
2024-01-26T11:15:34.57599388Z INFO controllers.SelfNodeRemediationConfig Cert secret already exists
2024-01-26T11:15:34.575999272Z INFO controllers.SelfNodeRemediationConfig.syncConfigDaemonset Start to sync config daemonset
2024-01-26T11:15:34.577700834Z INFO controllers.SelfNodeRemediationConfig Updating DS tolerations
2024/01/26 11:15:34 reconciling (apps/v1, Kind=DaemonSet) operators/self-node-remediation-ds
2024/01/26 11:15:34 update was successful
2024-01-26T11:15:39.092284092Z INFO controllers.SelfNodeRemediationConfig Syncing certs
2024-01-26T11:15:39.09232807Z INFO controllers.SelfNodeRemediationConfig Cert secret already exists
2024-01-26T11:15:39.092332705Z INFO controllers.SelfNodeRemediationConfig.syncConfigDaemonset Start to sync config daemonset
2024-01-26T11:15:39.09302018Z INFO controllers.SelfNodeRemediationConfig Updating DS tolerations
2024/01/26 11:15:39 reconciling (apps/v1, Kind=DaemonSet) operators/self-node-remediation-ds
2024/01/26 11:15:39 update was successful
2024-01-26T11:15:39.11784222Z INFO controllers.SelfNodeRemediationConfig Syncing certs
2024-01-26T11:15:39.117868589Z INFO controllers.SelfNodeRemediationConfig Cert secret already exists
2024-01-26T11:15:39.117873417Z INFO controllers.SelfNodeRemediationConfig.syncConfigDaemonset Start to sync config daemonset
2024-01-26T11:15:39.118480403Z INFO controllers.SelfNodeRemediationConfig Updating DS tolerations
2024/01/26 11:15:39 reconciling (apps/v1, Kind=DaemonSet) operators/self-node-remediation-ds
2024/01/26 11:15:39 update was successful
2024-01-26T11:15:41.090487194Z INFO controllers.SelfNodeRemediationConfig Syncing certs
2024-01-26T11:15:41.090534318Z INFO controllers.SelfNodeRemediationConfig Cert secret already exists
2024-01-26T11:15:41.090538735Z INFO controllers.SelfNodeRemediationConfig.syncConfigDaemonset Start to sync config daemonset
2024-01-26T11:15:41.091199894Z INFO controllers.SelfNodeRemediationConfig Updating DS tolerations
2024/01/26 11:15:41 reconciling (apps/v1, Kind=DaemonSet) operators/self-node-remediation-ds
2024/01/26 11:15:41 update was successful
2024-01-26T11:15:41.113900536Z INFO controllers.SelfNodeRemediationConfig Syncing certs
2024-01-26T11:15:41.113927681Z INFO controllers.SelfNodeRemediationConfig Cert secret already exists
2024-01-26T11:15:41.113940969Z INFO controllers.SelfNodeRemediationConfig.syncConfigDaemonset Start to sync config daemonset
2024-01-26T11:15:41.114500546Z INFO controllers.SelfNodeRemediationConfig Updating DS tolerations
2024/01/26 11:15:41 reconciling (apps/v1, Kind=DaemonSet) operators/self-node-remediation-ds
2024/01/26 11:15:41 update was successful
2024-01-26T11:15:56.035984933Z DEBUG controller-runtime.webhook.webhooks received request {"webhook": "/validate-self-node-remediation-medik8s-io-v1alpha1-selfnoderemediation", "UID": "fff2986c-4e5a-42f7-b972-8886ea1c0408", "kind": "self-node-remediation.medik8s.io/v1alpha1, Kind=SelfNodeRemediation", "resource": {"group":"self-node-remediation.medik8s.io","version":"v1alpha1","resource":"selfnoderemediations"}}
2024-01-26T11:15:56.036135666Z INFO selfnoderemediation-resource validate update {"name": "<NODE_NAME>"}
2024-01-26T11:15:56.03615916Z DEBUG controller-runtime.webhook.webhooks wrote response {"webhook": "/validate-self-node-remediation-medik8s-io-v1alpha1-selfnoderemediation", "code": 200, "reason": "", "UID": "fff2986c-4e5a-42f7-b972-8886ea1c0408", "allowed": true}
2024-01-26T11:16:05.857161511Z INFO controllers.SelfNodeRemediationConfig Syncing certs
2024-01-26T11:16:05.857207163Z INFO controllers.SelfNodeRemediationConfig Cert secret already exists
2024-01-26T11:16:05.85721197Z INFO controllers.SelfNodeRemediationConfig.syncConfigDaemonset Start to sync config daemonset
2024-01-26T11:16:05.857902821Z INFO controllers.SelfNodeRemediationConfig Updating DS tolerations
You received this message because you are subscribed to a topic in the Google Groups "medik8s" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/medik8s/ocRZ3lb5TEE/unsubscribe.
To unsubscribe from this group and all its topics, send an email to medik8s+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/medik8s/CALOztymJiMBdjLg_UFhPxVnsc9dQdDUy%2BM%2BgUysh_XNpGXN4ag%40mail.gmail.com.

Marc Sluiter

unread,
Jan 26, 2024, 11:51:18 AM1/26/24
to Stanislav, Michael Shitrit, med...@googlegroups.com
Hi, 

not sure if I fully understand this: 

> • After rebooting, NPD with a broken RegistryIsNotAvailable check starts on the node
> • Despite this, taints are removed from it, and selfnoderemediations.self-node-remediation.medik8s.io is created again
> • While the node is rebooting, I tag v1 on the docker image with the normal NPD build
> • Taints are removed from the node and, since this time it doesn't have the RegistryIsNotAvailable condition, it continues to work

> Maybe I'm doing the test wrongly, but it currently looks like SNR ignores the condition after rebooting and removes taints in any case, which leads to a reboot cycle.

What happens when the node reboots, what is the status of the RegistryIsNotAvailable condition?
Having a quick look at the NHC logs, I assume there is a period of time when no unhealthy condition is matching (so Ready = True, and RegistryIsNotAvailable = False or Unknown, or isn't set). Then NHC deletes the SNR CR, and that triggers taint removal. 
Only after that RegistryIsNotAvailable is True again for 2 minutes, and a new remediation starts.
Does that make sense?

Just for clarification: SNR isn't watching conditions itself, it reacts to CR creation / deletion (and API server connectivity, but that's not relevant in this case)

BR,

Marc Sluiter

He / Him / His

Principal Software Engineer

Red Hat

mslu...@redhat.com


Red Hat GmbH, Registered seat: Werner von Siemens Ring 12, D-85630 Grasbrunn, Germany  
Commercial register: Amtsgericht Muenchen/Munich, HRB 153243,
Managing Directors: Ryan Barnhart, Charles Cachera, Michael O'Neill, Amy Ross

   



Stanislav

unread,
Jan 26, 2024, 12:21:25 PM1/26/24
to Marc Sluiter, med...@googlegroups.com
Hi, Marc!

Yes, I also assumed that at some point condition RegistryIsNotAvailable = True or Unknown, but I couldn't find any signs of this. It is possible that NodeProblemDetector changes the condition status at startup after a reboot. However, usually all changes in conditions are reflected in the node's events list, and it's not there. 

Besides, I'm interested in the logic of the operators working after a node reboot. What is required for a node to be marked as healthy after a reboot? Is a few seconds of condition = False sufficient for the taint to be removed?

Perhaps, I'll try to set the Unknown status for NHC as a sign of an unhealthy node. Let's see how it behaves in this case.

Thanks!
Stanislav

Marc Sluiter

unread,
Jan 27, 2024, 6:24:00 AM1/27/24
to Stanislav, med...@googlegroups.com
On Fri, Jan 26, 2024 at 6:21 PM Stanislav <lucid...@gmail.com> wrote:
Hi, Marc!

Yes, I also assumed that at some point condition RegistryIsNotAvailable = True or Unknown, but I couldn't find any signs of this. It is possible that NodeProblemDetector changes the condition status at startup after a reboot. However, usually all changes in conditions are reflected in the node's events list, and it's not there. 

Fair point, but there are multiple "RegistryIsNotAvailable is now: True" events, so I guess the condition wasn't True or didn't exist in between? The LastTransitionTime on the condition might be useful as well.
 
Besides, I'm interested in the logic of the operators working after a node reboot. What is required for a node to be marked as healthy after a reboot? Is a few seconds of condition = False sufficient for the taint to be removed?

- every change of node conditions triggers NHC. The reason for the change (e.g. kube's controller manager reacts to node availability during / after reboots and updates the Ready condition, or NPD is updating its conditions when it stopped or started?) doesn't matter.
- when no node condition matches any of the configured unhealthy conditions for the configured duration, the node is considered healthy. For the duration, the condition's LastTransitionTime is relevant. (*)
- for healthy nodes, NHC deletes SNR CR
- CR deletion triggers SNR, which removes taints.
 
(*) the upcoming release will have a change here, which *might* be relevant (https://github.com/medik8s/node-healthcheck-operator/pull/274):
A node will NOT be considered healthy, when it's unhealthy already, and a node condition matches any of the configured unhealthy conditions NO MATTER if the configured duration expired already.

A possible scenario *might* be:
- RegistryIsNotAvailable is True -> node becomes unhealthy after 2 minutes
- node is rebooted, Ready condition is set to False
- RegistryIsNotAvailable is either removed, or set to Unknown or False for whatever reason -> node still unhealthy because of Ready condition
- NPD is started and sets RegistryIsNotAvailable to True -> 2 minute "countdown" starts, node still unhealthy because of Ready condition
- boot completed, Ready condition = True
     -> current NHC version: node is healthy, because 2 minutes didn't expire yet; when they do, node is unhealthy again
     -> next NHC version: because the node was unhealthy, and there is a "running countdown" already, node is NOT considered healthy

In case you want to give the new version a try, I just pushed "v0.7.0-rc2" tags of operator and bundle images to Quay.


Thanks!
Stanislav


[...]

Stanislav

unread,
Jan 27, 2024, 1:18:13 PM1/27/24
to Marc Sluiter, med...@googlegroups.com
Hello, Marc!

I dug deeper into how NPD assigns statuses and found out that upon initialization it initializes all conditions to False. According to the documentation, this functionality can be disabled by adding skip_initial_status=true to the configuration of a custom check.

BTW, I ran some tests and found that this feature doesn't work as expected: conditions with skip_initial_status are not updated at all. The reason might be a bug made by the developer who added this feature. In the code, you can see that he completely disabled the initialization of conditions for checks with skip_initial_status. This probably leads to the check not being able to update later. However, this should not affect the removal of the NHC taint. I verified that starting NPD with skip_initial_status=true actually does not initialize RegistryIsNotAvailable=False.

I conducted an experiment by adding to the NodeHealthCheck:

duration: 30s
status: Unknown
type: RegistryIsNotAvailable

I don’t yet understand why, but after that, the taint from the node is no longer removed by NHC after reboot. I will continue to investigate this.

I also plan to test v0.7.0-rc2. Thank you for this opportunity! I will report back on Monday or Tuesday.

Have a great weekend!
Stanislav

Stanislav

unread,
Jan 29, 2024, 8:05:46 AM1/29/24
to Marc Sluiter, med...@googlegroups.com
Hi Marc!

As promised, I'm back with the results. I've tested the v0.7.0-rc2 version under the same conditions. That is, NPD sets RegistryIsNotAvailable=True, and the node reboots within 2 minutes. After the reboot, NPD (if I'm correct) resets RegistryIsNotAvailable=False and then sets it to True again after running the corresponding check. The behavior of NHC v0.7.0-rc2 is the same - it removes the taint shortly after the node comes back from reboot, and reassigns it after 2 minutes.

For now, I will focus on fixing the bug in Node Problem Detector so it's possible to disable condition reset at startup and update them later.

Best regards,
Stanislav.

Marc Sluiter

unread,
Jan 29, 2024, 9:50:42 AM1/29/24
to Stanislav, med...@googlegroups.com
Thanks for the update!
Yes, when RegistryIsNotAvailable=False (and Ready=True already I assume), the rc2 version doesn't change anything.
The NPD 
skip_initial_status sounds promising though!

If that also doesn't help, we need to evaluate options for NHC. There is a long standing idea for introducing something like a "nodeBootTime" option, during which conditions aren't evaluated. So far we didn't really need it though.

Cheers,

Marc

Stanislav

unread,
Feb 13, 2024, 9:25:42 AM2/13/24
to Marc Sluiter, med...@googlegroups.com
Hello, Marc!

The reboot cycle problem has been resolved with the combination of v0.7.0-rc2 and a patched NPD. Now, if the problem persists after reboot, the node remains cordoned.

I made a PR with a bug fix in NPD, it is waiting for testing by maintainers.

Thanks for your help!

Stanislav.

Marc Sluiter

unread,
Feb 14, 2024, 5:16:25 AM2/14/24
to Stanislav, med...@googlegroups.com
Awesome, thanks for the update! :)
We plan to release NHC v0.7.0 within the next 2 weeks.

BR,

Marc


Reply all
Reply to author
Forward
0 new messages