Sure,

On Tue, Jun 4, 2024 at 1:54 PM Davanum Srinivas wrote:
Antonio,

Can you please link a few of them? I am concerned as well.

thanks,
Dims
On Fri, Jun 14, 2024 at 1:55 PM Filip Krepinsky <fkre...@redhat.com> wrote:
>
> The point of the NodeMaintenance and Evacuation APIs is not to solve the VM/Infra problem, but to solve pod eviction and node drain properly.
My reading is that this adds a lot of day 2 operations workflows. I may be wrong of course; if that is the case, I apologize in advance for my confusion.
>
> We support kubectl drain today, but it has many limitations. Kubectl drain can be used manually or imported as a library, which it is by many projects (e.g. node-maintenance-operator, kured, machine-config-operator). Some projects (e.g. cluster autoscaler, karpenter) just take inspiration from it and modify the logic to suit their needs. Many others use it in their scripts with varying degrees of success.
>
> None of these solutions are perfect. Not all workloads are easy to drain (PDB and eviction problems). Kubectl drain and each of these projects have quite complicated configurations to get the draining right. And it often requires custom solutions.
>
> Draining is also done in various unpredictable ways. For example, an admin fires a kubectl drain and observes that it gets blocked; they terminate kubectl, debug and terminate the application (which can take some time), and then resume with kubectl drain again. There is no way for a 3rd-party component to detect the progress of any of these drain solutions, and thus it is hard to build any higher-level logic on top of them.
>
> If we build the NodeMaintenance as a CRD, it becomes just another drain solution that cluster components (both applications and infra components) cannot depend on. We do not want to solve the whole node lifecycle, just to do the node drain properly. All of today's solutions could then just create a NodeMaintenance object instead of doing a bunch of checks and calling kubectl. The same goes for people scripting the node shutdown. The big advantage is that it provides good observability of the drain and all the intentions of the cluster admin and other components.
>
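For illustration, here is a minimal Go sketch of what the paragraph above describes: a drain tool declaring its intent by creating a NodeMaintenance object instead of driving evictions itself. The group/version, resource name, and spec fields below are assumptions for the sketch, not the KEP's actual API.

// Hypothetical sketch: instead of shelling out to `kubectl drain`, a tool
// declares its intent by creating a NodeMaintenance object and lets a
// dedicated controller perform (and report on) the drain.
// The GVR and spec fields are assumptions, not the actual KEP API.
package main

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/rest"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client, err := dynamic.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// Assumed group/version/resource for the sketch.
	gvr := schema.GroupVersionResource{Group: "node.k8s.io", Version: "v1alpha1", Resource: "nodemaintenances"}

	nm := &unstructured.Unstructured{Object: map[string]interface{}{
		"apiVersion": "node.k8s.io/v1alpha1",
		"kind":       "NodeMaintenance",
		"metadata":   map[string]interface{}{"name": "node-1-kernel-upgrade"},
		"spec": map[string]interface{}{
			// Hypothetical fields: which node is being drained, and why.
			"nodeName": "node-1",
			"reason":   "kernel upgrade",
		},
	}}

	if _, err := client.Resource(gvr).Create(context.TODO(), nm, metav1.CreateOptions{}); err != nil {
		panic(err)
	}
	// Other components (and the admin) can now observe the drain's progress
	// from the NodeMaintenance status instead of guessing from evictions.
}

The point is the observability Filip mentions: every actor that wants the node drained records that intent in one place, and everything else can watch it.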
Why don't we solve the whole node lifecycle first?
This was also raised during the review: building on top of things we know are not in an ideal state is piling up technical debt that we'll need to pay later.
We should invest in solving the problems at their origin ...
On Jun 17, 2024, at 6:57 PM, 'Tim Hockin' via kubernetes-sig-architecture <kubernetes-si...@googlegroups.com> wrote:
Of all the APIs we have, Node has some of the weirdest semantics.
Thanks Vallery and all for the energetic discussion!
Looks like the logical next step would be to start a WG under sig-node (as primary SIG?) ... who wants to organize and set it up? :)
On Jun 19, 2024, at 11:59 AM, 'Tim Hockin' via kubernetes-sig-architecture <kubernetes-si...@googlegroups.com> wrote:
A related issue: https://github.com/kubernetes/autoscaler/issues/5201
involved cloud providers having to implement custom logic for clean termination of spot nodes (currently cloud specific) - we need a control plane controller that can eagerly delete pods on soon-to-be-terminated nodes (because the node may not finish that operation in time).

We have to delete the pods because the endpoints controller doesn't have a way today to default to eagerly removing endpoints on nodes performing graceful shutdown without a delete (because we forgot to spec a signal for that when we designed graceful node shutdown).

Also, we realized that the node controller was supposed to mark pods on unready nodes also unready, but a bug has prevented that from working for several years in some cases.
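For illustration, a minimal client-go sketch of the kind of control plane controller described above, which eagerly deletes pods from nodes that are about to be terminated. The taint key used here to detect imminent termination is hypothetical; real spot-termination signals are cloud specific.

// A minimal sketch, not a production controller: list nodes, find ones
// carrying a (hypothetical) "termination imminent" taint, and eagerly
// delete their pods so endpoints are removed and workloads reschedule
// before the instance disappears.
package main

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func hasTerminationTaint(node *corev1.Node) bool {
	for _, t := range node.Spec.Taints {
		if t.Key == "node.kubernetes.io/termination-imminent" { // hypothetical key
			return true
		}
	}
	return false
}

func drainTerminatingNodes(ctx context.Context, cs kubernetes.Interface) error {
	nodes, err := cs.CoreV1().Nodes().List(ctx, metav1.ListOptions{})
	if err != nil {
		return err
	}
	for i := range nodes.Items {
		node := &nodes.Items[i]
		if !hasTerminationTaint(node) {
			continue
		}
		pods, err := cs.CoreV1().Pods("").List(ctx, metav1.ListOptions{
			FieldSelector: "spec.nodeName=" + node.Name,
		})
		if err != nil {
			return err
		}
		for _, pod := range pods.Items {
			grace := int64(30)
			if err := cs.CoreV1().Pods(pod.Namespace).Delete(ctx, pod.Name,
				metav1.DeleteOptions{GracePeriodSeconds: &grace}); err != nil {
				return err
			}
			fmt.Printf("deleted %s/%s from terminating node %s\n", pod.Namespace, pod.Name, node.Name)
		}
	}
	return nil
}

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	if err := drainTerminatingNodes(context.Background(), kubernetes.NewForConfigOrDie(cfg)); err != nil {
		panic(err)
	}
}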
On Wed, Jun 19, 2024, 5:23 PM Clayton <smarter...@gmail.com> wrote:
> Also, we realized that the node controller was supposed to mark pods on unready nodes also unready, but a bug has prevented that from working for several years in some cases.

I will argue AGAINST doing this, until/unless the definition of "unready node" is way more robust and significant than it is today.
On Jun 20, 2024, at 12:32 AM, Tim Hockin <tho...@google.com> wrote:
> I will argue AGAINST doing this, until/unless the definition of "unready node" is way more robust and significant than it is today.

I agree, the concerning part is that it's not clear whether it ever triggers and whether people are depending on an unreliable signal. The kubelet would override the change if it's still able to update the API, which means this could trigger in some hairy failure modes and potentially prevent stable but split nodes from coasting.

We really do need a set of folks working across the project to attack this successfully, so +1 to such a WG.
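For illustration, a minimal sketch of the behavior being debated: marking pods on a NotReady node as not ready so the endpoints controller stops routing to them. This is not the actual node lifecycle controller code, just a simplified version that makes the race visible: a kubelet that can still reach the API server will overwrite the status on its next sync.

// Simplified, assumption-laden sketch of "mark pods on unready nodes unready".
package main

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func markPodsNotReady(ctx context.Context, cs kubernetes.Interface, nodeName string) error {
	pods, err := cs.CoreV1().Pods("").List(ctx, metav1.ListOptions{
		FieldSelector: "spec.nodeName=" + nodeName,
	})
	if err != nil {
		return err
	}
	for i := range pods.Items {
		pod := &pods.Items[i]
		changed := false
		for j := range pod.Status.Conditions {
			c := &pod.Status.Conditions[j]
			if c.Type == corev1.PodReady && c.Status != corev1.ConditionFalse {
				c.Status = corev1.ConditionFalse
				c.Reason = "NodeNotReady"
				changed = true
			}
		}
		if !changed {
			continue
		}
		// The race described above: a kubelet that is merely partitioned
		// (not down) may set the pod Ready again on its next status sync.
		if _, err := cs.CoreV1().Pods(pod.Namespace).UpdateStatus(ctx, pod, metav1.UpdateOptions{}); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	if err := markPodsNotReady(context.Background(), kubernetes.NewForConfigOrDie(cfg), "node-1"); err != nil {
		panic(err)
	}
}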
"LB Controller needs to know when a node is ready to be deleted".
Part of a node's lifecycle is the fact that nodes are sometimes used as part of the load-balancing solution.
Folks,

Any update on this new WG formation?