The Declarative Node Maintenance KEP (https://github.com/kubernetes/enhancements/pull/4213) aimed to unify node drain and maintenance scenarios across the ecosystem. Its goal was simply to gracefully terminate all pods on a node in a way that is friendly to both users and programmatic clients.
Unfortunately, the current design and scope have encountered a couple of obstacles:
1. It is difficult to decide on the target of the maintenance. Should it be a node, a set of nodes, a set of DRA devices, a topology? This has implications for the node maintenance contract and the evolution of the API.
2. Similarly, it is difficult to reach a consensus on which pods should be evicted and in what order.
3. Research and feedback gathered in the Node Lifecycle WG showed that the set of actions and observability requirements during node maintenance can differ wildly across cluster maintenance solutions. Hardcoding a state machine in stages imposes hard constraints on the community.
After much discussion, we have decided to approach this problem domain from a different angle in a different KEP. Our goal is not only to unify the node drain; we also want a solution that can be used as a general mechanism for maintenance and lifecycle, and to communicate the different needs and dependencies of various teams and actors (maintenance controllers, workloads, cloud providers, etc.). We want to start simpler and gradually build our features to support the various scenarios implemented across the ecosystem today.
Given all the interest there's been in this KEP, we want to carry that over to the replacement proposal, Specialised Lifecycle Management (https://github.com/kubernetes/enhancements/pull/5769). Due to the change in scope, we are closing the Node Maintenance KEP, but we will bear the lessons learned in mind. The Node Maintenance effort also resulted in the EvictionRequest API proposal (https://github.com/kubernetes/enhancements/pull/4565), which is important for the Pod lifecycle.
The first version of Specialised Lifecycle Management will not support node maintenance in the way the original KEP proposed. However, our intention is to first create solid building blocks that will help us achieve our goals (including node drain/maintenance).
We are looking forward to your feedback on both the Specialised Lifecycle Management and EvictionRequest API KEPs. You can also join us in the Node Lifecycle WG to discuss your use cases and pain points.
Best wishes,
Filip