Pods undergoing graceful disruption are not removed from endpoints

Clayton Coleman

Jul 10, 2024, 11:13:21 AM

to kubernete...@googlegroups.com, kubernetes-sig-apps, kubernetes-...@googlegroups.com

After some exploration of https://github.com/kubernetes/kubernetes/issues/116965 and discussion with a few folks in each SIG, I believe that graceful node shutdown needs to provide a signal to the endpoints controllers. That signal would bring graceful disruption of a pod on the kubelet in line with graceful termination of a pod from the API (deletionTimestamp set), so that the pod is removed from the serving rotation as soon as possible. This minimizes the chance that traffic is directed to an already terminated pod, which causes user-visible network disruption.
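To make the intended behavior concrete, here is a minimal sketch in Go. It is not the proof of concept linked in the bug and does not use the real Kubernetes API types: the `Pod` struct, the `inRotation` helper, and in particular the `GracefullyDisrupted` field are hypothetical names invented for illustration. The only point is that a pod being gracefully disrupted by the kubelet would be treated the same way as a pod with deletionTimestamp set.

```go
package main

import "fmt"

// Pod is a simplified stand-in for the fields discussed above.
// It is NOT the real Kubernetes API type.
type Pod struct {
	Name                 string
	Ready                bool // the pod's Ready condition is true
	HasDeletionTimestamp bool // deletionTimestamp was set through the API
	// GracefullyDisrupted is a hypothetical flag standing in for whatever
	// signal the kubelet would publish during graceful node shutdown.
	GracefullyDisrupted bool
}

// inRotation reports whether a pod should keep receiving new traffic.
// A gracefully disrupted pod is handled like a pod terminating via the API.
func inRotation(p Pod) bool {
	return p.Ready && !p.HasDeletionTimestamp && !p.GracefullyDisrupted
}

func main() {
	pods := []Pod{
		{Name: "healthy", Ready: true},
		{Name: "deleted-via-api", Ready: true, HasDeletionTimestamp: true},
		{Name: "node-shutting-down", Ready: true, GracefullyDisrupted: true},
	}
	for _, p := range pods {
		fmt.Printf("%-20s inRotation=%v\n", p.Name, inRotation(p))
	}
}
```

Without the extra signal, the third pod in this sketch would stay in rotation until it stopped reporting ready, which is the window in which traffic can reach a pod that is already shutting down.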

The most immediate impact is on workloads on cloud spot instances that are not using distribution-specific "force deletion" logic, or workloads on rebooting nodes that use graceful node shutdown without draining the nodes first.

A general description of the problem is in https://docs.google.com/document/d/1t25jgO_-LRHhjRXf4KJ5xY_t8BZYdapv7MDAxVGY6R8/edit?usp=sharing; please comment if you have strong opinions. A proof-of-concept implementation that includes both phases is linked in the bug, but the second proposed implementation phase will require a new Condition and thus needs to be added to a KEP and go through the new release cycle.
