In some cases people are surprised that their deployment can momentarily have more pods during a rollout than the strategy describes (replicas - maxUnavailable <= x <= replicas + maxSurge). The culprits are Terminating pods, which can run in addition to the Running and Starting pods.
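As an illustration, here is a sketch of a Deployment (name and percentages are placeholders) showing how the documented bounds are computed, and why the actual pod count can still exceed them once Terminating pods are included:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example            # hypothetical name for illustration
spec:
  replicas: 10
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 25%  # rounds down to 2, so at least 8 pods stay available
      maxSurge: 25%        # rounds up to 3, so at most 13 (10 + 3) pods are desired
# During a rollout, up to 13 Running/Starting pods may coexist with additional
# Terminating pods from the old ReplicaSet, so `kubectl get pods` can show
# more than replicas + maxSurge pods in total.
```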
Even though Terminating pods are not considered part of a deployment, they can cause problems with resource usage and scheduling:
1. Unnecessary autoscaling of nodes in tight environments, driving up cloud costs. This hurts especially if
- you roll out multiple deployments at the same time, or
- you have generous termination grace periods and your pods take a long time to shut down (example here: https://github.com/kubernetes/kubernetes/issues/95498#issuecomment-814048997).
2. A problem also arises in contentious environments where pods fight for resources. The exponential backoff for not-yet-started pods can grow to large values and unnecessarily delay their start until they pop from the scheduling queue once there are computing resources to run them. This can slow down the rollout considerably.
Relevant issue: https://github.com/kubernetes/kubernetes/issues/98656
In that issue the resources were limited by a quota, but this can happen for other reasons as well. In our use case we noticed that it can also occur in high-availability scenarios where pods are expected to run only on certain nodes and pod anti-affinity forbids running two pods on the same node.
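For concreteness, a sketch of such an anti-affinity rule (labels are hypothetical): with a required rule keyed on the node hostname, a replacement pod cannot be scheduled onto a node until the Terminating pod there is fully gone.

```yaml
# Pod template fragment: forbid two pods of the same app on one node.
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: example            # hypothetical label
        topologyKey: kubernetes.io/hostname
```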
For all of these issues it could make sense to wait for a pod to be fully terminated before scheduling a new one. Even though some of them can be partially mitigated by a proper setup of maxUnavailable and maxSurge, that mitigation does not apply to all of them.
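The most conservative setup today would look something like the following sketch, yet it still does not fully solve the problem:

```yaml
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxSurge: 0        # never run more than `replicas` non-terminating pods
    maxUnavailable: 1  # replace one pod at a time
# Even here the controller creates the replacement as soon as the old pod is
# deleted (i.e. enters Terminating), so the total pod count can still briefly
# exceed `replicas` while the old pod drains.
```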
I would like to propose a new opt-in behaviour that would solve this: the Deployment controller would include Terminating pods in the computation of current replicas when deciding whether the new ReplicaSet should scale up (or the old one, in the case of proportional scaling).
This could be configured, for example, in .spec.strategy.rollingUpdate.scalingPolicy with possible values:
1. IgnoreTerminatingPods - the default and current behaviour.
2. WaitForTerminatingPods (placeholder name, up for discussion) - the proposed behaviour, counting Terminating pods towards the current replicas.
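A sketch of how the opt-in could look in a manifest (the field name and value are illustrative, not an existing API):

```yaml
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
      maxSurge: 1
      scalingPolicy: WaitForTerminatingPods  # hypothetical: count Terminating pods as replicas
```

With this set, the controller would only create a new pod once the total of Running, Starting, and Terminating pods drops below the allowed maximum.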
The disadvantage of this feature is a slower rollout in resource-unrestricted environments, so using it would be advised only for use cases similar to the ones mentioned above.
Please let me know if you would benefit from this feature or if you see any problems associated with it.