Hey Harvey,
Thanks for the info. I'm a bit confused by what "draining" would mean in this scenario.
The way this works in my setup is that I have 3 Envoy instances (called pods in Kubernetes), which have a Kubernetes load balancer in front of them.
And when I'm doing a rolling update, one pod at a time, Kubernetes removes the pod from the LB and sends a SIGTERM to the container (in this case to the 'envoy' process). It then waits 30 seconds, and if the process hasn't exited by then, it sends a SIGKILL. (So we don't have to worry about incoming requests: Kubernetes removes the pod from the LB at the beginning of the process, so the pod doesn't receive any new requests.)
A typical problem (not just with Envoy, but with any service) is that when the SIGTERM is sent, there might still be requests in flight. If the service terminates immediately on the SIGTERM instead of waiting for them to finish, those requests fail.
So this "/bin/sleep 30" command in the preStop hook is a common trick for services that don't handle the SIGTERM gracefully: Kubernetes runs the hook first, so it waits 30 seconds before sending the SIGTERM.
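To make concrete what I mean by a service handling the SIGTERM "properly", here is a rough sketch in Python. It is a generic illustration only and says nothing about how Envoy behaves. The names InFlightTracker and install_sigterm_handler are made up for this example, and the 30-second figure in the usage note simply mirrors the grace period mentioned above.

```python
import signal
import threading


class InFlightTracker:
    """Hypothetical helper: counts in-flight requests so shutdown can wait."""

    def __init__(self):
        self._cond = threading.Condition()
        self._in_flight = 0
        self._draining = False

    def try_begin(self):
        """Register a new request; refuse it once draining has started."""
        with self._cond:
            if self._draining:
                return False
            self._in_flight += 1
            return True

    def end(self):
        """Mark one request as finished and wake anyone waiting for idle."""
        with self._cond:
            self._in_flight -= 1
            self._cond.notify_all()

    def start_draining(self):
        """Stop accepting new requests; existing ones keep running."""
        with self._cond:
            self._draining = True

    def wait_until_idle(self, timeout):
        """Block until no requests remain or `timeout` seconds pass.

        Returns True if all requests finished, False if the deadline hit.
        """
        with self._cond:
            return self._cond.wait_for(lambda: self._in_flight == 0, timeout)


def install_sigterm_handler(stop_event):
    """Make SIGTERM set `stop_event` instead of killing the process.

    Must be called from the main thread (a Python restriction).
    """
    def handler(signum, frame):
        stop_event.set()

    signal.signal(signal.SIGTERM, handler)
```

The main loop of such a service would wait on stop_event, call start_draining(), and then call wait_until_idle() with a timeout a little under 30 seconds before exiting, so that it finishes before the SIGKILL would arrive.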
What I don't know is: how does Envoy handle a SIGTERM signal? Does it stop the process immediately, or does it wait for the current requests to finish?
Thanks,
Mark