I pressed the wrong button while editing the code example; here is the corrected (simplified) example:
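```python
from concurrent import futures
from signal import SIGTERM, signal
from typing import Any

import grpc


def handle_sigterm(*_: Any) -> None:
    """Shutdown gracefully."""
    # stop() returns a threading.Event that is set once shutdown completes.
    done_event = server.stop(30)
    done_event.wait(30)


server = grpc.server(
    futures.ThreadPoolExecutor(max_workers=options['max_workers']),
)
# `service` and `options` come from the rest of the application.
add_Servicer_to_server(service, server)
server.add_insecure_port(options['address'])
server.start()
signal(SIGTERM, handle_sigterm)
server.wait_for_termination()
```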
I'm deploying my service on Kubernetes, which stops a pod by first sending it SIGTERM and then, if the pod is still running after a timeout (30 seconds by default), killing it with SIGKILL.
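For reference, the first step of that sequence can be reproduced locally without gRPC or Kubernetes. This is a stdlib-only sketch (POSIX only; `received` and `on_sigterm` are illustrative names) that sends SIGTERM to its own process from a timer thread and checks that a Python handler runs while the main thread is blocked in a timed wait loop.

```python
import os
import signal
import threading
import time
from typing import Any

received = threading.Event()


def on_sigterm(signum: int, frame: Any) -> None:
    # Python always runs signal handlers in the main thread.
    received.set()


signal.signal(signal.SIGTERM, on_sigterm)

# Stand-in for Kubernetes: deliver SIGTERM to this process after 0.2 s.
threading.Timer(0.2, os.kill, args=(os.getpid(), signal.SIGTERM)).start()

deadline = time.monotonic() + 5.0
# Block in short slices so the interpreter regularly gets a chance to run
# the Python-level handler; give up after 5 s.
while not received.wait(timeout=0.1) and time.monotonic() < deadline:
    pass

print("SIGTERM handled:", received.is_set())
```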
What I observe:
* Pod termination takes roughly the full 30 s that Kubernetes allows, so the pod apparently ends up being killed by SIGKILL.
* The log message for stopping the service, emitted inside the SIGTERM handler, never appears.
As far as I can tell, one of the following might be happening:
1. The handler is never called, because the process is killed before it can handle the signal.
2. The server takes so long to shut down that Kubernetes kills it before the message is logged.
3. The shutdown event is never set, even after the server has finished shutting down, so Kubernetes kills the process before the message is logged.
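One way to tell these three cases apart is to log each phase of the handler with a timestamp and an explicit flush. A minimal sketch, assuming a hypothetical factory `make_sigterm_handler` that takes the server's `stop` method (expected to behave like `grpc.Server.stop`, returning an object with `wait(timeout)`):

```python
import sys
import time
from typing import Any, Callable


def make_sigterm_handler(stop: Callable[[float], Any], grace: float) -> Callable[..., None]:
    """Return a SIGTERM handler that logs each shutdown phase with a timestamp.

    `stop` is expected to behave like grpc.Server.stop: it takes a grace period
    in seconds and returns an object with a wait(timeout) method.
    """

    def log(message: str, start: float) -> None:
        # flush=True so the line is written out before any later SIGKILL.
        print(f"[{time.monotonic() - start:6.2f}s] {message}", file=sys.stderr, flush=True)

    def handler(*_: Any) -> None:
        start = time.monotonic()
        # If this line never appears, the handler never ran (case 1).
        log("SIGTERM received, calling stop()", start)
        done_event = stop(grace)
        log("stop() returned, waiting for shutdown", start)
        finished = done_event.wait(grace)
        # False here means the event was not set within `grace` seconds.
        log(f"shutdown event set: {finished}", start)

    return handler
```

It would be installed with `signal(SIGTERM, make_sigterm_handler(server.stop, 30))`. If the first line shows up but the later ones stop near the 30 s mark, that points towards case 2 or 3 rather than case 1.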
And longer term: could gRPC provide something like `wait_for_termination` that handles graceful shutdown natively, without developers having to build it on top?
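In the meantime, the pattern can be wrapped in a small helper. This is a sketch, not part of grpcio: `serve_until_sigterm` and `StoppableServer` are hypothetical names, and the helper relies only on `stop()` and `wait_for_termination()`, assuming (as the grpcio docs describe) that `wait_for_termination()` returns once the server has stopped. The default `grace` of 25 s is an arbitrary choice that leaves a few seconds of headroom under Kubernetes' default 30 s window; with `stop(30)` the shutdown alone can consume the whole window.

```python
import signal
import threading
from typing import Any, Optional, Protocol


class StoppableServer(Protocol):
    """The subset of grpc.Server that this helper relies on."""

    def stop(self, grace: Optional[float]) -> threading.Event: ...

    def wait_for_termination(self, timeout: Optional[float] = None) -> bool: ...


def serve_until_sigterm(server: StoppableServer, grace: float = 25.0) -> None:
    """Block until `server` terminates, stopping it gracefully on SIGTERM.

    Must be called from the main thread, since only the main thread may
    install Python signal handlers.
    """

    def on_sigterm(*_: Any) -> None:
        # stop() returns immediately; in-flight RPCs get up to `grace` seconds.
        server.stop(grace)

    previous = signal.signal(signal.SIGTERM, on_sigterm)
    try:
        server.wait_for_termination()
    finally:
        # Restore the previous handler if it was installed from Python.
        if previous is not None:
            signal.signal(signal.SIGTERM, previous)
```

With the example above, the last two lines (`signal(...)` and `server.wait_for_termination()`) would become `serve_until_sigterm(server)`.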
Thanks for your help!