Python gRPC server graceful shutdown

M T

Jan 14, 2022, 3:20:22 PM

to grpc.io

Hi all,

I'm currently trying to add some graceful shutdown logic into my gRPC server, but it seems that my shutdown handler never gets called:

def handle_sigterm(*_: Any) -> None:
    """Shutdown gracefully."""
    done_event = server.stop(30)
    done_event.wait(30)

    self.stdout.write('Stop complete.')

server = grpc.server(
    futures.ThreadPoolExecutor(max_workers = options['max_workers']),
)
add_
server.add_insecure_port(options['address'])
self.stdout.write(f'Starting gRPC server at {options["address"]}...')
server.start()
start_http_server(int(khaleesi_settings['MONITORING']['PORT']))
signal(SIGTERM, handle_sigterm)
self._log_server_state_event(
    action = Event.Action.ActionType.START,
    result = Event.Action.ResultType.SUCCESS,
    details = 'Server started successfully.'
)
server.wait_for_termination()
except Exception as start_exception:
    self._log_server_state_event(
        action = Event.Action.ActionType.START,
        result = Event.Action.ResultType.ERROR,
        details = f'Server startup failed. {type(start_exception).__name__}: {str(start_exception)}'
    )
    raise start_exception from None

M T

Jan 14, 2022, 3:29:03 PM

to grpc.io

Pressed the wrong button when I tried to edit the code example... here's the correct (simplified) example:

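from concurrent import futures
from signal import SIGTERM, signal
from typing import Any

import grpc

# add_Servicer_to_server, service and options are defined elsewhere in the project.

def handle_sigterm(*_: Any) -> None:
    """Shutdown gracefully."""
    # Stop accepting new RPCs and give running ones up to 30 seconds to finish.
    done_event = server.stop(30)
    done_event.wait(30)

    print('Stop complete.')

server = grpc.server(
    futures.ThreadPoolExecutor(max_workers = options['max_workers']),
)

add_Servicer_to_server(service, server)
server.add_insecure_port(options['address'])
server.start()
signal(SIGTERM, handle_sigterm)
server.wait_for_termination()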

I'm deploying my service through Kubernetes, which stops pods by first sending SIGTERM and, if the pod is still alive after a timeout, killing it with SIGKILL.

The behavior that I witness:
* the pod takes about the full 30s that Kubernetes gives it to terminate, which suggests it ends up getting killed by SIGKILL
* the log message written inside the SIGTERM handler once the stop completes never appears

So I think that one of the following things might be happening here:

1. the handler never gets called because the process gets killed before it gets to handle the signal
2. the server takes so long to shut down that Kubernetes ends up killing it before it gets to logging the message
3. the shutdown event doesn't get set even after the server is done shutting down, causing Kubernetes to kill it without the message being logged

I see that the official examples contain an asyncio example for graceful shutdown. Could we maybe get something like that for the regular, non-asyncio case (https://github.com/lidizheng/grpc/tree/master/examples/python/helloworld)?
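For the non-asyncio case, one possible shape is sketched below. It is only a sketch under stated assumptions, not an official example: `serve_until_sigterm` is a hypothetical helper name, `server` is assumed to be an already started `grpc.Server` (whose `stop(grace)` returns a `threading.Event`), and the process is assumed to run on a POSIX system with the function called from the main thread.

```python
import signal
import threading


def serve_until_sigterm(server, grace=30.0):
    """Block until SIGTERM arrives, then stop `server` gracefully.

    Assumptions: `server` is already started and its stop(grace) returns a
    threading.Event, as grpc.Server.stop() does. This must run in the main
    thread, because signal.signal() can only be called from there.
    """
    stop_requested = threading.Event()

    def handle_sigterm(signum, frame):
        # Keep the handler tiny: only record that shutdown was requested.
        stop_requested.set()

    signal.signal(signal.SIGTERM, handle_sigterm)

    stop_requested.wait()            # returns once the handler has run
    stopped = server.stop(grace)     # reject new RPCs, let running ones finish
    stopped.wait()                   # set when the server has fully stopped
```

With the server from the example above, this would take the place of the last two lines: call `serve_until_sigterm(server, grace=30)` after `server.start()`. The handler only sets a flag, so the actual stop runs in the main thread rather than inside the signal handler.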

And for the long run: would it be possible to get something like wait_for_termination that handles graceful shutdown natively, without developers having to add something on top of it?

Thanks for your help!

Amit Saha

Jan 15, 2022, 10:05:06 PM

to grpc.io

I don't have a direct answer to your question, but if I were to investigate what may be going on, I would first try running the server outside of Kubernetes and outside a container: run it on a local system, send it the SIGTERM signal, and see what happens.

Lidi Zheng

Jan 19, 2022, 5:34:19 PM

to grpc.io

Based on the behavior observed, there might be an issue with the signal handler registration. One common problem with Python signal handling is that a signal handler can only be registered from the main thread; otherwise, it won't take effect.
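As a small illustration of the main-thread restriction: in CPython, `signal.signal()` is documented to raise `ValueError` when called from any thread other than the main one, so the handler is never installed. The helper name `register_from_worker_thread` below is hypothetical.

```python
import signal
import threading


def register_from_worker_thread():
    """Try to install a SIGTERM handler from a non-main thread.

    Returns the exception raised by signal.signal(), or None if the
    registration unexpectedly succeeded.
    """
    outcome = {"error": None}

    def worker():
        try:
            signal.signal(signal.SIGTERM, lambda signum, frame: None)
        except ValueError as exc:
            outcome["error"] = exc

    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()
    return outcome["error"]
```

If the registration code in a server might run in a worker thread, wrapping it like this and logging the returned error is one quick way to confirm or rule that out.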

Feel free to post a feature request to the grpc/grpc GitHub repo if you feel a certain change would be beneficial.

M T

Jan 20, 2022, 8:15:51 AM

to Lidi Zheng, grpc.io

Hi Amit, Hi Lidi,

thanks for your replies! In particular the one about the main thread and signals in Python. I was able to successfully adapt my setup to make graceful shutdown possible :)

What it turned out to be: I am using a shell script which ends up calling the Python code that starts the gRPC server, and it was that shell script that ate all of my signals. So instead of simply calling `python myscript.py`, I now call `exec python myscript.py`, which replaces the shell process with Python, so that Python itself receives the signals sent to the main process.
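The effect of `exec` can be sketched in Python as well, since `os.execv` does the same thing as the shell builtin: it replaces the running program while keeping the process ID. The sketch below assumes a POSIX system; `pid_after_exec` is a hypothetical name, and the launcher here is a Python one-liner standing in for the shell script.

```python
import subprocess
import sys


def pid_after_exec():
    """Start a launcher that exec()s into a second program.

    Returns (launcher_pid, pid_reported_by_final_program).
    """
    final_program = "import os; print(os.getpid())"
    # The launcher replaces itself with the final program instead of
    # starting it as a child process.
    launcher = (
        "import os, sys; "
        f"os.execv(sys.executable, [sys.executable, '-c', {final_program!r}])"
    )
    proc = subprocess.Popen(
        [sys.executable, "-c", launcher],
        stdout=subprocess.PIPE,
        text=True,
    )
    output, _ = proc.communicate()
    return proc.pid, int(output.strip())
```

Both values are the same process ID, so a signal sent to the process that was originally started reaches the final program directly. Without `exec`, the final program would run as a child with a different process ID, and the signal would arrive at the wrapper instead.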

Thanks a lot for your help :)
