(Python) gRPC Server leaking connections with infinite response stream

er...@eric-fritz.com

May 14, 2017, 4:15:54 PM
to grpc.io

I'm having an issue with client disconnection using a Python gRPC server with a unary-stream procedure. I may have missed something in the documentation, but multiple attempts at checking whether the client is still active have all failed.

My hope is to find a way for the Servicer to detect when to stop generating values when a client disconnects. A client subscribes to a stream of updates once and receives all changes to the server state for an unbounded amount of time. The actual server implementation should also send heartbeat values at regular intervals during periods of low activity. A minimal example that demonstrates my problem is given below.

import time
import concurrent.futures

import grpc

# proto / proto_grpc are the protoc-generated message and servicer modules;
# their import lines are elided in this minimal example.


class TaskService(proto_grpc.TaskServiceServicer):
    def Subscribe(self, request, context):
        print 'subscribed'

        while True:
            time.sleep(1)
            print 'sending'
            yield proto.TaskState(state='IDLE')

        print 'finished'


def main():
    server = grpc.server(concurrent.futures.ThreadPoolExecutor(max_workers=3))

    proto_grpc.add_TaskServiceServicer_to_server(
        TaskService(),
        server,
    )

    server.add_insecure_port('0.0.0.0:3253')
    server.start()

    try:
        while True:
            time.sleep(1)
            print 'bump'
    finally:
        print 'stopping'
        server.stop(grace=0)
        print 'stopped'


if __name__ == '__main__':
    main()


There are two issues at play here. First, if a client subscribes and then is disconnected forcefully, the server will continue to send updates to that channel. This means if three clients have ever subscribed to this server, then a fourth may never connect in the future (due to the low number in the thread pool). This seems wrong, but I may be missing a setting somewhere.

Second, if a client subscribes and then disconnects, the server will never shut down completely. The process will hang in an un-killable state. This does not occur when a client never connects, or if a client connects but stays connected as the process shuts down.

The following output occurs when the server is killed after a client connects:

subscribed
bump
sending
bump
sending
^C
stopping
stopped
Traceback (most recent call last):
  ...
    time.sleep(1)
KeyboardInterrupt
sending
<process terminated gracefully>

The following output occurs when the server is killed after a client disconnects:

subscribed
bump
sending
bump
<client is killed>
sending
bump
^C
stopping
stopped
Traceback (most recent call last):
  ...
    time.sleep(1)
KeyboardInterrupt
sending
<process hangs indefinitely>

I hope there is an easy solution and that I have simply misunderstood the correct way to implement a streaming response; I apologize in advance if that is the case.

Nathaniel Manista

May 15, 2017, 4:45:10 PM
to er...@eric-fritz.com, grpc.io
(This is also being discussed in this StackOverflow question.)

On Sun, May 14, 2017 at 1:15 PM, <er...@eric-fritz.com> wrote:
I'm having an issue with client disconnection using a Python gRPC server with a unary-stream procedure. I may have missed something in the documentation, but multiple attempts at checking whether the client is still active have all failed.

My hope is to find a way for the Servicer to detect when to stop generating values when a client disconnects.

The ServicerContext's is_active and add_callback methods are the recommended means for your code to be informed of RPC termination.
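
For illustration only (a sketch adapted from the example above, not code from the thread), the Subscribe handler could consult those two hooks like this; proto and proto_grpc are the generated modules from the first message:

import threading
import time

class TaskService(proto_grpc.TaskServiceServicer):
    def Subscribe(self, request, context):
        done = threading.Event()
        # add_callback registers a function that the runtime invokes when the
        # RPC terminates for any reason, including client disconnection.
        context.add_callback(done.set)

        # is_active() reports False once the RPC has terminated, so the
        # generator can stop yielding instead of looping forever.
        while context.is_active() and not done.is_set():
            time.sleep(1)
            yield proto.TaskState(state='IDLE')

Whether the runtime actually recognizes a forceful disconnect and fires these hooks is the open question in the rest of this thread, but they are the intended mechanism.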
This shouldn't be the case - if a disconnection happens, the server-side run-time code should detect it and refrain from further calling into the response-iterator object given to it by your application. Of course there could be a bug in that logic, but that's what should happen. Notice that it is not guaranteed that "finished" will ever be printed in the case of an RPC that terminates with non-OK status (as would be the case in a disconnection).

This means if three clients have ever subscribed to this server, then a fourth may never connect in the future (due to the low number in the thread pool). This seems wrong, but I may be missing a setting somewhere.

You're right about there being a limit on the number of RPCs concurrently being serviced; if you've found a bug that prevents disconnected RPCs from being recognized as terminated, then you're right that their service would continue and would starve out additional "real" RPCs.
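
As an illustrative aside (not part of the original reply): the limit in the example above comes from the executor handed to grpc.server; each RPC being serviced holds one worker thread for the lifetime of its handler, so a handful of streams that are never recognized as terminated is enough to pin every worker.

import concurrent.futures
import grpc

# With max_workers=3, at most three handlers run at once; a unary-stream
# handler keeps its worker thread until the stream ends, so leaked streams
# starve out all later RPCs.
server = grpc.server(concurrent.futures.ThreadPoolExecutor(max_workers=3))
# Recent grpcio releases also accept a maximum_concurrent_rpcs argument to
# grpc.server, which rejects excess RPCs with RESOURCE_EXHAUSTED instead of
# queueing them behind the stuck ones (availability depends on your version).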

Second, if a client subscribes and then disconnects, the server will never shut down completely. The process will hang in an un-killable state. This does not occur when a client never connects, or if a client connects but stays connected as the process shuts down.

As you describe it, this certainly smells like a defect. Are you able to demonstrate it in a single program? If so, in a single Python interpreter, or does it require a small family of communicating but separate processes?
Perhaps there's trouble of some sort - whether or not you're understanding the streaming semantics correctly should have no bearing on the fact that a server told to shut down should shut down cleanly and allow the process to exit rather than hanging indefinitely.

With what version of gRPC Python are you working, and with what version of Python and on what platform?
-Nathaniel

er...@eric-fritz.com

May 15, 2017, 5:17:34 PM
to grpc.io, er...@eric-fritz.com
Thanks for the back-and-forth. Here is a minimal example you can play around with to see the behavior: https://gist.github.com/efritz/1417921e3d646184ee42606c069f7ada. It's difficult to demonstrate it in a single program as I need to forcefully disconnect a client. It needs at least two processes as it's currently written (but someone could certainly wedge it into one process with some creativity).

I'm using Python 2.7 on OS X (currently), but this issue was originally observed on a Debian system. As you asked in the SO thread, switching from print to logging does not change the behavior, nor does the payload size.

Nathaniel Manista

May 16, 2017, 2:43:47 PM
to Eric Fritz, grpc.io
On Mon, May 15, 2017 at 2:17 PM, <er...@eric-fritz.com> wrote:
Thanks for the back-and-forth. Here is a minimal example you can play around with to see the behavior: https://gist.github.com/efritz/1417921e3d646184ee42606c069f7ada. It's difficult to demonstrate it in a single program as I need to forcefully disconnect a client. It needs at least two processes as it's currently written (but someone could certainly wedge it into one process with some creativity).

I'm using Python 2.7 on OS X (currently), but this issue was originally observed on a Debian system. As you asked in the SO thread, switching from print to logging does not change the behavior, nor does the payload size.

Thanks for the gist; let's migrate further investigation to this issue.
-N