So this is an interesting problem. It certainly is unintuitive behavior. I'm also not sure if we should change it. Let me start by explaining the internals of gRPC Python a little bit.
A server-streaming RPC call requires the cooperation of two threads: the thread provided by the client application calling
__next__ repeatedly (thread A) and a thread created by the gRPC library that drives the event loop in the C extension, which ultimately uses a mechanism like epoll (thread B). Under the hood, __next__ (thread A) just checks to see if thread B has received a response from the server and, if so, returns it to the client code. Normally, this works out just fine.
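To make that concrete, here's a rough sketch of the normal flow. I'm using the raw channel multicallables just to keep the example self-contained, and the method path and payloads are made up, not taken from your code:

```python
# Sketch of the normal case: thread A (this thread) pulls responses while
# thread B, owned by gRPC, drives the event loop. The "/demo.Feed/Subscribe"
# method path and the bytes payload are illustrative only.
import grpc

channel = grpc.insecure_channel("localhost:50051")
# With no (de)serializers supplied, requests and responses are raw bytes.
subscribe = channel.unary_stream("/demo.Feed/Subscribe")

# Each loop iteration calls __next__ on thread A; thread B receives the
# messages from the server and hands them over.
for response in subscribe(b"subscribe-request"):
    print(response)
```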
But thread B has some other responsibilities, including running any RPC callbacks. This means that in the scenario you described above, thread A and thread B are actually the same thread. So when __next__ is called, there is no separate thread to drive the event loop and receive the responses.
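To illustrate, a pattern along these lines (again with made-up method paths, not your actual code) will hang:

```python
# Sketch of the problematic pattern: the done callback of a unary RPC runs on
# gRPC's polling thread (thread B), so starting a server-streaming RPC there
# and iterating it blocks thread B waiting on itself.
import grpc

channel = grpc.insecure_channel("localhost:50051")
ping = channel.unary_unary("/demo.Feed/Ping")              # hypothetical methods,
subscribe = channel.unary_stream("/demo.Feed/Subscribe")   # raw-bytes payloads

def on_ping_done(future):
    # This runs on thread B, the library's polling thread.
    for response in subscribe(b"subscribe-request"):
        # __next__ waits for thread B to receive a message, but we *are*
        # thread B, so nothing ever arrives: deadlock.
        print(response)

ping.future(b"ping-request").add_done_callback(on_ping_done)
```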
So that's the cause of the deadlock you described. Now, you might say that this is an easy problem to solve. Why not just run the callbacks on a
new thread? Then there is no deadlock in this scenario. True. But we've found that additional Python threads kill performance because they're all contending for the GIL. Doing this at the library level could slow down
many existing workloads. We've actually put
quite a bit of effort into reducing the number of threads we use in the library. There are some options we could consider to make this work out of the box without destroying performance, but it's going to take some thought and careful benchmarking.
For the moment, I'd recommend that you not initiate an RPC from the callback handler and instead use the callback handler only to notify another thread that your application owns, whether that's the thread the unary RPC was initiated from or some other thread you've created yourself.
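Something like the following sketch is what I have in mind; the method paths and the queue-based handoff are just illustrative:

```python
# Sketch of the workaround: the done callback only enqueues a notification,
# and a thread the application owns performs the follow-up streaming RPC,
# leaving gRPC's polling thread free to drive the event loop.
import queue
import threading

import grpc

channel = grpc.insecure_channel("localhost:50051")
ping = channel.unary_unary("/demo.Feed/Ping")
subscribe = channel.unary_stream("/demo.Feed/Subscribe")
notifications = queue.Queue()

def on_ping_done(future):
    # Runs on the polling thread: do nothing blocking here, just hand off.
    notifications.put(future.result())

def worker():
    # Runs on an application-owned thread, so iterating here is safe.
    notifications.get()  # wait until the unary RPC has completed
    for response in subscribe(b"subscribe-request"):
        print(response)

threading.Thread(target=worker, daemon=True).start()
ping.future(b"ping-request").add_done_callback(on_ping_done)
```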