I'll have to get back to you on that, since I've solved my "real" problem for now with a rewrite. But I do think there's a gremlin in the grpc server code. First, I checked that the clocks of the server and the clients were within 1 ms of each other. Then I added logging on all three nodes. When I kill a client, the disconnect notification pops up in the other client BEFORE the server gets the notification. In fact, a "new" rpc from the disconnected client gets to the server before the disconnect notification of the failed rpc pops up. All this seems strange to me.
While this was interesting and somewhat annoying, my "real world" problem was that I was streaming from kafka, and that client has a 3 second setup time. I solved the problem by caching and sharing consumers that were at roughly the same position in the stream. Way more logic, but startup times went from 3 s to less than 1 ms.
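For anyone curious, the caching idea can be sketched roughly like this. It is a simplified illustration, not the actual code: SharedConsumer, ConsumerCache and max_gap are made-up names, the kafka consumer is replaced by a plain stand-in object, and I'm assuming a consumer can be shared when it sits at or slightly behind the requested offset (the new stream just skips ahead).

```python
import threading
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SharedConsumer:
    """Hypothetical stand-in for an already-connected kafka consumer."""
    position: int          # next offset this consumer will read
    subscribers: int = 0   # number of client streams currently sharing it


class ConsumerCache:
    """Hands out an existing consumer when one is close enough in the stream."""

    def __init__(self, create: Callable[[int], SharedConsumer], max_gap: int = 100):
        self._create = create        # the slow path (the ~3 s setup)
        self._max_gap = max_gap      # assumed tolerance, in offsets
        self._consumers: List[SharedConsumer] = []
        self._lock = threading.Lock()
        self.created = 0             # how many times the slow path ran

    def acquire(self, offset: int) -> SharedConsumer:
        with self._lock:
            for consumer in self._consumers:
                # Reuse a consumer that is at or slightly behind the wanted offset.
                if 0 <= offset - consumer.position <= self._max_gap:
                    consumer.subscribers += 1
                    return consumer
            # Nothing close enough: pay the setup cost once and cache the result.
            # (A real version would probably not hold the lock during setup.)
            consumer = self._create(offset)
            consumer.subscribers = 1
            self._consumers.append(consumer)
            self.created += 1
            return consumer

    def release(self, consumer: SharedConsumer) -> None:
        with self._lock:
            # Idle consumers stay cached here; eviction is left out of the sketch.
            consumer.subscribers = max(0, consumer.subscribers - 1)


cache = ConsumerCache(create=lambda offset: SharedConsumer(position=offset))
first = cache.acquire(1000)    # slow path: creates a consumer
second = cache.acquire(1050)   # fast path: shares the one at offset 1000
third = cache.acquire(5000)    # too far away: creates another consumer
```

The second acquire never touches the slow path, which is where the startup time goes away. Fanning records out to each subscriber and evicting idle consumers are the "way more logic" part and are not shown.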
So if you think there might be a bug, I can try chasing it down, but I'm new to grpc and have no knowledge of the internals. Otherwise I'm off the hook.
Thanks