I encountered a weird behavior in gRPC.
The symptom: an active RPC stream is signaled as cancelled on the server side even though the client is still active and the stream should not be closed. It happens intermittently, and I couldn't find any correlation with other events in the environment.
It happens for streams opened as response streams in RPC calls from both C++ and NodeJS clients. It happened on gRPC v1.3.6 and still happens on v1.6.0.
The problem does not reproduce easily - the system has to run under heavy load for many hours before it happens.
In my code, I have 2 main types of streams; for both of them, the server detects disconnection by registering a callback on the call's CancellationToken:
ServerCallContext context; // received as a parameter of the RPC handler
// ...
// Fires when the call's cancellation token is signaled; the handler is
// dispatched to the thread pool (fire-and-forget, so exceptions thrown
// from the async lambda are not observed).
context.CancellationToken.Register(() =>
    System.Threading.ThreadPool.QueueUserWorkItem(async obj => { handle_disconnection(...); }));
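To make a server-side cancellation easier to correlate with the client logs, I'm considering instrumenting the callback roughly like this (a sketch; HandleDisconnectionAsync is a stand-in for my actual handler):

context.CancellationToken.Register(() =>
{
    // Record when and for which call/peer the token fired, so server-side
    // cancellations can be matched against the client-side log timeline.
    Console.WriteLine($"{DateTime.UtcNow:O} cancelled: method={context.Method} peer={context.Peer}");
    System.Threading.Tasks.Task.Run(() => HandleDisconnectionAsync(context)); // hypothetical async handler
});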
The problem is noticeable when the control streams get disconnected.
Further investigation of the client (C++) and server (C#) logs around a control-stream disconnection revealed the following:
Another note: for both the C++ and NodeJS client interfaces, I set the server's RequestCallTokensPerCompletionQueue value to 32768 (32K) per completion queue.
I have 2 server interfaces (one for NodeJS clients and one for C++ clients; they expose different APIs) and 4 completion queues (the machine has 8 cores). I don't know whether the 4 completion queues are global or per-server.
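For reference, this is roughly how I set it up (a minimal sketch; the port and the service bindings are placeholders). My understanding, which I'd like confirmed, is that the completion queues belong to the process-wide GrpcEnvironment and are therefore shared by all servers rather than per-server:

using Grpc.Core;

// Must be called before any server or channel is created; the queues live in
// the shared GrpcEnvironment, so (I believe) both server interfaces use them.
GrpcEnvironment.SetCompletionQueueCount(4);

var server = new Server
{
    Ports = { new ServerPort("0.0.0.0", 50051, ServerCredentials.Insecure) } // placeholder port
};
// server.Services.Add(...); // NodeJS-facing and C++-facing service bindings
server.RequestCallTokensPerCompletionQueue = 32768; // 32K tokens per completion queue
server.Start();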
Do you think this setup might cause those streams to be closed under heavy load?
In any case, my suspicion is on the C# server behavior - the CancellationToken is signaled for no apparent reason.
I haven't ruled out network instability yet - although both the clients and the server run on the same ESX host with 10-gig virtual adapters between them, so it's quite a long shot.
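To help rule it out, I may enable HTTP/2 keepalive pings on the server via channel options, so a dead connection shows up explicitly instead of as a silent cancellation (a sketch; the interval values are guesses, not tuned recommendations):

using Grpc.Core;

var keepaliveOptions = new[]
{
    // Ping active connections every 30 s; drop them if no ack within 10 s.
    new ChannelOption("grpc.keepalive_time_ms", 30000),
    new ChannelOption("grpc.keepalive_timeout_ms", 10000),
    new ChannelOption("grpc.keepalive_permit_without_calls", 1),
};
var server = new Server(keepaliveOptions); // services/ports configured as above

Running both sides with GRPC_VERBOSITY=DEBUG and GRPC_TRACE=connectivity_state could also show what the transport thinks happened at the moment the token fires.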
Do you have any idea how to solve this?
Thanks!