gRPC-Java deadline exceeded scalability

sut...@gmail.com

May 21, 2019, 10:07:11 AM
to grpc.io
Hi folks,

I'm interested in understanding how gRPC-Java might behave in a pathological situation where a vast number of client requests start timing out, causing an avalanche of deadline-exceeded errors. If a gRPC-Java client with 50k "sessions" is partitioned away from the server, it might start seeing a very large number of DEADLINE_EXCEEDED statuses. Subsequent retry attempts might also meet the same fate, adding another 50k deadline-exceeded errors.
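
For concreteness, each of these sessions would attach a deadline per call, roughly like this (a sketch only; the Greeter stub, request, and reply types are placeholder generated classes):

import io.grpc.Status;
import io.grpc.StatusRuntimeException;
import java.util.concurrent.TimeUnit;

HelloReply callWithDeadline(GreeterGrpc.GreeterBlockingStub stub, HelloRequest req) {
  try {
    // Each attempt carries its own deadline; 50k of these expiring
    // at roughly the same time is the avalanche described above.
    return stub.withDeadlineAfter(500, TimeUnit.MILLISECONDS).sayHello(req);
  } catch (StatusRuntimeException e) {
    if (e.getStatus().getCode() == Status.Code.DEADLINE_EXCEEDED) {
      // The caller decides whether and when to retry.
    }
    throw e;
  }
}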

What might be the effect on gRPC-Java of such an avalanche of deadline-exceeded errors? Are the deadlines reported to the caller in a timely fashion? What might be the impact on well-behaved sessions?

Clients might do clever things like waiting for a random backoff before retrying, to avoid retries clustering together.
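
A minimal full-jitter sketch of what I mean (the base and cap values are made up):

import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;

// "Full jitter": pick the wait uniformly in [0, exp) so that 50k clients
// do not retry in lockstep after a partition heals.
long backoffMillis(int attempt) {
  long capMillis = TimeUnit.SECONDS.toMillis(30);
  long expMillis = Math.min(capMillis, 100L << Math.min(attempt, 20));
  return ThreadLocalRandom.current().nextLong(expMillis);
}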

Does gRPC-Java use a scalable approach to track deadlines? A paranoid client might rely on its own HashedWheelTimer-based deadline-exceeded reporting mechanism rather than relying on gRPC. Do you see a need for such a thing?
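
Something like the following, using Netty's timer directly (a sketch; the expiry callback and bookkeeping are placeholders):

import io.netty.util.HashedWheelTimer;
import io.netty.util.Timeout;
import java.util.concurrent.TimeUnit;

// One coarse-grained wheel shared by all sessions; each pending RPC
// registers a timeout and cancels it on completion.
HashedWheelTimer wheel = new HashedWheelTimer();

Timeout watchDeadline(Runnable onExpired, long millis) {
  return wheel.newTimeout(t -> onExpired.run(), millis, TimeUnit.MILLISECONDS);
}
// On RPC completion: call timeout.cancel() to deregister the watchdog.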

Regards,
Sumant

Carl Mastrangelo

May 21, 2019, 1:19:49 PM
to grpc.io
Responses inline


On Tuesday, May 21, 2019 at 7:07:11 AM UTC-7, sut...@gmail.com wrote:
Hi folks,

I'm interested in understanding how gRPC-Java might behave in a pathological situation where a vast number of client requests start timing out, causing an avalanche of deadline-exceeded errors. If a gRPC-Java client with 50k "sessions" is partitioned away from the server, it might start seeing a very large number of DEADLINE_EXCEEDED statuses. Subsequent retry attempts might also meet the same fate, adding another 50k deadline-exceeded errors.

What might be the effect on gRPC-Java of such an avalanche of deadline-exceeded errors? Are the deadlines reported to the caller in a timely fashion? What might be the impact on well-behaved sessions?

They are reported.  Both the client and server know the deadline, so there is no risk of one side not being aware.  If you are using Netty, the deadlines are enforced using the EventLoopGroup's scheduler.  I'm not sure whether it uses Netty's HashedWheelTimer.
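
For example, on the server side the propagated deadline is visible on the gRPC Context, so a handler can skip work the client has already given up on (a sketch; responseObserver is a placeholder from a generated service implementation):

import io.grpc.Context;
import io.grpc.Deadline;
import io.grpc.Status;

// Inside a service method: the client's deadline propagates with the call.
Deadline deadline = Context.current().getDeadline();
if (deadline != null && deadline.isExpired()) {
  // Don't do the work; the client has already given up.
  responseObserver.onError(Status.DEADLINE_EXCEEDED.asRuntimeException());
  return;
}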
 

Clients might do clever things like waiting for a random backoff before retrying, to avoid retries clustering together.

gRPC has an in-progress retry mechanism that does automatic backoff.  By default, there are no RPC retries, so you would need to do this at the application level today.
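
Once the built-in retries are usable, enabling them is expected to look roughly like this (a sketch against the experimental API; the service name, addresses, and policy values are placeholders):

import io.grpc.ManagedChannel;
import io.grpc.ManagedChannelBuilder;
import java.util.*;

// Retry policy expressed as a service config map; the actual delay before
// each attempt is randomized by gRPC between 0 and the current backoff ceiling.
Map<String, Object> retryPolicy = new HashMap<>();
retryPolicy.put("maxAttempts", 4.0);
retryPolicy.put("initialBackoff", "0.5s");
retryPolicy.put("maxBackoff", "30s");
retryPolicy.put("backoffMultiplier", 2.0);
retryPolicy.put("retryableStatusCodes", Arrays.asList("UNAVAILABLE"));

Map<String, Object> methodConfig = new HashMap<>();
methodConfig.put("name",
    Collections.singletonList(Collections.singletonMap("service", "my.pkg.Greeter")));
methodConfig.put("retryPolicy", retryPolicy);

ManagedChannel channel = ManagedChannelBuilder.forAddress("example.com", 443)
    .defaultServiceConfig(
        Collections.singletonMap("methodConfig", Collections.singletonList(methodConfig)))
    .enableRetry()
    .build();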