I have a question that first started as
an issue reported to the grpc-go client, then a
question on Gitter, and then a
question on StackOverflow. But because the SO post was likely too opinion-based, it makes the most sense to ask for clarification here on this list. The StackOverflow question is closed but is still a good, detailed reference for my question.
The retryPolicy, as it was
proposed, suggests that implementations should use a backoff that is fully randomized from 0 to the actual calculated max backoff. This is how the grpc-go client is implemented. However, the core implementation, which also drives the Python and C++ clients, uses a growing exponential backoff with randomized jitter. I had a lot of trouble with the grpc-go implementation when trying to get it to keep retrying for long enough within an expected window; fiddling with the settings seems to have very little influence on the minimum retry time. I find the core implementation more correct and reasonable.
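
To make the difference concrete, here is a rough sketch in Go of the two behaviors as I understand them. This is not the actual code from either client; the parameter names are just stand-ins for the retryPolicy fields (initialBackoff, maxBackoff, backoffMultiplier):

package main

import (
	"math/rand"
	"time"
)

const (
	initialBackoff    = 100 * time.Millisecond
	maxBackoff        = 10 * time.Second
	backoffMultiplier = 2.0
)

// What the retryPolicy text (and grpc-go, as I read it) describe:
// the delay for attempt n is drawn uniformly from
// [0, initialBackoff * multiplier^(n-1)], capped at maxBackoff.
// The lower bound is always zero, so a late retry can still sleep ~0ms.
func fullyRandomizedBackoff(attempt int, rng *rand.Rand) time.Duration {
	ceiling := float64(initialBackoff)
	for i := 1; i < attempt; i++ {
		ceiling *= backoffMultiplier
	}
	if ceiling > float64(maxBackoff) {
		ceiling = float64(maxBackoff)
	}
	return time.Duration(rng.Float64() * ceiling)
}

// Roughly what the core implementation does, as I understand it:
// grow the backoff exponentially, then add jitter around that value,
// so the delay keeps a growing floor instead of collapsing toward zero.
func exponentialWithJitter(attempt int, rng *rand.Rand, jitter float64) time.Duration {
	backoff := float64(initialBackoff)
	for i := 1; i < attempt; i++ {
		backoff *= backoffMultiplier
	}
	if backoff > float64(maxBackoff) {
		backoff = float64(maxBackoff)
	}
	// e.g. jitter = 0.2 keeps the delay within +/-20% of the target
	low := backoff * (1 - jitter)
	span := backoff * 2 * jitter
	return time.Duration(low + rng.Float64()*span)
}

With the first approach, the total time spent across N retries has a very wide variance, which is exactly the behavior I was fighting against.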
Which one is correct? The grpc-go client seems to do what the retryPolicy proto doc defines, but that behavior feels incorrect. The core implementation does not follow the spec, yet feels like a proper implementation of exponential backoff.
Justin