Status codes: Which might have started the server request?

Evan Jones

May 26, 2017, 5:33:33 PM

to grpc.io

I've been looking into retry policies for my application, which meant I've spent some time reading the A6 client retries proposal [1], parts of the previous discussion thread [2], and the status code description [3]. Are there any of these status codes that are always safe to retry? In the terminology from A6, I'm trying to determine which requests have never possibly been seen by the server application logic (see transparent retries: [1]).

In particular: It wasn't clear to me from A6 how the client will determine whether a "transparent retry" is safe. Will it use status codes? Something else?

It seems to me that according to the Status Codes document [3] the only code that should be safe to always retry, since the server application logic could never have executed, is UNAVAILABLE.

As for the opposite case, you should never retry UNIMPLEMENTED or UNAUTHENTICATED, because they should be "permanent" errors that won't succeed when re-executed.

The rest are application dependent, since the request could have executed (partially or completely).

Did I get this right?

Thanks!

Evan

[1] https://github.com/grpc/proposal/blob/master/A6-client-retries.md#transparent-retries

[2] https://groups.google.com/forum/#!topic/grpc-io/zzHIICbwTZE

[3] https://github.com/grpc/grpc/blob/master/doc/statuscodes.md

Abhishek Kumar

Aug 14, 2017, 4:06:40 PM

to grpc.io

The overall conclusion is correct: the client library will transparently retry only when there is a protocol-level guarantee that the server has not executed the RPC. However, this guarantee may be available even for some RPCs that return an INTERNAL error code, and the library is free to retry transparently in those cases.

I would say that applications (service owners) should be free to decide whether UNIMPLEMENTED and UNAUTHENTICATED are also retriable. After all, auth systems can have their own transient failures, and new implementations can be rolling out on the server side even as clients start using them. In other words, there are corner cases where specific service owners may claim that, for their own services, these codes are retriable. That is why we allow service owners full flexibility in selecting the status codes that can be retried (see: https://github.com/grpc/proposal/blob/master/A6-client-retries.md#retryable-status-codes ).
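To make the "service owners select the retryable codes" point concrete, here is a minimal sketch that builds a retry policy in the JSON shape the A6 proposal describes. The service name `example.Echo`, the helper `build_service_config`, and the backoff values are all made up for illustration; only the field names are taken from A6.

```python
import json


def build_service_config(service, retryable_codes, max_attempts=3):
    """Return a service config JSON string with an A6-style retry policy.

    The service name and backoff values passed in or defaulted here are
    hypothetical; a real service owner would pick their own.
    """
    return json.dumps({
        "methodConfig": [{
            # Applies to every method of the named service.
            "name": [{"service": service}],
            "retryPolicy": {
                "maxAttempts": max_attempts,
                "initialBackoff": "0.1s",
                "maxBackoff": "1s",
                "backoffMultiplier": 2,
                # The service owner decides which codes are retryable.
                "retryableStatusCodes": list(retryable_codes),
            },
        }]
    })


# A service owner who considers auth failures transient for their service
# could opt in to retrying UNAUTHENTICATED alongside UNAVAILABLE.
config = build_service_config("example.Echo", ["UNAVAILABLE", "UNAUTHENTICATED"])
print(config)
```

How the resulting JSON reaches the client depends on the gRPC implementation and version in use, so treat the delivery mechanism as something to check against that implementation's documentation.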

Hope this helps,

-Abhishek

Evan Jones

Aug 14, 2017, 6:06:48 PM

to grpc.io

That was very helpful, thank you.

My general conclusion is that basically I should not look at status codes, and instead decide:

a) My request can't be safely retried: I can assume the gRPC client MIGHT transparently retry in some cases, but otherwise it's probably best to fail on any error.

OR

b) My request can be safely retried: In this case, I should probably just retry any error code, or do some request-specific inspection of the error.

This avoids needing to know a lot about the subtle details of the gRPC implementation. :)
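The two cases (a) and (b) above could be sketched roughly as follows. This is an application-level illustration only, not how the gRPC client itself retries: `RpcError` is a hypothetical stand-in for whatever exception the client library raises, and `call_with_policy` and its parameters are invented names.

```python
import time


class RpcError(Exception):
    """Hypothetical stand-in for a failed RPC; carries a status code name."""

    def __init__(self, code):
        super().__init__(code)
        self.code = code


def call_with_policy(rpc, safe_to_retry, max_attempts=3, backoff_seconds=0.0):
    """Invoke rpc(), retrying on any RpcError only if the request is safe to retry.

    Case (a): safe_to_retry=False gives a single attempt and fails on any error.
    Case (b): safe_to_retry=True retries any error code up to max_attempts.
    """
    attempts = max_attempts if safe_to_retry else 1
    for attempt in range(1, attempts + 1):
        try:
            return rpc()
        except RpcError:
            if attempt == attempts:
                raise  # Out of attempts: surface the last error to the caller.
            # Simple linear backoff between attempts (0 disables the wait).
            time.sleep(backoff_seconds * attempt)
```

A request-specific check of the error, as mentioned in (b), would go inside the `except` branch before deciding whether to continue.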