Hi,
Would it be possible to add structured details to errors generated by the gRPC layer itself, so that different failure scenarios can be distinguished without forcing gRPC users to parse error messages on the client side? Parsing error messages is error-prone: they are not standardized across language implementations, can change over time without any warning, and, strictly speaking, are not part of the API.
A few examples we'd like to differentiate:
1.1. ResourceExhausted returned by a Python server when its concurrency limit is exceeded: it's safe to retry the request on a different server (the server is known not to have started processing the request), and this is quite common in Python setups.
1.2. ResourceExhausted returned by any server when the metadata or message is too large: retrying is useless, as the message size doesn't change.
1.3. ResourceExhausted returned by any server when the response is too large: it's dangerous to retry a non-idempotent request, as the server has already processed it once.
2.1. Unavailable returned by application logic to indicate that some dependency is down: it may or may not be safe to retry, depending on the specific scenario.
2.2. Unavailable returned by the client to indicate that all connections are down: it's safe to retry in the hope that a new connection gets established.
2.3. Unavailable returned by the client to indicate that the currently active stream was terminated.
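To illustrate what a client could do with such details, here is a minimal Python sketch of a retry/attribution classifier. Everything in it is hypothetical: gRPC does not expose anything like `TransportReason` today, and the enum members, the `classify` and `metric_label` helpers, and the use of plain strings for status codes are assumptions made for illustration only. The retry decisions are taken from the cases above; where a case does not state whether retrying is safe (2.1, 2.3), the sketch leaves the decision to the caller.

```python
from enum import Enum
from typing import Optional


class Retry(Enum):
    """How a client may react to a failed call (illustrative categories)."""
    SAFE = "safe"                      # server known not to have started processing
    USELESS = "useless"                # retrying cannot succeed, input is unchanged
    IDEMPOTENT_ONLY = "idempotent"     # server may already have processed the call
    CALLER_DECIDES = "caller_decides"  # cannot be decided from the status alone


class TransportReason(Enum):
    """HYPOTHETICAL transport-level reasons; gRPC does not define these today."""
    SERVER_CONCURRENCY_LIMIT = "server_concurrency_limit"  # case 1.1
    REQUEST_TOO_LARGE = "request_too_large"                # case 1.2
    RESPONSE_TOO_LARGE = "response_too_large"              # case 1.3
    NO_READY_CONNECTION = "no_ready_connection"            # case 2.2
    STREAM_TERMINATED = "stream_terminated"                # case 2.3


# Retry decision per (status code, transport reason). Case 2.3 is deliberately
# absent because its retry safety is not stated, so it falls back to the caller.
_POLICY = {
    ("RESOURCE_EXHAUSTED", TransportReason.SERVER_CONCURRENCY_LIMIT): Retry.SAFE,
    ("RESOURCE_EXHAUSTED", TransportReason.REQUEST_TOO_LARGE): Retry.USELESS,
    ("RESOURCE_EXHAUSTED", TransportReason.RESPONSE_TOO_LARGE): Retry.IDEMPOTENT_ONLY,
    ("UNAVAILABLE", TransportReason.NO_READY_CONNECTION): Retry.SAFE,
}


def classify(code: str, reason: Optional[TransportReason]) -> Retry:
    """Return a retry decision; no reason means an application error (case 2.1)."""
    if reason is None:
        return Retry.CALLER_DECIDES
    return _POLICY.get((code, reason), Retry.CALLER_DECIDES)


def metric_label(code: str, reason: Optional[TransportReason]) -> str:
    """Build a metrics/tracing label that separates transport and app failures."""
    suffix = reason.value if reason is not None else "application"
    return f"{code}/{suffix}"
```

With a reason attached, cases 1.1 and 1.2 stop being the same ResourceExhausted from the client's point of view, and metrics can be broken down per scenario rather than per status code.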
We're interested in having distinct gRPC-transport-specific error codes for each of these cases, both to attribute failure scenarios better in metrics and tracing, which improves the system's visibility, and, in some cases, to react to them differently.
While some of these cases can be mitigated by ensuring that we always properly attribute our own application-specific errors, gRPC's own errors are still indistinguishable in some scenarios [1].
Thanks