Standardization of rich error reporting via google.rpc.Status

Chris Toomey

May 9, 2019, 3:35:26 PM
to grpc.io
Per this SO post and some messages in this group, there looks to be agreement on and progress towards enabling richer gRPC error reporting via google.rpc.Status as a replacement for io.grpc.Status.

But this isn't mentioned anywhere on the gRPC website, including the error handling page, which seems a major omission -- it takes a fair bit of googling to discover it. For the sake of other developers looking to adopt gRPC and needing to support richer error reporting, will somebody update that page to talk about this? Similarly, the use of the grpc-status-details-bin header/metadata key should be added to the gRPC over HTTP2 spec (flagged as "experimental" or something if needed).

We're using akka-grpc, and until I asked about supporting google.rpc.Status yesterday it wasn't even on their radar.

It's really hard to tell where this stands in terms of official library support too. It'd be really helpful to list/track official library support status for this on the error handling page or somewhere else on the gRPC website.

Speaking of library support status, is this currently available, or being worked on, for the following languages?

Java
JS (grpc-web)
Objective C

thanks,
Chris

Chris Toomey

May 17, 2019, 1:08:28 AM
to grpc.io
Would somebody involved with enabling richer gRPC error reporting via google.rpc.Status as a replacement for io.grpc.Status please comment on this effort?

Penn (Dapeng) Zhang

May 22, 2019, 2:26:23 PM
to grpc.io
The gRPC core library io.grpc:grpc-core does not and should not depend on protobuf, so we cannot use google.rpc.Status as a replacement for io.grpc.Status at the core library level.

Chris Toomey

May 22, 2019, 10:14:15 PM
to grpc.io
Sure, but if people are working on a concerted effort to support google.rpc.Status in a bunch of grpc language libraries (Java, Go, grpc-web/JS, ...) so that developers building grpc apps with those languages can use it for richer error reporting, this effort should be publicized so that developers can take advantage of it when appropriate.

If it's good enough for Google Cloud to standardize on, why shouldn't it be documented for others to standardize on?

Carl Mastrangelo

May 23, 2019, 11:39:41 AM
to grpc.io
gRPC is not directly a Google product; it was donated to the CNCF. Google has use cases where google.rpc.Status is sufficient, but there are a LOT of other companies that have different use cases and requirements. The proto status can be impractical on Android, for instance, where protos cause significant binary bloat and a higher method count.

Also, recall that gRPC uses HTTP/2, which means trailers typically will contain the status. Proxies, logging, and other request processors would have to unpack the proto to use the fields. This makes it hard to use prebuilt tools to monitor the traffic. As an example, consider two approaches to expressing a DEADLINE_EXCEEDED scenario. In this scenario, your monitoring wants to know whether the deadline was exceeded because the server itself timed out or because one of your server's dependent backends timed out. In the first case, your server's monitoring should alert; in the second, the dependent backend's monitoring should. To express this distinction, you include either a custom field in the status proto or a custom header in your gRPC trailing Metadata. Lastly, suppose you have a proxy (like Envoy or nginx) that can see the responses of your server and fire alerts.

In the proto scenario, your proxy (or other alerter) cannot inspect the proto's fields, so it has to assume all DEADLINE_EXCEEDED errors are noteworthy. In the trailing Metadata scenario, a lot more tooling can inspect the headers and make a decision.
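
To make the trailing-Metadata side of this comparison concrete, here is a minimal sketch in Python of the decision a proxy could make. It models the trailers as a plain mapping rather than using any real proxy API, and the `x-deadline-origin` key and `should_alert` function are hypothetical names invented for illustration; neither is part of gRPC. The only facts it relies on are that the status travels in the `grpc-status` trailer and that DEADLINE_EXCEEDED is code 4.

```python
from typing import Mapping

# DEADLINE_EXCEEDED is gRPC status code 4; the grpc-status trailer carries
# the code as a decimal string.
DEADLINE_EXCEEDED = "4"

# Hypothetical custom trailer (an assumption for this sketch, not a gRPC
# standard). The server would set it to "server" or "backend".
DEADLINE_ORIGIN_KEY = "x-deadline-origin"


def should_alert(trailers: Mapping[str, str]) -> bool:
    """Return True if this server's own monitoring should fire."""
    if trailers.get("grpc-status") != DEADLINE_EXCEEDED:
        return False
    origin = trailers.get(DEADLINE_ORIGIN_KEY)
    if origin is None:
        # No plain-text hint is available. Any detail packed into a proto
        # blob (e.g. grpc-status-details-bin) is opaque to this proxy, so
        # every DEADLINE_EXCEEDED has to be treated as noteworthy.
        return True
    # Only alert when the server itself timed out; a dependent backend
    # timing out is that backend's alert to raise.
    return origin == "server"
```

With the custom trailer present, a response marked `backend` is ignored and one marked `server` alerts. Without it, the function falls back to alerting on every DEADLINE_EXCEEDED, which is the proto scenario described above.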

This scenario is a little contrived, but coming back: if we standardized on the proto, it would exclude some valid use cases, force a proto dependency, and make gRPC not protocol agnostic. Google /can/ decide for all of its code that proto is okay, but it cannot dictate that it is right for everyone. (What about the gRPC+Thrift users?)

Chris Toomey

May 24, 2019, 12:43:47 AM
to grpc.io
Thanks for your very helpful response, Carl, especially on the downsides of using google.rpc.Status.

I understand that it cannot replace io.grpc.Status, and I misspoke when I said that above. What I meant to ask for is that the use of google.rpc.Status as an alternative for appropriate use cases be documented, so that future developers learning gRPC don't see https://grpc.io/docs/guides/error/ and think either 1) they can't return error details, or 2) they have to roll their own ad hoc convention for doing so.

More specifically, it'd be super helpful to add discussion of this pseudo-standard alternative, along with the constraints and pros/cons of using it, and ideally the status of gRPC library implementations, to https://grpc.io/docs/guides/error/. Or if not that page, somewhere else that's linked to or easily discoverable by the new gRPC developer.

Chris Toomey

May 30, 2019, 12:57:39 AM
to grpc.io
Why the reluctance to publish guidance on this key aspect of gRPC API design?

Has anybody besides Google built a gRPC/protobuf API that provides rich error reporting? How did you implement it, and would you have benefitted from some published guidance when you started?

Carl Mastrangelo

May 30, 2019, 1:17:56 PM
to grpc.io
Because writing good documentation is time-consuming :) At the time I joined the gRPC team, there was an HTTP RFC in the works for standardizing on a JSON error format (it didn't pan out). Some of the original authors were exploring using it, and defining conversions between Google's own types and the "standard"-to-be.

I wouldn't mind publishing *my* experience with error handling (and I do have a lot), but it's a higher bar to speak on behalf of the rest of the team. There are some big issues that have answers that go either way:

1. Should gRPC use HTTP error codes in addition to the one from the status? Some of them seem easy enough to write (UNIMPLEMENTED -> 404). Some are more difficult (UNKNOWN -> ?? 500?).
2. What should happen with packed error values in google.rpc.Status? Should they be expanded into HTTP fields? Should they remain in the proto so processors can skip parsing?

Those are the two I know of off the top of my head, and there's a bunch of subproblems in that space that you can see gRPC *did* take a stance on (like inner error codes and messages matching the encapsulating message's error codes). Reasonable people could go either way on several of these issues, and different environments will bias people's answers. (If you work in Go exclusively, you might think stack traces in errors are wasteful; if you work in Java, you might be more accepting.)


I am not sure this stuff answers your question. I really wish the gRPC team had a mini design series talking about the technical decisions we have made over the years. But as I started with: writing takes time.

Chris Toomey

May 30, 2019, 10:57:44 PM
to grpc.io
What I was thinking of was something short and high level, added as a section to https://grpc.io/docs/guides/error/ to 1) make users aware that they're not the only ones thinking "wow, gRPC's error model is REALLY limited, how am I going to ...", and 2) mention some approaches that have been used to address this.

I'd be happy to take a stab at it and submit a PR, but something like this:

Options for Implementing Richer Error Models

The error model described above is the only official gRPC error model; it is supported by all gRPC client/server libraries and is independent of the gRPC data format (whether protocol buffers or something else). As such, it is necessarily very limited and lacks the ability to communicate error details in a standard way.

If you're using protocol buffers as your data format, however, many of the gRPC libraries now support the richer error model developed and used by Google as described here. If you're not using protocol buffers, but do want to continue supporting the standard over-the-wire gRPC error model, you could similarly use gRPC response metadata to convey error details by documenting your representation model and optionally creating helper libraries to augment the gRPC libraries and assist with producing and consuming the error details.

Some things to be aware of if you adopt such an extended error model, however, are ...
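
As a rough illustration of the metadata-based option in the draft above, the following Python sketch shows what a small helper pair for producing and consuming error details might look like. Everything named here is an assumption made for the example: the `x-error-details-bin` key, the JSON representation, and both function names are hypothetical and not part of any gRPC library. The sketch models the metadata entry as it would appear on the wire, where keys ending in `-bin` carry base64-encoded values. Real gRPC libraries generally handle that encoding themselves, so treat this as a model of the convention rather than as library usage.

```python
import base64
import json
from typing import Iterable, Optional, Tuple

# Hypothetical metadata key for this sketch. The "-bin" suffix marks a
# binary-valued entry, which is base64-encoded on the wire.
ERROR_DETAILS_KEY = "x-error-details-bin"


def encode_error_details(details: dict) -> Tuple[str, str]:
    """Server side: turn a details dict into one (key, value) metadata pair."""
    payload = json.dumps(details, sort_keys=True).encode("utf-8")
    return (ERROR_DETAILS_KEY, base64.b64encode(payload).decode("ascii"))


def decode_error_details(
    metadata: Iterable[Tuple[str, str]],
) -> Optional[dict]:
    """Client side: return the details dict, or None if the key is absent."""
    for key, value in metadata:
        if key == ERROR_DETAILS_KEY:
            return json.loads(base64.b64decode(value).decode("utf-8"))
    return None
```

A server would append the pair returned by `encode_error_details` to its trailing metadata next to the normal status, and a client would pass the received trailers to `decode_error_details`. Clients that do not know the key simply ignore it and still see the standard status code and message.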

Carl Mastrangelo

May 31, 2019, 12:49:16 PM
to grpc.io
A PR would be great. I would be happy to review it, and to find some other people to look as well.

Chris Toomey

Jun 3, 2019, 1:00:21 AM
to grpc.io