proposal: codify a common set of error codes + request/response headers for all services

Mike Connor

Mar 13, 2012, 10:56:45 AM

to servic...@mozilla.org

Over the last few years of running Sync, there's a bunch of things we've learned about running REST-like services at scale, especially around having a clear, well-supported protocol with predictable behaviours baked into the API and clients. Now that we're starting to build out multiple services (between AITC, Sync 2.0, Token Server, and more this year), I would like to define a standard core set of errors and headers common to all services. This isn't meant to limit what an API can return, but to provide a reasonably strong toolbox for Ops, and a predictable, stable set of interactions with our services for developers.

I took a stab at modifying what we have in the Sync 2.0 draft into a generic base we can apply to Sync, AITC, and the token server (along with some of the "coming soon" services).

https://services.etherpad.mozilla.org/common-api-elements

-- Mike
_______________________________________________
Services-dev mailing list
Servic...@mozilla.org
https://mail.mozilla.org/listinfo/services-dev

Ryan Kelly

Mar 13, 2012, 11:37:01 AM

to servic...@mozilla.org

On 13/03/12 07:56, Mike Connor wrote:
> Over the last few years of running Sync, there's a bunch of things we've learned about running REST-like services at scale, especially around having a clear, well-supported protocol with predictable behaviours baked into the API and clients. Now that we're starting to build out multiple services (between AITC, Sync 2.0, Token Server, and more this year), I would like to define a standard core set of errors and headers common to all services. This isn't meant to limit what an API can return, but to provide a reasonably strong toolbox for Ops, and a predictable, stable set of interactions with our services for developers.
>
> I took a stab at modifying what we have in the Sync 2.0 draft into a generic base we can apply to Sync, AITC, and the token server (along with some of the "coming soon" services).
>
> https://services.etherpad.mozilla.org/common-api-elements

Awesome. I also think we should revise the list of "400 bad request"
status codes from here:

http://docs.services.mozilla.com/respcodes.html#respcodes

It has a lot of weave-specific codes but doesn't offer much to do in the
case of generic errors like "some data failed to parse" or "you sent too
many items".

Perhaps we can bake in the more expressive error response format from
cornice rather than a single integer code?

Ryan

Mike Connor

Mar 15, 2012, 3:49:32 PM

to Ryan Kelly, servic...@mozilla.org

On 2012-03-13, at 11:37 AM, Ryan Kelly wrote:

> On 13/03/12 07:56, Mike Connor wrote:
>> Over the last few years of running Sync, there's a bunch of things we've learned about running REST-like services at scale, especially around having a clear, well-supported protocol with predictable behaviours baked into the API and clients. Now that we're starting to build out multiple services (between AITC, Sync 2.0, Token Server, and more this year), I would like to define a standard core set of errors and headers common to all services. This isn't meant to limit what an API can return, but to provide a reasonably strong toolbox for Ops, and a predictable, stable set of interactions with our services for developers.
>>
>> I took a stab at modifying what we have in the Sync 2.0 draft into a generic base we can apply to Sync, AITC, and the token server (along with some of the "coming soon" services).
>>
>> https://services.etherpad.mozilla.org/common-api-elements
>
> Awesome. I also think we should revise the list of "400 bad request" status codes from here:
>
> http://docs.services.mozilla.com/respcodes.html#respcodes

Yeah, let's do that, and pull those into the same place as the headers/HTTP codes.

I'm wondering if it's viable/useful to describe some sort of "namespacing" of service-specific error codes that get used. That might be trying too hard to avoid the badly-written-client problem, but it's worth considering for all of two minutes, I think.
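
One way the namespacing idea could look, as a rough sketch only: each service owns a prefix, and anything shared lives in a common namespace. The registry, the `qualified` helper, and every code name below are hypothetical placeholders, not part of any existing service.

```python
# Hypothetical registry: each service owns a prefix, so codes cannot collide.
COMMON = "common"

REGISTRY = {
    COMMON: {"malformed-json", "invalid-header"},
    "sync": {"invalid-collection"},
    "aitc": {"unknown-app"},
}


def qualified(service, name):
    """Return 'service.name', falling back to the common namespace."""
    if name in REGISTRY.get(service, set()):
        return f"{service}.{name}"
    if name in REGISTRY[COMMON]:
        return f"{COMMON}.{name}"
    # Anything unregistered is rejected rather than silently passed through.
    raise KeyError(f"unregistered error code: {service}.{name}")
```

A client written against one service would then at least see an unfamiliar prefix, instead of misreading another service's code as its own.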

> It has a lot of weave-specific codes but doesn't offer much to do in the case of generic errors like "some data failed to parse" or "you sent too many items".
>
> Perhaps we can bake in the more expressive error response format from cornice rather than a single integer code?

I like the idea in theory, but what is our goal here? The Cornice error format is text-driven, and human readable. I think that's probably overkill/bloat, or useless, in this case, but I'm open to persuasion. I suppose the question is "what additional information is necessary or useful for the vast majority of error responses?" I can certainly see room for something like "verbose=1" as a developer option to drop detailed debugging information, but for the default case I'd optimize for small, simple, and free from any sort of data format overhead.
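
To make the "small by default, detailed on request" option concrete, here is a minimal sketch. It assumes the `verbose=1` flag has already been parsed into a boolean; the function name, the field names, and the code value 6 are placeholders for illustration.

```python
import json


def error_body(code, detail=None, verbose=False):
    """Serialize an error: a bare integer by default, a JSON object when verbose.

    `code`, `detail`, and `verbose` are hypothetical names for illustration.
    """
    if not verbose:
        # Default case: smallest possible payload, just the numeric code.
        return json.dumps(code)
    # Developer option (e.g. ?verbose=1): include debugging detail.
    return json.dumps({"code": code, "detail": detail}, sort_keys=True)


compact = error_body(6)
detailed = error_body(6, "JSON parse failure", verbose=True)
```

The default path does no text formatting at all, so the extra cost is only paid by the developer who asks for it.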

-- Mike

Toby Elliott

Mar 15, 2012, 4:25:23 PM

to Mike Connor, servic...@mozilla.org

I would look to get rid of the response code list entirely. The vast majority of those response codes are very old-sync specific. Several have been replaced by better http status codes, several are things we don't care about any more, and several are really specific things that we threw in there later because there was nowhere better to put them at the time.

I went through them a while back and I think there were maybe 3 left after we filtered out everything that was being handled better elsewhere. For the few 400-block responses remaining, I'd rather see a brief JSON blob more akin to an exception (generic type and specific details) rather than arbitrary numbers. They're supposed to be pretty rare at this point.
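
As a rough illustration of the "brief JSON blob more akin to an exception" idea, the sketch below pairs a generic type with specific details. The field names `type` and `details`, the helper name, and the example values are all assumptions, not an agreed format.

```python
import json


def error_blob(error_type, details):
    """Build a small exception-like error body: a generic type plus specifics."""
    return json.dumps({"type": error_type, "details": details}, sort_keys=True)


# Example: a hypothetical body for a 400 response caused by an oversized batch.
body = error_blob("too-many-items", {"limit": 100, "received": 250})
```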

Toby

Ryan Kelly

Mar 15, 2012, 4:33:58 PM

to Toby Elliott, servic...@mozilla.org

On 15/03/12 13:25, Toby Elliott wrote:
> On Mar 15, 2012, at 12:49 PM, Mike Connor wrote:
>> On 2012-03-13, at 11:37 AM, Ryan Kelly wrote:
>>
>>> Awesome. I also think we should revise the list of "400 bad request" status codes from here:
>>>
>>> It has a lot of weave-specific codes but doesn't offer much to do in the case of generic errors like "some data failed to parse" or "you sent too many items".
>>>
>>> Perhaps we can bake in the more expressive error response format from cornice rather than a single integer code?
>>
>> I like the idea in theory, but what is our goal here? The Cornice error format is text-driven, and human readable.

I will admit to one of my goals being "use cornice for input
validation". Which isn't necessarily a good reason for changing the
details of a protocol...

> I would look to get rid of the response code list entirely. The vast majority of those response codes are very old-sync specific. Several have been replaced by better http status codes, several are things we don't care about any more, and several are really specific things that we threw in there later because there was nowhere better to put them at the time.
>
> I went through them a while back and I think there were maybe 3 left after we filtered out everything that was being handled better elsewhere. For the few 400-block responses remaining, I'd rather see a brief JSON blob more akin to an exception (generic type and specific details) rather than arbitrary numbers. They're supposed to be pretty rare at this point.

The remaining 400-Bad-Request cases I can think of are things like
"badly-formed JSON", "invalid header value", and "client upgrade required".

One could argue that there's not much a machine could do with these
errors, since they indicate serious implementation problems. So more
human-friendly output might be better.

Ryan

Toby Elliott

Mar 15, 2012, 4:45:32 PM

to Ryan Kelly, servic...@mozilla.org

Badly-formed JSON and invalid headers are 400s, and adding a code for those specific problems seems not terribly useful. Someone who actually has the power to fix this (which is basically the developer) will want a response with more details. For everyone else, it doesn't matter.

I'm not actually sure how we plan to use "client upgrade required" (it's not in the code anywhere), but my best guess is that would just be a 415.

Toby

Gregory Szorc

Mar 15, 2012, 5:03:57 PM

to servic...@mozilla.org

On Mar 15, 2012, at 12:49 PM, Mike Connor wrote:
> I like the idea in theory, but what is our goal here? The Cornice
> error format is text-driven, and human readable.

On 3/15/12 1:33 PM, Ryan Kelly wrote:
> One could argue that there's not much a machine could do with these
> errors, since they indicate serious implementation problems. So more
> human-friendly output might be better.

My personal best practice for error reporting is to *always* include 2
values: 1 that is easily machine readable and 1 that is meant for
humans. The two don't necessarily have to be isomorphic, but there is
definitely a correlation between them.

The value intended for machines should be strongly enumerated. It can be
an integer or a string constant. Since it is intended for machines, it
really doesn't matter: it just needs to be centrally defined and not
prone to change. I prefer integers because they are much less expensive
than strings. And, clients can always locally map an integer to a short
string constant.

The value intended for humans can be whatever you want. And, the format
can change over time because machines don't care: they shouldn't be
reading it. If you need metadata in your error, I typically wrap that up
inside a machine-readable data structure, like a JSON object. That can
exist as a 3rd error value component. Or, you can just put the
made-for-humans error message inside that object.

This approach is pretty common. Just look at the first line of an HTTP
response. See also many programming languages where exceptions are
defined by both their class/type and content/message within. Think of
service errors as representations of exceptions. How would a client
handle that exception?
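
The two-value approach described above (plus the optional third metadata component) might be sketched like this. The `ErrorCode` values, the `ServiceError` class, and the `handle` function are hypothetical names chosen for illustration; the integer values are placeholders, not proposed assignments.

```python
import json
from dataclasses import dataclass, field, asdict
from enum import IntEnum


class ErrorCode(IntEnum):
    # Hypothetical, centrally defined codes; the values are placeholders.
    MALFORMED_JSON = 1
    INVALID_HEADER = 2
    TOO_MANY_ITEMS = 3


@dataclass
class ServiceError:
    code: ErrorCode   # stable value intended for machines
    message: str      # free-form text for humans; may change over time
    metadata: dict = field(default_factory=dict)  # optional third component

    def to_json(self):
        # IntEnum members serialize as plain integers.
        return json.dumps(asdict(self), sort_keys=True)


def handle(raw):
    """Client side: branch on the code only, never on the message text."""
    err = json.loads(raw)
    code = ErrorCode(err["code"])
    if code is ErrorCode.TOO_MANY_ITEMS:
        return "split batch and retry"
    return "report to developer: " + code.name


raw = ServiceError(
    ErrorCode.TOO_MANY_ITEMS, "You sent too many items", {"limit": 100}
).to_json()
action = handle(raw)
```

Because the client maps the integer back to a local name, the wire format stays cheap while logs and bug reports can still show a readable constant.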

Mike Connor

Mar 23, 2012, 10:27:10 PM

to Gregory Szorc, servic...@mozilla.org

On 2012-03-15, at 2:03 PM, Gregory Szorc wrote:

> On Mar 15, 2012, at 12:49 PM, Mike Connor wrote:
>> I like the idea in theory, but what is our goal here? The Cornice error format is text-driven, and human readable.
> On 3/15/12 1:33 PM, Ryan Kelly wrote:
>> One could argue that there's not much a machine could do with these errors, since they indicate serious implementation problems. So more human-friendly output might be better.
> My personal best practice for error reporting is to *always* include 2 values: 1 that is easily machine readable and 1 that is meant for humans. The two don't necessarily have to be isomorphic, but there is definitely a correlation between them.

Have we suffered in any significant way for the lack of human-readable error strings? Do we think it's worth the overhead of doing text formatting and larger responses for the handful of developers who will ever look at the raw response? Is this perhaps a case we should handle via a debug flag?

-- Mike
