TL;DR: I think Python gets this right.
Rust: Error shaming
I'm reading a lot of Rust code right now. A surprising fraction of it is devoted to reporting errors, and that hurts the overall efficiency of the Cloud Hypervisor project. As if error handling in Rust weren't complex enough, there's the anyhow library to help build a hierarchy of errors, where each module has the chance to wrap the error in a higher level of context. Even so, error handling is far from perfect. For example, in the api_client library, every error is handled except a response that is too short, which results in a panic; every other problem parsing a response returns an error. The funny thing is that the panic generally produces a more useful error message, since it always reports where it happened and can print a full stack trace (with RUST_BACKTRACE=1).
There are multiple problems here:
- Error handling is in the coder's face everywhere, requiring more work than I've seen in any other language.
- It slows down execution even when there are no errors.
- No stack trace is printed unless 1) the application panics, or 2) some rare custom work is done to capture and print a stack trace manually (I've not seen an example of this so far).
We all want logs to be useful, especially in a cloud context where we can't simply fire up gdb and break on the error. Rust's approach leaves responsibility for this to the programmer, resulting in what I'm calling "Error shaming". The result is far too much effort put into error handling, and the error messages still leave a lot to be desired.
Google C++
Google does not use C++ exceptions (throw/catch) in our services, which I think is a good thing. Instead, we use absl::Status and absl::StatusOr<T>, similar to Rust's Result<T, E>, and try to return an error all the way up to the Stubby RPC handler. This is similar to Rust, but I feel there is a significant difference.
We use absl::StatusCode, which has only 16 error codes (plus OK), selected after years of RPC error-handling experience. In 99% of cases, custom errors are not used, and I would argue that in 70% of the cases where they are, it is simply a random Googler kicking the tires on the custom error system. With StatusOr, we return an error message as a string, not an arbitrary type. In Rust, we return a type E, and almost always implement the Display trait for the error type so it can simply be converted to a string. I've not seen a counter-example so far.
Google's scheme still allows an error message to be wrapped: simply return a new message that includes the old. I see this all the time in log messages when there are errors handling an RPC.
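A minimal C sketch of this scheme, assuming a hypothetical `status_t` that carries one of the canonical codes plus an owned message string. The code values mirror the numbering of `absl::StatusCode`, but this is not the Abseil API, and `status_error`, `status_wrap`, `parse_port`, and `load_config` are illustrative names. Wrapping keeps the code and builds a new message that embeds the old one.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Canonical codes, numbered as in absl::StatusCode (0 is OK, 1-16 are errors). */
typedef enum {
  CODE_OK = 0, CODE_CANCELLED = 1, CODE_UNKNOWN = 2, CODE_INVALID_ARGUMENT = 3,
  CODE_DEADLINE_EXCEEDED = 4, CODE_NOT_FOUND = 5, CODE_ALREADY_EXISTS = 6,
  CODE_PERMISSION_DENIED = 7, CODE_RESOURCE_EXHAUSTED = 8,
  CODE_FAILED_PRECONDITION = 9, CODE_ABORTED = 10, CODE_OUT_OF_RANGE = 11,
  CODE_UNIMPLEMENTED = 12, CODE_INTERNAL = 13, CODE_UNAVAILABLE = 14,
  CODE_DATA_LOSS = 15, CODE_UNAUTHENTICATED = 16
} status_code;

/* Hypothetical status: a code plus a message string, nothing else. */
typedef struct {
  status_code code;
  char *message; /* NULL when code == CODE_OK */
} status_t;

static status_t status_ok(void) { return (status_t){CODE_OK, NULL}; }

/* Build an error status with a printf-style message. */
static status_t status_error(status_code code, const char *fmt, ...) {
  va_list args;
  va_start(args, fmt);
  int len = vsnprintf(NULL, 0, fmt, args); /* measure first */
  va_end(args);
  char *msg = malloc((size_t)len + 1);
  if (msg == NULL) abort();
  va_start(args, fmt);
  vsnprintf(msg, (size_t)len + 1, fmt, args);
  va_end(args);
  return (status_t){code, msg};
}

/* Wrap: keep the code, prepend context to the old message, free the old one. */
static status_t status_wrap(status_t inner, const char *context) {
  if (inner.code == CODE_OK) return inner;
  status_t outer = status_error(inner.code, "%s: %s", context, inner.message);
  free(inner.message);
  return outer;
}

/* Example: a low-level failure that gains context one level up. */
static status_t parse_port(const char *text, int *port) {
  char *end;
  long value = strtol(text, &end, 10);
  if (*text == '\0' || *end != '\0' || value < 1 || value > 65535)
    return status_error(CODE_INVALID_ARGUMENT, "bad port \"%s\"", text);
  *port = (int)value;
  return status_ok();
}

static status_t load_config(const char *port_text, int *port) {
  return status_wrap(parse_port(port_text, port), "loading config");
}
```

With these example functions, `load_config("http", &port)` yields `CODE_INVALID_ARGUMENT` with the message `loading config: bad port "http"`, which is the kind of layered message that shows up in RPC error logs.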
Google C++ error handling still leaves a lot to be desired:
- It still impacts the speed of execution even when errors do not occur
- Ugly macros litter our code, like ASSIGN_OR_RETURN, which Rust builds in as the ? operator.
However, it's not too bad. IMO, Rust could be saved from error shaming by printing stack traces from the lowest-level error, returning one of the 16 standard codes plus a string, and almost never defining custom Error types.
Java
Java's error handling is pretty good but a lot of work for the programmer. Having to declare all the checked exception types a function can throw requires merging the exception types of every function you call. In short, it is a PITA. That said, errors reported in Java typically include good log messages, often with stack traces.
Pros:
- Works well
- Doesn't lead to error shaming where every error thrown requires a custom type and custom Display traits
- Does not slow down execution when no exception is thrown
Cons:
- Too much work for the coder.
- Bloats executable size a bit
Python
I do not have Google Python readability, and my views on Python come from the open source projects I've worked on. Python does not require declaring error types, and for the most part, folks simply raise exceptions that carry a string message and don't write custom __str__ or __repr__ methods for their custom error classes. I would say a healthy culture exists here: we all hate Python functions that raise exotic errors we can't simply catch and print. It goes against the spirit of Python to require the user to understand complex details of dependencies like that.
Rune
Rune's error handling is TBD. We have the throw keyword, but so far, no catch. Instead, throw always panics and currently doesn't even print a stack trace. I would like error handling in Rune to:
- Be lightweight in terms of how much code users need to write
- Be lightweight in terms of impact on runtime
- Automatically include stack trace info, even when catching errors
- Use the standard absl::StatusCode values by default when users call throw with a status code
- Only allow an error message, not an arbitrary class. I never felt constrained when using Google C++ to return StatusOr.
- Allow for alternative status codes for those rare cases where it is appropriate, e.g. returning an HTTP response status code.
- Automatically catch errors in RPC handlers so errors are propagated to the caller.
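A minimal C sketch of a few of these goals using setjmp/longjmp: a throw carries only a code and a message, an opt-in code space allows alternatives such as HTTP status codes, and a generated RPC dispatcher runs each handler under an implicit catch so the error is returned to the caller. The names (`rune_error`, `code_space`, `rune_throw`, `dispatch_rpc`, `echo_handler`) are hypothetical, not Rune's actual runtime, and stack-trace capture is left out.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical code spaces: canonical (absl-style numbering) by default,
   with an opt-in alternative such as HTTP status codes. */
typedef enum { SPACE_CANONICAL, SPACE_HTTP } code_space;

/* A thrown error carries a code and a message, never an arbitrary object. */
typedef struct {
  code_space space;
  int code; /* 0 means OK in the canonical space */
  char message[256];
} rune_error;

static jmp_buf *current_catch;   /* innermost active catch point, or NULL */
static rune_error current_error; /* payload of the in-flight throw */

/* throw: record the error and jump to the innermost catch point. */
static void rune_throw(code_space space, int code, const char *message) {
  if (current_catch == NULL) { /* nothing catches it: behave like a panic */
    fprintf(stderr, "uncaught error %d: %s\n", code, message);
    abort();
  }
  current_error.space = space;
  current_error.code = code;
  snprintf(current_error.message, sizeof current_error.message, "%s", message);
  longjmp(*current_catch, 1);
}

typedef void (*rpc_handler)(const char *request, char *response, size_t size);

/* Generated dispatcher: every handler runs under an implicit catch, so a
   throw becomes an error returned to the caller instead of a crash. */
static rune_error dispatch_rpc(rpc_handler handler, const char *request,
                               char *response, size_t size) {
  jmp_buf here;
  jmp_buf *saved = current_catch;
  rune_error result = {SPACE_CANONICAL, 0, ""};
  current_catch = &here;
  if (setjmp(here) == 0) {
    handler(request, response, size);
  } else {
    result = current_error; /* the handler threw */
  }
  current_catch = saved; /* restore the enclosing catch point */
  return result;
}

/* Example handler: 3 is INVALID_ARGUMENT in the canonical numbering. */
static void echo_handler(const char *request, char *response, size_t size) {
  if (request[0] == '\0') rune_throw(SPACE_CANONICAL, 3, "empty request");
  snprintf(response, size, "echo: %s", request);
}
```

Because the catch lives in the dispatcher, handler authors never write it; they just throw, and the caller receives a code and a string.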
Implementing catch can be done a number of ways. Unwinding the Rune stack requires only:
- freeing of temporary dynamic arrays
- decrementing reference counts of ref counted objects
Freeing temporary arrays can be almost free. Instead of allocating them on the stack, we can have a separate stack just for temp arrays. Each try statement would save the temp-array stack position on the stack, and when catching an error, free all the temp arrays allocated after the try statement was executed. I think this would have zero, or maybe even a positive, impact on runtime speed.
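A minimal C sketch of the separate temp-array stack, assuming temporaries never outlive the scope that allocated them; `temp_alloc`, `try_mark`, and `temp_unwind_to` are hypothetical names, and the fixed capacity is arbitrary. A try costs one saved index, and a catch frees everything above it.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stack holding only temporary dynamic arrays. */
#define TEMP_STACK_MAX 1024

static void *temp_stack[TEMP_STACK_MAX];
static size_t temp_top = 0; /* number of live temp arrays */

/* Allocate a temporary array and record it on the temp stack. */
static void *temp_alloc(size_t bytes) {
  assert(temp_top < TEMP_STACK_MAX);
  void *p = malloc(bytes);
  if (p == NULL) abort();
  temp_stack[temp_top++] = p;
  return p;
}

/* A try statement only needs to remember the current stack position. */
static size_t try_mark(void) { return temp_top; }

/* On catch, free every temp array allocated after the saved mark. */
static void temp_unwind_to(size_t mark) {
  while (temp_top > mark) free(temp_stack[--temp_top]);
}
```

The same `temp_unwind_to` call could also run at normal scope exit, so the error path and the success path would share one cleanup routine.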
Decrementing ref counts is trickier. The most efficient way in terms of runtime is to define "landing pads" for stack unwinding in every function that references ref-counted objects. This won't work for a C/C++ backend code generator, it bloats the binary for the LLVM IR backend, and the extra complexity is not clearly worth it, because references to ref-counted objects should be rare in Rune.
An alternative to landing pads is a stack of object references and their unref functions. This would be compatible with a C/C++ backend where we use setjmp and longjmp to implement catch and throw. Good Rune data models should have almost no ref counted classes.
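A minimal C sketch of the unref stack with setjmp/longjmp, assuming a hypothetical `object` header with a plain reference count; `push_ref`, `pop_ref`, `unref_unwind_to`, `throw_error`, `use_and_fail`, and `try_use` are illustrative names, not Rune's runtime. Each try saves the unref-stack height, and the catch path calls the recorded unref function for every reference the unwound frames still held.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical ref-counted object header. */
typedef struct {
  int refs;
} object;

typedef void (*unref_fn)(object *);

/* Stack of (object, unref function) pairs for references held by frames. */
#define UNREF_STACK_MAX 1024
static struct { object *obj; unref_fn unref; } unref_stack[UNREF_STACK_MAX];
static size_t unref_top = 0;

static void push_ref(object *obj, unref_fn unref) {
  assert(unref_top < UNREF_STACK_MAX);
  unref_stack[unref_top].obj = obj;
  unref_stack[unref_top].unref = unref;
  unref_top++;
}

/* Drop the most recently pushed reference. */
static void pop_ref(void) {
  unref_top--;
  unref_stack[unref_top].unref(unref_stack[unref_top].obj);
}

/* Catch path: release every reference pushed after the try's mark. */
static void unref_unwind_to(size_t mark) {
  while (unref_top > mark) pop_ref();
}

static jmp_buf *catch_point; /* innermost try */

static void throw_error(void) { longjmp(*catch_point, 1); }

/* Example unref: just decrements (a real one would free at zero). */
static void counter_unref(object *obj) { obj->refs--; }

/* Takes a reference, then throws before it can release it normally. */
static void use_and_fail(object *obj) {
  obj->refs++;
  push_ref(obj, counter_unref);
  throw_error(); /* pop_ref() never runs on this path */
}

/* try { use_and_fail(obj); } catch { ... }: returns 1 if an error was caught. */
static int try_use(object *obj) {
  jmp_buf here;
  jmp_buf *saved = catch_point;
  size_t mark = unref_top; /* try saves the unref stack position */
  int caught = 0;
  catch_point = &here;
  if (setjmp(here) == 0) {
    use_and_fail(obj);
  } else {
    unref_unwind_to(mark); /* release references the callee still held */
    caught = 1;
  }
  catch_point = saved;
  return caught;
}
```

In `try_use`, an object that starts with one reference gains a second in `use_and_fail`, which then throws; the catch releases it, so the count returns to one and the unref stack is back at its saved height.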
Bill