On Friday, April 5, 2013 2:17:53 PM UTC+2, Ben Noordhuis wrote:
Isaac already explained it a few posts up. I'll replicate his example
here for posterity:
function doSomething(array) {
  for (var i = 0; i < array.length; i++) {
    mightThrow(array[i]);
  }
}
In Python:
def do_something(items):
    for item in items:
        might_throw(item)
How many items have been processed after an exception? You don't know
unless you add *a lot* of error handling everywhere. That kind of
error handling is very easy to screw up and very difficult to debug.
(And you will screw up. I don't believe in infallible programmers.)
Tornado (and Python in general, async or not) are just as susceptible
to this issue as node.js is.
I agree that in your examples, in both Node and Python, the items are left in an indeterminate state if an exception is thrown.
But I have two objections/questions:
1/ If the list of items is shared global state, I agree that the state is corrupted and the best strategy is probably to restart the process. But most applications don't have much shared global state, and the code managing it is usually well reviewed. If, on the other hand, the corrupted data is attached to the current request, then there is no problem with catching the error in a domain, returning an HTTP 500 response, and returning to the event loop. Do you agree? This is what Tornado does by default. Let's say the code managing global state is very short and easy to review. In that case, most bugs will occur in the code manipulating data attached to the current request, and when an error happens there, it's perfectly OK to go on and serve the next request.
2/ Your reasoning, as I understand it, is: if there is an unexpected error, the application state may be corrupted, so we have to restart the process. This tends to suggest that "unexpected error" equals "corrupted state", which justifies the "restart the process" strategy. But it's perfectly possible for badly behaved code to silently corrupt the application state without raising any exception. In other words, restarting the process on uncaught exceptions by no means guarantees a clean state. We can distinguish three kinds of errors:
a) Unexpected errors that corrupt only the request state -> They are caught by the domain, which can safely clean up the request and response data and return to the event loop.
b) Unexpected errors that corrupt the global state and raise an exception -> They are caught by the domain and I agree that, in this situation, restarting the process is the best option.
c) Unexpected errors that *silently* corrupt global state without raising anything -> They cannot be caught by the domain error handler.
It is very difficult, maybe almost impossible, to distinguish (a) from (b) in the domain error handler. Because of this, the current official advice is to restart the Node.js process, which is the best error handling strategy for the (b) case. But the best strategy for the (a) case is to just clean up the request and response data and carry on.
I would agree that restarting the process is the best strategy if it eliminated all problems with global state. But it does not. I think that a lot of bugs with global state, maybe most, are silent and do not raise any exception (this is my (c) case). Because of this, restarting the process is just a bandage that fixes a small part of the global state issues.
We are making the most important class of errors, the (a) case, very difficult to recover from, just to incompletely fix some global state issues in the (b) case.
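To make the three cases concrete, here is a minimal sketch. It assumes a plain try/catch around a synchronous handler as a stand-in for a domain error handler, and every name in it (`ledger`, `handlers`, `serve`, `invariantHolds`) is hypothetical, not part of Node.js or of anyone's real application.

```javascript
// Shared (global) state with an invariant: total must equal the sum of entries.
const ledger = { entries: [], total: 0 };

function invariantHolds() {
  const sum = ledger.entries.reduce((acc, n) => acc + n, 0);
  return sum === ledger.total;
}

// Hypothetical request handlers, one per case.
const handlers = {
  // (a) corrupts only request-scoped data, then throws
  a(req) {
    req.scratch.push('half-built');
    throw new Error('bug in request code');
  },
  // (b) throws between the two steps of a shared-state update
  b(req) {
    ledger.entries.push(req.amount);
    throw new Error('bug in global-state code');
    // ledger.total is never updated
  },
  // (c) breaks the invariant without throwing anything
  c(req) {
    ledger.entries.push(req.amount);
    ledger.total += req.amount + 1; // off-by-one, no exception
  },
};

// Stand-in for the domain error handler: all it receives is the error.
function serve(kind, amount) {
  const req = { scratch: [], amount };
  try {
    handlers[kind](req);
    return 200;
  } catch (err) {
    return 500; // request-scoped data is simply dropped with req
  }
}

const report = [];
for (const kind of ['a', 'b', 'c']) {
  // Reset the shared state so each case is observed in isolation.
  ledger.entries.length = 0;
  ledger.total = 0;
  const status = serve(kind, 10);
  report.push({ kind, status, invariantHolds: invariantHolds() });
}
console.log(report);
```

In this sketch, (a) and (b) both surface as a 500 and the handler cannot tell them apart from the error alone, while (c) returns 200 with the invariant already broken.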
Do you agree with some part of the above reasoning?
Cheers,
Nicolas