Error handling guidelines.

Philippe Veber

May 25, 2014, 5:13:00 PM
to Biocaml
Hi everyone,

I think it would be good to extract from all our discussions on error handling a short document, perhaps keeping motivations for an annex. I would like to propose a first version.




Errors occur when a function is called outside of its domain of definition, i.e. with arguments that do not satisfy its preconditions. If a precondition of a function M.foo : u -> v is violated:

1. error handling should be based on exceptions or the result monad defined in core.
2. if the precondition is admittedly easy to check before calling the function, raise a generalist exception like Failure, Assert_failure or Invalid_argument.
3. on the contrary if verifying the precondition of the function is roughly as difficult as what the function does, provide two versions of the function
  - foo : u -> (v, e) Result.t, where e is a type describing the error
  - foo_exn : u -> v that raises an exception E of e
4. the type e and the exception E should be defined in a module M.Error

Examples:
  - parsers are functions that assume a syntactically correct input, however this cannot be checked without parsing the input: apply rule 3
  - finding an element in a collection assumes this element is in the collection. But checking that precondition is exactly what the function is about: apply rule 3
  - building an array of negative length is not possible, but checking if an integer is positive is trivial and should be done before: apply rule 2
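
As an illustration of rules 2 to 4, here is a minimal sketch using only the standard library's result type in place of Core's Result.t. The Interval module, its "lo-hi" syntax and its error constructors are hypothetical and are not part of Biocaml.

```ocaml
(* Hypothetical module illustrating rules 2-4; names are not from Biocaml. *)
module Interval = struct
  module Error = struct
    (* Rule 4: the error type and its exception live in M.Error. *)
    type t = Malformed of string | Empty_input

    exception E of t
  end

  type t = { lo : int; hi : int }

  (* Rule 2: the precondition lo <= hi is trivial to check beforehand,
     so violating it raises a general-purpose exception. *)
  let make lo hi =
    if lo > hi then invalid_arg "Interval.make: lo > hi";
    { lo; hi }

  (* Rule 3: whether a string is well formed is only known by parsing it,
     so the default version returns a result. Expected syntax: "lo-hi". *)
  let of_string s =
    if s = "" then Error (Error.Empty_input)
    else
      match String.split_on_char '-' s with
      | [ a; b ] -> (
          match (int_of_string_opt a, int_of_string_opt b) with
          | Some lo, Some hi when lo <= hi -> Ok { lo; hi }
          | _ -> Error (Error.Malformed s))
      | _ -> Error (Error.Malformed s)

  (* Rule 3: the _exn variant carries the same error value in an exception. *)
  let of_string_exn s =
    match of_string s with Ok i -> i | Error e -> raise (Error.E e)
end
```

A caller who wants detailed recovery matches on the result of Interval.of_string, while a quick script can call Interval.of_string_exn and let the exception escape.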

It roughly describes the current situation, except for rule 4. It is however not in sync with the contents of the wiki page [1], which presents the result monad as the default error handling mechanism in biocaml. To be honest, I'm not comfortable with this position, because it more or less implies a monadic programming style instead of exceptions, which is still a debated question in the OCaml community. Moreover we should not forget that biocaml is geared towards a community where people very seldom know of functional programming, not to mention monads. Keeping the coding style simple and accessible for newcomers should be a primary concern. Asking newcomers to FP/OCaml to grasp monadic programming at the same time is too much IMO.

How about this?

[1] https://github.com/biocaml/biocaml/wiki/Error-Handling

Ashish Agarwal

May 25, 2014, 5:55:11 PM
to Biocaml
Hi Philippe. Thanks for getting this discussion started. My comments below.


1. error handling should be based on exceptions or the result monad defined in core.

I agree.

 
2. if the precondition is admittedly easy to check before calling the function, raise a generalist exception like Failure, Assert_failure or Invalid_argument.

I think you're saying in this case we should have only an exception-ful version. Why should it differ from 3, where you recommend foo returns Result and foo_exn throws an exception?

Also, what is the argument for using a general exception? I think you're right, but let's state the reasoning. I think it is: if you're raising an exception, you already decided not to care about error handling (else, assuming you accept my previous point, you would have used the version returning Result). Thus, there's little point in defining a special exception. Just make the error string informative.


3. on the contrary if verifying the precondition of the function is roughly as difficult as what the function does, provide two versions of the function
  - foo : u -> (v, e) Result.t, where e is a type describing the error
  - foo_exn : u -> v that raises an exception E of e

I agree.


4. the type e and the exception E should be defined in a module M.Error

I've spent quite a lot of time now playing with different error types in large code bases. I'm really converging on using Or_error. This seems the best compromise, and was promoted as the good choice by Jane Street recently:


Using stronger types, like polymorphic variants, is very cumbersome. I tried it for months. I had fun, but I'm getting tired of it. Most programmers would run scared from it.

Minor point: I don't think exceptions have to be defined in a sub-module M.Error. They can just go directly in M. If we also decided Or_error is the way to go, then we don't have any special error types, and thus M.Error goes away completely.


wiki page [1], which presents the result monad as the default error handling mechanism in biocaml. To be honest, I'm not comfortable with this position, ... biocaml is geared towards a community where people very seldom know of functional programming

I agree. I hope to be proven wrong, but I doubt most bioinformaticians will ever learn what a monad is. It's rather easy to provide both versions of a function, so I don't see any problem supporting both beginners and experienced FP programmers. I'm happy with our current solution of providing foo and foo_exn, and it is easy enough to provide both versions.

Philippe Veber

May 26, 2014, 4:47:48 AM
to Biocaml
2014-05-25 23:54 GMT+02:00 Ashish Agarwal <agarw...@gmail.com>:
Hi Philippe. Thanks for getting this discussion started. My comments below.


1. error handling should be based on exceptions or the result monad defined in core.

I agree.

 
2. if the precondition is admittedly easy to check before calling the function, raise a generalist exception like Failure, Assert_failure or Invalid_argument.

I think you're saying in this case we should have only an exception-ful version. Why should it differ from 3, where you recommend foo returns Result and foo_exn throws an exception?
It seems that even extensive users of the result monad (I mean Jane Street here) still use exceptions. For instance, as I said in an earlier thread on the subject [1], the Array module in core is not redefined as:

module Array : sig
  val make : int -> 'a -> ('a array, [`negative_array_length]) Result.t
  ...
end

which for me means that even people who are not big fans of exceptions still use them for a certain purpose. Being myself in favor of using exceptions all the time (mainly because OCaml has good support for them [2]), it's not clear to me how they draw the line. I just tried to propose a criterion which I found relevant: use result types whenever you think a precondition is difficult to check (because users will tend not to check it, as it's difficult).

[1] https://groups.google.com/forum/#!topic/biocaml/ObTjYkC_sSg
[2] let's say, except for not including exceptions in the type of a function. BTW, does anyone know if this would be compatible with type inference?


 

Also, what is the argument for using a general exception? I think you're right, but let's state the reasoning. I think it is: if you're raising an exception, you already decided not to care about error handling (else, assuming you accept my previous point, you would have used the version returning Result). Thus, there's little point in defining a special exception. Just make the error string informative.
You're right that this is not very convincing. I just thought that preconditions that are easy to check tend to be simple predicates on arguments that maybe do not deserve to define a specific exception for that. But that may not be always the case.

 


3. on the contrary if verifying the precondition of the function is roughly as difficult as what the function does, provide two versions of the function
  - foo : u -> (v, e) Result.t, where e is a type describing the error
  - foo_exn : u -> v that raises an exception E of e

I agree.


4. the type e and the exception E should be defined in a module M.Error

I've spent quite a lot of time now playing with different error types in large code bases. I'm really converging on using Or_error. This seems the best compromise, and was promoted as the good choice by Jane Street recently:


 I had read that blog post, but was not convinced at all. With monadic style, using result type is after all very close to exception handling: you can ignore an error all along an algorithm and decide precisely where you want to deal with it. For example,

match
  f1 () >>= fun x ->
  f2 x >>= fun y ->
  f3 y
with
| Ok z -> z
| Error e -> g e

is rather similar to

try f3 (f2 (f1 ()))
with Error e -> g e

The only difference I see between exceptions and result types is that *when* you decide to handle errors, the compiler can do an exhaustivity check with result types while it can't with exceptions. But first, as I mentioned earlier, even in core exceptions are still used, so you still have to consider them somehow. And second, if the error type in the result is always a string, then you don't need an exhaustivity check. For those reasons, I fail to see the difference between using exceptions and Or_error.t. If a function returns Or_error.t, you are just saying that it can fail with a string, but just as with exceptions, you can ignore this possibility with >>= until you decide to handle the error, which is a string. How is that different from having the function raise Failure in case of an error, and ignoring this possibility until you decide to set up a try .. with expression?

Of course if the error type is rich then yes, result types are superior to exceptions in terms of security, because when you decide to handle the error, you can be sure to treat all cases (once again, you can't be sure that there won't be exceptions also). But this behaviour could very well be emulated: result based functions suppose no exception will be raised, and this cannot be checked by the compiler; another convention could be that a given function can raise only one exception, and then you can again rely on the exhaustivity check with the argument of the exception:

try f x
with E_f e -> match e with
| ...
| ...
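
To make the convention above concrete, here is a small sketch in which a function is documented to raise a single exception whose argument is a variant. The names lookup, E_lookup and lookup_error are hypothetical. The compiler does not check that no other exception escapes, but the inner match is checked for exhaustiveness.

```ocaml
(* Hypothetical example: [lookup] promises to raise only [E_lookup]. *)
type lookup_error = Not_found_key of string | Empty_key

exception E_lookup of lookup_error

let lookup table key =
  if key = "" then raise (E_lookup Empty_key)
  else
    match List.assoc_opt key table with
    | Some v -> v
    | None -> raise (E_lookup (Not_found_key key))

(* The match on [e] is checked for exhaustiveness: removing one of the two
   cases below makes the compiler report a non-exhaustive match, even though
   the outer [try] itself is not checked. *)
let describe table key =
  try string_of_int (lookup table key)
  with E_lookup e -> (
    match e with
    | Not_found_key k -> "no binding for " ^ k
    | Empty_key -> "empty key")
```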

Am I overlooking something here?



Using stronger types, like polymorphic variants, is very cumbersome. I tried it for months. I had fun, but I'm getting tired of it. Most programmers would run scared from it.
I do believe you, and my guess is that people at Jane Street have come to the same conclusion; for that reason the Or_error type looks to me like a methodological retreat. Disclaimer: I deal with a ridiculously small codebase, so I'm probably playing out of my league with those judgements :o)). Let's say that I do not see the point here. I think the fundamental problem is that result type approaches are trying to combine two flows of information in the type system: computed values and errors. However, as classically recognized, error handling is often done several levels above the original location of the error. Which means the two flows follow a rather different "network", and while this is by construction not a problem for exceptions, this results in a lot of plumbing for result monads. Notably, in addition to writing your algorithm, you also have to combine error flows, meaning you're doing two things at the same time. Monadic notation makes it easier, but this is still complicating things, especially when you have to combine rich error types.

So it seems to me that using Or_error here would defeat the primary purpose of using result monads. Using rich error types is certainly heavy with result types, but is it when using exceptions?
 

Minor point: I don't think exceptions have to be defined in a sub-module M.Error. They can just go directly in M. If we also decided Or_error is the way to go, then we don't have any special error types, and thus M.Error goes away completely.

If we keep rich error types, it is better to have them grouped in a submodule, and it seems logical to define each corresponding exception next to it. Of course if we don't, none of this submodule would be useful anymore, and we could drop M.Error completely.

 


wiki page [1], which presents the result monad as the default error handling mechanism in biocaml. To be honest, I'm not comfortable with this position, ... biocaml is geared towards a community where people very seldom know of functional programming

I agree. I hope to be proven wrong, but I doubt most bioinformaticians will ever learn what a monad is.
Let's keep that for stage 2 of our world domination plan ;o).
 
It's rather easy to provide both versions of a function, so I don't see any problem supporting both beginners and experienced FP programmers. I'm happy with our current solution of providing foo and foo_exn, and it is easy enough to provide both versions.

Yes and my point here is mainly a matter of presentation. Ideally we should come up with simple snippets illustrating the features of biocaml, and I thought avoiding monadic notation would be best. The error handling page on the wiki (which has admittedly been written a long time ago) says that
"Now that the result type is the default, client code (within Biocaml and externally) will essentially have to be monadic." and I really think we should not keep this position. To be even more of an extremist, I'd argue that result types are useful *only* when they are used sparingly enough that you don't have to resort to monadic notation. Because then, they are a great tool to stress places where detailed error handling is necessary (cf the prototypical Map.find example).
 

--
You received this message because you are subscribed to the Google Groups "biocaml" group.
To unsubscribe from this group and stop receiving emails from it, send an email to biocaml+u...@googlegroups.com.
To post to this group, send email to bio...@googlegroups.com.
Visit this group at http://groups.google.com/group/biocaml.
To view this discussion on the web visit https://groups.google.com/d/msgid/biocaml/CAMu2m2Jng8Am%3DxHsLGB1xy3Ec5i8XucR1nxubCeqnTUE9Tgj6A%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Ashish Agarwal

May 26, 2014, 9:31:41 AM
to Biocaml
I think we can separate out these two issues:

A) Whether to raise exceptions or return Result types.
B) What error information should be carried by either the exception or Result type.

Your argument about A) is that the Result monad doesn't provide any enhanced flow control over exceptions. In both cases, you mostly ignore the error and can look at it if you want to. I mostly agree with this, but one difference is that the Result type gives you compiler enforced documentation about which functions return errors. I think that's the only difference; do you agree or have I missed something? If so, the question is how important is that?

The second item B) is orthogonal to the decision on A). With either exceptions or Result types, we can use plain strings or highly precise types. Highly precise types are more difficult to think of and maintain, thus slowing down development. Strings are not amenable to matching, so keep us worried about safety. I think there's a middle ground that gives us a simple type and safety; Core's Error.t. When you're being lazy and want to rapidly prototype some code, you can make an Error.t from a plain string. When you want more precise information, you can easily construct an Error.t from a complex data value (by using sexp). If you need to handle the error, you can deserialize the sexp and get back your structured value. And apparently, constructing Error.t's is more efficient than constructing strings outright because your strings are constructed lazily, only if the error is ever looked at.
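
The laziness point can be illustrated with a toy model. This is not Core's Error.t, only a sketch of the idea using the standard library's Lazy module. The names error, create, to_string and checked_div are hypothetical, and the rendered counter exists only to show when the message is actually built.

```ocaml
(* Toy stand-in for the idea behind Core's Error.t; not Core's actual code. *)
type error = string Lazy.t

(* Counts how many messages were actually rendered, for demonstration. *)
let rendered = ref 0

(* Quick prototyping: build an error from a plain string. *)
let of_string (s : string) : error = Lazy.from_val s

(* Richer error: keep the value and its printer, render only on demand. *)
let create (msg : string) (value : 'a) (show : 'a -> string) : error =
  lazy (incr rendered; msg ^ ": " ^ show value)

let to_string (e : error) : string = Lazy.force e

let checked_div a b =
  if b = 0 then Error (create "division by zero, numerator" a string_of_int)
  else Ok (a / b)
```

If the caller discards the error without looking at it, the message string is never built; forcing it twice builds it once.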

Your point about raising exceptions if the error can easily be checked has finally sunk in. Thanks for explaining it again. I agree with this. So many functions can raise an error, and we really don't want to rename all of them to _exn. But then, for consistency, why not reverse our naming convention? How about removing the _exn suffix and instead adding _res versions of functions that return Result? This would be much nicer for demos and newcomers. We really don't want to demo a script that has _exn all over the place, and, though a trivial issue, this kind of syntactic dirtiness is what keeps less experienced programmers away from otherwise good languages.







Philippe Veber

May 26, 2014, 12:19:59 PM
to Biocaml
2014-05-26 15:31 GMT+02:00 Ashish Agarwal <agarw...@gmail.com>:
I think we can separate out these two issues:
I mostly agree, except the particular case that when using both Result types and a rich error type, the exhaustivity check helps you spot unhandled cases, giving you enhanced security (but at some syntactic cost). In that case, there is some synergy which provides more than in other combinations.


 

A) Whether to raise exceptions or return Result types.
B) What error information should be carried by either the exception or Result type.
ok!
 

Your argument about A) is that the Result monad doesn't provide any enhanced flow control over exceptions. In both cases, you mostly ignore the error and can look at it if you want to. I mostly agree with this, but one difference is that the Result type gives you compiler enforced documentation about which functions return errors. I think that's the only difference; do you agree or have I missed something? If so, the question is how important is that?

I agree with your summary. My point here is that:
- with Or_error, you don't say much, just that the function may fail, which is anyway the case of a large fraction of the functions we use everyday
- yes the error is documented in the code, but at the expense of verbosity and readability, while with monadic notation you will silently ignore errors most of the time.
- the compiler does not check if an exception can pop out anyway (even in Core, many errors are handled with exceptions and so are not documented via the signature).

After all, this is certainly yet again a matter of taste and habit, deciding if the added verbosity is outweighed by the information conveyed in the type. And that's really why I think having both variants is a wise decision. I'm just lobbying to make exceptions the default :o).
 

The second item B) is orthogonal to the decision on A). With either exceptions or Result types, we can use plain strings or highly precise types. Highly precise types are more difficult to think of and maintain, thus slowing down development. Strings are not amenable to matching, so keep us worried about safety. I think there's a middle ground that gives us a simple type and safety; Core's Error.t. When you're being lazy and want to rapidly prototype some code, you can make an Error.t from a plain string. When you want more precise information, you can easily construct an Error.t from a complex data value (by using sexp). If you need to handle the error, you can deserialize the sexp and get back your structured value.

I acknowledge I missed that very point, and thought you could only access the string, my bad. This makes Error.t a lot more appealing to me! One question then: when you want to deserialize a complex data value, how do you know which deserializer you should apply?

I think what's at stake here is to have a reusable representation of errors in both styles. It seems doable with Error.t.
 
And apparently, constructing Error.t's is more efficient than constructing strings outright because your strings are constructed lazily, only if the error is ever looked at.

Your point about raising exceptions if the error can easily be checked has finally sunk in. Thanks for explaining it again. I agree with this. So many functions can raise an error, and we really don't want to rename all of them to _exn. But then, for consistency, why not reverse our naming convention? How about removing the _exn suffix and instead adding _res versions of functions that return Result?
I'd be in favor of that, definitely. For the result variant, how about just priming it?

val parse : string -> t
val parse' : string -> t Or_error.t
 
This would be much nicer for demos and newcomers. We really don't want to demo a script that has _exn all over the place, and, though a trivial issue, this kind of syntactic dirtiness is what keeps less experienced programmers away from otherwise good languages.

Sure.
 

Ashish Agarwal

May 26, 2014, 3:40:30 PM
to Biocaml
On Mon, May 26, 2014 at 12:19 PM, Philippe Veber <philipp...@gmail.com> wrote:


This makes Error.t a lot more appealing to me! One question then: when you want to deserialize a complex data value, how do you know which deserializer you should apply?

You don't, so it isn't as safe as a rich type from the outset. However, I think there are easy ways to alleviate this. For example, we could follow a standard that any Error.t's created in module Foo include the string "Foo" as their first component (or something like that). We could even define a strong type but always serialize it to Error.t before returning it. Then you've almost got what we have now without crazy type signatures.

Except not fully. With Error.t, your error handler will always need a catch all case because you never know if the Error.t can't be deserialized to something meaningful. That's not so bad. It's also always the case that there is some exception you might not be handling.

Good thing is we don't have to figure this out right away. We can start using ad hoc Error.t's, get on with coding, then go back and improve their structure later. When we do, we won't be changing any types, so the changes won't be so intrusive.


For the result variant, how about just priming it?

val parse : string -> t
val parse' : string -> t Or_error.t

I'd be okay with that.

We're almost reaching consensus here. If anyone else has thoughts, please let us know soon.

Sebastien Mondet

May 26, 2014, 3:46:08 PM
to bio...@googlegroups.com
For the naming, since we stick to Core, I think we should stick to Core's "_exn" suffix
(and the "primes" are pretty difficult to read also)


 


Philippe Veber

May 27, 2014, 9:53:57 AM
to Biocaml
You're right Sébastien, keeping in sync with core is important, as RWO (Real World OCaml) is good material to get started with OCaml. Here is a new version of the guidelines, inspired by the parallel discussion I started on the caml list [1]

General principle
=================

Errors occur when a function is called outside of its domain of definition, i.e. with arguments that do not satisfy its preconditions. Some preconditions are guaranteed by the compiler through argument types, but the type system is not strong enough for all kinds of preconditions. Two cases should be distinguished:
  (a) the precondition is easy to check before calling the function (e.g. an integer should be positive)
  (b) verifying the precondition is roughly as difficult as what the function does (in terms of algorithm structure or computational complexity)

In case (a), the function should raise an Invalid_argument exception, indicating that this error should be considered as a bug and fixed. The exception is not meant to be caught, but rather to help debugging thanks to the backtrace (much like an Assert_failure).

In case (b), the function should return a variant type (like option or Result.t) and should not raise an exception when the precondition is violated. In fact, if some precondition P is difficult to check, we expect users will likely not check it, so it is unsafe to make it a real precondition. Instead, we extend the domain of the function, which will return an error-representing value when P is not satisfied (None or Error _ respectively). Here are a few examples:
  - List.find assumes the searched element is in the list, but checking that is precisely what the function is about. So the function should return an option type signalling the case where the element is not in the list with the constructor None
  - in a parser you won't know if the input is syntactically correct before actually performing the parsing. So a function of_file : string -> t would be better typed as of_file : string -> (t,e) Result.t, where e is a type representing a parsing error. However, the function should raise an Invalid_argument exception in case the file does not exist.

Even in case (b), the library may propose a version where failing to meet the precondition is signalled through an exception. In that case, the function should be suffixed by "_exn" and the exception should convey the same error type as the default version.
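
The two cases can be sketched as follows with the standard library only. The functions replicate, find and find_exn are illustrative, hypothetical names and do not come from Biocaml or Core.

```ocaml
(* Case (a): n >= 0 is trivial for the caller to check, so a violation is
   treated as a bug and raises Invalid_argument. *)
let replicate n x =
  if n < 0 then invalid_arg "replicate: negative length";
  List.init n (fun _ -> x)

(* Case (b): knowing whether some element satisfies [p] requires the search
   itself, so the domain is extended and absence is reported as None. *)
let rec find p = function
  | [] -> None
  | x :: rest -> if p x then Some x else find p rest

(* Optional exception-raising variant, suffixed with _exn. *)
let find_exn p l =
  match find p l with Some x -> x | None -> raise Not_found
```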


I'll continue with specific instructions on the error types. Is that ok for you up to now?



Ashish Agarwal

May 27, 2014, 10:13:43 AM
to Biocaml
I'm quite happy with this!

Clarification on names: In a), function names will not have any suffix, i.e. foo will raise an exception (we will not call it foo_exn). In b), function foo will return Result and foo_exn will raise an exception.




Ashish Agarwal

May 27, 2014, 4:39:40 PM
to Biocaml
I've pushed a new "fastq" branch trying out this proposal. You can ignore the Error sub-module, the exceptions, and the Transform sub-module, all of which are not being used by the new API.

Note there's actually little point in providing _exn versions of functions because opening Core provides ok_exn, which trivially lets you convert from Result to exception-ful style. However, then you can't control which exception is thrown. It is always one defined internally to Core's Error module, and IIUC they deliberately disallow handling this exception by not exposing it in the mli.
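
For readers without Core at hand, the idea can be sketched with the standard library. The ok_exn below is a simplified stand-in: it works on a result whose error is a plain string and raises Failure, whereas Core's version works on Or_error.t and raises its own exception. parse_int is a hypothetical example function.

```ocaml
(* Simplified stand-in for Core's ok_exn, specialised to string errors. *)
let ok_exn = function Ok x -> x | Error msg -> failwith msg

(* Hypothetical result-returning function. *)
let parse_int s =
  match int_of_string_opt s with
  | Some n -> Ok n
  | None -> Error ("not an integer: " ^ s)

(* The exception-raising variant comes for free from the result version. *)
let parse_int_exn s = ok_exn (parse_int s)
```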

Philippe Veber

May 28, 2014, 9:41:57 AM
to Biocaml
2014-05-26 21:40 GMT+02:00 Ashish Agarwal <agarw...@gmail.com>:
On Mon, May 26, 2014 at 12:19 PM, Philippe Veber <philipp...@gmail.com> wrote:


This makes Error.t a lot more appealing to me! One question then: when you want to deserialize a complex data value, how do you know which deserializer you should apply?

You don't, so it isn't as safe as a rich type from the outset. However, I think there are easy ways to alleviate this. For example, we could follow a standard that any Error.t's created in module Foo include the string "Foo" as their first component (or something like that). We could even define a strong type but always serialize it to Error.t before returning it. Then you've almost got what we have now without crazy type signatures.
Sorry I don't get it. IIUC, as soon as I want to get back the rich error type value from the Sexp hidden in the Error, I have to do it without a safety net, that is without compiler guarantee, right? So to me this approach loses something important w.r.t. the rich type style: now all errors are basically represented as strings, and accessing more structured representations can be considered unsafe in a certain sense. One big argument in favor of Result.t was refactoring: if an error changes, you have the compiler complaining in all places where your code should be modified.

I understand though that using monadic style with rich error types is complicated, as all the successive error types in a sequence of calls should unify (in order to use >>=). If we refer to the nomenclature used by Yaron Minsky [1]:

- style 1/2a is nice for detailed error handling but leads to clumsy code when you chain several possibly failing functions. Alternatively, if you carefully choose your types then they can unify but then you fall in the dreadful polymorphic variant hell, with lengthy/unreadable error types

- style 2c is very relaxing when chaining many possibly failing functions, but prevents a satisfying access to structured error types

This is certainly a tough choice to make! I mean for a parsing error, along with a string explaining the error, you'd certainly want to know the line number as an int... But then should we provide three versions of a function?

type parsing_error = string * int
exception Parsing_error of parsing_error

val parser_res : string -> (t,parsing_error) Result.t
val parser_exn : string -> t (* raises Parsing_error _ *)
val parser_err : string -> t Or_error.t

After all maybe we could afford providing each flavor in its own submodule (Biocaml, Biocaml.Exn, Biocaml.Err resp.) as we already suggested? I think there are indeed three use cases: detailed recovery of errors, prototyping and safe detection of errors with associated message, respectively. In that case, we'd impose version (2) as the bare minimum, since it can be used to make the other two easily.
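
Here is a sketch of how the exception-raising version could serve as the base for the other two. The parser itself (one integer per line) is a hypothetical toy, and the Or_error flavor is approximated by a result carrying a plain string, since this sketch avoids depending on Core.

```ocaml
type parsing_error = string * int

exception Parsing_error of parsing_error

(* Hypothetical parser: one integer per line; errors report the line number. *)
let parser_exn (s : string) : int list =
  String.split_on_char '\n' s
  |> List.mapi (fun i line ->
         match int_of_string_opt line with
         | Some n -> n
         | None -> raise (Parsing_error ("not an integer: " ^ line, i + 1)))

(* Rich-error flavor, derived by catching the single documented exception. *)
let parser_res s = try Ok (parser_exn s) with Parsing_error e -> Error e

(* Or_error-like flavor (string stand-in), derived from the rich one. *)
let parser_err s =
  match parser_res s with
  | Ok v -> Ok v
  | Error (msg, line) -> Error (Printf.sprintf "line %d: %s" line msg)
```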

Hope I'm not overthinking things! :o)))

[1] https://blogs.janestreet.com/how-to-fail-introducing-or-error-dot-t/

Ashish Agarwal

May 28, 2014, 11:58:43 AM
to Biocaml
On Wed, May 28, 2014 at 9:41 AM, Philippe Veber <philipp...@gmail.com> wrote:

as soon as I want to get back the rich error type value from the Sexp hidden in the Error, I have to do it without a safety net, that is without compiler guarantee, right? So to me this approach loses something important w.r.t. the rich type style: now all errors are basically represented as strings, and accessing more structured representations can be considered unsafe in a certain sense.

Yes, that's correct. You definitely lose some safety, but I think not all. And yes, you also lose compiler support for refactoring, but again not completely. See more below.


One big argument in favor of Result.t was refactoring: if an error changes, you have the compiler complaining in all places where your code should be modified.

Not "in favor of Result.t". Or_error.t is also Result.t. You mean in favor of strong error types, as opposed to weak types like string or Error.t. The only strong type that also composes is polymorphic variants, and they just get too crazy. (Also objects, but they would have the same benefits and disadvantages, so the argument is not different.)

 
- style 1/2a is nice for detailed error handling but leads to clumsy code when you chain several possibly failing functions. Alternatively, if you carefully choose your types then they can unify but then you fall in the dreadful polymorphic variant hell, with lengthy/unreadable error types

Right, avoiding this hell is the benefit of using Error.t. If you compare the Fastq module in the 'master' and 'fastq' branches, you'll see both use Result. The difference is only that 'master' uses polymorphic variants and 'fastq' uses Error.t. The Error.t version is *much* cleaner syntactically, and this is in an isolated module. When you start composing many functions, the difference is even more stark.


- style 2c is very relaxing when chaining many possibly failing functions, but prevents a satisfying access to structured error types

I lost track of what 2c is. If you mean, using 'a Or_error.t = ('a, Error.t) Result.t, then yes, you do lose some access but not necessarily all. See below.


type parsing_error = string * int
exception Parsing_error of parsing_error

val parser_res : string -> (t,parsing_error) Result.t
val parser_exn : string -> t (* raises Parsing_error _ *)
val parser_err : string -> t Or_error.t

I'm proposing that we don't provide parser_res. As given, it is basically unusable because parsing_error doesn't compose with other error types. So at the least you have to do:

type parsing_error = [ `parsing_error of string * int ]

That's exactly what the Fastq module in 'master' does, and it is quite hairy. Requiring to define such types right away for every module is a significant deterrent to contributors.

In my view, parser_err is better, not strictly better, just a better compromise. Here's how it could be implemented and used:

let parser_err s =
  if (* parse successful *) then
    Ok t
  else (* parse error *)
    (* the payload is line_num, an int, so the converter is sexp_of int *)
    error "string explaining error" line_num <:sexp_of< int >>

Benefits of the above implementation:

* Easy to write due to functions like `error` and sexp_of syntax extension.
* Simpler type signature.
* Composes.
* You don't have to predefine any types (though you may want to, see below).
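To make the shape of this concrete without Core or camlp4, here is a minimal sketch that runs with the standard library only. The `sexp` and `error` types and the `error` helper are hypothetical stand-ins that mimic Sexp.t, Error.t and Core's `error`; the toy grammar (one non-negative integer per line) is also an assumption and has nothing to do with the real Fastq parser.

```ocaml
(* Hypothetical stand-ins for Core's Sexp.t and Error.t. *)
type sexp = Atom of string | List of sexp list
type error = { msg : string; payload : sexp }

(* Plays the role of Core's [error msg value sexp_of_value]. *)
let error msg value sexp_of_value =
  Error { msg; payload = sexp_of_value value }

(* Hand-written converter, where the thread uses <:sexp_of< string * int >>. *)
let sexp_of_string_int (s, i) = List [ Atom s; Atom (string_of_int i) ]

(* Toy grammar: every line must be a non-negative integer. *)
let parser_err (s : string) : (int list, error) result =
  let rec go line_num acc = function
    | [] -> Ok (List.rev acc)
    | line :: rest -> (
        match int_of_string_opt line with
        | Some n when n >= 0 -> go (line_num + 1) (n :: acc) rest
        | _ -> error "parse error" (line, line_num) sexp_of_string_int)
  in
  go 1 [] (String.split_on_char '\n' s)
```

Note that no error type had to be declared for `parser_err` itself: the offending line and its number are simply packed into the payload at the point of failure.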

Now, the disadvantage is in handling this error. If you used parser_res, you could do:

match parser_res s with
| Ok t -> t
| Error (`parsing_error (msg, line_num)) -> (* handle error *)


But with parser_err, you have to do:

match parser_err s with
| Ok t -> t
| Error err -> Error.sexp_of_t err |> handle_sexp

The problem is in the "handle_sexp". All this function gets is a sexp, whereas with parser_res you got a value precisely of type parsing_error.

But my feeling is that this isn't so bad. Here's how you can implement handle_sexp:

let handle_sexp sexp =
  <:of_sexp< string * int >> sexp  |> (* do something with string * int *)

If the sexp isn't a `string * int`, you'll get an exception, or you can catch it and again return an Or_error.t. If you're at a point where the error comes from potentially several places, then you can do:

let handle_sexp sexp =
  try
    <:of_sexp< string * int >> sexp |> (* do something with string * int *)
  with _ ->
    try
      <:of_sexp< string * float >> sexp |> (* do something with string * float *)
    with _ ->
      <:of_sexp< Fastq.parsing_error >> sexp |> (* do something with parsing_error *)

You can consider as many error types as you want. Notice that you don't lose access to all benefits of types. If you change the definition of the Fastq.parsing_error type, then you'll get a compiler error because the "do something with parsing_error" code will likely not type check anymore.

You do lose exhaustivity checking. You may not have covered all errors that could have reached the point where handle_sexp is being called. Well, just make handle_sexp also return Or_error.t (unlike the above implementation that raises an exception), and now all you're doing is letting errors you didn't handle propagate. Seems a reasonable compromise.
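A rough sketch of such a handler, one that returns a result and lets unrecognised errors propagate, could look like the following. It uses the standard library only; the `sexp` and `error` types, the two decoders and `handle_error` are all hypothetical names standing in for Sexp.t, Error.t and the `<:of_sexp< ... >>` converters.

```ocaml
(* Hypothetical stand-ins for Core's Sexp.t and Error.t. *)
type sexp = Atom of string | List of sexp list
type error = { msg : string; payload : sexp }

(* Decoders return [None] instead of raising when the shape does not match. *)
let string_int_of_sexp = function
  | List [ Atom s; Atom i ] ->
      Option.map (fun i -> (s, i)) (int_of_string_opt i)
  | _ -> None

let string_float_of_sexp = function
  | List [ Atom s; Atom f ] ->
      Option.map (fun f -> (s, f)) (float_of_string_opt f)
  | _ -> None

(* Try each known shape in turn; anything unrecognised is returned
   unchanged as [Error e], so the caller can still deal with it. *)
let handle_error (e : error) : (string, error) result =
  match string_int_of_sexp e.payload with
  | Some (msg, line) -> Ok (Printf.sprintf "line %d: %s" line msg)
  | None -> (
      match string_float_of_sexp e.payload with
      | Some (msg, x) -> Ok (Printf.sprintf "value %g: %s" x msg)
      | None -> Error e)
```

The order of the attempts matters in this sketch: a payload such as `("x", 3)` also decodes as `string * float`, so the `string * int` case has to be tried first.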

Philippe Veber

May 30, 2014, 9:33:33 AM
to Biocaml
Thanks Ashish for fully explaining your proposal, which certainly makes sense. I think I grasped the trade-off you propose, but I am uncomfortable with dropping the style carrying maximum information and safety (says the guy who made a strong call in favor of exceptions a few days ago).

I think we're approaching the problem the wrong way: it's not possible to have, at the same time, an exception-less style with error types that are both composable and detailed, concise code and type signatures, and maximum help from the compiler. The solution your previous message advocates is to find a data structure that offers the best possible trade-off, and I think yes, Or_error is probably a good candidate. But IMO we could have a better experience by clearly separating various use cases and proposing a different API for each. For instance:

- general API, serves as a basis for other APIs
  - proposes functors parameterized by threading monads
  - structured error types
  - the functors are also parameterized by a mapping for errors (could be identity, could be to map structured errors to Error.t in a parametrizable way, in case you'd want to produce error messages in French ^^)

- easy access to the library, to introduce biocaml to beginners and to prototype scripts rapidly
  - no threading monads
  - no complicated error types
  - errors are sent via Failure or Invalid_argument

- concurrent programming, to write network applications like webservers
  - choice of threading monads
  - preference for Or_error monad

- others ?

Each flavour would be obtained from functions that are both parameterized by the threading monad and provide (if relevant) a detailed error type. As an example, Biocaml_fastq (general API) would have a terribly precise type signature, and would serve as a basis to easily implement Biocaml.EZ.Fastq, with a very simple and readable signature, Biocaml.CP.Fastq with still readable types but more safety etc etc ... In any case, if for some reason, one needs a structured error type when writing his/her webserver, the right function is still available in the general API.
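As a rough illustration of how the flavours could be derived from one general module, here is a sketch using the standard library only. Every name in it (`Fastq`, `EZ_fastq`, `Make_CP`, the error constructors, the two-line record format) is hypothetical, the parameterization over threading monads is left out, and a plain string stands in for Error.t.

```ocaml
(* General API: structured error type. *)
module Fastq = struct
  type item = { name : string; sequence : string }
  type parsing_error = Missing_at_sign of int | Empty_sequence of int

  let string_of_parsing_error = function
    | Missing_at_sign l -> Printf.sprintf "line %d: name must start with '@'" l
    | Empty_sequence l -> Printf.sprintf "line %d: empty sequence" l

  (* Toy two-line record; real FASTQ records have four lines. *)
  let parse_item ~line name sequence =
    if String.length name = 0 || name.[0] <> '@' then
      Error (Missing_at_sign line)
    else if sequence = "" then Error (Empty_sequence (line + 1))
    else Ok { name = String.sub name 1 (String.length name - 1); sequence }
end

(* "Easy" flavour: no result type, errors are sent via [Failure]. *)
module EZ_fastq = struct
  let parse_item ~line name sequence =
    match Fastq.parse_item ~line name sequence with
    | Ok item -> item
    | Error e -> failwith (Fastq.string_of_parsing_error e)
end

(* "Concurrent programming" flavour: one uniform error type, obtained
   through a functor parameterized by the error mapping. *)
module Make_CP (E : sig
  val string_of_parsing_error : Fastq.parsing_error -> string
end) =
struct
  let parse_item ~line name sequence =
    Result.map_error E.string_of_parsing_error
      (Fastq.parse_item ~line name sequence)
end

module CP_fastq = Make_CP (struct
  let string_of_parsing_error = Fastq.string_of_parsing_error
end)

(* The same functor with error messages in French. *)
module CP_fastq_fr = Make_CP (struct
  let string_of_parsing_error = function
    | Fastq.Missing_at_sign l -> Printf.sprintf "ligne %d : '@' manquant" l
    | Fastq.Empty_sequence l -> Printf.sprintf "ligne %d : séquence vide" l
end)
```

In this sketch only `Fastq` knows how to parse; the other modules are thin wrappers, so a caller who needs the structured error can always fall back to the general API.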

This may seem overkill, but hey, we've been thinking about this for a long time now, and have already changed our minds a couple of times [1]. To me, this simply means there is no perfect solution, and each trade-off we settle on ends up being annoying in a given context. Now I think that OCaml can particularly shine in this kind of situation, and simplify the construction of each flavour. This proposal is not like saying "let's not choose and offer all possibilities" -- which would be a terrible library design -- but to work on combinations that would better fit our needs.

[1] To be honest, before writing this, I was about to say that I was ok finally to drop the exn style (sic).

Ashish Agarwal

May 30, 2014, 10:11:49 AM
to Biocaml
IIUC, you are proposing to also functorize over the error type 'err within ('ok, 'err) Result.t. Is that right? If yes, can you describe the argument to such a functor for the Fastq module.

I am uncomfortable with dropping the style carrying maximum information and safety

In what sense does using Error.t reduce safety? (I insist we say Error.t, not Or_error.t, otherwise there is a misunderstanding about what I'm proposing.) You lose exhaustivity, but I see no other loss. And I don't think exhaustivity is so critical in error handling, at least not enough that it is worth using a hugely more complicated error type.




--
You received this message because you are subscribed to the Google Groups "biocaml" group.
To unsubscribe from this group and stop receiving emails from it, send an email to biocaml+u...@googlegroups.com.
To post to this group, send email to bio...@googlegroups.com.
Visit this group at http://groups.google.com/group/biocaml.

Philippe Veber

May 30, 2014, 12:05:42 PM
to Biocaml
2014-05-30 16:11 GMT+02:00 Ashish Agarwal <agarw...@gmail.com>:
IIUC, you are proposing to also functorize over the error type 'err within ('ok, 'err) Result.t. Is that right?
Sorry, I added that a bit quickly, and it was unclear. No, that's not what I had in mind, but something rather simpler.


 
If yes, can you describe the argument to such a functor for the Fastq module.

module Biocaml_fastq : sig
  type parsing_error = ...
  type t

  module Make(F : Future.S) : sig
    val parse : string -> (t, parsing_error) Result.t
  end

  module MakeCP(F : Future.S)(E : sig val string_of_parsing_error : parsing_error -> string end) : sig
    val parse : string -> t Or_error.t
  end
end


 

I am uncomfortable with dropping the style carrying maximum information and safety

In what sense does using Error.t reduce safety?
Essentially because it is not safe to get the original value from the sexp. Safe is probably too strong a word. What I mean here is "fragile to refactoring": an error handler could be bypassed after a change without the compiler saying anything. In that sense, using Error.t is as fragile as exceptions.

Let's say you have code like this:

let result_table fn =
  match
    Fastq.parse fn >>= fun fq ->
    f fq >>= fun x ->
    g x
  with
  | Ok y -> html_table y
  | Error e -> Error.sexp_of_t e |> handle_sexp

handle_sexp is supposed to handle errors from Fastq.parse, f and g so that it can produce an html element in any case. If you happen to add an intermediate step in the function

let result_table fn =
  match
    Fastq.parse fn >>= fun fq ->
    f fq >>= fun x ->
    g x >>= fun z ->
    h z
  with
  | Ok y -> html_table y
  | Error e -> Error.sexp_of_t e |> handle_sexp

The compiler will not warn you that handle_sexp won't do a good job with errors from h, you have to think about it yourself. In that respect, this is absolutely similar to exception handling IMO.
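For contrast, here is a sketch of the same situation with structured errors, where the compiler does flag the new step. It uses the standard library `result` type and polymorphic variants; `parse`, `half`, `check_small`, `describe` and `report` are hypothetical toy functions, not anything from Biocaml.

```ocaml
(* Three toy steps, each with its own error tag. *)
let parse s : (int, [> `Parse_error of string ]) result =
  match int_of_string_opt s with
  | Some n -> Ok n
  | None -> Error (`Parse_error s)

let half n : (int, [> `Odd of int ]) result =
  if n mod 2 = 0 then Ok (n / 2) else Error (`Odd n)

let check_small n : (int, [> `Too_big of int ]) result =
  if n < 100 then Ok n else Error (`Too_big n)

let ( >>= ) = Result.bind

(* The closed annotation lists exactly the errors this handler knows. *)
let describe : [ `Parse_error of string | `Odd of int ] -> string = function
  | `Parse_error s -> "not an integer: " ^ s
  | `Odd n -> Printf.sprintf "%d is odd" n

(* Changing the scrutinee to [parse s >>= half >>= check_small] is
   rejected by the type checker until [describe] gains a [`Too_big] case. *)
let report s =
  match parse s >>= half with
  | Ok n -> string_of_int n
  | Error e -> describe e
```

The price is visible in the signatures: every function carries its own error tags, and the error type of a chain grows with each step that is added.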

Here is another situation. Let's say we have

val f1 : string -> t Or_error.t
val f2 : string -> t Or_error.t

where the error hides something of type t.

Suppose now we realize that f1 and f2 should have different errors. We should now modify the error handlers in the codebase. We start separating t into t1 and t2 to have the compiler complain, but for each error handler it shows you, you'll have to make a choice between the two sexp converters. If you choose the wrong one, the compiler won't complain here.

The mechanism you describe relies on the coder to declare explicit error types, choose a unique one for each function and never change it afterwards. While these are good practices, they are solely the responsibility of the programmer.

 
(I insist we say Error.t, not Or_error.t, otherwise there is a misunderstanding about what I'm proposing.)
I hope I'm not misunderstanding, it seems to me you made things as clear as possible ^^.
 
You lose exhaustivity, but I see no other loss. And I don't think exhaustivity is so critical in error handling,
If it was not, why would exceptions be considered a worse solution than Result.t [1]? In the first example I gave, the loss of exhaustivity checking leads to an exception, or forces you to put Or_error.t types virtually everywhere.

 
at least not enough that it is worth using a hugely more complicated error type.
I thought the complicated error types were a problem if you have to combine them all along in monadic style. Apart from that, do you think they are still more of a burden than a help? Note that here the question is not whether we should always have precise error types or not, but only whether we leave the possibility of having them combined with exhaustivity checking.




[1] To be fair Daniel gave one reason on the caml-list, he says:

"If you don’t/forget to catch the exception at the right place it disrupts your whole call stack, which means that it breaks any invariants a correct execution of your call stack was supposed to maintain. Which means that your program state is completely broken and you have to exit 1, hoping that you didn’t partially (invariant wise) persist anything to disk/network meanwhile."

In order to avoid that kind of problem, you need to have your "finally" clauses where needed (not easy).


 

Philippe Veber

May 30, 2014, 12:13:56 PM
to Biocaml
I should add that while I liked the idea of having a couple of APIs each geared towards a particular context, I am not strongly opposed to establishing Or_error.t as the default error signaling mechanism and keeping error types simple.

Ashish Agarwal

May 31, 2014, 9:51:02 AM
to Biocaml
On Fri, May 30, 2014 at 12:05 PM, Philippe Veber <philipp...@gmail.com> wrote:

If yes, can you describe the argument to such a functor for the Fastq module.

module Biocaml_fastq : sig
  type parsing_error = ...
  type t

  module Make(F : Future.S) : sig
    val parse : string -> (t, parsing_error) Result.t
  end

  module MakeCP(F : Future.S)(E : sig val string_of_parsing_error : parsing_error -> string end) : sig
    val parse : string -> t Or_error.t
  end
end

Okay, I get it. So this does require the main base implementation to use strong types, which I'm currently unwilling to do. It's a real pain. Using a strong type for the error case is a well known possibility amongst experts, but even the most die hard OCaml programmers have never released a library in this style. No one has the patience to write this kind of code. Sebastien did it for our code at NYU, but that code didn't have the requirement that someone else should be willing to use it too. I tried it for the last several months on some private code, and I just switched to Or_error because the strong error types were slowing down development too much.

Let's say you have code like this:

let result_table fn =
  match
    Fastq.parse fn >>= fun fq ->
    f fq >>= fun x ->
    g x
  with
  | Ok y -> html_table y
  | Error e -> Error.sexp_of_t e |> handle_sexp

handle_sexp is supposed to handle errors from Fastq.parse, f and g so that it can produce an html element in any case. If you happen to add an intermediate step in the function

let result_table fn =
  match
    Fastq.parse fn >>= fun fq ->
    f fq >>= fun x ->
    g x >>= fun z ->
    h z
  with
  | Ok y -> html_table y
  | Error e -> Error.sexp_of_t e |> handle_sexp

The compiler will not warn you that handle_sexp won't do a good job with errors from h, you have to think about it yourself.

Right, so this is lack of exhaustiveness checking. Yes, you do lose this.


Here is another situation. Let's say we have

val f1 : string -> t Or_error.t
val f2 : string -> t Or_error.t

where the error hides something of type t.

Suppose now we realize that f1 and f2 should have different errors. We should now modify the error handlers in the codebase. We start separating t into t1 and t2 to have the compiler complain, but for each error handler it shows you, you'll have to make a choice between the two sexp converters. If you choose the wrong one, the compiler won't complain here.

Sorry, I don't get it. In `t Or_error.t`, the t is the success type. The error type is Error.t, so I'm not really sure if you meant that you change the error type or the success type.


You lose exhaustivity, but I see no other loss. And I don't think exhaustivity is so critical in error handling,
If it was not, why would exceptions be considered a worse solution than Result.t [1]?

One difference is that using Result.t is type enforced documentation that your function can return some error. There are lots of functions that really are error free (excluding truly exceptional errors like out of memory). I think it's nice to look at a module's signature, and see clearly what is error free and what is not.

Second, using Or_error.t provides a transition path to stronger error types when and if we ever want. At least you've written your code monadically, which you wouldn't have if we only provided exception-ful functions.
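The first point, that a result type in a signature documents which functions can fail, might look like this in a small sketch. The module and function names are hypothetical, and a plain string error stands in for Error.t.

```ocaml
module type COUNTS = sig
  (* Cannot fail on any input: plain return type. *)
  val gc_count : string -> int

  (* May fail (empty input): the result type says so. *)
  val gc_fraction : string -> (float, string) result
end

module Counts : COUNTS = struct
  let gc_count s =
    let n = ref 0 in
    String.iter (fun c -> if c = 'G' || c = 'C' then incr n) s;
    !n

  let gc_fraction s =
    if s = "" then Error "empty sequence"
    else Ok (float_of_int (gc_count s) /. float_of_int (String.length s))
end
```

A reader of `COUNTS` can tell at a glance that only `gc_fraction` needs error handling, which an exception-raising version of the same function would not reveal.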


at least not enough that it is worth using a hugely more complicated error type.
I thought the complicated error types were a problem if you have to combine them all along in monadic style. Apart from that, do you think they are still more of a burden than a help?

Why exclude that criterion? You will of course compose your functions to do some overall task, so this is a necessary consideration.

How about we connect at #biocaml on IRC to continue this? It's getting too complicated over email. Let me know what times work for you.

Ashish Agarwal

Jun 2, 2014, 12:51:28 PM
to Biocaml
On Fri, May 30, 2014 at 12:13 PM, Philippe Veber <philipp...@gmail.com> wrote:
I should add that while I liked the idea of having a couple of APIs each geared towards a particular context, I am not strongly opposed to establishing Or_error.t as the default error signaling mechanism and keeping error types simple.

Okay, so in the interest of making progress, I'll go ahead and start pushing changes to use Or_error.t. We can keep the discussion going, and maybe seeing more code in this style will help fine tune our opinions.

Philippe Veber

Jun 4, 2014, 1:41:53 AM
to Biocaml
Yes, please proceed, and let's talk about this on IRC.

