[Caml-list] Style and organization of code

ian

Mar 14, 2007, 11:06:18 PM

to caml...@inria.fr

I'm looking for a guidebook or just some rules of thumb on how to organize my
OCaml code.

One example:

Say I have a function called "solveHardProblem". solveHardProblem relies on
several helper functions, which are not going to be useful to any other
functions in the program. So, my first instinct would be to define all the
helpers using let blocks within the definition of solveHardProblem.

But that would make the definition of solveHardProblem really long -- several
screens of text -- which I've been taught to avoid. Is it wrong to use a module
to hide those functions if the module signature will contain only that of
solveHardProblem?

And say you DO choose to use a module... The OCaml documentation says that the
compiler can automatically infer the signature without the need to create a .mli
file for it. Does anyone actually use that feature in practice, or is creating
a sig hard-wired to the act of creating a struct?

Thanks,
-Ian

_______________________________________________
Caml-list mailing list. Subscription management:
http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
Archives: http://caml.inria.fr
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs

Tom

Mar 14, 2007, 11:22:25 PM

to ian

On 14/03/07, ian <fist...@softhome.net> wrote:
>
> I'm looking for a guidebook or just some rules of thumb on how to organize
> my OCaml code.
>
> One example:
>
> Say I have a function called "solveHardProblem".

So, this is the first place for change. OCaml functions, values, types, and
exceptions are usually not named lowerUpperCase, but words_with_underscores.
It is somewhat dysfunctional for those of us coming from "other" parts of the
world (typing _ involves pressing the Shift key), but so is typing an uppercase
letter, and there are no other alternatives. Anyway, better stick to the
convention.

> solveHardProblem relies on
> several helper functions, which are not going to be useful to any other
> functions in the program. So, my first instinct would be to define all the
> helpers using let blocks within the definition of solveHardProblem.
>
> But that would make the definition of solveHardProblem really long -- several
> screens of text -- which I've been taught to avoid. Is it wrong to use a
> module to hide those functions if the module signature will contain only
> that of solveHardProblem?

No.

> And say you DO choose to use a module... The OCaml documentation says that
> the compiler can automatically infer the signature without the need to
> create a .mli file for it. Does anyone actually use that feature in
> practice, or is creating a sig hard-wired to the act of creating a struct?

Hm... well... I mean, the compiler can actually infer things for you, but
this simply means that you don't NEED to include a signature: it's not a
compile-time error if you omit it. But the compiler will then assume that you
want all the values (including functions) and types from this module to be
exported. So .mli signatures are actually used for limiting exports, which is
exactly what you're trying to do.
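A minimal sketch of limiting exports with a signature. The names and arithmetic are hypothetical. In a real project the `sig ... end` part would be the contents of solver.mli and the `struct ... end` part the contents of solver.ml; writing both inline keeps the example in one file.

```ocaml
(* Hypothetical module: only solve_hard_problem appears in the signature,
   so the two helpers cannot be referenced from outside Solver. *)
module Solver : sig
  val solve_hard_problem : int -> int
end = struct
  let square x = x * x (* helper, hidden by the signature *)
  let double x = 2 * x (* helper, hidden by the signature *)
  let solve_hard_problem x = square x + double x
end

let () = Printf.printf "%d\n" (Solver.solve_hard_problem 3)
(* Writing Solver.square 3 here would be a type error: the signature hides it. *)
```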

Or you can simply define helper functions, and never use them.

- Tom

Chris King

Mar 14, 2007, 11:45:56 PM

to ian

On 3/14/07, ian <fist...@softhome.net> wrote:
> Say I have a function called "solveHardProblem". solveHardProblem relies on
> several helper functions, which are not going to be useful to any other
> functions in the program. So, my first instinct would be to define all the
> helpers using let blocks within the definition of solveHardProblem.

I usually do exactly that. I think it's fine to have a long let like
this, so long as it's made up of small, manageable pieces. The
biggest problem is that your arguments are now miles away from the
definition of your function, but this can be solved with something
like the following:

let solve_hard_problem =
  let solve_simpler_problem x = ...
  and solve_something_else x = ... in
  fun x y z ->
    [code goes here]
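A runnable version of the pattern above, as a minimal sketch: the helper bodies and the arithmetic are made up purely to show the shape.

```ocaml
(* Hypothetical helpers: they exist only inside solve_hard_problem's
   definition and are not visible anywhere else in the file. *)
let solve_hard_problem =
  let solve_simpler_problem x = x * 2
  and solve_something_else y = y + 1 in
  (* The arguments appear here, right next to the main body. *)
  fun x y z -> solve_simpler_problem x + solve_something_else y + z

let () = Printf.printf "%d\n" (solve_hard_problem 1 2 3)
```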

This is also a good trick for moving a global variable (such as a
unique ID counter or a hash table) into the scope of the function that
uses it.
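A sketch of the unique-ID counter case, with a hypothetical function name. The `let counter = ref 0 in` part is evaluated once, when `fresh_id` is defined, so every call shares the same counter, yet nothing else in the file can touch it.

```ocaml
(* Hypothetical example: the counter is private to fresh_id
   instead of being a top-level global. *)
let fresh_id =
  let counter = ref 0 in
  fun () ->
    incr counter;
    !counter

let () =
  let a = fresh_id () in
  let b = fresh_id () in
  Printf.printf "%d %d\n" a b
```

A hash table can be kept private the same way, with `let table = Hashtbl.create 16 in` in place of the `ref`.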

FWIW I've seen some examples of this style in the standard library.
genlex.ml has a 150+ line function of the form:

let make_lexer keywords =
  let kwd_table = Hashtbl.create 17 in
  List.iter (fun s -> Hashtbl.add kwd_table s (Kwd s)) keywords;
  let ident_or_keyword id = ...
  and keyword_or_error c = ...
  [snip]
  in
  fun input -> Stream.from (fun count -> next_token input)

Since the "keywords" argument is needed by the helper functions, it
appears at the top in the let, but "input" isn't and so it was moved
to the bottom in the fun. None of the helper functions are much
longer than one screenful so it makes for easy reading.

Chris King

Mar 14, 2007, 11:52:51 PM

to ian

On 3/14/07, Chris King <colan...@gmail.com> wrote:
> Since the "keywords" argument is needed by the helper functions, it
> appears at the top in the let, but "input" isn't and so it was moved
> to the bottom in the fun.

Oops, on second thought, this was much more likely done so that the keyword
table and the helper functions are built only once, which saves work when
the function returned by "make_lexer keywords" is used multiple times (which
it likely will be). But even if it weren't, that's still the style I'd
recommend :)
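A small sketch of that staging effect. `make_classifier` and the `token` type are hypothetical, modelled loosely on make_lexer; this is not Genlex itself. Everything before `fun word ->` runs once per call to `make_classifier`, and the returned closure reuses the same table on every later call.

```ocaml
type token = Kwd of string | Ident of string

(* Hypothetical: the keyword table is built once, when make_classifier
   is applied to the keyword list. *)
let make_classifier keywords =
  let kwd_table = Hashtbl.create 17 in
  List.iter (fun s -> Hashtbl.add kwd_table s (Kwd s)) keywords;
  fun word ->
    match Hashtbl.find_opt kwd_table word with
    | Some tok -> tok
    | None -> Ident word

(* Build the table once, then classify many words with it. *)
let classify = make_classifier [ "let"; "in"; "fun" ]

let () =
  List.iter
    (fun w ->
      match classify w with
      | Kwd s -> Printf.printf "keyword %s\n" s
      | Ident s -> Printf.printf "ident %s\n" s)
    [ "let"; "x"; "in" ]
```

Writing `let make_classifier keywords word = ...` instead would rebuild the table on every call.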

skaller

Mar 15, 2007, 12:02:44 AM

to ian

On Wed, 2007-03-14 at 17:25 -0500, ian wrote:
> I'm looking for a guidebook or just some rules of thumb on how to organize my
> OCaml code.

> But that would make the definition of solveHardProblem really long -- several
> screens of text -- which I've been taught to avoid.

Yeah, this is a problem with functional programming... I have
functions that are hundreds of lines long.

Generally you want to factor out functions with minimal coupling
to the enclosing function's environment, and leave them in
if they're heavily coupled.
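A rough illustration of that rule of thumb, with hypothetical function names: a helper that needs nothing from its surroundings can be lifted out, while one that reads several of the enclosing function's variables stays nested.

```ocaml
(* Minimal coupling: clamp needs nothing from any enclosing function,
   so it can live at the top level (or even in another file). *)
let clamp lo hi x = max lo (min hi x)

(* Hypothetical function. The local helper adjust reads lo, hi and scale
   from the enclosing definition; lifting it out would mean threading all
   three through as extra arguments, so it stays nested. *)
let normalize_scores ~lo ~hi ~scale scores =
  let adjust s = clamp lo hi (s * scale) in
  List.map adjust scores

let () =
  normalize_scores ~lo:0 ~hi:10 ~scale:3 [ 1; 2; 5 ]
  |> List.iter (Printf.printf "%d ");
  print_newline ()
```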

> Is it wrong to use a module
> to hide those functions if the module signature will contain only that of
> solveHardProblem?

That's the normal thing to do.

Furthermore if that helper is reasonably general OR it has
some semantics which are separately understandable .. you can
put that function in a separate file for additional decoupling.

This has the downside that OCaml's namespace management is weak,
so your function is now fully public.

But smaller modules are more pleasing and easier to manage,
so it is probably worthwhile.

In particular if you use 'open Module' a lot, then the
dependencies both ON and OF that module are more refined
and explicit. This is also a reasonable first order approximation
to measuring the 'coupling' between components.

> And say you DO choose to use a module... The OCaml documentation says that the
> compiler can automatically infer the signature without the need to create a .mli
> file for it. Does anyone actually use that feature in practice, or is creating
> a sig hard-wired to the act of creating a struct?

I personally never do this: there is always an mli file for
every ml file -- even if the build script makes it by copying
the mli file.

--
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net

ian

Mar 15, 2007, 1:45:28 AM

to caml...@inria.fr

Tom wrote:
> OCaml functions, values, types,
> and exceptions are usually not named lowerUpperCase, but
> words_with_underscores - it is somewhat disfunctional for us coming from
> "other" parts of the world (typing _ includes pressing the Shift key)
> but so is typing uppercase letter - there are no other alternatives.
> Anyways, better stick to the convention.

OK. I was getting the lowerUpperCase from "Naming and Declarations" in this
document:
http://www.cs.caltech.edu/~cs20/a/style.html

Is that out of date? If so, can you point me to a newer/better one?
Alternatively, how would you fill in that table?

Gabriel Kerneis

Mar 15, 2007, 4:04:38 AM

to caml...@yquem.inria.fr

On Wed, 14 Mar 2007 20:03:32 -0500, ian <fist...@softhome.net>
wrote:

> OK. I was getting the lowerUpperCase from "Naming and Declarations"
> in this document:
> http://www.cs.caltech.edu/~cs20/a/style.html
> Is that out of date? If so, can you point me to a newer/better one?
> Alternatively, how would you fill in that table?

If you can read French:
http://caml.inria.fr/pub/old_caml_site/FAQ/pgl-fra.html

Regards,
--
Gabriel Kerneis


Maxence Guesdon

Mar 15, 2007, 4:10:41 AM

to caml...@yquem.inria.fr

This page and its English version were ported to the new site some
months ago:
http://caml.inria.fr/resources/doc/guides/guidelines.fr.html
http://caml.inria.fr/resources/doc/guides/guidelines.en.html

Regards,

--
Maxence Guesdon
http://yquem.inria.fr/~guesdon/
http://devel.inria.fr/rocq/

Gabriel Kerneis

Mar 15, 2007, 4:22:00 AM

to caml...@yquem.inria.fr

On Thu, 15 Mar 2007 09:09:17 +0100, Maxence Guesdon
<maxence...@inria.fr> wrote:

> This page and its english version were ported to the new site some
> months ago:
> http://caml.inria.fr/resources/doc/guides/guidelines.fr.html
> http://caml.inria.fr/resources/doc/guides/guidelines.en.html

Thanks a lot, I couldn't find it.
Now, what about translating it to the "revised" syntax? ;-)

Sincerely,
--
Gabriel Kerneis


micha

Mar 15, 2007, 5:05:27 AM

to caml...@yquem.inria.fr

On Thu, 15 Mar 2007 15:01:12 +1100,
skaller <ska...@users.sourceforge.net> wrote:


ian

Mar 15, 2007, 10:50:59 AM

to caml...@yquem.inria.fr

Gabriel Kerneis wrote:
> On Thu, 15 Mar 2007 09:09:17 +0100, Maxence Guesdon
> <maxence...@inria.fr> wrote:
>> This page and its english version were ported to the new site some
>> months ago:
>> http://caml.inria.fr/resources/doc/guides/guidelines.fr.html
>> http://caml.inria.fr/resources/doc/guides/guidelines.en.html

Thanks! That's exactly what I'm looking for!

-Ian

Eric Cooper

Mar 15, 2007, 1:12:04 PM

to caml...@yquem.inria.fr, caml...@inria.fr

On Thu, Mar 15, 2007 at 02:38:25PM +0100, Vu Ngoc San wrote:
> The question is: is there a way to decide, at run-time, whether the user
> already has this library or not ?

if pkg-config --exists pangocairo; then
  echo yes
else
  echo no
fi

--
Eric Cooper e c c @ c m u . e d u

Eric Cooper

Mar 15, 2007, 1:30:47 PM

to caml...@inria.fr

When I list-reply to a message on this list, my outgoing message
header gets set to
caml...@yquem.inria.fr, caml...@inria.fr

When I notice it, I can manually delete one, but I often forget,
resulting in a duplicated post. Does this happen to anyone else, and
is there a fix?

Stefano Zacchiroli

Mar 15, 2007, 2:45:55 PM

to caml...@inria.fr

On Thu, Mar 15, 2007 at 01:29:22PM -0400, Eric Cooper wrote:
> When I list-reply to a message on this list, my outgoing message
> header gets set to
> caml...@yquem.inria.fr, caml...@inria.fr
>
> When I notice it, I can manually delete one, but I often forget,
> resulting in a duplicated post. Does this happen to anyone else, and
> is there a fix?

It does happen to me as well. I have no solution, but I'd bet on a reason:
the caml list sets both the Mail-Followup-To and List-Post headers. The
former is (correctly) set to caml...@inria.fr, while the latter is set
to caml...@yquem.inria.fr.

Either removing the List-Post header or setting it to the caml...@inria.fr
address would solve the problem. Unfortunately I have no ready
workaround for the current state of affairs.

Cheers.

--
Stefano Zacchiroli -*- Computer Science PhD student @ Uny Bologna, Italy
zack@{cs.unibo.it,debian.org,bononia.it} -%- http://www.bononia.it/zack/
(15:56:48) Zack: e la demo dema ? /\ All one has to do is hit the
(15:57:15) Bac: no, la demo scema \/ right keys at the right time

caml...@inria.fr

Mar 15, 2007, 2:55:42 PM

to caml...@inria.fr

Eric Cooper <e...@cmu.edu> writes:

> When I list-reply to a message on this list, my outgoing message
> header gets set to
> caml...@yquem.inria.fr, caml...@inria.fr
>
> When I notice it, I can manually delete one, but I often forget,
> resulting in a duplicated post.

> Does this happen to anyone else, and

Yes, to me.

> is there a fix?

The mail client should interpret the List-ID header field (present in
mails going through the inria list), infer from it that the mail came
from a list, and reply to the list. Perhaps there is an extra command in
mutt which does this (like "reply to list").

Unfortunately this doesn't work if you have an IMAP server (like
Cyrus) which does duplicate elimination. Then the first copy that
arrives (usually the one addressed to you personally) doesn't carry
the List-ID, and you don't see the other copies at all :-).

This is fundamentally broken and cannot be fixed, except
heuristically. I programmed Gnus to respect List-ID until I found
the problem with the duplicate elimination. Then I stopped trying to
bring sense into 'To:', 'CC:', 'List-ID', 'From:' or whatever. There is
no way to give any specific meaning to those fields. I even got
"newsletter" mail that was apparently addressed to the sender (and then
my Gnus always sets 'From' to that address on reply: Arrgs!).

The only way I can see to handle this is to filter incoming mail
with procmail, detect the list address in ANY of CC, From, To, and tag
the mail with a "list context". Replying should detect the list
context and address all mails belonging to a list context to the list
in question (one can use List-ID for the list context or have a
per-folder configuration).

I know religious wars are fought over how e-mail list headers are done
right. I don't want to start any; my comment should be construed as
humble practical advice only.

Regards -- Markus

ls-ocaml-de...@m-e-leypold.de

Mar 15, 2007, 3:39:31 PM

to caml...@inria.fr

caml...@inria.fr writes:

> Eric Cooper <e...@cmu.edu> writes:
>
>> When I list-reply to a message on this list, my outgoing message
>> header gets set to
>> caml...@yquem.inria.fr, caml...@inria.fr
>>
>> When I notice it, I can manually delete one, but I often forget,
>> resulting in a duplicated post.
>
>> Does this happen to anyone else, and
>
> Yes, to me.

Please ignore me. I seem to have an inclination to put my foot in my
mouth in public recently :-(. The problem I described is related but
different.

The fix, though (processing incoming mail locally with procmail to
"fix" the headers), would still be valid.

Martin Jambon

Mar 15, 2007, 4:07:30 PM

to ian

On Thu, 15 Mar 2007, ian wrote:

> Gabriel Kerneis wrote:
> > On Thu, 15 Mar 2007 09:09:17 +0100, Maxence Guesdon
> > <maxence...@inria.fr> wrote:
> >> This page and its english version were ported to the new site some
> >> months ago:
> >> http://caml.inria.fr/resources/doc/guides/guidelines.fr.html
> >> http://caml.inria.fr/resources/doc/guides/guidelines.en.html
>
> Thanks! That's exactly what I'm looking for!

Another way is to inspect the source code of the standard library (with
ocamlbrowser) and use the same style.

Martin

--
Martin Jambon
http://martin.jambon.free.fr

Richard Jones

Mar 15, 2007, 6:41:33 PM

to ian

On Wed, Mar 14, 2007 at 05:25:37PM -0500, ian wrote:
> Say I have a function called "solveHardProblem".

Ack! studlyCaps is horrible and unreadable (I know - I'm currently
involved in a project which uses them). Try "solve_hard_problem"
instead.

> solveHardProblem relies on
> several helper functions, which are not going to be useful to any other
> functions in the program. So, my first instinct would be to define all the
> helpers using let blocks within the definition of solveHardProblem.
>
> But that would make the definition of solveHardProblem really long -- several
> screens of text -- which I've been taught to avoid.

You've been taught wrong. There are some studies showing that long
functions actually reduce error rates; see Steve McConnell's book Code
Complete for references.

> Is it wrong to use a module to hide those functions if the module
> signature will contain only that of solveHardProblem?

Not wrong at all.

> And say you DO choose to use a module... The OCaml documentation
> says that the compiler can automatically infer the signature without
> the need to create a .mli file for it. Does anyone actually use
> that feature in practice, or is creating a sig hard-wired to the act
> of creating a struct?

You don't need to create a separate .mli (in fact, you sometimes
can't).

Rich.

--
Richard Jones
Red Hat

David Allsopp

Mar 15, 2007, 7:10:01 PM

to caml...@yquem.inria.fr

> On Wed, Mar 14, 2007 at 05:25:37PM -0500, ian wrote:
> > Say I have a function called "solveHardProblem".
>
> Ack! studlyCaps is horrible and unreadable (I know - I'm currently
> involved in a project which uses them). Try "solve_hard_problem"
> instead.

Horrible and unreadable? We seem to be forgetting that camel notation versus
underscores is entirely a matter of taste... I have no problem reading camel
notation and find underscores ugly (not to mention harder to type than
caps). I've always found the argument "the standard library uses this
notation" to be a very weak argument typically coming from more senior
programmers who're clutching at straws to justify their opinions ;o)

I'm glad that, most of the time, the only standard library functions I use
with underscores are {type}_of_{other type} or {to|from|of}_{type}, so they
don't come up too often.

(amusing aside: I once worked for a company that mixed the two... giving
solve_Hard_Problem which was particularly tedious!!)

<snip>

> You don't need to create a separate .mli (in fact, you sometimes
> can't).

Eh? When does ocamlc -i Foo.ml > Foo.mli ever fail? I too always pair a .mli
file with a .ml file even if the signature is exactly the same.

Richard Jones

Mar 15, 2007, 7:49:56 PM

to David Allsopp

On Thu, Mar 15, 2007 at 11:08:28PM -0000, David Allsopp wrote:
> > On Wed, Mar 14, 2007 at 05:25:37PM -0500, ian wrote:
> > > Say I have a function called "solveHardProblem".
> >
> > Ack! studlyCaps is horrible and unreadable (I know - I'm currently
> > involved in a project which uses them). Try "solve_hard_problem"
> > instead.
> Horrible and unreadable? We seem to be forgetting that camel notation versus
> underscores is entirely a matter of taste... I have no problem reading camel
> notation and find underscores ugly (not to mention harder to type than
> caps). I've always found the argument "the standard library uses this
> notation" to be a very weak argument typically coming from more senior
> programmers who're clutching at straws to justify their opinions ;o)
>
> I'm glad that, most of the time, the only standard library functions I use
> with underscores are {type}_of_{other type} or {to|from|of}_{type} so don't
> happen too often.
>
> (amusing aside: I once worked for a company that mixed the two... giving
> solve_Hard_Problem which was particularly tedious!!)

noIReallyThinkYouReWrongAboutThisOne.

> > You don't need to create a separate .mli (in fact, you sometimes
> > can't).
> Eh? When does ocamlc -i Foo.ml > Foo.mli ever fail? I too always pair a .mli
> file with a .ml file even if the signature is exactly the same.

I meant that you can't create a separate .mli just to give the type of an
inner module on its own. It's generally a good idea to define types of
interfaces in .mli files.
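Since a .mli describes a whole compilation unit, a module nested inside a .ml file has to get its signature in the same file. One way, sketched here with hypothetical names, is a named module type followed by an ascription.

```ocaml
(* Hypothetical signature for a module nested inside this file. *)
module type COUNTER = sig
  type t
  val create : unit -> t
  val incr : t -> unit
  val get : t -> int
end

(* The ascription ": COUNTER" hides the fact that t is really int ref. *)
module Counter : COUNTER = struct
  type t = int ref
  let create () = ref 0
  let incr c = c := !c + 1
  let get c = !c
end

let () =
  let c = Counter.create () in
  Counter.incr c;
  Counter.incr c;
  Printf.printf "%d\n" (Counter.get c)
```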

Rich.

--
Richard Jones
Red Hat


Quôc Peyrot

Mar 15, 2007, 11:52:44 PM

to caml...@yquem.inria.fr

To quote gaim/HACKING:

"Coding styles are like assholes, everyone has one and no one likes
anyone else's." - Eric Warmenhoven

--
Best Regards,
Quôc

Jon Harrop

Mar 28, 2007, 9:53:58 PM

to caml...@yquem.inria.fr

On Wednesday 14 March 2007 22:25, ian wrote:
> I'm looking for a guidebook or just some rules of thumb on how to organize
> my OCaml code.

An excellent question.

> Say I have a function called "solveHardProblem". solveHardProblem relies
> on several helper functions, which are not going to be useful to any other
> functions in the program. So, my first instinct would be to define all the
> helpers using let blocks within the definition of solveHardProblem.
>
> But that would make the definition of solveHardProblem really long --
> several screens of text -- which I've been taught to avoid. Is it wrong to
> use a module to hide those functions if the module signature will contain
> only that of solveHardProblem?

I would recommend splitting large functions into many smaller functions. This
has several advantages:

1. Easier to test small, self-contained units, e.g. in a top-level.

2. Environment is explicit.

3. Often more efficient.

4. Easier to assimilate bite-sized chunks of code.

5. Easier to describe/document.

6. Explicit environment makes errors in recursion easier to avoid.
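As a rough illustration of points 1, 2 and 4, with hypothetical functions: each step is a small top-level function whose inputs are all explicit parameters, so it can be tried on its own in the toplevel before being combined with the others.

```ocaml
(* Each function depends only on its arguments (explicit environment),
   so each can be tested separately. Empty lists are not handled. *)
let mean xs =
  List.fold_left ( +. ) 0.0 xs /. float_of_int (List.length xs)

let variance xs =
  let m = mean xs in
  List.fold_left (fun acc x -> acc +. ((x -. m) ** 2.0)) 0.0 xs
  /. float_of_int (List.length xs)

let std_dev xs = sqrt (variance xs)

let () =
  Printf.printf "%g\n" (std_dev [ 2.0; 4.0; 4.0; 4.0; 5.0; 5.0; 7.0; 9.0 ])
```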

> And say you DO choose to use a module... The OCaml documentation says that
> the compiler can automatically infer the signature without the need to
> create a .mli file for it. Does anyone actually use that feature in
> practice, or is creating a sig hard-wired to the act of creating a struct?

I often use inferred .mlis, especially during development when I need more
flexibility. Note that you can nest modules inside compilation units as well.

In this case, if your nested module solveHardProblem would only expose a
single value then I'd leave that function and all of its helpers inlined into
the outer module and restrict its signature to hide the helpers.
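Restricting the outer module's signature is normally done in its .mli. A related trick, not the one described above, keeps the restriction inside the .ml: constrain an anonymous structure to a signature and `include` it. A minimal sketch with hypothetical helpers:

```ocaml
(* Only solve_hard_problem survives the signature constraint, so square
   and double never become part of the enclosing module. *)
include (
  struct
    let square x = x * x
    let double x = 2 * x
    let solve_hard_problem x = square x + double x
  end :
    sig
      val solve_hard_problem : int -> int
    end)

let () = Printf.printf "%d\n" (solve_hard_problem 4)
```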

My style has evolved significantly since I started programming in OCaml and I
would now recommend following the style of the stdlib.

--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
OCaml for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists