[Caml-list] Question on writing efficient Ocaml.

Ian Oversby

unread,

Dec 28, 2006, 6:44:41 AM12/28/06

to caml...@yquem.inria.fr

Hi,

Apologies if this is the wrong forum in which to ask this question. Is
there anywhere else more
suitable?

I've written some Ocaml code to solve the problem of placing 8 queens on a
board so that none of them attack each-other. I've also written a C++
equivalent which is running much more quickly than the Ocaml. I assume I've
made some basic errors with the Ocaml - does anyone have any suggestions as
to what?

The code is available here:

http://curiousprogrammer.wordpress.com/2006/12/22/speed-comparison-plt-scheme-ocaml-and-c/

I compiled the Ocaml with the following command:

ocamlopt -noassert -unsafe -ccopt -O3 -ccopt -fomit-frame-pointer q.ml -o
q.exe

This is running on Windows using the MinGW build of Ocaml.

Thanks very much,

Ian

_________________________________________________________________
MSN Hotmail is evolving – check out the new Windows Live Mail
http://ideas.live.com

_______________________________________________
Caml-list mailing list. Subscription management:
http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
Archives: http://caml.inria.fr
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs

Ian Oversby

unread,

Dec 28, 2006, 11:06:45 AM12/28/06

to oand...@nerim.net

Hi Olivier,

Thanks for the response.

> > Hi,

>
> > I've written some Ocaml code to solve the problem of placing 8 queens on
>a
> > board so that none of them attack each-other. I've also written a C++
> > equivalent which is running much more quickly than the Ocaml. I assume
> > I've
> > made some basic errors with the Ocaml - does anyone have any suggestions
> > as
> > to what?
>

>there is room for improvement: for instance the type definitions of posn
>and board at the very beginning of the program introduce some unneeded
>boxing of values.

Does this mean that unboxing is inefficient in OCaml? I've written an
alternative version of the C++ that returns NULL instead of out of bound
values which was close to the same speed so it would be a little
disappointing if I couldn't achieve something similar in OCaml with Some /
None. Let me try to convert the OCaml to use out of bounds board values
instead to see if that solves the speed problem.

>You might want to compare with this solution of the queens problem in
>ocaml:
> http://caml.inria.fr/pub/old_caml_site/Examples/oc/basics/queens.ml

I've written a queens solver along the same lines which is much faster than
my other example as it makes many fewer calls and constructs fewer (and
simpler) boards.

> > I compiled the Ocaml with the following command:
> >
> > ocamlopt -noassert -unsafe -ccopt -O3 -ccopt -fomit-frame-pointer q.ml
>-o
> > q.exe
>

>the "-ccopt -O3 -ccopt -fomit-frame-pointer" are completely pointless: the
>ocaml compiler does not generate C code, it generates asm !

Well, that is certainly good to know. Thanks very much :)

>--
> Olivier

Jon Harrop

unread,

Dec 28, 2006, 11:31:10 AM12/28/06

to caml...@yquem.inria.fr

On Thursday 28 December 2006 11:42, Ian Oversby wrote:
> Hi,
>
> Apologies if this is the wrong forum in which to ask this question. Is
> there anywhere else more suitable?

Here is fine.

> I've written some Ocaml code to solve the problem of placing 8 queens on a
> board so that none of them attack each-other. I've also written a C++
> equivalent which is running much more quickly than the Ocaml.

The programs are not equivalent. Profiling shows the C++ is calling
is_threatened 1,915 to get 1 solution whereas the OCaml is calling it 129,030
times.

So the algorithms are different.

The code is also needlessly verbose and inefficient. There's no point in
declaring sum types with one contructor:

type posn = Posn of int * int;;
type board = Board of square array * int;;

There is no need for pattern matches with only one pattern:

let is_threatened queen square =
match queen, square with
(x1, y1), (x2, y2) ->
x1 == x2 ||
y1 == y2 ||
(x2 - x1) == (y2 - y1) ||
(x1 - y2) == (x2 - y1);;

let is_threatened (x1, y1), (x2, y2) =
x1 == x2 || y1 == y2 || x2 - x1 == y2 - y1 || x1 - y2 == x2 - y1;;

The program is also convoluted, e.g. add_queen converts the same value to and
from index<->coords.

I'm not sure it is even worth having a board data structure. Why not have an
implicit board and a list of queen positions? I'll have a go...

> ocamlopt -noassert -unsafe -ccopt -O3 -ccopt -fomit-frame-pointer q.ml -o
> q.exe

Don't bother with the optimisation switches. Just use:

ocamlopt q.ml -o q

You don't have any assertions, bounds checking is insignificant and you're not
compiling any C code...

--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
Objective CAML for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists

Richard Jones

unread,

Dec 28, 2006, 12:02:55 PM12/28/06

to Ian Oversby

On Thu, Dec 28, 2006 at 04:03:55PM +0000, Ian Oversby wrote:
> Does this mean that unboxing is inefficient in OCaml? I've written an
> alternative version of the C++ that returns NULL instead of out of bound
> values which was close to the same speed so it would be a little
> disappointing if I couldn't achieve something similar in OCaml with Some /
> None.

It's not so much that boxing/unboxing is inefficient in OCaml, but
rather that ocamlopt compiles exactly what you ask it to. If you ask
it to use a box, it uses a box! (Well, mostly ...)

See: http://caml.inria.fr/pub/old_caml_site/ocaml/numerical.html
in particular the note about Gallium.

Rich.

--
Richard Jones, CTO Merjis Ltd.
Merjis - web marketing and technology - http://merjis.com
Internet Marketing and AdWords courses - http://merjis.com/courses - NEW!
Merjis blog - http://blog.merjis.com - NEW!

skaller

unread,

Dec 28, 2006, 12:15:58 PM12/28/06

to Jon Harrop

On Thu, 2006-12-28 at 16:26 +0000, Jon Harrop wrote:

> The code is also needlessly verbose and inefficient. There's no point in
> declaring sum types with one contructor:
>
> type posn = Posn of int * int;;

Doesn't Ocaml optimise the constructor away in this case?

--
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net

Manfred Lotz

unread,

Dec 28, 2006, 2:52:31 PM12/28/06

to caml...@inria.fr

On Thu, 28 Dec 2006 11:42:04 +0000
"Ian Oversby" <ove...@hotmail.com> wrote:

> Hi,
>
> Apologies if this is the wrong forum in which to ask this question.
> Is there anywhere else more
> suitable?
>
> I've written some Ocaml code to solve the problem of placing 8 queens
> on a board so that none of them attack each-other. I've also written
> a C++ equivalent which is running much more quickly than the Ocaml.
> I assume I've made some basic errors with the Ocaml - does anyone
> have any suggestions as to what?
>
> The code is available here:
>
> http://curiousprogrammer.wordpress.com/2006/12/22/speed-comparison-plt-scheme-ocaml-and-c/
>
> I compiled the Ocaml with the following command:
>
> ocamlopt -noassert -unsafe -ccopt -O3 -ccopt -fomit-frame-pointer
> q.ml -o q.exe
>

I personally don't believe that it is a good idea to use options
like: -noassert -unsafe -ccopt -O3 -ccopt -fomit-frame-pointer
I regard bounds checkings and assertions as good features of Ocaml and
not as something to avoid in order to achieve good performances.

It is more important to have a stable binary. If you want to speed up
it is a matter of using a good algorithm.

--
Manfred

Jon Harrop

unread,

Dec 28, 2006, 5:30:34 PM12/28/06

to caml...@yquem.inria.fr

On Thursday 28 December 2006 16:03, Ian Oversby wrote:
> Does this mean that unboxing is inefficient in OCaml?

Yes. However, you've had to go out of your way to make the OCaml slow in this
case.

> I've written an
> alternative version of the C++ that returns NULL instead of out of bound
> values which was close to the same speed so it would be a little
> disappointing if I couldn't achieve something similar in OCaml with Some /
> None.

You would be better off focusing on higher-level optimisations, like
algorithmic optimisations.

> >You might want to compare with this solution of the queens problem in
> >ocaml:
> > http://caml.inria.fr/pub/old_caml_site/Examples/oc/basics/queens.ml
>
> I've written a queens solver along the same lines which is much faster than
> my other example as it makes many fewer calls and constructs fewer (and
> simpler) boards.

Why are you optimising this version if you already have a faster one?

--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
Objective CAML for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists

_______________________________________________

Jon Harrop

unread,

Dec 28, 2006, 9:13:27 PM12/28/06

to caml...@yquem.inria.fr

On Thursday 28 December 2006 11:42, Ian Oversby wrote:

> I've written some Ocaml code to solve the problem of placing 8 queens on a
> board so that none of them attack each-other.

Your program is very verbose: many times longer than is necessary. Most of
this verbosity can be attributed to performing various boxing, unboxing and
allocation tasks that are superfluous and simply slow the code down. However,
profiling suggests that you're also using a different algorithm in OCaml as
the core functions are called many more times in the OCaml than in the C++.

Contrast your code with a minimal program to solve the n-queens problem in
OCaml:

let rec unthreatened (x1, y1) (x2, y2) =
x1 <> x2 && y1 <> y2 && x2 - x1 <> y2 - y1 && x1 - y2 <> x2 - y1;;

let rec search n f qs ps =
if length qs = n then f qs else
iter (fun q -> search n f (q::qs) (filter (unthreatened q) ps)) ps;;

let ps = rev (flatten (init n (fun i -> init n (fun j -> i, j))))
search n f [] ps

This program starts with a list of all board positions "ps" and simply filters
out threatened positions as queens are added. Performance is poor compared to
your C++:

0.625s C++ (130 LOC)
4.466s OCaml (28 LOC)

My first solution is written for brevity and not performance so it is still 3x
slower than the C++:

The core of the program is just this:

open List;;

let print_board n queens =
for y=0 to n-1 do
for x=0 to n-1 do
print_string (if mem (x, y) queens then "Q" else ".")
done;
print_newline()
done;
print_newline();;

let rec fold_n2 f accu x y n =
if y=n then accu else
if x=n then fold_n2 f accu 0 (y + 1) n else
fold_n2 f (f accu x y) (x + 1) y n;;

let unthreatened (x1, y1) (x2, y2) =
x1 <> x2 && y1 <> y2 && x2 - x1 <> y2 - y1 && x1 - y2 <> x2 - y1;;

let rec search n f queens () x y =
if for_all (unthreatened (x, y)) queens then
if length queens = n - 1 then f ((x, y) :: queens) else
fold_n2 (search n f ((x, y) :: queens)) () (x + 1) y n;;

then I wrote this to search once and print and then search a given number of
times (for benchmarking):

exception Queens of (int * int) list;;

let _ =
let n = 8 in
let f qs = raise (Queens qs) in
(try search n f [] () 0 0 with Queens qs -> print_board n qs);
match Sys.argv with
| [|_; reps|] ->
for rep=2 to int_of_string reps do
try search n (fun _ -> raise Exit) [] () 0 0 with Exit -> ()
done
| _ -> ()

The "fold_n2" and "search" functions are both polymorphic folds. The former
folds over a square array and the latter folds over all solutions to the
n-queens problem. The generality of the "search" function is not used in this
case, as I just print the first solution found and bail using an exception.

On my machine, 1000 solutions takes:

0.625s C++ (130 LOC)
1.764s OCaml (30 LOC)

So OCaml is >4x more concise but 2.8x slower than C++.

Profiling shows that a lot of time is spent in List.for_all. We can roll this
ourselves to remove some polymorphism and improve performance:

let rec unthreatened x1 y1 = function
| (x2, y2) :: t ->
x1 <> x2 && y1 <> y2 && x2 - x1 <> y2 - y1 && x1 - y2 <> x2 - y1 &&
unthreatened x1 y1 t
| [] -> true;;

let rec search n f queens () x y =
if unthreatened x y queens then
if length queens = n - 1 then f ((x, y) :: queens) else
fold_n2 (search n f ((x, y) :: queens)) () (x + 1) y n;;

This gets the time down to:

1.159s OCaml (33 LOC)

We can continue optimising by removing some generality. Let's turn the "fold"
into an "iter" so that "search" becomes monomorphic:

let rec iter_n2 f x y n =
if y<n then
if x=n then iter_n2 f 0 (y + 1) n else
(f x y;
iter_n2 f (x + 1) y n);;

let rec search n f queens x y =
if unthreatened x y queens then
if length queens = n - 1 then f ((x, y) :: queens) else
iter_n2 (search n f ((x, y) :: queens)) (x + 1) y n;;

Performance is improved even more, at the cost of generality.

0.827s OCaml (34 LOC)

So OCaml is now only 30% slower whilst still being ~4x more concise.

--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
Objective CAML for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists

_______________________________________________

Jacques Garrigue

unread,

Dec 29, 2006, 1:09:17 AM12/29/06

to ska...@users.sourceforge.net

From: skaller <ska...@users.sourceforge.net>

> On Thu, 2006-12-28 at 16:26 +0000, Jon Harrop wrote:
>
> > The code is also needlessly verbose and inefficient. There's no point in
> > declaring sum types with one contructor:
> >
> > type posn = Posn of int * int;;
>
> Doesn't Ocaml optimise the constructor away in this case?

It's not really an optimization, rather that Posn being a constructor
with 2 arguments (rather than a constructor taking a pair as
argument), it ends up having exactly the same representation as a
pair. So there is no loss in efficiency here.

In general, I wouldn't explicitly advise against using single
constructor type definitions for documentation, except when the
argument is a single int or float and performance matters.

Jacques Garrigue

Ian Oversby

unread,

Dec 29, 2006, 4:46:05 AM12/29/06

to caml...@yquem.inria.fr

>From: Jon Harrop <j...@ffconsultancy.com>

>Why are you optimising this version if you already have a faster one?

I was hoping to compare similar approaches in Scheme, Ocaml and C++ in much
the same way as the language shootout does. I don't really have a desperate
need for a fast queens solver ;-) Anyway, thanks for pointing out the
algorithmic difference between the C++ and the Ocaml. I'll see if I can
track down what causes this.

Cheers,

Ian

_________________________________________________________________
MSN Hotmail is evolving – check out the new Windows Live Mail
http://ideas.live.com

_______________________________________________

Ian Oversby

unread,

Dec 29, 2006, 5:01:57 AM12/29/06

to Andrej...@andrej.com, caml...@yquem.inria.fr

>Dear Ian,
>
>your ocaml code is quite "unusual", and I must say I am not convinced your
>C++ code is written optimally (but I don't know anything about C++).

I'm not convinced it is optimal either - I wrote the scheme version first
and then translated first into Ocaml and then to C++. It is my first
piece of Ocaml (and hopefully not the last) which probably accounts for
the unusualness.

>I transcribed your C++ version to a reasonable ocaml version. Please
>compare what you did with how I transliterated your C++ code (attached file
>queen.ml). In particular, I use more obvious types (position is just
>int*int, a board is a 2D array)
>
>Then I wrote my own ocaml version which mimicks your algorithm, except that
>instead of creating new boards all the time, it just keeps a list of
>current positions of queens (attached file queen2.ml).

Excellent. Thanks very much. I'll have a look at both of them and
see what I can learn.

>By the way, why are you placing 8 queens on the board, no matter how large
>the board is? I thought the idea was to put n queens on a board of size n x
>n.

Yes, good catch - it is a bug. I didn't do sufficient testing.

[snip]

>Feel free to post this on your blog.

Thanks again. I'll probably write a follow-up when I understand what
is going on. Jon Harrop also uncovered an algorithmic difference between
the C++ and the Ocaml so I'll look into that too.

Mattias Engdegård

unread,

Dec 29, 2006, 6:18:32 AM12/29/06

to garr...@math.nagoya-u.ac.jp

>In general, I wouldn't explicitly advise against using single
>constructor type definitions for documentation, except when the
>argument is a single int or float and performance matters.

Is there a reason for this? To my innocent eyes, code like

type length = Length of int

looks quite reasonable and could be useful at times.

If binary compatibility with old OCaml releases would be the reason,
is there a policy stating when such ABI details can be changed?

Nathaniel Gray

unread,

Jan 5, 2007, 7:54:10 PM1/5/07

to Mattias Engdegård

On 12/29/06, Mattias Engdegård <mat...@virtutech.se> wrote:
> >In general, I wouldn't explicitly advise against using single
> >constructor type definitions for documentation, except when the
> >argument is a single int or float and performance matters.
>
> Is there a reason for this? To my innocent eyes, code like
>
> type length = Length of int
>
> looks quite reasonable and could be useful at times.

I agree. Sadly, the ocaml devs don't.

http://caml.inria.fr/mantis/view.php?id=3978

As Xavier points out, one can use modules to hide basic types, but
this is pretty clumsy in practice. There's rumored to be a solution
using phantom types, but my attempts don't work:

# type 'a boink = int;;
type 'a boink = int
# let f = (20 : string boink);;
val f : string boink = 20
# let g = (30 : int boink);;
val g : int boink = 30
# f;;
- : string boink = 20
# g;;
- : int boink = 30
# f + g;;
- : int = 50

> If binary compatibility with old OCaml releases would be the reason,
> is there a policy stating when such ABI details can be changed?

The ocaml devs have been quite clear that the ABI can change at any
time, so that's not the reason.

Cheers,
-n8

--
>>>-- Nathaniel Gray -- Caltech Computer Science ------>
>>>-- Mojave Project -- http://mojave.cs.caltech.edu -->

Philippe Wang

unread,

Jan 5, 2007, 8:03:16 PM1/5/07

to Nathaniel Gray

Le 6 janv. 07 à 01:52, Nathaniel Gray a écrit :

> As Xavier points out, one can use modules to hide basic types, but
> this is pretty clumsy in practice. There's rumored to be a solution
> using phantom types, but my attempts don't work:
>
> # type 'a boink = int;;
> type 'a boink = int
> # let f = (20 : string boink);;
> val f : string boink = 20
> # let g = (30 : int boink);;
> val g : int boink = 30
> # f;;
> - : string boink = 20
> # g;;
> - : int boink = 30
> # f + g;;
> - : int = 50

Hmmm... the expression
> f + g;;
becomes an int because ( + ) : int -> int -> int

If you write :

((+) : int -> int boink -> string boink) f g

instead, it may do what you meant to do...

Cheers,

Philippe Wang
mail(at)philippewang.info

brogoff

unread,

Jan 5, 2007, 8:17:00 PM1/5/07

to Nathaniel Gray

On Fri, 5 Jan 2007, Nathaniel Gray wrote:
> On 12/29/06, Mattias Engdegård <mat...@virtutech.se> wrote:
> > Is there a reason for this? To my innocent eyes, code like
> >
> > type length = Length of int
> >
> > looks quite reasonable and could be useful at times.
>
> I agree. Sadly, the ocaml devs don't.
>
> http://caml.inria.fr/mantis/view.php?id=3978
>
> As Xavier points out, one can use modules to hide basic types, but
> this is pretty clumsy in practice. There's rumored to be a solution
> using phantom types, but my attempts don't work:

You need to use modules to make phantom types work. Something like this

module type BOINK =
sig
type 'a t
val inj : int -> 'a t
val prj : 'a t -> int
val plus : 'a t -> 'a t -> 'a t
end;;

module Boink : BOINK =
struct
type 'a t = int
let inj n = n
let prj t = t
let plus x y = x + y
end;;

let f : string Boink.t = inj 20;;
let g : int Boink.t = inj 30;;

Boink.plus f g;;

I didn't compile that, but you get the idea...

-- Brian

Martin Jambon

unread,

Jan 5, 2007, 9:28:58 PM1/5/07

to brogoff

In case anyone finds it useful, I have this code which is ready to use
(copy the files it into your project):

http://martin.jambon.free.fr/ocaml.html#opaque

It works only for strings and ints.
You can write:

open Opaque
let x : [`Year] int_t = int_t 2007
let next_year x : [`Year] int_t = int_t (t_int x + 1)

It gives you:
val x : [ `Year ] Opaque.int_t
val next_year : 'a Opaque.int_t -> [ `Year ] Opaque.int_t

Note that we need one more type annotation if we want to get the
following signature:
val next_year : [ `Year ] Opaque.int_t -> [ `Year ] Opaque.int_t

or we can just use "successor" which is equivalent to "succ":
val successor : 'a Opaque.int_t -> 'a Opaque.int_t

As you can see, things can get pretty ugly, but I found this technique
useful to avoid confusion between identifiers that identify different
types of objects. Not in my average "hello world" script.

Martin

--
Martin Jambon
http://martin.jambon.free.fr

Nathaniel Gray

unread,

Jan 8, 2007, 5:26:50 PM1/8/07

to Martin Jambon

Thanks to everybody for the discussion of phantom types. I guess
modules are unavoidable then. Too bad... :^(

--
>>>-- Nathaniel Gray -- Caltech Computer Science ------>
>>>-- Mojave Project -- http://mojave.cs.caltech.edu -->

_______________________________________________