[Caml-list] OCamlJIT2 vs. OCamlJIT

Benedikt Meurer

unread,

Nov 30, 2010, 3:36:56 AM11/30/10

to caml...@yquem.inria.fr

Hello everybody,

I did some final work on OCamlJIT2, and compared the result to OCamlJIT. The performance measures are presented in the following tech report (skip straight to section 4 for the performance results):

http://arxiv.org/abs/1011.6223

In short: Performance measured on a P4 "Northwood" (no long mode, plain x86) 2.4GHz. OCamlJIT2 beats OCamlJIT by a factor of 1.1 to 2.0 in every benchmark, and - rather surprising - was even able to beat ocamlopt in the number crunching benchmark (probably an issue with the x86 backend of ocamlopt).

As mentioned by Xavier Leroy and others previously, we probably went as far as we could go in the direction of JITting the byte-code virtual machine, while preserving its general stack-based nature and instruction set. Moving even further means translating the byte-code to some intermediate form suitable for use with standard compilation techniques; but as we saw earlier, in an LLVM-based prototype, the compilation overhead increases dramatically and the benefit of JIT compilation vanishes.

Therefore, as suggested earlier, I'll try to get my hands on the native top-level now (already contacted Alain for the emitter code). Additionally, the linear scan register allocation will be implemented by a student as part of his diploma thesis.

regards,
Benedikt
_______________________________________________
Caml-list mailing list. Subscription management:
http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
Archives: http://caml.inria.fr
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs

Török Edwin

unread,

Nov 30, 2010, 5:48:14 AM11/30/10

to Benedikt Meurer, caml...@yquem.inria.fr

On Tue, 30 Nov 2010 09:36:07 +0100
Benedikt Meurer <benedik...@googlemail.com> wrote:

> Hello everybody,
>
> I did some final work on OCamlJIT2, and compared the result to
> OCamlJIT. The performance measures are presented in the following
> tech report (skip straight to section 4 for the performance results):
>
> http://arxiv.org/abs/1011.6223
>
> In short: Performance measured on a P4 "Northwood" (no long mode,
> plain x86) 2.4GHz. OCamlJIT2 beats OCamlJIT by a factor of 1.1 to 2.0
> in every benchmark, and - rather surprising - was even able to beat
> ocamlopt in the number crunching benchmark (probably an issue with
> the x86 backend of ocamlopt).

Looks like this happens only on Pentium4: on Core2 and Xeon ocamlopt
is still faster on almabench.unsafe, or did I miss something?

>
> As mentioned by Xavier Leroy and others previously, we probably went
> as far as we could go in the direction of JITting the byte-code
> virtual machine, while preserving its general stack-based nature and
> instruction set. Moving even further means translating the byte-code
> to some intermediate form suitable for use with standard compilation
> techniques; but as we saw earlier, in an LLVM-based prototype, the
> compilation overhead increases dramatically and the benefit of JIT
> compilation vanishes.

An LLVM-based backend would still be interesting for static
compilation, where compile times don't matter much.
Did you try comparing an LLVM-based backend with ocamlopt?
If it is faster could some of the LLVM passes be ported to ocamlopt's
backend?

Best regards,
--Edwin

Benedikt Meurer

unread,

Nov 30, 2010, 5:55:47 AM11/30/10

to caml...@yquem.inria.fr

On Nov 30, 2010, at 11:48 , Török Edwin wrote:

>> In short: Performance measured on a P4 "Northwood" (no long mode,
>> plain x86) 2.4GHz. OCamlJIT2 beats OCamlJIT by a factor of 1.1 to 2.0
>> in every benchmark, and - rather surprising - was even able to beat
>> ocamlopt in the number crunching benchmark (probably an issue with
>> the x86 backend of ocamlopt).
>
> Looks like this happens only on Pentium4: on Core2 and Xeon ocamlopt
> is still faster on almabench.unsafe, or did I miss something?

Core 2 and Xeon are x86-64, P4 is x86, so completely different architecture (with different code generator backends).

>>
>> As mentioned by Xavier Leroy and others previously, we probably went
>> as far as we could go in the direction of JITting the byte-code
>> virtual machine, while preserving its general stack-based nature and
>> instruction set. Moving even further means translating the byte-code
>> to some intermediate form suitable for use with standard compilation
>> techniques; but as we saw earlier, in an LLVM-based prototype, the
>> compilation overhead increases dramatically and the benefit of JIT
>> compilation vanishes.
>
> An LLVM-based backend would still be interesting for static
> compilation, where compile times don't matter much.
> Did you try comparing an LLVM-based backend with ocamlopt?
> If it is faster could some of the LLVM passes be ported to ocamlopt's
> backend?

LLVM backend for ocamlopt is a totally different story. You'd have to start with the Ulambda or the Cmm intermediate representation, while a JIT approach starts with the byte-code representation (not necessarily, but that was the case for our approach).

However, I'm not sure there would be any immediate benefit from using LLVM as code generator backend for ocamlopt, since ocamlopt already does a quite good (and fast) job. So besides probably educational or research interests, what would be the practical benefit of LLVM inside ocamlopt?

> Best regards,
> --Edwin

regards,
Benedikt

bluestorm

unread,

Nov 30, 2010, 12:02:07 PM11/30/10

to Benedikt Meurer, caml...@yquem.inria.fr

On Tue, Nov 30, 2010 at 11:55 AM, Benedikt Meurer <
benedik...@googlemail.com> wrote:
>
> LLVM backend for ocamlopt is a totally different story. You'd have to start
> with the Ulambda or the Cmm intermediate representation, while a JIT
> approach starts with the byte-code representation (not necessarily, but that
> was the case for our approach).
>
> However, I'm not sure there would be any immediate benefit from using LLVM
> as code generator backend for ocamlopt, since ocamlopt already does a quite
> good (and fast) job. So besides probably educational or research interests,
> what would be the practical benefit of LLVM inside ocamlopt?
>

There would be several advantages in switching to LLVM for code generation.
The general idea is that if other people work on the low-level stuff, it is
less work for the OCaml implementors.

- more portability : while I'm not sure LLVM is ported to more platforms
than OCaml right now, the LLVM community will probably port its bytecode to
numerous architectures in the future; in contrast, maintining numerous
backend in the OCaml compiler has a maintainance cost. In particular,
cross-compilation would probably be easier.

- more optimizations : the LLVM guys try to develop a wide range of
optimization passes between LLVM IR and native code, while ocamlopt is
mostly a "non-optimising compiler". It's not clear however how much gain we
could have, as OCaml has already optimized the most important parts, and a
big part of the performance concerns are outside the LLVM area (data
representation and GC). Still, the experience of GHC-LLVM has been quite
positive, with noticeable improvements in some cases.

On the other hand, it may be non-trivial to adapt LLVM to cope with the
OCaml runtime, the GC in particuliar, and that would probably involve
upstream changes to LLVM.

LLVM is nice and trendy (though it's a shame the GNU guys, partly due to
their own mistakes, are losing an important part of the FLOSS ecosystem to
Apple...), but I'm personally more interested in the more theoretical
projects of verified compilation toolchains, such as compcert (
http://compcert.inria.fr/ ). It's unrealistic to hope to have a completely
verified ocaml-to-assembly compiler, as we would first need formal semantics
for the OCaml language itself, but it is the very point : doing that kind of
things forces you to have formal semantics, which is very interesting in
many respects.

Asking for a decent compiler was once the way to tell apart the serious
languages from the insane string-fiddling script languages, but the line is
blurred by the indecent amount of work injected in the optimization of those
insane languages. Formal semantics will distinguish the gentlemen of the
future.

Benedikt Meurer

unread,

Nov 30, 2010, 12:29:30 PM11/30/10

to caml...@yquem.inria.fr

On Nov 30, 2010, at 18:01 , bluestorm wrote:

> - more portability : while I'm not sure LLVM is ported to more platforms than OCaml right now, the LLVM community will probably port its bytecode to numerous architectures in the future; in contrast, maintining numerous backend in the OCaml compiler has a maintainance cost. In particular, cross-compilation would probably be easier.

Indeed, cross-compilation would be a nice benefit.

> - more optimizations : the LLVM guys try to develop a wide range of optimization passes between LLVM IR and native code, while ocamlopt is mostly a "non-optimising compiler". It's not clear however how much gain we could have, as OCaml has already optimized the most important parts, and a big part of the performance concerns are outside the LLVM area (data representation and GC). Still, the experience of GHC-LLVM has been quite positive, with noticeable improvements in some cases.

Most of LLVM optimizing passes won't help with OCaml, as they are targeted towards constructs generated by C/C++/Java compilers. Of course that could be fixed to include constructs generated by the OCaml compiler, but we won't get much for free here.

> On the other hand, it may be non-trivial to adapt LLVM to cope with the OCaml runtime, the GC in particuliar, and that would probably involve upstream changes to LLVM.

This is a major issue, which also hit me with the bytecode runtime. In the native runtime it's even worse; there is no well-defined stack layout nor specific register assignment in LLVM, necessary to handle the runtime and garbage collector calls. One would need to reinvent the OCaml runtime and GC interface (at least in part) to fit well with LLVM. Some of this can be fixed using special calling conventions, but that way the problems are simply shifted down to another layer (requiring more - possibly not really portable - changes to LLVM). You will be faced with various small issues, like how to check whether a SEGV was caused within OCaml code (for stack overflow detection), how to force LLVM to place only tagged integers onto the C stack when spilling registers, etc. Resolving these issues will require various non-trivial changes to both OCaml and LLVM. Studying GHC-LLVM might help to resolve some of them (i.e. there is already a GHC calling convention available in LLVM); maybe t
he Mono-LLVM backend may also provide some pointers.

Yoann Padioleau

unread,

Nov 30, 2010, 12:33:05 PM11/30/10

to bluestorm, caml...@yquem.inria.fr

On Nov 30, 2010, at 9:01 AM, bluestorm wrote:

> On Tue, Nov 30, 2010 at 11:55 AM, Benedikt Meurer <benedik...@googlemail.com> wrote:
> There would be several advantages in switching to LLVM for code generation. The general idea is that if other people work on the low-level stuff, it is less work for the OCaml implementors.
>

[...]

> LLVM is nice and trendy

Yes, and it has to stop. I don't understand why there is so much hype around LLVM. Why would you think something written in C++ would be far better than the ocaml code we have in the ocamlopt compiler ?

> (though it's a shame the GNU guys, partly due to their own mistakes, are losing an important part of the FLOSS ecosystem to Apple...), but I'm personally more interested in the more theoretical projects of verified compilation toolchains, such as compcert ( http://compcert.inria.fr/ ). It's unrealistic to hope to have a completely verified ocaml-to-assembly compiler, as we would first need formal semantics for the OCaml language itself, but it is the very point : doing that kind of things forces you to have formal semantics, which is very interesting in many respects.
>
> Asking for a decent compiler was once the way to tell apart the serious languages from the insane string-fiddling script languages, but the line is blurred by the indecent amount of work injected in the optimization of those insane languages. Formal semantics will distinguish the gentlemen of the future.

Török Edwin

unread,

Nov 30, 2010, 12:57:15 PM11/30/10

to bluestorm, caml...@yquem.inria.fr

On Tue, 30 Nov 2010 18:01:28 +0100
bluestorm <bluesto...@gmail.com> wrote:

> On Tue, Nov 30, 2010 at 11:55 AM, Benedikt Meurer <
> benedik...@googlemail.com> wrote:
> >
> > LLVM backend for ocamlopt is a totally different story. You'd have
> > to start with the Ulambda or the Cmm intermediate representation,
> > while a JIT approach starts with the byte-code representation (not
> > necessarily, but that was the case for our approach).
> >
> > However, I'm not sure there would be any immediate benefit from
> > using LLVM as code generator backend for ocamlopt, since ocamlopt
> > already does a quite good (and fast) job.

I thought you already had some code to do that, but thrown it away
because the JIT was too slow. Was just trying to point out that such
code shouldn't be thrown away completely.
Did the LLVM code you had take OCaml bytecode or Ulambda/Cmm as
input?

> > So besides probably
> > educational or research interests, what would be the practical
> > benefit of LLVM inside ocamlopt?
> >
>
> There would be several advantages in switching to LLVM for code
> generation. The general idea is that if other people work on the
> low-level stuff, it is less work for the OCaml implementors.
>
> - more portability : while I'm not sure LLVM is ported to more
> platforms than OCaml right now,

OCaml's tier 1 platforms are supported by LLVM. Ocaml's tier 2 support
has more architectures than LLVM though.

> the LLVM community will probably port
> its bytecode to numerous architectures in the future;

I've seen some microcontroller backends added lately, but I've also seen
unmaintained and broken backends removed (IA-64 for example).

> in contrast,
> maintining numerous backend in the OCaml compiler has a maintainance
> cost. In particular, cross-compilation would probably be easier.
>
> - more optimizations : the LLVM guys try to develop a wide range of
> optimization passes between LLVM IR and native code, while ocamlopt is
> mostly a "non-optimising compiler". It's not clear however how much
> gain we could have, as OCaml has already optimized the most important
> parts, and a big part of the performance concerns are outside the
> LLVM area (data representation and GC). Still, the experience of
> GHC-LLVM has been quite positive, with noticeable improvements in
> some cases.
>
> On the other hand, it may be non-trivial to adapt LLVM to cope with
> the OCaml runtime, the GC in particuliar, and that would probably
> involve upstream changes to LLVM.

There is some support for emitting (assembly, not JIT) code that works
with OCaml 3.10, it'd probably have to be adapted for 3.12:
http://llvm.org/svn/llvm-project/llvm/trunk/lib/CodeGen/AsmPrinter/OcamlGCPrinter.cpp
http://llvm.org/svn/llvm-project/llvm/trunk/lib/CodeGen/OcamlGC.cpp
http://llvm.org/releases/2.8/docs/GarbageCollection.html

>
> LLVM is nice and trendy

It'd be even nicer if it was written in OCaml, whenever I write C++
code lately it feels like I could've written the same with much less
code in OCaml.

If someone were to develop an LLVM backend for OCaml, I think it could
coexist with ocamlopt though, allowing the user to choose which backend
it wants to use.

> (though it's a shame the GNU guys, partly due
> to their own mistakes, are losing an important part of the FLOSS
> ecosystem to Apple...),

Well there could also be a GCC backend for OCaml, but I don't know how
one should get started writing one (or if it would be worth it).

Best regards,
--Edwin

Basile Starynkevitch

unread,

Nov 30, 2010, 1:28:22 PM11/30/10

to Török Edwin, caml...@yquem.inria.fr

On Tue, 30 Nov 2010 19:26:00 +0200
Török Edwin <edwin...@gmail.com> wrote:
>
> Well there could also be a GCC backend for OCaml, but I don't know how
> one should get started writing one (or if it would be worth it).

Since I worked on OcamlJIT1 and since I am working now on GCC
(actually, mostly on the GCC MELT branch, see www.gcc-melt.org for
more, but sometimes on the GCC trunk, e.g. its gengtype generator), I
probably have some personal ideas [e.g. my opinions only] on that.
However, I am not crazy enough to work on an Ocaml front-end to GCC!

The current (slow) trend inside GCC is to open & stabilize more and
more its middle-end. In particular, the GCC middle end internal
representations (which you can work on using MELT, a Lispy domain
specific language suited for that very purpose) such as Gimple (and
Tree) are becoming more stable, and will eventually (I don't know how
and when and by whom!) have a "front-end", that is a program eating a
textual representation of Gimple code and building from it Gimple
internal representation in GCC memory.

So in the long term, the GCC internal Gimple language is becoming more
stable and will have a textual front-end. It will then becomes quite
similar to the LLVM input language http://llvm.org/docs/LangRef.html.

When that happens, one could imagine that ocamlopt (or some other
program) will have the ability to generate (in textual files) the
appropriate representations of the Ocaml code it is compiling.

One could also imagine that in the long term, GCC would provide enough
plugin hooks to plug a full (nearly arbitrary) front-end inside it. If
that happens, one could imagine an Ocaml compiler, implemented as a
gcc-ocaml-frontend.so plugin, which would create Gimple internals from
Ocaml code.

However, I believe no one is interested to work on that, and this is
probably the case because Ocaml is "fast-enough" for most cases. Ocaml
major strength is the power and expressiveness of its source language.

My feeling is that Ocaml is more targetted for people requiring
developers' productivity, while GCC is a mature industrial compiler
aiming portability and performance of generated binary programs, coded
in old [ugly] languages like C or C++.

I am not sure that putting a lot of efforts inside the Ocaml compiler,
to slightly improve the performance of generated executable, is
worthwhile (in particular, because the runtime & the GC may become a
bottleneck w.r.t to minor performance improvement of ocamlopt generated
binaries). I would bet that would bring less than 20% improvement at
most (and often, much less), for a quite high labor cost.

And my personal feeling is that these days, bright Gallium people (like
Xavier Leroy or François Pottier and their colleagues) and other Inria
persons related to Ocaml (like Damien Doligez or Pierre Weis and many
others) are much more interested in bringing formal methods inside
compilers (e.g. the Compcert project of a certified & "proved" C
compiler) than in spending a lot of time to improve ocamlopt generated
code by a few percents.

I would prefer them to improve even more the Ocaml language (e.g.
GADT...), perhaps to work on a parallel runtime (but we all now that is
*very* hard and perhaps even boring), perhaps even to start working, as
researchers, on the [incompatible] successor of Ocaml. As food for
thought, I tend to believe that parallel computers (e.g. "clouds", or
just our next desktop with 16 cores) will require another programming
style and another programming language, and that incrementally
improving Ocaml for such systems is not enough. Some people will have
to invent another way of thinking and programming these parallel
systems, and it might be the same people who brought us Ocaml!

But all this is day-dreaming, I have no real ideas, just personal
wishes (and a big admiration for all the Ocaml people at INRIA, which
are *researchers* not random software developers).

Cheers

--
Basile STARYNKEVITCH http://starynkevitch.net/Basile/
email: basile<at>starynkevitch<dot>net mobile: +33 6 8501 2359
8, rue de la Faiencerie, 92340 Bourg La Reine, France
*** opinions {are only mine, sont seulement les miennes} ***

Jon Harrop

unread,

Nov 30, 2010, 4:19:32 PM11/30/10

to bluestorm, Benedikt Meurer, caml...@yquem.inria.fr

Bluestorm wrote:
> - more optimizations : the LLVM guys try to develop a wide range of
optimization passes between
> LLVM IR and native code, while ocamlopt is mostly a "non-optimising
compiler". It's not clear
> however how much gain we could have, as OCaml has already optimized the
most important parts,
> and a big part of the performance concerns are outside the LLVM area (data
representation and
> GC). Still, the experience of GHC-LLVM has been quite positive, with
noticeable improvements
> in some cases.

Without a typeful intermediate representation of the program, LLVM's hands
would be tied when it comes to most optimization passes. You need to convey
as much type information as possible via the LLVM IR for LLVM's optimization
passes to be really effective.

Cheers,
Jon.

Jon Harrop

unread,

Nov 30, 2010, 5:07:00 PM11/30/10

to Yoann Padioleau, bluestorm, caml...@yquem.inria.fr

Yoann wrote:
> Yes, and it has to stop. I don't understand why there is so much hype
around LLVM. Why
> would you think something written in C++ would be far better than the
ocaml code we have
> in the ocamlopt compiler?

Because benchmarks like my HLVM ones have proven that LLVM can generate
*much* faster code than ocamlopt does.

For example, Fibonacci function over floats in HLVM with optimization passes
disabled and compilation time included in the measurement:

# let rec fib (x: float) : float =
if x < 1.5 then x else fib(x - 1.0) + fib(x - 2.0);;
# fib 40.0;;
- : `Float = 1.02334e+08
Live: 0
2.48074s total; 0s suspend; 0s mark; 0s sweep

And ocamlopt without compilation time:

$ cat >fib.ml
let rec fib x = if x < 1.5 then x else fib(x -. 1.0) +. fib(x -. 2.0);;
fib 40.0;;
$ ocamlopt fib.ml -o fib
$ time ./fib

real 0m7.811s
user 0m7.808s
sys 0m0.000s

Note that HLVM's *REPL* is over 3x faster than ocamlopt-compiled OCaml.

I have microbenchmarks where LLVM generates code over 10x faster than
ocamlopt (specifically, floating point code on x86) and larger numerical
programs that also wipe the floor with OCaml.

LLVM is also much better documented than ocamlopt's internals.

Cheers,
Jon.

Benedikt Meurer

unread,

Nov 30, 2010, 5:48:28 PM11/30/10

to caml...@yquem.inria.fr

On Nov 30, 2010, at 23:06 , Jon Harrop wrote:

> Because benchmarks like my HLVM ones have proven that LLVM can generate
> *much* faster code than ocamlopt does.
>
> For example, Fibonacci function over floats in HLVM with optimization passes
> disabled and compilation time included in the measurement:
>
> # let rec fib (x: float) : float =
> if x < 1.5 then x else fib(x - 1.0) + fib(x - 2.0);;
> # fib 40.0;;
> - : `Float = 1.02334e+08
> Live: 0
> 2.48074s total; 0s suspend; 0s mark; 0s sweep
>
> And ocamlopt without compilation time:
>
> $ cat >fib.ml
> let rec fib x = if x < 1.5 then x else fib(x -. 1.0) +. fib(x -. 2.0);;
> fib 40.0;;
> $ ocamlopt fib.ml -o fib
> $ time ./fib
>
> real 0m7.811s
> user 0m7.808s
> sys 0m0.000s
>
> Note that HLVM's *REPL* is over 3x faster than ocamlopt-compiled OCaml.

This has nothing to do with LLVM, but is simply due to the fact that your code does not box the float parameters/results. The following peace of C code is most probably even faster than your code, so what?

double fib(double x) { if (x < 1.5) return x else return fib(x-1) + fib(x-2); }

So this is about data representation not code generation (using LLVM with boxed floats would result in same/lower performance); HLVM ignores complex stuff like polymorphism, modules, etc. (at least last time I checked), which makes code generation almost trivial. The literature includes various solutions to implement stuff like ML polymorphism: tagged integers/boxed floats/objects is just one solution, not necessarily the best; but examples that simply ignore the complex stuff, and therefore deliver better performance don't really help to make progress.

A possible solution to improve ocamlopt's performance in this case would be to compile the recursive fib calls in a way that the parameter/result is passed in a floating point register (not boxed), and provide a wrapper for calls from outside, which unboxes the parameter, invokes the actual function code, and boxes the result. This should be doable on the Ulambda/Cmm level, so it is actually quite portable and completely independent of the low-level code generation (which is where LLVM comes into play). That way ocamlopt code will be as fast as the C code for this example.

> I have microbenchmarks where LLVM generates code over 10x faster than
> ocamlopt (specifically, floating point code on x86) and larger numerical
> programs that also wipe the floor with OCaml.

ocamlopt's x86 floating point backend is easy to beat, as demonstrated in the original post. Even a simple byte-code JIT engine (which still boxes floating point results, etc.) is able to beat it.

Your benchmarks do not prove that LLVM generates faster code than ocamlopt. Instead you proved that OCaml's floating point representation comes at a cost for number crunching applications (which is obvious). Use the same data representation with LLVM (or C) and you'll notice that the performance is the same (most likely worse) compared to ocamlopt.

> LLVM is also much better documented than ocamlopt's internals.

Definitely, but LLVM replaces just the low-level stuff of ocamlopt (most probably starting at the Cmm level), so a lot of (undocumented) ocamlopt internals remain.

> Cheers,
> Jon.

regards,
Benedikt

Erik de Castro Lopo

unread,

Nov 30, 2010, 7:17:12 PM11/30/10

to caml...@inria.fr

Jon Harrop wrote:

> Because benchmarks like my HLVM ones have proven that LLVM can generate
> *much* faster code than ocamlopt does.

Until ocamlopt has an LLVM backend that is not a fair comparison.

I suspect that the speed differences between your HLVM and ocamlopt
have very little to do with LLVM and are almost totally due to
other factors.

> LLVM is also much better documented than ocamlopt's internals.

LLVM has well over 20 full time programmers working on it. The
Ocaml compiler has how many?

Erik
--
----------------------------------------------------------------------
Erik de Castro Lopo
http://www.mega-nerd.com/

Yoann Padioleau

unread,

Nov 30, 2010, 8:34:21 PM11/30/10

to caml...@inria.fr

On Nov 30, 2010, at 4:16 PM, Erik de Castro Lopo wrote:

>
> Jon Harrop wrote:
>
>> Because benchmarks like my HLVM ones have proven that LLVM can generate
>> *much* faster code than ocamlopt does.
>
> Until ocamlopt has an LLVM backend that is not a fair comparison.
>
> I suspect that the speed differences between your HLVM and ocamlopt
> have very little to do with LLVM and are almost totally due to
> other factors.
>
>> LLVM is also much better documented than ocamlopt's internals.
>
> LLVM has well over 20 full time programmers working on it. The
> Ocaml compiler has how many?

Microsoft has hundreds of developers working on C#, CIL, visual studio (and its integrated debugger), etc
and I still find ocaml and ocamldebug superior.

Jon Harrop

unread,

Dec 1, 2010, 7:59:25 AM12/1/10

to caml...@inria.fr

Erik wrote:

> Jon wrote:
> > LLVM is also much better documented than ocamlopt's internals.
>
> LLVM has well over 20 full time programmers working on it. The
> Ocaml compiler has how many?

Another reason why LLVM deserves its hype.

Cheers,
Jon.

ivan chollet

unread,

Dec 1, 2010, 8:55:46 AM12/1/10

to Jon Harrop, caml...@inria.fr

20 full time programmers, but if each of them has 1/40 of the skills of X.
Leroy, LLVM will get only 50% of the caml native compiler performance.
I'm not sure hype is correlated with value as you seem to imply but i might
be wrong.

This was the troll of the day.

Jon Harrop

unread,

Dec 1, 2010, 9:12:01 AM12/1/10

to Benedikt Meurer, caml...@yquem.inria.fr

Benedikt wrote:
> This has nothing to do with LLVM, but is simply due to the fact that
> your code does not box the float parameters/results.

Exactly, yes.

> The following peace of C code is most probably even faster than your
> code, so what?
>
> double fib(double x) { if (x < 1.5) return x else return fib(x-1) +
> fib(x-2); }

FWIW, gcc -O2 takes longer to compile and produces slower code than HLVM
anyway:

$ time gcc -O2 fib.c -o fib

real 0m0.161s
user 0m0.124s
sys 0m0.036s

$ time ./fib
102334155.000000

real 0m2.792s
user 0m2.792s
sys 0m0.000s

If you're asking what the advantages of using LLVM over generating C code
are, I'd say functionality like more numeric types, tail calls and JIT
compilation come top but also the fact that LLVM bundles nice OCaml bindings
and makes it easy to generate fast code. Also, I have other examples (e.g.
the random number generator from the SciMark2 benchmark) where LLVM can
generate code that runs 2x faster than anything I've been able to get out of
GCC.

> So this is about data representation not code generation (using LLVM
> with boxed floats would result in same/lower performance);

Yes. LLVM lets you choose your own data representation and applications like
numerics can benefit enormously from that. All the more reason to use LLVM.

> HLVM ignores
> complex stuff like polymorphism, modules, etc. (at least last time I
> checked), which makes code generation almost trivial.

OCaml ignores complex stuff like parallelism and value types (at least last
time I checked ;-). This is precisely why OCaml and HLVM complement each
other. Ideally, we would want all of this at the same time but, for now,
we'll have to settle for separate solutions.

> The literature
> includes various solutions to implement stuff like ML polymorphism:
> tagged integers/boxed floats/objects is just one solution, not
> necessarily the best; but examples that simply ignore the complex
> stuff, and therefore deliver better performance don't really help to
> make progress.

Right. Reified generics can be a much better solution for performance,
particularly when combined with value types, and something else that
ocamlopt cannot express but HLVM can.

> A possible solution to improve ocamlopt's performance in this case
> would be to compile the recursive fib calls in a way that the
> parameter/result is passed in a floating point register (not boxed),
> and provide a wrapper for calls from outside, which unboxes the
> parameter, invokes the actual function code, and boxes the result. This
> should be doable on the Ulambda/Cmm level, so it is actually quite
> portable and completely independent of the low-level code generation
> (which is where LLVM comes into play). That way ocamlopt code will be
> as fast as the C code for this example.

Wow, I'd love to see your OCaml version that is as fast as C.

> > I have microbenchmarks where LLVM generates code over 10x faster than
> > ocamlopt (specifically, floating point code on x86) and larger
> > numerical programs that also wipe the floor with OCaml.
>
> ocamlopt's x86 floating point backend is easy to beat, as demonstrated
> in the original post. Even a simple byte-code JIT engine (which still
> boxes floating point results, etc.) is able to beat it.

Yes. Another reason to consider alternatives to ocamlopt for such
applications.

> Your benchmarks do not prove that LLVM generates faster code than
> ocamlopt.

My benchmarks show LLVM generating faster code than ocamlopt.

> Instead you proved that OCaml's floating point representation
> comes at a cost for number crunching applications (which is obvious).
> Use the same data representation with LLVM (or C) and you'll notice
> that the performance is the same (most likely worse) compared to
> ocamlopt.

You are saying is that LLVM might not be faster if it were also afflicted
with ocamlopt's design, which is unproductive speculation. The point is that
LLVM and HLVM are not constrained by those design decisions as OCaml is, so
you can use them to generate much faster code.

> > LLVM is also much better documented than ocamlopt's internals.
>
> Definitely, but LLVM replaces just the low-level stuff of ocamlopt
> (most probably starting at the Cmm level), so a lot of (undocumented)
> ocamlopt internals remain.

Sure. OCaml has a good pattern match compiler, for example.

Cheers,
Jon.

Benedikt Meurer

unread,

Dec 1, 2010, 10:00:46 AM12/1/10

to caml...@yquem.inria.fr

On Dec 1, 2010, at 15:11 , Jon Harrop wrote:

> If you're asking what the advantages of using LLVM over generating C code
> are, I'd say functionality like more numeric types, tail calls and JIT
> compilation come top but also the fact that LLVM bundles nice OCaml bindings
> and makes it easy to generate fast code. Also, I have other examples (e.g.
> the random number generator from the SciMark2 benchmark) where LLVM can
> generate code that runs 2x faster than anything I've been able to get out of
> GCC.

LLVM sure does better as an intermediate language than C does, no question. But that wasn't my point in this example.

>> So this is about data representation not code generation (using LLVM
>> with boxed floats would result in same/lower performance);
>
> Yes. LLVM lets you choose your own data representation and applications like
> numerics can benefit enormously from that. All the more reason to use LLVM.

As does assembler, so even more reasons to emit assembler?

>> The literature
>> includes various solutions to implement stuff like ML polymorphism:
>> tagged integers/boxed floats/objects is just one solution, not
>> necessarily the best; but examples that simply ignore the complex
>> stuff, and therefore deliver better performance don't really help to
>> make progress.
>
> Right. Reified generics can be a much better solution for performance,
> particularly when combined with value types, and something else that
> ocamlopt cannot express but HLVM can.

So this is an area to work on within ocamlopt.

>> Instead you proved that OCaml's floating point representation
>> comes at a cost for number crunching applications (which is obvious).
>> Use the same data representation with LLVM (or C) and you'll notice
>> that the performance is the same (most likely worse) compared to
>> ocamlopt.
>
> You are saying is that LLVM might not be faster if it were also afflicted
> with ocamlopt's design, which is unproductive speculation. The point is that
> LLVM and HLVM are not constrained by those design decisions as OCaml is, so
> you can use them to generate much faster code.

We are talking about using LLVM within the OCaml compiler, so we have to talk about OCaml, not HLVM or anything else.

Don't get me wrong, I'm curious to see an LLVM backend for ocamlopt, especially since that way we would get a native top-level for free (at least for the JIT capable LLVM targets). But I doubt that LLVM will make a noticeable difference (except for probably reduced number of target specific code), since it will be faced with the same data representation used right now and LLVM will not be able to optimize much (assuming the ocamlopt higher level optimizations were already applied).

What you are talking about is replacing several design decisions within OCaml (which have nothing to do with the low-level code generator); this might be a good idea, and may indeed improve generated code. I don't know the opinion of the OCaml core developers, but from my POV, this sounds like an interesting challenge; however getting beyond a simplified proof-of-concept requires a lot of hard work.

> Cheers,
> Jon.

greets,
Benedikt

Jon Harrop

unread,

Dec 1, 2010, 5:03:50 PM12/1/10

to Benedikt Meurer, caml...@yquem.inria.fr

Benedikt wrote:
> On Dec 1, 2010, at 15:11 , Jon Harrop wrote:
> > If you're asking what the advantages of using LLVM over generating C
> code
> > are, I'd say functionality like more numeric types, tail calls and
> JIT
> > compilation come top but also the fact that LLVM bundles nice OCaml
> bindings
> > and makes it easy to generate fast code. Also, I have other examples
> (e.g.
> > the random number generator from the SciMark2 benchmark) where LLVM
> can
> > generate code that runs 2x faster than anything I've been able to get
> out of
> > GCC.
>
> LLVM sure does better as an intermediate language than C does, no
> question. But that wasn't my point in this example.

I think we are talking at cross purposes. My post was to explain some of the
many benefits of LLVM in response to Yoann's statement:

"I don't understand why there is so much hype around LLVM. Why would you
think something written in C++ would be far better than the ocaml code we
have in the ocamlopt compiler?"

I was not just talking about porting ocamlopt to LLVM but more generally
about the advantages of using LLVM from OCaml in any way.

> >> So this is about data representation not code generation (using LLVM
> >> with boxed floats would result in same/lower performance);
> >
> > Yes. LLVM lets you choose your own data representation and
> applications like
> > numerics can benefit enormously from that. All the more reason to use
> LLVM.
>
> As does assembler, so even more reasons to emit assembler?

LLVM makes it a *lot* easier to generate efficient code, of course.

> >> The literature
> >> includes various solutions to implement stuff like ML polymorphism:
> >> tagged integers/boxed floats/objects is just one solution, not
> >> necessarily the best; but examples that simply ignore the complex
> >> stuff, and therefore deliver better performance don't really help to
> >> make progress.
> >
> > Right. Reified generics can be a much better solution for performance,
> > particularly when combined with value types, and something else that
> > ocamlopt cannot express but HLVM can.
>
> So this is an area to work on within ocamlopt.

If you can add value types and reified generics to ocamlopt and get them
accepted that would be great.

> > You are saying is that LLVM might not be faster if it were also
> afflicted
> > with ocamlopt's design, which is unproductive speculation. The point
> is that
> > LLVM and HLVM are not constrained by those design decisions as OCaml
> is, so
> > you can use them to generate much faster code.
>
> We are talking about using LLVM within the OCaml compiler, so we have
> to talk about OCaml, not HLVM or anything else.

I was talking more generally about the benefits of LLVM in response to
Yoann's statement about the hype surrounding LLVM. HLVM demonstrates some of
the things a next-generation ocamlopt might be capable of.

> Don't get me wrong, I'm curious to see an LLVM backend for ocamlopt,
> especially since that way we would get a native top-level for free (at
> least for the JIT capable LLVM targets). But I doubt that LLVM will
> make a noticeable difference (except for probably reduced number of
> target specific code), since it will be faced with the same data
> representation used right now and LLVM will not be able to optimize
> much (assuming the ocamlopt higher level optimizations were already
> applied).

Absolutely.

> What you are talking about is replacing several design decisions within
> OCaml (which have nothing to do with the low-level code generator);

Except that those design decisions have a huge effect on the optimizations
LLVM will do (if it is your code gen).

> this might be a good idea, and may indeed improve generated code.

HLVM's benchmark results would certainly seem to indicate that, yes.

> I don't know the opinion of the OCaml core developers, but from my POV,
> this sounds like an interesting challenge; however getting beyond a
> simplified proof-of-concept requires a lot of hard work.

Yes. At the end of the day, the performance gains that can be obtained by
bringing the GC up to date already dwarf all of these other effects so a new
GC is surely the highest priority.

This begs a couple of questions:

* How dependent is OCaml bytecode on the data representation and GC?

* Could you write an OCaml bytecode interpreter in OCaml?

Perhaps we could write a multicore friendly OCamlJIT3. :-)

Cheers,
Jon.

Eray Ozkural

unread,

Dec 1, 2010, 8:42:23 PM12/1/10

to Jon Harrop, caml...@yquem.inria.fr

On Thu, Dec 2, 2010 at 12:03 AM, Jon Harrop <
jonathand...@googlemail.com> wrote:
>
> >
> > As does assembler, so even more reasons to emit assembler?
>
> LLVM makes it a *lot* easier to generate efficient code, of course.

Just like the way gcc is using a hierarchy of program representations from
high-level to low-level, an ocaml compiler could in principle perform type
and language specific optimization passes in the high-level (cmm for
instance) and then emit llvm code, without any loss of efficiency, I think.

Still, I have to agree with other posters stating that a compiler backend
written in ocaml is preferable. I agree, if not only for the reason that I
know from experience it would be 10 times easier to write in ocaml some of
the passes I have had to deal with in C++. Even working with symbols is
quite awkward in C++. While on the other hand, libraries like boost can help
at times, but it is nothing we cannot replicate in ocaml. As a compiler
developer, I would be most intrigued in profiling and static analysis
information represented in ocaml.

I don't see why hybrids are a bad idea, though. If you see LLVM becoming an
industry standard, then, of course it's worth supporting. It means you'll
target new architectures with no effort.

Best,

--
Eray Ozkural, PhD candidate. Comp. Sci. Dept., Bilkent University, Ankara
http://groups.yahoo.com/group/ai-philosophy
http://myspace.com/arizanesil http://myspace.com/malfunct

Benedikt Meurer

unread,

Dec 3, 2010, 5:04:05 AM12/3/10

to caml...@yquem.inria.fr

So, let's take that LLVM thing to a somewhat more serious level (which means no more arguing with toy projects that simply ignore the difficult stuff). What would be necessary to support LLVM in ocamlopt, and what would be the benefits? First, what has to be done:

1. Obviously, one would need to emit LLVM code from ocamlopt (starting at the Cmm level). Following the GHC/EHC approach we could simply emit the .ll file here as a starting point (no fiddling with the LLVM API then, could still switch to the LLVM API later for runtime code generation).

2. LLVM would need to generate the stack maps for the GC and the required symbols for the program (code_begin, code_end, data_begin, data_end). LLVM already contains an OcamlGCPrinter class, which seems to do exactly this; so this may be easy.

3a. If the current runtime is to be reused, LLVM will have to support OCaml calling conventions (easy to implement, did that already), LLVM must be taught about the OCaml runtime state (i.e. %r14 and %r15 on x86-64 must not be used by LLVM, also easy to implement), and LLVM must be extended with intrinsics to support allocation and exception handling with appropriate lowering for the targets (somewhat difficult and questionable whether this will be merged into LLVM, since it's usable by OCaml only).

3b. Alternatively we could also replace the native interface of the current runtime with something that fits better with what is already provided by LLVM (which isn't a lot, so we would have to limit ourselves somewhat). Allocation and GC are possible without too many changes here, but the exception handling is really difficult, since LLVM doesn't provide anything usable here; we'd either need to do everything by hand or use one of the two modes provided by LLVM (overhead, most probably).

4. We need a way to emit Cmm data items. LLVM cannot handle this ATM, which should be fixed (dunno how yet). In the meantime we can emit Cmm data items using module level asm.

It is important to note that we won't get anything for free. Work is required for every single platform we plan to support using the LLVM backend. This is probably the case for every serious programming language (and no, we don't need another mention of HLVM here, it's a toy, that's all). So, there's some boring and probably some interesting work to be done, but what do we gain?

1. Reduce lines of code and improved maintainability of ocamlopt once all targets are replaced with the LLVM backend. But as said, this does itself come at a cost, and depending on whether 3a. or 3b. is implemented, we burden ourselves with maintaining code within LLVM.

2. Some interesting optimizations and all the standard optimizations for free.

3. A better LLVM. If we do it right, other's implementing functional languages using LLVM will not be faced with the same problems again. Dunno if that's possible.

4. And for the kids: LLVM is cool, OCaml will use LLVM, so OCaml is also cool. ;-)

Anything I forgot to mention? Comments?

Benedikt

Till Varoquaux

unread,

Dec 3, 2010, 8:35:05 AM12/3/10

to Benedikt Meurer, caml...@yquem.inria.fr

Thanks for the summary.

You seem to think LLVM wouldn't actually buy us much in term of
optimisations. In my experience the current ocaml compiler is really
good when writing code fairly low level but discourages use of
combinator library, higher order functions, functors in performance
sensitive code (i.e. you have to do inlining, specialization, constant
propaagation etc... by hand).

I was under the impression that some of LLVM passes could be a good
match for those problems. That is: micro benchmark code that is
written carefully with those constraints in mind wouldn't gain much
but some form of "origami" programming could be unfolded by the
compiler. Am I missing something obvious? (e.g. need for better side
effect analysis).

Cheers,
Till

Eray Ozkural

unread,

Dec 3, 2010, 8:41:48 AM12/3/10

to Till Varoquaux, caml...@yquem.inria.fr

Is there IPA in LLVM? I didn't know that.

Török Edwin

unread,

Dec 3, 2010, 9:07:34 AM12/3/10

to Eray Ozkural, Till Varoquaux, caml...@yquem.inria.fr

On Fri, 3 Dec 2010 15:41:41 +0200
Eray Ozkural <exama...@gmail.com> wrote:

> Is there IPA in LLVM? I didn't know that.
>

See createStandardLTOPasses and the UnitAtATime part of
createStandardModulePasses:
http://llvm.org/svn/llvm-project/llvm/trunk/include/llvm/Support/StandardPasses.h

Also
http://llvm.org/svn/llvm-project/llvm/trunk/include/llvm/Transforms/IPO.h
http://llvm.org/svn/llvm-project/llvm/trunk/lib/Analysis/IPA

I doubt they'll be able to figure out everything on their own though,
the frontend should probably add some extra info. For example
by adding some function attributes, or some metadata about the
high-level types.
I see that LLVM has a metadata based TBAA, but right now it is for
C/C++, so I'm not sure it could take advantage of all the properties of
a functional language.
It'd probably be better if an OCamlAA was written (perhaps building on
the already existing TBAA, or improving that) that takes advantage of
immutable data structures and OCaml's type system.

Best regards,
--Edwin

Philippe Strauss

unread,

Dec 3, 2010, 9:32:49 AM12/3/10

to caml...@yquem.inria.fr

I'm totally noob on compilers internals, but if the processing of float arrays can be improved a lot by a LLVM ocamlopt, I would use it exclusively.

Jon Harrop

unread,

Dec 3, 2010, 3:07:50 PM12/3/10

to Benedikt Meurer, caml...@yquem.inria.fr

Benedikt wrote:
> So, let's take that LLVM thing to a somewhat more serious level (which
> means no more arguing with toy projects that simply ignore the
> difficult stuff).

You can learn a lot from prototypes.

> 3. A better LLVM. If we do it right, other's implementing functional
> languages using LLVM will not be faced with the same problems again.
> Dunno if that's possible.

What exactly are you thinking of that goes beyond what the other functional
languages that already target LLVM have already had to implement?

Cheers,
Jon.

Jon Harrop

unread,

Dec 3, 2010, 4:16:55 PM12/3/10

to Till Varoquaux, caml...@yquem.inria.fr

Till wrote:
> You seem to think LLVM wouldn't actually buy us much in term of
> optimisations. In my experience the current ocaml compiler is really
> good when writing code fairly low level but discourages use of
> combinator library, higher order functions, functors in performance
> sensitive code (i.e. you have to do inlining, specialization, constant
> propaagation etc... by hand).
>
> I was under the impression that some of LLVM passes could be a good
> match for those problems. That is: micro benchmark code that is
> written carefully with those constraints in mind wouldn't gain much
> but some form of "origami" programming could be unfolded by the
> compiler. Am I missing something obvious? (e.g. need for better side
> effect analysis).

I doubt many of LLVM's optimization passes would kick in if the type information has been removed by boxing and casting all the pointers to i8*.

Cheers,
Jon.

Jon Harrop

unread,

Dec 3, 2010, 4:23:13 PM12/3/10

to Philippe Strauss, caml...@yquem.inria.fr

Philippe wrote:
> I'm totally noob on compilers internals, but if the processing of float
> arrays can be improved a lot by a LLVM ocamlopt, I would use it
> exclusively.

LLVM's x86 code gen will generate more efficient floating point code than
ocamlopt if and only if the type information is available to it. With
OCaml's current design, that is unlikely and you'll still have things like
the 16Mb limit.

What algorithms are you running?

Cheers,
Jon.

Jon Harrop

unread,

Dec 3, 2010, 4:35:35 PM12/3/10

to Michael Ekstrand, caml...@yquem.inria.fr

Michael:
> We would also get new optimizations developed by the compiler backend
> and computer architecture researchers working on LLVM for free. I see
> that as one of the major benefits - it lets the OCaml community harness
> that work (and share it with other languages) and focus our energies on
> high-level language features and language-specific optimizations.
>
> Not sure if the benefit is worth the cost of doing it, though.

I'm not even sure there would be any benefit. More optimization passes to
play with != more performance.

Cheers,
Jon.

Philippe Strauss

unread,

Dec 3, 2010, 4:45:44 PM12/3/10

to Jon Harrop, caml...@yquem.inria.fr

mostly fftw using christophe troessler binding and a kind of plain convolution, actully very comparable to plain convolution 100% in caml (not using any c binding).

plus minor things like binary search, vector multiplications etc. (audio DSP).

Benedikt Meurer

unread,

Dec 5, 2010, 11:37:42 AM12/5/10

to caml...@yquem.inria.fr

On Dec 3, 2010, at 21:07 , Jon Harrop wrote:

> Benedikt wrote:
>> So, let's take that LLVM thing to a somewhat more serious level (which
>> means no more arguing with toy projects that simply ignore the
>> difficult stuff).
>
> You can learn a lot from prototypes.

Indeed, but you need stay on topic. HLVM does not prototype the use of LLVM within OCaml, because you're using a quite different data representation.

Using a different data representation within OCaml would indeed be possible, but one has to ask what it's worth? You'd get better floating point performance (most likely) and possibly also gain from a few LLVM optimization passes. But on the other hand:

1. You will have to rewrite not only the compiler, the standard library and the bytecode interpreter (already a massive amount of work), but you also loose compatibility with almost every existing OCaml library binding, since the native function interface will be different. That's a hell of a lot of work for many people.

2. The current data representation is already close to optimal for symbolic computation, so there's is not much to gain here. The most complex applications use OCaml for symbolic computation (i.e. Coq), so they will not benefit (if at all, see 3.).

3. The current garbage collector is mostly straight-forward because of the data representation. No need to carry around/load type information, you just need the following bits of information: Is it a block in the heap? Does it contain pointers? If so how many? All this information is immediately available with the current data representation (also improves cache locality of the GC loops). Even if the current GC is replaced with something more recent (which I'm sure is worth it), the new GC will also be easy to implement because of the data representation.

4. Marshalling/unmarshalling will also be touched (and probably become more complex), loosing even more compatibility (less of an issue than the FFI, but nevertheless something to take care of).

So, it is really worth to spend years on a new data representation (including fixing all C bindings, etc.)? Just to get better floating point performance? Integer performance will be almost the same, avoiding the shift/tag bit just replaces an "addq r1, r2; subq $1, r2" with "addq r1, r2"; even doing this thousands of times will not cause a noticeable difference.

I already mentioned a few simple ways to improve floating point performance w/o the need to throw away most of the code base. I.e. just compile (tail-)recursive floating point functions in a way that avoids the boxing for the (tail-)calls. If this seems too complex, you could use an even simpler trick: Just compile every floating point function twice, one "generic version", which accepts "value"s (for use in closures, on parameter position, etc.), and one "float version" which takes non-boxed doubles. Calling the appropriate version is trivial, just pass the type information down to the Cmm level. That way everything will continue to work, no need to mess up everything. And you can also apply your LLVM optimizations.

>> 3. A better LLVM. If we do it right, other's implementing functional
>> languages using LLVM will not be faced with the same problems again.
>> Dunno if that's possible.
>
> What exactly are you thinking of that goes beyond what the other functional
> languages that already target LLVM have already had to implement?

Two main areas for now: The GC interface and the exception handling. LLVM's exception support is really limited; the GC support is better and more generic. I don't know how to implement the OCaml exception model within LLVM w/o adding a lot of special case stuff to LLVM itself (mentioned in my original post); if there would be a way to easily extend LLVM with special exception models, other projects could plug in their models.

For the GC interface. From what I know right now, the "ocaml" GC plugin generates the frametables and the special symbols. But several details are left to LLVM, which don't seem to be handled properly: For example, there does not seem to be a way to tell LLVM how to spill/reload; in case of OCaml the stack cells marked as gcroots must not contain "untagged integers". I'm still trying to figure out what are the exact lowering mechanisms involved, so it may already be possible - we shall see.

The other thing I noticed so far: LLVM does not seem to eliminate stores to gcroots (probably required for Java, dunno). In case of ocamlopt, this would generate a lot of unnecessary stores. ocamlopt avoids this because it has knowledge not available to LLVM, i.e. when calling a native C floating point primitive (sin, cos, ...), ocamlopt knows that i.e. %r12 is preserved and the primitive does not allocate or raise so no need to spill the value to the stack cell, LLVM does not know this (and there's no way to tell, yet) so it has to spill and reload all (possible) gcroots.

You can think of various similar examples, where ocamlopt generates better code, simply because it "knows" OCaml (and even tho it lacks complex data/control analysis), whereas LLVM is generic (not necessarily a bad thing) but lacks ways to "learn" OCaml (or other language specific stuff). Adding more ways to tell LLVM about certain constraints will also be beneficial to other projects (not only functional programming languages...).

> Cheers,
> Jon.

Benedikt

Benedikt Meurer

unread,

Dec 5, 2010, 11:44:24 AM12/5/10

to caml...@yquem.inria.fr

On Dec 3, 2010, at 14:34 , Till Varoquaux wrote:

> Thanks for the summary.
>
> You seem to think LLVM wouldn't actually buy us much in term of
> optimisations. In my experience the current ocaml compiler is really
> good when writing code fairly low level but discourages use of
> combinator library, higher order functions, functors in performance
> sensitive code (i.e. you have to do inlining, specialization, constant
> propaagation etc... by hand).
>
> I was under the impression that some of LLVM passes could be a good
> match for those problems. That is: micro benchmark code that is
> written carefully with those constraints in mind wouldn't gain much
> but some form of "origami" programming could be unfolded by the
> compiler. Am I missing something obvious? (e.g. need for better side
> effect analysis).

This would be possible, yes.

And the usual special note: No, it doesn't work with floating point stuff (automagically).

> Cheers,
> Till

Török Edwin

unread,

Dec 5, 2010, 11:57:26 AM12/5/10

to Benedikt Meurer, caml...@yquem.inria.fr

On Sun, 5 Dec 2010 17:37:32 +0100
Benedikt Meurer <benedik...@googlemail.com> wrote:

> Two main areas for now: The GC interface and the exception handling.
> LLVM's exception support is really limited; the GC support is better
> and more generic. I don't know how to implement the OCaml exception
> model within LLVM w/o adding a lot of special case stuff to LLVM
> itself (mentioned in my original post); if there would be a way to
> easily extend LLVM with special exception models, other projects
> could plug in their models.

There is a discussion on the LLVM mailing list about changing exception
handling in LLVM:
http://lists.cs.uiuc.edu/pipermail/llvmdev/2010-December/036692.html

If the new model is not generic enough to support OCaml's model, then I
think now would be a good time to describe on the LLVM ML what OCaml's
model would need and the proposal doesn't cover.

Best regards,
--Edwin

Jon Harrop

unread,

Dec 5, 2010, 3:12:52 PM12/5/10

to Benedikt Meurer, caml...@yquem.inria.fr

Benedikt wrote:
> On Dec 3, 2010, at 21:07 , Jon Harrop wrote:
> > Benedikt wrote:
> >> So, let's take that LLVM thing to a somewhat more serious level
> (which
> >> means no more arguing with toy projects that simply ignore the
> >> difficult stuff).
> >
> > You can learn a lot from prototypes.
>
> Indeed, but you need stay on topic. HLVM does not prototype the use of
> LLVM within OCaml, because you're using a quite different data
> representation.

My point was that HLVM's data representation is far more flexible than
ocamlopt's, so you can use it to measure the effect of changing data
representations on the efficiency of the code LLVM generates more easily
than with ocamlopt.

In particular, I am saying that (from my own measurements) LLVM does not
cope with data representations like ocamlopt's at all well. Specifically,
when you box and cast away type information. My guess is that retargeting
ocamlopt to LLVM IR (which is a considerable undertaking due to the amount
of hacking required on LLVM itself) would result in more performance
degradation that improvement. Moreover, that degradation is most likely to
occur on code like Coq because ocamlopt has been so heavily optimized for
that one application domain and, consequently, I doubt your hard work would
ever be accepted into mainstream OCaml.

Ultimately, LLVM was built specifically to exploit a typed intermediate
representation whereas ocamlopt removes type information very early.

> Using a different data representation within OCaml would indeed be
> possible, but one has to ask what it's worth? You'd get better floating
> point performance (most likely)

And faster tuples, ints, chars, complex numbers, low-dimensional
vectors/matrices, hash tables and so on. More types (e.g. int16 and
float32). Even arbitrary-precision arithmetic can benefit by keeping small
numbers unboxed when possible. Bigarrays disappear. The FFI gets simpler,
easier to use and more powerful (e.g. you can generate AoS layouts for
OpenGL). The benefits are enormous in the general case but that is beside
the point here. Moreover, this would never be accepted either because it
would degrade the performance of applications like Coq. If it is done at
all, such work must be kept separate from the OCaml compilers.

> and possibly also gain from a few LLVM optimization passes.

I found that LLVM's optimization passes are of little value in this context.
You want to do the optimization passes on the OCaml side (where it is vastly
easier, if nothing else) and focus on emitting high quality LLVM IR for
LLVM's code gen to work with.

> But on the other hand:
>
> 1. You will have to rewrite not only the compiler, the standard library
> and the bytecode interpreter (already a massive amount of work), but
> you also loose compatibility with almost every existing OCaml library
> binding, since the native function interface will be different. That's
> a hell of a lot of work for many people.

Yes. That would be revolution and not evolution. I cannot see any easy way
around that and also advise against it.

> 2. The current data representation is already close to optimal for
> symbolic computation, so there's is not much to gain here. The most
> complex applications use OCaml for symbolic computation (i.e. Coq), so
> they will not benefit (if at all, see 3.).

Yes.

> 3. The current garbage collector is mostly straight-forward because of
> the data representation. No need to carry around/load type information,

Note that conveying run-time type information also facilitates useful
features like generic printing.

> you just need the following bits of information: Is it a block in the
> heap? Does it contain pointers? If so how many? All this information is
> immediately available with the current data representation (also
> improves cache locality of the GC loops).

That information is also immediately available with HLVM's design. Each
value contains a pointer to its run-time type. Each run-time type contains a
pointer to a "visit" function for the GC that returns the references in any
value of that type.

> Even if the current GC is
> replaced with something more recent (which I'm sure is worth it), the
> new GC will also be easy to implement because of the data
> representation.

I'm not sure that would be any easier than HLVM's design.

> 4. Marshalling/unmarshalling will also be touched (and probably become
> more complex), loosing even more compatibility (less of an issue than
> the FFI, but nevertheless something to take care of).

Yes. I'd be amazed if anyone cared about that though.

> So, it is really worth to spend years on a new data representation
> (including fixing all C bindings, etc.)? Just to get better floating
> point performance? Integer performance will be almost the same,
> avoiding the shift/tag bit just replaces an "addq r1, r2; subq $1, r2"
> with "addq r1, r2"; even doing this thousands of times will not cause a
> noticeable difference.

On the contrary, there are many example of huge performance gains other than
floating point. The int-based random number generator from the SciMark2
benchmark is 6x faster with LLVM than with OCaml. Generic hash tables are
17x faster in F# than Java because of value types. And so on.

> I already mentioned a few simple ways to improve floating point
> performance w/o the need to throw away most of the code base. I.e. just
> compile (tail-)recursive floating point functions in a way that avoids
> the boxing for the (tail-)calls. If this seems too complex, you could
> use an even simpler trick: Just compile every floating point function
> twice, one "generic version", which accepts "value"s (for use in
> closures, on parameter position, etc.), and one "float version" which
> takes non-boxed doubles. Calling the appropriate version is trivial,
> just pass the type information down to the Cmm level. That way
> everything will continue to work, no need to mess up everything. And
> you can also apply your LLVM optimizations.

Yes.

Yes. I suspect the effort required to bring such features up-to-par with
ocamlopt will be pretty huge as well though.

Cheers,
Jon.

Benedikt Meurer

unread,

Dec 5, 2010, 3:54:36 PM12/5/10

to caml...@yquem.inria.fr

On Dec 5, 2010, at 17:57 , Török Edwin wrote:

> On Sun, 5 Dec 2010 17:37:32 +0100
> Benedikt Meurer <benedik...@googlemail.com> wrote:
>
>> Two main areas for now: The GC interface and the exception handling.
>> LLVM's exception support is really limited; the GC support is better
>> and more generic. I don't know how to implement the OCaml exception
>> model within LLVM w/o adding a lot of special case stuff to LLVM
>> itself (mentioned in my original post); if there would be a way to
>> easily extend LLVM with special exception models, other projects
>> could plug in their models.
>
> There is a discussion on the LLVM mailing list about changing exception
> handling in LLVM:
> http://lists.cs.uiuc.edu/pipermail/llvmdev/2010-December/036692.html
>
> If the new model is not generic enough to support OCaml's model, then I
> think now would be a good time to describe on the LLVM ML what OCaml's
> model would need and the proposal doesn't cover.

It's not a different model, but a better implementation of what is already available. From what I've seen now, the best starting point seems to be the setjmp/longjmp model, which is supported by LLVM. It's certainly slower than the current scheme, but well, it would be a starting point.

> Best regards,
> --Edwin

Benedikt

Benedikt Meurer

unread,

Dec 5, 2010, 4:22:13 PM12/5/10

to caml...@yquem.inria.fr

From what I've seen so far, LLVM makes little use of the type information, especially integer and pointer types are treated nearly the same, so LLVM won't benefit from this information. Any other data is still distinguished at the Cmm level (i.e. doubles, strings, ...), which is necessary for the register allocation and instruction selection phase in ocamlopt. You could easily change the intermediate representations in ocaml (lambda, ulambda, cmm) to include additional type information, so this is not the problem. But as said, it won't help a lot with LLVM.

Also looking at the GHC work you mentioned, they also start at the Cmm level (slightly different C--, but comparable to ocamlopt), with mostly the same amount of type information available. And as you said, they did quite well in some cases.

>> Using a different data representation within OCaml would indeed be
>> possible, but one has to ask what it's worth? You'd get better floating
>> point performance (most likely)
>
> And faster tuples, ints, chars, complex numbers, low-dimensional
> vectors/matrices, hash tables and so on. More types (e.g. int16 and
> float32). Even arbitrary-precision arithmetic can benefit by keeping small
> numbers unboxed when possible. Bigarrays disappear. The FFI gets simpler,
> easier to use and more powerful (e.g. you can generate AoS layouts for
> OpenGL). The benefits are enormous in the general case but that is beside
> the point here. Moreover, this would never be accepted either because it
> would degrade the performance of applications like Coq. If it is done at
> all, such work must be kept separate from the OCaml compilers.

ocamlopt already supports float32 and int16, tho they are not exploited to the upper layers (don't know why, but I also don't know why one would need them). Keeping Int32/Int64 values unboxed would be possible similar to what is done with doubles; maybe there is simply no need to do (honestly, I've never needed Int32/Int64 so far, int/Natint is usually what you want and these are optimized).

>> 3. The current garbage collector is mostly straight-forward because of
>> the data representation. No need to carry around/load type information,
>
> Note that conveying run-time type information also facilitates useful
> features like generic printing.

Generic printing is currently implemented using the run-time type information (the three bits mentioned below).

>> you just need the following bits of information: Is it a block in the
>> heap? Does it contain pointers? If so how many? All this information is
>> immediately available with the current data representation (also
>> improves cache locality of the GC loops).
>
> That information is also immediately available with HLVM's design. Each
> value contains a pointer to its run-time type. Each run-time type contains a
> pointer to a "visit" function for the GC that returns the references in any
> value of that type.

This is indeed possible, yes, and also implemented by most Java VMs; each objects ships a type-info pointer. OCaml avoids the type-info pointer using clever header field encoding. A matter of taste probably.

>> So, it is really worth to spend years on a new data representation
>> (including fixing all C bindings, etc.)? Just to get better floating
>> point performance? Integer performance will be almost the same,
>> avoiding the shift/tag bit just replaces an "addq r1, r2; subq $1, r2"
>> with "addq r1, r2"; even doing this thousands of times will not cause a
>> noticeable difference.
>
> On the contrary, there are many example of huge performance gains other than
> floating point. The int-based random number generator from the SciMark2
> benchmark is 6x faster with LLVM than with OCaml. Generic hash tables are
> 17x faster in F# than Java because of value types. And so on.

I doubt that this is really related to the use of "value types", I also doubt that the integer performance improvements are related to the missing "subq" (must have been an unusual CPU then). The Java performance is probably related to the GC in most JVMs, which is not as well optimized for high speed allocation and compaction as the GCs found in most runtimes of functional programming languages. But this is just guessing, we'd need some facts to be sure what's the cause. I suggest you present some proof-of-concept code in C or LLVM, simply using malloc() for memory allocation, comparing a straight-forward implementation to a "value type" implementation (with everything else the same). If there's still a 17x improvement, I promise to rewrite the whole OCaml system during the next three months to support value types.

> Cheers,
> Jon.

greets,
Benedikt

Benedikt Meurer

unread,

Dec 5, 2010, 4:44:32 PM12/5/10

to caml...@yquem.inria.fr

On Dec 5, 2010, at 22:21 , Benedikt Meurer wrote:

> On Dec 5, 2010, at 21:12 , Jon Harrop wrote:
>
>>> Using a different data representation within OCaml would indeed be
>>> possible, but one has to ask what it's worth? You'd get better floating
>>> point performance (most likely)
>>
>> And faster tuples, ints, chars, complex numbers, low-dimensional
>> vectors/matrices, hash tables and so on. More types (e.g. int16 and
>> float32). Even arbitrary-precision arithmetic can benefit by keeping small
>> numbers unboxed when possible. Bigarrays disappear. The FFI gets simpler,
>> easier to use and more powerful (e.g. you can generate AoS layouts for
>> OpenGL). The benefits are enormous in the general case but that is beside
>> the point here. Moreover, this would never be accepted either because it
>> would degrade the performance of applications like Coq. If it is done at
>> all, such work must be kept separate from the OCaml compilers.
>
> ocamlopt already supports float32 and int16, tho they are not exploited to the upper layers (don't know why, but I also don't know why one would need them).

To be exact, you could load/store float32 and int16 values. At least for int16, computations would be less efficient on modern CPUs, since the partial register reads/writes would create false dependencies and therefore result in less efficient instruction scheduling (similar for float32 with SSE2, i.e. "divss" leaves the three higher doublewords of the target register unchanged, thereby introducing false dependencies on previous instructions).

Erik de Castro Lopo

unread,

Dec 5, 2010, 5:34:25 PM12/5/10

to caml...@inria.fr

Benedikt Meurer wrote:

> Using a different data representation within OCaml would indeed be
> possible, but one has to ask what it's worth?

Is it necessary to use a different data representation? What does
that buy you?

> 1. You will have to rewrite not only the compiler,

At least the compiler backend, and the new backend might require
changes in earlier stages earlier of the compiler.

> the standard
> library and the bytecode interpreter (already a massive amount of
> work),

Why?

> but you also loose compatibility with almost every existing OCaml
> library binding, since the native function interface will be different.
> That's a hell of a lot of work for many people.

Again, why?

When the GHC haskell compiler gained an LLVM backend, the LLVM project
accepted changes to allow a new GHC specific calling convention. Couldn't
the same be done for Ocaml?

> 3. The current garbage collector is mostly straight-forward because of

There is no need to use LLVM's GC if something better already exists.

> Two main areas for now: The GC interface and the exception handling.

If the Ocaml versions of these are already better than what LLVM can
provide, here is no reason to use the LLVM versions.

I'm currently working on the LLVM backend to an experimental compiler
called DDC:

http://www.mega-nerd.com/erikd/Blog/CodeHacking/DDC/index.html

DDC had an existing C backend and its own garbage collector. For the
LLVM backend I have kept the C calling convention and the existing
GC and just used LLVM for code generation.

In my experience, LLVM has been a pleasure to work with, but the
backend is not yet far enough advanced for any speed comparisons
between the C and LLVM backend.

Erik
--
----------------------------------------------------------------------
Erik de Castro Lopo
http://www.mega-nerd.com/

_______________________________________________

Erik de Castro Lopo

unread,

Dec 5, 2010, 5:42:06 PM12/5/10

to caml...@inria.fr

Jon Harrop wrote:

> My point was that HLVM's data representation is far more flexible than
> ocamlopt's,

Would you be able to list those differences for us?

> In particular, I am saying that (from my own measurements) LLVM does not
> cope with data representations like ocamlopt's at all well. Specifically,
> when you box and cast away type information.

Yes, thats obviously a mistake when generating typed assembly language
like LLVM.

> Ultimately, LLVM was built specifically to exploit a typed intermediate
> representation whereas ocamlopt removes type information very early.

That suggests that a first pass at adding an LLVM backend would
be to extend the used of typed data representations through to the
backend of the compiler.

> And faster tuples, ints, chars, complex numbers, low-dimensional
> vectors/matrices, hash tables and so on. More types (e.g. int16 and
> float32).

So specifically, you keep much more data in unboxed form?

Erik
--
----------------------------------------------------------------------
Erik de Castro Lopo
http://www.mega-nerd.com/

_______________________________________________

Benedikt Meurer

unread,

Dec 6, 2010, 3:27:53 AM12/6/10

to caml...@yquem.inria.fr

On Dec 5, 2010, at 23:34 , Erik de Castro Lopo wrote:

> Benedikt Meurer wrote:
>
>> Using a different data representation within OCaml would indeed be
>> possible, but one has to ask what it's worth?
>
> Is it necessary to use a different data representation? What does
> that buy you?

Read my mail again please, it was all about the data representation issues.

>> the standard
>> library and the bytecode interpreter (already a massive amount of
>> work),
>
> Why?

Because you want to keep the bytecode interpreter in sync, otherwise you can no longer share any piece of C code between ocamlopt and ocaml/ocamlrun.

>> but you also loose compatibility with almost every existing OCaml
>> library binding, since the native function interface will be different.
>> That's a hell of a lot of work for many people.
>
> Again, why?
>
> When the GHC haskell compiler gained an LLVM backend, the LLVM project
> accepted changes to allow a new GHC specific calling convention. Couldn't
> the same be done for Ocaml?

This is not about calling conventions, and also not about LLVM. This is about data representation.

>> 3. The current garbage collector is mostly straight-forward because of
>
> There is no need to use LLVM's GC if something better already exists.

LLVM does not provide a GC.

>> Two main areas for now: The GC interface and the exception handling.
>
> If the Ocaml versions of these are already better than what LLVM can
> provide, here is no reason to use the LLVM versions.

It isn't as simple. With LLVM you are no longer in control of the code generation. The "ocaml versions" use special registers and stack layout, which will not work with LLVM's code generation unless LLVM is patched appropriately.

> Erik

Benedikt

David Rajchenbach-Teller

unread,

Dec 6, 2010, 4:29:09 AM12/6/10

to Benedikt Meurer, caml...@yquem.inria.fr

>
>>> 3. The current garbage collector is mostly straight-forward because of
>>
>> There is no need to use LLVM's GC if something better already exists.
>
> LLVM does not provide a GC.

True. However, if I recall correctly, it provides some facilities to use OCaml's own GC [1]

[1] http://llvm.org/releases/2.5/docs/GarbageCollection.html#ocaml .

Cheers,
David

Richard Jones

unread,

Dec 6, 2010, 6:08:33 AM12/6/10

to Benedikt Meurer, caml...@yquem.inria.fr

On Sun, Dec 05, 2010 at 05:37:32PM +0100, Benedikt Meurer wrote:
> 1. You will have to rewrite not only the compiler, the standard
> library and the bytecode interpreter (already a massive amount of
> work), but you also loose compatibility with almost every existing
> OCaml library binding, since the native function interface will be
> different. That's a hell of a lot of work for many people.

It might kick-start efforts to automatically generate bindings, at
least to C code, which wouldn't be a bad thing. camlidl is clumsy and
doesn't handle a very important case (pointers to handles) so it's not
really useful for this.

I do agree with the rest of your points though, and it's good to have
intelligent discussion of the real issues at long last.

Rich.

--
Richard Jones
Red Hat

Jon Harrop

unread,

Dec 6, 2010, 3:18:42 PM12/6/10

to Richard Jones, Benedikt Meurer, caml...@yquem.inria.fr

Richard Jones wrote:
> It might kick-start efforts to automatically generate bindings, at
> least to C code, which wouldn't be a bad thing. camlidl is clumsy and
> doesn't handle a very important case (pointers to handles) so it's not
> really useful for this.

Yes, LLVM is ideal for this. Using LLVM to JIT compile OCaml<->C interop
stubs should be quite easy. I did a bit of this in some of the OCaml Journal
article that were on LLVM.

You could also kill two birds with one stone by creating an OCaml<->HLVM
bridge with it. That would make it easier to create a bigarray from OCaml
and call into high-performance parallel numerical code to chomp on it.

Cheers,
Jon.

Jon Harrop

unread,

Dec 6, 2010, 5:39:24 PM12/6/10

to Benedikt Meurer, caml...@yquem.inria.fr

Benedikt wrote:
> Also looking at the GHC work you mentioned, they also start at the Cmm
> level (slightly different C--, but comparable to ocamlopt), with mostly
> the same amount of type information available. And as you said, they
> did quite well in some cases.

Yes. They did well in benchmarks like hailstone:

collatzLen :: Int -> Word32 -> Int
collatzLen c 1 = c
collatzLen c n = collatzLen (c+1) $ if n `mod` 2 == 0 then n `div` 2 else
3*n+1
pmax x n = x `max` (collatzLen 1 n, n)
main = print $ foldl pmax (1,1) [2..1000000]

Because they unbox every int and tuple to recover C-like performance, just
as HLVM does thanks to value types and OCaml cannot and does not. So I would
not presume that OCaml+LLVM would be as effective as GHC+LLVM.

> > And faster tuples, ints, chars, complex numbers, low-dimensional
> > vectors/matrices, hash tables and so on. More types (e.g. int16 and
> > float32). Even arbitrary-precision arithmetic can benefit by keeping
> small
> > numbers unboxed when possible. Bigarrays disappear. The FFI gets
> simpler,
> > easier to use and more powerful (e.g. you can generate AoS layouts
> for
> > OpenGL). The benefits are enormous in the general case but that is
> beside
> > the point here. Moreover, this would never be accepted either because
> it
> > would degrade the performance of applications like Coq. If it is done
> at
> > all, such work must be kept separate from the OCaml compilers.
>
> ocamlopt already supports float32 and int16, tho they are not exploited
> to the upper layers

But there is no data representation for them in the heap? Could OCaml's GC
handle a boxed float32 or a float32 array without support for that in the
lower layers?

> (don't know why, but I also don't know why one would need them).

The float32 type is used primarily to halve memory requirements and improve
locality but also to facilitate SSE. Vector arithmetic with LLVM can be well
over 4x faster than OCaml thanks to SSE, of course.

> Keeping Int32/Int64 values unboxed would be possible
> similar to what is done with doubles; maybe there is simply no need to
> do (honestly, I've never needed Int32/Int64 so far, int/Natint is
> usually what you want and these are optimized).

Regular int types are useful for applications like pseudo-random number
generators, image processing and signal analysis.

> >> 3. The current garbage collector is mostly straight-forward because
> of
> >> the data representation. No need to carry around/load type
> information,
> >
> > Note that conveying run-time type information also facilitates useful
> > features like generic printing.
>
> Generic printing is currently implemented using the run-time type
> information (the three bits mentioned below).

I'm not sure what you mean by this. OCaml cannot express generic printing,
by which I mean the ability to print a value of an arbitrary type at
run-time and have it pretty printed appropriately according to its type.

For example, the F# code:

printf "%A" [1;2;3]

prints "[1;2;3]".

Similarly in HLVM:

# print(create(3, 1));;
[|1; 1; 1|]

OCaml's top level can pretty print only because it knows the type of the
expression. Ordinary compiled user code has no such information because the
OCaml only retains partial type information in its data representation so
you cannot do this in OCaml.

> >> you just need the following bits of information: Is it a block in
> the
> >> heap? Does it contain pointers? If so how many? All this information
> is
> >> immediately available with the current data representation (also
> >> improves cache locality of the GC loops).
> >
> > That information is also immediately available with HLVM's design.
> Each
> > value contains a pointer to its run-time type. Each run-time type
> contains a
> > pointer to a "visit" function for the GC that returns the references
> in any
> > value of that type.
>
> This is indeed possible, yes, and also implemented by most Java VMs;
> each objects ships a type-info pointer. OCaml avoids the type-info
> pointer using clever header field encoding. A matter of taste probably.

Beyond taste, it facilitates useful features like generic printing and
reflection.

> >> So, it is really worth to spend years on a new data representation
> >> (including fixing all C bindings, etc.)? Just to get better floating
> >> point performance? Integer performance will be almost the same,
> >> avoiding the shift/tag bit just replaces an "addq r1, r2; subq $1,
> r2"
> >> with "addq r1, r2"; even doing this thousands of times will not
> cause a
> >> noticeable difference.
> >
> > On the contrary, there are many example of huge performance gains
> other than
> > floating point. The int-based random number generator from the
> SciMark2
> > benchmark is 6x faster with LLVM than with OCaml. Generic hash tables
> are
> > 17x faster in F# than Java because of value types. And so on.
>
> I doubt that this is really related to the use of "value types", I also
> doubt that the integer performance improvements are related to the
> missing "subq" (must have been an unusual CPU then).

The "subq" is irrelevant, yes. OCaml's poor performance is due to
unnecessary boxing. Value types solve that problem in the general case, i.e.
in the heap as well as on the stack.

> The Java performance is probably related to the GC in most JVMs,

Java's performance is also dire because of boxing. Specifically, the
key-value pairs and often the keys and values themselves. Firstly, that is
due to type erasure instead of reified generics. Secondly, even if you wrote
a type-specialized version the key-value pair still cannot be unboxed
without value types. Thirdly, these limitations of the VM lead to the use of
suboptimal concrete data structures. Specifically, open hashing with chains
instead of closed hashing.

> which is not as
> well optimized for high speed allocation and compaction as the GCs
> found in most runtimes of functional programming languages. But this is
> just guessing, we'd need some facts to be sure what's the cause. I
> suggest you present some proof-of-concept code in C or LLVM, simply
> using malloc() for memory allocation, comparing a straight-forward
> implementation to a "value type" implementation (with everything else
> the same). If there's still a 17x improvement, I promise to rewrite the
> whole OCaml system during the next three months to support value types.

Again, this is most easily studied using HLVM. A similar program using value
types in HLVM that fills an array of unboxed pairs of 32-bit ints looks like
this and takes 0.39s to run:

let rec fill((a, i0, i2) : (int * int) array * int * int) : unit =
let di = i2 - i0 in
if di=0 then () else
if di=1 then
a.[i0] <- (i0, i2)
else
( let i1 = i0 + di/2 in
( fill(a, i0, i1);
fill(a, i1, i2) ) );;

let n = 10000000 in fill(create(n, (0, 0)), 0, n);;

The following version using a data representation similar to that of OCaml
and Java by boxing the key-value pair that is allocated on the heap using
"malloc" and, consequently, this version takes 2.37s to run:

type keyValue = KeyValue of int * int;;

let rec fill((a, i0, i2) : keyValue array * int * int) : unit =
let di = i2 - i0 in
if di=0 then () else
if di=1 then
a.[i0] <- (KeyValue(i0, i2))
else
( let i1 = i0 + di/2 in
( fill(a, i0, i1);
fill(a, i1, i2) ) );;

let n = 10000000 in fill(create(n, KeyValue(0, 0)), 0, n);;

So that is already 6в slower and the problem gets worse when you box the
keys and values themselves within the boxed pair.

You might try to blame the poor performance on malloc but OCaml is over 12в
slower with the following similar operation (that I had to do because of its
16Mb limit):

Array.init 1000 (fun i -> Array.init 10000 (fun j ->
let k = i*1000+j in Int32.of_int k, Int32.of_int k));;

Generational garbage collection further exacerbates the problem by
unnecessarily copying every allocated value from the young to old
generation. Remembered sets also further exacerbate the problem by having
the write barrier record a pointer at every write. Hence the huge
performance discrepancies between OCaml/Java and HLVM/.NET in this context.

Cheers,
Jon.