[Caml-list] More cores

Mikkel Fahnøe Jørgensen

unread,

Dec 19, 2008, 8:06:24 AM12/19/08

to OCaml

Is it time to start rethinking concurrency in OCaml?

I have followed the argumentation of only using one native thread for
the OCaml runtime.
I can easily see how this can increase performance and simplify implementation.
I can also see that spawning new processes makes sense, so you get a
local heap for each task.

However, as we move forward it seems that we will get more than a few
cores on the same computational node according to the following
article:

Intel says to prepare for 'thousands of cores':
http://news.cnet.com/8301-13924_3-9981760-64.html?part=rss&subj=news&tag=2547-1_3-0-20

As I see it, it is not feasible to spawn a new process with a local
heap for each core, when the number of cores increases dramatically.

I am not sure that a parallel GC is a sufficient solution either due
to the high contention on memory, at least unless it provide some
additional core affinity features. I believe some level of compiler
support is needed in the not so distant future such that enough
primitives are available to build powerful multi-core aware libraries.
One approach could be micro heaps with core affinity and handle
mutable memory specially.

Regards,
Mikkel

_______________________________________________
Caml-list mailing list. Subscription management:
http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
Archives: http://caml.inria.fr
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs

Dario Teixeira

unread,

Dec 19, 2008, 9:04:36 AM12/19/08

to OCaml, Mikkel Fahnøe Jørgensen

Hi,

> Is it time to start rethinking concurrency in OCaml?

> (...)

That's a recurrent topic in this list. I'll summarise the arguments
and save us all some time:

i) Someone raises the issue that it's time for Ocaml to go multicore.

ii) A few voices go "yeah!" and state that with a concurrent GC,
Ocaml can take over the world. Their argument is as follows:

1) Ocaml gets a concurrent GC;
2) ...
3) Profit!

iii) A voice of reason notes that the devil is in step 2) above.
If your programming paradigm for concurrency is Threads+Mutexes,
then you are exposing yourself to a world of pain in race
conditions. At this point someone usually links to Xavier's
standard speech on concurrency (which is a few years old, but
as poignant now as it was then).

iv) The voices from step ii) retort that they're über-programmers
and that they can make perfectly scalable and race condition-free
programmes with Threads+Mutexes if only they had a concurrent GC.

v) The voice of reason remarks that one of the main reasons we all chose
Ocaml is because the language ensures safe, resilient programmes.
In a way it's the humble recognition that we are human, and that
we make mistakes. Choosing the Threads+Mutexes path would be
hubris and an abandonment of fundamental reasons why we chose
Ocaml in the first place.

While I tend to agree with viewpoints iii) and v), I do think a healthy
discussion on the subject of multicore is in order. Though I suggest
we focus the discussion on concurrency models that will allow us to
take advantage of those multicores in a safe, sane manner:

a) Could Jocaml be the future of Ocaml?

b) How do we handle the global mutability issue?

Best regards,
Dario Teixeira

Alexy Khrabrov

unread,

Dec 19, 2008, 10:07:17 AM12/19/08

to Dario Teixeira, OCaml

Well, it's fun to join the old discussion in the new times. The fact
that computers go multicore at a greater scale makes it recurrent.

Erlang makes concurrency easy due to purity, and OCaml is famous for
being eclectic. Why not embrace Erlang's model by imposing
limitations on what can be in threads -- keeping them pure? Erlang
model works and attracts people to functional programming in general.
Even if some other model of concurrency prevails, it is interesting
and useful to interop with Erlang easily. Here's what Erlang folks
have started:

http://code.google.com/p/erlocaml/

Doing this properly can solve a lot of needs out there, and bring lots
of debugged, proven, high-quality concurrent server and communication
code within reach.

Cheers,
Alexy

Dario Teixeira

unread,

Dec 19, 2008, 10:54:35 AM12/19/08

to Alexy Khrabrov, OCaml

Hi,

> Erlang makes concurrency easy due to purity, and OCaml is
> famous for being eclectic. Why not embrace Erlang's
> model by imposing limitations on what can be in threads --
> keeping them pure?

Indeed. Ocaml's imperative features can be very useful in a local context
(ie, not crossing the boundaries of a function), but do hinder concurrency
when used for modifying global state. I wonder: is there already a tool
that verifies Ocaml code to ensure that a function is pure? Something
along the lines of Ocamlexc, but applied to purity.

The way I see it, there are (at least) three axis associated with a
function: the value axis, the exception axis, and the purity axis (the
latter can also be called the side-effect axis). In Ocaml, a function
signature cares only about the value axis, though implicitly the other
axis are also be present. For example:

val integer_division: int -> int -> int
throws Division_by_zero
is_pure

val incr_global_counter: unit -> unit
throws_nothing
is_not_pure

I can imagine a "spawn" statement in a concurrent Caml that expects
that the function passed as parameter be pure. And of course a function
would only be pure if it did not modify global state and only invoked
pure functions itself.

Obviously the other alternative is to go the Haskell route and transform
Caml in a pure functional language where monads reign. Though I think it
would be interesting to come up with a solution that keeps the advantages
of local imperative features, but does ensure purity for the purpose of
concurrency.

(and I'm sure there is already a ton of research along these lines)

Cheers,
Dario Teixeira

Paolo Donadeo

unread,

Dec 19, 2008, 11:27:04 AM12/19/08

to OCaml mailing list

> I can imagine a "spawn" statement in a concurrent Caml that expects
> that the function passed as parameter be pure.

This is, IMO, much more interesting than a concurrent garbage
collector (no flame, please!).

--
Paolo
~
~
:wq

Dario Teixeira

unread,

Dec 19, 2008, 12:01:41 PM12/19/08

to Alexy Khrabrov, OCaml

Hi again,

> I can imagine a "spawn" statement in a concurrent Caml that expects
> that the function passed as parameter be pure. And of course a function
> would only be pure if it did not modify global state and only invoked
> pure functions itself.

Incidentally, it is of course possible for a function to invoke impure
functions while still being pure itself (ie, it ensures the impurity
does not "leak out"). One question to those more familiar with current
language research: any recommended resources out there about this topic?

Christophe Raffalli

unread,

Dec 19, 2008, 12:57:44 PM12/19/08

to OCaml

Dear list members,

> Incidentally, it is of course possible for a function to invoke impure
> functions while still being pure itself (ie, it ensures the impurity
> does not "leak out"). One question to those more familiar with current
> language research: any recommended resources out there about this topic?
>
> Cheers,
> Dario Teixeira
>
>
>

You can easily infer purity (meaning no affectation) of an ocaml
expression using the same algorithm
than for exception (numerous papers on that subkect) ... just tag ":="
and "<-" as being able to raise the fake exception "#Affect" which
can not be catch by any try (because its name starts with a "#") ...

Cheers,
Christophe

Ulf Wiger (TN/EAB)

unread,

Dec 19, 2008, 1:51:11 PM12/19/08

to Alexy Khrabrov, OCaml

Alexy Khrabrov skrev:

> Well, it's fun to join the old discussion in the new times. The fact
> that computers go multicore at a greater scale makes it recurrent.
>
> Erlang makes concurrency easy due to purity, and OCaml is famous
> for being eclectic. Why not embrace Erlang's model by imposing
> limitations on what can be in threads -- keeping them pure?

Erlang processes are not pure, but they do have their own
memory heap, making it possible to do stop-and-copy GC per
process. I think the share-nothing model is more important
than purity.

There are some functions that allow Erlang processes to update
"global state" (e.g. the ETS hash and ordered_set tables), but
they are ensured to be atomic by the runtime system. You could
model each of these functions using normal erlang processes.

BR,
Ulf W

Richard Jones

unread,

Dec 19, 2008, 2:10:17 PM12/19/08

to Dario Teixeira, OCaml

Also worth reading is Ulrich Drepper's series on current and future
system architectures. I've highlighted the important parts of this
long series below, but you can find the complete list of links at the
end of part 1.

Part 1:
http://lwn.net/Articles/250967/

Part 2, on CPU caches:
http://lwn.net/Articles/252125/

Part 4, on NUMA:
http://lwn.net/Articles/254445/

Part 8, on future technologies:
http://lwn.net/Articles/258154/

Uli has recently been advocating languages like OCaml (+ fork, numactl
and virtualization obviously) for future architectures which will
involve massive numbers of cores and very long paths to main memory.

Rich.

--
Richard Jones
Red Hat

Oliver Bandel

unread,

Dec 19, 2008, 3:37:42 PM12/19/08

to OCaml

Zitat von Mikkel Fahnøe Jørgensen <mik...@dvide.com>:

[...]

> I am not sure that a parallel GC is a sufficient solution either due
> to the high contention on memory, at least unless it provide some
> additional core affinity features. I believe some level of compiler
> support is needed in the not so distant future such that enough
> primitives are available to build powerful multi-core aware
> libraries.
> One approach could be micro heaps with core affinity and handle
> mutable memory specially.

[...]

Not especially for multicore, but for parallel programming,
this might be of interest:

http://camlp3l.inria.fr/eng.htm

(To mention this by me also is recurrent, as the thread we are in...)

Ciao,
Oliver

P.S.: During the last multicore discussion, I found that link,
but had not tried OCamlp3l. Now I think I will have more
time and motivation and it could be compiled and installed
without any problems with OCaml 3.10.2.

Richard Jones

unread,

Dec 19, 2008, 4:27:59 PM12/19/08

to Oliver Bandel, OCaml

On Fri, Dec 19, 2008 at 09:37:32PM +0100, Oliver Bandel wrote:
[...]

> P.S.: During the last multicore discussion, I found that link,
> but had not tried OCamlp3l. Now I think I will have more
> time and motivation and it could be compiled and installed
> without any problems with OCaml 3.10.2.

Has anyone tried it with 3.11?

I had an idea to try out some fork-based OCaml programming to exploit
the 4 & 8 core machines we have here, but maybe can try this instead.

Rich.

--
Richard Jones
Red Hat

_______________________________________________

Hezekiah M. Carty

unread,

Dec 19, 2008, 5:03:41 PM12/19/08

to Richard Jones, Oliver Bandel, OCaml

On Fri, Dec 19, 2008 at 4:27 PM, Richard Jones <ri...@annexia.org> wrote:
> On Fri, Dec 19, 2008 at 09:37:32PM +0100, Oliver Bandel wrote:
> [...]
>> P.S.: During the last multicore discussion, I found that link,
>> but had not tried OCamlp3l. Now I think I will have more
>> time and motivation and it could be compiled and installed
>> without any problems with OCaml 3.10.2.
>
> Has anyone tried it with 3.11?
>
> I had an idea to try out some fork-based OCaml programming to exploit
> the 4 & 8 core machines we have here, but maybe can try this instead.

The prelude.ml project has some fork-based parallel functions for
lists, arrays, strings and bigarrays:

http://github.com/kig/preludeml/tree/master/prelude.ml

While I have not tried OCamlp3l on 3.11 yet, my guess is that it would
work. It is a pure-OCaml set of libraries along with some helper
scripts/programs and as far as I know there is not any camlp4
involved. After speaking with the authors, the package does seem to
be more focused on distributed computing than local parallelism. It
is still possible to use it for local parallelism though. OCamlp3l is
currently going through a rewrite as Camlp3l though the restructuring
is not complete at this point. CVS repositories for both are here --
http://camlcvs.inria.fr/cgi-bin/cvsweb/bazar-ocaml/

Please let us know how it goes if you do try one or both of these out.

Hez

Jon Harrop

unread,

Dec 19, 2008, 5:28:49 PM12/19/08

to caml...@yquem.inria.fr

On Friday 19 December 2008 14:04:22 Dario Teixeira wrote:
> Hi,
>
> > Is it time to start rethinking concurrency in OCaml?
> > (...)
>
> That's a recurrent topic in this list. I'll summarise the arguments
> and save us all some time:
>
> i) Someone raises the issue that it's time for Ocaml to go multicore.
>
> ii) A few voices go "yeah!" and state that with a concurrent GC,
> Ocaml can take over the world. Their argument is as follows:
>
> 1) Ocaml gets a concurrent GC;
> 2) ...
> 3) Profit!

Or:

ii) People used to choose OCaml because it was fast but lack of support for
multicores means that OCaml is no longer competitively performant for many
tasks on today's computers.

> iii) A voice of reason notes that the devil is in step 2) above.
> If your programming paradigm for concurrency is Threads+Mutexes,
> then you are exposing yourself to a world of pain in race
> conditions. At this point someone usually links to Xavier's
> standard speech on concurrency (which is a few years old, but
> as poignant now as it was then).

iii) Your "voice of reason" also said that multicore computers "have never
really taken off, at least in the general public":

http://caml.inria.fr/pub/ml-archives/caml-list/2002/11/64c14acb90cb14bedb2cacb73338fb15.en.html

> ...

> While I tend to agree with viewpoints iii) and v), I do think a healthy
> discussion on the subject of multicore is in order. Though I suggest we
> focus the discussion on concurrency models that will allow us to take
> advantage of those multicores in a safe, sane manner:

At this point, you need to distinguish between concurrency and parallelism.

OCaml can already handle concurrency reasonably well, particularly when CPU
usage is low. Perhaps OCaml has a future in that specific domain but it is
not my remit so I am not the best person to comment on this (ask Gerd).

OCaml is extremely bad at parallelism, even embarassingly parallel tasks
because OCaml still has to marshall all the data in a gather operation or
resort to manual memory management. However, the problems are entirely with
the current implementation and not with the language at all. So a new
implementation could address this problem.

> a) Could Jocaml be the future of Ocaml?

For concurrency perhaps but not for parallelism.

> b) How do we handle the global mutability issue?

Avoid mutation when it is likely to get out of control (e.g. Erlang-style
concurrent applications) but keep it available because it gives huge
performance improvements in many parallel applications. F# already has good
solutions in both cases so we can just copy them.

I spent a lot of time thinking about the future of the ML family of languages
in the open source world recently. I believe I have an another way forward
that some people might find interesting. I think ML has proven its worth and
it is time to begin developing a completely new open source ML
implementation. There are various different ways to accomplish this and the
goals are quite obvious (IMHO). However, finding a route that is likely to
lead to those goals that is socially acceptable (i.e. not a fork) and
technically feasible (i.e. incrementally adding useful features and garnering
users) was not easy. I believe the best course of action is to develop an
embedded DSL for OCaml (e.g. for higher-performance numerics) and evolve it
into a completely new ML implementation.

I have actually already started this using the excellent LLVM project and I
just obtained the first promising results yesterday: a simple benchmark where
my compiler runs OCaml code 3x faster than ocamlopt does.

There are a *lot* of low hanging fruit here. So I think it should be
relatively easy to create a tool that is useful for a significant number of
people. The most interesting aspect is that developing this project as a DLL
that is entirely accessible from within OCaml makes it possible to address
many of OCaml's own problems for existing OCaml programmers.

If anyone has any ideas about this I'd love to hear them.

--
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e

Erik de Castro Lopo

unread,

Dec 19, 2008, 5:36:56 PM12/19/08

to caml...@inria.fr

Jon Harrop wrote:

> I have actually already started this using the excellent LLVM project and I
> just obtained the first promising results yesterday: a simple benchmark where
> my compiler runs OCaml code 3x faster than ocamlopt does.

Is this going to be an open source project? Is this (or is this going
to be) developed in the open with the sources in a publicly available
revision control system?

Erik
--
-----------------------------------------------------------------
Erik de Castro Lopo
-----------------------------------------------------------------
The idea that Bill Gates has appeared like a knight in shining armour to
lead all customers out of a mire of technological chaos neatly ignores
the fact that it was he who, by peddling second-rate technology, led them
into it in the first place. - Douglas Adams in Guardian, 25-Aug-95

Richard Jones

unread,

Dec 19, 2008, 5:42:52 PM12/19/08

to Jon Harrop, caml...@yquem.inria.fr

On Fri, Dec 19, 2008 at 10:31:33PM +0000, Jon Harrop wrote:
> OCaml is extremely bad at parallelism, even embarassingly parallel tasks
> because OCaml still has to marshall all the data in a gather operation or
> resort to manual memory management.

Merjis implemented the Ancient module to handle this case, and we
shared the code with an open source license so everyone can benefit:

http://merjis.com/developers/ancient

Rich.

--
Richard Jones
Red Hat

_______________________________________________

Richard Jones

unread,

Dec 19, 2008, 5:48:04 PM12/19/08

to Hezekiah M. Carty, Oliver Bandel, OCaml

On Fri, Dec 19, 2008 at 05:03:30PM -0500, Hezekiah M. Carty wrote:
> The prelude.ml project has some fork-based parallel functions for
> lists, arrays, strings and bigarrays:
>
> http://github.com/kig/preludeml/tree/master/prelude.ml

Ace! Any good documentation though? I found a few things in Google,
eg: http://fhtr.blogspot.com/2008/09/preludeml-more-multicore-mandelbrot.html
http://fhtr.blogspot.com/2008/08/preludeml-some-parallel-combinators.html
But I guess some proper introductory documentation would be helpful.

Rich.

--
Richard Jones
Red Hat

_______________________________________________

Jon Harrop

unread,

Dec 19, 2008, 5:50:28 PM12/19/08

to caml...@inria.fr

On Friday 19 December 2008 22:36:40 Erik de Castro Lopo wrote:
> Jon Harrop wrote:
> > I have actually already started this using the excellent LLVM project and
> > I just obtained the first promising results yesterday: a simple benchmark
> > where my compiler runs OCaml code 3x faster than ocamlopt does.
>
> Is this going to be an open source project?

Yes.

> Is this (or is this going to be) developed in the open with the sources in a
> publicly available revision control system?

Yes. I will, most likely, work on this as a fun project for as long as
possible in an attempt to build something vaguely inspiring before I put the
code up somewhere (e.g. OCamlForge). I will be more than happy to give
contributors commit access to the source because I am very busy with
unrelated projects. If anyone is desperately keen to have a play then I'll
consider putting it up immediately.

The only thing that some might object to is that I would like to make this a
commerce-friendly platform if possible, i.e. facilitate the buying and
selling of libraries built upon this platform. I don't see that this desire
has any negative effects but I believe the presence of a commercial platform
built around such a foundation would help to provide much-needed resources to
fund further work.

I should also mention that several other people are working on similar things,
also building upon LLVM. Someone has even tried to contribute an ML
implementation to LLVM itself as a configuration tool:

http://github.com/pcwalton/llvm-nw/tree/miniml/utils/TableML/Core

--
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e

_______________________________________________

Alexy Khrabrov

unread,

Dec 19, 2008, 6:00:25 PM12/19/08

to Richard Jones, Oliver Bandel, OCaml

I've used prelude.ml to parallelize my system, and it works fine on
Mac OSX. Ilmari has graciously worked with me to add versions of pmap
called pmap_init to initialize per-process file descriptors, and
pmap_initWithIndex to let the pieces know their IDs -- hopefully they
will be rolled into the trunk once it's refactored.

On Dec 19, 2008, at 5:47 PM, Richard Jones wrote:

> On Fri, Dec 19, 2008 at 05:03:30PM -0500, Hezekiah M. Carty wrote:
>> The prelude.ml project has some fork-based parallel functions for
>> lists, arrays, strings and bigarrays:
>>
>> http://github.com/kig/preludeml/tree/master/prelude.ml
>
> Ace! Any good documentation though? I found a few things in Google,
> eg: http://fhtr.blogspot.com/2008/09/preludeml-more-multicore-mandelbrot.html
> http://fhtr.blogspot.com/2008/08/preludeml-some-parallel-combinators.html
> But I guess some proper introductory documentation would be helpful.

In fact it looks self-documenting. :)

Cheers,
Alexy

Alexy Khrabrov

unread,

Dec 19, 2008, 6:56:43 PM12/19/08

to OCaml

While we're on the topic of prelude.ml, I wonder how many folks would
find it very nice to just open Prelude and have all these nice
functions readily available. I know that the lack of it makes
references such as List.map ubiquitous, and many argue it's a good
thing. However, it feels refreshing to use short names in an orderly
way. Since there's also the Batteries, perhaps a consensus can be
worked out on what should be in a such an extension to Pervasives?

Cheers,
Alexy

Mikkel Fahnøe Jørgensen

unread,

Dec 19, 2008, 8:41:13 PM12/19/08

to Alexy Khrabrov, OCaml

2008/12/20 Alexy Khrabrov <deliv...@gmail.com>:

> While we're on the topic of prelude.ml, I wonder how many folks would find
> it very nice to just open Prelude and have all these nice functions readily
> available.

Prelude looks nice, and I certainly do not want to put it down, so
forgive me for my critical questions.

How does Prelude handle Windows support, and would we want library
components that are not cross platform?

The 3pl project is cross platform if I remember correctly (short of
some issues on OS-X that hopefully is corrected or is in CVS).

There is also the Lwt lightweight threads library which is not
multi-core. I'm just wondering of some of these efforts can be
combined to a more homogenous library solution. I believe there are
also other solutions in progress.

Other than that, I wonder if Prelude really is a practical solution as
it stands: without looking deep into it, it seems to be targeting
inner loop concurrency by spawning processes, which is costly. In some
cases this may be fine, such as an easy way to speed up calculating
file signatures, but often you would like to have your N processes in
a pool that you can tap when needed, such as handling web requests.

Of course we can have different libraries for different purposes, but
before including any one such library, I think it would be healthy to
discuss requirements.

Finally, I'm also wondering about how to tap platform specific
concurrency. OS-X 10.6 will be shipping with Grand Central for
concurrent support, and I'm sure there will be, or already are,
similar initiatives on other platforms.

Mikkel

Alexy Khrabrov

unread,

Dec 19, 2008, 11:51:03 PM12/19/08

to Mikkel Fahnøe Jørgensen, OCaml

On Dec 19, 2008, at 8:40 PM, Mikkel Fahnøe Jørgensen wrote:

> 2008/12/20 Alexy Khrabrov <deliv...@gmail.com>:
>
>> While we're on the topic of prelude.ml, I wonder how many folks
>> would find
>> it very nice to just open Prelude and have all these nice functions
>> readily
>> available.
>
> Prelude looks nice, and I certainly do not want to put it down, so

> forgive me for my critical questions [...]

I agree its solution to parallelism via a simple forking map/reduce is
not universal, but in fact I am wondering about having other
functional combinators available in shorthand. Prelude.ml is a
superb crash course in FP, and in fact I catch myself reinventing
these idioms ad hoc quite often. I'm very tempted to just include it
always, making it my own Pervasives.

Cheers,
Alexy

Zheng Li

unread,

Dec 20, 2008, 5:54:07 AM12/20/08

to Alexy Khrabrov, OCaml

Hi,

Alexy Khrabrov wrote:
> I agree its solution to parallelism via a simple forking map/reduce is
> not universal, but in fact I am wondering about having other functional

coThreads is comparatively low-level. You might be able to write your
own schedule on top of it. A no-brainer pmap can be defined as follows:

Toplevel trace:
----
# :load unix.cma;;
# :path +process;;
# :load cothreads.cma;;
# let pmap f a =
let ea = Array.map (Cothread.spawn f) a in
Array.map Event.sync ea;;
val pmap : ('a -> 'b) -> 'a array -> 'b array = <fun>
# pmap ((+) 1) (Array.init 10 (fun i -> i));;
- : int array = [|1; 2; 3; 4; 5; 6; 7; 8; 9; 10|]
----

> combinators available in shorthand. Prelude.ml is a
> superb crash course in FP, and in fact I catch myself reinventing these
> idioms ad hoc quite often. I'm very tempted to just include it always,
> making it my own Pervasives.

Maybe you should also look at some of Haskell's standard lib.
Pros: Haskell guys are really good at naming;
Cons: costs are always hidden and combinators composition can be terse.

Just my 2 cents.

--
Zheng

Richard Jones

unread,

Dec 20, 2008, 7:37:47 AM12/20/08

to Hezekiah M. Carty, Oliver Bandel, OCaml

Here's a Fedora package:
https://bugzilla.redhat.com/show_bug.cgi?id=477313

I don't think this is packaged in Debian yet, so I had to make a few
decisions about packaging:

- the ocamlfind name is 'preludeml'
- the package name (in Fedora) is ocaml-preludeml
- no upstream versions, so I packaged it as "version 0.1", dated 20080922

Mikkel Fahnøe Jørgensen

unread,

Dec 20, 2008, 2:33:09 PM12/20/08

to Jon Harrop, caml...@yquem.inria.fr

2008/12/19 Jon Harrop <j...@ffconsultancy.com>:

> ii) People used to choose OCaml because it was fast but lack of support for
> multicores means that OCaml is no longer competitively performant for many
> tasks on today's computers.

That is definitely an argument.

>> If your programming paradigm for concurrency is Threads+Mutexes,

I think this model is necessary for systems programming, but I don't
think we should look broader than this.

I'm not really sure what separates threads from functions, but I'd
like to be able to chain logic sequentially and concurrently in a way
that sync primitives are available but usually not required, and in a
way that efficiently handle concurrent operations in C libraries,
including transaction operations. I believe Oleg did some work on
this, but this work would likely benefit from compiler support.

So the discussion should be

a) is there any real interest in supporting multiple cores - when we
look beyond concurrent GC and Mutexes?

b) what are the basic primitives we need from the compiler to support
a range of interesting concurrency models in an efficient manner.

c) we are probably not inventing a new vector language - like fortress
- super efficient at micro parallelism, but it should be possible to
efficiently tap into the processing power available.

d) error handling and asymmetric termination times of concurrent
operations is non-trivial - you may not want to wait for all sub-tasks
to complete before the main task does starts aggregating. And if you
embed distributed concurrency you may want to reschedule a long
running operation on a new node. This is not a compiler issue, but the
compiler may support a failure model that makes this easier to handle
in a generic way for a distributed library. Google often mentions the
importance of abandoning failed tasks because a hard-drive is down
ever so often, or something.

The spawn multiple processes solutions we have today concerns me
slightly because I'm not convinced that processes are cleaned up
properly on failure. I may be wrong though.

> I think ML has proven its worth and
> it is time to begin developing a completely new open source ML
> implementation.

I think this sounds interesting.

But before starting such an effort, it would be nice with more road
map information from Inria. Do they want to include LLVM, or are there
good reasons not to?

More importantly, I believe Xavier is right about the non-scalability
of plain threaded garbage collection. So I really think we need to
understand how we want to obtain that parallelism efficiently. And
when we understand that, perhaps Inria is more willing to listen to
the multi-core argument?

That said, it can't hurt to have an experimental solution to try out
new ideas and prove that they work.

Mikkel

Mikkel Fahnøe Jørgensen

unread,

Dec 20, 2008, 2:41:47 PM12/20/08

to Jon Harrop, caml...@yquem.inria.fr

2008/12/20 Mikkel Fahnøe Jørgensen <mik...@dvide.com>:
> 2008/12/19 Jon Harrop <j...@ffconsultancy.com>:

> I think this model is necessary for systems programming, but I don't
> think we should look broader than this.

Sorry, I mean the exactly opposite - we should not rely on a tradional
threading model, It won't scale. But we still need basic threads for
some systems tasks, likely.

Jon Harrop

unread,

Dec 22, 2008, 11:57:16 AM12/22/08

to caml...@yquem.inria.fr

On Friday 19 December 2008 22:53:15 Jon Harrop wrote:
> On Friday 19 December 2008 22:36:40 Erik de Castro Lopo wrote:
> > Jon Harrop wrote:
> > > I have actually already started this using the excellent LLVM project
> > > and I just obtained the first promising results yesterday: a simple
> > > benchmark where my compiler runs OCaml code 3x faster than ocamlopt
> > > does.
> >
> > Is this going to be an open source project?
>
> Yes.

Just to update: I've spent a couple more days playing with this. I now have
unit, bools, ints, floats, arrays, fast internal calling convention with tail
calls, C-compatible external calling convention, arbitrary extern functions,
C printf, multiple function arguments and multiple function return values
(passed on the stack!). I have written a few tiny benchmarks and the results
compared to OCaml on x86 are promising:

Float Fibonacci: 2.9x faster than OCaml.
Sieve of Eratosthenes: 1.8x faster than OCaml.
Mandelbrot: 2.9x faster than OCaml.

The compiler is only ~500LOC. The language still lacks closures, a garbage
collector, algebraic datatypes, pattern matching, polymorphism, exceptions,
run-time types and safety (!). However, it already has some killer features:

. Declare and then call C functions directly.

. No 16Mb limits.

. Very fast JIT compilation (2x faster than ocamlopt).

. Easy incremental native-code compilation for a REPL.

. Efficient polymorphism.

LLVM even does a decent job of statically type checking the code, which helps
a lot when debugging (once you get used to the error messages!).

I'll probably write a front end or REPL using camlp4 and then port my ray
tracer benchmark to it. After that, I'll look at some more features:

. Algebraic data types and a kind of switch over them as a poor man's
pattern matching.

. Run-time type information and generic printing.

. Real polymorphism.

. Garbage collection.

This begs the question: what is the simplest possible concurrent GC?

Merry Christmas,

Richard Jones

unread,

Dec 22, 2008, 4:45:07 PM12/22/08

to Jon Harrop, caml...@yquem.inria.fr

On Mon, Dec 22, 2008 at 05:00:08PM +0000, Jon Harrop wrote:
> . Declare and then call C functions directly.

This is a particularly nice feature of C#.

[...]

So you're going to put this on ocamlforge soon?

Rich.

--
Richard Jones
Red Hat

_______________________________________________

Jon Harrop

unread,

Dec 23, 2008, 1:04:43 AM12/23/08

to caml...@yquem.inria.fr

On Monday 22 December 2008 21:44:57 Richard Jones wrote:
> On Mon, Dec 22, 2008 at 05:00:08PM +0000, Jon Harrop wrote:
> > . Declare and then call C functions directly.
>
> This is a particularly nice feature of C#.

Yes. Maybe I can get a more compelling demo working.

> So you're going to put this on ocamlforge soon?

Yes. I'll do a bit more work on it and then tidy it up and document it before
uploading it, unless there is any great interest from people now.

Working with LLVM is really good fun. A JIT compiled regexp engine would be
another interesting project and a JIT compiler for OCaml's existing bytecode
would also be nice.

--
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e

_______________________________________________

Oliver Bandel

unread,

Dec 23, 2008, 4:44:10 AM12/23/08

to caml...@yquem.inria.fr

Hello Jon,

where have you been for such a long time?

It seems, your destructive ages of blaming OCaml-team
instead of implementing things by your own, are now gone
(or at least on decay).

Seems you are back on the constructive side.
Nice to meet you at this place. :)

Zitat von Jon Harrop <j...@ffconsultancy.com>:

> On Friday 19 December 2008 22:53:15 Jon Harrop wrote:
> > On Friday 19 December 2008 22:36:40 Erik de Castro Lopo wrote:
> > > Jon Harrop wrote:
> > > > I have actually already started this using the excellent LLVM
> project
> > > > and I just obtained the first promising results yesterday: a
> simple
> > > > benchmark where my compiler runs OCaml code 3x faster than
> ocamlopt
> > > > does.
> > >
> > > Is this going to be an open source project?
> >
> > Yes.
>
> Just to update: I've spent a couple more days playing with this. I
> now have
> unit, bools, ints, floats, arrays, fast internal calling convention
> with tail
> calls, C-compatible external calling convention, arbitrary extern
> functions,
> C printf, multiple function arguments and multiple function return
> values

[...]

Fine. :)

> (passed on the stack!). I have written a few tiny benchmarks and the
> results
> compared to OCaml on x86 are promising:
>
> Float Fibonacci: 2.9x faster than OCaml.
> Sieve of Eratosthenes: 1.8x faster than OCaml.
> Mandelbrot: 2.9x faster than OCaml.

cool. :)

We wait for the day, when it's faster than assembler. ;-)

[...]

> LLVM even does a decent job of statically type checking the code,
> which helps
> a lot when debugging (once you get used to the error messages!).

[...]

You mentioned LLVM is using 3-op-code.
Interesting. Since a while I thought about using such
a language level for own implementations, but it was not more
than idea on using it.

To have the LLVM-project implementing this is fine.
This makes the start easier.
Seems to be interesting and mature enough to use it.

Ciao,
Oliver

Jon Harrop

unread,

Dec 23, 2008, 4:56:28 AM12/23/08

to caml...@yquem.inria.fr

On Tuesday 23 December 2008 06:07:37 Jon Harrop wrote:
> Yes. I'll do a bit more work on it and then tidy it up and document it
> before uploading it, unless there is any great interest from people now.

Incidentally, I would like to know what performance issues (good and bad)
people have with the current OCaml implementation?

AFAICT, OCaml evolved from a family of languages that were only optimized for
symbolic processing. Some of OCaml's relatives, like PolyML, were able to
provide even faster symbolic processing than naive C but their numerical
performance is 100x worse. These heavily-skewed performance characteristics
rendered many ML implementations domain specific.

I believe OCaml was the first ML to put an unusual amount of effort into
optimizing other aspects, most notably high performance floating-point
arithmetic and float arrays (where OCaml still thrashes SML/NJ and MLton).
Moreover, I think OCaml became the world's favourite ML precisely because of
this diversity.

I am just looking at the simplest possible GC implementations, like shadow
stack, and trying to envisage an ML implementation that puts a lot more
effort into numerics and string processing and a lot less effort into
symbolics. I am guessing the performance of allocation will be degraded
10-100x but many allocations can be removed. This leaves me wondering how
much slowdown is acceptable without deterring a lot of users?

Anyway, I'll try to implement a simple GC and get some concrete performance
results first...

Richard Jones

unread,

Dec 23, 2008, 5:05:01 AM12/23/08

to Jon Harrop, caml...@yquem.inria.fr

On Tue, Dec 23, 2008 at 06:07:37AM +0000, Jon Harrop wrote:
> another interesting project and a JIT compiler for OCaml's existing bytecode
> would also be nice.

Probably easier to start with the abstract syntax tree that camlp4
writes out. http://brion.inria.fr/gallium/index.php/AST

Rich.

--
Richard Jones
Red Hat

_______________________________________________

Jon Harrop

unread,

Dec 23, 2008, 5:35:48 AM12/23/08

to caml...@yquem.inria.fr

On Tuesday 23 December 2008 10:04:55 Richard Jones wrote:
> On Tue, Dec 23, 2008 at 06:07:37AM +0000, Jon Harrop wrote:
> > another interesting project and a JIT compiler for OCaml's existing
> > bytecode would also be nice.
>
> Probably easier to start with the abstract syntax tree that camlp4
> writes out. http://brion.inria.fr/gallium/index.php/AST

Actually I was previously compiling quoted OCaml code via Camlp4. However,
lack of support for camlp4 macros in Tuareg and lack of documentation about
Camlp4 itself (how do you get a tuple "out"!?) made this sufficiently painful
that I ditched it and opted for writing out the AST in variant types by hand
instead, e.g. my mandelbrot benchmark:

let mandelbrot n =
[Defn.Function("lerp", ["i", Type.Int; "n", Type.Int], [Type.Float],
Float 2.0 *:. FloatOfInt(Var "i") /:. FloatOfInt(Var "n") -:.
Float 1.0);

Defn.Function("zsqr",
["r", Type.Float; "i", Type.Float], [Type.Float; Type.Float],
Values[Var "r" *:. Var "r" -:. Var "i" *:. Var "i";
Float 2.0 *:. Var "r" *:. Var "i"]);

Defn.Function("znorm2", ["r", Type.Float; "i", Type.Float], [Type.Float],
Var "r" *:. Var "r" +:. Var "i" *:. Var "i");

Defn.Function
("pixel2", ["n", Type.Int;
"zr", Type.Float;
"zi", Type.Float;
"cr", Type.Float;
"ci", Type.Float], [Type.String],
If(Var "n" =: Int 65536, String " ",
If(apply(Var "znorm2", [Var "zr"; Var "zi"], Type.Float) >=:.
Float 4.0,
String ".",
Call(["zr", Type.Float; "zi", Type.Float],
Var "zsqr", [Var "zr"; Var "zi"],
apply(Var "pixel2", [Var "n" +: Int 1;
Var "zr" +:. Var "cr";
Var "zi" +:. Var "ci";
Var "cr"; Var "ci"], Type.Unit)))));

Defn.Function
("pixel", ["n", Type.Int;
"zr", Type.Float;
"zi", Type.Float;
"cr", Type.Float;
"ci", Type.Float], [Type.String],
If(Var "n" =: Int 65536, String " ",
If(Var "zr" *:. Var "zr" +:. Var "zi" *:. Var "zi" >=:. Float 4.0,
String ".",
apply(Var "pixel",
[Var "n" +: Int 1;
Var "zr" *:. Var "zr" -:.
Var "zi" *:. Var "zi" +:. Var "cr";
Float 2.0 *:. Var "zr" *:. Var "zi" +:. Var "ci";
Var "cr"; Var "ci"], Type.Unit))));

Defn.Function
("row", ["i", Type.Int; "j", Type.Int; "n", Type.Int], [],
If(Var "i" >: Var "n", Unit,
Compound
[ Printf[apply(Var "pixel",
[Int 0;
Float 0.0; Float 0.0;
apply(Var "lerp", [Var "i"; Var "n"], Type.Float);
apply(Var "lerp", [Var "j"; Var "n"], Type.Float)],
Type.Unit)];
apply(Var "row",
[Var "i" +: Int 1; Var "j"; Var "n"], Type.Unit)]));

Defn.Function
("col", ["j", Type.Int; "n", Type.Int], [],
If(Var "j" >: Var "n", Unit,
Compound
[ apply(Var "row", [Int 0; Var "j"; Var "n"], Type.Unit);
Printf[String "\n"];
apply(Var "col", [Var "j" +: Int 1; Var "n"], Type.Unit)]));

Defn.Function
("main", [], [], apply(Var "col", [Int 0; Int n], Type.Unit))]

I'll probably just write a "real" parser and implement a REPL for it instead,
probably after I've got algebraic datatypes and generic printing working.
Hmm, maybe I should use ocamllex and ocamlyacc instead of Camlp4 because I'd
like a better lexer as well (e.g. complex literals). I had been thinking
about using parser combinators to make bootstrapping easier but they're a
world of pain and I don't actually see any point in bootstrapping anyway.

Are there any usable OCaml frameworks that can parse and pretty print (i.e.
remove superfluous parentheses by taking associativity, precedence and fixity
into account) from the same grammar definition?

--
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e

_______________________________________________

Jon Harrop

unread,

Dec 23, 2008, 6:50:07 AM12/23/08

to caml...@yquem.inria.fr

On Tuesday 23 December 2008 09:43:59 Oliver Bandel wrote:
> Hello Jon,
>
> where have you been for such a long time?

My time has been split between building products around F# (in preparation for
its world domination in 2010 ;-) and my bouncing baby boy.

> It seems, your destructive ages of blaming OCaml-team
> instead of implementing things by your own, are now gone
> (or at least on decay).

I don't want to blame the OCaml team: they did a great job with OCaml and it
was never their goal to make what I want. After all, I would not be here were
it not for them (I mean on the caml-list, not in existence ;-).

But I do think it is time to face facts: the current OCaml implementation has
some serious issues that are holding it back but they can be solved with
enough effort. My personal belief is that the other significant FPLs on Linux
and Mac OS X (i.e. Haskell, Scala and Erlang) have even more serious
practical issues. So I perceive OCaml's stagnation as a catastrophic loss in
language diversity that I would like to prevent if at all possible.

I cannot see a revenue stream in this new language implementation so I can
only devote my spare time to it, which is usually extremely limited. If
anything, this is a very long term investment with the hope that this new
language implementation might eventually become the dominant open source FPL
outside Windows in a few years time. If that happens then we would be able to
earn money from it by writing books about it and selling software for it, but
that is a long shot.

> To have the LLVM-project implementing this is fine.
> This makes the start easier.
> Seems to be interesting and mature enough to use it.

Yes. I have found a couple of limitations and a performance issue in LLVM but
nothing serious and I would not hesitate to recommend it.

LLVM is really thriving because it is commercially viable and, consequently,
has serious commercial backing. Indeed, LLVM is likely to become more popular
than OCaml in the not-too-distant future:

http://www.google.com/trends?q=ocaml%2Cllvm

I think it would be very productive to learn from this development model: an
open source ML with industrial backing would have huge potential.

--
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e

_______________________________________________

Ashish Agarwal

unread,

Dec 23, 2008, 10:33:05 AM12/23/08

to Jon Harrop, caml...@yquem.inria.fr

> a lot more effort into numerics and string processing and a lot less
effort into
symbolics

Is there any fundamental reason these two goals must be at odds? Why can't a
compiler be good at numeric and symbolic computation?

Jon Harrop

unread,

Dec 23, 2008, 12:30:19 PM12/23/08

to caml...@yquem.inria.fr

On Tuesday 23 December 2008 15:32:52 Ashish Agarwal wrote:
> > a lot more effort into numerics and string processing and a lot less
> > effort into symbolics.
>

> Is there any fundamental reason these two goals must be at odds?

No theoretical reason, AFAIK.

> Why can't a compiler be good at numeric and symbolic computation?

The required performance characteristics appear to be mutually exclusive when
it comes to GCs.

Symbolics benefit enormously from extremely fast allocation rates for
short-lived values and numerics benefit enormously from efficient handling of
shared memory. Today's GCs either provide one or the other. OCaml is ideal
for symbolics and awful for parallel programming. The CLR is awful for
symbolics and good for parallel programming. The JVM is arguably the best
trade-off but its run-time cannot handle tail calls, which makes it a useless
for performant functional languages.

In the case of the compiler I'm writing, there are more pressing practical
concerns. Primarily that it is extremely difficult to write an efficient GC
like OCaml's, if not impossible within the confines of LLVM (and LLVM is the
only thing making this feasible for me). I believe I can get some kind of GC
working, possibly even a parallel GC. Moreover, I think I can remove all
(performance) overhead associated with automatic memory management from
numerical routines. I think that would be very compelling even if it were bad
for symbolics...

Mikkel Fahnøe Jørgensen

unread,

Dec 24, 2008, 8:13:05 AM12/24/08

to Jon Harrop, caml...@yquem.inria.fr

2008/12/23 Jon Harrop <j...@ffconsultancy.com>:

> symbolics. I am guessing the performance of allocation will be degraded
> 10-100x but many allocations can be removed. This leaves me wondering how
> much slowdown is acceptable without deterring a lot of users?

I think this would be major problem. A great advantage of OCaml is
that you can do things you would not normally do in C/C++ for
performance reasons, like mapping a list.

I believe it is desirable to split allocation into pools dedicated to
each physical thread. Memory barriers are really expensive, such as
used for atomic increment, not to mention locking. Small block
allocation could be done in relatively small heap segments where each
physical thread locks a larger heap only when it needs a new segment,
but not while allocating within a segment. This can further be
improved by adding NUMA support and scope based allocation (arena). If
full segments are marked non-mutable where possible, it is probably
possible to reduce thread blocking during collection. Garbage
collection could be limited to processing full segments. Probably not
trivial though.

How do you think of adding an Erlang style threading model?

Speaking of JIT compiler, would eval make any sense?

Mikkel

Jon Harrop

unread,

Dec 24, 2008, 11:44:06 AM12/24/08

to caml...@yquem.inria.fr

On Wednesday 24 December 2008 13:12:58 Mikkel Fahnøe Jørgensen wrote:
> 2008/12/23 Jon Harrop <j...@ffconsultancy.com>:
> > symbolics. I am guessing the performance of allocation will be degraded
> > 10-100x but many allocations can be removed. This leaves me wondering how
> > much slowdown is acceptable without deterring a lot of users?
>
> I think this would be major problem. A great advantage of OCaml is
> that you can do things you would not normally do in C/C++ for
> performance reasons, like mapping a list.

Yes. The counter argument is that people wanting performance should not be
using long linked lists in the first place. I think that is actually quite a
compelling argument: it is more important to remove barriers to potential
performance than it is to make everything fast.

Indeed, that is one of the strongest reasons to choose OCaml over the
alternatives: you can write very fast code in OCaml without having to drop to
C. In contrast, Haskell compilers do a lot of generic optimizations like
deforesting that help in unimportant cases but place a low performance
ceiling on fast code so you are forced to drop to another language for speed.

> I believe it is desirable to split allocation into pools dedicated to
> each physical thread.

Absolutely. However, that is also extremely complicated and I just cannot see
myself implementing that any time soon. A parallel GC using a shadow stack
seems feasible and should still be a productive improvement. Symbolic code
will really suffer but numeric code will really gain.

> Memory barriers are really expensive, such as used for atomic increment, not
> to mention locking.

Particularly when mutating pointers to reference types in the heap. However,
reading and writing value types in the heap should still be fast, e.g.
parallel numerical methods.

> Small block
> allocation could be done in relatively small heap segments where each
> physical thread locks a larger heap only when it needs a new segment,
> but not while allocating within a segment. This can further be
> improved by adding NUMA support and scope based allocation (arena). If
> full segments are marked non-mutable where possible, it is probably
> possible to reduce thread blocking during collection. Garbage
> collection could be limited to processing full segments. Probably not
> trivial though.

Indeed. There are also a variety of type-directed approaches that may be
useful given the information available to the JIT. Not to mention that I'm
expecting to JIT the GC code for each type as well... :-)

Incidentally, the availability of fences as intrinsics in LLVM makes it sound
ideal for implementing GCs using wait-free methods. If I can get the basics
up and running then it may be best for me to focus on making this VM an easy
target for GC researchers to implement new algorithms.

> How do you think of adding an Erlang style threading model?

That is effectively what F#'s asynchronous workflows aim to achieve. However,
I currently have no use for it so I only desire that its existence does not
limit the capabilities that I am interested in (parallelism).

> Speaking of JIT compiler, would eval make any sense?

Yes, no reason why not. Perhaps the same infrastructure used to type
specialize definitions as they are JIT compiled can be used for other kinds
of partial specialization, like low-dimensional vector/matrix operations.

--
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e

_______________________________________________