[Caml-list] OC4MC : OCaml for Multicore architectures

Philippe Wang

Sep 22, 2009, 5:31:14 PM
to caml...@inria.fr
This is some additional "noise" about "OCaml for Multicore
architectures" (or "Ok with parallel threads GC").
----------------------------

Dear list,

We have implemented an alternative runtime library for OCaml, one that
allows threads to compute in parallel on different cores of now
widespread CPUs.

This project will be presented at IFL 2009 (http://blogs.shu.edu/projects/IFL2009/).

A testing version is available online at
http://www.algo-prog.info/ocmc/
It works with OCaml 3.10.2 on Linux x86-64. We haven't met any
bugs with the latest build (it doesn't *unexpectedly* crash, not yet).

Hope you'll enjoy,

--
Mathias Bourgoin, Adrien Jonquet, Emmanuel Chailloux, Benjamin Canou,
Philippe Wang

_______________________________________________
Caml-list mailing list. Subscription management:
http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
Archives: http://caml.inria.fr
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs

Goswin von Brederlow

Sep 23, 2009, 6:54:48 AM
to Philippe Wang, caml...@inria.fr
Philippe Wang <philip...@lip6.fr> writes:

> This is some additional "noise" about "OCaml for Multicore
> architectures" (or "Ok with parallel threads GC").
> ----------------------------
>
> Dear list,
>
> We have implemented an alternative runtime library for OCaml, one that
> allows threads to compute in parallel on different cores of now
> widespread CPUs.
>
> This project will be presented at IFL 2009
> (http://blogs.shu.edu/projects/IFL2009/).
>
> A testing version is available online at
> http://www.algo-prog.info/ocmc/
> It works with OCaml 3.10.2 on Linux x86-64. We haven't met any
> bugs with the latest build (it doesn't *unexpectedly* crash, not yet).
>
> Hope you'll enjoy,
>
> --
> Mathias Bourgoin, Adrien Jonquet, Emmanuel Chailloux, Benjamin Canou,
> Philippe Wang

Has anyone tested this yet? Any success stories?

MfG
Goswin

Jon Harrop

Sep 23, 2009, 7:10:20 AM
to caml...@yquem.inria.fr
On Wednesday 23 September 2009 11:53:09 Goswin von Brederlow wrote:
> Has anyone tested this yet? Any success stories?

It's compiling. :-)

--
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e

Jon Harrop

Sep 23, 2009, 7:49:01 AM
to caml...@yquem.inria.fr
On Wednesday 23 September 2009 13:21:35 Jon Harrop wrote:
> On Wednesday 23 September 2009 11:53:09 Goswin von Brederlow wrote:
> > Has anyone tested this yet? Any success stories?
>
> It's compiling. :-)

Oops, I just compiled a vanilla OCaml 3.10 and their patch is not currently
downloadable. I assume everyone else is thrashing their server instead of
writing contentless posts here? :-)

Philippe Wang

Sep 23, 2009, 10:26:12 AM
to caml...@yquem.inria.fr
I've updated the download page, it should be more robust to multiple
downloads now.

Cheers,

Philippe Wang

Jon Harrop

Sep 23, 2009, 7:10:13 PM
to caml...@yquem.inria.fr
On Wednesday 23 September 2009 11:53:09 Goswin von Brederlow wrote:
> Has anyone tested this yet? Any success stories?

Well, I've used the build.sh script to build a patched OCaml 3.10.2 that
identifies itself as:

$ ocamlopt -v
The Objective Caml native-code compiler, version 3.10.2+patch-ocaml4multicore-20090823
Standard library directory: /home/jdh30/src/ocaml/parallel/oc4mc-20090823/ocaml-3.10.2/../out/lib/ocaml

and I've built their tests:

$ cd tests
$ make matmul.nc
ocamlopt -o "matmul.nc" -thread unix.cmxa threads.cmxa graphics.cmxa "matmul.ml"
File "matmul.ml", line 25, characters 8-13:
Warning Y: unused variable count.
File "matmul.ml", line 26, characters 8-16:
Warning Y: unused variable last_col.

and run them:

$ time ./matmul.nc 1000 8
Temp de calcul: utime 38.930433, stime 0.012000, rtime 38.943138
Fatal error: exception Invalid_argument("index out of bounds")

real 0m38.974s
user 0m38.942s
sys 0m0.028s

Note the exception that (I think) should have been caught and handled
silently.

But I cannot get anything to run in parallel. None of the tests use more than
one core and my own busy-wait-loops-on-two-threads test also runs only on one
core. Any idea what I'm doing wrong? Is there a flag to enable it or
something?

One possible cause: I'm running in a 64-bit chroot.

--
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e

Philippe Wang

Sep 23, 2009, 7:15:27 PM
to Jon Harrop, caml...@yquem.inria.fr
make program.nc uses original ocamlopt

make program.th uses the newly built ocamlopt with the necessary
options (lib links)

then you can compare program.nc and program.th

--
Philippe Wang
ma...@philippewang.info

Jon Harrop

Sep 23, 2009, 7:54:15 PM
to Philippe Wang, caml...@yquem.inria.fr
On Thursday 24 September 2009 00:15:14 you wrote:
> make program.nc uses original ocamlopt
>
> make program.th uses the newly built ocamlopt with the necessary
> options (lib links)
>
> then you can compare program.nc and program.th

Aha! Progress, but now I get errors:

$ make matmul.th
../out/bin/ocamlopt -ccopt -march=native -ccopt -mtune=native -ccopt -O4 -I /home/jdh30/src/ocaml/parallel/oc4mc-20090823/tests/../out/lib/ocaml/ -I /home/jdh30/src/ocaml/parallel/oc4mc-20090823/tests/../runtime/gcs/sc_par /home/jdh30/src/ocaml/parallel/oc4mc-20090823/tests/../runtime/gcs/sc_par/gci.o -cclib -lgc -cclib -g -thread
unix.cmxa threads.cmxa graphics.cmxa -verbose -compact -rectypes -inline
100 -fno-PIC -cclib -lunix -cclib -lpthread "matmul.ml" -o "matmul.th"


File "matmul.ml", line 25, characters 8-13:
Warning Y: unused variable count.
File "matmul.ml", line 26, characters 8-16:
Warning Y: unused variable last_col.

+ as -o matmul.o /tmp/camlasm081590.s
+ as -o /tmp/camlstartupdac3e2.o /tmp/camlstartup8f7152.s
+ gcc -o 'matmul.th' -I'/home/jdh30/src/ocaml/parallel/oc4mc-20090823/ocaml-3.10.2/../out/lib/ocaml' -march=native -mtune=native -O4 '/tmp/camlstartupdac3e2.o' '/home/jdh30/src/ocaml/parallel/oc4mc-20090823/tests/../out/lib/ocaml/std_exit.o' 'matmul.o' '/home/jdh30/src/ocaml/parallel/oc4mc-20090823/tests/../out/lib/ocaml/graphics.a' '/home/jdh30/src/ocaml/parallel/oc4mc-20090823/ocaml-3.10.2/../out/lib/ocaml/threads/threads.a' '/home/jdh30/src/ocaml/parallel/oc4mc-20090823/tests/../out/lib/ocaml/unix.a' '/home/jdh30/src/ocaml/parallel/oc4mc-20090823/tests/../out/lib/ocaml/stdlib.a' '-L/home/jdh30/src/ocaml/parallel/oc4mc-20090823/tests/../out/lib/ocaml/' '-L/home/jdh30/src/ocaml/parallel/oc4mc-20090823/tests/../runtime/gcs/sc_par' '-L/home/jdh30/src/ocaml/parallel/oc4mc-20090823/ocaml-3.10.2/../out/lib/ocaml/threads' '-L/home/jdh30/src/ocaml/parallel/oc4mc-20090823/ocaml-3.10.2/../out/lib/ocaml' '-lgraphics' '-lX11' '-lthreadsnat' '-lunix' '-lpthread' '-lunix' '/home/jdh30/src/ocaml/parallel/oc4mc-20090823/tests/../runtime/gcs/sc_par/gci.o' '-lgc' '-g' '-lunix' '-lpthread' '/home/jdh30/src/ocaml/parallel/oc4mc-20090823/tests/../out/lib/ocaml/libasmrun.a' -lm -ldl
/home/jdh30/src/ocaml/parallel/oc4mc-20090823/tests/../out/lib/ocaml/libasmrun.a(memory.o):
In function `gc_end_roots':
memory.c:(.text+0x10): multiple definition of `gc_end_roots'
/home/jdh30/src/ocaml/parallel/oc4mc-20090823/tests/../runtime/gcs/sc_par/gci.o:/home/jdh30/src/ocaml/parallel/oc4mc-20090823/runtime/gcs/sc_par/gci.c:948:
first defined here
/home/jdh30/src/ocaml/parallel/oc4mc-20090823/tests/../out/lib/ocaml/libasmrun.a(memory.o):
In function `gc_begin_roots':
memory.c:(.text+0x12): multiple definition of `gc_begin_roots'
/home/jdh30/src/ocaml/parallel/oc4mc-20090823/tests/../runtime/gcs/sc_par/gci.o:/home/jdh30/src/ocaml/parallel/oc4mc-20090823/runtime/gcs/sc_par/gci.c:947:
first defined here
/home/jdh30/src/ocaml/parallel/oc4mc-20090823/tests/../out/lib/ocaml/libasmrun.a(finalise.o):
In function `caml_final_do_strong_roots':
finalise.c:(.text+0x0): multiple definition of `caml_final_do_strong_roots'
/home/jdh30/src/ocaml/parallel/oc4mc-20090823/tests/../runtime/gcs/sc_par/gci.o:/home/jdh30/src/ocaml/parallel/oc4mc-20090823/runtime/gcs/sc_par/gci.c:301:
first defined here
/home/jdh30/src/ocaml/parallel/oc4mc-20090823/tests/../runtime/gcs/sc_par/gci.o:
In function `stop_the_world':
gci.c:(.text+0x38e): undefined reference to `caml_all_threads'
gci.c:(.text+0x403): undefined reference to `caml_all_threads'
gci.c:(.text+0x410): undefined reference to `caml_all_threads'
gci.c:(.text+0x48a): undefined reference to `caml_all_threads'
/home/jdh30/src/ocaml/parallel/oc4mc-20090823/tests/../runtime/gcs/sc_par/gci.o:
In function `resume_the_world':
gci.c:(.text+0x4c4): undefined reference to `caml_all_threads'
/home/jdh30/src/ocaml/parallel/oc4mc-20090823/tests/../runtime/gcs/sc_par/gci.o:gci.c:
(.text+0x57c): more undefined references to `caml_all_threads' follow
/home/jdh30/src/ocaml/parallel/oc4mc-20090823/tests/../runtime/gcs/sc_par/gci.o:
In function `termination_action':
gci.c:(.text+0x1e94): undefined reference to `remove_thread_from_list'
/home/jdh30/src/ocaml/parallel/oc4mc-20090823/tests/../runtime/gcs/sc_par/gci.o:
In function `gc_terminate_local':
gci.c:(.text+0x1fe5): undefined reference to `remove_thread_from_list'
collect2: ld returned 1 exit status
Error during linking
make: *** [matmul.th] Error 2

Philippe Wang

Sep 23, 2009, 8:02:11 PM
to Jon Harrop, caml...@yquem.inria.fr
OK... well, I guess that either
- something about your environment is too different from ours (in
which case build.sh is at fault), or
- your installation has become corrupted (for example, a bad PATH
value could make the original ocamlopt get mixed up with the oc4mc
ocamlopt).


What I suggest is to use a default PATH (without modifying it for the
purpose of OC4MC), and do these steps in a clean directory that is not
included in PATH :

1) wget oc4mc-2009XXXX.tgz
2) tar xzf oc4mc-2009XXXX.tgz
3) cd oc4mc-2009XXXX
4) wget ocaml 3.10.2 (tar.gz or tar.bz2)
5) bash build.sh
... wait
6) cd tests
7) make matmul.th
8) time ./matmul.th 1000 8

Sorry it's messy, we are thinking about something cleaner... (there's
a matter of lack of time somewhere)

cheers,

--
Philippe Wang
ma...@philippewang.info


On Thu, Sep 24, 2009 at 2:05 AM, Jon Harrop <j...@ffconsultancy.com> wrote:
> On Thursday 24 September 2009 00:15:14 you wrote:
>> make program.nc uses original ocamlopt
>>
>> make program.th uses the newly built ocamlopt with the necessary
>> options (lib links)
>>
>> then you can compare program.nc and program.th
>
> Aha! Progress, but now I get errors:
>
> $ make matmul.th

> ../out/bin/ocamlopt -ccopt -march=native -ccopt -mtune=native -ccopt -O4 -I /home/jdh30/src/ocaml/parallel/oc4mc-20090823/tests/../out/lib/ocaml/ -I /home/jdh30/src/ocaml/parallel/oc4mc-20090823/tests/../runtime/gcs/sc_par /home/jdh30/src/ocaml/parallel/oc4mc-20090823/tests/../runtime/gcs/sc_par/gci.o -cclib -lgc -cclib -g -thread


> unix.cmxa threads.cmxa graphics.cmxa -verbose -compact -rectypes -inline
> 100 -fno-PIC -cclib -lunix -cclib -lpthread "matmul.ml" -o "matmul.th"
> File "matmul.ml", line 25, characters 8-13:
> Warning Y: unused variable count.
> File "matmul.ml", line 26, characters 8-16:
> Warning Y: unused variable last_col.
> + as -o matmul.o /tmp/camlasm081590.s
> + as -o /tmp/camlstartupdac3e2.o /tmp/camlstartup8f7152.s
> +
> gcc -o 'matmul.th' -I'/home/jdh30/src/ocaml/parallel/oc4mc-20090823/ocaml-3.10.2/../out/lib/ocaml' -march=native -mtune=native -O4 '/tmp/camlstartupdac3e2.o' '/home/jdh30/src/ocaml/parallel/oc4mc-20090823/tests/../out/lib/ocaml/std_exit.o' 'matmul.o' '/home/jdh30/src/ocaml/parallel/oc4mc-20090823/tests/../out/lib/ocaml/graphics.a' '/home/jdh30/src/ocaml/parallel/oc4mc-20090823/ocaml-3.10.2/../out/lib/ocaml/threads/threads.a' '/home/jdh30/src/ocaml/parallel/oc4mc-20090823/tests/../out/lib/ocaml/unix.a' '/home/jdh30/src/ocaml/parallel/oc4mc-20090823/tests/../out/lib/ocaml/stdlib.a' '-L/home/jdh30/src/ocaml/parallel/oc4mc-20090823/tests/../out/lib/ocaml/' '-L/home/jdh30/src/ocaml/parallel/oc4mc-20090823/tests/../runtime/gcs/sc_par' '-L/home/jdh30/src/ocaml/parallel/oc4mc-20090823/ocaml-3.10.2/../out/lib/ocaml/threads' '-L/home/jdh30/src/ocaml/parallel/oc4mc-20090823/ocaml-3.10.2/../out/lib/ocaml' '-lgraphics' '-lX11' '-lthreadsnat' '-lunix' '-lpthread' '-lunix' '/home/jdh30/src/ocaml/parallel/oc4mc-20090823/tests/../runtime/gcs/sc_par/gci.o' '-lgc' '-g' '-lunix' '-lpthread' '/home/jdh30/src/ocaml/parallel/oc4mc-20090823/tests/../out/lib/ocaml/libasmrun.a' -lm -ldl

Jon Harrop

Sep 23, 2009, 9:36:16 PM
to Philippe Wang, caml...@yquem.inria.fr
On Thursday 24 September 2009 01:01:58 you wrote:
> OK... well, I guess that either
> - something about your environment is too different from ours (in
> which case build.sh is at fault), or
> - your installation has become corrupted (for example, a bad PATH
> value could make the original ocamlopt get mixed up with the oc4mc
> ocamlopt).
>
> What I suggest is to use a default PATH (without modifying it for the
> purpose of OC4MC), and do these steps in a clean directory that is not
> included in PATH :
>
> 1) wget oc4mc-2009XXXX.tgz
> 2) tar xzf oc4mc-2009XXXX.tgz
> 3) cd oc4mc-2009XXXX
> 4) wget ocaml 3.10.2 (tar.gz or tar.bz2)
> 5) bash build.sh
> 6) cd tests
> 7) make matmul.th
> 8) time ./matmul.th 1000 8

>
> Sorry it's messy, we are thinking about something cleaner... (there's
> a matter of lack of time somewhere)

No problem. I'll be happy to get anything working!

Following your advice, it seems to work perfectly now:

$ ./matmul.th 500 1
Temp de calcul: utime 2.324145, stime 0.020001, rtime 2.325608
$ ./matmul.th 500 2
Temp de calcul: utime 1.780111, stime 0.000000, rtime 0.890797
$ ./matmul.th 500 3
Temp de calcul: utime 1.784111, stime 0.004000, rtime 0.608895
$ ./matmul.th 500 4
Temp de calcul: utime 1.764110, stime 0.004000, rtime 0.451214
$ ./matmul.th 500 5
Temp de calcul: utime 1.768111, stime 0.000000, rtime 0.393285
$ ./matmul.th 500 6
Temp de calcul: utime 1.924120, stime 0.004001, rtime 0.333215
$ ./matmul.th 500 7
Temp de calcul: utime 1.788112, stime 0.000000, rtime 0.302328
$ ./matmul.th 500 8
Temp de calcul: utime 1.992124, stime 0.000000, rtime 0.290383

Wow! 2.6x faster on 2 cores is good. ;-)
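
As a rough check of that figure, here is a minimal OCaml sketch that recomputes the speedup and the parallel efficiency from the rtime values printed above. The function names are illustrative only and are not part of OC4MC; an efficiency above 1.0 means superlinear scaling relative to the one-thread run.

```ocaml
(* Wall-clock times (rtime, in seconds) copied from the matmul.th runs
   above, keyed by the thread count passed as the second argument. *)
let rtimes = [
  (1, 2.325608); (2, 0.890797); (3, 0.608895); (4, 0.451214);
  (5, 0.393285); (6, 0.333215); (7, 0.302328); (8, 0.290383);
]

(* Speedup of a run taking [t] seconds relative to a baseline of [base]. *)
let speedup base t = base /. t

(* Parallel efficiency: speedup divided by the number of threads.
   A value above 1.0 indicates superlinear scaling. *)
let efficiency base n t = speedup base t /. float_of_int n

let () =
  let base = List.assoc 1 rtimes in
  List.iter
    (fun (n, t) ->
      Printf.printf "%d threads: speedup %.2fx, efficiency %.2f\n"
        n (speedup base t) (efficiency base n t))
    rtimes
```

Note that the one-thread run also reports a higher utime (2.32 s) than the multi-thread runs (roughly 1.76 to 1.99 s), so part of the apparent gain may come from a slow baseline rather than from parallelism alone.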

That's a really fantastic piece of work. I'll do my best to study it and write
literature about it. May I ask, can you give a rough overview of the design?
For example, is there a separate nursery per thread so each thread can
allocate a certain amount before incurring a global pause? Do you have any
ideas for libraries built on top of this, such as a task parallel library
using work-stealing deques?

Thanks very much!!!

Richard Jones

Sep 24, 2009, 5:49:55 AM
to Jon Harrop, caml...@yquem.inria.fr, Philippe Wang
On Thu, Sep 24, 2009 at 02:47:17AM +0100, Jon Harrop wrote:
> Wow! 2.6x faster on 2 cores is good. ;-)

Isn't that impossible? Or is the multicore GC better than the single
threaded one? (Sorry if this is a stupid or obvious question)

Rich.

--
Richard Jones
Red Hat

kch...@math.carleton.ca

Sep 24, 2009, 6:01:48 AM
to caml...@yquem.inria.fr
> On Thursday 24 September 2009 01:01:58 you wrote:

> No problem. I'll be happy to get anything working!
>
> Following your advice, it seems to work perfectly now:

I'm not too familiar with concurrency in ocaml.
How does OC4MC compare with JoCaml?

ri...@happyleptic.org

Sep 24, 2009, 6:01:47 AM
to caml...@yquem.inria.fr
> > Wow! 2.6x faster on 2 cores is good. ;-)
>
> Isn't that impossible? Or is the multicore GC better than the single
> threaded one? (Sorry if this is a stupid or obvious question)

There are so many factors that make the running time unpredictable that
nothing is surprising any more. Haven't you read this paper [1] about the
length of an environment variable causing a program to be 10% faster or
slower? :)

[1]: http://www-plan.cs.colorado.edu/diwan/asplos09.pdf

Florian Hars

Sep 24, 2009, 6:41:55 AM
to Richard Jones, Jon Harrop, caml...@yquem.inria.fr, Philippe Wang
Richard Jones schrieb:

> On Thu, Sep 24, 2009 at 02:47:17AM +0100, Jon Harrop wrote:
>> Wow! 2.6x faster on 2 cores is good. ;-)
>
> Isn't that impossible? Or is the multicore GC better than the single
> threaded one? (Sorry if this is a stupid or obvious question)

It might just happen that the size of the working set and memory
access pattern of the application is just right so that you get a
better interleaving of cache misses and thread execution if you
run more than two threads on two cores. Hyperthreading might muddle
things further.

- Florian

Jon Harrop

Sep 24, 2009, 7:34:18 AM
to caml...@yquem.inria.fr
On Thursday 24 September 2009 10:49:43 Richard Jones wrote:
> On Thu, Sep 24, 2009 at 02:47:17AM +0100, Jon Harrop wrote:
> > Wow! 2.6x faster on 2 cores is good. ;-)
>
> Isn't that impossible? Or is the multicore GC better than the single
> threaded one? (Sorry if this is a stupid or obvious question)

Superlinear scaling is entirely possible because more cores can mean more
cache in play. However, I have only seen superlinear scaling on AMD hardware
and not Intel hardware.

--
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e

Jon Harrop

Sep 24, 2009, 7:41:20 AM
to caml...@yquem.inria.fr
On Thursday 24 September 2009 11:00:57 kch...@math.carleton.ca wrote:
> > On Thursday 24 September 2009 01:01:58 you wrote:
> >
> > No problem. I'll be happy to get anything working!
> >
> > Following your advice, it seems to work perfectly now:
>
> I'm not too familiar with concurrency in ocaml.
> How does OC4MC compare with JoCaml?

JoCaml is all about concurrency: minimizing latency. Oc4mc is all about
parallelism: maximizing throughput.

Until now, OCaml sucked at parallelism. You can sometimes obtain some
parallelism by forking processes but it is asymptotically slower than using
shared memory. Consequently, oc4mc is a hugely important development in the
OCaml world because it means that OCaml programmers can, for the first time,
write OCaml programs that use multicore machines efficiently.

The next steps are to get oc4mc into the apt repositories and build some
libraries that make parallelism easier (like Microsoft's Task Parallel
Library).

Rakotomandimby Mihamina

Sep 24, 2009, 7:55:33 AM
to caml...@yquem.inria.fr, debian-oc...@lists.debian.org
09/24/2009 02:52 PM, Jon Harrop:

> The next steps are to get oc4mc into the apt repositories

Amen! ;-)

--
Architecte Informatique chez Blueline/Gulfsat:
Administration Systeme, Recherche & Developpement
+261 34 29 155 34

ri...@happyleptic.org

Sep 24, 2009, 8:11:49 AM
to caml...@yquem.inria.fr
> Until now, OCaml sucked at parallelism. (...) OCaml programmers
> can write OCaml programs that use multicore machines efficiently
> for the first time.

Subtle and strongly argued, as expected.

Philippe Wang

Sep 24, 2009, 8:14:45 AM
to Jon Harrop, caml...@yquem.inria.fr
On Thu, Sep 24, 2009 at 3:47 AM, Jon Harrop <j...@ffconsultancy.com> wrote:
> Following your advice, it seems to work perfectly now:

:-)

> Wow! 2.6x faster on 2 cores is good. ;-)

your machine is more generous than ours (which is Intel, not AMD) :-)

> That's a really fantastic piece of work. I'll do my best to study it and write
> literature about it. May I ask, can you give a rough overview of the design?
> For example, is there a separate nursery per thread so each thread can
> allocate a certain amount before incurring a global pause? Do you have any
> ideas for libraries built on top of this, such as a task parallel library
> using work-stealing deques?

A few words on the GC's design (which uses the stop&copy algorithm in
several places):

Heaps:
- A set of pages lets threads allocate memory without interfering with
one another, so there is no mutex locking for local memory allocation.
Each thread is born with an empty page; when it is full, the thread
takes another one.
- A big heap is shared between all threads; a mutex over it prevents
parallel memory allocation into it.

Collection:
- When there are no pages left, a collection stops the world and
copies the living values (of the pages) to the shared heap.
- When the shared heap is full, a collection stops the world and
copies all living values (pages + shared heap) to a new shared heap
(which can grow if need be).

Special operations:
- If there is a blocking operation (e.g. mutex lock or I/O operation),
the mechanism is roughly the same as in the original INRIA OCaml: it
tells the GC that there is no need to stop that thread when stopping
the world.
- If there is a thread with no allocation and no blocking operation,
the behaviour is the same as in INRIA OCaml.

The number of pages, the size of a page, and the size of the shared
heap can be changed before running a program by setting some
environment variables (cf. the last lines of the README file included
in the distribution package).
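
The page scheme described above can be illustrated with a toy model. This is only a single-threaded simulation under stated assumptions, not OC4MC's actual code: the names `pool`, `spawn`, `alloc` and `collect` are hypothetical, sizes are counted in abstract words, the shared heap and its mutex are not modelled, and a collection is assumed to empty every thread's current page.

```ocaml
(* Toy model of page-based allocation. All names are hypothetical. *)

(* A thread only tracks how many words of its current page are used. *)
type thread = { id : int; mutable used : int }

type pool = {
  page_size : int;             (* words per page *)
  total_pages : int;           (* pages in the whole pool *)
  mutable free_pages : int;    (* pages not currently held by a thread *)
  mutable collections : int;   (* simulated stop-the-world collections *)
  mutable threads : thread list;
}

let create ~page_size ~total_pages =
  { page_size; total_pages; free_pages = total_pages;
    collections = 0; threads = [] }

(* Each thread is born with an empty page taken from the pool. *)
let spawn pool id =
  if pool.free_pages = 0 then invalid_arg "spawn: no free page";
  pool.free_pages <- pool.free_pages - 1;
  let t = { id; used = 0 } in
  pool.threads <- t :: pool.threads;
  t

(* Simulated collection: live values are assumed to be copied out to the
   shared heap, so every page becomes empty again and each thread keeps
   one (now empty) page. *)
let collect pool =
  pool.collections <- pool.collections + 1;
  List.iter (fun t -> t.used <- 0) pool.threads;
  pool.free_pages <- pool.total_pages - List.length pool.threads

(* Bump allocation in the thread's own page; no lock is needed until the
   page is full, at which point the thread takes another page, or a
   collection is triggered if none are left. *)
let alloc pool t words =
  if words > pool.page_size then invalid_arg "alloc: larger than a page";
  if t.used + words > pool.page_size then begin
    if pool.free_pages = 0 then collect pool
    else begin
      pool.free_pages <- pool.free_pages - 1;
      t.used <- 0
    end
  end;
  t.used <- t.used + words

let () =
  let pool = create ~page_size:4 ~total_pages:3 in
  let a = spawn pool 0 in
  let b = spawn pool 1 in
  for _i = 1 to 10 do alloc pool a 2; alloc pool b 2 done;
  Printf.printf "collections: %d\n" pool.collections;
  List.iter
    (fun t -> Printf.printf "thread %d: %d words in current page\n" t.id t.used)
    pool.threads
```

The point of the model is only that the fast path of `alloc` touches nothing but the calling thread's own counter; shared state is involved only when a page fills up.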

--
Philippe Wang
ma...@philippewang.info

Stefano Zacchiroli

Sep 24, 2009, 8:40:13 AM
to caml...@yquem.inria.fr
On Thu, Sep 24, 2009 at 12:52:24PM +0100, Jon Harrop wrote:
> The next steps are to get oc4mc into the apt repositories and build

Uhm, I'm curious: how do you plan to achieve that?
AFAICT the patch is only against 3.10.2, and in Debian we're at 3.11.1.

Thus far, we have never had support for more than one version of OCaml
at a time. If it were worth it, we could surely consider that, but the
current uncertainty about OC4MC's future doesn't seem enough to justify it.

So, the real question is: is OC4MC going to be ported to mainline OCaml
and supported in the future or not? If the answer is "no", I don't see it
arriving in Debian anytime soon.

Cheers.

--
Stefano Zacchiroli -o- PhD in Computer Science \ PostDoc @ Univ. Paris 7
zack@{upsilon.cc,pps.jussieu.fr,debian.org} -<>- http://upsilon.cc/zack/
Dietro un grande uomo c'è ..| . |. Et ne m'en veux pas si je te tutoie
sempre uno zaino ...........| ..: |.... Je dis tu à tous ceux que j'aime

Jon Harrop

Sep 24, 2009, 8:59:00 AM
to caml...@yquem.inria.fr
On Thursday 24 September 2009 13:39:40 Stefano Zacchiroli wrote:
> On Thu, Sep 24, 2009 at 12:52:24PM +0100, Jon Harrop wrote:
> > The next steps are to get oc4mc into the apt repositories and build
>
> Uhm, I'm curious: how do you plan to achieve that?

Good question. I have no idea, of course. :-)

> AFAICT the patch is only against 3.10.2, and in Debian we're at 3.11.1.

Philippe, is it feasible to bring your patches up to date wrt OCaml?

> Thus far, we have never had support for more than one version of OCaml
> at a time. If it were worth we can surely consider that, but the current
> uncertainty about OC4MC future doesn't seem enough to justify that.

Fair enough. I think this is the single most important development OCaml has
seen since its inception so I would personally drop OCaml in favor of oc4mc
even if it meant reverting to 3.10.2.

There is also the issue that this is x64 only...

> So, the real question is: is OC4MC going to be ported to mainline OCaml
> and support in the future or not? If the answer is "no", I don't see it
> arriving in Debian anytime soon.

Yes, that would be ideal. Pretty please, Xavier? ;-)

Jon Harrop

Sep 24, 2009, 9:00:55 AM
to caml...@yquem.inria.fr
On Thursday 24 September 2009 13:14:35 Philippe Wang wrote:
> On Thu, Sep 24, 2009 at 3:47 AM, Jon Harrop <j...@ffconsultancy.com> wrote:
> > Following your advice, it seems to work perfectly now:
> >
> :-)
> :
> > Wow! 2.6x faster on 2 cores is good. ;-)
>
> your machine is more generous than ours (which is Intel, not AMD) :-)

Yes. I don't know why AMD are so much better at this but I have seen it
several times now.

> > That's a really fantastic piece of work. I'll do my best to study it and
> > write literature about it. May I ask, can you give a rough overview of
> > the design? For example, is there a separate nursery per thread so each
> > thread can allocate a certain amount before incurring a global pause? Do
> > you have any ideas for libraries built on top of this, such as a task
> > parallel library using work-stealing deques?
>
> A few words on the GC's design (that uses stop&copy algorithm several
> times) :
>
> Heaps :
> - a set of pages are used to give threads the possibility to allocate
> memory without interfering with other threads, so that there is no
> mutex locking at local memory allocation. Each thread is born with an
> empty page; when it's full, the thread takes another one.
> - a big heap is shared between all, there is a mutex over it to
> prevent parallel memory allocation into this one.
>
> Collection :
> - when there are no pages left, a collection stops-the-world and
> copies living values (of the pages) to the shared heap
> - when the shared heap is full, a collection stops-the-world and
> copies all living values (pages+shared heap) to a new shared heap
> (which can grow if need be)

Ok, so this is stop&copy GC with per-thread nurseries/gen0.

Are values such as float arrays copied in their entirety or are they allocated
outside the shared heap and only a pointer to them is copied?

Is the copy operation parallelized?

Is there a write barrier but no read barrier? If so, what exactly does the
write barrier do?

> Special operations :
> - if there is a blocking operation (e.g. mutex lock or I/O operation),
> the mechanism is roughly the same as original INRIA OCaml's : it tells
> the GC that there is no need to stop it when stopping the world.

Can users mark external calls in their bindings as blocking so the GC will
treat them appropriately?

> - if there is a thread with no allocation and no blocking operation,
> the behaviour is the same as INRIA OCaml.
>
> The number of pages, the size of a page, and the size of the shared
> heap can be changed before running a program by setting some
> environment variables (cf. last lines README file included in the
> distribution package).

Great!

--
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e

Rakotomandimby Mihamina

Sep 24, 2009, 9:41:21 AM
to caml...@yquem.inria.fr
09/24/2009 03:39 PM, Stefano Zacchiroli:

> So, the real question is: is OC4MC going to be ported to mainline OCaml
> and support in the future or not?

I don't write many programs that would really require multiple cores.
But I think this is such a good "feature" that it should be included in
the main distribution...

--
Architecte Informatique chez Blueline/Gulfsat:
Administration Systeme, Recherche & Developpement
+261 34 29 155 34

Mike Lin

Sep 24, 2009, 10:01:43 AM
to caml...@yquem.inria.fr
On Thu, Sep 24, 2009 at 8:39 AM, Stefano Zacchiroli <za...@debian.org> wrote:

>
> So, the real question is: is OC4MC going to be ported to mainline OCaml
> and support in the future or not?


Recalling how mainline had us waiting like 5 years for native exception
backtraces, and then another like 3 years for the ability to access the
backtrace within the program, I most certainly hope NOT :)
(Nothing personal to INRIA, I work on academic projects and well know how
these things go, it's just not the most awesome maintenance schedule for
one's main PL)

Dario Teixeira

Sep 24, 2009, 10:11:35 AM
to caml...@inria.fr, Philippe Wang
Hi,

Cheers for the work you guys put into this project! And I'd like to join
the crowd that has questions, if I may:

a) If I understand correctly, part of prerequisites for implementing the
new GC was cleaning up the excessive use of imperative constructs in
the compiler's tree. Will the new tree be also more amenable to the
implementation of new language constructs such as GADTs?

b) Could you quantify the performance penalty (if any) of using the new GC
in a single-thread context? And should this penalty be significant, are
there provisions for a compile-time choice of which GC to use?

c) Is there an understanding between you and the folks at INRIA concerning
the eventual merging of this code into the mainline tree?

Thanks a lot for your time!
Best regards,
Dario Teixeira

Philippe Wang

Sep 24, 2009, 10:22:40 AM
to Rakotomandimby Mihamina, caml...@yquem.inria.fr
On Thu, Sep 24, 2009 at 3:40 PM, Rakotomandimby Mihamina
<miha...@gulfsat.mg> wrote:
> 09/24/2009 03:39 PM, Stefano Zacchiroli:
>>
>> So, the real question is: is OC4MC going to be ported to mainline OCaml
>> and support in the future or not?
>
> I don't write many programs that would really require multiple cores.
> But I think this is such a good "feature" that it should be included in
> the main distribution...

Thing is that having a runtime library that supports parallel threads
costs more than having a runtime library that doesn't.

Programs that take advantage of multicore architectures are not easy
to write, not easy to maintain, not easy to debug, ...
So "it's a great feature, so it should get into mainstream" is not a
good enough reason for INRIA's team. It's probably up to the community
to find a great way of taking advantage of multicore architectures.

One must be aware that:
- Parallel threads vs non-parallel threads: if a program is well
suited to parallel computing on multicore CPUs, then a runtime library
without parallel capability puts the performance bottleneck at the
CPU. Allowing parallel threads then means *moving* this bottleneck
(moving, not removing): indeed, it is very likely that the bottleneck
will then be memory (RAM) bandwidth. See, if your memory runs at
1000 MHz, having 8 cores means 125 MHz/core, which becomes ridiculous.
Even if it were 2400 MHz, it would mean only 300 MHz/core: imagine a
300 MHz memory bandwidth for a 3 GHz core! So it's *very* important to
keep that in mind.
- For programming languages that have been, from the very beginning,
quite a lot slower than INRIA OCaml, it's much easier to gain
performance because they start from far behind, sometimes from very,
very far.
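
The per-core arithmetic in the first point can be written out as a small sketch. It makes the same simplification as the text, namely that the memory clock is shared evenly between the cores; the function name `per_core_mhz` is made up for illustration.

```ocaml
(* Even share of the memory clock per core: total rate divided by the
   number of cores. This is a back-of-envelope model, not a measurement. *)
let per_core_mhz ~memory_mhz ~cores =
  if cores <= 0 then invalid_arg "per_core_mhz: cores must be positive";
  memory_mhz /. float_of_int cores

let () =
  List.iter
    (fun mhz ->
      Printf.printf "%.0f MHz memory, 8 cores: %.0f MHz per core\n"
        mhz (per_core_mhz ~memory_mhz:mhz ~cores:8))
    [1000.0; 2400.0]
```

For the two cases mentioned, this gives 125 MHz and 300 MHz per core.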

Well, from a quite subjective personal point of view, of course it
would be really great to give parallel threads capability to
mainstream INRIA OCaml, because it would mean having found a (great)
acceptable solution.

--
Philippe Wang
ma...@philippewang.info

Philippe Wang

Sep 24, 2009, 10:39:50 AM
to Dario Teixeira, caml...@inria.fr, Philippe Wang
On Thu, Sep 24, 2009 at 4:11 PM, Dario Teixeira <dariot...@yahoo.com> wrote:
> Hi,
>
> Cheers for the work you guys put into this project! And I'd like to join
> the crowd that has questions, if I may:
>
> a) If I understand correctly, part of prerequisites for implementing the
> new GC was cleaning up the excessive use of imperative constructs in
> the compiler's tree. Will the new tree be also more amenable to the
> implementation of new language constructs such as GADTs?

Nope...
We did not want to touch the code generator (or any other part of the
compiler). Eventually, we had to modify the code generator very
slightly so that it does not compact the generated code too much.
That meant changing fewer than 10 lines of ML code.

> b) Could you quantify the performance penalty (if any) of using the new GC
> in a single-thread context? And should this penalty be significant, are
> there provisions for a compile-time choice of which GC to use?

Most programs that are not written with multicore in mind would be
penalized; very few would not.
I mean our GC is much, much dumber than INRIA OCaml's one.
Our goal was to show it was possible to have good performance with
multicores for OCaml.
Maybe someday we'll find some time to optimize the GC, but it's likely
not very soon.

> c) Is there an understanding between you and the folks at INRIA concerning
> the eventual merging of this code into the mainline tree?

Almost same answer as the previous one.
We have shown that it's possible to enjoy multicore for performance.
The changes over the whole runtime library are not easy to merge into
mainstream.

It is very important to know this: the runtime library is written in
C (and a small part is in ASM in order to have better performance...
but mainly because of the "foreign function interface", so there is no
way to ignore it). Its type system really sucks (compared to
OCaml's). When you change a very small part, it will tell you that
you were wrong, but not with a hard-to-understand type error message:
it will be some tricky, dirty segmentation fault, which can sometimes
take days or weeks, even months, to track down.


I guess that if INRIA decides to implement parallel threads
capability, they will have to make the runtime library ready (cleaning
up some global variables, tidying the code, e.g. removing
compatibility.h and such stuff) before thinking about the GC. This
could take some time, because it's not good to break everything at
once. Then, once they have finished this step, I would be confident
that they could integrate an awesome GC.
But that's only my personal opinion...

Oh, why wouldn't they take OC4MC? ... If I were them, I wouldn't. We
have probably broken some stuff such as Weak or Lazy, so there is no
chance to bootstrap with OC4MC. Well, I mean that it's better to
change INRIA's OCaml with all the lessons learnt than to try to fix
OC4MC so that it's fully compatible with the latest version of INRIA
OCaml.

--
Philippe Wang
ma...@philippewang.info

Stefano Zacchiroli

Sep 24, 2009, 10:49:50 AM
to caml...@yquem.inria.fr
On Thu, Sep 24, 2009 at 04:40:53PM +0300, Rakotomandimby Mihamina wrote:
> I don't write many programs that would really require multiple cores.
> But I think this is such a good "feature" that it should be included in
> the main distribution...

I think you are missing what that would mean in terms of effort for
maintaining the corresponding packages. De facto, it would mean
duplicating all source packages of the libraries you want to be able to
build against ocaml 3.10.2 + OC4MC.

You want PCRE? then you need two PCRE packages (3.11 and 3.10.2 4MC)
You want ocamlnet? then you need two ocamlnet packages

You got the picture :-)

Additionally, it would also mean handling in-house the potential security
problems arising in old versions of the compiler (or even in 3rd party
libraries, when you are forced to "fork" them due to source-level
incompatibilities between versions) without any upstream support.

Not fun.

Cheers.

--
Stefano Zacchiroli -o- PhD in Computer Science \ PostDoc @ Univ. Paris 7
zack@{upsilon.cc,pps.jussieu.fr,debian.org} -<>- http://upsilon.cc/zack/
Dietro un grande uomo c'è ..| . |. Et ne m'en veux pas si je te tutoie
sempre uno zaino ...........| ..: |.... Je dis tu à tous ceux que j'aime

Stefano Zacchiroli

Sep 24, 2009, 10:52:29 AM
to caml...@yquem.inria.fr

But the result you are anticipating will actually mean low acceptance of
OC4MC among "common" users, possibly close to 0. All "mainstream" ways
of distributing OCaml (both .rpm and .deb distros, GODI, ...) regularly
switch to the most recent versions of the compiler.

The only people able to stay on 3.10.2 to benefit from OC4MC will be
industrial users who have pinned their development to a specific version
and do not plan to change.

Or am I missing something?
Cheers.

--
Stefano Zacchiroli -o- PhD in Computer Science \ PostDoc @ Univ. Paris 7
zack@{upsilon.cc,