Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

[Caml-list] "ok with parallel threads" GC (aka ocaml for multicore)

178 views
Skip to first unread message

Philippe Wang

unread,
Apr 10, 2009, 1:04:08 PM4/10/09
to caml...@inria.fr, Philippe Wang
Hello list,

Mathias and Adrien have just started their internship (for their
Master's degree requirements).
Thus they have some time to spend on this project. Moreover, Mathias'
internship is strongly related to this project.
=> man power dramatically increased

We are currently searching for the last remaining bugs.

Our thread library is restricted, it contains:
Thread : create, join, yield, id, self, delay
Mutex : full module
Condition : full module


Our alternative garbage collector
- uses a Stop(the world)&Copy algorithm
- has memory pages for threads (each thread takes a page at its
creation)
- has a shared heap for shared values and for old generation from
pages (i.e. memory pages are flushed to this heap)
- should be not to hard to replace.
Blocking sections such as I/O operations or mutex locks do not prevent
garbage collection.

We currently do *not* support POSIX signals (let's say their behaviour
is not specified).

We should make a release soon, but before:
- some code has to be cleaned
- some benchmarks have to be done
- some documentation has to be completed
- an installation script still has to be written.
Thus not a lot is left to do before the release :-)

We are writing test programs to search for the last remaining bugs but
also to measure performances.

So far, as long as there are not too many concurrent memory accesses,
it is not too hard to go n times faster with a n-core CPU;
though intense memory accesses generate page faults and divide memory
bandwidth by the number of concurrent accesses,
and intense memory consuming programs show our GC is not as performant
as INRIA's, of course.

Cheers,

--
Philippe Wang
Philippe.Wang \at/ lip6.fr


PS: Sorry for taking so much time, debugging parallel threads in
shared memory style is hell (you can give it a try).

_______________________________________________
Caml-list mailing list. Subscription management:
http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
Archives: http://caml.inria.fr
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs

Jon Harrop

unread,
Apr 10, 2009, 4:46:03 PM4/10/09
to caml...@yquem.inria.fr
On Friday 10 April 2009 18:04:12 Philippe Wang wrote:
> We should make a release soon, but before:
> - some code has to be cleaned
> - some benchmarks have to be done
> - some documentation has to be completed
> - an installation script still has to be written.
> Thus not a lot is left to do before the release :-)

This is the best news I've heard since OCaml 3.11 made it into testing and
Batteries went into beta!

--
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e

Phil Tomson

unread,
Apr 10, 2009, 6:38:45 PM4/10/09
to

Is there a website where we can track progress? Is this a library
that can be downloaded?

Phil

Jerome Benoit

unread,
Apr 10, 2009, 7:20:42 PM4/10/09
to Philippe Wang, caml...@inria.fr
Le Fri, 10 Apr 2009 19:04:12 +0200,
Philippe Wang <Philip...@lip6.fr> a écrit :

Hello,

> We should make a release soon, but before:
> - some code has to be cleaned
> - some benchmarks have to be done
> - some documentation has to be completed
> - an installation script still has to be written.

Is there a public SCM repository I can pull off the source ?

Thanks.

--
Jérôme Benoit aka fraggle
La Météo du Net - http://grenouille.com
OpenPGP Key ID : 9FE9161D
Key fingerprint : 9CA4 0249 AF57 A35B 34B3 AC15 FAA0 CB50 9FE9 161D

Philippe Wang

unread,
Apr 14, 2009, 6:21:50 AM4/14/09
to fo...@x9c.fr, caml...@inria.fr, Philippe Wang
On Apr 10, 2009, at 20:36 CEDT, fo...@x9c.fr wrote:

> Would it be correct to assume that the current state of the project
> implies that you have patched the OCaml runtime to make it
> fully reentrant ?

Is this following partial answer relevant enough ?
The garbage collector is clearly separated from the rest of the
runtime library: the GC is contained in a libgc.a and our patched
ocamlopt links programs to this GC. The GC variables are known by the
GC only.

> If so, is this code/patch available for download ?

Officially, not yet (and not before April 20th).

We did not expect the debugging part to be so complex and hard, and
taking so long.
The man power dramatically decreased in late September : the 2
Master's students went back to Master's courses, and the 3 supervisors
had to do research in parallel with teaching.
Some major bug fixes were made in February/March, a lot of major bug
fixes were made in April (yes, these last 2 weeks).
You know, bugs hiding other bugs... however we are hopefully getting
close to the fix point: today there is no known bug ! :-)
Unsupported features are - of course - not considered as bugs. For
instance, posix signals are (currently) not supported. And, as
parallel computing *potentially* requires quite a lot more memory,
some programs can easily end up in a blocked state when the heap
becomes full: our GC (currently) uses (parameterized) fixed size pages
and heap.

The next days, we will concentrate on making benchmarks (if you have
some relevant testing programs, they are welcome), and if we don't
discover new bugs then we will focus on (finishing) writing a
documentation and a building script, for the release.
If we release as such now, we will have too much support to do because
of the lack of documentation. So it's not quite a good idea...
When we have the minimal-but-sufficient documentation, we will make
the release :-)

In parallel, we try to make it work with OS X Leopard 64 bit and/or
ocaml 3.11 (currently we only support 3.10.2 - Linux x86_64).

> Anyway, wholehearted respect for undertaking such a complex project.
> Good luck in your bug-chasing tasks !

Thanks.

--
Philippe Wang
Philippe.Wang \at/ lip6.fr

N.B. I hope we will not discover new bugs in our amd64.S, our assembly
guru is enjoying (abroad with no www) his "vacances de pâques"...

xclerc

unread,
Apr 14, 2009, 10:18:25 AM4/14/09
to caml...@inria.fr

Le 14 avr. 09 à 12:21, Philippe Wang a écrit :

> On Apr 10, 2009, at 20:36 CEDT, fo...@x9c.fr wrote:
>
>> Would it be correct to assume that the current state of the project
>> implies that you have patched the OCaml runtime to make it
>> fully reentrant ?
>
> Is this following partial answer relevant enough ?
> The garbage collector is clearly separated from the rest of the
> runtime library: the GC is contained in a libgc.a and our patched
> ocamlopt links programs to this GC. The GC variables are known by
> the GC only.

Well, this is not what I had in mind, but I realize that my question
was not clear. A better question would have been:
Is your implementation still based on a global runtime lock ?

A negative answer would imply that you patched the OCaml
runtime to make it reentrant. To illustrate my point, I will take
the example of the file "byterun/compare.c". In this file, the
code for the comparison of values makes use of a global
variable (named "caml_compare_unordered").

So, allowing multiple threads to execute primitives from the
runtime at the same time may result in incorrect outcomes.
Unless you patch the runtime to make it reentrant, hence my
original (foolishly indirect) question.

If you patched the runtime to allow multiple threads to use
it concurrently, I would also be interested by the strategy
used: is the problem solved by additional parameters ?
by the use of thread-local storage ? by any other mean ?


Xavier Clerc

Philippe Wang

unread,
Apr 16, 2009, 5:45:50 AM4/16/09
to caml...@inria.fr
> Le 14 avr. 09 à 12:21, Philippe Wang a écrit :
>
>> The garbage collector is clearly separated from the rest of the
>> runtime library: the GC is contained in a libgc.a and our patched
>> ocamlopt links programs to this GC. The GC variables are known by
>> the GC only.
>
> Well, this is not what I had in mind, but I realize that my question
> was not clear. A better question would have been:
> Is your implementation still based on a global runtime lock ?

No, it isn't. (And it would probably too often prevent parallel
threads, wouldn't it.)

> A negative answer would imply that you patched the OCaml
> runtime to make it reentrant. To illustrate my point, I will take
> the example of the file "byterun/compare.c". In this file, the
> code for the comparison of values makes use of a global
> variable (named "caml_compare_unordered").

It should be, unless bugs are still hiding under that.
It is supposed to have been done for a while. We'll make one more check.

> If you patched the runtime to allow multiple threads to use
> it concurrently, I would also be interested by the strategy
> used: is the problem solved by additional parameters ?
> by the use of thread-local storage ? by any other mean ?

- local variable instead of global variable in functions
- some functions are parameterized by thread identifier (that is, one
more parameter than "before") (e.g. in amd64.S)

Philippe Wang

Philippe Wang

unread,
Apr 17, 2009, 6:16:12 PM4/17/09
to caml...@inria.fr, Philippe Wang

On Apr 16, 2009, at 11:45 CEDT, Philippe Wang wrote:

>> A negative answer would imply that you patched the OCaml
>> runtime to make it reentrant. To illustrate my point, I will take
>> the example of the file "byterun/compare.c". In this file, the
>> code for the comparison of values makes use of a global
>> variable (named "caml_compare_unordered").
>
> It should be, unless bugs are still hiding under that.
> It is supposed to have been done for a while. We'll make one more
> check.
>
>> If you patched the runtime to allow multiple threads to use
>> it concurrently, I would also be interested by the strategy
>> used: is the problem solved by additional parameters ?
>> by the use of thread-local storage ? by any other mean ?
>
> - local variable instead of global variable in functions
> - some functions are parameterized by thread identifier (that is,
> one more parameter than "before") (e.g. in amd64.S)

Well, we went back into runtime code implementation.

This is what can be said rapidly :
- compare.c contains no global variables anymore, we use local
variables instead
- if a Caml-C or C-Caml interface uses caml_compare_unordered, we
don't know what can happen with parallel threads
- we have many global mutex locks with small scopes
- we do use an "enter/leave blocking section" mechanism to prevent the
GC from waiting on a blocking operation such as I/O or mutex locking
etc.
- we don't support weak values (not sure whether they don't work or
they became strong, if they dont work anymore, they can be back in 2
minutes as strong values anyway)
- serialisation of values is a little bit tricky, though it should work
- most important : many global variables do not exist anymore because
they are irrelevant in our implementation
- we do not support unofficial-features of ocaml 3.10, e.g. the new
features that come with 3.11 but actually have their roots in previous
versions
~ it is almost sad to see all the based-on-"one-thread-at-a-time"
optimisations removed...
+ (it looks like it works just fine)


I hope there are no "hidden" bad global variables.

Is it fully reentrant ? Hmmmm... maybe.

We use a stop-the-world GC (which means no one is running is parallel
with the GC), that is actually like original ocaml, that comes with
its inconveniences : C calls not declared as blocking sections (which
has quite a cost) may prevent other threads from running when the heap
is full.

Graphics module, for instance, is not reentrant at all (anyhow it's
not part of the runtime). Same for Str.
Funny thing is we can open several windows by launching parallel
threads (though only one is useful at the end).


Anyway, thank you for your questions and interest, they have helped us
find&fix some bugs.


--
Philippe Wang
http://www-apr.lip6.fr/~pwang/


PS. We tried to switch to 3.11, but it seems to need too much time,
it's far from being a piece of cake.
We have tried to make it work on Leopard (actually, I failed the 1st
time - half the way, I may try again if I have time).

=====================================
A free very personal advice that may save you some headaches:
do not program in concurrent shared memory style, especially when you
can replace "concurrent" by "parallel".
Even if it may have a future, even if it may sound great, even if it
sounds exciting, even if it helps you go faster, even if <put-here-
whatever-you-want>,
it is **awful**. Well, if you really really don't have a choice, never
mind what I said.

Joel Reymont

unread,
Apr 17, 2009, 6:21:07 PM4/17/09
to Philippe Wang, caml...@inria.fr

On Apr 17, 2009, at 11:15 PM, Philippe Wang wrote:

> PS. We tried to switch to 3.11, but it seems to need too much time,
> it's far from being a piece of cake.
> We have tried to make it work on Leopard (actually, I failed the 1st
> time - half the way, I may try again if I have time).


What was the problem with Leopard?

---
Mac hacker with a performance bent
http://linkedin.com/in/joelreymont

Philippe Wang

unread,
Apr 17, 2009, 6:29:57 PM4/17/09
to Joel Reymont, caml...@inria.fr, Philippe Wang

On Apr 18, 2009, at 00:20 CEDT, Joel Reymont wrote:

>
> On Apr 17, 2009, at 11:15 PM, Philippe Wang wrote:
>
>> PS. We tried to switch to 3.11, but it seems to need too much
>> time, it's far from being a piece of cake.
>> We have tried to make it work on Leopard (actually, I failed the
>> 1st time - half the way, I may try again if I have time).
>
>
> What was the problem with Leopard?

We have our own amd64.S which is based on but quite different from
ocaml 3.10 amd64.S.
ocaml 3.11 has its amd64.S changed for Leopard.
*And* we have made some small changes to emit.ml.
Then, building the whole thing is not trivial.

Building ocaml 3.10 with our modified runtime is not trivial with
Linux x86_64 : we use an intern script that does it. Without it, it
would need hours to build it.
Adapting the script for Leopard is not trivial to do. But we think
it's quite possible.

Besides, it was late at night and I was sleepy and tired when I tried.

Another answer is Leopard needs the changes made to amd64.S from 3.10
to 3.11, which mainly fix an alignment problem.


--
Philippe Wang
Philip...@lip6.fr
http://www-apr.lip6.fr/~pwang/

fo...@x9c.fr

unread,
Apr 17, 2009, 7:05:55 PM4/17/09
to caml users

Le 18 avr. 09 à 00:15, Philippe Wang a écrit :


I was indeed mostly worried with the runtime itself. I wanted to have
a fully reentrant runtime for the OCaml-Java
project (to be able to execute several programs in the very same JVM)
and remember that it implied primitives
from "compare.c", "hash.c", "intern.c" and "extern.c" among others to
be written differently for this purpose.

Out of curiosity: you state that your GC is of "stop-the-world"
nature, what about finalizers ?
Are they executed by the GC thread when the world is stopped or
concurrently with
application threads ?
Not sure this question really matters, just curious (I mean, it is
doubtful that one would write finalizers with
a long execution time).


Keep up the good work !

Xavier Clerc

Philippe Wang

unread,
Apr 23, 2009, 12:49:18 PM4/23/09
to caml...@inria.fr, Philippe Wang
> On Sat, Apr 18, 2009 at 1:05 AM, fo...@x9c.fr <fo...@x9c.fr> wrote:
> I was indeed mostly worried with the runtime itself. I wanted to
> have a fully reentrant runtime for the OCaml-Java
> project (to be able to execute several programs in the very same
> JVM) and remember that it implied primitives
> from "compare.c", "hash.c", "intern.c" and "extern.c" among
> others to be written differently for this purpose.

Indeed. We have made them reentrant but we haven't made much stress
testing on that.
Reentrance on those are not free (they have a cost), and the way we
chose is the "simplest or quickest way".

> Out of curiosity: you state that your GC is of "stop-the-world"
> nature, what about finalizers ?
> Are they executed by the GC thread when the world is stopped or
> concurrently with
> application threads ?
> Not sure this question really matters, just curious (I mean, it
> is doubtful that one would write finalizers with
> a long execution time).

Finalizers are supposed to be called by the thread that does the
garbage collection, so there is no concurrency with finalizers as the
rest of the world is meant to be stopped when garbage collecting.
(Our garbage collector does not try to be as smart as the original one
on many many things)

By the way, we are late on writing the documentation for our future
release...
but we have just implemented a (simple) experimental growing heap.

Here is a quote from wikipedia (http://en.wikipedia.org/wiki/Speedup):
<<
Sometimes a speedup of more than N when using N processors is observed
in parallel computing, which is called super linear speedup. Super
linear speedup rarely happens and often confuses beginners, who
believe the theoretical maximum speedup should be N when N processors
are used.
>>
Well, we *have* observed that on a matrix multiplication :-)

_______________________________________________

Eric Merritt

unread,
Jun 2, 2009, 10:25:56 PM6/2/09
to
Just out of curiosity, was there an update here? Phillip mentioned
that a possible source release on the 20th of April. I can't find any
references to this out side of this thread though.

>   Philippe.W...@lip6.fr

0 new messages