[Caml-list] Regarding SMP computing

Jacques Carette

unread,

Sep 25, 2006, 8:21:54 AM9/25/06

to caml...@inria.fr

Over on Haskell-cafe, Simon Peyton-Jones says:
"GHC 6.6 (release candidate available) supports parallel execution on
SMP machines.

Garbage collection is not parallelised yet, something we plan to fix
this autumn."

A bit of competition is a good thing, isn't it?

Jacques

_______________________________________________
Caml-list mailing list. Subscription management:
http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
Archives: http://caml.inria.fr
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs

skaller

unread,

Sep 25, 2006, 9:46:38 AM9/25/06

to Jacques Carette

On Mon, 2006-09-25 at 08:13 -0400, Jacques Carette wrote:
> Over on Haskell-cafe, Simon Peyton-Jones says:
> "GHC 6.6 (release candidate available) supports parallel execution on
> SMP machines.
>
> Garbage collection is not parallelised yet, something we plan to fix
> this autumn."
>
> A bit of competition is a good thing, isn't it?

Actually this isn't quite accurate, at least as I understand it.
GHC already does have fully parallel minor heap allocation
and compaction etc using a per-thread minor heap,
but the major collection isn't only not parallel ..
it isn't even concurrent.

[Using Cheng's terminology, parallel collection implies
multiple collector threads can cooperate so that the system
can scale to very large numbers of CPUs by allowing more
than one CPU to run the collector in simultaneously,
'merely' concurrent includes collectors limited to a
single collector thread]

--
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net

Richard Jones

unread,

Sep 25, 2006, 3:46:08 PM9/25/06

to Jacques Carette

On Mon, Sep 25, 2006 at 08:13:50AM -0400, Jacques Carette wrote:
> Over on Haskell-cafe, Simon Peyton-Jones says:
> "GHC 6.6 (release candidate available) supports parallel execution on
> SMP machines.
>
> Garbage collection is not parallelised yet, something we plan to fix
> this autumn."
>
> A bit of competition is a good thing, isn't it?

Can someone explain how/if this is better than using MPI for
parallelism? The reason I ask is that we are starting to use MPI for
real on SMP machines to process our larger datasets.

On a related note, is/could there be a way in OCaml to tell the GC not
to bother going "into" a particular data structure to check if it part
of it has become unreachable? Our Weblogs library[1] loads a whole
set of log files into memory (multiple gigabytes of memory), basically
stored in a big linked list with lots of smaller structures hanging
off each element, and I *know* that the hanging elements will never
become unreachable unless the main linked list does so. I'd like to
be able to tell the runtime that. Otherwise every time we do a major
collection it's going to be uselessly iterating over the whole lot,
and that appears to have particularly bad consequences as soon as we
start to go into swap.

Rich.

[1] http://merjis.com/developers/weblogs

--
Richard Jones, CTO Merjis Ltd.
Merjis - web marketing and technology - http://merjis.com
Internet Marketing and AdWords courses - http://merjis.com/courses - NEW!
Merjis blog - http://blog.merjis.com - NEW!

Yoann Padioleau

unread,

Sep 25, 2006, 4:00:17 PM9/25/06

to Richard Jones

Richard Jones <ri...@annexia.org> writes:

> On Mon, Sep 25, 2006 at 08:13:50AM -0400, Jacques Carette wrote:
>> Over on Haskell-cafe, Simon Peyton-Jones says:
>> "GHC 6.6 (release candidate available) supports parallel execution on
>> SMP machines.
>>
>> Garbage collection is not parallelised yet, something we plan to fix
>> this autumn."
>>
>> A bit of competition is a good thing, isn't it?
>
> Can someone explain how/if this is better than using MPI for
> parallelism? The reason I ask is that we are starting to use MPI for
> real on SMP machines to process our larger datasets.

I guess that with a working shared-memory SMP support, you
don't pay the communication cost you have with MPI.

Jonathan T Bryant

unread,

Sep 25, 2006, 9:44:06 PM9/25/06

to caml...@yquem.inria.fr

Yoann Padioleau wrote:

>Richard Jones <ri...@annexia.org> writes:
>
>> On Mon, Sep 25, 2006 at 08:13:50AM -0400, Jacques Carette wrote:
>>> Over on Haskell-cafe, Simon Peyton-Jones says:
>>> "GHC 6.6 (release candidate available) supports parallel execution on
>>> SMP machines.
>>>
>>> Garbage collection is not parallelised yet, something we plan to fix
>>> this autumn."
>>>
>>> A bit of competition is a good thing, isn't it?
>>
>> Can someone explain how/if this is better than using MPI for
>> parallelism? The reason I ask is that we are starting to use MPI for
>> real on SMP machines to process our larger datasets.
>
>I guess that with a working shared-memory SMP support, you
>don't pay the communication cost you have with MPI.
>

Again, if you are using MPI, look at the Event module. It's a set of
message passing concurrency primitives and operations. Reasoning about
message passing concurrency (for correctness and such) is typically
much simpler than shared memory concurrency, and it fits very cleanly
into the functional paradigm. The only thing the Event module lacks
(as compared to MPI) are a) collective channels/operations, b)
inter-(heavyweight)-process comunnication, c) asynchronous
communications.

Two of these (a and c) are almost trivial to write extensions while (b)
is harder (but very doable). Note that (b) is only needed because the
OCaml GC won't allow you to use multiple processors, so to take emulate
this on an SMP system, a small library has to be written to communicate
over sockets (UNIX or INET) and a few heavyweight processes can be
forked.

Fortunately, it is possible to wrap the Event module with these
extensions so that it is backwards compatible, which is very
convenient. :).

--Jonathan Bryant

Damien Doligez

unread,

Sep 26, 2006, 8:03:39 AM9/26/06

to caml users

On 2006-09-25, at 21:41, Richard Jones wrote:

> On a related note, is/could there be a way in OCaml to tell the GC not
> to bother going "into" a particular data structure to check if it part
> of it has become unreachable?

The GC doesn't work by checking whether something has become
unreachable. It works by marking all reachable data, then deallocating
everything else. There is no way to do what you want in OCaml, and
I don't think it can be done without a major rework of the runtime
system.

> Otherwise every time we do a major
> collection it's going to be uselessly iterating over the whole lot,
> and that appears to have particularly bad consequences as soon as we
> start to go into swap.

The OCaml GC is not designed to work well when the heap becomes larger
than physical memory. As soon as you start going to swap you'll see
a huge performance hit, and there's not much we can do about it.

-- Damien

Markus Mottl

unread,

Sep 26, 2006, 10:40:42 AM9/26/06

to Damien Doligez

On 9/26/06, Damien Doligez <damien....@inria.fr> wrote:
> The GC doesn't work by checking whether something has become
> unreachable. It works by marking all reachable data, then deallocating
> everything else. There is no way to do what you want in OCaml, and
> I don't think it can be done without a major rework of the runtime
> system.

But isn't it true that the GC doesn't follow pointers that point
outside the OCaml-heap? In that case it might be conceivable to copy
OCaml-data that must not be reclaimed into the C-heap. Of course,
this would mean that pointers must not point back into the OCaml-heap
from there, because there is no way the GC could know then that some
value in the OCaml-heap is still in use, or how to update the pointer
in the C-heap in case the OCaml-value gets moved around, e.g. during a
compaction.

If the above really works, I'd be glad to know whether there is
already functionality to copy OCaml-structures around. We have some
applications that would greatly benefit from this feature, too: they
require an enormous amount of static background knowledge, and have to
use it for small jobs which can be easily distributed. We have
multi-core, multi-processor machines, and would be able to greatly
speed up our compute jobs if we could put these large OCaml-values
into a shared-memory region. It would save us both a hell lot of
memory (probably in the range of GBs per machine for some jobs), and
also make the GC work less hard marking data that is not going to be
reclaimed anyway.

Regards,
Markus

--
Markus Mottl http://www.ocaml.info markus...@gmail.com

Christophe TROESTLER

unread,

Sep 26, 2006, 10:55:45 AM9/26/06

to caml...@inria.fr

On Tue, 26 Sep 2006, "Markus Mottl" <markus...@gmail.com> wrote:
>
> [...] We have some applications that would greatly benefit from this

> feature, too: they require an enormous amount of static background
> knowledge, and have to use it for small jobs which can be easily
> distributed. We have multi-core, multi-processor machines, and
> would be able to greatly speed up our compute jobs if we could put
> these large OCaml-values into a shared-memory region. It would save
> us both a hell lot of memory (probably in the range of GBs per
> machine for some jobs), and also make the GC work less hard marking
> data that is not going to be reclaimed anyway.

Just on the top of my head, how about to use memcached
<http://www.danga.com/memcached/> for this (with Marshal'ing) ?

Just a thought,
ChriS

Gerd Stolpmann

unread,

Sep 26, 2006, 11:04:35 AM9/26/06

to Markus Mottl

Am Dienstag, den 26.09.2006, 10:37 -0400 schrieb Markus Mottl:
> On 9/26/06, Damien Doligez <damien....@inria.fr> wrote:
> > The GC doesn't work by checking whether something has become
> > unreachable. It works by marking all reachable data, then deallocating
> > everything else. There is no way to do what you want in OCaml, and
> > I don't think it can be done without a major rework of the runtime
> > system.
>
> But isn't it true that the GC doesn't follow pointers that point
> outside the OCaml-heap? In that case it might be conceivable to copy
> OCaml-data that must not be reclaimed into the C-heap. Of course,
> this would mean that pointers must not point back into the OCaml-heap
> from there, because there is no way the GC could know then that some
> value in the OCaml-heap is still in use, or how to update the pointer
> in the C-heap in case the OCaml-value gets moved around, e.g. during a
> compaction.

That sounds perfectly compatible with the current GC approach. All what
you need to do is to add a third, "very old" generation.

Gerd
--
------------------------------------------------------------
Gerd Stolpmann * Viktoriastr. 45 * 64293 Darmstadt * Germany
ge...@gerd-stolpmann.de http://www.gerd-stolpmann.de
Phone: +49-6151-153855 Fax: +49-6151-997714
------------------------------------------------------------

Richard Jones

unread,

Sep 26, 2006, 3:00:04 PM9/26/06

to Markus Mottl

Really similar case to us!

In theory you'd have a "MARK_ANCIENT (obj)" operation. It would copy
obj and everything pointed to by obj out of the OCaml heap into
malloc'd memory. It would need to find everything pointing to obj and
update those pointers to the out-of-heap copy, or else have a proxy in
place of obj. All done in C so that the garbage collector wouldn't be
running while this happened, and in Markus's case you'd want to move
out to some sort of shared memory, perhaps using something like
mmalloc[1].

The problem with all this is that such an "ancient" object had really
better not be changed. If it was changed such that part of it pointed
to a new object allocated on the Caml heap, then that new object could
never be reached by the GC, and so would quickly get collected,
resulting in a dangling pointer. You'd have to ensure that the
ancient object was never changed by careful programming ...

Rich.

[1] http://www.math.utah.edu/docs/info/mmalloc_1.html

--
Richard Jones, CTO Merjis Ltd.
Merjis - web marketing and technology - http://merjis.com
Internet Marketing and AdWords courses - http://merjis.com/courses - NEW!
Merjis blog - http://blog.merjis.com - NEW!

_______________________________________________

Richard Jones

unread,

Sep 27, 2006, 8:18:19 AM9/27/06

to Markus Mottl

Here's a very lightly tested version of this idea for review:

http://homes.merjis.com/~rich/ancient-0.0.1.tar.gz

If you just want to look at the interface or the code, see here:

http://homes.merjis.com/~rich/ancient.mli
http://homes.merjis.com/~rich/ancient_c.c.txt

It works for a very simple case; I haven't tried it for anything
significant yet.

Rich.

Xavier Leroy

unread,

Sep 27, 2006, 1:39:59 PM9/27/06

to Markus Mottl

> But isn't it true that the GC doesn't follow pointers that point
> outside the OCaml-heap? In that case it might be conceivable to copy
> OCaml-data that must not be reclaimed into the C-heap.

Yes, this is possible. For instance, ocamlopt places structured
constants (e.g. constant lists) in the initialized data section
of the executable, outside the Caml heap.

> Of course,
> this would mean that pointers must not point back into the OCaml-heap
> from there, because there is no way the GC could know then that some
> value in the OCaml-heap is still in use, or how to update the pointer
> in the C-heap in case the OCaml-value gets moved around, e.g. during a
> compaction.

.. unless you register the locations of back-pointers as global roots.
But be careful that global roots are scanned at every GC (minor as
well as major), therefore a large number of such roots slow down the
GC significantly.

> If the above really works,

There is one caveat: ad-hoc polymorphic primitives (structural
equality and comparisons, marshaling, hashing) will not work on data
structures that reside outside of the Caml heap. The reason is that
these primitives treat out-of-heap pointers as opaque data. There is
a special case (the "Is_atom" test) for pointers that correspond to
ocamlopt-generated static data, but extending this special case is
nonobvious.

> I'd be glad to know whether there is
> already functionality to copy OCaml-structures around.

Apparently, Richard Jones is already working on it... Basically, it's
just like a copying collection with only one root. You could draw
inspiration from the OCaml minor GC (Cheney-style breadth-first copying)
and from the marshaller (depth-first quasi-copying).

- Xavier Leroy

David M. Cooke

unread,

Sep 27, 2006, 5:13:39 PM9/27/06

to caml...@inria.fr

Yoann Padioleau <Yoann.P...@emn.fr> writes:

> Richard Jones <ri...@annexia.org> writes:
>
>> On Mon, Sep 25, 2006 at 08:13:50AM -0400, Jacques Carette wrote:
>>> Over on Haskell-cafe, Simon Peyton-Jones says:
>>> "GHC 6.6 (release candidate available) supports parallel execution on
>>> SMP machines.
>>>
>>> Garbage collection is not parallelised yet, something we plan to fix
>>> this autumn."
>>>
>>> A bit of competition is a good thing, isn't it?
>>
>> Can someone explain how/if this is better than using MPI for
>> parallelism? The reason I ask is that we are starting to use MPI for
>> real on SMP machines to process our larger datasets.
>
> I guess that with a working shared-memory SMP support, you
> don't pay the communication cost you have with MPI.

Unless, of course, your MPI implementation uses shared memory on SMP systems.

--
|>|\/|<
/--------------------------------------------------------------------------\
|David M. Cooke
|cookedm(at)physics(dot)mcmaster(dot)ca