On Mon, 2006-09-25 at 08:13 -0400, Jacques Carette wrote: > Over on Haskell-cafe, Simon Peyton-Jones says: > "GHC 6.6 (release candidate available) supports parallel execution on > SMP machines.
> Garbage collection is not parallelised yet, something we plan to fix > this autumn."
> A bit of competition is a good thing, isn't it?
Actually this isn't quite accurate, at least as I understand it. GHC already does have fully parallel minor heap allocation and compaction etc using a per-thread minor heap, but the major collection isn't only not parallel .. it isn't even concurrent.
[Using Cheng's terminology, parallel collection implies multiple collector threads can cooperate so that the system can scale to very large numbers of CPUs by allowing more than one CPU to run the collector in simultaneously, 'merely' concurrent includes collectors limited to a single collector thread]
-- John Skaller <skaller at users dot sf dot net> Felix, successor to C++: http://felix.sf.net
On Mon, Sep 25, 2006 at 08:13:50AM -0400, Jacques Carette wrote: > Over on Haskell-cafe, Simon Peyton-Jones says: > "GHC 6.6 (release candidate available) supports parallel execution on > SMP machines.
> Garbage collection is not parallelised yet, something we plan to fix > this autumn."
> A bit of competition is a good thing, isn't it?
Can someone explain how/if this is better than using MPI for parallelism? The reason I ask is that we are starting to use MPI for real on SMP machines to process our larger datasets.
On a related note, is/could there be a way in OCaml to tell the GC not to bother going "into" a particular data structure to check if it part of it has become unreachable? Our Weblogs library[1] loads a whole set of log files into memory (multiple gigabytes of memory), basically stored in a big linked list with lots of smaller structures hanging off each element, and I *know* that the hanging elements will never become unreachable unless the main linked list does so. I'd like to be able to tell the runtime that. Otherwise every time we do a major collection it's going to be uselessly iterating over the whole lot, and that appears to have particularly bad consequences as soon as we start to go into swap.
Richard Jones <r...@annexia.org> writes: > On Mon, Sep 25, 2006 at 08:13:50AM -0400, Jacques Carette wrote: >> Over on Haskell-cafe, Simon Peyton-Jones says: >> "GHC 6.6 (release candidate available) supports parallel execution on >> SMP machines.
>> Garbage collection is not parallelised yet, something we plan to fix >> this autumn."
>> A bit of competition is a good thing, isn't it?
> Can someone explain how/if this is better than using MPI for > parallelism? The reason I ask is that we are starting to use MPI for > real on SMP machines to process our larger datasets.
I guess that with a working shared-memory SMP support, you don't pay the communication cost you have with MPI.
Yoann Padioleau wrote: >Richard Jones <r...@annexia.org> writes:
>> On Mon, Sep 25, 2006 at 08:13:50AM -0400, Jacques Carette wrote: >>> Over on Haskell-cafe, Simon Peyton-Jones says: >>> "GHC 6.6 (release candidate available) supports parallel execution on >>> SMP machines.
>>> Garbage collection is not parallelised yet, something we plan to fix >>> this autumn."
>>> A bit of competition is a good thing, isn't it?
>> Can someone explain how/if this is better than using MPI for >> parallelism? The reason I ask is that we are starting to use MPI for >> real on SMP machines to process our larger datasets.
>I guess that with a working shared-memory SMP support, you >don't pay the communication cost you have with MPI.
Again, if you are using MPI, look at the Event module. It's a set of message passing concurrency primitives and operations. Reasoning about message passing concurrency (for correctness and such) is typically much simpler than shared memory concurrency, and it fits very cleanly into the functional paradigm. The only thing the Event module lacks (as compared to MPI) are a) collective channels/operations, b) inter-(heavyweight)-process comunnication, c) asynchronous communications.
Two of these (a and c) are almost trivial to write extensions while (b) is harder (but very doable). Note that (b) is only needed because the OCaml GC won't allow you to use multiple processors, so to take emulate this on an SMP system, a small library has to be written to communicate over sockets (UNIX or INET) and a few heavyweight processes can be forked.
Fortunately, it is possible to wrap the Event module with these extensions so that it is backwards compatible, which is very convenient. :).
> On a related note, is/could there be a way in OCaml to tell the GC not > to bother going "into" a particular data structure to check if it part > of it has become unreachable?
The GC doesn't work by checking whether something has become unreachable. It works by marking all reachable data, then deallocating everything else. There is no way to do what you want in OCaml, and I don't think it can be done without a major rework of the runtime system.
> Otherwise every time we do a major > collection it's going to be uselessly iterating over the whole lot, > and that appears to have particularly bad consequences as soon as we > start to go into swap.
The OCaml GC is not designed to work well when the heap becomes larger than physical memory. As soon as you start going to swap you'll see a huge performance hit, and there's not much we can do about it.
On 9/26/06, Damien Doligez <damien.doli...@inria.fr> wrote:
> The GC doesn't work by checking whether something has become > unreachable. It works by marking all reachable data, then deallocating > everything else. There is no way to do what you want in OCaml, and > I don't think it can be done without a major rework of the runtime > system.
But isn't it true that the GC doesn't follow pointers that point outside the OCaml-heap? In that case it might be conceivable to copy OCaml-data that must not be reclaimed into the C-heap. Of course, this would mean that pointers must not point back into the OCaml-heap from there, because there is no way the GC could know then that some value in the OCaml-heap is still in use, or how to update the pointer in the C-heap in case the OCaml-value gets moved around, e.g. during a compaction.
If the above really works, I'd be glad to know whether there is already functionality to copy OCaml-structures around. We have some applications that would greatly benefit from this feature, too: they require an enormous amount of static background knowledge, and have to use it for small jobs which can be easily distributed. We have multi-core, multi-processor machines, and would be able to greatly speed up our compute jobs if we could put these large OCaml-values into a shared-memory region. It would save us both a hell lot of memory (probably in the range of GBs per machine for some jobs), and also make the GC work less hard marking data that is not going to be reclaimed anyway.
On Tue, 26 Sep 2006, "Markus Mottl" <markus.mo...@gmail.com> wrote:
> [...] We have some applications that would greatly benefit from this > feature, too: they require an enormous amount of static background > knowledge, and have to use it for small jobs which can be easily > distributed. We have multi-core, multi-processor machines, and > would be able to greatly speed up our compute jobs if we could put > these large OCaml-values into a shared-memory region. It would save > us both a hell lot of memory (probably in the range of GBs per > machine for some jobs), and also make the GC work less hard marking > data that is not going to be reclaimed anyway.
Am Dienstag, den 26.09.2006, 10:37 -0400 schrieb Markus Mottl:
> On 9/26/06, Damien Doligez <damien.doli...@inria.fr> wrote: > > The GC doesn't work by checking whether something has become > > unreachable. It works by marking all reachable data, then deallocating > > everything else. There is no way to do what you want in OCaml, and > > I don't think it can be done without a major rework of the runtime > > system.
> But isn't it true that the GC doesn't follow pointers that point > outside the OCaml-heap? In that case it might be conceivable to copy > OCaml-data that must not be reclaimed into the C-heap. Of course, > this would mean that pointers must not point back into the OCaml-heap > from there, because there is no way the GC could know then that some > value in the OCaml-heap is still in use, or how to update the pointer > in the C-heap in case the OCaml-value gets moved around, e.g. during a > compaction.
That sounds perfectly compatible with the current GC approach. All what you need to do is to add a third, "very old" generation.
On Tue, Sep 26, 2006 at 10:37:27AM -0400, Markus Mottl wrote: > On 9/26/06, Damien Doligez <damien.doli...@inria.fr> wrote: > >The GC doesn't work by checking whether something has become > >unreachable. It works by marking all reachable data, then deallocating > >everything else. There is no way to do what you want in OCaml, and > >I don't think it can be done without a major rework of the runtime > >system.
> But isn't it true that the GC doesn't follow pointers that point > outside the OCaml-heap? In that case it might be conceivable to copy > OCaml-data that must not be reclaimed into the C-heap. Of course, > this would mean that pointers must not point back into the OCaml-heap > from there, because there is no way the GC could know then that some > value in the OCaml-heap is still in use, or how to update the pointer > in the C-heap in case the OCaml-value gets moved around, e.g. during a > compaction.
> If the above really works, I'd be glad to know whether there is > already functionality to copy OCaml-structures around. We have some > applications that would greatly benefit from this feature, too: they > require an enormous amount of static background knowledge, and have to > use it for small jobs which can be easily distributed. We have > multi-core, multi-processor machines, and would be able to greatly > speed up our compute jobs if we could put these large OCaml-values > into a shared-memory region. It would save us both a hell lot of > memory (probably in the range of GBs per machine for some jobs), and > also make the GC work less hard marking data that is not going to be > reclaimed anyway.
Really similar case to us!
In theory you'd have a "MARK_ANCIENT (obj)" operation. It would copy obj and everything pointed to by obj out of the OCaml heap into malloc'd memory. It would need to find everything pointing to obj and update those pointers to the out-of-heap copy, or else have a proxy in place of obj. All done in C so that the garbage collector wouldn't be running while this happened, and in Markus's case you'd want to move out to some sort of shared memory, perhaps using something like mmalloc[1].
The problem with all this is that such an "ancient" object had really better not be changed. If it was changed such that part of it pointed to a new object allocated on the Caml heap, then that new object could never be reached by the GC, and so would quickly get collected, resulting in a dangling pointer. You'd have to ensure that the ancient object was never changed by careful programming ...
> But isn't it true that the GC doesn't follow pointers that point > outside the OCaml-heap? In that case it might be conceivable to copy > OCaml-data that must not be reclaimed into the C-heap.
Yes, this is possible. For instance, ocamlopt places structured constants (e.g. constant lists) in the initialized data section of the executable, outside the Caml heap.
> Of course, > this would mean that pointers must not point back into the OCaml-heap > from there, because there is no way the GC could know then that some > value in the OCaml-heap is still in use, or how to update the pointer > in the C-heap in case the OCaml-value gets moved around, e.g. during a > compaction.
.. unless you register the locations of back-pointers as global roots. But be careful that global roots are scanned at every GC (minor as well as major), therefore a large number of such roots slow down the GC significantly.
> If the above really works,
There is one caveat: ad-hoc polymorphic primitives (structural equality and comparisons, marshaling, hashing) will not work on data structures that reside outside of the Caml heap. The reason is that these primitives treat out-of-heap pointers as opaque data. There is a special case (the "Is_atom" test) for pointers that correspond to ocamlopt-generated static data, but extending this special case is nonobvious.
> I'd be glad to know whether there is > already functionality to copy OCaml-structures around.
Apparently, Richard Jones is already working on it... Basically, it's just like a copying collection with only one root. You could draw inspiration from the OCaml minor GC (Cheney-style breadth-first copying) and from the marshaller (depth-first quasi-copying).
Yoann Padioleau <Yoann.Padiol...@emn.fr> writes: > Richard Jones <r...@annexia.org> writes:
>> On Mon, Sep 25, 2006 at 08:13:50AM -0400, Jacques Carette wrote: >>> Over on Haskell-cafe, Simon Peyton-Jones says: >>> "GHC 6.6 (release candidate available) supports parallel execution on >>> SMP machines.
>>> Garbage collection is not parallelised yet, something we plan to fix >>> this autumn."
>>> A bit of competition is a good thing, isn't it?
>> Can someone explain how/if this is better than using MPI for >> parallelism? The reason I ask is that we are starting to use MPI for >> real on SMP machines to process our larger datasets.
> I guess that with a working shared-memory SMP support, you > don't pay the communication cost you have with MPI.
Unless, of course, your MPI implementation uses shared memory on SMP systems.
-- |>|\/|< /-------------------------------------------------------------------------- \ |David M. Cooke |cookedm(at)physics(dot)mcmaster(dot)ca