Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.

Dismiss

DOD, mutation, and generational collectors

39 views

Skip to first unread message

Dan Sugalski

unread,

Jan 29, 2004, 8:41:06 AM1/29/04

to perl6-i...@perl.org

So, while I'm heaping much grumpiness on threads (though I suppose,
as I've been out of touch for a bit maybe you've all solved the
problem. That'd be nice) I've also been thinking about DOD, as
there's a fair amount of overlap in the things that cause problems
for threads and the ones that cause problems for generational garbage
collectors.

For a generational DOD to work we have to have a way to note which
generation a structure's in, as well as trap writes so we know when
you're referencing an old generation from a new, and vice versa.
(Threads have to trap mutations of this stuff, as this is in large
part the bits that need synchronizing on mutation) A short list of
mutating activities that are of issue are:

*) Setting the PMC data pointer
*) Resetting the PMC data pointer
*) Setting the PMC cache pointer
*) Resetting the PMC cache pointer
*) Putting a PMC or buffer pointer into a buffer
*) Altering a buffer's metadata (size or location)

We don't care about most buffer data, just those buffers whose data
are pointers to PMCs or other buffers. (Which argues that strings and
buffers that contain raw data should be allocated from different
pools than buffers that contain pointers to DOD-able data)

I think, then, that we'll want to intercept all these activities,
which means that we need to have API facilities for them. Before we
do that, though, I want to make sure there's nothing I missed from
that list.
--
Dan

--------------------------------------"it's like this"-------------------
Dan Sugalski even samurai
d...@sidhe.org have teddy bears and even
teddy bears get drunk

Gordon Henriksen

unread,

Jan 30, 2004, 10:51:10 PM1/30/04

to Dan Sugalski, perl6-i...@perl.org

On Thursday, January 29, 2004, at 08:41 , Dan Sugalski wrote:

> So, while I'm heaping much grumpiness on threads (though I suppose, as
> I've been out of touch for a bit maybe you've all solved the problem.
> That'd be nice) I've also been thinking about DOD, as there's a fair
> amount of overlap in the things that cause problems for threads and the
> ones that cause problems for generational garbage collectors.

Can you elaborate a bit on your concept of generational collection as
they apply to parrot? To my ear, generational collection implies all of
this (which is a lot, and a lot of it grossly inapplicable to parrot as
the runtime now stands):

GENERATIONAL GARBAGE COLLECTORS

Rationale: Most objects are short-lived, and those which are not are
less likely to become garbage. Marking or copying the older is
statistically wasted effort. A garbage collector which usually traces
only new objects will thus tend to perform better than one which traces
all of them.

Implementation: Most generational collectors are copying collectors.
They are generally mutations upon a hemispace collector. The variation
is that collection always proceeds from a small, fixed-sized space to a
larger, growable space, rather than swapping back and forth between two
symmetric spaces. These asymmetric spaces are the so-called generations,
and the small one containing the younger objects is dubbed the nursery.
Moving an object from the nursery is called tenuring. Thus, the
"generation" of an object is not so much a property of an object as it
is a property of the pool from which the object is presently allocated.

Some generational collectors have intermediate generations, but most do
not. (Parrot might benefit from a "younger-than-the-nursery" generation
for impatient destructors, but then that property would need to be
ascertainable at construction time.)

The nursery is typically explicitly un-fragmented. All live objects in
it are tenured from it when it fills. Thus, allocation can be as simple
as an atomic pointer increment and filling in an object header. The
tenured generation might be a heap.

To avoid tracing from the roots, the generational collector requires a
cooperative mutator to maintain a table of references from tenured
objects to objects within the nursery. This is often forced upon the
mutator through the use of VM write barriers. The write barrier function
marks the VM page as dirty and then removes write protection for that
page. When invoked, the garbage collector will examine all dirtied
pages, updating the table of references.

A truly cooperative mutator will update a table of dirty pages every
time it changes a pointer variable. Usually, the pages are smaller than
VM pages, and thus called "cards" instead. This operation can be as
simple as card_table[((int) &ptr) >> n] = 1; ptr = obj;.[1] An expensive
proposition, at least doubling the memory bandwidth overhead of pointer
manipulation. But, since cards are smaller than pages, it can reduce the
amount of memory which has to be scanned at collection time.

Despite being incremental by nature, a generational collector is not a
concurrent collector. That is, it still "stops the world," it just
doesn't do it for quite a long as a traditional hemispace or mark-sweep
collector might.

Being a copying collector in the traditional mold has its drawbacks. It
requires accurate reference tracing, in particular. A traditional
copying collector cannot work with a conservative collector. For
instance, it cannot operate on the OS stack (unless the program is
written very cooperatively), since the collector cannot ascertain for
certain whether a value is a pointer (which should be traced and
possibly updated) or data (which should be left alone).

But parrot has anchored objects and other sorts of things which are
incompatible with this. In particular, the fast-path nursery allocator
(the very fast allocation being a major benefit of generational
collectors) can't work: If an object were anchored in the nursery, it
must stay there after GC rather than being tenured; this leaves the
nursery fragmented and complicates the very hot allocation code path.
Then there's the whole "PMCs don't move" guarantee. And there's also the
conservative tracing of system areas.

—

Gordon Henriksen
mali...@mac.com

[1] To be that simple, all allocated memory must be contiguous. So it's
not usually that simple. More likely, this:

(((alloc_header*) obj) - 1)->pool.card_table[((int) obj) >> n];
obj->ptr = newvalue;

Dan Sugalski

unread,

Feb 2, 2004, 10:43:47 AM2/2/04

to Gordon Henriksen, perl6-i...@perl.org

At 10:51 PM -0500 1/30/04, Gordon Henriksen wrote:
>On Thursday, January 29, 2004, at 08:41 , Dan Sugalski wrote:
>
>>So, while I'm heaping much grumpiness on threads (though I suppose,
>>as I've been out of touch for a bit maybe you've all solved the
>>problem. That'd be nice) I've also been thinking about DOD, as
>>there's a fair amount of overlap in the things that cause problems
>>for threads and the ones that cause problems for generational
>>garbage collectors.
>
>Can you elaborate a bit on your concept of generational collection
>as they apply to parrot? To my ear, generational collection implies
>all of this (which is a lot, and a lot of it grossly inapplicable to
>parrot as the runtime now stands):

[Description snippage]

From my reading of the literature the generational collectors don't
have to be copying collectors -- the only thing that's ultimately
required is an identification of which generation an object is living
in. Granted, with an absolute minimum support in the data (just a
generation count) it makes collection pretty slow, but that's all you
*need*.

Having all the objects of a generation in a single pool does make
life significantly easier, to be sure, but you don't need to have the
objects *themselves* in a single pool -- just have a quick way to
identify all the objects in a generation. We could do that with a
separate generation pool if we wanted, at the cost of a pointer per
PMC plus overhead for the pool segmentation, much the way that we
have the freelist for PMCs.

Might ultimately be untenable, of course. On the other hand, the code
infrastructure and APIs needed for a generational collector are the
same as the ones for incremental and concurrent collectors, so we may
well just jump to an incremental collector instead if it turns out to
be more worthwhile or less hassle.

Leopold Toetsch

unread,

Feb 2, 2004, 11:41:45 AM2/2/04

to Dan Sugalski, perl6-i...@perl.org

Dan Sugalski <d...@sidhe.org> wrote:

> Having all the objects of a generation in a single pool does make
> life significantly easier, to be sure, but you don't need to have the
> objects *themselves* in a single pool -- just have a quick way to
> identify all the objects in a generation. We could do that with a
> separate generation pool if we wanted, at the cost of a pointer per
> PMC plus overhead for the pool segmentation, much the way that we
> have the freelist for PMCs.

The freelist is located inside of unused PMCs, it doesn't take up extra
memory. An additional generation pointer needs additional memory. For
changing the generations a double linked list could be necessary.

> Might ultimately be untenable, of course. On the other hand, the code
> infrastructure and APIs needed for a generational collector are the
> same as the ones for incremental and concurrent collectors, so we may
> well just jump to an incremental collector instead if it turns out to
> be more worthwhile or less hassle.

Incremental collectors are generally slower, but can provide RT
capabilities. The stopping-the-world time is bounded. Generational GCs
have a signifant overhead too, but depending on workload, this can be
outweighed by not scanning old generation pools each time.

We can achieve a similar effect without much overhead, by remembering a
previous-live-objects-count per arena and skip "almost all live arenas"
each second time or so.

But anyway, we don't need[1] different schemes for PMCs: The copying
collection of variable sized Buffer memory is the problem for
multi-threading.

We can configure with --gc=libc or --gc=malloc but we have to be sure,
that either the system malloc is fast enough or that the supplied
malloc.c is portable and thread-safe. The latter isn't the case for the
LEA implementation.

[1] experiments, patches, benchmarks are always welcome of course

leo

Dan Sugalski

unread,

Feb 2, 2004, 12:06:43 PM2/2/04

to l...@toetsch.at, perl6-i...@perl.org

At 5:41 PM +0100 2/2/04, Leopold Toetsch wrote:
>Dan Sugalski <d...@sidhe.org> wrote:
>
>> Having all the objects of a generation in a single pool does make
>> life significantly easier, to be sure, but you don't need to have the
>> objects *themselves* in a single pool -- just have a quick way to
>> identify all the objects in a generation. We could do that with a
>> separate generation pool if we wanted, at the cost of a pointer per
>> PMC plus overhead for the pool segmentation, much the way that we
>> have the freelist for PMCs.
>
>The freelist is located inside of unused PMCs, it doesn't take up extra
>memory. An additional generation pointer needs additional memory. For
>changing the generations a double linked list could be necessary.

The freelist used to be an array that held free PMC pointers. I
presume this changed?

> > Might ultimately be untenable, of course. On the other hand, the code
>> infrastructure and APIs needed for a generational collector are the
>> same as the ones for incremental and concurrent collectors, so we may
>> well just jump to an incremental collector instead if it turns out to
>> be more worthwhile or less hassle.
>
>Incremental collectors are generally slower, but can provide RT
>capabilities. The stopping-the-world time is bounded. Generational GCs
>have a signifant overhead too, but depending on workload, this can be
>outweighed by not scanning old generation pools each time.

The not scanning part is where generational GC systems get their
wins. Properly done (depending on workload) a generational system
will have less gc overhead because it does less work.

>We can achieve a similar effect without much overhead, by remembering a
>previous-live-objects-count per arena and skip "almost all live arenas"
>each second time or so.

Well... maybe. I'm not sure we can, since those arenas need to be
looked at for pointers to other arenas. If you work around that then
you *have* a generational collector.

>But anyway, we don't need[1] different schemes for PMCs: The copying
>collection of variable sized Buffer memory is the problem for
>multi-threading.

No, it isn't. For several reasons, not the least of it is the fact
that we're going to have to shift to a non-copying collector in the
threaded case. The bigger problem is tracing the live set in the face
of multiple simultaneous mutators, which the incremental gc systems
handle much more pleasantly and with less overall impact on the
system.

Leopold Toetsch

unread,

Feb 2, 2004, 1:54:07 PM2/2/04

to Dan Sugalski, perl6-i...@perl.org

Dan Sugalski <d...@sidhe.org> wrote:

> The freelist used to be an array that held free PMC pointers. I
> presume this changed?

Must have been changed before my days then. The freelist is a linked
list of pointers:

get_free_object ...

ptr = pool->free_list;
pool->free_list = *(void **)ptr;

A generation list either uses extra PMC memory or it is an array of
pointers in the objects arena. An array of pointers would need
allocation and resizing if generation of an object changes.

>>But anyway, we don't need[1] different schemes for PMCs: The copying
>>collection of variable sized Buffer memory is the problem for
>>multi-threading.

> No, it isn't. For several reasons, not the least of it is the fact
> that we're going to have to shift to a non-copying collector in the
> threaded case.

Ok. But then lets start here. This *is* the current issue.

> ... The bigger problem is tracing the live set in the face

> of multiple simultaneous mutators, which the incremental gc systems
> handle much more pleasantly and with less overall impact on the
> system.

I doubt that it has less overall impact. Anyway: an incremental GC isn't
the only solution. It could be ok, if all PMCs are shared. But this
isn't true. All temporary PMCs are *not* shared logically. Considering
all temps of all threads as shared (and managing all PMCs of all threads
in one PMC pool is for sure suboptimal). Freelist handling could have
high contention with this case.

And incremental GC must always run faster then allocating new PMCs. This
can't be achieved, if all threads use the one and only pool for its
PMCs.

Incremental GC has a lot of overhead by tracking pointer updates. When
we have the temps automatically shared, we add locking and this cost to
each PMC. This scheme can't win (or will need a ~16 CPU machine to
achieve something).

So it could work like this:
- temps aren't shared - they live in their interpreters private pools
- shared PMCs live in exactly one pool (of the "main" interpreter)
- any thread can DOD its own PMCs - shared PMCs are marked live by
that process, but nether freed, because that interpreter doesn't own
the pool. And they aren't append to the next_for_GC pointer.

- shared PMC DOD needs:
- suspend all interpreters, that might have a shared PMC from this
pool
- the main interpreter marks the root set
- all other interpreters run the mark phase, one by one, and append
live shared PMCs to the next_for_GC pointer (anchored at the main
interpreter)
- the main interpreter continues/finishes marking phase 2
- all interpreters can now continue to free_objects (if their
statistics state, that they are low on free objects)
- and in parallel the main interpreter can free shared objects.

If someone implements a truely paralled incremental GC later, and *if*
that's faster, we can switch to it. Above scheme works (almost:) now.

leo

0 new messages