Unifying PMCs and Buffers for GC

Dan Sugalski

Aug 1, 2002, 3:03:07 AM

to perl6-i...@perl.org

Okay, I finally give. For purposes of liveness tracing and GC, we're
going to unify PMCs and strings/buffers. This means we trace through
strings and buffers if the flags are right, and we need to add a GC
link pointer to strings/buffers. It'll make things a bit larger,
which I don't like, but it lifts some restrictions I see looming,
which I do like.

Anyone care to take a shot at this?
--
Dan

--------------------------------------"it's like this"-------------------
Dan Sugalski                          even samurai
d...@sidhe.org                        have teddy bears and even
                                      teddy bears get drunk

Mike Lambert

Aug 4, 2002, 5:44:47 AM

to Dan Sugalski, perl6-i...@perl.org

> Okay, I finally give. For purposes of liveness tracing and GC, we're
> going to unify PMCs and strings/buffers. This means we trace through
> strings and buffers if the flags are right, and we need to add a GC
> link pointer to strings/buffers. It'll make things a bit larger,
> which I don't like, but it lifts some restrictions I see looming,
> which I do like.
>
> Anyone care to take a shot at this?

I've started on this task, although it seems to be rather involved. :)

What follows is basically a brain dump on my current ideas that I'm
tossing around in an attempt to resolve the unification issues while
retaining the current speed.

a) The current hash implementation works against the GC, not with it.
Since we currently need a PerlHash PMC surrounding every buffer, hashes are
directly affected by such a unification, and it would be good to allow for
them, and other such data structures.

I'm currently favoring allowing for header pools on a per-type basis, not
just a per-size basis. This would give us a 'hash' pool. The pool
structure would contain function pointers for collection and/or dod
purposes. (stuff that would otherwise be in a PMC vtable.)

Since collection phases are done on a per-header-pool basis already, it
wouldn't be difficult to make per-pool collection functions that are
responsible for iterating over their elements and handling them.

This would help speed up hashes, and make them easier to implement, since
they could update their internal pointers on hash relocation, while it's
all still in the cache.
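
A minimal sketch of the per-type pool idea, under stated assumptions: the
names Header, HeaderPool, collect_plain and collect_all are hypothetical
and are not Parrot's actual structures. It only shows the collector core
dispatching through a per-pool function pointer, so that a 'hash' pool
could supply its own collection routine.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; not Parrot's real structures. */
typedef struct Header {
    void  *bufstart;   /* start of the memory this header owns */
    size_t buflen;     /* bytes allocated at bufstart */
    int    live;       /* set by the DOD pass when reachable */
} Header;

typedef struct HeaderPool HeaderPool;
struct HeaderPool {
    const char *name;      /* e.g. "hash" */
    Header     *headers;   /* headers owned by this pool */
    size_t      count;     /* number of headers */
    /* Per-pool collection hook: walks the pool's own headers. */
    size_t    (*collect)(HeaderPool *pool);
};

/* Default hook: report the bytes still held by live headers. */
static size_t collect_plain(HeaderPool *pool)
{
    size_t i, bytes = 0;
    for (i = 0; i < pool->count; i++) {
        if (pool->headers[i].live)
            bytes += pool->headers[i].buflen;
    }
    return bytes;
}

/* The collector core only dispatches; it knows nothing about hashes. */
static size_t collect_all(HeaderPool **pools, size_t npools)
{
    size_t i, bytes = 0;
    for (i = 0; i < npools; i++) {
        assert(pools[i]->collect != NULL);
        bytes += pools[i]->collect(pools[i]);
    }
    return bytes;
}
```

A hash pool would install its own hook in place of collect_plain and fix
up its internal pointers while walking its headers.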

However, dod functions are a bit harder to handle. mark_used currently
calls pmc->vtable->mark to handle its behavior, and buffers don't do
anything special. This is what prevents hashes from being implemented as
buffers, GC-wise...they need special collection logic. Currently, any
buffer that contains pointers *must* be surrounded by a PMC which
indicates its behavior, or it's considered a dumb data pointer, like
strings.

One idea, which is most closely in line with the current semantics, is to
add a pool pointer to every header. I've found a few times in the past
where such a pointer would have come in handy. This would allow us to call
the pool's mark() function, to handle stuff like pointing-to-buffers, etc.
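
As a rough illustration of the pool-pointer approach (a sketch only; the
Header and Pool layouts and the mark_hash function are assumptions made
for the example), mark_used can dispatch through the header's pool
instead of through a PMC vtable:

```c
#include <assert.h>
#include <stddef.h>

typedef struct Pool Pool;

/* Hypothetical common header shared by PMCs, strings and buffers. */
typedef struct Header {
    Pool          *pool;   /* the extra per-header pointer under discussion */
    int            live;   /* DOD mark bit */
    struct Header *child;  /* a hash-like header pointing at another header */
} Header;

struct Pool {
    const char *name;
    void (*mark)(Header *h);   /* NULL for "dumb" data such as strings */
};

/* Mark a header live, then let its pool trace whatever it points to. */
static void mark_used(Header *h)
{
    assert(h != NULL && h->pool != NULL);
    if (h->live)
        return;                /* already visited */
    h->live = 1;
    if (h->pool->mark != NULL)
        h->pool->mark(h);
}

/* Pool-specific tracing for a header that refers to one other header. */
static void mark_hash(Header *h)
{
    if (h->child != NULL)
        mark_used(h->child);
}
```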

Its main drawback is the additional size of the pointer in the header.
I believe this might be okay for a few reasons:

a) our main types, pmc and string, are already quite large. This isn't
that much in their scale of things.

b) it allows us to make new types of buffer-like headers on par with
existing structures. This should hopefully make the core GC code change
less often, and push it out onto the implementation of the headers.

c) currently pmc's have a vtable pointer. If we're really concerned about
the additional data element, we could do something like:
pmc->pool.vtable->add_used instead of the traditional vtable-> . I'm not
convinced of the merit of this idea, and if the 'add' is deemed too slow,
we can just keep a vtable *and* pool pointer in the PMC header.

One implication of c) is that every pmc type has its own pool. This means:

a) no pmc type morphing. once in a pool, it stays in a pool. I don't see
this as a big loss, since type morphing is error-prone to begin with, imo.

b) data members! Since not all pmcs are the same size, pmcs are able to
store data elements in their structure. This allows us to make a SV-like
PMC which stores str-value, int-value, float-value, etc. All without
imposing on the base PMC buffer size. (no, data and cache aren't enough to
handle the above three values, without having the data point to a header
pointing to a buffer containing the values.)

Thoughts on all of the above? The main drawback that I see is that we can
have a lot more pools. Currently, we don't take advantage of sized header
pools, so making them per-type won't hurt us. However, by making different
pools for different pmc types, an explosion in base pmc types could cause
an explosion in pools and create wasteful memory usage as each pool stores
'extra' headers for allocation. This can probably be tuned in some form
to reduce over-allocation's effect, but I thought it wise to bring it up.

Finally....the unification of buffers and PMCs means that buffers can now
point to things of their own accord, without requiring that they be
surrounded by an accompanying PMC type. (This is a separate question from
the above discussion, as this problem occurs regardless of what we do
above.) This imposes additional work on the DOD, since instead of just
buffer_lives-ing a buffer, it must now stick it on the DOD list so that it
can be properly traced later. This then requires that each buffer contain
a next_for_GC pointer, so it can be added to the to-do list. Alternately,
we can use pool-specific memory to handle the various pointers that are
required for DOD....but the point remains that this further increases the
memory footprint of buffers, and I wanted to verify that it was okay.
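
To make the cost concrete, here is a hedged sketch of a non-recursive
trace that threads reachable headers onto a to-do list through
next_for_GC. The Header layout (a fixed pair of outgoing references) and
the function names are assumptions for illustration only.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Header {
    int            live;         /* DOD mark bit */
    struct Header *next_for_GC;  /* link in the DOD to-do list */
    struct Header *refs[2];      /* outgoing pointers (fixed size for the sketch) */
} Header;

/* Mark h live and append it to the to-do list unless already seen. */
static void mark_and_queue(Header *h, Header **tail)
{
    if (h == NULL || h->live)
        return;
    h->live = 1;
    h->next_for_GC = NULL;
    (*tail)->next_for_GC = h;
    *tail = h;
}

/* Trace everything reachable from root; returns the number of headers seen. */
static size_t trace(Header *root)
{
    Header *cur, *tail;
    size_t n = 0, i;

    assert(root != NULL);
    root->live = 1;
    root->next_for_GC = NULL;
    tail = root;
    for (cur = root; cur != NULL; cur = cur->next_for_GC) {
        n++;
        for (i = 0; i < 2; i++)
            mark_and_queue(cur->refs[i], &tail);
    }
    return n;
}
```

The list link is the one extra pointer per buffer; the live bit keeps
cycles from being queued twice.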

Comments and/or suggestions, please?

Thanks,
Mike Lambert

Mike Lambert

Aug 4, 2002, 6:10:43 AM

to Dan Sugalski, perl6-i...@perl.org

Mike Lambert wrote:

> One idea, which is most closely in line with the current semantics, is to
> add a pool pointer to every header. I've found a few times in the past
> where such a pointer would have come in handy. This would allow us to call
> the pool's mark() function, to handle stuff like pointing-to-buffers, etc.

Oh, I meant to mention an alternative to the pool pointer, but forgot...

At one point, we had a mem_alloc_aligned, which guaranteed the start of a
block of memory given any pointer into the contents of the block. If we
store a pointer to the pool at the beginning of each set of headers, then
we avoid the need for a per-header pool pointer, at the cost of a bit
more math and an additional dereference to get at it.

The benefits to this are the drawbacks to the aforementioned approach, but
the drawbacks include:

- additional cpu, and/or cache misses in getting to the pool. for dod,
this might be very inefficient.

- it imposes additional memory requirements in order to align the block of
memory, and imposes a bit more in this 'header header' at the beginning of
the block of headers.
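
A sketch of the aligned-block alternative, assuming a power-of-two arena
size and using C11 aligned_alloc in place of the old mem_alloc_aligned
(whose exact interface is not shown here). ARENA_BYTES, ArenaPrefix and
the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define ARENA_BYTES 4096u   /* assumed power-of-two arena size */

typedef struct Pool { const char *name; } Pool;

/* The 'header header' stored at the start of every aligned arena. */
typedef struct ArenaPrefix { Pool *pool; } ArenaPrefix;

typedef struct Header { void *bufstart; size_t buflen; } Header;

/* Allocate one arena aligned to its own size and tag it with its pool. */
static void *arena_new(Pool *pool)
{
    ArenaPrefix *a = aligned_alloc(ARENA_BYTES, ARENA_BYTES);  /* C11 */
    if (a != NULL)
        a->pool = pool;
    return a;
}

/* i-th header of an arena; the first Header-sized slot holds the prefix. */
static Header *arena_header(void *arena, size_t i)
{
    assert(i + 1 < ARENA_BYTES / sizeof(Header));
    return (Header *)arena + 1 + i;
}

/* Recover the pool from any header: mask down to the arena start. */
static Pool *pool_of(const Header *h)
{
    uintptr_t base = (uintptr_t)h & ~((uintptr_t)ARENA_BYTES - 1);
    return ((const ArenaPrefix *)base)->pool;
}
```

The mask plus the extra load in pool_of is the "bit more math and an
additional dereference" mentioned above.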

Mike Lambert

Peter Gibbs

Aug 4, 2002, 8:46:39 AM

to Mike Lambert, perl6-internals

Mike Lambert wrote:

> I'm currently favoring allowing for header pools on a per-type basis, not
> just a per-size basis. This would give us a 'hash' pool. The pool
> structure would contain function pointers for collection and/or dod
> purposes. (stuff that would otherwise be in a PMC vtable.)

I am very much in agreement with this concept in principle. I would like you
to consider adding a name/tag/id field to all pool headers, containing a
short text description of the pool, for debugging purposes.

>
>
> One idea, which is most closely in line with the current semantics, is to
> add a pool pointer to every header. I've found a few times in the past
> where such a pointer would have come in handy. This would allow us to call
> the pool's mark() function, to handle stuff like pointing-to-buffers, etc.

This is something I have done in my personal version, for buffer headers
only at present (I have been mainly ignoring PMCs, as I believe they are
still immature). I use it for my latest version COW code, as well as to
allow buffer headers to be returned to the correct pool when they are
detected as free in code that is not resource-pool driven.

> b) it allows us to make new types of buffer-like headers on par with
> existing structures.

On this subject, I would like to see the string structure changed to include
a buffer header structure, rather than duplicating the fields. This would
mean a lot of changes (e.g. all s->bufstart to s->buffer.bufstart), but
would be safer and more consistent. Of course, strings may not even
warrant existence outside of a generic String pmc any more.
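
A small sketch of what embedding could look like. The field set is an
assumption (only bufstart and buflen come from this thread; len and
encoding are placeholders), and the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Buffer {
    void    *bufstart;   /* start of allocated memory */
    size_t   buflen;     /* bytes allocated */
    unsigned flags;
} Buffer;

typedef struct String {
    Buffer buffer;       /* embedded header; must stay the first member */
    size_t len;          /* bytes of string data actually in use */
    int    encoding;     /* placeholder for string-only fields */
} String;

/* Generic buffer code sees only a Buffer. */
static size_t buffer_unused(const Buffer *b, size_t used)
{
    assert(used <= b->buflen);
    return b->buflen - used;
}

/* String code reaches the shared fields through s->buffer. */
static size_t string_unused(const String *s)
{
    return buffer_unused(&s->buffer, s->len);
}
```

Because the Buffer is the first member, a pointer to a String can also
be converted to a pointer to its Buffer, so GC code written against
Buffer keeps working without duplicated field definitions.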

>
> a) no pmc type morphing. once in a pool, it stays in a pool. I don't see
> this as a big loss, since type morphing is error-prone to begin with, imo.

The main issue here would be the definition of pmc type, in an untyped
language. We may need a PerlScalar pmc type, as that is what most Perl
variables really are - if we stick to using pmc types based on current
content, then we need to be able to morph between the different
subclasses of PerlScalar as the contents change.

>
> b) data members! Since not all pmcs are the same size, pmcs are able to
> store data elements in their structure. This allows us to make a SV-like
> PMC which stores str-value, int-value, float-value, etc. All without

Okay, you were obviously thinking the same way!

>
> Thoughts on all of the above? The main drawback that I see is that we can
> have a lot more pools. Currently, we don't take advantage of sized header
> pools, so making them per-type won't hurt us. However, by making different
> pools for different pmc types, an explosion in base pmc types could cause
> an explosion in pools and create wasteful memory usage as each pool stores
> 'extra' headers for allocation. This can probably be tuned in some form
> to reduce over-allocation's effect, but I thought it wise to bring it up.

One option would be to use a limited set of physical sizes (only multiples
of 16 bytes or something) and have free lists per physical size, rather than
per individual pool. This would waste some space in each header, but may
be more efficient overall.
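
A sketch of free lists keyed by physical size, assuming a 16-byte step
and a hypothetical cap of eight size classes; header_alloc and
header_free are illustrative names, not existing functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SIZE_STEP   16u   /* physical sizes are multiples of this */
#define MAX_CLASSES 8u    /* covers sizes 16..128 in this sketch */

typedef struct FreeNode { struct FreeNode *next; } FreeNode;

/* One free list per physical size, shared by every pool of that size. */
static FreeNode *free_lists[MAX_CLASSES];

/* Map a requested size to its class: 1..16 -> 0, 17..32 -> 1, ... */
static size_t size_class(size_t size)
{
    assert(size > 0 && size <= SIZE_STEP * MAX_CLASSES);
    return (size + SIZE_STEP - 1) / SIZE_STEP - 1;
}

/* Reuse a freed header of the same physical size, else allocate one. */
static void *header_alloc(size_t size)
{
    size_t c = size_class(size);
    FreeNode *n = free_lists[c];
    if (n != NULL) {
        free_lists[c] = n->next;
        return n;
    }
    return malloc((c + 1) * SIZE_STEP);
}

/* Return a header to the list for its physical size. */
static void header_free(void *p, size_t size)
{
    size_t c = size_class(size);
    FreeNode *n = p;
    n->next = free_lists[c];
    free_lists[c] = n;
}
```

A 20-byte and a 30-byte header type both land in the 32-byte class, so
a header freed by one type can be handed to the other.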

>
> Finally....the unification of buffers and PMCs means that buffers can now
> point to things of their own accord, without requiring that they be
> surrounded by an accompanying PMC type.

How about the other way round? If the one-size-fits-all PMCs were to be
replaced by custom structures, then everything could be a PMC, and
buffer headers as a separate resource could just disappear!

--
Peter Gibbs
EmKel Systems

Peter Gibbs

Aug 4, 2002, 8:54:10 AM

to Mike Lambert, perl6-i...@perl.org

Mike Lambert wrote:

> At one point, we had a mem_alloc_aligned, which guaranteed the start of a
> block of memory given any pointer into the contents of the block. If we
> store a pointer to the pool at the beginning of each set of headers, then
> we avoid the need for a per-header pool pointer, at the cost of a bit
> more math and an additional dereference to get at it.
>

> - it imposes additional memory requirements in order to align the block of
> memory, and imposes a bit more in this 'header header' at the beginning of
> the block of headers.

I considered this option also, but dismissed it as you need to allocate
twice the required size to get guaranteed alignment, so you are better off
with the pointer per header. To use this method without the memory overhead
would require implementing another allocator: if you want for example a 1K
aligned block, first allocate 16K, discard the amount before the alignment
point, and dish out the rest as 15 (or 16 if you're really lucky) 1K aligned
pages. I seriously considered this when I changed my buffer memory to be
paged instead of a single allocation per memory pool; but I haven't actually
implemented it yet.
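
The carving scheme can be sketched as follows. This is an illustration
under assumptions, not the unimplemented allocator itself: Chunk,
chunk_init and chunk_page are hypothetical names, and the sizes are the
1K and 16K figures from the paragraph above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define PAGE_BYTES  1024u
#define CHUNK_BYTES (16u * PAGE_BYTES)

typedef struct Chunk {
    void  *raw;      /* pointer to hand back to free() */
    char  *first;    /* first PAGE_BYTES-aligned address inside raw */
    size_t npages;   /* whole aligned pages that fit: 15 or 16 */
} Chunk;

/* Allocate 16K and work out where the aligned 1K pages start. */
static int chunk_init(Chunk *c)
{
    uintptr_t start, aligned;

    c->raw = malloc(CHUNK_BYTES);
    if (c->raw == NULL)
        return 0;
    start   = (uintptr_t)c->raw;
    aligned = (start + PAGE_BYTES - 1) & ~((uintptr_t)PAGE_BYTES - 1);
    c->first  = (char *)c->raw + (aligned - start);
    c->npages = (CHUNK_BYTES - (aligned - start)) / PAGE_BYTES;
    return 1;
}

/* Address of the i-th aligned page in the chunk. */
static void *chunk_page(const Chunk *c, size_t i)
{
    assert(i < c->npages);
    return c->first + i * PAGE_BYTES;
}
```

The waste is at most one page per chunk rather than one page per
aligned block.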

Mike Lambert

Aug 4, 2002, 2:51:45 PM

to Peter Gibbs, perl6-internals

Peter Gibbs wrote:

> I am very much in agreement with this concept in principle. I would like you
> to consider adding a name/tag/id field to all pool headers, containing a
> short text description of the pool, for debugging purposes.

I don't have a problem with that. And yes, it'd definitely help debugging
(as opposed to printing out the various pool addresses and comparing them ;)

> > One idea, which is most closely in line with the current semantics, is to
> > add a pool pointer to every header. I've found a few times in the past
> > where such a pointer would have come in handy. This would allow us to call
> > the pool's mark() function, to handle stuff like pointing-to-buffers, etc.
> This is something I have done in my personal version, for buffer headers
> only at present (I have been mainly ignoring PMCs, as I believe they are
> still immature). I use it for my latest version COW code, as well as to
> allow buffer headers to be returned to the correct pool when they are
> detected as free in code that is not resource-pool driven.

Re: DOD immaturity: Yeah, I agree to some extent. It's somewhat difficult
to test DOD efficiency because every string is directly traceable from the
root, thus avoiding mark_used for the most part. Perhaps some GC-PMC
benchmarks are needed to weed out remaining issues.

Re: COW code. Ooohh! You've kept it up to date with the current code? I was
working on applying your old patch (ticket 607 at
http://bugs6.perl.org/rt2/Ticket/Display.html?id=607), but if you've got
COW code in the current build, that's even better.

One question: does your current code utilize bufstart as the beginning of
the buffer, or the beginning of the string?

> > b) it allows us to make new types of buffer-like headers on par with
> > existing structures.
> On this subject, I would like to see the string structure changed to include
> a buffer header structure, rather than duplicating the fields. This would
> mean a lot of changes (e.g. all s->bufstart to s->buffer.bufstart), but
> would be safer and more consistent. Of course, strings may not even
> warrant existence outside of a generic String pmc any more.

Again, I agree. If the COW code forces all the string usage to use
strstart and strlen, then bufstart and buflen essentially are used a *lot*
less. This should make the mental transition easier.

> One option would be to use a limited set of physical sizes (only multiples
> of 16 bytes or something) and have free lists per physical size, rather than
> per individual pool. This would waste some space in each header, but may
> be more efficient overall.

I suppose this allows us to mix and match entries of different types in
the same pools, since each header would have a pointer to its own pool,
regardless of its neighbors. However, the number 16 could be tuned to 4 or
1 to achieve slightly better mem usage. (Or even POINTER_ALIGNMENT).

> > Finally....the unification of buffers and PMCs means that buffers can now
> > point to things of their own accord, without requiring that they be
> > surrounded by an accompanying PMC type.
> How about the other way round? If the one-size-fits-all PMCs were to be
> replaced by custom structures, then everything could be a PMC, and
> buffer headers as a separate resource could just disappear!

I think you misunderstood me here. I agree that making the buffer headers
a distinct resource is unnecessary. However, this does mean that all
headers need to be traced now. For pure strings, this can hurt
performance, although one can argue that it helps performance in the
general case of the PMC containing buffer data (a couple less
indirections needed on usage).

We could make a new header flag, BUFFER_has_pointers_FLAG, which specifies
that this buffer contains pointers to other data structures, and should be
traced. If this is unset, the buffer doesn't get added onto the DOD to-do
list.

Since adding it to the to-do list requires adjusting next_for_GC, it's
already going to reference memory there. Checking the flag would merely
prevent traversing the memory again in the 'process' portion.
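
A hedged sketch of the flag check, taking the list in question to be the
DOD to-do list linked through next_for_GC. The flag value, the Header
layout and the TraceList type are assumptions for the example.

```c
#include <assert.h>
#include <stddef.h>

#define BUFFER_has_pointers_FLAG 0x1u   /* bit value chosen for the sketch */

typedef struct Header {
    unsigned       flags;
    int            live;
    struct Header *next_for_GC;  /* link in the DOD to-do list */
    struct Header *ref;          /* only meaningful when the flag is set */
} Header;

typedef struct TraceList {
    Header *head, *tail;
    size_t  queued;              /* how many headers were put on the list */
} TraceList;

/* Mark h live; queue it for tracing only if it can hold pointers. */
static void mark_used(Header *h, TraceList *t)
{
    assert(t != NULL);
    if (h == NULL || h->live)
        return;
    h->live = 1;
    if (!(h->flags & BUFFER_has_pointers_FLAG))
        return;                  /* pure data: never revisited */
    h->next_for_GC = NULL;
    if (t->tail != NULL)
        t->tail->next_for_GC = h;
    else
        t->head = h;
    t->tail = h;
    t->queued++;
}

/* The 'process' portion: walk the list and mark what each entry refers to. */
static void process(TraceList *t)
{
    Header *cur;
    for (cur = t->head; cur != NULL; cur = cur->next_for_GC)
        mark_used(cur->ref, t);
}
```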

Thanks for the quick reply,
Mike Lambert

Peter Gibbs

Aug 4, 2002, 4:50:42 PM

to Mike Lambert, perl6-internals

Mike Lambert wrote:
>
> Re: COW code. Ooohh! You've kept it up to date with the current code? I was
> working on applying your old patch (ticket 607 at
> http://bugs6.perl.org/rt2/Ticket/Display.html?id=607), but if you've got
> COW code in the current build, that's even better.
>
> One question: does your current code utilize bufstart as the beginning of
> the buffer, or the beginning of the string?
>

I have given up trying to keep in sync with the live code, and have just
been playing with various alternate approaches to some things. My
current COW code is totally different from the previous version, and is
based on some major changes to the memory management code.
In particular, buffer headers are kept in a doubly-linked list per memory
page, thereby adding two pointers each; this will probably be
considered too expensive in memory usage and link maintenance time.

As far as my original style COW code goes, Dan last ruled as follows:
<< I think I'd like these differently. STRING is a subclass of Buffer,
<< and the bufstart and buflen fields in Buffers point to the actual
<< start of memory and length of allocated data. I'd rather keep it that
<< way, and have the 'real' string length and string start be extra
<< fields in the STRING part.
That was how my first COW patch worked; the second one (which is
the one you refer to above) uses bufstart to point to the string data,
and a separate buffer field to point to the physical start of memory.
However, Dan changes his mind regularly, so it might be safer to
get a new ruling before you do much work on it.
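
For reference, the layout in the quoted ruling (bufstart and buflen
describe the allocation, with separate string start and length fields)
could be sketched like this. It is not either COW patch: the len and
shared fields and both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef struct String {
    char  *bufstart;  /* physical start of allocated memory */
    size_t buflen;    /* bytes allocated */
    char  *strstart;  /* logical start of this string's data */
    size_t len;       /* logical length (hypothetical field name) */
    int    shared;    /* nonzero: memory may be shared, copy before writing */
} String;

/* Substring without copying: same allocation, different logical window. */
static String substr_cow(String *src, size_t off, size_t n)
{
    String s;
    assert(off + n <= src->len);
    s = *src;
    s.strstart = src->strstart + off;
    s.len      = n;
    s.shared   = 1;
    src->shared = 1;
    return s;
}

/* Give s a private copy of its data before the first write. */
static int make_writable(String *s)
{
    size_t bytes = s->len ? s->len : 1;
    char *copy;

    if (!s->shared)
        return 1;
    copy = malloc(bytes);
    if (copy == NULL)
        return 0;
    memcpy(copy, s->strstart, s->len);
    s->bufstart = s->strstart = copy;
    s->buflen   = bytes;
    s->shared   = 0;
    return 1;
}
```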

Dan Sugalski

Aug 4, 2002, 5:09:55 PM

to Peter Gibbs, Mike Lambert, perl6-internals

At 2:46 PM +0200 8/4/02, Peter Gibbs wrote:
>Mike Lambert wrote:
>
>> I'm currently favoring allowing for header pools on a per-type basis, not
>> just a per-size basis. This would give us a 'hash' pool. The pool
>> structure would contain function pointers for collection and/or dod
>> purposes. (stuff that would otherwise be in a PMC vtable.)
>I am very much in agreement with this concept in principle. I would like you
>to consider adding a name/tag/id field to all pool headers, containing a
>short text description of the pool, for debugging purposes.

I'm OK with this. Feel free to throw a patch in for it.

We should add in some introspection capabilities so the running
interpreter has access to it as well.

> > One idea, which is most closely in line with the current semantics, is to
>> add a pool pointer to every header. I've found a few times in the past
>> where such a pointer would have come in handy. This would allow us to call
>> the pool's mark() function, to handle stuff like pointing-to-buffers, etc.
>This is something I have done in my personal version, for buffer headers
>only at present (I have been mainly ignoring PMCs, as I believe they are
>still immature). I use it for my latest version COW code, as well as to
>allow buffer headers to be returned to the correct pool when they are
>detected as free in code that is not resource-pool driven.

I don't want pointers back to the pool header for space reasons. This
was why I was playing the alignment games at the beginning--with
sufficient control it's an OK thing to do.

If we limited the size of the arenas, I'd be willing to dedicate some
bits in the flag word to an offset count, though. (8 or 9 at most, so
I don't know that a limit of 256 or 512 headers per arena would be
worth it)
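
A sketch of the offset-count idea, assuming the low 8 bits of the flag
word are reserved for the header's index within its arena. The Arena
layout and names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define OFFSET_BITS   8u
#define OFFSET_MASK   ((1u << OFFSET_BITS) - 1u)  /* low 8 bits of flags */
#define ARENA_HEADERS (1u << OFFSET_BITS)          /* at most 256 per arena */

typedef struct Pool { const char *name; } Pool;

typedef struct Header {
    unsigned flags;     /* low OFFSET_BITS: this header's index in its arena */
    void    *bufstart;
} Header;

typedef struct Arena {
    Pool  *pool;
    Header headers[ARENA_HEADERS];
} Arena;

/* Stamp every header with its own index when the arena is created. */
static void arena_init(Arena *a, Pool *pool)
{
    unsigned i;
    assert(a != NULL);
    a->pool = pool;
    for (i = 0; i < ARENA_HEADERS; i++) {
        a->headers[i].flags    = i;
        a->headers[i].bufstart = NULL;
    }
}

/* Step back to headers[0], then to the enclosing Arena, to find the pool. */
static Pool *pool_of(const Header *h)
{
    const Header *first = h - (h->flags & OFFSET_MASK);
    const Arena  *a = (const Arena *)((const char *)first
                                      - offsetof(Arena, headers));
    return a->pool;
}
```

No alignment is required and no pointer is added per header; the price
is the arena size limit and the bits taken out of the flag word.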

> > b) it allows us to make new types of buffer-like headers on par with
>> existing structures.
>On this subject, I would like to see the string structure changed to include
>a buffer header structure, rather than duplicating the fields. This would
>mean a lot of changes (e.g. all s->bufstart to s->buffer.bufstart), but
>would be safer and more consistent. Of course, strings may not even
>warrant existence outside of a generic String pmc any more.

Strings still warrant their own structure. They're more primitive
than PMCs, so I'd rather they stay that way.

> > a) no pmc type morphing. once in a pool, it stays in a pool. I don't see
>> this as a big loss, since type morphing is error-prone to begin with, imo.
>The main issue here would be the definition of pmc type, in an untyped
>language. We may need a PerlScalar pmc type, as that is what most Perl
>variables really are - if we stick to using pmc types based on current
>content, then we need to be able to morph between the different
>subclasses of PerlScalar as the contents change.

We don't need separate classes for Perl scalars--that they're done
that way is an accident of implementation. They could be a single
class with multiple vtables.

> > b) data members! Since not all pmcs are the same size, pmcs are able to
>> store data elements in their structure. This allows us to make a SV-like
>> PMC which stores str-value, int-value, float-value, etc. All without
>Okay, you were obviously thinking the same way!

There are sufficient advantages to fixed-sized PMCs that I'd rather
keep them that way.

>One option would be to use a limited set of physical sizes (only multiples
>of 16 bytes or something) and have free lists per physical size, rather than
>per individual pool. This would waste some space in each header, but may
>be more efficient overall.

For Bufferish things, that's fine. That's been the plan, just one
that hasn't been implemented yet.

> > Finally....the unification of buffers and PMCs means that buffers can now
>> point to things of their own accord, without requiring that they be
>> surrounded by an accompanying PMC type.
>How about the other way round? If the one-size-fits-all PMCs were to be
>replaced by custom structures, then everything could be a PMC, and
>buffer headers as a separate resource could just disappear!

I'd rather not. Separate things for separate purposes.

Dan Sugalski

Aug 4, 2002, 5:34:55 PM

to Peter Gibbs, Mike Lambert, perl6-internals

At 10:50 PM +0200 8/4/02, Peter Gibbs wrote:
>However, Dan changes his mind regularly, so it might be safer to
>get a new ruling before you do much work on it.

That's because I don't think about it very often, and don't so much
remember the decisions on it as make one whenever the subject arises.
Which is stupid and untenable.

So, with the assumption that COW only applies to Strings (and not
Buffers generically), do whatever you want that works. All access to
String internals has to be through routines in string.c, so as long
as they know what to do I don't care.

Dan Sugalski

Aug 5, 2002, 3:11:31 AM

to Josef Hook, Mike Lambert, perl6-i...@perl.org

At 9:06 AM +0200 8/5/02, Josef Hook wrote:
>On Sun, 4 Aug 2002, Mike Lambert wrote:
>
>> Mike Lambert wrote:
>>
>> > One idea, which is most closely in line with the current semantics, is to
>> > add a pool pointer to every header. I've found a few times in the past
>> > where such a pointer would have come in handy. This would allow us to call
>> > the pool's mark() function, to handle stuff like pointing-to-buffers, etc.
>>
>> Oh, I meant to mention an alternative to the pool pointer, but forgot...
>>
>> At one point, we had a mem_alloc_aligned, which guaranteed the start of a
>> block of memory given any pointer into the contents of the block. If we
>> store a pointer to the pool at the beginning of each set of headers, then
>> we avoid the need for a per-header pool pointer, at the cost of a bit
>> more math and an additional dereference to get at it.
>>
>> The benefits to this are the drawbacks to the aforementioned approach, but
>> the drawbacks include:
>>
>> - additional cpu, and/or cache misses in getting to the pool. for dod,
>> this might be very inefficient.
>>
>> - it imposes additional memory requirements in order to align the block of
>> memory, and imposes a bit more in this 'header header' at the beginning of
>> the block of headers.
>>
>> Mike Lambert
>
>I guess these things will affect my matrix implementation.
>Dan: How long will it take until my code is merged with current
>tree?

Damn, I'm way behind--my apologies. Generally I'd prefer the matrix
stuff to live in a PMC class, rather than go into the core code. I'm
not, at the moment, convinced that there's a need to have the core
get direct access to matrices rather than going through a PMC. (I may
change my mind when multimethod dispatch comes in)

Josef Hook

Aug 5, 2002, 3:06:20 AM

to Mike Lambert, Dan Sugalski, perl6-i...@perl.org

On Sun, 4 Aug 2002, Mike Lambert wrote:

I guess these things will affect my matrix implementation.

Dan: How long will it take until my code is merged with current
tree?

/Josef

Josef Hook

Aug 5, 2002, 3:17:06 AM

to Dan Sugalski, Mike Lambert, perl6-i...@perl.org

My idea was mainly to reuse cell.c and cell.h for a multiarray.pmc. That's why
I moved it to the core. I guess we could move the stuff in matrix_core.c
into matrix.pmc.

/J

Melvin Smith

Aug 5, 2002, 8:40:01 PM

to Dan Sugalski, Josef Hook, Mike Lambert, perl6-i...@perl.org

This Matrix class may be good to use in hacking up dynamically loadable
PMCs. That way we may start a separate distribution of "Parrot libs".

-Melvin