Of PMCs Buffers and memory management

Leopold Toetsch

Sep 27, 2002, 3:28:41 AM

to P6I

I have some questions and suggestions regarding PMCs, Buffers and memory
management/internal data structures.

First and foremost, is there any compelling reason to have totally
different structures for PMCs and Buffers?
- Both have a ->data aka ->bufstart
- Both have ->flags, which have largely the same meaning.

This separation means two different routines for marking in DOD,
separate allocation and so on - a lot of code duplication, which
is IMHO not necessary.
Finally, they are both SmallObjects and handled as such in the deep
innards of the arenas.

DOD considerations

We have currently:
- buffer_lives -> BUFFER_live_FLAG
- mark_used -> PMC_live_flag + next_for_GC

If PMC and Buffers are unified, it should be possible to mark them in
one recursive process:

mark(buffer) {
    if (live_flag)          // already done
        return
    set live_flag
    if (buffer_is_buffer_ptr)
        mark(buffer->data)
    else if (buffer_is_array_of_buffers)
        mark(...) for (buffer->data[..])
    else if (has_custom_mark)
        ((PMC*)buffer)->vtable->mark()
}

The above mark routine, called for each register/stack entry/whatever
where PMCs & Buffers are marked now, should catch _all_ entries. No need
for next_for_GC, two different subroutines and a second pass for PMCs.
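
As a rough sketch only (this is not the actual Parrot code: the Obj type,
the OBJ_* flag names and the custom_mark hook are made up for illustration
and stand in for the unified header, the unified flags and vtable->mark),
the recursive mark could look like this in C:

```c
#include <stddef.h>

/* Hypothetical flag bits; the real BUFFER_*_FLAG / PMC_*_FLAG values differ. */
enum {
    OBJ_live_FLAG       = 1 << 0,  /* already marked in this DOD run       */
    OBJ_is_buffer_ptr   = 1 << 1,  /* data points to one other object      */
    OBJ_is_buffer_array = 1 << 2,  /* data points to an array of objects   */
    OBJ_has_custom_mark = 1 << 3   /* object supplies its own mark routine */
};

typedef struct Obj Obj;
struct Obj {
    void        *data;                    /* aka bufstart                      */
    size_t       count;                   /* number of entries if data is an
                                             array of Obj pointers             */
    unsigned int flags;
    void       (*custom_mark)(Obj *self); /* stands in for vtable->mark        */
};

/* Mark obj and everything reachable from it. Setting the live flag before
 * descending is what makes cyclic structures terminate. */
void mark(Obj *obj)
{
    size_t i;

    if (obj == NULL || (obj->flags & OBJ_live_FLAG))
        return;                           /* nothing to do, or already done */
    obj->flags |= OBJ_live_FLAG;

    if (obj->flags & OBJ_is_buffer_ptr) {
        mark((Obj *)obj->data);
    } else if (obj->flags & OBJ_is_buffer_array) {
        Obj **items = (Obj **)obj->data;
        for (i = 0; i < obj->count; i++)
            mark(items[i]);
    } else if ((obj->flags & OBJ_has_custom_mark) && obj->custom_mark != NULL) {
        obj->custom_mark(obj);
    }
}
```

The recursion depth here equals the depth of the object graph, so this
sketch inherits whatever C stack limits apply to deeply nested data.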

I propose to unify these structures, i.e. make PMC a buffer_like
structure, unify the flags, and treat them largely the same.

2)
What is PMCs member:
SYNC *synchronize; /* undocumented + unused */

3)
What are: arena_base->extra_buffer_headers;

4)
Is there any deeper reason that the sized_small_object pool allocates
unused slots for intermediate object sizes?

Finally, if we ever have multiple interpreters, which can be built
dynamically, _all_ structures including the interpreter itself and
its internal data structures would have to be derived from a Buffer
object (or have to manage their own destroy method). If not, these
interpreters will leak memory like the current one, and that is more
than a bunch of sieves.

Comments welcome,
leo

Jason Gloudon

Sep 27, 2002, 11:40:27 AM

to Leopold Toetsch, P6I

On Fri, Sep 27, 2002 at 09:28:41AM +0200, Leopold Toetsch wrote:

> First and foremost, is there any compelling reason, to have totally
> different structures for PMCs and Buffers?

The reasons stopped being compelling about a month or two ago, when it was
decided to unify the two. No one has had the time and knowledge to do it.

--
Jason

Mike Lambert

Sep 27, 2002, 5:27:49 PM

to Leopold Toetsch, P6I

> First and foremost, is there any compelling reason to have totally
> different structures for PMCs and Buffers?
> - Both have a ->data aka ->bufstart
> - Both have ->flags, which have largely the same meaning.

As jason said in another message, Dan has changed his mind from
yesteryear, and decided that buffers and pmcs should be the same
structure. There are a few ideas of my own that would be better
implemented if we unified the two, but unfortunately, I haven't had the
motivation to unify them. I made a few passes at it, and the task is just
monumental, in terms of lots of search and replaces, and lots of debugging
to track down why the semantics have broken. :)

I could attempt a piecemeal conversion, submitting patches that get us a
bit closer, except that each patch would not be acceptable on its own due
to the confusion introduced. ie, having PMC use BUFFER_*_FLAGs, or having
worse memory usage/dod-speeds because of the larger size of buffers/pmcs
after they are unified, etc.

> This separation means two different routines for marking in DOD,
> separate allocation and so on - a lot of code duplication, which
> is IMHO not necessary.
> Finally, they are both SmallObjects and handled as such in the deep
> innards of the arenas.

Currently, there are two different marking routines, but with good reason.
PMCs can recursively reference PMCs, and thus they have next_for_GC to
avoid recursion in DOD (see below). Buffers, on the other hand, are for raw
data (although PMCs referencing buffers can do magic with the buffer's data
itself). The fact that both use the smallobject allocator is a more recent
introduction, and it wasn't that way in their original design.

> DOD considerations
>
> We have currently:
> - buffer_lives -> BUFFER_live_FLAG
> - mark_used -> PMC_live_flag + next_for_GC
>
> If PMC and Buffers are unified, it should be possible to mark them in
> one recursive process:
>
> mark(buffer) {
>     if (live_flag)          // already done
>         return
>     set live_flag
>     if (buffer_is_buffer_ptr)
>         mark(buffer->data)
>     else if (buffer_is_array_of_buffers)
>         mark(...) for (buffer->data[..])
>     else if (has_custom_mark)
>         ((PMC*)buffer)->vtable->mark()
> }

Yes, that works. But there is a reason for next_for_GC. The next_for_GC
creates a linked list. This linked list is then iterated over in a for
loop. There is no recursion, no chance of blowing the C stack, no worries
about the overheads of recursive calls, etc.

So while it may seem more memory efficient to not use next_for_GC, it
actually isn't. A linked list of 500 elements would cause 500 recursive
calls and use more memory than would a next_for_GC solution.
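
To make the difference concrete, here is a minimal sketch of the linked-list
approach in C. It is only an illustration, not the code in dod.c: the
child[] field, the flag value and the function names are assumptions. Each
PMC is appended to the list exactly once, and the list is walked in a plain
loop, so the C stack depth stays constant no matter how deeply the PMCs are
nested.

```c
#include <stddef.h>

#define PMC_live_FLAG 1u   /* hypothetical value */

typedef struct PMC PMC;
struct PMC {
    PMC         *child[2];     /* hypothetical: PMCs this one references */
    PMC         *next_for_GC;  /* link in the list of PMCs to scan       */
    unsigned int flags;
};

/* Mark p live and append it to the list ending at tail.
 * Returns the new tail (unchanged if p was NULL or already live). */
static PMC *mark_used(PMC *p, PMC *tail)
{
    if (p == NULL || (p->flags & PMC_live_FLAG))
        return tail;
    p->flags |= PMC_live_FLAG;
    p->next_for_GC = NULL;
    tail->next_for_GC = p;
    return p;
}

/* Trace everything reachable from root without recursion. */
void trace_from(PMC *root)
{
    PMC   *cur, *tail;
    size_t i;

    if (root == NULL || (root->flags & PMC_live_FLAG))
        return;
    root->flags |= PMC_live_FLAG;
    root->next_for_GC = NULL;
    tail = root;

    /* The list grows at the tail while we scan from the head. */
    for (cur = root; cur != NULL; cur = cur->next_for_GC)
        for (i = 0; i < 2; i++)
            tail = mark_used(cur->child[i], tail);
}
```

The cost is one pointer per PMC header, paid whether or not the PMC ever
references another PMC.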

> 2)
> What is PMCs member:
> SYNC *synchronize; /* undocumented + unused */

This is for multi-threaded access, where you need to synchronize on
something as a way to control access to the PMC. Of course, this is
entirely placeholder, as we don't have multi-threading or multiple
interpreters. :)

> 3)
> What are: arena_base->extra_buffer_headers;

This is an array of pointers to buffer headers. For example, the
interpreter has some buffers "inlined" into the actual interpreter struct.
These headers aren't part of any header pools, but the data they reference
should be retained when pools are copied. This could be called a hack, and
maybe we should force all headers to come from header pools. But there is
no compelling reason to do so, at this point in time. (I have some ideas
that would require it, tho)
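
A small sketch of the idea, with simplified and partly invented names (the
fixed-size array, register_extra_header and copy_extra_buffers are
assumptions, and alignment of the copied data is ignored): headers that live
outside any header pool are registered by pointer, so that a copying pass
can still find them and relocate the data they reference.

```c
#include <stddef.h>
#include <string.h>

typedef struct Buffer {
    void  *bufstart;
    size_t buflen;
} Buffer;

#define MAX_EXTRA_HEADERS 8   /* hypothetical fixed limit */

typedef struct ArenaBase {
    Buffer *extra_buffer_headers[MAX_EXTRA_HEADERS];
    size_t  extra_count;
} ArenaBase;

/* Remember a header that was not allocated from a header pool. */
int register_extra_header(ArenaBase *a, Buffer *b)
{
    if (a->extra_count == MAX_EXTRA_HEADERS)
        return -1;
    a->extra_buffer_headers[a->extra_count++] = b;
    return 0;
}

/* Copy the data of all registered headers into new_pool and update each
 * bufstart. Returns the number of bytes used in new_pool. */
size_t copy_extra_buffers(ArenaBase *a, char *new_pool, size_t capacity)
{
    size_t used = 0, i;

    for (i = 0; i < a->extra_count; i++) {
        Buffer *b = a->extra_buffer_headers[i];
        if (b->bufstart == NULL || b->buflen == 0)
            continue;
        if (b->buflen > capacity - used)
            break;                        /* out of room in this sketch */
        memcpy(new_pool + used, b->bufstart, b->buflen);
        b->bufstart = new_pool + used;
        used += b->buflen;
    }
    return used;
}
```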

> 4)
> Is there any deeper reason that the sized_small_object pool allocates
> unused slots for intermediate object sizes?

This currently isn't used, although my plan was to use it for KEY/HASH
structs before their designs were changed such that this wasn't necessary.
:)

The sized_small_object pool is intended for pools where you only care
about allocating objects of a given size. It creates an array of
sized-pool pointers. Then it indexes into this array by
sizeof-this-smallobject / sizeof(void*). If there are no objects of a
given size, then there is just a null pointer in the array. Are you seeing
something else?
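
In C, the lookup described above could be sketched roughly as follows. The
type and function names, the table limit and the lazy creation are
assumptions for illustration, and the sketch assumes object sizes are
multiples of sizeof(void*):

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_SIZED_POOLS 32   /* hypothetical upper bound on table slots */

typedef struct SizedPool {
    size_t object_size;
    /* free list, arenas, ... omitted in this sketch */
} SizedPool;

typedef struct PoolTable {
    SizedPool *pools[MAX_SIZED_POOLS];   /* NULL until a size is first used */
} PoolTable;

/* Return the pool for objects of object_size bytes, creating it on first
 * use. Sizes that were never requested keep a NULL slot in the table. */
SizedPool *get_sized_pool(PoolTable *t, size_t object_size)
{
    size_t idx;

    assert(object_size % sizeof(void *) == 0);
    idx = object_size / sizeof(void *);
    if (idx >= MAX_SIZED_POOLS)
        return NULL;

    if (t->pools[idx] == NULL) {
        SizedPool *p = (SizedPool *)malloc(sizeof(SizedPool));
        if (p == NULL)
            return NULL;
        p->object_size = object_size;
        t->pools[idx] = p;
    }
    return t->pools[idx];
}
```

Only the table of pointers is allocated up front; an intermediate size costs
one unused pointer, not an unused pool.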

> Finally, if we ever have multiple interpreters, which can be built
> dynamically, _all_ structures including the interpreter itself and
> its internal data structures would have to be derived from a Buffer
> object (or have to manage their own destroy method). If not, these
> interpreters will leak memory like the current one, and that is more
> than a bunch of sieves.

Perhaps. But buffers are for storing data. The "proper" way to make it a
"buffer" would be a sized buffer with lots of fields attached to it, and
maybe some data in bufstart (not sure). Then one'd need to wrap it in a
PMC in order to give it a custom mark() method, so that fields of the
sized buffer interpreter header could be marked() and buffer_lives()
themselves. (Currently, this is done in dod.c).

If they were unified, the PMC would be an interpreter referencing a sized
buffer header. Or if we had sized PMCs, the fields could be part of
it, avoiding the need for a buffer.

However, as far as leaking memory, there is no reason that interpreters
have to be PMCs/buffers. Just as we have a make_interpreter to create an
interpreter, we can have an unmake_interpreter that destroys the
interpreter. I don't think we want interpreters appearing and
disappearing with references...they should be explicitly created and
destroyed. But that's a discussion for another thread. My point is that
all things don't need to be traced, and some stuff can be handled
manually, as long as the perl programmer doesn't see it directly.

Hope this helps answer your questions,
Mike Lambert

Leopold Toetsch

Sep 28, 2002, 8:50:54 AM

to Mike Lambert, P6I

Mike Lambert wrote:

>>First and foremost, is there any compelling reason to have totally
>>different structures for PMCs and Buffers?
>>- Both have a ->data aka ->bufstart
>>- Both have ->flags, which have largely the same meaning.
>>
>
> As jason said in another message, Dan has changed his mind from
> yesteryear, and decided that buffers and pmcs should be the same
> structure. There are a few ideas of my own that would be better
> implemented if we unified the two,

Are there any additional hints or pointers regarding this?

> ... but unfortunately, I haven't had the
> motivation to unify them. I made a few passes at it, and the task is just
> monumental, in terms of lots of search and replaces, and lots of debugging
> to track down why the semantics have broken. :)

Yes, changing internals like these would yield huge patches and isn't easy.

> I could attempt a piecemeal conversion, submitting patches that get us a
> bit closer, except that each patch would not be acceptable on its own due
> to the confusion introduced. ie, having PMC use BUFFER_*_FLAGs,

The internals during changes could be hidden with some #defines. So the
surface would stay the same.

> ... or having
> worse memory usage/dod-speeds because of the larger size of buffers/pmcs
> after they are unified, etc.

I don't think that a unified type has to be larger. I tried to lay out
a data hierarchy, which basically should work once other usages of e.g.
"flags" and "buflen" in other structures or prototypes are renamed first
(see attached test prog; native types used for brevity).

[ recursive marking ]

> So while it may seem more memory efficient to not use next_for_GC, it
> actually isn't. A linked list of 500 elements would cause 500 recursive
> calls and use more memory than would a next_for_GC solution.

I'm not aware of such a deeply nested list. But as marking now knows of
e.g. array of PMCs, it could mark a linked list of PMCs as well, w/o
deep recursion.

>> SYNC *synchronize; /* undocumented + unused */

> This is for multi-threaded access, where you need to synchronize on
> something as a way to control access to the PMC. Of course, this is
> entirely placeholder, as we don't have multi-threading or multiple
> interpreters. :)

Would an "in_use" bit not suffice?

>>What are: arena_base->extra_buffer_headers;

> ... and
> maybe we should force all headers to come from header pools.

I think we just need the sized pools, keeping things of the same size
together, and one unsized pool. Both in two variants for vars/constants.

> ... But there is
> no compelling reason to do so, at this point in time. (I have some ideas
> that would require it, tho)

Could you elaborate on these ideas?

[ sized pools slot ]

> ... If there are no objects of a
> given size, then there is just a null pointer in the array. Are you seeing
> something else?

For some reason I misread the code, and thought the unused pools were
allocated.

> ... I don't think we want interpreters appearing and
> disappearing with references...they should be explicitly created and
> destroyed.

Actually, it's not a big difference how they are destroyed, but we
already have a "newinterp" opcode, so an interpreter PMC class just needs a
custom destroy method - that gets called too ;-)
Though, if nested structures inside the interpreter are all buffers,
destroying them would fit neatly into the framework.

> Hope this helps answer your questions,

Yes, thank you for the detailed answers.

> Mike Lambert

leo

Leopold Toetsch

Sep 29, 2002, 3:34:19 AM

to Mike Lambert, P6I

Mike Lambert wrote:

[ Unifying Buffer and PMC ]

> As jason said in another message, Dan has changed his mind from
> yesteryear, and decided that buffers and pmcs should be the same
> structure.

Roadmap
-------

1) Hide Buffer and PMC internals, namely
- buflen
- bufstart
- flags
- cache
inside access macros and #defines and unify common flags (e.g.
BUFFER_live_flag <=> PMC_live_flag) to have the same value.

E.g.
    str->flags &   BUFFER_constant_FLAG  =>  get_constant_FLAG(str)
    pmc->flags |=  BUFFER_live_FLAG      =>  set_live_FLAG(pmc)
    pmc->flags &= ~BUFFER_live_FLAG      =>  reset_live_FLAG(pmc)

The actual changes in *.c can be done file by file, are equally readable
and don't disrupt functionality.
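
A minimal sketch of what step 1 could look like in C. The flag values and
the simplified Buffer/PMC layouts are placeholders, and get_live_FLAG is
added here only for symmetry; the point is that the same macro works on
either type as long as both have a flags member and the common flags share
one value:

```c
#include <stddef.h>

/* Placeholder values; what matters is that the PMC and Buffer variants of
 * a common flag are defined to be the same bit. */
#define BUFFER_constant_FLAG (1u << 0)
#define BUFFER_live_FLAG     (1u << 1)
#define PMC_live_FLAG        BUFFER_live_FLAG

/* Access macros: callers no longer touch ->flags directly. */
#define get_constant_FLAG(o) ((o)->flags & BUFFER_constant_FLAG)
#define get_live_FLAG(o)     ((o)->flags & BUFFER_live_FLAG)
#define set_live_FLAG(o)     ((o)->flags |= BUFFER_live_FLAG)
#define reset_live_FLAG(o)   ((o)->flags &= ~BUFFER_live_FLAG)

/* Simplified stand-ins for the two current, still separate structures. */
typedef struct Buffer {
    void        *bufstart;
    size_t       buflen;
    unsigned int flags;
} Buffer;

typedef struct PMC {
    void        *data;
    unsigned int flags;
} PMC;
```

When the structures are merged in step 2, only the macro bodies have to
follow the new layout; code written against the macros stays as it is.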

2) Change Buffer, PMC to a common structure, adjust above access macros.
This change would not be seen from the outside.

3) Cleanup, use the common base object, wherever appropriate, e.g.
headers, smallobject, dod.

Proposal for a common structure
-------------------------------

The base object is Pobj (Parrot object), it's very similar to a current
Buffer:

typedef struct _Buffer {
    void   *bufstart;
    size_t  buflen;
} _Buffer;

typedef union UnionVal {
    int         int_val;
    double      num_val;
    void       *struct_val;
    char       *string_val;
    struct PMC *pmc_val;
    _Buffer     b;
} UnionVal;

typedef struct Pobj {
    UnionVal     u;
    unsigned int flags;
} Pobj;

typedef struct Buffer {
    Pobj o;
} Buffer;

A PMC is derived from this structure and has additional fields like the
VTABLE and next_for_GC (if we really need this).

typedef struct PMC {
    Pobj    o;
    VTABLE *vtable;
} PMC;

It would be also possible to have an extended PMC type for aggregates:

typedef struct APMC {
    Pobj    o;
    VTABLE *vtable;
    void   *data;
} APMC;

... or just use the latter for both.
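
A small compilable sketch of how such a layout would be used. The VTABLE
body, LIVE_FLAG and the pobj_* helpers are placeholders, and _Buffer is
renamed BufData here because identifiers starting with an underscore and a
capital letter are reserved in C. Since Pobj is the first member, a PMC
pointer can be handed to code that only knows about Pobj:

```c
#include <stddef.h>

struct PMC;

typedef struct BufData {
    void   *bufstart;
    size_t  buflen;
} BufData;

typedef union UnionVal {
    int         int_val;
    double      num_val;
    void       *struct_val;
    char       *string_val;
    struct PMC *pmc_val;
    BufData     b;
} UnionVal;

typedef struct Pobj {
    UnionVal     u;
    unsigned int flags;
} Pobj;

typedef struct VTABLE {
    const char *name;   /* placeholder for the real vtable contents */
} VTABLE;

typedef struct PMC {
    Pobj    o;          /* must stay the first member */
    VTABLE *vtable;
} PMC;

#define LIVE_FLAG 1u    /* placeholder value */

/* Generic helpers that work on the common base object only. */
void pobj_set_live(Pobj *o)      { o->flags |= LIVE_FLAG; }
int  pobj_is_live(const Pobj *o) { return (o->flags & LIVE_FLAG) != 0; }
```

With this layout `pobj_set_live((Pobj *)pmc)` and `pobj_set_live(&pmc->o)`
refer to the same object, which is what would let DOD and the header pools
share one code path.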

Comments welcome,

leo

Mike Lambert

Sep 29, 2002, 3:39:05 AM

to Leopold Toetsch, P6I

> >>First and foremost, is there any compelling reason to have totally
> >>different structures for PMCs and Buffers?
> >>- Both have a ->data aka ->bufstart
> >>- Both have ->flags, which have largely the same meaning.
> >>
> >
> > As jason said in another message, Dan has changed his mind from
> > yesteryear, and decided that buffers and pmcs should be the same
> > structure. There are a few ideas of my own that would be better
> > implemented if we unified the two,
>
>
> Are there any additional hints or pointers regarding this?

As far as things that could be done? There've been some discussions on the
mailing list before, but nothing really concrete. Read the thread starting
here:

http://archive.develooper.com/perl6-i...@perl.org/msg11553.html

I have a semi-todo list of things I'd like to get done at some point
regarding the GC system. I can nicefy this and post it if you like.

> > I could attempt a piecemeal conversion, submitting patches that get us a
> > bit closer, except that each patch would not be acceptable on its own due
> > to the confusion introduced. ie, having PMC use BUFFER_*_FLAGs,
>
>
> The internals during changes could be hidden with some #defines. So the
> surface would stay the same.

Yeah, but that seems even worse than the approach I mentioned above, only
because it requires yet another step to untangle and undo the defines
later. :)

> > ... or having
> > worse memory usage/dod-speeds because of the larger size of buffers/pmcs
> > after they are unified, etc.
>
>
> I don't think that a unified type has to be larger. I tried to lay out
> a data hierarchy, which basically should work once other usages of e.g.
> "flags" and "buflen" in other structures or prototypes are renamed first
> (see attached test prog; native types used for brevity).

Well, currently hashtables require a surrounding PMC, which has a data
pointer to a Buffer. So unifying the two would allow these two structures
to be combined, and have a smaller total footprint. But stuff like strings
or perlint PMCs don't use two headers, and so would actually be somewhat
larger if we unified the types.

Granted, if PMCs became buffers the base buffer "class" wouldn't need
synchronize or cache, but it'd still need next_for_GC (yeah, we disagree
here, I'll argue that point below ;), and possibly room for a vtable. And
to do some of the things I hinted at before, we'd need a pointer back to
the header pool, which isn't kosher with Dan, unless I'm able to
demonstrate a performance win.

> [ recursive marking ]
> > So while it may seem more memory efficient to not use next_for_GC, it
> > actually isn't. A linked list of 500 elements would cause 500 recursive
> > calls and use more memory than would a next_for_GC solution.
>
> I'm not aware of such a deeply nested list. But as marking now knows of
> e.g. array of PMCs, it could mark a linked list of PMCs as well, w/o
> deep recursion.

Yes, arrays are a more efficient data structure than linked lists, and
arrays would not totally have the problem of recursive marking blowing the
stack. However, are you going to impose restrictions on users of Perl6
code, telling them that they shouldn't be allowed to create linked lists?
If the programmer creates a linked list in Perl of 500 elements, it could
easily blow the stack. If I were programming and had my working test
program fail as I tried to extend it to realistic data sets, I would be
quite pissed. :)

> >> SYNC *synchronize; /* undocumented + unused */
>
> > This is for multi-threaded access, where you need to synchronize on
> > something as a way to control access to the PMC. Of course, this is
> > entirely placeholder, as we don't have multi-threading or multiple
> > interpreters. :)
>
> Would an "in_use" bit not suffice?

This was pretty much a placeholder, I think, as none of the logistics or
semantics have been defined. To do synchronization properly, you need an
OS which can give you the ability to create atomic locking, otherwise it
is impossible to be 100% correct. SYNC* would probably point to this OS
synchronized lock or somesuch. I'm only guessing, as I'm not intimately
familiar with multithreading implementations.

> >>What are: arena_base->extra_buffer_headers;
>
> > ... and
> > maybe we should force all headers to come from header pools.
>
> I think, we need just the sized pools, keeping things of same size
> together and one unsized pool. Both in two variants for vars/constants.

Since header pools are contiguous blocks of memory that are split up into
consecutive headers, it's pretty much impossible to have an unsized pool
of headers. It is possible, however, to have a pool of unsized-header
*pointers*, and that's exactly what extra_buffer_headers is.

Currently, we group all headers of the same size in the same header pool,
although only constant string headers have their own pool. That is
because we don't really have constants of anything else implemented
yet. :)

> > ... But there is
> > no compelling reason to do so, at this point in time. (I have some ideas
> > that would require it, tho)
>
> Could you elaborate on these ideas?

I guess I will need to write up those ideas. :)

> > ... I don't think we want interpreters appearing and
> > disappearing with references...they should be explicitly created and
> > destroyed.
>
> Actually, it's not a big difference how they are destroyed, but we
> already have a "newinterp" opcode, so an interpreter PMC class just needs a
> custom destroy method - that gets called too ;-)
> Though, if nested structures inside the interpreter are all buffers,
> destroying them would fit neatly into the framework.

Yes, it would. But a lot of the interpreter's structures have data fields,
and those don't work too well as buffer data. They could work as part of a
sized buffer header, I suppose. I think it would be much easier to make
the interpreter PMC-ish, or at least have a PMC wrapper. Then this PMC
can have an active-destroy method, which would properly clean up
everything that needed to be cleaned up. Since the interpreter memory
would be malloc-allocated, it wouldn't be copied or cleaned on its own.
The PMC would become an interface for the GC system to control the
lifetime of the allocated interpreter memory, since the GC system would
control the PMC.
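
Roughly, and with invented names throughout (the Interp fields,
interp_pmc_init and sweep_pmc are illustrative assumptions, not existing
Parrot functions), such a wrapper could be sketched in C like this: the
interpreter itself is malloc-allocated, and the PMC's destroy entry is the
only place that frees it.

```c
#include <stddef.h>
#include <stdlib.h>

#define PMC_live_FLAG 1u   /* hypothetical value */

/* Stand-in for the real interpreter struct and its nested allocations. */
typedef struct Interp {
    double *num_regs;
    size_t  n_regs;
} Interp;

typedef struct PMC PMC;

typedef struct VTABLE {
    void (*destroy)(PMC *self);   /* the "active destroy" hook */
} VTABLE;

struct PMC {
    void         *data;     /* points to the malloc'ed Interp */
    const VTABLE *vtable;
    unsigned int  flags;
};

/* Free everything the wrapped interpreter owns, then the struct itself. */
static void interp_destroy(PMC *self)
{
    Interp *interp = (Interp *)self->data;
    if (interp != NULL) {
        free(interp->num_regs);
        free(interp);
        self->data = NULL;
    }
}

static const VTABLE interp_vtable = { interp_destroy };

/* Create an interpreter and wrap it in pmc. Returns 0 on success. */
int interp_pmc_init(PMC *pmc, size_t n_regs)
{
    Interp *interp = (Interp *)malloc(sizeof(Interp));
    if (interp == NULL)
        return -1;
    interp->num_regs = (double *)calloc(n_regs, sizeof(double));
    if (interp->num_regs == NULL) {
        free(interp);
        return -1;
    }
    interp->n_regs = n_regs;
    pmc->data   = interp;
    pmc->vtable = &interp_vtable;
    pmc->flags  = 0;
    return 0;
}

/* What a sweep would do for one header: live PMCs survive and get their
 * flag cleared for the next run, dead ones have destroy called. */
void sweep_pmc(PMC *pmc)
{
    if (pmc->flags & PMC_live_FLAG) {
        pmc->flags &= ~PMC_live_FLAG;
        return;
    }
    if (pmc->vtable != NULL && pmc->vtable->destroy != NULL)
        pmc->vtable->destroy(pmc);
}
```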

Mike Lambert

Leopold Toetsch

Sep 29, 2002, 9:03:32 AM

to Mike Lambert, P6I

Mike Lambert wrote:

> http://archive.develooper.com/perl6-i...@perl.org/msg11553.html

Thanks for this. I must have missed some parts of this discussion on the
list. Aligning the header pools could be an interesting approach, since
a considerable amount of time is currently spent determining whether a
pointer on the system stack is contained in a PMC/buffer header. This
would be a lot faster then.
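
One way alignment could help, as I read it (this is my own assumption about
the approach, with a made-up ARENA_SIZE and made-up helper names): if every
header arena starts at an address that is a multiple of its power-of-two
size, the candidate arena for any pointer is found by masking off the low
bits, instead of comparing the pointer against the bounds of every arena.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define ARENA_SIZE ((uintptr_t)4096)   /* hypothetical; must be a power of two */

typedef struct Arena {
    void *raw;    /* what malloc returned, needed for free()   */
    char *base;   /* first ARENA_SIZE-aligned byte inside raw  */
} Arena;

/* Allocate twice the arena size and use the aligned part of it.
 * Wasteful, but portable enough for a sketch. */
int arena_init(Arena *a)
{
    uintptr_t addr;

    a->raw = malloc((size_t)(2 * ARENA_SIZE));
    if (a->raw == NULL)
        return -1;
    addr = ((uintptr_t)a->raw + ARENA_SIZE - 1) & ~(ARENA_SIZE - 1);
    a->base = (char *)addr;
    return 0;
}

void arena_free(Arena *a)
{
    free(a->raw);
    a->raw  = NULL;
    a->base = NULL;
}

/* One mask and one compare, independent of the arena's header count. */
int arena_contains(const Arena *a, const void *p)
{
    return ((uintptr_t)p & ~(ARENA_SIZE - 1)) == (uintptr_t)a->base;
}
```

With many arenas, the masked address would be looked up in a table of known
arena bases rather than compared against a single one.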

>>The internals during changes could be hidden with some #defines. So the
>>surface would stay the same.
>>
>
> Yeah, but that seems even worse than the approach I mentioned above, only
> because it requires yet another step to untangle and undo the defines
> later. :)

No need to undo anything. See the followup to my previous posting.

> Yes, arrays are a more efficient data structure than linked lists, and
> arrays would not totally have the problem of recursive marking blowing the
> stack. However, are you going to impose restrictions on users of Perl6
> code, telling them that they shouldn't be allowed to create linked lists?

No, I don't want to impose restrictions on any HL of course. So
recursive marking is buried.

What about a store external to the PMCs for the PMCs still to visit? The
number of dead PMCs compared to the active PMCs at DOD time seems to be
rather high, so this could be more memory efficient. And only nested PMCs
would be on this external list; for plain PMCs just the live bit is set,
that's all.
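
A sketch of what I have in mind, in C, with invented names (ToVisit,
mark_from, the children array and the flag value are all assumptions): the
pending PMCs live in a growable array outside the headers, only PMCs that
reference other PMCs are ever pushed, and the walk is a loop, so there is
neither a next_for_GC field nor deep recursion.

```c
#include <stddef.h>
#include <stdlib.h>

#define PMC_live_FLAG 1u   /* hypothetical value */

typedef struct PMC PMC;
struct PMC {
    PMC        **children;    /* hypothetical: PMCs referenced by this one */
    size_t       n_children;  /* 0 for a plain (non-aggregate) PMC         */
    unsigned int flags;
};

/* External, growable store of PMCs whose children still need a visit. */
typedef struct ToVisit {
    PMC  **items;
    size_t len, cap;
} ToVisit;

static int to_visit_push(ToVisit *s, PMC *p)
{
    if (s->len == s->cap) {
        size_t cap   = s->cap ? s->cap * 2 : 16;
        PMC  **items = (PMC **)realloc(s->items, cap * sizeof(PMC *));
        if (items == NULL)
            return -1;
        s->items = items;
        s->cap   = cap;
    }
    s->items[s->len++] = p;
    return 0;
}

/* Set the live bit; queue the PMC only if it has children to scan. */
static int mark_one(ToVisit *s, PMC *p)
{
    if (p == NULL || (p->flags & PMC_live_FLAG))
        return 0;
    p->flags |= PMC_live_FLAG;
    return p->n_children ? to_visit_push(s, p) : 0;
}

/* Mark everything reachable from root. Returns 0, or -1 if out of memory. */
int mark_from(PMC *root)
{
    ToVisit s  = { NULL, 0, 0 };
    int     rc = mark_one(&s, root);

    while (rc == 0 && s.len > 0) {
        PMC   *cur = s.items[--s.len];
        size_t i;
        for (i = 0; rc == 0 && i < cur->n_children; i++)
            rc = mark_one(&s, cur->children[i]);
    }
    free(s.items);
    return rc;
}
```

The open question with this variant is the allocation during DOD: if the
store cannot grow, marking has to fail or fall back to something else.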

>>I think, we need just the sized pools, keeping things of same size
>>together and one unsized pool. Both in two variants for vars/constants.
>>
>
> Since header pools are contiguous blocks of memory that are split up into
> consecutive headers, it's pretty much impossible to have an unsized pool
> of headers. It is possible, however, to have a pool of unsized-header
> *pointers*, and that's exactly what extra_buffer_headers is.

Sorry, when writing the above, I meant of course Buffer_headers, not the
raw buffer data themselves.

> The PMC would become an interface for the GC system to control the
> lifetime of the allocated interpreter memory, since the GC system would
> control the PMC.

Yep, exactly

> Mike Lambert

leo