
Start of thread proposal


Dan Sugalski

Jan 19, 2004, 3:16:19 PM
to perl6-i...@perl.org
I've not gotten into the technical bits yet. That's next, but rip
this apart first.

=head1 DEFINITIONS

So we can all talk about things the same way, the following
definitions apply. Some of these are drawn from the POSIX thread spec,
and as such we should have a translation section at the end.

=over 4

=item THREAD

An OS level thread. If that makes no sense, neither will any of the
rest of this, in which case I recommend picking up "Programming with
POSIX Threads" by Dave Butenhof, and coming back when you have.

=item MUTEX

This is a low level, under the hood, not exposed to users, thing that
can be locked. They're non-recursive, non-read/write, exclusive
things. When a thread gets a mutex, any other attempt to get that
mutex will block until the owning thread releases the mutex. The
platform-native lock construct will be used for this.
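(As a rough illustration, assuming a POSIX platform underneath -- the
MUTEX_* names here are hypothetical sketch macros, not settled Parrot
names:)

#include <pthread.h>

#define MUTEX            pthread_mutex_t
#define MUTEX_INIT(m)    pthread_mutex_init(&(m), NULL)
#define MUTEX_LOCK(m)    pthread_mutex_lock(&(m))   /* blocks until owned */
#define MUTEX_UNLOCK(m)  pthread_mutex_unlock(&(m))
#define MUTEX_DESTROY(m) pthread_mutex_destroy(&(m))

A Win32 build would substitute CRITICAL_SECTION calls behind the same
macro names.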

=item LOCK

This is an exposed-to-HLL-code thing that can be locked. Only PMCs can
be locked, and the lock may or may not be recursive or read/write.

=item CONDITION VARIABLE

The "sleep until something pings me" construct. Useful for queue
construction. Not conceptually associated with a MUTEX. (POSIX
threads require this, so we're going to hide it there behind macros
and/or functions)

=item RENDEZVOUS POINT

A HLL version of a condition variable.

=item INTERPRETER

Those bits of the Parrot_Interp structure that are absolutely required
to be thread-specific. This includes the current register sets and
stack pointers, as well as security context information. Basically if
a continuation captures it, it's the interpreter.

=item INTERPRETER ENVIRONMENT

Those bits of the Parrot_Interp structure that aren't required to be
thread-specific (though I'm not sure there are any) I<PLUS> anything
pointed to that doesn't have to be thread-specific.

The environment includes the global namespaces, pads, stack chunks,
memory allocation regions, arenas, and whatnots. Just because the
pointer to the current pad is thread-specific doesn't mean the pad
I<itself> has to be. It can be shared.

=item INDEPENDENT THREAD

A thread that has no contact I<AT ALL> with the internal data of any
other thread in the current process. Independent threads need no
synchronization for anything other than what few global things we
have. And the fewer the better, though alas we can't have none at all.

Note that independent threads may still communicate back and forth by
passing either atomic things (ints, floats, and pointers) or static
buffers that can become the property of the destination thread.

=item SHARED THREAD

A thread that's part of a group of threads sharing a common
interpreter environment.

=back

=head1 REQUIREMENTS

=head2 Supported Models

=over 4

=item POSIX threads

The threading scheme must be sufficient to support a POSIX
share-everything style of threading, such as is used in perl 5's
"pthread" model, as well as the thread models for Ruby and Python.

=item "Process-type' threads

The scheme must also support the perl 5 "iThreads" threading
model. In this model no data is shared implicitly, and all sharing
must be done on purpose and explicitly. It very much resembles the
Unix fork-process-with-shared-memory-segment model, not a surprise as
it was originally developed with

=back

=head2 Guarantees

=over 4

=item No Crashes

The interpreter guarantees that no user program errors of any sort
will crash the interpreter. This includes threading problems. As
such, synchronization issues (where multiple interpreters are
accessing the same shared data) must not crash the interpreter or
corrupt its internal state.

=back

=head2 Assumptions

=over 4

=item System memory allocation routines are threadsafe

We are assuming that the memory allocation system of the base OS is
threadsafe. While some of the C runtime libraries are notoriously
thread dangerous, memory allocation code generally is threadsafe, and
we'll assume that on all platforms. (Though we will, in general,
manage our own memory)

=back

=head1 Proposal

The proposal is as follows:

=over 4

=item All global state shall be protected by mutexes

Straightforward enough. This allows for independent threads to
coexist without threatening the state of the process.

=item Multiple independent interpreters will be allowed

Once again, straightforward enough. With threadsafe protected global
state, there's no issue here.

=item Only one OS thread in an interpreter at once

While there is no requirement that any interpreter be tied to an
underlying OS thread, under no circumstances may multiple OS threads
use a single interpreter simultaneously.

=item A Stop-and-copy communication method will be provided

Parrot will provide a function to make a call into another interpreter
and wait for that call to complete. This call may pass in data and
have data returned to it. The interpreter making the call will block
until the call is complete. The data passed in as parameters will be
copied into the called interpreter, and any return values will be
copied back into the calling interpreter. The called interpreter will
block while the return data is copied back into the calling interpreter.
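(A hypothetical sketch of the calling side, purely to illustrate the
copy-in/copy-out contract; none of these function names are real
Parrot API:)

/* All names hypothetical.  The caller blocks for the whole call; the
   callee blocks while the result is copied back out. */
PMC *
call_other_interp(Parrot_Interp caller, Parrot_Interp callee,
                  PMC *sub, PMC *args)
{
    PMC *in  = deep_copy_into(callee, args);      /* copy params in   */
    PMC *out = run_sub_and_wait(callee, sub, in); /* caller blocks    */
    return deep_copy_into(caller, out);           /* copy result back */
}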

=item Inter-interpreter events will be provided

Interpreters will be able to post events to other interpreters.

=item Each interpreter will have a unique id

This ID will be independent of the process or OS thread, and will be
constant across the lifetime of the interpreter. Interpreter IDs
I<may> be reused as interpreters are destroyed and recreated, and as
such are only guaranteed valid while an interpreter is in use.

(Note that we may decide to relax this requirement, but doing so
likely means moving to at least 64-bit integers to mark interpreter IDs)

=item Each interpreter shows the same process id

All the interpreters within a process will share a process ID. On
those systems where each thread has its own unique ID (such as many
versions of Linux) Parrot will still report a single process ID for
all interpreters.

This process ID will be the ID of the process that first instantiated
Parrot.

=item Interpreter pools will share allocation pools

All the interpreters in an interpreter pool will share header and
memory allocation pools. This means that when there is more than one
interpreter in a pool the memory allocation and collection system
needs to be swapped out, as a copying collector is generally
untenable in a threaded environment.

As the allocation and collection system is a black box to user
programs and much of the interpreter internals, this isn't a big deal
outside of needing swappable allocation systems, the potential
issue of COW'd shared memory leaking, and the need to switch
allocation schemes mid-execution.

=item Each interpreter has a separate event queue

Some events, such as timers, may be interpreter-specific and, as
such, each interpreter has its own event queue.

=item Each interpreter pool has a shared event queue

Some events, such as IO callbacks, may not be interpreter-specific,
and can be serviced by any interpreter in the interpreter pool. For
these events, there is a pool-wide event queue.

=item PMCs are the coordination point for threads

That is, only PMCs are shared as such between threads. Strings,
specifically, are I<not> shared between interpreters.

=item All PMCs shared amongst interpreters in a pool must be marked shared

A PMC which is not marked shared may not be handed to another
interpreter. Parrot will prevent this from happening either by
marking the PMC as shared, or throwing an exception when the PMC is
placed in a spot where it may be shared but is not shareable.

=item All shared PMCs must have a threadsafe vtable

The first thing that any vtable function of a shared PMC must do is to
acquire the mutex of the PMCs in its parameter list, in ascending
address order. When the mutexes are released they are not required to
be released in any order.
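(A sketch of the acquisition discipline, with a hypothetical per-PMC
mutex slot and the MUTEX_LOCK macro assumed from above; duplicate
arguments must be locked only once, since the mutexes are
non-recursive:)

/* Lock the PMCs an op touches in ascending address order, so all
   threads agree on a single global locking order. */
static void
lock_params(PMC **p, int n)
{
    int i, j;

    /* n is tiny (2 or 3), so an insertion sort by address is fine */
    for (i = 1; i < n; i++)
        for (j = i; j > 0 && p[j - 1] > p[j]; j--) {
            PMC *t = p[j]; p[j] = p[j - 1]; p[j - 1] = t;
        }

    for (i = 0; i < n; i++)
        if (i == 0 || p[i] != p[i - 1])   /* same PMC twice: lock once */
            MUTEX_LOCK(p[i]->mutex);
}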

=item Automatic PMC sharing will be provided

When a PMC is placed into a container which is shared (including
lexical pads and global namespaces) then that PMC will automatically
be marked as shared. It is acceptable for this to trigger an
exception if for some reason a PMC should not be shared between
interpreters.

PMCs are, by default, not shared. This avoids sharing overhead for
PMCs which are only used as temporaries and not shared. (Note that
this is dangerous, and may end up not being done, due to the sharing
of continuations)

=item All interpreter constructs in a pool are shareable

This means that a PMC or string may be used by any interpreter in a
pool. It additionally means that, if full sharing is enabled,
any interpreter in a pool may invoke a continuation, assuming the
continuation is valid. (That is, a continuation taken at parrot's top
level. Continuations taken within vtable functions, user-defined ops,
or extension code may not be shareable)

=item The embedding API will allow posting events to an interpreter

Many events are interpreter-specific, often caused by one particular
interpreter requesting an async event that later completes.

=item The embedding API will allow posting events to a pool

For events that don't have to go to any particular interpreter, they
can go into the pool's event queue.

=item The embedding API will allow calling a parrot sub with a pool

In those cases where there is an interpreter pool, embedders may call
a parrot sub using the pool as a whole, rather than an individual
interpreter, to run the sub. In that case Parrot may either choose a
dormant interpreter (if there is one) or create a new interpreter in
the pool to run the subroutine.

When the sub is done, Parrot may either cache the created
interpreter or destroy it as it needs to, though in no case will
Parrot ever leave a pool with no interpreters at all.

=back

--
Dan

--------------------------------------"it's like this"-------------------
Dan Sugalski even samurai
d...@sidhe.org have teddy bears and even
teddy bears get drunk

Dan Sugalski

Jan 19, 2004, 3:37:34 PM
to perl6-i...@perl.org
At 3:16 PM -0500 1/19/04, Dan Sugalski wrote:
>I've not gotten into the technical bits yet. That's next, but rip
>this apart first.

First rip -- I left a section unfinished. It should be:


=item "Process-type" threads

The scheme must also support the perl 5 "iThreads" threading
model. In this model no data is shared implicitly, and all sharing
must be done on purpose and explicitly. It very much resembles the
Unix fork-process-with-shared-memory-segment model, not a surprise as
it was originally developed with emulation of Unix's fork system in
mind.

Uri Guttman

Jan 19, 2004, 4:37:27 PM
to Dan Sugalski, perl6-i...@perl.org
>>>>> "DS" == Dan Sugalski <d...@sidhe.org> writes:

DS> =item All shared PMCs must have a threadsafe vtable

DS> The first thing that any vtable function of a shared PMC must do is to
DS> acquire the mutex of the PMCs in its parameter list, in ascending
DS> address order. When the mutexes are released they are not required to
DS> be released in any order.

why the ascending address order to grab mutexes? is this to help solve
deadlocks?

uri

--
Uri Guttman ------ u...@stemsystems.com -------- http://www.stemsystems.com
--Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
Search or Offer Perl Jobs ---------------------------- http://jobs.perl.org

Luke Palmer

Jan 19, 2004, 4:44:10 PM
to Dan Sugalski, perl6-i...@perl.org
Dan's thread proposal mentions:

> =item Automatic PMC sharing will be provided
>
> When a PMC is placed into a container which is shared (including
> lexical pads and global namespaces) then that PMC will automatically
> be marked as shared. It is acceptable for this to trigger an
> exception if for some reason a PMC should not be shared between
> interpreters.
>
> PMCs are, by default, not shared. This avoids sharing overhead for
> PMCs which are only used as temporaries and not shared. (Note that
> this is dangerous, and may end up not being done, due to the sharing
> of continuations)

I don't know why this is dangerous. A continuation is a data structure
just like an array or a hash. When you share it, everything "inside" it
gets shared, too. For a continuation, this could be an awful lot of
stuff, but it's the safe way.

Luke

Dan Sugalski

Jan 19, 2004, 4:45:57 PM
to Uri Guttman, perl6-i...@perl.org
At 4:37 PM -0500 1/19/04, Uri Guttman wrote:
> >>>>> "DS" == Dan Sugalski <d...@sidhe.org> writes:
>
> DS> =item All shared PMCs must have a threadsafe vtable
>
> DS> The first thing that any vtable function of a shared PMC must do is to
> DS> acquire the mutex of the PMCs in its parameter list, in ascending
> DS> address order. When the mutexes are released they are not required to
> DS> be released in any order.
>
>why the ascending address order to grab mutexes? is this to help solve
>deadlocks?

Yes. The recommendation I've always seen for deadlock avoidance is to
always have all your code grab its mutexes in some fixed order. With
source, it's generally recommended that you grab mutex variables in
lexical order. Since we're all binary we need a different order, and
ascending addresses are reasonably simple to do.

Dan Sugalski

Jan 19, 2004, 4:47:45 PM
to Luke Palmer, perl6-i...@perl.org

True, but the danger part is if we don't mark everything grabbed by a
continuation as shared by default. If we do, we might as well mark
everything as shared, as there'll be less overhead.

Dan Sugalski

Jan 19, 2004, 6:06:25 PM
to Gordon Henriksen, perl6-i...@perl.org
At 5:32 PM -0500 1/19/04, Gordon Henriksen wrote:
>Dan Sugalski wrote:
>
>> [A] copying collector is generally untenable in a threaded
>environment.
>
>Can you elaborate upon the reasoning behind this statement?
>

Sure. For a copying collector to work, all the mutators must be
blocked, and arguably all readers should be blocked as well. (I think
they need to be, otherwise you may be accessing reclaimed and reused
memory) That means if we keep a copying collector we need to have
everything that accesses strings or pmcs to get a lock on the GC
before every access of every string or PMC. A touch excessive, I
think. :)

> > The first thing that any vtable function of a shared PMC must do is to
>> acquire the mutex of the PMCs in its parameter list, in ascending
>> address order. When the mutexes are released they are not required to
>> be released in any order.
>

>Note that this locking strategy cannot avoid deadlock if the user is
>allowed to acquire these locks--HLL locks must be altogether different
>beasts from automatic PMC locks. That's okay. Just a design consequence
>worth noting for everyone.

Oh, arguably it can't avoid deadlock at all, what with vtable methods
having access to the full environment. I can live with deadlocks,
only because there's no real alternative.

Gordon Henriksen

Jan 19, 2004, 6:37:01 PM
to Dan Sugalski, perl6-i...@perl.org
Dan Sugalski wrote:

> For a copying collector to work, all the mutators must be blocked,
> and arguably all readers should be blocked as well.

True of non-moving collectors, too. Or, let me put it this way: non-
copying *GC* (the sweep or copy phase) can be threadsafe, but the mark
phase is never threadsafe. The method in which marking is not
threadsafe is a bit more pathological (i.e., it's not the common case
as it is with the copying collector), but a standard tracing DOD
cannot be correct when competing with mutators. It WILL collect non-
garbage (those are MUTATORS, remember), and the result WILL be
Heisenbugs and crashes.

Some of what I've written up addresses why. It's pretty simple to
demonstrate a single case to prove the point, but I don't feel like
re-creating the ASCII art right now. :) I'll send that section when I
get out of the office.

parrot will have to be able to suspend all threads in the environment.
Unfortunate, but really quite unavoidable.


> Oh, arguably it can't avoid deadlock at all, what with vtable methods
> having access to the full environment. I can live with deadlocks,
> only because there's no real alternative.

But PMC implementations have to fall inside of the trusted environment,
so that's not really a failure. Of course uncooperative code can break
a cooperative algorithm. :)

--

Gordon Henriksen
IT Manager
ICLUBcentral Inc.
gor...@iclub.com

Sam Vilain

Jan 19, 2004, 6:55:17 PM
to Dan Sugalski, Uri Guttman, perl6-i...@perl.org
On Tue, 20 Jan 2004 10:45, Dan Sugalski wrote:

> Yes. The recommendation I've always seen for deadlock avoidance is
> to always have all your code grab its mutexes in some fixed order.

Yes; otherwise, you need to back off and start again, if one lock
acquisition fails.

Consider these functions; for the purpose of this example, lexical
lock ordering is implied;

func1($AAAA, $CCCC, $FFFF, $GGGG, $KKKK);
func2($BBBB, $DDDD, $FFFF, $IIII, $KKKK);
func3($FFFF, $KKKK);

So long as the locks are ramped up from the lowest to the highest,
there is no chance of func1 acquiring a lock to be held by func2 (eg,
$KKKK), if that other function already has one of the shared
dependencies (eg, $FFFF).

All well and good. But, what about recursive locks?

ie

sub func1($one is locked, $two is locked, $three is locked) {
for my $x ($one, $two, $three) {
func2($x.property) if $x.property;
}
}

sub func2($eins is locked) {
if ($eins.property) {
func2($eins.property);
}
}

$aaa.property = { };
$bbb.property.property = $aaa;
$ccc = { };

if (thork()) { # fork a thread
# thread A
func1($aaa, $bbb, $ccc);
} else {
# thread B
func2($bbb.property);
}

OK, the execution order that causes the deadlock is:

1. Thread B - func2() acquires a lock on the $bbb.property PMC.
2. Thread A - func1() acquires a lock on $aaa, $bbb, $ccc.
3. Thread A - func2() acquires a lock on $aaa.property, which
returns quickly
4. Thread A - func2() blocks waiting for a lock on $bbb.property,
held by func2() in thread B
5. Thread B - func2() blocks waiting for a lock on
$bbb.property.property, held by func1() in thread A.

So, there are still possibilities for deadlocks, as the non-linear
nature of subroutine calls screws up your nice lock ramping.

In summary, acquiring mutexes in a defined order as a means to avoid
deadlocks only works when you are acquiring them all up front. If you
are acquiring any later, and you detect a deadlock (ie, a loop of
threads blocking on locks held by each other), you *must* be able to
tell one of them to "ramp off" to holding no locks at all. ie,
ROLLBACK :).

The bugger is, winding back execution state automatically to when you
last started acquiring locks is probably an intractable problem from
an interpreter's perspective...

Sounds like a job for an exception to me ;-).

for (1..50) {
eval {
func_that_acquires_first_lock($args);
};
last unless $@ and $@ =~ m/mutex deadlock/i;
}

--
Sam Vilain, s...@vilain.net

When I sell liquor, its called bootlegging; when my patrons serve it
on Lake Shore Drive, its called hospitality.
AL CAPONE

Dan Sugalski

Jan 19, 2004, 8:24:36 PM
to nigels...@btconnect.com, perl6-i...@perl.org
At 12:58 AM +0000 1/20/04, nigels...@btconnect.com wrote:
> > =item MUTEX
>>
>> This is a low level, under the hood, not exposed to users, thing that
>> can be locked. They're non-recursive, non-read/write, exclusive
>> things. When a thread gets a mutex, any other attempt to get that
>> mutex will block until the owning thread releases the mutex. The
>> platform-native lock construct will be used for this.
>>
>
>Will this be macroised to hide the platform native implementation from
>the main body of the code?

Yes.

> >
>> The "sleep until something pings me" construct. Useful for queue
>> construction. Not conceptually associated with a MUTEX. (POSIX
>> threads require this, so we're going to hide it there behind macros
>> and/or functions)
>>

>Could you elaborate on the nature of what would constitute a "ping"?

POSIX has a function cond_signal (and cond_broadcast, which is
similar). What happens is a thread grabs the condition variable and
goes to sleep, and sleeps until another thread does a cond_signal,
which then wakes it up. If there are multiple threads sleeping on the
condition variable, only one is woken. (cond_broadcast wakes them all
up)
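The classic use is a queue: the consumer re-checks its condition in a
loop around the wait, since a wakeup only means "look again", not "the
data is yours". A sketch in raw POSIX terms (queue_t and its helpers
are assumed, not real code):

#include <pthread.h>

static pthread_mutex_t q_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  q_cond = PTHREAD_COND_INITIALIZER;

void *queue_get(queue_t *q) {
    void *item;
    pthread_mutex_lock(&q_lock);
    while (queue_empty(q))                /* re-check after each wakeup */
        pthread_cond_wait(&q_cond, &q_lock);
    item = queue_pop(q);
    pthread_mutex_unlock(&q_lock);
    return item;
}

void queue_put(queue_t *q, void *item) {
    pthread_mutex_lock(&q_lock);
    queue_push(q, item);
    pthread_mutex_unlock(&q_lock);
    pthread_cond_signal(&q_cond);         /* wake exactly one sleeper */
}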

> > =item POSIX threads
>>
>> The threading scheme must be sufficient to support a POSIX
>> share-everything style of threading, such as is used in perl 5's
>> "pthread" model, as well as the thread models for Ruby and Python.
>>
>

>Thank you :)

No problem. It's the model I prefer. :)

> > =item "Process-type' threads
>>
>> The scheme must also support the perl 5 "iThreads" threading
>> model. In this model no data is shared implicitly, and all sharing
>> must be done on purpose and explicitly. It very much resembles the
>> Unix fork-process-with-shared-memory-segment model, not a surprise as
>> it was originally developed with
>>
>

>Pseudo forks?

Yeah. On Win32, when Perl forks it starts a new thread and clones the
interpreter and its contents. If you later do an exec in that thread
it starts a new process and grabs hold of it. Definitely a clever
trick.

>An interesting extract from the Windows Services for Unix docs.
>
>"The Interix subsystem shows improvements in all aspects of
>performance ranging from 30 percent improvements in fork and
>exec performance..."
>
>It's sales pitch, but I'm trying to verify it (and build parrot
>using it).

If it works, cool. We can add environment probing at configure time.
I worry some about the stability and evil involved there (I've a good
idea how much work it is to weld fork into a process with a
fundamentally different process model) but that's Microsoft's
problem, not ours.

> > =item System memory allocation routines are threadsafe
>>
>> We are assuming that the memory allocation system of the base OS is
>> threadsafe. While some of the C runtime libraries are notoriously
>> thread dangerous, memory allocation code generally is threadsafe, and
>> we'll assume that on all platforms. (Though we will, in general,
>> manage our own memory)
>>
>

>Is there any likelihood that memory allocation will be hidden behind
>macros at two levels:
> - ALLOCPOOL() for allocating large chunks of memory (pools) that are
> later sub-allocated (and managed) by:
>
> - SUBALLOC() for sub allocating within a pool

We'll generally hide behind the memory.c routines, if for no other
reason than to allow the embedder to override the memory functions.
We need to define that at some point...

> >
>> =item Each interpreter will have a unique id
>>
>> This ID will be independent of the process or OS thread, and will be
>> constant across the lifetime of the interpreter. Interpreter IDs
>> I<may> be reused as interpreters are destroyed and recreated, and as
>> such are only guaranteed valid while an interpreter is in use.
>>
>

>The provision of method(s) to obtain the native TIDs/HANDLES would
>make life for those writing implementation specific extensions easier.

TIDs, definitely. It'll be a sysinfo or interpinfo function. There's
the issue of interpreters binding to different threads, but we'll
deal with that. I'm not sure what the HANDLE is, but explain it and
we can work something out. :)

Gordon Henriksen

Jan 20, 2004, 12:04:36 AM
to Dan Sugalski, perl6-i...@perl.org
On Monday, January 19, 2004, at 06:37 , Gordon Henriksen wrote:

> Dan Sugalski wrote:
>
>> For a copying collector to work, all the mutators must be blocked,
>> and arguably all readers should be blocked as well.
>

> True of non-moving collectors, too. [...]
>
> Some of what I've written up addresses why. [...] I'll send that
> section when I get out of the office.

Consider this simple object graph:

Root set = { &A, &B }

[ A=NULL]
[ B=&C ] ---> [ C=....]

Collection begins in thread 1 and marks A as reachable before being
preempted by thread 2:

[xA=NULL]
[ B=&C ] ---> [ C=....]

Thread 2 sets A to &C, and nullifies B before being preempted again by
thread 1:

[xA=&C ] ---> [ C=....]
[ B=NULL]

Thread 1 marks B as reachable:

[xA=&C ] ---> [ C=....]
[xB=NULL]

The mark phase complete, thread 1 sweeps:

[ A=&???] ---> ???
[ B=NULL]

C was not marked reachable (although it was) and was thus erroneously
collected, leaving a dangling pointer. This problem applies equally to
copying and mark-sweep collectors.

Gordon Henriksen
mali...@mac.com

Dave Whipp

Jan 19, 2004, 7:59:42 PM
to perl6-i...@perl.org
"Dan Sugalski" <d...@sidhe.org> wrote:
> =head2 Guarantees

Maybe this isn't strictly a threading thing, but are we going to make any
guarantees about event orderings? For example, will we guarantee that a
sequence of events sent from one thread to another will always be received
in the order they are sent? (Obviously no guarantees about interleaving of
events from other threads). This is usually only important in distributed
environments where multiple paths exist between a pair of endpoints, but it
would be nice for user-code to be able to rely on it.


Dave.


nigels...@btconnect.com

Jan 19, 2004, 7:58:49 PM
to perl6-i...@perl.org
> =item MUTEX
>
> This is a low level, under the hood, not exposed to users, thing that
> can be locked. They're non-recursive, non-read/write, exclusive
> things. When a thread gets a mutex, any other attempt to get that
> mutex will block until the owning thread releases the mutex. The
> platform-native lock construct will be used for this.
>

Will this be macroised to hide the platform native implementation from
the main body of the code?

>

> The "sleep until something pings me" construct. Useful for queue
> construction. Not conceptually associated with a MUTEX. (POSIX
> threads require this, so we're going to hide it there behind macros
> and/or functions)
>

Could you elaborate on the nature of what would constitute a "ping"?
>

> =item POSIX threads
>
> The threading scheme must be sufficient to support a POSIX
> share-everything style of threading, such as is used in perl 5's
> "pthread" model, as well as the thread models for Ruby and Python.
>

Thank you :)

>
> =item "Process-type' threads
>
> The scheme must also support the perl 5 "iThreads" threading
> model. In this model no data is shared implicitly, and all sharing
> must be done on purpose and explicitly. It very much resembles the
> Unix fork-process-with-shared-memory-segment model, not a surprise as
> it was originally developed with
>

Pseudo forks?

An interesting extract from the Windows Services for Unix docs.

"The Interix subsystem shows improvements in all aspects of
performance—ranging from 30 percent improvements in fork and
exec performance..."

It's sales pitch, but I'm trying to verify it (and build parrot
using it).

>
>

> =item System memory allocation routines are threadsafe
>
> We are assuming that the memory allocation system of the base OS is
> threadsafe. While some of the C runtime libraries are notoriously
> thread dangerous, memory allocation code generally is threadsafe, and
> we'll assume that on all platforms. (Though we will, in general,
> manage our own memory)
>

Is there any likelihood that memory allocation will be hidden behind
macros at two levels:
- ALLOCPOOL() for allocating large chunks of memory (pools) that are
later sub-allocated (and managed) by:

- SUBALLOC() for sub allocating within a pool

?

>
> =item Each interpreter will have a unique id
>
> This ID will be independent of the process or OS thread, and will be
> constant across the lifetime of the interpreter. Interpreter IDs
> I<may> be reused as interpreters are destroyed and recreated, and as
> such are only guaranteed valid while an interpreter is in use.
>

The provision of method(s) to obtain the native TIDs/HANDLES would
make life for those writing implementation-specific extensions easier.

>
>[SNIP] - Too much to understand before commenting...
>

>
> --------------------------------------"it's like this"-------------------
> Dan Sugalski even samurai
> d...@sidhe.org have teddy bears and even
> teddy bears get drunk

Nigel


Leopold Toetsch

Jan 20, 2004, 7:46:07 AM
to Dan Sugalski, perl6-i...@perl.org
Dan Sugalski <d...@sidhe.org> wrote:

>=item SHARED THREAD

> A thread that's part of a group of threads sharing a common
> interpreter environment.

I presume that this "group of threads" is one thread pool or
interpreter pool. Could you expand the definition to cover "pool"?

>=item Each interpreter show the same process id

The term "process id" is really misleading. Again I presume, that one
pool is ment here:

> All the interpreters within a process will share a process ID.

I'd vote for keeping a process ID what it is and use "pool id" or such
here.

leo

Gordon Henriksen

Jan 20, 2004, 8:29:35 AM
to nigels...@btconnect.com, perl6-i...@perl.org
On Monday, January 19, 2004, at 07:58 , nigels...@btconnect.com
wrote:

> Is there any likelyhood that memory allocation will be hidden behind
> macros at two levels:
> - ALLOCPOOL() for allocating large chunks of memory (ppols) that are
> later sub-allocated (and managed) by:
>
> - SUBALLOC() for sub allocating within a pool

Are you wanting something akin to Apache 2 pools, which are hierarchical
and designed to reduce path length when freeing blocks of objects? For
instance, all objects and sub-pools allocated during an HTTP request
cycle can be deallocated just by free()'ing the top-level request pool.

I don't think parrot could make use of that model, since it can't very
well guarantee that the user cannot retain references past the lifetime
of the pool. Apache trusts modules not to make such errors; parrot can't
trust the byte-code it's executing any further than it can throw it. A
generational collector is a more likely means by which parrot might
reduce memory-related overhead.

Gordon Henriksen
mali...@mac.com

Dan Sugalski

Jan 20, 2004, 8:37:23 AM
to l...@toetsch.at, perl6-i...@perl.org
At 1:46 PM +0100 1/20/04, Leopold Toetsch wrote:
>Dan Sugalski <d...@sidhe.org> wrote:
>
>>=item SHARED THREAD
>
>> A thread that's part of a group of threads sharing a common
>> interpreter environment.
>
>I presume that this "group of threads" is one thread pool or
>interpreter pool. Could you expand the definition to cover "pool".

Ah, damn, I needed to fix that up before sending it out. Should've
been "SHARED INTERPRETER". A shared interpreter is one that's in an
interpreter pool.

An interpreter pool is a set of interpreters that share common
resources. They're essentially threads in the OS sense, and they have
shared access to pretty much everything. The memory pools, arenas,
and global namespace are shared--pretty much everything except what's
in the interpreter structure itself.

> >=item Each interpreter show the same process id
>
>The term "process id" is really misleading.

Nope. It's the process ID. Nothing misleading there, unless you've
done threading work under linux, since for a while it gave each
thread a separate PID.

> Again I presume, that one
>pool is ment here:
>
>> All the interpreters within a process will share a process ID.

Nope, again, that's really process. (Again, because of Linux' past
bad behavior reporting process IDs)

Elizabeth Mattijsen

Jan 20, 2004, 10:36:55 AM
to l...@toetsch.at, Dan Sugalski, perl6-i...@perl.org
At 16:24 +0100 1/20/04, Leopold Toetsch wrote:
>Dan Sugalski <d...@sidhe.org> wrote:
>> At 1:46 PM +0100 1/20/04, Leopold Toetsch wrote:
>>>The term "process id" is really misleading.
>
>> Nope. It's the process ID. Nothing misleading there, unless you've
>> done threading work under linux, since for a while it gave each
>> thread a separate PID.
>
>$ ps | grep [p]arrot
>17472 pts/0 00:00:00 parrot
>17473 pts/0 00:00:00 parrot
>17474 pts/0 00:00:00 parrot
>
>So the unless applies ;) This is a single parrot interpreter, with main,
>thread-manager, and event-handler thread. 3 PIDs.

This _all_ depends on which version of Linux you're using. Older
versions don't do it that way, and newer versions don't do it either
(the specific versions escape me at the moment, but I know RH9 does
_not_ have pids for threads).

I know, because my Thread::Signal module depends on threads having
pids. But fewer and fewer Linux systems "support" it (and Linux was
the only one who worked this way to begin with).


;-(


Liz

Leopold Toetsch

Jan 20, 2004, 10:24:50 AM
to Dan Sugalski, perl6-i...@perl.org
Dan Sugalski <d...@sidhe.org> wrote:
> At 1:46 PM +0100 1/20/04, Leopold Toetsch wrote:
>>The term "process id" is really misleading.

> Nope. It's the process ID. Nothing misleading there, unless you've
> done threading work under linux, since for a while it gave each
> thread a separate PID.

$ ps | grep [p]arrot
17472 pts/0 00:00:00 parrot
17473 pts/0 00:00:00 parrot
17474 pts/0 00:00:00 parrot

So the unless applies ;) This is a single parrot interpreter, with main,
thread-manager, and event-handler thread. 3 PIDs.

leo

Dan Sugalski

Jan 20, 2004, 10:57:57 AM
to l...@toetsch.at, perl6-i...@perl.org

Right. Linux is, as I said, arguably broken here, but I don't want to
argue that point. That's why I specified that the PID is the process
id of whatever instantiated parrot -- in this case, no matter which
thread you're in, when you ask for the pid from parrot you should get
17472. (In this case, at least)

Parrot needs to grab the pid at global initialization time and store
it away for later pid queries.

Leopold Toetsch

Jan 20, 2004, 11:16:57 AM
to Dan Sugalski, perl6-i...@perl.org
Dan Sugalski <d...@sidhe.org> wrote:

>=item Interpreter pools will share allocation pools

> All the interpreters in an interpreter pool will share header and
> memory allocation pools.

What about temporary PMCs (or strings)? Evaluating a non-trivial
expression can have lot of these. Each new temp would need a lock on the
memory sub-system. Combined with the cost of:

>=item All shared PMCs must have a threadsafe vtable

> The first thing that any vtable function of a shared PMC must do is to
> acquire the mutex of the PMCs in its parameter list

i.e. typically 3 mutexes, it could be that the vtable of a shared PMC
should just grab one interpreter lock (of the owner of that PMC) and
that not all is shared (especially not the temps).

leo

Dan Sugalski

Jan 20, 2004, 11:43:00 AM
to l...@toetsch.at, perl6-i...@perl.org
At 5:16 PM +0100 1/20/04, Leopold Toetsch wrote:
>Dan Sugalski <d...@sidhe.org> wrote:
>
>>=item Interpreter pools will share allocation pools
>
>> All the interpreters in an interpreter pool will share header and
>> memory allocation pools.
>
>What about temporary PMCs (or strings)?

Those won't get marked as shared unless we're unconditionally marking
things as shared. (Though we may just give 'em a mutex anyway)

> Evaluating a non-trivial
>expression can have lot of these. Each new temp would need a lock on the
>memory sub-system.

Only on allocation. We could have a per-thread freelist if we wanted.
Wouldn't be unreasonable.
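Roughly like this, as a sketch -- the arena helpers and the reuse of
next_for_GC as a link field are assumptions, not existing code:

struct free_list { PMC *head; };   /* one per thread/interpreter */

#define REFILL_COUNT 64

/* Most allocations touch no lock at all; only a refill takes the
   shared arena mutex. */
PMC *
local_pmc_alloc(struct free_list *fl)
{
    PMC *pmc;

    if (!fl->head) {                          /* slow path: refill */
        MUTEX_LOCK(arena_mutex);
        fl->head = arena_grab_chain(REFILL_COUNT);
        MUTEX_UNLOCK(arena_mutex);
    }
    pmc      = fl->head;
    fl->head = (PMC *)pmc->next_for_GC;       /* unlink */
    return pmc;
}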

> Combined with the cost of:
>
>>=item All shared PMCs must have a threadsafe vtable
>
>> The first thing that any vtable function of a shared PMC must do is to
>> acquire the mutex of the PMCs in its parameter list
>
>i.e. typically 3 mutexes, it could be that the vtable of a shared PMC
>should just grab one interpreter lock (of the owner of that PMC) and
>that not all is shared (especially not the temps).

Yeah, but since PMCs aren't owned by any one interpreter in the pool
but are rather pool-wide you run into either the global-lock problem
(which kills performance on SMP systems) or interpreters potentially
deadlocking on unrelated data access.

Leopold Toetsch

Jan 20, 2004, 12:42:04 PM
to Dan Sugalski, perl6-i...@perl.org
Dan Sugalski <d...@sidhe.org> wrote:
> At 5:16 PM +0100 1/20/04, Leopold Toetsch wrote:
>>What about temporary PMCs (or strings)?

> Those won't get marked as shared unless we're unconditionally marking
> things as shared. (Though we may just give 'em a mutex anyway)

This needs either one check per PMC, whether it's shared or not, or
additional costs for locking temps. Both are rather expensive, compared
to the raw working functionality of a vtable.

>> Evaluating a non-trivial
>>expression can have lot of these. Each new temp would need a lock on the
>>memory sub-system.

> Only on allocation.

Yes: "Each *new* temp .."

> ... We could have a per-thread freelist if we wanted.
> Wouldn't be unreasonable.

That already smells like separate memory sub-systems. The freelist has
to be filled first. During DOD runs, it has to be refilled. To achieve
some locality of the temp PMCs, you can't just give these to arbitrary
interpreters. Separate free lists seem also to imply separate DOD runs to
cleanup the temps.

>>i.e. typically 3 mutexes, it could be that the vtable of a shared PMC
>>should just grab one interpreter lock (of the owner of that PMC) and
>>that not all is shared (especially not the temps).

> Yeah, but since PMCs aren't owned by any one interpreter in the pool

The owner is, for me, the interpreter that allocated the arena.

> ... but are rather pool-wide you run into either the global-lock problem
> (which kills performance on SMP systems) or interpreters potentially
> deadlocking on unrelated data access.

Could be. But the performance is the overall throughput. When a lot of
fine-grained locks (plus memory subsystem locks for temps) take much
longer than one global lock, then that scheme can nevertheless be
slower on SMP machines. It would scale better though for more CPUs.

Deadlocks shouldn't be a problem, if exactly one vtable (like in
SharedRef) just grabs one and only one lock.

leo

Gordon Henriksen

Jan 20, 2004, 9:12:09 PM
to nigels...@btconnect.com, perl6-i...@perl.org
On Tuesday, January 20, 2004, at 07:56 , nigels...@btconnect.com
wrote:

> I believe that parrot already has the concept of memory pools in it's
> memory management. The idea is that by allocating similarly sized
> objects from separate (large) allocations, you can reduce the
> fragmentation of chunks and reduce the incidences where the memory need
> to be GC'd and compacted.

Don't presume that. General purpose memory allocators tend to handle
this scenario... generically. Use of custom allocators is usually a
premature optimization; the best generic memory allocators are very,
very good.

But the ultimate effect of the overuse of custom allocators or pools is
to move memory fragmentation to a higher level, preventing the generic
allocator from addressing it even when the memory allocation patterns
would otherwise allow it.

> If the allocation of pools, and the allocation of bit-of-a-pool, are
> macroised, it makes it possible for OS's where there are multiple APIs
> for memory allocation to bypass the CRT memory allocation routines and
> use which ever native APis are best suited for the type of allocation.

Would much rather see them abstracted, as they are now, beneath an API.
This reduces the number of ways in which compiled parrot extensions can
be incompatible with the parrot core.

> Personally, I would like to see memory allocation for each class type
> be managed by the class constructor itself. This would theoretically
> allow each class that has a fixed instance size to manage it's own pool
> on OS's where that makes sense. The class would allocate a pool for
> itself when loaded and then allocate instances from that pool on new()
> and deallocate upon DESTROY. If it's memory pool was exhausted when new
> was called, it would invoke the GC on *it's pool only*.

A PMC class is free to do precisely that. Only the PMC header cannot be
thus managed, and that's already pooled.

> This separation would mean that each run of the GC would have a much
> smaller pool of memory to compact and garbage collect when it was
> invoked.

This is false. The mark phase will still need to run over the entire
process, else it cannot detect all references into the pool. (Unless you
can provide a type-safety guarantee that your type can only be
referenced by other instances of itself. In which case, your type is
necessarily "garbage" and the best optimization strategy would be
dead-code elimination. :)

I predict the best overall throughput would be with the sweep or copy
phase run immediately after the mark phase, across the entire process:
It would be wasteful to do all that marking and yet leave known garbage
uncollected.

> It would also be less likely to be called, as each allocation from a
> pool of fixed sized sub allocations will only ever need to call the GC
> when it's pool is entirely exhausted.

Statistically, multiple pools will individually exhaust themselves
*MORE* frequently than a global pool. The ideal case for collection
frequency is that there is only one pool, or that all pools rise to
capacity concurrently. In these ideal cases, the segregated pools will
exhaust themselves precisely *AS* frequently as a global pool. In any
case, there is no possibility for a decrease in collection events
through the use of pooled memory. As per above, collection events will
also not become less expensive. Thus, expect garbage collection to have
a negative net impact on performance if pooled memory is used.

> But that is a radical departure :), so if would just like to see
> separate calls for pool allocation/reallocation and element
> allocation/reallocation, rather than having calls to malloc() scattered
> through out the codebase.

Uh. See memory.c and smallobject.c...


Digression:

Most copying collectors avoid free space fragmentation by design.
(That's one of their benefits.) An allocator as might be employed by a
standard compacting, generational system (Java, MS CLR), might resemble
the following:

#define SYS_ALIGN sizeof(double) /* for instance */

struct generation {
    char * volatile next;   /* the pointer itself is what's volatile */
    char *top;
    char *base;
};

struct generation *generation_init(size_t capacity) {
    struct generation *g =
        (struct generation *) malloc(sizeof(struct generation));
    g->next = g->base = (char *) malloc(capacity);
    g->top = g->base + capacity;
    return g;
}

void *generation_alloc(struct generation *p, size_t len) {
    len = (len + SYS_ALIGN - 1) & ~(SYS_ALIGN - 1); /* round up */
    do {
        /* ATOMIC_ADD is assumed to add len and return the new value */
        char *ptr = ATOMIC_ADD(&p->next, len);
        if (ptr <= p->top) {
            return ptr - len; /* break the loop */
        }
        gc();
    } while (1); /* try again */
}

+ No free space fragmentation at all.
+ Super-duper fast.
+ Threadsafe.
+ Lock-free unless ATOMIC_ADD cannot be implemented without a mutex.
- There is copying overhead when the generation is exhausted.
- One could say that the generation is fragmented by garbage.
+ It is no more fragmented by garbage than a GC system which uses a
freelist allocator.

Gordon Henriksen
mali...@mac.com

nigels...@btconnect.com

Jan 20, 2004, 7:56:30 PM
to Gordon Henriksen, perl6-i...@perl.org
20/01/2004 13:29:35, Gordon Henriksen <mali...@mac.com> wrote:

>On Monday, January 19, 2004, at 07:58 , nigels...@btconnect.com
>wrote:
>
>> Is there any likelyhood that memory allocation will be hidden behind
>> macros at two levels:
>> - ALLOCPOOL() for allocating large chunks of memory (ppols) that are
>> later sub-allocated (and managed) by:
>>
>> - SUBALLOC() for sub allocating within a pool
>
>Are you wanting something akin to Apache 2 pools, which are hierarchical
>and designed to reduce path length when freeing blocks of objects? For
>instance, all objects and sub-pools allocated during an HTTP request
>cycle can be deallocated just by free()'ing the top-level request pool.
>

Nothing to do with Apache memory pools.

I believe that parrot already has the concept of memory pools in its
memory management. The idea is that by allocating similarly sized objects
from separate (large) allocations, you can reduce the fragmentation of
chunks and reduce the incidences where the memory needs to be GC'd and
compacted.

Allocating an 8 byte chunk from a common memory pool is quite likely to
nip a little off from a previously freed large chunk. When it comes time
to reallocate another chunk the same size as that large, freed chunk,
although there is enough room in the overall freespace chain to
accommodate it, the
largest available chunk is now 8 bytes or so too small for the requirement.

That induces either a compaction cycle or the need to extend the memory pool
by the size of the large request.

Allocating all small requests from the same pool, and large from another
pool means that you are less likely to fragment the memory and more likely
to be able to re-use an existing slot in the free-space chain for any
given request.

If the allocation of pools, and the allocation of bit-of-a-pool, are
macroised, it makes it possible for OS's where there are multiple APIs
for memory allocation to bypass the CRT memory allocation routines and
use which ever native APis are best suited for the type of allocation.

Personally, I would like to see memory allocation for each class type
be managed by the class constructor itself. This would theoretically allow
each class that has a fixed instance size to manage its own pool on OS's
where that makes sense. The class would allocate a pool for itself when
loaded and then allocate instances from that pool on new() and deallocate
upon DESTROY. If its memory pool was exhausted when new was called,
it would invoke the GC on *its pool only*.

This separation would mean that each run of the GC would have a much smaller
pool of memory to compact and garbage collect when it was invoked. It would
also be less likely to be called, as each allocation from a pool of fixed-
sized sub-allocations will only ever need to call the GC when its pool is
entirely exhausted.

But that is a radical departure :), so I would just like to see separate calls
for pool allocation/reallocation and element allocation/reallocation, rather
than having calls to malloc() scattered throughout the codebase.

>
>I don't think parrot could make use of that model, since it can't very
>well guarantee that the user cannot retain references past the lifetime
>of the pool. Apache trusts modules not to make such errors; parrot can't
>trust the byte-code it's executing any further than it can throw it. A
>generational collector is a more likely means by which parrot might
>reduce memory-related overhead.
>
>
>
>Gordon Henriksen
>mali...@mac.com
>
>

Nigel/

Leopold Toetsch

Jan 21, 2004, 3:47:29 AM
to nigels...@btconnect.com, perl6-i...@perl.org
nigels...@btconnect.com <nigels...@btconnect.com> wrote:

> I believe that parrot already has the concept of memory pools in its
> memory management. The idea is that by allocating similarly sized objects

The problem is not in the fixed-sized header pools, it's with the
*memory* pool used e.g. for string memory.

During GC we are walking the *header* pools, and if we find its buffer
memory being used, we move that buffer memory to a new store, thereby
compacting it. The old memory store(s) are freed then.

So string memory can move around beyond your code.

> Nigel/

leo

Leopold Toetsch

Jan 21, 2004, 4:25:26 AM
to Dan Sugalski, perl6-i...@perl.org
Dan Sugalski <d...@sidhe.org> wrote:

One more question:

>=head2 Guarantees

... doesn't have anything about user data integrity. So when 2 threads
access a PerlNum, they could get a mixture of the typically 2 involved
words.

But:

>=item All shared PMCs must have a threadsafe vtable

> The first thing that any vtable function of a shared PMC must do is to
> acquire the mutex of the PMCs in its parameter list

... seems to indicate that even whole ops like add P,P,P are atomic.

And how does user level locking play together with that?

leo

nigels...@btconnect.com

Jan 20, 2004, 10:46:18 PM
to Gordon Henriksen, perl6-i...@perl.org
21/01/2004 02:12:09, Gordon Henriksen <mali...@mac.com> wrote:

> This is false. The mark phase will still need to run over the entire
> process, else it cannot detect all references into the pool.
>

If by reference, you mean address, then that is true.

If, when a reference is taken, the address of the referent is
stored in arbitrary other data structures, then all memory must
be scanned by the GC.

However, if references were not addresses, but an index into a
table of addresses, then only the table need be scanned.

When a reference, or the thing holding it, is deleted, then
the indexed slot in the address table is blanked and subsequent
GC passes looking for references to a referent will no longer
find the reference in the table.

With the addition of a reference count to the table, all references
to a single entity can share the same index, but now I'm groping
back in the direction of reference counting, which is not flavour
of the month:)

Nigel.


Dan Sugalski

Jan 21, 2004, 1:14:46 PM
to l...@toetsch.at, perl6-i...@perl.org
At 10:25 AM +0100 1/21/04, Leopold Toetsch wrote:
>Dan Sugalski <d...@sidhe.org> wrote:
>
>One more question:
>
>>=head2 Guarantees
>
>... doesn't have anything about user data integrity. So when 2 threads
>access a PerlNum, they could get a mixture of the typically 2 involved
>words.

Potentially, yeah, though it's really unlikely.

>But:
>
>>=item All shared PMCs must have a threadsafe vtable
>
>> The first thing that any vtable function of a shared PMC must do is to
>> acquire the mutex of the PMCs in its parameter list
>
>... seems to indicate that even whole ops like add P,P,P are atomic.

Yep. They have to be, because they need to guarantee the integrity of
the pmc structures and the data hanging off them (which includes
buffer and string stuff)

>And how does user level locking play together with that?

I've not decided -- That was something I thought we might hash out as
we abused the first half of the design doc. Personally I'm all for
the user and low-level lock being the same thing, but that doesn't
have to be the case. There are advantages and disadvantages to either
way of doing things.

Leopold Toetsch

Jan 21, 2004, 6:15:20 PM
to Dan Sugalski, perl6-i...@perl.org
Dan Sugalski <d...@sidhe.org> wrote:

[ No guarantees WRT data access ]

>>... seems to indicate that even whole ops like add P,P,P are atomic.

> Yep. They have to be, because they need to guarantee the integrity of
> the pmc structures and the data hanging off them (which includes
> buffer and string stuff)

But isn't that a contradiction? Or better: when even a whole opcode like
the above is atomic, then access to a shared PerlNum should be guaranteed
to be atomic too.

leo

Damien Neil

Jan 21, 2004, 4:32:52 PM
to perl6-i...@perl.org
On Wed, Jan 21, 2004 at 01:14:46PM -0500, Dan Sugalski wrote:
> >... seems to indicate that even whole ops like add P,P,P are atomic.
>
> Yep. They have to be, because they need to guarantee the integrity of
> the pmc structures and the data hanging off them (which includes
> buffer and string stuff)

Personally, I think it would be better to use corruption-resistant
buffer and string structures, and avoid locking during basic data
access. While there are substantial differences in VM design--PMCs
are much more complicated than any JVM data type--the JVM does provide
a good example that this can be done, and done efficiently.

Failing this, it would be worth investigating what the real-world
performance difference is between acquiring multiple locks per VM
operation (current Parrot proposal) vs. having a single lock
controlling all data access (Python) or jettisoning OS threads
entirely in favor of VM-level threading (Ruby). This forfeits the
ability to take advantage of multiple CPUs--but Leopold's initial
timing tests of shared PMCs were showing a potential 3-5x slowdown
from excessive locking.

I've seen software before that was redesigned to take advantage of
multiple CPUs--and then required no less than four CPUs to match
the performance of the older, single-CPU version. The problem was
largely attributed to excessive locking of mostly-uncontested data
structures.

- Damien

Dan Sugalski

Jan 22, 2004, 1:10:26 PM
to l...@toetsch.at, perl6-i...@perl.org

Sure, but there's a *lot* more to user data integrity than atomic
access to individual pieces. That's the least of the problem. The
user data issue is one where you have multiple pieces being updated,
or one piece being updated multiple times--that's the stuff we're not
guaranteeing.

So, while we will make sure that storing a single value into a hash
happens atomically, we won't guarantee that a series of stores into
the hash, or a combination of loads and stores, or even a combination
of reads and writes to a scalar, will happen atomically.
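In pthread terms the distinction looks like this -- the struct and
lock are stand-ins for a shared PMC and the eventual user-level LOCK,
not real Parrot structures:

#include <pthread.h>

typedef struct { pthread_mutex_t lock; long lo, hi; } shared_pair;

/* Each single store below is atomic under the proposal, but a reader
   may still observe the state between the two stores.  Making the
   *pair* atomic is the user's job, via the HLL lock. */
void update_pair(shared_pair *p, long lo, long hi) {
    pthread_mutex_lock(&p->lock);    /* the user-level LOCK */
    p->lo = lo;
    p->hi = hi;
    pthread_mutex_unlock(&p->lock);
}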

Dan Sugalski

Jan 22, 2004, 1:22:22 PM
to Gordon Henriksen, perl6-i...@perl.org
[Note to everyone -- I'm digging through my mail so be prepared for a
potential set of responses to things that're already answered...]

At 6:37 PM -0500 1/19/04, Gordon Henriksen wrote:
>Dan Sugalski wrote:
>
>> For a copying collector to work, all the mutators must be blocked,
>> and arguably all readers should be blocked as well.
>
>True of non-moving collectors, too. Or, let me put it this way: non-
>copying *GC* (the sweep or copy phase) can be threadsafe, but the mark
>phase is never threadsafe. The method in which marking is not
>threadsafe is a bit more pathological (i.e., it's not the common case
>as it is with the copying collector), but a standard tracing DOD
>cannot be correct when competing with mutators. It WILL collect non-
>garbage (those are MUTATORS, remember), and the result WILL be
>Heisenbugs and crashes.
>
>Some of what I've written up addresses why. It's pretty simple to
>demonstrate a single case to prove the point, but I don't feel like
>re-creating the ASCII art right now. :) I'll send that section when I
>get out of the office.
>
>parrot will have to be able to suspend all threads in the environment.
>Unfortunate, but really quite unavoidable.

I'm not sure that's the case. What we need to do is suspend metadata
mutation--that is, buffers can't be resized while a gc run is in
progress. Other than that, if we have guarantees of aligned atomic
pointer writes (so we don't have word shearing to deal with) we don't
have to stop the mutation of the contents of the buffers themselves.

The only tricky bit comes in with the examination of the root set of
other threads--accessing the hardware register contents of another
running thread may be... interesting. (Which, I suppose, argues for
some sort of volatile marking of the temp variables)

Dan Sugalski

Jan 22, 2004, 1:36:59 PM
to Dave Whipp, perl6-i...@perl.org
At 4:59 PM -0800 1/19/04, Dave Whipp wrote:
>"Dan Sugalski" <d...@sidhe.org> wrote:
>> =head2 Guarantees
>
>Maybe this isn't strictly a threading thing, but are we going to make any
>guarantees about event orderings? For example, will we guarantee that a
>sequence of events send from one thread to another will always be received
>in the order they are sent?

Hrm. I suppose we really ought to, so we will. If we prioritize
events (and I'm still torn) then they'll be in priority then send
order.

Leopold Toetsch

Jan 22, 2004, 4:40:31 PM
to Dan Sugalski, perl6-i...@perl.org
Dan Sugalski <d...@sidhe.org> wrote:

> The only tricky bit comes in with the examination of the root set of
> other threads--accessing the hardware register contents of another
> running thread may be... interesting. (Which, I suppose, argues for
> some sort of volatile marking of the temp variables)

You'll provide the "interesting" part, that is:

use Psi::Estimate::CPU_Register_Changes_in_Future_till_mark_is_done;

SCNR, leo

Dan Sugalski

Jan 23, 2004, 8:54:50 AM
to l...@toetsch.at, perl6-i...@perl.org

Nah, no need for that one. I need to go back and recheck the stuff
that Gordon posted in case I missed something, but if you put a lock
on the arena allocator this isn't an issue.

Dan Sugalski

Jan 23, 2004, 11:09:04 AM
to Gordon Henriksen, perl6-i...@perl.org
At 12:04 AM -0500 1/20/04, Gordon Henriksen wrote:
>On Monday, January 19, 2004, at 06:37 , Gordon Henriksen wrote:
>
>>Dan Sugalski wrote:
>>
>>>For a copying collector to work, all the mutators must be blocked,
>>>and arguably all readers should be blocked as well.
>>
>>True of non-moving collectors, too. [...]
>>
>>Some of what I've written up addresses why. [...] I'll send that
>>section when I get out of the office.
>
>Consider this simple object graph:
>
>C was not marked reachable (although it was) and was thus
>erroneously collected, leaving a dangling pointer. This problem
>applies equally to copying and mark-sweep collectors.

Ah, OK, I see. The problem comes in where we've got an object in the
transient root set (basically the processor stack and registers) that
gets anchored into the base root set (stash, pads, or whatever) after
the DOD has already traced the spot it gets anchored into, and then
falls out of the transient root set before the DOD traces over to it.

Race condition. Dammit.
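
For concreteness, here's a single-threaded C simulation of the race.
The object graph is a reconstruction, not Gordon's original diagram:
A sits in the base root set, C is reachable only through a transient
root B, and the mutation happens between the two trace steps:

    #include <stdio.h>

    typedef struct Obj {
        struct Obj *ref;
        int         marked;
    } Obj;

    static void mark(Obj *o)
    {
        while (o && !o->marked) {
            o->marked = 1;
            o = o->ref;
        }
    }

    int main(void)
    {
        Obj c = { 0, 0 };      /* the victim */
        Obj b = { &c, 0 };     /* transient root: register/stack temp */
        Obj a = { 0, 0 };      /* base root: stash, pad, whatever */

        mark(&a);              /* DOD traces the base root set first */

        a.ref = &c;            /* mutator anchors C into the traced root */
        b.ref = 0;             /* ... and drops it from the transient root */

        mark(&b);              /* DOD traces the transient roots: too late */

        printf("C live via a.ref, marked=%d => collected anyway\n",
               c.marked);
        return 0;
    }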

Okay, I'd not wrapped my brain around that possibility, which will
make for some interesting DOD tracing, especially on SMP systems. I
was *really* hoping that a single lock on the arena allocation system,
held by the DOD while tracing, would be sufficient, but I see that it
isn't.

That means we're going to need either a really forgiving DOD system
that takes multiple passes before it collects a PMC or buffer (which
still isn't safe), or some way to force a low-overhead rendezvous.

The obvious rendezvous point is the arena lock, but that's going to
see a lot of contention anyway, and we'd as soon not have a single
one for speed reasons. Feh.

Okay, we're going to mandate that we have read/write locks, each
interpreter pool has one, and mutating vtable entries must get a read
lock on the pool read/write lock. Pricey (ick) but isolated to
mutators. The DOD gets a write lock on it, which'll block all
read/write access so no mutators can be in process while the pool DOD
runs.
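
A minimal POSIX sketch of that scheme (type and function names are
illustrative, not the actual Parrot API):

    #include <pthread.h>

    typedef struct PMC PMC;    /* opaque here, illustrative only */

    /* One per interpreter pool. */
    static pthread_rwlock_t pool_gc_lock = PTHREAD_RWLOCK_INITIALIZER;

    /* Mutating vtable entries take the lock SHARED: mutators run
       concurrently with each other, but never with the DOD. */
    void pmc_mutate(PMC *self)
    {
        pthread_rwlock_rdlock(&pool_gc_lock);
        /* ... resize, rebless, or otherwise mutate metadata ... */
        pthread_rwlock_unlock(&pool_gc_lock);
    }

    /* The DOD takes the lock EXCLUSIVE, so no mutator is mid-flight
       while the pool is being traced. */
    void pool_dod_run(void)
    {
        pthread_rwlock_wrlock(&pool_gc_lock);
        /* ... trace root sets, mark, sweep ... */
        pthread_rwlock_unlock(&pool_gc_lock);
    }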

I think that'll work. The .1j thread spec requires r/w locks, and we
can fake them on platforms that don't implement them. Hopefully
Nigel's got the Windows scoop so we can see if Win32 has anything
like this (which I'd expect it does).

Leopold Toetsch

unread,
Jan 23, 2004, 11:56:03 AM1/23/04
to Dan Sugalski, perl6-i...@perl.org
Dan Sugalski <d...@sidhe.org> wrote:
> At 12:04 AM -0500 1/20/04, Gordon Henriksen wrote:
>>C was not marked reachable (although it was) and was thus
>>erroneously collected, leaving a dangling pointer. This problem
>>applies equally to copying and mark-sweep collectors.

> Ah, OK, I see.

That is well known in the literature as the "Tri-Color Invariant": black
are the already marked (live) PMCs, grey the PMCs on the next_for_GC
list, white the not yet reached PMCs. The strong tri-color invariant
states that no black object may point to a white object; the weak
invariant states that at least one path from a black to a white object
must contain a grey one.

This can be handled either by stop-the-world GC or by intercepting each
read or write access that would change the color of an object and
updating the color accordingly. This is used e.g. for incremental GC. As
soon as we have a thread in the background that runs GC, we have to cope
with these issues.
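
A minimal sketch of the write-barrier half of that, reusing the
next_for_GC linkage for the grey list (illustrative, not Parrot's
actual structures):

    typedef enum { WHITE, GREY, BLACK } Colour;

    typedef struct Obj {
        Colour      colour;
        struct Obj *next_for_GC;    /* grey list linkage */
        struct Obj *field;          /* some pointer slot */
    } Obj;

    static Obj *grey_list;

    static void make_grey(Obj *o)
    {
        o->colour      = GREY;
        o->next_for_GC = grey_list;
        grey_list      = o;
    }

    /* Barrier on every store "parent->field = child": preserves the
       strong invariant by re-greying any white object that a black
       one comes to point at. */
    void write_barrier(Obj *parent, Obj *child)
    {
        parent->field = child;
        if (child && parent->colour == BLACK && child->colour == WHITE)
            make_grey(child);
    }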

> That means we're going to need either a really forgiving DOD system
> that takes multiple passes before it collects a PMC or buffer (which
> still isn't safe)

Alas, that's not an alternative; it doesn't work.

> ... or some way to force a
> low-overhead rendezvous.

> Okay, we're going to mandate that we have read/write locks, each
> interpreter pool has one, and mutating vtable entries must get a read
> lock on the pool read/write lock. Pricey (ick) but isolated to
> mutators. The DOD gets a write lock on it, which'll block all
> read/write access so no mutators can be in process while the pool DOD
> runs.

Stopping all interpreters seems to be cheaper. The rwlock will sooner or
later stop all interpreters anyway (on first PMC access), so we can omit
the price for the rwlock and just hold the world(s).
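
For comparison, a minimal "hold the world(s)" sketch, assuming every
interpreter polls a safepoint between ops (names illustrative, not
Parrot API):

    #include <pthread.h>

    static pthread_mutex_t stw_mx = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  stw_cv = PTHREAD_COND_INITIALIZER;
    static int world_stopped;     /* set by the DOD thread */
    static int parked;            /* interpreters waiting in safepoint() */

    /* Each interpreter calls this between ops / on allocation. */
    void safepoint(void)
    {
        pthread_mutex_lock(&stw_mx);
        while (world_stopped) {
            parked++;
            pthread_cond_broadcast(&stw_cv);  /* tell the DOD we parked */
            pthread_cond_wait(&stw_cv, &stw_mx);
            parked--;
        }
        pthread_mutex_unlock(&stw_mx);
    }

    /* DOD thread: park everyone else, trace, restart the world. */
    void dod_run(int n_interpreters)
    {
        pthread_mutex_lock(&stw_mx);
        world_stopped = 1;
        while (parked < n_interpreters - 1)   /* everyone but us */
            pthread_cond_wait(&stw_cv, &stw_mx);
        /* ... trace root sets, mark, sweep ... */
        world_stopped = 0;
        pthread_cond_broadcast(&stw_cv);
        pthread_mutex_unlock(&stw_mx);
    }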

An alternative would be real background incremental GC, *when* running
multiple threads. I estimate the overhead to be in the region of a
rwlock (with no contention, of course).

leo

Dan Sugalski

unread,
Jan 23, 2004, 12:21:29 PM1/23/04
to l...@toetsch.at, perl6-i...@perl.org
At 5:56 PM +0100 1/23/04, Leopold Toetsch wrote:
>Dan Sugalski <d...@sidhe.org> wrote:
>> At 12:04 AM -0500 1/20/04, Gordon Henriksen wrote:
>>>C was not marked reachable (although it was) and was thus
>>>erroneously collected, leaving a dangling pointer. This problem
>>>applies equally to copying and mark-sweep collectors.
>
>> Ah, OK, I see.
>
>That is well known in the literature as the "Tri-Color Invariant": black
>are the already marked (live) PMCs, grey the PMCs on the next_for_GC
>list, white the not yet reached PMCs. The strong tri-color invariant
>states that no black object may point to a white object; the weak
>invariant states that at least one path from a black to a white object
>must contain a grey one.
>
>This can be handled either by stop-the-world GC or by intercepting each
>read or write access that would change the color of an object and
>updating the color accordingly. This is used e.g. for incremental GC. As
>soon as we have a thread in the background that runs GC, we have to cope
>with these issues.

Yeah, point. And since we want to be able to have an incremental DOD
at some point we need to get support for it in now.

>> ... or some way to force a
>> low-overhead rendezvous.
>
>> Okay, we're going to mandate that we have read/write locks, each
>> interpreter pool has one, and mutating vtable entries must get a read
>> lock on the pool read/write lock. Pricey (ick) but isolated to
>> mutators. The DOD gets a write lock on it, which'll block all
>> read/write access so no mutators can be in process while the pool DOD
>> runs.
>
>Stopping all interpreters seems to be cheaper. The rwlock will sooner or
>later stop all interpreters anyway (on first PMC access), so we can omit
>the price for the rwlock and just hold the world(s).

The rwlock only stops all the interpreters when the DOD runs.
Anything that mutates a PMC takes a *read* lock, so mutators don't
block each other, and only pause if the DOD is running. The DOD
getting a *write* lock will block any read lock attempts, so when the
DOD is running no mutation can take place. Since mutation doesn't
require any global exclusion, it doesn't need a write lock -- the read
lock is sufficient.

>An alternative would be real background incremental GC, *when* running
>multiple threads. I estimate the overhead to be in the region of a
>rwlock (with no contention, of course).

If we have the facilities to do incremental DOD runs then this is
definitely a possibility except for finalizers. Finalizers make
things interesting, though if the background thread doing the DOD is
a member of the interpreter pool then it'd work out OK.

Gordon Henriksen

unread,
Jan 23, 2004, 11:26:42 PM1/23/04
to Dan Sugalski, perl6-i...@perl.org
On Friday, January 23, 2004, at 11:09 , Dan Sugalski wrote:

> Ah, OK, I see. The problem comes in where we've got an object in the
> transient root set (basically the processor stack and registers) that
> gets anchored into the base root set (stash, pads, or whatever) after
> the DOD has already traced the spot it gets anchored into, and then
> falls out of the transient root set before the DOD traces over to it.

(Worse than that. It could come from any untraced location—or possibly
even be brand new, depending upon memory allocation details.)

Gordon Henriksen
mali...@mac.com

Leopold Toetsch

unread,
Jan 24, 2004, 9:08:08 AM1/24/04
to Dan Sugalski, perl6-i...@perl.org
Dan Sugalski <d...@sidhe.org> wrote:
> At 5:56 PM +0100 1/23/04, Leopold Toetsch wrote:

>>Stopping all interpreters seems to be cheaper. The rwlock will sooner or
>>later stop all interpreters anyway (on first PMC access), so we can omit
>>the price for the rwlock and just hold the world(s).

> The rwlock only stops all the interpreters when the DOD runs.

Sure. But that would still need to acquire a reader's rwlock for each
PMC access. This is more expensive than a mutex. During DOD any PMC
access will halt the interpreter, so we can do that explicitly too and
save the cost of the rwlock overhead. Albeit I can imagine that
aggregates will need a rwlock anyway.

[ incremental GC ]

> If we have the facilities to do incremental DOD runs then this is
> definitely a possibility except for finalizers. Finalizers make
> things interesting, though if the background thread doing the DOD is
> a member of the interpreter pool then it'd work out OK.

Finalizers and incremental DOD don't play together. The DOD must run
to the end to be sure that the object isn't referenced any more.

leo

Dan Sugalski

unread,
Jan 29, 2004, 4:55:10 AM1/29/04
to l...@toetsch.at, perl6-i...@perl.org
At 3:08 PM +0100 1/24/04, Leopold Toetsch wrote:
>Dan Sugalski <d...@sidhe.org> wrote:
>> At 5:56 PM +0100 1/23/04, Leopold Toetsch wrote:
>
>>>Stopping all interpreters seems to be cheaper. The rwlock will sooner or
>>>later stop all interpreters anyway (on first PMC access), so we can omit
>>>the price for the rwlock and just hold the world(s).
>
>> The rwlock only stops all the interpreters when the DOD runs.
>
>Sure. But that would still need to acquire a reader's rwlock for each
>PMC access. This is more expensive than a mutex. During DOD any PMC
>access will halt the interpreter, so we can do that explicitly too and
>save the cost of the rwlock overhead. Albeit I can imagine that
>aggregates will need a rwlock anyway.

Well... only the mutating vtable entries need to get the lock, so
that reduces the expense somewhat. Still, I agree, it may be
untenably expensive.

>[ incremental GC ]
>
>> If we have the facilities to do incremental DOD runs then this is
>> definitely a possibility except for finalizers. Finalizers make
>> things interesting, though if the background thread doing the DOD is
>> a member of the interpreter pool then it'd work out OK.
>
>Finalizers and incremental DOD don't play together. The DOD must run
>to the end to be sure that the object isn't referenced any more.

Finalizers and incremental DOD work just fine together. At some point
the incremental DOD will figure out that something's dead, just as
the stop-the-world DOD will. It just may take a bit longer.

Leopold Toetsch

unread,
Jan 30, 2004, 10:38:36 AM1/30/04
to Dan Sugalski, perl6-i...@perl.org
Dan Sugalski <d...@sidhe.org> wrote:
> At 3:08 PM +0100 1/24/04, Leopold Toetsch wrote:

>>Finalizers and incremental DOD don't play together. The DOD must run
>>to the end to be sure that the object isn't referenced any more.

> Finalizers and incremental DOD work just fine together. At some point
> the incremental DOD will figure out that something's dead, just as
> the stop-the-world DOD will. It just may take a bit longer.

I wanted to say: "Finalizers & destructors of PMCs that need timely
destruction ..." -- that is, in the case of dead objects at scope exit.

leo
