threads and shared interpreter data structures

Leopold Toetsch

Dec 21, 2003, 5:09:10 AM

to P6I, Dan Sugalski

I'm currently investigating various issues related to internal
interpreter data structures and multiple threads.
Here is one that needs a design decision:

Parrot_base_vtables[] (the master array of all registered vtables) is
currently a true global. This causes nice errors and segfaults on
i386/JIT when NCI calls are JITted, because the JITted method call stub
is located in that global Parrot_base_vtables[].
(Currently Parrot_base_vtables[] is checked to be set only once and
JITted NCI is disabled.)

The question is: Should Parrot_base_vtables[] be a real global or per
interpreter? This is of course related to the question of how dynamic
we want registered PMCs to be, that is: Can different threads register
different PMCs? Do ParrotClass objects get an entry in that table
(currently yes)? The class_hash is per interpreter, so the two don't fit
together either.

Trying an answer:
*If* ParrotClasses are per thread (very likely yes - different threads
might create different objects of different classes dynamically) *and
if* Parrot_class_register() creates entries in Parrot_base_vtables[],
then this structure has to be per interpreter too.

This also implies that we have to look up dynamically registered PMCs
and ParrotClasses by name and not by ID, at least for the general case.

Comments very welcome,
leo

Elizabeth Mattijsen

Dec 21, 2003, 6:15:09 AM

to Leopold Toetsch, P6I, Dan Sugalski

Please note that these are comments from a Parrot list lurker and
outsider, but also as someone with some hands on experience with Perl
threads... ;-) And probably stating the bleedingly obvious.

At 11:09 +0100 12/21/03, Leopold Toetsch wrote:
> *If* ParrotClasses are per thread (very likely yes - different
>threads might create different objects of different classes
>dynamically) *and if* Parrot_class_register() creates entries in
>Parrot_base_vtables[], then this structure has to be per interpreter
>too.

I agree.

Ideally I'd see a COWed structure: a thread startup would not
actually copy the main vtable structure. Only when something needs
to be specific to the thread does that structure need to be
copied for the thread (and possibly only the part that is actually
different from the thread it is inheriting from).

The main problem with Perl 5 ithreads is the thread startup CPU and
memory usage. It's what makes Perl 5 ithreads _very_ hard to use in
a production environment. That overhead needs to be prevented at all
costs in Parrot.

Hope this made sense.

Liz

Leopold Toetsch

Dec 21, 2003, 8:17:09 AM

to Elizabeth Mattijsen, perl6-i...@perl.org

Elizabeth Mattijsen <l...@dijkmat.nl> wrote:

Hi Liz,

> Please note that these are comments from a Parrot list lurker and
> outsider, but also as someone with some hands on experience with Perl
> threads... ;-) And probably stating the bleedingly obvious.

I appreciate your (and of course others') comments very much, all the
more because my experience with the topic and all related issues is very
limited.

> At 11:09 +0100 12/21/03, Leopold Toetsch wrote:
>> *If* ParrotClasses are per thread (very likely yes - different
>>threads might create different objects of different classes
>>dynamically) *and if* Parrot_class_register() creates entries in
>>Parrot_base_vtables[], then this structure has to be per interpreter
>>too.

> I agree.

> Ideally I'd see a COWed structure: a thread startup would not
> actually copy the main vtable structure. As soon as something needs
> to be specific for the thread, only then needs that structure to be
> copied for the thread (and possibly only that part that is actually
> different from the thread a thread is inheriting from.

Sounds reasonable, yes. Given that we go for the dynamic case cited
above, I can imagine that we finally have:
1) a real global Parrot_base_vtables[] containing pointers to the local
*static* temp_base_vtable inside each PMC class's C file.
2) another per-interpreter vtable array holding all dynamic entries

Access to the former is by index like now; for the latter, the index
has to be obtained once at runtime by name lookup.
1) is for all static classes: PMCs inside parrot and ParrotClasses
loaded into the first interpreter from the PBC metadata.

And:

3) a separate array for the JITed method stubs, which are per interpreter
and exist only if the platform can generate such stubs on the fly.
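
A minimal C sketch of points 1) and 2), to make the two access paths concrete. Everything here is an assumption for illustration: the VTable and Interp structs, the three static class names and the function names are hypothetical and are not Parrot's actual declarations. Static classes sit in one global array and are reached by index; dynamic classes sit in a per-interpreter array, and their id is found once by name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified vtable: a real one holds many function slots. */
typedef struct VTable {
    const char *name;
} VTable;

/* 1) Global table of static classes, indexed directly by type id. */
static VTable static_vtables[] = {
    { "PerlUndef" }, { "PerlInt" }, { "PerlString" }
};
#define N_STATIC ((int)(sizeof static_vtables / sizeof static_vtables[0]))

/* 2) Per-interpreter table of dynamically registered classes. */
typedef struct Interp {
    VTable *dyn;    /* NULL until the first dynamic registration */
    int     n_dyn;
} Interp;

/* Register a dynamic class; returns its type id, or -1 if out of memory. */
int register_dynamic(Interp *in, const char *name)
{
    assert(in != NULL && name != NULL);
    VTable *grown = realloc(in->dyn, (size_t)(in->n_dyn + 1) * sizeof *grown);
    if (grown == NULL)
        return -1;
    in->dyn = grown;
    in->dyn[in->n_dyn].name = name;
    return N_STATIC + in->n_dyn++;
}

/* Name lookup, meant to be done once at runtime; returns type id or -1. */
int type_id_by_name(const Interp *in, const char *name)
{
    for (int i = 0; i < N_STATIC; i++)
        if (strcmp(static_vtables[i].name, name) == 0)
            return i;
    for (int i = 0; i < in->n_dyn; i++)
        if (strcmp(in->dyn[i].name, name) == 0)
            return N_STATIC + i;
    return -1;
}

/* Index lookup: ids below N_STATIC hit the global table,
 * all others hit the per-interpreter table. */
const VTable *vtable_by_id(const Interp *in, int id)
{
    if (id < 0)
        return NULL;
    if (id < N_STATIC)
        return &static_vtables[id];
    id -= N_STATIC;
    return id < in->n_dyn ? &in->dyn[id] : NULL;
}
```

With two interpreters, a class registered in one is invisible to the other, while the static ids stay valid in both. The JIT stub array of point 3) is left out of the sketch.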

> The main problem with Perl 5 ithreads is the thread startup CPU and
> memory usage. It's what makes Perl 5 ithreads _very_ hard to use in
> a production environment. That overhead needs to be prevented at all
> costs in Parrot.

Those are additional issues, which I'll address later. There are no
shared PMCs yet in Parrot. But what I currently know is: a
single-threaded parrot that handles huge numbers of PMCs fast should use
huge arena memory pools and, worse (when ARENA_DOD_FLAGS is on),
fixed-size pools[1]. This doesn't fit well with heavily multi-threaded
usage, where each thread doesn't deal with a lot of data.

[1] I already have some ideas for using variable-sized pools though ...

> Hope this made sense.

Very much, thank you, yes.

> Liz

leo

Elizabeth Mattijsen

Dec 21, 2003, 9:32:08 AM

to l...@toetsch.at, perl6-i...@perl.org

At 14:17 +0100 12/21/03, Leopold Toetsch wrote:

>Elizabeth Mattijsen <l...@dijkmat.nl> wrote:
> > Ideally I'd see a COWed structure: a thread startup would not
>> actually copy the main vtable structure. As soon as something needs
>> to be specific for the thread, only then needs that structure to be
>> copied for the thread (and possibly only that part that is actually
> > different from the thread a thread is inheriting from.
>Sounds reasonable, yes. Given, that we go for the dynamic case cited
>above, I can imagine that we finally have:
>1) a real global Parrot_base_vtables[] containing pointers to the local
> *static* temp_base_vtable inside the PMC classes C file.
>2) another per interpreter vtable array holding all dynamic entries
>
>Access to the former is per index like now, for the latter, the index
>has to be obtained once at runtime per name lookup.

I'm not sure the per interpreter vtable array should exist at all
until it is needed. One of the things people use threads for, is to
have many, many worker threads doing the same thing. The way to do
this would be to start 1 thread that sets up all of the specific
worker related changes to the global environment, which would set up
the per interpreter vtable array. Then all threads started from that
thread would use exactly _that_ structure (without any copying or
cloning or anything). That is, if I'm understanding what vtables do
correctly.

Not copying anything is what makes threaded applications in general
such a good thing (and copying everything for each thread is
currently what makes Perl 5 ithreads such a badly performing threads
implementation).

>1) is for all static classes: PMCs inside parrot and ParrotClasses
>loaded into the first interpreter from the PBCs metadata.
>
>And:
>
>3) a separate array for the JITed method stubs, which are per interpreter
> and only if the platform can generate such stubs on the fly.
>
>> The main problem with Perl 5 ithreads is the thread startup CPU and
>> memory usage. It's what makes Perl 5 ithreads _very_ hard to use in
>> a production environment. That overhead needs to be prevented at all
>> costs in Parrot.
>
>That are additional issues, which I'll address later. There are no
>shared PMCs yet in Parrot. But what I currently know is: a single
>threaded parrot which can handle huge PMC amounts fast, should use huge
>arena memory pools, and worse (when ARENA_DOD_FLAGS is on) fixed sized
>pools[1]. This doesn't cooperate with a heavily multi-threaded usage,
>where each thread doesn't deal with a lot of data.

Indeed.

I'm someone who started in the PC-DOS world, migrated to *nix about
10 years ago and started with Perl threads about 1.5 years ago. I
know a lot about how _not_ to do threads. Maybe someone with more
experience on inherently threaded systems, such as Win32 and OS/2,
should speak up.

Liz

Sterling Hughes

Dec 21, 2003, 8:12:04 AM

to Leopold Toetsch, P6I, Dan Sugalski

From a PHP perspective, having it per-thread would be a very good thing.

If you dl() (PHP's version) an extension it should *only* be available
to the execution path that called it. COW would be fine for that - and
shouldn't yield too much of a performance decrement.

At least that's the PHP 2c. :)

-sterling

Leopold Toetsch

Dec 21, 2003, 2:36:54 PM

to Elizabeth Mattijsen, perl6-i...@perl.org

Elizabeth Mattijsen <l...@dijkmat.nl> wrote:
> At 14:17 +0100 12/21/03, Leopold Toetsch wrote:
>>1) a real global Parrot_base_vtables[] containing pointers to the local
>> *static* temp_base_vtable inside the PMC classes C file.
>>2) another per interpreter vtable array holding all dynamic entries
>>
>>Access to the former is per index like now, for the latter, the index
>>has to be obtained once at runtime per name lookup.

> I'm not sure the per interpreter vtable array should exist at all
> until it is needed.

It is there - empty, NULL ;-) Then, after registering a dynamic type,
it's allocated and filled with data.
When a thread is then started, this table is copied - COW at best.

> ... One of the things people use threads for, is to

> have many, many worker threads doing the same thing. The way to do
> this would be to start 1 thread that sets up all of the specific
> worker related changes to the global environment, which would set up
> the per interpreter vtable array. Then all threads started from that
> thread would use exactly _that_ structure (without any copying or
> cloning or anything). That is, if I'm understanding what vtables do
> correctly.

Yes, but we don't know in advance whether one thread will register some
class and another thread a different one. It's unlikely, so a COWed copy
fits best.
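
A rough C sketch of what such a COWed table could look like, under stated assumptions: DynTable, Interp and the function names are hypothetical, class names stand in for full vtables, and the reference count is not protected against concurrent updates (a real implementation would need a lock or atomic operations there). A new thread shares the parent's table by pointer; the table is copied only when an interpreter that shares it registers a class.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical refcounted table of dynamic class names. */
typedef struct DynTable {
    int          refs;    /* number of interpreters sharing this table */
    int          n;
    const char **names;
} DynTable;

typedef struct Interp {
    DynTable *dyn;        /* NULL until something is registered */
} Interp;

/* Thread start: share the parent's table, copy nothing. */
void interp_clone(Interp *child, const Interp *parent)
{
    assert(child != NULL && parent != NULL);
    child->dyn = parent->dyn;
    if (child->dyn != NULL)
        child->dyn->refs++;
}

/* Give 'in' a private table before a write. Returns 0 on success. */
static int make_private(Interp *in)
{
    DynTable *old = in->dyn;
    if (old != NULL && old->refs == 1)
        return 0;                         /* already the sole owner */

    DynTable *copy = malloc(sizeof *copy);
    if (copy == NULL)
        return -1;
    copy->refs  = 1;
    copy->n     = old ? old->n : 0;
    copy->names = NULL;
    if (copy->n > 0) {
        copy->names = malloc((size_t)copy->n * sizeof *copy->names);
        if (copy->names == NULL) {
            free(copy);
            return -1;
        }
        memcpy(copy->names, old->names, (size_t)copy->n * sizeof *copy->names);
    }
    if (old != NULL)
        old->refs--;                      /* other sharers keep the old table */
    in->dyn = copy;
    return 0;
}

/* Register a class in this interpreter only; returns its index or -1. */
int register_class(Interp *in, const char *name)
{
    if (make_private(in) != 0)
        return -1;
    DynTable *t = in->dyn;
    const char **grown = realloc(t->names, (size_t)(t->n + 1) * sizeof *grown);
    if (grown == NULL)
        return -1;
    t->names = grown;
    t->names[t->n] = name;
    return t->n++;
}
```

In the common worker-thread case nobody registers anything after startup, so every thread keeps pointing at the same table and thread start costs one pointer copy and one counter increment. Freeing a table when its count drops to zero is omitted from the sketch.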

> Not copying anything is what threaded applications in general make
> such a good thing (and copying everything for each thread is
> currently what makes Perl 5 ithreads such a badly performing threads
> implementation).

I don't see a big problem with interpreter-internal data structures.
Copying user data, i.e. PMCs, can be time consuming. E.g. you have a huge
set of input data, and you start some threads to work on different
parts of that input data. Ideally these data should be marked shared and
constant to avoid copying as well as locking - or the optimizer is able
to analyse all usage of these input data and doesn't see any alteration.

> Liz

leo

Leopold Toetsch

Dec 21, 2003, 5:07:29 PM

to Elizabeth Mattijsen, perl6-i...@perl.org

Elizabeth Mattijsen wrote:

> Whenever a PMC gets "shared", it would get changed read and write
> function slots of the applicable vtable entries that would add mutexes
> around all accesses to that PMC.

Yep. That was my conclusion too. A shared PMC will be a variant of the
plain one, where all vtable methods that change something are protected
by mutexes. Additionally I think it's best to put these PMCs into a
separate PMC memory arena, because DOD on shared PMCs is a bit different
- but I haven't thought about that very much yet.
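
As an illustration of the "variant with mutexes" idea, here is a small C sketch with a plain and a shared vtable for the same integer-like PMC. The struct layout and all names are hypothetical and much simpler than Parrot's real PMC and vtable definitions. This sketch locks reads as well as writes, which is one possible choice and not something the thread settles.

```c
#include <assert.h>
#include <pthread.h>

typedef struct PMC PMC;

/* Hypothetical two-slot vtable; a real one has many more entries. */
typedef struct VTable {
    long (*get_integer)(PMC *self);
    void (*set_integer)(PMC *self, long value);
} VTable;

struct PMC {
    const VTable   *vtable;
    long            value;
    pthread_mutex_t lock;     /* only used by the shared variant */
};

/* Plain variant: no locking at all. */
static long plain_get(PMC *self)         { return self->value; }
static void plain_set(PMC *self, long v) { self->value = v; }

/* Shared variant: the same work, wrapped in the per-PMC mutex. */
static long shared_get(PMC *self)
{
    pthread_mutex_lock(&self->lock);
    long v = plain_get(self);
    pthread_mutex_unlock(&self->lock);
    return v;
}

static void shared_set(PMC *self, long v)
{
    pthread_mutex_lock(&self->lock);
    plain_set(self, v);
    pthread_mutex_unlock(&self->lock);
}

const VTable plain_int_vtable  = { plain_get,  plain_set  };
const VTable shared_int_vtable = { shared_get, shared_set };

/* Initialise a PMC with the chosen variant; returns 0 on success. */
int pmc_init(PMC *p, const VTable *vt)
{
    assert(p != NULL && vt != NULL);
    p->vtable = vt;
    p->value  = 0;
    return pthread_mutex_init(&p->lock, NULL) == 0 ? 0 : -1;
}
```

Callers always go through p->vtable, so code that uses a PMC does not need to know which variant it has; only the shared variant pays for the lock.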

> Most important thing to note here is that changes are made only when
> something is happening inside the thread, _not_ when a thread is started
> (apart from the adaptation of the "write" functions when the very first
> child thread is started).

With that approach, we violate one Parrot paradigm: "The address of a
PMC doesn't change during its lifetime". So I currently don't see a way
to avoid copying non-constant non-shared PMCs on thread creation.

> Hope I had enough understanding of vtables and PMC to make sense.

It makes sense, I'd rather have such COWed PMC write semantics, but I
fear "The big cheese"[1] will not like it.

> Liz

[1] see CREDITS
leo

Jeff Clites

Dec 21, 2003, 8:31:38 PM

to Leopold Toetsch, P6I Internals

It sounds like an assumption here is that separate threads get separate
interpreter instances. I would have thought that a typical
multithreaded program would have one interpreter instance and multiple
threads (sharing that instance). I would think of separate interpreter
instances as the analog of separate independent processes (at the Unix
level), and that threads would be something more lightweight than that.
There would be _some_ structure which is per-thread, but not logically
the whole interpreter.

JEff

Dan Sugalski

Dec 21, 2003, 8:44:17 PM

to Jeff Clites, Leopold Toetsch, P6I Internals

Been there, done that, got the scars to prove it. Doesn't work well
in a system with guarantees of internal consistency and core data
elements that are too large for atomic access by the processor. (And,
while you can lock things, it gets really, really, *really* slow...)

We need to talk about threads, thread pools, and whatnot, but not
until after the holiday, so it'll have to wait until tomorrow.
--
Dan

--------------------------------------"it's like this"-------------------
Dan Sugalski even samurai
d...@sidhe.org have teddy bears and even
teddy bears get drunk

Jeff Clites

Dec 21, 2003, 9:38:17 PM

to Dan Sugalski, Leopold Toetsch, P6I Internals

On Dec 21, 2003, at 5:44 PM, Dan Sugalski wrote:

> At 5:31 PM -0800 12/21/03, Jeff Clites wrote:
>> It sounds like an assumption here is that separate threads get
>> separate interpreter instances. I would have thought that a typical
>> multithreaded program would have one interpreter instance and
>> multiple threads (sharing that instance). I would think of separate
>> interpreter instances as the analog of separate independent processes
>> (at the Unix level), and that threads would be something more
>> lightweight than that. There would be _some_ structure which is
>> per-thread, but not logically the whole interpreter.
>
> Been there, done that, got the scars to prove it. Doesn't work well in
> a system with guarantees of internal consistency and core data
> elements that are too large for atomic access by the processor. (And,
> while you can lock things, it gets really, really, *really* slow...)

1) It seems like you've made a leap here. I don't see how the need to
guarantee the internal consistency of things such as strings directly
implies per-thread allocation pools or file-descriptor tables. Having
non-shared HLL-accessible data doesn't imply non-shared internal data
structures.

2) Separate issue really, but: How can a language such as Java, which
doesn't have inter-thread data-access restrictions, be implemented on
top of Parrot?

JEff

Leopold Toetsch

Dec 22, 2003, 10:27:04 AM

to Elizabeth Mattijsen, perl6-i...@perl.org

Elizabeth Mattijsen <l...@dijkmat.nl> wrote:

> The main problem with Perl 5 ithreads is the thread startup CPU and
> memory usage.

I've now compiled a threaded perl5.8.0 and benchmarked prime-pthread
from perlthrtut against parrot:

$ time parrot t.imc >1

real 0m1.044s
user 0m0.730s
sys 0m0.290s

$ time ./t.pl>1

real 0m8.574s
user 0m8.070s
sys 0m0.480s

"t.imc" is F<examples/assembly/thr-primes.imc>, but producing primes up
to 1000 too. Parrot is built unoptimized and runs the slow
bounds-checking core. My system is i386/linux (2.2), Athlon 800, 256 Meg
RAM (ulimited to 200 Meg).

That's about 8 times faster (parrot doesn't copy all its state, but the
test program doesn't have much) but still dog slow. Starting and joining
1000 threads in parrot takes about 5 secs on my system, while doing 1000
pthread_create/pthread_join in C takes only ~0.1 sec.
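
For reference, a C baseline of this kind can be as small as the sketch below. It is not the program used for the numbers above; it assumes the threads are created and joined one after another and that each thread does no work. The function name time_thread_cycles is made up for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <time.h>

/* Thread body that does nothing, so only create/join cost is measured. */
static void *noop(void *arg)
{
    return arg;
}

/* Create and join n threads one after another.
 * Returns the elapsed wall-clock time in seconds, or -1.0 on failure. */
double time_thread_cycles(int n)
{
    struct timespec t0, t1;

    assert(n >= 0);
    if (clock_gettime(CLOCK_MONOTONIC, &t0) != 0)
        return -1.0;
    for (int i = 0; i < n; i++) {
        pthread_t tid;
        if (pthread_create(&tid, NULL, noop, NULL) != 0)
            return -1.0;
        if (pthread_join(tid, NULL) != 0)
            return -1.0;
    }
    if (clock_gettime(CLOCK_MONOTONIC, &t1) != 0)
        return -1.0;
    return (double)(t1.tv_sec - t0.tv_sec)
         + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

Calling time_thread_cycles(1000) and printing the result gives a figure to compare against. The absolute value depends heavily on the machine and the threading library, so it is only meaningful next to a Parrot run on the same system.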

It's of course far too early to do benchmarks, but interesting ...

> Liz

leo

Dan Sugalski

Dec 22, 2003, 11:00:46 AM

to Jeff Clites, Leopold Toetsch, P6I Internals

At 6:38 PM -0800 12/21/03, Jeff Clites wrote:
>On Dec 21, 2003, at 5:44 PM, Dan Sugalski wrote:
>
>>At 5:31 PM -0800 12/21/03, Jeff Clites wrote:
>>>It sounds like an assumption here is that separate threads get
>>>separate interpreter instances. I would have thought that a
>>>typical multithreaded program would have one interpreter instance
>>>and multiple threads (sharing that instance). I would think of
>>>separate interpreter instances as the analog of separate
>>>independent processes (at the Unix level), and that threads would
>>>be something more lightweight than that. There would be _some_
>>>structure which is per-thread, but not logically the whole
>>>interpreter.
>>
>>Been there, done that, got the scars to prove it. Doesn't work well
>>in a system with guarantees of internal consistency and core data
>>elements that are too large for atomic access by the processor.
>>(And, while you can lock things, it gets really, really, *really*
>>slow...)
>
>1) It seems like you've made a leap here. I don't see how the need
>to guarantee the internal consistency of things such as strings
>directly implies per-thread allocation pools or file-descriptor
>tables. Having non-shared HLL-accessible data doesn't imply
>non-shared internal data structures.

We've a copying garbage collector. That pretty much requires
per-thread memory pools and higher-level mediated access to allocated
collectable memory. Otherwise things get... nasty.

>2) Separate issue really, but: How can a language such as Java,
>which doesn't have inter-thread data-access restrictions, be
>implemented on top of Parrot?

Easily, albeit with a bit of a speed hit for threaded code. (Java has
immutable strings which cuts out a lot of the need for
synchronization, since you don't need any for immutable data) If all
access is through PMCs, and it needs to be, you use the threaded
version of the PMC vtables, which automatically get and release the
PMC lock. (That's what the synchronization entry on the PMC is for,
to hold the mutex or whatever it is for this)

Elizabeth Mattijsen

Dec 22, 2003, 2:05:13 PM

to Dan Sugalski, Jeff Clites, Leopold Toetsch, P6I Internals

At 11:00 -0500 12/22/03, Dan Sugalski wrote:
>Easily, albeit with a bit of a speed hit for threaded code. (Java
>has immutable strings which cuts out a lot of the need for
>synchronization, since you don't need any for immutable data) If all
>access is through PMCs, and it needs to be, you use the threaded
>version of the PMC vtables, which automatically get and release the
>PMC lock. (That's what the synchronization entry on the PMC is for,
>to hold the mutex or whatever it is for this)

In Perl 5, the sharedness of a variable can be determined at
run-time. Leo's mentioned that a PMC will never change its address
during its lifetime. Can these two requirements be met if there are
threaded and unthreaded versions of PMC vtables?

Liz

Dan Sugalski

Dec 22, 2003, 2:16:19 PM

to Elizabeth Mattijsen, Jeff Clites, Leopold Toetsch, P6I Internals

Yes. Making a PMC shared can be as simple as swapping out the vtable
pointer in the PMC structure--no need to move it around at all. (Or,
worst case, turning the PMC into a reference PMC for the actual PMC,
whose contents get moved to a new header, which is legal-ish)
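
A C sketch of the simple case, swapping the vtable pointer in place, may help show why the PMC's address can stay fixed. All names and the struct layout are hypothetical; the sync field stands in for the synchronization entry mentioned earlier in the thread. The sketch assumes the swap happens while no other thread can see the PMC yet; sharing a PMC that is already visible to other threads would need more care.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

typedef struct PMC PMC;

/* Hypothetical one-slot vtable; a real one has many more entries. */
typedef struct VTable {
    void (*set_integer)(PMC *self, long value);
} VTable;

struct PMC {
    const VTable    *vtable;
    long             value;
    pthread_mutex_t *sync;    /* NULL while the PMC is unshared */
};

static void plain_set(PMC *self, long v)
{
    self->value = v;
}

static void locked_set(PMC *self, long v)
{
    pthread_mutex_lock(self->sync);
    self->value = v;
    pthread_mutex_unlock(self->sync);
}

const VTable plain_vtable  = { plain_set };
const VTable shared_vtable = { locked_set };

/* Turn an existing PMC into a shared one in place.
 * Returns 0 on success, -1 on failure. */
int pmc_share(PMC *p)
{
    assert(p != NULL);
    if (p->vtable == &shared_vtable)
        return 0;                       /* already shared */
    pthread_mutex_t *m = malloc(sizeof *m);
    if (m == NULL)
        return -1;
    if (pthread_mutex_init(m, NULL) != 0) {
        free(m);
        return -1;
    }
    p->sync   = m;
    p->vtable = &shared_vtable;         /* header stays where it is */
    return 0;
}
```

Every pointer to the PMC taken before the call is still valid afterwards; only the behaviour behind p->vtable has changed. The worst-case path through a reference PMC is not shown.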

Elizabeth Mattijsen

Dec 23, 2003, 3:49:32 AM

to Dan Sugalski, Jeff Clites, Leopold Toetsch, P6I Internals

At 14:16 -0500 12/22/03, Dan Sugalski wrote:
>At 8:05 PM +0100 12/22/03, Elizabeth Mattijsen wrote:
>>In Perl 5, the sharedness of a variable can be determined at
>>run-time. Leo's mentioned that a PMC will never change its address
>>during its lifetime. Can these two requirements be met if there
>>are threaded and unthreaded versions of PMC vtables?
>Yes. Making a PMC shared can be as simple as swapping out the vtable
>pointer in the PMC structure--no need to move it around at all. (Or,
>worst case, turning the PMC into a reference PMC for the actual PMC,
>whose contents get moved to a new header, which is legal-ish)

It's that last thing I'm worried about: that all thread-related
things in Parrot are forced to use an extra indirection, with the
consequent performance penalty.

Liz

Leopold Toetsch

Dec 23, 2003, 4:37:41 AM

to Elizabeth Mattijsen, perl6-i...@perl.org

I'm thinking of:

1) PMC class definitions get an extra flags entry (or two):
   - shared_only (e.g. msg-PMC for inter-thread communication)
   - shared_too (PMC has a shared variant)

If these flags are seen, the PMC compiler generates the vtable
methods, as is already done for the Const$PMCs.
That is, we have e.g. a SharedPerlUndef PMC at compile time.

2) the Perl5ish declaration

    my $var : shared;

is basically:

    $P0 = new SharedPerlUndef;

OTOH:

    share($var);

may need to morph $var into a shared reference, with an additional
indirection and memory overhead.

(I don't know what Perl5 does with an already used "$var" that is
turned into a shared var later - or even at runtime.)

So the overhead is only the necessary locking; the indirection can
easily be avoided.

> Liz

leo

Elizabeth Mattijsen

Dec 23, 2003, 5:07:53 AM

to l...@toetsch.at, perl6-i...@perl.org

At 10:37 +0100 12/23/03, Leopold Toetsch wrote:
>2) the Perl5ish declaration
>
> my $var : shared;
>
> is basically:
>
> $P0 = new SharedPerlUndef;
>
> OTOH:
>
> share($var);
>
> may need to morph $var into a shared reference, with an additional
> indirection and memory overhead.
>
>(I don't know, what Perl5 does with an already used "$var", that is
>turned into a shared var later - or even at runtime).

$ perl5.8.2-threaded -Mthreads -Mthreads::shared -MO=Deparse -e 'my $a : shared = 1'
use attributes ();
('attributes'->import('main', \$a, 'shared'), my $a) = 1;

$ perl5.8.2-threaded -Mthreads -Mthreads::shared=share -MO=Deparse -e 'my $a = 1; share( $a )'
my $a = 1;
share $a;

Both the share() function and the ":shared" attribute operate
at runtime in Perl5. This is especially awkward for the ":shared"
attribute.

I think your solution of making ":shared" a true compile-time
action is best. If one wants to share at execution time, one
can expect extra overhead.

>So the overhead is only the necessary locking, the indirection can
>easily be avoided.

I'm glad to hear that!

Liz

Dave Mitchell

Dec 23, 2003, 5:40:02 AM

to Elizabeth Mattijsen, l...@toetsch.at, perl6-i...@perl.org

On Tue, Dec 23, 2003 at 11:07:53AM +0100, Elizabeth Mattijsen wrote:
> At 10:37 +0100 12/23/03, Leopold Toetsch wrote:
> >2) the Perl5ish declaration
> >
> > my $var : shared;
> >
> > is basically:
> >
> > $P0 = new SharedPerlUndef;
> >
> > OTOH:
> >
> > share($var);
> >
> > may need to morph $var into a shared reference, with an additional
> > indirection and memory overhead.
> >
> >(I don't know, what Perl5 does with an already used "$var", that is
> >turned into a shared var later - or even at runtime).
>
> $ perl5.8.2-threaded -Mthreads -Mthreads::shared -MO=Deparse -e 'my
> $a : shared = 1'
> use attributes ();
> ('attributes'->import('main', \$a, 'shared'), my $a) = 1;
>
> $ perl5.8.2-threaded -Mthreads -Mthreads::shared=share -MO=Deparse -e
> 'my $a = 1; share( $a )'
> my $a = 1;
> share $a;
>
>
> Both the share() function as well as the ":shared" attribute, operate
> at runtime in Perl5. This is especially awkward for the ":shared"
> attribute.

Sharing of lexical vars has to be done at run-time, since each time the
scope is entered, you need to create a new instance of the variable.

--
"Do not dabble in paradox, Edward, it puts you in danger of fortuitous
wit." -- Lady Croom - Arcadia

Dan Sugalski

Dec 23, 2003, 2:23:45 PM

to Elizabeth Mattijsen, Jeff Clites, Leopold Toetsch, P6I Internals

They'll live. Python and Ruby both have a single global interpreter
lock and nobody much cares.

People won't move to parrot because of signal or thread support, or
because we give them a cookie. People will move to parrot because it
runs perl 6, or because it gives them cross-language support for
perl, python, and ruby, or because it's an easy target to write a
custom language for.

Threads are useful, they're important, and they will be supported
properly. All indications are, however, that most programs will be
non-threaded, and decisions about the design are made with that in
mind.

Rod Adams

Dec 23, 2003, 4:54:09 PM

to P6I Internals

Dan Sugalski wrote:

> They'll live. Python and Ruby both have a single global interpreter lock
> and nobody much cares.
>
> People won't move to parrot because of signal or thread support, or
> because we give them a cookie. People will move to parrot because it
> runs perl 6, or because it gives them cross-language support for perl,
> python, and ruby, or because it's an easy target to write a custom
> language for.
>
> Threads are useful, they're important, and they will be supported
> properly. All indications are, however, that most programs will be
> non-threaded, and decisions about the design are made with that in mind.

A major use of many languages these days is web services.
In the parrot world, I see three possible ways for this to happen.

- CGI/Exec. No problem to make parrot work, but the performance issues
with this are well known.
- mod_parrot. With Apache 2.0, this would need to be heavily threaded to
match the Apache core.
- A pure parrot web application server (PPWAS) that can compete directly
against the EJB/.NET crowd. This would obviously need heavy threading
with high performance.

If parrot is fast enough at threading and general computation, I'd see a
PPWAS as an amazingly attractive target platform.
- Open Source Specs & Code.
- Multiple native languages
- Could relatively easily port your php & jsp web apps over.
- Ability to cross talk between your various languages.

I'd also point out that most of the performance-sensitive projects I've
worked on tended to favor a threaded approach to
easily work around something that blocks (UI, I/O, etc). Storing state
information to get around this is a major pain in the butt.

In short, I feel that for parrot/perl to be considered for more
"serious" applications, it needs industrial strength threading with
screaming performance.

That's my $0.02
-- Rod

Uri Guttman

Dec 23, 2003, 5:27:16 PM

to Rod Adams, P6I Internals

>>>>> "RA" == Rod Adams <r...@rodadams.net> writes:

> A major use of many languages these days is web services.
> In the parrot world, I see three possible ways for this to happen.

> - CGI/Exec. No problem to make parrot work, but the performance issues
> with this are well known.
> - mod_parrot. With Apache 2.0, this would need to be heavily threaded
> to match the Apache core.
> - A pure parrot web application server (PPWAS) that can compete
> directly against the EJB/.NET crowd. This would obviously need heavy
> threading with high performance.

you missed at least two major designs

a pure parrot event loop system. if it has true async file i/o (which
dan has promised by using kernel threads but not parrot threads), you
can do it all in one process and one thread and not have the
sync/locking thread overhead or the process context switch overhead.

use an apache front end to a backend design as with the above. then you
get all the apache stuff you want and you get a fast backend with no
mod_perl craziness, no parrot level thread issues, etc.

the moral is, parrot threads are not the be-all/end-all solution. my
favorite query about threads is how well do they scale beyond one box?
(even dan can't fix that problem. :)

> I'd also point out that most all of the projects I've worked on which
> were performance sensitive, they tended to favor a threaded approach
> to easily work around something that blocks (UI, I/O, etc). Storing
> state information to get around this is a major pain in the butt.

that is not the only way as i have pointed out. it is just a way that is
promoted heavily (like java). events if done correctly are generally
faster than threads and use much less ram (no stack context created for
each thread). and blocking stuff can be solved with true async file i/o
support (via direct kernel support or kernel threads) and good old
forking (preforked backend blocking servers and doing comm with either
pipes or shared ram). you just have to think outside the threaded box.

> In short, I feel that for parrot/perl to be considered for more
> "serious" applications, it needs industrial strength threading with
> screaming performance.

it should have fast threads but it doesn't NEED them. it NEEDS a solid
event loop and real async file i/o (which dan will deliver). then you
can solve things like this even faster than with parrot level threads.
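
for readers who have not seen the single-threaded event style, a very small C sketch of the core dispatch loop follows. it uses POSIX poll() on pipe or socket descriptors; it is an illustration only, not parrot's event system, and the Watcher struct, event_loop and count_bytes are made-up names. poll() does not give async i/o on regular files, which is why that has to come from kernel support or helper threads.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical handler type: called when fd is readable.
 * Returns 0 to stop watching the fd, nonzero to keep watching. */
typedef int (*on_readable_fn)(int fd, void *ctx);

typedef struct Watcher {
    int            fd;
    on_readable_fn handler;
    void          *ctx;
} Watcher;

/* Dispatch until every watcher has been retired.
 * Handles at most 16 watchers. Returns 0, or -1 if poll() fails. */
int event_loop(Watcher *w, int n)
{
    struct pollfd fds[16];
    const short wanted = POLLIN | POLLHUP | POLLERR | POLLNVAL;
    int active = n;

    assert(n >= 0 && n <= 16);
    for (int i = 0; i < n; i++) {
        fds[i].fd = w[i].fd;
        fds[i].events = POLLIN;
        fds[i].revents = 0;
    }
    while (active > 0) {
        if (poll(fds, (nfds_t)n, -1) < 0)
            return -1;
        for (int i = 0; i < n; i++) {
            if (fds[i].fd < 0 || !(fds[i].revents & wanted))
                continue;
            if (w[i].handler(w[i].fd, w[i].ctx) == 0) {
                fds[i].fd = -1;     /* poll() ignores negative fds */
                active--;
            }
        }
    }
    return 0;
}

/* Example handler: add up the bytes read; retire at end of file. */
int count_bytes(int fd, void *ctx)
{
    char buf[256];
    ssize_t got = read(fd, buf, sizeof buf);
    if (got <= 0)
        return 0;
    *(long *)ctx += (long)got;
    return 1;
}
```

one thread serves all descriptors, so there is no locking and no per-thread stack, but each handler has to return quickly or everything else waits.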

just plugging events,

uri

--
Uri Guttman ------ u...@stemsystems.com -------- http://www.stemsystems.com
--Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
Search or Offer Perl Jobs ---------------------------- http://jobs.perl.org

Rod Adams

Dec 23, 2003, 6:20:26 PM

to Uri Guttman, P6I Internals

Uri Guttman wrote:
>>>>>>"RA" == Rod Adams <r...@rodadams.net> writes:
> that is not the only way as i have pointed out. it is just a way that is
> promoted heavily (like java). events if done correctly are generaly
> faster than threads and use much less ram (no stack context created for
> each thread). and blocking stuff can be solved with true async file i/o
> support (via direct kernel support or kernel threads) and good old
> forking (preforked backend blocking servers and doing comm with either
> pipes or shared ram). you just have to think outside the threaded box.

I've written plenty of async I/O with event loops, etc, and yes, it does
work very fast and efficiently. (And yes, Parrot absolutely needs async
I/O.) However, I generally view it as a major pain in the rump, and it
seriously obscures what is going on, especially if you have some
radically different things going on at once. It is much clearer to read
code that dispatches a thread that can block. This is likely one of the
reasons it gets promoted as a "Good Idea".

I've also encountered problems that were genuinely hard to solve
that way. Example of logical flow of actions:
For several different sites,
1) Grab one or more sets of data over the net. (can be heavily lagged)
2) Munch the data grabbed, something that can be very intensive.
3) If errors were found (data incomplete), put the page back in the
queue and go back to 1.
4) Otherwise, start feeding processed data into a Database (can also
block on overloaded DBs).

What's the problem with doing this in an event loop? Step 2. I've had
times where this was intensive enough that Step 1 for other
sites was dropping TCP connections. Plain forking didn't work well,
because I needed to mark things as done/not done back in the main loop.
Breaking step 2 into smaller pieces that played nice wasn't feasible, so
I went to a forking model w/ IPC. Except then the client wanted it to
work under Win32, where I've never trusted any of the pseudo-forks that
perl did (esp with Network I/O going on).
So I rewrote the whole thing in a language that supported threads.

> just plugging events,

Events are good. But they are not a complete replacement for threads.
Just like threads are not a complete replacement for event loops.
IMO, parrot should be able to support both models strongly, and not
overly promote one over the other.

-- Rod

Simon Glover

Dec 23, 2003, 6:15:06 PM

to Uri Guttman, Rod Adams, P6I Internals

On Tue, 23 Dec 2003, Uri Guttman wrote:

> >>>>> "RA" == Rod Adams <r...@rodadams.net> writes:
>
> > A major use of many languages these days is web services.
> > In the parrot world, I see three possible ways for this to happen.
>
> > - CGI/Exec. No problem to make parrot work, but the performance issues
> > with this are well known.
> > - mod_parrot. With Apache 2.0, this would need to be heavily threaded
> > to match the Apache core.
> > - A pure parrot web application server (PPWAS) that can compete
> > directly against the EJB/.NET crowd. This would obviously need heavy
> > threading with high performance.
>
> you missed at least two major designs
>
> a pure parrot event loop system. if it has true async file i/o (which
> dan has promised by using kernel threads but not parrot threads), you
> can do it all in one process and one thread and not have the
> sync/locking thread overhead or the process context switch overhead.
>
> use an apache front end to a backend design as with the above. then you
> get all the apache stuff you want and you get a fast backend with no
> mod_perl craziness, no parrot level thread issues, etc.
>
> the moral is, parrot threads are not the be-all/end-all solution. my
> favorite query about threads is how well do they scale beyond one box?
> (even dan can't fix that problem. :)

Well, Dan's type 2 threads sound fairly like a message-passing system,
in which case it would hopefully be possible to have them seamlessly
interface with MPI or PVM; in which case, scaling beyond one box is
no problem (well, at least until latency, bandwidth and/or Amdahl's
law start to bite...)

Simon

Dan Sugalski

Dec 23, 2003, 6:33:13 PM

to Rod Adams, P6I Internals

At 3:54 PM -0600 12/23/03, Rod Adams wrote:
>Dan Sugalski wrote:
>
>>They'll live. Python and Ruby both have a single global interpreter
>>lock and nobody much cares.
>>
>>People won't move to parrot because of signal or thread support, or
>>because we give them a cookie. People will move to parrot because
>>it runs perl 6, or because it gives them cross-language support for
>>perl, python, and ruby, or because it's an easy target to write a
>>custom language for.
>>
>>Threads are useful, they're important, and they will be supported
>>properly. All indications are, however, that most programs will be
>>non-threaded, and decisions about the design are made with that in
>>mind.
>
>A major use of many languages these days is web services.
>In the parrot world, I see three possible ways for this to happen.
>
>- CGI/Exec. No problem to make parrot work, but the performance
>issues with this are well known.

Yep, though we start up reasonably snappily, which is nice. :)

>- mod_parrot. With Apache 2.0, this would need to be heavily
>threaded to match the Apache core.

Well... no. Completely non-interacting threads would work fine for
this case, I think. (I'm not completely familiar with Apache 2.0, so
there may be some things I'm not aware of)

>- A pure parrot web application server (PPWAS) that can compete
>directly against the EJB/.NET crowd. This would obviously need heavy
>threading with high performance.

Nope, not necessary, though certainly that's one way to do it, and
would likely be needed for proper exploitation of a multiprocessor
machine. On a uniprocessor machine, though, you'll go further with
events and async I/O. It'll certainly be faster than the pure perl 5
webserver, though. (Well, until it's running on Parrot :)

>If parrot is fast enough at threading and general computation, I'd
>see a PPWAS as an amazingly attractive target platform.
>- Open Source Specs & Code.
>- Multiple native languages
>- Could relatively easily port your php & jsp web apps over.
>- Ability to cross talk between your various languages.

Right, but all of this can be dealt with via mod_parrot running under
Apache, which strikes me as the more sensible thing to do--they've
already worked out all the network protocol stuff and other grotty
low-level bits.

>I'd also point out that most all of the projects I've worked on
>which were performance sensitive, they tended to favor a threaded
>approach to easily work around something that blocks (UI, I/O, etc).
>Storing state information to get around this is a major pain in the
>butt.

Which is why you use async I/O and events with proper callbacks. :)
Using async I/O and events properly will get you better throughput
than a threaded solution most of the time. It is, granted, somewhat
more brain-twisting if you're not used to it.

>In short, I feel that for parrot/perl to be considered for more
>"serious" applications, it needs industrial strength threading with
>screaming performance.

It's probably not going to get it by most definitions of "screaming
performance" regardless of what emphasis threads get. Parrot's base
data structures are too large for anything other than very loosely
coupled threads to access things. Too many large things and too many
shared resources that the interpreter needs to lock with utter
paranoia, because we're guaranteeing that programs running on top of
parrot can't screw up the internals by not locking things.

Uri Guttman

Dec 23, 2003, 6:50:35 PM

to Rod Adams, P6I Internals

>>>>> "RA" == Rod Adams <r...@rodadams.net> writes:

> What's the problem with doing this in an event loop? Step 2. I've
> had times where this was intensive enough that Step 1 for other
> sites was dropping TCP connections. Forking didn't work well,
> because I needed to mark things as done/not done back in the main
> loop. Breaking step 2 into smaller pieces that played nice wasn't
> feasible, so I went to a forking model w/ IPC. Except then the
> client wanted it to work under Win32, where I've never trusted any
> of the pseudo-forks that perl did (esp with Network I/O going on).
> So I rewrote the whole thing in a language that supported threads.

other than the fork issue on win32, it can be done with events. i have
layered an async/sync flow control module over an event loop. it allows
remote (blocking or heavy crunching ops) to be mixed with local ops in a
simple if/then/while minilang. it removes much of the state issues from
complex event loops. ask me about it off list if you want more on that.

i even sent leo (though i am not sure of ownership since that company
went under) a generic event loop in c that i wrote. even if it can't be
used for legal or technical reasons, it should be useful to him and dan
for its design ideas and speed. it uses doubly linked queues (with a
full public api of its own and a slab malloc similar to what parrot has)
to handle all the event types (read/write/timer/signal/plain). dan's
plan is to put the single event loop into a single kernel thread and
have event and trigger queues shared by all threads (parrot or
kernel). so my event loop design and/or code could be used there. and i
also did run that application with kernel (solaris actually) threads for
blocking ops (dns lookups) and the event loop in one process. it used a
pipe to itself to sync the dns worker threads with the main thread which
ran the event loop. so you can use both in one design if you integrate
them properly. the pipe was an elegant synch solution as you can put an
event handler on one side and the threads could do blocking reads on the
other side. all that was written on the pipe was the address of a
command block (for the dns and its results). i haven't seen many systems
which have a process use a pipe to itself :)
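
For readers who haven't met the pipe-to-itself idea, here is a minimal sketch of one way it can look. It assumes the direction where worker threads write to the pipe and the event loop reads from it, which is only one reading of the description above. The fake lookup, the function name resolve_all() and the one-byte block ids are all inventions for the example (the system described wrote the address of a command block; a small integer key into a table plays that role here). It is POSIX-only, since select() on pipe descriptors does not work on Windows.

```python
import os
import select
import threading

def resolve_all(names):
    rfd, wfd = os.pipe()      # the process's pipe to itself
    blocks = {}               # "command blocks", keyed by a small integer id
    lock = threading.Lock()

    def worker(block_id, name):
        # Stand-in for a blocking call such as a DNS lookup.
        result = "addr-of-" + name
        with lock:
            blocks[block_id] = (name, result)
        # Tell the event loop which block is ready. One byte per block,
        # so this toy version only copes with fewer than 256 requests.
        os.write(wfd, bytes([block_id]))

    threads = [threading.Thread(target=worker, args=(i, n))
               for i, n in enumerate(names)]
    for t in threads:
        t.start()

    done = {}
    while len(done) < len(names):
        # The "event loop": the read end of the pipe is just another descriptor
        # that becomes readable when a worker has finished something.
        select.select([rfd], [], [])
        for block_id in os.read(rfd, 64):
            name, result = blocks[block_id]
            done[name] = result

    for t in threads:
        t.join()
    os.close(rfd)
    os.close(wfd)
    return done
```

The event loop never blocks on a worker directly; it only ever waits in select(), so timers and socket handlers could sit in the same call.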

> Events are good. But they are not a complete replacement for
> threads. Just like threads are not a complete replacement for event
> loops.

i totally agree. it is just that threads get way more publicity than
event loops. i am just trying to balance that here. :) you have to
realize i have been doing event loop systems for over 20 years (i have
created at least 4 different major event systems in very different
environments) so events come very naturally to me. they seem to be
trickier for some and they gravitate to threads which is a 'simpler'
model but as i have said has its own pitfalls and efficiency issues that
few typical thread users recognize.

> IMO, parrot should be able to support both models strongly, and not
> overly promote one over the other.

if that balance happens, i will be very happy. but more people are
going to come in with thread desires (as most other langs push threads
and not event loops). dan and i have talked extensively on this subject
and i can work with his design which will support both in a reasonable
fashion. parrot must have a core event loop which can be used by any gui
or other event package. it will have parrot level threads as well. how
efficient those threads are vs. a single thread in parrot is one of the
issues on the table.

Dan Sugalski

Dec 23, 2003, 7:43:07 PM

to Simon Glover, Uri Guttman, Rod Adams, P6I Internals

Arguably any method call could be a message that gets passed off to
the other end of the world, though some of the designs of the method
lookup system make that somewhat more difficult than we might
otherwise like. (The fetch method PMC, then invoke method PMC, means
that a distributed object system needs to have the object return a
proxy method call PMC, where a single call-on-object system could
embed the proxying into the object itself, but that's a minor issue)

But yeah, there's no requirement that the receiving interpreter need
be on the same system, given how loosely coupled a type 2 thread
system is and the fact that calls are all pass-by-value. A call could
well serialize up the PMC contents and fire them across the wire to a
remote interpreter or something--there'd not really be any way for
the calling interpreter to know. Or much reason for it to either, I
suppose.
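
A toy sketch of the two points above, under stated assumptions: the caller first fetches a method and then invokes it, so the object proxy has to hand back a proxy method; and arguments travel by value, so the caller cannot tell whether the receiver is local or remote. None of this is Parrot API. Interpreter, ObjectProxy and MethodProxy are made-up names, and pickle stands in for whatever serialization would really cross the wire.

```python
import pickle

class Interpreter:
    """Stand-in for a receiving interpreter that owns some objects."""
    def __init__(self):
        self.objects = {}

    def handle(self, wire):
        # Everything arrives as bytes, so arguments are copies, never shared.
        obj_id, method, args = pickle.loads(wire)
        result = getattr(self.objects[obj_id], method)(*args)
        return pickle.dumps(result)

class MethodProxy:
    """What 'fetch method' returns: a callable that ships the call elsewhere."""
    def __init__(self, interp, obj_id, method):
        self.interp, self.obj_id, self.method = interp, obj_id, method

    def __call__(self, *args):
        wire = pickle.dumps((self.obj_id, self.method, args))
        # interp.handle() could just as well be a socket round trip.
        return pickle.loads(self.interp.handle(wire))

class ObjectProxy:
    def __init__(self, interp, obj_id):
        self._interp, self._obj_id = interp, obj_id

    def __getattr__(self, name):
        # Step 1, fetch the method: return a proxy method, not the real one.
        return MethodProxy(self._interp, self._obj_id, name)

def demo():
    remote = Interpreter()
    remote.objects["acc"] = []            # a plain list living "over there"
    proxy = ObjectProxy(remote, "acc")
    mine = [1, 2]
    proxy.append(mine)                    # step 2, invoke: args are serialized
    mine.append(3)                        # a later local change is not seen remotely
    return remote.objects["acc"], proxy.count([1, 2])
```

In a single call-on-object design the serialization could live in ObjectProxy alone; the extra MethodProxy class is the cost of the fetch-then-invoke split.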

Rod Adams

Dec 23, 2003, 8:40:25 PM

to Uri Guttman, Dan Sugalski, P6I Internals

Uri Guttman wrote:
>>>>>> "RA" == Rod Adams <r...@rodadams.net> writes:
>> Except then the client wanted it to work under Win32, where I've
>> never trusted any of the pseudo-forks that perl did (esp with
>> Network I/O going on). So I rewrote the whole thing in a language
>> that supported threads.
> other than the fork issue on win32, it can be done with events.

Therein lies the rub. There's a lot of win32 out there, and it needs to
be supported. Will parrot have a pseudo fork system (that actually
works) for platforms with no native fork?

>> IMO, parrot should be able to support both models strongly, and not
>> overly promote one over the other.
>
> if that balance happens, i will be very happy. but more people
> are going to come in with thread desires (as most other langs push
> threads and not event loops). dan and i have talked extensively on
> this subject and i can work with his design which will support both
> in a reasonable fashion. parrot must have a core event loop which can
> be used by any gui or other event package. it will have parrot level
> threads as well. how efficient those threads are vs. a single thread
> in parrot is one of the issues on the table.

Okay, this plus what Dan said of:

> Which is why you use async I/O and events with proper callbacks. :)
> Using async I/O and events properly will get you better throughput
> than a threaded solution most of the time. It is, granted, somewhat
> more brain-twisting if you're not used to it.

got me thinking. (This is mostly thinking out loud, so bear with me.)

- Most threaded code can be converted to an event loop (+async I/O)
without issue. Esp if we have co-routines.
- For the other stuff, we'd still have Type 2 threads, which gets the
job done.

So, since very little will be written for parrot directly, but rather in
higher level languages, could we have a mechanism which takes IMC (or
higher) that _looks_ like it's doing Type 3 threads, but automagically
converts them to an event loop if possible, or a Type 2 for the cases it
can't?
I'm thinking this would likely have to be done at a level higher than
IMC, since the higher language would have a better idea of what's going
on. But I'm not sure.

By the same token, Type 3 can be faked with Type 2, and a suitable
number of messages updating values.

I'm starting to think I could be happy w/ a well tuned event loop and
type 2 threads. Stuff that needs heavy data interaction can use an event
loop. Stuff that needs easier blocking (and untimely death) protection
can go Type 2.

-- Rod

Robert Spier

Dec 24, 2003, 1:27:16 AM

to P6I Internals

At Tue, 23 Dec 2003 14:23:45 -0500, Dan Sugalski wrote:
> >It's that last thing I'm worried about. That all thread related
> >things in Parrot are forced to use an extra indirection and
> >consequent performance penalty.
>
> They'll live. Python and Ruby both have a single global interpreter
> lock and nobody much cares.

I really don't think that's true. People do care, and there are some
brain cells out there working to fix it, last I checked. It's just not
top of the list, because it's non-trivial.

-R

Elizabeth Mattijsen

Dec 24, 2003, 3:21:10 AM

to Dan Sugalski, Rod Adams, P6I Internals

At 18:33 -0500 12/23/03, Dan Sugalski wrote:
>At 3:54 PM -0600 12/23/03, Rod Adams wrote:
>>If parrot is fast enough at threading and general computation, I'd
>>see a PPWAS as an amazingly attractive target platform.
>>- Open Source Specs & Code.
>>- Multiple native languages
>>- Could relatively easily port your php & jsp web apps over.
>>- Ability to cross talk between your various languages.
>Right, but all of this can be dealt with via mod_parrot running
>under Apache, which strikes me as the more sensible thing to
>do--they've already worked out all the network protocol stuff and
>other grotty low-level bits.

A marriage between Parrot and APR (Apache Portable Runtime) might be
a marriage in heaven, in that respect. For those not in the know,
APR contains most of the grotty low-level bits.

Liz

Leopold Toetsch

Dec 24, 2003, 3:54:16 AM

to Uri Guttman, perl6-i...@perl.org

Uri Guttman <u...@stemsystems.com> wrote:

> i even sent leo (though i am not sure of ownership since that company
> went under) a generic event loop in c that i wrote.

Thanks again, it's really helpful, albeit running event handlers in PASM
is a bit different :)

> ... dan's plan is to put the single event loop into a single kernel
> thread and have event and trigger queues shared by all threads (parrot
> or kernel).

Actually the event loop thread is already running. It can dispatch
events to all interpreters. It will certainly need refactoring when IO
is added, but for now it's enough to have a look at issues that could
arise from it.

> uri

leo

Robert Spier

Dec 24, 2003, 1:16:25 PM

to P6I Internals

> A marriage between Parrot and APR (Apache Portable Runtime) might be
> a marriage in heaven, in that respect. For those not in the know,
> APR contains most of the grotty low-level bits.

It misses some things that are important to us, like fork(), and it's
got this concept of memory pools, which seem pretty nifty, but not
necessarily compatible with the parrot scheme...

but it may get us 80% of the way there, leaving us only 20% to
implement, which wouldn't be so bad.

-R

Gordon Henriksen

Dec 30, 2003, 11:22:32 AM

to Rod Adams, p6i

On Tuesday, December 23, 2003, at 08:40 , Rod Adams wrote:

> - Most threaded code can be converted to an event loop (+async I/O)
> without issue. Esp if we have co-routines.
> - For the other stuff, we'd still have Type 2 threads, which gets the
> job done.

(Just got back from vacation and was reviewing this aging thread...)

Not to throw a damper on the events party or anything, but an
event-based system with asynchronous I/O isn't capable of saturating an
SMP box's processors. This is one of the major reasons for using threads
in web servers. It's also a significant reason for using threads in
desktop applications. Yes, N interpreters for N processors will get you
there, but at the cost of times-N static memory consumption (for JITted
code, type registries, vtables, etc.: interpreter overhead), and at the
cost of fine-grained, lightweight inter-thread communication between the
segregated threads.

Further, threading implemented as context switches at block time amounts
to a cooperative multithreading environment. Yes, it may provide
near-optimal throughput. Despite that, it also has very bad worst-case
latency characteristics. If a worker thread fails to block,
the thread which started it will never (or rarely) run and the program
will become unresponsive. This makes such a threading model unsuitable
for use in a web application host. One misbehaving HTTP request
handler mustn't block other requests. A worker thread mustn't block the
UI thread.
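
The latency problem can be shown with a tiny cooperative scheduler. This is a sketch only: generators stand in for threads, `yield` stands in for "blocking", and the task names are arbitrary. It says nothing about how Parrot would actually schedule anything.

```python
from collections import deque

def polite(name, steps, log):
    for _ in range(steps):
        log.append(name)
        yield                 # hand control back after each unit of work

def hog(name, steps, log):
    for _ in range(steps):
        log.append(name)      # does all of its work without yielding
    yield

def run_round_robin(tasks):
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)        # run until the task next "blocks"
            ready.append(task)
        except StopIteration:
            pass              # task finished, drop it

def demo():
    log = []
    run_round_robin([polite("ui", 3, log),
                     hog("worker", 3, log),
                     polite("io", 3, log)])
    return log
```

In the resulting log the "worker" entries appear back to back: once the hog is scheduled, neither "ui" nor "io" runs again until it is done. With three steps that is harmless; with an unbounded computation the other tasks would simply never run, which a preemptive scheduler would prevent.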

Sidenote: Shades of System 7: CPU benchmarks on the old Mac OS do run
several percentage points faster than on preemptive systems. The
preemptive model is clearly superior in the general case, though; its
perceived performance under load is by far superior. Also worth noting:
parrot will already be paying the preemptive performance penalty on any
modern OS.

I can only hail core events and asynchronous I/O as great advances in
parrot. But they are not a general replacement for preemptive
multithreading. Of course, TMTOWTDI and YMMV, but parrot should support
both models well, and the above line of thought isn't doing threading
justice in my opinion.

--

Gordon Henriksen
mali...@mac.com

Leopold Toetsch

Jan 2, 2004, 9:15:48 AM

to perl6-i...@perl.org

Leopold Toetsch <l...@toetsch.at> wrote:

> 3) a separate array for the JITed method stubs, which are per interpreter
> and only if the platform can generate such stubs on the fly.

Done now. JITted NCI methods are now enabled again but only i386 has the
necessary interface code in jit/*.

leo