More on threads

Gordon Henriksen

unread,

Jan 23, 2004, 11:48:22 PM1/23/04

to perl6-i...@perl.org

Just thought I'd share some more thoughts on threading. I don't think
the threading proposal is baked, yet, unfortunately.

I've come to agree with Dan: As the threading requirements and the
architecture stand, parrot requires frequent and automatic locking to
prevent crashes. This is completely apart from user synchronization.

"As the architecture stands?" What's wrong with it? I think the most
problematic items are:

1. parrot's core operations are heavy and multi-step, not lightweight
and atomic.
--> This makes it harder for parrot to provide a crash-proof environment.
2. PMCs are implemented in C, not PIR.
--> Again, makes parrot's job of providing a crash-proof environment
much harder. If a small set of safe operations can be guaranteed safe,
then the crash-proofing bubbles upward.
3. New code tends to appear in parrot's core rather than accumulating in
a standard library.
--> This bloats the core, increasing our exposure to bugs at that level.
4. Memory in parrot is not type-stable.
--> unions with mutable discriminators are evil, because checking the
discriminator and accessing the data field could be preempted by a
change of discriminator and value. Thus, unions containing pointers
require locking for even read access, lest seg faults or unsafe memory
accesses occur. Best example: morph. morph must die.

But parrot's already much too far along for the above to change. (ex.:
morph must die.)

The JVM and CLR have successful threading implementations because their
core data types are either atomic or amenable to threading. (I've been
over this before, but I'm playing devil's advocate today.)
--> Many of Perl's core string types, for instance, are not threadsafe—
and they never will be. (note: I said Perl, not parrot.) Even if
implemented on the JVM, Perl's string types would still require locking.
That Perl doesn't use them yet doesn't mean parrot can't also have data
structures that are amenable to locking. Immutable strings wouldn't
require locking on parrot any more than on the JVM—so long as morph and
transcode could be prevented. (Three cheers for type-stable memory.) If
parrot can prove that a P-reg will point to a PMC of such-and-such type,
and can know that such-and-such operation requires no locking on that
type, it can avoid locking the PMC.

That, and neither environment (any longer) makes any misguided attempt
to provide user-level consistency when it hasn't been requested.
--> That means they simply don't lock except when the user tells them
to. No reader-writer locks to update a number.

Like Dan mentioned, there's no "JVM magic," but rather there is a lot of
very careful design. The core is crash-proofed, and is small enough that
the crash-proofing is reasonable and provable. Atop that is built code
which inherits that crash-proofing. Thread-safety is a very high-level
guarantee, only rarely necessary.

Dan Sugalski wrote:

> =item All shared PMCs must have a threadsafe vtable
>
> The first thing that any vtable function of a shared PMC must do is to
> aquire the mutex of the PMCs in its parameter list, in ascending
> address order. When the mutexes are released they are not required to
> be released in any order.

Wait a sec. $2->vtable->add(interpreter, $1, $2, $3). That's one dynamic
dispatch. I see 2 variables that could be shared. I think that's fatal,
actually.

The algorithm I'd suggest instead is this: Newborn objects couldn't have
been shared, and as such can safely be accessed without locks. This is a
lot given how Perl treats values, though certainly not all. All objects
from foreign sources, which have been passed to another routine, or
stored into a sharable container must be presumed to require locks. It's
not as aggressive, true, but I think the overall cost is lower.

To back up Dan: Regarding Leo's timings, everyone that's freaking out
should remember that he was testing a very fast operation, the worst
case scenario for the locking overhead. *Of course* the overhead will
appear high. Most of parrot's operations are much heavier, and the
locking overhead will be less apparent when those are executing. That
said, a 400% penalty is too high a price to pay for what, after all,
isn't even a useful level of threadsafety from a user's standpoint. But,
again, without respecification and redesign, parrot requires the
locking. The trick is to lock less.

One way I can see to do that is to move locking upward, so that several
operations can be carried out under the auspices of one lock. How would
I propose to do this?

• Add some lock opcodes to PBC. The pluralized version allows parrot to
acquire locks in ascending address order (hardcoded bubble sorts),
according to Dan's very important deadlock-avoidance algorithm.
- op lock(in PMC)
- op lock2(in PMC, in PMC)
- ...
- op lock5(in PMC, in PMC, in PMC, in PMC, in PMC)
- ...
• Add unlock opcode(s), too. Pluralized? Doesn't matter.
• Force all locks to be released before any of:
- locking more PMCs (deadlock avoidance!)
- invoke*
- an upwards branch
- reassigning a PMC register (changing reg[n], not reg[n]->something)
- any blocking operation
- anything that checks for events
• Bytecode validation can ensure that PMC locks are held before
operating on a PMC that is:
- not a temporary
- not known to be a threadsafe PMC (I repeat: morph is evil)

Pain in the ass? Maybe not. Here, I think IMCC can pull most of the
weight: lock* shouldn't even be available through IMCC. Since it's just
a matter of chugging through the rules, IMCC can apply locking
automatically. (Hell, to avoid changing PBC, parrot could implement them
automatically at runtime! Wouldn't want to be there, but the above set
of rules is darned easy to apply.)

However, I would again repeat that the value provided to user programs
by atomic PMC operations is negligible (since a user's data structures
will typically be higher-level, and thus require user-level locking)—and
is certainly not offset by a 400% performance penalty. Locking PMCs is
the easy way out. If type-stable memory can be provided (morph must
die), it may be possible to look away from automatic PMC locks in more
cases. For instance, an Int PMC would require no locks whatsoever—so
long as its type cannot change. Nor would an immutable string, or a
fixed-size array. On platforms with 8-word atomic writes, a Number PMC
could get by without locks, too. morph must die.

Dan Sugalski wrote:

> Okay, we're going to mandate that we have read/write locks, each
> interpreter pool has one, and mutating vtable entries must get a read
> lock on the pool read/write lock. Pricey (ick) but isolated to
> mutators. The DOD gets a write lock on it, which'll block all
> read/write access so no mutators can be in process while the pool DOD
> runs.

Adding a reader-writer lock on every pointer mutation will be completely
and utterly unfeasible from a performance standpoint. There will be
massive reader contention even on uniprocessors, and the bus pollution
on multiprocessors will be disastrous. Reader-writer locks are more
costly than mutexes.

Can I suggest an algorithm to completely eliminate that reader-writer
lock, replacing it with a form of coordination that's cheaper in
aggregate, and free in the common case? It's not simple, but I'd rather
one elephant than a billion mice. Might help with infant mortality, too.

—

Gordon Henriksen
mali...@mac.com

P.S. - morph must die.

Leopold Toetsch

unread,

Jan 24, 2004, 9:23:48 AM1/24/04

to Gordon Henriksen, perl6-i...@perl.org

Gordon Henriksen <mali...@mac.com> wrote:

> ... Best example: morph. morph must die.

Morph is necessary. But please note: morph changes the vtable of the PMC
to point to the new data types table. It has nothing to do with a typed
union.

> Gordon Henriksen

leo

Gordon Henriksen

unread,

Jan 24, 2004, 10:50:57 AM1/24/04

to l...@toetsch.at, Dan Sugalski, perl6-i...@perl.org

The vtable IS the discriminator. I'm referring to this:

typedef union UnionVal {
struct { /* Buffers structure */
void * bufstart;
size_t buflen;
} b;
struct { /* PMC unionval members */
DPOINTER* _struct_val; /* two ptrs, both are defines */
PMC* _pmc_val;
} ptrs;
INTVAL int_val;
FLOATVAL num_val;
struct parrot_string_t * string_val;
} UnionVal;

So long as the discriminator does not change, the union is type stable.
When the discriminator does change, as per here:

void
Parrot_PerlInt_set_string_native(Parrot_Interp interpreter, PMC*
pmc, STRING* value)
{
VTABLE_morph(interpreter, pmc, enum_class_PerlString);
VTABLE_set_string_native(interpreter, pmc, value);
}

void
Parrot_perlscalar_morph(Parrot_Interp interpreter, PMC* pmc, INTVAL
type)
{
if (pmc->vtable->base_type == enum_class_PerlString) {
if (type == enum_class_PerlString)
return;
PObj_custom_mark_CLEAR(pmc);
pmc->vtable = Parrot_base_vtables[type];
return;
}
if (type == enum_class_PerlString) {
pmc->vtable = Parrot_base_vtables[type];
VTABLE_init(interpreter, pmc);
return;
}
PObj_custom_mark_CLEAR(pmc);
pmc->vtable = Parrot_base_vtables[type];

}

... then these can both run:

Parrot_scalar_get_string(Parrot_Interp interpreter, PMC* pmc)
{
return (STRING*)pmc->cache.string_val;
}

FLOATVAL
Parrot_scalar_get_number(Parrot_Interp interpreter, PMC* pmc)
{
return pmc->cache.num_val;
}

That clearly allows a struct parrot_string_t * to freely share the same
memory as a double. Were it an int and a double, the "surprising
results" from this unprotected access wouldn't violate the no crashes
guarantee. But it's a pointer! Dereferencing it could cause a segfault,
or a read or write of an arbitrary memory location. Both clearly violate
the crucial guarantee.

—

Gordon Henriksen
mali...@mac.com

Gordon Henriksen

unread,

Jan 24, 2004, 1:59:26 PM1/24/04

to l...@toetsch.at, Dan Sugalski, perl6-i...@perl.org

Leopold Toetsch wrote:

I overstated when I said that morph must die. morph could live IF:

• the UnionVal struct were rearranged
• bounds ere placed upon how far a morph could... well, morph

It doesn't matter if an int field could read half of a double or v.v.;
it won't crash the program. Only pointers matter.

To allow PMC classes to guarantee segfault-free operation, morph and
cooperating PMC classes must conform to the following rule. Other
classes would require locking.

With this vocabulary:
variable: A memory location which is reachable (i.e., not garbage). [*]
pointer: The address of a variable.
pointer variable: A variable which contains a pointer.
access: For a pointer p, any dereference of p—*p, p->field, or
p[i]—whether for the purposes of reading or writing to that variable.

And considering:
any specific pointer variable ("ptr"), and
all accesses which parrot might perform[**] on any pointer ever
stored in ptr ("A") [***], and
any proposed assignment to ptr

Then:
If any A which once accessed a pointer variable would now access a
non-pointer variable,
Then the proposed assignment MUST NOT be performed.

This is a relaxed type-stability definition. (Relaxed: It provides type
stability only for pointer variables, not for data variables. It does
not discriminate the types of pointers, only that the data structures
they directly reference have the same layout of pointers. Also, a
loophole allows non-pointer variables to become pointer variables, but
not the reverse.)

These rules ensure that dereferencing a pointer will not segfault. They
also ensure that it is safe to deference a pointer obtained from a union
according to the union's discriminator—regardless of when or in which
order or how often parrot read the pointer or the discriminator.[***] I
think they're actually the loosest possible set of rules to do this.

[*] Two union members are the same variable.
[**] This is in the variable ptr specifically, not merely in the same
field of a similar struct. That is, having an immutable discriminator
which selects s.u.v or s.u.i from struct { union { void * v; int i; }
u } s is valid. A mutable discriminator is also valid—so long as the
interpretation of pointer fields does not change.
[***] But only if the architecture prevents shearing in pointer reads
and writes.

From another perspective this is to say:

Every pointer variable must forever remain a pointer.
Union discriminators must not change such that a pointer will no longer
be treated as a pointer, or will be treated as a pointer to a structure
with a different layout.

The first step in conforming to these rules is guaranteeing that a
perlscalar couldn't morph into an intlist or some other complete
nonsense. So the default for PMCs should be to prohibit morphing. Also,
morphable classes will have a hard time using struct_val without
violating the above rules. But for this price, parrot could get
lock-free, guaranteed crash-proof readers for common data types. But
note that pmc->cache.pmc_val can be used freely! So if exotic scalars
wrap their data structures in a PMC
*cough*perlobject*cough*managedstruct*ahem*, then those PMCs can be part
of a cluster of morphable PMC classes without violating these rules.

Next, the scalar hierarchy (where morphing strikes me as most important)
could be adjusted to provide the requisite guarantees, such as:
perlstring's vtable methods would never look for its struct
parrot_string_t * in the same memory location that a perlnum vtable
method might be storing half of a floatval. Right now, that sort of
guarantee is not made, and so ALL shared PMCs REALLY DO require locking.
That's bad, and it's solvable.

Specifically, UnionVal with its present set of fields, would have to
become something more like this:

struct UnionVal {
struct parrot_string_t * string_val;
DPOINTER* struct_val;
PMC* pmc_val;
void *b_bufstart;
union {
INTVAL _int_val;
size_t _buflen;
FLOATVAL _num_val;
} _data_vals;
};

If no scalar types use struct_val or pmc_val or b_bufstart, then those
fields can go inside the union.

Unconstrained morphing is the only technology that *in all cases*
*completely* prevents the crash-proof guarantee for lock-free access to
shared PMCs. Without changes to this, we're stuck with implicit PMC
locking and what looks like an unusable threading implementation.

This is only the beginning! For example, if parrot can povide type
stability, mutable strings can be crash-proofed from multithreaded
access. "Wha?!"

1. Add to the buffer structure an immutable capacity field. Or use the
memory allocator's length field; same thing.
2. All routines which access the string's memory buffer should copy the
string's byte length and buffer pointer to local variables before
processing them, so that C ensures the algorithm has a stable copy of
the variables. (The actual fields should be declared volatile.)
3. All routines which access the string's memory buffer should now bound
their accesses by selecting as the upper bound the lesser of those two
variables. Or: Throw an exception if the capacity is less than the byte
length.
4. It must be guaranteed that a buffer contraction will change the
buffer pointer. (i.e., realloc() musn't just put the top of the buffer
back in the free-list.)

This smidgen of extra work will ensure that a string iterator never
walks off the end of the buffer, should the pointer to the buffer and
the string's length be read from inconsistent states of the same object.
Surprising results? Yes. Crashes? No!

On 32-bit platforms with atomic 64-bit reads and writes (or 64-bit
platforms with atomic 128-bit reads and writes), the above can actually
be avoided if the buffer and byte length are stored adjacently and
always written and read as a pair.

But this sort of lock avoidance cleverness can't happen unless parrot
can guarantee type stability in at least limited circumstances.

—

Gordon Henriksen
mali...@mac.com

Gordon Henriksen

unread,

Jan 24, 2004, 3:35:35 PM1/24/04

to Gordon Henriksen, perl6-i...@perl.org

I wrote:

> With this vocabulary:
> variable: A memory location which is reachable (i.e., not
> garbage). [*]
> pointer: The address of a variable.
> pointer variable: A variable which contains a pointer.
> access: For a pointer p, any dereference of p—*p, p->field, or
> p[i]—whether for the purposes of reading or writing to that variable.
>
> And considering:
> any specific pointer variable ("ptr"), and
> all accesses which parrot might perform[**] on any pointer ever
> stored in ptr ("A") [***], and
> any proposed assignment to ptr
>
> Then:
> If any A which once accessed a pointer variable would now access a
> non-pointer variable,
> Then the proposed assignment MUST NOT be performed.

D'oh. This actually has to be recursive.

Considering:

any specific pointer variable ("ptr'), and

all accesses which parrot might perform on any pointer ever stored
in ptr,
and all accesses which parrot might perform on any pointer
ever stored in those variables,
...,
... "A", and

any proposed assignment to ptr

else it allows

char * a = ...;
char ** b = a;

Doesn't change the conclusions I drew at all. (Nor does it require some
massively recursive algorithm to run at pointer assignment time, just as
the first one didn't require anything more than pointer assignment at
pointer assignment time.) Could probably be simplified with the addition
of pointer type to the definitions section.

Anyhoo.

—

Gordon Henriksen
mali...@mac.com

Pete Lomax

unread,

Jan 24, 2004, 8:47:42 PM1/24/04

to perl6-i...@perl.org

On Sat, 24 Jan 2004 13:59:26 -0500, Gordon Henriksen
<mali...@mac.com> wrote:

<snip>

>It doesn't matter if an int field could read half of a double or v.v.;
>it won't crash the program. Only pointers matter.

<snip>

>These rules ensure that dereferencing a pointer will not segfault.

In this model, wouldn't catching the segfault and retrying (once or
twice) work? - If I'm reading you correctly, which is unlikely, this
has little to do with program correctness, but about the interpreter
not crashing because of an unfortunate context switch.. which the
programmer should have guarded against in the first place... no, I
think I just lost the plot again ;-)

Pete

Gordon Henriksen

unread,

Jan 25, 2004, 12:23:59 AM1/25/04

to Pete Lomax, perl6-i...@perl.org

Pete Lomax wrote:

> Gordon Henriksen wrote:
>
> <snip>
>> It doesn't matter if an int field could read half of a double or v.v.;
>> it won't crash the program. Only pointers matter.
> <snip>
>> These rules ensure that dereferencing a pointer will not segfault.
>
> In this model, wouldn't catching the segfault and retrying (once or
> twice) work?

Determining how to "retry" in the general case would be... much more
interesting than this proposal. :)

Furthermore, worse than segfaults could potentially result from using
half of a double as a pointer. There's no assurance that *((caddr_t *)
&double) won't in fact be a valid memory address. In this case, there
would be no segfault, but memory would be subtly corrupted. There's no
way to detect that, so there's no way to retry.

The point of all the tweaks and care is to prevent ever dropping
something else in a particular variable where parrot would, at another
point in the program, expect a pointer of a particular type.

I think you probably got the following, but I'd just like to elaborate
more specifically. I think the greatest subtlety of the rules was in the
interpretation of

>> accesses which parrot might perform

and the word specific in

>> any specific pointer variable ptr

Without understanding precisely what I meant there, one might think that
even a simple polymorphic system like the following is prohibited:

struct eg;
typedef int (func_ptr*)(struct eg*);
struct {
func_ptr fp;
union { int *pointer; int integer; } u;
} eg;

int pointer_meth(struct eg *thiz) { return ++*(thiz->u.pointer); }
int integer_meth(struct eg *thiz) { return ++(thiz->u.integer); }

void main() {
struct eg eg1 = { pointer_meth, NULL };
struct eg eg2 = { integer_meth, 1 };
eg1.u.pointer = malloc(sizeof(int));
*eg1.u.pointer = 0;

print_it("eg1", &eg1);
print_it("eg2", &eg2);
}

void print_it(char *name, struct eg *some_eg) {
printf("%s says %d\n", some_eg>fp(&eg2));
}

But the program IS allowed. While print_it might behave in any number of
ways depending on some_eg->fp, it will always access a particular eg.u
in a consistent fashion, since it always respects the discriminator
(eg.fp), which the program never changes. By extension of this, C++
instances do not violate these rules, either.[*] Were the following line
added to main(), though, then the program would be in violation:

eg1.fp = integer_meth;

Because now some other thread could have obvserved eg1.fp ==
pointer_meth and begun invoking pointer_meth. pointer_meth might now
access u.pointer and, instead of a pointer, see n + (int) u.pointer.
That probably won't segfault for small values of n, but will certainly
not do the right thing either.

This is trite tangent, but also note that the type stability rule
prohibits this:

eg1.u.pointer = NULL;

But would not if the definition of pointer_meth became:

int pointer_meth(struct eg *thiz) {
int* pointer = this->u.pointer;
return pointer == NULL? -1 : ++*pointer;
}

Because now the program will now not dereference u.pointer if its value
is NULL. How cute. (But if u.pointer were not copied to a local, then
bets are off again, because C might perform an extra load and get a
value inconsistent with the one it received when testing for NULL.)

... But even that extra local copy isn't required if u.pointer begins
NULL, and can become non-NULL, but will not become NULL again. Why?

>> all accesses which parrot MIGHT perform on any pointer ever
>> storED in ptr ("A")

Note the past tense there. That's why.

> If I'm reading you correctly, which is unlikely,

That has little to do with you, but much to do with my burying important
parts of my message in pages of dense text. :)

> this has little to do with program correctness, but about the
> interpreter not crashing because of an unfortunate context switch..
> which the programmer should have guarded against in the first place...

Yes. Precisely.

> no, I think I just lost the plot again ;-)

I think you're pretty close, just missing a few of the subtleties that
got buried in that long missive. :)

—

Gordon Henriksen
mali...@mac.com

[*] Even though C++ changes the vtable of an instance during
instantiation, it does so in a broadening fashion, making
formerly-inaccessible variables accessible. A C++ instance of "class
derived_class : public base_class" is not a derived_class until
base_class's constructor finishes and derived_class's constructor
begins. (Side-effect: A subclass cannot influence the instantiation
behavior of a base class.) (Objective C's class methods are wildly
useful. Static languages tend to ignore them. It's sad.)

Leopold Toetsch

unread,

Jan 25, 2004, 4:52:28 AM1/25/04

to Gordon Henriksen, perl6-i...@perl.org

Gordon Henriksen <mali...@mac.com> wrote:

> I overstated when I said that morph must die. morph could live IF:

[ long proposal ]

Increasing the union size, so that each pointer is distinct is not an
option. This imposes considerable overhead on a non-threaded program
too, due its bigger PMC size.

To keep internal state consistent we have to LOCK shared PMCs, that's
it. This locking is sometimes necessary for reading too.

leo

Gordon Henriksen

unread,

Jan 25, 2004, 1:47:23 PM1/25/04

to Leopold Toetsch, perl6-i...@perl.org, Dan Sugalski

Leopold Toetsch wrote:

> Gordon Henriksen wrote:
>
>> I overstated when I said that morph must die. morph could live IF:
>
> [ long proposal ]
>
> Increasing the union size, so that each pointer is distinct is not an
> option. This imposes considerable overhead on a non-threaded program
> too, due its bigger PMC size.

That was the brute-force approach, separating out all pointers. If the
scalar hierarchy doesn't use all 4 of the pointers, then the bloat can
be reduced. So long as a morph can't make a memory location which was
formerly a pointer variable point to a new type, or reuse the memory
location for data--then pointers can be part of a union without
violating the principal of operation.

And what of the per-PMC mutex? Is that not also considerable overhead?
More than an unused field, even.

> To keep internal state consistent we have to LOCK shared PMCs, that's
> it. This locking is sometimes necessary for reading too.

Sometimes? Unless parrot can prove a PMC is not shared, PMC locking is
ALWAYS necessary for ALL accesses to ANY PMC. (This is easy to show; for
any thread, hypothesize a preemption and morph at an inconvenient PC.
Some morph will always put an int where a pointer was expected, or
change the type of a pointer.)

We seem to be painting ourselves into a corner here with impossible
constraints. Clearly, adding size to PMCs is undesirable, and I
recognize that you put a lot of work into shrinking the PMC struct over
a Perl 5 scalar. But, fresh from CVS:

BENCHMARK USER SYS %CPU TOTAL
------------------------------ ------- ------- ----- --------
addit.imc 23.53s 0.05s 96% 24.517
addit2.imc 15.97s 0.07s 98% 16.258
arriter.imc 0.03s 0.03s 37% 0.159
arriter_o1.imc 0.02s 0.04s 37% 0.160
fib.imc 4.41s 3.07s 97% 7.691
addit.pasm 17.84s 0.05s 95% 18.827
bench_newp.pasm 4.53s 0.16s 97% 4.824
freeze.pasm 1.47s 0.18s 82% 1.991
gc_alloc_new.pasm 0.20s 0.41s 72% 0.836
gc_alloc_reuse.pasm 19.09s 13.15s 98% 32.731
gc_generations.pasm 14.02s 5.42s 95% 20.424
gc_header_new.pasm 7.91s 1.37s 97% 9.558
gc_header_reuse.pasm 11.45s 0.09s 98% 11.733
gc_waves_headers.pasm 3.05s 0.40s 95% 3.597
gc_waves_sizeable_data.pasm 2.22s 3.81s 97% 6.200
gc_waves_sizeable_headers.pasm 7.48s 1.78s 97% 9.480
hash-utf8.pasm 7.02s 0.07s 95% 7.404
primes.pasm 61.63s 0.10s 97% 1:03.300
primes2.pasm 39.26s 0.09s 97% 40.189
primes2_p.pasm 69.00s 0.17s 96% 1:11.840
stress.pasm 2.19s 0.28s 91% 2.705
stress1.pasm 46.95s 1.33s 95% 50.377
stress2.pasm 8.63s 0.24s 98% 9.026
stress3.pasm 26.19s 0.78s 96% 28.063
TOTAL 7:21.890

And with a PMC struct that's bloated by 16 bytes as proposed:

BENCHMARK USER SYS %CPU TOTAL
------------------------------ ------- ------- ----- --------
addit.imc 23.67s 0.10s 96% 24.745
addit2.imc 16.05s 0.07s 98% 16.414
arriter.imc 0.03s 0.03s 44% 0.135
arriter_o1.imc 0.03s 0.04s 48% 0.144
fib.imc 4.47s 2.99s 86% 8.651
addit.pasm 18.08s 0.08s 98% 18.442
bench_newp.pasm 4.73s 0.18s 97% 5.019
freeze.pasm 1.56s 0.29s 89% 2.075
gc_alloc_new.pasm 0.25s 0.35s 89% 0.673
gc_alloc_reuse.pasm 18.90s 13.58s 96% 33.642
gc_generations.pasm 13.84s 5.80s 98% 19.942
gc_header_new.pasm 7.98s 1.29s 97% 9.492
gc_header_reuse.pasm 11.37s 0.05s 98% 11.552
gc_waves_headers.pasm 3.09s 0.33s 96% 3.538
gc_waves_sizeable_data.pasm 1.98s 4.01s 96% 6.207
gc_waves_sizeable_headers.pasm 7.67s 1.60s 91% 10.112
hash-utf8.pasm 7.03s 0.05s 97% 7.287
primes.pasm 61.68s 0.15s 98% 1:02.800
primes2.pasm 39.17s 0.12s 99% 39.553
primes2_p.pasm 69.01s 0.18s 96% 1:11.640
stress.pasm 2.32s 0.43s 96% 2.840
stress1.pasm 50.85s 1.79s 93% 56.189
stress2.pasm 9.18s 0.32s 98% 9.689
stress3.pasm 29.06s 1.23s 98% 30.748
TOTAL 7:31.527

That's only 2.1%.

What do you think the overall performance effect of fine-grained locking
will be? You just showed in a microbenchmark that it's 400% for some
operations. We've also heard anecdotal evidence of 400% *overall*
performance hits from similar threading strategies in other projects.
And remember, these overheads are ON TOP OF the user's synchronization
requirements; the PMC locks will rarely coincide with the user's
high-level synchronization requirements.

If these are the two options, I as a user would rather have a separate
threaded parrot executable which takes the 2.1% hit, rather than the
400% overhead as per above. It's easily the difference between usable
threads and YAFATTP (yet another failed attempt to thread perl).

—

Gordon Henriksen
mali...@mac.com

Leopold Toetsch

unread,

Jan 25, 2004, 3:34:28 PM1/25/04

to Gordon Henriksen, perl6-i...@perl.org

Gordon Henriksen <mali...@mac.com> wrote:
> Leopold Toetsch wrote:

>> Increasing the union size, so that each pointer is distinct is not an
>> option. This imposes considerable overhead on a non-threaded program
>> too, due its bigger PMC size.

> That was the brute-force approach, separating out all pointers. If the
> scalar hierarchy doesn't use all 4 of the pointers, then the bloat can
> be reduced.

Yep. Your proposal is a very thorough analysis, what the problems of
morph() currently are. I can imagine, that we have a distinct string_val
pointer, that isn't part of the value union. Morph is currently
implemented only (and necessary) for PerlScalar types. So if a
PerlString's string_val member is valid at any time, we could probably
save a lot of locking overhead.

> And what of the per-PMC mutex? Is that not also considerable overhead?
> More than an unused field, even.

We have to deal single-CPU (no-threaded) performance against threaded.
For the latter, we have from single-CPU to many-multi CPU NUMA systems a
wide spectrum of possibilties.

I currently don't want to slow down the "normal" no-threaded case.

>> To keep internal state consistent we have to LOCK shared PMCs, that's
>> it. This locking is sometimes necessary for reading too.

> Sometimes? Unless parrot can prove a PMC is not shared, PMC locking is
> ALWAYS necessary for ALL accesses to ANY PMC.

get_integer() on a PerlInt PMC is always safe. *If* the vtable is
pointing to a PerlInt PMC it yields a correct value (atomic int access
presumed). If for some reason (which I can't imagine now) the vtable
pointer and the cache union are out of sync, get_integer would produce a
wrong value, which is currently onsidered to be a user problem (i.e.
missing user-level locking).

The same holds for e.g. PerlNum, which might read a mixture of lo and hi
words, but again, the pmc->vtable->get_number of a PerlNum is a safe
operation on shared PMCs without locking too.

The locking primitives provide AFAIK the (might needed) memory barriers
to update the vtable *and* the cache values to point to consistent data
during the *locking* of mutating vtable methods.

> BENCHMARK USER SYS %CPU TOTAL

> That's only 2.1%.

[ big snip on benchmarks ]

All these benchmarks show a scheme like: allocate once and use once. I.e.
these benchmarks don't reuse the PMCs. They don't show RL program
behavior. We don't have any benchmarks currently that expose a near to
worst-case slowdown, which went towards 400% for doubled PMC sizes.

We had some test programs (C only with simulated allocation schemes), that
shuffled allocated PMCs in memory (a typical effect of reusing PMC
memory a few time). The current 20 byte PMC went down by 200% the old 32
byte PMC went down by 400%.

The point is, that when you get allocated PMC from the free_list, they
are more and more scattered in the PMC arenas. All current benchmarks
tend to touch contiguous memory, while RL (or a bit longer running)
programs don't.

That said, I do consider PMC sizes and cache pollution the major
performance issues of RL Parrot programs. These benchmarks don't show
that yet.

> What do you think the overall performance effect of fine-grained locking
> will be? You just showed in a microbenchmark that it's 400% for some
> operations.

Yes. That's the locking overhead for the *fastest* PMC vtable methods. I
think, that we'll be able to have different locking strategies in the
long run. If you want to have a more scalable application on a many-CPU
system, a build-option may provide this. For a one or two CPU system, we
can do a less fine grained locking with less overhead. That might be a
global interpreter lock for that specific case.

> And remember, these overheads are ON TOP OF the user's synchronization
> requirements; the PMC locks will rarely coincide with the user's
> high-level synchronization requirements.

User level locking is't layed out yet. But my 2˘ towards that are: a
user will put a lock around unsafe (that is shared) variable access. We
have to lock internally a lot of times to keep our data-integrity, which
is guaranteed. So the question is, why not to provide that integrity
per se (the user will need it anyway and lock). I don't see a difference
her for data integrity, *but* if all locking is under our control, we
can optimize it and it doesn't conflict or it shouldn't deadlock.

> If these are the two options, I as a user would rather have a separate
> threaded parrot executable which takes the 2.1% hit, rather than the
> 400% overhead as per above. It's easily the difference between usable
> threads and YAFATTP (yet another failed attempt to thread perl).

All these numbers are by far too premature, to have any impact on RL
applications.

Please note, that even with a 400% slowdown for one vtable operation the
mops_p.pasm benchmark would run 4 to eight times faster then on an
*unthreaded* perl5. Thread spawning is currently 8 times faster then on
perl5.

> ?

!!!1

> Gordon Henriksen

leo

Dan Sugalski

unread,

Jan 29, 2004, 6:09:15 AM1/29/04

to Gordon Henriksen, l...@toetsch.at, perl6-i...@perl.org

At 10:50 AM -0500 1/24/04, Gordon Henriksen wrote:
>On Saturday, January 24, 2004, at 09:23 , Leopold Toetsch wrote:
>
>>Gordon Henriksen <mali...@mac.com> wrote:
>>
>>>... Best example: morph. morph must die.
>>
>>Morph is necessary. But please note: morph changes the vtable of
>>the PMC to point to the new data types table. It has nothing to do
>>with a typed union.
>
>The vtable IS the discriminator. I'm referring to this:
>
> typedef union UnionVal {
> struct { /* Buffers structure */
> void * bufstart;
> size_t buflen;
> } b;
> struct { /* PMC unionval members */
> DPOINTER* _struct_val; /* two ptrs, both are defines */
> PMC* _pmc_val;
> } ptrs;
> INTVAL int_val;
> FLOATVAL num_val;
> struct parrot_string_t * string_val;
> } UnionVal;
>
>So long as the discriminator does not change, the union is type stable.

The vtable's not the discriminator there, the flags in the pmc are
the discriminator, as they're what indicates that the union's a
GCable thing or not. I will admit, though, that looks *very*
different than it did when I put that stuff in originally. (It used
to be just a union of FLOATVAL, INTVAL, and string pointer...)

Still, point taken. That needs to die and it needs to die now. For
the moment, lets split it into two pieces, a buffer pointer and an
int/float union, so we don't have to guess whether the contents have
issues with threads.
--
Dan

--------------------------------------"it's like this"-------------------
Dan Sugalski even samurai
d...@sidhe.org have teddy bears and even
teddy bears get drunk

Dan Sugalski

unread,

Jan 29, 2004, 6:14:38 AM1/29/04

to Pete Lomax, perl6-i...@perl.org

At 1:47 AM +0000 1/25/04, Pete Lomax wrote:
>On Sat, 24 Jan 2004 13:59:26 -0500, Gordon Henriksen
><mali...@mac.com> wrote:
>
><snip>
>>It doesn't matter if an int field could read half of a double or v.v.;
>>it won't crash the program. Only pointers matter.
><snip>
>>These rules ensure that dereferencing a pointer will not segfault.
>In this model, wouldn't catching the segfault

Apart from anything else, I don't want to catch segfaults and bus
errors in parrot. (Well, OK, that's not true--I *do* want to catch
segfaults and bus errors, I just don't think it's feasible, or
possible on all platforms)

Gordon Henriksen

unread,

Jan 30, 2004, 11:29:42 AM1/30/04

to Dan Sugalski, l...@toetsch.at, perl6-i...@perl.org

Dan Sugalski wrote:

> Gordon Henriksen wrote:
>
> > Leopold Toetsch wrote:
> >

Hm. Well, both are a discriminator, then; dispatch to code which
presumes the contents of the union is quite frequently done without
examining the flags. Maybe use a VTABLE func instead to get certain
flags? i.e.,

INTVAL parrot_string_get_flags(..., PMC *pmc) {
return PMC_FLAG_IS_POBJ + ...;
}

Then, updating the vtable would atomically update the flags as well.
Or, hell, put the flags directly in the VTABLE if it's not necessary
for them to vary across instances.

I have the entire source tree (save src/ tests) scoured of that rat's
nest of macros for accessing PMC/PObj fields, but I broke something and
haven't had the motivation to track down what in the multi-thousand-
line-diff it was, yet. :( Else you'd have the patch already and plenty
of mobility in the layout of that struct. Near time to upgrade my poor
old G3, methinks; the build cycle kills me when I touch parrot/pobj.h.

Do any PMC classes use *both* struct_val *and* pmc_val concurrently? I
was looking for that, but am afraid I didn't actually notice.

--

Gordon Henriksen
IT Manager
ICLUBcentral Inc.
gor...@iclub.com

Leopold Toetsch

unread,

Jan 30, 2004, 10:58:34 AM1/30/04

to Dan Sugalski, perl6-i...@perl.org

Dan Sugalski <d...@sidhe.org> wrote:

[ PObj union ]

> Still, point taken. That needs to die and it needs to die now. For
> the moment, lets split it into two pieces, a buffer pointer and an
> int/float union, so we don't have to guess whether the contents have
> issues with threads.

The Buffer members (bufstart, buflen) of the union are never used for a
PMC. Also a PMC can't get converted into a Buffer or vv. These union
members are just there for DOD, so that one pobject_lives() (and other
functions) can be used for both PMCs and Buffers. That was introduced
when uniting Buffers and PMCs.

I don't see a problem with that.

The problem that Gordon expressed with morph is:

thread1 thread2

PerlInt->vtable->set_string_native
(int_val = 3)
LOCK()
perlscalar->vtable->morph:
pmc->vtable is now a PerlString
vtable, str_val is invalid
read access on pmc - non-locked
PerlString->vtable->get_integer
STRING *s = pmc->str_val
SIGBUS/SEGV on access of s

But that can be solved by first clearing str_val, then changing the
vtable.

leo

Gordon Henriksen

unread,

Jan 30, 2004, 11:49:09 AM1/30/04

to Leopold Toetsch, perl6-i...@perl.org

Leopold Toetsch wrote:

> Gordon Henriksen wrote:
>
> > ... in the multi-thousand-

> > line-diff it was, yet. :( Else you'd have the patch already
>

> 1) *no* multi-thousands line diffs
> 2) what is the problem, you like to solve?

Er? Extending to the rest of the source tree the huge patch to
classes which you already applied. No logic changes; just
cleaning those PObj accessor macros up.

Leopold Toetsch

unread,

Jan 30, 2004, 12:02:30 PM1/30/04

to Gordon Henriksen, perl6-i...@perl.org

Gordon Henriksen wrote:

>
> Er? Extending to the rest of the source tree the huge patch to
> classes which you already applied. No logic changes; just
> cleaning those PObj accessor macros up.

Ah sorry, that one. Please send in small bunches, a few files changed at
once.

leo

Leopold Toetsch

unread,

Jan 30, 2004, 11:43:27 AM1/30/04

to Gordon Henriksen, perl6-i...@perl.org

Gordon Henriksen wrote:

> Hm. Well, both are a discriminator, then; dispatch to code which
> presumes the contents of the union is quite frequently done without
> examining the flags.

The flags are *never* consulted for a vtable call in classes/*. DOD does
different things if a Buffer or PMC is looked at, but that doesn't
matter here.

> Then, updating the vtable would atomically update the flags as well.

Doesn't matter.

> Or, hell, put the flags directly in the VTABLE if it's not necessary
> for them to vary across instances.

No, flags are mutable and per PMC *not* per class.

> ... in the multi-thousand-

> line-diff it was, yet. :( Else you'd have the patch already

1) *no* multi-thousands line diffs

2) what is the problem, you like to solve?

> Do any PMC classes use *both* struct_val *and* pmc_val concurrently?

E.g. iterator.pmc. UnmanagedStruct uses int_val & pmc_val. This is no
problem. These PMCs don't morph.

leo

Gordon Henriksen

unread,

Jan 30, 2004, 12:20:43 PM1/30/04

to Leopold Toetsch, perl6-i...@perl.org

Leopold Toetsch wrote:

> Gordon Henriksen wrote:
>
> > Or, hell, put the flags directly in the VTABLE if it's not
> > necessary for them to vary across instances.
>
> No, flags are mutable and per PMC *not* per class.

Of course there are flags which must remain per-PMC. I wasn't
referring to them. Sorry if that wasn't clear.

If a flag is only saying "my VTABLE methods use the UnionVal as {a
void*/a PObj*/a PMC*/data}, so GC should trace accordingly," it may be
a waste of a per-object flag bit to store those flags with the PMC
instance rather than with the PMC class. And if it's with the VTABLE,
then it doesn't need to be traced. (But, then, all PObjs don't have
VTABLES...)

Sidebar:
If we're looking at lock-free concurrency, flag updates probably have
to be performed with atomic &'s and |'s. BUT: Doesn't apply during GC,
since other threads will have to be stalled then.

> > Do any PMC classes use *both* struct_val *and* pmc_val concurrently?
>
> E.g. iterator.pmc. UnmanagedStruct uses int_val & pmc_val. This is no
> problem. These PMCs don't morph.

Er, int_val and pmc_val at the same time? That's not quite what the
layout
provides for:

typedef union UnionVal {
struct { /* Buffers structure */
void * bufstart;
size_t buflen;
} b;
struct { /* PMC unionval members */
DPOINTER* _struct_val; /* two ptrs, both are defines */
PMC* _pmc_val;
} ptrs;
INTVAL int_val;
FLOATVAL num_val;
struct parrot_string_t * string_val;
} UnionVal;

Says to me:

struct_val and pmc_val concurrently
-- or --
bufstart and buflen concurrently
-- or --
int_val
-- or --
num_val
-- or --
string_val

I don't know if C provides a guarantee that int_val and ptrs._pmc_val
won't
overlap just because INTVAL and DPOINTER* fields happen to be the same
size.
At least one optimizing compiler I know of, MrC/MrC++, would do some
struct
rearrangement when it felt like it.

Leopold Toetsch

unread,

Jan 30, 2004, 1:00:52 PM1/30/04

to Gordon Henriksen, perl6-i...@perl.org

Gordon Henriksen wrote:

> Leopold Toetsch wrote:
>>No, flags are mutable and per PMC *not* per class.
>
> Of course there are flags which must remain per-PMC. I wasn't
> referring to them. Sorry if that wasn't clear.
>
> If a flag is only saying "my VTABLE methods use the UnionVal as {a
> void*/a PObj*/a PMC*/data}, so GC should trace accordingly," it may be
> a waste of a per-object flag bit to store those flags with the PMC
> instance rather than with the PMC class.

All DOD related flags in the fast paths (i.e. for marking scalars) are
located in the PMCs arena (with ARENA_DOD_FLAGS is on). This reduces
cache misses during DOD to nearly nothing. More DOD related information
is in the flags part of the Pobj - but accessing that also means cache
pollution. Putting flags elsewhere too, needs one more indirection and
allways an access to the PMC memory itself. This doesn't give us any
advantage.

But again, flags don't matter during setting or getting a PMCs data.
Flags aren't used in classes for these purposes.

There are very few places in classes, where flags are even changed. This
is morphing scalars, and Key PMCs come to my mind.

> If we're looking at lock-free concurrency, flag updates probably have
> to be performed with atomic &'s and |'s.

Almost all mutating vtable methods will lock the pmc.

> Er, int_val and pmc_val at the same time?

I know :) This isn't the safest thing we have. After your union accessor
patches, we can clean that up, and use a notation so that for this case,
the two union members really can't overlap.

leo

Leopold Toetsch

unread,

Jan 30, 2004, 12:00:35 PM1/30/04

to perl6-i...@perl.org

Leopold Toetsch <l...@toetsch.at> wrote:

[ perlscalar moprh ]

> But that can be solved by first clearing str_val, then changing the
> vtable.

Fixed. I currently don't see any more problems related to perscalars.

PerlStrings are unsafe per se, as long as we have the copying GC. They
need a lock during reading too. All other perscalars should be safe now
for non-locked reading. Mutating vtables get a lock.

leo