The reason for scads of keyed variants

Dan Sugalski

Sep 1, 2003, 2:03:19 PM
to perl6-i...@perl.org
I should read the list and respond to the outstanding stuff, but I
should also get this done, and since the former probably precludes
the latter...

Why, exactly, have I spec'd (nay, demanded!) that every darned
operation in a PMC's vtable have a keyed variant?

Simple. A combination of speed and space. (And yes, I know we're
going to fight over this one)

Now, for aggregates that hold PMCs and are relatively passive
containers (that is, they don't get in the way of anything besides
load and store operations, if that) the keyed variants provide no
benefit to speak of. Less opcode dispatch overhead, but there's not a
whole lot of that in general anyway, and on JITted cores there's no
win at all.

For aggregates that *don't* hold PMCs, though, that's where the win
is. Those are the sorts of aggregates we're aggressively targeting as
well. One of the big issues with perl, python, and ruby is that the
base variable data structure is big. (And we're not making it any
smaller with Parrot--our PMCs are pretty hefty still) Optimizing the
size of an individual scalar isn't much of a win, but optimizing the
size of arrays and hashes of scalars is a win. (This is one of the
lessons learned from Chip's Topaz project) Many hashes and arrays
don't need full-fledged PMCs in them, nor the space that full PMCs
take. A string, integer, or bool array is sufficient for many of the
uses that aggregates are put to.
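
A back-of-the-envelope sketch of that space difference, using an invented
layout rather than Parrot's real PMC (the ratio, not the exact numbers, is
the point):

#include <stdio.h>

/* Invented layout, not Parrot's real PMC -- just big enough to show
 * that a full PMC per element costs far more than a packed value.   */
typedef struct FakePMC {
    void *vtable;   /* per-PMC dispatch table    */
    void *data;     /* pointer to the real value */
    long  flags;    /* GC and type flags         */
} FakePMC;

int main(void) {
    size_t n = 1000;
    /* Array of PMCs: a pointer per slot plus a header per element.  */
    size_t pmc_array   = n * (sizeof(FakePMC *) + sizeof(FakePMC));
    /* Packed integer array: just the values, stored inline.         */
    size_t packed_ints = n * sizeof(long);
    printf("PMC array: %zu bytes, packed ints: %zu bytes\n",
           pmc_array, packed_ints);
    return 0;
}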

This is a good argument for abstract aggregate element access, which
we have now, and that's good. Saves us space, potentially. Yay us.

How this argues for keyed access to the operations on aggregate
elements, however, is less obvious, but I think it's worth it.

If we don't have direct operations on aggregate elements but instead
have to do a fetch and perform the operation on what we fetch, it
means we have to create a temporary PMC for each 'fake' entry, one
potentially with a fair amount of knowledge about how the aggregate
works, which means that every aggregate will need to provide two
vtables, one for itself and one for an aggregate entry.

Going with direct operations on keyed aggregates, then, makes the
code somewhat more dense, since we only need to access one vtable per
operand rather than two (one to fetch the data and one to operate on
it). That's balanced somewhat by having to have two sets of PMC
opcodes, one that operates on all parts keyed and one that doesn't.
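
As a sketch only--the vtable slots below are invented, not Parrot's actual
vtable--the two dispatch patterns look roughly like this in C:

#include <stddef.h>

/* Invented, minimal vtable for illustration; not Parrot's real API.    */
typedef struct PMC PMC;
typedef struct VTable {
    PMC  *(*get_pmc_keyed)(PMC *self, PMC *key);
    void  (*set_pmc_keyed)(PMC *self, PMC *key, PMC *value);
    void  (*add)(PMC *self, PMC *other);              /* self += other  */
    void  (*add_keyed)(PMC *self, PMC *self_key,      /* self[k] += other[k2] */
                       PMC *other, PMC *other_key);
} VTable;
struct PMC { VTable *vtable; void *data; };

/* Fetch-then-operate: a temporary element PMC and three vtable calls
 * to do dest[key] += src.                                              */
static void add_via_fetch(PMC *dest, PMC *key, PMC *src) {
    PMC *tmp = dest->vtable->get_pmc_keyed(dest, key);   /* fetch   */
    tmp->vtable->add(tmp, src);                          /* operate */
    dest->vtable->set_pmc_keyed(dest, key, tmp);         /* store   */
}

/* Direct keyed operation: one call; the aggregate sees the key and
 * never has to hand out a temporary element PMC at all.               */
static void add_keyed_direct(PMC *dest, PMC *key, PMC *src) {
    dest->vtable->add_keyed(dest, key, src, NULL);       /* NULL: src unkeyed */
}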

The integrated approach *also* makes it easier to optimize the
operation. For example, this code:

foo[12] = foo[12] + bar[15];

or the more compact

foo[12] += bar[15];

has the destination identical to one of the sources. In the keyed
access scheme, that's a single operation, one where foo's vtable
function can *easily* determine that the destination and one of the
sources are identical. (It's a pair of comparisons.) If accessing foo
is somewhat expensive--for example, requiring a DB access--having
knowledge that the source and destination are identical can allow for
optimizations that wouldn't otherwise be possible. (For example, if
foo and bar represented fields in two tables, knowing that the
destination and one of the sources are identical can potentially cut
out one of the DB accesses.) While this is doable if we pass in plain
scalar PMCs, it'll have to be more expensive, and as such is less
likely to be done.
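
A sketch of that identity check inside a hypothetical DB-backed aggregate's
keyed add (every name below is invented; this is not Parrot code):

/* All types and helpers here are hypothetical, for illustration only. */
typedef struct PMC   PMC;
typedef struct DBRow DBRow;
extern int    keys_equal(PMC *a, PMC *b);
extern DBRow *row_lookup(PMC *aggregate, PMC *key);   /* one DB access  */
extern long   row_get(DBRow *row);
extern void   row_set(DBRow *row, long value);

/* foo[12] += bar[15] as one keyed vtable call on the destination.      */
static void dbrow_add_keyed(PMC *dest, PMC *dest_key,
                            PMC *left, PMC *left_key,
                            PMC *right, PMC *right_key) {
    long r = row_get(row_lookup(right, right_key));

    /* The "pair of comparisons": is the destination the same slot as
     * the left-hand source?                                            */
    if (dest == left && keys_equal(dest_key, left_key)) {
        DBRow *row = row_lookup(dest, dest_key);   /* one lookup...     */
        row_set(row, row_get(row) + r);            /* ...reused for the store */
    } else {
        long l = row_get(row_lookup(left, left_key));
        row_set(row_lookup(dest, dest_key), l + r);
    }
}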

Yes, I also know that this potentially breaks down in more complex
expressions. That's fine--it means that we can optimize some accesses
rather than all of them. I'm OK with that, as it's better than not
being able to optimize any.

More information's available if anyone wants it, but this *is* the
way we're going unless someone proves that it's suboptimal in the
common case.
--
Dan

--------------------------------------"it's like this"-------------------
Dan Sugalski                          even samurai
d...@sidhe.org                        have teddy bears and even
                                      teddy bears get drunk

Leopold Toetsch

Sep 1, 2003, 5:17:10 PM
to Dan Sugalski, perl6-i...@perl.org
Dan Sugalski <d...@sidhe.org> wrote:

[ heavily snipped ]

> Now, for aggregates that hold PMCs ...
> ... and on JITted cores there's no
> win at all.

> For aggregates that *don't* hold PMCs, though, that's where the win
> is.

> If we don't have direct operations on aggregate elements but instead
> have to do a fetch and perform the operation on what we fetch, it
> means we have to create a temporary PMC for each 'fake' entry, one
> potentially with a fair amount of knowledge about how the aggregate
> works, which means that every aggregate will need to provide two
> vtables, one for itself and one for an aggregate entry.

I don't see the point here especially why we would need a temporary PMC.
If we have an array of packed ints, I just need a pointer to the element
to work on it. This is very similar to the C<key> opcode I had in some of
my proposals.
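
A minimal sketch of that idea (invented names, not an existing Parrot class):

#include <stddef.h>

/* Hypothetical packed-int aggregate. */
typedef struct IntArray {
    long  *slots;
    size_t size;
} IntArray;

/* The C<key>-style step: resolve an index straight to a pointer at the
 * element; no temporary PMC is built.                                  */
static long *intarray_elem(IntArray *a, size_t idx) {
    return (idx < a->size) ? &a->slots[idx] : NULL;
}

/* e.g. a[i] += b[j] becomes two pointer lookups and one add:
 *     long *pa = intarray_elem(a, i);
 *     long *pb = intarray_elem(b, j);
 *     *pa += *pb;                                                      */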

> Going with direct operations on keyed aggregates, then, makes the
> code somewhat more dense, since we only need to access one vtable per
> operand rather than two (one to fetch the data and one to operate on
> it). That's balanced somewhat by having to have two sets of PMC
> opcodes, one that operates on all parts keyed and one that doesn't.

Not when you need 64 opcodes for the keyed variants. 64:1 isn't
"somewhat balanced".

> More information's available if anyone wants it, but this *is* the
> way we're going unless someone proves that it's suboptimal in the
> common case.

Implementation details wanted ;-)

leo

Dan Sugalski

Sep 1, 2003, 6:25:51 PM
to l...@toetsch.at, perl6-i...@perl.org
At 11:17 PM +0200 9/1/03, Leopold Toetsch wrote:
>Dan Sugalski <d...@sidhe.org> wrote:
>
>[ heavily snipped ]
>
>> Now, for aggregates that hold PMCs ...
>> ... and on JITted cores there's no
>> win at all.
>
>> For aggregates that *don't* hold PMCs, though, that's where the win
>> is.
>
>> If we don't have direct operations on aggregate elements but instead
>> have to do a fetch and perform the operation on what we fetch, it
>> means we have to create a temporary PMC for each 'fake' entry, one
>> potentially with a fair amount of knowledge about how the aggregate
>> works, which means that every aggregate will need to provide two
>> vtables, one for itself and one for an aggregate entry.
>
>I don't see the point here especially why we would need a temporary PMC.
>If we have an array of packed ints, I just need a pointer to the element
>to work on it. This is very similar to the C<key> opcode I had in some of
>my proposals.

We can't do that. There's insufficient information about the
variables at compiletime to do anything other than call into the PMC
to do the operation, so the internal representation's irrelevant. As
far as parrot is concerned, it's a PMC and it doesn't know anything
about the inside.

> > Going with direct operations on keyed aggregates, then, makes the
>> code somewhat more dense, since we only need to access one vtable per
>> operand rather than two (one to fetch the data and one to operate on
>> it). That's balanced somewhat by having to have two sets of PMC
>> opcodes, one that operates on all parts keyed and one that doesn't.
>
>Not when you need 64 opcodes for the keyed variants. 64:1 isn't
>"somewhat balanced".

Erm.... you need to redo your math. Even *if* we went with a full set
of keyed and unkeyed parameters (and I don't see a reason to do so,
though we certainly can) it's not 64. At worst it's 16 for three-arg
ops, and that's for the keyed int, keyed normal, and nonkeyed
versions.

> > More information's available if anyone wants it, but this *is* the
>> way we're going unless someone proves that it's suboptimal in the
>> common case.
>
>Implementation details wanted ;-)

I'll go thump the ops preprocessor, then. There's no reason to
actually write all the code for it, as it can be autogenerated.

Matt Fowles

Sep 1, 2003, 8:45:12 PM
to l...@toetsch.at, Dan Sugalski, perl6-i...@perl.org
All~

>>If we don't have direct operations on aggregate elements but instead
>>have to do a fetch and perform the operation on what we fetch, it
>>means we have to create a temporary PMC for each 'fake' entry, one
>>potentially with a fair amount of knowledge about how the aggregate
>>works, which means that every aggregate will need to provide two
>>vtables, one for itself and one for an aggregate entry.
>
> I don't see the point here especially why we would need a temporary PMC.
> If we have an array of packed ints, I just need a pointer to the element
> to work on it. This is very similar to the C<key> opcode I had in some of
> my proposals.

We could also have n reserved keyed PMCs that are used ONLY with the
prep keyed functions (where n is the greatest number of keyed items in
any op (3, I think)).

Matt


Leopold Toetsch

Sep 2, 2003, 2:34:49 AM
to Dan Sugalski, perl6-i...@perl.org
Dan Sugalski <d...@sidhe.org> wrote:
> At 11:17 PM +0200 9/1/03, Leopold Toetsch wrote:

>>I don't see the point here especially why we would need a temporary PMC.
>>If we have an array of packed ints, I just need a pointer to the element
>>to work on it. This is very similar to the C<key> opcode I had in some of
>>my proposals.

> We can't do that. There's insufficient information about the
> variables at compiletime to do anything other than call into the PMC
> to do the operation, so the internal representation's irrelevant. As
> far as parrot is concerned, it's a PMC and it doesn't know anything
> about the inside.

We don't lose the abstraction with a special C<key> opcode. This C<key>
opcode is the abstraction: get a pointer to an item in the aggregate (or
prepare for an LHS). This is a vtable method of the aggregate. It's just
splitting the 3-keyed operations down into simpler parts - no permutations.

>>Not when you need 64 opcodes for the keyed variants. 64:1 isn't
>>"somewhat balanced".

> Erm.... you need to redo your math.

So I'll try. We have 4 different "addressing" modes for a single keyed
argument:

set_p_k
set_p_kc
set_p_ki
set_p_kic

Now when we want to support 3-keyed operations, there are 4*4*4 = 64
different opcodes for an all-3-keyed operation (+ some more if one
operand isn't keyed).
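
Spelled out as a toy enumeration (this is not the ops preprocessor, just
the count):

#include <stdio.h>

/* Toy enumeration of the all-keyed permutations of one 3-operand op,
 * using the four key addressing modes listed above.                  */
int main(void) {
    const char *modes[] = { "k", "kc", "ki", "kic" };
    int count = 0;
    for (int a = 0; a < 4; a++)
        for (int b = 0; b < 4; b++)
            for (int c = 0; c < 4; c++) {
                printf("add_p_%s_p_%s_p_%s\n", modes[a], modes[b], modes[c]);
                count++;
            }
    printf("%d all-keyed variants of a single op\n", count);   /* 64 */
    return 0;
}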

If we only have e.g. <op>_p_k_p_k_p_k, we are missing the nice
optimization of integer indices: we have to go through key_integer(),
have to check for NULL keys, and we need a full-fledged (albeit
constant) Key PMC per argument.
Further: you have to implement type morphing, range checking, and BIG*
promotion in the non-keyed variants and in the keyed vtable variants
too. This is error-prone and a waste of memory.

And finally, you are saying that these are most useful for aggregates
containing non-PMCs. What about

@a[i] = @b[i] + 1;

or the equivalent Perl6 vector operations[1]? More permutations on opcodes
to implement these?

> ... Even *if* we went with a full set
> of keyed and unkeyed parameters (and I don't see a reason to do so,
> though we certainly can) it's not 64. At worst it's 16 for three-arg
> ops, and that's for the keyed int, keyed normal, and nonkeyed
> versions.

What am I missing here?

>>Implementation details wanted ;-)

> I'll go thump the ops preprocessor, then. There's no reason to
> actually write all the code for it, as it can be autogenerated.

But please with a Configure switch to turn it off ;-)

[1] I expect the most speed gain here, when we can optimize to
hell.

@a >>=<< @b + 1 // @a, @b are known to be arrays of packed int

n = @b.elements()
c = n / CHUNK_SIZE
r = n % CHUNK_SIZE
for i (0..c-1)
    ptra = @a.make_chunk(i)
    ptrb = @b.get_chunk(i)
    for (1..CHUNK_SIZE)
        *ptra++ = *ptrb++ + 1
// do rest ...
@a.elements = @b.elements

This would be a sequence of some specialized opcodes (eventually with a
runtime check in front) and it's fully JITtable.
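
Rendered as plain C with invented chunk helpers (destination growth and the
trailing partial chunk only sketched), the inner loop is nothing but pointer
arithmetic--which is what makes it JIT-friendly:

#include <stddef.h>

#define CHUNK_SIZE 64          /* illustrative value only */

/* Hypothetical packed-int aggregate; make_chunk/get_chunk mirror the
 * pseudocode above. Assumes the destination already has capacity.    */
typedef struct PackedInts {
    long  *data;
    size_t elements;
} PackedInts;

static long *get_chunk(PackedInts *p, size_t i)  { return p->data + i * CHUNK_SIZE; }
static long *make_chunk(PackedInts *p, size_t i) { return p->data + i * CHUNK_SIZE; }

/* @a = @b >>+<< 1 over whole chunks, then the remainder ("do rest").  */
static void vec_add_const(PackedInts *a, PackedInts *b, long k) {
    size_t n      = b->elements;
    size_t chunks = n / CHUNK_SIZE;
    size_t rest   = n % CHUNK_SIZE;

    for (size_t i = 0; i < chunks; i++) {
        long *pa = make_chunk(a, i);
        long *pb = get_chunk(b, i);
        for (size_t j = 0; j < CHUNK_SIZE; j++)
            *pa++ = *pb++ + k;
    }
    long *pa = make_chunk(a, chunks);
    long *pb = get_chunk(b, chunks);
    for (size_t j = 0; j < rest; j++)
        *pa++ = *pb++ + k;

    a->elements = b->elements;
}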

leo

Luke Palmer

unread,
Sep 2, 2003, 3:57:45 AM9/2/03
to Leopold Toetsch, Dan Sugalski, perl6-i...@perl.org
Leopold Toetsch writes:
> Dan Sugalski <d...@sidhe.org> wrote:
> > At 11:17 PM +0200 9/1/03, Leopold Toetsch wrote:
>
> >>I don't see the point here especially why we would need a temporary PMC.
> >>If we have an array of packed ints, I just need a pointer to the element
> >>to work on it. This is very similar to the C<key> opcode I had in some of
> >>my proposals.
>
> > We can't do that. There's insufficient information about the
> > variables at compiletime to do anything other than call into the PMC
> > to do the operation, so the internal representation's irrelevant. As
> > far as parrot is concerned, it's a PMC and it doesn't know anything
> > about the inside.
>
> We don't lose the abstraction with a special C<key> opcode. This C<key>
> opcode is the abstraction: get a pointer to an item in the aggregate (or
> prepare for an LHS). This is a vtable method of the aggregate. It's just
> splitting the 3-keyed operations down into simpler parts - no permutations.

And I think you're saying that it'll be illegal to use this pointer PMC
if the aggregate changes or anything like that, so the proxy can be as
dumb and fast as possible... right? And that it wouldn't really need a
header. So it wouldn't really be a PMC. But that's okay, I think.

I'm wondering how the actual ops would look in this case though.

foo a[x], b[y]

turns into:

key P0, a[x]
key P1, b[y]
foo P0, P1

How does foo know that it's working on fake PMCs? Or am I missing your
vision (probably am)?

> [1] I expect the most speed gain here, when we can optimize to
> hell.
>
> @a >>=<< @b + 1 // @a, @b are known to be arrays of packed int

FWIW, that does something entirely useless[1]: the equivalent of:

@a = (1 + @b) x @a

I think you meant to say:

@a = @b >>+<< 1

> n = @b.elements()
> c = n / CHUNK_SIZE
> r = n % CHUNK_SIZE
> for i (0..c-1)
>     ptra = @a.make_chunk(i)
>     ptrb = @b.get_chunk(i)
>     for (1..CHUNK_SIZE)
>         *ptra++ = *ptrb++ + 1
> // do rest ...
> @a.elements = @b.elements
>
> This would be a sequence of some specialized opcodes (eventually with a
> runtime check in front) and it's fully JITtable.

That would be, um, very cool. PDL, eat your heart out!

Luke

[1] I'm saying that word with full awareness of its connotations on a
list like this :-)

> leo

Leopold Toetsch

Sep 2, 2003, 5:17:52 AM
to Luke Palmer, perl6-i...@perl.org
Luke Palmer <fibo...@babylonia.flatirons.org> wrote:
> Leopold Toetsch writes:

> And I think you're saying that it'll be illegal to use this pointer PMC
> if the aggregate changes or anything like that, so the proxy can be as
> dumb and fast as possible... right? And that it wouldn't really need a
> header. So it wouldn't really be a PMC. But that's okay, I think.

It would probably be a Key PMC with some extra flags. I haven't thought
much about the details yet. It's basically that part inside the current
keyed set operations that gets at the value inside the aggregate.

This part of the code is currently repeated all over, followed by a
get_<type> or set_<type> operation.
For aggregates holding basic types, a C pointer would be enough.

> I'm wondering how the actual ops would look in this case though.

> foo a[x], b[y]

> turns into:

> key P0, a[x]
> key P1, b[y]
> foo P0, P1

Yes. The first one would be C<key_lhs> or such.

> How does foo know that it's working on fake PMCs? Or am I missing your
> vision (probably am)?

It doesn't matter for PMCs. The C<key_lhs> for C<a> has to prepare a new
PMC slot in the aggregate; P0 points at it, and C<foo> then sets the value.
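
A small sketch of that split, with invented names and a packed-int aggregate
standing in for the general case:

#include <stddef.h>

/* Hypothetical packed-int aggregate, for illustration only. */
typedef struct IntAgg {
    long  *slots;
    size_t size;
} IntAgg;

/* key: resolve an existing element for reading.                       */
static long *agg_key(IntAgg *a, size_t idx) {
    return (idx < a->size) ? &a->slots[idx] : NULL;
}

/* key_lhs: make sure the destination slot exists (growth elided here)
 * and return its address, so the following op can store through it.   */
static long *agg_key_lhs(IntAgg *a, size_t idx) {
    return (idx < a->size) ? &a->slots[idx] : NULL;
}

/* foo a[x], b[y]  ==>  key_lhs P0, a[x] ; key P1, b[y] ; foo P0, P1   */
static void foo(long *p0, const long *p1) {
    *p0 = *p1 * 2;   /* stand-in for whatever "foo" actually does      */
}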

> I think you meant to say:

> @a = @b >>+<< 1

Oops, yes of course.

>> n = @b.elements()
>> c = n / CHUNK_SIZE
>> r = n % CHUNK_SIZE
>> for i (0..c-1)
>>     ptra = @a.make_chunk(i)
>>     ptrb = @b.get_chunk(i)
>>     for (1..CHUNK_SIZE)
>>         *ptra++ = *ptrb++ + 1
>> // do rest ...
>> @a.elements = @b.elements
>>
>> This would be a sequence of some specialized opcodes (eventually with a
>> runtime check in front) and it's fully JITtable.

> That would be, um, very cool. PDL, eat your heart out!

I dunno which of the many acronyms of PDL you are applying here :-) But
anyway, such optimizations make vector ops fly at optimized-C-level
speed.
"Fat" multikeyed opcodes won't do that.

> Luke

leo
