[RFC] 2. Proposal for

Leopold Toetsch

unread,

Sep 20, 2002, 5:12:18 AM9/20/02

to P6I

2. Proposal for _keyed opcodes
------------------------------

The thread with subject "pdd06_pasm, pdd08_keys: _keyed ops" clearly
showes the shortcomings of the current _keyed opcodes and the
implementation of these.[1]

My first proposal WRT a solution (modifying the run loop) did not earn
much consent and when looking closer at JIT, I have to admit, that it
seems rather impractical, or, when implemented might have bad
influence on JIT execution times.

So here is another possible solution:

The current opcode:

set_n_p_ki

will translate into 2 opcodes:

key_p_p_ki [2]
set_n_p

The 3 operand keyed add @a[$i] = @b[3] + %h{"k"}:

add_p_ki_p_kic_p_kc

(which we don't have)

would be 4 ops

key_p_p_ki [3]
key_p_p_kic
key_p_p_kc
add_p_p_p

So a keyed access would have a "key_" opcode, fetching a value pointer
out of the aggregate, and a plain operation working on these values.

Some thoughts on implementation
-------------------------------

The assembler respectively imcc would generate this op sequence for
the current bracketed keyed syntax. assembler.pl could reserve e.g.
P29-P31 for the usage in one key sequence, imcc does dynamic register
allocation anyway.

[2]
PerlHash stores a HASH_ENTRY (a "UnionVal", with a type), a PerlArray
uses a "PMC *" entry for its data, MultiArray uses a different UnionVal
based store.

The UnionVal allowes for a compact storage of primitive types, while a
PMC is more efficient for PMC types and a universal pointer.

As a HASH_ENTRY stores the data type of the value, it would be
possible to optimize above keyed set to:

key_n_p_ki
set_n_n

... with the drawback of some more opcodes. If the used types are
known at compile time, imcc could optimize Binops too, to use native
types.

For LHS usage, we need a pointer into the aggregates storage, so
we use a PMC anyway.

[3]
For LHS usage of compound keys (@a[$b][1] = ..), we might need a
special opcode, if we should autovivify the intermediate aggregate.

And finally: the stores must not move, while we are operating on the
value PMCs.

Comments welcome,
leo

[1] Current state and restrictions
----------------------------------

Looking from HL (perl6;-) down to the inyards of parrot, we have e.g.:

$n = @a[$i]; # assuming native types

which translates to PASM:

set N0, P0[I0]

giving an opcode:

set_n_p_ki

which calls

get_number_keyed_int(...)

on the perlarray PMC, returning directly the stored "num" from array.

Now we have functions returning STRING, number, int and additionally
two versions of these, one taking a KEY and the second (optimized)
version taking directly an integer array index.

Implementing e.g.

@æ[$i] = $j + @b[$k]

would need a "add_keyed_int()" function for perlint, swapping
arguments would imply for perlarray to have such a function and so on.

Now pdd06_pasm states, that all operations might have 3 keys, so we
would end up with basically 64 times the opcodes, we have now, because
each keyed access in the 3 operands has 4 possible types of keys
(_k,_ki,_kic,_kc).

Leopold Toetsch

unread,

Oct 21, 2002, 11:46:51 AM10/21/02

to Leopold Toetsch, P6I

Leopold Toetsch wrote:

> 2. Proposal for _keyed opcodes
> ------------------------------
>
> The thread with subject "pdd06_pasm, pdd08_keys: _keyed ops" clearly
> showes the shortcomings of the current _keyed opcodes and the
> implementation of these.[1]

> The 3 operand keyed add @a[$i] = @b[3] + %h{"k"}:
>
> add_p_ki_p_kic_p_kc

Attached is a proof of concept of my proposal.

A 6 operand 3 keyed op get's rewritten like so:

/* OP _p_k _p_k_p_k =>
* set py, p_k
* set pz, p_k
* new px, .PerlUndef
* OP px, py, pz
* set _p_k_px
*/

- It uses only set_ ops
- only imcc, but with pasm syntax in example

With an approach like this, we could cut down the VTABLE to roughly 1/3
of it's current size. The _keyed entrys would only consist of the
set_.._keyed{,_int} variants plus exists_keyed and defined_keyed. And,
we would never have the opcode explosion to 64 times of current size.

Attached is
- patch to imcc.y
- 3key.imc (example)
- a.pasm (output for above)

$ parrot a.pbc
300
leo

multi_keyed.patch

Juergen Boemmels

unread,

Oct 21, 2002, 2:45:08 PM10/21/02

to Leopold Toetsch, perl6-i...@perl.org

Leopold Toetsch <l...@toetsch.at> writes:

> Leopold Toetsch wrote:
>
> > 2. Proposal for _keyed opcodes
> > ------------------------------
> > The thread with subject "pdd06_pasm, pdd08_keys: _keyed ops" clearly
>
> > showes the shortcomings of the current _keyed opcodes and the
> > implementation of these.[1]
>
>
> > The 3 operand keyed add @a[$i] = @b[3] + %h{"k"}:
> > add_p_ki_p_kic_p_kc
>
>
>
> Attached is a proof of concept of my proposal.
>
> A 6 operand 3 keyed op get's rewritten like so:
>
> /* OP _p_k _p_k_p_k =>
> * set py, p_k
> * set pz, p_k
> * new px, .PerlUndef
> * OP px, py, pz
> * set _p_k_px
> */
>
> - It uses only set_ ops
> - only imcc, but with pasm syntax in example

I like this approach. (The first time you posted it I didn't
understand it).

What happens if you try to use it on an object which has no real
components like a bitvector or a packed structure? There will always
be a pack-unpack cycle. On the other hand, is there any case where
this pack-unpack cycle can be avoided?

The newly created PMC is of type PerlUndef. Is this the correct
behavior or shouldn't it be the same type as the old element in the
Component? Something like this.

# OP Px[kx], Py[ky], Pz[kz]
set Pa, Py[ky]
set Pb, Pz[kz]
typeof Ii,Px[kx]
new Pc, Ii
OP Pc, Pa, Pb
set Px[kx], Pc

> With an approach like this, we could cut down the VTABLE to roughly
> 1/3 of it's current size. The _keyed entrys would only consist of the
> set_.._keyed{,_int} variants plus exists_keyed and defined_keyed. And,
> we would never have the opcode explosion to 64 times of current size.

Instead of 1 opcode and 1 vtable-lookup with a quite complex
vtable-function you have 6 opcodes and 5 vtable lookups with simple
vtable-functions.

From the just opcode-counting point of view, the first solution is
surley better, but the complexity of the vtable function is much
larger than in the later case. The vtable has to distinguish all 8
possible combinations of keyed/non-keyed and than do up to 4
vtable-lookups with the resulting calls to the vtable functions. So
behind the behind the scenes the total number of vtable-calls
(except for some very special cases) is exactly the same. Remains the
number of opcodes. But as the opcode-table gets smaller so its much more
likely that it fits into the cache. Also no vtable function has to
decide wether its called with 1, 2 or 3 keyed elements.

bye
b.
--
Juergen Boemmels boem...@physik.uni-kl.de
Fachbereich Physik Tel: ++49-(0)631-205-2817
Universitaet Kaiserslautern Fax: ++49-(0)631-205-3906
PGP Key fingerprint = 9F 56 54 3D 45 C1 32 6F 23 F6 C7 2F 85 93 DD 47

Dan Sugalski

unread,

Oct 21, 2002, 3:00:16 PM10/21/02

to Leopold Toetsch, P6I

At 5:46 PM +0200 10/21/02, Leopold Toetsch wrote:
>Leopold Toetsch wrote:
>
>>2. Proposal for _keyed opcodes
>>------------------------------
>>
>>The thread with subject "pdd06_pasm, pdd08_keys: _keyed ops" clearly
>>showes the shortcomings of the current _keyed opcodes and the
>>implementation of these.[1]
>
>
>>The 3 operand keyed add @a[$i] = @b[3] + %h{"k"}:
>>
>> add_p_ki_p_kic_p_kc

>Attached is a proof of concept of my proposal.
>
>

>With an approach like this, we could cut down the VTABLE to roughly
>1/3 of it's current size. The _keyed entrys would only consist of
>the set_.._keyed{,_int} variants plus exists_keyed and
>defined_keyed. And, we would never have the opcode explosion to 64
>times of current size.

The big disadvantage here is speed. It means that specialized
aggregates will have to create temporary PMCs for things that don't
already have them, which is potentially slow and wasteful, something
I'd rather avoid. Encouraging the use of specialized aggregates is
one of the reasons for the typing system coming in with perl 6, and
given the size of aggregates in perl 5 I think it's something that
will see some heavy use.

I don't mind the opcode explosion, honestly. It's automatically
generated, and that's not a big deal. There are other ways to cut
down on it as well, if we find the need.

For the moment, I'd rather things stay the way they are. If we can
produce demonstrable speed wins, then I'll change my mind. For now,
though, things stay generally the way they are. We can do some mild
reworking to get things manageable if they're currently really
unmanageable.
--
Dan

--------------------------------------"it's like this"-------------------
Dan Sugalski even samurai
d...@sidhe.org have teddy bears and even
teddy bears get drunk

Leopold Toetsch

unread,

Oct 22, 2002, 2:23:49 AM10/22/02

to Juergen Boemmels, perl6-i...@perl.org

Juergen Boemmels wrote:

> Leopold Toetsch <l...@toetsch.at> writes:

> What happens if you try to use it on an object which has no real
> components like a bitvector or a packed structure?

The substituted code for an aggregate is:
set Py, P1[k1]
and for a non keyed operand:
set Py, {N,S,I}1

> There will always
> be a pack-unpack cycle. On the other hand, is there any case where
> this pack-unpack cycle can be avoided?

Yes, if the 3rd operand is an integer, we have some opcodes like

add_p_p_i

> The newly created PMC is of type PerlUndef. Is this the correct
> behavior or shouldn't it be the same type as the old element in the
> Component? Something like this.

As in my example, the LHS might and does not exist yet, so you can't get
the type of it. And The PerlUndef convert's itself to the result of the
operation automagically, so it's IMHO the correct way to do it.

>>With an approach like this, we could cut down the VTABLE to roughly
>>1/3 of it's current size. The _keyed entrys would only consist of the
>>set_.._keyed{,_int} variants plus exists_keyed and defined_keyed. And,
>>we would never have the opcode explosion to 64 times of current size.

> Instead of 1 opcode and 1 vtable-lookup with a quite complex
> vtable-function you have 6 opcodes and 5 vtable lookups with simple
> vtable-functions.

Take the "new px, .PerlUndef" out of the calculation, it can be done
only once. The vtable lookup's are there, either hidden in a complex
opcode, or explictely like in my solution.

So the difference are +4 opcodes worst case. But opcode dispatch is
really fast. A better cache locality will bring us back this speed
difference.

> ... Also no vtable function has to

> decide wether its called with 1, 2 or 3 keyed elements.

Yes, another advantage, I didn't think of. Currently all _keyed vtable
calls have to check, it the key is really there. This could be tossed.

> bye
> b.

leo

Leopold Toetsch

unread,

Oct 22, 2002, 2:55:48 AM10/22/02

to Dan Sugalski, P6I

Dan Sugalski wrote:

> At 5:46 PM +0200 10/21/02, Leopold Toetsch wrote:

>> With an approach like this, we could cut down the VTABLE to roughly
>> 1/3 of it's current size. The _keyed entrys would only consist of the
>> set_.._keyed{,_int} variants plus exists_keyed and defined_keyed. And,
>> we would never have the opcode explosion to 64 times of current size.

> The big disadvantage here is speed. It means that specialized aggregates
> will have to create temporary PMCs for things that don't already have
> them, which is potentially slow and wasteful, something I'd rather
> avoid.

No, take the "new px, .PerlUndef" out of the opcode, it must only be
done once. This temporary Px can be reused by _all_ _keyed ops.

If the LHS doesn't exist, like in my example, a new PMC has to be
created anyway. If the LHS exists, the set_value methods assigns a new
value.

The only difference of my solution is +4 opcode dispatches worst case
minus some checks, we now have, that a key exists. These checks in each
_keyed method could be tossed then.

And for my demonstration I did use current existing set_ ops. I could
imagine, thats - as in my proposal - special key_ ops could be used,
which have the variable/value split built in, i.e. they could prepare
pointers to the PMC values directly.

> Encouraging the use of specialized aggregates is one of the
> reasons for the typing system coming in with perl 6, and given the size
> of aggregates in perl 5 I think it's something that will see some heavy
> use.

The HL language can and will use multi_keyed operations. Sean already
stated, that probably every (local) variable usage will be a multi_keyed
operation, where the aggregates are the lexical scope pads.

I just hide these operations in the assembler (or imcc currently).

> I don't mind the opcode explosion, honestly. It's automatically
> generated, and that's not a big deal. There are other ways to cut down
> on it as well, if we find the need.

I do mind. We currently have ~900 opcodes. We would have ~60000 opcodes.
I can't imagine, that the CPU cache will be happy with that many
opcodes. You could multiply this by 4 (normal, CGoto, CPderef, JIT) for
an estimation of a final full fledged parrot executable size.

> For the moment, I'd rather things stay the way they are. If we can
> produce demonstrable speed wins, then I'll change my mind.

It's hard to demonstrate speed wins against a system we don't have. But
I'll have a closer look, how to simulate this.

> ... For now,

> though, things stay generally the way they are. We can do some mild
> reworking to get things manageable if they're currently really
> unmanageable.

As I already demonstrated wtih my op_stat, currently 1/3 opcodes is
unused, neither parrot tests, nor any perl6 program produces this ops.

With the final ~60000 opcodes, we would probably have 1000 used ops,
scattered over a huge program - this is unmanagable IMHO.

leo

Leopold Toetsch

unread,

Oct 22, 2002, 7:14:48 AM10/22/02

to Dan Sugalski, P6I

Dan Sugalski wrote:

> For the moment, I'd rather things stay the way they are. If we can
> produce demonstrable speed wins, then I'll change my mind.

I have some preliminary numbers:
- changed core.ops to include add_keyed [1]

- did implement add_keyed_int in array

- hacked set_pmc_keyed_int() to resemble the set_value approach [2]
- changed test program to do 10^5 add_keyed (+100 checks for ok)

$ time parrot a.pbc # 3 keyed emulation
ok

real 0m0.563s

$ time parrot k.pbc # 3 keyed opcode
ok

real 0m0.503s

So the direct add_keyed_int is ~10% faster then my proposal.

[1] This adds 9 opcodes. We still don't have the _kc variants.

[2] this breaks perl6's t/compiler/1_4, where proably a clone would help
(it's a postincrement @z[0]++).

So, when looking at the size of just one method (~60 lines for just
add_keyed_int) I can't imagine that implementing all these _keyed
methods is the way to go.

leo

multi_keyed2.patch

Juergen Boemmels

unread,

Oct 22, 2002, 8:20:32 AM10/22/02

to Leopold Toetsch, perl6-i...@perl.org

Leopold Toetsch <l...@toetsch.at> writes:

> Dan Sugalski wrote:
>
> > At 5:46 PM +0200 10/21/02, Leopold Toetsch wrote:
>
>
> >> With an approach like this, we could cut down the VTABLE to roughly
> >> 1/3 of it's current size. The _keyed entrys would only consist of
> >> the set_.._keyed{,_int} variants plus exists_keyed and
> >> defined_keyed. And, we would never have the opcode explosion to 64
> >> times of current size.
>
>
> > The big disadvantage here is speed. It means that specialized
> > aggregates will have to create temporary PMCs for things that don't
> > already have them, which is potentially slow and wasteful, something
> > I'd rather avoid.
>
> No, take the "new px, .PerlUndef" out of the opcode, it must only be
> done once. This temporary Px can be reused by _all_ _keyed ops.

Im not sure about this. This really depends on the fact if the
op does something depending on the type of LHS. If it doesn't need any
information about this type then even the "new px, .PerlUndef" is not
really needed (This is just an artefact of the fact that the
vtable-functions are called with value of the register instead of the
register). But if the context-information is needed then the right
context must be provided. If this context evalutation is done in the
vtable function then its up to the PMC-implementation to decide if
its needed. OTOH the compiler has no chance to decide if this context
is necessary or not.

> If the LHS doesn't exist, like in my example, a new PMC has to be
> created anyway. If the LHS exists, the set_value methods assigns a new
> value.

If we would change the signature of the vtable function to
add (PMC *, PMC *, PMC **lhs) and the call in the op to
$2->vtable->add($2,$3,&$1), then the new is not need. The signature of
the add op could also be changed to add (out PMC, in PMC, in PMC)

> The only difference of my solution is +4 opcode dispatches worst case
> minus some checks, we now have, that a key exists. These checks in
> each _keyed method could be tossed then.

[...]

Leopold Toetsch

unread,

Oct 22, 2002, 10:40:56 AM10/22/02

to Juergen Boemmels, perl6-i...@perl.org

Juergen Boemmels wrote:

> Leopold Toetsch <l...@toetsch.at> writes:

>>No, take the "new px, .PerlUndef" out of the opcode, it must only be
>>done once. This temporary Px can be reused by _all_ _keyed ops.

> Im not sure about this. This really depends on the fact if the
> op does something depending on the type of LHS.

My timing test was done with the "new px, .PerlUndef" out of the loop,
and worked after some tweaking of the array_set_pmc_keyed_int.

> ... If it doesn't need any

> information about this type then even the "new px, .PerlUndef" is not
> really needed (This is just an artefact of the fact that the
> vtable-functions are called with value of the register instead of the
> register).

We will see, what Dan's plans WRT variable/value vtable's will look like.

> ... But if the context-information is needed then the right

> context must be provided. If this context evalutation is done in the
> vtable function then its up to the PMC-implementation to decide if
> its needed.

The LHS of an add_keyed may exist or not. Extending a PerlArray creates
a PerlUndef anyway. So the LHS is not concerned. The questions is
already, how to set the value of the LHS.

> If we would change the signature of the vtable function to
> add (PMC *, PMC *, PMC **lhs) and the call in the op to
> $2->vtable->add($2,$3,&$1), then the new is not need.

Yes, that's similar to my current hack.

> ... The signature of

> the add op could also be changed to add (out PMC, in PMC, in PMC)

If the add creates a new PMC holding the value (or the value is changed
directly), yes. I think this is similar to set vs assign.

> b.

leo

Dan Sugalski

unread,

Oct 23, 2002, 4:04:22 PM10/23/02

to Leopold Toetsch, Juergen Boemmels, perl6-i...@perl.org

At 8:23 AM +0200 10/22/02, Leopold Toetsch wrote:

>Juergen Boemmels wrote:
>
>>... Also no vtable function has to
>>decide wether its called with 1, 2 or 3 keyed elements.
>
>
>Yes, another advantage, I didn't think of. Currently all _keyed
>vtable calls have to check, it the key is really there. This could
>be tossed.

That is an issue, yep, as everything potentially needs to go NULL
checking. That's the one part of this that I'm not happy with, but
I'm willing to live with it.

[RFC] 2. Proposal for _keyed opcodes

Leopold Toetsch

Leopold Toetsch

Juergen Boemmels

Dan Sugalski

Leopold Toetsch

Leopold Toetsch

Leopold Toetsch

Juergen Boemmels

Leopold Toetsch

Dan Sugalski