Prederefed run cores

Leopold Toetsch

Oct 28, 2004, 5:13:51 AM

to Perl 6 Internals

With the indirect register addressing all prederefed run cores
(Prederefed, CGP, Switch) are currently not functional, as these run
cores have absolute addresses in the prederefed code.

I see two ways to fix it:

1) use frame pointer relative addressing:
+ prederefed code is usable by different threads too
- ~4 times increase in code size of core_ops_*.{c,o} [1]

2) Re-prederef on function calls, if frame pointer differs
+ no impact on code size
- needs precise code length of functions
- threads need distinct prederefed code
- possibly slower than 1)

Comments welcome,
leo

[1] due to absolute addressing a constant argument and a register
argument have the same code, set_i_ic and set_i_i are the same.
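The trade-off in footnote [1] can be illustrated with a small C sketch. It is not Parrot's actual code: the `frame` layout, the `operand` union, and all function names are hypothetical, chosen only to show why one op body is enough under absolute addressing while frame-relative addressing needs separate register and constant variants.

```c
#include <assert.h>
#include <stddef.h>

typedef long INTVAL;

/* Hypothetical register frame: one per sub invocation. */
typedef struct frame { INTVAL int_reg[32]; } frame;

/* Absolute prederef: every operand is already an INTVAL pointer, so a
 * single op body serves both "set I, I" and "set I, Ic". */
static void set_abs(void **args)
{
    *(INTVAL *)args[0] = *(const INTVAL *)args[1];
}

/* Frame-relative prederef: a register operand is a byte offset into the
 * current frame, while a constant operand is still an absolute pointer.
 * The two variants therefore need separate op bodies. */
typedef union operand {
    size_t        reg_off;
    const INTVAL *const_ptr;
} operand;

static void set_i_i_rel(const operand *args, char *base)
{
    *(INTVAL *)(base + args[0].reg_off) = *(INTVAL *)(base + args[1].reg_off);
}

static void set_i_ic_rel(const operand *args, char *base)
{
    *(INTVAL *)(base + args[0].reg_off) = *args[1].const_ptr;
}

/* Byte offset of integer register n inside a frame. */
static size_t int_reg_offset(int n)
{
    assert(n >= 0 && n < 32);
    return offsetof(frame, int_reg) + (size_t)n * sizeof(INTVAL);
}
```

Because the relative form stores only offsets, the same prederefed operands can be executed against any frame base, which is what makes option 1) usable from different threads.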

Dan Sugalski

Oct 28, 2004, 8:54:56 AM

to Leopold Toetsch, Perl 6 Internals

At 11:13 AM +0200 10/28/04, Leopold Toetsch wrote:
>With the indirect register addressing all prederefed run cores
>(Prederefed, CGP, Switch) are currently not functional, as these run
>cores have absolute addresses in the prederefed code.
>
>I see two ways to fix it:
>
>1) use frame pointer relative addressing:
> + prederefed code is usable by different threads too
> - ~4 times increase in code size of core_ops_*.{c,o} [1]
>
>2) Re-prederef on function calls, if frame pointer differs
> + no impact on code size
> - needs precise code length of functions
> - threads need distinct prederefed code
> - possibly slower than 1)

Or 3) Toss the prederef stuff entirely.
--
Dan

--------------------------------------it's like this-------------------
Dan Sugalski even samurai
d...@sidhe.org have teddy bears and even
teddy bears get drunk

Duraid Madina

Oct 28, 2004, 9:27:32 AM

to Dan Sugalski, Leopold Toetsch, Perl 6 Internals

Dan Sugalski wrote:
> Or 3) Toss the prederef stuff entirely.

Which might not be quite as bad as it sounds: on at least one "strange
platform" (IA64 HP-UX) the native C compiler gets the switch core
running faster than the prederef core! (!)

Duraid

Leopold Toetsch

Oct 28, 2004, 11:36:43 AM

to Dan Sugalski, Perl 6 Internals

Dan Sugalski wrote:
> At 11:13 AM +0200 10/28/04, Leopold Toetsch wrote:
>>
>> 1) use frame pointer relative addressing:
>> + prederefed code is usable by different threads too
>> - ~4 times increase in code size of core_ops_*.{c,o} [1]
>>
>> 2) Re-prederef on function calls, if frame pointer differs
>> + no impact on code size
>> - needs precise code length of functions
>> - threads need distinct prederefed code
>> - possibly slower than 1)
>
>
> Or 3) Toss the prederef stuff entirely.

Well, the prederefed function core (parrot -P) is certainly not
necessary. That leaves CGP and the switched core, which is
prederefed too. CGP is by far the fastest run core for JIT-less
architectures, if CGoto is available. The switched core can of course
run w/o prederef too.

But one thing is nice with prederef: it's by far the simplest way to
create a safe run core that verifies opcode arguments. This could of
course be done w/o predereferencing afterwards, but while you are
checking function args, predereferencing these is of almost zero cost.
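A minimal sketch of that idea in C, assuming a hypothetical operand encoding (register numbers for `I` arguments, indices into a constant table for `Ic` arguments); the names `prederef_arg`, `cell`, and `arg_kind` are invented for illustration and are not Parrot's API. The range check and the prederef happen in the same step, so the safety check adds almost nothing on top of the conversion.

```c
#include <assert.h>
#include <stddef.h>

typedef long opcode_t;
typedef long INTVAL;

enum { NUM_INT_REGS = 32 };

typedef enum arg_kind { ARG_I, ARG_IC } arg_kind;

/* One prederefed operand: a frame offset or a pointer to a constant. */
typedef union cell {
    size_t        reg_off;
    const INTVAL *const_ptr;
} cell;

/* Validate one raw operand and store its prederefed form in *out.
 * Returns 0 on success, -1 if the operand is out of range. */
static int prederef_arg(arg_kind kind, opcode_t raw,
                        const INTVAL *consts, size_t n_consts, cell *out)
{
    assert(out != NULL);
    switch (kind) {
    case ARG_I:
        if (raw < 0 || raw >= NUM_INT_REGS)
            return -1;                      /* bad register number */
        out->reg_off = (size_t)raw * sizeof(INTVAL);
        return 0;
    case ARG_IC:
        if (raw < 0 || (size_t)raw >= n_consts)
            return -1;                      /* bad constant index */
        out->const_ptr = &consts[raw];
        return 0;
    }
    return -1;                              /* unknown argument kind */
}
```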

Using option 1) above isn't really complicated. The problem we have is
code size and opcode count, which is a problem with the CGoto core too.

I proposed not too long ago to toss all opcode variants with
constants and just leave:

set I, Ic
set N, Nc
set S, Sc

Immediate constants aren't really that useful with RISC CPUs. You might
have a look at e.g. jit/arm/jit_emit.h:459 ff.

leo

Leopold Toetsch

Oct 28, 2004, 11:01:11 AM

to Duraid Madina, Perl 6 Internals

Err, the switched core *is* a prederefed core.

> Duraid

leo

Leopold Toetsch

Nov 1, 2004, 5:12:50 AM

to Perl 6 Internals

Leopold Toetsch wrote:
> Dan Sugalski wrote:
>
>> At 11:13 AM +0200 10/28/04, Leopold Toetsch wrote:
>>
>>>
>>> 1) use frame pointer relative addressing:
>>> + prederefed code is usable by different threads too
>>> - ~4 times increase in code size of core_ops_*.{c,o} [1]
>>>

I've now committed this case 1) as a fix for prederefed run cores. It's
unoptimized currently. make fulltest is passing again here.

>> Or 3) Toss the prederef stuff entirely.
>
>
> Well, the prederefed function core (parrot -P) is for sure not
> necessary.

Patches welcome to remove the plain prederefed function core
F<ops/core_ops_prederef.*>. F<lib/Parrot/OpTrans/CPrederef.pm> is still
needed as an abstract base class of CGP.pm and CSwitch.pm but can be
cleaned up too.

I'd still like to keep the CGP and CSwitch run cores. The latter as the safe
run core with argument checking and as a fallback, if CGOTO isn't
available on that platform. The former as an extension for JIT to run
non-JITted opcodes. Similar to the current JIT_CGP stuff on i386, but in
a more general way:

For a sequence of non-JITted opcodes: create a copy of the byte-code of
these non-JITted opcodes and append one opcode that returns to JIT. Then
fill it with the CORE_ops_prederef__ opcode. Generate code to call this
piece of code via cgp_core().

leo

Dan Sugalski

Nov 1, 2004, 8:57:46 AM

to Leopold Toetsch, Perl 6 Internals

At 11:12 AM +0100 11/1/04, Leopold Toetsch wrote:

>Leopold Toetsch wrote:
>
>>>Or 3) Toss the prederef stuff entirely.
>>
>>
>>Well, the prederefed function core (parrot -P) is for sure not necessary.
>
>Patches welcome to remove the plain prederefed function core
>F<ops/core_ops_prederef.*>. F<lib/Parrot/OpTrans/CPrederef.pm> is
>still needed as an abstract base class of CGP.pm and CSwitch.pm but
>can be cleaned up too.
>
>I still like to keep CGP and CSwitch run cores. The latter as the
>safe run core with argument checking and as a fallback, if CGOTO
>isn't available on that platform. The former as an extension for JIT
>to run non-JITted opcodes. Similar to the current JIT_CGP stuff on
>i386, but in a more general way:

While I want to keep the switch core, I'm still not seeing the need
for prederef with it. I'm presuming this crept in at some point and
just needs un-creeping?

Leopold Toetsch

Nov 1, 2004, 9:41:15 AM

to Dan Sugalski, perl6-i...@perl.org

Dan Sugalski <d...@sidhe.org> wrote:

> While I want to keep the switch core, I'm still not seeing the need
> for prederef with it. I'm presuming this crept in at some point and
> just needs un-creeping?

Using prederef for switch has one advantage: it's a bit faster. Before
the indirect register addressing it had another one: it took only a
quarter of the code size, because the constant and register variants
collapsed into one switch case.

There is of course no need to prederef the switched core.

Maybe benchmarking the two variants yields a final answer.

leo

Leopold Toetsch

Nov 1, 2004, 10:13:00 AM

to Dan Sugalski, perl6-i...@perl.org

Dan Sugalski <d...@sidhe.org> wrote:

> Or 3) Toss the prederef stuff entirely.

And here is why I want to keep the CGP core:

sub_i_i_i

0x81bbef0 <cgp_core+33488>: mov 0x4(%esi),%ecx
0x81bbef3 <cgp_core+33491>: mov 0x8(%esi),%edx
0x81bbef6 <cgp_core+33494>: mov 0xc(%esi),%eax
0x81bbef9 <cgp_core+33497>: add $0x10,%esi
0x81bbefc <cgp_core+33500>: mov (%eax,%edi,1),%eax
0x81bbeff <cgp_core+33503>: mov (%edx,%edi,1),%edx
0x81bbf02 <cgp_core+33506>: sub %eax,%edx
0x81bbf04 <cgp_core+33508>: mov %edx,(%ecx,%edi,1)
0x81bbf07 <cgp_core+33511>: jmp *(%esi)

if_i_ic

0x81b4152 <cgp_core+1330>: mov 0x4(%esi),%eax
0x81b4155 <cgp_core+1333>: cmpl $0x0,(%eax,%edi,1)
0x81b4159 <cgp_core+1337>: je 0x81b4167 <cgp_core+1351>
0x81b415b <cgp_core+1339>: mov 0x8(%esi),%eax
0x81b415e <cgp_core+1342>: mov (%eax),%eax
0x81b4160 <cgp_core+1344>: shl $0x2,%eax
0x81b4163 <cgp_core+1347>: add %eax,%esi
0x81b4165 <cgp_core+1349>: jmp *(%esi)
0x81b4167 <cgp_core+1351>: add $0xc,%esi
0x81b416a <cgp_core+1354>: jmp *(%esi)

%esi ... cur_opcode
%edi ... register frame pointer

A register access takes only 2 CPU instructions:

mov 8(%esi), %edx # cur_opcode[2], i.e. offset of REG_INT(x)
mov (%edx, %edi, 1), %edx # get *(base + offset)

That's all.
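For readers who prefer C to disassembly, here is a rough sketch of what the two op bodies above compute. It is an illustration under assumptions, not the generated `cgp_core` source: the `cell` union, the `REG_INT` macro as written here, and the `op_*` names are hypothetical, and dispatch uses plain function pointers instead of CGP's computed goto so that the sketch stays standard C.

```c
#include <assert.h>
#include <stddef.h>

typedef long INTVAL;

typedef union cell cell;
typedef cell *(*op_fn)(cell *pc, char *base);

/* One slot of prederefed code: an op address, a register offset
 * relative to the frame base, or a pointer to a constant. */
union cell {
    op_fn         fn;
    size_t        reg_off;
    const INTVAL *const_ptr;
};

/* Register access: load the offset from the code, add the frame base. */
#define REG_INT(base, off) (*(INTVAL *)((base) + (off)))

static cell *op_sub_i_i_i(cell *pc, char *base)
{
    REG_INT(base, pc[1].reg_off) =
        REG_INT(base, pc[2].reg_off) - REG_INT(base, pc[3].reg_off);
    return pc + 4;                       /* op plus three operands */
}

static cell *op_if_i_ic(cell *pc, char *base)
{
    if (REG_INT(base, pc[1].reg_off) != 0)
        return pc + *pc[2].const_ptr;    /* relative branch, in cells */
    return pc + 3;                       /* fall through */
}

static cell *op_end(cell *pc, char *base)
{
    (void)pc; (void)base;
    return NULL;                         /* stop the run loop */
}

static void run(cell *pc, char *base)
{
    assert(pc != NULL);
    while (pc)
        pc = pc->fn(pc, base);
}
```

In the real CGP core each op ends with an indirect jump to the next op (`jmp *(%esi)`) rather than returning to a loop, which is where most of its speed comes from.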

$ ./parrot -C mops.pasm
Iterations: 100000000
Estimated ops: 200000000
Elapsed time: 2.156002
M op/s: 92.764291

That's on an Athlon 800, at 8.5 CPU instructions per Parrot instruction.
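The 8.5 figure can be reproduced from the listings, assuming (this is not stated in the message) that the mops.pasm inner loop executes one `sub_i_i_i` and one taken `if_i_ic` per iteration, which matches the reported two ops per iteration. The `sub_i_i_i` listing has 9 instructions and the taken path of `if_i_ic` has 8. The helper names below are made up for the calculation.

```c
#include <assert.h>

/* Average CPU instructions per Parrot op for a two-op loop body. */
static double avg_insns_per_op(int sub_insns, int if_insns)
{
    return (sub_insns + if_insns) / 2.0;
}

/* Millions of Parrot ops per second from an op count and elapsed time. */
static double mops(double ops, double seconds)
{
    assert(seconds > 0.0);
    return ops / seconds / 1e6;
}
```

With the numbers above, `avg_insns_per_op(9, 8)` gives 8.5, and `mops(200000000.0, 2.156002)` comes out at roughly 92.76, in line with the benchmark output.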

leo

Leopold Toetsch

Nov 1, 2004, 11:26:51 AM

to perl6-i...@perl.org

FWIW, here is the CGP sub_i_i_i opcode on the PowerBook:

0x001048d4 <cgp_core+35652>: lwz r0,8(r30)
0x001048d8 <cgp_core+35656>: lwz r2,12(r30)
0x001048dc <cgp_core+35660>: lwzx r0,r27,r0
0x001048e0 <cgp_core+35664>: lwzx r2,r27,r2
0x001048e4 <cgp_core+35668>: lwz r9,4(r30)
0x001048e8 <cgp_core+35672>: subf r0,r2,r0
0x001048ec <cgp_core+35676>: stwx r0,r27,r9
0x001048f0 <cgp_core+35680>: lwzu r2,16(r30)
0x001048f4 <cgp_core+35684>: mtctr r2
0x001048f8 <cgp_core+35688>: bctr

Only slightly longer, because of the branch sequence, but still quite
compact.

leo