With the indirect register addressing all prederefed run cores (Prederefed, CGP, Switch) are currently not functional, as these run cores have absolute addresses in the prederefed code.
I see two ways to fix it:
1) use frame pointer relative addressing:
   + prederefed code is usable by different threads too
   - ~4 times increase in code size of core_ops_*.{c,o} [1]
2) Re-prederef on function calls, if frame pointer differs
   + no impact on code size
   - needs precise code length of functions
   - threads need distinct prederefed code
   - possibly slower than 1)
Comments welcome, leo
[1] With absolute addressing, a constant argument and a register argument produce the same code: set_i_ic and set_i_i are identical. Frame pointer relative addressing loses that sharing.
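The difference between the two addressing schemes can be sketched in C as follows. This is only an illustration under simplifying assumptions: the `Frame` struct, the `reg_offset` helper and the three op bodies are hypothetical names for this sketch, not Parrot's actual data structures or generated code. With absolute operands one body serves both the register and the constant variant; with frame-relative operands the two variants need separate bodies, but the same prederefed code works against any frame.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified register frame; the real layout differs. */
typedef struct { long int_reg[32]; } Frame;

/* Absolute addressing: every prederefed operand is a pointer, either to a
 * register or to a constant, so set_i_i and set_i_ic share this one body.
 * The code is tied to the frame whose addresses were baked in. */
static void set_abs(void **pc)
{
    *(long *)pc[1] = *(const long *)pc[2];
}

/* Frame-relative addressing: a register operand is a byte offset from the
 * frame base. A constant operand is still a pointer, so the register and
 * constant variants can no longer share a body. */
static void set_i_i_rel(void **pc, Frame *fp)
{
    *(long *)((char *)fp + (uintptr_t)pc[1]) =
        *(long *)((char *)fp + (uintptr_t)pc[2]);
}

static void set_i_ic_rel(void **pc, Frame *fp)
{
    *(long *)((char *)fp + (uintptr_t)pc[1]) = *(const long *)pc[2];
}

/* Hypothetical helper: encode integer register n as a frame-relative operand. */
static void *reg_offset(int n)
{
    return (void *)(uintptr_t)(offsetof(Frame, int_reg) + n * sizeof(long));
}
```

In this sketch a single operand array built with `reg_offset` can be run against two different `Frame` instances, which is the property that makes the relative form usable after a frame switch or from another thread.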
At 11:13 AM +0200 10/28/04, Leopold Toetsch wrote:
>With the indirect register addressing all prederefed run cores
>(Prederefed, CGP, Switch) are currently not functional, as these run
>cores have absolute addresses in the prederefed code.
>I see two ways to fix it:
>1) use frame pointer relative addressing:
>   + prederefed code is usable by different threads too
>   - ~4 times increase in code size of core_ops_*.{c,o} [1]
>2) Re-prederef on function calls, if frame pointer differs
>   + no impact on code size
>   - needs precise code length of functions
>   - threads need distinct prederefed code
>   - possibly slower than 1)
Or 3) Toss the prederef stuff entirely.
-- Dan
--------------------------------------it's like this-------------------
Dan Sugalski                          even samurai
d...@sidhe.org                        have teddy bears and even
                                      teddy bears get drunk
Dan Sugalski wrote:
> Or 3) Toss the prederef stuff entirely.
Which might not be quite as bad as it sounds: on at least one "strange platform" (IA64 HP-UX) the native C compiler gets the switch core running faster than the prederef core! (!)
Dan Sugalski wrote:
> At 11:13 AM +0200 10/28/04, Leopold Toetsch wrote:
>> 1) use frame pointer relative addressing:
>>    + prederefed code is usable by different threads too
>>    - ~4 times increase in code size of core_ops_*.{c,o} [1]
>> 2) Re-prederef on function calls, if frame pointer differs
>>    + no impact on code size
>>    - needs precise code length of functions
>>    - threads need distinct prederefed code
>>    - possibly slower than 1)
> Or 3) Toss the prederef stuff entirely.
Well, the prederefed function core (parrot -P) is certainly not necessary. That leaves CGP and the switched core, which are prederefed too. CGP is by far the fastest run core for JIT-less architectures, if CGoto is available. The switched core can of course run w/o prederef too.
But one thing is nice with prederef: it's by far the simplest way to create a safe run core that verifies opcode arguments. This could of course be done w/o predereferencing afterwards, but while you are checking function args, predereferencing them costs almost nothing.
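To illustrate why the check and the prederef step combine so cheaply, here is a minimal sketch in C. Everything in it is an assumption made for illustration: the `ArgKind` enum, the `NUM_REGS` limit and the `prederef_arg` function are hypothetical and do not reflect Parrot's real API. The point is only that the range check and the operand translation look at the same raw value, so doing both in one pass adds little beyond the check itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_REGS 32   /* assumed register count for this sketch */

typedef enum { ARG_REG, ARG_CONST } ArgKind;

/* Validate one raw bytecode operand and, in the same step, store its
 * prederefed form in *out: a frame-relative byte offset for a register,
 * a pointer into the constant table for a constant.
 * Returns 0 on success, -1 if the operand is out of range. */
static int prederef_arg(ArgKind kind, long raw,
                        const long *consts, size_t n_consts, void **out)
{
    if (kind == ARG_REG) {
        if (raw < 0 || raw >= NUM_REGS)
            return -1;                      /* bad register number */
        *out = (void *)(uintptr_t)((size_t)raw * sizeof(long));
    } else {
        if (raw < 0 || (size_t)raw >= n_consts)
            return -1;                      /* bad constant index */
        *out = (void *)&consts[raw];
    }
    return 0;
}
```

A safe run core built this way would refuse to run any op whose operands fail the check, and ops that pass are already in prederefed form.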
Using option 1) above isn't really complicated. The problem we have is code size and opcode count, which is a problem with the CGoto core too.
Not too long ago I proposed tossing all opcode variants with constants and leaving just:
  set I, Ic
  set N, Nc
  set S, Sc
Immediate constants aren't really that useful with RISC cpus. You might have a look at e.g. jit/arm/jit_emit.h:459 ff.
> Which might not be quite as bad as it sounds: on at least one "strange
> platform" (IA64 HP-UX) the native C compiler gets the switch core
> running faster than the prederef core! (!)
>> At 11:13 AM +0200 10/28/04, Leopold Toetsch wrote:
>>> 1) use frame pointer relative addressing:
>>>    + prederefed code is usable by different threads too
>>>    - ~4 times increase in code size of core_ops_*.{c,o} [1]
I've now committed this case 1) as a fix for prederefed run cores. It's unoptimized currently. make fulltest is passing again here.
>> Or 3) Toss the prederef stuff entirely.
> Well, the prederefed function core (parrot -P) is for sure not
> necessary.
Patches welcome to remove the plain prederefed function core F<ops/core_ops_prederef.*>. F<lib/Parrot/OpTrans/CPrederef.pm> is still needed as an abstract base class of CGP.pm and CSwitch.pm but can be cleaned up too.
I'd still like to keep the CGP and CSwitch run cores. The latter as the safe run core with argument checking, and as a fallback if CGOTO isn't available on that platform. The former as an extension for JIT to run non-JITted opcodes, similar to the current JIT_CGP stuff on i386 but in a more general way:
For a sequence of non-JITted opcodes: create a copy of the byte-code of these non-JITted opcodes and append one opcode that returns to JIT. Then fill it with the CORE_ops_prederef__ opcode. Generate code to call this piece of code via cgp_core().
>>Well, the prederefed function core (parrot -P) is for sure not necessary.
>Patches welcome to remove the plain prederefed function core
>F<ops/core_ops_prederef.*>. F<lib/Parrot/OpTrans/CPrederef.pm> is
>still needed as an abstract base class of CGP.pm and CSwitch.pm but
>can be cleaned up too.
>I still like to keep CGP and CSwitch run cores. The latter as the
>safe run core with argument checking and as a fallback, if CGOTO
>isn't available on that platform. The former as an extension for JIT
>to run non-JITted opcodes. Similar to the current JIT_CGP stuff on
>i386, but in a more general way:
While I want to keep the switch core, I'm still not seeing the need for prederef with it. I'm presuming this crept in at some point and just needs un-creeping?
-- Dan
Dan Sugalski <d...@sidhe.org> wrote:
> While I want to keep the switch core, I'm still not seeing the need
> for prederef with it. I'm presuming this crept in at some point and
> just needs un-creeping?
Using prederef for switch has one advantage: it's a bit faster. Before the indirect register addressing it had another one: it needed only a quarter of the code size, because the constant and register variants collapsed into one switch case.
There is of course no need to prederef the switched core.
Maybe benchmarking the two variants yields a final answer.
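For anyone who wants to try such a comparison, the two variants could be reduced to something like the toy sketch below. It is written in C with an invented three-op instruction set (`OP_END`, `OP_SET_I_IC`, `OP_ADD_I_I_I`) and invented function names; none of this is Parrot's generated code. The plain core indexes the register array by register number on every dispatch, while the prederefed core works on a copy of the bytecode in which register numbers were replaced once by frame-relative byte offsets.

```c
#include <assert.h>
#include <stdint.h>

/* Toy opcode set, assumed for this sketch only. */
enum { OP_END, OP_SET_I_IC, OP_ADD_I_I_I };

/* Plain switch core: operands are register numbers, resolved per dispatch. */
static void run_switch(const long *pc, long *regs)
{
    for (;;) {
        switch (pc[0]) {
        case OP_SET_I_IC:  regs[pc[1]] = pc[2]; pc += 3; break;
        case OP_ADD_I_I_I: regs[pc[1]] = regs[pc[2]] + regs[pc[3]]; pc += 4; break;
        default: return;   /* OP_END or unknown op */
        }
    }
}

/* One-time pass: copy the bytecode, turning register numbers into byte
 * offsets relative to the register base. Constants are copied unchanged. */
static void prederef(const long *src, intptr_t *dst)
{
    for (;;) {
        dst[0] = src[0];
        switch (src[0]) {
        case OP_SET_I_IC:
            dst[1] = src[1] * (intptr_t)sizeof(long);
            dst[2] = src[2];
            src += 3; dst += 3; break;
        case OP_ADD_I_I_I:
            for (int i = 1; i <= 3; i++)
                dst[i] = src[i] * (intptr_t)sizeof(long);
            src += 4; dst += 4; break;
        default: return;
        }
    }
}

/* Access the register stored at byte offset `off` from `base`. */
#define REG(base, off) (*(long *)((char *)(base) + (off)))

/* Prederefed switch core: operands are already frame-relative offsets. */
static void run_switch_prederef(const intptr_t *pc, long *regs)
{
    for (;;) {
        switch (pc[0]) {
        case OP_SET_I_IC:  REG(regs, pc[1]) = (long)pc[2]; pc += 3; break;
        case OP_ADD_I_I_I: REG(regs, pc[1]) = REG(regs, pc[2]) + REG(regs, pc[3]); pc += 4; break;
        default: return;
        }
    }
}
```

Both cores should leave the registers in the same state for the same program; any speed difference on a given compiler and platform would have to be measured rather than assumed.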