One more thing...

Dan Sugalski

Apr 28, 2004, 1:47:14 PM

to perl6-i...@perl.org

Not to sound like a Jackie Chan cartoon or anything, but...

If we go MMD all the way, we can skip the bytecode->C->bytecode
transition for MMD functions that are written in parrot bytecode, and
instead dispatch to them like any other sub.

Not to make this sound good or anything, of course. :-P
--
Dan

--------------------------------------"it's like this"-------------------
Dan Sugalski even samurai
d...@sidhe.org have teddy bears and even
teddy bears get drunk

Jarkko Hietaniemi

Apr 28, 2004, 2:31:50 PM

to perl6-i...@perl.org, Dan Sugalski, perl6-i...@perl.org

Dan Sugalski wrote:

> Not to sound like a Jackie Chan cartoon or anything, but...

I was thinking Columbo, actually...

Leopold Toetsch

Apr 30, 2004, 5:35:52 AM

to Dan Sugalski, perl6-i...@perl.org

Dan Sugalski <d...@sidhe.org> wrote:
> If we go MMD all the way, we can skip the bytecode->C->bytecode
> transition for MMD functions that are written in parrot bytecode, and
> instead dispatch to them like any other sub.

Not really. Or not w/o significant overhead for MMD functions
implemented in C. Opcodes like C<invoke> that branch somewhere have a
special treatment in the JIT (and other) run cores. These instructions
are a branch source, the next instruction is a branch target. This means
that all CPU registers must be flushed to Parrot's register file and
reloaded on the next instruction.

Prederefed run cores have to recalculate their program counter relative to
the prederefed code.

So I'd rather not do that. I expect most of the functions being executed
are implemented in C and not in PASM/PIR. Operator overloading has to
have some cost :)

leo

Leopold Toetsch

Apr 30, 2004, 6:57:50 AM

to perl6-i...@perl.org

Leopold Toetsch <l...@toetsch.at> wrote:

> ... Operator overloading has to
> have some cost :)

$ ./bench -b=over #[1]
Numbers are relative to the first one. (lower is better)
          p-j-Oc  perl-th  perl  python  ruby
overload  100%    151%     131%  -       -

Not too bad, but not really flying--as expected. Parrot built -O3.

> leo

leo

[1] if someone's wondering:
$ cat bench
perl tools/dev/parrotbench.pl -c=parrotbench.conf -b=^oo "$@"

Dan Sugalski

Apr 30, 2004, 8:47:26 AM

to l...@toetsch.at, perl6-i...@perl.org

At 11:35 AM +0200 4/30/04, Leopold Toetsch wrote:
>Dan Sugalski <d...@sidhe.org> wrote:
>> If we go MMD all the way, we can skip the bytecode->C->bytecode
>> transition for MMD functions that are written in parrot bytecode, and
>> instead dispatch to them like any other sub.
>
>Not really. Or not w/o significant overhead for MMD functions
>implemented in C.

Well... about that. It's actually easily doable with a bit of
trickery. We can either:

1) Mark the overload subs as special and change their calling conventions
2) Wrap the overload subs in some bytecode that Does The Right
Thing--takes a continuation, pushes the registers to the stack, then
calls the overload sub--when we add them to the MMD table.

Leopold Toetsch

Apr 30, 2004, 11:27:16 AM

to Dan Sugalski, perl6-i...@perl.org

Dan Sugalski <d...@sidhe.org> wrote:

> 1) Mark the overload subs as special and change their calling conventions

Different calling conventions are not really pleasant for the
compiler(s). But doable.

> 2) Wrap the overload subs in some bytecode that Does The Right
> Thing--takes a continuation, pushes the registers to the stack, then
> calls the overload sub--when we add them to the MMD table.

That has the same cost + overhead as the current scheme, which is just
that wrapper in C.

3) Inspect the delegated method or MMD sub and save only the needed
register range. E.g. if an MMD sub doesn't use I and N registers, only
S[0]..P[31] need saving. That reduces memcpy cost by 3/5. It doesn't work
when the sub calls another sub, of course, but for simple functions it'll
work.

leo
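
Option 3 can be sketched in C. This is a minimal sketch under stated assumptions, not Parrot's actual code: `RegFile`, the `USES_*` flags, `save_range`, `save_needed` and `restore_needed` are hypothetical names, and the layout assumes a 32-bit build with 32 registers of each kind (4-byte integers, 8-byte floats, 4-byte string and PMC handles) stored in the order I, N, S, P.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NUM_REGS 32

/* Hypothetical register file modelled on a 32-bit build: 640 bytes total. */
typedef struct {
    int32_t  I[NUM_REGS];   /* integer registers: 128 bytes */
    double   N[NUM_REGS];   /* float registers:   256 bytes */
    uint32_t S[NUM_REGS];   /* string handles:    128 bytes */
    uint32_t P[NUM_REGS];   /* PMC handles:       128 bytes */
} RegFile;

/* Hypothetical usage flags, one bit per register kind. */
enum { USES_I = 1, USES_N = 2, USES_S = 4, USES_P = 8 };

typedef struct { size_t start, end; } SaveRange;   /* byte range [start, end) */

/* Smallest contiguous byte range covering every register kind in `uses`. */
static SaveRange save_range(unsigned uses)
{
    static const struct { unsigned flag; size_t off, len; } kinds[] = {
        { USES_I, offsetof(RegFile, I), sizeof(((RegFile *)0)->I) },
        { USES_N, offsetof(RegFile, N), sizeof(((RegFile *)0)->N) },
        { USES_S, offsetof(RegFile, S), sizeof(((RegFile *)0)->S) },
        { USES_P, offsetof(RegFile, P), sizeof(((RegFile *)0)->P) },
    };
    SaveRange r = { 0, 0 };
    int found = 0;
    for (size_t k = 0; k < sizeof kinds / sizeof kinds[0]; k++) {
        if (!(uses & kinds[k].flag))
            continue;
        if (!found) {
            r.start = kinds[k].off;   /* first used kind opens the range */
            found = 1;
        }
        r.end = kinds[k].off + kinds[k].len;   /* last used kind closes it */
    }
    return r;
}

/* Copy only the needed range into `backup`; returns the bytes copied. */
static size_t save_needed(const RegFile *regs, RegFile *backup, unsigned uses)
{
    SaveRange r = save_range(uses);
    memcpy((char *)backup + r.start, (const char *)regs + r.start,
           r.end - r.start);
    return r.end - r.start;
}

/* Copy the same range back after the sub has returned. */
static void restore_needed(RegFile *regs, const RegFile *backup, unsigned uses)
{
    SaveRange r = save_range(uses);
    memcpy((char *)regs + r.start, (const char *)backup + r.start,
           r.end - r.start);
}
```

With this layout a sub that touches only S and P registers needs 256 of the 640 bytes copied, which is the 3/5 reduction; a sub that touches only I registers needs 128. Because the range is contiguous, a sub using I and P registers still costs a full copy.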

Leopold Toetsch

May 2, 2004, 10:00:47 AM

to Dan Sugalski, perl6-i...@perl.org

Dan Sugalski <d...@sidhe.org> wrote:
> Well... about that. It's actually easily doable with a bit of
> trickery. We can either:

I have trickery number 4) here. Dunno if it's doable, but worth
considering IMHO:

Here is mmd.pasm (using bxor but substitute any math/bitwise/... op).
Comments inline.

_main:
    new P16, .PerlInt
    new P17, .PerlInt
    new P18, .PerlInt
    set P17, 0b101
    set P18, 0b100
    # might call a PASM sub or not, who knows
    bxor P16, P17, P18
    #
    print P16
    print "\n"

    # plain MMD
    # install PASM handler for overloaded bxor
    .include "pmctypes.pasm"
    .include "vtable_constants.pasm"
    find_global P19, "PerlInt_bxor"
    mmdvtregister .VTABLE_BXOR, .PerlInt, .PerlInt, P19
    bxor P16, P17, P18
    print P16
    print "\n"

    # So, now when the compiler sees[1] a mmdvtregister .. BXOR, it changes
    # the emitted code sequence of all bxor_p_p_p opcodes to:

    # recompile each bxor_p_p_p
    mmdvtfind P20, .VTABLE_BXOR, .PerlInt, .PerlInt
    isnull P20, bxor_normal_1
    set P5, P17
    set P6, P18
    set P7, P16
    set P0, P20
    pushtopp
    invokecc
    poptopp
    branch bxor_done_1
bxor_normal_1:
    bxor P16, P17, P18
bxor_done_1:
    # end bxor_p_p_p

    # the recompiled code could be compacted a bit by some helper opcodes,
    # but it shows what could happen

    print P16
    print "\n"

    end

.pcc_sub PerlInt_bxor:
    set I16, P5
    set I17, P6
    bxor I18, I16, I17
    set P7, I18
    invoke P1

[1] would need a flag in loaded byte code too or such

Comments?
leo

Leopold Toetsch

May 2, 2004, 10:13:42 AM

to perl6-i...@perl.org

Leopold Toetsch <l...@toetsch.at> wrote:

> # recompile each bxor_p_p_p
> mmdvtfind P20, .VTABLE_BXOR, .PerlInt, .PerlInt
                               ^^^^^^^^^^^^^^^^^^
This has to use the dynamic type of the PMC of course.

typeof I20, P17
typeof I21, P18
mmdvtfind P20, .VTABLE_BXOR, I20, I21
...

leo

Dan Sugalski

May 2, 2004, 2:53:02 PM

to l...@toetsch.at, perl6-i...@perl.org

At 4:00 PM +0200 5/2/04, Leopold Toetsch wrote:
>Dan Sugalski <d...@sidhe.org> wrote:
>> Well... about that. It's actually easily doable with a bit of
>> trickery. We can either:
>
>I have trickery number 4) here. Dunno if it's doable, but worth
>considering IMHO:

It's doable but the problem you run into is that if you can't be sure
that you're going to see a MMD-able PMC you need to do this
everywhere, just to be sure. Since generally we're not going to be
able to tell (joys of dynamic library loading) it'd mean we'd need to
emit that code all the time. And if the binary ops always expand, we
might as well make the compact versions just do the MMD stuff.

Leopold Toetsch

May 3, 2004, 3:39:13 PM

to perl6-i...@perl.org

Leopold Toetsch <l...@toetsch.at> wrote:

> 3) Inspect the delegated method or MMD sub and save only the needed
> register range.

Have this now running here locally and tested:

$ ./bench -b=^over

Numbers are relative to the first one. (lower is better)

          p-j-Oc  p-C-Oc  perl-th  perl  python  ruby
overload  100%    126%    300%     257%  -       -

This[1] doubles the overload benchmark performance. The "my_mul" function
uses (changes) only integer regs, so 128 bytes are saved now instead of
640. Object vtable method delegation will also be faster.

Cachegrind reports these numbers:

CVS:         I refs:    1,070,689,038
             D refs:      666,860,918
             D1 misses:     4,030,633

now:         I refs:      464,016,706
             D refs:      316,245,506
             D1 misses:     1,530,816

Cache misses are still too high.

perl 5.8.0:  I refs:    1,189,527,716
             D refs:      724,919,542
             D1 misses:        24,844

leo

[1] not alone. dod_register_pmc() of the return continuation in
Parrot_runops_fromc() isn't really necessary. The old continuation is on
the CPU stack. The passed continuation is in the registers. I was a bit
too pessimistic, when coding this.

Leopold Toetsch

May 5, 2004, 6:02:20 AM

to Dan Sugalski, perl6-i...@perl.org

Dan Sugalski <d...@sidhe.org> wrote:
> At 11:35 AM +0200 4/30/04, Leopold Toetsch wrote:
>>Dan Sugalski <d...@sidhe.org> wrote:
>>> If we go MMD all the way, we can skip the bytecode->C->bytecode
>>> transition for MMD functions that are written in parrot bytecode, and
>>> instead dispatch to them like any other sub.
>>
>>Not really. Or not w/o significant overhead for MMD functions
>>implemented in C.

> Well... about that. It's actually easily doable with a bit of
> trickery. We can either:

This still doesn't work. Function calls just look different than
"plain" opcodes like "add Px, Py, Pz".
- it's not known, if C<add> calls a PASM subroutine
- if it calls a PASM routine, registers have to be preserved. Which
registers depend on the subroutine that actually gets called (ok, this
information - which registers are changed by the sub - can be attached
to the Sub's metadata)
- every opcode that possibly branches has a significant overhead for JIT
and prederefed run cores: they must recalculate their PC from a byte
code PC to a run loop PC.

Changing C<add> or any MMDed opcode to look like a branch is a severe
performance impact for the non-overloaded case.

WRT performance: You can set

#define SAVE_ALL_REGS 0

in interpreter.c:912. This checks the sub's register usage and saves
only the needed registers, e.g. only 128 bytes instead of 640 for the overload
benchmark. This makes MMD calls via Parrot_runops faster than a plain
function call + the check whether the operation is actually overloaded.

WRT continuations:
- It's highly unlikely that one would like to (ab)?use this
functionality, i.e. take a continuation from an overloaded PASM and
branch elsewhere.
- If we really need this "feature" it is doable. On each (re)entering of
the run loop a Parrot_exception is created. We would need a run loop
nesting level in the continuation. When a continuation is invoked and
the nesting level differs, we could longjmp(3) until we reach the old
nesting level and then resume at the continuation offset.

leo
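
The nesting-level idea can be sketched in C with setjmp(3)/longjmp(3). This is a minimal sketch under stated assumptions, not Parrot's run loop: `Continuation`, `loop_env`, `run_loop` and `invoke_from_nested` are hypothetical names, each recursive call stands in for a re-entry of the run loop from C, and `hops` only counts the unwinding steps for illustration.

```c
#include <assert.h>
#include <setjmp.h>

#define MAX_LEVELS 8

/* Hypothetical continuation: where to resume, and in which run loop. */
typedef struct {
    int level;    /* run loop nesting level the continuation was taken in */
    int offset;   /* bytecode offset to resume at */
} Continuation;

static jmp_buf      loop_env[MAX_LEVELS];  /* one context per active run loop */
static Continuation pending;               /* continuation being unwound to */
static int          hops;                  /* longjmp landings, for illustration */

/*
 * Enter run loop `level`, nest down to `deepest`, then invoke `k` from the
 * innermost loop. Returns the offset at which execution resumes.
 */
static int run_loop(int level, int deepest, Continuation k)
{
    if (setjmp(loop_env[level]) != 0) {
        /* Landed here via longjmp from a deeper run loop. */
        hops++;
        if (pending.level != level)
            longjmp(loop_env[level - 1], 1);  /* not ours: unwind one more */
        return pending.offset;                /* ours: resume here */
    }

    if (level < deepest)
        return run_loop(level + 1, deepest, k);  /* nested re-entry from C */

    /* Innermost loop: the overloaded sub invokes the continuation. */
    if (k.level == level)
        return k.offset;                      /* same level: plain branch */
    pending = k;
    longjmp(loop_env[level - 1], 1);          /* differing level: unwind */
    return -1;                                /* not reached */
}

/* Entry point; `k.level` must name a run loop that is still active. */
static int invoke_from_nested(int deepest, Continuation k)
{
    assert(deepest >= 0 && deepest < MAX_LEVELS);
    assert(k.level >= 0 && k.level <= deepest);
    hops = 0;
    return run_loop(0, deepest, k);
}
```

Invoking a continuation taken at level 1 from a run loop at level 3 unwinds two levels and resumes at the stored offset; a continuation from the current level needs no longjmp at all. The sketch only handles continuations whose run loop is still on the C stack.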

Dan Sugalski

May 5, 2004, 9:14:57 AM

to l...@toetsch.at, perl6-i...@perl.org

At 12:02 PM +0200 5/5/04, Leopold Toetsch wrote:
>Dan Sugalski <d...@sidhe.org> wrote:
>> At 11:35 AM +0200 4/30/04, Leopold Toetsch wrote:
>>>Dan Sugalski <d...@sidhe.org> wrote:
>>>> If we go MMD all the way, we can skip the bytecode->C->bytecode
>>>> transition for MMD functions that are written in parrot bytecode, and
>>>> instead dispatch to them like any other sub.
>>>
>>>Not really. Or not w/o significant overhead for MMD functions
>>>implemented in C.
>
>> Well... about that. It's actually easily doable with a bit of
>> trickery. We can either:
>
>This still doesn't work. Function calls just look different than
>"plain" opcodes like "add Px, Py, Pz".
>- it's not known, if C<add> calls a PASM subroutine
>- if it calls a PASM routine, registers have to be preserved. Which
> registers depend on the subroutine that actually gets called (ok, this
> information - which registers are changed by the sub - can be attached
> to the Sub's metadata)
>- every opcode that possibly branches has a significant overhead for JIT
> and prederefed run cores: they must recalculate their PC from a byte
> code PC to a run loop PC.
>
>Changing C<add> or any MMDed opcode to look like a branch is a severe
>performance impact for the non-overloaded case.

If the JIT structure makes it untenable, it doesn't work, and that's
fine. I don't think it has to be quite as bad as it is now, but on
the other hand the performance hit in general needed to make this
work better is probably not worth it.

Something to keep in mind once we get more of the base PMC types
implemented, and have more of an idea how much of the MMD code ends
up being bytecode vs C.

Leopold Toetsch

May 5, 2004, 9:52:07 AM

to Dan Sugalski, perl6-i...@perl.org

Dan Sugalski <d...@sidhe.org> wrote:
> At 12:02 PM +0200 5/5/04, Leopold Toetsch wrote:
>>Changing C<add> or any MMDed opcode to look like a branch is a severe
>>performance impact for the non-overloaded case.

> If the JIT structure makes it untenable, it doesn't work, and that's
> fine. I don't think it has to be quite as bad as it is now, but on
> the other hand the performance hit in general needed to make this
> work better is probably not worth it.

It's not that slow any more. Running overloaded PASM has no more
overhead than calling a sub. My Pentium 600 runs 1E6 overloaded C<bxor>
functions in 1.5 seconds. The overload benchmark is at 3 times the speed
of perl 5.8.2 now.
The SSE version of memcpy gave it a big boost on Pentiums. And there is
still SSE2, which I can't test.

> Something to keep in mind once we get more of the base PMC types
> implemented, and have more of an idea how much of the MMD code ends
> up being bytecode vs C.

I don't expect much of the basic functionality to be in PASM.

leo

Piers Cawley

May 7, 2004, 9:11:19 AM

to l...@toetsch.at, Dan Sugalski, perl6-i...@perl.org

Leopold Toetsch <l...@toetsch.at> writes:

> Dan Sugalski <d...@sidhe.org> wrote:
>> At 11:35 AM +0200 4/30/04, Leopold Toetsch wrote:
>>>Dan Sugalski <d...@sidhe.org> wrote:
>>>> If we go MMD all the way, we can skip the bytecode->C->bytecode
>>>> transition for MMD functions that are written in parrot bytecode, and
>>>> instead dispatch to them like any other sub.
>>>
>>>Not really. Or not w/o significant overhead for MMD functions
>>>implemented in C.
>
>> Well... about that. It's actually easily doable with a bit of
>> trickery. We can either:
>
> This still doesn't work. Function calls just look different than
> "plain" opcodes like "add Px, Py, Pz".
> - it's not known, if C<add> calls a PASM subroutine
> - if it calls a PASM routine, registers have to be preserved. Which
> registers depend on the subroutine that actually gets called (ok, this
> information - which registers are changed by the sub - can be attached
> to the Sub's metadata)

No, we're in caller saves remember. The registers that need saving are
dependent on the caller. Since the registers used by a function at any
point are statically determined, maybe add's signature could be altered
to take an integer 'save flags' argument specifying which registers
need to be preserved for the caller, then if MMD determines that the
call needs to go out to a PASM function, the appropriate registers can
be saved.

Leopold Toetsch

May 7, 2004, 9:37:51 AM

to Piers Cawley, perl6-i...@perl.org

Piers Cawley <pdca...@bofh.org.uk> wrote:
> Leopold Toetsch <l...@toetsch.at> writes:

>> - if it calls a PASM routine, registers have to be preserved. Which
>> registers depend on the subroutine that actually gets called (ok, this
>> information - which registers are changed by the sub - can be attached
>> to the Sub's metadata)

> No, we're in caller saves remember.

Ok, yes. But MMD and delegated functions are a bit different. The caller
doesn't know that it's a caller. The PASM is run from inside the
C code.

> ... The registers that need saving are
> dependent on the caller.

Not quite for this case. Or in theory yes, but... As calling the
subroutine mustn't have any changes to the caller's registers, it's just
simpler to save these registers that the subroutine might change.

> ... Since the registers used by a function at any

> point are statically determined, maybe add's signature could be altered
> to take an integer 'save flags' argument specifying which registers
> need to be preserved for the caller,

This has a performance penalty for the non-MMD case. I can imagine that
overloaded MMD functions are simpler (with respect to register usage) than
the caller's code. So it seems that saving what the MMD sub might
change on behalf of the caller is just more effective.

leo

Piers Cawley

May 11, 2004, 12:16:38 PM

to l...@toetsch.at, perl6-i...@perl.org

Leopold Toetsch <l...@toetsch.at> writes:

But generating the save signature for a given sub is a compile time cost
that only needs to be paid once for each sub and shoved on an I register
(which could, of course, be standardized). An MMD sub with a PASM
implementation simply looks at the appropriate register, saves the right
stuff, sets up a return continuation and has the interpreter invoke
it. Which leaves a correctly set up continuation chain and a PASM
implementation which can do whatever the heck it likes, including
making continuations, closures etc that can be returned to multiple
times because it got invoked in the normal runloop.

The work has to be done either way, but by arranging things so that
everything looks like caller saves (and so that there is no MMD barrier
to continuations) just seems to make the most sense. BTW, if it's a
continuation barrier does that also mean it's an exception barrier?

Leopold Toetsch

May 11, 2004, 4:21:56 PM

to Piers Cawley, perl6-i...@perl.org

Piers Cawley wrote:
> Leopold Toetsch <l...@toetsch.at> writes:
>
>>Not quite for this case. Or in theory yes, but... As calling the
>>subroutine mustn't have any changes to the caller's registers, it's just
>>simpler to save these registers that the subroutine might change.

> But generating the save signature for a given sub is a compile time cost

> that only needs to be paid once for each sub and shoved on an I register

... once per sub per location where the sub is called from. But there
isn't any knowledge that a sub might be called. So the cost is actually
more per PMC instruction that might eventually run a PASM MMD. This is,
when it's done right, or ...

> (which could, of course, be standardized).

Yes. saveall, which is really expensive.

> ... An MMD sub with a PASM

> implementation simply looks at the appropriate register, saves the right
> stuff, sets up a return continuation and has the interpreter invoke
> it.

Well, that's exactly how it works now, with a bit differing in the
meaning of "right stuff" :)

> The work has to be done either way, but by arranging things so that
> everything looks like caller saves (and so that there is no MMD barrier
> to continuations) just seems to make the most sense. BTW, if it's a
> continuation barrier does that also mean it's an exception barrier?

It looks like caller saves. The saved range of registers can't change
that view. Whether the caller or the called sub defines the saved register
range in no way changes *how* registers are saved. They are saved by
the C code that actually runs the PASM. And the PASM is run from C code.
These are the "problems".

And WRT the continuation barrier: as I already said, if we really need
that (an opcode function "jumps" somewhere) then it's possible. On each
entry to the run loop a setjmp(3) is done, which is also the base for
throwing exceptions from within an opcode function. There are no
barriers, AFAIK.

leo

Piers Cawley

May 12, 2004, 1:38:45 AM

to Leopold Toetsch, perl6-i...@perl.org

Leopold Toetsch <l...@toetsch.at> writes:

> Piers Cawley wrote:
>> Leopold Toetsch <l...@toetsch.at> writes:
>>
>>>Not quite for this case. Or in theory yes, but... As calling the
>>>subroutine mustn't have any changes to the caller's registers, it's just
>>>simpler to save these registers that the subroutine might change.
>
>> But generating the save signature for a given sub is a compile time cost
>> that only needs to be paid once for each sub and shoved on an I register
>
> ... once per sub per location where the sub is called from. But there
> isn't any knowledge that a sub might be called. So the cost is actually
> more per PMC instruction that might eventually run a PASM MMD. This is,
> when it's done right, or ...

No. Once per compilation unit. Stick it in a high register and keep it nailed there
for the duration of the sub. Specify this register as part of the
calling conventions; the right value will then get restored at any
function return and there's no need to regenerate it.

Leopold Toetsch

May 12, 2004, 2:36:40 AM

to Piers Cawley, perl6-i...@perl.org

Piers Cawley <pdca...@bofh.org.uk> wrote:
> Leopold Toetsch <l...@toetsch.at> writes:

[ calculating registers to save ]

>> ... once per sub per location where the sub is called from. But there
>> isn't any knowledge that a sub might be called. So the cost is actually
>> more per PMC instruction that might eventually run a PASM MMD. This is,
>> when it's done right, or ...

> No. Once per compilation unit.

An example:

.sub foo

    # a lot of string handling code
    # and some PMCs
    $P0 = concat $P0, $S0   # <<< 1) calculate: save P, S here
    # now a lot of float code
    # no strings used any more
    # and no branch back to 1)
    $N1 = 47.11             # $N1's life starts here
    $P0 = $P1 + $N1         # <<< 2) calculate: save P, N regs
    $P2 = $P0 + $N1         # <<< 3) calculate: save P regs
    # no N reg used here
.end

At 1) the caller is not interested in preserving N-registers, these
aren't used there. Saving everything the caller needs saved ends up
as C<saveall> in non-trivial subroutines.

Using your proposal would need a lot of storage for the saved
register ranges.

If the calculation is done based on the called subroutine, it's not
unlikely that only a few registers have to be preserved, e.g. no
N-registers for the overloaded C<concat> and no string registers for the
overloaded C<add>.

This doesn't violate the principle of caller saves: all that needs
preserving from the caller's POV is preserved.

leo
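
The two bookkeeping schemes can be compared with a small C sketch. This is a minimal sketch under stated assumptions: the `REG_*` flags, `bytes_for` and `unit_fingerprint` are hypothetical names, the byte counts assume 32 registers per kind on a 32-bit build (640 bytes in total), cost is counted per register kind, and the callee mask used in the note below is invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flags, one bit per register kind. */
enum { REG_I = 1, REG_N = 2, REG_S = 4, REG_P = 8 };

/* Bytes to copy for a mask: 4-byte I, 8-byte N, 4-byte S and P handles. */
static size_t bytes_for(unsigned mask)
{
    size_t n = 0;
    if (mask & REG_I) n += 128;
    if (mask & REG_N) n += 256;
    if (mask & REG_S) n += 128;
    if (mask & REG_P) n += 128;
    return n;
}

/*
 * Caller-based fingerprint kept in one register for the whole compilation
 * unit: it has to cover every call site, so it is the union of what is
 * live at each of them.
 */
static unsigned unit_fingerprint(const unsigned *live_at_site, size_t nsites)
{
    unsigned mask = 0;
    for (size_t i = 0; i < nsites; i++)
        mask |= live_at_site[i];
    return mask;
}
```

For the three call sites in `foo` (P and S, then P and N, then P only) the per-unit fingerprint is P, S and N, so 512 bytes are copied at every site. If the overloaded `concat` that actually runs changes only P and S registers, a callee-based save copies 256 bytes at site 1.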

Piers Cawley

May 26, 2004, 6:53:12 AM

to l...@toetsch.at, perl6-i...@perl.org

Leopold Toetsch <l...@toetsch.at> writes:

But under this scheme, the implementing function will have to do a
saveall for every function it calls because it doesn't know what
registers its caller cares about. And you're almost certainly going
to want to call other functions to do the heavy lifting for all the
usual reasons of code reuse. I can see a situation where you end up
with

.sub implementing_function
    saveall
    invokecc user_callable_implementing_function
    restoreall
    invoke P1
.end

.sub user_callable_implementing_function
    do_this(...)
    do_that(...)
    do_the_other(...)
    ...
.end

simply because you want to follow good coding practice. You're right
that, in the limiting case, my 'fingerprinting' approach is going to
reduce to a saveall, but the example you give could be broken
up into

.sub foo
    $P0 = stringy_stuff($P0)
    ($P0, $P2) = floaty_stuff($P0)
    ...
.end

which will simply need to save P registers (and the called functions will
be able to arrange for efficient saves too...)

Leopold Toetsch

May 26, 2004, 10:54:37 AM

to Piers Cawley, perl6-i...@perl.org

Piers Cawley wrote:
>
> But under this scheme, the implementing function will have to do a
> saveall for every function it calls because it doesn't know what
> registers its caller cares about. And you're almost certainly going
> to want to call other functions to do the heavy lifting for all the
> usual reasons of code reuse.

Yep that's true. As well as with real caller saves. Which leads back to
my (almost) warnocked "proposal":

Subject: Register stacks again
Date: Sat, 08 May 2004

leo

Dan Sugalski

May 26, 2004, 1:29:59 PM

to Leopold Toetsch, Piers Cawley, perl6-i...@perl.org

At 4:54 PM +0200 5/26/04, Leopold Toetsch wrote:
>Piers Cawley wrote:
>>
>>But under this scheme, the implementing function will have to do a
>>saveall for every function it calls because it doesn't know what
>>registers its caller cares about. And you're almost certainly going
>>to want to call other functions to do the heavy lifting for all the
>>usual reasons of code reuse.
>
>Yep that's true. As well as with real caller saves. Which leads back
>to my (almost) warnocked "proposal":

If you want to go back to a frame pointer style of register stack
access, that's doable, but that's the way it was in the beginning and
the performance penalties in normal code outweighed the savings in
stack pushes.

If you want to try it again to see if things are different I don't
care, so long as the semantics expressed to the bytecode programs
don't change. It will invalidate all the current JIT code on all the
platforms so it's a not-insignificant thing to do. I also don't think
we've sufficient real code to judge performance, so I think it's a
bit premature to worry about it.

Leopold Toetsch

May 28, 2004, 4:23:30 AM

to Dan Sugalski, perl6-i...@perl.org

Dan Sugalski <d...@sidhe.org> wrote:

> If you want to go back to a frame pointer style of register stack
> access, that's doable, but that's the way it was in the beginning and
> the performance penalties in normal code outweighed the savings in
> stack pushes.

JITted memory access through the frame pointer is as fast as with
absolute memory addresses. The same is likely true for gcc/CGP core,
when we force the frame pointer to be a CPU register.

> If you want to try it again to see if things are different I don't
> care, so long as the semantics expressed to the bytecode programs
> don't change. It will invalidate all the current JIT code on all the
> platforms so it's a not-insignificant thing to do.

That's the problem, yes.

> ... I also don't think

> we've sufficient real code to judge performance, so I think it's a
> bit premature to worry about it.

This is of course true, all the more so for changing it in the first place :)

What about issues with JIT and prederefed cores and multi-threading:
currently we need to "recompile" all bytecode per thread.

leo

Piers Cawley

Jun 1, 2004, 4:08:02 PM

to Leopold Toetsch, perl6-i...@perl.org

Leopold Toetsch <l...@toetsch.at> writes:

> Piers Cawley wrote:
>> But under this scheme, the implementing function will have to do a
>> saveall for every function it calls because it doesn't know what
>> registers its caller cares about. And you're almost certainly going
>> to want to call other functions to do the heavy lifting for all the
>> usual reasons of code reuse.
>
> Yep that's true. As well as with real caller saves. Which leads back to
> my (almost) warnocked "proposal":

Consider a sub, call it fred, that calls other subs and only uses PMC
registers. At compile time, you wrap those calls in appropriate
pushtopp/poptopp pairs.

Then, at runtime, 'fred' gets set up as the implementation for an op.

Which, given your implementation, means that each function call that
fred makes should be protected with savetop/restoretop pairs. Oops.

Leopold Toetsch

Jun 2, 2004, 3:17:00 AM

to Piers Cawley, perl6-i...@perl.org

Piers Cawley <pdca...@bofh.org.uk> wrote:

> Then, at runtime, 'fred' gets set up as the implementation for an op.

> Which, given your implementation, means that each function call that
> fred makes should be protected with savetop/restoretop pairs. Oops.

The implementation checks register usage of the called sub at *runtime*,
or more precisely at first invocation of the sub and caches the value.
It would need a notification (similar to the method cache), if the sub
got recompiled.

leo
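
The check-once-and-cache scheme can be sketched in C. This is a minimal sketch under stated assumptions, not the code in interpreter.c: `Sub`, `scan_usage`, `usage_for_call` and `sub_recompiled` are hypothetical names, and the sub body is reduced to a string with one letter per register kind touched, standing in for real bytecode inspection.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flags, one bit per register kind. */
enum { REG_I = 1, REG_N = 2, REG_S = 4, REG_P = 8 };

typedef struct {
    const char *body;    /* stand-in for bytecode: 'I', 'N', 'S' or 'P' per op */
    unsigned    usage;   /* cached register-usage mask */
    int         cached;  /* non-zero once `usage` is valid */
    int         scans;   /* how often the body was inspected, for illustration */
} Sub;

/* Inspect the whole body and collect the register kinds it touches. */
static unsigned scan_usage(Sub *sub)
{
    unsigned mask = 0;
    sub->scans++;
    for (const char *p = sub->body; *p != '\0'; p++) {
        switch (*p) {
        case 'I': mask |= REG_I; break;
        case 'N': mask |= REG_N; break;
        case 'S': mask |= REG_S; break;
        case 'P': mask |= REG_P; break;
        default:  break;   /* ops that touch no registers */
        }
    }
    return mask;
}

/* Called before running the sub: scan on first invocation, then reuse. */
static unsigned usage_for_call(Sub *sub)
{
    if (!sub->cached) {
        sub->usage  = scan_usage(sub);
        sub->cached = 1;
    }
    return sub->usage;
}

/* Notification hook: a recompiled sub invalidates the cached mask. */
static void sub_recompiled(Sub *sub, const char *new_body)
{
    sub->body   = new_body;
    sub->cached = 0;
}
```

The scan cost is paid once per sub rather than once per call, and the invalidation hook plays the role of the notification mentioned above. The sketch does not follow calls made by the sub itself, which is the case Piers raises.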

Piers Cawley

Jun 3, 2004, 6:20:27 PM

to l...@toetsch.at, perl6-i...@perl.org

Leopold Toetsch <l...@toetsch.at> writes:

Who's talking about recompiling? I'm talking about fred being
registered as the handler for some op at runtime. ISTM that, either
fred has to get recompiled (assuming the source is kicking about) so
that every function call it makes is guarded with a saveall, or you
*always* do a saveall as you call fred so that the compiled in,
optimized saves will continue to work.

With the fingerprint approach I outlined, one can at least avoid the
saveall in some cases.

Dan Sugalski

Jun 4, 2004, 9:20:11 AM

to l...@toetsch.at, perl6-i...@perl.org

At 10:23 AM +0200 5/28/04, Leopold Toetsch wrote:
>Dan Sugalski <d...@sidhe.org> wrote:
>
>> If you want to go back to a frame pointer style of register stack
>> access, that's doable, but that's the way it was in the beginning and
>> the performance penalties in normal code outweighed the savings in
>> stack pushes.
>
>JITted memory access through the frame pointer is as fast as with
>absolute memory addresses. The same is likely true for gcc/CGP core,
>when we force the frame pointer to be a CPU register.

I think you'll find that's not the case. And it's certainly not the
case on non-x86 platforms. (I'm also not sure it's true on the x86-64
systems)

> > If you want to try it again to see if things are different I don't
>> care, so long as the semantics expressed to the bytecode programs
>> don't change. It will invalidate all the current JIT code on all the
>> platforms so it's a not-insignificant thing to do.
>
>That's the problem, yes.
>
>> ... I also don't think
>> we've sufficient real code to judge performance, so I think it's a
>> bit premature to worry about it.
>
>This is of course true, all the more so for changing it in the first place :)

No, changing it made good sense at the time, and it *still* makes
good sense. The base cores and the JIT on all platforms got a good
boost because of it.

>What about issues with JIT and prederefed cores and multi-threading:
>currently we need to "recompile" all bytecode per thread.

What about it? Threads or not, the single-threaded case is going to
be the common case for parrot, and it's mildly faster in general.
Emitting position-independent code's a transparent option if we want,
and I'm not inclined to change the architecture because of it.

Leopold Toetsch

Jun 4, 2004, 4:44:37 PM

to Dan Sugalski, perl6-i...@perl.org

Dan Sugalski <d...@sidhe.org> wrote:

> ... Threads or not, the single-threaded case is going to

> be the common case for parrot, and it's mildly faster in general.

That and the overhead to just test the scheme (in the absence of RL code)
are really good arguments to keep the status quo :)

leo