Context, continuations, and call speed

Dan Sugalski

Jun 12, 2003, 6:05:14 PM

to perl6-i...@perl.org

Okay, at the moment I'm working on getting an implementation of
classes and objects working. I'm also taking a look at calling speed,
as I'd really like to not suck with our call times. :)

First off, the core stuff looks good. I'd not really looked at it
until now, but now that I have, well... good job folks. I put in a
minor tweak that may or may not speed things up (basically checking
for the COW flag on stack chunks and bailing on the walk up if
they're already marked) depending on what's resident in the L1/L2
cache. We'll see.

Second, I see that the registers themselves are in the context
structure. I think this may be a good part of our speed problem with
taking continuations. Now, continuations should *not* restore the
registers, so this strikes me as an incorrect thing to do, but before
I twiddle the context structure some and remove them, I want to check
and make sure that there's not a good reason to have them in.

So... anyone?
--
Dan

--------------------------------------"it's like this"-------------------
Dan Sugalski                          even samurai
d...@sidhe.org                         have teddy bears and even
                                      teddy bears get drunk

Luke Palmer

Jun 12, 2003, 6:28:37 PM

to d...@sidhe.org, perl6-i...@perl.org

> Okay, at the moment I'm working on getting an implementation of
> classes and objects working. I'm also taking a look at calling speed,
> as I'd really like to not suck with our call times. :)
>
> First off, the core stuff looks good. I'd not really looked at it
> until now, but now that I have, well... good job folks. I put in a
> minor tweak that may or may not speed things up (basically checking
> for the COW flag on stack chunks and bailing on the walk up if
> they're already marked) depending on what's resident in the L1/L2
> cache. We'll see.
>
> Second, I see that the registers themselves are in the context
> structure. I think this may be a good part of our speed problem with
> taking continuations. Now, continuations should *not* restore the
> registers, so this strikes me as an incorrect thing to do, but before
> I twiddle the context structure some and remove them, I want to check
> and make sure that there's not a good reason to have them in.

Well, aren't the registers really part of the context? Saving the
registers to a stack and then not using a stack for control flow seems
kind of convoluted. I'd prefer that the context does restore the
registers, and that C<saveall> / C<restoreall> not be part of the
calling convention.

CPS puts the call stack into the register stack. But, if you put the
register context into the continuation, then everything about the
current execution context is inside P0, and you needn't write code to
mess with stacks. This seems to make more sense if you use
continuations for more than just pure stack-based execution.

Luke

Dan Sugalski

Jun 12, 2003, 6:49:04 PM

to Luke Palmer, perl6-i...@perl.org

At 4:28 PM -0600 6/12/03, Luke Palmer wrote:
> > Okay, at the moment I'm working on getting an implementation of
>> classes and objects working. I'm also taking a look at calling speed,
>> as I'd really like to not suck with our call times. :)
>>
>> First off, the core stuff looks good. I'd not really looked at it
>> until now, but now that I have, well... good job folks. I put in a
>> minor tweak that may or may not speed things up (basically checking
>> for the COW flag on stack chunks and bailing on the walk up if
>> they're already marked) depending on what's resident in the L1/L2
>> cache. We'll see.
>>
>> Second, I see that the registers themselves are in the context
>> structure. I think this may be a good part of our speed problem with
>> taking continuations. Now, continuations should *not* restore the
>> registers, so this strikes me as an incorrect thing to do, but before
>> I twiddle the context structure some and remove them, I want to check
>> and make sure that there's not a good reason to have them in.
>
>Well, aren't the registers really part of the context?

Nope. If they were, there would be no way to return data from a
function. The return values go in the registers, remember, and we
return to the caller by invoking the continuation it passed into us.
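
A toy model of that argument, as a sketch only: the class and register names below are hypothetical and are not Parrot's real structures. It shows that if invoking the return continuation also restored the registers, the callee's return value would be overwritten before the caller could read it.

```python
# Toy model: a continuation that captures control state, with an optional
# (and harmful) snapshot of the registers. All names here are hypothetical.

class Interp:
    def __init__(self):
        self.regs = {"I5": 0}   # stand-in for the register file
        self.control = []       # stand-in for the control stack


class Continuation:
    def __init__(self, interp, restore_registers):
        self.control = list(interp.control)  # snapshot of control state
        self.saved_regs = dict(interp.regs) if restore_registers else None

    def invoke(self, interp):
        interp.control = list(self.control)
        if self.saved_regs is not None:
            # Restoring registers here wipes out whatever the callee returned.
            interp.regs = dict(self.saved_regs)


def call_and_return(restore_registers):
    interp = Interp()
    interp.control.append("after_call")
    ret = Continuation(interp, restore_registers)  # caller passes this in
    interp.control.append("inside_callee")
    interp.regs["I5"] = 42                         # callee sets a return value
    ret.invoke(interp)                             # "return" to the caller
    return interp.regs["I5"], interp.control
```

With `restore_registers=False` the caller sees 42 in `I5`; with `True` it sees the stale 0, which is the "no way to return data" problem.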

Luke Palmer

Jun 12, 2003, 7:01:03 PM

to d...@sidhe.org, perl6-i...@perl.org

> At 4:28 PM -0600 6/12/03, Luke Palmer wrote:
> > > Okay, at the moment I'm working on getting an implementation of
> >> classes and objects working. I'm also taking a look at calling speed,
> >> as I'd really like to not suck with our call times. :)
> >>
> >> First off, the core stuff looks good. I'd not really looked at it
> >> until now, but now that I have, well... good job folks. I put in a
> >> minor tweak that may or may not speed things up (basically checking
> >> for the COW flag on stack chunks and bailing on the walk up if
> >> they're already marked) depending on what's resident in the L1/L2
> >> cache. We'll see.
> >>
> >> Second, I see that the registers themselves are in the context
> >> structure. I think this may be a good part of our speed problem with
> >> taking continuations. Now, continuations should *not* restore the
> >> registers, so this strikes me as an incorrect thing to do, but before
> >> I twiddle the context structure some and remove them, I want to check
> >> and make sure that there's not a good reason to have them in.
> >
> >Well, aren't the registers really part of the context?
>
> Nope. If they were, there would be no way to return data from a
> function. The return values go in the registers, remember, and we
> return to the caller by invoking the continuation it passed into us.

But they really are, honest. We just don't restore all of the
context, so we can get back return values.

When something calls a sub, and does a C<saveall> before it, it's
saving its context so it can restore it after the sub returns. It
just doesn't restore over the parts that have useful information.

Of course I don't mean restore all the registers on continuation
invocation, since then there wouldn't be a way to return values. Restore
half of them... or none of them perhaps, and have some set of ops that
restore from that context... or something.

I guess I'm just getting scared having an implicit control stack and
an explicit register stack. It seems like if you do anything
"tricky", things could get messed up real quick. Are the register
stacks saved with the continuation? If so, I'm not so worried. I've
basically been saying to give the register stacks a depth of one, but
a greater depth wouldn't be a problem... except for efficiency issues,
maybe.

Would someone (presumably Dan, being the one who decided and all)
explain why we're using CPS? Maybe we (I) can get a better idea of
how things should work, then.

Luke

Dan Sugalski

Jun 12, 2003, 7:11:00 PM

to Luke Palmer, perl6-i...@perl.org

At 5:01 PM -0600 6/12/03, Luke Palmer wrote:
> > At 4:28 PM -0600 6/12/03, Luke Palmer wrote:
>> > > Okay, at the moment I'm working on getting an implementation of
>> >> classes and objects working. I'm also taking a look at calling speed,
>> >> as I'd really like to not suck with our call times. :)
>> >>
>> >> First off, the core stuff looks good. I'd not really looked at it
>> >> until now, but now that I have, well... good job folks. I put in a
>> >> minor tweak that may or may not speed things up (basically checking
>> >> for the COW flag on stack chunks and bailing on the walk up if
>> >> they're already marked) depending on what's resident in the L1/L2
>> >> cache. We'll see.
>> >>
>> >> Second, I see that the registers themselves are in the context
>> >> structure. I think this may be a good part of our speed problem with
>> >> taking continuations. Now, continuations should *not* restore the
>> >> registers, so this strikes me as an incorrect thing to do, but before
>> >> I twiddle the context structure some and remove them, I want to check
>> >> and make sure that there's not a good reason to have them in.
>> >
>> >Well, aren't the registers really part of the context?
>>
>> Nope. If they were, there would be no way to return data from a
>> function. The return values go in the registers, remember, and we
>> return to the caller by invoking the continuation it passed into us.
>
>But they really are, honest.

No. No, they aren't. Registers are really data, so arguably a
continuation restore shouldn't restore *any* of the register stacks.
That'd make things rather... odd, so we're not doing that. I may
regret it in the future, of course.

Continuations are supposed to save only control information, not
data. Registers are data. (And so far as I know, no system that does
continuations saves register contents for restore, since there'd be
no way to pass data back and forth)

>Are the register stacks saved with the continuation?

Yes, of course they are.

Luke Palmer

Jun 12, 2003, 7:22:45 PM

to d...@sidhe.org, perl6-i...@perl.org

> >Are the register stacks saved with the continuation?
>
> Yes, of course they are.

Er, yeah, um.. Everybody, forget everything I just said %-)

Luke

Melvin Smith

Jun 12, 2003, 8:40:45 PM

to Dan Sugalski, perl6-i...@perl.org

At 06:05 PM 6/12/2003 -0400, Dan Sugalski wrote:
>Second, I see that the registers themselves are in the context structure.
>I think this may be a good part of our speed problem with taking
>continuations. Now, continuations should *not* restore the registers, so
>this strikes me as an incorrect thing to do, but before I twiddle the
>context structure some and remove them, I want to check and make sure that
>there's not a good reason to have them in.
>
>So... anyone?

Well since I'm the one that wrote that specific garbage I'll honestly say.....
I can't remember!

Now there, I'm sure that helped. :)

Seriously, most likely it was my lack of understanding of how to correctly
implement continuations so at the time it seemed like the correct thing
to do.

-Melvin

Leopold Toetsch

Jun 13, 2003, 5:33:40 AM

to Dan Sugalski, perl6-i...@perl.org

Dan Sugalski <d...@sidhe.org> wrote:
> Okay, at the moment I'm working on getting an implementation of
> classes and objects working. I'm also taking a look at calling speed,
> as I'd really like to not suck with our call times. :)

So first some numbers WRT speed:
Based on calling a bare subroutine (.Sub) with some variations:

        set I20, 1000000
        set I21, 0
        new P0, .Sub
        set_addr I22, func
        set P0, I22
        set I0, 0       # no prototype
        set I2, 0       # no PMC params
        set I3, 0       # void context
lp:
        saveall         # 1)
        invoke
        restoreall      # 2)
        inc I21
        lt I21, I20, lp
        end
func:
        ret

1) and 2) omitted: 0.2s (all: -O3 compiled imcc)
1) and 2) as above: 1.2s
1) + 2) = 4 * halfpopX: 1.0s
perl 5.8.0: 1.25s
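
Spelled out as arithmetic (a sketch that uses only the timings quoted above and the loop count of 1000000; the variable names are made up for illustration), the per-call cost of the save/restore pair works out to roughly one microsecond:

```python
# Per-call overhead derived from the timings quoted above.
CALLS = 1_000_000        # loop count from the benchmark (I20)

bare = 0.2               # seconds, 1) and 2) omitted
with_saveall = 1.2       # seconds, saveall/restoreall around each invoke
with_halfpops = 1.0      # seconds, the half-pop variant


def per_call_us(total_s, baseline_s, calls=CALLS):
    """Extra microseconds per call compared with the bare loop."""
    return (total_s - baseline_s) / calls * 1e6


saveall_cost_us = per_call_us(with_saveall, bare)    # about 1.0 us per call
halfpop_cost_us = per_call_us(with_halfpops, bare)   # about 0.8 us per call
```

So about five sixths of the 1.2s run goes to saving and restoring registers, not to the call itself.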

BTW: Putting the loop label before the "new P0, .Sub" more than doubles
the execution time (2.5s).

Cachegrind of course states that the memcpy in the register push/pop is
the culprit; the pushN/popN ops take almost double the time of the others.

I think there was some discussion a while ago about whether we could use
sliding register windows, e.g.:

P0 ... P15, P16 ... P31
^regp       ^^^^^^^^^^^ caller fills regs+16 according to pdd03
            P0 .... P15, P16 ... P31
            ^^^^^^^^^^^ called sub receives params like pdd03
            ^regp
                        return values like pdd03
P0 ... P15, P16 ... P31
^regp       return values are in pdd03 + 16

Or probably better:

P0 .. P9, P10 .. P21, P22 .. P31
incoming  local       outgoing
                      P0 .. P9, P10 .. P21, P22 .. P31
                      incoming  local       outgoing

This would need one additional indirection for register access. But for
saving and restoring registers, we would just move the register pointer
by x*sizeof(reg). A memcpy would only be necessary on register frame
boundaries - or not at all when we reallocate frames.
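
A minimal sketch of the second layout, assuming a flat backing store and a slide of 22 registers so that the callee's P0..P9 overlay the caller's P22..P31. The class name, the fixed depth, and the idea that results come back in the caller's outgoing registers are all assumptions made for illustration; none of this is existing Parrot code.

```python
# Sliding register window: call/return only move a base index, no copying.
WINDOW = 32   # registers visible at once
SLIDE = 22    # callee's P0 lines up with caller's P22


class WindowedRegs:
    def __init__(self, depth=8):
        # One flat store large enough for `depth` nested calls.
        self.store = [None] * (SLIDE * depth + WINDOW)
        self.base = 0

    def get(self, n):
        return self.store[self.base + n]   # the one extra indirection

    def set(self, n, value):
        self.store[self.base + n] = value

    def call(self):
        if self.base + SLIDE + WINDOW > len(self.store):
            # A real implementation would spill or allocate a new frame here.
            raise OverflowError("register frame boundary reached")
        self.base += SLIDE

    def ret(self):
        if self.base == 0:
            raise IndexError("return without a matching call")
        self.base -= SLIDE


def demo():
    regs = WindowedRegs()
    regs.set(10, "caller local")   # caller's P10
    regs.set(22, "argument")       # caller's outgoing P22
    regs.call()
    arg = regs.get(0)              # callee sees it as incoming P0
    regs.set(0, "result")          # callee leaves a result in its P0
    regs.ret()
    return arg, regs.get(22), regs.get(10)
```

`demo()` hands the argument over and gets the result back without copying a single register, and the caller's local in P10 is untouched because the callee's locals live further up the store.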

leo

Jonathan Sillito

Jun 13, 2003, 12:08:51 PM

to Dan Sugalski, Luke Palmer, perl6-i...@perl.org

I missed this discussion I see, but for what it's worth, the patch I
submitted earlier in the week introduced a continuation pmc that does
**not** save or restore the registers. The old continuation pmc that saved
everything (including the registers) has been renamed to
completecontext.pmc.

On the other hand I like the idea of several types of continuations, including
one that saves and restores half of the registers.

Finally if the registers are never to be saved as part of the context, we
should reorganize the relevant structs so that the registers are not part of
the Parrot_Context struct.

Jonathan Sillito

Dan Sugalski

Jun 13, 2003, 3:36:14 PM

to l...@toetsch.at, perl6-i...@perl.org

At 11:33 AM +0200 6/13/03, Leopold Toetsch wrote:
>Cachegrind of course states that the memcpy in the register push/pop is
>the culprit, the pushN/popN take almost double the time of the other.
>
>I think, there was some discussion ago, if we couldn't use sliding
>register windows

I'd rather not have the window, but...

Saving and restoring all the registers is obviously a waste of time
in many cases. My assumption is that the compilers won't emit
saveall/restoreall instructions unless they're really needed, and in
most cases they won't be, so I think part of the timing's excessive.

Having said that, since the lower half of the register sets are
parameters and shouldn't be restored over, it seems sensible that
they shouldn't be saved over either. I think we may be better served
halving the size of the frame on the register stacks, adding in
pushtop, pushbottom, poptop, popbottom, and tossing the half-pop ops
(well, they'd get renamed to poptop). saveall and restoreall, along
with the push ops, will stay; they'll just transparently do a
pushbottom and pushtop operation.
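
A rough sketch of the half-frame scheme for a single register type, assuming 32 registers split at 16. The class and the `save_all`/`restore_all` helper names are hypothetical stand-ins for what saveall/restoreall would do per register set.

```python
# Register stack whose frames hold half a register set (16 registers).
HALF = 16


class RegisterSet:
    def __init__(self):
        self.regs = [0] * 32
        self.stack = []          # each frame is a list of 16 values

    def pushbottom(self):
        self.stack.append(self.regs[:HALF])

    def pushtop(self):
        self.stack.append(self.regs[HALF:])

    def popbottom(self):
        self.regs[:HALF] = self.stack.pop()

    def poptop(self):
        self.regs[HALF:] = self.stack.pop()

    def save_all(self):
        # A full save is just two half-frame pushes.
        self.pushbottom()
        self.pushtop()

    def restore_all(self):
        # Pop in reverse order of the pushes.
        self.poptop()
        self.popbottom()


def demo():
    rs = RegisterSet()
    rs.regs[5] = 1      # lower half: parameter slot
    rs.regs[20] = 2     # upper half: caller's own data
    rs.pushtop()        # caller saves only what it must keep
    rs.regs[5] = 99     # callee writes a return value
    rs.regs[20] = 77    # callee clobbers the upper half
    rs.poptop()         # caller restores the upper half only
    return rs.regs[5], rs.regs[20], len(rs.stack)
```

In `demo()` the return value in the lower half survives the restore while the caller's upper-half data comes back, using one 16-register frame instead of a 32-register one.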

Not a big deal--the only reason it's not done is it has Jit
repercussions and I wanted you and Daniel to have a chance to bring
up problems with the scheme before I went and broke the JIT.

Leopold Toetsch

Jun 13, 2003, 5:59:33 PM

to Dan Sugalski, perl6-i...@perl.org

Dan Sugalski wrote:

> At 11:33 AM +0200 6/13/03, Leopold Toetsch wrote:
>> register windows

> I'd rather not have the window, but...
>
> Saving and restoring all the registers is obviously a waste of time in
> many cases.

This sentence seems to contradict pdd03 - not that's a waste but ...
When it comes to return values they clearly clobber these values that
the caller is obliged to save.
A typical return value sequence currently looks like

pushp P5 # return val
restoreall
popp Px

OTOH the callee has to preserve P0-P2, S0, I0. Other registers, mainly
I1-I4 must be set on each sub return. The state of all other registers -
at least according to calling conventions - has to be preserved.

> ... My assumption is that the compilers won't emit

> saveall/restoreall instructions unless they're really needed, and in
> most cases they won't be, so I think part of the timing's excessive.

The compiler can't decide not to emit saveall/restoreall - we are
talking here about Parrot calling conventions and the caller saves all.

The compiler may omit the saveall/restoreall on the last function call,
*if* the compiler is really sure that it's really calling a leaf (if
that returns with C<invoke>, the compiler *can't* be sure).

> Having said that, since the lower half of the register sets are
> parameters and shouldn't be restored over, it seems sensible that they
> shouldn't be saved over either.

This sounds very reasonable to me. But some of these registers must not
get clobbered by the callee either. This needs another - and please -
very detailed update for pdd03. (And with separate parts with or w/o
prototypes - Thanks)

> Not a big deal--the only reason it's not done is it has Jit
> repercussions and I wanted you and Daniel to have a chance to bring up
> problems with the scheme before I went and broke the JIT.

No JIT involved currently AFAIK. IMCC is a much bigger problem here. The
flow of registers in and out of subroutines has to be well defined to
get register allocation done. At PASM level, where registers are fixed
they are either mapped to a hardware processor register or not. Before
function calls they have to be flushed to Parrot registers, because of
exceptions and other interpreter weirdness like introspective possibilities.

Back to register windows for a minute:
Do you already have data structures in mind for multi threading? The
registers are for sure separated per thread. So the extra pointer for
the register window or something similar would be needed anyway.

leo

Jonathan Sillito

Jun 13, 2003, 8:11:32 PM

to Leopold Toetsch, Dan Sugalski, perl6-i...@perl.org

> -----Original Message-----
> From: Leopold Toetsch [mailto:l...@toetsch.at]

[snip]

>
> The compiler can't decide not to emit saveall/restoreall - we are
> talking here about Parrot calling conventions and the caller saves all.
>
> The compiler may omit the saveall/restoreall on the last function call,
> *if* the compiler is really sure that it's really calling a leaf (if
> that returns with C<invoke>, the compiler *can't* be sure).
>

The calling convention says that the caller saves everything that the caller
cares about, but that may not be everything. If a sub uses only one or two
registers it only needs to save those before calling another sub ... right?

--
Jonathan Sillito

Dan Sugalski

Jun 13, 2003, 8:17:59 PM

to Leopold Toetsch, perl6-i...@perl.org

At 11:59 PM +0200 6/13/03, Leopold Toetsch wrote:
>Dan Sugalski wrote:
>
>>At 11:33 AM +0200 6/13/03, Leopold Toetsch wrote:
>>>register windows
>
>>I'd rather not have the window, but...
>>
>>Saving and restoring all the registers is obviously a waste of time
>>in many cases.
>
>
>This sentence seems to contradict pdd03 - not that's a waste but ...
>When it comes to return values they clearly clobber these values
>that the caller is obliged to save.

The caller's not obliged to save anything really. In a caller-save
system, the caller is responsible for saving those things that it
cares about--it relieves the called sub of the responsibility to do
so. If the caller doesn't have anything it wants to save, then it
doesn't have to do so.

>OTOH the callee has to preserve P0-P2, S0, I0.

Well... strictly speaking it doesn't, though it really needs to keep
track of the return data in P1.

> Other registers, mainly I1-I4 must be set on each sub return. The
>state of all other registers - at least according to calling
>conventions - has to be preserved.

I think I need to go back and rework the text of PDD03 then--I can
tell it desperately needs an edit.

>>Not a big deal--the only reason it's not done is it has Jit
>>repercussions and I wanted you and Daniel to have a chance to bring
>>up problems with the scheme before I went and broke the JIT.
>
>No JIT involved currently AFAIK.

Cool. I double-checked with Daniel on IRC and he's fine with it as well.

> IMCC is a much bigger problem here.

I'm not sure it is, or at least that it should be. (Though if people
play interesting games with register set swapping it could, I suppose)

>Back to register windows for a minute:
>Do you already have data structures in mind for multi threading? The
>registers are for sure separated per thread. So the extra pointer
>for the register window or something similar would be needed anyway.

Well... I'm not thinking of windows per se, just shrinking the size
of the frames on the stacks, so there's not a lot of wasted space. If
most of the pushes will be half-sets it seems sensible to just shrink
the frames so they hold half a set, and have the full frame saves
just do two half-set pushes. Seems to work out OK.

I need to think a bit about the current ramifications of threads,
since I wasn't planning on allowing multiple threads to share things
like continuations, but given the code I think we can do that, and if
we can it's pretty much open season on threads. (Though there's still
the issue of buffer and pmc structs migrating across interpreters in
ways that don't make things go bang)

Dan Sugalski

Jun 13, 2003, 8:19:26 PM

to Jonathan Sillito, Leopold Toetsch, perl6-i...@perl.org

Right. I'm rewriting some of PDD 03 to make that much clearer.

John Van V.

Jun 13, 2003, 7:20:17 PM

to perl6-i...@perl.org

Hello all,

I am on the LinuxBIOS list and my gut sense told me that a compiler that they
are developing called romcc might be a fit for Parrot.

Since Parrot uses CPU style assembler as its native language, it might make
sense to match it with a compiler that can take advantage of this.

Along comes romcc. The biggest problem w/ working in the BIOS is that you have
to initialize RAM. That cannot be done in C because there is no RAM to do it
in. Assembler is the only solution but that's an economic impossibility
sometimes because of the scarcity of Linux folks versed in it.

So, sensibly, Eric Biederman <ebied...@lnxi.com> writes a small compiler that
uses registers instead of RAM hungry stacks.

Bingo !! But I feel like an idiot since I have yet to write a line of C code
but in only about 5 emails, Eric tells me "Yeah, that's doable"

Me> Romcc uses registers, not stacks -- like the Perl6 Parrot VM

Eric> Actually quite a bit different.
Eric> Parrot will just not use stack oriented byte codes. But a call/return
Eric> stack will still be required. romcc does not use a call/return stack,
Eric> but romcc still implement subroutines.

Me> Is there any point in implementing the Perl6 VM in a version of romcc
Me> enhanced with a call/return stack as the only compromise toward the
Me> traditional VM ??

Eric> I simply do not understand the question.

Me> To be frank, this is my situation; I am in an "open" school where my
Me> mentor has told me that I am way over my computer credits

Eric> I hope this is at the high school level or else your computer credits
Eric> have not sunk in well at all.

Me> I meant to use romcc in a context separate from LinuxBIOS.
Me> It would be a custom compiler for Parrot itself, running with RAM, where
Me> the CPU register style of Parrot would match the register reliance of romcc
Me> -- with the one added call/return stack that you mentioned.

Eric> O.k. that makes more sense. I have strong reservations about adding a
Eric> call/return stack. But a port to the parrot VM without that should
Eric> be doable.

====================
Eric's romcc basics
====================

Currently LinuxBIOS has a lot of assembly code simply because memory
initialization is difficult in the general case. This code cannot be
written with a standard compiler because there is no memory to put
a stack in. Nor on x86 are there cache blocks that can be locked into
place. As code generated with romcc does not use a stack it can be
used during memory initialization.

While it is true romcc is not *done*, it is quite usable at this point.

In the freebios2 I have been gradually making the primary API ones
that can be used before memory is initialized.

The biggest difference is that if you want to return multiple values,
instead of passing in the address of a variable, a multi-valued
structure must be returned.

The biggest current known bug is that if you have a small type
like short when it is stored in a register nothing ensures it does not
take on a larger value than will fit in a short.

unsigned short i;
i = 65535;
i = i + 1; /* i == 65536 oops */
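
For comparison, a small sketch of the wraparound a 16-bit unsigned short is expected to show, written as an explicit mask. The helper name is made up for illustration and says nothing about how romcc would fix this internally.

```python
# Expected unsigned 16-bit behaviour, modelled with an explicit mask.
USHORT_MASK = 0xFFFF


def ushort_add(a, b):
    """Add two values and truncate the result to 16 bits."""
    return (a + b) & USHORT_MASK
```

With the truncation applied, `ushort_add(65535, 1)` wraps to 0 where the register held 65536.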

The biggest shortcoming comes from its nature and

I have used it enough at this point I don't want to live without it
again.

=====
CXN, Inc. Contact: jo...@thinman.com
President, The Linux Society
http://groups.yahoo.com/group/linux-society
linux society distro -> http://www.thinman.com/eLSD/readme
ThinMan is a registered trademark of CXN, Inc


Leopold Toetsch

Jun 14, 2003, 5:20:14 AM

to Dan Sugalski, perl6-i...@perl.org

Dan Sugalski <d...@sidhe.org> wrote:
> At 11:59 PM +0200 6/13/03, Leopold Toetsch wrote:

>> IMCC is a much bigger problem here.

> I'm not sure it is, or at least that it should be. (Though if people
> play interesting games with register set swapping it could, I suppose)

Apart from such tricks: First a short summary. We now have half sized
register frames. This almost doubles performance, when only the lower
half of registers has to be preserved (e.g. when calling a sub in a loop).

OTOH when all registers have to be preserved the current scheme only has
roughly half the performance of the old 32 registers/frame.

So IMCC has to look at the current subroutine's register usage and emit
an appropriate combination of {top,bottom}pushX opcodes to preserve used
registers. The register allocator starts allocating registers from 0 up,
so e.g. when only 16 PRegs are used and the sub is called in a loop, it
will emit only a C<toppushp> opcode.
Or e.g. a sub is called only once with one parameter and one return value
which are both in P5. IMCC has to know, that the P5 after the subcall
comes out from the subroutine. If no other PRegs are used after the
subroutine call, no pushp opcode is necessary.
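
The decision described above could look roughly like this. It is only a sketch: the function name is hypothetical, register numbers stand in for IMCC's real liveness information, and it assumes the bottom half is registers 0-15 and the top half is 16-31. It returns which halves need a push and leaves the choice of opcode name to the caller.

```python
# Decide which half-frames must be saved around a sub call.
def halves_to_save(live_after_call, set_by_callee=()):
    """Return the register halves that need preserving across a call.

    live_after_call: register numbers the caller still reads after the call.
    set_by_callee:   register numbers the callee overwrites on purpose
                     (return values), which must not be restored over.
    """
    needed = set(live_after_call) - set(set_by_callee)
    halves = []
    if any(0 <= n < 16 for n in needed):
        halves.append("bottom")
    if any(16 <= n < 32 for n in needed):
        halves.append("top")
    return halves
```

For the P5-in, P5-out case above, `halves_to_save({5}, {5})` is empty, so no push is needed; a caller that also keeps something in P17 would need only the top half saved.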

The HL compiler should rather not emit saveall/invoke/restoreall. It ought
to be done inside IMCC. This part should be handled in a way I tried to
outline in imcc/docs/calling_conventions.pod. The doc of course needs an
update for CPS.

leo