Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Why we went with CPS for parrot

5 views
Skip to first unread message

Dan Sugalski

unread,
Jul 29, 2003, 10:42:13 AM7/29/03
to perl6-i...@perl.org
Klaas-Jan asked about this a while back--sorry it's taken so long to
get an answer.

If you read back through the list and other stuff I've written, it's
pretty clear that while I like CPS, I wasn't convinced that CPS was
the way to go, but if you look at the parrot code now, we use CPS.
Why?

First, it's important to know why I wasn't going to go with CPS.
That's pretty straightforward--fear and uncertainty. I was uncertain
they were sufficiently useful, and afraid that if we did go with them
we'd end up with an engine that nobody would create code for and
nobody'd be able to work on because we were using a scheme (pardon
the pun) that noone understood. I don't mind pieces of something that
only a dozen or two folks at any one time feel comfortable working
on--you don't need that many people poking at the internals of the
garbage collector--but we need to make sure that things are at
intelligible to at least some folks.

The uncertainty about their utility has been dealt with. Partly
because of some of the needs that parrot has, and I'll talk about
that a bit later, and partly because I've just gotten more
comfortable with them. I had a fair amount of lingering discomfort
with them that needed dealing with. The fear about scaring people off
with them is gone as well--I understand them, I know how to explain
them to people quickly and reasonably simply, and in a way that can
deal with the knee-jerk fear often associated with them. (I'm
convinced at this point that all of the fear people have about
continuations is a direct result of how they're taught and what's
associated with them, though that's a rant for another day.

The reasons to use them fall out of two needs--the need for
optimizations, and the amount of context that needs to be preserved
across function calls.

Optimizations are easy--having a CPS makes doing tail calls
dead-simple. We can't always do tail calls, of course--they can have
some issues with destruction timing guarantees and while we don't
explicitly make those, it's not nice to surprise people when they
haven't asked for it. Still, it's there for the using, and it means
the difference between a deeply recursive algorithm dying a horrible
death and running to completion. Besides, in a managed environment
(and we are managed--while everything will ultimately be open to
inspection there'll be some limits to what you can actually change)
it can be quite difficult to implement tail calls without engine
help. It's trivial to provide it, difficult to make it happen if we
don't, and useful for the optimizer. Not a tough choice.

Calling context is another big one. We have gobs of stacks,
registers, context, and whatnot that need preserving, putting us far
past the "stick the return value on the stack and branch off to
wherever" style of calling code. Not only is there this stuff that's
required at the user level, the interpreter has a lot of stuff it
potentially needs to save--what runloop you had, what your security
context was, what flags were in effect, and so on. What we were going
to do, pre-CPS, was, in the caller:

saveall
call
restoreall

and the callee would do a:

return

The call would push all the interpreter info onto the call stack,
along with the return point. Return uses that information to put
things back on return. With a CPS style, that turns into:

saveall
callcc
restoreall

with the callee doing

invoke Px

with Px being the register holding the return continuation in it.
Generally P1, but that's not required. (In both cases, the sub/method
to be invoked has to be in P0 so no parameters are needed for the
call) You'll note two differences: We spell return "invoke" in the
CPS version and give it a parameter, and give call the "cc" suffix.
Ooohhh, scary!

We could, of course, make the state saving explicit, rather than
rolling it up with the call in the first example, but since we must
save the state in all cases, there's little point in doing that.
Mandatory state saving is fine, making it mandatory but not
automatically doing it is silly. Someone'll forget and things will go
bang, or we'll get odd security holes.

Interestingly, it also means that if you squint a bit and don't think
too hard about it (and generally you won't) the only difference
between a CPS scheme and the old scheme is we pass in the "return
object" as a parameter, rather than put it on the stack. We could
even rename the ops if we so chose, though the old jsr/ret style of
calling is being preserved for a variety of reasons, so we're not
going to.

Security is something of a concern--since a sub/method boundary is
also a privilege boundary (which is to say that code on one side of a
sub may have more or fewer privs than the other side) all the state
saving, restoring, and respecting has to be handled by the
interpreter. That's a good reason to not even give the option of
managing interpreter state manually, since that's dangerous. It may
be possible with sufficient bytecode verification and proper security
design, but that's a lot more work than just managing it all
ourselves and not giving user code the opportunity to screw it up.
(That, after all, is our job :)

Going CPS isn't entirely without extra effort, but its effort that's
hidden under the interpreter covers, where it ought to stay. Compiler
writers and people generating assembly by hand don't have to deal
with it, as they ought not have to.
--
Dan

--------------------------------------"it's like this"-------------------
Dan Sugalski even samurai
d...@sidhe.org have teddy bears and even
teddy bears get drunk

0 new messages