Not continuation passing style

Jon Harrop

unread,

May 4, 2007, 9:21:51 PM5/4/07

to

Compilers like SML/NJ transform function calls into CPS. This has the
advantage of avoiding stack overflows but incurs a performance cost.

I'm wondering if anyone has tried to do the opposite. Can you transform CPS
code back into stack-eating code to make it run faster? Maybe resort to CPS
when you're running low on stack?

--
Dr Jon D Harrop, Flying Frog Consultancy
The F#.NET Journal
http://www.ffconsultancy.com/products/fsharp_journal/?usenet

Marcin 'Qrczak' Kowalczyk

unread,

May 5, 2007, 7:44:32 AM5/5/07

to

Dnia 05-05-2007, sob o godzinie 02:21 +0100, Jon Harrop napisał(a):

> I'm wondering if anyone has tried to do the opposite. Can you transform CPS
> code back into stack-eating code to make it run faster?

It would require whole program analysis to be sure that the
continuations are only entered once, and in the reverse order
of their creation. I doubt it can be practical.

> Maybe resort to CPS when you're running low on stack?

The stack doesn't have to be limited to a fixed size. Some compilers
reallocate it on overflow.

--
__("< Marcin Kowalczyk
\__/ qrc...@knm.org.pl
^^ http://qrnik.knm.org.pl/~qrczak/

Neelakantan Krishnaswami

unread,

May 7, 2007, 3:40:09 PM5/7/07

to

In article <463bdd7f$0$8758$ed26...@ptn-nntp-reader02.plus.net>, Jon Harrop
wrote:

>
> Compilers like SML/NJ transform function calls into CPS. This has
> the advantage of avoiding stack overflows but incurs a performance
> cost.
>
> I'm wondering if anyone has tried to do the opposite. Can you
> transform CPS code back into stack-eating code to make it run
> faster? Maybe resort to CPS when you're running low on stack?

CPS, as an intermediate representation, simply makes the control
context apparent in the source code. This does require you to have
have a heap-allocated continuation in your compiled code. For example,
the Moby compiler (whose runtime features a stack) will convert part
of a program into CPS in order to efficiently compile nested
recursions into nested loops.

But to answer your question, one cute way of representing a CPS-style
IR is to put it in a monadic style:

cps(x) == return x

cps(\x.e) == return (\x. cps(e))

cps(e e') == letv f = cps(e) in
letv v = cps(e') in
f(v)

As long as return and the monadic bind letv are interpreted in the
continuation monad, then this code is in CPS style. If you you never
use the explicit control context -- ie, you don't use call/cc -- you
can convert back to direct-style just by changing the monad you
interpret this program in from the continuation monad to the identity
monad.

--
Neel R. Krishnaswami
ne...@cs.cmu.edu

Jon Harrop

unread,

May 7, 2007, 8:55:01 PM5/7/07

to

Neelakantan Krishnaswami wrote:
> CPS, as an intermediate representation, simply makes the control
> context apparent in the source code. This does require you to have
> have a heap-allocated continuation in your compiled code. For example,
> the Moby compiler (whose runtime features a stack) will convert part
> of a program into CPS in order to efficiently compile nested
> recursions into nested loops.

Very interesting. Thanks.

Torben Ægidius Mogensen

unread,

May 8, 2007, 5:03:29 AM5/8/07

to

Jon Harrop <j...@ffconsultancy.com> writes:

> Compilers like SML/NJ transform function calls into CPS. This has the
> advantage of avoiding stack overflows but incurs a performance cost.
>
> I'm wondering if anyone has tried to do the opposite. Can you transform CPS
> code back into stack-eating code to make it run faster? Maybe resort to CPS
> when you're running low on stack?

There are two answers to this:

1. You can run the CPS program through an analysis that detects if
closures can be stack allocated. If you don't use control
operators etc., the continuations all can.

2. Olivier Danvy has written papers about converting CPS code into
direct-style code: http://citeseer.ist.psu.edu/danvy92back.html,
http://portal.acm.org/citation.cfm?id=141564 .

Torben

Ben Rudiak-Gould

unread,

May 10, 2007, 12:21:14 PM5/10/07

to

Jon Harrop wrote:
> Compilers like SML/NJ transform function calls into CPS. This has the
> advantage of avoiding stack overflows but incurs a performance cost.

I've never heard a lack of stack space mentioned as a reason for using CPS.
If that's a problem then you can just allocate a bigger stack. On the other
hand, putting your stack frames on the heap does make it a lot easier to
write a portable garbage collector, and I think many language implementors
do it for that reason. There's also the "Cheney on the M.T.A." technique,
which is an interesting hybrid.

-- Ben

Andrew Reilly

unread,

May 10, 2007, 9:02:38 PM5/10/07

to

On Thu, 10 May 2007 17:21:14 +0100, Ben Rudiak-Gould wrote:
> I've never heard a lack of stack space mentioned as a reason for using CPS.
> If that's a problem then you can just allocate a bigger stack.

It's been mentioned in the context of Java VMs for the case where you want
to support gazillions of threads (each with their own "stack") on a 32-bit
machine: there is conflict between running out of virtual memory or
unnecessarily constraining the per-thread stack size. CPS allows you to
smoosh all of the threads together into the one space, so that virtual
memory use tracks physical memory use more closely.

Of course, going to a 64-bit address space is also a way to avoid that
problem, and allows the continued use of the fairly efficient stack
structure. [As is a multi-process, rather than multi-thread, model.]

Cheers,

--
Andrew

Torben Ægidius Mogensen

unread,

May 11, 2007, 3:41:32 AM5/11/07

to

Andrew Reilly <andrew-...@areilly.bpc-users.org> writes:

> On Thu, 10 May 2007 17:21:14 +0100, Ben Rudiak-Gould wrote:
>> I've never heard a lack of stack space mentioned as a reason for using CPS.
>> If that's a problem then you can just allocate a bigger stack.
>
> It's been mentioned in the context of Java VMs for the case where you want
> to support gazillions of threads (each with their own "stack") on a 32-bit
> machine: there is conflict between running out of virtual memory or
> unnecessarily constraining the per-thread stack size. CPS allows you to
> smoosh all of the threads together into the one space, so that virtual
> memory use tracks physical memory use more closely.

That is not really an argument for CPS, but for using linked
heap-allocated frames as an implementation of the stack. CPS
conversion is one way of achieving this, but on JVM you need a bit
more as tailcalls (essential for CPS) are not directly supported.

Torben