Efficient implementation of delimited control

264 views
Skip to first unread message

Alexis King

unread,
Nov 30, 2019, 11:21:28 PM11/30/19
to Matthew Flatt, Racket Developers
If I may ask one more question of you, Matthew: following up from the thread on `dynamic-wind`, I would love to pick your brain on what you think are the state of the art implementation techniques for delimited continuations. When I went looking for resources, I mostly found two papers:

  1. “A Monadic Framework for Delimited Continuations” by Dvbvig, Peyton Jones, and Sabry (JFP 2007)

  2. “Adding Delimited and Composable Control to a Production Programming Environment” by Flatt, Yu, Findler, and Felleisen (ICFP 2007)

Both these papers are extremely helpful, but they’re also both 12 years old. Moreover, I remember reading somewhere (perhaps from Sam?) that the 3m implementation of continuations is relatively slow compared to the implementation by Chez Scheme. Especially given your recent work on Racket on Chez, I imagine it’s something you’ve thought about a lot, so I’m curious if you have any pointers about what you consider to be the most desirable implementation approach.

Naïvely, it seems hard to get away from the need to maintain an annotated stack of frames, grouped into chunks separated by prompts, but I wonder if there are any tricks or optimizations that seem to win big in practice. Likewise, it would be helpful to know if any optimizations ended up not panning out. My context here is a project to implement efficient delimited control in Haskell, so in practice I might not have a ton of control at the lowest level, but that’s okay—I’m still interested in understanding the low-level details in case they might inform a change that could be made to GHC itself.

I’m sure a fully detailed explanation would take more time than I can reasonably ask you to give, so some high-level comments and pointers to other resources is more than fine. :) I’m mostly just trying to get a handle on where to start—I want to avoid choosing a path that leads to a dead end.

Thanks,
Alexis

Matthew Flatt

unread,
Dec 1, 2019, 9:47:39 AM12/1/19
to Alexis King, Racket Developers
Pycket has the fastest implementation of delimited control that I know.
Pycket's continuations are fast because it uses heap-based continuation
frames. SML/NJ continuations are fast for the same reason. And Pycket
gets all of Racket's control operators, because it directly implements
the ICFP'07 model.

As long as your program is just manipulating continuations, then
heap-based frames are the way to go. But if your programs spends a lot
of time in more conventional control patterns, then further work is
needed to make heap-based frames perform well overall. Pycket relies on
a tracing JIT to make its heap-based on model run fast overall, while
SML/NJ relies on compile-time analysis.


For an implementation that relies on a representation choice instead of
tracing or analysis, Racket CS's implementation of delimited control is
the state of the art --- mostly by building on Chez Scheme's
implementation of continuations.

For details, you'll want to go back to

"Representing Control in the Presence of First-Class Continuations"
Hieb, Dybvig, and Bruggeman (PLDI 1990)

The Rumble layer on top of that for delimiting uses metacontinuations.
It's probably about the same as implementations in

"Final Shift for call/cc: a Direct Implementation of Shift and Reset"
Gasbichler and Sperber (ICFP 2002)

but I haven't yet gone back to check or to see how that compares to the
approach in "A Mondaic Framework ...". Rumble's tail correction to
metacontinuations is probably essentially the same as in "A Mondaic
Framework ...". In any case, see "racket/src/cs/rumble/control.ss".

The Racket CS implementation of continuation marks benefits from direct
support in Chez Scheme for "continuation attachments". Attachments are
like marks, but with just one implicit key, so a key+value layer is
implemented in Racket CS by installing a dictionary as the single
attachment value. The implementation of attachments in Chez Scheme
exploits the representation of continuations to make mark operations
inexpensive. I'll send you more details separately.


For a monadic implementation, "A Mondaic Framework ..." is
state-of-the-art as far as I know. See also Oleg's variant
(http://okmij.org/ftp/continuations/implementations.html). I don't know
what it would meant to implement delimited control more primitively in
a lazy language.
> --
> You received this message because you are subscribed to the Google Groups
> "Racket Developers" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to racket-dev+...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/racket-dev/7559EBB2-4B34-49D1-921A-21C0D93B3E
> BE%40gmail.com.

Matthias Felleisen

unread,
Dec 1, 2019, 10:30:48 AM12/1/19
to Matthew Flatt, Alexis King, Racket Developers

On Dec 1, 2019, at 9:47 AM, Matthew Flatt <mfl...@cs.utah.edu> wrote:

Pycket has the fastest implementation of delimited control that I know.
Pycket's continuations are fast because it uses heap-based continuation
frames. SML/NJ continuations are fast for the same reason. And Pycket
gets all of Racket's control operators, because it directly implements
the ICFP'07 model.

As long as your program is just manipulating continuations, then
heap-based frames are the way to go. But if your programs spends a lot
of time in more conventional control patterns, then further work is
needed to make heap-based frames perform well overall. Pycket relies on
a tracing JIT to make its heap-based on model run fast overall, while
SML/NJ relies on compile-time analysis.


The speed of continuation operations in SML/NJ (and probably Pycket)
is, in my mind, vastly overrated. It neglects the entire calculation of 
heap allocation, gc, and other global effects on time that just aren’t measured. 



For an implementation that relies on a representation choice instead of
tracing or analysis, Racket CS's implementation of delimited control is
the state of the art --- mostly by building on Chez Scheme's
implementation of continuations.


A true comparison would start with a complete reimplementation of 
SML using “stacks" and Hieb et al.’s "lazy continuations”. That way 
you’d get the type information, static analysis, and indefinite recursion. 

(I do wonder now whether MLton has call/cc and friends. It’s the 
closest I can think of.) 

;; - - - 

Having said that, I still do wonder whether call/cc per se is needed (see
note cited by Alexis) and whether we wouldn’t be better off thinking thru
F, Dorai’s #, and a dynamic-wind appropriate for that setting. Even in our
famous examples (web browser, time pre-emption) I have yet to see an 
undelimited use of continuations. 

Mach 3 (Draves, CMU and Microsoft) demonstrated that OSes definitely 
get all of what they need from delimited continuations. (Indeed, they 
initially thought that they invented delimited continuations but I had imagined
them myself in Dan’s 1984 “511” implementing numerous small kernel “thingies.”)


— Matthias




Faré

unread,
Dec 1, 2019, 2:16:17 PM12/1/19
to Matthias Felleisen, Matthew Flatt, Alexis King, Racket Developers
Without resorting to static analysis, wouldn't a combination of a
stack-as-nursery (e.g. in the style of Gambit Scheme) and one-bit
reference counting (does any implementation use that these days?) help
a lot with the performance of mixing delimited control with more
linear control flows?

—♯ƒ • François-René ÐVB Rideau •Reflection&Cybernethics• http://fare.tunes.org
A student who changes the course of history is probably taking an exam.
> --
> You received this message because you are subscribed to the Google Groups "Racket Developers" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to racket-dev+...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/racket-dev/348CBCC5-9C49-4186-99B4-FE13A79FACC6%40felleisen.org.

Matthew Flatt

unread,
Dec 3, 2019, 2:01:03 PM12/3/19
to Racket Developers
Here are some extra notes on the papers I mentioned before, now that
I've re-read them.

Gasbichler and Sperber:

The implementation of delimited continuations in Racket CS is about as
much like the metacontinuation approach discussed in the beginning as
it's like the implementation described in the rest of the paper:

* Creating a prompt pushes onto the metacontinuation, just like
`*reset` in the paper. But the new current continuation in Racket CS
is immediately truncated after `k` is captured. That avoids the
space problem that the paper mentions.

* Then again, with multiple prompts, the metacontinuation stack looks
a lot like the continuation stack in the paper. Capturing a
continuation copies as many metacontinuation frames as needed to
reach the relevant prompt (i.e., for a specific prompt tag).

Section 7.2 of the paper looks right, except that Racket CS terminates
the continuation as soon as it is created by a prompt, not when the
continuation is later captured. To truncate a continuation, Racket CS's
implementation uses

(#%$current-stack-link #%$null-continuation)

to truncate the current continuation at the point where it's known to
be just one frame. See below for some thoughts on extended Chez Scheme
with the C operator, instead.

There's room for a better representation of metacontinuations to avoid
the copy operation on continuation capture. Some sort of tree structure
may be the right idea. A better data structure would matter if there
are prompts with enough different prompt tags in the current
continuation. That doesn't often happen, but it could happen with
enough uses of `call/ec`, because each escape continuation is
implemented as a prompt with a fresh tag.


Dybvig, Peyton Jones, and Sabry:

Section 3.2 says that the metacontinuation approach tends toward +F+. I
think it tends toward +F-; maybe it's a question of putting the prompt
on one side or the other of a link in the metacontinuation chain, and
section 3.3 resolve the issue by putting the prompt as its own element
of the metacontinuation-style list. Meanwhile, the primitives exposed
to Racket provide -F- (effectively) by including prompt handlers;
that's implemented with a little trampoline on the outside of each
prompt --- which, ok, is a little more complexity, but it works out
naturally it you're going to have prompt handlers.

The Racket CS implementation of delimited control is very much like the
Scheme implementation in this paper. Racket's implementation just adds
a more to deal with prompt handler, continuation marks, and winders.
The paper's implementation sets up an `abort` continuation that expects
a thunk, instead of explicitly truncating a continuation; explicitly
truncating should be a little faster (because it avoids another closure
creation, at least).

Section 7.5, about the Haskell implementation, points out that the C
operator is more useful here than `call/cc`. That avoids the need to
set up an `abort` continuation, and it could replace an explicit
truncation operation. Then again, `call/cc` is still handy for various
purposes in Racket CS's Rumble layer to reflect on the continuation
without aborting.

Matthias Felleisen

unread,
Dec 3, 2019, 7:42:12 PM12/3/19
to Matthew Flatt, Racket Developers


On Dec 3, 2019, at 2:01 PM, Matthew Flatt <mfl...@cs.utah.edu> wrote:

Section 7.5, about the Haskell implementation, points out that the C
operator is more useful here than `call/cc`. That avoids the need to
set up an `abort` continuation, and it could replace an explicit
truncation operation. Then again, `call/cc` is still handy for various
purposes in Racket CS's Rumble layer to reflect on the continuation
without aborting.


Yes, I meant C not F. (Copying/appending the stacks is too expensive in most cases.) 

Alexis King

unread,
Dec 6, 2019, 6:53:55 PM12/6/19
to Matthew Flatt, Matthias Felleisen, Racket Developers
Apologies for the delayed response to this—I had a busy week, then got sick at the end of it—but this is all incredibly helpful. There’s a lot of reading I still need to do to catch up on all of this, but that’s exactly what I was hoping for: there’s a lot of useful information here I hadn’t found on my own. Thank you very much for all of it!

I’ll follow up here if I have any questions, but for now, I think I just need to take some time to digest everything.

Thanks,
Alexis
Reply all
Reply to author
Forward
0 new messages