
the evil of continuations


francois Morvillier

Jun 16, 2000
In a paper I've read on the future of LISP (forgot the author and the
whole title but it ends with "how to win big"), the author mentions the
fact that Scheme would be a good basis for a standardisation of LISP if
it weren't for its continuations (call-with-current-continuation ...).
Could someone please elaborate on this?

Also I'd like to know what's the general opinion on continuations - do
people actually understand them, use them, like them?

Cheers.

François.


Joe Marshall

Jun 16, 2000
francois Morvillier <francois....@eurocontrol.fr> writes:

> In a paper I've read on the future of LISP (forgot the author and the
> whole title but it ends with "how to win big"),

Richard Gabriel wrote it. The title is
`Lisp: Good News, Bad News, How to Win Big'.
The URL is http://www.ai.mit.edu/docs/articles/good-news/good-news.html

> the author mentions the
> fact that Scheme would be a good basis for a standardisation of LISP if
> it weren't for its continuations (call-with-current-continuation ...).
> Could someone please elaborate on this?

The typical uses of first-class continuations are in
exception handling, co-routining, and in creating new control-flow
primitives. Common Lisp provides mechanisms for all of these, and
those mechanisms have much less overhead than first-class
continuations.
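
A concrete illustration of the exit-only (escape) case, the one that
catch/throw also covers: the sketch below is in Python, which has no
call/cc, so an exception stands in for the captured continuation. The
names call_with_escape and product are hypothetical, chosen for the
example only.

```python
# Hedged sketch: an exit-only "escape continuation" built on exceptions.
# Invoking the escape procedure abandons the rest of the computation and
# returns its argument from call_with_escape, like a one-shot,
# outward-only call/cc (or Common Lisp's catch/throw).

class _Escape(Exception):
    def __init__(self, payload):
        super().__init__(payload)
        self.payload = payload

def call_with_escape(f):
    """Call f with a one-shot escape procedure; calling it aborts f."""
    token = object()  # ties each escape to its own capture point
    def escape(value):
        raise _Escape((token, value))
    try:
        return f(escape)
    except _Escape as e:
        tok, value = e.payload
        if tok is token:
            return value
        raise  # belongs to an enclosing call_with_escape

def product(xs):
    """Multiply xs, exiting non-locally as soon as a zero is seen."""
    def body(escape):
        acc = 1
        for x in xs:
            if x == 0:
                escape(0)  # skip all pending multiplications
            acc *= x
        return acc
    return call_with_escape(body)
```

Note that this models only the unwinding half of call/cc; re-entering
the captured context, the part that makes full first-class
continuations expensive, has no cheap counterpart here.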

First class continuations make compilation harder: when any
computation may be restarted because a continuation might be captured
in a lower layer, the compiler has to be very conservative about
allocating mutable structures.

First class continuations are confusing as well --- apparently much
more so than catch/throw, error-handlers, co-routines, etc.

In Gabriel's opinion, first-class continuations are more trouble than
they are worth.

> Also I'd like to know what's the general opinion on continuations - do
> people actually understand them, use them, like them?

A lot of people understand them, they are not used *that* frequently,
except in Scheme system code. If you have them, they can be pretty
handy, but I don't miss them when I'm hacking Common Lisp.


Barry Margolin

Jun 16, 2000
In article <3dmdy7...@alum.mit.edu>,
Joe Marshall <jmar...@alum.mit.edu> wrote:

>francois Morvillier <francois....@eurocontrol.fr> writes:
>> Also I'd like to know what's the general opinion on continuations - do
>> people actually understand them, use them, like them?
>
>A lot of people understand them, they are not used *that* frequently,
>except in Scheme system code. If you have them, they can be pretty
>handy, but I don't miss them when I'm hacking Common Lisp.

It should be noted that Scheme was designed mainly for pedagogical
purposes, particularly for research into language design, whereas Common
Lisp was intended to be a practical language for complex application
development. Thus, Scheme provides first-class continuations because
they're a useful substrate upon which just about any control structure can
be implemented. But most programmers aren't doing research about control
structures, so they have little need to use them directly; they're more
likely to use macros that have been implemented using them. For instance,
I expect there are Scheme exception handling macros, and they would
naturally expand into call/cc uses. It's no more necessary for most
programmers to understand continuations than to understand how
transistors work (in both cases, understanding them can be helpful, but
you can get by without it).

--
Barry Margolin, bar...@genuity.net
Genuity, Burlington, MA
*** DON'T SEND TECHNICAL QUESTIONS DIRECTLY TO ME, post them to newsgroups.
Please DON'T copy followups to me -- I'll assume it wasn't posted to the group.

Thant Tessman

Jun 16, 2000

Barry Margolin wrote:

> It should be noted that Scheme was designed mainly for pedagogical
> purposes, particularly for research into language design, whereas Common
> Lisp was intended to be a practical language for complex application
> development. Thus, Scheme provides first-class continuations because
> they're a useful substrate upon which just about any control structure can
> be implemented. [...]

The question remains: What is it about continuations that makes a
language (or its implementation) inappropriate for "real work"? What
about continuations in SML/NJ?

-thant

Joe Marshall

Jun 16, 2000
Thant Tessman <th...@acm.org> writes:

> The question remains: What is it about continuations that makes a
> language (or its implementation) inappropriate for "real work"?

Nothing, per se. It would just be more difficult to implement, and it
would be difficult to justify adding the feature to any language where
adequate alternatives exist (like CL).

> What about continuations in SML/NJ?

Appel argues that heap-allocation of continuation frames is cheap
enough to be considered an alternative, but Jim Miller and Bill Rozas
showed that a stack implementation can outperform a heap.

If your substrate gives you heap-allocated frames, then it makes
sense to create a call-with-current-continuation construct and use it
to implement your various control structures. But I think it would be
wrong to *require* call-with-current-continuation as part of a Common
Lisp substrate because the upper levels don't need it.

Thant Tessman

Jun 16, 2000

Joe Marshall wrote:
>
> Thant Tessman <th...@acm.org> writes:
>
> > The question remains: What is it about continuations that makes a
> > language (or its implementation) inappropriate for "real work"?
>
> Nothing, per se. It would just be more difficult to implement, and it
> would be difficult to justify adding the feature to any language where
> adequate alternatives exist (like CL).

I'm not familiar with Common Lisp. Is it that CL has its own threading
mechanism? (I'm not arguing that call/cc should be added to CL. I'm way
too ignorant of CL to do anything like that. It's just that call/cc is an
amazingly powerful thing, and I am curious why it's not as celebrated a
language feature as I would think it should be.)

>
> > What about continuations in SML/NJ?
>
> Appel argues that heap-allocation of continuation frames is cheap
> enough to be considered an alternative, but Jim Miller and Bill Rozas
> showed that a stack implementation can outperform a heap.

I thought there were newer techniques that allow stack-based
implementations to still do call/cc efficiently. (Or is that what you're
talking about?)

[...]

-thant

Barry Margolin

Jun 16, 2000
In article <394A7C71...@acm.org>, Thant Tessman <th...@acm.org> wrote:
>
>
>Barry Margolin wrote:
>
>> It should be noted that Scheme was designed mainly for pedagogical
>> purposes, particularly for research into language design, whereas Common
>> Lisp was intended to be a practical language for complex application
>> development. Thus, Scheme provides first-class continuations because
>> they're a useful substrate upon which just about any control structure can
>> be implemented. [...]
>
>The question remains: What is it about continuations that makes a
>language (or its implementation) inappropriate for "real work"? What
>about continuations in SML/NJ?

You've got the implication backwards. It's not 'continuations -> not
real', it's 'not real -> continuations'. In particular, it's the lack of
high-level constructs that makes a language like Scheme more difficult to
use for real work, because you have to find or create them when you need
them.

Basically, I consider Scheme a lower-level member of the Lisp family than
Common Lisp. Continuations are a low-level facility, not a high-level
facility.

Barry Margolin

Jun 16, 2000
In article <394A860B...@acm.org>, Thant Tessman <th...@acm.org> wrote:
>I'm not familiar with Common Lisp. Is it that CL has its own threading
>mechanism? (I'm not arguing that call/cc should be added to CL. I'm way
>too ignorant of CL to do anything like that. It's just that call/cc is an
>amazingly powerful thing, and I am curious why it's not as celebrated a
>language feature as I would think it should be.)

CL doesn't have threading in the standard, although most implementations
have it as an extension. The same is true for just about every popular HLL
except for Java and Ada, I think.

But it has exceptions and various types of non-local transfers (both
dynamic and lexical), which are often what continuations are used to
implement. CL doesn't support everything that Scheme continuations allow;
specifically, CL environments capture name bindings, but not control flow,
so you can't re-enter a block that has been exited (this avoids the problem
of spaghetti stacks, but it's also what requires threading to be an
extension).
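
That distinction can be made concrete in a hedged Python sketch
(illustrative names only): a closure keeps a function's *bindings*
alive after the call returns, but nothing lets you resume the exited
call's control point; re-entering it would take a first-class
continuation.

```python
# Sketch: the environment (the binding of n) outlives the activation of
# make_counter, but the activation's control flow does not -- once
# make_counter returns, that call cannot be re-entered; only its
# captured bindings remain reachable through the closure.

def make_counter():
    n = 0                  # binding captured by the closure below
    def bump():
        nonlocal n         # still the *same* binding, after the exit
        n += 1
        return n
    return bump            # the activation ends here, for good

counter = make_counter()   # each call creates a fresh environment
```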

Joe Marshall

Jun 16, 2000
Thant Tessman <th...@acm.org> writes:

> I'm not familiar with Common Lisp. Is it that CL has its own threading
> mechanism?

Most implementations of Common Lisp provide some multi-threading
capability. There is no `standard' (although most of the
multi-threading implementations seem to be based on `stack groups').

> (I'm not arguing that call/cc should be added to CL. I'm way
> too ignorant of CL to do anything like that. It's just that call/cc is an
> amazingly powerful thing, and I am curious why it's not as celebrated a
> language feature as I would think it should be.)

Some people don't like it. Go figure.

> I thought there were newer techniques that allow stack-based
> implementations to still do call/cc efficiently. (Or is that what you're
> talking about?)

They are still much hairier than implementations that only know how to
unwind the stack. They also preclude allocating mutable structures
and variables on the stack.

vsync

Jun 16, 2000
Joe Marshall <jmar...@alum.mit.edu> writes:

> First class continuations make compilation harder: when any
> computation may be restarted because a continuation might be captured

Hm. That bit made me think of Prolog for a second. I'm a CL fan
myself, and don't have much experience in Scheme, but this intrigues
me a bit. Are continuations more like the backtracking and things
that go on in Prolog, or like exceptions in C++ and Java?

--
vsync
http://quadium.net/ - last updated Fri Jun 16 00:38:12 MDT 2000
Orjner.

at ncal point verio point com x@x.x tom

Jun 16, 2000
Thant Tessman <th...@acm.org> writes:
> The question remains: What is it about continuations that makes a
> language (or its implementation) inappropriate for "real work"? What
> about continuations in SML/NJ?

Well, it's not the presence of continuations as much as the absence of
exceptions and threads (SML/NJ has both). Standardizing exceptions in
particular is important because it means that code that different
people write has a standard way of communicating error conditions. If
everybody does their own exception package, I don't know how to even
detect that something went wrong in your library.

Tom.

felix

Jun 17, 2000

Joe Marshall wrote in message <3dmdy7...@alum.mit.edu>...

>The typical uses of first-class continuations are in
>exception-handling, co-routining, and in creating new control-flow
>primitives. Common Lisp provides mechanisms for all of these, and the
>Common Lisp implementation has much less overhead than first-class
>continuations.


This is not true. It just puts certain constraints on the way a language has
to be implemented.

>First class continuations make compilation harder: when any
>computation may be restarted because a continuation might be captured
>in a lower layer, the compiler has to be very conservative about
>allocating mutable structures.


This depends entirely on the design of the system. If you only see
the 'traditional' stack-based method of implementation, then, of course,
first-class continuations scare the hell out of you.

>First class continuations are confusing as well --- apparently much
>more so than catch/throw, error-handlers, co-routines, etc.


What is more confusing, CALL/CC or the combination of
CATCH/THROW/BLOCK/RETURN/RETURN-FROM/TAGBODY/GO/etc.?
(did I forget something? Oh yes: PROG!)

Again, in a stack-centric point of view, first-class continuations can be
confusing. But basically it's just that: save state in a function & GOTO.
What is confusing about that?


felix


felix

Jun 17, 2000

Joe Marshall wrote in message ...

>Appel argues that heap-allocation of continuation frames is cheap
>enough to be considered an alternative, but Jim Miller and Bill Rozas
>showed that a stack implementation can outperform a heap.


Under which circumstances? (we are talking *full* first-class continuations
here, with multiple returns)
Where can I find that information (paper)?


felix


Matthias Blume

Jun 17, 2000
Joe Marshall <jmar...@alum.mit.edu> writes:

> Thant Tessman <th...@acm.org> writes:
>
> > The question remains: What is it about continuations that makes a
> > language (or its implementation) inappropriate for "real work"?
>

> Nothing, per se. It would just be more difficult to implement, and it
> would be difficult to justify adding the feature to any language where
> adequate alternatives exist (like CL).
>

> > What about continuations in SML/NJ?
>

> Appel argues that heap-allocation of continuation frames is cheap
> enough to be considered an alternative, but Jim Miller and Bill Rozas
> showed that a stack implementation can outperform a heap.

Notice that the two statements (Appel's claim and Rozas' demonstration
(wasn't it actually Clinger et al.?)) are not necessarily contradictory.

Matthias

--
Matthias Blume <blume@k_u_r_i_m_s.k_y_o_t_o-u.a_c.j_p>
Kyoto University, Research Institute for Mathematical Sciences
(remove those spam-preventing underscores from mail address)

Tim Bradshaw

Jun 17, 2000
* Joe Marshall wrote:
> Thant Tessman <th...@acm.org> writes:
>> I'm not familiar with Common Lisp. Is it that CL has its own threading
>> mechanism?

> Most implementations of Common Lisp provide some multi-threading
> capability. There is no `standard' (although most of the
> multi-threading implementations seem to be based on `stack groups').

If you use call/cc (+ something else: timer interrupts?) to do
threading could such a system be persuaded to sit naturally on top of,
say, posix threads? By `naturally' I mean `without heroic effort' or
something.

(Not that the current CL thread systems do, I know)

--tim

felix

Jun 17, 2000

vsync wrote in message <87d7lhw...@quadium.net>...

>Hm. That bit made me think of Prolog for a second. I'm a CL fan
>myself, and don't have much experience in Scheme, but this intrigues
>me a bit. Are continuations more like the backtracking and things
>that go on in Prolog, or like exceptions in C++ and Java?


They are both. That's the beauty of it.


felix


Pierre R. Mai

Jun 17, 2000
vsync <vs...@quadium.net> writes:

> Joe Marshall <jmar...@alum.mit.edu> writes:
>
> > First class continuations make compilation harder: when any
> > computation may be restarted because a continuation might be captured
>

> Hm. That bit made me think of Prolog for a second. I'm a CL fan
> myself, and don't have much experience in Scheme, but this intrigues
> me a bit. Are continuations more like the backtracking and things
> that go on in Prolog, or like exceptions in C++ and Java?

Continuations can be used to implement both of them (and a number of
other things like multi-threading, basic control structures, etc.).
In effect, continuations take the control flow (and local state) of a
program at a certain point in time and make them available to the
program itself. In this way you can implement e.g. the cut operator
of Prolog, or catch and throw for exception handling, including
automatic restarts, or e.g. yield for multi-threading, etc.

Note that there are other mechanisms to implement any of the above.
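
The yield-for-multi-threading case can be sketched with Python
generators, which act as restricted one-shot continuations: each yield
suspends "the rest of the computation" so a scheduler can resume it
later. A hedged sketch (round_robin and worker are hypothetical names):

```python
from collections import deque

# Cooperative "threads" via generators: yield hands control back to the
# scheduler, which later resumes the suspended rest of the task.

def round_robin(tasks):
    """Run generator tasks cooperatively; collect what they yield."""
    queue = deque(tasks)
    trace = []
    while queue:
        task = queue.popleft()
        try:
            trace.append(next(task))  # resume until the next yield
            queue.append(task)        # requeue behind the others
        except StopIteration:
            pass                      # task finished; drop it
    return trace

def worker(name, steps):
    for i in range(steps):
        yield f"{name}:{i}"
```

Unlike full call/cc, a generator's suspension can be resumed only once
and only from outside, which is exactly why it is so much cheaper to
implement.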

Joe Marshall

Jun 17, 2000
"felix" <fe...@anu.ie> writes:

> Joe Marshall wrote in message <3dmdy7...@alum.mit.edu>...
>
> >The typical uses of first-class continuations are in
> >exception-handling, co-routining, and in creating new control-flow
> >primitives. Common Lisp provides mechanisms for all of these, and the
> >Common Lisp implementation has much less overhead than first-class
> >continuations.
>
> This is not true. It just puts certain constraints on the way a language has
> to be implemented.

Let's take the simple example of catch/throw. In common lisp, once
you throw to a tag, you cannot re-enter any part of the dynamic
context that was present during the throw. This means that catch need
do nothing more than push a marker on the stack, and throw can
`simply' unwind to that marker. This is quite low overhead.

With Call/cc, however, the dynamic state must be preserved because the
continuation may be re-invoked at any time. There are a number of
techniques to do this:
1. Copy the stack.
2. Put markers and machinery in place to detect escaping
continuations and copy the stack when necessary.
3. ``Phantom stacks'' allocated from the heap.
4. Allocate frames in the heap.

All of these work, but none has as low overhead as catch/throw.

> >First class continuations make compilation harder: when any
> >computation may be restarted because a continuation might be captured
> >in a lower layer, the compiler has to be very conservative about
> >allocating mutable structures.
>
> This depends entirely on the design of the system. If you only see
> the 'traditional' stack-based method of implementation, then, of course,
> first-class continuations scare the hell out of you.

Stack-based allocation is more efficient than heap-based (see MIT AI
memo 1462).

> >First class continuations are confusing as well --- apparently much
> >more so than catch/throw, error-handlers, co-routines, etc.
>
> What is more confusing, CALL/CC or the combination of
> CATCH/THROW/BLOCK/RETURN/RETURN-FROM/TAGBODY/GO/etc.?
> (did I forget something? Oh yes: PROG!)

Well, I suppose that call/cc is less confusing than the *combination*
of the above, but I rarely try to combine all of those. In my
experience, it is much harder to explain call/cc to a student than to
explain catch/throw or block/return/return-from or tagbody/go.

> Again, in a stack-centric point of view, first-class continuations can be
> confusing. But basically it's just that: save state in a function & GOTO
> What is confusing about that?

The act of doing so is simple, the consequences are much less so.
Things like unwind-protect become very difficult to reason about when
you can reenter the protected forms. Call/cc can expose the implicit
assignments in a LETREC form and thus turn purely functional code into
code with side effects.

It's simply not so cut-and-dried.

Joe Marshall

Jun 17, 2000
"felix" <fe...@anu.ie> writes:

> Joe Marshall wrote in message ...
>
> >Appel argues that heap-allocation of continuation frames is cheap
> >enough to be considered an alternative, but Jim Miller and Bill Rozas
> >showed that a stack implementation can outperform a heap.

> Under which circumstances?

Standard call and return.

> (we are talking *full* first-class continuations
> here, with multiple returns)

No. The point is that heap-allocation of control records (stack
frames) is more expensive than stack-allocation. So you are taking a
performance hit if you choose to heap-allocate your stack frames.

Remember that the bulk of code does *not* use first-class
continuations, but if you heap-allocate stack frames, all the code
pays the cost.

> Where can I find that information (paper)?

MIT AI memo 1462 (look in http://www.ai.mit.edu/)

Joe Marshall

Jun 17, 2000
Matthias Blume <s...@my.sig> writes:

> Joe Marshall <jmar...@alum.mit.edu> writes:
>
> > Appel argues that heap-allocation of continuation frames is cheap
> > enough to be considered an alternative, but Jim Miller and Bill Rozas
> > showed that a stack implementation can outperform a heap.
>

> Notice that the two statements (Appel's claim and Rozas' demonstration)
> are not necessarily contradictory.

Precisely so. In fact, Miller and Rozas' paper is titled
``Garbage Collection is Fast, but a Stack is Faster''. It is a question
of what is `fast enough'. Miller and Rozas conclude that stack
allocation ``remains an important implementation technique for
procedure calls''.

> (wasn't it actually Clinger et al.?)

James S. Miller and Guillermo J. Rozas
AI memo 1462, March 1994, 37 pages

Pierre R. Mai

Jun 17, 2000
Tim Bradshaw <t...@cley.com> writes:

> * Joe Marshall wrote:
> > Thant Tessman <th...@acm.org> writes:
> >> I'm not familiar with Common Lisp. Is it that CL has its own threading
> >> mechanism?
>
> > Most implementations of Common Lisp provide some multi-threading
> > capability. There is no `standard' (although most of the
> > multi-threading implementations seem to be based on `stack groups').
>
> If you use call/cc (+ something else: timer interrupts?) to do
> threading could such a system be persuaded to sit naturally on top of,
> say, posix threads? By `naturally' I mean `without heroic effort' or
> something.
>
> (Not that the current CL thread systems do, I know)

OTOH Douglas T. Crosher is still working on an experimental version of
CMUCL multi-threading which sits atop native threading mechanisms on
Linux (and FreeBSD). This project seems to be progressing nicely (see
current snapshots on http://www.cons.org/cmucl/ in the experimental
Category). So it seems that it is eminently doable, which suggests that
most users of the commercial systems don't see native
threading for Lisp stuff as a high priority item...

Regs, Pierre.

felix

Jun 18, 2000
Joe Marshall wrote in message ...

>Let's take the simple example of catch/throw. In common lisp, once
>you throw to a tag, you cannot re-enter any part of the dynamic
>context that was present during the throw. This means that catch need
>do nothing more than push a marker on the stack, and throw can
>`simply' unwind to that marker. This is quite low overhead.
>
>With Call/cc, however, the dynamic state must be preserved because the
>continuation may be re-invoked at any time. There are a number of
>techniques to do this:
> 1. Copy the stack.
> 2. Put markers and machinery in place to detect escaping
> continuations and copy the stack when necessary.
> 3. ``Phantom stacks'' allocated from the heap.
> 4. Allocate frames in the heap.


5. Use Hieb's/Dybvig's/Bruggeman's "stack segments"
6. Allocate on the stack, but (nearly) never pop (Baker's MTA)

>All of these work, but none has as low overhead as catch/throw.


Correct.
What I want to say is that, in certain implementation models, call/cc
(with all its implications) *can* be as fast as catch/throw (to name
one: Baker's "Cheney on the M.T.A.").


>Stack-based allocation is more efficient than heap-based (see MIT AI
>memo 1462).


Stack-based *allocation* is not more efficient than heap-based *allocation*!
You just increment a (stack- or heap-top) pointer, save it, and store
whatever you want to store in that new chunk of memory.
If you mean anything more than *allocation*, then please be more accurate.

>It's simply not so cut-and-dried.

Exactly!


felix


Boris Schaefer

Jun 18, 2000
Thant Tessman <th...@acm.org> writes:

| Barry Margolin wrote:
|
| > It should be noted that Scheme was designed mainly for pedagogical
| > purposes, particularly for research into language design, whereas Common
| > Lisp was intended to be a practical language for complex application
| > development. Thus, Scheme provides first-class continuations because
| > they're a useful substrate upon which just about any control structure can
| > be implemented. [...]
|

| The question remains: What is it about continuations that makes a
| language (or its implementation) inappropriate for "real work"? What
| about continuations in SML/NJ?

It's the same reason that Java doesn't have "goto" and full MI
(multiple inheritance). It was thought that they could and would be
abused too much. (Personally, I don't care about "goto", but I think
they made a mistake in not having MI.)

--
Boris Schaefer <bo...@uncommon-sense.net>

If you don't drink it, someone else will.

Joe Marshall

Jun 18, 2000
"felix" <fe...@anu.ie> writes:

> Joe Marshall wrote in message ...
>
> >Let's take the simple example of catch/throw. In common lisp, once
> >you throw to a tag, you cannot re-enter any part of the dynamic
> >context that was present during the throw. This means that catch need
> >do nothing more than push a marker on the stack, and throw can
> >`simply' unwind to that marker. This is quite low overhead.
> >
> >With Call/cc, however, the dynamic state must be preserved because the
> >continuation may be re-invoked at any time. There are a number of
> >techniques to do this:
> > 1. Copy the stack.
> > 2. Put markers and machinery in place to detect escaping
> > continuations and copy the stack when necessary.
> > 3. ``Phantom stacks'' allocated from the heap.
> > 4. Allocate frames in the heap.
>
>
> 5. Use Hieb's/Dybvig's/Bruggeman's "stack segments"
> 6. Allocate on the stack, but (nearly) never pop (Baker's MTA)

This is equivalent to number 4. The stack serves as the most volatile
segment of the heap.

> >All of these work, but none has as low overhead as catch/throw.
>
> Correct.
> What I want to say is that, in certain implementation models, call/cc
> (with all its implications) *can* be as fast as catch/throw (to name
> it: Baker's "Cheney on the M.T.A").

Yes, but only if you slow down general procedure calls. Baker's
Cheney on the M.T.A. gives you a catch/throw that is equivalent in
performance to call/cc because you are heap-allocating all the
frames. However, the performance of a general procedure call suffers
for it.

> >Stack-based allocation is more efficient than heap-based (see MIT AI
> >memo 1462).
>
> Stack-based *allocation* is not more efficient than heap-based *allocation*!

Is too!

> You just increment a (stack- or heap-top-) pointer, save it and
> store whatever you want to store in that new chunk of memory. If
> you mean anything more then *allocation* than please be more
> accurate.

If you are going to be pedantic about it, you probably want me to be
more `precise' than `accurate'. But I'll rephrase the statement:

Stack-based management of continuation frames is more efficient than
heap-based management of continuation frames.

I presume that in both stack and heap implementations the
*intent* is to allow the memory to be recycled. If this is not the
case, then yes, they have the exact same cost. However, if you *do*
intend to recycle the memory, the costs are different, and, in
particular, more expensive in the heap-based version. This is for two
reasons that Miller and Rozas identify:


``1. The size of continuation frames must be larger if allocated on
the heap in order to accommodate a pointer to the previous
frame. When frames are on the stack, the previous frame
pointer can be calculated using address arithmetic on the
current frame pointer.

2. Because continuation frames form a singly linked structure
when allocated on the heap, the maintenance of the link
information requires instructions. In addition, since these
instructions reference memory, they are relatively expensive
on current machines.''

In other words, when you allocate frames on the heap, you must, as
part of the allocation process, put a pointer to the previous frame
into the newly allocated frame if you intend to ever recycle this
frame.

The interesting thing is that the cost of setting up and using the
heap-based frame exceeds the cost of setting up, using, and
*deallocating* the stack-based frame. Thus ``even if
garbage-collection never occurs or costs nothing, the cost of heap
allocation exceeds the cost of stack allocation.''
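
The two disciplines can be mimicked in a few lines. A hedged Python
sketch (stack_push, heap_push, etc. are invented names): the stack
recovers "previous frame" by arithmetic on a position, while each heap
frame must spend an extra slot, and an extra write, on an explicit
link.

```python
# Sketch of the Miller/Rozas point: heap frames carry a link to the
# previous frame (a bigger frame and an extra memory write); stack
# frames get "previous" for free via address arithmetic (here, a list
# index plays the role of an address).

def stack_push(stack, data):
    stack.append(data)         # no link stored
    return len(stack) - 1      # the frame's "address"

def stack_prev(index):
    return index - 1           # pure arithmetic, no memory read

def heap_push(top, data):
    return {"data": data, "prev": top}   # extra slot + extra write

def heap_prev(frame):
    return frame["prev"]       # a memory read on every return
```

Recycling is where they diverge further: popping the stack is pointer
arithmetic, while heap frames wait for the collector, and every call
has already paid for the link.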

felix

Jun 18, 2000

Joe Marshall wrote in message ...

>Yes, but only if you slow down general procedure calls. Baker's
>Cheney on the M.T.A. gives you a catch/throw that is equivalent in
>performance to call/cc because you are heap-allocating all the
>frames. However, the performance of a general procedure call suffers
>for it.

(You probably meant "...gives you call/cc that is equivalent in performance
to catch/throw because...".)

No. A procedure-call in Baker's approach is just as costly as a C
procedure call (which should be acceptable). With some dirty tricks
(certain compiler-specific function-attributes, etc.) one could tune even
that.

>Stack-based management of continuation frames is more efficient than
>heap-based management of continuation frames.


Nobody questions that. But Miller and Rozas's paper applies only to a
very small part of the picture: the cost of
procedure-call/continuation-frames *without* taking call/cc into account.

Yes, I'm pedantic. But you are generalizing things a little bit too much.
This thread was about call/cc and its costs, not about the costs of
stack-allocation vs. heap-allocation regarding just function calls.


felix


David Rush

Jun 19, 2000
Joe Marshall <jmar...@alum.mit.edu> writes:
> "felix" <fe...@anu.ie> writes:
> > Joe Marshall wrote in message <3dmdy7...@alum.mit.edu>...

> Stack-based allocation is more efficient than heap-based (see MIT AI
> memo 1462).

Methinks that you overstate the case. AI memo 1462 only claims a
constant-factor difference in theoretical efficiency. Its claims
about efficiency on real machines are more equivocal, but no different
from what most people know about the problems of locality in GC
systems.

Miller & Rozas also neglected the VAX architecture which had CALLG, a
single-instruction linked-stack procedure call. In short, if hardware
engineers thought that there was a buck to be made supporting
linked-stacks and GC environments, they'd build them.

david rush
--
I repeat myself when under stress. I repeat myself when under
stress. I repeat myself when under stress. I repeat myself when
under stress. I repeat myself when under stress. I repeat myself
when under stress. I repeat myself when under stress. I repeat

Tim Bradshaw

Jun 19, 2000
* David Rush wrote:

> Miller & Rozas also neglected the VAX architecture which had CALLG, a
> single-instruction linked-stack procedure call. In short, if hardware
> engineers thought that there was a buck to be made supporting
> linked-stacks and GC environments, they'd build them.

Was it Franz Lisp that outperformed C on the Vax by not using the
procedure-call instructions? Just because they provide them in
hardware doesn't mean they're fast.

--tim

Joe Marshall

Jun 19, 2000
David Rush <ku...@bellsouth.net> writes:

> Joe Marshall <jmar...@alum.mit.edu> writes:
> > "felix" <fe...@anu.ie> writes:
> > > Joe Marshall wrote in message <3dmdy7...@alum.mit.edu>...
> > Stack-based allocation is more efficient than heap-based (see MIT AI
> > memo 1462).
>
> Methinks that you overstate the case.

Quite possibly.

> AI memo 1462 claims only a constant-factor difference in theoretical
> efficiency. Its claims about efficiency on real machines are more
> equivocal, but no different from what most people know about the
> problems of locality in GC systems.

Not exactly. On page 2, the paper states: `Our argument depends
critically on the implementation of the stack, not merely on the
abstract stack datatype.' On page 3, they state: `We find the
conceptual analysis above to be interesting, but not fully
compelling.' And they measure the performance difference on the
Alpha processor, the HP PA, the 68000, and the '486.

> Miller & Rozas also neglected the VAX architecture which had CALLG, a
> single-instruction linked-stack procedure call.

True.

> In short, if hardware engineers thought that there was a buck to be
> made supporting linked-stacks and GC environments, they'd build
> them.

Quite definitely. I have always believed that hardware designed
specifically with Lisp (or Scheme) in mind will outperform `stock'
hardware hands down.

But to return to my original claims, I think that the cost of
supporting general call/cc on stock hardware is high enough that
Gabriel's objection to including it in a Lisp substrate has a
foundation. And while it is possible to make call/cc as cheap as
catch/throw, it cannot be done on stock hardware (80486) without
taking a performance hit on regular function calls.

Mind you, I *prefer* call/cc to all the other hacks/kludges
(catch/throw, block/return, tagbody/go, etc.), and consider it worth
the while to either use the `Cheney on the MTA' hack or to use a
traditional stack in the implementation. I just think that people who
object to either of these techniques have a valid point.


Shriram Krishnamurthi

Jun 19, 2000
David Rush <ku...@bellsouth.net> writes:

> Miller & Rozas also neglected the VAX architecture which had CALLG, a
> single-instruction linked-stack procedure call. In short, if hardware
> engineers thought that there was a buck to be made supporting
> linked-stacks and GC environments, they'd build them.

But CALLG was pretty heavy-weight; many compilers ignored it entirely,
even when it corresponded to their desired semantics. (Also, you may
not have wanted caller-saves.) The main benefit I saw to it was when
programming directly in assembler, when it was mighty convenient to
set things up just so, then invoke a single instruction. (Ditto for
the polynomial instruction.)

'shriram

Joe Marshall

Jun 19, 2000
"felix" <fe...@anu.ie> writes:

> Joe Marshall wrote in message ...
>
> >Yes, but only if you slow down general procedure calls. Baker's
> >Cheney on the M.T.A. gives you a catch/throw that is equivalent in
> >performance to call/cc because you are heap-allocating all the
> >frames. However, the performance of a general procedure call suffers
> >for it.
>
> (you probably meant "...gives you call/cc that is equivalent in performance
> to catch/throw because...)

Yes. Funny how mathematical equivalence is symmetric, but this isn't.

> No. A procedure-call in Baker's approach is just as costly as a C
> procedure call (which should be acceptable). With some dirty tricks
> (certain compiler-specific function-attributes, etc.) one could tune even
> that.

Yes, the procedure call itself is the same cost, but before you call a
subproblem in Baker's approach, you must allocate and initialize a
continuation. This is the extra cost that is unnecessary in the C
approach.

> >Stack-based management of continuation frames is more efficient than
> >heap-based management of continuation frames.
>
> Nobody questions that. But Miller's and Roza's paper applies only to a
> very small part of the picture: The cost of procedure-call/continuation-
> frames *without* taking call/cc into account.

I don't understand. The cost of procedure-call/continuation-frames
*without* taking cwcc into account surely can't be *higher* than
the cost *with* taking cwcc into account!

> Yes, I'm pedantic. But you are generalizing things a little bit too much.
> This thread was about call/cc and it's costs, not about the costs
> of stack-allocation vs. heap-allocation regarding just function-calls.

That is correct. It was my claim that the cost of cwcc is higher than
the cost of catch/throw. I defend this claim as follows:

1. If continuations are stack-managed, then the cost of cwcc is
higher than catch/throw because cwcc must copy any information
that is not used in a stack-like manner.

2. If continuations are heap-managed, although cwcc now appears to
cost the same as catch/throw, the actual cost has been moved
from the cwcc to the function-calling sequence, which slows down
all the code, regardless of whether it uses cwcc or not.

It is important to note that the presence of cwcc is what would
cause you to consider heap-allocation of the continuation.

Joe Marshall

Jun 19, 2000
"felix" <fe...@anu.ie> writes:

> I claim that the cost of call/cc can be considered acceptable, regarding
> its conceptual elegance and expressive power. And I also believe that
> there are certain heap-based implementation methods that *could* provide
> *nearly* comparable performance to stack-based systems.

I think we're in agreement.

As a side note, I implemented an interpreter for an obscure scripting
language a couple of years back that used Baker's `Cheney on the MTA'
trick. The intent wasn't to make cwcc work, however. This particular
language was unusual in that the EVAL procedure took two continuations
(the language was context-sensitive, so parsing and evaluation were
interleaved). In order to get tail recursion into the language, I had
to manually manipulate the continuations (they could not be mapped
onto the underlying C continuations easily).

Of course, since Baker's hack also gives you first-class
continuations, there was no reason to omit them from the interpreter.

Michael Sperber [Mr. Preprocessor]

Jun 19, 2000

>>>>> "Pierre" == Pierre R Mai <pm...@acm.org> writes:

Pierre> Tim Bradshaw <t...@cley.com> writes:

>> * Joe Marshall wrote:
>> > Thant Tessman <th...@acm.org> writes:
>> >> I'm not familiar with Common Lisp. Is it that CL has its own threading
>> >> mechanism?
>>
>> > Most implementations of Common Lisp provide some multi-threading
>> > capability. There is no `standard' (although most of the
>> > multi-threading implementations seem to be based on `stack groups').
>>
>> If you use call/cc (+ something else: timer interrupts?) to do
>> threading could such a system be persuaded to sit naturally on top of,
>> say, posix threads? By `naturally' I mean `without heroic effort' or
>> something.
>>
>> (Not that the current CL thread systems do, I know)

Pierre> OTOH Douglas T. Crosher is still working on an experimental version of
Pierre> CMUCL multi-threading which sits atop native threading mechanisms on
Pierre> Linux (and FreeBSD).

Unfortunately, OS threads are often *way* too heavyweight for the
really interesting applications of threads.

--
Cheers =8-} Mike
Friede, Völkerverständigung und überhaupt blabla

Michael Sperber [Mr. Preprocessor]

Jun 19, 2000
>>>>> "Thant" == Thant Tessman <th...@acm.org> writes:

Thant> I thought there were newer techniques that allow stack-based
Thant> implementations to still do call/cc efficiently. (Or is that what you're
Thant> talking about?)

They are not really new, but only recently the relevant publication
has become accessible:

William D. Clinger, Anne Hartheimer, Eric Ost: Implementation
Strategies for First-Class Continuations. Higher-Order and Symbolic
Computation 12(1): 7-45 (1999)

Makes for excellent reading!

felix

Jun 20, 2000

Joe Marshall wrote in message ...

>> No. A procedure-call in Baker's approach is just as costly as a C
>> procedure call (which should be acceptable). With some dirty tricks
>> (certain compiler-specific function-attributes, etc.) one could tune even
>> that.
>
>Yes, the procedure call itself is the same cost, but before you call a
>subproblem in Baker's approach, you must allocate and initialize a
>continuation. This is the extra cost that is unnecessary in the C
>approach.


Right. But in a traditional, stack-based model you also have to do extra
frame-creation/closure manipulation (for free-variable access, etc.) or
other procedure-entry processing. Furthermore, those continuation-
records don't have to be allocated on *every* function entry.

>> >Stack-based management of continuation frames is more efficient than
>> >heap-based management of continuation frames.
>>
>> Nobody questions that. But Miller and Rozas's paper applies only to a
>> very small part of the picture: the cost of procedure-call/continuation-
>> frames *without* taking call/cc into account.
>
>I don't understand. The cost of procedure-call/continuation-frames
>*without* taking cwcc into account surely can't be *higher* than
>the cost *with* taking cwcc into account!


Sorry, my English is a little awkward. What I mean is that the mentioned
paper favours stack-like implementation techniques. But the given
examples would not be able to handle first-class continuations properly, and
so would not be valid in a real implementation of Scheme (if I understand
those code examples correctly). What I want to say is that the paper basically
compares C-like procedure calls using stack frames with C-like procedure
calls using heap frames. So I consider it not relevant to this discussion.

>... It was my claim that the cost of cwcc is higher than
>the cost of catch/throw. I defend this claim as follows:
>
> 1. If continuations are stack-managed, then the cost of cwcc is
> higher than catch/throw because cwcc must copy any information
> that is not used in a stack-like manner.
>
> 2. If continuations are heap-managed, although cwcc now appears to
> cost the same as catch/throw, the actual cost has been moved
> from the cwcc to the function-calling sequence, which slows down
> all the code, regardless of whether it uses cwcc or not.
>
> It is important to note that the presence of cwcc is what would
> cause you to consider heap-allocation of the continuation.

You are perfectly right.

I claim that the cost of call/cc can be considered acceptable, regarding
its conceptual elegance and expressive power. And I also believe that
there are certain heap-based implementation methods that *could* provide
*nearly* comparable performance to stack-based systems.


felix


David Rush

Jun 20, 2000
Joe Marshall <jmar...@alum.mit.edu> writes:

> David Rush <ku...@bellsouth.net> writes:
> > AI memo 1462 claims only a constant-factor difference in theoretical
> > efficiency. Its claims about efficiency on real machines are more
> > equivocal, but no different from what most people know about the
> > problems of locality in GC systems.
>
> Not exactly. On page 2, the paper states: `Our argument depends
> critically on the implementation of the stack, not merely on the
> abstract stack datatype' On page 3, they state: `We find the
> conceptual analysis above to be interesting, but not fully
> compelling.' And they measure the performance difference on the
> Alpha processor, the HP PA, the 68000, and the '486.

And they did instruction counts based on a typical implementation and
found that stack costs were something like 11n vs. 13n for heap (where
n is the number of calls/returns). Caveat lector: I re-read the paper
yesterday, and pitched it - I'm not sure if the heap number was 13 or
15.

Yes, they did do specific measurements, and found the usual scalability
problems with the heap-based code. They partially characterized the
problem, but as is usual with analyses involving multiple levels of
cache, didn't provide a complete analysis of the behavior. I can't
say that I blame them; those interactions turn chaotic pretty
quickly.

Lest this discussion turn into a terminology debate, I will say that
Miller & Rozas provide an important reality check on Appel. They don't
invalidate his conclusion, but they make the limitations of the
'asymptotically zero-cost' GC result clear.

> Quite definitely. I have always believed that hardware designed
> specifically with Lisp (or Scheme) in mind will outperform `stock'
> hardware hands down.

Obviously, but the question that no-one has asked (AFAICT) is how do
Algol-family languages do on GC-friendly hardware? And even more, how
much easier and more efficient can multi-threading be in that
environment? I was first attracted to the VAX CALLG instruction by its
applicability to building 'cactus stacks' (as we called them at the
time) for implementing short-lived threads in event-driven network
drivers. It was only after I got into functional languages (I started
with SML 10 years after my VAX years), that I saw other applications
for it (CALLG).

> But to return to my original claims, I think that the cost of
> supporting general call/cc on stock hardware is high enough that
> Gabriel's objection to including it in a Lisp substrate has a
> foundation.

Fair enough, although Stalin is pretty impressive in the performance
stats and has full (IIRC) call/cc.

david rush
--
From the start...the flute has been associated with pure (some might
say impure) energy. Its sound releases something naturally untamed, as
if a squirrel were let loose in a church." --Seamus Heaney

Joe Marshall

Jun 20, 2000
David Rush <ku...@bellsouth.net> writes:

> And they did instruction counts based on a typical implementations and
> found that stack costs were something like 11n vs 13n for heap (where
> n is the number of call/returns). Caveat lector: I re-read the paper
> yesterday, and pitched it - I'm not sure if the heap number was 13 or
> 15.

Pretty good memory! They found that the computation of (factorial n)
ran in 11n + 2 instructions (on an Alpha) when stack allocation was
used, and in 13n + 2 instructions when heap allocation was used. They
concluded that the cost of creating n continuations on the stack was
3n (of which 2 are memory stores) and the cost of deleting n
continuations from the stack was n (with no memory references). The
cost of creating n continuations on the heap was 5n (with 3 memory
stores).

> Lest this discussion turn into a terminology debate, I will say that
> Miller & Rozas provide an important reality check on Appel. They don't
> invalidate his conclusion, but they make the limitations of the
> 'asymptotically zero-cost' GC result clear.

They agree with Appel in the fully general case, they just point out
that in this one specific case --- creating continuation frames on
hardware that has support for a stack --- you can do better than
general garbage collection.

> > Quite definitely. I have always believed that hardware designed
> > specifically with Lisp (or Scheme) in mind will outperform `stock'
> > hardware hands down.
>
> Obviously, but the question that no-one has asked (AFAICT) is how do
> Algol-family languages do on GC-friendly hardware?

What specifically do you mean by `Algol family' languages?

> And even more, how much easier and more efficient can
> multi-threading be in that environment?

Depends. With the Lisp Machine, the more hardware that we threw at
making the function-call sequence faster, the harder it was to unload
the hardware in order to switch to a different thread.

> Fair enough, although Stalin is pretty impressive in the performance
> stats and has full (IIRC) call/cc.

But no EVAL.

Guillermo 'Bill' J. Rozas

Jun 20, 2000
David Rush <ku...@bellsouth.net> writes:
>
> Methinks that you overstate the case. AI memo 1462 claims only a
> constant-factor difference in theoretical efficiency. Its claims
> about efficiency on real machines are more equivocal, but no different
> from what most people know about the problems of locality in GC
> systems.
>
> Miller & Rozas also neglected the VAX architecture which had CALLG, a
> single-instruction linked-stack procedure call. In short, if hardware
> engineers thought that there was a buck to be made supporting
> linked-stacks and GC environments, they'd build them.

To my knowledge, CALLG was more expensive than CALLS in every single
version of the VAX.

Furthermore, the VAX calling instructions were so expensive that in
practice several compilers avoided them.

Memory references are more expensive than arithmetic operations, and
getting more expensive and less abundant as we go forward.

David Rush

Jun 20, 2000
Joe Marshall <jmar...@alum.mit.edu> writes:
> David Rush <ku...@bellsouth.net> writes:
> > Obviously, but the question that no-one has asked (AFAICT) is how do
> > Algol-family languages do on GC-friendly hardware?
>
> What specifically do you mean by `Algol family' languages?

Err, those with only standard block-structured control structures,
which don't perform any more interesting manipulations of the
current continuation than `if' and procedure call/return. They also
generally use explicit storage management. -ish.

> > And even more, how much easier and more efficient can
> > multi-threading be in that environment?
>
> Depends. With the Lisp Machine, the more hardware that we threw at
> making the function-call sequence faster, the harder it was to unload
> the hardware in order to switch to a different thread.

Oh. :(

david rush
--
Censorship may be useful for preservation of morality, but can never
be so for its restoration. - Jean-Jacques Rousseau

Guillermo 'Bill' J. Rozas

Jun 20, 2000
"felix" <fe...@anu.ie> writes:
>
> Sorry, my English is a little awkward. What I mean is that the mentioned
> paper favours stack-like implementation techniques. But the given
> examples would not be able to handle first-class continuations properly, and
> so would not be valid in a real implementation of Scheme (if I understand
> those code examples correctly). What I want to say is that the paper basically
> compares C-like procedure calls using stack frames with C-like procedure
> calls using heap frames. So I consider it not relevant to this discussion.

This is not right.

If memory serves, the given examples would be able to handle cwcc just
fine, as long as you use some variant of cwcc that copies the stack
(in part or whole).

The issue is, and has been:

A. Do you introduce cwcc with its immense power but associated
non-trivial costs?

B. Do you make do with something less powerful but less costly?

Note that in A you can choose your costs:

A.1
Make everything almost as cheap (*) as without cwcc, and make
cwcc costly.

A.2
Make many things more expensive and cwcc cheaper.

Of course there is a spectrum between A.1 and A.2.

Note that I haven't advocated a position -- all are defensible.
Personally, I believe that A.1 is a reasonable choice. It just means
that programmers using cwcc have to be aware of its costs, and others
will not notice much "degradation".

The purpose of the paper was to refute a claim that heap allocation of
continuation frames (in the absence of cwcc -- that was added to ML
after it was already using heap allocation and the argument had been
made) was asymptotically cheaper than stack allocation.

(*) You can't easily keep mutable state on the stack, and this adds
extra overhead that you wouldn't have in the absence of cwcc.

> >... It was my claim that the cost of cwcc is higher than
> >the cost of catch/throw. I defend this claim as follows:
> >
> > 1. If continuations are stack-managed, then the cost of cwcc is
> > higher than catch/throw because cwcc must copy any information
> > that is not used in a stack-like manner.
> >
> > 2. If continuations are heap-managed, although cwcc now appears to
> > cost the same as catch/throw, the actual cost has been moved
> > from the cwcc to the function-calling sequence, which slows down
> > all the code, regardless of whether it uses cwcc or not.
> >
> > It is important to note that the presence of cwcc is what would
> > cause you to consider heap-allocation of the continuation.
>
> You are perfectly right.
>
> I claim that the cost of call/cc can be considered acceptable, regarding
> its conceptual elegance and expressive power. And I also believe that
> there are certain heap-based implementation methods that *could* provide
> *nearly* comparable performance to stack-based systems.

As I said, any position is defensible. However, I suspect that you
are being optimistic when using '*nearly*'. In particular, on a
previous post you mention that you consider the cost of C function
calls acceptable. In my experience, the average C compiler does a
terrible job at function boundaries, and some of the inefficiencies
are demanded by the language definition. Scheme/Lisp compilers can
have much more streamlined function call sequences, and often do,
since Scheme/Lisp programs depend on function calls so much more than
C programs.

felix

Jun 20, 2000

David Rush wrote in message ...

>Fair enough, although Stalin is pretty impressive in the performance
>stats and has full (IIRC) call/cc.


AFAIK Stalin does *not* have full call/cc. Or does it?
(Mr. Siskind?)


felix


felix

Jun 20, 2000

Guillermo 'Bill' J. Rozas wrote in message
<6dn1kg2...@cobalt.transmeta.com>...

>As I said, any position is defensible. However, I suspect that you
>are being optimistic when using '*nearly*'. In particular, on a
>previous post you mention that you consider the cost of C function
>calls acceptable. In my experience, the average C compiler does a
>terrible job at function boundaries, and some of the inefficiencies
>are demanded by the language definition. Scheme/Lisp compilers can
>have much more streamlined function call sequences, and often do,
>since Scheme/Lisp programs depend on function calls so much more than
>C programs.

The costs of C function calls are AFAIK seldom questioned by C programmers
(partly because there is no alternative, of course).
And more and more compilers for Lisp and Scheme translate into C (and some of
those compilers are noted for the efficient code they generate).


felix


Colin Walters

Jun 20, 2000
>>>>> "felix" == felix <fe...@anu.ie> writes:

felix> AFAIK Stalin does *not* have full call/cc. Or does it?

I got a message from Mr. Siskind himself a while ago in mostly capital
letters that informed me that Stalin does support call/cc :)

Jeffrey B. Siegal

Jun 20, 2000
felix wrote:
> AFAIK Stalin does *not* have full call/cc. Or does it?
> (Mr. Siskind?)

$ man stalin

-no-escaping-continuations
Normally, full continuations are supported. When
this option is specified, the only continuations
that are supported are those that cannot be called
after the procedure that created the continuation
has returned.

Jeffrey B. Siegal

Jun 20, 2000
Joe Marshall wrote:
> > Fair enough, although Stalin is pretty impressive in the performance
> > stats and has full (IIRC) call/cc.
>
> But no EVAL.

From stalin/README:

Stalin nominally accepts the language defined by The Revised^4
Report on the Algorithmic Language Scheme (R4RS).

(R4RS doesn't have EVAL.)

Jeffrey B. Siegal

Jun 20, 2000
David Rush wrote:
> Fair enough, although Stalin is pretty impressive in the performance
> stats and has full (IIRC) call/cc.

Stalin compiles the entire program as a single block; there is no
separate compilation. While useful, this makes it very different from
compilers/environments that don't work that way.

Christopher Browne

Jun 21, 2000
Centuries ago, Nostradamus foresaw a time when Jeffrey B. Siegal would say:

Does Stalin still consume a whole _LOT_ of memory when compiling? The
last time I tried running the benchmarks, it was bogging down due to
my having a mere 96MB of memory, when it seemed to want Rather More.
--
cbbr...@ntlug.org - <http://www.ntlug.org/~cbbrowne/scheme.html>
"Unlike computers, guns don't have Y2K problems..."

Jeffrey B. Siegal

Jun 21, 2000
Christopher Browne wrote:
> Does Stalin still consume a whole _LOT_ of memory when compiling?

As far as I know, yes. (But I have a _REAL_LOT_ of memory so I don't
notice...)

Martin Cracauer

Jun 21, 2000

OS threads != kernel-scheduled threads. Most thread interfaces in
current Unix systems schedule N user threads onto M < N kernel-
schedulable threads. Linux and Windows NT are exceptions; they let
every thread be handled by the kernel.

Martin
--
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Martin Cracauer <crac...@bik-gmbh.de> http://www.bik-gmbh.de/~cracauer/
FreeBSD - where you want to go. Today. http://www.freebsd.org/

Michael Sperber [Mr. Preprocessor]

Jun 21, 2000
>>>>> "Martin" == Martin Cracauer <crac...@counter.bik-gmbh.de> writes:

Martin> spe...@informatik.uni-tuebingen.de (Michael Sperber [Mr. Preprocessor]) writes:


>>>>>>> "Pierre" == Pierre R Mai <pm...@acm.org> writes:

Pierre> OTOH Douglas T. Crosher is still working on an experimental version of
Pierre> CMUCL multi-threading which sits atop native threading mechanisms on
Pierre> Linux (and FreeBSD).

>> Unfortunately, OS threads are often *way* too heavyweight for the
>> really interesting applications of threads.

Martin> OS threads != kernel-scheduled threads. Most thread interfaces in
Martin> current Unix systems schedule N user threads onto M < N kernel-
Martin> schedulable threads. Linux and Windows NT are exceptions; they let
Martin> every thread be handled by the kernel.

Both are usually too heavyweight.

Jens Kilian

Jun 21, 2000
spe...@informatik.uni-tuebingen.de (Michael Sperber [Mr. Preprocessor]) writes:
> They are not really new, but only recently the relevant publication
> has become accessible:
>
> William D. Clinger, Anne Hartheimer, Eric Ost: Implementation
> Strategies for First-Class Continuations. Higher-Order and Symbolic
> Computation 12(1): 7-45 (1999)

A Google search turned up the following article:

http://www.acm.org/pubs/citations/proceedings/lfp/62678/p124-clinger/

Is this the same paper?
--
mailto:j...@acm.org phone:+49-7031-464-7698 (HP TELNET 778-7698)
http://www.bawue.de/~jjk/ fax:+49-7031-464-7351
PGP: 06 04 1C 35 7B DC 1F 26 As the air to a bird, or the sea to a fish,
0x555DA8B5 BB A2 F0 66 77 75 E1 08 so is contempt to the contemptible. [Blake]

Tim Bradshaw

Jun 21, 2000
* Michael Sperber wrote:

> Unfortunately, OS threads are often *way* too heavyweight for the
> really interesting applications of threads.

unfortunately, unless you use the OS-supplied threads you'll never run
on a multiprocessor.

--tim

Matthias Blume

Jun 21, 2000
Tim Bradshaw <t...@cley.com> writes:

True (probably), but it does not mean that every thread has to be
mapped to an OS thread. For highly concurrent code, threads can and
should be kept as light-weight as possible. OS threads can be
allocated to match the number of physical processors. The threads of
the concurrent program then use those OS threads in a multiplexed way
as "compute-servers" (just like they would use the single thread of a
single-CPU system without OS threads).

Matthias

--
Matthias Blume <blume@k_u_r_i_m_s.k_y_o_t_o-u.a_c.j_p>
Kyoto University, Research Institute for Mathematical Sciences
(remove those spam-preventing underscores from mail address)

Daniel C. Wang

Jun 21, 2000

Joe Marshall <jmar...@alum.mit.edu> writes:
{stuff deleted}
> The interesting thing is that the cost of setting up and using the
> heap-based frame exceeds the cost of setting up, using, and
> *deallocating* the stack-based frame. Thus ``even if
> garbage-collection never occurs or costs nothing, the cost of heap
> allocation exceeds the cost of stack allocation.''

The real performance penalty is the loss of cache locality. A few extra
instructions per frame should get lost in the noise. Also note that for
standard C calling conventions you usually have to install a "frame pointer"
that points to the previous allocation frame. So in some sense heap
allocation is as cheap as stack allocation on a per-frame basis. Also note
that garbage collection can sometimes be "better than free" when compared
to a stack-based system.

In the presence of exceptions and multi-threading, stack-based
implementations become very complex, while heap-based systems remain
almost unchanged.

Hannah Schroeter

Jun 21, 2000
Hello!

In article <8ipt1q$2umv$1...@counter.bik-gmbh.de>,
Martin Cracauer <crac...@counter.bik-gmbh.de> wrote:
>[...]

>OS threads != kernel-scheduled threads. Most thread interfaces in
>current Unix systems schedule N user threads onto M < N kernel-
>schedulable threads. Linux and Windows NT are exceptions; they let
>every thread be handled by the kernel.

However, userland threads also need a completely separate runtime
stack, allocating at least one complete page of memory, perhaps
more. That's a memory overhead that can well be bigger than with a
specialized thread implementation based e.g. on first class continuations
(i.e. CPS transform in the compiler).

Kind regards,

Hannah.

Xenophon Fenderson the Carbon(d)ated

Jun 21, 2000
>>>>> "Michael" == Michael Sperber [Mr Preprocessor] <spe...@informatik.uni-tuebingen.de> writes:

Michael> Both are usually too heavyweight.

What do you mean by "heavyweight"?

--
"I will never submit to your lustful advances", cried Princess Beatrice, as the
wealthy, powerful, muscular and strikingly handsome Count Bertrand slowly
adjusted his mink gloves, "at least for another half-hour!"
--Dr. Zinn, from the novel

Erik Naggum

Jun 21, 2000
* Michael Sperber

| Unfortunately, OS threads are often *way* too heavyweight for the
| really interesting applications of threads.

* Tim Bradshaw
| unfortunately, unless you use the OS-supplied threads you'll never run
| on a multiprocessor.

Whatever happened to good old interprocess communication, shared
memory, etc, on multiprocessor systems? Whatever it is people are
doing with threads can be handled without involving any kernel.
There need be no particularly complicated implementation, either.
At issue, however, is whether it makes sense to write languages and
runtime systems that maintain the threads and surrounding support or
hack things by hand in what passes for the "popular" languages.

I'm sorry, but I completely fail to understand why OS threads is the
only way to utilize "modern" multiprocessor systems. Maybe I'm just
too old to be able to pretend we didn't have SMPs in the 1980's.

#:Erik, sighing
--
If this is not what you expected, please alter your expectations.

Kenneth P. Turvey

unread,
Jun 21, 2000, 3:00:00 AM6/21/00
to
On 19 Jun 2000 13:08:50 +0200, Michael Sperber [Mr. Preprocessor] <spe...@informatik.uni-tuebingen.de> wrote:
>
>>>>>> "Pierre" == Pierre R Mai <pm...@acm.org> writes:
>
>Pierre> Tim Bradshaw <t...@cley.com> writes:
>
>>> * Joe Marshall wrote:
>>> > Thant Tessman <th...@acm.org> writes:
>>> >> I'm not familiar with Common Lisp. Is it that CL has its own threading
>>> >> mechanism?
>>>
>>> > Most implementations of Common Lisp provide some multi-threading
>>> > capability. There is no `standard' (although most of the
>>> > multi-threading implementations seem to be based on `stack groups').
>>>
>>> If you use call/cc (+ something else: timer interrupts?) to do
>>> threading could such a system be persuaded to sit naturally on top of,
>>> say, posix threads? By `naturally' I mean `without heroic effort' or
>>> something.
>>>
>>> (Not that the current CL thread systems do, I know)
>
>Pierre> OTOH Douglas T. Crosher is still working on an experimental version of
>Pierre> CMUCL multi-threading which sits atop native threading mechanisms on
>Pierre> Linux (and FreeBSD).
>
>Unfortunately, OS threads are often *way* too heavyweight for the
>really interesting applications of threads.

Could you explain this a bit further. I can't really see how OS threads
would carry more overhead.

--
Kenneth P. Turvey <kt-...@SprocketShop.com>
http://www.tranquility.net/~kturvey/resume/resume.html
--------------------------------------------------------
I certainly learned a great deal from 3,000 town hall meetings across
my home state of Tennessee over a 16-year period.
-- Al Gore (That's one every 46.7 hours)

Tim Bradshaw

unread,
Jun 21, 2000, 3:00:00 AM6/21/00
to
* Erik Naggum wrote:

> I'm sorry, but I completely fail to understand why OS threads is the
> only way to utilize "modern" multiprocessor systems. Maybe I'm just
> too old to be able to pretend we didn't have SMPs in the 1980's.

That's not what I meant -- I worded it badly. What I meant was `a
threading system that isn't based on OS threads is going to have a
very hard time utilizing a multiprocessor'. There are obviously other
ways to use a multiprocessor.

(and the `very hard time' is intentional -- I can imagine a threading
system which didn't use OS facilities which made use of a
multiprocessor, but I think it would be significantly painful to
implement if it was to be general).

--tim

Espen Vestre

unread,
Jun 21, 2000, 3:00:00 AM6/21/00
to
Tim Bradshaw <t...@cley.com> writes:

> (and the `very hard time' is intentional -- I can imagine a threading
> system which didn't use OS facilities which made use of a
> multiprocessor, but I think it would be significantly painful to
> implement if it was to be general).

but it isn't always 'very hard'. Not if your system consists of several
communicating processes.
--
(espen)

Tim Bradshaw

unread,
Jun 21, 2000, 3:00:00 AM6/21/00
to
* Espen Vestre wrote:

> but it isn't always 'very hard'. Not if your system consists of several
> communicating processes.

Yes, obviously. The trick is providing a supposedly-general facility,
or arguing convincingly against other people (C/posix vendors) who do.

--tim

Rob Warnock

unread,
Jun 22, 2000, 3:00:00 AM6/22/00
to
Daniel C. Wang <danwan...@cs.princeton.edu> wrote:
+---------------

| Joe Marshall <jmar...@alum.mit.edu> writes:
| > The interesting thing is that the cost of setting up and using the
| > heap-based frame exceeds the cost of setting up, using, and
| > *deallocating* the stack-based frame.
|
| The real performance penalty is the loss of cache locality. A few extra
| instructions per-frame should get lost in the noise.
+---------------

And the cache-locality issue can be handled (at least, in a generational
copying collector) by adopting Ungar's suggestion of having a single
*fixed* "nursery" area for allocating generation-0 objects. If the size
of this nursery area is matched to the secondary cache size of the CPU,
then it stays "hot" in the caches, just like with a conventional stack.

By the way, this is in fact exactly what happens when you use Baker's
"Cheney on the MTA" approach -- the portion of the C stack used by the
CotMTA code *is* the generation-0 allocation area.

+---------------
| Also note that for standard C calling conventions you usually have to
| install a "frame-pointer" that points to the previous allocation frame.
| So in some sense heap allocation is as cheap as stack allocation on a
| per-frame basis.
+---------------

*IF* you have already resigned yourself to using the C calling sequence --
but many (most?) commercial-quality Lisp & Scheme systems *don't* do that...

+---------------
| In the presence of exceptions and multi-threading the stack based
| implementations become very complex, while the heap based systems remain
| almost unchanged.
+---------------

Yup, which is why I personally think it's still worth considering as
an implementation strategy (along with an appropriate GC, of course).


-Rob

-----
Rob Warnock, 41L-955 rp...@sgi.com
Applied Networking http://reality.sgi.com/rpw3/
Silicon Graphics, Inc. Phone: 650-933-1673
1600 Amphitheatre Pkwy. PP-ASEL-IA
Mountain View, CA 94043

Matthias Blume

unread,
Jun 22, 2000, 3:00:00 AM6/22/00
to
xeno...@irtnog.org (Xenophon Fenderson the Carbon(d)ated) writes:

> >>>>> "Michael" == Michael Sperber [Mr Preprocessor] <spe...@informatik.uni-tuebingen.de> writes:
>
> Michael> Both are usually too heavyweight.
>
> What do you mean by "heavyweight"?

High overhead for creation and context switching. Makes it
impractical to have more than a few. Hundreds would be pushing it,
thousands is pretty much out of the question. Moreover, many systems
place rather arbitrary restrictions on how many threads can be
created.

Espen Vestre

unread,
Jun 22, 2000, 3:00:00 AM6/22/00
to
Tim Bradshaw <t...@cley.com> writes:

> Yes, obviously. The trick is providing a supposedly-general facility,
> or arguing convincingly against other people (C/posix vendors) who do.

well, if one wants general mechanisms for multiple processors, why not
jump a step further and ask for general mechanisms for programming
over _clusters_... there was an interesting talk on this at ELUGM-99;
I found the paper at one of the authors' home pages:

http://lki-www.informatik.uni-hamburg.de/~hotz/homepage.html

--
(espen)

Michael Sperber [Mr. Preprocessor]

unread,
Jun 22, 2000, 3:00:00 AM6/22/00
to Jens Kilian
>>>>> "Jens" == Jens Kilian <Jens_...@agilent.com> writes:

Jens> spe...@informatik.uni-tuebingen.de (Michael Sperber [Mr. Preprocessor]) writes:
>> They are not really new, but only recently the relevant publication
>> has become accessible:
>>
>> William D. Clinger, Anne Hartheimer, Eric Ost: Implementation
>> Strategies for First-Class Continuations. Higher-Order and Symbolic
>> Computation 12(1): 7-45 (1999)

Jens> A Google search turned up the following article:

Jens> http://www.acm.org/pubs/citations/proceedings/lfp/62678/p124-clinger/

Jens> Is this the same paper?

The HOSC paper is a considerably updated and expanded version of its
LFP predecessor.

Michael Sperber [Mr. Preprocessor]

unread,
Jun 22, 2000, 3:00:00 AM6/22/00
to
>>>>> "Tim" == Tim Bradshaw <t...@cley.com> writes:

Tim> * Michael Sperber wrote:

>> Unfortunately, OS threads are often *way* too heavyweight for the
>> really interesting applications of threads.

Tim> unfortunately, unless you use the OS-supplied threads you'll never run
Tim> on a multiprocessor.

You're confusing two different things here:

a) Threads to implement programming paradigms which depend on
concurrency, e.g. GUIs, reactive systems, etc.

b) Threads to utilize multiprocessing to increase performance.

OS-supplied threads are sometimes OK for b) but mostly not OK for a).

There's no inherent reason why lightweight threads can't utilize
multiprocessing. However (as Erik Naggum points out later in the
thread), threads are not an ideal mechanism to exploit
multiprocessing.

Tim Bradshaw

unread,
Jun 22, 2000, 3:00:00 AM6/22/00
to
* Michael Sperber wrote:

> You're confusing two different things here:

> a) Threads to implement programming paradigms which depend on
> concurrency, e.g. GUIs, reactive systems, etc.

> b) Threads to utilize multiprocessing to increase performance.

No, I'm not, I just don't care at all about (a), and I do care about
(b).

Tim Bradshaw

unread,
Jun 22, 2000, 3:00:00 AM6/22/00
to
* Espen Vestre wrote:

> well, if one wants general mechanisms for multiple processors, why not
> jump a step further and ask for general mechanisms for programming
> over _clusters_... there was an interesting talk on this at ELUGM-99,
> I found the paper at one of the authors home page:

It would be very nice if there was such generality, however, as far
as I know no one really knows how to write efficient stuff on a
cache-coherent shared-memory box which is also efficient on a
shared-nothing cluster.

(By `efficient' I mean, at least in part, `makes good use of the money
you spent' -- if your program doesn't benefit from all the expensive
stuff that big commercial shared-memory machines have, but would run
just as well on a cluster, then it's not `efficient' by this
definition. I should probably use another word. And not being
efficient on those machines might just mean they're a waste of money,
of course.)

--tim

Phil Stubblefield

unread,
Jun 22, 2000, 3:00:00 AM6/22/00
to

Interesting. Do you mean:

(1) you don't care about (a) for the purposes of this discussion;

(2) you acknowledge the need for (a) but don't often have the time
and/or resources to utilize it;

(3) think (a) is a load of crap; or

(4) some combination thereof?


Phil Stubblefield
Rockwell Palo Alto Laboratory 206/655-3204
http://www.rpal.rockwell.com/~phil ph...@rpal.rockwell.com

Jost Boekemeier

unread,
Jun 22, 2000, 3:00:00 AM6/22/00
to
Matthias Blume <s...@my.sig> writes:
> > Michael> Both are usually too heavyweight.
> >
> > What do you mean by "heavyweight"?
>
> High overhead for creation and context switching.

Hmm, and I thought that only processes have high overhead. Threads
(created by the Linux clone() call for example) essentially share
their top-level environment and their code.

Why is that more "heavyweight" than continuations?


Jost

Michael Livshin

unread,
Jun 22, 2000, 3:00:00 AM6/22/00
to
Jost Boekemeier <jost...@calvados.zrz.tu-berlin.de> writes:

> Matthias Blume <s...@my.sig> writes:
> > > Michael> Both are usually too heavyweight.
> > >
> > > What do you mean by "heavyweight"?
> >
> > High overhead for creation and context switching.
>
> Hmm, and I thought that only processes have high overhead. Threads
> (created by the Linux clone() call for example) essentially share
> their top-level environment and their code.

I think the point is that switching the CPU context is expensive,
especially with SMP, especially on a Unix or Windows where this
entails a switch to kernel-mode and then back.

--
only legal replies to this address are accepted.

Parenthesize to avoid ambiguity.

Christopher Browne

unread,
Jun 22, 2000, 3:00:00 AM6/22/00
to
Centuries ago, Nostradamus foresaw a time when Jost Boekemeier would say:

>Matthias Blume <s...@my.sig> writes:
>> > Michael> Both are usually too heavyweight.
>> >
>> > What do you mean by "heavyweight"?
>>
>> High overhead for creation and context switching.
>
>Hmm, and I thought that only processes have high overhead. Threads
>(created by the Linux clone() call for example) essentially share
>their top-level environment and their code.
>
>Why is that more "heavyweight" than continuations?

The top level environment could represent rather a lot of data, and
"threads with kernel participation" require that you get the OS kernel
into the process, thus meaning that:
-> The kernel has to track _something_ about the thread;
-> In order to switch contexts, you need some form of system call.

The recent "stackless Python" <http://www.stackless.com/> seems to
have one of the best presentations of this.

The paper "sbcpaper.html" illustrates how expensive having C stack
frames is, and a mere stack frame is a whole lot _less_ expensive than
a kernel-supported thread.

This demonstrates why it is by no means obvious what is the best way
of speeding up code that _MIGHT_ be parallelizable:
- Using "local threading," with _no_ OS support, which amounts to
processing using continuations, requires the least overhead. But
has no way to take advantage of SMP hardware.
- Kernel-supported threading can split processing transparently
across multiple CPUs, but involves considerably more coordination
costs when threads need to synchronize or communicate.

"Local threading" is preferable if the application is going to have
huge hordes of threads such that you plan to switch contexts a whole
lot. As well as if you only have one CPU.

"Kernel threading" is preferable if there will be a few threads, with
only moderate amounts of context change, where substantial
computational work takes place "between switches." That allows
harnessing the horsepower of SMP hardware.

--
cbbr...@acm.org - <http://www.hex.net/~cbbrowne/linux.html>
A Plateau is the highest form of flattery.

Michael Sperber [Mr. Preprocessor]

unread,
Jun 22, 2000, 3:00:00 AM6/22/00
to
>>>>> "Jost" == Jost Boekemeier <jost...@calvados.zrz.tu-berlin.de> writes:

Jost> Matthias Blume <s...@my.sig> writes:
>> > Michael> Both are usually too heavyweight.
>> >
>> > What do you mean by "heavyweight"?
>>
>> High overhead for creation and context switching.

Jost> Hmm, and I thought that only processes have high overhead. Threads
Jost> (created by the Linux clone() call for example) essentially share
Jost> their top-level environment and their code.

But not their continuations.

Jost> Why is that more "heavyweight" than continuations?

It's not a question of "than continuations": Any thread system needs
to keep track of the continuation of each thread to do context
switching. It just so happens that OS threads (no matter if userland
or kernel space) usually work with a very expensive representation for
continuations (a stack) as compared with systems that use
heap-allocated continuation frames, or a combination of that with a
stack cache.

I seem to remember Marc Feeley has done measurements of his upcoming
release of Gambit-C against Linux threads, and came up with a factor
of 1000 difference in "weight".

Matthias Blume

unread,
Jun 22, 2000, 3:00:00 AM6/22/00
to
Jost Boekemeier <jost...@calvados.zrz.tu-berlin.de> writes:

> Matthias Blume <s...@my.sig> writes:
> > > Michael> Both are usually too heavyweight.
> > >
> > > What do you mean by "heavyweight"?
> >
> > High overhead for creation and context switching.
>
> Hmm, and I thought that only processes have high overhead. Threads
> (created by the Linux clone() call for example) essentially share
> their top-level environment and their code.

As someone else already pointed out, OS thread context switching still
involves going through the kernel, and that is slow.

A while ago I wrote a simple test program that had two threads rapidly
spinning and hand-shaking using a mutex. I did this twice on the same
machine: one time in CML and another time using Linux threads. IIRC,
the CML program was a factor of at least 5 faster. (It could have
been even more, I do not remember.)

Actually, thinking about it a bit more, I realize that I tried this on
an UltraSparc (Solaris) and also on an X86 Linux box. I think the
factor 5 applies to the Solaris box. Linux was even worse, but that
could also have been a problem with the architecture...

Tim Bradshaw

unread,
Jun 22, 2000, 3:00:00 AM6/22/00
to
* Phil Stubblefield wrote:

> (1) you don't care about (a) for the purposes of this discussion;

Yes I mean this one: obviously I realize it's important that GUI apps
perform well &c. I also kind of misread the question, because
actually I care about concurrency too; just in this context I was
interested in a threading system that can make use of multiprocessors.

Sorry for being confusing / confused.

--tim

Erik Naggum

unread,
Jun 22, 2000, 3:00:00 AM6/22/00
to
* Phil Stubblefield <ph...@rpal.rockwell.com>
| Interesting. Do you mean:

|
| (1) you don't care about (a) for the purposes of this discussion;
|
| (2) you acknowledge the need for (a) but don't often have the time
| and/or resources to utilize it;
|
| (3) think (a) is a load of crap; or
|
| (4) some combination thereof?

Or perhaps (5) think some other implementation technique is superior
to using threads, so the performance of threads is irrelevant.

#:Erik

felix

unread,
Jun 22, 2000, 3:00:00 AM6/22/00
to

Rob Warnock wrote in message <8irsu2$1cpsj$1...@fido.engr.sgi.com>...

>+---------------
>| Also note that for standard C calling conventions you usually have to
>| install a "frame-pointer" that points to the previous allocation frame.
>| So in some sense heap allocation is as cheap as stack allocation on a
>| per-frame basis.
>+---------------
>
>*IF* you have already resigned yourself to using the C calling sequence --
>but many (most?) commercial-quality Lisp & Scheme systems *don't* do
that...


BTW, is there any Scheme implementation (that translates to C) that
uses something other than Steele's venerable driver loop (and some kludgy
argument passing via global variables)?

I know Scheme->C does not (it fails on tail recursion in all but the
simplest cases).
Stalin doesn't, probably.
Gambit does.

Can someone help me here?


felix


Rob Warnock

unread,
Jun 23, 2000, 3:00:00 AM6/23/00
to
Tim Bradshaw <t...@cley.com> wrote:
+---------------
| ...as I know no one really knows how to write efficient stuff on a
| cache-coherent shared-memory box which is also efficient on a
| shared-nothing cluster.
+---------------

Well, one data point: The SGI versions of the MPI libraries (I think we
call ours "MPT"?) are tuned to use local shared-memory mailboxes instead
of network calls if the target of the MPI operation is on the same system.
This means, for example, that you get a *lot* better MPI performance by
clustering a few "fat" (many-CPU) nodes, compared to a whole bunch of
small nodes, assuming the same total CPU count.

Ob. c.l.l.: I only wish I could find a "killer" Lisp application that
could make good use of a 256-CPU (or bigger) system. Most of the huge
HPC apps are still Fortran... (*sigh*)

Marc Feeley

unread,
Jun 23, 2000, 3:00:00 AM6/23/00
to
Michael Sperber wrote:

> I seem to remember Marc Feeley has done measurements of his upcoming
> release of Gambit-C against Linux threads, and came up with a factor
> of 1000 difference in "weight".

Yes, Gambit-C's thread system, which is implemented on top of
first-class continuations, is quite fast. The thread system conforms
to SRFI-21 (real-time multithreading), i.e. it supports priorities and
real-time scheduling.

I have benchmarked Gambit-C's thread system against LinuxThreads, Java
threads (using the Kaffe 1.0.5 JIT compiler), and good old "fork".
The benchmark used is a variant of the doubly recursive fibonacci
function where one of the branches is done by a new thread (it creates
a total of 61,000 threads). The tests were done on an unloaded 600
MHz Athlon based PC with 256 MB RAM, RedHat 6.1, and egcs-2.91.66.

System Program Elapsed time Ratio with Gambit

Gambit-C pre 4.0 tfib.scm 0.36 1
Kaffe OpenVM 1.0.5 (JIT) Java_tfib.java 30.69 85
fork fork-tfib.c 45.98 128
LinuxThreads pthread-tfib.c 427.41 1187

As you can see Gambit-C's thread system is over 1000 times faster than
LinuxThreads and close to 100 times faster than Java threads.
(Surprisingly, fork is 10 times faster than LinuxThreads!) So
Gambit-C's thread system can create and join over 150,000 threads per
second, i.e. 6 microseconds per create and join.

Moreover, when using LinuxThreads you can't create more than 1000
concurrent threads or so (i.e. fib(16) crashes) and there is a
considerable space usage per thread (2 MBytes or so). Gambit-C's
thread system has no such limit, and a thread consumes as little as
512 bytes (it all depends on how deep the thread's continuation is).

Marc

P.S.: Here is the source code for these programs if anyone cares to
experiment.

; file: "tfib.scm"

(declare (standard-bindings) (block) (not safe) (fixnum))

(define (tfib n)
  (if (< n 2)
      1
      (let ((x (make-thread (lambda () (tfib (- n 2))))))
        (thread-start! x)
        (let ((y (tfib (- n 1))))
          (+ (thread-join! x) y)))))

(define (main)
  (let ((n (string->number (cadr (argv))))
        (repeat (string->number (caddr (argv)))))
    (let loop ((repeat repeat)
               (result '()))
      (if (> repeat 0)
          (let ((x (make-thread (lambda () (tfib n)))))
            (thread-start! x)
            (let ((r (thread-join! x)))
              (loop (- repeat 1) r)))
          (pp (list (list 'fib n) '= result))))))

(main)

; Test on a 600 MHz Athlon with RH 6.1:
;
; % gsc tfib
; % gcc -O2 -D___SINGLE_HOST tfib.c tfib_.c -lgambc -lm -lncurses -ldl
; % time ./a.out -:m1000 14 100
; ((fib 14) = 610)
; 0.34user 0.02system 0:00.36elapsed 98%CPU (0avgtext+0avgdata 0maxresident)k
; 0inputs+0outputs (309major+437minor)pagefaults 0swaps

/* file: "pthread-tfib.c" */

#include <stdio.h>
#include <stdlib.h>   /* for atoi() and exit() */
#include <pthread.h>

void *tfib (void *n)
{
  if ((int)n < 2)
    return (void*)1;
  else
    {
      pthread_t x;
      void *y;
      void *z;
      if (pthread_create (&x, NULL, tfib, (void*)((int)n - 2)) != 0) exit (1);
      y = tfib ((void*)((int)n - 1));
      if (pthread_join (x, &z) != 0) exit (1);
      return (void*)((int)y + (int)z);
    }
}

int main (int argc, char *argv[])
{
  int n = atoi (argv[1]);
  int repeat = atoi (argv[2]);
  void *result;
  do
    {
      pthread_t x;
      if (pthread_create (&x, NULL, tfib, (void*)((int)n)) != 0) exit (1);
      if (pthread_join (x, &result) != 0) exit (1);
      repeat--;
    } while (repeat > 0);
  printf ("nb. threads = fib(%d) = %d\n", n, (int)result);
  return 0;
}

/* Test on a 600 MHz Athlon with RH 6.1:

% gcc -O2 pthread-tfib.c -lpthread
% time ./a.out 14 100
nb. threads = fib(14) = 610
2.30user 425.10system 7:07.41elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (109major+61014minor)pagefaults 0swaps
*/

/* file: "fork-tfib.c" */

#include <stdio.h>
#include <stdlib.h>   /* for atoi() and exit() */
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int tfib (int n)
{
  if (n < 2)
    return 1;
  else
    {
      pid_t x;
      int y;
      int z;
      x = fork ();
      if (x < 0) exit (1);
      if (x == 0) exit (tfib (n - 2));
      y = tfib (n - 1);
      if (waitpid (x, &z, 0) < 0) exit (1);
      return y + WEXITSTATUS(z);
    }
}

int main (int argc, char *argv[])
{
  int n = atoi (argv[1]);
  int repeat = atoi (argv[2]);
  int result;
  do
    {
      pid_t x;
      x = fork ();
      if (x < 0) exit (1);
      if (x == 0) exit (tfib (n));
      if (waitpid (x, &result, 0) < 0) exit (1);
      repeat--;
    } while (repeat > 0);
  printf ("fib(%d) mod 256 = %d\n", n, WEXITSTATUS(result));
  return 0;
}

/* Test on a 600 MHz Athlon with RH 6.1:

% gcc -O2 fork-tfib.c
% time ./a.out 14 100
fib(14) mod 256 = 98
5.72user 40.27system 0:45.98elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (183103major+938521minor)pagefaults 0swaps
*/

// file: "Java_tfib.java"

class TFib extends Thread
{
  TFib (int arg) { n = arg; }

  public void run ()
  { try { result = fib (n); } catch (InterruptedException e) { } }

  private int n;

  public int result;

  private int fib (int n)
    throws InterruptedException
  {
    if (n < 2)
      return 1;
    else
      {
        TFib x = new TFib (n-2);
        x.start ();
        int y = fib (n-1);
        x.join ();
        return x.result + y;
      }
  }
}

class Java_tfib
{
  public static void main (String[] args)
    throws InterruptedException
  {
    int n = Integer.parseInt (args[0]);
    int repeat = Integer.parseInt (args[1]);
    TFib x;

    do
      {
        x = new TFib (n);
        x.start ();
        x.join ();
        repeat--;
      } while (repeat > 0);

    System.out.println ("nb. threads = fib(" + n + ") = " + x.result);
  }
}

// Test on a 600 MHz Athlon with RH 6.1 and Kaffe OpenVM 1.0.5 (JIT compiler):
//
// % javac Java_tfib.java
// % time kaffe Java_tfib 14 100
// nb. threads = fib(14) = 610
// 24.16user 0.11system 0:30.69elapsed 79%CPU (0avgtext+0avgdata 0maxresident)k
// 0inputs+0outputs (3057major+6387minor)pagefaults 0swaps

Eliot Miranda

unread,
Jun 23, 2000, 3:00:00 AM6/23/00
to

Marc Feeley wrote:
>
> Michael Sperber wrote:
>
> > I seem to remember Marc Feeley has done measurements of his upcoming
> > release of Gambit-C against Linux threads, and came up with a factor
> > of 1000 difference in "weight".
>
> Yes, Gambit-C's thread system, which is implemented on top of
> first-class continuations, is quite fast. The thread system conforms
> to SRFI-21 (real-time multithreading), i.e. it supports priorities and
> real-time scheduling.
>
> I have benchmarked Gambit-C's thread system against LinuxThreads, Java
> threads (using the Kaffe 1.0.5 JIT compiler), and good old "fork".
> The benchmark used is a variant of the doubly recursive fibonacci
> function where one of the branches is done by a new thread (it creates
> a total of 61,000 threads). The tests were done on an unloaded 600
> MHz Athlon based PC with 256 MB RAM, RedHat 6.1, and egcs-2.91.66.
>
> System Program Elapsed time Ratio with Gambit
>
> Gambit-C pre 4.0 tfib.scm 0.36 1
> Kaffe OpenVM 1.0.5 (JIT) Java_tfib.java 30.69 85
> fork fork-tfib.c 45.98 128
> LinuxThreads pthread-tfib.c 427.41 1187

Here's the results for VisualWorks 5i.2, a Smalltalk-80 system on a 400
MHz Pentium II MMX

tfib1 = 610 in 0.066 (using Promise)

tfib2 = 610 in 0.037 (using hand-coded join)

tfib3 = 610 in 0.045 (using a quickly hacked-up JoinableProcess & join)

I've attached the code. One thing that's nice to do with fibonacci
benchmarks is to make the result equal the number of operations you're
trying to measure. For example, if you're trying to measure activations
per second then use
nfib 0 -> 1
nfib 1 -> 2
nfib n -> nfib (n-1) + nfib(n-2) + 1

The inner + 1 ensures we count the inner activation. So the result is
also the number of activations it took to compute the result;
dividing the result by the time gives activations per second.

Since you're interested in counting one thread creation per inner
activation, why not define it thus:
tfib 0 -> 0
tfib 1 -> 0
tfib n -> tfib(n-1) + join(fork(tfib(n-2))) + 1

then dividing the result by time approximates to fork/joins per second
if they dominate.

--
_______________,,,^..^,,,____________________________
Eliot Miranda Smalltalk - Scene not herd

tfib.st
tfib-xml.st

Marc Feeley

unread,
Jun 24, 2000, 3:00:00 AM6/24/00
to
Eliot Miranda wrote:

> Here's the results for VisualWorks 5i.2, a Smalltalk-80 system on a 400
> MHz Pentium II MMX
>
> tfib1 = 610 in 0.066 (using Promise)
>
> tfib2 = 610 in 0.037 (using hand-coded join)
>
> tfib3 = 610 in 0.045 (using a quickly hacked-up JoinableProcess & join)

This is interesting, but is the time for one iteration or 100 (as in
my benchmark)? I can't see a loop in your code (but my Smalltalk is a
little rusty). This would bring the run time to between 3.7 and 6.6
seconds, which is roughly a factor of 10 slower than with Gambit-C's
threads, right? Also, do any of these variants of threads support
priorities and real-time scheduling?

Marc

Eliot Miranda

unread,
Jun 24, 2000, 3:00:00 AM6/24/00
to

Marc Feeley wrote:


>
> Eliot Miranda wrote:
>
> > Here's the results for VisualWorks 5i.2, a Smalltalk-80 system on a 400
> > MHz Pentium II MMX
> >
> > tfib1 = 610 in 0.066 (using Promise)
> >
> > tfib2 = 610 in 0.037 (using hand-coded join)
> >
> > tfib3 = 610 in 0.045 (using a quickly hacked-up JoinableProcess & join)
>

> This is interesting, but is the time for one iteration or 100 (as in
> my benchmark)?

Oops! Just one. I missed the x100! I'll redo them. But the other
important factor is that they're "green" threads. Which of your
benchmarks are green? I know LinuxThreads are native. There are
dialects of Smalltalk that fully support true threads (Smalltalk-MT,
QKS Smalltalk) but VisualWorks isn't one of them.


OK, here are the x100 results

| s n |
s := String new writeStream.
s cr.
n := 14.
#(tfib2 tfib3 tfib1) do:
[:functionName| | milliseconds result |
milliseconds := Time millisecondsToRun:
[100 timesRepeat:
[result := n perform: functionName]].
s cr;
nextPutAll: functionName;
nextPutAll: ' = ';
print: result;
nextPutAll: ' in ';
print: milliseconds / 1000.0; cr].
s contents'

tfib2 = 610 in 4.781 (hand-coded join)

tfib3 = 610 in 6.192 (join via JoinableProcess)

tfib1 = 610 in 7.636 (Promise, speed problems because of huge GC
overhead, allocates 70 meg!)

'

> I can't see a loop in your code (but my Smalltalk is a
> little rusty). This would bring the run time to between 3.7 and 6.6
> seconds, which is roughly a factor of 10 slower than with Gambit-C's
> threads, right?

Well, we need to scale for relative processor performance, but within a
factor of 2, yes. What platform are you running on? If Linux or
Windows then I can run your code here or send you a pointer to
VisualWorks 5i non-commercial for you to run there.


> Also, do any of these variants of threads support
> priorities and real-time scheduling?

VisualWorks, and all Smalltalk-80 derived Smalltalks including Squeak,
have multiple priorities and a real-time scheduler. VisualWorks
actually lets you set the number of priorities dynamically.

Martin Cracauer

unread,
Jun 26, 2000, 3:00:00 AM6/26/00
to
Jost Boekemeier <jost...@calvados.zrz.tu-berlin.de> writes:

>Matthias Blume <s...@my.sig> writes:
>> > Michael> Both are usually too heavyweight.
>> >
>> > What do you mean by "heavyweight"?
>>
>> High overhead for creation and context switching.

>Hmm, and I thought that only processes have high overhead. Threads
>(created by the Linux clone() call for example) essentially share
>their top-level environment and their code.

The sharing or not of a few pages and page tables (which is improved
here) is not the problem.

In this scheme (1-to-1 userland thread to kernel-scheduled thread),
rescheduling and even most locking require system calls, and those are
expensive. You have to cross between userland and the kernel for too
much stuff that a decent threading scheme can do on its own with at
most a signal.

Linus claims that Linux - compared to other OSes - has so little
system call overhead that it is worth the simplification. This is not
true for almost all uses of threads.

>Why is that more "heavyweight" than continuations?

In almost every way, except that continuations make optimization of
code more difficult, which is an entirely different problem.

Martin
--
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Martin Cracauer <crac...@bik-gmbh.de> http://www.bik-gmbh.de/~cracauer/
FreeBSD - where you want to go. Today. http://www.freebsd.org/

Lars Thomas Hansen

unread,
Jun 26, 2000, 3:00:00 AM6/26/00
to

"felix" <fe...@anu.ie> writes:

> BTW, is there any Scheme implementation (that does translate to C) that
> uses something else than Steele's venerable driver loop? (and some kludgy
> argument passing via global variables)

Petit Larceny can be configured to use the Baker/Appel trick (a nonlocal
jump is implemented as a C call to the continuation until there's no
more space to jump, at which point you longjmp to garbage collect all
the useless stack frames), though it still passes arguments via global
variables to support precise GC.

Petit Larceny is not yet available and I don't have performance numbers
to report for this particular implementation technique.
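
For the curious, a hedged sketch of the trick Lars describes (Baker's
"Cheney on the M.T.A."), assuming a downward-growing C stack; all
names here are illustrative, not Petit Larceny's actual internals, and
a real system would evacuate live data to the heap before jumping
rather than simply discarding the frames:

```c
#include <setjmp.h>

/* Sketch of the Baker/Appel trick: every call is a C call that never
 * returns, so the C stack fills with dead frames; when a depth check
 * trips, we longjmp back to a trampoline.  Assumes a downward-growing
 * C stack.  Arguments cross the longjmp via globals, as in the
 * "kludgy argument passing" felix mentions. */

enum { STACK_LIMIT = 64 * 1024 };   /* arbitrary demo budget */

typedef void (*cont_t)(long);

static jmp_buf trampoline;
static char   *stack_base;
static long    steps;               /* work done, to observe progress */
static cont_t  pending_k;           /* continuation stashed for resume */
static long    pending_arg;

static void count_down(long n);     /* forward declaration */

static void call_cont(cont_t k, long arg)
{
    char probe;                     /* marks the current stack depth */
    if (stack_base - &probe > STACK_LIMIT) {
        pending_k   = k;
        pending_arg = arg;
        longjmp(trampoline, 1);     /* discard the dead frames */
    }
    k(arg);                         /* an ordinary C call */
}

static void count_down(long n)      /* a toy Scheme-style loop */
{
    if (n > 0) {
        steps++;
        call_cont(count_down, n - 1);
    }
}

static long run(long n)
{
    char base;
    stack_base = &base;
    steps = 0;
    if (setjmp(trampoline))
        pending_k(pending_arg);     /* resume on the fresh stack */
    else
        count_down(n);
    return steps;
}
```

Note that if the C compiler happens to optimize the tail calls, the
stack never grows and the longjmp simply never fires; the result is
the same either way.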

--lars


George Neuner

unread,
Aug 31, 2000, 12:03:05 PM8/31/00
to
On 21 Jun 2000 17:42:11 +0100, Tim Bradshaw <t...@cley.com> wrote:

>* Erik Naggum wrote:
>
>> I'm sorry, but I completely fail to understand why OS threads is the
>> only way to utilize "modern" multiprocessor systems. Maybe I'm just
>> too old to be able to pretend we didn't have SMPs in the 1980's.
>
>That's not what I meant -- I worded it badly. What I meant was `a
>threading system that isn't based on OS threads is going to have a
>very hard time utilizing a multiprocessor'. There are obviously other
>ways to use a multiprocessor.
>
>(and the `very hard time' is intentional -- I can imagine a threading
>system which didn't use OS facilities which made use of a
>multiprocessor, but I think it would be significantly painful to
>implement if it was to be general).
>
>--tim

I recall seeing a paper from the University of Washington on something
called "scheduler activations" that was concerned with kernel support
for user level thread management.

IIRC - the authors proposed a system in which kernel thread events
were reflected to the user level scheduler. They created a kind of
virtual multiprocessor which allowed for user level threads to be
mapped to a potentially dynamic pool of kernel threads. In effect, it
treated each kernel thread as a separate processor and mapped user
threads to processors as needed.

UW's Spin OS project looks [superficially at least] like it
incorporates a lot of the ideas I recall from that paper.

George

Lyman Taylor

unread,
Aug 31, 2000, 12:50:21 PM8/31/00
to
George Neuner wrote:
>
> On 21 Jun 2000 17:42:11 +0100, Tim Bradshaw <t...@cley.com> wrote:
....

> >multiprocessor, but I think it would be significantly painful to
> >implement if it was to be general).
...

> IIRC - the authors proposed a system in which kernel thread events

"Proposed" is not necessarily distinct from "painful to implement". :-)
Or from "painful to use".

I haven't scanned that paper but....
If the kernel threads have their own scheduler, then there are
two levels of scheduling: scheduling onto the abstract processors and
onto the physical ones. I suspect that would not be a performance
benefit.


Lyman

George Neuner

unread,
Aug 31, 2000, 2:25:52 PM8/31/00
to
On Thu, 31 Aug 2000 09:50:21 -0700, Lyman Taylor
<lyman....@mindspring.com> wrote:


> "Proposed" is not necessarily distinct from "painful to implement". :-)
> Or from "painful to use".

No question.


> I haven't scanned that paper but....
> If the kernel threads have there own scheduler then there are
> two levels of scheduling that occur. Scheduling on the abstract and
> physical processors. I suspect that was not a performance benefit.

User-space thread packages have to deal with this now - they have
little control over the kernel thread executing the process. User
threading only determines which part of the application runs when, or
if, the OS decides to run it.

Despite that, for certain types of applications, e.g., transaction
systems that create thousands of short-lived threads, user threading
is a really big win over kernel threading.
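
The cost difference is easy to see in a sketch: spawning a user-level
thread is just initializing a context and a small stack, entirely in
user space. A minimal illustration using POSIX ucontext (illustrative
only: glibc's swapcontext actually makes a signal-mask system call, so
real thread packages hand-roll the context switch to avoid even that):

```c
#include <ucontext.h>

/* Sketch: a user-level "thread" is just a context plus a small stack,
 * both set up in user space.  Each worker here runs to completion
 * before the next is created, so one stack buffer can be reused. */

enum { STACK_SZ = 16 * 1024 };

static ucontext_t scheduler;        /* the context we return to */
static int        completed;

static void worker(void)
{
    completed++;                    /* a tiny unit of work */
}   /* falling off the end resumes uc_link, i.e. the scheduler */

static int run_workers(int n)
{
    static char stack[STACK_SZ];    /* reused for each short-lived thread */
    completed = 0;
    for (int i = 0; i < n; i++) {
        ucontext_t t;
        getcontext(&t);
        t.uc_stack.ss_sp   = stack;
        t.uc_stack.ss_size = sizeof stack;
        t.uc_link          = &scheduler;   /* where to go on exit */
        makecontext(&t, worker, 0);
        swapcontext(&scheduler, &t);       /* run it to completion */
    }
    return completed;
}
```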


The Spin OS is interesting in that the microkernel can reflect almost
any kernel event into user space for special handling. A process can
only receive events that affect its own virtual space, so there is
isolation from other processes, but the user level event handlers run
with kernel privilege so there is no protection penalty in invoking
them. The handlers are signed in some manner based on what OS
functions they use so the kernel can statically check privileges
before the handler is installed.


George

eric hoffman

unread,
Aug 31, 2000, 4:31:34 PM8/31/00
to

as far as I can recall, scheduler activations are a specialized form
of upcalls used to get around problems with threading caused by the
process/kernel boundary. As such, it's not as interesting or useful as
a generalized continuation approach.

in case people are interested here is the reference:

T. Anderson, B. Bershad, E. Lazowska and H. Levy. ``Scheduler
Activations: Effective Kernel Support for the User-Level Management of
Parallelism.'' ACM Transactions on Computer Systems 10, 1 (February
1992), pp. 53-79.

http://www.cs.berkeley.edu/~brewer/cs262/Scheduler.pdf

Matthew R Wette

unread,
Aug 31, 2000, 2:02:27 PM8/31/00
to
Matthew R Wette <Matthew...@jpl.nasa.gov> writes:

> Solaris uses a user-level + kernel-level thread approach. The kernel
> threads are mapped to processors. The user threads are mapped to
> kernel threads. There are hooks to allow the user to specify how many
> processors should be used. There used to be white papers on this
> design but I just looked on sun.com and didn't find them. Maybe
> ftp.uu.net has them still.

There are some details here:

http://www.sun.com/software/white-papers/wp-realtime/index.html

--
Matthew.R.Wette at jpl.nasa.gov -- I speak for myself, not for JPL.

Matthew R Wette

unread,
Aug 31, 2000, 1:26:59 PM8/31/00
to
gne...@dyn.SPAMMERS.DIE.com (George Neuner) writes:

Solaris uses a user-level + kernel-level thread approach. The kernel
threads are mapped to processors. The user threads are mapped to
kernel threads. There are hooks to allow the user to specify how many
processors should be used. There used to be white papers on this
design but I just looked on sun.com and didn't find them. Maybe
ftp.uu.net has them still.
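
The hook Matt mentions survives in POSIX as pthread_setconcurrency():
on an m-on-n implementation such as classic Solaris it advises the
library how many kernel threads (LWPs) to run user threads on, while
a 1:1 implementation accepts it and ignores it. A minimal sketch:

```c
#include <pthread.h>

/* pthread_setconcurrency() is only a hint: an m-on-n library may use
 * it to size its pool of kernel threads; a 1:1 library accepts the
 * call but treats it as a no-op.  Returns the level reported back,
 * or -1 on error. */

static int advise_concurrency(int n)
{
    if (pthread_setconcurrency(n) != 0)
        return -1;
    return pthread_getconcurrency();   /* the level last set */
}
```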

Matt

Rob Warnock

unread,
Sep 2, 2000, 12:21:18 AM9/2/00
to
Matthew R Wette <Matthew...@jpl.nasa.gov> wrote:
+---------------

| Solaris uses a user-level + kernel-level thread approach. The kernel
| threads are mapped to processors. The user threads are mapped to
| kernel threads. There are hooks to allow the user to specify how many
| processors should be used. There used to be white papers on this
| design but I just looked on sun.com and didn't find them.
+---------------

SGI does something similar. From <URL:http://techpubs.sgi.com/library/
dynaweb_bin/ebt-bin/0650/nph-infosrch.cgi/infosrchtpl/SGI_Developer/
OrOn2_PfTune/@ebt-link;he=0?target=%25N%15_10657_START_RESTART_N%25;
DwebQuery=threads>, we see that we *used* to do "m-on-n" scheduling
(as above), but now[*] do the multi-CPU dispatching entirely in user mode:

C Source Using POSIX Threads
...
You can write a multithreaded program using the POSIX threads model
and POSIX synchronization primitives (POSIX standards 1003.1b,
threads, and 1003.1c, realtime facilities). The use of these
libraries is documented in Topics in IRIX Programming, listed
in the section "Software Tool Manuals".

Through IRIX 6.4, the implementation of POSIX threads creates a
certain number of IRIX processes and uses them to execute the
pthreads. Typically the library creates fewer processes than the
program creates pthreads (called an "m-on-n" implementation).
You cannot control or predict which process will execute the code
of any pthread at any time. When a pthread blocks, the process
running it looks for another pthread to run.

[*]==> Starting with IRIX 6.5, the pthreads library allocates a varying
number of execution resources (basically, CPUs) and dispatches them
to the runnable threads. These execution resources are allocated
and dispatched entirely in the user process space, and do not
require the creation of UNIX processes. As a result, pthread
dispatching is more efficient.

The "Topics in IRIX Programming" document mentioned can be found at
<URL:http://techpubs.sgi.com/library/dynaweb_bin/ebt-bin/0650/
nph-infosrch.cgi/infosrchtpl/SGI_Developer/T_IRIX_Prog/
@InfoSearch__BookTextView/4>.

Look particularly at "Part Four: Models of Parallel Computation",
"Chapter 13, "Thread-Level Parallelism". Table 13-1 in that section
provides a fairly extensive comparison of POSIX threads, "Lightweight
Processes" [such as Irix's "sproc()"], and traditional Unix processes.
