Implementing coroutines with threads...but not rooting objects?

Charles Oliver Nutter

unread,

Sep 26, 2009, 1:41:19 AM9/26/09

to JVM Languages

Ok, here's a classic challenge for those of us on the JVM: coroutines.

In JRuby, the issue has been forced upon us by the inclusion of
somewhat localized coroutines for external enumeration in Ruby 1.8.7
and general coroutines in the form of fibers in 1.9. The two cases are
slightly different.

For external enumerating coroutines, we can have options that simulate
coroutine state. For example, the Enumerator for a core Array class
could know how to "next" through the array elements without using a
coroutine to call "each" with a block. But for other cases that may
have arbitrarily-complex iteration logic, we need a full-on coroutine
to hop in and out of that logic. And for these cases we must spin up a
thread to do the iteration.

For fibers, the logic is almost always going to be arbitrarily
complex, so they'll be a fiber in every case. We'll use pooling tricks
to reduce the overhead of spinning up fibers, but we can't avoid using
threads to implement them.

Functionally, spinning up threads where needed and ping-ponging state
back and forth is not a serious functional challenge. The challenge is
in managing the *lifecycle* of those threads.

The use of threads for simulating coroutines has a few immediate problems:

1. Threads are more expensive to spin up
2. Thread counts may be limited by the host platform
3. Threads root objects, for GC purposes
4. Threads will not GC until terminated

The latter two issues have been keeping me up nights. I'm looking for
suggestions.

Because Threads can root objects, and because we may want a fiber or
enumerator to GC before they've reached a terminal state (finished
enumerating/fiber exits), two major complexities arrive.

* The fiber/enum object can hold a hard reference to the thread, but
must define a finalizer so that when they are ready for collection
they can terminate the coroutine thread
* The thread must have a hard reference to the fiber/enum object, so
as not to interfere with it coming eligible for collection

Both of these stem from the realization that an incomplete
enumeration/fiber/generator can ultimately hold state forever, if the
thread doing that enumeration can't be killed once it would be
eligible for GC.

If it were possible to create "unrooting" threads, or perhaps threads
that only reference but do not root objects in their stacks, this all
become much easier...but I don't think there's a way currently to
create such threads, is there?

Are there other ways to implement this I've missed? Of course I know
about the approach Rife and Scala take for continuations in very
localized conditions, but they can't work across aribitrary libraries
that have not been similarly manipulated.

I'm now getting more desperate to see coroutines added to *any*
production-class JVM, even if I have to sign a waiver and pay a pound
of flesh to get at it.

- Charlie

John Cowan

unread,

Sep 26, 2009, 2:47:57 AM9/26/09

to jvm-la...@googlegroups.com

On Sat, Sep 26, 2009 at 1:41 AM, Charles Oliver Nutter
<hea...@headius.com> wrote:

> Ok, here's a classic challenge for those of us on the JVM: coroutines.

In order to know what you need, you need to say what *kind* of
coroutines these are. The Lua people have three binary features that
a given language's coroutines may or may not have:

1) Are coroutines first-class objects that can be passed around as
arguments or results and stored in data structures, or are they only
creatable and runnable using special syntax?

2) Can a coroutine resume any other coroutine (full coroutines,
symmetric coroutines), or only the coroutine that invoked it
(semi-coroutines, asymmetric coroutines, semi-symmetric coroutines)?

3) Can only the coroutine itself yield (shallow), or can any
subroutine invoked by the coroutine yield on its behalf (deep)?

Here's their paper on how it's done in Lua, which has first-class deep
semi-coroutines: http://www.inf.puc-rio.br/~roberto/docs/corosblp.pdf
.

--
GMail doesn't have rotating .sigs, but you can see mine at
http://www.ccil.org/~cowan/signatures

John Rose

unread,

Oct 6, 2009, 2:42:57 AM10/6/09

to jvm-la...@googlegroups.com

On Sep 25, 2009, at 10:41 PM, Charles Oliver Nutter wrote:

> Ok, here's a classic challenge for those of us on the JVM: coroutines.

Yes indeed.

FTR, here are some notes from our lively talk on this subject at the
JVM Language Summit:
http://wiki.jvmlangsummit.com/MotionsToContinue

> For external enumerating coroutines, we can have options that simulate
> coroutine state. For example, the Enumerator for a core Array class
> could know how to "next" through the array elements without using a
> coroutine to call "each" with a block. But for other cases that may
> have arbitrarily-complex iteration logic, we need a full-on coroutine
> to hop in and out of that logic. And for these cases we must spin up a
> thread to do the iteration.

Brian Goetz just pointed out Marcin Rzeznicki's project to me on this
subject. Check out this lovely samefringe code:
http://code.google.com/p/coroutines/source/browse/trunk/src/mr/go/coroutines/tests/classes/Tree.java

(RIFE probably offers the same thing; do they have a similarly elegant
samefringe demo?)

I think of such approaches "bytecode shredding". The idea is to
transform bytecodes to CPS after compilation. The problem is that it
is hard to know which bytecode to do this to; you don't want to abuse
the JVM by doing it to the bytecodes. I think integrating the
shredding into the JVM would be most efficient, since it would make
the transformation lazy and transparent.

> For fibers, the logic is almost always going to be arbitrarily
> complex, so they'll be a fiber in every case. We'll use pooling tricks
> to reduce the overhead of spinning up fibers, but we can't avoid using
> threads to implement them.
>
> Functionally, spinning up threads where needed and ping-ponging state
> back and forth is not a serious functional challenge. The challenge is
> in managing the *lifecycle* of those threads.
>
> The use of threads for simulating coroutines has a few immediate
> problems:
>
> 1. Threads are more expensive to spin up
> 2. Thread counts may be limited by the host platform
> 3. Threads root objects, for GC purposes
> 4. Threads will not GC until terminated
>
> The latter two issues have been keeping me up nights. I'm looking for
> suggestions.

A thread T1 could (perhaps) be GC-ed if it could be proven that T1 is
blocked, and that all references to T1 are unreachable, and (lastly)
that T1 could be unblocked only by some operation on a reachable
reference to T1. I doubt if any JVM has such logic in it, and I
suspect that Java's threading APIs might offer resistance to proving
the last statement. But maybe not.

Even if all that were to become workable, it would still be
unsatisfactory to be creating lots of fibers speculatively, and
letting the GC clean them up. The problem would be, not that the GC
couldn't clean them up, but that each one would be a short-lived
megabyte-scale object, even though, in the source code, it might
appear to have only a few words of interesting state.

> If it were possible to create "unrooting" threads, or perhaps threads
> that only reference but do not root objects in their stacks, this all
> become much easier...but I don't think there's a way currently to
> create such threads, is there?

No. A live thread is assumed to be capable of future side effects on
the whole system, so cannot be pruned or killed. (Except, maybe, as
T1 above.)

> Are there other ways to implement this I've missed? Of course I know
> about the approach Rife and Scala take for continuations in very
> localized conditions, but they can't work across aribitrary libraries
> that have not been similarly manipulated.

Yep; that's a subset of the "siloing" problem Neal and Josh were
insistent about at the Summit. But, how serious is this problem (in
your case)?

> I'm now getting more desperate to see coroutines added to *any*
> production-class JVM, even if I have to sign a waiver and pay a pound
> of flesh to get at it.

Send chocolate to Lukas Stadler and Marcin Rzeznicki!

We need to experiment more with this stuff. I am confident the
community can find graceful trade-offs to scale good demos like
Tree.samefringe into industrial-strength APIs. But it will require
experimentation with the corner cases. When the design settles out,
it will (presumably) be attractive for JVM vendors to optimize such
things.

On Sep 25, 2009, at 11:47 PM, John Cowan wrote:

> In order to know what you need, you need to say what *kind* of
> coroutines these are.

At the Summit we noted these degrees of freedom relevant to the JVM:
> • full or scoped
> • restartability: one-shot or repeated restart
> • reflectability: opaque, read-only, serializable (forgeable!)
> • mobility: thread-bound, JVM-bound, mobile

> The Lua people have three binary features that
> a given language's coroutines may or may not have:
>
> 1) Are coroutines first-class objects that can be passed around as
> arguments or results and stored in data structures, or are they only
> creatable and runnable using special syntax?

Any JVM-based coroutine is likely to be first-class. That in turn
leads to further questions of whether it is mobile and/or reflectable
in any way.

> 2) Can a coroutine resume any other coroutine (full coroutines,
> symmetric coroutines), or only the coroutine that invoked it
> (semi-coroutines, asymmetric coroutines, semi-symmetric coroutines)?

That's a good question. Does anyone know what the C# answer to that
is (for yielding iterators)?

> 3) Can only the coroutine itself yield (shallow), or can any
> subroutine invoked by the coroutine yield on its behalf (deep)?

To do samefringe (and IMO be useful) you need deep coroutines. And:
If coroutine C1 is going to call coroutine C2, which then will yield a
value back to C1's original caller, must C1 be compiled specially,
perhaps to call C2 via a special calling sequence? Or (this is
Charlie's siloing problem, I think) can C1 (and its coder and its
compiler) be ignorant of any coroutining, looking just like normal
Java code that happens to call C2?

When we finish figuring out the trade-offs, my guess is that we will
need C1 to be "knowing", and somehow to advertise to the JVM
(statically) that it is calling a coroutine C2.

> Here's their paper on how it's done in Lua, which has first-class deep
> semi-coroutines: http://www.inf.puc-rio.br/~roberto/docs/corosblp.pdf
> .

Lovely; thanks.

-- John

Neal Gafter

unread,

Oct 6, 2009, 10:49:45 AM10/6/09

to jvm-la...@googlegroups.com

On Mon, Oct 5, 2009 at 11:42 PM, John Rose <John...@sun.com> wrote:

> 2) Can a coroutine resume any other coroutine (full coroutines,
> symmetric coroutines), or only the coroutine that invoked it
> (semi-coroutines, asymmetric coroutines, semi-symmetric coroutines)?

That's a good question. Does anyone know what the C# answer to that
is (for yielding iterators)?

A "yield return" in a C# iterator returns control to the caller of MoveNext. That is the equivalent of returning control to the caller of Java's Enumerator.hasNext. In other words, to the invoker. However, the invoker isn't typically an iterator method (coroutine).

This is flexible enough to enable the creation of frameworks that simulate symmetric coroutines, though with a bit of boilerplate in clients at the yield points. Eliminating that boilerplate is one of the things I'm working on.

John Cowan

unread,

Oct 19, 2009, 5:05:28 PM10/19/09

to JVM Languages, John Cowan

On Oct 6, 2:42 am, John Rose <John.R...@Sun.COM> wrote:

> > 3) Can only the coroutine itself yield (shallow), or can any
> > subroutine invoked by the coroutine yield on its behalf (deep)?
>
> To do samefringe (and IMO be useful) you need deep coroutines. And:
> If coroutine C1 is going to call coroutine C2, which then will yield a
> value back to C1's original caller, must C1 be compiled specially,
> perhaps to call C2 via a special calling sequence? Or (this is
> Charlie's siloing problem, I think) can C1 (and its coder and its
> compiler) be ignorant of any coroutining, looking just like normal
> Java code that happens to call C2?

I think you have to statically mark one or the other of them, but I
don't think it matters much which one it is. C1 can say "I am
calling a coroutine" (so Lua), or C2 can say "I am a coroutine"
(so in CLU, the first language to do generators).

hlovatt

unread,

Oct 21, 2009, 6:56:17 AM10/21/09

to JVM Languages

Can you handle the lifetime by hiding the threads behind a closable
external iterator, e.g. the tree comparing example given in the Google
Code project Coroutine could be written:

package googlecodecoroutines;

import java.io.Closeable;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.concurrent.Callable;
import java.util.concurrent.Exchanger;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/**
* Thread version of Google Code Coroutines:
* http://code.google.com/p/coroutines/source/browse/trunk/src/mr/go/coroutines/tests/classes/Tree.java
*
* @author Howard Lovatt
*/
final class Tree<E extends Comparable<E>> {
private final TreeNode root;

public Tree( final E root ) { this.root = new TreeNode( root ); }

public static <T extends Comparable<T>> boolean sameFringe( final
Tree<T> t1,
final
Tree<T> t2 ) {
final SafeCloseableIterator<T> t1Leaves = t1.leaves
( t1.root ).iterator();
final SafeCloseableIterator<T> t2Leaves = t2.leaves
( t2.root ).iterator();

try {
while ( t1Leaves.hasNext() && t2Leaves.hasNext() ) {
if ( t1Leaves.next().compareTo( t2Leaves.next() ) != 0 ) {
return false;
}
}
return !t1Leaves.hasNext() && !t2Leaves.hasNext();
} finally {
t1Leaves.close();
t2Leaves.close();
}
}

public void insert( final E value ) { insert( root, value ); }

private void insert( final TreeNode node, E value ) {
if ( value.compareTo( node.key ) < 0 ) {
if ( node.left != null ) { insert( node.left, value ); }
else { node.left = new TreeNode( value ); }
} else {
if ( node.right != null ) { insert( node.right, value ); }
else { node.right = new TreeNode( value ); }
}
}

private Coroutine<E> leaves( final TreeNode tn ) {
return new Coroutine<E>( new LeavesYieldable( tn ) );
}

@Override public String toString() { return root.toString(); }

private final class LeavesYieldable extends Yieldable<E> {
final TreeNode tn;

LeavesYieldable( final TreeNode tn ) { this.tn = tn; }

@Override public void run() {
if ( tn.isLeaf() ) { yield( tn.key ); }
else {
if ( tn.left != null ) {
final SafeCloseableIterator<E> i = leaves( tn.left ).iterator
();
try {
while ( i.hasNext() ) {
yield( i.next() );
}
} finally {
i.close();
}
}
if ( tn.right != null ) {
final SafeCloseableIterator<E> i = leaves
( tn.right ).iterator();
try {
while ( i.hasNext() ) {
yield( i.next() );
}
} finally {
i.close();
}
}
}
close();
}
}

private class TreeNode {
final E key;

TreeNode left;

TreeNode right;

TreeNode( final E key ) { this.key = key; }

boolean isLeaf() { return ( left == null ) && ( right == null ); }

@Override public String toString() {
return "[" + String.valueOf( left ) + ", " + String.valueOf
( key ) +
", " + String.valueOf( right ) + "]";
}
}
}

// Infrastructure common to all coroutines
class Coroutine<E> implements SafeCloseableIterable<E> {
private static final Exception DONE = new Exception();

public static final ExecutorService pool =
Executors.newCachedThreadPool();

private final Yieldable<E> yieldable;

public Coroutine( final Yieldable<E> yieldable ) {
this.yieldable = yieldable;
}

@Override public SafeCloseableIterator<E> iterator() {
return new Iterator();
}

final class Iterator extends
AbstractSequentialSafeCloseableIterator<E> {
final Exchanger<E> exchanger = new Exchanger<E>();

private final Future<?> future = pool.submit( yieldable );

private Iterator() { yieldable.iterator = this; }

@Override public E call() throws Exception {
if ( future.isDone() ) { throw DONE; }
return exchanger.exchange( null );
}

@Override public void close() {
super.close();
future.cancel( true );
}
}
}

abstract class Yieldable<E> implements Runnable, SafeCloseable {
Coroutine<E>.Iterator iterator;

public final void yield( final E e ) {
try {
iterator.exchanger.exchange( e );
} catch ( InterruptedException notUsed ) {
iterator.close();
}
}

@Override public final void close() { iterator.close(); }
}

abstract class AbstractSequentialSafeCloseableIterator<E>
implements SafeCloseableIterator<E>, Callable<E> {
private boolean hasNextCalled = false;

private boolean hasNext;

protected E next;

@Override public boolean hasNext() {
if ( !hasNextCalled ) {
try {
next = call();
hasNext = true;
} catch ( Exception notUsed ) {
hasNext = false;
}
hasNextCalled = true;
}
return hasNext;
}

@Override public E next() {
if ( !hasNextCalled ) { hasNext(); }
if ( !hasNext ) { throw new NoSuchElementException(); }
hasNextCalled = false;
return next;
}

@Override public void remove() { throw new
UnsupportedOperationException(); }

@Override public void close() {
hasNextCalled = true;
hasNext = false;
}
}

interface SafeCloseable extends Closeable {
@Override void close();
}

interface SafeCloseableIterable<E> extends Iterable<E> {
@Override public SafeCloseableIterator<E> iterator();
}

interface SafeCloseableIterator<E> extends SafeCloseable, Iterator<E>
{}

// Test of Tree example
public class Main {
public static void main( final String[] notUsed ) {
final Tree<Integer> t1 = new Tree<Integer>( -1 );
final Tree<Integer> t2 = new Tree<Integer>( -1 );
for ( int i = 0; i < 6; i++ ) {
t1.insert( i );
t2.insert( i );
}
System.out.println( "t1 = " + t1 );
System.out.println( "t2 = " + t2 );
System.out.println( "Equal? " + Tree.sameFringe( t1, t2 ) );
Coroutine.pool.shutdownNow();
}
}

Note how the iterators are within a try block that closes them and the
act of closing them frees the thread. In the example above it isn't
important that the try blocks close the iterators since the iterators
terminate normally and hence are closed and the thread is freed in any
case. I don't know Ruby; so not sure if this helps, just a quick
suggestion.

Reply all

Reply to author

Forward