Goroutines

rjmpm...@gmail.com

Dec 10, 2013, 4:50:51 PM
to goph...@googlegroups.com
Hey Richard

Do you plan on supporting goroutines? If so, how do you plan to do it? Looking at the output of GopherJS, you are calling functions directly, i.e. the Go stack is mapped to the one and only JS stack. Will you have the compiler emit stack handling code "manually"?

Oh, and I just wanted to tell you it's a very interesting and cool project! :)


Cheers,
Michael

Richard Musiol

Dec 11, 2013, 6:32:41 AM
to goph...@googlegroups.com
Hey Michael,

yes, I plan to add goroutines eventually. The reason for mapping the Go stack onto the one and only JS stack is performance. I think that all other solutions would lose a lot of the JIT power of today's JS engines. My goal with GopherJS is to create something that can be used well in production environments, so performance matters.

I haven't figured out all the details, but my general plan for goroutines is to translate blocking calls in Go to what you would normally use in JS: callbacks. I plan to automatically add an additional callback parameter to functions that may block and rewrite the control flow of the calling functions so that the remaining part after the blocking call goes into the callback. Closures over local variables will come in handy for preserving their state. The most difficult aspect will be interfaces and their polymorphism: it is hard to determine whether a given interface method is blocking. Especially if you want to compile each package separately (as Go and also GopherJS do, for compile speed), you cannot tell at that point whether some type implementing the interface will later provide a blocking implementation of that method. So finding good tricks or heuristics here is the hardest part. Just treating everything as blocking would ruin performance, I guess.

I would love to hear if you have some good ideas. Currently I still have many other features to work on, so I will think more about these problems when the time comes. Goroutines are one of the coolest features of Go and I would love to bring them to the browser, but I think GopherJS already has a lot to offer without them, right?

Bye,
Richard

rjmpm...@gmail.com

Dec 12, 2013, 4:24:58 PM
to goph...@googlegroups.com
Hi Richard

It's nice to hear that goroutines are on your roadmap! Luckily you already have the other of my favourite Go features: interfaces ;) I like the idea of being able to write both front-end and back-end code in Go :)

> I haven't figured out all details, but my general plan for goroutines is to translate blocking calls in Go to what you would normally use in JS: Callbacks. I plan to automatically add an additional callback parameter to functions that may block and rewrite the control flow of the calling functions to put the remaining part after the blocking call into the callback.

I don't understand how this is going to work. The basic problem with n goroutines is that you have to keep track of n stacks while you only have one interpreter stack.
Assume you're employing the suggested heuristic rewriting scheme with callbacks. You have a function a() calling b() calling c(), which you deduce to be blocking for long enough to warrant use of your scheme. Now you have a() and b() on your interpreter's stack. For one thing, you need to split b() into multiple pieces with at least one "continuation" piece. I can imagine this working out. In order to clear the interpreter's stack for the execution of another goroutine, you need to unwind the stack: you need to "get rid" first of b(), then of a(). This implies either that call sites need to be aware of this unwinding or that you use some kind of non-local flow control*.
When the call to c() is finally ready to return, you call the continuation piece of b(). When this returns, it's not going to return to just after the call site of b() in a(), because a() is not on the stack. So you'd need to link it to a continuation piece of a(). And at that point you'd have had to manage your stacks manually anyway ;)

Unfortunately I cannot offer you any good ideas. There is a basic mismatch between the number of goroutine stacks, n, and the number of stacks at your disposal: one. My gut feeling tells me that you need to multiplex (in contemporary terms: virtualize :D) to get many out of one. In hardware this is typically achieved by someone (IRQ, OS, ...) changing the stack pointer to point to the stacks of different processes or threads, at least on processors that support this. On processors which do not realistically support switching stacks (6502, I'm looking at you) you either have one stack or you "virtualize" your stacks and emit stack-handling code yourself.


Cheers,
Michael

* Like exceptions.

Richard Musiol

Dec 17, 2013, 5:33:58 PM
to goph...@googlegroups.com
Hi Michael,

my approach will do it exactly the way blocking actions are handled today in Node.js. The function calling a blocking function will be split into the part before the blocking call and the part after it, which will be passed as a callback to the blocking function. This also applies to all outer functions, since a function calling a blocking function is itself a blocking function. So in your example, not only c() will be split into two parts, but also b() and a(). At the end of c(), the callback to the second part of b() is called, and at the end of b(), the callback to the second part of a(). So without the callbacks, function c() ends at the blocking call, function b() at the call to c(), and a() at the call to b(); thus, the stack can completely unwind when the blocking function is reached. At the top level there will be a scheduler, invoking the callback of c() as soon as the blocking call is done, thus invoking the whole callback chain c() -> b() -> a(), which will serve as a kind of substitute for a call stack.
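
To make the splitting concrete, here is a hand-written JS sketch of what this transformation could produce for the a() -> b() -> c() example. All names, including the blockingOp stand-in, are invented for illustration; this is not actual GopherJS output.

```javascript
const log = [];

// Stand-in for a truly blocking operation (e.g. a channel receive):
// it schedules the continuation and lets the whole JS stack unwind.
function blockingOp(done) {
  log.push("block");
  setTimeout(done, 0);
}

// c() is split at the blocking call; the part after it becomes the callback.
function c(done) {
  log.push("c: before block");
  blockingOp(() => {
    log.push("c: after block");
    done();
  });
}

// b() calls a blocking function, so it is itself blocking and gets the
// same treatment: its remainder is passed to c() as a callback.
function b(done) {
  log.push("b: before c()");
  c(() => {
    log.push("b: after c()");
    done();
  });
}

function a(done) {
  log.push("a: before b()");
  b(() => {
    log.push("a: after b()");
    done();
  });
}

a(() => log.push("a: returned"));
log.push("stack unwound"); // runs before the callback chain resumes
```

Note how every function returns as soon as blockingOp is reached, so the stack is empty while the goroutine is "blocked"; the callback chain then plays the role of the call stack when it resumes.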

This is how Node.js's nonblocking concurrency is designed. It is an innovative approach to the concurrency problem, but in my opinion it is more complicated than how, for example, Go solves the problem with goroutines. Especially in more complex applications, it is prone to its own kind of race conditions and concurrency bugs. That's why I hope that some day I can give developers of in-browser code the choice between nonblocking concurrency with JS and blocking concurrency with Go.

Bye,
Richard

rjmpm...@gmail.com

Dec 19, 2013, 4:38:30 PM
to goph...@googlegroups.com
Hi Richard

Thanks for the clarification. I think we are on the same page now :)

> This is how Node.js's nonblocking concurrency is designed.

Is there a way in Node.js to have this rewriting happen automatically? I'm aware of .NET's async/await mechanism, which does exactly the kind of rewriting you mentioned in your previous post. I don't know of any other environment that does this automatically.

I believe that asynchronous I/O is a hack symptomatic of a world where context building, keeping and switching are expensive. Doing I/O asynchronously makes sense on microcontrollers and in operating systems. For everything else, Go did the right thing: make context building, keeping and switching cheap, thus enabling a locally linear and sane programming style. I'm looking forward to being able to use it in the browser.

I dug through your code for a couple of hours. It's very dense, I like that :D


Cheers,
Michael

Andy Balholm

Feb 8, 2014, 12:13:24 PM
to goph...@googlegroups.com
As for how to implement calls to interface methods, I would suggest that you always use CPS (callbacks) at first (that is, when you first start to implement goroutines). Then you can later work on compiler optimizations that replace them with direct calls when the static type of the method receiver is known.

Actually, once you've implemented the CPS transformation, I think that all that will be needed to implement goroutines is to replace go f() with setTimeout(f, 0). (Although you will need to create a closure if f has parameters.) Channels will be more complicated, of course.
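
As a tiny sketch of that translation (hypothetical names; a compiler would generate something along these lines):

```javascript
// What `go f(1, 2)` could compile to when f takes parameters: the
// closure captures the arguments now, but the body runs on a later
// event-loop turn.
const results = [];

function f(a, b) {
  results.push(a + b);
}

// go f(1, 2) becomes:
setTimeout(() => f(1, 2), 0);

// The "main goroutine" keeps running first; as in Go, there is no
// guarantee the new goroutine runs immediately.
results.push("main continued");
```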

Kevin Gillette

Feb 9, 2014, 1:50:09 PM
to Andy Balholm, goph...@googlegroups.com
That sounds about right. For channels and other synchronization, a simple approach would be to have each make call for channels internally call into an emulated runtime, asking for an integer ID unique to channels; goroutine creation should also get an ID from a separate pool. Then all Go functions get broken up around scheduling points (if there are any) and end up as a sequence of JS functions wrapped in an outer closure -- the closure defines the local variables. Now, when a translated function reaches a channel communication point, it calls into the runtime, passing its goroutine ID and channel operations into an object keyed by channel ID which tracks ready state and read-and-write waiters. Care has to be taken to make sure that when a goroutine "resumes" it removes its channel associations from the runtime -- though as an optimization this could be deferred based on control flow, so that e.g. a select statement in a for loop doesn't need to continuously remove and re-add the same channels from the scheduler on each iteration, and can instead wait until exiting the loop.

Mutexes and other synchronization can simply be implemented in terms of channels internally (sync.Cond.Broadcast might require some forethought). runtime.Gosched and default clauses in selects could just be translated internally into a discarding receive from an unexported, closed global channel.

Also needed would be some way to track channel references (perhaps manual reference counting) so that the runtime structures don't grow indefinitely.
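
A minimal sketch of the ID-based registry described above (all names invented; just to illustrate the shape of the bookkeeping):

```javascript
// Each make(chan T) compiles to a runtime call that hands out an ID;
// blocked goroutines register their continuations under that ID.
const runtime = {
  nextChanID: 0,
  chans: {}, // chanID -> {readers: [], writers: []}

  makeChan() {
    const id = this.nextChanID++;
    this.chans[id] = { readers: [], writers: [] };
    return id;
  },

  // A goroutine blocking on a receive parks its continuation here.
  waitRecv(chanID, goroutineID, cont) {
    this.chans[chanID].readers.push({ goroutineID, cont });
  },

  // On resume, a goroutine must drop its stale registrations.
  unregister(chanID, goroutineID) {
    const c = this.chans[chanID];
    c.readers = c.readers.filter((w) => w.goroutineID !== goroutineID);
  },
};
```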

On Sat, Feb 8, 2014 at 10:13 AM, Andy Balholm <andyb...@gmail.com> wrote:
> As far as how to implement calls to interface methods, I would suggest that you always use CPS (callbacks) at first. (That is, when you first start to implement goroutines.) Then you can later work on compiler optimizations that replace them with direct calls when the static type of the method receiver is known.
>
> Actually, once you've implemented the CPS transformation, I think that all that will be needed to implement goroutines will be to replace go f() with setTimeout(f, 0). (Although you will need to create a closure if f has parameters.) Channels will be more complicated, of course.

--
You received this message because you are subscribed to the Google Groups "GopherJS" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gopherjs+u...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Andy Balholm

Feb 10, 2014, 11:15:29 AM
to Kevin Gillette, goph...@googlegroups.com

On Feb 9, 2014, at 10:50 AM, Kevin Gillette <extempor...@gmail.com> wrote:

> That sounds about right. For channels and other synchronization, a simple approach would be to have each make call for channels internally call into an emulated runtime asking for an integer ID unique to channels; goroutine creation should also get an ID out of a separate pool. Then, all Go functions get broken up around scheduling points (if there are any), and end up as a sequence of JS functions wrapped in an outer closure -- the closure defines the local variables. Now, when a translated function reaches a channel communication point, it calls into the runtime, passing its goroutine ID and channel operations into an object keyed by channel ID which tracks ready state and read-and-write waiters. Care has to be taken to make sure that when a goroutine "resumes" it removes its channel associations from the runtime -- though as an optimization this could wait based on control flow such that, e.g. a select statement in a for loop doesn't need to continuously remove and re-add the same channels from the scheduler on each iteration, and can instead just wait until exiting the loop.
>
> mutexes and other synchonization can simply be implemented in terms of channels internally (runtime.Cond.Broadcast might require some forethought). runtime.Gosched and default clauses in selects could just internally be translated into a discarding receive from an unexported, closed global channel.
>
> Also needed would be some way to track channel references (perhaps manual reference counting) so that the runtime structures don't grow indefinitely.

I wonder if that's more complicated than it needs to be. I don't think that either a goroutine or a channel needs to be a global object; there is probably no need for any global runtime state. So garbage collection could take care of unused channels, and probably of permanently blocked goroutines as well. (Since JS is single-threaded, let's take advantage of the simplifications that offers.)

What if a channel were just a JS object? When a goroutine comes to a channel send or receive that can't complete immediately, it would just add its current continuation (i.e. the function that corresponds to the remainder of the goroutine) to the channel's appropriate waiting list. If it encounters a synchronous channel operation that can complete immediately, it would take a continuation function from the waiting list, schedule its own continuation function with setTimeout(f, 0), and call the continuation from the waiting list.
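
A rough sketch of such a channel object for the unbuffered case (invented names; buffered channels and select are omitted):

```javascript
// An unbuffered channel as a plain JS object with two waiting lists.
function Channel() {
  this.sendQ = []; // waiting senders: {value, cont}
  this.recvQ = []; // waiting receivers: cont(value)
}

Channel.prototype.send = function (value, cont) {
  const recv = this.recvQ.shift();
  if (recv) {
    // A receiver is waiting: schedule our own continuation via the
    // event loop and hand the value to the receiver's continuation.
    setTimeout(cont, 0);
    recv(value);
  } else {
    this.sendQ.push({ value, cont }); // block: park on the waiting list
  }
};

Channel.prototype.recv = function (cont) {
  const send = this.sendQ.shift();
  if (send) {
    setTimeout(send.cont, 0); // wake the parked sender later
    cont(send.value);
  } else {
    this.recvQ.push(cont); // block: park on the waiting list
  }
};
```

There is no global registry here at all: a channel that nothing references is simply garbage-collected, along with any continuations parked on it.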

Kevin Gillette

Feb 10, 2014, 3:30:55 PM
to Andy Balholm, goph...@googlegroups.com
When I said global, I didn't mean window.global -- I meant a variable available to the compiled program, which conventionally would be wrapped in a JS closure. Since Go identifier resolution is a compile-time notion (it can't examine variables in the current or parent scopes at runtime), the "application" bits of the program would never have access to these internal names.

The design I proposed is probably more complex than it needs to be, but I proposed it with the cautious assumption that we would need a central runtime to do at least some things related to scheduling or channels, or that some things would be horribly inefficient without one. Of course the simpler way is easier to implement, and we might as well follow the Go mantra of profiling before eagerly optimizing.

One potential problem with the simpler design: we may have to do some tricks to get decentralized deadlock detection, and it could be costly. For example, we could have a periodic deadlock detector CAS a DID_WORK variable to false, which each continuation doing useful work would set to true -- if the detector notices a false value, it means a deadlock has occurred.
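
The check could be as simple as this sketch (since JS is single-threaded, an ordinary variable stands in for the CAS; names are invented):

```javascript
// DID_WORK heuristic: continuations mark progress, a periodic check
// flags a period in which no goroutine made any progress.
let didWork = false;

function markWork() {
  didWork = true; // called by each continuation doing useful work
}

function deadlockCheck() {
  const stalled = !didWork;
  if (stalled) {
    console.warn("possible deadlock: no goroutine made progress");
  }
  didWork = false; // reset for the next period
  return stalled;
}

// In a real runtime: setInterval(deadlockCheck, 1000);
```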

I did some experimentation and can confirm that, at least with V8, scheduling is FIFO for events scheduled to run immediately -- the W3C specs also indicate that tasks are to be scheduled onto a queue. Given this, we shouldn't have to worry about native scheduling causing one continuation to starve out the rest.

Kevin Gillette

Feb 10, 2014, 3:48:38 PM
to Andy Balholm, goph...@googlegroups.com
Followup: we need to be careful about directly calling a continuation from a channel's wait list, since that behavior could daisy-chain ad infinitum, producing an apparent hang in the application. The concurrent prime sieve would probably have this behavior, for example. I suspect some real applications would have this behavior as well. 

Also, with respect to VM scheduling, see: <http://dbaron.org/log/20100309-faster-timeouts>. We could detect at initialization the presence of postMessage, and use setTimeout otherwise.

setTimeout(f) is spec-defined as exactly equivalent to setTimeout(f, 0), though the same spec allows limitations on the minimum timeout -- so a requested zero delay usually gets clamped by policy to a few milliseconds, hence postMessage.
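
The feature detection could look like this sketch, based on the technique in the linked dbaron.org post (the marker string and function names are invented):

```javascript
// Prefer postMessage as a true zero-delay scheduling primitive;
// fall back to setTimeout where postMessage is unavailable.
let setZeroTimeout;

if (typeof window !== "undefined" && window.postMessage && window.addEventListener) {
  const queue = [];
  const MARKER = "zero-timeout-message"; // hypothetical marker string
  window.addEventListener("message", (e) => {
    if (e.source === window && e.data === MARKER) {
      e.stopPropagation();
      if (queue.length > 0) queue.shift()();
    }
  }, true);
  setZeroTimeout = (fn) => {
    queue.push(fn);
    window.postMessage(MARKER, "*"); // queues a task with no clamped delay
  };
} else {
  setZeroTimeout = (fn) => setTimeout(fn, 0); // fallback path
}
```

Either way the callback runs asynchronously on a fresh stack; only the per-call overhead differs.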

Andy Balholm

Feb 10, 2014, 4:19:33 PM
to Kevin Gillette, goph...@googlegroups.com
On Feb 10, 2014, at 12:30 PM, Kevin Gillette <extempor...@gmail.com> wrote:

> One potential problem with the simpler design: we may have to do some tricks to do decentralized deadlock detection, but it could be costly; for example, we could have a periodic deadlock detector CAS a DID_WORK variable to false, which each continuation doing useful work would have set to true -- if the detector notices a false value, it means a deadlock has occurred.

The only kind of deadlock I envision—a panic(“All goroutines are asleep!”) deadlock—would result in control returning to the browser’s main event loop, with no setTimeout events pending. I assumed that we wouldn’t try to catch that, since that is considered a fairly normal state of affairs in the browser.

> Followup: we need to be careful about directly calling a continuation from a channel's wait list, since that behavior could daisy-chain ad infinitum, producing an apparent hang in the application. The concurrent prime sieve would probably have this behavior, for example. I suspect some real applications would have this behavior as well.

I don’t understand why this would happen. In the concurrent prime sieve, the goroutine would go on to immediately block on another channel, but it would make a little progress each time. As long as our channel latency wasn’t too high, it would work.

> Also, with respect to VM scheduling, see: <http://dbaron.org/log/20100309-faster-timeouts>. We could detect at initialization the presence of postMessage, and use setTimeout otherwise.
>
> setTimeout(f) is spec-defined as exactly equivalent to setTimeout(f, 0), though the same spec suggests limitations on the minimum timeout -- so a requested zero-delay usually gets policy shifted to a few milliseconds of delay, hence postMessage.

I wasn’t aware of the minimum timeout. Yes, we would want to use a real zero timeout, with postMessage or something.

Kevin Gillette

Feb 10, 2014, 5:31:59 PM
to Andy Balholm, goph...@googlegroups.com
On Mon, Feb 10, 2014 at 2:19 PM, Andy Balholm <andyb...@gmail.com> wrote:

> The only kind of deadlock I envision—a panic(“All goroutines are asleep!”) deadlock—would result in control returning to the browser’s main event loop, with no setTimeout events pending. I assumed that we wouldn’t try to catch that, since that is considered a fairly normal state of affairs in the browser.

I hadn't considered that... Certainly there's a large "unknown", particularly regarding JS interop (e.g. we can't assume that all Go-related events are fully written in pure Go). That said, it may be useful to have the concept of a soft-deadlock ("coma-lock"?) that can be logged to console when some runtime variable is set -- to aid in weeding out soft-bugs.
>> Followup: we need to be careful about directly calling a continuation from a channel's wait list, since that behavior could daisy-chain ad infinitum, producing an apparent hang in the application. The concurrent prime sieve would probably have this behavior, for example. I suspect some real applications would have this behavior as well.

> I don’t understand why this would happen. In the concurrent prime sieve, the goroutine would go on to immediately block on another channel, but it would make a little progress each time. As long as our channel latency wasn’t too high, it would work.

I don't see how this would be affected by channel latency, since the algorithm you describe is deterministic in the context of the single-threaded JS VM. Anyway, I think you're right that even an unbounded (keep generating forever) concurrent prime sieve wouldn't result in an infinite series of direct calls, but minimizing hangs is critical, even at the cost of some throughput. For example, if we don't hand back control for, say, 1 second, then during that second clicks on buttons or use of the mouse scroll wheel would have no visual response at all -- hangs of 1 second render an app unusable. We shouldn't make programmers insert runtime.Gosched() calls just to avoid the possibility of channel ops chaining together for long periods of time; if our implementation could produce that effect at all, it'd be safer to use async calls at all communication points instead of sync calls.

That said, we can (and almost certainly should) implement these behaviors in internal runtime functions that the generated code calls, rather than inlining the channel-walking code directly into each continuation; centralized code makes centralized state much simpler to manage -- in this case a runtime-internal timestamp that marks when control was handed to the app code. Each call into the scheduling utility functions can check the duration since that timestamp and decide whether to call or schedule the next continuation. I suspect that 30-50ms is about the most we could get away with in a single batch -- anything that runs much longer than that should somehow, implicitly or explicitly, get run via a web worker or use runtime.Gosched just to granularize.
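
The centralized call-or-schedule decision might be sketched like this (the names and the 30ms budget are illustrative, per the 30-50ms estimate above):

```javascript
// Resume continuations directly while within a time budget; once the
// budget is spent, yield to the event loop so the browser can paint
// and handle input, then start a new batch.
const BATCH_MS = 30;
let batchStart = Date.now();

function runOrSchedule(cont) {
  if (Date.now() - batchStart < BATCH_MS) {
    cont(); // still within the budget: cheap direct call
  } else {
    setTimeout(() => {
      batchStart = Date.now(); // new batch begins after yielding
      cont();
    }, 0);
  }
}
```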

Kevin Gillette

Feb 10, 2014, 5:43:42 PM
to Andy Balholm, goph...@googlegroups.com



On Mon, Feb 10, 2014 at 3:31 PM, Kevin Gillette <extempor...@gmail.com> wrote:
> anything that runs much longer than that should somehow, implicitly or explicitly, get run via a web-worker or use runtime.Gosched just to granularize.

That is to say, long-running synchronous algorithms (something recursive or in a for loop, for example) may need special handling on the programmer's part; we don't want the programmer to need additional handling for concurrent algorithms composed of short-running steps.

Andy Balholm

Feb 10, 2014, 5:59:42 PM
to Kevin Gillette, goph...@googlegroups.com

On Feb 10, 2014, at 2:31 PM, Kevin Gillette <extempor...@gmail.com> wrote:

> I hadn't considered that... Certainly there's a large "unknown", particularly regarding JS interop (e.g. we can't assume that all Go-related events are fully written in pure Go).

I assume that we would have JS functions for sending and receiving on channels. This would mean, though, that even if all goroutines appear to be deadlocked on the Go side, a JS event might result in breaking the deadlock. So deadlock detection would be nearly impossible.

> For example, if we don't hand back control for, say, 1 second, then during that second, clicks on buttons, or use of the mouse scroll wheel would have no visual response at all -- hangs of 1 second render an app unusable, and we shouldn't make programmers insert runtime.Gosched() calls just to avoid the possibility of channel ops chaining together for long periods of time; if our implementation could produce that effect at all, it'd just be safer to use async calls on all communication points instead of sync calls.

Do you mean that when we send on a channel, the continuations from both the sending goroutine and the receiving goroutine get queued in the run loop (with postEvent or whatever)? That sounds like it would probably be a good solution.

Kevin Gillette

Feb 10, 2014, 6:24:23 PM
to Andy Balholm, goph...@googlegroups.com
On Mon, Feb 10, 2014 at 3:59 PM, Andy Balholm <andyb...@gmail.com> wrote:
>> For example, if we don't hand back control for, say, 1 second, then during that second, clicks on buttons, or use of the mouse scroll wheel would have no visual response at all -- hangs of 1 second render an app unusable, and we shouldn't make programmers insert runtime.Gosched() calls just to avoid the possibility of channel ops chaining together for long periods of time; if our implementation could produce that effect at all, it'd just be safer to use async calls on all communication points instead of sync calls.

> Do you mean that when we send on a channel, the continuations from both the sending goroutine and the receiving goroutine get queued in the run loop (with postEvent or whatever)? That sounds like it would probably be a good solution.

I was suggesting that as a starting point: it has no chance of artificially producing hangs in the browser. That said, it may be fast enough on its own: on my old 2004-era desktop machine with a modern Chrome, a variation on <http://dbaron.org/mozilla/zero-timeout> with 10k iterations indicates that postMessage adds about 80 microseconds of overhead per call (~840ms for the total run) relative to a direct call. Compare this to 4.1ms per call using setTimeout (41.3 seconds for the whole run). While 80us is abysmal for systems-programming-hat Go (serving potentially thousands of users), it's entirely reasonable as "scheduling overhead" for browser-based applications (serving one user) -- on typical modern or semi-modern machines, postMessage would of course complete much more quickly than in my results.

Kevin Gillette

Feb 10, 2014, 6:27:00 PM
to Andy Balholm, goph...@googlegroups.com
s/2004/2006/