[YALEP] non-parallel channels


Dmitry Vyukov

Apr 8, 2013, 4:58:11 PM
to golang-dev
Yet another language extension proposal: non-parallel channels

The problem that it is intended to solve is runtime overheads of using
iterator/generator pattern. You can find it in parsers, complex data
structure traversal, etc. A simple example of iterator/generator is:

func BenchmarkGenerator(b *testing.B) {
	c := make(chan int)
	go func() {
		for i := 0; i < b.N; i++ {
			c <- i
		}
		close(c)
	}()
	sum := 0
	for v := range c {
		sum += v
	}
	_ = sum
}

Currently the performance is:
$ go test -v -cpu=1,2,4,8,16 -run=none -bench=Generator runtime
BenchmarkGenerator 10000000 186 ns/op
BenchmarkGenerator-2 5000000 622 ns/op
BenchmarkGenerator-4 5000000 432 ns/op
BenchmarkGenerator-8 5000000 603 ns/op
BenchmarkGenerator-16 5000000 531 ns/op

With the proposed non-parallel channels the per-operation cost could
probably be cut to under 100ns, regardless of GOMAXPROCS.

The only visible change to the language is the ability to specify
negative chan buffer size:
c := make(chan int, -1)

The semantics of such a chan are:
- receive and close work the same way
- send blocks until there is a request for the *next* value

This would allow switching directly from the consumer goroutine to the
producer goroutine and back, without scheduler overhead and without the
overhead of switching through the intermediate g0 goroutine.

The tricky case is what to do when such a channel is involved in several
send/receive operations or in select statements. The idea is that in
such cases it loses its non-parallel properties and behaves as a
synchronous chan. This ensures that the chan can be used mostly as a
drop-in replacement for synchronous chans. Some additional deadlocks
are still possible; e.g. the following code will deadlock:

c1 := make(chan int, -1)
c2 := make(chan int, -1)
go func() {
	for i := 0; i < b.N; i++ {
		c1 <- i
		c2 <- i
	}
	close(c1)
	close(c2)
}()
sum := 0
for v := range c1 {
	sum += v
	sum += <-c2
}

because the producer goroutine will be blocked on the c1 send, while
the consumer waits for a value on c2.

What do you think?

Rob Pike

Apr 8, 2013, 5:03:25 PM
to Dmitry Vyukov, golang-dev
I think you need to explain it better, illustrated with a complete
example. What do you mean by the "*next* value"? Or "request" for that
matter. In short, when does send block?

-rob

Ian Lance Taylor

Apr 8, 2013, 5:13:02 PM
to Dmitry Vyukov, golang-dev
On Mon, Apr 8, 2013 at 1:58 PM, Dmitry Vyukov <dvy...@google.com> wrote:
> Yet another language extension proposal: non-parallel channels
>
> The problem that it is intended to solve is runtime overheads of using
> iterator/generator pattern. You can find it in parsers, complex data
> structure traversal, etc. A simple example of iterator/generator is:
>
> func BenchmarkGenerator(b *testing.B) {
> c := make(chan int)
> go func() {
> for i := 0; i < b.N; i++ {
> c <- i
> }
> close(c)
> }()
> sum := 0
> for v := range c {
> sum += v
> }
> _ = sum
> }

I think that changing make as you propose will be difficult to
understand.

What if we instead introduce a new runtime function, waitForReceiver
(only with a better name):

go func() {
	for i := 0; i < b.N; i++ {
		c <- i
		waitForReceiver(c)
	}
	close(c)
}()

When the goroutine calls waitForReceiver it will block until some
other goroutine does a receive on the channel. When the goroutine
does a send, we record the receiving goroutine on the channel. If it
calls waitForReceiver we flip to the receiving goroutine. When the
receiving goroutine calls receive again, it sees the waitForReceiver
on the channel and flips to the sending goroutine. After the startup
hiccup we should have a smooth flip back and forth.

I think the semantics of waitForReceiver are clear. It's
unfortunately still hard to understand clearly when it will improve
efficiency. On the other hand it's easy to understand that it will
help a goroutine only compute an expensive value when somebody wants
it.

Ian

Dmitry Vyukov

Apr 8, 2013, 5:16:32 PM
to Rob Pike, golang-dev
Here is the full example (note -1 as chan buf size):

0: c := make(chan int, -1)
1: go func() {
2:     for i := 0; i < b.N; i++ {
3:         c <- i
4:     }
5:     close(c)
6: }()
7: for v := range c {
8:     sum += v
9: }

Consider that the consumer blocks on line 8 (chan recv) because the chan is empty.
Now the producer starts running and reaches line 3 (chan send). At this
point it unblocks the consumer but remains *blocked* (this is the
difference from synchronous chans; from the implementation's point of
view, this is a direct switch from the producer goroutine to the
consumer goroutine).
Now the consumer is unblocked and executes the loop iteration.
The consumer reaches line 8 (chan recv) again, and only at this point
does it unblock the producer (also implemented as a direct switch).
And so on.
When the producer closes the chan, it continues to run in parallel
with the consumer.

Dmitry Vyukov

Apr 8, 2013, 5:21:41 PM
to Ian Lance Taylor, golang-dev
It won't work, because there is a short period of time when both the
producer and the consumer are runnable. So we can't do a direct switch,
and another thread can potentially pick up the second goroutine (and it
must be able to).

What would work is an *atomic* send_and_wait_for_receiver, e.g.:
sendAndWaitForReceiver(c, v)
or:
c <<- v
or:
c <-- v
or whatever.
(unless, of course, the compiler can lower a chan send and a subsequent
waitForReceiver into a single runtime call)

Rob Pike

Apr 8, 2013, 5:21:44 PM
to Dmitry Vyukov, golang-dev
Oh, you just want control flow to be signaled directly by the
communication. I thought you were going for something more profound, a
kind of write-ahead to enable unbounded sends to be gated by the
receiver.

You're asking for an implementation detail to be made available for
performance, not a semantic change.

-rob

Brad Fitzpatrick

Apr 8, 2013, 5:23:17 PM
to Dmitry Vyukov, Ian Lance Taylor, golang-dev
Could you observe behavior at runtime and dynamically switch implementation strategies?







Rob Pike

Apr 8, 2013, 5:24:05 PM
to Brad Fitzpatrick, Dmitry Vyukov, Ian Lance Taylor, golang-dev
Or just use coroutines, which is what these are.

Dmitry Vyukov

Apr 8, 2013, 5:26:56 PM
to Rob Pike, golang-dev
On Mon, Apr 8, 2013 at 2:21 PM, Rob Pike <r...@golang.org> wrote:
> Oh, you just want control flow to be signaled directly by the
> communication. I thought you were going for something more profound,

Yes, this is not very profound.

> a
> kind of write-ahead to enable unbounded sends to be gated by the
> receiver.

Can you now explain better and illustrate? :)


> You're asking for an implementation detail to be made available for
> performance, not a semantic change.

It is not exactly an implementation detail, because it can introduce
additional deadlocks (I showed an example in the first post). But it is
close.
This is a way for the user to say "hey, I do not want
concurrency/parallelism on the two sides of this channel; I created it
merely to structure my code". Think of coroutines or your parser talk.

Rob Pike

Apr 8, 2013, 5:31:52 PM
to Dmitry Vyukov, golang-dev
Yes, this is coroutines. Maybe, as Brad suggests, coroutine behavior
can be discovered dynamically. I'm unsure about that.

What I was thinking about was the design of Doug McIlroy's power
series code, where channels represent streams of coefficients and we
want to merge and mix and match them while they're being generated.
Without great care, you can get runaway generation of the terms; his
solution was to use pairs of channels, one to signal when a value is
needed, the other to respond with the value. Buffered channels don't
work because the streams are unbounded; it's a long story best seen in
his paper.
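The demand-driven pairing described above can be sketched with today's channels. This is a minimal illustration of the request/response pattern, not McIlroy's actual power-series code; the name coefficients and the trivial integer stream are hypothetical stand-ins:

```go
package main

import "fmt"

// coefficients returns a demand-driven stream: the consumer sends on req
// to ask for the next term, then receives it from out. Sending false (or
// closing req, which yields false) tells the generator to stop.
func coefficients() (req chan<- bool, out <-chan int) {
	r := make(chan bool)
	o := make(chan int)
	go func() {
		for i := 0; ; i++ {
			if !<-r { // request channel says "stop"
				close(o)
				return
			}
			o <- i // generate the next term only on demand
		}
	}()
	return r, o
}

func main() {
	req, out := coefficients()
	sum := 0
	for i := 0; i < 5; i++ {
		req <- true // request the next term
		sum += <-out
	}
	req <- false // tell the generator to stop
	fmt.Println(sum) // prints 10 (0+1+2+3+4)
}
```

Because each term is produced only after it is explicitly requested, an unbounded stream never runs ahead of its consumer.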

-rob

Ian Lance Taylor

Apr 8, 2013, 5:35:14 PM
to Dmitry Vyukov, golang-dev
I thought about that, and probably I'm missing something, but I don't
see why the atomic send-and-switch is required.

Sender                        Receiver
c <- i                        read from range
        (both are runnable)
waitForReceiver               processing
                              read from range
                              see that waitForReceiver was called
                              flip to sender
processing
c <- i
  we got here from flip,
  so flip back to receiver
                              read completes
                              processing
                              read from range
                              flip to sender
waitForReceiver
  doesn't block
processing
c <- i
flip to receiver


In other words, once a goroutine calls waitForReceiver, we start
handling the channel differently. If any other goroutine sends or
receives on the channel, we drop that special handling and revert to
normal.

Ian

Dmitry Vyukov

Apr 8, 2013, 5:40:13 PM
to Rob Pike, Brad Fitzpatrick, Ian Lance Taylor, golang-dev
On Mon, Apr 8, 2013 at 2:24 PM, Rob Pike <r...@golang.org> wrote:
> Or just use coroutines, which is what these are.

How do I use coroutines in Go?


> On Mon, Apr 8, 2013 at 2:23 PM, Brad Fitzpatrick <brad...@golang.org> wrote:
>> Could you observe behavior at runtime and dynamically switch implementation
>> strategies?


This will introduce slowdowns for all (to collect stats and decide),
it's difficult to do reliably, and to prevent possible deadlocks
we still need to put the "conditionally" runnable producer onto the run
queue.

Andrew Gerrand

Apr 8, 2013, 5:41:33 PM
to Ian Lance Taylor, Dmitry Vyukov, golang-dev
Ian, if I'm reading you right, your sending and receiving sides are:

for {
	waitForReceiver(c)
	v := someComputation()
	c <- v
}
for {
	v := <-c
	doSomething(v)
}

But I would prefer to make this existing pattern more efficient:
for {
	<-next
	v := someComputation()
	c <- v
}
for {
	next <- true
	v := <-c
	doSomething(v)
}

I know that's more challenging, but any such optimizations would be more broadly useful than just this simple pattern.
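For reference, the second pattern above already runs on today's runtime without any changes; here is a self-contained sketch, with someComputation given a hypothetical body (a squaring function) and doSomething replaced by a running sum:

```go
package main

import "fmt"

// someComputation is a hypothetical stand-in for an expensive computation.
func someComputation(i int) int { return i * i }

func main() {
	next := make(chan bool)
	c := make(chan int)
	go func() {
		i := 0
		for range next { // compute only when a value is requested
			c <- someComputation(i)
			i++
		}
		close(c) // no more requests will arrive
	}()
	sum := 0
	for i := 0; i < 4; i++ {
		next <- true // request the next value
		sum += <-c
	}
	close(next) // consumer is done; producer exits cleanly
	fmt.Println(sum) // prints 14 (0+1+4+9)
}
```

Closing next also solves early termination for this pattern: the producer's range loop ends and the goroutine is not leaked.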

Dmitry's proposal does seem like exposing the runtime implementation. Hm.

Andrew

Dmitry Vyukov

Apr 8, 2013, 5:47:42 PM
to Ian Lance Taylor, golang-dev
Ah, I see, you basically also mark the chan as "non-parallel" but at a
different point and for a single sender goroutine (and for a single
subsequent send?), right?

The potential concern is:

c <- v
if nextValueIsCheap() {
	waitForReceiver(c)
}

I am not sure how to handle this. And even if we figure it out, it may
not be what the user has in mind...

Dmitry Vyukov

Apr 8, 2013, 5:51:31 PM
to Andrew Gerrand, Ian Lance Taylor, golang-dev
On Mon, Apr 8, 2013 at 2:41 PM, Andrew Gerrand <a...@golang.org> wrote:
> Ian, if I'm reading you right, your sending and receiving sides are:
>
> for {
> waitForReceiver(c)
> v := someComputation()
> c <- v
> }
> for {
> v := <-c
> doSomething(v)
> }
>
> But I would prefer to make this existing pattern more efficient:
> for {
> <-next
> v := someComputation()
> c <- v
> }
> for {
> next <- true
> v := <-c
> doSomething(v)
> }
>
> I know that's more challenging, but any such optimizations would be more
> broadly useful than just this simple pattern.


It's impossible in the general case, because we don't know whether the
user wants parallelism or not. In any of these examples the user may
actually want the goroutines to run in parallel, and we cannot just
make them non-parallel.

This would require a complex and complete source code analysis that can
say "this piece of code has no externally observable side effects and
requires no more than X virtual instructions, so don't bother to make
it parallel". I would say this is infeasible.

Maxim Khitrov

Apr 8, 2013, 5:54:23 PM
to Dmitry Vyukov, golang-dev
On Mon, Apr 8, 2013 at 4:58 PM, Dmitry Vyukov <dvy...@google.com> wrote:
> The only visible change to the language is the ability to specify
> negative chan buffer size:
> c := make(chan int, -1)

I was hoping that this syntax would one day be used to create an
unbounded buffered channel (i.e., a channel that never blocks on a
send). Bad idea?

> The semantics of such chan are:
> - recvieve and close work the same way
> - send blocks until there is a request for the *next* value

In my experience, the problem with using channels as generators is
early termination. If the next value is never requested, you have a
leak. To avoid this, the consumer has to create a second 'quit'
channel, and the producer selects between a send and a receive, which
would negate the benefits of this proposal. I would look through some
existing code to see how often the pattern that you're describing is
actually used.
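The quit-channel workaround described above looks roughly like this; the select in the producer is exactly the per-value overhead that the non-parallel proposal hopes to eliminate (a sketch, with hypothetical names):

```go
package main

import "fmt"

// generator returns a stream of ints. To avoid leaking the producer
// goroutine when the consumer stops early, the producer must select
// between sending the next value and hearing "quit".
func generator(quit <-chan struct{}) <-chan int {
	c := make(chan int)
	go func() {
		defer close(c)
		for i := 0; ; i++ {
			select {
			case c <- i:
			case <-quit:
				return // consumer went away; don't leak
			}
		}
	}()
	return c
}

func main() {
	quit := make(chan struct{})
	c := generator(quit)
	sum := 0
	for v := range c {
		sum += v
		if v == 4 {
			close(quit) // terminate the producer early
			break
		}
	}
	fmt.Println(sum) // prints 10 (0+1+2+3+4)
}
```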

Dmitry Vyukov

Apr 8, 2013, 5:55:06 PM
to Ethan Burns, golang-dev
On Mon, Apr 8, 2013 at 2:20 PM, Ethan Burns <burns...@gmail.com> wrote:
> This sounds very closely related to coroutines. Is that correct?

Absolutely, I just did not want to spell out the name so as not to
predispose you. I wanted to show how the existing Go concurrency
primitives can be moved in that direction. Instead of the traditional
coroutine identity/suspend/resume, I use the existing Go primitives to
achieve roughly the same goal.

Ian Lance Taylor

Apr 8, 2013, 5:58:11 PM
to Dmitry Vyukov, golang-dev
Yes.

> The potential concern is:
>
> c <- v
> if nextValueIsCheap() {
> waitForReceiver(c)
> }
>
> I am not sure how to handle this. And even if we figure out it may be
> not what the user has in mind...

Fair point. The only way to handle that is the same as above. The
code will work correctly, it just won't run as fast as possible. And
in particular you're right that it may not run as fast as the user
expects.

I think I still prefer your sendAndWaitForReceiver to changing the
argument to make, because I think the former will have fewer obscure
failure cases than the latter.

Ian

Dmitry Vyukov

Apr 8, 2013, 5:58:36 PM
to Maxim Khitrov, golang-dev
On Mon, Apr 8, 2013 at 2:54 PM, Maxim Khitrov <m...@mxcrypt.com> wrote:
> On Mon, Apr 8, 2013 at 4:58 PM, Dmitry Vyukov <dvy...@google.com> wrote:
>> The only visible change to the language is the ability to specify
>> negative chan buffer size:
>> c := make(chan int, -1)
>
> I was hoping that this syntax would one day be used to create an
> unbounded buffered channel (i.e., a channel that never blocks on a
> send). Bad idea?

I do think unbounded chans are a bad idea; we just need to fix the fact
that a chan with buffer size N consumes O(N) memory even if empty. Then
you can do:
c := make(chan *Req, 1<<30)
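For completeness, an unbounded (never-blocking-send) channel can already be emulated with an extra goroutine and a slice as the elastic buffer. This is a sketch of that workaround, not a proposal; the function name unbounded is hypothetical:

```go
package main

import "fmt"

// unbounded returns a channel pair: sends on in never wait for the
// consumer, and values arrive on out in FIFO order. A middleman
// goroutine holds the overflow in a slice that grows as needed.
func unbounded() (in chan<- int, out <-chan int) {
	inc := make(chan int)
	o := make(chan int)
	go func() {
		var buf []int
		src := inc
		for src != nil || len(buf) > 0 {
			var send chan int // nil channel: send case disabled
			var next int
			if len(buf) > 0 {
				send = o
				next = buf[0]
			}
			select {
			case v, ok := <-src:
				if !ok {
					src = nil // input closed; drain buf, then exit
					continue
				}
				buf = append(buf, v)
			case send <- next:
				buf = buf[1:]
			}
		}
		close(o)
	}()
	return inc, o
}

func main() {
	in, out := unbounded()
	for v := 0; v < 5; v++ {
		in <- v // never waits for the consumer
	}
	close(in)
	sum := 0
	for v := range out {
		sum += v
	}
	fmt.Println(sum) // prints 10 (0+1+2+3+4)
}
```

The cost is an extra goroutine and two handoffs per value, which is why fixing the O(N) buffer allocation would be the cheaper route.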


>> The semantics of such chan are:
>> - recvieve and close work the same way
>> - send blocks until there is a request for the *next* value
>
> In my experience, the problem with using channels as generators is
> early termination. If the next value is never requested, you have a
> leak. To avoid this, the consumer has to create a second 'quit'
> channel, and the producer selects between a send and a receive, which
> would negate the benefits of this proposal. I would look through some
> existing code to see how often the pattern that you're describing is
> actually used.


Good point. This is what Andrew showed in his example, right?

Kyle Lemons

Apr 8, 2013, 5:58:28 PM
to Dmitry Vyukov, Andrew Gerrand, Ian Lance Taylor, golang-dev
Naively, it seems like the first step to any of this would be to analyze (preferably in the compiler) every syntactically-known goroutine to determine whether, for each channel it receives as a parameter or closes over, it is a {Producer | Consumer | Producer+Consumer | Neither}, and annotate it.  Doing this work could potentially enable many other optimizations, and running the analysis on a portion of existing Go code could give us insight into how people actually use these patterns in real code.


Dmitry Vyukov

Apr 8, 2013, 6:01:10 PM
to Ian Lance Taylor, golang-dev
Well, yes, if you pass a "non-parallel" chan into some unknown library,
this can be problematic.
But this raises another question: how should sendAndWaitForReceiver
work with buffered chans? I think the same way as with unbuffered
chans.

Rob Pike

Apr 8, 2013, 6:01:32 PM
to Kyle Lemons, Dmitry Vyukov, Andrew Gerrand, Ian Lance Taylor, golang-dev
By "use coroutines" I meant, "design and build coroutines and use
them", which is in a way what you're trying to do already.

Dmitry Vyukov

Apr 8, 2013, 6:14:38 PM
to Rob Pike, Kyle Lemons, Andrew Gerrand, Ian Lance Taylor, golang-dev
On Mon, Apr 8, 2013 at 3:01 PM, Rob Pike <r...@golang.org> wrote:
> By "use coroutines" I meant, "design and build coroutines and use
> them", which is in a way what you're trying to do already.

Well, the hypothesis is that *this* is how coroutines may look in Go.

Goroutines solve most of the problems that coroutines solve in other
languages, so we don't need a full-fledged solution (it would overlap
with goroutines in lots of ways). The only part not covered is
non-parallel execution, direct context switches, and the performance
implications; this is what this proposal tries to solve. Another part
not covered is the ability to share data directly (because you know
that either the producer or the consumer, but not both, runs at any
point in time, so there are no data races), but I think that would make
the coding style too different from goroutine-based coding style and
make it impossible to switch back and forth.

Ethan Burns

Apr 8, 2013, 6:21:24 PM
to golan...@googlegroups.com, Dmitry Vyukov
Perhaps the generator could be tied more closely to the send end of the channel.  This way, when there are no more references to the receive end of the channel, the generator go(co?)routine can be collected:

// The 2nd argument accepts any function on send-only int channels.
// When such a channel is received from then the generator routine is used.
vals := make(<-chan int, func(ch chan<- int) {
    for i := 0; i < 100; i++ {
        ch <- i
    }
})

for v := range vals {
    fmt.Println(v)
}


Ethan

Dmitry Vyukov

Apr 8, 2013, 6:27:23 PM
to Ethan Burns, golang-dev
Nice!
I need to think more on this.
This is one step closer to coroutines, right? The chan is essentially
the coroutine identity: a send on the chan suspends the coroutine, a
receive on the chan resumes it.
... are there any cases where this needs to be more complex than this?
E.g. sends on several chans or something else?..

Ian Lance Taylor

Apr 8, 2013, 7:49:05 PM
to Dmitry Vyukov, golang-dev
On Mon, Apr 8, 2013 at 3:01 PM, Dmitry Vyukov <dvy...@google.com> wrote:
>
> Well, yes, if you pass "non-parallel" chan into some unknown library,
> this can be problematic.
> But this raises another question: how sendAndWaitForReceiver should
> work with buffered chans? I think the same way as with non-buffered
> chans.

Another issue is that it has to work with select, so that there is a
way to shut down a goroutine using this technique.

Which in turn suggests adding a new kind of select case, meaning "run
this case if some goroutine is waiting on the other end of this
channel".

select {
case ready(<-c1):
case ready(c2<-):
}

This is potentially useful by itself, as long as people understand
that merely because ready returns true it does not follow that the
actual channel operation will succeed.

And to make your coroutine case work, we need to also permit

select {
case c <- v, ready(<-c):
	v = /* new value */
case <-quit:
	return
}

where case A, B means "if this case is ready, run A, and then rerun
the select statement, replacing case A, B with case B."

But at this point it's gotten too complicated.

Ian

Sameer Ajmani

Apr 9, 2013, 12:22:01 PM
to Ian Lance Taylor, Dmitry Vyukov, golang-dev
Instead of overloading channels with this notion of the sender yielding to the receiver, what if we made yields explicit, like channel operations?  Yields look like unbuffered, receive-only channels:

func BenchmarkGenerator(b *testing.B) {
	c := make(chan int)
	y := make(yield)
	go func() {
		for i := 0; i < b.N; i++ {
			c <- i
			<-y
		}
		close(c)
	}()
	sum := 0
	for v := range c {
		sum += v
		<-y
	}
	_ = sum
}

A yield looks like a channel in the code but is provided purely for transferring control among the goroutines that receive from that yield.  When a goroutine receives from a yield, it blocks, and control transfers to one of the other goroutines that are also receiving from that yield (somewhat like passing a token over a buffered channel, but guaranteeing that you don't get the token back until some other goroutine has received from the yield).  Receiving from a yield is a communication operation that can be included in a select, so shutdown might look like:
defer close(c)
for i := 0; i < b.N; i++ {
	c <- i
	select {
	case <-y:
	case <-quit:
		return
	}
}
Receiving from a yield evaluates to nil.  Yields cannot be closed (or perhaps they can, which might provide a simpler way to shut down a collection of goroutines that share a yield).

It's unclear to me whether sharing a yield among more than two goroutines makes sense.

S
 


Ian Lance Taylor

Apr 9, 2013, 1:24:27 PM
to Sameer Ajmani, Dmitry Vyukov, golang-dev
On Tue, Apr 9, 2013 at 9:22 AM, Sameer Ajmani <sam...@golang.org> wrote:
> Instead of overloading channels with this notion of the sender yielding to
> the receiver, what if we made yields explicit, like channel operations?
> Yields look like unbuffered, receive-only channels:

It's an interesting idea but I don't think it helps with the runtime
optimization that Dmitriy is after. The issue is that the
optimization only works if we can assume that only one goroutine is
runnable at a time. Using your yield idea doesn't guarantee
that--after the channel send both goroutines are runnable. That means
that if the yield is only executed conditionally then the results
become difficult for the programmer to predict.

Ian

Dmitry Vyukov

Apr 10, 2013, 7:47:38 PM
to Ian Lance Taylor, Sameer Ajmani, golang-dev
Yes, Ian is right, this won't help with the optimization that I want to do.

Dmitry Vyukov

Apr 10, 2013, 7:49:50 PM
to Ian Lance Taylor, golang-dev
On Mon, Apr 8, 2013 at 4:49 PM, Ian Lance Taylor <ia...@golang.org> wrote:
> On Mon, Apr 8, 2013 at 3:01 PM, Dmitry Vyukov <dvy...@google.com> wrote:
>>
>> Well, yes, if you pass "non-parallel" chan into some unknown library,
>> this can be problematic.
>> But this raises another question: how sendAndWaitForReceiver should
>> work with buffered chans? I think the same way as with non-buffered
>> chans.
>
> Another issue is that has to work with select, so that there is a way
> to shut down a goroutine using this technique.
>
> Which in turn suggests adding a new kind of select case, meaning "run
> this case if some goroutine is waiting on the other end of this
> channel".
>
> select {
> case ready(<-c1):
> case ready(c2<-):
> }
>
> This is potentially useful by itself, as long as people understand
> that merely because ready returns true it does not follow that the
> actual channel operation will succeed.
>
> And to make your coroutine case work, we need to also permit
>
> select {
> case c <- v, ready(<-c):
> v = /* new value */
> case <-quit:
> return
> }

This code pattern is equivalent to only receiving from chan c, and
then closing c on the other end (instead of sending to quit). Right?
I was just thinking that if we limit the number of patterns, it can be
easier to support.

Dmitry Vyukov

Apr 10, 2013, 7:52:27 PM
to Andrew Gerrand, Ian Lance Taylor, golang-dev
On Mon, Apr 8, 2013 at 2:41 PM, Andrew Gerrand <a...@golang.org> wrote:
> Ian, if I'm reading you right, your sending and receiving sides are:
>
> for {
> waitForReceiver(c)
> v := someComputation()
> c <- v
> }
> for {
> v := <-c
> doSomething(v)
> }
>
> But I would prefer to make this existing pattern more efficient:
> for {
> <-next
> v := someComputation()
> c <- v
> }
> for {
> next <- true
> v := <-c
> doSomething(v)
> }

This seems to be solvable with an "atomic send and receive", because
goroutines send and then block on receive.

Dmitry Vyukov

Apr 10, 2013, 7:55:42 PM
to Ethan Burns, golang-dev
On Mon, Apr 8, 2013 at 3:21 PM, Ethan Burns <burns...@gmail.com> wrote:
What do other people think about this thing?

It is kind of strange and intrusive. But at least it solves both (1)
shutdown of the producer (just forget about it and it will be garbage
collected), and (2) explicitly requesting the next value, so expensive
values are not computed ahead of time.

Rob Pike

Apr 10, 2013, 7:59:58 PM
to Dmitry Vyukov, Ethan Burns, golang-dev
Right now, I don't really have the spare capacity to design new
language features.

-rob

Dmitry Vyukov

Apr 14, 2013, 8:05:24 PM
to Kevin Gillette, golang-dev, Ethan Burns
On Sat, Apr 13, 2013 at 8:24 PM, <extempor...@gmail.com> wrote:
As I understand it, a negative channel cap only differs from an unbuffered chan in that the negative cap would hint to (a modified) current runtime what a future compiler or runtime may be able to determine automatically. In that sense, it feels like the C++ strategy for solving performance problems.

Furthermore, needing to support an unbuffered fallback at runtime when more than two goroutines are involved means that we may not be able to take advantage of certain performance assumptions. If Dmitry's suggestion were accepted, it seems reasonable enough to panic if these special-cased channels do not fall within expected/assumed usage patterns. Alternatively, we could require the input code to fall within a given flow structure and abort compilation if it does not (similarly to how we handled return statements: as we understand the problem better, we can expand the allowed use cases).

Arguably, for-range loops comprise most of the use cases -- if we can optimize around that case, we may be able to achieve a 90/10 solution. Unfortunately, emulating generators with early termination requires either a quit chan, a semantic change, or codifying existing bad behavior into a convention: for example, it's currently bad behavior for a consumer to close a channel, though if that were codified into a generator idiom, then something like the following could be used:

func producer(i int) <-chan int {
   ch := make(chan int, -1)
   go func() {
      defer func() { recover() }()
      for {
         ch <- i
         i++
      }
   }()
   return ch
}

func consumer() {
   ch := producer(0)
   for i := range ch {
      fmt.Println(i)
      if i > 50 {
         close(ch)
      }
   }
}

With a few modifications, this works already: http://play.golang.org/p/7t4oT6OqUJ

Still, this is much more setup than one would want if generators were to be idiomatic. In particular, chan creation (since for a synchronous generator it would be unbuffered anyway) and explicit closing or signalling of early termination are burdensome. Ideally, you'd want something like:

func consumer() {
   // gen could be a builtin that takes a function call expression returning a single
   // named <-chan, makes that chan, and produces a sentinel'd panic in producer when
   // the following loop is exited. The runtime will implicitly catch the sentinel,
   // preventing program termination.
   for i := range gen(producer(0)) {
      fmt.Println(i)
      if i > 50 {
         break
      }
   }
}

func producer(i int) (ch <-chan int) {
   for {
      ch <- i
      i++
   }
}


This is very close to the previous proposal for:

vals := make(<-chan int, func(ch chan<- int) {
    for i := 0; i < 100; i++ {
        ch <- i
    }
})

right?

I would actually prefer the make(chan int, producer) form (over the gen(producer) form), because it makes it clear that the construct returns a chan. However, I think the exact syntax does not matter that much at this stage.


 

It's a bit more magic than I think any of us would like, though if generators are ever going to become first-class, I believe code volume will need to be kept about as minimal as demonstrated in the above example -- whether or not something like the above syntax is used, the current minimum code required involves too much "boilerplate" to make it non-tedious for frequent use.

I think the most ideal (and simplest) semantics would be: the producer does not execute any instructions until a receive on its channel has occurred, and then it only executes until its next send; at no time are both the consumer and the producer simultaneously executing (this is, for example, how Python treats generators). This doesn't address multiple consumers, though those use cases tend to complicate things enough that the current syntax and semantics are usually already ideal.

The comment text above is in terms of familiar concurrent go semantics, though an efficient implementation may not actually produce a separate goroutine (or would at least treat the pair of goroutines as a combined scheduling entity).

--------------------------------

Moving past the above... Assuming the current syntax and semantics are retained (rather than extended), it seems that using channels as the primary unit of analysis is problematic: just because a channel does not escape a set of two goroutines does not mean that those goroutines do not communicate across other channels as well. Transitive goroutine relationships may be a better unit of analysis -- if it can be determined that all channel uses within a given stretch of code are such that the graph of communicated-with goroutines is guaranteed to flow along at most one edge at a time (which probably implies that all the channels are unbuffered), then those goroutines can all be statically scheduled together. `6g -m` could be made to also indicate whether or not any concurrent goroutines were determined to form an isolated mutually-exclusive-execution network.

In terms of potential optimizations, all chans in the graph could be combined into a compact memory structure along the lines of:

struct coroutineChanSet {
  int chanId;
  union {
    // union of channel types
  } chanVal;
};

(since analysis would have already guaranteed one channel in use at a time, a tagged union could service all channels). Signalling wouldn't be needed since jmp's could be used to switch goroutines. If there's reason not to inline the goroutines (they're too big and are used in multiple contexts), it may be possible to generate a wrapping trampoline at compile time for each group of related goroutines that could transparently turn channel ops into some stack ops and a jmp. Of course, all this is just theory -- implementation seems like it'd be extremely difficult (non-escaping []byte <-> string optimization, for example, would probably involve far less effort yet have greater net benefit). Also, the set of scenarios in which the semantics are blatantly obvious is fairly limited: goroutines that assign nil to their channels in the middle of a select loop, for example, might not fit into this model.


Agreed that it's going to be a very complicated analysis and, as a consequence, most likely fragile.



 

IMO, it seems better to wait until we can make the compiler figure all this out (even if that never happens) rather than changing the language to have this today.

--

Dmitry Vyukov

Apr 15, 2013, 1:31:34 AM
to Kevin Gillette, golang-dev, Ethan Burns
On Sun, Apr 14, 2013 at 5:54 PM, <extempor...@gmail.com> wrote:
On Sunday, April 14, 2013 6:05:24 PM UTC-6, Dmitry Vyukov wrote:
This is very close to the previous proposal for:

vals := make(<-chan int, func(ch chan<- int) {
    for i := 0; i < 100; i++ {
        ch <- i
    }
})

right?

I would actually prefer the make(chan int, producer) form (over the gen(producer) form), because it makes it clear that the construct returns a chan. However, I think the exact syntax does not matter that much at this stage.

I wasn't actually intending to make a language proposal -- just demonstrating approximately the amount of code volume we'd need to get rid of this for practical use. The make proposal certainly fulfills that need, and is much less magical. 

There are still some additional deadlocks possible, e.g. the following code will deadlock
...
because the producer goroutine will be blocked on the c1 send, while the consumer waits for a value on c2.

That's the opposite of the behavior I'd expect (since just about anything with more than one channel outside of select will deadlock, even when reading and writing in a consistent order, which one would generally expect to avoid deadlock). I see why, from an implementation perspective, you'd want "blocks until there is a request for the *next* value": it forces something like lock-step execution without requiring the core scheduler to be aware of what you're doing. However, this still wouldn't prevent the producer/generator from running independently until it reaches the first non-parallel channel op.

Yes, and the same happens when the producer closes the chan -- it runs concurrently with the consumer.
I do not think this is a serious problem for performance (though it probably can be if the generator generates very few values). But your make/gen version avoids it entirely.

 
Making the scheduler aware of this execution flow would allow it to prevent simultaneous execution without the restrictive semantics.

The make-generator proposal, for example, could mark the implicitly launched goroutine as being execution-constrained to the given channel: the goroutine does not get scheduled until there is a blocking receive waiting on that channel

Yes, that's the point.
 
(so a select on that channel is never ready). Once such a recv operation begins, the recv'ing G blocks until the producing G is unscheduled (which in the case of 1 producer and 1 consumer, would be immediate); the producer G is then scheduled onto the same M that the consuming G was running on. Without a pre-emptive scheduler, this may be what happens already (at least for the initial iteration),

Yes, this is roughly what happens already with GOMAXPROCS=1, but there are still some additional overheads. With GOMAXPROCS>1, however, it currently degrades badly.
 

except that in this case the running G would be guaranteed to be unscheduled as soon as it crosses the communication boundary of the specially made channel.


Kyle Lemons

Apr 23, 2013, 7:09:19 PM4/23/13
to pab...@google.com, golang-dev, Kevin Gillette, Ethan Burns



On Tue, Apr 23, 2013 at 1:45 PM, <pab...@google.com> wrote:
Hello, I started discussing coroutines with Dmitry a few weeks ago.  Dmitry told me last week he had
moved the discussion to golang-dev with some ideas on the topic.  I've just managed to read through
all the messages.

It might be helpful to give my original points about coroutines that started the discussion, because
the discussion in golang-dev started in the middle with an implementation proposal for an unknown
mechanism, which people later deduced to be coroutines.

I want to start with two primitive execution mechanisms: thread and stack. A thread causes action in
a program and a stack is the mechanism used by a thread to perform sequential execution using
routines.

It is possible to draw a table of how these two basic mechanisms are combined to generate programming
language constructs:

                         stack
              |     no     |     yes
 -------------+------------+--------------
          no  | 1. routine | 2. coroutine
 thread ------+------------+--------------
         yes  | 3. ???     | 4. goroutine

1. routine   : use caller's thread and stack
2. coroutine : use caller's thread and coroutine's stack
3. ???       : use thread but no stack (difficult to imagine)
4. goroutine : use goroutine's thread and stack

What's nice about this table is that it suggests language constructs based on fundamental
mechanisms.  Not all combinations make sense, but ones that make sense are often difficult or
impossible to implement with the others.  Go has language constructs for 1 and 4 but not 2.

Coroutines are a powerful extension to sequential programming, i.e., the ability to retain BOTH data
and execution state between calls. Closures can retain data state between calls but not execution
state. Retaining execution state significantly simplifies writing call-backs, device drivers, and
other mechanisms in event-driven programming. And even with low-cost user-level threads,
event-driven programming is still useful and necessary in many situations.  Finally, it is a race
for two threads to simultaneously execute a coroutine as the stack is shared, so simultaneous
execution is undefined as for other unsafe shared accesses. Threads can take turns executing a
coroutine, but this kind of sharing is infrequent.
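A minimal Go sketch of the closure half of that distinction: the closure below retains data state (the counter) across calls, but every call re-enters at the top of the function literal, so there is no execution position to resume.

```go
package main

import "fmt"

// counter returns a closure that retains *data* state (n) between
// calls. What it cannot retain is *execution* state: every call starts
// at the top of the function literal rather than resuming after a yield.
func counter() func() int {
	n := 0
	return func() int {
		n++
		return n
	}
}

func main() {
	next := counter()
	fmt.Println(next(), next(), next()) // 1 2 3
}
```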

Coroutines are popular now in a number of languages, such as Python's "yield". Coroutines can be
stackless (actually single-frame) or stackful.  For example, coroutines in Python can yield and
retain state between the yields, but only in the single frame containing the "yield"; a stackless
coroutine cannot call out and have one of the called routines yield. A stackful coroutine can call
out and yield at any point, which requires a full stack.
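In Go terms, a goroutine feeding a channel already behaves like a stackful coroutine: in the sketch below (hypothetical names), the send that plays the role of yield sits inside a recursive call, several frames below the top of the producer's stack, which is exactly what a single-frame coroutine cannot do.

```go
package main

import "fmt"

// walk "yields" by sending on the channel from inside a recursive call.
// This requires a full stack for the producer, which the goroutine
// provides; a stackless (single-frame) coroutine could only yield from
// its top-level frame.
func walk(depth int, yield chan<- int) {
	if depth == 0 {
		return
	}
	yield <- depth
	walk(depth-1, yield)
}

func main() {
	c := make(chan int)
	go func() {
		walk(3, c)
		close(c)
	}()
	for v := range c {
		fmt.Println(v) // 3, then 2, then 1
	}
}
```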

So I have tried to give some justification for having coroutines in a programming language; if you
don't agree with these reasons, you can stop reading now.

Since Go does not have coroutines there are 3 options:

1. simulation using existing language mechanisms
2. extend language
   a) using an existing mechanism, like channels
   b) using new constructs


1. Simulation
=============

If a language can simulate a mechanism with reasonable syntax and performance, that is the preferred
path.

Crucial point: coroutines are sequential. Therefore, a simulation cannot display any form of
concurrency, which would generate non-program-order behaviour.

Attached are my sample simulations, which illustrate the issues. The larger question is: can the
final and most general simulation be packaged in a simpler form that is easier for programmers?
I used the simulation approach to implement a few attached example coroutine programs.
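For reference, the usual shape of such a simulation (a hedged sketch with hypothetical names, not the attached code): two unbuffered channels put the caller and the coroutine goroutine into strict alternation, so observable behaviour stays in program order even though a second goroutine supplies the coroutine's stack.

```go
package main

import "fmt"

// coroutine simulates a stackful coroutine: resume wakes the producer,
// yield hands a value back. Because both channels are unbuffered, the
// caller and the producer take strict turns; the producer only runs
// between a resume and the matching yield.
type coroutine struct {
	resume chan struct{}
	yield  chan int
}

func newCounter() *coroutine {
	co := &coroutine{make(chan struct{}), make(chan int)}
	go func() {
		count := 0
		for range co.resume { // park until the caller asks for a value
			co.yield <- count
			count++
		}
	}()
	return co
}

// next resumes the coroutine and waits for the value it yields.
// (In this sketch the producer goroutine is simply abandoned at exit.)
func (co *coroutine) next() int {
	co.resume <- struct{}{}
	return <-co.yield
}

func main() {
	co := newCounter()
	fmt.Println(co.next(), co.next(), co.next()) // 0 1 2
}
```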


2. Extend Language
==================

a) Dmitry and others showed how to extend the channel mechanism to achieve a better simulation, with
   possible additional functionality.

b) The final approach is to add new language constructs, e.g.:

co wholes func() int { // create coroutine
    count := 0
    for {
        yield count
        count += 1
        fmt.Println("X")
    }
}()
for n := 0; n < 10; n += 1 {
    fmt.Println(wholes())
}

   This CLEARLY needs more thought, so don't jump on me. However, a new-construct approach should
   give the best performance because it is a direct context switch between coroutines without having
   to play with channels and/or perform a Go-runtime schedule. Finally, there are no special cases
   with respect to channels. The thread of the goroutine executing the coroutine is the one
   manipulating the channel so no new semantics are required; the goroutine thread is just executing
   on a different stack.

straw man: `go` can take more than one argument, and they share the same G -- that is, only one can be executing at a time, communication that would block switches immediately without calling into the scheduler, etc.
bonus points: `co` is like `go`, but adds the goroutine to the current goroutine's G.

Seems simple and orthogonal, but *hands out torches*

jtc...@gmail.com

Dec 29, 2013, 1:41:17 PM12/29/13
to golan...@googlegroups.com, Kevin Gillette, Ethan Burns


On Tuesday, April 23, 2013 4:45:57 PM UTC-4, pab...@google.com wrote:

It is possible to draw a table of how these two basic mechanisms are combined to generate programming
language constructs:

                         stack
              |     no     |     yes
 -------------+------------+--------------
          no  | 1. routine | 2. coroutine
 thread ------+------------+--------------
         yes  | 3. ???     | 4. goroutine

1. routine   : use caller's thread and stack
2. coroutine : use caller's thread and coroutine's stack
3. ???       : use thread but no stack (difficult to imagine)
4. goroutine : use goroutine's thread and stack

3. ??? => a user-space trap/interrupt, perhaps for instruction emulation or extension. Not commonly supported/implemented, but it does exist.

 Jonathan