Buffered channel that never blocks on a send

Maxim Khitrov

May 25, 2012, 10:24:11 PM
to golang-nuts
Hello,

What are my options if I need a channel that would never block on a
send? Ideally, I need a channel with a dynamically expanding buffer
that could accommodate as many items as necessary (until the
app/system runs out of memory, basically). As far as I can tell, there
are no existing methods for accomplishing this.

The solution that I have in mind is to create two channels, a linked
list (container/list), and a goroutine. The goroutine would receive
values on the first channel and try to send them on the second. When a
send would block, the value is appended to the list instead. The
goroutine then does a select on both channels, appending newly
received values to the list until the send channel becomes available.
After that, all values in the list are sent out until the list is
empty or the send channel becomes full again. Repeat the process until
the inbound channel is closed.
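
In rough code, I mean something like this (untested sketch; Type is a placeholder for the real element type):

    import "container/list"

    type Type interface{} // placeholder

    // pipe is always ready to receive on in, so sends to in never block
    // for long; overflow is parked in the list until out can accept it.
    func pipe(in <-chan Type, out chan<- Type) {
        pending := list.New()
        for in != nil || pending.Len() > 0 {
            if pending.Len() == 0 {
                v, ok := <-in
                if !ok {
                    break // in closed and nothing left to send
                }
                pending.PushBack(v)
                continue
            }
            select {
            case v, ok := <-in:
                if !ok {
                    in = nil // stop receiving, drain what's left
                    continue
                }
                pending.PushBack(v)
            case out <- pending.Front().Value.(Type):
                pending.Remove(pending.Front())
            }
        }
        close(out)
    }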

I think this will work, but it seems like a complicated solution for
something that ought to be very simple. Any suggestions?

- Max

John Asmuth

May 25, 2012, 11:32:19 PM
to golan...@googlegroups.com
http://github.com/kylelemons/iq

You'll have to replace "Type" with your actual type.

Maxim Khitrov

May 26, 2012, 10:11:45 AM
to John Asmuth, golan...@googlegroups.com
Thanks. That looks pretty much exactly like what I was planning,
except for using a slice instead of a linked list. I'm not sure that
this is a good idea. All you need is constant-time push/pop operations
in FIFO order. A slice will not only perform unnecessary copy
operations, but it will also keep references to items that have been
popped, at least until it runs out of space during an append.

- Max

John Asmuth

May 26, 2012, 10:23:00 AM
to golan...@googlegroups.com, John Asmuth
Two things you can do here. One, you can zero out items after they're popped. That takes care of the GC issues. Two, you can use a circular array (ring.go) to avoid unnecessary copy operations.
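
For the zeroing, the pop would look something like this (sketch; Type stands for your element type):

    v := queue[0]
    var zero Type
    queue[0] = zero   // clear the slot so the GC can reclaim the item
    queue = queue[1:]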

⚛

May 26, 2012, 11:25:07 AM
to golan...@googlegroups.com
I believe this is a wrong idea because you won't have control over scheduling. You need to tell Go's scheduler when to run the goroutine which is generating values and when to run the goroutine which is processing the values.

Blocking is a scheduling decision.

On Saturday, May 26, 2012 4:24:11 AM UTC+2, Maxim Khitrov wrote:
Hello,

What are my options if I need a channel that would never block on a
send?

There are no options. You need to write a program that will be making scheduling decisions.

Steven Blenkinsop

May 26, 2012, 11:57:27 AM
to ⚛, golan...@googlegroups.com
On Saturday, May 26, 2012, ⚛ <0xe2.0x...@gmail.com> wrote:
I believe this is a wrong idea because you won't have control over scheduling. You need to tell Go's scheduler when to run the goroutine which is generating values and when to run the goroutine which is processing the values.

Blocking is a scheduling decision.

I don't think this is really relevant to Maxim's inquiry. Strictly speaking, you cannot guarantee that a goroutine won't be interrupted without controlling the scheduler. However, as far as program logic is concerned, a goroutine is really only blocked if it cannot be scheduled.

Of course, in John's example, the goroutine will block waiting for the buffering goroutine to receive values. However, this isn't really important in a larger sense because the buffering goroutine itself *can't* block, other than occasionally when waiting to receive a value. Since at least one of the sending goroutine and the buffering goroutine can always be scheduled, you can guarantee that your program won't deadlock when you send on the channel, which is really what you care about as far as correctness is concerned.

Maxim Khitrov

May 26, 2012, 12:10:04 PM
to ⚛, golan...@googlegroups.com
I may not be dealing with separate goroutines. What I mean is that my
program could be using a channel from the same goroutine, just as a
way of passing values from one part of the program to another. A
blocked send could cause a deadlock, since there may not be any other
goroutine reading from the channel at that time. To avoid this, I need
a guarantee that a send will never block, even if the channel's buffer
becomes full.

Here's what I came up with as a working demo:

http://play.golang.org/p/HXjmmRXl6k

- Max

⚛

May 26, 2012, 1:23:22 PM
to golan...@googlegroups.com, ⚛
If sender and receiver are in the same goroutine you should use an array, slice, fifo, ...

A channel implies at least two goroutines.

Thomas Bushnell, BSG

May 26, 2012, 2:42:31 PM
to ⚛, golan...@googlegroups.com
Huh? It certainly doesn't; there are plenty of uses of channels that don't involve goroutines.

Thomas

Alexey Borzenkov

May 26, 2012, 4:39:59 PM
to Maxim Khitrov, golang-nuts
On Sat, May 26, 2012 at 6:24 AM, Maxim Khitrov <m...@mxcrypt.com> wrote:
> The solution that I have in mind is to create two channels, a linked
> list (container/list), and a goroutine. The goroutine would receive
> values on the first channel and try to send them on the second. When a
> send would block, the value is appended to the list instead. The
> goroutine then does a select on both channels, appending newly
> received values to the list until the send channel becomes available.
> After that, all values in the list are sent out until the list is
> empty or the send channel becomes full again. Repeat the process until
> the inbound channel is closed.

I recently had the same problem, and the most efficient solution in my
case turned out to be a channel of "message batches"/slices (you could
use a linked list if you want) with a buffer size of 1, instead of two
channels (this gave me the most control). Basically, you do this:

ch := make(chan []Message, 1)

There must be only one true sender and one true receiver: the sender
both receives and sends, while the receiver only receives. In the
sender, you try to receive the current slice: if you get one, you
append to it and put it back; otherwise, you make a new slice and put
that in the channel. The receiver takes whole "batches" and processes
them. Here's an example:

https://gist.github.com/2714090
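
In outline, the two sides look something like this (my rough sketch, not the exact gist code; process is a hypothetical handler):

    type Message struct{ /* … */ }

    // send folds m into the pending batch; there must be exactly one
    // goroutine calling it. The put-back cannot block, because only
    // the sender ever fills the one-slot channel.
    func send(ch chan []Message, m Message) {
        select {
        case batch := <-ch: // a batch is already waiting; grow it
            ch <- append(batch, m)
        default: // channel empty; start a new batch
            ch <- []Message{m}
        }
    }

    // receive takes whole batches and processes them in order.
    func receive(ch chan []Message, process func(Message)) {
        for batch := range ch {
            for _, m := range batch {
                process(m)
            }
        }
    }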

You can also put a device (an adapter goroutine) around it, but that
has its caveats, the biggest being that close() on the sending side
would leak memory unless the receiver reads everything (because the
flattening goroutine will keep the data from being garbage
collected). Here's an example:

https://gist.github.com/2721869

You may also read this discussion, which covers many other solutions:

https://groups.google.com/forum/?fromgroups#!topic/golang-nuts/SgcUteZlmzQ

kortschak

May 26, 2012, 5:29:40 PM
to golan...@googlegroups.com, ⚛
Yes, Rob specifically talks about this in the lexer talk.


On Sunday, 27 May 2012 04:12:31 UTC+9:30, Thomas Bushnell, BSG wrote:
Huh? It certainly doesn't; there are plenty of uses of channels that don't involve goroutines.

Thomas

Carlos Castillo

May 29, 2012, 4:31:53 PM
to golan...@googlegroups.com
If you are only worried about the sending code blocking, and don't care about the order of the elements, do the send in a new goroutine:

type Chan chan Type // Type is your element type

func (ch Chan) send(val Type) {
    go func() { ch <- val }() // only this new goroutine can block
}

If the receiver isn't ready, the only code that blocks is the new goroutine, not your calling code. The goroutines generated this way will sit around waiting for the receiver to eventually get to their send. The only downsides are that it uses more memory per value sent (i.e., the minimum stack size), and that, as far as I can tell, there is nothing in the spec that guarantees any order.

Maxim Khitrov

May 29, 2012, 4:48:28 PM
to Carlos Castillo, golan...@googlegroups.com
Interesting idea, but the order of sends is important to me.

- Max

Paul Borman

May 29, 2012, 5:44:20 PM
to Maxim Khitrov, John Asmuth, golan...@googlegroups.com
I would not worry about the slice vs. linked list. It is only a problem if the consumer cannot keep up with the producer. The amount you will keep around is the longest run where the consumer does not empty the queue. If you are worried about that problem, then perhaps there is a meta-issue to worry about.

"We'll lose a little on every sale and make it up in volume."

    -Paul

John Asmuth

May 30, 2012, 9:48:50 AM
to Paul Borman, Maxim Khitrov, golan...@googlegroups.com
What I worry about is slice-or-linked-list vs a ring buffer. With a slice or with a linked list, if you send 10m items through the queue, there will be 10m slots allocated. Not all at the same time, of course, but if you add up all the new(ListNode) and make([]T, X), it will be 10m. With a ring buffer, you will only have to allocate enough to match the peak load. After that peak load has been hit, you can continue to use the same ring forever.

Steven Blenkinsop

May 30, 2012, 10:44:24 AM
to John Asmuth, Paul Borman, Maxim Khitrov, golan...@googlegroups.com
On Wednesday, May 30, 2012, John Asmuth wrote:
What I worry about is slice-or-linked-list vs a ring buffer. With a slice or with a linked list, if you send 10m items through the queue, there will be 10m slots allocated. Not all at the same time, of course, but if you add up all the new(ListNode) and make([]T, X), it will be 10m. With a ring buffer, you will only have to allocate enough to match the peak load. After that peak load has been hit, you can continue to use the same ring forever.

Alternatively, you could have two slices, one for the producer to fill and one for the consumer to empty, with the two swapping whenever the consumer runs out. That way you still stop allocating once you hit peak load, but you don't have to manage a ring.
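
Roughly, with made-up names (untested sketch; Type is your element type):

    // A two-slice queue: push appends to fill, pop reads from drain;
    // when drain runs out, the slices swap, so in steady state the
    // same two backing arrays are reused and allocation stops at peak.
    type swapQueue struct {
        fill, drain []Type
        pos         int // next unread index into drain
    }

    func (q *swapQueue) push(v Type) { q.fill = append(q.fill, v) }

    func (q *swapQueue) pop() (v Type, ok bool) {
        if q.pos == len(q.drain) {
            // drain exhausted: swap, reusing its storage for new pushes
            q.fill, q.drain = q.drain[:0], q.fill
            q.pos = 0
            if len(q.drain) == 0 {
                return v, false // both sides empty
            }
        }
        v = q.drain[q.pos]
        var zero Type
        q.drain[q.pos] = zero // drop the reference for the GC
        q.pos++
        return v, true
    }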

John Asmuth

May 30, 2012, 10:45:13 AM
to Steven Blenkinsop, Paul Borman, Maxim Khitrov, golan...@googlegroups.com
Something like that can certainly work, but I don't see how it's easier than managing a ring :)

Paul Borman

May 30, 2012, 10:50:22 AM
to John Asmuth, Maxim Khitrov, golan...@googlegroups.com
There will only be 10m items if the consumer is never able to catch up to the producer from the first element on. As soon as the consumer catches up, a new slice will be used. I am not saying a ring buffer is not a good solution, only that if the slice implementation has the feared consequence, the real issue is probably not the use of the slice but the relationship of the producer to its consumer(s).

I was actually concerned about this same issue with the slice implementation until I thought about the failure case and realized that the failure was indicative of something worse.

John Asmuth

May 30, 2012, 10:52:32 AM
to Paul Borman, Maxim Khitrov, golan...@googlegroups.com
I'm not saying 10m items in memory at the same time. I'm just saying that 10m items will be allocated in total. For instance, 1m slices of length 10, only one of which is ever actually in memory, but all of which need to be allocated. This puts a load on the GC. With a ring buffer, it could be 10 items allocated, in total.

Thomas Bushnell, BSG

May 30, 2012, 10:57:17 AM
to John Asmuth, Paul Borman, Maxim Khitrov, golan...@googlegroups.com
GC load should be proportional to the amount of live data, not the total allocation.

(This is why GC is asymptotically zero cost, and thus always superior to reference counting.)

Ingo Oeser

May 30, 2012, 11:05:51 AM
to golan...@googlegroups.com, Steven Blenkinsop, Paul Borman, Maxim Khitrov
On Wed, May 30, 2012 at 10:44 AM, Steven Blenkinsop <stev...@gmail.com> wrote:
Alternatively, you could have two slices, one for the producer to fill and one for the consumer to empty, with the two swapping whenever the consumer runs out. That way you still stop allocating once you hit peak load, but you don't have to manage a ring.

On Wednesday, May 30, 2012 4:45:13 PM UTC+2, John Asmuth wrote:
Something like that can certainly work, but I don't see how it's easier than managing a ring :)

 
It actually is. Think of it as a two-element FIFO; it's just that the elements are complete lists of work.

You can also easily avoid over-scheduling with this pattern: make the lists longer if the consumer finishes too fast, or shorter if you use too much buffer memory or run into latency issues. All dynamically.

Just let the consumer send the list of finished work back to the producer via a channel, and have the producer respond with the other list of items it has already enqueued.

The amount of communication/scheduling/locking overhead is reduced by exactly the length of the list.
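
A rough sketch of that exchange (my names; Type, fill, and process are placeholders):

    full := make(chan []Type)     // producer -> consumer: batches of work
    spare := make(chan []Type, 1) // consumer -> producer: emptied storage
    spare <- make([]Type, 0, 64)  // prime the cycle with a second buffer

    go func() { // consumer
        for batch := range full {
            for _, v := range batch {
                process(v)
            }
            spare <- batch[:0] // hand the storage back, length reset
        }
    }()

    buf := make([]Type, 0, 64) // producer
    for {
        buf = fill(buf) // append a batch of items
        full <- buf     // hand it off
        buf = <-spare   // continue with the other buffer
    }

After startup the two buffers just keep trading places, so there are no further allocations.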

Maxim Khitrov

May 30, 2012, 12:10:53 PM
to John Asmuth, Paul Borman, golan...@googlegroups.com
My implementation doesn't allocate anything if the received value can
be sent out immediately. With a slightly larger buffer for both
channels, a roughly equal send/receive rate will bypass all
linked-list code. The same modification can be made to the slice and
ring versions.

I took a look at the RingIQ code and, unless I'm missing something,
that ring will never shrink. A one-time backlog could grow the ring to
a large size and it will stay there as long as the goroutine is alive.
This seems wasteful.

Here's my self-contained ring implementation that will shrink when the
total number of elements drops to half the ring size:

http://play.golang.org/p/P8hzE0zt12

This code also zeroes out ring elements to avoid keeping references (if
Type is a pointer, for example). It seems like the best option if the
goal is to keep the number of allocations down without wasting too
much space.
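
The shrink step is roughly this (a simplified sketch, not the exact playground code):

    type ring struct {
        buf   []Type // circular storage
        head  int    // index of the oldest element
        count int    // number of live elements
    }

    // called after removing an element in pop:
    func (r *ring) maybeShrink() {
        const minSize = 16 // hypothetical floor
        if r.count <= len(r.buf)/2 && len(r.buf) > minSize {
            smaller := make([]Type, len(r.buf)/2)
            for i := 0; i < r.count; i++ {
                smaller[i] = r.buf[(r.head+i)%len(r.buf)]
            }
            r.buf, r.head = smaller, 0
        }
    }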

- Max

roger peppe

May 30, 2012, 12:15:40 PM
to Maxim Khitrov, John Asmuth, Paul Borman, golan...@googlegroups.com
i wrote a blog post around this topic a while ago.
the code is pre-Go 1, but the principles are
still the same, i think.

http://rogpeppe.wordpress.com/2010/02/10/unlimited-buffering-with-low-overhead/

DisposaBoy

May 30, 2012, 2:37:38 PM
to golan...@googlegroups.com, Paul Borman, Maxim Khitrov


On Wednesday, May 30, 2012 3:52:32 PM UTC+1, John Asmuth wrote:
I'm not saying 10m items in memory at the same time. I'm just saying that 10m items will be allocated in total. For instance, 1m slices of length 10, only one of which is ever actually in memory, but all of which need to be allocated. This puts a load on the GC. With a ring buffer, it could be 10 items allocated, in total.


FWIW, the cost of GC and allocation, at least in the gc linux/amd64 implementation, is probably not as high as one would assume. I did some research into implementing an efficient bitset around last Christmas. While the existing implementations seemed to be aimed at APIs that encourage you to select the ideal size of the set up front, I just went ahead and allocated small slices on the order of millions instead of one big slice, and my implementation both used less peak memory and was faster. Of course, YMMV.

Kyle Lemons

May 30, 2012, 5:07:36 PM
to Maxim Khitrov, John Asmuth, Paul Borman, golan...@googlegroups.com
You're not missing anything. I think you'll find that there is very rarely such a thing as a "one-time backlog"; in my experience, when there is a backlog it's either related to load or periodic in nature, in which case not having to reallocate the large ring (whose old memory is potentially fragmented now) when the condition recurs is a net win for resident size. If you want the lowest memory footprint, use the slice iq version. I have been using that one for a long time and it hasn't been a performance/memory bottleneck, but there are some easy wins like reusing the slice in the one-element case.

Maxim Khitrov

May 31, 2012, 3:17:13 PM
to Kyle Lemons, John Asmuth, Paul Borman, golan...@googlegroups.com
After a bit more testing, I decided to go with the slice version. In
reality, all variations discussed in this thread would work fine.
Thanks to everyone for their suggestions.

As pointed out, the linked list could create a lot of extra garbage
and was a bit slower than all other alternatives in my testing. This
could be improved by storing multiple values in each node, but I don't
think that's worth it. The ring version introduces extra logic and
only wins over the slice version on the total number of allocations
and copies; overall performance is about the same or even slower,
depending on the actual workload.

Now I have a new problem to deal with. Suppose I want to drain the
entire pipe without blocking indefinitely. The 'to' end of the pipe
has a channel buffer for just one value; the rest are stored in the
internal buffer (slice). The following code seems to work in some
cases, but not others:

for done := false; !done; {
    select {
    case v := <-to:
        // … use v …
    default:
        done = true
    }
}

Basically, I'm not sure how the scheduler works when 'v := <-to' is
evaluated and 'to' is found to be empty. If I do something like
'fmt.Println(v)' after v is received, then the loop receives all
buffered values. Removing that print statement causes the loop to exit
after the first value. How can I distinguish the case where 'to' is
empty because the pipe goroutine didn't get a chance to refill to's
buffer, from the case where there are no more values available?

- Max

Kevin Ballard

May 31, 2012, 3:38:04 PM
to Maxim Khitrov, Kyle Lemons, John Asmuth, Paul Borman, golan...@googlegroups.com
You could replace the `fmt.Println(v)` with `runtime.Gosched()`. This should allow the buffering goroutine to re-fill the outgoing channel, although that's making the assumption that Gosched will allow all other goroutines to continue before returning to the current one. And it's also assuming that you're not running concurrently with the buffering goroutine (e.g. with GOMAXPROCS > 1).

An alternative would be to simply stick in a timeout, like

loop:
    for {
        select {
        case v := <-to:
            // … use v …
        case <-time.After(1 * time.Microsecond):
            break loop
        }
    }

although this does mean that when the buffer is finally empty you'll block for a microsecond before continuing.

Alternatively, you could modify the buffering goroutine to keep an int32 around that records whether it's buffering anything (0 means no, 1 means yes), and use the sync/atomic routines to set it appropriately. Then you can expose this variable to the consuming code, and it can test the variable before doing a blocking read on the channel.
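
Roughly (a sketch; the flag is only advisory, since there is a window between a flag update and the matching channel operation):

    import "sync/atomic"

    var buffered int32 // owned by the buffering goroutine: 1 = queue non-empty

    // In the buffering goroutine, on queue transitions:
    //     atomic.StoreInt32(&buffered, 1) // became non-empty
    //     atomic.StoreInt32(&buffered, 0) // just emptied

    // In the consumer, before committing to a blocking receive:
    for atomic.LoadInt32(&buffered) == 1 {
        v := <-to
        // … use v …
    }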

-Kevin

roger peppe

May 31, 2012, 7:15:27 PM
to Maxim Khitrov, Kyle Lemons, John Asmuth, Paul Borman, golan...@googlegroups.com
On 31 May 2012 20:17, Maxim Khitrov <m...@mxcrypt.com> wrote:
> Now I have a new problem to deal with. Suppose I want to drain the
> entire pipe without indefinite blocking.

why would you want to do that?

Maxim Khitrov

May 31, 2012, 7:41:09 PM
to roger peppe, Kyle Lemons, John Asmuth, Paul Borman, golan...@googlegroups.com
The practical answer is to move all existing data from one pipe to
another. I have a different solution that bypasses this requirement,
but that's what I wanted to do initially.

The theoretical answer is that this whole issue grew out of the fact
that Go channels don't have dynamic buffers. Ideally, the receiver
shouldn't be concerned with the implementation details, or with the
fact that there is a separate goroutine behind the whole arrangement.

Here's a scenario to consider. You have some data coming in over the
network. Different pieces are sent to different pipes. Now you want to
redirect the data from one of those pipes into a new pipe while
preserving the original order. To do this, you have to be sure that
the pipe you are redirecting from is empty before more data is read
from the network.

If these were normal buffered channels, there would be no problem.
Keep reading with a select statement that has a default case, or
until len(ch) == 0. At that point, you are sure that there is no more
data and you can resume network receive operations. With the pipe
implementation, select and len can't be used, because some of the
data is hidden in the internal slice.
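
That is, with a plain buffered channel the drain would be just (sketch; redirect is a hypothetical helper):

    drained := false
    for !drained {
        select {
        case v := <-ch:
            redirect(v) // forward into the new pipe
        default:
            drained = true // buffer empty: safe to resume network reads
        }
    }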

- Max

roger peppe

Jun 1, 2012, 2:56:32 AM
to Maxim Khitrov, Kyle Lemons, John Asmuth, Paul Borman, golan...@googlegroups.com
On 1 June 2012 00:41, Maxim Khitrov <m...@mxcrypt.com> wrote:
> On Thu, May 31, 2012 at 7:15 PM, roger peppe <rogp...@gmail.com> wrote:
>> On 31 May 2012 20:17, Maxim Khitrov <m...@mxcrypt.com> wrote:
>>> Now I have a new problem to deal with. Suppose I want to drain the
>>> entire pipe without indefinite blocking.
>>
>> why would you want to do that?
>
> The practical answer is to move all existing data from one pipe to
> another. I have a different solution that bypasses this requirement,
> but that's what I wanted to do initially.
>
> The theoretical answer is that this whole issue grew out of the fact
> that Go channels don't have dynamic buffers. Ideally, the receiver
> shouldn't be concerned with the implementation details and that there
> is a separate goroutine behind the whole arrangement.
>
> Here's a scenario to consider. You have some data coming in over the
> network. Different pieces are sent to different pipes. Now you want to
> redirect the data from one of those pipes into a new pipe while
> preserving the original order. To do this, you have to be sure that
> the pipe you are redirecting from is empty before more data is read
> from the network.

I don't see this. Surely you have one goroutine reading data from the network
and another goroutine reading from the pipe and redirecting it to the new pipe?
No problem (and no need for infinite buffering either, although there
might be some other reason you want to read as fast as possible
from the network, I guess).

Non-blocking reads to empty a channel can be useful as an optimisation
technique, but if you really want the ability to empty a channel, you'll have
to use other means, such as a channel that you can use to talk to the
buffering goroutine to say "please don't accept any more incoming data
and tell me when the buffer is empty". I've never needed to do that though -
a close at the end is usually sufficient.
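
roughly, that could look like this (untested sketch; close handling omitted, Type is a placeholder):

    // the buffering goroutine gains a flush channel: on receipt it
    // stops accepting input, drains its buffer, then signals the
    // requester.
    func pipe(in <-chan Type, out chan<- Type, flush <-chan chan<- struct{}) {
        var pending []Type
        var drained chan<- struct{} // non-nil while a flush is in progress
        for {
            recvCh := in
            if drained != nil {
                recvCh = nil // refuse new input while flushing
                if len(pending) == 0 {
                    drained <- struct{}{} // buffer is empty: report and resume
                    drained = nil
                    continue
                }
            }
            var sendCh chan<- Type
            var head Type
            if len(pending) > 0 {
                sendCh, head = out, pending[0] // enable the send case
            }
            select {
            case v := <-recvCh:
                pending = append(pending, v)
            case sendCh <- head:
                pending = pending[1:]
            case done := <-flush:
                drained = done
            }
        }
    }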