Counting Semaphore

egon

unread,

May 12, 2013, 9:52:14 AM5/12/13

to golan...@googlegroups.com

I'm trying to implement a counting semaphore and in the "runtime" package there are semacquire and semrelease that could be used to implement it. I'm trying to figure why aren't they visible -- my guess is that something to do with portability. I can understand the reason why there isn't a Semaphore in sync package since usually the problem can be solved much better by other means. I saw the discussion in: https://groups.google.com/forum/#!searchin/golang-nuts/counting$20semaphore/golang-nuts/G5EchCmYy9E/OvoKcysF7nAJ and I'm currently using implementation provided there.

The reason for using counting semaphore instead of channel is to provide a job queue where items can be reordered / reprioritized. The other reason is that the number of items in the job-queue can get to 1M+.

The basic idea is to enumerate a graph in parallel. The longer problem description can be found at https://gist.github.com/egonelbre/5262227 and solutions https://gist.github.com/egonelbre/5283953#file-solution-py (the solution uses two semaphores, although one could be just a mutex)

I could use recursively goroutines, but starting 1M routines would be very wasteful. I could use polling but it would be nicer if the workers were sitting quietly. I could use request/response, but that would introduce additional communication and would be quite complicated to get right.

So my questions are:

1. are there better solutions?

2. why aren't semacquire/semrelease public?

thanks,

+egon

Dmitry Vyukov

unread,

May 12, 2013, 10:44:03 AM5/12/13

to egon, golang-nuts

You can use chan struct{} as a semaphore. Prefill it with items, then use chan recv as Down and chan send as up.

runtime.semacquire is not exposed because it's not the public API we want to expose.

--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

egon

unread,

May 12, 2013, 1:23:42 PM5/12/13

to golan...@googlegroups.com, egon

On Sunday, May 12, 2013 5:44:03 PM UTC+3, Dmitry Vyukov wrote:

You can use chan struct{} as a semaphore. Prefill it with items, then use chan recv as Down and chan send as up.

Thanks, somehow I remembered that channel of any element consumed a ton of memory.

runtime.semacquire is not exposed because it's not the public API we want to expose.

understood :)

Tad Glines

unread,

May 12, 2013, 4:18:28 PM5/12/13

to golang-nuts, egon, voidl...@gmail.com

As Dmitry stated earlier, you can use "chan struct{}" as a semaphore. It doesn't appear consume any noticeable amount of memory.

For example, "make(chan struct{}, 2000000000)" consumes no more memory than "make(chan struct{})". At least, not in any of the tests I ran.

For example, this:

package main

import "time"

func main() {

ch := make(chan struct{}, 2000000000)

go func() {

for i := 0; i < 2000000000; i++ {

ch <- struct{}{}

}

}()

time.Sleep(60 * time.Minute)

}

seems to maintain a constant memory size.

-Tad

On Sun, May 12, 2013 at 12:12 PM, <voidl...@gmail.com> wrote:

I wish there was a counting Semaphore (with fixed memory usage) in "sync"; however, I realize this would lead to many people abusing it when they should be using a channel.

If you really need one, rolling your own is pretty simple: http://play.golang.org/p/Wo0Y_ePPJf
P.S. I did not really test this code to much, just wrote it...

Message has been deleted

Vova Niki

unread,

May 12, 2013, 5:06:04 PM5/12/13

to golan...@googlegroups.com, egon, voidl...@gmail.com

btw, is len(channel) atomic?

Dave Cheney

unread,

May 12, 2013, 5:12:09 PM5/12/13

to Vova Niki, golan...@googlegroups.com, egon, voidl...@gmail.com

Yes, and no.

Reading the length of a channel is an atomic operation, but the value read will be stale.

Tad Glines

unread,

May 12, 2013, 5:25:17 PM5/12/13

to golang-nuts, egon, voidl...@gmail.com, Vova Niki

I don't think it matters. For example:

if len(ch) > 0 {

val := <-ch

}

may block forever because. If len(ch) returned 1 be before it could read the channel, a different go routine, on a different thread, read the remaining value in the channel.

The only way to be able to make decisions based on len(channel) is to protect access to the channel with a lock, or maintain a separate atomically managed counter. For example:

// given

var counter uint32

ch := make(chan struct{}, 100)

// To read

r := atomic.AddInt32(&counter, -1)

if r > 0 {

value := <-ch

} else {

atomic.AddInt32(&counter, 1)

}

// To write

ch <- value

atomic.AddInt32(&counter, 1)

There are other ways to manage it and the above probably isn't the ideal solution.

Dmitry Vyukov

unread,

May 12, 2013, 11:55:49 PM5/12/13

to Tad Glines, golang-nuts, egon, Tylor Arndt, Vova Niki

On Mon, May 13, 2013 at 1:25 AM, Tad Glines <tad.g...@gmail.com> wrote:
> I don't think it matters. For example:
> if len(ch) > 0 {
> val := <-ch
> }
> may block forever because. If len(ch) returned 1 be before it could read the
> channel, a different go routine, on a different thread, read the remaining
> value in the channel.
>
> The only way to be able to make decisions based on len(channel) is to
> protect access to the channel with a lock, or maintain a separate atomically
> managed counter. For example:
> // given
> var counter uint32
> ch := make(chan struct{}, 100)
>
> // To read
> r := atomic.AddInt32(&counter, -1)
> if r > 0 {
> value := <-ch
> } else {
> atomic.AddInt32(&counter, 1)
> }
>
> // To write
> ch <- value
> atomic.AddInt32(&counter, 1)

How does it different from len(ch)?

Kevin Gillette

unread,

May 13, 2013, 1:03:09 AM5/13/13

to golan...@googlegroups.com, egon

On Sunday, May 12, 2013 11:23:42 AM UTC-6, egon wrote:

Thanks, somehow I remembered that channel of any element consumed a ton of memory.

Unless you know that it's impossible for hardware to do some task efficiently in the theoretical sense, this falls under: assume it's implemented sanely until you have personal empirical evidence to show that it isn't. Profiling a chan int with 10 million value capacity, the channel mem overhead is pretty negligible -- in my case, on an amd64 build, it looks like about 15k for every 1 million 64-bit ints, which is about 0.001%. Of course, an int actually takes up space, while the aforementioned struct{} type does not. Channel mem usage shouldn't be worth thinking about. For extremely large-scale or extremely low-latency applications, of course a low-level implementation of a counting semaphore will use less CPU time then a channel-based implementation, but in just about any other case, the cpu difference will also be negligible.

(main lesson: channels were not an afterthought in Go -- a lot of effort has been put into making them work well)

Dmitry Vyukov

unread,

May 13, 2013, 2:08:48 AM5/13/13

to Kevin Gillette, golang-nuts, egon

On Mon, May 13, 2013 at 9:03 AM, Kevin Gillette
<extempor...@gmail.com> wrote:
> On Sunday, May 12, 2013 11:23:42 AM UTC-6, egon wrote:
>>
>> Thanks, somehow I remembered that channel of any element consumed a ton of
>> memory.
>
>
> Unless you know that it's impossible for hardware to do some task
> efficiently in the theoretical sense, this falls under: assume it's
> implemented sanely until you have personal empirical evidence to show that
> it isn't. Profiling a chan int with 10 million value capacity, the channel
> mem overhead is pretty negligible -- in my case, on an amd64 build, it looks
> like about 15k for every 1 million 64-bit ints, which is about 0.001%. Of

I guess that's because it managed to get large buffer directly from
OS, so it's not yet paged in.
When the buffer will be reused for the next chan, it will consume all 8MB.

We can do lazy allocation, but it's not yet done.

> course, an int actually takes up space, while the aforementioned struct{}
> type does not. Channel mem usage shouldn't be worth thinking about. For
> extremely large-scale or extremely low-latency applications, of course a
> low-level implementation of a counting semaphore will use less CPU time then
> a channel-based implementation, but in just about any other case, the cpu
> difference will also be negligible.

We can do counting semaphore under the hood for chan struct{}.

Kevin Gillette

unread,

May 13, 2013, 4:24:27 AM5/13/13

to golan...@googlegroups.com, Kevin Gillette, egon

On Monday, May 13, 2013 12:08:48 AM UTC-6, Dmitry Vyukov wrote:

I guess that's because it managed to get large buffer directly from
OS, so it's not yet paged in.
When the buffer will be reused for the next chan, it will consume all 8MB.

I was referring to the overhead being that low. It was indeed reporting the 8MB allocation I had requested in the make call (though certainly that wouldn't [shouldn't] have been egon's fear -- one normally wouldn't consider the minimum space needed to store the number of objects requested to be a 'ton of memory').

Tad Glines

unread,

May 13, 2013, 9:33:12 AM5/13/13

to golang-nuts, Dmitry Vyukov, egon, Tylor Arndt, Vova Niki

On Sun, May 12, 2013 at 8:55 PM, Dmitry Vyukov <dvy...@google.com> wrote:

On Mon, May 13, 2013 at 1:25 AM, Tad Glines <tad.g...@gmail.com> wrote:
> I don't think it matters. For example:
> if len(ch) > 0 {
> val := <-ch
> }
> may block forever because. If len(ch) returned 1 be before it could read the
> channel, a different go routine, on a different thread, read the remaining
> value in the channel.
>
> The only way to be able to make decisions based on len(channel) is to
> protect access to the channel with a lock, or maintain a separate atomically
> managed counter. For example:
> // given
> var counter uint32
> ch := make(chan struct{}, 100)
>
> // To read
> r := atomic.AddInt32(&counter, -1)
> if r > 0 {
> value := <-ch
> } else {
> atomic.AddInt32(&counter, 1)
> }
>
> // To write
> ch <- value
> atomic.AddInt32(&counter, 1)

How does it different from len(ch)?

If all you want is to know the number of item in the channel buffer at the time of the len(ch) call, then there is no real difference.

But, if you need to make decisions based on the number of items in the channel buffer, then just using len(channel) will result in a race condition.

If you want to only read an item from the channel if there is an item to read then you could just use a select, but if you only wanted to read an item from the channel if there where more than X items, then you would need to track the count externally and atomically (or use a lock which defeats the purpose of using a channel).

Thread 1 | Thread 2

-------------------------------

n := len(ch) | n:= len(ch) | Both threads obtain the same value for len(ch)

if n > X | if n > X | Both determine that n is greater than X

| v := <-ch | Thread 2 reads a value first

v := <-ch | | Thread 1 lost the race and reads a value when it shouldn't

if n equaled X+1 then thread 1 be performing an invalid read.

By using the atomically incremented/decremented external counter, this race condition is avoided.

Dmitry Vyukov

unread,

May 13, 2013, 9:42:20 AM5/13/13

to Tad Glines, golang-nuts, egon, Tylor Arndt, Vova Niki

Ah, I see, for n>X it makes sense.

John Nagle

unread,

May 13, 2013, 2:57:16 PM5/13/13

to golan...@googlegroups.com

On 5/12/2013 6:52 AM, egon wrote:
> I'm trying to implement a counting semaphore and in the "runtime" package
> there are semacquire and semrelease that could be used to implement it. I'm
> trying to figure why aren't they visible

Go should have P and V. Bounded buffers are built from P and V.
A bounded buffer is two P/V items and a circular buffer.
With an efficient implementation of P and V, you can implement all the
other locking primitives efficiently. Implementing P and V from
channels is backwards.

Dijkstra got this right back in 1974. Then the UNIX people screwed
up the locking primitives in UNIX, and we now have two or three
generations of programmers who know only the badly chosen primitives
kludged onto UNIX over the years.

Go has this mostly right. You want a basic mutex, P and V, and
bounded buffers. Anything else can be built on those. For
example, "WaitGroup". "Add" adds to a counter N, "Done" does a V,
and "Wait" does a P operation N times.

John Nagle

Dmitry Vyukov

unread,

May 13, 2013, 3:12:56 PM5/13/13

to John Nagle, golang-nuts

Apparently this is not an efficient implementation. The WaitGroup
implementation is more efficient that that.
I am also not sure how to build an efficient RWMutex from P and V.
The primitives I find useful to build efficient synchronization
components are atomic operations and any way to park/unpark
threads/goroutines (semaphore, futex, event, etc).

Reply all

Reply to author

Forward