Calling sync.WaitGroup.Add() while a sync.WaitGroup.Wait() is waiting

472 views
Skip to first unread message

Jon Bodner

unread,
Apr 26, 2018, 11:29:26 AM4/26/18
to golang-nuts
Hello,

We were having a discussion at work about passing around references to sync.WaitGroup. I said it was a bad idea, because if a goroutine launched a goroutine of its own and called sync.WaitGroup.Add() after a goroutine called sync.WaitGroup.Wait(), but before the count dropped to 0, your code would panic. The comments seem to support this: 

"If a WaitGroup is reused to wait for several independent sets of events, new Add calls must happen after all previous Wait calls have returned."

And the WaitGroup source code for Add has lines like this:

if w != 0 && delta > 0 && v == int32(delta) {
panic("sync: WaitGroup misuse: Add called concurrently with Wait")
}

But a co-worker wrote this code, and it runs without a panic: https://play.golang.org/p/NjqK_YoqHeH

I thought that this code might only be working because the playground is single-threaded, but when this code is run locally, it also completes without panics. Running it with the race flag also works properly.

How do you trigger those panics in the WaitGroup codebase?

Thanks,

Jon

Ian Lance Taylor

unread,
Apr 26, 2018, 2:47:56 PM4/26/18
to Jon Bodner, golang-nuts
Here's one panic:

package main

import (
"sync"
"time"
)

func main() {
var wg sync.WaitGroup
wg.Add(1)
go func() {
time.Sleep(time.Millisecond)
wg.Add(-1)
wg.Add(1)
}()
wg.Wait()
}

Ian

delepl...@gmail.com

unread,
Apr 26, 2018, 2:57:13 PM4/26/18
to golang-nuts
Hello Jon
I think this is working as expected. 
The relevant part of the doc is "calls with a positive delta that start when the counter is greater than zero, may happen at any time".

This is what happens in your code (stdout on the left, my comments on the right)

                  here wg is created, its counter is 0
                  here after first Add, counter is now 1
waiting
                  ...here the first child goroutine is sleeping for 1s...
adding from top level goroutine
                  here after second Add, counter is now 2
done from top level goroutine
                  here after Done, counter is now 1
done from background goroutine
                  here after Done, counter is now 0

There are 3 goroutines involved, and while a few different interleaving of messages are possible, what is important here is that
- the first Add happens-before wg.Wait()
- the first Add happens-before the second Add
- the second Add happens-before the calls to Done, which implies that the second Add start when the counter is greater than zero (it is 1)

The part of the doc that you quote "If a WaitGroup is reused..." warns against a race condition between a "Wait" and an "Add positive number to a zero counter", which is not the case here.
I think it can be rephrased as "make sure all the calls to wg.Wait have properly returned (happen-before) and then you may start reusing the WaitGroup if you wish". 

As a side note, you have correctly spotted that it is legal to pass around a pointer to the WaitGroup (as opposed to passing it by value, which would be illegal). But in simple cases like this one, it is even more idiomatic (and legible) to use it directly from the closures scope : https://play.golang.org/p/nF9Am77e4u-

Hope this helps :)
Don't hesitate to ask for details on this topic
Valentin
Message has been deleted

David Collier-Brown

unread,
Apr 26, 2018, 7:49:25 PM4/26/18
to golang-nuts
I also ran into a race diagnostic  when I did an "Add" after my main programs started "Wait"ing: I moved the add to the main program and cured it.

I read this as the WaitGroup being careful and efficient about use of locking primitives, and not liking the expensive operation of incrementing a semaphore-like construct while someone is waiting for it to change towards zero. 

Logically, it's easy to prove correctness of a counter that increments to a value and then decrements to zero, and less easy to deal with one that jumps up and down (and maybe waves its arms in time with the music (;-))

--dave

roger peppe

unread,
Apr 27, 2018, 7:29:50 AM4/27/18
to Jon Bodner, golang-nuts
On 26 April 2018 at 16:29, Jon Bodner <j...@bodnerfamily.com> wrote:
> Hello,
>
> We were having a discussion at work about passing around references to
> sync.WaitGroup. I said it was a bad idea, because if a goroutine launched a
> goroutine of its own and called sync.WaitGroup.Add() after a goroutine
> called sync.WaitGroup.Wait(), but before the count dropped to 0, your code
> would panic. The comments seem to support this:
>
> "If a WaitGroup is reused to wait for several independent sets of events,
> new Add calls must happen after all previous Wait calls have returned."
>
> And the WaitGroup source code for Add has lines like this:
>
> if w != 0 && delta > 0 && v == int32(delta) {
> panic("sync: WaitGroup misuse: Add called concurrently with Wait")
> }
>
> But a co-worker wrote this code, and it runs without a panic:
> https://play.golang.org/p/NjqK_YoqHeH

FWIW This code is fine, and was part of the original design goal of
WaitGroup. When the count is known to be non-zero (for example because
the current goroutine has previously done an Add) it's OK to call Add
to add more.

The key in the above quote is "If a WaitGroup is reused to wait for
several independent sets of events". If you're reusing a WaitGroup,
this makes sense, but in the example above, the WaitGroup is not
reused so all is good.

As for the panic, others have already explained how that might happen.

Jon Bodner

unread,
Apr 28, 2018, 12:09:00 AM4/28/18
to golang-nuts
Thanks, everyone, for your answers! I think I understand what sync.WaitGroup is doing now.

Jon
Reply all
Reply to author
Forward
0 new messages