Waitgroup problem


Pete Wilson

Jan 16, 2021, 11:28:59 AM
to golan...@googlegroups.com
Gentlepersons

I asked for advice on how to handle a problem a few days ago, and have constructed a testbed of what I need to do, using WaitGroups in what seems to be a standard manner.

But the code fails and I don’t understand why.

A simplified version of the code is at https://play.golang.org/p/-TEZqik6ZPB

In short, what I want to do is to have a controller goroutine (main) plus some number of worker goroutines.

I implement a Barrier function which operates on a properly-initialised waitgroup.

The Barrier function simply does Done() then Wait().

What I want is that each worker does a two-phase operation:
- wait until everybody has passed a start barrier
- do some work
- wait until everybody has passed an end barrier
- do some work

... repeating this some number of times.

In parallel, main has created and initialised the start and end waitgroups wgstart and wgend
main has then created the worker goroutines (in the real thing I want roughly one worker per core, so there’s also some setting of GOMAXPROCS)
main then enters a loop in which it

- waits until everybody, including itself, has passed the start barrier
- resets the start barrier
- waits until everybody has passed the end barrier
- resets the end barrier

This behaviour is observed, except that the code panics, both in the playground and on my machine. A typical failure is:

----------- [2] main about to barrier start ---------------

	w[7] enters barrier startpanic: sync: WaitGroup is reused before previous Wait has returned

goroutine 10 [running]:
sync.(*WaitGroup).Wait(0xc00002c030)
	/usr/local/go-faketime/src/sync/waitgroup.go:132 +0xae
main.Barrier(0x4, 0x4bef21, 0x3, 0xc00002c030)
	/tmp/sandbox686473236/prog.go:51 +0x12b
main.worker(0x4, 0xc00002c020, 0xc00002c030, 0xa)
	/tmp/sandbox686473236/prog.go:35 +0x309
created by main.main
	/tmp/sandbox686473236/prog.go:78 +0x295

What have I misunderstood and done wrongly?

Thanks!

— P

WARNING / LEGAL TEXT: This message is intended only for the use of the individual or entity to which it is addressed and may contain information which is privileged, confidential, proprietary, or exempt from disclosure under applicable law. If you are not the intended recipient or the person responsible for delivering the message to the intended recipient, you are strictly prohibited from disclosing, distributing, copying, or in any way using this message. If you have received this communication in error, please notify the sender and destroy and delete any copies you may have received. 

http://www.bsc.es/disclaimer 

jake...@gmail.com

Jan 16, 2021, 11:59:19 AM
to golang-nuts
There may be other problems as well, but the WaitGroup.Add documentation says:
"If a WaitGroup is reused to wait for several independent sets of events, new Add calls must happen after all previous Wait calls have returned."
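
To illustrate that rule, here is a minimal sketch of reusing one WaitGroup safely. It is not the code from the playground link; `runRounds` and the squaring-free "work" are made up for illustration. The point is that only one goroutine ever calls Wait(), so each round's Add() necessarily happens after the previous Wait() has returned.

```go
package main

import (
	"fmt"
	"sync"
)

// runRounds (hypothetical helper) reuses a single WaitGroup across rounds.
// Workers only call Done(); the caller is the only one that calls Wait(),
// so the Add() for round r+1 always follows the return of Wait() for round r.
func runRounds(workers, rounds int) []int {
	var wg sync.WaitGroup
	totals := make([]int, rounds)
	for r := 0; r < rounds; r++ {
		results := make([]int, workers)
		wg.Add(workers)
		for w := 0; w < workers; w++ {
			go func(w int) {
				defer wg.Done()
				results[w] = w + 1 // stand-in for real work
			}(w)
		}
		wg.Wait() // all workers of this round are done past this point
		for _, v := range results {
			totals[r] += v
		}
	}
	return totals
}

func main() {
	fmt.Println(runRounds(4, 3)) // prints [10 10 10]
}
```

This differs from the original design in that the workers never Wait() on the group themselves, which is exactly the part that makes reuse hazardous.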

You have a race condition. What I believe is happening is the following:
  • The last goroutine calls `Barrier(w, "start", wgstart)`. That calls barrier.Done(). It then calls Wait(), but Wait() has not returned.
  • Meanwhile main() calls `Barrier(threads, "start", &wgstart)`. The Wait() in that call returns because all the goroutines have called Done().
  • main() calls `wgstart.Add(threads + 1)`
  • The goroutine from above is still in the Wait() call, hence the panic.  
There is also another possible scenario, that is not causing the panic I see, but could cause incorrect behavior:
  • The last goroutine calls `Barrier(w, "start", wgstart)`. That calls barrier.Done().
  • Meanwhile main() calls `Barrier(threads, "start", &wgstart)`. The Wait() in that call returns because all the goroutines have called Done().
  • main() calls `wgstart.Add(threads + 1)`
  • The goroutine from above now calls Wait(), but since Add was already called, it blocks. That goroutine is now 'stuck', because Wait() will never return, which will in turn end up blocking all the other goroutines eventually.
Honestly, I think you need to rethink your whole model.
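
One possible alternative model, as a sketch only: a reusable barrier built on sync.Mutex and sync.Cond with a generation counter, so no "reset" step is needed and a late waiter cannot be confused with the next round. The `Barrier` type, `NewBarrier`, `Await`, and `run` below are hypothetical names, not taken from the original program.

```go
package main

import (
	"fmt"
	"sync"
)

// Barrier is a hypothetical reusable barrier for a fixed number of parties.
type Barrier struct {
	mu      sync.Mutex
	cond    *sync.Cond
	parties int // how many goroutines must arrive
	waiting int // how many have arrived in the current generation
	gen     int // incremented each time the barrier trips
}

func NewBarrier(parties int) *Barrier {
	b := &Barrier{parties: parties}
	b.cond = sync.NewCond(&b.mu)
	return b
}

// Await blocks until all parties have called Await for this generation.
func (b *Barrier) Await() {
	b.mu.Lock()
	defer b.mu.Unlock()
	gen := b.gen
	b.waiting++
	if b.waiting == b.parties {
		// Last arrival: start a new generation and release everyone.
		b.waiting = 0
		b.gen++
		b.cond.Broadcast()
		return
	}
	for gen == b.gen {
		b.cond.Wait()
	}
}

// run starts the workers; main and the workers all pass a start and an
// end barrier in every round. It returns the total work items done.
func run(workers, rounds int) int {
	start := NewBarrier(workers + 1) // workers plus main
	end := NewBarrier(workers + 1)
	counts := make([]int, workers)
	var done sync.WaitGroup
	done.Add(workers)
	for w := 0; w < workers; w++ {
		go func(w int) {
			defer done.Done()
			for r := 0; r < rounds; r++ {
				start.Await()
				counts[w]++ // first phase of work
				end.Await()
				// second phase of work would go here
			}
		}(w)
	}
	for r := 0; r < rounds; r++ {
		start.Await()
		end.Await()
	}
	done.Wait()
	total := 0
	for _, c := range counts {
		total += c
	}
	return total
}

func main() {
	fmt.Println(run(4, 3)) // prints 12
}
```

The generation counter is what lets the same barrier be used round after round: a goroutine that is slow to wake up still sees that its generation has ended, even if others have already started arriving for the next one.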

Hope that helps.

Brian Candler

Jan 16, 2021, 4:02:52 PM
to golang-nuts
On Saturday, 16 January 2021 at 16:28:59 UTC Pete Wilson wrote:
In short, what I want to do is to have a controller goroutine (main) plus some number of worker goroutines

This doesn't answer your question, but if you haven't seen it already I recommend this video about concurrency patterns in Go:
 
All of it is well worth watching, but an example of using a semaphore channel instead of a worker pool starts at 32:15.
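
For readers unfamiliar with the term, here is a rough sketch of a semaphore channel. This is my own minimal illustration, not a transcription of the video's example; `processAll` and the squaring "work" are made up. A buffered channel of capacity `limit` caps how many goroutines run at once.

```go
package main

import (
	"fmt"
	"sync"
)

// processAll (hypothetical helper) starts one goroutine per item, but the
// buffered channel sem lets at most limit of them run concurrently.
func processAll(items []int, limit int) []int {
	sem := make(chan struct{}, limit)
	out := make([]int, len(items))
	var wg sync.WaitGroup
	for i, v := range items {
		wg.Add(1)
		sem <- struct{}{} // acquire a slot; blocks while limit are in flight
		go func(i, v int) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot
			out[i] = v * v           // stand-in for real work
		}(i, v)
	}
	wg.Wait()
	return out
}

func main() {
	fmt.Println(processAll([]int{1, 2, 3, 4, 5}, 2)) // prints [1 4 9 16 25]
}
```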