WaitGroup with fast error exit


Tobia

Dec 13, 2013, 9:22:02 AM
to golan...@googlegroups.com
Hi all

What is the best way to launch a number of goroutines, wait for all of them to complete in the normal case, but stop waiting as soon as one of them exits with an error value?

Can I use WaitGroup in some clever way or should I come up with my own solution based on channels?

Tobia

Dave Cheney

Dec 13, 2013, 9:25:44 AM
to Tobia, golang-nuts
On Sat, Dec 14, 2013 at 1:22 AM, Tobia <tobia.c...@gruppo4.eu> wrote:
> Hi all
>
> What is the best way to launch a number of goroutines, wait for all of them
> to complete in the normal case, but stop waiting as soon as one of them
> exits with an error value?

If you detect an error, what happens to the remaining successful
values (I'm guessing your goroutines transmit some value or do some
work) that you have not yet processed?

>
> Can I use WaitGroup in some clever way or should I come up with my own
> solution based on channels?

What kind of error are we talking about? log.Fatal might be useful.

If you don't want to exit the process, why not just ignore the failed goroutine?

> Tobia

Tobia

Dec 13, 2013, 9:42:19 AM
to golan...@googlegroups.com, Tobia
> If you detect an error, what happens to the remaining successful
> values (i'm guessing your goroutines transmit some value or do some
> work) that you have not yet processed?

In case of an error, I will ignore any other results, and let the other goroutines know that their work is no longer needed, so that they can exit (return) as soon as possible.

> What kind of error are we talking about ? log.Fatal might be useful

By error I mean a regular err != nil condition. I'm already handling logging.

What I need is a WaitGroup that will wait for *either* all calls to exit with wg.Done() *or* one of them to exit with, say, wg.Fail(), which will also signal to the others that they can exit as soon as they test for it.

I guess I'll just put something together using channels.

Tobia
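
A minimal sketch of the wg.Done() / wg.Fail() idea described above, assuming a small wrapper around sync.WaitGroup. The names FailGroup, NewFailGroup, Fail, Failed and errBoom are hypothetical and not part of the standard library. Wait returns as soon as one goroutine fails or all of them are done, and the Failed channel is what the other goroutines can test to exit early.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// FailGroup is a hypothetical helper, not part of package sync.
type FailGroup struct {
	wg     sync.WaitGroup
	once   sync.Once
	failed chan struct{}
	err    error
}

// NewFailGroup prepares a group that expects n goroutines.
func NewFailGroup(n int) *FailGroup {
	g := &FailGroup{failed: make(chan struct{})}
	g.wg.Add(n)
	return g
}

// Done marks one goroutine as finished successfully.
func (g *FailGroup) Done() { g.wg.Done() }

// Fail records the first error, closes the failed channel so the other
// goroutines can notice, and marks the calling goroutine as finished.
func (g *FailGroup) Fail(err error) {
	g.once.Do(func() {
		g.err = err
		close(g.failed)
	})
	g.wg.Done()
}

// Failed returns a channel that is closed after the first call to Fail.
func (g *FailGroup) Failed() <-chan struct{} { return g.failed }

// Wait returns nil once every goroutine has called Done, or the first
// error as soon as any goroutine calls Fail.
func (g *FailGroup) Wait() error {
	done := make(chan struct{})
	go func() {
		g.wg.Wait()
		close(done)
	}()
	select {
	case <-done:
		select {
		case <-g.failed: // a Fail raced with the last Done
			return g.err
		default:
			return nil
		}
	case <-g.failed:
		return g.err
	}
}

var errBoom = errors.New("boom") // illustrative error value

func main() {
	g := NewFailGroup(3)
	for i := 0; i < 3; i++ {
		go func(i int) {
			if i == 1 {
				g.Fail(errBoom)
				return
			}
			<-g.Failed() // stands in for work that checks Failed in a select
			g.Done()
		}(i)
	}
	fmt.Println(g.Wait())
}
```

One design note: the helper goroutine inside Wait keeps running until every worker has called Done or Fail, so workers that never check Failed() will keep it alive.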

Konstantin Khomoutov

Dec 13, 2013, 10:59:13 AM
to Tobia, golan...@googlegroups.com
On Fri, 13 Dec 2013 06:42:19 -0800 (PST)
Tobia <tobia.c...@gruppo4.eu> wrote:

> > If you detect an error, what happens to the remaining successful
> > values (i'm guessing your goroutines transmit some value or do some
> > work) that you have not yet processed?
>
> In case of an error, I will ignore any other results, and let the
> other goroutines know that their work is no longer needed, so that
> they can exit (return) as soon as possible.

Then signal them to quit (over their channels, presumably) and then wait
on your wait group as usual.

If your goroutines are being fed a series of units of work, then just
close the channel(s) they receive them from and be done with it.
Otherwise you might pass each goroutine a personal channel on which
you will signal an error condition, and restructure each goroutine
so that it periodically checks its error signal channel (probably
using a non-blocking select on that channel) and quits as soon as
it is told to do so.
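
The per-goroutine signal channel and the non-blocking select described above might look roughly like this. It is only a sketch: the worker function, the step count and the buffered quit channels are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// worker processes up to maxSteps units of work, checking its personal
// quit channel between units. It returns how many units it completed.
func worker(quit <-chan struct{}, maxSteps int) int {
	for step := 0; step < maxSteps; step++ {
		select {
		case <-quit:
			return step // told to stop
		default:
			// no signal yet; carry on without blocking
		}
		// (one unit of work here)
	}
	return maxSteps
}

func main() {
	const n = 3
	quits := make([]chan struct{}, n)
	completed := make([]int, n)
	var wg sync.WaitGroup
	for i := range quits {
		quits[i] = make(chan struct{}, 1) // buffered so the signal never blocks
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			completed[i] = worker(quits[i], 1000)
		}(i)
	}
	// Pretend an error was detected: signal every worker, then wait as usual.
	for _, q := range quits {
		q <- struct{}{}
	}
	wg.Wait()
	fmt.Println("units completed per worker:", completed)
}
```

How many units each worker completes before it sees the signal depends on scheduling, so the printed counts vary from run to run.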

Matt Harden

Dec 14, 2013, 11:51:58 AM
to Konstantin Khomoutov, Tobia, golang-nuts
I find the pattern of signaling "stop" by closing a single channel to be useful. A receive on a closed channel always succeeds immediately, so all your goroutines could include a case <-stopchan in their selects; once that case fires, it's time to shut down. This pattern only needs a single channel rather than one for every goroutine.
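
A minimal sketch of this close-to-broadcast pattern combined with a WaitGroup. The names task, runAll, failing and waiting are made up for the example. The first task to return an error closes the shared stop channel, and runAll still waits for every goroutine to return.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// task is a unit of work that should return promptly once stop is closed.
type task func(stop <-chan struct{}) error

// runAll starts one goroutine per task and waits for all of them. The
// first error closes stop, which every task can observe with <-stop.
func runAll(tasks []task) error {
	stop := make(chan struct{})
	var (
		wg       sync.WaitGroup
		once     sync.Once
		firstErr error
	)
	for _, t := range tasks {
		wg.Add(1)
		go func(t task) {
			defer wg.Done()
			if err := t(stop); err != nil {
				once.Do(func() {
					firstErr = err
					close(stop) // broadcast: every receive on stop now succeeds
				})
			}
		}(t)
	}
	wg.Wait()
	return firstErr
}

// failing is an illustrative task that fails immediately.
func failing(stop <-chan struct{}) error { return errors.New("task failed") }

// waiting is an illustrative task that does nothing until it is told to stop.
func waiting(stop <-chan struct{}) error {
	<-stop // stands in for a select that also does real work
	return nil
}

func main() {
	err := runAll([]task{waiting, failing, waiting})
	fmt.Println(err) // prints "task failed"
}
```

sync.Once guards the close because closing an already closed channel panics, and more than one task may fail.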


Tobia

Dec 15, 2013, 7:31:57 AM
to golan...@googlegroups.com
Here is what I've come up with. It doesn't actually signal to the goroutines to stop, it just ignores their value in case of an error. The normal (no panic) route uses a WaitGroup, so it's performant.

I'm not entirely happy about the second recover() to ignore a send on closed channel, but it does the job. If anybody has any advice, or sees any concurrency problem, please let me know.


// launch N concurrent goroutines and wait for all to finish; if one of them
// panics, stop waiting and ignore any more errors
fail := make(chan interface{})
wg := new(sync.WaitGroup)
wg.Add(N)
for n := 0; n < N; n++ {
    go func(n int) {
        defer func() {
            // if this goroutine panicked, send the panic value on fail channel
            defer func() {
                // ignore any sends on closed channel (panics other than the first)
                // and mark as done
                recover()
                wg.Done()
            }()
            if rec := recover(); rec != nil {
                // send failure on unbuffered channel before calling Done, to make
                // sure main goroutine selects it
                fail <- rec
            }
        }()
        // (worker code here)
    }(n)
}
// convert Wait into a channel operation, to select on it; using close
// instead of send, as main goroutine could already be gone
done := make(chan bool)
go func() {
    wg.Wait()
    close(done)
}()
// wait for either all goroutines to finish, or one to send a failure
select {
case <-done:
    // all have exited cleanly
case rec := <-fail:
    // one has failed; discard subsequent failures and raise (or return) error
    close(fail)
    panic(rec)
}

Matt Harden

Dec 15, 2013, 10:20:05 AM
to Tobia, golang-nuts
How is this different from just letting the original panic end the program? When the main goroutine panics, the rest of the goroutines don't get a chance to recover.


Tobia

Dec 15, 2013, 1:33:05 PM
to golan...@googlegroups.com, Tobia
On Sunday, December 15, 2013 4:20:05 PM UTC+1, Matt Harden wrote:
> How is this different from just letting the original panic end the program? When the main goroutine panics, the rest of the goroutines don't get a chance to recover.

The difference is that a panic handled in this way can be recovered from by the calling code (the code calling this loop).

Correct me if I'm wrong, but the calling code has no way to recover from, or even discover, the original panic in a goroutine it didn't spawn itself.

Tobia

Matt Harden

Dec 16, 2013, 8:18:39 AM
to Tobia, golang-nuts, Tobia
I see; yes that's true.



roger peppe

Dec 16, 2013, 11:00:47 AM
to Tobia, golang-nuts
That's not far off from the use case that launchpad.net/tomb is designed for.
Here's a simple example that does what you described:

http://play.golang.org/p/K-4PEvWy-W

Sol Toure

Dec 16, 2013, 2:24:41 PM
to golang-nuts
Here is a rather contrived example without Waitgroup
http://play.golang.org/p/DaJQmtIruh

roger peppe

Dec 17, 2013, 5:02:40 AM
to Sol Toure, golang-nuts
On 16 December 2013 19:24, Sol Toure <sol...@gmail.com> wrote:
> Here is a rather contrived example without Waitgroup
> http://play.golang.org/p/DaJQmtIruh

That looks a bit weird. Why the loop after calling work?
(BTW you don't need either of your select statements - they
can just be simple statements, and the first one could actually
be phrased as a range loop)

It occurs to me that this is a simpler problem than
I was making it. Here's an example without using WaitGroup
or Tomb:

http://play.golang.org/p/z6zSw5YWa5
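
For readers who cannot open the link, here is one way the problem can be solved with channels only, without WaitGroup or Tomb. This is a sketch under my own assumptions and not a reproduction of the linked code; firstError and demoWork are made-up names.

```go
package main

import (
	"fmt"
	"time"
)

// firstError runs n workers and returns as soon as one reports an error,
// or nil once all n have reported success.
func firstError(n int, work func(id int, stop <-chan struct{}) error) error {
	stop := make(chan struct{})
	defer close(stop) // tells any still-running workers to give up

	// Buffered so that workers finishing after we return never block on send.
	results := make(chan error, n)
	for id := 0; id < n; id++ {
		go func(id int) { results <- work(id, stop) }(id)
	}
	for i := 0; i < n; i++ {
		if err := <-results; err != nil {
			return err
		}
	}
	return nil
}

// demoWork is an illustrative worker: worker 2 fails at once, the others
// finish after a short delay unless they are told to stop first.
func demoWork(id int, stop <-chan struct{}) error {
	if id == 2 {
		return fmt.Errorf("worker %d failed", id)
	}
	select {
	case <-stop:
		return nil // work no longer needed
	case <-time.After(10 * time.Millisecond):
		return nil // finished normally
	}
}

func main() {
	fmt.Println(firstError(4, demoWork)) // worker 2 is present, so this fails
	fmt.Println(firstError(2, demoWork)) // only workers 0 and 1, so this succeeds
}
```

Counting n results on one buffered channel takes the place of the WaitGroup, and the deferred close(stop) covers both the early return and the normal one.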

Sol Toure

Dec 17, 2013, 10:12:32 AM
to golang-nuts
Yes, it's weird. I had the same solution you have, but I wanted to close both channels.