A few things:
1) Quit channels are typically implemented with a single channel given out to all listeners; then the "quit" message is sent by closing the channel.
2) Have you sent SIGQUIT (Ctrl+\) to the app when it is hung? You will get a stack trace showing which goroutines are alive, where each one is blocked, and on what.
3) I suspect you have a communication loop: a goroutine is sending a message to the goroutine that sends the quits, so the two are blocked sending to each other.
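To illustrate point 1, here is a minimal sketch of a quit channel that is signalled by closing it. The names `waitForQuit` and `woken` are made up for this example and are not taken from your code; the channel carries `struct{}` only because no value is ever sent on it.

```go
package main

import (
	"fmt"
	"sync"
)

// waitForQuit (hypothetical helper) starts n listeners that all block on
// the same quit channel and reports how many of them woke up after the
// channel was closed.
func waitForQuit(n int) int {
	quit := make(chan struct{})
	var wg sync.WaitGroup
	var mu sync.Mutex
	woken := 0

	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			<-quit // a receive on a closed channel returns immediately
			mu.Lock()
			woken++
			mu.Unlock()
		}()
	}

	close(quit) // a single close reaches every listener
	wg.Wait()
	return woken
}

func main() {
	fmt.Println(waitForQuit(5)) // prints 5
}
```

Because the sender never needs to know how many listeners exist, nothing can block on delivering the quit.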
The most direct way to fix it is probably something like:
    import (
        "log"
        "sync"
        "time"
    )

    // NumSecondary, process and getData are assumed to be defined elsewhere.

    func primary() (err error) {
        quit := make(chan bool) // closed to tell every child to exit
        data := make(chan []string)
        errch := make(chan error, NumSecondary) // each child can send at most one error

        var wg sync.WaitGroup
        for i := 0; i < NumSecondary; i++ {
            wg.Add(1)
            go func() {
                defer wg.Done()
                secondary(data, errch, quit)
            }()
        }

        // Close data once every child has returned so the loop below can end.
        go func() {
            wg.Wait()
            close(data)
        }()

        for {
            select {
            case s, ok := <-data:
                if !ok {
                    return err // all children are done; err is nil or the first error
                }
                process(s)
            case err = <-errch: // got an error, signal children to exit
                log.Printf("error: %v", err)
                close(quit)
                errch = nil // don't get any more errors (would cause double close)
            }
        }
    }

    func secondary(data chan<- []string, errch chan<- error, quit <-chan bool) {
        ticker := time.NewTicker(1 * time.Second) // just used for demonstration
        defer ticker.Stop()
        for {
            select {
            case <-quit:
                return
            case <-ticker.C:
                s, err := getData() // just used for demonstration
                if err != nil {
                    errch <- err // never blocks: errch has room for one error per child
                    return
                }
                data <- s
            }
        }
    }
It looks like you have lots of readers pumping into a single writer, though, which doesn't really mesh with your description. In a case like that, all of the readers should generally fail if any of them fails, so you don't need the quit channel in the first place. Combining a blocking operation with an asynchronous quit doesn't really work: a goroutine that is blocked cannot notice the quit signal until the operation returns.
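If you do keep a quit channel, one way to narrow that gap for channel operations is to put the blocking send and the quit receive in the same select. This is only a sketch: `produce` and `runDemo` are hypothetical names, and it does nothing for blocking calls that are not channel operations (a blocking read inside something like `getData`, for example).

```go
package main

import "fmt"

// produce (hypothetical) sends increasing integers on data until quit is
// closed. Every send sits in a select next to the quit case, so the
// goroutine cannot be stuck in a send while a quit goes unnoticed.
func produce(data chan<- int, quit <-chan struct{}, done chan<- struct{}) {
	defer close(done)
	for i := 0; ; i++ {
		select {
		case data <- i:
		case <-quit:
			return
		}
	}
}

// runDemo (hypothetical) reads n values, then stops reading and closes
// quit. It returns true once the producer has confirmed that it exited.
func runDemo(n int) bool {
	data := make(chan int) // unbuffered, so the producer blocks on every send
	quit := make(chan struct{})
	done := make(chan struct{})
	go produce(data, quit, done)

	for i := 0; i < n; i++ {
		<-data
	}
	close(quit) // nobody reads data any more; the producer must still exit
	<-done
	return true
}

func main() {
	fmt.Println(runDemo(3)) // prints true
}
```

With a bare `data <- i` instead of the select, the producer would stay blocked forever once the reader stops receiving, and `<-done` would never return.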