Managing perpetually running goroutines

sansaid

Aug 29, 2021, 10:02:01 PM
to golang-nuts
Hello,

Does anybody know of some reference code or common patterns I can use to keep track of worker goroutines? For context, I want a user to be able to issue a "stop" command using a CLI tool which will prompt my server to gracefully terminate one of my workers. 

For example, when a user issues a "start" command through a CLI tool, this will signal my server to spawn a goroutine that perpetually runs and polls an HTTP endpoint until a user initiates a "stop" command. When the user issues a "stop" command through the same CLI tool, I want the server to be able to signal the goroutine to stop. Any reference code would be much appreciated!

One approach I thought of is to create each worker's context with `context.WithCancel()` and hold a reference to the cancel function in a global map keyed by worker ID. When the user issues a "stop" command against a worker ID, another function is executed which calls that context's cancel function. Example code below (I appreciate this is horrendously breaking a lot of rules, but I want to focus only on the elements around passing the cancel function around - happy to discuss anything else that is of concern to someone though):

```
// invoked when a CLI start subcommand is issued to my main CLI tool
func (w *WorkerGroup) Start(ctx context.Context, id string) {
  ctx, cancel := context.WithCancel(ctx)

  w.mu.Lock() // w.mu guards the Workers map (map[string]context.CancelFunc)
  w.Workers[id] = cancel
  w.mu.Unlock()

  go startWorker(ctx, id) // the worker polls until its ctx is cancelled
}

// invoked when a CLI stop subcommand is issued to my main CLI tool
func (w *WorkerGroup) Stop(id string) {
  w.mu.Lock()
  cancelWorker := w.Workers[id]
  w.mu.Unlock()

  cancelWorker()
}
``` 

I have read that passing a context's cancel function around is a bad idea, but in an example like this, is it justified?

Thanks in advance for any help!

Ian Davis

Aug 30, 2021, 5:01:08 AM
to golan...@googlegroups.com
Passing the cancel function is fine and storing it for use later is fine too.

Brian Candler

Aug 30, 2021, 7:19:46 AM
to golang-nuts
Also: you can use the same context for all the workers (if that's appropriate); or if you want each worker to have a different context, you can make those contexts children of the same top-level context.  In that case, you only need to cancel one context and all the workers will stop, without having to keep a map of them.

Do bear in mind that you may wish to wait for the workers to finish what they're currently doing, before the program exits.  One way to do that is with a sync.WaitGroup, which you increment before starting each worker, and each worker decrements just before it terminates.  Then in your main program you wait on the WaitGroup.
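
A minimal sketch of how those two ideas combine, assuming the simple case of one shared context (the polling body and timings are just placeholders):

```
package main

import (
  "context"
  "fmt"
  "sync"
  "time"
)

func main() {
  // One top-level context; cancelling it signals every worker to stop.
  ctx, cancel := context.WithCancel(context.Background())

  var wg sync.WaitGroup
  for i := 0; i < 3; i++ {
    wg.Add(1) // increment before starting each worker
    go func(id int) {
      defer wg.Done() // decrement just before the worker terminates
      for {
        select {
        case <-ctx.Done():
          fmt.Println("worker", id, "stopping")
          return
        case <-time.After(200 * time.Millisecond):
          fmt.Println("worker", id, "polling") // stand-in for the real polling work
        }
      }
    }(i)
  }

  time.Sleep(time.Second)
  cancel()  // one cancel stops all the workers
  wg.Wait() // wait for them to finish what they're currently doing
}
```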

Sanyia Saidova

Aug 30, 2021, 7:29:26 AM
to Brian Candler, golang-nuts
Appreciated, thanks Brian and Ian.

On the subject of sync.WaitGroups, I haven't tackled this yet for my use case, but I'll use the opportunity to brainstorm some options, if that's okay. I want to be able to terminate each worker independently of the others (i.e. stop one worker, wait for it to finish its tasks, but allow the other workers to keep running if they haven't been signalled to stop). Should I create a wait group per worker and store those in a map? This sounds bloated to me, especially since each would always be a wait group of 1, but I haven't come across better alternatives yet (I'm relatively new to concurrency myself).

If that's too broad a question, can come back once I've explored some options.


Juliusz Chroboczek

Aug 30, 2021, 7:51:33 AM
to golan...@googlegroups.com
> I want to be able to terminate each worker independently of others
> (i.e. close on worker, wait for it to finish its tasks, but allow
> other workers to run if they haven't been signalled to stop).

The solution I use is to have each worker take two parameters:

- a channel that will be closed by the main program in order to signal that the worker should terminate;
- a channel that will be closed by the worker when it's done.

So the worker looks like this:

```
func worker(terminate chan struct{}, done chan struct{}) {
  defer close(done)

outer:
  for {
    work()
    select {
    case <-terminate: // stop when the main program closes the terminate channel
      break outer
    default:
    }
  }
}
```

and the main program looks like this:

```
terminate := make(chan struct{})
done := make(chan struct{})
go worker(terminate, done)
...
close(terminate)
<-done
```

The terminate channel can be replaced by a context, which is useful if
you have a tree of workers to manage:

```
terminate, terminateFn := context.WithCancel(context.Background())
done := make(chan struct{})
go worker(terminate, done) // worker now takes a context and selects on terminate.Done()
...
terminateFn()
<-done
```

Or perhaps with a timeout in case a worker gets stuck:

```
terminate, terminateFn := context.WithCancel(context.Background())
done := make(chan struct{})
go worker(terminate, done)
...
terminateFn()
timeout := time.NewTimer(5 * time.Second)
select {
case <-done:
  timeout.Stop()
case <-timeout.C:
  log.Println("Worker failed to terminate within 5s")
}
```

Brian Candler

Aug 30, 2021, 7:52:59 AM
to golang-nuts
For that case, I'd suggest you pass a channel to each worker. The worker can either close the channel, or send a specific message down it, to indicate its termination.

This may also be an X-Y problem.  Generally speaking, "worker pools" are an anti-pattern in Go.  The startup and termination overhead of a goroutine is so tiny that often it's better just to start a goroutine when there's work to be done, and let it run to completion.  The other reason you might want to have a worker pool is to limit concurrency, but there are better ways to do that in Go: e.g. you can have a buffered channel with N slots, which the goroutine writes to before starting work and reads from when it's finished. That will limit your concurrency to N tasks.
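
To make the buffered-channel idea concrete, here is a sketch (the limit of 3 and the sleep are placeholders for real work):

```
package main

import (
  "fmt"
  "sync"
  "time"
)

func main() {
  sem := make(chan struct{}, 3) // buffered channel with N slots limits concurrency to N

  var wg sync.WaitGroup
  for i := 0; i < 10; i++ {
    wg.Add(1)
    go func(id int) {
      defer wg.Done()
      sem <- struct{}{}        // write to the channel before starting work
      defer func() { <-sem }() // read from it when finished, freeing a slot
      fmt.Println("task", id, "running")
      time.Sleep(100 * time.Millisecond) // stand-in for real work
    }(i)
  }
  wg.Wait()
}
```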

Bryan C. Mills discusses this in detail in his excellent video "Rethinking Classical Concurrency Patterns". It took me several views with pauses to fully understand it, but it was worth the effort :-)

Bryan C. Mills

Aug 31, 2021, 3:21:40 PM
to golang-nuts
For the specific case of managing long-running polling- or pubsub-based tasks, I would suggest basically the same pattern that others on the thread have already given: use a `sync.Map` to associate each task ID with a small struct containing a `context.CancelFunc` and a channel that can be used to confirm when the task has actually stopped (https://play.golang.org/p/Gq3SXuPFDvi).

That way you can have the invariant that after a successful call to Stop, the worker has completely finished all of its work. That property is extremely useful, for example, in writing tests and benchmarks for your implementation: if you can guarantee that the workers have actually stopped by the time Stop returns, your tests can detect unexpected deadlocks and your benchmarks can measure a controlled, well-defined amount of work.
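
A rough sketch of that shape might look like the below (the names `WorkerGroup` and `poll` are illustrative, not taken from the linked playground example):

```
package main

import (
  "context"
  "sync"
)

// worker pairs a task's cancel function with a channel that is
// closed only once the task has completely finished its work.
type worker struct {
  cancel context.CancelFunc
  done   chan struct{}
}

type WorkerGroup struct {
  workers sync.Map // task ID (string) -> *worker
}

// Start launches a long-running task identified by id.
func (g *WorkerGroup) Start(ctx context.Context, id string, poll func(context.Context)) {
  ctx, cancel := context.WithCancel(ctx)
  w := &worker{cancel: cancel, done: make(chan struct{})}
  g.workers.Store(id, w)

  go func() {
    defer close(w.done) // signal completion only after all work is done
    poll(ctx)           // poll should return promptly once ctx is cancelled
  }()
}

// Stop cancels the task and blocks until it has actually finished,
// giving the "worker has fully stopped when Stop returns" invariant.
func (g *WorkerGroup) Stop(id string) {
  v, ok := g.workers.LoadAndDelete(id)
  if !ok {
    return
  }
  w := v.(*worker)
  w.cancel()
  <-w.done
}

func main() {
  var g WorkerGroup
  g.Start(context.Background(), "poller-1", func(ctx context.Context) {
    <-ctx.Done() // stand-in for a polling loop that exits on cancellation
  })
  g.Stop("poller-1") // returns only after the task has finished
}
```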




sanyia....@gmail.com

Sep 1, 2021, 5:55:49 PM
to Bryan C. Mills, golang-nuts

Thanks Bryan, especially for the code example.


Would using a sync.WaitGroup in place of the `done` channel in your example be overkill? Just exploring what’s possible here for my own education.

Bakul Shah

Sep 1, 2021, 9:08:33 PM
to sanyia....@gmail.com, golang-nuts
Suggest you look at https://go.dev/blog/pipelines 
And check out the two videos recommended at the end of this article.
Lots of interesting articles in the blog for you to chew on!

-- Bakul

Bryan C. Mills

Sep 2, 2021, 10:49:58 AM
to sanyia....@gmail.com, golang-nuts
On Wed, Sep 1, 2021 at 5:55 PM <sanyia....@gmail.com> wrote:

> Thanks Bryan, especially for the code example.
>
> Would using a sync.WaitGroup in place of the `done` channel in your example be overkill? Just exploring what's possible here for my own education.

Using a `sync.WaitGroup` in place of the `done` channel would be correct, but perhaps less useful: it's about the same amount of code, and you can `select` on the channel but not on the WaitGroup.

(That said, a WaitGroup would avoid a pointer indirection: channels are reference types, whereas WaitGroups are value types. That difference is the motivation behind my proposal in https://golang.org/issue/28366.)
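
A tiny, self-contained illustration of that `select` point (the timings are arbitrary):

```
package main

import (
  "log"
  "time"
)

func main() {
  done := make(chan struct{})
  go func() {
    defer close(done)
    time.Sleep(100 * time.Millisecond) // stand-in for the worker finishing up
  }()

  // A done channel composes with other events in a select;
  // sync.WaitGroup.Wait can only block unconditionally.
  select {
  case <-done:
    log.Println("worker finished")
  case <-time.After(5 * time.Second):
    log.Println("worker failed to stop in time")
  }
}
```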

Sanyia Saidova

Sep 2, 2021, 8:47:55 PM
to Bryan C. Mills, golang-nuts
Awesome, thanks so much for the help here! Learned a lot from the input and code examples. :)

Much appreciated!