Any good idea to stop the goroutine as soon as possible?


E Z

Jan 11, 2022, 2:47:03 PM
to golang-nuts
I'm using Go to implement a task scheduling and processing system.

I found that Go can't stop a goroutine externally; we must wait for the goroutine to end by itself. I can signal a goroutine to stop through a channel, but the channel is only checked when the select statement runs, and that has to wait until the business function has finished executing, which can take a long time.

For example:

ticker := time.NewTicker(task.period)
defer ticker.Stop()
for {
    select {
    case <-task.channel:
        return
    case <-ticker.C:
        _ = task.task.Run(task.logger)
    }
}

The above code runs in a goroutine. If I want to cancel this goroutine, I can send a signal to task.channel, but the signal is only received after task.task.Run has finished, which may take a long time, such as 5 minutes.

Is there a good way to cancel a goroutine like the one above as soon as possible?

Thanks,
Ethan

Axel Wagner

Jan 11, 2022, 3:15:27 PM
to golang-nuts
The best way to do this is to plumb a context.Context through all long-running functions - in particular, anything talking to the network.
Most RPC and network frameworks provide a way to pass a Context, so consistently doing this will more or less transparently cancel your business logic ASAP.
For purely CPU bound code, this is a bit more awkward, because you indeed have to intersperse code like
select {
    case <-ctx.Done():
        return ctx.Err()
    default:
}
to make the code return early. But that should be relatively rare.

--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/golang-nuts/72dcd177-028e-43a3-aff9-6a6258e29bc4n%40googlegroups.com.

Ian Lance Taylor

Jan 11, 2022, 4:04:15 PM
to Axel Wagner, golang-nuts
On Tue, Jan 11, 2022 at 12:15 PM 'Axel Wagner' via golang-nuts
<golan...@googlegroups.com> wrote:
>
> The best way to do this is to plumb a context.Context through all long-running functions - in particular, anything talking to the network.
> Most RPC and network frameworks provide a way to pass a Context, so consistently doing this will more or less transparently cancel your business logic ASAP.
> For purely CPU bound code, this is a bit more awkward, because you indeed have to intersperse code like
> select {
> case <-ctx.Done():
> return ctx.Err()
> default:
> }
> to make the code return early. But that should be relatively rare.

Yes. See also https://go.dev/blog/context .

Ian

E Z

Jan 11, 2022, 9:00:33 PM
to golang-nuts
Thank you very much.

I understand that we can use context.Context to solve the network-blocking problem in long-running functions, provided the network library supports passing a context parameter.

But for CPU-bound code, is the following implementation mentioned by Axel the only way to make a function exit early?
select {
    case <-ctx.Done():
        return ctx.Err()
    default:
}
For example, a goroutine is executing a task that updates a DNS record and then waits until the record takes effect on some name servers. It may take seconds or even minutes for the DNS record to take effect on a name server.
In this case, it seems I can't cancel the running goroutine unless I add the above select to every for loop or timer wait, or change the design to split these time-consuming operations into different goroutines. Neither seems very good.

Axel Wagner

Jan 12, 2022, 1:31:32 AM
to E Z, golang-nuts
On Wed, Jan 12, 2022 at 3:01 AM E Z <lege...@gmail.com> wrote:
> Thank you very much.
>
> I understand that we can use context.Context to solve the network-blocking problem in long-running functions, provided the network library supports passing a context parameter.
>
> But for CPU-bound code, is the following implementation mentioned by Axel the only way to make a function exit early?

It's not the only way, but it's the way I'd generally recommend. Universally using `context.Context` to signal cancellation solves exactly the problem you were having. Specifically,

> The above code runs in a goroutine. If I want to cancel this goroutine, I can send a signal to task.channel, but the signal is only received after task.task.Run has finished, which may take a long time, such as 5 minutes.

If `task.task.Run` takes a `context.Context`, it can exit sooner than after 5 minutes. If it takes that long because it does remote requests, it can propagate the Context itself. If it is CPU-bound, it can check if the Context was cancelled, say, every 1000 iterations (or whatever. What's a reasonable number depends heavily on what it's doing).

But, yes, for such a CPU-bound task, actively checking if it was cancelled via a mechanism like a Context is the only way to be aborted.

> For example, a goroutine is executing a task that updates a DNS record and then waits until the record takes effect on some name servers. It may take seconds or even minutes for the DNS record to take effect on a name server.

To be clear, this is not a CPU-bound process. Updating the DNS record is either a network request or IPC. The waiting is then a loop like

for {
    select {
    case <-ctx.Done():
        return ctx.Err()
    case <-time.After(time.Second): // simplistic, you'd likely want some jitter and/or exponential backoff here
        if recordHasChanged(ctx) { // network request to check if the DNS record has changed - takes a Context, as it's a network request
            return nil
        }
    }
}

This will spend most of its time sleeping.

A CPU-bound task is something like a diff-operation, which is just an algorithm that can be very slow for large inputs, just because it has a lot of work to churn through.

> In this case, it seems I can't cancel the running goroutine unless I add the above select to every for loop or timer wait, or change the design to split these time-consuming operations into different goroutines. Neither seems very good.

I don't understand why you think this is not good. It seems perfectly reasonable code. But yes, it's what you have to do. Go has no way to asynchronously stop code, you need to manually cancel. And context.Context gives a universal mechanism to do that, which I would recommend using for that.
 


E Z

Jan 12, 2022, 3:41:23 AM
to Axel Wagner, golang-nuts
Hi Axel, thank you very much. Your explanation of this problem is very detailed and answers my question.

What I think is not so good is that I must add the select at all the places where the goroutine is waiting or looping. This couples the execution and scheduling parts of the task scheduling system rather tightly: I need to implement many different types of tasks, and I have to handle the cancel logic (which I think belongs to the scheduling part) explicitly in each type of task executor, instead of leaving it all to the scheduling part. I think it would be better if the task executor only had to consider the business logic.

I also tried to abstract this cancel logic into generic processing so that it could easily be applied to different types of tasks, but I didn't find a good way to do this; I had to add the code manually wherever it was needed.

jan.f...@gmail.com

Jan 12, 2022, 5:02:37 AM
to golang-nuts
Just a related observation. I don't think you need to have a select statement to check if a context is cancelled. I think it is sufficient to just check if ctx.Err() gives a result different from nil.

From the API documentation for Context (https://pkg.go.dev/context#Context):
// If Done is not yet closed, Err returns nil.
// If Done is closed, Err returns a non-nil error explaining why:
// Canceled if the context was canceled
// or DeadlineExceeded if the context's deadline passed.
// After Err returns a non-nil error, successive calls to Err return the same error.
Err() error

//Jan

Brian Candler

Jan 12, 2022, 5:05:57 AM
to golang-nuts
On Wednesday, 12 January 2022 at 08:41:23 UTC lege...@gmail.com wrote:
> What I think is not so good is that I must add the select at all the places where the goroutine is waiting or looping

I don't think you do.  Well-behaved functions which take a Context will return with an error if the context is cancelled - and presumably you're already checking for an error, so it doesn't make any difference.

If you're waiting for communication on a channel, then you *may* need to check for context.  But if you're reading from a channel, and the context propagates to the other goroutine, and the other goroutine closes the channel, then that's not an issue either.

When it becomes a bit awkward is with IO on sockets, since io.Reader/io.Writer don't take a context.  There is a long discussion at #20280, and I made some notes here.  The pattern which was explained to me is that you can have a separate goroutine which waits for the context to be cancelled, and then sets a read or write deadline on the IO; that will cause an ongoing operation to terminate immediately.

I don't know if there's a DNS client library which implements this pattern. 

Axel Wagner

Jan 12, 2022, 5:19:41 AM
to golang-nuts
On Wed, Jan 12, 2022 at 11:03 AM jan.f...@gmail.com <jan.f...@gmail.com> wrote:
> Just a related observation. I don't think you need to have a select statement to check if a context is cancelled. I think it is sufficient to just check if ctx.Err() gives a result different from nil.

I believe you are right, the select-statement I wrote should be equivalent to just `if err := ctx.Err(); err != nil { return err }`.
 

Brian Candler

Jan 12, 2022, 5:23:48 AM
to golang-nuts
Based on this stackoverflow answer:

-----
package main

import (
        "context"
        "fmt"
        "net"
        "time"
)

const DNS_SERVER = "1.2.3.4:53"
const TIMEOUT = 3 * time.Second

func main() {
        main_context, main_cancel := context.WithCancel(context.Background())
        go func() {
                time.Sleep(TIMEOUT)
                main_cancel()
        }()

        r := &net.Resolver{
                PreferGo: true,
                Dial: func(ctx context.Context, network, address string) (net.Conn, error) {
                        d := net.Dialer{
                                Timeout: 2 * time.Second,
                        }
                        return d.DialContext(ctx, network, DNS_SERVER)
                },
        }
        ip, err := r.LookupHost(main_context, "www.google.com")
        fmt.Printf("%v (err: %v)\n", ip, err)
}
-----

What I find on macOS:

[] (err: lookup www.google.com on 10.12.0.1:53: dial udp 1.2.3.4:53: operation was canceled)

10.12.0.1 is the "default" DNS server on this network, and the custom dialer is overriding this.

The program actually aborts after *5* seconds, which is also the interval between retries if the overall context is not cancelled (the total time to give up is 20 seconds in that case).  So the net.Dialer timeout doesn't work in the way you might expect, and the UDP exchange isn't immediately cancelled when the context expires; but it *does* terminate with an "operation was canceled" error without further retries.

E Z

Jan 12, 2022, 9:14:36 PM
to golang-nuts
ctx.Err() really simplifies the whole process. I'll use it to optimize my task executor, thanks.

On Wednesday, 12 January 2022 at 10:05:57 UTC Brian Candler wrote:
> On Wednesday, 12 January 2022 at 08:41:23 UTC lege...@gmail.com wrote:
> > What I think is not so good is that I must add the select at all the places where the goroutine is waiting or looping
>
> I don't think you do.  Well-behaved functions which take a Context will return with an error if the context is cancelled - and presumably you're already checking for an error, so it doesn't make any difference.

I didn't pass a context as a parameter when executing my tasks before, so I need to add it to some functions and check ctx.Err() where necessary. It will take some work, but based on the discussion above I believe it's the best approach I can think of so far.

I understand that some blocking points in the I/O process cannot be cancelled; that is an acceptable delay.