exec.CommandContext questions

522 views
Skip to first unread message

Jan Mercl

unread,
Oct 12, 2021, 9:26:43 AM10/12/21
to golang-nuts
modernc.org/ccgo/v3 compiles test C source files into Go source file and executes the result to check if it compiles and performs correctly. To avoid incorrectly transpilled programs to hang the test, the resulting Go code is executed like this:

        out, err := func() ([]byte, error) {
                ctx, cancel := context.WithTimeout(context.Background(), 60*time.Second)
                defer cancel()
       
                return exec.CommandContext(ctx, "go", append([]string{"run", main}, runargs...)...).CombinedOutput()
        }()

(https://gitlab.com/cznic/ccgo/-/blob/b4ea401c94ad40a2cd3146e9c1c5f0ec85f8834b/v3/lib/all_test.go#L976)

The code is lifted almost verbatim from the example here: https://pkg.go.dev/os/ex...@go1.17.2#example-CommandContext

For some time I suspected the code does not work properly on some builders: https://gitlab.com/cznic/ccgo/-/issues/19. I now tried to investigate and wrap my head around what's wrong. Here's a distilled experiment, consisting of four short Go files (also enclosed as an archive for convenience).

        jnml@e5-1650:~/tmp$ ls -l      
        total 16
        -rw-r--r-- 1 jnml jnml  77 Oct 12 14:38 crash.go
        -rw-r--r-- 1 jnml jnml 112 Oct 12 14:16 hang.go
        -rw-r--r-- 1 jnml jnml 740 Oct 12 14:41 main.go
        -rw-r--r-- 1 jnml jnml 111 Oct 12 14:30 sleep.go
        jnml@e5-1650:~/tmp$ cat main.go
        package main
       
        import (
                "context"
                "fmt"
                "os"
                "os/exec"
                "runtime"
                "strings"
                "time"
        )
       
        func main() {
                fmt.Printf("%s, GO111MODULE=%v\n", runtime.Version(), os.Getenv("GO111MODULE"))
                run("A", "sleep", "10")
                run("B", "go", "run", "sleep.go")
                run("C", "go", "run", "crash.go")
                run("D", "go", "run", "hang.go")
        }
       
        func run(cs, cmd string, args ...string) {
                ctx, cancel := context.WithTimeout(context.Background(), time.Second)
       
                defer cancel()
       
                fmt.Printf("\ncase %s: excuting: %s %s\n", cs, cmd, strings.Join(args, " "))
                t0 := time.Now()
                out, err := exec.CommandContext(ctx, cmd, args...).CombinedOutput()
                fmt.Printf("returned in %s\n", time.Since(t0))
                if err != nil {
                        fmt.Printf("FAIL: %s, err: %s\n", out, err)
                        return
                }
        }
        jnml@e5-1650:~/tmp$ cat crash.go
        //go:build ignore
        // +build ignore
       
        package main
       
        func main() {
                select {}
        }
        jnml@e5-1650:~/tmp$ cat hang.go
        //go:build ignore
        // +build ignore
       
        package main
       
        import "net"
       
        var _ net.Interface
       
        func main() {
                select {}
        }
        jnml@e5-1650:~/tmp$ cat sleep.go
        //go:build ignore
        // +build ignore
       
        package main
       
        import "time"
       
        func main() {
                time.Sleep(10 * time.Second)
        }
        jnml@e5-1650:~/tmp$ 

Here's the output of executing main.go:

        jnml@e5-1650:~/tmp$ go run main.go
        go1.17.2, GO111MODULE=
       
        case A: excuting: sleep 10
        returned in 1.000413389s
        FAIL: , err: signal: killed
       
        case B: excuting: go run sleep.go
        returned in 10.130517587s
        FAIL: , err: signal: killed
       
        case C: excuting: go run crash.go
        returned in 90.458159ms
        FAIL: fatal error: all goroutines are asleep - deadlock!
       
        goroutine 1 [select (no cases)]:
        main.main()
        /home/jnml/tmp/crash.go:7 +0x17
        exit status 2
        , err: exit status 1
       
        case D: excuting: go run hang.go
        ^Csignal: interrupt
        jnml@e5-1650:~/tmp$

The program was killed by ctrl-c after hanging for more than 10 secs. 

Case A is equal to the exec.ContextWithTimeout linked earlier. It works as expected, killing the child process in one second.

Case B also returns an error "signal killed", but only after the child process exits after time.Sleep(10*time.Second). Q1: Why is the child process in this case not killed after the timeout of one second?

Case C works as expected, a crashed child process returns immediately.

Case D differs from case C only in that it imports "net". This was observed recently, after modernc.org/libc started importing "github.com/google/uuid", which in turn imports "net". Q2: Why does hang.go _not_ crash?

To verify if using 'go run' may have something to do with it, I later tried also the below and it seems to disprove the hypothesis:

        jnml@e5-1650:~/tmp$ go build -o crash crash.go && ./crash
        fatal error: all goroutines are asleep - deadlock!
       
        goroutine 1 [select (no cases)]:
        main.main()
        /home/jnml/tmp/crash.go:7 +0x17
        jnml@e5-1650:~/tmp$ go build -o hang hang.go && ./hang
        ^C
        jnml@e5-1650:~/tmp$

Summary: There are two things I either do not understand correctly or that might be bugs:

1)  exec.CommandWithContext is not always cancelled in one second as expected for passing it context.WithTimeout(context.Background(), time.Second).

2) The deadlock detector might be confused by merely importing package "net", probably because of (transitively) initialization of that package.

Any insights will be appreciated, thanks in advance.

-j



src.tar.gz

Brian Candler

unread,
Oct 12, 2021, 10:25:06 AM10/12/21
to golang-nuts
I'm not 100% sure this is your issue, but the problem I found before with exec.CommandContext is that although it kills the direct child, that doesn't necessarily kill all of its descendants.  Since you're running the go compiler as a child ("go run xxx.go"), presumably that in turn is running the build executable as its own child.


My workaround was to create a process group and kill that.  This is Linux-specific, or at least Unix-specific.
(Doesn't work in the playground, try running it locally)

Brian Candler

unread,
Oct 12, 2021, 10:35:29 AM10/12/21
to golang-nuts
I think that's it.  New version of main.go:

It stops after 1 second in each case.

Jan Mercl

unread,
Oct 12, 2021, 10:47:15 AM10/12/21
to Brian Candler, golang-nuts
Wonderful! Thank you very much. I hope to have permission to use your
code, do I? I suggest sending a PR that adds you to the AUTHORS and/or
CONTRIBUTORS files:
https://gitlab.com/cznic/ccgo/-/merge_requests/new, thanks.

-j

Brian Candler

unread,
Oct 12, 2021, 10:55:48 AM10/12/21
to golang-nuts
You're more than welcome to use it - it's nothing more than a snippet that reimplements one line of cmd.Start.

Actually, checking that source code: at your side you should call exec.Command not exec.CommandContext.  That is:

        execcmd := exec.Command(cmd, args...)
        out, err := _combinedoutput(ctx, execcmd)

Otherwise there could be two bits of code both trying to kill the process.

Jan Mercl

unread,
Oct 12, 2021, 11:31:54 AM10/12/21
to Brian Candler, golang-nuts
On Tue, Oct 12, 2021 at 4:56 PM Brian Candler <b.ca...@pobox.com> wrote:

> You're more than welcome to use it - it's nothing more than a snippet that reimplements one line of cmd.Start.
>
> Actually, checking that source code: at your side you should call exec.Command not exec.CommandContext. That is:
>
> execcmd := exec.Command(cmd, args...)
> out, err := _combinedoutput(ctx, execcmd)
>
> Otherwise there could be two bits of code both trying to kill the process.

Thanks again: https://gitlab.com/cznic/ccgo/-/commit/3b7e0ed8ebf8cbdf7877f6fc49f62fa5d0acdc1b
Reply all
Reply to author
Forward
0 new messages