compiles test C source files into Go source file and executes the result to check if it compiles and performs correctly. To avoid incorrectly transpilled programs to hang the test, the resulting Go code is executed like this:
ctx, cancel := context.WithTimeout(context.Background(), 60*time.Second)
defer cancel()
return exec.CommandContext(ctx, "go", append([]string{"run", main}, runargs...)...).CombinedOutput()
}()
(
https://gitlab.com/cznic/ccgo/-/blob/b4ea401c94ad40a2cd3146e9c1c5f0ec85f8834b/v3/lib/all_test.go#L976)
For some time I suspected the code does not work properly on some builders:
https://gitlab.com/cznic/ccgo/-/issues/19. I now tried to investigate and wrap my head around what's wrong. Here's a distilled experiment, consisting of four short Go files (also enclosed as an archive for convenience).
jnml@e5-1650:~/tmp$ ls -l
total 16
-rw-r--r-- 1 jnml jnml 77 Oct 12 14:38 crash.go
-rw-r--r-- 1 jnml jnml 112 Oct 12 14:16 hang.go
-rw-r--r-- 1 jnml jnml 740 Oct 12 14:41 main.go
-rw-r--r-- 1 jnml jnml 111 Oct 12 14:30 sleep.go
jnml@e5-1650:~/tmp$ cat main.go
package main
import (
"context"
"fmt"
"os"
"os/exec"
"runtime"
"strings"
"time"
)
func main() {
fmt.Printf("%s, GO111MODULE=%v\n", runtime.Version(), os.Getenv("GO111MODULE"))
run("A", "sleep", "10")
run("B", "go", "run", "sleep.go")
run("C", "go", "run", "crash.go")
run("D", "go", "run", "hang.go")
}
func run(cs, cmd string, args ...string) {
ctx, cancel := context.WithTimeout(context.Background(), time.Second)
defer cancel()
fmt.Printf("\ncase %s: excuting: %s %s\n", cs, cmd, strings.Join(args, " "))
t0 := time.Now()
out, err := exec.CommandContext(ctx, cmd, args...).CombinedOutput()
fmt.Printf("returned in %s\n", time.Since(t0))
if err != nil {
fmt.Printf("FAIL: %s, err: %s\n", out, err)
return
}
}
jnml@e5-1650:~/tmp$ cat crash.go
//go:build ignore
// +build ignore
package main
func main() {
select {}
}
jnml@e5-1650:~/tmp$ cat hang.go
//go:build ignore
// +build ignore
package main
import "net"
var _ net.Interface
func main() {
select {}
}
jnml@e5-1650:~/tmp$ cat sleep.go
//go:build ignore
// +build ignore
package main
import "time"
func main() {
time.Sleep(10 * time.Second)
}
jnml@e5-1650:~/tmp$
Here's the output of executing main.go:
jnml@e5-1650:~/tmp$ go run main.go
go1.17.2, GO111MODULE=
case A: excuting: sleep 10
returned in 1.000413389s
FAIL: , err: signal: killed
case B: excuting: go run sleep.go
returned in 10.130517587s
FAIL: , err: signal: killed
case C: excuting: go run crash.go
returned in 90.458159ms
FAIL: fatal error: all goroutines are asleep - deadlock!
goroutine 1 [select (no cases)]:
main.main()
/home/jnml/tmp/crash.go:7 +0x17
exit status 2
, err: exit status 1
case D: excuting: go run hang.go
^Csignal: interrupt
jnml@e5-1650:~/tmp$
The program was killed by ctrl-c after hanging for more than 10 secs.
Case A is equal to the exec.ContextWithTimeout linked earlier. It works as expected, killing the child process in one second.
Case B also returns an error "signal killed", but only after the child process exits after time.Sleep(10*time.Second). Q1: Why is the child process in this case not killed after the timeout of one second?
Case C works as expected, a crashed child process returns immediately.
Case D differs from case C only in that it imports "net". This was observed recently, after
modernc.org/libc started importing "
github.com/google/uuid", which in turn imports "net". Q2: Why does hang.go _not_ crash?
To verify if using 'go run' may have something to do with it, I later tried also the below and it seems to disprove the hypothesis:
jnml@e5-1650:~/tmp$ go build -o crash crash.go && ./crash
fatal error: all goroutines are asleep - deadlock!
goroutine 1 [select (no cases)]:
main.main()
/home/jnml/tmp/crash.go:7 +0x17
jnml@e5-1650:~/tmp$ go build -o hang hang.go && ./hang
^C
jnml@e5-1650:~/tmp$
Summary: There are two things I either do not understand correctly or that might be bugs:
1) exec.CommandWithContext is not always cancelled in one second as expected for passing it context.WithTimeout(context.Background(), time.Second).
2) The deadlock detector might be confused by merely importing package "net", probably because of (transitively) initialization of that package.
Any insights will be appreciated, thanks in advance.
-j