SIGSEGV: segmentation violation with go test -race


John

Jan 15, 2025, 9:49:22 PM
to golang-nuts
I'm running into a fault when trying to run `go test -race`.

I get the same when I disable CGO (because it lists a CGO signal in the fault): 
`CGO_ENABLED=0 go test -c -race`

I've attempted to build a binary out of it and run it, with the same results.  This is not happening on all tests I attempt, just some tests.  

I don't know of any C code I'm using, but I am importing various OTEL packages, which might have CGO somewhere.

I'm not sure where to start debugging this. Searching online turns up similar issues, but none are an exact match and none share the same circumstances.

Here is the fault:

SIGSEGV: segmentation violation
PC=0x10012c23c m=0 sigcode=2 addr=0x10
signal arrived during cgo execution

goroutine 1 gp=0xc0000021c0 m=0 mp=0x102ce76e0 [syscall, locked to thread]:
runtime.cgocall(0x10182f3e0, 0x0)
        /usr/local/go/src/runtime/cgocall.go:167 +0x58 fp=0xc000095f40 sp=0xc000095f00 pc=0x1001dd068
runtime.main()
        /usr/local/go/src/runtime/proc.go:243 +0x210 fp=0xc000095fd0 sp=0xc000095f40 pc=0x1001a9dd0
runtime.goexit({})
        /usr/local/go/src/runtime/asm_arm64.s:1223 +0x4 fp=0xc000095fd0 sp=0xc000095fd0 pc=0x1001ec9b4

goroutine 18 gp=0xc000102380 m=nil [force gc (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        /usr/local/go/src/runtime/proc.go:424 +0xc8 fp=0xc000080790 sp=0xc000080770 pc=0x1001e3c78
runtime.goparkunlock(...)
        /usr/local/go/src/runtime/proc.go:430
runtime.forcegchelper()
        /usr/local/go/src/runtime/proc.go:337 +0xb8 fp=0xc0000807d0 sp=0xc000080790 pc=0x1001aa1b8
runtime.goexit({})
        /usr/local/go/src/runtime/asm_arm64.s:1223 +0x4 fp=0xc0000807d0 sp=0xc0000807d0 pc=0x1001ec9b4
created by runtime.init.7 in goroutine 1
        /usr/local/go/src/runtime/proc.go:325 +0x24

goroutine 19 gp=0xc000102540 m=nil [GC sweep wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
        /usr/local/go/src/runtime/proc.go:424 +0xc8 fp=0xc000096f60 sp=0xc000096f40 pc=0x1001e3c78
runtime.goparkunlock(...)
        /usr/local/go/src/runtime/proc.go:430
runtime.bgsweep(0xc000112000)
        /usr/local/go/src/runtime/mgcsweep.go:277 +0xa0 fp=0xc000096fb0 sp=0xc000096f60 pc=0x100191ea0
runtime.gcenable.gowrap1()
        /usr/local/go/src/runtime/mgc.go:204 +0x28 fp=0xc000096fd0 sp=0xc000096fb0 pc=0x100185df8
runtime.goexit({})
        /usr/local/go/src/runtime/asm_arm64.s:1223 +0x4 fp=0xc000096fd0 sp=0xc000096fd0 pc=0x1001ec9b4
created by runtime.gcenable in goroutine 1
        /usr/local/go/src/runtime/mgc.go:204 +0x6c

goroutine 20 gp=0xc000102700 m=nil [GC scavenge wait]:
runtime.gopark(0xc000112000?, 0x101cb09d8?, 0x1?, 0x0?, 0xc000102700?)
        /usr/local/go/src/runtime/proc.go:424 +0xc8 fp=0xc000090f60 sp=0xc000090f40 pc=0x1001e3c78
runtime.goparkunlock(...)
        /usr/local/go/src/runtime/proc.go:430
runtime.(*scavengerState).park(0x102ce6000)
        /usr/local/go/src/runtime/mgcscavenge.go:425 +0x5c fp=0xc000090f90 sp=0xc000090f60 pc=0x10018f89c
runtime.bgscavenge(0xc000112000)
        /usr/local/go/src/runtime/mgcscavenge.go:653 +0x44 fp=0xc000090fb0 sp=0xc000090f90 pc=0x10018fde4
runtime.gcenable.gowrap2()
        /usr/local/go/src/runtime/mgc.go:205 +0x28 fp=0xc000090fd0 sp=0xc000090fb0 pc=0x100185d98
runtime.goexit({})
        /usr/local/go/src/runtime/asm_arm64.s:1223 +0x4 fp=0xc000090fd0 sp=0xc000090fd0 pc=0x1001ec9b4
created by runtime.gcenable in goroutine 1
        /usr/local/go/src/runtime/mgc.go:205 +0xac

r0      0x0
r1      0x10182f3f0
r2      0xc000095ef0
r3      0x102ce6a80
r4      0x110
r5      0xc000095000
r6      0x1
r7      0x0
r8      0x102ce76e0
r9      0x10012c22c
r10     0x102ce76e0
r11     0x102ce6a80
r12     0x1000000000000000
r13     0x16fcd5e70
r14     0xffffff0000000000
r15     0x4
r16     0xc0000956c0
r17     0x206536cc0
r18     0x0
r19     0x10182f3f0
r20     0x0
r21     0x16fcd5e50
r22     0x102d1a604
r23     0x16fcd5fd8
r24     0x19429e000
r25     0x0
r26     0x102009c88
r27     0x102d1a000
r28     0x102ce6a80
r29     0x16fcd5de8
lr      0x1001ed7b8
sp      0x16fcd5dc0
pc      0x10012c23c
fault   0x10

My environment:

GO111MODULE=''
GOARCH='arm64'
GOBIN=''
GOCACHE='/Users/jdoak/Library/Caches/go-build'
GOENV='/Users/jdoak/Library/Application Support/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='arm64'
GOHOSTOS='darwin'
GOINSECURE=''
GOMODCACHE='/Users/jdoak/go/pkg/mod'
GONOPROXY='none'
GONOSUMDB='[keeping private]'
GOOS='darwin'
GOPATH='[keeping private]'
GOPRIVATE='[keeping private]'
GOPROXY='[keeping private]|https://proxy.golang.org,direct'
GOROOT='/usr/local/go'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/usr/local/go/pkg/tool/darwin_arm64'
GOVCS=''
GOVERSION='go1.23.4'
GODEBUG=''
GOTELEMETRY='on'
GOTELEMETRYDIR='/Users/jdoak/Library/Application Support/go/telemetry'
GCCGO='gccgo'
GOARM64='v8.0'
AR='ar'
CC='clang'
CXX='clang++'
CGO_ENABLED='0'
GOMOD='[keeping private]'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -arch arm64 -fno-caret-diagnostics -Qunused-arguments -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/var/folders/rd/hbhb8s197633_f8ncy6fmpqr0000gn/T/go-build3846121577=/tmp/go-build -gno-record-gcc-switches -fno-common'

Any help on where to start debugging this is appreciated.

Kurtis Rader

Jan 15, 2025, 10:53:40 PM
to John, golang-nuts
That your failure backtrace only has Go garbage collection threads and your main thread is slightly surprising. It does look like your main function is calling C code that is causing the SIGSEGV. Can you show us your "main" package? You say you're using "various OTEL packages". Which ones specifically? I'm guessing you're using https://github.com/open-telemetry/opentelemetry-go but we shouldn't have to guess. I haven't used that package but it would not surprise me if it created OS threads that run C code. In this type of situation the most likely cause is a mistake in your code. Second most likely is a bug in a third-party package you're using. A distant third is a bug in the Go runtime, the OS, etc.

You also wrote "I've attempted to build a binary out of it and run it, with the same results." But it's unclear what you did. I interpret that sentence to mean you did "go build" and ran the resulting binary rather than running unit tests via "go test". Can you clarify what you mean by "build a binary out of it and run it"?

--
Kurtis Rader
Caretaker of the exceptional canines Junior and Hank

John

Jan 15, 2025, 11:30:27 PM
to golang-nuts
Hey Kurtis,

Thanks for responding.

Unfortunately, this does look like some type of OTEL problem. I was able to make a copy and strip out all the OTEL code, and as soon as I did, the fault stopped happening. That means it is some type of OTEL issue that I should probably track down with the OTEL people.

As a note for someone who stumbles on this with a similar problem,  the OTEL packages included:


These packages are at v1.33.0

If and when I track down the cause, I'll post an update here.

Thanks again Kurtis!

Ian Lance Taylor

Jan 16, 2025, 12:06:16 AM
to John, golang-nuts
On Wed, Jan 15, 2025 at 6:49 PM John <johns...@gmail.com> wrote:
>
> I'm running into a fault now when trying to run `go test -race`.
>
> I get the same when I disable CGO (because it lists a CGO signal in the fault):
> `CGO_ENABLED=0 go test -c -race`
>
> I've attempted to build a binary out of it and run it, with the same results. This is not happening on all tests I attempt, just some tests.
>
> I don't know of any C code I'm using, but I am importing various OTEL packages, which might have CGO somewhere.
>
> I'm not sure where to start debugging this, the internet has pointed to other issues similar to this, not exact and none with circumstances that are the same.
>
> Here is the fault:
>
> SIGSEGV: segmentation violation
> PC=0x10012c23c m=0 sigcode=2 addr=0x10
> signal arrived during cgo execution
>
> goroutine 1 gp=0xc0000021c0 m=0 mp=0x102ce76e0 [syscall, locked to thread]:
> runtime.cgocall(0x10182f3e0, 0x0)
> /usr/local/go/src/runtime/cgocall.go:167 +0x58 fp=0xc000095f40 sp=0xc000095f00 pc=0x1001dd068
> runtime.main()
> /usr/local/go/src/runtime/proc.go:243 +0x210 fp=0xc000095fd0 sp=0xc000095f40 pc=0x1001a9dd0
> runtime.goexit({})
> /usr/local/go/src/runtime/asm_arm64.s:1223 +0x4 fp=0xc000095fd0 sp=0xc000095fd0 pc=0x1001ec9b4

For what it's worth, this is a cgo call made by the Go runtime at
startup time for a program that uses cgo. I don't know how it could
happen if CGO_ENABLED=0. It's a simple call that is made after all
init functions are run. It's hard to see how it could crash. It might
help to run the program under the debugger and look at the
instructions and stack at the point of the crash.

Ian

John Doak

Jan 16, 2025, 12:21:34 AM
to Ian Lance Taylor, golang-nuts
Hey Ian,

I'm certainly willing to fire up a debugger, but I assume you mean GDB rather than Delve for this?

I don’t generally use debuggers with Go, just making sure I am going down the right path.


Kurtis Rader

Jan 16, 2025, 12:41:47 AM
to John, golang-nuts

Note that simply removing the references to the above mentioned OTEL package does not guarantee the problem is with that package. The failure could still be due to how you are using the package. Having said that, any public package should validate its inputs and provide a more meaningful failure than a SIGSEGV fault. So even if the proximate cause of the failure is a mistake in your code there is clearly room for improvement in the package you are using.

As a retired software support engineer who has spent thousands of hours debugging these types of problems, I can't stress enough how important it is to create a minimal reproducible example; it is the quickest way to get to the root cause of the problem. A minimal reproducible example also allows others, such as the OTEL package maintainers, to employ tools, such as gdb or lldb, which you may not be comfortable using.

John

Jan 16, 2025, 2:02:37 AM
to golang-nuts
Thanks Kurtis for the advice.  I was heading in that direction.

This is definitely an OTEL problem.  The minimal version required to create the issue:

metrics.go
```go
```

metrics_test.go
```go
package metrics
```

`go test -race`

That will immediately cause the issue.  You don't even require tests, it fails before it even gets there.

I'll make my way over to the OTEL bugs tomorrow.  

For those who are interested in some random debugger output, here is a little from lldb and delve (which lets me see they are calling C from purego):

Process 58447 launched: '/Users/jdoak/base/concurrency/sync/sync.test' (arm64)
warning: (arm64) /Users/jdoak/base/concurrency/sync/sync.test(0x0000000100000000) address 0x0000000100000000 maps to more than one section: sync.test.__TEXT and sync.test.__TEXT
warning: (arm64) /Users/jdoak/base/concurrency/sync/sync.test(0x0000000100000000) address 0x0000000101bbc000 maps to more than one section: sync.test.__DATA_CONST and sync.test.__DATA_CONST
warning: (arm64) /Users/jdoak/base/concurrency/sync/sync.test(0x0000000100000000) address 0x0000000102b18000 maps to more than one section: sync.test.__DATA and sync.test.__DATA
Process 58447 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x10)
    frame #0: 0x000000010000423c sync.test`__tsan_func_enter + 16
sync.test`__tsan_func_enter:
->  0x10000423c <+16>: ldr    x8, [x0, #0x10]
    0x100004240 <+20>: add    w9, w8, #0x8
    0x100004244 <+24>: tst    x9, #0xff0
    0x100004248 <+28>: b.eq   0x1000042a0    ; <+116>
Target 0: (sync.test) stopped.
(lldb) thread backtrace all
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x10)
  * frame #0: 0x000000010000423c sync.test`__tsan_func_enter + 16
    frame #1: 0x0000000101706e34 sync.test`github.com/ebitengine/purego/internal/fakecgo.x_cgo_notify_runtime_init_done + 20
    frame #2: 0x00000001017073f0 sync.test`x_cgo_notify_runtime_init_done_trampoline + 16
  thread #2
    frame #0: 0x00000001945e64e8 libsystem_kernel.dylib`__semwait_signal + 8
    frame #1: 0x00000001944c56f0 libsystem_c.dylib`nanosleep + 220
    frame #2: 0x00000001944c5608 libsystem_c.dylib`usleep + 68
    frame #3: 0x00000001000c6304 sync.test`runtime.usleep_trampoline.abi0 + 20
  thread #3
    frame #0: 0x00000001945e66ec libsystem_kernel.dylib`__psynch_cvwait + 8
    frame #1: 0x0000000194624894 libsystem_pthread.dylib`_pthread_cond_wait + 1204
    frame #2: 0x00000001000c6688 sync.test`runtime.pthread_cond_wait_trampoline.abi0 + 24
    frame #3: 0x00000001000c4838 sync.test`runtime.asmcgocall.abi0 + 200
  thread #4
    frame #0: 0x00000001945e66ec libsystem_kernel.dylib`__psynch_cvwait + 8
    frame #1: 0x0000000194624894 libsystem_pthread.dylib`_pthread_cond_wait + 1204
    frame #2: 0x00000001000c6688 sync.test`runtime.pthread_cond_wait_trampoline.abi0 + 24
    frame #3: 0x00000001000c4838 sync.test`runtime.asmcgocall.abi0 + 200
  thread #5
    frame #0: 0x00000001945e66ec libsystem_kernel.dylib`__psynch_cvwait + 8
    frame #1: 0x0000000194624894 libsystem_pthread.dylib`_pthread_cond_wait + 1204
    frame #2: 0x00000001000c6688 sync.test`runtime.pthread_cond_wait_trampoline.abi0 + 24
    frame #3: 0x00000001000c4838 sync.test`runtime.asmcgocall.abi0 + 200
  thread #6
    frame #0: 0x00000001945e66ec libsystem_kernel.dylib`__psynch_cvwait + 8
    frame #1: 0x0000000194624894 libsystem_pthread.dylib`_pthread_cond_wait + 1204
    frame #2: 0x00000001000c6688 sync.test`runtime.pthread_cond_wait_trampoline.abi0 + 24
    frame #3: 0x00000001000c4838 sync.test`runtime.asmcgocall.abi0 + 200
   
   
   
(dlv) continue
> [runtime-fatal-throw] runtime.fatalsignal() /usr/local/go/src/runtime/signal_unix.go:831 (hits goroutine(1):1 total:1) (PC: 0x104f027bc)
Warning: debugging optimized function
   826:         printDebugLog()
   827:
   828:         exit(2)
   829: }
   830:
=> 831: func fatalsignal(sig uint32, c *sigctxt, gp *g, mp *m) *g {
   832:         if sig < uint32(len(sigtable)) {
   833:                 print(sigtable[sig].name, "\n")
   834:         } else {
   835:                 print("Signal ", sig, "\n")
   836:         }
(dlv) stack
0  0x0000000104f027bc in runtime.fatalsignal
   at /usr/local/go/src/runtime/signal_unix.go:831
1  0x0000000104f02390 in runtime.sighandler
   at /usr/local/go/src/runtime/signal_unix.go:754
2  0x0000000104f01cac in runtime.sigtrampgo
   at /usr/local/go/src/runtime/signal_unix.go:490
3  0x0000000104e6c23c in ???
   at ?:-1
4  0x0000000106569974 in github.com/ebitengine/purego/internal/fakecgo.x_cgo_notify_runtime_init_done
   at /Users/jdoak/go/pkg/mod/github.com/ebitengine/pur...@v0.8.1/internal/fakecgo/go_libinit.go:22
5  0x000000016af95d88 in ???
   at ?:-1
6  0x0000000104f2cadc in runtime.asmcgocall
   at /usr/local/go/src/runtime/asm_arm64.s:1000
7  0x0000000104f2daa8 in racecall
   at /usr/local/go/src/runtime/race_arm64.s:476
8  0x0000000000000000 in ???
   at :0
   error: NULL address
(truncated)

John

Jan 16, 2025, 2:13:28 AM
to golang-nuts