Odd runtime errors

Jakob Borg

Aug 25, 2019, 4:08:42 AM

to golang-nuts

Hi all,

We develop an open source program for consumers that is reasonably widely used within its niche, on a mix of operating systems and platforms. Recently we enabled crash reporting to get panic traces back from cooperating users. With that we've discovered a bunch of panics of our own creation, plus a lot of noise in the form of fatal errors outside of our control -- typically users running out of memory or threads.

There remains a lot of "unexplained" oddness however, some of which I'm sure is attributable to hardware errors (bad RAM/CPU/etc). It's hard to be sure either way, but we get a lot of stacks. The list below is a (probably non-exhaustive) selection of crashes from the last week or so that are odd in my mind:

fatal error: defer on system stack
fatal error: unexpected signal during runtime execution
fatal error: found bad pointer in Go heap (incorrect use of unsafe or cgo?) (this could be ours; we have no cgo, but I'm sure there is unsafe somewhere deep in the dependencies)
fatal error: gc: unswept span
fatal error: malloc deadlock
fatal error: mSpanList.insertBack
fatal error: non in-use span in unswept list
fatal error: out of memory allocating heap arena metadata (I guess this is just a niche case of OOM)
fatal error: runtime: stack split at bad time
fatal error: runtime.newosproc (out of threads?)
fatal error: runtime·unlock: lock count
fatal error: s.allocCount != s.nelems && freeIndex == s.nelems
fatal error: slice bounds out of range (deep in the malloc code)
fatal error: stopm holding locks
fatal error: sweep increased allocation count
fatal error: sync: inconsistent mutex state
fatal error: wirep: invalid p state
panic: sync: inconsistent mutex state

I'm not going to spend any energy hunting these down or pester anyone with bug reports, especially as I have no idea who the originating user is and no way to communicate with them or experiment. :) However, if any of you out there thinks "Huh? That GC error should never happen, wonder what's going on?" I would be happy to forward a bunch of reports for that particular crash or provide access to the crash database for searching.

(A limitation of our crash reporting is that output prior to the panic/fatal error is trimmed as potentially sensitive user data. This means we miss the description that some fatal-error crashes print before the "fatal error:" line. We might fix this at some point.)
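For readers wondering what that kind of trimming might look like, here is a minimal sketch. It is not the project's actual reporting code; the function name `trimToCrash`, the marker list, and the sample log are all assumptions made for illustration. It drops everything before the first line that starts with "panic: " or "fatal error: ", which also shows why any runtime description printed just above that line is lost.

```go
package main

import (
	"fmt"
	"strings"
)

// crashMarkers are the line prefixes treated as the start of a crash report.
// (Assumed for this sketch; a real reporter may recognise more.)
var crashMarkers = []string{"panic: ", "fatal error: "}

// trimToCrash returns log starting at the first line that begins with a
// crash marker, dropping everything before it. It returns "" when no
// marker is found.
func trimToCrash(log string) string {
	offset := 0
	for _, line := range strings.SplitAfter(log, "\n") {
		for _, m := range crashMarkers {
			if strings.HasPrefix(line, m) {
				return log[offset:]
			}
		}
		offset += len(line)
	}
	return ""
}

func main() {
	// Made-up sample log: the first two lines are discarded, including
	// the diagnostic line the runtime printed before the fatal error.
	log := "syncing /home/alice/private\n" +
		"runtime: some diagnostic detail (made-up sample line)\n" +
		"fatal error: sweep increased allocation count\n" +
		"\n" +
		"goroutine 1 [running]:\n"
	fmt.Print(trimToCrash(log))
}
```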

//jb

Robert Engels

Aug 25, 2019, 7:59:12 AM

to Jakob Borg, golang-nuts

Disheartening, but not unfamiliar - very reminiscent of Java days. I would highly encourage removing any dependencies that use cgo or unsafe and moving to pure Go. This seemed to be the only way to tame these sorts of errors in the wild in Java land.
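As a rough sketch of how one might find such dependencies, the following uses the standard go/parser package to report whether a source file imports "unsafe" or "C" (the pseudo-package that marks cgo). The helper name `riskyImports` and the sample source are made up for illustration; a real audit would need to walk every file of every dependency.

```go
package main

import (
	"fmt"
	"go/parser"
	"go/token"
	"strconv"
)

// riskyImports reports which of "unsafe" and "C" (cgo) the given Go source
// imports. Only the import declarations are parsed.
func riskyImports(filename, src string) ([]string, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, filename, src, parser.ImportsOnly)
	if err != nil {
		return nil, err
	}
	var found []string
	for _, imp := range f.Imports {
		// Import paths are stored as quoted string literals.
		path, err := strconv.Unquote(imp.Path.Value)
		if err != nil {
			return nil, err
		}
		if path == "unsafe" || path == "C" {
			found = append(found, path)
		}
	}
	return found, nil
}

func main() {
	// Hypothetical dependency source, used only as sample input.
	src := "package dep\n\nimport (\n\t\"fmt\"\n\t\"unsafe\"\n)\n"
	found, err := riskyImports("dep.go", src)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(found) // prints [unsafe]
}
```

For a whole module, something like `go list -deps -f '{{.ImportPath}} {{.CgoFiles}}' ./...` should give a quicker first overview of which packages contain cgo files.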

Also, have you done stress tests using the race detector? I’m betting that the vast majority of these errors come from incorrect concurrency.
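A minimal sketch of what such a stress test could look like, assuming a hypothetical `counter` type standing in for real shared state. Many goroutines hammer the same value at once; built with `-race` (for example `go run -race .` or `go test -race ./...`), the race detector reports any unsynchronized access it observes during the run. Here the mutex makes the code race-free; removing the Lock/Unlock pair in `inc` would typically make the detector report a data race.

```go
package main

import (
	"fmt"
	"sync"
)

// counter is a stand-in for whatever shared state the program protects.
type counter struct {
	mu sync.Mutex
	n  int
}

// inc increments the counter under the mutex.
func (c *counter) inc() {
	c.mu.Lock()
	c.n++
	c.mu.Unlock()
}

// value returns the current count under the mutex.
func (c *counter) value() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.n
}

// hammer calls inc from many goroutines concurrently and waits for all
// of them to finish.
func hammer(c *counter, goroutines, perGoroutine int) {
	var wg sync.WaitGroup
	for i := 0; i < goroutines; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < perGoroutine; j++ {
				c.inc()
			}
		}()
	}
	wg.Wait()
}

func main() {
	c := &counter{}
	hammer(c, 64, 1000)
	fmt.Println(c.value()) // prints 64000
}
```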


K.S. Bhaskar

Aug 25, 2019, 3:35:26 PM

to golang-nuts

cgo is very picky, as is signal handling if you have Go and non-Go signal handlers. My advice is:

Understand everywhere that cgo is used.
Read every sentence in the cgo documentation, no matter how dense the prose is.
Read every sentence pertaining to signal handling if you have Go and non-Go signal handlers.

We are about to release the production-grade code for a cgo-based wrapper to make YottaDB (https://yottadb.com) accessible from Go (https://godoc.org/lang.yottadb.com/go/yottadb). It took more than twice as long as we anticipated, probably three times the number of person-hours we felt it should take, and more than four times any reasonable expectation of heartache. https://docs.yottadb.com/Presentations/DragonsofCGO.pdf is a presentation we gave recently, and we will be giving an updated one at All Things Open in October.

If you don't have cgo anywhere in your stack, ignore what I said. And good luck.