Creating a crash dump file with Go and C stack traces


martin....@gmail.com

Aug 30, 2016, 9:06:45 AM
to golang-nuts
Hi,

I have a Go executable that uses a shared C library which spawns its own threads. In case of a crash (in Go or C code), I want to dump the stack traces of all C threads and goroutines into a crash dump file. Go does not handle signals in non-Go threads executing non-Go code, so I have to install a custom C signal handler to handle those cases. And because Go does not invoke a previously installed C handler when Go code crashes, the C handler has to be registered after the Go handler.

After some experiments (restricted to Linux amd64) I got it working somehow (https://gist.github.com/trxa/302c5dbe9055ef287da9139e68d0a93e). But it feels a bit hacky and has some drawbacks, and I wonder if somebody can propose a better solution or improvements.


How it basically works:

When the C handler gets installed, the previously registered Go handlers are saved.
When invoked, for example by a SIGSEGV, the handler opens a file and writes the stack trace of the current thread into it.
Then it signals all other threads to dump their stacks into the file too.
After all threads are dumped, the IP of the failing instruction is saved and the Go handler is invoked by calling it directly, so that it sees the ucontext of the crash.
After the Go handler has returned, the C handler checks whether Go has changed the IP in uc_mcontext.
If it has changed, the IP points to runtime.sigpanic, which triggers a panic and dumps the goroutine stacks to STDERR.
If it has not changed, the crash was in non-Go code on a non-Go thread and Go does not handle it. In that case, the IP register in uc_mcontext is set to the function pointer of an exported cgo function which calls panic() to dump the stacks to STDERR.
Before returning from the C handler, the STDERR file descriptor is replaced by the crash dump file descriptor, so that the Go panic output goes into the file. (The Go handlers should probably be restored before returning, in case Go still wants to backtrace the threads itself via SIGQUIT.)
After the C handler has returned, runtime.sigpanic or the cgo function is executed and does not return.
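The IP check and redirection in the steps above could look roughly like this on Linux x86-64 with glibc, where the interrupted instruction pointer is uc_mcontext.gregs[REG_RIP]. The helper names are hypothetical, and demo_go_handler is only a stand-in that diverts to abort() the way Go diverts a crash to runtime.sigpanic. redirect_ip does not touch the stack pointer, so the target must never return, and this sketch does not fix up stack alignment.

```c
#define _GNU_SOURCE /* exposes REG_RIP in <sys/ucontext.h> */
#include <signal.h>
#include <stdint.h>
#include <stdlib.h>
#include <ucontext.h>

#ifndef REG_RIP
#define REG_RIP 16 /* glibc x86-64 index, in case the name was hidden */
#endif

typedef void (*sigaction_fn)(int, siginfo_t *, void *);

/* Linux/x86-64 only: the instruction pointer of the interrupted code. */
static uintptr_t get_ip(const ucontext_t *uc)
{
    return (uintptr_t)uc->uc_mcontext.gregs[REG_RIP];
}

/* Make the thread resume in target() once the handler returns.
   target must never return. */
static void redirect_ip(ucontext_t *uc, void (*target)(void))
{
    uc->uc_mcontext.gregs[REG_RIP] = (greg_t)(uintptr_t)target;
}

/* Call the saved (Go) handler directly with the crash context and
   report whether it rewrote the instruction pointer (1) or not (0). */
int forward_and_check_ip(sigaction_fn prev, int sig, siginfo_t *info,
                         ucontext_t *uc)
{
    uintptr_t before = get_ip(uc);
    prev(sig, info, uc);
    return get_ip(uc) != before;
}

/* Stand-in for the Go handler, for demonstration only. */
static void demo_go_handler(int sig, siginfo_t *info, void *ctx)
{
    (void)sig;
    (void)info;
    redirect_ip((ucontext_t *)ctx, abort);
}
```

In the C handler, a 0 from forward_and_check_ip is the "Go did not handle it" case, where redirect_ip would point the context at the exported cgo function instead of abort().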


Here are the disadvantages and things to watch out for, which make the solution feel a bit fragile:

1. signal.Notify has to be called for all signals you want to handle for C crashes, even though Go does not handle them. Otherwise, in the non-Go code/non-Go thread case, the Go handler does not return but crashes the process with a core dump.

2. Setting the IP to a cgo function that is executed when the handler returns makes the program panic synchronously, as with runtime.sigpanic, but is probably not async-signal-safe, for example if the function has to grow its stack.
A workaround would be to panic in Go when the signal is read from the notify channel. In addition, the C handler must not return, to avoid re-executing the faulting instruction; this can be done by putting the thread to sleep. This approach is probably also more platform-independent, but that way a synchronous signal from C is handled as an asynchronous one, and the information Go receives does not let you distinguish the two (in case you only want to dump and continue for asynchronous signals).

3. Cloning the STDERR file descriptor to point to a file also feels a bit fragile compared to writing to the file directly. Another thread might write to it. The fd cannot be closed (except maybe in a global destructor), and the OS has to flush the buffers correctly (or I have to use synchronous write mode, which slows down writing the dump tremendously).

4. There are duplicate stack traces, and it is not always obvious how to match a thread's stack trace to the running goroutine.

5. It would be desirable to have the stack trace of the failing instruction in both the crash file and the log file, but with this solution that is only possible for the C frames and the first Go frame on top of the thread's stack, at least when using a common unwinder library.
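Regarding point 3, the redirection itself is small: open(), dup2() and close() are all on the POSIX list of async-signal-safe functions, so fd 2 can be repointed from inside the handler. A sketch, with redirect_stderr_to as a hypothetical helper name; O_APPEND keeps concurrent writers from overwriting each other's output but, as point 3 says, does not stop other threads from writing into the dump.

```c
#include <fcntl.h>
#include <unistd.h>

/* Point fd 2 at the crash dump file (created if needed, appended to).
   Returns 0 on success, -1 on failure (stderr is left untouched). */
int redirect_stderr_to(const char *path)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0600);
    if (fd < 0)
        return -1;
    if (fd != STDERR_FILENO) {   /* fd == 2 only if stderr was closed */
        if (dup2(fd, STDERR_FILENO) < 0) {
            close(fd);
            return -1;
        }
        close(fd);               /* fd 2 now holds the only reference */
    }
    return 0;
}
```

After this call, anything the Go runtime writes to fd 2 while panicking ends up in the file, which is what the "replace STDERR before returning" step relies on.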

There might be more.


From my point of view, a better solution would be for Go to have an option (maybe via the GOTRACEBACK env var) to trace C threads as well, for example by using the cgo traceback functions introduced in Go 1.7. Setting a file descriptor/handle as the target for the dump should also be allowed (maybe in addition to the dump on STDERR). Besides the cgo traceback functions, there could be one or more functions for gathering additional information to be printed in the crash dump, for example a list of loaded modules/libraries or the environment variables.
I can imagine that it's easier said than done, but that's what I would prefer.


Thanks for your opinions!
Martin

Ian Lance Taylor

Aug 30, 2016, 10:29:10 AM
to Martin Strenge, golang-nuts
On Tue, Aug 30, 2016 at 6:06 AM, <martin....@gmail.com> wrote:
>
> From my point of view, a better solution would be, when Go has an option
> (maybe via GOTRACEBACK env var) to trace C threads as well, for example by
> using the cgo traceback functions introduced in Go 1.7.

I don't see how that would work. The Go code has no idea what C
threads exist. I'm not aware of any portable API that would let it
determine that.
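This holds for portable code. On Linux specifically, a process can list its own threads in /proc/self/task and signal each one with the tgkill system call, which is roughly what the "signal all other threads" step in Martin's handler needs. A sketch under that Linux-only assumption; list_thread_ids and signal_other_threads are hypothetical names, and because opendir() may allocate memory this is not async-signal-safe as written (a crash handler would read the directory with the raw getdents64 system call instead).

```c
#include <dirent.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Linux-only: store up to max thread IDs of this process in out.
   Returns the number stored, or -1 if /proc/self/task is unreadable. */
int list_thread_ids(pid_t *out, int max)
{
    DIR *dir = opendir("/proc/self/task");
    if (dir == NULL)
        return -1;
    int n = 0;
    struct dirent *ent;
    while (n < max && (ent = readdir(dir)) != NULL) {
        if (ent->d_name[0] == '.')
            continue; /* skip "." and ".." */
        out[n++] = (pid_t)strtol(ent->d_name, NULL, 10);
    }
    closedir(dir);
    return n;
}

/* Send sig to every thread except the calling one.
   Returns how many threads were signalled, or -1 on error. */
int signal_other_threads(int sig)
{
    pid_t ids[256];
    int n = list_thread_ids(ids, 256);
    if (n < 0)
        return -1;
    pid_t self = (pid_t)syscall(SYS_gettid);
    pid_t tgid = getpid();
    int sent = 0;
    for (int i = 0; i < n; i++) {
        if (ids[i] == self)
            continue;
        if (syscall(SYS_tgkill, tgid, ids[i], sig) == 0)
            sent++;
    }
    return sent;
}
```

Passing signal 0 only checks that the threads exist; a real signal number is what would ask each thread to dump its own stack.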

> Also setting a file
> descriptor/handle as target for a dump should be allowed (maybe in addition
> to the dump on STDERR).

To me that sounds like something to be handled when invoking the Go
program, e.g., feed stderr into tee.

> In addition to the cgo traceback functions, there
> might be one or more functions for gathering additional information, which
> will be printed in the crash dump. A use case for that would be a list of
> loaded modules/libraries or environment variables.
> I can imagine that it's easier said than done, but that's what I would
> prefer.

Calling a user Go function while crashing definitely sounds
problematic. If we are crashing because we have run out of memory or
because the heap has been corrupted, there isn't anything reasonable
that the user function can do.

I'm impressed that your code works as well as it sounds, but in
general Go is focused on dumping Go state. I can imagine adding
another callback, along the lines of SetCgoTraceback, that would
provide a C function that would be invoked when crashing, immediately
before exiting. Perhaps in dieFromSignal or perhaps in runtime.exit.
Would that help your case? If so, please open an issue and we can
think about it. Of course such a function would be restricted to
async-signal-safe function calls, so I don't know if that would really
help.
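To make the async-signal-safe restriction concrete, here is a sketch of such a C hook for one of Martin's use cases, dumping environment variables, using nothing but write(). crash_hook_write_env is a hypothetical name, not an existing Go or cgo API; callers would pass environ (declared as extern char **environ), and reading it is only safe if no other thread is calling setenv() at the same time.

```c
#include <stddef.h>
#include <unistd.h>

/* Write all of s, retrying on partial writes; write() is
   async-signal-safe. Gives up silently on error. */
static void write_all(int fd, const char *s, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, s, len);
        if (n <= 0)
            return;
        s += n;
        len -= (size_t)n;
    }
}

/* Hand-rolled length so the sketch depends only on write(). */
static size_t safe_strlen(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}

/* Hypothetical crash hook: dump env to fd, one "KEY=value" per line. */
void crash_hook_write_env(int fd, char *const *env)
{
    write_all(fd, "environment:\n", 13);
    for (; env != NULL && *env != NULL; env++) {
        write_all(fd, *env, safe_strlen(*env));
        write_all(fd, "\n", 1);
    }
}
```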

Ian