Hi,
I've been using Go in our startup project for about 7 months now.
The majority of my day job programming so far has been in Erlang.
Comparing the two environments, the one thing I keep thinking about
is the handling of runtime panics. In Erlang, processes are first class
objects that can monitor each other and behave appropriately when
a monitored/linked process crashes (commit suicide, restart the crashed process,
log something, etc.). This facility is at the core of Erlang's particular suitability
for systems that 'cannot fail'.
In Go, since goroutines are invisible (except in stack dumps), no such thing
exists. This is a design decision, the apparent reason being that code should
not assume which goroutine it is running in. This decision results in several kinds
of failures that can happen (and do happen in my code, at least during development).
- Goroutine leaks are easy to produce and hard to find. Given that the
only way to actually 'see' goroutines is panicking, a recommended debugging
technique is to put panic("show me the stacks") somewhere into the program
during development.
- A panicking goroutine can bring down the whole process, unless panics are
recovered. However, recover is local to a specific goroutine and thus cannot
prevent crashes in all cases. This too is by design.
Following go tip, I have seen a discussion where the runtime will raise
unrecoverable panics in some situations.
I have three questions about this that I cannot answer myself:
- Is it true that panics are not supposed to happen, ever, in a production program?
I know that panic/recover are used by the json decoder to make its internal
error handling simpler. I'm not talking about those uses.
- Is it worthwhile to wrap certain goroutines with recover simply because
I suspect that the code that they run might be subtly wrong? net/http does this ;)
Since that code might create new goroutines that do not recover, I feel like
I'm risking a crash all the time.
- How do others deal with runtime panics in production? I watched Peter Bourgon's
talk at GopherCon where he spoke about production issues, but panics were
not mentioned at all!
Please help me find reasonable answers to those questions.
- I know that it is possible to make the runtime raise SIGABRT on panic with
GOTRACEBACK=crash, but that doesn't help very much given that gdb still
doesn't work very well for debugging Go programs, especially from a core dump
(it seems to be impossible to display the stack of a specific goroutine from a core dump,
at least for non-wizards).
> How do others deal with runtime panics in production? I watched Peter Bourgon's
> talk at GopherCon where he spoke about production issues, but panics were
> not mentioned at all!
In the general case, production code should never use panic to
indicate anything other than an unrecoverable error.
You're misreading the question. That's kind of my fault because
I didn't phrase it well enough.
Let's face it: panics do happen, also in production code. In Go, any unrecovered panic
(be it from a call to the built-in function panic, or e.g. a nil pointer dereference) will
kill the operating system process. What I'm asking about would be tips/experiences
around handling that case. I could be asking this question on a sysadmin mailing
list, it's not specific to Go. I'm asking on golang-nuts because Go does some things
that e.g. C programs don't do:
How do you handle that in production? This is mostly about tooling around
Go programs...
With Erlang, when a lightweight process (= goroutine) crashes, a detailed crash
report is sent to the VM-wide error logger. The report includes the stacktrace, the
process state (for OTP servers) and some memory statistics. The error logger
can be extended to send this somewhere else.
net/http does this too. In Go 1.3, it even includes a way to set the log output
for accept error messages and handler panic dumps.