Recover considered harmful


Sokolov Yura

Apr 24, 2017, 5:02:55 AM
to golang-nuts
Good day, people.

The title is a bit controversial :-)

I want to ask:
- How useful is `recover` for you?
- Don't you think it is a bit "dangerous"?

I mean: a panic usually means a programmer error, so if it happens, the
program is behaving incorrectly, and there is always a chance of serious
state corruption. So, is there any reason to recover at all?

Also, the presence of `recover` complicates the implementation of `defer`.
I believe `defer` could be optimized much more aggressively in the absence
of `recover` (i.e. if the program always exited on panic).

I could be mistaken.

Yura.

Christian von Pentz

Apr 24, 2017, 6:42:01 AM
to golan...@googlegroups.com
On 04/24/2017 11:02 AM, Sokolov Yura wrote:
> I mean: a panic usually means a programmer error, so if it happens, the
> program is behaving incorrectly, and there is always a chance of serious
> state corruption. So, is there any reason to recover at all?

I encountered many cases of panics when using external tools/libraries
that were completely "fine" to recover from. magicmime was one such
package: it had a few "hiccups" when used in a multi-threaded
environment, mostly due to the underlying libmagic. That said, it was
very easy and convenient to recover from, so yes, I would say recover is
a perfectly valid strategy sometimes.

Kevin Conway

Apr 24, 2017, 7:07:06 AM
to Christian von Pentz, golan...@googlegroups.com
I'd say that recover() is not a problem but, instead, a symptom of panic() being available to developers. I'd flip the title and say panic() should be considered harmful. To quote from https://blog.golang.org/defer-panic-and-recover :
> The process continues up the stack until all functions in the current goroutine have returned, at which point the program crashes

Any code that invokes panic is very clearly stating that an error has occurred that is completely unrecoverable and the _only_ choice of action that could possibly be taken is to end the program. The recover() builtin must exist to account for the fact that _all_ uses of panic in user space are, in fact, recoverable errors.

As someone developing in Go, it is infuriating when external libraries (whether 3rd party or std lib) make decisions about when my program should stop. Code-related bugs, such as nil pointer dereferences or invalid interface conversions, should result in a process failure just like a segfault in any other runtime. However, some library using the same process-ending mechanism to let me know that it doesn't like the format of my string input is unacceptable.

--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Ian Davis

Apr 24, 2017, 7:35:39 AM
to golan...@googlegroups.com
On Mon, 24 Apr 2017, at 12:06 PM, Kevin Conway wrote:
I'd say that recover() is not a problem but, instead, a symptom of panic() being available to developers. I'd flip the title and say panic() should be considered harmful. To quote from https://blog.golang.org/defer-panic-and-recover :
> The process continues up the stack until all functions in the current goroutine have returned, at which point the program crashes


The standard library uses panic in a couple of places to exit from deeply nested function calls, e.g. https://github.com/golang/go/blob/master/src/encoding/json/decode.go#L167


Ian



Jan Mercl

Apr 24, 2017, 7:40:12 AM
to Kevin Conway, golan...@googlegroups.com

On Mon, Apr 24, 2017 at 1:06 PM Kevin Conway <kevinjac...@gmail.com> wrote:

> Any code that invokes panic is very clearly stating that an error has occurred that is completely unrecoverable and the _only_ choice of action that could possibly be taken is to end the program.

It's sometimes a perfectly valid and quite reasonable approach to defer a recover() in an API function and panic(forWhateverReason) somewhere down the call chain. A recursive descent parser may get much simpler and easier to code, for example.

Notice that real unrecoverable errors are not subject to defer/recover() at all.

--

-j

Sokolov Yura

Apr 24, 2017, 8:24:00 AM
to golang-nuts, kevinjac...@gmail.com

> Notice that real unrecoverable errors are not subject to defer/recover() at all.

If so, then how should I raise an unrecoverable error, if I really know that it is unrecoverable?
Something like a C-style assert(): "guy, something has gone completely wrong, and it is
much better to stop functioning than to corrupt your data further".

> It's sometimes a perfectly valid and quite reasonable approach to defer a recover() in an API function and panic(forWhateverReason) somewhere down the call chain.
> A recursive descent parser may get much simpler and easier to code, for example.

I don't agree. I call it "abuse". In the absence of other convenient mechanisms, panic is abused to unwind the stack quickly (up to the recover).
I could be mistaken.

Kevin Conway

Apr 24, 2017, 8:40:36 AM
to Sokolov Yura, golang-nuts
If so, then how should I raise unrecoverable error, if I really know that it is unrecoverable?

I don't believe you can ever know whether an error in your library is truly unrecoverable by the process executing the code. As someone writing and operating the process, I'd expect a library to provide me all the tools necessary to make the right decision (such as custom error references or types) but never to make the decision for me.

I've yet to find a panic that would not be better served as a returned error.

Axel Wagner

Apr 24, 2017, 8:41:07 AM
to Sokolov Yura, golang-nuts, kevinjac...@gmail.com
My 2¢:
1. Panic if an API is clearly used wrongly. If a dev chose not to read the docs for this one function and ignored how it's supposed to be called, then what else have they not read the docs of? If you can detect that a program is incorrect, failing loudly seems the right thing to do.
2. Do not panic if an API is used correctly; this includes failing syscalls that you'd expect to succeed when the API is used correctly - your expectations might be wrong. Return an error on non-code-related problems.
3. Don't recover, pretty much universally. Even using it as a control-flow mechanism seems broken to me; it would hide actual programming errors that *should* crash.
4. If you are using a library that panics and you dislike that, I see two possible root causes: a) you are using a library that is badly coded (according to 2), or b) your program is buggy. In either case, the correct solution is not to paper over the bug, but to complain loudly so it gets fixed.
5. Be prepared for your stuff crashing, whether from a dev-induced panic or a runtime-induced panic.

And as a preventative measure: I say this as a person who was on call while a large service, written in a memory-safe language, crashed globally and took our service out. I know it's hard to be prepared for these things and to recover from them, but I still believe that crashing is the right thing to do. You cannot prevent crashes, even globally synchronized ones, from happening, because programmers are just humans, humans are fallible, and stuff happens. You need to be prepared to deal with human failures.


Sokolov Yura

Apr 24, 2017, 9:06:07 AM
to golang-nuts, funny....@gmail.com
Fully agree with Axel Wagner.

- I may make a mistake in my library, i.e. my code finds that some invariant is broken.
  If the library is a "shared state manager" (for example, an in-memory db, or an on-disk db),
  then I clearly have to stop the whole process instead of continuing to corrupt data
  (same as Axel's example).
- I may give the programmer a "low-level" interface with access to internals (for performance).
  If the programmer uses it in a wrong way, and my library's code detects that (probably too late
  to return an error, by finding broken invariants), then it is clearly better to stop
  functioning.

In both cases I need a way to stop the process, and `panic` is the clearest way to do that...
if no one calls `recover`.

On Monday, April 24, 2017 at 15:40:36 UTC+3, Kevin Conway wrote:

Rob Pike

Apr 24, 2017, 9:09:56 AM
to Axel Wagner, Sokolov Yura, golang-nuts, kevinjac...@gmail.com
Your point 3 misses an important practical detail. Packages that use recover internally call panic with an identifiable type and the recover code checks whether the type is the expected one and, if not, panics again, thus behaving like any other unexpected problem.

See encoding/gob/error.go for an example.

More generally, recover is excellent for isolation of errors in multi-client servers.

Even more generally, blanket statements about what to do or not do with the features of a programming language are too often taken as strict rules rather than thoughtful guidelines. "Don't use panic or recover" is an example. Panic and recover are the perfect tools for some problems, and prohibiting them outright eliminates some powerful designs.

-rob

Axel Wagner

Apr 24, 2017, 9:13:06 AM
to Rob Pike, Sokolov Yura, golang-nuts, kevinjac...@gmail.com
True, I genuinely missed that possibility (I always forget that panic is perfectly well-behaved when re-panicking).

Sokolov Yura

Apr 24, 2017, 9:19:50 AM
to golang-nuts, axel.wa...@googlemail.com, funny....@gmail.com, kevinjac...@gmail.com
On Monday, April 24, 2017 at 16:09:56 UTC+3, Rob 'Commander' Pike wrote:
Your point 3 misses an important practical detail. Packages that use recover internally call panic with an identifiable type and the recover code checks whether the type is the expected one and, if not, panics again, thus behaving like any other unexpected problem.

See encoding/gob/error.go for an example.

More generally, recover is excellent for isolation of errors in multi-client servers.

Even more generally, blanket statements about what to do or not do with the features of a programming language are too often taken as strict rules rather than thoughtful guidelines. "Don't use panic or recover" is an example. Panic and recover are the perfect tools for some problems, and prohibiting them outright eliminates some powerful designs.

-rob

Rob, you just described panic as a generic exception mechanism.
Then why does Go have no convenient exceptions?

And what about unrecoverable panics? A C-style `assert`?
`net/http` currently recovers from all panics, and that is a clear design flaw.
There should be a distinction between "safe to recover" errors/panics and
"have to stop execution" panics/asserts.


Sam Whited

Apr 24, 2017, 9:42:32 AM
to Kevin Conway, Sokolov Yura, golang-nuts
On Mon, Apr 24, 2017 at 7:39 AM, Kevin Conway
<kevinjac...@gmail.com> wrote:
> I've yet to find a panic that would not be better served as a returned
> error.

While I generally agree with you that panics in libraries should probably
not bubble up to anything outside of the library, the exception is
security issues. If for some reason I can't get a handle to urandom(4),
I'd probably rather crash the program than risk having another
developer ignore that error and generate keys with a zeroed IV (or
whatever the case may be).

—Sam

Jan Mercl

Apr 24, 2017, 9:53:48 AM
to Sokolov Yura, golang-nuts
On Mon, Apr 24, 2017 at 3:19 PM Sokolov Yura <funny....@gmail.com> wrote:

> And what about unrecoverable panic? C-style `assert`?

fmt.Fprintf(os.Stderr, "it's full of stars!\n")
os.Exit(1)

--

-j

Юрий Соколов

Apr 24, 2017, 10:30:41 AM
to Jan Mercl, golang-nuts
:-)
Does os.Exit(1) print a backtrace of all goroutines the way an unrecovered panic does?

Sam Whited

Apr 24, 2017, 10:52:45 AM
to Юрий Соколов, Jan Mercl, golang-nuts
On Mon, Apr 24, 2017 at 9:30 AM, Юрий Соколов <funny....@gmail.com> wrote:
> :-)
> Does os.Exit(1) print a backtrace of all goroutines the way an unrecovered
> panic does?

import "runtime/debug"
fmt.Fprintf(os.Stderr, "it's full of stars!\n")
debug.PrintStack()
os.Exit(1)

roger peppe

Apr 24, 2017, 1:14:44 PM
to Rob Pike, Axel Wagner, Sokolov Yura, golang-nuts, kevinjac...@gmail.com
On 24 April 2017 at 14:09, Rob Pike <r...@golang.org> wrote:
> Your point 3 misses an important practical detail. Packages that use recover
> internally call panic with an identifiable type and the recover code checks
> whether the type is the expected one and, if not, panics again, thus
> behaving like any other unexpected problem.
>
> See encoding/gob/error.go for an example.
>
> More generally, recover is excellent for isolation of errors in multi-client
> servers.

I've certainly used both these techniques in the past, but I'm no longer
entirely sure whether this really is "excellent". It's so easy to have
code that relies on non-deferred cleanup of state and that invokes some
code which happens to panic (whether with a known type or not), leading
to hard-to-debug problems that would have been considerably easier to
diagnose if the whole thing had just crashed.

In general, the only time I'd now consider using the "panic with
identifiable type" technique is in recursive descent parsers and the
like, where the domain is severely constrained and the convenience of
being able to use the results of called functions directly is great.

That said, my most recent use was to exit early from sort.Search when an
unexpected error occurred while talking to the network. Useful, but arguably
a bit dirty.

rog.

Jesper Louis Andersen

Apr 24, 2017, 2:47:12 PM
to Rob Pike, Axel Wagner, Sokolov Yura, golang-nuts, kevinjac...@gmail.com
On Mon, Apr 24, 2017 at 3:09 PM Rob Pike <r...@golang.org> wrote:
Your point 3 misses an important practical detail. Packages that use recover internally call panic with an identifiable type and the recover code checks whether the type is the expected one and, if not, panics again, thus behaving like any other unexpected problem.


To extend on this:

JSON/Gob decoding is an abstraction over which you operate. Because it is an abstraction, the internal implementation can be substituted by any other implementation as long as the semantics are kept the same. Liskov and Wing formulated this property in the presence of a subtyping system, but there are variants of this formulation and this is one of them.

In particular, we can replace a slow implementation with a faster one. Even better, we can exploit the abstraction and cheat in the implementation, violating our normally defined rules in the programming language or coding style, as long as we preserve the abstraction and our cheat doesn't leak out (i.e., we don't get caught cheating). Using panic()/recover() internally constitutes what is called a "benign effect" in the sense that no one apart from the implementation itself knows about the effect.

Rob's other example with isolation of, say, HTTP requests and recover()'ing on errors in one request holds an important subtlety. A program may not first mess up the heap in a way which is observable by some other goroutine, then panic() and then recover() while keeping the heap in a messed up state. For the abstraction to hold, you must assume a (global) store/heap and make sure your panic()'ing doesn't alter said heap in an observable way that could affect other goroutines.

The notion is related to program proofs as well. Say we are trying to prove properties about unsigned integers. An unsigned integer of, say, 64 bits can be seen as an array of 64 bool values: we can define a 1-bit full adder, combine those into a carry-lookahead adder, and then define an ALU. However, such a structure is hard to prove properties about. We could also define natural numbers as Peano numbers:

data UInt : Type where
  Zero : UInt
  Succ : UInt -> UInt

which is to say that Zero is a representation of zero (0) and Succ is a representation of a successor to some other UInt. So 1 = (Succ Zero) and 3 = (Succ (Succ (Succ Zero))).

The kicker is that a lot of proofs are easier on the Peano numbers than on a system where we have defined an ALU based on a bool array. So we prove once and for all that the two systems are equivalent, except perhaps for runtime efficiency[0]. And now we can pick either representation for our proofs, whichever turns out to be easiest. Proving something like X + Y = Y + X (commutativity) tends to be easier on Peano numbers, but proofs of properties about XOR or other bitwise operations tend to be easier on the bool array.

Due to the abstraction, we can freely replace our Peano variant or our bool array with a native 64-bit uint. This obtains the full speed of the CPU while still allowing us the necessary structure to prove properties about programs. In essence, Rob is alluding to the same strategy w.r.t. the use of panic()/recover(): once we've proven that the variants are semantically equivalent, we can replace one based on deeply nested error returns with a variant based on recover().

In short: you are allowed to use panic()/recover() in a program as long as you can guarantee a local benign effect which cannot be observed from the outside.

The difference compared to a "traditional" exception flow is that in a language such as Java you allow the exception to cross an API boundary. In turn, the exception is part of the expected return type and signals an error. The problem with that solution is that exception flow doesn't follow the usual scope rules of a program's control flow, and this can lead to subtle errors, especially if concurrency and state are involved[1].

[0] The astute reader may note that in reality, you want to prove the equivalence on natural numbers mod 2^64 for this to work out. In particular, one needs a proper formalization of overflow and underflow. And sometimes one also needs to mention the changes to flag registers in order to capture how a real CPU operates.

[1] Erlang prefers the "Go way" as well. The rule is that exceptions should not cross module boundaries but should be converted into normal values which are passed around.

Dan Kortschak

Apr 24, 2017, 7:32:15 PM
to Sam Whited, Kevin Conway, Sokolov Yura, golang-nuts
We (gonum) would extend the security exception to include scientific
code; there are far too many peer-reviewed works that depend on code
that will blithely continue after an error condition that should stop
execution or log failure. These can and do end up contributing to the
costs of (mis)development of pharmaceutical and other health-related
technologies, and worse, to failures in patient health outcomes
(probably in other fields too, but those are outside my expertise).

For horror, see this talk https://youtu.be/7gYIs7uYbMo?t=523 (the timestamp
points to the part of the talk where he discusses the software issues that
ultimately resulted in drug trials based on completely spurious data).

Sam Whited

Apr 24, 2017, 10:31:31 PM
to Dan Kortschak, Kevin Conway, Sokolov Yura, golang-nuts
On Mon, Apr 24, 2017 at 6:31 PM, Dan Kortschak
<dan.ko...@adelaide.edu.au> wrote:
> We (gonum) would extend the security exception to include scientific
> code; there are far too many peer reviewed works that depend on code
> that will blithely continue after an error condition that should stop
> execution or log failure.

Also a great example! The main takeaway here is that we should always
design for failure, and sometimes the primary failure mode should be
"abort at all costs and let the application developer know that
something catastrophic happened which could lead to worse things
happening in the future".

—Sam

Kevin Conway

Apr 24, 2017, 11:06:54 PM
to Sam Whited, Dan Kortschak, Sokolov Yura, golang-nuts
In this example we're considering panic as a mechanism for preventing otherwise avoidable code bugs. What happens when the same code begins silencing panics and continuing on? Do we add a new level of panic that overcomes the normal recovery method? The fundamental assertion being made by panic advocates is that you know better than I do when my program should end, and you want some mechanism to enforce that opinion on me.

I'll argue that sticking to idiomatic errors returned by function calls, combined with static analysis tools like errcheck, is sufficient to cover all scenarios where panic might otherwise be used to signal an error state. If you want to use panic internally within an API, that's completely acceptable so long as the panic is never exposed beyond the API boundary. To quote the golang blog on the subject:

The convention in the Go libraries is that even when a package uses panic internally, its external API still presents explicit error return values.


Henry

Apr 25, 2017, 12:22:50 AM
to golang-nuts
I do think that panic should be avoided whenever possible. I had a third-party library that panicked and crashed my application during a production run. If it had returned errors instead, I could have anticipated the problem and handled the situation with a bit more grace. The problem with panic is that it isn't obvious from the method signature that there is a possible alternate path. The author does not always document it. Thus, this hidden path is often unhandled and crashes the application. This may be acceptable during the development phase, but not in a production run.

I am pretty sure panic has its uses, but at the moment there are still people who use panic for rare errors. An error is an error. If you can return an error, you should return an error, even if it is rare or nearly impossible. The fact that panic is used for rare errors makes it even more dangerous, because these represent the corner cases that are often unanticipated by developers.

Nowadays I just wrap any third-party library and use recover in case any of them suddenly goes into shock and panics.

Axel Wagner

Apr 25, 2017, 2:20:43 AM
to Kevin Conway, Sam Whited, Dan Kortschak, Sokolov Yura, golang-nuts
On Tue, Apr 25, 2017 at 5:06 AM, Kevin Conway <kevinjac...@gmail.com> wrote:
In this example we're considering panic as a mechanism of preventing otherwise avoidable code bugs. What happens when the same code begins silencing panics and continuing on? Do we add a new level of panic that overcomes the normal recovery method? The fundamental assertion being made by panic advocates is that you know better than I when my program should end and you want some mechanism to enforce that opinion on me.

I'll argue that sticking to idiomatic errors returned by function calls combined with static analysis tools, like errcheck, are sufficient in solving for all scenarios where panic might otherwise be used to signal an error state.

Array OOB? Nil dereference? Division by zero? Why not add an error value to every slice index and arithmetic operation, since the programmer knows best when to abort their program? This is obviously absurd, but it does strike home: error returns change the ergonomics of an API, but more than that, whatever your theory of panics and errors is, it must also account for why, apparently, some panics are okay after all.

There is nothing fundamentally different between a panic created by the runtime/language and a panic created by a library author. In that way, at least "if you detect a clear code bug, panic" is consistent.

 
If you want to use panic internally within an API that's completely acceptable so long as that panic is never exposed beyond the API boundary. To quote the golang blog on the subject:

The convention in the Go libraries is that even when a package uses panic internally, its external API still presents explicit error return values.


Axel Wagner

Apr 25, 2017, 2:35:54 AM
to Henry, golang-nuts
On Tue, Apr 25, 2017 at 6:22 AM, Henry <henry.ad...@gmail.com> wrote:
I do think that panic should be avoided whenever possible. I had a third party library that panicked and crashed my application during the production run. If it were to return errors instead, I could have anticipated the problem and handled the situation with a bit more grace. The problem with panic is that it isn't obvious from the method signature that there is a possible alternate path. The author does not always document it. Thus, this hidden path is often unhandled and crashes the application. This may be acceptable during development phase, but not in the production run.

I fundamentally disagree. It is especially important during production runs, because then it's about production traffic, production data, and production users, and not ignoring bugs during that time is especially important.

It's not apparent when things can panic, but the basic rule of thumb is that any non-trivial piece of code can panic, because any non-trivial piece of code can have bugs. That third-party library could just as well have divided by zero. And apparently all that returning an error would have done is give you a false sense of security that this could never fail in an unpredictable way. But bugs happen, and bugs are unpredictable.

I understand that it's frustrating to have software crash in production, I really do (see above). But the way to fix this is not to ignore the bugs you know of and plough on, but to build layers of defense around failure, be it tests (I'd argue that any panic that isn't a very clear and serious code bug should be trivial to surface in tests), load-shedding, quick restarts, static analysis, or formal proofs (whatever level of security you want), because that also helps against the bugs you don't know of.
 
I am pretty sure panic has its uses, but at the moment there are still people who use panic as a rare error. An error is an error. If you can return an error, you should return an error, even if it is rare or near impossible to happen. The fact that panic is used as a rare error makes it even dangerous because they represent the corner cases that are often unanticipated by the developers.

There is an important difference between "a rare error" and "a code bug". A rare error is when opening a file fails on a full moon; a code bug is when you corrupt state or do obviously incorrect things. The rarity of an error should not determine its handling.

Nowadays I just wrap any third party library and use recover in case if any of them suddenly goes into shock and panic.



Sam Whited

Apr 25, 2017, 10:01:13 AM
to Kevin Conway, Dan Kortschak, Sokolov Yura, golang-nuts
On Mon, Apr 24, 2017 at 10:06 PM, Kevin Conway
<kevinjac...@gmail.com> wrote:
> In this example we're considering panic as a mechanism of preventing
> otherwise avoidable code bugs. What happens when the same code begins
> silencing panics and continuing on?

Then the application author has explicitly said "I don't care what
happens, just keep running" and it becomes their problem.

The "default" action if an application developer does nothing with
idiomatic errors is that those errors are ignored (though of course they
can choose to handle them); the "default" action with panics is that
the program crashes (though they can choose to suppress them). Sometimes,
although perhaps rarely in a library, that's what you want.

—Sam

Micky

Apr 25, 2017, 10:37:55 AM
to Sam Whited, golang-nuts
As Rob Pike said, if you don't let the real reason behind the panic get lost, then recovering is the most logical thing to do in times of dire need!

For instance, the Caddy server routinely uses *recover* to recover from panics during ServeHTTP, but responsibly logs them, because it has to provide ultimate reliability while depending upon a plethora of third-party middleware!



Dan Kortschak

unread,
Apr 25, 2017, 6:18:46 PM4/25/17
to Kevin Conway, Sam Whited, Sokolov Yura, golang-nuts
On Tue, 2017-04-25 at 03:06 +0000, Kevin Conway wrote:
> The convention in the Go libraries is that even when a package uses
> panic internally, its external API still presents explicit error
> return values.

reflect?

All rules are wrong.

Dave Cheney

unread,
Apr 25, 2017, 6:32:56 PM4/25/17
to golang-nuts
Aside from arguments about using panic/recover to simulate longjmp inside recursive descent parsers, I can think of no valid reason why recover should be used in production code.

Imo, the arguments about wrapping all goroutines in a catch-all recover are solving the wrong problem.

- if third party code you use panics regularly, maybe don't use it, or at least validate inputs passed to it to avoid provoking it.
- if your program needs to be available, then rather than trying to diagnose the program's state internally, use something like daemontools, upstart, or systemd to restart it if it crashes. Don't forget there are plenty of other ways to exit a Go program abruptly; os.Exit or log.Fatal are two that come to mind.
- if your program has to be highly available, then abandon the falsehood that a single machine can meet these requirements and invest your engineering effort in making your application run across multiple machines.

IMO there is no justification for using recover as a general safety net in production Go code.

Chris G

unread,
Apr 25, 2017, 8:57:58 PM4/25/17
to golang-nuts
I think those are all excellent things to do. They do not preclude the use of recovering from a panic to assist (emphasis on assist - it is certainly no silver bullet) in achieving fault tolerance.

Assuming a web service that needs to be highly available, crashing the entire process due to one misbehaved goroutine is irresponsible.  There can be thousands of other active requests in flight that could fail gracefully as well, or succeed at their task.

In this scenario, I believe a well behaved program should 
  • clearly log all information about the fault
  • remove itself from a load balancer
  • alert some monitoring program that it has experienced critical errors
  • depending on widespread severity, have a monitoring program alert a human to inspect it

Dave Cheney

unread,
Apr 25, 2017, 9:07:33 PM4/25/17
to golang-nuts


On Wednesday, 26 April 2017 10:57:58 UTC+10, Chris G wrote:
I think those are all excellent things to do. They do not preclude the use of recovering from a panic to assist (emphasis on assist - it is certainly no silver bullet) in achieving fault tolerance.

Assuming a web service that needs to be highly available, crashing the entire process due to one misbehaved goroutine is irresponsible.  There can be thousands of other active requests in flight that could fail gracefully as well, or succeed at their task.

In this scenario, I believe a well behaved program should 
  • clearly log all information about the fault
panic does that
 
  • remove itself from a load balancer
your load balancer should detect that; it shouldn't wait to be told that a backend has failed.
 
  • alert some monitoring program that it has experienced critical errors
The monitoring program should detect that the process exited; not the other way around.
 
  • depending on widespread severity, have a monitoring program alert a human to inspect it

Same; relying on a malfunctioning program to report its failure is like asking a sick human to perform their own surgery. 

Chris G

unread,
Apr 25, 2017, 10:37:10 PM4/25/17
to golang-nuts


On Tuesday, April 25, 2017 at 6:07:33 PM UTC-7, Dave Cheney wrote:


On Wednesday, 26 April 2017 10:57:58 UTC+10, Chris G wrote:
I think those are all excellent things to do. They do not preclude the use of recovering from a panic to assist (emphasis on assist - it is certainly no silver bullet) in achieving fault tolerance.

Assuming a web service that needs to be highly available, crashing the entire process due to one misbehaved goroutine is irresponsible.  There can be thousands of other active requests in flight that could fail gracefully as well, or succeed at their task.

In this scenario, I believe a well behaved program should 
  • clearly log all information about the fault
panic does that

Yes, and then crashes the program. In the scenario I described, thousands of other requests in flight meet an abrupt end. That could be incredibly costly, even if it's been planned for.
 
 
  • remove itself from a load balancer
your load balancer should detect that; it shouldn't wait to be told that a backend has failed.
 
Your load balancer should detect a crashed backend, yes. My point is that you shouldn't crash a live backend needlessly. All load balancers that I'm aware of rely on heartbeating the backend; flipping a healthcheck endpoint to return an unhealthy state will be detected as quickly as if it had crashed.

The intent is to allow in flight requests to finish, while not allowing more in.
 
 
  • alert some monitoring program that it has experienced critical errors
The monitoring program should detect that the process exited; not the other way around.

Fair, poor wording on my part.
 
 
  • depending on widespread severity, have a monitoring program alert a human to inspect it

Same; relying on a malfunctioning program to report its failure is like asking a sick human to perform their own surgery. 
 
I was assuming that the monitoring program was some separate process or service (nagios, or some commercial provider). I'm sorry if there was miscommunication on that. I don't believe this analogy holds.

Dave Cheney

unread,
Apr 25, 2017, 10:52:25 PM4/25/17
to golang-nuts
> Yes, and then crashes the program. In the scenario I described, thousands of other requests in flight meet an abrupt end. That could be incredibly costly, even if it's been planned for

There are a host of other reasons that can take a server offline abruptly. It seems like an odd misallocation of resources to try to prevent one specific case - a goroutine panics due to a programming error or input validation failure - both of which are far better addressed with testing.

To try to postpone the exit of a program after a critical error to me implies a much more complex testing and validation process that has identified all the shared state in the program and verified that it is correct in the case that a panic is caught.

To me it seems simpler and more likely to have the root cause of the panic addressed to just let the program crash. The alternative, somehow firewalling the crash, and its effects on the internal state of your program, sounds unworkably optimistic.

Chris G

unread,
Apr 26, 2017, 12:24:23 AM4/26/17
to golang-nuts


On Tuesday, April 25, 2017 at 7:52:25 PM UTC-7, Dave Cheney wrote:
> Yes, and then crashes the program. In the scenario I described, thousands of other requests in flight meet an abrupt end. That could be incredibly costly, even if it's been planned for

There are a host of other reasons that can take a server offline abruptly. It seems like an odd misallocation of resources to try to prevent one specific case - a goroutine panics due to a programming error or input validation failure - both of which are far better addressed with testing.

There's a cost benefit analysis to be done, for sure, but I don't always believe it to be a misallocation of resources.  I don't believe it's costly for every program, and for programs where it's important, I don't believe it to always be a hard problem to accomplish.  To your point, for a great many programs, the effort probably isn't worth the reward.
 

To try to postpone the exit of a program after a critical error to me implies a much more complex testing and validation process that has identified all the shared state in the program and verified that it is correct in the case that a panic is caught.

Not always applicable, but there are some relatively easy ways of coping with that:
- Don't have shared state to begin with (for a large number of programs, this isn't that hard! Look at how far PHP has gotten, for example)
- Don't have mutable shared state
- Copy on write, and only publish immutable shared state

Those properties can also make testing and validation much easier, I should note. And with those properties, I don't think it's necessarily hard to isolate a particular lifecycle, for example, an http request. 

Often it can just be an HTTP handler that defers a recover and calls the real handler. In the case of publishing an immutable object graph to shared state, only publish it once it's verified. If a panic occurs in the publishing goroutine, published state remains in a known-good condition.
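[Editor's note: a sketch of the publish-only-when-verified idea above. The `config` type and `publish` helper are hypothetical; the point is that a recovered panic during construction leaves the previously published value untouched.]

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// config stands in for an immutable object graph shared between goroutines.
type config struct{ limit int }

// current holds a *config; readers only ever observe fully built values.
var current atomic.Value

// publish builds and validates a new config. Only on success is it made
// visible; if anything panics along the way, the recover converts the
// panic into an error and the previously published value stays in place.
func publish(limit int) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("publish failed: %v", r)
		}
	}()
	if limit < 0 {
		panic("negative limit") // stand-in for a deep validation failure
	}
	current.Store(&config{limit: limit})
	return nil
}

func main() {
	_ = publish(10)
	if err := publish(-1); err != nil {
		fmt.Println(err)
	}
	// The bad publish never became visible; readers still see limit 10.
	fmt.Println("limit:", current.Load().(*config).limit) // limit: 10
}
```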

Of course, it's very possible to imagine a program that is complex enough where shared state isn't simple to manage. I would also argue, independently on if it's worth any effort to make a single lifecycle crash-safe, that as a program reaches that level of complexity, it should be questioned if all of that state belongs in the same process at all.   Split it up and get process isolation from the operating system (and scale that up to multiple machines as well, to your third point).

To me it seems simpler and more likely to have the root cause of the panic addressed to just let the program crash. The alternative, somehow firewalling the crash, and its effects on the internal state of your program, sounds unworkably optimistic.


I'm by no means advocating for leaving a fault in a program. I don't believe these are alternatives at all! Fix your program!  But I certainly don't think resiliency within a process space is always unworkable.  Perhaps optimistic, I'll give you that :)

Kevin Conway

unread,
Apr 26, 2017, 12:59:38 AM4/26/17
to Dave Cheney, golang-nuts
> To try to postpone the exit of a program after a critical error to me implies a much more complex testing and validation process that has identified all the shared state in the program and verified that it is correct in the case that a panic is caught

There's an implicit argument here that the panic is, in fact, the result of a critical error. This is my primary contention with the general use of panic(). There is no guarantee for me, as the consumer of a panicking library, that the panic in question is truly related to an unrecoverable exception state that can only be resolved by a process exit.

I posit the question one last time: how can the author of shared code understand, in sufficient detail, all the possible ways that the code could be leveraged, such that she or he could determine, objectively, that any given process must stop when a particular error state is encountered?

> There are a host of other reasons that can take a server offline abruptly. It seems like a odd misallocation of resources to try to prevent one specific case.

This, generally, is the argument that "if you can't stop all exceptions then why bother to stop any?". Contrary to exception states such as "my cloud provider terminated my instance abruptly" or "my data center lost power", panic() uses are entirely defined by developers and not strictly related to unrecoverable exception states. The process exit in the case of a panic is entirely preventable, unlike a true, systemic failure. To say that panic leads to process termination and, therefore, panic is equivalent to all process termination events is fallacious. I stand firm that only the process developer knows when the process should exit.

To put it more succinctly: The idea that your exception state should stop my process is, well, that's just, like, your opinion, man.



Юрий Соколов

unread,
Apr 26, 2017, 1:30:36 AM4/26/17
to Chris G, golang-nuts
It looks like there are two points of view:

- optimists, who never build mutable shared state themselves and hope that the libraries they use don't either,
- and those who know that mutable shared state usually exists.

In the absence of mutable shared state it is perfectly valid to recover, imitating what PHP or Erlang does. But PHP and Erlang have real "process" isolation, so they can afford not to recover, because concurrent requests are genuinely "not affected".

Erlang's philosophy is "let it crash", because there is always a "supervisor" in a separate isolated "process" that will respawn the "worker process".

PHP lets the whole process crash if it hits a C-level "assert", because a process doesn't serve more than one request at a time, and there is always a supervisor that will spawn a new process to serve requests.

But in Go you have no such support from the runtime. So if you want to `recover`, you have to inspect all third-party libraries to check that they don't have mutable shared state. But if you did inspect them, why didn't you prevent them from panicking? Why did you pass input that leads to a panic you "allowed to recover" from? Didn't you test your program enough?

Most languages that stick with exceptions usually have two kinds:
- "exceptions for regular errors", i.e. wrong input, wrong system state, or "control flow",
- and "fatal exceptions".
The first kind is safe to catch and recover from.
The second kind is always documented as "you'd better crash; don't recover from it". They usually have a separate inheritance root, so a regular 'catch' doesn't catch them (some exceptions are even checked by the runtime so that they cannot be caught). And no one in their right mind will catch those exceptions.

Go says:
- the first kind is just an error. Return the error, analyze the error, re-return the error, and you will be happy, and your hair will shine.
- the second kind is... yeah, it should be panic. You'd better not recover. But you know what... I sometimes use it for control flow... and I do recover in 'net/http', because I pretend to be the new PHP... So you have no "blessed way" to signal a "fatal error". Go ahead, and roll your own "super-panic" with "debug.PrintStack(); os.Exit(1)".

I'm sad falcon.

PS. To be fair, "fatal exceptions" usually still allow `finally` blocks to run, so my point about "optimizing defer in the absence of recover" is not perfectly valid.

On Apr 26, 2017 at 7:24 AM, "Chris G" <ch...@guiney.net> wrote:

Bakul Shah

unread,
Apr 26, 2017, 1:35:44 AM4/26/17
to golang-nuts
Recover/panic need not be used only in case of a "critical" error. Just as with setjmp/longjmp, there are other useful patterns. For example, a user may ask an interpreter to abandon its current computation by typing ^C. This would be handled by a longjmp/panic() to regain control at the REPL level.

There are actually at least three use cases:
1. Reduce the "semantic clutter" of having to check error return at every level just because a very deeply nested function may fail
2. Regain control as above in case of cancellation. 
3. Indicate a critical error.

For 1, (IMHO) the best mechanism was Pascal's non-local goto. It *only* returned control higher up the stack and to a lexically enclosing function - this could be checked at compile time. In Go, panic/recover are analogs of longjmp/setjmp, and compile-time checking is not possible to ensure that a panic doesn't escape package scope. Also, Go allows lexical nesting of unnamed functions but not named ones, so one would not write, e.g., a parse() function as one giant function with multiple sub-functions, one per parse rule. And concurrency complicates things.
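[Editor's note: use case 1 above - the longjmp-like unwind of a deeply nested computation - can be sketched as follows. The `parseError` type and `Parse`/`parseExpr` names are hypothetical; the point is that the panic is a private implementation detail converted back into an error at the package boundary.]

```go
package main

import "fmt"

// parseError is a private sentinel type; panicking with it is the
// longjmp-like escape described above, used only inside this package.
type parseError struct{ msg string }

// Parse is the package boundary: the internal panic is converted back
// into an ordinary error, so callers never observe it.
func Parse(tokens []string) (err error) {
	defer func() {
		if r := recover(); r != nil {
			pe, ok := r.(parseError)
			if !ok {
				panic(r) // not ours: keep unwinding
			}
			err = fmt.Errorf("parse: %s", pe.msg)
		}
	}()
	parseExpr(tokens)
	return nil
}

// parseExpr stands in for deeply recursive descent; on a bad token it
// bails out of the whole recursion in one step.
func parseExpr(tokens []string) {
	for _, t := range tokens {
		if t == "?" {
			panic(parseError{msg: "unexpected token " + t})
		}
	}
}

func main() {
	fmt.Println(Parse([]string{"a", "+", "b"})) // <nil>
	fmt.Println(Parse([]string{"a", "?"}))      // parse: unexpected token ?
}
```

Re-panicking on foreign values is what keeps this from swallowing genuine bugs, addressing the objection that recover hides real failures.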


Юрий Соколов

unread,
Apr 26, 2017, 2:03:06 AM4/26/17
to Bakul Shah, golang-nuts
`panic/recover` are really bad names for setjmp/longjmp; `throw/catch` are much closer.

And C has `assert`. You may set a handler for SIGABRT, but then you have to know what you are doing. Usually it is set only for backtrace printing.

I mean: most languages have a clean separation between "wrong state or input", "fast control flow" and "fatal" errors.
C has "return code + errno" for the first, "setjmp/longjmp" for the second, and "assert" for the third.
Languages with exceptions have an exception hierarchy, with documented runtime support for "fatal exceptions".

Some languages choose process isolation for recovering from fatal errors (PHP, Erlang... .NET has domains, but it looks like .NET Core doesn't).

Go has "return error" for the first, abuses "panic/recover" for the second, and, in the absence of true process isolation, simply has no choice for "fatal error".

Go should have a choice for "fatal error". And since many people want to recover from "fatal errors" (this thread shows it clearly), Go should have true "process isolation".

But it is just a dream.

On Apr 26, 2017 at 8:35 AM, "Bakul Shah" <ba...@bitblocks.com> wrote:

Bakul Shah

unread,
Apr 26, 2017, 2:59:29 AM4/26/17
to Юрий Соколов, golang-nuts
The practical issue here is how to use Go effectively to handle various error/exceptional situations. For that you have to first think about and understand these scenarios and the best way to handle them. But that is not simple, and it is context-dependent!

Example: I may start up N threads in parallel to search for something and, when one finds it, all others are to be terminated. How do we do this? A different example: take a function for matrix inversion. It would fail if the determinant is zero, but whether this failure is an error or not depends on the context. For a certain application, if there is an alternative method that doesn't use matrix inversion but is much slower, a failed matrix inversion is not an error but a signal to try the slower method.

Another question is, in the case of exception handling a la panic/recover, how to close open file descriptors to avoid leaks or do other cleanup to maintain consistent state, and who does the cleanup, and when? In Go you can use not just error returns or panic/recover but also a chan to report things asynchronously. What makes the most sense in a given situation?

Then there are issues of whether to take heroic efforts to keep a program running (think of a robotics control program) or to bail at the first error (and let another shard handle requests). Or whether it is ok to not lose a single request, or what is an acceptable threshold in a given application. And how to get useful debugging data out of exceptional situations. "Recover considered harmful" doesn't quite do justice to this rich topic!


Henry

unread,
Apr 26, 2017, 3:20:25 AM4/26/17
to golang-nuts
The problem with panic is that it hides an alternate execution path. I think it is better to make things explicit and obvious.

Peter Herth

unread,
Apr 26, 2017, 4:55:50 AM4/26/17
to golang-nuts
On Wed, Apr 26, 2017 at 3:07 AM, Dave Cheney <da...@cheney.net> wrote:


On Wednesday, 26 April 2017 10:57:58 UTC+10, Chris G wrote:
I think those are all excellent things to do. They do not preclude the use of recovering from a panic to assist (emphasis on assist - it is certainly no silver bullet) in achieving fault tolerance.

Assuming a web service that needs to be highly available, crashing the entire process due to one misbehaved goroutine is irresponsible.  There can be thousands of other active requests in flight that could fail gracefully as well, or succeed at their task.

In this scenario, I believe a well behaved program should 
  • clearly log all information about the fault
panic does that

No, panic certainly does not do that. It prints the stack trace. A proper logger could add additional information about the program state at the point of the panic, which is not visible from the stack trace. It also might at least be reasonable to perform an auto-save before quitting.

Same; relying on a malfunctioning program to report its failure is like asking a sick human to perform their own surgery. 

What makes you think that a panic implies that the whole program is malfunctioning? A panic should certainly be taken seriously, and the computation in which it happened should be aborted. But if you think in a functional programming style, there are clear points in the call tree at which the recover could happen and the computation can be safely aborted without impacting the rest of the program. If you think of any multi-user software, at worst you kill the session for one user, but do not necessarily have to impact the other users.

Best regards,
Peter


Axel Wagner

unread,
Apr 26, 2017, 5:38:58 AM4/26/17
to Peter Herth, golang-nuts
On Wed, Apr 26, 2017 at 10:55 AM, Peter Herth <he...@peter-herth.de> wrote:


On Wed, Apr 26, 2017 at 3:07 AM, Dave Cheney <da...@cheney.net> wrote:


On Wednesday, 26 April 2017 10:57:58 UTC+10, Chris G wrote:
I think those are all excellent things to do. They do not preclude the use of recovering from a panic to assist (emphasis on assist - it is certainly no silver bullet) in achieving fault tolerance.

Assuming a web service that needs to be highly available, crashing the entire process due to one misbehaved goroutine is irresponsible.  There can be thousands of other active requests in flight that could fail gracefully as well, or succeed at their task.

In this scenario, I believe a well behaved program should 
  • clearly log all information about the fault
panic does that

No, panic certainly does not do that. It prints the stack trace. A proper logger could add additional information about the program state at the point of the panic, which is not visible from the stack trace. It also might at least be reasonable to perform an auto-save before quitting.

Same; relying on a malfunctioning program to report its failure is like asking a sick human to perform their own surgery. 

What makes you think that a panic implies that the whole program is malfunctioning?

But that is not the claim. The claim is that if you discover a condition which can uniquely be attributed to a code bug, you should always err on the side of safety and prefer bailing out to continuing with a known-bad program. It's not "as I see this bug, I know the rest of the program is broken too"; it's "as I see this bug, I cannot pretend that it can't be".
 
A panic should certainly taken seriously, and the computation in which it happened should be aborted. But if you think of a functional programming style

If you are thinking of that, then you are not thinking about Go. Go has shared state and mutable data. One of the major arguments here is that there is a level of state isolation which, from all we know, is very good, and that's the process: if the process dies, all locks are released, file descriptors closed and memory freed, so it gives a known-good restarting point. And, in the presence of mutable state, potential data races and code bugs, that is the correct layer of isolation to fall back to. I am also aware that it's not a perfect layer; you might have already corrupted on-disk state or abused a protocol to corrupt some state on the network. Those also need to be defended against, but process isolation still gives a good tradeoff between efficiency, convenience and safety.


FWIW, I don't believe there is any convincing to be done here on either side. There are no technical arguments anymore; it is just that one set of people are holding one belief and another set of people are holding another belief. Both certainly do that based on technical arguments, but in the end, they are simply weighing them differently.

I mean, I definitely agree that it would be great for a program to never crash. Or to have only panics which definitely can't be recovered from. Or to have all state isolated and safely expungeable. I agree, that the process being up for a larger timeslice is valuable and that other requests shouldn't fail because one of them misbehaved.

I also assume you agree that errors should be noticed, caught and fixed. I assume you agree that crashing a binary will make the bug more noticeable. That crashing would allow you to recover from a safer and better-known state. And that being able to recover from any crash swiftly and architecting a service so that processes dying doesn't take it down is valuable and bugs shouldn't make it to production.

The facts are straight; this is just a question of opinion and different experiences. I don't see any way out of it other than saying "agree to disagree; if you don't think you can tolerate panics, you just can't use my stuff, and I won't use yours if I consider it to hide failures or be unergonomic".
This argument becomes much more difficult, when I'm having it with my coworkers, as it does depend on how the service is run, which needs to be decided by the team; in regards to this thread, at least we all have the luxury that we can agree to disagree and move on :)

roger peppe

unread,
Apr 26, 2017, 5:59:04 AM4/26/17
to Axel Wagner, Peter Herth, golang-nuts
FWIW I have seen real problems in production where long-running worker
goroutines stopped working. We looked into it and found that certain rare
requests were panicking, not releasing a mutex
and thus preventing the long-running goroutine from acquiring that mutex.

This took ages to work out - made worse because I'd forgotten that
the stdlib recovers from panics in HTTP requests by default...

This is the kind of subtle problem that makes me think that recovering
from panics as a way of making the system more reliable can actually lead
to nastier problems further down the line.


On 26 April 2017 at 10:38, 'Axel Wagner' via golang-nuts

Jan Mercl

unread,
Apr 26, 2017, 6:03:33 AM4/26/17
to roger peppe, golang-nuts
On Wed, Apr 26, 2017 at 11:58 AM roger peppe <rogp...@gmail.com> wrote:

> FWIW I have seen real problems in production where long-running worker
goroutines stopped working. We looked into it and found that certain rare
requests were panicking, not releasing a mutex
and thus preventing the long-running goroutine from acquiring that mutex.

Code bug: Don't Lock without defer Unlock(). (I'm a sinner, just telling what I learned.)

--

-j
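[Editor's note: a sketch of the failure mode roger peppe described and the lesson Jan states. The helper names are hypothetical; the point is that without `defer`, a panic between Lock and Unlock leaks the mutex, and a recover higher up (as net/http does per request) then turns it into a silent deadlock.]

```go
package main

import (
	"fmt"
	"sync"
)

var mu sync.Mutex

// withDefer releases the mutex even if body panics: the deferred Unlock
// runs during unwinding, before any recover higher up the stack.
func withDefer(body func()) {
	mu.Lock()
	defer mu.Unlock()
	body()
}

// withoutDefer leaks the mutex if body panics: the Unlock is never reached.
// Combined with a recover somewhere above, this is exactly the production
// deadlock described earlier in this thread.
func withoutDefer(body func()) {
	mu.Lock()
	body()
	mu.Unlock()
}

func main() {
	func() {
		defer func() { recover() }() // simulate net/http's per-request recover
		withDefer(func() { panic("boom") })
	}()
	// Because Unlock was deferred, the next Lock succeeds. Had we used
	// withoutDefer above, this Lock would block forever.
	mu.Lock()
	fmt.Println("mutex still usable")
	mu.Unlock()
}
```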

Юрий Соколов

unread,
Apr 26, 2017, 11:47:47 AM4/26/17
to Jan Mercl, roger peppe, golang-nuts
> Don't Lock without defer Unlock(). (I'm a sinner, just telling what I learned.)

You are not quite right. Sometimes there is scope dependency between Lock
and Unlock, i.e. Unlock happens not at function exit but is triggered asynchronously
by some condition.
More: if you panic because you already found the state inconsistent, then there is
no way to fix it, because most likely you don't know how it became inconsistent.


Юрий Соколов

unread,
Apr 26, 2017, 11:48:54 AM4/26/17
to Jan Mercl, roger peppe, golang-nuts
s/there is scope dependency/there is no scope dependency/

Jan Mercl

unread,
Apr 26, 2017, 12:08:48 PM4/26/17
to Юрий Соколов, roger peppe, golang-nuts
My lesson learned is that Unlocking before returning from the function that did the Lock is the only safe-by-design way of using a mutex. In other cases a channel is probably a better choice.



--

-j

Sam Whited

unread,
Apr 26, 2017, 12:35:54 PM4/26/17
to Юрий Соколов, Jan Mercl, roger peppe, golang-nuts
On Wed, Apr 26, 2017 at 10:47 AM, Юрий Соколов <funny....@gmail.com> wrote:
> You are not quite right. Sometimes there is scope dependency between Lock
> and Unlock, ie Unlock happens not at function exit, but is triggered
> asynchronously by some condition.

If you can stomach the overhead of yet another stack frame allocation
you can always wrap your lock/unlock (or file close, or whatever) and
the associated logic in an anonymous closure. This of course may not be
appropriate for all situations, and isn't especially nice looking;
anytime I find myself doing this my general reaction is that I need to
refactor so that it's not necessary.

> func outer() {
>     func() {
>         m.Lock()
>         defer m.Unlock()
>
>         // Do some stuff
>     }()
>     // …
> } // outer function end


—Sam

roger peppe

unread,
Apr 27, 2017, 2:47:49 AM4/27/17
to Jan Mercl, golang-nuts
Just for the record, there are *many* occurrences of Unlock without defer,
even in the standard library.

$ cd $GOROOT
$ find . -name '*.go' | grep -v test | xargs grep '\.Unlock(' | grep -v defer | wc
    242     489   11123

Jesper Louis Andersen

unread,
Apr 27, 2017, 8:50:30 AM4/27/17
to Peter Herth, golang-nuts
On Wed, Apr 26, 2017 at 10:55 AM Peter Herth <he...@peter-herth.de> wrote:

No, panic certainly does not do that. It prints the stack trace. A proper logger could add additional information about the program state at the point of the panic, which is not visible from the stack trace. It also might at least be reasonable to perform an auto-save before quitting.


Additional comments in a haphazard order:

It makes sense to accept that a panic() in a Go program will take some collateral requests down with it. This argument can be extended, however. Since the operating system kernel might be wrong, it is better to halt the operating system whenever a goroutine panics. After all, the logic seems to go, who can be sure the operating system didn't forget to release a mutex lock? And why stop there? The hardware on which you are running may have a failure. Better replace it whenever a goroutine panics!

In practice---I think this is due to work by Peter J. Denning originally---we use process isolation at the OS level to guard against such failure. We ought to use a layered model, where each layer guards the layers below it. There is a 7-year-old blog post I wrote on the subject, in which I used an onion as a metaphor for the model[0], and it is one of the blog posts which have had more readers than the others.

In general, failure is something you ought to capture for post-mortem analysis. Get the core-dump, push it into your blob store, restart the process and then attach a debugger to the blob to figure out what is wrong. In my experience, it is also important to have access to the memory state of the program in addition to the backtrace if the problem is complex.

What Erlang people acutely understand is that the granularity of failure matters. A single request failing in a system is usually localized to that single request. If, however, we have a situation as Roger Peppe mentions, where a mutex is locked, the failure of single requests should at some point escalate to larger parts of the system. This is where the concept of a "restart strategy" in Erlang systems becomes necessary: more than K failures in a time-frame window of W increases the granularity and resets larger parts of the system. Eventually the whole node() resets, which is akin to a Go panic() that isn't caught. The advantage is that the size of the failure determines its impact: small errors have small impact, large errors have large impact.

Dave Cheney touches on another important point: if you care about requests and a panic in one Go process can make other requests running collaterally fail, then you should build your load balancer such that it retries the requests on another worker[1].

Yet another point worth mentioning is that a panic() can impose a long recovery time on a process. If you have a large heap of several hundred gigabytes of data, reestablishing that heap after a failure might take a long time. Thus, it can be beneficial to restart parts of the system at a finer granularity first, before resorting to rebooting the full process. Likewise, if a system knows it is in a bad state, it is often faster to report that state to the load balancer than to rely on it eventually figuring it out. Depending on your SLA, you may fail many requests in the meantime, and this may affect your reliability measure. This is especially true if your system processes requests at a high rate, which makes it far more sensitive to latency fluctuations.

So what is a Go programmer to do? The solution, at least from my point of view, is to use the 'context' package to establish a tree of work areas for different parts of the Go program. Failure in one subtree can then be handled by cancelling that subtree's context, and if the system can clean up, you can continue operating. The question is then what to do with cross-cutting concerns, where one context talks to the goroutines of another context in the tree. My guess is that you signal that by closing channels appropriately, but I'm not sure. Erlang systems provide a monitor concept for this very situation, in which you can subscribe to the lifetime of another part of the system: if it fails, a message is sent to you about its failure so you can take action.
[1] Beware of a "poisonous" request, however! A single bad request that panics a system and is then retried by the load balancer can easily take down your entire backend worker pool.

Johan Bolmsjö

unread,
Apr 27, 2017, 8:50:40 AM4/27/17
to golang-nuts

On Monday, April 24, 2017 at 11:02:55 AM UTC+2, Sokolov Yura wrote:
I want to ask:
- how useful `recover` for you?

In one piece of code it was extremely handy: a TLV protocol decoder taking a byte buffer as input. The TLVs could be nested, so TLVs containing TLVs, etc. Instead of manually bounds-checking each TLV, I just relied on the out-of-bounds checking provided by Go. In the top-level decode function I recovered the out-of-bounds panic raised by the runtime and converted it to an error that was returned the usual way. So yeah, basically exceptions :-)
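The pattern looks roughly like this. A flat (non-nested) sketch with a made-up TLV wire format of 1 tag byte, 1 length byte, then the value; the real decoder and its format are not shown in the thread.

```go
package main

import "fmt"

// TLV is a hypothetical tag-length-value record: 1 tag byte,
// 1 length byte, then length bytes of value.
type TLV struct {
	Tag   byte
	Value []byte
}

// Decode parses TLVs from buf. Instead of bounds-checking every slice
// operation, it lets the runtime's out-of-range panic fire on malformed
// input and converts it to an error at the top level.
func Decode(buf []byte) (tlvs []TLV, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("malformed TLV input: %v", r)
		}
	}()
	for len(buf) > 0 {
		tag, length := buf[0], int(buf[1]) // may panic on truncated input
		value := buf[2 : 2+length]         // may panic if length lies
		tlvs = append(tlvs, TLV{Tag: tag, Value: value})
		buf = buf[2+length:]
	}
	return tlvs, nil
}

func main() {
	ok, err := Decode([]byte{0x01, 0x02, 0xAA, 0xBB})
	fmt.Println(ok, err) // [{1 [170 187]}] <nil>

	_, err = Decode([]byte{0x01, 0x05, 0xAA}) // length claims 5, only 1 byte left
	fmt.Println(err)     // non-nil: slice bounds out of range, recovered
}
```

Note that the recover sits in the exported entry point only, so callers never see the panic, just a normal error value.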
 
- Don't you think it is a bit "dangerous"?

No, but in my own "library code" I have never called panic without internally catching it with recover and converting it to an error. I don't count bounds-checking panics etc., as I consider those bugs (with the exception of the example above).
 

mhh...@gmail.com

unread,
Apr 27, 2017, 8:57:13 AM4/27/17
to golang-nuts, ch...@guiney.net
Most languages that stick with exceptions usually have two kinds of exceptions:
- "exceptions for regular errors", i.e. wrong input, wrong system state, or control flow,
- and "fatal exceptions".

I agree with that.
Current error management is not satisfying and pushes the community to find a convention**;
unfortunately, it seems impossible to reach.
The only one who has moved things forward is Dave Cheney with pkg/errors.

** One might say that, by now, all errors should implement IsFatal so that the consumer can determine the severity of the error it is dealing with.
It is not impossible to do today, but two packages not written with this rule in mind won't match.
And consensus has not come, so we are left with a broken leg.

PS: panic is just another capability provided; there's no need to fall into some dogma like `never use it`.
Like everything else in Go, use it with care, or panic and fix it.

Dave Cheney

unread,
Apr 27, 2017, 5:53:36 PM4/27/17
to golang-nuts
The takeaway for me is: prefer returning an error to the caller wherever possible. Overuse of panic begets overuse of recover, and that just leads to more problems [1].

1. https://github.com/golang/go/issues/13879
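Dave's advice amounts to keeping the failure in the value domain. A minimal sketch, with a made-up `openConfig` helper and file path, of returning the error instead of panicking:

```go
package main

import (
	"fmt"
	"os"
)

// openConfig returns the error to the caller instead of panicking,
// letting the caller decide whether the failure is fatal.
func openConfig(path string) (*os.File, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, fmt.Errorf("opening config: %w", err)
	}
	return f, nil
}

func main() {
	if _, err := openConfig("/no/such/file.conf"); err != nil {
		// The caller handles it; no recover() needed anywhere.
		fmt.Println("handled gracefully:", err)
	}
}
```

No panic means no recover, and the error carries context the caller can inspect with errors.Is/errors.As.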

John Souvestre

unread,
Apr 30, 2017, 7:31:00 PM4/30/17
to golang-nuts
I had a few library functions for which I couldn't decide on a "fits-all" error-handling method. Often the decision depends on the caller's environment, so I decided to let the caller decide.

// Panic holds an optional user function which is used to handle serious
// run-time errors. If not set, the default (panic) is used.
//
// Examples:
// math.Panic = func(err error) { log.Fatal(err) } // Log and exit
// math.Panic = func(err error) { log.Warn(err) } // Log and continue
// math.Panic = func(err error) { } // Ignore and continue
//
var Panic func(error) = func(err error) { panic(err) } // Default: panic

Also, this avoids using recover, which gets really messy if you have multiple spots where such an error can occur and the desired response might vary.
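A sketch of how library code might invoke such a hook. The `Sqrt` function is hypothetical (John's post only shows the hook variable), and the hook is shown in-package rather than as `math.Panic`:

```go
package main

import (
	"errors"
	"fmt"
)

// Panic holds an optional user function used to handle serious run-time
// errors, following John's pattern. Default behavior: panic.
var Panic = func(err error) { panic(err) }

// Sqrt is a hypothetical library function that reports bad input
// through the Panic hook instead of a hard-coded panic.
func Sqrt(x float64) float64 {
	if x < 0 {
		Panic(errors.New("Sqrt: negative input"))
		return 0 // reached only if the caller's hook returns
	}
	// ... real computation elided ...
	return 0
}

func main() {
	// The caller decides: log and continue instead of panicking.
	Panic = func(err error) { fmt.Println("warning:", err) }
	Sqrt(-1) // prints: warning: Sqrt: negative input
}
```

The library stays decided-by-caller: the default still panics, but anyone embedding it can swap the hook once at startup.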

John

John Souvestre - New Orleans LA

LowEel

unread,
May 4, 2017, 1:51:15 PM5/4/17
to golan...@googlegroups.com


On 04/24/2017 11:02 AM, Sokolov Yura wrote:
> Good day, people.
>
> Title is a bit controversial :-)
>
> I want to ask:
> - how useful `recover` for you?
> - Don't you think it is a bit "dangerous"?
>
> I mean: panic usually means programmer error,

I don't know what you mean by saying that any panic is caused by a bug:
being honest, I have seen panic conditions related to weird behaviors
in the underlying infrastructure, like a "not really seamless" network
failover, or storage failover in SAN systems.

Personally, I have used it to manage the situation where the connection
to a server is dropped because of a change in failover interfaces, when
raw sockets are not managed (we implemented SCTP over raw sockets on
Linux), and the daemon just survives and reconnects.

I cannot see any "danger" in this, as long as you check that it doesn't
end in a loop of endless restarts when the failure condition persists.

L.


