go (master) - io.CopyBuffer - why would you panic instead of error on 0-size buffer?


Danny

Aug 8, 2015, 8:49:47 PM
to golang-nuts
Hi all,

First of all, I'm sorry if I skipped a step in the process. I'm not that familiar with the various facilities (bug tracker, source control, etc.) for Go. If this is the wrong place for my question, please direct me. That being said, now for my question.

I noticed by coincidence this new function: http://tip.golang.org/src/io/io.go?s=12567:12645#L348 (change https://go.googlesource.com/go/+/2c89992f445a631da250517d6f9b9fcd7852872e%5E!/)

It adds a function, io.CopyBuffer, which can be used if you want to do an io.Copy and provide your own buffer. However, it troubles me that this function panics on a zero-size buffer.

Is there a good reason for panicking here, instead of returning a defined error indicating that a zero-size buffer is unusable?

It bugs me personally, because I'd like to keep the number of potential panics to a bare minimum, and this case seems like overkill: the situation is quite controllable by simply returning an appropriate error.

Now, if you all do not agree: could you explain why a panic is appropriate here, and the reasoning by which you come to that conclusion, so I can learn from it?

Thank you very much,
Danny

Ian Lance Taylor

Aug 8, 2015, 8:59:16 PM
to Danny, golang-nuts
io.CopyBuffer could return an error for a zero-sized buffer. That
would work.

That said, it's reasonably common for Go functions to panic when
called in such a way that they cannot work. You can think of it as
verifying preconditions. If your program calls CopyBuffer with a
zero-sized buffer, your program is incorrect. You're making some sort
of fundamental error. A panic is not required for such a thing, but
it's not wrong.
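A minimal sketch of that precondition style (a hypothetical mean function, not from the standard library):

```go
package main

import "fmt"

// mean illustrates a precondition check: an empty input is a bug in the
// caller, so the function panics rather than returning an error that
// nobody would check.
func mean(xs []float64) float64 {
	if len(xs) == 0 {
		panic("mean: empty input")
	}
	sum := 0.0
	for _, x := range xs {
		sum += x
	}
	return sum / float64(len(xs))
}

func main() {
	fmt.Println(mean([]float64{1, 2, 3})) // prints 2
}
```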

Ian

Joshua Liebow-Feeser

Aug 9, 2015, 3:45:46 AM
to golang-nuts, danny.v...@gmail.com
To add to what Ian said, the general principle that seems to be followed (though not often explicitly stated) is this: errors are for things that might happen even to a correctly-written program, while panics are for things that would never happen in a correctly-written program. That is, you should use panics to indicate that there is a bug in the program. This makes sense when you consider that errors can be dealt with nicely - it is up to you as the programmer what to do when a function returns an error. On the other hand, panics are very difficult and cumbersome to handle nicely - they're designed to crash your program. And if your program is written incorrectly, it should not try to recover from this - it should crash so that you know it is wrong and can debug it and fix it.
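That split can be made concrete (a sketch; both function names here are made up for illustration):

```go
package main

import (
	"fmt"
	"os"
)

// readConfig can fail even in a correct program -- the file may simply
// not exist -- so it returns an error for the caller to handle.
func readConfig(path string) ([]byte, error) {
	return os.ReadFile(path)
}

// third assumes its caller passes at least three elements; violating
// that assumption is a bug, and the out-of-range access panics.
func third(s []int) int {
	return s[2]
}

func main() {
	if _, err := readConfig("no-such-file.conf"); err != nil {
		fmt.Println("error, handled normally:", err)
	}
	fmt.Println(third([]int{10, 20, 30})) // prints 30
}
```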

Danny

Aug 9, 2015, 8:08:36 AM
to golang-nuts, danny.v...@gmail.com
Okay, I understand. That way of reasoning makes sense to me.

The only issue I can think of is when you write a library that itself allows the user to supply a buffer. If it's badly documented, that user can still get an unexpected panic. But that's more of a documentation issue than a coding issue.

Thank you for explaining this.

Danny

Joshua Liebow-Feeser

Aug 9, 2015, 2:03:29 PM
to Danny, golang-nuts
I agree that the documentation could be improved.


Uriel Fanelli

Aug 9, 2015, 3:19:28 PM
to golang-nuts
Uhm... from the operations department's point of view, I don't agree
with this approach.

Ok, sure, in a perfect world software has no bugs. In a perfect world,
things which should not happen don't happen.

Now, put yourself in my position. You hand me over your amazing server,
which I am supposed to keep running 24/7, with 99.999% SLA.
And you say that on any error not only will one thread crash, but the
whole server will. And please note, this server was started from a
shell, and I have no idea whether an attacker can get a shell right
after the server crashes: I can't predict the effects of a crash.

But let's forget the evil hacker: just imagine a thread runs into a
condition which "should not happen" (which means, for example, I restart
a firewall and all the TCP connections are cut off), or maybe we do a
failover event on some duplicated SAN.

What I expect as the operations department is that some thread will die
and some logs will trigger some alarms, but the server itself will not
need a night callout for a full restart.

If the code panics and the whole server crashes, I need to call the
on-call duty and ask them to check what happened and restore working
conditions. Which is a cost.

So, can I suggest keeping the operations department's point of view in
consideration? :)

Maybe some flag like "-healthy_panic" somewhere?

Just my two cents, of course.

Actually, when we commission anything, we release NFRs, non-functional
requirements, and we usually write that we don't accept software which
is not resilient to several conditions, which makes this position very
unpopular. Ok, we are just a telco in Europe, don't take me wrong, but
where you need high uptime and you have operations working on networks
and storage 24/7, this position will make people very hostile.

And operations can be very harsh when discussing SLAs and OLAs.

Uriel

Axel Wagner

Aug 9, 2015, 3:37:19 PM
to low...@gmx.de, golang-nuts
Go software will panic all the time for all kinds of bugs. I don't think
this is at all relevant, for that reason. If a panic can screw up
production in this way, you are *already* pretty much screwed. Because
servers and programs fail all the time for all kinds of reasons, you
usually build your software with an architecture that allows for that
kind of failure while still staying within your SLA. "Leave the process
running, no matter what bugs it may contain" is simply not a very
useful strategy for building resilient systems.

Axel Wagner

Aug 9, 2015, 3:43:55 PM
to Danny, golang-nuts
Danny <danny.v...@gmail.com> writes:
> The only issue I can think of is when you write a library that itself
> allows the user to supply a buffer. If badly documented then that user can
> still get an unexpected panic.

Nothing prevents you, as a library author, from doing

    if len(buf) == 0 {
        return 0, errors.New("invalid buffer")
    }
    return io.CopyBuffer(dst, src, buf)

Danny

Aug 9, 2015, 5:47:11 PM
to golang-nuts, danny.v...@gmail.com
Sure, that's a valid approach. I was mostly looking for how to recognize the line and where to draw it.

John Souvestre

Aug 9, 2015, 6:09:25 PM
to golan...@googlegroups.com
> Go software will panic all the time for all kinds of bugs. I don't
> think this is at all relevant, for that reason. If a panic can screw up
> production in this way, you are *already* pretty much screwed. Because
> servers and programs fail all the time for all kinds of reasons, you
> usually build your software with an architecture that allows for that
> kind of failure while still staying within your SLA. "Leave the process
> running, no matter what bugs it may contain" is simply not a very
> useful strategy for building resilient systems.

Planning for the worst does not mean ignoring the non-worst cases. Following your logic, all car crashes should be treated as fatal, hence there is no reason to wear seat belts or to install airbags.

Uriel said that he would be logging non-fatal errors and addressing them. He just wants to keep the system online as much as possible while addressing them. With some errors you can, with some you can't.

John

John Souvestre - New Orleans LA



Axel Wagner

Aug 9, 2015, 6:46:40 PM
to John Souvestre, golan...@googlegroups.com
John Souvestre <jo...@souvestre.com> writes:
> Planning for the worst does not mean ignoring the non-worst cases.
> Following your logic, all car crashes should be treated as fatal,
> hence there is no reason to wear seat belts or to install airbags.

That is just a horrifyingly cynical strawman.

What I said was that your SLA is pretty much meaningless if you can't
handle the crash of a program without any problems. Unless you are
unrealistically good at writing incredibly resilient software. And in
that case, this one additional thing you have to check for your
unbelievable track record of stable software won't really make a
difference anyway.

A hypothetical SLA was brought up as an argument for a change to a
language implementation detail. In my opinion, that is simply a very
weak argument. Because the solution to this hypothetical SLA problem is
"you should think better about your SLA".

> Uriel said that he would be logging non-fatal errors and addressing
> them. He just wants to keep the system on line as much as possible
> while addressing them. With some errors you can, with some you can't.

But what distinguishes this panic from *any other* panic? What makes the
"passing in a zero length buffer programming mistake" any different from
the "accessing a slice out of bounds programming mistake", or the
"accessing a nil map programming mistake", or the "deadlock programming
error"? Following the "never ever crash" logic, the runtime should just
log an error and return the zero value instead. Or do nothing. Or do
anything, except crash.

"Things that are 100% clearly a programming error panic" is a very
clear line to draw. A buggy program can behave in any buggy way it
wants, including inducing a domino-effect and taking down more systems.

And if you never ever want to crash (however bad an idea I consider this
notion) you are still free to recover. And resume in whatever way you
want (and in case you say "but I don't know what the correct way to
recover is" -- that's exactly why it's a panic).

Best,

Axel

Uriel Fanelli

Aug 10, 2015, 2:42:17 PM
to golan...@googlegroups.com

Well,

1) We can resist program crashes pretty well. This is because we are
redundant in each building, redundant across two buildings, and also
geo-redundant in Germany and Ireland. We manage more or less ~7500
nodes, so we are used to resisting a program crash. This is how ~500
million users (~200 million of them machines, so-called machine-to-machine
traffic, though the average consumer knows it as 'IoT') have been served
for a dozen years. We have had 100% global uptime in the last 6 years.

So I have no problem to resist to a program crash.

2) How it works now is easy: the *thread*, and not the whole server,
is crashing. Yes, it is. While crashing, it logs. We open a Problem
Investigation Ticket (you know, ITIL), we take a thread dump, and the
vendor is going to tell us what was wrong. Of course, if this happens in
a maintenance window authorized by the CAB (again, ITIL), we strongly
suspect what the cause is. Nevertheless, we ask the vendor to
investigate and provide what we call "hard facts".

I repeat: threads are crashing.

The problem, as I mentioned in the part of my email you did *not* read,
is about costs and escalations.

Golang is a language which aims to have strong concurrency and provides
methods to easily create threads.

My expectation is that a thread does not make the whole server crash.

The reason I cannot let *the shit hit the fan*, as you suggest, is
that this will:

1) Trigger the on-call duty.
2) Trigger a high-priority ticket, which will escalate if not answered
in 10 minutes and solved in 60 minutes.
3) Make the escalation manager ask for an investigation about the loss
of service: how many seconds, how many customers impacted.

and this means: costs.

In a data center, there is *always* discontinuity. When we have to
reflash a switch, for example, we create a failover event on the
duplicate, and this will cause a few threads to crash here and there,
across the jumble of software *ANY* telco manages. The same when we
recable the machines. The same when we fail over a SAN fiber switch:
regardless of the second route, some I/O retention will happen.

If every time we need to change a switch we need the on-call duty
support AND the vendor support (restarting an HLR is not as easy as you
think), the costs are going to explode. And especially, some users
cannot be served. Which for you is maybe not a problem; for us it is.

I know this is very hard to explain. Most people think information
technology is only about the new fancy social network which makes
humankind aware of your last cocktail. Many think working in IT is like
having a desk shaped like a Swiss cow and having a billiard room to
relax in.

Well, I am sorry to say there are infrastructures and also industry.
And I would remind you that when you think "I call the police", "I call
the fire brigade", "I call the ambulance", you actually mean "I PHONE
them". Which means losing a dozen customers could cause a BIG problem.
Do you know that when an old person is alone with a "panic button" to
push, that small box sends a heartbeat every minute? Do you know how
many alarm systems (shops, houses, museums, etc.) are using SMS to
trigger alarms right now? Each of them also sends a heartbeat, and if
it is lost, an alarm is raised somewhere. And the same for industrial
machines. And the same for...

So please, if you think you can crash a server because of a single
thread, I can understand it: humankind can survive without some dozens
of people being unaware of a cool cocktail in some fashionable bar.

But, still, as long as you need emergency calls, power, water,
healthcare, workplace security, please consider there are boring people
staying the whole day (and often night) in front of a terminal, with no
web stylesheet, keeping utilities working, and no, we cannot lose "just
a dozen customer transactions" just because we like it when the shit
hits the fan.

Before going into our production, any software requires 3 steps of QA.
If we see that a single thread can burn a whole server, with hundreds of
transactions aborted, we simply don't permit it to go into production.

Our applications written in Java are crashing thread by thread, and the
server not only keeps running: it keeps *operating*. The same for the
ones written in C++ and Ada. We are testing some written in Erlang.

Golang looks very promising, and my suggestion was only to implement a
mechanism of panicking which only affects one thread, instead of
crashing the whole server.

That's it: I just gave a suggestion. And maybe you are not even
entitled to teach operations to me, btw.

Axel Wagner

Aug 10, 2015, 3:11:35 PM
to Uriel Fanelli, golan...@googlegroups.com
I am confused. You now sound like you are having a completely
*different* discussion than the one this thread is about. This thread
is about the question of whether or not io.CopyBuffer should panic
when passed a zero-sized buffer. This thread is in particular *not*
about the question of whether or not a panic that isn't recovered
should crash the process.

My point was (and still is): bugs in a Go program often lead to
panics. So, if you can't stand a panic, you shouldn't use Go, or should
make *very* sure that you have no such bugs (or that you recover from
every panic). But that has nothing to do with the question of whether
*this particular programming error* should panic or not. This
particular programming error is in no way more likely to happen than
any other programming error that will lead to a panic; in fact, I would
say it is *far less* likely to happen. So, whether panics are fatal or
not is irrelevant to the discussion: if they are, another possible
source of a panic *shouldn't* change anything about your SLA. If they
aren't, well, another possible source of panic *won't* change anything
about your SLA, by definition.

Uriel Fanelli <uriel....@gmail.com> writes:
> The problem, as I mentioned in the part of my email you did *not* read,
> is about costs and escalations.

I did read your email. I just didn't agree with you that you made a
relevant argument *to this discussion*.

> My expectation is that a a thread is not making the whole server to
> crash.

Then you really should create a separate thread (as in "on the mailing
list"), where you suggest a mechanism to globally recover from panics
and have them not take down your process.

> Golang looks very promising, and my suggestion was only to implement a
> mechanism of panicking which only affects one thread, and not crashing
> the whole server.

Apparently that is the problem: I was assuming that you are talking
about the topic of the thread you were replying to.

> That's it: I just gave a suggestion. And maybe you are not even
> entitled to teach operations to me, btw.

I didn't try. I just (under the assumption that you are talking about
the thing this thread was about) pointed out that your argument is
invalid. If we drop the assumption and make this a discussion about
whether or not panics *in general* should take down the process if not
recovered, then, yes, your argument is a very different one and has some
merit.

So again: If you want to discuss, if panics *in general* should take
down a process, I suggest you create a thread for that. And leave this
thread for the discussion of whether or not *this particular*
programming error should be a panic or not.

Matt Harden

Aug 10, 2015, 3:23:52 PM
to Uriel Fanelli, golan...@googlegroups.com
If you want to catch any panics generated within a goroutine, then you can defer a function to recover() from the panic as you start each goroutine, like this: http://play.golang.org/p/K9Nr4byhx3. Everyone here is telling you that it's unwise for a server to try to continue running when clearly the program has encountered a programmer error, but if you choose to ignore this advice, you can use the above technique.

Andrew Gerrand

Aug 10, 2015, 8:28:23 PM
to Uriel Fanelli, golang-nuts
On 11 August 2015 at 04:41, Uriel Fanelli <uriel....@gmail.com> wrote:
But, still, as long as you need emergency calls, power, water, healthcare, workplace security, please consider there are boring people

This may sound facetious, and I apologize if it comes across that way, but I would be very disturbed if any of these critical services would be affected by a single process crashing.
 
Our applications written in Java are crashing thread by thread, and the server not only keeps running: it keeps *operating*.
The same for the ones written in C++ and Ada. We are testing some written in Erlang.

In a shared memory environment when a thread (or goroutine) "crashes" for whatever reason, it is impossible to make any assurances about the correctness or reliability of the rest of the process. That's why a panicking goroutine can take down an entire Go process. This was a deliberate design decision.
 
Golang looks very promising, and my suggestion was only to implement a mechanism of panicking which
only affects one thread, instead of crashing the whole server.

Such a mechanism exists, and it's called "recover". So if you are careful you can write software in Go that can recover from panics. But for the reason I gave above, you should be very judicious in using this technique.
 
And maybe you are not even entitled to teach operations to me, btw.

This kind of defensive statement adds nothing whatsoever to the discussion. Please stick to technical concepts rather than arguing from authority. Although if you do want to think in those terms, please consider Google's experience in building and operating production systems.

Andrew

Paul Borman

Aug 10, 2015, 9:27:16 PM
to Andrew Gerrand, Uriel Fanelli, golang-nuts
cliff notes: only ever generate a non-recovered panic in a public library if memory safety has been compromised.

What I am taking away from all of this is:
  1. A goroutine can panic and take down your program.
  2. A panic should only be introduced in code when there is no possible other operation that can be done after this point.
  3. Go may not have the panic bar set high enough for some operations.
As an OS person, who spent most of his professional life inside the BSD kernel, I view every panic as something to be avoided (unless I am being lazy) and the operation should be made to fail, not bail.  Clearly the application domain is more lenient as crashing the kernel crashes everyone, while crashing a program only crashes that program (hopefully).  The degree to which that crashing of the application is acceptable is probably domain specific.

If memory safety has been compromised in Go, then a panic is probably the only sensible thing (unless there were some way to return to memory safety). In the case of io.CopyBuffer, an error could easily be returned. It is highly subjective to what degree a program could continue on if io.CopyBuffer were to fail. Personally, I believe the application should decide, not the library.

Here at Google we design our systems to be tolerant of process failure because our processes can be killed (preempted, machine dies, memory fails) at any time and without warning. A panic here or there is no different (and not considered a big deal). At Google it is probably better to just crash and start up again and hopefully someone will take a look at the panic. We also don't really write command line utilities at Google, we write servers. Reporting failure on a command line utility, as opposed to a panic, is probably more helpful to the user. Having an error on one input (that does not cause a loss of memory safety) probably should not cause a program to panic, even if technically the program is not written correctly (its math generated a zero-length buffer that then got passed to io.CopyBuffer because it calculated no data would be transferred).


Bakul Shah

Aug 10, 2015, 11:13:01 PM
to Uriel Fanelli, golan...@googlegroups.com
I don't know if this is workable but one idea is to make a
disciplined use of Go and just use channels as a means of
communication between goroutines and don't use any unsafe
operations. Now if a goroutine panics, it has only damaged
its own state. So after a panic you only need to deal with
channels.
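A minimal sketch of that discipline (the runWorker helper and result type are made up for illustration):

```go
package main

import "fmt"

type result struct {
	n   int
	err error
}

// runWorker isolates a computation: the goroutine talks to the rest of
// the program only through a channel, and a panic inside f is recovered
// and reported as a value, leaving other goroutines untouched.
func runWorker(f func() int) <-chan result {
	ch := make(chan result, 1)
	go func() {
		defer func() {
			if r := recover(); r != nil {
				ch <- result{err: fmt.Errorf("worker panicked: %v", r)}
			}
		}()
		ch <- result{n: f()}
	}()
	return ch
}

func main() {
	fmt.Println(<-runWorker(func() int { return 42 }))
	fmt.Println(<-runWorker(func() int { panic("damaged state") }))
}
```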

If your C++ or Java server keeps running in spite of crashes
I expect you are using these languages in a disciplined way
as well.

Another suggestion is to fork() a bunch of servers so if one
process dies, the others continue servicing.

In general mechanisms that work in one language do not map
easily to another. What may seem easy to implement to you may
not be so. On the other hand once you become familiar with a
language you can usually find a way to solve a problem. So if
I were you I'd start playing with Go and see what works well
and what doesn't and only then decide on making a bigger bet
on Go.

Paul Borman

Aug 10, 2015, 11:19:44 PM
to Bakul Shah, Uriel Fanelli, golang-nuts
Hi Bakul,

I think your idea about not using unsafe operations is a good one.  The main issue with the concept of your goroutine solution is that goroutines are being created all the time, without your knowledge.  If one of them panics, game over (which is why I believe only operations that break memory safety should cause unrecovered panics).  The idea with fork (should be exec.Command) is good, just as long as the main process is pretty much locked down solid so it will "never" fail.  It is actually similar to what is done at Google, though we have a specially designed "main process" called Borg that keeps restarting your jobs if they crash and burn (and which also might kill you off to make room for someone more important).

    -Paul

Bakul Shah

Aug 11, 2015, 12:49:13 AM
to Paul Borman, Uriel Fanelli, golang-nuts
On Mon, 10 Aug 2015 20:19:23 PDT Paul Borman <bor...@google.com> wrote:
>
> I think your idea about not using unsafe operations is a good one. The main
> issue with the concept of your goroutine solution is that goroutines are
> being created all the time, without your knowledge. If one of them panics,

Aaand I was forgetting all the runtime shared state!

The idea was to protect against application errors, but if application
code can panic the runtime, all bets are off!

> game over (which is why I believe only operations that break memory safety
> should cause unrecovered panics). The idea with fork, (should be
> exec.Command) is good, just as long as the main process is pretty much
> locked down solid so it will "never" fail. It is actually similar to what

Here the unit of recovery is a process. Uriel Fanelli wants
thread-level recovery, but I don't think it will be possible
with Go.

> is done at Google, though we have a specially designed "main process"
> called borg that keeps restarting your jobs if they crash and burn (and
> what also might kill you off to make room for someone more important).

Similar orchestration software will be needed for any large
cluster because you not only have to start/kill/restart
processes & nodes but also manage process command line options
& resources. And this can add its own complexity!

Matt Harden

Aug 11, 2015, 9:38:37 AM
to Bakul Shah, Paul Borman, Uriel Fanelli, golang-nuts
I don't agree with Paul that we should only panic if memory safety has been compromised. This is certainly not the case today, as for example trying to read a nonexistent map entry (without comma ok) causes a panic. I like the current guideline that if there's a clear programmer error (bug) we should panic. There is no valid reason to provide a 0-size buffer to io.CopyBuffer, so this qualifies. I agree it can be subjective and a judgement call, but IMO if a good argument can be made that a certain condition can only indicate a bug in the program, we should panic immediately. This way the stack traces will be closer to the actual bug in the code. If we were to continue on without panicking, we can lose that valuable information, and since we know there is a bug, it's likely that further unwanted events will happen as we continue to run.

Ian Lance Taylor

Aug 11, 2015, 10:59:51 AM
to Matt Harden, Bakul Shah, Paul Borman, Uriel Fanelli, golang-nuts
On Tue, Aug 11, 2015 at 6:38 AM, Matt Harden <matt....@gmail.com> wrote:
>
> I don't agree with Paul that we should only panic if memory safety has been
> compromised. This is certainly not the case today, as for example trying to
> read a nonexistent map entry (without comma ok) causes a panic.

This turns out not to be the case. Reading a non-existent map entry
returns the zero value of the map's value type.
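Concretely:

```go
package main

import "fmt"

func main() {
	m := map[string]int{"a": 1}

	// A missing key does not panic; it yields the value type's zero value.
	fmt.Println(m["missing"]) // prints 0

	// The comma-ok form distinguishes "absent" from "present and zero".
	v, ok := m["missing"]
	fmt.Println(v, ok) // prints 0 false
}
```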

Ian

Paul Borman

Aug 11, 2015, 11:55:54 AM
to Matt Harden, Bakul Shah, Uriel Fanelli, golang-nuts
Matt, when running jobs in Google production I agree, it is the way Google works.  When I put on my OS hat I disagree.  I will stand by my belief that a public package or runtime should only panic when it has no other choice (no possible way to return an error).  For package main, or a private package (a package only for your application), feel free to panic at any time :-)  The go runtime panics with index out of range, but that is simply because there is no way to return the error, except by panic, similar to a wonky pointer in C.  I still assert the panic in io.CopyBuffer is simply gratuitous as there was a perfectly good solution that did not tear down the processes needlessly.  The assertion that it is likely that further unwanted events will happen is not particularly persuasive to me.  One could argue we should just have func init() { panic("your program probably will have bugs") } and be done with it! :-)  <-- please see smiley face!

Anyhow, back to io.CopyBuffer.  Take this simple program:

package main

import (
	"bytes"
	"io"
)

func main() {
	var a, b bytes.Buffer
	buf := make([]byte, a.Len())
	io.CopyBuffer(&b, &a, buf)
}

In my mind, there is no good reason why that program should panic.  You might argue that it does nothing, that it is silly to copy an empty buffer, or that I should have just used b.Write(a.Bytes()), but still, a panic?

Anyhow, it is fine to disagree.  I think the panic in io.CopyBuffer is over the top, but clearly the author and the reviewer did not agree.


Manlio Perillo

Aug 11, 2015, 11:57:25 AM
to golang-nuts, low...@gmx.de


On Sunday, August 9, 2015 at 9:19:28 PM UTC+2, Uriel Fanelli wrote:
Uhm... from the operations department's point of view, I don't agree
with this approach.

Ok, sure, in a perfect world software has no bugs. In a perfect world,
things which should not happen don't happen.

Now, put yourself in my position. You hand me over your amazing server,
which I am supposed to keep running 24/7, with 99.999% SLA.
And you say that on any error not only will one thread crash, but the
whole server will. And please note, this server was started from a
shell, and I have no idea whether an attacker can get a shell right
after the server crashes: I can't predict the effects of a crash.


Erlang was designed to solve this kind of problem.

> [...]

Regards  Manlio

fatdo...@gmail.com

Aug 11, 2015, 12:09:25 PM
to golang-nuts, jo...@souvestre.com


On Sunday, August 9, 2015 at 6:46:40 PM UTC-4, Axel Wagner wrote:
John Souvestre <jo...@souvestre.com> writes:
> Planning for the worst does not mean ignoring the non-worst cases.
> Following your logic, all car crashes should be treated as fatal,
> hence there is no reason to wear seat belts or to install airbags.

That is just a horrifyingly cynical strawman.

What I said was that your SLA is pretty much meaningless if you can't
handle the crash of a program without any problems. Unless you are
unrealistically good at writing incredibly resilient software. And in
that case, this one additional thing you have to check for your
unbelievable track record of stable software won't really make a
difference anyway.

A hypothetical SLA was brought up as an argument for a change to a
language implementation detail. In my opinion, that is simply a very
weak argument. Because the solution to this hypothetical SLA problem is
"you should think better about your SLA".

library implementation detail
 

fatdo...@gmail.com

Aug 11, 2015, 12:17:03 PM
to golang-nuts
I'm not sure a library should make such fatal decisions; it should just return an error.

Matt Harden

Aug 11, 2015, 3:13:46 PM
to Ian Lance Taylor, Bakul Shah, Paul Borman, Uriel Fanelli, golang-nuts
Doh! I knew that. Still many other programmer mistakes do cause panics, such as closing a nil channel. That could just as easily have been made a no-op. I for one would rather see a panic than have Go silently ignore bugs like that.
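For example, that bug is surfaced loudly:

```go
package main

import "fmt"

func main() {
	// Closing a nil channel is specified to panic: it can only be a bug,
	// and Go reports it instead of silently making it a no-op.
	defer func() {
		fmt.Println("recovered:", recover())
	}()
	var ch chan int // nil channel
	close(ch)
}
```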

Axel Wagner

Aug 11, 2015, 3:21:55 PM
to fatdo...@gmail.com, golang-nuts
fatdo...@gmail.com writes:
> I'm not sure a library should make such fatal decisions, just return an
> error.

I grepped for "panic(" to get a rough approximation of a list of stdlib
packages that panic somewhere. This is only an approximation: some of
them will use it as a control-flow device, some will be completely
reasonable (like runtime/syscall), and some *might* only use it in a
comment. But most are probably just the regular use of panic as proposed
here. So, an approximated (and maybe unfair) list of stdlib packages
currently (on Go 1.4) using panic:

archive/zip
bufio
builtin
bytes
compress/bzip2
compress/flate
compress/zlib
crypto
crypto/aes
crypto/cipher
crypto/des
crypto/ecdsa
crypto/md5
crypto/rand
crypto/rc4
crypto/rsa
crypto/sha1
crypto/sha256
crypto/sha512
crypto/subtle
crypto/tls
crypto/x509
database/sql
debug/gosym
encoding/asn1
encoding/binary
encoding/gob
encoding/json
encoding/xml
flag
fmt
go/ast
go/build
go/doc/testdata
go/parser
go/printer
go/printer/testdata
go/scanner
go/token
html
html/template
image/jpeg
image/png
io/ioutil
liblink
log
math/big
math/cmplx
math/rand
mime
mime/multipart
net
net/http
net/http/cgi
net/http/cookiejar
net/http/httptest
net/http/httputil
net/mail
net/rpc
net/rpc/jsonrpc
net/textproto
os
os/signal
path/filepath
reflect
regexp
regexp/syntax
runtime
runtime/cgo
runtime/pprof
runtime/race/testdata
strconv
strings
sync
sync/atomic
syscall
testing
text/tabwriter
text/template
text/template/parse
time
unicode/utf8

Axel Wagner

unread,
Aug 11, 2015, 3:24:31 PM8/11/15
to golang-nuts
fatdo...@gmail.com writes:
> library implementation detail

No. It is not an implementation detail, it is part of the API (that's
why it's mentioned in the comment). And the stdlib is *pretty much* part
of the language. It's as close as it gets without restricting yourself
completely to the spec.

Also: Semantics.

Dan Kortschak

unread,
Aug 11, 2015, 7:11:57 PM8/11/15
to Paul Borman, Matt Harden, Bakul Shah, Uriel Fanelli, golang-nuts
This can be qualified. In the gonum matrix package we have pretty extensive use of panic for dimension errors and similar (basically in the class of programmer errors, but this is arguable). The basis for this is that the parenthesis below is extended to be "no possible way to return an error that is guaranteed to be checked". The rationale for this is that the matrix package is a response to the need for improved safe scientific computing, where noisy failure is far more important than continuing to run.

Axel Wagner

unread,
Aug 11, 2015, 8:03:22 PM8/11/15
to Paul Borman, Matt Harden, Bakul Shah, Uriel Fanelli, golang-nuts
Hi,

I don't understand the discussion. The condition of a panic
is clearly documented and it's 8 lines of completely straightforward
code to wrap the stdlib version of CopyBuffer into a panic-safe version:
http://play.golang.org/p/rbD-gT1t7V
If you are concerned about your software panicking, just add this to your
project and mandate the use of this wrapper instead of the function
provided by the io package.

And why is this creeping up now, when actually a lot, if not most, of
the stdlib packages (I found panics in 120 of 200 packages, but the counts
are only approximate on both ends, because lazy) already panic under some
condition or other?


'Paul Borman' via golang-nuts <golan...@googlegroups.com> writes:
> Matt, when running jobs in Google production I agree, it is the way Google
> works. When I put on my OS hat I disagree. I will stand by my belief that
> a *public* package or runtime should only panic when it has no other choice
> (no possible way to return an error).

That is your right and privilege, but the "fail loudly" approach also
applies to interactive software (very probably even *more* there). And
what would you rather have in a bug-report: A logline saying "Error:
passed zero buffer to io.CopyBuffer", or a stacktrace containing the
same string, but also all the calls leading up to it and the precise
location of the call? I'd prefer the second one.

> The go runtime panics with index out of range, but that is simply
> because there is no way to return the error, except by panic, similar to a
> wonky pointer in C.

That is untrue. Map accesses don't panic. It is absolutely at the
discretion of the language and runtime to do whatever it likes on an
error. If the language spec prescribed that out-of-bounds accesses
shouldn't panic, they wouldn't, and instead (maybe) the zero value would
be returned, with an optional ,ok idiom to test for invalid
accesses.

Panics are a part of the design of Go and they are in general used
when an error is clearly a programming error. This applies both to
programming errors caught by the runtime (close of a closed channel,
send to a closed channel, out-of-bounds accesses, …) and programming
errors caught by some package in the stdlib (see my other mail in this
thread, and above. A *lot* of stdlib packages panic when catching
programming errors).

> I still assert the panic in io.CopyBuffer is simply gratuitous as
> there was a perfectly good solution that did not tear down the
> processes needlessly.

But "needlessly" is an opinion (to which you are entitled, but not
everyone needs to agree). See above for my technical argument for my
(and other people's) opinion that programming errors should *always*
panic when caught. It creates much more useful bug reports. As
bugs are by definition abnormal, I don't buy the argument that a
program should output clear error messages to users. Because, yes, of
course it should, but when you pass a zero buffer to io.CopyBuffer, the
program clearly doesn't behave as it should already, and that issue should
be resolvable as quickly and easily as possible.

Complaining that your program will behave user-unfriendly in the
presence of a bug seems a bit alien to me. Of course it does. It's buggy.

> In my mind, there is no good reason why that program should panic. You
> might argue that it does nothing, that it is silly to copy an empty buffer,
> or that I should have just used b.Write(a.Bytes()), but still, a
> panic?

I would argue that this example is very contrived. There is no reason to
use CopyBuffer in this way. Contrived examples are relatively
useless. Yes, I agree that it wouldn't matter, if this program doesn't
panic. But it doesn't make sense to build it (or something like it) in
the first place, so I don't think that we can derive anything for the
general case from it. *And* I don't think it is *wrong* for this program
to panic either. That it's not wrong to not panic doesn't mean it's wrong
to panic. Either way is fine for *this program*.

> Anyhow, it is fine to disagree.

Agreed :)

> I think the panic in io.CopyBuffer is over
> the top, but clearly the author and the reviewer did not agree.

As I said above: If you are unhappy and want an unpanicy version of
CopyBuffer, it's only 8 lines of go-fmted code away. :)

Best,

Axel

Michael Jones

unread,
Aug 12, 2015, 9:47:00 AM8/12/15
to Axel Wagner, Paul Borman, Matt Harden, Bakul Shah, Uriel Fanelli, golang-nuts
Or… http://play.golang.org/p/G9pbYjoVGU


Michael Jones, CEO • mic...@wearality.com • +1 650 656-6989
Wearality Corporation • 289 S. San Antonio Road • Los Altos, CA 94022

Bakul Shah

unread,
Aug 12, 2015, 11:07:17 AM8/12/15
to Dan Kortschak, Paul Borman, Matt Harden, Uriel Fanelli, golang-nuts
I agree with Paul here. A public package that panics instead of returning an error makes for a fragile system and forces a defensive programming style. Not to mention each defense will be different (as we already saw in this CopyBuffer case). How would you like it if the OS panicked instead of returning an error on a syscall with invalid arguments? Ultimately it must be the user who is in charge. A service must check its inputs and refuse to provide service if given invalid inputs but killing a client seems extreme. Particularly when Go makes it so easy to return errors. Panic *is* the right response if some internal consistency check fails but not for client mistakes.

The fact that so many stdlib packages can panic is not an argument in favor of panicking; it is an argument in favor of critically (re)examining these panics : )

Matt Harden

unread,
Aug 12, 2015, 1:17:43 PM8/12/15
to Michael Jones, Axel Wagner, Paul Borman, Bakul Shah, Uriel Fanelli, golang-nuts
Or let CopyBuffer do the allocation - http://play.golang.org/p/Z53G2wSck6.

Matt Harden

unread,
Aug 12, 2015, 1:27:14 PM8/12/15
to Bakul Shah, Dan Kortschak, Paul Borman, Uriel Fanelli, golang-nuts
On Wed, Aug 12, 2015 at 10:07 AM Bakul Shah <ba...@bitblocks.com> wrote:
I agree with Paul here. A public package that panics instead of returning an error makes for a fragile system and forces a defensive programming style.

How does this force a defensive programming style?
 
Not to mention each defense will be different (as we already saw in this CopyBuffer case). How would you like it if the OS panicked instead of returning an error on a syscall with invalid arguments?

If it killed the currently running process / thread I would be fine with it. A syscall with invalid arguments is a programmer error.
 
Ultimately it must be the user who is in charge. A service must check its inputs and refuse to provide service if given invalid inputs but killing a client seems extreme.

The guideline is to panic on *bugs* - *programmer errors*, not invalid user inputs.
 
Particularly when Go makes it so easy to return errors. Panic *is* the right response if some internal consistency check fails but not for client mistakes.

Panic is also the right response when a clear bug in the code is detected by a library, to maximize the information the programmer has to fix said bug. And when panics are documented well, which is the case here, no programmer can claim he wrote correct code that panicked unexpectedly.

The fact that so many stdlib packages can panic is not an argument in favor of panicking; it is an argument in favor of critically (re)examining these panics : )

Since my programs seldom panic in the absence of programmer error, I conclude that the stdlib authors have done a fantastic job of putting panics in just the right places. :-)

Paul Borman

unread,
Aug 12, 2015, 1:58:28 PM8/12/15
to Matt Harden, Bakul Shah, Dan Kortschak, Uriel Fanelli, golang-nuts
I must take issue with this statement:
 
If it killed the currently running process / thread I would be fine with it. A syscall with invalid arguments is a programmer error.

You may be fine, but you are only one customer of the OS.  A system call to the operating system should always return if at all possible.  The only reason it should ever kill the process is because the system call cannot return and cannot move forward.  The order of strictness is: OS > system library > application library > application.  The Go standard library should be held to standards much higher than application libraries.

Vending 3 different operating systems over a span of 25 years taught me well that it is not okay for the OS to decide some returnable error is worth tearing down the process.  Bugs will be filed and customers will be unhappy if you do.  The same essentially holds true for system libraries (such as the C library).  Calling abort was never done lightly.

Joshua Liebow-Feeser

unread,
Aug 12, 2015, 2:09:05 PM8/12/15
to Matt Harden, Bakul Shah, Dan Kortschak, Paul Borman, Uriel Fanelli, golang-nuts
This has been a very long thread, and I haven't read all of it, so forgive me if I'm repeating what someone else has said.

IMO, the important thing about mistakes of the sort we're discussing is that they a) are introduced at program-writing time and b) can be completely avoided by a correctly-written program. I don't mean to say this to be condescending - obviously everybody writes "incorrect" programs in the sense that they have bugs. The point is that unlike things like disk and network errors, which even a completely flawless program can't avoid, things like passing a 0-length buffer to a function that forbids it is a condition which can be avoided with 100% certainty by a correctly-written program.

I think a good analogy for this sort of situation is a static type system. In Go (and many other languages), many errors are literally impossible. I am quite sure that everyone in this thread programs in Go without the fear that their ints are actually floats, or that their strings are actually mutable, because these are conditions that the compiler guarantees are impossible (so long as you don't use unsafe, anyway). As a result, I'm sure none of you put checks in your program to make sure that your ints are really ints or that your strings are really immutable. You trust that the compiler has guaranteed these statically, and on the off chance that the compiler has a bug, you'd probably rather your program crash so you can debug the compiler than have your program continue along, silently giving incorrect answers. To paraphrase the CTO of Joyent, a person who worries about production-scale reliability for a living, "may all of your failures be hard failures, and may none of your bugs simply silently produce incorrect results."

Now, it'd be nice if Go's type system were able to statically prevent things like passing a 0-length buffer to io.CopyBuffer, but it can't. As a result of this limitation, the idiom has arisen that the standard library uses panics in cases in which an error was made that indicates that the program is incorrect - the sort of errors that a more powerful type system might be able to statically prevent.

So it seems to me that what we have is a philosophical disagreement about what programs should do if they enter incorrect state. The position that the standard library seems to take, and the one that I'm inclined to agree with, is that, at a very fundamental level, it's not even clear what it means for a program to "recover" from an error like using an int as a float or dividing by zero. You have no guarantees that, if you are to continue, your results will even be correct, and unless the point of computation is simply to continue computing things, I would think that not computing at all is a much better option than computing with no idea of whether the answers you're getting are correct. It's kind of like the difference between saying "I don't know the answer to this math problem" and saying "I have no idea if I'm right, but the answer certainly could be 375; let's go with that."

And, to bring it back to the specifics of the discussion at hand, a single goroutine does have the ability to affect the correctness of the program as a whole. Simply restarting the goroutine is a very dangerous approach to take, because it would mean that programmers would be blind to when their programs were in incorrect state. Now, that's not to say that every program works that way. If your program has the property that goroutines really can be handled independently, then that's fine, and you can go ahead and catch the panic and restart the job. But that's not always the case, and Go can't assume it would be. It makes sense for the default choice to be to crash in the face of a panic, and it makes sense for io.CopyBuffer to panic if it's given arguments which are explicitly documented as being incorrect (well, explicitly documented now... that was certainly a useful addition).


--
You received this message because you are subscribed to a topic in the Google Groups "golang-nuts" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/golang-nuts/Qg4CNgC7RPY/unsubscribe.
To unsubscribe from this group and all its topics, send an email to golang-nuts...@googlegroups.com.

Axel Wagner

unread,
Aug 12, 2015, 4:35:55 PM8/12/15
to golang-nuts
'Paul Borman' via golang-nuts <golan...@googlegroups.com> writes:
> A system call to
> the operating system should always return if at all possible. The only
> reason it should *ever* kill the process is because the system call cannot
> return and cannot move forward.

For example, executing a syscall after entering SECCOMP mode will
immediately kill the program (as opposed to returning some bogus value,
*which it could*).

Yes, I know "that's completely different". It's the only example I know
off the top of my head; not a complete expert :)

Bakul Shah

unread,
Aug 12, 2015, 5:32:54 PM8/12/15
to Matt Harden, Dan Kortschak, Paul Borman, Uriel Fanelli, golang-nuts
On Aug 12, 2015, at 10:26 AM, Matt Harden <matt....@gmail.com> wrote:

On Wed, Aug 12, 2015 at 10:07 AM Bakul Shah <ba...@bitblocks.com> wrote:
I agree with Paul here. A public package that panics instead of returning an error makes for a fragile system and forces a defensive programming style.

How does this force a defensive programming style?

Excessive argument checking is defensive programming. You’d have to check the argument in a wrapper function and take some corrective action. A library function has to check its argument, and in that case it may as well return an error instead of panicking.

 
Not to mention each defense will be different (as we already saw in this CopyBuffer case). How would you like it if the OS panicked instead of returning an error on a syscall with invalid arguments?

If it killed the currently running process / thread I would be fine with it. A syscall with invalid arguments is a programmer error.

This is not fine in general. In general this behavior will penalize a user who just happens to use some software that has an obscure bug. And your OS/package will get a reputation for being brittle.

Example: Your favorite editor has an obscure bug and crashes the system. Not only will you lose your data, but debugging this will be very difficult. I ran into a write bug on a setup of the Acme editor+osx+fuse+sshfs mounted filesystems. It took a long time to diagnose. Can you imagine how hard it would have been to catch if the OS crashed on me each time? I would’ve probably just given up on using Acme.

 
Ultimately it must be the user who is in charge. A service must check its inputs and refuse to provide service if given invalid inputs but killing a client seems extreme.

The guideline is to panic on *bugs* - *programmer errors*, not invalid user inputs.

By input I meant input arguments to exported functions. If the args are not valid, clearly there is a bug in the client code so pass back an error and let the client worry about it.

 
Particularly when Go makes it so easy to return errors. Panic *is* the right response if some internal consistency check fails but not for client mistakes.

Panic is also the right response when a clear bug in the code is detected by a library, to maximize the information the programmer has to fix said bug. And when panics are documented well, which is the case here, no programmer can claim he wrote correct code that panicked unexpectedly.

Passing back the right error gives the programmer that information. The guideline should be: if the bug is in your own code and there is no way to proceed further, panic. If the bug is in the caller’s code and there is a way to pass back an error, pass back the error! While all code should be tested exhaustively, it is simply not possible to do thorough checking. Your unit tests check the barest minimum; they don’t check combinations. So runtime errors are virtually impossible to stamp out (except in very constrained environments or where money and schedule are liberal). Resiliency (dealing gracefully with errors) is pretty much a requirement if you have a large number of modules/programs/subsystems developed by a diverse set of people.


The fact that so many stdlib packages can panic is not an argument in favor of panicking; it is an argument in favor of critically (re)examining these panics : )

Since my programs seldom panic in the absence of programmer error, I conclude that the stdlib authors have done a fantastic job of putting panics in just the right places. :-)

For you! 



Axel Wagner

unread,
Aug 12, 2015, 6:00:30 PM8/12/15
to golang-nuts
Bakul Shah <ba...@bitblocks.com> writes:
> Example: Your favorite editor has an obscure bug and crashes the
> system. Not only will you lose your data but debugging this will be
> very difficult. I ran into a write bug on an setup of the Acme
> editor+osx+fuse+sshfs mounted filesystems. It took a long time to
> diagnose. Can you imagine how hard it would have been to catch if the
> OS crashed on me each time?

I think far easier, actually. Program X has a bug, makes a syscall with wrong
arguments, dumps core. Attach gdb to the coredump, dump the
stacktrace. The stacktrace will contain the location of the buggy syscall. Hunt for the bug.

I don't know if I think the system should just crash programs with
buggy arguments. But I do actually think it would make bugs easier to
find and debug :) And as I said before: a bug in the program is
*already* user-unfriendly. Crashing it just makes it easier to debug and
eliminate this unfriendliness.

> By input I meant input arguments to exported functions. If the args
> are not valid, clearly there is a bug in the client code so pass back
> an error and let the client worry about it.

But that, at best, produces useless error messages that can't possibly
be used to debug the problem; at worst it silently ignores the problem,
if the error check is forgotten (not entirely unlikely in a CopyBuffer
call, as you might assume that a write to a buf.Writer never fails).
A crash produces a bug report that you can use to bughunt this instant.

> Passing back the right error gives the programmer that information.

No. It will most likely produce an error message in a log, or a crash
saying "passing zero buffer to CopyBuffer", without a line number
or anything, or the error will be ignored (see above), or it will crash
the program anyway (with the message from before, but again without
telling us *which* CopyBuffer call). In theory, yes, we get all the
information. In *practice*, most of the information will be lost by the
time it reaches the dev, who then has to manually hunt for the correct
line by trying to reproduce the bug.

Compare with a panic, which gives the message too, and a nice stacktrace
to hunt down the problem.

> The guideline should be: if the bug is in your own code and there is
> no way to proceed further, panic. If the bug is in the caller’s code
> and there is a way to pass back an error, pass back the error!

I disagree and I think there can be nothing else but agreement of
disagreement in this argument (for everyone involved). Every argument of
both sides has been rehashed ad infinitum already :)

> Resiliency (dealing gracefully with errors) is pretty much a
> requirement if you have a large number of modules/programs/subsystems
> developed by a diverse set of people.

But resiliency is not ignoring bugs. It's also not letting a hidden bug
create a chain reaction of bugs (e.g. making the buffer bug, forgetting
to check the error (see above), thus thinking a copy was successful, and
overwriting a file with zeros, …).

Handling errors gracefully: of course. Shadowing bugs: absolutely
not. When you have a bug it's a bad idea to continue on. It could
compound with other bugs to destroy data or do all kinds of bad things.

But, again. This has been rehashed ad infinitum. :)

Paul Borman

unread,
Aug 12, 2015, 6:37:25 PM8/12/15
to Axel Wagner, golang-nuts
I think, far easier, actually. Program X has a bug, syscalls with wrong
arguments, dumps core. Attach gdb to coredump, dump
stacktrace. Stacktrace will contain location of buggy syscall. Hunt for bug.

I believe this is the crux of it.  There is more than one target audience.  If you are a developer and only write programs for your own internal consumption by other developers (virtually all Google servers fall into this category) then the stack trace, panic, whatever doesn't seem so bad.  If you write programs for external consumption, the resulting mess of a panic and a few hundred thousand lines of stack traces is probably not what you want to present to the customer.  As an OS vendor, most of our customers were not developers who would appreciate what Go produces when an application panics.  This is why productized (different than production) C based applications would often trap all signals, allowing the application to present the error to the user, rather than just "bus error: core dumped".

Joshua Liebow-Feeser

unread,
Aug 12, 2015, 6:48:10 PM8/12/15
to Paul Borman, Axel Wagner, golang-nuts
Again, though, panics can be caught, and probably should if you really want your application to keep going. The issue in contention isn't whether panics should be caught - if you want to catch them, go ahead and catch them. The issue is whether the panic should happen in the first place.

Given that, I think it's useful to consider the difference in programming style between panicking and returning errors (since they are functionally equivalent, even if the former requires uglier code). The difference as I see it is that, besides being more serious, panics also naturally lend themselves to being caught only in certain places, while errors lend themselves to being handled immediately upon the return of the function returning the error. This, I think, is compatible with the argument you're making. If you want one goroutine's failure not to affect the rest of the program, you can catch the panic, but it probably makes sense to recover from it at the top level, rather than having a bunch of different defer-recovers all throughout your code. After all, you don't care where or why the panic happened - it's already too late to recover the correctness of this particular goroutine if you have a bug that caused a panic - you only care that it doesn't bring down your whole application. Panics and recovers seem well suited to this.

Errors, on the other hand, usually encourage finer-grained handling, and that's usually what you want when you get an error. Writing to the temp file failed? Maybe make another temp file. Maybe try without a temp file. Those are errors that you understand well, and you know how they affect the correctness of your overall program. Division by zero? Your program probably doesn't really have the ability to isolate the source of the problem and recover from it, so it makes sense to use a less granular approach.


Axel Wagner

unread,
Aug 12, 2015, 6:55:59 PM8/12/15
to Paul Borman, golang-nuts
Paul Borman <bor...@google.com> writes:
> I believe this is the crux of it. There is more than one target audience.
> If you are a developer and only write programs for your own internal
> consumption by other developers (virtually all Google servers fall into
> this category) then the stack trace, panic, whatever doesn't seem so bad.
> If you write programs for *external* consumption, the resulting mess of a
> panic and a few hundred thousand lines of stack traces is probably not what
> you want to present to the customer.

And a cryptic, to them nonsensical, error message is better? And, don't
forget, you can a) recover panics and do whatever you want with them, if
need be in every goroutine to catch all crashing panics, and b) without
any problem wrap your software in e.g. a shell script that slurps a
potential stacktrace and sends it to the developer. If it is important
to you to get debuggability *and* a nice UX.

> As an OS vendor, most of our
> customers were *not* developers who would appreciate what Go produces when
> an application panics. This is why productized (different than production)
> C based applications would often trap all signals, allowing the application
> to present the error to the user, rather than just "bus error: core
> dumped".

It seems trivial to me to wrap that binary (with a fork(), if need be,
otherwise with a shell script) and, when it dies, extract the signal from
the error code and, if it's due to an "illegal argument" signal, present
a nice error message and the option to send a bug report including the
coredump to the developer. And all of that while capturing all the
state necessary to immediately debug the program *and* preserving
safety from further corruption by accumulated bugs.

User experience actually increases together with debuggability by
crashing a program, as opposed to letting it run wild.

Paul Borman

unread,
Aug 12, 2015, 7:14:57 PM8/12/15
to Axel Wagner, golang-nuts
You cannot recover panics that happen in other goroutines (since this discussion is no longer about just io.CopyBuffer but the philosophy of panics).

Someone making nonsensical error messages is a problem, returning errors is not.

Anyhow, my experience has taught me that you abort/panic as a last resort if you are producing a product for sale and/or wide external use.  Others may have had different experiences.  I am glad I am not currently writing software for sale; I can be much more lazy when my customer is just other engineers.

I doubt I will move over to the "panics are cool if I think the calling program is broken."  Sounds like others will not move out of it.  Nothing is perfect, and perfection is subjective.

Joshua Liebow-Feeser

unread,
Aug 12, 2015, 7:18:11 PM8/12/15
to Paul Borman, Axel Wagner, golang-nuts
On Wed, Aug 12, 2015 at 4:14 PM, 'Paul Borman' via golang-nuts <golan...@googlegroups.com> wrote:
You cannot recover panics that happen in other goroutines (since this discussion is no longer about just io.CopyBuffer but the philosophy of panics).

Someone making nonsensical error messages is a problem, returning errors is not.

Anyhow, my experience has taught me that you abort/panic as last resort if you are producing a product for sale and/or wide external use.  Others may have had different experiences.  I am glad I am not currently writing software for sale, I can be much more lazy when my customer is just other engineers.

I doubt I will move over to the "panics are cool if I think the calling program is broken."  Sounds like others will not move out of it.  Nothing is perfect, and perfection is subjective.
Serious question - do you not consider calling a function using arguments that are explicitly documented as being incorrect a form of being broken? I'm not asking sarcastically - I think that may be a difference of opinion, but it's an interesting discussion to have if we do in fact disagree about it.
 


Paul Borman

unread,
Aug 12, 2015, 7:31:21 PM8/12/15
to Joshua Liebow-Feeser, Axel Wagner, golang-nuts
Serious question - do you not consider calling a function using arguments that are explicitly documented as being incorrect a form of being broken? I'm not asking sarcastically - I think that may be a difference of opinion, but it's an interesting discussion to have if we do in fact disagree about it.

To me, that is not the issue at hand.  The issue is that there is a panic there at all, documented or not.  There is no reason for the either the panic or the comment in the documentation that tries to justify the panic.  Using a panic for this is, in my personal opinion, a bad use of panic.  If a function documents "do not send 42 as parameter x" then callers should not pass 42 as x.  Of course, that doesn't mean the function is well designed because it documented it.  I hope to not see a proliferation of calls to panic in standard library code as I believe it will not benefit the long term usefulness and adoption rate of Go outside of the current user base.  Just my opinion.

Andy Maloney

unread,
Aug 12, 2015, 7:34:43 PM8/12/15
to golang-nuts, axel.wa...@googlemail.com
only ever generate a non-recovered panic in a public library if memory safety has been compromised

I have to agree 100% with Paul's take on it.  Not everyone is using Go for servers or command-line tools. One use case I have not seen in this thread is the one I'm looking at: using Go to create a C archive that I can link into a C++ application.  If Go is to be useful in this case, then anything in the standard lib that can return an error instead of panicking should do so.

Axel Wagner

unread,
Aug 12, 2015, 7:40:22 PM8/12/15
to Andy Maloney, golang-nuts
Andy Maloney <asma...@gmail.com> writes:
> I have to agree 100% with Paul's take on it. Not everyone is using Go for
> servers or command-line tools. One use case I have not seen in this thread
> is the one I'm looking at: using Go to create a C archive that I can link
> into a C++ application. If Go is to be useful in this case, then anything
> in the standard lib that can return an error instead of panicking should do
> so.

One Question: why? i.e. why do the arguments brought up in this thread
for programs written in go do not apply to this usecase in the exact
same way? :)

Andy Maloney

unread,
Aug 12, 2015, 7:45:48 PM8/12/15
to golang-nuts, asma...@gmail.com
Everything I've read in this (long) thread seems to focus on developer apps or servers and recommends "just record the panic and restart".   That's not acceptable/possible for an application in use by "mere mortals".  Imagine if the C std lib just aborted instead of returning errors?  That would make it effectively useless for writing consumer-facing applications.

Joshua Liebow-Feeser

unread,
Aug 12, 2015, 7:46:52 PM8/12/15
to Paul Borman, Axel Wagner, golang-nuts
I think our discussion might be slightly hamstrung by the fact that we're discussing a call that already returns an error. Consider, instead, a function which has no error return, like sync.WaitGroup.Done(). That function just atomically decrements the internal counter of outstanding goroutines. If the counter goes below zero, Done panics. Would you argue that it should instead add an error return value, and require a call that would otherwise just be

wg.Done()

to turn into

err := wg.Done()
if err != nil {
...
}

instead? Keep in mind that every function which could encounter any sort of bad state as a result of programmer error would have to take on an equally cumbersome invocation, and you'd have error checking code falling out of your ears.
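To make the hypothetical concrete, here is a sketch of an invented errWaitGroup type (nothing in sync works this way): an error-returning Done pushes a runtime check onto every call site, yet still only catches the bug at runtime, exactly like the panic would.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errWaitGroup is a hypothetical, simplified counter (no Wait method)
// illustrating what an error-returning Done would force on callers.
// The real sync.WaitGroup simply panics on a negative counter.
type errWaitGroup struct {
	mu sync.Mutex
	n  int
}

func (wg *errWaitGroup) Add(delta int) {
	wg.mu.Lock()
	wg.n += delta
	wg.mu.Unlock()
}

func (wg *errWaitGroup) Done() error {
	wg.mu.Lock()
	defer wg.mu.Unlock()
	wg.n--
	if wg.n < 0 {
		return errors.New("errWaitGroup: negative counter")
	}
	return nil
}

func main() {
	var wg errWaitGroup
	wg.Add(1)
	if err := wg.Done(); err != nil { // boilerplate at every call site
		fmt.Println("unexpected:", err)
	}
	if err := wg.Done(); err != nil { // the bug still only shows up at runtime
		fmt.Println("bug:", err)
	}
}
```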

Joshua Liebow-Feeser

unread,
Aug 12, 2015, 7:48:20 PM8/12/15
to Andy Maloney, golang-nuts
On Wed, Aug 12, 2015 at 4:45 PM, Andy Maloney <asma...@gmail.com> wrote:
Everything I've read in this (long) thread seems to focus on developer apps or servers and recommends "just record the panic and restart".   That's not acceptable/possible for an application in use by "mere mortals".  Imagine if the C std lib just aborted instead of returning errors?  That would make it effectively useless for writing consumer-facing applications.
But in practice, what does a program do if it gets EINVAL? It usually either 1) ignores it, in which case the program may be returning invalid results, which is very bad; or 2) logs the error and quits, which is exactly how panics behave by default.
 


Andy Maloney

unread,
Aug 12, 2015, 8:07:37 PM8/12/15
to golang-nuts, bor...@google.com, axel.wa...@googlemail.com
Yes, this is a valid point.  We already have a lot of error checking throughout, and this would add to it.  However, if devs choose to ignore the errors (as we see a lot with things like (*File).Close()) then that's their decision.

As an application developer I would still expect/prefer that a standard library only abort/panic as a last resort.  If I were writing command line tools or servers maybe I would see it differently.

Joshua Liebow-Feeser

unread,
Aug 12, 2015, 8:30:36 PM8/12/15
to Andy Maloney, golang-nuts, Paul Borman, Axel Wagner
Oh I certainly agree about the last resort part. When possible, you should let the type system handle it for you, since it can make real static guarantees. This is the philosophy behind the idea that, whenever possible, the zero value of an exported type should be a valid instance of that type. For example, it's good that you can just declare and use a sync.Mutex or bytes.Buffer without explicitly initializing them because it prevents certain errors.

fatdo...@gmail.com

unread,
Aug 12, 2015, 8:33:27 PM8/12/15
to golang-nuts
It is about options: just return an error. The panic is inconsistent with the rest of the package; most functions handle inappropriately sized buffers with errors, not panics.

Axel Wagner

unread,
Aug 12, 2015, 8:37:51 PM8/12/15
to Andy Maloney, golang-nuts, asma...@gmail.com

Andy Maloney <asma...@gmail.com> writes:
> Everything I've read in this (long) thread seems to focus on developer apps
> or servers and recommends "just record the panic and restart". That's not
> acceptable/possible for an application in use by "mere mortals". Imagine
> if the C std lib just aborted instead of returning errors? That would make
> it effectively useless for writing consumer-facing applications.

Why? I know a lot of "applications" that wrap their main binary in a
script to do *exactly* that: catch any crashes of the main binary and
provide an option to send a core dump to the developer.
Why is this approach not viable? I just don't see it. It increases
usability; there is literally no bug that could interfere with your
ability to send stack traces to the developer and display an arbitrarily
nice message to the user. This is not a fancy technique; it is very
basic and widely used (e.g. Eclipse uses it, afaik, and QtCreator does
something like it too).

Joshua Liebow-Feeser

unread,
Aug 12, 2015, 8:42:40 PM8/12/15
to Axel Wagner, Andy Maloney, golang-nuts
"arbitrarily nice." Computer scientists are a weird bunch, aren't we?



Konstantin Shaposhnikov

unread,
Aug 12, 2015, 8:52:57 PM8/12/15
to golang-nuts, asma...@gmail.com
Eclipse is actually a very good example of why keeping on running after encountering a programmer's error is not always a good idea.

It won't crash because of a NullPointerException (clearly a programmer's error); it will just log it and keep running. But very often it will become unusable after that, and the user will have to restart it anyway. For example, it can start showing a modal dialog in an infinite loop. Even worse, on rare occasions it can become unusable in very subtle ways. For example, it can let you believe that you saved your changes to a file when you actually haven't.

I also believe that hiding such errors in the error log makes fixing them slower.

low...@gmx.de

unread,
Aug 12, 2015, 8:57:40 PM8/12/15
to golang-nuts, uriel....@gmail.com
Here I am again, sorry for changing email, the old one had some problems.

Since you wanted some technical reasons, I can give you some. But first, please stop insulting. It is very, very, very annoying
when people throw out sentences like "if any of these critical services would be affected by a single process crashing." Saying that
is like assuming people are stupid or incompetent. The problem is not that they are affected: the problem is how we made them not be affected.

This sounds very insulting and irritating, especially coming from a person whom I have never met before, and who perhaps has no idea of my daily job.

Please just stop doing that. Our services are stable, there are many reasons why, and one of them is that we don't allow I/O voids to crash platforms which could need days to be restarted.

Now, to get a bit more technical, here are the reasons I would AVOID letting an io.Reader crash the whole program. Some examples:

1) Network failover events on layer 2. You have to update a switch or a load balancer, you need to restart it for some reason, so you fail over to its copy. What happens to open TCP connections? The second device takes over the MAC address, but is (often: we also have some that act differently) unaware of the old sessions. Now, from the point of view of the application, the socket is still there. It is simply silent. Until a TCP keepalive is sent. The second device, unaware of the socket, will simply answer back with RST. Brutal close of connection. And there you have your io.Reader blowing up.

Now, imagine this switch was serving some VLANs, and maybe a few dozen applications. Maybe a web server takes some minutes to restart, but some platforms, like HLR, SMSC, MSC and others, require some expertise, some vendor support, and so on. If they crashed just because of this, it would mean that every time we needed to do a failover we would have to manage a "cutover night" with 10-15 vendors. This is unacceptable.

What happens today is that we FIRST switch the traffic to another building/datacenter, which creates a mess of "zero-bit I/O" on a ton of machines. Those machines experience a lot of crashes on threads, but they keep operating. We fail over the device, and then we balance the traffic again.

All of this is "cheap" because all the platforms survive the failover of the traffic to the second site. If we had to manage a full restart of a telco because of that, it could take days, and need vendor support from a host of companies. Still feasible, BUT EXPENSIVE.

This is the first reason I DON'T suggest making io.Reader capable of crashing a whole platform.

2) SAN. The typical structure of a SAN is duplicated within its own fabric, which means the switches are duplicated too. So when a switch fails over, the path is always changed, automatically (assuming multipath SCSI or the like is in place). The storage itself usually has enough cache to queue the I/O operations, and modern storage systems like Lustre and others can keep metadata aligned in an efficient way. What the end machines will see is "some seconds of I/O retention". This can end in a timeout, which will end in some I/O operation raising an exception.

Again, this happens. When it happens, we expect the single thread to crash (and dump), but the service keeps operating. Sure, a single instance crashing is not a problem, but, since a failover on a switch is unpredictable, we cannot be sure how many instances will be affected.

So again, I don't suggest making this specific call, and similar ones, able to crash a whole platform, in situations where restarting a platform is a very expensive process.

3) More specific to telco. We use a flavour of IP which is SCTP/IP. On top of it we put part of the SS7 signaling, and we call this "SIGTRAN". The main issue with SIGTRAN, which emulates an old SS7 "link", is that it has no sockets or connections like TCP/IP; it has "associations". Each association works like a physical cable. Of course they are replicated and redundant. Sure.

But, since SIGTRAN has a set of features for queuing, after a change of BGP announcement on the exterior network, or an OSPF/EIGRP rebuild of the topology on the interior network, it happens that these associations are dropped and, since they have a heartbeat, recreated. In such a case, the thread itself has a queue in memory (usually due to the threshold it must maintain). To avoid losing the signaling in the memory queue, it is crucial that *even the thread stays alive*, at least until the queue is processed or dumped to disk.

The platform I manage has, more or less, ~250,000 items in memory queues right now (a very common amount). If any interior/exterior change of topology meant losing 250,000 operations, it would be a mess, and very expensive to restart, e.g., a signaling gateway without vendor support.

Again, we have particular requirements. The government wants us to guarantee emergency numbers, to guarantee SLAs to some crucial services, and so on. But we are not the only ones: in the gas/oil field, they not only react to existing issues, they put probes everywhere in the pipes and react to "patterns of risk". Similar issues exist in finance, health care and chemistry.

In general, a data center has lots of failover events 24/7, and many of them can disturb I/O. Crashing the applications for that reason is not considered acceptable. This is why compilers have enforced policies, standards like MISRA, and so on.

Not all the planet works on PHP, and industry+infrastructure+government+finance is not exactly "a niche". Google's 66 billion could seem a big number, until you compare it to Deutsche Bank's 1600 billion, just for an example.

Please consider the idea *that IT is used everywhere*, and that regardless of the fact that Google and Facebook and the fancy web economy are huge, compared to the rest of the economy, where we have programs too, they are just a niche of IT.

So, my humble suggestion is to make the Go compiler *enforce as many good practices as possible.*

Having I/O functions crash a whole platform, and asking the developer to take care of it instead of enforcing safety, could cause Go to be seen as the new PHP. Which means it could be very popular in the market niche of web applications, but it will not penetrate telco, automotive, gas & oil, utilities, power, and so on.

In my opinion, to be acceptable as an alternative to C++, realtime Java, Ada (yes, still used), Erlang, and others (you might wonder how much COBOL is still used in finance), Go should enforce as many good practices as possible, without leaving the programmer the job of preventing race conditions or I/O failures.

Hope it is clearer now.


Uriel







On Tuesday, August 11, 2015 at 2:28:23 AM UTC+2, Andrew Gerrand wrote:
On 11 August 2015 at 04:41, Uriel Fanelli <uriel....@gmail.com> wrote:
But, still, until you need emergency calls, power, water, healthcare, workplace security, please consider there are boring people

This may sound facetious, and I apologize if it comes across that way, but I would be very disturbed if any of these critical services would be affected by a single process crashing.
 
Our applications written in Java are crashing thread by thread, and the server not only keeps running: it keeps *operating*.
The same for the one written in C++ and ADA.  We are testing some written in erlang.

In a shared memory environment when a thread (or goroutine) "crashes" for whatever reason, it is impossible to make any assurances about the correctness or reliability of the rest of the process. That's why a panicking goroutine can take down an entire Go process. This was a deliberate design decision.
 
Golang looks very promising, and my suggestion was only to implement a mechanism of panicking which
only affects one thread, and not crashing the whole server.

Such a mechanism exists, and it's called "recover". So if you are careful you can write software in Go that can recover from panics. But for the reason I gave above, you should be very judicious in using this technique.
 
And maybe you are not even entitled to teach operations to me, btw.

This kind of defensive statement adds nothing whatsoever to the discussion. Please stick to technical concepts rather than arguing from authority. Although if you do want to think in those terms, please consider Google's experience in building and operating production systems.

Andrew

Konstantin Shaposhnikov

unread,
Aug 12, 2015, 9:31:48 PM8/12/15
to golang-nuts
I think you are ignoring the fact that io.Reader doesn't panic on I/O errors.

CopyBuffer panics when given invalid arguments. How can any of the situations you described result in a zero-length buffer being passed as an argument to CopyBuffer?

Axel Wagner

unread,
Aug 13, 2015, 4:40:55 AM8/13/15
to low...@gmx.de, golang-nuts, uriel....@gmail.com
low...@gmx.de writes:
> Since you wanted some tecnical reasons, I can tell you some. But before,
> please stop insulting. It is very, very, very annoying
> when people shots sentences like "if any of these critical services would
> be affected by a single process crashing.". Because doing that
> is like you assume people is stupid or incompetent. The problem is not that
> they are affected: the problem is how we made them not being affected.
>
> This sounds very insulting, irritating, expecially from a person which I
> never met before, and perhaps have no idea of my daily job.

I find it very weird to read this from someone who is all the time
insinuating that everyone else giving them advice is not as skilled in
building distributed systems as them. e.g. from this very E-Mail:

> Not all the planet works on php, and
> industry+infrastructure+government+finance is not exactly "a niche". 66
> Billions of Google could seem a big number, until you don't compare to 1600
> Billions of Deutsche Bank, just for an example.

> Please consider the idea *that IT is used everywhere*, and regardless the
> fact Google and Facebook and the fancy web economy are huge, compared to
> the rest of the economy, where we have programs too, is just a niche of IT.

> To have I/O functions to crash a whole platform and ask the developer to
> have care of it, instead of enforcing could turn golang to be seen as the
> new PHP. Which means it could be very popular in the market niche of web
> applications, but it will not penetrate
> telco,automotive,gas&oil,utilities,power,ad so on.

This is the second, third and fourth time you have implied that people just have no
idea of what IT really is, are blind to any usage except their own,
and don't know what it means to maintain a product. It is annoying. Please assume
that everyone here is at least as skilled and educated as you are. We do
the same (even if you read our responses differently).


In regards to the rest of the e-mail: you explain in great detail why
you can't have programs crash on you. You don't explain:
a) why this particular panic is so much worse than *any other* panic due to a bug
(yes, io.Readers panic all the time due to bugs, for example
out-of-bounds accesses of slices. It happens; it's basically an everyday
bug),
b) why you are using Go in the first place, if this policy of
"crash when detecting programmer bugs" is so impossible to tolerate,
c) why silently ignoring a failed Copy, and letting code run on that
assumed it finished, is better and can't lead to even more catastrophic
outages (that is a much more common bug than passing a zero-sized
buffer: e.g. there is no need to error-check a Copy from a bytes.Buffer
to a bytes.Buffer, as neither can ever fail, so programmers might
assume they can ignore the returned err from Copy),
d) how any of your described cases can lead to a zero buffer in io.CopyBuffer,
e) why you can't just wrap io.CopyBuffer in a non-panicking function and
use that instead.

You talked at great length about the importance of your applications and
the absence of crashes, and I don't think anyone really doubted
that. But your arguments still somewhat lack consistency; you don't say
why you are arguing *about this panic in particular*, instead of any of
the other panics. As I illustrated elsewhere, about half of all stdlib
packages panic at *some* point, and you don't seem to be concerned about
them. I don't understand why that is, if every panic leads to these
catastrophic failures.


> So, my humble suggestion is to make the go compiler to *enforce as many
> good practices as possible.*

Might I suggest using Rust instead? Rust gives you much stronger
guarantees from the compiler. Go is a language of tradeoffs in regard to
strict typing, where a lot of compile-time safeties are skipped in favor
of productivity. For example, you can have data races in Go, but not in
Rust. Those are even more insidious than panics, as they compromise
memory safety.

> In my opinion, to be acceptable as an alternative of C++ , realtime Java,
> ADA (yes, still used), erlang , and others (you could wonder how much
> cobol is still used in finance) , golang should enforce as many good
> practice as possible, without leaving to the programmer the job of
> preventing race conditions or I/O failure.

That is your opinion and you are entitled to it. But it is not what go
is. No one blames you or thinks less of you for disliking it for the
choices it made.

Best,

Axel

Loweel

unread,
Aug 13, 2015, 2:51:54 PM8/13/15
to Konstantin Shaposhnikov, golan...@googlegroups.com


On 08/13/2015 03:44 PM, Konstantin Shaposhnikov wrote:
> It looks like you accidentally hit Reply instead of Reply All.

Yes, sorry for that. I am putting the list in the loop again. I'll keep the first answer quoted.

> I think you are misunderstanding what a buffer is used for in
> io.CopyBuffer. Have a look at the docs:
> https://tip.golang.org/pkg/io/#CopyBuffer

We are on the same page. Basically this is copying under the assumption
that the buffer exists and is full of things.

Now, I see two big situations which can cause this:

  1. The code is wrong. Here there is nothing you can do; bad programmers will produce bad code.
  2. The code is correct, but the assumption that the buffer was "somehow" filled is wrong.
The second option is the one I am stressing more: I see lots of situations where the assumption
of "atomic I/O" or "copy on write" can fail. When a SAN has storage where metadata and data
are decoupled onto dedicated hardware, for example, you can see the filesystem, open a file,
allocate a seek pointer, and when you read you get nothing from the "data" disk pool. Or, deleting
a huge file with a seek pointer allocated, e.g. at 4GB, you see the file with size 1M but you can still
try to "read".

Also, the network can be very weird.

Which was my point.






>
> I do not agree that you can build a robust system by hiding
> programmer's errors (which is what panic is used for in Go).
>
> As demonstrated by the Eclipse example that I mentioned earlier,
> programmer's errors can lead to unexpected or even undesirable
> behaviour. It is better to design a system that is very quick to restart
> and restart it in case it enters a bad state.

This sounds very good, but honestly, I'd like to see an Oracle RAC which you can
"quickly restart".  And it is just an example. I would like to have a world
where I can restart z/OS in seconds. :-) :-)

Consider that a RAC is usually distributed across two buildings and the storage is replicated.
Now, in case of failover, which is a transparent event (in a perfect world), do you want the database to crash?

Fine. You just lose any operation that was not committed, plus any change in the SGA memory.

Now, we just need:

24/36 hrs to do the integrity check of all the data, while 2 system DBAs (usually Oracle-certified)
restart the whole cluster.

This means at least 2 FTEs on the RAC. Plus the loss of revenue from any platform connected to
it, which cannot access data anymore.

And I won't mention my boss screaming "who pays?".

Still, I keep my opinion: platforms should be resilient.

Crashing a platform just to show there is a problem is like killing your friends to prove they have no combat skills.
OK, you prove it, but the price is not easy to pay.


> Think about software that controls engine acceleration in a car.


I suspect you picked the wrong example. :-) :-) Car software usually requires MISRA compilers,
plus a set of EAL practices, which are far stronger than the ones I am suggesting. In the end, for example,
I am fine if you crash the program during the pre-production phase, or the QA phase,
to magnify problems.

What I am asking is to, somehow, disable this for production-ready software.

If you read my first comment, I was suggesting having a flag to disable "hard panics".


> I am
> sure you do not want it to accidentally disable the brakes because of some
> unexpected behaviour.

Nor do I want it to crash because some developer wants to do QA testing after the product
is in production. Sorry, but in Germany we have highways with no speed limit, and having a
truck reboot in front of me is not my wet dream, even if the core dump would be very interesting for Bosch. :-)

Just to clarify: I am OK if, during the testing phase, you want to play hard and make every problem evident.

This is why, at the end of my first message, I suggested a flag which can enable or disable panicking:
if you want to crash the program during the testing phase, to make it evident that the code has a problem, I am with you.

But then, after the due tests, I would prefer it if you compiled with something like "-safe_panic" to disable this feature.

Also, I don't trust vendors that much. Don't get me wrong, the developers are often fine. But their companies want to shorten timelines.
And shorten them again. And they have NO TIME to follow best practices. So you are right, a good programmer could check the length of
the buf slice BEFORE invoking this copy. But they never will: their "project manager" will push and push and push, and in the end, despite
the best of intentions, the code will be sub-optimal.

The only compiler I remember preventing that was Ada 95 (and I guess Ada is still like that), because it forced you to
provide data templates to access any data. You can do this with interfaces in Go, but it is not enforced.

This is why I still think it is much better if the compiler enforces it: the programmer cannot say "no" to the company,
while a computer can. Let's make the computer do the job.





>
>
>
> On 13 August 2015 at 16:55,  <low...@gmx.de> wrote:

Because in such conditions you have (or you risk having, depending on your implementation) a buffer which you suppose has something inside, when it does not.

When you use I/O you make assumptions, e.g. "since the file is open and the seek pointer is set, we can read". It is the OS's problem to make this happen.
By mentioning SANs, I mentioned the most common case where the developer can assume a read operation will be able to fill a buffer, while in fact it cannot.

When you perform an I/O operation you assume it is atomic, for example, but this is not the case with a SAN. Of course, it is the job of the SAN to abstract this and present it to you as atomic. So you could get the size of the file and a seek pointer from the -some vendor here- metadata pool, which is hosted on some SSD RAID, and read e.g. 1024 bytes. Unfortunately, the data pool of the storage, which is a separate entity, could be unable to serve the actual read, because it is being failed over. So your assumption of atomicity could be wrong, and there is very little the kernel of your operating system can do about it.

These problems are mitigated by checksummed filesystems like ZFS on Solaris and others, but not all filesystems are like ZFS. And this is only a mitigation.

So my idea is to enforce as much robustness as possible in the libraries, because even the so-called devops cannot take care of every specific infrastructure issue. The fact that the underlying infrastructure can, due to its inherent complexity, break any assumption (the TCP state machine, I/O being atomic, and so on) makes me think of MITIGATING every risk.

Again, it is not that you can predict what can happen; it is that you don't want to learn what can go wrong *in your face*. In 20+ years of operations I have seen a ton of developers saying "but in theory this should work like that". And then you see it get fixed by patching the OS two months later, you see how peculiar branded hardware behaves, and so on.

As I keep repeating, it is about avoiding patterns of risk and enforcing good practices. The price of not doing so is hard to measure, until it is too late.