All the standard programming texts tell us how good it is to use assertions
(I am thinking of books like OOSC(II)) but the mechanism they describe
is like cassert every time, i.e. turned off in production. I have
noticed that it is hardly ever used on any C++ projects I have seen.
It doesn't seem to get much use in C libraries either but does seem to
have a slightly higher usage there.
Does this mean C++ programmers don't like such a C-like mechanism? Or
that they don't like the fact that the assertions are disabled in
production? Or is there some other reason?
Regards,
Andrew Marlow
--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]
I have faced this situation several times in the past. In the end I decided
that if I could not bring myself to turn off the Assertions then I was
implying that the code was not robust. I don't tend to use assertions very
often these days and tend to throw exceptions on bad inputs, in cases where
the scope for error is high and the scope for a "catastrophic outcome" is
real (e.g. a crash, as opposed to simply computing an invalid result).
What is your definition of a programming contract, in terms of an API? Some
people think every function should check all the arguments and throw
graceful exceptions if you violate the contract. This acts like a Guard at
the door.
Others think that violation of the contract leads to violation in terms of
the operational behaviour (e.g. a crash or whatever) and if you hadn't
violated the contract you wouldn't get the crash. This is the Garbage in,
Garbage Out principle.
The practical upshot is that one viewpoint puts the onus on the API writer
to check their arguments, and the other puts somewhat of an onus on the
caller to check their arguments.
I don't think a blanket statement works either way. High-performance code
cannot afford to check every argument on every call, because doing so
would be detrimental to performance.
From my experience over the years, programming with some
basic properties like structuring, documenting, using
design patterns, recognizing anti-patterns and using
good error handling is a matter of discipline!
It's a big topic!
Most developers do not have that discipline.
In addition, company management forces quick
solutions, which contributes to the lack of software quality.
That's a big topic too!
The assert mechanism is important! No doubt!
In production code you have to be aware of situations
which can fail where the code still shouldn't crash!
The next example easily demonstrates the problem with
production code:
---
bool check(Data* data)
{
    assert(data);            // no-op in production when NDEBUG is defined
    return data->isValid();  // dereferences a null pointer in production
}
---
A little better:
---
bool check(Data* data)
{
    assert(data);  // catches the bug in debug builds
    if (data)      // still guards against null in production
        return data->isValid();
    return false;
}
---
I hope it helps a little.
I think you [possibly] misunderstand the fundamental rule
for assertion usage. That is,
" Use assertions for conditions that should *never* occur, and
use error handling code for conditions you expect to occur. "
If you always want the checks on, that means the conditions you
checked are expected to occur during run-time. Therefore, yes,
it is not necessary to use assertion here, just handle them and
everything will be fine. Defining the possible errors and categorizing
them for assertion/error-handling is the first thing you should do
for your defensive code.
[ "how good it is to use assertions" are snipped . ]
And the macro disable issue:
For the standard assert macro, it is specified in the C standard
that assert shall refer to NDEBUG, and C++ borrows this
rule from C. There is no like/dislike issue at all, since the
underlying assert macros are the same. If you don't like it,
write your own assertion handlers.
BTW, it is not unusual that both assertion and error handling
are used together to make the code safer. That is, although
only one of them should be sufficient, instead of using
standard assert, you can enable and handle your own
assertions during run time (of course you need to install your
special assertion handlers), well, let's say for mission-critical
code.
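For illustration, a minimal sketch of such a runtime-enabled assertion with an
installable handler might look like the following (MY_ASSERT and
set_assert_handler are invented names, not a standard facility):
---
#include <cstdio>
#include <cstdlib>

// Handler type: called with the failed expression, file and line.
typedef void (*assert_handler)(const char* expr, const char* file, int line);

static void default_handler(const char* expr, const char* file, int line)
{
    std::fprintf(stderr, "assertion '%s' failed at %s:%d\n", expr, file, line);
    std::abort();
}

static assert_handler current_handler = default_handler;

// Install a project-specific handler (log, throw, save work, ...).
void set_assert_handler(assert_handler h) { current_handler = h; }

// Always-on assertion: unlike <cassert> it ignores NDEBUG.
#define MY_ASSERT(expr) \
    ((expr) ? (void)0 : current_handler(#expr, __FILE__, __LINE__))
---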
Jiang
Incorrect. It is common in my experience to leave assertions enabled
all throughout a program's development, sometimes even when it ships
(this is more common when a product is shipped internally, i.e. not for
retail).
In any case, NDEBUG disables the assert macro: it does not enable it
as you said.
> This is one reason I don't
> use the assert macro. I always throw an exception that means a fatal
> programming error has occurred. Am I the only one that does this?
> Surely not. My reasoning is that I always want the checks on, just in
> case. This means it is something I tend to do sparingly, after all,
> loads of them might well create a performance issue.
This is bad programming practice. An exception should be thrown when
there is a reasonable chance that it will be handled, or to
intentionally propagate failure to the highest level (usually a top-
level handler of some kind). This doesn't characterize many
situations: for example, validating pre-conditions of a method or
function is much better suited to an assert. It's local to the point
of failure, potentially allows continuation (for debugging), and has a
more concise syntax than a condition test + throw statement. Note that
it's also common to use custom assertion macros that throw exceptions
on failure (this is often used in Unit Testing frameworks): the two
are not mutually exclusive.
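As a rough sketch of that last point, a throwing check macro (CHECK_THROW is an
invented name, similar in spirit to the assertion macros in some unit-testing
frameworks) might look something like this:
---
#include <sstream>
#include <stdexcept>

#define CHECK_THROW(expr)                                         \
    do {                                                          \
        if (!(expr)) {                                            \
            std::ostringstream msg;                               \
            msg << "check '" << #expr << "' failed at "           \
                << __FILE__ << ":" << __LINE__;                   \
            throw std::logic_error(msg.str());                    \
        }                                                         \
    } while (0)
---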
> All the standard programming texts tell us how good it is to use assertions
> (I am thinking of books like OOSC(II)) but the mechanism they describe
> is like cassert every time, i.e. turned off in production. I have
> noticed that it is hardly ever used on any C++ projects I have seen.
> It doesn't seem to get much use in C libraries either but does seem to
> have a slightly higher usage there.
Assertions are not used enough because overall people code badly. If
you're throwing exceptions around instead of wrapping your simple pre-
and post-conditions into a simple assertion macro (whether that macro
throws an exception or not, and regardless of the conditions under
which it's enabled or not), then I submit that you're also coding
badly.
> Does this mean C++ programmers don't like such a C-like mechanism? Or
> that they don't like the fact that the assertions are disabled in
> production? Or is there some other reason?
It's tough to answer what all or most or many or any C++ programmers
like or don't like. A problem that emerges with C++'s flexibility is
that it doesn't enforce much rigor. For example, if you contrast C++
with, say, Eiffel, you will notice that Eiffel places more emphasis on
logical rigor by simplifying Design By Contract.
I suggest that you learn and apply best practices in the language that
you work in, and learn other languages and their best practices to
round out your knowledge. It's good to look at other people's code,
but it's better to look at the very best examples - you will probably
notice a much higher degree of assertion checking in such code.
Personally I have nothing against keeping the assertions in even in the
production code. After all, if your code works correctly you will never
hit an assertion, but if there are bugs the assertions can be very useful
when the customer reports the error.
The advantage of assertions is that they give you the file and line
where the error occurred for free. You can get the same information with
exceptions, and if you put some effort into it you can probably get a
stack trace also.
As for performance, I would not worry about it, after all you should
only assert things that are not supposed to happen and which cannot
gracefully be handled. There should not be too many of these.
--
Erik Wikström
Sorry, but that is wrong: You have to disable (not enable!) assertions with
the NDEBUG macro. Further, whether that is done in production code is up to
personal taste. I like them off in my production builds though, because I
need the speed more than the absolute correctness of the result, but I
don't program medical equipment or aviation devices either.
> This is one reason I don't use the assert macro. I always throw an
> exception that means a fatal programming error has occurred. Am I the
> only one that does this?
How do you throw exceptions from a destructor? How do you throw them from a
signal handler or thread entry function which has C binding and therefore
must not throw? In any case, you are neither aborting cleanly nor are you
unwinding the stack to a place where the state is valid again. After all,
detecting a programming error, an unexpected inconsistency often means the
state of the program is beyond repair, so anything that still touches it
(like throwing exceptions) is likely to not make it better.
Also, though I don't know if you meant that, an exception doesn't automatically
mean a fatal programming error has occurred; exceptions are a perfectly
normal part of error handling.
> My reasoning is that I always want the checks on, just in case. This
> means it is something I tend to do sparingly, after all, loads of them
> might well create a performance issue.
Right, lots of checks can create performance issues. However, you shouldn't
handle this as error handling but rather as a built-in diagnostic mode for
your code. For example, I have a parser table and my algorithm uses a
binary search that obviously relies on the table being sorted. Now, I don't
check each and every time when I look up a value that the keys are sorted.
Rather, I only do so when NDEBUG is not defined. The point is simply that I
am confident that the production and debug variants work the same, but in
the debug variant I check this using alternative algorithms or expensive
validation. You could call this a built-in unit test.
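A minimal sketch of that kind of debug-only validation, assuming a table
searched with a hand-rolled binary search (all names here are invented for
illustration):
---
#include <cassert>
#include <cstddef>
#include <vector>

struct Entry { int key; int value; };

// Expensive O(n) diagnostic used only by the assert below.
static bool is_sorted_by_key(const std::vector<Entry>& table)
{
    for (std::size_t i = 1; i < table.size(); ++i)
        if (table[i - 1].key > table[i].key)
            return false;
    return true;
}

// Binary search that relies on the table being sorted by key.
int lookup(const std::vector<Entry>& table, int key)
{
    assert(is_sorted_by_key(table));  // compiled out when NDEBUG is defined

    std::size_t lo = 0, hi = table.size();
    while (lo < hi) {
        std::size_t mid = lo + (hi - lo) / 2;
        if (table[mid].key < key)
            lo = mid + 1;
        else
            hi = mid;
    }
    return (lo < table.size() && table[lo].key == key) ? table[lo].value : -1;
}
---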
> All the standard programming texts tell us how good it is to use assertions
> (I am thinking of books like OOSC(II)) but the mechanism they describe
> is like cassert every time, i.e. turned off in production. I have
> noticed that it is hardly ever used on any C++ projects I have seen.
> It doesn't seem to get much use in C libraries either but does seem to
> have a slightly higher usage there.
Well, lots of bad code is out there, in particular for languages that make
it damn easy to shoot yourself in the foot like C and C++. That is IMHO not
a reason why the language or in this case assertions are a failure, it's
just that people don't understand assertions or that they don't understand
their value.
Uli
--
Sator Laser GmbH
Geschäftsführer: Thorsten Föcking, Amtsgericht Hamburg HR B62 932
An assertion is a useful low-level mechanism for catching bugs near
where they live, rather than waiting until 100000 cycles later when the
error shows up someplace else.
In production code, you work really hard to ensure that the user doesn't
lose data. That means that you don't make the program spew its guts up
if it finds a zero-length string in a dialog box, say, or can't find
enough file handles to open a logfile. Those are great places for
assertions, or some similar C++ mechanism, precisely because you want to
catch them in development but ignore them or handle them gently in
production. The same is generally true of checking class invariants.
There are also lots of runtime checks that you do want to have in
production code (including some of the same ones as you'd put in
assertions), but you don't use assert() for that--if the user's data is
at stake, it's really really bad manners to call abort(). I'd chuck a
program like that in the bitbucket instantly.
Cheers,
Phil Hobbs
That's the opposite. Assertions are enabled unless you define NDEBUG
(no debug).
The rationale for that is that assertions are for programming errors,
and those errors have no reason to happen in a valid program. Once
you've tested your program enough, or performed enough static
analysis, you should be able to tell whether your program is valid or
not and never invokes those programming errors.
Only in that case should you disable assertions, since you have
demonstrated that they never fire and thus only introduce useless
overhead.
> I always throw an exception that means a fatal
> programming error has occurred.
There is a big problem in this sentence.
A fatal programming error is not a recoverable exception. Throwing an
exception (which is to be caught) doesn't make sense. You should abort
the program.
> Am I the only one that does this?
> Surely not. My reasoning is that I always want the checks on, just in
> case.
Simple: do not define NDEBUG in your release builds.
That kinda means you clearly expect your program not to be robust,
though.
> Does this mean C++ programmers don't like such a C-like mechanism? Or
> that they don't like the fact that the assertions are disabled in
> production? Or is there some other reason?
The C++ type system allows static assertions and types that always
satisfy some invariants.
This diminishes the need for runtime assertions.
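For example, a rough sketch of both ideas, using C++11's static_assert (older
compilers would need BOOST_STATIC_ASSERT or a template trick) and an invented
NonZero wrapper type:
---
#include <stdexcept>

// Checked at compile time; no runtime cost at all.
static_assert(sizeof(int) >= 4, "this code assumes at least 32-bit int");

// A type that always satisfies its invariant: once constructed,
// the value is known to be non-zero, so callees need no runtime assert.
class NonZero
{
public:
    explicit NonZero(int v) : value_(v)
    {
        if (v == 0)
            throw std::invalid_argument("NonZero requires a non-zero value");
    }
    int get() const { return value_; }
private:
    int value_;
};

int divide(int x, NonZero divisor)
{
    return x / divisor.get();  // no assert needed: the type guarantees it
}
---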
Unfortunately, there are also a great many people who are using
exceptions for cases that really ought to be assertions.
> The next example easily demonstrates the problem with
> production code:
> ---
> bool check(Data* data)
> {
> assert(data);
> return data->isValid();}
>
The real question here is: why are you even taking a pointer?
And why isn't the code const-correct?
bool check(const Data& data)
{
    return data.isValid();
}
No need to assert anything.
I disagree: errors that you can handle you either handle locally or
throw an exception for; you never use assertions, not even during
development. On the other hand there will always be (unless you write
only trivial programs) a few errors that you simply cannot handle, in
which case assertions are used.
The alternative is to either continue with faulty data (which is *much*
worse than abort()), or to throw an exception which you catch in main()
and terminate the application there. Unless you can generate a stack-
trace or get some other useful information that way it does not buy you
anything you would not get from normal assert(), except a bit more
complicated code.
--
Erik Wikström
> I disagree, errors that you can handle you either handle locally or
> throw an exception, you never use assertions, not even during
> development. On the other hand there will always be (unless you write
> only trivial programs) a few errors that you simply can not handle, in
> which cases assertions are used.
That's a style point. I don't use assertions to replace error handling,
but for things like class invariant checks; using assert() allows you to
put them at the beginning and end of member functions, where the
performance penalty would usually be unacceptable in production code.
Similarly, there are places where I want to use a safe but slow
algorithm to check a faster but not proven algorithm. Assertions are
great for that--it's like having built-in unit tests.
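A sketch of that pattern, with a fast routine cross-checked against a trusted
slow one (both function names are placeholders):
---
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Trusted but slow reference implementation.
long long sum_slow(const std::vector<int>& v)
{
    long long total = 0;
    for (std::size_t i = 0; i < v.size(); ++i)
        total += v[i];
    return total;
}

// Faster implementation being introduced.
long long sum_fast(const std::vector<int>& v)
{
    return std::accumulate(v.begin(), v.end(), 0LL);
}

long long sum(const std::vector<int>& v)
{
    long long result = sum_fast(v);
    assert(result == sum_slow(v));  // built-in unit test, gone under NDEBUG
    return result;
}
---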
It's even sometimes reasonable to put assertions *inside* the production
error handlers.
> The alternative is to either continue with faulty data (which is *much*
> worse than abort()), or to throw an exception which you catch in main()
> and terminate the application there.
That's an over-generalization--not all faulty data is a fatal error. Do
you really have to call abort() and destroy the user's work just because
you can't open a logfile or you have a zero-length string in a dialog
box? Puh-*leeze*.
> Unless you can generate a stack-
> trace or get some other useful information that way it does not buy you
> anything you would not get from normal assert(), except a bit more
> complicated code.
I generally use a modified assert() that causes a divide-by-zero FPE, so
that any time an assertion fires the program drops into the debugger
conveniently. This obviously isn't a replacement for careful coding,
but it doesn't take too many bugchases avoided to repay the small
additional effort.
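One way such a macro might be written; the deliberate divide-by-zero is
platform-dependent (formally undefined behaviour), so this is strictly a
debugging convenience and not something to rely on:
---
#include <cstdio>

// Debug aid: on most platforms the integer divide-by-zero raises SIGFPE,
// which drops straight into an attached debugger at the point of failure.
#define TRAP_ASSERT(expr)                                              \
    do {                                                               \
        if (!(expr)) {                                                 \
            std::fprintf(stderr, "assertion '%s' failed at %s:%d\n",   \
                         #expr, __FILE__, __LINE__);                   \
            volatile int zero = 0;                                     \
            volatile int trap = 1 / zero;                              \
            (void)trap;                                                \
        }                                                              \
    } while (0)
---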
Cheers,
Phil Hobbs
--
> > marlow.and...@googlemail.com wrote:
> >> Assertions via assert.h (or cassert) are disabled in production and
> >> have to be enabled via the NDEBUG macro.
I admit, I got this the wrong way round. Many thanks for all the
corrections :-)
> > An assertion is a useful low-level mechanism for catching bugs near
> > where they live, rather than waiting until 100000 cycles later when the
> > error shows up someplace else.
I disagree. Like most people on this thread, I think assertions are
used to test a programmer's assumptions, which means that if they trigger
then there is a coding error somewhere.
> > In production code, you work really hard to ensure that the user doesn't
> > lose data. That means that you don't make the program spew its guts up
> > if it finds a zero-length string in a dialog box, say, or can't find
> > enough file handles to open a logfile. Those are great places for
> > assertions, or some similar C++ mechanism,
Not assertions. I agree that these conditions should be caught and
handled. However, they do not represent violated assumptions, i.e. they
are not related to coding errors.
> I disagree, errors that you can handle you either handle locally or
> throw an exception, you never use assertions, not even during
> development.
I disagree with your disagreement ;-) So when would you use assertions
then? Never? Hmmm. Never say never.
On the other hand there will always be (unless you write
> only trivial programs) a few errors that you simply can not handle, in
> which cases assertions are used.
No. Errors that you cannot handle should result in an exception that
is passed up the call stack in the hope that someone higher up can
handle it. If they can, all well and good; if not, the program will
terminate with a sensible error message. Either way it's not your
problem. You do not use assertions for this. For a start, they are
often turned off, which would make the program go bad. And if they are
turned on they would make the program halt, preventing someone higher
up in the call chain from handling it.
>
> The alternative is to either continue with faulty data (which is *much*
> worse than abort()),
agreed. We won't do that then.
> or to throw an exception which you catch in main()
> and terminate the application there.
much better.
> Unless you can generate a stack-
> trace or get some other useful information that way it does not buy you
> anything you would not get from normal assert(), except a bit more
> complicated code.
Sometimes code written by good programmers has to be deployed in an
environment where the application as a whole must not fail, come what
may. Such programs tend to be very large, built up over years,
containing many functions written by loads of people. It would be very
bad for such a program to terminate when new code is deployed that
contains a bug which asserts. IMHO. In such cases the good code must
not terminate, it should throw instead. Terminating is just such bad
manners.
Regards,
Andrew Marlow
--
You misunderstood; I was talking about how to deal with errors that
cannot be handled gracefully, in which case terminating is the only solution.
>> Unless you can generate a stack-
>> trace or get some other useful information that way it does not buy you
>> anything you would not get from normal assert(), except a bit more
>> complicated code.
>
> I generally use a modified assert() that causes a divide-by-zero FPE, so
> that any time an assertion fires the program drops into the debugger
> conveniently. This obviously isn't a replacement for careful coding,
> but it doesn't take too many bugchases avoided to repay the small
> additional effort.
I was talking about production code, where you want to output
informative text which can help you locate the error and fix it. But I
don't think that dropping into a debugger (if one even exists in the
production environment) is appreciated by the customers.
--
Erik Wikström
>> I disagree, errors that you can handle you either handle locally or
>> throw an exception, you never use assertions, not even during
>> development.
>
> I disagree with your disagreement ;-) So when would use assertions
> then? Never? Hmmm. Never say never.
For errors you can not handle, such as broken invariants etc.
> On the other hand there will always be (unless you write
>> only trivial programs) a few errors that you simply can not handle, in
>> which cases assertions are used.
>
> No. Errors that you cannot handle should result in an exception that
> is passed up the call stack in the hope that somehone higher up can
> handle it. If they can all well and good, if not then the program will
> terminate with a sensible error message. Either way its not your
> problem. You do not use assertions for this. For a start they are
> often turned off which would make the program go bad. And if they are
> turned on they would make the program halt, preventing someone higher
> up in the call chain from handling it.
Well, my point was that you use assertions for errors you cannot
handle, which means that they cannot be handled anywhere in the call-
stack. And since you will be terminating the program anyway you might as
well do it as quickly as possible, *unless* you can gain additional
information (which can be used to fix the error) by walking up the stack.
>> Unless you can generate a stack-
>> trace or get some other useful information that way it does not buy you
>> anything you would not get from normal assert(), except a bit more
>> complicated code.
>
> Sometimes code written by good programmers has to be deployed in an
> environment where the application as a whole must not fail, come what
> may. Such programs tend to be very large, built up over years,
> containing many functions written by loads of people. It would be very
> bad for such a program to terminate when new code is deployed that
> contains a bug which asserts. IMHO. In such cases the good code must
> not terminate, it should throw instead. Terminating is just such bad
> manners.
Applications that must not fail should be deployed with some kind of
redundancy and terminate as soon as they suspect that something might be
wrong and let the redundant units take over. Or, if they are not that
critical they will be restarted once they fail.
--
Erik Wikström
Of course not. That's what NDEBUG is for. assert() is intended as a
debugging aid--hence the NDEBUG macro's name.
Cheers,
Phil Hobbs
--
The more experience I have, the more I'm convinced that a failing
assertion should reveal *a bug*. The corollary is that if it's not a
bug, assertions are not the correct way to handle it.
So that means a few important things:
1. Assertions in production code are a good thing. Test before you
ship if you're afraid they might fail.
2. A failing assertion should crash immediately (after due logging).
Not throw an exception, not return, just crash. Don't let your crazy
app corrupt the user's data.
3. The assert-and-if pattern as shown above is always wrong: either
assert(), or handle it, but not both.
What do you people think?
Cheers,
Carl
The particular app I had in mind was one that was a large server
servicing very different kinds of requests from thousands of clients
at a time. Restarting this server in the event of an assertion failure
during the execution of one buggy command would cause it to disconnect
thousands of users. This makes it something to be avoided. The server
was kicked off from a script that would restart the server in the
event it crashed. But you don't want to write code that will make it
do this (i.e. writing asserts or calls to abort, exit, etc.).
Although in theory a buggy command could cause corruptions that might
lead to a SEGV, we found that in practice it either didn't happen or
only happened when that particular command was run loads of times. We
did not want to punish the thousands of other users. There was a
political element to this. If we had put such code in then a coding
error (we all make them) would have terminated the server which was a
very public thing. The script that handled the restart also raised a
trouble ticket to indicate there was a problem. Our group would have
received these tickets and quickly gained a bad reputation. Much
better for the top level function to trap AssertionFailure and report
an internal error to the user.
Regards,
Andrew Marlow
--
No you're not the only one. I do a similar thing primarily because
assertions are turned off in release/production builds. Either that or
leave the NDEBUG macro undefined in your release/production build. It's
easier to track down a problem in the field when there's an assertion
there to tell you where it happened. Unless efficiency is important,
there are probably more good reasons to leave the checks there in
your release/production build than to have them removed.
I use them in the production release, and find them to be very effective
at finding bugs.
----
Walter Bright
Digital Mars
C, C++, D programming language compilers
http://www.digitalmars.com
Sure - changing one magical switch to modify the assertion behavior of
a whole app is, most of the time, a no-no: that would mean changing
the error-handling policy of the code, which is by no means a sound
idea. It's just as risky as, say, using a tool to replace all raw
pointers with smart pointers and re-ship the whole thing as soon as it
compiles. Assertions can be used in different ways depending on how
they are implemented.
So I think the question about leaving assertions in a production
system is part of the error handling strategy, and it should be raised
only on new or small projects, or big projects in the very early
stages of implementation.
Cheers,
Carl
It depends on the application and the context of the function call. Do
you really want to assert if the user inputs something that fails the
pre-condition? Do you really want to assert if some network packet
data received fails a pre-condition (i.e. a malicious client sending
malformed data in an attempt to crash your program)? In those cases it
would be better to detect the invalid data and throw an exception and
let the calling function deal with the raised exception.
>
> > All the standard programming texts tell us how good it is to use assertions
> > (I am thinking of books like OOSC(II)) but the mechanism they describe
> > is like cassert every time, i.e. turned off in production. I have
> > noticed that it is hardly ever used on any C++ projects I have seen.
> > It doesn't seem to get much use in C libraries either but does seem to
> > have a slightly higher usage there.
>
> Assertions are not used enough because overall people code badly. If
> you're throwing exceptions around instead of wrapping your simple pre-
> and post-conditions into a simple assertion macro (whether that macro
> throws an exception or not, and regardless of the conditions under
> which it's enabled or not), then I submit that you're also coding
> badly.
>
If that's the case then a fair number of network appliances in large
telcos and investment banks are running 'bad' code because they're
based on a commercial software platform that basically uses the
philosophy 'if precondition is false then log error and throw
exception' everywhere and has a paucity of assertions. The result of
this on a failed test is a log of errors showing the functions called
prior to the failed test and, depending on the nature of the error,
the daemon can choose to continue or exit. It seems to work rather
well and I would hate to think what would happen if the tests were
replaced with assertions instead.
I hate to break it to you but we know better now. :-) Just because they
thought that assert was only useful a quarter of a century ago (when C
was standardised) or perhaps even earlier does not mean that it is true.
I pretty much agree with Carl's three points, and I know many others who do.
--
Erik Wikström
Well, the kinds of applications that I had in mind were safety critical
systems like those used in medical applications or systems in aeroplanes
and space shuttles, or high availability systems like those in telephone
systems with 99.99% uptime guarantees.
> Although in theory a buggy command could cause corruptions that might
> lead to a SEGV we found that in practise it either didn't happen or
> only happened when that particular command was run loads of times. We
> did not want to punish the thousands of other users. There was a
> political element to this. If we had put such code in then a coding
> error (we all make them) would have terminated the server which was a
> very public thing. The script that handled the restart also raised a
> trouble ticket to indicate there was a problem. Our group would have
> received these tickets and quickly gained a bad reputation. Much
> better for the top level function to trap AssertionFailure and report
> an internal error to the user.
Once again you have missed the point: if you can report an error to the
user it means you obviously could handle the error, in which case you
should not use assertions. If an operation failed and you can report it
to the user then the application did not fail.
--
Erik Wikström
[snip]
>
>All the standard programming texts tell us how good it is to use assertions
>(I am thinking of books like OOSC(II)) but the mechanism they describe
>is like cassert every time, i.e. turned off in production. I have
>noticed that it is hardly ever used on any C++ projects I have seen.
>It doesn't seem to get much use in C libraries either but does seem to
>have a slightly higher usage there.
>
>Does this mean C++ programmers don't like such a C-like mechanism? Or
>that they don't like the fact that the assertions are disabled in
>production? Or is there some other reason?
>
>Regards,
>
>Andrew Marlow
Assertions are a debugging tool. They are not intended to be in release
code. The intended use is to allow the programmer to constrain variables
within acceptable limits and to trap into the debugger when those
constraints are not met. This allows you to examine the variables and
identify the state of the function at that place in the code and determine
the cause of the aberrant behavior. If your program needs run-time checks,
then by all means use them and throw exceptions when those checks fail, but
assert will be redundant if you are already performing runtime checks.
You use assert to quickly write a test for valid data and to document what
the function will consider valid data. Once you discover it is receiving
invalid data you should correct the caller or handle it in the
called function, but you should never rely on assert to catch runtime errors
in production code.
Your program should be checking data for validity and should NEVER crash
because of bad input data. Assert is a crash. It doesn't belong in release
code.
Which would you prefer:
1) Client rings you at three in the morning spitting and cursing that
the application you supplied has just crashed.
2) Client rings you at three in the morning saying that the application
you supplied has just failed with an assertion asking him to ring this
number and pass on the rest of the message '...'
3) Client rings you at three in the morning complaining that the
application you supplied keeps aborting with a message about some exception.
At least 2) and 3) might tell you where to start looking for the problem.
So perhaps you ship too early? ;-)
If I am to be more serious, something like "assert" can be quite
useful in production code. The problem is that assert in my opinion is
too coarse and perhaps too badly defined (I would not like a debugger to
be opened when the customer tries to run my code).
Some asserts are useful to leave in, but others are better left out. A
typical example is verification of code: assume you write a smart
O(log n) algorithm to replace your current O(n). In those cases, I
validate my new algorithm against the old one, but the result is that
your code gets slower, not faster. Using assert to validate this is
clearly a bad idea.
/Peter
To me that sounds as though you have painted yourself into a corner. You
can't afford many assertions in release builds because of the performance
penalty, and your reasoning says that they should not be enabled in debug
unless they are also enabled in release, and so you don't use them much
in debug builds, even though they would benefit you there and the
performance doesn't matter.
My view is that assertions should be used so extensively that they /are/
expensive. And therefore should be disabled in release builds. If they
aren't too expensive to leave on in release builds, you aren't yet
checking enough stuff.
-- Dave Harris, Nottingham, UK.
Me too. But it seems like we are in minority. Most people seem to be
taking the view that because asserts are used to check programming
assumptions when they trigger it indicates a coding error and coding
errors have no place in production code, therefore asserts have no
place there either.
I hope we all agree that there should not be any bugs in production
code but the fact is that there often are. When there are then with
our approach these bugs are quickly identified. With the other
approach I reckon it is harder.
Some say that the presence of bugs is the programmer's fault and we
should not come up with conventions and approaches that pander to the
weaknesses/inefficiencies of certain programmers. I disagree. We are
all human and conventions that help us are good, they are not
pandering to our weaknesses. But it seems this is just a matter of
opinion.
I am beginning to think that whether or not one uses asserts in
production code is now just down to opinion, a bit like whether or not
your project allows the TAB character in source code or if you are
allowed to put curly brackets on the same line.
Many thanks to everyone who has contributed to this thread, it has
helped me think more clearly about the issues.
Regards,
Andrew Marlow
The particular 'error' was not a conventional application error, it
was the app detecting that programmer assumptions had been violated,
i.e. a programming error. When this happened our function had to
terminate, but the server that ran it must NOT terminate. We had to
handle this condition, otherwise the server would have terminated,
which was not acceptable. This did not alter the fact that the
condition arose as a result of programmer assumptions being violated.
That's why we did it by throwing an AssertionFailed exception.
Regards,
Andrew Marlow
--
Absolutely. 2) is the best choice. If the assertion mechanism logs the
appropriate information and shows a message to the user saying that,
basically, "we screwed up, sorry, call us", it is so much better than
the application just disappearing from the screen (or the server
suddenly not responding anymore), leaving users with no clue and lots
of frustration.
Option 4: the client calls you because the software has been
generating wrong financial reports for the last 4 months without
anyone noticing, and their company has lost $4 Million because of a
bug in your software. I'd rather take any one of 1, 2 or 3! Assertions
in production code are there to prevent this kind of problem.
Cheers,
Carl
> Assertions via assert.h (or cassert) are disabled in production and
> have to be enabled via the NDEBUG macro. This is one reason I don't
> use the assert macro. I always throw an exception that means a fatal
> programming error has occurred. Am I the only one that does this?
> Surely not. My reasoning is that I always want the checks on, just in
> case. This means it is something I tend to do sparingly, after all,
> loads of them might well create a performance issue.
>
> All the standard programming texts tell us how good it is to use assertions
> (I am thinking of books like OOSC(II)) but the mechanism they describe
> is like cassert every time, i.e. turned off in production. I have
> noticed that it is hardly ever used on any C++ projects I have seen.
> It doesn't seem to get much use in C libraries either but does seem to
> have a slightly higher usage there.
>
> Does this mean C++ programmers don't like such a C-like mechanism? Or
> that they don't like the fact that the assertions are disabled in
> production? Or is there some other reason?
I'm not sure why you've experienced that. From Boost's point-of-view,
since we are writing libraries, we don't know the precise requirements
of our clients, so we use a typical wrapper (BOOST_ASSERT) whose
behavior on failure can be customized.
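A rough sketch of that wrapper pattern, using an invented PROJECT_ASSERT name
rather than Boost's actual macro:
---
#include <cassert>

// Customization point: the client of the library defines this function
// to decide what an assertion failure should do (abort, throw, log, ...).
namespace project {
    void assertion_failed(const char* expr, const char* file, int line);
}

#if defined(PROJECT_DISABLE_ASSERTS)
    #define PROJECT_ASSERT(expr) ((void)0)
#elif defined(PROJECT_ENABLE_ASSERT_HANDLER)
    #define PROJECT_ASSERT(expr) \
        ((expr) ? (void)0 : project::assertion_failed(#expr, __FILE__, __LINE__))
#else
    #define PROJECT_ASSERT(expr) assert(expr)   // fall back to <cassert>
#endif
---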
However, I wanted to note one point that is typically overlooked in
favor of asserts that disappear in production: a programmer who knows
that an assert will not disappear may be less inclined to write one (for
performance reasons). In many programming scenarios, it's a good idea to
encourage *lots* of assertions.
--
Dave Abrahams
BoostPro Computing
http://www.boostpro.com
More like I've had the hubris that my code is perfect beaten out of me.
> If I am to be more serious, something like "assert" can be quite
> useful in production code. The problem is that assert in my opinion is
> too coarse and perhaps too badly defined (I would not like a debugger to
> be opened when the customer tries to run my code).
That doesn't happen on mine, it just prints a message indicating a
compiler bug and the user (hopefully) reports it to me with a
reproducible test case.
I started using asserts heavily back in the DOS days, where an errant
pointer could literally scramble your hard disk. This is absolutely
unacceptable behavior, so the asserts were effective in blunting that risk.
> Some asserts are useful to leave in, but others are better left out. A
> typical example is verification of code: assume you write a smart
> o(log n) algorithm to replace your current o(n). In those cases, I
> validate my new algorithm against the old one, but the result is that
> your code gets slower, not faster. Using assert to validate this is
> clearly a bad idea.
Sure, you have to use some judgment. I'm willing to put up with a 5%
slowdown for the extra checking, though. It has paid off handsomely for me.
Building a bit of redundancy into the data structures helps a great deal
in detecting corrupted data.
Anyone who does have that attitude simply hasn't been hit by the bus of
experience yet :-)
Hi,
Walter Bright wrote:
> marlow...@googlemail.com wrote:
[...]
>> Me too. But it seems like we are in minority. Most people seem to be
>> taking the view that because asserts are used to check programming
>> assumptions when they trigger it indicates a coding error and coding
>> errors have no place in production code, therefore asserts have no
>> place there either.
>
> Anyone who does have that attitude simply hasn't been hit by the bus of
> experience yet :-)
>
Consider the case where you optimise an algorithm for some task. This
algorithm, written specifically to improve the performance of your software,
should work the same as the algorithm you're replacing:
assert(do_new_fast(...) == do_old_slow(...));
Given that you're making the change for performance, it doesn't make
sense to leave this assertion in the code, as that will do exactly the
opposite of what you want.
As has been mentioned in other threads, the assertion above helps you
find the point of failure during your internal testing. If/when it
triggers you know exactly what you have to fix.
Regards,
Richard
--
Richard Corden
I would say that if you want to leave assertions in the
code, they should simply log a message when the assertion
is violated, but not do anything else.
> So that means a few important things:
> 1. Assertions in production code are a good thing. Test before you
> ship if you're afraid they might fail.
>
This depends on what an assertion does (in production code). (See below)
> 2. A failing assertion should crash immediately (after due logging).
> Not throw an exception, not return, just crash. Don't let your crazy
> app corrupt the user's data.
>
I 100% DISagree with the crash-part. Logging is always good of course.
In a realistically complex application, nothing, ever, should crash the
application. Certainly NOT a bug. Every application contains bugs. Some
of them will be the kind of bugs that cause assertions. Not all these
bugs are fatal (very few are, imho). If a bug is not fatal the app. must
not crash (but still may want to indicate an error via an appropriate
mechanism.)
> 3. The assert-and-if pattern as shown above is always wrong: either
> assert(), or handle it, but not both.
>
In real life it's often the least bad thing to do. You don't want to
crash your app via the assert(), so you choose some default behavior if
no other means of error reporting are possible or useful in the situation.
> What do you people think?
>
I think that assert() should be a debug only tool.
In a well-written application that makes correct and consistent use of
exceptions, I could see the merits of an assert_always(x) that raises a
logic_exception() at runtime or somesuch.
br,
Martin
> I would say that if you want to leave assertions in the
> code, they should simply log a message when the assertion
> is violated, but not do anything else.
Often this is just not possible. That's because when the assertion
triggers it is because a programmer's assumptions have been violated.
The next line of code may rely on that assumption, which is why it is
being asserted, e.g.:
SOME_ASSERT_MACRO(i != 0);
val = x / i;
If all the assert macro does is log then the next line of code will
give a runtime error and make the program go bad. If an
AssertionFailure exception is thrown then recovery can be done higher
up.
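A rough sketch of the kind of throwing macro being described (AssertionFailure
and SOME_ASSERT_MACRO are illustrative names, not a standard facility):
---
#include <sstream>
#include <stdexcept>
#include <string>

// Carries the failed expression and source location in its message.
class AssertionFailure : public std::logic_error
{
public:
    AssertionFailure(const char* expr, const char* file, int line)
        : std::logic_error(format(expr, file, line)) {}
private:
    static std::string format(const char* expr, const char* file, int line)
    {
        std::ostringstream os;
        os << "assertion '" << expr << "' failed at " << file << ":" << line;
        return os.str();
    }
};

// Always enabled, regardless of NDEBUG; a handler higher up the call
// stack can catch AssertionFailure and recover or report the error.
#define SOME_ASSERT_MACRO(expr) \
    ((expr) ? (void)0 : throw AssertionFailure(#expr, __FILE__, __LINE__))
---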
Regards,
Andrew Marlow
Yes, those you want to elide for production use. But there are an awful
lot of other uses of assert that have little or no perceptible runtime
cost that are very reasonable to keep in. For example:
switch (foo)
{
    case BAR: ...
    case BAZ: ...
    default: assert(0); // oops!
}
or:
for (int i = 0; 1; i++)
{
    assert(i < dim); // must find it
    if (foo[i]) return i;
}
--
Yes -- although I have sometimes wished for a standard macro called
assume(), with behavior identical to assert().
I think of 'assert(foo)' as "I assert that foo holds, and if it
doesn't, then I have made a logical error which invalidates big parts
of the program".
'assume(foo)' would read as "let's pretend, for now, to know that foo
holds. But let's redesign it before anyone uses it in production."
As a metaphysical side note, I think that usage of assert() among
non-English speaking programmers suffers because they do not fully
understand the word's original meaning. Wiktionary:
1. To affirm; to declare with assurance or plainly and strongly; to
state positively; to aver; to asseverate.
2. To maintain; to defend.
[...]
That sums up the spirit of assert() nicely. You state something as an
absolute truth, something non-negotiable. And if you are proven
wrong, you have no way of handling the new situation.
/Jorgen
--
// Jorgen Grahn <grahn@ Ph'nglui mglw'nafh Cthulhu
\X/ snipabacken.se> R'lyeh wgah'nagl fhtagn!
Asserts are not for checking the validity of input data to the program.
They are for double checking the logic of the program itself. If you are
able to write non-trivial programs that have no logic bugs in them, then
I salute you. As for me, I will continue to use asserts.
No. Without the assertion, the next line may execute without
runtime error, for example, if x has floating-point type. It
may be the case that giving val a bad value is bad, but it may
also be the case that this error soon disappears in the course
of further processing and has no very harmful effect. With the
assertion, the program crashes or fails to do what it was going
to do, which, for example, could be saving someone's hours of
work.
I speak from hard, practical experience. At my company we have
long-running reports which operate over many different security
types, and we have had cases where assertions for some new type
would abort runs after hours of work, wasting all that time.
And what exactly can you do when you catch an AssertionFailure?
You have already disrupted the normal flow of the code, and by
definition it's something that isn't supposed to happen, so how
can you program any recovery?
>On 12 Aug, 21:43, Hyman Rosen <hyro...@mail.com> wrote:
>
>> I would say that if you want to leave assertions in the
>> code, they should simply log a message when the assertion
>> is violated, but not do anything else.
>
>Often this is just not possible. That's because when the assertion
>triggers it is because a programmers assumptions have been violated.
>The next line of code may rely on that assumption which is why it is
>being asserted, eg:
>
>SOME_ASSERT_MACRO(i != 0);
>val = x / i;
>
>If all the assert macro does is log then the next line of code will
>give a runtime error and make the program go bad. If an
>AssertionFailure exception is thrown then recovery can be done higher
>up.
>
And what's wrong with
SOME_ASSERT_MACRO(i != 0);
if (i != 0)
    val = x / i;
else
    return FAILURE_CODE;
    // or alert user
    // or ...
The assertion assures a nice trap while in development & test, calling
attention to the fact that an assumption has been violated, and the runtime
check assures that when the assertion code isn't there in release,
something is there to catch it anyway.
>Geoff wrote:
>> You use assert to quickly write a test for valid data and to document what
>> the function will consider valid data, once you discover it is receiving
>> valid or invalid data you should correct the caller or handle it in the
>> called function but you should never rely on assert to catch runtime errors
>> in production code.
>>
>> Your program should be checking data for validity and should NEVER crash
>> because of bad input data. Assert is a crash. It doesn't belong in release
>> code.
>
>Asserts are not for checking the validity of input data to the program.
>They are for double checking the logic of the program itself. If you are
>able to write non-trivial programs that have no logic bugs in them, then
>I salute you. As for me, I will continue to use asserts.
program = function = class :)
I don't say asserts shouldn't be used, I say they don't belong in released
code as they merely crash the program and there are many cases in my line
of work where this is not permissible and is downright irresponsible.
If your program (function) needs runtime checks of the data, by all means
put them in the code (e.g., non-null pointer tests). Asserts are there and
are mostly a C legacy for an instant debug break at the point of failure.
Your design should be robust, not merely catching errors and crashing. The
purpose is to alert you to the fact your function (class) is being called
with arguments or data your function is not prepared to accept so you
should find out why it's happening or find a runtime method for handling
that case. Anything else is bad design, bad test, bad implementation.
Runtime data integrity failure in code released to customers should output
a meaningful diagnostic, either as a message to the user or written to a
log file for the user to send to your support team, not just an assertion
message.
One of the most vehement and IMO, righteous criticisms of Windows and
Windows programs was the propensity to crash with cryptic messages often
identifiable as assertion traps, left in "released" software. If this is
the acceptable quality of your code then I suppose there is nothing to worry
about. The almighty BSOD is nothing more than another assert. The same
thing goes for Guru Meditation and Kernel Panic, those days should be long
gone in the programming art.
I do use asserts, but I use them to catch errors in debug/test before
releasing the code to customers. I also have runtime checks in place for
cases where bad data must not cause the function to fail and appropriate
handling is going to be performed in debug and release mode code. This is
why functions have error returns and why programmers always check them, is
it not?
I have an optimization suggestion, then. If your program does not care
about the value of val, then I suggest replacing the assert and divide with:
val = 0;
and your program will run even faster :-)
> With the
> assertion, the program crashes or fails to do what it was going
> to do, which, for example, could be saving someone's hours of
> work.
An assertion failure is not a crash. It is an abort, and what happens
when an assert is tripped is entirely up to the programmer. In your
case, it can call a routine which saves the work in progress. It could
also be programmed to restart the program, or hunker down into some
minimal state and await further instructions.
> And what exactly can you do when you catch an AssertionFailure?
> You have already disrupted the normal flow of the code, and by
> definition it's something that isn't supposed to happen, so how
> can you program any recovery?
I don't see how any attempt at recovery could be worse than ignoring the
corrupted state of the program and hoping it will just go away.
> I don't say asserts shouldn't be used, I say they don't belong in released
> code as they merely crash the program and there are many cases in my line
> of work where this is not permissible and is downright irresponsible.
If you don't have faith that you have caught all the bugs and the
ASSERTs will never fire, why do you have faith that your recovery code
will solve the much harder problem of taking the program from an
unpredictable state into a legitimate sane state?
> If your program (function) needs runtime checks of the data,
ASSERTs aren't for runtime checks of the data. They are to detect
programming errors.
> Runtime data integrity failure in code released to customers should output
> a meaningful diagnostic, either as a message to the user or written to a
> log file for the user to send to your support team, not just an assertion
> message.
How do you do that if you can no longer trust the state of your program?
> I do use asserts, but I use them to catch errors in debug/test before
> releasing the code to customers. I also have runtime checks in place for
> cases where bad data must not cause the function to fail and appropriate
> handling is going to be performed in debug and release mode code. This is
> why functions have error returns and why programmers always check them, is
> it not?
Bad input is not a programming error. ASSERTs are for catching
programming errors.
--
Nevin ":-)" Liber <mailto:ne...@eviloverlord.com> 773 961-1620
> > All the std programming text tell us how good it is to use assertions
> > (I am thinking of books like OOSC(II)) but the mechanism they describe
> > is like cassert every time, i.e turned off in production. I have
> > noticed that it is hardly ever used on any C++ projects I have seen.
> > It doesn't seem to get much use in C libraries either but does seem to
> > have a slightly higher usage there.
> >
> > Does this mean C++ programmers don't like such a C-like mechanism? Or
> > that they don't like the fact that the assertions are disabled in
> > production? Or is there some other reason?
>
> I'm not sure why you've experienced that.
The responses on this thread so far seem to show a wide variety of
taste and preference. Some say that when programmer assumptions are
violated the code should crash (in preference to continuing on to
undefined behaviour) whilst others say an exception should be thrown
to allow recovery higher up (that's my view). And some say that
whichever of these is done it should always be disabled in production
whilst others (including me) say that the check should always be on in
production. I think that there is confusion because some people see
the word 'assert' and map it directly onto the assert in the old C
header. It can be implemented in that way but my preference is to have
a macro that throws AssertionFailure when the assertion expression
evaluates to false.
> From Boost's point-of-view,
> since we are writing libraries, we don't know the precise requirements
> of our clients, so we use a typical wrapper (BOOST_ASSERT) whose
> behavior on failure can be customized.
This sounds like a good idea when it is a general purpose library like
boost. For a project that does not ever anticipate releasing its work
in library form it might be a good idea to use a similar wrapper but
pick the policy as a project-wide policy. Then if the project is ever
unexpectedly called on to release its work in library form (hey, it
happens) the use of the wrapper will save you like the boost wrapper
can.
I will bear this in mind the next time I find myself in a position
where I have the power to dictate what a project does in this area.
> However, I wanted to note one point that is typically overlooked in
> favor of asserts that disappear in production: a programmer who knows
> that an assert will not disappear may be less inclined to write one (for
> performance reasons). In many programming scenarios, it's a good idea to
> encourage *lots* of assertions.
My belief is that assertions should always be on in production so I
tend to avoid ones that are expensive to compute in tight loops. But
how often does that happen? Hardly ever, I reckon. Much more common is
checking you don't have a null pointer before dereferencing it,
checking an integer is not zero before dividing by it, etc.
Regards,
Andrew Marlow
--
> And what's wrong with
>
> SOME_ASSERT_MACRO(i != 0);
> if (i != 0)
> val = x/i;
> else
> return FAILURE_CODE;
> // or alert user
> // or ...
>
> The assertion assures a nice trap while in development & test, calling
> attention to the fact that an assumption has been violated and the runtime
> check assures that when the assertion code isn't there in release that
> something is there to catch it anyway.
Tell me how you tested it. Assume you have an ASSERT macro that fires
but doesn't abort. Either:
The ASSERT never fires, in which case you haven't fully tested the code
that runs after it, which might put your program in a worse
unpredictable state than the one it would be in if the ASSERT just
aborted.
The ASSERT fires and you remove the buggy code. Once the buggy code is
removed, you can no longer fully test the code that runs after the
ASSERT.
The ASSERT fires and you leave in the buggy code, in which case you are
shipping a buggy product that you know can get into an unpredictable
state.
I'm with Walter on this one. An ASSERT detects a programming error, and
your program is in an unpredictable state. You have to get it back into
a predictable state. On some systems, this is accomplished by aborting
and restarting. On others, life is more complicated. But just ignoring
the detected programming error is never the right thing to do.
--
Nevin ":-)" Liber <mailto:ne...@eviloverlord.com> 773 961-1620
Speaking from hard, practical experience (this means I speak for the
code I have to work with day-to-day) I can say that it's almost *never*
desirable to execute the code following a failed ASSERT. If I could just go on
without the ASSERT, what's the point of having it there? It seems the
code is well behaved anyway.
> And what exactly can you do when you catch an AssertionFailure?
> You have already disrupted the normal flow of the code, and by
> definition it's something that isn't supposed to happen, so how
> can you program any recovery?
>
For some applications, where a large part of error handling consists in
just ignoring unlikely errors (been there, done that), it may well be
the best option to follow up the ASSERT with an if and do just nothing.
On the other hand, if the app supports halfway decent error handling, it
may be worthwhile for the assert to throw an exception, or return an
error code.
As for disrupting the normal flow of code: If I hit an assert, the code
part where I'm currently at is broken anyway and the least appropriate
thing to do is to just continue as if nothing happened (still, I may be
forced to do this.) It's just that calling abort() seems to be a bit of
an overkill, because there may be other independent code-paths running
at this time that can continue quite successfully ...
br,
Martin
If that is how you would interpret assume() then I'm just as glad that
it's not in the standard. We all know that a frighteningly large amount of
code in production systems was supposed to be redesigned before being
put in production :-)
--
Erik Wikström
But I explained exactly that. In the division-by-zero
case, for example, val might be set to Inf or NaN, used
in a few further calculations, perhaps output, and then
the program goes on to do something else and all is
forgotten. The corrupted state of the program does in
fact "just go away". Array overflow is similar - the
code might write past the end of an auto array and
clobber a few other variables on the stack, leading to
some anomalous behavior, but if it hasn't overwritten too
far, once the function returns, the effects of the error
might be completely erased.
With an assertion left in, even if the program is not
terminated, it goes into a very bad state because by
definition, a failed assertion is unanticipated. No
programmer is going to write detailed recovery code for
each different situation where an assertion exception
can be thrown, because by definition, those situations
can't happen, and therefore there is no recovery logic
available. How do you recover from 1 + 1 == 3?
Assertions in production code are in the programmer's logic.
They'd rather be right than be useful.
>ASSERTs aren't for runtime checks of the data. They are to detect
>programming errors.
Precisely my point. Which is why they don't belong in released software.
Asserts are there to catch programming errors when function A calls
function B with argument values function B doesn't expect or more
precisely:
Asserts express assumptions about the state of the program at any point in
its execution and fire when those assumptions are violated. They are
indicators of fundamental design or implementation flaws that need to be
corrected before release of the code, not after.
The stated goal is to detect those errors at unit test or integration test
before the sheet hits the fan and to make sure that modules are integrated
and meet their documented specifications.
If an assumption fails, there is an error in the program or the assumed
range of expected data or the calling module or the documentation of both
caller and callee. In any case the place to halt the process is in debug
where it can be immediately analyzed and corrected. If the assumptions are
wrong, it's time to fix them and handle the reality of the data, if the
data is wrong, it's time to trace back to what caused the data to go wrong.
If the caller and callee disagree about what constitutes expected, valid data, it's
time to fix the documentation and correct the affected modules.
If you are going to hand software to customers that is debug code, say so
explicitly and hand him debug code. Don't hand him "release" code and allow
it to crash. Yes, end users can hand you bad data that no amount of
in-house testing can reveal but hopefully, by using assertions in debug and
by properly testing in your runtime code and handling those exceptions in
your explicit code you have no need to crash and burn because some
assertion fired that should never have been in the release to begin with.
Can't trust your code? Do you think the pilot of an F-22 cares about
recording and reporting the output of your assert macro while he's pulling
the eject handle?
--
You test it by deliberately passing in erroneous data because you make
every effort to test all pathways through your code, including the error
handlers as above. You do this in debug and again in release before handing
it off to end users; now you have confidence that the explicit runtime checks
you put there will respond appropriately. If necessary, disable the asserts
in debug mode before testing in release. In this particular case the assert
would be redundant depending on how the error was handled in the explicit
code. Wouldn't you do the same when passed a null pointer?
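For what it's worth, a small sketch of what that kind of test can look like
(divide_or_fail and the result codes are invented for the example; build it
with NDEBUG defined so the assert is compiled out and the explicit fallback
path is what actually gets exercised):
---
#include <cassert>

enum Result { SUCCESS_CODE, FAILURE_CODE };

// Invented example: debug-only trap plus an explicit runtime check.
Result divide_or_fail(int x, int i, int& val)
{
    assert(i != 0);        // fires in debug builds only
    if (i != 0) {          // survives release builds
        val = x / i;
        return SUCCESS_CODE;
    }
    return FAILURE_CODE;
}

int main()
{
    int val = 0;
    // Deliberately pass erroneous data to cover the error-handling path.
    return divide_or_fail(10, 0, val) == FAILURE_CODE ? 0 : 1;
}
---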
>
>The ASSERT fires and you leave in the buggy code, in which case you are
>shipping a buggy product that you know can get into an unpredictable
>state.
>
>
>I'm with Walter on this one. An ASSERT detects a programming error, and
>your program is in an unpredictable state. You have to get it back into
>a predictable state. On some systems, this is accomplished by aborting
>and restarting. On others, life is more complicated. But just ignoring
>the detected programming error is never the right thing to do.
The above example is an oversimplification of course, but it shows three
issues in assumptions about asserts in released code. :)
1. That writing x/i where i can be zero can be a programming error as well
as a data assumption error.
2. Assert won't always catch a program error, only bad data.
3. That aborting the process is the only way out.
I reject #3 completely. If you understand the domain of the data and you
understand the process enough to write asserts, you certainly better
understand it well enough to write a recovery handler for instances at the
boundary conditions where the assertions will fire and you must test them.
The point is that relying on asserts to catch errors in released code is no
better than handing your customer untested code in the first place.
--
whilst others say an exception should be thrown
> to allow recovery higher up (that's my view).
The problem with throwing an exception is that once the ASSERT detects a
programming error, the program is in a state that was not predicted by
the programmer. If the state of the program is not one which was coded
for, how can one trust that the destructors will all do the right thing
during stack unwinding?
--
Nevin ":-)" Liber <mailto:ne...@eviloverlord.com> 773 961-1620
But that makes no sense. What kind of recovery will
you make for a situation that can't happen? Why would
a programmer write recovery code for something that
can't happen? My understanding of assertions is that
they document the logic of the code during development,
so that mistakes are quickly exposed. By your premise,
every time a programmer adds an assertion he has to
write the recovery code for when that assertion fails,
even though he believes that it cannot happen. The only
possible result of this is that he will use no assertions
at all.
> In article <4bj6a45fm44lj5n9d...@4ax.com>,
> Geoff <ge...@invalid.invalid> wrote:
>
> > And what's wrong with
> >
> > SOME_ASSERT_MACRO(i != 0);
> > if (i != 0)
> > val = x/i;
> > else
> > return FAILURE_CODE;
> > // or alert user
> > // or ...
> >
> > The assertion assures a nice trap while in development & test, calling
> > attention to the fact that an assumption has been violated and the runtime
> > check assures that when the assertion code isn't there in release that
> > something is there to catch it anyway.
>
> Tell me how you tested it.
The whole idea of assertions is that they test that a programmer's
assumptions are true. If we can engineer a situation in which they are
not then we have found our logic bug. Of course, if we find a bug
before we ship then we remove it! The idea of leaving assertions in is
that they can find any cases that were missed.
-Andrew Marlow
--
> Walter Bright wrote:
> > I don't see how any attempt at recovery could be worse
> > than ignoring the corrupted state of the program and
> > hoping it will just go away.
I agree.
> But I explained exactly that. In the division-by-zero
> case, for example, val might be set to Inf or NaN, used
> in a few further calculations, perhaps output, and then
> the program goes on to do something else and all is
> forgotten.
eh? forgotten? how?
> The corrupted state of the program does in
> fact "just go away".
That depends on the error. Many will cause a SEGV.
> No programmer is going to write detailed recovery code for
> each different situation where an assertion exception
> can be thrown
I think you misunderstand. When I trap AssertionFailed at the top
level of my function I terminate the function with an error message to
the user. My function then returns to the server which continues to
run and maintains its connections to the thousands of other users it is
connected to.
Handling AssertionFailure means handling that one exception, not
worrying about all the different situations that might cause it to get
raised.
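A rough sketch of the shape of that arrangement (AssertionFailed, Request
and handle_request are placeholder names, not the actual server code):
---
#include <stdexcept>
#include <iostream>

struct AssertionFailed : std::logic_error
{
    using std::logic_error::logic_error;
};

struct Request { int id; };

// Placeholder for the real work; internal checks may throw AssertionFailed.
void handle_request(const Request& r)
{
    if (r.id < 0)
        throw AssertionFailed("request id must be non-negative");
    // ... normal processing ...
}

void process_one_client(const Request& r)
{
    try {
        handle_request(r);
    } catch (const AssertionFailed& e) {
        // Report an internal error for this one request only; the server
        // loop and the other client connections carry on untouched.
        std::cerr << "internal error in request " << r.id
                  << ": " << e.what() << '\n';
    }
}
---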
>, because by definition, those situations
> can't happen, and therefore there is no recovery logic
> available. How do you recover from 1 + 1 == 3?
See above.
Regards,
Andrew Marlow
Release builds are here to be released. They are meant to be fast and
stable. They're what are used in the real world.
Debug builds are here for two reasons.
1- Debug builds are there for debugging. Symbols are left in and
optimizations are turned off to allow human debugging of the machine
code.
2- Debug builds are there for testing. We don't care about
performance of debug builds, so go crazy with runtime checks to catch
bugs.
Now, suppose you have a function. This could be a public interface or
an internal function. This function has a contract, laid out in
comments, etc. You have two options:
A- "Garbage in, garbage out." If someone calls your function with bad
data, the effects are not defined.
B- You have checks at runtime to ensure the input data is valid.
The silly argument in this thread is just an argument between points A
and B. There is no "right" answer. The answer involves a normative
judgment on your part. Option B produces "safer" code, but slower
performance. Option A is faster, but "less safe". Which way you go,
and how far, depends greatly on what you're writing, specifically the
penalty of a crash or incorrect behavior that would be caught by the
runtime check versus the performance costs involved for the runtime
check.
A public interface should not necessarily check its input data. It's
perfectly fine to say "Garbage in, garbage out". For low level
routines, anything else but this no-check release mode is
unacceptable. Would you want every mutex::acquire to perform a sanity
check to make sure you don't get into a deadlock, and throw an
exception / return an error code if you do? The performance costs of
this would be staggering.
Having said that, it's quite clear that asserts are meant only for
debug builds. If only for consistency with the community at large, use
them that way. This is not an answer to point A vs B. If you want to
put in a runtime check, possibly deep within your program, which
returns an error code, throws an exception, logs a message, kills the
program, or some combination or variation thereof, go for it. You made
the normative judgment that the runtime performance costs are worth
this extra sanity check. Just don't use assert to do it. Use
"runtime_assert" or "logical_assert" or "BOOST_ASSERT" or whatever you
want to call it.
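For instance, a minimal sketch of keeping the two apart (RUNTIME_VERIFY is a
made-up name in the spirit of "runtime_assert"; it is not a standard or
Boost facility):
---
#include <cassert>
#include <stdexcept>

// Always-on check: stays in release builds, by deliberate choice.
#define RUNTIME_VERIFY(cond)                                   \
    do {                                                       \
        if (!(cond))                                           \
            throw std::runtime_error("check failed: " #cond);  \
    } while (0)

void example(const int* p, int n)
{
    assert(n >= 0);                 // debug builds only; gone with NDEBUG
    RUNTIME_VERIFY(p != nullptr);   // paid for in release as well
    // ... use p[0..n) ...
}
---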
> You test it by deliberately passing in erroneous data because you make
> every effort to test all pathways through your code, including the error
> handlers as above.
It is easy to make error handlers that mask the symptom. How do you
write ones that fix the underlying problem? And if you aren't
addressing the underlying problem, you might as well make the error
handler part of the documented interface.
For example:
ASSERT(p);
if (!p)
return;
Why bother with the ASSERT? Just document what happens when p is NULL.
If p isn't allowed to be NULL on this call, then the early return
doesn't fix the underlying problem anyway.
> 1. That writing x/i where i can be zero can be a programming error as well
> as a data assumption error.
Sure, it is based on context.
> 2. Assert won't always catch a program error, only bad data.
If you are using assert to detect bad data, you are using it wrong
IMNSHO.
> 3. That aborting the process is the only way out.
>
> I reject #3 completely.
That is a straw man argument. I never said that. What I did say is
that ASSERT detects that a program is in a bad state that the
programmers didn't account for, and you have to now get the program into
a state that was accounted for. Sometimes abort is used to accomplish
this; other times far more complex mechanisms are used, especially in
safety-critical applications.
> If you understand the domain of the data and you
> understand the process enough to write asserts, you certainly better
> understand it well enough to write a recovery handler for instances at the
> boundary conditions where the assertions will fire and you must test them.
They aren't firing at boundary conditions. They are firing at
programming errors.
> The point is that relying on asserts to catch errors in released code is no
> better than handing your customer untested code in the first place.
If the ASSERTs aren't firing during testing, then I have enough
confidence to leave them in because I do not believe they will fire
during the production run. And if they do happen to fire during
production, it is still far better than silently ignoring the
programming error.
--
Nevin ":-)" Liber <mailto:ne...@eviloverlord.com> 773 961-1620
> If you are going to hand software to customers that is debug code, say so
> explicitly and hand him debug code. Don't hand him "release" code and allow
> it to crash.
It sure beats your alternative of handing them "release" code which
silently attempts to swallow unpredictable programming errors.
> Yes, end users can hand you bad data that no amount of
> in-house testing can reveal
Again, it has nothing to do with bad data. It has to do with detecting
programming errors.
> Can't trust your code? Do you think the pilot of an F-22 cares about
> recording and reporting the output of your assert macro while he's pulling
> the eject handle?
Or your code where the plane ignores the eject handle because of a
detected but ignored software bug?
ASSERT obviously has to do something different than abort in the case of
a jet fighter. I'm not arguing against that. It has to get whatever
program it is running from its current unpredictable state to a well
known state that the code can handle. Silently ignoring the programming
bug you just detected rarely accomplishes that.
--
Nevin ":-)" Liber <mailto:ne...@eviloverlord.com> 773 961-1620
Yes -- but that would be the exact point with assume(). It would be a
standardized way of declaring that the author isn't satisfied with the
code yet -- much like the "XXX" or "TODO" conventions in comments, but
stronger.
Maybe it's a bizarre idea. Still, I find myself using assert() in this
way in early prototyping, for things like unlikely I/O errors. I don't
like it, but the other common way of dealing with it is to ignore the
possibility, and that is far worse.
(I cannot come up with any other example of a language construct -- in
any language -- which is supposed to be used during development, but
never remain in production.)
/Jorgen
--
// Jorgen Grahn <grahn@ Ph'nglui mglw'nafh Cthulhu
\X/ snipabacken.se> R'lyeh wgah'nagl fhtagn!
> But I explained exactly that. In the division-by-zero
> case, for example, val might be set to Inf or NaN, used
> in a few further calculations, perhaps output, and then
> the program goes on to do something else and all is
> forgotten.
Or it might be your bank account balance.
And given that you have detected a programming error, the program is now
in a state not anticipated by the designers, and you have no idea
whether silently ignoring the failure will be innocuous or deadly.
> How do you recover from 1 + 1 == 3?
Why do you prefer that that be silently ignored?
--
Nevin ":-)" Liber <mailto:ne...@eviloverlord.com> 773 961-1620
Geoff, that sounds like an assumption that shipping code is bug-free.
I've certainly never been that lucky myself! I view assertions as
internal checks: as soon as the program *knows* it has a bug, it
should not continue, at the risk of corrupting or losing user data,
or sending 50 m^3 of radioactive water out into the river.
Example: I swear every time I get an "Internal Compiler Error" from
Visual C++, which is rather rare I should say, but deep down I'm
rather thankful it doesn't go on generating wrong code knowing that
there's a problem, because that would be a *nightmare* for me to debug
thinking that the bug is in *my* code!
The point is: strong testing before you ship should guarantee that in
most situations, no assertion fails. However, you should still expect
some assertion in some obscure and rarely executed part of the code to
fail, but that wouldn't be bad enough to make your program unusable.
Assertions make bugs more visible, in a way, and it's up to you to
decide at what point your code is acceptable. But we definitely want
bugs to be visible - it keeps us honest!
Cheers,
Carl
> it's quite clear that asserts are meant only for debug builds. If only
> for consistency with the community at large, use them that way. This
> is not an answer to point A vs B. If you want to put in a runtime
> check, possibly deep within your program, which returns an error code,
> throws an exception, logs a message, kills the program, or some
> combination or variation thereof, go for it. You made the normative
> judgment that the runtime performance costs are worth this extra
> sanity check. Just don't use assert to do it. Use "runtime_assert" or
> "logical_assert" or "BOOST_ASSERT" or whatever you want to call it.
You made a good effort to back up and uncover the real issues, but
there's one as-yet-unmentioned linguistic disagreement you didn't deal
with: about half the people in this thread think of the word "assert" as
meaning the thing that comes from the <assert.h> / <cassert> headers,
while the other half have something more general in mind, as in
something like, "any runtime check for programming errors." Until those
two concepts are distinguished, I don't think the debate can lead
anywhere useful.
--
Dave Abrahams
BoostPro Computing
http://www.boostpro.com
> Geoff wrote:
>> If you understand the domain of the data and you
>> understand the process enough to write asserts,
>> you certainly better understand it well enough to
>> write a recovery handler for instances at the
>> boundary conditions where the assertions will
>> fire and you must test them.
>
> But that makes no sense. What kind of recovery will
> you make for a situation that can't happen? Why would
> a programmer write recovery code for something that
> can't happen? My understanding of assertions is that
> they document the logic of the code during development,
> so that mistakes are quickly exposed. By your premise,
> every time a programmer adds an assertion he has to
> write the recovery code for when that assertion fails,
> even though he believes that it cannot happen. The only
> possible result of this is that he will use no assertions
> at all.
Not quite. The other one is that his program fills up with untested
(and surely incorrect) "recovery" code. This is one danger of
"defensive programming."
--
Dave Abrahams
BoostPro Computing
http://www.boostpro.com
For the purposes of this post, please consider:
- An "assertion" is a stated condition which is only false if some
manner of "critical error" has occurred, with an associated (implicit
or explicit) programmatic action. (If you're already replying to
argue, read on first).
- A "critical error" is a relative term, necessarily only understood
in the context of the test. For example, list where pointer to head
is NULL when size() > 0.
Assertions can then be divided into two types:
- NECESSARY assertions
- helping provide behaviours defined in the usage specification of a
software (sub)system
- OPTIONAL assertions
- situations where:
- behaviour is undefined (i.e. unsupported inputs), or
- defined behaviour can not be (or perhaps can not
necessarily continue to be) reliably provided
These definitions are deliberately kept inclusive to allow for the
fact that some people will write systems that validate run-time inputs
by "asserting" their validity. In such a context, assertion failures
are predictable in typical client usage. This _is_ a credible
programming model in certain types of applications. In this
situation, whether the assertions are NECESSARY or OPTIONAL depends on
whether the system documents defined behaviours (e.g. error exits,
core dumps) when inputs are invalid. Either way, the tests may still
take the form of ASSERT(CONDITION) - it's pointless to argue that
they're not assertions in the general sense.
So, for NECESSARY assertions, the C++ best-practice involves making
all the same decisions re error-handling needed to provide optimally
useful behaviours to users. It's not interesting to discuss what that
may be, nor does it belong in a separate "assertion" problem space
distinct from error handling in general, regardless of whether the
notation used and mentality involved is "assertive".
It's now possible to focus the discussion on OPTIONAL assertions
without being bothered with irrelevant counter examples drawn from
NECESSARY assertion usages.
By the definition above, a program is designed and intended to
function identically - for the supported range of inputs - with or
without OPTIONAL assertions being active.
> My reasoning [for not using assertions] is that I always want
> the checks on, just in case. This means it is something I tend
> to do sparingly, after all, loads of them might well create a
> performance issue.
This succinctly captures how the appeal of assertions (e.g...
- "security" of knowing:
- things look "healthy" at all points during processing
- alternative implementations produce same results
- higher chance of program pro-actively self-diagnosing
previous untriggered errors, whether due to new inputs,
race-conditions, code changes etc.
- self-documentation
) conflicts with the performance penalties. There must necessarily be
a threshold beyond which you omit testing, and assert.h provides a
very crude (and obviously for you unsatisfying) mechanism:
Realistically, many low-level (sub)systems can't perform exhaustive
tests without adversely affecting users. So, it's better for the
caller to be able to control the extent to which optional assertions
are being checked. A compile time include/exclude switch is about as
crude - and generally effective - as you can get, but there's no
logical reason a variety of thresholds - some compile time and some
run time - can't be employed by a library or system. It's as complex
a problem space as you want to make it - all the way from compile-time
benchmarks to controlling compile-time exclusion to run-time changes
of assertion-checking levels based on system load.
--- A good low-level library lets the user tune how much performance
is sacrificed for extra assurances of correctness ---
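As a sketch of what such tuning could look like (the CheckLevel enum and
CHECKED_ASSERT macro are invented for illustration; a real library would
pick its own names and policies):
---
#include <atomic>
#include <cstdlib>

enum class CheckLevel { None, Cheap, Exhaustive };

// Set once at startup, e.g. from configuration or current system load.
inline std::atomic<CheckLevel> g_check_level{CheckLevel::Cheap};

#define CHECKED_ASSERT(level, cond)                         \
    do {                                                    \
        if (g_check_level.load() >= (level) && !(cond))     \
            std::abort();                                   \
    } while (0)

// Stand-in for an expensive whole-structure validation pass.
inline bool full_validate() { return true; }

void insert_key(int key)
{
    CHECKED_ASSERT(CheckLevel::Cheap, key >= 0);              // cheap invariant
    CHECKED_ASSERT(CheckLevel::Exhaustive, full_validate());  // costly scan
    // ... actual insertion ...
}
---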
There's also the question of what to do when an optional assertion
fails:
--- A good low-level library lets the user decide what they'd like to
have happen when optional assertions fail ---
It might also attempt to provide a reasonable default, which is where
we step off the edge into the grey area of experience, application
architectures and usage styles that vary from IT system to system and
industry to industry. I'd rather leave it alone, but there may be
some value in encouraging sensible defaults....
Both an error message and an exception constitute the safest bet in
terms of:
- potentially attracting developer attention (IF code / test coverage
causes the assertion to fail in development), and
- probability of being handled reasonably by pre-existing "robustness/
correctness" framework of most fault-tolerant, high-availability C++
systems.
At least for production systems, it should be this outer, application-
level code that decides what the ramifications of an error should be.
But, this is only a compromise that many developers will want to
override, some habitually.
> All the std programming text tell us how good it is to use assertions
> (I am thinking of books like OOSC(II)) but the mechanism they describe
> is like cassert every time, i.e turned off in production. I have
> noticed that it is hardly ever used on any C++ projects I have seen.
> It doesn't seem to get much use in C libraries either but does seem to
> have a slightly higher usage there.
>
> Does this mean C++ programmers don't like such a C-like mechanism? Or
> they don't don't like the fact that the assertions are disabled in
> production? Or is there some other reason?
I believe the issues resemble those for testing (and documentation) in
general: many programmers don't take a systematic approach, and most
work under heavy time/productivity pressures (sometimes resulting from
the desire to be as lazy as possible) that preclude doing things the
way they'd say they like ;-). Sometimes they just don't have an end-
to-end model established for clients to easily control the use of
assertions, and don't want to be blamed for any particular default
behaviour.
Tony
--
I'm glad Nevin answered, because I found it pretty odd that the "will go
away" theory has been aired with any seriousness.
Allow me to remark that the real answer may be much more entropic than
the crisp yes/no answer that some want to embrace. Probably the answer
is tilted one way with the experience each of us has at work. One writes
compilers. In that domain it is perfectly sensible to abort compilation
as soon as something gets fishy. In other domains it's preferable to
keep the application running if there's a good likelihood that the
corruption did not affect its core structures. It depends.
My own recent experience is with a batch system (a sentence-level speech
recognizer), so it should be ok to abort compulsively, which is exactly
what I did initially. However, the batch system runs for about 8 hours.
Each batch takes a few minutes to process. On occasion I had
failures I could not explain to myself, but after a few frustrating
half-runs, I found it useful to have assert throw an exception that is
caught at top level. That way one failed batch did not compromise an
8-hour job. Indeed, I had no theoretical backing to think that any
application-wide invariants held after a failed assertion. But
practically I had a pretty good guess that the error has to do with the
working set associated with that particular batch, so throwing that away
and loading the next batch was an acceptable route to take. I could
still draw conclusions after the statistically successful experiment in
addition to cues on where my bugs lie.
So what I'm saying at the end of the day is that the answer to the
multi-faceted questions asked in this thread is an equally multi-faceted
"it depends".
Andrei
--
You need to realize that it's not about you. When the person
or the toaster is using software, it's no good for the program
to toss up its metaphorical hands, decry its creator, and give
up, because then there won't be any toast.
In most cases, the cost of not continuing far outweighs the cost
of blindly plunging on as if things were OK. I'll bet Amazon would
rather be subject to a few stack overflow attacks than not be able
to take orders for a few hours.
It's really simple. You make your production assertions log errors
so that programmers can examine them at leisure to see what might
have gone wrong, but otherwise you do nothing and keep going, unless
you're in a circumstance where shutting down makes more sense. The
radiation therapy machine, yes. The airplane's navigation system,
questionable. The word processor, no way.
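Something along these lines, say (LOGGING_ASSERT and log_error are invented
names; the point is only that the failure is recorded and execution
continues):
---
#include <iostream>

// Stand-in for whatever logging the production system already has.
inline void log_error(const char* msg)
{
    std::cerr << "ASSERTION FAILED: " << msg << '\n';
}

// Records the failure and keeps going; nothing is aborted or thrown.
#define LOGGING_ASSERT(cond)                      \
    do {                                          \
        if (!(cond))                              \
            log_error(#cond " at " __FILE__);     \
    } while (0)
---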
> In article
> <10ce7fc0-d6e1-4801...@a1g2000hsb.googlegroups.com>,
> marlow...@googlemail.com wrote:
>
> whilst others say an exception should be thrown
>> to allow recovery higher up (that's my view).
>
> The problem with throwing an exception is that once the ASSERT detects a
> programming error, the program is in a state that was not predicted by
> the programmer. If the state of the program is not one which was coded
> for, how can one trust that the destructors will all do the right thing
> during stack unwinding?
Furthermore, there's no reason to think that code far from the point
where a bug is detected is in a better position to correct the problem.
Throwing exceptions when preconditions have been violated is usually
just a way of avoiding responsibility for the hard decision about
whether to terminate the program.
--
Dave Abrahams
BoostPro Computing
http://www.boostpro.com
> You made a good effort to back up and uncover the real issues, but
> there's one as-yet-unmentioned linguistic disagreement you didn't deal
> with: about half the people in this thread think of the word "assert" as
> meaning the thing that comes from the <assert.h> / <cassert> headers,
> while the other half have something more general in mind, as in
> something like, "any runtime check for programming errors." Until those
> two concepts are distinguished, I don't think the debate can lead
> anywhere useful.
Well, as the OP, I have to admit that I muddied the waters by talking
about both at the same time. I wanted to talk about the latter but
started off by mentioning that the common implementation of a check
for programmer errors is the assert.h/cassert mechanism. We seem to
have got bogged down talking about that mechanism.
Can we please refocus and talk about what is wanted from a mechanism
that checks for programming errors?
-Andrew Marlow
--
Ah, good point. I had the more general notion of "run-time check for
programming errors" in mind.
Carl
--
> On Aug 5, 2:13 pm, marlow.and...@googlemail.com wrote:
>> Assertions via assert.h (or cassert) are disabled in production and
>> have to be enabled via the NDEBUG macro. This is one reason I don't
>> use the assert macro. I always throw an exception that means a fatal
>> programming error has occurred. [snip]
>
> For the purposes of this post, please consider:
>
> - An "assertion" is a stated condition which is only false if some
> manner of "critical error" has occurred, with an associated (implicit
> or explicit) programmatic action. (If you're already replying to
> argue, read on first).
>
> - A "critical error" is a relative term, necessarily only understood
> in the context of the test. For example, list where pointer to head
> is NULL when size() > 0.
>
> Assertions can then be divided into two types:
>
> - NECESSARY assertions
> - helping provide behaviours defined in the usage specification of a
> software (sub)system
>
> - OPTIONAL assertions
> - situations where:
> - behaviour is undefined (i.e. unsupported inputs), or
> - defined behaviour can not be (or perhaps can not
> necessarily continue to be) reliably provided
>
> These definitions are deliberately kept inclusive
i.e. vague. What you're talking about is inclusive enough to mean
little more than "if statement" does. The power of terminology comes
from its ability to make distinctions. Thus, my definition is crisp:
An assertion is a runtime check for a condition that would indicate a
programming error.
In general, such a condition can not be (reliably) recovered from and
indicates an arbitrary and unknowable amount of brokenness in the
program state. Unrecoverable conditions should be handled with minimal
emergency measures -- the details of which may depend on the application
-- and not by using the same code paths as recoverable errors, which can
be written to execute in an environment of full confidence in the
program state. That's why it's important to distinguish assertions from
other kinds of "error" checks.
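A small sketch of that separation (ParseError and emergency_shutdown are
invented names; the exact emergency measures would be application-specific):
---
#include <cstdio>
#include <cstdlib>
#include <stdexcept>
#include <string>

// Recoverable condition: expected bad input, handled on the normal path.
struct ParseError : std::runtime_error
{
    using std::runtime_error::runtime_error;
};

// Unrecoverable condition: minimal emergency measures, no reliance on
// possibly-broken program state.
[[noreturn]] void emergency_shutdown(const char* what)
{
    std::fputs(what, stderr);
    std::fputc('\n', stderr);
    std::abort();
}

int parse_digit(char c)
{
    if (c < '0' || c > '9')                       // expected, recoverable
        throw ParseError(std::string("not a digit: ") + c);
    int d = c - '0';
    if (d < 0 || d > 9)                           // "cannot happen": a bug
        emergency_shutdown("parse_digit invariant violated");
    return d;
}
---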
--
Dave Abrahams
BoostPro Computing
http://www.boostpro.com
Actually there may well be. Intuitively, by and large the deeper on
the call stack you are, the more state you are dealing with. In main, at
the top level, you have no other state than globals and main's only
locals. The less state there is, the less risk for corruption there is.
Of course, there is always risk that the global heap, the stack, the
global data section, or the code section get corrupted, but
statistically, it is reasonable to think that if you have a large amount
of state that is corrupt in an unpredictable point, reducing that state
by a lot will reduce your risks of corruption as well.
By that reasoning, the farther the point of correction from the point of
assertion, the more credible the correction.
> Throwing exceptions when preconditions have been violated is usually
> just a way of avoiding responsibility for the hard decision about
> whether to terminate the program.
And IMHO that's not a bad thing. Getting back to the batch system I
described in another post, in main() I catch any failed assertion and
proceed to opening and processing a new file. I would have been less
happy if the machine learning library I use decided it's best to abort
the program on my behalf.
(Just to peel off one more level of detail, I could have used an
extralinguistic means such as a shell script to run an executable many
times with different file arguments. The executable would return error
code 1 on failure, and the script would acknowledge the error and
continue looping. In that case all global postprocessing of results must
have taken place outside the executable, e.g. in a different executable
or in the script itself (awkward). Besides, all of this is going on a
cluster where it's somewhat easier and more efficient to run one
straight native binary instead of a shell script that in turns loads a
binary multiple times.)
Andrei
--
A better solution would have been to checkpoint the state before every
batch (and use a monitor script to restart the application from
where it stopped).
Or to run every batch in its own address space.
Both solutions are much more effective (and give greater peace of
mind :) ).
> Indeed, I had no theoretical backing to think that any
> application-wide invariants held after a failed assertion. But
> practically I had a pretty good guess that the error has to do with the
> working set associated with that particular batch, so throwing that away
> and loading the next batch was an acceptable route to take. I could
> still draw conclusions after the statistically successful experiment in
> addition to cues on where my bugs lie.
>
The problems start when your boss asks you to trace why a certain
output has been generated: telling him that it might have popped into
existence because of random corruption is usually a bad idea :)
--
gpd
But it is about the user! It's really all about respecting the user's
work (or life).
- Radiation therapy: agreed. Nuff said.
- The airplane's navigation system (I've worked on embedded airplane
systems) has a bug. At worst, this goes unnoticed because there is no
internal check, and the auto-pilot sets a low course right on a
mountain. At best, the program notices and dies, and a watchdog
mechanism kicks it back alive again in a sound state. Because there
are watchdogs for such conditions, and in the meantime, the second
redundant system did its own part of the work.
- The word processor has a bug. It ignores it because assertions are
off in production code, and you go on typing the best chapters of the
best novel of your life. The document's structure is somehow in a rare
but valid state that the save-to-file code can't handle, and so each
time you save, only the first two paragraphs are saved, the other 99
pages are never written to disk. You'll only notice that the next
morning. Or, there's a good assertion mechanism that notices
something's wrong, tries hard to copy all your text in the clipboard
before dying, letting you know that there was an internal error and
your text was saved in the clipboard.
I'm being the devil's advocate here, of course, and Andrei's perfectly
right in that software is never about absolute truths and one-size-
fits-all solutions, but I feel bad enough about having any software
more useful than Minesweeper in a corrupt state that I really feel
custom, crashing assertions on in production code should be the
default for most new projects.
Cheers,
Carl
> An assertion is a runtime check for a condition that would indicate a
> programming error.
I should add "or a hardware failure," but the rest of my reasoning
applies equally to that situation.
> David Abrahams wrote:
>
>> there's no reason to think that code far from the point
>> where a bug is detected is in a better position to correct the problem.
>
> Actually there may well be. Intuitively, by and large the deeper on
> the call stack you are, the more state you are dealing with. In main,
> at the top level, you have no other state than globals and main's only
> locals.
That's the exact opposite of my intuition. The program's globals (or the
space in a very shallow stack frame -- it amounts to almost the same
thing) lead to all its other state. In a well-written program, the
leaves of the call tree are typically general-purpose functions that
make very few assumptions about the whole program and deal only with the
data accessible via their parameters, while further up the stack you
have functions dealing with larger abstractions such as documents,
graphs, etc., whose state encompasses the state being operated on at the
leaves.
> The less state there is, the less risk for corruption there
> is.
>
> Of course, there is always risk that the global heap, the stack,
> the global data section, or the code section get corrupted, but
> statistically, it is reasonable to think that if you have a large
> amount of state that is corrupt in an unpredictable point, reducing
> that state by a lot will reduce your risks of corruption as well.
>
Agreed, but from my point of view, you've got it upside-down.
> By that reasoning, the farther the point of correction from the point
> of assertion, the more credible the correction.
Sorry, I just can't get it.
>> Throwing exceptions when preconditions have been violated is usually
^^^^^^^
>> just a way of avoiding responsibility for the hard decision about
>> whether to terminate the program.
>
> And IMHO that's not a bad thing. Getting back to the batch system I
> described in another post, in main() I catch any failed assertion and
> proceed to opening and processing a new file. I would have been less
> happy if the machine learning library I use decided it's best to abort
> the program on my behalf.
I am not an expert in the domain of long-running computations, but those
experts I've talked to use checkpointing, as suggested elsewhere in this
thread. I didn't find your argument for continuing to be persuasive,
although I could be missing something -- you left out a lot of detail.
What do the results mean once your state has been corrupted? Are they
even reproducible?
That said, I'm not absolutist about this; I did write "usually." There
are systems where the price of catastrophic failure that might ensue
from continuing with broken state is low enough that robust measures
like redundant systems or checkpointing aren't worth implementing, but
where it would be better to continue than abort. Toys might fall into
that category. There are also systems where you can have reasonable
confidence that broken invariants are confined to a limited area of the
system (behind a process boundary makes a very strong example).
However, if you find yourself making that argument to yourself about
code under your control, it's a good sign you don't really understand
what that code is doing, and you should rewrite it so that you are able
to better reason about its behavior.
<snip description of possible use of shell script to establish process
boundary> ...
> Besides, all of this is going on a cluster where it's somewhat easier
> and more efficient to run one straight native binary instead of a
> shell script that in turns loads a binary multiple times.)
I don't want to presume, but what you're doing, from an architectural
point-of-view, doesn't sound that different from what many other
researchers do with long-running computations. So it seems to me that
there must be plenty of other experience in this area. Did you consult
precedent before deciding how to handle these situations?
--
Dave Abrahams
BoostPro Computing
http://www.boostpro.com
In fact, for long running application, you often need checkpointing
for other reasons anyway, for example because you might need to take
down the application for maintainance, or you might want to remotely
inspect its state; or the full state might even not fit in memory at
all; at that point, using it for failure recovery basically comes for
free.
[...]
> There are also systems where you can have reasonable
> confidence that broken invariants are confined to a limited area of the
> system (behind a process boundary makes a very strong example).
A simple fork(2) (also fancily known as "address space separation")
can help a lot in making an application more robust.
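A POSIX-only sketch of that idea (process_batch and run_all are invented
names; error handling is kept to the bare minimum):
---
#include <sys/wait.h>
#include <unistd.h>
#include <iostream>
#include <string>
#include <vector>

// Stand-in for the real per-batch work; it may assert and abort on a bug.
void process_batch(const std::string& file)
{
    (void)file;
}

void run_all(const std::vector<std::string>& batches)
{
    for (const std::string& file : batches) {
        pid_t pid = fork();
        if (pid < 0) {
            std::cerr << "fork failed for " << file << '\n';
            continue;
        }
        if (pid == 0) {              // child: isolated address space
            process_batch(file);
            _exit(0);
        }
        int status = 0;              // parent: note failures, carry on
        waitpid(pid, &status, 0);
        if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
            std::cerr << "batch failed: " << file << '\n';
    }
}
---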
--
gpd
--
We have a multi threaded video database written in C++ and we did run
into untraceable errors too. Like deadlocks, crashes etc. We decided to
handle severe errors by simply stopping the database service, restarting
it (< 3 seconds), and writing a dump file to be able to trace and fix the
error.
I think it depends on the error, but in a previous version we continued
to record, which led to silent corruption of our database.
> [...]
Andre
This defines an important and qualitatively distinct subset of the
uses to which the term "assertions" is popularly put. Simply
qualifying it in some way... e.g. "program correctness assertion"...
avoids any argument about having excluded other common uses of the
term.
Cross-referencing this with my earlier post, this would be one
category of OPTIONAL assertions, on the basis that programming errors
are not expected to happen, and hence the program would be expected to
act identically with the assertions disabled.
> In general, such a condition can not be (reliably) recovered from and
> indicates an arbitrary and unknowable amount of brokenness in the
> program state. Unrecoverable conditions should be handled with minimal
> emergency measures -- the details of which may depend on the application
> -- and not by using the same code paths as recoverable errors, which can
> be written to execute in an environment of full confidence in the
> program state. That's why it's important to distinguish assertions from
> other kinds of "error" checks.
Good insights and advice for those "in general" scenarios you qualify
it with, but corporate systems frequently stray into the grey areas
outside. There are enough situations where the range of potential
problems (e.g. resource leaks) is well enough understood to make
continuation a worthwhile or even necessary objective. Sometimes
resources can be owned/controlled by the caller, and released if the
subsystem fails, achieving the full confidence in program state you
describe, even if not confidence that the same error won't bite
again. Much of the discussion in this thread revolves around whether
it's worthwhile making this recovery attempt - my opinion is that the
developer must decide case by case based on the business cost of
alternatives, and probabilities: again, grey areas.
Anyway, ultimately I'm saying what you've also said: "minimal
emergency measures -- the details of which may depend on the
application" - though "minimal" is also vague enough to hide
differences of opinion... ;-).
Cheers,
Tony
--
"Better" or "effective" depends on the criteria you choose. My main
criterion was to get some partial results out of a long run without
investing too much. For that matter, there are myriads of better
solutions if the amount of work invested is not an issue. Checkpointing
is a lot more coding for a doubtful benefit. I'd rather focus on fixing
the actual bugs (which I did), in addition to drawing conclusions from
the partial runs.
>> Indeed, I had no theoretical backing to think that any
>> application-wide invariants held after a failed assertion. But
>> practically I had a pretty good guess that the error has to do with the
>> working set associated with that particular batch, so throwing that away
>> and loading the next batch was an acceptable route to take. I could
>> still draw conclusions after the statistically successful experiment in
>> addition to cues on where my bugs lie.
>>
>
> The problems start when your boss asks you to trace why a certain
> output has been generated: telling him that it might have popped into
> existence because of random corruption is usually a bad idea :)
Which emphasizes my earlier point that it all depends. In my statistical
system, getting good results by mistake is practically downright
impossible. So if I get a partially failed run with crappy results, I
just call it a failure and look into fixing my code. If I get a
partially failed run with pretty good results, I have something to think
about after 8 hours of running, in addition to hints for fixing my code.
Good points. There are a lot of cases where there's absolutely no
heap or stack corruption, and the issue triggering the assertion is
something much more mundane: unexpected internal state of variables,
attempted read to an invalid index, unexpected character found during
deserialisation, unhandled case in switch etc. - which is known not to
have any potential adverse effect outside the sub-system. Combined
with classes that release resources as the stack is unwound, there's a
good chance of complete recovery.
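A toy sketch of that combination (SubsystemError, Connection and deserialise
are invented names): when the internal check throws, stack unwinding releases
the resource before the caller handles the failure.
---
#include <iostream>
#include <memory>
#include <stdexcept>

struct SubsystemError : std::logic_error
{
    using std::logic_error::logic_error;
};

struct Connection
{
    ~Connection() { std::cout << "connection closed\n"; }  // runs on unwind
};

void deserialise(char c)
{
    auto conn = std::make_unique<Connection>();
    if (c != 'A' && c != 'B')     // unexpected character: internal check fires
        throw SubsystemError("unexpected character during deserialisation");
    // ... normal processing ...
}

int main()
{
    try {
        deserialise('?');
    } catch (const SubsystemError& e) {
        // By the time we get here the Connection has already been released.
        std::cout << "recovered: " << e.what() << '\n';
    }
}
---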
Tony
Well, what does it mean for textbooks to have errors in them?
It doesn't make the entire textbook valueless. Corrupt state
could mean catastrophic failure, occasional and harmless odd
behavior, or it could be ephemeral and self-repairing. What
does it mean for users of the program for it to abandon what
it is in the middle of doing? That's the consideration.
> However, if you find yourself making that argument to
> yourself about code under your control, it's a good sign
> you don't really understand what that code is doing, and
> you should rewrite it so that you are able to better reason
> about its behavior.
Are you forgetting that we're talking about assertion failure
here? The programmer *has* understood and reasoned about the
behavior of the program. That's why there's an assertion in
the code rather than an if-statement. The problem isn't failure
to reason, it's making a mistake, so that the results of your
reasoning are incorrect, and what happens then.
--
>
> - The word processor has a bug. It ignores it because assertions are
> off in production code, and you go on typing the best chapters of the
> best novel of your life. The document's structure is somehow in a rare
> but valid state that the save-to-file code can't handle, and so each
> time you save, only the first two paragraphs are saved, the other 99
> pages are never written to disk. You'll only notice that the next
> morning. Or, there's a good assertion mechanism that notices
> something's wrong, tries hard to copy all your text in the clipboard
> before dying, letting you know that there was an internal error and
> your text was saved in the clipboard.
>
I can remember a time when the only viable solution to a crash of the WP
program I was using was to switch my computer off so that it could not
attempt to do any sort of cleanup. The OS recovery from an unexpected
'crash' was a lot better than the WP's.
Yes, when my WP goes unstable I want the opportunity to be able to
recover my data, and having the program go on working without bothering
to tell me that it has a problem is not what I want.
I realise that there are time-critical programs where speed of
evaluation is vital, but where it isn't (true for a great number of
applications) I would prefer to see the assert mechanism (whatever way
it is designed to work) left on.
--
Note that robinton.demon.co.uk addresses are no longer valid.
> > And IMHO that's not a bad thing. Getting back to the batch system I
> > described in another post, in main() I catch any failed assertion and
> > proceed to opening and processing a new file. I would have been less
> > happy if the machine learning library I use decided it's best to abort
> > the program on my behalf.
Exactly. The same is true for the server my code was running in.
>
> I am not an expert in the domain of long-running computations, but those
> experts I've talked to use checkpointing, as suggested elsewhere in this
> thread.
Couldn't do that in my particular case.
> I didn't find your argument for continuing to be persuasive,
> although I could be missing something -- you left out a lot of detail.
> What do the results mean once your state has been corrupted? Are they
> even reproducible?
In my case the state corruption was confined to the operation one of
the clients requested. That client gets an error reported (internal
error or something similar), the other clients are allowed to remain
connected and they carry on ok.
> There are also systems where you can have reasonable
> confidence that broken invariants are confined to a limited area of the
> system
That was the case for me.
> However, if you find yourself making that argument to yourself about
> code under your control, it's a good sign you don't really understand
> what that code is doing, and you should rewrite it so that you are able
> to better reason about its behavior.
This was new code and yes, it did contain some programmer errors that
caused assertions to trigger at runtime in production. The function
was being invoked by thousands of users every day. They gave it more
testing than we ever could ;-)
-Andrew Marlow
--
Not that you've said otherwise, but for the sake of general
discussion: while there is a subset of assertions checking for
programming/hardware errors, there are definitely cases where "checks
for programming errors" are not good candidates for optional
assertions, nor for minimal and fatal handling...
Another post provides a useful illustration:
carl.seleb...@gmail.com wrote:
> - The word processor has a bug. It ignores it because assertions are
> off in production code, and you go on typing the best chapters of the
> best novel of your life. The document's structure is somehow in a rare
> but valid state that the save-to-file code can't handle, and so each
> time you save, only the first two paragraphs are saved, the other 99
> pages are never written to disk. You'll only notice that the next
> morning. Or, there's a good assertion mechanism that notices
> something's wrong, tries hard to copy all your text in the clipboard
> before dying, letting you know that there was an internal error and
> your text was saved in the clipboard.
A poor outcome (lost work) might result from the following flawed
reasoning:
- all sanity checks relating to programmatic/hardware error are
assertions,
- all assertions should be turned off in production, and/or
- when assertions are detected, the focus should be on notification &
feedback for the developer rather than preservation of user data
In this case, cheap sanity checks are akin to normal always-included
error handling, despite detecting programmatic error. There's still a
spectrum of potential checks with increasing resource requirements
(coding time, code size, CPU time, memory), such that at design,
compile and run-time certain compromises must be struck, and the on/
off compile-time choices directly supported by cassert may be
inappropriately crude.
Cheers,
Tony
--
>> What do the results mean once your state has been corrupted?
>
> Well, what does it mean for textbooks to have errors in them?
I don't understand the relationship of your question to mine. Anyway,
Andrei answered my question in a reply to someone else.
>> However, if you find yourself making that argument to
>> yourself about code under your control, it's a good sign
>> you don't really understand what that code is doing, and
>> you should rewrite it so that you are able to better reason
>> about its behavior.
>
> Are you forgetting that we're talking about assertion failure
> here?
Hardly.
> The programmer *has* understood and reasoned about the behavior of the
> program.
Not if the assertion fires.
> That's why there's an assertion in the code rather than an
> if-statement. The problem isn't failure to reason, it's making a
> mistake, so that the results of your reasoning are incorrect, and what
> happens then.
A mistake in reasoning == a failure to reason (successfully). I'm not
interested in whether you made an effort but in whether you
succeeded. In my experience, mistakes in reasoning about code behavior
don't just happen at random -- they occur when the design is
inhospitable to comprehension.
--
Dave Abrahams
BoostPro Computing
http://www.boostpro.com
There is a misunderstanding at work somewhere around here. The amount of
state accessible to any given function is not the focus here. It's just
the amount of state, period. This is because a failed assertion could
indicate out-of-bounds access, following a stray pointer or whatnot, so
you can't count on visibility taking care of protecting data for you. Of
course we could define "a well-written" program as a program that has no
such problems, but then one could define a well-written program as one
that does not need assertions in the first place etc.
Anyhow, even as far as visibility is concerned, in main() it is minimal,
then it increases as the worker functions take over, and then indeed it
decreases as specialized functions are being called. So we could say
that falling back to main() or a function on the call stack close to it
reduces the amount of state the program has, and therefore the
risk of that state being corrupted.
>> The less state there is, the less risk for corruption there
>> is.
>>
>> Of course, there is always risk that the global heap, the stack,
>> the global data section, or the code section get corrupted, but
>> statistically, it is reasonable to think that if you have a large
>> amount of state that is corrupt in an unpredictable point, reducing
>> that state by a lot will reduce your risks of corruption as well.
>>
>
> Agreed, but from my point of view, you've got it upside-down.
It's not upside-down even as far as visibility of state is concerned. Few
programs make all or most of their state visible from main or from
globals. So as you walk the stack trace from main() to the top, you'll
see that accessible state first grows then shrinks.
>> By that reasoning, the farther the point of correction from the point
>> of assertion, the more credible the correction.
>
> Sorry, I just can't get it.
See above.
>>> Throwing exceptions when preconditions have been violated is usually
> ^^^^^^^
>>> just a way of avoiding responsibility for the hard decision about
>>> whether to terminate the program.
>> And IMHO that's not a bad thing. Getting back to the batch system I
>> described in another post, in main() I catch any failed assertion and
>> proceed to opening and processing a new file. I would have been less
>> happy if the machine learning library I use decided it's best to abort
>> the program on my behalf.
>
> I am not an expert in the domain of long-running computations, but those
> experts I've talked to use checkpointing, as suggested elsewhere in this
> thread. I didn't find your argument for continuing to be persuasive,
> although I could be missing something -- you left out a lot of detail.
> What do the results mean once your state has been corrupted? Are they
> even reproducible?
Good suggestions could go on forever, as long as the amount of work
involved is not an issue. I myself would rather work on fixing the cause
of the assertion instead of adding checkpointing.
I left out a lot of detail to focus on the main thrust, which is that
sometimes it's good that an assert throws a catchable exception. Yes,
results are meaningful once my state has been corrupted simply because I
happened to get rid of all corrupted state.
At the highest level, a machine learning system must make a prediction of
a label (e.g. word, syllable, part of speech) from some features (e.g.
sound samples, written text). Rinse, lather, repeat millions of times.
So the system eats numbers and utters predictions. You have the true
labels in a separate file for comparison. If the label space is
sufficiently large (e.g. beyond 5-6 labels), it is virtually impossible
for a corrupt system to systematically make better guesses than a
carefully-setup learner. The madman blabbering incoherently may, with a
stretch of imagination, answer something true or deep or interesting
once in a while, but cannot *systematically* do so millions of times.
> That said, I'm not absolutist about this; I did write "usually." There
> are systems where the price of catastrophic failure that might ensue
> from continuing with broken state is low enough that robust measures
> like redundant systems or checkpointing aren't worth implementing, but
> where it would be better to continue than abort. Toys might fall into
> that category. There are also systems where you can have reasonable
> confidence that broken invariants are confined to a limited area of the
> system (behind a process boundary makes a very strong example).
> However, if you find yourself making that argument to yourself about
> code under your control, it's a good sign you don't really understand
> what that code is doing, and you should rewrite it so that you are able
> to better reason about its behavior.
Between calling my dissertation a toy and the social stigma of not
understanding what it is doing, guess I'll choose the former :o).
> <snip description of possible use of shell script to establish process
> boundary> ...
>
>> Besides, all of this is going on a cluster where it's somewhat easier
>> and more efficient to run one straight native binary instead of a
>> shell script that in turn loads a binary multiple times.)
>
> I don't want to presume, but what you're doing, from an architectural
> point-of-view, doesn't sound that different from what many other
> researchers do with long-running computations. So it seems to me that
> there must be plenty of other experience in this area. Did you consult
> precedent before deciding how to handle these situations?
There are 3-4 other labs in the country that do Automatic Speech
Recognition, and there is sharing of such knowledge and code among them.
Andrei
--
Your case makes me think of an important parameter in the discussion:
who owns the data the program works on?
In your case, you're the developer, as well as the owner of the data.
I guess it makes it easier for you to be OK with corrupted results,
because you know how they are generated. But how about all the
projects where the programmer is not the owner of the data? If he's
your office-mate, maybe that's OK too, but what if the distance is
greater than, say, 2 hallways, or $10?
So my feeling is that this is one of the really important facts to
take into account when trying to answer the OP's question (and my
arguments were all given in the context of the opposite situation,
where your customer owns the data).
Cheers,
Carl
Indeed. And that was the case for the app I was working on. It sounds
like it is the case for Andrei's also. But for some people this will
not be the case. So it seems to me that it would be beneficial for a
project to have a facility for doing assertions where it is
runtime-configurable whether it throws AssertionFailed or calls abort.
This gives users of that facility (and of libraries that use the
facility) several abilities:
1) they can catch and handle the error higher up as I do.
2) they can let the error bubble up right to the top and let the app
die, possibly to be restarted by the script that spawned it.
3) they can configure it to let it call abort, which will give them a
core dump and the ability to trace the stack in the debugger.
But IMVHO what you don't want to do is ignore the error and trundle on
in production. I do find it amazing that people seem to want this
though. Perhaps they might be happy with a configurable mode whereby
it takes no notice of the assertion failure. This would be something
that would be checked at runtime, unlike the assert.h/cassert
mechanism. In fact it would be more like the assert mechanism
available in Java.
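Roughly, I have in mind something like the sketch below. The names CHECK,
AssertionFailed and assertMode are just made up for illustration; a real
facility would read the mode from a config file or environment variable.
---
#include <cstdlib>
#include <sstream>
#include <stdexcept>
#include <string>

enum AssertMode { assertThrow, assertAbort, assertIgnore };

struct AssertionFailed : std::logic_error
{
    explicit AssertionFailed(const std::string& what)
        : std::logic_error(what) {}
};

// Chosen once at start-up, e.g. from a config file or environment variable.
inline AssertMode& assertMode()
{
    static AssertMode mode = assertThrow;
    return mode;
}

inline void checkFailed(const char* expr, const char* file, int line)
{
    std::ostringstream msg;
    msg << file << ":" << line << ": assertion failed: " << expr;
    switch (assertMode())
    {
    case assertThrow:   // 1) catch it higher up, or 2) let it bubble to the top
        throw AssertionFailed(msg.str());
    case assertAbort:   // 3) core dump and a stack to poke at in the debugger
        std::abort();
    case assertIgnore:  // take no notice at runtime, as discussed above
        break;
    }
}

// The check itself is always compiled in; only the reaction to a failure
// is decided at runtime, unlike the assert.h/cassert mechanism.
#define CHECK(cond) ((cond) ? (void)0 : checkFailed(#cond, __FILE__, __LINE__))
---
Libraries would then use CHECK throughout, and each application would pick
one of the three behaviours at start-up.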
Regards,
Andrew Marlow
--
As I tried to explain earlier, and as David Abrahams said more
clearly, there are different kinds of asserts. There is the debug-
build-only assert. Its purpose is to be spread liberally in code to
help development and testing.
Then there is the release-build assert. Its purpose is to catch
programming errors that escaped testing.
All release build asserts are also debug build asserts. However, there
are good reasons that not all debug build asserts are release build
asserts. If you liberally spread debug build asserts in your code, as
you should be doing, and keep them in for release build, this could
have a very negative impact on release runtime performance. Thus, when
deciding which debug asserts to also keep in for release mode, you
need to evaluate them on a case by case basis, asking how much will
this hurt performance, how likely is the assert condition to be false,
how bad would it be if the assert condition is false, etc.
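As a rough sketch of that split (the macro names are invented for
illustration, not taken from any particular library):
---
#include <cstdio>
#include <cstdlib>

// RELEASE_ASSERT stays in every build: reserve it for cheap checks whose
// failure would be serious. DEBUG_ASSERT compiles away under NDEBUG, so it
// can be sprinkled liberally, even on hot paths.
#define RELEASE_ASSERT(cond) \
    ((cond) ? (void)0 \
            : (std::fprintf(stderr, "assertion failed: %s (%s:%d)\n", \
                            #cond, __FILE__, __LINE__), \
               std::abort()))

#ifdef NDEBUG
#define DEBUG_ASSERT(cond) ((void)0)
#else
#define DEBUG_ASSERT(cond) RELEASE_ASSERT(cond)
#endif
---
With something like this, DEBUG_ASSERT can guard every invariant in an
inner loop at zero cost in release builds, while RELEASE_ASSERT is kept
for the few checks that earn their runtime cost.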
You have runtime performance, correctness of runtime behaviour, and
quality of testing. With fixed resources, you must make tradeoffs
between these three things. A release build assert is gaining
additional guarantees on the correctness of the program at the cost of
performance.
Also, as other people have said, if keeping all the debug asserts in
release build is not a big performance penalty, you probably aren't
using enough debug asserts.
If you're spending the time to perform the check, and you're inside
some kind of system where wrong behaviour is worse than no behaviour,
then clearly it would be bizarre to completely ignore the assertion.
Several people in this thread have already given examples of cases where
wrong behaviour might be better than no behaviour, e.g.:
- inside a game / prefer graphics of not-entirely-correct shape or
colour to nothing
- If visually noticeable, then someone might consult some logging
generated by the assertion and give the issue more attention.
- tester is arbiter of what's serious, assertions generate super-set
of information on both serious and not-so-serious errors....
- low battery/UPS: unattended save of all files
- best effort better than aborted effort, especially if data
split across several independent files or file sections
What was it the old DOS prompt used to ask... "Abort, Retry,
Ignore...?". Did you ever choose Ignore? ;-)
Tony
> David Abrahams wrote:
>> on Sun Aug 17 2008, Andrei Alexandrescu
>> <SeeWebsiteForEmail-AT-erdani.org> wrote:
>>
>>> David Abrahams wrote:
>>>
>>>> there's no reason to think that code far from the point
>>>> where a bug is detected is in a better position to correct the problem.
>>> Actually there may as well be. Intuitively, by and large the deeper on
>>> the call stack you are, the more state you are dealing with. In main,
>>> at the top level, you have no other state than globals and main's only
>>> locals.
>>
>> That's the exact opposite of my intuition. The program's globals (or the
>> space in a very shallow stack frame -- it amounts to almost the same
>> thing) lead to all its other state.
>> In a well-written program, the
>> leaves of the call tree are typically general-purpose functions that
>> make very few assumptions about the whole program and deal only with the
>> data accessible via their parameters, while further up the stack you
>> have functions dealing with larger abstractions such as documents,
>> graphs, etc., whose state encompasses the state being operated on at the
>> leaves.
>
> There is a misunderstanding at work somewhere around here.
Definitely.
> The amount of state accessible to any given function is not the focus
> here. It's just the amount of state, period.
The amount of state used by the whole program?
> This is because a failed assertion could indicate out-of-bounds
> access, following a stray pointer or whatnot,
Sure
> so you can't count on visibility taking care of protecting data for
> you. Of course we could define "a well-written" program as a program
> that has no such problems, but then one could define a well-written
> program as one that does not need assertions in the first place etc.
Yes, but I can't ignore the fact that unwinding requires potentially a
great deal of code to run that touches the corrupted state, and
that in my experience one of the most common ways to crash is when
something invalid gets passed to a resource-releasing function
(e.g. free), and that the state that got corrupted might *very easily*
include some of the return addresses on the call stack.
> Anyhow, even as far as visibility is concerned, in main() it is
> minimal, then it increases as the worker functions take over,
I don't think I would be willing to make any such general assertion. It
depends how the program is written.
> and then indeed it decreases as specialized functions are being
> called. So we could say that falling back to main() or a function on
> the call stack close to it reduces the amount of state the program is
> having, and therefore the risk of that state being corrupted.
>
> It's not upside-down even as far as visibility of state is concerned. Few
> programs make all or most of their state visible from main or from
> globals. So as you walk the stack trace from main() to the top, you'll
> see that accessible state first grows and then shrinks.
If it isn't main(), IME, there is usually a function one or two calls
away from main() that has the role of establishing storage for the state
of the whole program, and usually that's very close to the catch blocks
used for error reporting.
>>> By that reasoning, the farther the point of correction from the point
>>> of assertion, the more credible the correction.
>>
>> Sorry, I just can't get it.
>
> See above.
OK, now I get it, but I don't buy it :-)
>>>> Throwing exceptions when preconditions have been violated is usually
>> ^^^^^^^
>>>> just a way of avoiding responsibility for the hard decision about
>>>> whether to terminate the program.
>>> And IMHO that's not a bad thing. Getting back to the batch system I
>>> described in another post, in main() I catch any failed assertion and
>>> proceed to opening and processing a new file. I would have been less
>>> happy if the machine learning library I use decided it's best to abort
>>> the program on my behalf.
>>
>> I am not an expert in the domain of long-running computations, but those
>> experts I've talked to use checkpointing, as suggested elsewhere in this
>> thread. I didn't find your argument for continuing to be persuasive,
>> although I could be missing something -- you left out a lot of detail.
>> What do the results mean once your state has been corrupted? Are they
>> even reproducible?
>
> Good suggestions could go on forever, as long as the amount of work
> involved is not an issue. I myself would rather work on fixing the cause
> of the assertion instead of adding checkpointing.
I understand.
> I left out a lot of detail to focus on the main thrust, which is that
> sometimes it's good that an assert throws a catchable exception.
Agreed. My thrust is that such cases are rare.
> Yes, results are meaningful once my state has been corrupted simply
> because I happened to get rid of all corrupted state.
Once state is corrupted, you really can't know for sure, can you?
> At the highest level, a machine learning system must make a prediction of
> a label (e.g. word, syllable, part of speech) from some features (e.g.
> sound samples, written text). Rinse, lather, repeat millions of times.
> So the system eats numbers and utters predictions. You have the true
> labels in a separate file for comparison. If the label space is
> sufficiently large (e.g. beyond 5-6 labels), it is virtually impossible
> for a corrupt system to systematically make better guesses than a
> carefully-setup learner. The madman blabbering incoherently may, with a
> stretch of imagination, answer something true or deep or interesting
> once in a while, but cannot *systematically* do so millions of times.
I understand that, but I don't understand how it makes continuing after
an assertion useful to you. I'm not saying it isn't useful to you; I
just don't get it yet.
>> That said, I'm not absolutist about this; I did write "usually." There
>> are systems where the price of catastrophic failure that might ensue
>> from continuing with broken state is low enough that robust measures
>> like redundant systems or checkpointing aren't worth implementing, but
>> where it would be better to continue than abort. Toys might fall into
>> that category. There are also systems where you can have reasonable
>> confidence that broken invariants are confined to a limited area of the
>> system (behind a process boundary makes a very strong example).
>> However, if you find yourself making that argument to yourself about
>> code under your control, it's a good sign you don't really understand
>> what that code is doing, and you should rewrite it so that you are able
>> to better reason about its behavior.
>
> Between calling my dissertation a toy and the social stigma of not
> understanding what it is doing, guess I'll choose the former :o).
Sorry, a _very_ poor choice of words on my part. I meant "you" in
general, not you in particular, Andrei.
>> I don't want to presume, but what you're doing, from an architectural
>> point-of-view, doesn't sound that different from what many other
>> researchers do with long-running computations. So it seems to me
>> that there must be plenty of other experience in this area. Did you
>> consult precedent before deciding how to handle these situations?
>
> There are 3-4 other labs in the country that do Automatic Speech
> Recognition, and there is sharing of such knowledge and code among
> them.
So your approach to failed assertions is used elsewhere in your field?
--
Dave Abrahams
BoostPro Computing
http://www.boostpro.com
Well, I must admit, this is something I did ignore on the project
where I used the AssertionFailed exception. It seemed to work there
but the project did have an unusual feature that might explain why we
got away with it. The server was written in FORTRAN and only a handful
of new pieces, including our bit, were written in C++, so hardly
anything in the server was using C++. The legacy code we called was
FORTRAN too, so even if there were bugs there they would be most
unlikely to corrupt the heap.
> > and then indeed it decreases as specialized functions are being
> > called. So we could say that falling back to main() or a function on
> > the call stack close to it reduces the amount of state the program is
> > having, and therefore the risk of that state being corrupted.
> >
> > It's not upside-down even as far as visibility of state is concerned. Few
> > programs make all or most of their state visible from main or from
> > globals. So as you walk the stack trace from main() to the top, you'll
> > see that accessible state first grows and then shrinks.
There was a point in our app that we could always conveniently fall
back to: the entry point for our function. By catching and reporting
anything there we could issue an error ('an internal error has occurred,
please contact support') and still let the server continue to run.
It's true that there wasn't much state at that point to give any context
on what had gone wrong, but we used our logger to take care of that.
The logger would record the assertion failure, and in development it
would record much more as part of the general logging we did (lots,
which was disabled in production for performance reasons).
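In outline it looked something like this sketch (the names and details
here are invented, not the real code):
---
#include <cstdio>
#include <stdexcept>
#include <string>

// A sketch with made-up names and details, not the real code.
struct AssertionFailed : std::runtime_error
{
    explicit AssertionFailed(const std::string& what)
        : std::runtime_error(what) {}
};

// Stand-in for the real work our C++ piece did inside the FORTRAN server.
int doTheRealWork()
{
    throw AssertionFailed("widget count went negative");  // made-up failure
}

// The one place we could always fall back to.
int entryPoint()
{
    try
    {
        return doTheRealWork();
    }
    catch (const AssertionFailed& e)
    {
        // Log enough detail to reproduce the problem later with full trace
        // turned on, report a generic message, and let the server carry on.
        std::fprintf(stderr, "assertion failure: %s\n", e.what());
        std::fprintf(stderr,
            "an internal error has occurred, please contact support\n");
        return -1;
    }
}
---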
> >> I am not an expert in the domain of long-running computations, but those
> >> experts I've talked to use checkpointing, as suggested elsewhere in this
> >> thread. I didn't find your argument for continuing to be persuasive,
> >> although I could be missing something -- you left out a lot of detail.
> >> What do the results mean once your state has been corrupted? Are they
> >> even reproducible?
> >
> > Good suggestions could go on forever, as long as the amount of work
> > involved is not an issue. I myself would rather work on fixing the cause
> > of the assertion instead of adding checkpointing.
>
> I understand.
What we did was to use detailed logging during our computations. When
an AssertionFailed was thrown in production we would have some
information from its log entry, which we would use in conjunction with
the user's error report to try to reproduce the problem. Once we had
reproduced it in production we would repeat the same steps in
development, where the logger had full trace turned on. This approach
enabled us to find and fix problems quickly.
> > I left out a lot of detail to focus on the main thrust, which is that
> > sometimes it's good that an assert throws a catchable exception.
>
> Agreed. My thrust is that such cases are rare.
Maybe my app is a special case then :-)
> > Yes, results are meaningful once my state has been corrupted simply
> > because I happened to get rid of all corrupted state.
>
> Once state is corrupted, you really can't know for sure, can you?
With the FORTRAN legacy we had we could be pretty sure.
> I understand that, but I don't understand how it makes continuing after
> an assertion useful to you. I'm not saying it isn't useful to you; I
> just don't get it yet.
Hopefully you will see now why it was useful for my case.
Regards,
Andrew Marlow
--