
Interesting thread in comp.lang.eiffel


Pascal Obry

Jun 21, 2000
The thread is named:
"Would Eiffel have avoided the $500 million mistake ?"

(talking about Ariane 5 crash)

This is raising issues like exception, unhandled exceptions,
Eiffel, DbC (Design By Contract)...

Anyway, anybody with a good understanding of the Ariane 5 first
flight could enlighten the thread...

Pascal.

--

--|------------------------------------------------------
--| Pascal Obry Team-Ada Member
--| 45, rue Gabriel Peri - 78114 Magny Les Hameaux FRANCE
--|------------------------------------------------------
--| http://perso.wanadoo.fr/pascal.obry
--|
--| "The best way to travel is by means of imagination"


Marin D. Condic

Jun 21, 2000
Pascal Obry wrote:
>
> The thread is named:
> "Would Eiffel have avoided the $500 million mistake ?"
>
> (talking about Ariane 5 crash)
>
Is this *still* being debated? This was a fairly major firestorm a
couple of years ago and I would have hoped it had managed to die. The
problem with this disaster was not the language used to program the
system. It was a failure to test the software in a new environment.
Hopefully, those who want to advocate Eiffel will drop this one and move
to more convincing arguments.

MDC
--
======================================================================
Marin David Condic - Quadrus Corporation - http://www.quadruscorp.com/
Send Replies To: m c o n d i c @ q u a d r u s c o r p . c o m
Visit my web site at: http://www.mcondic.com/

"Some people think programming Windows is like nailing jello to the
ceiling... easy with the right kind of nails."

-- Ivor Horton - Beginning Visual C++ 6
======================================================================

Ken Garlington

Jun 21, 2000
"Marin D. Condic" <mcondic...@acm.com> wrote in message
news:3950B2A3...@acm.com...

> Pascal Obry wrote:
> >
> > The thread is named:
> > "Would Eiffel have avoided the $500 million mistake ?"
> >
> > (talking about Ariane 5 crash)
> >
> Is this *still* being debated? This was a fairly major firestorm a
> couple of years ago and I would have hoped it had managed to die. The
> problem with this disaster was not the language used to program the
> system. It was a failure to test the software in a new environment.
> Hopefully, those who want to advocate Eiffel will drop this one and move
> to more convincing arguments.
>
> MDC

Well, that explains the recent e-mail traffic I suddenly started receiving
about my Ariane pseudo-paper. For the record, here's my opinion on the whole
thing:

http://www.flash.net/~kennieg/ariane.html

Bobby D. Bryant

Jun 21, 2000
Pascal Obry wrote:

> Anyway, anybody with a good understanding of the Ariane 5 first
> flight could enlighten the thread...

From what I gathered, they reused a mechanical part, and since the
software for that part had been working perfectly, they reused the
software as well. Unfortunately, the more powerful booster on the A5
put the rocket out of the performance envelope that the original
software was designed for, so the software barfed.

If there is a programming language that is immune to *that* kind of
problem, I'm eager to learn it!

Bobby Bryant
Austin, Texas

Hyman Rosen

Jun 22, 2000
"Bobby D. Bryant" <bdbr...@mail.utexas.edu> writes:
> If there is a programming language that is immune to *that* kind of
> problem, I'm eager to learn it!

There is a legitimate question of whether runtime error checking
is appropriate for production code. That is, when code inspection
has determined (perhaps erroneously) that a certain error state
cannot occur, is there nevertheless value in coding a check for
this condition (perhaps automatically, by the compiler) for use
in production, not test, code?

Laurent Guerby

Jun 22, 2000

I think "production code" is too vague to conclude on the issue.

In the Ariane case, this is embedded systems you have no chance to
update at all. In the Ariane case, the check made it obvious of what
went wrong, it of course didn't save the rocket ;-).

For software to be used interactively (and where support from the
developper is immediately available for example), I feel that removing
all checks for releases is just a mistake. Better have a message with
"internal error: CONSTRAINT_ERROR at foo.adb:666" than wrong results
or complete application instability due to memory corruption later on.
There are practical cases were it is better to have no result (an
error message) than wrong results!
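
(A minimal sketch of what I mean, in Ada -- the names and the failure
inside Handle_One_Request are invented for the example:)

   with Ada.Text_IO;
   with Ada.Exceptions;

   procedure Demo is

      procedure Handle_One_Request is
         N : Positive := 1;
      begin
         N := N - 1;  -- 0 is out of Positive's range: Constraint_Error
      end Handle_One_Request;

   begin
      begin
         Handle_One_Request;
      exception
         when E : others =>
            -- Report the internal error instead of the result...
            Ada.Text_IO.Put_Line
              ("internal error: "
               & Ada.Exceptions.Exception_Information (E));
      end;
      -- ...but the application itself keeps running.
      Ada.Text_IO.Put_Line ("still alive");
   end Demo;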

--
Laurent Guerby <gue...@acm.org>

Hyman Rosen

Jun 22, 2000
Laurent Guerby <gue...@acm.org> writes:
> For software to be used interactively (and where support from the
> developer is immediately available, for example), I feel that removing
> all checks for releases is just a mistake. Better to have a message with
> "internal error: CONSTRAINT_ERROR at foo.adb:666" than wrong results
> or complete application instability due to memory corruption later on.
> There are practical cases where it is better to have no result (an
> error message) than wrong results!

That's questionable. I can think of many applications where limping
along with damage is better than termination, since very often the
damage is not so severe that work in progress can't be saved. I
have had games abruptly die with an assertion failure. Perhaps the
programmer felt morally upright, but it was terribly annoying losing
a section of valuable progress.

Also, Robert Dewar has pointed out that Ada lends itself to a style
of programming in which buffer overflows are a priori unlikely, even
with error checking disabled, so errors are less likely to corrupt
memory in Ada than in C, say.

Laurent Guerby

Jun 22, 2000
Hyman Rosen <hy...@prolifics.com> writes:
> That's questionable. I can think of many applications where limping
> along with damage is better than termination, since very often the
> damage is not so severe that work in progress can't be saved. I
> have had games abruptly die with an assertion failure. Perhaps the
> programmer felt morally upright, but it was terribly annoying losing
> a section of valuable progress.

Well, in your game example, the same bug, unnoticed, could have garbled
your save file later on. A real-world example here is the various
Microsoft product data files that were saved successfully but cannot be
read afterwards...

Anyway, I was thinking of applications where the internal error is
reported instead of the result, but the application doesn't fail.

> Also, Robert Dewar has pointed out that Ada lends itself to a style
> of programming in which buffer overflows are a priori unlikely, even
> with error checking disabled, so errors are less likely to corrupt
> memory in Ada than in C, say.

Hum, a bit weak to me. Looks like the "real programmer never makes
mistakes, so why bother checking" thing. The idea behind Ada is that
human beings are not real programmers ;-).

--
Laurent Guerby <gue...@acm.org>

Hyman Rosen

Jun 22, 2000
Laurent Guerby <gue...@acm.org> writes:
> Hum, a bit weak to me. Looks like the "real programmer never makes
> mistakes, so why bother checking" thing. The idea behind Ada is that
> human beings are not real programmers ;-).

I believe he meant things like index variables which are declared
to have the range of the array they are indexing, so that they
can never access outside of the array.

Never mind, though. In this case, the compiler would presumably
elide the test anyway.
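
(A minimal sketch of that style -- my example, not Dewar's:)

   procedure Sum_Demo is
      type Index is range 1 .. 10;
      type Vector is array (Index) of Integer;
      V     : Vector  := (others => 1);
      Total : Integer := 0;
   begin
      --  I is declared by the loop with V's own index range, so it
      --  can never denote a position outside the array; the compiler
      --  can elide the index check even when checking is enabled.
      for I in V'Range loop
         Total := Total + V (I);
      end loop;
   end Sum_Demo;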

Tarjei T. Jensen

Jun 23, 2000

Hyman Rosen wrote in message ...

>I can think of many applications where limping
>along with damage is better than termination, since very often the
>damage is not so severe that work in progress can't be saved. I
>have had games abruptly die with an assertion failure. Perhaps the
>programmer felt morally upright, but it was terribly annoying losing
>a section of valuable progress.


One of the points of run-time checking is that you can detect errors and use
that information to recover. C++-style assertion failures are not examples of
good programming. The Ada community should be able to do better.


Greetings,

Marin D. Condic

Jun 23, 2000
Hyman Rosen wrote:
>
> "Bobby D. Bryant" <bdbr...@mail.utexas.edu> writes:
> > If there is a programming language that is immune to *that* kind of
> > problem, I'm eager to learn it!
>
> There is a legitimate question of whether runtime error checking
> is appropriate for production code. That is, when code inspection
> has determined (perhaps erroneously) that a certain error state
> can not occur, is there nevertheless value in coding a check for
> this condition (perhaps automatically, by the compiler) for use
> in production, not test, code.

This may make an interesting debate for those who are developing on
workstations, etc. In the original Ariane 5 disaster, they were using a
Mil-Std-1750a microprocessor and were absolutely up against the wall in
terms of throughput. They *had* to remove the runtime checks or the
system would not have functioned at all. (This is not uncommon in hard,
real-time systems - especially when running with very slow processors
and very high CPU utilizations.)

And as has been pointed out in the past, when building these sorts of
hard, real-time, embedded systems with high reliability requirements,
you don't test code with one set of compiler switches, then change the
compiler switches to make production code. (possible exception: smoke
testing) What you do is compile one specific pattern of bits, then
validate that pattern of bits to be as sure as you can that they are
correct. If you change so much as a single bit, you can no longer
guarantee that your validation is any good.

Within the context of the IRS for Ariane 5, programming is a whole
different world than what most people are used to. What makes perfectly
good sense in most environments becomes problematic in this world.

Robert A Duff

Jun 23, 2000
"Marin D. Condic" <mcondic...@acm.com> writes:

> And as has been pointed out in the past, when building these sorts of
> hard, real-time, embedded systems with high reliability requirements,
> you don't test code with one set of compiler switches, then change the

> compiler switches to make production code. ...

You don't (or shouldn't) do that for desktop workstation code, either.
If you're going to change the compiler switches for production code,
you have to run your test suite again. How is that different for
hard real-time blah blah blah? For *either* type of system, you should
test the exact bits that get shipped.

- Bob

Hyman Rosen

Jun 23, 2000
"Tarjei T. Jensen" <tarjei...@kvaerner.com> writes:
> One of the points of run-time checking is that you can detect errors and use
> that information to recover. C++-style assertion failures are not examples of
> good programming. The Ada community should be able to do better.

But once you've decided that an error can't happen, you are not
going to have recovery code to handle that error. In that case,
it may be better to not catch the error, in the hope that its
effects will be local and damped out.

mjs...@my-deja.com

Jun 23, 2000
In article <t7bt0si...@calumny.jyacc.com>,

But you can never decide an error can't happen -- all you can decide is
that you can't get a *software* error. Since the hardware is a
physical entity you can't make such a guarantee for it. The Ariane
folks took the view that any unhandled exception was the result of a
hardware error, and then chose the (rather extreme) response of
shutting off the hardware and letting the (presumably unaffected)
backup take over. One of the (obvious?) recommendations of the
investigation was that modules should always attempt to output best-
effort results, which I interpret to mean, among other things, that
exceptions should be handled much more locally. In the Ariane case
this would probably have meant adding extra hardware, since the
processors were working at their allowed limits, but then $500 Million
can buy a lot of processors...

Mike


Sent via Deja.com http://www.deja.com/
Before you buy.

Pat Rogers

Jun 23, 2000
"Hyman Rosen" <hy...@prolifics.com> wrote in message
news:t7bt0si...@calumny.jyacc.com...

> "Tarjei T. Jensen" <tarjei...@kvaerner.com> writes:
> > One of the points of run-time checking is that you can detect
> > errors and use that information to recover. C++-style assertion
> > failures are not examples of good programming. The Ada community
> > should be able to do better.
>
> But once you've decided that an error can't happen, you are not
> going to have recovery code to handle that error. In that case,
> it may be better to not catch the error, in the hope that its
> effects will be local and damped out.

IMHO, for safety-critical applications the overall architecture should
have diverse backups at whatever levels make sense. At those levels,
and perhaps only at those levels, exceptions are a clean way of
detecting those errors that should not have been possible. Errors
are indeed possible, even when they have been proven impossible,
because of two things: specification errors and environmental stimuli
(e.g. radiation-induced bit toggling).

Specifically, I favor allowing exceptions within the reusable fault
recovery mechanism itself (e.g. recovery blocks) and within the upper
levels of the application where it interacts with the fault recovery
mechanism.

If both the proof of impossibility *and* no recovery (especially from
software faults) is intended, then I can see the argument for turning
off exceptions.

--
Pat Rogers Consulting and Training in:
http://www.classwide.com Deadline Schedulability Analysis
pro...@classwide.com Software Fault Tolerance
(281)648-3165 Real-Time/OO Languages

Pat Rogers

Jun 23, 2000
I dropped a couple of words in the editing...

"Pat Rogers" <pro...@NOclasswideSPAM.com> wrote in message
news:TNM45.18$Hr4....@nnrp1.sbc.net...

I meant to say:

If both the proof of impossibility is performed *and* no recovery

Jeff Carter

Jun 23, 2000
"Marin D. Condic" wrote:
> This may make an interesting debate for those who are developing on
> workstations, etc. In the original Ariane 5 disaster, they were using a
> Mil-Std-1750a microprocessor and were absolutely up against the wall in
> terms of throughput. They *had* to remove the runtime checks or the
> system would not have functioned at all. (This is not uncommon in hard,
> real-time systems - especially when running with very slow processors
> and very high CPU utilizations.)

No, the runtime check was left in, and the exception was handled, but
incorrectly. They thought they had proved no software errors were
possible, so they treated all exceptions as hardware errors.

Yet another reason to leave runtime checks in and handle them
intelligently, even when you've proved that the exceptions cannot occur.

--
Jeff Carter
"You tiny-brained wipers of other people's bottoms!"
Monty Python & the Holy Grail

tmo...@bix.com

Jun 23, 2000
>Within the context of the IRS for Ariane 5, programming is a whole
>different world than what most people are used to.
Are there textbooks and CompSci courses on reliable programming?
There seem to be a lot of people around who believe it's impossible
to do any better than, say, Windows, and there's very little mention
of counterexamples, much less how to build them.

Marin D. Condic

Jun 24, 2000
Robert A Duff wrote:
> You don't (or shouldn't) do that for desktop workstation code, either.
> If you're going to change the compiler switches for production code,
> you have to run your test suite again. How is that different for
> hard real-time blah blah blah? For *either* type of system, you should
> test the exact bits that get shipped.
>
It's different because if I build, say, a word processor for a
PC/Workstation and it crashes because of some obscure compiler bug (high
optimization maybe generating incorrect code for something that worked
correctly with all debug options set) I don't blow up a rocket or kill
hundreds of people. I just lose some typing and have to restart the app.

Perhaps you took my meaning wrong. Yes, there are safety critical
applications that run on workstations and therefore they may have test
requirements that are just as stringent as for real-time, embedded
computers. But on a workstation, you typically have quite a bit of
processor headroom and so leaving runtime checks or other debug options
on may not matter. Usually, these sort of workstation apps aren't tested
with nearly the rigor that typically is done for an embedded device.
(Yes, I work on these sorts of apps too and have seen many other
development efforts of this type in different organizations. I have not
yet seen anyone test these apps as thoroughly as I've seen engine
controls tested. That doesn't mean it is not done anywhere - just that
if it is, it isn't "normal practice".)

For many applications that are done on workstations (maybe the bulk?)
we're talking about things that if they fail, it's no big deal. (Witness
Micro$oft) Commercial products don't want to fail in the field, but lots
of stuff is built to be used in-house where if it crashes, there's a
techie somewhere, around whose neck you can get your fingers. Often the
testing on this sort of stuff is very minimal. Changing some switches on
a compiler of known quality to eliminate debugging info or to improve
optimization for a production release is not only common, but probably
not a "bad" thing.

Marin D. Condic

Jun 24, 2000
Jeff Carter wrote:
> No, the runtime check was left in, and the exception was handled, but
> incorrectly. They thought they had proved no software errors were
> possible, so they treated all exceptions as hardware errors.
>
I may be getting senile and am not remembering correctly, so you can go
back and re-read the report if you like and prove me wrong. My
recollection was that the fault occurred in a subprogram which was
originally found to execute too slowly to be workable unless they
removed all the runtime checking. The Ariane-4 flight profile was
analyzed and it was determined that this could never be the case there -
hence it was safe to turn off the checks. In the process of converting a
floating point number to an integer, they triggered a hardware overflow
interrupt. The interrupt handler was designed on the assumption that
because no incorrect computations were going to happen, any overflow
must be the fault of a bad sensor or other hardware problem. (Or at
least this was the most likely cause.) Hence, the accommodation was to
shut down the bad channel and transfer control to the other side.

In the sense of Ada runtime checks - there were none in the code in
question. In the sense of "general runtime checks" there was a hardware
check in place and an accommodation set up to handle it.
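
(In Ada terms, the situation might be sketched like this -- the names
and values are my invention, not the actual flight code:)

   with Interfaces; use Interfaces;

   procedure Convert_Bias is
      pragma Suppress (All_Checks);  --  removed to meet throughput

      Horizontal_Bias : Float := 40_000.0;  --  fine on Ariane 4, not 5
      BH              : Integer_16;         --  16-bit flight variable
   begin
      --  With checks on, this conversion would raise Constraint_Error;
      --  with them suppressed it is erroneous, and on the 1750a it
      --  ended in a hardware overflow trap handled as a channel failure.
      BH := Integer_16 (Horizontal_Bias);
   end Convert_Bias;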

I may be wrong about this, so go ahead and check me on it. I was
building a rocket engine control at the time this happened and we were
using the same microprocessor and the same language. Immediately after
the blowup, I had a bunch of nervous engineers at LockMart flogging me
for weeks to demonstrate that our system couldn't do the same thing. I
still have the whip marks on my back, so I think it was a pretty
memorable experience.

> Yet another reason to leave runtime checks in and handle them
> intelligently, even when you've proved that the exceptions cannot occur.
>

A) It couldn't be done in this particular instance because of throughput
constraints. B) In a sense there *were* runtime checks, but the
accommodation was not correct for the actual failure.

Failure detection is one part of the FD&A problem. That was, in fact,
done. The "A" part (accommodation) was another matter. Having done my
fair share of FD&A analysese on various systems, I can say this: You can
never come up with the perfect accommodation because you can never be
sure of the exact cause and "correct" response for every possible case.
You're stuck with the fact that one accommodation may be right for one
cause, but for another cause (or path) the accommodation may be exactly
the wrong thing to do. You can only hope that you discover the most
probable cause(s) and provide the most generally correct accommodation.

Marin D. Condic

Jun 24, 2000
tmo...@bix.com wrote:
> Are there textbooks and CompSci courses on reliable programming?

I'm sure there are, but offhand I don't have a bibliography - just field
experience. It might be a useful thing to post at Adapower.

> There seem to be a lot of people around who believe it's impossible
> to do any better than, say, Windows, and there's very little mention
> of counterexamples, much less how to build them.

Well, I think there are lots of examples flying around in the sky and
keeping hearts beating! :-) Of course the cost of insuring reliability
is considerable, so for lots of systems, it isn't done. I'm sure we
could do a better job than most Windows apps or Windows itself, just by
using Ada (or similar languages) because of the compile time checks. It
would definitely kick it up a notch. Could we achieve a "Failure Is Not
An Option!" level of reliability for most apps? Probably, it wouldn't be
worth the cost. Better languages and better designs would likely improve
things quite a bit.

Jeff Carter

Jun 24, 2000
In article <3954C8A9...@acm.com>,

The error was in a system called the IRS.

The inquiry report talks about "exceptions" and "exception handlers"; it also
describes "protecting" other conversions from floating-point to integer. I
have interpreted this (perhaps incorrectly) to mean that some conversions had
additional code around them to ensure that the conversions could not fail,
including special processing when the floating-point values did exceed the
range of the integer type. Other conversions (including the horizontal-bias
value that caused the problem) did not have such protection due to throughput
constraints and proofs that out-of-range values could not occur (for
Ariane-4, of course). In the absence of such protection, the conversion's
runtime check caused an exception that was handled by an exception handler.
The specification for the exception handler was that all exceptions were
hardware failures and should be handled by shutting down the IRS.
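
(A hypothetical sketch of what such "protection" might look like in
Ada; the saturation policy is my guess, not something stated in the
report:)

   with Interfaces; use Interfaces;

   --  Convert a float to a 16-bit integer without ever failing:
   --  out-of-range inputs are clamped to the nearest representable
   --  value, a "best effort" result.
   function Protected_Convert (F : Float) return Integer_16 is
   begin
      if F >= Float (Integer_16'Last) then
         return Integer_16'Last;
      elsif F <= Float (Integer_16'First) then
         return Integer_16'First;
      else
         return Integer_16 (F);
      end if;
   end Protected_Convert;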

Perhaps I am incorrect in interpreting these as Ada exceptions, caused by
runtime checks, and Ada exception handlers. The report does not state
explicitly what it means by "exception" and "exception handler".

I knew someone at LM who was working on Titan SW at the time who was
confident that their system integration testing would detect any such error,
and the inquiry report states that such testing would have detected the
Ariane-5 error.

>
> > Yet another reason to leave runtime checks in and handle them
> > intelligently, even when you've proved that the exceptions cannot occur.
> >
> A) It couldn't be done in this particular instance because of throughput
> constraints. B) In a sense there *were* runtime checks, but the
> accommodation was not correct for the actual failure.
>
> Failure detection is one part of the FD&A problem. That was, in fact,
> done. The "A" part (accommodation) was another matter. Having done my
> fair share of FD&A analysese on various systems, I can say this: You can
> never come up with the perfect accommodation because you can never be
> sure of the exact cause and "correct" response for every possible case.
> You're stuck with the fact that one accommodation may be right for one
> cause, but for another cause (or path) the accommodation may be exactly
> the wrong thing to do. You can only hope that you discover the most
> probable cause(s) and provide the most generally correct accommodation.

Devising ways to handle exceptions intelligently is not necessarily (or
usually) easy. The inquiry report did suggest having the exception handlers
allow essential systems to continue providing "best effort" values rather
than simply shutting down, but did not state how this could have been done
for the case in question. It is hard to see how such an exception handler
could discriminate between a SW design error and the random hardware failures
the SW was designed to handle. In the former case, it should provide "best
effort" values (since the backup has the same error), while in the latter, it
should switch to the backup system.

Certainly there are situations in which runtime checks should not be on.
However, the Ariane 5 had 2 IRS systems, and used exception handling to
switch from the main to the backup IRS, because it assumed that all
exceptions were due to hardware failures. Some sort of runtime checking is
necessary to cause the exceptions to allow the exception handler to make the
switch.

--
Jeff Carter
"Now go away or I shall taunt you a second time."
-- Monty Python and the Holy Grail

Marin D. Condic

Jun 24, 2000
Jeff Carter wrote:
>
> Perhaps I am incorrect as interpreting these as Ada exceptions, caused by
> runtime checks, and Ada exception handlers. The report does not state
> explicitly what it means by "exception" and "exception handler".
>
My recollection of the report was that it was rather badly written from
the perspective of an Ada programmer. Much terminology was misused or
imprecise. The author(s) were most likely system people who maybe had a
passing knowledge of programming, but were not versed specifically in
Ada or the 1750a microprocessor, so their phraseology was bad. My
understanding at the time was that the code that would normally have
range-checked the result and raised Constraint_Error was suppressed -
presumably it would have gone off to some exception handler that might
have performed some kind of recovery. Instead, it hit a hardware
overflow, which trapped to an interrupt handler, the behavior of which
was programmed to shut down the channel and transfer control to the
other side.

I have not looked at the report in a very long time and my memory of
what transpired could be flawed, but this is my recollection of the
actual cause from all the data sources I had at that time.

Bertran...@eiffel.com

Jun 26, 2000
The point made in the IEEE Computer article by Jean-Marc Jezequel
and myself (see
http://www.eiffel.com/doc/manuals/technology/contract/ariane)
is not that using Eiffel would by itself have avoided the bug.
Such a statement wouldn't make much sense (it's obvious that
in any general-purpose programming language you can mess things up).

But we do make the point that in the Design by Contract
culture pervasive in Eiffel, and directly supported
by the language (including its inheritance mechanism,
exception handling, documentation tools, debugging tools...)
the standard and almost obligatory way of reusing a module is
through its contract, including its conditions of use (precondition).
The normal procedure is to document a routine through its
contract and, when reusing it, to check that every call satisfies
the contract. In a software development culture that has assimilated
and integrated the principles of Design by Contract
the first task of quality assurance is to check that
every call takes the contracts into account. In the Ariane case
this wouldn't even have required a test; just static inspection.
Any organization that used Eiffel and had even a minimal
quality assurance process adapted to the Eiffel technology
would have done that, even if the quality assurance process
were otherwise imperfect. (That's the important point:
obviously, if the QA process is perfect, assuming that's
possible, it will catch errors. The important question is
how a serious but possibly imperfect process, routinely
associated with a certain technology, will perform.)

So the gist of the argument in our article is: a standard
Eiffel practice would most likely have caught the problem.

What's interesting is not a claim of the form "If they had done
X they would have avoided it". This is true of many `X' and,
after the fact, it's easy by definition to come up with an `X'
since we know what the error was. What's interesting is that
the actual X mentioned above -- checking that contracts are
satisfied -- is an integral part of the Eiffel method, routinely
applied, not an ex-post-facto suggestion based on anything
Ariane-specific.

It's also quite easy and cheap to do. In contrast, the recommendations
of the commission that examined the crash and published the report
involved "more of the same" quality assurance techniques -- more
management, more tests -- leading to ever higher costs, without
the guarantee that systemic problems such as not requiring
and enforcing explicit contracts won't again lead to similar
results, however many millions of dollars are spent on better
management practices and more extensive testing.

Many of the same observations, by the way, apply to the September
1999 loss of the Mars Climate Orbiter (see
http://mars.jpl.nasa.gov/msp98/news/mco990930.html).

So, no, use of Eiffel doesn't guarantee you won't make errors,
but it certainly helps avoid a large class of common and
damaging errors.

--
Bertrand Meyer
Interactive Software Engineering
ISE Building, 2nd Floor, 270 Storke Road Goleta CA 93117
Phone 805-685-1006, Fax 805-685-6869, http://eiffel.com

David Gillon

Jun 26, 2000

Bertran...@eiffel.com wrote:
> But we do make the point that in the Design by Contract
> culture pervasive in Eiffel, and directly supported
> by the language (including its inheritance mechanism,
> exception handling, documentation tools, debugging tools...)
> the standard and almost obligatory way of reusing a module is
> through its contract, including its conditions of use (precondition).

But in the Ariane 501 case the reuse was at the LRU level, not software.
They treated the Inertial Reference System as a COTS plug-in, assuming
the Ariane 4 design requirements (contract?) remained valid and did not
need to be re-evaluated. The systems analysis failure which allowed this
was outside of the IRS development, meaning that Eiffel would only make
a difference if applied across the entire Ariane 5 system development.
Not knowing Eiffel, I'm unsure how well its DBC facilities would cope
with requirements flow down across completely different systems -- if
the interface requirements specification must be updated manually in
such cases then the historical failure can still occur in precisely the
same manner.

> The normal procedure is to document a routine through its
> contract and, when reusing it, to check that every call satisfies
> the contract.

But how well does this scale to cover re-use of independently developed
LRUs?

> In a software development culture

It's probably a mistake to think of a programme on the scale of Ariane 5
as having a single engineering culture. I'm not familiar enough with the
programme to know whether the IRS was sourced internally to Arianespace
or from an outside subcontractor, but even for the internal case the
development of IRS interface requirements, IRS requirements and IRS
software might well be separated across different sites, even countries.
We do know that the initial development for Ariane 4 and the
reassessment for Ariane 5 are separated by a space of years, so the
decision to use the system unchanged likely involved a largely different
set of engineers.

> the first task of quality assurance is to check that
> every call takes the contracts into account. In the Ariane case
> this wouldn't even have required a test; just static inspection.

I'm not certain this would have made a difference. The error seems to
have been an assumption that, in Eiffel terms, the contract was
unchanged, and so did not need to be reassessed. They likely had
processes that might have detected the error if invoked, but time and
cost pressure caused them to make the assumption that they were not
needed in this case.



> So the gist of the argument in our article is: a standard
> Eiffel practice would most likely have caught the problem.

If it had been applied. But even if such a process was in place, would
it have been applied in this case? Or would the decision have been taken
that it didn't need to be applied? I suspect the error and its avoidance
is fundamentally a human factors issue, rather than an engineering one.

> Many of the same observations, by the way, apply to the September
> 1999 loss of the Mars Climate Orbiter (see
> http://mars.jpl.nasa.gov/msp98/news/mco990930.html).

Including that this is apparently a failure in the interface
requirements specification between two separately developed processes,
not a simple invocation of one routine from another.

--

David Gillon

Gisle Sælensminde

Jun 26, 2000
In article <8j67p8$afd$1...@nnrp1.deja.com>, Bertran...@eiffel.com wrote:
>The point made in the IEEE Computer article by Jean-Marc Jezequel
>and myself (see
>http://www.eiffel.com/doc/manuals/technology/contract/ariane)
>is not that using Eiffel would by itself have avoided the bug.
>Such a statement wouldn't make much sense (it's obvious that
>in any general-purpose programming language you can mess things up).
>
>But we do make the point that in the Design by Contract
>culture pervasive in Eiffel, and directly supported
>by the language (including its inheritance mechanism,
>exception handling, documentation tools, debugging tools...)
>the standard and almost obligatory way of reusing a module is
>through its contract, including its conditions of use (precondition).
>The normal procedure is to document a routine through its
>contract and, when reusing it, to check that every call satisfies
>the contract. In a software development culture that has assimilated
>and integrated the principles of Design by Contract
>the first task of quality assurance is to check that
>every call takes the contracts into account. In the Ariane case
>this wouldn't even have required a test; just static inspection.
>Any organization that used Eiffel and had even a minimal
>quality assurance process adapted to the Eiffel technology
>would have done that, even if the quality assurance process
>were otherwise imperfect. (That's the important point:
>obviously, if the QA process is perfect, assuming that's
>possible, it will catch errors. The important question is
>how a serious but possibly imperfect process, routinely
>associated with a certain technology, will perform.)
>
>So the gist of the argument in our article is: a standard
>Eiffel practice would most likely have caught the problem.

I'm sure that if the Ariane 5 software had been written in
a different language from Ada, someone would have written articles
saying that with "standard Ada practice" the bug would have been
avoided. The problem was that a subsystem was moved from Ariane 4
to Ariane 5 without realistic testing. The Ariane 4 software
developers deliberately omitted a test, since analysis concluded
that this could never happen, and for Ariane 4 this was indeed
the case. Since this is poor management more than anything else,
I really wonder how a language can help against poor management.

As I understand it, you don't say that Eiffel in itself could have
avoided the problem, but that the design-by-contract mindset could have
avoided it. Design by contract is a nice concept which I would
like to see spread, but I don't think that it would have helped.
The problem was the decision to move a component from
Ariane 4 to Ariane 5 without sufficient testing. This is bad
practice anyway, and no language or design methodology can avoid
bad management.


>What's interesting is not a claim of the form "If they had done
>X they would have avoided it". This is true of many `X' and,
>after the fact, it's easy by definition to come up with an `X'
>since we know what the error was. What's interesting is that
>the actual X mentioned above -- checking that contracts are
>satisfied -- is an integral part of the Eiffel method, routinely
>applied, not an ex-post-facto suggestion based on anything
>Ariane-specific.
>
>It's also quite easy and cheap to do. In contrast, the recommendations
>of the commission that examined the crash and published the report
>involved "more of the same" quality assurance techniques -- more
>management, more tests -- leading to ever higher costs, without
>the guarantee that systemic problems such as not requiring
>and enforcing explicit contracts won't again lead to similar
>results, however many millions of dollars are spent on better
>management practices and more extensive testing.

I cannot see how design by contract could have avoided the failure. The
device worked as designed; much of the problem was that it was moved
to an environment it was not designed for, without testing and analysis.

>Many of the same observations, by the way, apply to the September
>1999 loss of the Mars Climate Orbiter (see
>http://mars.jpl.nasa.gov/msp98/news/mco990930.html).

The problem was caused by someone feeding orbital data into the Mars
Climate Orbiter using feet as the unit, while the program expected
meters. NASA uses the SI system, but Americans seem to be more used to
the old English units, so this is largely a cultural problem. Probably
the software developers hadn't thought of this even in their wildest
dreams, so chances are that they wouldn't have put anything like this
into the contract even if they had used Eiffel and the design-by-contract
methodology. How could we know in hindsight?

>So, no, use of Eiffel doesn't guarantee you won't make errors,
>but it certainly helps avoid a large class of common and
>damaging errors.

Design by contract can help discover bugs and problems during
testing and ensure that the implementation is correct with respect to
the design, but it doesn't help if realistic testing is not performed,
nor does it improve bad design decisions. If you look back on
comp.lang.ada, you will in fact see that many make the claim that Ada
and its mindset will reduce such errors. A good language design and
design methodology will help in many cases, but it cannot replace
common sense.

--
Gisle Sælensminde ( gi...@ii.uib.no )


Tarjei T. Jensen

Jun 26, 2000

Hyman Rosen wrote in message ...
>"Tarjei T. Jensen" <tarjei...@kvaerner.com> writes:
>> One of the points of run-time checking is that you can detect errors
>> and use that information to recover. C++-style assertion failures are
>> not examples of good programming. The Ada community should be able to
>> do better.
>
>But once you've decided that an error can't happen, you are not
>going to have recovery code to handle that error. In that case,
>it may be better to not catch the error, in the hope that its
>effects will be local and damped out.

It does not matter that that particular error is not supposed to happen. You
still want to handle it and recover, if necessary by shutting down parts of
your computing system.


Greetings,

Tarjei T. Jensen

Jun 26, 2000

Bertran...@eiffel.com wrote

>But we do make the point that in the Design by Contract
>culture pervasive in Eiffel, and directly supported
>by the language (including its inheritance mechanism,
>exception handling, documentation tools, debugging tools...)
>the standard and almost obligatory way of reusing a module is
>through its contract, including its conditions of use (precondition).


You're skating on thin ice here. There is no reason to believe that Ada
programmers are less conscientious than Eiffel programmers. Quite the contrary.
Ada has a strong engineering culture.

Besides, this was running on 1750 hardware. Was there any Eiffel compiler
available at the time for the CPUs used? Are there any Eiffel compilers
available for space-hardened CPUs?


Greetings,


Al Christians

Jun 26, 2000
"Tarjei T. Jensen" wrote:
>
> Bertran...@eiffel.com wrote
> >But we do make the point that in the Design by Contract
> >culture pervasive in Eiffel, and directly supported
> >by the language (including its inheritance mechanism,
> >exception handling, documentation tools, debugging tools...)
> >the standard and almost obligatory way of reusing a module is
> >through its contract, including its conditions of use (precondition).
>
> You're skating on thin ice here.

Didn't the managers decide to re-use the code without reviewing it?
I'd think that if you don't look at the code, it doesn't matter much
what language it isn't written in.

A while back I downloaded an Eiffel demo from B. Meyer's company, an
example of the results the programming culture he espouses. It was
a cute demo that ran under Windows, presented some GUI elements, and
moved them around on the screen in response to mouse movements. About
three mouse clicks later, Windows was dead. Meyer is a superb
pontificator, but all his design by contract cerebration couldn't
keep his own demo, his own firm's best foot forward, flying any
longer than the Ariane flew. Perhaps that is because real OS's,
e.g. Windows, and real rockets, e.g. Ariane, are not bound by contracts.


Al

bertran...@my-deja.com

Jun 26, 2000
"Tarjei T. Jensen" <tarjei...@kvaerner.com> wrote:

>
> There is no reason to believe that Ada
> programmers are less conscientious than Eiffel programmers.

Of course not. Who claimed that?

> Quite the contrary.

The contrary? Evidence?

> Ada has a strong engineering culture.

Very true. But that culture doesn't include
the discipline of Design by Contract.

bertran...@my-deja.com

Jun 26, 2000
Al Christians <ach...@easystreet.com> wrote:

> Didn't the managers decide to re-use the code without reviewing it?
> I'd think that if you don't look at the code, it doesn't matter much
> what language it isn't written in.

You are missing the point. It's not a matter of reviewing
the code; it's a matter of not reusing code without a specification.

The code of a module needs to be reviewed by the developers of the
module in question. For users ("clients") of the module, what matters
is the specification -- the contract. You don't reuse without
a contract.

> A while back I downloaded an Eiffel demo from B. Meyer's company.
> [...] About three mouse clicks later, Windows was dead.

Fair enough. We've released software with bugs before.
We're learning, like everyone else, and a GUI demo is not
a mission-critical system developed with the kind of attention
such systems deserve. In addition one would need to know the
details and the context. I don't know what that demo was, but
obviously it worked elsewhere (otherwise we wouldn't have released
it), so it's impossible to know what the problem was and whose
"fault" was involved (the demo, the OS, the installation
etc.) Assuming it was indeed a bug in ISE's software --
in the absence of precise information we have by default
to take responsibility for it -- and you want to dismiss
a whole methodology on the basis of that example, no one
can really criticize you.

There are people building large, complex systems, and there
are people teaching how to build systems. Usually these two
communities are pretty much disjoint; the former produce
software, the latter produce papers and books. In our (ISE's) case
we do both: we developed and teach techniques for building
better software, and we also sell a sophisticated development
environment (ISE Eiffel), with thousands of library classes,
interfaces to all kinds of technologies (COM, the Windows API,
SQL, X Windows, GTK, C++, Java, NAG, CORBA etc. etc.) and
versions for many platforms (Windows, Linux, many Unixes, VMS).
This inevitably exposes us to the kind of criticism expressed
in Mr. Christians's message whenever we mess up. So be it.
We are not perfect, but we do as we say (i.e. we use our
own technology), and the result is for everyone to download
and see: for worse, but also (as I hope you'll realize if
you explore the environment further) for better. And it
keeps us honest: a reader can't crash a slightly flawed paper,
but a user can crash a slightly buggy demo program (so that,
among other things, the bug gets fixed).

Al Christians

Jun 26, 2000
bertran...@my-deja.com wrote:
>
> you want to dismiss a whole methodology on the basis of that example,
> no one can really criticize you.
>

No. But one actual example of what actually happened when I actually
ran code produced your way ought to carry more weight than one
hypothetical example you present of what you say would have happened if
only someone else had produced code your way. Or am I overly obsessed
with reality?

Has any rocket or other flying vehicle or projectile ever actually flown
with real-time Eiffel software on board keeping it in the air and on
course?


Al

Bertrand Meyer

Jun 26, 2000
David Gillon wrote:
>
[...]

>
> But in the Ariane 501 case the reuse was at the LRU level, not software.
> They treated the Inertial Reference System as a COTS plug-in, assuming
> the Ariane 4 design requirements (contract?) remained valid and did not
> need to be re-evaluated. [...]

> Not knowing Eiffel, I'm unsure how well its DBC facilities would cope
> with requirements flow down across completely different systems -- if
> the interface requirements specification must be updated manually in
> such cases then the historical failure can still occur in precisely the
> same manner.

If you do look into Eiffel and Design by Contract you will see that
it is the client's responsibility to ensure a precondition before the
call. That the operation has been built with another technology
doesn't matter; we deal with "external" features all the time
("external" is a keyword of Eiffel) and, when encapsulating them,
equip them with contracts exactly as if they were internal.

[BM]!!! The normal procedure is to document a routine through its
!!! contract and, when reusing it, to check that every call satisfies
!!! the contract.


>
> But how well does this scale to cover re-use of independently developed
> LRUs?

That's exactly the point. In an environment fostering Design by Contract
the obligatory path to reuse is through contracts. If you don't have
a contract, you examine the reused element and equip it with a contract.
The intellectual discipline leads you to ask "what are the conditions
of use of this thing?". You don't reuse it until you have the answer to
that question. Of course the answer might be wrong. But the key step
is to ask the question. In the case under discussion this would most
likely have prompted the realization that the operation had specific
constraints which the caller had to meet.
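
(Even in a language without contract syntax -- Ada 95, say, for the
comp.lang.ada readers -- the discipline can be approximated by hand.
A rough sketch with invented names; the precondition becomes an
explicit check at the reuse boundary:)

   with Interfaces; use Interfaces;

   package Reused_IRS is

      Maximum_Velocity : constant Float := 3600.0;
      --  Condition of use inherited from the original project.

      Precondition_Violated : exception;

      function Measured_X (Velocity : Float) return Integer_16;
      --  require: Velocity <= Maximum_Velocity

   end Reused_IRS;

   package body Reused_IRS is

      function Raw_Sensor_X return Integer_16 is
      begin
         return 0;  --  stand-in for the reused, unmodified routine
      end Raw_Sensor_X;

      function Measured_X (Velocity : Float) return Integer_16 is
      begin
         if Velocity > Maximum_Velocity then
            raise Precondition_Violated;  --  caller broke the contract
         end if;
         return Raw_Sensor_X;
      end Measured_X;

   end Reused_IRS;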

-- Bertrand Meyer

Stanley R. Allen

Jun 26, 2000
bertran...@my-deja.com wrote:
>
> "Tarjei T. Jensen" <tarjei...@kvaerner.com> wrote:
>
> > Ada has a strong engineering culture.
>
> Very true. But that culture doesn't include
> the discipline of Design by Contract.
>

And the Eiffel culture does not include the discipline of
protected objects.

This is a monumentally stupid discussion. We all know that
Eiffel has features X and Y that Ada does not have, and vice
versa.

The problem of software reliability will never be solved by
any language or feature of any language, nor will it ever be
solved by 'the culture' surrounding a language or technology.

When software failures are reported in the news, be they in
aerospace or telecommunications or banking systems, the ONLY
RATIONAL NON-DECEPTIVE CLAIM that we as supporters of Ada or
Eiffel or DBC can make is that our technologies COULD have
REDUCED THE PROBABILITY of the failure, and give reasons why.


--
Stanley Allen
mailto:Stanley_R...@raytheon.com

Joachim Durchholz

Jun 26, 2000
David Gillon <david....@gecm.com> wrote:
>
> Not knowing Eiffel, I'm unsure how well its DBC facilities would cope
> with requirements flow down across completely different systems -- if
> the interface requirements specification must be updated manually in
> such cases then the historical failure can still occur in precisely
> the same manner.

Actually this is possible. DbC modelling is indeed applicable to
hardware, software, and even business processes, and across levels of
abstraction.

For the Ariane-5 example, you can do this (just showing the gross
outline and omitting *lots* of detail, i.e. ignore scaling and precision
issues in real numbers etc.):


deferred class ROCKET
   -- A model of the physical Ariane rocket.
   -- This is just a specification (the class is "deferred",
   -- which is Eiffel terminology for an abstract class).

feature

   x, y, z: REAL is deferred
         -- Position in space.
         -- (In reality we'd probably use a more suitable
         -- coordinate system. We'd probably even use a
         -- POSITION_VECTOR class with suitably defined
         -- conversions etc. to keep ROCKET at a manageable
         -- size, and to get a useful level of abstraction.)
      end

   dx, dy, dz: REAL is deferred
         -- Current speed.
      end

   velocity: REAL is deferred
         -- This might be a function or a value. At the interface
         -- level, Eiffel doesn't care about the difference.
      ensure -- A postcondition.
         Result = sqrt (dx * dx + dy * dy + dz * dz)
      end

   maximum_velocity: REAL is deferred
         -- Highest velocity the rocket is designed to reach
         -- (declared here so that the invariant below is
         -- well-formed).
      end

invariant

   velocity <= maximum_velocity

end


The purpose of this class isn't executable code (though one could write
a concrete subclass for simulation purposes). Rather, it's just a
specification.

The IRS could be programmed like this:


deferred class INERTIAL_REFERENCE_SYSTEM

feature {NONE}
   -- {NONE} means the same as "protected" in C++.
   -- (One could do a "selective export" by giving
   -- a list of class names instead of NONE, for
   -- a more granular control over who gets to see
   -- a group of features.)

   rocket: ROCKET
         -- The physical rocket whose state the sensor reports.

feature -- Public features: no {NONE} restriction here

   maximum_velocity: REAL is 3600.0
         -- Maximum velocity up to which the IRS is guaranteed
         -- to work.

   precision: REAL is 0.5
         -- Absolute precision of sensor.

   measured_x: REAL is
         -- Sensor output.
      require
         rocket.velocity <= maximum_velocity
      deferred
      ensure
         (rocket.x - Result).abs <= precision
      end

   -- Similarly for measured_y and measured_z.
   -- Again, we'd probably return all three values
   -- as an object of some VECTOR class. In this case,
   -- we'd probably not use REAL either, but some
   -- scaled integer type.

end


Now the programmer who uses an object of type INERTIAL_REFERENCE_SYSTEM
and asks for its measured_x will automatically see the precondition. At
that point, he'll likely check that the precondition matches that of the
ROCKET that he has (either as an abstract specification, or even as a
software simulation). Either way, he'll want to verify that the
precondition of 'measured_x' matches the invariant of ROCKET - and when
he's using an ARIANE_5 object, he'll see that its 'maximum_velocity' is
indeed above the 'maximum_velocity' value of INERTIAL_REFERENCE_SYSTEM.

(This doesn't mean that DbC would have actually caught the problem. The
concrete situation was that no part of the Ariane-5 software was
actually accessing the IRS; depending on the details of modelling, this
might have been noted or not. The best design method cannot make up for
design mistakes; however, it can make mistakes more obvious.)

> > The normal procedure is to document a routine through its
> > contract and, when reusing it, to check that every call satisfies
> > the contract.
>
> But how well does this scale to cover re-use of independently
> developed LRUs?

The IRS is hardware - but you can use the INERTIAL_REFERENCE_SYSTEM
class as outlined above as a replacement for a Detail Design Document of
the hardware. The team building the main control software of the Ariane
can make a ROCKET_WITH_SENSORS class that has a ROCKET and a deferred
version of INERTIAL_REFERENCE_SYSTEM; this class is, again, a
specification of the Ariane that can be used as a design background,
just as the programmers of INERTIAL_REFERENCE_SYSTEM used ROCKET as
design background.
The power of this method is that the original specifications from ROCKET
will automatically become part of the specification of the main control
software. No expensive and error-prone transformation of design
specifications is required. (This benefit will be available only insofar
as all levels of the system are designed using the same specification
language; in the worst case, the written specifications of external
parts would need to be rewritten as specification
classes. After that, these classes make a good superclass for simulation
software, and can serve as documentation; if it's done in Eiffel, the
Eiffel tools will automatically trace any high-level software
specifications back to their origins in external-parts specifications.
This should give excellent traceability with a minimum of overhead on
the side of the designer or programmer.)

> > In a software development culture
> > the first task of quality assurance is to check that
> > every call takes the contracts into account. In the Ariane case
> > this wouldn't even have required a test; just static inspection.
>

> I'm not certain this would have made a difference. The error seems to
> have been an assumption that, in Eiffel terms, the contract was
> unchanged, and so did not need to be reassessed. They likely had
> processes that might have detected the error if invoked, but time and
> cost pressure caused them to make the assumption that they were not
> needed in this case.

Agreed. However, if contracts are expressed in a uniform way and are
accessible through a powerful class browser (such as those available for
all commercial Eiffel compilers), checking is much easier, so it's more
likely that this test would have been done statically.

Given the concrete circumstances of the Ariane-5 crash, I'm not sure
that it would have made a difference. As you say, it's an easy thing to
overlook, even if everything is fully specified - in fact everything was
fully specified, the inconsistency just never became obvious.
Yet I still think there is at least some effect. However, the real
advantage of DbC in a safety-critical setting is not that it would have
made the Ariane crash less likely by whatever margin, it's that it makes
keeping the code consistent with the specifications much easier. The
specification is available in the form of assertions, it's even possible
to write simulating subclasses (of ROCKET for testing I_R_S, of I_R_S to
test the main control software) that actually test them, so it's easy to
write a quick-and-dirty test that exercises the software against changed
hardware requirements.

> Including that this is apparently a failure in the interface
> requirements specification between two separately developed processes,
> not a simple invocation of one routine from another.

I hope I have demonstrated that DbC is applicable even in this case.

Finally, let me apply a disclaimer: This is how I *think* that DbC
could be applied to safety-critical software and hardware interfaces. I
have seen safety-critical programming from a short distance, so I hope
the ideas developed in this post are useful - but if I were charged with
introducing DbC into a safety-critical project, I'd do a few experiments
first.

Regards,
Joachim
--
This is not an official statement from my employer or from NICE.
Reply-to address changed to discourage unsolicited advertisements.

Joachim Durchholz

Jun 26, 2000
Tarjei T. Jensen <tarjei...@kvaerner.com> wrote:
>
> Besides, this was running on 1750 hardware. Was there any
> Eiffel compiler available at the time for the CPUs used?
> Are there any Eiffel compilers available for space-hardened CPUs?

Did it have a C compiler? If yes, it also had an Eiffel compiler. Most
Eiffel compilers in existence can emit C code.

Keith Thompson

Jun 26, 2000
gi...@spurv.ii.uib.no (Gisle Sælensminde) writes:
[...]

> >Many of the same observations, by the way, apply to the September
> >1999 loss of the Mars Climate Orbiter (see
> >http://mars.jpl.nasa.gov/msp98/news/mco990930.html).
>
> The problem was caused by someone feeding orbital data into the Mars
> Climate Orbiter using feet as the unit, while the program expected meters.

Actually, I think it was pounds vs. Newtons.

> NASA is using the SI system, but Americans seem to be more used to
> the old English units, so this is largely a cultural problem. Probably
> the software developers hadn't thought of this even in their
> wildest dreams, so chances are that they wouldn't have put anything
> like this into the contract if they had used Eiffel and the design by
> contract methodology. How could we know in hindsight?

As I recall, NASA uses SI, but the contractor (I *think* it was
Lockheed Martin) uses English units internally. They had procedures
for reconciling the units; they just missed it in this one case.

I've heard that someone on the project had noticed a problem some time
before the spacecraft was lost (the number in question was a thrust,
and this was during a phase when only minor course corrections were
being done), but didn't have time to follow up on it.

A more recent report is at
<http://mars.jpl.nasa.gov/msp98/news/mco991110.html>.

--
Keith Thompson (The_Other_Keith) k...@cts.com <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://www.sdsc.edu/~kst>
Welcome to the last year of the 20th century.

Keith Thompson

unread,
Jun 26, 2000, 3:00:00 AM6/26/00
to
"Joachim Durchholz" <joachim durc...@halstenbach.com> writes:
[...]

> Did it have a C compiler? If yes, it also had an Eiffel compiler. Most
> Eiffel compilers in existence can emit C code.

Do Eiffel compilers generate *portable* C code? I.e., does porting an
Eiffel compiler consist merely of copying the generated C code from
one platform to another and recompiling it? If so, I'm impressed --
especially if this works for the 1750 (which, among other oddities,
has 16-bit storage units).

Ken Garlington

unread,
Jun 26, 2000, 3:00:00 AM6/26/00
to
MDC: Your interpretation of the report is the same as mine - the exception
was caught by a global interrupt handler, not a localized exception handler.
The local exception was suppressed to save run-time, which caused the
hardware overflow exception to be triggered without subsequently returning
to the local scope.

Ken Garlington

unread,
Jun 26, 2000, 3:00:00 AM6/26/00
to
Reliable programming, or safe programming? There is a difference...

Reliable programming covers the gamut from Programming Pearls and other
"good practices" books -- building reliability into a product -- to software
reliability engineering texts that describe how to determine what
reliability was achieved. A few examples of the former include:

Sennett, C. T. (ed). High-Integrity Software. Plenum Press, 1989. ISBN
0-306-43552-7
Schulmeyer, G. Gordon. Zero Defect Software. McGraw-Hill, 1990. ISBN
0-07-055663-6
Dyer, Michael. The Cleanroom Approach to Quality Software Development. John
Wiley & Sons, 1992. ISBN 0-471-54823-5

For SRE, I would recommend the Handbook of Software Reliability Engineering,
or John Musa's book. See http://www.hartdevco.com/page5.html

For software safety, I still think Safeware by Nancy Leveson is the best
thing I've ever read. See http://sunnyday.mit.edu/

<tmo...@bix.com> wrote in message
news:UVR45.52$SR1....@news.pacbell.net...


> >Within the context of the IRS for Ariane 5, programming is a whole
> >different world than what most people are used to.

> Are there textbooks and CompSci courses on reliable programming?

Howard W. LUDWIG

unread,
Jun 26, 2000, 3:00:00 AM6/26/00
to
Joachim Durchholz wrote:

> DbC modelling is indeed applicable to hardware, software, and even
> business processes, and across levels of abstraction.
>
> For the Ariane-5 example, you can do this (just showing the gross outline
> and omitting *lots* of detail, i.e. ignore scaling and precision issues in
> real numbers etc.):

[Although scaling seems to have been a key issue in the case of
Ariane 5.]


> deferred class ROCKET
> -- A model of the physical Ariane rocket.
> -- This is just a specification (the class is "deferred",
> -- which is Eiffel terminology for an abstract class).
>
> feature
>
> x, y, z: REAL is deferred
> -- Position in space.
> -- (In reality we'd probably use a more suitable
> -- coordinate system.)
> -- (We'd probably even use a POSITION_VECTOR class with
> -- suitably defined conversions etc. to keep ROCKET at
> -- a manageable size (and to get a useful level of
> -- abstraction).
> end
>
> dx, dy, dz: REAL is deferred
> -- Current speed.
> end
>
> velocity: REAL is deferred
> -- This might be a function or a value. At the interface
> -- level, Eiffel doesn't care about the difference.
> ensure -- A postcondition.
> Result = sqrt (dx * dx + dy * dy + dz * dz)
> end

I am inadequately knowledgeable of Design by Contract and
Eiffel, so somebody please help me out of my confusion. I
would normally expect some calculations (for this type of
application domain anyway) with Result being computed in a
"do" section between the preconditions ("require" section)
and the post-conditions ("ensure" section). What is ensured
above looks just like what I would expect to see in the "do"
section. If you duplicate the "do" computations (which I
grant are not listed in the code snippet above, but I would
expect to see there in practice) in the "ensure" section, of
course they are going to match and the postcondition does
not really check anything. What am I missing here? What do
you do to put truly meaningful checks in postconditions as
distinct from your nominal calculations in the "do" section?


> invariant
> velocity <= maximum_velocity
> end
>
> The purpose of this class isn't executable code (though one could write a
> concrete subclass for simulation purposes). Rather, it's just a
> specification.
>
> The IRS could be programmed like this:
>
> class INERTIAL_REFERENCE_SYSTEM
>
> feature {NONE}
> -- {NONE} means the same as "protected" in C++.
> -- (One could do a "selective export" by giving
> -- a list of class names instead of NONE, for
> -- a more granular control over who gets to see
> -- a group of features.)
>
> reality: ROCKET
>
> feature -- Public features: no {NONE} restriction here
>
> maximum_velocity: REAL is 3600.0
> -- Maximum velocity up to which the IRS is guaranteed
> -- to work.
>
> precision: REAL is 0.5
> -- Absolute precision of sensor.

The problem with the Mars Climate Orbiter (which Bertrand Meyer
referenced in an earlier posting in this thread) was a mismatch
of measurement units between two separate computer programs.
A valuable mechanism to keep such units straight and to find
mismatches is to have some sort of interface specification
which states clearly which units shall be used (what does 1.0
mean for floating point, and what does 1 [LSB] mean for scaled
integer)--no functional equivalence accepted--and then to have
the units explicitly included as part of the executed software
(that is, not just in comments). Code like the above, with
"3600.0" and "0.5" but no units, seems to continue the
tradition of careless, shortcut software, which will not
alleviate problems such as MCO's. No language guarantees
an adequate specification of contract.
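
One way to enforce this is to make the unit part of the type itself,
so that a bare "3600.0" can never cross an interface. A sketch only
(the class and its names are hypothetical, and a real IRS would need
something far more frugal):

   class SPEED

   creation
      make_from_metres_per_second

   feature

      metres_per_second: REAL
            -- The stored value; SI units only, by convention
            -- of this class.

      make_from_metres_per_second (v: REAL) is
            -- Initialize from a value in metres per second.
         do
            metres_per_second := v
         end

      feet_per_second: REAL is
            -- The same speed converted on demand (1 ft = 0.3048 m).
         do
            Result := metres_per_second / 0.3048
         end

   end -- class SPEED

maximum_velocity would then be declared of type SPEED rather than a
bare REAL, and a feet-for-metres confusion would have to be written
out explicitly in order to happen at all.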

> would need their specifications rewritten as specification classes. [...]
> ideas developed in this post are useful - but if I were charged with
> introducing DbC into a safety-critical project, I'd do a few experiments
> first.

> Regards,
> Joachim

Another part of the problem was synchronization of events.
It is clear the constraint would have been violated in any
case at higher altitudes. It is also clear that at very
low altitudes the constraint is not violated. The problem
was that the transition point occurred earlier for Ariane 5
than for Ariane 4. This seems to me to be a dynamic issue,
not a static issue, so that static checking would not work.

In terms of the culture and the thought process, the software
in question was regarded as serving no purpose after lift-
off and was, therefore, harmless; the fallacy of the logic
yielding that consequence became painfully apparent.
Inductive reasoning based on the success of every Ariane 4
launch and the reuse of the same hardware/software system
(the SRI) in Ariane 5 furthered the fallacious thinking.
But the conclusion was still, basically, there is no contract
to satisfy or violate--if software is doing nothing, then
there is nothing to assure. Therefore, the contract is
[vacuously] satisfied. Do we really think that most people,
even those steeped in DbC, would really pursue the contract
analysis further to verify up-front--by system analysis,
not after-the-fact system integration and verification--that
the velocity constraints would be or would not be met during
the time after lift-off that the software in question continues
to run, and that the velocity constraints are "don't cares"
after the software was scheduled to turn off? It is nice for
us to look in hindsight and examine what we can do to prevent
repeating this mistake. That is good, and we can put extra
or different steps in our process to make sure it will not
happen again in similar circumstances and, had those steps
already been in place, would not have happened. But I think
it naive to say some such already-existing (in 1996)
design process or concept would almost certainly have caught
the problem. (System testing, which was already accepted
practice but was deliberately skipped in this case, was
demonstrated after the fact to have been adequate to catch
the error.)
units of measurement but is supposedly an example of DbC does
not give me confidence that this approach will reliably find
critical errors for programs like MCO and Ariane 5. Please
don't get me wrong when I say this, because I see much value
in the DbC concept from what I have read about it, and I have
no doubt that it is a useful approach that will catch many
major flaws in a design--in some ways it seems to extend the
valuable concept of the peer review process by supporting more
automated checking, and by providing more substantive data
for human reviewers; however, to regard DbC as a cure-all
eliminating all substantive flaws from a specification or a
design is a delusion. The proverbial lack of a "silver bullet"
still seems to hold true.

Howard W. LUDWIG

David Starner

unread,
Jun 26, 2000, 3:00:00 AM6/26/00
to
On Tue, 27 Jun 2000 00:56:24 +0200, Joachim Durchholz <joachimdo...@halstenbach.com> wrote:
>> especially if this works for the 1750 (which, among other oddities,
>> has 16-bit storage units).
>
>Well, I wouldn't expect that this is a problem. Of course, the proof of
>such a pudding is in the eating, so I won't make any bold claims until I
>have seen such a beast run. But these portability issues are largely
>solved in C, so Eiffel just relies on the C semantics.

They're solved in C? For what value of solved? Most C programmers just
don't worry about 16 bit systems. The GNU project (in an era of 16 bit
machines) decided that it wasn't worth their trouble to write for 16
bit machines. C99 has/will improve things some, but I wouldn't say the
issues are solved, even largely.

--
David Starner - dstar...@aasaa.ofe.org
http/ftp: x8b4e53cd.dhcp.okstate.edu
"A dynamic character with an ability to survive certain death and
a questionable death scene leaving no corpse? Face it, we'll never
see her again." - Sluggy Freelance

Joachim Durchholz

unread,
Jun 27, 2000, 3:00:00 AM6/27/00
to
Keith Thompson <k...@cts.com> wrote:
> "Joachim Durchholz" <joachim durc...@halstenbach.com> writes:
> [...]
> > Did it have a C compiler? If yes, it also had an Eiffel compiler.
Most
> > Eiffel compilers in existence can emit C code.
>
> Do Eiffel compilers generate *portable* C code? I.e., does porting an
> Eiffel compiler consist merely of copying the generated C code from
> one platform to another and recompiling it?

Largely. There are always library differences. Sometimes the C compiler
is stranger than the hardware that it was written for. But such problems
are rare.
The worst problems are usually getting all the tools (compiler, linker,
loader, whatever) called with the right options and in the right order.
For example, a port to gcc sort-of failed because the GNU tools were
unable to read .lib files in the Microsoft variant of the coff format.
(Linking the same libraries as DLLs would have worked fine but wasn't
considered an option initially.)

> If so, I'm impressed --

No need to be impressed, really. The generated code is quite
straightforward, so if there's a bug in the code generation (be it
portability or something else), it will usually be present all over the
generated C code.
C works remarkably well as a "portable assembler", with one exception:
It's darn difficult to make the generated executable react reliably to
integer overflows.

> especially if this works for the 1750 (which, among other oddities,
> has 16-bit storage units).

Well, I wouldn't expect that this is a problem. Of course, the proof of
such a pudding is in the eating, so I won't make any bold claims until I
have seen such a beast run. But these portability issues are largely
solved in C, so Eiffel just relies on the C semantics.

(This is not part of the language definition, it's just what *most*
Eiffel compilers do. There is a significant minority that compiles
directly to machine language; in such a case, standard portability
issues must be addressed again.)

Peter Horan

unread,
Jun 27, 2000, 3:00:00 AM6/27/00
to
"Howard W. LUDWIG" wrote:

> > velocity: REAL is deferred
> > -- This might be a function or a value. At the interface
> > -- level, Eiffel doesn't care about the difference.
> > ensure -- A postcondition.
> > Result = sqrt (dx * dx + dy * dy + dz * dz)
> > end
>
> I am inadequately knowledgeable of Design by Contract and
> Eiffel, so somebody please help me out of my confusion. I
> would normally expect some calculations (for this type of
> application domain anyway) with Result being computed in a
> "do" section between the preconditions ("require" section)
> and the post-conditions ("ensure" section). What is ensured
> above looks just like what I would expect to see in the "do"
> section. If you duplicate the "do" computations (which I
> grant are not listed in the code snippet above, but I would
> expect to see there in practice) in the "ensure" section, of
> course they are going to match and the postcondition does
> not really check anything. What am I missing here? What do
> you do to put truly meaningful checks in postconditions as
> distinct from your nominal calculations in the "do" section?

The point in the general case is that you are able to specify
the postcondition without also having to write the code to
deliver the goods at the same time. In other words, the
postcondition says _what_ the relationship between result and
the parameters must be without saying _how_ this relationship
is obtained. It is a specification, not code.

As soon as you write the code, in this example, it is obvious
the postcondition will be satisfied, because the two parts of
the yet-to-be-written assignment and the two parts of the
postcondition are the same expressions. But what about
a postcondition such as
   ensure
      exists: file.exists
where things are not so similar?
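
To make the distinction concrete, here is the classic square-root
example (a sketch only - `epsilon' is an assumed tolerance, and
floating-point corner cases are ignored). The body happens to use
Newton's method, but any algorithm meeting the postcondition would do;
the contract says _what_, the body says _how_:

   epsilon: REAL is 0.0001
         -- Acceptable tolerance, scaled by the argument below.

   square_root (a: REAL): REAL is
         -- Square root of `a', here by Newton iteration.
      require
         non_negative: a >= 0.0
      do
         if a > 0.0 then
            from
               Result := a.max (1.0)
            until
               (Result * Result - a).abs <= epsilon * (a + 1.0)
            loop
               Result := (Result + a / Result) / 2.0
            end
         end
      ensure
         close_enough: (Result * Result - a).abs <= epsilon * (a + 1.0)
      end

Note that the postcondition never mentions Newton's method; replacing
the body with a table lookup or a hardware instruction would leave the
contract untouched.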

> > feature -- Public features: no {NONE} restriction here
> >
> > maximum_velocity: REAL is 3600.0
> > -- Maximum velocity up to which the IRS is guaranteed
> > -- to work.
> >
> > precision: REAL is 0.5
> > -- Absolute precision of sensor.
>
> The problem with the Mars Climate Orbiter (which Bertrand Meyer
> referenced in an earlier posting in this thread) was a mismatch
> of measurement units between two separate computer programs.
> A valuable mechanism to keep such units straight and to find
> mismatches is to have some sort of interface specification
> which states clearly which units shall be used (what does 1.0
> mean for floating point and what does 1 [LSB] mean for scaled
> integer)--no functional equivalence accepted and then to have
> the units explicitly included as part of the executed software
> (that is, not just in comments). Code like the above, with
> "3600.0" and "0.5" but no units, seems to be continuing with
> the tradition of careless, shortcut software, which will not
> alleviate problems, such as with MCO. No language guarantees
> an adequate specification of contract.

It is only intended as an example. So specify the units also.
But I agree that there is always room for overlooking things.

> The sample snippet of code that ignores
> units of measurement but is supposedly an example of DbC does
> not give me confidence that this approach will reliably find
> critical errors for programs like MCO and Ariane 5. Please

It was only an example, as I suggested above. The point about
specification is that it can also develop as understanding
improves, and it can be developed ahead of the code.

> don't get me wrong when I say this, because I see much value
> in the DbC concept from what I have read about it, and I have
> no doubt that it is a useful approach that will catch many
> major flaws in a design--in some ways it seems to extend the
> valuable concept of the peer review process by supporting more
> automated checking, and by providing more substantive data
> for human reviewers; however, to regard DbC as a cure-all
> eliminating all substantive flaws from a specification or a
> design is a delusion. The proverbial lack of a "silver bullet"
> still seems to hold true.

I have not read the discussion as saying that DbC is a silver
bullet. But I guess that is because I use DbC and would not
be without it. I do read the current discussion as "maybe DbC
would have helped".
--
Peter Horan School of Computing and Mathematics
pe...@deakin.edu.au Deakin University
+61-3-5227 1234 (Voice) Geelong
+61-3-5243 7723 (Home) Victoria 3217
+61-3-5227 2028 (FAX) Australia

-- The Eiffel Guarantee: Specification to Implementation (http://www.elj.com)

Thomas Beale

unread,
Jun 27, 2000, 3:00:00 AM6/27/00
to

"Gisle Sælensminde" wrote:

> As I understand it, you don't say that Eiffel in itself could have
> avoided the problem, but that the design by contract mindset could have
> avoided it. Design by contract is a nice concept which I would
> like to see spread, but I don't think that it would have helped.
> The problem was the decision to move a component from
> Ariane 4 to Ariane 5 without sufficient testing. This is bad
> practice anyway, and no language or design methodology can avoid
> bad management.

You may be right, but this is a reductionist way of looking at things.

The question is whether such bad practice would have occurred in a _culture_
of design by contract. The question isn't whether one event might have
taken place or not if the culture were the same but Eiffel was used; it is
rather whether it would have taken place in a different development culture.

The assertion being debated is more or less that: DbC represents a different
culture; such a different culture could well have averted the error, since in
the DbC mindset, _everything_ that is re-used has its contracts reviewed &
tested.

As a long-time user of Eiffel, I can certainly say that DbC, when used in a
pervasive way, does constitute a cultural change, and could well prevent such
"reuse" errors.

By the way, for Ada users, I don't think that Ada was at fault; if DbC were
being used in the Ada development, the cultural benefits would be gained, _if_
it was actually implemented in the language (or some reasonable library or
macro addition - whatever one does in Ada!). Like any nice idea, it does
little for real projects unless implemented directly in the tools.

- thomas beale

Thomas Beale

unread,
Jun 27, 2000, 3:00:00 AM6/27/00
to

"Tarjei T. Jensen" wrote:

> Bertran...@eiffel.com wrote
> >But we do make the point that in the Design by Contract
> >culture pervasive in Eiffel, and directly supported
> >by the language (including its inheritance mechanism,
> >exception handling, documentation tools, debugging tools...)
> >the standard and almost obligatory way of reusing a module is
> >through its contract, including its conditions of use (precondition).
>

> You're skating on thin ice here. There is no reason to believe that Ada
> programmers are less conscientious than Eiffel programmers. Quite the contrary.


> Ada has a strong engineering culture.

The above comment does not say otherwise. As an engineer (and software engineer)
who has worked on realtime async systems, I know indeed that Ada people are
conscientious developers. The above comment just says that if the DbC approach (as
found in Eiffel) were used (wherever) then things might have been different.

> Besides. This was running on 1750 hardware. Was there any Eiffel compiler
> available at the time for the CPUs used? Are there any Eiffel compilers
> available for space hardened CPUs?

Depends. Is there an ANSI C compiler for the 1750? An Eiffel system was made to run
inside an HP printer...

- thomas beale


Thomas Beale

unread,
Jun 27, 2000, 3:00:00 AM6/27/00
to

Al Christians wrote:

> A while back I downloaded an Eiffel demo from B. Meyer's company, an
> example of the results the programming culture he espouses. It was
> a cute demo that ran under Windows, presented some GUI elements, and
> moved them around on the screen in response to mouse movements. About
> three mouse clicks later, Windows was dead. Meyer is a superb
> pontificator, but all his design by contract cerebration couldn't
> keep his own demo, his own firm's best foot forward, flying any
> longer than the Ariane flew. Perhaps that is because real OS's,
> eg Windows, and real rockets, eg Ariane, are not bound by contracts.
>
> Al

One has to ask: what OS did you run the thing under? If 95 or 98, you know
who to call...

Secondly, would you use Windows-anything in a rocket? (Please don't say yes)

- thomas beale

David Gillon

unread,
Jun 27, 2000, 3:00:00 AM6/27/00
to

Joachim Durchholz wrote:

> Actually this is possible. DbC modelling is indeed applicable to
> hardware, software, and even business processes, and across levels of
> abstraction.

But how well does it function across multiple projects and companies,
each with their own development practices?

> The power of this method is that the original specifications from ROCKET
> will automatically become part of the specification of the main control
> software. No expensive and error-prone transformation of design
> specifications is required. (This benefit will be available only as far
> as all levels of the system are designed using the same specification
> language

On a programme such as Ariane 5, and certainly if we want to scale this
to the general case of safety critical vehicle management software, you
are looking at multiple teams working in multiple locations under
multiple development environments. Eiffel itself, or any other
requirements/implementation tool is effectively a decoy here, the real
issue is whether there is an advantage to using Design By Contract in
these cases. I don't see anything to indicate that the transfer of
interface requirements expressed as DbC data between companies would be
inherently safer than the transfer of the same information as a formal
interface requirements document.

> Agreed. However, if contracts are expressed in a uniform way and are
> accessible through a powerful class browser (such as those available for
> all commercial Eiffel compilers), checking is much easier

It's the 'if' that is the problem.

> > Including that this is apparently a failure in the interface
> > requirements specification between two separately developed processes,
> > not a simple invocation of one routine from another.
>
> I hope I have demonstrated that DbC is applicable even in this case.

Possibly, but a properly handled Interface Requirements document should
have caught the same errors. NASA/LockMart were bitten by trying to go
faster, better, cheaper and allowing their reviewing process to decay to
the point something as fundamental as a units inconsistency was not
spotted.

--

David Gillon

Richard Kenner

unread,
Jun 27, 2000, 3:00:00 AM6/27/00
to
In article <3958361A...@deakin.edu.au> Peter Horan <pe...@deakin.edu.au> writes:
>It is only intended as an example. So specify the units also.
>But I agree that there is always room for overlooking things.

And that's exactly the point. *No* language or methodology can prevent
people from overlooking things, and that's precisely what happened in
both of these incidents.

Marin D. Condic

unread,
Jun 27, 2000, 3:00:00 AM6/27/00
to Ken Garlington

Thanks for the assessment. This disaster gets discussed every so often
and there is usually a lot of mistaken information about what really
caused it and what could have been done to prevent it. At the time, I
was pretty familiar with the case, but I don't like to trust my memory
100% and really didn't want to go back and re-read the report. I've read
your paper on the subject and you seem to have quite a bit of knowledge
about the incident so now I can go back to trusting my memory, having
been backed up by an expert. ;-)

I guess what bothers me most about the discussions when they get started
is the belief that some process, language feature or other change would
have prevented it. The error was caught and the system did what it was
designed to do in the event of that error. The problem was not found
ahead of time because someone high enough up the totem pole said "We
don't need to waste the time and money testing this - it worked fine
over there and it will work fine over here." He probably wasn't a
programmer and he wouldn't have been versed in anybody's software design
methodology, so it is pretty well pointless to argue about what language
would have saved the day or what methodology would have caught the
problem. Nothing will save you when you decide to be bone-headed and
toss out whatever validation processes you may already have because you
are under budget or schedule pressure.

MDC
--
======================================================================
Marin David Condic - Quadrus Corporation - http://www.quadruscorp.com/
Send Replies To: m c o n d i c @ q u a d r u s c o r p . c o m
Visit my web site at: http://www.mcondic.com/

"Some people think programming Windows is like nailing jello to the
ceiling... easy with the right kind of nails."

-- Ivor Horton - Beginning Visual C++ 6
======================================================================

Peter Horan

unread,
Jun 27, 2000, 3:00:00 AM6/27/00
to
I don't understand your excitement. I agree with you.

Howard wrote before my comment and I was agreeing with him.


> No language guarantees an adequate specification of contract.

And I also added afterwards:


> The point about specification is that it can also develop as understanding
> improves, and it can be developed ahead of the code.

Marin D. Condic

unread,
Jun 27, 2000, 3:00:00 AM6/27/00
to
Thomas Beale wrote:
> The question is whether such bad practice would have occurred in a _culture_
> of design by contract. The question isn't whether one event might have
> taken place or not if the culture were the same but Eiffel was used; it is
> rather whether it would have taken place in a different development culture.
>
Yeah, but that is a lot like saying "If we lived in a culture of
philosopher-saints, there would be no world hunger." or "If we lived in
a culture of pacifism, there would be no war". Sure, if the Ariane guys
lived in a different world, it wouldn't have been the world they lived
in. In a different universe, the rocket may not have blown up.

My assertion has been that had they bothered to do *any* kind of
checking and testing, they would have found the problem. DbC may have
found the problem in advance. Plugging the damned thing in and running a
flight profile through it would also have found it. They hosed up
because someone said "don't bother checking it out."

Using Ariane-5 to tout Eiffel/DbC or any other technology or process
seems to me to be unfair. Either the claim is "Language/Process X and
*only* X would have saved the day" or the claim is "Language/Process X
and just about any other Language/Process that was properly applied
would have saved the day." In the former case, there is a dearth of
evidence to back this up. In the latter case, it really isn't much of a
claim.

> The assertion being debated is more or less that: DbC represents a different
> culture; such a different culture could well have averted the error, since in
> the DbC mindset, _everything_ that is re-used has its contracts reviewed &
> tested.
>

Do you think that a bunch of metal benders would be infected by a
software process? DbC may be just fine within the world of software but
remember that the bulk of the rocket building exercise is done by
mechanical engineers, electrical engineers, physicists, etc., who A) are
not likely to be versed in any sort of software technology/process and
B) have their own share of development problems which nobody has ever
demonstrated are amenable to the same processes as software.

The claim that DbC/Eiffel would have saved the day is either too
incredible to be believed (and certainly undemonstrated) or it is so
general as to amount to a statement of support for "mom, apple pie and
Chevrolet." I don't think that Ariane-5 makes a good backdrop for
hawking Eiffel/DbC or any other technology. It was a management
screw-up, pure and simple. We've been having those since Adam bit the
apple and I'll lay odds that we'll continue to have them until Gabriel
blows the horn.

Hyman Rosen

unread,
Jun 27, 2000, 3:00:00 AM6/27/00
to
"Tarjei T. Jensen" <tarjei...@kvaerner.com> writes:
> Does not matter if that particular error is not supposed to
> happen. You still want to handle it and recover. If neccessary by
> shutting down parts of your computing system.

But every line of code is the potential source of an error that
can't happen! How are you going to handle those errors?

   if i = 3 then
      -- handle possibility that i is in fact *not* 3?
   end if;

Hyman Rosen

unread,
Jun 27, 2000, 3:00:00 AM6/27/00
to
Jeff Carter <jrca...@acm.org> writes:
> No, the runtime check was left in, and the exception was handled, but
> incorrectly. They thought they had proved no software errors were
> possible, so they treated all exceptions as hardware errors.
>
> Yet another reason to leave runtime checks in and handle them
> intelligently, even when you've proved that the exceptions cannot occur.

But that's exactly what they did! Since they had "proved" that there
could be no software error, the only explanation could be hardware
error.

Preben Randhol

unread,
Jun 27, 2000, 3:00:00 AM6/27/00
to
On 27 Jun 2000 09:19:46 -0400, Hyman Rosen wrote:
>    if i = 3 then
>       -- handle possibility that i is in fact *not* 3?
>    end if;

perhaps use else ;-)

--
Preben Randhol -- [ran...@pvv.org] -- <http://www.pvv.org/~randhol/>
+---+ "It is assumed that the reader is reasonably familiar with the
|:-?| dpkg System Administrators' manual. Unfortunately this manual
+---+ does not yet exist." -- Debian Packaging Manual

Al Christians

unread,
Jun 27, 2000, 3:00:00 AM6/27/00
to
"Marin D. Condic" wrote:
> The claim that DbC/Eiffel would have saved the day is either too
> incredible to be believed (and certainly undemonstrated) or it is so
> general as to amount to a statement of support for "mom, apple pie and
> Chevrolet."

Exactly. The IBM methodology of 40 years back, inscribed on plaques
worldwide, would have saved the day just as well. It was a 1-word
methodology: "Think". The 3-word methodology of "design by
contract", like most others, is neither as strong nor as universally
applicable. The Ariane problem was not a design problem -- it was a
use problem.


Al

Ole-Hjalmar Kristensen

unread,
Jun 27, 2000, 3:00:00 AM6/27/00
to
randho...@pvv.org (Preben Randhol) writes:
[...]

I think he is talking about the possibility that i is not 3 even after
the test has decided that it is 3:

   if i = 3 then
      if i /= 3 then -- We don't really trust the above test...
         -- handle possibility that i is in fact *not* 3?
         i := 3; -- After all, it SHOULD have been 3 :-)
      end if;
   end if;

Like my father used to say, it does not help to secure a fuse with
another fuse. You have to stop somewhere, or your design is totally
obliterated by tests for things which cannot happen, and which do
not really buy you more safety. You have to rely on other mechanisms with
different modes of failure to catch those errors, be it hardware
detection, comparison with another implementation of the same program,
or exhaustive testing over a sufficient set of test cases.

--
--
"Plus I remember being impressed with Ada because you could write an
infinite loop without a faked up condition. The idea being that in Ada
the typical infinite loop would normally be terminated by detonation."
-Larry Wall

Ole-Hj. Kristensen

Hyman Rosen

unread,
Jun 27, 2000, 3:00:00 AM6/27/00
to
randho...@pvv.org (Preben Randhol) writes:
> On 27 Jun 2000 09:19:46 -0400, Hyman Rosen wrote:
> >    if i = 3 then
> >       -- handle possibility that i is in fact *not* 3?
> >    end if;
> perhaps use else ;-)

No, that's where you handle the possibility that i *is* 3 :-)

Berend de Boer

unread,
Jun 27, 2000, 3:00:00 AM6/27/00
to

Keith Thompson wrote:


> Do Eiffel compilers generate *portable* C code? I.e., does porting an
> Eiffel compiler consist merely of copying the generated C code from
> one platform to another and recompiling it? If so, I'm impressed --
> especially if this works for the 1750 (which, among other oddities,
> has 16-bit storage units).

Most Eiffel compilers generate plain ANSI C code. And you probably can
use any cross compiler you wish.

Groetjes,

Berend. (-:

Jeffrey Carter

unread,
Jun 27, 2000, 3:00:00 AM6/27/00
to
Ken Garlington wrote:
>
> MDC: Your interpretation of the report is the same as mine - the exception
> was caught by a global interrupt handler, not a localized exception handler.
> The local exception was suppressed to save run-time, which caused the
> hardware overflow exception to be triggered without subsequently returning
> to the local scope.

I don't recall this from the inquiry report, but since both of you seem
to think this is the case, I'm probably wrong.

--
Jeff Carter
"We burst our pimples at you."
Monty Python & the Holy Grail

tmo...@bix.com

unread,
Jun 27, 2000, 3:00:00 AM6/27/00
to
>Sennett, C. T. (ed). High-Integrity Software. Plenum Press, 1989. ISBN
>0-306-43552-7
>Schulmeyer, G. Gordon. Zero Defect Software. McGraw-Hill, 1990. ISBN
>0-07-055663-6
>Dyer, Michael. The Cleanroom Approach to Quality Software Development. John
>Wiley & Sons, 1992. ISBN 0-471-54823-5
>
>For software safety, I still think Safeware by Nancy Leveson is the best
I note that Computer Literacy, *the* computer bookstore chain of
Silicon Valley, has *none* of these in stock in their stores. Does
that say something about Silicon Valley and quality software? @.@

Thomas Beale

unread,
Jun 28, 2000, 3:00:00 AM6/28/00
to

"Marin D. Condic" wrote:

> Thomas Beale wrote:
> > The question is whether such bad practice would have occurred in a _culture_
> > of design by contract. The question isn't whether one event might have
> > taken place or not if the culture were the same but Eiffel was used; it is
> > rather whether it would have taken place in a different development culture.
> >
> Yeah, but that is a lot like saying "If we lived in a culture of
> philosopher-saints, there would be no world hunger." or "If we lived in
> a culture of pacifism, there would be no war". Sure, if the Ariane guys
> lived in a different world, it wouldn't have been the world they lived
> in. In a different universe, the rocket may not have blown up.

I disagree - we're just talking software (and engineering) culture here - you can
go into two different companies and see wildly different development (and for that
matter, management) cultures.

> My assertion has been that had they bothered to do *any* kind of
> checking and testing, they would have found the problem. DbC may have

But they apparently did do a lot of checking and testing; just not on things being
re-used, if I understand correctly.

> found the problem in advance. Plugging the damned thing in and running a
> flight profile through it would also have found it. They hosed up
> because someone said "don't bother checking it out."
>
> Using Ariane-5 to tout Eiffel/DbC or any other technology or process
> seems to me to be unfair. Either the claim is "Language/Process X and
> *only* X would have saved the day" or the claim is "Language/Process X
> and just about any other Language/Process that was properly applied
> would have saved the day." In the former case, there is a dearth of
> evidence to back this up. In the latter case, it really isn't much of a
> claim.

It probably isn't much of a claim that something like DbC might have saved the day!

> > The assertion being debated is more or less that: DbC represents a different
> > culture; such a different culture could well have averted the error, since in
> > the DbC mindset, _everything_ that is re-used has its contracts reviewed &
> > tested.
> >
> Do you think that a bunch of metal benders would be infected by a
> software process? DbC may be just fine within the world of software but
> remember that the bulk of the rocket building exercise is done by
> mechanical engineers, electrical engineers, physicists, etc., who A) are
> not likely to be versed in any sort of software technology/process and
> B) have their own share of development problems which nobody has ever
> demonstrated are amenable to the same processes as software.

Fair enough; probably there is a deep argument about basic engineering quality
processes which need to occur in mech/elec / software engineering alike, and which
need to cross borders in a complex engineering project like a rocket.

- thomas beale

bertran...@my-deja.com

unread,
Jun 28, 2000, 3:00:00 AM6/28/00
to
Al Christians <ach...@easystreet.com> wrote:

> The IBM methodology of 40 years back, inscribed on plaques
> worldwide, would have saved the day just as well. It was a 1-word
> methodology: "Think". The 3-word methodology of "design by
> contract", like most others, is neither as strong nor as universally
> applicable.

A rather ill-considered comment, in my opinion. Design by Contract
is not a 3-word methodology. It has a 3-word name, and is supported
by lots of articles, including a fat book (too fat in some
people's view).

No one has the revealed truth but if you want to dismiss Design
by Contract I think you need more serious arguments than "there is
no silver bullet" (sure -- did anyone claim there is one? --
but that's not a good enough reason to dismiss proposed techniques)
and, now, "a three-word methodology is not enough".

> The Ariane problem was not a design problem -- it was a
> use problem.

Right, more precisely a reuse problem (as the article by
Jezequel and myself indeed argues). Design by Contract has a lot to
say about reuse (as well as design, but also documentation,
analysis, testing, management, and other applications).
In fact I have argued, in the paper and elsewhere, that
it's plain foolish to reuse without contracts.
You may disagree with the way the methodology addresses
the problem at hand, but you can't deny that it includes
it in its scope.

--
Bertrand Meyer
Interactive Software Engineering
ISE Building, 2nd Floor, 270 Storke Road Goleta CA 93117
Phone 805-685-1006, Fax 805-685-6869, http://eiffel.com


Sent via Deja.com http://www.deja.com/
Before you buy.

Ken Garlington

unread,
Jun 28, 2000, 3:00:00 AM6/28/00
to
"Marin D. Condic" <mcondic...@acm.com> wrote in message
news:3958B07B...@acm.com...
[snip]

>
> My assertion has been that had they bothered to do *any* kind of
> checking and testing, they would have found the problem. DbC may have
> found the problem in advance. Plugging the damned thing in and running a
> flight profile through it would also have found it. They hosed up
> because someone said "don't bother checking it out."

Well, I think this is unfair to the IRS development team (assuming this is
what you meant by "them"). As noted in the report, and in my notes on the
subject at

http://www.flash.net/~kennieg/ariane.html#s3.1.5

the IRS team did not have access to the correct flight profile. If they had
run the Ariane 4 flight profile, they would not have run the error. Granted,
it would have been smart to run an end-to-end test, but that would not have
been sufficient without the proper test data. (Of course, the absence of
this data also implies some difficulties with detecting the error through a
DbC approach, as described in my notes...)

Ken Garlington

unread,
Jun 28, 2000, 3:00:00 AM6/28/00
to
"Jeffrey Carter" <jrca...@acm.org> wrote in message
news:39592AAE...@acm.org...

> Ken Garlington wrote:
> >
> > MDC: Your interpretation of the report is the same as mine - the
> > exception was caught by a global interrupt handler, not a localized
> > exception handler. The local exception was suppressed to save
> > run-time, which caused the hardware overflow exception to be
> > triggered without subsequently returning to the local scope.
>
> I don't recall this from the inquiry report, but since both of you seem
> to think this is the case, I'm probably wrong.

The relevant quote from the report:

"The internal SRI software exception was caused during execution of a data
conversion from 64-bit floating point to 16-bit signed integer value. The
floating point number which was converted had a value greater than what
could be represented by a 16-bit signed integer. This resulted in an Operand
Error. The data conversion instructions (in Ada code) were not protected
from causing an Operand Error, although other conversions of comparable
variables in the same place in the code were protected.... Although the
source of the Operand Error has been identified, this in itself did not
cause the mission to fail. The specification of the exception-handling
mechanism also contributed to the failure. In the event of any kind of
exception, the system specification stated that: the failure should be
indicated on the databus, the failure context should be stored in an EEPROM
memory (which was recovered and read out for Ariane 501), and finally, the
SRI processor should be shut down."

There is no Operand Error as an Ada exception name, to my knowledge, so I
assume this is referring to a CPU interrupt generated from the
float-to-fixed conversion instruction execution. I infer from the phrase
"not protected (in Ada code)" that no local interrupt handler was defined.
Most Ada compilers map CPU interrupts related to overflow to
Constraint_Error, Numeric_Error, or both, so a local interrupt handler would
probably have been executed if present. The phrase "in the event of any kind
of exception" implies a global, last-ditch interrupt handler to me.
Additional indirect evidence for this is implied by the recommendation
"Wherever technically feasible, consider confining exceptions to tasks and
devise backup capabilities" found later in the report. Also, elsewhere in
the report, it notes that some conversions were "protected," which I assume
means that the result was saturated or otherwise "fixed" locally on
overflow.
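
To illustrate what I mean by "saturated or otherwise 'fixed'" - a
sketch in Eiffel, since that is the thread's other language; the names
and the clamping policy are my assumptions, not the report's:

   saturated_bias (hb: DOUBLE): INTEGER is
         -- `hb' clamped into the 16-bit range before conversion,
         -- the kind of local "protection" the report describes
         -- for the other variables.
      do
         if hb > 32767.0 then
            Result := 32767
         elseif hb < -32768.0 then
            Result := -32768
         else
            Result := hb.truncated_to_integer
         end
      end

The unprotected conversion, by contrast, simply trapped to the
last-ditch handler described above.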

Veli-Pekka Nousiainen

unread,
Jun 28, 2000, 3:00:00 AM6/28/00
to
"Thomas Beale" <tho...@deepthought.com.au> wrote in message
news:395888DF...@deepthought.com.au...

LOL

> - thomas beale
>
Veli-Pekka

Tarjei T. Jensen

unread,
Jun 28, 2000, 3:00:00 AM6/28/00
to

Hyman Rosen wrote in message ...

>"Tarjei T. Jensen" <tarjei...@kvaerner.com> writes:
>> Does not matter if that particular error is not supposed to
>> happen. You still want to handle it and recover. If neccessary by
>> shutting down parts of your computing system.
>
>But every line of code is the potential source of an error that
>can't happen! How are you going to handle those errors?
>
>    if i = 3 then
>       -- handle possibility that i is in fact *not* 3?
>    end if;

I don't get it.

Your code just says that something is to be done if i = 3. Otherwise it does
nothing.


Greetings,

Marin D. Condic

unread,
Jun 28, 2000, 3:00:00 AM6/28/00
to
Ken Garlington wrote:
>
> There is no Operand Error as an Ada exception name, to my knowledge, so I
> assume this is referring to a CPU interrupt generated from the
> float-to-fixed conversion instruction execution. I infer from the phrase
> "not protected (in Ada code)" that no local interrupt handler was defined.

As I recall, there is no "Operand Error" on the 1750a processor either.
There is an "Overflow" interrupt that can result from the floating to
integer conversion instruction, and I took this to be what they meant. As
you re-read the section, it becomes clear that the writers were
not versed in Ada or 1750a terminology and used words like "exception"
fairly loosely. This has created lots of confusion in the Ada world
concerning this report.

It is not uncommon to tie any hardware interrupts that are not being
detected/used by the software to some general handler that presumes a
hardware failure and takes action on that basis. After reading the
report and talking with other people in the field, we gathered that this
was the case in this situation. Note that the failure *was* detected and
an accommodation *was* in place - just that the accommodation was the
wrong thing to do.

Marin D. Condic

unread,
Jun 28, 2000, 3:00:00 AM6/28/00
to
Ken Garlington wrote:
>
> "Marin D. Condic" <mcondic...@acm.com> wrote in message
> news:3958B07B...@acm.com...
> [snip]
> >
> > My assertion has been that had they bothered to do *any* kind of
> > checking and testing, they would have found the problem. DbC may have
> > found the problem in advance. Plugging the damned thing in and running a
> > flight profile through it would also have found it. They hosed up
> > because someone said "don't bother checking it out."
>
> Well, I think this is unfair to the IRS development team (assuming this is
> what you meant by "them"). As noted in the report, and in my notes on the
> subject at
>
I think the "them" I was referring to is that nebulous "them" or "they"
that we blame everything on. Not necessarily the same "them" or "they"
that covered up the alien landings at Roswell and are currently busy
following me around and tapping my phone lines. :-) (Anybody keeping up
with recent reports of cattle mutilations? :-)

What I meant - and I should have been more precise - was that the IRS in
conjunction with the given vehicle was going to be exposed to certain
conditions: heat, vibration, gamma rays, flight envelope, etc. There had
to be *somebody* in the organization who looked at the IRS and made
decisions such as: "Heat? Same as Ariane 4 and it worked fine there. No
need to test." This "somebody" (for whatever reasons) did not include
"flight profile" as part of what the IRS should be checked out
against. Maybe they *did* put it in a "shake-and-bake" because the
vibration and temperature ranges were different. If *I* were the systems
engineer responsible for that part, I'd want to verify physical
characteristics like that if they were different from prior flights.

So "they" decided that there was no need to run a test - either on the
IRS itself or in integration with the vehicle - that included the
Ariane-5 flight profile. (Or maybe it just didn't occur to them? Same
thing - it *should* have!) This, in my mind, would be pretty basic and
would normally be part of just about everyone's development process.
They'd have done that way back in the 1960's space program, long before
the invention of DbC/Eiffel, Structured Programming, Peer Reviews or
anything else that someone wants to claim would have saved the day.

Would DbC/Eiffel have saved the day? If DbC/Eiffel would have
pimp-slapped that unknown systems engineer who excluded that test and
forced him to include it, then I guess you can say that DbC/Eiffel would
have saved the day. But since just about any sound engineering practice
would have indicated that the device needed to be run against the new
profile, the claim doesn't amount to much and smells suspiciously of
"marketing". (Does brushing your teeth with Crest give you cleaner,
whiter teeth? Yes. And the same applies if you brushed your teeth with
Colgate, baking soda or sand. So what? The important part is to brush
your teeth!)

This is why I contend that it was a management problem and not a
methodology or technology problem. Somebody out there did *not* do
something that they should have known needed to be done. So far as I
know, there has never been, and never will be, a methodology or
technology that will guarantee that people won't shoot themselves in the
foot from time to time. If a claim is made that DbC/Eiffel would create
a "no foot shooting culture" - that is interesting, but not compelling.
Most cultures I know of frown upon and discourage their members from
shooting themselves in the foot. Yet every so often, a bridge in Tacoma
falls down, a skywalk in a hotel atrium collapses, a space shuttle blows
up, etc. Coming back after the fact and saying "my pet technology would
have prevented that!" is pretty well useless unless it can rightfully
claim that it would have caught and fixed the specific fault. Since the
fault was a lapse of human judgement - not a technological weakness - I
find such a claim difficult to believe and even harder to demonstrate.

Igor Boukanov

unread,
Jun 28, 2000, 3:00:00 AM6/28/00
to
In comp.lang.eiffel Marin D. Condic <mcondic...@acm.com> wrote:
> Ken Garlington wrote:
> .....

Exactly! In Eiffel/DbC, mostly the checks related to static typing can be
done by the compiler; to check the rest, you have to run the code. And if
one decides not to test, then it does not matter whether it was DbC or not.

Regards, Igor

Al Christians

unread,
Jun 28, 2000, 3:00:00 AM6/28/00
to
Thomas Beale wrote:
> An Eiffel system was made to run
> inside an HP printer...
>

Tell us about this, if you can. When I last checked for Eiffel
success stories, there were only a few, and the HP printer thing
was the most recent one that looked at all significant. But
acquaintances at HP tell me that they developed a JVM for
controlling printers, that they were most pleased with the JVM, and
that Eiffel was not anything major in HP's printer products. What's
the real story?


Al

Al Christians

unread,
Jun 28, 2000, 3:00:00 AM6/28/00
to
"Marin D. Condic" wrote:
> If a claim is made that DbC/Eiffel would create
> a "no foot shooting culture" - that is interesting, but not > compelling.

Here's what I find on page 128 of the only DBC book I have that
predates the Ariane problem, _Seamless_OO_SOftware_Architecture_,
by Walden and Nerson:

".. restricting reuse to what can be successfully planned will not
leave us much better off than with the old technologies. ...In the
meantime, we are much better off with 'accidental reuse' than with
no reuse at all."


Al

Hyman Rosen

unread,
Jun 28, 2000, 3:00:00 AM6/28/00
to

You are ignoring the possibility that the test for 'i = 3' may
return true, that we will begin executing the statement controlled
by that condition, and that we will discover to our chagrin that,
despite the test, i is not equal to 3. Don't you want to write
code to handle that possibility? Look at your quote above about
handling errors which are not supposed to happen.

Hyman Rosen

unread,
Jun 28, 2000, 3:00:00 AM6/28/00
to
tmo...@bix.com writes:
> I note that Computer Literacy, *the* computer bookstore chain of
> Silicon Valley, has *none* of these in stock in their stores. Does
> that say something about Silicon Valley and quality software? @.@

Also, a couple of weeks ago I was in the McGraw-Hill bookstore in
Manhattan. It's one of the best science and technology bookstores
around. They had exactly two Ada books on their shelves, one of
which was an old annotated reference manual.

bertran...@my-deja.com

unread,
Jun 28, 2000, 3:00:00 AM6/28/00
to
"Ken Garlington" <Ken.Gar...@computer.org> wrote:

> [...] For the record, here's my opinion on the whole thing:
>
> http://www.flash.net/~kennieg/ariane.html

I just realized that our paper at
http://www.eiffel.com/doc/manuals/technology/contract/ariane/
did not include a link to Mr. Garlington's critique of it. This is
an oversight and it's being fixed. Even though we disagree
with Mr. Garlington's comments (as discussed earlier in these
forums) and even though it is not common for commercial sites
like ours to link to contrarian views, we basically think of our
site as a way to promote software quality; giving readers of
our paper access to a contrarian view and letting them make
up their own mind can only be beneficial. My apologies to
Mr. Garlington for not noticing the oversight earlier.

--
Bertrand Meyer
Interactive Software Engineering
ISE Building, 2nd Floor, 270 Storke Road Goleta CA 93117

Phone 805-685-1006, Fax 805-685-6869, http://www.eiffel.com

mjs...@my-deja.com

unread,
Jun 28, 2000, 3:00:00 AM6/28/00
to
In article <t71z1ha...@calumny.jyacc.com>,

I think these two recommendations from the Ariane report address this
issue nicely:

R3 Do not allow any sensor, such as the inertial reference system, to
stop sending best effort data.

R6 Wherever technically feasible, consider confining exceptions to
tasks and devise backup capabilities.

Should a hardware error be detected (and many won't -- if 'i' got
flipped to a 2 it would probably not exceed any tested bounds) then
perhaps the local handler can return the last good data, or a simple
extrapolation of the last data, and it might also re-initialize
hardware or data as appropriate, or simply restart the module in
question. If the hardware error was transient then everything may get
back to normal in a few ms; if it's permanent then you have to fall
back to redundant hardware. The key phrase is "where technically
feasible".
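
In Eiffel terms this is the textbook rescue/retry pattern - a sketch,
where `read_sensor' and `last_good_reading' stand for hypothetical
features of the module:

   best_effort_reading: REAL is
         -- Latest sensor value; on any failure, fall back to
         -- the last good value instead of shutting down.
      local
         failed: BOOLEAN
      do
         if failed then
            Result := last_good_reading
         else
            Result := read_sensor
            last_good_reading := Result
         end
      rescue
         failed := True
         retry
      end

(A permanent fault would keep returning stale data, so a real system
would also count failures and switch to the redundant unit past some
threshold.)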

Mike

tmo...@bix.com

unread,
Jun 28, 2000, 3:00:00 AM6/28/00
to
>Also, a couple of weeks ago I was in the McGraw-Hill bookstore in
>Manhattan. It's one of the best science and technology bookstores
Each Computer Literacy store I've been in here in Silicon Valley has
a set of Ada books in the "Programming Languages, Ada" section. So I
guess the Manhattan McGraw-Hill, while generally very good in science
and technology, is not so good in computer technology. Do they have a
section on "reliable software", or, as test cases, any of

>Sennett, C. T. (ed). High-Integrity Software. Plenum Press, 1989. ISBN
>0-306-43552-7
>Schulmeyer, G. Gordon. Zero Defect Software. McGraw-Hill, 1990. ISBN
>0-07-055663-6
>Dyer, Michael. The Cleanroom Approach to Quality Software Development. John
>Wiley & Sons, 1992. ISBN 0-471-54823-5
>
>For software safety, I still think Safeware by Nancy Leveson is the best

www.computerliteracy.com does have such books, but in a bookstore you
can take a look before buying, and, importantly, people who never
imagined putting the words "reliable" and "software" together (there
exist many such people in Silicon Valley), may discover them while
browsing.

Eirik Mangseth

unread,
Jun 28, 2000, 3:00:00 AM6/28/00
to

"Al Christians" <ach...@easystreet.com> wrote in message
news:395A190E...@easystreet.com...

Proving what? That the same management that insisted on continuing with
C decided to run with the herd? It's called "management by magazine",
and it's not very commendable, but it is very common.

> What's the real story?

The evangelist left, and management bought
into Sun's marketing machine, trading quality
for fashion.

>
>
> Al

Eirik M

"If I can't Eiffel in heaven, I won't go"


wv...@my-deja.com

unread,
Jun 28, 2000, 3:00:00 AM6/28/00
to
Many Ada proponents believe it's possible to do better than Windows.
Till they are proven right, they have to post from their Windows box.
Wil


In article <UVR45.52$SR1....@news.pacbell.net>,
tmo...@bix.com wrote:
>
> Are there textbooks and CompSci courses on reliable programming?
> There seem to be a lot of people around who believe it's impossible
> to do any better than, say, Windows, and there's very little mention
> of counterexamples, much less how to build them.

Al Christians

unread,
Jun 28, 2000, 3:00:00 AM6/28/00
to
Eirik Mangseth wrote:
>
> The evangelist left and management bought
> into Sun's marketing machine trading quality
> for fashion.
>

Can you give any examples of bad behavior of HP printers caused
by this lack of quality? Those things are everywhere, so if you
can point out the problems, you would have a better example
in favor of DBC than the Ariane V, as half the computerized world
could easily verify the flaws first-hand.

What do you call an evangelist who doesn't convert anyone?


Al

Jeffrey Carter

unread,
Jun 28, 2000, 3:00:00 AM6/28/00
to
Ken Garlington wrote:
>
> "Jeffrey Carter" <jrca...@acm.org> wrote in message
> > I don't recall this [runtime checks suppressed, hardware interrupt caught the out-of-range value on conversion from floating-point to integer] from the inquiry report, but since both of you seem

> > to think this is the case, I'm probably wrong.
>
> The relevant quote from the report:

[Long quote deleted]

>
> There is no Operand Error as an Ada exception name, to my knowledge, so I
> assume this is referring to a CPU interrupt generated from the
> float-to-fixed conversion instruction execution. I infer from the phrase
> "not protected (in Ada code)" that no local interrupt handler was defined.

> Most Ada compilers map CPU interrupts related to overflow to
> Constraint_Error, Numeric_Error, or both, so a local interrupt handler would
> probably have been executed if present. The phrase "in the event of any kind
> of exception" implies a global, last-ditch interrupt handler to me.
> Additional indirect evidence for this is implied by the recommendation

> "Wherever technically feasible, consider confining exceptions to tasks and

> devise backup capabilities" found later in the report. Also, elsewhere in
> the report, it notes that some conversions were "protected," which I assume
> means that the result was saturated or otherwise "fixed" locally on
> overflow.

I've read the entire report, and find it nebulous enough that it can be
interpreted however one chooses. Your interpretation is as likely as any
other. Certainly Operand Error is not an Ada exception (predefined or
otherwise) since it contains a space, but that doesn't mean the
"exception" is not an Ada exception nor that it is not a hardware
interrupt. Similarly, "protecting" a conversion could mean having a very
local exception handler for the conversion that does something when the
conversion fails, or having code that explicitly checks the range and
does something with out-of-range values.

I interpreted the report as meaning runtime checks were on but no local
handler for this conversion; you and Condic interpreted it as runtime
checks off and a hardware interrupt generated. I see nothing that
supports one view more than the other except the greater number of people
who interpreted it as meaning a hardware interrupt. That is not conclusive,
but I'm willing to accept it as strengthening the possibility that my
interpretation is incorrect.

--
Jeff Carter
"I blow my nose on you."

Jeffrey Carter

unread,
Jun 28, 2000, 3:00:00 AM6/28/00
to
"Marin D. Condic" wrote:
> As I recall, there is no "Operand Error" on the 1750a processor either.
> There is an "Overflow" interrupt that can result from the floating to
> integer conversion instruction and I took this to be what they meant. As
> you re-read the section, it seems to become clear that the writers were
> not versed in Ada or 1750a terminology and used words like "exception"
> fairly loosely. This has created lots of confusion in the Ada world
> concerning this report.

I've never used a processor that could detect a type-conversion range
error, which no doubt contributed to me interpreting the report as
meaning runtime checks were left on but not locally handled for the
conversion in question.

Are you saying that the 1750A would trigger a hardware interrupt from
the following:

with Ada.Text_IO;
with System;
procedure Overflow_Test is
pragma Suppress (All_Checks);

type Big is digits System.Max_Digits;

Too_Big : Big;
I : Integer;
begin -- Overflow_Test
Too_Big := Big (Integer'Last) + 1.0;
I := Integer (Too_Big);
Ada.Text_IO.Put_Line (Integer'Image (I) );
end Overflow_Test;

? With GNAT 3.12p/Win98/AMD K6-2 3D-Now! 400 MHz, it outputs

" 2147483647"

which is interesting in itself.

Removing the pragma Suppress results in Constraint_Error, of course (with
-gnato).

bertran...@my-deja.com

unread,
Jun 29, 2000, 3:00:00 AM6/29/00
to
Al Christians wrote:

> Can you cite any examples of bad behavior of HP printers caused
> by this lack of quality? Those things are everywhere, so if you
> can point out the problems, you would have a better example
> in favor of DBC than the Ariane V, as half the computerized world
> could easily verify the flaws first-hand.

Thanks for the opportunity. I am not in a position to comment
on HP's strategy. But I can certainly cite some of what happened
during the development that Thomas Beale and Eirik Mangseth have
mentioned in this thread and which is documented on our site.
Among the notable events in that development:

- Because the Eiffel modules were being called by
the main body of C/C++ software, contracts in the
Eiffel part (preconditions) brought up bugs that
had been dormant in the C part, most likely for
years. This caused quite a stir as a bug in a laser
printer is not something to be taken lightly.

- Later in the project a violated contract (which
had not failed during simulation) showed that the
*hardware* (a chip from an external supplier, more
precisely its floating-point power operator) was
faulty! It caused even more of a stir, and interesting
discussions with the supplier.

As an anecdote, we were told in both cases that the initial
finger-pointing was at Eiffel -- "See this strange tool you
brought into the project, it breaks the software!" -- until
it was realized that the fault lay elsewhere and the Eiffel
contracts had actually served to evidence it. This is not
unlike the Ariane case (in different circumstances): reuse
errors. Only in these cases the errors were found during
development, directly thanks to the contracts.

There is a report with more details, in the form of an interview
of the project leader, at
http://www.eiffel.com/eiffel/projects/hp/creel.html.

This is a pretty good example of the kind of difficult real-time
system for which Eiffel has been shown to shine. To answer an
earlier message by Mr. Christians, I don't know of any rockets
guided by Eiffel software. But many mission-critical systems
with extremely harsh requirements are functioning today (some of
them continuously for the past several years), written in Eiffel.
The page at http://www.eiffel.com/eiffel/projects gives a few
pointers. I must qualify it by noting that (frustratingly
for ISE) there are a number of others (such as the price
reporting system of a major financial exchange, and several
large defense-related applications) that we have been unable
to discuss publicly, often because of the customers' quite
understandable desire to keep their leading edge. However
the projects that are documented on the Web page already include
some quite interesting stuff.

--
Bertrand Meyer
Interactive Software Engineering
ISE Building, 2nd Floor, 270 Storke Road Goleta CA 93117

Phone 805-685-1006, Fax 805-685-6869, http://eiffel.com

Ken Garlington

unread,
Jun 29, 2000, 3:00:00 AM6/29/00
to
<mjs...@my-deja.com> wrote in message news:8jdbds$iop$1...@nnrp1.deja.com...

Actually, a single bit-flip error as you describe can be detected through
parity hardware (and corrected through ECC hardware).

> then
> perhaps the local handler can return the last good data, or a simple
> extrapolation of the last data, and it might also re-initialize
> hardware or data as appropriate, or simply restart the module in
> question. If the hardware error was transient then everything may get
> back to normal in a few ms; if it's permanent then you have to fall
> back to redundant hardware. The key phrase is "where technically
> feasible"

Unfortunately, for feedback systems that are measuring dynamically-changing
data, these may or may not help (holding the "last good value" is typically
what's done in this environment, but how do you know what that is?).
Re-initialization is usually going to generate at least a good-size
transient error, and the loss of previous state -- particularly for an
IRS -- may make the error unrecoverable.

More to the point, the assumption that an unknown error is a hardware error,
to be handled by redundant hardware, is exactly the one the Ariane design
made - with very undesirable results!

(It should be noted, by the way, that the recommendations listed above
didn't come with any examples of how to implement them...)

Ken Garlington

unread,
Jun 29, 2000, 3:00:00 AM6/29/00
to
"Jeffrey Carter" <jrca...@acm.org> wrote in message
news:395A8002...@acm.org...

> I've never used a processor that could detect a type-conversion range
> error, which no doubt contributed to me interpreting the report as
> meaning runtime checks were left on but not locally handled for the
> conversion in question.

Most other CPUs (including the AMD K6, I believe) have the capability to
generate interrupts on overflow; however, the compiler may not wish to
generate code to take advantage of it (particularly under Windows!). Note
that this isn't exactly the same as a "type-conversion range error" --
hardware interrupts typically work on violations of hardware sizes, and not
necessarily abstract type sizes. For example, if you declare a type

type X is new Integer range 1 .. 10;

then in the general case, an assignment of zero won't raise a hardware error
(since zero is legal for any integer at the hardware level). So, the
compiler must generate checks (or decide that zero can never be assigned).
On the other hand, for the code you post below...

> Are you saying that the 1750A would trigger a hardware interrupt from
> the following:
>
> with Ada.Text_IO;
> with System;
> procedure Overflow_Test is
> pragma Suppress (All_Checks);
>
> type Big is digits System.Max_Digits;
>
> Too_Big : Big;
> I : Integer;
> begin -- Overflow_Test
> Too_Big := Big (Integer'Last) + 1.0;
> I := Integer (Too_Big);
> Ada.Text_IO.Put_Line (Integer'Image (I) );
> end Overflow_Test;

The 1750 would generate a hardware overflow on the execution of the add
(assuming the generated code actually let the add take place). Of course, as
soon as the interrupt is raised, the run-time interrupt handler would likely
respond by returning to the instruction after the one that caused the
interrupt, since exceptions are suppressed.

> ? With GNAT 3.12p/Win98/AMD K6-2 3D-Now! 400 MHz, it outputs
>
> " 2147483647"
>
> which is interesting in itself.

Looks perfectly reasonable to me...

> Removing the pragma Suppress result in Constraint_Error, of course (with
> -gnato).

Looking at the GNAT output, it appears that the compiler detects that this
will always raise an exception (as it will), and so when you turn on the
check it simply inserts the code to raise it immediately (with no actual
check in the object code). Most 1750 compilers would do the same thing. The
more interesting case is:

function Overflow_Test (F : Float) return Integer is
begin
return Integer(F);
end;

It appears as though GNAT generates an explicit check of the value of Float,
to ensure that it is in the range of Integer, before doing the conversion.
So, it does not permit the hardware to generate an overflow. On the other
hand, not too many 1750s support Windows (!), and efficient code in the
normal case is highly prized, so at least some compilers will simply do the
1750 instruction to convert float to integer (a single instruction), and let
the interrupt handler catch the (hopefully rare) out-of-bounds cases.
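
A hand-written equivalent of such an explicit check might look like the
sketch below (assuming 32-bit Integer; a real compiler treats the exact
boundary values more carefully than this). The two comparisons run on
every conversion, which is exactly the throughput cost at issue:

--  Sketch of the explicit range check a compiler may generate before
--  a float-to-integer conversion (boundary handling simplified).
function Checked_Fix (F : Float) return Integer is
begin
   if F < Float (Integer'First) or else F > Float (Integer'Last) then
      raise Constraint_Error;   --  the compiler-inserted check
   end if;
   return Integer (F);          --  now safe for the bare convert instruction
end Checked_Fix;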

Ken Garlington

unread,
Jun 29, 2000, 3:00:00 AM6/29/00
to
"Jeffrey Carter" <jrca...@acm.org> wrote in message
news:395A7DE9...@acm.org...

> I interpreted the report as meaning runtime checks were on but no local
> handler for this conversion; you and Condic interpreted as runtime
> checks off and a hardware interrupt generated. I see nothing that
> supports one view more than another except the greater number of people
> who interpreted as meaning a hardware interrupt. That is not conclusive,
> but I'm willing to accept it as strengthening the possibility that my
> interpretation is incorrect.

The key item that persuaded me to my view was the statement "It has been
stated to the Board that not all the conversions were protected because a
maximum workload target of 80% had been set for the SRI computer." Given my
experience with most Ada compilers' approaches to generating code for the
MIL-STD-1750, I can't think of any obvious reason why adding a local
_handler_ to the code would affect throughput in the normal flow of
execution. However, if the compiler generates an explicit check of
floating-point values before converting them to integer (which it may or may
not have to do, as I note in a different message), then this will take up
throughput in the normal case. Therefore, it makes more sense that the check
was suppressed. However, even if the check is suppressed, the interrupt will
still occur. If the interrupt is then mapped to a global fault handler (as
opposed to an Ada exception handler), then the result described in the
report makes sense.

Tarjei T. Jensen

unread,
Jun 29, 2000, 3:00:00 AM6/29/00
to

Al Christians wrote
>Can you cite any examples of bad behavior of HP printers caused
>by this lack of quality? Those things are everywhere, so if you
>can point out the problems, you would have a better example
>in favor of DBC than the Ariane V, as half the computerized world
>could easily verify the flaws first-hand.

We have quite a number of HP printers, and the problem rate, especially with
regard to JetDirect cards, is high enough to be annoying. It is not uncommon
for the TCP/IP stack to crash. Some of the printers do not wake reliably when
in power-saving mode.

As far as I can see, everything that was discovered in the HP project would have
been detected by any other programming language which does range checking on
variables. Other people have had similar experiences with Ada (search c.l.a.
for Dowhile Jones postings).

Greetings,

Tarjei T. Jensen

unread,
Jun 29, 2000, 3:00:00 AM6/29/00
to

Hyman Rosen wrote:
>You are ignoring the possibility that the test for 'i = 3' may
>return true, we will begin executing the statement controlled
>by that condition, and we will discover to our chagrin that
>despite the test, i is not equal to 3. Don't you want to write
>code to handle that possibility? Look at your quote above about
>handling errors which are not supposed to happen.


No, I am not disregarding that possibility. That sort of problem is not
something that a program deals with on a statement-by-statement basis. There is
a time and place for everything.

Greetings,


pc_c...@my-deja.com

unread,
Jun 29, 2000, 3:00:00 AM6/29/00
to
In article <8jd4bb$na7$1...@toralf.uib.no>,

bouk...@sentef2.fi.uib.no (Igor Boukanov) wrote:
>
> Exactly! In Eiffel/DbC mostly checks related to static typing can be done
> by the compiler; to check the rest you have to run the code. And if one
> decided not to test, then it does not matter whether it was DbC or not.
>

I do not agree. DbC means *integrating* contracts, and thus the
specification of the code, with the implementation of the code. It
is impossible to reuse a given piece of code written with DbC without
actually seeing the contracts. It is impossible to reuse a given method/
function/procedure without seeing the contracts belonging to the
given method/function/procedure. Just as it is impossible not to see
and take into consideration the signature of the method/function/
procedure.

Seeing the contracts will alert a potential programmer, about to
reuse some piece of code, to the specific contracts/specifications
for the code he/she is about to reuse. It will make him/her think
about how he/she can reuse the code. And that is independent of
testing the code with contracts turned on or not.
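
As a sketch of what such a visible contract amounts to (hypothetical
routine; note that Eiffel's require clause sits in the interface, which
a pragma buried in the body cannot quite replicate):

--  Sketch: a precondition made visible at the reuse boundary.
--  A comment plus an executable check is the closest plain Ada
--  gets to an Eiffel `require' clause.
procedure Set_Speed (Metres_Per_Second : Float) is
begin
   pragma Assert (Metres_Per_Second >= 0.0);   --  require: non-negative
   null;   --  ... implementation ...
end Set_Speed;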

Of course there's no guarantee that just because a language supports
DbC, programmers using the language will always specify contracts.
But the culture promoted by being able to specify contracts and
eventually using contracts will definitely reduce the amount of errors,
due to wrong reuse, in production code.

Just as a programming culture of consistent coding style and consistent
choice of module/class/function/attribute etc. names will in general
make the code more easy to support and/or reuse. There is of course
no guarantee for it (since many other factors also affect how easy a
given piece of code is to support/reuse), but generally well written
consistent code is easier to support and/or reuse.

So code with contracts is generally harder to mis-reuse than code
without contracts; in my personal experience, harder by an *order of
magnitude*. Add to that the ability to check the contracts (i.e.
specifications) against the implementation
in runtime tests and you have a formidable tool for developing high-
quality reusable code.

I fully agree with the slogan "Reuse without contracts is sheer folly".
Just as delivering production code without testing is sheer folly. I am
convinced that in 5 or 10 years time very few people will consider
developing any new software without support for contracts along
the lines of Eiffel/DbC.

That said, of course no one can say that Eiffel/DbC would definitely
have caught the Ariane bug, but then again, as I understand it, no one
has claimed so.

/Paul Cohen

David Starner

unread,
Jun 29, 2000, 3:00:00 AM6/29/00
to
On Wed, 28 Jun 2000 22:05:27 GMT, wv...@my-deja.com <wv...@my-deja.com> wrote:
>Many Ada proponents believe it's possible to do better than Windows.
>Till they are proven right, they have to post from their Windows box.
>Wil

Reliability-wise (which is what he was talking about), I think it's
obvious that it's possible to do better than Windows. Go ask one of the
Linux/*BSD advocates, who will point to a computer that has a 100+ day
uptime, then look at the Vax person, who will point to a Vax that
has been up without a reboot longer than Linux, *BSD, or Windows have
existed.

--
David Starner - dstar...@aasaa.ofe.org
http/ftp: x8b4e53cd.dhcp.okstate.edu
"A dynamic character with an ability to survive certain death and
a questionable death scene leaving no corpse? Face it, we'll never
see her again." - Sluggy Freelance

Thomas Beale

unread,
Jun 29, 2000, 3:00:00 AM6/29/00
to

Al Christians wrote:

> Thomas Beale wrote:
> > An Eiffel system was made to run inside an HP printer...
> >
>
> Tell us about this, if you can. When I last checked for Eiffel
> success stories, there were only a few, and the HP printer thing
> was the most recent one that looked at all significant. But I
> hear from acquaintances at HP that they developed a JVM for
> controlling printers, they were most pleased with the JVM, and

> Eiffel was not anything major in HP's printer products. What's
> the real story?

Well, for success stories (others please excuse my making noise about
this, but since we're all cross-posted to c.l.a they might be interested
:-) see under the AMP and GEHR headings at www.eiffel.com or at
http://www.deepthought.com.au/it/eiffel/eif_bus_sys/main.book.pdf
and http://www.gehr.org, particularly at
http://www.gehr.org/technical/source/Documentation/index.html if you
just want to dive into some online Eiffel....

(These are just projects in which I have personal involvement, there are
many others, no doubt bigger and better. )

For the printer, I note Bertrand has already replied on this.

- thomas beale

Marin D. Condic

unread,
Jun 29, 2000, 3:00:00 AM6/29/00
to
Jeffrey Carter wrote:
> Are you saying that the 1750A would trigger a hardware interrupt from
> the following:
>
> with Ada.Text_IO;
> with System;
> procedure Overflow_Test is
> pragma Suppress (All_Checks);
>
> type Big is digits System.Max_Digits;
>
> Too_Big : Big;
> I : Integer;
> begin -- Overflow_Test
> Too_Big := Big (Integer'Last) + 1.0;
> I := Integer (Too_Big);
> Ada.Text_IO.Put_Line (Integer'Image (I) );
> end Overflow_Test;
>

Pretty much. If you look at the 1750a instructions, there is one which
will take a Float and convert it to a 16 bit Integer and, in the event
that the Float was outside the range of the Integer, it will trigger the
Overflow interrupt. (The 1750a is pretty spiffy in that it has on-board
floating point processing. This is not the case for many 16-bit
processors.)

Now, *exactly* what happens will depend on the compiler as well as
the chip. My guess would be that if you suppressed checks on the above
code and compiled for the 1750a, you'd get this floating-to-integer
instruction at "I := Integer (Too_Big);" However, I could imagine a
compiler producing some other kind of code in the face of this and you
may get some completely different result.

If I had my 1750a book handy, I'd look up the instruction for you and
quote you the exact behavior of the instruction. Since I don't I guess
we can leave that as an "Exercise For The Student" :-)

Ken Garlington

unread,
Jun 29, 2000, 3:00:00 AM6/29/00
to
<pc_c...@my-deja.com> wrote in message news:8jfabb$1d8$1...@nnrp1.deja.com...

> That said, of course no one can say that Eiffel/DbC would definitely
> have caught the Ariane bug, but then again, as I've understood, no one
> has claimed so.

The exact claim was: "Does this mean that the [Ariane 5] crash would
automatically have been avoided had the mission used a language and method
supporting built-in assertions and Design by Contract? Although it is always
risky to draw such after-the-fact conclusions, the answer is probably
yes..."

carr_tom

unread,
Jun 29, 2000, 3:00:00 AM6/29/00
to

From MIL-STD-1750A (2 July 1980):

FIX Ra,Rb

The integer portion of the floating point Derived Operand, DO (i.e., the
contents of registers Rb and Rb+1), is stored into register Ra. If the
actual value of the DO floating point exponent is greater than 1F(hex),
then Ra remains unchanged and a fixed point overflow occurs. The condition
status, CS, is set based on the result in Ra.

The instruction description for EFIX (convert extended precision floating
point to 32-bit integer) is similar. Both of these can produce a fixed
point overflow interrupt.

Note that the 1750A also includes separate interrupts for floating point
overflow and floating point underflow.

How these interrupts are handled depends on what mechanisms have been
programmed into the run time, which tends to vary from program to program.

All three of these overflow/underflow interrupts can be masked, so a
compiler with checks suppressed could generate code that chooses to handle
overflow by
initially setting "Ra" to integer'last,
masking the overflow interrupt,
performing the conversion,
clearing the interrupt (whether it happened or not),
unmasking the interrupt, and
going merrily on its way.

What happens on using any specific Ada compiler targeted to the 1750A is
left as an "Exercise For The Student".

YMMV

/Tom
--
Thomas Peter Carr | I have a dream, ...
carr...@si.com (Internet) | M L King Jr 08/28/63
616-241-8846 / 616-241-8745 FAX (Telephone) |
Smiths Industries, MS 3D1; 3290 Patterson Ave SE; Grand Rapids, MI 49512-1991

Hyman Rosen

unread,
Jun 29, 2000, 3:00:00 AM6/29/00
to
"Tarjei T. Jensen" <tarjei...@kvaerner.com> writes:
> No, I am not disregarding that possibility. That sort of problem is
> not something that a program deals with on a statement-by-statement
> basis. There is a time and place for everything.

And I am saying that it's just as valid to conclude that an
overflow can't happen on a conversion as it is to believe
that a variable will have a certain value.

Tarjei Tjøstheim Jensen

unread,
Jun 29, 2000, 3:00:00 AM6/29/00
to

It is entirely different. You can have automatic detection of overflow.
Most if not all processors can do that if requested.


Greetings,

Hyman Rosen

unread,
Jun 29, 2000, 3:00:00 AM6/29/00
to
"Tarjei Tjųstheim Jensen" <tar...@online.no> writes:
> It is entirely different. You can have automatic detection of overflow.
> Most if not all processors can do that if requested.

A major part of programming is reasoning about the program's behavior.
Once you conclude that something cannot happen, what is the point of
putting in checks just in case it happens anyway? Conversely, if you
are putting in such checks, why do you not worry equally about the
i = 3 case?

Bertrand Meyer

unread,
Jun 29, 2000, 3:00:00 AM6/29/00
to
Tarjei T. Jensen wrote:

> As far as I can see, everything that was discovered in the HP [printer] project
> would have been detected by any other programming language which does range checking on
> variables.

As a matter of fact, no. It is true that range checks are very useful,
but they capture only a small part of what can be expressed and
detected through contracts. The hardware defect that came out
during the HP project and caused such a shock was (as mentioned in my
earlier posting) a flaw in the hardware power operator in the chip,
and it manifested itself during testing through a violation of an
assertion of the form

2 ^ i <= N

(where ^ is the power operator; if my memory is correct i was 8 and N
was 256). This cannot be expressed as a simple interval type
declaration.

I believe the other problems encountered and detected also involved
non-trivial assertions. With contracts you can use the full power
of boolean expressions (and indeed, with Eiffel's agent mechanism,
arbitrary high-order functionals); for example you can have
a precondition of the form

some_property (x)

where `some_property' is a function describing a complex
condition. The actual conditions used in practice vary from
things like "this reference is non-void, i.e. that object
exists" (very simple, but repeatedly useful, and not expressible
as an interval type declaration) to rather complex ones,
such as "this object is present in the hash table",
"that window has been displayed at least once", "that list
of integers includes at least one positive value",
"that list of cars includes at least one non-domestic model" etc.
They include simple range checks but go far beyond them.

-- Bertrand Meyer
ISE Inc., Santa Barbara
http://www.tools-conferences.com http://www.eiffel.com

Robert Dewar

unread,
Jun 29, 2000, 3:00:00 AM6/29/00
to
In article <395BCE66...@eiffel.com>,

Bertran...@eiffel.com wrote:
> Tarjei T. Jensen wrote:
>
> > As far as I can see, everything that was discovered in the HP [printer] project
> > would have been detected by any other programming language which does range checking on
> > variables.
>
> As a matter of fact, no. It is true that range checks are very useful,
> but they capture only a small part of what can be expressed and
> detected through contracts.


But the real issue is whether contracts as provided in Eiffel
are sufficiently more useful than the simple pragma Assert
found in typical Ada compilers. All the examples in this
message are in the category of simple assertions, and I think
it is far from clear that a case can be made for anything
more complex. Yes, I know examples can be constructed, but
I am struck by the fact that virtually 100% of examples
given do not need the additional complexity.
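
For what it's worth, the examples given so far map directly onto that
pragma; a sketch with hypothetical names (GNAT's spelling, with the
checks enabled by -gnata):

--  Sketch: the earlier power-operator and `some_property' examples
--  written with pragma Assert.  All names are hypothetical.
procedure Assert_Sketch is
   I : constant Integer := 8;
   N : constant Integer := 256;
   X : constant Integer := 42;

   function Some_Property (Value : Integer) return Boolean is
   begin
      return Value > 0;   --  any arbitrarily complex condition
   end Some_Property;
begin -- Assert_Sketch
   pragma Assert (2 ** I <= N);         --  the 2 ^ i <= N check
   pragma Assert (Some_Property (X));   --  function-valued condition
   null;
end Assert_Sketch;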

Ed Falis

unread,
Jun 30, 2000, 3:00:00 AM6/30/00
to
Robert Dewar <robert...@my-deja.com> wrote:
> But the real issue is whether contracts as provided in Eiffel are
> sufficiently more useful than the simple pragma Assert found in
> typical Ada compilers. All the examples in this message are in the
> category of simple assertions, and I think it is far from clear that a
> case can be made for anything more complex. Yes, I know examples can
> be constructed, but I am struck by the fact that virtually 100% of
> examples given do not need the additional complexity.


Having used both Ada assertions and Eiffel DbC, I find one significant
difference, and one notational convenience in the latter.

The first is that preconditions and postconditions are inherited by
descendants, and have strict rules about the manner in which they can be
extended, which are checked by the compiler. This could be done
manually, but would involve a lot of cutting and pasting with pragma
Assert.

Invariants, which must hold following initialization, and after exit
from any exported routine, are much more easily implemented in the
Eiffel notation. Again, they could be done with pragma Assert, but at a
cost of a great deal of drudgery. There would still be no compiler
support to ensure that they'd been put in.
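
To make the drudgery concrete, here is a sketch of hand-simulating an
invariant with pragma Assert (hypothetical package; nothing makes a new
routine remember the assertion):

--  Sketch: simulating an Eiffel class invariant by hand.  Every
--  exported routine must repeat the same assertion, and no compiler
--  checks that it does.
package Counters is
   procedure Increment;
   procedure Reset;
end Counters;

package body Counters is
   Max   : constant := 100;
   Count : Natural  := 0;

   function Invariant return Boolean is
   begin
      return Count <= Max;   --  the "class invariant"
   end Invariant;

   procedure Increment is
   begin
      Count := Count + 1;
      pragma Assert (Invariant);   --  repeated by hand here...
   end Increment;

   procedure Reset is
   begin
      Count := 0;
      pragma Assert (Invariant);   --  ...and here, and everywhere else
   end Reset;
end Counters;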

I haven't given much thought as to whether pragma Assert could simulate
the use of Eiffel agents in predicates, but I suspect from the cases
where I've used them that it would get pretty complex. And I did find
them very useful in the last substantial piece of software I built.

So, I do find DbC as implemented in Eiffel to be more powerful, cleaner
and more useful than pragma Assert. But, like most things, it depends on
how you use them, and how integrated they are in the development
process.

- Ed
