
'Heisenbugs'


Ed Peschko

Jan 12, 1994, 6:46:23 PM
hello --

I just read today in an article on distributed programming that there is a
class of bugs nicknamed 'Heisenbugs': bugs that (they said) are due to the
'timing dependence of machines, and which go away when debugging or tracing
is turned on.'

They go on to say that 'They are among the most frustrating bugs to find' but
give no actual advice on HOW TO FIND THEM.

This interested me a great deal, because I can think of at least two examples
of my own C++ code in which I had 'Heisenbugs', and in fact did not find them.

Does anyone have sources of articles on how to deal with these horrid beasts?
Or any advice on the side on how to deal with them?

Thanks much,

Ed

Ben Last

Jan 13, 1994, 3:22:45 AM
In article <peschko.758418383@s13> pes...@arc.umn.edu writes:

>I just read today in an article on distributed programming that there are a
>class of bugs nicknamed 'Heisenbugs': bugs that (they said) are due to the
>'timing dependance of machines, and which go away when debugging or tracing
>is turned on.'
>

>Does anyone have sources of articles on how to deal with these horrid beasts?
>Or any advice on the side on how to deal with them?

Often a good approach to these things is to gain as much data as possible
about what may be happening and theorise. Then test the theory. Usually,
it's easy to test a theory by embedding prints into the code to show
the state of variables. On occasion, you'll find that, while you can't
debug the code by stepping, you can set breakpoints and still get the
behaviour to happen.

I think that the first impulse of many programmers is to leap into the
debugger and step through the code. This is usually a good strategy, but
"heisenbugs" (great name!) need a lateral approach. The folklore has it
that these are a new step in bug evolution; they know when they're being
observed and change their behaviour. Also called Schrodinger's bugs... :-)

--
--------------------------------------------------------------------------
| Ben Last, Fisons Instruments | (All postings are personal, and not the |
| b...@vgdata.demon.co.uk | opinions of Fisons Instruments) |
--------------------------------------------------------------------------

Rudi Vankemmel

Jan 13, 1994, 7:45:20 AM
Ed Peschko (pes...@arc.umn.edu) wrote:
: hello --

: I just read today in an article on distributed programming that there are a
: class of bugs nicknamed 'Heisenbugs': bugs that (they said) are due to the
: 'timing dependance of machines, and which go away when debugging or tracing
: is turned on.'

...
: This interested me a great deal, because I can think of at least two examples
: of my own C++ code in which I had 'Heisenbugs', and in fact did not find them.

...

I'm interested in this too. I had a couple of bugs I could solve
by inserting very small delays in my code (using sleep() or local wait
loops). However, I don't like this idea. If you get responses, can you
summarise on the net?

Thanks....
--
-------------------------------------------------------------------------------
Rudi Vankemmel | These are my opinions, not those of
IMEC vzw. - ASP division | my employer, so don't take them away
Process and Device Modelling group |________________________________________
Kapeldreef 75 phone: (32)-(0)16/28 13 37
3001 Leuven fax: (32)-(0)16/28 12 14
Belgium email: vank...@imec.be
-------------------------------------------------------------------------------

Carl Orthlieb

Jan 13, 1994, 12:26:28 PM
From the Hacker's Lexicon:

Heisenbug /hi:'zen-buhg/ from Heisenberg's Uncertainty Principle in
quantum physics n. A bug that disappears or alters its behavior when one
attempts to probe or isolate it. Antonym of Bohr bug; see also mandelbug,
schroedinbug. In C, nine out of ten heisenbugs result from either
fandango on core phenomena (esp. lossage related to corruption of the
malloc arena) or errors that smash the stack.

Schroedinbug MIT: from the Schroedinger's Cat thought-experiment in
quantum physics n. A design or implementation bug in a program which
doesn't manifest until someone reading source or using the program in an
unusual way notices that it never should have worked, at which point the
program promptly stops working for everybody until fixed. Though this
sounds impossible, it happens; some programs have harbored latent
schroedinbugs for years. Compare heisenbug, Bohr bug, mandelbug.

Bohr bug /bohr buhg/ from quantum physics n. A repeatable bug; one that
manifests reliably under a possibly unknown but well-defined set of
conditions. Antonym of heisenbug; see also mandelbug, schroedinbug.

Mandelbug /mon'del-buhg/ from the Mandelbrot set n. A bug whose
underlying causes are so complex and obscure as to make its behavior
appear chaotic or even non-deterministic. This term implies that the
speaker thinks it is a Bohr bug, rather than a heisenbug. See also
schroedinbug.

Robert Lerche

Jan 13, 1994, 12:47:29 PM
pes...@arc.umn.edu (Ed Peschko) writes:

>I just read today in an article on distributed programming that there are a
>class of bugs nicknamed 'Heisenbugs': bugs that (they said) are due to the
>'timing dependance of machines, and which go away when debugging or tracing
>is turned on.'

...

>Does anyone have sources of articles on how to deal with these horrid beasts?
>Or any advice on the side on how to deal with them?

Ah, debugging. Here are some rules of thumb.

1. Distrust everything

- your own code
- the compiler
- the run time library
- the operating system

I just tracked down a bug in a vendor's implementation of "sscanf"
-- it failed to check for end of input string when processing a
"%s" specification, resulting in a clobbered stack.

2. Get and use appropriate tools

These days powerful debuggers are available and many processors
have some kind of hardware break point capability. One of the
most powerful tools for tracking down a "Heisenbug" is a memory
store break point.

3. Instrument your code

For debugging real time, event driven systems this is almost a
requirement. A simple trace table (circular buffer in memory)
filled with key events is a tremendous help.

4. Read the machine instructions

This is a corollary of (1) -- look at the generated code, not just
the source. This can often lead directly to the problem, even
when there's no compiler bug. I once found a stack clobber by
noticing a particular procedure had a very large stack allocation,
which wasn't obvious by looking at the source code because it was
an array with a size given by a #define'd symbol.

5. Follow the path where it leads

This is really the key to all debugging and it goes back to (1)
again.

When you're on the track of a problem, and you reach a point where
the thought "oh, it can't be that" occurs to you, CHECK IT. Avoid
making assumptions about what must be so; check every assumption
by testing it.

Richard Romero

Jan 13, 1994, 3:15:30 PM
Excerpts from netnews.comp.unix.programmer: 13-Jan-94 Re: 'Heisenbugs'
Ben La...@vgdata.demon.co (1502)

> Usually, it's easy to test a theory by embedding prints into the code to
> show the state of variables.

Unfortunately, even this can cause Heisenbugs to go away... I've had one
where the prints caused enough I/O + X windows overhead to slow down the
program enough so that the bugs disappeared...

My recommendation is to first ask yourself 'am I using any code that
depends on calls to OS timing or time functions?' and if you are, then
'does the accuracy/granularity of this timing matter to my code?' That
would have been the quickest way for me to find my bug...

-rick romero

Lezz Giles

Jan 13, 1994, 11:59:48 AM
In article <1994Jan13....@imec.be>, vank...@imec.be (Rudi Vankemmel) writes:
|>Ed Peschko (pes...@arc.umn.edu) wrote:
|>: hello --
|>
|>: I just read today in an article on distributed programming that there are a
|>: class of bugs nicknamed 'Heisenbugs': bugs that (they said) are due to the
|>: 'timing dependance of machines, and which go away when debugging or tracing
|>: is turned on.'
|>
|>...
|>: This interested me a great deal, because I can think of at least two examples
|>: of my own C++ code in which I had 'Heisenbugs', and in fact did not find them.
|>...
|>
|>I'm interested in this too. I had a couple of bugs I could solve
|>by inserting very small delays in my code (using sleep() or local wait
|>loops). However I don't like this idea. If you have responses can you
|>summarise on the net ?
|>
|>Thanks....

One approach that I have found works for most bugs, and which has worked
for some Heisenbugs, is to find the smallest possible change in the
source code that causes the bug to appear/disappear. In the case
of Heisenbugs there are probably several equivalent small changes, so it
then becomes the task of finding the smallest significant change - e.g.
instead of finding that adding a print statement can remove the bug, you
try moving local variables, eliminating or extending loops, etc. I
removed a bug in perl this way - I never found what the bug was, but
I found some changes that were so minimal that they didn't affect the
functionality of the code; they just removed the bug.

Just my $0.02 worth

Lezz

H. Anders Lonnemo

Jan 13, 1994, 8:30:50 PM

Just wondering, what was your confidence level that you had actually
removed the bug?

Anders

peter hoffman

Jan 13, 1994, 3:37:38 PM
Since the topic is "Heisenbugs" I thought I would mention the one that I
use as an example when I have to explain the term "Heisenbug" --

I was working on some C code under MSDOS using the MS C compiler. The
program would sometimes run fine and sometimes lock up. It locked up often
enough so you could make progress debugging it. Fired up CodeView and went
to work -- no bug, no lock ups! Without CodeView -- bug and lockups!
Canned CodeView and went with printf()'s. Narrowed it down to one location.
If I put in *any* kind of printf() there it worked fine. My boss came over
to see how things were going and I explained what I had found.

"Uninitialized pointer" he said. "CodeView initializes the pointers so it
can control and debug the program" he continued. "The printf() statement
just shifts everything around a little so the wild pointer isn't smashing
anything important" he concluded.

After *careful* grovelling through the code I found that he was right!

Thanks jce@usc, those were educational days!

Peter Hoffman
--
..!uunet!melanie!peter -- Public Access Linux -- (USA)803/271-0688

John Murray

Jan 13, 1994, 10:32:48 PM
In article <CJL5A...@melanie.uucp> pe...@melanie.uucp (peter hoffman) writes:
>Since the topic is "Heisenbugs" I thought I would mention the one that I
>use as an example when I have to explain the term "Heisenbug" --
>
[good example, but deleted anyway :-) ]

>
>Thanks jce@usc, those were educational days!
>Peter Hoffman

I think I found my first Heisenbug around 1975 when, at a tender age, I
discovered that moving a COBOL sub-routine from one part of the source
file to another had the same effect as inserting debugging statements
- i.e. it made the problem go away.

Later, when I "graduated" to microcode, the fun really started. Consider
the problem of inserting a circuit board using an extender so you can
attach oscilloscope probes or logic analyzers. The extension board adds
another couple of nanoseconds to the path lengths, which accumulate in
no time to mask the original problem. Or even better, imagine discovering
- after days of hunting - that you can turn a bug on and off by alternately
blowing hot air or spraying cooling fluid onto one of the chips!

Gosh, I love every minute of it!!!

John Murray, HCILab, University of Michigan

Brent Burton

Jan 13, 1994, 11:01:38 PM
b...@vgdata.demon.co.uk writes:
|
|Often a good approach to these things is to gain as much data as possible
|about what may be happening and theorise. Then test the theory. Usually,
|it's easy to test a theory by embedding prints into the code to show
|the state of variables.

This usually does the trick for Heisenbugs I've had. However, I'd
like to tell of a true Heisenbug...

I was doing some average-complexity graphics on an SGI (using GL) when
the software started dumping core for no apparent reason. Probing further,
I found the code was dying in a call to change the color, RGBcolor(),
which is one of the simplest GL calls and only has 3 shorts as arguments.

With the debugger, the program may or may not crash there, which I found
really strange. Next, I inserted prints every so often to track things
and the bug conveniently left the RGBcolor() call for another cozy
residence. In short, every time I thought I had the bug nailed, it
would move. This is the essence of the Heisenbug.

Oh, the bug was traced to using GCC 2.4.5 with GL on the SGIs.

-Brent
bre...@tamsun.tamu.edu

John Sloan

Jan 13, 1994, 11:35:58 PM
From article <2h53p0...@srvr1.engin.umich.edu>, by j...@engin.umich.edu (John Murray):

> - after days of hunting - that you can turn a bug on and off by alternately
> blowing hot air or spraying cooling fluid onto one of the chips!

Back in my real-time PDP-11 days I had a standalone disk device driver
that would work in the afternoon but not in the morning. Turns out the
disk controller had a fault where it would work after it had warmed up
for a while. When I went in and turned it on first thing in the A.M., no
go. I tore my hair out on that one until the hardware guys figured it
out.

I've seen "Heisenbugs" (great name) turn up in all sorts of code that
exhibits any sort of asynchronous/real-time/non-deterministic behavior,
where precise order of events cannot be predicted. Just messing around
with stuff as mundane as signals and sockets in C/UNIX can turn one up
once in a while. But I've also seen it in a plain old FORTRAN numeric
code using the old IBM "H" compiler (the very same compiler that I once
had to peruse the assembler output to find that the generated code was
stepping on its own base register; man, I _hated_ the H compiler). Four
of us never did figure out what was going wrong, but that extra
debugging print statement made it work, even if we directed it to "DD
DUMMY" (/dev/null to the UNIX folks). Man, I also _hate_ doing stuff
like that.

--
John Sloan Omnia mutantur, +1 303 497 1243
NCAR/SCD nos et mutamur in illis. Fax +1 303 497 1818
Boulder CO 80307-3000 USA ML48B jsl...@ncar.ucar.edu
Logical Disclaimer: belong(opinions,jsloan). belong(opinions,_):-!,fail.

Will Duquette

Jan 14, 1994, 11:53:19 AM
In article <1994Jan13.1...@adobe.com> Carl Orthlieb,
orth...@adobe.com writes:
>Bohr bug /bohr buhg/ from quantum physics n. A repeatable bug; one that
>manifests reliably under a possibly unknown but well-defined set of
>conditions. Antonym of heisenbug; see also mandelbug, schroedinbug.

A friend of mine once encountered what I guess you'd have to call a
mutating Bohr bug. His program was behaving badly, and after a long day's
work, he had succeeded in coming up with a sequence of commands that would
produce the bug reliably. It was late, so he went home, confident that he
could fix the bug the next day.

The next day, the bug had disappeared; the fatal sequence of commands
worked perfectly. Puzzled, he resumed work, and soon encountered another
bug. After a long day's work, he had succeeded in coming up with a
sequence of commands that would produce the bug reliably. It was late, so
he went home, confident that he could fix the bug the next day.....

Eventually, he discovered that he had an uninitialized pointer--which
reliably pointed at an address in memory that held the current day of the
week. So every day, something different happened.

Dave Seaman

Jan 14, 1994, 10:16:11 AM
In article <1994Jan13....@merlin.dev.cdx.mot.com>
le...@merlin.dev.cdx.mot.com (Lezz Giles) writes:
[...]

> try moving local variables, eliminiating or extending loops, etc. I
> removed a bug in perl this way - I never found what the bug was, but
> I found some changes that were so minimal that they didn't affect the
> functionality of the code; they just removed the bug.

You mean they removed the symptoms. The bug was still there.
You can't remove a bug without changing the functionality of the code.

--
Dave Seaman
a...@seaman.cc.purdue.edu

Steven D. Majewski

Jan 14, 1994, 1:06:12 PM
In article <0hBOjW200...@andrew.cmu.edu>,
Richard Romero <ric...@CMU.EDU> wrote:
>Excerpts from netnews.comp.unix.programmer: 13-Jan-94 Re: 'Heisenbugs'
>Ben La...@vgdata.demon.co (1502)
>
>> Usually,it's easy to test a theory by embedding prints into the code to
>> show the state of variables.
>
>Unfortunately, even this can cause Heisenbugs to go away... I've had one
>where the prints caused enough I/O + X windows overhead to slow down the
>program enough so that the bugs disappeared...
>

There are also optimizer-related(*) bugs that will disappear either when
the code is compiled with different optimization for use with the debugger,
OR when use of the variable in a print statement causes a different
allocation for the variable.

(*) I say "optimizer related" because: in the old days, I used to find
many optimizer CAUSED bugs; compilers are getting better these days -
most cases I run across now are actually real bugs that can be MASKED
by the optimizer. ( Or else are normally hidden and are revealed by
the optimizer. ). So obviously one trick is to compile the code with
different optimizations and see if the bug disappears or moves around.


- Steve Majewski (804-982-0831) <sd...@Virginia.EDU>
- UVA Department of Molecular Physiology and Biological Physics

IanMaclure

Jan 14, 1994, 3:32:40 PM
jsl...@ncar.ucar.edu (John Sloan) writes:

>From article <2h53p0...@srvr1.engin.umich.edu>, by j...@engin.umich.edu (John Murray):
>> - after days of hunting - that you can turn a bug on and off by alternately

[Snip]

>with stuff as mundane as signals and sockets in C/UNIX can turn one up
>once in a while. But I've also seen it in a plain old FORTRAN numeric
>code using the old IBM "H" compiler (the very same compiler that I once
>had to peruse the assembler output to find that the generated code was
>stepping on its own base register; man, I _hated_ the H compiler). Four

^^^^^^^^^^^^^^^^^^^^^^^^

[Snip]

Yes, but I'll bet you now know one heck of a lot more about what goes
on at that level than you would have otherwise. I hated F, G, H, and
H Extended but when, a decade or so later, I found myself embroiled
in assembler work I realized how much I had actually learned from all
the pain and suffering.

IBM
--
################ No Times Like The Maritimes, Eh! ######################
# IBM aka # ian_m...@QMGATE.arc.nasa.gov (desk) #
# Ian B MacLure # maclure@(remulac/eos).arc.nasa.gov (currently) #
########## Opinions expressed here are mine, mine, mine. ###############

Kevin Schnitzius

Jan 14, 1994, 6:27:10 PM
r...@imagen.com (Robert Lerche) writes:

>pes...@arc.umn.edu (Ed Peschko) writes:

>>I just read today in an article on distributed programming that there are a
>>class of bugs nicknamed 'Heisenbugs': bugs that (they said) are due to the
>>'timing dependance of machines, and which go away when debugging or tracing
>>is turned on.'

>>Does anyone have sources of articles on how to deal with these horrid beasts?
>>Or any advice on the side on how to deal with them?

>Ah, debugging. Here are some rules of thumb.

>1. Distrust everything

> - your own code
> - the compiler
> - the run time library
> - the operating system

Ayeeeeeeeee!

You would trust the hardware?

I wasted a couple weeks on an intermittant parity-causing cable, but
that wasn't a real Heisenbug. I have seen cases that ran when debugged
which turned out to be "undocumented features" of the hardware.
--
Kevin Schnitzius
kschn...@encore.com

Chris Torek

Jan 15, 1994, 6:23:54 PM
In article <ral.758483249@panda> r...@imagen.com (Robert Lerche) writes:
>>Ah, debugging. Here are some rules of thumb.
>>1. Distrust everything ...

In article <CJn7t...@encore.com> ksch...@encore.com
(Kevin Schnitzius) writes:
>You would trust the hardware?

Indeed, hardware is one of the `everything's not to be trusted in
tracking the most stubborn bugs.

Another approach is to use `ESP': Exaggeration of System Parameters.
When looking at some piece of code, imagine that some (normally
slow) operation happens at the very first possible instant, or that
some (normally rapid) operation decides to take several whole
*seconds*. This will help prevent bugs exemplified by, e.g.:

alarm(1);
signal(SIGALRM, catch);

(the two calls here are in the wrong order) and:

signal(SIGALRM, catch);
alarm(1);
pause();

(this code races between the alarm(1) and pause() call).
--
In-Real-Life: Chris Torek, Berkeley Software Design Inc (+1 510 549 1145)
Berkeley, CA Domain: to...@bsdi.com

Michael Hanlon

Jan 14, 1994, 1:49:14 PM
In article <0hBOjW200...@andrew.cmu.edu> Richard Romero <ric...@CMU.EDU> writes:
>From: Richard Romero <ric...@CMU.EDU>
>Subject: Re: 'Heisenbugs'
>Date: Thu, 13 Jan 1994 15:15:30 -0500

>Excerpts from netnews.comp.unix.programmer: 13-Jan-94 Re: 'Heisenbugs'
>Ben La...@vgdata.demon.co (1502)

>> Usually,it's easy to test a theory by embedding prints into the code to
>> show the state of variables.

>Unfortunately, even this can cause Heisenbugs to go away... I've had one
>where the prints caused enough I/O + X windows overhead to slow down the
>program enough so that the bugs disappeared...

[...useful alternative deleted...]
>-rick romero

When outputting to a file or screen won't work, I usually write "records" to a
large (say, 64K) buffer, appending as I go. Then after the program has hit a
breakpoint which I have theorized might be interesting, I examine the buffer's
contents using the debugger's dump command. It is easy to make the buffer
wrap around if 64K isn't big enough, especially if the records you write to
the buffer are fixed length and, better yet, have a length which is a power
of two (say 16, 32, ...). The advantages of this technique (writing to the
buffer) are that it is fast, involves no OS calls, and if your environment is
non-preemptive (not many of those anymore) happens in "no time" at all, since
no other process can execute during the write to the buffer.
Michael Hanlon Phone: 512/346-8380
Internet: mha...@novell.com MHS: mhanlon@novell
Please use ^^^^^^^^^^^^^^^^ if responding to me by mail.
Disclaimer: These are my views, I do not speak for Novell.

Rudi Vankemmel

Jan 17, 1994, 7:07:24 AM
H. Anders Lonnemo (1736...@MSU.EDU) wrote:

: In article <1994Jan13....@merlin.dev.cdx.mot.com>, le...@merlin.dev.cdx.mot.com (Lezz Giles) says:
: >
: >In article <1994Jan13....@imec.be>, vank...@imec.be (Rudi Vankemmel) writes:
: >|>Ed Peschko (pes...@arc.umn.edu) wrote:
: >|>: hello --
: >|>
: >|>: I just read today in an article on distributed programming that there are a
: >|>: class of bugs nicknamed 'Heisenbugs': bugs that (they said) are due to the
: >|>: 'timing dependance of machines, and which go away when debugging or tracing
: >|>: is turned on.'
: >|>

[ stuff deleted ]

and I wrote:
: >|>
: >|>I'm interested in this too. I had a couple of bugs I could solve
: >|>by inserting very small delays in my code (using sleep() or local wait
: >|>loops). However I don't like this idea. If you have responses can you
: >|>summarise on the net ?

: >|>
: >
: >One approach that I have found works for most bugs, and which has worked
: >for some Heisenbugs, is to find the smallest possible change in the
: >source code that causes the bug to appear/disappear. In the case

[stuff deleted]

: Just wondering, what was your confidence level that you had actually
: removed the bug?

Well, this is exactly what bothers me, and why I don't like playing around
with those small delays in order to get it working: you're never sure
that it will work again on, for example, a slightly faster machine. I
remember having an X application which worked fine when the host was not
a net router but failed when the host was configured as a net router.
Now this is VERY cumbersome if your code must be ported.

Richard van de Stadt

Jan 17, 1994, 7:57:53 AM

In article <ral.758483249@panda>, r...@imagen.com (Robert Lerche) writes:
|> pes...@arc.umn.edu (Ed Peschko) writes:
|>
|> >I just read today in an article on distributed programming that there are a
|> >class of bugs nicknamed 'Heisenbugs': bugs that (they said) are due to the
|> >'timing dependance of machines, and which go away when debugging or tracing
|> >is turned on.'
|>
|> ...
|>
|> >Does anyone have sources of articles on how to deal with these horrid beasts?
|> >Or any advice on the side on how to deal with them?
|>
|> Ah, debugging. Here are some rules of thumb.
|>
|> 1. Distrust everything
|>
|> - your own code
|> - the compiler
|> - the run time library
|> - the operating system
So, what do you do then? Go home and eat peanuts? :-)

|> 2. Get and use appropriate tools
|>
|> These days powerful debuggers are available and many processors
|> have some kind of hardware break point capability. One of the
|> most powerful tools for tracking down a "Heisenbug" is a memory
|> store break point.

I've got good experience with the tool 'Purify', which intercepts and
checks *every* memory access. In C, for instance, memory can be f***ed up
by some function and then at the strangest, preferably non-predictable,
moments another function can suffer from this. Purify (I don't work for
them, by the way) would detect that a function is accessing memory that
it should not touch. Purify reports exactly what causes the error.

Besides my own code, I've also 'purified' others' code, and just by
giving Purify's report to the implementor he could correct his code.

Just my 1,000.00 UK Pounds (that's what 4 'simple licenses' cost us
for educational purposes a year ago; normally it costs 2,295.00 UK Pounds.
I believe the price has gone up a little since, but it's well worth
the money.)

Richard
--
/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\
\ Richard van de Stadt /
/ Email: st...@cs.utwente.nl \
\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/
------
Typos in this message are caused by misinterpretation of my keystrokes
by the editor.

Jon Jagger

Jan 17, 1994, 11:04:05 AM
Hi,

I once heard (from a Hewlett Packard engineer) about a program that only
failed on the second Wednesday of the month at around 2pm.

It took a LONG time to suss it out. The 'puter the program ran on was in a
tall building near a window facing the docks (it was at a seaside town). Every
second Wednesday of the month the same ship would dock in the dock. This was a
big ship, with powerful radar, and that was what caused it.

I was told that it was definitely true, but even if it isn't, it's a nice
story.

Cheers
JJ

Paul Andrews

Jan 17, 1994, 7:31:52 AM
>...!uunet!melanie!peter -- Public Access Linux -- (USA)803/271-0688


Yup. Unless you are writing interrupt-driven code (i.e. multi-threaded), most such
bugs come down to:

1. Uninitialized data.
2. Writing past array bounds.
3. Using data when it has ceased to be valid (e.g. using allocated
memory after it has been free'd).

Use Purify. It traps these sorts of things.

P.S. I don't work for Pure Software.

---
,,,
(o o)
+---------------oOO--(_)--OOo------------------------------------+
| Paul Andrews m...@london.sbi.com | |
| +44-71-721-2261 | |
+----------------------------------------------------------------+


William Chesters

Jan 17, 1994, 1:26:15 PM
In article <J.R.Jagger....@shu.ac.uk>, J.R.J...@shu.ac.uk (Jon Jagger) writes:
#
# It took a LONG time to suss it out. The 'puter the program ran on was in a
# tall building near a window facing the docks (it was at a seaside town). Every
# second Wednesday of the month the same ship would dock in the dock. This was a
# big ship, with powerful radar, and that was what caused it.

My father tells a story (corroborated by one of his then colleagues)
of making a PDP-something-based controller for a bakery near Heathrow
airport. They had regular crashes which resulted in the whole plant
filling up with over-sticky dough and having to be taken to bits and
cleaned. I must ask him *how* they found out it was the radar ...

--
William Chesters (will...@aifh.ed.ac.uk)
Computer vision? I'll believe that when it sees me.

William Chesters

Jan 17, 1994, 3:48:33 PM
In article <CJrxH...@meteor.sbi.com>, m...@london.sbi.com (Paul Andrews)
writes:
# most bugs come down to:
#
# 1. Uninitialized data.
# 2. Writing past array bounds.
# 3. Using data when it has ceased to be valid (e.g. using allocated
# memory after it has been free'd).
#
# Use purify. It traps these sorts of things.

Alternatively, don't use C{++} in the first place. There are
languages out there that don't make you think about pointers,
do have array bounds checking, don't have uninitialized data,
and do go fast enough for 95% of apps.

Incidentally, they tend to be much nicer languages to program
in as well.

James Youngman

Jan 17, 1994, 9:21:23 PM
In article <CJML2...@mentor.cc.purdue.edu> a...@seaman.cc.purdue.edu (Dave Seaman) writes:

[ Shifting code about without changing the meaning to remove a bug ]

>You mean they removed the symptoms. The bug was still there.
>You can't remove a bug without changing the functionality of the code.

Unless you're up against a bug in your compiler...


Regards,
James Youngman mbc...@hpc.ph.man.ac.uk
RA, SToMP group, Physics Dept, Manchester University (061) 232 9818
The trouble with the rat-race is that even if you win, you're still a rat.
--------------------------------------------------------------------------

James C. Benz

Jan 18, 1994, 8:00:17 AM
In article 0010...@shu.ac.uk, J.R.J...@shu.ac.uk (Jon Jagger) writes:

> It took a LONG time to suss it it out. The 'puter the program ran on was in a
> tall building near a Window facing the docks (it was at a seaside town). Every
> second Wednesday of the month the same ship would dock in the dock. This was a
> big ship, with powerful radar, and that was what caused it.
>

I can't top that, but in the trucking company where I once worked, they complained
that "every once in a while I see a bright flash and the terminal starts printing
Greek letters" - turned out there was an arc welder on the other side of the wall
plugged into the same circuit as the terminal.

Chris Fulmer

Jan 18, 1994, 9:54:15 AM
HA!

This thread is getting a little funny. I'm not trying
to bust on anybody, but what apparently has happened is
that somebody mentioned that he was reading an article on
distributed systems that made mention of "Heisenbugs,"
which are essentially non-determanistic bugs. You run
the program once, and it works. The second time it
doesn't. The article was specifically referring to bugs
in distributed systems, which can exhibit non-deterministic
properties, due to the fact that the system runs on networked
processors, and not on a single processor.

From there, everybody started thinking about this new class
of bug, the "Heisenbug," suddenly taking seriously what the
original author had only meant in jest.

And, now, I'm going to make it worse....

There are mechanisms for tracing bugs of this nature in
parallel systems, which can be extended to distributed
systems..... K.C. Tai at NC State developed a method in
which you determine the interesting events on each node,
then keep track of the global ordering of events in the
system. Then, when you find a bug, you make sure that the
same global order of events happens when you try to debug
it. In fact, Tai developed a big chunk of Ada code to do
that sort of thing on a parallel system, using another
process to record and serialize events. The main
events in question were the sending and receiving of
messages.

The same sort of thing should be extensible to distributed
systems, but I don't think it'd work in the presence of
partial failure, such as node or network failure. You'd
need to write a network serializer process that kept track
of the order of incoming events, and you'd need to put
front-ends around your send() and receive() calls, so that
they wait until the serializer tells them to go ahead.


--
------------------------------------------------------
Chris Fulmer Bell-Northern Research
Sr. Software Designer Research Triangle, NC
mail to woodstock!ch...@mcnc.org (preferable) or...
chr...@bnr.ca (not so preferable)
Oh yeah. The usual disclaimers apply here.

Kevin D. Quitt

unread,
Jan 18, 1994, 6:37:06 PM1/18/94
to
Thus wrote pes...@arc.umn.edu (Ed Peschko):

>Or any advice on the side on how to deal with them?

Same way as any other bug. Try to find the smallest change required to tickle
or prevent the bug. Example:

Writing assembly language on the 6502, using NOPs at locations where I
could use the debugging hardware to break. Code worked perfectly,
until I removed a NOP. Replace NOP, code works. Problem was traced
to a bug in the 6502 internal logic. Rockwell's response to my call
pointing out the bug (after the "Oh, my God"), was "we'll have a fix
for you tomorrow". Their fix was an assembler error if the code you
wrote would create the error. Not satisfactory, but functional.

--
#include <standard.disclaimer>
_
Kevin D Quitt 91351-4454 96.37% of all statistics are made up

Piercarlo Grandi

unread,
Jan 19, 1994, 2:01:47 PM1/19/94
to
>>> On Wed, 12 Jan 1994 23:46:23 GMT, pes...@arc.umn.edu (Ed Peschko) said:

Ed> I just read today in an article on distributed programming that
Ed> there are a class of bugs nicknamed 'Heisenbugs': bugs that (they
Ed> said) are due to the 'timing dependence of machines, and which go
Ed> away when debugging or tracing is turned on.'

Ed> They go on to say that 'They are among the most frustrating bugs to
Ed> find' but give no actual advice on HOW TO FIND THEM.

Well, I think, and here is one of my pet/holy peeves, you must first
start to change your thinking *radically*.

The very name "bug" leads you into a completely wrong state of mind:

Ed> Does anyone have sources of articles on how to deal with these
Ed> horrid beasts?

Very anthropomorphic. But programs and computers are not. Bugs are not
little squishy animals that creep into your code until you find and
exterminate them. Calling them "bugs" is a cop out.

I wish everybody used the right word, _mistake_.

Once one starts using the right word, the frame of mind shifts
considerably.

How do I find where I have made mistakes in my program?
How do I find where I have made time-dependent mistakes in my program?

These are good questions, and the answer then immediately springs
to mind:

A mistake is where you either violated some assumption, or where you
actually implemented a different function from the required one.

So discovering and correcting mistakes involves two activities:

1) checking all points where you have made assumptions.
2) verifying that each block of code implements the required function.

This involves nothing harder than liberal sprinkling of 'assert' calls
in your code.

You don't need to put in 'assert' calls everywhere; from the nature of
the problem you can deduce which assumptions are probably being
violated, and/or which functions have not been implemented.

The mistakes that are hardest to discover are violating assumptions of
the underlying language/runtime implementation. They are hard to find
because to deduce where they are you would need to instrument with
'assert' the code of the underlying implementation.

For example, you might have written in C:

static char *temp;
static int linelen;
...
temp = malloc(8);
...
strcpy(temp,"12345678");

then, in most implementations, depending on whether the underlying CPU
architecture is little or big endian, 'linelen' will have its least or most
significant byte set to zero; odds are that if its msB is the one that
gets affected you will never realize it, until you find a line longer
than 16 million characters.

Even worse

{
char temp[8];
...
strcpy(temp,"12345678");
...
}

This will cause stack corruption that will be noticed on a big endian machine
only if the top word of the stack frame contains a pointer to an
address greater than 2^24.

Well, the short observation to be made here is that in languages like C
it is damn hard to verify that no underlying constraint of the
implementation is being violated.

In that case tracing and other animation of the program will be
helpful. As another poster observed, looking at generated code is often
required too, as often the implementation itself contains bugs.

Finally, as to time-dependent mistakes, they are almost invariably the
result of violating some assumption on the time ordering of events. It
helps considerably :-) to make sure you _know_ what assumptions you
wrote into your program as to which partial ordering of events is
expected to happen, and 'assert' them too.

As soon as the unanticipated order happens, the 'assert's will catch it.

If introducing the 'assert' themselves changes the spectrum of possible
event orderings, such that the unanticipated ones no longer happen, the
right way to discover the mistake is to make sure you know where all
intended orderings happen, and you verify that only those _can_ happen,
and that they are a subset of the orderings expected in the rest of the
program.

This requires actually _understanding_ the logic you have embedded in
your code, instead of more or less throwing it together and then
"hunting for bugs", but it is invariably quicker than the latter, and
rather more interesting. It also as a rule leads immediately, once the
mistake has been discovered, to its correction, as the violated
assumption normally indicates the right code to substitute.

"Heisenbug" is a nice sound bite; too bad, for all its being just a
word, that it obscures rather than help clarify the issue.

Scott Simpson

unread,
Jan 19, 1994, 4:55:01 PM1/19/94
to
In article <PCG.94Ja...@decb.aber.ac.uk> p...@aber.ac.uk (Piercarlo Grandi) writes:
> Very anthropomorphic. But programs and computers are not. Bugs are not
> little squishy animals that creep into your code until you find and
> exterminate them. Calling them "bugs" is a cop out.
>
> I wish everybody used the right word, _mistake_.

I think you are being a little pedantic here. So people call them bugs
instead of errors? So what. This is not a capital offense and using
misnomers is not unknown in the English language. Get off your soapbox.

> Once one starts using the right word, the frame of mind shifts
> considerably.
>
> How do I find where I have made mistakes in my program?
> How do I find where I have made time-dependent mistakes in my program?
>
> These are good questions, and the answer then immediately springs
> to mind:
>
> A mistake is where you either violated some assumption, or where you
> actually implemented a different function from the required one.
>
> So discovering and correcting mistakes involves two activities:
>
> 1) checking all points where you have made assumptions.
> 2) verifying that each block of code implements the required function.
>
> This involves nothing harder than liberal sprinkling of 'assert' calls
> in your code.

Boy, it sounds so simple when you state it like that. Just sprinkle
some assert calls. This reminds me of George Polya's famous book "How
to Solve It" which was real popular when it came out in the 40s or 50s.
It basically gave a number of steps like 1) Get a good specification
of the problem 2) Look at other implementations 3) Construct an
implementation. (I'm paraphrasing.) This is all well and good but it
didn't give you any more strategies than people had been using all
along. There is no silver bullet.

Kevin D. Quitt

unread,
Jan 19, 1994, 2:12:05 PM1/19/94
to
Thus wrote pes...@arc.umn.edu (Ed Peschko):
>I just read today in an article on distributed programming that there are a
>class of bugs nicknamed 'Heisenbugs': bugs that (they said) are due to the
>'timing dependence of machines, and which go away when debugging or tracing
>is turned on.'

There are analogous situations with hardware, whereby the application
of a scope probe is enough to prevent the problem. In that case,
however, the solution is simple: you check the capacitance of the
probe (typically 10-15pF) and put the equivalent capacitor across the
contact point and ground. It's ugly, but it works.

Ben Last

unread,
Jan 20, 1994, 3:38:38 AM1/20/94
to

>Well, I think, and here is one of my pet/holy peeves, you must first
>start to change your thinking *radically*.
>
>The very name "bug" leads you into a completely wrong state of mind:

>I wish everybody used the right word, _mistake_.

Here we go again, the old "call them defects and all will be well" heresy.
Let's clarify this; "bug" leads *you* into a wrong state of mind. Some of
us know perfectly well that there are no little creatures in the code.

>Once one starts using the right word, the frame of mind shifts
>considerably.

> How do I find where I have made mistakes in my program?
> How do I find where I have made time-dependent mistakes in my program?

How about time-dependent mistakes in the operating system? *I* didn't
make them, and, in fact, they may not be mistakes on anyone's part, just
previously undocumented interactions between functioning systems. In that
case, "mistake" is inappropriate.

>This involves nothing harder than liberal sprinkling of 'assert' calls
>in your code.

Oh, so that's all we need to do. That means that, at runtime, on site,
when the code goes wrong it'll exit with an assert message rather than
crashing with a stack dump. I'm sure the average user will be much happier
with that.

>You don't need to put in 'assert' calls everywhere; from the nature of
>the problem you can deduce which assumptions are probably being
>violated, and/or which functions have not been implemented.

Yeah, and the nature of heisenbugs is such that there will always be one
where adding the assert call makes the bug disappear.

[munched a lot of stuff about classic C bugs, none of which are heisenbugs]

>"Heisenbug" is a nice sound bite; too bad, for all its being just a
>word, that it obscures rather than help clarify the issue.

"Object", "structured" and "defect" are also words. The meaning of words
is context dependent; the word "while" has a different (but related) meaning
in C than in English. Understanding the semantics, as you point out, is what
matters. Debates about the associations of words don't help.

--
--------------------------------------------------------------------------
| Ben Last, Fisons Instruments | (All postings are personal, and not the |
| b...@vgdata.demon.co.uk | opinions of Fisons Instruments) |
--------------------------------------------------------------------------

Alan Dyke

unread,
Jan 20, 1994, 4:39:08 AM1/20/94
to
sim...@confucius.tfs.com (Scott Simpson) wrote:

>In article <PCG.94Ja...@decb.aber.ac.uk> p...@aber.ac.uk (Piercarlo Grandi) writes:
>> I wish everybody used the right word, _mistake_.

>> A mistake is where you either violated some assumption, or where you
>> actually implemented a different function from the required one.
>>
>> So discovering and correcting mistakes involves two activities:
>>
>> 1) checking all points where you have made assumptions.
>> 2) verifying that each block of code implements the required function.
>>
>> This involves nothing harder than liberal sprinkling of 'assert' calls
>> in your code.
>
>Boy, it sounds so simple when you state it like that. Just sprinkle
>some assert calls. This reminds me of George Polya's famous book "How
>to Solve It" which was real popular when it came out in the 40s or 50s.
>It basically gave a number of steps like 1) Get a good specification
>of the problem 2) Look at other implementations 3) Construct an
>implementation. (I'm paraphrasing.) This is all well and good but it
>didn't give you any more strategies than people had been using all
>along. There is no silver bullet.

There might be no silver bullet, but one approach Brooks does recommend
in his paper `No Silver Bullet' is recognising and nurturing skilled
professionals. To that end, Piercarlo's comments, which seem to
recommend the adoption of some professionalism, appear to be entirely in
order. It might not be a `Silver Bullet', but the author of the
original paper considers it the way forward and I concur.

,-----------------------------------------------------------------------------.
|Alan Dyke cac9...@cs.paisley.ac.uk | That is the problem with you liberal|
|Department of Computing and Information| academics, you can see every ones |
|Science, Paisley University, Scotland. | point of view and have none of your |
|#include <std_disclaimer> | own - Thatcher. Damn Right - Me. |
`-----------------------------------------------------------------------------'

m...@mole-end.matawan.nj.us

unread,
Jan 20, 1994, 3:19:43 AM1/20/94
to
In article <PCG.94Ja...@decb.aber.ac.uk>, p...@aber.ac.uk (Piercarlo Grandi) writes:
> >>> On Wed, 12 Jan 1994 23:46:23 GMT, pes...@arc.umn.edu (Ed Peschko) said:

> Ed> I just read today in an article on distributed programming that
> Ed> there are a class of bugs nicknamed 'Heisenbugs': ...


> The very name "bug" leads you into a completely wrong state of mind:

> Ed> Does anyone have sources of articles on how to deal with these
> Ed> horrid beasts?

> Very antropomorphic. Bug programs and computers are not. ...
...

> I wish everybody used the right word, _mistake_.
...

> How do I find where I have made mistakes in my program?
> How do I find where I have made time-dependent mistakes in my program?
> ... and the answer then immediately springs to mind:

> A mistake is where you either violated some assumption, or where you
> actually implemented a different function from the required one.
> So discovering and correcting mistakes involves two activities:

> 1) checking all points where you have made assumptions.
> 2) verifying that each block of code implements the required function.

In practical and actual fact, this tends to be an uneconomical way to solve
the problem.

If a suspicious character is seen in Jones' yard, you don't comb the city
examining everybody you find, determining if they are suspicious and then
determining if they might have been in Jones' yard. You go first to the
yard of Jones. Nor, if someone has been stealing turnips from Jones'
vegetable patch, do you intern everyone in the city until the day's meal
has passed from their digestive tract so you can examine it for traces of
turnip.

Rounding up the usual suspects is an expensive proposition.

...
> ... from the nature of the problem you can deduce which assumptions are
> probably being violated, and/or which functions have not been implemented.

Ah, so now you are back to viewing the _symptoms_ of the problem. Yes, we
must consider the symptoms and the _behavior_ exhibited by the symptoms.

...


> "Heisenbug" is a nice sound bite; too bad, for all its being just a
> word, that it obscures rather than help clarify the issue.

That depends. It's not a good description of the nature of the cause of
the observed symptoms, but it's a striking--and therefore effective--
description of the behavior of the symptoms themselves.

To quote Stroustrup, ``Design and programming are human activities;
forget that and all is lost.''

I'll admit Heisenbugs and all the others, too, so long as admitting them
tells me something about them.
--
(This man's opinions are his own.)
From mole-end Mark Terribile
m...@mole-end.matawan.nj.us, Somewhere in Matawan, NJ
(Training and consulting in C, C++, UNIX, etc.)

DJ MACQUARRIE

unread,
Jan 20, 1994, 11:55:33 AM1/20/94
to

: >Well, I think, and here is one of my pet/holy peeves, you must first
: >start to change your thinking *radically*.
: >
: >The very name "bug" leads you into a completely wrong state of mind:
: >I wish everybody used the right word, _mistake_.

I'll make you a deal: I'll stop using "bug" when all you touch-tone users
out there stop using "dial the number".

Do you use "run" for "execute"? "Crash" for "abnormal program termination"?
If so, perhaps you're being led into a wrong state of mind? :^)

Just my $0.0158 worth (0.02 Canadian)....

Don

...coming soon: Borland Turbo DeMistaker for Windows!

Kevin A. Archie

unread,
Jan 20, 1994, 12:41:21 PM1/20/94
to
p...@aber.ac.uk (Piercarlo Grandi) writes:

>>>> On Wed, 12 Jan 1994 23:46:23 GMT, pes...@arc.umn.edu (Ed Peschko) said:

>Ed> I just read today in an article on distributed programming that
>Ed> there are a class of bugs nicknamed 'Heisenbugs': bugs that (they
>Ed> said) are due to the 'timing dependence of machines, and which go
>Ed> away when debugging or tracing is turned on.'

>Ed> They go on to say that 'They are among the most frustrating bugs to
>Ed> find' but give no actual advice on HOW TO FIND THEM.

>Well, I think, and here is one of my pet/holy peeves, you must first
>start to change your thinking *radically*.

>The very name "bug" leads you into a completely wrong state of mind:

Call them Programmer-Induced Unintentional Functionality Enhancements.
See? No more bugs.

> [Advocacy of formal methods and assert() macro deleted...]

That's it! Just prove your programs correct, and there will be
no bugs in them. Why, I prove all of my programs correct. Watch
for my provably correct word processor, which should be available in
the mid 23rd century.

Okay, that's unfair. I'm actually a fan of formal methods, and I
make extensive use of the assert() macro. I find that structuring
my work this way helps me to think critically about what I'm doing,
and that I end up saving time in the long run if I spend a little
extra time being pedantic when I start. But given (a) limited
resources, (b) development deadlines, and (c) limited attention span,
bugs inevitably creep in.


Heisenbugs are not really restricted to distributed systems.
I've seen a number of crashes that went away when the program
was run in the debugger. Approaches that I've generally found
useful:

1. Recognize this as a variant of another bug you fixed a while ago.
(This is the big one. The only way to get better at programming
is to program. There is no substitute for experience.)

2. Use lint. No, really, I mean it. If you write your code with
the intent of running lint on it every so often, it's not hard to
write it so that it generates a minimum of meaningless messages.
(Obviously, you can replace 'lint' with 'the verbose warning
message-generating mode of your compiler'.)

3. Find someone else to look at your code, since you've probably
been staring at it too long and have convinced yourself it's
correct. This works best if the person you consult can recognize
it as a bug she fixed a while ago - see approach (1) above.

Other things I often do include: checking the values of all
pointers around the problem area of code (the really elusive
bugs are rather often due to subtle corruption of the stack),
building the program on a different platform or with a different
compiler (not because you suspect the OS or compiler is at fault,
but rather because Heisenbugs often manifest themselves differently
on different platforms), and looking at the generated assembler
to see if the code looks reasonable (you don't really need to know
the assembler of the underlying machine, you just need a general
sense for what assembler looks like.)

Hope some of this proves useful.

- Kevin

lor...@fnalo.fnal.gov

unread,
Jan 20, 1994, 4:43:46 PM1/20/94
to
In article <2hmfo1$b...@gap.cco.caltech.edu>, kar...@cco.caltech.edu (Kevin A. Archie) writes:
> That's it! Just prove your programs correct, and there will be
> no bugs in them. Why, I prove all of my programs correct. Watch
> for my provably correct word processor, which should be available in
> the mid 23rd century.

Wasn't it D. E. Knuth who said something like "I have proved that
program correct, but I did not actually run it - so, beware of
bugs"? Does somebody remember the correct quotation?

From Italy, Maurizio

Heinz Wrobel

unread,
Jan 20, 1994, 4:09:02 PM1/20/94
to
Piercarlo Grandi (p...@aber.ac.uk) wrote:
: I wish everybody used the right word, _mistake_.

Most but not all Heisenbugs are caused by somebody's mistakes; some are
caused by e.g. dying HW, which can't be referred to as somebody's mistake.

I find the term "bug" not that inappropriate unless you do scientific
research on the topic. It groups in a humorous way all the problems one
has to cope with and hits right on the mark in most cases. After all, you
say "car" and not "vehicle with usually four wheels and often a roof,
powered typically by an internal combustion engine, steering wheel on the
left side unless British and not exported."

What do you do: "bug hunting" or "correcting mistakes"? Let's face it, the
second expression is often a euphemism. But that is a different topic.

: How do I find where I have made mistakes in my program?
                      ^
: How do I find where I have made time-dependent mistakes in my program?
                      ^

Hmm. While nobody is perfect, I find this rather narrow-minded. Except for
pointer mess-ups in C on my side, the Heisenbugs I've come across were
hardly ever caused by _me_. If you mean by saying "mistakes" only "your own
mistakes", you are limiting your posting in an extreme way. Bugs of the OS,
compiler, library, and the HW have some part in it.

: So discovering and correcting mistakes involves two activities:

: 1) checking all points where you have made assumptions.
: 2) verifying that each block of code implements the required function.

Well, this might not be that easy. The very nature of Heisenbugs (I really
like that term :-) is that they may _or_may_not_ appear. How can you test,
then, that a block of code does the right thing? You can't, because the bug
might appear or it might not appear. So who is to say that some code is ok?
The code might be ok, but the compiler might produce trash only under
certain environmental circumstances. Or the linker messes up because of
???. Or two bits of your RAM are defective and neither your standard system
check nor the HW parity check finds it for some reason. Have fun.

: This involves nothing harder than liberal sprinkling of 'assert' calls
: in your code.

Well, this reminds me of the saying that you can debug everything by
inserting printf's (to stay with C terminology). I find this rather
simplistic and a dangerous attitude, because it sounds like you are saying:
"Hey, I use assert calls. Nothing can ever happen to me." Surely, you don't
want to give that impression.

: You don't need to put in 'assert' calls everywhere; from the nature of
                                                               ^^^^^^^^^
: the problem you can deduce which assumptions are probably being
  ^^^^^^^^^^^
: violated, and/or which functions have not been implemented.

The very term Heisenbug tells you something. There might not be a "nature
of the problem". If there were, it wouldn't be a Heisenbug, but a simple
mistake to find and correct.

Example 1: Pointer mess-up in C

Once upon a time ;^) I did a MS-DOS piece of commercial SW that did
"lots" of things. It had a nice standard GUI, the application code,
supporting code like file requester code based on the GUI engine etc.
The only problem with it was that it crashed _once_in_a_while_ in
random places. I never found any reproducible example. But I found the
bug. It all came down to one random byte of memory being trashed by a
messed-up pointer in the file requester code. But the effect of this
trashed byte did not show up immediately.

Due to the event-based GUI engine code there was no way to have a
reproducible example because there was no reasonable way to record and
reproduce the same events in the same time frame with the same clock
values and the same stack frames and the same memory usage and by this
the same trashed byte ... For this bug not even a memory allocation and
pointer overrun checker library helped, because the messed-up pointer
could have hit anything in valid or invalid areas. An 'assert'-like
call might or might not help depending on which byte is trashed.


My conclusion: "crashes/misbehaves once in a while in random places" is a
problem, but it tells you nothing valuable about the "nature
of the problem". A typical Heisenbug.


: The mistakes that are hardest to discover are violating assumptions of
: the underlying language/runtime implementation. They are hard to find

Depends. Below you describe two cases of pointer overruns in C. These are
actually rather easy to avoid and mostly easy to find. The typical case of
the "start with 0 or 1" or "n + 1" problem is found by memory overrun
checker libs or tools. And nowadays they come with pretty much any
professional compiler for a language where lots of things are done with
pointers, like C. On my system at home the standard tools I use constantly
are called "Enforcer" and "Mungwall". They check memory accesses to
obviously illegal areas and pointer overruns by placing a "wall" area
around any allocation that can be tested. And they "munge" uninitialized
and freed memory to make wrong assumptions about memory contents visible.
On other systems they have other names and similar functionality. But they
are not a 100% cure, just more help. I haven't had an uncaught pointer
overrun for quite some time now.

: For example, you might have written in C:

: static char *temp;
: static int linelen;
: ...
: temp = malloc(8);
: ...
: strcpy(temp,"12345678");

: then, in most implementations, depending on whether the underlying CPU
: architecture is little or big endian, 'linelen' will have its least or most
: significant byte set to zero; odds are that if its msB is the one that

Nope. Not linelen but the memory beyond the allocation. And depending on
the heap allocation system you might never even see this problem if e.g.
all allocations are in multiples of 32 bytes no matter what the requested
size is. A memcheck lib to replace malloc with a function that does a
"wall"-type check will immediately point you to this error as soon as you
run the actual overrun check before exiting the debugged application. No
true Heisenbug.

: Even worse

: Will cause stack corruption that will be noticed on a big endian machine
: only if the top word of stack frame contains a pointer to an
: address greater than 2^24.

This is of course true. Pointer overruns in stack frames can be nasty and
will happen to anyone once in a while (assuming a pointer-oriented
language). But as this kind of bug _always_ trashes that one byte it will
not be that hard to locate. As it is always the same byte of the same
variable or address relative in the stack frame that is trashed, you will
pretty soon find the cause. It is usually not manifesting itself as a
randomly occurring problem, so I wouldn't call this type of mistake a
typical cause of Heisenbugs.

The "n+1" or "trailing 0 byte" problem is probably the best known problem
for C. It still happens once in a while that one forgets about it, but if you
overlook this constantly, you haven't done your homework. Every time I write
"malloc", a little light comes up in my head ... :-)

: Well, the short observation to be made here is that in languages like C
: it is damn hard to verify that no underlying constraint of the
: implementation is being violated.

True. But not that much of a problem if you have a "defensive" attitude while
programming. At least it is now seldom a cause for Heisenbugs here. I've
learned my lesson with the example I described above. Almost any language
has its pitfalls. For C the most prominent is incorrect use of pointers.
Anybody doing serious work in any language should at least have a basic
knowledge of the common pitfalls.

: helpful. As another poster observed looking at generated code is often
: required too, as often the implementation itself contains bugs.

Sometimes even this doesn't help because the bug might be hidden somewhere
in library or compiler internals.

Example 2: What do you think of this BASIC code:

a$ = UCASE$(COMMAND$)

Not that bad, is it? What would you think if I told you that I had to
use a compiler where this code, compiled with certain options, runs exactly
26 times in a row and then the SW dies? What if this piece of code is not
used in a loop but only depending on certain events, some of which may not
be caused by the operator? Finding a reproducible case is sometimes like
finding the answer to life, the universe, and everything. BTW, the
workaround was a

temp$ = COMMAND$
a$ = UCASE$(temp$)

Ugly problem, isn't it? And any 'assert'-like call doesn't help a bit.

: Finally, as to time dependent mistakes, they are almost invariably the
: result of violating some assumption on the time ordering of events. It

Concentrating on the word "ordering", let me call this time ordering
problem "sync problem" for now.

Not all time-related mistakes are sync problems. Staying with simple
examples, take a home computer or IBM compatible running MS-DOS. A nice
time-related problem for applications that run in some way continuously can
be unexpected stack overruns. This does not necessarily cause a crash on
machines where the stack is not "memory protected". But it can lead to all
kinds of effects from "100% working" to "crash 30 minutes later in some
other code".

True sync problems pop up mostly when you have some kind of a state machine
in the works, and e.g. most event-based SW qualifies for this, I'd say. But
in this case debugging does not obscure the bug because it only alters the
time scale of response to your queue of events. It does not alter the order
of occurrence which causes the bug. So you will still find it by tracing.
For e.g. a combination of standard code and an interrupt handler this
sequential model does not apply if you don't "queue" the interrupt events
in some way but e.g. change a value immediately that is used by the
application code at the "same" time. This can be nasty for "multi-access
values", i.e. values that can be set or retrieved only by two or more CPU
accesses. But, frankly, anyone not using some sort of semaphore protection
for this type of stunt deserves no better IMHO.

Time-related "true" Heisenbugs are IMHO probably more in the "lost events"
or "wrong events" department which kill your state machine. If you have
this e.g. in a hard realtime environment, you can only hope that your logic
analyzer catches the circumstances somehow.

An example I've encountered was a streamer connected to a WD chip based
controller where the driver was confused to death by an incorrect event
produced by the WD chip due to a strange timing problem with the streamer.

You might even see true time-related Heisenbugs in the "slow & fat" Unix
box next door. Just think of some obscure filesystem problem which lets
your 10MB application wait forever for some file access, maybe because some
event got lost and the retry code just doesn't feel like getting out of bed
today. If this happens randomly once a week, you have a Heisenbug. But how
do you know it is _not_ your application? Only shrewd guesses while
debugging can help here.

What if a crash occurs only all Thursday nights at 10? You can tear your
hair out until you find that your neighbor turns on his 1.21 GWatt stereo
in his basement which causes a spike on the power line that the power
supply of your computer simply can't take. Another Heisenbug. And no
'assertion' or debugging whatsoever can help.

: This requires actually _understanding_ the logic you have embedded in
: your code, instead of more or less throwing it together and then

Well, shouldn't one understand the code anyway? ;^) :-)

: "Heisenbug" is a nice sound bite; too bad, for all its being just a
: word, that it obscures rather than help clarify the issue.

I don't think so. Once you have the "nature of the problem", you no longer
have a Heisenbug, because once you have it nailed down in at least a
reproducible example, it has become a "known problem" to find. As long as a
problem pops up for no obvious reason at any time, you have a Heisenbug.

And Heisenbugs (by definition?) don't have a "nature". While they are in
fact manifestations of "true" bugs, these might not have any obvious
relation to the Heisenbug effect they cause. The problem is that 99% of
the time a Heisenbug is not the "primary" problem you get to see, but a
secondary effect of a "true" bug that could be hidden a continent and a
millennium away. And for this there is IMHO no systematic cure except for
warped thinking in the testing/debugging phase and on the part of the guys
doing QA. If there were any straightforward approach, there would not be
any Heisenbugs.

Ever found the cause of a Heisenbug, or the way to nail it down, while
taking a shower or watching a movie? That is what I call warped thinking.
And unfortunately no knowledge canned in a book can help you there, other
than the constant reminder of existing "standard" mistakes and maybe a
discussion of possible effects by showing examples. There can't be more
than "the national Heisenbug awareness week". ;^)

Assertions are part of defensive programming, but there are limits to their
effectiveness. _Random_ memory trashing is perhaps the simplest example of
a problem that assertions will not catch.

--
Heinz Wrobel Edotronik GmbH: he...@edohwg.adsp.sub.org
Private Mail: he...@hwg.muc.de
My private FAX: +49 89 850 51 25, I prefer email

John William Chambless

unread,
Jan 21, 1994, 10:10:59 AM1/21/94
to
In article <SIMPSON.94...@confucius.tfs.com>,
Scott Simpson <sim...@confucius.tfs.com> wrote:
>
>Boy, it sounds so simple when you state it like that. Just sprinkle
>some assert calls. This reminds me of George Polya's famous book "How
>to Solve It" which was real popular when it came out in the 40s or 50s.
>It basically gave a number of steps like 1) Get a good specification
>of the problem 2) Look at other implementations 3) Construct an
>implementation. (I'm paraphrasing.) This is all well and good but it
>didn't give you any more strategies than people had been using all
>along. There is no silver bullet.

Yes, it's about like the famous Feynman Problem-Solving Algorithm:

Step 1: Write down the problem.
Step 2: Go away and think real hard about the problem.
Step 3: Come back and write down the answer.

The earlier poster _did_ have a valid point in a sense though.

For _some_ people, the "bug" mentality can lead to voodoo debugging;
others see things differently.

I worked in a university computing center for a couple semesters;
you'd be amazed how many students don't grasp the simple fact that
if your program doesn't do what you think it should, you probably didn't
write it correctly.

In my experience, there are two kinds of "bugs":

1: I did something stupid. ( 99% of all bugs)
2: The compiler/library/whatever writer did something stupid.

Either way, the solution is:

Analyse the problem and change MY code.
--
* Billy Chambless University of Southern Mississippi
* "If you lie to the compiler, it will get its revenge." Henry Spencer

Ivan D. Reid

unread,
Jan 21, 1994, 11:01:00 AM1/21/94
to
In article <2hora3$1r...@whale.st.usm.edu>,
cham...@whale.st.usm.edu (John William Chambless) writes...

>In my experience, there aree two kinds of "bugs":

>1: I did something stupid. ( 99% of all bugs)
>2: The compiler/library/whatever writer did something stupid.

>Either way, the solution is:

>Analyse the problem and change MY code.

I have a notice on my office wall that I paraphrased from an item in
one of Jon Bentley's books:

o Of all my programming bugs, 80% are typing errors.
o Of the remaining 20%, 80% are syntax errors.
o Of the remaining 4%, 80% are trivial logic errors.
o Of the remaining 0.8%, 80% are pointer errors.
o And the rest are bloody hard!

Ivan Reid, Paul Scherrer Institute, CH. iv...@cvax.psi.ch

Alan Christiansen

unread,
Jan 21, 1994, 10:28:21 AM1/21/94
to
Will Duquette <wi...@solstice.jpl.nasa.gov> writes:

>In article <1994Jan13.1...@adobe.com> Carl Orthlieb,
>orth...@adobe.com writes:
>>Bohr bug /bohr buhg/ from quantum physics n. A repeatable bug; one that
>>manifests reliably under a possibly unknown but well-defined set of
>>conditions. Antonym of heisenbug; see also mandelbug, schroedinbug.

>A friend of mine once encountered what I guess you'd have to call a
>mutating Bohr bug. His program was behaving badly, and after a long
>day's work, he had succeeded in coming up with a sequence of commands
>that would produce the bug reliably. It was late, so he went home,
>confident that he could fix the bug the next day.

>The next day, the bug had disappeared; the fatal sequence of commands
>worked perfectly. Puzzled, he resumed work, and soon encountered
>another bug. After a long day's work, he had succeeded in coming up
>with a sequence of commands that would produce the bug reliably. It
>was late, so he went home, confident that he could fix the bug the
>next day.....

>Eventually, he discovered that he had an uninitialized pointer--which
>reliably pointed at an address in memory that held the current day of the
>week. So every day, something different happened.

I like this so much that the programs I write will have one of these
in them on purpose. :) Then it's a feature :)

Alan


--
| This space was intentionally left blank,
| until some silly included a self descriptive
| self referential self referential self ...
| ... Stack overflow. Executing cleanup rm *.*

Eddie Wyatt

unread,
Jan 21, 1994, 4:55:29 PM1/21/94
to

An interesting type of bug I've run into in my younger days was with
multi-tasking systems where one process would fork/exec another
process and the parent would continue its normal flow of
control. When the child process exits, the parent receives a
SIGCHLD (Unix environment here) at a random time.
Sometimes it would arrive inside other system calls
that didn't like being interrupted and hence would randomly fail.


Eddie Wyatt
--


Eddie Wyatt
e...@picker.com

Lezz Giles

unread,
Jan 20, 1994, 1:05:27 PM1/20/94
to
In article <PCG.94Ja...@decb.aber.ac.uk>, p...@aber.ac.uk (Piercarlo Grandi) writes:

I'm glad other people slammed this guy too... I'm now going to add my
$.02

|>>>> On Wed, 12 Jan 1994 23:46:23 GMT, pes...@arc.umn.edu (Ed Peschko) said:
|>
|>Ed> I just read today in an article on distributed programming that
|>Ed> there are a class of bugs nicknamed 'Heisenbugs': bugs that (they
|>Ed> said) are due to the 'timing dependance of machines, and which go
|>Ed> away when debugging or tracing is turned on.'
|>
|>Ed> They go on to say that 'They are among the most frustrating bugs to
|>Ed> find' but give no actual advice on HOW TO FIND THEM.
|>
|>Well, I think, and here is one of my pet/holy peeves, you must first
|>start to change your thinking *radically*.
|>

Well, first of all I'd like to know what is wrong with this way of thinking.
I've been thinking of them as "bugs" for all my professional life, and I've
had no significant problems tracking the little bastards down and squashing
them. Would my life have been easier if I'd thought of them as "mistakes"
instead? If a way of thinking works then why change it to another way of
thinking that may work as well, but almost certainly won't work better? In
particular, your way of thinking won't work for Heisenbugs because, as other
posters have pointed out, you don't deal with Heisenbugs in the rest of your
posting - you just deal with the normal, common or garden variety of bug
(pun intended).

|>The very name "bug" leads you into a completely wrong state of mind:
|>
|>Ed> Does anyone have sources of articles on how to deal with these
|>Ed> horrid beasts?
|>
|>Very anthropomorphic. Bug programs and computers are not. Bugs are not
|>little squishy animals that creep in your code until you find and
|>exterminate them. Calling them "bugs" is a cop out.
|>
|>I wish everybody used the right word, _mistake_.
|>
|>Once one starts using the right word, the frame of mind shifts
|>considerably.
|>
|> How do I find where I have made mistakes in my program?
|> How do I find where I have made time-dependent mistakes in my program?

You missed a few.... How do I find where the compiler writer has made
mistakes? How do I find where the assembler/linker writer has made mistakes?
How do I find where the hardware manufacturer has made mistakes (who remembers
the 6502 bug which didn't handle 16-bit values that straddled the end of a
page properly?)? How do I find where the OS writer has made mistakes
(i.e. where the underlying OS doesn't follow documented behaviour, which
could also (of course) be a problem with the documentation - I had a bug in
a piece of code that I was porting to a Sun which depended on being
able to call realloc() with a NULL pointer (which, according to ANSI, should
behave like malloc()), but the Sun ANSI compiler is broken and core dumps
instead)?

|>These are good questions, and the answer then immediately springs
|>to mind:
|>
|> A mistake is where you either violated some assumption, or where you
|> actually implemented a different function from the required one.
|>
|>So discovering and correcting mistakes involves two activities:
|>
|> 1) checking all points where you have made assumptions.

Impossible. We make assumptions all the time about everything. You cannot
check _all_ points where you have made assumptions. If you do this then
you'll spend all your time checking assumptions and no time at all coding.
For example, this would mean that I should check that every POSIX function
that I call in my code is OK on the OS that I want to port my code to, when
I have a bug. It is _much_ more efficient to narrow the search down to,
for example, investigating the behaviour of just one possible culprit.

One assumption that you almost always take for granted, but which can
easily be the cause of a bug, is that the compiler does its job properly.

|> 2) verifying that each block of code implements the required function.

Impossible. I did a course at University on program verification. It took
about a hundred lines of formal derivation to prove that a 5-line program
did what we expected it to. In addition it can be proven that such a derivation
can not always be generated by a machine - there is no general algorithm to
formally verify a piece of code. In addition, before you can verify a piece
of code, you have to verify the compiler. And before you can verify the
compiler you must verify the underlying hardware/cpu. Don't laugh - people
who work for the DoD and other like-minded organizations are spending money
on trying to do this. As far as I know they haven't come close to succeeding
yet for anything more than the most trivial programs.

|>This involves nothing harder than liberal sprinkling of 'assert' calls
|>in your code.

But the nature of Heisenbugs is that the simple act of observing them
changes them. Lots of asserts are a Good Thing if (a) you can turn them
off globally, and (b) you can make damn sure that your customer will never
see an assert fire, and (c) performance isn't too important. In my most
important piece of code I have multiple levels of informationals that I can
select at run time - I can trace everything going on in my code and pinpoint
exactly where a problem arises. However this doesn't prevent Heisenbugs from
rearing their ugly heads (this particular piece of code is in Perl, which
seems to be a fertile breeding ground for this kind of bug), and as soon
as I try to find out precisely _which_line_ is causing the core dump by
adding just one more informational or even by using the debugger, the
problem disappears.

|>You don't need to put in 'assert' calls everywhere; from the nature of
|>the problem you can deduce which assumptions are probably being
|>violated, and/or which functions have not been implemented.

A sensible observation, but still not enough to catch Heisenbugs.

|>Finally, as to time dependent mistakes, they are almost invariably the
|>result of violating some assumption on the time ordering of events. It
|>helps considerably :-) to make sure you _know_ what are the assumptions
|>you wrote in your program as to which partial ordering of events is
|>expected to happen, and 'assert' them too.
|>
|>As soon as the unanticipated order happens, the 'assert's will catch it.
|>
|>If introducing the 'assert' themselves changes the spectrum of possible
|>event orderings, such that the unanticipated ones no longer happen, the
|>right way to discover the mistake is to make sure you know where all
|>intended orderings happen, and you verify that only those _can_ happen,
|>and that they are a subset of the orderings expected in the rest of the
|>program.
|>
|>This requires actually _understanding_ the logic you have embedded in
|>your code, instead of more or less throwing it together and then
|>"hunting for bugs", but it is invariably quicker than the latter, and
|>rather more interesting. It also as a rule leads immediately, once the
|>mistake has been discovered, to its correction, as the violated
|>assumption normally indicates the right code to substitute.

Have you ever programmed a real-time system? With multiple processes
just a time-slice away from each other? How about a multi-threaded
system? Or even just a simple system with a few communicating
extended FSMs? In any but the most trivial case you can't even begin
to list "all intended orderings", mainly because such orderings can
typically be defined using a grammar and the number of valid strings
that can be generated from such a grammar is so large as to be, for
all practical purposes, infinite.

|>
|>"Heisenbug" is a nice sound bite; too bad, for all its being just a
|>word, that it obscures rather than help clarify the issue.
|>

"Heisenbug" does describe a class of bug, and it is worth-while discussing
techniques and tricks for catching them.

Lezz

Craig Cockburn

unread,
Jan 21, 1994, 8:18:02 PM1/21/94
to
> In article <PCG.94Ja...@decb.aber.ac.uk>, p...@aber.ac.uk (Piercarlo
> Grandi) writes:
>
> |>Ed> Does anyone have sources of articles on how to deal with these
> |>Ed> horrid beasts?
> |>
> |>Very anthropomorphic. Bug programs and computers are not. Bugs are not
> |>little squishy animals that creep in your code until you find and
> |>exterminate them. Calling them "bugs" is a cop out.
> |>

But that's how the term originated for computer programs. A dead insect
(ie a bug) was caught in a relay causing a program to malfunction. The
term "bug" was coined by Admiral Grace Hopper.


--
Craig Cockburn, pronounced "coburn" Email: cr...@scot.demon.co.uk
M.Sc. Student, Napier University, Edinburgh, Scotland
Sgri\obh thugam 'sa Ga\idhlig ma 'se do thoil e.

John Gordon

unread,
Jan 22, 1994, 4:21:25 PM1/22/94
to
cr...@scot.demon.co.uk (Craig Cockburn) writes:

>But that's how the term originated for computer programs. A dead insect
>(ie a bug) was caught in a relay causing a program to malfunction. The
>term "bug" was coined by Admiral Grace Hopper.

Actually, that's not true. The term 'bug' was in use before the
famed moth incident. Check out _The New Hacker's Dictionary_ for info on
this and many other hacker-type stories. A great book.

---
John Gordon My incredibly witty saying has been
gor...@osiris.cso.uiuc.edu Politically Corrected into oblivion.

Piercarlo Grandi

unread,
Jan 22, 1994, 6:32:18 PM1/22/94
to
>>> On 20 Jan 94 15:43:46 -0600, lor...@fnalo.fnal.gov said:

loreti> In article <2hmfo1$b...@gap.cco.caltech.edu>,
loreti> kar...@cco.caltech.edu (Kevin A. Archie) writes:

>> That's it! Just prove your programs correct, and there will be no
>> bugs in them. Why, I prove all of my programs correct. Watch for my
>> provably correct word processor, which should be available in the mid
>> 23rd century.

This is a silly straw man! Program verification is a chimera for now,
despite some claims to the contrary I have seen recently;
_corroboration_, that is _knowing_ (and being able to show to others)
what your program ought to do and therefore being then able to quickly
realize what it is doing instead when you have made a mistake, is rather
more easily attainable.

loreti> Wasn't D.E.Knuth that said something like "I have proved that
loreti> program correct, but I did not actually run it - so, beware of
loreti> bugs" ? Does somebody remember the correct quotation ?

Almost :-). It was E.W.Dijkstra who wrote in "A Discipline of Programming"
something like "all the programs written in this book have been
corroborated by rigorous reasoning; of course none has ever been run on a
computer".

The point being not that he is snotty (he is!) but that one ought to be
confident about a program text _before_ executing it.

Execution is just an _incidental_ step in a program's life; programs are
meant mainly to _communicate_ (to be read), execution being a useful
consequence of what happens when the reader is a compiler/loader.

This is the Algol/"European" view of programming (descriptive,
algorithmic language, mistakes); the American/"PL/1" view (prescriptive,
programming language, bugs) is that programs are meant _essentially_ to
have effects (to be executed), communication being an incidental step to
that end. Thus completely different emphasis on "I know it will work"
vs. "I hacked it until it worked".


Piercarlo Grandi

unread,
Jan 22, 1994, 6:42:42 PM1/22/94
to
>>> On 21 Jan 1994 09:10:59 -0600, cham...@whale.st.usm.edu (John
>>> William Chambless) said:

John> In article <SIMPSON.94...@confucius.tfs.com>,
John> Scott Simpson <sim...@confucius.tfs.com> wrote:

Scott> Boy, it sounds so simple when you state it like that. Just
Scott> sprinkle some assert calls. This reminds me of George Polya's
Scott> famous book "How to Solve It" which was real popular when it came
Scott> out in the 40s or 50s. It basically gave a number of steps like
Scott> 1) Get a good specification of the problem 2) Look at other
Scott> implementations 3) Construct an implementation. (I'm
Scott> paraphrasing.) This is all well and good but it didn't give you
Scott> any more strategies than people had been using all along. There
Scott> is no silver bullet.

John> Yes, it's about like the famous Feynman Problem-Solving Algorithm:

John> Step 1: Write down the problem.
John> Step 2: Go away and think real hard about the problem.
John> Step 3: Come back and write down the answer.

Now, now, these are straw men too. The strategy is there: one must
_know_ _explicitly_ what kind of function each block of code is supposed
to compute, and then check it is computed. It is actually a very
specific strategy: check that assumptions are not being violated
(including the assumptions that the hw is not faulty and the execution
environment is not itself riddled with mistakes); check that the code
computes the intended functions.

This can be done by wide and sparse checking first, and then narrowing.
The strategy is to obtain an assertion violation, and then strengthen
(narrow) it, and from the specific violation deduce what is going wrong
(that's the particular value of _knowing_ which functions ought to be
computed and checking for them; the violation does not just tell you
that "something is wrong", it will usually tell you a lot about the
mistake).

For time dependent problems, make sure that you know _all_ possible
event orderings (e.g. by suitably constraining them if they are too
many), and that your code can cope with all of them. More specific
strategies than this... require a book, not a news article.

Mark Brader

unread,
Jan 24, 1994, 1:01:04 AM1/24/94
to
> The very name "bug" leads you into a completely wrong state of mind
> ... I wish everybody used the right word, _mistake_.

Don't be silly.
--
Mark Brader "Anyone who can handle a needle convincingly
SoftQuad Inc., Toronto can make us see a thread which is not there."
utzoo!sq!msb, m...@sq.com -- E. H. Gombrich

James C. Benz

unread,
Jan 24, 1994, 11:00:24 AM1/24/94
to
In article a...@cs6.rmc.ca, sm0...@d9.rmc.ca (DJ MACQUARRIE) writes:
> : In article <PCG.94Ja...@decb.aber.ac.uk> p...@aber.ac.uk writes:

> Do you use "run" for "execute"? "Crash" for "abnormal program termination"?
> If so, perhaps you're being led into a wrong state of mind? :^)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Isn't this against the Mann act? (see man Mann for details) :-)


Duncan Gibson

unread,
Jan 25, 1994, 2:53:01 AM1/25/94
to
In article <21JAN199...@erich.triumf.ca> \
iv...@erich.triumf.ca (Ivan D. Reid) writes:
>
> I have a notice on my office wall that I paraphrased from an item in
>one of Jon Bentley's books:
>
> o Of all my programming bugs, 80% are typing errors.
> o Of the remaining 20%, 80% are syntax errors.
> o Of the remaining 4%, 80% are trivial logic errors.
> o Of the remaining 0.8%, 80% are pointer errors.
> o And the rest are bloody hard!

As a constant reminder I have a poster on my wall which says:

"The Man who makes no mistakes does not usually make anything"

Just having this in front of me all day has even changed my attitude so
that I am now less defensive than before and I readily admit my mistakes
to other people. It's amazing how much more productive it is...

Cheers
Duncan
--
Duncan Gibson, ESTEC/YCV, Postbus 299, 2200AG Noordwijk, The Netherlands
Preferred email: dun...@yc.estec.esa.nl or ...!sun4nl!esatst!duncan
Desperate email: dgi...@estec.esa.nl or dgi...@ESTEC.BITNET

Mark Newton

unread,
Jan 23, 1994, 3:53:42 PM1/23/94
to
In article <2hmfo1$b...@gap.cco.caltech.edu> kar...@cco.caltech.edu (Kevin A. Archie) writes:
>Heisenbugs are not really restricted to distributed systems.
>I've seen a number of crashes that went away when the program
>was run in the debugger.

Wow! That must mean that the debugger has been working! :-)

- mark
--
--------------------------------------------------------------------
I tried an internal modem, new...@cleese.apana.org.au
but it hurt when I walked. Mark Newton
----- Voice: +61-8-3735575 --------------- Data: +61-8-3736006 -----

m...@mole-end.matawan.nj.us

unread,
Jan 24, 1994, 6:25:00 AM1/24/94
to
In article <PCG.94Ja...@decb.aber.ac.uk>, p...@aber.ac.uk (Piercarlo Grandi) writes:

> This is the Algol/"European" view of programming (descriptive,
> algorithmic language, mistakes); the American/"PL/1" view (prescriptive,
> programming language, bugs) is that programs are meant _essentially_ to
> have effects (to be executed), communication being an incidental step to
> that end. Thus completely different emphasis on "I know it will work"
> vs. "I hacked it until it worked".

Aw, come off it, willya! There's plenty of smarts and plenty of dumbs
on both sides of the Atlantic and both sides of the Channel and both
sides of the Pillars of Hercules.

Let me remind you that, contrary to popular opinion, the Wright brothers
taught themselves calculus and what fluid dynamics there was at that time,
and conducted actual wind tunnel experiments. They were a lot more than
`bicycle mechanics.'

Zoltan Somogyi

unread,
Jan 27, 1994, 1:29:38 AM1/27/94
to
lor...@fnalo.fnal.gov writes:
>Wasn't D.E.Knuth that said something like "I have proved that
>program correct, but I did not actually run it - so, beware of
>bugs" ? Does somebody remember the correct quotation ?

I don't know about Knuth, but around 1980 somebody here wrote and proved
correct a small text formatting program. I heard that when they ran it,
they found that the output looked bad. I don't know whether the proof
was in error or the specification itself was wrong, but it gave the author
a red face for a while.

The proof was actually published in an article in the March 1983 issue
of Software - Practice and Experience, although I strongly doubt whether
this anecdote made it into print.

Zoltan Somogyi <z...@cs.mu.OZ.AU>
Department of Computer Science, University of Melbourne, AUSTRALIA

Peter da Silva

unread,
Jan 27, 1994, 12:34:34 PM1/27/94
to
In article <2hs5cl$3...@vixen.cso.uiuc.edu>,
John Gordon <gor...@osiris.cso.uiuc.edu> wrote:
> Actually, that's not true. The term 'bug' was in use before the
> famed moth incident. Check out _The New Hacker's Dictionary_ for info on
> this and many other hacker-type stories. A great book.

Eric still doesn't have the moth incident right. Regardless of what the
people at the Smithsonian told him, the moth *was* at the Smithsonian during
1979, because I saw it there when I was visiting Washington that year.

Now, is this a bug in the book or a mistake?

(neither, it's a bug in the Smithsonian)
--
Peter da Silva `-_-'
Network Management Technology Incorporated 'U`
1601 Industrial Blvd. Sugar Land, TX 77478 USA
+1 713 274 5180 "Hast Du heute schon Deinen Wolf umarmt?"

Mark Brader

unread,
Jan 28, 1994, 5:53:23 PM1/28/94
to
> > But that's how the term originated for computer programs. ...

>
> Actually, that's not true. The term 'bug' was in use before the
> famed moth incident. Check out _The New Hacker's Dictionary_ for info on
> this and many other hacker-type stories. A great book.

And it may also be noted that the bug in the famed incident affected the
*hardware*, and was not a "_mistake_" in a program!


The New Hacker's Dictionary is the book form of the Jargon File, which is
available for anonymous FTP from prep.ai.mit.edu (18.71.0.38) in the directory
pub/gnu. The files in that directory include

-rw-r--r-- 1 14910 523845 Jul 29 1993 jarg300.info.gz
-rw-r--r-- 1 14910 478061 Jul 29 1993 jarg300.txt.gz
-rw-r--r-- 1 14910 1663 Jul 29 1993 jargon-README
-rw-r--r-- 1 14910 145946 Jul 29 1993 jargon-upd.gz
-rw-r--r-- 1 14910 37055 Apr 11 1990 jargon.text.gz

Here is how jargon-README describes the others:

jargon.text.gz
the original MIT/Stanford AI jargon file.

jarg300.txt.gz
The 3.0.0 version, corresponding to the second paper edition from
MIT Press. 1961 entries.

jarg300.info.gz
Version of the above suitable for info browsing.

jargon-upd.gz
All new and changed entries since 2.9.6.

Entries with only trivial changes (spelling, punctuation,
references, minor changes of phrasing) have been omitted from the
change list.

(2.9.6 corresponded to the first edition of the book.)

Also note that vh-1.5.tar.gz contains sources for a hypertext reader
handy for browsing the File.

All the files are compressed with gzip, the GNU compression program which
provides significantly more compression than "compress". (I assume gzip
source is on there too, but I haven't checked.) The uncompressed form of
jarg300.txt.gz is 1,169,372 bytes long.

--
Mark Brader "... there is no such word as 'impossible' in
SoftQuad Inc., Toronto my dictionary. In fact, everything between
utzoo!sq!msb 'herring' and 'marmalade' appears to be missing."
m...@sq.com -- Douglas Adams: Dirk Gently's Holistic Detective Agency

This article is in the public domain.

Lars Joedal

unread,
Jan 28, 1994, 5:26:01 AM1/28/94
to
z...@munta.cs.mu.OZ.AU (Zoltan Somogyi) writes:
>[...] around 1980 somebody here wrote and proved
>correct a small text formatting program. I heard that when they ran it,
>they found that the output looked bad. I don't know whether the proof
>was in error or the specification itself was wrong, but it gave the author
>a red face for a while.

That reminds me of the concluding section of "Programming From
Specifications" by C. Morgan (Prentice Hall, 1990). The book is on
how to turn an exact specification into a computer program by a
formal method. Thus the resultant programs are automatically proved
to be correct - the proof is the derivation from the specification.
At the end of the book there is an example of a text formatting
program developed from a formal specification. The author admits
that the first time the program was run on a computer it didn't work.
This was because there had been a slight error in the derivation of
the program from the specification, and another slight error in the
translation from the resultant (slightly abstract) program to an
actual Modula-2 implementation.
Note that this author does nothing to conceal the error. On the
contrary, it is held up as an example showing that, because of human
errors, formal methods do *not* remove the need for debugging.

+------------------------------------------------------------------------+
| Lars J|dal | Q: What's the difference between a quantum |
| email: joe...@dfi.aau.dk | mechanic and an auto mechanic? |
| Physics student at the | A: A quantum mechanic can get his car into |
| University of Aarhus | the garage without opening the door. |
| Denmark | -- David Kra |
+------------------------------------------------------------------------+

Brad Baker

unread,
Jan 26, 1994, 9:40:39 PM1/26/94
to
>Well, this might not be that easy. The very nature of Heisenbugs (I really
>like that term :-) is that they may _or_may_not_ appear. How can you test
>then, that a block of code does the right thing? You can't, because the bug
>might appear or it might not appear. So who is to say that some code is ok?
>The code might be ok, but the compiler might produce trash only under
>certain environmental circumstances. Or the linker messes up because of
>???. Or two bits of your ram are defective and neither your standard system
>check nor the HW parity check finds it for some reason. Have fun.

Software engineering is never about writing perfect code, but you can write
code with a high degree of certainty that it will work correctly. I think
you are heading down the wrong path if you try to say, as you do in the
above statement, "My code is correct." You should be looking at it as "My
code is correct as best as I can ascertain".

*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
- -
* Brad Baker *
- Software Engineer, Sterling Software -
* *
- PHONE : 61-2-975-8350 -
* FAX : 61-2-975-5907 *
- -
* INTERNET : br...@dill.sydney.sterling.com *
- COMPUSERVE : 10032,2611 -
* *
- -
*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*


0 new messages