
"The weirdest compiler bug" by Scott Rasmussen


Lynn McGuire

Feb 8, 2021, 6:39:28 PM
"The weirdest compiler bug" by Scott Rasmussen
https://blog.zaita.com/mingw64-compiler-bug/

"There are approximately 7.5x10^18 grains of sand on Earth. This story
is about finding changes in an equation that has a difference of
approximately 1e-18 out of hundreds of billions of calculations. That is
7 grains of sand that are different to what we expect across the entire
planet Earth."

"After spending days generating gigabytes of debug logs and GDB
breakpoints, I finally discovered a very peculiar bug in the compiler. I
thought this would be an interesting story to tell."

Lynn

Juha Nieminen

Feb 9, 2021, 4:28:24 AM
You ain't debugged a compiler error until you have encountered one... on a
compiler for an 8-bit CPU (yes, those are still being used) in an embedded
system where your only way to debug is by writing things to a UART, and
where adding code to the program (to write debug values to that UART)
affects the bug.

Yes, been there, done that. It was a nightmare to find. Quite rewarding
when I finally found it (and was able to work around it by changing
the code in such a manner that it didn't trigger the bug).

(If you are curious, the compiler in question was sdcc.)

David Brown

Feb 9, 2021, 6:13:37 AM
Ha! Young folk today, complaining that they have only a UART for
debugging. When I were a lad, we used a voltmeter on a spare pin as the
debugger. The first debugger I had for assembly programming was
listening to the sound of the computer's power supply as different
types of work were done.

(Yes, compiler bugs are a real pain. I've come across a few over the
years, in different compilers.)

Manfred

Feb 9, 2021, 10:11:31 AM
Even if, as far as I can remember, it is the only Windows port of GCC,
MinGW is not officially supported by the gcc team.

Paavo Helde

Feb 9, 2021, 12:50:27 PM
There is also at least the Cygwin port of GCC. But I'm quite sure this is
not officially supported by the gcc team either.

Chris M. Thomasson

Feb 9, 2021, 3:15:06 PM
Fwiw, a long time ago there was a "bug?" in GCC that could prevent POSIX
pthread_mutex_trylock from working correctly:

https://groups.google.com/g/comp.programming.threads/c/Y_Y2DZOWErM/m/JI3i5zlA2H0J

It was an optimization that would screw things up pretty badly. The scary
part is that it could introduce a race condition. A lot of the time the
program would work; other times it could silently corrupt data. Those
are pretty damn hard to debug.

There are a lot of smart people commenting on that thread. Heck, even I
am there. ;^)

red floyd

Feb 10, 2021, 1:31:17 AM
That's a "Heisenbug". Observing the bug affects its behavior.

red floyd

Feb 10, 2021, 1:35:19 AM
Not quite a compiler bug, but a library bug. We were using a
vendor-provided library for the TI 34020, and had to compile it ourselves.
Unfortunately, it only worked when compiled without optimization.

The issue was that registers were memory-mapped, and they were being
optimized out of polling loops.

e.g.:

const char *const reg = (const char *) 0x12345678L;

// ...

while ((*reg & SOMEBIT) == 0)
    /* spin */ ;

The fix was to declare it const volatile:

const volatile char *const reg = (const volatile char *) 0x12345678L;

That one took a while to find.
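
For illustration, here is a minimal sketch of the polling pattern described above, written so that it can run on a hosted system. The names wait_for_bit and kReadyBit and the poll budget are assumptions made for this example and are not taken from the vendor library; an ordinary variable can stand in for the memory-mapped register.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint8_t kReadyBit = 0x01;  // hypothetical status bit

// Poll a status register until any bit in `mask` is set, or give up after
// `max_polls` reads. The pointee is volatile, so the compiler must perform
// a fresh read on every iteration instead of hoisting it out of the loop.
bool wait_for_bit(const volatile std::uint8_t* status,
                  std::uint8_t mask,
                  unsigned long max_polls) {
    assert(status != nullptr);
    for (unsigned long i = 0; i < max_polls; ++i) {
        if ((*status & mask) != 0) {
            return true;   // bit observed
        }
    }
    return false;          // poll budget exhausted
}
```

On the real target the pointer would be formed from the fixed address, for example reinterpret_cast<const volatile std::uint8_t*>(0x12345678L). That is not done here because it would fault on an ordinary desktop system.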


David Brown

Feb 10, 2021, 5:52:29 AM
Vendor-supplied libraries and code regularly have bugs, and even more
regularly work only at certain optimisation levels. (That is, of course,
a kind of bug in the code, though the vendor's viewpoint might be that
you are just using the code outside of its specifications.
Undocumented specifications, of course.)

I've seen vendor code that fails to compile in C99 mode, and is only
usable in C90 mode. (It takes a special kind of imaginative cock-up to
achieve that.) I've seen vendor code that is pointlessly written in
assembly, when simple clear C code would have been much more efficient -
and would not fail to build due to small changes in the compiler flags.

A lot of vendor-supplied code seems to be written by students on summer
jobs. And a lot seems to be hacks and patches built on decades-old code
written for a different type of processor.


But it is IMHO more annoying when it is bugs in the compiler or
compiler-supplied libraries, from suppliers that normally have very high
quality. One that I find particularly irritating is that the
gcc-supplied atomics library for the ARM (I think it is more general
than that, but I use gcc on embedded ARMs) is totally wrong. Spin locks
are /not/ a good idea on single processor systems!


James Kuyper

Feb 10, 2021, 8:33:19 AM
On 2/10/21 5:52 AM, David Brown wrote:
...
> I've seen vendor code that fails to compile in C99 mode, and is only
> usable in C90 mode. (It takes a special kind of imaginative cock-up to
> achieve that.)

Not really - relying upon implicit int or implicit function definition
would be sufficient.

David Brown

Feb 10, 2021, 8:41:58 AM
A sufficiently lax C99 compiler will accept these, perhaps with warnings
(and like gcc, this compiler was pretty lax by default - far too lax,
IMHO). No, the problem here was caused by an incomprehensible jungle of
conditional compilations that were there to support different compilers
and different microcontrollers - most of which were no longer even
produced. This mess had errors, leading to compilation faults with
certain perfectly reasonable compiler flags. (Bart would have had a
field day if he saw that code!)


Jorgen Grahn

Feb 10, 2021, 4:49:08 PM
Sounds like my experience with programmable network controllers. We
got this multi-megabyte SDK, and you couldn't use it as a library: you
had to use it as a framework, and submit to their build tools and
everything[1]. We mostly wanted to add some filters to make different
packets end up on different Rx queues.

I could never figure out if it was hard on purpose (an obfuscation
layer so you couldn't learn how the hardware worked?) or if it was due
to incompetence. But wouldn't it be good for a hardware vendor if the
hardware was easy to integrate with software?

> (Bart would have had a field day if he saw that code!)

I think I felt like Bart a lot in those days.

/Jorgen

[1] Which was a problem, because we were already submitting,
BDSM-style, to a number of other build chains.

--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .

Chris M. Thomasson

Feb 10, 2021, 8:21:48 PM
Are they using spinlocks to implement the atomics? I have not seen the
library's source code.

Chris M. Thomasson

Feb 10, 2021, 8:30:59 PM
On 2/9/2021 10:35 PM, red floyd wrote:
> Not quite a compiler bug, but a library bug.

Well, the compiler optimization would break POSIX. I thought it was a
bit odd because gcc is used for the Linux Kernel.


>  We were using a vendor
> provided library for the TI 34020, and had to compile it ourselves.
> Unfortunately, it only worked when compiled without optimization.
>
> The issue was that registers were memory-mapped, and they were being
> optimized out of polling loops.
>
> e.g:
>
> const char *const reg = (const char *) 0x12345678L;
>
> // ...
>
> while ((*reg & SOMEBIT) == 0)
>    /* spin */ ;
>
> The fix was to declare it const volatile:
>
> const volatile char *const reg = (const volatile char *) 0x12345678L;
>
> That one took a while to find.

Oh yeah. I have definitely had to debug code like that. Back in the day,
volatile was a workaround. Today, using std::atomic can be a better option.
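
As a hedged sketch of the std::atomic alternative: this assumes the flag lives in ordinary memory and is set by another thread rather than by hardware (for a memory-mapped device register, volatile remains the usual tool). The names publish_done, poll_done and kDoneBit are hypothetical.

```cpp
#include <atomic>
#include <cassert>

constexpr unsigned kDoneBit = 0x01u;  // hypothetical flag bit

// Writer side: set the bit. Release ordering makes the writer's earlier
// stores visible to a reader that later observes the bit with acquire.
void publish_done(std::atomic<unsigned>& status) {
    status.fetch_or(kDoneBit, std::memory_order_release);
}

// Reader side: poll a bounded number of times. Each load is a real read of
// the shared object, so the loop cannot be collapsed into a single check.
bool poll_done(const std::atomic<unsigned>& status, unsigned long max_polls) {
    assert(max_polls > 0);
    for (unsigned long i = 0; i < max_polls; ++i) {
        if ((status.load(std::memory_order_acquire) & kDoneBit) != 0) {
            return true;
        }
    }
    return false;
}
```

Unlike volatile, the atomic also gives defined behaviour when two threads touch the flag concurrently, which is the case where it is likely to be the better fit.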

Juha Nieminen

Feb 16, 2021, 9:10:21 AM
red floyd <no.spa...@its.invalid> wrote:
> That's a "Heisenbug". Observing the bug affects its behavior.

If you are curious about why modifying the code affected the bug, it
was because the compiler was generating code that wrote to the wrong
part of the stack, which would thus corrupt something that some
function higher up the call stack was using. (I started suspecting that
to be the case when the program was crashing on a 'return'. Indeed,
the return address was being corrupted somewhere deep down the
call stack.)

Rather obviously, if you wrote any code that added anything to (or
removed anything from) the stack, the thing that would get corrupted
would likewise change.
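
The effect can be pictured with a toy model. This is only a sketch: a vector of labelled slots stands in for the stack, and ToyStack, buggy_store, run and the slot labels are all made up for the illustration. It does not reproduce the actual sdcc bug or any real stack layout.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy model only: labelled slots, not a real call stack.
struct ToyStack {
    std::vector<std::string> slots;

    void push(const std::string& label) { slots.push_back(label); }

    // Models a miscompiled store that lands `bad_offset` slots below the
    // top of the stack. Returns the label of the slot it clobbered.
    std::string buggy_store(std::size_t bad_offset) {
        assert(bad_offset < slots.size());
        const std::size_t victim = slots.size() - 1 - bad_offset;
        std::string clobbered = slots[victim];
        slots[victim] = "<corrupted>";
        return clobbered;
    }
};

// Build the same "call chain" with or without one extra local, such as a
// buffer added for debug output, then perform the faulty store.
std::string run(bool with_debug_local) {
    ToyStack s;
    s.push("caller: return address");
    s.push("caller: saved counter");
    if (with_debug_local) {
        s.push("callee: debug buffer");  // the extra code shifts everything
    }
    s.push("callee: local");
    return s.buggy_store(2);  // always two slots below the top
}
```

In this model run(false) clobbers the caller's return address, while run(true) clobbers the saved counter instead: the same faulty store, but a different victim once one more item sits on the stack.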