
is there any UB in this integer math...


Chris M. Thomasson

Feb 17, 2019, 11:23:23 PM
There is a sort of funny discussion going on over on reddit about my
read/write mutex. Some people think it has UB wrt an integer overflow or
underflow. I boiled it down to the following simple program:
____________________________________
#include <iostream>
#include <climits>


long fetch_add(long& gcount, long addend)
{
    long lcount = gcount;
    gcount += addend;
    return lcount;
}


int main()
{
    long m_count = LONG_MAX;
    std::cout << "m_count = " << m_count << "\n";

    // simulate three concurrent readers
    fetch_add(m_count, -3);
    std::cout << "m_count = " << m_count << ", 3 readers\n";

    // now m_count = LONG_MAX - 3

    // simulate a writer.
    long count = fetch_add(m_count, -LONG_MAX);
    std::cout << "m_count = " << m_count << ", 3 readers in write mode\n";

    if (count < LONG_MAX)
    {
        long readers = LONG_MAX - count;
        std::cout << "count = " << count << "\n";
        std::cout << "readers = " << readers << "\n";
    }

    return 0;
}
____________________________________


Is there any UB in there? Some on reddit seem to think so. They are most
likely trolling me. Little shi%'s.

Chris M. Thomasson

Feb 18, 2019, 1:15:03 AM
Fwiw, this is in regard to the ct_rwmutex::lock_shared function here:

https://pastebin.com/raw/xCBHY9qd

Actually, the only way ct_rwmutex::m_count can get into UB land is if
the number of concurrent reader threads exceeds LONG_MAX. Well, that is
a heck of a lot of threads. Sorry, but still venting from some reddit
trolls. I think they know better: Almost sorry for posting it there in
the first place. Not sorry for posting it here, and here:

https://groups.google.com/d/topic/lock-free/3sO-lnwtsi8/discussion

____________________________
#define RWMUTEX_COUNT_MAX LONG_MAX

struct ct_rwmutex
{
    // shared state
    std::atomic<long> m_wrstate;
    std::atomic<long> m_count;
    std::atomic<long> m_rdwake;

    ct_slow_semaphore m_rdwset;
    ct_slow_semaphore m_wrwset;
    ct_fast_mutex m_wrlock;


    ct_rwmutex() :
        m_wrstate(1),
        m_count(RWMUTEX_COUNT_MAX),
        m_rdwake(0),
        m_rdwset(0),
        m_wrwset(0) {
    }


    // READ, pretty slim...
    void lock_shared()
    {
        if (m_count.fetch_add(-1, std::memory_order_acquire) < 1)
        {
            m_rdwset.dec();
        }
    }

    void unlock_shared()
    {
        if (m_count.fetch_add(1, std::memory_order_release) < 0)
        {
            if (m_rdwake.fetch_add(-1, std::memory_order_acq_rel) == 1)
            {
                m_wrwset.inc();
            }
        }
    }


    // WRITE, more hefty
    void lock()
    {
        m_wrlock.lock();

        long count = m_count.fetch_add(-RWMUTEX_COUNT_MAX,
            std::memory_order_acquire);

        if (count < RWMUTEX_COUNT_MAX)
        {
            long rdwake = m_rdwake.fetch_add(RWMUTEX_COUNT_MAX - count,
                std::memory_order_acquire);

            if (rdwake + RWMUTEX_COUNT_MAX - count)
            {
                m_wrwset.dec();
            }
        }
    }

    // write unlock
    void unlock()
    {
        long count = m_count.fetch_add(RWMUTEX_COUNT_MAX,
            std::memory_order_release);

        if (count < 0)
        {
            m_rdwset.add(-count);
        }

        m_wrlock.unlock();
    }
};
____________________________
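For reference, a minimal usage sketch. ct_rwmutex exposes the same
member-function names as std::shared_mutex (lock_shared/unlock_shared for
readers, lock/unlock for a writer), so the pattern is shown with the
standard type; swapping in ct_rwmutex would be a drop-in change, assuming
its helper types from the pastebin are available:
____________________________
#include <shared_mutex>

std::shared_mutex rw; // or: ct_rwmutex rw; (same member names)
int shared_value = 0;

int reader()
{
    rw.lock_shared();
    int v = shared_value; // read-only access
    rw.unlock_shared();
    return v;
}

void writer(int v)
{
    rw.lock();
    shared_value = v;     // exclusive access
    rw.unlock();
}
____________________________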


Chris M. Thomasson

Feb 18, 2019, 1:17:30 AM
On 2/17/2019 10:14 PM, Chris M. Thomasson wrote:
> On 2/17/2019 8:23 PM, Chris M. Thomasson wrote:
>> There is a sort of funny discussion going on over on reddit about my
>> read/write mutex. Some people think is has UB wrt an integer overflow
>> or underflow. I boiled it down to the following simple program:
[...]
> Actually, the only way ct_rwmutex::m_count can get into UB land is if
> the number of concurrent reader threads exceeds LONG_MAX. Well, that is
> a heck of a lot of threads. Sorry, but still venting from some reddit
> trolls. I think they know better: Almost sorry for posting it there in
> the first place. Not sorry for posting it here, and here:
>
> https://groups.google.com/d/topic/lock-free/3sO-lnwtsi8/discussion
[...]

ARGH! Wrong link, though it is still an interesting discussion that
happens to be related to read/write access patterns. Strange mistake.

Anyway, here is the correct link:

https://groups.google.com/d/topic/lock-free/zzZX4fvtG04/discussion

Sorry.

David Brown

Feb 18, 2019, 2:52:24 AM
On 18/02/2019 05:23, Chris M. Thomasson wrote:
> There is a sort of funny discussion going on over on reddit about my
> read/write mutex. Some people think is has UB wrt an integer overflow or
> underflow. I boiled it down to the following simple program:
> ____________________________________
> #include <iostream>
> #include <climits>
>
>
> long fetch_add(long& gcount, long addend)
> {
>     long lcount = gcount;
>     gcount += addend;
>     return lcount;
> }
>
>
> int main()
> {
>     long m_count = LONG_MAX;
>     std::cout << "m_count = " << m_count << "\n";
>
>     // simulate three concurret readers
>     fetch_add(m_count, -3);
>     std::cout << "m_count = " << m_count << ", 3 readers\n";
>
>     // now m_count = LONG_MAX - 3
>
>     // simulate a writer.
>     long count = fetch_add(m_count, -LONG_MAX);

So count is LONG_MAX - 3, while m_count is now -3.

>     std::cout << "m_count = " << m_count << ", 3 readers in write mode\n";
>
>     if (count < LONG_MAX)
>     {
>         long readers = LONG_MAX - count;

Readers is 3.

I don't see any overflows here.

If someone thought that fetch_add returned the result of the addition,
then they'd think count is -3 and thus readers would try to be LONG_MAX + 3.
Could that be the mistake they are making?


(I haven't looked at your real code, just this post.)
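To make the distinction concrete, a minimal sketch with std::atomic (which
is what the real ct_rwmutex uses; the free-standing fetch_add in the
boiled-down program only simulates it):
____________________________
#include <atomic>
#include <iostream>

int main()
{
    std::atomic<long> n{10};

    // fetch_add returns the value held *before* the addition
    long prev = n.fetch_add(-3);

    std::cout << "prev = " << prev << "\n";     // prints 10
    std::cout << "n    = " << n.load() << "\n"; // prints 7
    return 0;
}
____________________________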

Juha Nieminen

Feb 18, 2019, 2:58:14 AM
Chris M. Thomasson <invalid_chris_t...@invalid.com> wrote:
> There is a sort of funny discussion going on over on reddit about my
> read/write mutex. Some people think is has UB wrt an integer overflow or
> underflow.

I doubt there exists a single piece of hardware in use today, capable of
compiling and running a program, that does not use 2's complement arithmetic
and where overflow and underflow aren't dealt with simply by doing all
arithmetic modulo the size of the integer types.

I don't remember if it's undefined behavior according to the standard,
but who cares? It's going to work as arithmetic modulo the maximum size
anyway, for all practical purposes. Nobody is going to use the library
in an architecture where that's not the case (if such an architecture
even exists).

--- news://freenews.netfront.net/ - complaints: ne...@netfront.net ---

David Brown

Feb 18, 2019, 3:45:18 AM
On 18/02/2019 08:58, Juha Nieminen wrote:
> Chris M. Thomasson <invalid_chris_t...@invalid.com> wrote:
>> There is a sort of funny discussion going on over on reddit about my
>> read/write mutex. Some people think is has UB wrt an integer overflow or
>> underflow.
>
> I doubt there exists a single piece of hardware capable of compiling and
> running a program in use today that does not use 2's complement arithmetic
> and where over and underflow isn't dealt simply by doing all arithmetic
> modulo the size of the integer types.

Well, you'd be wrong. /Very/ wrong.

There are no modern systems that don't use 2's complement arithmetic -
that half is correct. But modern compilers can, and do, assume that
your signed arithmetic never overflows.

>
> I don't remember if it's undefined behavior according to the standard,
> but who cares? It's going to work as arithmetic modulo the maximum size
> anyway, for all practical purposes. Nobody is going to use the library
> in an architecture where that's not the case (even if such an architecture
> even exists).
>

That is extraordinarily bad advice. Do not let your signed integers
overflow - it is /always/ wrong. (Regardless of the behaviour of the
compiler, language, and hardware, overflow - signed or unsigned - is
almost always a bug in the software with meaningless results.)

Compilers know that signed integer arithmetic never overflows. They
optimise using that knowledge. If you have written code that relies on
the behaviour when overflow /does/ occur, you are lying to your compiler
- and bad things will happen. Compilers don't cry - programmers do.

If Chris' code has signed integer overflow, then it is undefined
behaviour and he needs to find it and fix it - not ignore it and hope no
one notices. (In the code he posted, I could not see any problems.)

gcc's "-fsanitize=undefined" is your friend here, as is "-Wstrict-overflow".

Paavo Helde

Feb 18, 2019, 5:11:21 AM
Most probably you have such an architecture/compiler right under your
fingertips:

> cat ->test1.cpp
#include <limits>
int main() {
    int i = std::numeric_limits<int>::max();
    i = i+1;
}
> g++ -ftrapv test1.cpp
> ./a.out
Aborted (core dumped)

Öö Tiib

Feb 18, 2019, 5:46:21 AM
On Monday, 18 February 2019 10:45:18 UTC+2, David Brown wrote:
> On 18/02/2019 08:58, Juha Nieminen wrote:
> > Chris M. Thomasson <invalid_chris_t...@invalid.com> wrote:
> >> There is a sort of funny discussion going on over on reddit about my
> >> read/write mutex. Some people think is has UB wrt an integer overflow or
> >> underflow.
> >
> > I doubt there exists a single piece of hardware capable of compiling and
> > running a program in use today that does not use 2's complement arithmetic
> > and where over and underflow isn't dealt simply by doing all arithmetic
> > modulo the size of the integer types.
>
> Well, you'd be wrong. /Very/ wrong.
>
> There are no modern systems that don't use 2's complement arithmetic -
> that half is correct. But modern compilers can, and do, assume that
> your signed arithmetic never overflows.

Since compiler writers are people with an extremely benchmark-oriented
mindset, they sometimes even make that assumption when you have explicitly
used implementation-specific compiler flags to define the behavior in case
of signed arithmetic overflow. They of course do fix such issues when
reported, but some keep slipping in with both clang and gcc.

Without -ftrapv or -fwrapv all bets are off and one can get the full set of
nasal demons on signed integer overflow on those compilers.

Bart

Feb 18, 2019, 7:52:37 AM
But you gave it the -ftrapv option, which without looking it up I assume
means trap-on-overflow.

So you told it to abort in this situation.

This can be done whatever the language, compiler, or hardware.

Presumably you can also invent an option, if it is not already the
default behaviour, to ignore such an overflow.


David Brown

Feb 18, 2019, 8:22:21 AM
On 18/02/2019 11:46, Öö Tiib wrote:
> On Monday, 18 February 2019 10:45:18 UTC+2, David Brown wrote:
>> On 18/02/2019 08:58, Juha Nieminen wrote:
>>> Chris M. Thomasson <invalid_chris_t...@invalid.com> wrote:
>>>> There is a sort of funny discussion going on over on reddit about my
>>>> read/write mutex. Some people think is has UB wrt an integer overflow or
>>>> underflow.
>>>
>>> I doubt there exists a single piece of hardware capable of compiling and
>>> running a program in use today that does not use 2's complement arithmetic
>>> and where over and underflow isn't dealt simply by doing all arithmetic
>>> modulo the size of the integer types.
>>
>> Well, you'd be wrong. /Very/ wrong.
>>
>> There are no modern systems that don't use 2's complement arithmetic -
>> that half is correct. But modern compilers can, and do, assume that
>> your signed arithmetic never overflows.
>
> Since compiler writers are people with extremely benchmark-oriented
> head shape they sometimes even assume that when you have explicitly
> used implementation-specific compiler flags to define behavior on case
> of signed arithmetic overflow. They of course do fix such issues when
> reported but some keep slipping in both with clang and gcc.

Nonsense. That is popular propaganda spread by people who don't
understand how C (or C++) works, or don't want it to work the way it
does. People writing compilers do so for people who use compilers.
Accusations that compiler writers are only interested in maximum
benchmark speeds are absurd. gcc, for example, tests compilation on the
entire Debian repository - vast quantities of code, much of which is
ancient.

What compiler writers do not do, however, is limit their handling of
well-written code because some people write poor code. Instead, they
provide flags to support those that have code that relies on particular
handling of undefined behaviours, or code that "worked fine on my old
compiler" - flags like "-fwrapv" and "-fno-strict-aliasing". Then
people who know how integer arithmetic works in C and C++ can get
faster code, and people who can't get it right (or who have to use
broken code from others) have an escape route.

>
> Without -ftrapv or -fwrapv all bets are off and one can get full set of
> nasal demons on signed integer overflow on those compilers.
>

Of course all bets are off if you have signed integer overflow - this
should be well known to anyone who has learned C or C++ programming. It is
not the compiler that launches nasal daemons - it is the programmer,
when they write code that does not make sense in the language.

"-ftrapv", by the way, has been considered a poor and unreliable
solution for a good many years - "-fsanitize=signed-integer-overflow" is
what you want.

And "-fwrapv" turns invalid code with undefined behaviour into valid
code with defined but wrong behaviour, while slowing down code that was
correct all along. (Unless you think that it makes sense to add an
apple to a pile of 2147483647 apples and get -2147483648 apples - in
which case it is exactly what you want.)

Write decent code, that does what you actually want, and signed integer
overflow behaviour doesn't matter. And if you need help spotting
problems (as we all do), use the tools the compiler gives you rather
than blaming the compiler for your mistakes.

Paavo Helde

Feb 18, 2019, 8:31:07 AM
The point is that this option is fully standards-conformant (a core dump
upon UB is one of the best manifestations of UB), and as a library writer
one would have a hard time explaining to his library users that they
cannot compile his library with a standards-conforming compiler, or have
to tone down its bug detection capabilities.

Note that this was in response to the claim "Nobody is going to use the
library in an architecture where [signed int wrapover] is not the case".

Also, signed int overflow is not such an innocent beast; I have found some
real bugs with -ftrapv and would not like to be told that I cannot use it.

David Brown

Feb 18, 2019, 8:45:06 AM
Yes.

"-fsanitize=signed-integer-overflow" will also cause similar outputs.


> Note that this was in response to the claim "Nobody is going to use the
> library in an architecture where [signed int wrapover] not the case".
>
> Also, signed int overflow is not so innocent beast, I have found some
> real bugs with -ftrapv and would not like to be told that I cannot use it.
>

My understanding (and experience) is that "-ftrapv" is considered of
questionable worth, and I would not be surprised to see it removed from
gcc in the future. Use the sanitize option with modern gcc (or clang).
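For example, a minimal sketch; the exact diagnostic text varies between gcc
and clang versions, so none is reproduced here:
____________________________
#include <limits>

int main()
{
    int i = std::numeric_limits<int>::max();
    i = i + 1; // signed overflow: reported at run time when this file is
               // built with: g++ -fsanitize=signed-integer-overflow test1.cpp
}
____________________________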

Ben Bacarisse

Feb 18, 2019, 9:34:32 AM
"Chris M. Thomasson" <invalid_chris_t...@invalid.com>
writes:

> There is a sort of funny discussion going on over on reddit about my
> read/write mutex. Some people think is has UB wrt an integer overflow
> or underflow. I boiled it down to the following simple program:

You could boil it down even more, or at the very least say where the
people on Reddit think there is overflow.

I went to look and all I could find is a perfectly correct remark that
some other code (quite different to this boiled down code) might
overflow.

The main() you posted has no overflow, but any signed integer arithmetic
might, depending on the operands.

<snip>
> Is there any UB in there? Some on reddit seem to think so. They are
> most likely trolling me. Little shi%'s.

Tell us where! I could not find your posted code on Reddit.

--
Ben.

Paavo Helde

Feb 18, 2019, 10:15:33 AM
On 18.02.2019 15:44, David Brown wrote:

> My understanding (and experience) is that "-ftrapv" is considered of
> questionable worth, and I would not be surprised to see it removed from
> gcc in the future. Use the sanitize option with modern gcc (or clang).

Thanks, will look into it!



David Brown

Feb 18, 2019, 11:03:18 AM
There are many sanitize options. Some of them can be quite detrimental
to performance, but run-time checking is useful during debugging and
bug-hunting.


Öö Tiib

Feb 18, 2019, 11:50:45 AM
On Monday, 18 February 2019 15:22:21 UTC+2, David Brown wrote:
> On 18/02/2019 11:46, Öö Tiib wrote:
> > On Monday, 18 February 2019 10:45:18 UTC+2, David Brown wrote:
> >> On 18/02/2019 08:58, Juha Nieminen wrote:
> >>> Chris M. Thomasson <invalid_chris_t...@invalid.com> wrote:
> >>>> There is a sort of funny discussion going on over on reddit about my
> >>>> read/write mutex. Some people think is has UB wrt an integer overflow or
> >>>> underflow.
> >>>
> >>> I doubt there exists a single piece of hardware capable of compiling and
> >>> running a program in use today that does not use 2's complement arithmetic
> >>> and where over and underflow isn't dealt simply by doing all arithmetic
> >>> modulo the size of the integer types.
> >>
> >> Well, you'd be wrong. /Very/ wrong.
> >>
> >> There are no modern systems that don't use 2's complement arithmetic -
> >> that half is correct. But modern compilers can, and do, assume that
> >> your signed arithmetic never overflows.
> >
> > Since compiler writers are people with extremely benchmark-oriented
> > head shape they sometimes even assume that when you have explicitly
> > used implementation-specific compiler flags to define behavior on case
> > of signed arithmetic overflow. They of course do fix such issues when
> > reported but some keep slipping in both with clang and gcc.
>
> Nonsense. That is a popular propaganda spread by people who don't
> understand how C (or C++) works, or don't want it to work the way it
> does. People writing compilers do so for people who use compilers.

Nonsense. People writing compilers focus on what their employers
tell them to focus on. That seems to be mostly about generating shorter
and faster code more quickly than competitors. The whole "Meltdown and
Spectre" problem is clearly because of such benchmark-orientation.
Both hardware makers and compiler writers kept optimizing until it
was "overly optimal".

> Accusations that compiler writers are only interested in maximum
> benchmark speeds are absurd. gcc, for example, tests compilation on the
> entire Debian repository - vast quantities of code, much of which is
> ancient.

Where did I write "only"? Sure, they have to care about other things,
like keeping backwards compatibility. Also, if the code of a popular
benchmark contains UB then they have to avoid "optimizing" it, to keep
the result "correct".

> What compiler writers do not do, however, is limit their handling of
> well-written code because some people write poor code. Instead, they
> provide flags to support those that have code that relies on particular
> handling of undefined behaviours, or code that "worked fine on my old
> compiler" - flags like "-fwrapv" and "-fno-strict-aliasing". Then
> people who know how to integer arithmetic works in C and C++ can get
> faster code, and people who can't get it right (or who have to use
> broken code from others) have an escape route.
>
> >
> > Without -ftrapv or -fwrapv all bets are off and one can get full set of
> > nasal demons on signed integer overflow on those compilers.
> >
>
> Of course all bets are off if you have signed integer overflow - this
> should well known to anyone who has learned C or C++ programming. It is
> not the compiler that launches nasal daemons - it is the programmer,
> when they write code that does not make sense in the language.
>
> "-ftrapv", by the way, has been considered a poor and unreliable
> solution for a good many years - "-fsanitize=signed-integer-overflow" is
> what you want.

Why did it become "poor" and "unreliable"? What was the reasoning?
Is it OK to release software with "poor" and "unreliable" features?
When an incorrect answer is worse than no answer, normal people
want defects to crash in release builds too.

> And "-fwrapv" turns invalid code with undefined behaviour into valid
> code with defined but wrong behaviour, while slowing down code that was
> correct all along. (Unless you think that it makes sense to add an
> apple to a pile of 2147483647 apples and get -2147483648 apples - in
> which case it is exactly what you want.)

All I want is for them to stop "optimizing" it into nasal demons when I
did specify -ftrapv and the signed arithmetic that was used did
overflow.

> Write decent code, that does what you actually want, and signed integer
> overflow behaviour doesn't matter. And if you need help spotting
> problems (as we all do), use the tools the compiler gives you rather
> than blaming the compiler for your mistakes.

I am certainly not omnipotent, and so I write and repair several bugs
daily. When I repair a defect there is a decent likelihood that I
introduce another one. I know no one who is different. The only people
who make no errors are the ones who do nothing. However, I avoid
releasing anything with features that I would have to describe as
"poor" and "unreliable".

Bonita Montero

Feb 18, 2019, 11:57:45 AM
> Most probably you have such an architecture/compiler right under your
> fingertips:
> > cat ->test1.cpp
> #include <limits>
> int main() {
>   int i = std::numeric_limits<int>::max();
>   i = i+1;
> }
> > g++ -ftrapv test1.cpp
> > ./a.out
> Aborted (core dumped)

Is it possible to mark certain variables to be excluded from this
behaviour?
And is it possible to direct the compiler to assume wrap-around
for individual variables and to exclude the corresponding optimizations
in optimized code?

Paavo Helde

Feb 18, 2019, 12:12:21 PM
Sure, there is a special keyword for that: unsigned. Not much use for
anything other than just defining wrap-around integer types.

> cat test1.cpp
#include <limits>
#include <iostream>
int main() {
    int i = std::numeric_limits<int>::max();
    i = static_cast<unsigned int>(i)+1;
    std::cout << i << "\n";
}
> g++ -ftrapv test1.cpp
> ./a.out
-2147483648


Bonita Montero

Feb 18, 2019, 1:11:31 PM
>> Is it possible to mark certain variables to excluded from this
>> behaviour?
>> And is it possible to direct the compiler to assume wrap-around
>> for individual variables and to exclude according optimizations
>> with optimized code?

> Sure, there is a special keyword for that: unsigned. ...

No, that's not what I imagined. I imagined something that
marks a variable to be excluded from the default behaviour set
by the compiler flag.

David Brown

Feb 18, 2019, 1:56:42 PM
I am sorry, but you appear to be confused. The "Meltdown and Spectre"
stuff is a hardware problem, not a software problem. Compiler writers
have tried to find ways to work around the hardware flaw.

As for "what employers say", then yes, in /some/ cases that is what
compiler writers focus on. But you'll find that for a number of
important compilers - including the ones most targeted by such "they
only work on benchmark" claims - the people writing them are widely
dispersed with totally different kinds of employers. In particular, the
gcc developers fall into several categories:

1. Those working for chip manufacturers - Intel, AMD, ARM, etc. These
don't care if you use gcc or anything else, as long as you buy their
chips, so their concern is that you (the programmer) get the best from
the compiler and their chip. Benchmarks for the compiler don't matter -
support for the chip matters.

2. Those working for software companies like Red Hat, IBM, etc., that
provide tools and services to developers. They want programmers to be
happy with the tools - they don't care if you use a different compiler
instead.

3. Those working for big users, like Google and Facebook. They don't
care about benchmarks - they care about performance on their own software.

4. The independent and volunteer developers. They care about the
quality of their code, and making something worthwhile - they don't care
about benchmark performances.

I'm sure there are other categories that you can think of. I can't see
any significant number being benchmark oriented. People don't choose
compilers because of their benchmarks - they choose for features, target
support, static checking, language support, compatibility with existing
source code, etc. They expect a gradual progress towards faster code
with newer versions, but not more than that. And those that pick a
compiler for its speed, do so based on the speed for their own source
code, not for some benchmark.


Like all conspiracy theories, the best way to test it is to follow the
money. Who would profit from making compilers focused on benchmark
performance as the main goal, with a disregard for support for existing
C or C++ sources?

> Both hardware makers and compiler writers kept optimizing until it
> was "overly optimal".
>

That again is simply incorrect.

Developers - hardware or software - can make mistakes, and release a
design which later turns out to have unforeseen flaws. With software,
you can often find these early and fix them, but sometimes the flaws are
discovered quite late. Hardware flaws are harder to fix - but very easy
for amateurs to condemn once they are found.

>> Accusations that compiler writers are only interested in maximum
>> benchmark speeds are absurd. gcc, for example, tests compilation on the
>> entire Debian repository - vast quantities of code, much of which is
>> ancient.
>
> Where I wrote "only"? Sure, they have to care of other things like to
> keep backwards compatibility. Also if code of popular benchmark
> contains UB then they have to avoid "optimizing" it, to keep the result
> "correct".

They sometimes have to make such adjustments, yes. Often that is
because they realise that not only do the benchmark writers make such
mistakes, but others do too - and that it can be helpful to treat such
code in the manner the programmer appeared to expect. But for most
undefined behaviour, it is hard or impossible to guess what the
programmer expected - that is the nature of undefined behaviour.

>
>> What compiler writers do not do, however, is limit their handling of
>> well-written code because some people write poor code. Instead, they
>> provide flags to support those that have code that relies on particular
>> handling of undefined behaviours, or code that "worked fine on my old
>> compiler" - flags like "-fwrapv" and "-fno-strict-aliasing". Then
>> people who know how to integer arithmetic works in C and C++ can get
>> faster code, and people who can't get it right (or who have to use
>> broken code from others) have an escape route.
>>
>>>
>>> Without -ftrapv or -fwrapv all bets are off and one can get full set of
>>> nasal demons on signed integer overflow on those compilers.
>>>
>>
>> Of course all bets are off if you have signed integer overflow - this
>> should well known to anyone who has learned C or C++ programming. It is
>> not the compiler that launches nasal daemons - it is the programmer,
>> when they write code that does not make sense in the language.
>>
>> "-ftrapv", by the way, has been considered a poor and unreliable
>> solution for a good many years - "-fsanitize=signed-integer-overflow" is
>> what you want.
>
> Why did it become "poor" and "unreliable"? What was the reasoning?
> Is it OK to release software with "poor" and "unreliable" features?

Many bugs have been found in the "-ftrapv" implementation - and in
particular, it does not trap in all cases. Personally, I think the flag
should be dropped in favour of the sanitizer, which is a more modern and
flexible alternative and which is actively maintained.

> When incorrect answer is worse than no answer then normal people
> want defects to crash in release too.

"-ftrapv" could has always been slower than non-trapping code. People
usually aim to right correct code, and have that correct code run as
fast as reasonably possible. If you want software that is full of
run-time checks, you don't program in C or C++.

In C and C++, you can always manually add any checks you want. With
C++, you can make your own types that do checking in the manner that
suits your needs.
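For example, a minimal sketch of such a manual check (the name checked_add
is invented here, not a standard function):
____________________________
#include <climits>
#include <stdexcept>

long checked_add(long a, long b)
{
    // reject the addition before it can overflow
    if ((b > 0 && a > LONG_MAX - b) ||
        (b < 0 && a < LONG_MIN - b))
    {
        throw std::overflow_error("signed overflow");
    }

    return a + b; // cannot overflow once the check above has passed
}
____________________________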

David Brown

Feb 18, 2019, 1:59:30 PM
Use the sanitizer, rather than -ftrapv, and you can mark functions with
the "no_sanitize_undefined" attribute.


James Kuyper

Feb 18, 2019, 3:22:33 PM
On 2/17/19 23:23, Chris M. Thomasson wrote:
> There is a sort of funny discussion going on over on reddit about my
> read/write mutex. Some people think is has UB wrt an integer overflow or
> underflow. I boiled it down to the following simple program:
...
> Is there any UB in there? Some on reddit seem to think so. They are most
> likely trolling me. Little shi%'s.


I see no undefined behavior in the code you've shown. In general, I'm
less reliable when I say there's no problem than I am when saying that
there is a problem - but I've got David and Ben backing me up on this.
Are you sure the code you've given us is the same code they're referring to?

It would help a great deal if you could explain the arguments they've
given for thinking it's undefined. However, it's generally very
difficult to accurately summarize an argument that hasn't convinced you,
even if it's a valid argument that should have convinced you - your own
biases get in the way of accurately summarizing it.
It would be better yet if you could give us a link so we can review the
discussion and determine the validity of their arguments ourselves.

James Kuyper

Feb 18, 2019, 3:23:16 PM
I've no doubt that he understood exactly what you were asking for. His
point is that this solution is better than the one you were asking for.

Chris M. Thomasson

Feb 18, 2019, 4:59:29 PM
In the following thread:

https://www.reddit.com/r/cpp/comments/are68n/experimental_readwrite_mutex/

Here is a start of the sub thread about integer overflow and underflow:

https://www.reddit.com/r/cpp/comments/are68n/experimental_readwrite_mutex/egok4pi

Afaict, they did not seem to understand that fetch_add atomically
returns the previous value, not the new value. Btw, Google has been
using my algorithm in the GO language for over 7 years:

https://groups.google.com/d/msg/lock-free/zzZX4fvtG04/ebVWl0BCBAAJ

However, Dmitry Vyukov did not give me credit for creating it, e.g., put
my name in the source code. Will ask him to put my name in the GO
language source code. Fwiw, I invented my algorithm back in 2008-2009.

Chris M. Thomasson

Feb 18, 2019, 5:12:52 PM
It has to come from confusion about fetch_add on Reddit. Some apparently
think that count can be negative. It will never be negative. So, I
boiled it down to a case that shows what happens when a write access
notices a batch of n readers. It simulates how the writer obtains n in
order to properly wait for the n readers.
________________________
#include <iostream>
#include <climits>


long fetch_add(long& gcount, long addend)
{
    long lcount = gcount;
    gcount += addend;
    return lcount;
}


int main()
{
    long m_count = LONG_MAX;
    std::cout << "m_count = " << m_count << "\n";

    // simulate three concurrent readers
    fetch_add(m_count, -3);
    std::cout << "m_count = " << m_count << ", 3 readers\n";

    // now m_count = LONG_MAX - 3

    // simulate a writer. There can only be a single writer.
    long count = fetch_add(m_count, -LONG_MAX);
    std::cout << "m_count = " << m_count << ", 3 readers in write mode\n";

    if (count < LONG_MAX)
    {
        std::cout << "\nReaders Detected!\n";
        long readers = LONG_MAX - count;
        std::cout << "count = " << count << "\n";
        std::cout << "readers = " << readers << "\n";
    }

    return 0;
}
________________________


There can only be a single writer here, because this is modeling the
point where a writer detects n readers. count will never be negative.


Fwiw, here is my algorithm:
________________________
// Chris M. Thomassons Experimental Read/Write Mutex
// Yeah, it is pretty damn fat wrt the state, however
// it has some interesting properties...
// The state can be compressed a bit...
// btw, it has no loops...
// Take a look at the lock_shared and unlock_shared functions
Can you notice any integer overflow and/or underflow? I cannot.

james...@alumni.caltech.edu

Feb 18, 2019, 5:22:56 PM
On Monday, February 18, 2019 at 4:59:29 PM UTC-5, Chris M. Thomasson wrote:
> On 2/18/2019 6:34 AM, Ben Bacarisse wrote:
> > "Chris M. Thomasson" <invalid_chris_t...@invalid.com>
> > writes:
> >
> >> There is a sort of funny discussion going on over on reddit about my
> >> read/write mutex. Some people think is has UB wrt an integer overflow
> >> or underflow. I boiled it down to the following simple program:
> >
> > You could boil it down even more, or at the very least say where the
> > people on Reddit think there is overflow.
> >
> > I went to look and all I could find is a perfectly correct remark that
> > some other code (quite different to this boiled down code) might
> > overflow.
> >
> > The main you posted has no overflow, but any signed integer arithmetic
> > might, depending on the operands.
> >
> > <snip>
> >> Is there any UB in there? Some on reddit seem to think so. They are
> >> most likely trolling me. Little shi%'s.
> >
> > Tell us where! I could no find your posted code on Reddit.
> >
>
> In the following thread:
>
> https://www.reddit.com/r/cpp/comments/are68n/experimental_readwrite_mutex/
>
> Here is a start of the sub thread about integer overflow and underflow:
>
> https://www.reddit.com/r/cpp/comments/are68n/experimental_readwrite_mutex/egok4pi
>
> Afaict, they did not seem to understand that fetch_add atomically
> returns the previous value, not the new value.

I saw no comments suggesting any such misunderstanding. I did see a
claim by you that "m_count can have any value between, and including,
-LONG_MAX and LONG_MAX". That is NOT true for the code you showed us:
the code you showed us only allows m_count to take on three possible
values: LONG_MAX, LONG_MAX-3, and -3. Therefore, your comment is about
some other piece of code that you have not yet shown us. This is also
supported by the fact that your main critic quotes code, presumably
an excerpt from your code, that doesn't match any code you've
provided in this thread. Could you please provide that code?

Chris M. Thomasson

Feb 18, 2019, 5:32:04 PM
The only way my algorithm can get into a condition where m_count can
equal -LONG_MAX is if LONG_MAX readers concurrently acquire read access.
This is fine. So, in my example I showed 3 readers, this takes m_count
to -3. Try it with LONG_MAX readers. Now, this is critical: If _more_
than LONG_MAX readers hit my algorithm at the same time, then integer
overflow and underflow _will_ occur. So, as long as the number of
readers is at most LONG_MAX, and never exceeds this amount, there is no UB.

Basically, the math in my algorithm requires that the number of reader
threads never exceeds LONG_MAX. Btw, have you seen a computer that had
more than LONG_MAX threads running? Perhaps 64-bit LONG_MAX?

Chris M. Thomasson

Feb 18, 2019, 5:44:40 PM
On 2/18/2019 2:31 PM, Chris M. Thomasson wrote:
> On 2/18/2019 2:22 PM, james...@alumni.caltech.edu wrote:
>> On Monday, February 18, 2019 at 4:59:29 PM UTC-5, Chris M. Thomasson
>> wrote:
>>> On 2/18/2019 6:34 AM, Ben Bacarisse wrote:
>>>> "Chris M. Thomasson" <invalid_chris_t...@invalid.com>
>>>> writes:
[...]
> The only way my algorithm can get into a condition where m_count can
> equal -LONG_MAX is if LONG_MAX readers concurrently acquire read access.
> This is fine. So, in my example I showed 3 readers, this takes m_count
> to -3.

It goes to -3 when a writer subtracts LONG_MAX from m_count. This is in
regard to the boiled down code:
_________________________
m_count = LONG_MAX

fetch_add(m_count, -3) // three readers

m_count = LONG_MAX - 3

count = fetch_add(m_count, -LONG_MAX); // a single writer

count = LONG_MAX - 3

m_count = -3


readers = LONG_MAX - count = 3 = perfect!
_________________________


okay, now take this into account wrt LONG_MAX readers...
_________________________
m_count = LONG_MAX

fetch_add(m_count, -LONG_MAX) // LONG_MAX readers

m_count = LONG_MAX - LONG_MAX = 0 // Okay, we are full of readers!

count = fetch_add(m_count, -LONG_MAX); // a single writer

count = 0

m_count = -LONG_MAX

readers = LONG_MAX - count = LONG_MAX = Nice! :^D
_________________________


Only a single writer is allowed to perform the:

count = fetch_add(m_count, -LONG_MAX);

mutation.

james...@alumni.caltech.edu

Feb 18, 2019, 5:45:44 PM
On Monday, February 18, 2019 at 5:12:52 PM UTC-5, Chris M. Thomasson wrote:
...
> It has to come from confusion about fetch_add on Reddit. Some apparently
> think that count can be negative. It will never be negative. So, I
> boiled it down to a case that shows what happens when a write access
> notices a batch of n readers. It simulates how the writer obtains n in
> order to properly wait for the n readers.
> ________________________
> #include <iostream>
> #include <climits>
>
>
> long fetch_add(long& gcount, long addend)

This is a non-member function.

> if (m_count.fetch_add(-1, std::memory_order_acquire) < 1)

This is a member function, whose definition you have not provided.
Presumably they're related, but the member function you haven't shown us
takes one argument with no analogue in the non-member function you did
show us. Once again, it would be much easier to evaluate the issues if
you would show all of the relevant code in an example that's compileable.

I have not looked at your code in any detail - seeing comments that
make it clear that the code being commented on is significantly
different from the code that has been presented makes me unwilling to
invest much time on the code that has been presented. However, a quick
scan lead me to the following questions:

You said that m_count can have any value in the range -LONG_MAX to
LONG_MAX. Under what circumstances would m_count be LONG_MAX? What
happens if the following code is executed at a time when that is the
case?

> if (m_count.fetch_add(1, std::memory_order_release) < 0)

You said that m_count can have any value in the range -LONG_MAX to
LONG_MAX, inclusive. Under what circumstances would m_count be negative?
What happens if the following code is executed at a time when that is
the case?

> long count = m_count.fetch_add(-RWMUTEX_COUNT_MAX,

You said that m_count can have any value in the range -LONG_MAX to
LONG_MAX, inclusive. Under what circumstances would m_count be positive?
What happens if the following code is executed at a time when that is
the case?

> long count = m_count.fetch_add(RWMUTEX_COUNT_MAX,

Since I haven't bothered examining your code in detail, there might very
well be good reasons why the cases I've just mentioned above will never
actually come up. However, your response to the person who asked the
very same question on Reddit (in less detail) just denigrated his
understanding without bothering to explain what those reasons are.

Chris M. Thomasson

Feb 18, 2019, 5:50:11 PM
m_count is initialized to LONG_MAX in the constructor of ct_rwmutex. Any
time that ct_rwmutex::m_count equals LONG_MAX means that there are no
readers or writers, or activity whatsoever.

________________
#define RWMUTEX_COUNT_MAX LONG_MAX

struct ct_rwmutex
{
    // shared state
    std::atomic<long> m_wrstate;
    std::atomic<long> m_count;
    std::atomic<long> m_rdwake;

    ct_slow_semaphore m_rdwset;
    ct_slow_semaphore m_wrwset;
    ct_fast_mutex m_wrlock;


    ct_rwmutex() :
        m_wrstate(1),
        m_count(RWMUTEX_COUNT_MAX),
        m_rdwake(0),
        m_rdwset(0),
        m_wrwset(0) {
    }
[...]
________________

James Kuyper

Feb 18, 2019, 6:07:00 PM
On 2/18/19 17:50, Chris M. Thomasson wrote:
> On 2/18/2019 2:45 PM, james...@alumni.caltech.edu wrote:
...
>> You said that m_count can have any value in the range -LONG_MAX to
>> LONG_MAX, inclusive. Under what circumstances would m_count be positive?
>
> m_count is initialized to LONG_MAX in the constructor of ct_rwmutex. Any
> time that ct_rwmutex::m_count equals LONG_MAX means that there are no
> readers or writers, or activity whatsoever.

So a call to unlock_shared() when there are no readers, writers, or
activity, produces signed overflow.

I asked about three different cases. You responded to my second case
with an answer to my first case, and didn't respond to either of the
other two cases.

Chris M. Thomasson

Feb 18, 2019, 6:16:36 PM
On 2/18/2019 3:06 PM, James Kuyper wrote:
> On 2/18/19 17:50, Chris M. Thomasson wrote:
>> On 2/18/2019 2:45 PM, james...@alumni.caltech.edu wrote:
> ...
>>> You said that m_count can have any value in the range -LONG_MAX to
>>> LONG_MAX, inclusive. Under what circumstances would m_count be positive?
>>
>> m_count is initialized to LONG_MAX in the constructor of ct_rwmutex. Any
>> time that ct_rwmutex::m_count equals LONG_MAX means that there are no
>> readers or writers, or activity whatsoever.
>
> So a call to unlock_shared() when there are no readers, writers, or
> activity, produces signed overflow.

Why in the world would a thread call unlock_shared() when it did not
call lock_shared() first? Your question makes no sense to me. The only
time you call unlock_shared() is to unlock read access that was
previously acquired with lock_shared():
__________________
lock_shared()
// read activity
unlock_shared()


lock()
// write activity
unlock()
__________________

Think about it for a moment.

>
> I asked about three different cases. Your responded to my second case
> with an answer to my first case, and didn't respond to either of the
> other two cases.
>

Working on another benchmark right now. A little time constrained. It
will have some comments.

Chris M. Thomasson

Feb 18, 2019, 6:40:35 PM
On 2/18/2019 2:45 PM, james...@alumni.caltech.edu wrote:
That would be unlocking a reader. Without contention from a writer this is:
______________________
m_count = LONG_MAX

// reader

lock_shared():
fetch_add(m_count, -1);
m_count = LONG_MAX - 1


// read access


unlock_shared():
fetch_add(m_count, 1);
m_count = LONG_MAX
______________________


You snipped way too much of my code.



>
> You said that m_count can have any value in the range -LONG_MAX to
> LONG_MAX, inclusive. Under what circumstances would m_count be negative?

When there is a writer.


> What happens if the following code is executed at a time when that is
> the case?
>
>> long count = m_count.fetch_add(-RWMUTEX_COUNT_MAX,

This gains writer access and can only be executed by a single thread.
You just snipped the part where we gain a writer lock. Why did you snip
so much context? Only a single thread can decrement m_count by LONG_MAX.
Look:
_________________
// WRITE, more hefty
void lock()
{
    m_wrlock.lock();

    long count = m_count.fetch_add(-RWMUTEX_COUNT_MAX,
        std::memory_order_acquire);

    if (count < RWMUTEX_COUNT_MAX)
    {
        long rdwake = m_rdwake.fetch_add(RWMUTEX_COUNT_MAX - count,
            std::memory_order_acquire);

        if (rdwake + RWMUTEX_COUNT_MAX - count)
        {
            m_wrwset.dec();
        }
    }
}

// write unlock
void unlock()
{
    long count = m_count.fetch_add(RWMUTEX_COUNT_MAX,
        std::memory_order_release);

    if (count < 0)
    {
        m_rdwset.add(-count);
    }

    m_wrlock.unlock();
}
_________________

lock()
// write
unlock()



>
> You said that m_count can have any value in the range -LONG_MAX to
> LONG_MAX, inclusive. Under what circumstances would m_count be positive?

m_count is initialized to LONG_MAX. It means no activity.


> What happens if the following code is executed at a time when that is
> the case?
>
>> long count = m_count.fetch_add(RWMUTEX_COUNT_MAX,

This can only be executed when we unlock write access, this can only
occur when the calling thread has previously locked for write access.

Remember:

lock()
// write
unlock()



>
> Since I haven't bothered examining your code in detail; there might very
> well be good reasons why the cases I've just mentioned above will never
> actually come up. However, your response to the person who asked the
> very same question on Reddit (in less detail) just denigrated his
> understanding without bother to explain what those reasons are.
>

You should take a deeper look.

Paavo Helde

Feb 19, 2019, 1:39:48 AM
Not necessarily better, as I do not see much value in the wrap-around
behavior in general, but 'unsigned' already exists and its wrap-around
behavior is guaranteed by the standards.

What Bonita asked for was a signed int with defined wrap-around
behavior. But what would be the use of it? What's so special about the
number -2147483648? Not to speak about the complications arising in
non-two's-complement implementations, hypothetical or not.

For unsigned types at least the wrap-over appears at number 0, which is
pretty special, the binary representation is fixed, simple and maps well
to the behavior of existing hardware bit registers. So it makes a bit
more sense.

But TBH, in my mind the unsigned wrap-around is also mostly a misfeature,
just hiding some bugs and allowing some too-clever folks to cut some
corners. In a language like C++ it should be a library feature (and of
course also allow for ranges other than 0..256^N).
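A minimal sketch of what such a library type might look like; the name
wrapping_uint and its interface are invented here, not an existing proposal:
____________________________
#include <cstdint>

// Holds a value in [0, Modulus); all arithmetic is done in an unsigned
// type, so there is no signed overflow anywhere.
template <std::uint32_t Modulus>
struct wrapping_uint
{
    std::uint32_t value = 0;

    wrapping_uint& operator+=(std::uint32_t rhs)
    {
        value = static_cast<std::uint32_t>(
            (static_cast<std::uint64_t>(value) + rhs) % Modulus);
        return *this;
    }
};
____________________________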

David Brown

Feb 19, 2019, 3:01:32 AM
On 19/02/2019 07:39, Paavo Helde wrote:
> On 18.02.2019 22:23, James Kuyper wrote:
>> On 2/18/19 13:11, Bonita Montero wrote:
>>>>> Is it possible to mark certain variables to excluded from this
>>>>> behaviour?
>>>>> And is it possible to direct the compiler to assume wrap-around
>>>>> for individual variables and to exclude according optimizations
>>>>> with optimized code?
>>>
>>>> Sure, there is a special keyword for that: unsigned. ...
>>>
>>> No, that's not that what I imagined. I imagined something that
>>> marks a variable to be excluded from the default-behaviour set
>>> by the compiler-flag.
>>
>> I've no doubt that he understood exactly what you were asking for. His
>> point is that this solution is better than the one you were asking for.
>
> Not necessarily better as I do not see much value in the wrap-around
> behavior in general, but 'unsigned' is already existing and its
> wrap-around behavior is guaranteed by the standards.
>
> What Bonita asked for was a signed int with defined wrap-around
> behavior.

Actually, I am not sure that is what he asked for. I think he asked for
a way to turn off checking, not necessarily to define the behaviour.
Disabling the sanitizer for a particular function is the best I could offer.

> But what would be the use of it? What's so special about the
> number -2147483648? Not to speak about the complications arising in
> non-two's-complement implementations, hypothetical or not.

That is, IMHO, a key point in all this. People often say they want
wrapping signed integers because they don't want undefined behaviour -
but wrapping behaviour is almost invariably /wrong/. The underlying
hardware does not have wrapping signed overflow because two's complement
wrapping is desirable - it has it because it is cheap, simple, is easy
to extend (think of doing 32-bit arithmetic on 8-bit processors), and
re-uses the same hardware logic as wrapping unsigned overflow. Some
languages, like java and "gcc -fwrapv" C, make signed overflow defined
behaviour with wrapping - that just hides problems and means the
compiler can't be of any help with warnings or run-time checks that
might have helped you find your bugs.

And having a defined behaviour limits the optimiser - it limits how
expressions can be re-arranged and simplified.

>
> For unsigned types at least the wrap-over appears at number 0, which is
> pretty special, the binary representation is fixed, simple and maps well
> to the behavior of existing hardware bit registers. So it makes a bit
> more sense.

Unsigned overflow wrapping is usually a mistake too, but it is perhaps a
little more useful than with signed. And it is useful to have /some/
way to get wrapping overflow for the rare occasions when it is needed.

>
> But TBH, in my mind also the unsigned wrap-around is mostly a misfeature
> just hiding some bugs and allowing some too clever folks to cut some
> corners. In a language like C++ it should be a library feature (and of
> course also allow for other ranges than 0..256^N).
>

Agreed.

Surely it shouldn't be too hard to make a standardised library for
integers with a selection of traits that could be chosen, such as sizes
and behaviour on overflow (undefined behaviour, unspecified result,
wrapping, saturation, exception, errno, fatal error, etc.)
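A rough sketch of the idea (the names on_overflow and checked_int are
invented here, and only addition is shown):
____________________________
#include <climits>
#include <stdexcept>

enum class on_overflow { saturate, trap };

// An int whose overflow behaviour is selected by a policy parameter.
template <on_overflow Policy>
struct checked_int
{
    int value;

    checked_int operator+(checked_int rhs) const
    {
        // do the arithmetic in a wider type, then check the result
        long long wide = static_cast<long long>(value) + rhs.value;

        if (wide > INT_MAX || wide < INT_MIN)
        {
            if (Policy == on_overflow::saturate)
                return { wide > INT_MAX ? INT_MAX : INT_MIN };

            throw std::overflow_error("checked_int overflow");
        }

        return { static_cast<int>(wide) };
    }
};
____________________________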

Bart

Feb 19, 2019, 6:03:31 AM
On 19/02/2019 08:01, David Brown wrote:
> On 19/02/2019 07:39, Paavo Helde wrote:

>> But what would be the use of it? What's so special about the
>> number -2147483648? Not to speak about the complications arising in
>> non-two's-complement implementations, hypothetical or not.

>> For unsigned types at least the wrap-over appears at number 0, which is
>> pretty special, the binary representation is fixed, simple and maps well
>> to the behavior of existing hardware bit registers. So it makes a bit
>> more sense.
>
> Unsigned overflow wrapping is usually a mistake too, but it is perhaps a
> little more useful than with signed.

That's the question that should be asked: what's so special about
unsigned over signed that overflow is tolerated with one but not the other.

People might use unsigned for various reasons, and probably not often
because they actually want modular arithmetic. I might use it to get a
more useful numeric range than the same-width signed type.

There are few things a language can do, but what is really undesirable
is the compiler (it tends not to be the language) deciding that the
programmer could never have been so crass as to deliberately allow
signed overflow, so it assumes it can't happen and takes the opportunity
to do something entirely unexpected and unintuitive.

More importantly, especially with C compilers (as there are necessarily
more of these around), to exhibit behaviour which is at odds with most
other compilers. Or sometimes even the same one, if that behaviour is
only turned on with optimisation.


David Brown

Feb 19, 2019, 7:06:09 AM
On 19/02/2019 12:03, Bart wrote:
> On 19/02/2019 08:01, David Brown wrote:
>> On 19/02/2019 07:39, Paavo Helde wrote:
>
>>> But what would be the use of it? What's so special about the
>>> number -2147483648? Not to speak about the complications arising in
>>> non-two's-complement implementations, hypothetical or not.
>
>>> For unsigned types at least the wrap-over appears at number 0, which is
>>> pretty special, the binary representation is fixed, simple and maps well
>>> to the behavior of existing hardware bit registers. So it makes a bit
>>> more sense.
>>
>> Unsigned overflow wrapping is usually a mistake too, but it is perhaps a
>> little more useful than with signed.
>
> That's the question that should be asked: what's so special about
> unsigned over signed that overflow is tolerated with one but that the
> other.

I agree. I suppose the answer might be that wrapping is occasionally
useful, and it can be easily supported for unsigned types in C (unlike
for signed types, where it could at most be implementation dependent),
so it is then reasonable to support wrapping on unsigned types.

Personally, I'd prefer that usually unsigned overflow was also undefined
- as my unsigned arithmetic seldom needs to wrap, that would let me get
the advantages of undefined behaviour in optimisation.

>
> People might use unsigned for various reasons, and probably not often
> that they actually want modular arithmetic. I might use it to get a more
> useful numeric range than the same width signed.
>
> There are few things a language can do, but what is really undesirable
> is the compiler (it tends not to be the language), deciding that the
> programmer could never have been so crass as to deliberately allow
> signed overflow, so it assumes it can't happen and takes the opportunity
> to do something entirely unexpected and unintuitive.

It is the language in this case. C assumes the programmer writes
correct code. /Compilers/ may choose not to make that assumption in
some cases.

>
> More importantly, especially with C compilers (as there are necessarily
> more of these around), to exhibit behaviour which is at odds with most
> other compilers. Or sometimes even the same one, if that behaviour is
> only turned on with optimisation.
>

Of course C compilers exhibit behaviour that is different from other
compilers - do you think they should work like Pascal or Fortran
compilers? It is a different language.

And of course they exhibit different behaviour with different flags.
But if your code is accurate, then the results will be the same
regardless of optimisation (barring compiler bugs, obviously).

If your code fails when optimising, due to signed integer overflow
issues, then it will also fail when not optimising - unless you
seriously misunderstand the language (in which case there is little hope
for any compiler to guess what you might have meant). The symptoms
might be different, but the results will be wrong.

Manfred

Feb 19, 2019, 7:55:00 AM
On 2/19/2019 12:03 PM, Bart wrote:
>> Unsigned overflow wrapping is usually a mistake too, but it is perhaps a
>> little more useful than with signed.
>
> That's the question that should be asked: what's so special about
> unsigned over signed that overflow is tolerated with one but that the other.

Trivial observation, but it answers the question: in unsigned types
the top bit is a value (digit) bit; in signed types it is a sign bit.
This makes the unsigned type suitable for bit-oriented arithmetic, which
is obviously not the same as usual arithmetic, and in which wrapping from
0xFFFFFFFF to 0x00000000 can make sense.

A good example is given in:

http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1969.htm

under the section "Testing Return Values For Errors"
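Another small sketch (not taken from that paper) of where the modulo
behaviour is genuinely what you want: tick or sequence-number deltas keep
working across the wrap point.
____________________________
#include <cstdint>

// Well-defined modulo-2^32 subtraction: the delta is correct even if
// 'now' has wrapped past zero since 'start' was taken.
std::uint32_t elapsed_ticks(std::uint32_t start, std::uint32_t now)
{
    return now - start;
}
____________________________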

Ralf Goertz

Feb 19, 2019, 9:14:49 AM
Am Tue, 19 Feb 2019 13:54:49 +0100
schrieb Manfred <non...@add.invalid>:

> On 2/19/2019 12:03 PM, Bart wrote:
> >> Unsigned overflow wrapping is usually a mistake too, but it is
> >> perhaps a little more useful than with signed.
> >
> > That's the question that should be asked: what's so special about
> > unsigned over signed that overflow is tolerated with one but that
> > the other.
>
> Trivial observation, but that answers the question: in unsigned types
> the first bit is a digit bit. In signed types it is a sign bit.
> This makes the unsigned type suitable for bit-oriented arithmetic,
> that is obviously not the same as usual arithmetic, in which wrapping
> from 0xFFFFFFFF to 0x00000000 can make sense.

Does that really matter? If I have a (signed) short set to its maximum
possible value (32767 here) and then add 1, I get -32768. Modulo 65536 (the
same modulus that unsigned short has) this is the same as 32768. So with
signed you get just another representation of the integers modulo 2^n,
where n is the number of bits. Some mathematicians prefer one, others the
other. No big deal. No?
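A quick sketch of that correspondence (assuming a 16-bit short and two's
complement):
____________________________
#include <cstdint>
#include <iostream>

int main()
{
    std::int16_t wrapped = -32768;  // the value the example above ends up with

    // Conversion to unsigned is always defined: it takes the value modulo 65536.
    std::uint16_t modular = static_cast<std::uint16_t>(wrapped);

    std::cout << modular << "\n";   // 32768: the same residue class modulo 2^16
    return 0;
}
____________________________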

Bart

Feb 19, 2019, 10:01:57 AM
On 19/02/2019 12:54, Manfred wrote:
> On 2/19/2019 12:03 PM, Bart wrote:
>>> Unsigned overflow wrapping is usually a mistake too, but it is perhaps a
>>> little more useful than with signed.
>>
>> That's the question that should be asked: what's so special about
>> unsigned over signed that overflow is tolerated with one but that the
>> other.
>
> Trivial observation, but that answers the question: in unsigned types
> the first bit is a digit bit. In signed types it is a sign bit.

It is open to interpretation:

    unsigned int a=2000000000,b=1000000000;
      signed int c=2000000000,d=1000000000;
      a+b;
      c+d;

Both of these operations generate the same x86 'add' operation for
example. How is it treating the top bit? It doesn't matter. The x86 will
set flags both for an unsigned add, and a signed one. The resulting bit
pattern is the same in both cases.

C however (and presumably C++ inherits this) says that the result is
undefined for the second 'add' operation, even though it is always
well-defined for x86.

But take this case:

unsigned int a=5, b=2;
a-b;
b-a;

C says that both of these are well-defined, even though a-b gives you
the expected 3, while b-a gives you 4294967293.

No signed arithmetic is going on. But if you interpret the 4294967293
result as two's complement, with the top bit as a sign bit, then you
will get -3 if printed.

Similarly, the c+d in the first example gives -1294967296, but
3000000000 if the same bit pattern is interpreted as an unsigned value.

The two representations are closely related, in that the corresponding
bit patterns are interchangeable, but C refuses to acknowledge that.

That might be because two's complement representation is not universal,
but it means some C compilers end up doing unexpected things even on
the 99.99% of machines which do use it.
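
Here is that second example as a complete program. Both subtractions are
fully defined; only the reinterpretation at the end leans on two's
complement (assuming the usual 32-bit unsigned int, and noting that the
unsigned-to-signed cast is implementation-defined before C++20):

#include <cstdint>
#include <cstdio>

int main()
{
    unsigned int a = 5, b = 2;
    unsigned int x = a - b;   // 3
    unsigned int y = b - a;   // wraps modulo 2^32: 4294967293
    std::printf("a-b = %u, b-a = %u\n", x, y);

    // The same bit pattern read as a 32-bit two's complement integer:
    std::printf("b-a as int32_t = %d\n", static_cast<std::int32_t>(y));
}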

Bart

unread,
Feb 19, 2019, 10:11:33 AM2/19/19
to
On 19/02/2019 12:05, David Brown wrote:
> On 19/02/2019 12:03, Bart wrote:

>> More importantly, especially with C compilers (as there are necessarily
>> more of these around), to exhibit behaviour which is at odds with most
>> other compilers. Or sometimes even the same one, if that behaviour is
>> only turned on with optimisation.
>>
>
> Of course C compilers exhibit behaviour that is different from other
> compilers - do you think they should work like Pascal or Fortan
> compilers? It is a different language.

No, I meant at odds even with other C compilers.

Manfred

unread,
Feb 19, 2019, 10:35:31 AM2/19/19
to
On 2/19/2019 4:01 PM, Bart wrote:
> On 19/02/2019 12:54, Manfred wrote:
>> On 2/19/2019 12:03 PM, Bart wrote:
>>>> Unsigned overflow wrapping is usually a mistake too, but it is
>>>> perhaps a
>>>> little more useful than with signed.
>>>
>>> That's the question that should be asked: what's so special about
>>> unsigned over signed that overflow is tolerated with one but not the
>>> other.
>>
>> Trivial observation, but that answers the question: in unsigned types
>> the first bit is a digit bit. In signed types it is a sign bit.
>
> It is open to interpretation:
>
>     unsigned int a=2000000000,b=1000000000;
>       signed int c=2000000000,d=1000000000;
>       a+b;
>       c+d;
>
> Both of these operations generate the same x86 'add' operation for
> example. How is it treating the top bit? It doesn't matter. The x86 will
> set flags both for an unsigned add, and a signed one. The resulting bit
> pattern is the same in both cases.

I know, but in the case of 'signed' It Happens to Work™, under the
assumption of an underlying two's complement representation.
In the case of 'unsigned' It Just Works™, guaranteed by the standard and
with no assumptions required.
When dealing with any kind of workable logic (math, algorithms, etc.)
the possibility of dropping an assumption is an added value.

>
> C however (and presumably C++ inherits) says that the result is
> undefined for the second 'add' operation, even it is always well-defined
> for x86.

You know that C as a language and x86 as an architecture are two
different things.

>
> But take this case:
>
>     unsigned int a=5, b=2;
>     a-b;
>     b-a;
>
> C says that both of these are well-defined, even though a-b gives you
> the expected 3, while b-a gives you 4294967293.
>
> No signed arithmetic is going on. But if you interpret the 4294967293
> result as two's complement, with the top bit as a sign bit, then you
> will get -3 if printed.

This too is based on the assumption of two's complement.

>
> Similarly, the c+d in the first example gives -1294967296, but
> 3000000000 if the same bit pattern is interpreted as an unsigned value.

As a confirmation of earlier arguments, here -1294967296 as a result
falls into the category of 'rubbish', 3000000000 does not.

>
> The two representations are closely related, in that the corresponding
> bit patterns are interchangeable, but C refuses to acknowledge that.

The two representations are interchangeable under the assumption of
two's complement. C has made the choice of lifting this constraint.

>
> That might be because two's complement representation is not universal,
> but it means some C compiler ending up doing unexpected things even on
> the 99.99% of machines which do use it.

I wouldn't look at it this way. I believe the rationale is that
wrapping overflow has negligible use for signed types, while it has
clear value for unsigned ones (see earlier example).

More than the compiler doing unexpected things, it is that C requires
the programmer to pay attention to details.
It is not distracted-friendly.

Öö Tiib

unread,
Feb 19, 2019, 11:12:27 AM2/19/19
to
The hardware vendors are major employers of those compiler writers.

> As for "what employers say", then yes, in /some/ cases that is what
> compiler writers focus on. But you'll find that for a number of
> important compilers - including the ones most targeted by such "they
> only work on benchmark" claims - the people writing them are widely
> dispersed with totally different kinds of employers. In particular, the
> gcc developers fall into several categories:

Note that "they only work on benchmark" is not a quote of mine. I can't
defend positions of straw-men that you build. ;-) Will you tell me next
that I wrote somewhere that "I do not care about performance"?

> 1. Those working for chip manufacturers - Intel, AMD, ARM, etc. These
> don't care if you use gcc or anything else, as long as you buy their
> chips, so their concern is that you (the programmer) get the best from
> the compiler and their chip. Benchmarks for the compiler don't matter -
> support for the chip matters.

Are the marketing statements not mostly about the performance of those
chips? Are the likes of "Meltdown and Spectre" not the result of
overeager optimizing of those chips? And even when it seems so, is none
of it measured with benchmarks compiled by those compilers?

> 2. Those working for software companies like Red Hat, IBM, etc., that
> provide tools and services to developers. They want programmers to be
> happy with the tools - they don't care if you use a different compiler
> instead.

The programmers they target are not overly concerned with performance?
My impression is that programmers start to post about performance before
figuring out how to turn the optimizations on.

> 3. Those working for big users, like Google and Facebook. They don't
> care about benchmarks - they care about performance on their own software.

They do not use compilers to build their software and so do not care about
compiler optimizations?

> 4. The independent and volunteer developers. They care about the
> quality of their code, and making something worthwhile - they don't care
> about benchmark performances.

Again there are developers who don't care about performance?

> I'm sure there are other categories that you can think of. I can't see
> any significant number being benchmark oriented. People don't choose
> compilers because of their benchmarks - they choose for features, target
> support, static checking, language support, compatibility with existing
> source code, etc. They expect a gradual progress towards faster code
> with newer versions, but not more than that. And those that pick a
> compiler for its speed, do so based on the speed for their own source
> code, not for some benchmark.

However it seems that there are only a few weirdos like me who think that
it does not matter how fast wrong answers are calculated, and who consider
it better when those wrong answers are not calculated at all.

> Like all conspiracy theories, the best way to test it is to follow the
> money. Who would profit from making compilers focused on benchmark
> performance as the main goal, with a disregard for support for existing
> C or C++ sources?

What conspiracy theory? Where did I say that they disregard support for
existing source code? If we follow the money, then Google, Apple, Microsoft
and Oracle have tons of their own C and C++ source code that they want to
be performant and don't want to break, but they want developers to use
Go, Swift, C# or Java. So indeed they might want to "extend" crappy
"features" and "optimizations" into C++ that they won't ever use with
their own code.

> > Both hardware makers and compiler writers kept optimizing until it
> > was "overly optimal".
> >
>
> That again is simply incorrect.
>
> Developers - hardware or software - can make mistakes, and release a
> design which later turns out to have unforeseen flaws. With software,
> you can often find these early and fix them, but sometimes the flaws are
> discovered quite late. Hardware flaws are harder to fix - but very easy
> for amateurs to condemn once they are found.

The cache, branch prediction and speculative execution are performance
optimizations piled together. It can be tricky to get such a pile
correct, and if correctness is prioritized below performance then defects
slip through. The same things happen with compiler optimizations.

> >> Accusations that compiler writers are only interested in maximum
> >> benchmark speeds are absurd. gcc, for example, tests compilation on the
> >> entire Debian repository - vast quantities of code, much of which is
> >> ancient.
> >
> > Where I wrote "only"? Sure, they have to care of other things like to
> > keep backwards compatibility. Also if code of popular benchmark
> > contains UB then they have to avoid "optimizing" it, to keep the result
> > "correct".
>
> They sometimes have to make such adjustments, yes. Often that is
> because they realise that not only do the benchmark writers make such
> mistakes, but others do too - and that it can be helpful to treat such
> code in the manner the programmer appeared to expect. But for most
> undefined behaviour, it is hard or impossible to guess what the
> programmer expected - that is the nature of undefined behaviour.

What is so controversial about the behavior that the programmer expects
in the case of -ftrapv? Fortunately it has been obvious that I want
division-by-zero to trap (even on MIPS and ARM, without special
compiler options), but that might also change out of the blue when
a way to "optimize" it is discovered, and then we will need to
add some -fplease-dont-remove-divide-by-zero-trap, I suspect.

> >> What compiler writers do not do, however, is limit their handling of
> >> well-written code because some people write poor code. Instead, they
> >> provide flags to support those that have code that relies on particular
> >> handling of undefined behaviours, or code that "worked fine on my old
> >> compiler" - flags like "-fwrapv" and "-fno-strict-aliasing". Then
>> people who know how integer arithmetic works in C and C++ can get
> >> faster code, and people who can't get it right (or who have to use
> >> broken code from others) have an escape route.
> >>
> >>>
> >>> Without -ftrapv or -fwrapv all bets are off and one can get full set of
> >>> nasal demons on signed integer overflow on those compilers.
> >>>
> >>
> >> Of course all bets are off if you have signed integer overflow - this
> >> should well known to anyone who has learned C or C++ programming. It is
> >> not the compiler that launches nasal daemons - it is the programmer,
> >> when they write code that does not make sense in the language.
> >>
> >> "-ftrapv", by the way, has been considered a poor and unreliable
> >> solution for a good many years - "-fsanitize=signed-integer-overflow" is
> >> what you want.
> >
> > Why did it become "poor" and "unreliable"? What was the reasoning?
> > Is it OK to release software with "poor" and "unreliable" features?
>
> Many bugs have been found in the "-ftrapv" implementation - and in
> particular, it does not trap in all cases. Personally, I think the flag
> should be dropped in favour of the sanitizer, which is a more modern and
> flexible alternative and which is actively maintained.

Sanitizers sound like debugging options. Why are two almost equal
features developed in the same tool? With such logic, one day
-fsanitize=signed-integer-overflow will also become "poor" and
"unreliable", and then some third feature will be the "correct" way to
make programs crash on signed integer overflow. With feature creep,
after a while nothing is reliable.

> > When incorrect answer is worse than no answer then normal people
> > want defects to crash in release too.
>
> "-ftrapv" code has always been slower than non-trapping code. People
> usually aim to write correct code, and have that correct code run as
> fast as reasonably possible. If you want software that is full of
> run-time checks, you don't program in C or C++.

See? We have modern, branch-predicting and eagerly executing hardware
with megabytes of cache, yet the reasonableness of using -ftrapv is
questionable. Sure, omnipotent programmers may not need it and may
dismiss it without even measuring what it costs, but I am fallible and
have never met anyone almighty. How is it so self-evident that -ftrapv
is an unreasonable option? Correctness of behavior is more important
than performance of behavior, and incorrect behavior is often worse
than no behavior whatsoever.

> In C and C++, you can always manually add any checks you want. With
> C++, you can make your own types that do checking in the manner that
> suits your needs.

Why? I agree that signed integer overflow is a programming error, and
so if the integer calculations happen to be important then -ftrapv
handles it correctly (when it happens to work).
Also, how can I be sure that they don't "optimize" my manual checks
away? After a while it can be like with null pointers: one optimization
removes the dereference, another removes the null pointer check (since
it came "after" the dereference), and the result is that neither does
the dereference crash nor does the check work.

Manfred

unread,
Feb 19, 2019, 11:43:31 AM2/19/19
to
On 2/18/2019 7:56 PM, David Brown wrote:
> As for "what employers say", then yes, in /some/ cases that is what
> compiler writers focus on.  But you'll find that for a number of
> important compilers - including the ones most targeted by such "they
> only work on benchmark" claims - the people writing them are widely
> dispersed with totally different kinds of employers.  In particular, the
> gcc developers fall into several categories:

I believe you about the categories below, but I find it hard to believe
that benchmarks don't matter.
In fact whenever results from different compilers are presented,
performance is one of the key metrics that is discussed, which is
obvious since correctness of the generated binary is no issue for any
self-respecting compiler.

Moreover, most occurrences of "undefined behavior" are usually justified
by increased performance of compiler optimizations.

>
> 1. Those working for chip manufacturers - Intel, AMD, ARM, etc.  These
> don't care if you use gcc or anything else, as long as you buy their
> chips, so their concern is that you (the programmer) get the best from
> the compiler and their chip.  Benchmarks for the compiler don't matter -
> support for the chip matters.

But they care that the chip is properly supported by the compiler, and
"properly" includes performance.

>
> 2. Those working for software companies like Red Hat, IBM, etc., that
> provide tools and services to developers.  They want programmers to be
> happy with the tools - they don't care if you use a different compiler
> instead.

You may have a point in this category. I would say that more than making
/programmers/ happy, they want to make software project /managers/ happy,
which brings into the picture features like time-to-market, man-hour
costs etc., all goals that come before the performance of the final product.
This is IMHO one reason why in many benchmarks MSVC usually comes
after clang and gcc (in that order).

>
> 3. Those working for big users, like Google and Facebook.  They don't
> care about benchmarks - they care about performance on their own software.

True, but they know that in order to get performance out of their
application software, they need well-performing compilers.
I believe clang developers know this very well.

>
> 4. The independent and volunteer developers.  They care about the
> quality of their code, and making something worthwhile - they don't care
> about benchmark performances.

Not really. Performance may not be their first goal (compared to e.g.
being the first to support the latest technology for the open source
community), but sure it is one of the goals, especially after first
introducing such new technologies.

james...@alumni.caltech.edu

unread,
Feb 19, 2019, 1:04:45 PM2/19/19
to
On Monday, February 18, 2019 at 6:40:35 PM UTC-5, Chris M. Thomasson wrote:
> On 2/18/2019 2:45 PM, james...@alumni.caltech.edu wrote:
...
> > I have not looked at your code in any detail - seeing comments that
> > makes it clear that the code being commented on is significantly
> > different from the code that has been presented makes me unwilling to
> > invest much time on the code that has been presented. However, a quick
> > scan lead me to the following questions:
...
> You snipped way to much of my code.

Your reddit critic quoted text, presumably from your code that he was
commenting about, that doesn't match any of the code you've posted on
this thread. The topic of this thread being the validity of his comments,
I've no particular interest in diving deeply into code that isn't the
code he was commenting about. I'm certainly not interested in your code
for its own sake. I just scanned your code, made a few comments, and
cut everything not directly relevant to the comments I made.

Note - another indication that he was commenting on different code than
I was looking at is his comment that m_count is never decremented. The
code you presented does decrement m_count. If the code he was commenting
on also had a decrement in it, then when he made that comment your best
response would have been to point out where that decrement was. Why
didn't you respond in that fashion? If it didn't have a decrement, then
one of the dangerous cases is not when you have LONG_MAX simultaneous
lockers, but whenever you've had a total of LONG_MAX lockers, whether or
not they've ever been unlocked. For programs that run 24/7 for long
periods of time (which some do), that could be a significant problem.

I've no idea whether or not this is the case, since I haven't seen the
code he was commenting on.

...
> ... Why did you snip
> so much context?

As explained above.

> > You said that m_count can have any value in the range -LONG_MAX to
> > LONG_MAX, inclusive. Under what circumstances would m_count be positive?
>
> m_count is initialized to LONG_MAX. It means no activity.

For the second time in a row, you responded to a question about m_count
being positive, by addressing only the possibility that m_count is
LONG_MAX. What about values for m_count from 1 to LONG_MAX-1?

The code I saw did nothing to prevent a call to fetch_add() when m_count
happens to have a value that triggers overflow inside fetch_add(). Your
responses seem to all take the form of saying that it never makes sense
to call the relevant functions when m_count has such a value. If that is
true (and for the reasons I've given above, I haven't bothered checking
whether it is), then explaining the correct rules for using your package
so as to avoid signed integer overflow would have been a much better
response than asserting, without explanation, that your critics have
failed to understand your package.

...
> > Since I haven't bothered examining your code in detail; there might very
> > well be good reasons why the cases I've just mentioned above will never
> > actually come up. However, your response to the person who asked the
> > very same question on Reddit (in less detail) just denigrated his
> > understanding without bother to explain what those reasons are.
> >
>
> You should take a deeper look.

Post the code that your critic was commenting on, and I might bother
examining it more carefully.

Thinking about this issue led me to another point. Could you describe
what m_count means? I'm looking for a description that remains accurate
before and after you add LONG_MAX to m_count. Similarly, it should
remain accurate before and after you subtract LONG_MAX from m_count.
To put it in another, more formal way, can you define a class invariant
involving m_count which remains unchanged after calling the member
function that adds LONG_MAX to it? It should also remain unchanged after
calling the member function that subtracts LONG_MAX from it.
I'm not saying that this can't be done, but only that I'm having a hard
time imagining how it could be done. My imagination has often failed
me in similar circumstances.

I might be able to figure this out on my own from close examination of
your code, but for the reasons given above, I'm unwilling to perform
such an examination. In any event, I shouldn't have to - a description
of what m_count means should be part of the documentation for your
design of this class. Not everyone bothers thinking explicitly in terms
of class invariants (I certainly can't claim that I do so on every
occasion), but well-designed classes should have such invariants, even
if the designer wasn't thinking explicitly about them.

Bart

unread,
Feb 19, 2019, 2:58:10 PM2/19/19
to
On 19/02/2019 15:35, Manfred wrote:
> On 2/19/2019 4:01 PM, Bart wrote:

>> Both of these operations generate the same x86 'add' operation for
>> example. How is it treating the top bit? It doesn't matter. The x86
>> will set flags both for an unsigned add, and a signed one. The
>> resulting bit pattern is the same in both cases.
>
> I know, but in the case of 'signed' It Happens to Work™, under the
> assumption of an underlying two's complement representation.
> In the case of 'unsigned' It Just Works™, guaranteed by the standard and
> with no assumptions required.
> When dealing with any kind of workable logic (math, algorithms, etc.)
> the possibility of dropping an assumption is an added value.

I wonder how many programs would stop working if they were recompiled
for a processor that didn't use two's complement (assume everything else
needed to run them is the same).

>>
>> C however (and presumably C++ inherits) says that the result is
>> undefined for the second 'add' operation, even it is always
>> well-defined for x86.
>
> You know that C as a language and x86 as an architecture are two
> different things.
>
>>
>> But take this case:
>>
>>      unsigned int a=5, b=2;
>>      a-b;
>>      b-a;
>>
>> C says that both of these are well-defined, even though a-b gives you
>> the expected 3, while b-a gives you 4294967293.
>>
>> No signed arithmetic is going on. But if you interpret the 4294967293
>> result as two's complement, with the top bit as a sign bit, then you
>> will get -3 if printed.
>
> This too is based on the assumption of two's complement.

I think I said I chose x86 as an example, but it's a VERY common example
of how things actually work.

Do you have any instance of actual machines that don't use two's
complement? I'd imagine they would either be obsolete, be odd
micro-controllers, or be some IBM mainframe.

However, looking at the numeric limits of various other languages for a
32-bit signed int type:

C# -2**31 to +2**31-1
Java -2**31 to +2**31-1
Rust believed to be -2**31 to +2**31-1
Go -2**31 to +2**31-1
D not specified that I could see

Where known, they all assume two's complement, regardless of machine.
And all, including D, seem to have agreed on widths of 8/16/32/64 bits.

C doesn't even stipulate that.

Can you blame anyone for wanting a language to acknowledge something
that is pretty much universal?

> As a confirmation of earlier arguments, here -1294967296 as a result
> falls into the category of 'rubbish', 3000000000 does not.

So just like 1u-2u, but that one apparently is not classed as rubbish.

> I wouldn't look at this this way. I believe the rationale is that
> wrapping overflow has negligible use for signed types, while it has
> clear value for unsigned ones (see earlier example).

After 40 years of experiencing exactly that behaviour on a fair number
of machines and languages, it's what I've come to expect.

Take 8-bit signed to keep the numbers simple: (120+20) will overflow the
upper limit of +127. But what about (120+20)-20?

This should end up back as 120 using two's complement, but according to
C, it's undefined because an intermediate value overflows.

Paavo Helde

unread,
Feb 19, 2019, 3:31:44 PM2/19/19
to
On 19.02.2019 21:57, Bart wrote:
>
> Take 8-bit signed to keep the numbers simple: (120+20) will overflow the
> upper limit of +127. But what about (120+20)-20?
>
> This should end up back as 120 using two's complement, but according to
> C, it's undefined because an intermediate value overflows.

And rightly so. Try e.g. (120+20)/2 - 20, with those imaginary 8-bit
wrap-around rules the result would be -78, not the expected 50.

In C++ 8-bit ints get promoted to normal ints, so the above is not a
real example, but there are actual problems with larger numbers.
Effectively you are saying that some sloppy code is OK because it
accidentally happens to work most of the time, whereas some other very
similar sloppy code is not OK.

By the same token one could claim that writing to a location just past a
dynamic array with an odd number of elements is OK, as the memory
allocator would leave an unused alignment gap at that place anyway.
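
To make the 8-bit arithmetic above concrete, here is a small sketch that
simulates the wrapping with explicit casts, so the sketch itself has no
UB (the narrowing conversion is implementation-defined before C++20 and
modular since C++20):

#include <cstdint>
#include <cstdio>

// Reduce an int to a simulated wrapping 8-bit signed value.
static std::int8_t wrap8(int v) { return static_cast<std::int8_t>(v); }

int main()
{
    std::int8_t a = wrap8(120 + 20);  // 140 wraps to -116
    std::int8_t b = wrap8(a / 2);     // -58
    std::int8_t c = wrap8(b - 20);    // -78, not the "expected" 50
    std::printf("%d %d %d\n", a, b, c);
}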

Jorgen Grahn

unread,
Feb 19, 2019, 3:55:10 PM2/19/19
to
On Tue, 2019-02-19, Öö Tiib wrote:
> On Monday, 18 February 2019 20:56:42 UTC+2, David Brown wrote:
...
>> Many bugs have been found in the "-ftrapv" implementation - and in
>> particular, it does not trap in all cases. Personally, I think the flag
>> should be dropped in favour of the sanitizer, which is a more modern and
>> flexible alternative and which is actively maintained.
>
> Sanitizers sound like debugging options.

But that's what you want, isn't it?

> Why two almost equal features are developed into same tool?

I imagine someone at Google got funding for developing the ideas
behind ASan, did it, and got it into clang and then GCC. This guy
wasn't interested in removing the legacy mechanism, and the GCC
people weren't interested in delaying the work by insisting that it be
harmonized with -ftrapv.

Personally I'm more annoyed by ASan not being well integrated into
GCC: the Google-style user interface with annoying colors, and the
documentation on a wiki somewhere.

Still, I've seen much worse things happen to software.

/Jorgen

--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .

Manfred

unread,
Feb 19, 2019, 3:55:21 PM2/19/19
to
On 2/19/2019 8:57 PM, Bart wrote:
>> As a confirmation of earlier arguments, here -1294967296 as a result
>> falls into the category of 'rubbish', 3000000000 does not.
>
> So just like 1u-2u, but that one apparently is not classed as rubbish.

1u-2u is rubbish in conventional arithmetic, but it is not in wrapping
arithmetic.
Any programmer knows that the result is 0xFFFFFFFF, and this is what C
guarantees.

>
>> I wouldn't look at this this way. I believe the rationale is that
>> wrapping overflow has negligible use for signed types, while it has
>> clear value for unsigned ones (see earlier example).
>
> After 40 years of experiencing exactly that behaviour on a fair number
> of machines and languages, it's what I'm come to expect.

I'll rephrase: C requires, as with most program behaviours, that
wrapping be intentional for the programmer to use.
Since an intentional use of wrapping implies that all bits follow the
same semantics, the unsigned type is the correct choice.

Using wrapping arithmetic with a signed type is not correct - purely
from a binary logic point of view, because that first bit would be
arbitrarily supposed to behave like the others - thus C assumes that a
careful programmer will not use it with signed types, and leaves freedom
to compiler writers to use this feature space however they like, by
specifying undefined behavior.

I would guess that since this has been clearly and intentionally
specified in the standard, at least /some/ use for this 'undefinedness'
must have been foreseen by the committee.

David Brown

unread,
Feb 19, 2019, 4:34:27 PM2/19/19
to
On 19/02/2019 17:43, Manfred wrote:
> On 2/18/2019 7:56 PM, David Brown wrote:
>> As for "what employers say", then yes, in /some/ cases that is what
>> compiler writers focus on.  But you'll find that for a number of
>> important compilers - including the ones most targeted by such "they
>> only work on benchmark" claims - the people writing them are widely
>> dispersed with totally different kinds of employers.  In particular,
>> the gcc developers fall into several categories:
>
> I believe you about the categories below, but I find it hard to believe
> that benchmarks don't matter.

I am not saying that benchmarks don't matter at all - but they are of
less consequence than other factors. Also, the current trend appears to
be moving away from synthetic benchmarks to using real applications.
Look at benchmarks on www.phoronix.com for examples.

> In fact whenever results from different compilers are presented,
> performance is one of the key metrics that is discussed, which is
> obvious since correctness of the generated binary is no issue for any
> self-respecting compiler.

Yes, speed is important - and it can be an easy number to show for
comparisons. But I really don't think it is the biggest factor in
deciding which compiler to use (or which compiler to buy) - people look
at how it supports the target they want, how it supports the languages
and language versions they want, what other tools are available, ease of
use, static error checking, extensions, documentation, support,
compatibility, speed of compilation. Compiler writers - and compiler
sellers - know this.

They also know that a compiler that generates unexpected nonsense from
code that works fine on other compilers will not be popular. Sometimes
they do optimise from undefined behaviour anyway, either because the
gains are significant enough, or because they simply don't think much
code will be adversely affected.

>
> Moreover, most occurrences of "undefined behavior" are usually justified
> by increased performance of compiler optimizations.
>

That is one point, yes. Another is that it can help find mistakes. A
compiler can (with static checks or run-time checks) warn about signed
integer overflow, because it is not allowed in C. In a language that
allows it with wrapping, the tools have to accept it and can't help you
find the problem - despite the fact that it is almost certainly a bug in
the code.

And for many types of undefined behaviour, there simply is no sensible
choice of what would be an expected and appropriate defined behaviour.
That is the case for signed integer overflow - there is /no/ sensible
alternative. The current situation in C makes "int" far more like real
mathematical integers than would be possible with any given choice of
behaviour - for example, (x + 1 > x) is always true in C and in
mathematics, but is not true in, say, Java.
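
To illustrate - what a particular compiler does depends on version and
flags, but gcc and clang at -O2 typically fold the signed version to a
constant:

// Signed: overflow is undefined, so the compiler may assume it never
// happens and fold this to "return true".
bool always_greater(int x) { return x + 1 > x; }

// Unsigned: wrapping is defined and x + 1 can be 0, so the comparison
// has to be performed.
bool sometimes_greater(unsigned x) { return x + 1 > x; }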

Arguably there are some things that are undefined behaviour in C that
could have been fully defined or implementation defined, such as shift
operator behaviours. Some compilers /do/ define such behaviour fully.
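
For example (the precise rules here have moved between language
versions, which is rather the point):

unsigned ok(unsigned x) { return x << 3; }  // fully defined, result mod 2^N

int risky(int x, int n)
{
    // Undefined if n < 0 or n >= the width of int; before C++20 (and in
    // C) also undefined if x is negative or the result does not fit.
    return x << n;
}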

Other kinds of undefined behaviour, such as out-of-bounds array access,
cannot possibly be anything other than undefined behaviour. There is
nothing a compiler could do here without adding very significant
overhead to the code. If you don't like a language that has undefined
behaviour, then C and C++ are not for you.

>>
>> 1. Those working for chip manufacturers - Intel, AMD, ARM, etc.  These
>> don't care if you use gcc or anything else, as long as you buy their
>> chips, so their concern is that you (the programmer) get the best from
>> the compiler and their chip.  Benchmarks for the compiler don't matter
>> - support for the chip matters.
>
> But they care that the chip is properly supported by the compiler, and
> "properly" includes performance.
>

Yes. But they are interested in performance on real code, not
benchmarks - and they are only interested if the code compiles to give
the required results.

>>
>> 2. Those working for software companies like Red Hat, IBM, etc., that
>> provide tools and services to developers.  They want programmers to be
>> happy with the tools - they don't care if you use a different compiler
>> instead.
>
> You may have a point in this category. I would say that more than making
> /programmers/ happy, they want to make software project /mangers/ happy,
> which brings into the picture features like time-to-market, man-hours
> cost etc., all goals that come prior to performance of the final product.
> This is IMHO one reason for which in many benchmarks MSVC comes usually
> after clang and gcc (in this order)

A good project manager is only happy when his/her programmers are happy.
I am not claiming that all managers are good, of course, and sometimes
their job may involve persuading the programmers to change what they are
happy with. But whoever decides, benchmark speeds should not be the
deciding factor (though speed on the developers' real code may be).

>
>>
>> 3. Those working for big users, like Google and Facebook.  They don't
>> care about benchmarks - they care about performance on their own
>> software.
>
> True, but they know that in order to get performance out of their
> application software, they need performing compilers.
> I believe clang developers know this very well.

Yes. But again, it is performance on real code that is important, not
artificial benchmarks.

>
>>
>> 4. The independent and volunteer developers.  They care about the
>> quality of their code, and making something worthwhile - they don't
>> care about benchmark performances.
>
> Not really. Performance may not be their first goal (compared to e.g.
> being the first to support the latest technology for the open source
> community), but sure it is one of the goals, especially after first
> introducing such new technologies.

Again, synthetic benchmark performance is not the goal.

David Brown

unread,
Feb 19, 2019, 4:35:53 PM2/19/19
to
Undefined behaviour means there is no definition of what the code should
do. Different compilers can implement it in different ways and give
different results. That does not mean that one compiler is "right" and
the other "wrong" - the source code is wrong, and both compilers are right.

Chris M. Thomasson

unread,
Feb 19, 2019, 5:30:30 PM2/19/19
to
On 2/19/2019 10:04 AM, james...@alumni.caltech.edu wrote:
> On Monday, February 18, 2019 at 6:40:35 PM UTC-5, Chris M. Thomasson wrote:
>> On 2/18/2019 2:45 PM, james...@alumni.caltech.edu wrote:
> ...
>>> I have not looked at your code in any detail - seeing comments that
>>> makes it clear that the code being commented on is significantly
>>> different from the code that has been presented makes me unwilling to
>>> invest much time on the code that has been presented. However, a quick
>>> scan lead me to the following questions:
> ...
>> You snipped way to much of my code.
>
> Your reddit critic quoted text, presumable from your code that he was
> commenting about, that doesn't match any of the code you've posted on
> this thread.

Huh? Wow. I posted the same ct_rwmutex here that is in the reddit post:

https://pastebin.com/raw/xCBHY9qd

The ct_rwmutex is the exact same one I posted here, in this thread:

https://groups.google.com/d/msg/comp.lang.c++/q4dZJFQxpdg/hzHBsE1YBQAJ

The original post in this thread was about boiling down the math to
address the question over on Reddit. There is no UB, there is no integer
overflow or underflow as long as the number of readers never exceeds
LONG_MAX.




> The topic of this thread being the validity of his comments,
> I've no particular interest in diving deeply into code that isn't the
> code he was commenting about.

It is the exact same algorithm I posted here, in this very thread.

https://groups.google.com/d/msg/comp.lang.c++/q4dZJFQxpdg/hzHBsE1YBQAJ

And on here in the reddit thread:

https://www.reddit.com/r/cpp/comments/are68n/experimental_readwrite_mutex/

Where is it different? I just boiled down the code into a simple form
that shows how m_count will never be less than -LONG_MAX, nor
greater than LONG_MAX, and how it can take any value in between. There
is no UB. Also, wrt the lock() function, count will never be negative.


> I'm certainly not interested in your code
> for it's own sake. I just scanned your code, made a few comments, and
> cut everything not directly relevant to the comments I made.

But you cut a very important part of the lock() function. There can
only be a single thread that takes writer access to my rwmutex. You
cannot just cut something out in a sensitive synchronization algorithm
and pretend it never happened!

>
> Note - another indication that he was commenting on different code than
> I was looking at is his comment that m_count is never decremented. The
> code you presented does decrement m_count.

He seems to be a troll, and cuts out important context, and asks
questions as if that cut context never existed. Wow. He must be doing it
on purpose. He said m_count never decrements because he does not know how
to comprehend the algorithm, or he is doing it on purpose.

[...]

Chris M. Thomasson

unread,
Feb 19, 2019, 6:01:05 PM2/19/19
to
On 2/19/2019 6:14 AM, Ralf Goertz wrote:
> Am Tue, 19 Feb 2019 13:54:49 +0100
> schrieb Manfred <non...@add.invalid>:
>
>> On 2/19/2019 12:03 PM, Bart wrote:
>>>> Unsigned overflow wrapping is usually a mistake too, but it is
>>>> perhaps a little more useful than with signed.
>>>
>>> That's the question that should be asked: what's so special about
>>> unsigned over signed that overflow is tolerated with one but not
>>> the other.
>>
>> Trivial observation, but that answers the question: in unsigned types
>> the first bit is a digit bit. In signed types it is a sign bit.
>> This makes the unsigned type suitable for bit-oriented arithmetic,
>> that is obviously not the same as usual arithmetic, in which wrapping
>> from 0xFFFFFFFF to 0x00000000 can make sense.
>
> Does that really matter? If I have an unsigned short set to its maximum
> possible value (32767 here) then add 1 I get -32768.

Fwiw, if I allowed my ct_rwmutex algorithm to break the bounds of a long
wrt -LONG_MAX and LONG_MAX, it would not work at all. LONG_MAX + 1, or
-LONG_MAX - 1, would ruin everything! UB aside for a moment... It would
screw up the counting, and things would start to deadlock all over the
place.

So in my algorithm, there is no overflow or underflow, by design. It is
a key aspect that makes the algorithm correct. This boils down to the
reason why my algorithm cannot deal with _more_ than LONG_MAX readers
hitting it at the same time.

David Brown

unread,
Feb 20, 2019, 6:27:07 AM2/20/19
to
I did not mean to imply that you used those words - I meant it as a
general sort of remark that I have seen made many times. Some people
are convinced that compiler makers are concerned only about the speed of
benchmarks, not how their compiler works for other code.

>> 1. Those working for chip manufacturers - Intel, AMD, ARM, etc. These
>> don't care if you use gcc or anything else, as long as you buy their
>> chips, so their concern is that you (the programmer) get the best from
>> the compiler and their chip. Benchmarks for the compiler don't matter -
>> support for the chip matters.
>
> The market statements are not mostly about performance of those chips
> and the likes of "Meltdown and Spectre" are not result of overeager
> optimizing of those and even when it seems to be so then nothing of it
> is measured with benchmarks compiled on those compilers?

Meltdown and Spectre are both /hardware/ problems. Can we at least
agree on that? Compilers have been able to add additional code to
reduce the effect of the problems - but this is /not/ because the
compilers were previously generating incorrect or suspect code.

(And while it's easy to see in hindsight that Intel should have been
more careful with speculation across privilege levels, it was harder to
see that in advance. There is no way to make a system where code from
multiple sources can run and share resources, without information
leakage at least in theory.)


>
>> 2. Those working for software companies like Red Hat, IBM, etc., that
>> provide tools and services to developers. They want programmers to be
>> happy with the tools - they don't care if you use a different compiler
>> instead.
>
> The programmers they target are not overly concerned with performance?

Of course performance is important to many people - compiler writers and
compiler users included. Performance on /benchmarks/ is of smaller
concern, especially of artificial benchmarks. Programmers want /their/
programs to run fast.

> My impression is that programmers start to post about performance before
> figuring out how to turn the optimizations on.
>

It is often the case that people who understand less, post more. (That
is in no way a comment about anyone posting here, in case you feel I was
trying to be insulting.)

>> 3. Those working for big users, like Google and Facebook. They don't
>> care about benchmarks - they care about performance on their own software.
>
> They do not use compilers to build their software and so do not care about
> compiler optimizations?

I really don't understand your point. I said they /do/ care about
performance - and therefore also about compiler optimisations.

>
>> 4. The independent and volunteer developers. They care about the
>> quality of their code, and making something worthwhile - they don't care
>> about benchmark performances.
>
> Again there are developers who don't care about performance?


I think you have been misunderstanding my point all along here. I
apologise if I have been unclear.

Yes, compiler writers care about performance and optimisations. I have
said this all along.

No, compiler writers are /not/ solely concerned about performance and
optimisations. It is merely one of many factors. The same applies to
people choosing compilers.

And it is definitely not the case that compiler writers care /primarily/
about the speed on artificial benchmark programs (like SPEC benchmarks).
They do not write their compilers with an aim of getting top marks on
these benchmarks. That does not mean they don't run them sometimes, and
compare figures (especially if there are regressions). It just means
that they do not use them as driving forces.

It is not the case that compiler writers are always looking for new ways
to take undefined behaviour in source code and turn it into a mess in
the object code. Rather, it /is/ the case that compiler writers have no
responsibility to try to generate "correct" results from incorrect
source code. They have /never/ had that responsibility. From the days
of the first C compilers 40+ years ago, signed integer overflow has been
undefined behaviour and compilers could assume it never happened if that
helped optimisation. Code that assumes wrapping behaviour is either
tied to a specific compiler (if the compiler documents that behaviour),
or it is /wrong/. As compilers have got more sophisticated, there have
been more occasions where incorrect code had predictable behaviour in
the past. That is why compiler writers add extra "this code has
mistakes, but worked with the behaviour of the old compiler" flags like
"-fwrapv" or "-fno-strict-aliasing".


>
>> I'm sure there are other categories that you can think of. I can't see
>> any significant number being benchmark oriented. People don't choose
>> compilers because of their benchmarks - they choose for features, target
>> support, static checking, language support, compatibility with existing
>> source code, etc. They expect a gradual progress towards faster code
>> with newer versions, but not more than that. And those that pick a
>> compiler for its speed, do so based on the speed for their own source
>> code, not for some benchmark.
>
> However it seems that there are only few weirdos like me who think that
> it does not matter how fast the wrong answers are calculated and consider
> it better when those wrong answers are not calculated at all.
>

/I/ don't care how fast wrong answers are calculated. I just don't
expect compilers to calculate right answers from wrong code. And I do
care how fast right answers are calculated from correct code.

>> Like all conspiracy theories, the best way to test it is to follow the
>> money. Who would profit from making compilers focused on benchmark
>> performance as the main goal, with a disregard for support for existing
>> C or C++ sources?
>
> What conspiracy theory? Where did I say that they disregard support for
> existing source code?

You wrote "Since compiler writers are people with extremely
benchmark-oriented head shape" and then suggested it is because that's
what their employers tell them to do - /that/ is the "conspiracy theory".

> If to follow money then Google, Apple, Microsoft
> and Oracle have tons of own C and C++ source code that they want to
> be performant and don't want to break but they want developers to use
> Go, Swift, C# or Java. So indeed they might want to "extend" crappy
> "features" and "optimizations" into C++ that they won't ever use in
> and with their own code.

You think Microsoft would intentionally introduce broken features or
mistakes into their C++ compilers to push people towards using C# ?

>
>>> Both hardware makers and compiler writers kept optimizing until it
>>> was "overly optimal".
>>>
>>
>> That again is simply incorrect.
>>
>> Developers - hardware or software - can make mistakes, and release a
>> design which later turns out to have unforeseen flaws. With software,
>> you can often find these early and fix them, but sometimes the flaws are
>> discovered quite late. Hardware flaws are harder to fix - but very easy
>> for amateurs to condemn once they are found.
>
> The cache, branch prediction and speculative execution are performance
> optimizations piled together. That can be tricky to get such a pile
> correct and if to prioritize correctness below performance then defects
> slip through. Same things do happen with compiler optimizations.

My understanding of the hardware problems here is that the operation was
perfectly correct - but there was information leakage. A certain degree
of information leakage is inevitable when there are shared resources,
but this situation could be exploited more conveniently than many others.

Compiler optimisations are a different game altogether. The main
similarity seems to be "they are both hard, so someone might make a
mistake".

>
>>>> Accusations that compiler writers are only interested in maximum
>>>> benchmark speeds are absurd. gcc, for example, tests compilation on the
>>>> entire Debian repository - vast quantities of code, much of which is
>>>> ancient.
>>>
>>> Where I wrote "only"? Sure, they have to care of other things like to
>>> keep backwards compatibility. Also if code of popular benchmark
>>> contains UB then they have to avoid "optimizing" it, to keep the result
>>> "correct".
>>
>> They sometimes have to make such adjustments, yes. Often that is
>> because they realise that not only do the benchmark writers make such
>> mistakes, but others do too - and that it can be helpful to treat such
>> code in the manner the programmer appeared to expect. But for most
>> undefined behaviour, it is hard or impossible to guess what the
>> programmer expected - that is the nature of undefined behaviour.
>
> What is so controversial what is the behavior that programmer expects
> on case of -ftrapv? Fortunately has been obvious that I want
> division-by-zero to trap (even on MIPS and ARM, without special
> compiler options) but that might also change out of blue when
> a way to "optimize" it will be discovered, and then we need to
> add some -fplease-dont-remove-divide-by-zero-trap I suspect.

"-ftrapv" was somewhat poorly conceived and specified in the first place
(IMHO), and has only got worse. Originally compilers were relatively
straightforward - if your code said "x = y + z;", you'd get an "add"
assembler instruction. "-ftrapv" just meant adding a "trap if overflow
flag set" instruction after it. Then things got complicated - what
about "x = y * 5", that can be done with a single "lea" instruction (on
x86)? What about "x = y + z - y;" - do you simplify without trapping,
or do you detect overflows on the intermediary values? The original
idea was to have a cheap and simple way to spot overflow errors, but the
practice is a good deal more complicated, and subject to a lot of
opinions as to what kinds of checks should be done.

So if I wanted code to check for overflows, I'd do it specifically in
the situations where it was appropriate - using __builtin_add_overflow
and friends, or more manual checking (using C++ classes to keep it all
neat). And for debugging or testing, -fsanitize=undefined is a good choice.
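
As a concrete sketch of the first approach - __builtin_add_overflow is a
GCC/Clang extension, not standard C++; it does the addition as if in
infinite precision and returns true if the result did not fit the
destination:

#include <climits>
#include <cstdio>

// Requires GCC or Clang.
bool add_checked(int a, int b, int* out)
{
    return !__builtin_add_overflow(a, b, out);  // true on success
}

int main()
{
    int sum;
    if (add_checked(INT_MAX, 1, &sum))
        std::printf("sum = %d\n", sum);
    else
        std::puts("overflow detected");  // this branch is taken here
}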

>
>>>> What compiler writers do not do, however, is limit their handling of
>>>> well-written code because some people write poor code. Instead, they
>>>> provide flags to support those that have code that relies on particular
>>>> handling of undefined behaviours, or code that "worked fine on my old
>>>> compiler" - flags like "-fwrapv" and "-fno-strict-aliasing". Then
>>>> people who know how integer arithmetic works in C and C++ can get
>>>> faster code, and people who can't get it right (or who have to use
>>>> broken code from others) have an escape route.
>>>>
>>>>>
>>>>> Without -ftrapv or -fwrapv all bets are off and one can get full set of
>>>>> nasal demons on signed integer overflow on those compilers.
>>>>>
>>>>
>>>> Of course all bets are off if you have signed integer overflow - this
>>>> should well known to anyone who has learned C or C++ programming. It is
>>>> not the compiler that launches nasal daemons - it is the programmer,
>>>> when they write code that does not make sense in the language.
>>>>
>>>> "-ftrapv", by the way, has been considered a poor and unreliable
>>>> solution for a good many years - "-fsanitize=signed-integer-overflow" is
>>>> what you want.
>>>
>>> Why did it become "poor" and "unreliable"? What was the reasoning?
>>> Is it OK to release software with "poor" and "unreliable" features?
>>
>> Many bugs have been found in the "-ftrapv" implementation - and in
>> particular, it does not trap in all cases. Personally, I think the flag
>> should be dropped in favour of the sanitizer, which is a more modern and
>> flexible alternative and which is actively maintained.
>
> Sanitizers sound like debugging options.

Yes.

> Why two almost equal features
> are developed into same tool?

-ftrapv is a debugging tool. It is not really appropriate for final
binaries - it is too unpredictable, and plays havoc with code
optimisation. In particular, expressions involving several parts are
either significantly slower as each bit needs to be handled separately,
or you don't get full overflow checking. (I have no idea where the
lines are drawn in practice in gcc.)

> With such logic one day the
> -fsanitize=signed-integer-overflow will also become "poor"
> and "unreliable" and then some third feature will be the "correct"
> way to make programs to crash on signed integer overflow. With
> feature creep after a while nothing is reliable.
>

There is /no/ "correct way to make programs crash on undefined
behaviour". There never has been, and never will be. The best you can
get are tools to help you find the bugs in your program when testing and
debugging. Maybe one day other tools will replace the sanitize options.

Options like "-ftrapv" have never been about changing the semantics of C
to give a defined behaviour to signed integer overflow. (This is
different from -fwrapv.) If you want to complain about poor or
misleading documentation in the gcc manual page, then I can agree with that.

C and C++ are languages which trust the programmer - they expect you to
write code that is correct and will not misbehave. They have little or
no checking unless you add it manually, which is how you can get
efficient code. But there is nothing hindering you from manual checks -
especially in C++ where you can make classes with operators to simplify
the process. (Yes, I know that means you can get overflow checking in
your own code, but can't easily add it to existing code.)
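
A minimal sketch of such a class - the name and the throw-on-overflow
policy are just one possible choice:

#include <climits>
#include <stdexcept>

struct checked_int
{
    int value;

    checked_int operator+(checked_int rhs) const
    {
        // Test before adding, so the signed addition itself can never overflow.
        if ((rhs.value > 0 && value > INT_MAX - rhs.value) ||
            (rhs.value < 0 && value < INT_MIN - rhs.value))
            throw std::overflow_error("checked_int: addition overflow");
        return {value + rhs.value};
    }
};

The other operators follow the same pattern, and the policy could just as
well be saturation or an immediate abort.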

>>> When incorrect answer is worse than no answer then normal people
>>> want defects to crash in release too.
>>
>> "-ftrapv" code has always been slower than non-trapping code. People
>> usually aim to write correct code, and have that correct code run as
>> fast as reasonably possible. If you want software that is full of
>> run-time checks, you don't program in C or C++.
>
> See? We have modern, branch-predicting and eagerly executing hardware
> with megabytes of cache but reasonability of -ftrapv usage is questionable.
> Sure, it is not for omnipotent programmers, without even measuring what
> it costs but I am fallible and have never met anyone almighty.
> How is it so self-evident that -ftrapv is unreasonable option?
> Correctness of behavior is more important than performance of behavior
> and incorrect behavior is often worse than no behavior whatsoever.
>

"-ftrapv" does not turn incorrect behaviour into correct behaviour. At
best, it helps you spot the incorrect behaviour so that you can fix the
program. (And that's a useful thing for testing and debugging.)

>> In C and C++, you can always manually add any checks you want. With
>> C++, you can make your own types that do checking in the manner that
>> suits your needs.
>
> Why? I agree that signed integer overflow is programming error and
> so if the integer calculations happen to be important then the -ftrapv
> handles it correctly (when it happens to work).
> Also how can I be sure that they don't "optimize" my manual checks
> away? After a while it can be like with null pointers. One optimizes
> dereference away, other optimizes null pointer check away (since it
> was "after" dereference) and result is that neither dereference does
> crash nor check work.
>

You can be sure it works by writing the code correctly. The "null
pointer" incident you are alluding to was a serious bug in the Linux
source code - changes to the compiler optimisation affected the
consequences of the bug, but did not introduce the bug. Compilers
assume that the programmer obeys the rules of the language - including
the "thou shalt not dereference a null pointer" rule.
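
A stripped-down illustration of that pattern - this is not the actual
kernel code, the struct and names are invented:

struct device { int flags; };

int broken_get_flags(device* dev)
{
    int flags = dev->flags;  // undefined behaviour if dev is null
    if (dev == nullptr)      // the compiler may assume dev is non-null here,
        return -1;           // so this check can be removed as dead code
    return flags;
}

Moving the null check above the dereference restores the intended
behaviour.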

Öö Tiib

unread,
Feb 21, 2019, 4:25:10 AM2/21/19
to
On Tuesday, 19 February 2019 22:55:10 UTC+2, Jorgen Grahn wrote:
> On Tue, 2019-02-19, Öö Tiib wrote:
> > On Monday, 18 February 2019 20:56:42 UTC+2, David Brown wrote:
> ...
> >> Many bugs have been found in the "-ftrapv" implementation - and in
> >> particular, it does not trap in all cases. Personally, I think the flag
> >> should be dropped in favour of the sanitizer, which is a more modern and
> >> flexible alternative and which is actively maintained.
> >
> > Sanitizers sound like debugging options.
>
> But that's what you want, isn't it?

I consider it a free sanity check on all signed integer calculations.
When the correctness of those calculations is important, I would
leave -ftrapv on forever.

Let's say in summer we have a product with 0% crashes per usage. Then
someone on our team (me) makes an incorrect change to some integer
arithmetic, and since the defect manifests rarely it reaches users in the fall.

a) With -ftrapv our automated crash reporting shows 0.2% crashes per
usage. The patch is up within a week.

b) Without it, 0.2% of runs produce incorrect answers. Those are
harder to notice (and in a complex whole system the reasons can
also be harder to find/blame). The defect will sit in the product until spring.

When the results of calculations matter to anything then I prefer a).

Paavo Helde

unread,
Feb 21, 2019, 4:56:48 AM2/21/19
to
On 21.02.2019 11:24, Öö Tiib wrote:
> On Tuesday, 19 February 2019 22:55:10 UTC+2, Jorgen Grahn wrote:
>> On Tue, 2019-02-19, Öö Tiib wrote:
>>> On Monday, 18 February 2019 20:56:42 UTC+2, David Brown wrote:
>> ...
>>>> Many bugs have been found in the "-ftrapv" implementation - and in
>>>> particular, it does not trap in all cases. Personally, I think the flag
>>>> should be dropped in favour of the sanitizer, which is a more modern and
>>>> flexible alternative and which is actively maintained.
>>>
>>> Sanitizers sound like debugging options.
>>
>> But that's what you want, isn't it?
>
> I consider it as free sanity check in all signed integer calculations.
> When correctness of those calculations is of importance then I would
> leave -ftrapv on forever.
>
> Lets say in summer we have product with 0% crashes per usage. Then
> someone of our team (me) maintains some integer arithmetic incorrectly
> and since the defect manifests rarely it reaches users in fall.
>
> a) With -ftrapv our automated crash reporting shows 0.2% crashes per
> usage. The patch is up within a week.
>
> b) Without it 0.2% runs result with incorrect answers.

This does not follow. As pointed out by others, the intermediate
overflows can still produce correct results when balanced with opposite
overflows or cast to unsigned.

Based on my limited experience, over half of -ftrapv crashes would be
false alarms. Depends on the codebase, of course.

Now it would be up to the management to decide whether the problems
caused by false alarms on customer sites justify the earlier catching of
non-false alarms.

David Brown

unread,
Feb 21, 2019, 7:26:35 AM2/21/19
to
On 21/02/2019 10:24, Öö Tiib wrote:
> On Tuesday, 19 February 2019 22:55:10 UTC+2, Jorgen Grahn wrote:
>> On Tue, 2019-02-19, Öö Tiib wrote:
>>> On Monday, 18 February 2019 20:56:42 UTC+2, David Brown wrote:
>> ...
>>>> Many bugs have been found in the "-ftrapv" implementation - and in
>>>> particular, it does not trap in all cases. Personally, I think the flag
>>>> should be dropped in favour of the sanitizer, which is a more modern and
>>>> flexible alternative and which is actively maintained.
>>>
>>> Sanitizers sound like debugging options.
>>
>> But that's what you want, isn't it?
>
> I consider it as free sanity check in all signed integer calculations.

But it is not free - not /remotely/ free. Please, go to
<https://godbolt.org> and try some samples with optimisation on (-O1 or
-O2), with and without "-ftrapv" and
"-fsanitize=signed-integer-overflow". Let me give you a couple of examples:

int foo1(int a, int b) {
return 2 * a + b;
}

int foo2(int a, int b) {
return 2 * a + b - a;
}

(Yes, I know people don't write code like in foo2 directly - but such
things can easily occur due to macros, inlining, constant folding,
templates, etc.)

This is the normal code (for x86) :

foo1:
lea eax, [rsi+rdi*2]
ret
foo2:
lea eax, [rdi+rsi]
ret

This is the code with -ftrapv :


foo1:
push rbp
mov ebp, esi
mov esi, 2
call __mulvsi3
mov esi, ebp
mov edi, eax
call __addvsi3
pop rbp
ret
foo2:
push r12
mov r12d, esi
mov esi, 2
push rbp
mov ebp, edi
sub rsp, 8
call __mulvsi3
mov esi, r12d
mov edi, eax
call __addvsi3
mov esi, ebp
mov edi, eax
call __subvsi3
add rsp, 8
pop rbp
pop r12
ret

This is the code with -fsanitize=signed-integer-overflow :

foo1:
push rbp
push rbx
mov ebx, esi
sub rsp, 24
imul ebp, edi, 2
jo .L8
.L2:
mov eax, ebx
add eax, ebp
jo .L9
.L4:
add rsp, 24
pop rbx
pop rbp
ret
.L8:
movsx rsi, edi
mov edx, 2
mov edi, OFFSET FLAT:.Lubsan_data0
call __ubsan_handle_mul_overflow
jmp .L2
.L9:
movsx rdx, ebp
movsx rsi, ebx
mov edi, OFFSET FLAT:.Lubsan_data1
mov DWORD PTR [rsp+12], eax
call __ubsan_handle_add_overflow
mov eax, DWORD PTR [rsp+12]
jmp .L4
foo2:
push r13
push r12
push rbp
mov ebp, esi
push rbx
mov ebx, edi
sub rsp, 24
imul r13d, edi, 2
jo .L18
.L11:
mov r12d, ebp
add r12d, r13d
jo .L19
.L13:
mov eax, r12d
sub eax, ebx
jo .L20
.L15:
add rsp, 24
pop rbx
pop rbp
pop r12
pop r13
ret
.L18:
movsx rsi, edi
mov edx, 2
mov edi, OFFSET FLAT:.Lubsan_data2
call __ubsan_handle_mul_overflow
jmp .L11
.L20:
movsx rdx, ebx
movsx rsi, r12d
mov edi, OFFSET FLAT:.Lubsan_data4
mov DWORD PTR [rsp+12], eax
call __ubsan_handle_sub_overflow
mov eax, DWORD PTR [rsp+12]
jmp .L15
.L19:
movsx rdx, r13d
movsx rsi, ebp
mov edi, OFFSET FLAT:.Lubsan_data3
call __ubsan_handle_add_overflow
jmp .L13




In what world are these checks "free" ?

The "sanitize" version is significantly better than the "-ftrapv"
version in that it has short paths when no overflows occur. You could
ask why the "-ftrapv" code is not similar - the answer, I think, is that
"-ftrapv" simply has not received much care or attention from gcc
developers for a very long time.

> When correctness of those calculations is of importance then I would
> leave -ftrapv on forever.
>

In my world, a program that crashes with a message "overflow detected"
is /not/ correct. It is merely broken in a different way from one that
gets the calculation wrong. Whether this is better or worse depends on
the circumstances, but it is /definitely/ not a correct result.

> Lets say in summer we have product with 0% crashes per usage. Then
> someone of our team (me) maintains some integer arithmetic incorrectly
> and since the defect manifests rarely it reaches users in fall.
>
> a) With -ftrapv our automated crash reporting shows 0.2% crashes per
> usage. The patch is up within a week.
>
> b) Without it 0.2% runs result with incorrect answers. Those are
> harder to notice (and in complex whole system the reasons can be
> also harder to find/blame). The defect will sit in product until spring.
>
> When the results of calculations matter to anything then I prefer a).
>

-ftrapv (or, preferably, -fsanitize=signed-integer-overflow) can be a
useful option for testing and debugging. But it is an aid to finding
bugs in the program - it does /not/ improve correctness, it is of no
help in a deployed system (unless you could do post-mortems), and it is of
very significant cost.

Use tools for their appropriate purpose. Take your car to the garage
when you have a problem with it - don't tow a caravan behind your car
with a couple of mechanics and a bootload of tools on the off-chance
that your car breaks down on the journey.

Öö Tiib

unread,
Feb 21, 2019, 12:35:39 PM2/21/19
to
On Thursday, 21 February 2019 14:26:35 UTC+2, David Brown wrote:
> On 21/02/2019 10:24, Öö Tiib wrote:
>
> In what world are these checks "free" ?

Free in sense that I do not have to write and to use classes that do
same thing and likely even less efficiently than those two options.

> In my world, a program that crashes with a message "overflow detected"
> is /not/ correct. It is merely broken in a different way from one that
> gets the calculation wrong. Whether this is better or worse depends on
> the circumstances, but it is /definitely/ not a correct result.

I have repeatedly expressed my lack of knowledge about any beings in
this universe who can write correct programs. Only "programmers" who
do not write incorrect programs are those who do not write programs.

>
> > Lets say in summer we have product with 0% crashes per usage. Then
> > someone of our team (me) maintains some integer arithmetic incorrectly
> > and since the defect manifests rarely it reaches users in fall.
> >
> > a) With -ftrapv our automated crash reporting shows 0.2% crashes per
> > usage. The patch is up within a week.
> >
> > b) Without it 0.2% runs result with incorrect answers. Those are
> > harder to notice (and in complex whole system the reasons can be
> > also harder to find/blame). The defect will sit in product until spring.
> >
> > When the results of calculations matter to anything then I prefer a).
> >
>
> -ftrapv (or, preferably, -fsanitize=signed-integer-overflow) can be a
> useful option for testing and debugging. But it is an aid to finding
> bugs in the program - it does /not/ improve correctness, it is of no
> help in a deployed system (unless you could post-mortems), and it is of
> very significant cost.
>
> Use tools for their appropriate purpose. Take your car to the garage
> when you have a problem with it - don't tow a caravan behind your car
> with a couple of mechanics and a bootload of tools on the off-chance
> that your car breaks down on the journey.

So you would prefer b) or have some kind of c) that results with something
better in that situation or what you were trying to say?

Öö Tiib

unread,
Feb 21, 2019, 12:50:18 PM2/21/19
to
Ok, fair enough. Lets say a) 0.2% crash or b) 0.05% of runs result with
incorrect answers. Is it easier to decide that way?

David Brown

unread,
Feb 21, 2019, 4:58:51 PM2/21/19
to
On 21/02/2019 18:35, Öö Tiib wrote:
> On Thursday, 21 February 2019 14:26:35 UTC+2, David Brown wrote:
>> On 21/02/2019 10:24, Öö Tiib wrote:
>>
>> In what world are these checks "free" ?
>
> Free in sense that I do not have to write and to use classes that do
> same thing and likely even less efficiently than those two options.
>

Classes could be more flexible and more efficient (certainly more
efficient than the -ftrapv code), and give you accurate control of when
you use overflow detection instead of an "all or nothing" choice. But I
agree that it takes effort to make and use them.
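
A minimal sketch of such a class, using the GCC/Clang
__builtin_add_overflow intrinsic (the class name and the
throw-on-overflow policy are made up for illustration, not something
proposed in the thread):

#include <stdexcept>

struct checked_long
{
    long value;

    checked_long operator+(checked_long other) const
    {
        long result;
        // __builtin_add_overflow returns true if the mathematical sum
        // does not fit in 'result' (GCC/Clang extension); inline it is
        // typically an add plus a conditional branch, far cheaper than
        // the -ftrapv library calls shown earlier.
        if (__builtin_add_overflow(value, other.value, &result))
            throw std::overflow_error("checked_long: addition overflowed");
        return {result};
    }
};

The point is that the policy (throw, saturate, log, ignore) is chosen
per type, rather than imposed on every signed operation in the program.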

>> In my world, a program that crashes with a message "overflow detected"
>> is /not/ correct. It is merely broken in a different way from one that
>> gets the calculation wrong. Whether this is better or worse depends on
>> the circumstances, but it is /definitely/ not a correct result.
>
> I have repeatedly expressed my lack of knowledge about any beings in
> this universe who can write correct programs. Only "programmers" who
> do not write incorrect programs are those who do not write programs.
>

It is entirely possible to eliminate large classes of bugs - at least on
certain kinds of programming. For most of my work, I know that my code
has /no/ bugs with dynamic memory allocation or freeing. It has /no/
bugs on signed integer overflow. It never has divide by zeros, or
shifts of negative numbers. It does not have code that accidentally
uses variables without initialising them. I can't promise I have no
out-of-bounds array accesses, but they should be extraordinarily rare -
as will buffer overflows.

And even if you assume that it is plausible that you have signed integer
overflow, how do you think -ftrapv improves matters? Clearly it could
be useful in testing and debugging - that is what it is for. But in
released code, who would be happy with a crash and an overflow error
message?

I could understand this more if we were talking about a type of error
that is harder to avoid, or harder to catch in testing - things like
race conditions and deadlocks in multi-threaded code. But it is /easy/
to avoid signed integer overflow. It is /easy/ to write your code in a
way that it can't happen, or that can detect it appropriately and
respond sensibly, rather than crashing.

>>
>>> Lets say in summer we have product with 0% crashes per usage. Then
>>> someone of our team (me) maintains some integer arithmetic incorrectly
>>> and since the defect manifests rarely it reaches users in fall.
>>>
>>> a) With -ftrapv our automated crash reporting shows 0.2% crashes per
>>> usage. The patch is up within a week.
>>>
>>> b) Without it 0.2% runs result with incorrect answers. Those are
>>> harder to notice (and in complex whole system the reasons can be
>>> also harder to find/blame). The defect will sit in product until spring.
>>>
>>> When the results of calculations matter to anything then I prefer a).
>>>
>>
>> -ftrapv (or, preferably, -fsanitize=signed-integer-overflow) can be a
>> useful option for testing and debugging. But it is an aid to finding
>> bugs in the program - it does /not/ improve correctness, it is of no
>> help in a deployed system (unless you could post-mortems), and it is of
>> very significant cost.
>>
>> Use tools for their appropriate purpose. Take your car to the garage
>> when you have a problem with it - don't tow a caravan behind your car
>> with a couple of mechanics and a bootload of tools on the off-chance
>> that your car breaks down on the journey.
>
> So you would prefer b) or have some kind of c) that results with something
> better in that situation or what you were trying to say?
>

Sorry - it looks like you have cut part of your answer here. I am not
sure what you meant to write.

Rosario19

unread,
Feb 22, 2019, 5:04:59 AM2/22/19
to
On Sun, 17 Feb 2019 20:23:13 -0800, "Chris M. Thomasson"
<invalid_chris_t...@invalid.com> wrote:

>There is a sort of funny discussion going on over on reddit about my
>read/write mutex. Some people think is has UB wrt an integer overflow or
>underflow. I boiled it down to the following simple program:
>____________________________________
>#include <iostream>
>#include <climits>
>
>
>long fetch_add(long& gcount, long addend)
>{
> long lcount = gcount;
> gcount += addend;
> return lcount;
>}

The behaviour cannot be defined, because long is not a fixed-size
type - is it 32 bits? is it 64? And even if it were fixed at 64 bits,
for example: is the + operation defined by the standard for all values
in the range, or is overflow left undefined?

Öö Tiib

unread,
Feb 22, 2019, 6:05:52 AM2/22/19
to
On Thursday, 21 February 2019 23:58:51 UTC+2, David Brown wrote:
> On 21/02/2019 18:35, Öö Tiib wrote:
> > On Thursday, 21 February 2019 14:26:35 UTC+2, David Brown wrote:
> >> On 21/02/2019 10:24, Öö Tiib wrote:
> >>
> >> In what world are these checks "free" ?
> >
> > Free in sense that I do not have to write and to use classes that do
> > same thing and likely even less efficiently than those two options.
> >
>
> Classes could be more flexible and more efficient (certainly more
> efficient than the -ftrapv code), and give you accurate control of when
> you use overflow detection instead of an "all or nothing" choice. But I
> agree that it takes effort to make and use them.

That paper https://www.cs.utah.edu/~regehr/papers/overflow12.pdf
tells that they found 43 sites of undefined integer overflows in
Microsoft SafeInt library (that is supposed to be such classes and
functions).

> >> In my world, a program that crashes with a message "overflow detected"
> >> is /not/ correct. It is merely broken in a different way from one that
> >> gets the calculation wrong. Whether this is better or worse depends on
> >> the circumstances, but it is /definitely/ not a correct result.
> >
> > I have repeatedly expressed my lack of knowledge about any beings in
> > this universe who can write correct programs. Only "programmers" who
> > do not write incorrect programs are those who do not write programs.
> >
>
> It is entirely possible to eliminate large classes of bugs - at least on
> certain kinds of programming. For most of my work, I know that my code
> has /no/ bugs with dynamic memory allocation or freeing. It has /no/
> bugs on signed integer overflow. It never has divide by zeros, or
> shifts of negative numbers. It does not have code that accidentally
> uses variables without initialising them. I can't promise I have no
> out-of-bounds array accesses, but they should be extraordinarily rare -
> as will buffer overflows.
>
> And even if you assume that it is plausible that you have signed integer
> overflow, how do you think -ftrapv improves matters? Clearly it could
> be useful in testing and debugging - that is what it is for. But in
> released code, who would be happy with a crash and an overflow error
> message?

Everybody. The sole correct handling of programming defects is to fix
those. Crashes get fixed fastest and so the total count of that defect
ever manifesting in practice will be smallest.

> I could understand this more if we were talking about a type of error
> that is harder to avoid, or harder to catch in testing - things like
> race conditions and deadlocks in multi-threaded code. But it is /easy/
> to avoid signed integer overflow. It is /easy/ to write your code in a
> way that it can't happen, or that can detect it appropriately and
> respond sensibly, rather than crashing.

Misbehavior on case of unusually large numbers for example because
of corrupt data can be quite tricky to test and quite easy to miss.

>
> >>
> >>> Lets say in summer we have product with 0% crashes per usage. Then
> >>> someone of our team (me) maintains some integer arithmetic incorrectly
> >>> and since the defect manifests rarely it reaches users in fall.
> >>>
> >>> a) With -ftrapv our automated crash reporting shows 0.2% crashes per
> >>> usage. The patch is up within a week.
> >>>
> >>> b) Without it 0.2% runs result with incorrect answers. Those are
> >>> harder to notice (and in complex whole system the reasons can be
> >>> also harder to find/blame). The defect will sit in product until spring.
> >>>
> >>> When the results of calculations matter to anything then I prefer a).
> >>>
> >>
> >> -ftrapv (or, preferably, -fsanitize=signed-integer-overflow) can be a
> >> useful option for testing and debugging. But it is an aid to finding
> >> bugs in the program - it does /not/ improve correctness, it is of no
> >> help in a deployed system (unless you could post-mortems), and it is of
> >> very significant cost.
> >>
> >> Use tools for their appropriate purpose. Take your car to the garage
> >> when you have a problem with it - don't tow a caravan behind your car
> >> with a couple of mechanics and a bootload of tools on the off-chance
> >> that your car breaks down on the journey.
> >
> > So you would prefer b) or have some kind of c) that results with something
> > better in that situation or what you were trying to say?
> >
>
> Sorry - it looks like you have cut part of your answer here. I am not
> sure what you meant to write.

Yes. I am still confused; it reads like you are not discussing the
practical example that I gave above (and to which you seem to be
responding).

David Brown

unread,
Feb 22, 2019, 7:31:52 AM2/22/19
to
On 22/02/2019 12:05, Öö Tiib wrote:
> On Thursday, 21 February 2019 23:58:51 UTC+2, David Brown wrote:
>> On 21/02/2019 18:35, Öö Tiib wrote:
>>> On Thursday, 21 February 2019 14:26:35 UTC+2, David Brown wrote:
>>>> On 21/02/2019 10:24, Öö Tiib wrote:
>>>>
>>>> In what world are these checks "free" ?
>>>
>>> Free in sense that I do not have to write and to use classes that do
>>> same thing and likely even less efficiently than those two options.
>>>
>>
>> Classes could be more flexible and more efficient (certainly more
>> efficient than the -ftrapv code), and give you accurate control of when
>> you use overflow detection instead of an "all or nothing" choice. But I
>> agree that it takes effort to make and use them.
>
> That paper https://www.cs.utah.edu/~regehr/papers/overflow12.pdf
> tells that they found 43 sites of undefined integer overflows in
> Microsoft SafeInt library (that is supposed to be such classes and
> functions).
>

What can I say? You are right that most software has bugs - and that
many projects don't have a development methodology that is strong enough
to stop such preventable bugs. Integer overflows are just bugs like
anything else - but they are preventable bugs as long as you take
reasonable care.

As for the particular case of MS's "SafeInt" library - let me just say
that finding 43 bugs in it does not harm their reputation in my eyes.

I am only familiar with one name on that paper - John Regehr. I have
read other publications by him. He seems to make a living from making
condescending and sarcastic criticisms of C and C++ without really
appreciating the point of the languages. He is particularly critical
of compilers that support experienced developers and assume the
programmers actually understand the language they are using - he appears
to think C programmers believe the language is just Java without
classes, and compilers should behave accordingly.

Still, the paper had some interesting points. One surprising case is
the expression "char c = CHAR_MAX; c++;" which they say varies according
to compiler. "The question is: Does c get promoted to int before being
incremented? If so, the behaviour is well-defined. We found
disagreement between compiler vendors' implementations of this
construct". The C standards are quite clear on the matter - "c++;" has
the result of the original value of c, and the side effect of "c = c +
1;", for which c is first promoted to "int". If CHAR_MAX = INT_MAX,
this is undefined - otherwise it is well defined. Converting back to
"char" is implementation-defined if "char" is signed, well-defined
modulo conversion if it is unsigned.
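
A small example of the rule being described (a sketch, assuming the
usual case where char is narrower than int):

#include <climits>

char bump(char c)       // e.g. called with c == CHAR_MAX
{
    c++;                // c is promoted to int, the increment is done in
                        // int (no overflow as long as CHAR_MAX < INT_MAX),
                        // and the result is converted back to char -
                        // implementation-defined if char is signed,
                        // modulo 2^N if it is unsigned
    return c;
}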


The paper rightly points out that some programmers /don't/ understand
that signed integer overflow is undefined behaviour - or that they
"know" the rules but think it is safe to break them and assume wrapping
behaviour. (Typically this is because they have tested some code and
seen that it works - for that code, with that compiler version, and
those flags.) This is one reason why gcc has the "-fwrapv" flag - if
you are faced with third-party code and you are not sure the programmer
understands that signed integers do not wrap, you can use the "-fwrapv"
flag. If the code is good, it will make no difference to the results
(but have some impact on the efficiency). If the code is bad, it will
run the way the programmer expected.
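
A typical example of such third-party code (the function is made up):
a wrap-around overflow test that is invalid under the default rules
but behaves as its author expected under -fwrapv.

// Assumes b > 0.  With the default semantics the compiler may fold the
// condition to 'false', since a + b cannot legally be less than a;
// with -fwrapv the addition wraps and the check works as intended.
bool add_would_overflow(int a, int b)
{
    return a + b < a;
}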


>>>> In my world, a program that crashes with a message "overflow detected"
>>>> is /not/ correct. It is merely broken in a different way from one that
>>>> gets the calculation wrong. Whether this is better or worse depends on
>>>> the circumstances, but it is /definitely/ not a correct result.
>>>
>>> I have repeatedly expressed my lack of knowledge about any beings in
>>> this universe who can write correct programs. Only "programmers" who
>>> do not write incorrect programs are those who do not write programs.
>>>
>>
>> It is entirely possible to eliminate large classes of bugs - at least on
>> certain kinds of programming. For most of my work, I know that my code
>> has /no/ bugs with dynamic memory allocation or freeing. It has /no/
>> bugs on signed integer overflow. It never has divide by zeros, or
>> shifts of negative numbers. It does not have code that accidentally
>> uses variables without initialising them. I can't promise I have no
>> out-of-bounds array accesses, but they should be extraordinarily rare -
>> as will buffer overflows.
>>
>> And even if you assume that it is plausible that you have signed integer
>> overflow, how do you think -ftrapv improves matters? Clearly it could
>> be useful in testing and debugging - that is what it is for. But in
>> released code, who would be happy with a crash and an overflow error
>> message?
>
> Everybody. The sole correct handling of programming defects is to fix
> those. Crashes get fixed fastest and so the total count of that defect
> ever manifesting in practice will be smallest.
>

Wrong.

An alternative handling of defects is to minimise the damage, to
maximise the use of the imperfect program. It is a well established
"fact" that all complex programs have bugs in them. If they were all to
stop with an error report on each bug, the software would be useless
until all bugs were eliminated. That might be nice in a perfect world -
in the real world, however, it would mean waiting decades for testing
and development to finish on big systems. People complain about Windows
- imagine how much more they would complain if it blue-screened for
every minor bug. And imagine the "fun" if your aeroplane flight
controller stopped with an error message because of a bug in a function
for a warning light.

Even during testing, stopping dead on a bug is not necessarily the best
choice - it may be better to collect more data first.

>> I could understand this more if we were talking about a type of error
>> that is harder to avoid, or harder to catch in testing - things like
>> race conditions and deadlocks in multi-threaded code. But it is /easy/
>> to avoid signed integer overflow. It is /easy/ to write your code in a
>> way that it can't happen, or that can detect it appropriately and
>> respond sensibly, rather than crashing.
>
> Misbehavior on case of unusually large numbers for example because
> of corrupt data can be quite tricky to test and quite easy to miss.
>

If your data comes from a source where it could be bad, then you check it.
I think the direction of the discussion has slid somewhat (no doubt it
is my fault), and we may not be discussing the same things. Perhaps it
is time to wind it down a bit?

Jorgen Grahn

unread,
Feb 22, 2019, 7:33:11 AM2/22/19
to
On Thu, 2019-02-21, David Brown wrote:
> On 21/02/2019 10:24, Öö Tiib wrote:
>> On Tuesday, 19 February 2019 22:55:10 UTC+2, Jorgen Grahn wrote:
>>> On Tue, 2019-02-19, Öö Tiib wrote:
>>>> On Monday, 18 February 2019 20:56:42 UTC+2, David Brown wrote:
>>> ...
>>>>> Many bugs have been found in the "-ftrapv" implementation - and in
>>>>> particular, it does not trap in all cases. Personally, I think the flag
>>>>> should be dropped in favour of the sanitizer, which is a more modern and
>>>>> flexible alternative and which is actively maintained.
>>>>
>>>> Sanitizers sound like debugging options.
>>>
>>> But that's what you want, isn't it?
>>
>> I consider it as free sanity check in all signed integer calculations.
>
> But it is not free - not /remotely/ free. Please, go to
> <https://godbolt.org> and try some samples with optimisation on (-O1 or
> -O2), with and without -"ftrapv" and
> "-fsanitize=signed-integer-overflow". Let me give you a couple of examples:
...
>
> In what world are these checks "free" ?

Devil's advocate: when you have way more computing power than you
need, you can spend it on things like this.

...
>> When correctness of those calculations is of importance then I would
>> leave -ftrapv on forever.
>>
>
> In my world, a program that crashes with a message "overflow detected"
> is /not/ correct.

But at least you don't get incorrect calculations, which seems to be
his point.

It's not hard to see what you both mean.

Fred.Zwarts

unread,
Feb 22, 2019, 8:41:49 AM2/22/19
to
"David Brown" schreef in bericht news:q4oq3e$nvd$1...@dont-email.me...
>
>...
>
>And imagine the "fun" if your aeroplane flight
>controller stopped with an error message because of a bug in a function
>for a warning light.

I see this argument more often. But imagine the opposite. The airplane
flight controller continues without error message because a bug is not
detected, so nobody is warned by the warning light.

David Brown

unread,
Feb 22, 2019, 10:32:48 AM2/22/19
to
That is vastly better. A missing warning on a plane is a lot better
than the plane stopping mid-flight.

Obviously you want to find the problems in testing and simulation. And
obviously there are other alternatives to handling errors than stopping
with a message, such as going into a "limp mode" or releasing all
control to the backup systems. But it is a clear example why "stop with
an error message" is /not/ a good general solution, and why it can be
better to continue and hope the problem does not get worse.

The worst possible attitude (and I am not suggesting for a moment that
Öö has this attitude) is to think that using "-ftrapv" somehow /fixes/
integer overflow problems or stops them being problems.

David Brown

unread,
Feb 22, 2019, 10:35:51 AM2/22/19
to
On 22/02/2019 13:33, Jorgen Grahn wrote:
> On Thu, 2019-02-21, David Brown wrote:
>> On 21/02/2019 10:24, Öö Tiib wrote:
>>> On Tuesday, 19 February 2019 22:55:10 UTC+2, Jorgen Grahn wrote:
>>>> On Tue, 2019-02-19, Öö Tiib wrote:
>>>>> On Monday, 18 February 2019 20:56:42 UTC+2, David Brown wrote:
>>>> ...
>>>>>> Many bugs have been found in the "-ftrapv" implementation - and in
>>>>>> particular, it does not trap in all cases. Personally, I think the flag
>>>>>> should be dropped in favour of the sanitizer, which is a more modern and
>>>>>> flexible alternative and which is actively maintained.
>>>>>
>>>>> Sanitizers sound like debugging options.
>>>>
>>>> But that's what you want, isn't it?
>>>
>>> I consider it as free sanity check in all signed integer calculations.
>>
>> But it is not free - not /remotely/ free. Please, go to
>> <https://godbolt.org> and try some samples with optimisation on (-O1 or
>> -O2), with and without -"ftrapv" and
>> "-fsanitize=signed-integer-overflow". Let me give you a couple of examples:
> ...
>>
>> In what world are these checks "free" ?
>
> Devil's advocate: when you have way more computing power than you
> need, you can spend it on things like this.
>

In such cases, C or C++ is probably not the language you want. And even
if it /is/ the language you want, you would be better using integer
classes that are checked and where you have clear and user-defined
control of what happens on overflow, rather than using a poorly
specified and imperfectly implemented compiler-specific flag. (Though
apparently you should avoid using MS's SafeInt library!)

> ...
>>> When correctness of those calculations is of importance then I would
>>> leave -ftrapv on forever.
>>>
>>
>> In my world, a program that crashes with a message "overflow detected"
>> is /not/ correct.
>
> But at least you don't get incorrect calculations, which seems to be
> his point.
>

Agreed. But "-ftrapv" is not, as far as I can see, a good way to get that.

> It's not hard to see what you both mean.
>

Yes.

Öö Tiib

unread,
Feb 22, 2019, 10:55:53 AM2/22/19
to
On Friday, 22 February 2019 14:31:52 UTC+2, David Brown wrote:
> On 22/02/2019 12:05, Öö Tiib wrote:
> > On Thursday, 21 February 2019 23:58:51 UTC+2, David Brown wrote:
> >>
> >> And even if you assume that it is plausible that you have signed integer
> >> overflow, how do you think -ftrapv improves matters? Clearly it could
> >> be useful in testing and debugging - that is what it is for. But in
> >> released code, who would be happy with a crash and an overflow error
> >> message?
> >
> > Everybody. The sole correct handling of programming defects is to fix
> > those. Crashes get fixed fastest and so the total count of that defect
> > ever manifesting in practice will be smallest.
> >
>
> Wrong.
>
> An alternative handling of defects is to minimise the damage, to
> maximise the use of the imperfect program. It is a well established
> "fact" that all complex programs have bugs in them.

We do not know where the yet undiscovered defects are and under
what circumstances these manifests and how. None do manifest in
the (normal, stretched and/or constrained) circumstances under what
the software has been tested and used so far. So how to minimize
damage when only thing that we know is that every software contains
defects and damage from wrong output can be large?

> If they were all to
> stop with an error report on each bug, the software would be useless
> until all bugs were eliminated.

You seem to forget that complex software with several potential undefined
behaviors and other defects in it does in practice misbehave very rarely.

> That might be nice in a perfect world -
> in the real world, however, it would mean waiting decades for testing
> and development to finish on big systems.

That scenario is not related to scenarios about what I talk.

> People complain about Windows
> - imagine how much more they would complain if it blue-screened for
> every minor bug.

It actually did crash frequently. I started to write under Windows from 2.0.
Technically it was relatively buggy, unstable and under-documented GUI
toolkit for MS DOS. Developers did hate it but end users liked it and so
I used it.

> And imagine the "fun" if your aeroplane flight
> controller stopped with an error message because of a bug in a function
> for a warning light.

Sure, how should program decide that the defect manifests only as
incoherence in warning light data? There are several better strategies
to achieve stability of critical systems than to ignore sanity checks.

> Even during testing, stopping dead on a bug is not necessarily the best
> choice - it may be better to collect more data first.

How? It did crash exactly on defect and core dump is full of data.

> >> I could understand this more if we were talking about a type of error
> >> that is harder to avoid, or harder to catch in testing - things like
> >> race conditions and deadlocks in multi-threaded code. But it is /easy/
> >> to avoid signed integer overflow. It is /easy/ to write your code in a
> >> way that it can't happen, or that can detect it appropriately and
> >> respond sensibly, rather than crashing.
> >
> > Misbehavior on case of unusually large numbers for example because
> > of corrupt data can be quite tricky to test and quite easy to miss.
> >
>
> If your data comes from a source where it could be bad, then you check it.

Exactly. Large amount of defects are that some corner case check wasn't
written by programmer and so it is not handled and no one did notice.
Raise hands who has not forgotten to add such a check anywhere
in last ten thousand of lines of code they wrote. I see none such hands.
Maybe. Like I said, I did not understand. I am not saying that -ftrapv
makes (inherently defective) programs correct. I just described a
scenario with an actual product.

james...@alumni.caltech.edu

unread,
Feb 22, 2019, 11:25:56 AM2/22/19
to
On Friday, February 22, 2019 at 10:32:48 AM UTC-5, David Brown wrote:
> On 22/02/2019 14:41, Fred.Zwarts wrote:
> > "David Brown"  schreef in bericht news:q4oq3e$nvd$1...@dont-email.me...
> >>
> >> ...
> >>
> >> And imagine the "fun" if your aeroplane flight
> >> controller stopped with an error message because of a bug in a function
> >> for a warning light.
> >
> > I see this argument more often. But imagine the opposite. The airplane
> > flight controller continues without error message because a bug is not
> > detected, so nobody is warned by the warning light.
>
> That is vastly better. A missing warning on a plane is a lot better
> than the plane stopping mid-flight.

That depends upon what the warning is about. A missed warning could have
consequences that are even worse than having the plane stop mid-flight,
such as, for instance, causing it to explode mid-flight.

Whether it's better for an unexpected problem to cause a program to fail
immediately or continue onward with an undetected failure depends
entirely upon the context. The only unambiguously wrong approach is to
assume that there's a single correct choice that is the same in all
contexts.

David Brown

unread,
Feb 22, 2019, 11:42:01 AM2/22/19
to
Agreed. Really, you are saying the same thing I said - just using more
general terms.

Chris M. Thomasson

unread,
Feb 22, 2019, 5:01:27 PM2/22/19
to
A correction:

He said m_count never decrements because he does __NOT__ know how to

Chris M. Thomasson

unread,
Feb 22, 2019, 8:22:58 PM2/22/19
to
Take a look at the fetch_add function:

https://en.cppreference.com/w/cpp/atomic/atomic/fetch_add

It does not care about wrapping. It just blindly adds addend. If the
caller tries to add 1 to LONG_MAX using a long, then they are in UB
land, fetch_add or not.
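
For reference, a small sketch of the distinction (not taken from the
post; the names mirror the thread): overflowing a plain long is
undefined, while std::atomic<long>::fetch_add is specified by the
standard in terms of two's-complement arithmetic, and in any case the
counter stays within range as long as the number of concurrent readers
never exceeds LONG_MAX.

#include <atomic>
#include <climits>

void demo()
{
    long plain = LONG_MAX;
    // ++plain;                    // plain signed overflow: undefined behaviour
    (void)plain;

    std::atomic<long> m_count{LONG_MAX};
    m_count.fetch_add(-3);         // three readers: LONG_MAX - 3, in range
    m_count.fetch_add(-LONG_MAX);  // a writer: result is -3, still in range
}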

David Brown

unread,
Feb 23, 2019, 10:18:29 AM2/23/19
to
On 22/02/2019 16:55, Öö Tiib wrote:
> On Friday, 22 February 2019 14:31:52 UTC+2, David Brown wrote:
>> On 22/02/2019 12:05, Öö Tiib wrote:
>>> On Thursday, 21 February 2019 23:58:51 UTC+2, David Brown wrote:
>>>>
>>>> And even if you assume that it is plausible that you have signed integer
>>>> overflow, how do you think -ftrapv improves matters? Clearly it could
>>>> be useful in testing and debugging - that is what it is for. But in
>>>> released code, who would be happy with a crash and an overflow error
>>>> message?
>>>
>>> Everybody. The sole correct handling of programming defects is to fix
>>> those. Crashes get fixed fastest and so the total count of that defect
>>> ever manifesting in practice will be smallest.
>>>
>>
>> Wrong.
>>
>> An alternative handling of defects is to minimise the damage, to
>> maximise the use of the imperfect program. It is a well established
>> "fact" that all complex programs have bugs in them.
>
> We do not know where the yet undiscovered defects are and under
> what circumstances these manifests and how. None do manifest in
> the (normal, stretched and/or constrained) circumstances under what
> the software has been tested and used so far. So how to minimize
> damage when only thing that we know is that every software contains
> defects and damage from wrong output can be large?

I have no good answer to that - I don't think there is a complete good
answer to be found. We have to improve in lots of areas. Better
testing is one of them - a great deal of software is poorly tested.
Splitting code into testable units, and attempting to verify that these
units are entirely correct, is another. Contracts (which are found in
some languages, and are coming to C++ Real Soon Now™) can help too. Of
course, the biggest step is to persuade developers - and the people
managing them - to take testing and correctness seriously.

>
>> If they were all to
>> stop with an error report on each bug, the software would be useless
>> until all bugs were eliminated.
>
> You seem to forget that complex software with several potential undefined
> behaviors and other defects in it does in practice misbehave very rarely.
>

Actually no, I am not forgetting that - it is /exactly/ my point. Lots
of software has bugs, or at least potential bugs, and yet manages to do
a useful job despite that. Turning these potential bugs into crashes
and error messages simply guarantees that it will /not/ do a useful job
(though it may mean the bugs get found and fixed faster).

Many errors, such as signed integer overflow, pointer mistakes, or
buffer overruns, are clearly undefined behaviour. But they won't
necessarily cause adverse effects in the program. If a buffer overrun
means you accidentally clear beyond the end of an array, but the space
you clear is not used (or is written before it is next read), no harm is
done. If your integer arithmetic overflows but the result is not used,
then it could be harmless. These sorts of things are /potential/
problems - a different compiler, different options, unrelated changes in
the source code could all turn them into /real/ problems. But they
won't be spotted using testing. And options like -ftrapv can make them
/real/ problems.


>> That might be nice in a perfect world -
>> in the real world, however, it would mean waiting decades for testing
>> and development to finish on big systems.
>
> That scenario is not related to scenarios about what I talk.
>
>> People complain about Windows
>> - imagine how much more they would complain if it blue-screened for
>> every minor bug.
>
> It actually did crash frequently. I started to write under Windows from 2.0.
> Technically it was relatively buggy, unstable and under-documented GUI
> toolkit for MS DOS. Developers did hate it but end users liked it and so
> I used it.

I never had the "pleasure" of Windows 2.0. Between MSDOS with GEM and
Windows 3.0 I was at university, using SunOS and Solaris.

>
>> And imagine the "fun" if your aeroplane flight
>> controller stopped with an error message because of a bug in a function
>> for a warning light.
>
> Sure, how should program decide that the defect manifests only as
> incoherence in warning light data? There are several better strategies
> to achieve stability of critical systems than to ignore sanity checks.

I did not say you should ignore them - I said you should not necessarily
stop with an error message.

I think we are saying the same thing in different ways. It appears that
I am saying "Don't always halt on all errors - consider carefully how to
deal with them." And you are saying "Don't always ignore errors -
consider carefully how to deal with them".

>
>> Even during testing, stopping dead on a bug is not necessarily the best
>> choice - it may be better to collect more data first.
>
> How? It did crash exactly on defect and core dump is full of data.
>
>>>> I could understand this more if we were talking about a type of error
>>>> that is harder to avoid, or harder to catch in testing - things like
>>>> race conditions and deadlocks in multi-threaded code. But it is /easy/
>>>> to avoid signed integer overflow. It is /easy/ to write your code in a
>>>> way that it can't happen, or that can detect it appropriately and
>>>> respond sensibly, rather than crashing.
>>>
>>> Misbehavior on case of unusually large numbers for example because
>>> of corrupt data can be quite tricky to test and quite easy to miss.
>>>
>>
>> If your data comes from a source where it could be bad, then you check it.
>
> Exactly. Large amount of defects are that some corner case check wasn't
> written by programmer and so it is not handled and no one did notice.

Make your checks positive, not negative - check that the data fits the
patterns you want, rather than checking for patterns that you know are bad.
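
A trivial sketch of the difference (the accepted range is made up):

// Negative check: reject values known to be bad - easy to leave gaps.
bool input_ok_negative(long n)
{
    return n != -1;                   // every other bad value slips through
}

// Positive check: accept only what the rest of the code can handle.
bool input_ok_positive(long n)
{
    return n >= 0 && n <= 1000000;    // everything else is rejected up front
}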

Reinhardt Behm

unread,
Feb 23, 2019, 10:53:47 AM2/23/19
to
AT Saturday 23 February 2019 23:18, David Brown wrote:

> Make your checks positive, not negative - check that the data fits the
> patterns you want, rather than checking for patterns that you know are
> bad.

+1
Simply because you (hopefully) know what is correct but you never know all
incorrect patterns.

--
Reinhardt

Bonita Montero

unread,
Feb 23, 2019, 11:52:58 AM2/23/19
to
> There are no modern systems that don't use 2's complement arithmetic -
> that half is correct. But modern compilers can, and do, assume that
> your signed arithmetic never overflows.

What is this assumption good for? This looks rather like a
nerd-decision by a compiler-writer to me. I mean, integer arithmetic
is not like fp arithmetic on any platform; the ISAs on which the
compilers are based always overflow.

David Brown

unread,
Feb 23, 2019, 12:05:04 PM2/23/19
to
They make that assumption for optimisation purposes. C++ is dependent
on optimising compilers - without optimisation, a lot of C++ code would
be very big and slow. Inlining and template expansion, followed by
strength reduction, constant propagation, and various optimisation
passes leads to all sorts of opportunities for making more efficient
object code. But a lot of that depends on the assumption that undefined
behaviour does not occur. And in the case of signed integer arithmetic,
assuming that there is no overflow allows all sorts of simplifications.

So the compiler knows that "x + 1 > x" is true, and "2 * x / 2" is "x",
and so on.
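
For instance, a compiler is allowed to fold these functions as the
comments describe, precisely because the signed versions cannot legally
overflow (a sketch; the actual output depends on compiler and flags):

bool always_true(int x)   { return x + 1 > x; }   // may become: return true;
int  halved_double(int x) { return 2 * x / 2; }   // may become: return x;

// With unsigned arithmetic neither simplification is valid, because
// wrapping is defined: x + 1 > x is false when x == UINT_MAX.
bool not_always_true(unsigned x) { return x + 1 > x; }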

Bonita Montero

unread,
Feb 23, 2019, 12:17:27 PM2/23/19
to
> They make that assumption for optimisation purposes.

Cool, if you had not said so then I would not have come up with it!

> C++ is dependent on optimising compilers - without optimisation,
> a lot of C++ code would be very big and slow.

But I see in this case, no optimization that would be worth this
interpretation.


> So the compiler knows that "x + 1 > x" ...
> is true, and "2 * x / 2" is "x", and so on.

I think it's better for a compiler not to assume this.
All the more so because no programmer is so stupid to write such
code that makes such optimizations necessary. That is, there are
few advantages of such optimizations, but the disadvantage is
that the compiler does many things that you would not intuitively
expect from it.
For example, It can be very handy if you have a variable with a
set bit that you shift from the lowest position to the highest
and beyond to empty. In such and many other cases, one of the
compilers would get in the way.

Christian Gollwitzer

unread,
Feb 23, 2019, 3:18:48 PM2/23/19
to
Am 23.02.19 um 18:17 schrieb Bonita Montero:
>> So the compiler knows that "x + 1 > x" ...
>> is true, and "2 * x / 2" is "x",  and so on.
>
> I think it's better for a compiler not to assume this.
> All the more so because no programmer is so stupid to write such
> code that makes such optimizations necessary.

You're getting it wrong. The user may not explicitly write code such as
2*x/2, but due to inlining and template expansion an expression may come
up like this.

Think of index arithmetic with vectors and such things. For example:

std::vector<long> bla(30);
size_t sizeofvector = bla.size()*sizeof(long);

This second line could result in code that does the equivalent of

size_t bytesinvector;
sizeofvector = bytesinvector/sizeof(long)*sizeof(long);


Christian

David Brown

unread,
Feb 24, 2019, 5:53:45 AM2/24/19
to
On 23/02/2019 18:17, Bonita Montero wrote:
>> They make that assumption for optimisation purposes.
>
> Cool, if you had not said so then I would not have come up with it!
>
>> C++ is dependent  on optimising compilers - without optimisation,
>> a lot of C++ code would be very big and slow.
>
> But I see in this case, no optimization that would be worth this
> interpretation.
>
>
>> So the compiler knows that "x + 1 > x" ...
>> is true, and "2 * x / 2" is "x",  and so on.
>
> I think it's better for a compiler not to assume this.

You are entitled to your preferences here - and that is why gcc has the
"-fwrapv" switch to let people choose. But be aware that you are in the
minority, and you cannot assume compilers will cater to you unless they
explicitly say so.

> All the more so because no programmer is so stupid to write such
> code that makes such optimizations necessary. That is, there are
> few advantages of such optimizations, but the disadvantage is
> that the compiler does many things that you would not intuitively
> expect from it.

Programmers write code all the time that benefits from such
re-arrangements. People rarely write them explicitly, of course, but
inlining, constant propagation, macro expansion, etc., can often result
in expressions that can be simplified in such ways.


> For example, It can be very handy if you have a variable with a
> set bit that you shift from the lowest position to the highest
> and beyond to empty. In such and many other cases, one of the
> compilers would get in the way.

Such operations are /not/ "very handy". They are the kind of things
that only a small fraction of programmers ever need to do, and they only
need to do it very occasionally. And C (and C++) has a feature for
handling such cases - unsigned integer types.
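
A sketch of the shift-a-bit-out case done with an unsigned type, where
every step is well defined (the loop is made up for illustration):

#include <cstdio>

void walk_bit()
{
    // The set bit walks from the lowest position to the highest; when it
    // is shifted past the top, the value becomes 0 and the loop ends.
    for (unsigned int bit = 1u; bit != 0u; bit <<= 1)
        std::printf("%08x\n", bit);
}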


Bonita Montero

unread,
Feb 24, 2019, 6:31:05 AM2/24/19
to
> You are entitled to your preferences here - and that is why gcc has the
> "-fwrapv" switch to let people choose.

That should be the default, because the setting that comes with the
maximum assumption is worth nothing and breaks intuitive assumptions.

David Brown

unread,
Feb 24, 2019, 6:57:14 AM2/24/19
to
(Please post correctly, with attributions. You will find it much easier
to use Usenet groups if you get a newsreader and newsserver, rather than
Google's broken "groups" interface. I recommend Thunderbird and
news.eternal-september.org as free solutions if you don't have any other
preferences.)
I've tried to explain, but apparently I have failed. If you can point
to things I wrote that you don't understand, or don't agree with, I'll
be happy to try to go in more detail. But if you are simply going to
re-state your original incorrect point, then I can't help you any more.

Jorgen Grahn

unread,
Feb 24, 2019, 11:34:53 AM2/24/19
to
On Sat, 2019-02-23, Bonita Montero wrote:

[David Brown]

>> C++ is dependent on optimising compilers - without optimisation,
>> a lot of C++ code would be very big and slow.
>
> But I see in this case, no optimization that would be worth this
> interpretation.
>
>
>> So the compiler knows that "x + 1 > x" ...
>> is true, and "2 * x / 2" is "x", and so on.
>
> I think it's better for a compiler not to assume this.

Then invent a language which defines the behavior, write a compiler
for it, and convince people to use it. Or switch to a language which
already works like that.

Personally I'm happy with the simple rule "never let signed arithmetic
overflow".

Bonita Montero

unread,
Feb 24, 2019, 11:56:22 AM2/24/19
to
>     size_t bytesinvector;
>     sizeofvector = bytesinvector/sizeof(long)*sizeof(long);

And where is the check against the range here that the compiler
could optimize away?

Öö Tiib

unread,
Feb 25, 2019, 5:17:51 AM2/25/19
to
Then I do not understand how the crashes on those bugs that are
reached only in a tiny fraction of actual cases turn the software useless.

> Many errors, such as signed integer overflow, pointer mistakes, or
> buffer overruns, are clearly undefined behaviour. But they won't
> necessarily cause adverse effects in the program. If a buffer overrun
> means you accidentally clear beyond the end of an array, but the space
> you clear is not used (or is written before it is next read), no harm is
> done. If your integer arithmetic overflows but the result is not used,
> then it could be harmless. These sorts of things are /potential/
> problems - a different compiler, different options, unrelated changes in
> the source code could all turn them into /real/ problems. But they
> won't be spotted using testing. And options like -ftrapv can make them
> /real/ problems.

Earlier discovery is still clearly better statistically. Casual incorrect behavior
of unknown nature takes at least ten times more time to be tracked down
and eliminated than crash. Reporting crashes is easy to automate. Even if
only quarter of such crashes did prevent misbehavior and rest did not then
we still get more than two times less misbehaviors in total.

> >> And imagine the "fun" if your aeroplane flight
> >> controller stopped with an error message because of a bug in a function
> >> for a warning light.
> >
> > Sure, how should program decide that the defect manifests only as
> > incoherence in warning light data? There are several better strategies
> > to achieve stability of critical systems than to ignore sanity checks.
>
> I did not say you should ignore them - I said you should not necessarily
> stop with an error message.
>
> I think we are saying the same thing in different ways. It appears that
> I am saying "Don't always halt on all errors - consider carefully how to
> deal with them." And you are saying "Don't always ignore errors -
> consider carefully how to deal with them".

Everybody should consider best ways to handle known issues. For
me the discussion was about unknown errors that are there but about
what our best efforts so far have given no indications.

Basically during program run it has unexpected positive insanity check
(or negative sanity check).
Best would be to report it and roll back to sane situation. We have no
insanity barriers in process and we can not estimate where it started
and to where it has spread. Safest seems to halt the process.

> >> Even during testing, stopping dead on a bug is not necessarily the best
> >> choice - it may be better to collect more data first.
> >
> > How? It did crash exactly on defect and core dump is full of data.
> >
> >>>> I could understand this more if we were talking about a type of error
> >>>> that is harder to avoid, or harder to catch in testing - things like
> >>>> race conditions and deadlocks in multi-threaded code. But it is /easy/
> >>>> to avoid signed integer overflow. It is /easy/ to write your code in a
> >>>> way that it can't happen, or that can detect it appropriately and
> >>>> respond sensibly, rather than crashing.
> >>>
> >>> Misbehavior on case of unusually large numbers for example because
> >>> of corrupt data can be quite tricky to test and quite easy to miss.
> >>>
> >>
> >> If your data comes from a source where it could be bad, then you check it.
> >
> > Exactly. Large amount of defects are that some corner case check wasn't
> > written by programmer and so it is not handled and no one did notice.
>
> Make your checks positive, not negative - check that the data fits the
> patterns you want, rather than checking for patterns that you know are bad.

Whether the missing check would have been positive or negative is
orthogonal. Regardless of whether it would check that the program is
sane or insane, it is just not there at all.

> > Raise hands who has not forgotten to add such a check anywhere
> > in last ten thousand of lines of code they wrote. I see none such hands.

Regardless if the checks are positive or negative I observe zero hands.

Fred.Zwarts

unread,
Feb 25, 2019, 7:34:21 AM2/25/19
to
wrote in message
news:bb9ae4ae-8993-42c9...@googlegroups.com...
Indeed. Unfortunately, for a program with unknown bugs, it is not very easy
to predict what the context will be when a bug is encountered.
In the case of an airplane controller, the consequences of undefined
behaviour can be very serious. I would prefer to switch off the automatic
controller and to switch to manual mode, instead of continuing an airplane
controller with undefined behaviour. But for a computer game it may be fun
to continue with undefined behaviour.

David Brown

unread,
Feb 25, 2019, 7:54:23 AM2/25/19
to
(I am not sure what you are saying here.)

>> Many errors, such as signed integer overflow, pointer mistakes, or
>> buffer overruns, are clearly undefined behaviour. But they won't
>> necessarily cause adverse effects in the program. If a buffer overrun
>> means you accidentally clear beyond the end of an array, but the space
>> you clear is not used (or is written before it is next read), no harm is
>> done. If your integer arithmetic overflows but the result is not used,
>> then it could be harmless. These sorts of things are /potential/
>> problems - a different compiler, different options, unrelated changes in
>> the source code could all turn them into /real/ problems. But they
>> won't be spotted using testing. And options like -ftrapv can make them
>> /real/ problems.
>
> Earlier discovery is still clearly better statistically. Casual incorrect behavior
> of unknown nature takes at least ten times more time to be tracked down
> and eliminated than crash.

I agree. That is why testing is so important. (And also why static
testing and compile-time warnings are so useful.) You should put a good
deal of effort into identifying problems as soon as possible, before
release.

But once you are talking about release, the balance changes. Your aim
is typically that any bugs in the code should present the least possible
problems to the user. That might mean "crash and report" - causing
short-term inconvenience with an aim to improvements in the long term.
But it might equally mean "Run quickly, ignore potential bugs and hope
there are no real effects" to maximise short-term usability of the
software. There is no single correct answer. "Run slowly, crash on any
possible overflow error but ignore all other errors, and hope that the
user reports in a useful manner", or "-ftrapv", is very unlikely to be
the best choice.


> Reporting crashes is easy to automate.

For some kinds of programs, yes. For others, no - it is impossible to
automate. You are making sweeping statements generalised from only some
possible types of software.

> Even if
> only quarter of such crashes did prevent misbehavior and rest did not then
> we still get more than two times less misbehaviors in total.
>
>>>> And imagine the "fun" if your aeroplane flight
>>>> controller stopped with an error message because of a bug in a function
>>>> for a warning light.
>>>
>>> Sure, how should program decide that the defect manifests only as
>>> incoherence in warning light data? There are several better strategies
>>> to achieve stability of critical systems than to ignore sanity checks.
>>
>> I did not say you should ignore them - I said you should not necessarily
>> stop with an error message.
>>
>> I think we are saying the same thing in different ways. It appears that
>> I am saying "Don't always halt on all errors - consider carefully how to
>> deal with them." And you are saying "Don't always ignore errors -
>> consider carefully how to deal with them".
>
> Everybody should consider best ways to handle known issues. For
> me the discussion was about unknown errors that are there but about
> what our best efforts so far have given no indications.
>
> Basically during program run it has unexpected positive insanity check
> (or negative sanity check).
> Best would be to report it and roll back to sane situation. We have no
> insanity barriers in process and we can not estimate where it started
> and to where it has spread. Safest seems to halt the process.

My argument is merely that "safest seems to halt the process" is, in
many cases, wrong. And even when it might be appropriate, the cost in
performance may be an issue (not every system is performance critical,
of course).

And in general, sweeping generalisations are wrong most of the time.

James Kuyper

unread,
Feb 25, 2019, 12:11:31 PM2/25/19
to
On 2/25/19 07:34, Fred.Zwarts wrote:
> wrote in message
> news:bb9ae4ae-8993-42c9...@googlegroups.com...

You should normally include the name (or at least, the userid) of the
person you're quoting.

...
>> That depends upon what the warning is about. A missed warning could have
>> consequences that are even worse than having the plane stop mid-flight,
>> such as, for instance, causing it to explode mid-flight.
>>
>> Whether it's better for an unexpected problem to cause a program to fail
>> immediately or continue onward with an undetected failure depends
>> entirely upon the context. The only unambiguously wrong approach is to
>> assume that there's a single correct choice that is the same in all
>> contexts.
>
> Indeed. Unfortunately, for a program with unknown bugs, it is not very easy
> to predict what the context will be when a bug is encountered.

Not really. In the case of a program controlling a vending machine, you're
guaranteed that the context is "vending snacks". In the case of a
program controlling the launch of nuclear weapons, you're guaranteed
that the context is "nuclear weapons launch". If you use exactly the
same approach in both contexts, you're either wasting time and money
putting way too much effort into dealing with possible defects in your
vending machine code, or you're taking excessively large risks by
putting way too little effort into dealing with possible defects in your
nuclear launch code - quite possibly both.

It is a popular design goal to make sure that when failures occur, they
do so in the safest possible manner - the technical term for this is
"failsafe". However, what constitutes "safest" is also context
dependent. Ordinarily, for instance, it would be safest for a missile to
not explode. But what if successful detonation of that missile somewhere
near its intended target is essential to preventing your own forces
from being on the receiving end of subsequent attacks? If a failure
occurs that substantially reduces, but does not eliminate, the
possibility of hitting that target, it may be safer (for the source, not
the target) if the missile deals with that failure by continuing to do
its best to hit that target.


> In the case of an airplane controller, the consequences of undefined
> behaviour can be very serious. I would prefer to switch off the automatic
> controller and to switch to manual mode, instead of continuing an airplane
> controller with undefined behaviour. But for a computer game it may be fun
> to continue with undefined behaviour.

What if manual mode is essentially unusable? Traditionally, many
airplanes were designed to be "inherently stable" - if the plane were
rolled slightly to the right, and you let go of the controls, it would
have a built-in tendency to start rolling back to the left. If it were
pitched downward, and you let go of the controls, it would have an
innate tendency to start pitching upward. This was a very valuable
safety feature - in many contexts.
However, many modern planes are designed to be inherently unstable - if
rolled slightly to the right, and you take your hands off the controls,
they have an innate tendency to start rolling to the right faster. Why
would anyone do that? Because an inherent consequence of such
instability is that the plane is more maneuverable. Special electronics
is used to initiate corrective actions more quickly and more precisely
than any human could. In essence, software is used to make an inherently
unstable plane emulate an inherently stable plane - but much more
maneuverable. As a result, such planes can be difficult, maybe even
impossible, to fly manually by anyone with less than superhuman
reflexes. So, when the software controlling such a system runs into a
problem that degrades its ability to control your plane (for instance,
by having damaged sensors reporting invalid values that trigger
overflows), would you be better off having the program simply abort, or
to continue running with degraded control? That depends critically upon
how badly degraded the control is.
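
For concreteness, here is a minimal sketch of the "continue with degraded
control" option in C++. The names and limits (PitchSensor, kMaxPlausible)
are invented for illustration and do not correspond to any real
flight-control interface: an implausible reading from a damaged sensor is
replaced by the last trusted value instead of being fed into arithmetic
where it could overflow, and instead of halting the controller outright.
____________________________________
#include <cstdint>

// Hypothetical pitch-rate sensor handling; names and limits are invented
// for illustration only. A reading outside the physically plausible range
// is treated as sensor damage and the last trusted value is reused, so the
// later control arithmetic never sees a value that could make it overflow.
struct PitchSensor
{
    // assumed unit: hundredths of a degree per second
    static constexpr std::int32_t kMaxPlausible = 20000;

    std::int32_t last_good = 0;

    std::int32_t filter(std::int32_t raw)
    {
        if (raw > kMaxPlausible || raw < -kMaxPlausible)
        {
            // Degraded mode: keep controlling on the last trusted
            // reading rather than halting the whole controller.
            return last_good;
        }
        last_good = raw;
        return raw;
    }
};
____________________________________
Whether that fallback is acceptable is exactly the context-dependent
judgement being discussed here.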

Öö Tiib

unread,
Feb 25, 2019, 12:11:34 PM2/25/19
to
You said "If they were all to stop with an error report on each bug,
the software would be useless until all bugs were eliminated."
My point was that this is a false dichotomy and not how reality
actually works.

In reality the conditions under which most still-undiscovered bugs
manifest are reached very rarely. Did you test it at all? Yes - so it is
obvious why: testing has already weeded out the bugs that manifest often.

So there can be hundreds of bugs in the product's calculations, each
one guaranteed to crash thanks to that -ftrapv, each easy to
reproduce once discovered, *and* also hundreds of thousands of
users using it monthly with not a single crash actually happening.

So how did -ftrapv suddenly turn the program into something useless?
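
For anyone who has not tried it, here is a minimal sketch of what -ftrapv
actually changes at run time, assuming GCC or Clang; the program is
illustrative only, not taken from anyone's product.
____________________________________
// Build:   g++ trapv_demo.cpp          -> the signed overflow below is
//          undefined behaviour; on common targets it silently wraps.
//          g++ -ftrapv trapv_demo.cpp  -> the same overflow calls abort()
//          at run time, so the bug becomes a reproducible crash.
#include <climits>
#include <iostream>

int main(int argc, char**)
{
    long count = LONG_MAX;

    // argc is at least 1 in a normal invocation, so this signed addition
    // overflows at run time rather than being folded away at compile time.
    long next = count + argc;

    std::cout << "next = " << next << "\n";
    return 0;
}
____________________________________
Either way the defect is there; with the flag it is loud and reproducible,
without it the behaviour is whatever the compiler happens to produce.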

> >> Many errors, such as signed integer overflow, pointer mistakes, or
> >> buffer overruns, are clearly undefined behaviour. But they won't
> >> necessarily cause adverse effects in the program. If a buffer overrun
> >> means you accidentally clear beyond the end of an array, but the space
> >> you clear is not used (or is written before it is next read), no harm is
> >> done. If your integer arithmetic overflows but the result is not used,
> >> then it could be harmless. These sorts of things are /potential/
> >> problems - a different compiler, different options, unrelated changes in
> >> the source code could all turn them into /real/ problems. But they
> >> won't be spotted using testing. And options like -ftrapv can make them
> >> /real/ problems.
> >
> > Earlier discovery is still clearly better statistically. Casual incorrect behavior
> > of unknown nature takes at least ten times more time to be tracked down
> > and eliminated than a crash.
>
> I agree. That is why testing is so important. (And also why static
> testing and compile-time warnings are so useful.) You should put a good
> deal of effort into identifying problems as soon as possible, before
> release.

None of which was discarded or replaced with -ftrapv.

> But once you are talking about release, the balance changes. Your aim
> is typically that any bugs in the code should present the least possible
> problems to the user. That might mean "crash and report" - causing
> short-term inconvenience with an aim to improvements in the long term.
> But it might equally mean "Run quickly, ignore potential bugs and hope
> there are no real effects" to maximise short-term usability of the
> software. There is no single correct answer. "Run slowly, crash on any
> possible overflow error but ignore all other errors, and hope that the
> user reports in a useful manner", or "-ftrapv", is very unlikely to be
> the best choice.

Again a built-up argument? What program will run "very slowly"? Most
software on current hardware is I/O bound. And what is this "but ignore all
other errors"? I suggested crashing only on programming errors - but on all
of them. Basically, do not turn off the sanity checks: do not define NDEBUG,
do not turn off -ftrapv, and so on. OK, a twice-bigger executable binary
(which is anyway only a fraction of all the files and data) and 5% worse
overall performance - is that a deciding factor? It in no way means that I
suggested using crashes as a form of user-input validation, or -ftrapv as a
replacement for testing, or that I do not care about performance, or
whatever other such strawman nonsense.
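
As a minimal sketch of that "leave the sanity checks in the release build"
idea (the helper and its names are made up for illustration): the assert
below stays active simply because the translation unit is not compiled
with -DNDEBUG.
____________________________________
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative helper only. The assert is a programming-error check, not
// user-input validation: a correct caller can never reach it with an
// out-of-range index. It remains active as long as NDEBUG is not defined,
// even in an optimised release build.
int element_at(const std::vector<int>& v, std::size_t i)
{
    assert(i < v.size() && "broken caller: index out of range");
    return v[i];
}
____________________________________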

> > Reporting crashes is easy to automate.
>
> For some kinds of programs, yes. For others, no - it is impossible to
> automate. You are making sweeping statements generalised from only some
> possible types of software.

Even for software with no persistent external communications it is
easier to automate the reporting of crashes and hangs than to automate
the reporting of misbehaviours of every imaginable nature.
I was specifically talking about processes whose misbehaviour can have
worse consequences than lack of behaviour. Why do I have to restate that
in every second message? The alternative is to say that the safest thing
is to ignore unexpected signs of insanity and hope that it passes and does
no harm, with a performance gain as a bonus.
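
A minimal sketch of that kind of automation for unhandled C++ exceptions,
assuming a hypothetical append_crash_report() that just appends to a local
file for later upload. An abort from -ftrapv ends the process before this
hook runs, so it would normally be picked up by an external watchdog or the
operating system's crash reporter instead.
____________________________________
#include <cstdlib>
#include <exception>
#include <fstream>

// Hypothetical reporting sink: append a line to a local file that some
// other component uploads later.
void append_crash_report(const char* what)
{
    std::ofstream out("crash_report.txt", std::ios::app);
    out << "terminate: " << what << '\n';
}

int main()
{
    // Record whatever we can when the program dies from an unhandled
    // exception, then terminate without trying to limp on.
    std::set_terminate([] {
        append_crash_report("unhandled exception or explicit terminate()");
        std::abort();
    });

    // ... application code would go here ...
    return EXIT_SUCCESS;
}
____________________________________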

How do we get rid of that red herring of performance? Typical software
that heavily uses hardware resources looks about like this:
about 5% of the code base runs for about 30% of the run time, half of the
rest of the code base runs for 1.5% of the run time, and the other half
runs for too close to 0% of the run time to express. For the remaining
68.5% of the run time the software is waiting for I/O, from files and from
elsewhere. So if those 5% of the code base (for example 15K lines of code
out of 300K SLOC) are optimised to be twice as efficient, the program
perhaps eats somewhat less battery and runs about 15% faster overall. If
that is desirable but somehow made harder by some sanity checks in those
15K lines of code, then that can be dealt with locally, and not by turning
off sanity checks in the whole program, especially not in the 285K lines of
code that are run rarely.

> And in general, sweeping generalisations are wrong most of the time.

I was specifically talking about processes whose wrong answers can
have worse consequences than lack of answers. I was not talking
about every kind of software.

Fred.Zwarts

unread,
Feb 26, 2019, 3:49:39 AM2/26/19
to
"James Kuyper" schreef in bericht news:q517jo$2ho$1...@dont-email.me...
>problem that degrades its ability to control your plane (for instance,
>by having damaged sensors reporting invalid values that trigger
>overflows), would you be better off having the program simply abort, or
>to continue running with degraded control? That depends critically upon
>how badly degraded the control is.

I fully agree with the above. But you assume that it is not the program itself
that is failing, but sensors or other devices. But the question is what to do
with bugs in the program itself, if the program itself can no longer be
trusted. Is there a way to predict what the effect of a bug will be - in the
case of undefined behaviour, for example? Is it really so evident that it is
better to keep flying the plane with a program whose behaviour is undefined
than to stop the program?
I am surprised that so many people, in particular when the consequences are
so serious, seem to think that stopping a program is always worse than
continuing with undefined behaviour. I have seen such arguments more often,
not only for the case of integer overflows, but also after detecting memory
corruption. Not only for airplane controllers, but also for nuclear power
plants, or surgery by robots.
I do not say that you sound like such people. I understand that you agree
that a choice to stop or to continue the program is not so simple.

David Brown

unread,
Feb 26, 2019, 6:12:34 AM2/26/19
to
On 26/02/2019 09:49, Fred.Zwarts wrote:
> I am surprised that so many people, in particular when the consequences
> are so serious, seem to think that stopping a program is always worse
> than continuing with undefined behaviour.

I don't think I have seen anyone in this thread propose that. Do you
have a reference?

All that has been said is that stopping a program is not always better
than continuing with undefined behaviour - and there seems to be
wide-spread agreement on that.

James Kuyper

unread,
Feb 26, 2019, 8:09:28 AM2/26/19
to
On 2/26/19 03:49, Fred.Zwarts wrote:
> "James Kuyper" schreef in bericht news:q517jo$2ho$1...@dont-email.me...
...
>> reflexes. So, when the software controlling such a system runs into a
>> problem that degrades its ability to control your plane (for instance,
>> by having damaged sensors reporting invalid values that trigger
>> overflows), would you be better off having the program simply abort, or
>> to continue running with degraded control? That depends critically upon
>> how badly degraded the control is.
>
> I fully agree with the above. But you assume that it is not the program itself
> that is failing, but sensors or other devices.

I'm talking about both. I deliberately brought the discussion back to
the original topic by citing a sensor failure that produced values
outside the expected range, causing the program itself to fail due to
overflow.