
"C++ on the Move" by Darryl K. Taft


Lynn McGuire

Mar 7, 2023, 4:38:56 PM
"C++ on the Move" by Darryl K. Taft
https://thenewstack.io/c-on-the-move/

"While the C++ programming language seems to be as popular as ever,
efforts to move it forward are ongoing both inside the ISO C++ steering
committee as well as other projects."

Lynn

Richard

Mar 7, 2023, 5:09:08 PM
[Please do not mail me a copy of your followup]

Lynn McGuire <lynnmc...@gmail.com> spake the secret code
<tu8at1$ijm1$1...@dont-email.me> thusly:

>"C++ on the Move" by Darryl K. Taft

A ho-hum article IMO. You've posted better :).
--
"The Direct3D Graphics Pipeline" free book <http://tinyurl.com/d3d-pipeline>
The Terminals Wiki <http://terminals-wiki.org>
The Computer Graphics Museum <http://computergraphicsmuseum.org>
Legalize Adulthood! (my blog) <http://legalizeadulthood.wordpress.com>

Lynn McGuire

Mar 7, 2023, 5:26:14 PM
On 3/7/2023 4:08 PM, Richard wrote:
> [Please do not mail me a copy of your followup]
>
> Lynn McGuire <lynnmc...@gmail.com> spake the secret code
> <tu8at1$ijm1$1...@dont-email.me> thusly:
>
>> "C++ on the Move" by Darryl K. Taft
>
> A ho-hum article IMO. You've posted better :).

I was curious what people thought of the last paragraph, "“Now bring
Rust to the table — a language well-suited for the low-level programming
that C++ typically dominates, but with built-in memory safety and other
modern features,” Campbell said. “When luminaries like Mark Russinovich
are saying things like [‘…halt starting any new projects in C/C++ and
use Rust…’], C++ must evolve or be left behind. The question is, at what
price?” he said."
https://twitter.com/markrussinovich/status/1571995117233504257?lang=en

Thanks,
Lynn

Richard

Mar 7, 2023, 6:10:43 PM
[Please do not mail me a copy of your followup]

Lynn McGuire <lynnmc...@gmail.com> spake the secret code
<tu8dln$j2vi$1...@dont-email.me> thusly:

>On 3/7/2023 4:08 PM, Richard wrote:
>> [Please do not mail me a copy of your followup]
>>
>> Lynn McGuire <lynnmc...@gmail.com> spake the secret code
>> <tu8at1$ijm1$1...@dont-email.me> thusly:
>>
>>> "C++ on the Move" by Darryl K. Taft
>>
>> A ho-hum article IMO. You've posted better :).
>
>I was curious what people thought of the last paragraph, [...]

IMO they found someone with some credibility to say something spicy in
order to get some rage clicks for their article.

That isn't to say that the safety aspects of C++ should be ignored;
but honestly 25 years ago I taught my team the Standard Library
containers and all our memory leaks and crashes disappeared. The
argument that C++ is only as safe as the habits and disciplines of
your team is still true. Herb Sutter's cpp2 is a better way to
address that than Rust, C#, Java, Carbon or whatever new language that
is going to "kill C++" this week.

A weak developer can code resource leaks in any language. I've seen it
done in Java and C# many times by my coworkers. A weak C++ developer
will blame std::vector<> as "buggy" when in reality the types they put
into the container didn't satisfy the Copyable requirement.
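
A minimal sketch of the kind of thing I mean - the names are invented,
and the error is in the element type, not the container:

#include <vector>

// An element type that is neither copyable nor movable, so it fails the
// container's element requirements (the "Copyable" requirement, in the
// old C++98 wording).
struct Resource {
    Resource() = default;
    Resource(const Resource&) = delete;
    Resource& operator=(const Resource&) = delete;
};

int main() {
    std::vector<Resource> v;
    v.emplace_back();  // compile error: growing the vector needs movable
                       // or copyable elements - std::vector is not "buggy"
}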

Every time some non-C++ programmer complains about how C++ is "unsafe"
and I ask them to show me an example, they show me C code. It's fine
to criticize C++ but at least know what you're talking about when you
do it. If I started bitching about Java and when asked to put up or
shut up, I started showing JavaScript, people would rightly laugh at
me.

Paavo Helde

Mar 8, 2023, 1:41:12 AM
+100


Öö Tiib

Mar 8, 2023, 2:21:23 AM
Most people agree that C++ contains too many features. They only differ
by opinion where to cut. I would put most effort into places where it says
"undefined behavior". I think half of such places can be replaced with
"program is ill formed", "std::terminate is called", or one of the features
that lead to the described situation can be cut out (as badly thought through).

Scott Lurndal

Mar 8, 2023, 11:10:28 AM
Öö Tiib <oot...@hot.ee> writes:
>On Wednesday, 8 March 2023 at 00:26:14 UTC+2, Lynn McGuire wrote:
>> On 3/7/2023 4:08 PM, Richard wrote:
>> > [Please do not mail me a copy of your followup]
>> >
>> > Lynn McGuire <lynnmc...@gmail.com> spake the secret code
>> > <tu8at1$ijm1$1...@dont-email.me> thusly:
>> >
>> >> "C++ on the Move" by Darryl K. Taft
>> >
>> > A ho-hum article IMO. You've posted better :).
>>
>> I was curious what people thought of the last paragraph, "“Now bring
>> Rust to the table — a language well-suited for the low-level programming
>> that C++ typically dominates, but with built-in memory safety and other
>> modern features,” Campbell said. “When luminaries like Mark Russinovich
>> are saying things like [‘…halt starting any new projects in C/C++ and
>> use Rust…’], C++ must evolve or be left behind. The question is, at what
>> price?” he said."
>> https://twitter.com/markrussinovich/status/1571995117233504257?lang=en
>>
>Most people agree that C++ contains too many features. They only differ
>by opinion where to cut. I would put most effort into places where it says
>"undefined behavior". I think half of such places can be replaced with
>"program is ill formed", "std::terminate is called", or one of the features
>that lead to the described situation can be cut out (as badly thought through).

The nice thing about C++ is that one need not use any of the new crap.

Keith Thompson

Mar 8, 2023, 12:53:09 PM
I'd be interested in seeing some cases of undefined behavior that could
reasonably be detected at compile time or at run time. I'm sure there
are some, but I'd be surprised if it were as many as half.

Just one example: Signed integer overflow has undefined behavior, and
defining it would require runtime checks.

--
Keith Thompson (The_Other_Keith) Keith.S.T...@gmail.com
Working, but not speaking, for XCOM Labs
void Void(void) { Void(); } /* The recursive call of the void */

Malcolm McLean

Mar 8, 2023, 1:18:43 PM
Not on the vast majority of platforms. However, you'd have to define the
behaviour as a two's complement wrap.

Alf P. Steinbach

Mar 8, 2023, 1:20:53 PM
I guess you meant to write, "detecting it", not "defining it".

Defining signed overflow as wrap-around requires no runtime checks, now
that C++ integers are guaranteed two's complement.

Some optimization opportunities would be lost, but maybe others gained.
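
(One data point, a tiny sketch: gcc and clang already offer exactly that
as an opt-in dialect via -fwrapv.)

// Compile with: g++ -fwrapv example.cpp
// Under -fwrapv, signed overflow is defined as two's complement wrapping,
// with no run-time checks inserted:
int next(int i) {
    return i + 1;  // with -fwrapv, INT_MAX + 1 wraps to INT_MIN
}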


- Alf

David Brown

Mar 8, 2023, 1:47:57 PM
Or one need not use the old crap, and can use the new better features
instead.

Or a mixture. Or all. Or nearly none.

It's a big language, and a big library - few people are going to have
use for all of it, but the subset you find useful will vary widely from
person to person.


James Kuyper

Mar 8, 2023, 2:10:59 PM
On 3/8/23 13:20, Alf P. Steinbach wrote:
> On 2023-03-08 6:52 PM, Keith Thompson wrote:
...
>> I'd be interested in seeing some cases of undefined behavior that could
>> reasonably be detected at compile time or at run time. I'm sure there
>> are some, but I'd be surprised it were as many as half.
>>
>> Just one example: Signed integer overflow has undefined behavior, and
>> defining it would require runtime checks.
>
> I guess you meant to write, "detecting it", not "defining it".

No, he meant "defining it". If they were to choose to define the
behavior of signed integer overflow, then on platforms where the defined
behavior was not the same as that which would naturally occur, run-time
checks would be required to change the behavior.

Michael S

Mar 8, 2023, 2:21:51 PM
That's not 100% true.
Two examples of [not so] new non-crap libraries that are entangled with
[not so] new crap - std::chrono and std::random.
The former is infected by the rational-numbers crap.
The latter is soiled both with <functional> and with std::ref.
And both of them misuse operator overloading.
Not that I am aware of many places in the standard library where operator
overloading is not misused.

David Brown

Mar 8, 2023, 2:46:05 PM
Wrapping is not the only option. But since wrapping is the most useless
behaviour for signed integer overflow, doing something helpful (such as
the suggested call to std::terminate, or other error message to aid
debugging) /would/ require run-time checks.


Scott Lurndal

Mar 8, 2023, 2:49:54 PM
Michael S <already...@yahoo.com> writes:
>On Wednesday, March 8, 2023 at 6:10:28 PM UTC+2, Scott Lurndal wrote:
>> Öö Tiib <oot...@hot.ee> writes:
>> >On Wednesday, 8 March 2023 at 00:26:14 UTC+2, Lynn McGuire wrote:
>> >> On 3/7/2023 4:08 PM, Richard wrote:
>> >> > [Please do not mail me a copy of your followup]
>> >> >
>> >> > Lynn McGuire <lynnmc...@gmail.com> spake the secret code
>> >> > <tu8at1$ijm1$1...@dont-email.me> thusly:
>> >> >
>> >> >> "C++ on the Move" by Darryl K. Taft
>> >> >
>> >> > A ho-hum article IMO. You've posted better :).
>> >>
>> >> I was curious what people thought of the last paragraph, "“Now bring
>> >> Rust to the table — a language well-suited for the low-level programming
>> >> that C++ typically dominates, but with built-in memory safety and other
>> >> modern features,” Campbell said. “When luminaries like Mark Russinovich
>> >> are saying things like [‘…halt starting any new projects in C/C++ and
>> >> use Rust…’], C++ must evolve or be left behind. The question is, at what
>> >> price?” he said."
>> >> https://twitter.com/markrussinovich/status/1571995117233504257?lang=en
>> >>
>> >Most people agree that C++ contains too many features. They only differ
>> >by opinion where to cut. I would put most effort into places where it says
>> >"undefined behavior". I think half of such places can be replaced with
>> >"program is ill formed", "std::terminate is called", or one of the features
>> >that lead to the described situation can be cut out (as badly thought through).
>> The nice thing about C++ is that one need not use any of the new crap.
>
>That's not 100% true.
>Two examples of [not so] new non-crap libraries that are entangled with
>[not so] new crap - std::chrono and std::random.

There are platform native interfaces providing the functionality of both
std::chrono and std::random, so yes, it's true.

Michael S

Mar 8, 2023, 3:08:32 PM
The whole point of std::chrono and std::random is that they are cross-platform.
Maybe you don't care about anything non-Linux, but I certainly do.

Keith Thompson

Mar 8, 2023, 3:28:30 PM
I meant "defining it", as in making the behavior defined rather than
undefined.

> Defining signed overflow as wrap-around requires no runtime checks,
> now that C++ integers are guaranteed two's complement.

Probably true. Two's complement representation doesn't necessarily
imply two's complement behavior on overflow, but that's probably the
default behavior on most platforms.

> Some optimization opportunities would be lost, but maybe others gained.

The only virtue of defining signed overflow as two's complement
wraparound is that it's convenient for the implementation. For users,
it would provide consistency, but having INT_MAX+1 yield INT_MIN is not
likely to be *useful*. If you want the language to define the behavior,
there are a number of options. I wasn't assuming wraparound. (Throwing
an exception is another possibility.)
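
(For illustration, a hand-written sketch of the exception option - nothing
the language does for you today, and the function name is made up:)

#include <limits>
#include <stdexcept>

// What "overflow throws" semantics could look like if the language (or a
// library) generated checks like this for you.
int checked_add(int a, int b) {
    if (b > 0 ? a > std::numeric_limits<int>::max() - b
              : a < std::numeric_limits<int>::min() - b)
        throw std::overflow_error("signed integer overflow");
    return a + b;
}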

Michael S

Mar 8, 2023, 3:48:24 PM
My favorite example of usefulness of wrapping signed overflow behavior
is CIC (a.k.a. Hogenauer) interpolation and decimation filters.

Keith Thompson

Mar 8, 2023, 3:59:22 PM
Perhaps it would have been better if I hadn't provided an example.
Nothing wrong with talking about signed integer overflow, but it wasn't
what I intended to focus on.

Öö Tiib speculated that about half of the cases of undefined behavior
could be replaced by "program is ill formed" (detectable at compile
time) or "std::terminate is called" (detectable at run time).

I'm sure that *some* cases of undefined behavior could be made
not-undefined, but I'm skeptical that as many as half could be.

(Counting cases of UB is tricky, especially given the cases that are
undefined by omission.)

Alf P. Steinbach

Mar 8, 2023, 4:34:20 PM
Can you give a concrete (even if hypothetical) example of what you mean?

- Alf

Alf P. Steinbach

Mar 8, 2023, 4:40:31 PM
On 2023-03-08 9:48 PM, Michael S wrote:
> On Wednesday, March 8, 2023 at 10:28:30 PM UTC+2, Keith Thompson wrote:
>> The only virtue of defining signed overflow as two's complement
>> wraparound is that it's convenient for the implementation. For users,
>> it would provide consistency, but having INT_MAX+1 yield INT_MIN is not
>> likely to be *useful*.
>
> My favorite example of usefulness of wrapping signed overflow behavior
> is CIC (a.k.a. Hogenauer) interpolation and decimation filters.

In Windows, the millisecond-resolution `GetTickCount()` function wraps
every 49.7 days. Happily the result is an unsigned type. So, as long as
the time between two invocations is less than 49 days, one can simply
subtract the earlier reading from the later one to find the elapsed time.
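
(A sketch of the idiom, using the real Win32 API; the helper name is made up:)

#include <windows.h>

void do_the_work();  // hypothetical stand-in for the code being timed

DWORD elapsed_ms_of_work() {
    DWORD start = GetTickCount();   // ms since boot, wraps every ~49.7 days
    do_the_work();
    return GetTickCount() - start;  // unsigned subtraction is defined modulo
                                    // 2^32, so this stays correct across a
                                    // single wrap of the counter
}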

With signed overflow defined as wrapping, that function could have had a
signed return type, without causing inadvertent wrapping problems in
comparisons and arithmetic involving negative values.

So it's a bit about safety - about being able to choose a signed type freely.


- Alf

Keith Thompson

Mar 8, 2023, 5:29:29 PM
"Alf P. Steinbach" <alf.p.s...@gmail.com> writes:
For example, a future C++ standard might specify that signed integer
overflow throws a language-defined exception, or it might require
saturation (so INT_MAX + 1 == INT_MAX). If the target hardware
doesn't support that in some way, then checks have to be added.

If the standard defined signed integer overflow as two's-complement
wraparound (INT_MAX + 1 == INT_MIN), additional checks would not be
required on most target systems (perhaps all current ones). I didn't
have that specific definition in mind.

Or it might define out-of-bounds array indexing (for plain C-like
arrays) to wrap around, so that given `int a[10];` the expression
`a[15]` or `a[-5]` would be equivalent to `a[5]`. Or it could
require an exception to be thrown.

Ada defines signed integer overflow and out-of-bounds indexing to
raise an exception, so there is precedent. Both require the compiler
to insert run-time checks, which can be removed during optimization
if the compiler can prove that the error will never occur. And there
are mechanisms to suppress the checks; if an exception would have
been raised but the check is suppressed, the behavior is undefined
(Ada calls it "erroneous execution").

Again, I didn't intend to concentrate on the case of signed integer
overflow. My question was about which cases of undefined behavior
could be made into defined behavior without adding runtime checks.

James Kuyper

Mar 8, 2023, 6:11:54 PM
On 3/8/23 16:34, Alf P. Steinbach wrote:
> On 2023-03-08 8:10 PM, James Kuyper wrote:
...
>> No, he meant "defining it". If they were to choose to define the
>> behavior of signed integer overflow, then on platforms where the defined
>> behavior was not the same as that which would naturally occur, run-time
>> checks would be required to change the behavior.
>
> Can you give a concrete (even if hypothetical) example of what you mean?

David has already done so.
The committee could define the behavior as wrapping, in a fashion
compatible with 2's complement behavior. In that case, run-time checks
would be required on any platform where that wasn't the native behavior - a
small number of platforms, to be sure, but they do exist.

However, as David pointed out, wrapping is the most useless possible
behavior on signed overflow - it almost never gives a useful result.
Signed overflow almost always is the result of an error of some kind,
and should be treated as such. If the committee were, for example, to
choose to define the behavior as calling std::terminate(), it would be
more useful, but that would require run-time checks on virtually every
existing system.

David Brown

Mar 8, 2023, 6:59:34 PM
It is also perhaps worth noting that if signed integer overflow is left
undefined, some optimisations are possible for the compiler that are not possible
if overflow is required to be wrapping. That might not qualify as
"run-time checks", but it is certainly "run-time overhead" in comparison
to the current language definition.


David Brown

Mar 8, 2023, 7:09:04 PM
On 08/03/2023 21:48, Michael S wrote:
> On Wednesday, March 8, 2023 at 10:28:30 PM UTC+2, Keith Thompson wrote:

>> The only virtue of defining signed overflow as two's complement
>> wraparound is that it's convenient for the implementation. For users,
>> it would provide consistency, but having INT_MAX+1 yield INT_MIN is not
>> likely to be *useful*.
>
> My favorite example of usefulness of wrapping signed overflow behavior
> is CIC (a.k.a. Hogenauer) interpolation and decimation filters.
>

I think that still counts as "not likely to be useful". Yes, two's
complement wrapping on overflow is /occasionally/ useful, but only very
occasionally. And for many DSP algorithms, saturation would be much
more helpful than wrapping.

Overflow of fixed-size integers will, in general, give the wrong answer
- no matter how it is defined. Therefore, there is no point in defining
it to have a particular choice of wrong behaviour, to the detriment of
other aspects (such as optimisation, or the freedom for tools or
sanitizers to temporarily add run-time checks for debugging). Whatever
you do, some basic laws of normal mathematical integer arithmetic will
have to be broken - having the overflow behaviour left undefined
actually keeps more of them, and more useful laws (such as "x + 1 > x"
for any "x") than defined behaviours.

If there had to be one single defined behaviour, then saturation is the
only reasonable one. If you have a marble jar, and keep adding marbles,
it will reach a limit - try to add more, and you will still have the
same number of marbles in the jar (and more on the floor). It would be
mind-bogglingly insane to suggest that adding one more marble to a full
jar would give you a negative jarful of marbles.

David Brown

Mar 8, 2023, 7:12:45 PM
Wrapping /can/ occasionally be useful (this kind of thing is one such
case). Fortunately, there's an easy answer, even if the timers are
signed values - convert to unsigned, do the subtraction, convert back to
signed (if you want). That is all well-defined behaviour (albeit
partially implementation-dependent).
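
(A minimal sketch of that recipe, with invented names:)

#include <cstdint>

// Elapsed ticks between two signed timestamps: do the subtraction in
// unsigned arithmetic, where wraparound is well-defined, then convert back.
std::int32_t elapsed(std::int32_t now, std::int32_t then) {
    std::uint32_t d = static_cast<std::uint32_t>(now) -
                      static_cast<std::uint32_t>(then);
    return static_cast<std::int32_t>(d);  // implementation-defined before
                                          // C++20, two's complement since
}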


As a bit of trivia, Windows 95 had a bug that caused it to crash as soon
as that millisecond counter rolled over. The OS had been out for over
ten years before the bug was noticed - apparently no one had managed to
keep a Windows 95 system running for 49 days without it crashing for
some other reason!


Keith Thompson

Mar 8, 2023, 7:28:01 PM
David Brown <david...@hesbynett.no> writes:
[...]
> As a bit of trivia, Windows 95 had a bug that caused it to crash as
> soon as that millisecond counter rolled over. The OS had been out for
> over ten years before the bug was noticed - apparently no one had
> managed to keep a Windows 95 system running for 49 days without it
> crashing for some other reason!

Or, perhaps more likely, nobody who had a Windows 95 system crash after
49 days ever figured out why it had crashed.

Malcolm McLean

Mar 9, 2023, 5:08:39 AM
On Thursday, 9 March 2023 at 00:09:04 UTC, David Brown wrote:
>
> If there had to be one single defined behaviour, then saturation is the
> only reasonable one. If you have a marble jar, and keep adding marbles,
> it will reach a limit - try to add more, and you will still have the
> same number of marbles in the jar (and more on the floor). It would be
> mind-bogglingly insane to suggest that adding one more marble to a full
> jar would give you a negative jarful of marbles.
>
It makes it easy to put in a check.

int marbles = N;
marbles++;
if (marbles < 0)
    /* This can't happen so we've overflowed. Take action. */

If you saturate, then you have to either put the test in before you do the calculation,
which will work for marbles++, but gets more difficult for marbles += N; or you have
to say that INT_MAX is an error value for the number of marbles. This is a headache
for anyone maintaining the code, as naturally they expect that the number of
marbles can go from 0 to INT_MAX inclusive.

Öö Tiib

Mar 9, 2023, 6:03:36 AM
There is a lot of low-hanging undefined or unspecified behavior. Take even
what I have open at the moment: <https://en.cppreference.com/w/cpp/io/c/FILE>
"While it may be possible to copy a valid std::FILE, using a pointer to such
a copy as an argument for an I/O function invokes unspecified behavior."
Such a situation made sense maybe in 1998; in 2023 we can require that
copying a std::FILE is a compile error.

> Just one example: Signed integer overflow has undefined behavior, and
> defining it would require runtime checks.
>
Ideally such run-time checks would cost something only at the hardware
level, or when failing.

So when programming languages require checks, the hardware guys have a
market for putting those in.

Unfortunately the wannabe "better C++s" tend to choose wraparound
instead of failure in case of signed overflow. Wraparound is what the
hardware already does, but it very rarely makes sense. For most
programs an integer overflow is an error, like division by zero.
I would just add something like "_Unsafe int" and "_Wrapping int"
for those who want a signed int to have undefined behavior or to
wrap around.

David Brown

Mar 9, 2023, 7:52:42 AM
On 09/03/2023 01:27, Keith Thompson wrote:
> David Brown <david...@hesbynett.no> writes:
> [...]
>> As a bit of trivia, Windows 95 had a bug that caused it to crash as
>> soon as that millisecond counter rolled over. The OS had been out for
>> over ten years before the bug was noticed - apparently no one had
>> managed to keep a Windows 95 system running for 49 days without it
>> crashing for some other reason!
>
> Or, perhaps more likely, nobody who had a Windows 95 system crash after
> 49 days ever figured out why it had crashed.
>

You are spoiling a good legend with realism! :-)

David Brown

Mar 9, 2023, 8:06:18 AM
I think a "look /before/ you leap" attitude is to be encouraged in
programming - rather than jump first and see what you can learn from the
post-mortem.

if (marbles < INT_MAX) {
    marbles++;
} else {
    // No room to proceed
}

or for gcc fans:

int m;
if (__builtin_add_overflow(marbles, 1, &m)) {
    handle_overflow();
} else {
    marbles = m;
}


Bonita Montero

Mar 9, 2023, 9:25:17 AM
On 07.03.2023 at 22:38, Lynn McGuire wrote:
> "C++ on the Move" by Darryl K. Taft
>    https://thenewstack.io/c-on-the-move/
>
> "While the C++ programming language seems to be as popular as ever,
> efforts to move it forward are ongoing both inside the ISO C++ steering
> committee as well as other projects."
>
> Lynn
>

Read this:
https://mega.nz/file/ehcR1S5B#5YcRMdcJ0WArshD0szI2bckaqVmk2jC59XmOZh_GXgM

Malcolm McLean

Mar 9, 2023, 10:15:05 AM
On Thursday, 9 March 2023 at 13:06:18 UTC, David Brown wrote:
> On 09/03/2023 11:08, Malcolm McLean wrote:
> > On Thursday, 9 March 2023 at 00:09:04 UTC, David Brown wrote:
> >>
> >> If there had to be one single defined behaviour, then saturation is the
> >> only reasonable one. If you have a marble jar, and keep adding marbles,
> >> it will reach a limit - try to add more, and you will still have the
> >> same number of marbles in the jar (and more on the floor). It would be
> >> mind-bogglingly insane to suggest that adding one more marble to a full
> >> jar would give you a negative jarful of marbles.
> >>
> > It makes it easy to put in a check.
> >
> > int marbles = N;
> > marbles++;
> > if (marbles < 0)
> > /* This can't happen so we've overflowed. Take action. */
> >
> > If you saturate, then you have to either put the test in before you do the calculation,
> > which will work for marbles++, but gets more difficult for marbles += N; or you have
> > to say that INT_MAX is an error value for the number of marbles. This is a headache
> > for anyone maintaining the code, as naturally they expect that the number of
> > marbles can go from 0 to INT_MAX inclusive.
> >
> I think a "look /before/ you leap" attitude is to be encouraged in
> programming - rather than jump first and see what you can learn from the
> post-mortem.
>
In which case you can say that signed integer arithmetic has undefined behaviour,
because you catch the overflow before you do the calculation. Which is in fact
the current situation. One benefit of defining the behaviour is that you can do
it post mortem. The other benefit is that if overflow isn't handled correctly, the bug
is consistent.

> if (marbles < INT_MAX) {
> marbles++;
> } else {
> // No room to proceed
> }
>
That's how you've got to do it currently. Depending on the flow control, in actual code
it may be more or less difficult to understand than doing the test after the increment.
Defining the behaviour on overflow doesn't prevent you doing it your way, but it gives
you another option.

David Brown

Mar 9, 2023, 2:32:41 PM
No, signed integer arithmetic has /defined/ behaviour - it is only
/overflow/ that has undefined behaviour.

> One benefit of defining the behaviour is so that you can do
> it post mortem. The other benefit is that if overflow isn''t handled correctly, the bug
> is consistent.

Consistent bugs are not necessarily better (or rather, less bad) than
inconsistent bugs. It can be helpful if a bug gives a consistent
detectable issue that points to the cause - but equally a consistent bug
could consistently hide the issue and lead you to look elsewhere for the
problem. The idea that consistent incorrect behaviour is better than
undefined behaviour is no more than a myth.

>
>> if (marbles < INT_MAX) {
>> marbles++;
>> } else {
>> // No room to proceed
>> }
>>
> That's how you've got to do it currently. Depending on the flow control, in actual code
> it may be more or less difficult to understand than doing the test after the increment.
> Defining the behaviour on overflow doesn't prevent you doing it your way, but it gives
> you another option.
>

It gives a /worse/ option that encourages bad programming habits. Do
not write code that is broken, and gets things wrong - no matter what
happens afterwards. (Note that this is very different from exceptions
in C++, where throwing an exception is part of the defined and
intentional behaviour of the code.)

But no, the snippet above is /not/ the way you want to handle things
normally. If you are checking for overflow like this, the chances are
that your code is badly designed in the first place. (There are always
exceptions and unusual cases.) You've got to ask yourself /why/ someone
might be trying to add a marble to a full jar in the first place, and
work backwards - somewhere in the earlier code there is likely to be
poorly designed code. Maybe there are too many marbles involved, or the
jar should be bigger, but something is probably wrong.

At the very least, the code should be:

if (marbles < jar_size) {
    marbles++;
} else {
    // The jar is full - maybe it is time to empty it
}

In real code, the jar would never be INT_MAX in size - thus post-mortem
checks for wrapping overflow cannot be used even if the language defined
integer arithmetic that way. Overflows are bugs - but not necessarily
in the bit of code they appear to be in.




Keith Thompson

Mar 9, 2023, 3:13:06 PM
FILE is defined in the C standard library. The C standard says:

The address of the FILE object used to control a stream may be
significant; a copy of a FILE object need not serve in place of the
original.

which means that the behavior of passing a pointer to a copied FILE
object to a library function is implicitly undefined. (I'm not sure why
cppreference says the behavior is unspecified.)

C *can't* make copying a FILE object illegal. A C++ library
implementation could implement its own FILE type as a wrapper around the
C FILE type and prohibit copying it, but then passing a C++ FILE*
pointer to a C function that expects a C FILE* pointer would be awkward
at best. Maybe you could define implicit conversions (I'm not sure
whether that's even possible), but covering all the cases would be
difficult.
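
(A sketch of roughly what such a wrapper could look like - invented names,
not a standard facility, and it only covers the easy direction:)

#include <cstdio>

// A non-copyable wrapper around the C FILE type. It converts implicitly
// to FILE* for the C functions, which covers calls like
// std::fprintf(f, ...), but certainly not every case.
class File {
public:
    explicit File(std::FILE* f) : f_(f) {}
    File(const File&) = delete;             // copying prohibited
    File& operator=(const File&) = delete;
    operator std::FILE*() const { return f_; }
private:
    std::FILE* f_;
};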

And I don't think it would even be worth the effort. FILE is treated as
an opaque type. That opacity isn't enforced, but I've never seen C or
C++ code that copies or even defines a FILE object. Real code just uses
FILE* pointers. It's a very minor corner case. If I wanted to provide
definitions for as much undefined behavior as possible, copying FILE
objects would be very low on my list.

>> Just one example: Signed integer overflow has undefined behavior, and
>> defining it would require runtime checks.
>>
> Ideally such run-time checks would cost something only at the hardware
> level, or when failing.

Ideally, sure.

> So when programming languages require checks, the hardware guys have a
> market for putting those in.
>
> Unfortunately the wannabe "better C++s" tend to choose wraparound
> instead of failure in case of signed overflow. Wraparound is what the
> hardware already does, but it very rarely makes sense. For most
> programs an integer overflow is an error, like division by zero.
> I would just add something like "_Unsafe int" and "_Wrapping int"
> for those who want a signed int to have undefined behavior or to
> wrap around.

First you have to define what you mean by "failure". Do you want
evaluating INT_MAX+1 to abort the program? Throw an exception? Emit
nasal demons?

We're stuck with hardware with wraparound behavior for the foreseeable
future. Conceivably if C and C++ started requiring trapping on signed
overflow, hardware would gradually be updated to make that more
efficient, but we wouldn't see the effects for decades.

Michael S

Mar 9, 2023, 3:43:02 PM
On Thursday, March 9, 2023 at 2:09:04 AM UTC+2, David Brown wrote:
> On 08/03/2023 21:48, Michael S wrote:
> > On Wednesday, March 8, 2023 at 10:28:30 PM UTC+2, Keith Thompson wrote:
>
> >> The only virtue of defining signed overflow as two's complement
> >> wraparound is that it's convenient for the implementation. For users,
> >> it would provide consistency, but having INT_MAX+1 yield INT_MIN is not
> >> likely to be *useful*.
> >
> > My favorite example of usefulness of wrapping signed overflow behavior
> > is CIC (a.k.a. Hogenauer) interpolation and decimation filters.
> >
> I think that still counts as "not likely to be useful". Yes, two's
> complement wrapping on overflow is /occasionally/ useful, but only very
> occasionally.


And CIC filters are just one example of a very common pattern where an
even number of integer overflows compensate each other, producing the
correct final result. What is special in the case of CIC is that the
number of overflows can be quite large. In the more common case there are
exactly two overflows compensating each other, most typically in an
expression like y1 = y0 + x1 - x0.
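
(A sketch of the two-overflow case, done in unsigned arithmetic where the
wraparound is well-defined; the names mirror the expression above:)

#include <cstdint>

// y0 + x1 may wrap, and subtracting x0 wraps back, so the result is exact
// whenever the true value of y0 + (x1 - x0) fits in 32 bits.
std::uint32_t step(std::uint32_t y0, std::uint32_t x0, std::uint32_t x1) {
    return y0 + x1 - x0;  // correct modulo 2^32; the intermediate wrap is harmless
}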

> And for many DSP algorithms, saturation would be much
> more helpful than wrapping.
>

I did quite a lot of fixed-point digital signal processing in the past.
In my personal experience saturated arithmetic is pretty much never useful.
What is useful, occasionally, is a saturating conversion of a wider signed
type into a narrower signed type.

> Overflow of fixed-size integers will, in general, give the wrong answer
> - no matter how it is defined. Therefore, there is no point in defining
> it to have a particular choice of wrong behaviour, to the detriment of
> other aspects (such as optimisation, or the freedom for tools or
> sanitizers to temporarily add run-time checks for debugging). Whatever
> you do, some basic laws of normal mathematical integer arithmetic will
> have to be broken - having the overflow behaviour left undefined
> actually keeps more of them, and more useful laws (such as "x + 1 > x"
> for any "x") than defined behaviours.
>
> If there had to be one single defined behaviour, then saturation is the
> only reasonable one.

Absolutely not!
Saturation is non-associative; that's enough to take it out of
consideration for default behavior.

> If you have a marble jar, and keep adding marbles,
> it will reach a limit - try to add more, and you will still have the
> same number of marbles in the jar (and more on the floor). It would be
> mind-bogglingly insane to suggest that adding one more marble to a full
> jar would give you a negative jarful of marbles.

Somehow, you don't consider almost the same case insane for unsigned integers.


Michael S

Mar 9, 2023, 3:48:01 PM
Your story was unrealistic to begin with.

Win95 was hardly used by anybody five years after its introduction, much
less ten years after.
Win98, especially Win98 SE, was simply the better choice in almost any
imaginable situation.

Lynn McGuire

Mar 9, 2023, 5:59:23 PM
No freaking way. Looks like a ransomware file to me.

Lynn


David Brown

Mar 9, 2023, 6:30:41 PM
Wrapping doesn't even follow basic laws on ordering, taking it out of
consideration for default behaviour.

As I said, overflow will, in general, give incorrect answers. If you
choose to define its behaviour, it doesn't matter how you do it, it will
regularly be wrong and counter-productive, even if it would be helpful
in some use-cases. It will also spoil optimisations for code that does
not overflow.

If you want wrapping behaviour, make a class that supports that.


>
>> If you have a marble jar, and keep adding marbles,
>> it will reach a limit - try to add more, and you will still have the
>> same number of marbles in the jar (and more on the floor). It would be
>> mind-bogglingly insane to suggest that adding one more marble to a full
>> jar would give you a negative jarful of marbles.
>
> Somehow, you don't consider almost the same case insane for unsigned integers.
>

What makes you say that? I think that almost all cases of overflow of
unsigned integer arithmetic are the result of bugs in the code. (There
are a few idiomatic cases, like "unsigned int x = -1;".) Modulo
arithmetic makes more sense for unsigned types than signed types, but it
is still usually wrong.

Öö Tiib

Mar 9, 2023, 10:20:57 PM
For me personally, most undefined or unspecified behaviors are individually
low on the list. But the total time spent explaining that this or that is
such behavior would be reduced, so people could deal with something more
valuable, and therefore they would have a better programming language.

A compiler is a tool, so it can be required to add a diagnostic. In the case
of FILE I see no technical difficulties for either a C or a C++ compiler. It
would probably just mean adding some __attribute__ to the type, to ensure
that the diagnostic is not about copying an incomplete type but about
copying what should not be copied.

> >> Just one example: Signed integer overflow has undefined behavior, and
> >> defining it would require runtime checks.
> >>
> > Ideally such run-time checks would cost something only at the hardware
> > level, or when failing.
>
> Ideally, sure.
>
> > So when programming languages require checks, the hardware guys have a
> > market for putting those in.
> >
> > Unfortunately the wannabe "better C++s" tend to choose wraparound
> > instead of failure in case of signed overflow. Wraparound is what the
> > hardware already does, but it very rarely makes sense. For most
> > programs an integer overflow is an error, like division by zero.
> > I would just add something like "_Unsafe int" and "_Wrapping int"
> > for those who want a signed int to have undefined behavior or to
> > wrap around.
> First you have to define what you mean by "failure". Do you want
> evaluating INT_MAX+1 to abort the program? Throw an exception? Emit
> nasal demons?
>
I would prefer an exception or a signal.

> We're stuck with hardware with wraparound behavior for the foreseeable
> future. Conceivably if C and C++ started requiring trapping on signed
> overflow, hardware would gradually be updated to make that more
> efficient, but we wouldn't see the effects for decades.
>
It is all about the trend. Most platforms have trapped on division by zero
for a very long time. Yet the standards declare that they define nothing.
That philosophy of unreliability in the most performant languages has to be
repaired before hardware designers see that there is a market for their
efforts.

Bonita Montero

Mar 9, 2023, 11:58:14 PM
Ransomware in a PDF or EPUB ?

Alf P. Steinbach

Mar 10, 2023, 3:17:48 AM
When the first .jpg virus was discovered in 2002 I was happy, well, a
little, because it was an obvious possibility to me, but my students
laughed at me when I mentioned it in the mid-1990s.

- Alf

David Brown

Mar 10, 2023, 4:36:26 AM
On 10/03/2023 04:20, Öö Tiib wrote:

> It is all about the trend. Most platforms have trapped on division by zero
> for a very long time. Yet the standards declare that they define nothing.

No, most platforms do not trap on division by zero.

The solid majority of different types of platforms that are programmed
in C (and C++ gets this stuff from C) are small microcontrollers and
other embedded processors.

The solid majority of processor cores made and sold are small
microcontrollers.


While it is true that pretty much all "big" processors and "big" OS's
will trap on division by zero, the C and C++ world is very much wider
than that. And it is absolutely correct for the standards to say
nothing about how division by zero should be handled (i.e., it should be
left undefined behaviour).

There is a strong tendency amongst programmers to think that all
processors are x86 or ARM, and all OS's are Linux or Windows.
Fortunately, the C and C++ standards committees are not as limited.

(I am of the opinion that undefined behaviour is often a good thing for
code that is clearly wrong, even if there were a consensus for how it
might be treated in most hardware. I want division by zero to be
undefined, because it means debug tools and analysers can be used to
help track such bugs in the code. But that is an orthogonal issue.)


David Brown

Mar 10, 2023, 4:44:38 AM
Yes - pdf's in particular need to be treated with caution (especially if
you use that security disaster, Acrobat Reader).

But there is nothing in that URL to indicate that the target is a pdf,
epub, or anything else - it's just an incomprehensible string of letters
targeting a web host that is probably unknown to most people. It is
/exactly/ the sort of URL you'd expect for malware. A reader would have
to be very naïve or inexperienced to click on such a link unless they
were expecting to be given such a link from someone they trust, and when
they know what it is. Your post fails on all three counts - I don't
expect many people will click on it.

If you post a link to the original source of the document (whatever it
might be), and a description of it and why it might be interesting for
people to read it, then it would be a lot more useful.

Öö Tiib

Mar 10, 2023, 7:27:23 AM
On Friday, 10 March 2023 at 11:36:26 UTC+2, David Brown wrote:
> On 10/03/2023 04:20, Öö Tiib wrote:
>
> > It is all about the trend. Most platforms have trapped on division by zero
> > for a very long time. Yet the standards declare that they define nothing.
> No, most platforms do not trap on division by zero.
>
> The solid majority of different types of platforms that are programmed
> in C (and C++ gets this stuff from C) are small microcontrollers and
> other embedded processors.
>
> The solid majority of processor cores made and sold are small
> microcontrollers.
>
The reasoning of the hardware designers is: "division by 0 is undefined
anyway, and we counted that setting all bits of the result to 1 takes a few
fewer internal gates than setting all bits to 0 does."
So the whole reason for such a design is the lack of a market for trapping
on division by zero there.

> While it is true that pretty much all "big" processors and "big" OS's
> will trap on division by zero, the C and C++ world is very much wider
> than that. And it is absolutely correct for the standards to say
> nothing about how division by zero should be handled (i.e., it should be
> left undefined behaviour).
>
C and C++ do not want to be portable assemblers for whatever chips;
they want to be programming languages. From that viewpoint there
is no reason why something that the hardware does not handle (even
though the performance cost of handling it would be hard to notice)
should be left undefined behavior.

> There is a strong tendency amongst programmers to think that all
> processors are x86 or ARM, and all OS's are Linux or Windows.
> Fortunately, the C and C++ standards committees are not as limited.
>
> (I am of the opinion that undefined behaviour is often a good thing for
> code that is clearly wrong, even if there were a consensus for how it
> might be treated in most hardware. I want division by zero to be
> undefined, because it means debug tools and analysers can be used to
> help track such bugs in the code. But that is an orthogonal issue.)
>
The only "positive" thing of pointless undefined behaviors is job
security. All the time I have wasted into memorising those and
learning usage of tools to track those down gives me advantage
on job market. But the work is anyway plentiful, largely badly or
not done and I could do something more interesting instead of
helping to track down some undefined behavior.

Bonita Montero

Mar 10, 2023, 7:37:43 AM
On 10.03.2023 at 09:17, Alf P. Steinbach wrote:

> When the first .jpg virus was discovered in 2002 I was happy, well, a
> little, because it was an obvious possibility to me, but my students
> laughed at me when I mentioned it in the mid-1990s.

There might be inconsistent content inside the file which causes an
intentional "misinterpretation" by the parser. But that's rather
unlikely.

Michael S

Mar 10, 2023, 8:10:21 AM
Today .jpg viruses are indeed not likely. But PDF is inherently more
vulnerable than JPEG.

Even on the most basic level of functionality, PDF is an encapsulated
PostScript, which is a Turing-complete programming language. Above that,
PDFs often contain other embedded formats that are expected to be displayed
by the viewer. And above all that, many PDF viewers support JS scripts in
the documents.

Malcolm McLean

Mar 10, 2023, 8:13:58 AM
On Friday, 10 March 2023 at 12:27:23 UTC, Öö Tiib wrote:
> On Friday, 10 March 2023 at 11:36:26 UTC+2, David Brown wrote:
>
> > While it is true that pretty much all "big" processors and "big" OS's
> > will trap on division by zero, the C and C++ world is very much wider
> > than that. And it is absolutely correct for the standards to say
> > nothing about how division by zero should be handled (i.e., it should be
> > left undefined behaviour).
> >
> C and C++ do not want to be portable assemblers for whatever chips;
> they want to be programming languages. From that viewpoint there
> is no reason why something that the hardware does not handle (even
> though the performance cost of handling it would be hard to notice)
> should be left undefined behavior.
>
If we have

int x;
...
x++;

Then most C/C++ programmers would say that it's essential that this compiles
to

    Load accumulator with contents of memory location x.
    Increment accumulator.
    Store accumulator at memory location x.

and not

    Load accumulator with contents of memory location x.
    Test accumulator for equality to INT_MAX.
    Jump if equal to error handling code.
    Increment accumulator.
    Store accumulator at memory location x.

The details will vary with the specific processor, of course.
Otherwise the case for using C/C++ is much weaker. People accept the
difficulties of fixed-width types and the like for the increased efficiency.
And in a real program it's both unlikely that you can do anything useful if
x overflows, and unlikely that overflow is possible at all, because of other
constraints on the values x can hold.

Öö Tiib

Mar 10, 2023, 9:05:32 AM
Whether that is so with most programmers or not, I have no way of knowing.
If it is, they are provably mistaken. I believe that experienced programmers
tend to know that compilers produce programs that do what the standard
requires, and undefined behavior in the standard means that it requires
nothing. For example, gcc 11.2 for x86-64 with -O3 does so:
<https://godbolt.org/z/Y6bTP3MK3>
There the compiler "optimizes" a simple loop up to 9 into an infinite loop
because of undefined behavior.
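
(The linked snippet is not reproduced here; a loop of roughly this shape -
my reconstruction - shows the effect:)

#include <cstdio>

int main() {
    // Looks finite: with wrapping, i would eventually overflow to INT_MIN
    // and the condition i >= 0 would fail. Because signed overflow is
    // undefined, the compiler may assume i++ never overflows, fold
    // "i >= 0" to true, and emit a truly infinite loop.
    for (int i = 0; i >= 0; i++) {
        if (i % 100000000 == 0)
            std::printf("%d\n", i);
    }
    return 0;
}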

> The details will vary with the specific processor, of course.
> Otherwise the case for using C/C++ is much weaker. People accept the
> difficulties of fixed-width types and the like for the increased efficiency.
> And in a real program it's both unlikely that you can do anything useful if
> x overflows, and unlikely that overflow is possible at all, because of other
> constraints on the values x can hold.
>
Yes, the details will vary per processor, compiler and compiler options,
but undefined behavior has every right not to manifest now, and then to do
so in an insulting manner when your boss tries to demonstrate the "product"
to key investors.

Bonita Montero

Mar 10, 2023, 9:19:09 AM
On 10.03.2023 at 14:10, Michael S wrote:

> Today .jpg viruses are indeed not likely.

Show me any recent incident ...

> But PDF is inherently more vulnerable than jpeg.

Only if you use something like Acrobat or Foxit that is capable of executing
embedded JavaScript. With other readers that's very unlikely.

> Even on the most basic level of functionality, PDF is an encapsulated
> PostScript, which is a Turing-complete programming language. ...

PDF is stripped-down PostScript without those capabilities.
And PostScript can't access external resources.


David Brown

Mar 10, 2023, 9:54:57 AM
On 10/03/2023 13:27, Öö Tiib wrote:
> On Friday, 10 March 2023 at 11:36:26 UTC+2, David Brown wrote:
>> On 10/03/2023 04:20, Öö Tiib wrote:
>>
>>> It is all about the trend. Most platforms have trapped on division by zero
>>> for a very long time. Yet the standards declare that they define nothing.
>> No, most platforms do not trap on division by zero.
>>
>> The solid majority of different types of platforms that are programmed
>> in C (and C++ gets this stuff from C) are small microcontrollers and
>> other embedded processors.
>>
>> The solid majority of processor cores made and sold are small
>> microcontrollers.
>>
> The reasoning of the hardware designers is: "division by 0 is undefined
> anyway, and we counted that setting all bits of the result to 1 takes a few
> fewer internal gates than setting all bits to 0 does."
> So the whole reason for such a design is the lack of a market for trapping
> on division by zero there.

Are you basing this on your intimate knowledge of the design decisions
made by multiple cpu design teams?

Everyone who has made it through primary school knows it makes no sense
to divide by zero. It therefore makes no sense for a processor to aim
for any particular result when someone tries to do the impossible. If a
processor already has strong support for hardware exceptions as a
debugging aid and for safe handling of multiple independent programs (as
you get on a "big" general purpose cpu), then it is likely to have a
trap of some sort on division by zero. If it is a chip optimised for
low cost, size or power, for dedicated single-program microcontroller
usage, then division by zero can be a "don't care". It might cause a
reset, a hang, a nonsense result, a consistent and documented result -
whatever.

It would be a terrible idea for an efficient language to try to define
what happens here, forcing compilers to generate pointless extra
run-time checks just because some people have bugs in their code.

>
>> While it is true that pretty much all "big" processors and "big" OS's
>> will trap on division by zero, the C and C++ world is very much wider
>> than that. And it is absolutely correct for the standards to say
>> nothing about how division by zero should be handled (i.e., it should be
>> left undefined behaviour).
>>
> C and C++ do not want to be portable assemblers for whatever chips;
> they want to be programming languages.

I have no idea what you are talking about here. C and C++ are not, and
never have been, "portable assemblers" - they are high level programming
languages defined by standards and abstract machines, not the behaviour
of particular processors. That is precisely why trying to retrofit the
language standards to match particular processor hardware is such a bad
idea.

> From that viewpoint there
> is no reason why something that the hardware does not handle (even
> though the performance cost of handling it would be hard to notice)
> should be left undefined behavior.
>

Of course there is. You have got this whole thing ass-backwards. (For
any Americans here who don't understand that phrase, it means putting
the donkey behind the cart.)

C and C++ leave a number of things as "implementation dependent", when
it is reasonable to expect that all hardware will be able to efficiently
support /some/ behaviour, but the details of what the behaviour is can
vary a lot.

They leave some things as "undefined behaviour" when it is reasonable to
expect that not all hardware can efficiently support /any/ defined and
consistent behaviour.

But there are many other reasons for having "undefined behaviour". That
also includes cases where there simply is no sensible concept of
"correct" behaviour, or even of expected and useful behaviour. Division
by zero (for integers) is an example. No matter what definition you
pick for it, it will be wrong. That includes trapping, or returning
particular values, or setting processor flags.

If you try to divide by zero, you have a bug in your code. The bug is
/before/ the division. That applies equally to languages that do
run-time checking and throw exceptions for division by zero - your code
is nonsensical, and therefore wrong.

Hardware cannot fix the bug in your program. The compiler cannot fix
the bug, nor can run-time checks. (Sometimes these can help you find
the bug.)

It makes no sense for a language to try to define behaviour on division
by zero - it is far better for the language to say "don't do that". Why
someone would then want the language to say what will happen when you do
something contrary to the rules of the language, is beyond my understanding.



>> There is a strong tendency amongst programmers to think that all
>> processors are x86 or ARM, and all OS's are Linux or Windows.
>> Fortunately, the C and C++ standards committees are not as limited.
>>
>> (I am of the opinion that undefined behaviour is often a good thing for
>> code that is clearly wrong, even if there were a consensus for how it
>> might be treated in most hardware. I want division by zero to be
>> undefined, because it means debug tools and analysers can be used to
>> help track such bugs in the code. But that is an orthogonal issue.)
>>
> The only "positive" thing of pointless undefined behaviors is job
> security. All the time I have wasted into memorising those and
> learning usage of tools to track those down gives me advantage
> on job market. But the work is anyway plentiful, largely badly or
> not done and I could do something more interesting instead of
> helping to track down some undefined behavior.

What a load of drivel.

No one wants /pointless/ undefined behaviour - that is a tautology. I
have already explained why undefined behaviour in cases like division by
zero and signed integer overflow is directly /useful/ and /beneficial/.
If you don't accept my arguments there, fair enough. But at least
accept that some programmers view things that way.

No programmer has to memorise undefined behaviour. /Everything/ is
undefined except the behaviours that are explicitly defined by the
language (and/or documented extensions in a particular tool). You have
to memorise the /defined/ behaviours in C and C++ - this is called
"learning the language". If you don't know what the code you are
writing actually means, you are not doing your job correctly. (Of
course we all make mistakes sometimes - part of the job is finding and
fixing these.)

And the examples discussed here, such as division by zero, are so
obviously incorrect code that you should not have to learn the rules
specifically.


(There are certainly things in the C and C++ standards that are
explicitly labelled as "undefined behaviour" that could be changed to
required diagnostics for the compiler or linker, or perhaps to
implementation-dependent behaviour or even fully defined behaviour. But
division by 0 and signed integer overflow are not amongst them.)




David Brown

Mar 10, 2023, 9:59:48 AM
On 10/03/2023 14:13, Malcolm McLean wrote:
> On Friday, 10 March 2023 at 12:27:23 UTC, Öö Tiib wrote:
>> On Friday, 10 March 2023 at 11:36:26 UTC+2, David Brown wrote:
>>
>>> While it is true that pretty much all "big" processors and "big" OS's
>>> will trap on division by zero, the C and C++ world is very much wider
>>> than that. And it is absolutely correct for the standards to say
>>> nothing about how division by zero should be handled (i.e., it should be
>>> left undefined behaviour).
>>>
>> C and C++ do not want to be portable assemblers for whatever chips;
>> they want to be programming languages. From that viewpoint there
>> is no reason why something that the hardware does not handle (even
>> though the performance cost of handling it would be hard to notice)
>> should be left undefined behavior.
>>
> If we have
>
> int x;
> ...
> x++;
>
> Then most C/C++ programmers would say that it's essential that this compiles
> to
> Load accumulator with contents of memory location x.
> Increment accumulator.
> Store accumulator at memory location x.
>

No, most C or C++ programmers would /not/ say that. Most have no idea
what an "accumulator" is. (And most use processors that do not have an
accumulator.) Then there are the programmers who understand how the
languages are defined and how compilers work, and would certainly not
say that it is essential for a compiler to generate code like that.


> and not
> Load accumulator with contents of memory location x.
> Test accumulator for equality to INT_MAX.
> Jump if equal to error handling code.
> Increment accumulator.
> Store accumulator at memory location x.
>

If you are trying to say that most C and C++ programmers expect
compilers to generate efficient object code for simple source code,
without extra run-time checks, then say that.

> The details will vary with the specific processor, of course.
> Otherwise the case for using C/C++ is much weaker. People accept the
> difficulties of fixed-width types and the like for the increased efficiency.
> And in a real program it's both unlikely that you can do anything useful if
> x overflows, and unlikely that overflow is possible at all, because of other
> constraints on the values x can hold.

Agreed.


Scott Lurndal

Mar 10, 2023, 10:01:49 AM
For that:

unsigned int x = ~0u;

Malcolm McLean

Mar 10, 2023, 10:12:29 AM
On Friday, 10 March 2023 at 14:54:57 UTC, David Brown wrote:
> On 10/03/2023 13:27, Öö Tiib wrote:
> >
> If you try to divide by zero, you have a bug in your code. The bug is
> /before/ the division. That applies equally to languages that do
> run-time checking and throw exceptions for division by zero - your code
> is nonsensical, and therefore wrong.
>
> Hardware cannot fix the bug in your program. The compiler cannot fix
> the bug, nor can run-time checks. (Sometimes these can help you find
> the bug.)
>
> It makes no sense for a language to try to define behaviour on division
> by zero - it is far better for the language to say "don't do that". Why
> someone would then want the language to say what will happen when you do
> something contrary to the rules of the language, is beyond my understanding.
>
There's a point in saying, for example, that division by zero will always terminate
the program with an error message. Then if you do have a division by zero bug,
that's easier to find than if the result is set to a random value. Rarely would you
wish to generate this behaviour directly, but an exception is in an environment
with poor diagnostic facilities. Then you can put a division by zero in the code,
and if the error message triggers, you know that execution has reached that point.
Sometimes you need to resort to these stratagems.

If floating-point division by zero yields a non-signalling NaN, you might want
to use that as a deliberate way to generate a NaN, which can sometimes be useful.

David Brown
Mar 10, 2023, 10:13:21 AM
On 10/03/2023 14:10, Michael S wrote:
> On Friday, March 10, 2023 at 2:37:43 PM UTC+2, Bonita Montero wrote:
>> Am 10.03.2023 um 09:17 schrieb Alf P. Steinbach:
>>
>>> When the first .jpg virus was discovered in 2002 I was happy, well
>>> a little, because it was an obvious possibility to me but my students
>>> laughed at me when I mentioned it, middle 1990's.
>> There might be inconsistent content inside the file which might
>> cause intentional "misinterpretation" by the parser. But that's
>> rather unlikely.
>
> Today .jpg viruses are indeed not likely. But PDF is inherently more vulnerable
> than jpeg.
>

Yes. The JPEG format does not include executable code of any kind, so its
use as a malware format relies on bugs in JPEG handling software that
incorrectly execute data in the file as code. This is how the infamous
jpeg viruses worked on Windows (relying on bugs in Internet Exploder, IIRC).

Many other file formats /do/ support executable code. Some of
Microsoft's font file formats, for example, can happily include any kind
of executable code.

> Even at the most basic level of functionality, PDF is encapsulated PostScript,
> which is a Turing-complete programming language. Above that, PDFs often contain
> other embedded formats that are expected to be displayed by the viewer.
> And above all that many PDF viewers support JS scripts in the documents.

PDF files support only a very small fraction of Postscript, and are not
(AFAIUI) suitable for malware in themselves. At worst, you might have a
PDF that contains infinite loops or other denial-of-service attacks.
Postscript files are much more powerful in terms of their programming
language.

But as you say, the key vector for PDF malware is embedded Javascript.
PDF files can also contain built-in attachments - if an attached
executable file can be run from Javascript via an insecure reader, or if
the user can be persuaded to start it themselves, then there are few
limits to how bad it can be.

Kalevi Kolttonen
Mar 10, 2023, 10:14:23 AM
David Brown <david...@hesbynett.no> wrote:
> No, most C or C++ programmers would /not/ say that. Most have no idea
> what an "accumulator" is.

I am a quite lousy C programmer who knows only the basics of C++, but
I have also programmed a little bit in assembly. I certainly do know how
a simple CPU works, and I'd expect this to be very common knowledge
among C programmers. It is not rocket science unless you intend to
get intimately familiar with all the features of a modern, complex CPU.

The "register" keyword is probably the closest that C language comes
to a CPU and the compilers are free to ignore "register", but if you
learn the C language, you should also learn the basics of how a
CPU and memory access works.

It is also good to know something about how a C compiler works, so
that you understand how simple C programs are translated into
assembly. When I went to my University over 20 years ago, all these
things were required knowledge very early in the education process.

br,
KK

Malcolm McLean
Mar 10, 2023, 10:23:47 AM
That check might not go in. I'm showing what the run-time check would be, on
a simple imaginary but realistic basic processor. That's more effective than
talking in vague generalities.

David Brown
Mar 10, 2023, 10:39:01 AM
Yes, but the advantage of the "-1" version is that it scales
automatically to the size of the variable - you don't need to write
"~0ull" or whatever.

David Brown
Mar 10, 2023, 10:46:03 AM
On 10/03/2023 16:14, Kalevi Kolttonen wrote:
> David Brown <david...@hesbynett.no> wrote:
>> No, most C or C++ programmers would /not/ say that. Most have no idea
>> what an "accumulator" is.
>
> I am a quite lousy C programmer who knows only the basics of C++, but
> I have also programmed a little bit in assembly. I certainly do know how
> a simple CPU works, and I'd expect this to be very common knowledge
> among C programmers. It is not rocket science unless you intend to
> get intimately familiar with all the features of a modern, complex CPU.
>

Most programmers know nothing about assembly.

And even many of those that do, would not be particularly familiar with
an "accumulator" - or would use the term incorrectly. Most modern
processors are not accumulator based. (x86 has a heritage that
stretches back to an accumulator-based model, but has long ceased to
follow that style.)

(I think it is good for C and C++ programmers to have some familiarity
with assembly on their main target processors.)

> The "register" keyword is probably the closest that C language comes
> to a CPU and the compilers are free to ignore "register", but if you
> learn the C language, you should also learn the basics of how a
> CPU and memory access works.

Yes (although compilers can't quite ignore "register" - they are obliged
to complain if you try to take the address of a "register" variable or
parameter).

>
> It is also good to know something about how a C compiler works, so
> that you understand how simple C programs are translated into
> assembly. When I went to my University over 20 years ago, all these
> things were required knowlegde very early in the education process.
>

I agree. But I don't think many programmers /are/ familiar with the
workings of processors. I have no statistics, but I don't think more
than a fraction of C and C++ programmers learned programming from
comprehensive university-level education - short courses and
self-learning are, I think, more common.



David Brown
Mar 10, 2023, 10:48:20 AM
Perhaps that is true.

David Brown
Mar 10, 2023, 10:53:31 AM
I agree that trapping can be useful as a debugging aid - though it is
important to remember that the bug is /not/ at the point when the trap
comes, but some unknown time prior to that.

But it would be wrong for the language to /require/ such behaviour. You
can be looking for different features at different stages of
development, or for different kinds of program and different kinds of
target. Leaving the behaviour undefined in the standards gives users
and toolchain developers the freedom to provide options and better
features, while defining the behaviour limits things and leaves some
platforms obliged to generate inefficient code unnecessarily, while
simultaneously stopping more powerful tools from providing more aid to the
developer.


> If division by zero raises a non-signalling NaN, you might want to use that to generate
> a NaN, which sometimes can be useful.

If you are talking about floating point here, the mathematical and
programming model is significantly different from integer arithmetic.

Kalevi Kolttonen
Mar 10, 2023, 11:02:59 AM
David Brown <david...@hesbynett.no> wrote:
> [...] But I don't think many programmers /are/ familiar with the
> workings of processors. I have no statistics, but I don't think more
> than a fraction of C and C++ programmers learned programming from
> comprehensive university-level education - short courses and
> self-learning are, I think, more common.

A long time ago I heard of a Java programmer who found C to be
totally incomprehensible. Java was the only language he knew, and
as Java is full of mandatory classes/objects everywhere, his
understanding of a computer's inner workings was based very heavily
on the notion of classes/objects.

For instance, he could not see how C code could even exist
"outside" any classes. I am convinced he had no understanding of
how Von Neumann machine based CPUs work either.

He could have been a brilliant Java programmer, though.

br,
KK

Öö Tiib
Mar 10, 2023, 11:20:51 AM
I nowhere argued that division by zero is a correct thing to do.

> It would be a terrible idea for an efficient language to try to define
> what happens here, forcing compilers to generate pointless extra
> run-time checks just because some people have bugs in their code.
>
I understand that you do not want undefined behavior to be reduced.

> >
> >> While it is true that pretty much all "big" processors and "big" OS's
> >> will trap on division by zero, the C and C++ world is very much wider
> >> than that. And it is absolutely correct for the standards to say
> >> nothing about how division by zero should be handled (i.e., it should be
> >> left undefined behaviour).
> >>
> > C and C++ do not want to be portable assemblers for whatever chips;
> > they want to be programming languages.
>
> I have no idea what you are talking about here. C and C++ are not, and
> never have been, "portable assemblers" - they are high level programming
> languages defined by standards and abstract machines, not the behaviour
> of particular processors. That is precisely why trying to retrofit the
> language standards to match particular processor hardware is such a bad
> idea.
>
They never have been. So the language should not be defined by what
particular processor hardware does.
That is building a misrepresentation and then saying that the result is
beyond your understanding. I neither do nor want to do anything contrary
to the rules of the language; I want the rules of the language to become
better. The typical case is that a programmer invoked undefined behavior
but the program appears to work as he wanted ... until it does not. Who
benefits from that?

> >> There is a strong tendency amongst programmers to think that all
> >> processors are x86 or ARM, and all OS's are Linux or Windows.
> >> Fortunately, the C and C++ standards committees are not as limited.
> >>
> >> (I am of the opinion that undefined behaviour is often a good thing for
> >> code that is clearly wrong, even if there were a consensus for how it
> >> might be treated in most hardware. I want division by zero to be
> >> undefined, because it means debug tools and analysers can be used to
> >> help track such bugs in the code. But that is an orthogonal issue.)
> >>
> > The only "positive" thing of pointless undefined behaviors is job
> > security. All the time I have wasted into memorising those and
> > learning usage of tools to track those down gives me advantage
> > on job market. But the work is anyway plentiful, largely badly or
> > not done and I could do something more interesting instead of
> > helping to track down some undefined behavior.
> What a load of drivel.
>
> No one wants /pointless/ undefined behaviour - that is a tautology. I
> have already explained why undefined behaviour in cases like division by
> zero and signed integer overflow is directly /useful/ and /beneficial/.
> If you don't accept my arguments there, fair enough. But at least
> accept that some programmers view things that way.
>
I have never observed any benefit from undefined behavior. I would still
leave some kind of [[unsafe]] attribute for people who want it.

> No programmer has to memorise undefined behaviour. /Everything/ is
> undefined except the behaviours that are explicitly defined by the
> language (and/or documented extensions in a particular tool). You have
> to memories the /defined/ behaviours in C and C++ - this is called
> "learning the language". If you don't know what the code you are
> writing actually means, you are not doing your job correctly. (Of
> course we all make mistakes sometimes - part of the job is finding and
> fixing these.)
>
That is what makes these take so many years to learn. You think that the
1000-page standard, in its cryptic language-lawyer slang, said something
about some case, but then you check and no, it did not. Maybe
elsewhere ... not there either. It may take some years before you face
the situation, but sooner or later you will.

> And the examples discussed here, such as division by zero, are so
> obviously incorrect code that you should not have to learn the rules
> specifically.
>
> (There are certainly things in the C and C++ standards that are
> explicitly labelled as "undefined behaviour" that could be changed to
> required diagnostics for the compiler or linker, or perhaps to
> implementation-dependent behaviour or even fully defined behaviour. But
> division by 0 and signed integer overflow are not amongst them.)
>
Undefined behavior is neither obvious nor certain. Incorrectness results
in the program doing whatever, only because the rules say so. Throwing
from a function "void f() noexcept;" is also obviously incorrect, yet the
C++ standard requires calling std::terminate() in that case.
There are reasons why intrinsics like __builtin_saddll_overflow are
there. Only I do not want the code to call a function named in Orcish
and check a bool. I would like a guaranteed terminate, signal or exception.
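
(For illustration, a minimal sketch of the kind of wrapper meant here -
the function name is made up; the intrinsic is the GCC/Clang one named
above:)

#include <cstdlib>

// Adds two long longs; any overflow terminates the program instead of
// handing the caller a bool to remember to check.
long long add_or_die(long long a, long long b)
{
    long long r;
    if (__builtin_saddll_overflow(a, b, &r))
        std::abort();   // guaranteed trap on overflow
    return r;
}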

Scott Lurndal
Mar 10, 2023, 11:27:15 AM
Here's what one of the Burroughs mainframes did with DIV:

==== Function ====

The Divide instruction divides the contents of one memory location (B)
by the contents of a second memory location (A), storing the remainder
in the B data field and storing the quotient in a third memory
location (C).

The length of the dividend field must be greater than the length of the
divisor field (BF greater than AF). The length of the quotient field is
the difference in length of the A and B fields (BF - AF). If the result
is too large to fit into the quotient field, or if BF is not greater
than AF, the division is not performed, the contents of B and C are
unchanged, the Comparison Flags are unchanged, and the Overflow Flag is
set.

If the absolute value of the divisor is not greater than the absolute
value of the equivalent number of leading digits of the dividend, the
division is not performed and the Overflow Flag is set, with the
Comparison Flags remaining unchanged. Note that a divisor which is zero
will fail this test, so the Overflow Flag will be set.

Store the absolute value of the quotient when the quotient field data
type is unsigned (UN or UA). Store the standard EBCDIC form of the
result sign as the first digit of the result when the quotient field
data type is SN. Fill the zone digit with the EBCDIC numeric subset
code (F) when the quotient field data type is alphanumeric (UA).

Store the absolute value of the remainder when the remainder field data
type is unsigned (UN or UA). Fill the zone digit with the EBCDIC
numeric subset code (F) when the remainder field data type is
alphanumeric (UA). When the remainder field data type is signed
numeric (SN), the absolute value of the remainder is stored after the
remainder sign digit, leaving the remainder sign digit with the
original contents of the dividend sign digit.

Only the numeric digits of an alphanumeric field enter into the operation.

Unsigned (UN or UA) operands are assumed to be positive.

The sign of a quotient is positive if the sign of the divisor and the
dividend are the same or the quotient is zero; otherwise the sign is
negative.

If the dividend data type is SN, the sign of the dividend will be left
unchanged in memory and will thus become the sign of the remainder.
Therefore this final remainder sign could be other than C or D, and a
remainder of zero magnitude could have a negative sign.

If the operand data contains undigits other than in the sign digit, an
Invalid Arithmetic Data fault is caused. See Compatibility Notes A.16.

==== Comparison Flags ====

In all cases except overflow, set the Comparison Flags to indicate
whether the result is greater than (HIGH), equal to (EQUAL) or less
than (LOW) zero.


Scott Lurndal
Mar 10, 2023, 11:36:55 AM
You can always use ~0ull and it will be truncated as needed. But
I'd argue the programmer should be aware of that caveat when writing the
code.

Kalevi Kolttonen
Mar 10, 2023, 11:49:06 AM
Öö Tiib <oot...@hot.ee> wrote:
> [...] I neither do nor want to do anything contrary to
> the rules of the language; I want the rules of the language to become
> better. The typical case is that a programmer invoked undefined behavior
> but the program appears to work as he wanted ... until it does not. Who
> benefits from that?

Apologies for mentioning UB in C in a C++ newsgroup, but
everybody needs to realize that it is very important to
avoid UB at all costs. The following is a true horror
story, and much worse things than this could happen.

When I was working as a system administrator, we used Cyrus IMAPD
as email storage. I guess the Cyrus version was 2.4.17 and if I
remember right, at the time it was running on Red Hat
Enterprise Linux 7. This Cyrus version had worked flawlessly
for years, but unfortunately the C code had UB.

In the code handling mailboxes database, I think I remember there
was an incorrect invocation of strcpy, a well-known C standard
library function. The strcpy manual page clearly states:

The strings may not overlap, and the destination string
dest must be large enough to receive the copy

The strings did overlap, but nobody ever noticed it, since the glibc
implementation of strcpy guarded against this mistake. As we
know, with UB, anything could happen, including producing the
correct behavior as intended by the programmer.
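
(To make the failure mode concrete, here is a minimal sketch of the
pattern - hypothetical buffer contents, not the actual Cyrus code -
together with the memmove call that is actually guaranteed to work:)

#include <cstdio>
#include <cstring>

int main()
{
    char buf[] = "xxhello";

    // Undefined behavior: source and destination overlap. The old
    // glibc strcpy happened to copy low-to-high, so this "worked".
    //     std::strcpy(buf, buf + 2);

    // Defined behavior: std::memmove is specified to handle
    // overlapping regions.
    std::memmove(buf, buf + 2, std::strlen(buf + 2) + 1);

    std::puts(buf);   // prints "hello"
}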

Then one day glibc maintainers decided to optimize their
AMD64 implementation of strcpy, and the guards for detecting
overlapping strings were removed. The next thing we knew,
the mailboxes database was getting more and more corrupted,
ending up in such a bad state that Cyrus IMAPD would no longer
start.

It was not a fun task to restore the mailboxes database
into working order. Lesson: Never rely on UB!

br,
KK

Keith Thompson
Mar 10, 2023, 2:08:38 PM
Lynn McGuire <lynnmc...@gmail.com> writes:
> On 3/9/2023 8:26 AM, Bonita Montero wrote:
>> Am 07.03.2023 um 22:38 schrieb Lynn McGuire:
>>> "C++ on the Move" by Darryl K. Taft
>>>     https://thenewstack.io/c-on-the-move/
>>>
>>> "While the C++ programming language seems to be as popular as ever,
>>> efforts to move it forward are ongoing both inside the ISO C++
>>> steering committee as well as other projects."
>>>
>>> Lynn
>>>
>> Read this:
>> https://mega.nz/file/ehcR1S5B#5YcRMdcJ0WArshD0szI2bckaqVmk2jC59XmOZh_GXgM
>
> No freaking way. Looks like a ransomware file to me.

I opened the link in a manner that is unlikely to allow any malware to
affect me. The browser reports it as "cppMoveBook.zip", 5.9 MB. I'm
not going to download it.

I wonder if it's a copy of "C++ Move Semantics The Complete Guide" by
Nicolai M. Josuttis (which is available legally as an ebook from
leanpub.com).

--
Keith Thompson (The_Other_Keith) Keith.S.T...@gmail.com
Working, but not speaking, for XCOM Labs
void Void(void) { Void(); } /* The recursive call of the void */

Keith Thompson
Mar 10, 2023, 2:14:41 PM
Öö Tiib <oot...@hot.ee> writes:
[...]
[...]
> C and C++ do not want to be portable assemblers for whatever chips;
> they want to be programming languages. From that viewpoint there
> is no reason why something that hardware does not handle (given that
> the performance cost, if it is handled, is hard to notice) should be
> left undefined behavior.
[...]

You say there's "no reason", and then you explain in the same sentence
what the reason is: performance.

I don't see how the distinction between "portable assemblers" (which
pretty much do not exist) and programming languages is relevant.

Keith Thompson
Mar 10, 2023, 2:33:38 PM
David Brown <david...@hesbynett.no> writes:
[...]
> It would be a terrible idea for an efficient language to try to define
> what happens here, forcing compilers to generate pointless extra
> run-time checks just because some people have bugs in their code.
[...]
> It makes no sense for a language to try to define behaviour on
> division by zero - it is far better for the language to say "don't do
> that". Why someone would then want the language to say what will
> happen when you do something contrary to the rules of the language, is
> beyond my understanding.
[...]

Let me offer Ada as a counterexample (and I'm not suggesting that
C++ should follow Ada's example).

Signed numeric overflow and division by zero, along with a
number of other conditions like out of bounds array access and
dereferencing a null pointer, are defined in Ada to raise the
predefined Constraint_Error exception. If I write code that divides
by zero and then handles the exception, the behavior is well defined.
If I don't handle the exception, the program (or task, similar to
a thread) is aborted. The result of a division is either a well
defined numeric result or a well defined exception.
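
A rough C++ analogue of those semantics (just a sketch for comparison;
checked_div is an illustrative name, not a standard function):

#include <limits>
#include <stdexcept>

// Either returns a well-defined quotient or throws a well-defined
// exception - loosely mirroring Ada's Constraint_Error.
int checked_div(int a, int b)
{
    if (b == 0)
        throw std::domain_error("division by zero");
    if (a == std::numeric_limits<int>::min() && b == -1)
        throw std::overflow_error("INT_MIN / -1 overflows");
    return a / b;   // cannot overflow or trap after the checks above
}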

Since exceptions and the conditions that raise them are defined in
the language, optimizing compilers can omit a lot of checks if they
can prove that they'll never be triggered.

There are also language mechanisms to suppress checks. If a
check would have been triggered but is suppressed, the behavior
is undefined.

There is certainly a non-zero cost to this -- a cost that you can
avoid by suppressing checks -- but the cost is not all that great,
certainly less than the cost of using an interpreted language rather
than a compiled language.

David Brown
Mar 11, 2023, 4:44:13 AM
The thing that personally feels slightly wrong to me about initialising an
unsigned variable with -1 is that the value /actually/ assigned does not
match the value in the code. That would apply equally to using ~0ull if
it had to be truncated.

For my type of code, it is normal to use exact width integers, so you
know exactly what value you want for all ones - thus I'd normally write
"uint16_t x = 0xffff;" or "uint32_t y = 0xffff'ffff;".

People and projects have different styles and preferences.

Öö Tiib
Mar 11, 2023, 7:10:25 AM
On Friday, 10 March 2023 at 21:14:41 UTC+2, Keith Thompson wrote:
> Öö Tiib <oot...@hot.ee> writes:
> [...]
> [...]
> > C and C++ do not want to be portable assemblers for whatever chips;
> > they want to be programming languages. From that viewpoint there
> > is no reason why something that hardware does not handle (given that
> > the performance cost, if it is handled, is hard to notice) should be
> > left undefined behavior.
> [...]
>
> You say there's "no reason", and then you explain in the same sentence
> what the reason is: performance.
>
Performance of elementary operations depends on the hardware, which can
choose to do something in parallel or as sequential micro-operations.
There may still be hardware in use where adding two integer registers
takes multiple cycles, but there is also hardware where four mutually
unrelated adds in a row complete in one cycle.

> I don't see how the distinction between "portable assemblers" (which
> pretty much do not exist) and programming languages is relevant.
>
It is relevant in the sense that the language's rules should be about
usefulness for programming software. The throughput and latency of the
operations it is translated into can change.

Alf P. Steinbach
Mar 12, 2023, 6:42:44 AM
On 2023-03-10 8:08 PM, Keith Thompson wrote:
> Lynn McGuire <lynnmc...@gmail.com> writes:
>> On 3/9/2023 8:26 AM, Bonita Montero wrote:
>>> Am 07.03.2023 um 22:38 schrieb Lynn McGuire:
>>>> "C++ on the Move" by Darryl K. Taft
>>>>     https://thenewstack.io/c-on-the-move/
>>>>
>>>> "While the C++ programming language seems to be as popular as ever,
>>>> efforts to move it forward are ongoing both inside the ISO C++
>>>> steering committee as well as other projects."
>>>>
>>>> Lynn
>>>>
>>> Read this:
>>> https://mega.nz/file/ehcR1S5B#5YcRMdcJ0WArshD0szI2bckaqVmk2jC59XmOZh_GXgM
>>
>> No freaking way. Looks like a ransomware file to me.
>
> I opened the link in a manner that is unlikely to allow any malware to
> affect me. The browser reports it as "cppMoveBook.zip", 5.9 MB. I'm
> not going to download it.
>
> I wonder if it's a copy of "C++ Move Semantics The Complete Guide" by
> Nicolai M. Josuttis (which is available legally as an ebook from
> leanpub.com).

It is.

I don't know if it's legal.

But then I have retained a collection of formally illegal books that
I've downloaded to answer students' questions about them, plus sometimes
for my own nostalgic interest (e.g. Niklaus Wirth's "Algorithms + Data
Structures = Programs", unfortunately the only available version is the
visually ugly Oberon edition, not my beloved original Pascal version).

Association: Andrei Alexandrescu once expressed anger that I had
inadvertently accepted a clc++m article that referred to an illegal PDF
of his classic "Modern C++ Design" book. As I remember it I apologized.
I really hope that I remember that correctly. :(

I used to have that book on paper, one of the three C++ paper books I've
had (Bjarne's TCPPPL in 1st, 2nd and 3rd editions, K&R TCPL in 1st and
2nd editions, and Andrei's MCPPD in 1st edition), but somehow it's
disappeared; the only one I see now is TCPL.

- Alf

David Brown
Mar 12, 2023, 6:54:57 AM
On 10/03/2023 17:20, Öö Tiib wrote:

> Undefined behavior is neither obvious nor certain. Incorrectness results
> in the program doing whatever, only because the rules say so. Throwing
> from a function "void f() noexcept;" is also obviously incorrect, yet the
> C++ standard requires calling std::terminate() in that case.
> There are reasons why intrinsics like __builtin_saddll_overflow are
> there. Only I do not want the code to call a function named in Orcish
> and check a bool. I would like a guaranteed terminate, signal or exception.
>

Even if it has not always been clear in this thread, I think we can
agree on quite a number of things:

Any code that attempts to execute something with undefined behaviour is
buggy. In some cases, it might work well enough for a while and then
fail due to later changes (such as in the anecdote about strcpy).

Undefined behaviour which could be caught at compile/link time without
unreasonable compiler effort, should be caught there.

Some kinds of undefined behaviour - bugs in the code - could be detected
by run-time checks, with or without hardware support. If hardware can
handle the checks with no overhead, that's great. If there is overhead
in code speed or missed optimisation, that must be balanced against the
benefits of catching bugs.

(If you don't agree on these points, I hope you'll say so.)


I am of the opinion that if a language cannot define a particular
behaviour in a correct manner, then it should be left undefined. I see
no point in a language insisting on incorrect behaviour. Leaving it
undefined gives the implementation freedom to optimise on the assumption
that it does not happen when the user is looking for optimal efficiency,
and also to allow extra checks and debugging aids to help developers
find problems in their code. Defining behaviour limits that, while not
being able to give the programmer any useful benefits.


Let's take a couple of examples. For signed integer overflow, there are
two realistic possible definitions for behaviour - two's complement
wrapping, and trapping. (There are other possibilities, such as
saturation, but that's going to be inefficient on most hardware.) Some
languages, such as Java, use wrapping. For some simple expressions,
that is efficient on all realistic hardware - but for many expressions
it limits optimisations and is therefore less efficient. It guarantees
an incorrect answer in a lot of situations. And it stops tools from
being able to add checks and traps for bug-hunting aids, as the language
does not consider overflow to be an error. Thus it is often worse than
useless as a feature for general coding, even though it occasionally helps.
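
(For completeness: when wrapping really is what you want, C++ already
provides it with fully defined behaviour, since unsigned arithmetic is
specified modulo 2^N - a trivial sketch:)

#include <cstdint>

// Wraps modulo 2^32 by definition - never undefined behaviour.
std::uint32_t wrap_add(std::uint32_t a, std::uint32_t b)
{
    return a + b;
}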

Some languages, such as Ada, consider overflow to be a run-time error
that must be detected, leading to an exception or other trapping
behaviour. That's great for debugging, but can be very inefficient in
practice, even when the programmer knows the calculation cannot overflow
(if the compiler can /prove/ there can be no overflow, it can omit the
checks). And when such exceptions are part of the defined behaviour,
the program needs to handle them - in many cases, there is no sensible
choice of behaviour for the program when it sees such errors, and any
code paths handling them are not seriously considered or seriously tested.

Other languages, such as Python, do not have overflow - the integer
types are extendable and grow as needed. That means you always get the
correct answer, but at such considerable run-time cost that it is not
appropriate for a language aimed at efficient compiled code.

So there are pros and cons of these possible ways of handling overflow,
but everything other than "undefined behaviour that is assumed not to
happen in correct code" has a cost. For something as low-level and
common as integer arithmetic, any cost can quickly become significant.

Whereas other languages are often fixed in their handling of overflow,
C++ is flexible - you can make your own classes to handle overflow just
the way you want. The language does not impose any choice on the users,
but provides the mechanisms to let you make whatever kinds of integers
you want with whatever overflow handling you want. (You can also do
C-style manual checks whenever you feel the need.)
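
(A sketch of such a manual check - test the operands /before/ the
addition, so the overflowing operation never executes; the function
name is illustrative:)

#include <climits>
#include <optional>

std::optional<int> checked_add(int a, int b)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return std::nullopt;   // would overflow - report failure instead
    return a + b;              // safe: no overflow is possible here
}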


As another common example of undefined behaviour, let's look at
dereferencing invalid pointers - such as attempting to access array
elements outside of the array's bounds. This is UB in C and C++. On
bigger systems, hardware (MMU) and software (the OS) combine to give
near zero-overhead trapping for accesses that are far out of bounds,
outside the memory space of the process. This is vital for secure and
reliable multi-program and multi-user systems. But it could not be
behaviour defined or required in the language standard, because many
smaller systems (typically single-program devices) cannot do anything
like this, and even on big systems the mechanisms can vary wildly.

For "close miss" errors, where the target address is inside the
processes' memory spaces, there is no general low-cost mechanism to
detect and prevent such invalid accesses. The language would require
"fat pointers" everywhere, with very different ways of dealing with
accesses, array slices, etc. Basically, the language would have to
remove the concepts of "pointer" and direct references entirely - fine
for slow, safe, interpreted Python but not appropriate for C or C++.
Leaving the behaviour undefined means tools can add checking (like
memory sanitizers or valgrind) for debugging, and optimise code that you
know is correct. Note also that there is no general way to determine if
an access will be a "close miss" or outside the processes' memory spaces.

And again, you can add whatever behaviour and checks you like for your
own use. Put in checks manually. Make your own "array" class that has
bound-checked indexing by default, if that's what suits you.
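
(A minimal sketch of such a class - the name is illustrative, and note
that std::array::at() and std::vector::at() already provide the same
throwing check:)

#include <array>
#include <cstddef>
#include <stdexcept>

template <typename T, std::size_t N>
class SafeArray {
    std::array<T, N> data_{};
public :
    T& operator[](std::size_t i) {
        if (i >= N)
            throw std::out_of_range("SafeArray index out of range");
        return data_[i];
    }
    const T& operator[](std::size_t i) const {
        if (i >= N)
            throw std::out_of_range("SafeArray index out of range");
        return data_[i];
    }
    static constexpr std::size_t size() { return N; }
};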


"Undefined behaviour" is key to how C and C++ work, and what makes them
great languages. It is utterly essential to the philosophy of "zero
overhead" and not paying for things you don't use. As a C and C++
programmer, it is /your/ choice and /your/ responsibility to make sure
the code is correct, and to choose whatever level of extra checking you
feel appropriate to the task in hand. With C++ you can make your
classes to automate this as needed - UB is vital to giving you that freedom.

David Brown
Mar 12, 2023, 7:06:40 AM
On 11/03/2023 13:10, Öö Tiib wrote:
> On Friday, 10 March 2023 at 21:14:41 UTC+2, Keith Thompson wrote:
>> Öö Tiib <oot...@hot.ee> writes:
>> [...]
>> [...]
>>> C and C++ do not want to be portable assemblers for whatever chips;
>>> they want to be programming languages. From that viewpoint there
>>> is no reason why something that hardware does not handle (given that
>>> the performance cost, if it is handled, is hard to notice) should be
>>> left undefined behavior.
>> [...]
>>
>> You say there's "no reason", and then you explain in the same sentence
>> what the reason is: performance.
>>
> Performance of elementary operations depends on hardware and it
> can choose to do something in parallel or as sequential
> micro-operations. There possibly is still hardware in usage where adding
> two integer registers takes multiple cycles but there is also hardware
> where 4 unrelated with each other adds in a row take one cycle.

I don't see how that observation fits the points above it. However, it
is worth noting that checked arithmetic - whether you are
bounds-checking before the arithmetic, or using overflow or carry flags
after the operation - absolutely cripples a modern processor's ability
to do things in parallel. It enforces a serialisation that can only
partially be offset by speculative execution. This cost is in addition
to the sometimes severe limitations it enforces on compiler
optimisations, expression re-arrangement, and code re-arrangement.

>
>> I don't see how the distinction between "portable assemblers" (which
>> pretty much do not exist) and programming languages is relevant.
>>
> It is relevant in sense that its rules should be about usefulness for
> programming software. The throughput and latency of operations into
> what it is translated can change.

The usefulness of C and C++ for programming is dependent on efficient
code generation. If that is not relevant, switch to Python and get
extending integers and exceptions on errors. (Plenty of programmers,
and plenty of programs, would be better off with a language like Python,
or at least a half-way choice like C#. "Trust the programmer" languages
are not the right choice for all purposes.)


Öö Tiib
Mar 12, 2023, 12:29:30 PM
On Sunday, 12 March 2023 at 12:54:57 UTC+2, David Brown wrote:
> On 10/03/2023 17:20, Öö Tiib wrote:
>
> > Undefined behavior is neither obvious nor certain. Incorrectness results
> > in the program doing whatever, only because the rules say so. Throwing
> > from a function "void f() noexcept;" is also obviously incorrect, yet the
> > C++ standard requires calling std::terminate() in that case.
> > There are reasons why intrinsics like __builtin_saddll_overflow are
> > there. Only I do not want the code to call a function named in Orcish
> > and check a bool. I would like a guaranteed terminate, signal or exception.
> >
> Even if it has not always been clear in this thread, I think we can
> agree on quite a number of things:
>
> Any code that attempts to execute something with undefined behaviour is
> buggy. In some cases, it might work well enough for a while and then
> fail due to later changes (such as in the anecdote about strcpy).
>
> Undefined behaviour which could be caught at compile/link time without
> unreasonable compiler effort, should be caught there.
>
> Some kinds of undefined behaviour - bugs in the code - could be detected
> by run-time checks, with or without hardware support. If hardware can
> handle the checks with no overhead, that's great. If there is overhead
> in code speed or missed optimisation, that must be balanced against the
> benefits of catching bugs.
>
> (If you don't agree on these points, I hope you'll say so.)
>
Basically I agree, except that the trend is for even the bugs that are
detected at the hardware level to be ignored. It is as if the effort of
the hardware designers was wasted. Instead, compilers often generate code
that bypasses those hardware checks and does something odd, or even what
the programmer expected instead. So code generation is balanced too far
towards not catching bugs. That is a pointless philosophy of unreliability.

> I am of the opinion that if a language cannot define a particular
> behaviour in a correct manner, then it should be left undefined. I see
> no point in a language insisting on incorrect behaviour. Leaving it
> undefined gives the implementation freedom to optimise on the assumption
> that it does not happen when the user is looking for optimal efficiency,
> and also to allow extra checks and debugging aids to help developers
> find problems in their code. Defining behaviour limits that, while not
> being able to give the programmer any useful benefits.
>
Integer arithmetic has very rarely been important to performance for a
very long time. Already in the nineties people showed that, with simpler
compression algorithms, decompressing can be quicker than a memcpy
between two uncompressed buffers. Changes in integer arithmetic are
rarely needed for performance, but often for correctness.
You are replying to where I mentioned that programmers can write their
own trapping (or saturating, or wrapping) arithmetic using compiler
intrinsics. Those are not part of the C++ language; the C++ language is
as unhelpful as it can possibly be. Only undefined behavior is
available. How is that flexible? So compiler writers have attempted to
at least help a bit.

> As another common example of undefined behaviour, let's look at
> dereferencing invalid pointers - such as attempting to access array
> elements outside of the array's bounds. This is UB in C and C++. On
> bigger systems, hardware (MMU) and software (the OS) combine to give
> near zero-overhead trapping for accesses that are far out of bounds,
> outside the memory space of the process. This is vital for secure and
> reliable multi-program and multi-user systems. But it could not be
> behaviour defined or required in the language standard, because many
> smaller systems (typically single-program devices) cannot do anything
> like this, and even on big systems the mechanisms can vary wildly.
>
> For "close miss" errors, where the target address is inside the
> processes' memory spaces, there is no general low-cost mechanism to
> detect and prevent such invalid accesses. The language would require
> "fat pointers" everywhere, with very different ways of dealing with
> accesses, array slices, etc. Basically, the language would have to
> remove the concepts of "pointer" and direct references entirely - fine
> for slow, safe, interpreted Python but not appropriate for C or C++.
> Leaving the behaviour undefined means tools can add checking (like
> memory sanitizers or valgrind) for debugging, and optimise code that you
> know is correct. Note also that there is no general way to determine if
> an access will be a "close miss" or outside the processes' memory spaces.
>
> And again, you can add whatever behaviour and checks you like for your
> own use. Put in checks manually. Make your own "array" class that has
> bound-checked indexing by default, if that's what suits you.
>
For those other examples there are not even compiler intrinsics. One
can do basically nothing but use the skills accumulated over decades to
help out. Again, even if only a quarter of the undefined behaviors were
removed, the language would simply become better by that quarter.
There are no good or beneficial undefined behaviors.

> "Undefined behaviour" is key to how C and C++ work, and what makes them
> great languages. It is utterly essential to the philosophy of "zero
> overhead" and not paying for things you don't use. As a C and C++
> programmer, it is /your/ choice and /your/ responsibility to make sure
> the code is correct, and to choose whatever level of extra checking you
> feel appropriate to the task in hand. With C++ you can make your
> classes to automate this as needed - UB is vital to giving you that freedom.
>
All people are fallible. They make mistakes, often with boring stuff
that the compiler could refuse or that the hardware already traps. It is
in no way a super "key feature" that those opportunities are ignored.

Öö Tiib
Mar 12, 2023, 12:30:05 PM
The point is simply that things can be, and are, done differently at the
hardware level. You are talking about software-level checks before or
after the operation. The expressions that are rearranged or run in
parallel are in any case proven to be unrelated by whichever compiler or
hardware does the rearranging.
> >
> >> I don't see how the distinction between "portable assemblers" (which
> >> pretty much do not exist) and programming languages is relevant.
> >>
> > It is relevant in sense that its rules should be about usefulness for
> > programming software. The throughput and latency of operations into
> > what it is translated can change.
>
> The usefulness of C and C++ for programming is dependent on efficient
> code generation. If that is not relevant, switch to Python and get
> extending integers and exceptions on errors. (Plenty of programmers,
> and plenty of programs, would be better off with a language like Python,
> or at least a half-way choice like C#. "Trust the programmer" languages
> are not the right choice for all purposes.)
>
There are needs where such a choice is not available, so I have very
little experience with C# and Python. Also, to throw decades of
experience away and say that, nah, I will now just write JavaScript
for web pages would be irresponsible.

Malcolm McLean
Mar 12, 2023, 12:57:17 PM
On Sunday, 12 March 2023 at 16:29:30 UTC, Öö Tiib wrote:
>
> The integer arithmetic has been very rarely important to performance
> for very long time. It was already in nineties when people did show that
> with simpler compression algorithms uncompressing can be quicker
> than memcpy between two uncompressed buffers. Changes in integer
> arithmetic are rarely needed for performance but often for correctness.
>
On a big modern processor, the bottleneck will be memory access, in
particular cache misses.
However C and C++ are not used exclusively for big processors.

Öö Tiib
Mar 12, 2023, 1:26:52 PM
From that it does not follow that integer arithmetic is the bottleneck
on small processors. Let's say your small processor does not have much
memory but can do about 16 million operations per second. What the heck
would it have to be doing for integer arithmetic to become its bottleneck?

Keith Thompson
Mar 12, 2023, 5:18:08 PM
Öö Tiib <oot...@hot.ee> writes:
[...]
> For those other examples there are not even compiler intrinsics. One
> can do basically nothing but use the skills accumulated over decades to
> help out. Again, even if only a quarter of the undefined behaviors were
> removed, the language would simply become better by that quarter.
> There are no good or beneficial undefined behaviors.
[...]

Which brings us back to the point I tried to make earlier.

Upthread, you wrote:

Most people agree that C++ contains too many features. They only
differ in opinion on where to cut. I would put most effort into places
where it says "undefined behavior". I think half of such places can
be replaced with "program is ill formed", "std::terminate is called",
or one of the features that leads to the described situation can be
cut out (as badly thought through).

Now you talk about removing a quarter of undefined behaviors.

Leaving aside for the moment the fact that most cases of undefined
behavior are implicit, I was hoping for some indication of *which* cases
you think can be removed. I agree that there are some, but I'm
skeptical that it would be practical to remove as many as half.

Any comment on that?

Öö Tiib
Mar 12, 2023, 7:18:51 PM
On Sunday, 12 March 2023 at 23:18:08 UTC+2, Keith Thompson wrote:
> Öö Tiib <oot...@hot.ee> writes:
> [...]
> > For those other examples there are not even compiler intrinsics. One
> > can do basically nothing but use the skills accumulated over decades to
> > help out. Again, even if only a quarter of the undefined behaviors were
> > removed, the language would simply become better by that quarter.
> > There are no good or beneficial undefined behaviors.
> [...]
>
> Which brings us back to the point I tried to make earlier.
>
> Upthread, you wrote:
>
> Most people agree that C++ contains too many features. They only
> differ in opinion on where to cut. I would put most effort into places
> where it says "undefined behavior". I think half of such places can
> be replaced with "program is ill formed", "std::terminate is called",
> or one of the features that leads to the described situation can be
> cut out (as badly thought through)
>
> Now you talk about removing a quarter of undefined behaviors.
>
> Leaving aside for the moment the fact that most cases of undefined
> behavior are implicit, I was hoping for some indication of *which* cases
> you think can be removed. I agree that there are some, but I'm
> skeptical that it would be practical to remove as many as half.
>
> Any comment on that?

I do not think that the majority of cases are implicit. Or maybe they
are, but the implicit cases are very rarely met. Yes, I think that about
half can be removed. Maybe that half is too much work; then even a
quarter would help in a major way. For example, this:

int& foo(int x) { return x; }   // returns a dangling reference

There must be some good reason why that is not an error (and the
undefined behavior only comes later, when the dangling reference
is used).

David Brown
Mar 13, 2023, 6:45:39 AM
I think it's good to program in a variety of languages, or at least to
be familiar with them. You can get inspiration for different ways to
structure your code, even if the techniques are not idiomatic for the
language you happen to be using. Every language has its good points and
bad points, and you can learn from both.


Mut...@dastardlyhq.com
Mar 13, 2023, 8:01:10 AM
True. But some languages are just incomprehensible to some people. Personally
I understood the concepts of OO within probably 10 minutes of encountering
them, functional programming didn't take much longer to grok, and SQL was
similar.

I must have programmed in at least 10 languages in my life but the one I could
never get my head around was Prolog and I still can't. Every time I thought
I understood what it was doing I'd write another bit of code and it would
do something completely different to what I was expecting.

I just Don't Get It. I guess my brain simply doesn't work that way.

Kalevi Kolttonen
Mar 13, 2023, 8:26:56 AM
Mut...@dastardlyhq.com wrote:
> [...] some languages are just incomprehensible to some people.

One quite mind-expanding and weird language to study is Erlang.

There is this saying "You can write C in any language", but with
Erlang it is not true. It is so different.

Despite its strangeness, Erlang is not just some kind of
esoteric language that nobody actually uses for serious
programming tasks.

They use Erlang in some mobile phone networks, and with
the ability to update your code while the programs are
running, it is possible to keep your code updated
(e.g. security, new features) to reach 10 years of uptime.

And no, I do not claim to be proficient in Erlang. I have
created only small toy programs long time ago, but I have
read two good books about Erlang just for educational
purposes.

br,
KK

David Brown
Mar 13, 2023, 9:30:25 AM
That is, at best, a very narrow view - at worst it is just wrong.

It is certainly the case that memory speeds have not increased at close
to the rate of the increase in processor speeds - for large processors.
But, again, you are missing the big picture and only considering one
type of processor. It's a different world outside of your PC.

You are also failing to consider that compression and decompression
rely heavily on integer arithmetic, as does all pointer arithmetic.
It is /only/ because C and C++ compilers can generate fast code for
integer arithmetic that the compression and decompression algorithms are
fast enough to make your benchmarks run quickly.

I'll happily agree that code efficiency is not important in all code.
And I am more than happy to agree that code correctness is far more
important than code efficiency.

But I do not accept that all correct code should be made to run
inefficiently just because some code is incorrect. If you want to do
that, pick a different programming language with a different balance
between efficiency and treatment of potential errors in code - C and C++
are not the languages for you.

I did not at all suggest that people have to use non-portable compiler
intrinsics. You /can/, if you want. And if you want a good, reusable
and efficient class, then that's what you should do - with conditional
compilation and generic fall-backs for portability.

So you want something like:

#include <cstdint>     // int32_t, INT32_MAX, INT32_MIN
#include <stdexcept>   // std::overflow_error

class CheckedInt32 {
private :
    int32_t x_;
public :
    constexpr CheckedInt32() : x_(0) {}
    constexpr CheckedInt32(int32_t x) : x_(x) {}
    explicit constexpr operator int32_t () const { return x_; }
    friend constexpr CheckedInt32 operator + (CheckedInt32 a,
                                              CheckedInt32 b);
};

#if defined __GNUC__
// Use the compiler intrinsic, which maps to the hardware overflow flag.
constexpr CheckedInt32 operator + (CheckedInt32 a, CheckedInt32 b) {
    int32_t y = 0;
    if (__builtin_add_overflow(a.x_, b.x_, &y)) {
        throw std::overflow_error("Overflow when adding");
    }
    return CheckedInt32(y);
}

#else
// Portable fallback: do the addition in a wider type and range-check.
constexpr CheckedInt32 operator + (CheckedInt32 a, CheckedInt32 b) {
    int64_t y = (int64_t) a.x_ + (int64_t) b.x_;
    if ((y > INT32_MAX) || (y < INT32_MIN)) {
        throw std::overflow_error("Overflow when adding");
    }
    return CheckedInt32(static_cast<int32_t>(y));
}
#endif


It would be very nice to have this sort of thing as a template library
in the standard library, so that each implementation could provide an
optimal version. I gather that C23 will have <stdckdint.h> with
type-generic macros like "ckd_add" that do this - I don't know if and
when C++ will pick it up.

So to be clear, I would like the standards to include this sort of thing
- as well as defined wrapping behaviour. But it must be in /addition/
to the "undefined behaviour" option, which must remain the default.
You can't remove this class of undefined behaviour - not even a quarter
of them. The best you can do is have the compiler complain in the few
cases that it can clearly see a problem, such as when there is a
constant array index that is known to be out of bounds. But you can't
mandate that in the language standards, because it is highly dependent
on the compiler, the code, and options.

> There are no good or beneficial undefined behaviors.

We've already gone through this. Nothing good can ever come of trying
to execute something with undefined behaviour. But the fact that many
things in the languages have undefined behaviour is a good thing - for
example, without that possibility, there would be no such thing as a
pointer (since we agree that invalid memory accesses cannot, in general,
have defined behaviours).

>
>> "Undefined behaviour" is key to how C and C++ work, and what makes them
>> great languages. It is utterly essential to the philosophy of "zero
>> overhead" and not paying for things you don't use. As a C and C++
>> programmer, it is /your/ choice and /your/ responsibility to make sure
>> the code is correct, and to choose whatever level of extra checking you
>> feel appropriate to the task in hand. With C++ you can make your
>> classes to automate this as needed - UB is vital to giving you that freedom.
>>
> All people are fallible. They make mistakes with often boring stuff that
> compiler could refuse or that hardware does already trap. It is no way
> super "key feature" that those opportunities are ignored.
>

That is why it is vital that such mistakes in code are undefined
behaviour - because only then can tools add testing and trapping
behaviour to aid debugging.


David Brown
Mar 13, 2023, 9:41:05 AM
Let's rather say it is doing 16 operations in a microsecond - suddenly a
clock cycle more or less makes a /huge/ difference. Or it is doing 1000
loops of an FFT calculation every millisecond - 15 cycles in the inner
loop means success, 17 cycles means failure.

Remember, no one is arguing for getting the right answers out of
calculations with overflows. I am suggesting that programmers beat the
"garbage in, garbage out" situation by not putting in garbage in the
first place. Then there is no benefit from all that extra effort for
"garbage in, error message out". Programmers should put their checks
and controls in the /right/ places, not rely on a nanny language to
double-check their homework. Instead of asking the language to do
something as silly as adding two numbers whose sum is out of range of
the target type, make sure the numbers are right in the first place, and
that the target type is appropriate. And if you want to put in checks,
do so in the part of the code that might be wrong, not later on.
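
(A sketch of what "checks in the right place" can look like - validate
once, at the boundary where untrusted data enters, so the arithmetic
further in needs no checks; the function is illustrative:)

#include <stdexcept>
#include <string>

int parse_percentage(const std::string& s)
{
    int v = std::stoi(s);   // throws std::invalid_argument / out_of_range
    if (v < 0 || v > 100)
        throw std::out_of_range("percentage must be 0..100");
    return v;               // later arithmetic on v cannot overflow
}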

Mut...@dastardlyhq.com
Mar 13, 2023, 11:00:40 AM
On Mon, 13 Mar 2023 12:26:40 -0000 (UTC)
kal...@kolttonen.fi (Kalevi Kolttonen) wrote:
>Mut...@dastardlyhq.com wrote:
>> [...] some languages are just incomprehensible to some people.
>
>One quite mind-expanding and weird language to study is Erlang.
>
>There is this saying "You can write C in any language", but with
>Erlang it is not true. It is so different.

It's functional, isn't it? Functional languages are interesting as they do
polymorphism by value rather than by type, IME.

>They use Erlang in some mobile phone networks, and with

IIRC that's why Ericsson wrote it.


Kalevi Kolttonen
Mar 13, 2023, 11:18:21 AM
Mut...@dastardlyhq.com wrote:
> Its functional isn't it? Functional languages are interesting as they do
> polymorphism by value rather than by type IME.

Yes, it is functional.

There are no for or while loops in Erlang; you have to use
recursion instead.

Variables are immutable, so you assign a value to them once
and they retain it. To me, this feature is probably the
trickiest or "craziest". In so many other programming languages,
we take it for granted that the values of variables can change
during the program execution.

The following is taken straight from Wikipedia. The information
is based on Joe Armstrong's (one of language creators) PhD
Thesis:

Everything is a process.
Processes are strongly isolated.
Process creation and destruction is a lightweight operation.
Message passing is the only way for processes to interact.
Processes have unique names.
If you know the name of a process you can send it a message.
Processes share no resources.
Error handling is non-local.
Processes do what they are supposed to do or fail.

>>They use Erlang in some mobile phone networks, and with
>
> IIRC thats why ericsson wrote it.

Correct.

I just quickly glanced through the Wikipedia article
and it says there is one notable new application written
in Erlang: WhatsApp. Maybe the client code is not Erlang,
but the server infrastructure is.

br,
KK

Öö Tiib
Mar 13, 2023, 11:25:37 AM
It is becoming relatively rarely worthwhile to write C++ for the PC.
Small processors tend to be quite powerful relative to their surrounding
hardware, so they are often idle. If they have some algebra to do then it
is often floating point, and if they compress, decompress or encrypt
something then that is done with wrapping arithmetic. In my experience,
anyway; you might have a different problem domain.

> You are also failing to consider that compression and decompression
> rely heavily on integer arithmetic, as does all pointer arithmetic.
>
> It is /only/ because C and C++ compilers can generate fast code for
> integer arithmetic that the compression and decompression algorithms are
> fast enough to make your benchmarks run quickly.
>
Pointer arithmetic is also not signed integer arithmetic, so the
signed-integer undefined behaviors do not help with those cases.

> I'll happily agree that code efficiency is not important in all code.
> And I am more than happy to agree that code correctness is far more
> important than code efficiency.
>
> But I do not accept that all correct code should be made to run
> inefficiently just because some code is incorrect. If you want to do
> that, pick a different programming language with a different balance
> between efficiency and treatment of potential errors in code - C and C++
> are not the languages for you.
>
It is great that you agree. Note that such code does not run
inefficiently. There are not a lot of "magic" performance boosts hidden
in the unreliability of signed integers. Rust, for example, traps in
debug mode and wraps in release mode ... the result is rather
competitive in performance. Ada always throws, and also performs
reasonably well.

Yes, I have helped to repair overflow bugs using such classes. They
work OK.

> It would be very nice to have this sort of thing as a template library
> in the standard library, so that implementations could have optimal
> implementations. I gather that C23 will have <stdckdint.h> with
> functions/macros like "ckd_add" that do this - I don't know if and when
> C++ will pick it up.
>
I hope quite soon. One day they should stop adding stuff (sub-par
for my usages) like <valarray>, <regex> or <format>.
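
Meanwhile one can sketch the same thing on top of compiler intrinsics. A
minimal sketch, assuming a GCC/Clang toolchain (__builtin_add_overflow is
their intrinsic; the name checked_add is invented for this example, not a
standard function):

#include <cstdio>
#include <optional>

// Checked addition in the spirit of C23's ckd_add: report overflow
// instead of computing a wrong answer.
std::optional<int> checked_add(int a, int b)
{
    int result;
    if (__builtin_add_overflow(a, b, &result))  // true means overflow
        return std::nullopt;                    // refuse to answer
    return result;
}

int main()
{
    if (auto r = checked_add(2'000'000'000, 2'000'000'000))
        std::printf("sum = %d\n", *r);
    else
        std::printf("overflow detected\n");
}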

> So to be clear, I would like the standards to include this sort of thing
> - as well as defined wrapping behaviour. But it must be in /addition/
> to the "undefined behaviour" option, which must remain the default.
>
It is unclear to me why it must be the default behavior. For me, most
optimizations are worth doing only in a small subset of the code base.
And there, other things like reducing copying, reducing dynamic allocations or
adding cache blocking give way more than signed integer
micro-optimizations. Especially if those are compared to wrapping
(which I agree is usually an error). Explicit checks and branches in code
can cost now, but I'm sure that if the likes of Ada and Rust gain more
market share then the hardware guys will add trapping signed integers.
Rust attempts to do that, and in quite a convincing manner. It is a bit raw
yet, and so it is tricky to staff a team, but it gets better all the time.

> > There are no good or beneficial undefined behaviors.
>
> We've already gone through this. Nothing good can ever come of trying
> to execute something with undefined behaviour. But the fact that many
> things in the languages have undefined behaviour is a good thing - for
> example, without that possibility, there would be no such thing as a
> pointer (since we agree that invalid memory accesses cannot, in general,
> have defined behaviours).
>
A raw pointer is itself too general-purpose. I prefer everything with even a
little more constraint: references, iterators, indexes, containers,
smart pointers. Each of these removes undefined behaviors simply by making
operations that are meaningless in context unavailable. And that is not the sole
thing that they improve.
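
A minimal sketch of what I mean, using a container's checked access
(illustrative only):

#include <cstdio>
#include <stdexcept>
#include <vector>

int main()
{
    std::vector<int> v{1, 2, 3};

    // Through a raw pointer, v.data()[5] compiles and silently has
    // undefined behaviour; nothing in the type stops it.

    // Through the container's checked access, the same mistake is a
    // defined error that can be caught and reported.
    try {
        int x = v.at(5);                 // throws std::out_of_range
        std::printf("%d\n", x);
    } catch (const std::out_of_range& e) {
        std::printf("caught: %s\n", e.what());
    }
}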

> >
> >> "Undefined behaviour" is key to how C and C++ work, and what makes them
> >> great languages. It is utterly essential to the philosophy of "zero
> >> overhead" and not paying for things you don't use. As a C and C++
> >> programmer, it is /your/ choice and /your/ responsibility to make sure
> >> the code is correct, and to choose whatever level of extra checking you
> >> feel appropriate to the task in hand. With C++ you can make your
> >> classes to automate this as needed - UB is vital to giving you that freedom.
> >>
> > All people are fallible. They make mistakes with often boring stuff that
> > the compiler could refuse or that hardware already traps. It is in no way a
> > super "key feature" that those opportunities are ignored.
> >
> That is why it is vital that such mistakes in code are undefined
> behaviour - because only then can tools add testing and trapping
> behaviour to aid debugging.

That does not follow. A defined error helps debugging a lot better, as it
is required to fail at compile time or run time, and so it can be silenced only
with special options, rather than being achieved only by using separate special tools.

Öö Tiib

unread,
Mar 13, 2023, 12:00:05 PM3/13/23
to
On Monday, 13 March 2023 at 15:41:05 UTC+2, David Brown wrote:
> On 12/03/2023 18:26, Öö Tiib wrote:
> > On Sunday, 12 March 2023 at 18:57:17 UTC+2, Malcolm McLean wrote:
> >> On Sunday, 12 March 2023 at 16:29:30 UTC, Öö Tiib wrote:
> >>>
> >>> Integer arithmetic has very rarely been important to performance
> >>> for a very long time. Already in the nineties people showed that
> >>> with simpler compression algorithms uncompressing can be quicker
> >>> than a memcpy between two uncompressed buffers. Changes in integer
> >>> arithmetic are rarely needed for performance but often for correctness.
> >>>
> >> On a big modern processor, the bottleneck will be memory access, in
> >> particular cache misses.
> >> However C and C++ are not used exclusively for big processors.
> >>
> > From that it does not follow that integer arithmetic is the bottleneck on
> > small processors. Let's say your small processor does not have much
> > memory but can do about 16 million operations per second. What the
> > heck does it do for integer arithmetic to become the bottleneck for it?
> Let's rather say it is doing 16 operations in a microsecond - suddenly a
> clock cycle more or less makes a /huge/ difference. Or it is doing 1000
> loops of an FFT calculation every millisecond - 15 cycles in the inner
> loop means success, 17 cycles means failure.
>
> Remember, no one is arguing for getting the right answers out of
> calculations with overflows.

Refusing to compute is also a correct and valuable answer. I estimate that
about 30% of the "information" around us is deliberate misinformation
and 60% is utter garbage or naive repetition of that misinformation.

> I am suggesting that programmers beat the
> "garbage in, garbage out" situation by not putting in garbage in the
> first place.

Typically they beat it by unaided manual checking. Sometimes they
forget, and that results in things like security breaches and
JPEG viruses.

> There is no benefit from all that extra effort for
> "garbage in, error message out". Programmers should put their checks
> and controls in the /right/ places, not rely on a nanny language to
> double-check their homework. Instead of asking the language to do
> something as silly as adding two numbers whose sum is out of range of
> the target type, make sure the numbers are right in the first place, and
> that the target type is appropriate. And if you want to put in checks,
> do so in the part of the code that might be wrong, not later on.

Homework deals with isolated, internal problems, but real software
typically gets its data from the (potentially cruel) outside world, not from
the hands of a programmer or teacher. The programmer of course has to check
it himself instead of declaring limits on types. I suspect that in a few
years these manual checks will face the destiny of "register" and
"inline" and turn into hints that compilers ignore.

Mut...@dastardlyhq.com

unread,
Mar 13, 2023, 12:00:35 PM3/13/23
to
On Mon, 13 Mar 2023 15:18:06 -0000 (UTC)
kal...@kolttonen.fi (Kalevi Kolttonen) wrote:
>Mut...@dastardlyhq.com wrote:
>> It's functional, isn't it? Functional languages are interesting as they do
>> polymorphism by value rather than by type, IME.
>
>Yes, it is functional.
>
>There are no for or while loops in Erlang, so you
>have to use recursion instead.
>
>Variables are immutable, so you assign a value to them once
>and they retain it. To me, this feature is probably the
>trickiest or "craziest". In so many other programming languages,
>we take it for granted that the values of variables can change
>during the program execution.

Sounds like it's one of those languages that's theoretically "pure" at the
expense of making it hard to learn and write basic algorithms. Make a language
too obtuse and few people will bother to learn it, as IBM found out with APL.


David Brown

unread,
Mar 13, 2023, 12:54:47 PM3/13/23
to
The work done on PC's is totally and completely dominated by code
written in C or C++. You will probably find that all but a couple of
percent of used processor cycles are running code written in C or C++.
The programmer may have written source code in C#, or Python, or Ruby,
but the VM it runs on is in C or C++. The libraries that do the work in
the graphics, the network, the filesystems, the game engine - it's all C
and C++.

I think it is a good thing that people use different languages on PC's.
People can find a different balance between efficiency of the developer
and efficiency at run-time, or between developer knowledge and
experience and the safety of the code. Some will use low-level
languages and take care to get the details right - others will use
high-level languages and let the language runtime handle the details.

But you could say that it is precisely because a lot of modern PC
programming is done in languages other than C and C++, that it is even
more important that efficiency is a high priority in C and C++. And it
is even more appropriate to expect the C and C++ programmers to be good
at their jobs and not need hand-holding to get simple tasks right.

So for PC's, you absolutely want to be able to conduct your integer
arithmetic as fast as possible without any unnecessary checking, traps,
exceptions, or other hindrances in the flow.


As for your comments on embedded systems, I think you are seriously
misunderstanding the point when you say it is often wrapping arithmetic.
If wrapping is what you need - and it is /sometimes/ right, but not
often, even in cryptography - that's fine. But if it is wrong, it is
wrong. It really doesn't matter here if the behaviour is defined and
wrong, or undefined and wrong - except that being defined means it's
harder to find the errors and efficiency might be lower. The programmer
still needs to make the effort to make sure the inputs are valid for the
expressions in use. And when the programmer has made the effort, any
extra checks added by the compiler are a waste.

(I would like C and C++ to have unsigned types that have undefined
overflow behaviour, as found in the language Zig. I would not want to
give up on having wrapping types too - I want both.)

>> You are also failing to consider that the compression and decompression
>> relies heavily on integer arithmetic, as does all pointer arithmetic.
>>
>> It is /only/ because C and C++ compilers can generate fast code for
>> integer arithmetic that the compression and decompression algorithms are
>> fast enough to make your benchmarks run quickly.
>>
> Pointer arithmetic is also not signed integer arithmetic. So signed
> integer undefined behaviors do not help with those cases.
>

Pointer arithmetic is calculated with integer arithmetic semantics in C
and C++, and with the same kind of instructions in implementations. But
it has even more restrictions in the valid domains and it is even harder
to detect overflows. "arr[i]" means "*(arr + i)", where the expression
"arr + i" overflows on any result outside the array. Everything that
applies to signed integer overflow applies even more so to pointer
arithmetic.
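
A minimal illustration of that valid domain (variable names invented; note
that merely computing the out-of-range pointer is the error, no dereference
is needed):

#include <cstddef>

int main()
{
    int arr[4] = {0, 1, 2, 3};

    int* first = arr;
    int* last  = arr + 4;            // one past the end: valid to compute,
                                     // but not to dereference
    std::ptrdiff_t n = last - first; // fine: both point into, or one past,
                                     // the same array
    (void)n;

    // int* bad = arr + 5;           // undefined behaviour: the addition
                                     // itself overflows the array
    return 0;
}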

>> I'll happily agree that code efficiency is not important in all code.
>> And I am more than happy to agree that code correctness is far more
>> important than code efficiency.
>>
>> But I do not accept that all correct code should be made to run
>> inefficiently just because some code is incorrect. If you want to do
>> that, pick a different programming language with a different balance
>> between efficiency and treatment of potential errors in code - C and C++
>> are not the languages for you.
>>
> Great that you agree. Note that it does not run inefficiently. There are
> not a lot of "magic" performance boosts hidden in the unreliability of signed
> integers. Rust, for example, traps in debug mode and wraps in release
> mode ... the result is rather competitive in performance. Ada always throws
> and also does not perform badly.
>

Rust and Ada perform reasonably, but not on a par with C and C++.
People spend vast amounts on buying a processor that is 20% faster than
mainstream alternatives - they do not want to run their code 50% slower.

Your thoughts here seem to be based on the idea that code just does a
bit of integer arithmetic every now and again, and the cpu is probably
waiting for slow main memory anyway. That's true for some code. It is
very far from true for other code.

On small systems, you are rarely waiting for memory, and individual
cycles count. You don't enable exceptions in C++, you don't stop with
error messages - you have nowhere to show an error message. Even if you
do get an error, what are you going to do about it? Recall all the
microwave ovens or oil valve controllers you've made? You write the
code correctly so that it doesn't overflow, and doesn't stop - or you
are looking for a new job. Extra checking just means bigger and more
expensive hardware to do the same job, taking extra power from your
battery life.

And if you look at PC's and big processors, you might have noticed a
trend in the last decades of more and more vector processing. While
some of that is for floating point vectors, doesn't it still give you a
clue that people /do/ care about integer arithmetic performance?

>>>> Whereas other languages are often fixed in their handling of overflow,
>>>> C++ is flexible - you can make your own classes to handle overflow just
>>>> the way you want. The language does not impose any choice on the users,
>>>> but provides the mechanisms to let you make whatever kinds of integers
>>>> you want with whatever overflow handling you want. (You can also do
>>>> C-style manual checks whenever you feel the need.)
>>>>
> >>> You reply to where I mentioned that programmers can write their own
> >>> trapping (or saturating or wrapping) arithmetic using compiler intrinsics.
> >>> Those are not part of the C++ language. The C++ language is as unhelpful as it
> >>> can possibly be. There is only undefined behavior available. How is that flexible?
>>> So compiler writers have attempted to at least help a bit.
>> I did not at all suggest that people have to use non-portable compiler
>> intrinsics. You /can/, if you want. And if you want a good, reusable
>> and efficient class, then that's what you should do - with conditional
>> compilation and generic fall-backs for portability.
>>
> Yes, I have helped to repair overflow bugs in such classes. They
> work OK.

That's the great thing about C++. Get it working right once, and use it
repeatedly.
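
For instance, a minimal sketch of a reusable saturating integer, leaning on
the GCC/Clang __builtin_add_overflow intrinsic (the class name SatInt is
invented here; a real version would be a template with the full operator
set):

#include <limits>

// A reusable saturating integer: on overflow it clamps to the nearest
// representable value instead of invoking undefined behaviour.
class SatInt {
    int v_;
public:
    explicit SatInt(int v) : v_(v) {}
    int value() const { return v_; }

    friend SatInt operator+(SatInt a, SatInt b) {
        int r;
        if (__builtin_add_overflow(a.v_, b.v_, &r))   // GCC/Clang intrinsic
            r = (b.v_ > 0) ? std::numeric_limits<int>::max()
                           : std::numeric_limits<int>::min();
        return SatInt(r);
    }
};

int main()
{
    SatInt big(std::numeric_limits<int>::max());
    SatInt one(1);
    SatInt s = big + one;   // saturates at INT_MAX instead of overflowing
    return s.value() == std::numeric_limits<int>::max() ? 0 : 1;
}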

>
>> It would be very nice to have this sort of thing as a template library
>> in the standard library, so that implementations could have optimal
>> implementations. I gather that C23 will have <stdckdint.h> with
>> functions/macros like "ckd_add" that do this - I don't know if and when
>> C++ will pick it up.
>>
> I hope quite soon. One day they should stop adding stuff (sub-par
> for my usages) like <valarray>, <regex> or <format>.
>

Different people have different needs. There's lots in the standard
library that is useless to me - but I realise it is useful to /some/ people.

>> So to be clear, I would like the standards to include this sort of thing
>> - as well as defined wrapping behaviour. But it must be in /addition/
>> to the "undefined behaviour" option, which must remain the default.
>>
> It is unclear to me why it must be the default behavior. For me, most
> optimizations are worth doing only in a small subset of the code base.
> And there, other things like reducing copying, reducing dynamic allocations or
> adding cache blocking give way more than signed integer
> micro-optimizations. Especially if those are compared to wrapping
> (which I agree is usually an error). Explicit checks and branches in code
> can cost now, but I'm sure that if the likes of Ada and Rust gain more
> market share then the hardware guys will add trapping signed integers.
>

Just to be clear here - gcc (and presumably other compilers) have had
options for defined signed integer overflow for decades. You can choose
wrapping (-fwrapv) or trapping (-ftrapv). These are very rarely used in
practice - people either don't know about such things (and those
programmers would be unlikely to do a decent job of handling run-time
errors even if it were the default), or they don't want such features. gcc
developers have long since realised that trying to make relatively
efficient checked integer arithmetic (as -ftrapv did) was pretty much
useless. It had significant overhead, was inconsistent, gave little
useful feedback, and only handled one small class of common errors.
Instead, they recommend more complete debugging tools such as sanitizers
that check much more and give more useful debugging feedback.
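
To make that concrete, the same source behaves differently under each
option (a sketch; the flags are those of current gcc/clang):

// overflow.cc
//
//   g++ -O2 overflow.cc                      # overflow is UB
//   g++ -O2 -fwrapv overflow.cc              # defined: wraps to INT_MIN
//   g++ -O2 -ftrapv overflow.cc              # aborts at the overflow
//   g++ -O2 -fsanitize=signed-integer-overflow overflow.cc
//                                            # sanitizer: runtime report
//                                            # with file and line
#include <climits>
#include <cstdio>

int main()
{
    int x = INT_MAX;
    int y = x + 1;          // the signed overflow happens here
    std::printf("%d\n", y);
}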

Ada is virtually dead as a language. Rust is a fad - the benefits you
get from re-writing C code in Rust can be achieved almost entirely by
re-writing the C code in decent modern C and using modern C development
tools. (Still, if it is easier to make people fix bugs by re-writing in
Rust than getting them to re-write in good C, the end result is fewer
bugs.) And if your re-write is good modern C++, even better. Why
bother with code analysis "borrow checker" tools with Rust when you can
write the code in C++ in a way that makes you sure you don't have memory
usage failures? I suppose the best you can say is that Rust makes it
marginally harder to write incorrect code, for some kinds of error.


>> You can't remove this class of undefined behaviour - not even a quarter
>> of them. The best you can do is have the compiler complain in the few
>> cases that it can clearly see a problem, such as when there is a
>> constant array index that is known to be out of bounds. But you can't
>> mandate that in the language standards, because it is highly dependent
>> on the compiler, the code, and options.
>>
> Rust attempts to do that, and in quite a convincing manner. It is a bit raw
> yet, and so it is tricky to staff a team, but it gets better all the time.
>

You've swallowed the Kool-Aid. Rust, with all its bells and whistles
used correctly, is a big step up from malloc/free, and even new/delete.
But if you are happy with the overhead then you can replace your arrays
with std::vector and you don't need a half-made borrow checker to be
sure you are leak-free.
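
A minimal sketch of that point - with std::vector there is simply no
new/delete to forget, so leak-freedom needs no separate checker:

#include <cstddef>
#include <cstdio>
#include <numeric>
#include <vector>

// The buffer is released when the vector goes out of scope, on every
// path out of the function, including exceptions.
int sum_first_n(int n)
{
    std::vector<int> values(static_cast<std::size_t>(n));
    std::iota(values.begin(), values.end(), 1);   // fill with 1, 2, ..., n
    return std::accumulate(values.begin(), values.end(), 0);
}

int main()
{
    std::printf("%d\n", sum_first_n(10));   // prints 55; nothing to free
}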

>>>
>> That is why it is vital that such mistakes in code are undefined
>> behaviour - because only then can tools add testing and trapping
>> behaviour to aid debugging.
>
> That does not follow. A defined error helps debugging a lot better, as it
> is required to fail at compile time or run time, and so it can be silenced only
> with special options, rather than being achieved only by using separate special tools.

I can only re-state my points in a limited number of ways. Either you
understand what I am saying, or you don't.


Keith Thompson

unread,
Mar 13, 2023, 1:02:40 PM3/13/23
to
David Brown <david...@hesbynett.no> writes:
[...]
> We've already gone through this. Nothing good can ever come of trying
> to execute something with undefined behaviour.

That's not entirely true. Extensions (which are explicitly permitted by
the standard) can take the form of an implementation defining behavior
that is not defined by the standard.

Paavo Helde

unread,
Mar 13, 2023, 2:38:16 PM3/13/23
to
13.03.2023 17:25 Öö Tiib wrote:

> It is becoming relatively rare that it is worth writing C++ for the PC.

This might be so, but when something deliberately gets written in C++ on
the PC, this is most often done to squeeze out the maximum performance.
Slowing down the program by e.g. mandatory signed integer overflow
checks would be a big no-no.

There is a reason why C and C++ adhere to the zero-overhead principle.
If I know or have verified that no overflow can occur in my vector
arithmetic, I should not pay any penalty during the operation.
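
A small sketch of the principle: when the types are chosen so that overflow
provably cannot happen, the hot loop carries no checks at all:

#include <cstddef>
#include <cstdint>

// Each product of two int16_t values fits in 31 bits, so summing even
// millions of them cannot overflow a 64-bit accumulator. Hence no
// per-iteration checks, and the compiler is free to vectorise the loop.
std::int64_t dot(const std::int16_t* a, const std::int16_t* b, std::size_t n)
{
    std::int64_t acc = 0;
    for (std::size_t i = 0; i < n; ++i)
        acc += static_cast<std::int64_t>(a[i]) * b[i];
    return acc;
}

int main()
{
    std::int16_t a[3] = {1, 2, 3}, b[3] = {4, 5, 6};
    return dot(a, b, 3) == 32 ? 0 : 1;   // 1*4 + 2*5 + 3*6 = 32
}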

Kalevi Kolttonen

unread,
Mar 13, 2023, 2:43:26 PM3/13/23
to
Mut...@dastardlyhq.com wrote:
> Sounds like it's one of those languages that's theoretically "pure" at the
> expense of making it hard to learn and write basic algorithms. Make a language
> too obtuse and few people will bother to learn it, as IBM found out with APL.

I guess that many of Erlang's features are supposed to make it easier
to reason about the code's correctness. For example, concerning recursion,
I have heard arguments that claim it helps to avoid off-by-one mistakes
that can happen pretty easily in languages such as C.

Erlang processes are isolated from each other, and can communicate only
using message passing. I remember reading about some kind of Actor Model,
and maybe it's also supposed to make code correctness easier to achieve. But
I do not remember the theoretical foundations of this model; it's been
13 years since I read the Erlang books.

You are right in that, using Erlang, you simply cannot pick up a CS
book about algorithms, look up what you want, and then implement it
directly. Erlang is of course Turing complete, so it *is* doable
but not straightforward.

br,
KK

David Brown

unread,
Mar 13, 2023, 3:17:57 PM3/13/23
to
On 13/03/2023 16:18, Kalevi Kolttonen wrote:
> Mut...@dastardlyhq.com wrote:
>> It's functional, isn't it? Functional languages are interesting as they do
>> polymorphism by value rather than by type, IME.
>
> Yes, it is functional.
>
> There are no for or while loops in Erlang, so you
> have to use recursion instead.

That's standard for functional programming languages.

>
> Variables are immutable, so you assign a value to them once
> and they retain it. To me, this feature is probably the
> trickiest or "craziest". In so many other programming languages,
> we take it for granted that the values of variables can change
> during the program execution.

Again, that is the norm for functional programming languages. It is
even becoming common in non-functional languages for immutable
"variables" to be the default. A high proportion of variables in C or
C++ never need to be changed after their initialisation or first
assignment, and it is much easier to reason about the correctness of the
code if the values of variables never change.
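
A rough C++ analogy of both points, with const for the single-assignment
variables and recursion instead of a loop (illustrative only, not how one
would normally write this in C++):

#include <cstdio>

// Each name is bound exactly once; the iteration is expressed as
// recursion, roughly as a functional language would do it.
int sum_to(int n)
{
    if (n <= 0)
        return 0;
    const int rest = sum_to(n - 1);  // bound once, never reassigned
    return n + rest;
}

int main()
{
    const int total = sum_to(10);
    std::printf("%d\n", total);      // prints 55
}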

But it is certainly the case that functional programming involves
thinking about things in a somewhat different fashion from imperative
programming. And some tasks are easier in one style, others in a
different style.

>
> The following is taken straight from Wikipedia. The information
> is based on Joe Armstrong's (one of language creators) PhD
> Thesis:
>
> Everything is a process.
> Processes are strongly isolated.
> Process creation and destruction is a lightweight operation.
> Message passing is the only way for processes to interact.
> Processes have unique names.
> If you know the name of a process you can send it a message.
> Processes share no resources.
> Error handling is non-local.
> Processes do what they are supposed to do or fail.
>
>>> They use Erlang in some mobile phone networks, and with
>>
>> IIRC thats why ericsson wrote it.
>
> Correct.

The motto for Erlang is "write once, run forever". Its most fun
feature, IMHO, is the extent to which you can replace code while
everything is still running. The idea is that you don't have to stop
your entire mobile phone network just because you want to update the
software.

>
> I just quickly glanced through the Wikipedia article
> and it says there is one notable new application written
> in Erlang: WhatsApp. Maybe the client code is not Erlang,
> but the server infrastructure is.
>

There's a lot more software written in Erlang than just WhatsApp. As
well as being vital to many mobile phone and telecommunications
networks, it is used in a lot of server systems that require high
reliability and massive scalability. Of course it depends on what you
call "notable".

There is also Elixir that builds on Erlang, running on the same virtual
machine.





David Brown

unread,
Mar 13, 2023, 3:21:26 PM3/13/23
to
On 13/03/2023 18:02, Keith Thompson wrote:
> David Brown <david...@hesbynett.no> writes:
> [...]
>> We've already gone through this. Nothing good can ever come of trying
>> to execute something with undefined behaviour.
>
> That's not entirely true. Extensions (which are explicitly permitted by
> the standard) can take the form of an implementation defining behavior
> that is not defined by the standard.
>

I was using the term "undefined behaviour" here to mean behaviour that
is not defined by the standards, the implementation, or any other
relevant specification. It's fine to use things that are defined only
in a particular implementation - although of course the result is not
portable. (For example, with "gcc -fwrapv", signed integer overflow is
fully defined.)

I could have been clearer about that in my post, however.

Kalevi Kolttonen

unread,
Mar 13, 2023, 3:31:17 PM3/13/23
to
David Brown <david...@hesbynett.no> wrote:
>> There are no for or while loops in Erlang, so you
>> have to use recursion instead.
>
> That's standard for functional programming languages.

I know.

>> Variables are immutable, so you assign a value to them once
>> and they retain it. To me, this feature is probably the
>> trickiest or "craziest". In so many other programming languages,
>> we take it for granted that the values of variables can change
>> during the program execution.
>
> Again, that is the norm for functional programming languages. It is
> even becoming common in non-functional languages for immutable
> "variables" to be the default. A high proportion of variables in C or
> C++ never need to be changed after their initialisation or first
> assignment, and it is much easier to reason about the correctness of the
> code if the values of variables never change.

That I didn't know. I thought immutable variables were a pretty
unique part of Erlang.

> But it is certainly the case that functional programming involves
> thinking about things in a somewhat different fashion from imperative
> programming. And some tasks are easier in one style, others in a
> different style.

Yes, the mental model is different.

> The motto for Erlang is "write once, run forever". Its most fun
> feature, IMHO, is the extent to which you can replace code while
> everything is still running. The idea is that you don't have to stop
> your entire mobile phone network just because you want to update the
> software.

I agree that it is a fantastic feature to have. I know of no
other language that has it.

>> I just quickly glanced through the Wikipedia article
>> and it says there is one notable new application written
>> in Erlang: WhatsApp. Maybe the client code is not Erlang,
>> but the server infrastructure is.
>>
>
> There's a lot more software written in Erlang than just WhatsApp. As
> well as being vital to many mobile phone and telecommunications
> networks, it is used in a lot of server systems that require high
> reliability and massive scalability. Of course it depends on what you
> call "notable".

Right, the definition of "notable" is important here, but without
doubt WhatsApp qualifies with flying colors. It still has lots
of users worldwide despite emerging competition.

> There is also Elixir that builds on Erlang, running on the same virtual
> machine.

Yes, I noticed that when reading the Wikipedia article. They also have
the programming language Lua running on the BEAM virtual machine.

br,
KK

Keith Thompson

unread,
Mar 13, 2023, 3:53:59 PM3/13/23
to
The standard defines "undefined behavior" as "behavior for which this
document [the standard] imposes no requirements". I recommend not using
that particular phrase with a different meaning.

Certainly relying on behavior that isn't defined either by the standard,
by any applicable secondary standard, or by the implementation is a bad
idea.

Paavo Helde

unread,
Mar 13, 2023, 6:02:00 PM3/13/23
to
13.03.2023 21:31 Kalevi Kolttonen wrote:
> David Brown <david...@hesbynett.no> wrote:

>> Again, that is the norm for functional programming languages. It is
>> even becoming common in non-functional languages for immutable
>> "variables" to be the default. A high proportion of variables in C or
>> C++ never need to be changed after their initialisation or first
>> assignment, and it is much easier to reason about the correctness of the
>> code if the values of variables never change.
>
> That I didn't know. I thought immutable variables were a pretty
> unique part of Erlang.

For example, a string variable in Python is kind of immutable AFAIK. You
can replace it, but you cannot change it. If there are multiple
references to the string, the other references will continue to point to
the old string even if some path pointing to it is replaced by a new
string, in a COW fashion. So the string acts like an immutable
atomic/scalar value and any references to it only serve as a tool for
memory optimization and cannot be used for modification of data visible
via other references.

At least that's what I've gathered, I'm no expert in Python.

gst

unread,
Mar 14, 2023, 2:49:26 AM3/14/23
to
On Monday, 13 March 2023 at 18:02:00 UTC-4, Paavo Helde wrote:

> For example, a string variable in Python is kind of immutable AFAIK.

Pure strings, like floats, ints, etc., in Python *are* totally immutable. Not kind of. Totally.

> .. it is replaced by a new string, in a COW fashion. So the string acts like an immutable
> atomic/scalar value and any references to it only serve as a tool for
> memory optimization and cannot be used for modification of data visible
> via other references.

They don't act like immutable objects, they are immutable. You cannot modify a string or any other immutable object; they do not have any method with side effects.

You can assign a string to something, and then that thing can be passed down the line or anywhere; the same string can be assigned to many different things/names by many assignments (many references to it, yes). But you can never modify a string in place through any reference to it. It has no such method (even private), feature or possibility. You could only (try to) modify it by accessing the low-level C internals of CPython, i.e. the implementation, not the language. But by then it's no longer Python you're doing, and that could/would break your (or any) Python code at some point, obviously.



