
vector<>::erase-behaviour


Bonita Montero

Mar 31, 2019, 10:08:14 AM
Does anyone know the technical reason why, when I do an .erase()
on a vector, all iterators from the point where I erased onward
become invalid rather than being shifted (excluding the last
elements of the vector, which are the same number of elements
as I erased)?
As the capacity doesn't change, the iterators could partially remain
valid but just point to the shifted elements.
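
For example (a sketch; the last read is exactly what the standard
declares undefined):

#include <vector>

int main() {
    std::vector<int> v{ 1, 2, 3, 4 };
    auto before = v.begin();        // stays valid per the standard
    auto after  = v.begin() + 2;    // formally invalidated below
    v.erase( v.begin() + 1 );
    int ok  = *before;  // fine: 1
    int bad = *after;   // UB by the standard, although the storage
                        // it points at now holds the shifted value 4
    return ok + bad;
}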

Juha Nieminen

Mar 31, 2019, 1:06:16 PM
I'm honestly not trying to be a dick, but the number of spelling
errors in that message genuinely makes it hard to understand
*exactly* what you are asking.

But if I'm interpreting it correctly, are you asking why the
C++ standard declares all the iterators that point to elements
that are located on or after the deletion point as invalid,
even though there's no technical reason for it?

For starters, the iterators are invalid because they won't be
pointing to the *same* elements anymore. That's why they are
invalid. Even if they are still pointing to *some* valid element
in the same vector, that doesn't mean it's somehow valid. It will
be a completely different element.

Secondly, the standard probably doesn't forbid the std::vector
implementation from physically reducing the allocated size of the
vector, if it can do that without moving the other elements around.

Bonita Montero

Mar 31, 2019, 4:35:12 PM
> For starters, the iterators are invalid because they won't be
> pointing to the *same* elements anymore.

That shouldn't be a problem, since they could partially point to the
shifted elements. That could be guaranteed by the standard without any
restrictions on the implementation.

> Secondly, the standard probably doesn't forbid the std::vector
> implementation from physically reducing the allocated size of the
> vector, if it can do that without moving the other elements around.

I don't know for sure, but I'll bet my right hand that the standard
guarantees not to move the allocation in this case or to reduce its
capacity, but to do everything in-place.
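
A small check of the observable part of that claim (a sketch: the
standard's actual guarantee is only that iterators before the erase
point stay valid, which already rules out reallocation; the data() and
capacity() stability shown here is how common implementations behave,
not a literal promise of the standard):

#include <cassert>
#include <vector>

int main() {
    std::vector<int> v{ 1, 2, 3, 4, 5 };
    int *buf = v.data();
    auto cap = v.capacity();
    v.erase( v.begin() + 1 );
    // No reallocation happened: buffer and capacity are unchanged
    // here on the common implementations.
    assert( v.data() == buf );
    assert( v.capacity() == cap );
}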

Mr Flibble

Mar 31, 2019, 5:05:33 PM
On 31/03/2019 21:35, Bonita Montero wrote:
>> For starters, the iterators are invalid because they won't be
>> pointing to the *same* elements anymore.
>
> That shouldn't be a problem, since they could partially point to the
> shifted elements. That could be guaranteed by the standard without any
> restrictions on the implementation.
And you had the nerve to call my capabilities into question. Invalid
iterators should never be used even if they still "point" to valid objects
as this is undefined behaviour according to the standard.

/Flibble

--
“You won’t burn in hell. But be nice anyway.” – Ricky Gervais

“I see Atheists are fighting and killing each other again, over who
doesn’t believe in any God the most. Oh, no..wait.. that never happens.” –
Ricky Gervais

"Suppose it's all true, and you walk up to the pearly gates, and are
confronted by God," Bryne asked on his show The Meaning of Life. "What
will Stephen Fry say to him, her, or it?"
"I'd say, bone cancer in children? What's that about?" Fry replied.
"How dare you? How dare you create a world to which there is such misery
that is not our fault. It's not right, it's utterly, utterly evil."
"Why should I respect a capricious, mean-minded, stupid God who creates a
world that is so full of injustice and pain. That's what I would say."

Bonita Montero

Mar 31, 2019, 5:22:27 PM
>>> For starters, the iterators are invalid because they won't be
>>> pointing to the *same* elements anymore.

>> That shouldn't be a problem, since they could partially point
>> to the shifted elements. That could be guaranteed by the standard
>> without any restrictions on the implementation.

> And you had the nerve to call my capabilities into question. Invalid
> iterators should never be used even if they still "point" to valid
> objects as this is undefined behaviour according to the standard.

If you were able to read, you would have read that I already know that.

But there's a point on which I wasn't certain and where I had tomatoes
on my eyes: reallocation obviously doesn't occur, because the iterators
before the erased sequence remain valid.
So there's absolutely no compelling necessity for the standard
to mandate invalidating the iterators at or after that
sequence. The iterators could partially point to shifted elements
without any restriction on the implementation.

Mr Flibble

Mar 31, 2019, 5:52:40 PM
The iterator concept dictates that what an iterator refers to never
changes unless the iterator itself is changed. What you propose would
ruin the iterator concept.

Bonita Montero

Mar 31, 2019, 6:21:36 PM
>>>>> For starters, the iterators are invalid because they won't be
>>>>> pointing to the *same* elements anymore.

>>>> That shouldn't be a problem, since they could partially point
>>>> to the shifted elements. That could be guaranteed by the standard
>>>> without any restrictions on the implementation.

>>> And you had the nerve to call my capabilities into question. Invalid
>>> iterators should never be used even if they still "point" to valid
>>> objects as this is undefined behaviour according to the standard.

>> If you were able to read, you would have read that I already know that.

>> But there's a point on which I wasn't certain and where I had tomatoes
>> on my eyes: reallocation obviously doesn't occur, because the iterators
>> before the erased sequence remain valid.
>> So there's absolutely no compelling necessity for the standard
>> to mandate invalidating the iterators at or after that
>> sequence. The iterators could partially point to shifted elements
>> without any restriction on the implementation.

> The iterator concept dictates that what an iterator refers to never
> changes unless the iterator itself is changed.  What you propose would
> ruin the iterator concept.

But there's no technical necessity for this concept.

Mr Flibble

Mar 31, 2019, 6:32:10 PM
Of course there is: people want to know that an object referred to by an
iterator remains stable, and they want to know when iterators become
invalid, so they can design clean, robust software; with your proposal all
bets are off. The reason iterators are invalidated is the same reason
element references are invalidated. The fact that you don't get this leads
me to the conclusion that the code you write is a mess.

Mr Flibble

Mar 31, 2019, 6:36:01 PM
On 31/03/2019 21:35, Bonita Montero wrote:
>> For starters, the iterators are invalid because they won't be
>> pointing to the *same* elements anymore.
>
> That shouldn't be a problem, since they could partially point to the
> shifted elements. That could be guaranteed by the standard without any
> restrictions on the implementation.
>

I see you snipped part of a reply again that you disagree with but cannot
argue against, so I will repeat it again here:

"Even if they are still pointing to *some* valid element
in the same vector, that doesn't mean it's somehow valid. It will
be a completely different element."

Now reply to this snipped part please, if you can.

Bonita Montero

Apr 1, 2019, 2:08:37 AM
>> But there's no technical necessity for this concept.

> Of course there is: people want to know that an object referred
> to by an iterator remains stable, and they want to know when
> iterators become invalid ...

No, they won't in this case.

Bonita Montero

Apr 1, 2019, 2:36:59 AM
The thing is that .erase simply shifts the elements with the assignment
operator, and if I do the same with my own code like the following ...

template<typename T>
void vector_erase( vector<T> &v, typename vector<T>::iterator first,
                   typename vector<T>::iterator last )
{
    typename vector<T>::iterator shiftOld,
                                 shiftNew;
    for( shiftOld = last, shiftNew = first; shiftOld != v.end(); )
        *shiftNew++ = *shiftOld++;
    v.resize( shiftNew - v.begin() );
}

... the iterators into the vector remain valid up to the place where I
cut the vector.
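
For example, with the vector_erase above in scope (plain int elements,
so the copy assignment in the loop is fine; a sketch, needs
#include <vector> and using std::vector;):

#include <cassert>
#include <vector>
using std::vector;

int main() {
    vector<int> v{ 0, 1, 2, 3, 4, 5 };
    vector<int>::iterator it = v.begin() + 3;        // points at 3
    vector_erase( v, v.begin() + 1, v.begin() + 3 ); // remove 1 and 2
    // v is now { 0, 3, 4, 5 }; the iterator still points into the
    // vector, at the element that was shifted down into that slot.
    assert( *it == 5 );
}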

Paavo Helde

Apr 1, 2019, 4:42:06 AM
On 1.04.2019 0:52, Mr Flibble wrote:
>
> The iterator concept dictates that what an iterator refers to never
> changes unless the iterator itself is changed.

Consider two programs:

#include <iostream>
#include <vector>
#include <string>

int main() {
    std::vector<std::string> v;
    v.push_back("Alice");
    v.push_back("Bob");
    auto iter1 = v.begin();
    std::cout << "I am " << *iter1 << "\n";
    v[0] = v[1];
    v.erase(v.begin()+1);
    std::cout << "Now I am " << *iter1 << "\n";
}

---------------

#include <iostream>
#include <vector>
#include <string>

int main() {
    std::vector<std::string> v;
    v.push_back("Alice");
    v.push_back("Bob");
    auto iter1 = v.begin();
    std::cout << "I am " << *iter1 << "\n";
    v.erase(v.begin());
    std::cout << "Now I am " << *iter1 << "\n";
}

According to the C++ standard the second program has UB when
dereferencing the invalidated iterator, whereas the first program is
fine. At the same time what they do seems to be very similar, so the
question is what is the motivation for the standard to make the second
one UB?


Öö Tiib

Apr 1, 2019, 7:22:38 AM
The second program above will currently crash when debugging
iterators are enabled. Online example with gcc:
http://coliru.stacked-crooked.com/a/148f26f38bfb7643

It is tricky to specify how the iterators are now valid but
somehow "reseated" to point at different elements. A lot of
defects already arise from people being confused
by iterators and dereferencing invalid ones. Additional
complications won't reduce those issues. Is there some
motivating example for defining the behavior in the standard?

It feels somewhat similar to iterating over a whole
multidimensional array by incrementing a single pointer to
an element. That likely works as intended on all current
platforms but is undefined behavior (AFAIK by both the C
and C++ standards). The code will be confusing and the
whole idea smelly.

Bonita Montero

Apr 1, 2019, 8:17:30 AM
Oh, there's a minor difference between my implementation and what the
usual implementation seems to be: the assignment is applied with
the move-assignment operator (I checked MSVC++ and g++), i.e.
this ...
    *shiftNew++ = *shiftOld++;
... has to be changed into this ...
    *shiftNew++ = std::move( *shiftOld++ );
But is it really guaranteed by the standard that the move-assignment
operator is used?
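
One way to see what an implementation actually does is a probe type
that logs its assignments (a sketch; since C++11 erase requires T to
be MoveAssignable, which is the hint that moves are intended):

#include <iostream>
#include <vector>

struct Probe {
    Probe() = default;
    Probe( const Probe & ) {}
    Probe( Probe && ) noexcept {}
    Probe &operator=( const Probe & ) {
        std::cout << "copy-assign\n"; return *this;
    }
    Probe &operator=( Probe && ) noexcept {
        std::cout << "move-assign\n"; return *this;
    }
};

int main() {
    std::vector<Probe> v( 3 );
    v.erase( v.begin() );  // shifts v[1] and v[2] down one slot
    // g++ and MSVC++ both print "move-assign" twice here.
}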

Paavo Helde

Apr 1, 2019, 8:31:56 AM
Sure it will crash, as the debugging iterators have been programmed to
detect UB as specified by the standard.

>
> It is tricky to specify how the iterators are now valid but
> somehow "reseated" to point at different elements. A lot of
> defects already arise from people being confused
> by iterators and dereferencing invalid ones. Additional
> complications won't reduce those issues.

If so, why does the standard contain special verbiage that the iterators
to elements *before* the erased range remain valid? It would be much
simpler to say that an erase() call will invalidate all iterators.

Now, if I have some random iterator into the vector, it might or might
not be invalidated, depending on where the erasure occurred. How is that
simpler to keep track of than making sure the iterator still points inside
the final valid vector range?

> complications won't reduce those issues. Is there some
> motivating example for defining the behavior in the standard?

I'm not suggesting that this behavior should be defined, I was just
trying to translate Bonita's question.

Mr Flibble

Apr 1, 2019, 9:13:13 AM
Because the object identity of the elements before the erased range is
not changed, so it makes sense for iterators referring to them to remain
valid. This is not the case for elements after the point of erasure.

Ask yourself why this behaviour is the case for std::vector but isn't the
case for std::list. This is all about object identity and what it means
for an object to change.

Paavo Helde

Apr 1, 2019, 10:12:27 AM
So what's the identity of an object? As we all know, an object is a
region of storage. An object also has a lifetime, which is ended by the
destructor call. However, the standard specifically says that in a
vector::erase() call the destructor is called only for the erased elements;
for moving around the other stuff, move operations are used:

23.3.6.5/4 [vector.modifiers] "Complexity: The destructor of T is called
the number of times equal to the number of the elements erased."

Now consider a vector v of 3 elements:

v[0], v[1], v[2]

If I erase v.begin(), v[0] is indeed destroyed. However, v[1] is
move-constructed at the location of v[0], and v[2] is move-assigned to
v[1]. So the second element v[1] resides at the same memory location, is
of the same type and has never seen a destructor call. It was just
assigned a new value, that's all.

So at which point did the v[1] object identity change now?


Paavo Helde

Apr 1, 2019, 10:25:44 AM
Correction: after re-reading the standard I now see v[0] is not
destroyed either, it is move-assigned from v[1]. Only v[2] gets
destroyed. So, even more to the point, the object lifetime has not ended
for any of the remaining elements, so why should iterators to them
become invalid?

Also note that pointers and references to these elements remain valid
(3.8/7 [basic.lifetime]).
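
For example, under that reading the following would stay within defined
behaviour, even though the erase wording itself declares references at
or after the point of erase invalidated (a sketch):

#include <cassert>
#include <vector>

int main() {
    std::vector<int> v{ 10, 20, 30 };
    int &second = v[1];      // a reference, not an iterator
    v.erase( v.begin() );    // v[1] is move-assigned from v[2]
    // The reference still designates the same storage, whose
    // lifetime never ended; it now holds the shifted value.
    assert( second == 30 );
}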


Öö Tiib

Apr 1, 2019, 10:49:25 AM
If it weren't UB, it would still frequently enough be a real
programming error. It feels similar to how static analyzers
warn about calling virtual functions from constructors and
destructors: that is well-defined, but quite commonly there is an
actual programming error.

>
> >
> > It is tricky to specify how the iterators are now valid but
> > somehow "reseated" to point at different elements. A lot of
> > defects already arise from people being confused
> > by iterators and dereferencing invalid ones. Additional
> > complications won't reduce those issues.
>
> If so, why does the standard contain special verbiage that the iterators
> to elements *before* the erased range remain valid? It would be much
> simpler to say that an erase() call will invalidate all iterators.

That is also a complication, indeed. The difference is that it is
simpler to express and to understand: the element to which
the iterator (before the erased range) points is exactly the same
element as it was before the erase.

> Now, if I have some random iterator into the vector, it might or might
> not be invalidated, depending on where the erasure occurred. How is that
> simpler to keep track of than making sure the iterator still points inside
> the final valid vector range?

When (the algorithm is such that) it is unclear whether an iterator was
or wasn't invalidated, then it is likely simplest to assume that
it was. Same as with push_back: when (the algorithm is such that)
it is unclear whether reallocation occurred or not, then it is simplest
to assume that it did, and so all iterators are invalid.
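
That assumption is behind the usual pattern: never keep an iterator
across the call, re-seat it from the return value of erase instead,
e.g. (a sketch, C++11):

#include <vector>

int main() {
    std::vector<int> v{ 1, 2, 3, 4, 5, 6 };
    for( auto it = v.begin(); it != v.end(); ) {
        if( *it % 2 == 0 )
            it = v.erase( it );  // re-seat from the returned iterator
        else
            ++it;                // before any erase point: still valid
    }
    // v now holds { 1, 3, 5 }
}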

> > complications won't reduce those issues. Is there some
> > motivating example for defining the behavior in the standard?
>
> I'm not suggesting that this behavior should be defined, I was just
> trying to translate Bonita's question.

I am still interested in what case such stored iterators (or
references or pointers) to "potentially shifted elements" would be
handy to have and not confusing.

Bonita Montero

Apr 1, 2019, 10:57:09 AM
> If I erase v.begin(), ...

No, it is copy-assigned from the following element.
So there's no identity change.

Bonita Montero

Apr 1, 2019, 10:57:25 AM
> If I erase v.begin(), ...

No, it is move-assigned from the following element.

Mr Flibble

Apr 1, 2019, 10:58:43 AM
Object identity is unrelated to object lifetime; it is an abstract
semantic property of an object referring to its identity with respect to
the rest of an abstract system.

Mr Flibble

Apr 1, 2019, 11:01:03 AM
Obviously you do not understand what object identity means: the object
"becomes" another object so its identity has changed even though its
storage location and/or object lifetime has not changed.

Bonita Montero

Apr 1, 2019, 11:47:51 AM
>> No, it is move-assigned from the following element.
>> So there's no identity change.

> Obviously you do not understand what object identity means: the object
> "becomes" another object so its identity has changed even though its
> storage location and/or object lifetime has not changed.

That can't be true, because then changing a single member of an object
would invalidate the iterator "pointing" at it.

Paavo Helde

Apr 1, 2019, 12:26:24 PM
Citation needed. The only passage in the standard containing "object
identity" talks about a copy of an object to a different region of memory.



Mr Flibble

Apr 1, 2019, 1:51:23 PM
The standard is correct: a copy of an object has a different identity from
the original object, so the objects move-assigned to by
std::vector::erase lose their original identity.

Mr Flibble

Apr 1, 2019, 1:51:59 PM
Nonsense.

Bonita Montero

Apr 1, 2019, 2:10:59 PM
>>> Obviously you do not understand what object identity means: the
>>> object "becomes" another object so its identity has changed even
>>> though its storage location and/or object lifetime has not changed.

>> That can't be true, because then changing a single member of an object
>> would invalidate the iterator "pointing" at it.

> Nonsense.

It always looks like that if you don't have any arguments.

Paavo Helde

Apr 1, 2019, 2:21:10 PM
If so, why is this not UB?

std::vector<std::string> v;
v.push_back("original");
auto iter = v.begin();
std::string x("copy");
v[0] = std::move(x);
std::cout << *iter;






Mr Flibble

Apr 1, 2019, 2:36:16 PM
Loss of object identity isn't undefined behaviour and is a separate, albeit
related, issue to iterator invalidation, so I am not sure why you are asking
that question. Object identity is a property of an abstract system, and it
can be perfectly valid for an object to lose its identity if that is what
is intended. In your example the assignment is explicit and intended, so
you could argue in this case that object identity has not actually been lost.

If std::vector::erase() were renamed to
std::vector::erase_and_move_assign() then you could argue that identity is
no longer lost, as that is the intention.

Paavo Helde

Apr 1, 2019, 3:56:18 PM
Grudgingly, I have to admit this actually makes sense. Thanks!



Alf P. Steinbach

Apr 1, 2019, 6:18:35 PM
On 01.04.2019 13:22, Öö Tiib wrote:
> It feels somewhat similar to iterating over a whole
> multidimensional array by incrementing a single pointer to
> an element. That likely works as intended on all current
> platforms but is undefined behavior (AFAIK by both the C
> and C++ standards). The code will be confusing and the
> whole idea smelly.

When I think of a natural way to do what you describe, there is no UB.

And no confusion either.

Perhaps mainly because I am aware that the standard guarantees
consecutive items, no padding, in the multidimensional array.

Perhaps you were not aware?

Or if you were aware of that, can you give an example of the UB way?


Cheers!,

- Alf

Öö Tiib

Apr 2, 2019, 4:49:14 AM
Only with a char pointer may we do pointer arithmetic on a whole
array of arrays, since it is an object and a char pointer is allowed
to access the consecutive bytes of an object (like you say, there is
no padding). The standard has no concept of a multidimensional array,
and global indexing by other means is not defined in it.

So if we have a pointer like p = &a[0][0] where a is T[M][N], then
[expr.add] (to which other things like increments, decrements
and [] refer) explicitly says that it is undefined behavior to
go any farther than from (p + 0) to (p + N) with it. I am not sure
why it is so, but it has been that way for as long as I remember;
only the wording has changed a bit over time. Something similar is
in the C standard too.
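
In code, the discussed pattern is like this (whether the later
accesses are defined is exactly the question; a sketch):

int main() {
    int a[3][4] = {};
    int *p = &a[0][0];
    int sum = 0;
    for( int i = 0; i < 3 * 4; ++i )
        sum += p[i];  // p[0]..p[3] stay within the first inner array;
                      // p[4] and beyond are the contested accesses
    return sum;
}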

Alf P. Steinbach

Apr 2, 2019, 5:58:30 AM
On 02.04.2019 10:49, Öö Tiib wrote:
> On Tuesday, 2 April 2019 01:18:35 UTC+3, Alf P. Steinbach wrote:
>> On 01.04.2019 13:22, Öö Tiib wrote:
>>> It feels somewhat similar to iterating over a whole
>>> multidimensional array by incrementing a single pointer to
>>> an element. That likely works as intended on all current
>>> platforms but is undefined behavior (AFAIK by both the C
>>> and C++ standards). The code will be confusing and the
>>> whole idea smelly.
>>
>> When I think of a natural way to do what you describe, there is no UB.
>>
>> And no confusion either.
>>
>> Perhaps mainly because I am aware that the standard guarantees
>> consecutive items, no padding, in the multidimensional array.
>>
>> Perhaps you were not aware?
>>
>> Or if you were aware of that, can you give an example of the UB way?
>
> Only with a char pointer may we do pointer arithmetic on a whole
> array of arrays, since it is an object and a char pointer is allowed
> to access the consecutive bytes of an object (like you say, there is
> no padding). The standard has no concept of a multidimensional array,

It has.

C++17 8.3.4/3
«When several “array of” specifications are adjacent, a multidimensional
array is created; only the first of the constant expressions that
specify the bounds of the arrays may be omitted.»


> and global indexing by other means is not defined in it.

That sounds incorrect but it depends on what you mean by “global indexing.”


> So if we have a pointer like p = &a[0][0] where a is T[M][N], then
> [expr.add] (to which other things like increments, decrements
> and [] refer) explicitly says that it is undefined behavior to
> go any farther than from (p + 0) to (p + N) with it.

No, it doesn't.

On the contrary, it gives a hard requirement that it shall work;

«Moreover, if the expression P points to the last element of an array
object, the expression (P)+1 points one past the last element of the
array object.»

And in a multidimensional array one past the last element of an inner
array, is guaranteed to be the first element of a next inner array,
except for the last inner array.


> I am not sure why it is so, but it has been that way for as long as I
> remember; only the wording has changed a bit over time. Something
> similar is in the C standard too.

It's just a sabotage meme originating with some unreasoning socially
oriented first-year students that had a Very Ungood Teacher™ (VUT™).

Or at least that's my theory. :)


Cheers & hth.,

- Alf

Öö Tiib

Apr 2, 2019, 6:59:58 AM
I did mean indexing a multidimensional array as if it were unidimensional.

> > So if we have a pointer like p = &a[0][0] where a is T[M][N], then
> > [expr.add] (to which other things like increments, decrements
> > and [] refer) explicitly says that it is undefined behavior to
> > go any farther than from (p + 0) to (p + N) with it.
>
> No, it doesn't.

In what version? I have read it from C++98 to C++17, and it is all the same.

> On the contrary, it gives a hard requirement that it shall work;
>
> «Moreover, if the expression P points to the last element of an array
> object, the expression (P)+1 points one past the last element of the
> array object.»

That is exactly the (p + N) that I wrote above.

> And in a multidimensional array one past the last element of an inner
> array, is guaranteed to be the first element of a next inner array,
> except for the last inner array.

There is a difference between pointer arithmetic and the locations of
array elements in memory. The standard does not say that
we are free to add or subtract anything to a pointer; on the contrary,
[expr.add] is stricter there. So it is true that the pointer (p + N)
is valid and is guaranteed to compare equal to the pointer to the first
element of the next inner array (&a[1][0]), but it is also true
that (p + N + 1) is explicitly said to be undefined behavior.

> > I am not sure why it is so, but it has been that way for as long as
> > I remember; only the wording has changed a bit over time. Something
> > similar is in the C standard too.
>
> It's just a sabotage meme originating with some unreasoning socially
> oriented first-year students that had a Very Ungood Teacher™ (VUT™).
>
> Or at least that's my theory. :)

:D

Alf P. Steinbach

Apr 2, 2019, 7:23:40 AM
At least we're clear on that: that /one/ part of the standard requires
it to be well-defined.

Do you agree that for the abstract machine defined by the standard, the
history of how a valid pointer value was computed does not matter if
that history did not involve UB (as it clearly does not for (P)+1)?


>> And in a multidimensional array one past the last element of an inner
>> array, is guaranteed to be the first element of a next inner array,
>> except for the last inner array.
>
> There is a difference between pointer arithmetic and the locations of
> array elements in memory. The standard does not say that
> we are free to add or subtract anything to a pointer; on the contrary,
> [expr.add] is stricter there. So it is true that the pointer (p + N)
> is valid and is guaranteed to compare equal to the pointer to the first
> element of the next inner array (&a[1][0])

Yes, it's valid.


> but it is also true
> that (p + N + 1) is explicitly said to be undefined behavior.

Not if you require (P+2) to be equivalent to ((P+1)+1), which is well
defined.

To make this alleged UB have a practical effect one needs a perverse
compiler that adds checking of whether N > 1.

Such a compiler /can/ be implemented, and can possibly be argued to be
formally correct, because the standard does not require `+` to be
associative: the C++ version groups left to right.

But we can reason about the standard imposing requirements such as P+2
having to be split up in two operations (P+1)+1, and so on, that for
practical effect requires over-the-top sabotaging compiler perversity,
at the technical cost of the inefficiency of “fat pointers”, and the
higher level cost of breaking a really large amount of existing code,
and introducing the notion that the history of a pointer matters.

Would that be reasonable or serve any engineering purpose? Nope. It's
just hogwash, an unnatural, totally impractical code-breaking
interpretation that completely disregards context and purpose.


>>> I am not sure why it is so, but it has been that way for as long as
>>> I remember; only the wording has changed a bit over time. Something
>>> similar is in the C standard too.
>>
>> It's just a sabotage meme originating with some unreasoning socially
>> oriented first-year students that had a Very Ungood Teacher™ (VUT™).
>>
>> Or at least that's my theory. :)
>
> :D

Yes.


Cheers!,

- Alf

Paavo Helde

Apr 2, 2019, 8:15:27 AM
On 2.04.2019 11:49, Öö Tiib wrote:
>
> So if we have a pointer like p = &a[0][0] where a is T[M][N], then
> [expr.add] (to which other things like increments, decrements
> and [] refer) explicitly says that it is undefined behavior to
> go any farther than from (p + 0) to (p + N) with it. I am not sure
> why it is so, but it has been that way for as long as I remember;
> only the wording has changed a bit over time. Something similar is
> in the C standard too.

Just curious, is this to support something like segmented addresses in
16-bit MS-DOS in the case where the whole 2D array would not fit in the
same segment?




Alf P. Steinbach

Apr 2, 2019, 8:47:06 AM
It isn't and it couldn't, since you can do p+N by repeated application
of indisputably well-defined P+1.

It's just bollocks.


Cheers!,

- Alf


Öö Tiib

Apr 2, 2019, 8:48:19 AM
I believe that it is meant differently. AFAICS (P+1) is a valid
pointer value and compares equal to the valid pointer value R
(the pointer to the first element of the next array).
However, dereferencing that valid pointer (P+1) is UB
even though R can be dereferenced. And further, (R+1) is a valid
pointer value, but (P+1)+1 (or P+2, it doesn't matter) is an invalid
pointer value that may or may not compare equal to (R+1).
Places like [basic.stc.dynamic.safety] hint at it.

> >> And in a multidimensional array one past the last element of an inner
> >> array, is guaranteed to be the first element of a next inner array,
> >> except for the last inner array.
> >
> > There is a difference between pointer arithmetic and the locations of
> > array elements in memory. The standard does not say that
> > we are free to add or subtract anything to a pointer; on the contrary,
> > [expr.add] is stricter there. So it is true that the pointer (p + N)
> > is valid and is guaranteed to compare equal to the pointer to the first
> > element of the next inner array (&a[1][0])
>
> Yes, it's valid.
>
>
> > but it is also true
> > that (p + N + 1) is explicitly said to be undefined behavior.
>
> Not if you require (P+2) to be equivalent to ((P+1)+1), which is well
> defined.
>
> To make this alleged UB have a practical effect one needs a perverse
> compiler that adds checking of whether N > 1.

Oh, there will likely be "perverse" processors built that support such
checked pointers right away as part of dereference operations. For
a lot of people, improving the safety of pointers has been a sort of
dream for a long time.

> Such a compiler /can/ be implemented, and can possibly be argued to be
> formally correct, because the standard does not require `+` to be
> associative: the C++ version groups left to right.
>
> But we can reason about the standard imposing requirements such as P+2
> having to be split up in two operations (P+1)+1, and so on, that for
> practical effect requires over-the-top sabotaging compiler perversity,
> at the technical cost of the inefficiency of “fat pointers”, and the
> higher level cost of breaking a really large amount of existing code,
> and introducing the notion that the history of a pointer matters.

I wrote above how I think it is meant.

> Would that be reasonable or serve any engineering purpose? Nope. It's
> just hogwash, an unnatural, totally impractical code-breaking
> interpretation that completely disregards context and purpose.

How is there no purpose? Buffer overflows are actually a nasty issue,
beneficial to no one. It is a good purpose to at least allow
architectures that make those less frequent or even impossible.

Öö Tiib

Apr 2, 2019, 8:56:21 AM
I trust it is about allowing traceable pointer objects described in
[basic.stc.dynamic.safety]. I don't know if any implementation
implements those.

Alf P. Steinbach

Apr 2, 2019, 9:57:14 AM
So, say you do P2 = P+1, in the case where P2 is guaranteed to compare
equal to a pointer to the first element of the next inner array.

Is the history, that P2 was computed as P+1, forgotten at some point?

Or will it be UB to dereference the stored pointer value in P2, just as
it is with the expression P+1?

P2 points to the first item in an array, but as I understand it you mean
that it's UB to form P2+1, and well-defined to form P2-1, because it
/came from/ an earlier part; is that reasonable, do you think?

Or can your argument be applied to P2 also, that no arithmetic
whatsoever can be done with it?

Can't go forward because it came from previous inner array. Can't go
backward because it's at the start of an array. Immobile pointer, yes?


> And further, (R+1) is a valid
> pointer value, but (P+1)+1 (or P+2, it doesn't matter) is an invalid
> pointer value that may or may not compare equal to (R+1).
> Places like [basic.stc.dynamic.safety] hint at it.
>
>>>> And in a multidimensional array one past the last element of an inner
>>>> array, is guaranteed to be the first element of a next inner array,
>>>> except for the last inner array.
>>>
>>> There is a difference between pointer arithmetic and the locations of
>>> array elements in memory. The standard does not say that
>>> we are free to add or subtract anything to a pointer; on the contrary,
>>> [expr.add] is stricter there. So it is true that the pointer (p + N)
>>> is valid and is guaranteed to compare equal to the pointer to the first
>>> element of the next inner array (&a[1][0])
>>
>> Yes, it's valid.
>>
>>
>>> but it is also true
>>> that (p + N + 1) is explicitly said to be undefined behavior.
>>
>> Not if you require (P+2) to be equivalent to ((P+1)+1), which is well
>> defined.
>>
>> To make this alleged UB have a practical effect one needs a perverse
>> compiler that adds checking of whether N > 1.
>
> Oh, there will likely be "perverse" processors built that support such
> checked pointers right away as part of dereference operations. For
> a lot of people, improving the safety of pointers has been a sort of
> dream for a long time.

But think about it.

The nice checking processor can't prevent me from traversing the array
one step at a time, which is indisputably well-defined.

It can only prevent me from /efficiently/ going in strides across the
array, as is often done in image processing.

Well, okay, so then we also need a hardware image processor to help out
with that particular area.

All this extra hardware and fat pointer overhead is surely what the C++
committee had in mind, a small cost indeed to pay for detecting some
programmers' bad practices that give buffer overruns.


>> Such a compiler /can/ be implemented, and can possibly be argued to be
>> formally correct, because the standard does not require `+` to be
>> associative: the C++ version groups left to right.
>>
>> But we can reason about the standard imposing requirements such as P+2
>> having to be split up in two operations (P+1)+1, and so on, that for
>> practical effect requires over-the-top sabotaging compiler perversity,
>> at the technical cost of the inefficiency of “fat pointers”, and the
>> higher level cost of breaking a really large amount of existing code,
>> and introducing the notion that the history of a pointer matters.
>
> I wrote above how I think it is meant.

No, you didn't.

I've seen no suggestion of a rationale.

It's just a totally impractical nonsense interpretation.


>> Would that be reasonable or serve any engineering purpose? Nope. It's
>> just hogwash, an unnatural, totally impractical code-breaking
>> interpretation that completely disregards context and purpose.
>
> How is there no purpose? Buffer overflows are actually a nasty issue,
> beneficial to no one. It is a good purpose to at least allow
> architectures that make those less frequent or even impossible.

There are no buffer overflows in code to iterate through a
multidimensional array.

The wording that you appear to focus on is for individual arrays.

That's a different context.

Öö Tiib

Apr 2, 2019, 10:57:18 AM
Yes, dereferencing P2 that was calculated as P+1 is UB.
It is likely meant not as the history of how it was calculated but as a
state of the pointer. The pointer P2 is at one past the last element of its
range, while the pointer to the first element of the next inner array is at
the start of its range. These compare equal, but P2 can't be dereferenced.

> P2 points to the first item in an array, but as I understand it you mean
> that it's UB to form P2+1, and well-defined to form P2-1, because it
> /came from/ an earlier part; is that reasonable, do you think?
>
> Or can your argument be applied to P2 also, that no arithmetic
> whatsoever can be done with it?

Yes, P2-1 is valid and can be dereferenced again.

> Can't go forward because it came from previous inner array. Can't go
> backward because it's at the start of an array. Immobile pointer, yes?

No, P2 is one past the last element of the previous inner array, which
compares equal to the start of the next inner array. It can't be
dereferenced but can be decremented back up to the start of the previous
inner array.
I trust that such wording is in the C and C++ standards not because
programmers are clumsy but because hardware manufacturers have been
toying with various fat pointer concepts for decades.

> >> Such a compiler /can/ be implemented, and can possibly be argued to be
> >> formally correct, because the standard does not require `+` to be
> >> associative: the C++ version groups left to right.
> >>
> >> But we can reason about the standard imposing requirements such as P+2
> >> having to be split up in two operations (P+1)+1, and so on, that for
> >> practical effect requires over-the-top sabotaging compiler perversity,
> >> at the technical cost of the inefficiency of “fat pointers”, and the
> >> higher level cost of breaking a really large amount of existing code,
> >> and introducing the notion that the history of a pointer matters.
> >
> > I wrote above how I think it is meant.
>
> No, you didn't.
>
> I've seen no suggestion of a rationale.
>
> It's just a totally impractical nonsense interpretation.

You have for some reason decided that it is nonsense,
but it is unclear (to me) what that reason is.

> >> Would that be reasonable or serve any engineering purpose? Nope. It's
> >> just hogwash, an unnatural, totally impractical code-breaking
> >> interpretation that completely disregards context and purpose.
> >
> > How is there no purpose? Buffer overflows are actually a nasty issue,
> > beneficial to no one. It is a good purpose to at least allow
> > architectures that make those less frequent or even impossible.
>
> There are no buffer overflows in code to iterate through a
> multidimensional array.
>
> The wording that you appear to focus on is for individual arrays.
>
> That's a different context.

Consider int arr[5][5]. Now accessing arr[0][5] is one kind of
buffer overflow and it is bad that it is equivalent to accessing
arr[1][0] on most platforms.

Alf P. Steinbach

Apr 2, 2019, 11:33:24 AM
On 02.04.2019 16:57, Öö Tiib wrote:
>
> Consider int arr[5][5]. Now accessing arr[0][5] is one kind of
> buffer overflow and it is bad that it is equivalent to accessing
> arr[1][0] on most platforms.

Array indexing is defined in terms of pointer arithmetic; the []
operator applies to pointers, not arrays.

So `arr[0][5]` is by definition `*(arr[0] + 5)` which in turn is by
definition `*(*(arr + 0) + 5)` which is equivalent to `*(*arr + 5)`.

And `arr[1][0]` is by definition `*(arr[1] + 0)` which in turn is by
definition `*(*(arr + 5) + 0)` which is `*(*arr + 5)`.

On which platform are these expressions not equivalent?

I.e., when you say it holds on “most platforms”, on which platform does
the equivalence not hold?
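
As a pure address comparison (forming the one-past-the-end pointer is
well-defined either way; only access through it is in dispute), the
equivalence is easy to observe:

#include <cassert>

int main() {
    int arr[5][5] = {};
    // One past the end of the first row compares equal to the
    // start of the second row.
    assert( static_cast<void *>( arr[0] + 5 ) ==
            static_cast<void *>( arr[1] ) );
}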


Cheers!,

- Alf

Alf P. Steinbach

Apr 2, 2019, 10:58:37 PM
On 02.04.2019 17:33, Alf P. Steinbach wrote:
> On 02.04.2019 16:57, Öö Tiib wrote:
>>
>> Consider int arr[5][5]. Now accessing arr[0][5] is one kind of
>> buffer overflow and it is bad that it is equivalent to accessing
>> arr[1][0] on most platforms.
>
> Array indexing is defined in terms of pointer arithmetic; the []
> operator applies to pointers, not arrays.
>
> So `arr[0][5]` is by definition `*(arr[0] + 5)`  which in turn is by
> definition `*(*(arr + 0) + 5)` which is equivalent to `*(*arr + 5)`.
>
> And `arr[1][0]` is by definition `*(arr[1] + 0)` which in turn is by
> definition `*(*(arr + 5) + 0)` which is `*(*arr + 5)`.

Ouch, sorry about the typo.

`arr[1][0]` is by definition `*(arr[1] + 0)`, which in turn is by
definition `*(*(arr + 1) + 0)`,

which is equivalent to `*((*arr + 1×5) + 0)` (note: "(*", not "*("), which
is `*(*arr + 5)`.

To understand how that result expression works:

1) `arr` decays to pointer to first sub-array.
2) `*arr` forms an lvalue referring to that whole sub-array.
3) That sub-array expression decays to pointer to first `int` item.
4) The compiler adds 5 to that pointer.
5) The result, a pointer to one past the first sub-array, is dereferenced.

You have argued that this, the definition of the indexing, is Undefined
Behavior, possibly on the grounds that the derivation step, covered by my
typo and above marked by a blank line, is not specified directly by the
standard; and by interpreting the standard's

“point to elements of the same array object”

as not referring to a multidimensional array object, because you
maintained that the standard had no notion of such.

I showed by direct quote that the standard does indeed use that term and
have that notion, and it constitutes a much more reasonable interpretation.

If it hadn't then filing a Defect Report would IMO be in order.

Ben Bacarisse

Apr 3, 2019, 6:41:29 AM
"Alf P. Steinbach" <alf.p.stein...@gmail.com> writes:

> On 02.04.2019 17:33, Alf P. Steinbach wrote:
>> On 02.04.2019 16:57, Öö Tiib wrote:
>>>
>>> Consider int arr[5][5]. Now accessing arr[0][5] is one kind of
>>> buffer overflow and it is bad that it is equivalent to accessing
>>> arr[1][0] on most platforms.
<cut>

> 1) `arr` decays to pointer to first sub-array.
> 2) `*arr` forms an lvalue referring to that whole sub-array.
> 3) That sub-array expression decays to pointer to first `int` item.
> 4) The compiler adds 5 to that pointer.
> 5) The result, a pointer to one past the first sub-array, is dereferenced.
>
> You have argued that this, the definition of the indexing, is
> Undefined Behavior, possibly on the grounds that the derivation step
> covered by my typo and above marked by a blank line, is not specified
> directly by the standard; and by interpreting the standard's
>
> “point to elements of the same array object”
>
> as not referring to a multidimensional array object, because you
> maintained that the standard had no notion of such.

Given int arr[5][5]; the expression arr[0][5] is a special case because
constructing a pointer "just past" the end of an array (or just after a
single non-array object) is specifically defined.

To take a more clear-cut example, is it your view that arr[0][6] is also
well-defined in C++ and that it corresponds to arr[1][1]? The wording
looks similar to that of the C standard, and it is generally regarded as
undefined in C, though it will usually work of course.

The problem with relying on the "elements of the same array object"
wording is that with 'int arr[5][5]' there are only two plausible
arrays that that text could be referring to. One is arr itself, which
has 5 elements. The other is arr[0], which also has 5 elements.
Neither of these has a 7th element (indexed by 6).

The other problem with that wording is that it only applies to
subtracting two pointers. The wording that explains P + N says:

"if the expression P points to the i-th element of an array object,
the expressions (P)+N (equivalently, N+(P)) and (P)-N (where N has the
value n) point to, respectively, the i + n-th and i − n-th elements of
the array object, provided they exist."

This is why I explained the problem in terms of the number of elements.
There is no array that has enough elements in this example.

--
Ben.

james...@alumni.caltech.edu

Apr 3, 2019, 9:51:40 AM
int array[3][4];
int *p = array[1];  // pointer to the first element of the second row
p[5] = 6;           // indexes past that 4-element row: the disputed case

james...@alumni.caltech.edu

Apr 3, 2019, 10:00:39 AM
On Tuesday, April 2, 2019 at 5:58:30 AM UTC-4, Alf P. Steinbach wrote:
> On 02.04.2019 10:49, Öö Tiib wrote:
...
> > and global indexing by other means is not defined in it.
>
> That sounds incorrect but it depends on what you mean by “global indexing.”

It's not a well-defined term, but in context I would assume he's referring to the use of a T[0] to access elements of T[i] where i>0.

> > So if we have a pointer like p = &a[0][0] where a is T[M][N], then
> > [expr.add] (to which other things like increments, decrements
> > and [] refer) explicitly says that it is undefined behavior to
> > go any farther than from (p + 0) to (p + N) with it.
>
> No, it doesn't.

"If the expression P points to element x[i] of an array object x with n
elements, the expressions P + J and J + P (where J has the value j)
point to the (possibly-hypothetical) element x[i + j]
if 0 <= i + j >= n; otherwise, the behavior is undefined."

p-1 and p+N+1 both violate the condition near the end of that sentence,
and therefore would have undefined behavior. The standard makes no
exception from that rule for the special case where the array x is
itself an element of another array.

james...@alumni.caltech.edu

Apr 3, 2019, 11:10:18 AM
On Tuesday, April 2, 2019 at 7:23:40 AM UTC-4, Alf P. Steinbach wrote:
> On 02.04.2019 12:59, Öö Tiib wrote:
...
> > That is exactly the (p + N) that I wrote above.
>
> At least we're clear on that: that /one/ part of the standard requires
> it to be well-defined.
>
> Do you agree that for the abstract machine defined by the standard, the
> history of how a valid pointer value was computed does not matter if
> that history did not involve UB (as it clearly does not for (P)+1)?

No. The history of how a valid pointer was computed can affect what it
is legal to do with it.

...
> > but it is also true
> > that (p + N + 1) is explicitly said to be undefined behavior.
>
> Not if you require (P+2) to be equivalent to ((P+1)+1), which is well
> defined.

Comparing equal doesn't mean that they're required to be equivalent,
only that they're required to represent the same address in memory.

The fact that the standard says the behavior is undefined gives
implementors permission to, for instance, create heavy pointers that
keep a record of start and end of the array from which they are
obtained, and to cause problems if an attempt is made to add or subtract
from them a number that puts them outside the valid range, or to
dereference them when they point one past the end of that range. That
would incur an enormous performance penalty, but some compilers have a
mode where they enable such a feature for debugging purposes. Because
the behavior is undefined, turning on that option does not render the
implementation non-conforming.

More realistically, an implementation is allowed to, for example, look
at the expressions array[0][i] and array[1][j], and assume that neither
expression will be evaluated in a context where i and j have values that
are out of range, and that it is therefore unnecessary to consider the
possibility that they alias the same location. Since the behavior would
be undefined in those cases, ignoring that possibility would not render
the implementation non-conforming.

> But we can reason about the standard imposing requirements such as P+2
> having to be split up in two operations (P+1)+1, and so on, that for

Such a re-write does nothing to avoid either of the possibilities I
mentioned above. If p+2 would trigger a bounds check, then so would
(p+1)+1. array[0]+(N-1)+1 might compare equal to array[1], but unlike
the second expression, it would be undefined behavior to use the first
one to access the value stored in that location, so an implementation need
not consider the possibility that the two expressions alias each other.

> practical effect requires over-the-top sabotaging compiler perversity,
> at the technical cost of the inefficiency of “fat pointers”, and the
> higher level cost of breaking a really large amount of existing code,
> and introducing the notion that the history of a pointer matters.

No, that notion was introduced a long time ago, when a similar clause was
first written into C90; C++ merely inherited the notion.

james...@alumni.caltech.edu

Apr 3, 2019, 12:21:50 PM
On Tuesday, April 2, 2019 at 9:57:14 AM UTC-4, Alf P. Steinbach wrote:
> On 02.04.2019 14:48, Öö Tiib wrote:
...
> > I believe that it is meant differently. AFAICS (P+1) is a valid
> > pointer value and compares equal to the valid pointer value R
> > (the pointer to the first element of the next array).
> > However, dereferencing that valid pointer (P+1) is UB
> > even though R can be dereferenced.
>
> So, say you do P2 = P+1, in the case where P2 is guaranteed to compare
> equal to a pointer to the first element of the next inner array.
>
> Is the history, that P2 was computed as P+1, forgotten at some point?

When the lifetime of P2 ends, or when it is assigned a new value.

> Or will it be UB to dereference the stored pointer value in P2, just as
> it is with the expression P+1?

Yes.

> P2 points to the first item in an array,

It points one past the end of an array, and compares equal to a pointer
that points at the first element of the next array, but has undefined
behavior if used to actually access that element.

> ... but as I understand it you mean
> that it's UB to form P2+1, and well-defined to form P2-1, because it
> /came from/ an earlier part; is that reasonable, do you think?

Correct.

> Or can your argument be applied to P2 also, that no arithmetic
> whatsoever can be done with it?

Incorrect.

> Can't go forward because it came from previous inner array. ...

Correct.

> ... Can't go
> backward

Incorrect, because your claim that

> because it's at the start of an array.

is also incorrect.

...
> > Oh, there will likely be "perverse" processors built that support such
> > checked pointers right away as part of dereference operations. For
> > a lot of people, improving the safety of pointers has been a sort of
> > dream for a long time.
>
> But think about it.
>
> The nice checking processor can't prevent me from traversing the array
> one step at a time, which is indisputably well-defined.

I dispute it.

> All this extra hardware and fat pointer overhead is surely what the C++
> committee had in mind, a small cost indeed to pay for detecting some
> programmers' bad practices that give buffer overruns.

What they had in mind was not doing that kind of processing. If you need
to do something like that, define a single large array, and then use
slicing to simulate multi-dimensionality, using the same techniques as
std::valarray.
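
A sketch of that approach (Grid here is a hypothetical helper, not
std::valarray itself; all the index arithmetic stays inside one flat
array object, so whole-range strides are unproblematic):

#include <cstddef>
#include <vector>

struct Grid {
    std::vector<int> data;  // one flat buffer of rows * cols elements
    std::size_t cols;
    Grid( std::size_t rows, std::size_t cols_ )
        : data( rows * cols_ ), cols( cols_ ) {}
    int &operator()( std::size_t i, std::size_t j ) {
        return data[ i * cols + j ];  // row-major slicing
    }
};

int main() {
    Grid g( 4, 5 );
    g( 1, 2 ) = 42;
    // Striding across the whole grid is just indexing data[0 .. 19].
}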

...
> > I wrote above how I think it is meant.
>
> No, you didn't.

You're saying he's lying about the reason he had for writing that?

> > How is there no purpose? Buffer overflows are actually a nasty issue,
> > beneficial to no one. It is a good purpose to at least allow
> > architectures that make those less frequent or even impossible.
>
> There are no buffer overflows in code to iterate through a
> multidimensional array.
>
> The wording that you appear to focus on is for individual arrays.

It's for arrays in general, whether complete objects in their own right,
or an array whose elements are themselves arrays, or an array that's an
element of another array.

...
> >>>> It's just a sabotage meme originating with some unreasoning socially
> >>>> oriented first-year students that had a Very Ungood Teacher™ (VUT™).
> >>>>
> >>>> Or at least that's my theory. :)

Would you consider that an accurate description of the members of the C committee? (I mention the C committee rather than the C++ committee only because I know where to find a document in which the C committee has addressed this question - I'm not certain whether the C++ committee has ever felt a need to make its own distinct decision on this matter.)

Defect Report #017 dated 10 Dec 1992 to C89:
> For an array of arrays, the permitted pointer arithmetic in
> subclause 6.3.6, page 47, lines 12-40 is to be understood by
> interpreting the use of the word object as denoting the specific
> object determined directly by the pointer's type and value, not other
> objects related to that one by contiguity. Therefore, if an expression
> exceeds these permissions, the behavior is undefined. For example, the
> following code has undefined behavior:
>
> int a[4][5];
>
> a[1][7] = 0; /* undefined */

When C++ was standardized in 1998, it contained essentially the same
wording for all of the relevant clauses as was used in C. They didn't
make any changes to justify concluding that DR 017 didn't also apply to
C++. To the best of my knowledge, neither committee has ever reversed
the decision of DR 017.

Alf P. Steinbach

Apr 4, 2019, 1:44:27 AM
On 03.04.2019 17:10, james...@alumni.caltech.edu wrote:
>
> More realistically, an implementation is allowed to, for example, look
> at the expressions array[0][i] and array[1][j], and assume that neither
> expression will be evaluated in a context where i and j have values that
> are out of range, and that it is therefore unnecessary to consider the
> possibility that they alias the same location. Since the behavior would
>> be undefined in those cases, ignoring that possibility would not render
> the implementation non-conforming.

Thanks, now it appears to be clear where the nonsense comes from, namely
the "optimization" in the GCC compiler.

They've wasted quite some zillions of hours of programmers' time on that
kind of "optimizations" that often (when they kick in) make the code do
an unintended and unanticipated thing, without any diagnostic.

It makes the compiler unreliable, but IME it can't be discussed without
a lot of heat ensuing, and circular logic + other fallacies offered up.
As I see it, that's because somehow a majority of GCC users have been
convinced that regarding this issue they're on the same team as the
compiler writers. Sort of like the majority of Trump's voters think
they're on the same team as the billionaires that exploit them.


Cheers!,

- Alf

David Brown

Apr 4, 2019, 5:05:45 AM
On 01/04/2019 08:36, Bonita Montero wrote:
> The thing is that .erase simply shifts the elements with the assignment
> operator, and if I do the same with my own code like the following ...
>
> template<typename T>
> void vector_erase( vector<T> &v, typename vector<T>::iterator first,
>                    typename vector<T>::iterator last )
> {
>     typename vector<T>::iterator shiftOld,
>                                  shiftNew;
>     for( shiftOld = last, shiftNew = first; shiftOld != v.end(); )
>         *shiftNew++ = *shiftOld++;
>     v.resize( shiftNew - v.begin() );
> }
>
> ... the iterators into the vector remain valid up to the place where I
> cut the vector.

That is true for standard erase also. But iterators at or after the
erase point are not valid. You can't assume they point to shifted
values, or other elements.

<https://en.cppreference.com/w/cpp/container/vector/erase>


David Brown

Apr 4, 2019, 5:18:17 AM
On 01/04/2019 14:31, Paavo Helde wrote:
> On 1.04.2019 14:22, Öö Tiib wrote:
Isn't it just a combination of "erase leaves elements before the erase
point untouched, but changes the vector after that" and "if an iterator
points to a part of a container that gets structural changes, it is invalid"?

When you have associative containers, erase only (logically) changes the
erased items - iterators to guaranteed unchanged parts are still valid.

>
> Now, if I have some random iterator into the vector, it might or might
> not be invalidated, depending on where the erasure occurred. How is that
> simpler to keep track of than making sure the iterator still points inside
> the final valid vector range?
>
>> complications won't reduce those issues. Is there some
>> motivating example for defining the behavior in the standard?
>
> I'm not suggesting that this behavior should be defined, I was just
> trying to translate Bonita's question.
>

Ben Bacarisse

Apr 4, 2019, 6:44:49 AM
"Alf P. Steinbach" <alf.p.stein...@gmail.com> writes:

> On 03.04.2019 17:10, james...@alumni.caltech.edu wrote:
>>
>> More realistically, an implementation is allowed to, for example, look
>> at the expressions array[0][i] and array[1][j], and assume that neither
>> expression will be evaluated in a context where i and j have values that
>> are out of range, and that it is therefore unnecessary to consider the
>> possibility that they alias the same location. Since the behavior would
>> be undefined in those cases, ignoring that possibility would not render
>> the implementation non-conforming.
>
> Thanks, now it appears to be clear where the nonsense comes from,
> namely the "optimization" in the GCC compiler.

That's unlikely. The wording in the C standard predates gcc.

--
Ben.

Bonita Montero

Apr 4, 2019, 7:05:35 AM
> That is true for standard erase also. But iterators at or after the
> erase point are not valid. You can't assume they point to shifted
> values, or other elements.

That's a stupid specification, because it's easy to implement the erase
method in a way that only the iterators to the last elements of the vector
become invalid (equal in number to the elements you erased). I've already
given an implementation in this thread that makes this guarantee.

James Kuyper

Apr 4, 2019, 8:04:49 AM
On 4/4/19 1:44 AM, Alf P. Steinbach wrote:
> On 03.04.2019 17:10, james...@alumni.caltech.edu wrote:
>>
>> More realistically, an implementation is allowed to, for example, look
>> at the expressions array[0][i] and array[1][j], and assume that neither
>> expression will be evaluated in a context where i and j have values that
>> are out of range, and that it is therefore unnecessary to consider the
>> possibility that they alias the same location. Since the behavior would
>> be undefined in those cases, ignoring that possibility would not render
>> the implementation non-conforming.
>
> Thanks, now it appears to be clear where the nonsense comes from, namely
> the "optimization" in the GCC compiler.

I made no mention of gcc, or any other particular compiler. I have no
idea whether any compiler actually performs such an optimization. I was
giving it only as an example of how you might realistically see
unexpected behavior due to the fact that such code has undefined behavior.

Keep in mind that this "nonsense" was explicitly endorsed by the C
committee in DR #017, justifying that ruling by citing wording from the
C90 standard that has continued to be present in every version of both
the C and C++ standards. They added wording in section J.1 of C99
clarifying that this was indeed the intent of the committee. Neither the
C committee nor the C++ committee has ever, to the best of my knowledge,
repudiated the resolution of that DR.

Paavo Helde

unread,
Apr 4, 2019, 8:05:00 AM4/4/19
to
If something is easy it does not automatically mean it should be done.

For example, composing SQL statements by string concatenation. Or
copy-pasting a large function to make a small modification.

James Kuyper

unread,
Apr 4, 2019, 8:14:05 AM4/4/19
to
I'd been planning to say the same thing, but when I checked on
Wikipedia, it said that the first version of gcc came out in 1987, 2
years before the first version of the C standard.
Note that while the wording in the C standard dates back to C89, the
fact that it has this meaning was sufficiently unclear that it had to be
resolved with DR #017 in 1992, 5 years after gcc came out.
I sincerely doubt that gcc had anything in particular to do with that
decision, but the timing of the events does not rule out that possibility.

Bonita Montero

unread,
Apr 4, 2019, 8:43:18 AM4/4/19
to
> If something is easy it does not automatically mean it should be done.

I already said that these implications wouldn't impose any restriction
on the implementation.

> For example, composing SQL statements by string concatenation.
> Or copy-pasting a large function to make a small modification.

These analogies don't fit.

Ben Bacarisse

unread,
Apr 4, 2019, 8:55:47 AM4/4/19
to
James Kuyper <james...@alumni.caltech.edu> writes:

> On 4/4/19 6:44 AM, Ben Bacarisse wrote:
>> "Alf P. Steinbach" <alf.p.stein...@gmail.com> writes:
>>
>>> On 03.04.2019 17:10, james...@alumni.caltech.edu wrote:
>>>>
>>>> More realistically, an implementation is allowed to, for example, look
>>>> at the expressions array[0][i] and array[1][j], and assume that neither
>>>> expression will be evaluated in a context where i and j have values that
>>>> are out of range, and that it is therefore unnecessary to consider the
>>>> possibility that they alias the same location. Since the behavior would
>>>> be undefined in those cases, ignoring that possibility would not render
>>>> the implementation non-conforming.
>>>
>>> Thanks, now it appears to be clear where the nonsense comes from,
>>> namely the "optimization" in the GCC compiler.
>>
>> That's unlikely. The wording in the C standard predates gcc.
>
> I'd been planning to say the same thing, but when I checked on
> Wikipedia, it said that the first version of gcc came out in 1987, 2
> years before the first version of the C standard.

I could claim (though I won't!) that I was talking about the first C
standard -- the reference manual in K&R1 -- which has similar wording,
though it does not explicitly say the result is undefined. K&R2 (1988)
adds such an explicit remark with no comment about this being a change,
and the list of changes from K&R1 does not include it.

It's not reasonable to think that K&R (or the committee) added those
words because gcc, in its first few months, had decided to do
sophisticated alias analysis.

> Note that while the wording in the C standard dates back to C89, the
> fact that it has this meaning was sufficiently unclear that it had to be
> resolved with DR #017 in 1992, 5 years after gcc came out.

Right. That means that the "it comes from gcc" argument would require
that it was sufficiently clear to the gcc team (despite being,
apparently, unclear to others) that they were prepared to break code that
did not have the same understanding.

> I sincerely doubt that gcc had anything in particular to do with that
> decision, but the timing of the events does not rule out that
> possibility.

ACK.

--
Ben.

David Brown

unread,
Apr 4, 2019, 9:30:00 AM4/4/19
to
On 02/04/2019 14:48, Öö Tiib wrote:
> On Tuesday, 2 April 2019 14:23:40 UTC+3, Alf P. Steinbach wrote:
>> On 02.04.2019 12:59, Öö Tiib wrote:

<snip>

>>
>>> but also it is true
>>> that (p + N + 1) is explicitly told to be undefined behavior.
>>
>> Not if you require (P+2) to be equivalent to ((P+1)+1), which is well
>> defined.
>>
>> To make this alleged UB have a practical effect one needs a perverse
>> compiler that adds checking of whether N > 1.
>
> Oh, there will be likely built "perverse" processors that support such
> checked pointers right away as part of dereference operations. For
> lot of people that has been sort of dream to improve safety of pointers
> for long time.
>


None of these sorts of limitations of valid access are really about
"perverse processors". There are several advantages of saying that you
may not access a multidimensional array as though it were a large
unidimensional array - with many other kinds of banned (or undefined, or
invalid) behaviour having similar points.

1. Code that accesses data in a weird way is usually wrong. By making
it explicitly wrong (by declaring it to be invalid or undefined
behaviour), compilers, debuggers and other tools are able to spot
mistakes and help the user correct them. If these out-of-bounds
accesses were defined behaviour, tools could not tell if the programmer
had a bug in their code or was intentionally mistreating their arrays.

2. It aids optimisation. The limits on the behaviour means the compiler
knows more about possible aliasing, meaning it can do better at
automatically vectorising code, or re-arranging accesses for better
scheduling.

3. It gives the compiler more scope to re-arrange things. There is no
need to have the arrays contiguous in memory - it can store the
sub-arrays in SIMD registers, or arrange them for better cache locality.
On platforms with multiple independent ram blocks, it could put
different parts in different blocks to improve throughput.

4. It makes logical sense to restrict allowed access, so that your code
does what it appears to say it will do. Code that defines arrays of a
particular size, then pretends they are something different, is
inevitably going to be harder to follow.


C and C++ let you define multidimensional arrays. They let you define
unidimensional arrays. They let you define structures that support both
accesses. It makes a lot more sense to write code correctly, using
appropriate structures and features, rather than trying to write
something that is inconsistent and against the explicit rules of the
language.

james...@alumni.caltech.edu

unread,
Apr 4, 2019, 9:51:23 AM4/4/19
to
On Thursday, April 4, 2019 at 9:30:00 AM UTC-4, David Brown wrote:
...
> None of these sorts of limitations of valid access are really about
> "perverse processors". There are several advantages of saying that you
> may not access a multidimensional array as though it were a large
> unidimensional array - with many other kinds of banned (or undefined, or
> invalid) behaviour having similar points.
...
> 3. It gives the compiler more scope to re-arrange things. There is no
> need to have the arrays contiguous in memory - it can store the
> sub-arrays in SIMD registers, or arrange them for better cache locality.
> On platforms with multiple independent ram blocks, it could put
> different parts in different blocks to improve throughput.

The standard explicitly requires that arrays be allocated contiguously
(9.2.3.4p1). Violating that requirement for well-formed code requires
invoking the as-if rule. Since the array bounds rule gives undefined
behavior to many of the simplest kinds of code you could write to prove
that an array is not contiguous, it does open up the scope for applying
the as-if rule, but it's not an open-ended release from that
requirement. For instance, if std::memcpy(), std::memset(),
std::memcmp(), std::istream::read(), std::ostream::write() or sizeof are
applied to more than one row of such an array, they must have behavior
consistent with that requirement.
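
For example, this sketch relies only on the contiguity requirement and
the cited behaviour of std::memcpy, so it must work on any conforming
implementation:

#include <cstring>

void copy_matrix( int (&dst)[4][5], const int (&src)[4][5] )
{
    // Arrays are allocated contiguously, and memcpy applied to the
    // whole object must behave consistently with that, so one call
    // copies all 20 ints.
    std::memcpy( dst, src, sizeof src );
}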

David Brown

unread,
Apr 4, 2019, 10:00:48 AM4/4/19
to
Yes, but you can't do p + N + 1 that way. Nor can you access data via
(p + N).

A compiler /could/ split a multidimensional array across memory
segments, if all access was done using the array pointers. But it is
legal to access the whole array as one object using char pointers, or
memcpy, and it would be rather difficult to maintain logical correctness
while having the data physically split up.

David Brown

unread,
Apr 4, 2019, 10:09:05 AM4/4/19
to
On 04/04/2019 15:51, james...@alumni.caltech.edu wrote:
> On Thursday, April 4, 2019 at 9:30:00 AM UTC-4, David Brown wrote:
> ...
>> None of these sorts of limitations of valid access are really about
>> "perverse processors". There are several advantages of saying that you
>> may not access a multidimensional array as though it were a large
>> unidimensional array - with many other kinds of banned (or undefined, or
>> invalid) behaviour having similar points.
> ...
>> 3. It gives the compiler more scope to re-arrange things. There is no
>> need to have the arrays contiguous in memory - it can store the
>> sub-arrays in SIMD registers, or arrange them for better cache locality.
>> On platforms with multiple independent ram blocks, it could put
>> different parts in different blocks to improve throughput.
>
> The standard explicitly requires that arrays be allocated contiguously
> (9.2.3.4p1).

I started writing an explanation of how non-contiguous arrays could
work, but I see you've covered it nicely yourself.

I don't know if compilers actually do this, but they are certainly
allowed to (with the restrictions you list).

Öö Tiib

unread,
Apr 4, 2019, 12:14:36 PM4/4/19
to
Yes, sure, UB encourages tool writers to check for that particular UB.
Fortunately the tools do not limit themselves to that, but also tend to
warn us when there is no UB but the situation feels otherwise
questionable.

> 2. It aids optimisation. The limits on the behaviour means the compiler
> knows more about possible aliasing, meaning it can do better at
> automatically vectorising code, or re-arranging accesses for better
> scheduling.

It is always possible that there is some optimization based on aliasing,
but it feels unlikely for pointers to the same type. The restrict keyword
was likely added to C because an assumption like "A[-2] and B[3]
are different objects" does indeed help to optimize, but the compiler
has difficulty finding any reason to assume that on its own. It feels
not much better when A and B are pointers into a multidimensional
array. Can you give an example of such optimizations?

> 3. It gives the compiler more scope to re-arrange things. There is no
> need to have the arrays contiguous in memory - it can store the
> sub-arrays in SIMD registers, or arrange them for better cache locality.
> On platforms with multiple independent ram blocks, it could put
> different parts in different blocks to improve throughput.

When the compiler is sure that a local array may be rearranged,
then it is also free to do so even if that UB were not in the standard.

> 4. It makes logical sense to restrict allowed access, so that your code
> does what it appears to say it will do. Code that defines arrays of a
> particular size, then pretends they are something different, is
> inevitably going to be harder to follow.

I do not see how. The huge pile of potential UBs at every step is not
helping us to write code that does what it appears to say it does.
We see how even the most advanced specialists argue over whether
something is UB or nonsense hearsay. I fail to see what that has
helped with (if we leave aside your point 1, that it encourages tool
makers to diagnose it).

Note that the standard clearly allows us to access, and to do pointer
arithmetic over, the whole array of arrays with char pointers when we
intentionally want/need to be so unusual there.
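
A minimal sketch of that escape hatch, copying a whole array of arrays
byte by byte through unsigned char pointers:

#include <cstddef>

void copy_bytes( int (&dst)[4][5], const int (&src)[4][5] )
{
    // The object representation of any object may be accessed through
    // unsigned char pointers, so walking all sizeof(src) bytes of the
    // array of arrays is well-defined.
    const unsigned char *s = reinterpret_cast<const unsigned char *>( &src );
    unsigned char *d = reinterpret_cast<unsigned char *>( &dst );
    for( std::size_t i = 0; i < sizeof src; ++i )
        d[i] = s[i];
}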

David Brown

unread,
Apr 4, 2019, 2:21:14 PM4/4/19
to
That's true, of course. But it's easier when there is clarity in the
situation - if the compiler can see you've stepped outside the valid
ranges for the array, it can give the warning. If the behaviour had
been allowed, but questionable, then you'd need more optional flags, or
ways to help the compiler distinguish false positives from intentional
coding. (I'm thinking here of things like writing "if ((x = a))..."
with extra parentheses, so that the compiler knows the legal but
questionable code is intentional.)

>> 2. It aids optimisation. The limits on the behaviour means the compiler
>> knows more about possible aliasing, meaning it can do better at
>> automatically vectorising code, or re-arranging accesses for better
>> scheduling.
>
> It is always possible that there is some optimization based on aliasing,
> but it feels unlikely for pointers to the same type. The restrict keyword
> was likely added to C because an assumption like "A[-2] and B[3]
> are different objects" does indeed help to optimize, but the compiler
> has difficulty finding any reason to assume that on its own. It feels
> not much better when A and B are pointers into a multidimensional
> array. Can you give an example of such optimizations?

It is often difficult to think of examples - and just because a compiler
/can/ use information for optimisation, does not mean that it /will/.
But basically, with the rules laid out in the standards, the compiler
knows that A[i][j] and A[x][y] cannot alias if i != x, no matter what j
and y are. This could have an effect in some code, such as in-place
matrix operations.

I don't imagine it will often give optimisation opportunities, but
sometimes it will - and if it means you can use vector operations, it
can be significant.
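
A sketch of the kind of code meant here (a hypothetical function;
whether a given compiler actually exploits this is another question):

void add_rows( float a[][64], int r )
{
    // Rows r and r+1 of a genuine two-dimensional array cannot
    // overlap, so the compiler may vectorise this loop without
    // emitting a run-time overlap check.
    for( int j = 0; j < 64; ++j )
        a[r][j] += a[r + 1][j];
}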

>
>> 3. It gives the compiler more scope to re-arrange things. There is no
>> need to have the arrays contiguous in memory - it can store the
>> sub-arrays in SIMD registers, or arrange them for better cache locality.
>> On platforms with multiple independent ram blocks, it could put
>> different parts in different blocks to improve throughput.
>
> When the compiler is sure that a local array may be rearranged,
> then it is also free to do so even if that UB were not in the standard.

True. But the UB means that it already knows certain types of access
can't happen, which can make it easier to be sure the re-arrangement is
safe.

Again, I can't tell you if compilers do this sort of thing, or if it is
often significant. All I can say is that the UB gives the compiler more
information about what can't happen, and that can open more optimisation
and rearrangement possibilities.

>
>> 4. It makes logical sense to restrict allowed access, so that your code
>> does what it appears to say it will do. Code that defines arrays of a
>> particular size, then pretends they are something different, is
>> inevitably going to be harder to follow.
>
> I do not see how. The huge pile of potential UBs at every step is not
> helping us to write code that does what it appears to say it does.

Yes, it does. By having rules that say you are not allowed to write
code in such-and-such a way, people are less likely to write code that way.

> We see how even the most advanced specialists argue over whether
> something is UB or nonsense hearsay. I fail to see what that has
> helped with (if we leave aside your point 1, that it encourages tool
> makers to diagnose it).

I don't think anyone is arguing about whether out-of-bounds sub-array
access is UB - the standards are entirely clear on the matter. There is
some disagreement about whether it /should/ be UB, and whether compilers
should take advantage of that or whether they should consider it to be
defined behaviour (compilers can always give a definition of particular
undefined behaviours).

>
> Note that standard clearly allows us to access and to do pointer
> arithmetics over whole array of arrays with char pointers when we
> intentionally want/need to be so unusual there.

Yes.

Alf P. Steinbach

unread,
Apr 4, 2019, 11:51:59 PM4/4/19
to
On 04.04.2019 14:13, James Kuyper wrote:
> On 4/4/19 6:44 AM, Ben Bacarisse wrote:
>> "Alf P. Steinbach" <alf.p.stein...@gmail.com> writes:
>>
>>> On 03.04.2019 17:10, james...@alumni.caltech.edu wrote:
>>>>
>>>> More realistically, an implementation is allowed to, for example, look
>>>> at the expressions array[0][i] and array[1][j], and assume that neither
>>>> expression will be evaluated in a context where i and j have values that
>>>> are out of range, and that it is therefore unnecessary to consider the
>>>> possibility that they alias the same location. Since the behavior would
>>>>> be undefined in those cases, ignoring that possibility would not render
>>>> the implementation non-conforming.
>>>
>>> Thanks, now it appears to be clear where the nonsense comes from,
>>> namely the "optimization" in the GCC compiler.
>>
>> That's unlikely. The wording in the C standard predates gcc.

The last line above assumes the interpretation that it seeks to prove.
Which is the usual circular argumentation fallacy in debates about GCC.
Which goes to show that as an online discussion that involves GCC grows
beyond the first few remarks, the probability of clearly fallacious
reasoning being used in earnest, approaches 1.


> I'd been planning to say the same thing, but when I checked on
> Wikipedia, it said that the first version of gcc came out in 1987, 2
> years before the first version of the C standard.
> Note that while the wording in the C standard dates back to C89, the
> fact that it has this meaning was sufficiently unclear that it had to be
> resolved with DR #017 in 1992, 5 years after gcc came out.

This however is good, and proves me wrong about the interpretation.

At least it does if one doesn't argue pedantically (and misguidedly) that
the part of the C++ standard that says it incorporates the C standard
is non-normative. It is, though, which to me reaffirms that the standard
is a practical document intended to be interpreted with common sense, and
not a perfect formal document. Which sometimes, when the practicality is
arguable as it is here, leads to doubts and misunderstandings...

Quoting the resolution of question 16 in C Defect Report #17 at <url:
http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_017.html>:

[quote]
For an array of arrays, the permitted pointer arithmetic in subclause
6.3.6, page 47, lines 12-40 is to be understood by interpreting the use
of the word ``object'' as denoting the specific object determined
directly by the pointer's type and value, not other objects related to
that one by contiguity. Therefore, if an expression exceeds these
permissions, the behavior is undefined. For example, the following code
has undefined behavior:
int a[4][5];

a[1][7] = 0; /* undefined */
Some conforming implementations may choose to diagnose an ``array bounds
violation,'' while others may choose to interpret such attempted
accesses successfully with the ``obvious'' extended semantics.
[/quote]


> I sincerely doubt that gcc had anything in particular to do with that
> decision, but the timing of the events does not rule out that possibility.

It's all GCC's fault.

Including all apparently natural catastrophes.

Well, more seriously, if it wasn't the folks involved with optimization
in GCC, then it was the same mindset at work, the idea of /supporting
the compiler/ at the expense of programmers having to know about and
perfectly, consistently avoid a large number of unnatural possible UBs,
i.e. making it easy to write incorrect code and making the effect
unpredictable, instead of the opposite, making it hard to write
incorrect code and making the code predictable, at cost to compiler
writers who must then work harder to effect the desired optimizations.


Cheers!,

- Alf

James Kuyper

unread,
Apr 5, 2019, 8:47:09 AM4/5/19
to
On 4/4/19 11:51 PM, Alf P. Steinbach wrote:
> On 04.04.2019 14:13, James Kuyper wrote:
>> On 4/4/19 6:44 AM, Ben Bacarisse wrote:
>>> "Alf P. Steinbach" <alf.p.stein...@gmail.com> writes:
>>>
>>>> On 03.04.2019 17:10, james...@alumni.caltech.edu wrote:
>>>>>
>>>>> More realistically, an implementation is allowed to, for example, look
>>>>> at the expressions array[0][i] and array[1][j], and assume that neither
>>>>> expression will be evaluated in a context where i and j have values that
>>>>> are out of range, and that it is therefore unnecessary to consider the
>>>>> possibility that they alias the same location. Since the behavior would
>>>>> be undefined in those cases, ignoring that possibility would not render
>>>>> the implementation non-conforming.
>>>>
>>>> Thanks, now it appears to be clear where the nonsense comes from,
>>>> namely the "optimization" in the GCC compiler.
>>>
>>> That's unlikely. The wording in the C standard predates gcc.
>
> The last line above assumes the interpretation that it seeks to prove.

How? In the last line, he's attempting to prove that it's unlikely that
gcc influenced the committee's decision. The only assumption he's using
is that gcc didn't exist yet at the time the decision was made. I don't
see any circularity there. He was incorrect about that point, but only
by a couple of years; and it is indeed unlikely that the newly released
compiler was already sufficiently influential to control the committee's
decisions. It's not impossible, as it would have been if he'd been
right, but it's still implausible.

I'm not sure whether the FSF had any representatives on the committee at
that time. I know that they had no representatives on the committee at
the time C99 was approved, because some of them complained in this forum
about the committee's failure to incorporate many of gcc's innovations.
The response from a committee member was to point out that there was no
one on the committee who felt sufficiently familiar with those
innovations to promote them. He said that the FSF should have put at
least one member on the committee. The FSF people responded by claiming
that membership was too expensive in both money and time for them to
participate.

But many organizations much smaller than the FSF had no trouble finding
people to volunteer for the committee. Participating in the US committee
was only $800/year at the time that complaint was made, and open to
foreign nationals - many people join the US committee because their own
country either doesn't have a standards group participating in ISO, or
if there is such a group, membership is too expensive. The US committee
gets only 1 vote in ISO, just like any other country, but is
disproportionately influential on the ISO committee, and that influence
is one of the reasons people join it (having a lot more members than the
other countries to work on tasks is the other main factor behind that
influence).
Voting membership also requires attending at least 2(?) of the three
meetings per year that are held at various places around the world. The
travel costs can be significant - but the FSF is sufficiently large and
sufficiently international that it should have been able to send a
representative to at least 2 meetings per year without having to use an
airplane to get there. Many members have no organizational support -
they cover the costs entirely from their own funds.

>> I'd been planning to say the same thing, but when I checked on
>> Wikipedia, it said that the first version of gcc came out in 1987, 2
>> years before the first version of the C standard.
>> Note that while the wording in the C standard dates back to C89, the
>> fact that it has this meaning was sufficiently unclear that it had to be
>> resolved with DR #017 in 1992, 5 years after gcc came out.
>
> This however is good, and proves me wrong about the interpretation.
>
> At least it does if one doesn't argue pedantically (and misguided) that
> the part of the C++ standard that says it incorporates the C standard,
> is non-normative.

Several distinct points, each of which is separately sufficient to
deflate that argument.

1. I was making no use of the concept that the C++ standard incorporates
any part of the C standard.

2. The particular clauses from the C standard that the C committee used
as a basis for its decision on DR017 were copied into the C++ standard
with modifications, rather than being incorporated by reference.

This is not particularly unusual - the only significant part of the C
standard that was incorporated into the C++ standard by reference was
the description of the C standard library. Section 1.7, "Normative
references", includes references to the C99 standard and to the next
three Technical Corrigenda to that standard, but referring to a standard
does not mean incorporating it by reference. For instance, POSIX is
another standard listed in that section, but it's mentioned there only
so the standard can refer to POSIX when discussing compatibility with
that standard, not because any of that standard's requirements are
incorporated into C++ by reference.

None of the modifications that were made during the copy affected the
validity of the C committee's conclusions. If the C++ committee had ever
said anything to repudiate those conclusions, at least so far as C++ is
concerned, it would be a different matter. But as far as I know, they
haven't.

3. The only clause that does incorporate parts of the C standard by
reference is 17.5.1.5p1, which is fully normative - what makes you think
otherwise? If that weren't the case, there'd be no normative definition
of the behavior of most of the functions that C++ borrows from the C
standard library. Are you really prepared to say that there is no
normative description of the behavior of std::memcpy()?

> ... It is, though, which to me reaffirms that the standard
> is practical document intended to be interpreted with common sense, and
> not a perfect formal document. Which sometimes, when the practicality is
> arguable as it is here, leads to doubts and misunderstandings...
>
> Quoting the resolution of question 16 in C Defect Report #17 at <url:
> http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_017.html>:
>
> [quote]
> For an array of arrays, the permitted pointer arithmetic in subclause
> 6.3.6, page 47, lines 12-40 is to be understood by interpreting the use
> of the word ``object'' as denoting the specific object determined
> directly by the pointer's type and value, not other objects related to
> that one by contiguity. Therefore, if an expression exceeds these
> permissions, the behavior is undefined. For example, the following code
> has undefined behavior:
> int a[4][5];
>
> a[1][7] = 0; /* undefined */
> Some conforming implementations may choose to diagnose an ``array bounds
> violation,'' while others may choose to interpret such attempted
> accesses successfully with the ``obvious'' extended semantics.
> [/quote]
>
>
>> I sincerely doubt that gcc had anything in particular to do with that
>> decision, but the timing of the events does not rule out that possibility.
>
> It's all GCC's fault.
>
> Including all apparently natural catastrophes.

I'll assume that was a joke. Given the faceless nature of usenet, it's
best to mark such comments with an appropriate emoticon, tag, or emoji -
it's easy to confuse such jokes with the mad but serious ravings of
fanatics, because such fanatics do in fact post such things to usenet
fairly often.

> Well, more seriously, if it wasn't the folks involved with optimization
> in GCC, then it was the same mindset at work, the idea of /supporting
> the compiler/ at the expense of programmers having to know about and
> perfectly, consistently avoid a large number of unnatural possible UBs,
> i.e. making it easy to write incorrect code and making the effect
> unpredictable, instead of the opposite, making it hard to write
> incorrect code and making the code predictable, at cost to compiler
> writers who must then work harder to effect the desired optimizations.

You've got the motivations backwards. Hard as it may be for you to
believe, the relevant rule was written in the belief that code which
violates it is fundamentally flawed, and should therefore NOT be
allowed. You clearly do not agree - but your failure to agree doesn't
change the fact that this is in fact the motivation for the rule.

However, given the nature of the language, detection of violations of
that rule at compile time without any false positives or false negatives
is provably equivalent to solving the halting problem. It is therefore
not feasible for the standard to mandate such detection, so it cannot be
made a constraint violation. The most it can do is declare that such
code has undefined behavior.

Run-time detection of such problems using heavy pointers is entirely
feasible - and is permitted precisely because the behavior is undefined.
Optimizations such as the one I mentioned are simply allowed by the fact
that such code has undefined behavior - they were not the motivation for
the rule.

james...@alumni.caltech.edu

unread,
Apr 5, 2019, 10:21:47 AM4/5/19
to
On Friday, April 5, 2019 at 8:47:09 AM UTC-4, James Kuyper wrote:
> On 4/4/19 11:51 PM, Alf P. Steinbach wrote:
...
> > Well, more seriously, if it wasn't the folks involved with optimization
> > in GCC, then it was the same mindset at work, the idea of /supporting
> > the compiler/ at the expense of programmers having to know about and
> > perfectly, consistently avoid a large number of unnatural possible UBs,
> > i.e. making it easy to write incorrect code and making the effect
> > unpredictable, instead of the opposite, making it hard to write
> > incorrect code and making the code predictable, at cost to compiler
> > writers who must then work harder to effect the desired optimizations.
>
> You've got the motivations backwards. Hard as it may be for you to
> believe, the relevant rule was written in the belief that code which
> violates it is fundamentally flawed, and should therefore NOT be
> allowed. You clearly do not agree - but your failure to agree doesn't
> change the fact that this is in fact the motivation for the rule.
>
> However, given the nature of the language, detection of violations of
> that rule at compile time without any false positives or false negatives
> is provably equivalent to solving the halting problem. It is therefore
> not feasible for the standard to mandate such detection, so it cannot be
> made a constraint violation. The most it can do is declare that such
> code has undefined behavior.
>
> Run-time detection of such problems using heavy pointers is entirely
> feasible - and is permitted precisely because the behavior is undefined.
> Optimizations such as the one I mentioned are simply allowed by the fact
> that such code has undefined behavior - they were not the motivation for
> the rule.

I let myself get sidetracked while writing that. The reasons why this
can't be a constraint violation are stronger and less esoteric than just
the halting problem issue. A constraint violation is supposed to be
detectable during translation of a program. Determining whether or not
this rule has been violated requires knowing which array a given pointer
expression points at, and what location in that array it points at, and
the length of the array, as well as the value of the integer expression
being added or subtracted from the pointer expression. NONE of that
information need be available during translation - it could all vary at
run time. Decent implementations are free to produce a warning in cases
where it's easy to determine that it's been violated, but it's not
feasible for the standard to mandate detection in all cases, which is
what would happen if it were a constraint violation.

They could have made an exception to this rule for cases, such as
arrays of arrays, where contiguity is guaranteed. They did end up having
to make such an exception for pointer equality comparisons in the case
where two objects happen to be adjacent. The fact that they didn't make
a similar exception for the addition of an integer to a pointer reflects
the committee's judgement that you shouldn't be writing such code.

Alf P. Steinbach

unread,
Apr 5, 2019, 11:56:44 AM4/5/19
to
On 05.04.2019 14:46, James Kuyper wrote:
> On 4/4/19 11:51 PM, Alf P. Steinbach wrote:
>> On 04.04.2019 14:13, James Kuyper wrote:
>>> On 4/4/19 6:44 AM, Ben Bacarisse wrote:
>>>> "Alf P. Steinbach" <alf.p.stein...@gmail.com> writes:
>>>>
>>>>> On 03.04.2019 17:10, james...@alumni.caltech.edu wrote:
>>>>>>
>>>>>> More realistically, an implementation is allowed to, for example, look
>>>>>> at the expressions array[0][i] and array[1][j], and assume that neither
>>>>>> expression will be evaluated in a context where i and j have values that
>>>>>> are out of range, and that it is therefore unnecessary to consider the
>>>>>> possibility that they alias the same location. Since the behavior would
>>>>>> be undefined in those cases, ignoring that possibility would not render
>>>>>> the implementation non-conforming.
>>>>>
>>>>> Thanks, now it appears to be clear where the nonsense comes from,
>>>>> namely the "optimization" in the GCC compiler.
>>>>
>>>> That's unlikely. The wording in the C standard predates gcc.
>>
>> The last line above assumes the interpretation that it seeks to prove.
>
> How? In the last line, he's attempting to prove that it's unlikely that
> gcc influenced the committee's decision.

Well, keep in mind that we were talking about two opposing
interpretations of a piece of the standard, not that wording itself.

The circular assumption is that the committee's intent, what it tried to
express, was the favored interpretation.

Without that circular assumption the above would express in a
Spock-incompatible way that GCC folks could not have made an
interpretation of wording that already existed, which is a much worse
fallacy.

As it happened the conclusion turned out to probably not be wrong (it
was I who was wrong), but the logic is just missing.

Wikipedia about this:

[quote]
Circular reasoning is not a formal logical fallacy but a pragmatic
defect in an argument whereby the premises are just as much in need of
proof or evidence as the conclusion, and as a consequence the argument
fails to persuade. Other ways to express this are that there is no
reason to accept the premises unless one already believes the
conclusion, or that the premises provide no independent ground or
evidence for the conclusion. Begging the question is closely related to
circular reasoning, and in modern usage the two generally refer to the
same thing.
[/quote]

You showed me (I can't speak for others, really, though I think they'd
have stated it if they'd known) that some time after C Defect Report 17
was filed, the committee ruled on what the wording should mean, and
decided that it meant the interpretation that I had been sure was
nonsense on grounds of practicality. That resolution is definitive for
me. If the above had referred to the DR's resolution then it would have
been roughly valid, since presumably that resolution is known to
compiler writers.


> The only assumption he's using
> is that gcc didn't exist yet at the time the decision was made.

Nope.


> I don't
> see any circularity there. He was incorrect about that point, but only
> by a couple of years; and it is indeed unlikely that the newly released
> compiler was already sufficiently influential to control the committee's
> decisions. It's not impossible, as it would have been if he'd been
> right, but it's still implausible.

Again, keep in mind that we were talking about two opposing
interpretations of a piece of the standard, not that wording itself.


> [snip]
>> This [the info about C DR 17] however is good, and proves me wrong
>> about the interpretation.
>>
>> At least it does if one doesn't argue pedantically (and misguided) that
>> the part of the C++ standard that says it incorporates the C standard,
>> is non-normative.
>
> Several distinct points, each of which is separately sufficient to
> deflate that argument.
>
> 1. I was making no use of the concept that the C++ standard incorporates
> any part of the C standard.

I'm sorry, I didn't mean to imply that.

Rather I intended to note that in the context of admitting that I was
wrong. Sort of, hey, I'm not going into some artificial formal idiocy to
show how right I could have been with a literal no-common-sense reading
of the standard. I'm very much in favor of clarity, not obscurity. :)


> 2. The particular clauses from the C standard that the C committee used
> as a basis for it's decision on DR017 were copied into the C++ standard
> with modifications, rather than being incorporated by reference.
>
> This is not particularly unusual - the only significant part of the C
> standard that was incorporated into the C++ standard by reference was
> the description of the C standard library. Section 1.7, "Normative
> references", includes references to the C99 standard and to the next
> three Technical Corrigenda to that standard, but referring to a standard
> does not mean incorporating it by reference.

I was referring to

C++14 17.5.1.5
[quote]
Paragraphs labeled “See also:” contain cross-references to the relevant
portions of this International Standard and the ISO C standard, which is
incorporated into this International Standard by reference.
[/quote]

It was there from the beginning, but in C++17 the part after the comma
has been removed.

Anyway it was in a non-normative section of the standard.

I'm not sure how e.g. the issue of guaranteed ranges of basic types is
specified in C++17 and later, without that incorporation.

With C++14 and earlier one could just, with a reasonable common sense
interpretation, use the C standard's wording.


> For instance, POSIX is
> another standard listed in that section, but it's mentioned there only
> so the standard can refer to POSIX when discussing compatibility with
> that standard, not because any of that standard's requirements are
> incorporated into C++ by reference.


[snip rest]


Cheers!,

- Alf

Richard Damon

unread,
Apr 5, 2019, 12:15:46 PM4/5/19
to
On 3/31/19 6:21 PM, Bonita Montero wrote:
>>>>>> For starters, the iterators are invalid because they won't be
>>>>>> pointing to the *same* elements anymore.
>
>>>>> That shouldn't be a problem since they could partially point
>>>>> to the shifted elements. That could be guaranteed by the standard
>>>>> without any restrictions on the implementation.
>
>>>> And you had the nerve to call my capabilities into question. Invalid
>>>> iterators should never be used even if they still "point" to valid
>>>> objects as this is undefined behaviour according to the standard.
>
>>> If you were able to read you did read that I already know that.
>
>>> But there's a point on which I wasn't certain and where I had tomatoes
>>> on my eyes: reallocation obviously doesn't occur, because the iterators
>>> before the erased sequence remain valid.
>>> So there's absolutely no obvious coercive necessity for the standard
>>> to mandate invalidating the iterators at or after that
>>> sequence. The iterators could point partially to shifted elements
>>> without any restriction on the implementation.
>
>> The iterator concept dictates that what an iterator refers to never
>> changes unless the iterator itself is changed.  What you propose would
>> ruin the iterator concept.
>
> But there's no technical necessity for this concept.
>

It seems that if you want a way to specify the n'th element of the
vector, you don't want an iterator but a numerical index, using
indexing operations on the vector.

An iterator IS a concept, and part of that is the idea that it is
an extension of a pointer which, when working on a sequence, always
points to a GIVEN item, and the invalidation rules tell you when that
assumption no longer holds.

Note that in normal implementations the invalidation of those
iterators doesn't actually do anything to them, so they could be used to
access the shifted elements, and unless the iterator has extensive
debugging scaffolding to check this (which is in my experience
unusual, especially if not asked for) it will work the way you
describe in the case you describe. The key here is that YOU need to know
enough of what is happening behind the scenes to know what unpromised
behavior exists. The Standard, because it does include the concept that
an iterator will continue to point to the same element unless explicitly
changed (or invalidated), doesn't want to add the overhead of having all
those iterators updated, so it invalidates them. This concept is useful
because it is meaningful for a wide range of data structures, and the
various data structures are constrained by the complexity limits
and iterator invalidation rules. Changing invalidation into some specific
change of what certain iterators now point to adds complexity to the
standard and to understanding it.

As I pointed out in the beginning, the behavior you are looking for IS
available.
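
A minimal sketch of that alternative (arbitrary values):

#include <cstddef>
#include <iostream>
#include <vector>

int main()
{
    std::vector<int> v{ 10, 20, 30, 40, 50 };

    std::size_t pos = 3;        // names "position 3", not a particular element
    v.erase( v.begin() + 1 );   // iterators at or after that point are invalid

    // The index survives the erase and now names whatever element was
    // shifted into position 3 (here 50), which is the behaviour asked for.
    std::cout << v[pos] << '\n';  // prints: 50
}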

David Brown

unread,
Apr 5, 2019, 1:39:43 PM4/5/19
to
Ah, so the "circular assumption" comes from reading the words in the
standard, and assuming the authors wrote what they intended to write?

That is a very strange use of the phrase "circular assumption" or
"circular reasoning" - not one that I had come across before. I thought
it was just an example of "reading".



james...@alumni.caltech.edu

unread,
Apr 5, 2019, 2:41:20 PM4/5/19
to
On Friday, April 5, 2019 at 11:56:44 AM UTC-4, Alf P. Steinbach wrote:
> On 05.04.2019 14:46, James Kuyper wrote:
> > On 4/4/19 11:51 PM, Alf P. Steinbach wrote:
> >> On 04.04.2019 14:13, James Kuyper wrote:
> >>> On 4/4/19 6:44 AM, Ben Bacarisse wrote:
> >>>> "Alf P. Steinbach" <alf.p.stein...@gmail.com> writes:
> >>>>
> >>>>> On 03.04.2019 17:10, james...@alumni.caltech.edu wrote:
> >>>>>>
> >>>>>> More realistically, an implementation is allowed to, for example, look
> >>>>>> at the expressions array[0][i] and array[1][j], and assume that neither
> >>>>>> expression will be evaluated in a context where i and j have values that
> >>>>>> are out of range, and that it is therefore unnecessary to consider the
> >>>>>> possibility that they alias the same location. Since the behavior would
> >>>>>> be undefined in those cases, ignoring that possibility would not render
> >>>>>> the implementation non-conforming.
> >>>>>
> >>>>> Thanks, now it appears to be clear where the nonsense comes from,
> >>>>> namely the "optimization" in the GCC compiler.
> >>>>
> >>>> That's unlikely. The wording in the C standard predates gcc.
> >>
> >> The last line above assumes the interpretation that it seeks to prove.
> >
> > How? In the last line, he's attempting to prove that it's unlikely that
> > gcc influenced the committee's decision.
>
> Well, keep in mind that we were talking about two opposing
> interpretations of a piece of the standard, not that wording itself.
>
> The circular assumption is that the committee's intent, what it tried to
> express, was the favored interpretation.

He made no use of that assumption in the line you identified as
containing a circular argument. Regardless of what the committee's
intent was, if gcc did not exist at the time (which is the false
assumption he did make), then gcc could not have possibly influenced the
committee's decision on the matter. While gcc did exist, it was very
young at that time, so it would still be unlikely to have much influence
on the committee's decision. Again, that is true, regardless of what the
committee's actual intent was.

> Without that circular assumption the above would express in a
> Spock-incompatible way that GCC folks could not have made an
> interpretation of wording that already existed, which is a much worse
> fallacy.

I agree that it would be Spock-incompatible, but only because I can't
figure out any way to produce that conclusion from his argument, with or
without the circular assumption you incorrectly claim he relied upon.
If, as he incorrectly assumed, gcc did not exist yet, how could the gcc
folks have possibly formed any interpretation at all, of anything?
That's completely independent of what the committee's intent was.

Since gcc did actually exist in 1989 when those words were first
published, they could in fact interpret those words - but because it was
still in its infancy at that time, gcc was unlikely to have influenced
how those words were written. Even in 1992 when the committee resolved
DR#017, gcc was still so new that it was unlikely to have significantly
influenced the committee's resolution.

> You showed me, I can't speak for others really though I think they'd
> have stated it if they'd known, that some time after C Defect Report 17
> the committee ruled on what the wording should mean, and decided that it

To be specific, the committee made that ruling when it resolved DR017.
You seem to be assuming that the committee used that ruling to change
their mind about what their wording meant. I don't have any details
about their deliberations on the matter, so I can't be sure, but it's at
least as likely that they merely used that ruling to confirm that the
wording had always been meant to be interpreted in that fashion,
particularly since it seems quite clear to me that this was in fact the
meaning of that wording. Note that the resolution of that DR did not
specify any changes to the wording.

> meant that interpretation, that I had been sure was nonsense on grounds
> of practicality. That resolution is definitive for me. If the above had
> referred to the DR's resolution then it would have been roughly valid,
> since presumably that resolution is known to compiler writers

We're talking about the possibility that gcc influenced the way the
words were originally written, or perhaps the decision that was made
when resolving the defect report (it's your claim that they had such
influence, but I'm not clear from your wording which of those two
possibilities you were thinking of). Any such influence, in order to be
effective, would have had to have occurred, at the very latest, before
the resolution of the DR. Therefore, whether or not gcc folk ever read
the resolution is completely irrelevant to that possibility.

> > I don't
> > see any circularity there. He was incorrect about that point, but only
> > by a couple of years; and it is indeed unlikely that the newly released
> > compiler was already sufficiently influential to control the committee's
> > decisions. It's not impossible, as it would have been if he'd been
> > right, but it's still implausible.
>
> Again, keep in mind that we were talking about two opposing
> interpretations of a piece of the standard, not that wording itself.

We're also talking which of those interpretations is consistent with the
wording. I've seen many explanations of the other interpretation, but
I've never seen an explanation that was consistent with the actual
words.
The key point is that, given
int a[4][5];
The array a has only 4 elements, of the type int[5]. The array a[1] has
only 5 elements, of type int. The expression a[1][7] is equivalent to
*(*(a+1)+7).

Applying the wording in n4762.pdf:
"If the expression P points to element x[i] of an array object x with n
elements, the expressions P + J and J + P (where J has the value j)
point to the (possibly-hypothetical) element x[i + j] if
0 <= i + j <= n; otherwise, the behavior is undefined."

The relevant expression is *(a+1)+7, for which P is "*(a+1)", J is "7",
x is a[1], i is 0, n is 5, and j is 7, thereby violating the requirement
for defined behavior.
Can you identify what P, J, x, i, n, and j are for an interpretation of
these words that does not violate that requirement? A popular choice is
to interpret x as referring to "a". The problem is that P does not
point at any element of a itself, it points into a[1], but it points at
a[1][0], which is an element of a[1], but not an element of a itself.
If you think "point at" could be interpreted as including "point into",
then i=1, and x[i+j] would be a[8], which does not match anyone's claims
for the location that this expression points at, and would violate the
relevant requirement just as badly.

What you'd really have to do is justify identifying *(int(*)[20])a as
the array that you're referring to (I've seen people give arguments
equivalent to that approach) - but the standard doesn't define the
behavior of that expression.
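
Restating the decomposition above as annotated code (a sketch, not part
of the DR):

void decompose()
{
    int a[4][5];
    int *p = *( a + 1 );  // P points at a[1][0], an element of x = a[1]
    // With n = 5, i = 0, j = 7, i + j exceeds n, so the addition
    // itself is undefined:
    //     int *q = p + 7;
    int *end = p + 5;     // p + 5, one past the end of a[1], is still defined
    (void)p; (void)end;
}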

...
> > 2. The particular clauses from the C standard that the C committee used
> > as a basis for it's decision on DR017 were copied into the C++ standard
> > with modifications, rather than being incorporated by reference.
> >
> > This is not particularly unusual - the only significant part of the C
> > standard that was incorporated into the C++ standard by reference was
> > the description of the C standard library. Section 1.7, "Normative
> > references", includes references to the C99 standard and to the next
> > three Technical Corrigenda to that standard, but referring to a standard
> > does not mean incorporating it by reference.
>
> I was referring to
>
> C++14 17.5.1.5
> [quote]
> Paragraphs labeled “See also:” contain cross-references to the relevant
> portions of this International Standard and the ISO C standard, which is
> incorporated into this International Standard by reference.
> [/quote]

In the latest draft that I have access to, the closest equivalent to
that wording is in section 15.4.1.5, titled "C library".

> It was there from the beginning, but in C++17 the part after the comma
> has been removed.

I hadn't realized that there'd been such a change in the relevant
wording. At work, where I am now, I refer to n4762.pdf, which doesn't
use the phrase "incorporated by reference" anywhere. At home I've got a
copy of an older draft, which uses that phrase in exactly one location.
The newer draft does, however, say, in 15.2p2, that

"The descriptions of many library functions rely on the C standard
library for the semantics of those functions. In some cases, the
signatures specified in this document may be different from the
signatures in the C standard library, and additional overloads may be
declared in this document, but the behavior and the preconditions
(including any preconditions implied by the use of an ISO C restrict
qualifier) are the same unless otherwise stated."

which has essentially the same meaning as the older version, even if it
doesn't use the phrase "incorporated by reference". Does your copy have
comparable wording, somewhere?

> Anyway it was in a non-normative section of the standard.

I'm curious - what renders it non-normative? Notes, examples, and
footnotes are non-normative, and the same is true of any part of the
standard which is explicitly labelled "informative". The rest of the
standard is normative. Which of these cases applies to 17.5.1.5 in your
copy?

Alf P. Steinbach

unread,
Apr 5, 2019, 2:55:00 PM4/5/19
to
At the time your circular reasoning was first posted, it was a pure
assumption, one that was not made more likely by any supporting facts
known at that time, because there were none.

At the time we didn't know that the committee after C Defect Report 17
decided on the assumed interpretation.

It's still not a known fact that this resolution was the original
intent, but given the now known resolution of C Defect Report 17 it's a
reasonable assumption; it would be quite unreasonable to believe otherwise.

Repeating an invalid argument, or an argument that isn't accepted, as
you did now, is a fallacy known as Argumentum ad Nauseam. In English
that's argument by repetition. So you managed to post /two/ fallacies in
one sentence.

From experience it's not unlikely that this thread will now devolve
into even more fallacies posted, e.g. Ad Hominem is commonly used,
and regarding the question of whether the endless stream of fallacies
are really fallacies, the Argumentum ad Populum fallacy is not uncommon.

It's a stupid social game that's all about appearances. The facts have
been cleared up.


[snip]

Cheers!,

- Alf

Alf P. Steinbach

unread,
Apr 5, 2019, 3:06:47 PM4/5/19
to
On 05.04.2019 20:41, james...@alumni.caltech.edu wrote:
> [snip]
> If, as he incorrectly assumed, gcc did not exist yet, how could the gcc
> folks have possibly formed any interpretation at all, of anything?

Please, since you ask I feel obliged to answer.

I am in no way trying to patronize you or talk down to you, and I'm
aware that it's Friday evening, when e.g. many people have a good time
along with friends perhaps with a beer or two, or wine, like that, so
that one could easily ask a question that, after deeper consideration,
one would realize had a really trivial answer.

So, please accept my sincere apologies for answering this question; it's
asked, I give the trivial answer, but I do not mean this as a put-down
or anything, and in particular I do not mean it as an argument that the
simple physical possibility implies that that was what happened.

This is the answer: anyone could form an interpretation of the wording
/at any time/ since it was written.

And I've already mentioned that, up-thread.


Cheers & hth.,

- Alf

David Brown

unread,
Apr 5, 2019, 3:21:06 PM4/5/19
to
I did not make the post that you called "circular reasoning". And that
post /was/ supported by facts - the C standards here haven't changed,
and the C++ standards copied them. Maybe you were not aware of the
facts - that does not change them.

> At the time we didn't know that the committee after C Defect Report 17
> decided on the assumed interpretation.
>
> It's still not a known fact that this resolution was the original
> intent, but given the now known resolution of C Defect Report 17 it's a
> reasonable assumption; it would be quite unreasonable to believe otherwise.
>
> Repeating an invalid argument, or an argument that isn't accepted, as
> you did now, is a fallacy known as Argumentum ad Nauseam. In English
> that's argument by repetition. So you managed to post /two/ fallacies in
> one sentence.
>

In English, when you get things wrong, it is called "being wrong". When
you try - incorrectly - to label other people's correct posts as some
sort of logical fallacy, it is called "being a smart-arse". It does not
make you right.

> From experience it's not unlikely that this thread will now devolve
> into even more fallacies posted, e.g. Ad Hominem is commonly used,
> and regarding the question of whether the endless stream of fallacies
> are really fallacies, the Argumentum ad Populum fallacy is not uncommon.
>

You were wrong. It's time to accept that, and learn from it - not to
play silly-buggers with word games. You didn't know the history here,
and made invalid interpretations of the wording in the standards.
That's okay - now you know better. Add it to the impressive range of
things that you /do/ know about C++ - you are a good teacher in this
group, but sometimes you need to learn things too.

> It's a stupid social game that's all about appearances. The facts have
> been cleared up.
>

Good.

Manfred

unread,
Apr 5, 2019, 3:41:30 PM4/5/19
to
On 4/5/19 8:54 PM, Alf P. Steinbach wrote:
> Repeating an invalid argument, or an argument that isn't accepted, as
> you did now, is a fallacy known as Argumentum ad Nauseam. In English
> that's argument by repetition. So you managed to post /two/ fallacies in
> one sentence.
>
> From experience it's not unlikely that this thread will now devolve
> into even more fallacies posted, e.g. Ad Hominem is commonly used,
> and regarding the question of whether the endless stream of fallacies
> are really fallacies, the Argumentum ad Populum fallacy is not uncommon.
>
> It's a stupid social game that's all about appearances. The facts have
> been cleared up.
>

Ad maiora semper!

>
> [snip]
>
> Cheers!,
>
> - Alf

james...@alumni.caltech.edu

unread,
Apr 5, 2019, 5:37:49 PM4/5/19
to
On Friday, April 5, 2019 at 3:06:47 PM UTC-4, Alf P. Steinbach wrote:
> On 05.04.2019 20:41, james...@alumni.caltech.edu wrote:
> > [snip]
> > If, as he incorrectly assumed, gcc did not exist yet, how could the gcc
> > folks have possibly formed any interpretation at all, of anything?
>
> Please, since you ask I feel obliged to answer.
>
> I am in no way trying to patronize you or talk down to you, and I'm

You don't come across as patronizing or talking down - you come across
as someone who has failed to understand the arguments he's responding
to.

...
> This is the answer: anyone could form an interpretation of the wording
> /at any time/ since it was written.

That trivial answer is completely inapplicable to the context - he
incorrectly assumed that "gcc folk" didn't exist yet at the relevant
time. An organization that didn't exist yet could not possibly have
influenced either the writing of those words or the committee's
resolution of the DR, and the fact that they could (and almost certainly
did) read those words at some later time and form an interpretation of
them is completely irrelevant to the point he was making.

Let me reinstate some preceding context:
On 4/4/19 6:44 AM, Ben Bacarisse wrote:
> "Alf P. Steinbach" <alf.p.stein...@gmail.com> writes:
>
>> On 03.04.2019 17:10, james...@alumni.caltech.edu wrote:
...
>> Thanks, now it appears to be clear where the nonsense comes from,
>> namely the "optimization" in the GCC compiler.
>
> That's unlikely. The wording in the C standard predates gcc.

He was wrong about the C standard pre-dating gcc, and he admitted as
much as soon as I pointed it out. But you didn't choose to object on
those grounds - instead, you claimed that:

"Without that circular assumption the above would express in a
Spock-incompatible way that GCC folks could not have made an
interpretation of wording that already existed, which is a much worse
fallacy."

The "nonsense" you referred to was an interpretation based upon the
actual wording of the C90 standard, which was confirmed as being the
correct interpretation of those words by the committee's resolution for
DR#017 in 1992. Therefore, the only way that this nonsense could have
come from 'the "optimization" of the gcc compiler' is if that compiler
were in existence early enough for its optimization strategies to
influence either the wording of the C90 standard, or the committee's
decision on DR#017. All that Ben was pointing out is that, since he
believed that gcc wasn't around at that time, it couldn't have
influenced either of those things.

His comment had nothing to do with the "circular assumption" you've
identified, namely that the C committee's intent when writing that
wording was that it be interpreted in exactly the same fashion that the
committee later endorsed in its resolution to DR#017. The committee
might or might not have changed its mind about the meaning of that
wording between the time that they wrote it and the time that they
resolved that DR - but that doesn't matter. Either way, Ben's statement
would have been equally accurate - if he'd been right about when gcc was
created.

Incidentally, I don't consider that to be a circular assumption. I
believe that the committee may have intended from the very beginning
that the words that it wrote have the meaning that it later confirmed. I
don't believe that in a circular fashion, I believe it because I've read
those words, and consider the meaning that was later confirmed by the
committee to be the only one supported by that wording. Therefore, I
consider it entirely plausible, though not certain, that the committee
intended those words to have that meaning from the very beginning.

Sure, they might have originally intended those words to have a
different meaning, and only realized later that the actual words that
they had written had a meaning different from their original intent, one
that was, entirely by accident, better (in their opinion, not yours)
than their original intent. That's not an impossible sequence of events,
but I consider it a pretty unlikely one.

Alf P. Steinbach

unread,
Apr 5, 2019, 10:33:43 PM4/5/19
to
On 05.04.2019 23:37, james...@alumni.caltech.edu wrote:
> On Friday, April 5, 2019 at 3:06:47 PM UTC-4, Alf P. Steinbach wrote:
>> On 05.04.2019 20:41, james...@alumni.caltech.edu wrote:
>>> [snip]
>>> If, as he incorrectly assumed, gcc did not exist yet, how could the gcc
>>> folks have possibly formed any interpretation at all, of anything?
>>
>> Please, since you ask I feel obliged to answer.
>>
>> I am in no way trying to patronize you or talk down to you, and I'm
>
> You don't come across as patronizing or talking down - you come across
> as someone who has failed to understand the arguments he's responding
> to.
>
> ...
>> This is the answer: anyone could form an interpretation of the wording
>> /at any time/ since it was written.
>
> That trivial answer is completely inapplicable to the context - he
> incorrectly assumed that "gcc folk" didn't exist yet at the relevant
> time.

Yes, his argument totally lacked any logic.


> An organization that didn't exist yet could not possibly have
> influenced either the writing of those words or the committee's
> resolution of the DR,

Right.


> and the fact that they could (and almost certainly
> did) read those words at some later time and form an interpretation of
> them is completely irrelevant to the point he was making.

Right again, it was a fallacious argument, one with no inner logic.

Cheers!,

- Alf

Ben Bacarisse

unread,
Apr 6, 2019, 6:12:35 AM4/6/19
to
"Alf P. Steinbach" <alf.p.stein...@gmail.com> writes:
<cut>
> Yes, his argument totally lacked any logic.

I think I am the he of whom you speak, but I don't see any value in
replying to this specific point. I am more interested in the original
opinion that either the wording of the C standard or the later DR
clarifying the intent comes from gcc's optimisations (either planned or
implemented). Is that still your view (or the view of any other
readers), and if it is, is there any evidence for it?

--
Ben.

Alf P. Steinbach

unread,
Apr 6, 2019, 9:44:18 AM4/6/19
to
Far upthread we were discussing two ~opposite interpretations of the
standard, and when optimization (essentially through assuming
non-aliased pointer results of expressions) was mentioned as a rationale
for the UB interpretation, I thought it likely that the UB
interpretation of the wording came from the GCC optimization team.
They're into things like that, in my view.

Later, with C Defect Report 17 pointed out, it became clear that I was
wrong about which interpretation was correct.

Then I noted that, and also that I no longer believed it came from GCC,
but that the mindset behind such an interpretation is similar: the
mindset of compiler developers who want to make their task somewhat
easier, with less analysis necessary to do optimization, at the cost of
carving out UB traps all over the place for language users to fall into.


Cheers!,

- Alf

Ben Bacarisse

unread,
Apr 6, 2019, 11:04:03 AM4/6/19
to
"Alf P. Steinbach" <alf.p.stein...@gmail.com> writes:

> On 06.04.2019 12:12, Ben Bacarisse wrote:
>> "Alf P. Steinbach" <alf.p.stein...@gmail.com> writes:
>> <cut>
>>> Yes, his argument totally lacked any logic.
>>
>> I think I am the he of whom you speak, but I don't see any value in
>> replying to this specific point. I am more interested in the original
>> opinion that either the wording of the C standard or the later DR
>> clarifying the intent comes from gcc's optimisations (either planned or
>> implemented). Is that still your view (or the view of any other
>> readers), and if it is, is there any evidence for it?
>
> Far upthread we were discussing two ~opposite interpretations of the
> standard, and when optimization (essentially through assuming
> non-aliased pointer results of expressions) was mentioned as a
> rationale for the UB interpretation, I thought it likely that the UB
> interpretation of the wording came from the GCC optimization
> team. They're into things like that, in my view.

Yes, I got that. I was curious as to what, if any, was the evidence for
your (now changed) view.

> Later, with C Defect Report 17 pointed out, it became clear that I was
> wrong about which interpretation was correct.

Yes, I got that too.

> Then I noted that, and also that I no longer believed it came from
> GCC, but that the mindset behind such an interpretation is similar:
> the mindset of compiler developers who want to make their task
> somewhat easier, with less analysis necessary to do optimization, at
> the cost of carving out UB traps all over the place for language users
> to fall into.

Ah, I missed that change. It appears to be the same view except that gcc,
specifically, is off the hook. It's compilers in general driving the
inclusion of UB traps. What, if any, is the evidence for that?

That's probably right in a rather limited sense. For example, the
undefined and implementation defined behaviour of the shift operations
are clearly intended to allow "optimisation" in the sense of avoiding
the need for extra code to implement a kind of shift not supported by
the hardware. But in reality it's the hardware design that's driving
the UB - or really, the compilers that target that hardware.
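
For concreteness, a minimal sketch of the shift cases in question
(assuming a 32-bit int, and C++17 rules):

    int x = 1;
    int a = x << 3;    // well-defined: 8
    int b = x << 32;   // undefined behaviour: shift count >= width of int
    int c = -8 >> 1;   // implementation-defined: arithmetic or logical shift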

My gut feeling about making sub-array index overflow undefined is that
it may have been driven by a desire to allow cheap indexing using
offsets on segmented architectures. I've got no evidence for this.

--
Ben.

David Brown

unread,
Apr 6, 2019, 11:34:28 AM4/6/19
to
I am not convinced this view applies to many of the undefined behaviours
in C. It certainly applies to a lot of the implementation defined
behaviours, and unspecified behaviours. These give the compiler writer
the freedom to match operations to suit their target.

But undefined behaviours are, AFAICS, mainly for things that don't
actually make sense. Making them UB tells programmers "don't write
nonsensical code", and it tells compiler writers "you can assume code is
sensible". The result of this is clearer code for the programmer who
sticks to the rules as no one needs to suspect them of doing weird
things, like running outside of arrays. The compiler can make
optimisations based on these, such as knowing that if you add two
positive integers, the result will still be positive, and that you can
re-arrange expressions based on normal mathematical rules. And the
compiler (and debuggers, sanitizers, etc.) can give better warnings and
aid the developer because code errors such as integer overflow are
clearly bugs.

So as I see it, UB is an advantage to me, as a programmer. The only
real problem with it is people who think they "know" how particular UB
is supposed to work, who think they "know" what the standards committee
really meant to write (as though the committee were so careless as to
leave such "mistakes" in multiple generations of the standards), and who
think "it worked when I tried it" is good enough.

>
> My gut feeling about making sub-array index overflow undefined is that
> it may have been driven by a desire to allow cheap indexing using
> offsets on segmented architectures. I've got no evidence for this.
>

I think that would have led to a lot of challenges when accessing the
array as a whole using char* pointers (or memcpy).

My own gut feeling (with no evidence either) is that this is simply a
matter of consistency. If you have "int a[5][4];", then "a[1]" is an
array of 4 int's - and you are not allowed to index an array of 4 int's
outside of 0 .. 3. This gives you a simple and consistent rule.
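
A minimal illustration of that rule:

    int a[5][4];
    a[1][3] = 0;    // OK: index 3 is within the int[4] that a[1] is
    a[1][5] = 0;    // UB: only indices 0..3 are valid for a[1], even
                    // though the addressed memory lies inside a as a whole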

Alf P. Steinbach

unread,
Apr 6, 2019, 12:05:26 PM4/6/19
to
On 06.04.2019 17:03, Ben Bacarisse wrote:
> [snip]
> That's probably right in a rather limited sense. For example, the
> undefined and implementation defined behaviour of the shift operations
> are clearly intended to allow "optimisation" in the sense of avoiding
> the need for extra code to implement a kind of shift not supported by
> the hardware. But in reality it's the hardware design that's driving
> the UB - or really, the compilers that target that hardware.

I agree regarding shift operations.

But with the multi-dimensional arrays it's not hardware driving the UB.
The sizeof requirements guarantee contiguous multi-D arrays, which must
be accessible as contiguous via byte pointers, so to place sub-arrays in
different segments you'd need run-time sized and allocated segments with
byte level size granularity, somehow appearing as contiguous memory to
byte pointers, and I just don't see that.
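
A minimal sketch of the contiguity guarantee relied on here:

    int a[2][10] = {};
    static_assert( sizeof(a) == 2*10*sizeof(int), "no padding possible" );

    unsigned char bytes[sizeof(a)];
    std::memcpy( bytes, &a, sizeof(a) );    // needs <cstring>; byte-wise
                                            // access across the whole
                                            // object is well-defined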

One extreme example of UB for the ultra-marginal convenience of compiler
writers, at possibly very high expense to language users: in C++03 it
was undefined behavior to have a decimal integer literal to large for
type `long`. Undefined behavior is at run-time so the compiler could
within its rights produce a gotcha-not-what-you-expected-moo-hah
program. In C++17 such a literal that is too large for the type that
otherwise would be used, causes the source code to be ill-formed, which
requires a diagnostic.
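
For example (assuming 64-bit long, so that the literal below fits no
standard integer type):

    long n = 99999999999999999999;  // C++03: undefined behavior;
                                    // C++17: ill-formed, diagnostic required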


> My gut feeling about making sub-array index overflow undefined is that
> it may have been driven by a desire to allow cheap indexing using
> offsets on segmented architectures. I've got no evidence for this.

Sorry for repeating text, but the quoting sort of requires it:

The sizeof requirements guarantee contiguous multi-D arrays, which must
be accessible as contiguous via byte pointers, so to place sub-arrays in
different segments you'd need run-time sized and allocated segments with
byte level size granularity, somehow appearing as contiguous memory to
byte pointers, and I just don't see that.


Cheers!,

- Alf

Alf P. Steinbach

unread,
Apr 7, 2019, 5:21:08 AM4/7/19
to
On 06.04.2019 18:05, Alf P. Steinbach wrote:
> On 06.04.2019 17:03, Ben Bacarisse wrote:
>> [snip]
>> That's probably right in a rather limited sense.  For example, the
>> undefined and implementation defined behaviour of the shift operations
>> are clearly intended to allow "optimisation" in the sense of avoiding
>> the need for extra code to implement a kind of shift not supported by
>> the hardware.  But in reality it's the hardware design that's driving
>> the UB - or really, the compilers that target that hardware.
>
> I agree regarding shift operations.
>
> But with the multi-dimensional arrays it's not hardware driving the UB.
> The sizeof requirements guarantee contiguous multi-D arrays, which must
> be accessible as contiguous via byte pointers, so to place sub-arrays in
> different segments you'd need run-time sized and allocated segments with
> byte level size granularity, somehow appearing as contiguous memory to
> byte pointers, and I just don't see that.
>
> One extreme example of UB for the ultra-marginal convenience of compiler
> writers, at possibly very high expense to language users: in C++03 it
> was undefined behavior to have a decimal integer literal too large for
> type `long`. Undefined behavior is at run-time so the compiler could
> within its rights produce a gotcha-not-what-you-expected-moo-hah
> program. In C++17 such a literal that is too large for the type that
> otherwise would be used, causes the source code to be ill-formed, which
> requires a diagnostic. [snip]

I forgot to mention, the standard library algorithms generally have no
notion of multidimensional arrays.

And so, to e.g. `std::find` something in a multidimensional array, if
one wants to use standard library algorithms then the formal UB for the
natural direct expression means that one must write silly and verbose
workaround code.

The standard-compliant code below, with a work-around for the now
discovered (at least, for me now discovered) UB, reminds me strongly of
the final Microsoft OLE/COM-based "Hello, world!" in a once popular list
of "Hello, world!" on various platforms and in various languages...


-------------------------------------------------------------------------
#include <algorithm>
#include <iostream>
#include <iterator>         // std::(forward_iterator_tag, iterator)
#include <optional>
#include <stddef.h>         // ptrdiff_t
#include <string>
#include <tuple>            // std::tie
#include <type_traits>      // std::(extent, remove_all_extents)
#include <utility>          // std::(begin, end)

using Size = ptrdiff_t;
using Index = ptrdiff_t;

struct Position{ Index row; Index col; };

auto operator==( const Position& a, const Position& b )
    -> bool
{ return std::tie( a.row, a.col ) == std::tie( b.row, b.col ); }

// My preferred way, no standard library algorithm, just straight looping:
namespace natural_loop
{
    using std::optional;

    template< class Table, class Item >
    auto find_in( const Table& table, const Item x )
        -> optional<Position>
    {
        for( const auto& row: table ) for( const auto& item: row )
        {
            if( item == x )
            {
                return Position{ &row - &table[0], &item - &row[0] };
            }
        }
        return {};
    }
}  // namespace natural_loop

// Using the std::find algorithm instead of explicit looping.
// Not permitted, due to the Undefined Behavior for traversing across
// sub-array boundaries...
namespace natural_stdalgo
{
    using std::extent_v, std::find, std::optional;

    template< class Table, class Item >
    auto find_in( const Table& table, const Item x )
        -> optional<Position>
    {
        const auto& first = table[0][0];
        const Size n_items = sizeof( table )/sizeof( first );

        const auto p_item = find( &first, &first + n_items, x );
        if( p_item == &first + n_items )
        {
            return {};
        }
        const Index i = p_item - &first;
        return Position{ i/Size(extent_v<Table, 1>), i%Size(extent_v<Table, 1>) };
    }
}  // namespace natural_stdalgo

// Extreme contortions in order to use the standard library's algorithm
// without UB:
namespace compliant_stdalgo
{
    using
        std::extent_v, std::find, std::forward_iterator_tag, std::iterator,
        std::optional, std::remove_all_extents_t, std::tie;

    template< class Any_dimensional_array >
    struct Array_item_t_
    {
        using T = remove_all_extents_t<Any_dimensional_array>;
    };

    template< class Any_dimensional_array >
    struct Array_item_t_<const Any_dimensional_array>
    {
        using T = const remove_all_extents_t<Any_dimensional_array>;
    };

    template< class Any_dimensional_array >
    using Array_item_ = typename Array_item_t_<Any_dimensional_array>::T;

    template< class Table >
    using Table_iterator_types_ = iterator<
        forward_iterator_tag,       // iterator_category
        Array_item_<Table>,         // value_type
        Size,                       // difference_type
        Array_item_<Table>*,        // pointer
        Array_item_<Table>&         // reference
        >;

    template< class Table >
    class Table_iterator_
        : public Table_iterator_types_<Table>
    {
        Table*      m_p_table;
        Position    m_position;

    public:
        using typename Table_iterator_types_<Table>::pointer;
        using typename Table_iterator_types_<Table>::reference;

        auto position() const -> Position { return m_position; }

        void advance()
        {
            ++m_position.col;
            if( m_position.col == extent_v<Table, 1> )
            {
                m_position.col = 0;
                ++m_position.row;
            }
        }

        auto operator++()
            -> Table_iterator_
        {
            advance();
            return *this;
        }

        auto operator++( int )
            -> Table_iterator_
        {
            auto original = *this;
            advance();
            return original;
        }

        auto operator*() const
            -> reference
        { return (*m_p_table)[m_position.row][m_position.col]; }

        auto operator->() const
            -> pointer
        { return &operator*(); }

        friend auto operator==( const Table_iterator_& a, const Table_iterator_& b )
            -> bool
        { return tie( a.m_p_table, a.m_position ) == tie( b.m_p_table, b.m_position ); }

        friend auto operator!=( const Table_iterator_& a, const Table_iterator_& b )
            -> bool
        { return not( a == b ); }

        Table_iterator_( const Table& table, const Index row, const Index col ):
            m_p_table( &table ),
            m_position{ row, col }
        {}

        static auto to_first_of( Table& table )
            -> Table_iterator_
        { return Table_iterator_( table, 0, 0 ); }

        static auto to_beyond( Table& table )
            -> Table_iterator_
        { return Table_iterator_( table, extent_v<Table>, 0 ); }
    };

    template< class Table, class Item >
    auto find_in( Table& table, const Item x )
        -> optional<Position>
    {
        using It = Table_iterator_<Table>;
        const auto it_item = find( It::to_first_of( table ), It::to_beyond( table ), x );
        if( it_item == It::to_beyond( table ) ) { return {}; }
        return it_item.position();
    }
}  // namespace compliant_stdalgo

auto main() -> int
{
    using namespace std;

    const int numbers[2][10] =
    {
        { 3, 1, 4, 1, 5, 9, 2, 6, 5, 4 },
        { 2, 7, 1, 8, 2, 8, 1, 8, 2, 8 }
    };

    if( const optional<Position> pos = natural_loop::find_in( numbers, 8 ) )
    {
        cout << "Natural loop: " << pos->row << ", " << pos->col << "." << endl;
    }

    if( const optional<Position> pos = natural_stdalgo::find_in( numbers, 8 ) )
    {
        cout << "Natural stdalgo: " << pos->row << ", " << pos->col << "." << endl;
    }

    if( const optional<Position> pos = compliant_stdalgo::find_in( numbers, 8 ) )
    {
        cout << "Compliant stdalgo: " << pos->row << ", " << pos->col << "." << endl;
    }
}
-------------------------------------------------------------------------


As I see it, the compiler writer's reduced burden of implementing a very
rare optimization that probably doesn't ever produce a significant
effect is not worth the complexity needed to work around the UB.

So I was reluctant to accept the idea, but C Defect Report 17 clinched
it: I was wrong and the standard is actually that perverse.


Cheers!

- Alf

Melzzzzz

unread,
Apr 7, 2019, 6:44:14 AM4/7/19
to
What's wrong with just iterating and calling find for each row?
>
> - Alf


--
press any key to continue or any other to quit...
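
(A minimal sketch of that row-by-row approach, reusing the Position
type and conventions from the code upthread; each call to std::find
stays within one sub-array, so there is no cross-boundary traversal:)

    namespace rowwise_stdalgo
    {
        using std::find, std::optional;

        template< class Table, class Item >
        auto find_in( const Table& table, const Item x )
            -> optional<Position>
        {
            for( const auto& row: table )
            {
                const auto p_item = find( std::begin( row ), std::end( row ), x );
                if( p_item != std::end( row ) )
                {
                    return Position{ &row - &table[0], p_item - &row[0] };
                }
            }
            return {};
        }
    }  // namespace rowwise_stdalgo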

Alf P. Steinbach

unread,
Apr 7, 2019, 7:30:58 AM4/7/19
to
What's wrong with just iterating and not calling `find` at all, as in
the first example above?

Nothing; it is, as I noted, what I would do for this example.

The example is simple in order to not introduce any complexity unrelated
to the point it aims for. It shows the contortions one must/can go
through to get compliant code for use of a standard library algorithm on
a multidimensional array, for a near simplest possible case.

And as such it shows, IMO, that the rules for multidimensional arrays
are in practice incompatible with standard library algorithms.

But maybe others have some more clever ways to apply the standard
algorithms; never say never (James Bond).


Cheers!,

- Alf

Paavo Helde

unread,
Apr 7, 2019, 2:05:22 PM4/7/19
to
On 7.04.2019 14:30, Alf P. Steinbach wrote:
> And as such it shows, IMO, that the rules for multidimensional arrays
> are in practice incompatible with standard library algorithms.
>
> But maybe others have some more clever ways to apply the standard
> algorithms; never say never (James Bond).

I hope you all understand that multi-dimensional C arrays are mostly an
academic topic because they would require knowing the sizes of at least
some dimensions at compile time. This happens pretty rarely in RL, maybe
in some specific applications.

In the high performance image-processing software libraries developed by
our firm, I do not recall a multi-dimensional C array ever being used.

Ben Bacarisse

unread,
Apr 7, 2019, 8:33:42 PM4/7/19
to
Paavo Helde <myfir...@osa.pri.ee> writes:

> On 7.04.2019 14:30, Alf P. Steinbach wrote:
>> And as such it shows, IMO, that the rules for multidimensional arrays
>> are in practice incompatible with standard library algorithms.
>>
>> But maybe others have some more clever ways to apply the standard
>> algorithms; never say never (James Bond).
>
> I hope you all understand that multi-dimensional C arrays are mostly
> an academic topic because they would require knowing the sizes of at
> least some dimensions at compile time. This happens pretty rarely in
> RL, maybe in some specific applications.

Since this is comp.lang.c++, you are right, but multi-dimensional C
arrays in C need not have dimensions known at compile-time. Even when
variable length arrays are not used, C99's variably modified array types
can be used to simplify multi-dimensional array handling.

<cut>
--
Ben.

Paavo Helde

unread,
Apr 8, 2019, 12:34:27 AM4/8/19
to
And in C++ it is pretty easy e.g. to wrap a 1D array/vector into a class
providing element access via a T& operator()(size_t x, size_t y), if the
goal is to simplify multi-dimensional array handling.

This approach shares the same drawback as multi-dimensional C
arrays in that the index calculation formally happens at each element
access and care must be taken to write algorithms in such a way that the
compiler would be able to optimize it away. In our code we have decided
to leave the index calculation explicit in the algorithms to see the
memory access pattern better, and to be able to bring parts of the index
calculation out of the loop where possible, for helping the optimizer.
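
A minimal sketch of such a wrapper (the name Matrix is illustrative,
not taken from any particular library):

    #include <vector>
    #include <stddef.h>     // size_t

    template< class T >
    class Matrix
    {
        std::vector<T>  m_items;
        size_t          m_width;

    public:
        Matrix( const size_t width, const size_t height ):
            m_items( width*height ),
            m_width( width )
        {}

        auto operator()( const size_t x, const size_t y )
            -> T&
        { return m_items[y*m_width + x]; }   // the explicit index calculation
    };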



James Kuyper

unread,
Apr 8, 2019, 8:14:35 AM4/8/19
to
On 4/5/19 10:33 PM, Alf P. Steinbach wrote:
> On 05.04.2019 23:37, james...@alumni.caltech.edu wrote:
>> On Friday, April 5, 2019 at 3:06:47 PM UTC-4, Alf P. Steinbach wrote:
>>> On 05.04.2019 20:41, james...@alumni.caltech.edu wrote:
>>>> [snip]
>>>> If, as he incorrectly assumed, gcc did not exist yet, how could the gcc
>>>> folks have possibly formed any interpretation at all, of anything?
>>>
>>> Please, since you ask I feel obliged to answer.
>>>
>>> I am in no way trying to patronize you or talk down to you, and I'm
>>
>> You don't come across as patronizing or talking down - you come across
>> as some one who has failed to understand the arguments he's responding
>> to.
>>
>> ...
>>> This is the answer: anyone could form an interpretation of the wording
>>> /at any time/ since it was written.
>>
>> That trivial answer is completely inapplicable to the context - he
>> incorrectly assumed that "gcc folk" didn't exist yet at the relevant
>> time.
>
> Yes, his argument totally lacked any logic.

Incorrect - the logic of his argument was perfectly valid. The only
problem was that one of his premises was incorrect. And it wasn't
sufficiently incorrect to completely invalidate the conclusion. If you
replace the premise with "gcc was quite young at the time the decision
was made", then all you have to do is modify the conclusion slightly -
instead of "it's impossible for gcc to have influenced the decision",
say instead that "it's improbable that gcc significantly influenced the
decision".

The weakened conclusion is still strong enough to justify presenting it.

>> and the fact that they could (and almost certainly
>> did) read those words at some later time and form an interpretation of
>> them is completely irrelevant to the point he was making.
>
> Right again, it was a fallacious argument, one with no inner logic.

In a logical argument, you have premises which are assumed correct, a
conclusion that you're trying to prove, and an argument which is
supposed to connect those premises to that conclusion. A fallacious
argument is one where the logic of that connection is invalid. An
argument where the connection is logically valid, but one or more
premises is incorrect, is not fallacious, it's merely wrong, and that's
the case with this argument.


James Kuyper

unread,
Apr 8, 2019, 8:27:18 AM4/8/19
to
On 4/6/19 9:44 AM, Alf P. Steinbach wrote:
> On 06.04.2019 12:12, Ben Bacarisse wrote:
>> "Alf P. Steinbach" <alf.p.stein...@gmail.com> writes:
>> <cut>
>>> Yes, his argument totally lacked any logic.
>>
>> I think I am the he of whom you speak, but I don't see any value in
>> replying to this specific point. I am more interested in the original
>> opinion that either the wording of the C standard or the later DR
>> clarifying the intent comes from gcc's optimisations (either planned or
>> implemented). Is that still your view (or the view of any other
>> readers), and if it is, is there any evidence for it?
>
> Far upthread we were discussing two ~opposite interpretations of the
> standard, and when optimization (essentially through assuming
> non-aliased pointer results of expressions) was mentioned as a rationale
> for the UB interpretation, ...

I mentioned the optimization for the sole purpose of pointing out that
the fact that such code has undefined behavior is not just a pedantic
quibble - there are plausible mechanisms by which such code could
produce unexpected results.

I have repeatedly tried, and apparently failed, to make it clear to you
that I never said anything to suggest that such optimizations were the
motivation for this rule. I'm not even sure whether any real-world
compiler actually performs such optimizations.

As far as I know, the rule was motivated solely by the belief that
accessing an array outside its declared length is an inherently
illogical thing to do, which the C standard should discourage in the
only way that it can - by specifying that the behavior of such code is
undefined.

You apparently disagree with that belief, which might explain why you
resist so strongly the idea that it was in fact the motivation for this
rule. But that belief does exist, and a majority of the C committee
apparently shares that belief - not only did they approve the wording
that says so, they also approved the DR that clarified that this is in
fact what that wording means.

> Later, with C Defect Report 17 pointed out, it became clear that I was
> wrong about which interpretation was correct.
>
> Then I noted that, and also that I no longer believed it came from GCC,
> but that the mindset behind such an interpretation is similar: the
> mindset of compiler developers who want to make their task somewhat
> easier, with less analysis necessary to do optimization, at the cost of
> carving out UB traps all over the place for language users to fall into.

The mindset behind this rule, as we keep trying to explain to you, had
nothing to do with the optimization, and everything to do with believing
that such code is inherently illogical. The rule merely enables such
optimizations; enabling them was not the purpose of creating the rule.

Alf P. Steinbach

unread,
Apr 8, 2019, 8:42:19 AM4/8/19
to
Out of context, say as the start of a new thread, it could have been
merely wrong, as an argument merely about the standard's wording.

It was however posted in the context of discussing interpretations of
that wording. With the implication that it was relevant to that
discussion. As such it was a fallacy. /And/ wrong.

james...@alumni.caltech.edu

unread,
Apr 8, 2019, 9:51:12 AM4/8/19
to
On Monday, April 8, 2019 at 8:42:19 AM UTC-4, Alf P. Steinbach wrote:
> On 08.04.2019 14:14, James Kuyper wrote:
...
> > In a logical argument, you have premises which are assumed correct, a
> > conclusion that you're trying to prove, and an argument which is
> > supposed to connect those premises to that conclusion. A fallacious
> > argument is one where the logic of that connection is invalid. An
> > argument where the connection is logically valid, but one or more
> > premises is incorrect, is not fallacious, it's merely wrong, and that's
> > the case with this argument.
>
> Out of context, say as the start of a new thread, it could have been
> merely wrong, as an argument merely about the standard's wording.

It wasn't an argument about the standard's wording at all, which may be
the problem you're having with the argument. It's about influence,
specifically the influence that gcc might have had on the committee's
decisions about the wording.

> It was however posted in the context of discussing interpretations of
> that wording. With the implication that it was relevant to that
> discussion. As such it was a fallacy. /And/ wrong.

The only thing that can make an argument fallacious is a violation of
the rules of logic - context can never turn a logically valid argument
into an invalid one. A valid argument might not be applicable in a given
context, but it remains valid, even in such a context.

The only thing wrong with his argument was that it had a false premise;
the logic itself was perfectly valid. If its premise had been correct,
his argument would have been entirely relevant. And since his premise
was only slightly incorrect, the weaker, corrected version of his
argument remains entirely relevant.

Alf P. Steinbach

unread,
Apr 8, 2019, 9:57:37 AM4/8/19
to
On 08.04.2019 15:51, james...@alumni.caltech.edu wrote:
> On Monday, April 8, 2019 at 8:42:19 AM UTC-4, Alf P. Steinbach wrote:
>> On 08.04.2019 14:14, James Kuyper wrote:
> ...
>>> In a logical argument, you have premises which are assumed correct, a
>>> conclusion that you're trying to prove, and an argument which is
>>> supposed to connect those premises to that conclusion. A fallacious
>>> argument is one where the logic of that connection is invalid. An
>>> argument where the connection is logically valid, but one or more
>>> premises is incorrect, is not fallacious, it's merely wrong, and that's
>>> the case with this argument.
>>
>> Out of context, say as the start of a new thread, it could have been
>> merely wrong, as an argument merely about the standard's wording.
>
> It wasn't an argument about the standard's wording at all, which may be
> the problem you're having with the argument. It's about influence,
> specifically the influence that gcc might have had on the committee's
> decisions about the wording.

Well, let's formulate it that way, about not the wording itself but
about the influence on the committee's decisions that led to the wording.

Which is what I meant.

That means that it's a fallacy, in context.

We're not children.

We are able to reason about context and what an argument is meant to
support.

If it doesn't support anything then it's just nonsense, so one can
choose: nonsense, or fallacy.


Cheers!,

- Alf

Alf P. Steinbach

unread,
Apr 8, 2019, 10:09:38 AM4/8/19
to
On 08.04.2019 15:51, james...@alumni.caltech.edu wrote:
> The only thing that can make an argument fallacious is a violation of
> the rules of logic - context can never turn a logically valid argument
> into an invalid one.

I don't think so, for example because this very response is a
counter-example.


Cheers!,

- Alf

james...@alumni.caltech.edu

unread,
Apr 8, 2019, 10:33:48 AM4/8/19
to
On Monday, April 8, 2019 at 9:57:37 AM UTC-4, Alf P. Steinbach wrote:
> On 08.04.2019 15:51, james...@alumni.caltech.edu wrote:
...
> > It wasn't an argument about the standard's wording at all, which may be
> > the problem you're having with the argument. It's about influence,
> > specifically the influence that gcc might have had on the committee's
> > decisions about the wording.
>
> Well, let's formulate it that way, about not the wording itself but
> about the influence on the committee's decisions that led to the wording.

Does that sentence accurately reflect what you were claiming? You
weren't clear about which decision you were accusing gcc of influencing,
but I'd been assuming that you were talking about the committee's
decision on the DR, not their decision on the writing of the wording
itself. There's three reasons I made that assumption:

1. Your comments that this is an issue about the interpretation of the
words, implying that you were under the misapprehension that the other
interpretation was also consistent with the words.

2. Your comments, nonsensical in context, about whether the gcc
developers could have been able to read the words of the first standard,
which was never actually relevant to what he was talking about.

3. It's marginally more plausible that gcc might have significantly
influenced the later decision, because they would have had 3 more years
to accumulate enough influence to do so.

> Which is what I meant.

You meant to claim that gcc influenced the committee's decision. If his
premise had been correct, presenting his argument would have proved you
were incorrect on that point. Because his premise was inaccurate, the
corrected version of his argument still served to point out that your
claim was implausible.

...
> We are able to reason about context and what an argument is meant to
> support.

His argument was meant to support the idea that your claim of gcc
influence did not make sense. The corrected version of his argument
still supports that claim, though slightly less strongly.

> If it doesn't support anything then it's just nonsense, so one can
> choose: nonsense, or fallacy.

Perfectly sensible, logically valid arguments can fail to support
anything relevant to the context in which they occur, so if it didn't
support anything, that would not, in itself, justify labeling it as
either nonsense or fallacious. The corrected version of his argument is
neither nonsensical, nor fallacious, nor does it fail to support the
point it was intended to support.

james...@alumni.caltech.edu

unread,
Apr 8, 2019, 10:41:06 AM4/8/19
to
Feel free to identify the specific logical fallacy (or fallacies) that
applies. For a starting point, please review
<https://en.wikipedia.org/wiki/List_of_fallacies#Formal_fallacies>.

Note, in particular, that none of the items listed under
<https://en.wikipedia.org/wiki/List_of_fallacies#Improper_premise> is a
fallacy because the relevant premise is false - they are fallacies,
regardless of whether the relevant premise is true or false, because of
an improper relationship between the nature of the premise and the
argument that uses that premise.

Bonita Montero

unread,
Apr 8, 2019, 11:06:36 AM4/8/19
to
>>> The iterator concept dictates that what an iterator refers to never
>>> changes unless the iterator itself is changed.  What you propose would
>>> ruin the iterator concept.

>> But there's no technical necessity for this concept.

> It seems that if you want a way to specify the n'th element of the
> vector, you don't want an iterator, but an numerical index, and use
> indexing operations on the vector.

No, I want to use iterators.

> An iterator IS a concept, and part of that is the idea that it is
> an extension of a pointer that when working on a sequence always points
> to a GIVEN item, and the invalidation rules tells you when that
> assumption no longer holds.

The invalidation rules are not necessary here. I gave an approximate
implementation of erase, and that implementation doesn't invalidate the
iterators beginning with the erased element. And guaranteeing that the
iterators remain valid doesn't put any restrictions on the implementation.
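
A minimal sketch of such an erase (shifting the tail left in place; an
illustration, not how any particular standard library is required to
implement it):

    #include <algorithm>    // std::move
    #include <vector>

    template< class T >
    void erase_in_place( std::vector<T>& v,
                         typename std::vector<T>::iterator it )
    {
        std::move( it + 1, v.end(), it );   // shift the tail left one slot
        v.pop_back();                       // destroy the moved-from last element
    }

    // Afterwards, iterators past the erase point still land on elements in
    // the same buffer - just shifted ones - yet the standard declares them
    // invalidated.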

Alf P. Steinbach

unread,
Apr 8, 2019, 11:14:50 AM4/8/19
to
On 08.04.2019 16:33, james...@alumni.caltech.edu wrote:
> On Monday, April 8, 2019 at 9:57:37 AM UTC-4, Alf P. Steinbach wrote:
>> On 08.04.2019 15:51, james...@alumni.caltech.edu wrote:
> ...
>>> It wasn't an argument about the standard's wording at all, which may be
>>> the problem you're having with the argument. It's about influence,
>>> specifically the influence that gcc might have had on the committee's
>>> decisions about the wording.
>>
>> Well, let's formulate it that way, about not the wording itself but
>> about the influence on the committee's decisions that led to the wording.
>
> Does that sentence accurately reflect what you were claiming? You
> weren't clear about which decision you were accusing gcc of influencing,

I never claimed they influenced a decision.

That may be why it's unclear in your mind.


> but I'd been assuming that you were talking about the committee's
> decision on the DR, not their decision on the writing of the wording
> itself.

This looks like an example of false memory.


> There's three reasons I made that assumption:
>
> 1. Your comments that this is an issue about the interpretation of the
> words, implying that you were under the misapprehension that the other
> interpretation was also consistent with the words.

The other interpretation is the only sensible one, e.g. not incompatible
with the standard library's algorithm functions, not adding UB traps.

But we got what we got.

Apparently to marginally ease the life of compiler writers, at cost.


> 2. Your comments, nonsensical in context, about whether the gcc
> developers could have been able to read the words of the first standard,
> which was never actually relevant to what he was talking about.

I guess I lost your train of thought here.

But considering, as noted above, that you apparently started with a
falsehood: anything can be proved from a falsehood.


> 3. It's marginally more plausible that gcc might have significantly
> influenced the later decision, because they would have had 3 more years
> to accumulate enough influence to do so.

No-one would have to influence any decision in order to promote an
interpretation of the wording that resulted from the decision.

Assuming that's necessary is, well, nonsensical.

A time travel view.


>> Which is what I meant.
>
> You meant to claim that gcc influenced the committee's decision.

WTF?


> If his
> premise had been correct, presenting his argument would have proved you
> were incorrect on that point.

You make me laugh.

Thanks. :)


> Because his premise was inaccurate, the
> corrected version of his argument still served to point out that your
> claim was implausible.

Lols.

[snip]

Cheers!,

- Alf

james...@alumni.caltech.edu

unread,
Apr 8, 2019, 12:33:38 PM4/8/19
to
On Monday, April 8, 2019 at 11:14:50 AM UTC-4, Alf P. Steinbach wrote:
> On 08.04.2019 16:33, james...@alumni.caltech.edu wrote:
> > On Monday, April 8, 2019 at 9:57:37 AM UTC-4, Alf P. Steinbach wrote:
...
> >> Well, let's formulate it that way, about not the wording itself but
> >> about the influence on the committee's decisions that led to the wording.
> >
> > Does that sentence accurately reflect what you were claiming? You
> > weren't clear about which decision you were accusing gcc of influencing,
>
> I never claimed they influenced a decision.

On Thursday, April 4, 2019 at 1:44:27 AM UTC-4, Alf P. Steinbach wrote:
...
> Thanks, now it appears to be clear where the nonsense comes from,
> namely the "optimization" in the GCC compiler.

At the time you posted that message, I had already posted my message
identifying the C committee's resolution of DR#017 confirming the truth
of that "nonsense". Therefore, the only way you could justify asserting
that the nonsense came from gcc is to imply that gcc influenced the
committee's decision when resolving that DR.

Since it's not clear whether you had yet read that part of my earlier
message at the time you wrote that one, it's not entirely clear whether
that's what you intended. Ben's comment served to remind you that, if
that is what you meant, it didn't make sense. And he was right about
that.

If that isn't what you meant by that comment, saying so would have been
a better response, to any of the subsequent messages on this sub-thread,
than incorrectly claiming that Ben's logic was invalid.

> > There's three reasons I made that assumption:
> >
> > 1. Your comments that this is an issue about the interpretation of the
> > words, implying that you were under the misapprehension that the other
> > interpretation was also consistent with the words.
>
> The other interpretation is the only sensible one, e.g. not incompatible
> with the standard library's algorithm functions, not adding UB traps.

It suffers from just one key problem: it's inconsistent with the wording
of the standard. The key point is that the standard's wording refers to
an array (called "x" in the current version of the C++ standard), and
there is no array declared that can be substituted for "x" which has the
right properties to justify the only interpretation you consider
sensible. And the committee has made it clear that the interpretation
you consider senseless is the one they intended to apply.

> But we got what we got.
>
> Apparently to marginally ease the life of compiler writers, at cost.

It wasn't done for that reason, as you've been repeatedly told. It was
done in the belief that code which violates this rule is logically
flawed. Dispute that belief if you wish, but please stop suggesting,
without supporting evidence, that some other reason was behind this
"senseless" decision. The optimization I mentioned is NOT such
evidence, since I did not identify it as a motivation for the rule.

> > 2. Your comments, nonsensical in context, about whether the gcc
> > developers could have been able to read the words of the first standard,
> > which was never actually relevant to what he was talking about.
>
> I guess I lost your train of thought here.

You clearly never had it. He made an argument which did not depend in
any way upon whether or not the original intent of the wording matched
the interpretation that the committee eventually confirmed as correct,
and which did not depend, in any fashion, upon whether or not gcc
evelopers had ever actually read the standard, by claiming:

On Friday, April 5, 2019 at 11:56:44 AM UTC-4, Alf P. Steinbach wrote:
...
> The circular assumption is that the committee's intent, what it tried
> to express, was the favored interpretation.
>
> Without that circular assumption the above would express in a Spock-
> incompatible way that GCC folks could not have made an interpretation
> of wording that already existed, which is a much worse fallacy.

...
> > 3. It's marginally more plausible that gcc might have significantly
> > influenced the later decision, because they would have had 3 more years
> > to accumulate enough influence to do so.
>
> No-one would have to influence any decision in order to promote an
> interpretation of the wording that resulted from the decision.

Someone would have to influence the decision to write the words the way
they were written, in order to make sure that those words were written
in a way consistent with the optimization they wanted to make. They
would have had to influence the decision about how to resolve DR#017, in
order to make sure that it wasn't resolved in a way that repudiated that
interpretation. The only way to justify holding gcc responsible for the
"nonsense" that was confirmed by the committee's resolution of DR#017 is
to imply that they had such influence over at least one of those two
decisions. Which is implausible, though not, as originally implied,
impossible.

> >> Which is what I meant.
> >
> > You meant to claim that gcc influenced the committee's decision.
>
> WTF?

As cited above, Thursday, April 4, 2019 at 1:44:27 AM UTC-4.