Core issue 2182 and pointer arithmetic


Myriachan

Sep 8, 2017, 3:47:40 PM
to ISO C++ Standard - Discussion
Core issue 2182:

2182. Pointer arithmetic in array-like containers

Section: 5.7  [expr.add]     Status: drafting     Submitter: Jonathan Wakely     Date: 2015-10-20

The current direction for issue 1776 (see paper P0137) calls into question the validity of doing pointer arithmetic to address separately-allocated but contiguous objects in a container like std::vector. A related question is whether there should be some allowance made for allowing pointer arithmetic using a pointer to a base class if the derived class is a standard-layout class with no non-static data members. It is possible that std::launder could play a part in the resolution of this issue.

Notes from the February, 2016 meeting:

This issue is expected to be resolved by the resolution of issue 1776. The major problem is when the elements of the vector contain constant or reference members; 3.8 [basic.life] paragraph 7 implies that pointer arithmetic leading to such an object produces undefined behavior, and CWG expects this to continue. Some changes to the interface of std::vector may be required, perhaps using std::launder as part of iterator processing.



It seems incredible that the direction of the Standard would be toward making pointer arithmetic undefined for objects inside an std::vector just because they have a const member or reference member.


class C {
    int &member;
    void f();
    ...
};
...
std::vector<C> v;
v.push_back(C());
v.push_back(C());
v.push_back(C());
(&v[0] + 2)->f();



This would seriously be undefined by what was stated above?  This is completely ridiculous to me.


First of all, it seems that P0137R1 didn't solve the arithmetic issue, making &v[0] + 2 illegal pointer arithmetic in the first place, because std::vector (most likely) constructed the objects separately.  Second, making this undefined just because of a const or reference nonstatic member would break an unbelievable amount of existing C++ code if this arithmetic were to suddenly require a call to std::launder.


For the first issue, it seems like we should formally define pointer arithmetic as working across adjacent array objects, with individual objects being treated as an array of size 1 for this purpose as usual.  This does have implications, though, because it would allow "bad" code such as the following to be well-defined:


struct S {
    int a, b, c;
};
...
static_assert(offsetof(S, b) == offsetof(S, a) + sizeof(S::a));
static_assert(offsetof(S, c) == offsetof(S, b) + sizeof(S::b));
S s;
(&s.a)[2] = 2;
assert(s.c == 2);

If we solve the pointer arithmetic for std::vector, it seems as if this code must either be correct or be ill-formed (because the static_asserts fire).  I don't think that compiler optimizers would appreciate this very much, though.  This seems like the "right" solution to the whole problem to me, but I can see why there would be objections.  (I would propose additionally that char * / unsigned char * / std::byte * be allowed to cross among objects even of different types, so long as the arithmetic remains within the bounds of that allocated storage block...but I'm a radical around here.)



For the second issue, this would be a strongly breaking change - at least, in the sense that too much existing code has relied upon this even if it was technically undefined behavior.  I think a lot of C++ programmers would be unhappy if this suddenly became undefined behavior and compilers started emitting code that generates nasal demons.


An option that would technically work would be to require that std::vector allocate its whole collection of objects as an array, meaning that any push_back would necessarily require moving the entire array.  This obviously won't fly for performance reasons.


Thanks, and sorry for the long text,

Melissa

Richard Smith

Sep 8, 2017, 4:20:07 PM
to std-dis...@isocpp.org
On 8 September 2017 at 12:47, Myriachan <myri...@gmail.com> wrote:
Core issue 2182:

2182. Pointer arithmetic in array-like containers

Section: 5.7  [expr.add]     Status: drafting     Submitter: Jonathan Wakely     Date: 2015-10-20

The current direction for issue 1776 (see paper P0137) calls into question the validity of doing pointer arithmetic to address separately-allocated but contiguous objects in a container like std::vector. A related question is whether there should be some allowance made for allowing pointer arithmetic using a pointer to a base class if the derived class is a standard-layout class with no non-static data members. It is possible that std::launder could play a part in the resolution of this issue.

Notes from the February, 2016 meeting:

This issue is expected to be resolved by the resolution of issue 1776. The major problem is when the elements of the vector contain constant or reference members; 3.8 [basic.life] paragraph 7 implies that pointer arithmetic leading to such an object produces undefined behavior, and CWG expects this to continue. Some changes to the interface of std::vector may be required, perhaps using std::launder as part of iterator processing.



It seems incredible that the direction of the Standard would be toward making pointer arithmetic undefined for objects inside an std::vector just because they have a const member or reference member.


class C {
    int &member;
    void f();
    ...
};
...
std::vector<C> v;
v.push_back(C());
v.push_back(C());
v.push_back(C());
(&v[0] + 2)->f();



This would seriously be undefined by what was stated above?

No, I doubt that's the intent; rather, the requisite calls to std::launder would be in the implementation of vector itself. However, this case would be undefined:

std::vector<C> v;
v.reserve(1);
v.push_back(C());
C *p = v.data(); // or "auto p = v.begin();"
v.pop_back();
v.push_back(C());
p->f(); // invalid, that C object went away
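
As a sketch of where those laundering calls might live (purely illustrative; the names are hypothetical, not from any shipping implementation):

#include <cstddef>
#include <new> // std::launder, C++17

// Hypothetical sketch: element access inside a vector-like implementation
// launders the computed pointer, so that const or reference members of T
// do not invalidate element access after objects are replaced in place.
template <typename T>
T &element_at(T *data_, std::size_t i) {
    return *std::launder(data_ + i); // data_ is the stored element pointer
}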

This is completely ridiculous to me.


First of all, it seems that P0137R1 didn't solve the arithmetic issue, making &v[0] + 2 illegal pointer arithmetic in the first place, because std::vector (most likely) constructed the objects separately.  Second, making this undefined just because of a const or reference nonstatic member would break an unbelievable amount of existing C++ code if this arithmetic were to suddenly require a call to std::launder.


For the first issue, it seems like we should formally define pointer arithmetic as working across adjacent array objects, with individual objects being treated as an array of size 1 for this purpose as usual.

SG12 is already exploring this direction, but only for adjacent objects whose storage is provided by the same array.

This does have implications, though, because it would allow "bad" code such as the following to be well-defined:


struct S {
    int a, b, c;
};
...
static_assert(offsetof(S, b) == offsetof(S, a) + sizeof(S::a));
static_assert(offsetof(S, c) == offsetof(S, b) + sizeof(S::b));
S s;
(&s.a)[2] = 2;
assert(s.c == 2);


... which does not happen in this case, so this would continue to be UB under that approach. 


Nicol Bolas

Sep 8, 2017, 4:35:58 PM
to ISO C++ Standard - Discussion
That being undefined, to me, is perfectly valid, but only for certain types of `C`. Namely those mentioned above: types containing references or `const` objects.
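
For concreteness, this is the kind of situation the [basic.life]p7 rule is about; a minimal sketch (assuming the usual reading of that paragraph, with names invented here):

#include <new>

struct K {
    const int x; // const member triggers the [basic.life]p7 restriction
};

void reuse() {
    K k{1};
    K *p = &k;
    p->~K();
    new (&k) K{2};               // reuse the storage with a new object
    // int bad = p->x;           // UB: p does not refer to the new object
    int ok = std::launder(p)->x; // C++17: laundered access is fine
    (void)ok;
}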


For the first issue, it seems like we should formally define pointer arithmetic as working across adjacent array objects, with individual objects being treated as an array of size 1 for this purpose as usual.

SG12 is already exploring this direction, but only for adjacent objects whose storage is provided by the same array.

What does "provided by the same array" mean, exactly? Right now, we already have that.

The problem is that we don't allow pointer arithmetic to work across adjacent objects of the same type whose storage is provided by the same allocation.

Richard Smith

Sep 8, 2017, 5:09:02 PM
to std-dis...@isocpp.org
See the definition of "provides storage" here: http://eel.is/c++draft/intro.object#3
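
A minimal version of the sort of example given there: an unsigned char (or std::byte) array can provide storage for an object created within it.

#include <new>

alignas(int) unsigned char buf[sizeof(int)]; // buf provides storage...
int *p = new (buf) int{42};                  // ...for this int object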
 

Myriachan

Sep 8, 2017, 5:35:41 PM
to ISO C++ Standard - Discussion
I wouldn't want to be the person who has to answer that question on Stack Overflow...
 


For the first issue, it seems like we should formally define pointer arithmetic as working across adjacent array objects, with individual objects being treated as an array of size 1 for this purpose as usual.

SG12 is already exploring this direction, but only for adjacent objects whose storage is provided by the same array.

What does "provided by the same array" mean, exactly? Right now, we already have that.

The problem is that we don't allow pointer arithmetic to work across adjacent objects of the same type whose storage is provided by the same allocation.


The "same array" definition has its own issues.

For one thing, it still prevents optimization in a lot of cases.  Take my S example and instead write it as a function taking a reference:

struct S {
    int a, b, c;
};
...
static_assert(offsetof(S, b) == offsetof(S, a) + sizeof(S::a));
static_assert(offsetof(S, c) == offsetof(S, b) + sizeof(S::b));

void Function(S &s)
{
    (&s.a)[2] = 2;
    assert(s.c == 2);
}

A compiler cannot assume that this is undefined behavior, because what if we call Function like this?:

alignas(S) unsigned char storage[sizeof(S)];
S *ps = new(storage) S();
Function(*ps);

In this case, the compiler would not always be able to assume undefined behavior, because the reference "s" could be entirely backed by one single array, meeting the proposed conditions for pointer arithmetic.

In addition to this issue, I would say that we are better off additionally considering every object to be overlaid on top of an array of chars/unsigned chars/std::bytes.  Without this concession, offsetof() is basically useless.
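
The idiom that breaks without it looks like this (a sketch, assuming a standard-layout type):

#include <cstddef> // offsetof
#include <cstring> // std::memcpy

struct S { int a, b, c; };

// offsetof yields the right number, but without the char-array overlay
// the byte-pointer arithmetic below is not sanctioned.
void poke_b(S &s, int v) {
    unsigned char *base = reinterpret_cast<unsigned char *>(&s);
    std::memcpy(base + offsetof(S, b), &v, sizeof v);
}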

Melissa

Nicol Bolas

Sep 8, 2017, 5:36:45 PM
to ISO C++ Standard - Discussion
Right, but the allocator functions don't create objects "of type “array of N unsigned char” or of type “array of N std​::​byte”". Nor does `std::aligned_storage/union_t`. So neither can "provide storage" for an object under those rules. You can certainly create objects in that storage. But that won't be the same as "provide storage".

Unless you're saying that `vector` has to allocate memory, then do `new() char[]` on the allocation, and only then perform construction on any types in the memory. Or unless you're saying that every allocation of memory, every object, is also an array of bytes in addition to being whatever it currently is.

Myriachan

Sep 8, 2017, 8:25:48 PM
to ISO C++ Standard - Discussion
I know you're replying to Richard, but I personally would say that every object of type T ought to be considered an array of characters of size sizeof(T).  However, that definition implies that a fix for the std::vector problem would also allow the shenanigans I mentioned above.

I think that the permissive route is better overall, but it does impede some optimizations.

Melissa

Richard Smith

Sep 8, 2017, 8:27:32 PM
to std-dis...@isocpp.org
Part of the relevant changes would be specifying that ::operator new and malloc do in fact create such an array object; this has the nice side-effect of guaranteeing that pointer arithmetic on dynamically allocated storage actually works.
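
As a sketch of what that rule would bless (the rule is still hypothetical, so treat this as illustrative):

#include <cstdlib>
#include <new>

void sketch() {
    // If malloc implicitly creates an array of bytes in the returned
    // storage, the byte arithmetic placing the second int is well-defined.
    void *raw = std::malloc(2 * sizeof(int));
    if (!raw) return;
    int *first  = new (raw) int(0);
    int *second = new (static_cast<unsigned char *>(raw) + sizeof(int)) int(1);
    (void)first; (void)second;
    std::free(raw);
}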


Richard Smith

Sep 8, 2017, 8:32:41 PM
to std-dis...@isocpp.org
There's a delicate balance here. On the one hand, we would like C++ to support the low-level memory operations necessary to implement something like vector, and on the other hand, we would like C++ to support high-level semantics in which abstract reasoning about the behavior of a UB-free program can be performed.

Finding a middle ground is not simple, but in this case one does seem like it might be available (allowing vector but not your adjacent fields case), and that's the direction that I'm currently pursuing with encouragement from SG12. If that doesn't work out, maybe a blunter instrument will be warranted.
 

Nicol Bolas

Sep 8, 2017, 9:05:01 PM
to ISO C++ Standard - Discussion
But that also means that pointer arithmetic on non-dynamically allocated storage does not work. Which means `vector` only works if it uses memory allocated by `::operator new` or `malloc`. If you have some static storage, I guess it had better already be a byte array.

And what of `std::aligned_storage/union_t`? Are those now required to be byte arrays?



Nicol Bolas

Sep 8, 2017, 9:38:16 PM
to ISO C++ Standard - Discussion
Here's the thing though. If all allocations (dynamic, automatic, static, whatever) are not byte arrays, then that means you cannot perform byte-pointer arithmetic on them to move from pointer to pointer. Which means `offsetof` is useless. This makes many existing forms of automatic serialization (that is, iterating through the subobjects of a type) unworkable.

But if all objects everywhere really are byte arrays, then the definition of pointer arithmetic you want (based on "provides storage" wording) can just as easily be applied to any contiguous sequence of objects of the same type, so long as they're in the same storage allocation.

Where is the middle ground here? It seems like you either make automatic serialization impossible (ignoring what reflection might bring), or you make jumping between contiguous members well-defined.
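
The kind of automatic serialization meant here, for concreteness (a hypothetical sketch; the field table is invented for illustration):

#include <cstddef>
#include <cstring>

struct Packet { int id; float value; };

struct Field { std::size_t offset, size; };
const Field packet_fields[] = {
    { offsetof(Packet, id),    sizeof(int)   },
    { offsetof(Packet, value), sizeof(float) },
};

// Walks the subobjects with byte-pointer arithmetic, which is exactly the
// operation that stops being sanctioned if the storage is not a byte array.
void serialize(const Packet &p, unsigned char *out) {
    for (const Field &f : packet_fields) {
        std::memcpy(out, reinterpret_cast<const unsigned char *>(&p) + f.offset, f.size);
        out += f.size;
    }
}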
 
 

Richard Smith

Sep 8, 2017, 9:39:34 PM
to std-dis...@isocpp.org
They should be moved to Annex D.
 

Richard Smith

Sep 8, 2017, 9:40:49 PM
to std-dis...@isocpp.org
The middle ground is that this only applies to a contiguous sequence of *complete* objects.
 

Nicol Bolas

Sep 8, 2017, 11:56:32 PM
to ISO C++ Standard - Discussion
On Friday, September 8, 2017 at 9:39:34 PM UTC-4, Richard Smith wrote:
On 8 September 2017 at 18:05, Nicol Bolas <jmck...@gmail.com> wrote:
And what of `std::aligned_storage/union_t`? Are those now required to be byte arrays?

They should be moved to Annex D.

To be replaced with... what, exactly? Oh sure, you can probably easily replace `aligned_storage` with a `sizeof/alignas` array declaration. But `aligned_union` is not something so easily or concisely replicated.

Wouldn't it be easier to just declare them to be byte arrays of the appropriate size/alignment? Right now, they're stated to be POD types. And an array of bytes is a POD type, yes? And since we never specified exactly what type it was, nobody was allowed to write conforming, portable code that relied on specific properties of the type, other than being POD.

Nicol Bolas

Sep 9, 2017, 12:34:20 AM
to ISO C++ Standard - Discussion
On Friday, September 8, 2017 at 9:40:49 PM UTC-4, Richard Smith wrote:
The middle ground is that this only applies to a contiguous sequence of *complete* objects.

... now, I'm very confused as to what you're saying.

My initial understanding of the idea is that there are some allocations of memory that are naturally byte arrays and some that may or may not be. If you declare an `int` on the stack, then there is no byte array providing storage for that `int`. But if you heap-allocate it, then there will be a byte array providing storage for it, through the use of special wording for `::operator new`.

But now, with this "complete object" wording, I'm not sure why it is that you cannot say that all allocations of memory are byte arrays. After all, if only complete objects can form a sequence (outside of actual arrays, of course), then you've directly forbidden the ability to jump from subobject to subobject via pointer arithmetic.

Given that, why do you need to explicitly declare that some kinds of allocations create byte arrays but others do not? Why can't all memory be byte arrays? If you keep this dichotomy between byte-array storage and non-byte-array storage, then you create this situation:

struct S
{
    int x;
    float y;
};

void do_something(S &s);

//...

auto ptr_s = new S;
S obj_s;

do_something(*ptr_s);
do_something(obj_s);

//...

void do_something(S &s)
{
    auto ptr = &s;
    auto member_ptr = reinterpret_cast<float*>(reinterpret_cast<std::byte*>(ptr) + offsetof(S, y));
    //Do something with member_ptr.
}

You're basically saying that `do_something(*ptr_s)` works, but `do_something(obj_s)` does not, simply because one object was created dynamically and one was not. So if I'm using `offsetof`, it only works for dynamically allocated memory or for cases where the user explicitly creates a byte array, since those are the only ones where I can cast to a byte array and do pointer arithmetic.

For any object of type `S`, I think that `do_something` should either be well-formed or ill-formed. It shouldn't matter how the object was created or what kind of storage it is in. And the only way to make that always be well-formed is to make all allocations be byte arrays.

Myriachan

Sep 11, 2017, 3:48:48 PM
to ISO C++ Standard - Discussion
It is already the case that at least trivially copyable types have to be considered byte arrays, since basic.types/2 depends on it.  The Standard is self-inconsistent here: it says that copying the underlying bytes copies the value if it's a trivially copyable type, but it's undefined behavior to do the pointer arithmetic for this.

struct S {
    int a, b;
};

S d;
S s = { 1, 2 };
std::byte *dest = reinterpret_cast<std::byte *>(&d);
std::byte *src = reinterpret_cast<std::byte *>(&s);

for (std::size_t x = 0; x < sizeof(S); ++x) {
    dest[x] = src[x];
}

This is undefined behavior, because dest[x] and src[x] resolve to *(dest + x) and *(src + x) respectively, and those pointer additions are undefined behavior, because dest and src do not point to std::byte arrays.

It seems that at the least, we should define all storage to be byte arrays in addition to whatever type is constructed within.  There are side effects to that decision, however.

One side effect is that if you have a pointer to a struct object containing a char array, the compiler cannot assume that you never overflow that char array within the object, because the object is within some storage, and that storage is a char array covering at least the whole object.
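
For example (illustrative only):

struct T {
    char tag[8];
    int id;
};

// If T's storage is also a char array covering the whole object, the
// compiler can no longer assume t->tag[i] stays within tag: for an
// out-of-range i, the store could alias t->id, so the following load
// cannot be reordered above it.
int poke_and_load(T *t, int i) {
    t->tag[i] = 0;
    return t->id;
}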

If we wanted to preserve certain optimizations, we might have to design pointers such that it matters how the pointer is derived, rather than just its value.  This is already the case in certain respects, such as the rule that you can't access a second array using a pointer to just past the end of a first array, even if the compiler happened to make the two arrays adjacent, and thus the pointers are equal in the middle.

Melissa

Hyman Rosen

Sep 11, 2017, 4:14:45 PM
to std-dis...@isocpp.org
On Mon, Sep 11, 2017 at 3:48 PM, Myriachan <myri...@gmail.com> wrote:
If we wanted to preserve certain optimizations...

Language design driven by optimization is a fundamental error which has poisoned many aspects of C++. 

Nicol Bolas

Sep 11, 2017, 6:00:24 PM
to ISO C++ Standard - Discussion
It's illegal for you to do it manually. But it's not illegal to call `memcpy` or `memmove` to cause it to happen. The standard is just saying that those are the only ways to copy the underlying bytes.
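
The sanctioned spelling of that copy, for concreteness (sketch):

#include <cstring>

struct S { int a, b; };

void copy_bytes(S &d, const S &s) {
    // memcpy performs the underlying-byte copy itself, so the user never
    // writes the questionable pointer arithmetic.
    std::memcpy(&d, &s, sizeof(S)); // OK: S is trivially copyable
}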
 

One side effect is that if you have a pointer to a struct object containing a char array, the compiler cannot assume that you never overflow that char array within the object, because the object is within some storage, and that storage is a char array covering at least the whole object.

Note that this would have to be an `unsigned char` or `std::byte` array. A plain `char` array does not qualify.

Hyman Rosen

Sep 11, 2017, 6:22:30 PM
to std-dis...@isocpp.org
On Mon, Sep 11, 2017 at 6:00 PM, Nicol Bolas <jmck...@gmail.com> wrote:
It's illegal for you to do it manually. But it's not illegal to call `memcpy` or `memmove` to cause it to happen. The standard is just saying that those are the only ways to copy the underlying bytes.

That's false.  N4687 [basic.types] talks about copying the bytes; it does not require that only certain functions may do that copying.  The footnotes in that section, describing how bytes are copied, merely give memcpy and memmove as examples.

Nicol Bolas

Sep 11, 2017, 6:57:20 PM
to ISO C++ Standard - Discussion
My point is that this part of the standard makes it legal to copy the bytes of certain types. But that doesn't make it wrong for other parts of the standard to forbid copying bytes in certain ways. So long as there is still some way to do it, the standard is fine.

That's not to say that I disagree with the idea of all allocations being byte arrays. But it doesn't make the standard inconsistent to have it allow byte copying in one place while forbidding a certain type of byte copying in another.

Myriachan

Sep 11, 2017, 9:13:31 PM
to ISO C++ Standard - Discussion
I suppose that if you consider memcpy and memmove to be magic functions, then you're right.

I feel as though the community/committee needs to decide whether to go down the route of having abstract objects or having a concrete memory model.  If an extreme abstract model is adopted, and it's not possible (or is undefined behavior) to drop down to raw memory when the programmer wants to, we might as well be coding in C# or Java.  The ability to drop down to lower-level code while still having high-level code is what attracts modern developers to C++, because the other languages are safer and easier.

Melissa

Nicol Bolas

Sep 12, 2017, 10:48:13 AM
to ISO C++ Standard - Discussion
On Monday, September 11, 2017 at 9:13:31 PM UTC-4, Myriachan wrote:

I suppose that if you consider memcpy and memmove to be magic functions, then you're right.

I feel as though the community/committee needs to decide whether to go down the route of having abstract objects or having a concrete memory model.

I don't believe that such a choice needs to be made. I see no reason why you can't have a concrete memory model and have a concrete object model.

The main problem we have is that the concept of indexing memory through bytes has to happen via the object model, requiring an explicit "byte array" object. What we want is for it to be able to happen outside of the object model.

Myriachan

Sep 12, 2017, 3:00:05 PM
to ISO C++ Standard - Discussion
On Tuesday, September 12, 2017 at 7:48:13 AM UTC-7, Nicol Bolas wrote:


I don't believe that such a choice needs to be made. I see no reason why you can't have a concrete memory model and have a concrete object model.

The main problem we have is that the concept of indexing memory through bytes has to happen via the object model, requiring an explicit "byte array" object. What we want is for it to be able to happen outside of the object model.


It's sometimes difficult to reconcile the two worlds, particularly when considering compiler optimizations and exotic implementations.

How do we allow bytewise access and allow std::vector::data() to be used with pointer arithmetic simultaneously with disallowing garbage like accessing the wrong element of a class?

// Assume no padding in this implementation.

struct S { int a, b, c; };

S s{ 0, 0, 0 };

(&s.a)[2] = 4; // writes s.c???

If s is to be treated as a byte array, how could this indirection be disallowed?  A reasonable implementation of std::vector<int> would be to allocate a suitably-aligned std::byte array, then construct ints individually within the byte array.  How would the above differ from this?:

std::vector<int> v;
v.reserve(3);
v.push_back(0);
v.push_back(0);
v.push_back(0);
v.data()[2] = 4;

The current model is rather screwy in that the + 2 within v.data()[2] = 4; is technically undefined pointer arithmetic, even though it's intended to be allowed.  The alternative would be screwy in that you could do stuff you're really not supposed to do, and compilers may have to be pessimistic.

A technically correct interpretation of the current standard would be to say that std::vector magically constructs objects sequentially while simultaneously allowing pointer arithmetic through data().  An inability to make custom containers would irritate many C++ programmers, so I don't think that that is a viable solution.
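
For concreteness, the custom-container shape in question might look like this (a hypothetical sketch; the names are invented):

#include <cstddef>
#include <new>

// Elements are constructed one at a time in a byte buffer and then
// accessed via pointer arithmetic from data() - the very pattern whose
// status is being debated.
struct TinyIntVec {
    alignas(int) std::byte storage[4 * sizeof(int)];
    std::size_t count = 0;

    void push_back(int v) {
        new (storage + count * sizeof(int)) int(v); // separate complete objects
        ++count;
    }
    int *data() { return std::launder(reinterpret_cast<int *>(storage)); }
};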

Which way is better?

Melissa

Hyman Rosen

Sep 12, 2017, 4:07:17 PM
to std-dis...@isocpp.org
On Tue, Sep 12, 2017 at 3:00 PM, Myriachan <myri...@gmail.com> wrote:
How do we allow bytewise access and allow std::vector::data() to be used with pointer arithmetic simultaneously with disallowing garbage like accessing the wrong element of a class?

You don't.  Stop trying to disallow things, and stop letting optimization drive language design.

It is a reality that objects sit at locations in memory, and that adding an offset to a pointer to one object can make it point to a different object.  Instead of trying to find brilliant ways to avoid that reality, acknowledge it, and stop contorting the language so that optimizers can assume that writing through a pointer leaves some object untouched.  If the compiler can prove that, fine, but that shouldn't be a part of the language.

Optimization by the assumption that undefined behavior does not happen has been a curse on the language for decades.  Failing to specify left-to-right order of evaluation has been a curse on the language for decades.  Treating uninitialized variables as untouchable has been a curse on the language for decades.

When you think you need std::launder, your language design has gone off the deep end.

Myriachan

Sep 12, 2017, 8:36:44 PM
to ISO C++ Standard - Discussion
I agree with most of what you're saying, but not everyone in the community and Committee agrees, so I was trying to find a consensus resolution most are happy with.

Melissa

Hyman Rosen

Sep 13, 2017, 12:21:07 PM
to std-dis...@isocpp.org
On Tue, Sep 12, 2017 at 8:36 PM, Myriachan <myri...@gmail.com> wrote:
I agree with most of what you're saying, but not everyone in the community and Committee agrees, so I was trying to find a consensus resolution most are happy with.

It's a noble cause, but doing that has gotten us

    a() << b(); // a() is called before b()
    a() <= b(); // a() and b() are called in unspecified order
    a() += b(); // b() is called before a()
    a().operator+=(b()); // a() is called before b()

Language design needs vision, not compromise.  That's how Jean Ichbiah designed Ada, for example.  In that language you don't see a hodgepodge mess of cases and exceptions.

Edward Catmur

Sep 14, 2017, 5:04:28 AM
to ISO C++ Standard - Discussion
std::vector doesn't need to be itself magic; it can call a language support facility available to users as well as to implementors. This would be called whenever updating data() or size() to mark the range [data(), data() + size()) as amenable to pointer arithmetic of the pointer type. 

Richard Hodges

Sep 14, 2017, 5:28:27 AM
to std-dis...@isocpp.org
Language design needs vision, not compromise.

hear hear.

It saddens me that I see so many discussions on this age-old unsolvable problem of "is it bytes or is it an object".

There is only one correct answer.

It's an object unless you say it's bytes.

proposal: modify the class/struct syntax to allow the volatile notation.

struct volatile X { ... };

now a compiler can understand that all members can be potentially aliased by a pointer and it must ensure that the memory model of the X is consistent at the beginning and end of a statement involving an X. A 'soft' thread-local memory fence, if you will.

Otherwise the compiler is safe to assume no aliasing and can make sweeping optimisations.

At the moment of course we can write 
volatile X vx;

But that eliminates *all* optimisation, including within one statement. That's overkill.

result:

every member of X with the same level of access shall be treated as both an array of bytes and as discrete objects by the compiler. Aliasing into members shall be legal and consistent.
 
pros:

no new keywords
no new attributes
no existing code breakage
obvious intent

cons:

I'll leave that to you. I can't see any.






Ville Voutilainen

Sep 14, 2017, 5:41:26 AM
to std-dis...@isocpp.org
On 14 September 2017 at 12:28, Richard Hodges <hodg...@gmail.com> wrote:
> proposal: modify the class/struct syntax to allow the volatile notation.
>
> struct volatile X { ... };
>
> now a compiler can understand that all members can be potentially aliased by
> a pointer and it must ensure that the memory model of the X is consistent at
> the beginning and end of a statement involving an X. A 'soft' thread-local
> memory fence, if you will.

I don't know why you started talking about threads and memory fences when you seemed to want to promise that the type should not allow type-based aliasing optimizations, but as soon as you start talking about threads, volatile automatically becomes unacceptable as a keyword choice.

Richard Hodges

Sep 14, 2017, 5:43:48 AM
to std-dis...@isocpp.org
I don't know why you started talking about threads and memory fences

I didn't.

I'm talking about a 'soft' fence. That ensures consistency of the memory model between statements. This has nothing to do with concurrency. I chose the term 'soft' fence because in my mind, it's a similar kind of concept. What term would be less confusing?



FrankHB1989

Sep 14, 2017, 6:13:31 AM
to ISO C++ Standard - Discussion


On Wednesday, September 13, 2017 at 4:07:17 AM UTC+8, Hyman Rosen wrote:
On Tue, Sep 12, 2017 at 3:00 PM, Myriachan <myri...@gmail.com> wrote:
How do we allow bytewise access and allow std::vector::data() to be used with pointer arithmetic simultaneously with disallowing garbage like accessing the wrong element of a class?

You don't.  Stop trying to disallow things, and stop letting optimization drive language design.

It is a reality that objects sit at locations in memory, and that adding an offset to a pointer to one object can make it point to a different object.
Why "disallowing"? You're requesting new rules to be put into the language, rules which were not provided previously, aren't you?

If not, show your reality in the language. Which rules have granted these properties to you? The C++ object model does not expose such properties. The C++ memory model only allows the location to be addressed as some bytes, rather than as any arbitrary object or subobject.

Instead of trying to find brilliant ways to avoid that reality, acknowledge it, and stop contorting the language so that optimizers can assume that writing through a pointer leaves some object untouched.  If the compiler can prove that, fine, but that shouldn't be a part of the language.

False. Such assumptions still have to be in the language rules to establish the leeway a conforming implementation has. Currently that leeway is the as-if rule.

Optimization by the assumption that undefined behavior does not happen has been a curse on the language for decades.  Failing to specify left-to-right order of evaluation has been a curse on the language for decades.
False. You have had the semicolon and the built-in comma operator for decades.
 
Treating uninitialized variables as untouchable has been a curse on the language for decades.

Not to mention that newbies lacking the ability to correctly use fundamental abstractions like "volatile" has been a curse on the language for decades.

Nicol Bolas

Sep 14, 2017, 11:38:30 AM
to ISO C++ Standard - Discussion
This is a really bad way of thinking. Implementing `vector` or `vector`-like constructs should not require such an expert-level of understanding of the object model and the use of exceedingly esoteric functions.

Users have good reason to expect that, if you explicitly construct two objects of the same type beside each other in memory, then you can use pointer arithmetic to jump from one to another. Code that does this exists and is extremely prevalent.

You're effectively proposing to tell all of these people that they have to call some function (which, FYI, doesn't actually do anything) in order to make code work. Even though it already works. People will simply not do it, and therefore compiler writers will refuse to optimize for it since it would break the world.

So, what exactly have you gained over just making the code work?

I'm against breaking the object model just to allow certain C-isms to work. But I don't see how it's breaking the object model to say that two non-subobjects of the same dynamic type, constructed adjacently in the same storage, can have pointer arithmetic used on them as though they were in an array. That seems like a perfectly coherent object model to me.

Edward Catmur

Sep 14, 2017, 2:03:56 PM
to std-dis...@isocpp.org
OK, I take your point. It's certainly worth exploring what the minimal change to the Standard would be to make existing containers work without changing code.

I'm against breaking the object model just to allow certain C-isms to work. But I don't see how it's breaking the object model to say that two non-subobjects of the same dynamic type, constructed adjacently in the same storage, can have pointer arithmetic used on them as though they were in an array. That seems like a perfectly coherent object model to me.

Yes, and this should hold whether the storage is a byte array or provided by an allocation function. 

So, if [expr.add]/4 is additionally allowed to work on the ith element of a sequence of n complete objects of the same type as *P, where either the same array provides storage for all the elements of the sequence or the elements of the sequence are constructed within the same block of storage returned by an allocation function, would that be enough? It would appear to allow any sensible implementation of vector and other contiguous containers, as well as SSO vectors and so forth. And it would not appear to leave any space for jumping between class data members as shown above. There's no need to require everything to be a byte array.
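
Concretely, under that wording (a sketch of the proposal, not of the current rules):

#include <new>

void proposal_sketch() {
    // Complete objects constructed within one block returned by an
    // allocation function would form a traversable sequence:
    void *raw = ::operator new(3 * sizeof(int));
    char *bytes = static_cast<char *>(raw);
    new (bytes) int(0);
    new (bytes + sizeof(int)) int(1);
    new (bytes + 2 * sizeof(int)) int(2);
    int *buf = static_cast<int *>(raw);
    int x = buf[2]; // blessed: same allocation, complete objects, same type
    (void)x;
    ::operator delete(raw);
    // ...whereas (&s.a)[2] from the earlier example stays undefined,
    // because a, b and c are subobjects, not complete objects.
}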

Hyman Rosen

Sep 14, 2017, 4:02:33 PM
to std-dis...@isocpp.org
On Thu, Sep 14, 2017 at 2:03 PM, 'Edward Catmur' via ISO C++ Standard - Discussion <std-dis...@isocpp.org> wrote:
So, if [expr.add]/4 is additionally allowed to work on the ith element of a sequence of n complete objects of the same type

What about code that constructs a sequence of variously typed objects in a buffer (say, for serialization)?  For that matter, the underlying <stdarg> code used to just treat the stack as such a buffer, and would increment a pointer through the parameters, having cast the pointer to the current parameter type.

Pointer addition of T *p and ptrdiff_t offset should just be defined as (T*)((intptr_t)p + offset * sizeof(T)) with wraparound semantics for the arithmetic.  Then if the result is equal to some valid pointer, the result is that pointer.  Yes, that may let you jump between different objects via arithmetic.  So what?  The language has offsetof already, so it's not like this is an undesirable result.

Edward Catmur

Sep 14, 2017, 5:13:25 PM
to std-dis...@isocpp.org
On Thu, Sep 14, 2017 at 9:02 PM, Hyman Rosen <hyman...@gmail.com> wrote:
On Thu, Sep 14, 2017 at 2:03 PM, 'Edward Catmur' via ISO C++ Standard - Discussion <std-dis...@isocpp.org> wrote:
So, if [expr.add]/4 is additionally allowed to work on the ith element of a sequence of n complete objects of the same type

What about code that constructs a sequence of variously typed objects in a buffer (say, for serialization)?

If you want to do that safely, you access the buffer as a byte array and construct the sequence of objects using memcpy or equivalent. You need a pointer to the buffer anyway, so you may as well just use that as the destination for each memcpy and update as you go.
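
That pattern, spelled out (sketch):

#include <cstring>

// Treat the buffer strictly as bytes: memcpy each value in and advance a
// byte pointer, so the arithmetic never leaves the byte array.
void write_record(unsigned char *buf, int id, double payload) {
    unsigned char *p = buf;
    std::memcpy(p, &id, sizeof id);
    p += sizeof id;
    std::memcpy(p, &payload, sizeof payload);
}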
 
For that matter, the underlying <stdarg> code used to just treat the stack as such a buffer, and would increment a pointer through the parameters, having cast the pointer to the current parameter type.

va_arg yields a prvalue, so even that doesn't need to perform unsafe pointer arithmetic, whatever other magic it's doing. 
 
Pointer addition of T *p and ptrdiff_t offset should just be defined as (T*)((intptr_t)p + offset * sizeof(T)) with wraparound semantics for the arithmetic.  Then if the result is equal to some valid pointer, the result is that pointer.  Yes, that may let you jump between different objects via arithmetic.  So what?  The language has offsetof already, so it's not like this is an undesirable result.

offsetof is still useful when following the rules; it allows you to memcpy a data member into or out of a standard-layout object in a type-erased manner, knowing only its offset and size.

Myriachan

Sep 14, 2017, 5:57:47 PM
to ISO C++ Standard - Discussion
Currently, that is usually not true.  The below is undefined behavior:

struct S { int a; int b; } s;
int c = 4;
std::memcpy(reinterpret_cast<std::byte *>(&s) + offsetof(S, b), &c, sizeof(int));

The reason that it's undefined behavior is that the reinterpret_cast pointer does not point to an array of std::byte, so pointer arithmetic on it is undefined behavior.  This is undefined behavior even before memcpy gets involved.

That's primarily why I created this thread: the status quo is broken.  Since programmers do this often, and even more expect pointers to work across adjacent separately constructed objects, it seems that the Standard is broken here, not the programmers.

Additionally, there is an active proposal to extend offsetof to all classes that don't have virtual functions or virtual bases.  It passed LEWG in Toronto.  (Classes with virtual functions could be supported in an implementation if they want to, which probably most will, since virtual functions don't break offsetof in the ABIs I can think of. Virtual base classes, not so much.)  I hope that this proposal passes Core.

Melissa

Hyman Rosen

Sep 14, 2017, 6:01:39 PM
to std-dis...@isocpp.org
On Thu, Sep 14, 2017 at 5:13 PM, 'Edward Catmur' via ISO C++ Standard - Discussion <std-dis...@isocpp.org> wrote:
va_arg yields a prvalue, so even that doesn't need to perform unsafe pointer arithmetic, whatever other magic it's doing.

It's not unsafe, just forbidden.  This is all a weird sort of political correctness.  The overwhelming majority of computing takes place on systems that present a single flat address space to programs, and where pointer arithmetic is just integer arithmetic.  But we are supposed to pretend that this isn't so, and that there is special fairy dust that gets sprinkled on some additions but not on others.  It's even weirder because actual programs do that sort of unblessed pointer arithmetic all the time, and have been doing so forever.  Overlaying data representations in order to treat the same segment of memory as objects of different types has been around even longer, in Fortran's EQUIVALENCE and probably earlier, but we can't do that with unions any more, even though we've been doing that with unions forever.

All of this in misguided service to the optimizationists.  The compiler should translate our code as written, without making assumptions that we're not engaging in activities that we are, in fact, engaging in.  If it can prove something to itself, that's fine, but otherwise it needs to leave our code alone.  The language should do away with the undefined behavior that lets compilers indulge in these shenanigans.

FrankHB1989

Sep 15, 2017, 12:11:49 AM
to ISO C++ Standard - Discussion


On Thursday, September 14, 2017 at 11:38:30 PM UTC+8, Nicol Bolas wrote:
Did it ever work?
 
I'm against breaking the object model just to allow certain C-isms to work. But I don't see how it's breaking the object model to say that two non-subobjects of the same dynamic type, constructed adjacently in the same storage, can have pointer arithmetic used on them as though they were in an array. That seems like a perfectly coherent object model to me.

Did it ever work in ISO C?

Nicol Bolas

Sep 15, 2017, 12:45:44 AM
to ISO C++ Standard - Discussion
... what are you asking here? Are you asking if non-standard library people have ever written implementations of `vector`? Yes, they have.

The C++ object model has not been defined to allow it. My point is that this is a pattern of usage that our object model ought to allow. We shouldn't have to rely on esoteric functions to be able to implement something like this.

I'm against breaking the object model just to allow certain C-isms to work. But I don't see how it's breaking the object model to say that two non-subobjects of the same dynamic type, constructed adjacently in the same storage, can have pointer arithmetic used on them as though they were in an array. That seems like a perfectly coherent object model to me.

Did it ever work in ISO C?

I don't see why that matters, since the C object model is completely different from C++'s. But yes, `vector`-like types work in C too.

FrankHB1989

Sep 15, 2017, 3:49:01 AM
to ISO C++ Standard - Discussion


On Friday, September 15, 2017 at 12:45:44 PM UTC+8, Nicol Bolas wrote:

... what are you asking here? Are you asking if non-standard library people have ever written implementations of `vector`? Yes, they have.

The C++ object model has not been defined to allow it. My point is that this is a pattern of usage that our object model ought to allow. We shouldn't have to rely on esoteric functions to be able to implement something like this.

Well, to make the problem clearer, let me borrow some terms from ISO C. Did it work in a strictly conforming way? That is, was it directly portable to any conforming C++ implementation, without relying on assumptions that ISO C++ does not provide?

If it did work like that in the sense of C++, why do you propose the change? (Otherwise it does not make much sense - things in namespace `std` already have the right to be implemented by magic, with or without the guarantees provided by ISO C++.)

You were arguing that you want it to work, with assumptions not provided by the C++ object model or any other rules in the current standard. Please define those assumptions first, e.g. as an alternative object model, to replace the "exceedingly esoteric functions" you don't want to see. I'm curious whether it would be more complicated.

(BTW, the naive way you seem to expect does not work in ISO C, either.)

FrankHB1989

Sep 15, 2017, 6:13:26 AM
to ISO C++ Standard - Discussion


On Friday, September 15, 2017 at 6:01:39 AM UTC+8, Hyman Rosen wrote:
On Thu, Sep 14, 2017 at 5:13 PM, 'Edward Catmur' via ISO C++ Standard - Discussion <std-dis...@isocpp.org> wrote:
va_arg yields a prvalue, so even that doesn't need to perform unsafe pointer arithmetic, whatever other magic it's doing.

It's not unsafe, just forbidden.
How do you know it is safe when it is not guaranteed?
 
This is all a weird sort of political correctness.
Not all. History counts.
 
The overwhelming majority of computing takes place on systems that present a single flat address space to programs, and where pointer arithmetic is just integer arithmetic.
No. Pointers in C++ are fundamentally typed; pointer values are typed. If you want arithmetic on addresses, propose an address type instead, or break the type system as a whole.

But we are supposed to pretend that this isn't so, and that there is special fairy dust that gets sprinkled on some additions but not on others.
No. This is the status quo: if nobody adds more rules or limitations, you can't rely on that assumption in the sense of the language.

Once you have done that in the standard, someone can then accuse you of forbidding such formerly conforming implementations. Should all of them go and die? This is also more or less "a weird sort of political correctness".

Of course this is not totally unacceptable; consider the removal of trigraphs. But in general, such changes had better make the specification simpler and impose fewer limitations on conformance, to overcome the net cost of losing functionality or portability. Assuming a flat address space is not evolution in that direction. (This can be a non-issue for a new language; it is never the case for C++.)

It's even weirder because actual programs do that sort of unblessed pointer arithmetic all the time, and have been doing so forever.
No. High-level languages should usually never rely on such low-level operations. From a system-design point of view, they are obliged to provide and utilize proper high-level abstractions that escape from implementation details. If they have to leak the abstraction, they are flawed by design.

People have faced such unblessed features for historical reasons; the situation is just a compromise. Blessing such things again leads to more mess.

If you're talking about the language implementation itself - no rules forbid them. If there is some difference... OK, then it is you who needs to guarantee the code is correct (and portable), rather than the vendors of the implementations. Is it unfair to do more detailed work when you want to exploit more dirty corners?
 
Overlaying data representations in order to treat the same segment of memory as objects of different types has been around even longer, in Fortran's EQUIVALENCE and probably earlier, but we can't do that with unions any more, even though we've been doing that with unions forever.

All of this in misguided service to the optimizationists.  The compiler should translate our code as written, without making assumptions that we're not engaging in activities that we are, in fact, engaging in.  If it can prove something to itself, that's fine, but otherwise it needs to leave our code alone.  The language should do away with the undefined behavior that lets compilers indulge in these shenanigans.
Without any optimization, the reasons above still apply. The rules to model "as written" are far from sufficient to be accurate about behavior, and are usually useless for a practical programming language even when there are no portability problems.

To discipline most side effects exposed by the abstract machine "as written", we already have `volatile`. So what exact change do you want, if it does not provide a better chance of more flexible abstraction and ease of conformance (and also ways to "optimize")? Just to be more noob-friendly? ... There would be no end to it. You can get that done only with more anti-engineering practices.

Richard Hodges

unread,
Sep 15, 2017, 7:11:11 AM9/15/17
to std-dis...@isocpp.org
It seems to me that the problem of is-it-bytes-or-is-it-objects is only a concern because optimisers have to know whether there is any pointer aliasing going on.

Might it be fair to allow a compiler to assume there is no aliasing unless told there might be?

Humour me for a moment and imagine a construct called the "as_if_fence".

The optimiser would be free to optimise and do "as_if" stuff, but not across such a fence.

Now the programmer has a way to prevent aliasing problems when he's treating objects as bytes.

Of course there would need to be a way to mark a function as using a fence.

For example, imagine a function that takes a reference and a pointer to potentially the same object:

auto foo(Bar& bar, std::uint8_t* pbar)
{
  bar.some_updates();
  more_updates(pbar);  // this could be reordered in either direction
  auto result = bar.result();
  return result;
}

If aggressive re-ordering were allowed, optimising this code could cause breakage because of as-if reordering where *pbar and bar are aliases.
  
so consider:

auto foo(Bar& bar, Bar* pbar)
{
  bar.some_updates();

  // compiler may not reorder memory state changes over this fence 
  // as a result of as-if-rule
  std::as_if_fence();   

  pbar->more_updates();  // so the side effects of this will be consistent

  std::as_if_fence();   // and will "happen before" the stuff below

  auto result = bar.result();
  return result;
}

Doesn't that solve the issue of is-it-memory-or-is-it-objects?
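
There is no std::as_if_fence today, of course. A rough sketch of how one might approximate a compiler-only barrier with existing tools (assuming GCC/Clang; compiler_fence is an invented name, and std::atomic_signal_fence formally orders accesses only with respect to signal handlers, so this approximates the intent rather than implements it):

#include <atomic>

// Hypothetical stand-in for the proposed std::as_if_fence: tell the compiler
// that any memory may be read or written here, so it must not move loads or
// stores across this point. This constrains the compiler only, not the CPU.
inline void compiler_fence() {
#if defined(__GNUC__) || defined(__clang__)
    asm volatile("" ::: "memory");  // GCC/Clang compiler-only barrier
#else
    std::atomic_signal_fence(std::memory_order_seq_cst);  // portable approximation
#endif
}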



Richard Hodges

unread,
Sep 15, 2017, 7:15:32 AM9/15/17
to std-dis...@isocpp.org
Sorry, the second function above should be:

auto foo(Bar& bar, std::uint8_t* pbar)
{
  bar.some_updates();

  // compiler may not reorder memory state changes over this fence 
  // as a result of as-if-rule
  std::as_if_fence();   

  more_updates(pbar);  // so the side effects of this will be consistent

  std::as_if_fence();   // and will "happen before" the stuff below

  auto result = bar.result();
  return result;
}

Edward Catmur

unread,
Sep 15, 2017, 7:39:06 AM9/15/17
to std-dis...@isocpp.org
On Thu, Sep 14, 2017 at 10:57 PM, Myriachan <myri...@gmail.com> wrote:
On Thursday, September 14, 2017 at 2:13:25 PM UTC-7, Edward Catmur wrote:
On Thu, Sep 14, 2017 at 9:02 PM, Hyman Rosen <hyman...@gmail.com> wrote:
On Thu, Sep 14, 2017 at 2:03 PM, 'Edward Catmur' via ISO C++ Standard - Discussion <std-dis...@isocpp.org> wrote:
So, if [expr.add]/4 is additionally allowed to work on the ith element of a sequence of n complete objects of the same type

What about code that constructs a sequence of variously typed objects in a buffer (say, for serialization)?

If you want to do that safely, you access the buffer as a byte array and construct the sequence of objects using memcpy or equivalent. You need a pointer to the buffer anyway, so you may as well just use that as the destination for each memcpy and update as you go.
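
A minimal sketch of that approach, assuming trivially copyable payload types (Header, Record, and write_bytes are invented names for illustration):

#include <cstring>
#include <type_traits>

struct Header { int version; };
struct Record { double value; };

// Copy one object's representation to the current write position and return
// the advanced byte pointer; the arithmetic stays on the byte pointer into
// the buffer, not on pointers to the stored objects.
template <typename T>
unsigned char* write_bytes(unsigned char* dest, const T& obj) {
    static_assert(std::is_trivially_copyable_v<T>, "memcpy requires this");
    std::memcpy(dest, &obj, sizeof obj);
    return dest + sizeof obj;
}

void fill(unsigned char* buffer) {  // buffer assumed large enough
    unsigned char* p = buffer;      // one pointer into the buffer...
    p = write_bytes(p, Header{1});  // ...used as the destination for each memcpy
    p = write_bytes(p, Record{3.14});  // ...and updated as we go
}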
 
  For that matter, the underlying <stdarg> code used to just treat the stack as such a buffer, and would increment a pointer through the parameters, having cast the pointer to the current parameter type.

va_arg yields a prvalue, so even that doesn't need to perform unsafe pointer arithmetic, whatever other magic it's doing. 
 
Pointer addition of T *p and ptrdiff_t offset should just be defined as (T*)((intptr_t)p + offset * sizeof(T)) with wraparound semantics for the arithmetic.  Then if the result is equal to some valid pointer, the result is that pointer.  Yes, that may let you jump between different objects via arithmetic.  So what?  The language has offsetof already, so it's not like this is an undesirable result.
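
For concreteness, a sketch of the semantics proposed above as a function (hypothetical, not what the standard currently guarantees; uintptr_t is used rather than intptr_t because signed overflow is undefined in real C++ today, while unsigned arithmetic wraps):

#include <cstddef>
#include <cstdint>

// Hypothetical model of the proposal: pointer + offset is plain integer
// arithmetic on the address, with wraparound. Not a description of what the
// standard currently guarantees.
template <typename T>
T* flat_add(T* p, std::ptrdiff_t offset) {
    std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(p);
    addr += static_cast<std::uintptr_t>(offset) * sizeof(T);  // wraps modulo 2^N
    return reinterpret_cast<T*>(addr);
}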

offsetof is still useful when following the rules; it allows you to memcpy a data member into or out of a standard-layout object in a type-erased manner, knowing only its offset and size.
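
A sketch of that type-erased use (read_field is an invented name; whether the byte arithmetic here is itself formally defined is exactly what is disputed below):

#include <cstddef>
#include <cstring>

// Copy `size` bytes of a field out of a standard-layout object, knowing only
// the field's offset and size, not its name or type.
void read_field(const void* obj, std::size_t offset, std::size_t size, void* out) {
    std::memcpy(out, static_cast<const unsigned char*>(obj) + offset, size);
}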

Currently, that is usually not true.  The below is undefined behavior:

struct S { int a; int b; } s;
int c = 4;
std::memcpy(reinterpret_cast<std::byte *>(&s) + offsetof(S, b), &c, sizeof(int));

The reason it's undefined behavior is that the pointer produced by the reinterpret_cast does not point to an array of std::byte, so pointer arithmetic on it is undefined. This is so even before memcpy gets involved.

That's primarily why I created this thread: the status quo is broken.  Since programmers do this often, and even more expect pointers to work across adjacent separately constructed objects, it seems that the Standard is broken here, not the programmers.

Right; that's the "sequence of N [bytes]" in [basic.types]/4. I agree that both of these are issues, but as far as I can tell they could be resolved by relaxing [expr.add] slightly; in this case by allowing pointer arithmetic on byte pointers within a complete object or allocation. Hopefully this need not open any holes elsewhere; e.g. the pointer interconvertibility rules [basic.compound] would still prevent using offsetof to construct a pointer to a class data member.
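
For example, such a relaxation would bless code like the following, which is formally questionable today for the same reason as above (a sketch; dump is an invented name):

#include <cstddef>
#include <cstdio>

struct S2 { int a; int b; };

// Walk the object representation of a complete object through a byte pointer.
void dump(const S2& s2) {
    auto* p = reinterpret_cast<const unsigned char*>(&s2);
    for (std::size_t i = 0; i != sizeof s2; ++i)
        std::printf("%02x ", static_cast<unsigned>(p[i]));  // arithmetic within one complete object
}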

Nicol Bolas

unread,
Sep 15, 2017, 11:01:27 AM9/15/17
to ISO C++ Standard - Discussion
On Friday, September 15, 2017 at 6:13:26 AM UTC-4, FrankHB1989 wrote:
On Friday, September 15, 2017 at 6:01:39 AM UTC+8, Hyman Rosen wrote:
On Thu, Sep 14, 2017 at 5:13 PM, 'Edward Catmur' via ISO C++ Standard - Discussion <std-dis...@isocpp.org> wrote:
va_arg yields a prvalue, so even that doesn't need to perform unsafe pointer arithmetic, whatever other magic it's doing.

It's not unsafe, just forbidden.
How do you know it is safe when it is not guaranteed?
 
This is all a weird sort of political correctness.
Not all. History counts.

So why are you ignoring the long history of C++ programmers doing operations like this and expecting them to work?

The overwhelming majority of computing takes place on systems that present a single flat address space to programs, and where pointer arithmetic is just integer arithmetic.
No. Pointers in C++ are fundamentally typed; pointer values are typed. If you want arithmetic on addresses, propose an address type instead, or break the type system as a whole.

You're misrepresenting his statement. His statement is about the systems that the implementation runs on. Your response is as if he were making a declaration about what the standard says.

His statement is correct: most systems that C++ code executes on have a flat address space, with pointer arithmetic being integer arithmetic. That is an accurate description of most computer systems.

Yes, the standard's memory model does not model this fact. He knows that, which is why he's arguing that it should be changed.

But we are supposed to pretend that this isn't so, and that there is special fairy dust that gets sprinkled on some additions but not on others.
No. This is the status quo: if nobody adds more rules or limitations, you can't rely on that assumption in the sense of the language.

Sure, but since we're talking about changing the language, that is kind of a non-sequitur.

Once you have done that in the standard, someone can then accuse you of forbidding such formerly conforming implementations. Should all of them die? This is also more or less "a weird sort of political correctness".

That assumes that such implementations cannot be changed. That there are systems where it is fundamentally impossible to implement a revised C++ object model on them.

Do you have evidence of the existence of such systems?

Of course this is not totally unacceptable, e.g. the removal of trigraphs. But in general, such changes should make the specification simpler and impose fewer limitations on conformance, to overcome the net cost of losing functionality or portability. Assuming a flat address space is not evolution in that direction. (This can be a non-issue for a new language. It is never the case for C++.)

It's even weirder because actual programs do that sort of unblessed pointer arithmetic all the time, and have been doing so forever.
No. High-level languages should usually never rely on such low-level operations.

It doesn't matter what you think ought to happen. It has happened, is happening, and will continue to happen because it is a very useful thing to do.

Also, I disagree with your characterization here. Being able to drop down to low-level operations when appropriate is one of the defining characteristics of C++.

From the viewpoint of system design, they are obliged to provide and use proper high-level abstractions that hide implementation details. If they have to leak the abstraction, they are flawed by design.

People have faced such unblessed features for historical reasons. The case is just a compromise. Blessing such things again leads to more mess.

Please explain why allowing the following would make the object model a "mess":

alignas(T) unsigned char storage[sizeof(T) * 2];  // raw storage big enough for two Ts
T *p1 = new(storage) T;
T *p2 = new(p1+1) T;  // getting the one-past-the-end pointer is always valid ptr arithmetic
p1[1].x = 5;
assert(p2->x == 5);

It seems perfectly clear to me what this code is expressing. You want to create two Ts right next to each other and do pointer arithmetic between them. The object model seems to be preserved here; objects are initialized in storage. They have a well-defined lifetime. And so forth.

What is wrong with allowing this?

Nicol Bolas

unread,
Sep 15, 2017, 11:12:17 AM9/15/17
to ISO C++ Standard - Discussion
On Friday, September 15, 2017 at 3:49:01 AM UTC-4, FrankHB1989 wrote:
On Friday, September 15, 2017 at 12:45:44 PM UTC+8, Nicol Bolas wrote:
On Friday, September 15, 2017 at 12:11:49 AM UTC-4, FrankHB1989 wrote:
On Thursday, September 14, 2017 at 11:38:30 PM UTC+8, Nicol Bolas wrote:
On Thursday, September 14, 2017 at 5:04:28 AM UTC-4, Edward Catmur wrote:
std::vector doesn't need to be itself magic; it can call a language support facility available to users as well as to implementors. This would be called whenever updating data() or size() to mark the range [data(), data() + size()) as amenable to pointer arithmetic of the pointer type.

This is a really bad way of thinking. Implementing `vector` or `vector`-like constructs should not require such an expert level of understanding of the object model and the use of exceedingly esoteric functions.

Users have good reason to expect that, if you explicitly construct two objects of the same type beside each other in memory, then you can use pointer arithmetic to jump from one to another. Code that does this exists and is extremely prevalent.

You're effectively proposing to tell all of these people that they have to call some function (which, FYI, doesn't actually do anything) in order to make code work. Even though it already works. People will simply not do it, and therefore compiler writers will refuse to optimize for it since it would break the world.

So, what exactly have you gained over just making the code work?
Did it ever work?

... what are you asking here? Are you asking whether people outside the standard library have ever written implementations of `vector`? Yes, they have.

The C++ object model has not been defined to allow it. My point is that this is a pattern of usage that our object model ought to allow. We shouldn't have to rely on esoteric functions to be able to implement something like this.

Well, to make the problem clearer, let me borrow some terms from ISO C. Did it work in a strictly conforming way? That is, was it directly portable to any conforming C++ implementation without relying on assumptions that ISO C++ does not provide?

The question is irrelevant, since we're talking about whether we should change the standard to make it conforming. That is, we know that the standard doesn't allow it, and we're saying that it should.
 
If it did work like that in the sense of C++, why do you propose the change? (Otherwise it does not make much sense - things in namespace `std` already have the right to be implemented by magic, with or without the guarantees provided by ISO C++.)

You were arguing that you want it to work, with assumptions not provided by the C++ object model or any other rules in the current standard. Please define those assumptions first, e.g. as an alternative object model, to replace the need for the "exceedingly esoteric functions" you don't want to see. I'm curious whether it would be more complicated.

The changes have already been discussed in this thread. It's in the e-mail chain you replied to. But if you need it repeated, my suggestion was:

> two non-subobjects of the same dynamic type, constructed adjacently in the same storage, can have pointer arithmetic used on them as though they were in an array

It doesn't matter if it's "more complicated" than the current system. What matters is if the change:

1) Provides genuine benefit to users.

2) Does not make the object model nonsensical.

It clearly provides genuine benefit, since users are already doing it. So clearly that's something people want to do. And I don't see how it damages the object model. Each object's lifetime is still clear, and we already have the concept of "nested within" to allow objects to be dynamically created within other objects' storage. All we're doing is allowing you to effectively dynamically create arrays of objects.
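
As a concrete illustration of "dynamically creating arrays of objects", here is a minimal sketch of the pattern such a rule would bless (FixedBuffer is an invented name; under today's wording the arithmetic in end() is the disputed part):

#include <cstddef>
#include <new>

template <typename T, std::size_t N>
struct FixedBuffer {
    alignas(T) unsigned char storage[N * sizeof(T)];
    std::size_t count = 0;

    void push_back(const T& v) {
        // Each element is a separately constructed complete object, placed
        // directly after the previous one in the same block of storage.
        ::new (storage + count * sizeof(T)) T(v);
        ++count;
    }
    // Valid once at least one element exists; destruction omitted for brevity.
    T* begin() { return std::launder(reinterpret_cast<T*>(storage)); }
    T* end()   { return begin() + count; }  // arithmetic across those objects
};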

FrankHB1989

unread,
Sep 15, 2017, 2:23:37 PM9/15/17
to ISO C++ Standard - Discussion


On Friday, September 15, 2017 at 11:12:17 PM UTC+8, Nicol Bolas wrote:
On Friday, September 15, 2017 at 3:49:01 AM UTC-4, FrankHB1989 wrote:
On Friday, September 15, 2017 at 12:45:44 PM UTC+8, Nicol Bolas wrote:
On Friday, September 15, 2017 at 12:11:49 AM UTC-4, FrankHB1989 wrote:
On Thursday, September 14, 2017 at 11:38:30 PM UTC+8, Nicol Bolas wrote:
On Thursday, September 14, 2017 at 5:04:28 AM UTC-4, Edward Catmur wrote:
std::vector doesn't need to be itself magic; it can call a language support facility available to users as well as to implementors. This would be called whenever updating data() or size() to mark the range [data(), data() + size()) as amenable to pointer arithmetic of the pointer type.

This is a really bad way of thinking. Implementing `vector` or `vector`-like constructs should not require such an expert level of understanding of the object model and the use of exceedingly esoteric functions.

Users have good reason to expect that, if you explicitly construct two objects of the same type beside each other in memory, then you can use pointer arithmetic to jump from one to another. Code that does this exists and is extremely prevalent.

You're effectively proposing to tell all of these people that they have to call some function (which, FYI, doesn't actually do anything) in order to make code work. Even though it already works. People will simply not do it, and therefore compiler writers will refuse to optimize for it since it would break the world.

So, what exactly have you gained over just making the code work?
Did it ever work?

... what are you asking here? Are you asking whether people outside the standard library have ever written implementations of `vector`? Yes, they have.

The C++ object model has not been defined to allow it. My point is that this is a pattern of usage that our object model ought to allow. We shouldn't have to rely on esoteric functions to be able to implement something like this.

Well, to make the problem clearer, let me borrow some terms from ISO C. Did it work in a strictly conforming way? That is, was it directly portable to any conforming C++ implementation without relying on assumptions that ISO C++ does not provide?

The question is irrelevant, since we're talking about whether we should change the standard to make it conforming. That is, we know that the standard doesn't allow it, and we're saying that it should.
 
Sure, but that is not a reason establishing that it should.

If it did work like that in the sense of C++, why do you propose the change? (Otherwise it does not make much sense - things in namespace `std` already have the right to be implemented by magic, with or without the guarantees provided by ISO C++.)

You were arguing that you want it to work, with assumptions not provided by the C++ object model or any other rules in the current standard. Please define those assumptions first, e.g. as an alternative object model, to replace the need for the "exceedingly esoteric functions" you don't want to see. I'm curious whether it would be more complicated.

The changes have already been discussed in this thread. It's in the e-mail chain you replied to. But if you need it repeated, my suggestion was:

> two non-subobjects of the same dynamic type, constructed adjacently in the same storage, can have pointer arithmetic used on them as though they were in an array

It doesn't matter if it's "more complicated" than the current system. What matters is if the change:

Is this enough? For example, can "constructed adjacently" be normative in the standard without a definition of the term? How does "contiguous sequence" apply?

I see the point, but there is no formal wording. I suspect bad things can slip in easily until an exhaustive list of the actual changes is checked.

1) Provides genuine benefit to users.

2) Does not make the object model nonsensical.

It clearly provides genuine benefit, since users are already doing it.
It provides compatibility with old code that was written without this issue in mind. That may or may not be a benefit. Also note that verbosity and limitations on operations do not matter much, as in the case where `observer_ptr` is considered superior to raw pointers.

So clearly that's something people want to do.

So, no.

And I don't see how it damages the object model. Each object's lifetime is still clear, and we already have the concept of "nested within" to allow objects to be dynamically created within other objects' storage. All we're doing is allowing you to effectively dynamically create arrays of objects.
This effectively changes the notion of validity of pointer values. Consider comparing two iterator values from different vector objects of the same type. You can't assume the result is meaningful, because they can be from unrelated sequences. In this case the sequences assumed are array objects. The change claims that any two "adjacent" objects are in the same sequence, which sounds like it requires the language to define, on the fly, an infinite number of implicit array lvalues of unknown bound (which can alias one another arbitrarily). This implies a very strange picture of the object model, especially when [intro.object] is left unchanged.



Myriachan

unread,
Sep 15, 2017, 4:02:56 PM9/15/17
to ISO C++ Standard - Discussion
On Friday, September 15, 2017 at 11:23:37 AM UTC-7, FrankHB1989 wrote:
Is this enough? For example, can "constructed adjacently" be normative in the standard without a definition of the term? How does "contiguous sequence" apply?

I see the point, but there is no formal wording. I suspect bad things can slip in easily until an exhaustive list of the actual changes is checked.

The formal wording is a formality; once we decide what we want, then we can write a proposal with formal wording.


1) Provides genuine benefit to users.

2) Does not make the object model nonsensical.

It clearly provides genuine benefit, since users are already doing it.
It provides compatibility with old code that was written without this issue in mind. That may or may not be a benefit. Also note that verbosity and limitations on operations do not matter much, as in the case where `observer_ptr` is considered superior to raw pointers.


"Old code which was written without mind of this issue" - you mean, every current implementation of std::vector out there?  I don't know that it's possible to implement std::vector entirely correctly without violating this rule, due to the presence of the reserve() function.  The current state of things is that std::vector is necessarily a magic class.
 
And I don't see how it damages the object model. Each object's lifetime is still clear, and we already have the concept of "nested within" to allow objects to be dynamically created within other objects' storage. All we're doing is allowing you to effectively dynamically create arrays of objects.
This effectively changes the notion of validity of pointer values. Consider comparing two iterator values from different vector objects of the same type. You can't assume the result is meaningful, because they can be from unrelated sequences. In this case the sequences assumed are array objects. The change claims that any two "adjacent" objects are in the same sequence, which sounds like it requires the language to define, on the fly, an infinite number of implicit array lvalues of unknown bound (which can alias one another arbitrarily). This implies a very strange picture of the object model, especially when [intro.object] is left unchanged.


This is why one of the rules I propose is that you can only do this for pointers that are part of the same block of storage.  Two std::vectors would allocate memory separately from their allocator, meaning that even if the two arrays end up adjacent, the pointer arithmetic would still be undefined behavior.  Similarly for two automatic arrays of the same type, even if the compiler happens to put them adjacent in stack memory, because such allocation is considered to be separate storage.
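
To make the boundary of that rule concrete (a sketch of the proposed rule, not of current semantics):

#include <vector>

void boundary_example() {
    std::vector<int> a(4), b(4);
    int* pa = a.data();
    // Under the proposed rule, pa + 4 remains only a past-the-end pointer for
    // a's element array. Even if the allocator happened to place b's array
    // immediately after a's in memory, pa could not be used to reach b's
    // elements, because the two arrays live in separately allocated blocks
    // of storage.
    (void)(pa + 4);  // computing the past-the-end pointer is fine
}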

Melissa

Nicol Bolas

unread,
Sep 15, 2017, 5:15:31 PM9/15/17
to ISO C++ Standard - Discussion
On Friday, September 15, 2017 at 2:23:37 PM UTC-4, FrankHB1989 wrote:
On Friday, September 15, 2017 at 11:12:17 PM UTC+8, Nicol Bolas wrote:
And I don't see how it damages the object model. Each object's lifetime is still clear, and we already have the concept of "nested within" to allow objects to be dynamically created within other objects' storage. All we're doing is allowing you to effectively dynamically create arrays of objects.
This effectively changes the notion of validity of pointer values. Consider comparing two iterator values from different vector objects of the same type.

This is already explicitly undefined behavior. Iterators from different containers cannot be compared to one another. However, that's irrelevant because:
 
You can't assume the result is meaningful, because they can be from unrelated sequences.

Allow me to quote myself again:

> two non-subobjects of the same dynamic type, constructed adjacently in the same storage, can have pointer arithmetic used on them as though they were in an array

So unless those two containers got their allocations from the same storage, that can't happen.

In this case the sequences assumed are array objects. The change claims that any two "adjacent" objects are in the same sequence, which sounds like it requires the language to define, on the fly, an infinite number of implicit array lvalues of unknown bound (which can alias one another arbitrarily). This implies a very strange picture of the object model, especially when [intro.object] is left unchanged.

It implies only the following.

A top-level object (one which is not explicitly a subobject) lives in a piece of storage. If there is a top-level object of the same type directly adjacent to it in that same storage, then it is legal to access that top-level object via pointer arithmetic from this object.

Or to put it another way, you're not assuming anything is an array object. Pointer arithmetic now works for things that aren't arrays. It's now explicitly for accessing adjacent objects, either sibling top-level objects or sibling array elements.

FrankHB1989

unread,
Sep 16, 2017, 2:13:06 PM9/16/17
to ISO C++ Standard - Discussion


On Saturday, September 16, 2017 at 4:02:56 AM UTC+8, Myriachan wrote:
On Friday, September 15, 2017 at 11:23:37 AM UTC-7, FrankHB1989 wrote:
Is this enough? For example, can "constructed adjacently" be normative in the standard without a definition of the term? How does "contiguous sequence" apply?

I see the point, but there is no formal wording. I suspect bad things can slip in easily until an exhaustive list of the actual changes is checked.

The formal wording is a formality; once we decide what we want, then we can write a proposal with formal wording.


1) Provides genuine benefit to users.

2) Does not make the object model nonsensical.

It clearly provides genuine benefit, since users are already doing it.
It provides compatibility with old code that was written without this issue in mind. That may or may not be a benefit. Also note that verbosity and limitations on operations do not matter much, as in the case where `observer_ptr` is considered superior to raw pointers.


"Old code which was written without mind of this issue" - you mean, every current implementation of std::vector out there?  I don't know that it's possible to implement std::vector entirely correctly without violating this rule, due to the presence of the reserve() function.  The current state of things is that std::vector is necessarily a magic class.
Probably yes. But the case is different: since it is a part of the implementation, vendors can provide additional guarantees to avoid portability problems in user code. Ask them if you are afraid of bugs here and you can't provide these guarantees by yourself.
I don't think making std::vector rely on such magic was a deliberate design, but implementations of every standard library already have the right to rely on the magic, since no rule says you can avoid that. If you don't like it, propose new rules in [library] to require that the library components, except [support.general] and implementation-defined ones, always be implementable in portable C++; or move them out of the standard entirely. Anyway, this should not be applicable only to std::vector. (Though this is more likely another topic.)

 
And I don't see how it damages the object model. Each object's lifetime is still clear, and we already have the concept of "nested within" to allow objects to be dynamically created within other objects' storage. All we're doing is allowing you to effectively dynamically create arrays of objects.
This effectively changes the notion of validity of pointer values. Consider comparing two iterator values from different vector objects of the same type. You can't assume the result is meaningful, because they can be from unrelated sequences. In this case the sequences assumed are array objects. The change claims that any two "adjacent" objects are in the same sequence, which sounds like it requires the language to define, on the fly, an infinite number of implicit array lvalues of unknown bound (which can alias one another arbitrarily). This implies a very strange picture of the object model, especially when [intro.object] is left unchanged.


This is why one of the rules I propose is that you can only do this for pointers that are part of the same block of storage.  Two std::vectors would allocate memory separately from their allocator, meaning that even if the two arrays end up adjacent, the pointer arithmetic would still be undefined behavior.  Similarly for two automatic arrays of the same type, even if the compiler happens to put them adjacent in stack memory, because such allocation is considered to be separate storage.

Well, I see - you actually have to make it bypass the object model (otherwise there would be a chicken-and-egg problem in defining "same block of storage" without changing [intro.object]) for limited cases. This might be technically doable, but as a user I am not comfortable relying on these rules to reason about a program in terms of "object pointers". It seems just like a hack. I hope there could be a more sane address-space model to resolve the problem, if possible.


Melissa

FrankHB1989

unread,
Sep 16, 2017, 2:48:11 PM9/16/17
to ISO C++ Standard - Discussion


On Saturday, September 16, 2017 at 5:15:31 AM UTC+8, Nicol Bolas wrote:
I think I see your point now. That's why I illustrated the limitations on iterators: just like an iterator, a valid non-null object pointer is conceptually always bound to a sequence, i.e. an array object. That is the basis on which the semantics of pointer arithmetic operations, or random-access iteration operations, are built. The proposed change instead rebuilds those semantics on layout properties incidentally provided by the underlying memory model rather than by the object model. This severely undermines the ability to reason about well-behaved pointer arithmetic, because it then needs more information about siblings, which often can't be collected from a single context. I don't see it as a desirable abstraction even if the wording can be patched for limited cases.

 

Nicol Bolas

unread,
Sep 16, 2017, 8:35:24 PM9/16/17
to ISO C++ Standard - Discussion
Any function which takes a pointer as a parameter lacks knowledge that the pointer is "bound to a sequence, i.e. an array object". So it is already the case that reasoning about pointer arithmetic is impaired.

Under the current wording, in order to dynamically know whether pointer arithmetic is legitimate, you have to be able to look at the pointer and know that it points into an array object of `T`s. Under the new wording, in order to dynamically know whether pointer arithmetic is legitimate, you have to be able to look at the pointer, and then look in the direction of the arithmetic to see if there are more `T`s, and that they all share the same storage and are not themselves subobjects.

I submit that if you can reason about the former, then you have all of the information needed to reason about the latter. In both cases, you have to be able to walk through memory and determine what is actually in that region of storage pointed to by a pointer. You have to be able to turn an address into the nested sequence of objects that are pointed to by that address.
 
I don't see it as a desirable abstraction even if the wording can be patched for limited cases.

 So, being able to implement `std::vector` in a platform-neutral fashion is not a desired feature of the language?

FrankHB1989

unread,
Sep 17, 2017, 2:13:09 AM9/17/17
to ISO C++ Standard - Discussion


On Sunday, September 17, 2017 at 8:35:24 AM UTC+8, Nicol Bolas wrote:
On Saturday, September 16, 2017 at 2:48:11 PM UTC-4, FrankHB1989 wrote:
On Saturday, September 16, 2017 at 5:15:31 AM UTC+8, Nicol Bolas wrote:
On Friday, September 15, 2017 at 2:23:37 PM UTC-4, FrankHB1989 wrote:
On Friday, September 15, 2017 at 11:12:17 PM UTC+8, Nicol Bolas wrote:
And I don't see how it damages the object model. Each object's lifetime is still clear, and we already have the concept of "nested within" to allow objects to be dynamically created within other objects' storage. All we're doing is allowing you to effectively dynamically create arrays of objects.
This effectively changes the notion of validity of pointer values. Consider comparing two iterator values from different vector objects of the same type.

This is already explicitly undefined behavior. Iterators from different containers cannot be compared to one another. However, that's irrelevant because:
 
You can't assume the result is meaningful, because they can be from unrelated sequences.

Allow me to quote myself again:

> two non-subobjects of the same dynamic type, constructed adjacently in the same storage, can have pointer arithmetic used on them as though they were in an array

So unless those two containers got their allocations from the same storage, that can't happen.

In this case the sequences assumed are array objects. The change claims that any two "adjacent" objects are in the same sequence, which sounds like it requires the language to define, on the fly, an infinite number of implicit array lvalues of unknown bound (which can alias one another arbitrarily). This implies a very strange picture of the object model, especially when [intro.object] is left unchanged.

It implies only the following.

A top-level object (one which is not explicitly a subobject) lives in a piece of storage. If there is a top-level object of the same type directly adjacent to it in that same storage, then it is legal to access that top-level object via pointer arithmetic from this object.

Or to put it another way, you're not assuming anything is an array object. Pointer arithmetic now works for things that aren't arrays. It's now explicitly for accessing adjacent objects, either sibling top-level objects or sibling array elements.
I think I see your point now. That's why I illustrated the limitations on iterators: just like an iterator, a valid non-null object pointer is conceptually always bound to a sequence, i.e. an array object. That is the basis on which the semantics of pointer arithmetic operations, or random-access iteration operations, are built. The proposed change instead rebuilds those semantics on layout properties incidentally provided by the underlying memory model rather than by the object model. This severely undermines the ability to reason about well-behaved pointer arithmetic, because it then needs more information about siblings, which often can't be collected from a single context.

Any function which takes a pointer as a parameter lacks knowledge that the pointer is "bound to a sequence, i.e. an array object". So it is already the case that reasoning about pointer arithmetic is impaired.

This is caused by the design of the type system.

In general, the C++ type system is not powerful enough to encode such information in type signatures. Some form of gradual typing would be needed. Though there would be more problems if you did fix it, e.g. ABI.

However, that does not imply that the knowledge cannot be reasoned about by means other than typechecking.

Under the current wording, in order to dynamically know whether pointer arithmetic is legitimate, you have to be able to look at the pointer and know that it points into an array object of `T`s. Under the new wording, in order to dynamically know whether pointer arithmetic is legitimate, you have to be able to look at the pointer, and then look in the direction of the arithmetic to see if there are more `T`s, and that they all share the same storage and are not themselves subobjects.

I submit that if you can reason about the former, then you have all of the information needed to reason about the latter. In both cases, you have to be able to walk through memory and determine what is actually in that region of storage pointed to by a pointer. You have to be able to turn an address into the nested sequence of objects that are pointed to by that address.
 
Really? The former does not depend on the memory model directly. It needs only the length of the array and a pointer value to the nth element, not the complete element type, nor the complete array type. The size of the element is of no interest here, nor is the layout of elements in the array. On the contrary, the latter is not possible if you don't know the layout of members of the complete type it resides in, and the only way to determine all that information is to calculate the layout from the complete type definitions and the memory model, or to simulate the actual allocations.

I don't see it as a desirable abstraction even if the wording can be patched for limited cases.

 So, being able to implement `std::vector` in a platform-neutral fashion is not a desired feature of the language?

No. To be accurate, I agree that allowing the implementation of `std::vector` to be portable (in the sense of conforming C++ programs) would be a convenience.

However:

1. Technically, it is a convenience, not a must, even if lacking the convenience easily causes the bad things we can see, politically.

2. It has pros and cons. Not all cases need such convenience, and it can harm users who do not rely on it in practice - for instance, through insufficient vendor-specific optimization. Users can hardly rescue such a case unless they become vendors themselves.

3. As I have said, it should not be applicable only to `std::vector`. It is strange to allow `std::vector` as the only special case. If the ability is important to keep, why should it apply only to `std::vector`?

4. Avoiding the need for magic in order to be conforming is the correct direction. But that does not imply that modifying the core rules is always the desired choice just for that. (Personally I prefer a magic-wrapped-in-another-std-library approach.)



Edward Catmur

unread,
Sep 17, 2017, 6:13:16 AM9/17/17
to std-dis...@isocpp.org
Note that any implementation of std::vector will always require at least as much magic as std::optional, since optional is equivalent to a vector with max_size of 1; an object with const or reference data members can be erased and re-emplaced into the same storage.

So at present there's not much point tweaking rules on pointer arithmetic to allow std::vector to operate without magic, since it will still need magic to deal with destroyed and recreated objects. 
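
A minimal sketch of that case, with std::launder as the current (C++17) escape hatch:

#include <new>

struct C { const int id; };  // const member: storage reuse needs care

void reuse(C* place) {
    place->~C();
    ::new (static_cast<void*>(place)) C{42};
    // Because C has a const member, the old pointer is not guaranteed to
    // refer to the new object; it must be laundered first.
    int fresh = std::launder(place)->id;
    (void)fresh;
}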


Nicol Bolas

unread,
Sep 17, 2017, 9:22:29 AM9/17/17
to ISO C++ Standard - Discussion
On Sunday, September 17, 2017 at 2:13:09 AM UTC-4, FrankHB1989 wrote:
On Sunday, September 17, 2017 at 8:35:24 AM UTC+8, Nicol Bolas wrote:
On Saturday, September 16, 2017 at 2:48:11 PM UTC-4, FrankHB1989 wrote:
On Saturday, September 16, 2017 at 5:15:31 AM UTC+8, Nicol Bolas wrote:
On Friday, September 15, 2017 at 2:23:37 PM UTC-4, FrankHB1989 wrote:
On Friday, September 15, 2017 at 11:12:17 PM UTC+8, Nicol Bolas wrote:
And I don't see how it damages the object model. Each object's lifetime is still clear, and we already have the concept of "nested within" to allow objects to be dynamically created within other objects' storage. All we're doing is allowing you to effectively dynamically create arrays of objects.
This effectively changes the notion of validity of pointer values. Consider comparing two iterator values from different vector objects of the same type.

This is already explicitly undefined behavior. Iterators from different containers cannot be compared to one another. However, that's irrelevant because:
 
You can't assume the result is meaningful, because they can be from unrelated sequences.

Allow me to quote myself again:

> two non-subobjects of the same dynamic type, constructed adjacently in the same storage, can have pointer arithmetic used on them as though they were in an array

So unless those two containers got their allocations from the same storage, that can't happen.

In this case the sequences assumed are array objects. The change claims that any two "adjacent" objects are in the same sequence, which sounds like it requires the language to define, on the fly, an infinite number of implicit array lvalues of unknown bound (which can alias one another arbitrarily). This implies a very strange picture of the object model, especially when [intro.object] is left unchanged.

It implies only the following.

A top-level object (one which is not explicitly a subobject) lives in a piece of storage. If there is a top-level object of the same type directly adjacent to it in that same storage, then it is legal to access that top-level object via pointer arithmetic from this object.

Or to put it another way, you're not assuming anything is an array object. Pointer arithmetic now works for things that aren't arrays. It's now explicitly for accessing adjacent objects, either sibling top-level objects or sibling array elements.
I think I see your point now. That's why I illustrated the limitations on iterators: just like an iterator, a valid non-null object pointer is conceptually always bound to a sequence, i.e. an array object. That is the basis on which the semantics of pointer arithmetic operations, or random-access iteration operations, are built. The proposed change instead rebuilds those semantics on layout properties incidentally provided by the underlying memory model rather than by the object model. This severely undermines the ability to reason about well-behaved pointer arithmetic, because it then needs more information about siblings, which often can't be collected from a single context.

Any function which takes a pointer as a parameter lacks knowledge that the pointer is "bound to a sequence, i.e. an array object". So it is already the case that reasoning about pointer arithmetic is impaired.

This is caused by the design of the type system.

The cause is irrelevant; the fact is that it's the way the system works. You can pass pointers to functions, and those functions can do pointer arithmetic on them if those pointers happen to point to the right thing.

Working around semantic limitations is not a fix for it.

In general, the C++ type system is not powerful enough to encode such information in type signatures. Some form of gradual typing would be needed. Though there would be more problems if you did fix it, e.g. ABI.

However, that does not imply that the knowledge cannot be reasoned about by means other than typechecking.

It doesn't "imply" anything; it outright states it. A `T*` is a pointer; that's a type. There is no type-based way to distinguish a pointer to an array element from a pointer to something that isn't an array element. Therefore, typechecking alone cannot be used to determine if pointer arithmetic is valid.

That is the way C++ works today. Given that fact, there is no reason why we can't extend the set of conditions in which pointer arithmetic works.

Under the current wording, in order to dynamically know whether pointer arithmetic is legitimate, you have to be able to look at the pointer and know that it points into an array object of `T`s. Under the new wording, in order to dynamically know whether pointer arithmetic is legitimate, you have to be able to look at the pointer, and then look in the direction of the arithmetic to see if there are more `T`s, and that they all share the same storage and are not themselves subobjects.

I submit that if you can reason about the former, then you have all of the information needed to reason about the latter. In both cases, you have to be able to walk through memory and determine what is actually in that region of storage pointed to by a pointer. You have to be able to turn an address into the nested sequence of objects that are pointed to by that address.
 
Really? The former does not depend on the memory model directly. It needs only the length of the array and a pointer value to the nth element, not the complete element type, nor the complete array type. The size of the element is of no interest here, nor is the layout of elements in the array.

You forget what we're doing here.

We're trying to reason about the object-model implications of a particular piece of code. Specifically, we have a function that takes a pointer. That function is performing pointer arithmetic on that pointer. We're trying to see what it would take for the object model to be able to verify, at runtime, whether that pointer arithmetic would result in well-defined behavior. What must the object model be able to determine from that pointer?

Well, in order to get to "runtime", we must first get past compile-time. And it is ill-formed for code to perform pointer arithmetic on `T*` if `T` is incomplete. Therefore, we must already know the "size of element".

Furthermore, you do need to know the "layout of elements in the array", because you need to make sure that the type `T` being pointed to is the same type as the array element type. After all, the `T*` could be pointing to the first subobject of an array element of type `U`. Since a `U*` will have the same address as a `T*` to its first subobject, you need to be able to differentiate between these cases.

Knowing that a `T*` happens to point into an array isn't enough; it must point into an array of `T`. It must be a direct subobject of an array of `T`.
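
To illustrate the ambiguity with a minimal sketch:

struct U { int first; int second; };

int arr[2] = {};
U   u{};

int* p1 = &arr[0];   // points into an array of int: p1 + 1 is the next element
int* p2 = &u.first;  // a same-looking int*, but it points at the first
                     // subobject of a U; p2 + 1 may not be used to reach
                     // u.second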

So given a `T*`, we have to be able to ask (at least) if it is a subobject and, if it is, if that containing object is an array of `T`.

And if you have the length of the array, and the array element type, then by definition you know the "complete array type".

The only way to determine the well-behaved status of this is to have data structures in memory that can take a typed pointer and get the containing object. And if you can do this once, you can do this repeatedly until there is no containing object.

The only difference between the "pointer arithmetic on arrays" and "pointer arithmetic on top-level sequences" is that the former can have "containing object" be based only on static subobject definitions, while the latter requires "containing object" to handle dynamic "nested-within" objects. But really, if you have a memory system that can compute the former, there's no reason you can't extend it to compute the latter.

On the contrary, the latter is not possible if you don't know the layout of members of the complete type it resides in, and the only way to determine all that information is to calculate the layout from the complete type definitions and the memory model, or to simulate the actual allocations.

I don't see it as a desirable abstraction even if the wording can be patched for limited cases.

 So, being able to implement `std::vector` in a platform-neutral fashion is not a desired feature of the language?

No. To be accurate, I agree that allowing the implementation of `std::vector` to be portable (in the sense of conforming C++ programs) would be a convenience.

However:

1. Technically, it is a convenience, not a must, even if lacking the convenience easily causes the bad things we can see, politically.

... huh?

2. It has pros and cons. Not all cases need such convenience, and it can harm users who do not rely on it in practice - for instance, through insufficient vendor-specific optimization. Users can hardly rescue such a case unless they become vendors themselves.

3. As I have said, it should not be applicable only to `std::vector`. It is strange to allow `std::vector` as the only special case. If the ability is important to keep, why should it apply only to `std::vector`?

I don't know what you're talking about. The feature we're discussing here would not be limited to `vector`. It's simply the most obvious use case for it and the biggest justification for having it.

FrankHB1989

unread,
Sep 17, 2017, 1:25:13 PM9/17/17
to ISO C++ Standard - Discussion


On Sunday, September 17, 2017 at 9:22:29 PM UTC+8, Nicol Bolas wrote:
On Sunday, September 17, 2017 at 2:13:09 AM UTC-4, FrankHB1989 wrote:
On Sunday, September 17, 2017 at 8:35:24 AM UTC+8, Nicol Bolas wrote:
On Saturday, September 16, 2017 at 2:48:11 PM UTC-4, FrankHB1989 wrote:
On Saturday, September 16, 2017 at 5:15:31 AM UTC+8, Nicol Bolas wrote:
On Friday, September 15, 2017 at 2:23:37 PM UTC-4, FrankHB1989 wrote:
On Friday, September 15, 2017 at 11:12:17 PM UTC+8, Nicol Bolas wrote:
And I don't see how it damages the object model. Each object's lifetime is still clear, and we already have the concept of "nested within" to allow objects to be dynamically created within other objects' storage. All we're doing is allowing you to effectively dynamically create arrays of objects.
This effectively changes the notion of validity of pointer values. Consider comparing two iterator values from different vector objects of the same type.

This is already explicitly undefined behavior. Iterators from different containers cannot be compared to one another. However, that's irrelevant because:
 
You can't assume the result is meaningful, because they can be from unrelated sequences.

Allow me to quote myself again:

> two non-subobjects of the same dynamic type, constructed adjacently in the same storage, can have pointer arithmetic used on them as though they were in an array

So unless those two containers got their allocations from the same storage, that can't happen.

In this case the sequences assumed are array objects. The change claims that any two "adjacent" objects are in the same sequence, which sounds like it requires the language to define, on the fly, an infinite number of implicit array lvalues of unknown bound (which can alias one another arbitrarily). This implies a very strange picture of the object model, especially when [intro.object] is left unchanged.

It implies only the following.

A top-level object (one which is not explicitly a subobject) lives in a piece of storage. If there is a top-level object of the same type directly adjacent to it in that same storage, then it is legal to access that top-level object via pointer arithmetic from this object.

Or to put it another way, you're not assuming anything is an array object. Pointer arithmetic now works for things that aren't arrays. It's now explicitly for accessing adjacent objects, either sibling top-level objects or sibling array elements.
I think I see your point now. That's why I illustrated the limitations on iterators: just like an iterator, a valid non-null object pointer is conceptually always bound to a sequence, i.e. an array object. That is the basis on which the semantics of pointer arithmetic operations, or random-access iteration operations, are built. The proposed change instead rebuilds those semantics on layout properties incidentally provided by the underlying memory model rather than by the object model. This severely undermines the ability to reason about well-behaved pointer arithmetic, because it then needs more information about siblings, which often can't be collected from a single context.

Any function which takes a pointer as a parameter lacks knowledge that the pointer is "bound to a sequence, i.e. an array object". So it is already the case that reasoning about pointer arithmetic is impaired.

This is caused by the design of the type system.

The cause is irrelevant; the fact is that it's the way the system works. You can pass pointers to functions, and those functions can do pointer arithmetic on them if those pointers happen to point to the right thing.

Not necessarily. This is how C-style pointers work with typechecking. A type like ptr<T> could also work like that. But C++ has no limitation forcing every parametric type to work only in this exact style.
 
Working around semantic limitations is not a fix for it.

In general, the C++ type system is not powerful enough to encode such information in type signatures. Some form of gradual typing would be needed. Though there would be more problems if you did fix it, e.g. ABI.

However, that does not imply that the knowledge cannot be reasoned about by means other than typechecking.

It doesn't "imply" anything; it outright states it. A `T*` is a pointer; that's a type. There is no type-based way to distinguish a pointer to an array element from a pointer to something that isn't an array element. Therefore, typechecking alone cannot be used to determine if pointer arithmetic is valid.

That is the way C++ works today. Given that fact, there is no reason why we can't extend the set of conditions in which pointer arithmetic works.

There is also no sufficient reason why it has to be extended in this way.

Under the current wording, in order to dynamically know whether pointer arithmetic is legitimate, you have to be able to look at the pointer and know that it points into an array object of `T`s. Under the new wording, in order to dynamically know whether pointer arithmetic is legitimate, you have to be able to look at the pointer, and then look in the direction of the arithmetic to see if there are more `T`s, and that they all share the same storage and are not themselves subobjects.

I submit that if you can reason about the former, then you have all of the information needed to reason about the latter. In both cases, you have to be able to walk through memory and determine what is actually in that region of storage pointed to by a pointer. You have to be able to turn an address into the nested sequence of objects that are pointed to by that address.
 
Really? The former does not depend on the memory model directly. It needs only the length of the array and a pointer value to the nth element, not the complete element type, nor the complete array type. The size of the element is of no interest here, nor is the layout of elements in the array.

You forget what we're doing here.

We're trying to reason about the object-model implications of a particular piece of code. Specifically, we have a function that takes a pointer. That function is performing pointer arithmetic on that pointer. We're trying to see what it would take for the object model to be able to verify, at runtime, whether that pointer arithmetic would result in well-defined behavior. What must the object model be able to determine from that pointer?

Well, in order to get to "runtime", we must first get past compile-time. And it is ill-formed for code to perform pointer arithmetic on `T*` if `T` is incomplete. Therefore, we must already know the "size of element".

Reasoning about a program does not mean figuring out all the information that can be determined from the program's semantics. The fact that a conforming implementation must know the exact size of the type in a well-formed program does not mean that the person reasoning about the program must. If the array in the program compiled, the element type does have a positive size, but the exact value is not necessarily of interest when reasoning. For example, it is just a common divisor in bounds checking based on address calculation; so why not use the operand of the pointer arithmetic directly, instead of addresses?

Furthermore, you do need to know the "layout of elements in the array", because you need to make sure that the type `T` being pointed to is the same type as the array element type. After all, the `T*` could be pointing to the first subobject of an array element of type `U`. Since a `U*` will have the same address as a `T*` to its first subobject, you need to be able to differentiate between these cases.

Knowing that a `T*` happens to point into an array isn't enough; it must point into an array of `T`. It must be a direct subobject of an array of `T`.

So given a `T*`, we have to be able to ask (at least) if it is a subobject and, if it is, if that containing object is an array of `T`.

Verifying correctness with type-based aliasing is a different scenario which requires other rules. And if the purpose is to figure out all the possible undefined behavior under the rules derived from the object model (at least, as per [basic.life]), the information provided here is still far from enough.

And if you have the length of the array, and the array element type, then by definition you know the "complete array type".

The only way to determine the well-behaved status of this is to have data structures in memory that can take a typed pointer and get the containing object. And if you can do this once, you can do this repeatedly until there is no containing object.

The only difference between the "pointer arithmetic on arrays" and "pointer arithmetic on top-level sequences" is that the former can have "containing object" be based only on static subobject definitions, while the latter requires "containing object" to handle dynamic "nested-within" objects. But really, if you have a memory system that can compute the former, there's no reason you can't extend it to compute the latter.

Simplicity. Locality.

Probably more importantly, computational complexity matters a lot when the checking is done with my brain during coding.

On the contrary, the latter is not possible if you don't know the layout of members of the complete type it resides in, and the only way to determine all that information is to calculate the layout from the complete type definitions and the memory model, or to simulate the actual allocations.

I don't see it as a desirable abstraction even if the wording can be patched for limited cases.

 So, being able to implement `std::vector` in a platform-neutral fashion is not a desired feature of the language?

No. To be accurate, I agree that allowing the implementation of `std::vector` to be portable (in the sense of conforming C++ programs) would be a convenience.

However:

1. Technically, it is a convenience, not a must, even if lacking the convenience easily causes the bad things we can see, politically.

... huh?

I did not say a solution to the CWG issue itself is not needed. But the proposed rules change is not necessarily needed.

See the alternatives proposed by others.

2. It has pros and cons. Not all cases need such convenience, and it can harm users who do not rely on it in practice, for instance through missed vendor optimizations. Users can hardly rescue such cases unless they become vendors themselves.

3. As I have said, it should not be applicable only to `std::vector`. It is strange to allow `std::vector` as the only special case. If the ability is important to keep, why should it be limited to `std::vector` only?

I don't know what you're talking about. The feature we're discussing here would not be limited to `vector`. It's simply the most obvious use case for it and the biggest justification for having it.

This is like an X-Y problem.

If it is really desirable to allow `std::vector` to be implemented in a platform-neutral fashion, why not just guarantee a "magic-free" property for implementations of standard library components in general? Even though `std::vector` is a notable example that frustrates users, it is debatable whether it is special enough to be the one case exempted from the magic.

And if we have generally solved this problem by other means, is the proposed change still needed?

Nicol Bolas

unread,
Sep 17, 2017, 6:52:03 PM9/17/17
to ISO C++ Standard - Discussion
On Sunday, September 17, 2017 at 1:25:13 PM UTC-4, FrankHB1989 wrote:
On Sunday, September 17, 2017 at 9:22:29 PM UTC+8, Nicol Bolas wrote:
On Sunday, September 17, 2017 at 2:13:09 AM UTC-4, FrankHB1989 wrote:
On Sunday, September 17, 2017 at 8:35:24 AM UTC+8, Nicol Bolas wrote:
On Saturday, September 16, 2017 at 2:48:11 PM UTC-4, FrankHB1989 wrote:
Working around semantic limitations is not a fix for them.

In general, the C++ type system is not powerful enough to encode such information in type signatures. Some form of gradual typing would be needed, though there would be more problems if you did fix it, e.g. ABI.

However, that does not imply that the knowledge cannot be reasoned about by means other than typechecking.

It doesn't "imply" anything; it outright states it. A `T*` is a pointer; that's a type. There is no type-based way to distinguish a pointer to an array element from a pointer to something that isn't an array element. Therefore, typechecking alone cannot be used to determine if pointer arithmetic is valid.

That is the way C++ works today. Given that fact, there is no reason why we can't extend the set of conditions in which pointer arithmetic works.

There is also no sufficient reason why it has to be extended in this way.

The reasons have already been explained. Whether you find them "sufficient" is a personal choice.

To me, so long as the object model remains reasonable and coherent, fixing the problem is worth it.
 

Under the current wording, in order to dynamically know whether pointer arithmetic is legitimate, you have to be able to look at the pointer and know that it points into an array object of `T`s. Under the new wording, in order to dynamically know whether pointer arithmetic is legitimate, you have to be able to look at the pointer, and then look in the direction of the arithmetic to see if there are more `T`s, and that they all share the same storage and are not themselves subobjects.

I submit that if you can reason about the former, then you have all of the information needed to reason about the latter. In both cases, you have to be able to walk through memory and determine what is actually in that region of storage pointed to by a pointer. You have to be able to turn an address into the nested sequence of objects that are pointed to by that address.
 
Really? The former does not depend on the memory model directly. It needs only the length of the array and a pointer value to the nth element of the array, not the complete element type, nor the complete array type. The size of the element is of no interest here, nor is the layout of the elements in the array.

You forget what we're doing here.

We're trying to reason about the object model implications of a particular piece of code. Specifically, we have a function that takes a pointer. That function is performing pointer arithmetic on that pointer. We're trying to see what it would take for the object model to be able to verify, at runtime, whether that pointer arithmetic would result in well-defined behavior. What must the object model be able to determine from that pointer?
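
As a concrete stand-in for the kind of function being discussed (the name and body are illustrative, not from the thread):

// The validity of f's pointer arithmetic is invisible inside f itself; it
// depends entirely on what the caller's pointer actually points into.
int f(int* p)
{
    return *(p + 2);  // well-defined only if p points to element i of an
                      // array of int that has at least i + 3 elements
}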

Well, in order to get to "runtime", we must first get past compile-time. And it is ill-formed for code to perform pointer arithmetic on `T*` if `T` is incomplete. Therefore, we must already know the "size of element".

Reasoning about a program does not mean working out every piece of information that the program's semantics determine. The fact that a conforming implementation must know the exact size of the type in a well-formed program does not mean that whoever is reasoning about the program must know it too.

I disagree. You cannot reason about an ill-formed program, since by definition, an ill-formed program is semantic nonsense. If it's ill-formed, it is not a C++ program. And I submit that you cannot use C++ logic to reason about things that aren't C++ programs.
 
If the array in the program compiled, the element type does have a positive size, but the exact value is not necessarily of interest when reasoning. For example, it is just a common divisor in bounds checking based on address calculation; so why not reason directly on the operand of the pointer arithmetic instead of on addresses?

Furthermore, you do need to know the "layout of elements in the array", because you need to make sure that the type `T` the pointer points to is the same as the array element type. After all, the `T*` could be pointing to the first subobject of an array whose element type is `U`. Since a `U*` will have the same address as a `T*` to its first subobject, you need to be able to differentiate between these cases.

Knowing that a `T*` happens to point into an array isn't enough; it must point into an array of `T`. It must be a direct subobject of an array of `T`.

So given a `T*`, we have to be able to ask (at least) if it is a subobject and, if it is, if that containing object is an array of `T`.

Verifying correctness under type-based aliasing is a different matter that requires other rules. And if the purpose is to figure out all the possible undefined behavior under the rules derived from the object model (at least, as per [basic.life]), the information provided here is still far from enough.

My point is that the proposed solution does not render the object model to be incoherent or nonsensical. It is just as reasonable as the old version, requiring the same reasoning tools that the old version required.

It simply has different answers for different situations. In the original version, you say that pointer arithmetic moves through an array. In the new version, you say that pointer arithmetic is moving between sequential top-level objects (which arrays are a subset of). If you find that difficult to reason about... I can't really help that.


And if you have the length of the array, and the array element type, then by definition you know the "complete array type".

The only way to determine the well-behaved status of this is to have data structures in memory that can take a typed pointer and get the containing object. And if you can do this once, you can do this repeatedly until there is no containing object.

The only difference between the "pointer arithmetic on arrays" and "pointer arithmetic on top-level sequences" is that the former can have "containing object" be based only on static subobject definitions, while the latter requires "containing object" to handle dynamic "nested-within" objects. But really, if you have a memory system that can compute the former, there's no reason you can't extend it to compute the latter.

Simplicity.

Necessary functionality trumps simplicity.
 
Locality.

Locality of what? I already demonstrated that the current system has no greater "locality" than this one.
 
Probably more importantly, computational complexity matters a lot when the checking is done in my head while coding.

By contrast, the latter is not possible if you don't know the layout of the members of the complete type the object resides in, and the only way to determine all that information is to compute the layout from the complete type definitions and the memory model, or to simulate the actual allocations.

I don't see it as a desirable abstraction, even if the wording can be patched in limited cases.

 So, being able to implement `std::vector` in a platform-neutral fashion is not a desired feature of the language?

No. To be accurate, I agree that allowing `std::vector` to be implemented portably (in the sense of a conforming C++ program) is a convenience.

However:

1. Technically, it is a convenience, not a must, even if lacking that convenience easily causes the bad outcomes we can see, politically.

... huh?

I did not say that a solution to the CWG issue itself is not needed. But the proposed rules change is not necessarily the needed one.

See the proposed alternatives by others.

The only alternative on this thread which isn't merely an alternate statement or a limited version of what I described is "use a magic function to declare that a region of sequential top-level objects contains an array". Which is an expert-only tool, since non-experts would never even guess that such a thing would be needed.

Dynamic creation of arrays should not be an expert-only thing.
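
For reference, the "magic function" alternative mentioned above might look roughly like this sketch; `declare_array_at` is hypothetical, though C++23 later standardized a function in a loosely similar spirit (`std::start_lifetime_as_array`):

#include <cstddef>

// Hypothetical "magic" function -- this exact API does not exist. It would
// assert to the implementation that the n contiguous objects of type T
// starting at p are to be treated as the elements of an array of T, making
// pointer arithmetic across them well-defined.
template<typename T>
T* declare_array_at(T* p, std::size_t n);

// A container would invoke it once after (re)building its storage:
//   data_ = declare_array_at(data_, size_);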


2. It has pros and cons. Not all cases need such convenience, and it can harm users who do not rely on it in practice, for instance through missed vendor optimizations. Users can hardly rescue such cases unless they become vendors themselves.

3. As I have said, it should not be applicable only to `std::vector`. It is strange to allow `std::vector` as the only special case. If the ability is important to keep, why should it be limited to `std::vector` only?

I don't know what you're talking about. The feature we're discussing here would not be limited to `vector`. It's simply the most obvious use case for it and the biggest justification for having it.

This is like an X-Y problem.

If it is really desirable to allow `std::vector` to be implemented in a platform-neutral fashion, why not just guarantee a "magic-free" property for implementations of standard library components in general? Even though `std::vector` is a notable example that frustrates users, it is debatable whether it is special enough to be the one case exempted from the magic.

What part of "The feature we're discussing here would not be limited to `vector`" eluded you? `vector` already works. The point is not to make `vector` work. It's to allow users to write their own dynamic arrays, which includes the possibility of writing `vector`, but also any other kind of dynamic array type. And to do so without deep knowledge of esoteric C++ functions.
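
To make the contested operation concrete, here is a minimal sketch of such a user-written dynamic array (illustrative only; copy control, allocators, and exception safety omitted). Every element is constructed individually with placement new, so no array object of `T` is ever formally created in the storage:

#include <cstddef>
#include <new>
#include <utility>

// Minimal dynamic array sketch (no allocator, no exception safety, no copy
// control).
template<typename T>
class tiny_vector {
    T* data_ = nullptr;
    std::size_t size_ = 0, cap_ = 0;
public:
    void push_back(const T& v) {
        if (size_ == cap_) {
            std::size_t ncap = cap_ ? 2 * cap_ : 4;
            // Raw storage: no array object of T is ever created here.
            T* ndata = static_cast<T*>(::operator new(ncap * sizeof(T)));
            for (std::size_t i = 0; i != size_; ++i) {
                ::new (static_cast<void*>(ndata + i)) T(std::move(data_[i]));
                data_[i].~T();
            }
            ::operator delete(data_);
            data_ = ndata;
            cap_ = ncap;
        }
        ::new (static_cast<void*>(data_ + size_)) T(v);  // one object at a time
        ++size_;
    }
    // data_ + i is the contested arithmetic: the elements are contiguous,
    // but they were never created as elements of an array object of T.
    T& operator[](std::size_t i) { return data_[i]; }
    ~tiny_vector() {
        for (std::size_t i = 0; i != size_; ++i) data_[i].~T();
        ::operator delete(data_);
    }
};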




FrankHB1989

unread,
Sep 18, 2017, 12:16:11 AM9/18/17
to ISO C++ Standard - Discussion


On Monday, September 18, 2017 at 6:52:03 AM UTC+8, Nicol Bolas wrote:
On Sunday, September 17, 2017 at 1:25:13 PM UTC-4, FrankHB1989 wrote:
On Sunday, September 17, 2017 at 9:22:29 PM UTC+8, Nicol Bolas wrote:
On Sunday, September 17, 2017 at 2:13:09 AM UTC-4, FrankHB1989 wrote:
On Sunday, September 17, 2017 at 8:35:24 AM UTC+8, Nicol Bolas wrote:
On Saturday, September 16, 2017 at 2:48:11 PM UTC-4, FrankHB1989 wrote:
Working around semantic limitations is not a fix for them.

In general, the C++ type system is not powerful enough to encode such information in type signatures. Some form of gradual typing would be needed, though there would be more problems if you did fix it, e.g. ABI.

However, that does not imply that the knowledge cannot be reasoned about by means other than typechecking.

It doesn't "imply" anything; it outright states it. A `T*` is a pointer; that's a type. There is no type-based way to distinguish a pointer to an array element from a pointer to something that isn't an array element. Therefore, typechecking alone cannot be used to determine if pointer arithmetic is valid.

That is the way C++ works today. Given that fact, there is no reason why we can't extend the set of conditions in which pointer arithmetic works.

There is also no sufficient reason why it has to be extended in this way.

The reasons have already been explained. Whether you find them "sufficient" is a personal choice.

To me, so long as the object model remains reasonable and coherent, fixing the problem is worth it.
 
To me, so long as the object model remains reasonable and coherent, keeping it from being displaced by the memory model is worth it.

Under the current wording, in order to dynamically know whether pointer arithmetic is legitimate, you have to be able to look at the pointer and know that it points into an array object of `T`s. Under the new wording, in order to dynamically know whether pointer arithmetic is legitimate, you have to be able to look at the pointer, and then look in the direction of the arithmetic to see if there are more `T`s, and that they all share the same storage and are not themselves subobjects.

I submit that if you can reason about the former, then you have all of the information needed to reason about the latter. In both cases, you have to be able to walk through memory and determine what is actually in that region of storage pointed to by a pointer. You have to be able to turn an address into the nested sequence of objects that are pointed to by that address.
 
Really? The former does not depend on the memory model directly. It needs only the length of the array and a pointer value to the nth element of the array, not the complete element type, nor the complete array type. The size of the element is of no interest here, nor is the layout of the elements in the array.

You forget what we're doing here.

We're trying to reason about the object model implications of a particular piece of code. Specifically, we have a function that takes a pointer. That function is performing pointer arithmetic on that pointer. We're trying to see what it would take for the object model to be able to verify, at runtime, whether that pointer arithmetic would result in well-defined behavior. What must the object model be able to determine from that pointer?

Well, in order to get to "runtime", we must first get past compile-time. And it is ill-formed for code to perform pointer arithmetic on `T*` if `T` is incomplete. Therefore, we must already know the "size of element".

Reasoning about a program does not mean working out every piece of information that the program's semantics determine. The fact that a conforming implementation must know the exact size of the type in a well-formed program does not mean that whoever is reasoning about the program must know it too.

I disagree. You cannot reason about an ill-formed program, since by definition, an ill-formed program is semantic nonsense. If it's ill-formed, it is not a C++ program. And I submit that you cannot use C++ logic to reason about things that aren't C++ programs.
 
For the reasoning to make sense, the program being reasoned about is assumed to be well-formed. That fact is not necessarily rechecked during the reasoning; generally, it has to be verified separately if it is not given as a premise.

If the array in the program compiled, the element type does have a positive size, but the exact value is not necessarily of interest when reasoning. For example, it is just a common divisor in bounds checking based on address calculation; so why not reason directly on the operand of the pointer arithmetic instead of on addresses?

Furthermore, you do need to know the "layout of elements in the array", because you need to make sure that the type `T` the pointer points to is the same as the array element type. After all, the `T*` could be pointing to the first subobject of an array whose element type is `U`. Since a `U*` will have the same address as a `T*` to its first subobject, you need to be able to differentiate between these cases.

Knowing that a `T*` happens to point into an array isn't enough; it must point into an array of `T`. It must be a direct subobject of an array of `T`.

So given a `T*`, we have to be able to ask (at least) if it is a subobject and, if it is, if that containing object is an array of `T`.

Verifying correctness under type-based aliasing is a different matter that requires other rules. And if the purpose is to figure out all the possible undefined behavior under the rules derived from the object model (at least, as per [basic.life]), the information provided here is still far from enough.

My point is that the proposed solution does not render the object model to be incoherent or nonsensical. It is just as reasonable as the old version, requiring the same reasoning tools that the old version required.

The C++ object model is largely decoupled from the memory model. It is apparently deliberately designed to be address-space-agnostic, as well as insulated from assumptions about object layout between arbitrary objects. Enforcing stronger assumptions on the object model, with rules requiring properties that belong more to the underlying model, does not look like an effort to make it coherent and sensible.

It simply has different answers for different situations. In the original version, you say that pointer arithmetic moves through an array. In the new version, you say that pointer arithmetic is moving between sequential top-level objects (which arrays are a subset of). If you find that difficult to reason about... I can't really help that.

The difference is what I meant by "locality": the property of being able to determine specific properties in the context where the pointer value exists, without non-"local" knowledge. Specifically, that context includes the pointer value and the identity of the sequence it belongs to (a reference to, and the length of, the array object). Originally, the pointer is bound to the sequence directly, so one can simply assume there is always an array that directly contains it. As for pointer arithmetic, the resulting valid non-null pointer values always lie in the transitive closure of a quite limited, finite set of well-defined operations that can be resolved against that array. By contrast, the new version makes it impossible to reason without simulating object layout, which needs additional information that is not "local" at all.

Practically, there is one more difference. The length of the sequence is often passed into the context where the pointer value exists, which reflects a quite common API style. In such a case, the bound on the sequence can be tightened by a separate length provided by the user, and then only a base pointer of the range is needed to form the "local" knowledge. (Note the sequence can also be a subrange of the array.) This is exactly the checking logic for several operations of a random access iterator, and it can easily be reused here, as in the sketch below. Whether the range is based on a pointer value is an implementation detail. The new version cannot work like this, because there is no general portable way to iterate over a sequence consisting of offsets of arbitrary subobjects within a given top-level object.
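
A minimal sketch of that reuse, with made-up names: the base pointer plus a user-supplied length is all the "local" knowledge the check needs:

#include <cassert>
#include <cstddef>

// Hypothetical checked range: a base pointer plus a user-supplied length.
template<typename T>
struct checked_range {
    T* base;
    std::size_t len;

    // The same check a debug random access iterator performs: only the
    // offset operands are consulted, never the surrounding object layout.
    T* advance(T* p, std::ptrdiff_t k) const {
        std::ptrdiff_t i = p - base;
        assert(0 <= i + k && i + k <= static_cast<std::ptrdiff_t>(len));
        return p + k;
    }
};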


And if you have the length of the array, and the array element type, then by definition you know the "complete array type".

The only way to determine the well-behaved status of this is to have data structures in memory that can take a typed pointer and get the containing object. And if you can do this once, you can do this repeatedly until there is no containing object.

The only difference between the "pointer arithmetic on arrays" and "pointer arithmetic on top-level sequences" is that the former can have "containing object" be based only on static subobject definitions, while the latter requires "containing object" to handle dynamic "nested-within" objects. But really, if you have a memory system that can compute the former, there's no reason you can't extend it to compute the latter.

Simplicity.

Necessary functionality trumps simplicity.
Well... less is more :-)
To treat a workaround as necessary functionality is, more or less... a personal choice.

 
Locality.

Locality of what? I already demonstrated that the current system has no greater "locality" than this one.
 
See above.

Probably more importantly, computational complexity matters a lot when the checking is done in my head while coding.

By contrast, the latter is not possible if you don't know the layout of the members of the complete type the object resides in, and the only way to determine all that information is to compute the layout from the complete type definitions and the memory model, or to simulate the actual allocations.

I don't see it as a desirable abstraction, even if the wording can be patched in limited cases.

 So, being able to implement `std::vector` in a platform-neutral fashion is not a desired feature of the language?

No. To be accurate, I agree that allowing `std::vector` to be implemented portably (in the sense of a conforming C++ program) is a convenience.

However:

1. Technically, it is a convenience, not a must, even if lacking that convenience easily causes the bad outcomes we can see, politically.

... huh?

I did not say that a solution to the CWG issue itself is not needed. But the proposed rules change is not necessarily the needed one.

See the proposed alternatives by others.

The only alternative on this thread which isn't merely an alternate statement or a limited version of what I described is "use a magic function to declare that a region of sequential top-level objects contains an array". Which is an expert-only tool, since non-experts would never even guess that such a thing would be needed.

The magic should rarely be used. It should not get in the way of writing everyday code.

Note that nowadays writing a properly optimized vector is already exactly that kind of expert-level work. (Exception guarantees? Allocators? Expansion policy?)

Even if there are no magic functions, the proposed rules are still like magic to non-experts, just implicit and more likely to be overlooked by average users.

Moreover, non-experts are likely to ignore object-model-related UB entirely. (How many users have learned about strict aliasing?) They take the right to break type and/or memory safety for granted, without attending to the extra responsibility it requires. I doubt loosening the rules would make the situation worse.
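
For instance, the classic strict-aliasing violation, which a lot of real code commits without a second thought:

float bits_to_float(unsigned u)
{
    return *reinterpret_cast<float*>(&u);  // UB: reads an unsigned object
                                           // through an lvalue of type float
}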

Dynamic creation of arrays should not be an expert-only thing.


2. It has pros and cons. Not all cases need such convenience, and it can harm users who do not rely on it in practice, for instance through missed vendor optimizations. Users can hardly rescue such cases unless they become vendors themselves.

3. As I have said, it should not be applicable only to `std::vector`. It is strange to allow `std::vector` as the only special case. If the ability is important to keep, why should it be limited to `std::vector` only?

I don't know what you're talking about. The feature we're discussing here would not be limited to `vector`. It's simply the most obvious use case for it and the biggest justification for having it.

This is like an X-Y problem.

If it is really desirable to allow `std::vector` to be implemented in a platform-neutral fashion, why not just guarantee a "magic-free" property for implementations of standard library components in general? Even though `std::vector` is a notable example that frustrates users, it is debatable whether it is special enough to be the one case exempted from the magic.

What part of "The feature we're discussing here would not be limited to `vector`" eluded you? `vector` already works. The point is not to make `vector` work. It's to allow users to write their own dynamic arrays, which includes the possibility of writing `vector`, but also any other kind of dynamic array type.
I was certainly talking about users implementing a `std::vector`-like interface. My point is not just about `std::vector`; other `std` components could have the opportunity to share the same kind of guarantees. That is a broader problem, though.

And to do so without deep knowledge of esoteric C++ functions.


It depends. I don't think it needs to be more esoteric than the rules you proposed.

 

