2182. Pointer arithmetic in array-like containers
Section: 5.7 [expr.add] Status: drafting Submitter: Jonathan Wakely Date: 2015-10-20
The current direction for issue 1776 (see paper P0137) calls into question the validity of doing pointer arithmetic to address separately-allocated but contiguous objects in a container like std::vector. A related question is whether there should be some allowance for pointer arithmetic using a pointer to a base class if the derived class is a standard-layout class with no non-static data members. It is possible that std::launder could play a part in the resolution of this issue.
Notes from the February, 2016 meeting:
This issue is expected to be resolved by the resolution of issue 1776. The major problem is when the elements of the vector contain constant or reference members; 3.8 [basic.life] paragraph 7 implies that pointer arithmetic leading to such an object produces undefined behavior, and CWG expects this to continue. Some changes to the interface of std::vector may be required, perhaps using std::launder as part of iterator processing.
It seems incredible that the direction of the Standard would be toward making pointer arithmetic undefined for objects inside an std::vector just because they have a const member or reference member.
class C {
int &member;
void f();
...
};
...
std::vector<C> v;
v.push_back(C());
v.push_back(C());
v.push_back(C());
(&v[0] + 2)->f();
This would seriously be undefined by what was stated above? This is completely ridiculous to me.
First of all, it seems that P0137R1 didn't solve the arithmetic issue, making &v[0] + 2 illegal pointer arithmetic in the first place, because std::vector (most likely) constructed the objects separately. Second, making this undefined just because of a const or reference non-static member would break an unbelievable amount of existing C++ code if this arithmetic were to suddenly require a call to std::launder.
For the first issue, it seems like we should formally define pointer arithmetic as working across adjacent array objects, with individual objects being treated as an array of size 1 for this purpose as usual. This does have implications, though, because it would allow "bad" code such as the following to be well-defined:
struct S {
int a, b, c;
};
...
static_assert(offsetof(S, b) == offsetof(S, a) + sizeof(S::a));
static_assert(offsetof(S, c) == offsetof(S, b) + sizeof(S::b));
S s;
(&s.a)[2] = 2;
assert(s.c == 2);
If we solve the pointer arithmetic with std::vector, it seems as if this code must either be correct or be ill-formed (because the static_asserts fire). I don't think that compiler optimizers would appreciate this very much, though. This seems like the "right" solution to the whole problem to me, but I can see why there would be objections. (I would propose additionally that char * / unsigned char * / std::byte * be allowed to cross among objects even of different types, so long as the arithmetic remains within the bounds of that allocated storage block... but I'm a radical around here.)
For the second issue, this would be a strongly breaking change - at least, a breaking change in the sense that a great deal of existing code relies on this pattern, even if it is undefined behavior for technical reasons. I think a lot of C++ programmers would be unhappy if this suddenly became undefined behavior and compilers started emitting code that generates nasal demons.
An option that would technically work would be to require that std::vector allocate its whole collection of objects as an array, meaning that any push_back would necessarily require moving the entire array. This obviously won't fly for performance reasons.
Thanks, and sorry for the long text,
Melissa
--
---
You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to std-discussion+unsubscribe@isocpp.org.
To post to this group, send email to std-dis...@isocpp.org.
Visit this group at https://groups.google.com/a/isocpp.org/group/std-discussion/.
SG12 is already exploring this direction, but only for adjacent objects whose storage is provided by the same array.
What does "provided by the same array" mean, exactly? Right now, we already have that. The problem is that we don't allow pointer arithmetic to work across adjacent objects of the same type whose storage is provided by the same allocation.
struct S {
int a, b, c;
};
...
static_assert(offsetof(S, b) == offsetof(S, a) + sizeof(S::a));
static_assert(offsetof(S, c) == offsetof(S, b) + sizeof(S::b));
void Function(S &s){
(&s.a)[2] = 2;
assert(s.c == 2);
}
Unless you're saying that `vector` has to allocate memory, then do a placement `new (p) char[]` on the allocation, and only then perform construction of any types in the memory. Or unless you're saying that every allocation of memory, every object, is also an array of bytes in addition to being whatever it currently is.
I think that the permissive route is better overall, but it does impede some optimizations.
Melissa
On 8 September 2017 at 18:05, Nicol Bolas <jmck...@gmail.com> wrote:
On Friday, September 8, 2017 at 8:27:32 PM UTC-4, Richard Smith wrote:
On 8 September 2017 at 14:36, Nicol Bolas <jmck...@gmail.com> wrote:
Part of the relevant changes would be specifying that ::operator new and malloc do in fact create such an array object; this has the nice side-effect of guaranteeing that pointer arithmetic on dynamically allocated storage actually works.
But that also means that pointer arithmetic on non-dynamically allocated storage does not work. Which means `vector` only works if it uses memory allocated by `::operator new` or `malloc`. If you have some static storage, I guess it had better already be a byte array.
And what of `std::aligned_storage/union_t`? Are those now required to be byte arrays?
They should be moved to Annex D.
On 8 September 2017 at 18:38, Nicol Bolas <jmck...@gmail.com> wrote:
On Friday, September 8, 2017 at 8:32:41 PM UTC-4, Richard Smith wrote:
On 8 September 2017 at 17:25, Myriachan <myri...@gmail.com> wrote:
I know you're replying to Richard, but I personally would say that every object of type T ought to be considered an array of characters of size sizeof(T). However, that definition implies that a fix to the std::vector issue would also allow the shenanigans I mentioned above.
There's a delicate balance here. On the one hand, we would like C++ to support the low-level memory operations necessary to implement something like vector, and on the other hand, we would like C++ to support high-level semantics in which abstract reasoning about the behavior of a UB-free program can be performed.
Finding a middle ground is not simple, but in this case one does seem like it might be available (allowing vector but not your adjacent-fields case), and that's the direction that I'm currently pursuing with encouragement from SG12. If that doesn't work out, maybe a blunter instrument will be warranted.
Here's the thing, though. If all allocations (dynamic, automatic, static, whatever) are not byte arrays, then that means you cannot perform byte-pointer arithmetic on them to move from pointer to pointer. Which means `offsetof` is useless. This makes many existing forms of automatic serialization (that is, iterating through the subobjects of a type) unworkable.
But if all objects everywhere really are byte arrays, then the definition of pointer arithmetic you want (based on "provides storage" wording) can just as easily be applied to any contiguous sequence of objects of the same type, so long as they're in the same storage allocation.
Where is the middle ground here? It seems like you either make automatic serialization impossible (ignoring what reflection might bring), or you make jumping between contiguous members well-defined.
The middle ground is that this only applies to a contiguous sequence of *complete* objects.
struct S
{
int x;
float y;
};
void do_something(S &s);
//...
auto ptr_s = new S;
S obj_s;
do_something(*ptr_s);
do_something(obj_s);
//...
void do_something(S &s)
{
auto ptr = &s;
auto member_ptr = reinterpret_cast<float*>(reinterpret_cast<std::byte*>(ptr) + offsetof(S, y));
//Do something with member_ptr.
}

struct S {
int a, b;
};
S d;
S s = { 1, 2 };
std::byte *dest = reinterpret_cast<std::byte *>(&d);
std::byte *src = reinterpret_cast<std::byte *>(&s);
for (std::size_t x = 0; x < sizeof(S); ++x) {
dest[x] = src[x];
}
If we wanted to preserve certain optimizations...
This is undefined behavior, because dest[x] and src[x] resolve to *(dest + x) and *(src + x) respectively, and those pointer additions are undefined behavior, because dest and src do not point to std::byte arrays.
It seems that at the least, we should define all storage to be byte arrays in addition to whatever type is constructed within. There are side effects to that decision, however.
One side effect is that if you have a pointer to a struct object containing a char array, the compiler cannot assume that you never overflow that char array within the object, because the object is within some storage, and that storage is a char array covering at least the whole object.
It's illegal for you to do it manually. But it's not illegal to call `memcpy` or `memmove` to cause it to happen. The standard is just saying that those are the only ways to copy the underlying bytes.
On Monday, September 11, 2017 at 3:57:20 PM UTC-7, Nicol Bolas wrote:
On Monday, September 11, 2017 at 6:22:30 PM UTC-4, Hyman Rosen wrote:
On Mon, Sep 11, 2017 at 6:00 PM, Nicol Bolas <jmck...@gmail.com> wrote:
It's illegal for you to do it manually. But it's not illegal to call `memcpy` or `memmove` to cause it to happen. The standard is just saying that those are the only ways to copy the underlying bytes.
That's false. N4687 [basic.types] talks about copying the bytes. It does not require that only certain functions may do that copying. The footnotes in that section, describing how bytes are copied, say that this can be done using, for example, memcpy or memmove.
My point is that this part of the standard makes it legal to copy the bytes of certain types. But that doesn't make it wrong for other parts of the standard to forbid copying bytes in certain ways. So long as there is still some way to do it, the standard is fine.
That's not to say that I disagree with the idea of all allocations being byte arrays. But it doesn't make the standard inconsistent to have it allow byte copying in one place while forbidding a certain type of byte copying in another.
I suppose that if you consider memcpy and memmove to be magic functions, then you're right.
I feel as though the community/committee needs to decide whether to go down the route of having abstract objects or having a concrete memory model.
I don't believe that such a choice needs to be made. I see no reason why you can't have a concrete memory model and a concrete object model.
The main problem we have is that the concept of indexing memory through bytes has to happen via the object model, requiring an explicit "byte array" object. What we want is for it to be able to happen outside of the object model.
// Assume no padding in this implementation.
struct S { int a, b, c; };
S s{ 0, 0, 0 };
(&s.a)[2] = 4; // writes s.c???

std::vector<int> v;
v.reserve(3);
v.push_back(0);
v.push_back(0);
v.push_back(0);
v.data()[2] = 4;

How do we allow bytewise access and allow std::vector::data() to be used with pointer arithmetic, while simultaneously disallowing garbage like accessing the wrong element of a class?
I agree with most of what you're saying, but not everyone in the community and Committee agrees, so I was trying to find a consensus resolution most are happy with.
On Tue, Sep 12, 2017 at 3:00 PM, Myriachan <myri...@gmail.com> wrote:
How do we allow bytewise access and allow std::vector::data() to be used with pointer arithmetic simultaneously with disallowing garbage like accessing the wrong element of a class?
You don't. Stop trying to disallow things, and stop letting optimization drive language design. It is a reality that objects sit at locations in memory, and that adding an offset to a pointer to one object can make it point to a different object. Instead of trying to find brilliant ways to avoid that reality, acknowledge it, and stop contorting the language so that optimizers can assume that writing through a pointer leaves some object untouched. If the compiler can prove that, fine, but that shouldn't be a part of the language.
Optimization by the assumption that undefined behavior does not happen has been a curse on the language for decades. Failing to specify left-to-right order of evaluation has been a curse on the language for decades. Treating uninitialized variables as untouchable has been a curse on the language for decades.
So, what exactly have you gained over just making the code work?
I'm against breaking the object model just to allow certain C-isms to work. But I don't see how it's breaking the object model to say that two non-subobjects of the same dynamic type, constructed adjacently in the same storage, can have pointer arithmetic used on them as though they were in an array. That seems like a perfectly coherent object model to me.
On Thu, Sep 14, 2017 at 2:03 PM, 'Edward Catmur' via ISO C++ Standard - Discussion <std-dis...@isocpp.org> wrote:
So, if [expr.add]/4 is additionally allowed to work on the ith element of a sequence of n complete objects of the same type
What about code that constructs a sequence of variously typed objects in a buffer (say, for serialization)?
For that matter, the underlying <stdarg> code used to just treat the stack as such a buffer, and would increment a pointer through the parameters, having cast the pointer to the current parameter type.
Pointer addition of T *p and ptrdiff_t offset should just be defined as (T*)((intptr_t)p + offset * sizeof(T)) with wraparound semantics for the arithmetic. Then if the result is equal to some valid pointer, the result is that pointer. Yes, that may let you jump between different objects via arithmetic. So what? The language has offsetof already, so it's not like this is an undesirable result.
I'm against breaking the object model just to allow certain C-isms to work. But I don't see how it's breaking the object model to say that two non-subobjects of the same dynamic type, constructed adjacently in the same storage, can have pointer arithmetic used on them as though they were in an array. That seems like a perfectly coherent object model to me.
Did it ever work in ISO C?
On Friday, September 15, 2017 at 12:11:49 AM UTC-4, FrankHB1989 wrote:
On Thursday, September 14, 2017 at 11:38:30 PM UTC+8, Nicol Bolas wrote:
On Thursday, September 14, 2017 at 5:04:28 AM UTC-4, Edward Catmur wrote:
std::vector doesn't need to be itself magic; it can call a language support facility available to users as well as to implementors. This would be called whenever updating data() or size() to mark the range [data(), data() + size()) as amenable to pointer arithmetic of the pointer type.
This is a really bad way of thinking. Implementing `vector` or `vector`-like constructs should not require such an expert-level understanding of the object model and the use of exceedingly esoteric functions.
Users have good reason to expect that, if you explicitly construct two objects of the same type beside each other in memory, then you can use pointer arithmetic to jump from one to another. Code that does this exists and is extremely prevalent.
You're effectively proposing to tell all of these people that they have to call some function (which, FYI, doesn't actually do anything) in order to make code work. Even though it already works. People will simply not do it, and therefore compiler writers will refuse to optimize for it since it would break the world.
So, what exactly have you gained over just making the code work?
Did it ever work?
... what are you asking here? Are you asking if non-standard-library people have ever written implementations of `vector`? Yes, they have.
The C++ object model has not been defined to allow it. My point is that this is a pattern of usage that our object model ought to allow. We shouldn't have to rely on esoteric functions to be able to implement something like this.
On Thu, Sep 14, 2017 at 5:13 PM, 'Edward Catmur' via ISO C++ Standard - Discussion <std-dis...@isocpp.org> wrote:
va_arg yields a prvalue, so even that doesn't need to perform unsafe pointer arithmetic, whatever other magic it's doing.
It's not unsafe, just forbidden. This is all a weird sort of political correctness. The overwhelming majority of computing takes place on systems that present a single flat address space to programs, and where pointer arithmetic is just integer arithmetic. But we are supposed to pretend that this isn't so, and that there is special fairy dust that gets sprinkled on some additions but not on others. It's even weirder because actual programs do that sort of unblessed pointer arithmetic all the time, and have been doing so forever. Overlaying data representations in order to treat the same segment of memory as objects of different types has been around even longer, in Fortran's EQUIVALENCE and probably earlier, but we can't do that with unions any more, even though we've been doing that with unions forever.
All of this is in misguided service to the optimizationists. The compiler should translate our code as written, without making assumptions that we're not engaging in activities that we are, in fact, engaging in. If it can prove something to itself, that's fine, but otherwise it needs to leave our code alone. The language should do away with the undefined behavior that lets compilers indulge in these shenanigans.
On Thursday, September 14, 2017 at 2:13:25 PM UTC-7, Edward Catmur wrote:
On Thu, Sep 14, 2017 at 9:02 PM, Hyman Rosen <hyman...@gmail.com> wrote:
On Thu, Sep 14, 2017 at 2:03 PM, 'Edward Catmur' via ISO C++ Standard - Discussion <std-dis...@isocpp.org> wrote:
So, if [expr.add]/4 is additionally allowed to work on the ith element of a sequence of n complete objects of the same type
What about code that constructs a sequence of variously typed objects in a buffer (say, for serialization)?
If you want to do that safely, you access the buffer as a byte array and construct the sequence of objects using memcpy or equivalent. You need a pointer to the buffer anyway, so you may as well just use that as the destination for each memcpy and update as you go.
For that matter, the underlying <stdarg> code used to just treat the stack as such a buffer, and would increment a pointer through the parameters, having cast the pointer to the current parameter type.
va_arg yields a prvalue, so even that doesn't need to perform unsafe pointer arithmetic, whatever other magic it's doing.
Pointer addition of T *p and ptrdiff_t offset should just be defined as (T*)((intptr_t)p + offset * sizeof(T)) with wraparound semantics for the arithmetic. Then if the result is equal to some valid pointer, the result is that pointer. Yes, that may let you jump between different objects via arithmetic. So what? The language has offsetof already, so it's not like this is an undesirable result.
offsetof is still useful when following the rules; it allows you to memcpy a data member into or out of a standard-layout object in a type-erased manner, knowing only its offset and size.
Currently, that is usually not true. The below is undefined behavior:
struct S { int a; int b; } s;
int c = 4;
std::memcpy(reinterpret_cast<std::byte *>(&s) + offsetof(S, b), &c, sizeof(int));
The reason that it's undefined behavior is that the reinterpret_cast pointer does not point to an array of std::byte, so pointer arithmetic on it is undefined behavior. This is undefined behavior even before memcpy gets involved.
That's primarily why I created this thread: the status quo is broken. Since programmers do this often, and even more expect pointers to work across adjacent separately constructed objects, it seems that the Standard is broken here, not the programmers.
On Friday, September 15, 2017 at 6:01:39 AM UTC+8, Hyman Rosen wrote:
On Thu, Sep 14, 2017 at 5:13 PM, 'Edward Catmur' via ISO C++ Standard - Discussion <std-dis...@isocpp.org> wrote:
va_arg yields a prvalue, so even that doesn't need to perform unsafe pointer arithmetic, whatever other magic it's doing.
It's not unsafe, just forbidden.
How do you know it is safe when not guaranteed?
This is all a weird sort of political correctness.
Not all. History counts.
The overwhelming majority of computing takes place on systems that present a single flat address space to programs, and where pointer arithmetic is just integer arithmetic.
No. Pointers in C++ are typed, and pointer values are typed. If you want arithmetic on addresses, propose an address type instead, or break the type system as a whole.
But we are supposed to pretend that this isn't so, and that there is special fairy dust that gets sprinkled on some additions but not on others.
No. This is the status quo: unless somebody adds more rules, or limitations, you can't rely on that assumption in the sense of the language.
Once you have done that in the standard, someone can then accuse you of forbidding such formerly conforming implementations. Should all of them go to die? This is also more or less "a weird sort of political correctness".
Of course this is not totally unacceptable, e.g. the removal of trigraphs. But in general, such changes should make the specification simpler and impose fewer limitations on conformance, to overcome the net cost of losing functionality or portability. Assuming a flat address space fails to be evolution in that direction. (This can be a non-issue for a new language. It is never the case in C++.)
It's even weirder because actual programs do that sort of unblessed pointer arithmetic all the time, and have been doing so forever.
No. High-level languages should usually never rely on such low-level operations. They are obliged, from the viewpoint of system design, to provide and use proper high-level abstractions that hide implementation details. If they have to leak the abstraction, they are flawed by design.
People have had to face such unblessed features for historical reasons. That is merely a compromise. Blessing such things back leads to more mess.
On Friday, September 15, 2017 at 12:45:44 PM UTC+8, Nicol Bolas wrote:
The C++ object model has not been defined to allow it. My point is that this is a pattern of usage that our object model ought to allow. We shouldn't have to rely on esoteric functions to be able to implement something like this.
Well, to make the problem clearer, let me borrow some terms from ISO C. Did it work in a strictly conforming way? That is, was it directly portable to any conforming C++ implementation without relying on assumptions that ISO C++ does not provide?
If it did work like that in the sense of C++, why you propose the change? (Otherwise it does not make much sense - things in namespace `std` already have their rights to be implemented by magic, with or without the guarantees provided by ISO C++.)You were arguing you want it to work, with assumptions not provided by the C++ object model or any other rules in current standard. Please define them first, e.g. an alternative object model, to replace the necessity of "exceedingly esoteric functions" you don't want to see. I'm curious whether it would be more complicated.
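To make the questioned pattern concrete, here is a minimal sketch (all names invented for illustration) of a vector-like buffer of the kind both sides are discussing: elements are constructed one at a time with placement new into a single raw allocation, and element access then performs pointer arithmetic across those separately constructed objects. Under the strict reading discussed in this thread, that arithmetic is not blessed, since no array object of int was ever created, yet implementations in practice compile and run it as expected.

```cpp
#include <cstdlib>
#include <new>

// Hypothetical minimal vector-like buffer (not a real std::vector):
// each push_back creates a distinct complete int object via placement
// new; operator[] then does the questioned cross-object arithmetic.
struct IntBuf {
    int* data_ = nullptr;
    std::size_t size_ = 0;

    explicit IntBuf(std::size_t cap)
        : data_(static_cast<int*>(std::malloc(cap * sizeof(int)))) {}
    ~IntBuf() { std::free(data_); }

    void push_back(int v) {
        // Each element is a separate complete object, not an array element.
        ::new (static_cast<void*>(data_ + size_)) int(v);
        ++size_;
    }
    int& operator[](std::size_t i) { return data_[i]; }  // data_ + i: the questioned arithmetic
};

int int_buf_demo() {
    IntBuf b(3);
    b.push_back(10);
    b.push_back(20);
    b.push_back(30);
    return b[0] + b[2];  // relies on arithmetic across separately constructed ints
}
```

In practice this returns 40 on every known implementation; the thread's dispute is whether the standard's object model should guarantee it.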
On Friday, September 15, 2017 at 3:49:01 AM UTC-4, FrankHB1989 wrote:
Well, to make the problem clearer, let me borrow some terms from ISO C. Did it ever work in a strictly conforming way? That is, was it directly portable to every conforming C++ implementation, without any assumptions that ISO C++ does not provide?

The question is irrelevant, since we're talking about whether we should change the standard to make it conforming. That is, we know that the standard doesn't allow it, and we're saying that it should.
If it did work like that in the C++ sense, why do you propose the change? You were arguing that you want it to work, under assumptions not provided by the C++ object model or any other rule in the current standard. Please define those assumptions first. I'm curious whether it would be more complicated.

The changes have already been discussed in this thread; they're in the e-mail chain you replied to. But if you need it repeated, my suggestion was:

> two non-subobjects of the same dynamic type, constructed adjacently in the same storage, can have pointer arithmetic used on them as though they were in an array

It doesn't matter whether it's "more complicated" than the current system. What matters is whether the change:

1) Provides genuine benefit to users.
2) Does not make the object model nonsensical.

It clearly provides genuine benefit, since users are already doing it. So clearly that's something people want to do.

And I don't see how it damages the object model. Each object's lifetime is still clear, and we already have the concept of "nested within" to allow objects to be dynamically created within other objects' storage. All we're doing is allowing you to effectively dynamically create arrays of objects.
Is this enough? For example, may "constructed adjacently" be used normatively in the standard without a definition of the term? How does "contiguous sequence" apply? I see the point, but there is no formal wording. I can't rule out bad things slipping in easily until an exhaustive list of the actual changes is checked.
1) Provides genuine benefit to users.
2) Does not make the object model nonsensical.
It clearly provides genuine benefit, since users are already doing it.

It provides compatibility with old code that was written without this issue in mind. That may or may not be a benefit. Also note that verbosity and limitations on operations do not matter much, as in the case where `observer_ptr` is considered superior to raw pointers.
And I don't see how it damages the object model. Each object's lifetime is still clear, and we already have the concept of "nested within" to allow objects to be dynamically created within other objects' storage. All we're doing is allowing you to effectively dynamically create arrays of objects.

This effectively changes the notion of validity of pointer values. Consider comparing two iterator values from different vector objects of the same type. You can't assume the result is meaningful, because they can come from unrelated sequences. In this case the assumed sequences are array objects. The change claims that any two "adjacent" objects are in the same sequence, which sounds like it requires the language to define, on the fly, an infinite number of implicit array lvalues of unknown bound (which can alias one another arbitrarily). This implies a very strange picture of the object model, especially if [intro.object] is left unchanged.
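The concern about unrelated sequences can be made concrete (a small sketch, not part of the original messages): pointers or iterators into two distinct vectors belong to different sequences, so relational comparison between them with `<` has an unspecified result under [expr.rel]; only pointers into the same array (plus one past the end) order meaningfully.

```cpp
#include <vector>

// Two live vectors have distinct allocations; their element pointers
// come from unrelated sequences.
bool unrelated_sequences_demo() {
    std::vector<int> v1{1, 2, 3};
    std::vector<int> v2{4, 5, 6};
    const int* p1 = v1.data();
    const int* p2 = v2.data();
    // (p1 < p2) is unspecified per [expr.rel]; std::less<const int*>
    // would give a total order, but says nothing about the pointers
    // belonging to the same sequence.
    return p1 != p2;  // distinct live allocations have distinct addresses
}
```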
On Friday, September 15, 2017 at 11:23:37 AM UTC-7, FrankHB1989 wrote:
Is this enough? For example, may "constructed adjacently" be used normatively in the standard without a definition of the term? How does "contiguous sequence" apply? I see the point, but there is no formal wording.

The formal wording is a formality; once we decide what we want, we can write a proposal with formal wording.

It provides compatibility with old code that was written without this issue in mind. That may or may not be a benefit.

"Old code that was written without this issue in mind" — you mean, every current implementation of std::vector out there? I don't know that it's possible to implement std::vector entirely correctly without violating this rule, due to the presence of the reserve() function. The current state of things is that std::vector is necessarily a magic class.
This effectively changes the notion of validity of pointer values. Consider comparing two iterator values from different vector objects of the same type. You can't assume the result is meaningful, because they can come from unrelated sequences.

This is why one of the rules I propose is that you can only do this for pointers that are part of the same block of storage. Two std::vectors would allocate memory separately from their allocator, meaning that even if the two arrays end up adjacent, the pointer arithmetic would still be undefined behavior. Similarly for two automatic arrays of the same type, even if the compiler happens to put them adjacent in stack memory, because such allocations are considered separate storage.
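The "same block of storage" restriction just described can be sketched as follows (an illustration of the stated rule, not proposed wording): the suggested change blesses arithmetic only within one allocation, so two separate complete objects, even if the compiler happens to place them adjacently, remain separate storage, and cross-object arithmetic between them stays undefined (shown only in a comment below).

```cpp
// Arithmetic within one array object is fine; arithmetic from one
// automatic array into a neighboring one is not, even under the proposal.
int separate_storage_demo() {
    int a[2] = {1, 2};
    int b[2] = {3, 4};
    int within = *(a + 1);  // well-defined: stays inside array a
    // Even if &a[2] and &b[0] happened to compare equal in memory,
    // a and b are separate storage, so treating them as one sequence
    // would remain undefined behavior under the proposal too.
    (void)b;
    return within;
}
```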
I don't see it as a desirable abstraction, even if the wording can be patched in limited cases.
On Saturday, September 16, 2017 at 2:48:11 PM UTC-4, FrankHB1989 wrote:
On Saturday, September 16, 2017 at 5:15:31 AM UTC+8, Nicol Bolas wrote:

Consider comparing two iterator values from different vector objects of the same type.

This is already explicitly undefined behavior: iterators from different containers cannot be compared to one another. However, that's irrelevant, because:

You can't assume the result is meaningful, because they can come from unrelated sequences.

Allow me to quote myself again:

> two non-subobjects of the same dynamic type, constructed adjacently in the same storage, can have pointer arithmetic used on them as though they were in an array

So unless those two containers got their allocations from the same storage, that can't happen.

In this case the assumed sequences are array objects. The change claims that any two "adjacent" objects are in the same sequence, which sounds like it requires the language to define, on the fly, an infinite number of implicit array lvalues of unknown bound.

It implies only the following. A top-level object (one which is not a subobject) lives in a piece of storage. If there is a top-level object of the same type directly adjacent to it in that same storage, then it is legal to access that top-level object via pointer arithmetic from this object. Or to put it another way: you're not assuming anything is an array object. Pointer arithmetic now works for things that aren't arrays. It's now explicitly for accessing adjacent objects, whether sibling top-level objects or sibling array elements.

I think I see your point now. That's why I illustrated the limitations on iterators: like an iterator, a valid non-null object pointer is conceptually always bound to a sequence, i.e. an array object. That is the basis on which the semantics of pointer arithmetic operations, or random-access iteration operations, are built. The proposed change instead rebuilds those semantics on layout properties incidentally provided by the underlying memory model rather than by the object model. This severely undermines the ability to reason about well-behaved pointer arithmetic, because it then requires information about siblings that often cannot be collected from a single context.

Any function which takes a pointer as a parameter lacks the knowledge that the pointer is "bound to a sequence, i.e. an array object". So it is already the case that reasoning about pointer arithmetic is impaired.

Under the current wording, in order to know dynamically whether pointer arithmetic is legitimate, you have to be able to look at the pointer and know that it points into an array of `T`s. Under the new wording, you have to be able to look at the pointer, then look in the direction of the arithmetic to see whether there are more `T`s, and whether they all share the same storage and are not themselves subobjects. I submit that if you can reason about the former, then you have all of the information needed to reason about the latter. In both cases, you have to be able to walk through memory and determine what is actually in the region of storage pointed to by the pointer. You have to be able to turn an address into the nested sequence of objects at that address.
I don't see it as a desirable abstraction, even if the wording can be patched in limited cases.

So, being able to implement `std::vector` in a platform-neutral fashion is not a desired feature of the language?
On Sunday, September 17, 2017 at 8:35:24 AM UTC+8, Nicol Bolas wrote:
Any function which takes a pointer as a parameter lacks the knowledge that the pointer is "bound to a sequence, i.e. an array object". So it is already the case that reasoning about pointer arithmetic is impaired.

This is caused by the design of the type system. Working around semantic limitations is not a fix for it. In general, the C++ type system is not powerful enough to encode such information in type signatures; some form of gradual typing would be needed, and fixing that would raise further problems, e.g. for the ABI. However, that does not imply the knowledge cannot be reasoned about by means other than typechecking.
Under the current wording, in order to know dynamically whether pointer arithmetic is legitimate, you have to be able to look at the pointer and know that it points into an array of `T`s. Under the new wording, you have to be able to look at the pointer, then look in the direction of the arithmetic to see whether there are more `T`s sharing the same storage that are not themselves subobjects. I submit that if you can reason about the former, then you have all of the information needed to reason about the latter.

Really? The former does not depend on the memory model directly. It needs only the length of the array and a pointer value to the nth element; not the complete element type, nor the complete array type. The size of an element is of no interest here, nor is the layout of elements in the array.
On the contrary, the latter is not possible if you don't know the layout of the members of the complete type it resides in, and the only ways to determine all that information are to compute the layout from the complete type definitions and the memory model, or to simulate the actual allocations.

I don't see it as a desirable abstraction, even if the wording can be patched in limited cases.

So, being able to implement `std::vector` in a platform-neutral fashion is not a desired feature of the language?

No. To be precise, I agree that allowing `std::vector` to be implemented portably (in the sense of conforming C++ programs) would be a convenience. However:
1. Technically it is a convenience, not a must, even though lacking it can easily cause the visible problems we see, politically.
2. It has pros and cons. Not every case needs this convenience, and in practice it can harm users who do not rely on it, for instance through insufficient vendor-specific optimization. Users can hardly rescue such cases unless they become vendors themselves.
3. As I have said, it should not apply only to `std::vector`. It would be strange to allow `std::vector` as the only special case; if the ability is important to keep, why restrict it to `std::vector`?
On Sunday, September 17, 2017 at 2:13:09 AM UTC-4, FrankHB1989 wrote:
This is caused by the design of the type system.

The cause is irrelevant; the fact is that this is the way the system works. You can pass pointers to functions, and those functions can do pointer arithmetic on them if the pointers happen to point to the right thing.

Working around semantic limitations is not a fix for it. In general, the C++ type system is not powerful enough to encode such information in type signatures. However, that does not imply the knowledge cannot be reasoned about by means other than typechecking.

It doesn't "imply" anything; it outright states it. A `T*` is a pointer; that's a type. There is no type-based way to distinguish a pointer to an array element from a pointer to something that isn't an array element. Therefore, typechecking alone cannot be used to determine whether pointer arithmetic is valid. That is the way C++ works today. Given that fact, there is no reason why we can't extend the set of conditions under which pointer arithmetic works.
Really? The former does not depend on the memory model directly. It needs only the length of the array and a pointer value to the nth element; not the complete element type, nor the complete array type.

You forget what we're doing here. We're trying to reason about the object-model implications of a particular piece of code. Specifically, we have a function that takes a pointer, and that function performs pointer arithmetic on it. We're trying to see what it would take for the object model to verify, at runtime, whether that pointer arithmetic results in well-defined behavior: what the object model must be able to determine from that pointer. Well, in order to get to "runtime", we must first get past compile time. And it is ill-formed to perform pointer arithmetic on a `T*` if `T` is incomplete. Therefore, we must already know the "size of element".
Furthermore, you do need to know the "layout of elements in the array", because you need to make sure that the type `T` the pointer points to is the same as the array element type. After all, the `T*` could be pointing to the first subobject of the array element type `U`. Since a `U*` will have the same address as the `T*` to its first subobject, you need to be able to differentiate between these cases. Knowing that a `T*` happens to point into an array isn't enough; it must point into an array of `T`: it must be a direct subobject of an array of `T`. So given a `T*`, we have to be able to ask (at least) whether it is a subobject and, if it is, whether the containing object is an array of `T`.
And if you have the length of the array and the array element type, then by definition you know the "complete array type". The only way to determine well-behaved status is to have data structures in memory that can take a typed pointer and find the containing object. And if you can do that once, you can do it repeatedly until there is no containing object. The only difference between "pointer arithmetic on arrays" and "pointer arithmetic on top-level sequences" is that the former can base "containing object" only on static subobject definitions, while the latter requires "containing object" to handle dynamically "nested within" objects. But really, if you have a memory system that can compute the former, there's no reason you can't extend it to compute the latter.
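The first-subobject ambiguity described above can be sketched concretely (an illustrative example, not from the original thread): an `int*` with the "right" address is not enough for valid arithmetic; it must point at an element of an array of int. Here `p` aliases the first member of `U`, whose address equals that of the enclosing `U`, yet `(p + 1)` would step by `sizeof(int)`, not `sizeof(U)`, so it is undefined and would land in the wrong place anyway.

```cpp
// U's first member shares its address with U itself.
struct U {
    int first;
    double rest;
};

int subobject_demo() {
    U arr[2] = {{1, 0.0}, {2, 0.0}};
    int* p = &arr[0].first;  // same address as &arr[0], but not an element of an array of int
    (void)p;                 // *(p + 1) would be UB; deliberately not executed
    // The well-defined path to the next 'first' steps through U:
    return (&arr[0] + 1)->first;
}
```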
1. Technically it is a convenience, not a must.

... huh?

3. As I have said, it should not apply only to `std::vector`. It would be strange to allow `std::vector` as the only special case; if the ability is important to keep, why restrict it to `std::vector`?

I don't know what you're talking about. The feature we're discussing here would not be limited to `vector`. It's simply the most obvious use case for it and the biggest justification for having it.
On Sunday, September 17, 2017 at 9:22:29 PM UTC+8, Nicol Bolas wrote:
Typechecking alone cannot be used to determine whether pointer arithmetic is valid. That is the way C++ works today. Given that fact, there is no reason why we can't extend the set of conditions under which pointer arithmetic works.

There is also no sufficient reason why it has to be extended in this way.
Well, in order to get to "runtime", we must first get past compile time. And it is ill-formed to perform pointer arithmetic on a `T*` if `T` is incomplete. Therefore, we must already know the "size of element".

To reason about a program does not mean to figure out all the information that can be determined from the program's semantics. The fact that a conforming implementation must know the exact size of the type in a well-formed program does not mean that whoever is reasoning about the program must know it.
If the array in the program compiled, the element type does have a positive size, but the exact value is not necessarily interested in reasoning. For example, it is just a common divisor in boundary checking based on address calculation; so why not use the operand in pointer arithmetic directly instead of addresses?Furthermore, you do need to know the "layout of elements in the array", because you need to make sure that the type `T` that points to is the same type as the array element type. After all, the `T*` could be pointing to the first subobject of the array element type `U`. Since `U*` will have the same address as the `T*` first subobject, you need to be able to differentiate between these cases.Knowing that a `T*` happens to point into an array isn't enough; it must point into an array of `T`. It must be a direct subobject of an array of `T`.So given a `T*`, we have to be able to ask (at least) if it is a subobject and, if it is, if that containing object is an array of `T`.Verifying correctness with type-based aliasing is a different scene which requires other rules. And if the propose is to figure out all the possible undefined behavior upon the rules derived from the object model (at least, as per [object.life]), the information provided here is still far from enough.
And if you have the length of the array, and the array element type, then by definition you know the "complete array type".The only way to determine the well-behaved status of this is to have data structures in memory that can take a typed pointer and get the containing object. And if you can do this once, you can do this repeatedly until there is no containing object.The only difference between the "pointer arithmetic on arrays" and "pointer arithmetic on top-level sequences" is that the former can have "containing object" be based only on static subobject definitions, while the latter requires "containing object" to handle dynamic "nested-within" objects. But really, if you have a memory system that can compute the former, there's no reason you can't extend it to compute the latter.Simplicity.
Locality.
Probably more importantly, computation complexity matters a lot, when it is checked using my brain during coding.
On the contrary, the latter is not possible if you don't know the layout of members of the complete type it resides in, and the way to determine all the information is to calculate the layout depending on the complete type definitions and the memory model, or, to simulate the actual allocations.I don't see it a desired abstraction even the wording can be patched in limited cases.So, being able to implement `std::vector` in a platform-neutral fashion is not a desired feature of the language?No. To be accurate, I agree that allowing implementation of `std::vector` being portable (in sense of conforming C++ programs) is a convenience.However:1. Technically, it is a convenience, not a must, even lacking of the convenience easily cause bad things we can see, politically.... huh?I did not say solution of the CWG issue itself is not needed. But the proposed rules change is not necessarily needed.See the proposed alternatives by others.
2. It has pros and cons. Not all cases need such convenience and it can harm users not relying on it in reality. For instance, insufficient vendor-based optimization. Users can hardly rescue the case unless they turn themselves to be vendors.3. As I have said, it should not be only applicable on `std::vector`. It is strange to allow `std::vector` as the only special case. If the ability is important to keep, why not only `std::vector`?I don't know what you're talking about. The feature we're discussing here would not be limited to `vector`. It's simply the most obvious use case for it and the biggest justification for having it.This is like X-Y problem.If it is really desired to allow `std::vector` in platform-neutral fashion, why not just guarantee "magic-free" property for implementation standard library components in general? Even `std::vector` is a notable example frustrating users, it is debatable to be special enough as the special case to get away from the magic.
On Sunday, September 17, 2017 at 1:25:13 PM UTC-4, FrankHB1989 wrote:
On Sunday, September 17, 2017 at 9:22:29 PM UTC+8, Nicol Bolas wrote:
On Sunday, September 17, 2017 at 2:13:09 AM UTC-4, FrankHB1989 wrote:
On Sunday, September 17, 2017 at 8:35:24 AM UTC+8, Nicol Bolas wrote:
On Saturday, September 16, 2017 at 2:48:11 PM UTC-4, FrankHB1989 wrote:

Working around semantic limitations is not a fix for them. In general, the C++ type system is not powerful enough to encode such information in type signatures. Some form of gradual typing would be needed, though fixing that would raise further problems, e.g. ABI. However, that does not imply that the knowledge cannot be reasoned about outside of typechecking.

It doesn't "imply" anything; it outright states it. A `T*` is a pointer; that's a type. There is no type-based way to distinguish a pointer to an array element from a pointer to something that isn't an array element. Therefore, typechecking alone cannot be used to determine if pointer arithmetic is valid. That is the way C++ works today. Given that fact, there is no reason why we can't extend the set of conditions in which pointer arithmetic works.

There is also no sufficient reason why it has to be extended in this way.

The reasons have already been explained. Whether you find them "sufficient" is a personal choice. To me, so long as the object model remains reasonable and coherent, fixing the problem is worth it.
Under the current wording, in order to dynamically know whether pointer arithmetic is legitimate, you have to be able to look at the pointer and know that it points into an array object of `T`s. Under the new wording, in order to dynamically know whether pointer arithmetic is legitimate, you have to be able to look at the pointer, and then look in the direction of the arithmetic to see if there are more `T`s, and that they all share the same storage and are not themselves subobjects. I submit that if you can reason about the former, then you have all of the information needed to reason about the latter. In both cases, you have to be able to walk through memory and determine what is actually in the region of storage pointed to by a pointer. You have to be able to turn an address into the nested sequence of objects that are pointed to by that address.

Really? The former does not depend on the memory model directly. It needs only the length of the array and one of the pointer values to the nth element of the array; not the complete element type, nor the complete array type. The size of the element is of no interest here, nor is the layout of elements in the array.

You forget what we're doing here. We're trying to reason about the object-model implications of a particular piece of code. Specifically, we have a function that takes a pointer, and that function is performing pointer arithmetic on that pointer. We're trying to see what it would take for the object model to be able to verify, at runtime, whether that pointer arithmetic would result in well-defined behavior: what must the object model be able to determine from that pointer? Well, in order to get to "runtime", we must first get past compile time. And it is ill-formed for code to perform pointer arithmetic on a `T*` if `T` is incomplete. Therefore, we must already know the "size of element".

To reason about a program does not mean to figure out all the information that can be determined by the program's semantics.
The fact that a conforming implementation must know the exact size of the type in a well-formed program does not mean that whoever is reasoning about the program must.

I disagree. You cannot reason about an ill-formed program, since by definition an ill-formed program is semantic nonsense. If it's ill-formed, it is not a C++ program. And I submit that you cannot use C++ logic to reason about things that aren't C++ programs.
If the array in the program compiled, the element type does have a positive size, but the exact value is not necessarily of interest when reasoning. For example, it is just a common divisor in bounds checking based on address calculation; so why not use the operand of the pointer arithmetic directly instead of addresses?

Furthermore, you do need to know the "layout of elements in the array", because you need to make sure that the type `T` being pointed to is the same type as the array element type. After all, the `T*` could be pointing to the first subobject of the array element type `U`. Since the `U*` will have the same address as the `T*` first subobject, you need to be able to differentiate between these cases. Knowing that a `T*` happens to point into an array isn't enough; it must point into an array of `T`. It must be a direct subobject of an array of `T`. So given a `T*`, we have to be able to ask (at least) whether it is a subobject and, if it is, whether the containing object is an array of `T`.

Verifying correctness with type-based aliasing is a different scenario which requires other rules. And if the purpose is to figure out all the possible undefined behavior from the rules derived from the object model (at least, as per [basic.life]), the information provided here is still far from enough.

My point is that the proposed solution does not render the object model incoherent or nonsensical. It is just as reasonable as the old version, requiring the same reasoning tools that the old version required.
It simply has different answers for different situations. In the original version, you say that pointer arithmetic moves through an array. In the new version, you say that pointer arithmetic is moving between sequential top-level objects (which arrays are a subset of). If you find that difficult to reason about... I can't really help that.
And if you have the length of the array and the array element type, then by definition you know the "complete array type".

The only way to determine the well-behaved status of this is to have data structures in memory that can take a typed pointer and get the containing object. And if you can do this once, you can do this repeatedly until there is no containing object. The only difference between "pointer arithmetic on arrays" and "pointer arithmetic on top-level sequences" is that the former can have "containing object" be based only on static subobject definitions, while the latter requires "containing object" to handle dynamic "nested-within" objects. But really, if you have a memory system that can compute the former, there's no reason you can't extend it to compute the latter.

Simplicity.

Necessary functionality trumps simplicity.
Locality.

Locality of what? I already demonstrated that the current system has no greater "locality" than this one.
Probably more importantly, computational complexity matters a lot when it is checked using my brain during coding. On the contrary, the latter is not possible if you don't know the layout of members of the complete type it resides in, and the way to determine all that information is to calculate the layout from the complete type definitions and the memory model, or to simulate the actual allocations. I don't see it as a desirable abstraction, even if the wording can be patched in limited cases.

So, being able to implement `std::vector` in a platform-neutral fashion is not a desired feature of the language?

No. To be accurate, I agree that allowing the implementation of `std::vector` to be portable (in the sense of conforming C++ programs) is a convenience. However:

1. Technically, it is a convenience, not a must, even if lacking the convenience easily causes bad things, as we can see, politically.

... huh?

I did not say that a solution to the CWG issue itself is not needed. But the proposed rules change is not necessarily needed. See the alternatives proposed by others.

The only alternative on this thread which isn't merely an alternate statement or limited version of what I described is "use a magic function to declare that a region of sequential top-level objects contains an array". Which is an expert-only tool, since non-experts would never even guess that such a thing would be needed.
Dynamic creation of arrays should not be an expert-only thing.

2. It has pros and cons. Not all cases need such convenience, and it can harm users not relying on it in reality. For instance, insufficient vendor-based optimization: users can hardly rescue such a case unless they become vendors themselves.

3. As I have said, it should not be applicable only to `std::vector`. It is strange to allow `std::vector` as the only special case. If the ability is important to keep, why limit it to `std::vector`?

I don't know what you're talking about. The feature we're discussing here would not be limited to `vector`. It's simply the most obvious use case for it and the biggest justification for having it.

This is like an X-Y problem. If it is really desired to allow `std::vector` in a platform-neutral fashion, why not just guarantee a "magic-free" property for implementations of standard library components in general? Even if `std::vector` is a notable example frustrating users, it is debatable whether it is special enough to be the one case that gets away from the magic.

What part of "The feature we're discussing here would not be limited to `vector`" eluded you? `vector` already works. The point is not to make `vector` work. It's to allow users to write their own dynamic arrays, which includes the possibility of writing `vector`, but also any other kind of dynamic array type.
And to do so without deep knowledge of esoteric C++ functions.