Hi.
I'm currently trying to understand a few ... interesting ... observations I have been making wrt. the C++ Standard and using char arrays as raw storage.
Essentially, as far as I can tell (have been told), the current C++ Standard only allows using a char array as raw storage (see also std::aligned_storage) when objects are put into this via placement new, even for e.g. int or other trivial(*) types.
See: http://stackoverflow.com/questions/41624685/is-placement-new-legally-required-for-putting-an-int-into-a-char-array or related questions where I'm told I'm expected to do the following:

alignas(int) char buf[sizeof(int)];
void f() {
    // turn the memory into an int: (??) from the POV of the abstract machine!
    ::new (buf) int; // is this strictly required? (aside: it's obviously a no-op)
    // access storage:
    *((int*)buf) = 42; // for this discussion, just assume the cast itself yields the correct pointer value
}
Now, I'm *not* asking whether the current C++ Standard requires - or not - the noop placement new for this code to be defined.
What I would be interested in is whether this has been discussed in the committee (CWG?) in the last very few years
and whether there is any agreement if omitting the placement new (for trivial type) should be allowed or if Standard C++ should absolutely require the placement new.
Simple links to any paper(s) discussing this would already be appreciated; the only reference I found was P0137R1, and that's more about clarifying current wording afaict.
Thanks.
- Martin
p.s.: (*) is "trivial type" the correct term?
p.p.s.: My personal impression on the matter is that requiring the placement new for trivial types (like int, ...) is rather insane and the amount of real world code compiled with C++ compilers
that would be broken should any C++ compiler/optimizer ever manage to actually treat this as UB is quite huge. 'Course I may be totally off here. Just take this as a disclaimer :-)
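For reference, here is a minimal sketch of the variant that sidesteps the cast question altogether: it keeps the pointer that the placement new-expression returns and writes through that. The function names `store_via_placement_new` and `check_store` are made up for illustration and do not come from the post.

```cpp
#include <cassert>
#include <new>

alignas(int) char buf[sizeof(int)];

// Starts the lifetime of an int in buf, then writes through the pointer
// that the new-expression itself returned (no cast of buf is needed).
int store_via_placement_new() {
    int* p = ::new (static_cast<void*>(buf)) int;  // default-init: no code needed for int
    *p = 42;
    return *p;
}

// Small self-check (hypothetical helper); call it from main() to run the sketch.
void check_store() {
    assert(store_via_placement_new() == 42);
}
```

Since default-initializing an `int` does nothing at run time, one would expect the new-expression to cost nothing in optimized builds, which is the "no-op" the post refers to.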
I cannot tell you if any discussion has been had. However, ...
... So the only change has been essentially a defect fix that makes unions actually work, in accord with the standard.
In at least 12 years of standardization, the committee has made no substantive change to the causes of bringing an object into being. While this is not conclusive, the fact that C++17 did put a fix into this section means that they have looked at it and talked about it at some point. So I would suggest that, if there was discussion about it, it did not progress beyond discussion.
-snip-
p.p.s.: My personal impression on the matter is that requiring the placement new for trivial types (like int, ...) is rather insane and the amount of real world code compiled with C++ compilers
that would be broken should any C++ compiler/optimizer ever manage to actually treat this as UB is quite huge. 'Course I may be totally off here. Just take this as a disclaimer :-)
Compilers do treat it as UB. UB doesn't mean "crash"; UB can still do what you want.
The point of the UB designation is to allow implementations to be reasonably fast. If you reinterpret cast a pointer to a different type, the compiler doesn't have to check to see if that object really exists there; it will simply trust your cast and pretend that there is an object there.
On Monday, January 16, 2017 at 2:08:39 PM UTC-5, Martin Ba wrote:
On Monday, January 16, 2017 at 12:37:00 AM UTC+1, Nicol Bolas wrote:
I cannot tell you if any discussion has been had. However, ...
... So the only change has been essentially a defect fix that makes unions actually work, in accord with the standard.
In at least 12 years of standardization, the committee has made no substantive change to the causes of bringing an object into being. While this is not conclusive, the fact that C++17 did put a fix into this section means that they have looked at it and talked about it at some point. So I would suggest that, if there was discussion about it, it did not progress beyond discussion.
Thanks a lot for that wrap up!
-snip-
p.p.s.: My personal impression on the matter is that requiring the placement new for trivial types (like int, ...) is rather insane and the amount of real world code compiled with C++ compilers
that would be broken should any C++ compiler/optimizer ever manage to actually treat this as UB is quite huge. 'Course I may be totally off here. Just take this as a disclaimer :-)
Compilers do treat it as UB. UB doesn't mean "crash"; UB can still do what you want.
The point of the UB designation is to allow implementations to be reasonably fast. If you reinterpret cast a pointer to a different type, the compiler doesn't have to check to see if that object really exists there; it will simply trust your cast and pretend that there is an object there.
What I meant by "treating it as UB" was in the same vein as, e.g., signed integer overflow. Compilers today generate code that no longer works if it relied on signed integer overflow, although older optimizers didn't "break" anything. In the same vein, I'm sure we can imagine several transformations that break code that has no "placement new" (from my OP) and that used to work (and still works today).
Such as?
Assuming a lack of signed integer overflow means that the compiler doesn't have to insert code to check for integer overflow. The UB designation allows correct code (code without overflows) to execute at maximum performance. Any degrading of incorrect code is merely a consequence of making correct code as fast as possible.
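To make the signed-overflow point concrete, here is a small sketch. The function `next_is_greater` is a hypothetical name. Whether a given compiler actually folds the comparison depends on the compiler and optimization level, so treat the comment as a description of what the rule permits, not of what any particular build does.

```cpp
#include <cassert>
#include <climits>

// For every x where x + 1 does not overflow, the result is true.
// Because signed overflow is undefined, an optimizer is allowed to
// reduce the body to `return true;` without emitting any overflow check.
bool next_is_greater(int x) {
    return x + 1 > x;
}

// Self-check using only inputs that do not overflow (hypothetical helper).
void check_next_is_greater() {
    assert(next_is_greater(0));
    assert(next_is_greater(INT_MAX - 1));  // largest input that stays in range
}
```

Correct callers lose nothing from that simplification; only a caller passing `INT_MAX` would notice a difference, and that call is already undefined.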
Let's say that you have a function that returns a `T*`. The fastest code generated which uses this return value is code which assumes that `T*` points to a live, valid object of type `T`. To do anything else makes correct code slower. Even if you inlined that function or could otherwise be certain that the `T*` was not valid, that simply means UB happens. Do you think compiler writers are going to detect such circumstances and make the code fail in some way?
Can you give an example of these "several transformations"? How would they speed up correct code?
-fdelete-null-pointer-checks (see e.g. http://stackoverflow.com/questions/23153445/can-branches-with-undefined-behavior-be-assumed-unreachable-and-optimized-as-dea): the compiler sees a branch that definitely invokes UB and optimizes away both the branch and the branch check.
It should also be noted that, well, we can trace this rule back at least 12 years. Compilers haven't done anything to break such code yet.
delete-null-pointer-checks has happened, in the sense that compiler writers saw legal optimization opportunities that break some code. So the fact that neither you nor I can see any reason today is not much consolation to me :-)
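The kind of code affected by null-pointer-check deletion looks roughly like the sketch below. `read_then_check` is a hypothetical function, and whether the check is really removed depends on the compiler and its flags.

```cpp
#include <cassert>

// The dereference comes before the null test. Dereferencing a null pointer
// is undefined, so a compiler may conclude p is non-null afterwards and
// drop the `p == nullptr` branch entirely.
int read_then_check(int* p) {
    int value = *p;
    if (p == nullptr) {
        return -1;  // may be treated as unreachable
    }
    return value;
}

// Self-check with a valid pointer only (hypothetical helper).
void check_read_then_check() {
    int x = 7;
    assert(read_then_check(&x) == 7);
}
```

Code like this "worked" with older optimizers when the dereference happened not to trap, which is the parallel being drawn to the missing placement new.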
On Monday, January 16, 2017 at 10:11:19 PM UTC+1, Jens Maurer wrote:
On 01/15/2017 09:56 PM, Martin Ba wrote:
> Hi.
>
> I'm currently trying to understand a few ... interesting ... observations I have been making wrt. the C++ Standard and using char arrays as raw storage.
>
> Essentially, as far as I can tell (have been told), the current C++ Standard only allows using a char array as raw storage (see also std::aligned_storage) when objects are put into this via placement new, even for e.g. int or other trivial(*) types.
>
> See: http://stackoverflow.com/questions/41624685/is-placement-new-legally-required-for-putting-an-int-into-a-char-array or related questions where I'm told I'm expected to do the following:
>
> alignas(int) char buf[sizeof(int)];
>
> void f() {
> // turn the memory into an int: (??) from the POV of the abstract machine!
> ::new (buf) int; // is this strictly required? (aside: it's obviously a no-op)
>
> // access storage:
> *((int*)buf) = 42; // for this discussion, just assume the cast itself yields the correct pointer value
> }
> What I would be interested in is whether this has been discussed in the committee (CWG?) in the last very few years
> and whether there is any agreement if omitting the placement new (for trivial type) should be allowed or if Standard C++ should absolutely require the placement new.
I believe I can say that CWG agrees that the words now in C++17 correctly
reflect the intent that you need the placement new in the case above.
If this is really the intent, then this needs to be more clearly communicated and, I feel, rationalized. (Maybe it already has? That's what the OP was actually about.)
If you believe that intent is misguided, feel free to propose a change.
Yes, I very much feel the intent is misguided. For two reasons:
- This intent declares UB totally reasonable legacy code. At least I consider it reasonable to *not* have to place a no-op placement new in straightforward buffer-backed code for trivial types.
- Since C doesn't have placement new, any C code that uses a char buffer to back any other typed data is automatically UB in C++. Another unnecessary incompatibility.
All the change I can propose is that CWG considers some way to make this work. (As it does in practice anyway *today*.) As I understand so far from what I gleaned from P0137R1 is that the problem we have at the moment is that the definition for objects (in the memory location sense) doesn't allow this and that it's pretty complex and hard to come up with something that does allow it without restricting other things.
I'm sure compiler writers will explain to you how that substantially
pessimizes their code generation.
For this specific case, I do hope not. I'm braced for anything.
> p.s.: (*) is "trivial type" the correct term?
>
> p.p.s.: My personal impression on the matter is that requiring the placement new for trivial types (like int, ...) is rather insane and the amount of real world code compiled with C++ compilers
> that would be broken should any C++ compiler/optimizer ever manage to actually treat this as UB is quite huge. 'Course I may be totally off here. Just take this as a disclaimer :-)
Some compilers might make special allowances for their particular user
community, precisely out of concerns you cited. That doesn't make your
code any better.
As far as the C++ Standard goes, I'm not so much concerned with "better" but with not allowing future compilers to break reasonable legacy code.
*When* using char arrays (or malloc'ed memory) as backing store for trivial types, I fully assume most (non generic) existing code to *not* employ placement new, simply because it's the straightforward thing to (not) do and the placement new would be a no-op and all compilers up to today seem to generate working code.
I think, here, the C++ Standard should take into account this "existing practice". (Yeah, I know the same arguments were/are raised wrt. signed integer overflow or the null-pointer-check elimination, but I at least feel those cases, while possibly problematic in quite some cases, are historically rather more clear-cut. And at least both affect C and C++ code the same.)
cheers.
--
---
You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Discussion" group.
So, how would you suggest the object model change to accommodate such a thing? What is the syntax that causes a piece of storage that contains all objects to contain just one?
Personally? I say let it go. C++ programmers have managed to survive this being UB since at least 2004. We're teaching C++ programmers nowadays to avoid pointless casting; the average C++ programmer today is far more likely to employ placement-new than to do casts and assume it was constructed.
I'd rather the committee spend time shoring up the object model for genuine C++ purposes, like making it possible for `vector` to be implemented without UB.
--
So where would you be allocating these different objects?
--
I very much disagree.
C++ can use placement new, true. But C cannot, and many programs need to compile and run as both.
Furthermore, I don't know of any reasonable way a compiler could exploit this to produce better code. Strict aliasing doesn't apply, since char pointers can alias anything. More importantly, you make it impossible to take an aligned char array (say, one filled in by an I/O operation) and cast it to an array of (say) int without an O(n) copy and a 2x memory overhead! That is anything BUT fast. C++ should NOT impose such overheads.
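For comparison, a per-element read through `memcpy` avoids both the pointer cast and a second full-size buffer. This is a minimal sketch; `load_int` and `check_load_int` are hypothetical helpers. Compilers commonly reduce a fixed-size `memcpy` like this to a plain load, but that is an optimization, not something the language guarantees.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Reads the i-th int out of a byte buffer without casting the buffer pointer.
// The caller must ensure the buffer holds at least (i + 1) * sizeof(int) bytes.
int load_int(const unsigned char* bytes, std::size_t i) {
    int value;
    std::memcpy(&value, bytes + i * sizeof(int), sizeof(int));
    return value;
}

// Self-check: simulate an I/O-filled buffer by copying known ints into bytes.
void check_load_int() {
    const int src[3] = {10, 20, 30};
    unsigned char bytes[sizeof(src)];
    std::memcpy(bytes, src, sizeof(src));
    assert(load_int(bytes, 1) == 20);
}
```

This does not give the caller an `int*` into the buffer, so it does not answer the objection for APIs that need one.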
> Of course you have to explain
> to the implementation that each raw memory location corresponds to an
> int, and *you* imposed that burden on yourself--not the language.
The point I'm making is that the requirement to explain this is imposed
by the standard, while there is no technical reason to do that.
It seems to me that this entire discussion can be paraphrased as, "is c++ compatible with c or not?". [compatibility being defined as PODs legally created by one are valid in the other].
The answer appears, at least insofar as the standard is concerned, "no, or at best, implementation defined".
I think it would be fair to say that the vast majority of c++'s user base would hope that the answer is, "yes".
I think it's therefore reasonable that the standard should give PODs special treatment, allowing them to have been created (default-initialised) simply by allocating properly aligned storage of sufficient size.
This would then allow c++ to interoperate with c (in both directions) both legally and de-facto.
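As a side note on terminology ("POD", "trivial type"), the standard type traits let one check which category a given type falls into. The sketch below uses two made-up types, `Header` and `Owner`, purely for illustration.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical C-style struct: no constructors, no destructor.
struct Header {
    unsigned short type;
    unsigned short length;
};

// Hypothetical type with a user-provided destructor.
struct Owner {
    ~Owner() {}
    int* p;
};

static_assert(std::is_trivial<int>::value, "int is a trivial type");
static_assert(std::is_trivial<Header>::value, "Header is trivial");
static_assert(std::is_standard_layout<Header>::value, "Header is standard-layout");
static_assert(std::is_trivially_copyable<Header>::value, "Header can be memcpy'd");
static_assert(!std::is_trivial<Owner>::value, "a user-provided destructor makes Owner non-trivial");

// Runtime mirror of the checks above (hypothetical helper).
void check_traits() {
    assert(std::is_trivial<Header>::value && !std::is_trivial<Owner>::value);
}
```

A type that is both trivial and standard-layout is what older editions of the standard called a POD.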
--
> Yes, I very much feel the intent is misguided. For two reasons:
>
> * This intent declares UB totally reasonable legacy code.
Even legacy code should have used "memcpy" here.
// Cast-based dispatch: reads the buffer through reinterpret_cast.
switch (*reinterpret_cast<uint16_t const*>(buf)) {
case MsgA::value:
    handle(*reinterpret_cast<MsgA const*>(buf));
    break;
case MsgB::value:
    handle(*reinterpret_cast<MsgB const*>(buf));
    break;
// ...
}
// memcpy-based dispatch: copies the bytes into real objects first.
uint16_t msgType;
memcpy(&msgType, buf, sizeof(msgType));
switch (msgType) {
case MsgA::value: {
    MsgA msg;
    memcpy(&msg, buf, sizeof(msg));
    handle(msg);
    break;
}
case MsgB::value: {
    MsgB msg;
    memcpy(&msg, buf, sizeof(msg));
    handle(msg);
    break;
}
// ...
}
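A self-contained version of the memcpy-based dispatch might look like the following sketch. The layouts of `MsgA` and `MsgB`, the return values of `handle`, and the `dispatch` wrapper are all assumptions made so the fragment compiles; the thread does not define them.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical wire messages; each starts with a 16-bit type tag.
struct MsgA {
    static constexpr std::uint16_t value = 1;
    std::uint16_t type;
    std::uint16_t payload;
};
struct MsgB {
    static constexpr std::uint16_t value = 2;
    std::uint16_t type;
    std::uint32_t payload;
};

// Hypothetical handlers: just return the payload so the result can be checked.
int handle(const MsgA& m) { return m.payload; }
int handle(const MsgB& m) { return static_cast<int>(m.payload); }

// buf must hold at least as many bytes as the message type its tag names.
int dispatch(const unsigned char* buf) {
    std::uint16_t msgType;
    std::memcpy(&msgType, buf, sizeof(msgType));
    switch (msgType) {
    case MsgA::value: {
        MsgA msg;
        std::memcpy(&msg, buf, sizeof(msg));
        return handle(msg);
    }
    case MsgB::value: {
        MsgB msg;
        std::memcpy(&msg, buf, sizeof(msg));
        return handle(msg);
    }
    default:
        return -1;  // unknown tag
    }
}

// Self-check: serialize a MsgA into bytes and dispatch it (hypothetical helper).
void check_dispatch() {
    MsgA a{MsgA::value, 7};
    unsigned char buf[sizeof(MsgB)] = {};
    std::memcpy(buf, &a, sizeof(a));
    assert(dispatch(buf) == 7);
}
```

The copies are into small local objects, so the extra memory is one message, not a second buffer.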
template <class T, class U>
T* start_lifetime_of_object_without_any_initialization_cast(U*);
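One way a helper of that shape could be written with the tools already in the language is sketched below: save the bytes, start the object's lifetime with placement new, then copy the bytes back. The name `start_lifetime_sketch` and its `void*` parameter are assumptions for illustration, not the signature proposed above, and the sketch is restricted to trivial types. (C++23 later added `std::start_lifetime_as` to `<memory>` for this purpose.)

```cpp
#include <cassert>
#include <cstring>
#include <new>
#include <type_traits>

// Hypothetical helper: begins the lifetime of a T in the storage at p
// while preserving the bytes that were already there.
// p must be suitably aligned for T and point to at least sizeof(T) bytes.
template <class T>
T* start_lifetime_sketch(void* p) {
    static_assert(std::is_trivially_copyable<T>::value &&
                  std::is_trivially_default_constructible<T>::value,
                  "sketch only covers trivial types");
    unsigned char saved[sizeof(T)];
    std::memcpy(saved, p, sizeof(T));    // remember the current bytes
    T* obj = ::new (p) T;                // start the lifetime; value is indeterminate
    std::memcpy(obj, saved, sizeof(T));  // put the original bytes into the new object
    return obj;
}

// Self-check: bytes holding 42 are readable as an int afterwards.
void check_start_lifetime() {
    alignas(int) unsigned char storage[sizeof(int)];
    int v = 42;
    std::memcpy(storage, &v, sizeof(v));
    int* p = start_lifetime_sketch<int>(storage);
    assert(*p == 42);
}
```

An optimizer can in principle remove both copies, since they move the same bytes out and back, but that is not guaranteed.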
> It would also create a dysfunctional C++ object model. A piece of storage would have to have every object that could fit into it all at the same time.
> You're basically saying that any piece of memory should be able to be treated as a union of all appropriate types.
> As I understand it, that's not even the C object model; it manifests objects in arbitrary memory by you writing to them.
OK, I think it's reasonable to require an intrinsic to have been written to before it is an 'object'. That makes sense as it allows the use of sentinels etc. Presumably this is why C mandates it this way.
So I'll modify my argument while continuing on the theme: What's sauce for the goose ought to be sauce for the supposedly compatible gander.
I argue (and I don't think I am alone) that for intrinsics and PODs solely thereof, elements that have been written to, either by C or C++ ought to have been constructed. If they have been written to through a cast of a correctly aligned memory pointer, they ought to 'exist' in that actual memory [subject to as-if-compatible optimisations, of course]. This is what we expect in C, and it is arguably what we would be reasonable to expect in C++.
I accept that this would require a differentiation in handling between PODs and non-POD structs in the standard. I think that's reasonable:
When constructors, destructors, copy, move ops are non-trivial, we expect 'object-like' behaviour. When they are trivial (particularly when they are NOPs) we expect memory-like behaviour.
Again, this is the de-facto reality on which the code base of every c++ program that calls a C library depends. Why not codify a de-facto reality in order to legitimise it?
auto alloc = malloc(sizeof(int) * n);
get_ints(alloc, n, ...);
auto ints = reinterpret_cast<int*>(alloc);
ints[5]; //UB
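Under the rules as discussed in this thread, the usual way to avoid the problem is to create the `int` objects in the malloc'ed storage before anything writes to them. The sketch below does that element by element. `get_ints` here is a hypothetical stand-in with a simplified signature (the call above elides its arguments), and `make_ints` is likewise made up. Whether indexing across separately created elements is itself fully covered by the wording is the `vector` problem mentioned earlier in the thread, so this is an illustration rather than a guarantee.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical stand-in for the thread's get_ints: fills n ints with i * 10.
void get_ints(int* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = static_cast<int>(i) * 10;
    }
}

// Allocates storage for n ints (n > 0), creates the int objects, then fills them.
// The caller releases the memory with std::free.
int* make_ints(std::size_t n) {
    void* raw = std::malloc(sizeof(int) * n);
    if (raw == nullptr) {
        return nullptr;
    }
    unsigned char* bytes = static_cast<unsigned char*>(raw);
    int* first = nullptr;
    for (std::size_t i = 0; i < n; ++i) {
        int* p = ::new (static_cast<void*>(bytes + i * sizeof(int))) int;
        if (i == 0) {
            first = p;  // keep the pointer returned by the first new-expression
        }
    }
    get_ints(first, n);
    return first;
}

// Self-check (hypothetical helper).
void check_make_ints() {
    int* ints = make_ints(8);
    assert(ints != nullptr);
    assert(ints[5] == 50);
    std::free(ints);
}
```

The loop of placement news generates no code for `int` in practice, which is exactly why the requirement strikes some participants as ceremony and others as a cheap way to state intent.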