Committee stance on using aligned char arrays as raw storage without placement new for trivial types?

Martin Ba

Jan 15, 2017, 3:56:56 PM
to ISO C++ Standard - Discussion
Hi.

I'm currently trying to understand a few ... interesting ... observations I have been making wrt. the C++ Standard and using char arrays as raw storage.

Essentially, as far as I can tell (have been told), the current C++ Standard only allows using a char array as raw storage (see also std::aligned_storage) when objects are put into this via placement new, even for e.g. int or other trivial(*) types.

See: http://stackoverflow.com/questions/41624685/is-placement-new-legally-required-for-putting-an-int-into-a-char-array or related questions where I'm told I'm expected to do the following:

alignas(int) char buf[sizeof(int)];

void f() {
  // turn the memory into an int: (??) from the POV of the abstract machine!
  ::new (buf) int; // is this strictly required? (aside: it's obviously a no-op)

  // access storage:
  *((int*)buf) = 42; // for this discussion, just assume the cast itself yields the correct pointer value
}

Now, I'm *not* asking whether the current C++ Standard requires (or not) the no-op placement new for this code to be defined.

What I would be interested in is whether this has been discussed in the committee (CWG?) in the last very few years
and whether there is any agreement if omitting the placement new (for trivial type) should be allowed or if Standard C++ should absolutely require the placement new.


Simple links to any paper(s) discussing this would already be appreciated; the only reference I found was P0137R1, and that's more about clarifying current wording, afaict.

Thanks.

- Martin

p.s.: (*) is "trivial type" the correct term?

p.p.s.: My personal impression on the matter is that requiring the placement new for trivial types (like int, ...) is rather insane and the amount of real world code compiled with C++ compilers
that would be broken should any C++ compiler/optimizer ever manage to actually treat this as UB is quite huge. 'Course I may be totally off here. Just take this as a disclaimer :-)
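For reference, a minimal sketch of the variant that does go through placement new and then uses the pointer that `new` returns, which sidesteps the question about the cast. The helper name `store_int` is hypothetical and not part of the original post.

```cpp
#include <cassert>
#include <new>  // declares the placement form of operator new

alignas(int) char buf[sizeof(int)];

// Hypothetical helper: starts the lifetime of an int in buf, then stores a value.
int* store_int(int value) {
    int* p = ::new (buf) int;  // begins the lifetime of an int object in buf
    assert(static_cast<void*>(p) == static_cast<void*>(buf));  // same address as buf
    *p = value;                // access through the pointer returned by new
    return p;
}
```

For `int`, the placement new is not expected to generate any instructions; it only matters to the abstract machine.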

Nicol Bolas

Jan 15, 2017, 6:37:00 PM
to ISO C++ Standard - Discussion
I cannot tell you if any discussion has been had. However, I can give you some of the history of what in C++ prevents merely allocating memory from creating an object. That in itself is not definitive about discussions, but it is suggestive.

I do not actually have a copy of C++98 or C++03. The oldest working draft I can find that is available is N1638, which was released in 2004. [intro.object]/1 outlines when an object in C++ comes into being:

> An object is created by a definition (3.1), by a new-expression (5.3.4) or by the implementation (12.2) when needed.

I do have late drafts of C++11 and C++14. I'm not going to quote from their version of this section because they all say the exact same thing.

N4616, the current working draft leading into C++17, however, does change things. A bit:

> An object is created by a definition (3.1), by a new-expression (5.3.4), when implicitly changing the active member of a union (9.3), or when a temporary object is created (4.4, 12.2).

So the only change has been essentially a defect fix that makes unions actually work, in accord with the standard.

In at least 12 years of standardization, the committee has made no substantive change to the causes of bringing an object into being. While this is not conclusive, the fact that C++17 did put a fix into this section means that they have looked at it and talked about it at some point. So I would suggest that, if there was discussion about it, it did not progress beyond discussion.
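To make the list in the N4616 wording concrete, here is a small sketch with one instance of each way an object comes into being. The types `Box` and `IntOrFloat` and the function name are hypothetical, chosen only for illustration.

```cpp
#include <cassert>

struct Box {               // hypothetical class type
    int v;
    int get() const { return v; }
};

union IntOrFloat {         // hypothetical union
    int i;
    float f;
};

int created_objects_demo() {
    int a = 1;             // object created by a definition
    int* b = new int(2);   // object created by a new-expression
    IntOrFloat u;          // the union itself is created by a definition; no member is active yet
    u.i = 3;               // assignment implicitly makes `i` the active member
    assert(u.i == 3);
    int t = Box{4}.get();  // a temporary Box object is created for the member call
    int sum = a + *b + u.i + t;
    delete b;
    return sum;
}
```

Note what is absent from the list: merely obtaining suitably aligned storage and casting a pointer to it.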


On Sunday, January 15, 2017 at 3:56:56 PM UTC-5, Martin Ba wrote:
p.s.: (*) is "trivial type" the correct term?

Correct term for what? Trivial Type is a term in C++, but it's unclear what it would mean for what you want to do.

Conceptually, a TrivialType is a type which is a pure block-of-bits, one for which any value of those bits is no less legal than any other. But C++ has other kinds of types.

A TriviallyCopyable type is a type for which a byte-by-byte copy operation is equivalent to a language-level copy or move operation. A TriviallyDefaultConstructible type is a type for which being uninitialized is a legitimate state. A TriviallyDestructible type is a type whose destruction is essentially irrelevant and can be ignored.
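These categories can be checked at compile time with the standard `<type_traits>` header. A minimal sketch; `Pod` and `WithInit` are hypothetical types made up for the example.

```cpp
#include <string>
#include <type_traits>

struct Pod { int x; float y; };   // hypothetical: no user-provided special members
struct WithInit { int x = 0; };   // hypothetical: default member initializer

// int and Pod are trivial: trivially copyable and trivially default constructible.
static_assert(std::is_trivial<int>::value, "int is a trivial type");
static_assert(std::is_trivial<Pod>::value, "Pod is a trivial type");

// WithInit can be copied byte-by-byte, but its default constructor does work.
static_assert(std::is_trivially_copyable<WithInit>::value, "copyable via memcpy");
static_assert(!std::is_trivially_default_constructible<WithInit>::value,
              "default construction is not a no-op");
static_assert(!std::is_trivial<WithInit>::value, "hence not a trivial type");

// std::string is none of these; its destructor and copy operations do real work.
static_assert(!std::is_trivially_destructible<std::string>::value, "has a real destructor");
static_assert(!std::is_trivially_copyable<std::string>::value, "not copyable via memcpy");
```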

p.p.s.: My personal impression on the matter is that requiring the placement new for trivial types (like int, ...) is rather insane and the amount of real world code compiled with C++ compilers 
that would be broken should any C++ compiler/optimizer ever manage to actually treat this as UB is quite huge. 'Course I may be totally off here. Just take this as a disclaimer :-)

Compilers do treat it as UB. UB doesn't mean "crash"; UB can still do what you want.

The point of the UB designation is to allow implementations to be reasonably fast. If you reinterpret cast a pointer to a different type, the compiler doesn't have to check to see if that object really exists there; it will simply trust your cast and pretend that there is an object there.

Martin Ba

Jan 16, 2017, 2:08:39 PM
to ISO C++ Standard - Discussion
On Monday, January 16, 2017 at 12:37:00 AM UTC+1, Nicol Bolas wrote:
I cannot tell you if any discussion has been had. However, ...
... So the only change has been essentially a defect fix that makes unions actually work, in accord with the standard.


In at least 12 years of standardization, the committee has made no substantive change to the causes of bringing an object into being. While this is not conclusive, the fact that C++17 did put a fix into this section means that they have looked at it and talked about it at some point. So I would suggest that, if there was discussion about it, it did not progress beyond discussion.


Thanks a lot for that wrap up!
 
-snip-

p.p.s.: My personal impression on the matter is that requiring the placement new for trivial types (like int, ...) is rather insane and the amount of real world code compiled with C++ compilers 
that would be broken should any C++ compiler/optimizer ever manage to actually treat this as UB is quite huge. 'Course I may be totally off here. Just take this as a disclaimer :-)

Compilers do treat it as UB. UB doesn't mean "crash"; UB can still do what you want.

The point of the UB designation is to allow implementations to be reasonably fast. If you reinterpret cast a pointer to a different type, the compiler doesn't have to check to see if that object really exists there; it will simply trust your cast and pretend that there is an object there.

What I meant by "treating it as UB" was in the same vein as, e.g., signed integer overflow. Compilers today generate code that no longer works if the source relies on signed integer overflow, although older optimizers didn't "break" anything.

In the same vein, I'm sure we can imagine several transformations that break code without the "placement new" (from my OP), code that used to work (and still does).

- Martin

Nicol Bolas

Jan 16, 2017, 3:29:02 PM
to ISO C++ Standard - Discussion

Such as?

Assuming a lack of signed integer overflow means that the compiler doesn't have to insert code to check for integer overflow. The UB designation allows correct code (code without overflows) to execute at maximum performance. Any degrading of incorrect code is merely a consequence of making correct code as fast as possible.
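A small sketch of the kind of assumption meant here. Because signed overflow is undefined, an optimizer is allowed (not required) to treat `x + 1 > x` as always true; the function names are hypothetical, and the demo only calls the first function with values that do not overflow.

```cpp
#include <cassert>
#include <climits>

// An optimizer may fold this to `return true;`, since the only input for
// which it could be false (INT_MAX) would overflow, which is UB.
bool successor_is_larger(int x) {
    return x + 1 > x;
}

// Well-defined for every input: the overflow case is handled before the addition.
bool successor_is_larger_checked(int x) {
    if (x == INT_MAX) return false;
    return x + 1 > x;
}

void overflow_demo() {
    assert(successor_is_larger(41));
    assert(successor_is_larger_checked(41));
    assert(!successor_is_larger_checked(INT_MAX));
}
```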

Let's say that you have a function that returns a `T*`. The fastest code generated which uses this return value is code which assumes that `T*` points to a live, valid object of type `T`. To do anything else makes correct code slower. Even if you inlined that function or could otherwise be certain that the `T*` was not valid, that simply means UB happens. Do you think compiler writers are going to detect such circumstances and make the code fail in some way?

Can you give an example of these "several transformations"? How would they speed up correct code?

It should also be noted that, well, we can trace this rule back at least 12 years. Compilers haven't done anything to break such code yet.

Jens Maurer

Jan 16, 2017, 4:11:19 PM
to std-dis...@isocpp.org
On 01/15/2017 09:56 PM, Martin Ba wrote:
> Hi.
>
> I'm currently trying to understand a few ... interesting ... observations I have been making wrt. the C++ Standard and using char arrays as raw storage.
>
> Essentially, as far as I can tell (have been told), the current C++ Standard only allows using a char array as raw storage (see also std::aligned_storage) when objects are put into this via placement new, even for e.g. int or other trivial(*) types.
>
> See: http://stackoverflow.com/questions/41624685/is-placement-new-legally-required-for-putting-an-int-into-a-char-array or related questions where I'm told I'm expected to do the following:
>
> alignas(int) char buf[sizeof(int)];
>
> void f() {
> // turn the memory into an int: (??) from the POV of the abstract machine!
> ::new (buf) int; // is this strictly required? (aside: it's obviously a no-op)
>
> // access storage:
> *((int*)buf) = 42; // for this discussion, just assume the cast itself yields the correct pointer value
> }

> What I would be interested in is whether this has been discussed in the committee (CWG?) in the last very few years
> and whether there is any agreement if omitting the placement new (for trivial type) should be allowed or if Standard C++ should absolutely require the placement new.

I believe I can say that CWG agrees that the words now in C++17 correctly
reflect the intent that you need the placement new in the case above.

If you believe that intent is misguided, feel free to propose a change.
I'm sure compiler writers will explain to you how that substantially
pessimizes their code generation.

> p.s.: (*) is "trivial type" the correct term?
>
> p.p.s.: My personal impression on the matter is that requiring the placement new for trivial types (like int, ...) is rather insane and the amount of real world code compiled with C++ compilers
> that would be broken should any C++ compiler/optimizer ever manage to actually treat this as UB is quite huge. 'Course I may be totally off here. Just take this as a disclaimer :-)

Some compilers might make special allowances for their particular user
community, precisely out of concerns you cited. That doesn't make your
code any better.

Jens

Martin Ba

Jan 16, 2017, 4:18:30 PM
to ISO C++ Standard - Discussion


On Monday, January 16, 2017 at 9:29:02 PM UTC+1, Nicol Bolas wrote:
On Monday, January 16, 2017 at 2:08:39 PM UTC-5, Martin Ba wrote:
On Monday, January 16, 2017 at 12:37:00 AM UTC+1, Nicol Bolas wrote:
I cannot tell you if any discussion has been had. However, ...
... So the only change has been essentially a defect fix that makes unions actually work, in accord with the standard.

In at least 12 years of standardization, the committee has made no substantive change to the causes of bringing an object into being. While this is not conclusive, the fact that C++17 did put a fix into this section means that they have looked at it and talked about it at some point. So I would suggest that, if there was discussion about it, it did not progress beyond discussion.


Thanks a lot for that wrap up!
 
-snip-
p.p.s.: My personal impression on the matter is that requiring the placement new for trivial types (like int, ...) is rather insane and the amount of real world code compiled with C++ compilers 
that would be broken should any C++ compiler/optimizer ever manage to actually treat this as UB is quite huge. 'Course I may be totally off here. Just take this as a disclaimer :-)

Compilers do treat it as UB. UB doesn't mean "crash"; UB can still do what you want.

The point of the UB designation is to allow implementations to be reasonably fast. If you reinterpret cast a pointer to a different type, the compiler doesn't have to check to see if that object really exists there; it will simply trust your cast and pretend that there is an object there.

What I meant by "treating it as UB" was in the same vein as, e.g., signed integer overflow. Compilers today generate code that no longer works if the source relies on signed integer overflow, although older optimizers didn't "break" anything.

In the same vein, I'm sure we can imagine several transformations that break code without the "placement new" (from my OP), code that used to work (and still does).

Such as?

Assuming a lack of signed integer overflow means that the compiler doesn't have to insert code to check for integer overflow. The UB designation allows correct code (code without overflows) to execute at maximum performance. Any degrading of incorrect code is merely a consequence of making correct code as fast as possible.


See e.g.: http://stackoverflow.com/questions/7682477/why-does-integer-overflow-on-x86-with-gcc-cause-an-infinite-loop ... "The compiler assumes you won't cause undefined behavior, and optimizes away the loop test."


Let's say that you have a function that returns a `T*`. The fastest code generated which uses this return value is code which assumes that `T*` points to a live, valid object of type `T`. To do anything else makes correct code slower. Even if you inlined that function or could otherwise be certain that the `T*` was not valid, that simply means UB happens. Do you think compiler writers are going to detect such circumstances and make the code fail in some way?

Can you give an example of these "several transformations"? How would they speed up correct code?


In the same vein as gcc's -fdelete-null-pointer-checks (see e.g. http://stackoverflow.com/questions/23153445/can-branches-with-undefined-behavior-be-assumed-unreachable-and-optimized-as-dea): the compiler sees a branch that definitely invokes UB and optimizes away the branch and the branch check.
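A sketch of the pattern that null-pointer-check deletion targets. The function names are hypothetical; whether a given compiler actually removes the check depends on the compiler and its flags.

```cpp
#include <cassert>

// The dereference comes first. Since dereferencing a null pointer is UB, an
// optimizer may conclude that p is non-null and delete the later check.
int read_then_check(int* p) {
    int v = *p;
    if (p == nullptr) return -1;  // may be optimized away
    return v;
}

// Checking before the dereference keeps the behavior defined for null input.
int check_then_read(int* p) {
    if (p == nullptr) return -1;
    return *p;
}

void null_check_demo() {
    int x = 7;
    assert(read_then_check(&x) == 7);       // only ever called with a valid pointer
    assert(check_then_read(&x) == 7);
    assert(check_then_read(nullptr) == -1);
}
```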

It should also be noted that, well, we can trace this rule back at least 12 years. Compilers haven't done anything to break such code yet.

Yet. And I assume (FWIW) as a matter of QoI they won't. But then, stuff like -fwrapv and -fno-delete-null-pointer-checks has happened, in the sense that compiler writers saw legal optimization opportunities that break some code. So, just because you or I cannot see any reason today, that's not much consolation to me :-)

Jens Maurer

Jan 16, 2017, 4:22:01 PM
to std-dis...@isocpp.org
On 01/16/2017 09:29 PM, Nicol Bolas wrote:
> On Monday, January 16, 2017 at 2:08:39 PM UTC-5, Martin Ba wrote:
> What I meant by "treating it as UB" was in the same vein as, e.g., signed integer overflow. Compilers today generate code that no longer works if the source relies on signed integer overflow, although older optimizers didn't "break" anything.
>
>
> In the same vein, I'm sure we can imagine several transformations that break code without the "placement new" (from my OP), code that used to work (and still does).
>
>
> Such as?

Here's a gentle introduction to undefined behavior vs.
optimizations:

http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html

And don't forget to follow the link to http://blog.regehr.org/archives/213 .

Jens

Chris Hallock

Jan 16, 2017, 4:24:35 PM
to ISO C++ Standard - Discussion
In the same vein, I'm sure we can imagine several transformations that break code without the "placement new" (from my OP), code that used to work (and still does).

Such as?

An aggressively-optimizing compiler that assumes perfectly-well-formed C++ input could detect that this code is UB and therefore assume it never executes (i.e. dead code that can be omitted from the binary).

Martin Ba

Jan 16, 2017, 4:49:45 PM
to ISO C++ Standard - Discussion
On Monday, January 16, 2017 at 10:11:19 PM UTC+1, Jens Maurer wrote:
On 01/15/2017 09:56 PM, Martin Ba wrote:
> Hi.
>
> I'm currently trying to understand a few ... interesting ... observations I have been making wrt. the C++ Standard and using char arrays as raw storage.
>
> Essentially, as far as I can tell (have been told), the current C++ Standard only allows using a char array as raw storage (see also std::aligned_storage) when objects are put into this via placement new, even for e.g. int or other trivial(*) types.
>
> See: http://stackoverflow.com/questions/41624685/is-placement-new-legally-required-for-putting-an-int-into-a-char-array or related questions where I'm told I'm expected to do the following:
>
> alignas(int) char buf[sizeof(int)];
>
> void f() {
>   // turn the memory into an int: (??) from the POV of the abstract machine!
>   ::new (buf) int; // is this strictly required? (aside: it's obviously a no-op)
>
>   // access storage:
>   *((int*)buf) = 42; // for this discussion, just assume the cast itself yields the correct pointer value
> }

> What I would be interested in is whether this has been discussed in the committee (CWG?) in the last very few years
> and whether there is any agreement if omitting the placement new (for trivial type) should be allowed or if Standard C++ should absolutely require the placement new.

I believe I can say that CWG agrees that the words now in C++17 correctly
reflect the intent that you need the placement new in the case above.


If this is really the intent, then this needs to be more clearly communicated and, I feel, rationalized. (Maybe it already has been? That's what the OP was actually about.)
 
If you believe that intent is misguided, feel free to propose a change.

Yes, I very much feel the intent is misguided. For two reasons:
  • This intent declares totally reasonable legacy code to be UB. At least I consider it reasonable to *not* have to place a no-op placement new in straightforward buffer-backed code for trivial types.
  • Since C doesn't have placement new, any C code that uses a char buffer to back any other typed data is automatically UB in C++. Another unnecessary incompatibility.  

The only change I can propose is that CWG consider some way to make this work. (As it does in practice anyway *today*.) What I have gleaned so far from P0137R1 is that the current definition of objects (in the memory-location sense) doesn't allow this, and that it's pretty complex and hard to come up with something that does allow it without restricting other things.


I'm sure compiler writers will explain to you how that substantially
pessimizes their code generation.

For this specific case, I do hope not. I'm braced for anything.
 

> p.s.: (*) is "trivial type" the correct term?
>
> p.p.s.: My personal impression on the matter is that requiring the placement new for trivial types (like int, ...) is rather insane and the amount of real world code compiled with C++ compilers
> that would be broken should any C++ compiler/optimizer ever manage to actually treat this as UB is quite huge. 'Course I may be totally off here. Just take this as a disclaimer :-)

Some compilers might make special allowances for their particular user
community, precisely out of concerns you cited.  That doesn't make your
code any better.


As far as the C++ Standard goes, I'm not so much concerned with "better" but with not allowing future compilers to break reasonable legacy code.

*When* using char arrays (or malloc'ed memory) as backing store for trivial types, I fully assume most (non generic) existing code to *not* employ placement new, simply because it's the straightforward thing to (not) do and the placement new would be a no-op and all compilers up to today seem to generate working code.

I think, here, the C++ Standard should take into account this "existing practice". (Yeah, I know the same arguments were/are raised wrt. signed integer overflow or null-pointer-check elimination, but I at least feel those cases, while possibly problematic in quite a few situations, are historically rather more clear-cut. And at least both affect C and C++ code the same way.)

cheers.


Jens Maurer

Jan 16, 2017, 5:41:33 PM
to std-dis...@isocpp.org
On 01/16/2017 10:49 PM, Martin Ba wrote:
> I believe I can say that CWG agrees that the words now in C++17 correctly
> reflect the intent that you need the placement new in the case above.
>
>
> If this is really the intent, then this needs to be more clearly communicated

Well, the C++ committee doesn't have a PR department. What's unclear about:

1.8p1 [intro.object]

"The constructs in a C++ program create, destroy, refer to, access, and manipulate objects.
An object is created by a definition (3.1), by a new-expression (5.3.4), when implicitly
changing the active member of a union (9.3), or when a temporary object is created
(4.4, 12.2). ..."

> and, I feel, rationalized. (Maybe it already has been? That's what the OP was actually about.)

I've reviewed the notes on the CWG discussions for P0137Rx and I could not
find anything that would directly talk about your example.

>
> If you believe that intent is misguided, feel free to propose a change.
>
>
> Yes, I very much feel the intent is misguided. For two reasons:
>
> * This intent declares totally reasonable legacy code to be UB.

Even legacy code should have used "memcpy" here.
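A minimal sketch of the `memcpy` approach mentioned here: the `int` value is copied into and out of the byte buffer, so no `int` object ever needs to exist inside `buf`. The helper names `put_int` and `get_int` are hypothetical.

```cpp
#include <cassert>
#include <cstring>

alignas(int) char buf[sizeof(int)];

// Copies the object representation of `value` into the buffer.
void put_int(int value) {
    std::memcpy(buf, &value, sizeof value);
}

// Copies the bytes back into a real int object and returns it.
int get_int() {
    int value;
    std::memcpy(&value, buf, sizeof value);
    return value;
}

void memcpy_demo() {
    put_int(42);
    assert(get_int() == 42);
}
```

Optimizing compilers typically turn such fixed-size `memcpy` calls into plain loads and stores, so this need not cost anything at run time.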

> At least I consider it reasonable to *not* have to place a no-op placement new in straightforward buffer-backed code for trivial types.
> * Since C doesn't have placement new, any C code that uses a char buffer to back any other typed data is automatically UB in C++. Another unnecessary incompatibility.
>
> The only change I can propose is that CWG consider some way to make this work. (As it does in practice anyway *today*.) What I have gleaned so far from P0137R1 is that the current definition of objects (in the memory-location sense) doesn't allow this, and that it's pretty complex and hard to come up with something that does allow it without restricting other things.

Well, without a specific proposal on the table for rules that make this
work, but don't detrimentally affect other cases, I'm afraid nothing much
will happen.

> *When* using char arrays (or malloc'ed memory) as backing store for trivial types, I fully assume most (non generic) existing code to *not* employ placement new, simply because it's the straightforward thing to (not) do and the placement new would be a no-op and all compilers up to today seem to generate working code.

When compilers introduced type-based alias analysis, there was lots of broken code
that could be made to work with -fno-strict-aliasing. The code, eventually, got
fixed. I'm sure people using char arrays as backing store will fix their code
eventually, or learn to live with the shame of -fobjects-spring-to-life eternally.
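For readers who have not run into it, a sketch of what type-based alias analysis lets a compiler assume. The function name `store_both` is hypothetical; the demo only calls it with two distinct objects, which is the well-defined case.

```cpp
#include <cassert>

// An int* and a float* are assumed not to refer to the same object, so the
// compiler may return the constant 1 without reloading *ip after the store
// through fp. Code that passes aliasing pointers here is what
// -fno-strict-aliasing was needed for.
int store_both(int* ip, float* fp) {
    *ip = 1;
    *fp = 2.0f;
    return *ip;
}

void aliasing_demo() {
    int i = 0;
    float f = 0.0f;
    assert(store_both(&i, &f) == 1);
    assert(f == 2.0f);
}
```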

> I think, here, the C++ Standard should take into account this "existing practice". (Yeah, I know the same arguments were/are raised wrt. signed integer overflow or null-pointer-check elimination, but I at least feel those cases, while possibly problematic in quite a few situations, are historically rather more clear-cut. And at least both affect C and C++ code the same way.)

Again, without a proposal, nothing is likely to happen.
(Off-topic: The rules for signed bit-shifts are subtly different between
C and C++, last I looked.)

Jens

T. C.

Jan 16, 2017, 6:14:35 PM
to ISO C++ Standard - Discussion


On Monday, January 16, 2017 at 3:29:02 PM UTC-5, Nicol Bolas wrote:
On Monday, January 16, 2017 at 2:08:39 PM UTC-5, Martin Ba wrote:
In the same vein, I'm sure we can imagine several transformations that break code without the "placement new" (from my OP), code that used to work (and still does).

Such as?


See Richard Smith's comment in this Reddit thread: https://www.reddit.com/r/cpp/comments/5fk3wn/undefined_behavior_with_reinterpret_cast/dal28n0/ for an example.


Richard Smith

Jan 16, 2017, 7:23:30 PM
to std-dis...@isocpp.org
On 16 January 2017 at 13:49, Martin Ba <0xcdc...@gmx.at> wrote:
On Monday, January 16, 2017 at 10:11:19 PM UTC+1, Jens Maurer wrote:
On 01/15/2017 09:56 PM, Martin Ba wrote:
> Hi.
>
> I'm currently trying to understand a few ... interesting ... observations I have been making wrt. the C++ Standard and using char arrays as raw storage.
>
> Essentially, as far as I can tell (have been told), the current C++ Standard only allows using a char array as raw storage (see also std::aligned_storage) when objects are put into this via placement new, even for e.g. int or other trivial(*) types.
>
> See: http://stackoverflow.com/questions/41624685/is-placement-new-legally-required-for-putting-an-int-into-a-char-array or related questions where I'm told I'm expected to do the following:
>
> alignas(int) char buf[sizeof(int)];
>
> void f() {
>   // turn the memory into an int: (??) from the POV of the abstract machine!
>   ::new (buf) int; // is this strictly required? (aside: it's obviously a no-op)
>
>   // access storage:
>   *((int*)buf) = 42; // for this discussion, just assume the cast itself yields the correct pointer value
> }

> What I would be interested in is whether this has been discussed in the committee (CWG?) in the last very few years
> and whether there is any agreement if omitting the placement new (for trivial type) should be allowed or if Standard C++ should absolutely require the placement new.

I believe I can say that CWG agrees that the words now in C++17 correctly
reflect the intent that you need the placement new in the case above.


If this is really the intent, then this needs to be more clearly communicated and, I feel, rationalized. (Maybe it already has been? That's what the OP was actually about.)
 
If you believe that intent is misguided, feel free to propose a change.

Yes, I very much feel the intent is misguided. For two reasons:
  • This intent declares totally reasonable legacy code to be UB. At least I consider it reasonable to *not* have to place a no-op placement new in straightforward buffer-backed code for trivial types.
  • Since C doesn't have placement new, any C code that uses a char buffer to back any other typed data is automatically UB in C++. Another unnecessary incompatibility.

The above code has undefined behavior in C too. C's effective type rules do not permit changing the effective type of a declared object to something other than its declared type; they only permit that for objects allocated with malloc or similar.

In the case where the storage /was/ allocated through malloc or similar, C++ requires a placement new where C simply allows the effective type to change through a store (and some parts of the C effective type model don't work as a result...). It would seem reasonable to me for such allocation functions to be specified to have implicitly created whatever set of objects the following code relies on existing[1] -- the compiler typically has to make that pessimistic assumption anyway, since it doesn't know what objects the implementation of an opaque function might create, so it seems like we'd lose little and gain more C compatibility by guaranteeing something like that.

 [1]: that is, we could require the compiler to assume that malloc runs a sequence of placement news (for types with trivial default construction and destruction) before it returns, where that set is chosen to be whatever set gives the program defined behavior -- if such a set exists
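For contrast with the suggestion above, a sketch of what the rules under discussion ask for today when the storage comes from `malloc`: an explicit placement new before the first access. The helper names `make_int` and `destroy_int` are hypothetical.

```cpp
#include <cassert>
#include <cstdlib>
#include <new>

// Allocates raw storage, then explicitly starts the lifetime of an int in it.
int* make_int(int value) {
    void* raw = std::malloc(sizeof(int));
    if (raw == nullptr) return nullptr;
    return ::new (raw) int(value);  // the step C does not need
}

// int is trivially destructible, so releasing the storage is enough.
void destroy_int(int* p) {
    std::free(p);
}

void malloc_demo() {
    int* p = make_int(42);
    assert(p != nullptr && *p == 42);
    destroy_int(p);
}
```

Under the idea in the footnote, the placement new in `make_int` would be something `malloc` is assumed to have done already.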

The only change I can propose is that CWG consider some way to make this work. (As it does in practice anyway *today*.) What I have gleaned so far from P0137R1 is that the current definition of objects (in the memory-location sense) doesn't allow this, and that it's pretty complex and hard to come up with something that does allow it without restricting other things.


I'm sure compiler writers will explain to you how that substantially
pessimizes their code generation.

For this specific case, I do hope not. I'm braced for anything.
 

> p.s.: (*) is "trivial type" the correct term?
>
> p.p.s.: My personal impression on the matter is that requiring the placement new for trivial types (like int, ...) is rather insane and the amount of real world code compiled with C++ compilers
> that would be broken should any C++ compiler/optimizer ever manage to actually treat this as UB is quite huge. 'Course I may be totally off here. Just take this as a disclaimer :-)

Some compilers might make special allowances for their particular user
community, precisely out of concerns you cited.  That doesn't make your
code any better.


As far as the C++ Standard goes, I'm not so much concerned with "better" but with not allowing future compilers to break reasonable legacy code.

*When* using char arrays (or malloc'ed memory) as backing store for trivial types, I fully assume most (non generic) existing code to *not* employ placement new, simply because it's the straightforward thing to (not) do and the placement new would be a no-op and all compilers up to today seem to generate working code.

I think, here, the C++ Standard should take into account this "existing practice". (Yeah, I know the same arguments were/are raised wrt. signed integer overflow or null-pointer-check elimination, but I at least feel those cases, while possibly problematic in quite a few situations, are historically rather more clear-cut. And at least both affect C and C++ code the same way.)

cheers.



Nicol Bolas

Jan 16, 2017, 8:47:36 PM
to ISO C++ Standard - Discussion

The result of a "sequence of placement news" on a piece of memory is the creation of an object of the last type `new`ed. The C++ object model does not permit storage to have an indeterminate object or many separate objects (outside of nesting). If you allocate 4 bytes and new an `int` into it, then it is an int. If you new a `float` into it, it stops being an `int`.
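A sketch of the storage-reuse rule described here, assuming a buffer large and aligned enough for either type. The names `storage` and `reuse_demo` are hypothetical.

```cpp
#include <cassert>
#include <new>

// Storage suitable for either an int or a float.
alignas(int) alignas(float) unsigned char
    storage[sizeof(int) > sizeof(float) ? sizeof(int) : sizeof(float)];

float reuse_demo() {
    int* ip = ::new (storage) int(7);         // the storage now holds an int
    assert(*ip == 7);
    float* fp = ::new (storage) float(1.5f);  // reusing the storage ends the int's lifetime
    // From here on, reading *ip is no longer valid; only *fp refers to a live object.
    return *fp;
}
```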

So, how would you suggest the object model change to accommodate such a thing? What is the syntax that causes a piece of storage that contains all objects to contain just one?

Personally? I say let it go. C++ programmers have managed to survive this being UB since at least 2004. We're teaching C++ programmers nowadays to avoid pointless casting; the average C++ programmer today is far more likely to employ placement-new than to do casts and assume it was constructed.

I'd rather the committee spend time shoring up the object model for genuine C++ purposes, like making it possible for `vector` to be implemented without UB.

Richard Smith

Jan 16, 2017, 9:05:11 PM1/16/17
to std-dis...@isocpp.org
I never said they would all be at the start of the allocation.

> So, how would you suggest the object model change to accommodate such a thing? What is the syntax that causes a piece of storage that contains all objects to contain just one?
>
> Personally? I say let it go. C++ programmers have managed to survive this being UB since at least 2004. We're teaching C++ programmers nowadays to avoid pointless casting; the average C++ programmer today is far more likely to employ placement-new than to do casts and assume it was constructed.
>
> I'd rather the committee spend time shoring up the object model for genuine C++ purposes, like making it possible for `vector` to be implemented without UB.


Nicol Bolas

Jan 16, 2017, 9:25:26 PM1/16/17
to ISO C++ Standard - Discussion

... that doesn't make sense. I mean, where else are they going to be except for the start? If I allocate 4 bytes, then you need to `new` up both `int` and `float` (assuming they're both 4 bytes, of course). But there's no room to `new` them at different addresses within that allocation, since the allocation is only 4 bytes.

So where would you be allocating these different objects?

Richard Smith

Jan 16, 2017, 11:09:59 PM1/16/17
to std-dis...@isocpp.org
I don't know what this example is supposed to demonstrate.

> So where would you be allocating these different objects?

If you allocate 8 bytes, there could be an int object at offset 0 and a float object at offset 4.

Demi Obenour

Jan 17, 2017, 12:36:44 AM1/17/17
to 'Edward Catmur' via ISO C++ Standard - Discussion
I very much disagree.

C++ can use placement new, true.  But C cannot, and many programs need to compile and run as both.

Furthermore, I don't know of any reasonable way a compiler could exploit this to produce better code.  Strict aliasing doesn't apply, since char pointers can alias anything.  More importantly, you make it impossible to take an aligned char array — say, one filled in by an I/O operation — and cast it to an array of (say) int without an O(n) copy and a 2x memory overhead!  That is anything BUT fast.  C++ should NOT impose such overheads.


Thiago Macieira

Jan 17, 2017, 2:41:34 AM1/17/17
to std-dis...@isocpp.org
On Tuesday, 17 January 2017, at 00:36:42 PST, Demi Obenour wrote:
> C++ can use placement new, true. But C cannot, and many programs need to
> compile and run as both.

That argument doesn't apply. You can write code that compiles as both C and
C++, but that does not mean the rules from one language apply in the other.

If you need to write C++-specific code, you can always just use #ifdef
__cplusplus.
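A minimal sketch of the `#ifdef __cplusplus` approach, assuming a hypothetical helper named `make_int` that is not from the thread. The same source is meant to compile as C and as C++; only the C++ branch uses placement new to create the object.

```cpp
#include <stdlib.h>
#ifdef __cplusplus
#include <new>
#endif

/* Hypothetical helper: returns a heap-allocated int set to 0, or NULL on failure. */
int* make_int(void) {
    void* p = malloc(sizeof(int));
    if (p == NULL) return NULL;
#ifdef __cplusplus
    return ::new (p) int(0);   /* C++: explicitly create the int object */
#else
    int* q = (int*)p;          /* C: a cast and a store are enough */
    *q = 0;
    return q;
#endif
}
```

The caller releases the memory with `free` in both languages, since `int` has no destructor to run.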

--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
Software Architect - Intel Open Source Technology Center

Andrey Semashev

Jan 17, 2017, 5:47:24 AM1/17/17
to std-dis...@isocpp.org
On 01/17/17 04:47, Nicol Bolas wrote:
>
> The result of a "sequence of placement news" on a piece of memory is the
> creation of an object of the last type `new`ed. The C++ object model
> does not permit storage to have an indeterminate object or many separate
> objects (outside of nesting).

Does it not? Could you provide a reference to the standard?

I have always assumed the following was well-defined code:

char* p = static_cast< char* >(malloc(sizeof(int) + sizeof(float)));
int* pi = new (p) int;
float* pf = new (p + sizeof(int)) float;

(null checks and alignment accounting skipped for brevity).
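For reference, a sketch of the same idea with the null check and a simple alignment check written out. `TwoObjects` and `make_int_and_float` are illustrative names, and the `static_assert` encodes the assumption that a `float` placed directly after an `int` is correctly aligned.

```cpp
#include <cstdlib>
#include <new>

// Illustrative result type: the raw block plus pointers to the two objects in it.
struct TwoObjects {
    void* block;
    int* i;
    float* f;
};

TwoObjects make_int_and_float() {
    // malloc returns storage aligned for any fundamental type, so only the
    // offset of the float needs checking.
    static_assert(sizeof(int) % alignof(float) == 0,
                  "float would be misaligned right after the int");
    void* block = std::malloc(sizeof(int) + sizeof(float));
    if (block == nullptr) return TwoObjects{nullptr, nullptr, nullptr};
    unsigned char* bytes = static_cast<unsigned char*>(block);
    int* i = ::new (bytes) int(7);                        // object at offset 0
    float* f = ::new (bytes + sizeof(int)) float(2.5f);   // object at offset sizeof(int)
    return TwoObjects{block, i, f};
}
```

The caller is expected to pass `block` to `std::free` when done.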

Regarding the OP, I think it is fair to say that malloc returns a
storage, which is a sequence of bytes (chars) that is allowed to alias
any other type. In that sense, the compiler has no way to know what
actual objects are created by malloc in that storage, so when the user
cases the returned pointer, type aliasing effectively happens. Whether
that is UB or not is a grey area because we don't know if the storage
actually contains the objects that we cast the pointer returned by
malloc to. Regardless, the compiler cannot assume that the code is UB
and e.g. remove it.

> Personally? I say let it go. C++ programmers have managed to survive
> this being UB since at least 2004. We're teaching C++ programmers
> nowadays to avoid pointless casting; the average C++ programmer today is
> far more likely to employ placement-new than to do casts and assume it
> was constructed.

I disagree, because it requires programmers to write pointless code that
is known to be no-op anyway, just to satisfy the spec.

char* p = static_cast< char* >(malloc(sizeof(int) * 10));

// What is this code written for?
char* pi = p, *pe = p + sizeof(int) * 10;
for (; pi != pe; pi += sizeof(int))
{
    new (pi) int;
}

// Use the array of ints
int* q = reinterpret_cast< int* >(p);

Andrey Semashev

Jan 17, 2017, 5:50:56 AM1/17/17
to std-dis...@isocpp.org
On 01/17/17 13:47, Andrey Semashev wrote:
> On 01/17/17 04:47, Nicol Bolas wrote:
>>
>> The result of a "sequence of placement news" on a piece of memory is the
>> creation of an object of the last type `new`ed. The C++ object model
>> does not permit storage to have an indeterminate object or many separate
>> objects (outside of nesting).
>
> Does it not? Could you provide a reference to the standard?
>
> I have always assumed the following was a well defined code:
>
> char* p = static_cast< char* >(malloc(sizeof(int) + sizeof(float)));
> int* pi = new (p) int;
> float* pf = new (p + sizeof(int)) float;
>
> (null checks and alignment accounting skipped for brevity).
>
> Regarding the OP, I think it is fair to say that malloc returns a
> storage, which is a sequence of bytes (chars) that is allowed to alias
> any other type. In that sense, the compiler has no way to know what
> actual objects are created by malloc in that storage, so when the user
> cases the returned pointer,

so when the user casts...

Robert Haberlach

Jan 17, 2017, 6:00:32 AM1/17/17
to std-dis...@isocpp.org
On 1/17/2017 11:47 AM, Andrey Semashev wrote:
> On 01/17/17 04:47, Nicol Bolas wrote:
>>
>> The result of a "sequence of placement news" on a piece of memory is the
>> creation of an object of the last type `new`ed. The C++ object model
>> does not permit storage to have an indeterminate object or many separate
>> objects (outside of nesting).
>
> Does it not? Could you provide a reference to the standard?
See [intro.object]/6 as modified by P0137.

>
> I have always assumed the following was a well defined code:
>
> char* p = static_cast< char* >(malloc(sizeof(int) + sizeof(float)));
> int* pi = new (p) int;
> float* pf = new (p + sizeof(int)) float;
>
> (null checks and alignment accounting skipped for brevity).
Yes, this is well-defined AFAICS.

> Regarding the OP, I think it is fair to say that malloc returns a
> storage, which is a sequence of bytes (chars) that is allowed to alias
> any other type.
Storage cannot be aliased. Objects can.

> In that sense, the compiler has no way to know what actual objects are
> created by malloc in that storage,
There are no objects in that storage. There is a very specific list of
situations in which objects are created, and calls to malloc are
excluded (and that has always been so).

> so when the user cases the returned pointer, type aliasing effectively
> happens. Whether that is UB or not is a grey area because we don't
> know if the storage actually contains the objects that we casted the
> pointer returned by malloc to.
Yes, we do know that--see above. malloc and its semantics are known to
the compiler.

>> Personally? I say let it go. C++ programmers have managed to survive
>> this being UB since at least 2004. We're teaching C++ programmers
>> nowadays to avoid pointless casting; the average C++ programmer today is
>> far more likely to employ placement-new than to do casts and assume it
>> was constructed.
>
> I disagree, because it requires programmers to write pointless code
> that is known to be no-op anyway, just to satisfy the spec.

"Pointless" and "satisfy the spec" are contradictions. You have to start
viewing C++ as an abstract language with an abstract object model and
not some type of hacking playground, where as long as your pointers
contain the correct value and the memory is aligned, everything is well.

>
> char* p = static_cast< char* >(malloc(sizeof(int) * 10));
>
> // What is this code written for?
> char* pi = p, *pe = p + sizeof(int) * 10;
> for (; pi != pe; pi += sizeof(int))
> {
> new (pi) int;
> }

Why didn't you just allocate using new[]? Of course you have to explain
to the implementation that each raw memory location corresponds to an
int, and *you* imposed that burden on yourself--not the language. If
you're so into malloc, try C.

Andrey Semashev

Jan 17, 2017, 6:53:04 AM1/17/17
to std-dis...@isocpp.org
On 01/17/17 14:00, Robert Haberlach wrote:
> On 1/17/2017 11:47 AM, Andrey Semashev wrote:
>> On 01/17/17 04:47, Nicol Bolas wrote:
>>>
>>> The result of a "sequence of placement news" on a piece of memory is the
>>> creation of an object of the last type `new`ed. The C++ object model
>>> does not permit storage to have an indeterminate object or many separate
>>> objects (outside of nesting).
>>
>> Does it not? Could you provide a reference to the standard?
> See [intro.object]/6 as modified by P0137.
>
>>
>> I have always assumed the following was a well defined code:
>>
>> char* p = static_cast< char* >(malloc(sizeof(int) + sizeof(float)));
>> int* pi = new (p) int;
>> float* pf = new (p + sizeof(int)) float;
>>
>> (null checks and alignment accounting skipped for brevity).
> Yes, this is well-defined AFAICS.
>
>> Regarding the OP, I think it is fair to say that malloc returns a
>> storage, which is a sequence of bytes (chars) that is allowed to alias
>> any other type.
> Storage cannot be aliased. Objects can.
>
>> In that sense, the compiler has no way to know what actual objects are
>> created by malloc in that storage,
> There are no objects in that storage. There is a very specific list of
> situations in which objects are created, and calls to malloc are
> excluded (and that has always been so).

I don't find where it is excluded. malloc is an opaque function that
returns a void pointer. The compiler has no way to know what objects are
created in the storage accessible through that pointer.

>> so when the user cases the returned pointer, type aliasing effectively
>> happens. Whether that is UB or not is a grey area because we don't
>> know if the storage actually contains the objects that we casted the
>> pointer returned by malloc to.
> Yes, we do know that--see above. malloc and its semantics are known to
> the compiler.

malloc is imported from C in [c.malloc]. In C99, 7.20.3/1, there is this
part:

> The pointer returned if the allocation succeeds is suitably aligned
> so that it may be assigned to a pointer to any type of object
> and then used to access such an object or an array of such objects
> in the space allocated (until the space is explicitly deallocated).
> The lifetime of an allocated object extends from the allocation until
> the deallocation.

So, according to C, the returned pointer may represent whatever object
the pointer is casted to.

Ok, you may argue that that description is given in terms of C, and that
doesn't mean that the same is valid with regard to C++ objects,
including trivial ones like int. Fair enough, but in that case the C++
standard should clarify that. And considering that there are allocation
functions other than malloc/calloc, the only sane behavior would be the
one compatible with C.

>>> Personally? I say let it go. C++ programmers have managed to survive
>>> this being UB since at least 2004. We're teaching C++ programmers
>>> nowadays to avoid pointless casting; the average C++ programmer today is
>>> far more likely to employ placement-new than to do casts and assume it
>>> was constructed.
>>
>> I disagree, because it requires programmers to write pointless code
>> that is known to be no-op anyway, just to satisfy the spec.
>
> "Pointless" and "satisfy the spec" are contradictions.

No, unless you write code just to satisfy the spec. I personally don't
find that kind of activity productive.

> You have to start
> viewing C++ as an abstract language with an abstract object model and
> not some type of hacking playground, where as long as your pointers
> contain the correct value and the memory is aligned, everything is well.

Abstractions are a tool that the spec writers use to describe
generalized behavior of multiple conforming implementations. If that
tool requires people to write pointless code then the tool is broken.

>> char* p = static_cast< char* >(malloc(sizeof(int) * 10));
>>
>> // What is this code written for?
>> char* pi = p, *pe = p + sizeof(int) * 10;
>> for (; pi != pe; pi += sizeof(int))
>> {
>> new (pi) int;
>> }
>
> Why didn't you just allocate using new[]?

Because I might have reasons not to, e.g. to pass that pointer to a C
library later that I call from my otherwise C++ code. Or we can pretend
that that is not malloc but posix_memalign/aligned_alloc/whatever_alloc
that provides additional properties of the allocated memory, like
increased alignment, that cannot be provided by operator new.

> Of course you have to explain
> to the implementation that each raw memory location corresponds to an
> int, and *you* imposed that burden on yourself--not the language.

The point I'm making is that the requirement to explain this is imposed
by the standard, while there is no technical reason to do that. That is
what I called "writing code just to satisfy the spec".

Richard Hodges

Jan 17, 2017, 7:37:54 AM1/17/17
to std-dis...@isocpp.org
It seems to me that this entire discussion can be paraphrased as, "is c++ compatible with c or not?". [compatibility being defined as PODs legally created by one are valid in the other].

The answer appears, at least insofar as the standard is concerned, "no, or at best, implementation defined".

I think it would be fair to say that the vast majority of c++'s user base would hope that the answer is, "yes".

I think it's therefore reasonable that the standard should give PODs special treatment, allowing them to have been created (default-initialised) simply by allocating properly aligned storage of sufficient size.

This would then allow c++ to interoperate with c (in both directions) both legally and de-facto.

Since the standard does seem to acknowledge the C language, is there any reasonable argument that suggests that this explicit interoperability should not be in the standard? 

R


Nicol Bolas

Jan 17, 2017, 8:31:48 AM1/17/17
to ISO C++ Standard - Discussion

You originally said:

> we could require the compiler to assume that malloc runs a sequence of placement news (for types with trivial default construction and destruction) before it returns, where that set is chosen to be whatever set gives the program defined behavior -- if such a set exists

Therefore, if I 'malloc' 4 bytes of storage, then placement `new` will be executed on that storage for both `int` and `float`. Among others. That's what you're asking for.


And as I said, that would make the memory both an `int` and a `float` at the same time. You then said:

> I never said they would all be at the start of the allocation.

Then where is it going to be? Where does the `int` get created and where does the `float` get created, since there's not room enough for both?

I'm trying to understand what you're suggesting the standard do here, and thus far, it does not make sense.

Nicol Bolas

Jan 17, 2017, 8:33:15 AM1/17/17
to ISO C++ Standard - Discussion


On Tuesday, January 17, 2017 at 12:36:44 AM UTC-5, Demi Obenour wrote:
> I very much disagree.
>
> C++ can use placement new, true.  But C cannot, and many programs need to compile and run as both.
>
> Furthermore, I don't know of any reasonable way a compiler could exploit this to produce better code.  Strict aliasing doesn't apply, since char pointers can alias anything.  More importantly, you make it impossible to take an aligned char array - say, one filled in by an I/O operation - and cast it to an array of (say) int without an O(n) copy and a 2x memory overhead!  That is anything BUT fast.  C++ should NOT impose such overheads.

Or you could just create an `int` array to begin with, then pass it to an I/O operation to be filled in.
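A sketch of that suggestion, under the assumption that the I/O call only needs a byte destination. `fake_read` is a hypothetical stand-in for something like `read` or `fread`, and `read_ints` is an illustrative name. The `int` objects already exist, so filling their bytes is well-defined and no second buffer is needed.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a byte-oriented I/O call: copies n bytes into dst.
std::size_t fake_read(void* dst, const void* src, std::size_t n) {
    std::memcpy(dst, src, n);
    return n;
}

// Create the ints first, then let the I/O layer overwrite their bytes in place.
std::vector<int> read_ints(const void* src, std::size_t count) {
    std::vector<int> out(count);                       // count live int objects
    fake_read(out.data(), src, count * sizeof(int));   // fills their object representations
    return out;
}
```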

Nicol Bolas

Jan 17, 2017, 8:39:19 AM1/17/17
to ISO C++ Standard - Discussion


On Tuesday, January 17, 2017 at 6:53:04 AM UTC-5, Andrey Semashev wrote:
>> Of course you have to explain
>> to the implementation that each raw memory location corresponds to an
>> int, and *you* imposed that burden on yourself--not the language.
>
> The point I'm making is that the requirement to explain this is imposed
> by the standard, while there is no technical reason to do that.

If you know that there really is "no technical reason to do that", then you ought to be able to propose changes to the specification that permit such a thing without creating an object model that is inherently self-contradictory or is otherwise broken.

So what exactly do you suggest we change? Not merely, "do stuff to make this work". But what exact changes should be made to the specification?

Nicol Bolas

Jan 17, 2017, 8:45:37 AM1/17/17
to ISO C++ Standard - Discussion


On Tuesday, January 17, 2017 at 7:37:54 AM UTC-5, Richard Hodges wrote:
> It seems to me that this entire discussion can be paraphrased as, "is c++ compatible with c or not?". [compatibility being defined as PODs legally created by one are valid in the other].
>
> The answer appears, at least insofar as the standard is concerned, "no, or at best, implementation defined".
>
> I think it would be fair to say that the vast majority of c++'s user base would hope that the answer is, "yes".
>
> I think it's therefore reasonable that the standard should give PODs special treatment, allowing them to have been created (default-initialised) simply by allocating properly aligned storage of sufficient size.
>
> This would then allow c++ to interoperate with c (in both directions) both legally and de-facto.

It would also create a dysfunctional C++ object model. A piece of storage would have to have every object that could fit into it all at the same time. You're basically saying that any piece of memory should be able to be treated as a union of all appropriate types.

As I understand it, that's not even the C object model; it manifests objects in arbitrary memory by you writing to them.

Robert Haberlach

Jan 17, 2017, 8:52:44 AM1/17/17
to std-dis...@isocpp.org
On 1/17/2017 12:52 PM, Andrey Semashev wrote:
> On 01/17/17 14:00, Robert Haberlach wrote:
>> There is a very specific list of
>> situations in which objects are created, and calls to malloc are
>> excluded (and that has always been so).
>
> I don't find where it is excluded. malloc is an opaque function that
> returns a void pointer. The compiler has no way to know what objects
> are created in the storage accessible through that pointer.
It isn't explicitly excluded; rather, malloc is specified as allocating
storage, and the list does not mention malloc as creating an object
(e.g. a char array).

> Ok, you may argue that that description is given in terms of C, and
> that doesn't mean that the same is valid with regard to C++ objects,
> including trivial ones like int. Fair enough, but in that case the C++
> standard should clarify that.
It doesn't need to; the status quo (as mentioned in P0137 as a drafting
note) is that malloc does not create objects. Period. You are needlessly
sceptical about things that are clear to everyone else.

> And considering that there are allocation functions other than
> malloc/calloc, the only sane behavior would be the one compatible with C.
I disagree.

>
>>>> Personally? I say let it go. C++ programmers have managed to survive
>>>> this being UB since at least 2004. We're teaching C++ programmers
>>>> nowadays to avoid pointless casting; the average C++ programmer
>>>> today is
>>>> far more likely to employ placement-new than to do casts and assume it
>>>> was constructed.
>>>
>>> I disagree, because it requires programmers to write pointless code
>>> that is known to be no-op anyway, just to satisfy the spec.
>>
>> "Pointless" and "satisfy the spec" are contradictions.
>
> No, unless you write code just to satisfy the spec. I personally don't
> find that kind of activity productive.
We have a "spec" (international standard) for C++, and you can either
write code conforming to it or not. But don't complain if non-conforming
code doesn't execute as you intended. The spec did not say "repeat each
variable's declaration for no reason". It says "if you want to use an
uninitialized memory location as an object of some type T, you must
express this by employing placement new on it". Being a no-op does
not make it pointless.

>> Of course you have to explain
>> to the implementation that each raw memory location corresponds to an
>> int, and *you* imposed that burden on yourself--not the language.
>
> The point I'm making is that the requirement to explain this is
> imposed by the standard, while there is no technical reason to do
> that. That is what I called "writing code just to satisfy the spec".
If you omit it, the optimizer will screw with your code. But I guess
there's little point arguing with some kind of C fundamentalist.

Richard Hodges

Jan 17, 2017, 9:09:03 AM1/17/17
to std-dis...@isocpp.org
> It would also create a dysfunctional C++ object model. A piece of storage would have to have every object that could fit into it all at the same time. You're basically saying that any piece of memory should be able to be treated as a union of all appropriate types.

> As I understand it, that's not even the C object model; it manifests objects in arbitrary memory by you writing to them.

OK, I think it's reasonable to require an intrinsic to have been written to before it is an 'object'. That makes sense as it allows the use of sentinels etc. Presumably this is why C mandates it this way.

So I'll modify my argument while continuing on the theme: What's sauce for the goose ought to be sauce for the supposedly compatible gander.

I argue (and I don't think I am alone) that for intrinsics and PODs solely thereof, elements that have been written to, either by C or C++ ought to have been constructed. If they have been written to through a cast of a correctly aligned memory pointer, they ought to 'exist' in that actual memory [subject to as-if-compatible optimisations, of course]. This is what we expect in C, and it is arguably what we would be reasonable to expect in C++.

I accept that this would require a differentiation in handling between PODs and non-POD structs in the standard. I think that's reasonable:

When constructors, destructors, copy, move ops are non-trivial, we expect 'object-like' behaviour. When they are trivial (particularly when they are NOPs) we expect memory-like behaviour.

Again, this is the de-facto reality on which the code base of every c++ program that calls a C library depends. Why not codify a de-facto reality in order to legitimise it?

barry....@gmail.com

Jan 17, 2017, 9:45:15 AM1/17/17
to ISO C++ Standard - Discussion
>> Yes, I very much feel the intent is misguided. For two reasons:
>>
>>   * This intent declares UB totally reasonable legacy code.
>
> Even legacy code should have used "memcpy" here.


The problem is that, in any latency sensitive application, nobody uses memcpy here. If we're reading in some binary protocol off the wire, everybody's code is going to look something like:

switch (*reinterpret_cast<uint16_t const*>(buf)) {
case MsgA::value:
    handle(*reinterpret_cast<MsgA const*>(buf));
    break;
case MsgB::value:
    handle(*reinterpret_cast<MsgB const*>(buf));
    break;
// ...
}

instead of:

uint16_t msgType;
memcpy(&msgType, buf, sizeof(msgType));
switch (msgType) {
case MsgA::value: {
    MsgA msg;
    memcpy(&msg, buf, sizeof(msg));
    handle(msg);
    break;
}
case MsgB::value: {
    MsgB msg;
    memcpy(&msg, buf, sizeof(msg));
    handle(msg);
    break;
}
// ...
}

The reinterpret_cast version is definitely UB now, and it was definitely UB before. But writing a memcpy and hoping that the compiler recognizes that it's really a reinterpret_cast (which sometimes works, sometimes doesn't) isn't really a solution. Avoiding that extra write matters.
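For comparison, the memcpy variant can be wrapped in a small helper. This is a sketch with an illustrative name (`load_as`), not code from the thread. It is well-defined for any trivially copyable type and does not require the source bytes to be aligned, but whether the copy is optimized away is up to the compiler.

```cpp
#include <cstdint>
#include <cstring>
#include <type_traits>

// Illustrative helper: read a T out of raw bytes by copying its object representation.
template <class T>
T load_as(const void* src) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "load_as requires a trivially copyable type");
    T out;
    std::memcpy(&out, src, sizeof(T));   // no aliasing or alignment requirement on src
    return out;
}
```

With such a helper the dispatch would read `switch (load_as<uint16_t>(buf))`, at the cost of one copy per message unless the optimizer removes it.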

Maybe we just need a:

template <class T, class U>
T* start_lifetime_of_object_without_any_initialization_cast(U*);
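One way such a cast could be approximated within the existing rules is sketched below, assuming `T` is trivially copyable and trivially default constructible and that the storage is suitably aligned. The name `start_lifetime_sketch` is hypothetical. It saves the bytes, creates the object with placement new, then restores the bytes; an optimizer may or may not elide the two copies.

```cpp
#include <cstring>
#include <new>
#include <type_traits>

// Hypothetical helper: create a T in storage that already holds its bytes.
template <class T>
T* start_lifetime_sketch(void* storage) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "T must be trivially copyable");
    static_assert(std::is_trivially_default_constructible<T>::value,
                  "T must be trivially default constructible");
    unsigned char saved[sizeof(T)];
    std::memcpy(saved, storage, sizeof(T));   // keep the existing byte values
    T* obj = ::new (storage) T;               // starts the lifetime; value is indeterminate
    std::memcpy(obj, saved, sizeof(T));       // restore the bytes into the new object
    return obj;
}
```

This keeps the abstract machine satisfied, but it relies on the optimizer for the zero-cost behaviour the post is asking for.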


Andrey Semashev

Jan 17, 2017, 10:09:21 AM1/17/17
to std-dis...@isocpp.org
On 01/17/17 16:39, Nicol Bolas wrote:
> On Tuesday, January 17, 2017 at 6:53:04 AM UTC-5, Andrey Semashev wrote:
>
> > Of course you have to explain
> > to the implementation that each raw memory location corresponds to an
> > int, and *you* imposed that burden on yourself--not the language.
>
> The point I'm making is that the requirement to explain this is imposed
> by the standard, while there is no technical reason to do that.
>
> If you know that there really is "no technical reason to do that", then
> you ought to be able to propose changes to the specification that permit
> such a thing /without/ creating an object model that is inherently
> self-contradictory or is otherwise broken.
>
> So what /exactly/ do you suggest we change? Not merely, "do stuff to
> make this work". But what exact changes should be made to the specification?

I don't have a concrete proposal. Producing one would require me to
spend significantly more time than I'm currently able to. But I don't
consider that a legitimate argument in favor of disregarding my or the
OP's point. You may disagree, of course.

My gut feeling is that the C++ object model has to allow a POD object to
automatically begin its lifetime whenever it is modified in the raw
storage. This is similar to how an active member of a union begins its
lifetime on the first modification.
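The union behaviour referred to here can be sketched as follows. `IntOrFloat` and `switch_member` are illustrative names. Assigning to a different member through the union makes that member the active one without any placement new.

```cpp
// Illustrative union with two trivial members.
union IntOrFloat {
    int i;
    float f;
};

float switch_member() {
    IntOrFloat u;
    u.i = 42;      // i becomes the active member
    u.f = 1.5f;    // assignment through u.f starts f's lifetime and ends i's
    return u.f;    // reading the active member is fine; reading u.i here would not be
}
```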

Nicol Bolas

Jan 17, 2017, 10:53:00 AM1/17/17
to ISO C++ Standard - Discussion


On Tuesday, January 17, 2017 at 9:09:03 AM UTC-5, Richard Hodges wrote:
>> It would also create a dysfunctional C++ object model. A piece of storage would have to have every object that could fit into it all at the same time. You're basically saying that any piece of memory should be able to be treated as a union of all appropriate types.
>>
>> As I understand it, that's not even the C object model; it manifests objects in arbitrary memory by you writing to them.
>
> OK, I think it's reasonable to require an intrinsic to have been written to before it is an 'object'. That makes sense as it allows the use of sentinels etc. Presumably this is why C mandates it this way.
>
> So I'll modify my argument while continuing on the theme: What's sauce for the goose ought to be sauce for the supposedly compatible gander.
>
> I argue (and I don't think I am alone) that for intrinsics and PODs solely thereof, elements that have been written to, either by C or C++ ought to have been constructed. If they have been written to through a cast of a correctly aligned memory pointer, they ought to 'exist' in that actual memory [subject to as-if-compatible optimisations, of course]. This is what we expect in C, and it is arguably what we would be reasonable to expect in C++.
>
> I accept that this would require a differentiation in handling between PODs and non-POD structs in the standard. I think that's reasonable:
>
> When constructors, destructors, copy, move ops are non-trivial, we expect 'object-like' behaviour. When they are trivial (particularly when they are NOPs) we expect memory-like behaviour.
>
> Again, this is the de-facto reality on which the code base of every c++ program that calls a C library depends. Why not codify a de-facto reality in order to legitimise it?

Because making something de-jure requires actually pinning down what we're talking about, rather than making broad generalizations about the way we think things ought to work. It's easy to say what you're saying, but to actually make it work without the object model becoming contradictory is a huge process.

And even then, it wouldn't get you all of your C-isms.

For example, let's say you malloc some memory and pass it along to a C API that is going to fill that memory with consecutive `int`s. OK, fine; that memory now has a bunch of `int`s in it.

Know what it doesn't have? An array:

auto alloc = malloc(sizeof(int) * n);
get_ints(alloc, n, ...);
auto ints = reinterpret_cast<int*>(alloc);
ints[5]; //UB

This is the root of why `vector` has to rely on UB to get things done: the standard makes a distinction between objects that happen to be sequential in a piece of storage and an array of objects. Pointer arithmetic, the basis of [], only works on arrays; it is explicitly not allowed to jump from one object to another object unless they are in the same array.

Go ahead; take a look at [expr.add]. It only works for arrays. And the C API did not create an array.
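By contrast, a sketch in which an array object does exist before anything indexes into it. It assumes the non-allocating array placement new adds no bookkeeping overhead for `int`, which holds on mainstream implementations but was not clearly guaranteed by the wording of the time. `fill_ints` is a hypothetical stand-in for the C API and `sixth_of` is an illustrative name.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical stand-in for the C API: fills n ints with 0, 10, 20, ...
void fill_ints(int* dst, std::size_t n) {
    for (std::size_t k = 0; k < n; ++k) dst[k] = static_cast<int>(k) * 10;
}

// Returns element 5 of a malloc-backed int array, or -1 on failure.
int sixth_of(std::size_t n) {
    if (n <= 5) return -1;
    void* alloc = std::malloc(sizeof(int) * n);
    if (alloc == nullptr) return -1;
    int* ints = ::new (alloc) int[n];   // creates an array of n ints in the storage
    fill_ints(ints, n);
    int result = ints[5];               // pointer arithmetic within one array object
    std::free(alloc);
    return result;
}
```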

Then there's the question of whether you're using the same type as the function filling in the object. A C API may have a `typedef struct` that contains an `int` and a `float`. If you declare a C++ struct that is layout compatible with it, then you can memcpy this data into your C++ struct equivalent (if it is also trivially copyable). But unless you are using the same type definition as the C function that generated it, the objects that C created were of its struct type. And therefore, it is UB to simply cast that pointer to your C++ struct and start accessing it.
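The memcpy route described above might look like this. `CRecord` stands for the hypothetical C API's struct and `CppRecord` for the separately declared C++ equivalent; both names are made up for illustration.

```cpp
#include <cstring>
#include <type_traits>

struct CRecord   { int id; float weight; };   // hypothetical type used by the C API
struct CppRecord { int id; float weight; };   // separately declared, layout-compatible

// Copy the bytes instead of casting the pointer.
CppRecord to_cpp(const CRecord& in) {
    static_assert(sizeof(CRecord) == sizeof(CppRecord), "sizes must match");
    static_assert(std::is_standard_layout<CppRecord>::value &&
                  std::is_trivially_copyable<CppRecord>::value,
                  "CppRecord must be standard-layout and trivially copyable");
    CppRecord out;
    std::memcpy(&out, &in, sizeof out);   // well-defined, unlike reinterpret_cast<CppRecord*>(&in)
    return out;
}
```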

So now what? Do we say that if two standard layout types are layout compatible, it's OK to just pretend that one type is another type? Because that throws strict aliasing right out the window.

Then there's the question of why you need to restrict this to POD types at all. If what causes an object to come into being is copying into storage, would it not make sense to broaden the limitation based on that? That is, allow any type for which bitwise copies make sense, i.e. trivially copyable types? This is after all why C++11 created the standard-layout/trivially-copyable distinction: because POD is too narrow of a limitation based on what you're actually doing.
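The distinction can be checked with type traits. In this sketch `Point` is an illustrative type: its user-provided constructor means it is not trivial (and so not POD), yet bitwise copies of it are still meaningful.

```cpp
#include <cstring>
#include <type_traits>

// Illustrative type: has a user-provided constructor and no default constructor.
struct Point {
    int x;
    int y;
    Point(int x_, int y_) : x(x_), y(y_) {}
};

static_assert(std::is_trivially_copyable<Point>::value, "bitwise copy is allowed");
static_assert(std::is_standard_layout<Point>::value, "layout is C-compatible");
static_assert(!std::is_trivial<Point>::value, "but Point is not trivial, hence not POD");

// Copying the bytes of one Point over another existing Point is well-defined.
Point copy_bits(const Point& src) {
    Point out(0, 0);
    std::memcpy(&out, &src, sizeof out);
    return out;
}
```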

Now, some of these issues are things we need to fix (the specification of pointer arithmetic not allowing you to access sequential objects of the same type as though they were an array, for example). But overall, making C-isms legal C++ code is a lot of standardization work for little real gain.

Let C-isms remain UB. Encourage C++ programmers to follow C++ practices. And move forward.

The most I would be willing to see is a statement that code external to the C++ program is permitted to create objects compatible with C++'s object model, on an implementation-defined basis. But we should not standardize C-isms within actual C++ code.

Robert Haberlach

Jan 17, 2017, 11:25:50 AM1/17/17