Committee stance on using aligned char arrays as raw storage without placement new for trivial types?

Martin Ba

Jan 15, 2017, 3:56:56 PM
to ISO C++ Standard - Discussion
Hi.

I'm currently trying to understand a few ... interesting ... observations I have been making wrt. the C++ Standard and using char arrays as raw storage.

Essentially, as far as I can tell (have been told), the current C++ Standard only allows using a char array as raw storage (see also std::aligned_storage) when objects are put into this via placement new, even for e.g. int or other trivial(*) types.

See: http://stackoverflow.com/questions/41624685/is-placement-new-legally-required-for-putting-an-int-into-a-char-array or related questions where I'm told I'm expected to do the following:

alignas(int) char buf[sizeof(int)];

void f() {
  // turn the memory into an int: (??) from the POV of the abstract machine!
  ::new (buf) int; // is this strictly required? (aside: it's obviously a no-op)

  // access storage:
  *((int*)buf) = 42; // for this discussion, just assume the cast itself yields the correct pointer value
}
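
For reference, a minimal sketch of the variant that does not depend on the cast at all: it uses the pointer that placement new returns. The function name `store_and_load` is made up for illustration, and the buffer is local rather than global so that every call starts from fresh storage.

```cpp
#include <cassert>
#include <new>

// Hypothetical helper: creates an int inside a char buffer and accesses it
// only through the pointer returned by placement new.
int store_and_load(int value) {
    alignas(int) char buf[sizeof(int)];

    // Starts the lifetime of an int in buf (no code is generated for this
    // on typical implementations, but the abstract machine "sees" it).
    int* p = ::new (static_cast<void*>(buf)) int;

    *p = value;   // access goes through p, so no char* -> int* cast is needed
    return *p;
}
```

Whether the same function without the `::new` line is also defined is the question of this thread; the sketch only shows the form that is not in dispute.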

Now, I'm *not* asking whether the current C++ Standard requires - or not - the noop placement new for this code to be defined.

What I would be interested in is whether this has been discussed in the committee (CWG?) in the last very few years
and whether there is any agreement if omitting the placement new (for trivial type) should be allowed or if Standard C++ should absolutely require the placement new.


Simple links to any paper(s) discussing this would already be appreciated; the only reference I found was P0137R1, and that's more about clarifying current wording afaict.

Thanks.

- Martin

p.s.: (*) is "trivial type" the correct term?

p.p.s.: My personal impression on the matter is that requiring the placement new for trivial types (like int, ...) is rather insane and the amount of real world code compiled with C++ compilers
that would be broken should any C++ compiler/optimizer ever manage to actually treat this as UB is quite huge. 'Course I may be totally off here. Just take this as a disclaimer :-)

Nicol Bolas

Jan 15, 2017, 6:37:00 PM
to ISO C++ Standard - Discussion
I cannot tell you if any discussion has been had. However, I can give you some of the history of what in C++ prevents merely allocating memory from creating an object. That in itself is not definitive about discussions, but it is suggestive.

I do not actually have a copy of C++98 or C++03. The oldest working draft I can find that is available is N1638, which was released in 2004. [intro.object]/1 outlines when an object in C++ comes into being:

> An object is created by a definition (3.1), by a new-expression (5.3.4) or by the implementation (12.2) when needed.

I do have late drafts of C++11 and C++14. I'm not going to quote from their version of this section because they all say the exact same thing.

N4616, the current working draft leading into C++17, however, does change things. A bit:

> An object is created by a definition (3.1), by a new-expression (5.3.4), when implicitly changing the active member of a union (9.3), or when a temporary object is created (4.4, 12.2).

So the only change has been essentially a defect fix that makes unions actually work, in accord with the standard.

In at least 12 years of standardization, the committee has made no substantive change to the causes of bringing an object into being. While this is not conclusive, the fact that C++17 did put a fix into this section means that they have looked at it and talked about it at some point. So I would suggest that, if there was discussion about it, it did not progress beyond discussion.
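
To make the quoted wording concrete, here is a small sketch with one line per way of creating an object that the N4616 text lists. The names `IntOrFloat` and `object_creation_examples` are invented for illustration. Note what is absent from the list: merely casting a pointer to raw storage.

```cpp
#include <cassert>
#include <new>

union IntOrFloat {
    int i;
    float f;
};

// Hypothetical function: each commented line corresponds to one item in the
// quoted [intro.object]/1 wording.
int object_creation_examples() {
    int a = 1;                                        // by a definition
    int* b = new int(2);                              // by a new-expression
    alignas(int) unsigned char raw[sizeof(int)];
    int* c = ::new (static_cast<void*>(raw)) int(3);  // placement new is a new-expression too

    IntOrFloat u;
    u.f = 1.5f;   // makes f the active member
    u.i = 4;      // assignment implicitly changes the active member to i

    const int& t = 5;  // a temporary object is created and bound to t

    int sum = a + *b + *c + u.i + t;
    delete b;
    return sum;
}
```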


On Sunday, January 15, 2017 at 3:56:56 PM UTC-5, Martin Ba wrote:
-snip-

p.s.: (*) is "trivial type" the correct term?

Correct term for what? Trivial Type is a term in C++, but it's unclear what it would mean for what you want to do.

Conceptually, a TrivialType is a type which is a pure block-of-bits, one for which any value of those bits is no less legal than any other. But C++ has other kinds of types.

A TriviallyCopyable type is a type for which a byte-by-byte copy operation is equivalent to a language-level copy or move operation. A TriviallyDefaultConstructible type is a type for which being uninitialized is a legitimate state. A TriviallyDestructible type is a type whose destruction is essentially irrelevant and can be ignored.
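
The standard library exposes these categories as type traits, so they can be checked at compile time. A small sketch follows; the types `Pod` and `WithCtor` are made up for illustration, and the formal definitions in the standard are more precise than the informal descriptions above.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

struct Pod {          // hypothetical aggregate of scalars
    int x;
    float y;
};

struct WithCtor {     // hypothetical type with a user-provided default constructor
    int x;
    WithCtor() : x(0) {}
};

// int and Pod satisfy every category.
static_assert(std::is_trivial<int>::value, "int is trivial");
static_assert(std::is_trivial<Pod>::value, "Pod is trivial");

// WithCtor can still be copied byte-by-byte and needs no destruction,
// but it is not trivially default constructible and hence not trivial.
static_assert(std::is_trivially_copyable<WithCtor>::value, "memcpy-able");
static_assert(std::is_trivially_destructible<WithCtor>::value, "no-op destructor");
static_assert(!std::is_trivially_default_constructible<WithCtor>::value, "constructor runs code");
static_assert(!std::is_trivial<WithCtor>::value, "not trivial overall");

// std::string has a non-trivial destructor, so it is not trivially copyable.
static_assert(!std::is_trivially_copyable<std::string>::value, "owns memory");
```

For the question in the original post, the relevant properties are probably trivial default construction and trivial destruction, since those are what make the placement new a no-op at run time.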

p.p.s.: My personal impression on the matter is that requiring the placement new for trivial types (like int, ...) is rather insane and the amount of real world code compiled with C++ compilers 
that would be broken should any C++ compiler/optimizer ever manage to actually treat this as UB is quite huge. 'Course I may be totally off here. Just take this as a disclaimer :-)

Compilers do treat it as UB. UB doesn't mean "crash"; UB can still do what you want.

The point of the UB designation is to allow implementations to be reasonably fast. If you reinterpret cast a pointer to a different type, the compiler doesn't have to check to see if that object really exists there; it will simply trust your cast and pretend that there is an object there.

Martin Ba

Jan 16, 2017, 2:08:39 PM
to ISO C++ Standard - Discussion
On Monday, January 16, 2017 at 12:37:00 AM UTC+1, Nicol Bolas wrote:
I cannot tell you if any discussion has been had. However, ...
... So the only change has been essentially a defect fix that makes unions actually work, in accord with the standard.


In at least 12 years of standardization, the committee has made no substantive change to the causes of bringing an object into being. While this is not conclusive, the fact that C++17 did put a fix into this section means that they have looked at it and talked about it at some point. So I would suggest that, if there was discussion about it, it did not progress beyond discussion.


Thanks a lot for that wrap up!
 
-snip-

p.p.s.: My personal impression on the matter is that requiring the placement new for trivial types (like int, ...) is rather insane and the amount of real world code compiled with C++ compilers 
that would be broken should any C++ compiler/optimizer ever manage to actually treat this as UB is quite huge. 'Course I may be totally off here. Just take this as a disclaimer :-)

Compilers do treat it as UB. UB doesn't mean "crash"; UB can still do what you want.

The point of the UB designation is to allow implementations to be reasonably fast. If you reinterpret cast a pointer to a different type, the compiler doesn't have to check to see if that object really exists there; it will simply trust your cast and pretend that there is an object there.

What I meant by "treating it as UB" was in the same vein as, e.g., signed integer overflow. Compilers today generate code that no longer works if it relies on signed integer overflow, although older optimizers didn't "break" anything.

In the same vein, I'm sure we can imagine several transformations that break code that has no "placement new" (from my OP) that used to work (and still does).

- Martin

Nicol Bolas

Jan 16, 2017, 3:29:02 PM
to ISO C++ Standard - Discussion

Such as?

Assuming a lack of signed integer overflow means that the compiler doesn't have to insert code to check for integer overflow. The UB designation allows correct code (code without overflows) to execute at maximum performance. Any degrading of incorrect code is merely a consequence of making correct code as fast as possible.
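
As a sketch of that point (function names are hypothetical): because overflow of `x + 1` is undefined, an optimizer is allowed to treat the comparison in the first function as always true. A check that is valid for every input has to avoid the overflowing addition.

```cpp
#include <cassert>
#include <climits>

// For every x where x + 1 does not overflow, this is true. Since the
// overflowing case is undefined behavior, a compiler may fold the whole
// function to "return true" without emitting any overflow handling.
bool next_is_greater(int x) {
    return x + 1 > x;
}

// Well-defined for all inputs: compares against the limit instead of
// performing the addition that might overflow.
bool can_increment(int x) {
    return x < INT_MAX;
}
```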

Let's say that you have a function that returns a `T*`. The fastest code generated which uses this return value is code which assumes that `T*` points to a live, valid object of type `T`. To do anything else makes correct code slower. Even if you inlined that function or could otherwise be certain that the `T*` was not valid, that simply means UB happens. Do you think compiler writers are going to detect such circumstances and make the code fail in some way?

Can you give an example of these "several transformations"? How would they speed up correct code?

It should also be noted that, well, we can trace this rule back at least 12 years. Compilers haven't done anything to break such code yet.

Jens Maurer

Jan 16, 2017, 4:11:19 PM
to std-dis...@isocpp.org
On 01/15/2017 09:56 PM, Martin Ba wrote:
> Hi.
>
> I'm currently trying to understand a few ... interesting ... observations I have been making wrt. the C++ Standard and using char arrays as raw storage.
>
> Essentially, as far as I can tell (have been told), the current C++ Standard only allows using a char array as raw storage (see also std::aligned_storage) when objects are put into this via placement new, even for e.g. int or other trivial(*) types.
>
> See: http://stackoverflow.com/questions/41624685/is-placement-new-legally-required-for-putting-an-int-into-a-char-array or related questions where I'm told I'm expected to do the following:
>
> alignas(int) char buf[sizeof(int)];
>
> void f() {
> // turn the memory into an int: (??) from the POV of the abstract machine!
> ::new (buf) int; // is this strictly required? (aside: it's obviously a no-op)
>
> // access storage:
> *((int*)buf) = 42; // for this discussion, just assume the cast itself yields the correct pointer value
> }

> What I would be interested in is whether this has been discussed in the committee (CWG?) in the last very few years
> and whether there is any agreement if omitting the placement new (for trivial type) should be allowed or if Standard C++ should absolutely require the placement new.

I believe I can say that CWG agrees that the words now in C++17 correctly
reflect the intent that you need the placement new in the case above.

If you believe that intent is misguided, feel free to propose a change.
I'm sure compiler writers will explain to you how that substantially
pessimizes their code generation.

> p.s.: (*) is "trivial type" the correct term?
>
> p.p.s.: My personal impression on the matter is that requiring the placement new for trivial types (like int, ...) is rather insane and the amount of real world code compiled with C++ compilers
> that would be broken should any C++ compiler/optimizer ever manage to actually treat this as UB is quite huge. 'Course I may be totally off here. Just take this as a disclaimer :-)

Some compilers might make special allowances for their particular user
community, precisely out of concerns you cited. That doesn't make your
code any better.

Jens

Martin Ba

Jan 16, 2017, 4:18:30 PM
to ISO C++ Standard - Discussion


On Monday, January 16, 2017 at 9:29:02 PM UTC+1, Nicol Bolas wrote:
On Monday, January 16, 2017 at 2:08:39 PM UTC-5, Martin Ba wrote:
On Monday, January 16, 2017 at 12:37:00 AM UTC+1, Nicol Bolas wrote:
I cannot tell you if any discussion has been had. However, ...
... So the only change has been essentially a defect fix that makes unions actually work, in accord with the standard.

In at least 12 years of standardization, the committee has made no substantive change to the causes of bringing an object into being. While this is not conclusive, the fact that C++17 did put a fix into this section means that they have looked at it and talked about it at some point. So I would suggest that, if there was discussion about it, it did not progress beyond discussion.


Thanks a lot for that wrap up!
 
-snip-
p.p.s.: My personal impression on the matter is that requiring the placement new for trivial types (like int, ...) is rather insane and the amount of real world code compiled with C++ compilers 
that would be broken should any C++ compiler/optimizer ever manage to actually treat this as UB is quite huge. 'Course I may be totally off here. Just take this as a disclaimer :-)

Compilers do treat it as UB. UB doesn't mean "crash"; UB can still do what you want.

The point of the UB designation is to allow implementations to be reasonably fast. If you reinterpret cast a pointer to a different type, the compiler doesn't have to check to see if that object really exists there; it will simply trust your cast and pretend that there is an object there.

What I meant by "treating it as UB" was in the same vein as, e.g., signed integer overflow. Compilers today generate code that no longer works if it relies on signed integer overflow, although older optimizers didn't "break" anything.

In the same vein, I'm sure we can imagine several transformations that break code that has no "placement new" (from my OP) that used to work (and still does).

Such as?

Assuming a lack of signed integer overflow means that the compiler doesn't have to insert code to check for integer overflow. The UB designation allows correct code (code without overflows) to execute at maximum performance. Any degrading of incorrect code is merely a consequence of making correct code as fast as possible.


See e.g.: http://stackoverflow.com/questions/7682477/why-does-integer-overflow-on-x86-with-gcc-cause-an-infinite-loop ... "The compiler assumes you won't cause undefined behavior, and optimizes away the loop test."


Let's say that you have a function that returns a `T*`. The fastest code generated which uses this return value is code which assumes that `T*` points to a live, valid object of type `T`. To do anything else makes correct code slower. Even if you inlined that function or could otherwise be certain that the `T*` was not valid, that simply means UB happens. Do you think compiler writers are going to detect such circumstances and make the code fail in some way?

Can you give an example of these "several transformations"? How would they speed up correct code?


In the same vein as gcc's -fdelete-null-pointer-checks (see e.g. http://stackoverflow.com/questions/23153445/can-branches-with-undefined-behavior-be-assumed-unreachable-and-optimized-as-dea): the compiler sees a branch that definitely invokes UB and optimizes away both the branch and the check guarding it.
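
A minimal sketch of the pattern that this kind of optimization affects (function names are made up for illustration): once a pointer has been dereferenced, the compiler may assume it is not null and drop a later null check.

```cpp
#include <cassert>

// The dereference comes first, so a compiler may assume p != nullptr
// afterwards and remove the check together with its branch.
int read_then_check(int* p) {
    int value = *p;        // undefined behavior if p is null
    if (p == nullptr) {    // may be optimized away as unreachable
        return -1;
    }
    return value;
}

// Checking before dereferencing keeps the null case well-defined.
int check_then_read(int* p) {
    if (p == nullptr) {
        return -1;
    }
    return *p;
}
```

Only `check_then_read` may be called with a null pointer; calling `read_then_check` with one is undefined regardless of what the optimizer does.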

It should also be noted that, well, we can trace this rule back at least 12 years. Compilers haven't done anything to break such code yet.

Yet. And I assume (FWIW) that as a matter of QoI they won't. But then, flags like -fwrapv and -fno-delete-null-pointer-checks came about because compiler writers saw legal optimization opportunities that break some code. So, just because neither you nor I can see any reason today, that's not much consolation to me :-)

Jens Maurer

Jan 16, 2017, 4:22:01 PM
to std-dis...@isocpp.org
On 01/16/2017 09:29 PM, Nicol Bolas wrote:
> On Monday, January 16, 2017 at 2:08:39 PM UTC-5, Martin Ba wrote:
> What I meant by "treating it as UB" was in the same vein as, e.g., signed integer overflow. Compilers today generate code that no longer works if it relies on signed integer overflow, although older optimizers didn't "break" anything.
>
>
> In the same vein, I'm sure we can imagine several transformations that break code that has no "placement new" (from my OP) that used to work (and still does).
>
>
> Such as?

Here's a gentle introduction to undefined behavior vs.
optimizations:

http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html

And don't forget to follow the link to http://blog.regehr.org/archives/213 .

Jens

Chris Hallock

Jan 16, 2017, 4:24:35 PM
to ISO C++ Standard - Discussion
In the same vein, I'm sure we can imagine several transformations that break code that has no "placement new" (from my OP) that used to work (and still does).

Such as?

An aggressively-optimizing compiler that assumes perfectly-well-formed C++ input could detect that this code is UB and therefore assume it never executes (i.e. dead code that can be omitted from the binary).

Martin Ba

Jan 16, 2017, 4:49:45 PM
to ISO C++ Standard - Discussion
On Monday, January 16, 2017 at 10:11:19 PM UTC+1, Jens Maurer wrote:
On 01/15/2017 09:56 PM, Martin Ba wrote:
> Hi.
>
> I'm currently trying to understand a few ... interesting ... observations I have been making wrt. the C++ Standard and using char arrays as raw storage.
>
> Essentially, as far as I can tell (have been told), the current C++ Standard only allows using a char array as raw storage (see also std::aligned_storage) when objects are put into this via placement new, even for e.g. int or other trivial(*) types.
>
> See: http://stackoverflow.com/questions/41624685/is-placement-new-legally-required-for-putting-an-int-into-a-char-array or related questions where I'm told I'm expected to do the following:
>
> alignas(int) char buf[sizeof(int)];
>
> void f() {
>   // turn the memory into an int: (??) from the POV of the abstract machine!
>   ::new (buf) int; // is this strictly required? (aside: it's obviously a no-op)
>
>   // access storage:
>   *((int*)buf) = 42; // for this discussion, just assume the cast itself yields the correct pointer value
> }

> What I would be interested in is whether this has been discussed in the committee (CWG?) in the last very few years
> and whether there is any agreement if omitting the placement new (for trivial type) should be allowed or if Standard C++ should absolutely require the placement new.

I believe I can say that CWG agrees that the words now in C++17 correctly
reflect the intent that you need the placement new in the case above.


If this is really the intent, then this needs to be more clearly communicated and, I feel, rationalized. (Maybe it already has? That's what the OP was actually about.)
 
If you believe that intent is misguided, feel free to propose a change.

Yes, I very much feel the intent is misguided. For two reasons:
  • This intent declares UB totally reasonable legacy code. At least I consider it reasonable to *not* have to place a no-op placement new in straightforward buffer-backed code for trivial types.
  • Since C doesn't have placement new, any C code that uses a char buffer to back any other typed data is automatically UB in C++. Another unnecessary incompatibility.  

The only change I can propose is that CWG consider some way to make this work. (As it does in practice anyway *today*.) As far as I understand from P0137R1, the problem at the moment is that the definition of objects (in the memory location sense) doesn't allow this, and that it's pretty complex and hard to come up with something that does allow it without restricting other things.


I'm sure compiler writers will explain to you how that substantially
pessimizes their code generation.

For this specific case, I do hope not. I'm braced for anything.
 

> p.s.: (*) is "trivial type" the correct term?
>
> p.p.s.: My personal impression on the matter is that requiring the placement new for trivial types (like int, ...) is rather insane and the amount of real world code compiled with C++ compilers
> that would be broken should any C++ compiler/optimizer ever manage to actually treat this as UB is quite huge. 'Course I may be totally off here. Just take this as a disclaimer :-)

Some compilers might make special allowances for their particular user
community, precisely out of concerns you cited.  That doesn't make your
code any better.


As far as the C++ Standard goes, I'm not so much concerned with "better" but with not allowing future compilers to break reasonable legacy code.

*When* using char arrays (or malloc'ed memory) as backing store for trivial types, I fully assume most (non generic) existing code to *not* employ placement new, simply because it's the straightforward thing to (not) do and the placement new would be a no-op and all compilers up to today seem to generate working code.

I think, here, the C++ Standard should take into account this "existing practice". (Yeah, I know the same arguments were/are raised wrt. signed integer overflow or the null-pointer-check elimination, but I at least feel those cases, while possibly problematic in quite a few cases, are historically rather more clear-cut. And at least both affect C and C++ code the same.)

cheers.


Jens Maurer

Jan 16, 2017, 5:41:33 PM
to std-dis...@isocpp.org
On 01/16/2017 10:49 PM, Martin Ba wrote:
> I believe I can say that CWG agrees that the words now in C++17 correctly
> reflect the intent that you need the placement new in the case above.
>
>
> If this is really the intent, then this needs to be more clearly communicated

Well, the C++ committee doesn't have a PR department. What's unclear about:

1.8p1 [intro.object]

"The constructs in a C++ program create, destroy, refer to, access, and manipulate objects.
An object is created by a definition (3.1), by a new-expression (5.3.4), when implicitly
changing the active member of a union (9.3), or when a temporary object is created
(4.4, 12.2). ..."

> and, I feel, rationalized. (Maybe it already has? That's what the OP was actually about.)

I've reviewed the notes on the CWG discussions for P0137Rx and I could not
find anything that would directly talk about your example.

>
> If you believe that intent is misguided, feel free to propose a change.
>
>
> Yes, I very much feel the intent is misguided. For two reasons:
>
> * This intent declares UB totally reasonable legacy code.

Even legacy code should have used "memcpy" here.
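
A minimal sketch of the memcpy approach, assuming the goal is only to get an int's value into and out of a char buffer (the function name is hypothetical). No int object is ever accessed inside the buffer, so the question of whether one exists there does not arise.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: round-trips an int through a char buffer using only
// byte copies. Compilers commonly turn such fixed-size memcpy calls into
// plain loads and stores.
int store_and_load_via_memcpy(int value) {
    alignas(int) char buf[sizeof(int)];

    std::memcpy(buf, &value, sizeof value);     // write the bytes of value into buf

    int result;
    std::memcpy(&result, buf, sizeof result);   // read them back into a real int
    return result;
}
```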

> At least I consider it reasonable to *not* have to place a no-op placement new in straightforward buffer-backed code for trivial types.
> * Since C doesn't have placement new, any C code that uses a char buffer to back any other typed data is automatically UB in C++. Another unnecessary incompatibility.
>
> The only change I can propose is that CWG consider some way to make this work. (As it does in practice anyway *today*.) As far as I understand from P0137R1, the problem at the moment is that the definition of objects (in the memory location sense) doesn't allow this, and that it's pretty complex and hard to come up with something that does allow it without restricting other things.

Well, without a specific proposal on the table for rules that make this
work, but don't detrimentally affect other cases, I'm afraid nothing much
will happen.

> *When* using char arrays (or malloc'ed memory) as backing store for trivial types, I fully assume most (non generic) existing code to *not* employ placement new, simply because it's the straightforward thing to (not) do and the placement new would be a no-op and all compilers up to today seem to generate working code.

When compilers introduced type-based alias analysis, there was lots of broken code
that could be made to work with -fno-strict-aliasing. The code, eventually, got
fixed. I'm sure people using char arrays as backing store will fix their code
eventually, or learn to live with the shame of -fobjects-spring-to-life eternally.
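
As a sketch of what type-based alias analysis permits (the function name is made up for illustration): an int and a float cannot legally occupy the same object, so the compiler may assume that the store through `f` does not change `*i`.

```cpp
#include <cassert>

// With type-based alias analysis, a compiler may return the constant 1
// here without reloading *i, because a store to a float is assumed not to
// modify an int. Code that passes the same address for both parameters
// (via a cast) has undefined behavior and may observe a stale value.
int write_both(int* i, float* f) {
    *i = 1;
    *f = 2.0f;
    return *i;
}
```

Called with two distinct objects, as intended, the function simply returns 1. The -fno-strict-aliasing flag mentioned above asks GCC not to make this assumption.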

> I think, here, the C++ Standard should take into account this "existing practice". (Yeah, I know the same arguments were/are raised wrt. signed integer overflow or the null-pointer-check elimination, but I at least feel those cases, while possibly problematic in quite a few cases, are historically rather more clear-cut. And at least both affect C and C++ code the same.)

Again, without a proposal, nothing is likely to happen.
(Off-topic: The rules for signed bit-shifts are subtly different between
C and C++, last I looked.)

Jens

T. C.

Jan 16, 2017, 6:14:35 PM
to ISO C++ Standard - Discussion


On Monday, January 16, 2017 at 3:29:02 PM UTC-5, Nicol Bolas wrote:
On Monday, January 16, 2017 at 2:08:39 PM UTC-5, Martin Ba wrote:
In the same vein, I'm sure we can imagine several transformations that break code that has no "placement new" (from my OP) that used to work (and still does).

Such as?


See Richard Smith's comment in this Reddit thread: https://www.reddit.com/r/cpp/comments/5fk3wn/undefined_behavior_with_reinterpret_cast/dal28n0/ for an example.


Richard Smith

Jan 16, 2017, 7:23:30 PM
to std-dis...@isocpp.org
On 16 January 2017 at 13:49, Martin Ba <0xcdc...@gmx.at> wrote:
On Monday, January 16, 2017 at 10:11:19 PM UTC+1, Jens Maurer wrote:
On 01/15/2017 09:56 PM, Martin Ba wrote:
> Hi.
>
> I'm currently trying to understand a few ... interesting ... observations I have been making wrt. the C++ Standard and using char arrays as raw storage.
>
> Essentially, as far as I can tell (have been told), the current C++ Standard only allows using a char array as raw storage (see also std::aligned_storage) when objects are put into this via placement new, even for e.g. int or other trivial(*) types.
>
> See: http://stackoverflow.com/questions/41624685/is-placement-new-legally-required-for-putting-an-int-into-a-char-array or related questions where I'm told I'm expected to do the following:
>
> alignas(int) char buf[sizeof(int)];
>
> void f() {
>   // turn the memory into an int: (??) from the POV of the abstract machine!
>   ::new (buf) int; // is this strictly required? (aside: it's obviously a no-op)
>
>   // access storage:
>   *((int*)buf) = 42; // for this discussion, just assume the cast itself yields the correct pointer value
> }

> What I would be interested in is whether this has been discussed in the committee (CWG?) in the last very few years
> and whether there is any agreement if omitting the placement new (for trivial type) should be allowed or if Standard C++ should absolutely require the placement new.

I believe I can say that CWG agrees that the words now in C++17 correctly
reflect the intent that you need the placement new in the case above.


If this is really the intent, then this needs to be more clearly communicated and, I feel, rationalized. (Maybe it already has? That's what the OP was actually about.)
 
If you believe that intent is misguided, feel free to propose a change.

Yes, I very much feel the intent is misguided. For two reasons:
  • This intent declares UB totally reasonable legacy code. At least I consider it reasonable to *not* have to place a no-op placement new in straightforward buffer-backed code for trivial types.
  • Since C doesn't have placement new, any C code that uses a char buffer to back any other typed data is automatically UB in C++. Another unnecessary incompatibility.

The above code has undefined behavior in C too. C's effective type rules do not permit changing the effective type of a declared object to something other than its declared type; it only permits that for objects allocated with malloc or similar.

In the case where the storage /was/ allocated through malloc or similar, C++ requires a placement new where C simply allows the effective type to change through a store (and some parts of the C effective type model don't work as a result...). It would seem reasonable to me for such allocation functions to be specified to have implicitly created whatever set of objects the following code relies on existing[1] -- the compiler typically has to make that pessimistic assumption anyway, since it doesn't know what objects the implementation of an opaque function might create, so it seems like we'd lose little and gain more C compatibility by guaranteeing something like that.

 [1]: that is, we could require the compiler to assume that malloc runs a sequence of placement news (for types with trivial default construction and destruction) before it returns, where that set is chosen to be whatever set gives the program defined behavior -- if such a set exists
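
For comparison, a sketch of what the malloc case looks like under the rules as described in this thread, i.e. with the explicit placement new that the suggestion above would make implicit. The function name is hypothetical.

```cpp
#include <cassert>
#include <cstdlib>
#include <new>

// Hypothetical helper: malloc only provides storage; the placement new is
// what creates the int object in it under the current wording.
int malloc_store_and_load(int value) {
    void* storage = std::malloc(sizeof(int));
    if (storage == nullptr) {
        std::abort();   // keep the sketch simple: no recovery on allocation failure
    }

    int* p = ::new (storage) int(value);   // creates the int in the malloc'ed storage
    int result = *p;

    // int is trivially destructible, so no destructor call is needed
    // before the storage is released.
    std::free(storage);
    return result;
}
```

In C, the equivalent code would just store through an `int*` obtained from the `malloc` result, with the store itself setting the effective type.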

Nicol Bolas

Jan 16, 2017, 8:47:36 PM
to ISO C++ Standard - Discussion

The result of a "sequence of placement news" on a piece of memory is the creation of an object of the last type `new`ed. The C++ object model does not permit storage to have an indeterminate object or many separate objects (outside of nesting). If you allocate 4 bytes and new an `int` into it, then it is an int. If you new a `float` into it, it stops being an `int`.
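
A small sketch of that storage-reuse rule, assuming `int` and `float` have the same size (checked with a `static_assert`); the function name is made up for illustration.

```cpp
#include <cassert>
#include <new>

static_assert(sizeof(int) == sizeof(float),
              "sketch assumes int and float have the same size");

// Hypothetical helper: creates an int in a buffer, then reuses the same
// storage for a float, which ends the lifetime of the int.
float reuse_storage() {
    alignas(int) alignas(float) unsigned char storage[sizeof(float)];

    int* pi = ::new (static_cast<void*>(storage)) int(7);
    int seen = *pi;   // fine: the int is alive here

    float* pf = ::new (static_cast<void*>(storage)) float(1.5f);
    // From here on the storage holds a float; reading *pi would be
    // undefined behavior because the int's lifetime has ended.

    return seen + *pf;
}
```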

So, how would you suggest the object model change to accommodate such a thing? What is the syntax that causes a piece of storage that contains all objects to contain just one?

Personally? I say let it go. C++ programmers have managed to survive this being UB since at least 2004. We're teaching C++ programmers nowadays to avoid pointless casting; the average C++ programmer today is far more likely to employ placement-new than to do casts and assume it was constructed.

I'd rather the committee spend time shoring up the object model for genuine C++ purposes, like making it possible for `vector` to be implemented without UB.

Richard Smith

Jan 16, 2017, 9:05:11 PM1/16/17
to std-dis...@isocpp.org
I never said they would all be at the start of the allocation.

> So, how would you suggest the object model change to accommodate such a thing? What is the syntax that causes a piece of storage that contains all objects to contain just one?
>
> Personally? I say let it go. C++ programmers have managed to survive this being UB since at least 2004. We're teaching C++ programmers nowadays to avoid pointless casting; the average C++ programmer today is far more likely to employ placement-new than to do casts and assume it was constructed.
>
> I'd rather the committee spend time shoring up the object model for genuine C++ purposes, like making it possible for `vector` to be implemented without UB.


Nicol Bolas

Jan 16, 2017, 9:25:26 PM1/16/17
to ISO C++ Standard - Discussion

... that doesn't make sense. I mean, where else are they going to be except for the start? If I allocate 4 bytes, then you need to `new` up both `int` and `float` (assuming they're both 4 bytes, of course). But there's no room to `new` them at different addresses within that allocation, since the allocation is only 4 bytes.

So where would you be allocating these different objects?

Richard Smith

Jan 16, 2017, 11:09:59 PM1/16/17
to std-dis...@isocpp.org
I don't know what this example is supposed to demonstrate.

> So where would you be allocating these different objects?

If you allocate 8 bytes, there could be an int object at offset 0 and a float object at offset 4.

Demi Obenour

Jan 17, 2017, 12:36:44 AM1/17/17
to 'Edward Catmur' via ISO C++ Standard - Discussion
I very much disagree.

C++ can use placement new, true.  But C cannot, and many programs need to compile and run as both.

Furthermore, I don't know of any reasonable way a compiler could exploit this to produce better code.  Strict aliasing doesn't apply, since char pointers can alias anything.  More importantly, you make it impossible to take an aligned char array — say, one filled in by an I/O operation — and cast it to an array of (say) int without an O(n) copy and a 2x memory overhead!  That is anything BUT fast.  C++ should NOT impose such overheads.


Thiago Macieira

Jan 17, 2017, 2:41:34 AM1/17/17
to std-dis...@isocpp.org
On Tuesday, 17 January 2017, at 00:36:42 PST, Demi Obenour wrote:
> C++ can use placement new, true. But C cannot, and many programs need to
> compile and run as both.

That argument doesn't apply. You can write code that compiles as both C and
C++, but that does not mean the rules from one language apply in the other.

If you need to write C++-specific code, you can always just use #ifdef
__cplusplus.
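
One way to read this suggestion, as a minimal sketch: a header shared by the C and C++ builds hides the difference behind a macro. The macro name `BEGIN_INT` and the helper `store_and_load` are hypothetical and exist only for this illustration.

```cpp
#include <assert.h>
#include <stdlib.h>

#ifdef __cplusplus
#include <new>
/* C++ build: a placement new-expression formally starts the int's lifetime. */
#define BEGIN_INT(p) (::new (static_cast<void*>(p)) int)
#else
/* C build: a plain conversion is enough; the first store types the storage. */
#define BEGIN_INT(p) ((int*)(p))
#endif

/* Hypothetical helper that compiles unchanged as C and as C++. */
int store_and_load(int value)
{
    void* raw = malloc(sizeof(int));
    int* pi;
    int result;
    assert(raw != NULL);
    pi = BEGIN_INT(raw);
    *pi = value;
    result = *pi;
    free(raw);
    return result;
}
```

For `int` the C++ branch is expected to compile to the same machine code as the C branch; the macro only changes what the abstract machine is told.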

--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
Software Architect - Intel Open Source Technology Center

Andrey Semashev

Jan 17, 2017, 5:47:24 AM1/17/17
to std-dis...@isocpp.org
On 01/17/17 04:47, Nicol Bolas wrote:
>
> The result of a "sequence of placement news" on a piece of memory is the
> creation of an object of the last type `new`ed. The C++ object model
> does not permit storage to have an indeterminate object or many separate
> objects (outside of nesting).

Does it not? Could you provide a reference to the standard?

I have always assumed the following was well-defined code:

char* p = static_cast< char* >(malloc(sizeof(int) + sizeof(float)));
int* pi = new (p) int;
float* pf = new (p + sizeof(int)) float;

(null checks and alignment accounting skipped for brevity).
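
For reference, a sketch of the same idea with the skipped null check and alignment accounting filled in. The names `IntAndFloat`, `align_up` and `make_int_and_float` are made up for this illustration and are not part of the snippet above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical result type: two objects living in one malloc'ed block.
struct IntAndFloat {
    void* block;  // owning pointer, to be released with std::free
    int* i;
    float* f;
};

// Rounds `offset` up to the next multiple of `align` (a power of two).
inline std::size_t align_up(std::size_t offset, std::size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (offset + align - 1) & ~(align - 1);
}

inline IntAndFloat make_int_and_float(int iv, float fv)
{
    const std::size_t float_offset = align_up(sizeof(int), alignof(float));
    void* block = std::malloc(float_offset + sizeof(float));
    if (block == nullptr)
        return IntAndFloat{nullptr, nullptr, nullptr};

    char* bytes = static_cast<char*>(block);
    // Each placement new creates one object in its own, non-overlapping bytes.
    int* pi = ::new (static_cast<void*>(bytes)) int(iv);
    float* pf = ::new (static_cast<void*>(bytes + float_offset)) float(fv);
    return IntAndFloat{block, pi, pf};
}
```

The pointers returned by the new-expressions are the ones used afterwards, so no cast of the raw pointer is needed to reach either object.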

Regarding the OP, I think it is fair to say that malloc returns
storage, which is a sequence of bytes (chars) that is allowed to alias
any other type. In that sense, the compiler has no way to know what
actual objects are created by malloc in that storage, so when the user
cases the returned pointer, type aliasing effectively happens. Whether
that is UB or not is a grey area because we don't know if the storage
actually contains the objects that we cast the pointer returned by
malloc to. Regardless, the compiler cannot assume that the code is UB
and e.g. remove it.

> Personally? I say let it go. C++ programmers have managed to survive
> this being UB since at least 2004. We're teaching C++ programmers
> nowadays to avoid pointless casting; the average C++ programmer today is
> far more likely to employ placement-new than to do casts and assume it
> was constructed.

I disagree, because it requires programmers to write pointless code that
is known to be no-op anyway, just to satisfy the spec.

char* p = static_cast< char* >(malloc(sizeof(int) * 10));

// What is this code written for?
char* pi = p, *pe = p + sizeof(int) * 10;
for (; pi != pe; pi += sizeof(int))
{
    new (pi) int;
}

// Use the array of ints
int* q = reinterpret_cast< int* >(p);
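
If the loop has to exist, it can at least be written once. The following is a sketch under the assumption that a small helper is acceptable; `begin_lifetimes` is a hypothetical name. C++17 offers `std::uninitialized_default_construct_n` in `<memory>` for the same job.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <type_traits>

// Runs a default-initializing placement new for `count` consecutive Ts in
// `raw` and returns a pointer to the first one (nullptr when count is 0).
// The caller guarantees that `raw` is suitably aligned and large enough.
template <class T>
T* begin_lifetimes(void* raw, std::size_t count)
{
    static_assert(std::is_trivially_default_constructible<T>::value,
                  "intended for types whose default construction is a no-op");
    assert(raw != nullptr || count == 0);
    char* bytes = static_cast<char*>(raw);
    T* first = nullptr;
    for (std::size_t i = 0; i != count; ++i) {
        T* p = ::new (static_cast<void*>(bytes + i * sizeof(T))) T;
        if (i == 0)
            first = p;
    }
    return first;
}
```

A call such as `int* q = begin_lifetimes<int>(p, 10);` would then replace both the loop and the final `reinterpret_cast`. Whether indexing through `q` is covered by the pointer-arithmetic rules is a separate question that this helper does not settle.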

Andrey Semashev

Jan 17, 2017, 5:50:56 AM1/17/17
to std-dis...@isocpp.org
On 01/17/17 13:47, Andrey Semashev wrote:
> On 01/17/17 04:47, Nicol Bolas wrote:
>>
>> The result of a "sequence of placement news" on a piece of memory is the
>> creation of an object of the last type `new`ed. The C++ object model
>> does not permit storage to have an indeterminate object or many separate
>> objects (outside of nesting).
>
> Does it not? Could you provide a reference to the standard?
>
> I have always assumed the following was well-defined code:
>
> char* p = static_cast< char* >(malloc(sizeof(int) + sizeof(float)));
> int* pi = new (p) int;
> float* pf = new (p + sizeof(int)) float;
>
> (null checks and alignment accounting skipped for brevity).
>
> Regarding the OP, I think it is fair to say that malloc returns a
> storage, which is a sequence of bytes (chars) that is allowed to alias
> any other type. In that sense, the compiler has no way to know what
> actual objects are created by malloc in that storage, so when the user
> cases the returned pointer,

so when the user casts...

Robert Haberlach

Jan 17, 2017, 6:00:32 AM1/17/17
to std-dis...@isocpp.org
On 1/17/2017 11:47 AM, Andrey Semashev wrote:
> On 01/17/17 04:47, Nicol Bolas wrote:
>>
>> The result of a "sequence of placement news" on a piece of memory is the
>> creation of an object of the last type `new`ed. The C++ object model
>> does not permit storage to have an indeterminate object or many separate
>> objects (outside of nesting).
>
> Does it not? Could you provide a reference to the standard?
See [intro.object]/6 as modified by P0137.

>
> I have always assumed the following was well-defined code:
>
> char* p = static_cast< char* >(malloc(sizeof(int) + sizeof(float)));
> int* pi = new (p) int;
> float* pf = new (p + sizeof(int)) float;
>
> (null checks and alignment accounting skipped for brevity).
Yes, this is well-defined AFAICS.

> Regarding the OP, I think it is fair to say that malloc returns a
> storage, which is a sequence of bytes (chars) that is allowed to alias
> any other type.
Storage cannot be aliased. Objects can.

> In that sense, the compiler has no way to know what actual objects are
> created by malloc in that storage,
There are no objects in that storage. There is a very specific list of
situations in which objects are created, and calls to malloc are
excluded (and that has always been so).

> so when the user cases the returned pointer, type aliasing effectively
> happens. Whether that is UB or not is a grey area because we don't
> know if the storage actually contains the objects that we cast the
> pointer returned by malloc to.
Yes, we do know that--see above. malloc and its semantics are known to
the compiler.

>> Personally? I say let it go. C++ programmers have managed to survive
>> this being UB since at least 2004. We're teaching C++ programmers
>> nowadays to avoid pointless casting; the average C++ programmer today is
>> far more likely to employ placement-new than to do casts and assume it
>> was constructed.
>
> I disagree, because it requires programmers to write pointless code
> that is known to be no-op anyway, just to satisfy the spec.

"Pointless" and "satisfy the spec" are contradictions. You have to start
viewing C++ as an abstract language with an abstract object model and
not some type of hacking playground, where as long as your pointers
contain the correct value and the memory is aligned, everything is well.

>
> char* p = static_cast< char* >(malloc(sizeof(int) * 10));
>
> // What is this code written for?
> char* pi = p, *pe = p + sizeof(int) * 10;
> for (; pi != pe; pi += sizeof(int))
> {
> new (pi) int;
> }

Why didn't you just allocate using new[]? Of course you have to explain
to the implementation that each raw memory location corresponds to an
int, and *you* imposed that burden on yourself--not the language. If
you're so into malloc, try C.

Andrey Semashev

Jan 17, 2017, 6:53:04 AM1/17/17
to std-dis...@isocpp.org
On 01/17/17 14:00, Robert Haberlach wrote:
> On 1/17/2017 11:47 AM, Andrey Semashev wrote:
>> On 01/17/17 04:47, Nicol Bolas wrote:
>>>
>>> The result of a "sequence of placement news" on a piece of memory is the
>>> creation of an object of the last type `new`ed. The C++ object model
>>> does not permit storage to have an indeterminate object or many separate
>>> objects (outside of nesting).
>>
>> Does it not? Could you provide a reference to the standard?
> See [intro.object]/6 as modified by P0137.
>
>>
>> I have always assumed the following was well-defined code:
>>
>> char* p = static_cast< char* >(malloc(sizeof(int) + sizeof(float)));
>> int* pi = new (p) int;
>> float* pf = new (p + sizeof(int)) float;
>>
>> (null checks and alignment accounting skipped for brevity).
> Yes, this is well-defined AFAICS.
>
>> Regarding the OP, I think it is fair to say that malloc returns a
>> storage, which is a sequence of bytes (chars) that is allowed to alias
>> any other type.
> Storage cannot be aliased. Objects can.
>
>> In that sense, the compiler has no way to know what actual objects are
>> created by malloc in that storage,
> There are no objects in that storage. There is a very specific list of
> situations in which objects are created, and calls to malloc are
> excluded (and that has always been so).

I don't find where it is excluded. malloc is an opaque function that
returns a void pointer. The compiler has no way to know what objects are
created in the storage accessible through that pointer.

>> so when the user cases the returned pointer, type aliasing effectively
>> happens. Whether that is UB or not is a grey area because we don't
>> know if the storage actually contains the objects that we cast the
>> pointer returned by malloc to.
> Yes, we do know that--see above. malloc and its semantics are known to
> the compiler.

malloc is imported from C in [c.malloc]. In C99, 7.20.3/1, there is this
part:

> The pointer returned if the allocation succeeds is suitably aligned
> so that it may be assigned to a pointer to any type of object
> and then used to access such an object or an array of such objects
> in the space allocated (until the space is explicitly deallocated).
> The lifetime of an allocated object extends from the allocation until
> the deallocation.

So, according to C, the returned pointer may represent whatever object
the pointer is cast to.

Ok, you may argue that that description is given in terms of C, and that
doesn't mean that the same is valid with regard to C++ objects,
including trivial ones like int. Fair enough, but in that case the C++
standard should clarify that. And considering that there are allocation
functions other than malloc/calloc, the only sane behavior would be the
one compatible with C.

>>> Personally? I say let it go. C++ programmers have managed to survive
>>> this being UB since at least 2004. We're teaching C++ programmers
>>> nowadays to avoid pointless casting; the average C++ programmer today is
>>> far more likely to employ placement-new than to do casts and assume it
>>> was constructed.
>>
>> I disagree, because it requires programmers to write pointless code
>> that is known to be no-op anyway, just to satisfy the spec.
>
> "Pointless" and "satisfy the spec" are contradictions.

No, unless you write code just to satisfy the spec. I personally don't
find that kind of activity productive.

> You have to start
> viewing C++ as an abstract language with an abstract object model and
> not some type of hacking playground, where as long as your pointers
> contain the correct value and the memory is aligned, everything is well.

Abstractions are a tool that the spec writers use to describe
generalized behavior of multiple conforming implementations. If that
tool requires people to write pointless code then the tool is broken.

>> char* p = static_cast< char* >(malloc(sizeof(int) * 10));
>>
>> // What is this code written for?
>> char* pi = p, *pe = p + sizeof(int) * 10;
>> for (; pi != pe; pi += sizeof(int))
>> {
>> new (pi) int;
>> }
>
> Why didn't you just allocate using new[]?

Because I might have reasons not to. E.g. to pass that pointer to a C
library later that I call from my otherwise C++ code. Or we can pretend
that that is not malloc but posix_memalign/aligned_alloc/whatever_alloc
that provides additional properties of the allocated memory, like
increased alignment, that cannot be provided by operator new.

> Of course you have to explain
> to the implementation that each raw memory location corresponds to an
> int, and *you* imposed that burden on yourself--not the language.

The point I'm making is that the requirement to explain this is imposed
by the standard, while there is no technical reason to do that. That is
what I called "writing code just to satisfy the spec".

Richard Hodges

Jan 17, 2017, 7:37:54 AM1/17/17
to std-dis...@isocpp.org
It seems to me that this entire discussion can be paraphrased as, "is c++ compatible with c or not?". [compatibility being defined as PODs legally created by one are valid in the other].

The answer appears, at least insofar as the standard is concerned, "no, or at best, implementation defined".

I think it would be fair to say that the vast majority of c++'s user base would hope that the answer is, "yes".

I think it's therefore reasonable that the standard should give PODs special treatment, allowing them to have been created (default-initialised) simply by allocating properly aligned storage of sufficient size.

This would then allow c++ to interoperate with c (in both directions) both legally and de-facto.

Since the standard does seem to acknowledge the C language, is there any reasonable argument that suggests that this explicit interoperability should not be in the standard? 

R


Nicol Bolas

Jan 17, 2017, 8:31:48 AM1/17/17
to ISO C++ Standard - Discussion

You originally said:

> we could require the compiler to assume that malloc runs a sequence of placement news (for types with trivial default construction and destruction) before it returns, where that set is chosen to be whatever set gives the program defined behavior -- if such a set exists

Therefore, if I 'malloc' 4 bytes of storage, then placement `new` will be executed on that storage for both `int` and `float`. Among others. That's what you're asking for.


And as I said, that would make the memory both an `int` and a `float` at the same time. You then said:

> I never said they would all be at the start of the allocation.

Then where is it going to be? Where does the `int` get created and where does the `float` get created, since there's not room enough for both?

I'm trying to understand what you're suggesting the standard do here, and thus far, it does not make sense.

Nicol Bolas

Jan 17, 2017, 8:33:15 AM1/17/17
to ISO C++ Standard - Discussion


On Tuesday, January 17, 2017 at 12:36:44 AM UTC-5, Demi Obenour wrote:
> I very much disagree.
>
> C++ can use placement new, true.  But C cannot, and many programs need to compile and run as both.
>
> Furthermore, I don't know of any reasonable way a compiler could exploit this to produce better code.  Strict aliasing doesn't apply, since char pointers can alias anything.  More importantly, you make it impossible to take an aligned char array, say one filled in by an I/O operation, and cast it to an array of (say) int without an O(n) copy and a 2x memory overhead!  That is anything BUT fast.  C++ should NOT impose such overheads.

Or you could just create an `int` array to begin with, then pass it to an I/O operation to be filled in.
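
A minimal sketch of that approach, assuming the data arrives through a `std::istream` in the host's native `int` representation. The names `read_ints` and `round_trip_demo` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Reads up to `count` ints from `in`. The int objects exist before any I/O
// happens, so no lifetime question arises afterwards.
inline std::vector<int> read_ints(std::istream& in, std::size_t count)
{
    std::vector<int> values(count);
    // Viewing the ints as bytes is allowed: char may alias any object type.
    in.read(reinterpret_cast<char*>(values.data()),
            static_cast<std::streamsize>(count * sizeof(int)));
    // Keep only the elements that were read completely.
    values.resize(static_cast<std::size_t>(in.gcount()) / sizeof(int));
    return values;
}

// Usage example: serialize three ints to bytes and read them back.
inline std::vector<int> round_trip_demo()
{
    const int source[3] = {1, -2, 300};
    const std::string bytes(reinterpret_cast<const char*>(source),
                            sizeof(source));
    std::istringstream in(bytes);
    std::vector<int> result = read_ints(in, 3);
    assert(result.size() == 3);
    return result;
}
```

The cast goes from `int*` to `char*`, which is the direction the aliasing rules permit, and there is neither an extra copy nor a second buffer.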

Nicol Bolas

Jan 17, 2017, 8:39:19 AM1/17/17
to ISO C++ Standard - Discussion


On Tuesday, January 17, 2017 at 6:53:04 AM UTC-5, Andrey Semashev wrote:
> > Of course you have to explain
> > to the implementation that each raw memory location corresponds to an
> > int, and *you* imposed that burden on yourself--not the language.
>
> The point I'm making is that the requirement to explain this is imposed
> by the standard, while there is no technical reason to do that.

If you know that there really is "no technical reason to do that", then you ought to be able to propose changes to the specification that permit such a thing without creating an object model that is inherently self-contradictory or is otherwise broken.

So what exactly do you suggest we change? Not merely, "do stuff to make this work". But what exact changes should be made to the specification?

Nicol Bolas

Jan 17, 2017, 8:45:37 AM1/17/17
to ISO C++ Standard - Discussion


On Tuesday, January 17, 2017 at 7:37:54 AM UTC-5, Richard Hodges wrote:
> It seems to me that this entire discussion can be paraphrased as, "is c++ compatible with c or not?". [compatibility being defined as PODs legally created by one are valid in the other].
>
> The answer appears, at least insofar as the standard is concerned, "no, or at best, implementation defined".
>
> I think it would be fair to say that the vast majority of c++'s user base would hope that the answer is, "yes".
>
> I think it's therefore reasonable that the standard should give PODs special treatment, allowing them to have been created (default-initialised) simply by allocating properly aligned storage of sufficient size.
>
> This would then allow c++ to interoperate with c (in both directions) both legally and de-facto.

It would also create a dysfunctional C++ object model. A piece of storage would have to have every object that could fit into it all at the same time. You're basically saying that any piece of memory should be able to be treated as a union of all appropriate types.

As I understand it, that's not even the C object model; it manifests objects in arbitrary memory by you writing to them.

Robert Haberlach

Jan 17, 2017, 8:52:44 AM1/17/17
to std-dis...@isocpp.org
On 1/17/2017 12:52 PM, Andrey Semashev wrote:
> On 01/17/17 14:00, Robert Haberlach wrote:
>> There is a very specific list of
>> situations in which objects are created, and calls to malloc are
>> excluded (and that has always been so).
>
> I don't find where it is excluded. malloc is an opaque function that
> returns a void pointer. The compiler has no way to know what objects
> are created in the storage accessible through that pointer.
It isn't explicitly excluded; rather, malloc is specified as allocating
storage, and the list does not mention malloc as creating an object
(e.g. a char array).

> Ok, you may argue that that description is given in terms of C, and
> that doesn't mean that the same is valid with regard to C++ objects,
> including trivial ones like int. Fair enough, but in that case the C++
> standard should clarify that.
It doesn't need to; the status quo (as mentioned in P0137 as a drafting
note) is that malloc does not create objects. Period. You are needlessly
sceptical about things that are clear to everyone else.

> And considering that there are allocation functions other than
> malloc/calloc, the only sane behavior would be the one compatible with C.
I disagree.

>
>>>> Personally? I say let it go. C++ programmers have managed to survive
>>>> this being UB since at least 2004. We're teaching C++ programmers
>>>> nowadays to avoid pointless casting; the average C++ programmer
>>>> today is
>>>> far more likely to employ placement-new than to do casts and assume it
>>>> was constructed.
>>>
>>> I disagree, because it requires programmers to write pointless code
>>> that is known to be no-op anyway, just to satisfy the spec.
>>
>> "Pointless" and "satisfy the spec" are contradictions.
>
> No, unless you write code just to satisfy the spec. I personally don't
> find that kind of activity productive.
We have a "spec" (international standard) for C++, and you can either
write code conforming to it or not. But don't complain if non-conforming
code doesn't execute as you intended. The spec did not say "repeat each
variable's declaration for no reason". It says "if you want to use an
uninitialized memory location as an object of some type T, you must
express this by employing placement new on it". Just because it's a
no-op doesn't mean it's pointless.

>> Of course you have to explain
>> to the implementation that each raw memory location corresponds to an
>> int, and *you* imposed that burden on yourself--not the language.
>
> The point I'm making is that the requirement to explain this is
> imposed by the standard, while there is no technical reason to do
> that. That is what I called "writing code just to satisfy the spec".
If you omit it, the optimizer will screw with your code. But I guess
there's little point arguing with some kind of C fundamentalist.

Richard Hodges

Jan 17, 2017, 9:09:03 AM1/17/17
to std-dis...@isocpp.org
> It would also create a dysfunctional C++ object model. A piece of storage would have to have every object that could fit into it all at the same time. You're basically saying that any piece of memory should be able to be treated as a union of all appropriate types.

> As I understand it, that's not even the C object model; it manifests objects in arbitrary memory by you writing to them.

OK, I think it's reasonable to require an intrinsic to have been written to before it is an 'object'. That makes sense as it allows the use of sentinels etc. Presumably this is why C mandates it this way.

So I'll modify my argument while continuing on the theme: What's sauce for the goose ought to be sauce for the supposedly compatible gander.

I argue (and I don't think I am alone) that for intrinsics and PODs solely thereof, elements that have been written to, either by C or C++ ought to have been constructed. If they have been written to through a cast of a correctly aligned memory pointer, they ought to 'exist' in that actual memory [subject to as-if-compatible optimisations, of course]. This is what we expect in C, and it is arguably what we would be reasonable to expect in C++.

I accept that this would require a differentiation in handling between PODs and non-POD structs in the standard. I think that's reasonable:

When constructors, destructors, copy, move ops are non-trivial, we expect 'object-like' behaviour. When they are trivial (particularly when they are NOPs) we expect memory-like behaviour.

Again, this is the de-facto reality on which the code base of every c++ program that calls a C library depends. Why not codify a de-facto reality in order to legitimise it?

barry....@gmail.com

Jan 17, 2017, 9:45:15 AM1/17/17
to ISO C++ Standard - Discussion
> > Yes, I very much feel the intent is misguided. For two reasons:
> >
> >   * This intent declares UB totally reasonable legacy code.
>
> Even legacy code should have used "memcpy" here.


The problem is that, in any latency sensitive application, nobody uses memcpy here. If we're reading in some binary protocol off the wire, everybody's code is going to look something like:

switch (*reinterpret_cast<uint16_t const*>(buf)) {
case MsgA::value:
    handle(*reinterpret_cast<MsgA const*>(buf));
    break;
case MsgB::value:
    handle(*reinterpret_cast<MsgB const*>(buf));
    break;
// ...
}

instead of:

uint16_t msgType;
memcpy(&msgType, buf, sizeof(msgType));
switch (msgType) {
case MsgA::value: {
    MsgA msg;
    memcpy(&msg, buf, sizeof(msg));
    handle(msg);
    break;
}
case MsgB::value: {
    MsgB msg;
    memcpy(&msg, buf, sizeof(msg));
    handle(msg);
    break;
}
// ...
}

The reinterpret_cast version is definitely UB now and it was definitely UB before. But writing a memcpy and hoping that the compiler recognizes that it's really a reinterpret_cast (which sometimes works, sometimes doesn't) isn't really a solution. Avoiding that extra write matters.

Maybe we just need a:

template <class T, class U>
T* start_lifetime_of_object_without_any_initialization_cast(U*);
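
Until something like that exists, the memcpy variant can at least be made less verbose. This is only a sketch: `load_as` is a hypothetical helper, the layout of `MsgA` is assumed for the sake of the example, and whether the copy is optimized away still depends on the compiler.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Copies the leading sizeof(T) bytes of `buf` into a fresh T.
// The caller guarantees that `buf` holds at least sizeof(T) bytes.
template <class T>
T load_as(const unsigned char* buf)
{
    static_assert(std::is_trivially_copyable<T>::value,
                  "memcpy is only meaningful for trivially copyable types");
    assert(buf != nullptr);
    T value;
    std::memcpy(&value, buf, sizeof(T));
    return value;
}

// Hypothetical wire message; the member layout is an assumption.
struct MsgA {
    std::uint16_t type;
    std::uint16_t payload;
};
```

The dispatch then becomes `switch (load_as<std::uint16_t>(buf))` followed by `handle(load_as<MsgA>(buf));` in each case, which keeps the defined behaviour of memcpy without repeating the boilerplate.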


Andrey Semashev

Jan 17, 2017, 10:09:21 AM1/17/17
to std-dis...@isocpp.org
On 01/17/17 16:39, Nicol Bolas wrote:
> On Tuesday, January 17, 2017 at 6:53:04 AM UTC-5, Andrey Semashev wrote:
>
> > Of course you have to explain
> > to the implementation that each raw memory location corresponds to an
> > int, and *you* imposed that burden on yourself--not the language.
>
> The point I'm making is that the requirement to explain this is imposed
> by the standard, while there is no technical reason to do that.
>
> If you know that there really is "no technical reason to do that", then
> you ought to be able to propose changes to the specification that permit
> such a thing /without/ creating an object model that is inherently
> self-contradictory or is otherwise broken.
>
> So what /exactly/ do you suggest we change? Not merely, "do stuff to
> make this work". But what exact changes should be made to the specification?

I don't have a concrete proposal. Producing one would require me to
spend significantly more time than I'm currently able to. But I don't
consider that a legitimate argument in favor of disregarding my or the
OP's point. You may disagree, of course.

My gut feeling is that the C++ object model has to allow a POD object to
automatically begin its lifetime whenever it is modified in the raw
storage. This is similar to how an active member of a union begins its
lifetime on the first modification.
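
The union behaviour referred to here can be illustrated with a short sketch. The names `IntOrFloat` and `switch_active_member` are made up for the example.

```cpp
#include <cassert>

union IntOrFloat {
    int i;
    float f;
};

inline float switch_active_member()
{
    IntOrFloat u;
    u.i = 42;    // the assignment makes `i` the active member
    assert(u.i == 42);
    u.f = 1.5f;  // ends the lifetime of `i` and starts the lifetime of `f`
    return u.f;  // reading the active member is well defined
}
```

No placement new appears anywhere: for members of trivial type, a plain assignment is what starts the lifetime. The idea above is to give raw storage the same treatment.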

Nicol Bolas

Jan 17, 2017, 10:53:00 AM1/17/17
to ISO C++ Standard - Discussion


On Tuesday, January 17, 2017 at 9:09:03 AM UTC-5, Richard Hodges wrote:
> > It would also create a dysfunctional C++ object model. A piece of storage would have to have every object that could fit into it all at the same time. You're basically saying that any piece of memory should be able to be treated as a union of all appropriate types.
> >
> > As I understand it, that's not even the C object model; it manifests objects in arbitrary memory by you writing to them.
>
> OK, I think it's reasonable to require an intrinsic to have been written to before it is an 'object'. That makes sense as it allows the use of sentinels etc. Presumably this is why C mandates it this way.
>
> So I'll modify my argument while continuing on the theme: What's sauce for the goose ought to be sauce for the supposedly compatible gander.
>
> I argue (and I don't think I am alone) that for intrinsics and PODs solely thereof, elements that have been written to, either by C or C++ ought to have been constructed. If they have been written to through a cast of a correctly aligned memory pointer, they ought to 'exist' in that actual memory [subject to as-if-compatible optimisations, of course]. This is what we expect in C, and it is arguably what we would be reasonable to expect in C++.
>
> I accept that this would require a differentiation in handling between PODs and non-POD structs in the standard. I think that's reasonable:
>
> When constructors, destructors, copy, move ops are non-trivial, we expect 'object-like' behaviour. When they are trivial (particularly when they are NOPs) we expect memory-like behaviour.
>
> Again, this is the de-facto reality on which the code base of every c++ program that calls a C library depends. Why not codify a de-facto reality in order to legitimise it?

Because making something de-jure requires actually pinning down what we're talking about, rather than making broad generalizations about the way we think things ought to work. It's easy to say what you're saying, but to actually make it work without the object model becoming contradictory is a huge process.

And even then, it wouldn't get you all of your C-isms.

For example, let's say you malloc some memory and pass it along to a C API that is going to fill that memory with consecutive `int`s. OK, fine; that memory now has a bunch of `int`s in it.

Know what it doesn't have? An array:

auto alloc = malloc(sizeof(int) * n);
get_ints(alloc, n, ...);
auto ints = reinterpret_cast<int*>(alloc);
ints[5]; //UB

This is the foundation of where `vector` has to rely on UB to get things done. Because the standard makes a distinction between objects that happen to be sequential in a piece of storage and an array of objects. And pointer arithmetic, the basis of [], only works on arrays. Pointer arithmetic is explicitly not allowed to jump from one object to another object unless they are in the same array.

Go ahead; take a look at [expr.add]. It only works for arrays. And the C API did not create an array.
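
For contrast, a sketch of the variant that stays inside [expr.add]: the caller creates a real array first and only then hands it to the C-style filler. `fill_ints_stub` is a stand-in for such an API; its name and signature are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Stand-in for a C API that fills caller-provided memory with ints.
inline void fill_ints_stub(void* dst, std::size_t n)
{
    int* out = static_cast<int*>(dst);
    for (std::size_t i = 0; i != n; ++i)
        out[i] = static_cast<int>(i) * 10;
}

inline int sixth_of(std::size_t n)
{
    assert(n > 5);
    // new int[n] creates an array object, so indexing it is ordinary
    // pointer arithmetic within one array.
    std::unique_ptr<int[]> ints(new int[n]);
    fill_ints_stub(ints.get(), n);
    return ints[5];
}
```

The price is that the memory comes from `new[]` rather than `malloc`, which is exactly the constraint objected to elsewhere in the thread.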

Then there's the question of whether you're using the same type as the function filling in the object. A C API may have a `typedef struct` that contains an `int` and a `float`. If you declare a C++ struct that is layout compatible with it, then you can memcpy this data into your C++ struct equivalent (if it is also trivially copyable). But unless you are using the same type definition as the C function that generated it, the objects that C created were their struct. And therefore, it is UB to simply cast that pointer to your C++ struct and start accessing it.
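
The memcpy route described here might look as follows. This is a sketch: `c_sample` stands in for a struct from a C header, `Sample` for the separately declared C++ type, and both names are hypothetical.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// Stand-in for the struct a C header would declare.
struct c_sample {
    int id;
    float weight;
};

// Separately declared C++ type with the same member sequence.
struct Sample {
    int id;
    float weight;
};

static_assert(std::is_standard_layout<c_sample>::value &&
              std::is_standard_layout<Sample>::value,
              "both types must be standard-layout");
static_assert(std::is_trivially_copyable<Sample>::value,
              "the destination must be trivially copyable");
static_assert(sizeof(Sample) == sizeof(c_sample),
              "the types must have the same size");

// Copies the bytes instead of casting the pointer.
inline Sample to_sample(const c_sample& in)
{
    Sample out;
    std::memcpy(&out, &in, sizeof(out));
    return out;
}

// Usage example.
inline void to_sample_demo()
{
    const c_sample raw = {3, 2.5f};
    const Sample s = to_sample(raw);
    assert(s.id == 3 && s.weight == 2.5f);
}
```

The static assertions only approximate layout compatibility; they check the preconditions that can be expressed with C++11 type traits.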

So now what? Do we say that if two standard layout types are layout compatible, it's OK to just pretend that one type is another type? Because that throws strict aliasing right out the window.

Then there's the question of why you need to restrict this to POD types at all. If what causes an object to come into being is copying into storage, would it not make sense to broaden the limitation based on that? That is, allow any type for which bitwise copies make sense, i.e. trivially copyable? This is after all why C++11 created the standard-layout/trivially-copyable distinction: because POD is too narrow of a limitation based on what you're actually doing.
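
A small sketch of that distinction, using C++11 type traits. The types `Celsius` and `Tagged` are invented for the illustration.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// Has a user-provided constructor, so it is not trivial and hence not a POD,
// yet copying its bytes is still meaningful.
struct Celsius {
    double degrees;
    explicit Celsius(double d) : degrees(d) {}
};
static_assert(std::is_trivially_copyable<Celsius>::value, "bitwise copy ok");
static_assert(!std::is_trivial<Celsius>::value, "not trivial, so not a POD");
static_assert(std::is_standard_layout<Celsius>::value, "standard-layout");

// Mixed access control: trivially copyable but not standard-layout.
class Tagged {
public:
    int tag;
private:
    int hidden;
};
static_assert(std::is_trivially_copyable<Tagged>::value, "bitwise copy ok");
static_assert(!std::is_standard_layout<Tagged>::value, "layout unspecified");

// memcpy between two Celsius objects is allowed although Celsius is no POD.
inline Celsius bitwise_copy(const Celsius& in)
{
    Celsius out(0.0);
    std::memcpy(&out, &in, sizeof(out));
    assert(out.degrees == in.degrees);
    return out;
}
```

Both types fall outside "POD", but only the trivially-copyable property matters for the copy itself.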

Now, some of these issues are things we need to fix (the specification of pointer arithmetic not allowing you to access sequential objects of the same type as though they were an array, for example). But overall, making C-isms legal C++ code is a lot of standardization work for little real gain.

Let C-isms remain UB. Encourage C++ programmers to follow C++ practices. And move forward.

The most I would be willing to see is a statement that code external to the C++ program is permitted to create objects compatible with C++'s object model, on an implementation-defined basis. But we should not standardize C-isms within actual C++ code.

Robert Haberlach

Jan 17, 2017, 11:25:50 AM1/17/17
to std-dis...@isocpp.org
On 1/17/2017 3:53 PM, Nicol Bolas wrote:
> The most I would be willing to see is a statement that code external
> to the C++ program is permitted to create objects compatible with
> C++'s object model, on an implementation-defined basis.

Cf. footnote 40:

"40) This section does not impose restrictions on indirection through
pointers to memory not allocated by ::operator new. This maintains the
ability of many C++ implementations to use binary libraries and
components written in other languages. In particular, this applies to C
binaries, because indirection through pointers to memory allocated by
std::malloc is not restricted."

Nicol Bolas

unread,
Jan 17, 2017, 11:27:20 AM1/17/17
to ISO C++ Standard - Discussion, barry....@gmail.com

You already have that. It's called using placement `new` with default initialization. If `T` is trivially default constructible, then `::new(p) T` will begin the lifetime of `T` with no initialization.
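
A minimal sketch of that idiom, assuming a hypothetical helper named `begin_lifetime` (not a standard facility) and reusing the `int`-in-a-char-buffer example from the original question:

```cpp
#include <cassert>
#include <new>
#include <type_traits>

// Hypothetical helper: begin the lifetime of a T in caller-provided storage.
template <class T>
T* begin_lifetime(void* storage) {
    static_assert(std::is_trivially_default_constructible<T>::value,
                  "only meant for trivially default constructible types");
    assert(storage != nullptr);
    // Default-initialization (no parentheses or braces): for such a T no
    // constructor code runs, but an object of type T now exists.
    return ::new (storage) T;
}

// Mirrors the example from the original question.
int store_in_buffer() {
    alignas(int) unsigned char buf[sizeof(int)];
    int* p = begin_lifetime<int>(buf);
    *p = 42;   // write before reading: the value was indeterminate until now
    return *p;
}
```

The sketch writes to the object before reading it, so it does not rely on whatever bytes were in the buffer beforehand.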

Richard Hodges

unread,
Jan 17, 2017, 11:32:55 AM1/17/17
to std-dis...@isocpp.org
I cannot see any reasonable argument that pointer arithmetic should not be allowed to work on consecutive objects. int* p = get_ints_from_c(); *(++p); should absolutely be defined behaviour in c++ provided there is actually some memory at std::addressof(*p) + sizeof(p); - there is no conceivable reason why it should not. 

Note that I am asserting *should absolutely* - a very strong statement. This is because we absolutely cannot move away from c. There are no c++ operating systems. Therefore all useful libraries are written with C interfaces. Thousands of c++ wrapper libraries exist to turn those C interfaces back into c++. We don't do that because we want to. We do that because C++ is not suitable for creating portable object libraries, having no modules or common ABI.

By all means let's talk about moving forward - after we have modules, defined ABIs, an agreed-upon means of transmitting exceptions and so on.

Until then, the entire foundation of our C++ universe is C. To try to pretend otherwise is a fallacy.

OK, it's "difficult" to marry the c++ abstract machine model with the C memory model. So what? That doesn't mean that it should not be done. Clearly the definition of the C++ abstract machine needs to be revised, or made more granular. Difficulty does not come into it.

Should bitwise copies of non-trivial objects be allowed? Of course not. We can express the reason why not as a high level "because of memory model concerns" or we can be truthful: "because the pointers will be wrong and your double delete will crash the program". C++ is a low level high level language. We should not be afraid of talking about memory, addresses and pointers. That is what they are.

As much as the standards committee wants to pretend that C++ is not "C with classes, templates and exceptions", they are wrong. It is, and it will always be. If it wasn't, it wouldn't work because rather than relying on a million open-source contributors to C libraries in order to create anything but the most trivial and useless program, there would have to be a central team writing all the c++ libraries for sound, graphics, crypto, comms, etc, etc, etc.

If that were the case then we may as well dump all of the C compatibility and go for D.

C++ is an evolution of C, it relies on C in its standard libraries. Its user community relies on its compatibility with C. Its memory should be 100% compatible. Optimisers can cope. 





barry....@gmail.com

unread,
Jan 17, 2017, 12:22:24 PM1/17/17
to ISO C++ Standard - Discussion, barry....@gmail.com

You already have that. It's called using placement `new` with default initialization. If `T` is trivially default constructible, then `::new(p) T` will begin the lifetime of `T` with no initialization.

What if T isn't trivially default constructible?
What if it is, but my compiler decides to "default-initialize" some fundamental types with fixed values in debug mode? 

Nicol Bolas

unread,
Jan 17, 2017, 1:10:25 PM1/17/17
to ISO C++ Standard - Discussion, barry....@gmail.com
On Tuesday, January 17, 2017 at 12:22:24 PM UTC-5, barry....@gmail.com wrote:

You already have that. It's called using placement `new` with default initialization. If `T` is trivially default constructible, then `::new(p) T` will begin the lifetime of `T` with no initialization.

What if T isn't trivially default constructible?

If a type is not trivially default constructible, then the writer of that type has explicitly decided that the type cannot take on arbitrary values. Therefore, it can only take on a specific set of values, defined by the constructors of that type. It can still be trivially copyable, but that requires you to start from a valid instance of that type, as created by one of its constructors.

Therefore, whatever construct you want to have that adopts the data in existing storage cannot apply to non-trivially default constructible types.

What if it is, but my compiler decides to "default-initialize" some fundamental types with fixed values in debug mode?

... That's a fair point.

Thiago Macieira

unread,
Jan 17, 2017, 1:14:01 PM1/17/17
to std-dis...@isocpp.org
Em terça-feira, 17 de janeiro de 2017, às 18:09:17 PST, Andrey Semashev
escreveu:
> My gut feeling is that the C++ object model has to allow a POD object to
> automatically begin its lifetime whenever it is modified in the raw
> storage. This is similar to how an active member of a union begins its
> lifetime on the first modification.

A POD object's lifetime should begin when storage for it is provided and end
when storage is freed. In that sense, it happens before the modification
through a pointer. In fact, you could say it happens sometime inside malloc().

T. C.

unread,
Jan 17, 2017, 1:17:19 PM1/17/17
to ISO C++ Standard - Discussion, barry....@gmail.com


On Tuesday, January 17, 2017 at 1:10:25 PM UTC-5, Nicol Bolas wrote:
On Tuesday, January 17, 2017 at 12:22:24 PM UTC-5, barry....@gmail.com wrote:

You already have that. It's called using placement `new` with default initialization. If `T` is trivially default constructible, then `::new(p) T` will begin the lifetime of `T` with no initialization.

What if T isn't trivially default constructible?

If a type is not trivially default constructible, then the writer of that type has explicitly decided that the type cannot take on arbitrary values. Therefore, it can only take on a specific set of values, defined by the constructors of that type. It can still be trivially copyable, but that requires you to start from a valid instance of that type, as created by one of its constructors.

Therefore, whatever construct you want to have that adopts the data in existing storage cannot apply to non-trivially default constructible types.

Not necessarily. struct X { const int a; }; isn't trivially default constructible but nothing else you wrote above applies to it.
 

What if it is, but my compiler decides to "default-initialize" some fundamental types with fixed values in debug mode?

... That's a fair point.

 There are also optimizers that treat the placement new as clobbering the memory, because not performing initialization for dynamic storage duration objects results in indeterminate values, not "whatever was there before".

Demi Obenour

unread,
Jan 17, 2017, 1:20:13 PM1/17/17
to 'Edward Catmur' via ISO C++ Standard - Discussion
THIS.  C++'s main selling point is C compatibility.  Much code is REQUIRED to compile as BOTH C and C++.  Without changes.

No compiler will dare treat this as undefined behavior unless it is also undefined in C.

Thiago Macieira

unread,
Jan 17, 2017, 1:26:31 PM1/17/17
to std-dis...@isocpp.org
Em terça-feira, 17 de janeiro de 2017, às 17:32:52 PST, Richard Hodges
escreveu:
> Note that I am asserting *should absolutely* - a very strong statement.
> This is because we absolutely cannot move away from c. There are no c++
> operating systems. Therefore all useful libraries are written with C
> interfaces.

Side-note: Microsoft's UCRT library implements the C library in C++. Yes, it
has an extern "C" interface, but it's actually a C++ library.

Nicol Bolas

unread,
Jan 17, 2017, 1:28:44 PM1/17/17
to ISO C++ Standard - Discussion
On Tuesday, January 17, 2017 at 11:32:55 AM UTC-5, Richard Hodges wrote:
I cannot see any reasonable argument that pointer arithmetic should not be allowed to work on consecutive objects.

And nobody has made such an argument. Indeed, I'm pretty sure that I stated quite the opposite. Though for very different reasons and different restrictions.
 
int* p = get_ints_from_c(); *(++p); should absolutely be defined behaviour in c++ provided there is actually some memory at std::addressof(*p) + sizeof(p); - there is no conceivable reason why it should not. 

Note that I am asserting *should absolutely* - a very strong statement. This is because we absolutely cannot move away from c. There are no c++ operating systems. Therefore all useful libraries are written with C interfaces. Thousands of c++ wrapper libraries exist to turn those C interfaces back into c++. We don't do that because we want to. We do that because C++ is not suitable for creating portable object libraries, having no modules or common ABI.

By all means let's talk about moving forward - after we have modules, defined ABIs, an agreed-upon means of transmitting exceptions and so on.

Until then, the entire foundation of our C++ universe is C. To try to pretend otherwise is a fallacy.

OK, it's "difficult" to marry the c++ abstract machine model with the C memory model. So what? That doesn't mean that it should not be done.

OK, so explain what we will gain by doing all of this work. How will it make my currently functional code faster and/or better? How will it make my programs more correct? How will it improve the C++ object model in ways that are useful for actual C++ programs?

The status quo is adequately functional. And if C is as entwined with C++ as you believe, then no compiler vendor is going to break the world with "optimizations" that don't actually make things more optimal.

Clearly the definition of the C++ abstract machine needs to be revised, or made more granular. Difficulty does not come into it.

Should bitwise copies of non-trivial objects be allowed? Of course not.

But they are. Right now.

Bitwise copies of non-trivially copyable types are forbidden. Trivial types are a subset of trivially copyable ones.

We can express the reason why not as a high level "because of memory model concerns" or we can be truthful: "because the pointers will be wrong and your double delete will crash the program".

... that's not why we forbid non-trivially copyable types from being bit-copied.

We do it because if you have written a copy/move constructor/assignment operator or a destructor, then you clearly have needs for your object that cannot be satisfied by mere bit-fiddling. A live destructor represents that dropping an object on the floor does not represent a valid way to get rid of it. Live copy/move means that there is some internal resource your object is managing, and therefore bit-copying is incapable of doing so.

"Because of memory model concerns" is the truth. To say it's because of pointer stuff or whatever is to lie to the user, to declare that "trivially copyable" is some arbitrary construct that has no intrinsic meaning.

A trivial type is a type for which any value is equally valid. A trivially copyable type is a type for which bitwise-copying makes sense. These are legitimate constructs that have a representation in your code structure.

You could write a non-trivially copyable type, such that your implementation of one of the non-trivial functions doesn't obstruct bitwise copying. But C++ cannot know that; that is an implementation detail of your function which cannot be tested for. So your type is forbidden from doing so, even though by your "low level high level" reasoning, it should be allowed.

That's a good thing. It makes our rules simple and easily testable and verifiable. It allows us to write `std::copy` and `std::vector` implementations that perform optimally for types where we know we can perform bitcopies. And for those of us who want to take advantage of such optimizations, we know exactly what we have to do to create such types.

Your kind of "wild west" thinking doesn't allow for such simple, easily testable constructs. For your kind of coding, the user would have to explicitly inform `copy` or `vector` that it should perform bitcopies.
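
To illustrate the kind of trait-based dispatch meant here, a rough sketch follows. It is not how any particular standard library implements `std::copy`; `copy_objects` and `Counted` are hypothetical names, and the sketch assumes the source and destination ranges do not overlap.

```cpp
#include <cstddef>
#include <cstring>
#include <type_traits>

// Bitwise path, chosen only when the trait says it is safe.
template <class T>
void copy_objects_impl(const T* src, T* dst, std::size_t n, std::true_type) {
    if (n != 0) std::memcpy(dst, src, n * sizeof(T));
}

// Element-wise path for everything else.
template <class T>
void copy_objects_impl(const T* src, T* dst, std::size_t n, std::false_type) {
    for (std::size_t i = 0; i < n; ++i) dst[i] = src[i];
}

// The caller never says which path to take; the type's properties decide.
template <class T>
void copy_objects(const T* src, T* dst, std::size_t n) {
    copy_objects_impl(src, dst, n, std::is_trivially_copyable<T>());
}

// Hypothetical type with a user-provided copy assignment operator, which
// makes it non-trivially copyable and forces the element-wise path.
struct Counted {
    int value;
    static int assignments;
    Counted() : value(0) {}
    Counted& operator=(const Counted& other) {
        value = other.value;
        ++assignments;
        return *this;
    }
};
int Counted::assignments = 0;
```

An array of `int` takes the `memcpy` path, while an array of `Counted` goes through `operator=` once per element.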

Thiago Macieira

unread,
Jan 17, 2017, 1:32:42 PM1/17/17
to std-dis...@isocpp.org
Em terça-feira, 17 de janeiro de 2017, às 13:20:10 PST, Demi Obenour escreveu:
> THIS. C++'s main selling point is C compatibility. Much code is REQUIRED
> to compile as BOTH C and C++. Without changes.

Yes, but no.

Yes, I agree with your agreeing with Richard.

But no, there's not a lot of code that needs to compile as both C and C++.
That's limited to a few (static) inline functions in headers. It may be that
they're used extremely often, especially if they come from the standard C
library itself or from POSIX or a relevant standard like the ancillary socket
data payloads defined by RFC 3542 -- CMSG_DATA and CMSG_NXTHDR are *ugly*.

Richard's point, which I agree with, is that C++ needs to interoperate with C
libraries and vice-versa. That does not imply compiling a lot of code as
either.

Nicol Bolas

unread,
Jan 17, 2017, 1:33:12 PM1/17/17
to ISO C++ Standard - Discussion


On Tuesday, January 17, 2017 at 1:14:01 PM UTC-5, Thiago Macieira wrote:
Em terça-feira, 17 de janeiro de 2017, às 18:09:17 PST, Andrey Semashev
escreveu:
> My gut feeling is that the C++ object model has to allow a POD object to
> automatically begin its lifetime whenever it is modified in the raw
> storage. This is similar to how an active member of a union begins its
> lifetime on the first modification.

A POD object's lifetime should begin when storage for it is provided and end
when storage is freed. In that sense, it happens before the modification
through a pointer. In fact, you could say it happens sometime inside malloc().

Which POD object? There are an arbitrary number of them that can fit into that storage. If you say that `malloc(sizeof(void*))` begins the lifetime of every pointer type, every pointer-to-pointer type, every pointer-to-pointer-to-pointer etc, then the entire idea of having a typed object model loses all meaning.

Nicol Bolas

unread,
Jan 17, 2017, 1:44:13 PM1/17/17
to ISO C++ Standard - Discussion
On Tuesday, January 17, 2017 at 1:32:42 PM UTC-5, Thiago Macieira wrote:
Em terça-feira, 17 de janeiro de 2017, às 13:20:10 PST, Demi Obenour escreveu:
> THIS.  C++'s main selling point is C compatibility.  Much code is REQUIRED
> to compile as BOTH C and C++.  Without changes.

Yes, but no.

Yes, I agree with your agreeing with Richard.

But no, there's not a lot of code that needs to compile as both C and C++.
That's limited to a few (static) inline functions in headers. It may be that
they're used extremely often, especially if they come from the standard C
library itself or from POSIX or a relevant standard like the ancillary socket
data payloads defined by RFC 3542 -- CMSG_DATA and CMSG_NXTHDR are *ugly*.

Richard's point, which I agree with, is that C++ needs to interoperate with C
libraries and vice-versa. That does not imply compiling a lot of code as
either.

That I can agree with. The standard needs a clear statement on the ability of non-C++ code to be able to create C++ objects.

Validating C-isms inside C++ code is not something we need to deal with.

From an interop perspective, the main problem there is dealing with compatibility between structs. C can write data that is layout compatible with C++, but unless they are both working on the same struct definition, C++'s object model does not allow accessing an object that's merely layout compatible with what's there.

So, to avoid having to memcpy from external objects into C++ objects, you would need to have a way to effectively adopt that memory into a formal C++ object. I'm not sure what that ought to look like, but it would be a new way of creating a C++ object.
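
For comparison, here is roughly what can be written with the existing rules. This is a sketch only: `adopt_as` is a hypothetical helper, not a standard or proposed facility, and `Header` is a made-up struct assumed to have been filled in by external code. The sketch saves the bytes, begins the lifetime of a `T` with placement `new`, then restores the bytes, so in the abstract machine it still copies. Whether an optimizer removes those copies is not guaranteed.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <new>
#include <type_traits>

// Hypothetical helper, not a standard facility.
template <class T>
T* adopt_as(void* storage) {
    static_assert(std::is_trivial<T>::value, "restricted to trivial types");
    assert(reinterpret_cast<std::uintptr_t>(storage) % alignof(T) == 0);

    unsigned char saved[sizeof(T)];
    std::memcpy(saved, storage, sizeof(T));   // 1. save the existing bytes
    T* object = ::new (storage) T;            // 2. begin the lifetime of a T
    std::memcpy(object, saved, sizeof(T));    // 3. restore the bytes
    return object;
}

// Hypothetical struct that external code is assumed to have filled in.
struct Header { std::uint32_t length; std::uint32_t flags; };
```

As with any reuse of storage, only the pointer returned by `adopt_as` should be used to reach the new object.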

I hesitate to call it "type punning," but that is essentially what it is. Except that it is formalized within the object model and is only valid for trivial types (if you have a non-trivial type that's trivially copyable, you'll have to copy it to get this to work). And, like any form of lifetime-ending-followed-by-restart, pointers/references/variables to the old memory aren't valid for accessing the object's values anymore.

Richard Hodges

unread,
Jan 17, 2017, 1:57:02 PM1/17/17
to std-dis...@isocpp.org
There is already a mechanism in the language syntax for interoperability with C.

extern "C" { // <-- this!

  struct Bar { double d; };
  struct Foo { int bar_count; Bar bars[]; };   // this is clearly an object that C can initialise
  Foo* makeFoo();  // make a foo with a number of Bars. N will be 1+
  void destroyFoo(Foo*);
}

// this should be predictable and correct in the standard
auto pfoo = std::unique_ptr<Foo, void(*)(Foo*)>(makeFoo(), &destroyFoo);
auto first = pfoo->bars;
auto last = first + pfoo->bar_count;
doBarThings(first, last);

As the standard is currently worded, this code is UB. That's unacceptable.



Andrey Semashev

unread,
Jan 17, 2017, 2:04:58 PM1/17/17
to std-dis...@isocpp.org
On 01/17/17 21:56, Richard Hodges wrote:
> There is already a mechanism in the language syntax for interoperability
> with C.
>
> extern "C" { // <-- this!
>
> struct Bar { double d; };
> struct Foo { int bar_count; Bar bars[]; }; // this is clearly an
> object that C can initialise
> Foo* makeFoo(); // make a foo with a number of Bars. N will be 1+
> void destroyFoo(Foo*);
> }
>
> // this should be predictable and correct in the standard
> auto pfoo = std::unique_ptr<Foo, void(*)(Foo*)>(makeFoo(), &destroyFoo);
> auto first = pfoo->bars;
> auto last = first + pfoo->bar_count;
> doBarThings(first, last);
>
> As the standard is currently worded, this code is UB. That's unacceptable.

Trailing arrays are not supported in C++, AFAIK, so this won't compile
in C++.


Nicol Bolas

unread,
Jan 17, 2017, 2:35:02 PM1/17/17
to ISO C++ Standard - Discussion, barry....@gmail.com
On Tuesday, January 17, 2017 at 1:17:19 PM UTC-5, T. C. wrote:
On Tuesday, January 17, 2017 at 1:10:25 PM UTC-5, Nicol Bolas wrote:
On Tuesday, January 17, 2017 at 12:22:24 PM UTC-5, barry....@gmail.com wrote:

You already have that. It's called using placement `new` with default initialization. If `T` is trivially default constructible, then `::new(p) T` will begin the lifetime of `T` with no initialization.

What if T isn't trivially default constructible?

If a type is not trivially default constructible, then the writer of that type has explicitly decided that the type cannot take on arbitrary values. Therefore, it can only take on a specific set of values, defined by the constructors of that type. It can still be trivially copyable, but that requires you to start from a valid instance of that type, as created by one of its constructors.

Therefore, whatever construct you want to have that adopts the data in existing storage cannot apply to non-trivially default constructible types.

Not necessarily. struct X { const int a; }; isn't trivially default constructible but nothing else you wrote above applies to it.

Well, I found something quite troubling. `X` may not be trivially default constructible, but it is trivially copyable. I have no idea how that is even possible, since it effectively means you can change a `const` member of a live object.
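
The properties of `X` under discussion can be checked with type traits. The sketch below reflects what current GCC and Clang report, to the best of my knowledge; `make_and_read` is a hypothetical helper added only to show that aggregate initialization still works.

```cpp
#include <type_traits>

struct X { const int a; };

// The const member deletes the implicit default constructor...
static_assert(!std::is_trivially_default_constructible<X>::value,
              "X is not trivially default constructible");
// ...and the copy assignment operator...
static_assert(!std::is_copy_assignable<X>::value,
              "X is not copy assignable");
// ...but the copy constructor remains trivial, so the type as a whole
// still counts as trivially copyable.
static_assert(std::is_copy_constructible<X>::value,
              "X is copy constructible");
static_assert(std::is_trivially_copyable<X>::value,
              "X is trivially copyable");

// Aggregate initialization is still available.
inline int make_and_read(int v) {
    X x = {v};
    return x.a;
}
```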

Richard Hodges

unread,
Jan 17, 2017, 2:46:21 PM1/17/17
to std-dis...@isocpp.org

But it should be, because c++ should be compatible with c. The code has clearly been declared as extern "C". So it's not a c++ struct, it's a c struct.

It should obey the c memory model, and interoperate correctly with c++. In the same way that objective-c++ understands c, c++ and objective c.


Nicol Bolas

unread,
Jan 17, 2017, 2:55:17 PM1/17/17
to ISO C++ Standard - Discussion
On Tuesday, January 17, 2017 at 2:46:21 PM UTC-5, Richard Hodges wrote:

But it should be, because c++ should be compatible with c. The code has clearly been declared as extern "C". So it's not a c++ struct, it's a c struct.

C++ is not a superset of C and it never has been. Users should not expect to be able to throw any C struct at C++ and have it work with it.

It should obey the c memory model, and interoperate correctly with c++. In the same way that objective-c++ understands c, c++ and objective c.

Objective C is designed to be a pure superset of C. C++ is not designed to be a pure superset of C. And that simply is not going to change.

Interop is one thing, expecting to shove C at a C++ compiler is another. The latter has never been true.

C/C++ is not a real language, and it's time people finally accepted that.

Ville Voutilainen

unread,
Jan 17, 2017, 3:24:04 PM1/17/17
to std-dis...@isocpp.org
On 17 January 2017 at 21:46, Richard Hodges <hodg...@gmail.com> wrote:
>
> But it should be, because c++ should be compatible with c. The code has
> clearly been declared as extern "C". So it's not a c++ struct, it's a c
> struct.

extern "C" doesn't mean that the code inside it is compiled as C.
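
A small sketch of that point, using hypothetical names (`Point`, `point_sum`): the declarations get C language linkage, yet the function body freely uses constructs that no C compiler would accept.

```cpp
#include <cassert>

extern "C" {

struct Point { int x; int y; };

// C linkage affects how the function is named and called, but the body is
// still compiled as C++: references and lambdas do not exist in C.
int point_sum(const Point* p) {
    assert(p != nullptr);
    const Point& ref = *p;
    auto add = [](int a, int b) { return a + b; };
    return add(ref.x, ref.y);
}

}  // extern "C"
```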

Thiago Macieira

unread,
Jan 17, 2017, 3:24:28 PM1/17/17
to std-dis...@isocpp.org
On terça-feira, 17 de janeiro de 2017 10:33:12 PST Nicol Bolas wrote:
> > A POD object's lifetime should begin when storage for it is provided and
> > end
> > when storage is freed. In that sense, it happens before the modification
> > through a pointer. In fact, you could say it happens sometime inside
> > malloc().
>
> *Which* POD object? There are an arbitrary number of them that can fit into
> that storage. If you say that `malloc(sizeof(void*))` begins the lifetime
> of every pointer type, every pointer-to-pointer type, every
> pointer-to-pointer-to-pointer etc, then the entire idea of having a typed
> object model loses all meaning.

It's unspecified which one, but one only. That means the compiler cannot assume
that the code did not initialise, but it can infer from code that uses that
storage area what type it was.

struct S { int i; };
struct T { float f; };

auto ptr = malloc(4);

// the next line tells the compiler that there's an S object there
S *s = static_cast<S *>(ptr);
s->i = 0;

// the following line aliases S's storage with a struct that does not
// have a common sequence, so it's UB
T *t = static_cast<T *>(ptr);

// it would be as wrong as
T *t = reinterpret_cast<T *>(s);

Ville Voutilainen

unread,
Jan 17, 2017, 3:26:02 PM1/17/17
to std-dis...@isocpp.org
On 17 January 2017 at 21:35, Nicol Bolas <jmck...@gmail.com> wrote:
> Well, I found something quite troubling. `X` may not be trivially default
> constructible, but it is trivially copyable. I have no idea how that is even
> possible, since it effectively means you can change a `const` member of a
> live object.


I don't see how that follows from the current rules. You can end the
lifetime of the X object
and create a new one in the same location, but I don't see how you can

Thiago Macieira

unread,
Jan 17, 2017, 3:29:18 PM1/17/17
to std-dis...@isocpp.org
On terça-feira, 17 de janeiro de 2017 11:55:16 PST Nicol Bolas wrote:
> Interop is one thing, expecting to shove C at a C++ compiler is another.
> The latter has never been true.

void foo(int x[static 8]) {}
void bar(int n, int x[static n]) {}

Not to mention variables called "class", "new", "delete", etc.

Richard Hodges

unread,
Jan 17, 2017, 3:30:47 PM1/17/17
to std-dis...@isocpp.org
C++ is not a superset of C and it never has been. Users should not expect to be able to throw any C struct at C++ and have it work with it.

And yet this is exactly what I can do, de-facto, today. And it is exactly this undefined behaviour that the entire c++ enterprise depends on, today.


Code (proven on clang and g++):

#include <stdlib.h>


extern "C"
{

struct Foo {
    int bars;
    double bar[];
};

Foo *makeFoo() {
    int a = 6;
    auto vp = malloc(sizeof(Foo) + a * sizeof(double));
    auto p = (Foo *) vp;
    p->bars = a;
    for (int i = 0; i < a; ++i) {
        p->bar[i] = i * 2;
    }
    return p;
}
void deleteFoo(Foo *p) {
    free(p);
}
}

#include <memory>
#include <iostream>
#include <algorithm>
#include <iterator>

struct FooDeleter {
    void operator()(Foo *p) const {
        deleteFoo(p);
    }
};

int main() {
    using fooptr = std::unique_ptr<Foo, FooDeleter>;
    auto p = fooptr(makeFoo());

    auto first = p->bar;
    auto last = first + p->bars;
    std::copy(first, last, std::ostream_iterator<double>(std::cout, ", "));
    std::cout << std::endl;

}




Nicol Bolas

unread,
Jan 17, 2017, 3:53:40 PM1/17/17
to ISO C++ Standard - Discussion

X x1 = {5};
X x2 = {10};

memcpy(&x1, &x2, sizeof(X));

[basic.types]/3 tells us that this is perfectly legal. And it tells us "obj2 shall subsequently hold the same value as obj1". It does not say that `obj2` will be destroyed and constructed, or that `obj2`'s storage will be reused for a new instance of `X`. It says that it holds the same value as `obj1`.

That means `x1.a` must be 10. Which means we have changed a `const` object.

Richard Hodges

unread,
Jan 17, 2017, 3:59:16 PM1/17/17
to std-dis...@isocpp.org
All the standard needs is the addition of a paragraph that says:

'any type or intrinsic object declared inside extern "C" exists in the C memory model (see ISO standard xxx). Members of objects of an extern "C" class type shall behave as per the C language, previously cited. The C++ and C memory models must coexist in an unsurprising way'

End of problem.


Ville Voutilainen

unread,
Jan 17, 2017, 4:01:25 PM1/17/17
to std-dis...@isocpp.org
You seem to be pretending that [dcl.type.cv]/4 doesn't exist.

Nicol Bolas

unread,
Jan 17, 2017, 4:04:56 PM1/17/17
to ISO C++ Standard - Discussion
On Tuesday, January 17, 2017 at 3:24:28 PM UTC-5, Thiago Macieira wrote:
On terça-feira, 17 de janeiro de 2017 10:33:12 PST Nicol Bolas wrote:
> > A POD object's lifetime should begin when storage for it is provided and
> > end
> > when storage is freed. In that sense, it happens before the modification
> > through a pointer. In fact, you could say it happens sometime inside
> > malloc().
>
> *Which* POD object? There are an arbitrary number of them that can fit into
> that storage. If you say that `malloc(sizeof(void*))` begins the lifetime
> of every pointer type, every pointer-to-pointer type, every
> pointer-to-pointer-to-pointer etc, then the entire idea of having a typed
> object model loses all meaning.

It's unspecified which one, but one only. That means the compiler cannot assume
that the code did not initialise, but it can infer from code that uses that
storage area what type it was.

Standards do not work based off of vague inferences. They have to specify behavior. So if an "inference" is going to be made, then there must be an explicit enumeration of syntactic constructs which the compiler will use to "infer" the type.

At which point, those syntaxes are not being used to "infer" something; they now become alternate syntaxes for creating an object.

If you say that `static_cast` can begin the lifetime of an object, then you need to say under which circumstances that will happen. Does it only work from memory fresh out of `malloc`, or can it work on `malloc`ed memory that used to have an object in it and you're now replacing it with another? How do you tell the difference between pointer conversion and object initialization?

C++ has a way to tell the difference: `static_cast` is for pointer conversion; `new()` is for object initialization. Because of that, it can do this:

struct S { int i; };
struct T { float f; };
 
auto mem = malloc(std::max(sizeof(S), sizeof(T)));

S *s = new(mem) S;
s->i;

T *t = static_cast<T*>(mem); //OK, but you can't use `t`.

T *t2 = new(mem) T; //Can use `t2`, but not `s` anymore.
t = std::launder(t); //I can use `t` now.

Nicol Bolas

unread,
Jan 17, 2017, 4:19:40 PM1/17/17
to ISO C++ Standard - Discussion

If trivially copying objects with `const` members provokes UB, why are objects with `const` members trivially copyable? Or is the idea that, since such types are not assignable, you shouldn't be trying to use trivial copy mechanics to assign to them?

Ville Voutilainen

unread,
Jan 17, 2017, 4:22:13 PM1/17/17
to std-dis...@isocpp.org
On 17 January 2017 at 23:19, Nicol Bolas <jmck...@gmail.com> wrote:
> If trivially copying objects with `const` members provokes UB, why are
> objects with `const` members trivially copyable? Or is the idea that, since
> such types are not assignable, you shouldn't be trying to use trivial copy
> mechanics to assign to them?

The latter, yes. You can still bit-blast them into buffers that don't
contain live objects yet,
so for that reason it's apparently rather important that such types
are trivially copyable,
but trivially copyable doesn't necessarily mean assignable or "can
blast values over existing objects".

Nicol Bolas

unread,
Jan 17, 2017, 4:29:40 PM1/17/17
to ISO C++ Standard - Discussion
On Tuesday, January 17, 2017 at 3:30:47 PM UTC-5, Richard Hodges wrote:
C++ is not a superset of C and it never has been. Users should not expect to be able to throw any C struct at C++ and have it work with it.

And yet this is exactly what I can do, de-facto, today. And it is exactly this undefined behaviour that the entire c++ enterprise depends on, today.

That's not "undefined behavior"; those are compiler extensions. That's different; compilers are allowed to give non-C++ syntax meaning if they so choose.


On Tuesday, January 17, 2017 at 3:59:16 PM UTC-5, Richard Hodges wrote:
All the standard needs is the addition of a paragraph that says:

'any type or intrinsic object declared inside extern "C" exists in the C memory model (see ISO standard xxx). Members of objects of an extern "C" class type shall behave as per the C language, previously cited. The C++ and C memory models must coexist in an unsurprising way'

End of problem.

Um, no. This now requires that every C++ compiler also implement <insert version here> of C. Just because "c++ enterprise" depends on some non-C++ features doesn't mean we should shove them into the standard.

Furthermore, declaring that the two object models "must coexist in an unsurprising way" basically says nothing. It's about as useful for deciding on behavior as the wording on pointer-to-`intptr_t` conversions.

Thiago Macieira

unread,
Jan 17, 2017, 4:46:37 PM1/17/17
to std-dis...@isocpp.org
struct X { const int a; };

On terça-feira, 17 de janeiro de 2017 12:53:39 PST Nicol Bolas wrote:
> X x1 = {5};
> X x2 = {10};
>
> memcpy(&x1, &x2, sizeof(X));
>
> [basic.types]/3 tells us that this is perfectly legal. And it tells us
> "obj2 shall subsequently hold the same value as obj1". It does not say that
> `obj2` will be destroyed and constructed, or that `obj2`'s storage will be
> reused for a new instance of `X`. It says that it holds the same value as
> `obj1`.
>
> That means `x1.a` *must* be 10. Which means we have changed a `const`
> object.

12.8.2 [class.copy.assign] / 7 says the copy-assignment operator is deleted if
any non-static member is const. That's why the expression:

x1 = x2;

fails to compile. But [class]/6 still allows X to be trivially copyable,
since all copy/move constructors and assignment operators are either trivial
(both constructors) or deleted (both assignment operators), and at least one
of them is non-deleted.

That means the memcpy above is not a copy-assignment. It's actually destroying
the old object and initialising a new one. That is, it's equivalent to:

x1.~X();
new (&x1) X(x2);

Which is perfectly valid, even in the context of const members.
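
The explicit destroy-and-recreate sequence can be written out as a self-contained sketch. It only demonstrates that sequence. Whether the `memcpy` really is equivalent to it is the claim under discussion here, and the sketch does not settle that. The function name `replace_and_read` is hypothetical, and the object is reached only through the pointer returned by the second placement `new`, not through the old one.

```cpp
#include <new>

struct X { const int a; };

int replace_and_read() {
    alignas(X) unsigned char storage[sizeof(X)];
    X* first = ::new (storage) X{5};
    X other = {10};

    first->~X();                            // end the old object's lifetime
    X* second = ::new (storage) X(other);   // copy-construct a new X in place

    return second->a;                       // read through the new pointer
}
```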

Richard Hodges

unread,
Jan 17, 2017, 4:53:59 PM1/17/17
to std-dis...@isocpp.org
> Um, no. This now requires that every C++ compiler also implement <insert version here> of C. 

This is not what I am saying. I am saying that the objects that are imbued with extern "C" must exist in the C memory model. They already do, and the entirety of the c++ world depends upon this fact today. This is an inescapable truth.

> Just because "c++ enterprise" depends on some non-C++ features doesn't mean we should shove them into the standard.

The alternative is that almost every meaningful application and library in existence is fundamentally non-portable. This is not in the interests of c++ developers, users of their work, or indeed manufacturers of the compilers. So I have to say, with respect, that you are mistaken. An ISO standard describing a system built upon UB is meaningless because all programs become strictly non-portable. See above.

> Furthermore, declaring that the two object models "must coexist in an unsurprising way" basically says nothing. It's about as useful for deciding on behavior as the wording on pointer-to-`intptr_t` conversions.

You know full well that I am paraphrasing.

I'll make you a bet. Let's put this question to a poll of c++ developers (say with more than 4 years' experience). I'll give you even money on any bet you care to take that my position would win by a ratio exceeding 7:3

It's what the language does. It's what the entire developer base expects the compiler to do. It is the de-facto truth of c++. The standard is currently perverse in stating otherwise.

The committee should hang its head in shame.
 




Demi Obenour

unread,
Jan 17, 2017, 5:13:13 PM1/17/17
to 'Edward Catmur' via ISO C++ Standard - Discussion
Agreed.  Furthermore, I claim that code that blasts bits into memory and treats it as an object will always be there.  There is no other good way to achieve guaranteed zero-copy operation.

barry....@gmail.com

unread,
Jan 17, 2017, 5:26:57 PM1/17/17
to ISO C++ Standard - Discussion


The latter, yes. You can still bit-blast them into buffers that don't
contain live objects yet,
so for that reason it's apparently rather important that such types
are trivially copyable,
but trivially copyable doesn't necessarily mean assignable or "can
blast values over existing objects".

I can bit-blast it into a buffer. But how can I bit-blast it out of a buffer?

struct X { const int val; };

// this is all well and good
alignas(X) char buffer[sizeof(X)];
new (buffer) X{42};
::send(buffer, sizeof(buffer));

alignas(X) char recv_buffer[sizeof(X)];
::recv(recv_buffer, sizeof(recv_buffer));

// can't do this
X a; // nope
memcpy(&a, recv_buffer, sizeof(a));

// this is UB
X b(*reinterpret_cast<X const*>(recv_buffer));

// this is well-defined yet totally unmaintainable
int v;
memcpy(&v, recv_buffer, sizeof(v));
X c{v};
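
The last variant can be wrapped in a small function, sketched here under the assumption that `X` has exactly the one `int` member shown. The names `decode_x` and `round_trip` are hypothetical, and `round_trip` only stands in for the send/recv pair, which is not modelled.

```cpp
#include <cassert>
#include <cstring>

struct X { const int val; };

// Hypothetical helper: rebuild an X from sizeof(int) received bytes.
// The bytes are copied into a plain int first and X is constructed from it,
// so no X object is ever assumed to live inside the buffer.
inline X decode_x(const char* bytes) {
    int v;
    std::memcpy(&v, bytes, sizeof v);
    return X{v};
}

// Hypothetical stand-in for the network round trip in the post above.
inline int round_trip(int value) {
    char wire[sizeof(int)];
    std::memcpy(wire, &value, sizeof value);
    return decode_x(wire).val;
}
```

This stays within the rules as discussed in the thread, but it has to be rewritten by hand for every member of every such type, which is the maintainability complaint.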

Martin Ba

unread,
Jan 17, 2017, 5:45:35 PM1/17/17
to ISO C++ Standard - Discussion
On Tuesday, January 17, 2017 at 7:28:44 PM UTC+1, Nicol Bolas wrote:
On Tuesday, January 17, 2017 at 11:32:55 AM UTC-5, Richard Hodges wrote:
I cannot see any reasonable argument that pointer arithmetic should not be allowed to work on consecutive objects.

And nobody has made such an argument. Indeed, I'm pretty sure that I stated quite the opposite. Though for very different reasons and different restrictions.
 
int* p = get_ints_from_c(); *(++p); should absolutely be defined behaviour in c++ provided there is actually some memory at std::addressof(*p) + sizeof(p); - there is no conceivable reason why it should not. 

Note that I am asserting *should absolutely* - a very strong statement. This is because we absolutely cannot move away from c. There are no c++ operating systems. Therefore all useful libraries are written with C interfaces. Thousands of c++ wrapper libraries exist to turn those C interfaces back into c++. We don't do that because we want to. We do that because C++ is not suitable for creating portable object libraries, having no modules or common ABI.

By all means let's talk about moving forward - after we have modules, defined ABIs, an agreed-upon means to transmitting exceptions and so on.

Until then, the entire foundation of our C++ universe is C. To try to pretend otherwise is a fallacy.

OK, it's "difficult" to marry the c++ abstract machine model with the C memory model. So what? That doesn't mean that it should not be done.

OK, so explain what we will gain by doing all of this work. How will it make my currently functional code faster and/or better? How will it make my programs more correct? How will it improve the C++ object model in ways that are useful for actual C++ programs?


What we will gain is a Standard that is not contradicting reality, which is *at least* a marketing asset for the language. (compare: isocpp.org)

What we will gain is people writing reasonable real world "low level" code *not being told* that their code is UB and that they should resort to memcpy and no-op placement-new contortions - this is at least an asset wrt. the learning curve of the language.

What we will gain is not having to spend time on rather fruitless discussion like this one here.
 
The status quo is adequately functional. And if C is as entwined with C++ as you believe, then no compiler vendor is going to break the world with "optimizations" that don't actually make things more optimal.

The status quo in reality is functional. The Standard contradicts reality in this regard. That it works everywhere in practice, and can be expected to do so, is only an argument for the priority of fixing this, not an argument for not fixing the Standard.


I have to say I do not quite follow your argumentation wrt. this: on one hand you seem to care very much about the Standard supplying a useful and consistent object model, but on the other hand, you seem to say that the places where this shiningly consistent model is violated by a huge fraction of programs in existence don't matter because they will continue to work anyway.

Richard Smith

unread,
Jan 17, 2017, 5:46:51 PM1/17/17
to std-dis...@isocpp.org
On 17 January 2017 at 05:31, Nicol Bolas <jmck...@gmail.com> wrote:
On Monday, January 16, 2017 at 11:09:59 PM UTC-5, Richard Smith wrote:
On 16 Jan 2017 6:25 pm, "Nicol Bolas" <jmck...@gmail.com> wrote:
On Monday, January 16, 2017 at 9:05:11 PM UTC-5, Richard Smith wrote:
On 16 January 2017 at 17:47, Nicol Bolas <jmck...@gmail.com> wrote:
On Monday, January 16, 2017 at 7:23:30 PM UTC-5, Richard Smith wrote:
On 16 January 2017 at 13:49, Martin Ba <0xcdc...@gmx.at> wrote:
On Monday, January 16, 2017 at 10:11:19 PM UTC+1, Jens Maurer wrote:
On 01/15/2017 09:56 PM, Martin Ba wrote:
> Hi.
>
> I'm currently trying to understand a few ... interesting ... observations I have been making wrt. the C++ Standard and using char arrays as raw storage.
>
> Essentially, as far as I can tell (have been told), the current C++ Standard only allows using a char array as raw storage (see also std::aligned_storage) when objects are put into this via placement new, even for e.g. int or other trivial(*) types.
>
> See: http://stackoverflow.com/questions/41624685/is-placement-new-legally-required-for-putting-an-int-into-a-char-array or related questions where I'm told I'm expected to do the following:
>
> alignas(int) char buf[sizeof(int)];
>
> void f() {
>   // turn the memory into an int: (??) from the POV of the abstract machine!
>   ::new (buf) int; // is this strictly required? (aside: it's obviously a no-op)
>
>   // access storage:
>   *((int*)buf) = 42; // for this discussion, just assume the cast itself yields the correct pointer value
> }

> What I would be interested in is whether this has been discussed in the committee (CWG?) in the last very few years
> and whether there is any agreement if omitting the placement new (for trivial type) should be allowed or if Standard C++ should absolutely require the placement new.

I believe I can say that CWG agrees that the words now in C++17 correctly
reflect the intent that you need the placement new in the case above.
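
For reference, a minimal sketch of the form this answer describes as required. The function name `store_and_load` is made up for illustration. It uses the pointer returned by the placement new instead of casting `buf`, so the question of what the cast yields does not arise.

```cpp
#include <cassert>
#include <new>

alignas(int) char buf[sizeof(int)];

// Hypothetical helper: create an int in buf, store to it, read it back.
inline int store_and_load() {
    int* p = ::new (static_cast<void*>(buf)) int;  // begins the lifetime of an int in buf
    *p = 42;                                       // access through the returned pointer
    return *p;
}
```

On typical implementations the placement new generates no code for `int`; its effect is on the abstract machine only, which is exactly what the original question is about.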


If this is really the intent, then this needs to be more clearly communicated and, I feel, rationalized. (Maybe it already has? That's what the OP was actually about.)
 
If you believe that intent is misguided, feel free to propose a change.

Yes, I very much feel the intent is misguided. For two reasons:
  • This intent declares UB totally reasonable legacy code. At least I consider it reasonable to *not* have to place a no-op placement new in straightforward buffer backed code for trivial types.
  • Since C doesn't have placement new, any C code that uses a char buffer to back any other typed data is automatically UB in C++. Another unnecessary incompatibility.  
The above code has undefined behavior in C too. C's effective type rules do not permit changing the effective type of a declared object to something other than its declared type; it only permits that for objects allocated with malloc or similar.

In the case where the storage /was/ allocated through malloc or similar, C++ requires a placement new where C simply allows the effective type to change through a store (and some parts of the C effective type model don't work as a result...). It would seem reasonable to me for such allocation functions to be specified to have implicitly created whatever set of objects the following code relies on existing[1] -- the compiler typically has to make that pessimistic assumption anyway, since it doesn't know what objects the implementation of an opaque function might create, so it seems like we'd lose little and gain more C compatibility by guaranteeing something like that.

 [1]: that is, we could require the compiler to assume that malloc runs a sequence of placement news (for types with trivial default construction and destruction) before it returns, where that set is chosen to be whatever set gives the program defined behavior -- if such a set exists

The result of a "sequence of placement news" on a piece of memory is the creation of an object of the last type `new`ed. The C++ object model does not permit storage to have an indeterminate object or many separate objects (outside of nesting). If you allocate 4 bytes and new an `int` into it, then it is an int. If you new a `float` into it, it stops being an `int`.

I never said they would all be at the start of the allocation.

... that doesn't make sense. I mean, where else are they going to be except for the start? If I allocate 4 bytes, then you need to `new` up both `int` and `float` (assuming they're both 4 bytes, of course). But there's no room to `new` them at different addresses within that allocation, since the allocation is only 4 bytes.

I don't know what this example is supposed to demonstrate.

You originally said:

> we could require the compiler to assume that malloc runs a sequence of placement news (for types with trivial default construction and destruction) before it returns, where that set is chosen to be whatever set gives the program defined behavior -- if such a set exists

Therefore, if I 'malloc' 4 bytes of storage, then placement `new` will be executed on that storage for both `int` and `float`. Among others. That's what you're asking for.

No, it's not. If the program does this:

  void *p = malloc(4);
  int *n = (int*)p;
  *n = 0;
  float *f = (float*)p;
  *f = 0;

then there is no set of placement new operations that malloc could have performed that result in this program being valid, so the rule I described does not place requirements on the behavior of this program.

But if the program does this:

  void *p = malloc(8);
  int *n = (int*)p;
  *n = 0;
  float *f = (float*)((char*)p + 4);
  *f = 0;

... then that would be valid if malloc created an int at offset 0 and a float at offset 4 (plus an array of chars covering the whole array to make the pointer arithmetic valid).
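
As a sketch, this is the set of placement news that the proposed rule would have malloc perform implicitly for the second program, written out by hand so that it does not depend on any such rule. The helper name `int_then_float` is hypothetical, and the `static_assert` states the layout assumption that a float may start directly after an int.

```cpp
#include <cassert>
#include <cstdlib>
#include <new>

static_assert(sizeof(int) % alignof(float) == 0,
              "assumption: a float may start right after an int");

// Hypothetical helper: returns true if both values read back as written.
inline bool int_then_float() {
    void* p = std::malloc(sizeof(int) + sizeof(float));
    if (p == nullptr) return false;

    // An array of unsigned char is created first, so the byte offsets below
    // are computed inside an array.
    unsigned char* bytes = ::new (p) unsigned char[sizeof(int) + sizeof(float)];
    int*   n = ::new (bytes) int(7);                      // int at offset 0
    float* f = ::new (bytes + sizeof(int)) float(1.5f);   // float right after it

    bool ok = (*n == 7) && (*f == 1.5f);
    std::free(p);
    return ok;
}
```

The two objects occupy disjoint bytes of one allocation, which is why this case has a consistent set of objects while the 4-byte int-then-float case does not.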

And as I said, that would make the memory both an `int` and a `float` at the same time. You then said:

> I never said they would all be at the start of the allocation.

Then where is it going to be? Where does the `int` get created and where does the `float` get created, since there's not room enough for both?

Not all allocations are of 4 bytes. You seem to be saying that because this can't happen in one particular situation, it can't happen in any situation. That obviously doesn't follow.
 
I'm trying to understand what you're suggesting the standard do here, and thus far, it does not make sense.

Martin Ba

unread,
Jan 17, 2017, 5:49:39 PM1/17/17
to ISO C++ Standard - Discussion
On Tuesday, January 17, 2017 at 10:53:59 PM UTC+1, Richard Hodges wrote:
> Um, no. This now requires that every C++ compiler also implement <insert version here> of C. 

This is not what I am saying. I am saying that the objects that are imbued with extern "C" must exist in the C memory model. They already do, and the entirety of the c++ world depends upon this fact today. This is an inescapable truth.

> Just because "c++ enterprise" depends on some non-C++ features doesn't mean we should shove them into the standard.

The alternative is that almost every meaningful application and library in existence is fundamentally non-portable. This is not in the interests of c++ developers, users of their work, or indeed manufacturers of the compilers. So I have to say, with respect, that you are mistaken. An ISO standard describing a system built upon UB is meaningless because all programs become strictly non-portable. See above.


This!
 
> Furthermore, declaring that the two object models "must coexist in an unsurprising way" basically says nothing. It's about as useful for deciding on behavior as the wording on pointer-to-`intptr_t` conversions.

You know full well that I am paraphrasing.

I'll make you a bet. Let's put this question to a poll of c++ developers (say with more than 4 years' experience). I'll give you even money on any bet you care to take that my position would win by a ratio exceeding 7:3

It's what the language does. It's what the entire developer base expects the compiler to do. It is the de-facto truth of c++. The standard is currently perverse in stating otherwise.


Exactly my thoughts.
 
The committee should hang its head in shame.
 

Who knows? Has anyone that actually participates in CWG commented on the original question yet, namely whether this has been discussed (like this thread here) and whether there have been any conclusions?

It seems to be a hard problem to solve with very little immediate practical benefit, so I'm not really surprised it hasn't been done yet (on the other hand, someone found time for P0137R1).


Richard Smith

unread,
Jan 17, 2017, 5:50:24 PM1/17/17
to std-dis...@isocpp.org
(This doesn't quite work because the "provides storage" rule doesn't permit an array of plain char to provide storage for other objects, but I'm increasingly thinking that's a mistake.)

Richard Hodges

unread,
Jan 17, 2017, 6:17:33 PM1/17/17
to std-dis...@isocpp.org
(This doesn't quite work because the "provides storage" rule doesn't permit an array of plain char to provide storage for other objects, but I'm increasingly thinking that's a mistake.)

The mistake was made when the committee decided that c++ was not a low level language that could deliberately do low level things legally.

Clearly some young zealot thought that c++ was too useful, and the standards committee needed to be distracted from doing useful work, like providing ranges, proper threading, async networking, http (for goodness' sake! it's 20 years old!!!), graphics and other useful libraries. 

They seem to have done that by turning the committee into some kind of theoretical computing debating society.

It's time the committee woke up and brought useful, practical improvements to the language, rather than ridiculous meaningless rules that just get in the way of expressing obvious logic. 

Lambdas were a good start, and some kind of formalisation on threading was useful, copy elision is great...

But the threading library is garbage. Un-useable for anything serious. We still have to reach for boost, or TBB, or some other third party crutch. It's embarrassing, and a huge inconvenience when cross-compiling for linux, ios, android, etc as I and people like me do.

For a language that's geared for correctness and performance, this is disgraceful.

Someone needs to go through the committee with a stiff broom.



Thiago Macieira

unread,
Jan 17, 2017, 11:48:15 PM1/17/17
to std-dis...@isocpp.org
On Wednesday, 18 January 2017 00:17:30 PST Richard Hodges wrote:
> The mistake was made when the committee decided that c++ was not a low
> level language that could deliberately do low level things legally.
>
> Clearly some young zealot thought that c++ was too useful, and the
> standards committee needed to be distracted from doing useful work, like
> providing ranges, proper threading, async networking, http (for goodness'
> sake! it's 20 years old!!!), graphics and other useful libraries.

In the committee's defence and noting that I am not a member of the committee,
the rule that we're talking about here was in the original language of the
standard, in C++98, and is partially shared with C: the strict aliasing rule.
The reason we're talking about it now is that compilers have become a lot
smarter in the past 10 years and have begun using the UBs to optimise code.
Plus there was some well-intended clarification added to C++17 that apparently
had side-effects.

As for the other things that you think are useful, they need to be built layer
by layer. In order to have HTTP support, we need to have support for sockets,
threading, asynchronous operations, and manipulating URIs. In turn, in order to
have support for sockets, we need primitives for socket addresses and at least
a modicum of SSL/TLS control. Probably other things too.

Some of what I mentioned has papers in the standardisation track. Some others
have ideas. Some others are still unaddressed. But we are building it, little
by little.

> They seem to have done that by turning the committee into some kind of
> theoretical computing debating society.

I have often thought the same. Just look at std::chrono.

> It's time the committee woke up and brought useful, practical improvements
> to the language, rather than ridiculous meaningless rules that just get in
> the way of expressing obvious logic.

I dispute that there aren't practical improvements being added and I dispute
that the rules are meaningless.

> But the threading library is garbage. Un-useable for anything serious. We
> still have to reach for boost, or TBB, or some other third party crutch.
> It's embarrassing, and a huge inconvenience when cross-compiling for linux,
> ios, android, etc as I and people like me do.

Care to elaborate what your problems are? TBB is not a benchmark, though: it
does a lot more than threading alone.

> For a language that's geared for correctness and performance, this is
> disgraceful.
>
> Someone needs to go through the committee with a stiff broom.

Can that someone be you?

Thiago Macieira

unread,
Jan 18, 2017, 12:08:48 AM1/18/17
to std-dis...@isocpp.org
On Tuesday, 17 January 2017 13:04:55 PST Nicol Bolas wrote:
> Standards do not work based off of vague inferences. They have to *specify*
> behavior. So if an "inference" is going to be made, then there *must* be an
> explicit enumeration of syntactic constructs which the compiler will use to
> "infer" the type.

It wasn't a vague inference. It was a logical conclusion based on existing
rules.

If the compiler sees you casting a memory block to a given class, unless it
has reason to doubt you, it should trust you that you're right and that
pointer points to an area containing an object of that type. That is, if your
code is:

extern "C" void *allocate();
void *ptr = allocate();
S *s = static_cast<S *>(ptr);
s->i = 0;

why should it doubt you? What's to say that the allocate function isn't:

void *allocate() { return new S; }

To be explicit: unless the compiler can prove that the allocation function
*isn't* allocating an S, it has no reason to doubt your casting.

> At which point, those syntaxes are not being used to "infer" something;
> they now become alternate syntaxes for creating an object.

As I said in another email, static_cast does not create the object and nor
does the dereferencing of that pointer. The creation of the POD object
happened in the allocation of the storage, since the constructor is trivial.

> If you say that `static_cast` can begin the lifetime of an object, then you
> need to say under which circumstances that will happen. Does it only work
> from memory fresh out of `malloc`, or can it work on `malloc`ed memory that
> used to have an object in it and you're now replacing it with another? How
> do you tell the difference between pointer conversion and object
> initialization?

See above. Initialisation happens inside the allocation function, not on
casting.

Now, the lifetime can end if you repurpose the storage by memcpy'ing something
else there. See my other email where I said memcpy can be the same as:

x2.~X();
new (&x2) X(x1);

> C++ has a way to tell the difference: `static_cast` is for pointer
> conversion; `new()` is for object initialization. Because of that, it can
> do this:
>
> struct S { int i; };
> struct T { float f; };
>
> auto mem = malloc(std::max(sizeof(S), sizeof(T)));
>
> S *s = new(mem) S;
> s->i;
>
> T *t = static_cast<T*>(mem); //OK, but you can't use `t`.

Actually, I've seen UBSan complain about a static cast of the wrong type, so
you shouldn't cast to the wrong type, even if you don't use the pointer.
Though in that case we were talking about polymorphic types and here we're
talking about trivial ones.

> T *t2 = new(mem) T; //Can use `t2`, but not `s` anymore.
> t = std::launder(t); //I can use `t` now.

Agreed, your code is fine. And using the placement new allows us to be explicit
about the object initialisation and also safe if the type in question isn't
trivially constructible.
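
A compilable sketch of the storage reuse in the quoted code, with the assumption that each object is only ever accessed through the pointer its own placement new returned. The helper name `reuse_storage` is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <new>

struct S { int i; };
struct T { float f; };

// Hypothetical helper: returns true if each object reads back its own value.
inline bool reuse_storage() {
    void* mem = std::malloc(std::max(sizeof(S), sizeof(T)));
    if (mem == nullptr) return false;

    S* s = ::new (mem) S;   // an S now lives in the block
    s->i = 3;
    bool ok = (s->i == 3);

    T* t2 = ::new (mem) T;  // reuses the storage; the S is gone, s must not be used
    t2->f = 2.5f;
    ok = ok && (t2->f == 2.5f);

    std::free(mem);
    return ok;
}
```

Each placement new marks the point where one object's lifetime ends and the next begins, which is the explicitness the paragraph above refers to.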

But the compiler cannot prove that malloc didn't initialise the object before
it returned. Take the allocate() function from above: if we expand the
operator new, we get:

void *allocate()
{
    auto ptr = ::operator new(sizeof(S));
    new (ptr) S;
    return ptr;
}

But since S has a trivial constructor, the placement new must expand to
absolutely nothing and have no side effects. Therefore, that function is
functionally identical to:

void *allocate() { return ::operator new(sizeof(S)); }

Finally, since the default ::operator new function just calls malloc, it's no
different from:

void *allocate() { return malloc(sizeof(S)); }

To me, this proves that you cannot distinguish malloc() or any other memory
allocation function from a function that initialises a trivial object.

Thiago Macieira

unread,
Jan 18, 2017, 12:15:10 AM1/18/17
to std-dis...@isocpp.org
On Tuesday, 17 January 2017 21:59:12 PST Richard Hodges wrote:
> All the standard needs is the addition of a paragraph that says:
>
> 'any type or intrinsic object declared inside extern "C" exists in the C
> memory model (see ISO standard xxx). Members of objects of an extern "C"
> class type shall behave as per the C language, previously cited. The C++
> and C memory models must coexist in an unsurprising way'

I'd rather not go there, for two reasons:
1) changing the rules for extern "C" is opening Pandora's box
2) it's not sufficient, as it's possible that C headers do:

struct Foo {
    int bars;
    double bar[];
};

#ifdef __cplusplus
extern "C" {
#endif

Foo *makeFoo();

#ifdef __cplusplus
}
#endif

I think we already have the rule we need: POD should behave like C. That's
what "POD" exists for anyway.

Jens Maurer

unread,
Jan 18, 2017, 2:36:44 AM1/18/17
to std-dis...@isocpp.org
On 2017-01-17 23:49, Martin Ba wrote:
> Who knows? Has anyone that actually participates in CWG commented on the
> original question yet, namely whether this has been discussed (like this
> thread here) and whether there have been any conclusions?

As I said before, I think I can say that CWG believes the post-P0137R1
wording in this area accurately reflects the intent of the committee.

I've re-read the discussion notes on P0137R1, and it seems the question
of "malloc", while somewhat related to P0137R1, has not been discussed
specifically. ("malloc" is also sort-of out-of-scope for P0137R1.)

I believe any change in this area needs a paper (possibly even reviewed
by EWG, not just CWG), as opposed to some ranting on a mailing list,
so if you feel strongly that a change is required for "malloc", please
propose one (preferably with wording changes) in a paper, for review
at one of the next WG21 meetings.

Jens


Richard Hodges

unread,
Jan 18, 2017, 2:58:15 AM1/18/17
to std-dis...@isocpp.org
The committee has already rejected async io and continuation futures - fundamental building blocks that every other language takes for granted. It's had years to come to its senses. All it had to do was adopt some longstanding boost constructs into the language without watering them down to the point of uselessness. It couldn't even do that. It's clearly not fit for purpose.

Who's in charge? what's his number?

If they're still fumbling over whether a cast is really a cast, or mentally pleasuring themselves over whether they're really allowed to implement std::vector or not, then I'm afraid the metaphors would have to be that the band is still playing while the Titanic is being steered into an iceberg. Honestly, from the point of view of a long-time user and evangelist of this language, it just makes you sound like a bunch of nerds wondering whether you're allowed to look inside the school computer, or whether you're going to get a scolding from teacher.

A new committee needs to be recruited. People who actually want to achieve something, to move the language forward to the point where it becomes the de-facto choice for any project.

It is my view that c++ is the best language for performant, cross platform, licence-free development. This is despite the efforts of the committee, not because of them.

I don't care when the "problem" of "is my memory really my memory" came up. All I care about is that after 21 years, this trivial, petty, non-issue gets closed (by deleting the ridiculous notion that you have to call new on an int, are you all mad??) and the language moves on.

Damn right I'll step up. Who do I need to speak to?




barry....@gmail.com

unread,
Jan 18, 2017, 8:05:31 AM1/18/17
to ISO C++ Standard - Discussion

Damn right I'll step up. Who do I need to speak to?

Probably a good therapist. Insulting and trivializing people isn't exactly an efficient way to reach consensus. 

Once you do that, you can follow the steps here: https://isocpp.org/std/submit-a-proposal

Richard Hodges

unread,
Jan 18, 2017, 8:19:53 AM1/18/17
to std-dis...@isocpp.org
This isn't going to help. There is little point submitting proposals to a committee that is resistant to progress. We have watched with dismay as networking has foundered, concepts still aren't in, we still have to call new on an int-aligned memory location to make it an int (try explaining that to someone other than a committee member, they'll think you're nuts).

If any of you on the committee are feeling insulted, you need to stop being so thin skinned. This can't be the first time someone from the user community has stood up and called you out on the current failure.

Sorry chaps, my livelihood, and that of hundreds of thousands of others depends upon you doing your job properly. When you do wrong, you need to hear it.

I am very happy to take a driving seat on the committee if that is what it will take to rescue this language from the current half-decade of navel-gazing.

So, who's in charge? What's his phone number?

Herb Sutter? Bjarne Stroustrup? Drop me a line. If it's not already obvious how to fix that standard I'll explain it in 20 minutes. Then we can move on and add some useful features, like, what's that thing that every computer has? Oh yes. Networking.

Time to step up to the plate gentlemen. The language needs help.


Tom Honermann

unread,
Jan 18, 2017, 9:40:35 AM1/18/17
to std-dis...@isocpp.org
On 01/18/2017 08:19 AM, Richard Hodges wrote:
So, who's in charge?


You are.  The committee is composed of volunteers.

If you aren't happy with the direction the committee is taking, I suggest you attend a meeting, get to know the members, understand and appreciate their concerns and motivations, and contribute to the hard work required to move forward a language that millions of people rely on.

https://isocpp.org/std/meetings-and-participation/

Tom.

Nicol Bolas

unread,
Jan 18, 2017, 10:31:08 AM1/18/17
to ISO C++ Standard - Discussion, barry....@gmail.com
On Tuesday, January 17, 2017 at 5:26:57 PM UTC-5, barry....@gmail.com wrote:


The latter, yes. You can still bit-blast them into buffers that don't
contain live objects yet,
so for that reason it's apparently rather important that such types
are trivially copyable,
but trivially copyable doesn't necessarily mean assignable or "can
blast values over existing objects".

I can bit-blast it into a buffer. But how can I bit-blast it out of a buffer?

struct X { const int val; };

// this is all well and good
alignas(X) char buffer[sizeof(X)];
new (buffer) X{42};
::send(buffer, sizeof(buffer));

alignas(X) char recv_buffer[sizeof(X)];
::recv(recv_buffer, sizeof(recv_buffer));

// can't do this
X a; // nope
memcpy(&a, recv_buffer, sizeof(a));

Now there's an interesting idea for a standard library function:

auto x = std::trivial_copy_construct<X>(recv_buffer);

Obviously, it can only take `X` types which are trivially copyable and copy constructible. It returns a prvalue `X` (praise be to guaranteed elision).

Nicol Bolas

unread,
Jan 18, 2017, 11:08:25 AM1/18/17
to ISO C++ Standard - Discussion
On Wednesday, January 18, 2017 at 12:08:48 AM UTC-5, Thiago Macieira wrote:
On Tuesday, 17 January 2017 13:04:55 PST Nicol Bolas wrote:
> Standards do not work based off of vague inferences. They have to *specify*
> behavior. So if an "inference" is going to be made, then there *must* be an
> explicit enumeration of syntactic constructs which the compiler will use to
> "infer" the type.

It wasn't a vague inference. It was a logical conclusion based on existing
rules.

If the compiler sees you casting a memory block to a given class, unless it
has reason to doubt you, it should trust you that you're right and that
pointer points to an area containing an object of that type. That is, if your
code is:

        extern "C" void *allocate();
        void *ptr = allocate();
        S *s = static_cast<S *>(ptr);
        s->i = 0;

why should it doubt you?

There's no question of doubting or not doubting. The question is "does static_cast create an object?" Because that's what you seem to want.

If that's what you want, then say that. That's what I'm talking about with "inferring;" it's nonsense. Either performing an operation creates a specific object or it doesn't. If it does, then we need to know exactly which operations should be creating exactly which objects and under exactly which circumstances.

Without that, you can't have a specification.

What's to say that the allocate function isn't:

void *allocate() { return new S; }

To be explicit: unless the compiler can prove that the allocation function
*isn't* allocating an S, it has no reason to doubt your casting.

Again, there's this pointless question of doubting or not doubting. C++ is very clear here: if your `allocate` function did indeed return an `S` object, then your code works. If it did not return an `S` object, then you get UB.

Doubt is not in question; the compiler is doing what it was told: if there was no object there, then you get UB.

The problem being discussed in this thread is when there is no object there. When there is merely a collection of bits that might form that object, but no object actually exists there yet. No object has been started in accord with the rules of [intro.object]/1.

Pretending that there is an object somewhere when there isn't is UB.

The only way to make that code legal, while still having a memory model that is coherent, is to have one of the operations in that code actually create the object.

So which operation should it be?

> At which point, those syntaxes are not being used to "infer" something;
> they now become alternate syntaxes for creating an object.

As I said in another email, static_cast does not create the object and nor
does the dereferencing of that pointer. The creation of the POD object
happened in the allocation of the storage, since the constructor is trivial.

Which POD? If I call `malloc`, which POD did it create? It cannot simultaneously create all PODs. So which one was it?

Because if you cannot answer that question, then you have a dysfunctional memory model. We cannot have a Schrodinger's Cat memory model, where the memory contains some quantum state object that is all POD types simultaneously until you first look at it.

> If you say that `static_cast` can begin the lifetime of an object, then you
> need to say under which circumstances that will happen. Does it only work
> from memory fresh out of `malloc`, or can it work on `malloc`ed memory that
> used to have an object in it and you're now replacing it with another? How
> do you tell the difference between pointer conversion and object
> initialization?

See above. Initialisation happens inside the allocation function, not on
casting.

Initialization is irrelevant. Objects can be created without initializing them; trivial default constructors do it all the time.

What matters is when an object is created.

Now, the lifetime can end if you repurpose the storage by memcpy'ing something
else there. See my other email where I said memcpy can be the same as:

        x2.~X();
        new (&x2) X(x1);

> C++ has a way to tell the difference: `static_cast` is for pointer
> conversion; `new()` is for object initialization. Because of that, it can
> do this:
>
> struct S { int i; };
> struct T { float f; };
>
> auto mem = malloc(std::max(sizeof(S), sizeof(T)));
>
> S *s = new(mem) S;
> s->i;
>
> T *t = static_cast<T*>(mem); //OK, but you can't use `t`.

Actually, I've seen UBSan complain about a static cast of the wrong type, so
you shouldn't cast to the wrong type, even if you don't use the pointer.

But it doesn't provoke UB; [expr.static.cast]/13 says so. It's accessing data through that pointer that provokes UB. So such UBSan complaints are wrong.
 
Though in that case we were talking about polymorphic types and here we're
talking about trivial ones.

> T *t2 = new(mem) T; //Can use `t2`, but not `s` anymore.
> t = std::launder(t); //I can use `t` now.

Agreed, your code is fine. And using the placement new allows us to be explicit
about the object initialisation and also safe if the type in question isn't
trivially constructible.
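
For comparison, a sketch of the same storage reuse that sidesteps the stale pointers by only ever using the pointer each placement new returns (the function name `reuse_storage` and the stored values are assumptions for illustration):

```cpp
#include <algorithm>  // std::max
#include <cstdlib>    // std::malloc, std::free
#include <new>        // placement new

struct S { int i; };
struct T { float f; };

// Creates an S in malloc'ed storage, then reuses the storage for a T.
// Returns the float read back from the T, or 0 if allocation fails.
float reuse_storage() {
    void* mem = std::malloc(std::max(sizeof(S), sizeof(T)));
    if (mem == nullptr) return 0.0f;

    S* s = ::new (mem) S;   // an S now lives in the storage
    s->i = 1;

    T* t2 = ::new (mem) T;  // reusing the storage ends the lifetime of *s
    t2->f = 2.5f;           // only t2 is used from here on

    float result = t2->f;
    std::free(mem);         // T is trivially destructible
    return result;
}
```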

But the compiler cannot prove that malloc didn't initialise the object before
it returned.

The compiler most certainly can prove that malloc did not create an object of type `S`. Because `std::malloc` is not specified to create an object of any type; it merely allocates memory. As such, the compiler is free to assume that the memory `std::malloc` returns has no objects in it.
 
Take the allocate() function from above: if we expand the
operator new, we get:

void *allocate()
{
        auto ptr = ::operator new(sizeof(S));
        new (ptr) S;
        return ptr;
}

But since S has a trivial constructor, the placement new must expand to
absolutely nothing and have no side effects. Therefore, that function is
functionally identical to:  

void *allocate() { return ::operator new(sizeof(S)); }

Please read [intro.object]/1. The two pieces of code are not identical. No matter the fact that `new(ptr) S` will be a no-op for real-life compilers, it still does something. It creates the object. Without that line, you have storage that contains no objects.
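
To make the difference concrete, a minimal sketch with both variants written out (`allocate_raw`, `allocate_s` and `release_s` are hypothetical names; `S` is assumed to be the `struct S { int i; };` used earlier in the thread). The two allocation functions may well compile to identical machine code, yet only the second creates an `S` in the sense of [intro.object]/1:

```cpp
#include <new>  // ::operator new, ::operator delete, placement new

struct S { int i; };

// Returns storage only. No S object exists in it.
void* allocate_raw() {
    return ::operator new(sizeof(S));
}

// Same storage, but the placement new starts the lifetime of an S there,
// even though a trivial default constructor emits no instructions.
S* allocate_s() {
    void* storage = ::operator new(sizeof(S));
    return ::new (storage) S;  // s->i is left uninitialized
}

// Releases an S obtained from allocate_s(). S is trivially destructible,
// so no destructor call is needed before the storage is freed.
void release_s(S* s) {
    ::operator delete(s);
}
```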

Finally, since the default ::operator new function just calls malloc, it's no
different from:

void *allocate() { return malloc(sizeof(S)); }

To me, this proves that you cannot distinguish malloc() or any other memory
allocation function from a function that initialises a trivial object.

It proves no such thing. Given sufficient power and inlining, the compiler can recognize that you're calling `std::malloc`, which as defined by the standard does not create an object. The compiler can see that `allocate` also does not create a C++ object. Therefore, the compiler has all the information to know that, if you don't create an object in that memory yourself, then trying to access it will invoke UB. And therefore, the compiler is technically free to wipe out any of those operations that don't have side-effects. For example:

auto ptr = allocate();
auto s_ptr = static_cast<S*>(ptr);
s_ptr->i = 5;

Memory allocation has side-effects, so they can't be wiped out. But the third line could be freely ignored by a really smart compiler, because it knows that there is no `S` object there.

Now granted, I have no idea why someone would write a compiler that does that. If `allocate` did indeed create an `S` object, then the compiler would compile the code as normal. And if `allocate` doesn't create an `S` object... what would be the point of removing the third line? The user may have provoked UB, but you're not saving any performance in a well-defined program.

But the fear some people have on this thread is that the C++ memory model permits the compiler to throw out such code, and that it therefore should be "fixed"... somehow.

Nicol Bolas

Jan 18, 2017, 11:30:43 AM
to ISO C++ Standard - Discussion


On Tuesday, January 17, 2017 at 5:45:35 PM UTC-5, Martin Ba wrote:
On Tuesday, January 17, 2017 at 7:28:44 PM UTC+1, Nicol Bolas wrote:
On Tuesday, January 17, 2017 at 11:32:55 AM UTC-5, Richard Hodges wrote:
I cannot see any reasonable argument that pointer arithmetic should not be allowed to work on consecutive objects.

And nobody has made such an argument. Indeed, I'm pretty sure that I stated quite the opposite. Though for very different reasons and different restrictions.
 
int* p = get_ints_from_c(); *(++p); should absolutely be defined behaviour in c++ provided there is actually some memory at std::addressof(*p) + sizeof(p); - there is no conceivable reason why it should not. 

Note that I am asserting *should absolutely* - a very strong statement. This is because we absolutely cannot move away from c. There are no c++ operating systems. Therefore all useful libraries are written with C interfaces. Thousands of c++ wrapper libraries exist to turn those C interfaces back into c++. We don't do that because we want to. We do that because C++ is not suitable for creating portable object libraries, having no modules or common ABI.

By all means lets talk about moving forward - after we have modules, defined ABIs, an agreed-upon means to transmitting exceptions and so on.

Until then, the entire foundation of our C++ universe is C. To try to pretend otherwise is a fallacy.

OK, it's "difficult" to marry the c++ abstract machine model with the C memory model. So what? That doesn't mean that it should not be done.

OK, so explain what we will gain by doing all of this work. How will it make my currently functional code faster and/or better? How will it make my programs more correct? How will it improve the C++ object model in ways that are useful for actual C++ programs?


What we will gain is a Standard that is not contradicting reality, which is *at least* a marketing asset for the language. (compare: isocpp.org)

Exactly how do you market something that by definition doesn't actually change anything? Behold, the new C++, with 100% more of the exact same stuff you had before.
 
What we will gain is people writing reasonable real world "low level" code *not being told* that their code is UB and that they should resort to memcpy and no-op placement-new contortions - this is at least an asset wrt. the learning curve of the language.

Nonsense. It is only a learning curve problem for C programmers. For people from other languages, or none at all, they learn what they are told.

Native C++ programmers, who have not been exposed to C-isms, are highly unlikely to resort to casting memory and so forth.

What we will gain is not having to spend time on rather fruitless discussion like this one here.
 
The status quo is adequately functional. And if C is as entwined with C++ as you believe, then no compiler vendor is going to break the world with "optimizations" that don't actually make things more optimal.

The status quo in reality is functional. The Standard contradicts reality in this regard. That it works everywhere in practice, and can be expected to do so, is only an argument for the priority of fixing this, not an argument for not fixing the Standard.


I have to say I do not quite follow your argumentation wrt. this: on one hand you seem to care very much about the Standard supplying a useful and consistent object model, but on the other hand, you seem to say that the places where this shiningly consistent model is violated by a huge fraction of programs in existence don't matter because they will continue to work anyway.

Essentially yes, but there's more to it than that.

The problem basically boils down to this: C++ makes C-isms undefined behavior, but a lot of code relies on C-isms, so compilers aren't free to discard them or do anything about them. The solutions being tossed about here are that we should make them well-defined behavior.

I have a different solution. Instead of promoting garbage C-isms like pointer casting and so forth, we make C++ equivalents. Placement `new` is one such C++-ism which allows the creation of C++ objects in arbitrary memory. But we can add many more.

If people need a way to take memory that has been filled in from external code and use that as a C++ object which is compatible with the layout of that memory, lets provide them with a function that does that. If people need a way to initialize an object directly from compatible data externally provided, let's provide them a way to do that. Let's take all of the useful C-isms and provide C++ ways to do them, rather than promoting pointer casting and whatnot as good code.

In the end, if we give low-level programmers ways to work within the C++ memory model that don't make their code slower, then they ought to stop using C-isms. And those few who continue to rely on C-isms in C++ will just be no different than any other code that relies on UB.

Nicol Bolas

Jan 18, 2017, 11:40:14 AM
to ISO C++ Standard - Discussion
On Wednesday, January 18, 2017 at 2:58:15 AM UTC-5, Richard Hodges wrote:
The committee has already rejected async io and continuation futures

... what? Since when were these rejected?

I don't know of any async io proposals. Unless you're talking about the Networking TS, which is... a Technical Specification. That's the exact opposite of being "rejected".

As for continuation futures, those too were not "rejected". I believe they were part of library fundamentals v2, which was not adopted into C++17. But that's far from being "rejected".

So I really have no idea what you're talking about.

Richard Hodges

Jan 18, 2017, 11:41:13 AM
to std-dis...@isocpp.org
Essentially yes, but there's more to it than that.

Only if you enjoy complicating your life.

The problem basically boils down to this: C++ makes C-isms undefined behavior, but a lot of code relies on C-isms, so compilers aren't free to discard them or do anything about them. The solutions being tossed about here are that we should make them well-defined behavior.

The problem is that intrinsic types are not objects, and neither are PODs. To treat them the same is counter-factual. aligned memory is inherently a union of all PODs that will fit. Make it so in the standard. End the argument forever.

This makes the C++ behaviour the same as C behaviour when intrinsics and PODS are mapped onto memory. It's logical, everyone does it anyway, and it's never going away in gcc or clang. End of problem. Lets get on with something new.

 > I have a different solution. Instead of promoting garbage C-isms like pointer casting and so forth, we make C++ equivalents. Placement `new` is one such C++-ism which allows the creation of C++ objects in arbitrary memory. But we can add many more.

NO - because that just adds more useless work for programmers. It's the reverse of what auto does - which is make life easier and better. Useless work like having to formally introduce storage (which is what you're suggesting) is what COBOL and Pascal did. They're dead now. Let's not do that.

If people need a way to take memory that has been filled in from external code and use that as a C++ object which is compatible with the layout of that memory, lets provide them with a function that does that. If people need a way to initialize an object directly from compatible data externally provided, let's provide them a way to do that. Let's take all of the useful C-isms and provide C++ ways to do them, rather than promoting pointer casting and whatnot as good code.

No need for any of that. It already happens in gcc. gcc *is* the standard. The ISO standard needs to catch up.





--

---
You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to std-discussion+unsubscribe@isocpp.org.

Richard Hodges

Jan 18, 2017, 11:46:51 AM
to std-dis...@isocpp.org
 those too were not "rejected"

"not adopted" means exactly the same thing as "rejected" when you are waiting for a feature to come out so that you can standardise code across platforms.

Every 3 year delay in the committee adopting a perfectly good feature is a waste of hundreds of thousands of man-hours (probably much more) across the c++ developer community.

Pedantically splitting hairs with concerned observers is the kind of behaviour that causes delays in the progress of c++. Let's not do that either.





Thiago Macieira

Jan 18, 2017, 12:03:11 PM
to std-dis...@isocpp.org
On Wednesday, 18 January 2017 08:08:25 PST Nicol Bolas wrote:
> There's no question of doubting or not doubting. The question is "does
> static_cast create an object?" Because that's what you seem to want.

It does not. I thought I was very explicit. The object was created before the
static_cast.

> What's to say that the allocate function isn't:
> > void *allocate() { return new S; }
> >
> > To be explicit: unless the compiler can prove that the allocation function
> > *isn't* allocating an S, it has no reason to doubt your casting.
>
> Again, there's this pointless question of doubting or not doubting. C++ is
> very clear here: if your `allocate` function did indeed return an `S`
> object, then your code works. If it did not return an `S` object, then you
> get UB.

Agreed. There's no doubting that.

What I am saying is that in an opaque function returning a void*, the compiler
cannot infer that it did not create an S. Therefore, it must allow for
the possibility that it did and cannot optimise otherwise.

> The problem being discussed in this thread is when *there is no object
> there*. When there is merely a collection of bits that might form that
> object, but no object actually exists there yet. No object has been started
> in accord with the rules of [intro.object]/1.
>
> Pretending that there is an object somewhere when there isn't is UB.

I disagree. POD & trivial object lifetimes begin when storage for them is
allocated. Therefore, a POD object could exist there with just a collection of
unspecified bits.

> > void *allocate() { return ::operator new(sizeof(S)); }
>
> Please read [intro.object]/1. The two pieces of code are not identical. No
> matter the fact that `new(ptr) S` will be a no-op for real-life compilers,
> it still *does something*. It creates the object. Without that line, you
> have storage that contains no objects.

How can an operation that does nothing do something?

The ultimate allocation function is always opaque and non-inline: if it's not
allocate() above, it's ::operator new(). If it's not ::operator new(), then
it's malloc(). If it's not malloc(), then it's a system call like VirtualAlloc
or brk() or mmap(). So even if the compiler can inline almost everything, it
stops at some opaque boundary that returns a pointer to a bag of bits. That
being the case, the compiler does not know what the allocation function did or
did not do. Therefore, it must account for the possibility that the function
did placement-new S there.

And even if the deep embedded system has no system calls and simply finds
memory from a list somewhere, it usually cannot "remember" what happened in
that memory allocation, so it cannot rule out that the new S did happen some
time in the past.

The way I see it, this is reality. Even if we made the standard language
stupid and try to diverge from reality, it wouldn't change the behaviour
because the compiler could never prove UB and therefore optimise your code to
be different from what it is today.

I don't think we need to change the standard, but we can clarify this:

> It proves no such thing. Given sufficient power and inlining, the compiler
> can recognize that you're calling `std::malloc`, which as defined by the
> standard does not create an object. The compiler can see that `allocate`
> also does not create a C++ object. Therefore, the compiler has all the
> information to know that, if you don't create an object in that memory
> yourself, then trying to access it will invoke UB. And therefore, the
> compiler is technically free to wipe out any of those operations that don't
> have side-effects.

We should change the language of the standard to say that std::malloc could
have created a trivially-constructible object there.

> For example:
>
> auto ptr = allocate();
> auto s_ptr = static_cast<S*>(ptr);
> s_ptr->i = 5;
>
> Memory allocation has side-effects, so they can't be wiped out. But the
> third line could be freely ignored by a really smart compiler, because it
> knows that there is no `S` object there.

Agreed, it could, but only if the compiler can prove that there is no S there.
What I am proposing is that we tell the compiler it doesn't know what's there.
Therefore, an S *could* be there.

> But the fear some people have on this thread is that the C++ memory model
> permits the compiler to throw out such code. And therefore, should be
> "fixed"... somehow.

Yes.

Nicol Bolas

Jan 18, 2017, 12:10:39 PM
to ISO C++ Standard - Discussion
On Wednesday, January 18, 2017 at 11:46:51 AM UTC-5, Richard Hodges wrote:
 those too were not "rejected"

"not adopted" means exactly the same thing as "rejected" when you are waiting for a feature to come out so that you can standardise code across platforms.

No, it really doesn't.

If something is rejected, that means it's flat-out not gonna happen. So you should stop waiting for it and try to work around the problem. If it was standardized in a TS, then it's going to happen, but not yet. Thus, waiting is a legitimate action.

Robert Haberlach

Jan 18, 2017, 12:10:41 PM
to std-dis...@isocpp.org
On 1/18/2017 4:41 PM, Richard Hodges wrote:
Essentially yes, but there's more to it than that.

Only if you enjoy complicating your life.

The problem basically boils down to this: C++ makes C-isms undefined behavior, but a lot of code relies on C-isms, so compilers aren't free to discard them or do anything about them. The solutions being tossed about here are that we should make them well-defined behavior.

The problem is that intrinsic types are not objects, and neither are PODs. To treat them the same is counter-factual. aligned memory is inherently a union of all PODs that will fit.
Are you stoned?


Nicol Bolas

Jan 18, 2017, 12:19:51 PM
to ISO C++ Standard - Discussion
On Wednesday, January 18, 2017 at 11:41:13 AM UTC-5, Richard Hodges wrote:
Essentially yes, but there's more to it than that.

Only if you enjoy complicating your life.

The problem basically boils down to this: C++ makes C-isms undefined behavior, but a lot of code relies on C-isms, so compilers aren't free to discard them or do anything about them. The solutions being tossed about here are that we should make them well-defined behavior.

The problem is that intrinsic types are not objects, and neither are PODs. To treat them the same is counter-factual. aligned memory is inherently a union of all PODs that will fit. Make it so in the standard. End the argument forever.

You keep saying that as though it were some objective fact rather than a choice.

To a compiler, there's no such thing as an "object"; memory is just memory. It contains values. And so forth.

The concept of "object" only exists at the level of the standard. And thus, an object can be whatever we choose for it to be.

If trivial types are not objects, then what are they? And how would that work with code that acts on objects?

This makes the C++ behaviour the same as C behaviour when intrinsics and PODS are mapped onto memory. It's logical, everyone does it anyway, and it's never going away in gcc or clang. End of problem. Lets get on with something new.

 > I have a different solution. Instead of promoting garbage C-isms like pointer casting and so forth, we make C++ equivalents. Placement `new` is one such C++-ism which allows the creation of C++ objects in arbitrary memory. But we can add many more.

NO - because that just adds more useless work for programmers. It's the reverse of what auto does - which is make life easier and better. Useless work like having to formally introduce storage (which is what you're suggesting)

No, I'm not. What I'm saying is that casting should not be something that is encouraged. If you want to perform a certain operation, you should perform that operation, not fiddle around with what type a pointer points to or other such nonsense.

Consider the following:

auto ptr = malloc(4);
auto i_ptr = static_cast<int*>(ptr);
*i_ptr = 5;

To a programmer who has never seen C-isms before, this looks like nonsense. You get some memory, then pretend that it stores an `int`? How? Why? By contrast:

auto ptr = malloc(4);
auto i_ptr = new(ptr) int;
*i_ptr = 5;

This is sane code. It clearly allocates memory and creates an `int`. It then accesses it.
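
Written out as complete functions, that version might look like the sketch below. `make_int` and `destroy_int` are made-up names, and `sizeof(int)` stands in for the hard-coded 4:

```cpp
#include <cstdlib>  // std::malloc, std::free
#include <new>      // placement new

// Allocates storage with malloc and creates an int in it.
// Returns nullptr if the allocation fails.
int* make_int(int value) {
    void* storage = std::malloc(sizeof(int));
    if (storage == nullptr) return nullptr;
    // malloc returns storage suitably aligned for any fundamental type,
    // so an int can be created here.
    return ::new (storage) int(value);
}

// int is trivially destructible, so freeing the storage is enough.
void destroy_int(int* p) {
    std::free(p);
}
```

The pointer that comes back from `make_int` is the one returned by the placement new, so it refers to an actual `int` object and no cast is involved.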

Similarly:

T t;
memcpy(&t, some_ptr, sizeof(T));

Is oddball. By contrast:

auto t = std::trivial_copy_construct<T>(some_ptr);

Is far more reasonable. You're clearly constructing a `T` from memory.
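
`std::trivial_copy_construct` does not exist in the standard library; it is a suggested interface. A rough sketch of how such a function could be approximated in user code today, assuming it takes a pointer to at least `sizeof(T)` bytes (the signature is a guess, and it is deliberately left outside namespace `std`):

```cpp
#include <cstring>      // std::memcpy
#include <type_traits>  // type traits used in the static_asserts

// Hypothetical: builds a T from the sizeof(T) bytes at src.
// The caller must guarantee that src points to at least sizeof(T)
// readable bytes holding a valid object representation of T.
template <typename T>
T trivial_copy_construct(const void* src) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "T must be trivially copyable");
    static_assert(std::is_default_constructible<T>::value,
                  "this sketch default-constructs the result first");
    T result;
    std::memcpy(&result, src, sizeof(T));
    return result;
}
```

Built on `memcpy`, this only demonstrates the shape of the interface; it adds no object-model guarantee beyond what `memcpy` into an existing object already has.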
 
is what COBOL and Pascal did. They're dead now. Let's not do that.

If people need a way to take memory that has been filled in from external code and use that as a C++ object which is compatible with the layout of that memory, lets provide them with a function that does that. If people need a way to initialize an object directly from compatible data externally provided, let's provide them a way to do that. Let's take all of the useful C-isms and provide C++ ways to do them, rather than promoting pointer casting and whatnot as good code.

No need for any of that. It already happens in gcc. gcc *is* the standard.

No, it isn't.

Richard Hodges

Jan 18, 2017, 1:41:24 PM
to std-dis...@isocpp.org
You keep saying that as though it were some objective fact rather than a choice.

The fact that I can cast properly aligned storage to a POD and use it as a POD:

a) is a de-facto reality and always will be
b) is necessary to allow c++ to interact with every computer system in the world.
c) should therefore obviously be mandated as true in the standard.

The concept of "object" only exists at the level of the standard. And thus, an object can be whatever we choose for it to be.

Good. We agree on that. Let's choose for an 'object' to be something more complex than a BASIC POD. Let's define a BASIC POD as being a POD with only defaulted special functions. Lets also choose that a pointer to BASIC POD is a template through which we manipulate memory (subject to the as-if-rule). Lets also choose that any sufficiently aligned and sized memory block can be viewed in a defined way through a pointer to a BASIC POD.

Now, if we choose to overlay a BASIC POD half-way into some other object, then OBVIOUSLY, access to that other object is undefined. But the BASIC POD is not.

Why? 
a) Because this is reality and,
b) It's necessary and,
c) it solves your pet problem - implementing a vector correctly.

Lets also allow a BASIC POD to have its last member as a Zero-sized array. Such an array may be validly accessed provided there is storage behind it - because this allows us to create really useful things like buffers that the average programmer can understand.

Further. Lets further legalise pointer arithmetic.

Finally, let's stop trying to pretend that memory is some nebulous thing. It's memory. Sometimes C++ needs to go low level and it's useful for it to be high level. Let's keep the versatility. gcc's optimiser copes with that, so does clang's. There is no need to start writing doublespeak in the standard about it not being true. It is true.

What I'm saying is that casting should not be something that is encouraged.

Casting cannot be avoided when you interface with C libraries. Every production c++ program interfaces with C (and sometimes objective-C) libraries. Therefore, casting cannot be avoided. Interacting with C's "BASIC PODS" cannot be avoided. Therefore it must not be undefined. If nothing else, this will prevent every 20th post on stackoverflow from being howls of outrage at being forced to write memcpy, only for the copy to be thrown away.

Let me put this another way:

this code:

std::memcpy(&myints, your_chars, sizeof(int) * 10); 

currently signals to the compiler that your_chars are really an array of 10 ints.

so should this:

struct F {
  int n;
  int a[];
};

extern "C" { 
  F* makeF();
}

auto pf = makeF();

// pf->n should be valid AND pf->a[pf->n-1] should be valid when (pf->n > 0)

and this:

int* pint = (int*)your_chars;

in which case *(pint + 6) should be mandated as valid *if* there is storage behind the address.

If you want to treat c++ as a high level language, no problem. Someone else can write the wrapper for you. But the wrapper should be able to be standard-compliant within the C++ language





 
 


Nicol Bolas

Jan 18, 2017, 2:43:52 PM
to ISO C++ Standard - Discussion
On Wednesday, January 18, 2017 at 1:41:24 PM UTC-5, Richard Hodges wrote:
You keep saying that as though it were some objective fact rather than a choice.

The fact that I can cast properly aligned storage to a POD and use it as a POD:

a) is a de-facto reality and always will be

Sure. But that doesn't mean we should encourage it.

b) is necessary to allow c++ to interact with every computer system in the world.

No, it is not necessary at all. We can develop C++ tools that don't involve casting which can achieve the same effect.

The mechanism isn't important; the effect is what matters. I don't see any problems with C++ using different mechanisms to achieve the effects you would use in C. As long as we can get equivalent functionality, there's no need to adopt C-isms.

c) should therefore obviously be mandated as true in the standard.

The concept of "object" only exists at the level of the standard. And thus, an object can be whatever we choose for it to be.

Good. We agree on that. Let's choose for an 'object' to be something more complex than a BASIC POD. Let's define a BASIC POD as being a POD with only defaulted special functions. Lets also choose that a pointer to BASIC POD is a template through which we manipulate memory (subject to the as-if-rule). Lets also choose that any sufficiently aligned and sized memory block can be viewed in a defined way through a pointer to a BASIC POD.

OK, let's say we do that.

Are you prepared to rewrite the entire standard to that effect? Before, the standard could use the term "object"; when it did, that part of the standard would apply to a type as simple as `int` or one as complex as `complex_polymorphic_cpp_type`. `new int` returns a newly allocated `int` object. `new complex_polymorphic_cpp_type` would likewise return a newly allocated object of that type. So how do you define what it does now?

Now, you have to come up with some new term to represent both cases, because there are lots of parts of the standard that cover both. You then have to change it everywhere, except for those cases where you need to differentiate them.

And that's just one aspect of your suggested change.

It's easy to tell someone else to do the hard work, isn't it? It's easy to say "take this building down and build a new one." It's a lot harder when it's just you with some concrete and a sledgehammer who has to do the actual work.

If it's so easy, then do it. Put together a proposal. Not a couple of paragraphs, but an actual, firm, real proposal (one that perhaps doesn't use ALL CAPS as much as you do here. Seriously, why does BASIC need to be capitalized?). One with at least a good-faith attempt at standards wording. One that results in a coherent object model.

Now, if we choose to overlay a BASIC POD half-way into some other object, then OBVIOUSLY, access to that other object is undefined. But the BASIC POD is not.

Why? 
a) Because this is reality and,
b) It's necessary and,
c) it solves your pet problem - implementing a vector correctly.

It doesn't solve my "pet problem". Your change only affects non-"object"s; `vector<T>` needs to work whether `T` is an "object" type or a "BASIC POD" type.

Lets also allow a BASIC POD to have its last member as a Zero-sized array. Such an array may be validly accessed provided there is storage behind it - because this allows us to create really useful things like buffers that the average programmer can understand.

Further. Lets further legalise pointer arithmetic.

Pointer arithmetic is legal within arrays, which is fine. The problem is with things that are array-like but aren't technically arrays.
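
The distinction can be sketched as follows (`sum` and `sum_of_real_array` are illustrative names). Walking a pointer through a single array object is fine; the contested case, noted in the comment, is storage where each element was created separately, which is what a `vector` implementation has to do:

```cpp
#include <cstddef>  // std::size_t

// Sums n ints starting at first. The pointer arithmetic is well-defined
// when [first, first + n) lies within one array object. If the ints were
// instead created one by one in raw storage, no array object exists and
// the same loop is what the wording does not clearly cover.
int sum(const int* first, std::size_t n) {
    int total = 0;
    for (const int* p = first; p != first + n; ++p) {
        total += *p;
    }
    return total;
}

// The uncontroversial case: a real array object of four ints.
int sum_of_real_array() {
    int values[4] = {1, 2, 3, 4};
    return sum(values, 4);
}
```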

Suggesting that we fundamentally re-architect the entire object model just to make `vector` work is like using a nuke on a mountain, then rebuilding that mountain with a tunnel in it. It'd be a lot easier and less radioactive to just drill a tunnel.

Finally, let's stop trying to pretend that memory is some nebulous thing. It's memory. Sometimes C++ needs to go low level and it's useful for it to be high level. Let's keep the versatility. gcc's optimiser copes with that, so does clang's. There is no need to start writing doublespeak in the standard about it not being true. It is true.

What I'm saying is that casting should not be something that is encouraged.

Casting cannot be avoided when you interface with C libraries. Every production c++ program interfaces with C (and sometimes objective-C) libraries. Therefore, casting cannot be avoided. Interacting with C's "BASIC PODS" cannot be avoided. Therefore it must not be undefined. If nothing else, this will prevent every 20th post on stackoverflow from being howls of outrage at being forced to write memcpy, only for the copy to be thrown away.

Let me put this another way:

this code:

std::memcpy(&myints, your_chars, sizeof(int) * 10); 

currently signals to the compiler that your_chars are really an array of 10 ints.

Does it? Would `memcpy(&myints, your_chars, 4 * 10)` provide the same signal? How does the compiler know that it's an array of 10 ints and not of 10 floats? Or 20 shorts? Or 40 chars? Or whatever?

I know what would actually signal copying 10 ints:

std::trivial_copy_assign_strict<int>(&myints, your_chars, 10);
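
No function named `std::trivial_copy_assign_strict` exists in the standard library; the name above only illustrates the idea. A minimal sketch of what such a typed copy might look like, assuming a hypothetical `sketch` namespace, a pointer-to-first-element destination, and a C++11 compiler:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

namespace sketch {

// Hypothetical helper (NOT part of the standard library): copies `count`
// objects of type T from raw bytes into `dst`. The element type is part of
// the call, unlike std::memcpy, which only sees void pointers and a byte count.
template <class T>
T* trivial_copy_assign_strict(T* dst, const void* src, std::size_t count) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "a byte-wise copy is only valid for trivially copyable types");
    assert(dst != nullptr && src != nullptr);
    std::memcpy(dst, src, sizeof(T) * count);
    return dst;
}

}  // namespace sketch

// Round-trips ten ints through a char buffer and returns their sum.
inline int demo_copy_sum() {
    const int source[10] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
    alignas(int) unsigned char your_chars[sizeof source];
    std::memcpy(your_chars, source, sizeof source);

    int myints[10] = {};
    sketch::trivial_copy_assign_strict<int>(myints, your_chars, 10);

    int sum = 0;
    for (int v : myints) sum += v;
    return sum;
}
```

The wrapper still ends in `std::memcpy`; the only thing it adds is that the element type and element count are visible at the call site.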

so should this:

struct F {
  int n;
  int a[];
};

That's not legal C++.
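
For context: flexible array members are a C99 feature that several C++ compilers accept as an extension, but ISO C++ does not include them. A portable sketch of the same variable-length layout, assuming hypothetical helpers `make_buffer`, `read_count` and `read_element`, moves the header and the trailing ints through `std::memcpy`:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Fixed-size header; the n ints follow it directly in the byte buffer.
struct Header {
    int n;
};

// Hypothetical helper: packs a Header followed by `n` ints into raw bytes.
inline std::vector<unsigned char> make_buffer(const int* values, int n) {
    assert(n >= 0);
    const std::size_t count = static_cast<std::size_t>(n);
    std::vector<unsigned char> bytes(sizeof(Header) + sizeof(int) * count);
    const Header h{n};
    std::memcpy(bytes.data(), &h, sizeof h);
    if (count > 0) {
        std::memcpy(bytes.data() + sizeof(Header), values, sizeof(int) * count);
    }
    return bytes;
}

// Reads the element count back out of the header bytes.
inline int read_count(const std::vector<unsigned char>& bytes) {
    Header h;
    std::memcpy(&h, bytes.data(), sizeof h);
    return h.n;
}

// Reads the i-th trailing int by copying its bytes into a real int object.
inline int read_element(const std::vector<unsigned char>& bytes, std::size_t i) {
    assert(i < static_cast<std::size_t>(read_count(bytes)));
    int value;
    std::memcpy(&value, bytes.data() + sizeof(Header) + i * sizeof(int), sizeof value);
    return value;
}
```

This is exactly the "write memcpy" style complained about above; optimisers commonly turn such small fixed-size copies into plain loads and stores, though that is a quality-of-implementation matter rather than a guarantee.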
 

Richard Hodges

Jan 18, 2017, 2:43:53 PM1/18/17
to std-dis...@isocpp.org
Are you stoned?

No, but if you ever visit Spain I can show you where you may become so (quite legally) if you wish.

What is your problem with memory being... memory?

on a system where ints are 32 bits, 32-bit words are addressable without bitwise arithmetic, and the compiler deems that 128 bits is a reasonable alignment strategy...:

struct A {
  int a;
  int b[2];
};

struct B {
  int a[2];
  int b;
};

... both A and B occupy 128 bits. The value of the last 32 bits is irrelevant.

There is no reason whatsoever (other than handwaving from the 'c++ is not a low level language crowd') that they should not be a union of each other.

It makes no difference to optimisers, no difference to threading, no difference to anything other than the sensibilities of snowflake theorists.

Be a man. Embrace your memory.


Nicol Bolas

Jan 18, 2017, 2:54:26 PM1/18/17
to ISO C++ Standard - Discussion
On Wednesday, January 18, 2017 at 2:43:53 PM UTC-5, Richard Hodges wrote:
Are you stoned?

No, but if you ever visit Spain I can show you where you may become so (quite legally) if you wish.

What is your problem with memory being... memory?

on a system where ints are 32 bits, 32-bit words are addressable without bitwise arithmetic, and the compiler deems that 128 bits is a reasonable alignment strategy...:

struct A {
  int a;
  int b[2];
};

struct B {
  int a[2];
  int b;
};

... both A and B occupy 128 bits. The value of the last 32 bits is irrelevant.

There is no reason whatsoever (other than handwaving from the 'c++ is not a low level language crowd') that they should not be a union of each other.

C++ is a low level language. What C++ isn't is a language that pretends that compilers don't get to make choices about how things work.

The compiler gets to decide how to lay out both of those structures. And I see no reason why the compiler should be required to decide to lay them out identically.

I don't even know what you mean by "a union of each other", but that sounds very much like "let's pretend strict aliasing doesn't exist". So not gonna happen.
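
To make the layout point concrete: nothing in the standard promises that `A` and `B` share a layout, so code that relies on it has to check. A hedged sketch, assuming a hypothetical helper `copy_as_B` and an implementation that inserts no padding between `int` members, verifies that assumption at compile time and converts with `std::memcpy` rather than a pointer cast, which keeps it clear of strict aliasing:

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

struct A {
    int a;
    int b[2];
};

struct B {
    int a[2];
    int b;
};

static_assert(std::is_trivially_copyable<A>::value &&
              std::is_trivially_copyable<B>::value,
              "byte-wise copying requires trivially copyable types");
// Assumption made explicit: both structs are exactly three ints, no padding.
static_assert(sizeof(A) == 3 * sizeof(int) && sizeof(B) == 3 * sizeof(int),
              "this sketch assumes no padding in A or B");

// Hypothetical helper: produces a B holding the same bytes as `source`.
inline B copy_as_B(const A& source) {
    B result;
    std::memcpy(&result, &source, sizeof result);
    return result;
}

// Example check: the three ints keep their order across the copy.
inline void check_copy_as_B() {
    const A a{1, {2, 3}};
    const B b = copy_as_B(a);
    assert(b.a[0] == 1 && b.a[1] == 2 && b.b == 3);
}
```

If an implementation did choose different layouts, the `static_assert` would fail at compile time instead of the program silently misbehaving.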

Richard Hodges

Jan 18, 2017, 2:58:57 PM1/18/17
to std-dis...@isocpp.org
Sure. But that doesn't mean we should encourage it.

Enablement is not encouragement.

No, it is not necessary at all. We can develop C++ tools that don't involve casting which can achieve the same effect.

Not standards-compliant ones, you can't. And no-one has. Otherwise, please show me one. In the face of reality, you're quoting me hypotheses that will never see the light of day. You're wasting our time.

OK, let's say we do that. Are you prepared to rewrite the entire standard to that effect?

Good. Yes, absolutely. The current standard is garbage, because it tells lies. Like an interface that lies, that is worse than useless. Fix it.

If it's so easy, then do it. Put together a proposal. Not a couple of paragraphs, but an actual, firm, real proposal (one that perhaps doesn't use ALL CAPS as much as you do here. Seriously, why does BASIC need to be capitalized?). One with at least a good-faith attempt at standards wording. One that results in a coherent object model.

I pretty much just did. But sure. If you can guarantee that it will get in front of the top guy, and I'll get a chance to argue common sense and the position of the majority of the user base in person, no problem. Like any organisation, speaking to anyone other than the decision maker wastes everyone's time. Who is the decision-maker here, or the person to whose opinion everyone on the committee will submit? Show me that and I will teach you how it's done.

It doesn't solve my "pet problem". Your change only affects non-"object"s; `vector<T>` needs to work whether `T` is an "object" type or a "BASIC POD" type.

If vector<T> can't be implemented in c++, it's because you've broken c++. Let's repair the damage before you guys make it worse.

Pointer arithmetic is legal within arrays, which is fine. The problem is with things that are array-like but aren't technically arrays.

Pointers are index registers. End of story. If you add the size of an object to one, you get the address of the adjacent object. A 3-year-old can understand this. <sigh>

Suggesting that we fundamentally re-architect the entire object model just to make `vector` work is like using a nuke on a mountain, then rebuilding that mountain with a tunnel in it. It'd be a lot easier and less radioactive to just drill a tunnel.

It's not just vector. The entire edifice upon which c++ depends (interaction with C) is broken. I say we stop whinging and fix it.

std::trivial_copy_assign_strict<int>
 
Sure, let's type 40 cryptic characters where none would do.

That's not legal C++.

And yet it works, is expressive and necessary if you want to describe variable length buffers, which are incredibly useful, without resorting to casts. 

I thought you were against casts?

Stop arguing the unarguable. The standard is broken. Let's fix it.





Richard Hodges

Jan 18, 2017, 3:10:10 PM1/18/17
to std-dis...@isocpp.org
I don't even know what you mean by "a union of each other", 

But I think you do.

> but that sounds very much like "let's pretend strict aliasing doesn't exist". So not gonna happen.

Strict aliasing is another matter entirely. Clearly, writing to the same memory through dissimilar pointers in the same loop gives optimisers more problems when tracking aliasing. By all means let's disallow that. By all means let's assume that a block of memory has only one "shape" at a time, for a given logical operation on it.

And when a user chooses to say something different, by casting a pointer or reference to that memory, let's flush any pending writes at that point. Logically there is no difference between casting POD pointers pointing to the same memory, and changing the element addressed in a union.

Even if there were, imbuing a POD with extern "C" should still make it behave like C memory, because that's obviously what it is.




 




Hyman Rosen

Jan 18, 2017, 5:00:30 PM1/18/17
to std-dis...@isocpp.org
I would prefer if the Standard said something like this:
    For any trivially copyable type T, a suitably-aligned region of storage with size sizeof(T) is said to contain an object of type T if the region contains the value representation of a value of T.  If a region of storage contains an object of type T, a valid pointer to an object of type cv T* may be obtained by applying static_cast<cv T*> to a pointer to the address of the region.  The cv qualifiers on the pointer and cast must be at least as strict as those on the region of storage.
    If the storage of an object of union type contains an object of the type of a member, that member is active.

Hyman Rosen

Jan 18, 2017, 5:06:49 PM1/18/17
to std-dis...@isocpp.org
Given my wording and struct A { int a[2]; int b; }; struct B { int a; int b[2]; };, it would be the case that any region of storage containing an A would also contain a B.  It's in that sense that A and B are "unions of each other".

Hyman Rosen

Jan 18, 2017, 5:09:36 PM1/18/17
to std-dis...@isocpp.org
I mis-worded slightly:
    For any trivially copyable type T, a suitably-aligned region of storage with size sizeof(T) is said to contain an object of type T if the region contains the value representation of a value of T.  If a region of storage contains an object of type T, a valid pointer to an object of type cv T* may be obtained by applying static_cast<cv T*> to the address of the region.  The cv qualifiers on the pointer and cast must be at least as strict as those on the region of storage.

    If the storage of an object of union type contains an object of the type of a member, that member is active.

Hyman Rosen

Jan 18, 2017, 5:33:25 PM1/18/17
to std-dis...@isocpp.org
And let's make it
    ...with size at least sizeof(T)...
so that we can speak of the region of storage of a union containing a T.

Hyman Rosen

Jan 18, 2017, 5:41:42 PM1/18/17
to std-dis...@isocpp.org
Grr...  I'll get the wording right sooner or later...
    For any trivially copyable type T, a suitably-aligned region of storage with size at least sizeof(T) is said to contain an object of type T if the initial sizeof(T) part of the region contains the value representation of a value of T.  If a region of storage contains an object of type T, a valid pointer to an object of type cv T* may be obtained by applying static_cast<cv T*> to the address of the region.  The cv qualifiers on the pointer and cast must be at least as strict as those on the region of storage.

    If the storage of an object of union type contains an object of the type of a member, that member is active.
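
For comparison with the union sentence: under the current rules, reading a union member other than the active one is undefined in C++ (apart from the common-initial-sequence allowance for standard-layout structs), so the portable way to view one trivially copyable type's bytes as another is a copy. A minimal sketch, assuming hypothetical helpers `float_bits` and `bits_to_float` and an IEEE 754 `float` (checked by `static_assert`):

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

static_assert(sizeof(float) == sizeof(std::uint32_t),
              "this sketch assumes a 32-bit float");
static_assert(std::numeric_limits<float>::is_iec559,
              "this sketch assumes IEEE 754 floats");

// Hypothetical helper: returns the value representation of `value` as an integer.
inline std::uint32_t float_bits(float value) {
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    return bits;
}

// Hypothetical helper: the inverse of float_bits.
inline float bits_to_float(std::uint32_t bits) {
    float value;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}

// Example check for non-NaN inputs: the bytes survive a round trip unchanged.
inline void check_round_trip(float value) {
    assert(bits_to_float(float_bits(value)) == value);
}
```

Under wording like the above, the union member holding a valid value representation would simply be active and could be read directly. From C++20 onward, `std::bit_cast` in `<bit>` expresses the same copy without the explicit `memcpy`.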