Split trivially-copyable into consistent-layout

Myriachan

May 18, 2015, 4:57:20 PM
to std-pr...@isocpp.org
This is similar to a previous idea I had, but I think that the kinks are worked out better this time.


I think that we should split trivially-copyable into a new category, "consistent-layout", separating out the memory layout portions of trivially-copyable from triviality of copying.  I feel that this would make certain aspects of C++ more orthogonal, as well as put firm grounding in certain existing practices among programmers.  My proposed definitions are below.

A consistent-layout type is one that occupies contiguous bytes of storage and whose subobjects are at a consistent constant-expression offset from the beginning of the object.  Access-specifiers do not matter; while it is true that they may cause ordering differences, they do not affect the fact that there exists a fixed memory offset to each member.  Similarly for empty base class weirdness.

If a class has no virtual functions and no virtual base classes, it is a consistent-layout class.  Consistent-layout types consist of consistent-layout classes and fundamental types.

Obviously, offsetof() is redefined to work with consistent-layout classes instead of standard-layout, with undefined behavior otherwise.  Most compilers are (conceptually) consistent-layout with classes that have virtual functions, and can implement offsetof() on those, too, but that's within the purview of undefined behavior.

Trivially-copyable does not need redefinition.  Its current definition implies consistent-layout.

By this definition, trivially-copyable and standard-layout are strict subsets of consistent-layout, and have an intersection between each other called plain-old-data.


Consistent-layout and memcpy() / memmove():

An important property is that a consistent-layout class carries no hidden compiler bookkeeping beyond what a group of unrelated variables of fundamental type would have.  Constructors and destructors do nothing magical to a consistent-layout class's members.  Calling the constructor isn't critical to making accessing a class's members defined behavior; initialization is the critical operation.

Thus, consistent-layout classes are compatible with memcpy() / memmove().  Using them to copy an instance is equivalent to doing a deep memberwise copy at the fundamental type level.  If there's undefined behavior, it comes from not using a constructor when you should have, or similar, not from the language itself.  Thus trivially-copyable isn't really special; it's just a derived consequence.  Also, this means that it's safe for a class's copy constructor to call std::memcpy(this, &other, sizeof(*this)).


Just my thoughts on this.

Melissa

Nicol Bolas

May 18, 2015, 5:35:06 PM
to std-pr...@isocpp.org


On Monday, May 18, 2015 at 4:57:20 PM UTC-4, Myriachan wrote:
This is similar to a previous idea I had, but I think that the kinks are worked out better this time.


I think that we should split trivially-copyable into a new category, "consistent-layout", separating out the memory layout portions of trivially-copyable from triviality of copying.

Um, that's already separated. There's the concept of "standard layout". It means almost exactly the same thing as you want, and offsetof is defined to work with such types.

The reason that standard layout does not allow for non-empty base classes or public/private shenanigans is (in part) because `offsetof` works with them. It's a macro; it's not smart enough to know how to actually do the computation for things like public/private (because implementations can reorder them) and base classes can be placed in front of or behind other classes. Both of these confound a macro like `offsetof`.

Myriachan

May 18, 2015, 7:26:08 PM
to std-pr...@isocpp.org

The current definition of trivially-copyable is such that trivially-copyable classes must be (conceptually) compatible with offsetof(), because otherwise they don't comply with the Standard in other ways.  With trivially-copyable types, the compiler already must be assigning fixed offsets to each member, and keeping these offsets for the lifetime of the program.  If the compiler did not do this, Standard-compliant programs would break.

To prove this, let's say that at two different places in a program, trivially-copyable but non-standard-layout class Meow's member "y" is at two different offsets "a" and "b".  We can construct a contradiction as below.  In the below program, if offset1 ever differs from offset2, because the compiler has taken liberties with the layout of Meow, then either the first operation in main() will break, or the second operation in main() will break.  That is, some sort of undefined behavior will occur.

However, this is a contradiction.  All the below operations are legal, as far as I know.  You are allowed to alias pointers with unsigned char.  You may also copy trivially-copyable types with memcpy().  memcpy()ing a trivially-copyable type results in the destination holding the same value.  This does not affect whether the destination has been constructed or destructed.  Thus, object pointers to non-static data members remain valid across a memcpy() operation: py1 and py2 keep pointing to meow1.y and meow2.y respectively, even after the data from the allegedly-different offset y is copied.  If the offsets really differed, these pointers could not both be correct, which is the contradiction.

The end result is that trivially-copyable types must be implemented with each member (and base-class subobject) having a fixed offset for the ability to be memcpy()'d to make any sense at all.  This is the case in the Standard right now, because of what I showed.

The reason I think it'd be worthwhile to make a "consistent-layout" category is to formally define what makes a class have a memory layout that matches with the concept of underlying bytes, rather than this halfway concept we have right now.  Standard-layout doesn't cover all the useful cases.


#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <type_traits>

struct Meow
{
    int x;
    float GetY() const { return y; }
protected:
    float y;
    friend void Location1();
    friend void Location2();
};

static_assert(std::is_trivially_copyable<Meow>::value);

static union
{
    unsigned char meowBytes1[sizeof(Meow)];
    Meow meow1;
};
std::ptrdiff_t offset1;
unsigned char blob1[sizeof(Meow)];
unsigned char *py1;

static union
{
    unsigned char meowBytes2[sizeof(Meow)];
    Meow meow2;
};
std::ptrdiff_t offset2;
unsigned char blob2[sizeof(Meow)];
unsigned char *py2;

void Location1()
{
    meow1.x = 1;
    meow1.y = std::exp(1.0f);
    py1 = reinterpret_cast<unsigned char *>(&meow1.y);

    offset1 = py1 - meowBytes1;
    std::memcpy(blob1, &meow1, sizeof(meow1));
}

void Location2()
{
    meow2.x = 2;
    meow2.y = 2 * std::asin(1.0f);
    py2 = reinterpret_cast<unsigned char *>(&meow2.y);

    offset2 = py2 - meowBytes2;
    std::memcpy(blob2, &meow2, sizeof(meow2));
}

int main()
{
    Location1();
    Location2();

    float f1 = 123.0f;
    std::memcpy(&meow1, blob2, sizeof(meow1));
    std::memcpy(py1, &f1, sizeof(f1));

    float f2 = 321.0f;
    unsigned char temp[sizeof(meow2)];
    std::memcpy(temp, blob1, sizeof(temp));
    std::memcpy(&temp[offset2], &f2, sizeof(f2));
    std::memcpy(&meow2, temp, sizeof(meow2));

    std::printf("%f %f\n", meow1.GetY(), meow2.GetY());
    std::printf("%td %td\n", offset1, offset2);

    return 0;
}


Melissa

Nicol Bolas

May 18, 2015, 9:50:24 PM
to std-pr...@isocpp.org
On Monday, May 18, 2015 at 7:26:08 PM UTC-4, Myriachan wrote:
On Monday, May 18, 2015 at 2:35:06 PM UTC-7, Nicol Bolas wrote:
On Monday, May 18, 2015 at 4:57:20 PM UTC-4, Myriachan wrote:
This is similar to a previous idea I had, but I think that the kinks are worked out better this time.


I think that we should split trivially-copyable into a new category, "consistent-layout", separating out the memory layout portions of trivially-copyable from triviality of copying.

Um, that's already separated. There's the concept of "standard layout". It means almost exactly the same thing as you want, and offsetof is defined to work with such types.

The reason that standard layout does not allow for non-empty base classes or public/private shenanigans is (in part) because `offsetof` works with them. It's a macro; it's not smart enough to know how to actually do the computation for things like public/private (because implementations can reorder them) and base classes can be placed in front of or behind other classes. Both of these confound a macro like `offsetof`.

The current definition of trivially-copyable is such that trivially-copyable classes must be (conceptually) compatible with offsetof(), because otherwise they don't comply with the Standard in other ways.  With trivially-copyable types, the compiler already must be assigning fixed offsets to each member, and keeping these offsets for the lifetime of the program.  If the compiler did not do this, Standard-compliant programs would break.

To prove this, let's say that at two different places in a program, trivially-copyable but non-standard-layout class Meow's member "y" is at two different offsets "a" and "b".

Stop. Why would it be at two different offsets? Nobody has claimed that it could be at two different offsets. Indeed, with the exception of virtual inheritance, there's no circumstance where the offset for a member, given a pointer to that type, could be different.

So why are you trying to show that a situation is impossible, if nobody's claiming that it is possible?

`offsetof` is not a compiler intrinsic that is capable of accessing compiler data; it is nothing more than a C macro (which is why it doesn't give a static_assert or error if you give it a non-standard layout type). And while the compiler is capable of compile-time computing the offset for any non-virtually-inherited member, a simple macro cannot.

Remember: `offsetof` is from C. So it has to live by C's rules and behave as C did. Standard layout types are capable of using it because standard layout rules effectively define a C-style struct, only in that they allow a few more C++-isms that don't affect layout.

That's the entire point of the standard layout classification: that the layout follows the standard C rules. And `offsetof` is based on those rules.

If you want an `offsetof` equivalent that can access trivially copyable types, then you should ask for that. Though I'm not really sure why you'd want that; your example code doesn't explain why.

David Krauss

May 18, 2015, 10:22:15 PM
to std-pr...@isocpp.org
On 2015–05–19, at 9:50 AM, Nicol Bolas <jmck...@gmail.com> wrote:

`offsetof` is not a compiler intrinsic that is capable of accessing compiler data; it is nothing more than a C macro (which is why it doesn't give a static_assert or error if you give it a non-standard layout type).

offsetof is a macro (#ifdef will recognize it) which expands to an intrinsic. The pointer arithmetic implementation is UB and went out of style long ago. Also, in both languages it produces a constant expression. In C++ the requisite void* cast is not allowed, even before it could produce UB. In C the pointer arithmetic does not meet the constant expression requirements, but it leaves constant evaluation open-ended so it’s still implementation-specific.

Clang (by default) warns on all invalid use of offsetof and produces a hard error when you apply it to a virtually inherited member. GCC does not implement the warning, but the error is the same.
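
A minimal sketch of the constant-expression point above, assuming a made-up standard-layout struct `Header` (the struct, its members, and the names `kLengthOffset` and `prefix` are illustrative only, not from this thread). For a standard-layout class, `offsetof` can be used wherever an integral constant expression is required:

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical standard-layout struct, used only for illustration.
struct Header {
    unsigned char tag;
    unsigned int  length;
};

static_assert(std::is_standard_layout<Header>::value,
              "offsetof is only fully defined for standard-layout classes");

// offsetof yields a std::size_t constant expression, so it is usable in
// static_assert, constexpr initializers, and array bounds.
static_assert(offsetof(Header, tag) == 0,
              "the first member of a standard-layout class is at offset 0");
static_assert(offsetof(Header, length) >= sizeof(unsigned char),
              "length follows tag, possibly after padding");

constexpr std::size_t kLengthOffset = offsetof(Header, length);

// Array whose bound is computed by offsetof at compile time.
unsigned char prefix[offsetof(Header, length)];
```

Applying the same macro to a class that is not standard-layout is where the thread's disagreement starts: the Standard leaves the result undefined, even though compilers typically still produce a value.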

Nicol Bolas

May 19, 2015, 1:03:08 AM
to std-pr...@isocpp.org, pot...@mac.com
On Monday, May 18, 2015 at 10:22:15 PM UTC-4, David Krauss wrote:
On 2015–05–19, at 9:50 AM, Nicol Bolas <jmck...@gmail.com> wrote:

`offsetof` is not a compiler intrinsic that is capable of accessing compiler data; it is nothing more than a C macro (which is why it doesn't give a static_assert or error if you give it a non-standard layout type).

offsetof is a macro (#ifdef will recognize it) which expands to an intrinsic.

No, it does not; it expands into something implementation-dependent. This could be a compiler intrinsic. Or:
 
The pointer arithmetic implementation is UB and went out of style long ago.

Undefined behavior can be defined by a particular implementation if they so choose. Even if it's just for that particular expression.

So it's perfectly legal for `offsetof` to expand into an expression that is undefined behavior by the standard, so long as `offsetof`'s use of that expression yields the correct offset for standard layout types and members.
 
Clang (by default) warns on all invalid use of offsetof and produces a hard error when you apply it to a virtually inherited member. GCC does not implement the warning, but the error is the same.

When I said that it doesn't give an error, I mean that it isn't required by the standard to do so. It's considered UB, not a compilation failure.

Douglas Boffey

May 19, 2015, 4:52:18 AM
to std-pr...@isocpp.org


On Monday, 18 May 2015 21:57:20 UTC+1, Myriachan wrote:
This is similar to a previous idea I had, but I think that the kinks are worked out better this time.


I think that we should split trivially-copyable into a new category, "consistent-layout", separating out the memory layout portions of trivially-copyable from triviality of copying.  I feel that this would make certain aspects of C++ more orthogonal, as well as put firm grounding in certain existing practices among programmers.  My proposed definitions are below.
 
A consistent-layout type is one that occupies contiguous bytes of storage

Just a small point, but, unless char and long long have the same alignment,

struct Purr {
    char c;
    long long ll;
};


does not occupy contiguous bytes of storage ;)

David Krauss

May 19, 2015, 9:11:45 AM
to std-pr...@isocpp.org, Nicol Bolas
On 2015–05–19, at 1:03 PM, Nicol Bolas <jmck...@gmail.com> wrote:

Undefined behavior can be defined by a particular implementation if they so choose. Even if it's just for that particular expression.

Not within constant expression evaluation. An expression that invokes UB is not a constant expression, even if it’s defined by the implementation. Before that rule existed, C++98 specifically said “pointers… shall not be used” in an integral constant expression.

In standard C++, offsetof has always been “nearly” an operator. It just doesn’t get respect because of its heritage and its rivalry with PTMs. (The committee did endorse an evolutionary direction for PTMs years ago, to supersede offsetof once and for all, which has been discussed here more recently.)

Casey Carter

May 19, 2015, 10:43:19 AM
to std-pr...@isocpp.org
On Tuesday, May 19, 2015 at 3:52:18 AM UTC-5, Douglas Boffey wrote:

Just a small point, but, unless char and long long have the same alignment,

struct Purr {
    char c;
    long long ll;
};


does not occupy contiguous bytes of storage ;)

Every revision of the C++ standard disagrees with you. C++98 1.8/5 (ignoring for the moment that C++98 has no "long long" type):

... An object of POD type shall occupy contiguous storage.

N4431 (The C++ WP from the 2015-04 pre-meeting mailing) 1.8/5:

... An object of trivially copyable or standard-layout type shall occupy contiguous bytes of storage.

The potential existence of padding bytes between the members means that the members may not be contiguous, but the objects themselves necessarily occupy contiguous bytes of storage.
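
The distinction between "members are contiguous" and "the object occupies contiguous bytes" can be checked at compile time. This is a small sketch using the `Purr` struct from above; the constant name `kGap` is made up for illustration, and its value depends on the ABI (it may be zero):

```cpp
#include <cstddef>
#include <type_traits>

struct Purr {
    char c;
    long long ll;
};

static_assert(std::is_standard_layout<Purr>::value,
              "Purr is standard-layout, so offsetof is well defined");

// The object occupies sizeof(Purr) contiguous bytes, and that range covers
// both members plus any padding the implementation inserts.
static_assert(offsetof(Purr, ll) >= sizeof(char),
              "ll starts after c");
static_assert(offsetof(Purr, ll) + sizeof(long long) <= sizeof(Purr),
              "ll lies entirely inside the object");

// Number of padding bytes between c and ll on this implementation.
constexpr std::size_t kGap = offsetof(Purr, ll) - sizeof(char);
```

Whatever `kGap` turns out to be, those bytes belong to the `Purr` object itself, which is what the quoted wording requires.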

Thiago Macieira

May 19, 2015, 11:30:36 AM
to std-pr...@isocpp.org
Or, if you read it in the negative:

there are no bytes between c and ll that aren't part of the object (or, worse,
belong to another object).

--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
Software Architect - Intel Open Source Technology Center
PGP/GPG: 0x6EF45358; fingerprint:
E067 918B B660 DBD1 105C 966C 33F5 F005 6EF4 5358

Myriachan

May 19, 2015, 3:10:01 PM
to std-pr...@isocpp.org

Additionally (1.7/1):

The fundamental storage unit in the C++ memory model is the byte.  A byte is at least large enough to contain any member of the basic execution character set (2.3) and the eight-bit code units of the Unicode UTF-8 encoding form and is composed of a contiguous sequence of bits, the number of which is implementation-defined. ...


(5.3.3/1):

The sizeof operator yields the number of bytes in the object representation of its operand. ... sizeof(char), sizeof(signed char) and sizeof(unsigned char) are 1. ...


So a char or unsigned char array can be used to hold any trivially-copyable type, which is a key part of 1.8/5 and 3.9/2, and what I was proposing with "consistent-layout".
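
As a rough illustration of that guarantee, here is a sketch that copies a trivially-copyable object into an unsigned char array and back. The struct `Sample` and the function `round_trip` are hypothetical names chosen for this example:

```cpp
#include <cassert>   // used by the checks mentioned in the note below
#include <cstring>
#include <type_traits>

// Hypothetical trivially-copyable type, for illustration only.
struct Sample {
    int id;
    double value;
};

static_assert(std::is_trivially_copyable<Sample>::value,
              "the byte round trip is only guaranteed for trivially-copyable types");

// Copies src into a byte buffer of exactly sizeof(Sample) bytes, then copies
// those bytes into a second object, which then holds the same value as src.
inline Sample round_trip(const Sample& src) {
    unsigned char bytes[sizeof(Sample)];
    std::memcpy(bytes, &src, sizeof(Sample));

    Sample dst;
    std::memcpy(&dst, bytes, sizeof(Sample));
    return dst;
}
```

Calling `round_trip(Sample{7, 2.5})` should give back an object whose `id` is 7 and whose `value` is 2.5, since the Standard says the destination holds the same value as the source.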

Melissa

Myriachan

May 19, 2015, 4:22:18 PM
to std-pr...@isocpp.org, jmck...@gmail.com, pot...@mac.com

As much as I'd appreciate something like "T void::*", that wouldn't help with interfacing with lower-level code that continues to work with structure offsets.  I think the best way is to fix the minor wart with offsetof, then say, "but look, here's a much better C++ way to do it!" with "T void::*" or whatever the syntax will be.

My proposal with "consistent-layout" is to ground the current situation in reality.  Compilers already have to choose a fixed layout for trivially-copyable classes in order to be compatible with memcpy(); this just formalizes it.  Additionally, this proposal would answer Core errata 1701.

Compilers would still be free to put padding in "consistent-layout" classes where they need to, reorder fields as desired (subject to the member order rule, 9.2/13) and insert hidden fields.  They just wouldn't be able to use structures that can change at runtime, such that offsetof() would work, as would aliasing the class with a char pointer to access individual members, if you know the offset.  (Undefined behavior would result from accessing the dead space of a non-trivially-copyable class, of course.)

By the way, is the following offsetof() implementation implementation-defined rather than undefined, by 5.2.10/5 ("mappings between pointers and integers are otherwise implementation-defined")?  Compilers that allow reinterpret_cast in constexpr (GCC, MSVC, but not clang), which is illegal by the Standard, allow this macro in constant expressions.

#define myria_offsetof(type, ...) \
    (static_cast< ::std::size_t>( \
        reinterpret_cast<unsigned char *>( \
            (&reinterpret_cast<type *>( \
                static_cast< ::std::uintptr_t>(alignof(type)))->__VA_ARGS__)) - \
        reinterpret_cast<unsigned char *>( \
            static_cast< ::std::uintptr_t>(alignof(type)))))

Not that I'm recommending such a thing at all...it's rather messy =)  A compiler intrinsic is definitely the way to go.  (The use of alignof(type) is to avoid undefined behavior with null pointer arithmetic.)

Melissa

Nicol Bolas

May 19, 2015, 4:33:01 PM
to std-pr...@isocpp.org, jmck...@gmail.com, pot...@mac.com


On Tuesday, May 19, 2015 at 4:22:18 PM UTC-4, Myriachan wrote:
On Tuesday, May 19, 2015 at 6:11:45 AM UTC-7, David Krauss wrote:

On 2015–05–19, at 1:03 PM, Nicol Bolas <jmck...@gmail.com> wrote:

Undefined behavior can be defined by a particular implementation if they so choose. Even if it's just for that particular expression.

Not within constant expression evaluation. An expression that invokes UB is not a constant expression, even if it’s defined by the implementation. Before that rule existed, C++98 specifically said “pointers… shall not be used” in an integral constant expression.

In standard C++, offsetof has always been “nearly” an operator. It just doesn’t get respect because of its heritage and its rivalry with PTMs. (The committee did endorse an evolutionary direction for PTMs years ago, to supersede offsetof once and for all, which has been discussed here more recently.)

As much as I'd appreciate something like "T void::*", that wouldn't help with interfacing with lower-level code that continues to work with structure offsets.  I think the best way is to fix the minor wart with offsetof, then say, "but look, here's a much better C++ way to do it!" with "T void::*" or whatever the syntax will be.

My proposal with "consistent-layout" is to ground the current situation in reality.  Compilers already have to choose a fixed layout for trivially-copyable classes in order to be compatible with memcpy(); this just formalizes it.  Additionally, this proposal would answer Core errata 1701.

I don't understand this. With the exception of members of virtual base classes, compilers have to choose a fixed layout for all classes. But having a fixed layout and being copyable isn't the same thing. That's why it's "trivially copyable" that decides whether memcpy works, not "standard layout".

If you want to expand `offsetof` to make it work for trivially copyable classes (all standard layout types are also trivially copyable, IIRC), well, OK, but it's kind of a C thing. I'd much rather see a proper C++ intrinsic (vis-a-vis reflection) than to reuse a macro.

But if you want a new type category that "ground[s] the current situation in reality"... then really, it should work for any type+member, where the referenced member is not a member of a virtual base class. So it wouldn't even be a type category, as it depends on the relationship between the type and the specified member.

Ville Voutilainen

May 19, 2015, 5:04:31 PM
to std-pr...@isocpp.org
On 19 May 2015 at 23:33, Nicol Bolas <jmck...@gmail.com> wrote:
> If you want to expand `offsetof` to make it work for trivially copyable
> classes (all standard layout types are also trivially copyable, IIRC), well,

That's not correct. A standard-layout type can easily be non-trivially copyable.
[class]/10 has an example.
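
A short sketch of why neither category contains the other; the type names `Counted` and `Mixed` are invented for this illustration and are not the example from [class]/10:

```cpp
#include <type_traits>

// Standard-layout but NOT trivially copyable: the copy constructor is
// user-provided, which rules out trivial copyability, but it does not
// affect the layout rules.
struct Counted {
    int value;
    Counted() : value(0) {}
    Counted(const Counted& other) : value(other.value + 1) {}
};

// Trivially copyable but NOT standard-layout: the non-static data members
// have different access control, similar to Meow earlier in the thread.
struct Mixed {
    int x;
    float GetY() const { return y; }
protected:
    float y;
};

static_assert(std::is_standard_layout<Counted>::value, "layout rules are met");
static_assert(!std::is_trivially_copyable<Counted>::value, "copy ctor is user-provided");

static_assert(std::is_trivially_copyable<Mixed>::value, "all special members are trivial");
static_assert(!std::is_standard_layout<Mixed>::value, "mixed access control");
```

Types that satisfy both traits (and are also trivially default-constructible) are the ones the thread refers to as plain old data.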

Nicol Bolas

May 19, 2015, 5:15:47 PM
to std-pr...@isocpp.org

OK, good point.

That being said, you could still expand the definition of `offsetof` (or better yet, make a new thing that is a real intrinsic) which can get a compile-time offset from any type+member, as long as it doesn't have to go through a virtual base class. That's a superset of both standard layout and trivially copyable.

Myriachan

May 19, 2015, 8:12:04 PM
to std-pr...@isocpp.org

The current rule only requires that trivially-copyable and standard-layout types occupy contiguous memory (1.8/5):

... An object of trivially copyable or standard-layout type (3.9) shall occupy contiguous bytes of storage.

In other words, types with nontrivial copy constructors or destructors, or classes with virtual functions, etc. yet don't have virtual base classes can have strange layouts under current rules.  They can be implemented with internal hidden pointers similar to the way some compilers implement virtual base classes if the compiler so chooses.  Such an implementation is silly, particularly if virtual functions are not involved, but doesn't appear to be barred by anything in the Standard.

My "consistent-layout" proposal would arrange things as follows.  "Consistent-layout" is a class with no virtual functions and no virtual base classes; thus, standard-layout and trivially-copyable are proper subsets.

A: All data members go in order.  (If not A, member order is still somewhat restricted by 9.2/13.)
B: Two classes with the same initial sequence of members of similar types are layout-compatible in those members.
C: A class instance may be copied to another with memcpy().
D: A class instance may safely be copied to a char array and back.
E: offsetof is permitted.
F: A data member of a class instance may be accessed by aliasing the class with a char pointer and adding the member's offset. (*)

Plain old data: ABCDEF
Standard-layout: ABEF
Trivially-copyable: CDEF
Consistent-layout: EF
Polymorphic: conditionally supported: EF, otherwise naught
With virtual bases: naught

(*) Base class subobjects may not be accessed in this manner, but their members, if any, may be.  Which, by the way, is something that offsetof can do and member pointers still cannot... =/
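
Property F can be sketched as follows for a type where E and F are uncontroversial today (standard-layout and trivially-copyable). The struct `Point` and the two helper functions are hypothetical names for this example; `std::memcpy` is used for the actual read and write so that no `float` object is accessed through a mis-typed pointer:

```cpp
#include <cassert>   // used by the checks mentioned in the note below
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical type that is both standard-layout and trivially copyable.
struct Point {
    int x;
    float y;
};

static_assert(std::is_standard_layout<Point>::value &&
              std::is_trivially_copyable<Point>::value,
              "offsetof and byte-wise access are both defined for Point");

// Reads member y by aliasing the object as bytes and adding the offset.
inline float read_y_by_offset(const Point& p) {
    const unsigned char* base = reinterpret_cast<const unsigned char*>(&p);
    float out;
    std::memcpy(&out, base + offsetof(Point, y), sizeof(out));
    return out;
}

// Writes member y the same way, leaving the other members untouched.
inline void write_y_by_offset(Point& p, float v) {
    unsigned char* base = reinterpret_cast<unsigned char*>(&p);
    std::memcpy(base + offsetof(Point, y), &v, sizeof(v));
}
```

After `write_y_by_offset(p, 3.5f)`, both `p.y` and `read_y_by_offset(p)` should compare equal to `3.5f`, and `p.x` keeps its previous value. The proposal would extend this kind of access to every class without virtual functions or virtual bases.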

Melissa

Nicol Bolas

May 19, 2015, 9:10:23 PM
to std-pr...@isocpp.org
On Tuesday, May 19, 2015 at 8:12:04 PM UTC-4, Myriachan wrote:
On Tuesday, May 19, 2015 at 2:15:47 PM UTC-7, Nicol Bolas wrote:
On Tuesday, May 19, 2015 at 5:04:31 PM UTC-4, Ville Voutilainen wrote:
That's not correct. A standard-layout type can easily be non-trivially copyable.
[class]/10 has an example.

OK, good point.

That being said, you could still expand the definition of `offsetof` (or better yet, make a new thing that is a real intrinsic) which can get a compile-time offset from any type+member, as long as it doesn't have to go through a virtual base class. That's a superset of both standard layout and trivially copyable.

The current rule only requires that trivially-copyable and standard-layout types occupy contiguous memory (1.8/5):

... An object of trivially copyable or standard-layout type (3.9) shall occupy contiguous bytes of storage.

In other words, types with nontrivial copy constructors or destructors, or classes with virtual functions, etc. yet don't have virtual base classes can have strange layouts under current rules.  They can be implemented with internal hidden pointers similar to the way some compilers implement virtual base classes if the compiler so chooses.  Such an implementation is silly, particularly if virtual functions are not involved, but doesn't appear to be barred by anything in the Standard.

True. But the C++98/03 notion of POD said something similar. Yet in C++11, they changed the rules, so that more types could be considered to have "contiguous memory". They changed those rules because compilers already handled them.

Do you know of compilers that implement classes, such that the offset from a pointer to type T to any non-virtual base member is not a compile-time constant? If all compilers implement classes such that the non-virtual-base members of any particular class have a consistent byte offset, then you don't need this classification at all. The only thing you need is a new version of `offsetof` that can take any type/member pair, so long as the member is not in a virtual base of the type.

Is your choice of excluding all polymorphic types based on actual knowledge of how compilers work? Or is it based on something else?

I say this because, as previously stated, I've worked on a serialization system that had to write data via code from one compiler, and read it with code from another one. Or rather, code from another three compilers. That compiled code for three very different CPUs, running on three very different systems.

And we managed to get virtuals working (it required lots and lots of UB, but we did it). Now, we didn't try virtual inheritance. But we had regular inheritance going, along with virtual functions. At no time were we stopped by a compiler that didn't have static offsets for members of polymorphic types.

Note that these compilers did have lots of differences. Of particular note were variances between where the base class goes relative to the derived. On one platform, it was after, while on the others, it was before.

And yet, at no time were there non-static offsets for members of polymorphic types.

So do you have actual experience with actual compilers that show that compilers will have variances regarding polymorphic types? If you haven't done the research, then my first suggestion would be to actually do the research and find out where the real intersection of your features E&F are. Not merely to think you know it, but to actually know where that intersection is in actual, live compilers.

Also, from the examples you've given, I smell some kind of horrible serialization system behind this request, one that uses byte offsets and such to serialize members and so forth, rather than being hard-coded to the member names. My concern there is that reflection will basically do everything you need, and do it better, in a way that won't require this kind of low-level fiddling.

This would also explain why you're willing to sacrifice polymorphic types entirely. Reconstructing such data from a serialized object would be very difficult. So your serialization system would consider it out of scope. Thus, it's not part of your proposal.

If so, maybe it would be better to wait for the right solution (reflection) than to modify the standard in such a way. I would hate for the committee to standardize something, only for another feature to come along and make it a moot point (re: std::bind. Good idea at the time, dumb idea with lambdas around).

Zhihao Yuan

May 19, 2015, 9:17:43 PM
to std-pr...@isocpp.org
On Tue, May 19, 2015 at 8:12 PM, Myriachan <myri...@gmail.com> wrote:
>
> My "consistent-layout" proposal would arrange things as follows.
> "Consistent-layout" is a class with no virtual functions and no virtual base
> classes; thus, standard-layout and trivially-copyable are proper subsets.
>
> A: All data members go in order. (If not A, member order is still somewhat
> restricted by 9.2/13.)
> B: Two classes with the same initial sequence of members of similar types
> are layout-compatible in those members.
> C: A class instance may be copied to another with memcpy().
> D: A class instance may safely be copied to a char array and back.
> E: offsetof is permitted.
> F: A data member of a class instance may be accessed by aliasing the class
> with a char pointer and adding the member's offset. (*)
>
> Plain old data: ABCDEF
> Standard-layout: ABEF
> Trivially-copyable: CDEF
> Consistent-layout: EF
> Polymorphic: conditionally supported: EF, otherwise naught
> With virtual bases: naught

Although I'm not a core guy, the explanation is very clear to me.
Even if we don't like offsetof right now, such a proposal may
benefit future language extensions like reflection.

>
> (*) Base class subobjects may not be accessed in this manner, but their
> members, if any, may be. Which, by the way, is something that offsetof can
> do and member pointers still cannot... =/
>
> Melissa


--
Zhihao Yuan, ID lichray
The best way to predict the future is to invent it.
___________________________________________________
4BSD -- http://bit.ly/blog4bsd

Thiago Macieira

May 19, 2015, 9:28:52 PM
to std-pr...@isocpp.org
On Tuesday 19 May 2015 17:12:04 Myriachan wrote:
> A: All data members go in order. (If not A, member order is still somewhat
> restricted by 9.2/13.)
> B: Two classes with the same initial sequence of members of similar types
> are layout-compatible in those members.
> C: A class instance may be copied to another with memcpy().
> D: A class instance may safely be copied to a char array and back.
> E: offsetof is permitted.
> F: A data member of a class instance may be accessed by aliasing the class
> with a char pointer and adding the member's offset.
>
> Plain old data: ABCDEF
> Standard-layout: ABEF
> Trivially-copyable: CDEF
> Consistent-layout: EF
> Polymorphic: conditionally supported: EF, otherwise naught
> With virtual bases: naught

I don't think we need to create a new type for this. Simply change 18.2
[support.types] p4:

- If type is not a standard-layout class (Clause 9), the results are
undefined.
+ If type is not a standard-layout or trivially copyable class (Clause 9), the
results are undefined.

Or refer to the contiguous storage definition from 1.8/5 that you quoted.

I personally don't think we should overcomplicate this because offsetof is
supposed to be low-level C code. C code shouldn't be dealing with more complex
C++ types -- for that, we can have a C++ solution, including the subtraction
operator with pointer-to-members.

struct S
{
        int i;
};
void f()
{
        S s{0};
        auto pm1 = &S::i;
        int *p1 = pm1 - &s;

        *p1 = 1;
        assert(s.i == 1);
}

This bypasses the need for F.

I don't like the void pointer-to-member suggestion because it erases the type
like plain void* does. And I can't think of other use-cases where I'd like to
store type-erased offsets, not real pointers -- though it might be interesting
to have

        template <typename Member, typename Outer> ptrdiff_t
                make_offset(Outer *object, Member Outer:: *pointer_to_member);
        template <typename Member, typename Outer> Member Outer::*
                make_pointer_to_member(Outer *object, ptrdiff_t offset);

On quite a few ABIs, the pointer to member *is* an offset anyway and those two
functions would ignore the first argument and return a reinterpreted second
argument.
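
For illustration only, here is a rough library-level approximation of the first helper. It is a sketch under assumptions, not the proposed facility: it takes the object by reference and measures the byte distance to the member on a live object, which is one possible use for the object parameter. The type `S` and the function `check_make_offset` are hypothetical names made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical library-only approximation of make_offset: measure the byte
// distance from the start of a live object to the member that pm designates.
template <typename Member, typename Outer>
std::ptrdiff_t make_offset(const Outer &object, Member Outer::*pm)
{
    const char *base = reinterpret_cast<const char *>(std::addressof(object));
    const char *member = reinterpret_cast<const char *>(std::addressof(object.*pm));
    return member - base;
}

// Hypothetical standard-layout type used only for the check below.
struct S
{
    int a;
    double b;
};

inline void check_make_offset()
{
    S s{};
    // For a standard-layout class the result agrees with offsetof.
    assert(make_offset(s, &S::a) == 0);
    assert(make_offset(s, &S::b) == static_cast<std::ptrdiff_t>(offsetof(S, b)));
}
```

The inverse direction, make_pointer_to_member, has no comparable portable spelling, since it needs knowledge of how the ABI represents pointers to members.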

David Krauss

May 19, 2015, 9:53:59 PM5/19/15
to std-pr...@isocpp.org

On 2015–05–20, at 4:22 AM, Myriachan <myri...@gmail.com> wrote:

As much as I'd appreciate something like "T void::*", that wouldn't help with interfacing with lower-level code that continues to work with structure offsets.  I think the best way is to fix the minor wart with offsetof, then say, "but look, here's a much better C++ way to do it!" with "T void::*" or whatever the syntax will be.

Given reinterpret_cast between size_t and PTM types, you can interoperate the C and C++ solutions. I don’t see why that shouldn’t work. You can even try it now, by casting to reference type.

Myriachan

May 19, 2015, 10:12:40 PM5/19/15
to std-pr...@isocpp.org
On Tuesday, May 19, 2015 at 6:10:23 PM UTC-7, Nicol Bolas wrote:
Do you know of compilers that implement classes, such that the offset from a pointer to type T to any non-virtual base member is not a compile-time constant? If all compilers implement classes such that the non-virtual-base members of any particular class have a consistent byte offset, then you don't need this classification at all. The only thing you need is a new version of `offsetof` that can take any type/member pair, so long as the member is not in a virtual base of the type.


No, I'm not aware of any such implementations.  I think that offsetof() fails on some compilers that don't strictly enforce the rules if attempted on a class with virtual bases, even if the chosen member is not a member of one of the virtual bases...?


Is your choice of excluding all polymorphic types based on actual knowledge of how compilers work? Or is it based on something else?

No, it's based on what I've tried to interpret of the Standard, and from various discussions of the topic on these mailing lists.  In particular, the exclusion of classes with virtual functions was because of objections raised in a previous thread, particularly by Thiago, if my memory is correct.  His (I think) point of view on the matter made sense to me--the Standard does not specify how virtual dispatch works, and crazy implementations are legal, so all bets are off when it comes to classes with virtual functions.  Classes with virtual bases, of course, go crazy in both theory and in reality.

 
Note that these compilers did have lots of differences. Of particular note were variances between where the base class goes relative to the derived. On one platform, it was after, while on the others, it was before.

And yet, at no time were there non-static offsets for members of polymorphic types.

Yes, that may be true, but the Standard does not appear to exclude this possibility.  The Standard does not even explicitly exclude this possibility for trivially-copyable types, but that proof earlier in the thread shows that this exclusion can be derived.  The exclusion should probably be stated for clarity, in my opinion; "consistent-layout" would do this.

 
So do you have actual experience with actual compilers that show that compilers will have variances regarding polymorphic types? If you haven't done the research, then my first suggestion would be to actually do the research and find out where the real intersection of your features E&F are. Not merely to think you know it, but to actually know where that intersection is in actual, live compilers.

Also, from the examples you've given, I smell some kind of horrible serialization system behind this request, one that uses byte offsets and such to serialize members and so forth, rather than being hard-coded to the member names. My concern there is that reflection will basically do everything you need, and do it better, in a way that won't require this kind of low-level fiddling.

No, it's actually not about that.  It's about having a meaningful low-level memory layout for classes.  This sort of thing is used for custom container classes for optimization, and also to interface with C code or operating system calls.

I have no reason to expect that E (offsetof) will fail on any compiler (in concept, though it may fail due to enforcement of 18.2/4's undefined behavior as a hard error).

F (indirect access to members via char array) might fail on a compiler due to pointer aliasing and arithmetic weirdness with a few rules that I think should be changed.  If instead the class is part of a union with a char array of the same size, and the constructor is called appropriately, then I expect F to always work.  The pointer arithmetic of accessing that array through the offset may be currently undefined due to some ambiguities of the Standard (e.g. Core errata 1701), but no compiler I know of has problems with this, because char pointers are allowed to alias.


This would also explain why you're willing to sacrifice polymorphic types entirely. Reconstructing such data from a serialized object would be very difficult. So your serialization system would consider it out of scope. Thus, it's not part of your proposal.

If so, maybe it would be better to wait for the right solution (reflection) than to modify the standard in such a way. I would hate for the committee to standardize something, only for another feature to come along and make it a moot point (re: std::bind. Good idea at the time, dumb idea with lambdas around).

I completely agree with this.  It would be insane to try to implement reflection as a hack involving offsets rather than get a real implementation of reflection.  But that's not what I'm trying to do.

Melissa

Myriachan

May 19, 2015, 10:46:03 PM5/19/15
to std-pr...@isocpp.org
On Tuesday, May 19, 2015 at 6:28:52 PM UTC-7, Thiago Macieira wrote:
On Tuesday 19 May 2015 17:12:04 Myriachan wrote:
> F: A data member of a class instance may be accessed by aliasing the class
> with a char pointer and adding the member's offset.

I don't think we need to create a new type for this. Simply change 18.2
[support.types] p4:

-  If type is not a standard-layout class (Clause 9), the results are
undefined.
+ If type is not a standard-layout or trivially copyable class (Clause 9), the
results are undefined.

Or refer to the contiguous storage definition from 1.8/5 that you quoted.

That wouldn't cover the use of offsetof() in non-trivially-copyable non-standard-layout types, but is an improvement.
 

I personally don't think we should overcomplicate this because offsetof is
supposed to be low-level C code. C code shouldn't be dealing with more complex
C++ types -- for that, we can have a C++ solution, including the subtraction
operator with pointer-to-members.

struct S
{
        int i;
};
void f()
{
        S s{0};
        auto pm1 = &S::i;
        int *p1 = pm1 - &s;

        *p1 = 1;
        assert(s.i == 1);
}

This bypasses the need for F.

What does this new subtraction operator do?  Like, how does "pm1 - &s" differ from "&(s.*pm1)"?

Also, keep in mind that to truly replace offsetof(), member pointers would need to gain the ability to refer to members of members.

struct Kitty
{
        char asdf;
        double fdsa;
};
struct Meow
{
        Kitty kitty;
};

double Meow::*member = &Meow::kitty.fdsa; // currently ill-formed
std::size_t offset = offsetof(Meow, kitty.fdsa); // extension accepted by the compilers I use


The reason for F is to me quite important: it solidifies the existing practice of treating classes as existing in memory as bytes.  If you're allowed to determine the offset, you should be allowed to use the offset.  Allowing aliasing a class with a char pointer and using the offset to a member as the index is how to do that.  It also resolves Core errata 1701--accessing a class's members through the array is defined in a straightforward manner, as is (un)defining what happens regarding accessing the dead space areas of a class.
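
To make F concrete, here is a minimal sketch of the practice being described, restricted to a standard-layout type so that offsetof is already permitted. The type `Packet` and the helpers `read_value` / `write_value` are hypothetical names for this example. The final load and store go through memcpy to sidestep aliasing questions; the pointer arithmetic on the byte view is exactly the part whose status the Standard leaves unclear, though compilers accept it in practice.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical standard-layout type, so offsetof is well-defined today.
struct Packet
{
    int id;
    double value;
};

// Read the member through a byte view of the object plus its offset.
inline double read_value(const Packet &p)
{
    const unsigned char *bytes = reinterpret_cast<const unsigned char *>(&p);
    double out;
    std::memcpy(&out, bytes + offsetof(Packet, value), sizeof out);
    return out;
}

// Write the member the same way.
inline void write_value(Packet &p, double v)
{
    unsigned char *bytes = reinterpret_cast<unsigned char *>(&p);
    std::memcpy(bytes + offsetof(Packet, value), &v, sizeof v);
}
```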


        template <typename Member, typename Outer> ptrdiff_t
                make_offset(Outer *object, Member Outer:: *pointer_to_member);
        template <typename Member, typename Outer> Member Outer::*
                make_pointer_to_member(Outer *object, ptrdiff_t offset);

On quite a few ABIs, the pointer to member *is* an offset anyway and those two
functions would ignore the first argument and return a reinterpreted second
argument.


I'd love for these to exist, irrespective of anything else I've mentioned. =)  It's possible to make this as a third-party library, if you do some platform-specific hacks and maybe restrict the types people are allowed to use, but if implemented by the compiler vendor, it'd work great.

What would the first parameter be used for?  Something like, "the returned member pointer shall only be used with objects of the same dynamic type as the dynamic type of 'object', else the behavior is undefined"?

Member pointers are better than offsetof(), but offsetof() shouldn't be locked out.

Love all your input, and thanks,

Melissa

Nicol Bolas

May 19, 2015, 10:54:02 PM5/19/15
to std-pr...@isocpp.org
On Tuesday, May 19, 2015 at 10:12:40 PM UTC-4, Myriachan wrote:
On Tuesday, May 19, 2015 at 6:10:23 PM UTC-7, Nicol Bolas wrote:
Do you know of compilers that implement classes, such that the offset from a pointer to type T to any non-virtual base member is not a compile-time constant? If all compilers implement classes such that the non-virtual-base members of any particular class have a consistent byte offset, then you don't need this classification at all. The only thing you need is a new version of `offsetof` that can take any type/member pair, so long as the member is not in a virtual base of the type.


No, I'm not aware of any such implementations.  I think that offsetof() fails on some compilers that don't strictly enforce the rules if attempted on a class with virtual bases, even if the chosen member is not a member of one of the virtual bases...?


Is your choice of excluding all polymorphic types based on actual knowledge of how compilers work? Or is it based on something else?

No, it's based on what I've tried to interpret of the Standard, and from various discussions of the topic on these mailing lists.  In particular, the exclusion of classes with virtual functions was because of objections raised in a previous thread, particularly by Thiago, if my memory is correct.  His (I think) point of view on the matter made sense to me--the Standard does not specify how virtual dispatch works, and crazy implementations are legal, so all bets are off when it comes to classes with virtual functions.  Classes with virtual bases, of course, go crazy in both theory and in reality.

 
Note that these compilers did have lots of differences. Of particular note were variances between where the base class goes relative to the derived. On one platform, it was after, while on the others, it was before.

And yet, at no time were there non-static offsets for members of polymorphic types.

Yes, that may be true, but the Standard does not appear to exclude this possibility.  The Standard does not even explicitly exclude this possibility for trivially-copyable types, but that proof earlier in the thread shows that this exclusion can be derived.  The exclusion should probably be stated for clarity, in my opinion; "consistent-layout" would do this.

That's why I mentioned the whole POD-to-standard layout thing. C++98/03 didn't specify

If what you want to do is standardize existing practice, you first need to know what existing practice is. Arbitrarily deciding that existing practice for compile-time layout stops at virtual functions seems... arbitrary.
 

 
So do you have actual experience with actual compilers that show that compilers will have variances regarding polymorphic types? If you haven't done the research, then my first suggestion would be to actually do the research and find out where the real intersection of your features E&F are. Not merely to think you know it, but to actually know where that intersection is in actual, live compilers.

Also, from the examples you've given, I smell some kind of horrible serialization system behind this request, one that uses byte offsets and such to serialize members and so forth, rather than being hard-coded to the member names. My concern there is that reflection will basically do everything you need, and do it better, in a way that won't require this kind of low-level fiddling.

No, it's actually not about that.  It's about having a meaningful low-level memory layout for classes.

The only times I ever cared about the low-level layout for memory in a class were when I was:

* Heavily optimizing memory access patterns
* Performing serialization
* Needing to define a C++ type that was layout compatible with a C one.

If you're doing heavy optimization for memory access, you're more likely trying to unbundle structs entirely. Also, base classes and the like are probably not going to happen. We've already talked about serialization. And the last one is covered by "standard layout" and specifically not part of your proposal (see below for more).

So, can you give me an example of how this feature would help you with "low level memory layout?"

This sort of thing is used for custom container classes for optimization, and also to interface with C code or operating system calls.

Well, here's one problem. The whole point of "standard layout" as a separate type classification is that, if your class qualifies, then C++ guarantees that its layout follows C's layout rules. That's why it's important, and that's why in certain code, you need to follow those rules. Because if you do, you get to match C struct layout.

Merely having compile-time-determined offsets for members on types does not mean that these offsets will match C's offsets for similarly constructed types. E&F from your example are not enough to be able to "interface with C code or operating system calls".

Indeed, your list of items proves that. You'd need at least A on that list (order of definition = increasing offsets) before you could declare a C++ type to be layout compatible with a C type. Remember: non-standard layout types are allowed to arbitrarily decide on the order of members. It could sort them by typename in terms of order, and that would be legal.

Standard layout types cannot, because they're required to be layout-compatible with C.
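
For readers who want to see where a given class falls among these categories, the existing type traits already distinguish them. A small sketch, with hypothetical type names (`CCompatible`, `Base`, `DerivedWithData`, `Poly`) chosen only for illustration:

```cpp
#include <cassert>
#include <type_traits>

// Standard-layout and trivially copyable.
struct CCompatible
{
    int a;
    double b;
};

// Data members in both base and derived: trivially copyable, but no longer
// standard-layout, so C layout compatibility is not guaranteed.
struct Base
{
    int a;
};
struct DerivedWithData : Base
{
    int b;
};

// Polymorphic: neither standard-layout nor trivially copyable.
struct Poly
{
    virtual ~Poly() = default;
    int a;
};

static_assert(std::is_standard_layout<CCompatible>::value, "standard-layout");
static_assert(std::is_trivially_copyable<CCompatible>::value, "trivially copyable");
static_assert(!std::is_standard_layout<DerivedWithData>::value, "not standard-layout");
static_assert(std::is_trivially_copyable<DerivedWithData>::value, "trivially copyable");
static_assert(!std::is_standard_layout<Poly>::value, "not standard-layout");
static_assert(!std::is_trivially_copyable<Poly>::value, "not trivially copyable");

// Runtime mirror of the checks above.
inline bool categories_hold()
{
    return std::is_standard_layout<CCompatible>::value &&
           !std::is_standard_layout<DerivedWithData>::value &&
           std::is_trivially_copyable<DerivedWithData>::value &&
           !std::is_trivially_copyable<Poly>::value;
}
```

`DerivedWithData` is the kind of type the proposed "consistent-layout" category is meant to cover: no current trait guarantees offsetof on it.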

So what you have proposed doesn't solve at least part of the problem you want it to solve.

I'm not sure what you mean by "custom container classes for optimization". Nor am I sure how being able to get at a member by a static offset (which as I understand the feature, is the only thing that "consistent layout" does for you) would aid you in that way.

Nicol Bolas

May 19, 2015, 10:55:27 PM5/19/15
to std-pr...@isocpp.org


On Tuesday, May 19, 2015 at 10:54:02 PM UTC-4, Nicol Bolas wrote:
On Tuesday, May 19, 2015 at 10:12:40 PM UTC-4, Myriachan wrote:
On Tuesday, May 19, 2015 at 6:10:23 PM UTC-7, Nicol Bolas wrote:
Do you know of compilers that implement classes, such that the offset from a pointer to type T to any non-virtual base member is not a compile-time constant? If all compilers implement classes such that the non-virtual-base members of any particular class have a consistent byte offset, then you don't need this classification at all. The only thing you need is a new version of `offsetof` that can take any type/member pair, so long as the member is not in a virtual base of the type.


No, I'm not aware of any such implementations.  I think that offsetof() fails on some compilers that don't strictly enforce the rules if attempted on a class with virtual bases, even if the chosen member is not a member of one of the virtual bases...?


Is your choice of excluding all polymorphic types based on actual knowledge of how compilers work? Or is it based on something else?

No, it's based on what I've tried to interpret of the Standard, and from various discussions of the topic on these mailing lists.  In particular, the exclusion of classes with virtual functions was because of objections raised in a previous thread, particularly by Thiago, if my memory is correct.  His (I think) point of view on the matter made sense to me--the Standard does not specify how virtual dispatch works, and crazy implementations are legal, so all bets are off when it comes to classes with virtual functions.  Classes with virtual bases, of course, go crazy in both theory and in reality.

 
Note that these compilers did have lots of differences. Of particular note were variances between where the base class goes relative to the derived. On one platform, it was after, while on the others, it was before.

And yet, at no time were there non-static offsets for members of polymorphic types.

Yes, that may be true, but the Standard does not appear to exclude this possibility.  The Standard does not even explicitly exclude this possibility for trivially-copyable types, but that proof earlier in the thread shows that this exclusion can be derived.  The exclusion should probably be stated for clarity, in my opinion; "consistent-layout" would do this.

That's why I mentioned the whole POD-to-standard layout thing. C++98/03 didn't specify

If what you want to do is standardize existing practice, you first need to know what existing practice is. Arbitrarily deciding that existing practice for compile-time layout stops at virtual functions seems... arbitrary.


Sorry; ignore this part of my post. I forgot to delete this.

Thiago Macieira

May 20, 2015, 12:09:29 AM5/20/15
to std-pr...@isocpp.org
On Tuesday 19 May 2015 19:46:03 Myriachan wrote:
> > - If type is not a standard-layout class (Clause 9), the results are
> > undefined.
> > + If type is not a standard-layout or trivially copyable class (Clause 9),
> > the
> > results are undefined.
> >
> > Or refer to the contiguous storage definition from 1.8/5 that you quoted.
>
> That wouldn't cover the use of offsetof() in non-trivially-copyable
> non-standard-layout types, but is an improvement.

Indeed, but what's the use-case? Obviously there's no code doing that now. Why
do we need to enable that, instead of enabling a better alternative?

> What does this new subtraction operator do? Like, how does "pm1 - &s"
> differ from "&(s.*pm1)"?

It was proposed in a previous discussion. Let me take the example of offsetof
used in the "Linux kernel style list node": the offset of the member of type
"struct list" is used to recreate the original object. In other words, instead
of the traditional doubly-linked list

template <typename T> struct Node
{
        T data;
        Node *next, *prev;
};

You have:

struct Node { Node *next, *prev; };
struct MyType
{
        [...]
        Node list;
};

To obtain the next item of a given MyType object m, you get m->list.next and
then do pointer manipulation to get back from that to the containing MyType.
That could be C++ified as:

template <typename T, Node T:: *ListMember> T *next(T *item)
{
        Node *nextNode = (item->*ListMember).next;
        return nextNode - ListMember;
}
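
For comparison, this is roughly how the same step is written today with offsetof. It is a sketch only: `MyType` here has a made-up `payload` member in place of the elided ones, and `container_of_list` and `next_item` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

struct Node
{
    Node *next;
    Node *prev;
};

// Hypothetical payload type; standard-layout, so offsetof is permitted today.
struct MyType
{
    int payload;
    Node list;
};

// Step back from the embedded Node to the MyType that contains it.
inline MyType *container_of_list(Node *n)
{
    char *bytes = reinterpret_cast<char *>(n);
    return reinterpret_cast<MyType *>(bytes - offsetof(MyType, list));
}

// Follow the intrusive link, then recover the containing object.
inline MyType *next_item(MyType *item)
{
    return container_of_list(item->list.next);
}
```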

We'd define the operations:
        ClassType *c;
        MemberType *m;
        MemberType ClassType:: *pm;

        m = c + pm = &(c->*pm)
        m - pm = c
        pm = m - c      (probably as a function, not an operator, since this
                         is already defined if MemberType and ClassType are the
                         same or related)

We could do this as a library solution if it weren't for the fact that we
can't define operator overloads for primitive types.

> Also, keep in mind that to truly replace offsetof(), member pointers would
> need to gain the ability to refer to members of members.
>
>
> struct Kitty
> {
> char asdf;
> double fdsa;
> };
> struct Meow
> {
char space[16];
> Kitty kitty;
> };

True, you can compose the operation, but you'll need an intermediate stop
in the Kitty object. You can't fill a
        double Meow:: *ptr

But we could define that:
        Outer *c;
        MemberType *m;
        MemberType Inner:: *pm1;
        Inner Outer:: *pm2;
        MemberType Outer:: *composed;

        m = c + pm2 + pm1 = &((c->*pm2).*pm1)
        m - pm1 = c + pm2
        m - pm1 - pm2 = c

        composed = pm2 + pm1
        composed - pm1 = pm2
        composed - pm2 = pm1
which is what allows for:
        m = c + (pm2 + pm1)

In other words: we retain associativity of addition.

In the modified example, for the double member in a standard ABI, we'd have
        pm1 = &Kitty::fdsa = 8
        pm2 = &Meow::kitty = 16
        composed = pm1 + pm2 = 24

        pm2 = composed - pm1 = 24 - 8 = 16
        pm1 = composed - pm2 = 24 - 16 = 8
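
The additive behaviour can be checked today with plain offsets. A small sketch, using the modified Kitty/Meow example; `nested_offset` and `composed_offset` are hypothetical helper names. The concrete values 8, 16 and 24 depend on the ABI, so the sketch only relies on the sum, not on the numbers.

```cpp
#include <cassert>
#include <cstddef>

struct Kitty
{
    char asdf;
    double fdsa;
};
struct Meow
{
    char space[16];
    Kitty kitty;
};

// Offset of the nested member, measured on a live object.
inline std::ptrdiff_t nested_offset(const Meow &m)
{
    return reinterpret_cast<const char *>(&m.kitty.fdsa) -
           reinterpret_cast<const char *>(&m);
}

// Sum of the two single-step offsets, the analogue of pm2 + pm1.
inline std::ptrdiff_t composed_offset()
{
    return static_cast<std::ptrdiff_t>(offsetof(Meow, kitty) + offsetof(Kitty, fdsa));
}
```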

With careful work, we may even maintain commutativity of addition:

template <typename Outer, typename Inner, typename MemberType>
MemberType Outer:: * operator +(MemberType Inner::*, Inner Outer::*);

template <typename Outer, typename Inner, typename MemberType>
MemberType Outer:: * operator +(Inner Outer::*, MemberType Inner::*);

Is there any case where the two above would be ambiguous? If Inner ==
MemberType or if Inner == Outer, it would be, but that would imply we have a
type containing itself as a member and that is not possible.

Can anyone see another ambiguity?

Obviously subtraction has never had commutativity, though we should be able to
reorder the operators:

pm1 - composed = 8 - 24 = -16
pm2 - composed = 16 - 24 = -8

This would create a "pointer-to-members" of type Outer Inner::* and Inner
MemberType::*. In other words, not a pointer to *member*, but pointer to
*container* and I think we should allow it. That in turn brings the unary
negation operator, which transforms a pointer-to-member into a
pointer-to-container:

composed - pm1 = - (pm1 - composed)

The property of a pointer-to-container is that it can be applied on any object
of the container type, but needs to be applied to the correct member.

struct S { T i, j; } s1, s2;
S T::*ptc = - &S::i;
assert(&s1.i + ptc == &s1);
assert(&s2.i + ptc == &s2);
&s1.j + ptc; // undefined

Note how this also requires the type "S int::*" to exist.

> > template <typename Member, typename Outer> ptrdiff_t
> > make_offset(Outer *object, Member Outer::
> > *pointer_to_member);
> >
> > template <typename Member, typename Outer> Member Outer::*
> > make_pointer_to_member(Outer *object, ptrdiff_t offset);
> >

> I'd love for these to exist, irrespective of anything else I've mentioned.
> =) It's possible to make this as a third-party library, if you do some
> platform-specific hacks and maybe restrict the types people are allowed to
> use, but if implemented by the compiler vendor, it'd work great.

Right. And so could all of the above I described, aside from actually using
operator+ and operator-.

> What would the first parameter be used for? Something like, "the returned
> member pointer shall only be used with objects of the same dynamic type as
> the dynamic type of 'object', else the behavior is undefined"?

I'm not sure. I know that in most ABIs it wouldn't be needed, since the
pointer to member encodes all of the information we need to transform it to an
offset. I added it without thinking too much, and I can't think of a reason why
it would be needed. We only need the type.

> Member pointers are better than offsetof(), but offsetof() shouldn't be
> locked out.

I disagree here. I think we shouldn't extend offsetof more.