P0840, empty member optimization, fails at its goal if there are no guarantees.

Nicol Bolas

Dec 6, 2017, 5:39:40 PM
to ISO C++ Standard - Future Proposals
Before C++11, empty base optimization was a commonly-implemented concept, but there was no actual requirement for it in the standard. C++98/03 permitted it, but it did not require it.

C++11's standard layout rules effectively require EBO in a certain set of limited circumstances. If your class follows standard layout rules, the standard guarantees that the base class will not disturb the layout of the class's members. This means that you know this data structure will have the best packing possible:

template<typename T>
class S : private T
{
  int my_member;
};

For any type `T` where `is_empty_v<T>` is true, `S<T>` will provide optimal packing because the standard explicitly requires implementations to do so.
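A minimal sketch of what that guarantee can be checked against, assuming a hypothetical empty type `Empty` and a `Holder` template that mirrors `S` but is a struct so its member can be inspected. It only relies on the rule that a standard-layout object shares its address with its first non-static data member; `sizeof(Holder<Empty>) == sizeof(int)` is what mainstream ABIs produce, but the sketch deliberately does not assert it.

```cpp
#include <cassert>
#include <type_traits>

struct Empty {};  // hypothetical empty type standing in for T

// Same shape as S<T>, written as a struct so the member can be inspected.
template <typename T>
struct Holder : private T {
    int my_member;
};

static_assert(std::is_empty<Empty>::value, "Empty has no data members");
static_assert(std::is_standard_layout<Holder<Empty>>::value,
              "an empty base keeps Holder standard layout");

// In a standard-layout class the object and its first non-static data
// member share an address, so the empty base cannot displace my_member.
inline void check_holder_layout() {
    Holder<Empty> h{};
    assert(static_cast<void*>(&h) == static_cast<void*>(&h.my_member));
}
```

Calling `check_holder_layout()` should pass on any conforming implementation, whether or not the base ends up with zero size.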

That is not true for this:

template<typename T>
struct U
{
  [[no_unique_address]] T t;
  int my_member;
};

I have no guarantees that `U<T>` will have optimal packing.
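To make the gap concrete, here is a hedged sketch of how user code could detect, rather than assume, that the member was optimized away. `Empty` and `u_is_packed` are hypothetical names introduced for illustration; nothing in P0840 provides them.

```cpp
#include <cassert>

struct Empty {};  // hypothetical empty type standing in for T

template <typename T>
struct U {
    [[no_unique_address]] T t;
    int my_member;
};

// True only where the implementation actually overlaps t with my_member.
constexpr bool u_is_packed = sizeof(U<Empty>) == sizeof(int);

// Uncomment to turn a missed optimization into a build failure:
// static_assert(u_is_packed, "[[no_unique_address]] did not remove t");

inline void check_u_size() {
    // The only portable statement: U<Empty> is at least as big as its int.
    assert(sizeof(U<Empty>) >= sizeof(int));
}
```

The commented-out `static_assert` is the workaround available today: it does not make the optimization happen, it only reports when it did not.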

So, when deciding between implementing my code as `S` or `U`, why would I ever pick `U` when `S` is the one that has explicit standard requirements? And if `U` is not the obviously correct choice... what good is the feature? It's not fixing the problem it claims to because it lacks the guarantees that standard layout provides with EBO.

So I make the claim that without some kind of guarantee of optimization, this feature fails to achieve its goals.

So, what would it take to fix this? Well, one big annoyance with P0840 is that it is defined as an attribute. And it stakes out its position on the validity of doing so on the idea that, so long as the code functions the same whether the attribute is implemented or not, it's fine to use an attribute.

Following that rule (which, let's be frank, only exists as an expedient means of getting the proposal through the committee, not because it makes the feature better in some way) makes it impossible to ensure layout compatibility between `S<T>` and `U<T>`. But that doesn't mean we cannot have some reasonable guarantees on whether `U<T>::t` is zero-sized.

When I was thinking about my own zero-sized idea, discussed here, I came upon the whole "unique identity rule" issue. Namely, that every object of the same type T must have a unique address. None of my solutions to that issue ever really pleased me. This proposal kind of sidesteps that problem, since the attribute is effectively saying "this subobject does not have to be uniquely identified, so it may be zero sized", rather than my approach of "this subobject is zero sized and therefore may not be uniquely identified".

Of course, that's part of the problem: users don't actually care about unique identity; they care about the size.

I brought up my idea because, while investigating it, I happened upon a solution that seems to dovetail very closely with this proposal. But first, let's look at the scope of the problem. Let's say you have this type:

struct empty {};

struct A
{
  [[no_unique_address]] empty x;
  empty y;
};

While `x` does not have to have a unique address, `y` does. And therefore `y`'s address must be unique from `x`'s. And since we're not allowed to reorder members, the compiler must include space for `x`.

OK, fine. What about this:

struct B
{
  empty y;
};

struct C
{
  [[no_unique_address]] empty x;
  B b;
};

Neither `B::y` nor `C::b` is declared to not have a unique address. But if the compiler does not assign a location to `C::x`, then it may overlap with `C::B::y`, which is not allowed. And therefore, `C::x` must have an address.
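Both cases can be checked with a short sketch. It asserts only what the unique identity rule requires (two live objects of the same type get distinct addresses), so it should hold whether or not the implementation honors the attribute; the function name `check_same_type_members` is made up for this example.

```cpp
#include <cassert>

struct empty {};

struct A {
    [[no_unique_address]] empty x;
    empty y;
};

struct B {
    empty y;
};

struct C {
    [[no_unique_address]] empty x;
    B b;
};

inline void check_same_type_members() {
    A a{};
    // x and y are both of type `empty`, so they cannot share an address.
    assert(&a.x != &a.y);
    assert(sizeof(A) >= 2);

    C c{};
    // The conflict is found through the nested member: C::x vs. C::b.y.
    assert(&c.x != &c.b.y);
}
```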

Note that the C++11 standard layout rules have the exact same problem with base classes:

struct EA {}; //Empty
struct EB : EA {}; //Also empty.

struct EC : EA, EB {};

struct ED : EA
{
  EB b;
};

By the (amended) rules of standard layout, neither `EC` nor `ED` is standard layout. Why? For `EC`, there are two base class subobjects of the same type, and therefore they must not have the same address. For `ED`, the empty base class `EA` potentially overlaps with `ED::b`'s base class. And this is actually recursive, through all of the base class subobjects of the type and all of the base class subobjects of the first member of the type.

So I think we can effectively piggyback off of this rule.

I suggest we declare the following:

1. A type is not standard layout if any of its potentially overlapping subobjects are of the same type.

2. A type is not standard layout if any of its potentially overlapping subobjects are of the same type as the type of its first non-"no_unique_address" subobject or any potentially overlapping subobjects of that subobject.

3. All subobjects of empty type that are declared with the `no_unique_address` attribute are always zero sized if their containing type is standard layout.

4. A standard layout type with all of its members declared `no_unique_address` is empty.

One might argue that this now makes the `no_unique_address` attribute affect the visible behavior of the program. But I would argue that it already does so. `is_empty_v` is explicitly defined by the P0840 wording to be implementation-defined as to whether `no_unique_address` will affect the emptiness of a particular type. So there will be types where `is_empty_v` is true on some implementations and false on others.

Furthermore, a standard layout type still has to fulfill all of the other requirements of standard layout. So if a type `is_standard_layout` under the `no_unique_address` rules, then it would still be standard layout under the rules without the attribute. Similarly, it will not affect types that are still standard layout under the old rules.

Lastly, these rules do not change the layout compatibility rules or the common initial sequence rule. So these rules don't mean that `no_unique_address` is ignored by layout compatibility; they only ensure that if the types are standard layout, the subobjects are guaranteed to be zero sized.

I came up with these rules somewhat off the cuff, so feel free to let me know where I went wrong. This all sounds implementable to me, but I haven't written the code for it.


Thiago Macieira

Dec 6, 2017, 5:49:35 PM
to std-pr...@isocpp.org
On Wednesday, 6 December 2017 14:39:39 PST Nicol Bolas wrote:
> That is not true for this:
>
> template<typename T>
> struct U
> {
> [[no_unique_address]] T t;
> int my_member;
> };
>
> I have no guarantees that `U<T>` will have optimal packing.

Why wouldn't it have optimal packing? Are you thinking of a compiler that does
not implement the attribute?

> So, when deciding between implementing my code as `S` or `U`, why would I
> ever pick `U` when `S` is the one that has explicit standard requirements?

T might be final.

--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
Software Architect - Intel Open Source Technology Center

Nicol Bolas

Dec 6, 2017, 6:08:13 PM
to ISO C++ Standard - Future Proposals
On Wednesday, December 6, 2017 at 5:49:35 PM UTC-5, Thiago Macieira wrote:
On Wednesday, 6 December 2017 14:39:39 PST Nicol Bolas wrote:
> That is not true for this:
>
> template<typename T>
> struct U
> {
>   [[no_unique_address]] T t;
>   int my_member;
> };
>
> I have no guarantees that `U<T>` will have optimal packing.

Why wouldn't it have optimal packing?

If you have two options, one of which guarantees you exactly what you want, and one that probably gives you what you want except when it doesn't and there's no way to force it to happen... why would you pick the latter over the former?

By your reasoning, why bother having EBO as part of standard layout at all? After all, why wouldn't a compiler optimize away empty base classes?

QOI should not be an excuse not to require something in the standard.

Are you thinking of a compiler that does
not implement the attribute?

> So, when deciding between implementing my code as `S` or `U`, why would I
> ever pick `U` when `S` is the one that has explicit standard requirements?

T might be final.

That still wouldn't make me pick the choice that may not work. I'd either ignore that possibility (or specifically reject it via a concept) or failing that, provide an alternate implementation for `final` types.

Thiago Macieira

Dec 6, 2017, 9:46:33 PM
to std-pr...@isocpp.org
On Wednesday, 6 December 2017 15:08:13 PST Nicol Bolas wrote:
> By your reasoning, why bother having EBO as part of standard layout at all?
> After all, why wouldn't a compiler optimize away empty base classes?

I'm not sure I agree with your assertion:

> C++11's standard layout rules effectively require EBO in a certain set of
> limited circumstances. If your class follows standard layout rules, the
> standard guarantees that the base class will not disturb the layout of the
> class's members.

First of all, this excludes non-std-layout. And we've seen it excludes final
types too. A third case is when you need two non-unique-address members of the
same type: you can't directly derive twice from the same class. So there are
already three cases when you can't use a base for no-address. The attribute
was created for those cases.

Second, can you explain your reasoning why you think an empty base of a
stdlayout class needs to have effective size zero? A class with a base is not
possible in C and therefore cannot be shared with a C library, even though it
may follow all the rules required for POD. So why is it required that:

struct empty {};

struct A : empty
{
    int i;
};

sizeof(A) == sizeof(int)
offsetof(A, i) == 0

Nicol Bolas

Dec 6, 2017, 11:44:25 PM
to ISO C++ Standard - Future Proposals
On Wednesday, December 6, 2017 at 9:46:33 PM UTC-5, Thiago Macieira wrote:
On Wednesday, 6 December 2017 15:08:13 PST Nicol Bolas wrote:
> By your reasoning, why bother having EBO as part of standard layout at all?
> After all, why wouldn't a compiler optimize away empty base classes?

I'm not sure I agree with your assertion:

> C++11's standard layout rules effectively require EBO in a certain set of
> limited circumstances. If your class follows standard layout rules, the
> standard guarantees that the base class will not disturb the layout of the
> class's members.

First of all, this excludes non-std-layout.

But it includes standard layout. Which means there is at least some class of types where it undeniably works. A set that certainly includes something is better than a set that may not include anything.

And we've seen it excludes final
types too. A third case is when you need two non-unique-address members of the
same type: you can't directly derive twice from the same class.

Despite the name, `no_unique_address` does not spare implementations from the "unique identity rule". From P0840R1's version of [intro.object]/8:

> Unless an object is a bit-field or a subobject of zero size, the address of that object is the address of the first byte it occupies. Two objects with overlapping lifetimes that are not bit-fields may have the same address if one is nested within the other, or if at least one is a subobject of zero size and *they are of different types*; otherwise, they have distinct addresses and shall occupy disjoint bytes of storage.

Emphasis added. If you add two members of the same type which use `no_unique_address`, they must have distinct addresses and distinct memory storage. And if you add two members of the same type, one which uses the attribute and the other which doesn't, the implementation must give them distinct addresses and distinct memory storage.
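As a rough illustration of that wording, the sketch below marks both members with the attribute. The struct names `SameType` and `DifferentTypes` are invented for the example. Only the same-type case is asserted, because the quoted text permits, but does not require, members of different types to share an address.

```cpp
#include <cassert>

struct empty {};
struct other {};

struct SameType {
    [[no_unique_address]] empty a;
    [[no_unique_address]] empty b;  // same type as `a`
};

struct DifferentTypes {
    [[no_unique_address]] empty a;
    [[no_unique_address]] other b;  // different type, may overlap `a`
};

inline void check_attribute_on_both() {
    SameType s{};
    // Even with the attribute on both, same-type members stay distinct.
    assert(&s.a != &s.b);
    assert(sizeof(SameType) >= 2);

    // No address check here: overlap is allowed but not required.
    assert(sizeof(DifferentTypes) >= 1);
}
```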

So no optimization takes place. Given that, why would you declare either of them with `no_unique_address`?

FYI: This rule is why the standard layout rules have that incredibly complicated bit about base classes and the first member subobject (and the first member subobject of that type, recursively) having the same type. Because the standard layout rules want to force EBO, and therefore are defined by the circumstances that cannot violate the unique identity rule.

So there are
already three cases when you can't use a base for no-address. The attribute
was created for those cases.

Second, can you explain your reasoning why you think an empty base of a
stdlayout class needs to have effective size zero?

I'm not sure I understand the question. Whether it "needs to" or not is essentially sophistry because it already does. It did back in C++11; that's a done deal. We're talking about bringing the same power to member subobjects.

If you're asking why that change was made... well, read the original paper on standard layout.

Now to be completely fair, standard layout rules don't actually require EBO per se. Instead, they require that empty base classes not disturb the layout of the members of the type. Theoretically, empty base classes can take up space after all of the non-static data members. However, as I'll demonstrate below, it's very hard to implement the rules of standard layout types without making subobjects zero sized in at least some cases.

I also hadn't realized that among the wording changes caused by P0840 is an explicit statement that empty base class subobjects of standard layout types are zero sized. I was rather surprised by that. It also helps demonstrate my point: guarantees are good things.

Why bother redefining prvalues if you know every compiler implements RVOs? Because it allows us to be certain of things that were previously just hoped for or expected. OK yes, it also technically allows us to return prvalues of even immobile types. But many of the motivations for it were to move us from the realm of "an expected optimization that the standard allows" to "required and expected behavior".

A class with a base is not
possible in C and therefore cannot be shared with a C library, even though it
may follow all the rules required for POD.

And that might matter if standard layout existed solely to talk about objects that get shared with a C library.

Also, sharing objects between C and C++ does not (necessarily) require the objects to be defined in the same way. While the C++ object model does not support having C create C++ objects, if you create a C++ object that matches the layout of a C type, then C's object model can receive and manipulate that object just fine.

So why is it required that:

struct empty {};

struct A : empty
{
    int i;
};

sizeof(A) == sizeof(int)
offsetof(A, i) == 0

The second one is required; the first is not. `A` is not required to be layout compatible with `int`. However, `A` is required to be layout compatible with B, C, and D here:

struct other_empty {};
struct B : other_empty, A {};
struct C : A, other_empty {};
struct my_int { int i; };
struct D : my_int {};

Now, the sizes of B, C, and D are not required by the standard to be the same. But you can stick them in a `union` and fetch the same `::i` subobject from any of them. And that's good enough.
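A small sketch of the part of this that is easy to check directly: for each of these types, the object and its `i` member should share an address, because each is a standard-layout class whose first non-static data member is `i`. The helper `i_is_first` is hypothetical. The sketch does not test the union access itself, and it says nothing about the sizes.

```cpp
#include <cassert>
#include <type_traits>

struct empty {};
struct A : empty { int i; };

struct other_empty {};
struct B : other_empty, A {};
struct C : A, other_empty {};
struct my_int { int i; };
struct D : my_int {};

static_assert(std::is_standard_layout<A>::value, "A is standard layout");
static_assert(std::is_standard_layout<B>::value, "B is standard layout");
static_assert(std::is_standard_layout<C>::value, "C is standard layout");
static_assert(std::is_standard_layout<D>::value, "D is standard layout");

// Returns true when `i` sits at the very start of a T object.
template <typename T>
bool i_is_first() {
    T obj{};
    return static_cast<void*>(&obj) == static_cast<void*>(&obj.i);
}

inline void check_offsets() {
    assert(i_is_first<A>());
    assert(i_is_first<B>());
    assert(i_is_first<C>());
    assert(i_is_first<D>());
}
```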

You are correct that the standard does not require that these objects are the same size as one another, and that an implementation can assign space for these base classes at the end of the type.

An implementation could lay out `B`'s base classes in reverse declaration order (recursively), so that `A`'s members come first, and reserve an additional byte for `A::empty` and then another additional byte for `B::other_empty`. But the same implementation, when applied to `C` would have to make `C::other_empty` zero sized, since it would by that rule come first. And `C::i` must have the same offset as it does for `A::i`.

So it's not possible to have a coherent implementation whereby both `B::other_empty` and `C::other_empty` are not zero sized. At least one of them has to be zero sized because one of them is going to wind up being allocated before `A`.


Corentin

Dec 7, 2017, 3:03:44 AM
to std-pr...@isocpp.org

To me the issue is the idea that the compiler can ignore any and every attribute.

I'd argue that attributes should be allowed to change the semantics of a program, on a per-attribute basis.
[[no_unique_address]] is a perfect example. It is arcane enough that it does not warrant a new keyword or the repurposing of an existing one (like register), and it acts like a kind of qualifier. For the reason you pointed out, there is an expectation that this attribute has unconditional semantic significance, and enforcing that in the program seems reasonable.

I'm not, by any means, suggesting that all future keywords be implemented in terms of attributes. The decision to introduce attributes that are not discardable should be made on a per-attribute basis.

I'm working on a set of rules:
  • Existing standard attributes are ignorable.
  • Compiler- and tooling-provided attributes are always ignorable.
  • The committee may decide in the future to introduce attributes that are not ignorable and carry semantic meaning. Compilers that ignore those attributes are non-conforming. Adding or removing such attributes may change the meaning of the program.
  • Whether future attributes are ignorable will be decided on a per-attribute basis.

That would let the committee decide what the best semantics for [[no_unique_address]] are.
And we can build upon that to introduce user-defined attributes and attribute reflection.
The idea is that user-defined attributes are always namespaced so that the standard can keep adding new attributes without running into name collisions.






Thiago Macieira

Dec 7, 2017, 2:11:50 PM
to std-pr...@isocpp.org
On Wednesday, 6 December 2017 20:44:24 PST Nicol Bolas wrote:
> > First of all, this excludes non-std-layout.
>
> But it includes standard layout. Which means there is at least *some class*
> of types where it undeniably works. A set that certainly includes something
> is better than a set that may not include anything.

If it's a QoI issue, you get to blame your vendor.

For example, when I implemented code in Qt that used <chrono>, I got to blame
Microsoft for not adding __has_include until VS 15.3 (there was a thread even
in std-discussions about that).

> Despite the name, `no_unique_address` does not spare implementations from
>
> the "unique identity rule". From P0840R1's version of [intro.object]/8:
> > Unless an object is a bit-field or a subobject of zero size, the address
>
> of that object is the address of the first byte it occupies. Two objects
> with overlapping lifetimes that are not bit-fields may have the same
> address if one is nested within the other, or if at least one is a
> subobject of zero size and *they are of different types*; otherwise, they
> have distinct addresses and shall occupy disjoint bytes of storage.
>
> Emphasis added. If you add two members of the same type which use
> `no_unique_address`, they *must* have distinct addresses and distinct
> memory storage. And if you add two members of the same type, one which uses
> the attribute and the other which doesn't, the implementation *must* give
> them distinct addresses and distinct memory storage.

Well, one easy work around to that is to share the address of another member
in the larger structure.

In any case, why must this restriction apply to [[no_unique_address]] ? If we
don't want a unique address in the first place, why must it still be unique
among sub-objects of the same type?

> FYI: This rule is why the standard layout rules have that incredibly
> complicated bit about base classes and the first member subobject (and the
> first member subobject of that type, recursively) having the same type.
> Because the standard layout rules want to force EBO, and therefore are
> defined by the circumstances that cannot violate the unique identity rule.

I never understood that either. If the base and the first member aren't empty
classes, they obviously have different addresses. Why isn't that standard
layout?

If they are empty but no [[no_unique_address]] is in effect, then they have
different addresses. Why isn't that standard layout?

By the way, note how [[no_unique_address]] implies non-standard layout. So a
type's actual classification ought to change with an attribute. This alone may
be a reason not to use them.

> > Second, can you explain your reasoning why you think an empty base of a
> > stdlayout class needs to have effective size zero?
>
> I'm not sure I understand the question. Whether it "needs to" or not is
> essentially sophistry because it already does. It did back in C++11; that's
> a done deal. We're talking about bringing the same power to member
> subobjects.

I'm asking what in the standard makes EBO mandatory. The "O" in EBO would
imply otherwise.

> If you're asking why that change was made... well, read the original paper
> on standard layout.
> <http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2342.htm>

The paper says in the footnote "This ensures that two subobjects that have the
same class type and that belong to the same most-derived object are not
allocated at the same address ([expr.eq]).". It doesn't explain why they would
be allocated at the same address in the first place. It's clear that the
restriction makes it so the same-allocation cannot happen, but doesn't say how
it could otherwise happen.

> Now to be completely fair, standard layout rules don't actually require EBO
> per-se. Instead, it requires that empty base classes not disturb the layout
> of the members of the type. Theoretically, empty base classes can take up
> space after all of the non-static data members. However, as I'll
> demonstrate below, it's very hard to implement the rules of standard layout
> types without making subobjects zero sized in at least some cases.

My question is why can't you consider the base as a sub-object of size 1, and
thus the first member of the derived class is at 2 + alignment?

> Also, sharing objects between C and C++ does not (necessarily) require the
> objects to be defined in the same way. While the C++ object model does not
> support having C create C++ objects, if you create a C++ object that
> matches the layout of a C type, then C's object model can *receive* and
> manipulate that object just fine.

I'd say that's extremely fragile and should not be counted on. If you want to
share an object between C and C++, use the C declaration of the type. Anything
else is tempting UB.

Unless the standard begins to *explicitly* say that a given C++ hierarchy must
always have the exact same layout as a C struct. Until that happens, I would
advise not writing code like that and I'd advise to always reject such code in
code reviews.

> So why is it required that:
> > struct empty {};
> >
> > struct A : empty
> > {
> > int i;
> > };
> >
> > sizeof(A) == sizeof(int)
> > offsetof(A, i) == 0
>
> The second one is required; the first is not. `A` is not required to be
> layout compatible with `int`.

Why is the second required? Why can't struct A be layout-equivalent to

struct Aprime
{
empty _;
int i;
};

> However, `A` is required to be layout compatible with B, C, and D here:
>
> struct other_empty {};
> struct B : other_empty, A {};
> struct C : A, other_empty {};
> struct my_int { int i; };
> struct D : my_int {};
>
> Now, the sizes of B, C, and D are not required by the standard to be the
> same. But you can stick them in a `union` and fetch the same `::i`
> subobject from any of them. And that's good enough.

Again, why? The only one that seems to me you could put in a union and access
A::i is C, because it derives from A, so by definition it follows the initial
sequence requirement.

struct B has a member before A, even if empty.

struct D has a wholly different hierarchy. I guess this is the interesting
one, because if we say that the initial sequence only applies to non-static
data members and does not include base sub-objects, then the only member in
both A and D is "int i". But even if that's the case (and I'd need specific
wording in the standard saying it is), it doesn't mean the bases must be
empty.

As you said yourself:

> You are correct that the standard does not require that these objects are
> the same size as one another. That an implementation can assign space for
> these base classes at the end of the type.

Then they are not empty, are they?

> An implementation could lay out `B`'s base classes in reverse declaration
> order (recursively), so that `A`'s members come first, and reserve an
> additional byte for `A::empty` and then another additional byte for
> `B::other_empty`. But the same implementation, when applied to `C` would
> have to make `C::other_empty` zero sized, since it would by that rule come
> first. And `C::i` must have the same offset as it does for `A::i`.

I don't agree with your conclusion. The implementation does not need to apply
reverse order. It only needs to consider first the non-empty ones, then the
empty ones. So the order of sub-objects could be:

A: i empty
B: i empty other_empty
C: i empty other_empty
D: i

Note how in this case B and C are layout compatible.

Richard Smith

Dec 7, 2017, 5:15:35 PM
to std-pr...@isocpp.org
On 6 December 2017 at 14:39, Nicol Bolas <jmck...@gmail.com> wrote:
Before C++11, empty base optimization was a commonly-implemented concept, but there was no actual requirement for it in the standard. C++98/03 permitted it, but it did not require it.

C++11's standard layout rules effectively require EBO in a certain set of limited circumstances. If your class follows standard layout rules, the standard guarantees that the base class will not disturb the layout of the class's members. This means that you know this data structure will have the most optimal packing possible:

template<typename T>
class S : private T
{
  int my_member;
};

For any type `T` where `is_empty_v<T>` is true, `S<T>` will provide optimal packing because the standard explicitly requires implementations to do so.

That is not true for this:

template<typename T>
struct U
{
  [[no_unique_address]] T t;
  int my_member;
};

I have no guarantees that `U<T>` will have optimal packing.

So, when deciding between implementing my code as `S` or `U`, why would I ever pick `U` when `S` is the one that has explicit standard requirements?

As Thiago notes, U works reliably whereas S has a number of caveats and problematic corners. There are other, less-well-known reasons why EBO is a problematic implementation technique. For example:

struct A {}; struct Alloc : A { typedef A base; /*...*/ };

namespace base { void f(); }

struct B : std::vector<int, Alloc> {
  void g() { base::f(); }
};

This (non-conformingly) fails to compile with some standard library implementations, due to their use of EBO inside the vector implementation making the name 'base' visible to unqualified lookup within vector<int, Alloc>.
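The effect can be reproduced without a standard library, as in this hedged sketch. `ebo_vector` and `member_vector` are hypothetical stand-ins for a container that stores its allocator as a private base and one that stores it as a member; they are not how any particular `std::vector` is written.

```cpp
#include <cassert>

struct A {};
struct Alloc : A { typedef A base; };

namespace base { inline int f() { return 42; } }

// Hypothetical container that stores its allocator via EBO.
template <typename T, typename Al>
class ebo_vector : private Al {};

// Hypothetical container that stores its allocator as a member instead.
template <typename T, typename Al>
class member_vector {
    [[no_unique_address]] Al alloc_;
};

struct UsesEbo : ebo_vector<int, Alloc> {
    // Writing `base::f()` here would not compile: lookup of `base` finds the
    // inaccessible typedef Alloc::base through the private base class.
    int g() { return ::base::f(); }
};

struct UsesMember : member_vector<int, Alloc> {
    // No base class injects the name, so `base` means the namespace.
    int g() { return base::f(); }
};

inline void check_lookup() {
    UsesEbo e;
    UsesMember m;
    assert(e.g() == 42);
    assert(m.g() == 42);
}
```

With the member-based container the user's unqualified `base::f()` works unchanged, whereas the EBO-based one forces the `::base::f()` workaround.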

And if `U` is not the obviously correct choice... what good is the feature?

You have not convinced me that `U` is not the obviously correct choice.

It's not fixing the problem it claims to because it lacks the guarantees that standard layout provides with EBO.

So I make the claim that without some kind of guarantee of optimization, this feature fails to achieve its goals.

I disagree. Its goal is to provide a superior alternative to EBO, and in my view, it achieves that. An implementation is (as ever) permitted to use stupid layout for structs -- inserting arbitrarily large amounts of padding for no reason is a valid implementation technique, and remains so in the presence of this feature. You're right that standard-layout structs do give an EBO guarantee in some cases, thereby disallowing lead padding, but an implementation can still pad out the *end* of the class with space for the empty base class if it wants (and indeed implementations actually do that in some cases). Also, in generic code, you often cannot ensure that the class for which you would like to perform EBO is a standard-layout class, so special guarantees that apply only for standard-layout structs are often irrelevant.

However, I think it would be reasonable to provide stronger layout guarantees for standard layout types when this attribute is used; the feature has not yet been discussed in CWG, and this seems like a reasonable thing to consider at that point. It's certainly in line with the design intent to provide the same guarantees for fields with this attribute as for base classes in the standard-layout case.

So, what would it take to fix this? Well, one big annoyance with P0840 is that it is defined as an attribute. And it stakes out its position on the validity of doing so on the idea that, so long as the code functions the same whether the attribute is implemented or not, it's fine to use an attribute.

Following that rule (which, let's be frank, only exists as an expedient means of getting the proposal through the committee, not because it makes the feature better in some way)

This parenthetical is incorrect. Modeling this feature as an attribute makes it a conforming extension in prior language modes. We did consider in committee the possibility of a keyword instead of an attribute, but as you can tell from the progress of the paper, the attribute approach won the day.
 
makes it impossible to ensure layout compatibility between `S<T>` and `U<T>`. But that doesn't mean we cannot have some reasonable guarantees on whether `U<T>::t` is zero-sized.

When I was thinking about my own zero-sized idea, discussed here, I came upon the whole "unique identity rule" issue. Namely, that every object of the same type T must have a unique address. None of my solutions to that issue ever really pleased me. This proposal kind of sidesteps that problem, since the attribute is effectively saying "this subobject does not have to be uniquely identified, so it may be zero sized", rather than my approach of "this subobject is zero sized and therefore may not be uniquely identified".

Of course, that's part of the problem: users don't actually care about unique identity; they care about the size.

I brought up my idea because, while investigating it, I happened upon a solution that seems to dovetail very closely with this proposal. But first, let's look at the scope of the problem. Let's say you have this type:

struct empty {};

struct A
{
  [[no_unique_address]] empty x;
  empty y;
};

While `x` does not have to have a unique address, `y` does. And therefore `y`'s address must be unique from `x`'s. And since we're not allowed to reorder members, the compiler must include space for `x`.

Minor correction: we actually are allowed to reorder [[no_unique_address]] members. sizeof(A) must still be at least 2, because the two 'empty' members must have distinct addresses, but the 'x' member can be in A's tail padding and thus reusable. However, it's desirable for an initial [[no_unique_address]] member to result in the same layout as a base class, so that there is no ABI change in moving from the old solution to the new one. In practice, that means the above struct will likely be laid out with the 'x' member at offset 0 and the 'y' member at offset 1.
 
OK, fine. What about this:

struct B
{
  empty y;
};

struct C
{
  [[no_unique_address]] empty x;
  B b;
};

Neither `B::y` nor `C::b` is declared to not have a unique address. But if the compiler does not assign a location to `C::x`, then it may overlap with `C::B::y`, which is not allowed. And therefore, `C::x` must have an address.

Note that the C++11 standard layout rules have the exact same problem with base classes:

struct EA {}; //Empty
struct EB : EA {}; //Also empty.

struct EC : EA, EB {};

struct ED : EA
{
  EB b;
};

By the (amended) rules of standard layout, neither `EC` nor `ED` is standard layout. Why? For `EC`, there are two base class subobjects of the same type, and therefore they must not have the same address. For `ED`, the empty base class `EA` potentially overlaps with `ED::b`'s base class. And this is actually recursive, through all of the base class subobjects of the type and all of the base class subobjects of the first member of the type.

So I think we can effectively piggyback off of this rule.

I suggest we declare the following:

1. A type is not standard layout if any of its potentially overlapping subobjects are of the same type.

2. A type is not standard layout if any of its potentially overlapping subobjects are of the same type as the type of its first non-"no_unique_address" subobject or any potentially overlapping subobjects of that subobject.

3. All subobjects which are empty types that are declared with the `no_unique_address` attribute are always zero sized if their containing type is standard layout.

4. A standard layout type with all of its members declared `no_unique_address` is empty.
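As a rough illustration of how rules 1, 3 and 4 would classify types, the sketch below annotates three hypothetical structs (`twice`, `packed` and `all_tagged` are names invented here, as are `e1` and `e2`). The comments describe what the suggested wording would require, not what any compiler does today; the assertions are limited to facts that already hold.

```cpp
struct e1 {};
struct e2 {};

// Rule 1: two potentially overlapping subobjects of the same type.
// Under the suggestion this type would not be standard layout.
struct twice
{
  [[no_unique_address]] e1 a;
  [[no_unique_address]] e1 b;
};

// Rule 3: if this type is standard layout, `tag` would be required to take
// no space, so sizeof(packed) would have to equal sizeof(int).
struct packed
{
  [[no_unique_address]] e1 tag;
  int value;
};

// Rule 4: every member is attributed and the types differ, so the
// suggestion would make this type empty.
struct all_tagged
{
  [[no_unique_address]] e1 a;
  [[no_unique_address]] e2 b;
};

// Holds today: two objects of type e1 cannot share an address.
static_assert(sizeof(twice) >= 2, "twice::a and twice::b need distinct addresses");
// Holds today: the int member always needs its own storage.
static_assert(sizeof(packed) >= sizeof(int), "packed contains an int");
```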

One might argue that this now makes the `no_unique_address` attribute affect the visible behavior of the program. But I would argue that it already does so. `is_empty_v` is explicitly defined by the P0840 wording to be implementation-defined as to whether `no_unique_address` will affect the emptiness of a particular type. So there will be types where `is_empty_v` is true on some implementations and false on others.

Furthermore, a standard layout type still has to fulfill all of the other requirements of standard layout. So if a type `is_standard_layout` under the `no_unique_address` rules, then it would still be standard layout under the rules without the attribute. Similarly, it will not affect types that are still standard layout under the old rules.

Lastly, these rules do not change the layout compatibility rules and the common initial sequence rule. So these rules don't mean that `no_unique_address` is ignored by layout compatibility; they only ensure that if the types are standard layout, the subobjects are guaranteed to be zero sized.

I came up with these rules somewhat off the cuff, so feel free to let me know where I went wrong. This all sounds implementable to me, but I haven't written the code for it.

Thanks, I think this is a good suggestion.

Nicol Bolas

Dec 7, 2017, 9:31:13 PM12/7/17
to ISO C++ Standard - Future Proposals
On Thursday, December 7, 2017 at 2:11:50 PM UTC-5, Thiago Macieira wrote:
On Wednesday, 6 December 2017 20:44:24 PST Nicol Bolas wrote:
> > First of all, this excludes non-std-layout.
>
> But it includes standard layout. Which means there is at least *some class*
> of types where it undeniably works. A set that certainly includes something
> is better than a set that may not include anything.

If it's a QoI issue, you get to blame your vendor.

I don't see how that's a helpful solution. Blaming vendors doesn't make them implement things.

Again, this goes back to the reason why we got guaranteed EBO in the first place. And guaranteed elision.

For example, when I implemented code in Qt that used <chrono>, I got to blame
Microsoft for not adding __has_include until VS 15.3 (there was a thread even
in std-discussions about that).

... that's not even the same kind of thing. `__has_include` is not an optional feature of C++17. It was going to happen in VS, sooner or later.

Whether `no_unique_address` works in particular cases or not is not a "gonna happen" thing. It's entirely up to implementations.

> Despite the name, `no_unique_address` does not spare implementations from
>
> the "unique identity rule". From P0840R1's version of [intro.object]/8:
> > Unless an object is a bit-field or a subobject of zero size, the address
>
> of that object is the address of the first byte it occupies. Two objects
> with overlapping lifetimes that are not bit-fields may have the same
> address if one is nested within the other, or if at least one is a
> subobject of zero size and *they are of different types*; otherwise, they
> have distinct addresses and shall occupy disjoint bytes of storage.
>
> Emphasis added. If you add two members of the same type which use
> `no_unique_address`, they *must* have distinct addresses and distinct
> memory storage. And if you add two members of the same type, one which uses
> the attribute and the other which doesn't, the implementation *must* give
> them distinct addresses and distinct memory storage.

Well, one easy work around to that is to share the address of another member
in the larger structure.

[class.mem]/14 confounds you:

> Non-static data members of a (non-union) class with the same access control (Clause 14) are allocated so that later members have higher addresses within a class object.

And you cannot violate that requirement, since `no_unique_address` is an attribute. And the reasoning P0840 gave for allowing it to be an attribute rather than a keyword is that removing it means that the program still functions the "same", without affecting visible behavior (again, this is a lie, but whatever). Reordering `no_unique_address` members would be observable defined behavior; it would affect the result of `offsetof`, for example.

Oh, and BTW, if the authors of P0840R1 are listening, they need to change [class.mem]/14 in the next revision of their paper. Because as it currently stands, `no_unique_address` can't make objects zero sized at all, since that would require the address of the zero sized object to be the same as that of later members. So that would obviously need to be changed.

And as previously pointed out, you can't change it to allow arbitrary reshuffling; such a change would merely turn "higher addresses" into "higher or equal addresses".
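The ordering requirement is easy to observe through `offsetof`. A minimal sketch, using a hypothetical `record` struct that is not from the paper:

```cpp
#include <cstddef>

// All members have the same access control, so they must be allocated
// in declaration order.
struct record
{
  char first;
  int second;
  char third;
};

// Later members have higher addresses, so their offsets strictly increase.
static_assert(offsetof(record, first) < offsetof(record, second), "first precedes second");
static_assert(offsetof(record, second) < offsetof(record, third), "second precedes third");
```

Any reshuffling of members would change these offsets, which is why it counts as observable behavior.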

In any case, why must this restriction apply to [[no_unique_address]] ? If we
don't want a unique address in the first place, why must it still be unique
among sub-objects of the same type?

Because C++ requires that every object of type T have a distinct address from every other object. That's [intro.object]/8.

Now yes, `no_unique_address` certainly sounds like it shouldn't have to follow that rule. But that's how they wrote their modified version of [intro.object]/8. But the reason why we can't just change that is because any particular `T` might actually rely on it.

The only types where `no_unique_address` actually matters are types that are empty. The problem is that it is possible to effectively assign state to an empty type by using its address, since the C++ object model currently requires that all separate `T` instances have unique addresses. Therefore, if you break that rule, you may take someone's working-yet-decidedly-oddball code and accidentally break it.
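A minimal sketch of the kind of code that relies on this, with hypothetical names (`token`, `red`, `blue`, `same_token`): an empty type whose objects are told apart purely by address.

```cpp
// An empty type: it has no data, so the only thing that tells one
// object from another is where it lives.
struct token {};

token red;
token blue;

// Identity comparison by address.
constexpr bool same_token(const token& lhs, const token& rhs)
{
  return &lhs == &rhs;
}

static_assert(same_token(red, red), "an object is identical to itself");
static_assert(!same_token(red, blue), "two token objects are distinguishable");
```

If two `token` subobjects were ever allowed to share an address, `same_token` would report them as the same object, and code built on that distinction would silently change meaning.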

So I would say (as I did in my proposal idea) that it is wrong to retract this rule unless the type itself defines that it wants to exempt itself from unique identity.

> FYI: This rule is why the standard layout rules have that incredibly
> complicated bit about base classes and the first member subobject (and the
> first member subobject of that type, recursively) having the same type.
> Because the standard layout rules want to force EBO, and therefore are
> defined by the circumstances that cannot violate the unique identity rule.

I never understood that either. If the base and the first member aren't empty
classes, they obviously have different addresses. Why isn't that standard
layout?

Because that means that there are two classes in the hierarchy that are contributing members. And standard layout forbids that.

One of the goals of the concept of "standard layout" was to take something that already worked (POD-based intercommunication with C) and expand on it based on existing implementation experience. That is, to allow more types to participate in such intercommunication. The definition of POD was confining, so they expanded it.

Different compilers implement class hierarchies differently. Some put all base class members before the current class members; some put them after.

The standard could not pick a winner or loser here, since ABI changes are really hard to just make people do. So instead, they picked something that everybody could support: you can have a hierarchy, but only one class in that hierarchy may contribute members if you want it to be standard layout.

If they are empty but no [[no_unique_address]] is in effect, then they have
different addresses. Why isn't that standard layout?

Because the only way to give them different addresses is to have the compiler assign space to the empty base class. And the ABI for that platform might put that space in front of the members. Lastly, we want standard layout types to enforce EBO.

Also, this would break layout compatibility, since standard layout types have to be layout compatible with any types that have the same sequence of member subobjects. And if you assign space to empty bases before the first member, then their layouts are not compatible.

By the way, note how [[no_unique_address]] implies non-standard layout.

It implies no such thing. Indeed, it changes nothing about the standard layout rules; the specification is very clear on that.
 
So a
type's actual classification ought to change with an attribute. This alone may
be a reason not to use them.

> > Second, can you explain your reasoning why you think an empty base of a
> > stdlayout class needs to have effective size zero?
>
> I'm not sure I understand the question. Whether it "needs to" or not is
> essentially sophistry because it already does. It did back in C++11; that's
> a done deal. We're talking about bringing the same power to member
> subobjects.

I'm asking what in the standard makes EBO mandatory. The "O" in EBO would
imply otherwise.

That's merely an etymological artifact, much like "return value optimization". We called it "EBO" back before we could get guarantees for it, so we still use that name.

Also, there are cases where it's not guaranteed but compilers still optimize the empty base anyway.

But specifically, the part of the standard that enforces the zero-sized nature of empty bases of standard layout types is the layout compatibility rules. It's not a simple "empty base classes of standard layout types are zero sized" kind of thing; it's a complex interaction between a multitude of rules that makes assigning storage for empty base classes of standard layout objects impossible.

See below for details.

> If you're asking why that change was made... well, read the original paper
> on standard layout.
> <http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2342.htm>

The paper says in the footnote "This ensures that two subobjects that have the
same class type and that belong to the same most-derived object are not
allocated at the same address ([expr.eq]).". It doesn't explain why they would
be allocated at the same address in the first place. It's clear that the
restriction makes it so the same-allocation cannot happen, but doesn't say how
it could otherwise happen.

> Now to be completely fair, standard layout rules don't actually require EBO
> per-se. Instead, it requires that empty base classes not disturb the layout
> of the members of the type. Theoretically, empty base classes can take up
> space after all of the non-static data members. However, as I'll
> demonstrate below, it's very hard to implement the rules of standard layout
> types without making subobjects zero sized in at least some cases.

My question is why can't you consider the base as a sub-object of size 1, and
thus the first member of the derived class is at 2 + alignment?

Because `A`:

struct empty {};
struct A : empty {int i;};

is supposed to be layout compatible with `B`:

struct B {int i;};

`B::i` is the first member subobject of `B`, which is a standard layout type. [class.mem]/24 explicitly requires that `offsetof(B, i)` is zero.

> If a standard-layout class object has any non-static data members, its address is the same as the address of its first non-static data member.

And [class.mem]/20 requires that `B` and `A` are layout compatible (see below). The only way that is possible is if `offsetof(A, i)` is also zero.
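The two offsets can be checked directly. The sketch below uses the `A` and `B` from this reply and nothing beyond `offsetof` and `std::is_standard_layout`:

```cpp
#include <cstddef>
#include <type_traits>

struct empty {};
struct A : empty { int i; };
struct B { int i; };

static_assert(std::is_standard_layout<A>::value, "A is standard layout");
static_assert(std::is_standard_layout<B>::value, "B is standard layout");

// The first non-static data member of a standard-layout class shares the
// address of the class object, so no base storage can precede it.
static_assert(offsetof(B, i) == 0, "B::i starts at the object's address");
static_assert(offsetof(A, i) == 0, "A::i starts at the object's address");
```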

> Also, sharing objects between C and C++ does not (necessarily) require the
> objects to be defined in the same way. While the C++ object model does not
> support having C create C++ objects, if you create a C++ object that
> matches the layout of a C type, then C's object model can *receive* and
> manipulate that object just fine.

I'd say that's extremely fragile and should not be counted on. If you want to
share an object between C and C++, use the C declaration of the type. Anything
else is tempting UB.

I don't understand your thinking here. Standard layout was standardized specifically so that this could be "counted on". So that it won't be "tempting UB"... since it's by definition well-defined behavior.

You're basically saying that we shouldn't trust some part of the standard because you don't agree with it or something. Well, you can do what you want, but the standard guarantees this behavior. So besides appealing to C++98 sensibilities, there's no reason not to rely on it.

Unless the standard begins to *explicitly* say that a given C++ hierarchy must
always have the exact same layout as a C struct. Until that happens, I would
advise not writing code like that and I'd advise to always reject such code in
code reviews.

Remember the layout compatibility rules (or refresh your recollection below). If C++ has an object that is standard layout and is layout compatible with a C object declaration compiled as C++, then the two objects have the same layout. That's what "layout compatible" means.

There's no way it could be more "explicitly" stated than that without breaking the C++ object model.

> So why is it required that:
> > struct empty {};
> >
> > struct A : empty
> > {
> >     int i;
> > };
> >
> > sizeof(A) == sizeof(int)
> > offsetof(A, i) == 0
>
> The second one is required; the first is not. `A` is not required to be
> layout compatible with `int`.

Why is the second required? Why can't struct A be layout-equivalent to

struct Aprime
{
        empty _;
        int i;
};

That would effectively make EBO impossible, which is going in the wrong direction. And it would break any ABI that allocates storage for base classes after the current class's member subobjects. Such ABIs do indeed exist.

> However, `A` is required to be layout compatible with B, C, and D here:
>
> struct other_empty {};
> struct B : other_empty, A {};
> struct C : A, other_empty {};
> struct my_int { int i; };
> struct D : my_int {};
>
> Now, the sizes of B, C, and D are not required by the standard to be the
> same. But you can stick them in a `union` and fetch the same `::i`
> subobject from any of them. And that's good enough.

Again, why? The only one that seems to me you could put in a union and access
A::i is C, because it derives from A, so by definition it follows the initial
sequence requirement.

struct B has a member before A, even if empty.

No, it does not. It has a subobject before `A::i`, but not a member subobject.

Let's look at [class.mem]/12:

> The common initial sequence of two standard-layout struct (Clause 12) types is the longest sequence of non-static data members and bit-fields in declaration order, starting with the first such entity in each of the structs, such that corresponding entities have layout-compatible types and either neither entity is a bit-field or both are bit-fields with the same width.

`other_empty` is not a non-static data member or a bit-field. Neither is `A`. However, `B` does have a non-static data member: `i`, which it inherited from `A`.

So, the sequence of "non-static data members and bit-fields" in `B` is: `int i`.

The sequence of "non-static data members and bit-fields" in `A` is: `int i`.

These sequences are the same. Therefore, the common initial sequence comprises all "non-static data members and bit-fields" of both types. Therefore, from [class.mem]/21:

> Two standard-layout struct (Clause 12) types are layout-compatible classes if their common initial sequence comprises all members and bit-fields of both classes (6.9).

`A` is a standard layout type. `B` is a standard layout type. `A` and `B` have common initial sequences that "comprises all members and bit-fields of both classes".

Therefore, `A` and `B` are layout compatible.

QED.
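The same argument can be written down as code. This is a sketch under the definitions quoted earlier in the thread; the union `any_of` and the function `read_through_d` are hypothetical names added for illustration.

```cpp
#include <cstddef>
#include <type_traits>

struct empty {};
struct other_empty {};
struct A : empty { int i; };
struct B : other_empty, A {};
struct C : A, other_empty {};
struct my_int { int i; };
struct D : my_int {};

// Only one class in each hierarchy contributes members.
static_assert(std::is_standard_layout<B>::value, "B is standard layout");
static_assert(std::is_standard_layout<C>::value, "C is standard layout");
static_assert(std::is_standard_layout<D>::value, "D is standard layout");

// In every case the sequence of non-static data members is just `int i`,
// and it sits at the start of the object.
static_assert(offsetof(B, i) == 0, "B::i at offset zero");
static_assert(offsetof(C, i) == 0, "C::i at offset zero");
static_assert(offsetof(D, i) == 0, "D::i at offset zero");

union any_of
{
  A a;
  B b;
  C c;
  D d;
};

// Writes `i` through one member and reads it through another, relying on
// the common initial sequence rule for standard-layout structs in a union.
int read_through_d(int value)
{
  any_of u;
  u.a.i = value;
  return u.d.i;
}
```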
 
struct D has a wholly different hierarchy.

See above.

I guess this is the interesting
one, because if we say that the initial sequence only applies to non-static
data members and does not include base sub-objects, then the only member in
both A and D is "int i". But even if that's the case (and I'd need specific
wording in the standard saying it is), it doesn't mean the bases must be
empty.

As you said yourself:

> You are correct that the standard does not require that these objects are
> the same size as one another. That an implementation can assign space for
> these base classes at the end of the type.

Then they are not empty, are they?

Actually, I was wrong. All of the empty base classes are zero sized. They have to be. Why? Because of layout compatibility's recursive nature.

Consider the following types:

struct empty {};
struct H { int i; };
struct J : empty { int i; };

struct A
{
  H h;
  int a;
};

struct B
{
  J j;
  int b;
};

Are `A` and `B` layout compatible?

If you break each one down into its sequence of members, per [class.mem]/20, you have to match `A::h` up against `B::j`. So, the standard asks if `H` and `J` are layout-compatible types.

And they are.

Therefore, implementations are not allowed to add padding to the end of `J`. Otherwise, `A::a` and `B::b` would not line up, unless the compiler were magical and somehow knew that `A::h` needed padding afterwards simply because `B::j` exists. Therefore, `sizeof(J)` must be the same as `sizeof(H)`.

So I was wrong; standard layout does completely enforce EBO.
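The consequence is checkable with two assertions over the types defined above; nothing here goes beyond `sizeof` and `offsetof`.

```cpp
#include <cstddef>

struct empty {};
struct H { int i; };
struct J : empty { int i; };

struct A
{
  H h;
  int a;
};

struct B
{
  J j;
  int b;
};

// If J carried trailing storage for its empty base, these would differ.
static_assert(sizeof(J) == sizeof(H), "J has no extra storage for its empty base");

// A and B are layout compatible, so their second members must line up.
static_assert(offsetof(A, a) == offsetof(B, b), "A::a and B::b line up");
```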

Nicol Bolas

Dec 7, 2017, 10:00:54 PM12/7/17
to ISO C++ Standard - Future Proposals
On Thursday, December 7, 2017 at 5:15:35 PM UTC-5, Richard Smith wrote:
On 6 December 2017 at 14:39, Nicol Bolas <jmck...@gmail.com> wrote:
It's not fixing the problem it claims to because it lacks the guarantees that standard layout provides with EBO.

So I make the claim that without some kind of guarantee of optimization, this feature fails to achieve its goals.

I disagree. Its goal is to provide a superior alternative to EBO, and in my view, it achieves that.

But it is not a strict superset of EBO. That is, it has advantages in usability, but you pay for that by not knowing if it will actually be optimized away.
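To make the trade-off concrete, here is a sketch of the two alternatives from the original post, assuming a compiler that accepts `[[no_unique_address]]`. The accessor `value()` and the default member initializer are additions made here so the private member counts as used; they are not part of the original `S`.

```cpp
#include <type_traits>

struct empty {};

// Base-class version: for an empty T this is standard layout.
template<typename T>
class S : private T
{
public:
  int value() const { return my_member; }

private:
  int my_member = 0;
};

// Member version from P0840: whether `t` takes space is up to the
// implementation.
template<typename T>
struct U
{
  [[no_unique_address]] T t;
  int my_member;
};

static_assert(std::is_standard_layout<S<empty>>::value, "S<empty> is standard layout");
static_assert(sizeof(S<empty>) == sizeof(int), "the empty base takes no space");

// Only a lower bound can be stated for U without relying on the implementation.
static_assert(sizeof(U<empty>) >= sizeof(int), "U<empty> holds at least an int");
```

The asymmetry between the last two assertions is the point: equality can be asserted for `S<empty>`, while for `U<empty>` only the lower bound is safe to write down.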

If the extent of template metaprogramming has taught me anything, it is that C++ programmers have shown a willingness to suffer through unimaginable pain to gain an optimization.

An implementation is (as ever) permitted to use stupid layout for structs -- inserting arbitrarily large amounts of padding for no reason is a valid implementation technique, and remains so in the presence of this feature. You're right that standard-layout structs do give an EBO guarantee in some cases, thereby disallowing lead padding, but an implementation can still pad out the *end* of the class with space for the empty base class if it wants (and indeed implementations actually do that in some cases).

Actually they can't; I explained how the standard forbids it at the bottom of my reply to Thiago. And those implementations that add that padding at the end are defective.

Also, in generic code, you often cannot ensure that the class for which you would like to perform EBO is a standard-layout class, so special guarantees that apply only for standard-layout structs are often irrelevant.

"Often"? I think you overestimate how many non-standard-layout types are running around.

However, I think it would be reasonable to provide stronger layout guarantees for standard layout types when this attribute is used; the feature has not yet been discussed in CWG, and this seems like a reasonable thing to consider at that point. It's certainly in line with the design intent to provide the same guarantees for fields with this attribute as for base classes in the standard-layout case.

That's basically my whole point: that this feature ought to provide the same guarantees as standard layout base classes. Or rather, if it doesn't provide the same guarantee, then you give people incentive to keep using base classes despite the potential problems.

So, what would it take to fix this? Well, one big annoyance with P0840 is that it is defined as an attribute. And it stakes out its position on the validity of defining such a thing the idea that, so long as the code functions the same whether the attribute is implemented or not, then it's fine to use an attribute.

Following that rule (which, let's be frank, only exists as an expedient means of getting the proposal through the committee, not because it makes the feature better in some way)

This parenthetical is incorrect. Modeling this feature as an attribute makes it a conforming extension in prior language modes.

... that's very odd. I'm somewhat torn on how useful that kind of forward compatibility will be. After all, if this space optimization were that crucial to me, wouldn't I want to use the mechanism that the version of the language my code is written against requires, rather than the one that may well be ignored?

 
We did consider in committee the possibility of a keyword instead of an attribute, but as you can tell from the progress of the paper, the attribute approach won the day.
 
makes it impossible to ensure layout compatibility to between `S<T>` and `U<T>`. But that doesn't mean we cannot have some reasonable guarantees on whether `U<T>::t` is zero-sized.

When I was thinking about my own zero-sized idea, discussed here, I came upon the whole "unique identity rule" issue. Namely, that every object of the same type T must have a unique address. None of my solutions to that issue ever really pleased me. This proposal kind of sidesteps that problem, since the attribute is effectively saying "this subobject does not have to be uniquely identified, so it may be zero sized", rather than my approach of "this subobject is zero sized and therefore may not be uniquely identified".

Of course, that's part of the problem: users don't actually care about unique identity; they care about the size.

I brought up my idea because, while investigating it, I happened upon a solution that seems to dovetail very closely with this proposal. But first, let's look at the scope of the problem. Let's say you have this type:

struct empty {};

struct A
{
  [[no_unique_address]] empty x;
  empty y;
};

While `x` does not have to have a unique address, `y` does. And therefore `y`'s address must be unique from `x`'s. And since we're not allowed to reorder members, the compiler must include space for `x`.

Minor correction: we actually are allowed to reorder [[no_unique_address]] members.

I see nothing in P0840R1 that changes [class.mem]/17, which is what specifies the order that members are allocated in. Without such a change, reordering is not permitted.

sizeof(A) must still be at least 2, because the two 'empty' members must have distinct addresses, but the 'x' member can be in A's tail padding and thus reusable. However, it's desirable for an initial [[no_unique_address]] member to result in the same layout as a base class, so that there is no ABI change in moving from the old solution to the new one. In practice, that means the above struct will likely be laid out with the 'x' member at offset 0 and the 'y' member at offset 1.

I agree that this would be ideal. However, the rules P0840R1 specifies don't agree. In addition to the lack of changes to [class.mem]/17, P0840 makes changes to [class.mem]/21, which governs layout compatibility. There, `no_unique_address`-qualified members are not ignored; they are explicitly required to be defined in both types for them to be layout compatible.

Naturally, if the committee likes the idea of enforced `no_unique_address` optimization for standard layout types, then I would say that the standard should change the layout compatibility rules to take advantage of it. However, the problem there is that P0840 is very clearly designed to avoid doing things such that removing the `no_unique_address` attribute would change the visible behavior of the program. As it currently stands, two types that are layout compatible with the attribute are still layout compatible without it.

If we do what I just suggested, this would no longer be the case.

Now of course, the ABI is not defined by layout compatibility; two incompatible types may use the same ABI definition. But C++ doesn't enforce it, so you wouldn't be able to put such types in unions and access each others' subobjects.


Thiago Macieira

Dec 8, 2017, 2:01:51 AM12/8/17
to std-pr...@isocpp.org
On Thursday, 7 December 2017 18:31:13 PST Nicol Bolas wrote:
> > blame
> > Microsoft for not adding __has_include until VS 15.3 (there was a thread
> > even
> > in std-discussions about that).
>
> ... that's not even the same kind of thing. `__has_include` is not an
> optional feature of C++17. It was going to happen in VS, sooner or later.

But was optional in C++11 and 14.

> [class.mem]/14 confounds you:
> > Non-static data members of a (non-union) class with the same access
>
> control (Clause 14) are allocated so
> that later members have higher addresses within a class object.
>
> And you cannot violate that requirement, since `no_unique_address` is an
> attribute.

And probably another reason why it shouldn't be an attribute.

> And the reasoning P0840 gave for allowing it to be an attribute
> rather than a keyword is that removing it means that the program still
> functions the "same", without affecting visible behavior (again, this is a
> lie, but whatever). Reordering `no_unique_address` members would be
> observable defined behavior; it would affect the result of `offsetof`, for
> example.

And it doesn't now? The fact that there are two members with the same address,
even if different types, is observable behaviour. And as I pointed out in the
previous reply, a struct with no_unique_address members should not be standard
layout, so another observable behaviour.

> > In any case, why must this restriction apply to [[no_unique_address]] ? If
> > we
> > don't want a unique address in the first place, why must it still be
> > unique
> > among sub-objects of the same type?
>
> Because C++ requires that every object of type T have a distinct address
> from every other object. That's [intro.object]/8.
>
> Now yes, `no_unique_address` certainly *sounds like* it shouldn't have to
> follow that rule. But that's how they wrote their modified version of
> [intro.object]/8. But the reason why we can't just change that is because
> any particular `T` *might* actually rely on it.

Right, I understand that. It makes sense, so I'll drop this part.

> > I never understood that either. If the base and the first member aren't
> > empty
> > classes, they obviously have different addresses. Why isn't that standard
> > layout?
>
> Because that means that there are two classes in the hierarchy that are
> contributing members. And standard layout forbids that.

In other words, that has nothing to do with the fact that they are of the same
type. And since we said that two instances of the same type shouldn't have the
same address, then an empty base followed by an empty member of the same type
can't be allocated at the same address.

> If they are empty but no [[no_unique_address]] is in effect, then they have
> > different addresses. Why isn't that standard layout?
>
> Because the only way to give them different addresses is to have the
> compiler assign space to the empty base class. And the ABI for that
> platform might put that space in front of the members.

Sorry, I didn't follow you. [but moot point]

The only way to give them different addresses is to give them different
addresses. That works for both base-before-current as well as base-after-
current.

The base won't be empty but...

> Lastly, we *want* standard layout types to enforce EBO.

That's putting the cart ahead of the oxen. Sure, we want EBO for efficiency,
but that's a nice side-effect whenever possible. Hence "optimisation".

But even if you don't get EBO when the base and the first member are of the
same type, why does that make the type not standard-layout? [see below, I've
answered myself]

> Also, this would break layout compatibility, since standard layout types
> have to be layout compatible with any types that have the same sequence of
> member subobjects. And if you assign space to empty bases before the first
> member, then their layouts are not compatible.

They are, but offset.

Which goes back to "only one class in the hierarchy can contribute members".
If there's a need for an offset, then the base has contributed members and
hence it's not standard-layout anymore.

So now I have to question why we need only one class in the hierarchy to
contribute members.

> > By the way, note how [[no_unique_address]] implies non-standard layout.
>
> It implies no such thing. Indeed, it changes *nothing* about the standard
> layout rules; the specification is very clear on that.

The specification can claim all it wants but it doesn't make it so.

If a member has no unique address, how can this be of standard-layout? Such a
thing cannot exist in C, so it's not possible to follow the layout rules for C
in the first place since there are no rules for this any more than there are
for virtual tables.

> > My question is why can't you consider the base as a sub-object of size 1,
> > and
> > thus the first member of the derived class is at 2 + alignment?
>
> Because `A`:
>
> struct empty {};
> struct A : empty {int i;};
>
> is supposed to be layout compatible with `B`:
>
> struct B {int i;};
>
> `B::i` is the first member subobject of `B`, which is a standard layout
> type. [class.mem]/24 explicitly requires that `offsetof(B, i)` is zero.

Why must it be layout compatible? I would want it to be, but why must it be?
[also answered below]

> > I'd say that's extremely fragile and should not be counted on. If you want
> > to
> > share an object between C and C++, use the C declaration of the type.
> > Anything
> > else is tempting UB.
>
> I don't understand your thinking here. Standard layout was standardized
> specifically so that this could be "counted on". So that it won't be
> "tempting UB"... since it's by definition well-defined behavior.

Still fragile and deserves a negative vote on code review. Most reviewers will
look at such code and see a reinterpret_cast between unrelated types. They
will remember the type-punning warnings and not the "initial common sequence"
rules (cf. sockaddr and sockaddr_in). Moreover, they will see a C++ type being
passed to C.

What's wrong with using the actual C type? What compelling use-cases are there
for doing the reinterpret_cast between any of the A, B, C, D, or my_int types
that you had in your email (my_int being the only one that the C code could
ever use)?

> You're basically saying that we shouldn't trust some part of the standard
> because you don't agree with it or something. Well, you can do what you
> want, but the standard *guarantees* this behavior. So besides appealing to
> C++98 sensibilities, there's no reason not to rely on it.

Well, I can call the feature ill-advised.

But I get your point that this was the intention, therefore it is today
required and therefore those types need to be layout compatible.

> Unless the standard begins to *explicitly* say that a given C++ hierarchy
>
> > must
> > always have the exact same layout as a C struct. Until that happens, I
> > would
> > advise not writing code like that and I'd advise to always reject such
> > code in
> > code reviews.
>
> Remember the layout compatibility rules (or refresh your recollection
> below). If C++ has an object that is standard layout and is layout
> compatible with a C object declaration compiled as C++, then the two
> objects *have the same layout*. That's what "layout compatible" means.

Right.

> > > The second one is required; the first is not. `A` is not required to be
> > > layout compatible with `int`.
> >
> > Why is the second required? Why can't struct A be layout-equivalent to
> >
> > struct Aprime
> > {
> > empty _;
> > int i;
> > };
>
> That would effectively make EBO impossible, which is going in the wrong
> direction. And it would break any ABI that allocates storage for base
> classes after the current class's member subobjects. Such ABIs do indeed
> exist.

I'm not asking for ABIs to change the way they currently do things. I am
asking why is it required that they implement this way and no other way. Or,
asked differently, why wouldn't an ABI that allocated as above be possible?
[keep reading, answered below]

> No, it does not. It has a *subobject* before `A::i`, but not a *member*
> subobject.
>
> Let's look at [class.mem]/12:
> > The common initial sequence of two standard-layout struct (Clause 12)
...
> Therefore, `A` and `B` are layout compatible.

Reluctantly agreed. I get that this is how it is, but I reserve my right to
disagree with the rationale for having made it so in the first place.

> So I was wrong; standard layout does *completely enforce* EBO.

I agree with your logic.

I don't have to like it.

Thiago Macieira

Dec 8, 2017, 2:11:10 AM12/8/17
to std-pr...@isocpp.org
On Thursday, 7 December 2017 19:00:53 PST Nicol Bolas wrote:
> "Often"? I think you overestimate how many non-standard-layout types are
> running around.

In my experience, the vast majority of types in C++ are not standard-layout.

That said, this population may not be relevant. We need to know how many types
that would be used with [[no_unique_address]] are standard-layout, non-final,
don't have unnecessary "using" clauses, and don't need to be used multiple
times by the same container.

> ... that's very odd. I'm somewhat torn on how useful that kind of forward
> compatibility will be. After all, if this space optimization were that
> crucial to me, wouldn't I want to use the mechanism that the version of the
> language my code is written against requires, rather than the one that may
> well be ignored?

I think the keyword here is "crucial". It's not *that* crucial in most cases.
It's just a nice thing to have, which is probably why we still refer to it as
an optimisation.

I personally would use the attribute knowing that the good compilers have
implemented the attribute, and let my users know to upgrade away from those
that don't. I do this all the time: I feel no problem in providing a worse
experience for compilers that provide worse C++ support and that fail to keep
up with the times, especially when they fail to keep up with free and open
source ones.

floria...@gmail.com

Dec 8, 2017, 6:57:30 AM12/8/17
to ISO C++ Standard - Future Proposals
Hello everybody,

I would like to give my point of view on this subject. I have to warn you: I'm not a standard expert, just a user.

I agree with Nicol that the information about address uniqueness should be embedded in the type itself, and not in its uses.
Indeed, only the author of a type can know how it behaves and whether it is a problem for two different instances to share the same address.

So one could write something like this:
class [[no_unique_address]] empty_t {};
// if you prefer a keyword instead of an attribute:
inline class empty_t {};

class Foo {
  empty_t a, b;
  int c;
};

class Bar {
  int i;
};

In that case, Foo would have the same layout as Bar.

This solves the problem of empty classes, but not the case of members with padding.
So one could extend the meaning of this attribute (or of the inline class, if you prefer):
the padding of an object can overlap other objects wherever they are.

Let's formalize that a bit. (I use the inline keyword to simplify my explanation, but anything could be used here)

An inline class is a class whose tail padding can be safely overlapped with any objects. If the inline class happens to be empty, then its padding can also be used by another instance of the same class, effectively sharing their address.
An inline class would then have two sizes: one size considering tail padding (regular sizeof), and one ignoring tail padding: inline sizeof().
Internal padding is kept as usual.

//examples

// Basic inline class
inline class A {
  double d;
  float f;
};

static_assert(sizeof(A) == 16, "");
static_assert(inline sizeof(A) == 12, "");
static_assert(alignof(A) == 8, "");

// Class with an inline class member
class B {
  A a;
  float f;
};

static_assert(sizeof(B) == 16, "");
static_assert(inline sizeof(B) == 16, "");

static_assert(alignof(B) == 8, "");

// Basic class (not same layout as the inline equivalent)
class C {
  double d;
  float f;
};

static_assert(sizeof(C) == 16, "");
static_assert(inline sizeof(C) == 16, "");

static_assert(alignof(C) == 8, "");

// Class using a regular class member
class D {
  C c;
  float f;
};

static_assert(sizeof(D) == 24, "");
static_assert(inline sizeof(D) == 24, "");

static_assert(alignof(D) == 8, "");

// Inline class with a regular class member
inline class E {
  C c;
  float f;
};

static_assert(sizeof(E) == 24, "");
static_assert(inline sizeof(E) == 20, "");

static_assert(alignof(E) == 8, "");


The inline size is not necessarily a multiple of the alignment and can be any integer >= 0.
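
A rough way to see the two sizes with today's language is to compute where the last member ends. The sketch below only approximates the proposed `inline sizeof`: `data_size_A` is a made-up name, the members are public so that `offsetof` applies, and the concrete values 12 and 16 used in the examples above assume a typical 64-bit ABI, so they are not asserted here.

```cpp
#include <cstddef>

// Same members as the example class A above, written as a plain struct.
struct A {
    double d;
    float f;
};

// Hypothetical stand-in for "inline sizeof(A)": the offset one past the
// last member, i.e. the size of the object without its tail padding.
constexpr std::size_t data_size_A = offsetof(A, f) + sizeof(float);

// These two properties hold on any conforming implementation.
static_assert(data_size_A <= sizeof(A), "members fit inside the object");
static_assert(sizeof(A) % alignof(A) == 0, "sizeof is a multiple of alignof");
```

Whatever lies between `data_size_A` and `sizeof(A)` is the tail padding that an "inline" class would offer to its neighbours.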

I hope these examples are explicit enough to show my idea.

I would say that all those classes (both regular and inline) could be standard-layout, but it could be difficult to specify this concept with padding removal (how would the C equivalent be defined?).
One way would be to flatten the member hierarchy whenever a member is an inline class (which would be consistent with the "inline" keyword).

One should also consider whether "inline" classes should be opt-in or opt-out. The simplest would be opt-in, but opt-out would be more useful: I think most classes throughout the world just don't mind.

What do you think?


Florian

PS: Third send: I don't know why Google keeps deleting my reply

Nicol Bolas

Dec 8, 2017, 2:13:16 PM12/8/17
to ISO C++ Standard - Future Proposals
On Friday, December 8, 2017 at 2:11:10 AM UTC-5, Thiago Macieira wrote:
On Thursday, 7 December 2017 19:00:53 PST Nicol Bolas wrote:
> "Often"? I think you overestimate how many non-standard-layout types are
> running around.

In my experience, the vast majority of types in C++ are not standard-layout.

If you're living in Qt or MFC land, where everything is derived from a single base class that has a virtual function, this won't be very helpful.

But if you look at the C++ standard library or Boost or many other C++ libraries, you will find many types where being standard layout is either the obvious way of implementing them or would not otherwise be a burden. Sure, that's hard for something like `variant` or `optional`, but for `vector`, it's pretty trivial if the allocator is empty.

And indeed, `no_unique_address` usage will make it even easier for types to be standard layout. `vector` implementations that use this for allocators instead of EBO will be standard layout even if the allocator is non-empty.

So long as you're not writing types that use virtual functions, writing standard layout types is not very hard.

That said, this population may not be relevant. We need to know how many types
that would be used with [[no_unique_address]] are standard-layout, non-final,
don't have unnecessary "using" clauses, and don't need to be used multiple
times by the same container.

Look at the C++ standard library. Or Boost. Or most other C++ libraries. Most implementations of those types either are standard layout or could be implemented as such relatively easily.

> ... that's very odd. I'm somewhat torn on how useful that kind of forward
> compatibility will be. After all, if this space optimization were that
> crucial to me, wouldn't I want to use the mechanism that the version of the
> language my code is written against requires, rather than the one that may
> well be ignored?

I think the keyword here is "crucial". It's not *that* crucial in most cases.
It's just a nice thing to have, which is probably why we still refer to it as
an optimisation.

So you're telling me that we call it an optimization, not because that's what we called it before it was required, but because it's just a nice thing to have?

Sorry, not buying it. Etymology and word usage just don't work that way. People call things what they're used to calling them. RVO is still called an "optimization" even though it's required in many cases now. And people will continue to call it that. Not because it's just "a nice thing to have" but because that's its name.

I personally would use the attribute knowing that the good compilers have
implemented the attribute, and let my users know to upgrade away from those
that don't. I do this all the time: I feel no problem in providing a worse
experience for compilers that provide worse C++ support and that fail to keep
up with the times, especially when they fail to keep up with free and open
source ones.

That's fine; you're in a position where that's a reasonable choice to make. Other people aren't. Why shouldn't we help them switch to this functionality? Because that's what we're talking about.

That's the thing: nobody wants to use EBO for these sorts of things (that's not to say that nobody wants EBO). Nobody ever said, "I think it totally makes sense for my `vector` implementation to inherit from the allocator." They don't do it because they want to; they do it because it gives them the optimal memory usage they want.

The whole point of `no_unique_address` is to allow us to get that optimal memory usage, but without the stupidity of inheriting from what is clearly a member. If that's the idea, if it is intended to replace EBO for those uses, then it needs to be as equivalent to EBO as possible. It should be all of the good parts of EBO, but without the stupid parts.

That's what I'm arguing for. You may not think it makes sense that standard layout types guarantee EBO. But given the fact that this guarantee exists, why does it not make sense for `no_unique_address` to provide the same guarantee?
 

Richard Smith

Dec 8, 2017, 2:27:23 PM12/8/17
to std-pr...@isocpp.org
On 7 December 2017 at 19:00, Nicol Bolas <jmck...@gmail.com> wrote:
On Thursday, December 7, 2017 at 5:15:35 PM UTC-5, Richard Smith wrote:
On 6 December 2017 at 14:39, Nicol Bolas <jmck...@gmail.com> wrote:
It's not fixing the problem it claims to because it lacks the guarantees that standard layout provides with EBO.

So I make the claim that without some kind of guarantee of optimization, this feature fails to achieve its goals.

I disagree. Its goal is to provide a superior alternative to EBO, and in my view, it achieves that.

But it is not a strict superset of EBO. That is, it has advantages in usability, but you pay for that by not knowing if it will actually be optimized away.

If the extent of template metaprogramming has taught me anything, it is that C++ programmers have shown a willingness to suffer through unimaginable pain to gain an optimization.

An implementation is (as ever) permitted to use stupid layout for structs -- inserting arbitrarily large amounts of padding for no reason is a valid implementation technique, and remains so in the presence of this feature. You're right that standard-layout structs do give an EBO guarantee in some cases, thereby disallowing lead padding, but an implementation can still pad out the *end* of the class with space for the empty base class if it wants (and indeed implementations actually do that in some cases).

Actually they can't; I explained how the standard forbids it at the bottom of my reply to Thiago. And those implementations that add that padding at the end are defective.

Interesting, this is a defect in the standard. Consider:

struct A {};
struct B : A { int n; };
struct X { B b; int m; };

struct alignas(64) C {};
struct D : C { int n; };
struct Y { D d; int m; };

The offset of m in X and the offset of m in Y are different, because D has tail padding for alignment and B does not.
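
The difference can be checked directly. This is a sketch using the same six types; `offset_in_X` and `offset_in_Y` are names added here for illustration, and it assumes the implementation supports 64-byte alignment for class types, as mainstream compilers do.

```cpp
#include <cstddef>

struct A {};
struct B : A { int n; };
struct X { B b; int m; };

struct alignas(64) C {};
struct D : C { int n; };
struct Y { D d; int m; };

// D inherits the 64-byte alignment of its base C, so its size is rounded
// up to a multiple of 64 even though it only stores one int.
static_assert(alignof(D) >= 64, "D is at least as aligned as its base");
static_assert(sizeof(D) >= 64, "sizeof(D) is a multiple of alignof(D)");

// m comes after the complete member d, so it lands past D's tail padding.
constexpr std::size_t offset_in_X = offsetof(X, m);
constexpr std::size_t offset_in_Y = offsetof(Y, m);
static_assert(offset_in_Y >= sizeof(D), "m follows the whole of d");
```

On common ABIs `offset_in_X` is simply `sizeof(int)`, while `offset_in_Y` is 64, even though `B` and `D` have the same member list.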

Also, in generic code, you often cannot ensure that the class for which you would like to perform EBO is a standard-layout class, so special guarantees that apply only for standard-layout structs are often irrelevant.

"Often"? I think you overestimate how many non-standard-layout types are running around.

However, I think it would be reasonable to provide stronger layout guarantees for standard layout types when this attribute is used; the feature has not yet been discussed in CWG, and this seems like a reasonable thing to consider at that point. It's certainly in line with the design intent to provide the same guarantees for fields with this attribute as for base classes in the standard-layout case.

That's basically my whole point: that this feature ought to provide the same guarantees as standard layout base classes. Or rather, if it doesn't provide the same guarantee, then you give people incentive to keep using base classes despite the potential problems.

Only if their implementation produces unnecessarily-wasteful class layouts. But the standard doesn't dictate that. If you're relying on particular layouts, you're relying on implementation-specific behavior already. Users have no incentives to use EBO if the alternative is at least as good in every case on every real implementation.

Is it accurate to summarize your concern as: some implementation might choose to produce a wasteful class layout, and you as a user would then lack the recourse of telling them they're non-conforming?
 
So, what would it take to fix this? Well, one big annoyance with P0840 is that it is defined as an attribute. And it stakes out its position on the validity of defining such a thing on the idea that, so long as the code functions the same whether the attribute is implemented or not, then it's fine to use an attribute.

Following that rule (which, let's be frank, only exists as an expedient means of getting the proposal through the committee, not because it makes the feature better in some way)

This parenthetical is incorrect. Modeling this feature as an attribute makes it a conforming extension in prior language modes.

... that's very odd. I'm somewhat torn on how useful that kind of forward compatibility will be. After all, if this space optimization were that crucial to me, wouldn't I want to use the mechanism that the version of the language my code is written against requires, rather than the one that may well be ignored?

You presumably would not use this feature if your implementation doesn't support it. But, for instance, it sounds like the maintainer of libstdc++ is interested in using it whenever it's available (even in older language standards), to prevent the name leakage problems introduced by EBO, and falling back to EBO when the feature is not available. Other authors of portable libraries will likely do the same.

We did consider in committee the possibility of a keyword instead of an attribute, but as you can tell from the progress of the paper, the attribute approach won the day.
 
makes it impossible to ensure layout compatibility between `S<T>` and `U<T>`. But that doesn't mean we cannot have some reasonable guarantees on whether `U<T>::t` is zero-sized.

When I was thinking about my own zero-sized idea, discussed here, I came upon the whole "unique identity rule" issue. Namely, that every object of the same type T must have a unique address. None of my solutions to that issue ever really pleased me. This proposal kind of sidesteps that problem, since the attribute is effectively saying "this subobject does not have to be uniquely identified, so it may be zero sized", rather than my approach of "this subobject is zero sized and therefore may not be uniquely identified".

Of course, that's part of the problem: users don't actually care about unique identity; they care about the size.

I brought up my idea because, while investigating it, I happened upon a solution that seems to dovetail very closely with this proposal. But first, let's look at the scope of the problem. Let's say you have this type:

struct empty {};

struct A
{
  [[no_unique_address]] empty x;
  empty y;
};

While `x` does not have to have a unique address, `y` does. And therefore `y`'s address must be unique from `x`'s. And since we're not allowed to reorder members, the compiler must include space for `x`.

Minor correction: we actually are allowed to reorder [[no_unique_address]] members.

I see nothing in P0840R1 that changes [class.mem]/17, which is what specifies the order that members are allocated in. Without such a change, reordering is not permitted.

Thanks for pointing out that wording oversight; it will be corrected. P0840R1 does change the *other* place this rule is specified, which is [expr.rel]/3.2.
 
sizeof(A) must still be at least 2, because the two 'empty' members must have distinct addresses, but the 'x' member can be in A's tail padding and thus reusable. However, it's desirable for an initial [[no_unique_address]] member to result in the same layout as a base class, so that there is no ABI change in moving from the old solution to the new one. In practice, that means the above struct will likely be laid out with the 'x' member at offset 0 and the 'y' member at offset 1.
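
The lower bound on the size is the only part of this that can be stated portably. A minimal sketch, assuming a compiler that at least parses the attribute (it may also ignore it); `distinct_addresses` is a helper name introduced here:

```cpp
#include <cstddef>

struct empty {};

struct A {
    [[no_unique_address]] empty x;
    empty y;
};

// x and y are two distinct subobjects of the same type, so they may not
// share an address; the object therefore needs at least two bytes.
static_assert(sizeof(A) >= 2, "two 'empty' subobjects need distinct addresses");

// Run-time view of the same rule: compares the addresses of the two members.
inline bool distinct_addresses(const A& a) {
    return static_cast<const void*>(&a.x) != static_cast<const void*>(&a.y);
}
```

Which of the two members ends up at offset 0 is left to the implementation; the assertion only captures what every layout has to satisfy.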

I agree that this would be ideal. However, the rules P0840R1 specifies don't agree. In addition to the lack of changes to [class.mem]/17, P0840 makes changes to [class.mem]/21, which governs layout compatibility. There, `no_unique_address`-qualified members are not ignored; they are explicitly required to be defined in both types for them to be layout compatible.

I think you're missing the point. The intent is to *allow* there to be no ABI change, not to *require* there to be no ABI change.
 
Naturally, if the committee likes the idea of enforced `no_unique_address` optimization for standard layout types, then I would say that the standard should change the layout compatibility rules to take advantage of it. However, the problem there is that P0840 is very clearly designed to avoid doing things such that removing the `no_unique_address` attribute would change the visible behavior of the program. As it currently stands, two types that are layout compatible with the attribute are still layout compatible without it.

If we do what I just suggested, this would no longer be the case.

Now of course, the ABI is not defined by layout compatibility; two incompatible types may use the same ABI definition. But C++ doesn't enforce it, so you wouldn't be able to put such types in unions and access each others' subobjects.



Nicol Bolas

Dec 8, 2017, 2:34:23 PM12/8/17
to ISO C++ Standard - Future Proposals
On Friday, December 8, 2017 at 2:01:51 AM UTC-5, Thiago Macieira wrote:
On Thursday, 7 December 2017 18:31:13 PST Nicol Bolas wrote:
> > blame Microsoft for not adding __has_include until VS 15.3 (there was a
> > thread even in std-discussions about that).
>
> ... that's not even the same kind of thing. `__has_include` is not an
> optional feature of C++17. It was going to happen in VS, sooner or later.

But was optional in C++11 and 14.

I scanned through my copy of C++11 and C++14, and neither had any reference to `__has_include`. So it would only be "optional" in the sense that any implementation can add non-standard keywords and so forth.

What it isn't is an optional feature of the standard.

> [class.mem]/14 confounds you:
> > Non-static data members of a (non-union) class with the same access
> > control (Clause 14) are allocated so that later members have higher
> > addresses within a class object.
>
> And you cannot violate that requirement, since `no_unique_address` is an
> attribute.

And probably another reason why it shouldn't be an attribute.

> And the reasoning P0840 gave for allowing it to be an attribute
> rather than a keyword is that removing it means that the program still
> functions the "same", without affecting visible behavior (again, this is a
> lie, but whatever). Reordering `no_unique_address` members would be
> observable defined behavior; it would affect the result of `offsetof`, for
> example.

And it doesn't now? The fact that there are two members with the same address,
even if different types, is observable behaviour. And as I pointed out in the
previous reply, a struct with no_unique_address members should not be standard
layout, so another observable behaviour.

Well... yes. But the "observable behavior" ship sailed the moment P0840 said that `is_empty_v` is implementation defined for types where all of their members are `no_unique_address`.

So really, it's merely a question of how offensive said observable behavior will be. The committee apparently felt that `is_empty` was not sufficiently offensive. As well as the observable changes to pointer relationships. Though again, without the necessary changes to [class.mem] that would allow such reordering.

> > By the way, note how [[no_unique_address]] implies non-standard layout.
>
> It implies no such thing. Indeed, it changes *nothing* about the standard
> layout rules; the specification is very clear on that.

The specification can claim all it wants but it doesn't make it so.

If a member has no unique address, how can this be of standard-layout? Such a
thing cannot exist in C, so it's not possible to follow the layout rules for C
in the first place since there are no rules for this any more than there are
for virtual tables.

That's because you're thinking of "standard layout" as nothing more than "lay it out like C does". That's part of "standard layout", but that's not all that it is. Standard layout encompasses layout compatibility, which is what allows you to access the wrong object from a union.
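
As a concrete illustration of that union rule, here is a small sketch. The types `Circle`, `Square` and `Shape` and the function `kind_of` are invented for this example; the guarantee being relied on is the common-initial-sequence rule for standard-layout structs in a union.

```cpp
#include <type_traits>

// Two standard-layout structs that begin with the same member type.
struct Circle { int kind; float radius; };
struct Square { int kind; float side; };

union Shape {
    Circle circle;
    Square square;
};

static_assert(std::is_standard_layout<Circle>::value, "Circle is standard layout");
static_assert(std::is_standard_layout<Square>::value, "Square is standard layout");

// Reads `kind` through `square` even when `circle` is the active member.
// This is permitted because `kind` belongs to the common initial sequence
// of the two structs.
inline int kind_of(const Shape& s) {
    return s.square.kind;
}
```

None of this depends on how C would lay the structs out; it follows from the C++ layout-compatibility rules alone.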

But again, this goes to the point that this shouldn't be an attribute, and that `no_unique_address` members for standard layout types should not be checked when dealing with layout compatibility.




Nicol Bolas

Dec 8, 2017, 4:31:10 PM12/8/17
to ISO C++ Standard - Future Proposals


On Friday, December 8, 2017 at 2:27:23 PM UTC-5, Richard Smith wrote:
On 7 December 2017 at 19:00, Nicol Bolas <jmck...@gmail.com> wrote:
On Thursday, December 7, 2017 at 5:15:35 PM UTC-5, Richard Smith wrote:
On 6 December 2017 at 14:39, Nicol Bolas <jmck...@gmail.com> wrote:
It's not fixing the problem it claims to because it lacks the guarantees that standard layout provides with EBO.

So I make the claim that without some kind of guarantee of optimization, this feature fails to achieve its goals.

I disagree. Its goal is to provide a superior alternative to EBO, and in my view, it achieves that.

But it is not a strict superset of EBO. That is, it has advantages in usability, but you pay for that by not knowing if it will actually be optimized away.

If the extent of template metaprogramming has taught me anything, it is that C++ programmers have shown a willingness to suffer through unimaginable pain to gain an optimization.

An implementation is (as ever) permitted to use stupid layout for structs -- inserting arbitrarily large amounts of padding for no reason is a valid implementation technique, and remains so in the presence of this feature. You're right that standard-layout structs do give an EBO guarantee in some cases, thereby disallowing lead padding, but an implementation can still pad out the *end* of the class with space for the empty base class if it wants (and indeed implementations actually do that in some cases).

Actually they can't; I explained how the standard forbids it at the bottom of my reply to Thiago. And those implementations that add that padding at the end are defective.

Interesting, this is a defect in the standard. Consider:

struct A {};
struct B : A { int n; };
struct X { B b; int m; };

struct alignas(64) C {};
struct D : C { int n; };
struct Y { D d; int m; };

The offset of m in X and the offset of m in Y are different, because D has tail padding for alignment and B does not.

I think adding a "types must have the same alignment" check to the layout compatibility rules will be sufficient. Or at least, I hope so; I could see the rules getting very complicated very quickly otherwise.

Also, in generic code, you often cannot ensure that the class for which you would like to perform EBO is a standard-layout class, so special guarantees that apply only for standard-layout structs are often irrelevant.

"Often"? I think you overestimate how many non-standard-layout types are running around.

However, I think it would be reasonable to provide stronger layout guarantees for standard layout types when this attribute is used; the feature has not yet been discussed in CWG, and this seems like a reasonable thing to consider at that point. It's certainly in line with the design intent to provide the same guarantees for fields with this attribute as for base classes in the standard-layout case.

That's basically my whole point: that this feature ought to provide the same guarantees as standard layout base classes. Or rather, if it doesn't provide the same guarantee, then you give people incentive to keep using base classes despite the potential problems.

Only if their implementation produces unnecessarily-wasteful class layouts. But the standard doesn't dictate that. If you're relying on particular layouts, you're relying on implementation-specific behavior already. Users have no incentives to use EBO if the alternative is at least as good in every case on every real implementation.

Is it accurate to summarize your concern as: some implementation might choose to produce a wasteful class layout, and you as a user would then lack the recourse of telling them they're non-conforming?

To me, it's not just about looking at a document and telling compiler vendors, "hey, your compiler is wrong". It's about being able to look at a document and say, "hey, my code is right." Being able to know that I have done everything I could to make my code optimal.

If I'm writing code that needs to run across multiple compilers, the ability to know a priori whether something is going to work as optimally as possible on all of them is valuable. It means I don't have to spend time checking each compiler and seeing how it implements the feature. I know that there are rules I can follow to make it work, so I can follow those rules whenever I need it, and can therefore get on with my actual work.

Does inheriting from types cause problems? Absolutely. It's unnatural, creates unpleasant code, can fail with `final` types, can cause name conflicts, etc.

But if my type is otherwise standard layout, inheriting from the type works. I may have to guard against issues and so forth, but I'm not spending a bunch of time researching compilers to see who implements what in which circumstances.

I want to not have to spend time researching compilers or dealing with unnatural code. I shouldn't have to pick one or the other.

Obviously standard requirements are not a practical guarantee (as we've seen variance in implementing EBO with standard layout). But not standardizing such requirements won't provide a better practical guarantee.

I just don't see what is to be gained by not imposing empty member requirements where we already have empty base requirements.

So, what would it take to fix this? Well, one big annoyance with P0840 is that it is defined as an attribute. And it stakes out its position on the validity of defining such a thing on the idea that, so long as the code functions the same whether the attribute is implemented or not, then it's fine to use an attribute.

Following that rule (which, let's be frank, only exists as an expedient means of getting the proposal through the committee, not because it makes the feature better in some way)

This parenthetical is incorrect. Modeling this feature as an attribute makes it a conforming extension in prior language modes.

... that's very odd. I'm somewhat torn on how useful that kind of forward compatibility will be. After all, if this space optimization were that crucial to me, wouldn't I want to use the mechanism that the version of the language my code is written against requires, rather than the one that may well be ignored?

You presumably would not use this feature if your implementation doesn't support it. But, for instance, it sounds like the maintainer of libstdc++ is interested in using it whenever it's available (even in older language standards), to prevent the name leakage problems introduced by EBO, and falling back to EBO when the feature is not available. Other authors of portable libraries will likely do the same.

But you don't have to make it an attribute to allow that.

In order to "fall back to EBO when the feature is not available", you would need to wrap your alternative implementations with a feature test macro. That is, you wouldn't be relying on `no_unique_address` being an attribute; you would be relying on it being an implemented feature.

You'd get the same effect with a keyword.

The attribute mechanism would only matter in cases where you refuse to have an EBO fallback. Where it's not between EMO and EBO; it's between EMO and "take up extra space".
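
To make the fallback concrete, here is roughly what such a library would have to write. This is a sketch, not anything taken from libstdc++: `SKETCH_HAS_NO_UNIQUE_ADDRESS`, `holder` and `empty_alloc` are invented names, and the test relies on the `__has_cpp_attribute` feature-test mechanism being available.

```cpp
#include <type_traits>

// Detect the feature without tripping up preprocessors that lack
// __has_cpp_attribute altogether.
#if defined(__has_cpp_attribute)
#  if __has_cpp_attribute(no_unique_address)
#    define SKETCH_HAS_NO_UNIQUE_ADDRESS 1
#  endif
#endif
#ifndef SKETCH_HAS_NO_UNIQUE_ADDRESS
#  define SKETCH_HAS_NO_UNIQUE_ADDRESS 0
#endif

struct empty_alloc {};  // hypothetical stateless allocator stand-in

#if SKETCH_HAS_NO_UNIQUE_ADDRESS
// Preferred form: the allocator is an ordinary, named member.
template <typename Alloc>
class holder {
    [[no_unique_address]] Alloc alloc_;
    int* data_ = nullptr;
public:
    Alloc& get_alloc() { return alloc_; }
    int* data() const { return data_; }
};
#else
// Fallback: the EBO trick, with the allocator as a private base.
// (Does not work when Alloc is final or is not a class type.)
template <typename Alloc>
class holder : private Alloc {
    int* data_ = nullptr;
public:
    Alloc& get_alloc() { return *this; }
    int* data() const { return data_; }
};
#endif
```

The `#if` is what selects the implementation. The same structure would work unchanged if the feature were spelled as a keyword with its own feature-test macro.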
 
sizeof(A) must still be at least 2, because the two 'empty' members must have distinct addresses, but the 'x' member can be in A's tail padding and thus reusable. However, it's desirable for an initial [[no_unique_address]] member to result in the same layout as a base class, so that there is no ABI change in moving from the old solution to the new one. In practice, that means the above struct will likely be laid out with the 'x' member at offset 0 and the 'y' member at offset 1.

I agree that this would be ideal. However, the rules P0840R1 specifies don't agree. In addition to the lack of changes to [class.mem]/17, P0840 makes changes to [class.mem]/21, which governs layout compatibility. There, `no_unique_address`-qualified members are not ignored; they are explicitly required to be defined in both types for them to be layout compatible.

I think you're missing the point. The intent is to *allow* there to be no ABI change, not to *require* there to be no ABI change.

But there's a difference between ABI equivalence and the layout compatibility rules. Consider the following:

struct A
{
  int i;
};

struct B
{
  [[no_unique_address]] empty e;
  int j;
};

P0840 allows `A` and `B` to share the same ABI. Now, they may not have ABI equivalence, but the proposal allows implementations to make them equivalent.

What is not implementation-dependent is layout compatibility. So even on implementations where they are ABI equivalent, they are not layout compatible.

By contrast, if you inherit from `empty`, you get both ABI equivalence and layout compatibility.
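
Put side by side, the three cases look like this. The struct `C` (the inheriting version) and the constant `b_matches_a` are additions for illustration; the assertions cover only what the standard-layout rules promise, and the implementation-dependent part is deliberately left unasserted.

```cpp
#include <cstddef>
#include <type_traits>

struct empty {};

struct A { int i; };
struct B { [[no_unique_address]] empty e; int j; };
struct C : empty { int j; };  // the EBO version

static_assert(std::is_standard_layout<A>::value, "A is standard layout");
static_assert(std::is_standard_layout<C>::value, "C is standard layout");

// For a standard-layout class, the first non-static data member shares
// the address of the object, so the empty base cannot push j back.
static_assert(offsetof(A, i) == 0, "i is at the start of A");
static_assert(offsetof(C, j) == 0, "j is at the start of C");

// Whether B ends up with the same layout as A is up to the implementation.
constexpr bool b_matches_a =
    sizeof(B) == sizeof(A) && offsetof(B, j) == 0;
```

Even where `b_matches_a` is true, `A` and `B` are still not layout compatible under the P0840R1 wording discussed above, which is the asymmetry being pointed out.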

I don't think it's a good thing to allow EBO to do something that `no_unique_address` can't. If we add rules to enforce empty `no_unique_address` members for standard layout types, then we can change the layout compatibility rules (which already require the two types to be standard layout) to ignore `no_unique_address` members.

This would give us ABI equivalence and layout compatibility, just like the EBO version.

Or if we just can't have direct enforcement, at the very least the standard should say that if an implementation makes all `no_unique_address` members zero-sized, then they are layout compatible. Along with some kind of type-trait to detect this circumstance.

Thiago Macieira

Dec 8, 2017, 4:52:20 PM12/8/17
to std-pr...@isocpp.org
On sexta-feira, 8 de dezembro de 2017 11:34:23 PST Nicol Bolas wrote:
> I scanned through my copy of C++11 and C++14, and neither had any reference
> to `__has_include`. So it would only be "optional" in the sense that any
> implementation can add non-standard keywords and so forth.
>
> What it isn't is an optional feature of the standard.

A matter of interpretation. We considered SD-6 to be a mandatory part of the
implementation. Failing to implement it meant we would fail to detect the
feature.

Thiago Macieira

Dec 8, 2017, 5:17:53 PM12/8/17
to std-pr...@isocpp.org
On sexta-feira, 8 de dezembro de 2017 11:13:16 PST Nicol Bolas wrote:
> On Friday, December 8, 2017 at 2:11:10 AM UTC-5, Thiago Macieira wrote:
> > On Thursday, 7 December 2017 19:00:53 PST Nicol Bolas wrote:
> > > "Often"? I think you overestimate how many non-standard-layout types are
> > > running around.
> >
> > In my experience, the vast majority of types in C++ are not
> > standard-layout.
>
> If you're living in Qt or MFC land, where everything is derived from a
> single base class that has a virtual function, this won't be very helpful.

That's only two thirds of the classes. The value-type classes in Qt, like
QString, QVector, QNetworkProxy, etc. do not derive from a common base and do
not have virtuals.

If they are standard-layout, it's only a side effect of our using private
implementation paradigms, where there's only one member and then quite by
construction all members have the same access level. That's not intentional
and no one should rely on that.

More importantly, the members of the class are the invariant and you should
not inspect them through reinterpret_cast'ing or by placing them in a union.
This particular rule applies to the vast majority of C++ classes. Their being
standard-layout or not is completely irrelevant.

> But if you look at the C++ standard library or Boost or many other C++
> libraries, you will find few types where being standard layout is either
> the obvious way of implementing them or would otherwise not be a burden.
> Sure, that's hard for something like `variant` or `optional`, but for
> `vector`, it's pretty trivial if the allocator is empty.

Nowhere in the standard does it say vector needs to be standard-layout. In
fact, because implementations often make std::vector derive from one or more
base classes, it's quite likely it isn't standard-layout.

libstdc++'s std::vector is empty, deriving from _Vector_base, which has a
single NSDM: a struct with three trivial elements, no access change.
Therefore, this std::vector is standard-layout.

libc++'s std::vector is empty, deriving from __vector_base, which in turn
derives from an empty base __vector_base_common. __vector_base has three
protected members, one of which is __compressed_pair. That, in turn, privately
inherits from two base classes, at least one of which has a private member.
Therefore, libc++'s std::vector is not standard-layout.

I also bet this is the first time anyone tried to see if std::vector is
standard-layout. The information is irrelevant. You shouldn't need to know
this and if you think you need it, then you're doing something wrong.
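
For anyone who wants to repeat the check against their own standard library, a one-line sketch is enough. The name `vector_is_standard_layout` is hypothetical, and the result is whatever the library in use happens to produce; the standard does not specify it.

```cpp
#include <type_traits>
#include <vector>

// Reports whether this library's std::vector<int> happens to be
// standard-layout. The answer may differ between implementations.
constexpr bool vector_is_standard_layout =
    std::is_standard_layout_v<std::vector<int>>;
```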

> And indeed, `no_unique_address` usage will make it even easier for types to
> be standard layout. `vector` implementations that use this for allocators
> instead of EBO will be standard layout even if the allocator is non-empty.
>
> So long as you're not writing types that use virtual functions, writing
> standard layout types is not very hard.

And also completely irrelevant. Who cares if std::vector or QNetworkProxy is
standard-layout? What is it going to be layout compatible with? QNetworkProxy
is layout-compatible with QUrl and QString, but no one would ever think of
using that particular detail.

I'm not saying standard-layout is useless or irrelevant. I'm saying that
standard-layoutness is a goal in itself: when you need it, you design your
class hierarchy to meet it. But for most classes, they are standard-layout
quite by accident.

> > That said, this population may not be relevant. We need to know how many
> > types
> > that would be used with [[no_unique_address]] are standard-layout,
> > non-final,
> > don't have unnecessary "using" clauses, and don't need to be used multiple
> > types by the same container.
>
> Look at the C++ standard library. Or Boost. Or most other C++ libraries.
> Most implementations of those types either are standard layout or could be
> implemented as such relatively easily.

How many of those are guaranteed by specification to be and remain as
standard-layout?

And my question was even more restrictive: how many of those classes are
expected to be used with [[no_unique_address]]? Looking at the entirety of C++
class population, those that are standard-layout, or even those that are
intentionally standard-layout is still biasing the statistic.

> So you're telling me that we call it an optimization, not because that's
> what we called it before it was required, but because it's just a nice
> thing to have?

Yeah, I was.

> Sorry, not buying it. Etymology and word usage just doesn't work that way.
> People call things what they're used to calling them. RVO is still called
> an "optimization" even though it's required in many cases now. And people
> will continue to call it that. Not because it's just "a nice thing to have"
> but because *that's its name*.

As a counter-example, we have "mandatory elision" now, which technically RVO
is a type of. In an ideal world, we'd refer to it by the correct term.

But people still call the Standard Library the "STL" even though it isn't
called that and there are many classes that aren't templates. So, yes, old
habits die hard.

> > I personally would use the attribute knowing that the good compilers have
> > implemented the attribute, and let my users know to upgrade away from
> > those
> > that don't. I do this all the time: I feel no problem in providing a worse
> > experience for compilers that provide worse C++ support and that fail to
> > keep
> > up with the times, especially when they fail to keep up with free and open
> > source ones.
>
> That's fine; you're in a position where that's a reasonable choice to make.
> Other people aren't. Why *shouldn't* we help them switch to this
> functionality? Because that's what we're talking about.

We should help them switch, but I don't think we have to go out of our way to
do so. If an attribute suffices for the functionality and most compilers
implement it (or will soon), that should be it. There's no need to create a
keyword, mostly because it doesn't help much with adoption since people will
have to check #ifdef __cpp_no_unique_address and define their macro anyway.
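
A sketch of the macro dance described above. `__cpp_no_unique_address` is the name used in this thread; the check that C++20 eventually standardized is `__has_cpp_attribute(no_unique_address)`, which is what the sketch uses. `MYLIB_NO_UNIQUE_ADDRESS`, `empty_policy`, and `holder` are hypothetical names.

```cpp
#include <type_traits>

// Expand to the attribute only where the implementation reports support.
#if defined(__has_cpp_attribute)
#  if __has_cpp_attribute(no_unique_address)
#    define MYLIB_NO_UNIQUE_ADDRESS [[no_unique_address]]
#  endif
#endif
#ifndef MYLIB_NO_UNIQUE_ADDRESS
#  define MYLIB_NO_UNIQUE_ADDRESS  // fallback: plain member, no space saving
#endif

struct empty_policy {};

struct holder {
    MYLIB_NO_UNIQUE_ADDRESS empty_policy policy;
    int value;
};

// Holds on every implementation, whether or not the attribute took effect.
static_assert(sizeof(holder) >= sizeof(int), "policy may or may not take space");
```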

In any case, I do think we need a keyword but for other reasons.

> That's what I'm arguing for. You may not think it makes sense that standard
> layout types guarantee EBO. But given the fact that this guarantee exists,
> why does it not make sense for `no_unique_address` to provide the same
> guarantee?

It's a side-effect guarantee, not an intentional part.

bastie...@gmail.com

Dec 9, 2017, 2:49:52 AM12/9/17
to ISO C++ Standard - Future Proposals
On Thursday, December 7, 2017 at 9:03:44 AM UTC+1, Corentin wrote:

To me the issue is the idea that the compiler can ignore any and every attribute.

I'd argue that attributes should be allowed to change the semantics of a program, on a per-attribute basis.
[[no_unique_address]] is a perfect example. It is arcane enough that it does not warrant a new keyword or the repurposing of an existing one (like register), and it acts like a kind of qualifier. For the reason you pointed out, there is an expectation that this attribute has unconditional semantic significance, and enforcing that in the program seems reasonable.

I'm not, by any means, suggesting that all future keywords be implemented in terms of attributes. The decision to introduce attributes that are not discardable should be made on a per-attribute basis.

I'm working on a set of rules:
  • Existing standard attributes are ignorable.
  • Compiler- and tooling-provided attributes are always ignorable.
  • The committee may decide in the future to introduce attributes that are not ignorable and carry semantic meaning. Compilers that ignore those attributes are non-conformant. Adding or removing such attributes may change the meaning of the program.
  • Whether future attributes are ignorable will be decided on a per-attribute basis.

That would let the committee decide what the best semantics for [[no_unique_address]] are.
We can then build upon that to introduce user-defined attributes and attribute reflection,
the idea being that user-defined attributes are always namespaced so that the standard can keep adding new attributes without running into name collisions.

I agree, you should make a new thread about it.
That being said [[no_unique_address]] is not the best attribute to bring that up, seeing the disagreement on the importance of a layout side-effect.
Also, your ruleset is not really required, as those rules are already implicitly written into the standard.

6 For an attribute-token (including an attribute-scoped-token) not specified in this document, the behavior is
implementation-defined. Any attribute-token that is not recognized by the implementation is ignored. [ Note:
Each implementation should choose a distinctive name for the attribute-namespace in an attribute-scoped-token.
—end note ]

And for the existing attributes, the standard uses terms like "may", "should" or "might" to describe their behaviour.

As Nicol noted in the past: 
You won't find it in the standard. It's a philosophy of what the committee decides to make attributes vs. real syntax. The dividing line is generally this: if you remove the attribute, will the code that compiled with those attributes still compile and have the same meaning? If the answer is "no", then it shouldn't be an attribute.

So you'd have to convince the committee to change its mind on that instead of doing a normal proposal.
Personally, I believe that the existence of such a meta-rule is quite problematic and doesn't seem well motivated.

Corentin

Dec 9, 2017, 4:50:04 AM12/9/17
to std-pr...@isocpp.org
That being said [[no_unique_address]] is not the best attribute to bring that up, seeing the disagreement on the importance of a layout side-effect.

I was pointing out that I agree with Nicol that a variable declared to not have a unique address should always be optimally packed and that I don't think that that precludes using an attribute.
However, there are indeed 2 discussions to be had.
  • How would such a variable behave, and what would the semantics of sizeof, is_empty, etc. be?
  • What syntax should be used?
The second question should not constrain the first. Once everyone agrees on the semantics, we will be better equipped to decide whether an attribute makes sense or not.
But constraining the semantics in a way that makes the feature less reliable and clunkier, for the sake of honoring the current philosophical view of what an attribute should be, seems counterproductive to me.

I have taken note of your other remarks on attributes and thank you for them.
However, as you said, it's a longer conversation that deserves its own discussion.

Corentin.    



Nicol Bolas

Dec 9, 2017, 10:50:53 AM12/9/17
to ISO C++ Standard - Future Proposals, floria...@gmail.com
On Friday, December 8, 2017 at 6:57:30 AM UTC-5, floria...@gmail.com wrote:
Hello everybody,

I would like to give my point of view on this subject. I have to warn you: I'm not a standard expert, just a user.

I agree with Nicol that the information about address uniqueness should be embedded in the type itself, and not in its uses.

That's slightly different from what I said. Well, that statement is similar to one I made, but that's not enough of what I was saying.

One thing I realized when I was writing my idea was that there needs to be a distinction between "I want this subobject to take up no space" and "this type intrinsically is without state". The former is something you do on a case-by-case basis, but doing that does not mean that the object loses unique identity. The compiler simply chooses not to assign storage for it so long as it doesn't violate unique identity to do so.

The latter is a stronger statement that ensures that all uses of the type as a subobject (whether base or member, whether standard layout or not) will not take up space in the layout.

If you look at the history of my paper, you find that the first revision used `stateless` to refer to both things. I quickly realized (after conversations here) that I was really talking about two different-though-related things. So I moved to having `stateless` mean that the type is without state, and `zero_sized` mean that the subobject should not have size in the object's layout. `stateless` implies `zero_sized`, but it also exempts the compiler from having to give `stateless` objects unique identity. And therefore, `stateless` types used as member or base class subobjects never disturb the object's layout.

`stateless` types also aren't allowed to have non-`stateless` subobjects.
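
The unique-identity constraint mentioned above can be illustrated with the attribute as proposed. This is a sketch under the assumption that the compiler accepts `[[no_unique_address]]`; `tag`, `two_tags`, `tags_are_distinct`, and `check_identity` are hypothetical names. Two subobjects of the same type still need distinct addresses, so the second one cannot simply vanish.

```cpp
#include <cassert>

struct tag {};

struct two_tags {
    [[no_unique_address]] tag a;
    [[no_unique_address]] tag b;
};

// Both members have type `tag`, so they must not share an address.
inline bool tags_are_distinct() {
    two_tags t;
    return &t.a != &t.b;
}

// Distinct addresses inside one object imply at least two bytes.
static_assert(sizeof(two_tags) >= 2, "same-type members keep unique identity");

inline void check_identity() {
    assert(tags_are_distinct());
}
```

This is the gap a `stateless`-style type property would close: the attribute alone removes the storage requirement only where identity is not violated.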

Nicol Bolas

Dec 9, 2017, 12:38:36 PM12/9/17
to ISO C++ Standard - Future Proposals
None. But that's not my point.

Your statements seemed to suggest that you think standard layout types are a minority of types. That there really weren't very many out there, and therefore making guarantees about their behavior is pointless. I'm trying to show that there are a lot of standard layout types out there. And therefore, making guarantees about them is not pointless.

You may see it as an implementation detail, but the point remains: people write standard layout types all the time.

Indeed, the more guarantees we can make about them, the more incentive people have to be conscious of it, to deliberately make their types standard layout so that they can enjoy those benefits.

And my question was even more restrictive: how many of those classes are
expected to be used with [[no_unique_address]]?

Your question is invalid. It's not up to the creator of a class to decide when it may or may not be used alongside `no_unique_address` types. Observe:

struct mine
{
  a_type val;
  [[no_unique_address]] some_type t;
};

The writer of `a_type` did not decide to use it alongside a `no_unique_address` type. But that still happened all-the-same.

In cases like this, it's not a question of whether `a_type` is certainly standard layout. After all, if that type were `vector<int>`, we don't know if it's standard layout or not. What matters is that it could be. And if an implementation of `vector` happens to be standard layout, and the standard has guarantees about `no_unique_address` subobjects in standard layout types, then I know that `t` takes up no space in `mine`.

And therefore, I get the same guarantee as I would for this code:

struct mine_old : some_type
{
  std::vector<int> ivec;
};

And that's what I want: to have the same guarantees for EMO as I do for EBO.

Why is that so much to ask?

Looking at the entirety of C++
class population, those that are standard-layout, or even those that are
intentionally standard-layout is still biasing the statistic.

> So you're telling me that we call it an optimization, not because that's
> what we called it before it was required, but because it's just a nice
> thing to have?

Yeah, I was.

> Sorry, not buying it. Etymology and word usage just doesn't work that way.
> People call things what they're used to calling them. RVO is still called
> an "optimization" even though it's required in many cases now. And people
> will continue to call it that. Not because it's just "a nice thing to have"
> but because *that's its name*.

As a counter-example, we have "mandatory elision" now, which technically RVO
is a type of. In an ideal world, we'd refer to it by the correct term.

How is that a counterexample? More people are familiar with the term RVO than with the term "elision", guaranteed or not. And indeed, the term RVO was never used in the standard; it was always called "elision", so the standard never referred to it as an "optimization".

So you're basically agreeing with my premise that whether it's called an "optimization" is mere semantics, not some kind of definitive statement about the inherent nature of some functionality.

But people still call the Standard Library the "STL" even though it isn't
called that and there are many classes that aren't templates. So, yes, old
habits die hard.

> > I personally would use the attribute knowing that the good compilers have
> > implemented the attribute, and let my users know to upgrade away from
> > those
> > that don't. I do this all the time: I feel no problem in providing a worse
> > experience for compilers that provide worse C++ support and that fail to
> > keep
> > up with the times, especially when they fail to keep up with free and open
> > source ones.
>
> That's fine; you're in a position where that's a reasonable choice to make.
> Other people aren't. Why *shouldn't* we help them switch to this
> functionality? Because that's what we're talking about.

We should help them switch, but I don't think we have to go out of our way to
do so. If an attribute suffices for the functionality and most compilers
implement it (or will soon), that should be it. There's no need to create a
keyword, mostly because it doesn't help much with adoption since people will
have to check #ifdef __cpp_no_unique_address and define their macro anyway.

In any case, I do think we need a keyword but for other reasons.

But without guarantees, why does it need to be a keyword? If adding it has no (or trivial) semantic effects, what's the point of making a keyword for it? If using it is merely "a nice thing to have" rather than having real, genuine effects, then it's not worth having a keyword for it.

> That's what I'm arguing for. You may not think it makes sense that standard
> layout types guarantee EBO. But given the fact that this guarantee exists,
> why does it not make sense for `no_unique_address` to provide the same
> guarantee?

It's a side-effect guarantee, not an intentional part.

Is it? The original proposal doesn't seem to agree; it explicitly recognizes that the rules it lays out guarantee EBO, and it recognizes this as a positive aspect of the proposal.

It was most certainly not an accident. It may not have been the entire purpose, but it's hardly a mere "side-effect".

And even if it was a side-effect... so what? Side-effects are what allow `std::enable_if_t` to work. We treat "side-effects" as genuine features of the language all the time.

Personal dislike of a feature is not sufficient reason for pretending that it doesn't exist.

Thiago Macieira

Dec 9, 2017, 1:14:23 PM12/9/17
to std-pr...@isocpp.org
On sábado, 9 de dezembro de 2017 09:38:36 PST Nicol Bolas wrote:
> > How many of those are guaranteed by specification to be and remain as
> > standard-layout?
>
> None. But that's not my point.
>
> Your statements seemed to suggest that you think standard layout types are
> a minority of types. That there really weren't very many out there, and
> therefore making guarantees about their behavior is pointless. I'm trying
> to show that there are *a lot* of standard layout types out there. And
> therefore, making guarantees about them is not pointless.

I disagree.

Unless a type is intentionally standard-layout, then you shouldn't rely on it,
because it may change in the next version or keep flipping back and forth
depending on some external parameter. It may differ from implementation to
implementation, architecture to architecture.

I'd much rather have a solution that worked for all types than to only work
for those that are standard-layout. And as we've seen, the attribute solution
does have other benefits too.

> You may see it as an implementation detail, but the point remains: people
> write standard layout types *all the time*.

I'm not disputing they do. I'm saying they didn't set out with that in mind.

> Indeed, the more guarantees we can make about them, the more incentive
> people have to be conscious of it, to deliberately make their types
> standard layout so that they can enjoy those benefits.

And I think we should not do this or make it an incentive. People designing
for the side-effect creates perverse consequences, where types are forced into
a mold they had no business using. For example, the use of protected: a class
could have a protected member and a private one, but since that breaks the
standard-layoutness of the type, the author could decide to make both
protected or both private plus add a friend. In one solution you expose too
much to another derived class, whereas in the other you expose too little.

So, no, standard-layout should be a goal in itself, used only where it's
needed.

> > And my question was even more restrictive: how many of those classes are
> > expected to be used with [[no_unique_address]]?
>
> Your question is invalid. It's not up to the creator of a class to decide
> when it may or may not be used alongside `no_unique_address` types. Observe:
>
> struct mine
> {
> a_type val;
> [[no_unique_address]] some_type t;
> };
>
> The writer of `a_type` did not decide to use it alongside a
> `no_unique_address` type. But that still happened all-the-same.

I'm not going to get into the discussion of whether the class definition or
the class use point should get the attribute. That was not the intent of my
question.

My intent was to ask which classes are likely to be used with the attribute. I
was not making any implications about whether it's a good idea to use it or
not, just asking what the population is.

> In cases like this, it's not a question of whether `a_type` is certainly
> standard layout. After all, if that type were `vector<int>`, we don't know
> if it's standard layout or not. What matters is that it *could be*. And if
> an implementation of `vector` happens to be standard layout, and the
> standard has guarantees about `no_unique_address` subobjects in standard
> layout types, then I *know* that `t` takes up no space in `mine`.
>
> And therefore, I get the same guarantee as I would for this code:
>
> struct mine_old : some_type
> {
> std::vector<int> ivec;
> };

This type has no guarantee of EBO, since it isn't guaranteed to be standard-
layout. But it might be, in which case EBO becomes mandatory. So in a certain
percentage of implementations, mine_old is known to have EBO. In others,
whether some_type occupies effectively zero space is up to the ABI.

How is that different from the case with [[no_unique_address]]? On some
implementations, it will apply, in others it won't.
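
One hedged way to see which situation a given implementation is in, using the type from the quoted example. `some_type` is assumed to be an empty class here, and `ebo_is_guaranteed`, `base_takes_no_leading_space`, and `check_guarantee` are hypothetical names. When `mine_old` is standard-layout, the object and its first member are pointer-interconvertible, so the address comparison has to hold; otherwise the result is up to the ABI.

```cpp
#include <cassert>
#include <type_traits>
#include <vector>

struct some_type {};  // assumed empty for this sketch

struct mine_old : some_type {
    std::vector<int> ivec;
};

// Whether the guarantee applies depends on the library's std::vector.
constexpr bool ebo_is_guaranteed = std::is_standard_layout_v<mine_old>;

// True when the first member starts at the same address as the object,
// i.e. the empty base occupies no leading space.
inline bool base_takes_no_leading_space() {
    mine_old m;
    const void* object = &m;
    const void* member = &m.ivec;
    return object == member;
}

inline void check_guarantee() {
    // Only the standard-layout case is required to pass.
    assert(!ebo_is_guaranteed || base_takes_no_leading_space());
}
```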

> And that's what I want: to have the same guarantees for EMO as I do for EBO.
>
> Why is that so much to ask?

Because:
a) you don't have guarantees in many cases, like the one above
b) using an artifact to obtain something instead of using the right tool has
consequences and side-effects
c) it creates a perverse design effect as people hammer something into a shape
it shouldn't have

The engineer's motto is violated: The right tool for the right job.

> > As a counter-example, we have "mandatory elision" now, which technically
> > RVO
> > is a type of. In an ideal world, we'd refer to it by the correct term.
>
> How is that a counterexample? More people are familiar with the term RVO
> than with the term "elision", guaranteed or not. And indeed, the term RVO
> was never used in the standard; it was always called "elision", so the
> standard never referred to it as an "optimization".

RVO is not the same as the permitted elision in the standard. It began as a
language extension like:

MyType functionName() returns(ret)
{
    if (x)
        ret.clear();
    else
        ret.populate();
    return ret;
}

This gave a name to the implicit parameter that was used to return the object.

Then the language extension was dropped in favour of the compiler detecting
which variable in your body was your return value and applying the allowed
rules of elision to elide a copy. But note how it's still an optimisation:

MyType functionName()
{
    MyType ret;
    if (!x)
        ret.populate();
    return ret;
}
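
For contrast, a minimal sketch of the difference between the optimization shown above and the elision that C++17 made mandatory. `Pinned`, `make_pinned`, `Tracked`, and `make_tracked` are hypothetical names, and the sketch assumes a C++17 or later compiler.

```cpp
#include <type_traits>

// Neither copyable nor movable.
struct Pinned {
    Pinned() = default;
    Pinned(const Pinned&) = delete;
    Pinned& operator=(const Pinned&) = delete;
    int value = 7;
};

// Compiles in C++17 and later: returning a prvalue needs no copy or move.
constexpr Pinned make_pinned() { return Pinned{}; }

struct Tracked {
    int value = 0;
};

// Returning a named local still requires an accessible copy or move
// constructor, because eliding it remains an optimization the compiler
// is allowed, but not required, to perform.
constexpr Tracked make_tracked(bool populate) {
    Tracked ret{};
    if (populate)
        ret.value = 1;
    return ret;
}
```

Swapping `Tracked` for `Pinned` in `make_tracked` would be rejected, which is the practical sense in which the named-variable form is still "only" an optimization.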

> > In any case, I do think we need a keyword but for other reasons.
>
> But without guarantees, why does it need to be a keyword? If adding it has
> no (or trivial) semantic effects, what's the point of making a keyword for
> it? If using it is merely "a nice thing to have" rather than having real,
> genuine effects, then it's not worth having a keyword for it.

Well, I think we need a keyword because it has non-trivial semantic effects.

Moreover, a keyword cannot be ignored by the implementation. It must implement
the keyword if it wants to be called "conformant".

> > It's a side-effect guarantee, not an intentional part.
>
> Is it? The original proposal doesn't seem to agree; it explicitly
> recognizes that the rules it lays out guarantee EBO, and it
> recognizes this as a positive aspect of the proposal.

I'm saying that the rules that guarantee EBO did not intend to guarantee EBO.
It is there now and you've proven so. But it wasn't the intent.

> It was most certainly not an accident. It may not have been the entire
> purpose, but it's hardly a mere "side-effect".

I didn't say it was an accident. That would imply chance and that's not the
case.

> And even if it was a side-effect... so what? Side-effects are what allow
> `std::enable_if_t` to work. We treat "side-effects" as genuine features of
> the language all the time.

Template metaprogramming arose as a side-effect and we use it all the time.

But that did not stop us from adding constexpr evaluation.

> Personal dislike of a feature is not sufficient reason for pretending that
> it doesn't exist.

Dislike was never part of my arguments. I'm arguing for a superior solution.