Stateless Subobjects and Types


Nicol Bolas

Jul 7, 2016, 10:16:42 PM
to ISO C++ Standard - Future Proposals
A while back, I came up with a proposal for stateless types, as well as inner classes and mixin support. At the time, I thought that all three ideas built on one another, such that providing only one of them would not be a particularly worthwhile feature. Since then, I've re-evaluated that notion.

So here is what I currently think in terms of just stateless stuff. The proposal is available in HTML and Markdown formats.

A quick overview:

* You can declare that an empty type which is used as a subobject of a class is stateless (taking up no space in the layout). Stateless subobjects behave like regular objects where possible. This works for both members and base classes; people keep forgetting that EBO is optional if the class isn't standard layout.
* You can declare that a type will be used in a stateless fashion whenever it is declared as a subobject of another type. It can be used normally in non-stateless ways, but it must conform to the requirements of a stateless subobject (i.e., the class must be empty).
* You can condition either kind of declaration on a constant expression, like `noexcept` conditions.
Stateless Subobjects.html
Stateless Subobjects.md
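
To make the layout problem concrete, here is a minimal sketch in today's C++ (no `stateless` keyword). The names `Empty`, `AsMember` and `AsBase` are hypothetical and chosen only for illustration. It shows that an empty type costs space when used as a member, while the empty base optimization usually removes that cost for a base class:

```cpp
#include <type_traits>

struct Empty {};                       // no non-static data members

struct AsMember { Empty e; int value; };   // Empty stored as a member
struct AsBase : Empty { int value; };      // Empty used as a base class

// The type is empty, yet a complete object of it still has non-zero size.
static_assert(std::is_empty<Empty>::value, "Empty carries no state");
static_assert(sizeof(Empty) >= 1, "complete objects are never zero-sized");

// As a member, the empty object needs its own storage (plus padding).
static_assert(sizeof(AsMember) > sizeof(int), "member costs space");

// As a base, mainstream ABIs apply EBO and the base costs nothing.
static_assert(sizeof(AsBase) == sizeof(int), "EBO applied on this ABI");
```

The last assertion reflects what mainstream compilers do for this standard-layout case; it is an observation about common ABIs rather than something the sketch proves in general.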

Bengt Gustafsson

Jul 13, 2016, 7:03:42 AM
to ISO C++ Standard - Future Proposals
It seems to me that by requiring that the type of the variable being declared stateless really is empty you prevent your new feature from being used in maybe the most common use case exemplified by the container Alloc type. What's needed there is a potentially zero-sized member in case the Alloc class is stateless, but which takes up as much space as needed if it is not.

I commend you for pointing out that even classes with virtual methods can be empty, but maybe your paper should elaborate a little on why: You think of such an object as containing only a vtbl pointer but as it is a by-value member of its containing object you don't need the vtbl pointer as you know what it points to for each method anyway.

Your idea of allowing a class declaration to be declared stateless is scary as it removes the declaration of the statelessness from the usage. Now you can declare a std::vector<T> with a stateless T and, well, it is hard to tell if this would work. If sizeof(T) is 1 for a stateless T it could but would waste heap memory. As long as the stateless property is on the variable the code accessing the variable can work with the restrictions that this qualifier introduces.

I don't see the problem with array iteration that you state: "If the stateless array has a count larger than 1, and it is contained in an empty class, there is no guarantee that adding a number larger than 1 will result in a valid address for the purposes of iteration."
The address is going to be just as valid as for any iteration. Yes, the address can point to memory that would cause an access violation if accessed, but there is no state to access there anyway, so doing so deserves an access violation. What could possibly happen is that if you declare a very large array of zero-sized objects the address could wrap around, causing loops with < end conditions to stop immediately. Is this what you mean? I would say that this case is arcane enough to let through with a recommendation: Don't use < tests on loop pointers when iterating over stateless objects. (Here again is a difference between a stateless variable and a stateless class: if stateless classes are allowed this has to be extended to: Don't use < tests in loops in template code). In any case, as you mention: The this pointer for any method call inside the loop body would not care if the address was actually outside the memory area (0 bytes) allocated for the array... it will not use the this pointer anyway.

Oops: While writing the previous clause I noticed that your assumption that types with virtual methods are ok as stateless is false: If the implementation of a virtual method that you call directly by compile time lookup of the vtbl calls another virtual method, this will be done using a regular virtual call, and thus it will need the vtbl that was optimized away!

Nicol Bolas

Jul 13, 2016, 10:20:22 AM
to ISO C++ Standard - Future Proposals
On Wednesday, July 13, 2016 at 7:03:42 AM UTC-4, Bengt Gustafsson wrote:
> It seems to me that by requiring that the type of the variable being declared stateless really is empty you prevent your new feature from being used in maybe the most common use case exemplified by the container Alloc type. What's needed there is a potentially zero-sized member in case the Alloc class is stateless, but which takes up as much space as needed if it is not.

I called that specific case out as an example:

template<typename Alloc>
struct DataBlock
{
    stateless(is_empty_v<Alloc>) Alloc a;
};

`a` will take up no space if `Alloc` is empty. Also, you could use `stateless(auto)` instead of the explicit `is_empty_v` test.
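
For comparison, here is a sketch of how the same "zero-sized if empty" effect is usually approximated in today's C++ through the empty base optimization. This is not the proposal's mechanism; `AllocStorage` and `CountingAlloc` are hypothetical names introduced for this illustration, and `DataBlock` gains an `int* data` member only so that the size comparison is meaningful:

```cpp
#include <memory>
#include <type_traits>

// Primary template: the allocator has state (or is final), so keep a member.
template <typename Alloc,
          bool UseBase = std::is_empty<Alloc>::value &&
                         !std::is_final<Alloc>::value>
struct AllocStorage {
    Alloc a;
    Alloc& alloc() { return a; }
};

// Specialization: the allocator is empty, so inherit from it and let the
// empty base optimization remove its storage.
template <typename Alloc>
struct AllocStorage<Alloc, true> : private Alloc {
    Alloc& alloc() { return *this; }
};

template <typename Alloc>
struct DataBlock : AllocStorage<Alloc> {
    int* data = nullptr;
};

struct CountingAlloc { int id; };  // hypothetical stand-in for a stateful allocator

// On mainstream ABIs the empty allocator adds nothing to the block...
static_assert(sizeof(DataBlock<std::allocator<int>>) == sizeof(int*),
              "empty allocator takes no space");
// ...while a stateful one is stored normally.
static_assert(sizeof(DataBlock<CountingAlloc>) > sizeof(int*),
              "stateful allocator takes space");
```

The amount of scaffolding needed here is part of the motivation for a direct language feature. C++20 later added the `[[no_unique_address]]` attribute, which addresses a similar need for members.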

This allows you to make a distinction between "Make this object stateless where possible" and "If this type is not empty, then that's an error." While the former is useful, the latter is also quite useful. After all, if your type needs to be empty, and you're including a bunch of other member types, you want to make sure nobody breaks your type's rules by accidentally adding an NSDM or something.

That is:

struct OtherType
{
  stateless First f;

  Second s; //Oops, forgot to declare that stateless.
};

struct MyType
{
  stateless OtherType o;
};

`MyType` should be guaranteed to be empty, just as if you had declared no members at all. If `OtherType` doesn't play along, then there clearly ought to be a compile error.
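
In current C++, the nearest equivalent of that compile error is a `static_assert` on `std::is_empty`. The sketch below assumes the pieces are composed as base classes, since members of empty type still make a class non-empty today; `OtherTypeWithMembers` is a hypothetical name for the broken variant:

```cpp
#include <type_traits>

struct First {};
struct Second {};

// Composed through base classes, the result stays empty.
struct OtherType : First, Second {};
static_assert(std::is_empty<OtherType>::value, "OtherType must stay empty");

// Composed through members, it does not, even though every member is empty.
struct OtherTypeWithMembers { First f; Second s; };
static_assert(!std::is_empty<OtherTypeWithMembers>::value,
              "non-static data members make a class non-empty");

// The guard propagates: MyType is empty only while OtherType is.
struct MyType : OtherType {};
static_assert(std::is_empty<MyType>::value, "MyType must stay empty");
```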

> I commend you for pointing out that even classes with virtual methods can be empty, but maybe your paper should elaborate a little on why: You think of such an object as containing only a vtbl pointer but as it is a by-value member of its containing object you don't need the vtbl pointer as you know what it points to for each method anyway.

The Motivation and Scope section didn't exclude virtual types, but that was because they are logically empty. The Design section made it perfectly clear that types with virtual functions or members are not considered empty and thus are not eligible to be used statelessly.
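
As a quick check of that distinction, the standard `std::is_empty` trait already treats polymorphic classes as non-empty. A small sketch with hypothetical type names, assuming the proposal's notion of "empty" matches this trait (the `is_empty_v` test used earlier suggests it does):

```cpp
#include <type_traits>

struct Plain {
    void f() {}                        // non-virtual: adds no per-object state
};

struct Polymorphic {
    virtual void f() {}                // virtual: objects need dispatch information
    virtual ~Polymorphic() = default;
};

static_assert(std::is_empty<Plain>::value, "no data members, no virtuals");
static_assert(std::is_polymorphic<Polymorphic>::value, "has virtual functions");
static_assert(!std::is_empty<Polymorphic>::value,
              "polymorphic classes are never reported as empty");
```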

> Your idea of allowing a class declaration to be declared stateless is scary as it removes the declaration of the statelessness from the usage. Now you can declare a std::vector<T> with a stateless T and, well, it is hard to tell if this would work. If sizeof(T) is 1 for a stateless T it could but would waste heap memory.

Looking back at the proposal, I don't think I made my intentions with stateless types sufficiently explicit.

With two exceptions that I'll get to in a moment, a stateless type can be used exactly like any other type. You can heap allocate them, use placement new, etc. When you use a stateless type to directly create an object in a way that cannot have `stateless` applied (stack variables, `new`, etc.), then the object behaves just like any empty type. So if you use `std::allocator<Stateless>::allocate(5)`, you will get an array of 5 empty objects. A vector of stateless types is no different from a vector of empty types.

However, if you use a stateless type to declare an object for which `stateless` can be applied (direct NSDM or base class of a non-union class type), then `stateless` is implicitly added to that declaration. So a stateless type declaration is simply saying that the objects created from it will be stateless wherever possible. And that is only possible as a direct subobject of a non-union class type.

Remember: the goal of this proposal is not to allow zero-sized objects (and arrays). The goal is to make empty subobjects be able to not affect the layout of the type. A `vector<T>` does not create its Ts as subobjects of a type. So the fact that `T` is a stateless type is irrelevant.

When it comes to stateless types, there are really only two choices for what should happen if you use them in a non-stateless way. Option 1: it is an explicit error. Option 2: it is legal and behaves as it normally would (i.e., like any empty type).

My proposal suggests the latter. Earlier versions preferred the former, but that plays extreme havoc with using stateless types in template code. The most obvious example being `tuple`; if you give it a stateless type, it should still be able to work. But if you use option 1, there is no way it possibly could, since `tuple` implementations often rely on placement new and so forth. And stateless types would be forbidden from being used by placement new. Oh sure, with option 2 there's no guarantee that a tuple implementation will actually make the contained value stateless. But at least the code would compile and function adequately.

My proposal makes two exceptions for option 2: stateless types as NSDM arrays, and stateless types in unions. You could allow stateless types to be in unions and NSDM arrays. However, their behavior would not be stateless.

The reason to forbid them is mainly sanity. A user who doesn't understand the complexities of C++ may well expect an NSDM array of stateless types to be a stateless array. But it won't; it'll just be an array of empty classes, and the size of the containing type would be adjusted appropriately. By contrast, if you declare a stack variable of a stateless type, the user has no easy (standard-provided) way to tell if this actually takes up room on the stack. Whereas `sizeof` is very much standard-provided.

I'm open to being convinced otherwise however. Particularly with template code.
 
> As long as the stateless property is on the variable the code accessing the variable can work with the restrictions that this qualifier introduces.
> I don't see the problem with array iteration that you state: "If the stateless array has a count larger than 1, and it is contained in an empty class, there is no guarantee that adding a number larger than 1 will result in a valid address for the purposes of iteration."
> The address is going to be just as valid as for any iteration. Yes, the address can point to memory that would cause an access violation if accessed, but there is no state to access there anyway, so doing so deserves an access violation. What could possibly happen is that if you declare a very large array of zero-sized objects the address could wrap around, causing loops with < end conditions to stop immediately. Is this what you mean? I would say that this case is arcane enough to let through with a recommendation: Don't use < tests on loop pointers when iterating over stateless objects. (Here again is a difference between a stateless variable and a stateless class: if stateless classes are allowed this has to be extended to: Don't use < tests in loops in template code). In any case, as you mention: The this pointer for any method call inside the loop body would not care if the address was actually outside the memory area (0 bytes) allocated for the array... it will not use the this pointer anyway.

As far as C++ is concerned, if you do `a->v()`, you are accessing the pointer `a`. Oh, the compiler may not dereference that memory location just to call the function (unless it's virtual). But this isn't about what compilers do; it's about what the standard says. And `a->v()` is no less of an access to the pointer `a` than `a->d = 4;` is.

And if you access an invalid pointer, you cause undefined behavior.

Nicol Bolas

Jul 13, 2016, 2:18:31 PM
to ISO C++ Standard - Future Proposals
Here's a small update to the idea, after receiving commentary. It explains the behavior of stateless types, hopefully more explicitly. It also adds a note that stateless subobjects still impose their alignments on their containing types. So if you declare that an empty class has 8-byte alignment, then classes which statelessly use it will have at least 8-byte alignment, even though it doesn't take up any space.
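
The alignment rule mirrors what already happens with empty base classes today. A minimal sketch, using the hypothetical names `AlignedTag` and `Holder` and a base class in place of a stateless member:

```cpp
struct alignas(8) AlignedTag {};       // empty, but over-aligned

struct Holder : AlignedTag {           // base class stands in for a stateless subobject
    char c;
};

static_assert(alignof(AlignedTag) == 8, "alignment requested explicitly");

// The containing type inherits the stricter alignment of its empty subobject...
static_assert(alignof(Holder) >= 8, "alignment is imposed on the container");

// ...and its size is rounded up accordingly, although the tag stores nothing.
static_assert(sizeof(Holder) % 8 == 0, "size is a multiple of the alignment");
```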

Also, I've been thinking about complexity reduction (this isn't reflected in the new version). I'm starting to think that `stateless` doesn't need full constant-expression conditionals.

At subobject declarations, there are 3 scenarios of interest: 1) The user requires the subobject to be stateless and therefore the type must be empty. 2) The user wants the subobject to be stateless if the type allows that, but if not then it won't be stateless. 3) The user doesn't care.

With type declarations, there are 3 scenarios of interest: 1) The user wants all potentially stateless uses of the type to be stateless, so fail if the type is non-empty. 2) The user wants all potentially stateless uses of the type to be stateless if the type is empty, but otherwise be non-stateless. 3) The user does not want all potentially stateless uses of the type to be stateless.

Even something like `std::pair` doesn't need a conditional. A `pair<stateless1, stateless2>` is an empty type, and therefore we could declare that `pair` would be stateless if both types passed to it are stateless.
Stateless Subobjects.html
Stateless Subobjects.md

Bengt Gustafsson

Jul 13, 2016, 6:35:22 PM
to ISO C++ Standard - Future Proposals
You wrote:

> As far as C++ is concerned, if you do `a->v()`, you are accessing the pointer `a`. Oh, the compiler may not dereference that memory location just to call the function (unless it's virtual). But this isn't about what compilers do; it's about what the standard says. And `a->v()` is no less of an access to the pointer `a` than `a->d = 4;` is.

If this is what worries you I think the C++ standard could easily be changed to allow calling a non-virtual method of a stateless object with any this pointer. If this solves the "array of stateless objects" issue it is well worth it.

I understand now that the need for a stateless class came from usages of new, make_unique and similar where it is hard or impossible to tag the variable itself as stateless. However, this still bothers me a little, as malloc() may return nullptr if you call it with 0 bytes. But maybe this can be changed to returning some other fixed address so that null-testing the pointer returns false. This other address (such as 1) would then have to be tested for in free() of course. This is a scary solution, especially as two separate object addresses can now compare equal. I am skeptical of the possibility of avoiding allocating a minimal size block for heap use of stateless types. For the new[] case I think a 1 byte allocation would be enough, but is this motivation enough for having the stateless class possibility?

For static and local variables it seems reasonable to require the stateless keyword at the variable declaration.

If you ask me we could just as well make all variables stateless(auto), i.e. I don't see the problem (outside the standard text) of having the same address of two variables, but everyone else seems convinced that this is a real problem, and in that case I would be scared to put stateless on a class and then use it as a template parameter so that the template code unknowingly generates zero sized variables... It seems that you have to make up your mind whether aliasing the address of two variables is a problem or not! I don't think sprinkling the documentation of all libraries out there with notes on each template whether it is proof for stateless classes or not is a feasible solution, and then expecting users to read and follow those notes.

Thiago Macieira

Jul 13, 2016, 7:04:59 PM
to std-pr...@isocpp.org
On Wednesday, July 13, 2016, at 15:35:22 PDT, Bengt Gustafsson wrote:
> > As far as C++ is concerned, if you do `a->v()`, you are accessing the
> > pointer `a`. Oh, the compiler may not dereference that memory location
> > just to call the function (unless it's virtual). But this isn't about what
> > compilers do; it's about what the standard *says*. And `a->v()` is no less
> > of an access to the pointer `a` than `a->d = 4;` is.
>
> If this is what worries you I think the C++ standard could easily be
> changed to allow calling a non-virtual method of a stateless object with
> any this pointer. If this solves the "array of stateless objects" issue it
> is well worth it.

Only if you can't access "this" pointer in a stateless class's non-static
member function.

We may want to do that anyway, as otherwise the this pointer value could be
used for keying access in an associative container, which in turn would
require two different stateless objects to have different this pointer values.

That is the same as your concern:

> This is a scary
> solution, especially as two separate object addresses can now compare
> equal. I am skeptical of the possibility of avoiding allocating a minimal
> size block for heap use of stateless types. For the new[] case I think a 1
> byte allocation would be enough, but is this motivation enough for having
> the stateless class possibility?

--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
Software Architect - Intel Open Source Technology Center

Nicol Bolas

Jul 13, 2016, 7:25:47 PM
to ISO C++ Standard - Future Proposals
On Wednesday, July 13, 2016 at 7:04:59 PM UTC-4, Thiago Macieira wrote:
> On Wednesday, July 13, 2016, at 15:35:22 PDT, Bengt Gustafsson wrote:
> > > As far as C++ is concerned, if you do `a->v()`, you are accessing the
> > > pointer `a`. Oh, the compiler may not dereference that memory location
> > > just to call the function (unless it's virtual). But this isn't about
> > > what compilers do; it's about what the standard *says*. And `a->v()` is
> > > no less of an access to the pointer `a` than `a->d = 4;` is.
> >
> > If this is what worries you I think the C++ standard could easily be
> > changed to allow calling a non-virtual method of a stateless object with
> > any this pointer. If this solves the "array of stateless objects" issue it
> > is well worth it.
>
> Only if you can't access "this" pointer in a stateless class's non-static
> member function.
>
> We may want to do that anyway, as otherwise the this pointer value could be
> used for keying access in an associative container, which in turn would
> require two different stateless objects to have different this pointer values.

The problem is that your suggestion makes it impossible to have stateless subobjects which were not initially declared that way. And therefore it doesn't actually solve the whole problem we're trying to solve (`std::allocator` would not be able to be used statelessly except via EBO). So we're back to the same problem.

It is possible for a user of an empty type to use its `this` pointer value as a form of state. However, I feel that the best way to handle this is to accept that it can happen. So the question is this: how often do people do that sort of thing? How often do people use the location of an empty class instance as a key?

And how often is it that this happens within the same empty class containment hierarchy (both as base classes and NSDM)? After all, it will only fail if a single object contains two instances of the same stateless subobject.

Which suggests a possible solution, actually. We could just forbid allowing a type to have more than one stateless subobject of the same type, recursively.

But really, I think it's worth just accepting this particular edge case. I think it's reasonable to assume that 99% of empty types do not use `this` in this way, and the 1% who do need such identity can just give themselves a `char` member.
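
The existing empty base optimization already shows where this identity concern bites. In the sketch below (all names hypothetical), one object contains two base subobjects of the same empty type; because subobjects of the same type are required to have distinct addresses, mainstream compilers cannot collapse both and the containing object grows:

```cpp
struct Tag {};                 // empty type whose identity might matter
struct Left  : Tag {};
struct Right : Tag {};
struct Both  : Left, Right {}; // contains two distinct Tag subobjects

// Both Tag subobjects need different addresses, so Both is larger than Tag.
static_assert(sizeof(Both) > sizeof(Tag),
              "two same-type empty bases cannot share one address");

// Compares the addresses of the two Tag subobjects inside one Both object.
inline bool tags_are_distinct(const Both& b) {
    const Tag* left  = static_cast<const Left*>(&b);
    const Tag* right = static_cast<const Right*>(&b);
    return left != right;
}
```

Under the stateless proposal as described in this thread, this is exactly the case where two stateless subobjects of one type could end up with the same address, which is the edge case being weighed here.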

Nicol Bolas

Jul 13, 2016, 7:33:16 PM
to ISO C++ Standard - Future Proposals
On Wednesday, July 13, 2016 at 6:35:22 PM UTC-4, Bengt Gustafsson wrote:
> You wrote:
>
> > As far as C++ is concerned, if you do `a->v()`, you are accessing the pointer `a`. Oh, the compiler may not dereference that memory location just to call the function (unless it's virtual). But this isn't about what compilers do; it's about what the standard says. And `a->v()` is no less of an access to the pointer `a` than `a->d = 4;` is.
>
> If this is what worries you I think the C++ standard could easily be changed to allow calling a non-virtual method of a stateless object with any this pointer. If this solves the "array of stateless objects" issue it is well worth it.

Are you familiar enough with this section of the standard to be throwing around terms like "easily be changed"? The consequences of simply declaring that you can just make up a pointer to a type, call it, and get valid behavior (whatever that would mean) are non-trivial. It certainly is not something to be taken lightly.

Also, making such a radical change for the purpose of allowing arrays to not take up space is just pointless. They're arrays of objects which are identical by definition. Allowing such arrays to not take up space as subobject members doesn't solve an actual problem.

> I understand now that the need for a stateless class came from usages of new, make_unique and similar where it is hard or impossible to tag the variable itself as stateless.

I'm not sure how to explain this any more clearly.

Stateless types only behave differently from non-stateless types when they are declared as subobjects of class types. In all other cases, they behave no differently from non-stateless types.

Consider this:

stateless struct no_state {};
struct stateful{};

Given those declarations, `sizeof(no_state)` will be a non-zero value. It will also be the same non-zero value as `sizeof(stateful)`. If you perform `new no_state`, you will allocate `sizeof(no_state)` bytes. If you overload `operator new` for `no_state`, then you will find that it will get a size of `sizeof(no_state)`. At no time will there be a zero-sized allocation.

If you perform `new no_state[25]`, what you get back will be a genuine array containing 25 actual values of type `no_state`. That array will consume `25 * sizeof(no_state)` bytes, and that array will act like any other C++ array. Again, there is no zero-sized allocation, since `sizeof(no_state)` is a non-zero value.

Stateless types are a convenience feature built atop stateless subobjects. It simply means that, when the type is used to generate a subobject of a class, then that subobject will be a stateless subobject. Objects of that type generated in all other ways will not be stateless.

It is class subobjects which are stateless. Nothing else. A type is stateless only in that it declares class subobjects as stateless implicitly.
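
The non-subobject behavior described above is simply the behavior of any empty class today, which can be checked directly. In this sketch `no_state` is written as an ordinary struct, because `stateless struct` is proposed syntax rather than valid C++:

```cpp
struct no_state {};   // stand-in for the proposed `stateless struct no_state {};`
struct stateful {};   // ordinary empty class

// Complete objects are never zero-sized, stateless declaration or not.
static_assert(sizeof(no_state) >= 1, "sizeof is non-zero");
static_assert(sizeof(no_state) == sizeof(stateful),
              "same size as any other empty class on this ABI");

// An array of 25 elements is a real array with 25 distinct element slots.
static_assert(sizeof(no_state[25]) == 25 * sizeof(no_state),
              "arrays occupy count * sizeof(element) bytes");
```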
 
> If you ask me we could just as well make all variables stateless(auto), i.e. I don't see the problem (outside the standard text) of having the same address of two variables, but everyone else seems convinced that this is a real problem, and in that case I would be scared to put stateless on a class and then use it as a template parameter so that the template code unknowingly generates zero sized variables... It seems that you have to make up your mind whether aliasing the address of two variables is a problem or not! I don't think sprinkling the documentation of all libraries out there with notes on each template whether it is proof for stateless classes or not is a feasible solution, and then expecting users to read and follow those notes.

I think you're misunderstanding the "aliasing is a problem" issue when it comes to zero-sized objects.

Yes, it is a problem. But it is one that can be solved with careful understanding of the rules and with a solid design for the feature. And I outlined what I believe is such a solution in the section titled "Memory locations" (as well as the prohibition on stateless arrays). We declare that the memory location for a stateless subobject can be any valid location of the right size within its containing object. That solves the "object is a region of storage" rule.

This is also why we don't allow statelessness in non-subobject contexts.

To satisfy aliasing, we would have to extend the aliasing rules to say that stateless subobjects can alias with any other object. Now, they still must be valid objects, with all the regular C++ rules that requires (thus you can't just make up a pointer). But a stateless subobject's region of storage is not unique within its container, nor is it forbidden from overlapping with other objects.

My issue with your previous proposal on the subject is that you never actually addressed the problem. My proposal does. I can't claim sufficient knowledge of the problem to be certain that it fully addresses the problem. But I've explained how it does address the problem as I understand it.

I would hope to see additional commentary on edge cases that I may have missed.

Thiago Macieira

Jul 13, 2016, 8:30:20 PM
to std-pr...@isocpp.org
On Wednesday, July 13, 2016, at 16:25:47 PDT, Nicol Bolas wrote:
> > We may want to do that anyway, as otherwise the this pointer value could
> > be used for keying access in an associative container, which in turn
> > would require two different stateless objects to have different this
> > pointer values.
>
> The problem is that your suggestion makes it impossible to have stateless
> subobjects which were not initially declared that way. And therefore it
> doesn't actually solve the whole problem we're trying to solve
> (`std::allocator` would not be able to be used statelessly except via EBO).
> So we're back to the same problem.

Why can't std::allocator be declared in C++2x as stateless?

That would allow the class author to decide whether the class can be used
statelessly or not. The implementation should not be allowed to use techniques
like EBO for non-stateless classes, as the implementation could very well
require different pointer addresses for two elements.

Example would be std::tuple<allocator, allocator>.

> It is possible for a user of an empty type to use its `this` pointer value
> as a form of state. However, I feel that the best way to handle this is to
> accept that it can happen. So the question is this: how often do people do
> that sort of thing? How often do people use the location of an empty class
> instance as a key?

I know of only two cases, which boil down to not really being stateless. That
is, the object is not really stateless, it's just keeping the data elsewhere.

One case is in poorly-designed code that didn't foresee the need to expand the
class to accommodate. This happened to Qt 3 and KDE 3 classes, when the
requirement to always add a d pointer (pimpl) wasn't yet on the coding
guidelines. Whenever the class author needed to expand the class, the solution
was to have a global hash that was keyed on the object's address.

You can still find the recommendation in the KDE Wiki:

https://community.kde.org/Policies/Binary_Compatibility_Issues_With_C%2B%2B#Adding_new_data_members_to_classes_without_d-pointer
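
A minimal sketch of that address-keyed pattern follows. It is not taken from Qt or KDE; the class `Label`, its tooltip accessors and the helper `demo()` are hypothetical. The class has no data members, so the language considers it empty, yet every object carries state that lives in a global table keyed on `this`:

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <unordered_map>

class Label {
public:
    ~Label() { extras().erase(this); }   // drop any side-table entry

    void setToolTip(const std::string& text) { extras()[this] = text; }

    std::string toolTip() const {
        auto it = extras().find(this);
        return it == extras().end() ? std::string() : it->second;
    }

private:
    // Global side table: the object's address is the lookup key.
    static std::unordered_map<const Label*, std::string>& extras() {
        static std::unordered_map<const Label*, std::string> table;
        return table;
    }
};

// The type looks empty to the language, although it is not logically stateless.
static_assert(std::is_empty<Label>::value, "no data members in the class itself");

// Usage sketch: two objects with distinct addresses keep separate state.
inline void demo() {
    Label a, b;
    a.setToolTip("hello");
    assert(a.toolTip() == "hello");
    assert(b.toolTip().empty());
}
```

If two such objects were ever given the same address, they would silently share one table entry, which is why this kind of class cannot safely be treated as stateless.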

The second is when you only want to do that in debug mode, to keep some extra
state that classes don't normally keep.

> And how often is it that this happens within the same empty class
> containment hierarchy (both as base classes and NSDM)? After all, it will
> *only fail* if a single object contains two instances of the same stateless
> subobject.

Indeed, which can happen for std::tuple<Stateless, Stateless>.

Granted, this is not very common, but why not require class authors to add
"stateless"?

> Which suggests a possible solution, actually. We could just forbid allowing
> a type to have more than one stateless subobject of the same type,
> recursively.
>
> But really, I think it's worth just accepting this particular edge case. I
> think it's reasonable to assume that 99% of empty types do not use `this`
> in this way, and the 1% who do need such identity can just give themselves
> a `char` member.

They probably should, but we have two options:

1) require "fake stateless" classes to have a member
2) require stateless classes to have a "stateless" keyword

Why not #2?

jgot...@gmail.com

Jul 13, 2016, 9:29:19 PM
to ISO C++ Standard - Future Proposals


On Wednesday, July 13, 2016 at 6:35:22 PM UTC-4, Bengt Gustafsson wrote:
> You wrote:
>
> > As far as C++ is concerned, if you do `a->v()`, you are accessing the pointer `a`. Oh, the compiler may not dereference that memory location just to call the function (unless it's virtual). But this isn't about what compilers do; it's about what the standard says. And `a->v()` is no less of an access to the pointer `a` than `a->d = 4;` is.
>
> If this is what worries you I think the C++ standard could easily be changed to allow calling a non-virtual method of a stateless object with any this pointer. If this solves the "array of stateless objects" issue it is well worth it.


The main problem with arrays of stateless objects is that, given

stateless Foo fooArray[7];

in the code

for (auto begin = fooArray, end = fooArray + 7; begin != end; ++begin) {
   bar(*begin);
}

the function bar must be called exactly 7 times. The range-based for loop depends on this. If fooArray consists of 7 size-0 objects, how could you possibly guarantee that?
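
The arithmetic behind this objection can be modeled without any zero-sized types. The sketch below is only a model: `count_calls` is a hypothetical helper that treats pointers as plain byte offsets and advances by the element size, which is how pointer increment behaves for real arrays:

```cpp
#include <cstddef>

// Counts how often the loop body would run when "pointers" are byte offsets
// and ++begin advances by element_size bytes.
constexpr int count_calls(std::size_t element_size, std::size_t count) {
    std::size_t begin = 0;
    const std::size_t end = begin + element_size * count;
    int calls = 0;
    for (; begin != end; begin += element_size) {
        ++calls;
    }
    return calls;
}

// Ordinary empty class: each element occupies one byte, so 7 iterations.
static_assert(count_calls(1, 7) == 7, "non-zero stride visits every element");

// Hypothetical zero-sized element: begin == end from the start, so the body
// never runs, although the array nominally has 7 elements.
static_assert(count_calls(0, 7) == 0, "zero stride skips the loop entirely");
```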

Joe Gottman

Giovanni Piero Deretta

Jul 14, 2016, 5:37:05 AM
to ISO C++ Standard - Future Proposals
On Wednesday, July 13, 2016 at 3:20:22 PM UTC+1, Nicol Bolas wrote:
> On Wednesday, July 13, 2016 at 7:03:42 AM UTC-4, Bengt Gustafsson wrote:
> > It seems to me that by requiring that the type of the variable being declared stateless really is empty you prevent your new feature from being used in maybe the most common use case exemplified by the container Alloc type. What's needed there is a potentially zero-sized member in case the Alloc class is stateless, but which takes up as much space as needed if it is not.
>
> I called that specific case out as an example:
>
> [snip predicated stateless example]
>
> `a` will take up no space if `Alloc` is empty. Also, you could use `stateless(auto)` instead of the explicit `is_empty_v` test.
>
> This allows you to make a distinction between "Make this object stateless where possible" and "If this type is not empty, then that's an error." While the former is useful, the latter is also quite useful. After all, if your type needs to be empty, and you're including a bunch of other member types, you want to make sure nobody breaks your type's rules by accidentally adding an NSDM or something.


What about a conditional 'static' construct? By predicating static on, for example, 'is_empty' you get similar guarantees. The advantage is that the object will still be a proper C++ object with its own address, but with (conditional) static lifetime, so it will not take space in the object itself. I'll leave the part of how to make the initialization conditional in the constructor initializer list to others.
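
The unconditional version of this idea can already be written, which makes its properties easy to inspect. A sketch with hypothetical names (`Policy`, `Holder`, `share_policy`); the conditional form suggested above has no current syntax and is not shown:

```cpp
struct Policy {};                      // empty policy type

template <typename P>
struct Holder {
    static P policy;                   // static lifetime: not part of each Holder object
    int value;
};

// Definition of the static data member for every instantiation.
template <typename P>
P Holder<P>::policy;

// The static member adds nothing to the per-object layout.
static_assert(sizeof(Holder<Policy>) == sizeof(int),
              "static members take no space in the object");

// Every Holder<P> refers to the same single policy object, so this is always true.
template <typename P>
bool share_policy(const Holder<P>& a, const Holder<P>& b) {
    return &a.policy == &b.policy;
}
```

The trade-off is visible in `share_policy`: the policy object has a genuine address, but that address is the same for every containing object.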

 

Thiago Macieira

Jul 14, 2016, 11:46:12 AM
to std-pr...@isocpp.org
On Thursday, July 14, 2016, at 02:37:04 PDT, Giovanni Piero Deretta wrote:
> What about a conditional 'static' construct? By predicating static on, for
> example, 'is_empty' you get similar guarantees. The advantage is that the
> object will still be a proper C++ object with its own address, but with
> (conditional) static lifetime, so it will not take space in the object
> itself. I'll leave the part of how to make the initialization conditional
> in the constructor initializer list to others.

You replace one problem with the other. Now they have addresses, but two
different objects have the same address.

Nicol Bolas

Jul 14, 2016, 3:10:57 PM
to ISO C++ Standard - Future Proposals

There are two problems there:

1) It's not an NSDM. Which means you can't really treat it like an NSDM. While you can partially treat it like one, you cannot treat it like one in all ways. Constructor initialization is part of it, but so are things like reflection, `offsetof`, member pointers, and so forth.

2) You can't have static base classes. I don't know why people keep forgetting that EBO is an optimization and therefore not required (for non-standard layout types).

Nicol Bolas

Jul 14, 2016, 4:09:37 PM7/14/16
to ISO C++ Standard - Future Proposals
On Wednesday, July 13, 2016 at 8:30:20 PM UTC-4, Thiago Macieira wrote:
On Wednesday, July 13, 2016, at 16:25:47 PDT, Nicol Bolas wrote:
> > We may want to do that anyway, as otherwise the this pointer value could
> > be
> > used for keying access in an associative container, which in turn would
> > require two different stateless objects to have different this pointer
> > values.
>
> The problem is that your suggestion makes it impossible to have stateless
> subobjects which were not initially declared that way. And therefore it
> doesn't actually solve the whole problem we're trying to solve
> (`std::allocator` would not be able to be used statelessly except via EBO).
> So we're back to the same problem.

Why can't std::allocator be declared in C++2x as stateless?

Because that would potentially break code. The size of any object that used `allocator` as a member variable would change, thus breaking any ABIs involving such objects.


Actually, it wouldn't (in my proposal as it currently stands). `tuple` does not necessarily have any true subobjects of type `Stateless`. Or rather, that's not a *required* implementation of `tuple`. Some `tuple` implementations use recursive inheritance, while others rely on placement `new` gimmicks. In the latter case, the two types would just be empty types, and thus they would have unique memory locations.

Granted, this is not very common, but why not require class authors to add
"stateless" ?

> Which suggests a possible solution, actually. We could just forbid allowing
> a type to have more than one stateless subobject of the same type,
> recursively.
>
> But really, I think it's worth just accepting this particular edge case. I
> think it's reasonable to assume that 99% of empty types do not use `this`
> in this way, and the 1% who do need such identity can just give themselves
> a `char` member.

They probably should, but we have two options:

1) require "fake stateless" classes to have a member
2) require stateless classes to have a "stateless" keyword

Why not #2?

I do appreciate the simplicity of #2. It was after all my original design from the very first version of this idea. However, remember back to the defining problem for this proposal. Someone gives us a type. We want to use that type as a member variable. But if that type is empty, we don't want our type taking up extra space.

#2 is not a solution to this problem. Why? Because people will still pass us empty-but-not-stateless types. And therefore this:

template<typename Alloc>
struct DataBlock : public Alloc
{};

Will always be a better alternative than this:

template<typename Alloc>
struct DataBlock
{
  Alloc a;
};

In the former, we are guaranteed to avoid bloat for any empty class. In the latter, we only avoid bloat if the user provides a stateless type. And stateless types will always be a subset of empty types.

The goal of this whole exercise is to allow the latter to be just as efficient as the former, thus ensuring that we only derive from something if we really mean it. By only allowing types explicitly declared stateless to be used statelessly, we cannot achieve that, since users will always pass empty types that could have been optimized, but aren't.
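For reference, the size difference between the two forms can be observed in current C++. This is only a sketch: `EmptyAlloc`, `DerivedBlock`, `MemberBlock` and the `data` pointer are hypothetical names added so that the blocks have something to store, and the exact sizes depend on the implementation.

```cpp
#include <cassert>
#include <cstddef>

struct EmptyAlloc {};  // hypothetical empty allocator-like type

// Variant 1: derive from the allocator so the empty base can share storage.
template <typename Alloc>
struct DerivedBlock : public Alloc {
    void* data;
};

// Variant 2: hold the allocator as an ordinary member, which needs its own byte.
template <typename Alloc>
struct MemberBlock {
    Alloc a;
    void* data;
};

inline std::size_t derived_size() { return sizeof(DerivedBlock<EmptyAlloc>); }
inline std::size_t member_size()  { return sizeof(MemberBlock<EmptyAlloc>); }
```

On common implementations `derived_size()` equals `sizeof(void*)`, while `member_size()` is larger because of the extra byte plus padding. That gap is what a stateless member is meant to close.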

And it's important to realize that we are not using the type in any way that would break the type's identity. There would never be two stateless subobjects of type `Alloc` in `DataBlock<Alloc>`. We are guaranteed that, for any two instances of `DataBlock<Alloc>`, their `a` members will have different addresses.

----

I've been thinking quite a bit about the scope of the unique identity problem. I have come up with 3 possible solutions:

1: Ignore it. Basically saying that it's not a problem.

2: Explicitly forbid it. I didn't like this idea at first. But then I remembered the increasingly complicated exception in standard layout rules. The one which forbids using the same type as a base class more than once, as well as the first NSDM (recursively) from being the same type as one of the base classes:

struct empty {};

struct not_standard_layout : public empty
{
  empty e;
};

This incredibly complex rule exists to avoid this exact problem. If that were permitted to be standard layout, then `not_standard_layout::empty` would be at the same address as `not_standard_layout::e`. There would be two overlapping objects of the same type, so the two objects would lose their identity.
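The identity guarantee behind that rule can be checked directly in current C++. The following is a sketch; the helper function names are made up for illustration. It compares the address of the `empty` base subobject with the address of the member `e`, and separately reports what the implementation says about the layout category.

```cpp
#include <cassert>
#include <type_traits>

struct empty {};

struct not_standard_layout : public empty {
    empty e;
};

// The base-class subobject and the member `e` are two objects of the same
// type, so they are required to have different addresses.
inline bool base_and_member_are_distinct() {
    not_standard_layout obj;
    const empty* as_base = &obj;      // implicit derived-to-base conversion
    const empty* as_member = &obj.e;
    return as_base != as_member;
}

// Reports the implementation's answer; under the rule described above this
// is expected to be false.
inline bool reported_standard_layout() {
    return std::is_standard_layout<not_standard_layout>::value;
}
```

Because the two `empty` objects must not overlap, the class needs at least two bytes even though it holds no data.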

Stateless subobjects have often been analogized as a generalized form of empty base optimization. Given that analogy, it makes sense to have a similar rule forbidding such things for stateless subobjects. In this case, it would be a hard compile error, since you asked for something to be stateless, but that cannot be provided.

The rules here are very complex however, much as the standard layout exception rule has grown over the years.

3: Redefine the problem away. The problem is essentially this: do stateless subobjects have identity outside of their containing non-stateless instances? Using solution #2 effectively says "yes" and thereby breaks every time you do something that breaks that identity. However, if we say "no", then what we're saying is that if a type contains two stateless subobjects of the same type, then they refer to the same object. Thus, we don't have two separate objects; we have two names for the same object.

Functionally, it's no different from #1; code that expects a type to have identity will still fail. But logically, it makes it more clear what it means to declare a subobject to be stateless.

4: I just came up with this while writing this post. It is effectively a merger of #2 and #3.

In the current version of the proposal, applying `stateless` to a type only means that subobject declarations for this type are implicitly declared stateless (and prohibiting potentially confusing uses). We can add something more to this declaration.

If you declare that a type is `stateless`, then you are saying that this type does not have identity. And therefore, it is OK if more than one object of this type has the same address. So this represents a contractual obligation on the class writer's part to not use the class's `this` pointer with the expectation of object identity. Or, using #3's reasoning, two subobjects of this type in the same class (recursively) represent different names for the same object.

However, if the type was empty-but-not-stateless, then we invoke rule #2: a type may not have more than one stateless subobject of the same non-stateless type.

For example:

stateless struct no_identity {};
struct empty {};

struct s1
{
  no_identity n1; //Stateless by default
  stateless empty e1; //Explicitly stateless.
  no_identity n2; //OK, n1 and n2 are aliases of the same object.
  stateless empty e2; //ill-formed. `empty` was not declared `stateless`, and therefore it must have identity.
};

Arthur O'Dwyer

Jul 14, 2016, 4:27:54 PM7/14/16
to ISO C++ Standard - Future Proposals
On Wednesday, July 13, 2016 at 3:35:22 PM UTC-7, Bengt Gustafsson wrote:
> You wrote:
>>
>> As far as C++ is concerned, if you do `a->v()`, you are accessing the pointer `a`. Oh, the compiler may not dereference
>> that memory location just to call the function (unless it's virtual). But this isn't about what compilers do; it's about what
>> the standard says. And `a->v()` is no less of an access to the pointer `a` than `a->d = 4;` is.
>
> If this is what worries you I think the C++ standard could easily be changed to allow calling a non-virtual method of a
> stateless object with any this pointer. If this solves the "array of stateless objects" issue it is well worth it.

I've thought of this too, but I don't think it's feasible. Consider that right now member functions and free functions are basically interchangeable:

    class A;
    void g(A *a);  // declared first so that A::f can call it
    class A {
    public:
      void f() { return g(this); }
    };
    void g(A *a) { return a->f(); }

You're proposing to make it legal to call garbage->f() (where garbage is e.g. a null pointer, or a pointer that has recently been deleted, or whatever); but not legal to call g(garbage) — which means that garbage->f() will blow up when it gets to the line g(this) anyway.

This trap-for-the-unwary will be both catastrophic for newbies and unnecessarily annoying for experts.  "What do you mean, I have to rewrite all my free functions as member functions or else the compiler quietly gives my code undefined behavior??"


On another angle of the topic, it occurred to me the other day that if a programmer is trying to express the idea of "This object has no address — that is, I promise not to take its address except perhaps implicitly in trivial ways" — the most natural way to express that is with the newly freed-up register storage class.  The meaning of this storage-class specifier in C and C++-pre-17 is literally "I promise not to take this object's address", so, if we can piggyback on that intuition, that would be nice.  However, this is merely a contribution to the syntactic bikeshed; I agree with Nicol that there's no obvious forward progress on the semantic problem happening here.

my $.02,
–Arthur

Nicol Bolas

Jul 14, 2016, 4:35:17 PM7/14/16
to ISO C++ Standard - Future Proposals
On Thursday, July 14, 2016 at 4:27:54 PM UTC-4, Arthur O'Dwyer wrote:
However, this is merely a contribution to the syntactic bikeshed; I agree with Nicol that there's no obvious forward progress on the semantic problem happening here.

I think we are getting somewhere on the semantic issues. We've uncovered yet another problem to be resolved: the unique identity of such subobjects. This is separate from aliasing, as that is about objects of different types overlapping. This one is about different object instances having the same address.

The only way we're going to solve this problem is to uncover all of the corner cases and just work out solutions to them.

Tom Honermann

Jul 14, 2016, 5:14:03 PM7/14/16
to std-pr...@isocpp.org
On 7/14/2016 4:09 PM, Nicol Bolas wrote:

For example:

stateless struct no_identity {};
struct empty {};

struct s1
{
  no_identity n1; //Stateless by default
  stateless empty e1; //Explicitly stateless.
  no_identity n2; //OK, n1 and n2 are aliases of the same object.
  stateless empty e2; //ill-formed. `empty` was not declared `stateless`, and therefore it must have identity.
};

I've been thinking along these lines too.  The questions I have are:
* What if no_identity has a non-trivial constructor or destructor?  When are they invoked in the example above and how many times?
* What is the lifetime of the object aliased by n1 and n2 above?  (Essentially the same question).

Tom.

Nicol Bolas

Jul 14, 2016, 5:45:00 PM7/14/16
to ISO C++ Standard - Future Proposals

Well, if there truly is only one `no_identity` subobject in `s1`, then by definition there must be only one constructor and destructor call. Any other compiler-generated per-subobject functions (copy/move assignment, etc) must similarly find only one subobject.

Then again, if the type is truly `stateless` and therefore has no identity, there would be nothing wrong on an implementation level with declaring that the per-subobject functions get called once per member. But that plays havoc with C++'s object rules, since you would be calling a constructor multiple times on an object whose lifetime has started. Or a destructor multiple times on the same object.

Then again, we can change the lifetime rules of stateless types (since they're explicitly denoted as such) to be different. That calling a constructor on the same stateless object does not restart its lifetime or calling the destructor multiple times does not end it. Or something like that.

It's certainly something to consider.

Nicol Bolas

Jul 14, 2016, 6:28:48 PM7/14/16
to ISO C++ Standard - Future Proposals

After thinking about it some more, I don't think that it's such a big problem. We should do the obvious.

Stateless subobjects, as a concept, relies on the idea that multiple objects can exist in the same memory location. As such, having two of them be the same type in the same location is fine. Well, it's fine from the point of view of the standard. I think. That is, if we can make the standard work for multiple objects of different types, we can make it work for multiple objects of the same type.

So instead of looking at it as having two aliases for the same subobject, you have two subobjects. But because they are stateless, they just so happen to live in the same location. So they look identical. And technically, you can't tell which one you're talking to. But that's OK because they are stateless and therefore lack identity.

So yes, you call constructors and destructors once for each subobject declaration. And other such operations. That's fine.

And I think so long as we restrict identity-lessness to types which are declared stateless, we should avoid any user pitfalls.

Thiago Macieira

Jul 14, 2016, 7:23:53 PM7/14/16
to std-pr...@isocpp.org
On Thursday, July 14, 2016 13:09:36 PDT Nicol Bolas wrote:
> > Why can't std::allocator be declared in C++2x as stateless?
>
> Because that would potentially break code. The size of any object that used
> `allocator` as a member variable would change, thus breaking any ABIs
> involving such objects.

How? If we require both in the declaration of the class and where it's used,
then no current code would be affected.

If the declaration doesn't use stateless:
    object occupies space, regardless of declaration
If the declaration specifies stateless or stateless(true):
    if class specifies stateless:
        object occupies no space
    if class doesn't specify stateless:
        ill-formed
If the declaration specifies stateless(auto):
    if class specifies stateless:
        object occupies no space
    if class doesn't specify stateless:
        object occupies space
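The case analysis above can be written down as a small function, which makes it easy to check that every combination has exactly one outcome. This only models the rules as listed; it is not an implementation of the feature, and the names `MemberSpec`, `Outcome` and `resolve` are made up for illustration.

```cpp
#include <cassert>

// How the member declaration is annotated (illustrative names only).
enum class MemberSpec { None, Stateless, StatelessAuto };

// Result of combining the member annotation with the class annotation.
enum class Outcome { OccupiesSpace, NoSpace, IllFormed };

// Mirrors the case list: no annotation always occupies space; otherwise a
// stateless class occupies no space; otherwise plain `stateless` is an error
// and `stateless(auto)` falls back to occupying space.
constexpr Outcome resolve(MemberSpec member, bool class_is_stateless) {
    return member == MemberSpec::None
               ? Outcome::OccupiesSpace
           : class_is_stateless
               ? Outcome::NoSpace
           : member == MemberSpec::Stateless
               ? Outcome::IllFormed
               : Outcome::OccupiesSpace;
}
```

Note that existing code, which never writes `stateless` on a member, always lands in the first branch, which is the basis of the claim that no current code would be affected.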

In a green field, we'd have stateless only on the class declaration. We're not
in a green field.

> > They probably should, but we have two options:
> >
> > 1) require "fake stateless" classes to have a member
> > 2) require stateless classes to have a "stateless" keywords
> >
> > Why not #2?
>
> I do appreciate the simplicity of #2. It was after all my original design
> from the very first version of this idea
> <https://groups.google.com/a/isocpp.org/forum/#!searchin/std-proposals/state
> less$20inner/std-proposals/HjGujSdKXX0/27WIWTc_EwAJ>. However, remember back
> to the defining problem for this proposal. Someone gives us a type. We want
> to use that type as a member variable. But if that type is empty, we don't
> want our type taking up extra space.

But only if the class author declares that it shouldn't take up space. If the
class author requires that the object have a unique pointer address, different
from any and all other objects, then you cannot ignore her.

> #2 is not a solution to this problem. Why? Because people will still pass
> us empty-but-not-stateless types. And therefore this:
>
> template<typename Alloc>
> struct DataBlock : public Alloc
> {};
>
> Will always be a better alternative than this:
>
> template<typename Alloc>
> struct DataBlock
> {
> Alloc a;
> };
>
> In the former, we are guaranteed to avoid bloat for any empty class. In the
> latter, we only avoid bloat if the user provides a stateless type. And
> stateless types will *always* be a subset of empty types.

But you don't know if you're allowed to "avoid bloat". Maybe your (premature)
optimisation will break code later.

I really think that the class author should be allowed to decide whether the
use of that class can take up no space.

> For example:
>
> stateless struct no_identity {};
> struct empty {};
>
> struct s1
> {
> no_identity n1; //Stateless by default
> stateless empty e1; //Explicitly stateless.
> no_identity n2; //OK, n1 and n2 are aliases of the same object.
> stateless empty e2; //ill-formed. `empty` was not declared `stateless`,
> and therefore it must have identity.
> };

That's the green-field case. It's the cleanest solution, I agree, but that
doesn't allow us to use it for std::allocator.

Thiago Macieira

Jul 14, 2016, 7:30:00 PM7/14/16
to std-pr...@isocpp.org
Interesting questions.

My first reaction is to say that a stateless class must only have static data
members and static member functions. Therefore, it would not be allowed to
declare a constructor or destructor.

For std::allocator to be made stateless, the implementation would need to
change all of the member functions to be static. This could lead to a binary
compatibility problem (MSVC encodes the "staticness" of a member function in
the mangled name). That can be solved by saying all member functions in a
stateless class are implicitly static and therefore cannot use the this
pointer.

As a consequence of either solution, you can't add a constructor or a
destructor to a stateless class.

But maybe we can: a default constructor and a destructor are actually
possible. They'd be called (without a this pointer) whenever the object begins
or ends its lifetime. That is:

{
    no_identity n1;
    f();
    no_identity n2;
    g();
}

would be equivalent to:

{
    no_identity::no_identity();
    f();
    no_identity::no_identity();
    g();
    no_identity::~no_identity();
    no_identity::~no_identity();
}
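The call sequence described here matches what current C++ already does for two ordinary empty objects in a block, which can be observed with a small logging sketch. The `call_log` and `run_block` helpers are hypothetical names used only for this illustration; the ordinary `no_identity` below is a normal class, not a stateless one.

```cpp
#include <cassert>
#include <string>

// Records the order of calls as a string (illustrative only).
inline std::string& call_log() {
    static std::string log;
    return log;
}

struct no_identity {
    no_identity() { call_log() += "C"; }   // constructor call
    ~no_identity() { call_log() += "D"; }  // destructor call
};

inline void f() { call_log() += "f"; }
inline void g() { call_log() += "g"; }

// Runs the block from the message and returns the recorded call order.
inline std::string run_block() {
    call_log().clear();
    {
        no_identity n1;
        f();
        no_identity n2;
        g();
    }  // n2 and then n1 are destroyed here
    return call_log();
}
```

Calling `run_block()` yields `"CfCgDD"`: one constructor call per declaration and one destructor call per object at the end of the scope, in the order shown in the message.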

Thiago Macieira

Jul 14, 2016, 7:33:58 PM7/14/16
to std-pr...@isocpp.org
On Thursday, July 14, 2016 13:27:54 PDT Arthur O'Dwyer wrote:
> On another angle of the topic, it occurred to me the other day that if a
> programmer is trying to express the idea of "This object has no address —
> that is, I promise not to take its address except perhaps implicitly in
> trivial ways" — the most natural way to express that is with the newly
> freed-up register storage class.

Indeed it is. But it would be quite a head-scratcher to read code like:

register class allocator
{
    ...
};

struct my_data
{
    register allocator alloc;
};

It's one of those things we could do for the sake of convenience, but twenty
years from now we'd be asking ourselves "why register"?

Bjorn Reese

Jul 16, 2016, 6:49:22 AM7/16/16
to std-pr...@isocpp.org
Can the stateless subobjects access the member variables of the parent
class?

Thiago Macieira

Jul 16, 2016, 12:26:47 PM7/16/16
to std-pr...@isocpp.org
On Saturday, July 16, 2016 12:55:31 PDT Bjorn Reese wrote:
> Can the stateless subobjects access the member variables of the parent
> class?

I don't see how.

Nicol Bolas

Jul 16, 2016, 1:33:40 PM7/16/16
to ISO C++ Standard - Future Proposals
On Thursday, July 14, 2016 at 7:23:53 PM UTC-4, Thiago Macieira wrote:
On Thursday, July 14, 2016 13:09:36 PDT Nicol Bolas wrote:
> > Why can't std::allocator be declared in C++2x as stateless?
>
> Because that would potentially break code. The size of any object that used
> `allocator` as a member variable would change, thus breaking any ABIs
> involving such objects.

How? If we require both in the declaration of the class and where it's used,
then no current code would be affected.

This now requires that everyone go through their code and annotate all of their empty types with `stateless`. Not to mention going through the standard library and deciding which classes that are currently "empty" should be explicitly declared `stateless`. Not to mention cases where the library is fixed and cannot be upgraded. Not to mention ABI issues and so forth.

If you force this double-stateless declaration, then people will simply not care. They'll go right back to deriving from empty classes to avoid bloat. Why? Because it always works, whereas the `stateless` version does not. If stateless subobjects are going to solve this problem, then the feature must be able to work on the exact same types that EBO can work on right now.

By the nature of the problem statement, this is non-negotiable.

> > They probably should, but we have two options:
> >
> > 1) require "fake stateless" classes to have a member
> > 2) require stateless classes to have a "stateless" keywords
> >
> > Why not #2?
>
> I do appreciate the simplicity of #2. It was after all my original design
> from the very first version of this idea
> <https://groups.google.com/a/isocpp.org/forum/#!searchin/std-proposals/state
> less$20inner/std-proposals/HjGujSdKXX0/27WIWTc_EwAJ>. However, remember back
> to the defining problem for this proposal. Someone gives us a type. We want
> to use that type as a member variable. But if that type is empty, we don't
> want our type taking up extra space.

But only if the class author declares that it shouldn't take up space. If the
class author requires that the object have a unique pointer address, different
from any and all other objects, then you cannot ignore her.

But there's no way to explicitly state that as it currently stands. So you can either:

A) Assume that every empty class currently existing requires each instance to have a pointer address that is different from every other instance.

B) Assume that every empty class currently existing does not require unique identity for its instances.

Furthermore, I don't believe that option A means that you must forbid such classes from being used statelessly. All you need to do is make sure they don't overlap. Just like with standard layout rules.


> #2 is not a solution to this problem. Why? Because people will still pass
> us empty-but-not-stateless types. And therefore this:
>
> template<typename Alloc>
> struct DataBlock : public Alloc
> {};
>
> Will always be a better alternative than this:
>
> template<typename Alloc>
> struct DataBlock
> {
>   Alloc a;
> };
>
> In the former, we are guaranteed to avoid bloat for any empty class. In the
> latter, we only avoid bloat if the user provides a stateless type. And
> stateless types will *always* be a subset of empty types.

But you don't know if you're allowed to "avoid bloat". Maybe your (premature)
optimisation will break code later.

No. That code will never break.

In current C++ rules, `Alloc` cannot assume that its pointer value is unique among all other objects. Why? Because the first NSDM of an object often has the same pointer address as the containing object. Because base classes often have the same address as their derived class. In fact, this is the whole basis of the EBO rule: that the base class and derived class can have the same address. Unique identity of an object does not mean unique address among all object types. Nor does it mean having a region of storage that is disjoint from all other objects (indeed, being a subobject at all makes that impossible).
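Both address-sharing cases mentioned here can be checked in current C++. This is a small sketch with made-up type names (`First`, `Container`, `Base`, `Derived`); both classes are standard layout, which is what makes the address equality reliable.

```cpp
#include <cassert>

struct First { int v; };

struct Container {
    First first;  // first non-static data member
    int other;
};

struct Base { int b; };
struct Derived : Base {};

// The first member of a standard-layout object lives at the object's address.
inline bool first_member_shares_address() {
    Container c{};
    return static_cast<const void*>(&c) == static_cast<const void*>(&c.first);
}

// The base subobject of this standard-layout class lives at the object's address.
inline bool base_shares_address() {
    Derived d{};
    const Base* base = &d;
    return static_cast<const void*>(base) == static_cast<const void*>(&d);
}
```

In both cases two objects of different types share one address, so an address on its own has never been a unique identifier across types.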

What `Alloc` can assume right now is that if two pointers of type `Alloc` are equal, then they are pointing to the same object instance. And if they are unequal, then they are pointing to separate instances. And that is where statelessness has a problem, since you can use such a unique pointer to create and access per-instance state.
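To illustrate what "per-instance state keyed on the pointer" looks like, here is a sketch of an empty type that relies on exactly that assumption. `Counter` and its side table are hypothetical and exist only for this example; they are not taken from the proposal.

```cpp
#include <cassert>
#include <map>
#include <type_traits>

// Hypothetical empty type that keeps per-instance state in a side table
// keyed by `this`. It is correct only if distinct instances have distinct
// addresses.
struct Counter {
    static std::map<const Counter*, int>& table() {
        static std::map<const Counter*, int> t;
        return t;
    }
    void bump() const { ++table()[this]; }
    int count() const {
        auto it = table().find(this);
        return it == table().end() ? 0 : it->second;
    }
    ~Counter() { table().erase(this); }
};

static_assert(std::is_empty<Counter>::value, "Counter has no data members");

// Two separate objects keep separate counts because their addresses differ.
inline bool distinct_instances_keep_distinct_state() {
    Counter a, b;
    a.bump();
    a.bump();
    b.bump();
    return a.count() == 2 && b.count() == 1;
}
```

If two `Counter` subobjects were ever placed at the same address, their counts would silently merge, which is the failure mode being discussed.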

But it only has a problem if there are two (or more) stateless subobjects of type `Alloc` within the same class (well, the scope of the problem is a bit more complex, but it's all within the same object). After all, pointers to stateless subobjects must refer to a valid memory address within the object that contains them (just like any subobject). So if you have two distinct instances of `DataBlock<Alloc>`, each instance will have a different address for its `a` member.

So there is no problem with `DataBlock`'s usage of a stateless `Alloc`. Member functions of `Alloc` can assume that there is a 1:1 correspondence between its address and the address of any (valid) pointer to an `Alloc`.

I really think that the class author should be allowed to decide whether the
use of that class can take up no space.

Given the limitations of identity as outlined before (ie: one subobject of that type per containing object, recursively), give me a circumstance where the implementation of an empty type would break under the stateless rules as I have outlined them. And this "something" must be legal, well-defined C++.
 
> For example:
>
> stateless struct no_identity {};
> struct empty {};
>
> struct s1
> {
>   no_identity n1; //Stateless by default
>   stateless empty e1; //Explicitly stateless.
>   no_identity n2; //OK, n1 and n2 are aliases of the same object.
>   stateless empty e2; //ill-formed. `empty` was not declared `stateless`,
> and therefore it must have identity.
> };

That's the green-field case. It's the cleanest solution, I agree, but that
doesn't allow us to use it for std::allocator.

Why not? `no_identity` made it perfectly clear that two subobjects could have the same address. That's what it means to declare them `stateless`. However, since `empty` was not declared `stateless`, that means that it cannot have two subobjects that have the same address.

`std::allocator` would not need to be declared `stateless` in order to be used as a single, stateless NSDM.

Nicol Bolas

Jul 16, 2016, 2:09:46 PM7/16/16
to ISO C++ Standard - Future Proposals
Here's the next revision of the proposal.

I've added an issues section that discusses the identity problem that we've discussed here, along with an exploration of possible solutions. None of those are integrated into the actual proposed design yet, which remains as it was before.

I'm really leaning towards the solution where we forbid any stateless subobjects that violate unique identity, but allow classes explicitly declared stateless to lack unique identity. In such cases, multiple objects of the same type would coexist in the same storage. So you'd get multiple constructors and destructors called on the same memory location. But since you stated explicitly that the type was `stateless`, then you should know better than to use the `this` pointer in a way that expects unique identity.
Stateless Subobjects.html
Stateless Subobjects.md

Bengt Gustafsson

Jul 17, 2016, 6:51:58 AM7/17/16
to ISO C++ Standard - Future Proposals
When it comes to arrays of stateless objects I think there is a fourth option worth considering:

4. arrays of stateless objects take no room in the declared scope, but pointer arithmetic and indexing work as if sizeof(T) is 1 (which indeed it is).

This allows loops over the array elements and pointer arithmetic and similar operations to work as for any type. Of course you can't actually access the array elements, but this doesn't matter as there is nothing there anyway. I can't see that it would be worse to, say, do this:

stateless T arr[2];

arr[1].SomeMethod();   // Here you call SomeMethod() with a 'this' pointer which does not point to a T.

than this:

stateless T s;
s.SomeMethod();
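For comparison, this is how an ordinary (non-stateless) array of an empty type behaves in current C++. It is a sketch of today's rules only, with a made-up return value for `SomeMethod`: each element has its own address and the array occupies one element size per element, which is exactly the storage the option above would drop.

```cpp
#include <cassert>
#include <cstddef>

struct T {
    int SomeMethod() const { return 42; }  // uses no per-object state
};

// Adjacent elements of an ordinary array have distinct, consecutive addresses.
inline bool elements_have_distinct_addresses() {
    T arr[2];
    return &arr[0] != &arr[1] && &arr[0] + 1 == &arr[1];
}

// An ordinary array of empty objects still occupies storage.
inline std::size_t array_bytes() { return sizeof(T[2]); }
```

Here `array_bytes()` is `2 * sizeof(T)`, and `SomeMethod()` gives the same answer regardless of which element it is called on, since it never reads through `this`.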

Bengt Gustafsson

Jul 17, 2016, 6:59:43 AM7/17/16
to ISO C++ Standard - Future Proposals
Sorry, that got sent before it was complete...

I was aiming to point out that the this pointer when calling the method on the scalar s is just as invalid as when calling it for arr[1].

Oh, right now I see that this may cause an aliasing problem if the array address range goes outside the surrounding object, and then for instance two of those objects are allocated after each other on the heap. This brings up an idea I have had in the back of my head for a couple of days:

Couldn't it just be demanded from the compiler to guarantee that no two objects of the same type have the same address? Here is an example:

template<typename T> class Class {
    stateless T first;
    int x;

Bengt Gustafsson

Jul 17, 2016, 8:02:38 AM7/17/16
to ISO C++ Standard - Future Proposals
There it went again. I'm very sorry, but I think I pressed the space bar, how could that send the message?

Here is the complete class head I was aiming for:

template<typename T> class Class {
    stateless T first[4];   // 0 bytes
    int x;                  // 4 bytes
    stateless T second;     // 1 byte
};

Here first does not need its own storage as it does not alias with any other T thanks to x. However, and here I'm uncertain, it seems that second needs a dedicated byte for the case that two Class objects are placed after each other, in which case the 'second' member of the 1st object would alias with the 'first' member of the 2nd object. Maybe the compiler can be required to recognize this case and put a padding byte between the two variables?

This also touches on the discussion of inheriting from T to get EBO and then having the first NSDM being a T too, which creates an alias situation today, which this rule would (potentially) remove. 

This rule could also be refined to guarantee that there is no aliasing between stateless variables and other variables even if they have different types: Just assign an "unused" address to each stateless variable, i.e. an address which is partway through another (stateful) variable. In my example class above this means that to avoid aliasing between first[0] and x an extra byte would be needed somewhere, probably best to place it after x and use x+1 ... x+4 as the offsets for first's elements. If first was an array of 3 or fewer elements this would not be needed though. If first was an array of 2 or fewer elements second's offset could also be crammed in as x + 3. It seems that this type of rule would be best formulated as a QoI thing "compiler may select to assign an offset inside another member as an alternative to wasting a byte".

I can't say that I find this idea particularly appealing but isn't it the logical continuation of the thinking that the first stateless NSDM could have the same offset as the surrounding object... if it can have the same offset as the first non-empty member, then it should be equally ok to use some other offset within the object for the second stateless member, and least likely to alias in a bad way if this is an offset not pointing to the start of any non-empty NSDM. This can then be generalized to (small) arrays. In cases where no free offsets can be found padding bytes may have to be added.

I think we are definitely making progress, but it is still unclear if this reasoning leads all the way to a useful proposal. My conclusions are that 'stateless' on class and variable levels carry part of the semantic meaning and that all four combinations are different:

non-stateless variable of a non-stateless type: This is the current case.

stateless variable of a non-stateless type: The variable does not need to take up space if an "unused" offset can be found in the surrounding object. This still allows keying on the object address as may be required by the type's behavior. This also allows taking the address of the member and using it as a key even if the variable's class is unaware.

non-stateless variable of a stateless type: While stateless types may not have access to their own this pointer inside their methods, surrounding code may still take the address and use it as a key, so the compiler must still find an unused offset to be able to optimize the storage away.

stateless variable of a stateless type: Now neither the class itself or the surrounding object can get the address of the member so extra storage for it can always be avoided.


I think this means roughly that stateless on a class indicates that there is no this pointer with a particular address and stateless on a member means that its address can't be taken. This is logical as the author of the class knows it is empty and that accessing this is pointless. If such a class needs a this as a key, well, then it is not a candidate for being stateless after all. Similarly the author of code with a stateless member knows that the code does not rely on the member having a distinct address (except maybe inside its own methods) or she would not use the stateless keyword.

As an aside: I don't think that the possibility to restrict a template to only empty type parameter values is a good enough reason to complicate the proposal with three versions of stateless, i.e. stateless, stateless(bool) and stateless(auto). Also, with the stateless class possibility it gets unclear to at least a fraction of programmers whether stateless(auto) kicks in for all is_empty classes or only those explicitly declared as stateless. Finally, on this issue, it seems that what we typically want is partial specialization on whether a T is_empty or not, not a hard error. And this can already be achieved using TMP, with or without concepts.

With this question I conclude: Is this whole idea moot when we can use a concept to specialize class templates so that they do not even have a member if the type passed in is_empty?
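As a rough sketch of the TMP route mentioned above (not part of the proposal), a class template can be partially specialized on `std::is_empty` so that the member disappears for empty types. The names `Block`, `EmptyAlloc` and `CountingAlloc` are invented for illustration, and the empty case assumes the allocator is default-constructible so that one can be created on demand.

```cpp
#include <type_traits>

// Primary template: the allocator carries state, so it is stored as a member.
template <typename Alloc, bool IsEmpty = std::is_empty<Alloc>::value>
struct Block {
    Alloc a;
    int payload = 0;
    Alloc& alloc() { return a; }
};

// Partial specialization: the allocator is empty, so no member is declared.
// A temporary is created on demand (assumes Alloc is default-constructible).
template <typename Alloc>
struct Block<Alloc, true> {
    int payload = 0;
    Alloc alloc() { return Alloc{}; }
};

struct EmptyAlloc {};                     // hypothetical stateless allocator
struct CountingAlloc { int count = 0; };  // hypothetical allocator with state

static_assert(sizeof(Block<EmptyAlloc>) == sizeof(int),
              "no storage is spent on the empty allocator");
static_assert(sizeof(Block<CountingAlloc>) > sizeof(int),
              "the stateful allocator is stored");
```

Note that the two specializations do not expose quite the same interface: one returns a reference to a subobject, the other a temporary. Whether that difference matters is part of what the rest of the thread argues about.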

Avi Kivity

Jul 17, 2016, 11:54:00 AM7/17/16
to std-pr...@isocpp.org

Thanks a lot for this.  I've been meaning to come back to it "later", but your proposal is better than anything I would have come up with.


On 07/13/2016 09:18 PM, Nicol Bolas wrote:
Here's a small update to the idea, after receiving commentary. It explains the behavior of stateless types hopefully more explicitly. It also adds a notation that stateless subobjects still impose their alignments on their containing types. So if you declare that an empty class has 8 byte alignment, then classes which statelessly use it will have at least 8 byte alignment. Even though it doesn't take up size.
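The alignment point can already be observed with empty base classes, which is presumably the behavior a stateless member would mirror. A minimal sketch, with `AlignedTag` and `Holder` as made-up names:

```cpp
#include <type_traits>

struct alignas(8) AlignedTag {};   // empty, but requests 8-byte alignment

struct Holder : AlignedTag {       // empty base, normally occupying no storage
    char c;
};

static_assert(std::is_empty<AlignedTag>::value, "AlignedTag has no state");
static_assert(alignof(AlignedTag) == 8, "alignment was requested explicitly");
static_assert(alignof(Holder) >= 8, "the container inherits the stricter alignment");
static_assert(sizeof(Holder) % alignof(Holder) == 0,
              "the size is padded to a multiple of the alignment");
```

So even though the empty subobject contributes no bytes of its own, the containing type can still grow because of padding.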

Also, I've been thinking about complexity reduction (this isn't reflected in the new version). I'm starting to think that `stateless` doesn't need full constant-expression conditionals.

At subobject declarations, there are 3 scenarios of interest: 1) The user requires the subobject to be stateless and therefore the type must be empty. 2) The user wants the subobject to be stateless if the type allows that, but if not then it won't be stateless. 3) The user doesn't care.


Scenario 2 will be the most common one,  I imagine, and the syntax should reflect that.

With type declarations, there are 3 scenarios of interest: 1) The user wants all potentially stateless uses of the type to be stateless, so fail if the type is non-empty. 2) The user wants all potentially stateless uses of the type to be stateless if the type is empty, but otherwise be non-stateless. 3) The user does not want all potentially stateless uses of the type to be stateless.

Even something like `std::pair` doesn't need a conditional. A `pair<stateless1, stateless2>` is an empty type, and therefore we could declare that `pair` would be stateless if both types passed to it are stateless.

Nicol Bolas

Jul 19, 2016, 3:44:19 PM7/19/16
to ISO C++ Standard - Future Proposals
On Sunday, July 17, 2016 at 6:51:58 AM UTC-4, Bengt Gustafsson wrote:
When it comes to arrays of stateless objects I think there is a fourth option worth considering:

4. arrays of stateless objects take no room in the declared scope, but pointer arithmetic and indexing work as if sizeof(T) is 1 (which indeed it is).

This allows loops over the array elements and pointer arithmetic and similar operations to work as for any type. Of course you can't actually access the array elements, but this doesn't matter as there is nothing there anyway. I can't see that it would be worse to, say, do this:

A stateless object is supposed to be an object you can access. The whole point of defining statelessness as I have is to make such objects behave like regular objects as much as possible. Passing pointers and references to stateless subobjects is not optional.

Remember the primary use case:

template<typename Alloc>
struct Block
{
  stateless(auto) Alloc a;
};

If I can't do something as simple as pass a pointer/reference to `a` around, even as a function parameter, why would you do this rather than make `Alloc` be a base class? You've got to permit the same functionality as in the base class case, or the whole idea is pointless.
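For comparison, here is a sketch of the base-class version that the stateless member has to match. `MallocAlloc`, `grab` and `round_trip` are hypothetical names used only for this illustration; the point is that `alloc()` hands out an ordinary reference that other code can use.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical empty allocator, used only for illustration.
struct MallocAlloc {
    void* allocate(std::size_t n) const { return ::operator new(n); }
    void deallocate(void* p) const { ::operator delete(p); }
};

// Today's workaround: inherit from the allocator so that the empty base
// optimization can remove its storage.
template <typename Alloc>
struct Block : private Alloc {
    void* data = nullptr;
    Alloc& alloc() { return *this; }   // a real reference to the base subobject
};

// Ordinary code that only knows about the allocator type.
template <typename Alloc>
void* grab(Alloc& a, std::size_t n) { return a.allocate(n); }

// Holds on mainstream ABIs; shown to illustrate the size goal.
static_assert(sizeof(Block<MallocAlloc>) == sizeof(void*),
              "the allocator adds no storage");

inline bool round_trip() {
    Block<MallocAlloc> block;
    block.data = grab(block.alloc(), 16);   // pass the subobject by reference
    const bool got_memory = block.data != nullptr;
    block.alloc().deallocate(block.data);
    return got_memory;
}

inline void demo() { assert(round_trip()); }
```

A stateless member would need to support the same call with `block.a` in place of `block.alloc()`.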

Having "stateless objects" where you can't get a valid pointer to the object is a non-starter. Better to outright forbid requests which cannot be fulfilled than to pretend to fulfill those requests, but then tack on a bunch of conditions on their use.

If you cannot make a stateless array work like a regular array, then just don't allow them.

On Sunday, July 17, 2016 at 6:59:43 AM UTC-4, Bengt Gustafsson wrote:
Sorry, that got sent before it was complete...

I was aiming to point out that the this pointer when calling the method on the scalar s is just as invalid as when calling it for arr[1].

Oh, right now I see that this may cause an aliasing problem if the array address range goes outside the surrounding object, and then for instance two of those objects are allocated after each other on the heap.

That's not so much an aliasing problem as a "my pointer isn't valid" problem.
 
This brings up an idea I have had in the back of my head for a couple of days:

Couldn't it just be demanded from the compiler to guarantee that no two objects of the same type have the same address? Here is an example:

No. That would mean that `stateless` becomes a suggestion, not a requirement. That's bad. We're trying to improve EBO, not take its bad ideas.

It would also prevent implementations from having null conversions to a particular stateless subobject (that is, keeping the pointer the same as the container). And keeping that door open is important for other features I want to see.

But in any case, if someone uses `stateless`, they should mean it. And if it can't be stateless, then it should be a hard error. After all, this is all about being able to control the layout of a type. If your layout controls are optional, why bother?

On Sunday, July 17, 2016 at 8:02:38 AM UTC-4, Bengt Gustafsson wrote:
This also touches on the discussion of inheriting from T to get EBO and then having the first NSDM being a T too, which creates an alias situation today, which this rule would (potentially) remove.

No, it does not create an aliasing situation. Such a type is not standard layout, and therefore EBO is not required. And [intro.object]/6 makes it clear that the only legal layouts for such a type are those where the two subobjects would have different addresses.

So that's not a case we need to be concerned about. I mean, we have to be concerned about it from the perspective of how it works if you declare one or both objects to be stateless subobjects. But the old way that works ought to be fine.
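The case described above can be checked directly with today's language. This is a small sketch with illustrative names; it only demonstrates the existing rules, not anything from the proposal.

```cpp
#include <cassert>
#include <type_traits>

struct T {};

struct Derived : T {   // empty base of type T ...
    T first;           // ... and a first member of the same type
    int x;
};

// The repeated type means Derived is not standard layout.
static_assert(!std::is_standard_layout<Derived>::value,
              "base and first member share a type");

// The two T subobjects are distinct objects of the same type, so the
// implementation has to give them different addresses.
inline bool base_and_member_are_distinct() {
    Derived d{};
    const T* base = &d;           // implicit conversion to the base subobject
    const T* member = &d.first;
    return base != member;
}

inline void demo() { assert(base_and_member_are_distinct()); }
```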

This rule could also be refined to guarantee that there is no aliasing between stateless variables and other variables even if they have different types:

I see no reason to impose this limitation. Permitting two subobjects with different types to overlap in storage should not be a standardization problem. [intro.object]/5 explicitly permits it in certain cases, so we can expand this to others.

The only problem is dealing with two stateless subobjects of the same type. Or stateless subobjects and non-stateless base classes that have the same type. Or an annoyingly huge variety of similar cases.

But I think I've found a halfway decent rule that makes this work.


I think we are definitely making progress, but it is still unclear if this reasoning leads all the way to a useful proposal. My conclusions are that 'stateless' on class and variable levels carry part of the semantic meaning and that all four combinations are different:

I don't at all agree with your particular combinations, but I am leaning towards a different distinction. I think that the syntax used on subobject declarations should be different from what you specify on a type. At this point in the design, they are diverging enough to be worthy of separate syntax.

I'm leaning towards calling the subobject declaration `zero_sized`, since that is what it means: the subobject is zero-sized. Not the type itself, just the subobject.

Applying `stateless` to a type means that, when it is used as a subobject, it will be `zero_sized`, but it also means that the type has no identity. Oh, it still has an address, like any object. But the standard will no longer guarantee that the address of one instance of that type will be distinct from any other instance of that type. Also, a `stateless` type cannot have subobjects that are merely `zero_sized`; they must be types declared `stateless`. After all, if the containing type has no guaranteed unique identity, then neither can any type it contains.

But at no time does either one mean that `this` does not exist or that you can't get pointers to zero-sized subobjects.
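For reference, this is the identity guarantee that exists today and that a `stateless` type, as described above, would give up. The type names are illustrative.

```cpp
#include <cassert>

struct Empty {};

struct Pair {
    Empty a;
    Empty b;
};

// Every complete object has non-zero size, and two members of the same
// type must not share an address, so Pair needs at least two bytes.
static_assert(sizeof(Empty) >= 1, "even an empty class occupies storage on its own");
static_assert(sizeof(Pair) >= 2, "a and b each get their own byte");

inline bool members_have_identity() {
    Pair p{};
    return &p.a != &p.b;   // the address distinguishes the two members
}

inline void demo() { assert(members_have_identity()); }
```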

Nicol Bolas

Jul 19, 2016, 7:35:44 PM7/19/16
to ISO C++ Standard - Future Proposals
Yet another revision.

This one puts forth my preferred solution to the so-called unique identity problem. Namely, finding a good rule to forbid cases where stateless subobjects would lose identity, but allowing types declared stateless to have identity. This also means that stateless types cannot have subobjects that are not themselves declared stateless.

The new version also recognizes something I hadn't realized about `is_empty`. Namely that being empty by that definition does not mean that the type's size is 1. An `is_empty` type does not have to be standard layout.

As such, I've changed the wording to not talk about empty types with stateless subobjects. Instead, I define a new concept: a possibly stateless type (PST), which is a type that satisfies both `is_empty` and `is_standard_layout`, since only these types are required by the standard to have no non-zero-sized subobjects.
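The PST definition above is simple enough to write as a trait. This is only a sketch: the name `is_possibly_stateless` is made up here, and the proposal text may spell the concept differently.

```cpp
#include <type_traits>

// Sketch of the "possibly stateless type" test: empty and standard layout.
template <typename T>
struct is_possibly_stateless
    : std::integral_constant<bool,
          std::is_empty<T>::value && std::is_standard_layout<T>::value> {};

struct Tag {};                              // empty and standard layout
struct Counter { int n; };                  // has state
struct Interface { virtual void f() {} };   // polymorphic, so not empty per std::is_empty

static_assert(is_possibly_stateless<Tag>::value, "Tag qualifies");
static_assert(!is_possibly_stateless<Counter>::value, "Counter has a data member");
static_assert(!is_possibly_stateless<Interface>::value, "Interface is polymorphic");
```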

Furthermore, I've been thinking a lot about making a syntactic change. Since `stateless` as it applies to types now has a stronger meaning, it occurs to me that a stateless type is something rather different from a stateless subobject. Stateless types imply possible loss of identity, while stateless subobjects do not. I've also looked up some rules on how base class layouts are treated, and I noticed that the standard had quite a few provisions for "zero sized" subobjects. Granted, only base class subobjects, but that can be changed.

So I'm thinking that it would be best to change the per-subobject declaration to `zero_sized` or something like that. This makes it clear that the subobject takes up no size in the container's layout. `stateless` would still be used on types, since that represents something much stronger: a willingness to lose identity and so forth.
Stateless Subobjects.html
Stateless Subobjects.md

Arthur O'Dwyer

Jul 19, 2016, 8:47:20 PM7/19/16
to ISO C++ Standard - Future Proposals
On Tue, Jul 19, 2016 at 12:44 PM, Nicol Bolas <jmck...@gmail.com> wrote:
On Sunday, July 17, 2016 at 6:59:43 AM UTC-4, Bengt Gustafsson wrote:

Couldn't it just be demanded from the compiler to guarantee that no two objects of the same type have the same address? Here is an example:

No. That would mean that `stateless` becomes a suggestion, not a requirement. That's bad. We're trying to improve EBO, not take its bad ideas.

But in any case, if someone uses `stateless`, they should mean it. And if it can't be stateless, then it should be a hard error. After all, this is all about being able to control the layout of a type. If your layout controls are optional, why bother?

All "layout controls" in C++ are optional right now, because the implementation is permitted to insert padding bytes anywhere it pleases (even in a standard-layout POD struct).  Yes, yes, the members have to be in a certain order and the first member must be at offset 0; but if the compiler chooses to put your three consecutive S-sized objects at offsets S+1, 2S+2, and 3S+47, there's really nothing you can do about it except complain about its quality of implementation.  I suppose this property would continue to hold even when S==0.
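To make the padding point concrete, here is a small sketch of what can and cannot be asserted portably about a standard-layout struct. `Three` is an illustrative type; the exact offsets are whatever the implementation chooses.

```cpp
#include <cstddef>
#include <type_traits>

struct Three {
    short a;
    short b;
    short c;
};

static_assert(std::is_standard_layout<Three>::value, "offsetof is well-defined here");

// What the language promises: the first member sits at offset 0 and
// later members have higher addresses.
static_assert(offsetof(Three, a) == 0, "first member at offset 0");
static_assert(offsetof(Three, a) < offsetof(Three, b), "declaration order is kept");
static_assert(offsetof(Three, b) < offsetof(Three, c), "declaration order is kept");

// What it does not promise: the absence of padding. Only a lower bound holds.
static_assert(sizeof(Three) >= 3 * sizeof(short), "padding may make it larger");
```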

So I think Bengt's idea deserves more than just a quick writeoff; I think it's potentially useful.


> It would also prevent implementations from having null conversions to a particular stateless
> subobject (that is, keeping the pointer the same as the container). And keeping that door
> open is important for other features I want to see.

Given
    struct S { stateless M m1; stateless M m2; int x; stateless M m3; };
do you mean that you want
    assert( (void*) &s == (void*) &s.m1 );   // I approve of this
or
    assert( (void*) &s == (void*) &s.m2 );   // I think this is sketchy only because m1 and m2 have the same type
    // Bengt's suggestion is to force the implementation to insert a padding byte between m1 and m2
    // so as to preserve this assertion's falseness
or
    assert( (void*) &s == (void*) &s.m3 );   // I think this is flat-out crazy, fwiw
?

Also, either the mention of "other features I want to see" is a distraction, or else it's relevant; if it's relevant, could you explain and/or link to an explanation of those features?

–Arthur

Nicol Bolas

Jul 20, 2016, 2:46:24 AM7/20/16
to ISO C++ Standard - Future Proposals


On Tuesday, July 19, 2016 at 8:47:20 PM UTC-4, Arthur O'Dwyer wrote:
On Tue, Jul 19, 2016 at 12:44 PM, Nicol Bolas <jmck...@gmail.com> wrote:
On Sunday, July 17, 2016 at 6:59:43 AM UTC-4, Bengt Gustafsson wrote:

Couldn't it just be demanded from the compiler to guarantee that no two objects of the same type have the same address? Here is an example:

No. That would mean that `stateless` becomes a suggestion, not a requirement. That's bad. We're trying to improve EBO, not take its bad ideas.

But in any case, if someone uses `stateless`, they should mean it. And if it can't be stateless, then it should be a hard error. After all, this is all about being able to control the layout of a type. If your layout controls are optional, why bother?

All "layout controls" in C++ are optional right now, because the implementation is permitted to insert padding bytes anywhere it pleases (even in a standard-layout POD struct).  Yes, yes, the members have to be in a certain order and the first member must be at offset 0; but if the compiler chooses to put your three consecutive S-sized objects at offsets S+1, 2S+2, and 3S+47, there's really nothing you can do about it except complain about its quality of implementation.  I suppose this property would continue to hold even when S==0.

So I think Bengt's idea deserves more than just a quick writeoff; I think it's potentially useful.

You are correct; you cannot a priori know the layout of a standard layout struct. But you can a priori know if two structs are layout compatible. And that's what we're competing with here. Namely, this:

struct empty{};

struct foo
{
  int x;
};

struct bar1 : public empty
{
  int y;
};

struct bar2
{
  int z;
  stateless empty e;
};

The C++ standard requires that `foo` and `bar1` have the same layout.  We are trying to define a feature that allows us to replace `bar1` with `bar2`. If our feature is going to be able to do that, it needs to have the same degree of certainty. Not an optimization or quality of implementation, but a hard requirement.

Because if we can't require that, why should anyone prefer the uncertainty of `bar2` over the absolute certainty of `bar1`?
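A compilable approximation of the three structs may help here. `stateless` is not valid C++, so `bar2` below substitutes the C++20 `[[no_unique_address]]` attribute, which postdates this thread and only permits the overlap rather than requiring it. Treat that substitution as an assumption made for illustration, not as the proposed feature.

```cpp
#include <type_traits>

struct empty {};

struct foo {
    int x;
};

struct bar1 : public empty {   // empty base class
    int y;
};

struct bar2 {
    int z;
    [[no_unique_address]] empty e;   // stand-in for `stateless empty e;` (C++20)
};

// bar1 is standard layout, and mainstream ABIs give it exactly foo's size.
static_assert(std::is_standard_layout<bar1>::value,
              "an empty base keeps bar1 standard layout");
static_assert(sizeof(bar1) == sizeof(foo), "the empty base adds no storage");

// For bar2 the attribute only permits the overlap, so nothing stronger
// than a lower bound can be asserted portably.
static_assert(sizeof(bar2) >= sizeof(foo), "e may or may not occupy storage");
```

The asymmetry between the last two assertions is exactly the "certainty versus quality of implementation" gap being discussed.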

> It would also prevent implementations from having null conversions to a particular stateless
> subobject (that is, keeping the pointer the same as the container). And keeping that door
> open is important for other features I want to see.

Given
    struct S { stateless M m1; stateless M m2; int x; stateless M m3; };
do you mean that you want
    assert( (void*) &s == (void*) &s.m1 );   // I approve of this
or
    assert( (void*) &s == (void*) &s.m2 );   // I think this is sketchy only because m1 and m2 have the same type
    // Bengt's suggestion is to force the implementation to insert a padding byte between m1 and m2
    // so as to preserve this assertion's falseness
or
    assert( (void*) &s == (void*) &s.m3 );   // I think this is flat-out crazy, fwiw
?
 
In the current version of the proposal, the meaning of that code depends on what `M` is declared as.

If `M` is just a regular potentially-stateless type (is_empty and is_standard_layout), then `S` is a compile error. The current proposal makes it explicitly illegal to have two subobjects of the same type, at least one of which is declared stateless, and they can possibly alias to the same address (note: the actual unique identity rule is rather more well-defined, though it does cull out some valid use cases in the interests of simplicity).

Unless of course `M` is explicitly declared as a `stateless` type. In which case, this code compiles, `sizeof(S) == sizeof(int)`, and all three members share the same address. Why is that OK? Because you declared the type to be `stateless`, and therefore you specifically declared that it was OK for the object to alias with another instance of itself.

Self-aliasing is, and must be, opt-in. If the user wants it, they can have it, but it is a property of the type and it is not available by default. Self-aliasing isn't a fundamental problem with C++'s object model; it's simply something that we cannot impose upon types, since C++ has never allowed such a thing before. Thus, having a way to explicitly say "It's OK to self-alias me" is a good thing.

The reason for making it illegal if `M` isn't stateless is what I said before: you don't want stateless to be a suggestion. Either the type can fulfill the request made of it, or the programmer needs to be informed that it cannot.

Lastly, think very carefully about what you're saying. Let's say we do things your way, where instead of forbidding such declarations, we allow it. But in these cases we bloat the type silently (even though that was explicitly what the user did not want to have happen).

So... is `S` empty? Well, I recently learned that `is_empty` counts types that aren't standard layout (due to using multiple empty base classes of the same type), so by your reckoning, I suppose that's what would happen here. We would declare that `S` is nominally empty, but it would not be empty in any useful way. For example, you could never get EBO with it.

Would `S` be standard layout too? I don't think it's possible to declare that it would be standard layout. Why? Because if you did make it standard layout and empty... you'd break the concept of standard layout.

If a type is both standard layout and empty, then that type is truly empty by the nature of its layout. The size of it should be 1, and it should be layout-compatible with every other type that is empty and standard layout. If you use it as the base class of another standard layout type, then you are guaranteed that it will not take part in the layout of that type (unless you break one of the other rules of standard layout).

A class that contains 3 stateless subobjects which take up space cannot be both empty and standard layout. It cannot be layout compatible with any other standard layout/empty class. You cannot use it as a base class without it disrupting layout. And so forth.

Complexities like this are why it's so much better to just flat-out forbid these kinds of things. My rule for edge cases like this is that if you can't make it truly stateless, make it illegal. That's the reason why stateless arrays are forbidden, after all.
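For comparison, the C++20 `[[no_unique_address]]` attribute (which postdates this thread) behaves much like the "allow it, but let the type grow" alternative criticized above. The sketch below uses it as a stand-in for `stateless` on an ordinary empty type `M`; it does not model a type that has opted in to self-aliasing.

```cpp
#include <cassert>

struct M {};   // ordinary empty type, not opted in to self-aliasing

struct S {
    [[no_unique_address]] M m1;   // stand-in for `stateless M m1;`
    [[no_unique_address]] M m2;
    int x;
    [[no_unique_address]] M m3;
};

// Same-type members must keep distinct addresses, so the implementation
// is free to add padding; only a lower bound on the size is portable.
static_assert(sizeof(S) >= sizeof(int), "the size may exceed sizeof(int)");

inline bool members_stay_distinct() {
    S s{};
    return &s.m1 != &s.m2 && &s.m2 != &s.m3 && &s.m1 != &s.m3;
}

inline void demo() { assert(members_stay_distinct()); }
```

Nothing in the declaration of `S` tells the reader whether padding was inserted, which is the silent bloat objected to above.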

Also, either the mention of "other features I want to see" is a distraction, or else it's relevant; if it's relevant, could you explain and/or link to an explanation of those features?

Huh. I guess I forgot to link to the old thread in my initial post. I thought I had.

The main thing there is that it's very important that it be possible to implement stateless subobjects such that their pointer always points to their direct containing type. If you can't do that, then stateless inner classes would have to not be stateless (or rather, they'd bloat the state of their container, which is exactly what the whole idea is trying to avoid). That is, it's important to be able to pass a pointer/reference of an inner class to someone, and from that pointer alone, the system needs to be able to reconstruct the containing instance pointer. And the only way I can see to do that for stateless inner class members is if the address of the member is actually the address of the containing class itself. Always. No matter what.

Now, since we're starting to diverge in the distinction between zero-sized subobject and stateless types, it would not be unreasonable to say that a zero-sized subobject could get an address that's anywhere, but a stateless subobject would get the address of its container (since they're allowed to freely alias anyway).
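To illustrate what "reconstructing the containing instance" looks like where it is already possible, here is the base-class form using CRTP. The names `Doubler` and `Widget` are hypothetical, and this only shows what exists today for base classes; it does not implement stateless inner class members.

```cpp
#include <type_traits>

// Empty mixin that reaches the object it is part of. For a base class the
// conversion back to the container is a static_cast.
template <typename Derived>
struct Doubler {
    constexpr int doubled() const {
        return 2 * static_cast<const Derived*>(this)->value;
    }
};

struct Widget : Doubler<Widget> {
    int value;
    constexpr explicit Widget(int v) : value(v) {}
};

static_assert(std::is_empty<Doubler<Widget>>::value, "the mixin carries no state");
static_assert(sizeof(Widget) == sizeof(int),
              "holds on mainstream ABIs: the mixin adds no storage");
static_assert(Widget(21).doubled() == 42, "the mixin reads the container's member");
```

With a member instead of a base there is no such cast available, which is why the address relationship between a stateless member and its container matters for the inner-class idea.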

Nicol Bolas

Jul 20, 2016, 6:41:46 PM7/20/16
to ISO C++ Standard - Future Proposals
When I was documenting this idea (that is, `zero_sized` not being a binding request) in the proposal as an alternative to fixing the unique identity problem, I came to realize something. Well, besides the fact that I hate this idea with the fiery intensity of a million suns.

It could be a palatable solution with two changes:

1: `stateless` types are immune. My most recent version of the proposal says that if you declare a type `stateless`, then you are declaring that the type doesn't want or expect unique identity. That should remain true: if you use a stateless type as a subobject, it never disturbs the type's layout. Even if you put 20 of them in the same type and nothing else, they will all alias the same memory address happily.

2: `zero_sized` needs to work like empty/standard layout types in standard layout base classes. That is, so long as you follow the standard layout rules, they are guaranteed to be zero sized. Breaking the rules is OK; you'll still have a functioning data structure. But the layout of the type may be perturbed by the presence of those fields.

The latter is very important, because it allows the user to not have to rely on "quality of implementation", so long as they're following the rules. And thus, they have as much certainty in `zero_sized` declarations not causing bloat as they do in empty base classes in standard layout types.

The only reason I am even considering this is because I haven't managed to come up with a reasonable "unique identity rule" that only causes a compile error in cases where aliasing is guaranteed to happen. By doing things this way, the "unique identity rule" would simply specify the circumstances when the user can guarantee that the layout will not be disturbed. Effectively, it would be an addition to the standard layout rule, since a type that violates it can't be standard layout.

Also, the user can use `static_assert(is_standard_layout_v<T>);` to detect if the type is standard layout.

It would also mean that, since applying `zero_sized` would always be a suggestion, if the type isn't actually empty and standard layout, there would be no need for a compile error. And thus, no need for `zero_sized(auto)` to suppress such compile errors.

Nicol Bolas

Jul 25, 2016, 11:08:26 PM7/25/16
to ISO C++ Standard - Future Proposals
The next revision is here.

This one makes the division between "stateless types" and "zero sized subobjects" formal. `zero_sized` is what you apply to subobject declarations, and `stateless` is what you apply to types.

I'm kinda at a fork in the road with the unique identity problem. The proposal presents the same solution as before: if a type violates the "unique identity rule", you get a hard compile error.

The problem is the sheer complexity of coming up with a simple yet comprehensive version of the "unique identity rule". Any rule that is sufficiently "simple" leaves out a lot of cases where aliasing cannot possibly occur. And any rule that is even remotely comprehensive is just too complicated to even specify in simple prose, let alone standardese.

There is a solution of course: abandon the unique identity rule and make `zero_sized` a suggestion rather than a requirement. The proposal actually spends some time looking over that alternative. And while I hate it, I cannot deny... that it actually works.

Basically, this form of the "unique identity rule" is simplified because it only works for standard layout types. `zero_sized` is effectively treated like empty base classes in standard layout types. They are guaranteed to be zero sized... unless you break the rules of standard layout by using the same base class type twice or having the first NSDM match a base class type. This version simply adds all other `zero_sized` declared subobjects to that list of types that cannot appear more than once. If they do, then the type is no longer standard layout and thus the compiler is free to do whatever it wants with the layout.

It even permits users to enforce whether layout can be maintained, since you can always `static_assert(is_standard_layout_v<T>)` on the type. This also means that `zero_sized(auto)` is not necessary, since any use of `zero_sized` is just a suggestion (though this would make the wording a bit more odd, since only an empty type can actually be zero sized, regardless of its declaration).

Personally, I much prefer having `zero_sized` be a hard requirement. And yet, I can't deny the simplicity (and orthogonality with empty base classes) of having `zero_sized` only work in standard layout types.

Which is your preference? The proposal attached here presents both, though the `zero_sized`-as-suggestion part is down in the "Unique Identity Problem" section.

But in either case, `stateless` declared types are always exempt from this sort of issue. They can alias with one another just fine (which is why we require the type to declare itself such), so the compiler need not insert special layout stuff to prevent that.
Stateless Subobjects.html
Stateless Subobjects.md

Matthew Woehlke

Aug 2, 2016, 3:39:53 PM8/2/16
to std-pr...@isocpp.org
On 2016-07-16 06:55, Bjorn Reese wrote:
> Can the stateless subobjects access the member variables of the parent
> class?

Not in and of themselves. You're thinking of inner classes. An inner
class may be stateless, but a stateless member is not necessarily an
inner class.

--
Matthew