Virtual data members


fourt...@gmail.com

Jan 24, 2013, 12:39:00 AM
to std-pr...@isocpp.org
It's common when working with class hierarchies that you'll want to describe the object's type in some useful way that can be dynamically accessed. A good example of this would be reflection where you might want to store description information for each class. Two obvious approaches for doing this now would be:

1) Virtual function:
  class Base { public: virtual bool IsComposite() { return false; } };

class Single : public Base {};

class Multi : public Base { public: virtual bool IsComposite() { return true; } };


2) Data member:
  class Base { public: bool IsComposite; Base(bool composite = false) : IsComposite(composite) {} };

class Single : public Base {};

class Multi : public Base { public: Multi() : Base(true) {} };


The first approach suffers from virtual function call overhead -- follow the vtable pointer then follow the function pointer and execute. The second approach uses unnecessary additional memory to store the bool per object instance and does not adequately represent that it is a property of the type.
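To put a rough number on the per-object cost of the second approach, here is a minimal sketch. The names `WithVirtual`, `WithFlag`, and `per_object_overhead` are made up for illustration, and the exact sizes depend on the ABI; the check only holds on typical implementations that store one vtable pointer per object.

```cpp
#include <cassert>
#include <cstddef>

// Approach 1: the property sits behind a virtual call; no per-object flag.
struct WithVirtual {
    virtual ~WithVirtual() = default;
    virtual bool IsComposite() const { return false; }
};

// Approach 2: the property is stored in every object.
struct WithFlag {
    virtual ~WithFlag() = default;
    bool IsComposite = false;
};

// Extra bytes each object pays for carrying the flag (padding included).
constexpr std::size_t per_object_overhead() {
    return sizeof(WithFlag) - sizeof(WithVirtual);
}

inline void usage() {
    // Holds on common ABIs; the standard itself does not guarantee object sizes.
    assert(per_object_overhead() > 0);
}
```

On many 64-bit ABIs the single bool costs a full pointer-sized slot because of alignment padding, but that is an observation about common compilers, not a guarantee.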

I'd like to float the idea of virtual data members. A virtual data member is a static const value that is stored in the class's vtable. 'Virtual' simply provides a method to access a static const data member according to the dynamic type of an object. Subclasses can override the initialization (but not the type) of these virtual data members. This is a middle ground between the two approaches above, storing only one value per class instead of per object.

So in this example we could instead do:
class Base {virtual const bool IsComposite=false;};                 //vtable contains value false
class Single : public Base{};                                       //vtable contains value false
class Multi : public Base { virtual const bool IsComposite=true;};  //vtable contains value true

...
{
Base* b = new Single;
b->IsComposite; //false

Base* m = new Multi;
m->IsComposite; //true
Base::IsComposite; //false
}


Although the standard doesn't require the existence of a vtable, we can colloquially expect that the compiler would implement this by simply inserting the bool into a properly-aligned, well-defined location in the vtable of Base and each subclass of Base.

The syntax here can be flexible but I think that virtual is a great word to use. Override and final can map naturally to the same intuitive meaning. Unfortunately the =0 syntax for pure virtual functions doesn't map over so nicely here since =0 would likely be understood as the value initializer. Instead, =delete can be used to mark the data as 'pure virtual'. Classes with pure virtual data members cannot be instantiated and pure virtual data members cannot be statically accessed.

Thoughts?

Nicol Bolas

Jan 24, 2013, 3:03:20 AM
to std-pr...@isocpp.org, fourt...@gmail.com
I can't really consider this a problem significant enough to be worth solving. All it saves is a few cache-line reads on a virtual call, or a couple of bytes of memory per object that you use in inheritance scenarios. Is this really something that deserves a language change? Are we really agonizing over a few bytes of data in inheritance graphs?

It's also not entirely clear what the lifetime of this virtual variable is. Are literal types the only types allowed, or are user-defined types allowed? When do they get created, and who is responsible for doing so? When do they get destroyed?

Also, overloading `= delete` to mean something completely different from what `= delete` normally means is not going to be acceptable. I would much rather you either not be allowed to have pure-virtual data members at all, or force constructor/uniform initialization syntax for them:

virtual int dataMember1(2);
virtual int dataMember2{3};
virtual int dataMember3 = 4; //Not legal.

DeadMG

Jan 24, 2013, 6:15:25 AM
to std-pr...@isocpp.org, fourt...@gmail.com
I would not dismiss this so readily. Clang, for example, would much appreciate such a feature (if they ever got out of C++03). The fact is that people roll their own RTTI for speed, and this is usually how it's implemented, and they do that because they need the absolute maximum performance. Giving them even higher performance would seem to be an appropriate optimization.

As for lifetime and other issues, they are just regular static variables. The only thing unusual about them is that they happen to be placed in a special segment of memory. However, this would mean that if you are on a system with both ROM and RAM, you would not be able to place some vtables in ROM anymore if a non-literal type was used.

fourt...@gmail.com

Jan 24, 2013, 11:06:20 AM
to std-pr...@isocpp.org, fourt...@gmail.com
That's right - these would have all the same initialization and lifetime rules as static const variables.

I don't see any reason why we couldn't allow user-defined types since it follows the same basic rules as static const. The RAM/ROM issue is interesting for platforms that support it. I believe this can be accounted for by the compiler. If you need to guarantee that the vtable is placed in ROM then declaring the data member as constexpr e.g. 'virtual constexpr MyType(1);' should be sufficient to support that guarantee. The compiler can have its own switches for e.g. raising a warning or error if a vtable cannot be placed in ROM.
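As a rough illustration of the constexpr point in today's C++ (not the proposed syntax): a literal type with a constexpr constructor can be constant-initialized, which is the precondition for a toolchain to place it in read-only storage. `MyType`'s layout, `kBaseInfo`, `kMultiInfo`, and `Info()` are hypothetical names for this sketch, and whether the data actually lands in ROM is up to the compiler and linker.

```cpp
#include <cassert>

struct MyType {
    int value;
    constexpr explicit MyType(int v) : value(v) {}
};

// Namespace-scope constexpr objects are constant-initialized:
// no constructor runs at program startup.
constexpr MyType kBaseInfo{1};
constexpr MyType kMultiInfo{2};

struct Base {
    virtual ~Base() = default;
    // Emulation: a virtual call returns the per-class constant.
    virtual const MyType& Info() const { return kBaseInfo; }
};

struct Multi : Base {
    const MyType& Info() const override { return kMultiInfo; }
};

// The values are usable in constant expressions, unlike ordinary static const objects
// of a type with a non-constexpr constructor.
static_assert(kMultiInfo.value == 2, "constant-initialized");

inline void usage() {
    Multi m;
    const Base& b = m;
    assert(b.Info().value == 2);  // selected by dynamic type
}
```

The emulation still pays for a virtual call; the proposal would replace that call with a load from a fixed vtable offset.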

I do see this as a valuable problem worth solving. Rarely is it actually a feasible solution to store type information directly in the object, because anything beyond some trivial amount of data will significantly bloat structure size. Any non-trivial implementation of roll-your-own type information is often going to require a per-object pointer to some class object, which is basically what the vtable pointer already exists for. Allowing us to stuff additional data in this existing structure would be great for cache locality since some portion of the vtable would often already be in memory.

Consider the existing solution one might implement where each object stores a pointer-to-type-info. I know this is a common implementation in many object frameworks. By moving to virtual data you would have the advantage of no longer needing this per-object pointer-to-type and have improved chances that the data you want will already be cached. The disadvantage is that the type info you store would now need to be static const.
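A minimal sketch of that existing pointer-to-type-info pattern, for reference. `TypeInfo`, `Object`, and `kInfo` are illustrative names, not taken from any particular framework.

```cpp
#include <cassert>

// Per-class description record; one instance per class.
struct TypeInfo {
    const char* name;
    bool is_composite;
};

class Object {
public:
    // One pointer load, no virtual call.
    const TypeInfo& type() const { return *type_; }
protected:
    explicit Object(const TypeInfo& t) : type_(&t) {}
private:
    const TypeInfo* type_;  // the per-object cost: one extra pointer
};

class Single : public Object {
public:
    Single() : Object(kInfo) {}
    static const TypeInfo kInfo;
};
const TypeInfo Single::kInfo = {"Single", false};

class Multi : public Object {
public:
    Multi() : Object(kInfo) {}
    static const TypeInfo kInfo;
};
const TypeInfo Multi::kInfo = {"Multi", true};

inline void usage() {
    Multi m;
    const Object& o = m;
    assert(o.type().is_composite);  // resolved through the stored pointer
}
```

With virtual data members the `type_` pointer would go away, since the vtable pointer already identifies the class.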

The advantage over virtual function calls is instruction cache locality and saving the overhead of a function call. Function call overhead can be significant in tight loops especially since these calls cannot in general be inlined. The compiler knows that simply accessing this data cannot modify global state like a virtual function could and so may present significant opportunities for optimization in the surrounding code where virtual functions are converted to virtual data.  The disadvantage is again static const.

The advantage over storing the data in the object is a reduction in per-object memory usage which may help with cache locality. The disadvantage is the static const requirement and the single indirection to get at the data.

For cases where you do indeed have static const/constexpr data that you want associated with a type, virtual data members could be a very useful tool for improving performance and making good guarantees about your type.

Sylvester Hesp

Jan 24, 2013, 11:09:22 AM
to std-pr...@isocpp.org, fourt...@gmail.com


On Thursday, January 24, 2013 12:15:25 PM UTC+1, DeadMG wrote:
The fact is that people roll their own RTTI for speed, and this is usually how it's implemented, and they do that because they need the absolute maximum performance.

Be that as it may, I think it's more productive to actually extend / implement the RTTI and introspective capabilities of C++. This proposal basically suggests a specialization of attributes - some extra properties stored alongside the type. There is really no reason why it should be limited to classes. Just look at the possibilities of attributes in other languages such as Java and .Net.

So a -1 for this proposal for me. But if it will be considered, a couple of points:
- I don't like the term "virtual data members", nor the syntax. It suggests instance members, while they're really static. I think "virtual static int member" would be better syntax.
- Why are they being initialized in the constructor? Do I really need to construct an instance for them to be initialized? And can the initialization depend on constructor parameters? If they're static, at least treat them the same as other static members.

fourt...@gmail.com

Jan 24, 2013, 11:19:02 AM
to std-pr...@isocpp.org, fourt...@gmail.com
On Thursday, January 24, 2013 10:09:22 AM UTC-6, Sylvester Hesp wrote:
- Why are they being initialized in the constructor? Do I really need to construct an instance for them to be initialized? And can the initialization depend on constructor parameters? If they're static, at least treat them the same as other static members.

 My examples above are initializing them exactly the same as static const member data would be initialized, not in the constructor. The initialization would occur during static initialization like normal static data members.

DeadMG

Jan 24, 2013, 11:45:52 AM
to std-pr...@isocpp.org, fourt...@gmail.com
Extending the introspective capacity of C++ wouldn't solve the same problem. We already have the capacity to get this kind of value- the problem is that the native implementation is too slow for a number of reasons. Reflection and this proposal do not solve the same problem.

Nicol Bolas

Jan 24, 2013, 3:28:48 PM
to std-pr...@isocpp.org, fourt...@gmail.com
On Thursday, January 24, 2013 8:45:52 AM UTC-8, DeadMG wrote:
Extending the introspective capacity of C++ wouldn't solve the same problem. We already have the capacity to get this kind of value- the problem is that the native implementation is too slow for a number of reasons.

Isn't that a problem that should be solved by the implementations? I mean, this all sounds like a catch-22: the implementors have no pressure to improve RTTI performance because people don't rely on its performance. And people don't rely on RTTI performance because implementors don't make it a priority.

How would that change with this new feature? Is there something about the current RTTI definition that makes reasonable performance impossible? If so, isn't that what we should fix, rather than promoting hack "solutions"?

DeadMG

Jan 24, 2013, 4:20:44 PM
to std-pr...@isocpp.org, fourt...@gmail.com
Last I checked, changing the language so that the current RTTI can be performance-competitive would effectively result in modules, or possibly something even more extreme in terms of changing the language- i.e., it's effectively an unfixable problem. The reason why the existing RTTI is so slow is basically because all inheritance hierarchies are arbitrarily complex, from the point of view of the runtime, whereas Clang can do something much cheaper, since it knows that, say, nobody inherits from llvm::Function- whereas the compiler does not know such a thing.
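For readers who haven't seen it, the cheaper hand-rolled scheme looks roughly like the sketch below. It follows the general shape of LLVM-style RTTI (a kind tag plus `classof`, `isa`, `dyn_cast`), but it is a self-contained toy, not LLVM's actual headers, and the `Value`/`Function`/`Constant` classes here are stand-ins.

```cpp
#include <cassert>

class Value {
public:
    enum Kind { K_Function, K_Constant };
    Kind kind() const { return kind_; }
protected:
    explicit Value(Kind k) : kind_(k) {}
private:
    const Kind kind_;  // per-object tag, set once by the most-derived constructor
};

class Function : public Value {
public:
    Function() : Value(K_Function) {}
    // A single comparison suffices only because we assume nothing derives from Function.
    static bool classof(const Value* v) { return v->kind() == K_Function; }
};

class Constant : public Value {
public:
    Constant() : Value(K_Constant) {}
    static bool classof(const Value* v) { return v->kind() == K_Constant; }
};

template <typename To>
bool isa(const Value* v) { return To::classof(v); }

// Returns nullptr when the dynamic kind does not match, like dynamic_cast on pointers.
template <typename To>
const To* dyn_cast(const Value* v) {
    return isa<To>(v) ? static_cast<const To*>(v) : nullptr;
}

inline void usage() {
    Function f;
    const Value* v = &f;
    assert(isa<Function>(v));
    assert(dyn_cast<Constant>(v) == nullptr);
}
```

The closed-hierarchy assumption is the whole trick: the author knows the full set of kinds, which a general-purpose `dynamic_cast` cannot assume.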

Nevin Liber

Jan 24, 2013, 4:24:22 PM
to std-pr...@isocpp.org
On 24 January 2013 15:20, DeadMG <wolfei...@gmail.com> wrote:
Last I checked, changing the language so that the current RTTI can be performance-competitive would effectively result in modules, or possibly something even more extreme in terms of changing the language- i.e., it's effectively an unfixable problem. The reason why the existing RTTI is so slow is basically because all inheritance hierarchies are arbitrarily complex, from the point of view of the runtime, whereas Clang can do something much cheaper, since it knows that, say, nobody inherits from llvm::Function- whereas the compiler does not know such a thing.

Even with final (not that I'm a fan of final, as it inhibits inheritance for the sake of morphing the interface)?
--
 Nevin ":-)" Liber  <mailto:ne...@eviloverlord.com>  (847) 691-1404

Lawrence Crowl

Jan 24, 2013, 4:51:54 PM
to std-pr...@isocpp.org, fourt...@gmail.com
On 1/24/13, Nicol Bolas <jmck...@gmail.com> wrote:
> I can't really consider this a problem significant enough to be worth
> solving. All it saves is a few cache-line reads on a virtual call, or a
> couple of bytes of memory per object that you use in inheritance scenarios.

Some objects are instantiated enough so that a couple of bytes per
object could be large amounts of memory.

Likewise, a few cache-line reads could turn into a few cache-line
misses, which could turn into hundreds of cycles.

If we can reliably move a significant amount of static information
into the virtual tables, we should consider it. However, it is not
clear to me from the description that we have the evidence yet.
Is there an application in which such a feature would have made
a difference?

>
> Is this really something that deserves a language change? Are we really
> agonizing over a few bytes of data in inheritance graphs?
>
> It's also not entirely clear what the lifetime of this virtual variable is.
>
> Are literal types the only types allowed, or are user-defined types
> allowed? When do they get created, and who is responsible for doing so?
> When do they get destroyed?
>
> Also, overloading `= delete` to mean something completely different from
> what `= delete` normally means is not going to be acceptable. I would much
> rather you either not be allowed to have pure-virtual data members at all,
> or force constructor/uniform initialization syntax for them:
>
> virtual int dataMember1(2);
> virtual int dataMember2{3};
> virtual int dataMember3 = 4; //Not legal.
>


--
Lawrence Crowl

Sylvester Hesp

Jan 24, 2013, 4:52:23 PM
to std-pr...@isocpp.org, fourt...@gmail.com
On Thursday, January 24, 2013 10:20:44 PM UTC+1, DeadMG wrote:
Last I checked, changing the language so that the current RTTI can be performance-competitive would effectively result in modules, or possibly something even more extreme in terms of changing the language- i.e., it's effectively an unfixable problem. The reason why the existing RTTI is so slow is basically because all inheritance hierarchies are arbitrarily complex, from the point of view of the runtime, whereas Clang can do something much cheaper, since it knows that, say, nobody inherits from llvm::Function- whereas the compiler does not know such a thing.

We're not dealing with inheritance structures. What's slow is dynamic_cast, not typeid. The latter typically takes the typeinfo right out of the vtable (well, it's somewhat more complicated than that as it needs to work with non-vtable objects as well, but it's still constant time complexity). Exposing class attributes in a similar way could have *exactly* the same performance as is suggested by this proposal.

Brent Friedman

Jan 24, 2013, 9:17:43 PM
to std-pr...@isocpp.org, fourt...@gmail.com
The concern with respect to overlapping functionality with attributes is valid, but I'm not sure that they could have the same performance as you suggest.

Suppose class Base has no attributes. Class Derived : public Base has attribute [[special]]. Given a Base pointer I want to know if it is [[special]]. Since there can be an arbitrary number of user-defined attributes, we can't store them all for every class, so how could this lookup be done without, say, a hash table?
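To make the hash-table concern concrete, this is roughly what such a lookup might look like if it were built today on top of typeid. The registry (`AttributeTable`, `attributes()`, `has_attribute()`) is entirely hypothetical; only `typeid` and `std::type_index` are standard facilities.

```cpp
#include <cassert>
#include <string>
#include <typeindex>
#include <typeinfo>
#include <unordered_map>
#include <unordered_set>

struct Base { virtual ~Base() = default; };
struct Derived : Base {};

// Hypothetical registry: attribute names keyed by dynamic type.
using AttributeTable =
    std::unordered_map<std::type_index, std::unordered_set<std::string>>;

inline AttributeTable& attributes() {
    static AttributeTable table;
    return table;
}

inline bool has_attribute(const Base& obj, const std::string& name) {
    // typeid applied to a polymorphic glvalue yields the dynamic type.
    const auto it = attributes().find(std::type_index(typeid(obj)));
    return it != attributes().end() && it->second.count(name) != 0;
}

inline void usage() {
    attributes()[std::type_index(typeid(Derived))].insert("special");
    Derived d;
    const Base& b = d;
    assert(has_attribute(b, "special"));
}
```

Each query here costs a typeid lookup plus two hash probes, versus a single load at a fixed vtable offset under the proposal.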

Additionally, it may be possible to drop the requirement that virtual data be const. If that requirement were dropped then it opens up a lot of possibilities that attributes could not handle. If the value was non-const then you could use it to store e.g. number of objects currently instantiated, or the head of an intrusive linked list of all objects of this type. I'm not sure if non-const data could be placed in the vtable which is why I didn't originally include it. But if it can be allowed it would significantly strengthen the value this offers.

DeadMG

Jan 24, 2013, 9:21:38 PM
to std-pr...@isocpp.org
As for RAM/ROM, one could conceivably create two tables, place one in ROM, as the regular vtable, and simply add a pointer to another table in RAM for non-const members. 

Nicol Bolas

Jan 24, 2013, 9:38:41 PM
to std-pr...@isocpp.org, fourt...@gmail.com


On Thursday, January 24, 2013 1:52:23 PM UTC-8, Sylvester Hesp wrote:
On Thursday, January 24, 2013 10:20:44 PM UTC+1, DeadMG wrote:
Last I checked, changing the language so that the current RTTI can be performance-competitive would effectively result in modules, or possibly something even more extreme in terms of changing the language- i.e., it's effectively an unfixable problem. The reason why the existing RTTI is so slow is basically because all inheritance hierarchies are arbitrarily complex, from the point of view of the runtime, whereas Clang can do something much cheaper, since it knows that, say, nobody inherits from llvm::Function- whereas the compiler does not know such a thing.

We're not dealing with inheritance structures. What's slow is dynamic_cast, not typeid.

OK, but before we solve a problem, let's analyze why dynamic_cast is slow to begin with. What exactly is it about this feature that *requires* that it be slow, and why would this "virtual data" be somehow faster? Isn't there a way to implement dynamic_cast so that it could just automatically generate and use this data?

Brent Friedman

Jan 24, 2013, 9:51:36 PM
to std-pr...@isocpp.org, fourt...@gmail.com
On Thursday, January 24, 2013 8:38:41 PM UTC-6, Nicol Bolas wrote:

OK, but before we solve a problem, let's analyze why dynamic_cast is slow to begin with. What exactly is it about this feature that *requires* that it be slow, and why would this "virtual data" be somehow faster? Isn't there a way to implement dynamic_cast so that it could just automatically generate and use this data?

Dynamic cast does not solve the problems that I'm attempting to solve here. A dynamic cast allows you to ask if object a is of type b. I want to retrieve property c on the type of object a regardless of what that type is. This question cannot be answered with RTTI.

kazu...@gmail.com

Jan 24, 2013, 10:50:39 PM
to std-pr...@isocpp.org
Hi,

Great proposal!  For quite some time I have been dreaming of such a feature!

One more thing, could it be named "virtual static" instead of "virtual const", and allow modification of the variable / support "virtual static" functions?

Such a feature is necessary in some cases when you are using GC. The example below shows a GC'ed object class and a derived class with shared information (which needs to be freed once all instances of the derived type have been collected).

Under the current C++ spec, we need to do a virtual call for every single sweep.

struct GCObject {
  virtual ~GCObject() {}
};

struct Derived : public GCObject {
  static size_t numObjectsOfType_;
  static SharedInfo* shared_;

  Derived() {
    if (numObjectsOfType_++ == 0)
      shared_ = newSharedInfo();
    ...
  }

  virtual ~Derived() {
    if (--numObjectsOfType_ == 0)
      delete shared_;
  }
};

void sweepObject(GCObject* o) {
  o->~GCObject();
}

But once "virtual static" variables and functions are introduced, the overhead of the virtual function call can be omitted by writing code like:

struct GCObject {
  virtual static size_t numObjectsOfType_;
  virtual static void freeShared() {}
  ~GCObject() {} // non-virtual
};

struct Derived : public GCObject {
  static SharedInfo* shared_;

  virtual static void freeShared() {
    delete shared_;
  }

  Derived() : GCObject() {
    if (numObjectsOfType_++ == 0)
      shared_ = newSharedInfo();
  }
};

void sweepObject(GCObject* o) {
  if (--o->numObjectsOfType_ == 0)
    o->freeShared();
}

The example might not be good enough to illustrate the necessity of non-const virtual static variables (since in the example, shared info could be collected using GC as well instead of using refcounting), but I wanted to bring my question onto the table; why can't non-const variables be stored in vtbl?

Brent Friedman

Jan 24, 2013, 11:48:53 PM
to std-pr...@isocpp.org, kazu...@gmail.com
On Thu, Jan 24, 2013 at 9:50 PM,  <kazu...@gmail.com> wrote:
 why can't non-const variables be stored in vtbl?

If it's feasible for the compiler to support then I'd love to drop the const restriction and just make this a storage class specifier with no additional requirements.

And I'm quite okay with making the syntax be 'virtual static' or 'static virtual' since that is just a bit more clear, but it's a bit odd because it combines two storage class specifiers that are mutually exclusive in other contexts.

Brent Friedman

Jan 25, 2013, 12:03:10 AM
to std-pr...@isocpp.org, kazu...@gmail.com

  virtual static void freeShared() {
    delete shared_;
  }

I'm only proposing virtual static data members here, not virtual static functions. Unless this was just a mistake, I'm not sure what a virtual static function is supposed to mean here. From your code it looks like it would suffice just as a normal static function.

Richard Smith

Jan 25, 2013, 2:02:56 AM
to std-pr...@isocpp.org, kazu...@gmail.com
virtual static functions seem like a natural extension of this
proposal. Presumably they would act exactly like normal virtual
functions, except that 'this' is not implicitly passed to them.
However, as with any feature, we would need more rationale for adding
them than just that they're implementable, and that they're the
logical result of combining two existing features.

Brent Friedman

Jan 25, 2013, 2:51:47 AM
to std-pr...@isocpp.org, kazu...@gmail.com
On Fri, Jan 25, 2013 at 1:02 AM, Richard Smith  wrote:
> Presumably they would act exactly like normal virtual
> functions, except that 'this' is not implicitly passed to them.

I see. So a virtual static function could today be best implemented as

static TY staticfunc(...){ <body> }
virtual TY func(...){ return staticfunc(...); }

I see the value in virtual static functions since it allows you to call the function statically using scope-resolution or dynamically using dynamic type. I also see a direct similarity in these two meanings of virtual static. I'm not sure what the convention is, but this does seem like a separate proposal.
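Spelled out as compilable code, the static-plus-forwarding-virtual pattern might look like this. `Shape`, `Triangle`, `StaticSides`, and `Sides` are placeholder names for the sketch.

```cpp
#include <cassert>

struct Shape {
    virtual ~Shape() = default;
    // Callable without an object: Shape::StaticSides()
    static int StaticSides() { return 0; }
    // Callable through an object: dispatches on the dynamic type.
    virtual int Sides() const { return StaticSides(); }
};

struct Triangle : Shape {
    static int StaticSides() { return 3; }
    // Each class must repeat the forwarder; that boilerplate is what
    // a 'virtual static' function would remove.
    int Sides() const override { return StaticSides(); }
};

inline void usage() {
    Triangle t;
    const Shape& s = t;
    assert(s.Sides() == 3);            // dynamic access
    assert(Shape::StaticSides() == 0); // static access
}
```

Forgetting the forwarder in a derived class silently falls back to the base class's value, which is one practical argument for language support.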

Brent Friedman

Jan 25, 2013, 2:57:05 AM
to std-pr...@isocpp.org, kazu...@gmail.com
Here are some syntactic examples I cooked up assuming that we can drop the const restriction.


class Base
{
public:
  virtual static const bool IsSerializable=true; //each subclass advertises serializability
  virtual static size_t CreationCount=0; //each subclass can count the number of instances created

  Base()  {  ++CreationCount;}//Increments Base::CreationCount

  void DynamicIncrement()
  {  ++CreationCount; } // Increments ThisClass::CreationCount as chosen by the dynamic type of this
};

class Special : public Base
{
public:
  virtual static const bool IsSerializable=false; //this class cannot be serialized.
  Special() {  ++CreationCount; } //Increments Special::CreationCount
};

int main()
{
Base::CreationCount; // =0

Base* b = new Base;
Base::CreationCount; //=1
b->CreationCount; // = 1, accesses static Base::CreationCount
b->IsSerializable; // =true, accesses static const Base::IsSerializable

Base* s = new Special;
Base::CreationCount; //=2. Was incremented in Base::Base
s->CreationCount; // =1, accesses static Special::CreationCount
s->IsSerializable; //=false, accesses static const Special::IsSerializable
}
}
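For comparison, the counting behavior above can be approximated in current C++ with a per-class static plus a virtual accessor. This is only an emulation under assumed names (`Count()` and the `CreationCount()` accessor are not part of the proposal), and it pays the virtual call the proposal is trying to avoid.

```cpp
#include <cassert>
#include <cstddef>

class Base {
public:
    Base() { ++Count(); }  // statically bound: increments Base's counter
    virtual ~Base() = default;
    // Dynamic access: returns the counter of the object's dynamic type.
    virtual std::size_t& CreationCount() { return Count(); }
    static std::size_t& Count() { static std::size_t n = 0; return n; }
};

class Special : public Base {
public:
    Special() { ++Count(); }  // increments Special's own counter
    std::size_t& CreationCount() override { return Count(); }
    // Hides Base::Count with a separate per-class counter.
    static std::size_t& Count() { static std::size_t n = 0; return n; }
};

inline void usage() {
    Base b;
    assert(Base::Count() == 1);

    Special s;  // runs Base::Base, then Special::Special
    Base& r = s;
    assert(Base::Count() == 2);
    assert(r.CreationCount() == 1);  // Special's counter, via dynamic type
}
```

`usage()` assumes it runs once in a fresh process, since the counters are never reset.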



Richard Smith

Jan 25, 2013, 3:06:12 AM
to std-pr...@isocpp.org, kazu...@gmail.com
Given:

struct A {
virtual static const int x = 0;
};
struct B : A {
};

Do we need a definition, "const int A::x;", if we odr-use A::x?
Do we need a definition if we don't odr-use A::x? (Are all virtual
static data members odr-used in the same way that all virtual
functions are?)
Do we have &A::x == &B::x?

(I think the answers should be: yes, yes, and yes.)
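As a baseline for the third question, here is how ordinary (non-virtual) static data members behave today. `C` is an extra class added for this sketch to show the redeclaration case.

```cpp
#include <cassert>

struct A { static const int x; };
const int A::x = 0;

// B inherits A::x; no new object is created, so B::x names A::x.
struct B : A {};

// C redeclares x; C::x is a distinct object that hides A::x.
struct C : A { static const int x; };
const int C::x = 0;

inline void usage() {
    assert(&A::x == &B::x);  // same object
    assert(&A::x != &C::x);  // distinct objects have distinct addresses
}
```

A "one copy per subclass" reading of virtual static would make `B` behave like `C` here, which is where the address question gets interesting.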

Daniel Krügler

Jan 25, 2013, 3:22:55 AM
to std-pr...@isocpp.org, kazu...@gmail.com
2013/1/25 Richard Smith <ric...@metafoo.co.uk>:
> Given:
>
> struct A {
> virtual static const int x = 0;
> };
> struct B : A {
> };
>
> Do we need a definition, "const int A::x;", if we odr-use A::x?
> Do we need a definition if we don't odr-use A::x? (Are all virtual
> static data members odr-used in the same way that all virtual
> functions are?)
> Do we have &A::x == &B::x?
>
> (I think the answers should be: yes, yes, and yes.)

I agree with the first and second, but given that we have

"Otherwise if either is a pointer to a virtual member function, the
result is unspecified."

I would expect "unspecified" for the last (I don't know, why we have
this special unspecified rule here).

- Daniel

Richard Smith

Jan 25, 2013, 4:31:08 AM
to std-pr...@isocpp.org, kazu...@gmail.com
I think we would need to specify this for virtual static data members,
if they are allowed to be non-const.

Brent Friedman

Jan 25, 2013, 10:20:53 AM
to std-pr...@isocpp.org, kazu...@gmail.com
On Fri, Jan 25, 2013 at 12:22 AM, Daniel Krügler  wrote: 
> 2013/1/25 Richard Smith 

>> Given:
>>
>> struct A {
>>   virtual static const int x = 0;
>> };
>> struct B : A {
>> };
>>
>> Do we need a definition, "const int A::x;", if we odr-use A::x?
>> Do we need a definition if we don't odr-use A::x? (Are all virtual
>> static data members odr-used in the same way that all virtual
>> functions are?)
>> Do we have &A::x == &B::x?
>>
>> (I think the answers should be: yes, yes, and yes.)

The answer to (3) depends on the exact semantics of virtual static. Here I've assumed it means 'Each subclass automatically receives an instance of this type', but it could mean 'Any subclass may receive an instance of this type, if requested'. The vision here was accessing arbitrary data at vtable+offset and for the compiler to support that every subclass would be required to receive a copy of the data to keep everything aligned properly. In that case we should work with the natural implementation and say all subtypes receive this data. It would mean &A::x != &B::x.

I don't think that a separate A::x or B::x definition should be required. The storage-class specifier of virtual static is sufficient to give the compiler all the necessary information it needs. Static data must be stored in exactly one translation unit so creating a definition makes sense. With virtual static it is the compiler's duty to decide where the data is placed. Creating a definition for virtual static data in an arbitrary translation unit would seem to conflict with that. If the virtual static data is only declared but not defined then it should be valid to define the value separately but that would have no effect on where the compiler places the data. Additionally, given my above answer, it would seem quite unusual if you created struct B above and received a compilation error because B::x was not defined.