About virtual inheritance

JiiPee

Jun 26, 2015, 4:18:39 PM

Everybody knows that with diamond inheritance we need virtual
inheritance. But what does virtual inheritance really do (for one
class)? For example, if I have:

class A
{
public:
int data;
};

class B : virtual public A
{
public:
int data2;
};

class C : public B
{};

So what difference does "virtual" make here for class B? If I take virtual
off, what would change compared with leaving it there?

No books/tutorials seem to explain this, they only explain the diamond
formation.


Marcel Mueller

Jun 26, 2015, 4:35:37 PM

On 26.06.15 22.18, JiiPee wrote:
> Everybody knows that with diamond inheritance we need virtual
> inheritance. But what does virtual inheritance do really (for one
> class)? like if i have:
>
> class A
> {
> public:
> int data;
> };
>
> class B : virtual public A
> {
> public:
> int data2;
> };
>
> class C : public B
> {};
>
> So what difference "virtual" makes here to class B? if I take virtual
> off what would be the difference than if I leave it there?

In your example it makes no semantic difference, because there is no
multiple inheritance. But this may change if you derive from B and A.

Only if the class tree contains more than one A base will virtual
force them to share one instance of A. If there is only one link to A it
makes no difference, of course.

> No books/tutorials seem to explain this, they only explain the diamond
> formation.

Well, there is not much to explain, except that virtual adds a small
runtime overhead when accessing instance members of A.

Marcel

alf.p.s...@gmail.com

Jun 26, 2015, 5:00:56 PM

Each ordinary inheritance of A adds a distinct A sub-object, but all virtual inheritances of A together add just a single A sub-object. And this means that if there is more than one B base class sub-object, they must have the A sub-object at different offsets. In practice this means that each instance of B must contain some information that says at what offset the virtually inherited A sub-object is for this instance.

And so one in-practice effect is that the size of a B instance generally increases with use of virtual inheritance.

It might be possible for the compiler to tuck the offset information in somewhere unused, but generally you do get a size increase:

struct A
{
int data;
};

struct Bnv: public A
{
int data2;
};

struct Bv: virtual A
{
int data2;
};

#include <iostream>
auto main() -> int
{
using namespace std;
cout << sizeof( Bnv ) << " " << sizeof( Bv ) << endl;
}

MinGW g++ 5.1 (64-bit) reports 8 16 and Visual C++ 2015 (32-bit) reports 8 12.

The offset information has to be used at run time, and this means that unless one has an unusually smart compiler, referencing A sub-object data members will be somewhat less efficient with virtual inheritance.

Also, B's constructors may in practice have to support possible future inheritance from B, where it's a requirement that initialization of the virtual inheritance A sub-object from B is ignored, that instead it must be initialized from the most derived class. So there's decision overhead too. Although I suspect it's so marginal that it won't show up in ordinary measurements.
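The constructor rule described above shows up even without a diamond. Here is a minimal sketch that reuses the thread's A/B/C names. The constructor argument on A and the helper check_virtual_base_init() are assumptions added purely for illustration; they are not part of the original example.

```cpp
#include <cassert>

struct A {
    int data;
    explicit A(int d) : data(d) {}
};

// B asks for A(1), but that request only counts when B itself is the
// most derived class of the object being constructed.
struct B : virtual A {
    int data2 = 0;
    B() : A(1) {}
};

// As the most derived class, C has to initialize the virtual base itself;
// B's A(1) is skipped while a C is being constructed.
struct C : B {
    C() : A(2), B() {}
};

// Hypothetical self-check, meant to be called from main().
inline void check_virtual_base_init() {
    B b;
    C c;
    assert(b.data == 1);  // B is most derived here, so its A(1) is used
    assert(c.data == 2);  // C is most derived here, so its A(2) is used
}
```

If the virtual keyword is dropped from B, the initializer A(2) in C no longer compiles, because A is then an indirect non-virtual base that only B may initialize. So in this respect the keyword does change behaviour for a single inheritance chain as well.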

* * *

Summing up, in general you have a size increase, and in general somewhat less efficient access of the virtually inherited A sub-object's data items.

Cheers & hth.,

- Alf

JiiPee

Jun 26, 2015, 5:09:39 PM

On 26/06/2015 21:35, Marcel Mueller wrote:
> On 26.06.15 22.18, JiiPee wrote:
>> Everybody knows that with diamond inheritance we need virtual
>> inheritance. But what does virtual inheritance do really (for one
>> class)? like if i have:
>>
>> class A
>> {
>> public:
>> int data;
>> };
>>
>> class B : virtual public A
>> {
>> public:
>> int data2;
>> };
>>
>> class C : public B
>> {};
>>
>> So what difference "virtual" makes here to class B? if I take virtual
>> off what would be the difference than if I leave it there?
>
> In your example it makes no semantic difference, because there is no
> multiple inheritance. But this may change if you derive from B and A.
>
> Only if the class tree contains more than one A base, then virtual
> will force them to share one instance of A. If there is only one link
> to A it make no difference, of course.

Oh OK, then this is clear. Funny that they do not explain this, as it leaves
the theory open. So it only has an effect if there are two (or more). I
think they should mention this in books.

JiiPee

Jun 26, 2015, 6:43:02 PM

On 26/06/2015 22:00, alf.p.s...@gmail.com wrote:
> Each ordinary inheritance of A adds a distinct A sub-objects, but all virtual inheritances of A together add just a single A sub-object. And this means that if there is more than one B base class sub-object, they must have the A sub-object at different offsets.

OK, this answers the question.

I meant to ask something more like: in what situation would we use virtual
inheritance like this? Obviously I would not use it if not needed. But
that answers the question.

Richard Damon

Jun 27, 2015, 5:11:57 PM

The simplest case that shows the difference is this:

class A
{
public:
    int data;
};

class B : virtual public A
{
};

class C : virtual public A
{
};

class D : public B, public C {
};

Without the virtual, class D contains two different members named
data, one in D::B::A and one in D::C::A. With the virtual, both B and C
refer to the same sub-object A.
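The two layouts can be compared directly by checking whether the A reached through B is the same object as the A reached through C. This is only a sketch: the numbered class names (B1, D2, ...) and the helper shares_one_A() are made up for illustration.

```cpp
#include <cassert>

struct A { int data = 0; };

// Non-virtual diamond: D1 holds two separate A sub-objects.
struct B1 : A {};
struct C1 : A {};
struct D1 : B1, C1 {};

// Virtual diamond: D2 holds a single, shared A sub-object.
struct B2 : virtual A {};
struct C2 : virtual A {};
struct D2 : B2, C2 {};

// Returns true when the A reached via the B path is the very same
// object as the A reached via the C path.
template <class D, class B, class C>
bool shares_one_A(D& d) {
    A* viaB = static_cast<B*>(&d);
    A* viaC = static_cast<C*>(&d);
    assert(viaB != nullptr && viaC != nullptr);
    return viaB == viaC;
}
```

With D1, an expression such as d1.data does not compile because it is ambiguous; one has to write something like static_cast<B1&>(d1).data. With D2, d2.data is fine.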

Marcel Mueller

Jun 27, 2015, 5:39:02 PM

On 26.06.15 23.00, alf.p.s...@gmail.com wrote:
> And so one in-practice effect is that the size of a B instance generally increases with use of virtual inheritance.

AFAIK this is just part of the vtable like the entry points for virtual
functions too. So as soon as you have at least one virtual function it
makes no difference in size anymore. And, well, it is quite likely to
have virtual functions when you need virtual inheritance.

Of course, none of the assumptions about sizeof are part of the
standard. They are only typical for common implementations.

Marcel

alf.p.s...@gmail.com

Jun 27, 2015, 7:13:37 PM

On Saturday, June 27, 2015 at 11:39:02 PM UTC+2, Marcel Mueller wrote:
> On 26.06.15 23.00, alf.p.s...@gmail.com wrote:
> > And so one in-practice effect is that the size of a B instance generally increases with use of virtual inheritance.
>
> AFAIK this is just part of the vtable like the entry points for virtual
> functions too. So as soon as you have at least one virtual function it
> makes no difference in size anymore. And, well, it is quite likely to
> have virtual functions when you need virtual inheritance.

It would be easy to check such an assumption before posting, especially since I posted code that you could trivially amend for the purpose, e.g. just add

virtual ~A() {}

in class A, then compile and run.

I'm a big fan of checking reality. ;-)

Explanation why the vtable is (usually) not used to store the offsets:

The vtable (AFAIK all extant C++ compilers use vtables) for a class is a single set of function pointers for the class, shared by all instances of that class. In contrast an offset adjustment for a virtual base sub-object is needed per object, although with a very limited number of possible values. Due to the limited number of possible values it is in principle possible to represent it via the vtable pointer, which might point to corresponding offset in a shared vtable, or to corresponding copies of a vtable. But that's neither very practical nor very efficient. The practical and not unduly inefficient way to do things is to store the offset directly in each object.

> Of course, none of the assumptions about sizeof are part of the
> standard. They are only typical for common implementations.

In a formal view that's right, but in practice we do have vtables and per-instance vtable pointers (not the only possible implementation of runtime polymorphism) and in practice we do have per-instance virtual base class sub-object offsets (again, not the only possible implementation). I don't know of any extant compiler that isn't that way. Admittedly, nowadays I don't know about a great many compilers, but it would be truly remarkable if such a spirit-of-PHP-like C++ compiler had been /introduced/ since way back then.

Richard Damon

Jun 27, 2015, 8:07:10 PM

Actually, once you hit multiple inheritance, vtables tend to get a bit
more complicated. In my example (if we add some virtual functions), the
vtable for D would need to be one table made up of several 'distinct' tables:
one pointed to by the actual instance of A, one put in B, and one in C.
Any new functions added in D would tend to be added after the entries
pointed to by B, as would be done in single inheritance. For functions
defined in C that have been overridden in D, there needs to be something
to adjust the 'this' pointer (as D::this would point to the beginning of
the full object, while C::this would point to the beginning of the C
sub-object, since that might be all that is known at the call site).
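The 'this' adjustment can be observed without reading assembler. This is a rough sketch using plain (non-virtual) multiple inheritance; the helper names are made up for illustration. That the C sub-object sits at a non-zero offset is typical of common ABIs rather than a guarantee of the standard, whereas the dynamic_cast to void* is guaranteed to give back the complete object.

```cpp
#include <cassert>

struct B { int b = 0; virtual ~B() = default; };
struct C { int c = 0; virtual ~C() = default; };
struct D : B, C { int d = 0; };

// Address of the C sub-object of a D. The derived-to-base conversion is
// where the compiler may shift the pointer.
inline const void* address_as_C(const D& obj) {
    const C* c = &obj;
    assert(c != nullptr);
    return c;
}

// dynamic_cast to (const) void* recovers the start of the complete object
// from any polymorphic sub-object.
inline const void* complete_object(const C& c) {
    return dynamic_cast<const void*>(&c);
}
```

A virtual function overridden in D but called through a C* needs the reverse adjustment to be applied before the body of D's function runs, which is the "something" mentioned above.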

Similar VTables could be used for the offsets of the virtual bases. You
need to use a different pointer, as the virtual function pointer needs
to evolve as the object is created, while constructing C, it points to
the 'C' vtable, but when we move to constructing 'D' it points to the
'D' vtable. The virtual base offset table on the other hand needs to
stay constant throughout the construction process (as the layout is
fixed), being set by the 'topmost' constructor and left by the rest (as
they also skip calling the A constructor). Whether it is better to use a
pointer to the sub object or a pointer to an offset table depends on how
many virtual bases you expect in an object with virtual bases. If you
only have a single virtual base, the object pointer is simpler, and
saves the 'constant' cost of the offset table. If objects frequently have
multiple virtual bases, then the table can gain the advantage of a
smaller object as you only need a single pointer to the description
object while the object pointer method needs a pointer to each virtual base.
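The point about the virtual function pointer evolving during construction can be checked with a small sketch. Here C is simply a base of D, and the member seen_in_ctor is an assumption added only to record what the constructor observed.

```cpp
#include <cassert>
#include <string>

struct C {
    std::string seen_in_ctor;
    C() {
        seen_in_ctor = name();        // virtual call made while only the C part exists
        assert(!seen_in_ctor.empty());
    }
    virtual ~C() = default;
    virtual std::string name() const { return "C"; }
};

struct D : C {
    std::string name() const override { return "D"; }
};
```

While the C constructor runs, the object's dynamic type is C, so the call resolves to C::name(). Once construction of D has finished, the same call resolves to D::name().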

Marcel Mueller

Jun 27, 2015, 8:27:11 PM

On 28.06.15 01.13, alf.p.s...@gmail.com wrote:
> It would be easy to check such an assumption before posting, especially since I posted code that you could trivially amend for the purpose, e.g. just add

BTDT.

> virtual ~A() {}
>
> in class A, then compile and run.
>
> I'm a big fan of checking reality. ;-)

Your test is not valid, as it introduces the vtable into A rather than Bv.
Of course, the vtable of a virtual base cannot be joined.

> Explanation why the vtable is (usually) not used to store the offsets:

[...]

Obviously gcc on OS/2 didn't know that.

Marcel

Alf P. Steinbach

Jun 28, 2015, 12:11:18 AM

On 28-Jun-15 2:07 AM, Richard Damon wrote:
> Actually, once you hit multiple inheretance, vtables ...

Uhm, TLDR, but, in case you or other readers are not familiar with it,
the common object representations for C++ were reportedly discussed in a
good, clear way in

* Stanley B. Lippman's 1996 “Inside The C++ Object Model”,

<url:
https://books.google.no/books/about/Inside_the_C++_Object_Model.html?id=hLdmQgAACAAJ&redir_esc=y>.

A simpler exposition of just single inheritance structures was given in
my old 2005 pointers tutorial, once referred by Wikipedia but then
offline for a long time until I put it on Google Drive,

<url:
https://drive.google.com/file/d/0B2oiI2reHOh4M2MzNzYwYzQtMGZkNC00NTljLWJiM2UtOGI0MmRkMTMyZGY4/view?ddrp=1&pli=1>.

Cheers & hth.,

- Alf

[Sorry, I first hit "Reply" instead of "Follow up"]

--
Using Thunderbird as Usenet client, Eternal September as NNTP server.

Alf P. Steinbach

Jun 28, 2015, 12:15:08 AM

On 28-Jun-15 2:26 AM, Marcel Mueller wrote:
> On 28.06.15 01.13, alf.p.s...@gmail.com wrote:
>> It would be easy to check such an assumption before posting,
>> especially since I posted code that you could trivially amend for the
>> purpose, e.g. just add
>
> BTDT.
>
>> virtual ~A() {}
>>
>> in class A, then compile and run.
>>
>> I'm a big fan of checking reality. ;-)
>
> Your test is not valid, as it introduces the vtable into A rather than Bv.

Well, your argument was that

<quote>

as soon as you have at least one virtual function it makes no difference
in size anymore

</quote>

And that's now proved incorrect. The example is very valid for that.
I.e. for its purpose.

It's rather uncommon to have a non-polymorphic topmost base that one
derives virtually from. I can't think of any use case. But I imagine
that it can occur with template code.

On the other hand, when I do introduce polymorphism down in Bv and keep
A as non-polymorphic, as you now suggest, then MinGW g++ 5.1 produces 16
16, apparently using the vtable pointer to encode the offset, while
Visual C++ 2015 produces 12 16, apparently storing an offset value.

> Of course, the vtable of a virtual base cannot be joined.

Not sure what you mean here. Maybe that using the vtable pointer in A to
encode the sub-object offset would introduce a catch-22? One would need
it in order to find the offset that would allow one to find it.

But nothing prevents the compiler from introducing a vtable pointer in
each derived class. That would prevent overhead in member function
calls. And this vtable pointer could be used to encode the offset.

On the third hand, the reported sizes indicate that neither MinGW g++
nor Visual C++ does that.

>> Explanation why the vtable is (usually) not used to store the offsets:
> [...]
>
> Obviously gcc on OS/2 didn't know that.

Uhm, a single example that "usually" is not "all" isn't valuable in
itself. But it might just be that I'm wrong about the "usually". It's an
assessment based on what I remember about compilers, simple logic about
practicality and efficiency, the fact that having a topmost
non-polymorphic virtual base is unusual, and that Visual C++ does store
some offset information in each object in addition to vtable pointer, as
does MinGW g++ for the case of a polymorphic topmost base.

Cheers & hth.,

- Alf

[Sorry, I inadvertently first hit "Reply" instead of "Follow up"]

Marcel Mueller

Jun 28, 2015, 5:46:48 AM

On 28.06.15 06.15, Alf P. Steinbach wrote:
>>> virtual ~A() {}
>>>
>>> in class A, then compile and run.
>>>
>>> I'm a big fan of checking reality. ;-)
>>
>> Your test is not valid, as it introduces the vtable into A rather than
>> Bv.
>
> Well, your argument was that
>
> <quote>
> as soon as you have at least one virtual function it makes no difference
> in size anymore
> </quote>
>
> And that's now proved incorrect. The example is very valid for that.
> I.e. for its purpose.

Yes, there are constraints. The virtual method must belong to the same
block of memory, i.e. to the same class or to a non-virtual base. There
is an additional hack to combine the vtable pointer of a class with that
of its base class. But this only works for single inheritance.

So as long as A has no virtual methods it makes no difference whether B has a
virtual method or derives virtually or both. But as soon as A also has
virtual methods (or derives virtually) then the size increases by one
machine word. Well, except for the case Bvnv where the virtual
tables of A and Bvnv can be combined.

struct A
{
int data;
//virtual ~A() {}

};

struct Bnv: public A
{
int data2;
};

struct Bv: virtual A
{
int data2;
};

struct Bvnv: A
{
int data2;
virtual ~Bvnv() {}
};

struct Bvv: virtual A
{
int data2;
virtual ~Bvv() {}
};

#include <iostream>
int main()
{
    using namespace std;
    cout << sizeof( Bnv ) << " " << sizeof( Bv ) << " "
         << sizeof( Bvnv ) << " " << sizeof( Bvv ) << endl;
}

> But nothing prevents the compiler from introducing a vtable pointer in
> each derived class. That would prevent overhead in member function
> calls.

No, there is no overhead when joining vtables of base and derived. The
vtable of the derived class simply contains the vtable of the base at
the start. There is no additional indirection.

> And this vtable pointer could be used to encode the offset.

Of course, the offsets could be stored in the vtable. And well, this is
an additional indirection since first the vtable address has to be
loaded and then the offset. However, the same applies for any virtual
function call as well. But inlining the vtable in every object also has
drawbacks. First it increases the memory footprint, but now the same
values are read from different memory locations. This causes the memory
cache efficiency to decrease.
In case of virtual base classes there is another difference. Why should
one store the instance independent offsets in the class instance?
Instead the pointers to the virtual base could be stored directly. This
saves one integer addition - probably zero or one clock cycle on
nowadays CPUs.

> On the third hand, the reported sizes indicate that neither MingW g++
> nor Visual C++ do that.

I would not be that sure until I had a look at the assembler output.
Maybe it is not that easy.

>>> Explanation why the vtable is (usually) not used to store the offsets:
>> [...]
>>
>> Obviously gcc on OS/2 didn't know that.
>
> Uhm, a single example that "usually" is not "all" isn't valuable in
> itself.

A Raspberry Pi (ARMhf) and LM17 (x86) and Debian Wheezy (x64) show the
same results. But you are right, all of them use some flavor of gcc. So
I guess all gcc versions do it the same way. Tested with different
versions between 3.2.2 and 4.9.2.
But you are right. Other compilers handle it differently. E.g. IBM VAC++
allocates 20 bytes rather than 12 (gcc, 32 bit) for the last test case.
But this one is more than 15 years old.

> [Sorry, I inadvertently first hit "Reply" instead of "Follow up"]

Since you are not in the white list for this address, I didn't notice.

Marcel

Juha Nieminen

Jun 29, 2015, 4:14:50 AM

JiiPee <n...@notvalid.com> wrote:
> Everybody knows that with diamond inheritance we need virtual
> inheritance.

You don't *need* it. It's *one possible solution* to how to handle
diamond inheritance. C++ also offers another solution (which is not
only a bit more efficient, but also feasible in some situations).


Öö Tiib

Jun 29, 2015, 12:39:48 PM

On Monday, 29 June 2015 11:14:50 UTC+3, Juha Nieminen wrote:
> JiiPee <n...@notvalid.com> wrote:
> > Everybody knows that with diamond inheritance we need virtual
> > inheritance.
>
> You don't *need* it. It's *one possible solution* to how to handle
> diamond inheritance. C++ also offers another solution (which is not
> only a bit more efficient, but also feasible in some situations).

What is the solution? Actually I would rather see a motivating
problem. I have not met a problem that it solves. Typical examples
from books are also awfully abstract:

        Base
       /    \
      /      \
     /        \
  Der1        Der2
     \        /
      \      /
       \    /
        Join

Uhh? What are these 'Base', 'Der1', 'Der2' and 'Join'? I don't have those
anywhere. However ... I have learned it just in case someday
I meet a situation where something somewhere needs that.

Richard Damon

Jun 29, 2015, 9:56:37 PM

Here is an example. Assume we are designing a fantasy game, and one of
the objects we want to be able to represent is a creature, which holds a
number of attributes common to all creatures (like life level). There
also exist some special types of creatures that need extra attributes
and are represented as derived classes (for example, Dragons with flying
and breath weapons, and Warriors with weapon skills). We may then want a
creature that belongs to several of these special classes, so we want
to multiply inherit from each of the specialty classes so that it collects
all of the special properties, in this case a Dragon Warrior (Yikes).

Without using virtual bases, the Dragon Warrior is two creatures, each
with its own set of properties. Thus if I damage it as a Dragon, that
damage doesn't affect the Warrior. This isn't right! We want every
creature to have only one Creature sub-object within it, so we make it a
virtual base.

The key is looking to see if you are going to get to the multiple
inheritance case with (a) common base(s). And if you do, should there be
just a single copy of that base shared by the whole object, or does each
path to it need their own distinct copy.
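The game example could look roughly like this. It is only a sketch: the member names (life, burn_self, take_hit) and the helper life_after_two_hits() are invented for illustration.

```cpp
#include <cassert>

struct Creature {
    int life = 100;
};

struct Dragon : virtual Creature {
    void burn_self(int damage) { life -= damage; }   // damage taken "as a Dragon"
};

struct Warrior : virtual Creature {
    void take_hit(int damage) { life -= damage; }    // damage taken "as a Warrior"
};

struct DragonWarrior : Dragon, Warrior {};

// Both kinds of damage land on the one shared Creature sub-object.
inline int life_after_two_hits() {
    DragonWarrior dw;
    dw.burn_self(30);
    dw.take_hit(20);
    // The Dragon part and the Warrior part see the same life counter.
    assert(&static_cast<Dragon&>(dw).life == &static_cast<Warrior&>(dw).life);
    return dw.life;   // unambiguous, because there is only one Creature inside
}
```

Without the two virtual keywords, dw.life would not even compile (ambiguous), and each of the two Creature copies would have absorbed only one of the hits.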

Öö Tiib

Jun 30, 2015, 9:18:29 AM

Yikes? :) It feels like you yourself think that something is wrong with it.
Isn't it that "dragon" is the "race" or "kind" and "warrior" is the "profession" or
"occupation" of a "creature"? IOW these feel like properties or components
of every being.

The difference is that a base class is too rigid and too closely coupled
to serve as a property or component. We can have the type of a component
change dynamically but we can't have dynamically changing base
subobjects. For example, it is imaginable that the "dragon warrior"
may one day want to change into something like a "dragon gladiator" or
"dragon mercenary" during its lifetime. What to do then?

> Without using virtual bases, the Dragon warrior is two creature, each
> with their own sets of properties. Thus if I damage it as a Dragon, that
> damage doesn't affect the Warrior. This isn't right! We want all
> creatures to only have creature class within them, so we make it a
> virtual base.
>
> The key is looking to see if you are going to get to the multiple
> inheritance case with (a) common base(s). And if you do, should there be
> just a single copy of that base shared by the whole object, or does each
> path to it need their own distinct copy.

I understand how the diamond works and what it does. I still maintain my
impression that a concrete "diamond" is a sign that the relations between
classes were somehow modeled incorrectly.

It may be that it has already happened in a real application and
we just have to maintain it without big corrections to the data architecture.
There virtual inheritance is a sort of fix. I would perhaps just push
the correction through, because it is hard for me to imagine the benefits of
it.
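The component alternative argued for in this post could be sketched as follows. All names here (Profession, Gladiator, change_profession, describe) are hypothetical, and this is just one of several ways to model it.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// The profession is a replaceable component rather than a base class.
struct Profession {
    virtual ~Profession() = default;
    virtual std::string title() const = 0;
};
struct Warrior : Profession {
    std::string title() const override { return "warrior"; }
};
struct Gladiator : Profession {
    std::string title() const override { return "gladiator"; }
};

class Creature {
public:
    Creature(std::string race, std::unique_ptr<Profession> job)
        : race_(std::move(race)), job_(std::move(job)) {
        assert(job_ != nullptr);
    }

    // A base sub-object cannot be swapped at run time; a component can.
    void change_profession(std::unique_ptr<Profession> job) {
        assert(job != nullptr);
        job_ = std::move(job);
    }

    std::string describe() const { return race_ + " " + job_->title(); }

private:
    std::string race_;
    std::unique_ptr<Profession> job_;
};
```

There is exactly one Creature per being here, so the question of shared versus duplicated base sub-objects never comes up.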

Juha Nieminen

Jun 30, 2015, 9:24:48 AM

Öö Tiib <oot...@hot.ee> wrote:
>> You don't *need* it. It's *one possible solution* to how to handle
>> diamond inheritance. C++ also offers another solution (which is not
>> only a bit more efficient, but also feasible in some situations).
>
> What is the solution?

If you don't use virtual inheritance, then the common base class will
be duplicated for each of the derived classes. In other words, the
most derived class will have inside it everything from Der1 and
everything from Der2 independently (which means that the things from
Base will be duplicated: The Der1 part will have its own version of it
and the Der2 part will have its own version.)

This is not always desired. You may want the members of Base to
appear in Join only once, and for the Der1 and Der2 code to refer
to that one single Base data. (This requires the compiler to generate
special code in the member functions of Der1 and Der2 to access the
base class in a special way. This is what you are telling the compiler when you
inherit virtually.)

One example that comes to mind:

You have an intrusive smart pointer, and the classes that it can handle
are derived from a special class that contains a reference count. In
other words, you can have something like:

class MyClass1: public ReferenceCountable { ... };

class MyClass2: public ReferenceCountable { ... };

Now for one reason or another you need to multiple-inherit from those
two classes:

class Derived: public MyClass1, public MyClass2 { ... };

Now how would your smart pointer be able to handle an instance of this
Derived class? It has 'ReferenceCountable' as its base class... but
twice. Which one should it use, and how? (Obviously it ought to use
only one of them, else it won't work properly.)

In this case you want 'ReferenceCountable' to appear in 'Derived'
only once, and thus your smart pointer will be able to unambiguously
use that data.
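Filling in the elided class bodies with nothing at all, the virtual-base version of this example might be sketched like this. The refs member and the helpers add_ref(), use_count() and count_after_three_refs() are assumptions made for illustration.

```cpp
#include <cassert>

struct ReferenceCountable {
    int refs = 0;
};

inline void add_ref(ReferenceCountable& obj) { ++obj.refs; }
inline int use_count(const ReferenceCountable& obj) { return obj.refs; }

struct MyClass1 : virtual public ReferenceCountable {};
struct MyClass2 : virtual public ReferenceCountable {};
struct Derived : public MyClass1, public MyClass2 {};

inline int count_after_three_refs() {
    Derived d;
    MyClass1& as1 = d;
    MyClass2& as2 = d;
    add_ref(d);    // Derived -> ReferenceCountable is unambiguous with a virtual base
    add_ref(as1);
    add_ref(as2);
    // Both paths lead to the same counter object.
    assert(&static_cast<ReferenceCountable&>(as1) ==
           &static_cast<ReferenceCountable&>(as2));
    return use_count(d);
}
```

With plain public inheritance instead, the call add_ref(d) is rejected as an ambiguous conversion, and the two remaining calls would bump two different counters.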

Öö Tiib

Jun 30, 2015, 9:46:54 AM

On Tuesday, 30 June 2015 16:24:48 UTC+3, Juha Nieminen wrote:
>
> One example that comes to mind:
>
> You have an intrusive smart pointer, and the classes that it can handle
> are derived from a special class that contains a reference count. In
> other words, you can have something like:
>
> class MyClass1: public ReferenceCountable { ... };
>
> class MyClass2: public ReferenceCountable { ... };
>
> Now for one reason or another you need to multiple-inherit from those
> two classes:
>
> class Derived: public MyClass1, public MyClass2 { ... };
>
> Now how would your smart pointer be able to handle instance of this
> Derived class? It has 'ReferenceCountable' as its base class... but
> twice. Which one should it use, and how? (Obviously it ought to use
> only one of them, else it won't work properly.)
>
> In this case you want 'ReferenceCountable' to appear in 'Derived'
> only once, and thus your smart pointer will be able to unambiguously
> use that data.

Ok, now that makes sense.

It somewhat feels also a great explanation why to use 'std::make_shared'
instead of intrusive refcounting. I always suggested it but did not have
clear examples why. Here 'std::make_shared' seems both simpler and
more efficient.

Thanks.

Martijn van Buul

Jun 30, 2015, 11:26:33 AM

* Öö Tiib:

Except that it isn't the same, and sometimes the differences matter,
and without further knowledge of the context it's hard to say whether it
is "more efficient" or not. I once rewrote a framework that used shared_ptr
to intrusive_ptr, because it allowed me to avoid using new. The resulting
gain in speed (and reduced memory fragmentation) was significant.

Note that I'm not saying that intrusive_ptr is generally better than
shared_ptr. It's not, but sometimes intrusive_ptr works where shared_ptr
doesn't.

The same can be said for most of the *intrusive part of Boost.
--
Martijn van Buul - pi...@dohd.org

Öö Tiib

Jul 1, 2015, 2:47:03 AM

On Tuesday, 30 June 2015 18:26:33 UTC+3, Martijn van Buul wrote:
> * Öö Tiib:
> > On Tuesday, 30 June 2015 16:24:48 UTC+3, Juha Nieminen wrote:
> >
> >> In this case you want 'ReferenceCountable' to appear in 'Derived'
> >> only once, and thus your smart pointer will be able to unambiguously
> >> use that data.
> >
> > Ok, now that makes sense.
> >
> > It somewhat feels also a great explanation why to use 'std::make_shared'
> > instead of intrusive refcounting. I always suggested it but did not have
> > clear examples why. Here 'std::make_shared' seems both simpler and
> > more efficient.
>
> Except that it isn't the same, and sometimes the differences matter,
> and without further knowledge of the context it's hard to say whether it
> is "more efficient" or not. I once rewrote a framework that used shared_ptr
> to intrusive_ptr, because it allowed me to avoid using new. The resulting
> gain in speed (and reduced memory fragmentation) was significant.

Where do you need 'new' with 'shared_ptr'? That was the whole point of my
question ... to find motivating examples.

> Note that I'm not saying that intrusive_ptr is generally better than
> shared_ptr. It's not, but sometimes intrusive_ptr works where shared_ptr
> doesn't.

May be. I have zero good examples. I can understand someone saying
that 'shared_ptr' has unneeded features for him (like indirection to
deleter or weak count) but in evolving project it is hard to tell if
those will be needed or not.

> The same can be said for most of the *intrusive part of Boost.

Intrusive containers in boost are certainly great but these can
not cause diamonds since the hooks can be bases or members.

Öö Tiib

Jul 1, 2015, 2:58:19 AM

Just a clarification so that people who read this won't get confused.
Boost isn't a monolithic blob, and 'boost::intrusive_ptr' is part of the
"Boost Smart Ptr" library, not part of the "Boost Intrusive" library.
So the trees and containers are in a separate library from that ptr.

Juha Nieminen

Jul 1, 2015, 4:05:14 AM

Öö Tiib <oot...@hot.ee> wrote:
> It somewhat feels also a great explanation why to use 'std::make_shared'
> instead of intrusive refcounting. I always suggested it but did not have
> clear examples why. Here 'std::make_shared' seems both simpler and
> more efficient.

I don't understand how you conclude that shared_ptr is more efficient
in this case. Why would it be?

The major problem with shared_ptr is that it needs to make an additional
allocation to contain all of its ancillary data, and dynamic allocation
is inefficient. With an intrusive smart pointer you don't need the
extra allocation.

In practice you will *never* need to multiple-inherit from two classes
that use an intrusive smart pointer scheme. (Heck, I don't even remember
when was the last time I needed to use smart pointers at all.)

It was just an illustrative example of when virtual inheritance could
be potentially useful.

Bo Persson

Jul 1, 2015, 4:30:48 AM

On 2015-07-01 10:05, Juha Nieminen wrote:
> Öö Tiib <oot...@hot.ee> wrote:
>> It somewhat feels also a great explanation why to use 'std::make_shared'
>> instead of intrusive refcounting. I always suggested it but did not have
>> clear examples why. Here 'std::make_shared' seems both simpler and
>> more efficient.
>
> I don't understand how you conclude that shared_ptr is more efficient
> in this case. Why would it be?
>
> The major problem with shared_ptr is that it needs to make an additional
> allocation to contain all of its ancillary data, and dynamic allocation
> is inefficient. With an intrusive smart pointer you don't need the
> extra allocation.
>

std::make_shared is allowed/supposed to allocate a single memory block
large enough to hold both the pointed-to data and its internal info. If
it does, that would be just as efficient as an intrusive pointer,
without being that 'intrusive'.
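The difference in allocation count can be made visible with a counting allocator. This is a sketch under assumptions: CountingAllocator, Payload and the two helper functions are made up for illustration, and the single allocation is what common standard library implementations do (the standard only recommends it).

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>

// Hypothetical helper: a minimal allocator that counts allocate() calls
// through a counter owned by the caller.
template <class T>
struct CountingAllocator {
    using value_type = T;
    int* calls;

    explicit CountingAllocator(int* c) noexcept : calls(c) {}
    template <class U>
    CountingAllocator(const CountingAllocator<U>& other) noexcept : calls(other.calls) {}

    T* allocate(std::size_t n) {
        assert(calls != nullptr);
        ++*calls;
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) noexcept { ::operator delete(p); }
};

template <class T, class U>
bool operator==(const CountingAllocator<T>& a, const CountingAllocator<U>& b) noexcept {
    return a.calls == b.calls;
}
template <class T, class U>
bool operator!=(const CountingAllocator<T>& a, const CountingAllocator<U>& b) noexcept {
    return !(a == b);
}

struct Payload { int value = 42; };

// Object and bookkeeping are requested together through the allocator.
inline int allocator_calls_combined() {
    int calls = 0;
    { auto p = std::allocate_shared<Payload>(CountingAllocator<Payload>(&calls)); (void)p; }
    return calls;
}

// The object comes from a separate new; only the bookkeeping block goes
// through the allocator.
inline int allocator_calls_from_raw_pointer() {
    int calls = 0;
    {
        std::shared_ptr<Payload> p(new Payload, std::default_delete<Payload>(),
                                   CountingAllocator<Payload>(&calls));
    }
    return calls;
}
```

Both helpers are expected to report one allocator call, but the raw-pointer variant has already paid for a separate new Payload before that, so it makes two allocations in total against one for the combined form. std::make_shared is the same idea as std::allocate_shared with the default allocator.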

Bo Persson

Martijn van Buul

Jul 1, 2015, 6:22:50 AM

* Öö Tiib:
> On Tuesday, 30 June 2015 18:26:33 UTC+3, Martijn van Buul wrote:
>
>> Except that it isn't the same, and sometimes the differences matter,
>> and without further knowledge of the context it's hard to say whether it
>> is "more efficient" or not. I once rewrote a framework that used shared_ptr
>> to intrusive_ptr, because it allowed me to avoid using new. The resulting
>> gain in speed (and reduced memory fragmentation) was significant.
>
> Where you need 'new' with 'shared_ptr'? That was the whole point of my
> question ... to find motivating examples.

shared_ptr uses 'new' internally (and the Boost implementation used it
twice as much), which caused performance issues.

The problem at hand revolved around a variant[1] of boost::any. boost::any
has performance issues, because it relies on dynamic allocation.
cdiggins::any works around this by including a memory buffer to be used
for small objects (like integers, doubles, small POD classes and structs)
using in-place new. This greatly reduces the burden on the heap, especially
if you add an operator new() and operator delete() that will manage instances
of cdiggins::any in a separate pool.

Before you dismiss this as premature optimization: it wasn't. It reduced
runtime *of the entire system* by almost 20% and reduced memory requirements
by a similar amount due to a reduction in memory fragmentation.

shared_ptr<> throws a spanner into the works, however. Alongside the held
object, it needs to include at least a reference counter. It cannot combine
the held object and the counter in a single object using composition, as it
must be possible to create a shared_ptr<> from a naked pointer. It must
therefore internally use a template class that contains a "naked" pointer to
the held object, alongside the reference count. This means that the overloaded
operator new() of cdiggins::any will still be used, at the expense of an extra
allocation of the internal template class on the heap. This, however, is
exactly the issue we were trying to address using cdiggins::any: The heap usage
is back (thus increasing runtime and increasing fragmentation). make_shared()
tries to work around this by allocating one block of memory for both, but then
you're back at not using the overloaded operator new.
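The point about the overloaded operator new can be checked with a small
sketch. It assumes a hypothetical class Tracked (not from the original
framework) whose class-specific operator new counts its own calls. A plain
new-expression goes through it, while std::make_shared constructs the object
with a global placement new inside its own block and never calls it.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>

// Hypothetical class with a class-specific operator new that counts calls.
struct Tracked {
    static int class_new_calls;

    static void* operator new(std::size_t size) {
        ++class_new_calls;
        return ::operator new(size);  // forward to the global allocator
    }
    static void operator delete(void* p) noexcept { ::operator delete(p); }

    int value = 0;
};
int Tracked::class_new_calls = 0;

// shared_ptr adopting a raw pointer: the object comes from Tracked::operator new.
int class_new_calls_with_plain_new() {
    const int before = Tracked::class_new_calls;
    std::shared_ptr<Tracked> p(new Tracked);
    return Tracked::class_new_calls - before;
}

// make_shared: object and control block share one block obtained elsewhere,
// so the class-specific operator new is bypassed.
int class_new_calls_with_make_shared() {
    const int before = Tracked::class_new_calls;
    std::shared_ptr<Tracked> p = std::make_shared<Tracked>();
    return Tracked::class_new_calls - before;
}
```

In the first function the control block is still a second, separate heap
allocation; in the second there is a single allocation, but the pool behind
the class-specific operator new is never consulted.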

Neither can be fixed, without intimate knowledge of the implementation.
intrusive_ptr turned out to be the saving grace, however:

1) the overloaded operator new() and delete() continue to function.
2) No extra objects will be allocated on the heap, as no internal reference
counting object is required.
3) It is a solution that does not depend on implementation details of
shared_ptr

There is a price to pay: Reduced functionality (no support for anything
resembling std::weak_ptr), and added complexity - either deriving from a base
class (which opens up the diamond pattern issue) or adding accounting functions
to the containing namespace. For our application, this was well worth it.
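For readers who have not used one, here is a minimal sketch of the
"deriving from a base class" flavour of intrusive counting. RefCounted,
IntrusivePtr and Node are hypothetical names for illustration; this is not
boost::intrusive_ptr and not the code from the framework described above. It
is also not thread-safe.

```cpp
#include <cassert>
#include <utility>

// Hypothetical base class that stores the count inside the object itself.
class RefCounted {
public:
    RefCounted() = default;
    RefCounted(const RefCounted&) noexcept {}             // a copy starts at 0
    RefCounted& operator=(const RefCounted&) noexcept { return *this; }

    void add_ref() const noexcept { ++count_; }
    bool release() const noexcept { return --count_ == 0; }
    long use_count() const noexcept { return count_; }

protected:
    ~RefCounted() = default;  // never deleted through the base class

private:
    mutable long count_ = 0;
};

// Minimal intrusive smart pointer: no separate control block is allocated.
template <class T>
class IntrusivePtr {
public:
    IntrusivePtr() = default;
    explicit IntrusivePtr(T* p) : p_(p) { if (p_) p_->add_ref(); }
    IntrusivePtr(const IntrusivePtr& o) : p_(o.p_) { if (p_) p_->add_ref(); }
    IntrusivePtr(IntrusivePtr&& o) noexcept : p_(o.p_) { o.p_ = nullptr; }
    IntrusivePtr& operator=(IntrusivePtr o) noexcept {
        std::swap(p_, o.p_);
        return *this;
    }
    ~IntrusivePtr() { if (p_ && p_->release()) delete p_; }

    T* operator->() const noexcept { return p_; }
    T& operator*() const noexcept { return *p_; }

private:
    T* p_ = nullptr;
};

struct Node : RefCounted {
    int value = 42;
};

// Returns the count observed while a second owner exists.
long count_while_copied(const IntrusivePtr<Node>& p) {
    IntrusivePtr<Node> copy = p;
    return p->use_count();
}
```

Because the only allocation is the new-expression for Node, a class-specific
operator new on Node would see every allocation, and each one is exactly
sizeof(Node) bytes.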

Again, I'm not saying intrusive_ptr<> is the universal solution, and I don't
use them unless profiling indicates that I should. However, I have enough
evidence to dispel blanket statements like "shared_ptr is more efficient".

Blanket statements are always wrong. I never make them :P.

Martijn

[1] See http://www.drdobbs.com/an-efficient-variant-type/184402027

Öö Tiib

Jul 1, 2015, 6:27:25 AM

On Wednesday, 1 July 2015 11:05:14 UTC+3, Juha Nieminen wrote:
> Öö Tiib <oot...@hot.ee> wrote:
> > It somewhat feels also a great explanation why to use 'std::make_shared'
> > instead of intrusive refcounting. I always suggested it but did not have
> > clear examples why. Here 'std::make_shared' seems both simpler and
> > more efficient.
>
> I don't understand how you conclude that shared_ptr is more efficient
> in this case. Why would it be?

It removes the additional burden that virtual inheritance and the diamond
cause.

> The major problem with shared_ptr is that it needs to make an additional
> allocation to contain all of its ancillary data, and dynamic allocation
> is inefficient. With an intrusive smart pointer you don't need the
> extra allocation.

I did write 'std::make_shared'. That does one allocation and puts the
ref-count block adjacent to object in memory.

> In practice you will *never* need to multiple-inherit from two classes
> that use an intrusive smart pointer scheme.

That was the whole question. Whenever I see a virtual base, I am very
interested in whether this is finally a case where it gains performance,
clarity or convenience. Typically, however, it performs worse, muddles the
relationships between the classes and is a nuisance rather than a convenience.

> (Heck, I don't even remember when was the last time I needed to use
> smart pointers at all.)

Odd. It is hard to be more efficient than 'std::unique_ptr'. I have a
massive number of those. The need for 'std::shared_ptr' is rare (since shared
ownership is rare), and I need intrusive reference counting even less often
than that.

> It was just an illustrative example of when virtual inheritance could
> be potentially useful.

As I said, it felt like at least a conceivable benefit in a case where
intrusive reference counting is somehow needed for resource management.

Juha Nieminen

Jul 1, 2015, 8:57:29 AM

Bo Persson <b...@gmb.dk> wrote:
> std::make_shared is allowed/supposed to allocate a single memory block
> large enough to hold both the pointed-to data and its internal info. If
> it does, that would be just as efficient as an intrusive pointer,
> without being that 'intrusive'.

If that's the case, then it's much better than using std::shared_ptr
with 'new' directly.

Juha Nieminen

Jul 1, 2015, 9:00:36 AM

Öö Tiib <oot...@hot.ee> wrote:
>> (Heck, I don't even remember when was the last time I needed to use
>> smart pointers at all.)
>
> Odd. It is hard to be more efficient than 'std::unique_ptr'. I have
> massive amount of those. Need for 'std::shared_ptr' is rare (since shared
> ownership is rare) but even less than that I need to use intrusive reference
> counting.

The reason I don't usually use smart pointers at all is because I usually
don't allocate memory explicitly. I tend to either use the standard
data containers, or objects by value (if it's feasible and efficient).

Of course it depends on the application how much explicit dynamic
allocation of objects is needed.

Öö Tiib

Jul 1, 2015, 12:08:15 PM

On Wednesday, 1 July 2015 13:22:50 UTC+3, Martijn van Buul wrote:
> * Öö Tiib:
> > On Tuesday, 30 June 2015 18:26:33 UTC+3, Martijn van Buul wrote:
> >
> >> Except that it isn't the same, and sometimes the differences matter,
> >> and without further knowledge of the context it's hard to say whether it
> >> is "more efficient" or not. I once rewrote a framework that used shared_ptr
> >> to intrusive_ptr, because it allowed me to avoid using new. The resulting
> >> gain in speed (and reduced memory fragmentation) was significant.
> >
> > Where you need 'new' with 'shared_ptr'? That was the whole point of my
> > question ... to find motivating examples.
>
> shared_ptr uses 'new' internally (and the boost implementation used it
> twice as much) - which caused performance issues.
>
> The problem at hand revolved around a variant[1] of boost::any. boost::any
> has performance issues, because it relies on dynamic allocation.

I have used 'boost::variant'. It is a bit simpler to use, works faster and
does more things at compile time than 'boost::any'. It does not feel
relevant here anyway.

> cdiggins::any works around this by including a memory buffer to be used
> for small objects (like integers, doubles, small POD classes and structs)
> using in-place new. This greatly reduces the burden on the heap, especially
> if you add an operator new() and operator delete() that will manage instances
> of cdiggins::any in a separate pool.
>
> Before you dismiss this as premature optimalisation: It wasn't. It reduced
> runtime *of the entire system* by almost 20% and reduced memory
> requirements by a similar number due to a reduce in memory fragmentation.

I don't dismiss it. Your design feels like a hybrid of 'any' and 'variant'. I
trust it does not matter for the question of intrusive reference counting.

> shared_ptr<> throws a spanner into the works, however. Alongside the held
> object, it needs to include at least a reference counter. It cannot combine
> the held object and the counter in a single object using composition, as it
> must be possible to create a shared_ptr<> from a naked pointer.

There is a clear need to make a 'shared_ptr' from a naked pointer when the
pointer to manage comes from outside our code (for example from an external
library or an operating system API). In those cases, isn't it also likely to
be difficult to add an intrusive reference count as the alternative? In other
cases neither feels needed.

> It must
> therefore internally use a template class that contains a "naked" pointer to
> the held object, alongside the reference count. This means that the overloaded
> operator new() of cdiggins::any will still be used, at the expense of an extra
> allocation of the internal template class on the heap. This, however, is
> exactly the issue we were trying to address using cdiggins::any: The heap usage
> is back (thus increasing runtime and increasing fragmentation). make_shared()
> tries to work around this by allocating one block of memory for both, but then
> you're back at not using the overloaded operator new.

When we have a custom allocator, 'std::make_shared' does not work, but
then it might be worth considering 'std::allocate_shared' as an
alternative to switching to an intrusive pointer.

> Neither can be fixed, without intimate knowledge of the implementation.
> intrusive_ptr turned out to be the saving grace, however:
>
> 1) the overloaded operator new() and delete() continue to function.
> 2) No extra objects will be allocated on the heap, as no internal reference
> counting object is required.
> 3) It is a solution that does not depend on implementation details of
> shared_ptr

I do not understand how using 'std::allocate_shared' makes extra allocations,
or which internal details you have in mind.

> There is a price to pay: Reduced functionality (no support for anything
> resembling std::weak_ptr), and added complexity - either deriving from a base
> class (which opens up the diamond pattern issue) or adding accounting functions
> to the containing namespace. For our application, this was well worth it.
>
> Again, I'm not saying intrusive_ptr<> is the universal solution, and I don't
> use them unless profiling indicates that I should. However, I have enough
> evidence to dispel blanket statements like "shared_ptr is more efficient".
>
> Blanket statements are always wrong. I never make them :P.

I cannot say that 'shared_ptr' is *always* better than 'intrusive_ptr': it
has more features, so it is more expensive (if that matters). I only wrote
that virtual inheritance and a diamond caused by two refcounted base classes
is a great example of where 'shared_ptr' most likely beats intrusive
refcounting.
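To make the diamond concrete, here is a minimal sketch, assuming a
hypothetical RefCounted base and made-up Reader/Writer/File classes. With
ordinary inheritance the most-derived object ends up with two separate
counters; with virtual inheritance both paths reach the same one.

```cpp
#include <cassert>

// Hypothetical intrusive counter base.
struct RefCounted {
    long refs = 0;
};

// Ordinary inheritance: each base brings its own RefCounted sub-object.
struct ReaderPlain : RefCounted {};
struct WriterPlain : RefCounted {};
struct FilePlain : ReaderPlain, WriterPlain {};

// Virtual inheritance: all paths share a single RefCounted sub-object.
struct ReaderShared : virtual RefCounted {};
struct WriterShared : virtual RefCounted {};
struct FileShared : ReaderShared, WriterShared {};

// Compares the address of the counter seen through each base.
bool plain_bases_share_counter() {
    FilePlain f;
    return &static_cast<ReaderPlain&>(f).refs ==
           &static_cast<WriterPlain&>(f).refs;
}

bool virtual_bases_share_counter() {
    FileShared f;
    return &static_cast<ReaderShared&>(f).refs ==
           &static_cast<WriterShared&>(f).refs;
}
```

The virtual version needs per-object bookkeeping to locate the shared base,
which is the overhead being weighed here against a non-intrusive count.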

Öö Tiib

Jul 1, 2015, 12:17:18 PM

On Wednesday, 1 July 2015 16:00:36 UTC+3, Juha Nieminen wrote:
> Öö Tiib <oot...@hot.ee> wrote:
> >> (Heck, I don't even remember when was the last time I needed to use
> >> smart pointers at all.)
> >
> > Odd. It is hard to be more efficient than 'std::unique_ptr'. I have
> > massive amount of those. Need for 'std::shared_ptr' is rare (since shared
> > ownership is rare) but even less than that I need to use intrusive reference
> > counting.
>
> The reason I don't usually use smart pointers at all is because I usually
> don't allocate memory explicitly. I tend to either use the standard
> data containers, or objects by value (if it's feasible and efficient).

There always seem to be a few things that are dynamically polymorphic, to
get rid of repetitive switch cases. Those can't be kept by value in a
contiguous container.
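A minimal sketch of that situation, with hypothetical Shape classes: the
objects have different dynamic types, so the container holds owning pointers
rather than the objects themselves.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical polymorphic hierarchy replacing a switch on a type tag.
struct Shape {
    virtual ~Shape() = default;
    virtual int sides() const = 0;
};
struct Triangle : Shape {
    int sides() const override { return 3; }
};
struct Square : Shape {
    int sides() const override { return 4; }
};

// The vector stores unique_ptr<Shape>; each shape lives in its own allocation.
int total_sides() {
    std::vector<std::unique_ptr<Shape>> shapes;
    shapes.push_back(std::make_unique<Triangle>());
    shapes.push_back(std::make_unique<Square>());

    int total = 0;
    for (const auto& shape : shapes) {
        total += shape->sides();  // virtual dispatch instead of a switch
    }
    return total;
}
```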

> Of course it depends on the application how much explicit dynamic
> allocation of objects is needed.

Indeed.

Richard

Jul 1, 2015, 12:57:52 PM

[Please do not mail me a copy of your followup]

Martijn van Buul <pi...@dohd.org> spake the secret code
<slrnmp7fr...@mud.stack.nl> thusly:

>shared_ptr<> throws a spanner into the works, however. Alongside the held
>object, it needs to include at least a reference counter. It cannot combine
>the held object and the counter in a single object using composition, as it
>must be possible to create a shared_ptr<> from a naked pointer.

To get the reference count and the object allocated in the same chunk
of memory, you use std::make_shared.
--
"The Direct3D Graphics Pipeline" free book <http://tinyurl.com/d3d-pipeline>
The Computer Graphics Museum <http://computergraphicsmuseum.org>
The Terminals Wiki <http://terminals.classiccmp.org>
Legalize Adulthood! (my blog) <http://legalizeadulthood.wordpress.com>

Martijn van Buul

Jul 6, 2015, 4:31:07 PM

* Öö Tiib:

>> Before you dismiss this as premature optimalisation: It wasn't. It reduced
>> runtime *of the entire system* by almost 20% and reduced memory
>> requirements by a similar number due to a reduce in memory fragmentation.
>
> I don't dismiss it. Your design feels like hybrid of 'any' and 'variant'. I
> trust it does not matter for question of intrusive reference counting.

Well, it doesn't, other than setting the stage for a specific, real-life
example.

> When we have custom allocator then 'std::make_shared' does not
> work but then it might be worth considering usage of 'std::allocate_shared'
> as alternative to switching to intrusive pointer.

std::allocate_shared wouldn't work. In our case, because I created a
memory pool for one specific object (of constant size), I was able to
use a pool with constant block size (in other words: a doubly linked
list). This has several benefits, most importantly with regard to memory
fragmentation: a memory pool of constant block size cannot fragment,
whereas a dynamic heap can.
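A minimal sketch of such a pool, under stated assumptions: FixedPool, Widget
and widget_pool are hypothetical names, the free list here is singly linked
rather than doubly linked, and it is not thread-safe. The block size is fixed
at compile time from sizeof(T), which is the property at stake.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical pool of N blocks, each exactly large enough for one T.
// Free blocks are chained through a free list stored in the blocks themselves.
template <class T, std::size_t N>
class FixedPool {
public:
    FixedPool() {
        // Thread every block onto the free list.
        for (std::size_t i = N; i > 0; --i) {
            blocks_[i - 1].next = free_;
            free_ = &blocks_[i - 1];
        }
    }
    FixedPool(const FixedPool&) = delete;
    FixedPool& operator=(const FixedPool&) = delete;

    void* allocate() {
        if (free_ == nullptr) throw std::bad_alloc{};
        Block* b = free_;
        free_ = b->next;
        ++in_use_;
        return b;
    }
    void deallocate(void* p) noexcept {
        Block* b = static_cast<Block*>(p);
        b->next = free_;  // every block is interchangeable, so no fragmentation
        free_ = b;
        --in_use_;
    }
    std::size_t in_use() const noexcept { return in_use_; }

private:
    union Block {
        Block* next;                                // while the block is free
        alignas(T) unsigned char bytes[sizeof(T)];  // while it holds a T
    };
    Block blocks_[N];
    Block* free_ = nullptr;
    std::size_t in_use_ = 0;
};

// Hypothetical pooled class: its operator new/delete route to the pool.
struct Widget {
    static void* operator new(std::size_t size);
    static void operator delete(void* p) noexcept;
    double payload = 0.0;
};

FixedPool<Widget, 64>& widget_pool() {
    static FixedPool<Widget, 64> pool;
    return pool;
}

void* Widget::operator new(std::size_t size) {
    assert(size == sizeof(Widget));  // the pool only serves this one size
    return widget_pool().allocate();
}
void Widget::operator delete(void* p) noexcept { widget_pool().deallocate(p); }
```

A request of any other size, such as an object plus a library-defined control
block, does not fit this scheme without knowing that size up front.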

The Allocator interface used by std::allocate_shared wouldn't work here. Even
though it would still make allocations of the exact same size *in practice*,
the rub is that this size isn't known in advance.

>
>> Neither can be fixed, without intimate knowledge of the implementation.
>> intrusive_ptr turned out to be the saving grace, however:
>>
>> 1) the overloaded operator new() and delete() continue to function.
>> 2) No extra objects will be allocated on the heap, as no internal reference
>> counting object is required.
>> 3) It is a solution that does not depend on implementation details of
>> shared_ptr
>
> I do not understand how usage of 'std::allocate_shared' makes extra
> allocations and what internal details you have in mind.

std::allocate_shared<T> will make a single allocation, containing the payload
and "a small bookkeeping object". The latter is an implementation detail;
the spec doesn't say anything about it, and different libraries use
different approaches.

So, on gcc, std::allocate_shared<T> will allocate

sizeof(T) + sizeof(__shared_ptr<>) - where __shared_ptr is overloaded
so often that I gave up trying to figure out which of the many templated
versions is actually used.

On Clang/LLVM, std::allocate_shared<T> will allocate

sizeof(T) + sizeof(__shared_ptr_emplace<T, Alloc>) which may or may
not be the same as sizeof(T) + sizeof(__shared_weak_count)

What MSVC does is anyone's guess - and it doesn't even matter. It's an
internal detail, relying on it would be foolish at best. The result is
that there is no way to predict exactly what size of allocation
std::allocate_shared<> will make, and I can't even rely on the fact that
this allocation will always be of the same size, given the same template
argument T.

With an intrusive_ptr I have none of these issues. I know exactly what
size of objects will be allocated, and I know it at compile time. It's

sizeof(T)
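The size question can be observed rather than guessed. The sketch below
assumes a hypothetical RecordingAllocator that notes how often it is called
and how many bytes are requested. std::allocate_shared rebinds it to a
library-internal type, so the request is for more than sizeof(T); how much
more is not specified by the standard.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>

static std::size_t g_calls = 0;       // number of allocate() calls
static std::size_t g_last_bytes = 0;  // size of the most recent request

// Hypothetical minimal allocator that records what is asked of it.
template <class U>
struct RecordingAllocator {
    using value_type = U;

    RecordingAllocator() = default;
    template <class V>
    RecordingAllocator(const RecordingAllocator<V>&) noexcept {}

    U* allocate(std::size_t n) {
        ++g_calls;
        g_last_bytes = n * sizeof(U);
        return static_cast<U*>(::operator new(n * sizeof(U)));
    }
    void deallocate(U* p, std::size_t) noexcept { ::operator delete(p); }
};

template <class A, class B>
bool operator==(const RecordingAllocator<A>&, const RecordingAllocator<B>&) noexcept {
    return true;
}
template <class A, class B>
bool operator!=(const RecordingAllocator<A>&, const RecordingAllocator<B>&) noexcept {
    return false;
}

struct Payload {
    double a = 0.0;
    double b = 0.0;
};

struct Measurement {
    std::size_t calls;
    std::size_t bytes;
};

// Creates one shared Payload and reports what the allocator was asked for.
Measurement measure_allocate_shared() {
    g_calls = 0;
    g_last_bytes = 0;
    std::shared_ptr<Payload> p =
        std::allocate_shared<Payload>(RecordingAllocator<Payload>{});
    return {g_calls, g_last_bytes};
}
```

On the common standard libraries this reports a single request that is larger
than sizeof(Payload) and repeats with the same size for the same T and
allocator; the exact figure differs between libraries and is not something
the standard promises.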

>> Blanket statements are always wrong. I never make them :P.
>
> I can not say that 'shared_ptr' is *always* better than 'intrusive_ptr'
> because it has more features so it is more expensive (if it matters). I only
> wrote that virtual inheritance and diamond caused by two refcounted base
> classes is such a great example where 'shared_ptr' most likely wins intrusive
> refcounting.

... and I say that this is *just* as application-dependent as my case to
the contrary.

Öö Tiib

Jul 7, 2015, 2:28:49 AM

On Monday, 6 July 2015 23:31:07 UTC+3, Martijn van Buul wrote:
> * Öö Tiib:
> >> Before you dismiss this as premature optimalisation: It wasn't. It reduced
> >> runtime *of the entire system* by almost 20% and reduced memory
> >> requirements by a similar number due to a reduce in memory fragmentation.
> >
> > I don't dismiss it. Your design feels like hybrid of 'any' and 'variant'. I
> > trust it does not matter for question of intrusive reference counting.
>
> Well, it doesn't, other than setting the stage for a specific, real-life
> example.

Ok.

> > When we have custom allocator then 'std::make_shared' does not
> > work but then it might be worth considering usage of 'std::allocate_shared'
> > as alternative to switching to intrusive pointer.
>
> std::allocate_shared wouldn't work. In our case, because I created a
> memory pool for one specific object (of constant size), I was able to
> use a pool with constant block size (in other words: A doubly linked
> list). This has several benefits - most importantly memory fragmentation,
> as a memory pool of constant block size cannot fragment, whereas a dynamic
> heap can.
>
> The Allocator-interface used by std::make_shared wouldn't work here. Even
> though it would still make allocations of the exact same size *in practice*,
> the rub is that this size isn't known in advance.

Why must your allocator know the block size in advance? What are the several
benefits? If the requested blocks are in practice all of the exact same size,
then what fragments your pool?

>
> >
> >> Neither can be fixed, without intimate knowledge of the implementation.
> >> intrusive_ptr turned out to be the saving grace, however:
> >>
> >> 1) the overloaded operator new() and delete() continue to function.
> >> 2) No extra objects will be allocated on the heap, as no internal reference
> >> counting object is required.
> >> 3) It is a solution that does not depend on implementation details of
> >> shared_ptr
> >
> > I do not understand how usage of 'std::allocate_shared' makes extra
> > allocations and what internal details you have in mind.
>
> std::allocate_shared<T> will make a single allocation, containing the payload
> and "a small bookkeeping object". The latter is an implementation detail;
> the spec doesn't say anything about it, and different libraries use
> different approaches.
>
> So, on gcc, std::allocate_shared<T> will allocate
>
> sizeof(T) + sizeof(__shared_ptr<>) - where __shared_ptr is overloaded
> so often that I gave up trying to figure out which of the many templated
> versions is actually used.
>
> On Clang/LLVM, std::allocate_shared<T> will allocate
>
> sizeof(T) + sizeof(__shared_ptr_emplace<T, Alloc>) which may or may
> not be the same as sizeof(T) + sizeof(__shared_weak_count)
>
> What MSVC does is anyone's guess - and it doesn't even matter. It's an
> internal detail, relying on it would be foolish at best.

Certainly. We won't mix all those compilers in the same program, so the
differences do not matter.

> The result is
> that there is no way to predict exactly what size of allocation
> std::allocate_shared<> will make, and I can't even rely on the fact that
> this allocation will always be of the same size, given the same template
> argument T.

The size of the control block of 'shared_ptr' might perhaps vary when it is
produced by different means (and has different allocators/deleters).
How can it vary with 'allocate_shared' of the same type using the same
allocator?

> With an intrusive_ptr I have none of these issues. I know exactly what
> size of objects will be allocated, and I know it at compile time. It's
>
> sizeof(T)

The compiler does know the control block size at compile time, but what
benefit that gives is as yet undisclosed. Blocks of the same size can't
cause fragmentation of your pool regardless.

>
> >> Blanket statements are always wrong. I never make them :P.
> >
> > I can not say that 'shared_ptr' is *always* better than 'intrusive_ptr'
> > because it has more features so it is more expensive (if it matters). I only
> > wrote that virtual inheritance and diamond caused by two refcounted base
> > classes is such a great example where 'shared_ptr' most likely wins intrusive
> > refcounting.
>
> ... and i say that this *just* as application dependant as my case to the
> contrary.

How can expensive diamond inheritance caused by an intrusive reference count
be better than a simple 'make_shared'?