How does a C++ implementation cast the pointer returned by a virtual function?

Shiyao Ma

Jan 21, 2017, 10:53:18 PM

Hi.

A virtual function can return a pointer to the derived class.

When it comes to multiple inheritance, the pointer might be cast with some offset added, in order to point to the right base.

E.g.,

Base *ptr = someobjptr->some_func_return_pointer_to_derived();

The problem is that virtual function binding happens at run time, while the "Base*" type is static.

So how is the offset calculated to adjust the result of "someobj->some_func_return_pointer_to_derived()" to be "Base*"?

Though it's highly implementation-specific, is there any concrete example, such as gcc?

The following is the code snippet; we can see that the addresses of the two pointers are different.

http://ideone.com/6DmlT3

Jens Thoms Toerring

Jan 22, 2017, 12:34:41 AM

Shiyao Ma <i...@introo.me> wrote:
> A virtual function can return a pointer to the derived class.

Here's your program; it's short enough to post.

> #include <iostream>
> using namespace std;
>
> struct A { int a; };
> struct B { int b; };
> struct C: A, B { int c; };
>
> struct Base {
>     virtual B* func() = 0;
>     virtual ~Base() = default;
> };
>
> struct Derived: Base {
>     C* func() {
>         auto p = new C;
>         cout << p << endl;
>         return p;
>     }
> };
>
> int main() {
>     Base* pb = new Derived;
>     cout << pb->func() << endl;
>     // the two addresses output are different.
>     // how is the pointer cast (adding some offset) achieved?
> }

Your assumption that func() would return a pointer to a
'Derived' class instance is simply wrong. It returns a
pointer to a newly created instance of 'C'. So it has
nothing to do with the address of the 'Derived' class
instance, and they must be different since they are
completely different objects. The address stored in 'pb' is
exactly the same as that of the instance of 'Derived'
from which it was assigned. Try instead

Derived *dp = new Derived;
Base *bp = dp;
cout << dp << ' ' << bp << endl;

If your assumption were correct you could do

Base *bp2 = pb->func();

But the compiler won't let you do that because the result
of 'pb->func()' is neither a pointer to an instance of
'Derived' nor 'Base' but to a new instance of 'C'. And 'C'
isn't derived from 'Base' and thus the assignment isn't
possible (unless you force the compiler via a cast to
let you do that anyway).
Regards, Jens
--
\ Jens Thoms Toerring ___ j...@toerring.de
\__________________________ http://toerring.de

Stuart Redmann

Jan 22, 2017, 9:46:44 AM

If you add a covariant implementation you actually add two virtual methods.
One is the one that you have explicitly provided, the other is supplied by
the compiler automatically, and it looks like this:

B* func ()
{
    C* derived = func(); // invokes your version, but this code
                         // would not compile in reality because
                         // the compiler could not figure out which
                         // version of func should be called.
    B* base = static_cast<B*>(derived);
    return base;
}

Note that the above snippet is an implementation detail known as vtable
thunking. However, since most C++ compilers use vtables, they will most
likely also use this technique.

One implication of this technique is that each covariant method declaration
adds another entry to the vtable of a class. So if you create a really large
inheritance tree that consists only of a single branch, and each level adds
a covariant method, your most derived class will end up with a large vtable
as well, even if it contains only a single virtual method!
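
For illustration, the two entries can be written out by hand as two differently named functions. This is only a sketch of the idea, not what any particular compiler emits: the names `func_returning_B`, `func_returning_C` and `thunk_adjusts_to_b_subobject` are made up, the structs carry default member values that the original program does not have, and the `C` object is a data member rather than the result of `new`.

```cpp
#include <cassert>

struct A { int a = 1; };
struct B { int b = 2; };
struct C : A, B { int c = 3; };

struct Base {
    // Stands in for the vtable slot declared as "virtual B* func()".
    virtual B* func_returning_B() = 0;
    virtual ~Base() = default;
};

struct Derived : Base {
    C object;  // returned by address, so nothing is leaked

    // Stands in for the user-written covariant override "C* func()".
    virtual C* func_returning_C() { return &object; }

    // Stands in for the extra entry: call the real override, then
    // apply the C* -> B* conversion. The offset involved is fixed
    // when Derived is compiled, because the layout of C is known.
    B* func_returning_B() override {
        C* derived = func_returning_C();
        return static_cast<B*>(derived);
    }
};

// Hypothetical helper: true if a call through Base* yields the B
// sub-object of the very C object that the override returned.
inline bool thunk_adjusts_to_b_subobject() {
    Derived d;
    Base* pb = &d;
    B* through_base = pb->func_returning_B();
    C* direct = d.func_returning_C();
    assert(through_base != nullptr && direct != nullptr);
    return through_base == static_cast<B*>(direct)
        && through_base->b == 2;
}
```

A caller holding a `Base*` only ever sees a `B*`, already adjusted; a caller holding a `Derived*` can get the unadjusted `C*`.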

Regards,
Stuart

Shiyao Ma

Jan 22, 2017, 10:06:53 AM

Thanks Stuart,

A very enlightening read.

Alf P. Steinbach

Jan 22, 2017, 10:43:22 AM

On 22.01.2017 04:53, Shiyao Ma wrote:
> Hi.
>
> A virtual function can return a pointer to the derived class.
>
> When it comes to multiple inheritance, the pointer might be cast with
> some offset added, in order to point to the right base.

Address adjustment can happen also for single inheritance.

> Base *ptr = someobjptr->some_func_return_pointer_to_derived();
>
> The problem is that virtual function binding happens at run time,
> while the "Base*" type is static.

The declared return type is known at compile time.

> So how is the offset calculated to adjust the result of
> "someobj->some_func_return_pointer_to_derived()" to be "Base*" ?

There are a number of different cases.

> Though it's highly implementation-specific, is there any concrete
> example, such as gcc?

The under-the-hood details are implementation specific, yes.

> The following is the code snippet; we can see that the addresses of
> the two pointers are different.
>
> http://ideone.com/6DmlT3

Usenet is not a web forum. Messages are archived and can be read for
many decades. Your URL will probably not be valid in a year or two.

As it happens Jens Thoms Toerring has already discussed your example
else-thread: your code creates two objects of two unrelated types, and
compares their addresses. It's an irrelevant example. :)

Here is about the simplest example where there is likely to be a pointer
adjustment when converting from `Derived*` to `Base*`:

struct Base { int x; };
struct Derived: Base { virtual ~Derived() {} };

#include <iostream>
using namespace std;

auto main()
    -> int
{
    Derived o;
    Base& b = o;
    cout << "Base at " << &b << ", Derived at " << &o << endl;
}

Since `b` is (apparently) a reference to `o`, an alias for `o`, one
might naively expect that they should be at the same address. But in
more detail `b` is a reference to /the `Base` sub-object/ of `o`. I.e.
to something inside `o`, and since `Derived` is not a POD class that
sub-object is not guaranteed to be at the very start of `o`.

OTOH it's not guaranteed that there will be an adjustment, either: it
depends on the implementation. But it's likely. With MinGW g++ I get

Base at 0x22fd58, Derived at 0x22fd50

The Derived object logically contains a Base sub-object. And here the
g++ compiler placed a vtable pointer before the Base sub-object, at the
very start of Derived. Hence the Base sub-object is at a slightly higher
address than Derived, just sufficiently to make room for that vtable
pointer, which is 8 bytes with this 64-bit compiler.

So what happens if, in `Base`, you add this method:

virtual auto p() -> Base* = 0;

and in `Derived` you implement it as

auto p() -> Base* override { return this; }

Well that case is easy, in two ways! First, in `Derived` the type of
`this` is known to be `Derived*`, and the conversion to `Base*` can be
determined at compile time. And secondly, by adding a virtual method up
in `Base` we have introduced a vtable pointer there, so there's likely
no address adjustment at all, i.e., `Base` is at offset 0 in `Derived`.
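
Put together, that easy case might look like the sketch below. The virtual destructor in `Base` and the helper `p_returns_base_subobject` are additions for this illustration only. Whatever the offset of `Base` within `Derived` turns out to be, the value returned by `p()` is the address of the `Base` sub-object.

```cpp
#include <cassert>

struct Base {
    int x = 0;
    virtual Base* p() = 0;
    virtual ~Base() = default;  // assumption: added so Base is safely polymorphic
};

struct Derived : Base {
    // `this` has static type Derived*, so the Derived* -> Base*
    // conversion is worked out when this function is compiled.
    Base* p() override { return this; }
};

// Hypothetical helper: the pointer obtained through the virtual call
// is the same as the address of the Base sub-object.
inline bool p_returns_base_subobject() {
    Derived o;
    Base& b = o;
    Base* result = b.p();
    assert(result != nullptr);
    return result == &b;
}
```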

To get an address adjustment sort of within the call of a virtual
method, one apparently needs multiple inheritance. At least in
practice.

Let's first construct an example with sub-objects of the same type but
at different offsets within the containing derived class object:

struct Base { int x; virtual ~Base(){} };
struct Intermediate1: Base {};
struct Intermediate2: Base {};
struct Derived: Intermediate1, Intermediate2 { };

#include <iostream>
using namespace std;

auto main()
    -> int
{
    Derived o;
    Base& b1 = static_cast<Intermediate1&>( o );
    Base& b2 = static_cast<Intermediate2&>( o );
    cout << "Base 1 at " << &b1 << ", Base 2 at " << &b2
        << ", Derived at " << &o << endl;
}

With MinGW g++ I get

Base 1 at 0x22fd40, Base 2 at 0x22fd50, Derived at 0x22fd40

The problem with this example is that it's still irrelevant for the
virtual function call question, for it's not the case that a `Derived`
is-a single `Base`. A `Derived` here is two `Base`s, a `Base` plus a
`Base`. And since they are on equal footing a `Derived*` doesn't convert
implicitly to a single `Base*`: the conversion is ambiguous!

And so a virtual function implementation down in `Derived` cannot simply
return `this` in order to return a `Base*`: it must disambiguate, e.g.
via a `static_cast` as shown above, exactly which of the two `Base`
sub-objects it should return the address of.

The relevant conversion `Derived*` → `Base*` will therefore be known at
compile time, and it's also known at compile time for a virtual function
implementation in `Intermediate1` or `Intermediate2`.
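
A minimal sketch of such a disambiguating override, assuming a virtual `p()` is added to `Base` as in the earlier example. The choice of `Intermediate2` and the helper `p_picks_second_base` are arbitrary picks for illustration.

```cpp
#include <cassert>

struct Base {
    int x = 0;
    virtual Base* p() { return this; }
    virtual ~Base() {}
};
struct Intermediate1 : Base {};
struct Intermediate2 : Base {};

struct Derived : Intermediate1, Intermediate2 {
    // A plain "return this;" would not compile here, because
    // Derived* -> Base* is ambiguous. The cast names one path, and
    // with it a conversion that is fixed at compile time.
    Base* p() override { return static_cast<Intermediate2*>(this); }
};

// Hypothetical helper: whichever Base sub-object the call goes
// through, the override returns the address of the second one.
inline bool p_picks_second_base() {
    Derived o;
    Base& b1 = static_cast<Intermediate1&>(o);
    Base& b2 = static_cast<Intermediate2&>(o);
    assert(&b1 != &b2);  // two distinct Base sub-objects
    return b1.p() == &b2 && b2.p() == &b2;
}
```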

To make things more complex & interesting we can sort of merge the two
`Base` sub-objects into a single shared one, by using `virtual`
inheritance from `Base`. All lines of `virtual` (direct) inheritance of
a class T go to the same single T sub-object, and so in this code:

struct Base { int x; virtual ~Base(){} };
struct Derived_v: virtual Base {};
struct Derived1: Derived_v {};
struct Derived2: Derived_v {};
struct Most_derived: Derived1, Derived2 { };

#include <iostream>
using namespace std;

auto main()
    -> int
{
    Most_derived o;

    Derived1& d1 = o;
    Derived_v& d1v = d1;

    Derived2& d2 = o;
    Derived_v& d2v = d2;

    Base& b = o;

    cout << "b at " << &b << ", d1v at " << &d1v << ", d2v at "
        << &d2v << ", " << "o at " << &o << endl;
}

… the single `Base` sub-object must necessarily be at different offsets
in the two `Derived_v` sub-objects, and indeed, I get e.g. this output:

b at 0x22fd30, d1v at 0x22fd20, d2v at 0x22fd28, o at 0x22fd20

So now we're in position to ATTEMPT to create a virtual function with
covariant result, where that result must be adjusted in different ways
depending on through which sub-object that function is called.

That is, we attempt to force an adjustment of the function result that
depends on information only known at run-time:

struct Base
{
    int x;
    virtual auto p() -> Base* { return this; }
    virtual ~Base(){}
};

struct Derived_v
    : virtual Base
{
    auto p() -> Derived_v* override { return this; }
};

struct Derived1
    : Derived_v
{
    auto p() -> Derived1* override { return this; }
};

struct Derived2
    : Derived_v
{
    auto p() -> Derived2* override { return this; }
};

struct Most_derived: Derived1, Derived2 { };

#include <iostream>
using namespace std;

auto main()
    -> int
{
    Most_derived o;

    Derived1& d1 = o;
    Derived_v& d1v = d1;

    Derived2& d2 = o;
    Derived_v& d2v = d2;

    Base& b = o;

    cout << "b at " << b.p() << ", d1v at " << d1v.p() << ", d2v at "
        << d2v.p() << ", " << "o at " << &o << endl;
}

But this is just not allowed by the C++ rules: this code will not
compile with a standard-conforming compiler.

[C:\my\forums\clc++\050]
> g++ d.cpp
d.cpp:26:8: error: no unique final overrider for 'virtual Base*
Base::p()' in 'Most_derived
struct Most_derived: Derived1, Derived2 { };
^~~~~~~~~~~~

[C:\my\forums\clc++\050]
> cl d.cpp
d.cpp
d.cpp(26): error C2250: 'Most_derived': ambiguous inheritance of
'Derived1 *Base::p(void)'

[C:\my\forums\clc++\050]
> _

So, the short answer is that the C++ rules ensure that any adjustment of
a function result is completely known at compile time, when the function
implementation is compiled. And the slightly longer answer is that the
rules are quite complex, but they add up to reliable simple behavior. Or
at least, it's been that way up to and including C++14.

Cheers & hth.,

- Alf

Manfred

Jan 22, 2017, 11:30:41 AM

On 1/22/2017 6:34 AM, Jens Thoms Toerring wrote:
> Shiyao Ma <i...@introo.me> wrote:
>> A virtual function can return a pointer to the derived class.
>
> Here's your program, it's short enough for posting it.
>
>> #include <iostream>
>> using namespace std;
>>
>> struct A { int a; };
>> struct B { int b; };
>> struct C: A, B { int c; };
>>
>> struct Base {
>> virtual B* func() = 0;
>> virtual ~Base() = default;
>> };
>>
>> struct Derived: Base {
>> C* func() {
>> auto p = new C;
>> cout << p << endl;
>> return p;
>> }
>> };
>>
>> int main() {
>> Base* pb = new Derived;
>> cout << pb->func() << endl;
>> // the two addresses output are different.
>> // how is the pointer cast (adding some offset) achieved?
>> }
>
> Your assumption that func() would return a pointer to a
> 'Derived' class instance is simply wrong.

This does not appear to be the assumption of the program.
It first prints (in Derived::func()) the address of a new object of type
C, and then prints (in main()) the address of the same object converted
to a B*.
The confusing part is that it is using Base and Derived as well as A, B
and C.
The difference in address is, though, not due to func() being virtual,
nor does it have to do with polymorphism. It is simply due to the fact
that C has two bases A and B which obviously cannot both share the same
address, so the B (the second base) subobject has a different address
than the C object (which probably has the same address as the A subobject).
Conversions between A, B and C pointers are performed by the compiler
given that their relative layout is known at compile time.
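
A small sketch of that point, using the same A, B and C. The helpers `b_offset_in` and `a_and_b_subobjects_differ` are made up for this illustration. The actual offset is up to the implementation, so none is stated here; what can be relied on is that the A and B sub-objects sit at different addresses and that the offset is the same for every C object.

```cpp
#include <cassert>
#include <cstddef>

struct A { int a; };
struct B { int b; };
struct C : A, B { int c; };

// Byte distance from the start of a C object to its B sub-object.
// The C* -> B* conversion below is what the compiler inserts
// wherever a C* is used as a B*.
inline std::ptrdiff_t b_offset_in(C& c) {
    B* pb = &c;
    return reinterpret_cast<char*>(pb) - reinterpret_cast<char*>(&c);
}

// Hypothetical helper: the two base sub-objects cannot share an address.
inline bool a_and_b_subobjects_differ() {
    C c{};
    A* pa = &c;
    B* pb = &c;
    assert(pa != nullptr && pb != nullptr);
    return static_cast<void*>(pa) != static_cast<void*>(pb);
}
```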

Alf P. Steinbach

Jan 22, 2017, 4:16:12 PM

On 22.01.2017 17:30, Manfred wrote:
>>>
>>> int main() {
>>> Base* pb = new Derived;
>>> cout << pb->func() << endl;
>>> // the two addresses output are different.
>>> // how is the pointer cast (adding some offset) achieved?
>>> }
>>
>> Your assumption that func() would return a pointer to a
>> 'Derived' class instance is simply wrong.
> This does not appear to be the assumption of the program.
> It first prints (in Derived::func()) the address of a new object of type
> C, and then prints (in main()) the address of the same object converted
> to a B*.

No, it prints the result of calling `pb->func()`, not `pb`.

Cheers & hth.,

- Alf

Manfred

Jan 22, 2017, 4:53:45 PM

Exactly, pb is never printed nor assumed to be compared to anything.
The other printout (meant to be compared with the result of
`pb->func()`) is in the following, where the thing being printed is the
result of `new C`:

Alf P. Steinbach

Jan 22, 2017, 5:24:36 PM

Oh, a function with side effect, and covariant with respect to a
parallel class hierarchy.

Well then the analysis given by Jens up-thread is incorrect. And I
failed to see that. But who would expect such code, huh.

Now if I were the compiler I would implement it as follows:

auto func()
    -> B* override // Known to be C* for this implementation.
{
    auto p = new C;
    cout << p << endl;
    return p;
}

auto _non_virtual_func()
    -> C*
{ return static_cast<C*>( func() ); }

... and translate every call of the source code's `func() -> C*`, to a
call of `_non_virtual_func`.

This keeps all the pointer type conversion (with possible address
adjustment) using only compile time information.

The direction of delegation is important because `func` can be further
overridden in a more derived class, and then one wants that reflected
also in calls of `_non_virtual_func`.

By the way, this is the usual pattern for creating covariant functions
returning smart pointers, since C++ supports covariance only for raw
pointer and raw reference results.
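
A rough sketch of that pattern with `std::unique_ptr` (C++14 for `std::make_unique`). The name `func_as_c` and the helper `wrapper_returns_c` are made up, and unlike the original program `B` is given a virtual destructor and the structs carry default member values, so that the object can be deleted through a `B*` and inspected.

```cpp
#include <cassert>
#include <memory>

struct A { int a = 1; };
struct B { int b = 2; virtual ~B() = default; };  // assumption: virtual destructor added
struct C : A, B { int c = 3; };

struct Base {
    virtual ~Base() = default;
    // The virtual function always returns the base smart pointer type.
    virtual std::unique_ptr<B> func() = 0;
};

struct Derived : Base {
    std::unique_ptr<B> func() override { return std::make_unique<C>(); }

    // Non-virtual wrapper: delegates to the virtual function, then
    // downcasts. The B* -> C* adjustment is known at compile time.
    // This is only valid as long as every override of func() from
    // Derived downwards really returns a C (or something derived).
    std::unique_ptr<C> func_as_c() {
        std::unique_ptr<B> p = func();
        return std::unique_ptr<C>(static_cast<C*>(p.release()));
    }
};

// Hypothetical helper: the wrapper hands back a usable C.
inline bool wrapper_returns_c() {
    Derived d;
    std::unique_ptr<C> pc = d.func_as_c();
    assert(pc != nullptr);
    return pc->b == 2 && pc->c == 3;
}
```

Because the wrapper calls the virtual `func()`, an override in a more derived class is still picked up by `func_as_c()`.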

Cheers!, & thanks,

- Alf