On 10/12/12 1:39 AM, Pavel wrote:
>> Actually, most of the jumps can be removed, by placing the thunk
>> directly in front of the code for the subroutine.
> Not unless you generate multiple copies of the subroutine because there
> can be more than one offset for the same function (D::f() can be called
> by either of its the bases that declare virtual f() and there can be
> more than one such base, some inheriting from the other).
>
> One problem with generating multiple functions is that one function can
> be relatively long (in code terms) and *all of them has to be generated*
> regardless of whether even one of them is used (as opposed to the
> code-bloat caused by templates that only generates really used
> functions). For example, if you have this hierarchy: class B { public:
> virtual int f(); };
> class B1 {...}; class B2 {...}; class B3 {...};
> class D1: public B1, public B {...}; // no f() override!
> class D2: public B3, public D1 {...}; // no f() override!
> class D3: public B2, public D2 {... virtual int f(); };,
>
> you will have to generate 4 virtual tables and 4 different functions
> fully copying the body of function D3::f(): to call it by B*, D1*, D2*
> and D3*; of these, the former will not have adjustment and the latter 3
> will.
>
As I have been thinking about it, supporting pointer to member functions
basically will require any implementation to have a virtual table for
every distinct base class (That defines virtual functions) excepting
single derivation. So the the 4 virtual tables are basically required.
There does NOT need to be 4 different functions though. There may need
to be 4 entry points, but the whole body doesn't need to be duplicated,
the entries which need to adjust the this pointer, can adjust the
pointer parameter and then fall into/jump into the main subroutine. You
may not be able to write the equivalent to this in C++, but the complier
doesn't need to. If you were to attempt to write it is C++ what you
would need to do is something like:
D3::f(D* d) { ... } // This parameter being shown explisitly
D3::f(B* b) { return f(static_cast<D*>(b)); }
and then let the compiler use tail recursion like elimination to replace
the call with a jump.
The next step is to realize that we can eliminate the jump by placing
the code directly in from of the original function. In the case of
multiple layers, they can be stacked B* -> D1* -> D2* -> D3* to
eliminate the jumps, or the compiler can decide when the cost of the
additional conversion exceeds the cost of the jump, and just add the
jump. Thus the cost will never be higher than an adjustment followed by
a jump, will be 0 cost if no adjustment is needed, and only the cost of
a single adjustment if called from a single level of non-first base
class (no jump needed).
For many systems, if we are careful with are ABI design, so that the
this pointer is passed via a register, the adjustment can be as simple
as a single instruction in the basic case, a add or subtract immediate.
Thus a possible layout of the entry to the function would be the
equivalent to
D3::f__thunk_B: this <- this - offsetof B in D1
D3::f__thunk_D1: this <- this - offsetof D1 in D2
D3::f__thunk_D2: this <- this - offsetof D2 in D3
D3::f start of normal code for D3::f
If a jump is cheaper than 2 subtracts then it
D3::f__thunk_B: this <- this - offsetof B in D3
jump B3::f
D3::f__thunk_D1: this <- this - offsetof D1 in D2
D3::f__thunk_D2: this <- this - offsetof D2 in D3
D3::f start of normal code for D3::f
This makes the thunking cost for a virtual function on par with the
fixed adjustment cost for calling any non-first base member function,
with at most the addition of a jump instruction (and this additional
cost ONLY occurs if there are multiple non-first base derivations
happening).
>
>>
>> In fact, if we define that when calling a virtual function, the this
>> pointer is always set to point to the sub object of the type where the
>> function was first declared as virtual,
> How can you define 'this' to be always set this way? This same pointer
> can be used for other purposes, too (for example, for accessing
> non-virtual members of an object of its 'compile-time' type. It *has* to
> point to a particular sub-object (which one, in general depends on the
> way how it was created; but if a class is not derived from the same base
> class more than once, it has to be to that only sub-object that is of
> the class matching the pointer type).
>
The key feature is that if you call using a pointer in the "B"
sub-objects vtable, than you need to call it with a B* pointer as this.
If the function is actually a D3 member function, then IT will adjust
the this pointer as described above. For a direct virtual call (not via
a member-function pointer), the compiler can choose the virtual table
for the most derived class that it knows about that defines an override.
If the object isn't of a more derived class that defines an override
(say D4), then the code will just execute the adjustment at the call
site and go directly into the function. If there is a subsequent
override, then the call will go into the above thunk entry code, correct
the this pointer. The compiler also has the option of using the most
derived class's vtable and not adjust the pointer, knowing that this
will go into a thunk that will adjust the pointer. If for example we
call a g() that was declared virtual in B and D2, it will need to go
into a thunk that adds the offset of D2 in D3 and then jump into
D2::g(). This is a trade off of gaining space (using the common thunk,
instead of adjusting at each call) at the expense of time (the cost of a
jump). If jumps are really expensive, it may be possible with proper
linker instruction to build the preamble for g to be:
D2::g__thunk_D3: this <- this + offsetof B in D3
D2::g__thunk_B: this <- this - offsetof B in D1
D2::g__thunk_D1: this <- this - offsetof D1 in D2
D2::g start of normal code for D2::g
(and further add later thunks in front until the cost of the extra
adjustments exceed the cost of the jump).
Note again, the main body of the member function has the this pointer
pointing to the (sub)object of the type of the member function, the
possible different type of this is only for the input to the thunk,
which effectively casts the pointer to the needed type.
Perhaps I wasn't as clear on what I was describing here. In the case of
a call via a pointer to object -> name of member function call, the
compiler knows a lot of the layout, and can use this to choose how it
wants to code the call among its options. It can validly choose any of
the vtables as long as it uses the proper object base for the call.
Since with the structure I have described, "down casting" thunks are
more expensive, it might make sense to down cast at the call site and
then make a call that, at worse, will only need to up cast.
>
>>
>> At the function side, if the function is a virtual function of a type
>> that has the initial definition in a non-first base class, include code
>> to adjust the this pointer from that base class to the actual class just
>> before the normal entry point, and have the vtable point there.
> Yes, but without jump it creates the whole new function for every such
> base whether it (the function) is going to be called by that base
> anywhere at all or not. I think that's why thunks with jumps are used in
> practice but I am unsure about the multiple versions of the functions
> (for trivial functions, it may not be bad idea though; but I expect
> complications with the name and cross-compiler linking. I am not sure if
> ABI supports such things though (simply don't know); otherwise, each
> compiler will name its "additional" virtual functions for the same
> 'user-visible' name differently).
As I have pointed out, these entries do NOT need "whole new functions"
but a bit of adjusting code at the entry point to make the call with the
"wrong" type of this now correct. From the call side, they may act like
different functions, since they all called with different types of
parameters, but the body (after the thunk) is likely to all be the same
(not just identical looking code, but being the same code at the same
addresses). Of course, if the function is so simple that the code for a
version with the different base is "better" than doing the adjustment,
the compiler if free to make them separate functions.
>
>
>> The
>> regular entry point can be used for non-virtual calls, with the caller
>> passing the "normal" this pointer.
> true, but probably besides the point.
>
>>
>> This is probably minimal overhead
> It probably is -- in terms of instructions involved; but code bloating
> seems to be nasty; and it does lead to measurable performance penalty
> for sizable programs.
It seems to be minimal, and only imposed when needed. As opposed to the
other method presented which places in the vtable an offset FOR ALL
FUNCTIONS and the offset is applied AT THE CALL SITE for ALL CALLS.
This will add more space and time for most programs (since it needs the
offset even for 1st base entries, and adds code to every call) as
opposed to only when needed.