A struct[ure] (also called a record) is simply a group of data offsets
related by a common base pointer. The code generator (or programmer)
obviously needs to know the layout, types and sizes of the fields, but
references to the fields then are simply pointer+offset.
The trick with objects is how to implement dynamic dispatch to
functions based on the object's type. There are 2 general ways to do
this: generic functions with parametric dispatch or member functions.
Imperitive languages tend to use member functions. FP languages tend
to prefer generic functions, although some use a blend of generic and
member functions.
Member Functions:
The very simplest way to implement member functions is to add to the
data struct a pointer field for each function. However, this isn't
very satisfactory because every object of the same "class" (i.e. the
same type) will need to have pointers to all the same functions.
Redundantly putting pointers into each object needlessly wastes space.
So the first optimization is to pull the common function pointers out
into a separate dispatch table, and to place a pointer to the dispatch
table into each object. This is the "class pointer" referred to in
the literature.
How best to implement the dispatch "table" depends on the type of
inheritance (none, single or multiple) and also on whether the
inheritance hierarchy is fixed or can change at run time. I'm not
going to discuss either multiple inheritance or how to modify the
hierarchy at run time ... those are topics for next semester after you
master single inheritance.
For single inheritance with the class hierarchy fixed at run time, the
simplest thing to do is to act like there is no inheritance and give
each class a complete table of its functions, even if they were
inherited from a superclass. More flexible, but also more involved,
is to keep the individual class dispatch tables separate and with each
dispatch table keep a pointer to the immediate superclass. The
difficulty using separate tables is having to follow the chain of
superclass pointers at run time to find the correct dispatch table. To
do this the code generator has to know the "distance" in the hierarchy
between the current class and the superclass containing the target
function definition.
[A speed optimization for separate tables is to use a "display" of
superclass pointers which exposes the entire root hierarchy instead of
a single pointer to the immediate superclass. A display permits
reaching any of the superclasses with a single pointer dereference.
However, you still need to know the "distance" to the function
definition to determine which of the display pointers to follow.]
Full function tables are the way to go if the OO implementation is
"classless" ... that is, there are OO languages which are "template"
based: you create a new object by cloning an existing one, and create
a new type of object by adding/deleting data fields and methods. WRT
class based OO, template OO doesn't necessarily require any additional
work at run time: all the "created-from" type relationships still can
be determined at compile time and so the layouts of the objects and
dispatch tables can be fixed.
Generic Functions:
Generic functions are groups overloaded functions (same name, unique
parameter types) dispatched on the *runtime* types of the arguments
(i.e. *not* like C++ overloads which dispatch on static types).
The general way of implementing them is to generate the code for each
typed overload, then to create an dispatch function which determines
the runtime types of its arguments, matches the call (if possible) to
the most specific of the typed overloads, and then executes the
overload passing through the arguments [ideally this should be a jump
to the overload code rather than an additional level of call nesting].
The runtime type of an object can be known by embedding a numeric
"class" id field in the structure (analogous to the class pointer in
the member function implementation). The id then is used as an index
into a (sparse) multi-dimensional table with one dimension for each
polymorphic parameter of the generic function. For each unique set
of parameter types, the address of the correct overload code to
execute is stored at the intersection of the dimension type indices.
The dispatch table only needs to consider *distinguishing* parameters
... i.e. if the nth parameter of every overload is a FOO, then the nth
argument is no help deciding which overload to execute [however the
dispatch code still needs to check that the nth argument of the call
actually *is* a FOO].
There are a couple "gotcha's" to watch out for. The set of overloads
may not cover the hierarchy: e.g., a FOO argument may have to be
matched to an overload which expects a superclass of FOO. So you
either need to represent the entire hierarchy in the dispatch table,
or you need an ordering of type ids such that you can know levels have
been skipped and the superclass id is the correct match. You also
need a fall back position for when the call arguments can't be matched
to any existing overload.
It may seem like generic functions are a lot more work, but in fact
the 2 methods for dynamic dispatch are essentially equivalent in
complexity ... I haven't talked about the difficulties of implementing
multiple inheritance with member functions. However, it should be
obvious that generic dispatch on a single parameter is (or at least
can be) of the same complexity as single inheritance member function
dispatch.
If you really are interested in this topic, you should get yourself a
book on compiler implementation (don't just look at examples).
FWIW: OO programming is a royal PITA when the language has no support
for it. IMO it's easier to write a compiler that generates assembler
for a high(er) level OO language than it is to program the OO
constructs directly in assembler. YMMV.
Hope this doesn't (further) muddy the water.
George