Proposal: unions for support polymorphic programming

Rafal Dabrowa

unread,

Jul 13, 2005, 7:00:01 PM7/13/05

to

Assume we want to have an array of objects derived from
a base class. Let's say, we have a base class:

class base {
public:
virtual void f();
/* ... */
};

and a few classes derived from it:

class C : pubic base { /* ... */ }
class D : public base { /* ... */ }
class E : public base { /* ... */ }

Now, we want to have a vector of objects, where every object
is one of: C, D or E. What choice we have ? I see only one:

vector<base*> v;

Of course, instead of base* it would be better to
have boost::shared_ptr, or at least a wrapper class, which
would perform proper memory management. But this is not my
interest here.

What disadvantage has the above solution ? It excessively uses heap.
Every object - vector member has to be allocated separately.
In some cases, it may cause high performance degradation.

Idea of my proposal is: allow to create an union of C, D and E, and
allow to inspect union data through common base class members.
Something like this:

union U : base {
C c;
D d;
E e;
};

Then, we might create a vector:

vector<U> v;

and then invoke a virtual method of base class, e.g.

v[10].f();

for object, which is currently kept in the union.
Of course, this solution allows to keep
only one of objects specified in union. But, this
is enough in many applications.

I don't propose exact syntax of such unions here.

Probably not all classes would be allowed in union.
E.g. base class should have virtual destructor
up to properly destroy union object.

--
Rafal

---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std...@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.jamesd.demon.co.uk/csc/faq.html ]

leonardo77

unread,

Jul 14, 2005, 12:29:37 PM7/14/05

to

http://www.boost.org/doc/html/variant.html

Maciej Sobczak

unread,

Jul 14, 2005, 11:29:45 AM7/14/05

to

Rafal Dabrowa wrote:

> Idea of my proposal is: allow to create an union of C, D and E, and
> allow to inspect union data through common base class members.
> Something like this:
>
> union U : base {
> C c;
> D d;
> E e;
> };

> Probably not all classes would be allowed in union.

This would require to extend the concept of layout compatibility to
cover not only members, but also common base sub-objects.
It looks nice when there is only one base class, but can get tricky or
even impossible with multiple inheritance or with virtual inheritance.

--
Maciej Sobczak : http://www.msobczak.com/
Programming : http://www.msobczak.com/prog/

Rafal Dabrowa

unread,

Jul 15, 2005, 11:40:19 AM7/15/05

to

leonardo77 wrote:
> http://www.boost.org/doc/html/variant.html

Could you be more verbose ?

The variant class from boost does not allow inspecting objects
through common interface of base class. Objects are always treated
by it as unrelated.

Also, the variant class uses sometimes heap during assigment. So,
use of the variant class instead of pointers does not take
any advantage.

So, the variant class does not fulfil my needs, in any degree.

--
Rafal

Larry Evans

unread,

Jul 15, 2005, 1:05:47 PM7/15/05

to

On 07/15/2005 10:40 AM, Rafal Dabrowa wrote:
> leonardo77 wrote:
>> http://www.boost.org/doc/html/variant.html
[snip]

>> The variant class from boost does not allow inspecting objects
> through common interface of base class.

[snip]

> Also, the variant class uses sometimes heap during assigment. So,
> use of the variant class instead of pointers does not take
> any advantage.

There was some reason why it does this, but I can't remember what
it was. I remember there was some discussion about it in the
boost mailing list:

http://www.boost.org/more/mailing_lists.htm#archive

As an alternative, you might consider the code here:

http://cvs.sourceforge.net/viewcvs.py/boost-sandbox/boost-sandbox/boost/indexed_types/

any file with 'sum' in the name corresponds to some version of variant.
All have a common base class. In the case of sum.hpp, the common base
is term<Indices,Typemap,0>. In the case of composite_sum.hpp it's
term<TypeMap,0>. There could also be a version of these which use
the fold_seq template demonstrated here:

http://boost-sandbox.sourceforge.net/vault/index.php?&direction=0&order=&directory=cppljevans/mpl

in file fold_seq_test.zip. One template parameter to fold_seq is the
base class. Now the base class in all these is not *truly* the base
class of each, let's call it 'summand' of the disjoint sum (variant)
type. The only way I can figure you can do this is to actually include
the base part when each summand is created with the placement new
(see inject member functions in composite_sum.hpp). I guess the
'actual' summands would be derived from each of the 'expected'
summands (your C,D, and E) and the derived part would have the
virtual functions needed by the base part(your void f(void) ).

HTH.

leonardo77

unread,

Jul 15, 2005, 2:05:46 PM7/15/05

to

Rafal Dabrowa wrote:
> Could you be more verbose ?

Of course :-)

> The variant class from boost does not allow inspecting objects
> through common interface of base class. Objects are always treated
> by it as unrelated.

What's the problem with that?

> Also, the variant class uses sometimes heap during assigment. So,
> use of the variant class instead of pointers does not take
> any advantage.

Every solution has it's advantages and disadvantages.There is/was a
double-storage proposal that would have allowed to get rid of heap
allocation, but it would mean(just like the name suggests) taking up
almost twice the amount of storage.

> So, the variant class does not fulfil my needs, in any degree.

Just out of curiosity,what are your needs? Perhaps something else
suits your needs better,but a language change seems a bit drastic.

Rafal Dabrowa

unread,

Jul 16, 2005, 5:33:48 AM7/16/05

to

leonardo77 wrote:
>>The variant class from boost does not allow inspecting objects
>>through common interface of base class. Objects are always treated
>>by it as unrelated.
>
> What's the problem with that?

No problem, simply this class does not achieve the goal
which I wanted. I want to have an union, which:
* contains (one of) objects, which are derived from a specified
base class.
* allows to inspect the kept object through interface of the base
class. No matter what object type is currently kept.

This is a quite different union concept, much more usefull in
implementation of ideas in object oriented manner.

>>Also, the variant class uses sometimes heap during assigment. So,
>>use of the variant class instead of pointers does not take
>>any advantage.
>
> Every solution has it's advantages and disadvantages.There is/was a
> double-storage proposal that would have allowed to get rid of heap
> allocation, but it would mean(just like the name suggests) taking up
> almost twice the amount of storage.

Yes, of course, and which solution is the best one, often
depends on a particular application. I'm wondering why
the variant class has choosen one, arbitrary solution
instead of giving possibility of choice for programmers.

I see one more solution here, namely variant with
"null" value. This value might be initial one when default
constructor is used, and it might be used also in erroneous
situations. So, in my opinion, variant should have possibility
of choice among at least these three implementations: use of heap,
double storage, and possibility of null value.

>>So, the variant class does not fulfil my needs, in any degree.
>
> Just out of curiosity,what are your needs? Perhaps something else
> suits your needs better,but a language change seems a bit drastic.

C++ has a possibility of inspecting union members through common
initial sequence if the the union members are POD structures.
But this is not very useful in C++ language.

The best way of ensuring "common initial sequence" is deriving
from a base class. Unfortunately such intention cannot be
expressed explicitly. Union cannot be derived from a class.

Intention of deriving from a base class is often polymorphism.
Pointers to a base class are kept in containers. Containers are
later scanned in many ways. But, why the containers keep
ponters ? Because this is the only possibility. Of course,
you may use variant class, but this makes use of the container
something hard. Like in union, in variant you cannot express
your intention, that all objects are derived from one, common class.
So, it is hard to take any advantage of polymorphism here.

Let's stay at pointers. They have two disadvantages:
* every object has to be created separately, which causes that
heap is used excessively
* additonal memory management is required

So, why why don't allow an extension of unions, more suitable for
C++ language ? Extension, which would allow to implement some
things in more efficient way.

Larry Evans

unread,

Jul 16, 2005, 2:47:18 PM7/16/05

to

On 07/16/2005 04:33 AM, Rafal Dabrowa wrote:
[snip]

> The best way of ensuring "common initial sequence" is deriving
> from a base class. Unfortunately such intention cannot be
> expressed explicitly. Union cannot be derived from a class.

But each of its possible elements can.
[snip]

> something hard. Like in union, in variant you cannot express
> your intention, that all objects are derived from one, common class.
> So, it is hard to take any advantage of polymorphism here.

But each of the C, D, and E class in your OP were derived from
base; hence, when variant creates one of those, it will have
a base superclass. IOW, the result of variant's:

get(variant<C,D,E>&)

function can be implicitly converted to base. Is that not
what you want?

John Nagle

unread,

Jul 16, 2005, 3:29:23 PM7/16/05

to

leonardo77 wrote:

> Rafal Dabrowa wrote:
>

>
> Just out of curiosity,what are your needs? Perhaps something else
> suits your needs better,but a language change seems a bit drastic.

It's clear what he wants:

class base {
};

class C : public base { /* ... */ }

class D : public base { /* ... */ }
class E : public base { /* ... */ }

vector<base> v;

But that's not useful; you can't store a C, D, or E in v.
The usual solution is

vector<base*> v;

but if the contained objects are small, this is inefficient,
requiring many more allocations.

From an implementation perspective, this is a linker problem.
You'd like to be able to get the size of the biggest class
derived from "base", then allocate a container accordingly.
But there's no way to get that information, although it is
known at link time.

This is a special case of the general problem that there's
no way to get any information about all the subclasses of
a class. And, in turn, that's a special case of the
"assume a dumb linker" problem. Which dates back to the
"don't change the linker" tradition of UNIX. Which comes
from the fact that the PDP-11 UNIX linker was written in
assembler and was an undocumented mess.

John Nagle
Animats

Niklas Matthies

unread,

Jul 18, 2005, 12:54:01 PM7/18/05

to

On 2005-07-16 19:29, John Nagle wrote:
:

> From an implementation perspective, this is a linker problem.
> You'd like to be able to get the size of the biggest class
> derived from "base", then allocate a container accordingly.
> But there's no way to get that information, although it is
> known at link time.

.and link time can mean run-time with dynamically loaded code,
so that's no help in the general case.

-- Niklas Matthies

msalters

unread,

Jul 19, 2005, 5:05:21 AM7/19/05

to

Rafal Dabrowa schreef:

> Assume we want to have an array of objects derived from
> a base class. Let's say, we have a base class

> and a few classes derived from it:
>
> class C : pubic base { /* ... */ }
> class D : public base { /* ... */ }
> class E : public base { /* ... */ }
>
> Now, we want to have a vector of objects, where every object
> is one of: C, D or E.
>

> Idea of my proposal is: allow to create an union of C, D and E, and
> allow to inspect union data through common base class members.
> Something like this:
>
> union U : base {
> C c;
> D d;
> E e;
> };

How should this work in the presence of multiple inheritance? And
virtual
inheritance? This is only easy if C,D and E all have a trivial layout
(starting with a base subobject at a fixed offset).

Regards,
Michiel Salters

Gabriel Dos Reis

unread,

Aug 10, 2005, 10:47:02 AM8/10/05

to

"msalters" <Michiel...@logicacmg.com> writes:

| Rafal Dabrowa schreef:
| > Assume we want to have an array of objects derived from
| > a base class. Let's say, we have a base class
| > and a few classes derived from it:
| >
| > class C : pubic base { /* ... */ }
| > class D : public base { /* ... */ }
| > class E : public base { /* ... */ }
| >
| > Now, we want to have a vector of objects, where every object
| > is one of: C, D or E.
| >
| > Idea of my proposal is: allow to create an union of C, D and E, and
| > allow to inspect union data through common base class members.
| > Something like this:
| >
| > union U : base {
| > C c;
| > D d;
| > E e;
| > };
|
| How should this work in the presence of multiple inheritance? And
| virtual
| inheritance?

First, I will start removing the inheritance " : base" -- it is
superfluous. Second, the classes listed in the union have "sealed"
dynamic type. Consequently their sizes are determined at compile-time,
therefore I do not see a problem with multiple inheritance or virtual
inheritance. All operations in the common "greatest" base class are
allowed on the union.

Gabriel Dos Reis

unread,

Aug 11, 2005, 1:24:59 AM8/11/05

to

no....@no.spam.com (Maciej Sobczak) writes:

| Rafal Dabrowa wrote:
|
| > Idea of my proposal is: allow to create an union of C, D and E, and
| > allow to inspect union data through common base class members.
| > Something like this:
| > union U : base {
| > C c;
| > D d;
| > E e;
| > };
|
| > Probably not all classes would be allowed in union.
|
| This would require to extend the concept of layout compatibility to
| cover not only members, but also common base sub-objects.

Yes. But, I don't see a problem with that. I can leave with
requiring common base bases, and allow operations on the "greatest"
common base classes.

| It looks nice when there is only one base class, but can get tricky or
| even impossible with multiple inheritance or with virtual inheritance.

The way I have always thought of it is along the lines of data types
that explicitly "closes" the inheritance chain -- something missing
from standard C++.

struc Visitor;

struct Expr_base {
virtual void accept(Visitor&) const = 0;
};

struct Constant : Expr_base {
// ...
};

struct Add : Expr_base {
// ...
};

struct Mul : Expr_base {
// ...
};

union class Expr = { Constant, Add, Mul };

Larry

unread,

Aug 12, 2005, 11:48:56 AM8/12/05

to

Gabriel Dos Reis wrote:
> "msalters" <Michiel...@logicacmg.com> writes:
>
> | Rafal Dabrowa schreef:

[snip]

> | > union U : base {
> | > C c;
> | > D d;
> | > E e;
> | > };
> |
> | How should this work in the presence of multiple inheritance? And
> | virtual
> | inheritance?
>
> First, I will start removing the inheritance " : base" -- it is
> superfluous. Second, the classes listed in the union have "sealed"

[snip]

I thought Rafal needed that in order be able to store a U* in an stl
container, as expressed in:

http://groups-beta.google.com/group/comp.std.c++/msg/455aebfe66e9bb4b?hl=en&

msalters

unread,

Aug 12, 2005, 1:29:45 PM8/12/05

to

Gabriel Dos Reis schreef:

It's not just size, but also layout. If C, D and E all have unique
base classes A and B, but at different offsets, either the conversion
from U* to A* or the conversion from U* to B* depends on the actual
type
stored. In the original example, C, D and E can have offsets in U such
that the base class subobjects overlap. That means the offset U* to
base*
is known.

If the inheritance ": base" if kept for U, and if it's restricted
to single inheritance, then it is possible to lay out all members of
U such that that the specified base class subobjects overlap - even
if C,D and E would share multiple base classes.

Of course, if we really want it, this type of union is trivial. Just
add an extra utable (similar to a vtable) with all the information
needed. You'd need a C-in-U utable, D-in-U utable, and an E-in-U
utable. Just use an utable pointer in each U object, and you're done.
All possible U*-to-base* conversions can be precalculated.
Even a dynamic_cast<C*> ( some_U ) would be trivial.

HTH,
Michiel Salters.

Gabriel Dos Reis

unread,

Aug 13, 2005, 2:33:49 AM8/13/05

to

"msalters" <Michiel...@logicacmg.com> writes:

Indeed, I was not precise enough. I implicilty assumed the common
greatest base class to be the primary base, without writing it
explicitly so. Given that property:

| If C, D and E all have unique
| base classes A and B, but at different offsets, either the conversion
| from U* to A* or the conversion from U* to B* depends on the actual
| type stored.

the offset is known at anytime: 0, because the common greatest base
class is the primary base class. A and B in your example do not qualify.

| In the original example, C, D and E can have offsets in U such
| that the base class subobjects overlap. That means the offset U* to
| base*
| is known.
|
| If the inheritance ": base" if kept for U, and if it's restricted
| to single inheritance, then it is possible to lay out all members of
| U such that that the specified base class subobjects overlap - even
| if C,D and E would share multiple base classes.
|
| Of course, if we really want it, this type of union is trivial. Just
| add an extra utable (similar to a vtable) with all the information
| needed.

The existance of a common "greatest" base class makes unnecessary the
need to resort to a "utable". The vtable of that base class (also
shared by U) contains the necessary information to determine the
dynamic type and any data needed by dynamic_cast.

Gabriel Dos Reis

unread,

Aug 13, 2005, 2:33:22 AM8/13/05

to

"Larry" <cpplj...@cox-internet.com> writes:

| Gabriel Dos Reis wrote:
| > "msalters" <Michiel...@logicacmg.com> writes:
| >
| > | Rafal Dabrowa schreef:
| [snip]
| > | > union U : base {
| > | > C c;
| > | > D d;
| > | > E e;
| > | > };
| > |
| > | How should this work in the presence of multiple inheritance? And
| > | virtual
| > | inheritance?
| >
| > First, I will start removing the inheritance " : base" -- it is
| > superfluous. Second, the classes listed in the union have "sealed"
| [snip]
|
| I thought Rafal needed that in order be able to store a U* in an stl
| container, as expressed in:
|
| http://groups-beta.google.com/group/comp.std.c++/msg/455aebfe66e9bb4b?hl=en&

The point why I consider it superfluous is that all members of U have
the same common "greatest" base class. Consequently, it provides a
common set of operations supported by U. Storing "U*" is just as
ordinary as storing any other T*. In fact, with that "discrimnitated"
union, there is no need to store a U* -- storing directly a U is now
possible. The limitation is the need for a common greatest base class.

msalters

unread,

Aug 16, 2005, 12:30:01 AM8/16/05

to

Gabriel Dos Reis schreef:

I'm probably confused by your choice of words. I was suggesting a case
in which A and B are two base classes, there is no unique base class
of A and B. C,D and E may contain unrelated extra base classes, so
there
is in general nothing you can say about the offsets of A and B in C,D
and
E. As I see it, those 6 offsets can have pretty much any value. Now,
given 6 random offsets, how does one layout U?

That is what I meant by multiple inheritance, not the easy situation
where the 6 offsets are trivial. Java-style MI would make this
possible,
but we have to deal with C++.

HTH,
Michiel Salters

Larry Evans

unread,

Aug 19, 2005, 12:58:03 AM8/19/05

to

On 07/16/2005 04:33 AM, Rafal Dabrowa wrote:

> leonardo77 wrote:

[snip]

> No problem, simply this class does not achieve the goal
> which I wanted. I want to have an union, which:
> * contains (one of) objects, which are derived from a specified
> base class.
> * allows to inspect the kept object through interface of the base
> class. No matter what object type is currently kept.

what about:

base_union<CommonBase,T1,T2,...,Tn>

where each T1,T2,...,Tn is derived from common base and there exists:

CommonBase*
base_union<CommonBase,T1,T2,...,Tn>::
get_base_ptr(void);

and in every other way, is the same as boost's variant.

[snip]

> I see one more solution here, namely variant with
> "null" value. This value might be initial one when default
> constructor is used, and it might be used also in erroneous
> situations. So, in my opinion, variant should have possibility
> of choice among at least these three implementations: use of heap,
> double storage, and possibility of null value.

[snip]

> So, why why don't allow an extension of unions, more suitable for
> C++ language ? Extension, which would allow to implement some
> things in more efficient way.

Sounds like you want something like a "policy_variant"
which allows several different implementations, somewhat like
the "policy_ptr" currently at:

http://cvs.sourceforge.net/viewcvs.py/boost-sandbox/boost-sandbox/boost/policy_ptr/

IOW, there would be a policy for how storage is laid out:

composite_layout<...>

and one for how updates are made:

composite_put<...>

and while we're at it, why not do the same for tuple and rename
the proposed library, "policy_composite". For a tuple, one
policy might be the way the construction of the tuple is
done. Currently, with boost's tuple, there's a limited
number of ways it can be constructed:

http://www.boost.org/libs/tuple/doc/tuple_users_guide.html#constructing_tuples

but some programmer might need something more specialized.

What do people think about such a policy_composite library?

Gabriel Dos Reis

unread,

Aug 29, 2005, 11:55:08 AM8/29/05

to

"msalters" <Michiel...@logicacmg.com> writes:

[...]

My use of the term "primary base class" is as defined in

http://www.codesourcery.com/cxx-abi/abi.html#definitions

and further explained in the algorithm.

http://www.codesourcery.com/cxx-abi/abi.html#class-types