Guru of the Week #85: Solution

Herb Sutter

Dec 30, 2002, 9:40:48 PM

-------------------------------------------------------------------
Guru of the Week problems and solutions are posted regularly on
news:comp.lang.c++.moderated. For past problems and solutions
see the GotW archive at www.GotW.ca. (c) 2002 H.P.Sutter
News archives may keep copies of this article.
-------------------------------------------------------------------

______________________________________________________________________

GotW #85: Style Case Study #3: Construction Unions

Difficulty: 4 / 10
______________________________________________________________________

Unions Redux
------------

>JG Questions
>------------
>
>1. What are unions, and what purpose do they serve?

Unions allow more than one object, of either class or builtin type, to
occupy the same space in memory. For example:

  // Example 1
  //
  union U
  {
    int i;
    float f;
  };

  U u;

  u.i = 42;    // ok, now i is active
  std::cout << u.i << std::endl;

  u.f = 3.14f; // ok, now f is active
  std::cout << 2 * u.f << std::endl;

But only one of the types can be "active" at a time -- after all, the
storage can only hold one value at a time. Also, unions only
support some kinds of types, which leads us into the next question:
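
As a minimal sketch of what "occupy the same space" means, the
following checks that both members of the union from Example 1 begin
at the same address and that the union is large enough for either one.
The helper names members_overlap() and big_enough() are made up for
this illustration.

```cpp
#include <cassert>

union U {
    int   i;
    float f;
};

// True when both members begin at the same address, i.e. they overlap.
bool members_overlap() {
    U u;
    u.i = 42;                // i is the active member
    const void* pi = &u.i;
    const void* pf = &u.f;   // taking the address does not read f
    return pi == pf;
}

// A union is at least as large as each of its members.
bool big_enough() {
    return sizeof(U) >= sizeof(int) && sizeof(U) >= sizeof(float);
}
```

Note that the sketch only takes the address of the inactive member; it
never reads it.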


>2. What kinds of types cannot be used as members of unions? Why do
> these limitations exist? Explain.

From the C++ standard:

An object of a class with a non-trivial constructor, a non-trivial
copy constructor, a non-trivial destructor, or a non-trivial copy
assignment operator cannot be a member of a union, nor can an array
of such objects.

In brief, for a class type to be usable in a union, it must meet all
of the following criteria:

- The only constructors, destructors, and copy assignment operators
are the compiler-generated ones.

- There are no virtual functions or virtual base classes.

- Ditto for all of its base classes and nonstatic members (or arrays
thereof).

That's all, but that sure eliminates a lot of types.
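
For readers with a newer compiler, the criteria can be approximated
with a compile-time check. This is only a sketch: it assumes the
<type_traits> header from C++11, which did not exist when this article
was written, and the trait name fits_classic_union is hypothetical.
The four standard traits it combines are a close but not exact match
for the wording quoted above.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Approximates "no non-trivial constructor, copy constructor,
// copy assignment operator, or destructor".
template <typename T>
struct fits_classic_union {
    static const bool value =
        std::is_trivially_default_constructible<T>::value &&
        std::is_trivially_copy_constructible<T>::value &&
        std::is_trivially_copy_assignable<T>::value &&
        std::is_trivially_destructible<T>::value;
};

struct Point { int x; int y; };         // only compiler-generated members
struct Shape { virtual ~Shape() {} };   // virtual function, user-written destructor
```

Under these assumptions Point qualifies, while Shape and std::string do
not.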

Unions were inherited from C. The C language has a strong tradition of
efficiency and support for low-level close-to-the-metal programming,
which has been compatibly preserved in C++; that's why C++ also has
unions. On the other hand, the C language does not have any tradition
of language support for an object model supporting class types with
constructors and destructors and user-defined copying, which C++
definitely does; that's why C++ also has to define what, if any, uses
of such newfangled types make sense with the "oldfangled" unions, and
do not violate the C++ object model including its object lifetime
guarantees.

If C++'s restrictions on unions did not exist, Bad Things could
happen. For example, consider what could happen if the following code
were allowed:

  // Example 2: Not Standard C++ code, but what if it were allowed?
  //
  void f()
  {
    union IllegalImmoralAndFattening
    {
      std::string s;
      std::auto_ptr<int> p;
    };

    IllegalImmoralAndFattening iiaf;

    iiaf.s = "Hello, world"; // has s's constructor run?
    iiaf.p = new int(4);     // has p's constructor run?
  }
  // will s get destroyed? should it be?
  // will p get destroyed? should it be?

As the comments indicate, serious problems would exist if this were
allowed. To avoid further complicating the language by trying to craft
rules that at best only might partly patch up a few of the problems,
the problematic operations were simply banished.

But don't think that unions are only a holdover from earlier times.
Unions are perhaps most useful for saving space by allowing data to
overlap, and this is still desirable in C++ and in today's modern
world. For example, some of the most advanced C++ standard library
implementations in the world now use just this technique for
implementing the "small string optimization," a great optimization
alternative that reuses the storage inside a string object itself: for
large strings, space inside the string object stores the usual pointer
to the dynamically allocated buffer and housekeeping information like
the size of the buffer; for small strings, the same space is instead
reused to store the string contents directly and completely avoid any
dynamic memory allocation. For more about the small string
optimization (and other string optimizations and pessimizations in
considerable depth), see Items 13-16 in my book More Exceptional C++
[2], or Scott Meyers' discussion of current commercial std::string
implementations in Effective STL [3].
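
To make the idea concrete, here is a deliberately tiny sketch of the
technique. It is not taken from any real std::string implementation;
the class name TinyString, the 16-byte threshold, and the member names
are all assumptions chosen for illustration. The union holds either
the characters themselves or a pointer to a heap buffer, and the
stored length decides which one is active.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

class TinyString {
    static const std::size_t kSmallCapacity = 16;   // assumed threshold

    // Either the characters themselves or a pointer to a heap buffer,
    // never both: the two members share the same storage.
    union Rep {
        char  small[kSmallCapacity];
        char* large;
    };

public:
    explicit TinyString(const char* s) : size_(std::strlen(s)) {
        std::memcpy(storage_for(size_), s, size_ + 1);
    }
    TinyString(const TinyString& other) : size_(other.size_) {
        std::memcpy(storage_for(size_), other.c_str(), size_ + 1);
    }
    TinyString& operator=(const TinyString& other) {
        if (this != &other) {
            TinyString tmp(other);   // copy first, then swap
            swap(tmp);
        }
        return *this;
    }
    ~TinyString() { if (!is_small()) delete[] rep_.large; }

    bool is_small() const { return size_ < kSmallCapacity; }
    const char* c_str() const { return is_small() ? rep_.small : rep_.large; }

private:
    // Picks the in-object array for short strings, the heap otherwise.
    char* storage_for(std::size_t length) {
        if (length < kSmallCapacity) return rep_.small;
        rep_.large = new char[length + 1];
        return rep_.large;
    }
    void swap(TinyString& other) {
        Rep r = rep_;          rep_ = other.rep_;   other.rep_ = r;
        std::size_t n = size_; size_ = other.size_; other.size_ = n;
    }

    Rep         rep_;
    std::size_t size_;
};
```

Both union members here are builtin types, so the union itself is
legal; the enclosing class supplies the copying and destruction logic.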


Toward Dissection and Correction
--------------------------------

>3. The article in [1] cites the motivating case of writing a scripting
> language: Say that you want your language to support a single type
> for variables that at various times can hold an integer, a string,
> or a list. Creating a union { int i; list<int> l; string s; }
> doesn't work for the reasons given above. The following code
> presents a workaround that attempts to support allowing any type to
> participate in a union. For a more detailed explanation, see the
> original article.

On the plus side, the cited article addresses a real problem, and
clearly much effort has been put into coming up with a good solution.
Unfortunately, from well-intentioned beginnings more than one
programmer has gone badly astray.

The problems with the design and the code fall into three major
categories: legality, safety, and morality.

> Critique this code and identify:
>
> a) Mechanical errors, such as invalid syntax or nonportable
> conventions.
>
> b) Stylistic improvements that would improve code clarity,
> reusability, and maintainability.

The first overall comment that needs to be made is that the
fundamental idea behind this code is not legal in Standard C++. The
original article summarizes the key idea:

"The idea is that instead of declaring object members, you instead
declare a raw buffer [non-dynamically, as a char array member
inside the object pretending to act like a union] and instantiate
the needed objects on the fly [by in-place construction]." [1]

The idea is common, but unfortunately isn't sound. This technique is
nonconforming and nonportable because buffers that are not dynamically
allocated (e.g., via malloc() or new()) are not guaranteed to be
correctly aligned for any other type. Even if this technique happens
to accidentally work for some types on someone's current compiler,
there's no guarantee it will continue to work for other types, or for
the same types in the next version of the same compiler. For more
details and some directly related discussion, see for example Item 30
in Exceptional C++, notably the sidebar titled "Reckless Fixes and
Optimizations, and Why They're Evil." [4] See also the alignment
discussion in [9].

For C++0x, the standards committee is considering adding alignment
aids to the language specifically to enable techniques that rely on
alignment like this, but that's all still in the future. For now, to
make this work reasonably reliably even some of the time, you'd have to do
one of the following:

- Rely on the max_align hack (see the above citation which footnotes
the max_align hack, or do a Google search for max_align); or

- Rely on nonstandard extensions like Gnu's __alignof__ to make this
work reliably on a particular compiler that supports such an
extension. (Even though Gnu provides an ALIGNOF macro intended to
work more reliably on other compilers, it too is admitted "hackery"
that relies on the compiler's laying out objects in certain ways
and making guesses based on offsetof() inquiries, which may often
be a good guess but is not guaranteed by the standard. See for
example [5].)
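
For reference, the first option usually looks something like the
following sketch. The names MaxAlign, AlignedBuffer, and
build_and_measure() are hypothetical, and the list of types in
MaxAlign is a guess at "the strictest alignment any type is likely to
need", which is exactly why this is a hack and not a guarantee.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <string>

// The max_align hack: a union of types likely to have the strictest
// alignment requirement on the platform. This is a guess, not a guarantee.
union MaxAlign {
    short s; int i; long l;
    float f; double d; long double ld;
    void* p; void (*fn)();
};

// Putting the byte array in a union with MaxAlign forces the array to
// share MaxAlign's alignment.
template <std::size_t N>
union AlignedBuffer {
    MaxAlign      aligner;   // never read or written; present only for alignment
    unsigned char bytes[N];
};

// Constructs a string in the buffer, uses it, and destroys it by hand.
std::size_t build_and_measure() {
    typedef std::string String;
    AlignedBuffer<sizeof(String)> buf;
    String* s = new (buf.bytes) String("Hello, world");
    std::size_t n = s->size();
    s->~String();            // placement new requires an explicit destructor call
    return n;
}
```

This tends to work in practice for common types, but nothing in the
standard promises that every type's alignment is covered by the
members listed in MaxAlign.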

You could work around this by dynamically allocating the array using
malloc() or new(), which would guarantee that the char buffer is
suitably aligned for an object of any type, but that would still be a bad
idea (it's still not type-safe) and it wouldn't achieve the potential
efficiency gains that the original article was aiming for. An
alternative and correct solution would be to use boost::any (see
below) which incurs a similar allocation/indirection overhead and is
also both safe and correct; more about that later on.

Attempts to work against the language, or to make the language work
the way we want it to work instead of the way it actually does work,
are often questionable and should be a big red flag. In the
Exceptional C++ sidebar cited above, while in an ornery mood I also
accused a similar technique of "just plain wrongheadedness" followed
by some pretty strong language. There can still be cases where it
could be reasonable to use constructs that are known to be nonportable
but okay in a particular environment (in this case, perhaps using the
max_align hack), but even then I would argue that that fact should be
noted explicitly and further that it still has no place in a general
piece of code recommended for wide use.

> #include <list>
> #include <string>
> #include <iostream>
> using namespace std;

Since new is going to be used below, also #include <new>. (The
<iostream> header was used later in the original code, not shown here,
which had a test harness that emitted output.)

> #define max(a,b) (a)>(b)?(a):(b)
>
> typedef list<int> LIST;
> typedef string STRING;
>
> struct MYUNION {
> MYUNION() : currtype( NONE ) {}
> ~MYUNION() {cleanup();}

The first classic mechanical error above is that MYUNION is unsafe to
copy because the programmer forgot to provide a suitable copy
constructor and copy assignment operator.

MYUNION is choosing to play games that require special work be done in
the constructor and destructor, so these are provided as above; that's
fine as far as it goes. But it doesn't go far enough, because the same
games require special work in the copy constructor and copy assignment
operator, which are not provided. The default compiler-generated
copying operations do the wrong thing, namely copy the contents
bitwise as an array of chars, which is likely to have most
unsatisfactory results, in most cases leading straight to memory
corruption. Consider the following code:

  // Example 3-1: MYUNION is unsafe for copying
  //
  {
    MYUNION u1, u2;
    u1.getstring() = "Hello, world";
    u2 = u1; // copies the bits of u1 to u2
  } // oops, double delete of the string
    // (assuming the bitwise copy even made sense)

Guideline: Observe the Law of the Big Three: If a class needs a
custom copy constructor, copy assignment operator, or
destructor, it probably needs all three.
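
A minimal sketch of a class that follows the guideline is shown below.
IntBox is a made-up example, not part of the code under review; it
owns a single heap-allocated int, so it supplies all three operations.

```cpp
#include <cassert>

class IntBox {
public:
    explicit IntBox(int v) : p_(new int(v)) {}

    // Copy constructor: give the copy its own int.
    IntBox(const IntBox& other) : p_(new int(*other.p_)) {}

    // Copy assignment: copy the value, not the pointer.
    IntBox& operator=(const IntBox& other) {
        *p_ = *other.p_;     // also safe for self-assignment
        return *this;
    }

    // Destructor: each IntBox deletes only its own int.
    ~IntBox() { delete p_; }

    int  get() const { return *p_; }
    void set(int v)  { *p_ = v; }

private:
    int* p_;
};
```

With the compiler-generated copy operations instead, two IntBox
objects would share one pointer and both would delete it, which is the
same failure that Example 3-1 provokes in MYUNION.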

Passing on from the classic mechanical error, we next encounter a duo
of classic stylistic errors:

>
> enum uniontype {NONE,_INT,_LIST,_STRING};
> uniontype currtype;
>
> inline int& getint();
> inline LIST& getlist();
> inline STRING& getstring();

There are two stylistic errors here. First, this struct is not
reusable because it is hardcoded for specific types. Indeed, the
original article recommended handcoding such a struct every time it
was needed. Second, even given its limited intended usefulness, it is
not very extensible or maintainable. We'll return to this frailty
again later, once we've covered more of the context.

There are also two mechanical problems. The first is that currtype is
public for no good reason; this violates good encapsulation and means
any user can freely mess with the type, even by accident. The second
mechanical problem concerns the names used in the union; I'll cover
that in its own section, "Underhanded Names," later on.

> protected:

Next, we encounter another mechanical error: The internals ought to be
private, not protected. The only reason to use protected would be to
make the internals available to derived classes, but there had better
not be any derived classes because MYUNION is unsafe to derive from
for several reasons -- not least because of the murky and abstruse
games it plays with its internals, and because it lacks a virtual
destructor.

> union {
> int i;
> unsigned char buff[max(sizeof(LIST),sizeof(STRING))];
> } U;

There's a mechanical error here that's a small one in comparison with
the others, but still worth noting. The above code assumes that LIST
and STRING have size at least as large as int. That's probably true on
every compiler and standard library platform you're ever likely to
meet, but people who make assumptions like this are apt to make
assumptions about other types, too, and eventually get hoist right
smartly by their own petard. The moral? If you're going to calculate
the maximum size of a list of things, then at least be complete and
list them all, otherwise the one you choose to omit might return to
haunt you someday.
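
One way to be complete without a macro is a small compile-time helper
that is applied to every type, int included. This is a sketch only;
StaticMax, IntList, and kBufferSize are hypothetical names that do not
appear in the original code.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>

// Compile-time maximum of two sizes.
template <std::size_t A, std::size_t B>
struct StaticMax {
    static const std::size_t value = (A > B) ? A : B;
};

typedef std::list<int> IntList;

// Every candidate type is listed, int included.
static const std::size_t kBufferSize =
    StaticMax<sizeof(int),
              StaticMax<sizeof(IntList), sizeof(std::string)>::value>::value;
```

Because StaticMax is a template and not a macro, it also avoids the
operator-precedence surprises that an unparenthesized max() macro can
cause.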

> void cleanup();
> };

That's it for the main class definition. Moving on, consider the three
parallel accessor functions:

> inline int& MYUNION::getint()
> {
> if( currtype==_INT ) {
> return U.i;
> } else {
> cleanup();
> currtype=_INT;
> return U.i;
> } // else
> }
>
> inline LIST& MYUNION::getlist()
> {
> if( currtype==_LIST ) {
> return *(reinterpret_cast<LIST*>(U.buff));
> } else {
> cleanup();
> LIST* ptype = new(U.buff) LIST();
> currtype=_LIST;
> return *ptype;
> } // else
> }
>
> inline STRING& MYUNION::getstring()
> {
> if( currtype==_STRING) {
> return *(reinterpret_cast<STRING*>(U.buff));
> } else {
> cleanup();
> STRING* ptype = new(U.buff) STRING();
> currtype=_STRING;
> return *ptype;
> } // else
> }

A minor nit: The "// else" comment adds nothing. It's unfortunate that
the only comments in the code are useless ones.

More seriously, there are three major problems here. The first is that
the functions are not written symmetrically, and whereas the first use
of a list or a string yields a default-constructed object, the first
use of int yields an uninitialized object. If that is intended, in
order to mirror the ordinary semantics of uninitialized int variables,
that should be documented; since it is not, the int ought to be
initialized. For example, if the caller accesses getint() and tries to
make a copy of the (uninitialized) value, the result is undefined
behavior -- not all platforms support copying arbitrary invalid int
values, and some will reject the instruction at runtime.

The second major problem is that this code hinders const-correct use.
If the code is really going to be written the above way, then at least
it would be useful to also provide const overloads for each of these
functions; each would naturally return the same thing as its non-const
counterpart, but by a reference to const.
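
The following sketch shows the shape of such an accessor pair, and
also initializes the int so that callers never see an indeterminate
value. IntSlot and read_only() are hypothetical names used only for
illustration.

```cpp
#include <cassert>

class IntSlot {
public:
    IntSlot() : value_(0) {}   // never hand out an uninitialized int

    int&       get()       { return value_; }   // read-write access
    const int& get() const { return value_; }   // read-only access

private:
    int value_;
};

// Compiles only because the const overload exists.
int read_only(const IntSlot& s) { return s.get(); }
```

Without the const overload, read_only() could not call get() on its
const reference parameter.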

The third major problem is that the approach above is fragile and
brittle in the face of change. It relies on type switching (see any of
Steve Dewhurst's many commentaries against this notion in other
contexts in previous issues of CUJ), and it's easy to accidentally
fail to keep all the functions in sync when you add or remove new
types.

Stop reading here and consider: What do you have to do in the above
code if you want to add a new type? Make as complete a list as you
can.

* * * * *

Are you back? All right, here's the list I came up with. To add a new
type, you have to remember to: (a) add a new enum value; (b) add a new
accessor member; (c) update the cleanup() function to safely destroy
the new type; and (d) add that type to the max() calculation to ensure
buff is sufficiently large to hold the new type too.

If you missed one or more of those, well, that just illustrates how
difficult this code really is to maintain and extend.

Pressing onward, we come to the final function:

> void MYUNION::cleanup()
> {
> switch( currtype ) {
> case _LIST: {
> LIST& ptype = getlist();
> ptype.~LIST();
> break;
> } // case
> case _STRING: {
> STRING& ptype = getstring();
> ptype.~STRING();
> break;
> } // case
> default: break;
> } // switch
> currtype=NONE;
> }

Let's reprise that small commenting nit again: The "// case" and "//
switch" comments add nothing; it's unfortunate that the only comments
in the code are useless ones. It is better to have no comments at all
than to have comments that are just distractions.

But there's a larger issue here: Rather than having simply "default:
break;", it would be good to make an exhaustive list (including the
"int" type) and signal a logic error if the type is unknown -- perhaps
via "throw std::logic_error(...);".
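
As a sketch of that suggestion, consider the following. The names
Kind, describe(), and reports_logic_error() are hypothetical. The
enumerator kind_double plays the part of a type that someone added
later without updating the switch, which is precisely the mistake this
style of code invites.

```cpp
#include <cassert>
#include <cstring>
#include <stdexcept>

enum Kind { kind_none, kind_int, kind_list, kind_string,
            kind_double };   // added later; the switch below was not updated

const char* describe(Kind k) {
    switch (k) {
        case kind_none:   return "none";
        case kind_int:    return "int";
        case kind_list:   return "list";
        case kind_string: return "string";
        default:
            // Fail loudly instead of silently doing nothing.
            throw std::logic_error("describe: unhandled Kind");
    }
}

// True if describe() reports the value as unhandled.
bool reports_logic_error(Kind k) {
    try {
        describe(k);
    } catch (const std::logic_error&) {
        return true;
    }
    return false;
}
```

The forgotten case now surfaces as an exception the first time it is
hit, where "default: break;" would have hidden it.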

Again, type switching is purely evil. A Google search for "switch C++
Dewhurst" will yield all sorts of interesting references on this
topic, including [6]; see those for more details, if you need more
ammo to convince colleagues to avoid the type-switching beast.

Guideline: Avoid type switching; prefer type safety.


Underhanded Names
-----------------

There's one mechanical problem I haven't yet covered. This problem
first rears its ugly, unshaven, and unshampooed head in the following
line:

> enum uniontype {NONE,_INT,_LIST,_STRING};

Never, ever, ever create names that begin with an underscore or
contain a double underscore; they're reserved for your compiler and
standard library vendor's exclusive use, so that they have names that
they can use without tromping on your code. Tromp on their names, and
their names might just tromp back on you! (The more specific rule is
that any name with a double underscore anywhere in it __like__this or
that starts with an underscore and a capital letter _LikeThis is
reserved. It's easier just to avoid both leading underscores and
double underscores entirely.)

Don't stop! Keep reading! You might have read this advice before. You
might even have read it from me. You might even be tired of it, and
yawning, and ready to ignore the rest of this section. If so, this
one's for you, because this advice is not at all theoretical, and it
bites and bites hard in this code.

The above line happens to compile on most of the compilers I tried
(Borland 5.5, Comeau 4.3.0.1, Intel 7.0, gcc 2.95.3 / 3.1.1 / 3.2, and
Microsoft Visual C++ 6.0, 7.0, and 7.1 RC1). But under two of them --
Metrowerks CodeWarrior 8.2, and the EDG 3.0.1 demo front-end used with
the Dinkumware 4.0 standard library -- the code breaks horribly.

Under Metrowerks CodeWarrior 8, this line breaks noisily with the
first of 52 errors. The 225 lines of error messages begin with the
following diagnostics:

### mwcc Compiler:
# File: 1.cpp
# --------------
# 17: enum uniontype {NONE,_INT,_LIST,_STRING};
# Error: ^
# identifier expected
### mwcc Compiler:
# 18: uniontype currtype;
# Error: ^^^^^^^^^
# declaration syntax error

followed by 50 further error messages, and 215 more lines. What's
pretty obvious from the second and later errors is that we should
ignore them for now because they're just cascades from the first error
-- since uniontype was never successfully defined, the rest of the
code which uses uniontype extensively will of course break too.

But what's up with the definition of uniontype? The indicated comma
sure looks like it's in a reasonable place, doesn't it? There's an
identifier happily sitting in front of it, isn't there? All becomes
clear when we ask the Metrowerks compiler to spit out the preprocessed
output... omitting many many lines, here's what the compiler finally
sees:

enum uniontype {NONE,_INT, , };

Aha! That's not valid C++, and the compiler rightly complains about
the third comma because there's no identifier in front of it.

But what happened to _LIST and _STRING? You guessed it -- tromped on
and eaten by the ravenously hungry Preprocessor Beast. It just so
happens that Metrowerks' implementation has macros that happily strip
away the names _LIST and _STRING, which is perfectly legal and
legitimate because it (the implementation) is allowed to own those _Names
(as well as _Other__names).

So Metrowerks' implementation happens to eat both _LIST and _STRING.
What about EDG's/Dinkumware's? Judge for yourself:

"1.cpp", line 17: error: trailing comma is nonstandard
enum uniontype {NONE,_INT,_LIST,_STRING};
^

"1.cpp", line 58: error: expected an expression
if( currtype==_STRING) {
^

"1.cpp", line 63: error: expected an expression
currtype=_STRING;
^

"1.cpp", line 76: error: expected an expression
case _STRING: {
^

4 errors detected in the compilation of "1.cpp".


This time, even without generating and inspecting a preprocessed
version of the file, we can see what's going on: The compiler is
behaving as though the word "_STRING" wasn't there. That's because it
was -- you guessed it -- tromped on, not to mention thoroughly chewed up
and spat out, by the still-peckish Preprocessor Beast.

I hope that this will convince you that when some writers natter on
about not using _Names like__these, the problem is far from
theoretical. It's practical indeed, because the naming restriction
directly affects your relationship with your compiler and standard
library writer. Trespass on their turf, and you might get lucky and
remain unscathed; on the other hand, you might not.

The C++ landscape is wide-open and clear and lets you write all sorts
of wonderful and flexible code and wander in pretty much whatever
direction your development heart desires, including that it lets you
choose pretty much whatever names you like outside of namespace std.
But when it comes to names, C++ also has one big fenced-off grove,
surrounded by gleaming barbed wire and signs that say things like
"Employees__Only -- Must Have Valid _Badge To Enter Here" and
"Violators May Be Tromped and Eaten." The above is a stellar example
of the tromping one gets for disregarding the _Warnings.

Guideline: Never use "underhanded names" -- ones that begin with
an underscore, or that contain a double underscore.
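
The effect can be imitated safely with a non-reserved name. In the
sketch below, VENDOR_LIST is a hypothetical stand-in for a vendor
macro like the ones that swallowed _LIST and _STRING; the stringizing
helpers and after_preprocessing() are likewise invented for this
illustration. The enum at the end shows one possible renaming that
stays out of the reserved territory.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for a vendor macro; it expands to nothing.
#define VENDOR_LIST

#define STRINGIZE(x) #x
#define STRINGIZE_EXPANDED(x) STRINGIZE(x)

// What is left of the token after preprocessing: an empty string.
const char* after_preprocessing() { return STRINGIZE_EXPANDED(VENDOR_LIST); }

// Names without leading or double underscores stay out of the vendor's way.
enum uniontype { type_none, type_int, type_list, type_string };
```

The token simply vanishes before the compiler proper ever sees it,
which matches the preprocessed output shown earlier.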


Toward a Better Way: boost::any
-------------------------------

>4. Show a better way to achieve a generalized variant type, and
> comment on any tradeoffs you encounter.

The original article says:

"[Y]ou might want to implement a scripting language with a single
variable type that can either be an integer, a string, or a list."
[1]

This is true, and there's no disagreement so far. But the article then
continues:

"A union is the perfect candidate for implementing such a composite
type." [1]

Rather, the article has served to show in some considerable detail
just why a union is not suitable at all.

But if not a union, then what? One very good candidate for
implementing such a variant type is Boost's "any" facility, along with
its "many" and "any_cast".[7] Jim Hyslop and I discussed it in our
article "I'd Hold Anything For You."[8] Interestingly, the complete
implementation for the fully general "any" (covering any
number/combination of types and even some platform-specific #ifdefs)
is about the same amount of code as the sample MYUNION solution for
the special case of the three types int, list<int>, and string -- and
it's fully general, extensible, type-safe, and part of a healthy
low-cholesterol diet.

There is still a tradeoff, however, and it is this: Dynamic
allocation. The boost::any facility does not attempt to achieve the
potential efficiency gain of avoiding a dynamic memory allocation,
which was part of the motivation in the original article. Note too
that the boost::any dynamic allocation overhead is more than if the
original article's code was just modified to use (and reuse) a single
dynamically allocated buffer that's acquired once for the lifetime of
MYUNION, because boost::any performs a dynamic allocation every time
the contained type is changed, too.
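
To see where the allocation comes from, here is a simplified sketch of
the general technique: a non-template class that owns a pointer to a
polymorphic holder. This is not Boost's code; the names SimpleAny,
Placeholder, Holder, and get() are hypothetical, and the real any has
a richer and more careful interface.

```cpp
#include <cassert>
#include <string>
#include <typeinfo>

class SimpleAny {
public:
    SimpleAny() : content_(0) {}

    // Storing a value allocates a Holder<T> on the heap.
    template <typename T>
    SimpleAny(const T& value) : content_(new Holder<T>(value)) {}

    SimpleAny(const SimpleAny& other)
        : content_(other.content_ ? other.content_->clone() : 0) {}

    // Takes its argument by value, so "u = 12345" builds a new holder
    // and the old one is released when the temporary dies.
    SimpleAny& operator=(SimpleAny other) { swap(other); return *this; }

    ~SimpleAny() { delete content_; }

    const std::type_info& type() const {
        return content_ ? content_->type() : typeid(void);
    }

    // Returns a pointer to the held value, or null if T is the wrong type.
    template <typename T>
    T* get() {
        return type() == typeid(T)
            ? &static_cast<Holder<T>*>(content_)->held
            : 0;
    }

private:
    struct Placeholder {
        virtual ~Placeholder() {}
        virtual const std::type_info& type() const = 0;
        virtual Placeholder* clone() const = 0;
    };

    template <typename T>
    struct Holder : Placeholder {
        explicit Holder(const T& v) : held(v) {}
        const std::type_info& type() const { return typeid(T); }
        Placeholder* clone() const { return new Holder(held); }
        T held;
    };

    void swap(SimpleAny& other) {
        Placeholder* tmp = content_;
        content_ = other.content_;
        other.content_ = tmp;
    }

    Placeholder* content_;
};
```

Every assignment of a new value goes through "new Holder<T>", which is
the per-change allocation cost described above. In exchange, the
destructor and copy operations are always correct for whatever type is
held, and asking for the wrong type yields a null pointer instead of
undefined behavior.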

Here's how the article's demo harness would look if it instead used
boost::any. The old code that uses the original article's version of
MYUNION is shown in comments for comparison:

// MYUNION u;
any u;

Instead of a handwritten struct, which has to be written again for
each use, just use any directly. Note that any is a plain class, not a
template.

// access union as integer
// u.getint() = 12345;
u = 12345;

The assignment shows any's more natural syntax.

// cout << "int=" << u.getint() << endl;
cout << "int=" << any_cast<int>(u) << endl;

I like any's cast form better because it's more general (including
that it is a nonmember) and more natural to C++ style. Note that any
has no conversion operators, so a plain "int(u)" will not compile;
any_cast is the way to get the value out, and it throws bad_any_cast
if the contained type doesn't match. On the other hand, get[type]() is
more fragile, harder to write and maintain, and so forth.

// access union as std::list
// LIST& list = u.getlist();
// list.push_back(5);
// list.push_back(10);
// list.push_back(15);

u = list<int>();
list<int>& l = *any_cast<list<int> >(&u);
l.push_back(5);
l.push_back(10);
l.push_back(15);

I think any_cast could be improved to make it easier to get
references, but this isn't too bad. (Aside: I'd discourage using
'list' as a variable name when it's also the name of a template in
scope; too much room for expression ambiguity.)

So far, we've achieved some typability and readability savings. The
remaining differences are more minor:

// LIST::iterator it = list.begin();
list<int>::iterator it = l.begin();
while( it != l.end() ) {
cout << "list item=" << *(it) << endl;
it++;
} // while

Pretty much unchanged.

// access union as std::string
// STRING& str = u.getstring();
// str = "Hello world!";
u = string("Hello world!");

Again, about a wash; I'd say the any version is slightly simpler than
the original, but only slightly.

// cout << "string='" << str.c_str() << "'" << endl;
cout << "string='" << any_cast<string>(u) << "'" << endl;

As before.


Alexandrescu's Discriminated Unions
-----------------------------------

Is it possible to fully achieve both of the original goals -- safety
and avoiding dynamic memory -- in a conforming Standard C++
implementation? That sounds like a problem that someone like Andrei
Alexandrescu would love to sink his teeth into, especially if it could
somehow involve complicated templates. As evidenced in [9], [10], and
[11], where Andrei describes his discriminated unions (a.k.a. Variant)
approach, it turns out that:

- it is (something he would love to tackle), and

- it can (involve weird templates, and just one quote from [9] says
it all: "Did you know that unions can be templates?"), so

- he does.

In short, by performing heroic efforts to push the boundaries of the
language as far as possible, Alexandrescu's Variant comes very close
to a truly portable solution. It falls only slightly short, and is
probably portable enough in practice even though it goes beyond the
pale of what the Standard guarantees. Its main problem is that it
actually works on very few compilers -- in my testing, I only managed
to get it to work with one.

A key part of Alexandrescu's Variant approach is an attempt to
generalize the max_align idea to make it a reusable library facility
that can itself still be written in conforming Standard C++. The
reason for wanting this is specifically to deal with the alignment
problems in the code we've been analyzing above, so that a non-dynamic
char buffer can continue to be used in relative safety. Alexandrescu
makes heroic efforts to use template metaprogramming to calculate a
safe alignment. Will it work portably? His discussion of this question
follows:

"Even with the best Align, the implementation above is still not
100-percent portable for all types. In theory, someone could
implement a compiler that respects the Standard but still does not
work properly with discriminated unions. This is because the
Standard does not guarantee that all user-defined types ultimately
have the alignment of some POD type. Such a compiler, however,
would be more of a figment of a wicked language lawyer's
imagination, rather than a realistic language implementation.

"[...] Computing alignment portably is hard, but feasible. It
never is 100-percent portable." [10]

There are other key features in Alexandrescu's approach, notably a
union template that takes a typelist template of the types to be
contained, visitation support for extensibility, and an implementation
technique that will "fake a vtable" for efficiency to avoid an extra
indirection when accessing a contained type. These parts are more
heavyweight than boost::any, but are portable in theory. That
"portable in theory" part is important -- as with Andrei's great work
in Modern C++ Design [12] [13], the implementation is so heavy on
templates that the code itself contains comments like: "Guaranteed to
issue an internal compiler error on: [various popular compilers,
Metrowerks, Microsoft, Gnu gcc]", and the mainline test harness
contains a commented-out test helpfully labeled "The construct below
didn't work on any compiler."
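
For a flavor of the typelist idea, here is a much-reduced sketch that
only computes the largest size among a list of types. It is loosely
modeled on the typelist style described in [12] and is not
Alexandrescu's Variant code; NullType, Typelist, MaxSize, and
ScriptTypes are names assumed for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>

struct NullType {};                      // marks the end of a list

template <typename Head, typename Tail>
struct Typelist {};                      // a compile-time linked list of types

template <typename List> struct MaxSize;

// An empty list contributes nothing.
template <>
struct MaxSize<NullType> {
    static const std::size_t value = 0;
};

// Otherwise: the larger of the head's size and the maximum of the tail.
template <typename Head, typename Tail>
struct MaxSize<Typelist<Head, Tail> > {
    static const std::size_t rest  = MaxSize<Tail>::value;
    static const std::size_t value = (sizeof(Head) > rest) ? sizeof(Head) : rest;
};

typedef Typelist<int,
        Typelist<std::list<int>,
        Typelist<std::string, NullType> > > ScriptTypes;
```

Adding a type then means touching one typedef, and the buffer size
follows automatically. The real Variant applies the same recursive
pattern to alignment, construction, and visitation, which is where the
heavy template machinery comes from.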

That is Variant's major weakness: Most real-world compilers don't even
come close to being able to handle this implementation, and the code
should be viewed as important but still experimental. I attempted to
build Alexandrescu's Variant code using all of the compilers that I
have available: Borland 5.5; Comeau 4.3.0.1; EDG 3.0.1; Intel 7.0; gcc
2.95, 3.1.1, and 3.2; Metrowerks 8.2; and Microsoft VC++ 6.0, 7.0, and
7.1 RC1. As some readers will know, some of the products in that list
are very strong and standards-conforming compilers. None of these
compilers could successfully compile Alexandrescu's template-heavy
source as it was provided.

I tried to massage the code by hand to get it through any of the
compilers, but was only successful with Microsoft VC++ 7.1 RC1. Most
of the compilers didn't stand a chance, because they did not have
nearly strong enough template support to deal with Alexandrescu's
code. (Some emitted a truly prodigious quantity of warnings and errors
-- Intel 7.0's response to compiling main.cpp was to spew back an
impressive 430K's worth -- really, nearly half a megabyte! -- of
diagnostic messages.)

I had to make three changes to get the code to compile without errors
(although still with some narrowing-conversion warnings at the highest
warning level) under Microsoft VC++ 7.1 RC1:

- Added a missing "typename" in class AlignedPOD.

- Added a missing "this->" to make a name dependent in
ConverterTo<>::Unit<>::DoVisit().

- Added a final newline character at the end of several headers, as
required by the C++ standard (some conforming compilers aren't
strict about this and allow the absence of a final newline as a
conforming extension; VC++ is stricter and requires the
newline). [14]

As the author of [1] commented further about tradeoffs in
Alexandrescu's design: "It doesn't use dynamic memory, and it avoids
alignment issues and type switching. Unfortunately I don't have access
to a compiler that can compile the code, so I can't evaluate its
performance vs. myunion and any. Alexandrescu's approach requires 9
supporting header files totaling ~80KB, which introduces its own set
of maintenance problems." [15]

I won't try to summarize Andrei's three articles further here, but I
encourage readers who are interested in this problem to look them up.
They're available online as indicated in the references below.

Guideline: If you want to represent variant types, for now prefer
to use boost::any (or something equally simple).

Once the compiler you are using catches up (in template support) and
the Standard catches up (in true alignment support) and Variant
libraries catch up (in mature implementations), it will be time to
consider using Variant-like library tools as type-safe replacements
for unions.


Summary
-------

Even if the design and implementation of MYUNION are lacking, the
motivating problem is both real and worth considering. I'd like to
thank Mr. Manley for taking the time to write this article and raise
awareness of the need for variant type support, and Kevlin Henney and
Andrei Alexandrescu for contributing their own solutions to this area.
It is a hard enough problem that Manley's and Alexandrescu's
approaches are not strictly portable, standards-conforming C++,
although Alexandrescu's Variant makes heroic efforts to get there --
Alexandrescu's design is very close to portable in theory, although
the implementation is still far from portable in practice.

For now, an approach like Henney's boost::any is the preferred way to
go. If in certain places your measurements tell you that you really
need the efficiency or extra features provided by something like
Alexandrescu's Variant, and you have time on your hands and some
template know-how, you might experiment with writing your own
scaled-back version of the full-blown Variant by applying only the
ideas in [9], [10], and [11] that are applicable to your situation.


References
----------

[1] K. Manley. "Using Constructed Types in Unions" (C/C++ Users
Journal, 20(8), August 2002).

[2] H. Sutter. More Exceptional C++ (Addison-Wesley, 2002).

[3] S. Meyers. Effective STL (Addison-Wesley, 2001).

[4] H. Sutter. Exceptional C++ (Addison-Wesley, 2000).

[5] http://list-archive.xemacs.org/xemacs-patches/200101/msg00183.html

[6] S. Dewhurst. "C++ Hierarchy Design Idioms", available online at
www.semantics.org/talknotes/SD2002W_HIERARCHY.pdf.

[7] K. Henney. C++ Boost any class, www.boost.org/libs/any.

[8] H. Sutter and J. Hyslop. "I'd Hold Anything For You" (C/C++ Users
Journal, 19(12), December 2001), available online at
http://www.cuj.com/experts/1912/hyslop.htm.

[9] A. Alexandrescu. "Discriminated Unions (I)" (C/C++ Users Journal,
20(4), April 2002), available online at
http://cuj.com/experts/2004/alexandr.htm.

[10] A. Alexandrescu. "Discriminated Unions (II)" (C/C++ Users
Journal, 20(6), June 2002), available online at
http://cuj.com/experts/2006/alexandr.htm.

[11] A. Alexandrescu. "Discriminated Unions (III)" (C/C++ Users
Journal, 20(8), August 2002), available online at
http://cuj.com/experts/2008/alexandr.htm.

[12] A. Alexandrescu. Modern C++ Design (Addison-Wesley, 2001).

[13] H. Sutter. "Review of Alexandrescu's Modern C++ Design" (C/C++
Users Journal, 20(4), April 2002), available online at
http://www.gotw.ca/publications/mcd_review.htm.

[14] Thanks to colleague Jeff Peil for pointing out this requirement
in clause 2.1/1, which states: "If a source file that is not empty
does not end in a new-line character, or ends in a new-line character
immediately preceded by a backslash character, the behavior is
undefined."

[15] K. Manley, private communication.

---
Herb Sutter (www.gotw.ca)

Convener, ISO WG21 - Secretary, ANSI J16 (www.gotw.ca/iso)
Contributing editor, C/C++ Users Journal (www.gotw.ca/cuj)
C++ community program manager, Microsoft (www.gotw.ca/microsoft)

[ Send an empty e-mail to c++-...@netlab.cs.rpi.edu for info ]
[ about comp.lang.c++.moderated. First time posters: do this! ]

Ivan Vecerina

Jan 3, 2003, 11:44:24 PM
I just had a comment pop up while reading this GotW solution:

"Herb Sutter" <hsu...@gotw.ca> wrote in message
news:a8a11v4lb2avth0p6...@4ax.com...


| > union {
| > int i;
| > unsigned char buff[max(sizeof(LIST),sizeof(STRING))];
| > } U;
|
| There's a mechanical error here that's a small one in comparison with
| the others, but still worth noting. The above code assumes that LIST
| and STRING have size at least as large as int.

How does it assume it?

The char buffer is within a union with member i of type int, which
is used whenever the stored integer value needs to be accessed.
Sufficient space would be allocated for 'int' in the union,
even in the most exotic C++ platform.

I guess this is just another example showing that the asymmetry
between the handling of 'int' vs. the two other types is
confusing...


--
Ivan Vecerina, Dr. med. <> http://www.post1.com/~ivec
Soft Dev Manger, xitact <> http://www.xitact.com
Brainbench MVP for C++ <> http://www.brainbench.com
