Code Bloat due to Templates


Scott Meyers

May 16, 2002, 10:07:58 AM
I often hear about "code bloat" arising from templates. I've been trying
to figure out exactly what this means. From what I can tell, there are
several different meanings, as follows, where Temp is a template and T, T1,
and T2 are type parameters.

1. [Multiple instantiations] Temp<T> is instantiated in more than one
translation unit, so when the objs are linked together, the exe has
more than one copy of Temp<T>'s member functions.

2. [Duplicate instantiations] Temp<T1> and Temp<T2> could share a
single underlying binary implementation, but they don't. The most
common example is when T1 and T2 are both pointer types, but it would
also apply if T1=int and T2=long on a machine where int and long are
the same size.

3. [Near-duplicate instantiations] Temp<T, n> and Temp<T, m> are
individually instantiated for non-type parameters n and m, even
though the resulting member functions are almost identical. This
would be the case for e.g., FixedSizeBuffer<int, 10> and
FixedSizeBuffer<int, 20>.

4. [Excessive inlining] Because many compilers require that all template
code be in header files, all such code is inlined, and that makes
executables bigger than they would be if template functions that are
large and frequently called could be outlined.

5. [Excessive instantiation] All the member functions of Temp<T> are
instantiated, even though only a few are called. (This is
nonstandard behavior, but, at least in the past, it was a problem
with some compilers.)

6. [Gratuitous types] Temp<T1> and Temp<T2> are both instantiated, but
if templates didn't exist, programmers would make do with a single
untemplatized implementation. For example, programmers instantiate
both Stack<int> and Stack<long>, but if they lacked templates, they'd
get by with only IntStack.

Are there other causes of code bloat? Of the different types of code
bloat, which are the most troublesome in practice?

Thanks,

Scott

[ Send an empty e-mail to c++-...@netlab.cs.rpi.edu for info ]
[ about comp.lang.c++.moderated. First time posters: do this! ]

Ingolf Steinbach

May 16, 2002, 7:19:48 PM
Scott Meyers wrote:
> 2. [Duplicate instantiations] Temp<T1> and Temp<T2> could share a
> single underlying binary implementation, but they don't. The most
> common example is when T1 and T2 are both pointer types, but it would
> also apply if T1=int and T2=long on a machine where int and long are
> the same size.

Are you sure? What about:

template <typename T> void foo(T tl, T tr)
{
    *tl = *tr;
}

typedef char* T1;
typedef std::string* T2;

Both T1 and T2 are pointer types. Could they share the same
instantiation of foo()?
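
A minimal sketch of the question, not taken from the post: the two instantiations below run the same source line but do different work, so they could not share one binary body. The helpers copy_char, copy_string and demo are hypothetical names added for illustration.

```cpp
#include <cassert>
#include <string>

// The function template from the post.
template <typename T> void foo(T tl, T tr)
{
    *tl = *tr;
}

// Hypothetical helper: instantiates foo<char*>, which copies a single char.
char copy_char(char from)
{
    char to = '\0';
    foo(&to, &from);
    return to;
}

// Hypothetical helper: instantiates foo<std::string*>, which calls
// std::string's copy assignment operator.
std::string copy_string(std::string from)
{
    std::string to;
    foo(&to, &from);
    return to;
}

void demo()
{
    assert(copy_char('x') == 'x');
    assert(copy_string("hello") == "hello");
}
```

Both T1 and T2 are pointers of the same size, yet the generated code for `*tl = *tr` depends on the pointed-to type.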

Kind regards
Ingolf
--

Ingolf Steinbach Jena-Optronik GmbH
ingolf.s...@jena-optronik.de ++49 3641 200-147
PGP: 0x7B3B5661 213C 828E 0C92 16B5 05D0 4D5B A324 EC04

Francis Glassborow

May 16, 2002, 7:24:59 PM
In article <MPG.174cf495d...@news.hevanet.com>, Scott Meyers
<sme...@aristeia.com> writes

>I often hear about "code bloat" arising from templates. I've been trying
>to figure out exactly what this means. From what I can tell, there are
>several different meanings, as follows, where Temp is a template and T, T1,
>and T2 are type parameters.
>
> 1. [Multiple instantiations] Temp<T> is instantiated in more than one
> translation unit, so when the objs are linked together, the exe has
> more than one copy of Temp<T>'s member functions.

An implementation that does this is broken. Consider what will happen
with:

void foo(){ static int i; ...}

as a member function of the class template. The implementation HAS to be
able to remove duplicated function instantiations.

>
> 2. [Duplicate instantiations] Temp<T1> and Temp<T2> could share a
> single underlying binary implementation, but they don't. The most
> common example is when T1 and T2 are both pointer types, but it would
> also apply if T1=int and T2=long on a machine where int and long are
> the same size.

This is subtler. When dealing with pointer types the hoisting idiom
should be used so that most of the code is in a base class used for a
partial specialisation.

template <typename T> class X {
    // whatever
};

template <> class X<void*> {
    // fully specialise for void*
};

template <typename T>
class X<T*> : X<void*> {
    // use hoisting idiom
};

The int/long problem can be dealt with via specialisation but doing so
would normally be reserved for places where code size became critical.
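
A minimal sketch of how the outline above might be filled in, assuming a simple stack built on std::vector. The Stack interface and demo are hypothetical, and the sketch only handles pointers to non-const objects.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Primary template: one full instantiation per element type.
template <typename T>
class Stack {
public:
    void push(const T& v) { items_.push_back(v); }
    T pop() { T v = items_.back(); items_.pop_back(); return v; }
    std::size_t size() const { return items_.size(); }
private:
    std::vector<T> items_;
};

// Full specialisation for void*: the one implementation that all
// pointer stacks share.
template <>
class Stack<void*> {
public:
    void push(void* p) { items_.push_back(p); }
    void* pop() { void* p = items_.back(); items_.pop_back(); return p; }
    std::size_t size() const { return items_.size(); }
private:
    std::vector<void*> items_;
};

// Partial specialisation for every other pointer type: a thin typed
// wrapper whose members only forward and cast.
template <typename T>
class Stack<T*> : private Stack<void*> {
    typedef Stack<void*> Base;
public:
    void push(T* p) { Base::push(p); }
    T* pop() { return static_cast<T*>(Base::pop()); }
    using Base::size;
};

void demo()
{
    int a = 1;
    double b = 2.0;
    Stack<int*> si;     // both of these reuse Stack<void*>
    Stack<double*> sd;
    si.push(&a);
    sd.push(&b);
    assert(si.size() == 1);
    assert(si.pop() == &a);
    assert(sd.pop() == &b);
}
```

The full specialisation has to be declared before the partial one, otherwise Stack<void*> would itself match Stack<T*> and try to derive from itself.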

>
> 3. [Near-duplicate instantiations] Temp<T, n> and Temp<T, m> are
> individually instantiated for non-type parameters n and m, even
> though the resulting member functions are almost identical. This
> would be the case for e.g., FixedSizeBuffer<int, 10> and
> FixedSizeBuffer<int, 20>.

The hoisting idiom should certainly be considered for such cases.
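
A minimal sketch of what hoisting could look like for this case, not taken from the post. FixedSizeBuffer's real interface is not given in the thread, so the members shown, the base class FixedSizeBufferBase and its append function are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Size-independent logic: instantiated once per element type T,
// however many different sizes N are used.
template <typename T>
class FixedSizeBufferBase {
protected:
    // Appends v to data[0..size) if there is room; returns false when full.
    static bool append(T* data, std::size_t capacity, std::size_t& size,
                       const T& v)
    {
        if (size == capacity) return false;
        data[size++] = v;
        return true;
    }
};

// Size-dependent part: the storage plus one-line forwarding functions.
template <typename T, std::size_t N>
class FixedSizeBuffer : private FixedSizeBufferBase<T> {
public:
    FixedSizeBuffer() : size_(0) {}
    bool push_back(const T& v)
    {
        return FixedSizeBufferBase<T>::append(storage_, N, size_, v);
    }
    const T& operator[](std::size_t i) const { return storage_[i]; }
    std::size_t size() const { return size_; }
    std::size_t capacity() const { return N; }
private:
    T storage_[N];
    std::size_t size_;
};

void demo()
{
    FixedSizeBuffer<int, 2> small_buf;
    FixedSizeBuffer<int, 20> large_buf;
    small_buf.push_back(1);
    small_buf.push_back(2);
    bool third = small_buf.push_back(3); // no room left
    large_buf.push_back(7);
    assert(!third);
    assert(small_buf.size() == 2);
    assert(large_buf[0] == 7);
    assert(large_buf.capacity() == 20);
}
```

Only the trivial forwarders are generated per size; whether that saves space in practice depends on how much logic can be moved into the base.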

>
> 4. [Excessive inlining] Because many compilers require that all template
> code be in header files, all such code is inlined, and that makes
> executables bigger than they would be if template functions that are
> large and frequently called could be outlined.

Yes, and this is a real problem when programmers actually stuff the
implementation into the class template definition. This is one reason
why I advocate that the implementation should always be in its own file
even if you then #include it into the definition file.

>
> 5. [Excessive instantiation] All the member functions of Temp<T> are
> instantiated, even though only a few are called. (This is
> nonstandard behavior, but, at least in the past, it was a problem
> with some compilers.)

Actually it can still be a problem with explicit instantiation, but it
can be avoided by making the explicit instantiation file(s) into a
library.

>
> 6. [Gratuitous types] Temp<T1> and Temp<T2> are both instantiated, but
> if templates didn't exist, programmers would make do with a single
> untemplatized implementation. For example, programmers instantiate
> both Stack<int> and Stack<long>, but if they lacked templates, they'd
> get by with only IntStack.

Yes, but presumably IntStack would be for long, with the possibility
that the memory requirement for the data doubles on systems where int
and long are different sizes.

As I hope is clear, there is little reason for code bloat if those
designing and implementing templates are competent and the compiler is a
high quality product. Poor compilers, whether C or C++ or any other
language can generate excessive code, and poor programmers can write
code that will generate much more code than is necessary.

The point is that the code bloat (where it exists) is not the result of
C++ but the consequence of using inferior products or programmers.


--
Francis Glassborow ACCU
64 Southfield Rd
Oxford OX4 1PA +44(0)1865 246490
All opinions are mine and do not represent those of any organisation

Carl Daniel

May 16, 2002, 7:26:19 PM
My $0.02...

"Scott Meyers" <sme...@aristeia.com> wrote in message
news:MPG.174cf495d...@news.hevanet.com...


> I often hear about "code bloat" arising from templates. I've been trying
> to figure out exactly what this means. From what I can tell, there are
> several different meanings, as follows, where Temp is a template and T, T1,
> and T2 are type parameters.
>
> 1. [Multiple instantiations] Temp<T> is instantiated in more than one
> translation unit, so when the objs are linked together, the exe has
> more than one copy of Temp<T>'s member functions.

With the compilers I use, this doesn't happen.

>
> 2. [Duplicate instantiations] Temp<T1> and Temp<T2> could share a
> single underlying binary implementation, but they don't. The most
> common example is when T1 and T2 are both pointer types, but it would
> also apply if T1=int and T2=long on a machine where int and long are
> the same size.

I think this is the single most insidious source of code bloat - none of the
compilers I use will share the binary representation between distinct C++
types which happen to be structurally identical. Of course, this kind of
optimization can frequently be implemented manually by the programmer as
long as the compiler supports partial template specialization (PTS).

>
> 3. [Near-duplicate instantiations] Temp<T, n> and Temp<T, m> are
> individually instantiated for non-type parameters n and m, even
> though the resulting member functions are almost identical. This
> would be the case for e.g., FixedSizeBuffer<int, 10> and
> FixedSizeBuffer<int, 20>.

This would be nice to solve (at the compiler level), but can be mitigated by
appropriate construction of the template classes (using base class members,
etc).

>
> 4. [Excessive inlining] Because many compilers require that all template
> code be in header files, all such code is inlined, and that makes
> executables bigger than they would be if template functions that are
> large and frequently called could be outlined.

My impression is that compilers do a reasonably good job of choosing what's
inlined and what's not when the appropriate switches are used. Compilers
with whole-program-optimization are becoming more common, so hopefully this
will get even better in the future.

>
> 5. [Excessive instantiation] All the member functions of Temp<T> are
> instantiated, even though only a few are called. (This is
> nonstandard behavior, but, at least in the past, it was a problem
> with some compilers.)

This isn't a problem with the compilers I use.

>
> 6. [Gratuitous types] Temp<T1> and Temp<T2> are both instantiated, but
> if templates didn't exist, programmers would make do with a single
> untemplatized implementation. For example, programmers instantiate
> both Stack<int> and Stack<long>, but if they lacked templates, they'd
> get by with only IntStack.

If #2 and #3 are addressed, this becomes a non-issue. At present, it's a
programmer discipline issue if memory is tight enough that it matters.

>
> Are there other causes of code bloat? Of the different types of code
> bloat, which are the most troublesome in practice?
>
> Thanks,
>
> Scott

-cd

Vincent Finn

May 16, 2002, 7:27:51 PM
> 4. [Excessive inlining] Because many compilers require that all template
> code be in header files, all such code is inlined, and that makes
> executables bigger than they would be if template functions that are
> large and frequently called could be outlined.

Are you sure about this?
I remember a discussion about it in comp.lang.c++,
and the last comment was that templates are NOT always inlined.

The link below mightn't be clickable but if you paste it in it'll go to the
end of that discussion

http://groups.google.com/groups?q=templates+inline+group:comp.lang.c%2B%2B.*&hl=en&lr=&selm=GbBw8.344%24h02.145674%40news20.bellglobal.com&rnum=8

Vin

Garry Lancaster

May 17, 2002, 5:45:16 AM
Scott Meyers:

> I often hear about "code bloat" arising from templates.
> I've been trying to figure out exactly what this means.
> From what I can tell, there are several different
> meanings, as follows, where Temp is a template and
> T, T1, and T2 are type parameters.
>
> 1. [Multiple instantiations] Temp<T> is instantiated
> in more than one translation unit, so when the
> objs are linked together, the exe has
> more than one copy of Temp<T>'s member functions.

A decent linker should remove all but one of these,
surely?

> 2. [Duplicate instantiations] Temp<T1> and
> Temp<T2> could share a single underlying
> binary implementation, but they don't. The most
> common example is when T1 and T2 are both
> pointer types, but it would also apply if T1=int and
> T2=long on a machine where int and long are
> the same size.

A smart compiler may be able to spot this in a single
translation unit, but this is probably asking too much.
A smart linker may be able to spot this, but this is
almost certainly asking too much.

In some cases the programmer can intervene. In
the pointer case, a smart programmer can develop a
partial specialization for all pointer types that is
implemented using a shared void* based
implementation. This can be somewhat more
work than a straightforward, non-shared,
implementation and is often slower. This is a key
point.

So it depends.

> 3. [Near-duplicate instantiations] Temp<T, n> and
> Temp<T, m> are individually instantiated for
> non-type parameters n and m, even though the
> resulting member functions are almost identical. This
> would be the case for e.g., FixedSizeBuffer<int, 10> and
> FixedSizeBuffer<int, 20>.

Again, a smart programmer could factor out the common
code if there was any. For reasons that should be obvious,
this is not worth doing for inline functions that are genuinely
inlined.

> 4. [Excessive inlining] Because many compilers require
> that all template code be in header files,

Then they don't support explicit instantiation? You should
be able to write:

// foo.h
template <typename T> T foo(T); // Declaration.

// foo.cpp
template <typename T> T foo(T t) { return t; } // Definition.
template int foo<int>(int); // Explicit instantiation.

// bar.cpp
#include "foo.h"
int main()
{
    int i = foo( 1 );
}

Of course this isn't as versatile as implicit instantiation,
but still it shows templates don't always have to live
in headers. Even without export. Note that the standard
is a little confusing as to whether this is allowed: a
future revision will be clearer.

> all such code is inlined, and that makes
> executables bigger than they would be if template
> functions that are large and frequently called
> could be outlined.

What about function template definitions declared
outside a class definition without the inline keyword?
For example,

template <typename T>
void blah(T t) {} // Out-of-line template definition.

Or do you mean that even without the inline keyword,
optimizers can still choose to inline the code and that
you believe they do that more aggressively than they
should?

> 5. [Excessive instantiation] All the member functions
> of Temp<T> are instantiated, even though only a
> few are called. (This is nonstandard behavior, but,
> at least in the past, it was a problem with some
> compilers.)

I agree that's less of a problem now. The standard
is a bit weird about in-class friend definitions though:
these are supposed to be instantiated when their
containing class is instantiated, even if they are
never called, but most compilers seem to ignore
that rule. Hopefully the rule will be changed in line
with Defect Report #329.

> 6. [Gratuitous types] Temp<T1> and Temp<T2> are
> both instantiated, but if templates didn't exist,
> programmers would make do with a single
> untemplatized implementation. For example,
> programmers instantiate both Stack<int> and
> Stack<long>, but if they lacked templates, they'd
> get by with only IntStack.

I think this could be common. Although again
the problem could be solved by smart compilers
or smart programmers.

> Are there other causes of code bloat? Of the
> different types of code bloat, which are the most
> troublesome in practice?

When we write many templates we make a choice
about the implementation. There are two ways to
implement a template with template parameter T:

1. Fast. Where T (value type) is used in most member
functions. Emitted code likely to be very fast, but there
is little scope for sharing the implementation between
different instantiations.

2. Small. Where void* is used in most member functions
with a thin inline wrapper that casts from void*<=>T*.
Likely to be a bit slower and more complex but the
void* parts can be factored out into non-template functions
shared between all instantiations of the template.

Most templates I see choose option 1 (fast), but option
2 (small) is the choice we tended to make in C and is the
closest to the way most containers are implemented
in Java and C# (with their base Object types and casting).

A typical standard library contains many good examples
of an option 1 approach. However, it would be
possible to implement std::list using a linked list of
generic_node.

struct generic_node
{
    generic_node *prev, *next;
    void* obj; // Note: not T obj;
};

(allowing parts of the implementation to be shared)
and no inline functions (though see note 1). This is,
as far as I'm aware, never done, probably because the
result would be considerably slower.
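
A minimal sketch of that "small" design, not taken from the post and not how any particular standard library works. The classes generic_list and thin_list are hypothetical; copying is disabled and exception safety is ignored to keep the example short.

```cpp
#include <cassert>
#include <cstddef>

struct generic_node {
    generic_node* prev;
    generic_node* next;
    void* obj; // Note: not T obj;
};

// Non-template core: compiled once and shared by every element type.
class generic_list {
public:
    generic_list() : head_(0), tail_(0), size_(0) {}
    ~generic_list() { while (head_) pop_front(); }
    void push_back(void* obj)
    {
        generic_node* n = new generic_node;
        n->prev = tail_; n->next = 0; n->obj = obj;
        if (tail_) tail_->next = n; else head_ = n;
        tail_ = n;
        ++size_;
    }
    void* pop_front() // precondition: the list is not empty
    {
        generic_node* n = head_;
        void* obj = n->obj;
        head_ = n->next;
        if (head_) head_->prev = 0; else tail_ = 0;
        delete n;
        --size_;
        return obj;
    }
    std::size_t size() const { return size_; }
private:
    generic_list(const generic_list&);            // not copyable
    generic_list& operator=(const generic_list&);
    generic_node* head_;
    generic_node* tail_;
    std::size_t size_;
};

// Thin typed wrapper: owns a heap copy of each element and casts
// between void* and T* at the boundary.
template <typename T>
class thin_list {
public:
    ~thin_list()
    {
        while (core_.size() != 0) delete static_cast<T*>(core_.pop_front());
    }
    void push_back(const T& v) { core_.push_back(new T(v)); }
    T pop_front()
    {
        T* p = static_cast<T*>(core_.pop_front());
        T v = *p;
        delete p;
        return v;
    }
    std::size_t size() const { return core_.size(); }
private:
    generic_list core_;
};

void demo()
{
    thin_list<int> ints;
    thin_list<double> doubles;
    ints.push_back(1);
    ints.push_back(2);
    doubles.push_back(2.5);
    assert(ints.pop_front() == 1);
    assert(ints.size() == 1);
    assert(doubles.pop_front() == 2.5);
}
```

Each element costs a second heap allocation and an extra indirection, which is one plausible source of the slowdown mentioned above.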

In summary, provided you have a linker that deals
correctly with what you term multiple instantiations,
the main cause of template bloat is the way
programmers choose to write template code. Almost
a classic time vs. space trade-off in fact except that
the template system also makes the fast version far
easier to write than the small version, something that is
not the case when programming in languages like
C, Java and C#.

Kind regards

Garry Lancaster
Codemill Ltd
Visit our web site at http://www.codemill.net

NOTES:

1. We'd have to be a bit careful removing all inline
functions in the name of bloat reduction. Sometimes
an inline expansion can actually be smaller than
the equivalent function call setup.

Daniel T.

May 17, 2002, 5:46:18 AM
Scott Meyers <sme...@aristeia.com> wrote:

>Are there other causes of code bloat? Of the different types of code
>bloat, which are the most troublesome in practice?

In VC++, if you optimize for speed and have a lot of global std::string
objects which were initialized with something other than the default
c_tor, you will get an executable that is an extra 250 bytes in size for
each string initialized. I'm not sure why this happens, and it doesn't
seem to affect the size of the program in RAM but the executable is
bigger and it was justification enough for my boss to ban the use of
std::string at our company.

--
Improve your company's understanding of objects...
Hire me. <http://home1.gte.net/danielt3/resume.html>

JKB

May 17, 2002, 5:47:04 AM

"Scott Meyers" <sme...@aristeia.com> wrote ...

> I often hear about "code bloat" arising from templates. I've been trying
> to figure out exactly what this means. [various bloats listed here]

> Are there other causes of code bloat? Of the different types of code
> bloat, which are the most troublesome in practice?

okay, here's my two cents:

[Unreadable headers] Putting template code in the class declaration
significantly obscures the interface. Moving the bodies outside the class
declaration helps somewhat, but makes them even more syntactically bizarre
than they are inside the class. And it's not always possible - see the
"templatized assignment" thread. This impacts readability far more than the
bodies for inlined functions do, because those are normally quite small.

[Code dependency] Having function bodies in the header files, whether
they're inside the class declaration or not, causes other files to depend on
them. When an implementation detail is changed, those other files are
recompiled when they really don't need to be. Granted that fast processors
have reduced the importance of build speed, but this is still an issue on
large code bases.

[Insufficient instantiation] When a template class is being written, it's
hard to know if it even compiles correctly. Obviously this is
compiler-specific, but I've often seen errors in template code go undetected
for quite some time. Then somebody happens to specialize the right thing
and the code won't even compile.
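
One way to surface such errors earlier, sketched here as a suggestion rather than something from the post, is an explicit instantiation in a test file. The class Accumulator and the function demo are hypothetical.

```cpp
#include <cassert>

template <typename T>
class Accumulator {
public:
    Accumulator() : total_() {}
    void add(const T& v) { total_ += v; }
    // Never called by demo(); with implicit instantiation alone, an error
    // in this body that depends on T could go unnoticed.
    T half() const { return total_ / 2; }
    T total() const { return total_; }
private:
    T total_;
};

// Explicit instantiation: asks the compiler to instantiate every member
// of Accumulator<int>, including half(), whether or not it is called.
template class Accumulator<int>;

void demo()
{
    Accumulator<int> a;
    a.add(3);
    a.add(4);
    assert(a.total() == 7);
}
```

This only checks the template for the types that are explicitly instantiated; other types can still fail later.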

These aren't precisely 'bloat' in the sense you asked, but they are ways in
which template instantiation issues impact the development process.
-- jkb

Giovanni Bajo

May 17, 2002, 5:50:18 AM

"Scott Meyers" <sme...@aristeia.com> wrote in message
news:MPG.174cf495d...@news.hevanet.com...

I'll just comment along.

> 1. [Multiple instantiations]

Modern compilers should handle this very well. EDG has a "prelinker" to
trace which template has been instantiated where, GCC has (or used to have?) the
template repository, etc. I'm not sure how you could end up with two copies
of the same code, since there would be a name clash at link time (we're
speaking of course about non-inline code).

> 2. [Duplicate instantiations]

Another condition could be when Temp<>::F() does not rely in any meaningful
way on the template parameter. For example:

template <class T>
class FixedArray
{
    T a[100];
    bool flag;

public:
    [.....]

    void SetFlag(bool value)
    { flag = value; }
};

I put the flag variable after a[] on purpose, because I think that
offsetof(flag) could be passed as a hidden parameter to a unified
instantiation of SetFlag(), to avoid different instantiations just because
sizeof(T) changes. This case becomes even more interesting when your template
gets two parameters, and a function relies on only one of them.

> 3. [Near-duplicate instantiations] Temp<T, n> and Temp<T, m> are

Even in this case, most code could be unified by passing the numeric
argument as a hidden parameter to the member function. On the other hand, we
must be careful here, because it could break some optimizations (e.g.
CircularBuffer<int, 128> might rely on the fact that %128 is very fast,
while passing 128 as a hidden parameter obviously breaks this optimization).
Usual space/speed tradeoff here.
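
A minimal sketch of the two alternatives, not taken from the post. The functions wrap_static and wrap_dynamic are hypothetical stand-ins for the index arithmetic inside a circular buffer.

```cpp
#include <cassert>
#include <cstddef>

// Capacity as a template argument: N is a compile-time constant, so for
// N = 128 an optimizer can typically reduce "index % N" to a bit mask.
// The price is one instantiation per distinct N.
template <std::size_t N>
std::size_t wrap_static(std::size_t index)
{
    return index % N;
}

// Capacity as an ordinary argument: a single function serves every
// capacity, but the divisor is only known at run time.
std::size_t wrap_dynamic(std::size_t index, std::size_t capacity)
{
    return index % capacity;
}

void demo()
{
    assert(wrap_static<128>(130) == 2);
    assert(wrap_dynamic(130, 128) == 2);
    assert(wrap_static<10>(25) == wrap_dynamic(25, 10));
}
```

Both give the same results; they differ only in how many copies of the code exist and in what the optimizer knows about the divisor.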

> 4. [Excessive inlining]

Actually, a template member function is implicitly declared inline only if it is
defined within the class definition; otherwise it is not (unless you explicitly
specify inline), just like in normal classes. The code must be in "header" files
because the compiler needs to be able to instantiate it, but it won't
necessarily inline it unless needed/required.
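
A minimal sketch of the distinction, not taken from the post. The class Pair and the function demo are hypothetical, and whether a given call is actually expanded inline remains up to the optimizer in both cases.

```cpp
#include <cassert>

template <typename T>
class Pair {
public:
    Pair(const T& a, const T& b) : first_(a), second_(b) {}

    // Defined inside the class definition: implicitly declared inline.
    const T& first() const { return first_; }

    // Only declared here; defined below without the inline keyword.
    T sum() const;

private:
    T first_;
    T second_;
};

// Lives in the same header so that it can be instantiated wherever it is
// used, but it is not declared inline.
template <typename T>
T Pair<T>::sum() const
{
    return first_ + second_;
}

void demo()
{
    Pair<int> p(2, 3);
    assert(p.first() == 2);
    assert(p.sum() == 5);
}
```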

> 5. [Excessive instantiation]

Not only is it nonstandard, but it even breaks existing code that relies on
the fact that member functions shall not be instantiated until they are used. I
don't worry much about this point.
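
A minimal sketch of code that relies on this rule, not taken from the post. The class Holder and the function demo are hypothetical.

```cpp
#include <cassert>

template <typename T>
class Holder {
public:
    explicit Holder(const T& v) : value_(v) {}
    const T& get() const { return value_; }
    // Only valid for types T that have a member function size().
    unsigned long size() const { return value_.size(); }
private:
    T value_;
};

void demo()
{
    // Fine with a conforming compiler: Holder<int>::size() is never
    // called, so its body is never instantiated.
    Holder<int> h(42);
    assert(h.get() == 42);
}
```

A compiler that instantiated every member of Holder<int> up front would reject this program, because int has no size() member.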

> 6. [Gratuitous types] Temp<T1> and Temp<T2> are both instantiated, but
> if templates didn't exist, programmers would make do with a single
> untemplatized implementation. For example, programmers instantiate
> both Stack<int> and Stack<long>, but if they lacked templates, they'd
> get by with only IntStack.

Yes, this can be a problem, especially when working in a team (I use
stack<int>, you use stack<long>). Not even sure how this could be fixed, if
not by the programmers themselves.

> Are there other causes of code bloat? Of the different types of code
> bloat, which are the most troublesome in practice?

What about source code bloat? <g>
C++ template syntax is _verbose_.

Giovanni Bajo

Daniel Miller

May 17, 2002, 6:51:46 AM
Scott Meyers wrote:

> I often hear about "code bloat" arising from templates. I've been trying
> to figure out exactly what this means. From what I can tell, there are
> several different meanings, as follows, where Temp is a template and T, T1,
> and T2 are type parameters.
>
> 1. [Multiple instantiations] Temp<T> is instantiated in more than one
> translation unit, so when the objs are linked together, the exe has
> more than one copy of Temp<T>'s member functions.
>
> 2. [Duplicate instantiations] Temp<T1> and Temp<T2> could share a
> single underlying binary implementation, but they don't. The most
> common example is when T1 and T2 are both pointer types, but it would
> also apply if T1=int and T2=long on a machine where int and long are
> the same size.


For the purpose of my design-time code-bloat reason #7 below, I am going to
consider this purely a compile-time deficiency, where (my compile-time
reinterpretation of) #2 hopes for a compiler template-engine which is clever
enough to notice that the only difference between two or more compiler-generated
specializations of the same template is in its own information-base related to
types, not in the resulting machine-code, especially in the resulting
machine-code of out-of-line functions. Hence, I effectively explicitly name
this one [Duplicate instantiations at compile-time] to avoid overlap with the
design-time focus of my reason #7 below.


> 3. [Near-duplicate instantiations] Temp<T, n> and Temp<T, m> are
> individually instantiated for non-type parameters n and m, even
> though the resulting member functions are almost identical. This
> would be the case for e.g., FixedSizeBuffer<int, 10> and
> FixedSizeBuffer<int, 20>.
>
> 4. [Excessive inlining] Because many compilers require that all template
> code be in header files, all such code is inlined,


You might want to be more precise in your wording here, because the
state-of-affairs does not need to be exactly as worded. Your wording implicitly
encourages the reader to think of having only header files instead of having
both header files (e.g., with a .h/.hpp/.hxx suffix) and template function-body
files (e.g., using RogueWave's local conventions .cc suffix for #included
template function-body files as opposed to their .h and .cpp file extensions for
header-files and source-files, respectively). I think what you should write is
something to the effect of:

"Because many compilers require that all template code be made available to
the compiler at compile-time before the point-of-usage, common practice is to
put the body of template functions in files which are #included. Because of the
#include similarity to canonical header-files, some people consider these
#included template-function-body file-content to be header file-content.
Because this #included template-function-body file-content which **define
implementation** are confused for header file-content which **declare
interface**, some people simply place all the function bodies within the class
declarations because this is where they typically see function-bodies in
header-files placed outside of the context of templates. Such functions
inappropriately declared in the class declaration are thus inlined
inappropriately instead of being out-of-lined in a #included
template-function-body file."

> and that makes
> executables bigger than they would be if template functions that are
> large and frequently called could be outlined.


Again, you might want to be more precise in your wording here, because such
template member-functions can in fact be out-of-lined by the aforementioned
technique. The "if [they] could be" implies (to at least some of us readers)
that they cannot be out-of-lined (when in fact they can). See RogueWave's
products for a demonstration of this portability practice.


> 5. [Excessive instantiation] All the member functions of Temp<T> are
> instantiated, even though only a few are called. (This is
> nonstandard behavior, but, at least in the past, it was a problem
> with some compilers.)
>
> 6. [Gratuitous types] Temp<T1> and Temp<T2> are both instantiated, but
> if templates didn't exist, programmers would make do with a single
> untemplatized implementation. For example, programmers instantiate
> both Stack<int> and Stack<long>, but if they lacked templates, they'd
> get by with only IntStack.
>
> Are there other causes of code bloat?


7. [Improper factoring out a type-parameterized interface at design-time]
As discussed by Stroustrup in _D&E_, parameterized-type-based
nontrivial-software should admit that there are two layers: 1) the type-safe
public-interface layer which uses templates for type-safety ricocheting off to
2) some inner layer which does *not* get expanded over & over & over again which
is coded without any (or as much) type-safety by a wise & informed elite who
strictly promise to not abuse the lack of type-safety (or at least not abuse the
lack of type-safety in the inner layer to the point that a bug is observable
when using the public type-parameterized interface).


> Of the different types of code
> bloat, which are the most troublesome in practice?


Over the years, I qualitatively feel that #1, #4, #5, and #7 are the most
frequent causes of bloat that I have personally observed. But then again my
presence might be tainting the data, because wherever I am, all the people who
work with me know that I am vigilant & insistent about minding the store when it
comes to template code-bloat. These four are the topics 1) which I can do little
about without being an author of a compiler or 2) on which I can apply firm
enforcement pressure at the right time.

Although #4 might be frequently causing problems, it can easily be stamped out with education & discipline (and strictly-enforced coding
conventions). (And I do stamp it out.)

Although #7 might be frequently causing problems, it is hard to teach every last person how to design software perfectly.


Although #2 might in fact be frequently causing problems which I have never
thought about enough to notice, I suspect that all code-bloat situations in #2
could be considered deficiencies at design-time regarding too little factoring
out of the type-parameterized layer from some inner layer instead of expecting
the compiler to be extra clever. (Or equivalently for #2 the ball ought to be
in the C++ programmers'/engineers' court, not in the compiler vendors' court.)

This leaves me with #1 and #5 on my somebody-ought-to-do-something-about-that
list for each & every C++ compilation environment, especially in these days of
heavy use of STL and ever-increasing dependence on templates in the Loki & Boost
libraries. Sun's SparcWorks/Workshop/Forte C++ compiler has solved #1 very well
for some time now (except for the fact that the build-avoidance capabilities
of Forte's template-engine fight with the extensive build-avoidance
capabilities within ClearCase's clearmake). Historically GNU g++ has suffered
especially harshly from #1, but I have not checked in lately on how recent
revisions of g++ are doing in that regard.

Scott Meyers

May 17, 2002, 6:57:06 AM
On 16 May 2002 19:24:59 -0400, Francis Glassborow wrote:
> As I hope is clear, there is little reason for code bloat if those
> designing and implementing templates are competent and the compiler is a
> high quality product. Poor compilers, whether C or C++ or any other
> language can generate excessive code, and poor programmers can write
> code that will generate much more code than is necessary.

Uh huh. I'm well aware of the C++ "blame the victim" party line on code
bloat. But that's not what I asked about. This is what I asked:

Are there other causes of code bloat? Of the different types of code
bloat, which are the most troublesome in practice?

IME, many people -- MANY people -- avoid templates because of concerns
about code bloat. Many of these many have first hand experience with code
bloat. Denying the existence of the problem doesn't help them any.
Telling them it is all in their heads doesn't help any. Telling them to
rewrite the libraries they must use does not help them any. Telling them
their compilers are broken does not help them any.

If we want to help them, we must first understand what they mean when they
complain about "code bloat." That's why I want to know (1) if I've
overlooked any meanings and (2) which of the many meanings are the most
important.

Scott

JKB

May 17, 2002, 7:01:49 AM

> Scott Meyers <sme...@aristeia.com> writes
> > I often hear about "code bloat" arising from templates.
> >
> > 4. [Excessive inlining] Because many compilers require that all template
> > code be in header files, all such code is inlined, and that makes
> > executables bigger than they would be if template functions that are
> > large and frequently called could be outlined.

> "Francis Glassborow" <francis.g...@ntlworld.com> wrote


> Yes, and this is a real problem when programmers actually stuff the
> implementation into the class template definition. This is one reason
> why I advocate that the implementation should always be in its own file
> even if you then #include it into the definition file.


But compilers are always free to not inline a call, even if the function
body is available. See also 'register'. Is there a history of compilers
making appallingly bad decisions on this?

And why would you break implementation into a separate file that's #included
back in? That makes it all the same translation unit, and the compiler
doesn't care what source file the code comes from. Moving function bodies
outside the class declaration can help readability, but I don't see where it
affects language semantics at all.

-- jkb

Francis Glassborow

May 17, 2002, 6:24:55 PM

In article <MPG.174e391de...@news.hevanet.com>, Scott Meyers
<sme...@aristeia.com> writes

>On 16 May 2002 19:24:59 -0400, Francis Glassborow wrote:
> > As I hope is clear, there is little reason for code bloat if those
> > designing and implementing templates are competent and the compiler is a
> > high quality product. Poor compilers, whether C or C++ or any other
> > language can generate excessive code, and poor programmers can write
> > code that will generate much more code than is necessary.
>
>Uh huh. I'm well aware of the C++ "blame the victim" party line on code
>bloat. But that's not what I asked about. This is what I asked:
>
> Are there other causes of code bloat? Of the different types of code
> bloat, which are the most troublesome in practice?
However I am not alone in responding to your list. I think it is
important to distinguish between inherent problems within the language
(the case of instantiating a template for int and long when they are the
same size could be such a case) and problems brought about by failure to
understand the consequences of particular code. Designing good templates
is a highly skilled task and should be appreciated as such. Producing
compilers that meet the requirements of the Standard is also a skilled
task. Adding in good optimisation is what sets a compiler ahead of
others that otherwise correctly implement the language. Blaming the
language because compilers do not do what is required (only compile
'used' member functions from class templates etc.) is unfair.

The point that needs to be made is that in almost all cases where
templates are accused of creating code bloat the fault is either a
compiler that fails in its responsibilities, or a programmer who is
working beyond their skills.

If the myth that templates inherently cause code bloat is not laid to
rest, people will continue to just accept poor products, and poor
workers. Providing a list such as the one you have is a service exactly
because we can identify how to address those problems if they occur in
the work environment.

I am quite willing to accept blame (as part of WG 21) for things we got
wrong but I am not willing to accept blame for people using poor tools
or failing to recognise the inadequacies of their skills. Until those
responsible accept fault we can do little to improve the situation.

(Note that this is why I make no apology for writing highly critical, if
brief, reviews of the numerous books aimed at novice and just post-novice
programmers that fail to present good technique.)

>
>IME, many people -- MANY people -- avoid templates because of concerns
>about code bloat. Many of these many have first hand experience with code
>bloat. Denying the existence of the problem doesn't help them any.
>Telling them it is all in their heads doesn't help any. Telling them to
>rewrite the libraries they must use does not help them any. Telling them
>their compilers are broken does not help them any.

But telling them that
1) there are better idioms that solve many of the problems, and
2) modern compilers do much better than the ones they had even three
years ago
might help.

I have no problem with the company that forbade the use of templates
five years ago. I have serious problems with those that do not
reconsider that position on an annual basis. Internal coding standards
need re-examination, not in their entirety but wherever decisions have
been made on a practical basis of the tools being currently used.

Telling people that the problem with most code bloat is a quality issue
that cannot be fixed by the language but can be fixed by better tools
and better training admits the problem and tells how to solve it.
Blaming the language designers only helps where the problem really is
poor language design. Actually I think quite a bit of the problem with
templates is the opaqueness of the required syntax, and that we should
try to address that.

>
>If we want to help them, we must first understand what they mean when they
>complain about "code bloat." That's why I want to know (1) if I've
>overlooked any meanings and (2) which of the many meanings are the most
>important.

OK, but the motive was not clear from your post.

--
Francis Glassborow ACCU
64 Southfield Rd
Oxford OX4 1PA +44(0)1865 246490
All opinions are mine and do not represent those of any organisation

[ Send an empty e-mail to c++-...@netlab.cs.rpi.edu for info ]

Francis Glassborow

May 17, 2002, 6:25:13 PM

In article <ue8hk7l...@corp.supernews.com>, JKB
<burr...@seanet.com> writes

Really? When I provide in-class definitions they are treated exactly as if
I had provided out-of-class definitions and prefixed them with inline. But
there is no way that I can provide an in-class definition and require
that it not be inline.

By separating template class definition and implementation I

1) ensure that inline becomes explicit and is restricted to those places
where I want to give that advice to the compiler.

2) can use other instantiation mechanisms such as explicit
instantiation and 'export' (even though I am less than enthusiastic
about that).

3) gain the option of compiling a TU against just the template
definition until such time as I wish to link the whole programme to
produce an executable.
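
A minimal sketch of the separation described above, assuming a hypothetical Stack class template. The file names in the comments are illustrative only, and both parts are shown in one listing so that it compiles as written. It is not taken from the original post.

```cpp
#include <cstddef>
#include <vector>

// ---- stack.h (hypothetical): class template definition, declarations only ----
template <typename T>
class Stack {
public:
    void push(const T& value);   // declared only, so not implicitly inline
    T pop();
    std::size_t size() const;
private:
    std::vector<T> items_;
};

// ---- stack_impl.h (hypothetical): #included at the end of stack.h ----
template <typename T>
void Stack<T>::push(const T& value) { items_.push_back(value); }

template <typename T>
T Stack<T>::pop() {
    T top = items_.back();       // assumes the stack is not empty
    items_.pop_back();
    return top;
}

// 'inline' is now explicit and limited to the functions chosen for it.
template <typename T>
inline std::size_t Stack<T>::size() const { return items_.size(); }
```

Because the bodies live outside the class, a translation unit that only needs the interface could, in principle, include stack.h without the implementation file and rely on an explicit instantiation elsewhere.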

--
Francis Glassborow ACCU
64 Southfield Rd
Oxford OX4 1PA +44(0)1865 246490
All opinions are mine and do not represent those of any organisation


Ivan Vecerina

May 17, 2002, 6:26:42 PM

"Scott Meyers" <sme...@aristeia.com> wrote in message
news:MPG.174cf495d...@news.hevanet.com...
> I often hear about "code bloat" arising from templates. I've been trying
> to figure out exactly what this means. From what I can tell, there are
> several different meanings, as follows, where Temp is a template and T, T1,
> and T2 are type parameters.

.... (not much to add to the list, except that item 1 isn't supposed to
happen and should be rare) ...

> Are there other causes of code bloat? Of the different types of code
> bloat, which are the most troublesome in practice?

Item 1 would be my guess, as I suspect that this is what slows down
compilation in the environment I use (see my first comment below).

I only have two related thoughts:

- I often hear 'code bloat' used with an altered meaning, which could be
called 'compile-time bloat': many instantiations of templates are made in
every file that uses a header, which leads to large 'obj' files and slows
compile & link time, even if the linker strips them from the executable.

- How many of the actual code bloat problems would be eliminated by a smart
linker that merged identical code sections?
(Of course, according to the standard, functions whose address is used could
not be merged, but they could be converted into forwarding stubs?)
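
The address restriction mentioned in the parenthesis can be sketched as follows. The function template twice and the helper addresses_differ are hypothetical names for illustration. On a platform where int and long have the same size, the two instantiations may well compile to identical machine code, yet a conforming implementation still has to give them distinct addresses once those addresses are observed.

```cpp
// Hypothetical function template; twice<int> and twice<long> may produce
// byte-for-byte identical code where int and long have the same size.
template <typename T>
T twice(T x) { return x + x; }

// Common function pointer type, used only to compare the two addresses.
typedef void (*AnyFn)();

bool addresses_differ() {
    int (*for_int)(int) = &twice<int>;
    long (*for_long)(long) = &twice<long>;
    // Pointers to two different functions must not compare equal, so a
    // linker that folds the bodies has to keep the addresses apart.
    return reinterpret_cast<AnyFn>(for_int) != reinterpret_cast<AnyFn>(for_long);
}
```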

Regards.

--
Ivan Vecerina, Dr. med. <> http://www.post1.com/~ivec
Soft Dev Manger, XiTact <> http://www.xitact.com
Brainbench MVP for C++ <> http://www.brainbench.com

Vincent Finn

May 17, 2002, 6:26:59 PM

> > 4. [Excessive inlining] Because many compilers require that all template
> > code be in header files, all such code is inlined, and that makes
> > executables bigger than they would be if template functions that are
> > large and frequently called could be outlined.
>
> Yes, and this is a real problem when programmers actually stuff the
> implementation into the class template definition. This is one reason
> why I advocate that the implementation should always be in its own file
> even if you then #include it into the definition file.

But surely if you #include it in the header file there is no real difference?
Is there any actual benefit?

I did use the #include for a while but went back to putting the whole
code in the header because people found it more confusing the other way!

Ernest Friedman-Hill

May 17, 2002, 8:54:54 PM

Scott Meyers wrote:
> 1. [Multiple instantiations] Temp<T> is instantiated in more than one
> translation unit, so when the objs are linked together, the exe has
> more than one copy of Temp<T>'s member functions.

This one used to be a killer, but compilers have gotten *much* better.
I was on a project about 9 years ago that tried to make heavy use of
templates with g++ (version 1.7. something, at the time, I think?) Anyway,
the executable was growing out of control and link times were rising
exponentially. We were building 25 megabyte executables by the time we
figured out how to manage this with explicit instantiations -- then it
shrank by a factor of 25 or so.

---------------------------------------------------------
Ernest Friedman-Hill
Distributed Systems Research Phone: (925) 294-2154
Sandia National Labs FAX: (925) 294-2234
Org. 8920, MS 9012 ejf...@ca.sandia.gov
PO Box 969 http://herzberg.ca.sandia.gov
Livermore, CA 94550

Bob Archer

May 17, 2002, 8:55:16 PM

In article <ue8hk7l...@corp.supernews.com>, burr...@seanet.com
says...

> And why would you break implementation into a separate file that's #included
> back in? That makes it all the same translation unit, and the compiler
> doesn't care what source file the code comes from. Moving function bodies
> outside the class declaration can help readability, but I don't see where it
> affects language semantics at all.

We split implementation into a separate file because it gave us more
flexibility, particularly when swapping code between compilers that all
had a slightly different set of rules for template instantiation. We
could selectively include the implementation depending on the exact
circumstances (which compiler, was this part of an explicit template
instantiation file etc.)

We also separated out inline functions into a separate file. In fact we
had seven different file types:

.h    Header files for template and non-template classes and functions
.cpp  Implementation for non-inlined non-template classes and functions
.ipp  Implementation for inlined non-template classes and functions
.ctf  Implementation for non-inlined template functions
.itf  Implementation for inlined template functions
.ctp  Implementation for non-inlined template classes
.itp  Implementation for inlined template classes

This gave us maximum flexibility for different compilers, different
builds (the release build inlined things, the debug build didn't) and
different template instantiation policies.

It was a pain to set up and maintain but was probably worth it in the
end.

Bob

Greg Milford

May 17, 2002, 8:56:10 PM

> 2. [Duplicate instantiations] Temp<T1> and Temp<T2> could share a
> single underlying binary implementation, but they don't. The most
> common example is when T1 and T2 are both pointer types, but it would
> also apply if T1=int and T2=long on a machine where int and long are
> the same size.
>

I have wondered, ever since being faced with this problem, why STL container
implementations do not come with partial template specializations that
provide a single void* implementation for pointer types. This would seem
to be a natural feature for embedded software, but end users will rarely
attempt to add specializations to library code, since that locks them in to
that version. A quick look at our project shows that we have 600+ instantiations
of vectors containing pointer types. Throw in the other containers and the
code generated here really adds up against our limited flash space.
Perhaps library writers are waiting for compilers to make this optimization
and vice versa.

> 3. [Near-duplicate instantiations] Temp<T, n> and Temp<T, m> are
> individually instantiated for non-type parameters n and m, even
> though the resulting member functions are almost identical. This
> would be the case for e.g., FixedSizeBuffer<int, 10> and
> FixedSizeBuffer<int, 20>.
>
> 4. [Excessive inlining] Because many compilers require that all template
> code be in header files, all such code is inlined, and that makes
> executables bigger than they would be if template functions that are
> large and frequently called could be outlined.
>

Explicitly instantiating the templates (by function) in separate compilation
units is our workaround for this. Compile time is also dramatically
reduced.
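
As a rough sketch of what explicit instantiation "by function" can look like, assuming a hypothetical Buffer class template (the names and file layout are illustrative and are not from the poster's project; everything is shown in one listing so it compiles as written):

```cpp
#include <cstddef>

// ---- buffer.h (hypothetical): declarations only ----
template <typename T>
class Buffer {
public:
    Buffer() : count_(0) {}
    void append(const T& v);
    void clear();
    std::size_t count() const;
private:
    T data_[16];
    std::size_t count_;
};

// ---- buffer_impl.h (hypothetical): seen only by the instantiation file ----
template <typename T>
void Buffer<T>::append(const T& v) { if (count_ < 16) data_[count_++] = v; }

template <typename T>
void Buffer<T>::clear() { count_ = 0; }

template <typename T>
std::size_t Buffer<T>::count() const { return count_; }

// ---- buffer_int.cpp (hypothetical) ----
// Explicit instantiation by function: only the members named here are
// requested, so the unused clear() is not forced into the object file.
template void Buffer<int>::append(const int&);
template std::size_t Buffer<int>::count() const;
// By contrast, "template class Buffer<int>;" would instantiate every member.
```

Client code then includes only buffer.h and links against the object file that holds the instantiations.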

> 5. [Excessive instantiation] All the member functions of Temp<T> are
> instantiated, even though only a few are called. (This is
> nonstandard behavior, but, at least in the past, it was a problem
> with some compilers.)
>

While our compiler is somewhat dated, what we saw was that explicit
instantiation by class caused this version of code bloat. The linker would
not remove all unused functions in this case. It should not IMO do this as
default behavior, but an option for it would be nice.

> 6. [Gratuitous types] Temp<T1> and Temp<T2> are both instantiated, but
> if templates didn't exist, programmers would make do with a single
> untemplatized implementation. For example, programmers instantiate
> both Stack<int> and Stack<long>, but if they lacked templates, they'd
> get by with only IntStack.
>

CATHLibCPP was a library for Acorn that was an attempt to aggressively hoist
template bloat away. Past posts about experience with it went unanswered,
so I assume it never reached prime time.

Thanks for giving this issue some much needed attention for those of us in
the embedded world :)

Greg

Roland Pibinger

May 17, 2002, 8:56:46 PM

On 17 May 2002 06:57:06 -0400, Scott Meyers <sme...@aristeia.com>
wrote:

>This is what I asked:
>
> Are there other causes of code bloat? Of the different types of code
> bloat, which are the most troublesome in practice?
>
>IME, many people -- MANY people -- avoid templates because of concerns
>about code bloat. Many of these many have first hand experience with code
>bloat. Denying the existence of the problem doesn't help them any.
>Telling them it is all in their heads doesn't help any. Telling them to
>rewrite the libraries they must use does not help them any. Telling them
>their compilers are broken does not help them any.
>
>If we want to help them, we must first understand what they mean when they
>complain about "code bloat." That's why I want to know (1) if I've
>overlooked any meanings and (2) which of the many meanings are the most
>important.

IMO, 'syntactical' and not 'physical' code bloat is the most important
problem WRT templates (Giovanni Bajo mentioned it before). The
template syntax added an extra level of complexity to the language.
"Modern" C++ idioms like nested classes within class templates
(iterator), explicit namespace qualifiers, typedef cascades, template
meta-programming, etc. boosted complexity further instead of
mitigating it. And the STL is obviously more appealing to computer
scientists than to the average programmer who just wanted some
uncomplicated containers.
Propagating 'lightweight' C++ might be a remedy. 'Less is more'
applies to templates, too.

Best regards,
Roland Pibinger

Tom Plunket

May 17, 2002, 8:57:39 PM

Scott Meyers wrote:

> Uh huh. I'm well aware of the C++ "blame the victim" party line
> on code bloat. But that's not what I asked about. This is what
> I asked:
>
> Are there other causes of code bloat? Of the different types
> of code bloat, which are the most troublesome in practice?
>
> IME, many people -- MANY people -- avoid templates because of
> concerns about code bloat. Many of these many have first hand
> experience with code bloat.

My experience shows that most of the people concerned about code
bloat are concerned because of intuition and not experimentation.
I find when talking to people concerned about "template-generated
code bloat" that these people typically have little idea even how
to use templates much less experience in actually trying to.

> Denying the existence of the problem doesn't help them any.
> Telling them it is all in their heads doesn't help any. Telling
> them to rewrite the libraries they must use does not help them
> any. Telling them their compilers are broken does not help them
> any.

While I understand that these things do not make the perception
of a problem go away, the truth remains that templates do not
necessarily create interesting code bloat due to any number of
reasons.

> If we want to help them, we must first understand what they mean
> when they complain about "code bloat."

If we want to help them, we will tell them that their perceptions
are wrong and that a simple experiment can prove it.

MHO, at least.

-tom!

Nicola Musatti

May 17, 2002, 8:58:15 PM

Daniel Miller wrote:
>
> Scott Meyers wrote:
[...]


> > 4. [Excessive inlining] Because many compilers require that all template
> > code be in header files, all such code is inlined,
>
> You might want to be more precise in your wording here, because the
> state-of-affairs does not need to be exactly as worded. Your wording implicitly
> encourages the reader to think of having only header files instead of having
> both header files (e.g., with a .h/.hpp/.hxx suffix) and template function-body
> files (e.g., using RogueWave's local conventions .cc suffix for #included
> template function-body files as opposed to their .h and .cpp file extensions for
> header-files and source-files, respectively). I think what you should write is
> something to the effect of:
>
> "Because many compilers require that all template code be made available to
> the compiler at compile-time before the point-of-usage, common practice is to
> put the body of template functions in files which are #included. Because of the
> #include similarity to canonical header-files, some people consider these
> #included template-function-body file-content to be header file-content.
> Because this #included template-function-body file-content which **define
> implementation** are confused for header file-content which **declare
> interface**, some people simply place all the function bodies within the class
> declarations because this is where they typically see function-bodies in
> header-files placed outside of the context of templates. Such functions
> inappropriately declared in the class declaration are thus inlined
> inappropriately instead of being out-of-lined in a #included
> template-function-body file."

In my experience this point, as worded above, is the single major
problem. This is due to the fact that many implementations of the
standard library make heavy use of inlining which results in an increase
of code size that is at the same time very evident and not fully under
programmer control.

[Note that I don't believe library implementors to be as naive as
described above; yet they do make the choice for the programmer].

Cheers,
Nicola Musatti

Cyril Schmidt

May 17, 2002, 8:59:14 PM

Scott Meyers <sme...@aristeia.com> wrote in message news:<MPG.174cf495d...@news.hevanet.com>...
> I often hear about "code bloat" arising from templates. I've been trying
> to figure out exactly what this means.

One observation from my colleague: templates often cause the bloat of the
debug symbol table. Although it cannot be qualified as "code bloat", if
loading of your executable in a debugger takes an hour, and then every
single-step takes 10 minutes, that could be a perfect reason to avoid
templates.

Speaking of duplicate instantiations (item 2): even well-designed code
suffers from that. I have just experimented with gcc-3.0 on Sparc and
STL implementation from SGI. I instantiated std::map with key_type int and
mapped_type int, unsigned int, long, and unsigned long (all have the same
size and alignment requirements). The first instantiation added about
16K to the size of .text, subsequent instantiations added about 10K each.

If I understand the figures correctly, it means that the common
(type-independent) part of map implementation is about 6K, while the variable
(type-dependent) part is about 10K. I believe that the SGI implementation
is one of the best map implementations at this time, so it would be hard
to make a significant improvement here.
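
An experiment of this kind could be set up roughly as below. This is not the poster's actual test program; store_and_fetch and the use_* functions are hypothetical names. The idea is to enable the use_* functions one at a time and compare the size of the code section after each build, for example with a tool such as size.

```cpp
#include <map>

// Forces the insertion and lookup code of std::map<int, Mapped>
// to be instantiated for one mapped type.
template <typename Mapped>
Mapped store_and_fetch(int key, Mapped value) {
    std::map<int, Mapped> m;
    m[key] = value;
    return m.find(key)->second;
}

// One instantiation per mapped type. On a platform where these four types
// share size and alignment, the generated code is largely duplicated.
int           use_int(int k)   { return store_and_fetch<int>(k, 1); }
unsigned int  use_uint(int k)  { return store_and_fetch<unsigned int>(k, 1u); }
long          use_long(int k)  { return store_and_fetch<long>(k, 1L); }
unsigned long use_ulong(int k) { return store_and_fetch<unsigned long>(k, 1UL); }
```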

Kind regards,

Cyril

Scott Meyers

May 18, 2002, 7:25:20 AM

On 17 May 2002 05:45:16 -0400, Garry Lancaster wrote:
> In some cases the programmer can intervene. In
> the pointer case, a smart programmer can develop a
> partial specialization for all pointer types that is
> implemented using a shared void* based
> implementation. This can be somewhat more
> work than a straightforward, non-shared,
> implementation and is often slower. This is a key
> point.

Assuming the T* implementation consists of nothing more than inlined calls
to the void* implementation, why would this be slower? Or is it naive to
expect the T* wrapper to consist only of inlines?

Scott

Dietmar Kuehl

May 18, 2002, 7:36:25 AM

Scott Meyers wrote:

> Are there other causes of code bloat?


The single biggest problem I have encountered with respect to code bloat
is people comparing apples to oranges: Yes, a fully type-safe program
using different containers for everything is probably bigger than a
program using a container for 'void*' or some 'CObject*' which has
*very* different properties (a typical difference apart from type-safety
is value vs. reference semantics). Make a correct comparison and compare
similar solutions to similar solutions: Multiple 'vector<T>' do not
correspond to an array class taking a 'void*' - that is what a
'std::vector<void*>' is, which in turn may be a reasonable choice to be
used in programs. You should not forget that you can parameterize
templates on their template arguments :-) It may need writing a
(generic?) proxy class to have easy access to the operations required by
the template but if a container of 'void*' is the solution, use this
solution.

There is somewhat of a "problem" that the object files created when
using templates have a tendency to be bigger - however, without impact
on the size of the resulting executable. The plot is this: The compiler
works hard with creating multiple versions of the same function in
different translation units resulting in longer compile times, bigger
executables, longer link times and, last but not least, much faster
programs. Basically, the duplicate work is thrown [mostly] away at link
time. At this time it is worth noting that it is ill-advised to inline
all template code! *This* indeed causes code-bloat as many ill-advised
techniques do. There is no need to inline template code even if it
appears in multiple translation units: Duplicates of template functions
with external linkage have to be coped with (typically removed) by the
linker. Inline functions, however, can have static linkage and may not
be removed at link time! That is: Just because the template code is in
the header file does not mean that it has to or should be inline.

As far as I have seen, there is actually no [executable] code-bloat
(when comparing similar approaches rather than completely different
ones). There is, however, something which may be described as
"development resource bloat" which is worth considering, too: Disk space
needed on the development machines, compile and link times, etc.
Depending on what you do this can also often be reduced considerably,
however. There are quite a few templates which have points of
variation which rarely, if ever, change. A primary example of this is
the IOStreams and locales library: Has anybody used a different
character type than 'char' and 'wchar_t'? Congratulations if you have
done so: You have gathered experience with a rather hideous portion of the
standard library! (For those who have not done or tried it: you need to
write at least a 'ctype', 'codecvt' and a 'numpunct' facet; you probably
need to implement suitable character traits and you need to create an
'std::locale' object containing suitable 'ctype', 'numpunct', 'num_put',
'num_get', and 'codecvt' facets). Put differently: The code for
IOStreams and locales need not at all be in the header files! You just
use preinstantiated implementations for the standard types. For those
who *really* want to use a different character type the implementation
can provide a compile time flag, say '-D_NEED_IOSTREAM_IMPLEMENTATION',
which is either defined throughout the project or in the translation unit
doing the explicit instantiations for the specific type.
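
A minimal sketch of this arrangement, assuming a hypothetical RunningSum class template and a hypothetical flag NEED_RUNNING_SUM_IMPLEMENTATION. The flag is defined inside the listing only so that it compiles as a single file. In a real layout it would be set on the command line, or only in the translation unit that does the explicit instantiations.

```cpp
// ---- running_sum.h (hypothetical): declarations only ----
template <typename T>
class RunningSum {
public:
    RunningSum();
    void add(T value);
    T total() const;
private:
    T total_;
};

// Defined here only for the single-file sketch; normally passed as a
// -D style switch by users who need an unusual T.
#define NEED_RUNNING_SUM_IMPLEMENTATION

#ifdef NEED_RUNNING_SUM_IMPLEMENTATION
// ---- running_sum_impl.h (hypothetical): the member definitions ----
template <typename T>
RunningSum<T>::RunningSum() : total_() {}

template <typename T>
void RunningSum<T>::add(T value) { total_ += value; }

template <typename T>
T RunningSum<T>::total() const { return total_; }
#endif

// ---- running_sum.cpp (hypothetical): preinstantiation for the usual types ----
template class RunningSum<int>;
template class RunningSum<double>;
```

Most client code would then see only the declarations and link against the preinstantiated versions, which keeps the bodies out of every other translation unit.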

Of course, this approach is not restricted to standard library
components since users can typically extend the compiler with '-D'
switches (or similar). Say, a numeric package might use a numeric type
like 'double', 'some::rational', or 'my::infinite_precision'. One is
unlikely to mix them up or to come up with new numeric types all the time.
That is, although it is beneficial to templatize on the numeric type
(because the alternatives are, simply put, not viable) there is *no*
form of bloat if it is done "correctly". Sure, explicit instantiations
take a little bit more work but they often *are* a viable approach.

Of course, for a library like eg. STL it is not viable to rely on
explicit instantiation. However, the code bloat for these libraries is
typically rather low because they effectively fold down to rather
trivial operations in most cases - the development resource bloat is,
however, here to stay for those. Of course, there are techniques to reduce
this kind of bloat, too. The major tool for this is fine grained
factorization of common portions of code: It is viable for template
code to call loads of trivial functions (something which kills
performance eg. for dynamic polymorphism). That is, you can create
low-level abstractions. Of course, using eg. [smart] pointers to a
common base type is also a reasonable approach which should be
considered: This is basically the only choice for non-template code
short of spending too much execution time (because many calls to
virtual functions kill performance, not due to the actual virtual
function call but due to the lost optimization potential) and/or
development time to write the different alternatives by hand. Of
course, the template is still superior to a library having made the
choice for the base used up-front.... BTW, if you use a fixed type
to parameterize your 'std::vector<T>' you can of course create a thin
wrapper template, merely provide the member declarations in the header
and preinstantiate the used functionality again.

The problem with all kinds of template bloat is simply that people
expect to get the benefits (type-safety, often better execution time,
more flexibility, etc.) for free: This is not the case. Templates do not
address your design problem. You still need to know what approach you
are supposed to take.

In summary: Code-bloat is a non-problem. There is development resource
bloat with templates but even this can be addressed when the
non-template solution would be similarly suitable. Typically, the
non-template solution is not suitable for various reasons anyway so what
are you comparing in the first place?
--
<mailto:dietma...@yahoo.com> <http://www.dietmar-kuehl.de/>
Phaidros eaSE - Easy Software Engineering: <http://www.phaidros.com/>

Chris Uzdavinis

May 18, 2002, 7:43:04 AM

Scott Meyers <sme...@aristeia.com> writes:

> Are there other causes of code bloat? Of the different types of code
> bloat, which are the most troublesome in practice?

A variation of your first category (multiple instantiations) exists
too: unused or unnecessary template instantiations are
still linked into an application. Clearly a problem of the linker,
but I do see it happen.

Using class templates to calculate a value, type, or even a
compile-time assertion should not add to the resulting program's
size, but it often does.
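
For readers unfamiliar with this use of templates, here is a small sketch. Factorial, CompileTimeCheck and factorial_of_five are hypothetical names. Everything except the last function exists only at compile time, so ideally none of it should leave a trace in the executable.

```cpp
// Class template used only to compute a value at compile time.
template <unsigned N>
struct Factorial {
    enum { value = N * Factorial<N - 1>::value };
};

template <>
struct Factorial<0> {
    enum { value = 1 };
};

// Compile-time assertion in the older style: the array type has an
// invalid size, and so fails to compile, when the condition is false.
template <bool Condition>
struct CompileTimeCheck {
    typedef char type[Condition ? 1 : -1];
};
typedef CompileTimeCheck<Factorial<5>::value == 120>::type factorial_of_5_is_120;

// Only this function should contribute code, ideally just a constant load.
int factorial_of_five() { return Factorial<5>::value; }
```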

--
Chris

Nick Thurn

May 18, 2002, 11:28:31 AM

"Tom Plunket" <to...@fancy.org> wrote in message
news:fudaeuoc0dq3etc37...@4ax.com...

> If we want to help them, we will tell them that their perceptions
> are wrong and that a simple experiment can prove it.
>

As Scott said "Denying the existence of the problem... ".
Please post your simple experiment.

Having scratched my head for a fair amount of time over
a Loki class that needed 5 minutes and 150Mb space
to compile a simple 10 line test program I think it's fair to
say that there are those of us who are more interested
in how things actually work in practice than how they
should work or could work "in theory".

BTW making one loki method non-inline reduced
compile to 1min and 30Mb space, so go figure.

I use templates and have no idea if my code is more
or less bloated than it should be. I do know that moving
from the system linker to the gnu linker reduced my
libraries by a large factor (was it 10 or 5? I can't recall)
when I did it three years ago.

I don't think anyone knows how to use templates
to their full advantage as yet. Andrei is hard at it
and others as well but the definitive "how to" is
yet to be written. We're still in the "cool stuff"
and "gotchas" stage IMHO.

Please post your simple experiment

cheers
Nick

Francis Glassborow

May 18, 2002, 11:28:49 AM

In article <Gw9uM...@news.boeing.com>, Greg Milford
<gregory....@boeing.com> writes

>While our compiler is somewhat dated, what we saw was that explicit
>instantiation by class cause this version of code bloat. The linker would
>not remove all unused function in this case. It should not IMO do this as
>default behavior, but an option for it would be nice.

I do not know what options you had available, but compiling your
explicit instantiations as a library should remove this problem.


--
Francis Glassborow ACCU
64 Southfield Rd
Oxford OX4 1PA +44(0)1865 246490
All opinions are mine and do not represent those of any organisation


Francis Glassborow

May 18, 2002, 11:29:06 AM

In article <3CE4D538...@2.com>, Vincent Finn
<1...@2.com.cos.agilent.com> writes

>> Yes, and this is a real problem when programmers actually stuff the
>> implementation into the class template definition. This is one reason
>> why I advocate that the implementation should always be in its own file
>> even if you then #include it into the definition file.
>
>But surely if you #include it in the header file there is no real difference?
>Is there any actual benefit?

Yes there is. The most important difference is that it prevents
programmers from writing in class implementations of member functions of
templates. Such code is implicitly inline, and therefore you are
unnecessarily relying on the compiler to not inline the code.

>
>I did use the #include for a while but went back to putting the whole
>code in the header because people found it more confusing the other way!

And do you do the same for non template classes?


--
Francis Glassborow ACCU
64 Southfield Rd
Oxford OX4 1PA +44(0)1865 246490
All opinions are mine and do not represent those of any organisation


Joshua Lehrer

May 19, 2002, 4:46:18 PM

Scott Meyers <sme...@aristeia.com> wrote in message news:<MPG.174f5f1c3...@news.hevanet.com>...

> On 17 May 2002 05:45:16 -0400, Garry Lancaster wrote:
> > In some cases the programmer can intervene. In
> > the pointer case, a smart programmer can develop a
> > partial specialization for all pointer types that is
> > implemented using a shared void* based
> > implementation. This can be somewhat more
> > work than a straightforward, non-shared,
> > implementation and is often slower. This is a key
> > point.
>
> Assuming the T* implementation consists of nothing more than inlined calls
> to the void* implementation, why would this be slower? Or is it naive to
> expect the T* wrapper to consist only of inlines?
>

Our solution here is:

1- we have a class, call it Array<>, that is our base Array class that
we use everywhere.
2- we then specialized Array<T*> to be implemented in terms of a
wrapper around void*. (It can't be implemented directly in terms of
Array<void*>, because that would be self-referential.)

We then have a rule: the specialization may only contain inlined
forwarding functions that forward to the base implementation, and
these may only add casts. In this way, the interface of Array<T>
is identical to that of Array<T*>. It is also just as efficient,
since all of the methods are forwarding functions that forward
copies of pointers and cast return values and out parameters, which
are pointers as well.

So, no, I do not believe that it is naive to believe that this can be
done with only casts and inlined forwarding functions, as this is
exactly what we have done.

Finally, while I was writing this, I realized that we only need to
cast OUT parameters, as IN parameters will upcast implicitly (SUB*
will cast to BASE*).

here is an example:

from base class:

const T& operator[](int i) const;


from specialization:

const SubType& operator[](int i) const
{ return (const SubType&)inherited::operator[](i); }
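
The forwarding rule described above can be sketched roughly as follows. This is only an illustration, not the actual FactSet code: the names VoidPtrArray, push_back and at are hypothetical, the element type is assumed not to be const-qualified, and the accessor returns the pointer by value rather than casting a reference as in the snippet above.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical shared implementation. Every Array<T*> forwards to this one
// class, so its member functions are compiled only once.
class VoidPtrArray {
public:
    void push_back(void* p);
    void* at(std::size_t i) const;
    std::size_t size() const;
private:
    std::vector<void*> items_;
};

// Defined outside the class, so these are not implicitly inline.
// In a real project they would live in a single .cpp file.
void VoidPtrArray::push_back(void* p) { items_.push_back(p); }
void* VoidPtrArray::at(std::size_t i) const { return items_[i]; }
std::size_t VoidPtrArray::size() const { return items_.size(); }

// Primary template: an ordinary, per-type implementation.
template <typename T>
class Array {
public:
    void push_back(const T& v) { items_.push_back(v); }
    const T& operator[](std::size_t i) const { return items_[i]; }
    std::size_t size() const { return items_.size(); }
private:
    std::vector<T> items_;
};

// Partial specialization for pointers: nothing but inline forwarding
// functions and casts, following the rule described in the post.
template <typename T>
class Array<T*> {
public:
    // IN parameter: T* converts to void* implicitly (T must not be const).
    void push_back(T* p) { impl_.push_back(p); }
    // OUT value: the void* has to be cast back to T*.
    T* operator[](std::size_t i) const { return static_cast<T*>(impl_.at(i)); }
    std::size_t size() const { return impl_.size(); }
private:
    VoidPtrArray impl_;
};
```

With this layout, Array<int*> and Array<double*> each add only trivial inline wrappers; whether the wrappers really vanish after optimization is something to check on the compiler in use.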


joshua lehrer
brown university, 1996 (yes, you lectured to me, Scott. "should this
be const?")
factset research systems

Francis Glassborow

unread,
May 20, 2002, 10:53:34 AM5/20/02
to
In article <mjolnir_DELETE_-F4...@newsfeed.slurp.net>,
Adin Hunter Baber <mjolnir...@soltec.net> writes
>Could you please clarify what you mean by "explicit instantiations" ?
>Including a short piece of example code would be helpful.

in example.h

template<typename T> class X {
// whatever
};

in int_example.cpp
(actually it would often be better to make that
int_example.lib, assuming your compiler reacts
to extension names)

template class X<int>;

The above forces the compiler to instantiate X for an int. The big
problem being that it will try to instantiate all members of X, and some
-- that you do not intend to use for an int -- may not be instantiable.
It would help if the language just required that those be skipped, or if
compilers issued diagnostics and then skipped them. It is also possible
to explicitly instantiate member functions on a one by one basis but
that is often tedious and computers should do the tedious.
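
As a rough sketch of the two forms mentioned above, assuming a hypothetical X with members twice() and negated() (neither is from the post): the whole class is instantiated for int, while for std::string only the member that can actually be compiled is instantiated.

```cpp
#include <string>

template <typename T>
class X {
public:
    explicit X(const T& v) : value_(v) {}
    T twice() const;    // works for any T that supports binary +
    T negated() const;  // needs unary minus, so not usable for std::string
private:
    T value_;
};

template <typename T>
T X<T>::twice() const { return value_ + value_; }

template <typename T>
T X<T>::negated() const { return -value_; }

// Explicit instantiation of the whole class: every member of X<int>
// is instantiated here, whether or not anything calls it.
template class X<int>;

// "template class X<std::string>;" would fail to compile, because
// negated() cannot be instantiated for std::string. Instantiating
// members one by one avoids that, at the cost of listing them by hand.
template std::string X<std::string>::twice() const;
```

The member-by-member form is the tedious part the post refers to: each function to be emitted needs its own line with the full signature.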

--
Francis Glassborow ACCU
64 Southfield Rd
Oxford OX4 1PA +44(0)1865 246490
All opinions are mine and do not represent those of any organisation

[ Send an empty e-mail to c++-...@netlab.cs.rpi.edu for info ]

Garry Lancaster

unread,
May 20, 2002, 11:07:45 AM5/20/02
to
Garry Lancaster wrote:
> > In some cases the programmer can intervene. In
> > the pointer case, a smart programmer can develop a
> > partial specialization for all pointer types that is
> > implemented using a shared void* based
> > implementation. This can be somewhat more
> > work than a straightforward, non-shared,
> > implementation and is often slower. This is a key
> > point.

Scott Meyers:


> Assuming the T* implementation consists of nothing
> more than inlined calls to the void* implementation,
> why would this be slower?

If the only difference between the simple implementation
and a shared void* based implementation is some extra
inlined forwarding calls containing a bit of casting and
you have a decent optimizer it should come out the same
speed [see note 1].

> Or is it naive to expect
> the T* wrapper to consist only of inlines?

No, I don't think that is naive at all. However, it is
important that at least parts of the rest of it - the
void* part - are *not* inlined, in order to benefit from
bloat reduction. So, reduced inlining usually makes
a speed difference.

With more complicated templates (e.g. ones that
actually store T rather than just T*) the refactoring to
a void*-based implementation may require an extra
level of indirection, which is another reason why we
would expect it to be slower.

These are my general rules of thumb, but compilers
have the ability to surprise. The only real way of
knowing whether implementation A is faster or
smaller than implementation B for a particular
compiler is to measure it.

Kind regards

Garry Lancaster
Codemill Ltd
Visit our web site at http://www.codemill.net

NOTES:

1. The standard doesn't actually require that the bit
patterns of T* and void* be identical, so in theory
the conversions between the two types could add an
extra overhead. However, I'm not personally familiar
with any platforms that have a C++ compiler where
the bit patterns *are* different.

Vincent Finn

unread,
May 20, 2002, 11:08:21 AM5/20/02
to
Francis Glassborow wrote:

> In article <3CE4D538...@2.com>, Vincent Finn
> <1...@2.com.cos.agilent.com> writes
> >> Yes, and this is a real problem when programmers actually stuff the
> >> implementation into the class template definition. This is one reason
> >> why I advocate that the implementation should always be in its own file
> >> even if you then #include it into the definition file.
> >
> >But surely if you #include it in the header file there is no real difference
> >Is there any actual benefit ?
>
> Yes there is. The most important difference is that it prevents
> programmers from writing in class implementations of member functions of
> templates. Such code is implicitly inline, and therefore you are
> unnecessarily relying on the compiler to not inline the code.
>
> >
> >I did use the #include for a while but went back to putting the whole
> >code in the header because people found it more confusing the other way !
>
> And do you do the same for non template classes?

I do, of course

It was the fact that the .cpp file for templates is not compiled that was
confusing. People tried to compile the .cpp as they would a normal code
file, and it wouldn't compile. (The problem was then exacerbated by the
fact that VC seems to have a small bug to do with files that are excluded
from the compile: if you try to compile them, you will get errors from
compiling the project until you shut the workspace down and open it again!)

I was not aware that the inlining was changed by moving it, so I may return
to that practice.

Vin

Francis Glassborow

unread,
May 20, 2002, 1:55:33 PM5/20/02
to
In article <3CE8C129...@2.com>, Vincent Finn
<1...@2.com.cos.agilent.com> writes

>I was not aware that the inlining was changed by moving it so I may return to
>that practice

Let me be clear:

template <typename T> class X {
// whatever

public:
void complicated_function(/* parameters*/){
// code
}
};

X<T>::complicated_function is implicitly inline, and it is up to the good
sense of the compiler whether it heeds your unintended hint. However:

template <typename T> class X {
// whatever

public:
void complicated_function(/* parameters*/);
};

template <typename T>
void X<T>::complicated_function(/* parameters*/){
// code
}

Is not inline unless you say so. However there is an added advantage in
not stuffing that code automatically into the header for X. You have an
option to use explicit instantiation by having a file containing, for
example,

#include "x.h"
#include "x.impl"
template class X<int>;

And calling that file something like int_X.lib may even result in the
executable only including the functions you call even without a smart
linker.

--
Francis Glassborow ACCU
64 Southfield Rd
Oxford OX4 1PA +44(0)1865 246490
All opinions are mine and do not represent those of any organisation

[ Send an empty e-mail to c++-...@netlab.cs.rpi.edu for info ]

Arnold the Aardvark

unread,
May 20, 2002, 1:56:21 PM5/20/02
to
"Vincent Finn" <1...@2.com.cos.agilent.com>

> It was the fact that the .cpp file for templates is not compiled that was
> confusing

It is quite common to put the template implementation in a file called
.ipp or similar, to avoid the confusion you mention. For my own
part, I have a tendency to implicitly inline one-liners, but
implement larger functions below the class definition i.e. in the header
but not inline. I believe moving these to an .ipp file might improve my
code a little, but I haven't written any large/complicated templates
yet so the gain would be small.

Won't 'export' make all this go away, anyway (eventually)?


Arnold the Aardvark

Daniel Miller

unread,
May 20, 2002, 8:54:24 PM5/20/02
to
Vincent Finn wrote:

> Francis Glassborow wrote:
>
>
>>In article <3CE4D538...@2.com>, Vincent Finn
>><1...@2.com.cos.agilent.com> writes
>>
>>>>Yes, and this is a real problem when programmers actually stuff the
>>>>implementation into the class template definition. This is one reason
>>>>why I advocate that the implementation should always be in its own file
>>>>even if you then #include it into the definition file.
>>>>
>>>But surely if you #include it in the header file there is no real difference
>>>Is there any actual benefit ?
>>>
>>Yes there is. The most important difference is that it prevents
>>programmers from writing in class implementations of member functions of
>>templates. Such code is implicitly inline, and therefore you are
>>unnecessarily relying on the compiler to not inline the code.
>>
>>
>>>I did use the #include for a while but went back to putting the whole
>>>code in the header because people found it more confusing the other way !
>>>
>>And do you do the same for non template classes?
>>
>
> I do, of course
>
> It was the fact that the .cpp file for templates is not compiled that was
> confusing
> People tried to compile the .cpp as they would with a normal code file and it
> wouldn't compile

[...snip...]


Use a different file-extension as RogueWave does: .cc versus .cpp (or
equivalent).

RogueWave uses .cpp (in a "source" directory) for function-definition/et.al.
files which are intended for canonical compilation of out-of-line
function-definitions and of static-data definitions to object-files.

RogueWave uses .cc (in an "include" directory) for function-definition/et.al.
files which are intended for #including to make template function-definitions &
static-data definitions known to the compiler prior to the point of
usage/specialization/instantiation/expansion if being built on a platform which
requires (or prefers) compile-time expansion of templates.

The difference of file-extension (plus the difference of directory) makes
these categories of files quite clearly separated to even the casual observer.
Header files retain their traditional role of containing declarations (where
class-definition is a category of declaration).

Francis Glassborow

unread,
May 20, 2002, 8:56:48 PM5/20/02
to
In article <1021909150.18263....@news.demon.co.uk>,
Arnold the Aardvark <aard...@notthistubulidentata.demon.co.uk> writes

>"Vincent Finn" <1...@2.com.cos.agilent.com>
>
>> It was the fact that the .cpp file for templates is not compiled that was
>> confusing
>
>It is quite common to put the template implementation in a file called
>.ipp or similar, to avoid the confusion you mention. For my own
>part, I have a tendency to implicitly inline one-liners, but
>implement larger functions below the class definition i.e. in the header
>but not inline. I believe moving these to an .ipp file might improve my
>code a little, but I haven't written any large/complicated templates
>yet so the gain would be small.
>
>Won't 'export' make all this go away, anyway (eventually)?

Some hope so. But if you are not writing separate implementation files,
export will do nothing to help you.


--
Francis Glassborow ACCU
64 Southfield Rd
Oxford OX4 1PA +44(0)1865 246490
All opinions are mine and do not represent those of any organisation

[ Send an empty e-mail to c++-...@netlab.cs.rpi.edu for info ]

Stefan Heinzmann

unread,
May 21, 2002, 9:11:26 AM5/21/02
to
Scott Meyers <sme...@aristeia.com> wrote in message news:<MPG.174cf495d...@news.hevanet.com>...
> I often hear about "code bloat" arising from templates. I've been trying
> to figure out exactly what this means. From what I can tell, there are
> several different meanings, as follows, where Temp is a template and T, T1,
> and T2 are type parameters.
>
> 1. [Multiple instantiations] Temp<T> is instantiated in more than one
> translation unit, so when the objs are linked together, the exe has
> more than one copy of Temp<T>'s member functions.
[...]

> Are there other causes of code bloat? Of the different types of code
> bloat, which are the most troublesome in practice?

In a statically linked executable, the above point should be solved
with a reasonable toolset, as others have said. I am astonished
however that no one has yet mentioned that this can indeed be a problem
in dynamically loaded libraries or plug-in modules. Since they're
compiled and linked separately, they will of course contain their own
template instantiations.

For example if my application consists of a number of separate modules
which are loaded dynamically (not uncommon in Windows-land), and I use
the standard library a lot (streams, strings and the like), I will
likely have instantiations of the same stuff in each module.

This can of course be circumvented by putting the commonly used
instantiations into a shared library, but it is not as easy as you may
think.
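
One way this can be approached is sketched below. It relies on explicit instantiation declarations ("extern template"), which are standard only since C++11 and were a compiler extension at the time of this thread, so treat it as an assumption about the toolchain. Stack and the file names are hypothetical, and the sketch leaves out the symbol-export decoration that Windows DLLs would additionally need, which is part of why it is not as easy as it looks.

```cpp
#include <cstddef>
#include <vector>

// --- stack.h: hypothetical header included by every module ---
template <typename T>
class Stack {
public:
    void push(const T& v);
    T pop();
    std::size_t size() const;
private:
    std::vector<T> items_;
};

template <typename T>
void Stack<T>::push(const T& v) { items_.push_back(v); }

template <typename T>
T Stack<T>::pop() {
    T v = items_.back();
    items_.pop_back();
    return v;
}

template <typename T>
std::size_t Stack<T>::size() const { return items_.size(); }

// Explicit instantiation declaration: tells each including module not to
// emit its own copy of Stack<int>'s members and to expect them elsewhere.
extern template class Stack<int>;

// --- stack_int.cpp: compiled once, into the shared library ---
// Explicit instantiation definition: the single copy that gets emitted.
template class Stack<int>;
```

Modules that only include the header then reference the library's copy of Stack<int> instead of carrying their own.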

Cheers
Stefan

Vincent Finn

unread,
May 21, 2002, 2:03:38 PM5/21/02
to
Arnold the Aardvark wrote:

> "Vincent Finn" <1...@2.com.cos.agilent.com>
>
> > It was the fact that the .cpp file for templates is not compiled that was
> > confusing
>
> It is quite common to put the template implementation in a file called
> .ipp or similar, to avoid the confusion you mention. For my own
> part, I have a tendency to implicitly inline one-liners, but
> implement larger functions below the class definition i.e. in the header
> but not inline. I believe moving these to an .ipp file might improve my
> code a little, but I haven't written any large/complicated templates
> yet so the gain would be small.

you use '.ipp'
Francis's example uses '.impl'
and Daniel suggests '.cc'

Is there any accepted naming for this file ?

Vin

Francis Glassborow

unread,
May 22, 2002, 5:25:23 AM5/22/02
to
In article <95e0e5ef.02052...@posting.google.com>, Stefan
Heinzmann <stefan_h...@yahoo.com> writes

>In a statically linked executable, the above point should be solved
>with a reasonable toolset, as others have said. I am astonished
>however that no one has yet mentioned that this can indeed be a problem
>in dynamically loaded libraries or plug-in modules. Since they're
>compiled and linked separately, they will of course contain their own
>template instantiations.
>
>For example if my application consists of a number of separate modules
>which are loaded dynamically (not uncommon in Windows-land), and I use
>the standard library a lot (streams, strings and the like), I will
>likely have instantiations of the same stuff in each module.

But that is not a template issue, it is an issue with dynamically linked
libraries.

--
Francis Glassborow ACCU
64 Southfield Rd
Oxford OX4 1PA +44(0)1865 246490
All opinions are mine and do not represent those of any organisation

[ Send an empty e-mail to c++-...@netlab.cs.rpi.edu for info ]

Scott Meyers

unread,
May 22, 2002, 5:27:18 AM5/22/02
to
[ I'm posting this as a favor for the writer, who sent it to me via
email. ]

I posted this reply to comp.lang.c++.moderated, but it didn't appear
so either my newsreader screwed up (likely, since Outlook sucks) or
the moderators didn't approve it (in which case I never got it in
return because I posted with a spam-protected email address).

Anyway, regardless, I thought I'd send it directly to you as it
has a slightly different take on "code bloat" than you perhaps
intended, but still an example of what we consider "code bloat"
to be.

{If you are having problems posting, you might want to try
google. -- mod}

--- snip ---

"Scott Meyers" <sme...@aristeia.com> wrote in message
news:MPG.174cf495d...@news.hevanet.com...
>I often hear about "code bloat" arising from templates. I've been trying
>to figure out exactly what this means. From what I can tell, there are
>several different meanings, as follows, where Temp is a template and T, T1,
>and T2 are type parameters.

>[...]

I'm working with extremely performance sensitive code all day long
and the code bloat I'm experiencing is primarily due to abstraction
and aliasing, but as in some cases templates 'cause' the abstraction
(which in turn 'causes' the aliasing) you could certainly blame the
templates in those cases.

I've tried to boil down the problem to a very simple example to
illustrate it. Consider the problem of maintaining a buffer for DMA
data. In C++ you would perhaps write this along the lines of:

class DmaBuffer {
public:
DmaBuffer(int size) { pBufStart = pBuf = new int[size]; }
~DmaBuffer() { delete pBuf; }
int *GetBuffer() { return pBufStart; }
void AddQWord(int a, int b, int c, int d) {
pBuf[0] = a;
pBuf[1] = b;
pBuf[2] = c;
pBuf[3] = d;
pBuf += 4;
}
private:
int *pBufStart, *pBuf;
};


Its (simplified) use might look something like:


DmaBuffer dmaBuffer(100000);

void TestDmaBuffer1(const int *p)
{
for (int i = 0; i < 1000; i++) {
dmaBuffer.AddQWord(p[0],p[1],p[2],p[3]);
p += 4;
}
}

This results in the following code on a certain MIPS platform using
gcc:

TestDmaBuffer1(int *)
001000C8 0080402D dmove t0,a0
001000CC 240903E7 addiu t1,zero,0x3E7
001000D0 8F828114 lw v0,0x8114(gp) ; loads pBuf
001000D4 2529FFFF addiu t1,t1,0xFFFF
001000D8 8D030000 lw v1,0x0000(t0)
001000DC 8D060004 lw a2,0x0004(t0)
001000E0 24470010 addiu a3,v0,0x10
001000E4 8D040008 lw a0,0x0008(t0)
001000E8 8D05000C lw a1,0x000C(t0)
001000EC AC430000 sw v1,0x0000(v0)
001000F0 AC460004 sw a2,0x0004(v0)
001000F4 AC440008 sw a0,0x0008(v0)
001000F8 AC45000C sw a1,0x000C(v0)
001000FC AF878114 sw a3,0x8114(gp) ; stores pBuf
00100100 0521FFF3 bgez t1,0x001000D0
00100104 25080010 addiu t0,t0,0x10
00100108 03E00008 jr ra
0010010C 00000000 nop

What's wrong with that, you say? Because of aliasing issues, the
compiler loads pBuf from memory and stores it back on every iteration
of the loop, as indicated by the added comments.

But that's silly. As the programmers we know that in this case the
buffer and the buffer object can never alias. So, we try declaring
each and every pointer as "restrict" using the restrict extension
that gcc provides, in hope that gcc can do better with it. But it
doesn't -- it generates exactly the same code. Blah.

So can you get rid of it? Yes, by giving up on the abstraction and
writing the code in a straightforward, C-style, no-nonsense version:

int dmaBuffer2[100000];
void TestDmaBuffer2(int *p)
{
int *dst = &dmaBuffer2[0];
for (int i = 0; i < 1000; i++) {
dst[0] = p[0];
dst[1] = p[1];
dst[2] = p[2];
dst[3] = p[3];
dst += 4;
p += 4;
}
}

Then, as we've effectively circumvented the aliasing issue, we
finally get rid of that annoying and unnecessary code:

TestDmaBuffer2(int *)
00100110 3C020012 lui v0,0x12
00100114 240603E7 addiu a2,zero,0x3E7
00100118 2445E110 addiu a1,v0,0xE110
0010011C 00000000 nop
00100120 8C830000 lw v1,0x0000(a0)
00100124 24C6FFFF addiu a2,a2,0xFFFF
00100128 ACA30000 sw v1,0x0000(a1)
0010012C 8C820004 lw v0,0x0004(a0)
00100130 ACA20004 sw v0,0x0004(a1)
00100134 8C830008 lw v1,0x0008(a0)
00100138 ACA30008 sw v1,0x0008(a1)
0010013C 8C82000C lw v0,0x000C(a0)
00100140 24840010 addiu a0,a0,0x10
00100144 ACA2000C sw v0,0x000C(a1)
00100148 04C1FFF5 bgez a2,0x00100120
0010014C 24A50010 addiu a1,a1,0x10
00100150 03E00008 jr ra
00100154 00000000 nop


To return to the issue at hand, now picture that first class no
longer being just a simple DmaBuffer, but a templatized version of
a stack, a list or some other data structure that contains a pointer
and therefore suffers the exact same aliasing problem.

Now you're suddenly having this issue everywhere in your code that
you're using that templatized class, whatever the instantiated type
is, and whenever that pointer is updated -- generally when you add
or remove values from your class.

But wait, this is C++ so you've implemented iterators in terms of
inline functions over your data structure as well. Well, depending
on how you did this, now your iterator pointer probably suffers
from exactly the same problem. So you don't even have to update the
data structure to see code bloat and slowdown, you just have to
iterate over it!

But wait again! It's actually much much worse than this, because the
aliasing issue compounds, so that the one pointer update the compiler
wasn't able to remove now causes an aliasing issue with something
else that really should have been removed as well, etc.
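
A workaround sometimes used for this kind of member-pointer traffic, while keeping the class, is to copy the member into a local variable for the duration of the loop and write it back once. The sketch below assumes a hypothetical bulk member AddQWords and a helper Used(), neither of which is in the class above, and it frees the buffer with delete[] on the start pointer. Whether a particular compiler then keeps the pointer in a register has to be checked in its output.

```cpp
#include <cstddef>

class DmaBuffer {
public:
    explicit DmaBuffer(std::size_t size)
        : pBufStart(new int[size]), pBuf(pBufStart) {}
    ~DmaBuffer() { delete[] pBufStart; }

    // Copying would lead to a double delete, so it is disabled.
    DmaBuffer(const DmaBuffer&) = delete;
    DmaBuffer& operator=(const DmaBuffer&) = delete;

    int* GetBuffer() { return pBufStart; }

    // Number of ints written so far.
    std::size_t Used() const {
        return static_cast<std::size_t>(pBuf - pBufStart);
    }

    // Hypothetical bulk variant of AddQWord: copies `count` groups of four
    // ints from src. No bounds checking, as in the original example.
    void AddQWords(const int* src, std::size_t count) {
        // dst is a local whose address is never taken, so stores through
        // it cannot modify dst itself.
        int* dst = pBuf;
        for (std::size_t i = 0; i < count; ++i) {
            dst[0] = src[0];
            dst[1] = src[1];
            dst[2] = src[2];
            dst[3] = src[3];
            dst += 4;
            src += 4;
        }
        pBuf = dst;  // single write-back of the member after the loop
    }

private:
    int* pBufStart;
    int* pBuf;
};
```

The cost is a wider interface: callers that previously looped over AddQWord have to hand the whole run to the bulk function.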


Of course, this really is a "well-known" problem (well, not as well-
known as it should be) -- it's really the C++ "abstraction penalty"
problem, see eg:

http://www.acl.lanl.gov/Pooma96/abstracts/robison.html

That gcc has a serious problem with it, despite decent results on the
Stepanov benchmark, is also known to some:

http://gcc.gnu.org/ml/gcc/2000-11/msg00323.html

Expression templates can address abstraction penalty problems in certain
situations, but certainly not all of them. What's worse, expression
templates IMO constitute write-only code and therefore have a limited
viability.


Overall I consider this a very real and very important issue.
Unfortunately, there are lots of people carelessly brushing it aside --
sadly, most of them have never heard of the term "abstraction penalty"
in the first place, much less read up on it and explored its effects
first-hand.

--
Christer Ericson
Senior principal programmer
Sony Computer Entertainment, Santa Monica

Francis Glassborow

unread,
May 22, 2002, 9:14:16 AM5/22/02
to
In article <MPG.1754252d7...@news.hevanet.com>, Scott Meyers
<sme...@aristeia.com> (actually it was Christer Ericson) writes

>I'm working with extremely performance sensitive code all day long
>and the code bloat I'm experiencing is primarily due to abstraction
>and aliasing, but as in some cases templates 'cause' the abstraction
>(which in turn 'causes' the aliasing) you could certainly blame the
>templates in those cases.

>I've tried to boil down the problem to a very simple example to
>illustrate it. Consider the problem of maintaining a buffer for DMA
>data. In C++ you would perhaps write this along the lines of:

>class DmaBuffer {
>public:
> DmaBuffer(int size) { pBufStart = pBuf = new int[size]; }

Stylistically I would prefer:
DmaBuffer(int size) : pBufStart(new int[size]), pBuf(pBufStart) {}

> ~DmaBuffer() { delete pBuf; }

That is undefined behaviour. I think you meant pBufStart, and you should
also have written delete[].

> int *GetBuffer() { return pBufStart; }
> void AddQWord(int a, int b, int c, int d) {
> pBuf[0] = a;
> pBuf[1] = b;
> pBuf[2] = c;
> pBuf[3] = d;
> pBuf += 4;

Now here, where you exit the function, the compiler inserts a sequence
point, at which stage it ensures that all side effects are complete. It
certainly needs to write back, at the very least, the cached value of
pBuf, because the function cannot know what you will do next and so needs
to clear the register. Note that this will be in the function
implementation, where it knows nothing of the calling context. Well, as
you have written this in-class it might inline the call; on the other
hand, it might not.


> }
>private:
> int *pBufStart, *pBuf;
>};


>DmaBuffer dmaBuffer(100000);

But that is a matter for how good the optimiser is when calling an
inlined function in a loop. That is not a language issue.


>But that's silly. As the programmers we know that in this case the
>buffer and the buffer object can never alias. So, we try declaring
>each and every pointer as "restrict" using the restrict extension
>that gcc provides, in hope that gcc can do better with it. But it
>doesn't -- it generates exactly the same code. Blah.

Of course, because how does restrict help when you are working across
invocations of the function?

>So can you get rid of it? Yes, by giving up on the abstraction and
>writing the code in a straightforward, C-style, no-nonsense version:

But I think you get the same problem if your C version calls a function.
The issue is not whether you use C or C++ but whether you write code
that encapsulates the copying of 4 ints to your buffer or not.
Furthermore there is a serious difference between the two pieces of code
because there is no equivalent to pBuf in your C code. That means that
you cannot track the next available location in your array. Then there
is the issue of dynamic memory v static memory, how much code you must
write if you need two buffers etc. Put simply, your C code is quite
different in intent from your C++ code.

>int dmaBuffer2[100000];
>void TestDmaBuffer2(int *p)
>{
> int *dst = &dmaBuffer2[0];
> for (int i = 0; i < 1000; i++) {
> dst[0] = p[0];
> dst[1] = p[1];
> dst[2] = p[2];
> dst[3] = p[3];
> dst += 4;
> p += 4;
> }
>}

>Then, as we've effectively circumvented the aliasing issue, we
>finally get rid of that annoying and unnecessary code:

I think all your example demonstrates is that in code that needs to be
very tight, you have to consider the cost of being general.


>To return to the issue at hand, now picture that first class no
>longer being just a simple DmaBuffer, but a templatized version of
>a stack, a list or some other data structure that contains a pointer
>and therefore suffers the exact same aliasing problem.

No, I think that all your example shows is that poor design and
inappropriate use of technology results in poor code. It's a case of
horses for courses and good programmers (and you have demonstrated that
you understand the issues, but - unfairly, IMO, - blame the language)
know that they need to watch the level of abstraction when concerned
with tight code requirements.

--
Francis Glassborow ACCU
64 Southfield Rd
Oxford OX4 1PA +44(0)1865 246490
All opinions are mine and do not represent those of any organisation

[ Send an empty e-mail to c++-...@netlab.cs.rpi.edu for info ]

James Kanze

unread,
May 22, 2002, 9:19:34 AM5/22/02
to
Dietmar Kuehl <dietma...@yahoo.com> writes:

|> Scott Meyers wrote:

|> > Are there other causes of code bloat?

|> The single biggest problem I have encountered with respect to code
|> bloat is people comparing apples to oranges: Yes, a fully
|> type-safe program using different containers for everything is
|> probably bigger than a program using a container for 'void*' or
|> some 'CObject*' which has *very* different properties (a typical
|> difference apart from type-safety is value vs. reference
|> semantics).

Finally a sensible answer.

When I first started using C++ (long before templates), I found that
my C++ programs were regularly bigger than my C programs. But when I
analysed why, the reason was always that they did more -- why offer
straight string comparison if you have a regular expression class
handy, and offering full regular expressions is no more work than
using strcmp. (And let's face it, my GB_RegExpr *is* a bit bigger
than strcmp.)

The single largest reason for code bloat is what has disparagingly
been called featuritis. In so far as templates make it easier to
implement complex features correctly, they are responsible for code
bloat. Because if we can, we do.

Sometimes, there is a space versus time tradeoff, of course. On my
last project, I had six sets of the same contained type, each with a
different ordering criterion. The standard library's use of templates
imposed six separate implementations of "std::set< MyType const* >";
had it used inheritance for the ordering object, there would only have
been one. (In the case of this particular application, the tradeoff
was the right one.)
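
For comparison, here is a sketch of how the six instantiations could be collapsed into one by choosing the ordering at run time. It uses a plain function pointer as the comparator type rather than the inheritance-based ordering object mentioned above, and MyType, byId and byPriority are hypothetical names. The price is an indirect call per comparison, which is the time side of the tradeoff.

```cpp
#include <set>

// Hypothetical element type with two possible orderings.
struct MyType {
    int id;
    int priority;
};

// All orderings share one comparator *type*, a function pointer,
// so they also share one instantiation of std::set.
typedef bool (*Order)(const MyType*, const MyType*);

bool byId(const MyType* a, const MyType* b) { return a->id < b->id; }

bool byPriority(const MyType* a, const MyType* b) {
    return a->priority < b->priority;
}

// One set type for every ordering; the ordering is a constructor argument.
typedef std::set<const MyType*, Order> MySet;
```

A set is then created as MySet ids(byId) or MySet prios(byPriority); both are the same type and use the same compiled code.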

In the end, there's nothing you can do with templates that you cannot
do without. It's just that some things are a lot, lot harder; who
wants to maintain 20 copies of basically the same code, just because
it deals with different types (and you want the type safety, because
you want the application to work -- Dietmar is definitely right here;
comparing a correct program with one that works most of the time isn't
really a fair comparison).

|> Make a correct comparison and compare similar solutions to similar
|> solutions: Multiple 'vector<T>' do not correspond to an array
|> class taking a 'void*' - this is what a 'std::vector<void*>' is
|> which in turn may be a reasonable choice to be used in
|> programs. You should not forget that you can parameterize
|> templates on their template arguments :-) It may need writing a
|> (generic?) proxy class to have easy access to the operations
|> required by the template but if a container of 'void*' is the
|> solution, use this solution.

That's true up to a point. On the other hand, if you have six
different vectors, each containing a smart pointer to a different
type, you really don't need six instances of the code for vector, nor
for the smart pointer, once the compiler has done the type checking.

All of the current compilers I know will give you six instances.
Let's hope that this improves in the future. An implementation could
provide a specialized instance of std::vector<void*>, then a partial
specialization on std::vector<T*> which derives from std::vector<void*>.
But you really can't, or shouldn't, expect this kind of effort from
the application programmers. Non-algorithmic optimization *should* be
the compiler's job.

|> As far as I have seen, there is actually no [executable]
|> code-bloat (when comparing similar approaches rather than
|> completely different ones).

In my example, above, if the std::set had been designed to use either
the template pattern or delegation through an abstract base class for
comparison, the code would have definitely been smaller. So there is
*some* code bloat. The question is one of significance -- I suspect
that the above case is exceptional, but the total space used by the
instantiations of std::set was still only about 1% of the total code
space (and the code was only about 10% of the size of the typical data
sets). So frankly, who cares.

--
James Kanze mailto:ka...@gabi-soft.de
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany Tel. +49(0)179 2607481

James Kanze

unread,
May 22, 2002, 9:21:16 AM5/22/02
to
Scott Meyers <sme...@aristeia.com> writes:

|> DmaBuffer dmaBuffer(100000);

This is a problem independent of templates, and linked to the use of
pointers. In theory, a compiler could trace the use of the pointer,
and determine that it was initially from an operator new, and that no
other aliases exist, but few compilers do.

In this case, it is also a case of really poor optimization; once
inlining has taken place, the compiler can easily see *all* of the
accesses through the relative pointers. The only problem it has to
deal with is a possible aliasing between *p and *pBufStart; there is
no code anywhere that could modify pBufStart outside of the function.

This is standard optimization technology; I have seen it in compilers
25 years ago. There is no excuse for it not being present in a
compiler today. Any simple peephole optimizer should be able to do
the trick.

I'm not familiar with your assembly, but roughly speaking, what I
would expect from the generated code is that it load both pointers and
the count in registers, and each time in the loop, copy the four
values, increment the two pointers, decrement the count, and test for
the end. A good compiler might also eliminate the count, using a
comparison with an end pointer. (In this case, what the compiler is
maintaining in memory is pBuf. The only variable with the same type,
and thus which could possibly alias it, is p. But pBuf is a member
variable, and p is a local variable, so the compiler knows that no
aliasing is possible.)

|> But that's silly. As the programmers we know that in this case the
|> buffer and the buffer object can never alias. So, we try declaring
|> each and every pointer as "restrict" using the restrict extension
|> that gcc provides, in hope that gcc can do better with it. But it
|> doesn't -- it generates exactly the same code. Blah.

That's because the problem is unrelated to any aliasing between pBuf
and the p argument of the function. Such aliasing will only become a
problem if the compiler starts shuffling code around (e.g. to keep the
various pipelines full). Such aliasing will normally only be a
problem if you use a value twice, and write through a pointer in
between uses -- there is a possibility that the write through the
pointer will modify the value, so it must be reloaded the second time.
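
A minimal sketch of that situation, with a hypothetical function readTwice that is not taken from the code under discussion: a value is read, a store goes through a second pointer, and the value is read again.

```cpp
// Hypothetical function: reads *a twice with a store through b in between.
// If a and b may point to the same int, the compiler has to reload *a
// after the store; if it knows they cannot alias, it may reuse the value
// it already has in a register.
inline int readTwice(int* a, int* b)
{
    int first = *a;     // first use
    *b = first + 1;     // write through the other pointer
    return first + *a;  // second use: must observe the store when a == b
}
```

With distinct objects the second read sees the old value; with a == b it sees the stored one, which is why the reload cannot be dropped without aliasing information.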

About the only change that restrict will allow here is for the
compiler to shuffle the loads and the stores; this might be relevant,
depending on how the pipelines were managed (but manifestly, the
compiler is far from taking such issues into account).

|> So can you get rid of it? Yes, by giving up on the abstraction and
|> writing the code in a straightforward, C-style, no-nonsense
|> version:

|> int dmaBuffer2[100000];
|> void TestDmaBuffer2(int *p)
|> {
|>     int *dst = &dmaBuffer2[0];
|>     for (int i = 0; i < 1000; i++) {
|>         dst[0] = p[0];
|>         dst[1] = p[1];
|>         dst[2] = p[2];
|>         dst[3] = p[3];
|>         dst += 4;
|>         p += 4;
|>     }
|> }

In this case, the difference isn't so much the C versus C++ style, it
is a dynamically allocated buffer with a non-local pointer vs. a
statically allocated array.

Declare the dmaBuffer class to contain a large buffer, rather than
using dynamic memory, and maintain an index rather than a pointer, and
a good compiler should be able to optimize. I tried it with g++,
both 2.95.2 and 3.0.4, on my Linux PC, however, and the results were
really bad. The necessary optimization techniques are well known,
once the inlining has occurred, and there is *NO* excuse for such poor
code generation.
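
For concreteness, a sketch of the kind of class meant here. The name FixedDmaBuffer, its members, and the capacity are assumptions made for illustration; this is not the class from the original test.

```cpp
#include <cstddef>

// Hypothetical variant of the buffer class: the storage is a member array
// and the write position is an index, so there is no pointer member that
// has to be kept up to date in memory.
class FixedDmaBuffer {
public:
    enum { kCapacity = 100000 };  // assumed size, matching the quoted array

    FixedDmaBuffer() : pos_(0) {}

    // Appends four ints; the caller must ensure that four slots remain.
    void put4(const int* p)
    {
        buf_[pos_ + 0] = p[0];
        buf_[pos_ + 1] = p[1];
        buf_[pos_ + 2] = p[2];
        buf_[pos_ + 3] = p[3];
        pos_ += 4;
    }

    std::size_t size() const { return pos_; }
    int at(std::size_t i) const { return buf_[i]; }

private:
    int buf_[kCapacity];
    std::size_t pos_;
};
```

An object of this size is best given static storage duration rather than placed on the stack.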

However, one important point to consider is that we are comparing
oranges to apples -- the class implementation takes a parameter in
order to allocate dynamically the exact size needed (and thus must
work with pointers), whereas the second implementation uses a
statically allocated fixed size array, rather than pointers, so the
compiler has a lot more information to work with.

|> Then, as we've effectively circumvented the aliasing issue, we
|> finally get rid of that annoying and unnecessary code:

But we've changed the semantics.

But what is the alternative? An iterator which is implemented as a
pointer, with all functions inline, will hardly cause any code bloat
over a simple pointer.
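
To make the comparison concrete, a minimal sketch of such an iterator. IntIter and sum are hypothetical names; whether a given compiler really reduces this to the plain pointer loop is an assumption that has to be checked in the generated code.

```cpp
// Hypothetical minimal iterator: a class holding one pointer, with every
// operation defined inline in the class body.
class IntIter {
public:
    explicit IntIter(int* p) : p_(p) {}
    int& operator*() const { return *p_; }
    IntIter& operator++() { ++p_; return *this; }
    bool operator!=(const IntIter& other) const { return p_ != other.p_; }

private:
    int* p_;
};

// Sums a range through the wrapper. After inlining, the loop body contains
// nothing but pointer operations, the same ones a hand-written loop uses.
inline int sum(IntIter first, IntIter last)
{
    int total = 0;
    for (; first != last; ++first)
        total += *first;
    return total;
}
```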

To show code bloat, you have to show what the alternative solutions
cost. For the moment, you've shown two additional machine
instructions. Hardly what I would call code bloat. (The runtime
repercussions could be important however, since those two instructions
are executed in what is presumably a tight loop.)

|> But wait again! It's actually much much worse than this, because
|> the aliasing issue compounds, so that the one pointer update the
|> compiler wasn't able to remove now causes an aliasing issue with
|> something else that really should have been removed as well, etc.

|> Of course, this really is a "well-known" problem (well, not as
|> well-known as it should be) -- it's really the C++ "abstraction
|> penalty" problem, see e.g.:

|> http://www.acl.lanl.gov/Pooma96/abstracts/robison.html

The problem described is a performance problem, and not a code bloat
problem.

It is also a problem which shouldn't occur in your simple example.
The problem occurs when inline functions become too complex, or call
other inline functions, to the point where the resulting code becomes
too complex for the optimizer. The problem also occurs because many
(most) compilers simply will not pass a class type in a register, even
if the class type is just a wrapper for a simple base type or pointer.

|> That gcc has a serious problem with it, despite decent results on
|> the Stepanov benchmark, is also known to some:

|> http://gcc.gnu.org/ml/gcc/2000-11/msg00323.html

Again, we are talking about different things. Code bloat isn't an
issue here, and the issues are exactly the same with or without
templates.

|> Expression templates can address abstraction penalty problems in
|> certain situations, but certainly not all of them. What's worse,
|> expression templates IMO constitute write-only code and therefore
|> have a limited viability.

The real problem with expression templates (other than the fact that
they are unmaintainable -- I agree with you there) is that they lead
to the situation described above, where the results of inlining become
too complex for the optimizer. As an experiment, I once wrote a
Matrix class whose operators returned nodes in an expression tree,
rather than temporary objects. The goal was that everything would end
up inlined, and that the compiler would generate a neat loop out of
it. In practice, one almost immediately reached the situation where
there was too much inlining, and the compilers I had at the time (g++
and Sun CC) both gave up on even the simplest expressions.

|> Overall I consider this a very real and very important issue.
|> Unfortunately, there are lots of people carelessly brushing it
|> aside -- sadly most of them never having heard of the term
|> "abstraction penalty" in the first place, much less read up on it
|> and explored its effects first-hand.

The term abstraction penalty refers to a very specific runtime (not
space) problem, and only affects programs which do a large number of
very simple operations on very small objects. I suspect that most of
the affected programs are in the numerics domain, but there are
probably examples elsewhere as well. However, I can say that none of
the applications I've worked on have been affected by it; in all
cases, we had a number of large objects (which couldn't have been
passed in registers anyway), and fairly complex operations on the
object (so they weren't inlined, and the compiler optimizer could do
its job within the member function).

--
James Kanze mailto:ka...@gabi-soft.de

Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany Tel. +49(0)179 2607481

Chris Uzdavinis

May 22, 2002, 9:25:05 AM
Vincent Finn <1...@2.com.cos.agilent.com> writes:

> you use '.ipp'
> Francis example uses '.impl'
> and Daniel suggests '.cc'
>
> Is there any accepted naming for this file ?

Apparently not. To add one more, the ACE library uses '.i'

--
Chris

Paavo Helde

May 22, 2002, 9:38:21 AM
> Are there other causes of code bloat? Of the different types of code
> bloat, which are the most troublesome in practice?

An example schema, remotely resembling a real-life situation:

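#include <complex>

class Buffer {
    /* ... */
public:
    enum datatype_t {Integer, Double, Complex, /*...*/};
    datatype_t GetType() const;
    int* GetIntBuffer(); // throws if buffer not integer
    double* GetDoubleBuffer();
    std::complex<double>* GetComplexBuffer();
    // const overloads, needed for read-only operands
    const int* GetIntBuffer() const;
    const double* GetDoubleBuffer() const;
    const std::complex<double>* GetComplexBuffer() const;
    // ...
};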

// Objects of type Buffer come from another DLL, a network machine, etc.
// These encapsulate some data arrays read in from an external source,
// which can be of several different types. The exact type is known
// only at run-time.

// Apply an operation to two buffers producing the result in the third
// buffer: the function for that is ApplyOp() defined below; the rest
// are template helpers.

// Because the types of operands are known only at run-time, the
// templates are instantiated for all combinations. (Another solution
// would be to carry out sophisticated analysis of what combinations are
// meaningful, support only those and pre-convert operands to the
// required type before applying the operation).

template<typename T, typename U, typename V>
void op3(const T* x, const U* y, V* z) {
    // some real working code
}

// Third dispatch level: select the result type V.
template<typename T, typename U>
void op2(const T* x, const U* y, Buffer& Z) {
    switch(Z.GetType()) {
    case Buffer::Integer: op3(x, y, Z.GetIntBuffer()); break;
    case Buffer::Double: op3(x, y, Z.GetDoubleBuffer()); break;
    case Buffer::Complex: op3(x, y, Z.GetComplexBuffer()); break;
    // ...
    }
}

// Second dispatch level: select the second operand type U.
template<typename T>
void op1(const T* x, const Buffer& Y, Buffer& Z) {
    switch(Y.GetType()) {
    case Buffer::Integer: op2(x, Y.GetIntBuffer(), Z); break;
    case Buffer::Double: op2(x, Y.GetDoubleBuffer(), Z); break;
    case Buffer::Complex: op2(x, Y.GetComplexBuffer(), Z); break;
    // ...
    }
}

// Entry point: select the first operand type T.
void ApplyOp(const Buffer& X, const Buffer& Y, Buffer& Z) {
    switch(X.GetType()) {
    case Buffer::Integer: op1(X.GetIntBuffer(), Y, Z); break;
    case Buffer::Double: op1(X.GetDoubleBuffer(), Y, Z); break;
    case Buffer::Complex: op1(X.GetComplexBuffer(), Y, Z); break;
    // ...
    }
}

Now go figure the number of op3() instantiations and how it depends on
the number of data types supported.
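
The arithmetic can be spelled out in a small sketch. It assumes the dispatch chain is ApplyOp -> op1 -> op2 -> op3, as the helper names and signatures suggest, and that every level switches over all n supported types; InstantiationCount and countInstantiations are hypothetical names used only for this illustration.

```cpp
// Number of function template instantiations needed for n element types:
// ApplyOp selects one of n op1<T>, each op1<T> selects one of n op2<T,U>,
// and each op2<T,U> selects one of n op3<T,U,V>.
struct InstantiationCount {
    unsigned long op1;
    unsigned long op2;
    unsigned long op3;
};

inline InstantiationCount countInstantiations(unsigned long n)
{
    InstantiationCount c;
    c.op1 = n;          // one per first operand type
    c.op2 = n * n;      // one per pair of operand types
    c.op3 = n * n * n;  // one per (operand, operand, result) triple
    return c;
}
```

So the three types shown give 27 copies of the "real working code", and each additional supported type grows that number cubically.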

I do not say that such code bloat is necessarily troublesome; of course
the size of the executable goes through the roof, but hard disks are
large these days, and any decent OS should load only those parts of the
executable which are accessed at the moment.

OTOH, note that because of the huge number of linker symbols, linking
(both static and dynamic) will probably go slower; also, if the templated
things are polymorphic classes, then at least some widespread
implementations want to initialize all vtables at the beginning of the
program, which may take a noticeable time.

hth
Paavo

Hannah Schroeter

May 22, 2002, 9:40:26 AM
Hello!

In article <y2FNktF+...@robinton.demon.co.uk>,
Francis Glassborow <fran...@robinton.demon.co.uk> wrote:
>[...]

>> 1. [Multiple instantiations] Temp<T> is instantiated in more than one
>> translation unit, so when the objs are linked together, the exe has
>> more than one copy of Temp<T>'s member functions.

>An implementation that does this is broken. Consider what will happen
>with:

>void foo(){ static int i; ...}

>as a member function of the class template. The implementation HAS to be
>able to remove duplicated function instantiations.

The implementation has to coalesce (sp?) the "static int i" for all
instantiations, but the code itself may be duplicated. E.g. gcc.
In that case, the implementation is suboptimal, but not completely
wrong.
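
A small sketch of the requirement, using a hypothetical Temp with a counting foo(). In a single translation unit it can only show that each specialization has its own counter; the point above is that the program must behave as if there were exactly one `i` per specialization even when Temp<int>::foo is instantiated in several translation units, whether or not the function's code is duplicated.

```cpp
// Hypothetical class template with a function-local static.
template <typename T>
struct Temp {
    // Returns how many times foo() has been called for this particular T.
    // All Temp<T> objects of the same T share the one static counter.
    int foo()
    {
        static int i = 0;
        return ++i;
    }
};
```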

>[...]

Kind regards,

Hannah.

Tom Puverle

May 22, 2002, 6:15:30 PM
> The real problem with expression templates (other than the fact that
> they are unmaintainable -- I agree with you there) is that they lead
> to the situation described above, where the results of inlining become
> too complex for the optimizer. As an experiment, I once wrote a
> Matrix class whose operators returned nodes in an expression tree,
> rather than temporary objects. The goal was that everything would end
> up inlined, and that the compiler would generate a neat loop out of
> it. In practice, one almost immediately reached the situation where
> there was too much inlining, and the compilers I had at the time (g++
> and Sun CC) both gave up on even the simplest expressions.

I did that once too. My matrix class worked on an element-by-element basis.
There was an overloaded operator() for element access. The expression node's
operator()(int i, int j) would then return (e.g. for +)
subExpr1_(i,j) + subExpr2_(i,j), and similarly for other operators. All the
real work would then happen in the assignment operators of the destination
matrix - a "template for loop" that generated something like:

innerRep_[0][0] = ....
innerRep_[0][1] = ....
...
innerRep_[1][0] = ....
etc.

In the end the optimizer managed to expand the entire expression into just a
list of array accesses
( e.g. innerRep[0][0] = mat1[0][0] + mat2[0][0] + mat3[0][0]; etc. )
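
A minimal sketch of the scheme, under several assumptions: the names Sum, Matrix, rep_ and N are made up for illustration, only operator+ is shown, and ordinary loops stand in for the "template for loop", leaving any unrolling to the optimizer.

```cpp
#include <cstddef>

const std::size_t N = 2;  // hypothetical fixed dimension

// Expression node for element-wise addition: holds references to its two
// operands and computes a single element on demand.
template <typename L, typename R>
class Sum {
public:
    Sum(const L& l, const R& r) : l_(l), r_(r) {}
    double operator()(std::size_t i, std::size_t j) const
    {
        return l_(i, j) + r_(i, j);
    }

private:
    const L& l_;  // valid only within the full expression that built the node
    const R& r_;
};

class Matrix {
public:
    Matrix()
    {
        for (std::size_t i = 0; i < N; ++i)
            for (std::size_t j = 0; j < N; ++j)
                rep_[i][j] = 0.0;
    }
    double operator()(std::size_t i, std::size_t j) const { return rep_[i][j]; }
    double& operator()(std::size_t i, std::size_t j) { return rep_[i][j]; }

    // All the real work happens here: one pass over the destination,
    // evaluating the expression per element, with no temporary matrices.
    template <typename E>
    Matrix& operator=(const E& expr)
    {
        for (std::size_t i = 0; i < N; ++i)
            for (std::size_t j = 0; j < N; ++j)
                rep_[i][j] = expr(i, j);
        return *this;
    }

private:
    double rep_[N][N];
};

// a + b builds a node instead of computing a temporary matrix.
inline Sum<Matrix, Matrix> operator+(const Matrix& a, const Matrix& b)
{
    return Sum<Matrix, Matrix>(a, b);
}

// (a + b) + c wraps the existing node in another one.
template <typename L, typename R>
Sum<Sum<L, R>, Matrix> operator+(const Sum<L, R>& a, const Matrix& b)
{
    return Sum<Sum<L, R>, Matrix>(a, b);
}
```

With this, d = a + b + c evaluates each element as a(i,j) + b(i,j) + c(i,j) inside the assignment. The nodes hold references, so the expression must be consumed in the statement that builds it.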

To get down from 1 to 0 temporaries the matrix class used 2 data buffers and
an indicator of which one is valid.

It's a lot of generated code (not really usable for very large matrices) but
it surely is fast! Another advantage of it is that it might be easier for a
vectorising optimizer to do something with it...

Tom

Stefan Heinzmann

May 22, 2002, 6:15:46 PM
Francis Glassborow <francis.g...@ntlworld.com> wrote in message news:<edC2eYAp...@robinton.demon.co.uk>...

> In article <95e0e5ef.02052...@posting.google.com>, Stefan
> Heinzmann <stefan_h...@yahoo.com> writes
> >In a statically linked executable, the above point should be solved
> >with a reasonable toolset, as others have said. I am astonished
> >however that no one has yet mentioned that this can indeed be a problem
> >in dynamically loaded libraries or plug-in modules. Since they're
> >compiled and linked separately, they will of course contain their own
> >template instantiations.
> >
> >For example if my application consists of a number of separate modules
> >which are loaded dynamically (not uncommon in Windows-land), and I use
> >the standard library a lot (streams, strings and the like), I will
> >likely have instantiations of the same stuff in each module.
>
> But that is not a template issue, it is an issue with dynamically linked
> libraries.

Yes, but one that becomes worse with templates.

James Kanze

May 22, 2002, 6:23:41 PM