Code Bloat due to Templates

Scott Meyers

May 16, 2002, 10:07:58 AM

I often hear about "code bloat" arising from templates. I've been trying
to figure out exactly what this means. From what I can tell, there are
several different meanings, as follows, where Temp is a template and T, T1,
and T2 are type parameters.

1. [Multiple instantiations] Temp<T> is instantiated in more than one
translation unit, so when the objs are linked together, the exe has
more than one copy of Temp<T>'s member functions.

2. [Duplicate instantiations] Temp<T1> and Temp<T2> could share a
single underlying binary implementation, but they don't. The most
common example is when T1 and T2 are both pointer types, but it would
also apply if T1=int and T2=long on a machine where int and long are
the same size.

3. [Near-duplicate instantiations] Temp<T, n> and Temp<T, m> are
individually instantiated for non-type parameters n and m, even
though the resulting member functions are almost identical. This
would be the case for e.g., FixedSizeBuffer<int, 10> and
FixedSizeBuffer<int, 20>.

4. [Excessive inlining] Because many compilers require that all template
code be in header files, all such code is inlined, and that makes
executables bigger than they would be if template functions that are
large and frequently called could be outlined.

5. [Excessive instantiation] All the member functions of Temp<T> are
instantiated, even though only a few are called. (This is
nonstandard behavior, but, at least in the past, it was a problem
with some compilers.)

6. [Gratuitous types] Temp<T1> and Temp<T2> are both instantiated, but
if templates didn't exist, programmers would make do with a single
untemplatized implementation. For example, programmers instantiate
both Stack<int> and Stack<long>, but if they lacked templates, they'd
get by with only IntStack.

Are there other causes of code bloat? Of the different types of code
bloat, which are the most troublesome in practice?

Thanks,

Scott

[ Send an empty e-mail to c++-...@netlab.cs.rpi.edu for info ]
[ about comp.lang.c++.moderated. First time posters: do this! ]

Ingolf Steinbach

May 16, 2002, 7:19:48 PM

Scott Meyers wrote:
> 2. [Duplicate instantiations] Temp<T1> and Temp<T2> could share a
> single underlying binary implementation, but they don't. The most
> common example is when T1 and T2 are both pointer types, but it would
> also apply if T1=int and T2=long on a machine where int and long are
> the same size.

Are you sure? What about:

template <typename T> void foo(T tl, T tr)
{
*tl = *tr;
}

typedef char* T1;
typedef std::string* T2;

Both T1 and T2 are pointer types. Could they share the same
instantiation of foo()?
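
To make the question concrete, here is a minimal sketch of why the two instantiations cannot share one binary body even though both parameters are pointers of the same size. The names assign_through, copy_char and copy_string are hypothetical and only stand in for foo() above; the template is assumed to be used with raw pointers.

```cpp
#include <cassert>
#include <string>

// Same shape as foo() above: assign through two pointers of the same type.
// T is assumed to be a raw pointer type in this sketch.
template <typename T>
void assign_through(T dst, T src)
{
    assert(dst != nullptr && src != nullptr);
    *dst = *src;   // char*: a one-byte store; std::string*: a call to operator=
}

// Hypothetical helper that instantiates assign_through<char*>.
inline char copy_char(char from)
{
    char to = '\0';
    assign_through(&to, &from);
    return to;
}

// Hypothetical helper that instantiates assign_through<std::string*>.
inline std::string copy_string(std::string from)
{
    std::string to;
    assign_through(&to, &from);
    return to;
}
```

The source text of the two instantiations is identical, but one has to emit a plain store and the other a call into std::string, so sharing is only possible when the operations applied to the pointee happen to compile to the same code.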

Kind regards
Ingolf
--

Ingolf Steinbach Jena-Optronik GmbH
ingolf.s...@jena-optronik.de ++49 3641 200-147
PGP: 0x7B3B5661 213C 828E 0C92 16B5 05D0 4D5B A324 EC04

Francis Glassborow

May 16, 2002, 7:24:59 PM

In article <MPG.174cf495d...@news.hevanet.com>, Scott Meyers
<sme...@aristeia.com> writes

>I often hear about "code bloat" arising from templates. I've been trying
>to figure out exactly what this means. From what I can tell, there are
>several different meanings, as follows, where Temp is a template and T, T1,
>and T2 are type parameters.
>
> 1. [Multiple instantiations] Temp<T> is instantiated in more than one
> translation unit, so when the objs are linked together, the exe has
> more than one copy of Temp<T>'s member functions.

An implementation that does this is broken. Consider what will happen
with:

void foo(){ static int i; ...}

as a member function of the class template. The implementation HAS to be
able to remove duplicated function instantiations.
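
A small sketch of the requirement, under the assumption that the two caller functions below stand in for code living in two different translation units (here they share one file, so this only illustrates the observable behaviour, not the linker mechanism). The names call_from_a, call_from_b and check_shared_counter are hypothetical, and foo() is made static and given a return value purely so the effect can be observed.

```cpp
#include <cassert>

template <typename T>
struct Temp
{
    // Exactly one 'i' must exist per Temp<T> in the whole program, no matter
    // how many translation units instantiate Temp<T>::foo().
    static int foo()
    {
        static int i = 0;
        return ++i;
    }
};

// Stand-ins for callers that would normally live in separate .cpp files.
inline int call_from_a() { return Temp<int>::foo(); }
inline int call_from_b() { return Temp<int>::foo(); }

inline void check_shared_counter()
{
    assert(call_from_a() == 1);
    assert(call_from_b() == 2);        // same 'i' as the previous call
    assert(Temp<long>::foo() == 1);    // different instantiation, different 'i'
}
```

If each object file kept its own surviving copy of Temp<int>::foo(), the two callers would see separate counters, which is why duplicate copies have to be merged at link time.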

>
> 2. [Duplicate instantiations] Temp<T1> and Temp<T2> could share a
> single underlying binary implementation, but they don't. The most
> common example is when T1 and T2 are both pointer types, but it would
> also apply if T1=int and T2=long on a machine where int and long are
> the same size.

This is subtler. When dealing with pointer types the hoisting idiom
should be used so that most of the code is in a base class used for a
partial specialisation.

template <typename T> class X {
// whatever
};

template<> class X<void*>{
// fully specialise for void*
};

template <typename T>
class X<T*>: X<void*> {
// use hoisting idiom
};
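
Filled in, the outline above might look like the following sketch. The Stack class, its members and the use of std::vector are assumptions made for illustration; the only parts taken from the post are the three-layer structure (primary template, full specialisation for void*, partial specialisation for T* built on top of it). It assumes T is not const-qualified, since a const T* does not convert to void*.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Primary template: one full copy of the code per element type.
template <typename T>
class Stack
{
public:
    void push(const T& value) { items_.push_back(value); }
    T pop()
    {
        assert(!items_.empty());
        T top = items_.back();
        items_.pop_back();
        return top;
    }
    std::size_t size() const { return items_.size(); }
private:
    std::vector<T> items_;
};

// Full specialisation for void*: the single shared implementation.
template <>
class Stack<void*>
{
public:
    void push(void* p) { items_.push_back(p); }
    void* pop()
    {
        assert(!items_.empty());
        void* top = items_.back();
        items_.pop_back();
        return top;
    }
    std::size_t size() const { return items_.size(); }
private:
    std::vector<void*> items_;
};

// Partial specialisation for all other pointer types: thin casting wrappers
// that should inline away, leaving only the Stack<void*> code in the binary.
template <typename T>
class Stack<T*> : private Stack<void*>
{
    typedef Stack<void*> Base;
public:
    void push(T* p) { Base::push(p); }
    T* pop() { return static_cast<T*>(Base::pop()); }
    using Base::size;
};
```

With this in place, Stack<int*> and Stack<double*> both forward to the same Stack<void*> member functions, while Stack<int> still gets its own code from the primary template.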

The int/long problem can be dealt with via specialisation but doing so
would normally be reserved for places where code size became critical.

>
> 3. [Near-duplicate instantiations] Temp<T, n> and Temp<T, m> are
> individually instantiated for non-type parameters n and m, even
> though the resulting member functions are almost identical. This
> would be the case for e.g., FixedSizeBuffer<int, 10> and
> FixedSizeBuffer<int, 20>.

The hoisting idiom should certainly be considered for such cases.
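
For the non-type-parameter case, a sketch of the idea might look like this. BufferBase, attach() and the member names are hypothetical; only FixedSizeBuffer<T, N> comes from the original post. Everything that does not need N at compile time moves into a base class that is instantiated once per T.

```cpp
#include <cassert>
#include <cstddef>

// Size-independent logic: instantiated once per T, shared by every N.
template <typename T>
class BufferBase
{
public:
    bool push_back(const T& value)
    {
        if (size_ == capacity_) return false;   // buffer is full
        data_[size_++] = value;
        return true;
    }
    std::size_t size() const { return size_; }
    std::size_t capacity() const { return capacity_; }
    const T& operator[](std::size_t i) const
    {
        assert(i < size_);
        return data_[i];
    }
protected:
    BufferBase() : data_(nullptr), capacity_(0), size_(0) {}
    void attach(T* data, std::size_t capacity)
    {
        data_ = data;
        capacity_ = capacity;
    }
private:
    T* data_;
    std::size_t capacity_;
    std::size_t size_;
};

// Only the storage and a trivial constructor depend on N.
template <typename T, std::size_t N>
class FixedSizeBuffer : public BufferBase<T>
{
public:
    FixedSizeBuffer() { this->attach(storage_, N); }
    // Copying would alias the other object's storage, so this sketch forbids it.
    FixedSizeBuffer(const FixedSizeBuffer&) = delete;
    FixedSizeBuffer& operator=(const FixedSizeBuffer&) = delete;
private:
    T storage_[N];
};
```

FixedSizeBuffer<int, 10> and FixedSizeBuffer<int, 20> then differ only in their constructors and storage; push_back and operator[] exist once, in BufferBase<int>. The cost is an extra pointer and capacity field per object and a run-time bound instead of a compile-time constant.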

>
> 4. [Excessive inlining] Because many compilers require that all template
> code be in header files, all such code is inlined, and that makes
> executables bigger than they would be if template functions that are
> large and frequently called could be outlined.

Yes, and this is a real problem when programmers actually stuff the
implementation into the class template definition. This is one reason
why I advocate that the implementation should always be in its own file
even if you then #include it into the definition file.

>
> 5. [Excessive instantiation] All the member functions of Temp<T> are
> instantiated, even though only a few are called. (This is
> nonstandard behavior, but, at least in the past, it was a problem
> with some compilers.)

Actually it can still be a problem with explicit instantiation, but it
can be avoided by making the explicit instantiation file(s) into a
library.

>
> 6. [Gratuitous types] Temp<T1> and Temp<T2> are both instantiated, but
> if templates didn't exist, programmers would make do with a single
> untemplatized implementation. For example, programmers instantiate
> both Stack<int> and Stack<long>, but if they lacked templates, they'd
> get by with only IntStack.

Yes, but presumably IntStack would be for long, with the possibility
that the memory requirement for the data doubles on systems where int
and long are different sizes.

As I hope is clear, there is little reason for code bloat if those
designing and implementing templates are competent and the compiler is a
high quality product. Poor compilers, whether C or C++ or any other
language can generate excessive code, and poor programmers can write
code that will generate much more code than is necessary.

The point is that the code bloat (where it exists) is not the result of
C++ but the consequence of using inferior products or programmers.

--
Francis Glassborow ACCU
64 Southfield Rd
Oxford OX4 1PA +44(0)1865 246490
All opinions are mine and do not represent those of any organisation

Carl Daniel

May 16, 2002, 7:26:19 PM

My $0.02...

"Scott Meyers" <sme...@aristeia.com> wrote in message
news:MPG.174cf495d...@news.hevanet.com...

> I often hear about "code bloat" arising from templates. I've been trying
> to figure out exactly what this means. From what I can tell, there are
> several different meanings, as follows, where Temp is a template and T, T1,
> and T2 are type parameters.
>
> 1. [Multiple instantiations] Temp<T> is instantiated in more than one
> translation unit, so when the objs are linked together, the exe has
> more than one copy of Temp<T>'s member functions.

With the compilers I use, this doesn't happen.

>
> 2. [Duplicate instantiations] Temp<T1> and Temp<T2> could share a
> single underlying binary implementation, but they don't. The most
> common example is when T1 and T2 are both pointer types, but it would
> also apply if T1=int and T2=long on a machine where int and long are
> the same size.

I think this is the single most insidious source of code bloat - none of the
compilers I use will share the binary representation between distinct C++
types which happen to be structurally identical. Of course, this kind of
optimization can frequently be implemented manually by the programmer as
long as the compiler supports partial template specialization (PTS).

>
> 3. [Near-duplicate instantiations] Temp<T, n> and Temp<T, m> are
> individually instantiated for non-type parameters n and m, even
> though the resulting member functions are almost identical. This
> would be the case for e.g., FixedSizeBuffer<int, 10> and
> FixedSizeBuffer<int, 20>.

This would be nice to solve (at the compiler level), but can be mitigated by
appropriate construction of the template classes (using base class members,
etc).

>
> 4. [Excessive inlining] Because many compilers require that all template
> code be in header files, all such code is inlined, and that makes
> executables bigger than they would be if template functions that are
> large and frequently called could be outlined.

My impression is that compilers do a reasonably good job of choosing what's
inlined and what's not when the appropriate switches are used. Compilers
with whole-program-optimization are becoming more common, so hopefully this
will get even better in the future.

>
> 5. [Excessive instantiation] All the member functions of Temp<T> are
> instantiated, even though only a few are called. (This is
> nonstandard behavior, but, at least in the past, it was a problem
> with some compilers.)

This isn't a problem with the compilers I use.

>
> 6. [Gratuitous types] Temp<T1> and Temp<T2> are both instantiated, but
> if templates didn't exist, programmers would make do with a single
> untemplatized implementation. For example, programmers instantiate
> both Stack<int> and Stack<long>, but if they lacked templates, they'd
> get by with only IntStack.

If #2 and #3 are addressed, this becomes a non-issue. At present, it's a
programmer discipline issue if memory is tight enough that it matters.

>
> Are there other causes of code bloat? Of the different types of code
> bloat, which are the most troublesome in practice?
>
> Thanks,
>
> Scott

-cd

Vincent Finn

May 16, 2002, 7:27:51 PM

> 4. [Excessive inlining] Because many compilers require that all template
> code be in header files, all such code is inlined, and that makes
> executables bigger than they would be if template functions that are
> large and frequently called could be outlined.

Are you sure about this?
I remember a discussion about it in comp.lang.c++,
and the last comment was that templates are NOT always inlined.

The link below mightn't be clickable but if you paste it in it'll go to the
end of that discussion

http://groups.google.com/groups?q=templates+inline+group:comp.lang.c%2B%2B.*&hl=en&lr=&selm=GbBw8.344%24h02.145674%40news20.bellglobal.com&rnum=8

Vin

Garry Lancaster

May 17, 2002, 5:45:16 AM

Scott Meyers:

> I often hear about "code bloat" arising from templates.
> I've been trying to figure out exactly what this means.
> From what I can tell, there are several different
> meanings, as follows, where Temp is a template and
> T, T1, and T2 are type parameters.
>
> 1. [Multiple instantiations] Temp<T> is instantiated
> in more than one translation unit, so when the
> objs are linked together, the exe has
> more than one copy of Temp<T>'s member functions.

A decent linker should remove all but one of these,
surely?

> 2. [Duplicate instantiations] Temp<T1> and
> Temp<T2> could share a single underlying
> binary implementation, but they don't. The most
> common example is when T1 and T2 are both
> pointer types, but it would also apply if T1=int and
> T2=long on a machine where int and long are
> the same size.

A smart compiler may be able to spot this in a single
translation unit, but this is probably asking too much.
A smart linker may be able to spot this, but this is
almost certainly asking too much.

In some cases the programmer can intervene. In
the pointer case, a smart programmer can develop a
partial specialization for all pointer types that is
implemented using a shared void* based
implementation. This can be somewhat more
work than a straightforward, non-shared,
implementation and is often slower. This is a key
point.

So it depends.

> 3. [Near-duplicate instantiations] Temp<T, n> and
> Temp<T, m> are individually instantiated for
> non-type parameters n and m, even though the
> resulting member functions are almost identical. This
> would be the case for e.g., FixedSizeBuffer<int, 10> and
> FixedSizeBuffer<int, 20>.

Again, a smart programmer could factor out the common
code if there was any. For reasons that should be obvious,
this is not worth doing for inline functions that are genuinely
inlined.

> 4. [Excessive inlining] Because many compilers require
> that all template code be in header files,

Then they don't support explicit instantiation? You should
be able to write:

// foo.h
template <typename T> T foo(T); // Declaration.

// foo.cpp
template <typename T> T foo(T t) { return t; } // Definition.
template int foo<int>(int); // Explicit instantiation.

// bar.cpp
#include "foo.h"
int main()
{
int i = foo( 1 );
}

Of course this isn't as versatile as implicit instantiation,
but still it shows templates don't always have to live
in headers. Even without export. Note that the standard
is a little confusing as to whether this is allowed: a
future revision will be clearer.

> all such code is inlined, and that makes
> executables bigger than they would be if template
> functions that are large and frequently called
> could be outlined.

What about function template definitions declared
outside a class definition without the inline keyword?
For example,

template <typename T>
void blah(T t) {} // Out-of-line template definition.

Or do you mean that even without the inline keyword,
optimizers can still choose to inline the code and that
you believe they do that more aggressively than they
should?

> 5. [Excessive instantiation] All the member functions
> of Temp<T> are instantiated, even though only a
> few are called. (This is nonstandard behavior, but,
> at least in the past, it was a problem with some
> compilers.)

I agree that's less of a problem now. The standard
is a bit weird about in-class friend definitions though:
these are supposed to be instantiated when their
containing class is instantiated, even if they are
never called, but most compilers seem to ignore
that rule. Hopefully the rule will be changed in line
with Defect Report #329.

> 6. [Gratuitous types] Temp<T1> and Temp<T2> are
> both instantiated, but if templates didn't exist,
> programmers would make do with a single
> untemplatized implementation. For example,
> programmers instantiate both Stack<int> and
> Stack<long>, but if they lacked templates, they'd
> get by with only IntStack.

I think this could be common. Although again
the problem could be solved by smart compilers
or smart programmers.

> Are there other causes of code bloat? Of the
> different types of code bloat, which are the most
> troublesome in practice?

When we write many templates we make a choice
about the implementation. There are two ways to
implement a template with template parameter T:

1. Fast. Where T (value type) is used in most member
functions. Emitted code likely to be very fast, but there
is little scope for sharing the implementation between
different instantiations.

2. Small. Where void* is used in most member functions
with a thin inline wrapper that casts from void*<=>T*.
Likely to be a bit slower and more complex but the
void* parts can be factored out into non-template functions
shared between all instantiations of the template.

Most templates I see choose option 1 (fast), but option
2 (small) is the choice we tended to make in C and is the
closest to the way most containers are implemented
in Java and C# (with their base Object types and casting).

A typical standard library contains many good examples
of an option 1 approach. However, it would be
possible to implement std::list using a linked list of
generic_node.

struct generic_node
{
generic_node *prev, *next;
void* obj; // Note: not T obj;
};

(allowing parts of the implementation to be shared)
and no inline functions (though see note 1). This is,
as far as I'm aware, never done, probably because the
result would be considerably slower.
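
As a rough sketch of what such a "small" list could look like (this is not how any standard library I know of implements std::list; list_core and small_list are hypothetical names), the node handling is written once in a non-template class and the template only adds casts and ownership of the heap-allocated elements:

```cpp
#include <cassert>
#include <cstddef>

struct generic_node
{
    generic_node* prev;
    generic_node* next;
    void* obj;   // Note: not T obj;
};

// Non-template core: compiled once, shared by every element type.
// It owns the nodes but not the objects they point to.
class list_core
{
public:
    list_core() : head_(nullptr), tail_(nullptr), size_(0) {}
    ~list_core() { while (head_ != nullptr) pop_front(); }
    list_core(const list_core&) = delete;
    list_core& operator=(const list_core&) = delete;

    void push_back(void* obj)
    {
        generic_node* n = new generic_node;
        n->prev = tail_;
        n->next = nullptr;
        n->obj = obj;
        if (tail_ != nullptr) tail_->next = n; else head_ = n;
        tail_ = n;
        ++size_;
    }
    // Unlinks the first node and returns its payload.
    void* pop_front()
    {
        assert(head_ != nullptr);
        generic_node* n = head_;
        void* obj = n->obj;
        head_ = n->next;
        if (head_ != nullptr) head_->prev = nullptr; else tail_ = nullptr;
        delete n;
        --size_;
        return obj;
    }
    void* front() const { assert(head_ != nullptr); return head_->obj; }
    std::size_t size() const { return size_; }
private:
    generic_node* head_;
    generic_node* tail_;
    std::size_t size_;
};

// Thin typed wrapper: owns the T objects, casts on the way in and out.
template <typename T>
class small_list
{
public:
    small_list() {}
    ~small_list() { while (core_.size() != 0) pop_front(); }
    void push_back(const T& value) { core_.push_back(new T(value)); }
    T& front() { return *static_cast<T*>(core_.front()); }
    void pop_front() { delete static_cast<T*>(core_.pop_front()); }
    std::size_t size() const { return core_.size(); }
private:
    list_core core_;
};
```

The speed cost mentioned above is visible here: every element needs a second heap allocation and every access goes through an extra pointer, in exchange for node-manipulation code that exists only once.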

In summary, provided you have a linker that deals
correctly with what you term multiple instantiations,
the main cause of template bloat is the way
programmers choose to write template code. Almost
a classic time vs. space trade-off in fact except that
the template system also makes the fast version far
easier to write than the small version, something that is
not the case when programming in languages like
C, Java and C#.

Kind regards

Garry Lancaster
Codemill Ltd
Visit our web site at http://www.codemill.net

NOTES:

1. We'd have to be a bit careful removing all inline
functions in the name of bloat reduction. Sometimes
an inline expansion can actually be smaller than
the equivalent function call setup.

Daniel T.

May 17, 2002, 5:46:18 AM

Scott Meyers <sme...@aristeia.com> wrote:

>Are there other causes of code bloat? Of the different types of code
>bloat, which are the most troublesome in practice?

In VC++, if you optimize for speed and have a lot of global std::string
objects which were initialized with something other than the default
c_tor, you will get an executable that is an extra 250 bytes in size for
each string initialized. I'm not sure why this happens, and it doesn't
seem to affect the size of the program in RAM but the executable is
bigger and it was justification enough for my boss to ban the use of
std::string at our company.

--
Improve your company's understanding of objects...
Hire me. <http://home1.gte.net/danielt3/resume.html>

JKB

May 17, 2002, 5:47:04 AM

"Scott Meyers" <sme...@aristeia.com> wrote ...

> I often hear about "code bloat" arising from templates. I've been trying

> to figure out exactly what this means. [various bloats listed here]

> Are there other causes of code bloat? Of the different types of code
> bloat, which are the most troublesome in practice?

okay, here's my two cents:

[Unreadable headers] Putting template code in the class declaration
significantly obscures the interface. Moving the bodies outside the class
declaration helps somewhat, but makes them even more syntactically bizarre
than they are inside the class. And it's not always possible - see the
"templatized assignment" thread. This impacts readability far more than the
bodies for inlined functions do, because those are normally quite small.

[Code dependency] Having function bodies in the header files, whether
they're inside the class declaration or not, causes other files to depend on
them. When an implementation detail is changed, those other files are
recompiled when they really don't need to be. Granted that fast processors
have reduced the importance of build speed, but this is still an issue on
large code bases.

[Insufficient instantiation] When a template class is being written, it's
hard to know if it even compiles correctly. Obviously this is
compiler-specific, but I've often seen errors in template code go undetected
for quite some time. Then somebody happens to specialize the right thing
and the code won't even compile.
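
One possible way to catch such errors earlier, sketched here with a hypothetical Accumulator class, is to put an explicit instantiation for a representative type into a test source file. An explicit instantiation of a class template asks the compiler to instantiate all of its members, whether or not anything calls them, so a type-dependent mistake in a rarely used member surfaces immediately.

```cpp
#include <cassert>
#include <vector>

template <typename T>
class Accumulator
{
public:
    Accumulator() : total_() {}
    void add(const T& value) { total_ += value; }
    T total() const { return total_; }

    // Rarely called member: with implicit instantiation alone, an error in
    // this body that depends on T could stay hidden until a caller uses it.
    void add_all(const std::vector<T>& values)
    {
        for (typename std::vector<T>::const_iterator it = values.begin();
             it != values.end(); ++it)
            add(*it);
    }
private:
    T total_;
};

// Explicit instantiation: every member of Accumulator<int> is compiled here.
template class Accumulator<int>;

inline void accumulator_demo()
{
    Accumulator<int> acc;
    acc.add(2);
    acc.add_all(std::vector<int>(3, 4));   // three elements, each 4
    assert(acc.total() == 14);
}
```

This only proves the template compiles for the types that are explicitly instantiated; it says nothing about other argument types.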

These aren't precisely 'bloat' in the sense you asked, but they are ways in
which template instantiation issues impact the development process.
-- jkb

Giovanni Bajo

May 17, 2002, 5:50:18 AM

"Scott Meyers" <sme...@aristeia.com> ha scritto nel messaggio
news:MPG.174cf495d...@news.hevanet.com...

I'll just comment along.

> 1. [Multiple instantiations]

Modern compilers should handle this very well. EDG has a "prelinker" to
trace which template has been instantiated where, GCC has (or used to have?) the
template repository, etc. I'm not sure how you could end up with two copies
of the same code, since there would be a name clash at link time (we're
speaking of course about non-inline code).

> 2. [Duplicate instantiations]

Another condition could be when Temp<>::F() does not rely in any meaningful
way on the template parameter. For example:

template <class T>
class FixedArray
{
T a[100];
bool flag;

public:
[.....]

void SetFlag(bool value)
{ flag = value; }
};

I put the flag variable after a[] on purpose, because I think that
offsetof(flag) could be passed as a hidden parameter to a unified
instantiation of SetFlag(), to avoid different instantiations just because
sizeof(T) changes. This case becomes even more interesting when your template
gets two parameters, and a function relies only on one of them.
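
Until a compiler does this automatically, the programmer can get the same effect by hand. In the sketch below the parameter-independent member moves into a non-template base class (FlagHolder, GetFlag, operator[] and the constructors are hypothetical additions). Note that this changes the layout: the flag now sits before a[] rather than after it.

```cpp
#include <cassert>
#include <cstddef>

// Non-template base: SetFlag() and GetFlag() are compiled exactly once.
class FlagHolder
{
public:
    FlagHolder() : flag(false) {}
    void SetFlag(bool value) { flag = value; }
    bool GetFlag() const { return flag; }
private:
    bool flag;
};

template <class T>
class FixedArray : public FlagHolder
{
    T a[100];

public:
    FixedArray() : a() {}   // value-initialise all 100 elements
    T& operator[](std::size_t i)
    {
        assert(i < 100);
        return a[i];
    }
};
```

FixedArray<int>::SetFlag and FixedArray<double>::SetFlag now name the very same function, FlagHolder::SetFlag, so no per-type copy can be generated for it.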

> 3. [Near-duplicate instantiations] Temp<T, n> and Temp<T, m> are

Even in this case, most code could be unified by passing the numeric
argument as a hidden parameter to the member function. On the other hand, we
must be careful here, because it could break some optimizations (e.g.
CircularBuffer<int, 128> might rely on the fact that %128 is very fast,
while passing 128 as a hidden parameter obviously breaks this optimization).
The usual space/speed tradeoff applies here.
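
The tradeoff can be sketched with two index-wrapping helpers (wrap_static and wrap_dynamic are hypothetical names). Both compute the same result; the difference is what the optimizer is allowed to assume, and whether it actually replaces the modulo with a bit mask depends on the compiler and its settings.

```cpp
#include <cassert>
#include <cstddef>

// Capacity known at compile time: one instantiation per N, but for a power
// of two such as 128 the compiler is free to turn the modulo into a mask.
template <std::size_t N>
std::size_t wrap_static(std::size_t index)
{
    return index % N;
}

// Capacity passed at run time, which is what a "hidden parameter" amounts to:
// a single shared function, but the divisor is no longer a constant.
inline std::size_t wrap_dynamic(std::size_t index, std::size_t capacity)
{
    assert(capacity != 0);
    return index % capacity;
}
```

For example, wrap_static<128>(130) and wrap_dynamic(130, 128) both yield 2; only the first gives the compiler the constant it needs for strength reduction.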

> 4. [Excessive inlining]

Actually, a template member function is inlined only if it is defined within
the class definition, otherwise it is not (unless you explicitly specify
inline), just like in normal classes. The code must be in "header" files
because the compiler needs to be able to instantiate it, but it won't
necessarily inline it unless needed/required.

> 5. [Excessive instantiation]

Not only is it nonstandard, but it even breaks existing code that relies on
the fact that member functions shall not be instantiated until their use. I
don't worry much about this point.
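
A minimal sketch of code that relies on this rule (Pair, NoLess and the demo function are hypothetical): the class template has a member that is only valid for types with operator<, yet it can still be used with a type that lacks it, as long as nobody calls that member.

```cpp
#include <cassert>

struct NoLess { int id; };   // hypothetical type without operator<

template <typename T>
class Pair
{
public:
    Pair(const T& a, const T& b) : first_(a), second_(b) {}
    const T& first() const { return first_; }

    // Requires T to support operator<. Its body is instantiated only if
    // some code actually calls it.
    const T& smaller() const { return second_ < first_ ? second_ : first_; }
private:
    T first_;
    T second_;
};

inline void lazy_instantiation_demo()
{
    NoLess a = {1};
    NoLess b = {2};
    Pair<NoLess> p(a, b);          // fine: smaller() is never instantiated
    assert(p.first().id == 1);

    Pair<int> q(5, 3);
    assert(q.smaller() == 3);      // fine: int has operator<
}
```

A compiler that instantiated every member eagerly would reject Pair<NoLess> outright, and so would an explicit instantiation such as "template class Pair<NoLess>;".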

> 6. [Gratuitous types] Temp<T1> and Temp<T2> are both instantiated, but
> if templates didn't exist, programmers would make do with a single
> untemplatized implementation. For example, programmers instantiate
> both Stack<int> and Stack<long>, but if they lacked templates, they'd
> get by with only IntStack.

Yes, this can be a problem, especially when working in a team (I use
stack<int>, you use stack<long>). Not even sure how this could be fixed, if
not by the programmers themselves.

> Are there other causes of code bloat? Of the different types of code
> bloat, which are the most troublesome in practice?

What about source code bloat? <g>
C++ template syntax is _verbose_.

Giovanni Bajo

Daniel Miller

May 17, 2002, 6:51:46 AM

Scott Meyers wrote:

> I often hear about "code bloat" arising from templates. I've been trying
> to figure out exactly what this means. From what I can tell, there are
> several different meanings, as follows, where Temp is a template and T, T1,
> and T2 are type parameters.
>
> 1. [Multiple instantiations] Temp<T> is instantiated in more than one
> translation unit, so when the objs are linked together, the exe has
> more than one copy of Temp<T>'s member functions.
>
> 2. [Duplicate instantiations] Temp<T1> and Temp<T2> could share a
> single underlying binary implementation, but they don't. The most
> common example is when T1 and T2 are both pointer types, but it would
> also apply if T1=int and T2=long on a machine where int and long are
> the same size.

For the purpose of my design-time code-bloat reason #7 below, I am going to
consider this purely a compile-time deficiency, where (my compile-time
reinterpretation of) #2 hopes for a compiler template-engine which is clever
enough to notice that the only difference between two or more compiler-generated
specializations of the same template is in its own information-base related to
types, not in the resulting machine-code, especially in the resulting
machine-code of out-of-line functions. Hence, I effectively explicitly name
this one [Duplicate instantiations at compile-time] to avoid overlap with the
design-time focus of my reason #7 below.

> 3. [Near-duplicate instantiations] Temp<T, n> and Temp<T, m> are
> individually instantiated for non-type parameters n and m, even
> though the resulting member functions are almost identical. This
> would be the case for e.g., FixedSizeBuffer<int, 10> and
> FixedSizeBuffer<int, 20>.
>
> 4. [Excessive inlining] Because many compilers require that all template
> code be in header files, all such code is inlined,

You might want to be more precise in your wording here, because the
state-of-affairs does not need to be exactly as worded. Your wording implicitly
encourages the reader to think of having only header files instead of having
both header files (e.g., with a .h/.hpp/.hxx suffix) and template function-body
files (e.g., using RogueWave's local conventions .cc suffix for #included
template function-body files as opposed to their .h and .cpp file extensions for
header-files and source-files, respectively). I think what you should write is
something to the effect of:

"Because many compilers require that all template code be made available to
the compiler at compile-time before the point-of-usage, common practice is to
put the body of template functions in files which are #included. Because of the
#include similarity to canonical header-files, some people consider these
#included template-function-body file-content to be header file-content.
Because this #included template-function-body file-content which **define
implementation** are confused for header file-content which **declare
interface**, some people simply place all the function bodies within the class
declarations because this is where they typically see function-bodies in
header-files placed outside of the context of templates. Such functions
inappropriately declared in the class declaration are thus inlined
inappropriately instead of being out-of-lined in a #included
template-function-body file."

> and that makes
> executables bigger than they would be if template functions that are
> large and frequently called could be outlined.

Again, you might want to be more precise in your wording here, because such
template member-functions can in fact be out-of-lined by the aforementioned
technique. The "if [they] could be" implies (to at least some of us readers)
that they cannot be out-of-lined (when in fact they can). See RogueWave's
products for a demonstration of this portability practice.

> 5. [Excessive instantiation] All the member functions of Temp<T> are
> instantiated, even though only a few are called. (This is
> nonstandard behavior, but, at least in the past, it was a problem
> with some compilers.)
>
> 6. [Gratuitous types] Temp<T1> and Temp<T2> are both instantiated, but
> if templates didn't exist, programmers would make do with a single
> untemplatized implementation. For example, programmers instantiate
> both Stack<int> and Stack<long>, but if they lacked templates, they'd
> get by with only IntStack.
>
> Are there other causes of code bloat?

7. [Improper factoring out a type-parameterized interface at design-time]
As discussed by Stroustrup in _D&E_, parameterized-type-based
nontrivial-software should admit that there are two layers: 1) the type-safe
public-interface layer which uses templates for type-safety ricocheting off to
2) some inner layer which does *not* get expanded over & over & over again which
is coded without any (or as much) type-safety by a wise & informed elite who
strictly promise to not abuse the lack of type-safety (or at least not abuse the
lack of type-safety in the inner layer to the point that a bug is observable
when using the public type-parameterized interface).

> Of the different types of code
> bloat, which are the most troublesome in practice?

Over the years, I qualitatively feel that #1, #4, #5, and #7 are the most
frequent causes of bloat that I have personally observed. But then again my
presence might be tainting the data because wherever I am, everyone who works with
me knows that I am vigilant & insistent about minding the store when it comes to
template code-bloat. These four are the topics 1) which I can do little about
without being an author of a compiler or 2) on which I can apply firm
enforcement pressure at the right time.

Although #4 might be frequently causing problems, it can easily be stamped out
with education & discipline (and strictly-enforced coding conventions). (And I
do stamp it out.)

Although #7 might be frequently causing problems, it is hard to teach every last
person how to design software perfectly.

Although #2 might in fact be frequently causing problems which I have never
thought about enough to notice, I suspect that all code-bloat situations in #2
could be considered deficiencies at design-time regarding too little factoring
out of the type-parameterized layer from some inner layer instead of expecting
the compiler to be extra clever. (Or equivalently for #2 the ball ought to be
in the C++ programmers'/engineers' court, not in the compiler vendors' court.)

This leaves me with #1 and #5 on my somebody-ought-to-do-something-about-that
list for each & every C++ compilation environment, especially in these days of
heavy use of STL and ever-increasing dependence on templates in the Loki & Boost
libraries. Sun's SparcWorks/Workshop/Forte C++ compiler has solved #1 very well
for some time now (except for the fact that the build-avoidance capabilities
of Forte's template-engine fight with the extensive build-avoidance
capabilities within ClearCase's clearmake). Historically GNU g++ has suffered
especially harshly from #1, but I have not checked in lately on how recent
revisions of g++ are doing in that regard.

Scott Meyers

May 17, 2002, 6:57:06 AM

On 16 May 2002 19:24:59 -0400, Francis Glassborow wrote:
> As I hope is clear, there is little reason for code bloat if those
> designing and implementing templates are competent and the compiler is a
> high quality product. Poor compilers, whether C or C++ or any other
> language can generate excessive code, and poor programmers can write
> code that will generate much more code than is necessary.

Uh huh. I'm well aware of the C++ "blame the victim" party line on code
bloat. But that's not what I asked about. This is what I asked:

Are there other causes of code bloat? Of the different types of code
bloat, which are the most troublesome in practice?

IME, many people -- MANY people -- avoid templates because of concerns
about code bloat. Many of these many have first hand experience with code
bloat. Denying the existence of the problem doesn't help them any.
Telling them it is all in their heads doesn't help any. Telling them to
rewrite the libraries they must use does not help them any. Telling them
their compilers are broken does not help them any.

If we want to help them, we must first understand what they mean when they
complain about "code bloat." That's why I want to know (1) if I've
overlooked any meanings and (2) which of the many meanings are the most
important.

Scott

JKB

May 17, 2002, 7:01:49 AM

> Scott Meyers <sme...@aristeia.com> writes
> >I often hear about "code bloat" arising from templates.

> > 4. [Excessive inlining] Because many compilers require that all template
> > code be in header files, all such code is inlined, and that makes
> > executables bigger than they would be if template functions that are
> > large and frequently called could be outlined.

>"Francis Glassborow" <francis.g...@ntlworld.com> wrote

> Yes, and this is a real problem when programmers actually stuff the
> implementation into the class template definition. This is one reason
> why I advocate that the implementation should always be in its own file
> even if you then #include it into the definition file.

But compilers are always free to not inline a call, even if the function
body is available. See also 'register'. Is there a history of compilers
making appallingly bad decisions on this?

And why would you break implementation into a separate file that's #included
back in? That makes it all the same translation unit, and the compiler
doesn't care what source file the code comes from. Moving function bodies
outside the class declaration can help readability, but I don't see where it
affects language semantics at all.

-- jkb

Francis Glassborow

May 17, 2002, 6:24:55 PM

In article <MPG.174e391de...@news.hevanet.com>, Scott Meyers
<sme...@aristeia.com> writes

>On 16 May 2002 19:24:59 -0400, Francis Glassborow wrote:
> > As I hope is clear, there is little reason for code bloat if those
> > designing and implementing templates are competent and the compiler is a
> > high quality product. Poor compilers, whether C or C++ or any other
> > language can generate excessive code, and poor programmers can write
> > code that will generate much more code than is necessary.
>
>Uh huh. I'm well aware of the C++ "blame the victim" party line on code
>bloat. But that's not what I asked about. This is what I asked:
>
> Are there other causes of code bloat? Of the different types of code
> bloat, which are the most troublesome in practice?

However I am not alone in responding to your list. I think it is
important to distinguish between inherent problems within the language
(the case of instantiating a template for int and long when they are the
same size could be such a case) and problems brought about by failure to
understand the consequences of particular code. Designing good templates
is a highly skilled task and should be appreciated as such. Producing
compilers that meet the requirements of the Standard is also a skilled
task. Adding in good optimisation is what sets a compiler ahead of
others that otherwise correctly implement the language. Blaming the
language because compilers do not do what is required (only compile
'used' member functions from class templates etc.) is unfair.

The point that needs to be made is that in almost all cases where
templates are accused of creating code bloat the fault is either a
compiler that fails in its responsibilities, or a programmer who is
working beyond their skills.

If the myth that templates inherently cause code bloat is not laid to
rest, people will continue to just accept poor products, and poor
workers. Providing a list such as the one you have is a service exactly
because we can identify how to address those problems if they occur in
the work environment.

I am quite willing to accept blame (as part of WG 21) for things we got
wrong but I am not willing to accept blame for people using poor tools
or failing to recognise the inadequacies of their skills. Until those
responsible accept fault we can do little to improve the situation.

(Note that that is why I make no apology for writing highly critical, if
brief, reviews of the numerous books aimed at novices and just post-novices
that fail to present good technique.)

>
>IME, many people -- MANY people -- avoid templates because of concerns
>about code bloat. Many of these many have first hand experience with code
>bloat. Denying the existence of the problem doesn't help them any.
>Telling them it is all in their heads doesn't help any. Telling them to
>rewrite the libraries they must use does not help them any. Telling them
>their compilers are broken does not help them any.

But telling them that
1) there are better idioms that solve many of the problems, and
2) modern compilers do much better than the ones they had even three
years ago
might help.

I have no problem with the company that forbade the use of templates
five years ago. I have serious problems with those that do not
reconsider that position on an annual basis. Internal coding standards
need re-examination, not in their entirety but wherever decisions have
been made on a practical basis of the tools being currently used.

Telling people that the problem with most code bloat is a quality issue
that cannot be fixed by the language but can be fixed by better tools
and better training admits the problem and tells how to solve it.
Blaming the language designers only helps where the problem really is
poor language design. Actually I think quite a bit of the problem with
templates is the opaqueness of the required syntax, and that we should
try to address that.

>
>If we want to help them, we must first understand what they mean when they
>complain about "code bloat." That's why I want to know (1) if I've
>overlooked any meanings and (2) which of the many meanings are the most
>important.

OK, but the motive was not clear from your post.

--
Francis Glassborow ACCU
64 Southfield Rd
Oxford OX4 1PA +44(0)1865 246490
All opinions are mine and do not represent those of any organisation

[ Send an empty e-mail to c++-...@netlab.cs.rpi.edu for info ]

Francis Glassborow

May 17, 2002, 6:25:13 PM

In article <ue8hk7l...@corp.supernews.com>, JKB
<burr...@seanet.com> writes

Really? When I provide in-class definitions they are treated exactly as if
I had provided an out-of-class definition and prefixed it with inline. But
there is no way that I can provide an in-class definition and require
that it not be inline.

By separating template class definition and implementation I

1) ensure that inline becomes explicit and is restricted to those places
where I want to give that advice to the compiler;

2) can use other instantiation mechanisms such as explicit
instantiation and 'export' (even though I am less than enthusiastic
about that);

3) gain the option of compiling a TU against just the template
definition until such time as I wish to link the whole program to
produce an executable.
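
A minimal sketch of that layout, assuming a hypothetical Stack<T> class
template. The file names in the comments (stack.h, stack_impl.h) are
illustrative only, and both parts are shown in one listing so that it
compiles as it stands.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// ---- stack.h (hypothetical): class template definition, no bodies ----
template <typename T>
class Stack {
public:
    void push(const T& value);
    T pop();
    std::size_t size() const;
private:
    std::vector<T> items_;
};

// ---- stack_impl.h (hypothetical): #included by stack.h or by one TU ----
// Only size() is marked inline; the keyword is now explicit advice.
template <typename T>
inline std::size_t Stack<T>::size() const { return items_.size(); }

// push() and pop() are ordinary, non-inline member functions.
template <typename T>
void Stack<T>::push(const T& value) { items_.push_back(value); }

template <typename T>
T Stack<T>::pop() {
    T top = items_.back();
    items_.pop_back();
    return top;
}

// Explicit instantiation: would normally live in a single .cpp file.
template class Stack<int>;
```

With this split, a translation unit that only needs the interface can be
compiled against the first part alone, provided some other unit supplies
the explicit instantiation at link time.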

--
Francis Glassborow ACCU
64 Southfield Rd
Oxford OX4 1PA +44(0)1865 246490
All opinions are mine and do not represent those of any organisation


Ivan Vecerina

May 17, 2002, 6:26:42 PM

"Scott Meyers" <sme...@aristeia.com> wrote in message
news:MPG.174cf495d...@news.hevanet.com...

> I often hear about "code bloat" arising from templates. I've been trying
> to figure out exactly what this means. From what I can tell, there are
> several different meanings, as follows, where Temp is a template and T, T1,
> and T2 are type parameters.

.... (not much to add to the list, except that item 1 isn't supposed to
happen and should be rare) ...

> Are there other causes of code bloat? Of the different types of code
> bloat, which are the most troublesome in practice?

Item 1 would be my guess, as I suspect that this is what slows down
compilation in the environment I use (see my first comment below).

I only have two related thoughts:

- I often hear 'code bloat' used with an altered meaning, which could be
called 'compile-time bloat': many instantiations of templates are made in
every file that uses a header, which leads to large 'obj' files and slows
compile & link time, even if the linker will strip them from the executable.

- how many of the actual code bloat problems would be eliminated by a smart
linker that merged identical code sections?
(of course, according to the standard, functions whose address is used could
not be merged, but they could be converted into forwarding stubs?)

Regards.

--
Ivan Vecerina, Dr. med. <> http://www.post1.com/~ivec
Soft Dev Manger, XiTact <> http://www.xitact.com
Brainbench MVP for C++ <> http://www.brainbench.com

Vincent Finn

May 17, 2002, 6:26:59 PM

> > 4. [Excessive inlining] Because many compilers require that all template
> > code be in header files, all such code is inlined, and that makes
> > executables bigger than they would be if template functions that are
> > large and frequently called could be outlined.
>
> Yes, and this is a real problem when programmers actually stuff the
> implementation into the class template definition. This is one reason
> why I advocate that the implementation should always be in its own file
> even if you then #include it into the definition file.

But surely if you #include it in the header file there is no real difference.
Is there any actual benefit?

I did use the #include for a while but went back to putting the whole
code in the header because people found it more confusing the other way!

Ernest Friedman-Hill

May 17, 2002, 8:54:54 PM

Scott Meyers wrote:
> 1. [Multiple instantiations] Temp<T> is instantiated in more than one
> translation unit, so when the objs are linked together, the exe has
> more than one copy of Temp<T>'s member functions.

This one used to be a killer, but compilers have gotten *much* better.
I was on a project about 9 years ago that tried to make heavy use of
templates with g++ (version 1.7. something, at the time, I think?) Anyway,
the executable was growing out of control and link times were rising
exponentially. We were building 25 megabyte executables by the time we
figured out how to manage this with explicit instantiations -- then it
shrank by a factor of 25 or so.

---------------------------------------------------------
Ernest Friedman-Hill
Distributed Systems Research Phone: (925) 294-2154
Sandia National Labs FAX: (925) 294-2234
Org. 8920, MS 9012 ejf...@ca.sandia.gov
PO Box 969 http://herzberg.ca.sandia.gov
Livermore, CA 94550

Bob Archer

May 17, 2002, 8:55:16 PM

In article <ue8hk7l...@corp.supernews.com>, burr...@seanet.com
says...

> And why would you break implementation into a separate file that's #included
> back in? That makes it all the same translation unit, and the compiler
> doesn't care what source file the code comes from. Moving function bodies
> outside the class declaration can help readability, but I don't see where it
> affects language semantics at all.

We split implementation into a separate file because it gave us more
flexibility, particularly when swapping code between compilers that all
had a slightly different set of rules for template instantiation. We
could selectively include the implementation depending on the exact
circumstances (which compiler, was this part of an explicit template
instantiation file etc.)

We also separated out inline functions into a separate file. In fact we
had seven different file types:

.h    Header files for template and non-template classes and functions
.cpp  Implementation for non-inlined non-template classes and functions
.ipp  Implementation for inlined non-template classes and functions
.ctf  Implementation for non-inlined template functions
.itf  Implementation for inlined template functions
.ctp  Implementation for non-inlined template classes
.itp  Implementation for inlined template classes

This gave us maximum flexibility for different compilers, different
builds (the release build inlined things, the debug build didn't) and
different template instantiation policies.

It was a pain to set up and maintain but was probably worth it in the
end.

Bob

Greg Milford

May 17, 2002, 8:56:10 PM

> 2. [Duplicate instantiations] Temp<T1> and Temp<T2> could share a
> single underlying binary implementation, but they don't. The most
> common example is when T1 and T2 are both pointer types, but it would
> also apply if T1=int and T2=long on a machine where int and long are
> the same size.
>

I have wondered since being faced with this problem why STL container
implementations did not come with partial template specializations that
provide a single void* implementation for pointer types. This would seem
to be a natural feature for embedded software, but end users will rarely
attempt to add specializations to library code since doing so locks them in
to that version. A quick look at our project shows that we have 600+
instantiations of vectors containing pointer types. Throw in the other
containers and the code generated here really adds up against our limited
flash space. Perhaps library writers are waiting for compilers to make this
optimization and vice versa.

> 3. [Near-duplicate instantiations] Temp<T, n> and Temp<T, m> are
> individually instantiated for non-type parameters n and m, even
> though the resulting member functions are almost identical. This
> would be the case for e.g., FixedSizeBuffer<int, 10> and
> FixedSizeBuffer<int, 20>.
>
> 4. [Excessive inlining] Because many compilers require that all template
> code be in header files, all such code is inlined, and that makes
> executables bigger than they would be if template functions that are
> large and frequently called could be outlined.
>

Explicitly instantiating the templates (by function) in separate compilation
units is our workaround for this. Compile time is also dramatically
reduced.
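
A minimal sketch of per-function explicit instantiation, assuming a
hypothetical Buffer<T> template. The file names in the comments are
illustrative only; everything is shown in one listing so that it compiles
as it stands.

```cpp
#include <cassert>

// ---- buffer.h (hypothetical): declarations only, seen by all users ----
template <typename T>
struct Buffer {
    T data[4];
    void fill(const T& value);
    T sum() const;
};

// ---- buffer_inst.cpp (hypothetical): bodies plus instantiations ----
template <typename T>
void Buffer<T>::fill(const T& value) {
    for (int i = 0; i < 4; ++i) data[i] = value;
}

template <typename T>
T Buffer<T>::sum() const {
    T total = T();
    for (int i = 0; i < 4; ++i) total += data[i];
    return total;
}

// Instantiate only the member functions that are actually needed,
// one function at a time, rather than the whole class.
template void Buffer<int>::fill(const int&);
template int Buffer<int>::sum() const;
```

Because users see only declarations, the function bodies are compiled once
in the instantiation unit and cannot be inlined into every caller.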

> 5. [Excessive instantiation] All the member functions of Temp<T> are
> instantiated, even though only a few are called. (This is
> nonstandard behavior, but, at least in the past, it was a problem
> with some compilers.)
>

While our compiler is somewhat dated, what we saw was that explicit
instantiation by class caused this version of code bloat. The linker would
not remove all unused functions in this case. It should not IMO do this as
default behavior, but an option for it would be nice.

> 6. [Gratuitous types] Temp<T1> and Temp<T2> are both instantiated, but
> if templates didn't exist, programmers would make do with a single
> untemplatized implementation. For example, programmers instantiate
> both Stack<int> and Stack<long>, but if they lacked templates, they'd
> get by with only IntStack.
>

CATHLibCPP was a library for Acorn that was an attempt to aggressively hoist
template bloat away. Past posts about experience with it went unanswered,
so I assume it never reached prime time.

Thanks for giving this issue some much needed attention for those of us in
the embedded world :)

Greg

Roland Pibinger

May 17, 2002, 8:56:46 PM

On 17 May 2002 06:57:06 -0400, Scott Meyers <sme...@aristeia.com>
wrote:

>This is what I asked:
>
> Are there other causes of code bloat? Of the different types of code
> bloat, which are the most troublesome in practice?
>
>IME, many people -- MANY people -- avoid templates because of concerns
>about code bloat. Many of these many have first hand experience with code
>bloat. Denying the existence of the problem doesn't help them any.
>Telling them it is all in their heads doesn't help any. Telling them to
>rewrite the libraries they must use does not help them any. Telling them
>their compilers are broken does not help them any.
>
>If we want to help them, we must first understand what they mean when they
>complain about "code bloat." That's why I want to know (1) if I've
>overlooked any meanings and (2) which of the many meanings are the most
>important.

IMO, 'syntactical' and not 'physical' code bloat is the most important
problem WRT templates (Giovanni Bajo mentioned it before). The
template syntax added an extra level of complexity to the language.
"Modern" C++ idioms like nested classes within class templates
(iterator), explicit namespace qualifiers, typedef cascades, template
meta-programming, etc. boosted complexity further instead of
mitigating it. And the STL is obviously more appealing to computer
scientists than to the average programmer who just wanted some
uncomplicated containers.
Propagating 'lightweight' C++ might be a remedy. 'Less is more'
applies to templates, too.

Best regards,
Roland Pibinger

Tom Plunket

May 17, 2002, 8:57:39 PM

Scott Meyers wrote:

> Uh huh. I'm well aware of the C++ "blame the victim" party line
> on code bloat. But that's not what I asked about. This is what
> I asked:
>
> Are there other causes of code bloat? Of the different types
> of code bloat, which are the most troublesome in practice?
>
> IME, many people -- MANY people -- avoid templates because of
> concerns about code bloat. Many of these many have first hand
> experience with code bloat.

My experience shows that most of the people concerned about code
bloat are concerned because of intuition and not experimentation.
I find when talking to people concerned about "template-generated
code bloat" that these people typically have little idea even how
to use templates much less experience in actually trying to.

> Denying the existence of the problem doesn't help them any.
> Telling them it is all in their heads doesn't help any. Telling
> them to rewrite the libraries they must use does not help them
> any. Telling them their compilers are broken does not help them
> any.

While I understand that these things do not make the perception
of a problem go away, the truth remains that templates do not
necessarily create interesting code bloat due to any number of
reasons.

> If we want to help them, we must first understand what they mean
> when they complain about "code bloat."

If we want to help them, we will tell them that their perceptions
are wrong and that a simple experiment can prove it.

MHO, at least.

-tom!

Nicola Musatti

May 17, 2002, 8:58:15 PM

Daniel Miller wrote:
>
> Scott Meyers wrote:
[...]

> > 4. [Excessive inlining] Because many compilers require that all template
> > code be in header files, all such code is inlined,
>
> You might want to be more precise in your wording here, because the
> state-of-affairs does not need to be exactly as worded. Your wording implicitly
> encourages the reader to think of having only header files instead of having
> both header files (e.g., with a .h/.hpp/.hxx suffix) and template function-body
> files (e.g., using RogueWave's local conventions .cc suffix for #included
> template function-body files as opposed to their .h and .cpp file extensions for
> header-files and source-files, respectively). I think what you should write is
> something to the effect of:
>
> "Because many compilers require that all template code be made available to
> the compiler at compile-time before the point-of-usage, common practice is to
> put the body of template functions in files which are #included. Because of the
> #include similarity to canonical header-files, some people consider these
> #included template-function-body file-content to be header file-content.
> Because this #included template-function-body file-content which **define
> implementation** are confused for header file-content which **declare
> interface**, some people simply place all the function bodies within the class
> declarations because this is where they typically see function-bodies in
> header-files placed outside of the context of templates. Such functions
> inappropriately declared in the class declaration are thus inlined
> inappropriately instead of being out-of-lined in a #included
> template-function-body file."

In my experience this point, as worded above, is the single major
problem. This is due to the fact that many implementations of the
standard library make heavy use of inlining which results in an increase
of code size that is at the same time very evident and not fully under
programmer control.

[Note that I don't believe library implementors to be as naive as
described above; yet they do make the choice for the programmer].

Cheers,
Nicola Musatti

Cyril Schmidt

May 17, 2002, 8:59:14 PM

Scott Meyers <sme...@aristeia.com> wrote in message news:<MPG.174cf495d...@news.hevanet.com>...

> I often hear about "code bloat" arising from templates. I've been trying
> to figure out exactly what this means.

One observation from my colleague: templates often cause the bloat of the
debug symbol table. Although it cannot be qualified as "code bloat", if
loading of your executable in a debugger takes an hour, and then every
single-step takes 10 minutes, that could be a perfect reason to avoid
templates.

Speaking of duplicate instantiations (item 2): even well-designed code
suffers from that. I have just experimented with gcc-3.0 on Sparc and
STL implementation from SGI. I instantiated std::map with key_type int and
mapped_type int, unsigned int, long, and unsigned long (all have the same
size and alignment requirements). The first instantiation added about
16K to the size of .text, subsequent instantiations added about 10K each.

If I understand the figures correctly, it means that the common
(type-independent) part of the map implementation is about 6K, while the
variable (type-dependent) part is about 10K. I believe that the SGI
implementation is one of the best map implementations at this time, so it
would be hard to make a significant improvement here.
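
A minimal sketch of how such a measurement could be set up. This is not the
original test program; the helper names exercise and run_all are
hypothetical, and the sizes observed will depend on the compiler, the
library and the optimisation flags.

```cpp
#include <cassert>
#include <cstddef>
#include <map>

// Each mapped_type yields a distinct std::map instantiation, even where
// the types share size and alignment on the target platform.
template <typename Mapped>
std::size_t exercise() {
    std::map<int, Mapped> m;
    for (int i = 0; i < 10; ++i) m[i] = static_cast<Mapped>(i);
    m.erase(3);
    return m.size();
}

// Calling the helper with four types forces four instantiations. Compare
// the .text size of the object file (for example with the `size` tool)
// as the calls are added one at a time.
std::size_t run_all() {
    return exercise<int>() + exercise<unsigned int>() +
           exercise<long>() + exercise<unsigned long>();
}
```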

Kind regards,

Cyril

Scott Meyers

May 18, 2002, 7:25:20 AM

On 17 May 2002 05:45:16 -0400, Garry Lancaster wrote:
> In some cases the programmer can intervene. In
> the pointer case, a smart programmer can develop a
> partial specialization for all pointer types that is
> implemented using a shared void* based
> implementation. This can be somewhat more
> work than a straightforward, non-shared,
> implementation and is often slower. This is a key
> point.

Assuming the T* implementation consists of nothing more than inlined calls
to the void* implementation, why would this be slower? Or is it naive to
expect the T* wrapper to consist only of inlines?

Scott

Dietmar Kuehl

May 18, 2002, 7:36:25 AM

Scott Meyers wrote:

> Are there other causes of code bloat?

The single biggest problem I have encountered with respect to code bloat
is people comparing apples to oranges: Yes, a fully type-safe program
using different containers for everything is probably bigger than a
program using a container for 'void*' or some 'CObject*' which has
*very* different properties (a typical difference apart from type-safety
is value vs. reference semantics). Make a correct comparison and compare
similar solutions to similar solutions: Multiple 'vector<T>' do not
correspond to an array class taking a 'void*' - this is what a
'std::vector<void*>' is, which in turn may be a reasonable choice to be
used in programs. You should not forget that you can parameterize
templates on their template arguments :-) It may need writing a
(generic?) proxy class to have easy access to the operations required by
the template but if a container of 'void*' is the solution, use this
solution.

There is somewhat of a "problem" that the object files created when
using templates have a tendency to be bigger - however, without impact
on the size of the resulting executable. The plot is this: The compiler
works hard creating multiple versions of the same function in
different translation units resulting in longer compile times, bigger
executables, longer link times and, last but not least, much faster
programs. Basically, the duplicate work is thrown [mostly] away at link
time. At this time it is worth noting that it is ill-advised to inline
all template code! *This* indeed causes code-bloat, as many ill-advised
techniques do. There is no need to inline template code even if it
appears in multiple translation units: Duplicates of template functions
with external linkage have to be coped with (typically removed) by the
linker. Inline functions, however, can have static linkage and may not
be removed at link time! That is: Just because the template code is in
the header file does not mean that it has to or should be inline.

As far as I have seen, there is actually no [executable] code-bloat
(when comparing similar approaches rather than completely different
ones). There is, however, something which may be described as
"development resource bloat" which is worth considering, too: Disk space
needed on the development machines, compile and link times, etc.
Depending on what you do this can also often be reduced considerably,
however. There are quite a few templates which have points of
variation which rarely, if ever, change. A primary example of this is
the IOStreams and locales library: Has anybody used a different
character type than 'char' and 'wchar_t'? Congratulations if you have
done so: You have gathered experience with a rather hideous portion of the
standard library! (for those who have not done or tried it: you need to
write at least a 'ctype', 'codecvt' and a 'numpunct' facet; you probably
need to implement suitable character traits and you need to create an
'std::locale' object containing suitable 'ctype', 'numpunct', 'num_put',
'num_get', and 'codecvt' facets). Put differently: The code for
IOStreams and locales need not at all be in the header files! You just
use preinstantiated implementations for the standard types. For those
who *really* want to use a different character type the implementation
can provide a compile time flag, say '-D_NEED_IOSTREAM_IMPLEMENTATION',
which is either defined throughout the project or in the translation unit
doing the explicit instantiations for the specific type.
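
A minimal sketch of the same idea applied to user code, assuming a
hypothetical Scaler<Num> template and a hypothetical macro
NEED_SCALER_IMPLEMENTATION. The macro is defined in the listing itself only
to stand in for a '-D' switch, and the file names in the comments are
illustrative.

```cpp
#include <cassert>

// Stands in for passing -DNEED_SCALER_IMPLEMENTATION on the command line;
// only the translation unit doing explicit instantiation would define it.
#define NEED_SCALER_IMPLEMENTATION

// ---- scaler.h (hypothetical): declarations seen by every user ----
template <typename Num>
class Scaler {
public:
    explicit Scaler(Num factor) : factor_(factor) {}
    Num apply(Num x) const;
private:
    Num factor_;
};

#ifdef NEED_SCALER_IMPLEMENTATION
// ---- scaler_impl.h (hypothetical): bodies, normally hidden from users ----
template <typename Num>
Num Scaler<Num>::apply(Num x) const { return x * factor_; }
#endif

// ---- scaler_inst.cpp (hypothetical): preinstantiated for common types ----
template class Scaler<int>;
template class Scaler<double>;
```

Ordinary translation units would compile without the macro and link against
the preinstantiated versions; a unit that needs a new numeric type would
define the macro and add its own explicit instantiation.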

Of course, this approach is not restricted to standard library
components since users can typically extend the compiler with '-D'
switches (or similar). Say, a numeric package might use a numeric type
like 'double', 'some::rational', or 'my::infinite_precision'. It is
unlikely to mix them up or come up all the times with new numeric types.
That is, although it is beneficial to templatize on the numeric type
(because the alternatives are, simply put, not viable) there is *no*
form of bloat if it is done "correctly". Sure, explicit instantiations
take a little bit more work but they often *are* a viable approach.

Of course, for a library like eg. STL it is not viable to rely on
explicit instantiation. However, the code bloat for these libraries is
typically rather low because they effectively fold down to rather
trivial operations in most cases - the development resource bloat is,
however, here to stay for those. Of course, there are techniques to reduce
this kind of bloat, too. The major tool for this is fine grained
factorization of common portions of code: It is viable for template
code to call loads of trivial functions (something which kills
performance eg. for dynamic polymorphism). That is, you can create
low-level abstractions. Of course, using eg. [smart] pointers to a
common base type is also a reasonable approach which should be
considered: This is basically the only choice for non-template code
short of having too much execution time (because many calls to
virtual functions kill performance, not due to the actual virtual
function call but due to the lost optimization potential) and/or
development time to write the different alternatives by hand. Of
course, the template is still superior to a library having made the
choice for the base used up-front.... BTW, if you use a fixed type
to parameterize your 'std::vector<T>' you can of course create a thin
wrapper template, merely provide the member declarations in the header
and preinstantiate the used functionality again.

The problem with all kinds of template bloat is simply that people
expect to get the benefits (type-safety, often better execution time,
more flexibility, etc.) for free: This is not the case. Templates do not
address your design problem. You still need to know what approach you
are supposed to take.

In summary: Code-bloat is a non-problem. There is development resource
bloat with templates but even this can be addressed when the
non-template solution would be similarly suitable. Typically, the
non-template solution is not suitable for various reasons anyway so what
are you comparing in the first place?
--
<mailto:dietma...@yahoo.com> <http://www.dietmar-kuehl.de/>
Phaidros eaSE - Easy Software Engineering: <http://www.phaidros.com/>

Chris Uzdavinis

May 18, 2002, 7:43:04 AM

Scott Meyers <sme...@aristeia.com> writes:

> Are there other causes of code bloat? Of the different types of code
> bloat, which are the most troublesome in practice?

A variation of your first category (multiple instantiations) exists
too, and that is unused or unnecessary template instantiations are
still linked into an application. Clearly a problem of the linker,
but I do see it happen.

Using class templates to calculate a value, type, or even a
compile-time assertion should not add to the resulting program's
size but often times does.
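
A minimal sketch of the kind of template meant here, using the classic
compile-time factorial as a stand-in; the names Factorial and
FactorialCheck are hypothetical.

```cpp
#include <cassert>

// Classic compile-time factorial: the recursion is resolved by the
// compiler, and Factorial itself declares no functions or data members.
template <unsigned N>
struct Factorial {
    enum { value = N * Factorial<N - 1>::value };
};

// Full specialization that terminates the recursion.
template <>
struct Factorial<0> {
    enum { value = 1 };
};

// The result is a constant expression, so it can serve as a compile-time
// assertion: the array bound becomes -1 (an error) if the check fails.
typedef char FactorialCheck[Factorial<5>::value == 120 ? 1 : -1];
```

Nothing in this listing needs to exist at run time, so any growth in the
executable from such templates would come from the toolchain, for example
from debug information or symbols that are not discarded.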

--
Chris

Nick Thurn

May 18, 2002, 11:28:31 AM

"Tom Plunket" <to...@fancy.org> wrote in message
news:fudaeuoc0dq3etc37...@4ax.com...

> If we want to help them, we will tell them that their perceptions
> are wrong and that a simple experiment can prove it.
>

As Scott said "Denying the existence of the problem... ".
Please post your simple experiment.

Having scratched my head for a fair amount of time over
a Loki class that needed 5 minutes and 150Mb space
to compile a simple 10 line test program I think it's fair to
say that there are those of us who are more interested
in how things actually work in practice than how they
should work or could work "in theory".

BTW making one Loki method non-inline reduced
compile to 1min and 30Mb space, so go figure.

I use templates and have no idea if my code is more
or less bloated than it should be. I do know that moving
from the system linker to the gnu linker reduced my
libraries by a large factor (was it 10 or 5? I can't recall)
when I did it three years ago.

I don't think anyone knows how to use templates
to their full advantage as yet. Andrei is hard at it
and others as well but the definitive "how to" is
yet to be written. We're still in the "cool stuff"
and "gotchas" stage IMHO.

Please post your simple experiment

cheers
Nick

Francis Glassborow

May 18, 2002, 11:28:49 AM

In article <Gw9uM...@news.boeing.com>, Greg Milford
<gregory....@boeing.com> writes

>While our compiler is somewhat dated, what we saw was that explicit
>instantiation by class cause this version of code bloat. The linker would
>not remove all unused function in this case. It should not IMO do this as
>default behavior, but an option for it would be nice.

I do not know what options you had available, but compiling your
explicit instantiations as a library should remove this problem.

--
Francis Glassborow ACCU
64 Southfield Rd
Oxford OX4 1PA +44(0)1865 246490
All opinions are mine and do not represent those of any organisation


Francis Glassborow

May 18, 2002, 11:29:06 AM

In article <3CE4D538...@2.com>, Vincent Finn
<1...@2.com.cos.agilent.com> writes

>> Yes, and this is a real problem when programmers actually stuff the
>> implementation into the class template definition. This is one reason
>> why I advocate that the implementation should always be in its own file
>> even if you then #include it into the definition file.
>
>But surely if you #include it in the header file there is no real difference.
>Is there any actual benefit?

Yes there is. The most important difference is that it prevents
programmers from writing in class implementations of member functions of
templates. Such code is implicitly inline, and therefore you are
unnecessarily relying on the compiler to not inline the code.

>
>I did use the #include for a while but went back to putting the whole
>code in the header because people found it more confusing the other way !

And do you do the same for non template classes?

--
Francis Glassborow ACCU
64 Southfield Rd
Oxford OX4 1PA +44(0)1865 246490
All opinions are mine and do not represent those of any organisation


Joshua Lehrer

May 19, 2002, 4:46:18 PM

Scott Meyers <sme...@aristeia.com> wrote in message news:<MPG.174f5f1c3...@news.hevanet.com>...

> On 17 May 2002 05:45:16 -0400, Garry Lancaster wrote:
> > In some cases the programmer can intervene. In
> > the pointer case, a smart programmer can develop a
> > partial specialization for all pointer types that is
> > implemented using a shared void* based
> > implementation. This can be somewhat more
> > work than a straightforward, non-shared,
> > implementation and is often slower. This is a key
> > point.
>
> Assuming the T* implementation consists of nothing more than inlined calls
> to the void* implementation, why would this be slower? Or is it naive to
> expect the T* wrapper to consist only of inlines?
>

Our solution here is:

1- we have a class, call it Array<>, that is our base Array class that
we use everywhere.
2- we then specialized Array<T*> to be implemented in terms of a
wrapper around void*. (It can't be implemented in terms of Array<void*>
directly, as that would be self-referential.)

We then have a rule: the specialization may only contain inlined
forwarding functions that forward to the base implementation, and those
functions may contain nothing but casts. In this way, the interface of
Array<T> is identical to that of Array<T*>. It is also just as
efficient, since all of the methods are forwarding functions that
forward copies of pointers and cast return values/out parameters, which
are pointers as well.

So, no, I do not believe that it is naive to believe that this can be
done with only casts and inlined forwarding functions, as this is
exactly what we have done.

Finally, while I was writing this, I realized that we only need to
cast OUT parameters, as IN parameters will upcast implicitly (SUB*
will cast to BASE*).

here is an example:

from base class:

const T& operator[](int i) const;

from specialization:

const SubType& operator[](int i) const
{ return (const SubType&)inherited::operator[](i); }
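
A minimal sketch of the scheme described above, assuming a toy Array built
on std::vector. The names VoidPtr, push_back and size are illustrative and
are not taken from the original code. The wrapper struct is there so that
the partial specialization does not derive from Array<void*>, which would
itself match Array<T*>. Unlike the reference cast in the snippet above,
this version returns the pointer by value and uses static_cast; it also
assumes T is a non-const object type.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical primary template standing in for the "base Array class".
template <typename T>
class Array {
public:
    void push_back(const T& value) { items_.push_back(value); }
    const T& operator[](std::size_t i) const {
        assert(i < items_.size());
        return items_[i];
    }
    std::size_t size() const { return items_.size(); }
private:
    std::vector<T> items_;
};

// Wrapper around void* (illustrative name). Array<VoidPtr> uses the
// primary template, so it is instantiated once for all pointer types.
struct VoidPtr {
    void* p;
};

// Partial specialization for pointer types: nothing but inline
// forwarding functions and casts.
template <typename T>
class Array<T*> : private Array<VoidPtr> {
    typedef Array<VoidPtr> inherited;
public:
    void push_back(T* value) {
        VoidPtr w = { value };      // T* converts to void* implicitly
        inherited::push_back(w);
    }
    T* operator[](std::size_t i) const {
        // Only the value coming back out needs a cast.
        return static_cast<T*>(inherited::operator[](i).p);
    }
    std::size_t size() const { return inherited::size(); }
};
```

With this layout, Array<int*> and Array<double*> both forward to the same
Array<VoidPtr> code; whether the forwarding functions really vanish after
inlining depends on the compiler and is worth checking in the generated
code.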

joshua lehrer
brown university, 1996 (yes, you lectured to me, Scott. "should this
be const?")
factset research systems

Francis Glassborow

May 20, 2002, 10:53:34 AM

In article <mjolnir_DELETE_-F4...@newsfeed.slurp.net>,
Adin Hunter Baber <mjolnir...@soltec.net> writes
>Could you please clarify what you mean by "explicit instantions" ?
>Including a short piece of example code would be helpful.

in example.h

template<typename T> class X {
// whatever
};

in int_example.cpp
(actually it would often be better to make that
int_example.lib, assuming your compiler reacts
to extension names)

template class X<int>;

The above forces the compiler to instantiate X for an int. The big
problem being that it will try to instantiate all members of X, and some
-- that you do not intend to use for an int -- may not be instantiable.
It would help if the language just required that those be skipped, or if
compilers issued diagnostics and then skipped them. It is also possible
to explicitly instantiate member functions on a one by one basis but
that is often tedious and computers should do the tedious.
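
A small sketch of both forms, assuming a made-up class template whose
member negated() cannot be instantiated for std::string. The member names
doubled and negated are hypothetical and only serve to show one member
that works for a type and one that does not.

```cpp
#include <cassert>
#include <string>

template <typename T>
class X {
public:
    explicit X(const T& v) : value_(v) {}
    T doubled() const;   // needs T + T
    T negated() const;   // needs unary minus, so not usable for std::string
private:
    T value_;
};

template <typename T>
T X<T>::doubled() const { return value_ + value_; }

template <typename T>
T X<T>::negated() const { return -value_; }

// Explicit instantiation of the whole class: every member of X<int>
// is instantiated here, whether or not anything calls it.
template class X<int>;

// "template class X<std::string>;" would not compile, because
// negated() cannot be instantiated for std::string. Instantiating
// the usable members one by one avoids that.
template std::string X<std::string>::doubled() const;
```

The one-by-one form is the tedious route mentioned above: each member that
should be emitted needs its own line.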

--
Francis Glassborow ACCU
64 Southfield Rd
Oxford OX4 1PA +44(0)1865 246490
All opinions are mine and do not represent those of any organisation

Garry Lancaster

May 20, 2002, 11:07:45 AM

Garry Lancaster wrote:
> > In some cases the programmer can intervene. In
> > the pointer case, a smart programmer can develop a
> > partial specialization for all pointer types that is
> > implemented using a shared void* based
> > implementation. This can be somewhat more
> > work than a straightforward, non-shared,
> > implementation and is often slower. This is a key
> > point.

Scott Meyers:

> Assuming the T* implementation consists of nothing
> more than inlined calls to the void* implementation,
> why would this be slower?

If the only difference between the simple implementation
and a shared void* based implementation is some extra
inlined forwarding calls containing a bit of casting and
you have a decent optimizer it should come out the same
speed [see note 1].

> Or is it naive to expect
> the T* wrapper to consist only of inlines?

No, I don't think that is naive at all. However, it is
important that at least parts of the rest of it - the
void* part - are *not* inlined, in order to benefit from
bloat reduction. So, reduced inlining usually makes
a speed difference.

With more complicated templates (e.g. ones that
actually store T rather than just T*) the refactoring to
a void*-based implementation may require an extra
level of indirection, which is another reason why we
would expect it to be slower.

These are my general rules of thumb, but compilers
have the ability to surprise. The only real way of
knowing whether implementation A is faster or
smaller than implementation B for a particular
compiler is to measure it.

Kind regards

Garry Lancaster
Codemill Ltd
Visit our web site at http://www.codemill.net

NOTES:

1. The standard doesn't actually require that the bit
patterns of T* and void* be identical, so in theory
the conversions between the two types could add an
extra overhead. However, I'm not personally familiar
with any platforms that have a C++ compiler where
the bit patterns *are* different.

Vincent Finn

May 20, 2002, 11:08:21 AM

Francis Glassborow wrote:

> In article <3CE4D538...@2.com>, Vincent Finn
> <1...@2.com.cos.agilent.com> writes
> >> Yes, and this is a real problem when programmers actually stuff the
> >> implementation into the class template definition. This is one reason
> >> why I advocate that the implementation should always be in its own file
> >> even if you then #include it into the definition file.
> >
> >But surely if you #include it in the header file there is no real difference
> >Is there any actual benefit ?
>
> Yes there is. The most important difference is that it prevents
> programmers from writing in class implementations of member functions of
> templates. Such code is implicitly inline, and therefore you are
> unnecessarily relying on the compiler to not inline the code.
>
> >
> >I did use the #include for a while but went back to putting the whole
> >code in the header because people found it more confusing the other way !
>
> And do you do the same for non template classes?

I do, of course

It was the fact that the .cpp file for templates is not compiled that was
confusing. People tried to compile the .cpp as they would a normal code
file, and it wouldn't compile. (The problem was then exacerbated by the
fact that VC seems to have a small bug to do with files that are excluded
from the compile: if you try to compile them, you will get errors from
compiling the project until you shut the workspace down and open it
again!)

I was not aware that the inlining was changed by moving it, so I may return
to that practice.

Vin

Francis Glassborow

May 20, 2002, 1:55:33 PM

In article <3CE8C129...@2.com>, Vincent Finn
<1...@2.com.cos.agilent.com> writes

>I was not aware that the inlining was changed by moving it so I may return to
>that practice

Let me be clear:

template <typename T> class X {
// whatever

public:
void complicated_function(/* parameters*/){
// code
}
};

X<T>::complicated_function is implicitly inline, and it is up to the good
sense of the compiler whether it acts on your unintended hint. However:

template <typename T> class X {
// whatever

public:
void complicated_function(/* parameters*/);
};

template <typename T>
void X<T>::complicated_function(/* parameters*/){
// code
}

Is not inline unless you say so. However there is an added advantage in
not stuffing that code automatically into the header for X. You have an
option to use explicit instantiation by having a file containing, for
example,

#include "x.h"
#include "x.impl"
template class X<int>;

And calling that file something like int_X.lib may even result in the
executable only including the functions you call even without a smart
linker.

--
Francis Glassborow ACCU
64 Southfield Rd
Oxford OX4 1PA +44(0)1865 246490
All opinions are mine and do not represent those of any organisation

Arnold the Aardvark

May 20, 2002, 1:56:21 PM

"Vincent Finn" <1...@2.com.cos.agilent.com>

> It was the fact that the .cpp file for templates is not compiled that was
> confusing

It is quite common to put the template implementation in a file called
.ipp or similar, to avoid the confusion you mention. For my own
part, I have a tendency to implicitly inline one-liners, but
implement larger functions below the class definition i.e. in the header
but not inline. I believe moving these to an .ipp file might improve my
code a little, but I haven't written any large/complicated templates
yet so the gain would be small.

Won't 'export' make all this go away, anyway (eventually)?

Arnold the Aardvark

Daniel Miller

May 20, 2002, 8:54:24 PM

Vincent Finn wrote:

> Francis Glassborow wrote:
>
>
>>In article <3CE4D538...@2.com>, Vincent Finn
>><1...@2.com.cos.agilent.com> writes
>>
>>>>Yes, and this is a real problem when programmers actually stuff the
>>>>implementation into the class template definition. This is one reason
>>>>why I advocate that the implementation should always be in its own file
>>>>even if you then #include it into the definition file.
>>>>
>>>But surely if you #include it in the header file there is no real difference
>>>Is there any actual benefit ?
>>>
>>Yes there is. The most important difference is that it prevents
>>programmers from writing in class implementations of member functions of
>>templates. Such code is implicitly inline, and therefore you are
>>unnecessarily relying on the compiler to not inline the code.
>>
>>
>>>I did use the #include for a while but went back to putting the whole
>>>code in the header because people found it more confusing the other way !
>>>
>>And do you do the same for non template classes?
>>
>
> I do, of course
>
> It was the fact that the .cpp file for templates is not compiled that was
> confusing
> People tried to compile the .cpp as they would with a normal code file and it
> wouldn't compile

[...snip...]

Use a different file-extension as RogueWave does: .cc versus .cpp (or
equivalent).

RogueWave uses .cpp (in a "source" directory) for function-definition/et.al.
files which are intended for canonical compilation of out-of-line
function-definitions and of static-data definitions to object-files.

RogueWave uses .cc (in an "include" directory) for function-definition/et.al.
files which are intended for #including to make template function-definitions &
static-data definitions known to the compiler prior to the point of
usage/specialization/instantiation/expansion if being built on a platform which
requires (or prefers) compile-time expansion of templates.

The difference of file-extension (plus the difference of directory) makes
these categories of files quite clearly separated to even the casual observer.
Header files retain their traditional role of containing declarations (where
class-definition is a category of declaration).

Francis Glassborow

May 20, 2002, 8:56:48 PM

In article <1021909150.18263....@news.demon.co.uk>,
Arnold the Aardvark <aard...@notthistubulidentata.demon.co.uk> writes

>"Vincent Finn" <1...@2.com.cos.agilent.com>
>
>> It was the fact that the .cpp file for templates is not compiled that was
>> confusing
>
>It is quite common to put the template implementation in a file called
>.ipp or similar, to avoid the confusion you mention. For my own
>part, I have a tendency to implicitly inline one-liners, but
>implement larger functions below the class definition i.e. in the header
>but not inline. I believe moving these to an .ipp file might improve my
>code a little, but I haven't written any large/complicated templates
>yet so the gain would be small.
>
>Won't 'export' make all this go away, anyway (eventually)?

Some hope so. But if you are not writing separate implementation files,
export will do nothing to help you.

--
Francis Glassborow ACCU
64 Southfield Rd
Oxford OX4 1PA +44(0)1865 246490
All opinions are mine and do not represent those of any organisation

Stefan Heinzmann

May 21, 2002, 9:11:26 AM

Scott Meyers <sme...@aristeia.com> wrote in message news:<MPG.174cf495d...@news.hevanet.com>...
> I often hear about "code bloat" arising from templates. I've been trying

> to figure out exactly what this means. From what I can tell, there are
> several different meanings, as follows, where Temp is a template and T, T1,
> and T2 are type parameters.
>

> 1. [Multiple instantiations] Temp<T> is instantiated in more than one
> translation unit, so when the objs are linked together, the exe has
> more than one copy of Temp<T>'s member functions.

[...]

> Are there other causes of code bloat? Of the different types of code
> bloat, which are the most troublesome in practice?

In a statically linked executable, the above point should be solved
with a reasonable toolset, as others have said. I am astonished,
however, that no one has yet mentioned that this can indeed be a problem
in dynamically loaded libraries or plug-in modules. Since they're
compiled and linked separately, they will of course contain their own
template instantiations.

For example if my application consists of a number of separate modules
which are loaded dynamically (not uncommon in Windows-land), and I use
the standard library a lot (streams, strings and the like), I will
likely have instantiations of the same stuff in each module.

This can of course be circumvented by putting the commonly used
instantiations into a shared library, but it is not as easy as you may
think.

Cheers
Stefan

Vincent Finn

May 21, 2002, 2:03:38 PM

Arnold the Aardvark wrote:

> "Vincent Finn" <1...@2.com.cos.agilent.com>
>
> > It was the fact that the .cpp file for templates is not compiled that was
> > confusing
>
> It is quite common to put the template implementation in a file called
> .ipp or similar, to avoid the confusion you mention. For my own
> part, I have a tendency to implicitly inline one-liners, but
> implement larger functions below the class definition i.e. in the header
> but not inline. I believe moving these to an .ipp file might improve my
> code a little, but I haven't written any large/complicated templates
> yet so the gain would be small.

you use '.ipp'
Francis example uses '.impl'
and Daniel suggests '.cc'

Is there any accepted naming for this file ?

Vin

Francis Glassborow

May 22, 2002, 5:25:23 AM

In article <95e0e5ef.02052...@posting.google.com>, Stefan
Heinzmann <stefan_h...@yahoo.com> writes

>In a statically linked executable, the above point should be solved
>with a reasonable toolset, as others have said. I am astonished
>however that no one has yet mentioned that this can indeed be a problem
>in dynamically loaded libraries or plug-in modules. Since they're
>compiled and linked separately, they will of course contain their own
>template instantiations.
>
>For example if my application consists of a number of separate modules
>which are loaded dynamically (not uncommon in Windows-land), and I use
>the standard library a lot (streams, strings and the like), I will
>likely have instantiations of the same stuff in each module.

But that is not a template issue, it is an issue with dynamically linked
libraries.

--
Francis Glassborow ACCU
64 Southfield Rd
Oxford OX4 1PA +44(0)1865 246490
All opinions are mine and do not represent those of any organisation

Scott Meyers

May 22, 2002, 5:27:18 AM

[ I'm posting this as a favor for the writer, who sent it to me via
email. ]

I posted this reply to comp.lang.c++.moderated, but it didn't appear
so either my newsreader screwed up (likely, since Outlook sucks) or
the moderators didn't approve it (in which case I never got it in
return because I posted with a spam-protected email address).

Anyway, regardless, I thought I'd send it directly to you as it
has a slightly different take on "code bloat" than you perhaps
intended, but still an example of what we consider "code bloat"
to be.

{If you are having problems posting, you might want to try
google. -- mod}

--- snip ---

"Scott Meyers" <sme...@aristeia.com> wrote in message
news:MPG.174cf495d...@news.hevanet.com...
>I often hear about "code bloat" arising from templates. I've been trying
>to figure out exactly what this means. From what I can tell, there are
>several different meanings, as follows, where Temp is a template and T,
T1,
>and T2 are type parameters.

>[...]

I'm working with extremely performance sensitive code all day long
and the code bloat I'm experiencing is primarily due to abstraction
and aliasing, but as in some cases templates 'cause' the abstraction
(which in turn 'causes' the aliasing) you could certainly blame the
templates in those cases.

I've tried to boil down the problem to a very simple example to
illustrate it. Consider the problem of maintaining a buffer for DMA
data. In C++ you would perhaps write this along the lines of:

class DmaBuffer {
public:
    DmaBuffer(int size) { pBufStart = pBuf = new int[size]; }
    ~DmaBuffer() { delete pBuf; }
    int *GetBuffer() { return pBufStart; }
    void AddQWord(int a, int b, int c, int d) {
        pBuf[0] = a;
        pBuf[1] = b;
        pBuf[2] = c;
        pBuf[3] = d;
        pBuf += 4;
    }
private:
    int *pBufStart, *pBuf;
};

Its (simplified) use might look something like:

DmaBuffer dmaBuffer(100000);

void TestDmaBuffer1(const int *p)
{
    for (int i = 0; i < 1000; i++) {
        dmaBuffer.AddQWord(p[0],p[1],p[2],p[3]);
        p += 4;
    }
}

This results in the following code on a certain MIPS platform using
gcc:

TestDmaBuffer1(int *)
001000C8 0080402D dmove t0,a0
001000CC 240903E7 addiu t1,zero,0x3E7
001000D0 8F828114 lw v0,0x8114(gp) ; loads pBuf
001000D4 2529FFFF addiu t1,t1,0xFFFF
001000D8 8D030000 lw v1,0x0000(t0)
001000DC 8D060004 lw a2,0x0004(t0)
001000E0 24470010 addiu a3,v0,0x10
001000E4 8D040008 lw a0,0x0008(t0)
001000E8 8D05000C lw a1,0x000C(t0)
001000EC AC430000 sw v1,0x0000(v0)
001000F0 AC460004 sw a2,0x0004(v0)
001000F4 AC440008 sw a0,0x0008(v0)
001000F8 AC45000C sw a1,0x000C(v0)
001000FC AF878114 sw a3,0x8114(gp) ; stores pBuf
00100100 0521FFF3 bgez t1,0x001000D0
00100104 25080010 addiu t0,t0,0x10
00100108 03E00008 jr ra
0010010C 00000000 nop

What's wrong with that, you say? Because of aliasing issues, the
compiler is loading and storing pBuf on every iteration of the loop,
as indicated by the added comments.

But that's silly. As the programmers we know that in this case the
buffer and the buffer object can never alias. So, we try declaring
each and every pointer as "restrict" using the restrict extension
that gcc provides, in hope that gcc can do better with it. But it
doesn't -- it generates exactly the same code. Blah.

So can you get rid of it? Yes, by giving up on the abstraction and
writing the code in a straightforward, C-style, no-nonsense version:

int dmaBuffer2[100000];
void TestDmaBuffer2(int *p)
{
    int *dst = &dmaBuffer2[0];
    for (int i = 0; i < 1000; i++) {
        dst[0] = p[0];
        dst[1] = p[1];
        dst[2] = p[2];
        dst[3] = p[3];
        dst += 4;
        p += 4;
    }
}

Then, as we've effectively circumvented the aliasing issue, we
finally get rid of that annoying and unnecessary code:

TestDmaBuffer2(int *)
00100110 3C020012 lui v0,0x12
00100114 240603E7 addiu a2,zero,0x3E7
00100118 2445E110 addiu a1,v0,0xE110
0010011C 00000000 nop
00100120 8C830000 lw v1,0x0000(a0)
00100124 24C6FFFF addiu a2,a2,0xFFFF
00100128 ACA30000 sw v1,0x0000(a1)
0010012C 8C820004 lw v0,0x0004(a0)
00100130 ACA20004 sw v0,0x0004(a1)
00100134 8C830008 lw v1,0x0008(a0)
00100138 ACA30008 sw v1,0x0008(a1)
0010013C 8C82000C lw v0,0x000C(a0)
00100140 24840010 addiu a0,a0,0x10
00100144 ACA2000C sw v0,0x000C(a1)
00100148 04C1FFF5 bgez a2,0x00100120
0010014C 24A50010 addiu a1,a1,0x10
00100150 03E00008 jr ra
00100154 00000000 nop

To return to the issue at hand, now picture that first class no
longer being just a simple DmaBuffer, but a templatized version of
a stack, a list or some other data structure that contains a pointer
and therefore suffers the exact same aliasing problem.

Now you're suddenly having this issue everywhere in your code that
you're using that templatized class, whatever the instantiated type
is, and whenever that pointer is updated -- generally when you add
or remove values from your class.

But wait, this is C++ so you've implemented iterators in terms of
inline functions over your data structure as well. Well, depending
on how you did this, now your iterator pointer probably suffers
from exactly the same problem. So you don't even have to update the
data structure to see code bloat and slowdown, you just have to
iterate over it!

But wait again! It's actually much much worse than this, because the
aliasing issue compounds, so that the one pointer update the compiler
wasn't able to remove now causes an aliasing issue with something
else that really should have been removed as well, etc.

Of course, this really is a "well-known" problem (well, not as well-
known as it should be) -- it's really the C++ "abstraction penalty"
problem, see eg:

http://www.acl.lanl.gov/Pooma96/abstracts/robison.html

That gcc has a serious problem with it, despite decent results on the
Stepanov benchmark, is also known to some:

http://gcc.gnu.org/ml/gcc/2000-11/msg00323.html

Expression templates can address abstraction penalty problems in certain
situations, but certainly not all of them. What's worse, expression
templates IMO constitute write-only code and therefore have a limited
viability.

Overall I consider this a very real and very important issue.
Unfortunately, there are lots of people carelessly brushing it aside --
sadly most of them never having heard of the term "abstraction penalty"
in the first place, much less read up on it and explored its effects
first-hand.

--
Christer Ericson
Senior principal programmer
Sony Computer Entertainment, Santa Monica

Francis Glassborow

May 22, 2002, 9:14:16 AM

In article <MPG.1754252d7...@news.hevanet.com>, Scott Meyers
<sme...@aristeia.com> (actually it was Christer Ericson) writes

>I'm working with extremely performance sensitive code all day long
>and the code bloat I'm experiencing is primarily due to abstraction
>and aliasing, but as in some cases templates 'cause' the abstraction
>(which in turn 'causes' the aliasing) you could certainly blame the
>templates in those cases.

>I've tried to boil down the problem to a very simple example to
>illustrate it. Consider the problem of maintaining a buffer for DMA
>data. In C++ you would perhaps write this along the lines of:

>class DmaBuffer {
>public:
> DmaBuffer(int size) { pBufStart = pBuf = new int[size]; }

Stylistically I would prefer:
DmaBuffer(int size) : pBufStart(new int[size]), pBuf(pBufStart) {}

> ~DmaBuffer() { delete pBuf; }

That is undefined behaviour. I think you meant pBufStart. And you should
also have written delete[].

> int *GetBuffer() { return pBufStart; }
> void AddQWord(int a, int b, int c, int d) {
> pBuf[0] = a;
> pBuf[1] = b;
> pBuf[2] = c;
> pBuf[3] = d;
> pBuf += 4;

Now here, where you exit the function, the compiler inserts a sequence
point, at which stage it ensures that all side effects are complete. It
certainly needs to write back, at the very least, any cached value of pBuf,
because the function cannot know what you will do next and so needs to
clear the register. Note that this will be in the function implementation,
where it knows nothing of the calling context. Well, as you have written
this in-class it might inline the call; on the other hand, it might not.

> }
>private:
> int *pBufStart, *pBuf;
>};

>DmaBuffer dmaBuffer(100000);

But that is a matter for how good the optimiser is when calling an
inlined function in a loop. That is not a language issue.

>But that's silly. As the programmers we know that in this case the
>buffer and the buffer object can never alias. So, we try declaring
>each and every pointer as "restrict" using the restrict extension
>that gcc provides, in hope that gcc can do better with it. But it
>doesn't -- it generates exactly the same code. Blah.

Of course, because how does restrict help when you are working across
invocations of the function?

>So can you get rid of it? Yes, by giving up on the abstraction and
>writing the code in a straightforward, C-style, no-nonsense version:

But I think you get the same problem if your C version calls a function.
The issue is not whether you use C or C++ but whether you write code
that encapsulates the copying of 4 ints to your buffer or not.
Furthermore there is a serious difference between the two pieces of code
because there is no equivalent to pBuf in your C code. That means that
you cannot track the next available location in your array. Then there
is the issue of dynamic memory v static memory, how much code you must
write if you need two buffers etc. Put simply, your C code is quite
different in intent from your C++ code.

>int dmaBuffer2[100000];
>void TestDmaBuffer2(int *p)
>{
> int *dst = &dmaBuffer2[0];
> for (int i = 0; i < 1000; i++) {
> dst[0] = p[0];
> dst[1] = p[1];
> dst[2] = p[2];
> dst[3] = p[3];
> dst += 4;
> p += 4;
> }
>}

>Then, as we've effectively circumvented the aliasing issue, we
>finally get rid of that annoying and unnecessary code:

I think all your example demonstrates is that in code that needs to be
very tight, you have to consider the cost of being general.

>To return to the issue at hand, now picture that first class no
>longer being just a simple DmaBuffer, but a templatized version of
>a stack, a list or some other data structure that contains a pointer
>and therefore suffers the exact same aliasing problem.

No, I think that all your example shows is that poor design and
inappropriate use of technology results in poor code. It's a case of
horses for courses and good programmers (and you have demonstrated that
you understand the issues, but - unfairly, IMO, - blame the language)
know that they need to watch the level of abstraction when concerned
with tight code requirements.

--
Francis Glassborow ACCU
64 Southfield Rd
Oxford OX4 1PA +44(0)1865 246490
All opinions are mine and do not represent those of any organisation

James Kanze

May 22, 2002, 9:19:34 AM

Dietmar Kuehl <dietma...@yahoo.com> writes:

|> Scott Meyers wrote:

|> > Are there other causes of code bloat?

|> The single biggest problem I have encountered with respect to code
|> bloat is people comparing apples to oranges: Yes, a fully
|> type-safe program using different containers for everything is
|> probably bigger than a program using a container for 'void*' or
|> some 'CObject*' which has *very* different properties (a typical
|> difference apart from type-safety is value vs. reference
|> semantics).

Finally a sensible answer.

When I first started using C++ (long before templates), I found that
my C++ programs were regularly bigger than my C programs. But when I
analysed why, the reason was always that they did more -- why offer
straight string comparison if you have a regular expression class
handy, and offering full regular expressions is no more work than
using strcmp. (And let's face it, my GB_RegExpr *is* a bit bigger
than strcmp.)

The single largest reason for code bloat is what has disparagingly
been called featuritis. In so far as templates make it easier to
implement complex features correctly, they are responsible for code
bloat. Because if we can, we do.

Sometimes, there is a space versus time tradeoff, of course. On my
last project, I had six sets of the same contained type, each with a
different ordering criterion. The standard library's use of templates
imposed six separate implementations of "std::set< MyType const* >";
had it used inheritance for the ordering object, there would only have
been one. (In the case of this particular application, the tradeoff
was the right one.)
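
A rough sketch of the alternative mentioned above, assuming a made-up
MyType and an abstract Ordering base class; none of these names come from
the project described. Because every set uses the same comparator type,
there is only one instantiation of std::set, and the price is a virtual
call per comparison.

```cpp
#include <cassert>
#include <set>

struct MyType {
    int id;
    int priority;
};

// Abstract ordering criterion (hypothetical interface).
struct Ordering {
    virtual ~Ordering() {}
    virtual bool less(const MyType& a, const MyType& b) const = 0;
};

struct ById : Ordering {
    bool less(const MyType& a, const MyType& b) const { return a.id < b.id; }
};

struct ByPriority : Ordering {
    bool less(const MyType& a, const MyType& b) const {
        return a.priority < b.priority;
    }
};

// One comparator type for every ordering: it only holds a pointer to
// the criterion and delegates to it at run time.
class OrderingRef {
public:
    explicit OrderingRef(const Ordering& ordering) : ordering_(&ordering) {}
    bool operator()(const MyType* a, const MyType* b) const {
        return ordering_->less(*a, *b);
    }
private:
    const Ordering* ordering_;
};

// A single set type, hence a single instantiation of std::set.
typedef std::set<const MyType*, OrderingRef> MySet;
```

Each set is then constructed with its own criterion, for example
MySet s(idOrder) where idOrder is an OrderingRef bound to a ById object
that outlives the set. Which side of the space versus time tradeoff is
better remains something to measure per application.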

In the end, there's nothing you can do with templates that you cannot
do without. It's just that some things are a lot, lot harder; who
wants to maintain 20 copies of basically the same code, just because
it deals with different types (and you want the type safety, because
you want the application to work -- Dietmar is definitly right here;
comparing a correct program with one that works most of the time isn't
really a fair comparison).

|> Make a correct comparison and compare similar solutions to similar
|> solutions: Multiple 'vector<T>' do not correspond to an array
|> class taking a 'void*' - this is what a 'std::vector<void*>' is
|> which in turn may be a reasonable choice to be used in
|> programs. You should not forget that you can parameterize
|> templates on their template arguments :-) It may need writing a
|> (generic?) proxy class to have easy access to the operations
|> required by the template but if a container of 'void*' is the
|> solution, use this solution.

That's true up to a point. On the other hand, if you have six
different vectors, each containing a smart pointer to a different
type, you really don't need six instances of the code for vector, nor
for the smart pointer, once the compiler has done the type checking.

All of the current compilers I know will give you six instances.
Let's hope that this improves in the future. An implementation could
provide a specialized instance of std::vector<void*>, then a partial
specialization on std::vector<T*> which derives from std::vector<void*>.
But you really can't, or shouldn't, expect this kind of effort from
the application programmers. Non-algorithmic optimization *should* be
the compiler's job.

|> As far as I have seen, there is actually no [executable]
|> code-bloat (when comparing similar approaches rather than
|> completely different ones).

In my example, above, if the std::set had been designed to use either
the template pattern or delegation through an abstract base class for
comparison, the code would have definitely been smaller. So there is
*some* code bloat. The question is one of significance -- I suspect
that the above case is exceptional, but the total space used by the
instantiations of std::set was still only about 1% of the total code
space (and the code was only about 10% of the size of the typical data
sets). So frankly, who cares.

--
James Kanze mailto:ka...@gabi-soft.de
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany Tel. +49(0)179 2607481

James Kanze

May 22, 2002, 9:21:16 AM

Scott Meyers <sme...@aristeia.com> writes:

|> DmaBuffer dmaBuffer(100000);

This is a problem independent of templates, and linked to the use of
pointers. In theory, a compiler could trace the use of the pointer,
and determine that it was initially from an operator new, and that no
other aliases exist, but few compilers do.

In this case, it is also a case of really poor optimization; once
inlining has taken place, the compiler can easily see *all* of the
accesses through the relative pointers. The only problem it has to
deal with is a possible aliasing between *p and *pBufStart; there is
no code anywhere that could modify pBufStart outside of the function.

This is standard optimization technology; I have seen it in compilers
25 years ago. There is no excuse for it not being present in a
compiler today. Any simple peephole optimizer should be able to do
the trick.

I'm not familiar with your assembly, but roughly speaking, what I
would expect from the generated code is that it load both pointers and
the count in registers, and each time in the loop, copy the four
values, increment the two pointers, decrement the count, and test for
the end. A good compiler might also eliminate the count, using a
comparison with an end pointer. (In this case, what the compiler is
maintaining in memory is pBuf. The only variable with the same type,
and thus which could possibly alias it, is p. But pBuf is a member
variable, and p is a local variable, so the compiler knows that no
aliasing is possible.)
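
The loop described above can also be written out by hand at source level,
as a sketch: copy the member pointer into a local, run the loop on the
local, and store it back once. AddQWords, Used and pBufEnd are
hypothetical additions, not part of the class as posted, and the
destructor uses delete[] on pBufStart. Whether this changes the generated
code for a particular compiler is not something the sketch establishes;
it would have to be checked in the assembly.

```cpp
#include <cassert>
#include <cstddef>

class DmaBuffer {
public:
    explicit DmaBuffer(std::size_t size)
        : pBufStart(new int[size]), pBuf(pBufStart), pBufEnd(pBufStart + size) {}
    ~DmaBuffer() { delete[] pBufStart; }

    int* GetBuffer() { return pBufStart; }
    std::size_t Used() const {
        return static_cast<std::size_t>(pBuf - pBufStart);
    }

    // Hypothetical bulk variant of AddQWord: appends 'count' groups of
    // four ints read from 'src'.
    void AddQWords(const int* src, std::size_t count) {
        assert(static_cast<std::size_t>(pBufEnd - pBuf) >= 4 * count);
        int* dst = pBuf;                 // local copy of the member pointer
        for (std::size_t i = 0; i < count; ++i) {
            dst[0] = src[0];
            dst[1] = src[1];
            dst[2] = src[2];
            dst[3] = src[3];
            dst += 4;
            src += 4;
        }
        pBuf = dst;                      // single write-back after the loop
    }

private:
    DmaBuffer(const DmaBuffer&);             // not copyable
    DmaBuffer& operator=(const DmaBuffer&);  // not assignable

    int* pBufStart;
    int* pBuf;
    int* pBufEnd;
};
```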

|> But that's silly. As the programmers we know that in this case the
|> buffer and the buffer object can never alias. So, we try declaring
|> each and every pointer as "restrict" using the restrict extension
|> that gcc provides, in hope that gcc can do better with it. But it
|> doesn't -- it generates exactly the same code. Blah.

That's because the problem is unrelated to any aliasing between pBuf
and the p argument of the function. Such aliasing will only become a
problem if the compiler starts shuffling code around (e.g. to keep the
various pipelines full). Such aliasing will normally only be a
problem if you use a value twice, and write through a pointer in
between uses -- there is a possibility that the write through the
pointer will modify the value, so it must be reloaded the second time.
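
To make the reload concrete, here is a minimal sketch of the pattern just described. The function name sum_around_store and the constant 10 are made up for illustration; nothing here comes from the code under discussion. The value *src is used twice with a store through dst in between, so unless the compiler can prove the two pointers differ, it has to read *src again.

```cpp
#include <cassert>

// Hypothetical illustration: *src is read twice, with a write through dst
// in between. Because dst may point at the same int as src, the first
// value of *src cannot simply be kept in a register and reused.
int sum_around_store(const int* src, int* dst)
{
    assert(src && dst);
    int first = *src;    // first use
    *dst = 10;           // write through a possibly aliasing pointer
    int second = *src;   // second use: has to be loaded again
    return first + second;
}
```

With two distinct ints the result is twice the original value; with the same address passed for both arguments it is the original value plus 10, which is why the second load cannot be dropped.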

About the only change that restrict will allow here is for the
compiler to shuffle the loads and the stores; this might be relevant,
depending on how the pipelines were managed (but manifestly, the
compiler is far from taking such issues into account).

|> So can you get rid of it? Yes, by giving up on the abstraction and
|> writing the code in a straightforward, C-style, no-nonsense
|> version:

|> int dmaBuffer2[100000];
|> void TestDmaBuffer2(int *p)
|> {
|>     int *dst = &dmaBuffer2[0];
|>     for (int i = 0; i < 1000; i++) {
|>         dst[0] = p[0];
|>         dst[1] = p[1];
|>         dst[2] = p[2];
|>         dst[3] = p[3];
|>         dst += 4;
|>         p += 4;
|>     }
|> }

In this case, the difference isn't so much the C versus C++ style, it
is a dynamically allocated buffer with a non-local pointer vs. a
statically allocated array.

Declare the dmaBuffer class to contain a large buffer, rather than
using dynamic memory, and maintain an index rather than a pointer, and
a good compiler should be able to optimize. I tried it with g++,
both 2.95.2 and 3.0.4, on my Linux PC, however, and the results were
really bad. The necessary optimization techniques are well known,
once the inlining has occurred, and there is *NO* excuse for such poor
code generation.

However, one important point to consider is that we are comparing
oranges to apples -- the class implementation takes a parameter in
order to allocate dynamically the exact size needed (and thus must
work with pointers), whereas the second implementation uses a
statically allocated fixed size array, rather than pointers, so the
compiler has a lot more information to work with.
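
A rough sketch of the "large buffer plus index" variant, assuming a fixed capacity is acceptable. FixedDmaBuffer, put() and fill_from() are hypothetical names, not the class from the earlier posts; the only point is that the storage is a member array and the write position is an index rather than a pointer.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical class: the buffer is part of the object, and the current
// position is an index, so there is no pointer member to keep up to date.
class FixedDmaBuffer
{
public:
    static const std::size_t kCapacity = 100000;

    FixedDmaBuffer() : next_(0) {}

    void put(int value)
    {
        assert(next_ < kCapacity);
        data_[next_++] = value;
    }

    std::size_t size() const { return next_; }

    int operator[](std::size_t i) const
    {
        assert(i < next_);
        return data_[i];
    }

private:
    int data_[kCapacity];
    std::size_t next_;
};

// Counterpart of the quoted TestDmaBuffer2: copies 1000 groups of four ints.
void fill_from(FixedDmaBuffer& buf, const int* p)
{
    for (int i = 0; i < 1000; ++i) {
        buf.put(p[0]);
        buf.put(p[1]);
        buf.put(p[2]);
        buf.put(p[3]);
        p += 4;
    }
}
```

Whether a given compiler then produces the tight loop is exactly the quality-of-implementation question discussed here; the sketch only removes the dynamically allocated buffer from the picture.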

|> Then, as we've effectively circumvented the aliasing issue, we
|> finally get rid of that annoying and unnecessary code:

But we've changed the semantics.

But what is the alternative? An iterator which is implemented as a
pointer, with all functions inline, will hardly cause any code bloat
over a simple pointer.

To show code bloat, you have to show what the alternative solutions
cost. For the moment, you've shown two additional machine
instructions. Hardly what I would call code bloat. (The runtime
repercussions could be important however, since those two instructions
are executed in what is presumably a tight loop.)

|> But wait again! It's actually much much worse than this, because
|> the aliasing issue compounds, so that the one pointer update the
|> compiler wasn't able to remove now causes an aliasing issue with
|> something else that really should have been removed as well, etc.

|> Of course, this really is a "well-known" problem (well, not as
|> well-known as it should be) -- it's really the C++ "abstraction
|> penalty" problem, see e.g.:

|> http://www.acl.lanl.gov/Pooma96/abstracts/robison.html

The problem described is a performance problem, and not a code bloat
problem.

It is also a problem which shouldn't occur in your simple example.
The problem occurs when inline functions become too complex, or call
other inline functions, to the point where the resulting code becomes
too complex for the optimizer. The problem also occurs because many
(most) compilers simply will not pass a class type in a register, even
if the class type is just a wrapper for a simple base type or pointer.

|> That gcc has a serious problem with it, despite decent results on
|> the Stepanov benchmark, is also known to some:

|> http://gcc.gnu.org/ml/gcc/2000-11/msg00323.html

Again, we are talking about different things. Code bloat isn't an
issue here, and the issues are exactly the same with or without
templates.

|> Expression templates can address abstraction penalty problems in
|> certain situations, but certainly not all of them. What's worse,
|> expression templates IMO constitute write-only code and therefore
|> have a limited viability.

The real problem with expression templates (other than the fact that
they are unmaintainable -- I agree with you there) is that they lead
to the situation described above, where the results of inlining become
too complex for the optimizer. As an experiment, I once wrote a
Matrix class whose operators returned nodes in an expression tree,
rather than temporary objects. The goal was that everything would end
up inlined, and that the compiler would generate a neat loop out of
it. In practice, one almost immediately reached the situation where
there was too much inlining, and the compilers I had at the time (g++
and Sun CC) both gave up on even the simplest expressions.

|> Overall I consider this a very real and very important issue.
|> Unfortunately, there are lots of people carelessly brushing it
|> aside -- sadly most of them never having heard of the term
|> "abstraction penalty" in the first place, much less read up on it
|> and explored its effects first-hand.

The term abstraction penalty refers to a very specific runtime (not
space) problem, and only affects programs which do a large number of
very simple operations on very small objects. I suspect that most of
the affected programs are in the numerics domain, but there are
probably examples elsewhere as well. However, I can say that none of
the applications I've worked on have been affected by it; in all
cases, we had a number of large objects (which couldn't have been
passed in registers anyway), and fairly complex operations on the
object (so they weren't inlined, and the compiler optimizer could do
its job within the member function).

--
James Kanze mailto:ka...@gabi-soft.de

Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany Tel. +49(0)179 2607481

Chris Uzdavinis

May 22, 2002, 9:25:05 AM

Vincent Finn <1...@2.com.cos.agilent.com> writes:

> you use '.ipp'
> Francis's example uses '.impl'
> and Daniel suggests '.cc'
>
> Is there any accepted naming for this file ?

Apparently not. To add one more, the ACE library uses '.i'

--
Chris

Paavo Helde

May 22, 2002, 9:38:21 AM

> Are there other causes of code bloat? Of the different types of code
> bloat, which are the most troublesome in practice?

An example schema, remotely resembling a real-life situation:

#include <complex>

class Buffer {
    /* ... */
public:
    enum datatype_t {Integer, Double, Complex, /*...*/};
    datatype_t GetType() const;
    int* GetIntBuffer(); // throws if buffer not integer
    double* GetDoubleBuffer();
    std::complex<double>* GetComplexBuffer();
    // ... (const overloads returning pointers to const are assumed as well)
};

// Objects of type Buffer come from other DLL/network machine, etc.
// These encapsulate some data arrays read in from external source,
// which can be of several different types. The exact type is known
// only at run-time.

// Apply an operation to two buffers producing the result in the third
// buffer. The function for that is ApplyOp() defined below; the rest are
// template helpers.

// Because the types of operands are known only at run-time, the
// templates are instantiated for all combinations. (Another solution
// would be to carry out sophisticated analysis of what combinations are
// meaningful, support only those and pre-convert operands to the
// required type before applying the operation).

template<typename T, typename U, typename V>
void op3(const T* x, const U* y, V* z) {
    // some real working code
}

// Third step: pick the type of the result buffer.
template<typename T, typename U>
void op2(const T* x, const U* y, Buffer& Z) {
    switch(Z.GetType()) {
    case Buffer::Integer: op3(x, y, Z.GetIntBuffer()); break;
    case Buffer::Double: op3(x, y, Z.GetDoubleBuffer()); break;
    case Buffer::Complex: op3(x, y, Z.GetComplexBuffer()); break;
    // ...
    }
}

// Second step: pick the type of the second operand.
template<typename T>
void op1(const T* x, const Buffer& Y, Buffer& Z) {
    switch(Y.GetType()) {
    case Buffer::Integer: op2(x, Y.GetIntBuffer(), Z); break;
    case Buffer::Double: op2(x, Y.GetDoubleBuffer(), Z); break;
    case Buffer::Complex: op2(x, Y.GetComplexBuffer(), Z); break;
    // ...
    }
}

// First step: pick the type of the first operand.
void ApplyOp(const Buffer& X, const Buffer& Y, Buffer& Z) {
    switch(X.GetType()) {
    case Buffer::Integer: op1(X.GetIntBuffer(), Y, Z); break;
    case Buffer::Double: op1(X.GetDoubleBuffer(), Y, Z); break;
    case Buffer::Complex: op1(X.GetComplexBuffer(), Y, Z); break;
    // ...
    }
}

Now go figure the number of op3() instantiations and how it depends on the
number of data types supported (with N types there are N*N*N of them).
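
The growth can be checked with a small experiment. The sketch below is not the Buffer code above: Kind, seen(), apply_op() and count_op3_instantiations() are invented names, the three "buffers" are reduced to null pointers of the right type, and op3() does nothing except record that it exists. It relies on each instantiation of a function template having its own static local.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Hypothetical stand-ins for the run-time type tags.
enum Kind { Integer, Double, Float, KindCount };

// Addresses of the per-instantiation tags recorded so far.
std::set<const void*>& seen()
{
    static std::set<const void*> tags;
    return tags;
}

// Each instantiation of op3 has its own static local, so the address of
// that local identifies the instantiation.
template <typename T, typename U, typename V>
void op3(const T*, const U*, V*)
{
    static char tag = 0;
    seen().insert(&tag);
}

template <typename T, typename U>
void op2(const T* x, const U* y, Kind z)
{
    switch (z) {
    case Integer: op3(x, y, static_cast<int*>(0)); break;
    case Double:  op3(x, y, static_cast<double*>(0)); break;
    case Float:   op3(x, y, static_cast<float*>(0)); break;
    default: break;
    }
}

template <typename T>
void op1(const T* x, Kind y, Kind z)
{
    switch (y) {
    case Integer: op2(x, static_cast<const int*>(0), z); break;
    case Double:  op2(x, static_cast<const double*>(0), z); break;
    case Float:   op2(x, static_cast<const float*>(0), z); break;
    default: break;
    }
}

void apply_op(Kind x, Kind y, Kind z)
{
    switch (x) {
    case Integer: op1(static_cast<const int*>(0), y, z); break;
    case Double:  op1(static_cast<const double*>(0), y, z); break;
    case Float:   op1(static_cast<const float*>(0), y, z); break;
    default: break;
    }
}

// Runs every combination of kinds and reports how many distinct op3
// instantiations were reached.
std::size_t count_op3_instantiations()
{
    for (int x = 0; x < KindCount; ++x)
        for (int y = 0; y < KindCount; ++y)
            for (int z = 0; z < KindCount; ++z)
                apply_op(static_cast<Kind>(x), static_cast<Kind>(y),
                         static_cast<Kind>(z));
    return seen().size();
}
```

With three kinds this reaches 3 instantiations of op1(), 9 of op2() and 27 of op3(); adding a fourth kind would take op3() to 64.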

I do not say that such code bloat is necessarily troublesome; of course
the size of the executable goes through the roof, but hard disks are large
these days, and any decent OS should load only those parts of the
executable which are accessed at the moment.

OTOH, note that because of the huge number of linker symbols, linking (both
static and dynamic) will probably go slower; also, if the templated
things are polymorphic classes, then at least some widespread
implementations want to initialize all vtables at the beginning of the
program, which may take a noticeable time.

hth
Paavo

Hannah Schroeter

May 22, 2002, 9:40:26 AM

Hello!

In article <y2FNktF+...@robinton.demon.co.uk>,
Francis Glassborow <fran...@robinton.demon.co.uk> wrote:
>[...]

>> 1. [Multiple instantiations] Temp<T> is instantiated in more than one
>> translation unit, so when the objs are linked together, the exe has
>> more than one copy of Temp<T>'s member functions.

>An implementation that does this is broken. Consider what will happen
>with:

>void foo(){ static int i; ...}

>as a member function of the class template. The implementation HAS to be
>able to remove duplicated function instantiations.

The implementation has to coalesce (sp?) the "static int i" for all
instantiations, but the code itself may be duplicated. E.g. gcc.
In that case, the implementation is suboptimal, but not completely
wrong.

>[...]

Kind regards,

Hannah.

Tom Puverle

May 22, 2002, 6:15:30 PM

> The real problem with expression templates (other than the fact that
> they are unmaintainable -- I agree with you there) is that they lead
> to the situation described above, where the results of inlining become
> too complex for the optimizer. As an experiment, I once wrote a
> Matrix class whose operators returned nodes in an expression tree,
> rather than temporary objects. The goal was that everything would end
> up inlined, and that the compiler would generate a neat loop out of
> it. In practice, one almost immediately reached the situation where
> there was too much inlining, and the compilers I had at the time (g++
> and Sun CC) both gave up on even the simplest expressions.

I did that once too. My matrix class worked on an element-by-element basis.
There was an overloaded operator() for element access. The expression node's
operator()(int i, int j) would then (e.g. for +) return
subExpr1_(i,j) + subExpr2_(i,j), and similarly for other operators. All the
real work would then happen in the assignment operators of the destination
matrix - a "template for loop" that generated something like:

innerRep_[0][0] = ....
innerRep_[0][1] = ....
...
innerRep_[1][0] = ....
etc.

In the end the optimizer managed to expand the entire expression into just a
list of array accesses
( e.g. innerRep[0][0] = mat1[0][0] + mat2[0][0] + mat3[0][0]; etc. )

To get down from 1 to 0 temporaries the matrix class used 2 data buffers and
an indicator of which one is valid.

It's a lot of generated code (not really usable for very large matrices) but
it surely is fast! Another advantage of it is that it might be easier for a
vectorising optimizer to do something with it...
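
For readers who have not seen the technique, here is a minimal sketch of the general idea, under several assumptions: the matrix is fixed at 2x2, only operator+ is provided, and ordinary loops stand in for the unrolled "template for loop". Expr, Sum and Mat2 are invented names, and this is not the class described above.

```cpp
#include <cassert>
#include <cstddef>

// CRTP base: marks a type as a matrix expression and recovers its real type.
template <typename E>
class Expr
{
public:
    const E& self() const { return static_cast<const E&>(*this); }
};

// Node for "l + r": nothing is computed until an element is requested.
template <typename L, typename R>
class Sum : public Expr<Sum<L, R> >
{
public:
    Sum(const L& l, const R& r) : l_(l), r_(r) {}
    double operator()(std::size_t i, std::size_t j) const
    {
        return l_(i, j) + r_(i, j);
    }
private:
    const L& l_;   // operands stay alive until the end of the full expression
    const R& r_;
};

class Mat2 : public Expr<Mat2>
{
public:
    Mat2()
    {
        for (std::size_t i = 0; i < 2; ++i)
            for (std::size_t j = 0; j < 2; ++j)
                rep_[i][j] = 0.0;
    }
    double& operator()(std::size_t i, std::size_t j)
    {
        assert(i < 2 && j < 2);
        return rep_[i][j];
    }
    double operator()(std::size_t i, std::size_t j) const
    {
        assert(i < 2 && j < 2);
        return rep_[i][j];
    }
    // The real work happens here, one element at a time, without any
    // temporary matrix.
    template <typename E>
    Mat2& operator=(const Expr<E>& expr)
    {
        const E& e = expr.self();
        for (std::size_t i = 0; i < 2; ++i)
            for (std::size_t j = 0; j < 2; ++j)
                rep_[i][j] = e(i, j);
        return *this;
    }
private:
    double rep_[2][2];
};

// Builds an expression node instead of computing a result matrix.
template <typename L, typename R>
Sum<L, R> operator+(const Expr<L>& l, const Expr<R>& r)
{
    return Sum<L, R>(l.self(), r.self());
}
```

An assignment such as d = a + b + c builds a Sum<Sum<Mat2, Mat2>, Mat2> and then evaluates it element by element inside operator=. Every distinct expression shape is a distinct type, which is where both the speed and the volume of generated code come from.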

Tom

Stefan Heinzmann

May 22, 2002, 6:15:46 PM

Francis Glassborow <francis.g...@ntlworld.com> wrote in message news:<edC2eYAp...@robinton.demon.co.uk>...

> In article <95e0e5ef.02052...@posting.google.com>, Stefan
> Heinzmann <stefan_h...@yahoo.com> writes
> >In a statically linked executable, the above point should be solved
> >with a reasonable toolset, as others have said. I am astonished
> >however that no one has yet mentioned that this can indeed be a problem
> >in dynamically loaded libraries or plug-in modules. Since they're
> >compiled and linked separately, they will of course contain their own
> >template instantiations.
> >
> >For example if my application consists of a number of separate modules
> >which are loaded dynamically (not uncommon in Windows-land), and I use
> >the standard library a lot (streams, strings and the like), I will
> >likely have instantiations of the same stuff in each module.
>
> But that is not a template issue, it is an issue with dynamically linked
> libraries.

Yes, but one that becomes worse with templates.

James Kanze

May 22, 2002, 6:23:41 PM

han...@schlund.de (Hannah Schroeter) writes:

|> >An implementation that does this is broken. Consider what will
|> >happen with:

|> >void foo(){ static int i; ...}

|> >as a member function of the class template. The implementation
|> >HAS to be able to remove duplicated function instantiations.

|> The implementation has to coalesce (sp?) the "static int i" for
|> all instantiations, but the code itself may be
|> duplicated. E.g. gcc. In that case, the implementation is
|> suboptimal, but not completely wrong.

The implementation must ensure that &foo is the same in all
translation units. If it isn't, the implementation is broken.

--
James Kanze mailto:ka...@gabi-soft.de
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany Tel. +49(0)179 2607481

[ Send an empty e-mail to c++-...@netlab.cs.rpi.edu for info ]

Francis Glassborow

May 22, 2002, 7:10:07 PM

In article <aceg0f$qet$1...@c3po.schlund.de>, Hannah Schroeter
<han...@schlund.de> writes

>In article <y2FNktF+...@robinton.demon.co.uk>,
>Francis Glassborow <fran...@robinton.demon.co.uk> wrote:
> >[...]
>
> >> 1. [Multiple instantiations] Temp<T> is instantiated in more than one
> >> translation unit, so when the objs are linked together, the exe has
> >> more than one copy of Temp<T>'s member functions.
>
> >An implementation that does this is broken. Consider what will happen
> >with:
>
> >void foo(){ static int i; ...}
>
> >as a member function of the class template. The implementation HAS to be
> >able to remove duplicated function instantiations.
>
>The implementation has to coalesce (sp?) the "static int i" for all
>instantiations, but the code itself may be duplicated. E.g. gcc.
>In that case, the implementation is suboptimal, but not completely
>wrong.

True, but it is a pretty poor compiler/linker that cannot do better than
that. I know that in the early days implementations had more important
priorities, but these days a respectable implementation should manage.

--
Francis Glassborow ACCU
64 Southfield Rd
Oxford OX4 1PA +44(0)1865 246490
All opinions are mine and do not represent those of any organisation


Daniel

May 22, 2002, 8:07:18 PM

Ingolf Steinbach <ingolf.s...@jena-optronik.de> wrote in message news:<3CE3C636...@jena-optronik.de>...
> Scott Meyers wrote:
> > 2. [Duplicate instantiations] Temp<T1> and Temp<T2> could share a
> > single underlying binary implementation, but they don't. The most
> > common example is when T1 and T2 are both pointer types, but it would
> > also apply if T1=int and T2=long on a machine where int and long are
> > the same size.
>
> Are you sure? What about:
>
> template <typename T> void foo(T tl, T tr)
> {
>     *tl = *tr;
> }
>
> typedef char* T1;
> typedef std::string* T2;
>
> Both T1 and T2 are pointer types. Could they share the same
> instantiation of foo()?

And how about:

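#include <cstddef>
#include <iostream>
#include <vector>

class X
{
public:
    virtual ~X() {}
    virtual const char *get_flavour() = 0;
};

class T1 : public X
{
public:
    virtual const char *get_flavour() { return "chocolate"; }
};

class T2 : public X
{
public:
    virtual const char *get_flavour() { return "strawberry"; }
};

// T is expected to be a pointer type such as T1* or T2*.
template <class T> void dump_flavours(std::vector<T> &v)
{
    for (std::size_t n=0; n<v.size(); n++)
        std::cout << v[n]->get_flavour() << std::endl;
}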

Now there's a choice:

There could be separate instantiations of dump_flavours for T1* and T2*,
which would allow the compiler to do fast inline expansion of the call
to get_flavour, rather than virtual function calls.

Or the template could be just instantiated for X* and the same binary
code used for T1* and T2*, but this would mean calling get_flavour would
be a virtual function call (a little slower.)
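
The two choices can be put side by side in a small sketch. It is a variant of the code above rather than a copy: dump_flavours() takes an output stream so the result can be inspected, get_flavour() is const, and dump_per_type() and dump_shared() are invented helper names. The shared version is done by hand here, by copying the pointers into a vector of X*; a compiler or linker that shared the binary code would not need that copy.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

class X
{
public:
    virtual ~X() {}
    virtual const char* get_flavour() const = 0;
};

class T1 : public X
{
public:
    virtual const char* get_flavour() const { return "chocolate"; }
};

class T2 : public X
{
public:
    virtual const char* get_flavour() const { return "strawberry"; }
};

// Variant of dump_flavours that writes to any stream; T is a pointer type.
template <class T>
void dump_flavours(const std::vector<T>& v, std::ostream& out)
{
    for (std::size_t n = 0; n < v.size(); ++n)
        out << v[n]->get_flavour() << '\n';
}

// First choice: one instantiation of dump_flavours per element type.
template <class T>
std::string dump_per_type(const std::vector<T*>& v)
{
    std::ostringstream out;
    dump_flavours(v, out);       // dump_flavours<T*>
    return out.str();
}

// Second choice: convert to X* first, so only dump_flavours<X*> exists.
template <class T>
std::string dump_shared(const std::vector<T*>& v)
{
    std::vector<X*> base(v.begin(), v.end());
    std::ostringstream out;
    dump_flavours(base, out);    // always dump_flavours<X*>
    return out.str();
}
```

Both helpers produce the same text; what differs is how many copies of the loop end up in the program and whether the compiler has a chance to resolve the call statically.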

Michael Glassford

May 22, 2002, 8:09:06 PM

I came up with an interesting technique for dealing with this that I haven't
seen used anywhere else. It allows the .h file to be included in all files
that need it, exactly as for non-templates, and the .cpp file to be included
in the project or make file, exactly as for non-templates. It also provides
a convenient place for defining non-template definitions. It also prevents a
proliferation of file extensions (and the associated problem of
non-standardization of what extensions to use). It may also work with the
export keyword when more compilers implement it, although I'm not sure about
that because I've never really taken the time to figure out exactly how it's
supposed to work (because I can't use it anyway and likely won't be able to
for quite some time yet). It's a little confusing to figure out how it works
at first (by #defining and #including things in the right places) but it
makes sense once you do figure it out. With appropriate comments, it might
prove useful. It looks like this (typing directly in email, so there are
bound to be plenty of mistakes).

//MyTemplate.h------------------------------------------

//Standard #include guard,
//but now also used by MyTemplate.cpp
//to determine what to define:
#if !defined(MYTEMPLATE_H)
#define MYTEMPLATE_H

class NonTemplateClass
{
public:
    void F1_Inline(void);
    void F1_NonInline(void);
};

template <typename T>
class TemplateClass : public NonTemplateClass
{
public:
    void F2_Inline(void);
    void F2_NonInline(void);
};

//Put inline functions here (or in the class definition) as you normally would:

inline void NonTemplateClass::F1_Inline(void)
{...}

template<typename T>
inline void TemplateClass<T>::F2_Inline(void)
{...}

//#include the implementation file here if necessary
#if !defined(__ExportKeywordImplemented__)
#include "MyTemplate.cpp"
#endif
#endif

//MyTemplate.cpp------------------------------------------
#if !defined(MYTEMPLATE_H)
//If not being included by MyTemplate.h,
//define non-template implementation

#include "MyTemplate.h"

void NonTemplateClass::F1_NonInline(void)
{...}
#endif
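#if (defined(MYTEMPLATE_H) || defined(__ExportKeywordImplemented__)) \
    && !defined(MYTEMPLATE_IMPL_DEFINED)
//If being included by MyTemplate.h or if export keyword is implemented,
//define template implementations. The MYTEMPLATE_IMPL_DEFINED guard keeps
//them from being defined twice when this file is compiled directly, since
//MyTemplate.h has then already included it.
#define MYTEMPLATE_IMPL_DEFINED

template<typename T>
void TemplateClass<T>::F2_NonInline(void)
{...}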
#endif

"Vincent Finn" <1...@2.com.cos.agilent.com> wrote in message
news:3CE8C129...@2.com...
[snip]

Daniel Miller

May 23, 2002, 4:41:09 AM

James Kanze wrote:

> han...@schlund.de (Hannah Schroeter) writes:
>
> |> >An implementation that does this is broken. Consider what will
> |> >happen with:
>
> |> >void foo(){ static int i; ...}
>
> |> >as a member function of the class template. The implementation
> |> >HAS to be able to remove duplicated function instantiations.
>
> |> The implementation has to coalesce (sp?) the "static int i" for
> |> all instantiations, but the code itself may be
> |> duplicated. E.g. gcc. In that case, the implementation is
> |> suboptimal, but not completely wrong.
>
> The implementation must ensure that &foo

There can be misinterpretations regarding: which foo. (Read further before
responding.)

> is the same

There can be misinterpretations regarding: the same as what. (Read further
before responding.)

> in all translation units. If it isn't, the implementation is broken.

If we have the following:

template <class T>
class ParameterizedType
{
    T blah;
    static int i;
};

And then at points of usage, where MyClass is different from YourClass:
ParameterizedType<MyClass> m;
ParameterizedType<YourClass> y;

interpretation#1: Certain people will argue that the correct interpretation
is to have exactly one instance of ParameterizedType's i, period. In this
school of thought &m::i == &y::i. James, is this your intent?

interpretation#2: Whereas other people will argue that the correct
interpretation is to have more than one instance of ParameterizedType's i. In
this school of thought &m::i != &y::i. James, is this your intent?

People who subscribe to interpretation#2 would likely perceive this to be
code-bloat (rightly or wrongly), because they got more than one instance of
ParameterizedType's i. I say "rightly or wrongly" because remember we are
talking about *perceptions* when talking about what bloat is versus what bloat
is not. One person's unintended bloat is another person's intent. Remember
that generic programming with parameterized types is a form of code generation
and that people are notorious for willfully wanting the generated code to be
different than what was mechanistically/algorithmically generated.

class Class1
{
    MyClass blah;
    static int i;
};

class Class2
{
    YourClass blah;
    static int i;
};

People who subscribe to interpretation#1 would certainly not extend their
line of reasoning to argue that &Class1::i should == &Class2::i. Because
ParameterizedType is simply factoring out the variably-typed member-datum named
blah from Class1 and Class2 into ParameterizedType, people who subscribe to
interpretation#2 could make the case that &m::i != &y::i for the same reason
that &Class1::i != &Class2::i.

I claim that both interpretations are useful in different situations. I
claim that without the expressivity in the C++ language to explicitly choose
between interpretation#1 and interpretation#2 where only one interpretation is
supported instead, the people who subscribe to the losing interpretation will
complain endlessly that the wrong interpretation was chosen. If the chosen
interpretation were to be interpretation#2, then subscribers to interpretation#1
will perceive code bloat which they did not intend. If the chosen
interpretation were to be interpretation#1, then subscribers to interpretation#2
will (rightfully) claim that when they factored out the type-variability of blah
from Class1 and Class2 to create ParameterizedType, the semantics of the i
member-datum (wrongfully) changed in ways which they did not intend.

I am not actually proposing such an explicit selection mechanism between
interpretation#1 and interpretation#2. I am merely diagnosing people's
code-bloat versus semantic perceptions & intentions vis a vis a lack of
expressivity in C++ which can aggravate people's perceptions about unintended
consequences.

Cyril Schmidt

May 23, 2002, 4:44:52 AM

Scott Meyers <sme...@aristeia.com> wrote in message news:<MPG.174f5f1c3...@news.hevanet.com>...

> Assuming the T* implementation consists of nothing more than inlined calls
> to the void* implementation, why would this be slower? Or is it naive to
> expect the T* wrapper to consist only of inlines?

Yes, one can implement a container of T* in terms of void*
(or a container of long in terms of int if they are the same size),
but it would be difficult to define the corresponding iterator class.

To have a container of T implemented as a container of U, one can use
a template like this:

template <class T, class U, template <class, class> class Container,
          class Alloc = std::allocator<T>,
          class ContainerImpl =
              Container<U, typename Alloc::template rebind<U>::other> >
class sequence_adaptor {
    ContainerImpl impl;
public:
    void push_back(const T& t) {
        impl.push_back(reinterpret_cast<const U&>(t));
    }
    // other methods can be defined in a similar way
};

Vectors of long and unsigned int can then be defined as

sequence_adaptor<long, int, std::vector> vl;
sequence_adaptor<unsigned, int, std::vector> vu;

This works (gcc 3.0 on Sparc) without noticeable difference
in performance with plain vector (I tried to push_back about 1M of
longs).

Now, the problems begin: if I want, say, to sort my vector of long,
how do I define a random access iterator for my vector?

The solution that I could come up with adds some copying overhead
wherever a return by value happens (in operator+, for instance).
[The code of iterator_adaptor template is at the end of this posting.]

sequence_adaptor gets additional members:
    typedef iterator_adaptor<T, typename ContainerImpl::iterator> iterator;
    iterator begin() { return iterator(impl.begin()); }
    iterator end() { return iterator(impl.end()); }

It is now possible to write
    sort(vl.begin(), vl.end());
but:
1) This code runs about 30% slower than the sorting of a plain vector
   (I tried it with a 100K array).

2) Every instantiation of sort() for a different kind of iterator_adaptor
   (say, sorting vl and vu) added about 12K to the size of the program's
   code segment.

The reason for (2) is that the call sort(vl.begin(), vl.end());
instantiates sort<iterator_adaptor<long, std::vector<int>::iterator> >,
while the call sort(vu.begin(), vu.end());
instantiates sort<iterator_adaptor<unsigned, std::vector<int>::iterator> >.

That means, we still cannot have common code handle all variations of
vector<int>, vector<long>, and vector<unsigned>, even if we use
sequence_adaptor. By the way, the size of vector<> instantiation is
about 8K on my platform, which is less than 12K - the size of sort<>
instantiation. It is more important to have algorithms that share
common code than containers that share common representation.

I started thinking of using sort() with the internal representation of
the containers (i.e. doing sort(vl.impl.begin(), vl.impl.end()); or
something like it). That will not work correctly with
sequence_adaptor<unsigned, int, vector>, but it may work well for
sequence_adaptor<Widget*, void*, vector>. I do not know if there is a
reason to sort pointers, but the problem is the same for any algorithm,
not just for sort().

Using sort() on internal representation of the container would solve
both performance and size issues; the only problem is that I do not know
how to implement it nicely. All the things that I tried for the last hour
did not work, so I am giving up. Does anybody have an idea how that could
be done?

Kind regards,

Cyril
---

template <class T, class Impl>
class iterator_adaptor {
    Impl impl;
public:
    typedef iterator_adaptor<T, Impl> iterator_type;
    typedef typename Impl::iterator_category iterator_category;
    typedef typename Impl::difference_type difference_type;
    typedef T value_type;
    typedef T* pointer;
    typedef T& reference;

    iterator_adaptor() {}
    explicit iterator_adaptor(const Impl& i) : impl(i) {}
    bool operator==(const iterator_type& i) const { return impl == i.impl; }

    // The underlying element is reinterpreted as a T, as in sequence_adaptor.
    reference operator*() const { return reinterpret_cast<reference>(*impl); }
    pointer operator->() const {
        return reinterpret_cast<pointer>(impl.operator->());
    }

    iterator_type& operator++() { ++impl; return *this; }
    iterator_type operator++(int) { return iterator_type(impl++); }
    iterator_type& operator+=(difference_type n) { impl += n; return *this; }
    iterator_type operator+(difference_type n) const {
        return iterator_type(impl + n);
    }
    difference_type operator-(const iterator_type& other) const {
        return impl - other.impl;
    }
};
(I skipped operations such as -=, <=, etc.; their definition is obvious)

Garry Lancaster

May 23, 2002, 8:09:43 AM

Hannah Schroeter:

> |> >An implementation that does this is broken. Consider what will
> |> >happen with:
>
> |> >void foo(){ static int i; ...}
>
> |> >as a member function of the class template. The implementation
> |> >HAS to be able to remove duplicated function instantiations.
>
> |> The implementation has to coalesce (sp?) the "static int i" for
> |> all instantiations, but the code itself may be
> |> duplicated. E.g. gcc. In that case, the implementation is
> |> suboptimal, but not completely wrong.

James Kanze:

> The implementation must ensure that &foo is the same in all
> translation units. If it isn't, the implementation is broken.

Since foo is a *member* function, and of a class template at that,
&foo is wrong. I think you meant something like
&some_class_template<some_type>::foo.

Can you give a standard reference for your claim? The only
possibly relevant sections I can find say "An implementation
is not required to diagnose a violation of this rule" or "no
diagnostic required".

Kind regards

Garry Lancaster
Codemill Ltd
Visit our web site at http://www.codemill.net

James Kanze

May 23, 2002, 8:13:40 AM

Daniel Miller <daniel...@tellabs.com> writes:

|> James Kanze wrote:

|> > han...@schlund.de (Hannah Schroeter) writes:

|> > |> >An implementation that does this is broken. Consider what
|> > |> >will happen with:

|> > |> >void foo(){ static int i; ...}

|> > |> >as a member function of the class template. The
|> > |> >implementation HAS to be able to remove duplicated
|> > |> >function instantiations.

|> > |> The implementation has to coalesce (sp?) the "static int i"
|> > |> for all instantiations, but the code itself may be
|> > |> duplicated. E.g. gcc. In that case, the implementation is
|> > |> suboptimal, but not completely wrong.

|> > The implementation must ensure that &foo

|> There can be misinterpretations regarding: which foo. (Read
|> further before responding.)

In my example, there is only one foo.

|> > is the same

|> There can be misinterpretations regarding: the same as what.
|> (Read further before responding.)

Vague wording on my part. The standard requires that pointers to foo
compare equal. Even if these pointers were created in different
translation units.

|> > in all translation units. If it isn't, the implementation is broken.

|> If we have the following:

|> template <class T>
|> class ParameterizedType
|> {
|>     T blah;
|>     static int i;
|> };

|> And then at points of usage, where MyClass is different from
|> YourClass:

|> ParameterizedType<MyClass> m;
|> ParameterizedType<YourClass> y;

We have two different classes. Templates don't define classes or
functions; they define a template for generating classes or functions.

|> interpretation#1: Certain people will argue that the correct
|> interpretation is to have exactly one instance of
|> ParameterizedType's i, period. In this school of thought &m::i ==
|> &y::i. James, is this your intent?

No. This is completely irrelevant to my example (and the one I was
responding to). The variables m and y have completely unrelated
types, and each must have its own instance of i.

|> interpretation#2: Whereas other people will argue that the
|> correct interpretation is to have more than one instance of
|> ParameterizedType's i. In this school of thought &m::i != &y::i.
|> James, is this your intent?

It's what the standard clearly says. A template does NOT define a
type (or a function). Only instantiations of the template are types
(or functions).

|> People who subscribe to interpretation#2 would likely perceive
|> this to be code-bloat (rightly or wrongly), because they got more
|> than one instance of ParameterizedType's i.

It's not code bloat, since it is what the semantics require. (It's
also not code bloat, because the variable i isn't code.)

If the template ParameterizedType contained a large function, and that
function was in fact independent of the type of T, they would still
get two instances of that function. That is code bloat. It can be
avoided by putting the function in a non-template base class.
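
A sketch of that refactoring, with invented names (CounterBase, record(), set(), get()) and a deliberately tiny body standing in for the "large function":

```cpp
#include <cassert>
#include <cstddef>

// The bookkeeping does not depend on T, so it lives in a non-template base
// class and is compiled exactly once.
class CounterBase
{
public:
    CounterBase() : count_(0) {}
    std::size_t count() const { return count_; }
protected:
    void record();   // imagine a large, T-independent function here
private:
    std::size_t count_;
};

// In a real project this definition would sit in a single .cpp file.
void CounterBase::record()
{
    ++count_;
}

// Only the genuinely type-dependent part remains in the template.
template <class T>
class ParameterizedType : public CounterBase
{
public:
    ParameterizedType() : blah_() {}
    void set(const T& value) { blah_ = value; record(); }
    const T& get() const { return blah_; }
private:
    T blah_;
};
```

ParameterizedType<int> and ParameterizedType<double> each get their own small set() and get(), but both call the one copy of CounterBase::record().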

If the function depended only on the size and the copy semantics of T,
instantiating ParameterizedType on two different pointer types will
result in two copies; on all modern machines, however, the size and
the copy semantics of all pointers are the same, so the two functions
would result in exactly identical machine code. That is also code
bloat.
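One common workaround for the pointer case, sketched here under stated assumptions: the real work is written once in terms of void*, and a partial specialization for T* adds only casts. PtrBufferImpl and PtrBuffer are hypothetical names, the fixed capacity of 16 is arbitrary, and the sketch handles only pointers to non-const objects.

```cpp
#include <cassert>
#include <cstddef>

// Shared implementation: it depends only on the size and copy semantics
// of a pointer, so a single void* version serves every pointer type.
class PtrBufferImpl {
public:
    PtrBufferImpl() : count_(0) {}
    bool add(void* p);
    void* at(std::size_t idx) const { return slots_[idx]; }  // idx < size()
    std::size_t size() const { return count_; }
private:
    static const std::size_t kCapacity = 16;
    void* slots_[kCapacity];
    std::size_t count_;
};

// Non-inline: one copy of this code regardless of how many T* are used.
bool PtrBufferImpl::add(void* p) {
    if (count_ >= kCapacity) return false;
    slots_[count_++] = p;
    return true;
}

// Primary template intentionally left undefined in this sketch.
template <class T>
class PtrBuffer;

// Partial specialization for pointers: a thin, type-safe wrapper.
template <class T>
class PtrBuffer<T*> {
public:
    bool add(T* p) { return impl_.add(p); }
    T* at(std::size_t idx) const { return static_cast<T*>(impl_.at(idx)); }
    std::size_t size() const { return impl_.size(); }
private:
    PtrBufferImpl impl_;
};

int demo_ptr_buffer() {
    int x = 7;
    double d = 2.5;
    PtrBuffer<int*> ints;
    PtrBuffer<double*> doubles;
    ints.add(&x);
    doubles.add(&d);
    assert(ints.size() == 1 && doubles.size() == 1);
    return *ints.at(0) + static_cast<int>(*doubles.at(0) * 2);
}
```

The wrapper functions are small enough that a compiler would typically inline them, leaving PtrBufferImpl::add as the only out-of-line code.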

In the subthread I was responding to, the question was one of
ParameterizedType being instantiated over the same type (say int) in
two or more compilation units. Today, most compilers will issue an
instantiation of the function in the object file of all of the
compilation units. The question concerned the case in which the
linker didn't coalesce these copies. That would definitely create code
bloat. On the other hand, it would not be conforming, since taking the
address of the function in different translation units would result in
different addresses.

--
James Kanze mailto:ka...@gabi-soft.de

Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany Tel. +49(0)179 2607481

James Kanze

May 23, 2002, 8:19:59 AM

brou...@yahoo.com writes:

|> Dietmar Kuehl <dietma...@yahoo.com> wrote:

|> >Just because the template code is in the header file does not
|> >mean that it has to or should be inline.

|> Is this a quality of implementation issue or is there a way to
|> specify that template code in a header file shouldn't be inlined?

Whether a function is inlined or not is an implementation detail. You
can never specify it.

In the case of non-template functions, the inline attribute makes it
legal to have multiple definitions of the function, in several
different translation units. (All definitions must be identical, of
course.)

In the case of template functions, strictly speaking, the inline
attribute has absolutely no significance (except in the case of
exported templates).

In practice, I suspect that most compilers will use the inline
attribute as a hint, and try harder than otherwise to inline a
function declared inline. But there is nothing in the standard to
require this, and since multiple definition of a non-inline
non-template function is undefined behavior, a compiler could simply
ignore the keyword completely and still be conforming.

Francis Glassborow

May 23, 2002, 9:59:20 AM

In article <3jboeusctia4cciqh...@4ax.com>,

brou...@yahoo.com writes
>Dietmar Kuehl <dietma...@yahoo.com> wrote:
>
>>Just because the template code is in
>>the header file does not mean that it has to or should be inline.
>
>Is this a quality of implementation issue or is there a way to specify that
>template code in a header file shouldn't be inlined?

NO. It is an issue of how you place it in the header file. If, as many
seem to do, you make those definitions in-class then they are implicitly
inline and the language provides no mechanism to cancel that. If you
place them in the header file but outside the class definition, whether
they are inline or not is up to you. Being templates means that the
implementation has to handle multiple copies so that should not be an
issue.
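The three placements discussed here can be sketched as follows. Widget and its member names are hypothetical; everything shown would sit in one header file.

```cpp
#include <cassert>

template <class T>
class Widget {
public:
    // Defined in-class: implicitly inline, with no way to cancel that.
    T twice(T v) const { return v + v; }

    // Declared only; defined below WITHOUT 'inline'.
    T thrice(T v) const;

    // Declared only; defined below WITH an explicit 'inline'.
    T negate(T v) const;
};

// Out of class, not inline. This is still legal in a header because it is
// a member of a class template; the implementation has to cope with the
// same instantiation turning up in several translation units.
template <class T>
T Widget<T>::thrice(T v) const { return v + v + v; }

// Out of class, explicitly inline.
template <class T>
inline T Widget<T>::negate(T v) const { return -v; }

int demo_widget() {
    Widget<int> w;
    return w.twice(2) + w.thrice(3) + w.negate(1);
}
```

Whether a compiler actually expands any of these calls in place remains its own decision; the keyword only changes what the function is declared to be.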

Actually, weeding out multiple copies is also an issue for in-class
member function definitions for non-templates.

--
Francis Glassborow ACCU
64 Southfield Rd
Oxford OX4 1PA +44(0)1865 246490
All opinions are mine and do not represent those of any organisation

[ Send an empty e-mail to c++-...@netlab.cs.rpi.edu for info ]

Tom Plunket

May 23, 2002, 11:26:29 AM

Nick Thurn wrote:

> Tom Plunket wrote...
>
> > If we want to help them, we will tell them that their perceptions
> > are wrong and that a simple experiment can prove it.
>
> As Scott said "Denying the existence of the problem... ".
> Please post your simple experiment.

Francis and Scott did bring up interesting points, but it comes
down to knowing your tools.

I have posted in the past an experiment involving for_each over a
container, and in this case for_each generated less code than an
explicit for loop. I have also posted results gained from
compiling this code with other compilers, showing that other
compilers do not do the optimal thing (and subsequently I filed
bug reports to the vendor). This is an experiment that I ran and
posted on clc++ at a point in the past.

I have done other experiments for myself to "prove" that template
code is not generating interesting overhead; indeed the optimizer
in the compiler that I typically use seems to be quite good at
optimizing the cases that I need.

I cannot post experiments that will be valid for other people
because I do not know what needs other people have. I can,
however, often come up with decent experimentation techniques for
people if they fill me in on what they need.

> Having scratched my head for a fair amount of time over
> a Loki class that needed 5 minutes and 150Mb space
> to compile a simple 10 line test program I think it's fair to
> say that there those of us who are more interested
> in how things actually work in practice than how they
> should work or could work "in theory".

I didn't figure that the OP was talking about build resources, I
thought he was mostly interested in end-product bloat, but I
understand your point. I have never played around with Loki
because I am not yet at the point where I fully understand its
utility, and I have enough going on that I don't have the
bandwidth available to study it at this time.

> BTW making one loki method non-inline reduced
> compile to 1min and 30Mb space, so go figure.

Why was that surprising? Was it not self-evident by looking at
the code? Did you just pick one method at random or did you
apply some heuristic to guess which might be the "optimal" method
to change to reduce compile times?

> I use templates and have no idea if my code is more
> or less bloated than it should be.

Do some experiments. Write the code both ways. Compile and see.
That is the only way to know for sure if your environment does
the right thing, or even if it's "right enough" that the reduced
maintenance costs outweigh the additional weight.
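A sketch of what such an experiment could look like, assuming a for_each-versus-loop comparison like the one described above. The names Accumulate, sum_for_each, sum_loop and sample_data are hypothetical. The idea is to compile the file with optimization, then compare the generated assembly or the object-file sizes of the two functions using whatever listing or size tools the toolchain provides.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Function object used by the for_each version.
struct Accumulate {
    explicit Accumulate(long* total) : total_(total) {}
    void operator()(int v) const { *total_ += v; }
    long* total_;
};

// Version 1: standard algorithm plus function object.
long sum_for_each(const std::vector<int>& v) {
    long total = 0;
    std::for_each(v.begin(), v.end(), Accumulate(&total));
    return total;
}

// Version 2: hand-written loop.
long sum_loop(const std::vector<int>& v) {
    long total = 0;
    for (std::vector<int>::const_iterator it = v.begin(); it != v.end(); ++it) {
        total += *it;
    }
    return total;
}

// Small fixed input so both versions can be checked against each other.
std::vector<int> sample_data() {
    std::vector<int> v;
    for (int n = 1; n <= 5; ++n) v.push_back(n);
    return v;
}

void demo_compare() {
    const std::vector<int> data = sample_data();
    assert(sum_for_each(data) == sum_loop(data));
}
```

The functional check only confirms that both versions compute the same thing; the interesting result is the size comparison, which has to be measured on the compiler and settings actually in use.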

-tom!

James Kanze

May 23, 2002, 11:34:57 AM

"Garry Lancaster" <glanc...@ntlworld.com> writes:

|> > |> >void foo(){ static int i; ...}

|> > |> >as a member function of the class template. The
|> > |> >implementation HAS to be able to remove duplicated
|> > |> >function instantiations.

|> > |> The implementation has to coalesce (sp?) the "static int i"
|> > |> for all instantiations, but the code itself may be
|> > |> duplicated. E.g. gcc. In that case, the implementation is
|> > |> suboptimal, but not completely wrong.

|> James Kanze:
|> > The implementation must ensure that &foo is the same in all
|> > translation units. If it isn't, the implementation is broken.

|> Since foo is a *member* function, and of a class template at that,
|> &foo is wrong. I think you meant something like
|> &some_class_template<some_type>::foo.

OK, I lost some context. You're right.

|> Can you give a standard reference for your claim? The only
|> possibly relevant sections I can find say "An implementation is
|> not required to diagnose a violation of this rule" or "no
|> diagnostic required".

5.10/1: "Two pointers of the same type compare equal if and only if
they are both null, both point to the same object or function,
or both point one past the end of the same array."

The text concerning pointer to member functions is somewhat more
complex, and leaves the case of pointers to virtual member functions
unspecified, but otherwise, the effect is the same.

This text is copied directly from the C standard. In practice, I have
never found an implementation which is conforming, but none that I know
have had problems with pointers to functions. (The problem is with
something like:
int a[ 2 ] ;
int b[ 2 ] ;
int c[ 2 ] ;
In all implementations I've seen, either &a[ 2 ] == &b[ 0 ] or &b[ 2 ]
== &c[ 0 ]. The text given would seem to preclude this, although in
fact, &a[ 2 ] does point to &b[ 0 ], so it may just be that the
standard is actually saying less than it seems to.)

--
James Kanze mailto:ka...@gabi-soft.de
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany Tel. +49(0)179 2607481


Vincent Finn

May 23, 2002, 11:39:10 AM

> //MyTemplate.cpp------------------------------------------
> #if !defined(MYTEMPLATE_H)
> //If not being included by MyTemplate.h,
> //define non-template implementation
>
> #include "MyTemplate.h"
>
> void NonTemplateClass::F1_NonInline(void)
> {...}
> #endif
> #if defined(MYTEMPLATE_H) || defined(__ExportKeywordImplemented__)
> //If being included by MyTemplate.h or if export keyword is implemented,
> //define template implementations
>
> template<typename T>
> void TemplateClass<T>::F2_Non_Inline(void)
> {...}
> #endif

Unfortunately, it can't be used with precompiled headers,
but it's the best solution I have seen so far.

Thanks, Vin

Daniel Miller

May 23, 2002, 10:40:08 PM

James Kanze wrote:

> Daniel Miller <daniel...@tellabs.com> writes:
>
> [...snip...]
>
> |> If we have the following:
>
> |> template
> |> <
> |> class T
> |> >
> |> class ParameterizedType
> |> {
> |> T blah;
> |> static int i;
> |> };
>
> |> And then at points of usage, where MyClass is different from
> |> YourClass:
>
> |> ParameterizedType<MyClass> m;
> |> ParameterizedType<YourClass> y;
>
> We have two different classes. Templates don't define classes or
> functions; they define a template for generating classes or functions.
>
> |> interpretation#1: Certain people will argue that the correct
> |> interpretation is to have exactly one instance of
> |> ParameterizedType's i, period. In this school of thought &m::i ==
> |> &y::i. James, is this your intent?
>
> No. This is completely irrelevant to my example (and the one I was
> responding to).

It is relevant to exposing the incompleteness of that line of reasoning with
respect to the larger topic of code-bloat. As usual, many of the negative-reply
postings throughout all branches of this thread are in large part treating the
topic as a sort of one-upmanship competitive debating society establishing a
pecking-order rather than a free and open exchange of ideas among co-equal
experts all of whose inputs are valued.

The idea which I am bringing to the table here is that code bloat is a lot
less about dodge-perry-spin quoting from the standard and a lot more about the
softness & illogicalness of human perception. Some interpretations on which
human perceptions are based may in fact be noncompliant with the standard. Some
interpretations may be outside of the chosen thrust of a posting's message. But
these interpretations on which human perceptions are based are in fact very
germane when discussing human-perception-based problems such as code bloat.

I repeat my message: One person's unintended code-bloat is another person's
intent, due to human perceptions. Those human-perceptions can be possibly
illogical, possibly standards-noncompliant, or possibly 100% valid in some
metric (e.g., a metric which you or the standard is not considering at the moment).

> The variables m and y have completely unrelated
> types, and each must have its own instance of i.
>
> |> interpretation#2: Whereas other people will argue that the
> |> correct interpretation is to have more than one instance of
> |> ParameterizedType's i. In this school of thought &m::i != &y::i.
> |> James, is this your intent?
>
> It's what the standard clearly says. A template does NOT define a
> type (or a function). Only instantiations of the template are types
> (or functions).
>
> |> People who subscribe to interpretation#2 would likely perceive
> |> this to be code-bloat (rightly or wrongly), because they got more
> |> than one instance of ParameterizedType's i.
>
> It's not code bloat,

Where is human-perception-based code bloat normatively/exhaustively defined
for you to issue such a decisive edict for all of humankind?

> since it is what the semantics require.

One person's code-bloat is another person's required semantics. One
committee-person's required semantics which they drove hard to get into the
standard is one user's perception of ridiculous unintended code bloat.

> (It's
> also not code bloat, because the variable i isn't code.)

Maybe the meaning of the term "code" has drifted for non-native English
speakers. Deducing from your "the variable i isn't code", the way you use the
term "code" is the way that standard practice uses the terms "executable
statement" or "instructions".

Standard computer-industry practice uses the term "code" for any
manifestation of a computer language. If I say, "Let's look at the code for
your C++ program." I am definitely not saying "Let's look at the instructions
between braces of function-bodies in your C++ program." I am saying "Let's look
at all of the language constructs manifested in your C++ program---both
declaration and definition, both data and instructions." The term "code"
implies obedience to some language's legal syntax (even if that language's legal
syntax is vendor-specified instead of standardized, even if that specification
is the language translator itself---bugs and all---instead of a document).

Furthermore general practice uses the term "code" for any manifestation of
highly-technical/highly-specialized authorship. If I say, "The law is codified
in United States Civil Code." I am saying, "The law is written in noncolloquial
legalese in the official normative set of highly-specialized legal documents
which obey the rules/conventions/constitution of the United States." The term
"code" implies obedience to some normative-bodies' conventions (e.g.,
constitution, bar association, application of government's technique of enacting
law, lack of application of government's technique of dissolving law).

Even in the sense of "machine code", "code" includes data as well as
instructions because the byte-sequence of the program (including both
instructions and data) is codified as per the rules & conventions of that
processor's architecture. I have never heard *anyone* ever say, "The compiler
compiles only the C++ program's instructions into machine code and compiles the
C++ program's data into (machine?) data."

[...snip...]

> In the subthread I was responding to, the question was one of
> ParameterizedType being instantiated over the same type (say int) in
> two or more compilation units.

In the branch to which I was responding (i.e., your post), the deficiency
requiring public airing was whether the topic was being treated too narrowly.
By taking a strict issue-edicts-on-standards-compliance approach, one loses the
primary component of code-bloat complaints: human perceptions and expectations.

> Today, most compilers will issue an
> instantiation of the function in the object file of all of the
> compilation units. The question concerned the case in which the
> linker didn't coalesce these copies.

That is a mere example of the larger topic which I was discussing: human
perception of how many copies (of various language constructs) *should* versus
*should not* be generated. James, you have your individual perception. Other
people have their perceptions. One person's intended semantics is another
person's code bloat due to these perception/expectation differences between
individual divergent human beings. This human-perception/expectation component
is an important message to be heard & respected regarding code bloat. Please do
not attempt to drown me out by eroding my message via deflection or derision.
You could have just said privately to yourself "Hmmmm, that is an interesting
alternate perspective of which I was not thinking at the time. I need to ponder
that more and see whether considering the human-perception component of
code-bloat complaints has merit." without rebuttal.

> That would definitely create code
> bloat. On the other hand, it would not be conforming, since taking the
> address of the function in different translation units would result in
> different addresses.


James Kanze

May 23, 2002, 11:29:30 PM

Francis Glassborow <francis.g...@ntlworld.com> writes:

|> In article <3jboeusctia4cciqh...@4ax.com>,
|> brou...@yahoo.com writes
|> >Dietmar Kuehl <dietma...@yahoo.com> wrote:

|> >>Just because the template code is in the header file does not
|> >>mean that it has to or should be inline.

|> >Is this a quality of implementation issue or is there a way to
|> >specify that template code in a header file shouldn't be inlined?

|> NO. It is an issue of how you place it in the header file. If, as
|> many seem to do, you make those definitions in-class then they are
|> implicitly inline and the language provides no mechanism to cancel
|> that. If you place them in the header file but outside the class
|> definition whether they are inline or not is up to you.

Definitions in the class are processed as if they were declared
inline. For definitions outside of the class, you can decide whether
they are declared inline or not.

But that doesn't change the fact that the inline modifier on a
non-exported template function has absolutely no semantics, other than
that of a simple comment or a suggestion that the compiler is free to
take or not, as it pleases.

--
James Kanze mailto:ka...@gabi-soft.de
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
Ziegelhüttenweg 17a, 60598 Frankfurt, Germany Tel. +49(0)179 2607481


Garry Lancaster

May 23, 2002, 11:30:21 PM

James Kanze:

> 5.10/1: "Two pointers of the same type compare equal if and only if
> they are both null, both point to the same object or function,
> or both point one past the end of the same array."

Thanks.

I think it's clear that this applies to normal functions,
but does it apply to instantiations of member functions
of class templates? (There are a number of places in
the standard that treat templates and functions as
different things. I don't believe the word "function" is
ever unambiguously defined.) I wasn't sure until I
spotted section 14.4 Type Equivalence:

"Two template-ids refer to the same class or *function*
if their template names are identical, they refer to the
same template, their type template-arguments are the
same type, their non-type template-arguments of integral
or enumeration type have identical values, their non-
type template-arguments of pointer or reference type
refer to the same external object or function, and their
template template-arguments refer to the same template."

(my emphasis)

Which makes it clear that it does, I think.
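A small single-file sketch of that reading. Holder and the two address-taking functions are hypothetical names; the two functions merely stand in for two translation units, so this does not exercise the linker the way the real cross-unit case would.

```cpp
#include <cassert>

template <class T>
struct Holder {
    // Returns the address of the function-local static so callers can see
    // that there is exactly one i per instantiation of foo.
    int* foo() {
        static int i = 0;
        ++i;
        return &i;
    }
};

typedef int* (Holder<int>::*IntFooPtr)();

// Stand-ins for "taking the address in different translation units".
IntFooPtr address_taken_here()  { return &Holder<int>::foo; }
IntFooPtr address_taken_there() { return &Holder<int>::foo; }

void demo_same_function() {
    // Same template-id, hence the same function, hence equal pointers.
    assert(address_taken_here() == address_taken_there());

    Holder<int> a;
    Holder<int> b;
    Holder<long> c;
    assert(a.foo() == b.foo());   // one static i for Holder<int>::foo
    assert(a.foo() != c.foo());   // Holder<long>::foo has its own i
}
```

Holder<int>::foo and Holder<long>::foo have different pointer-to-member types, so they cannot even be compared directly; the static locals make the distinction observable instead.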

Kind regards

Garry Lancaster
Codemill Ltd
Visit our web site at http://www.codemill.net

Hannah Schroeter

May 23, 2002, 11:30:46 PM

Hello!

In article <zXWdclC4...@robinton.demon.co.uk>,
Francis Glassborow <fran...@robinton.demon.co.uk> wrote:
>[... compilers not coalescing equal template instantiations from
> different compilation units ...]

>True, but it is a pretty poor compiler/linker that cannot do better than
>that. I know that in the early days instantiations had more important
>priorities but these days a respectable implementation should manage.

Example: gcc/OpenBSD/x86. In fact, the -frepo thing AFAIK only works
on new GNU ld, and ELF only. OpenBSD/x86 has neither. Strange that I
don't see any *inherent* problems in doing more collect2 trickery
for non-GNU-ld and/or non-ELF platforms, probably just that no-one has
done it.

Kind regards,

Hannah.

Dietmar Kuehl

May 24, 2002, 2:51:43 AM

brou...@yahoo.com wrote:

> Dietmar Kuehl <dietma...@yahoo.com> wrote:
>>Just because the template code is in
>>the header file does not mean that it has to or should be inline.
>
> Is this a quality of implementation issue or is there a way to specify that
> template code in a header file shouldn't be inlined?

Unless you specify the function explicitly (using the keyword 'inline')
or implicitly (defining it inside a class definition) inline, it is not
inline. A function not being inline basically generates a block of code
which is really called as a function (strictly speaking, the
implementation is free to do the same for inline functions, too; also,
the implementation is free to inline non-inline functions as well...; so
there are indeed some QoI issues here, too). The major difference is
this: Often, even relatively complex functions are inlined because they
are declared as inline. If they were not inlined, the calling code would
be smaller and the compiler basically has to remove the duplicate code
blocks somehow (strictly speaking, it does not actually have to do this: it
can include whatever unused bytes it wants into the final executable,
i.e., there is a QoI issue here, too).

Although there are some QoI issues involved, it normally comes down to
this: inline functions are inlined unless there are good reasons not to
inline them (e.g., because they are virtual, are really big, etc.). This
results in lots of code duplication which is not removed. Non-inline
template functions are normally not inlined and result in multiple
blocks of code which are reduced to just one copy at link time. Compilers
differ in their exact approaches and I have not really investigated the
assembler of the linked executable to prove this. However, there are
differences in the executable sizes.
--
<mailto:dietma...@yahoo.com> <http://www.dietmar-kuehl.de/>
Phaidros eaSE - Easy Software Engineering: <http://www.phaidros.com/>

Nick Thurn

May 24, 2002, 2:54:02 AM

Tom Plunket <to...@fancy.org> wrote in message news:<8b5jeukaa2s8ccjjk...@4ax.com>...

> Nick Thurn wrote:
>
> > Tom Plunket wrote...
> >
> > > If we want to help them, we will tell them that their perceptions
> > > are wrong and that a simple experiment can prove it.
> >
> > As Scott said "Denying the existence of the problem... ".
> > Please post your simple experiment.
>
> Francis and Scott did bring up interesting points, but it comes
> down to knowing your tools.
>
> I have posted in the past an experiment involving for_each
> [snip] in the past.
>
Ok, I'll look for it; however, what you are saying is *much* less
categorical than your previous statement. BTW, what compiler
are you using?

> > Having scratched my head for a fair amount of time over
> > a Loki class that needed 5 minutes and 150Mb space
> > to compile a simple 10 line test program [snip]
>
> I didn't figure that the OP was talking about build resources, I
> thought he was mostly interested in end-product bloat, but I
> understand your point. I have never played around with Loki
> because I am not yet at the point where I fully understand its
> utility, and I have enough going on that I don't have the
> bandwidth available to study it at this time.
>

Yes you are correct. My point was that so much can occur around
templates that the "why" is a real problem. As you say "bandwidth"
means you don't have time to definitively examine/solve these
issues.

> > BTW making one loki method non-inline reduced
> > compile to 1min and 30Mb space, so go figure.
>
> Why was that surprising? Was it not self-evident by looking at
> the code? Did you just pick one method at random or did you
> apply some heuristic to guess which might be the "optimal" method
> to change to reduce compile times?
>

Whoo! you'd better have a look at Loki before declaring things
are/should be self evident!! BTW how did you know my porting
methodology <g>.

Thanks for responding.

cheers
Nick

Alex Oren

May 24, 2002, 7:10:24 PM

On 23 May 2002 04:41:09 -0400, Daniel Miller wrote in
<3CEC2AF3...@tellabs.com>:

> If we have the following:
>
> template
> <
> class T
> >
> class ParameterizedType
> {
> T blah;
> static int i;
> };
>
> And then at points of usage, where MyClass is different from YourClass:
> ParameterizedType<MyClass> m;
> ParameterizedType<YourClass> y;
>
> interpretation#1: Certain people will argue that the correct interpretation
> is to have exactly one instance of ParameterizedType's i, period. In this
> school of thought &m::i == &y::i. James, is this your intent?
>
> interpretation#2: Whereas other people will argue that the correct
> interpretation is to have more than one instance of ParameterizedType's i. In
> this school of thought &m::i != &y::i. James, is this your intent?

[...]

> I claim that both interpretations are useful in different situations. I
> claim that without the expressivity in the C++ language to explicitly choose
> between interpretation#1 and interpretation#2 where only one interpretation is
> supported instead, the people who subscribe to the losing interpretation will
> complain endlessly that the wrong interpretation was chosen.

[...]

> I am not actually proposing such an explicit selection mechanism between
> interpretation#1 and interpretation#2.

[...]

I am in favour of interpretation #2 being correct because it will allow
achieving interpretation #1 using inheritance:

struct Base { static int i; };
template <typename T> struct Data: Base { /*...*/ };

Which gives us the selection mechanism.
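Spelled out as a compilable sketch: Base and Data follow the two lines above, while PerType and the two helper functions are hypothetical additions used for contrast.

```cpp
#include <cassert>

// Interpretation #1 on request: the static lives in a non-template base,
// so every Data<T> shares the single Base::i.
struct Base { static int i; };
int Base::i = 0;

template <typename T>
struct Data : Base { T value; };

// Interpretation #2, the language default: one i per instantiation.
template <typename T>
struct PerType {
    static int i;
    T value;
};

template <typename T>
int PerType<T>::i = 0;

bool shared_via_base() {
    Data<int>::i = 42;                       // writes Base::i
    return &Data<int>::i == &Data<long>::i   // same object
        && Data<long>::i == 42;
}

bool separate_per_type() {
    PerType<int>::i = 1;
    PerType<long>::i = 2;
    return &PerType<int>::i != &PerType<long>::i
        && PerType<int>::i == 1;
}

void demo_selection() {
    assert(shared_via_base());
    assert(separate_per_type());
}
```

The choice between the two is made by where the static member is declared, not by any extra language feature.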

That is consistent with Daniel's other example:

> class Class1
> {
> MyClass blah;
> static int i;
> };
>
> class Class2
> {
> YourClass blah;
> static int i;
> };
>
> People who subscribe to interpretation#1 would certainly not extend their
> line of reasoning to argue that &Class1::i should == &Class2::i. Because
> ParameterizedType is simply factoring out the variably-typed member-datum named
> blah from Class1 and Class2 into ParameterizedType, people who subscribe to
> interpretation#2 could make the case that &m::i != &y::i for the same reason
> that &Class1::i != &Class2::i.

&Class1::i != &Class2::i but we can achieve &Class1::i == &Class2::i
using the same inheritance method:

struct Base { static int i; };
struct Data1: Base {};
struct Data2: Base {};

Best regards,
Alex.

--
To email me, replace "myrealbox" with "alexoren".
Sorry for the inconvenience. Blame the spammers.

Tom Plunket

May 24, 2002, 7:18:31 PM

Nick Thurn wrote:

> > I have posted in the past an experiment involving for_each
> > [snip] in the past.
>
> Ok I'll look for it however what you are saying is *much* less
> categorical than your previous statement.

What I am saying is that I have done the research on my own
regarding what my compilers do with templates, and I posted the
actual research gained from only one minimal experiment because
it was surprising to me how differently two different "major
vendors" compiled the same code.

Posting a simple experiment that proves some functionality of my
compiler would be useless to many here, and it would also
necessarily be platform-specific. What use would there be in
posting detailed results regarding the code generation of a C++
compiler for PlayStation2?

> > > Having scratched my head for a fair amount of time over
> > > a Loki class that needed 5 minutes and 150Mb space
> > > to compile a simple 10 line test program [snip]
> >
> > I didn't figure that the OP was talking about build resources, I
> > thought he was mostly interested in end-product bloat, but I
> > understand your point. I have never played around with Loki
> > because I am not yet at the point where I fully understand its
> > utility, and I have enough going on that I don't have the
> > bandwidth available to study it at this time.
>
> Yes you are correct. My point was that so much can occur around
> templates that the "why" is a real problem. As you say "bandwidth"
> means you don't have time to definitively examine/solve these
> issues.

"bandwidth" was meant to imply that the tools I have at my
disposal work fine and I have no need to introduce more. The
fundamental reasons being that the people I work with are big NIH
believers so it would be an uphill battle to put anything like
that into place regardless.

> > > BTW making one loki method non-inline reduced
> > > compile to 1min and 30Mb space, so go figure.
> >
> > Why was that surprising? Was it not self-evident by looking at
> > the code? Did you just pick one method at random or did you
> > apply some heuristic to guess which might be the "optimal" method
> > to change to reduce compile times?
>
> Whoo! you'd better have a look at Loki before declaring things
> are/should be self evident!! BTW how did you know my porting
> methodology <g>.

I am not declaring stuff self-evident, I am asking what you
found. Specifically, "is your understanding of the way templates
work on your compiler something that led to your changing of this
one function, or did you just pick one of the functions of Loki
at random and change it?"

-tom!

Nick Thurn

May 25, 2002, 5:52:10 AM

"Tom Plunket" <to...@fancy.org> wrote in message

news:sj0teushe5a3eeli5...@4ax.com...

> Nick Thurn wrote:
>
> > > I have posted in the past an experiment involving for_each
> > > [snip] in the past.
> >
> > Ok I'll look for it however what you are saying is *much* less
> > categorical than your previous statement.
>
> What I am saying is that I have done the research on my own
> regarding what my compilers do with templates, and I posted the
> actual research gained from only one minimal experiment because
> it was surprising to me how differently two different "major
> vendors" compiled the same code.
>

No, what you said was:

> Tom Plunket wrote...
>
> > If we want to help them, we will tell them that their perceptions
> > are wrong and that a simple experiment can prove it.
>

This is pretty categorical. What you meant applied only in
a limited context, but that is not what you said.

>
> I am not declaring stuff self-evident, I am asking what you
> found. Specifically, "is your understanding of the way templates
> work on your compiler something that led to your changing of this
> one function, or did you just pick one of the functions of Loki
> at random and change it?"
>

Ok as another poster has pointed out concerning expression
templates - the compiler gives up. At what point it gives up
is unknown, except by experiment, to compiler users.

Understanding how templates work on any compiler
is only possible for simple cases as the combination
of the simple cases in real code will hit limits imposed
by the compiler writer that are not knowable except,
at least in g++'s case, by reading the compiler code.
In particular, why this specific feature, given all the
complexity of Loki's other features, would cause
a blowout is not obvious. Much more complex code
compiled in a flash.

The method that was causing the grief was a simple three
line one (I don't have the code on this machine otherwise
I'd post it - hopefully I've recalled correctly). The compiler
did not "give up" but did take huge resources to perform
the expansion. How I discovered the "solution" I can't
recall. However I had posted queries in this group to
no avail prior to discovering it. As, in any event, I was
porting a bleeding edge library to g++ the whys were
not my primary concern as g++ itself is (and is still)
a moving target wrt template/standard correctness.
In other words just getting the code to run *at all*
was my first goal.

cheers
Nick

Trevor Taylor

May 29, 2002, 6:35:35 AM

Francis Glassborow <francis.g...@ntlworld.com> wrote in message news:<edC2eYAp...@robinton.demon.co.uk>...
> In article <95e0e5ef.02052...@posting.google.com>, Stefan
> Heinzmann <stefan_h...@yahoo.com> writes
> >In a statically linked executable, the above point should be solved
> >with a reasonable toolset, as others have said. I am astonished
> >however that no one has yet mentioned that this can indeed be a problem
> >in dynamically loaded libraries or plug-in modules. Since they're
> >compiled and linked separately, they will of course contain their own
> >template instantiations.

"Since... of course"? It doesn't necessarily follow. Granted, though,
that by using default compile and link time behaviour on many
compilers/platforms they will contain their own template
instantiations.

> >
> >For example if my application consists of a number of separate modules
> >which are loaded dynamically (not uncommon in Windows-land), and I use
> >the standard library a lot (streams, strings and the like), I will
> >likely have instantiations of the same stuff in each module.
>
> But that is not a template issue, it is an issue with dynamically linked
> libraries.

Whether it's a "template issue" will depend on how you interpret
"template issue", but it is certainly relevant to the discussion (if
not the original question). In particular because it shows that...

1. [Multiple instantiations] Temp<T> is instantiated in more
than one translation unit, so when the objs are linked
together, the exe has more than one copy of Temp<T>'s
member functions.

... is not as simple as the example and other replies suggest:

- several previous replies have said that the duplicate code will be
stripped by the linker, but Stefan was the first that hinted at
caveats in this area when using shared libraries and templates.

- shared libraries are typically used to reduce duplicated code. But
they can actually *increase* duplicated code when templates come into
play.

- it is easy to overlook the run-time dependencies that are introduced
by using shared libraries that instantiate templates. Since one might
introduce shared libraries in order to reduce code bloat, shared
libraries would seem relevant to the discussion.

Trevor