There are almost ninety projects in our company's product! My task is to
reduce compile time noticeably, but I have a restriction: because I am a
newbie on my team and not familiar with our product, I cannot modify any
class structure! I have two questions that I hope you can help me with.
1. Can you tell me in which cases I can remove unnecessary #include
directives? Any references would also be welcome!
2. Are there any tools to do this work automatically? It is difficult
for people to do it by hand in a large project!
For (1), I referred him to Item 34 in Effective C++ as well as to John
Lakos' book (Large-Scale C++ Software Design), but I don't know of any
tools for (2). Do readers here know of any? I can think of two kinds of tools.
- Something that would identify superfluous #includes, i.e., #includes
that, if removed, don't affect the behavior of the program. For
example, it's easy to imagine a "#include <vector>" sitting at the top
of a file long after all the vectors were changed to lists.
- Something that would suggest changes (or offer to make them itself!) to
classes that would eliminate the need for #includes. For example, I
wrote about the idea of an automatic pimplifier in the first edition of
EC++ (though I didn't use that term, because Herb hadn't popularized it
yet), but I've never heard of a program that did that. Do such
programs exist?
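For concreteness, here is a minimal sketch of the transformation such a
pimplifier would have to perform (class and file names invented):

// Before: widget.h exposes its implementation, so every client
// recompiles whenever gadget.h changes.
#include "gadget.h"
class Widget {
public:
  void frob();
private:
  Gadget g_;
};

// After: the header holds only an opaque pointer; gadget.h is now
// included solely by widget.cpp, which defines WidgetImpl.
class WidgetImpl;
class Widget {
public:
  Widget();
  ~Widget();
  void frob();
private:
  WidgetImpl* pimpl_;
};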
Finally, does anybody know whether PC Lint or any other static analysis
tools can help with the problem of too many compilation dependencies?
Thanks,
Scott
I think it's not that easy. The vectors may not have been removed, but
only moved elsewhere. This means that you cannot simply remove #include
<vector>; you probably need to move it, too. At least this is what I've
observed in our code (~320,000 lines of C++).
Also, sometimes an include cannot be moved/removed, but it can be
replaced with a forward declaration which can also lead to significant
time savings.
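For illustration (the names are made up), the classic case is a member
held only by pointer or reference:

// Before: widget.h forces every client to parse gadget.h as well.
#include "gadget.h"
class Widget {
  Gadget* g_;   // used only through a pointer in this header
};

// After: a forward declaration suffices for pointers and references;
// only widget.cpp, which calls Gadget's members, includes gadget.h.
class Gadget;
class Widget {
  Gadget* g_;
};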
The tool needs to be able to move/remove/replace headers plus reordering
iff the compiler can benefit from it (the GCC 3.4 precompiled headers
come to mind). If anything like that exists, I'd be glad to hear about
it. :o)
OTOH, doing it manually is an excellent start for the guy who sent you
the query to learn more about his company's code base (and C++) :)
Regards, Daniel
--
Daniel Frey
aixigo AG - financial solutions & technology
Schloß-Rahe-Straße 15, 52072 Aachen, Germany
fon: +49 (0)241 936737-42, fax: +49 (0)241 936737-99
eMail: danie...@aixigo.de, web: http://www.aixigo.de
I'm not sure how many are out there, but I was thinking of a very
interesting utility program that would actually see which files are
modified most (and their dependencies).
This way, you can over time see *where* it's worthwhile to remove
dependencies.
In short, it could be something like this; in each file, have
something similar to:
namespace { some_type var(__FILE__,__LINE__,__DATE__,__TIME__); }
The some_type's constructor will write __FILE__, etc. to a database
when the program is run. Over time you can see which files are
modified most (and with a similar technique it's easy to find out the
dependencies as well).
(note: each time a file is recompiled, __DATE__ and __TIME__ change,
therefore each time the program is run, we know which files got
re-compiled).
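A possible implementation of some_type (a log file stands in here for
the database mentioned above):

#include <fstream>

// Appends one record per translation unit at program start-up.
// Because __DATE__/__TIME__ change on every recompile, scanning the
// log across runs reveals which files get rebuilt most often.
struct some_type {
  some_type(const char* file, int line, const char* date, const char* time)
  {
    std::ofstream log("rebuild.log", std::ios::app);
    log << file << ':' << line << ' ' << date << ' ' << time << '\n';
  }
};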
Unfortunately, I don't have the time to implement it right now - but
it's a very interesting idea for future reference ;)
Best,
John
> I recently received the following query, which summarized a question I'd
> been thinking about, anyway, so here goes:
[snip]
> 1. Can you tell me in which cases I can remove unnecessary #include
> directives? Any references would also be welcome!
>
> 2. Are there any tools to do this work automatically? It is difficult
> for people to do it by hand in a large project!
> - Something that would identify superfluous #includes, i.e., #includes
> that, if removed, don't affect the behavior of the program. For
> example, it's easy to imagine a "#include <vector>" sitting at the top
> of a file long after all the vectors were changed to lists.
[snip]
> Finally, does anybody know whether PC Lint or any other static analysis
> tools can help with the problem of too many compilation dependencies?
PC-Lint claims to identify unnecessary inclusions in both C and C++.
I've never tested it in C++, but it most certainly works quite well
with C source code.
I don't know of any tool, however, that will automatically remove the
includes from the source files.
--
Jack Klein
Home: http://JK-Technology.Com
FAQs for
comp.lang.c http://www.eskimo.com/~scs/C-faq/top.html
comp.lang.c++ http://www.parashift.com/c++-faq-lite/
alt.comp.lang.learn.c-c++ ftp://snurse-l.org/pub/acllc-c++/faq
Scott Meyers wrote:
[snip]
> Finally, does anybody know whether PC Lint or any other static analysis
> tools can help with the problem of too many compilation dependencies?
We use PC Lint regularly, and one of its extra features is that it
warns about unnecessary header files. Regrettably, I have never probed
this particular feature for its limits, or conversely for its
"intelligence", so I cannot say how well it works. Do others here have
some comparisons?
Greetings from Bremen,
Daniel Krügler (nee Spangenberg)
> - Something that would identify superfluous #includes, i.e., #includes
> that, if removed, don't affect the behavior of the program. For
> example, it's easy to imagine a "#include <vector>" sitting at the top
> of a file long after all the vectors were changed to lists.
You say there are compilers used by real programmers where having that
extra include has a measurable impact on compile time?
Well, I'm possibly too used to MSVC and its precompiled header
mechanism, but I find it really hard to imagine.
Having unneeded includes that are in flux is indeed bad, as it triggers
recompilation when none is needed. But read-only headers do not have
such an effect.
And the original problem is IMHO to be attacked at the design level,
not the technical one. Fancy tools that can find a few droppable
includes will hardly gain much speed in general. The whole development
process should be analysed to find the root causes.
Auto-pimplifying is an interesting idea. Though again, even if we had
it ready, would it really be good to apply it blindly to everything in
sight? OTOH, if a search discovers a small set of fast-changing
classes, they can be redone by hand using that information.
Paul
Doxygen generates include-dependency graphs, and also maps definitions
and declarations to header files.
In addition to HTML, Doxygen emits XML and .dot files (graphviz
inputs). I believe the XML output contains all the necessary
information, but I have never bothered to check thoroughly.
So, it may be possible to use that data to build a dependency graph,
and calculate redundant #include edges.
There is also the question of what is "necessary"; for example, let's
say that both B.h and C.h make direct reference to declarations in
A.h, but C.h also includes B.h for other reasons. Do we remove the
"A.h" inclusion in C.h because B.h does it? What happens if B.h ceases
to depend on A.h?
That is: is necessity defined by the minimal requirement for
compilation, or augmented by a stylistic requirement? [If you refer to
A.h's contents, then you include A.h even if it is indirectly included
through other means.]
This is not feasible because you might have preceding #includes that
are necessary for the succeeding #includes. A typical example is
libstdc++, the standard C++ library, where the public include files
#include the dependencies for the actual implementation files. Running
such a tool on such files would mean having a pseudo-preprocessor
actually go through each included file and see whether it is used
there or not!
Regards,
-Dhruv.
I don't understand what you mean exactly. What do you mean by "moved
away"?
Personally, I've found many files with #includes for headers that
correspond to functionality that is no longer used, i.e., where removing
the #includes does nothing but speed compilation and linking. Is this an
uncommon experience?
> Also, sometimes an include cannot be moved/removed, but it can be
> replaced with a forward declaration which can also lead to significant
> time savings.
I agree. That's the kind of thing I'd hope the second kind of tool I
posted about would be able to do.
Scott
> 1. Can you tell me in which cases I can remove unnecessary #include
> directives?
You could run every file through the preprocessor (cc -E or whatever;
make sure to use the same preprocessor options you use when building)
and remove those #includes that make no difference to the output
(apart from maybe blank lines and comments). A tool to do this
automatically shouldn't be hard to write.
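A sketch of such a tool, in C++ for concreteness (the compiler command,
temporary file names, and the blank-line/line-marker filtering are all
assumptions; a real version would honour the project's full set of
preprocessor options):

#include <cstdlib>
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>

// Preprocess 'file' and return the output with blank lines and '#'
// line-marker directives stripped, so layout noise doesn't affect the
// comparison.
std::string preprocessed(const std::string& file)
{
  std::string cmd = "g++ -E " + file + " > pp.tmp";
  if (std::system(cmd.c_str()) != 0) return "<error>";
  std::ifstream in("pp.tmp");
  std::ostringstream out;
  std::string line;
  while (std::getline(in, line))
    if (!line.empty() && line[0] != '#')
      out << line << '\n';
  return out.str();
}

int main(int argc, char* argv[])
{
  if (argc != 3) {
    std::cerr << "usage: trim_include <file> <line-of-#include>\n";
    return 1;
  }
  const std::string file = argv[1];
  const int target = std::atoi(argv[2]);

  // Write a copy of the file with the candidate #include commented out.
  std::ifstream in(file.c_str());
  std::ofstream out("candidate.tmp.cpp");
  std::string line;
  for (int n = 1; std::getline(in, line); ++n)
    out << (n == target ? "// " + line : line) << '\n';
  out.close();

  const bool same = preprocessed(file) == preprocessed("candidate.tmp.cpp");
  std::cout << (same ? "include appears redundant\n"
                     : "include affects the preprocessed output\n");
  return same ? 0 : 2;
}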
But I think if you trim #includes in such a way you're pushed in the
direction of following Rob Pike's advice:
>Simple rule: include files should never include include files. If
>instead they state (in comments or implicitly) what files they need
>to have included first, the problem of deciding which files to
>include is pushed to the user (programmer) but in a way that's easy
>to handle and that, by construction, avoids multiple inclusions.
>Multiple inclusions are a bane of systems programming. It's not rare
>to have files included five or more times to compile a single C
>source file. The Unix /usr/include/sys stuff is terrible this way.
>
>There's a little dance involving #ifdef's that can prevent a file
>being read twice, but it's usually done wrong in practice - the
>#ifdef's are in the file itself, not the file that includes it. The
>result is often thousands of needless lines of code passing through
>the lexical analyzer, which is (in good compilers) the most expensive
>phase.
>
>Just follow the simple rule.
Not everyone agrees with this - some try to make include files
'self-contained' so that if some header file fred.h needs vector, for
example, it will #include <vector> itself rather than telling its
includer jim.c to add such a line. But if you ran such a project
through a #include-trimmer, it might remove the #include <vector> from
jim.c because it is 'not needed' (since it already got included as
part of fred.h, and has an include guard to make the second inclusion
have no effect) - which then means that jim.c will unexpectedly break
if it one day stops including fred.h. The inclusions start to work
only by coincidence.
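In code, the coincidence looks like this:

// fred.h - self-contained: it brings in what it needs
#include <vector>
std::vector<int> fred();

// jim.c - uses std::vector directly, but a naive trimmer observes
// that removing the <vector> line still compiles, because fred.h
// happens to provide it. The include is 'redundant' today and
// load-bearing the day jim.c stops including fred.h.
#include "fred.h"
#include <vector>
std::vector<int> jim() { return fred(); }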
> - Something that would identify superfluous #includes, i.e., #includes
> that, if removed, don't affect the behavior of the program.
That is a much bigger challenge - the above removes only those that
don't affect the program text. What you suggest amounts to deciding
program equivalence, which is undecidable in general.
The best approximation might be to remove as much as possible while
still getting the program to compile (with the hope that if something
was not needed to compile it can't have been used anywhere else).
> - Something that would suggest changes (or offer to make them itself!) to
> classes that would eliminate the need for #includes.
An automatic pimplifier is tricky; moving function bodies from header
to implementation file is usually easier and can be done with brute
force text munging (provided you check it compiles afterwards).
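A sketch of that transformation (names invented); the compile check
afterwards is what makes it safe:

// Before: the body in widget.h drags <algorithm> into every client.
#include <algorithm>
#include <vector>
class Widget {
  std::vector<int> v_;
public:
  int largest() const { return *std::max_element(v_.begin(), v_.end()); }
};

// After: the header keeps only the declaration...
#include <vector>
class Widget {
  std::vector<int> v_;
public:
  int largest() const;
};

// ...and widget.cpp is the only file that pays for <algorithm>:
#include "widget.h"
#include <algorithm>
int Widget::largest() const
{ return *std::max_element(v_.begin(), v_.end()); }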
--
Ed Avis <e...@membled.com>
Rather than answer the questions directly, can I suggest the following
which, for me, have had far, far more effect than I could ever have hoped
for by removing unwanted dependencies.
1. Build from local disks.
Building from headers and files on NFS or SMB (Windows Networking)
mounted disks is an enormous overhead. If you are currently doing this,
my recommendation is to copy all files to a local disk, then build from
there. Even if you copy all the files every time you build, you will
STILL save build time, in my judgement and experience.
If copying files becomes a bottleneck, use rsync or a similar tool to speed it up.
2. Turn off the on-access virus scanner.
Obviously this only applies if you have an on-access virus scanner
(that's one which scans files for viruses as you read them). My
experience with this was McAfee on Windows NT 4.0, about four years
ago, so your mileage may vary. Turning it off sped up the compilation
phase noticeably (a few percent), but sped up the link phase by a
factor of tens.
Another possibility to consider is the compilation options themselves.
Most compilers have a plethora of options, many of which affect build
time.
On a related topic (at least, they were related for me at the time), I
also speeded up load time for the same application by changing DLLs from
exporting functions by name to exporting them by ordinal only. For C++
class members, the name was frequently much larger than the function
itself, and didn't benefit from duplicate comdat folding (as MS calls it
-- merging functions which consist of the same instruction sequence). I
also changed the DLLs from exporting classes to exporting only the
functions which were actually being used. This was a single-purpose DLL,
so there was no issue of compatibility with other applications. These two
changes, together with a linker flag, cut load time from minutes to tens
of seconds (I forget the exact numbers). The bigger effect came from
exporting by ordinal, which enormously reduced the size of the DLLs.
I'm sorry that these suggestions aren't about standard C++, but I believe
they may help your questioner.
Directly related are the following, possibly too obvious to be worth
mentioning. These only help when the same header is included twice in
the same compilation unit:
Use #pragma once or equivalent.
Use external include guards, if #pragma once is not available.
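For anyone who hasn't met the second technique (Lakos describes it), a
sketch with invented names:

// Internal guard, inside widget.h itself:
#ifndef WIDGET_H
#define WIDGET_H
// ... contents of widget.h ...
#endif

// External guard, repeated at every point of inclusion, so the
// preprocessor need not even open widget.h a second time:
#ifndef WIDGET_H
#include "widget.h"
#endif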
Cheers,
Ben Liddicott
We focused on header files, as removing an include from a header file
has a wider effect than working on a source module.
The problem is that the header may be able to be removed, and the
header may still parse correctly. However, some source module may be
relying on that header dragging in the secondary header. In other
words, B.H may not have needed A.H, but B.CPP may have been relying
(perhaps unknowingly) on getting A.H from B.H.
Also, simply proving that the code still compiles with the header
removed is not enough. Some header files can change behavior, even
though the code compiles whether they are included or not. Think of a
traits class that has valid defaults. You may be able to remove the
header which defines a necessary specialization, and the code may
still compile and run, but differently.
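A sketch of that trap (all names invented):

// widget_traits.h: primary template with a valid default.
template <typename T>
struct widget_traits {
  static const int buffer_size = 64;
};

// big_widget_traits.h: a specialization the code relies on.
#include "widget_traits.h"
class BigWidget;
template <>
struct widget_traits<BigWidget> {
  static const int buffer_size = 4096;
};

// client.cpp: remove the traits include marked below and this still
// compiles -- it just silently uses the 64-byte default.
#include "widget_traits.h"
#include "big_widget.h"          // assumed to declare BigWidget
#include "big_widget_traits.h"   // removable without a compile error...
int n = widget_traits<BigWidget>::buffer_size;   // ...but then n == 64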
We finally decided that the best approach was to simply find the worst
offenders, fix them by hand, and teach people techniques to keep the
includes to a necessary minimum.
joshua lehrer
fellow BrownU grad - I was in a guest lecture you gave for Spike's
CS132 class
factset research systems
NYSE:FDS
> - Something that would identify superfluous #includes, i.e., #includes
> that, if removed, don't affect the behavior of the program. For
> example, it's easy to imagine a "#include <vector>" sitting at the top
> of a file long after all the vectors were changed to lists.
A kinda /brute force/ way, and only half-right, would be an automatic
check, for every header, of whether you can compile it via
#include "ThatHeader.h"
int main()
{ return 0; }
If yes, then you might try to remove includes within that very header,
etc., to find the minimal set of required headers. Unfortunately, this
leaves you with some headers containing a "less than minimal" set of
includes, which only work thanks to the includes in some other
headers, i.e.
// stack.h
#include <vector>
...
// foo.h
#include "stack.h"
//+#include <vector> <-- removed by that tool
class foo {
...
std::vector<int> bar;
};
If now the stack implementation switches to <list>... But, at least,
it's a starting point.
> Finally, does anybody know whether PC Lint or any other static analysis
> tools can help with the problem of too many compilation dependencies?
Yep - PC Lint, Doxygen and ctags all produce some useful information
about such dependencies.
Projects evolve, often under significant time pressure :) Sometimes
the implementation of a function which needs #include <vector> is
moved from the header to the implementation file. Thus the #include
can be removed from the header, but it needs to be added to the
.cc-file, so overall it is moved, not removed. But people often forget
about the includes as long as everything works.
Another example is when a class is refactored. If you factor out some
stuff into a derived class and turn the original class into a base for
a set of classes, the derived classes usually contain #include
"base.h". If base.h contains #include <whatever> which is only needed
for one of the derived classes, it should also be moved.
One more example: you decide to turn a class into a template so that
it takes 'vector' as a template argument. If the header for the class
still includes <vector>, you pay for that include even when
instantiating the class with 'set' or 'list'.
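In code (names invented):

// buffer.h
#include <vector>   // left over from before the class was templated
template <typename Container>
class Buffer {
  Container data_;
  // ...
};
// A client that only ever instantiates Buffer< std::list<int> >
// still parses all of <vector> through this header, for nothing.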
Of course all these things shouldn't be an issue in an ideal world, but
in practice people don't touch includes as long as the compiler doesn't
complain. At least this is what I observed (even when looking at myself
when working on a tight schedule), YMMV.
> Personally, I've found many files with #includes for headers that
> correspond to functionality that is no longer used, i.e., where removing
> the #includes does nothing but speed compilation and linking. Is this an
> uncommon experience?
Not at all. But it's only part of the job. It's a good start, but in
my experience the most valuable replacements are done in conjunction
with forward declarations. But it seems you already know that ;)
>>Also, sometimes an include cannot be moved/removed, but it can be
>>replaced with a forward declaration which can also lead to significant
>>time savings.
>
> I agree. That's the kind of thing I'd hope the second kind of tool I
> posted about would be able to do.
The point is that there is a danger involved when you have a tool that
only handles a part of the cases: People might think that they needn't
think about the includes any more as they now have a tool for it. It's a
good excuse for some...
Regards, Daniel
--
Daniel Frey
aixigo AG - financial solutions & technology
Schloß-Rahe-Straße 15, 52072 Aachen, Germany
fon: +49 (0)241 936737-42, fax: +49 (0)241 936737-99
eMail: danie...@aixigo.de, web: http://www.aixigo.de
> - Something that would suggest changes (or offer to make them itself!) to
>   classes that would eliminate the need for #includes. For example, I
>   wrote about the idea of an automatic pimplifier in the first edition of
>   EC++ (though I didn't use that term, because Herb hadn't popularized it
>   yet), but I've never heard of a program that did that. Do such
>   programs exist?
The Chic source tool includes an automatic pimplifier and eliminates
the problem of superfluous include directives and forward
declarations. In fact, you almost never have to write include
directives or forward declarations, as they're automatically
generated.
The bad news is that Chic is being built specially for a customer and
we haven't had the resources to support the public version. Still, for
a glimpse of hassle-free C++, have a look at the site below.
Tom Houlder
Chic - C++ without hassles
http://www.houlder.net
And that [guards inside the header itself] is the good way to do it.
Lakos' external include guards may be good for extremely large
projects compiled on 10-year-old machines and compilers -- but they
are just another way to increase redundancy and the chance for
mistakes. [which in those large projects will come back to haunt you
soon.]
> >The
> >result is often thousands of needless lines of code passing through
> >the lexical analyzer, which is (in good compilers) the most expensive
> >phase.
Come on, preprocessor directives are designed to be easy to parse, and
it's even easier to skip the unwanted #if sections. No real lexing
need happen at all; just skip to the matching #endif. In between, no
lines are considered except #if, #else and #endif, and only to
increment/decrement a depth counter. (And slightly better compilers
support #pragma once -- if this re-inclusion is really a problem,
lobby your vendor to support it too.)
> >Just follow the simple rule.
>
> Not everyone agrees with this - some try to make include files
> 'self-contained'
And that is the only sane way to do it. If I want one class/module,
anything, I should include one header for it. It should arrange for
the rest it needs, rather than me infesting all my .cc files with 20
includes and changing them every time the component decides to
rearrange itself. [has anyone around counted the cost of that
redundancy?]
> so that if some header file fred.h needs vector, for
> example, it will #include <vector> itself rather than telling its
> includer jim.c to add such a line. But if you ran such a project
> through a #include-trimmer, it might remove the #include <vector> from
> jim.c because it is 'not needed' (since it already got included as
> part of fred.h, and has an include guard to make the second inclusion
> have no effect) - which then means that jim.c will unexpectedly break
> if it one day stops including fred.h. The inclusions start to work
> only by coincidence.
IMHO that's more like a theoretical problem. When you remove some
include, you compile before check-in, and if anything is missing, you
put back those includes in no time.
In not-so-large projects, with compilers supporting precompiled
headers, it is worth putting most system/library includes in stdafx.h
or its equivalent; that greatly reduces compilation time, and also all
the hassle about what to include and what not.
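For readers who haven't used MSVC, a typical umbrella header looks like
this (contents illustrative); every .cpp then includes it first:

// stdafx.h - precompiled once, reused by every translation unit
#include <vector>
#include <string>
#include <map>
#include <windows.h>   // large, stable system headers belong here

// foo.cpp
#include "stdafx.h"    // with MSVC's /Yu, this must come first
// ...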
> > - Something that would identify superfluous #includes, i.e., #includes
> > that, if removed, don't affect the behavior of the program.
>
> That is a much bigger challenge - the above removes only those that
> don't affect the program text. What you suggest is Turing-complete.
> The best approximation might be to remove as much as possible while
> still getting the program to compile (with the hope that if something
> was not needed to compile it can't have been used anywhere else).
Hopefully you don't remove vital specialisations of things you use --
silently getting the (wrong) generic template instead.
Or -- even more subtly -- you don't end up breaking the ODR by having
different overload and specialisation sets in different translation
units, producing slightly different inlines and templates.
> > - Something that would suggest changes (or offer to make them itself!) to
> >   classes that would eliminate the need for #includes.
Wow, the java way -- to have nothing but headers in the end? ;-)
Paul
> But I think if you trim #includes in such a way you're pushed in the
> direction of following Rob Pike's advice:
>
>
>>Simple rule: include files should never include include files. If
>>instead they state (in comments or implicitly) what files they need
>>to have included first, the problem of deciding which files to
>>include is pushed to the user (programmer) but in a way that's easy
>>to handle and that, by construction, avoids multiple inclusions.
>>Multiple inclusions are a bane of systems programming. It's not rare
>>to have files included five or more times to compile a single C
>>source file. The Unix /usr/include/sys stuff is terrible this way.
>>
>>There's a little dance involving #ifdef's that can prevent a file
>>being read twice, but it's usually done wrong in practice - the
>>#ifdef's are in the file itself, not the file that includes it. The
>>result is often thousands of needless lines of code passing through
>>the lexical analyzer, which is (in good compilers) the most expensive
>>phase.
>>
>>Just follow the simple rule.
>
>
> Not everyone agrees with this - some try to make include files
> 'self-contained' so that if some header file fred.h needs vector, for
> example, it will #include <vector> itself rather than telling its
> includer jim.c to add such a line.
Not everyone? Some? Self-contained headers and include guards are pretty
much standard practice. Large C++ projects would be unmanageable
otherwise. Just imagine what happens when a class gets an additional
private member that requires an additional header file to be included,
and that class is used in two hundred source files. If you "just follow
the simple rule", you would have to change two hundred files manually,
rather than one. To a considerable extent, software engineering is about
minimizing dependencies. Forcing client code to acknowledge
implementation details of library code explicitly achieves the opposite.
I once added external include guards to a project that consisted of over
1000 source files and took hours to build. The difference in build time
I measured was negligible.
In my view, Pike's advice (which relates to C anyway) is as outdated as
5 1/4" floppy disks.
Gerhard Menzl
You mean it can read the programmer's mind and understand exactly
which headers to bring in to create the overload set the programmer
wants :-) A tool of this kind is relatively easy in C and damn near
impossible in C++ (not least because there are no requirements on C++
headers including or not including other C++ headers).
--
Francis Glassborow ACCU
If you are not using up-to-date virus protection you should not be reading
this. Viruses do not just hurt the infected but the whole community.
I guess the OP means that the portions of the code involving, for
example, vector have been moved to another file. But IMHO that makes
it perfectly okay to remove the #include <vector> directive, doesn't
it?
>
> Personally, I've found many files with #includes for headers that
> correspond to functionality that is no longer used, i.e., where removing
> the #includes does nothing but speed compilation and linking. Is this an
> uncommon experience?
Well, I don't know about others but I found this to be common behavior,
though this is just based on personal impression and not on studies.
[SNIP]
Chris
If there are ambiguities, Chic takes no action, and you have to bring
in the headers yourself.
But your intuition is good: along with conditional compilation and
template instantiation points, overloaded functions are the most
complicated area.
> A tool of this kind is relatively easy in C and damn near impossible in
> C++
Yes, it's impossible to make a tool that automatically makes all of
the programmer's decisions concerning dependencies. But firstly, the
bulk of the decisions that a tool cannot make concerns code that's
normally classified as crap code. When that's not the case, Chic gives
you the possibility of manually designing the physical dependencies
between the various files. And if that doesn't cut it, resort to
classical ISO C++ and Chic should be able to work alongside that.
> (not least because there are no requirements on C++ headers
> including or not including other C++ headers)
The basic assumption is that each header is self-contained. But if
they're not, Chic still offers options to work around it.
--
Tom Houlder
Chic - C++ without hassles
http://www.houlder.net
> You say there are compilers used by real programmers where having that
> extra include has a measurable impact on compile time?
Yes, there certainly are. It's not even necessarily just a compiler
issue - some filesystems are much faster than others.
However, I have to say that in my recent experience the compilers that
are slow at compiling come with linkers that are massively slower at
linking, so I'm unsure what proportion of the overall build time the
includes contribute.
> > You say there are compilers used by real programmers where having that
> > extra include has a measurable impact on compile time?
>
> Yes, there certainly are. It's not even necessarily just a compiler
> issue - some filesystems are much faster than others.
Well, I must admit the NFS issue just slipped my mind. As some other
poster recently wrote, compiling from non-local disks is a
show-stopper anyway, however good your includes are. And using SCCS
will localise the files anyway.
So if we talk about 'how to speed up compile time', going for a
reasonably fast-accessing file source comes way before what and how to
include.
Paul
> "Steve Toledo-Brown" wrote:
>> Yes, there certainly are. It's not even necessarily just a compiler
>> issue - some filesystems are much faster than others.
> Well, I must admit the NFS issue just slipped my mind. As some other
> poster recently wrote, compiling from non-local disks is a
> show-stopper anyway, however good your includes are. And using
> SCCS will localise the files anyway.
It's not just NFS that can cause this. Some filesystems with built-in
version control systems also have similar issues, and it's not really
practical to copy everything to a local filesystem before building.
Also, as Lakos notes, unnecessary includes can require O(N^2) disk
accesses, which adds up even if the individual accesses are very fast.
> So if we talk about the 'how to speed up compile time' going for a
> reasonably fast-accessing file source stands way before what and how
> to include.
That's not always practical.
--
Bradd W. Szonye
http://www.szonye.com/bradd
[snip]
>
> 1. Can you tell me in which cases I can remove unnecessary #include
> directives? Any references would also be welcome!
>
> 2. Are there any tools to do this work automatically? It is difficult
> for people to do it by hand in a large project!
There's an open-source Java development environment called Eclipse
that provides functionality for automatically detecting unused package
imports.
There's some C++ plugin for Eclipse but I haven't tried it.
Saul
We have a visual product called Headway Review that does all kinds of
cool stuff, plus the following with includes:
- identify any includes that can be immediately removed without
breaking the build. That is, if the identified include is from A to B,
then nothing in A uses anything in B, and if A uses something in the
include closure of B, then A also includes the required file directly.
- Identify any includes that are not immediately used. Removing these
includes may break the build until direct includes of files in the
closure of B are added to A.
- Identify any missing includes. These are includes of files that A
uses but only includes indirectly through the closure of one of the
files it directly includes.
The 3 lists let you decide on and implement your preferred include
strategy, including the most rigorous: if you use something, include
it; if you don't, don't.
We have put a lot of effort into detecting macro dependencies, and
have got the level of "false positives" down to a very low level.
You can request downloads at www.headwaysoftware.com
Chris Chedgey wrote:
> We have a visual product called Headway Review that does all kinds of
> cool stuff plus the following with includes:
>
> - identify any includes that can be immediately removed without
> breaking the build. That is if the identified include is from A to B,
> then nothing in A uses anything in B, and if A uses something in the
> includes closure of B, then A also includes the required file.
I don't think "identifying any includes that can be immediately removed without
breaking the build" is enough.
Depending on the macros, functions ..... defined in included files, the semantics of the program can change
without breaking the build.
For example:
header A:
void foo(double)
{
// do something with double
}
header B:
void foo(int)
{
// do something different with int
}
file C:
#include "A.h"
#include "B.h"
int main()
{
foo(20);
}
As you can see, header B can be removed without breaking the build,
but the program's semantics change. Does your tool detect this?
And overloaded functions are probably the easier problem; what about
macros and template specializations?
regards,
Thomas
Which is exactly why I do *NOT* agree with your "simple rule."
I have had experience with include files #including other include
files to excess. We developed dozens of reusable classes for one
project. Later we started a second project and tried to port those
reusable classes over. We encountered a problem that we called "header
file hell." One header file would have twelve other #include
directives, only one of which was actually necessary to correctly
compile the classes it contained. But removing those other #include
directives caused hundreds of other programs to get compile errors.
But a blanket declaration that header files should not #include other
header files is even worse. As you noted above, it puts the burden on
the programmer to put all of the #include files in the right order.
I've even seen code with cyclic dependencies -- one.h requires two.h
first, which requires three.h first, which requires one.h first! Try
giving that to an intern to fix...
Sane rule: The rule we finally adopted was similar but more
comprehensive. If your code uses a class or function (or whatever)
directly, it must #include the appropriate header directly.
Conversely, if it does NOT use that class or function, it must not
#include the header. This applies not only to source code modules, but
to include files as well.
This means that if Derived is derived from Base, derived.h should
#include base.h. It also means that any source code modules which use
both base and derived directly must #include both header files, ignoring
the fact that derived.h probably #includes base.h. Furthermore, header
files should use forward declarations where possible. These are two of
the first things we check for in code reviews.
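For concreteness (file contents invented):

// derived.h - names Base directly, so it includes base.h directly
#ifndef DERIVED_H
#define DERIVED_H
#include "base.h"
class Derived : public Base { /* ... */ };
#endif

// client.cpp - uses both classes directly, so it includes both
// headers, even though derived.h happens to pull in base.h already.
#include "base.h"
#include "derived.h"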
The programmer's burden no longer necessarily involves digging into
header files to see what other header files need to come first.
Instead, it involves only whatever documentation was needed in the
first place. If you plan to use the Customer class, you obviously have
to #include Customer.h -- do NOT assume that FindPreferredCustomer.h
will do this for you. (Even if it happens to today, it might not
tomorrow.)
Obviously with this technique, #include guards are absolutely essential.
Perhaps surprisingly, even without #pragma once or external guards
(where the #ifndef is in the calling code, in addition to the internal
ones in the header file), this DOES speed up compilations. The reason
seems to be that we have moved most of the extraneous #include directives
out of header files -- now, Customer.h only includes Salesperson.h if
it needs to use Salesperson directly in the Customer class, and it no
longer does anything with the Database header, or the StoreLocation
header, or anything else not needed directly.
This technique ensures that all needed header files have been
included, while isolating the code from unimportant changes in other
headers. More importantly, it also allows reusable code to actually be
reusable! If you want to bring Customer.h and Customer.cc into a new
project, and Customer.h uses Salesperson.h, you can be confident that
Salesperson.h really is needed in order to make the Customer class
work correctly. Meanwhile, the other twelve header files that
represent requirements it USED to have will no longer be there --
avoiding Header File Hell.
Unfortunately, this technique does defy automation. It would be easy to
scan code that uses Customer.h to see that it actually does use Customer
class directly, by scanning for the class name Customer. But it might
not be so easy to figure out which classes (in which namespaces) that
Customer.h defines in the first place, in order to know what to look
for... nor to differentiate code that uses Namespace1::Customer from
code that uses Namespace2::Customer (which is a completely different
header file).
As simple (programmer-enforced) rules go, I still prefer this one.
Call it the "Sane rule."
> "Steve Toledo-Brown" <StephendotT...@uk.ibm.com> wrote
> > Yes, there certainly are. It's not even necessarily just a compiler
> > issue - some filesystems are much faster than others.
>
"Balog Pal" <pa...@lib.hu> wrote
> Well, I must admit the NFS issue just slipped my mind. As some other
> poster recently wrote, compiling from non-local disks is a
> show-stopper anyway, however good your includes are. And using SCCS
> will localise the files anyway.
Hmmph. When 10-million-baud LANs were new (at least to me), they were
considered blindingly fast. And they were, compared to the
state-of-the-art computers typically used as workstations back then.
We found more than once that moving an application to a non-local disk
would actually speed it up! (Novell NetWare was faster than MFM hard
disks.)
My point (yes, I do have one) is that if you obsess about speed,
trial-and-error is still the best way to find the best configuration.
It's not always intuitive.
[No, I'm not old, I'm just forgetful... where did I put my teeth?]
> > > You say there are compilers used by real programmers where having
> > > that extra include has a measurable impact on compile time?
> > Yes, there certainly are. It's not even necessarily just a compiler
> > issue - some filesystems are much faster than others.
> Well, I must admit the NFS issue just slipped my mind. As some other
> poster recently wrote, compiling from non-local disks is a
> show-stopper anyway, however good your includes are. And using SCCS
> will localise the files anyway.
Realistically, you're never compiling on the machine with the source
code in a large project -- any decent make will run the compiles in
parallel on different machines. Whereas you can generally arrange to
run the link on the file server with the libraries.
And of course, SCCS is rather primitive compared to the better tools
these days -- which give the user a virtual view of his chosen state.
To do this, of course, they have their own NFS server, which generally
will NOT be on the local machine either (and is usually somewhat
slower than a classical NFS server, too).
> So if we talk about the 'how to speed up compile time' going for a
> reasonably fast-accessing file source stands way before what and how
> to include.
Yes and no. If it means, say, going back in time from Clearcase to
SCCS, it is going to be mighty costly in terms of programmer
productivity.
--
James Kanze GABI Software mailto:ka...@gabi-soft.fr
Conseils en informatique orientée objet/ http://www.gabi-soft.fr
Beratung in objektorientierter Datenverarbeitung
11 rue de Rambouillet, 78460 Chevreuse, France, +33 (0)1 30 23 45 16