Modules before adding Concepts

snk_kid

Sep 30, 2013, 11:07:09 AM
to std-pr...@isocpp.org
Hi, I wanted to post this in the Concepts group, but I never got a response to my join request, so I'm posting here. This is something I've been thinking about on and off for a while, but after the GoingNative 2013 videos I wanted to put my thoughts out there and see what the community thinks.

I don't remember which video it was, but Bjarne Stroustrup mentioned that one of the issues with the old Concepts proposal was that compilation times got worse with the prototype implementation.

I started to think that maybe the community is going about this the wrong way: maybe the first step, and the highest priority above everything else, should be to standardize a module system so we can get away from the textual inclusion we have now. Then come back to features that can influence (and be influenced by) compilation times and see how they perform.

To me it does not seem to make much sense to accept or reject solutions that are disadvantaged by an old, outdated system (textual includes) whose issues everyone knows and which will eventually be replaced.

I'm not trying to say that we should (or should not) bring back the old Concepts proposal, but rather that we should fix the fundamental issues first (the ones which affect the 99%) before coming up with solutions for new features that are influenced by them.

I also believe that, regardless of Concepts, a module system should be the highest priority above all else. I've been working with C++ commercially for almost 10 years (and longer prior to this). When a team of programmers works on a large project, build times for C++ can get pretty insane, and most C++ programmers do not consistently use features/techniques like forward declarations, if ever. It's just too much mental work and too mechanical for a team of programmers to get right, especially when they are rushed to meet deadlines.

The more extensions and features are added to C++ that influence build times, the worse those build times are going to get.

Ville Voutilainen

Sep 30, 2013, 11:23:17 AM
to std-pr...@isocpp.org
On 30 September 2013 18:07, snk_kid <korcan....@googlemail.com> wrote:

I'm not trying to say that we should (or should not) bring back the old Concepts proposal, but rather that we should fix the fundamental issues first (the ones which affect the 99%) before coming up with solutions for new features that are influenced by them.


Well, I guess I'm in the 1% then, because lack of modules is not a problem for me.

Robert Zeh

Sep 30, 2013, 11:57:45 AM
to std-pr...@isocpp.org
On Mon, Sep 30, 2013 at 10:07 AM, snk_kid <korcan....@googlemail.com> wrote:
I started to think that maybe the community is going about this the wrong way: maybe the first step, and the highest priority above everything else, should be to standardize a module system so we can get away from the textual inclusion we have now. Then come back to features that can influence (and be influenced by) compilation times and see how they perform.

To me it does not seem to make much sense to accept or reject solutions that are disadvantaged by an old, outdated system (textual includes) whose issues everyone knows and which will eventually be replaced.



I agree that the compile times for projects over 1 million lines of code make life difficult --- they have certainly made my life difficult.  Fixing this is important, and it will most likely be fixed by modules.

However, I haven't been able to find a single priority queue for the community :-)  The people working on modules are not fungible with the people working on concepts, for a lot of reasons, but keep in mind that they are also volunteers.  Volunteers work on what they want to work on.  It is not as if all the other standardization work (sized deallocation, filesystem, concurrency, etc.) is being done at the expense of modules.

One thing you could do is to talk to the people working on modules and see if there is anything that you could help with.

Robert

David Krauss

Oct 1, 2013, 1:00:07 AM
to std-pr...@isocpp.org
On 9/30/13 11:57 PM, Robert Zeh wrote:
> On Mon, Sep 30, 2013 at 10:07 AM, snk_kid <korcan....@googlemail.com> wrote:
>
>> Then come back to features that can influence (and be influenced by)
>> compilation times and see how they perform.
>>
> I agree that the compile times for projects over 1 million lines of code
> make life difficult --- they have certainly made my life difficult.

Why don't precompiled headers work for you?

Adjusting the language to make compilers faster seems conceptually
backwards, or a red flag. I haven't looked into modules much, but I'd
assumed the motivation was elsewhere. I thought header repetition was a
solved problem.

Loading precompiled header files also takes significant time, but
modules wouldn't necessarily be any better. I'm sure that can be
improved, but it's a job for compiler vendors.

Standardizing an interoperable format equivalent to a precompiled header
would take forever. PCHes are just AST dumps. Standardizing a
compressed, serialized AST for every C++ construct? That's much harder
than standardizing an ABI. And when a compiler has to perform meaningful
conversion because the native AST differs, performance is lost and the
exercise becomes pointless.

So we already have a speedy binary format on every platform used for
serious work, and we're never going to unify these formats anyway… what
are modules supposed to fix? Just to remove the PCH build-system
boilerplate by letting the compiler cache things more easily? Seems minor.

Klaim - Joël Lamotte

Oct 1, 2013, 8:12:31 AM
to std-pr...@isocpp.org
On Tue, Oct 1, 2013 at 7:00 AM, David Krauss <pot...@gmail.com> wrote:

Why don't precompiled headers work for you?


I must say they don't work for me either. Precompiled headers are not a silver bullet; they can even have the reverse effect of increasing compile time.
In any case, it's better to manage your dependencies well, but that is also hard in a language where the compiler can't figure out which parts of a compiled file are real dependencies (because it can't tell you when you no longer need to include a file after a change).
 
Adjusting the language to make compilers faster seems conceptually backwards, or a red flag.

Not if it's the design of the language that prevents build times from getting faster.
 
I haven't looked into modules much, but I'd assumed the motivation was elsewhere.

My understanding is that the main motivation is reducing build times. You can read about the issues here: http://clang.llvm.org/docs/Modules.html#problems-with-the-current-model
The first stated problem that modules would solve is "Compile-time scalability". Also see the Modules paper for more details.
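
For reference, Clang's documentation at the time describes mapping existing headers to modules with a separate "module map" file; roughly like this (a sketch based on that documentation, details may differ):

module MyLib {
  header "MyLib.h"
  export *
}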
 
I thought header repetition was a solved problem.


It's not: avoiding reopening the same file several times is indeed a solved problem, but the content of a header might be interpreted differently depending on the code before and after its inclusion, which forces the compiler to re-parse the content of each header for every compilation unit.
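
A minimal illustration of the problem (file and macro names made up): the same header produces two different types depending on what was defined before its inclusion, so a single cached parse of it cannot simply be reused.

// vec.h
#ifndef VEC_SCALAR
#define VEC_SCALAR float
#endif
struct vec3 { VEC_SCALAR x, y, z; };

// a.cpp
#include "vec.h"            // here vec3 holds floats

// b.cpp
#define VEC_SCALAR double   // same header, different meaning
#include "vec.h"            // here vec3 holds doubles (and an ODR trap)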

Loading precompiled header files also takes significant time, but modules wouldn't necessarily be any better. I'm sure that can be improved, but it's a job for compiler vendors.


The Modules documentation states that with modules "the API of each software library is only parsed once, reducing the M x N compilation problem to an M + N problem." With, say, M = 100 library headers and N = 1,000 translation units, that is 100,000 header parses cut down to 1,100.
Precompiled headers can't do that. They also can't automatically be configured to scale with changes.
 
Standardizing an interoperable format equivalent to a precompiled header would take forever. PCHes are just AST dumps. Standardizing a compressed, serialized AST for every C++ construct? That's much harder than standardizing an ABI. And when a compiler has to perform meaningful conversion because the native AST differs, performance is lost and the exercise becomes pointless.


I've seen no such format proposal. Did I miss something? 
 
So we already have a speedy binary format on every platform used for serious work, and we're never going to unify these formats anyway… what are modules supposed to fix? Just to remove the PCH build-system boilerplate by letting the compiler cache things more easily? Seems minor.

 
It's not. First, there is no way PCH builds can scale.
Second, lack of quick feedback is killing productivity. It's a major problem.
All the C++ projects I've worked on so far have had this build time problem, even the smallest.
Even applying all the best dependency techniques (including unity builds, which are no silver bullet either) doesn't scale.

stackm...@hotmail.com

Oct 1, 2013, 8:43:31 AM
to std-pr...@isocpp.org
So many ignorant people. Pathetic.

Sean Middleditch

Oct 1, 2013, 2:12:21 PM
to std-pr...@isocpp.org
On Monday, September 30, 2013 10:00:07 PM UTC-7, David Krauss wrote:
On 9/30/13 11:57 PM, Robert Zeh wrote:
> On Mon, Sep 30, 2013 at 10:07 AM, snk_kid <korcan....@googlemail.com> wrote:
>
>> Then come back to features that can influence (and be influenced by)
>> compilation times and see how they perform.
>>
> I agree that the compile times for projects over 1 million lines of code
> make life difficult --- they have certainly made my life difficult.

Why don't precompiled headers work for you?

They cause many problems.  They're often buggy.  They play very poorly with the preprocessor yet have to play with it, whereas modules just side-step the preprocessor entirely.  Precompiled headers impose a large number of limitations.  They're not universally available.  Setting them up differs between compilers, IDEs, and build systems.  They can often _increase_ build time if used naively and don't help nearly as much as modules can when used intelligently.  Precompiled headers are a hackjob stopgap to work around the severe limitations and problems of the C preprocessor's token-pasting dependency mechanism.
 

Adjusting the language to make compilers faster seems conceptually
backwards, or a red flag. I haven't looked into modules much, but I'd

Not fixing a language that is often characterized by its atrocious compile times, when a clearly identified solution (widely known from other, similar languages) is presented, is a lot more backwards, as I see it.  There is a very clear problem and a clear fix; why not do it?  :)
 

assumed the motivation was elsewhere. I thought header repetition was a
solved problem.

Loading precompiled header files also takes significant time, but
modules wouldn't necessarily be any better. I'm sure that can be
improved, but it's a job for compiler vendors.

Other languages have long laid all this to rest.  I'd suggest looking into the module systems - and the build time improvements brought about by them - in compiled languages like C#, Java, D, Go, etc.  The C include pattern is a huge problem that simply does not scale at either the parsing or the I/O level.  That's not surprising considering that the whole C preprocessor was a hack to work around shortcomings in the original C language itself (you might be aware of what C coding was like before the preprocessor was tacked on: lots of manual copying of prototypes and structure definitions from big printed manuals).  The original C language had no way to include dependencies without pasting things in.  The preprocessor is just a tool to make that pasting automated.  Modules fundamentally change and fix the entire problem rather than just hacking it to be barely tolerable.
 
Standardizing an interoperable format equivalent to a precompiled header
would take forever. PCHes are just AST dumps. Standardizing a
compressed, serialized AST for every C++ construct? That's much harder
than standardizing an ABI. And when a compiler has to perform meaningful
conversion because the native AST differs, performance is lost and the
exercise becomes pointless.

<pedantry> Well, it's not a hard problem to solve at the basic level of having such an interoperable format.  It's an impossible problem to solve in a way that provides any significant benefit.  Simply changing things from needing to parse C/C++ (a portable interchange of machine instructions, if you will) into a compiler-specific in-memory format to needing to parse a slightly more compact version into a compiler-specific in-memory format is going to help on the I/O side a bit, but not the parsing or template instantiation side.  The problem is not that it's _hard_ to provide a portable format but that no such portable format can actually do what PCHs do, making the entire exercise pointless. A portable PCH would likely just be a big C++ header file run through the preprocessor, maybe with a terser syntax. </pedantry>
 

So we already have a speedy binary format on every platform used for
serious work, and we're never going to unify these formats anyway… what
are modules supposed to fix? Just to remove the PCH build-system
boilerplate by letting the compiler cache things more easily? Seems minor.

Modules simplify the entire coding process.  No other language's design requires such an awkward split as header files.  They're sometimes touted as a clean separation between implementation and interface, but that's rarely true.  Templates make it untrue.  Class definitions usually make it untrue (since the header needs to declare both public and private members).  The needs of the linker - which, on a related subject, are in general pretty terrible - mandate a lot in terms of putting private symbols in headers to share things between translation units.  Look at the pattern adopted by many/most libraries, where they have to split headers between internal and external headers simply because headers by themselves don't actually solve any problems at all besides the awkward translation-unit semantics inherited from C.

Modules let C++ behave more like all the other modern C-like languages.  They make the problems of header dependencies _disappear_.  They make the issues with public and private symbols _explicit_ rather than poorly abstracted.  They remove the code duplication necessary between a header file and a cpp file.  They remove the arbitrary (from the standpoint of a newcomer) distinction between which code must be in a header and which code can/should be in a cpp.  If fully embraced, they can make most of the hairier problems of the linker and dynamic library generation go away, too.  And yes, they (can) make the compilation and linking process significantly shorter (if designed/implemented well) in cases where a PCH is infeasible or counter-productive, and without all the nasty side-effects of a unity build.
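
To make that concrete, here is a minimal sketch of the header-free style; the exact syntax was still in flux across the proposals at the time, so this uses the spelling C++ eventually standardized and should be read as illustrative only:

// widget.cppm - one module interface file instead of a .h/.cpp pair
module;                      // global module fragment: old-style includes go here
#include <vector>
export module widget;

export class Widget {
public:
    void draw();             // exported: visible to importers
private:
    std::vector<int> data_;  // declared here, but nothing leaks to importers as text
};

void Widget::draw() { /* definition lives right next to the interface */ }

// main.cpp
import widget;               // the compiler parses the module once, then reuses it

int main() {
    Widget w;
    w.draw();
}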

David Krauss

Oct 2, 2013, 12:00:19 AM
to std-pr...@isocpp.org


On Wednesday, October 2, 2013 2:12:21 AM UTC+8, Sean Middleditch wrote:
On Monday, September 30, 2013 10:00:07 PM UTC-7, David Krauss wrote:
On 9/30/13 11:57 PM, Robert Zeh wrote:
> On Mon, Sep 30, 2013 at 10:07 AM, snk_kid <korcan....@googlemail.com> wrote:
>
>> Then come back to features that can influence (and be influenced by)
>> compilation times and see how they perform.
>>
> I agree that the compile times for projects over 1 million lines of code
> make life difficult --- they have certainly made my life difficult.

Why don't precompiled headers work for you?

They cause many problems.  They're often buggy.  

Vacuous argument.
 
They play very poorly with the preprocessor yet have to play with it, whereas modules just side-step the preprocessor entirely.

A module sidesteps it by starting translation with no user-defined macros, no? And a PCH inclusion must be the first thing in the TU, before any macros are defined. It would seem that modules are merely multiple PCHes, or conversely that the problem of using PCHes is exactly that of generating a one-size-fits-all PCH-module (or a small set of few-sizes-fit-most PCH-modules) from header components.

If modules are forbidden from defining macros entirely, that's not a good thing. All I know is that there have been various proposals of different scopes. It's hard to tell what you're referring to.
 
 Precompiled headers impose a large number of limitations.  They're not universally available.  Setting them up differs between compilers, IDEs, and build systems.

The same applies to headers, not because of preprocessor limitations but because the language says little about file storage. (And projects really do manage headers differently, so the vagueness has a purpose.) Every "professional" platform I've seen has a facility to dump compiler state to a local file and reload it as the first directive in a TU.

It would be nice to have a standard way of doing that. It could be just a single pragma.
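
Something along these lines, say (the pragma names below are pure invention for illustration, not any existing compiler's spelling):

// common.cpp - builds the shared state once
#include <vector>
#include <string>
#pragma state_dump("common.state")  // hypothetical: snapshot compiler state here

// user.cpp - reloads it as the first directive, like a PCH today
#pragma state_load("common.state")  // hypothetical: restore the snapshot
// ... rest of the TU ...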

Modules, like PCHes, would need to be build targets. Big programs are harder to build; that's just reality.
 
 They can often _increase_ build time if used naively

The necessary work consists of checking to see which headers are slow, and only putting the PCH in sources that actually need those headers.
 
and don't help nearly as much as modules can when used intelligently.

PCH is a build optimization to be used in addition to headers, which are semantically equivalent to other languages' modules. Indeed it's unhelpful to need to work toward a faster build. As for helping the build times, you're just speculating, because C++ isn't other languages.
 
 Precompiled headers are a hackjob stopgap to work around the severe limitations and problems of the C preprocessor's token-pasting dependency mechanism.

Some implementations are half-assed, but as for performance, the vendor is just addressing customer needs. Modules aren't magic fast sauce as the underlying problem is the same.
 
Adjusting the language to make compilers faster seems conceptually
backwards, or a red flag. I haven't looked into modules much, but I'd

Not fixing a language that is often characterized by its atrocious compile times, when a clearly identified solution (widely known from other, similar languages) is presented, is a lot more backwards, as I see it.  There is a very clear problem and a clear fix; why not do it?  :)

We're now talking about slow compile times due to excessive long-hand code. Not templates or complicated overload resolution, because they need to be done the same no matter what.

Wouldn't another fix be to cache redundant definitions, mapping token streams to ASTs, and skip parsing them? The ODR rule specifically allows this, but no implementations do it. (Disclaimer: I plan to implement this.)
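
A rough sketch of that idea (all names are invented for illustration; real compiler internals are nothing this simple):

#include <memory>
#include <string>
#include <unordered_map>

struct AstNode { /* stand-in for a parsed definition */ };

// Key each definition by its preprocessed token spelling; on a hit, splice in
// the cached AST instead of parsing again. A real implementation must also
// verify that name lookup resolves identically, as the ODR requires.
class DefinitionCache {
    std::unordered_map<std::string, std::shared_ptr<AstNode>> entries_;
public:
    std::shared_ptr<AstNode> lookup(const std::string& tokens) const {
        auto it = entries_.find(tokens);
        return it == entries_.end() ? nullptr : it->second;
    }
    void remember(const std::string& tokens, std::shared_ptr<AstNode> ast) {
        entries_.emplace(tokens, std::move(ast));
    }
};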

Or, for the here-and-now, minimize the interface and combat bloat with proper separation of concerns. Reusable, highly generic header libraries like Eigen and Boost are slow, but amenable to PCH. They aren't bloat. But if you have definitions living in a header which *aren't* getting reused all the time, perhaps they don't belong to an interface.
 
Loading precompiled header files also takes significant time, but
modules wouldn't necessarily be any better. I'm sure that can be
improved, but it's a job for compiler vendors.

Other languages have long laid all this to rest.  I'd suggest looking into the module systems - and the build time improvements brought about by them - in compiled languages like C#, Java, D, Go, etc.

I'm also familiar with the quirks of Java vis-à-vis module versioning and identification (and source tree layout, yuck). You still get situations where a "clean rebuild" fixes an issue. Aside from C#, which I haven't used, I haven't seen D or Go used in a "really big" project.
 
 The C include pattern is a huge problem that simply does not scale at either the parsing or the I/O level.  That's not surprising considering that the whole C preprocessor was a hack to work around shortcomings in the original C language itself (you might be aware of what C coding was like before the preprocessor was tacked on: lots of manual copying of prototypes and structure definitions from big printed manuals).  The original C language had no way to include dependencies without pasting things in.  The preprocessor is just a tool to make that pasting automated.  Modules fundamentally change and fix the entire problem rather than just hacking it to be barely tolerable.

Impassioned, but a vacuous argument. I/O can be cached (and usually is), and a module system only changes its quantity by a constant factor. The ODR rule already gives us enough to skip unnecessary parsing. In theory, and hopefully soon in practice, the preprocessor does not make anything else slow. And preprocessing itself (given a file cache) is fast enough not to be a problem.

No implementation to my knowledge has even bothered to internalize caching, which would be a first step. This is ultimately what makes Java fast: the modules are always loaded in the always-on VM.
 
<pedantry> Well, it's not a hard problem to solve at the basic level of having such an interoperable format.  It's an impossible problem to solve in a way that provides any significant benefit.

Just to keep apples to apples, note that Java does derive a benefit because the task at hand isn't translation to native code.

You're mostly rephrasing what I had said.
 
A portable PCH would likely just be a big C++ header file run through the preprocessor, maybe with a terser syntax. </pedantry>

I was supposing it would already be run through the parser, hence AST. To my knowledge that's what most PCHes do.

So we already have a speedy binary format on every platform used for
serious work, and we're never going to unify these formats anyway… what
are modules supposed to fix? Just to remove the PCH build-system
boilerplate by letting the compiler cache things more easily? Seems minor.

Modules simplify the entire coding process.  No other language's design requires such an awkward split as header files.  They're sometimes touted as a clean separation between implementation and interface, but that's rarely true.  Templates make it untrue.

If you *want* template interface separation, you can have it by forward declaration. Folks are just too lazy to do so.
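
For readers who haven't seen it, one way to get that separation today is explicit instantiation, which works when the set of supported types is known up front (file names made up):

// stack.h - interface only, no member definitions
template <class T>
class Stack {
public:
    void push(const T& value);
    T pop();
private:
    T storage_[64];
    int top_ = 0;
};

// stack.cpp - definitions hidden from users of the header
#include "stack.h"

template <class T> void Stack<T>::push(const T& value) { storage_[top_++] = value; }
template <class T> T Stack<T>::pop() { return storage_[--top_]; }

template class Stack<int>;     // explicit instantiations for every
template class Stack<double>;  // type the library promises to support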

Function return type deduction is the first thing to my knowledge to really lock out the possibility of interface separation.
 
 Class definitions usually make it untrue (since the header needs to declare both public and private members).

This is a real issue, if you really want to hide the private members. But I'm not aware of a proposed C++ solution. Modules would make them invisible only by obfuscation of the binary format. If that format is unportable, the module file still needs to be generated by a publicly distributed source file.

Java doesn't solve it either, it just makes virtual dispatch and factory functions more idiomatic. If you use C++ like Java, the problem already goes away.
 
 The needs of the linker - which, on a related subject, are in general pretty terrible - mandate a lot in terms of putting private symbols in headers to share things between translation units.  Look at the pattern adopted by many/most libraries, where they have to split headers between internal and external headers simply because headers by themselves don't actually solve any problems at all besides the awkward translation-unit semantics inherited from C.

If you don't want to distribute internal headers, you either need separate external headers, or comprehensive documentation in a non-source format, which adds certain risks. I don't see how this relates to TUs. It's a matter of JavaDoc vs Doxygen. Users who don't rely on a manual will see private members when they look up the public ones.

The way to avoid this in Java is with an additional abstract base class, and the same applies in C++. But of course this doesn't mean it applies to templates. So really this is fairly new territory. It's not fair to say we should copy a solution from a non-templated language.
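
Concretely, the idiom looks like this in C++ (names made up): the header exposes only an abstract interface plus a factory, and the private members live entirely in the implementation file.

// shape.h - what users see: no private members anywhere
#include <memory>

struct Shape {
    virtual ~Shape() {}
    virtual double area() const = 0;
};

std::unique_ptr<Shape> make_circle(double radius);  // factory function

// shape.cpp - the concrete class, invisible to users
#include "shape.h"

namespace {
struct Circle : Shape {
    explicit Circle(double r) : radius_(r) {}
    double area() const override { return 3.14159265 * radius_ * radius_; }
    double radius_;  // private detail that never appears in a header
};
}

std::unique_ptr<Shape> make_circle(double r) {
    return std::unique_ptr<Shape>(new Circle(r));
}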
 
Modules let C++ behave more like all the other modern C-like languages.  They make the problems of header dependencies _disappear_.  They make the issues with public and private symbols _explicit_ rather than poorly abstracted.  They remove the code duplication necessary between a header file and a cpp file.  They remove the arbitrary (from the standpoint of a newcomer) distinction between which code must be in a header and which code can/should be in a cpp.  If fully embraced, they can make most of the hairier problems of the linker and dynamic library generation go away, too.  And yes, they (can) make the compilation and linking process significantly shorter (if designed/implemented well) in cases where a PCH is infeasible or counter-productive, and without all the nasty side-effects of a unity build.

 I do think that C++ needs to improve interface specification and separation. Modules might help, but remember that C++ does more than other C-like languages. It never foists a pointer upon you, even if it would simplify the ABI or language implementation, and there are templates.

As for performance, implementations should continue to improve handling of the current language. We shouldn't need a silver bullet. I still have doubts about the claimed non-viability of PCHes, but I guess what matters is real user experience, not theoretical user experience :v) .
