What is the impact of Modules on buildsystems?

Stephen Kelly

Jan 22, 2017, 6:49:39 AM
to mod...@isocpp.org, mod...@microsoft.com

Hello,

I am interested in Modules both as a C++ developer and as a developer of a buildsystem (CMake). I realize that the standard has nothing to say about buildsystems or even compiler drivers, but the impact that the standard has on those (particularly in the case of Modules) will affect adoption of the feature, so I think it makes sense to talk about it.

It seems to me that large amounts of C++ library code would have to be rewritten to take advantage of them.

In the Microsoft implementation of Modules at least, it seems that users will have to create new ixx files, and the buildsystem will have to gain additional steps to generate ifc files:

 https://blogs.msdn.microsoft.com/vcblog/2015/12/03/c-modules-in-vs-2015-update-1/

Other implementations might make other choices, and tools like cmake will try to abstract away the differences to whatever extent is possible. For now in the rest of this email, I'll assume the current Microsoft implementation is 'the way'.

One of the problems I have with all examples of Modules that I've seen is that they use simple inline functions to demonstrate the feature. What does 'using Modules' look like for a shared library?

Imagine I have 40 classes in a library, each of which today has a .cpp and a .h file, and I want to support modules. I am required to write one ixx file for the entire library, so I copy the contents of the 40 .h files into MyLib.ixx and add a few export keywords.

 Question: Does having a single file for all declarations like this give me 'modularity'?

Now I'm unsatisfied with the duplication from the .h files, so I delete them all and I go to each of the 40 .cpp files and replace the

 #include "foo1.h"

with

 import MyLib;

So, the duplication is gone. Additionally, I now have the definitions of the classes Foo{2..40} available while I'm implementing Foo1. I might also (it is not clear to me) have the definitions from any dependent libraries available.

 Question: Is that a deliberate outcome of the choices made in designing Modules? What impact does it have on maintainability of the code?

If it is deliberate, it is a very large change. Such a large change should surely be the result of a decision made about the language, not an unmentioned/undiscussed side effect. It has a huge impact on adoption of the feature.

So, now I've updated my C++ code and it is time to consider the impact of this change on my buildsystem. Previously, my buildsystem would compile each of the 40 cpp files into object files and then link them together. Now, my buildsystem has to perform the step of generating the .ifc file from the .ixx file. Buildsystems need to know the outputs of the commands which make up the build, so I have a custom command which uses

 cl /c MyLib.ixx /module:output C:\Output\path\MyLib.ifc

which apparently produces an object file (what does it contain? Is that what I want?) and the MyLib.ifc file.


My buildsystem knows to invoke that command before attempting to compile any of my cpp files. If I have multiple libraries in the same build and there are dependencies between them, my buildsystem knows to generate the .ifc files in the correct order.
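The "correct order" requirement here amounts to a topological sort of the inter-module dependency graph. A minimal sketch (the module names are hypothetical, and this is not how CMake or any other tool actually implements it):

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical dependencies: each module's .ifc must be generated
# before anything that imports it is compiled.
deps = {
    "MyApp": {"MyLib", "MyOtherLib"},
    "MyLib": {"MyOtherLib"},
    "MyOtherLib": set(),
}

# A buildsystem would schedule the .ifc-producing commands in this order.
print(list(TopologicalSorter(deps).static_order()))
# → ['MyOtherLib', 'MyLib', 'MyApp']
```

This is exactly the ordering problem buildsystems already solve for static library link dependencies; the new part is that it now applies to compilation as well.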

 Question: What is the impact of naming the module in the .ixx file? What synchronization is expected between the 'module M;' name and the name of the .ifc file generated? What if there was no 'module M;' syntax? Is there an alternate design without name redundancy?

I'm interested in any answers to the questions I've asked here, but, if you prefer, I am also very interested in whether anyone has done their own thinking about this and has an answer to the open question:

 What is the impact of Modules on buildsystems?

Thanks,

Steve.

Klaim - Joël Lamotte

Jan 23, 2017, 6:09:22 AM
to mod...@isocpp.org, C++ Modules

On 22 January 2017 at 12:49, Stephen Kelly <stev...@gmail.com> wrote:
One of the problems I have with all examples of Modules that I've seen is that they use simple inline functions to demonstrate the feature. What does 'using Modules' look like for a shared library?

On this point in particular, my current understanding is that implementation-specific annotations like dllexport
are totally orthogonal to the module proposal's export.
This basically means that, at least for the Visual Studio compiler, if you don't do this:

module A;

export class SOME_EXPORTING_MACRO MyClass
{
}; 

your class isn't exposed by a shared library.

I might be wrong but that's what I assumed in my experimentations and it seemed to work as expected.

I assume that the default with gcc would be to export symbols when the type is exported, since even non-exported types
in non-module translation units are exported by default if not already static (unless you use the appropriate visibility flags, of course).


Joël Lamotte



Gabriel Dos Reis

Jan 30, 2017, 12:11:56 PM
to Stephen Kelly, mod...@isocpp.org, C++ Modules

Steve:

>> I realize that the standard has nothing to say about buildsystems or even compiler drivers, but the impact that the standard has on those (particularly in the case of Modules) will affect adoption of the feature, so I think it makes sense to talk about it.

 

Agreed.

 

Steve:

>> It seems to me that large amounts of C++ library code would have to be rewritten to take advantage of them.

 

That hasn’t been my experience, nor is there any factual or logical reason for that to be so.

If your existing library/program is already architecturally modular, there is no reason to have to rewrite them.  You may have

  • to add a module declaration to state in code the symbolic name of your module (which was in the background of your architecture, but you had no way to express directly)
  • to stick the keyword ‘export’ in front of the declarations that you intended to be part of the interface of your library

but that is a far cry from “have to be rewritten.”

 

Steve:

>> In the Microsoft implementation of Modules at least, it seems that users will have to create new ixx files

 

No, there is no requirement that you have to create new ixx files.  You can invoke the compiler with -module:interface command line switch.  Having an “ixx” is a convenience, not a requirement.

 

Steve

>> One of the problems I have with all examples of Modules that I've seen is that they use simple inline functions to demonstrate the feature.

 

I don’t know that to be true.  Certainly the design document (P0142R0) contains examples of non-inline functions.

If your larger suggestion is that we need more tutorial materials, I fully agree with that.

 

Steve:

>> What does 'using Modules' look like for a shared library?

 

I know you know the distinction, but I want to take the opportunity here to clarify a common confusion. C++ modules are distinct from shared libraries or DLLs.  There is no requirement that a C++ module be compiled to a single shared library or that a shared library correspond to a single module.  Furthermore, C++ export declarations are not necessarily linker-level export symbols (e.g. dllexport) for dynamic-linking purposes.  It is most likely that a linker-level symbol that is dllexported ends up being declared ‘export’ at the C++ source level; but the converse does not necessarily hold.

 

Now, if you do export a declaration both in the C++ sense (using the keyword ‘export’) and in the linker sense (e.g. __declspec(dllexport)), VC++ will take the step of automatically marking the symbol as ‘dllimport’ on the ‘import side’.  This relieves you from having to perform the confusing macro dance regarding when to put __declspec(dllexport) or __declspec(dllimport).  All of that evaporates.

 

Steve:

>> Imagine I have 40 classes in a library, which today each has a .cpp and a .h file and I want to support modules. I am required to write one ixx file for the entire library, so I copy the contents of the 40 .h files into the MyLib.ixx and I add a few export keywords.

 

I am a bit puzzled.  If your existing library has 40 classes in 40 modular headers, and that corresponds to the architecture you had in mind for your library, why would you want to smash them together into a single MyLib.ixx?  Apparently you have determined that you wanted a single module instead of 40 modules (which would mirror your existing 40 headers).  If that is the case, then why didn’t you have a super-header that included all of the 40 headers, which would have been the official interface of your library?  The question isn’t rhetorical; I’m trying to understand what has changed in your architecture to push you to do that.

 

By the way, you can also have this:

 

    // file: MyLib.ixx

    module MyLib;

    #include "foo1.h"

    // …

    #include "foo40.h"

 

If you like the precept of one class per file.  You can even go further using module aggregation if you go by the precept of one class per module.

 

Steve:

>> Question: Is that a deliberate outcome of the choices made in designing Modules? What impact does it have on maintainability of the code?

 

Yes.  It helps maintenance by centralizing the definitions of helpers (shared by module units of the same module) in one place.

 

Steve:

>> If it is deliberate, it is a very large change.

 

No, not really.  The actual issue here is the decision made to lump the contents of 40 headers together.  That isn’t required if it does not match the architecture you had in mind for your library.

 

Steve:

>> which apparently produces an object file (what does it contain? Is that what I want?) and the MyLib.ifc file.

 

First of all, a module interface unit can contain definitions, so their compiled code generally needs to go somewhere.  The traditional place is an object file.  So, yes, that is what you want. :-)

What this means is that compiling a module interface file produces at least two outputs.

 

Steve:

>> My buildsystem knows to invoke that command before attempting to compile any of my cpp files. If I have multiple libraries in the same build and there are dependencies between them, my buildsystem knows to generate the .ifc files in the correct order.

 

That is awesome!  Which version of CMake has this capability?

 

Steve:

>> Question: What is the impact of naming the module in the .ixx file? What synchronization is expected between the 'module M;' name and the name of the .ifc file generated? What if there was no 'module M;' syntax? Is there an alternate design without name redundancy?

 

There is no formal relationship between the pathname of the file containing a module interface unit and the module name itself.

“module M;” is what indicates to the compiler (as a matter of C++ semantics) that any declaration that follows is owned by the module M.  If that module declaration is missing, then there are no module semantics.

 

In the VC++ case, the name of the IFC file is settable by the user via the command-line option “-module:output”.  You can choose any name you want.  If you import a module M in your source, you need to specify an IFC file for the compiler to find the metadata for the interface of M.  You can do that via “-module:reference <pathname>”, where you specify a specific file, or via a directory search path with the option “-module:search <directory>”, in which case the compiler will try to associate the module M with a file M.ifc in that directory.  I recommend being explicit, e.g. using the “-module:reference” option.  See the sections “Consuming Modules” and “Module Search Path” at https://blogs.msdn.microsoft.com/vcblog/2015/12/03/c-modules-in-vs-2015-update-1/
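The two lookup modes can be sketched as a small resolver (an illustrative sketch of the resolution logic only, not the compiler's actual implementation; the function name is made up):

```python
from pathlib import Path

def resolve_ifc(module_name, references, search_dirs):
    """Resolve a module name to an IFC file, mimicking an explicit
    -module:reference map with a -module:search directory fallback."""
    if module_name in references:          # explicit mapping wins
        return Path(references[module_name])
    for d in search_dirs:                  # search-path fallback: M -> M.ifc
        candidate = Path(d) / f"{module_name}.ifc"
        if candidate.exists():
            return candidate
    raise FileNotFoundError(f"no IFC found for module {module_name!r}")

# An explicit reference takes priority over any search directory.
print(resolve_ifc("MyLib", {"MyLib": "out/MyLib.ifc"}, ["/some/dir"]))
```

The design trade-off is the usual one: explicit references make the build reproducible and self-documenting, while search paths are more convenient but can silently pick up a stale or wrong IFC.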

 

Steve:

>> What is the impact of Modules on buildsystems?

 

This is a good question.

The Windows engineering team upgraded its existing build system to include infrastructure for using modules.  From what I understand, it was a smooth upgrade. 

 

Thanks,

 

-- Gaby

Johan Boulé

Feb 4, 2017, 7:07:06 PM
to SG2 - Modules
Thanks for your interesting questions and answers.

It's good news that msvc's modules allow one to get rid of the dll-im/ex-port macro dance. Still no standard keyword, though.

I wish it would go beyond that though.

For example, a compiler flag to automatically dll-export module-exports, or some auto-import feature à la MinGW. I don't know if MinGW is doing an ugly hack internally, but linking directly to a DLL file without macros or these strange import libs sure is more pleasant.

On the other hand, msvc has a pragma to do autolinking, which saves headaches when you want to link e.g. to a Boost lib, which likes to encode the phase of the moon in its lib name ;)

I can't help but think that since modules do impact the boilerplate source code, and also the build systems, people who are willing to do the change might as well go on and do a few extra fixes to de-uglify the way libraries are declared and used.

I don't mean it has anything to do with modules, but it's probably a very good timeframe to start addressing a standardisation of a minimal set of pragmas/attributes to help with library declarations.

As a buildsystem writer, I also wouldn't mind if compiler flags for modules were standardized. The de facto standard C compiler flags have been standardized into POSIX, for example (an ISO standard too, IIRC).

Are libraries ever discussed inside of the ISO committee?

Daniel Krügler

Feb 5, 2017, 7:47:06 AM
to mod...@isocpp.org
The committee discusses proposals and issues, but it doesn't discuss
themes "as such" (at least not officially), because there would be no
base to vote for or against something. Either look at the paper
history (http://www.open-std.org/jtc1/sc22/wg21/docs/papers/) and ask
what the outcome of previously written concrete papers was or write a
paper that proposes something to be discussed addressing library
standardization, see

https://isocpp.org/std/submit-a-proposal

for the general way of doing so. For any further details about writing
proposals, please contact the LWG chair via the email address published
here:

http://www.open-std.org/jtc1/sc22/wg21/docs/lwg-active.html

Thanks,

- Daniel

Stephen Kelly

Feb 5, 2017, 6:55:27 PM
to Gabriel Dos Reis, mod...@isocpp.org, C++ Modules

Hi Gaby,

Thanks for the response.


On 01/30/2017 05:11 PM, Gabriel Dos Reis wrote:
Steve:

>> It seems to me that large amounts of C++ library code would have to be rewritten to take advantage of them.

 

That hasn’t been my experience, nor is there any factual or logical reason for that to be so.


That's good.


If your existing library/program is already architecturally modular, there is no reason to have to rewrite them.  You may have

  • to add a module declaration to state in code the symbolic name of your module (which was in the background of your architecture, but you had no way to express directly)

I don't know what you are referring to here.


  • to stick the keyword ‘export’ in front of the declarations that you intended to be part of the interface of your library

but that is a far cry from “have to be rewritten.”


Agreed.


 

Steve:

>> In the Microsoft implementation of Modules at least, it seems that users will have to create new ixx files

 

No, there is no requirement that you have to create new ixx files.  You can invoke the compiler with -module:interface command line switch.  Having an “ixx” is a convenience, not a requirement.


Ok. I didn't see anything about this on the blog post I linked before, and an internet search didn't show anything up either. Can you point me to more information?


Steve

>> One of the problems I have with all examples of Modules that I've seen is that they use simple inline functions to demonstrate the feature.

 

I don’t know that to be true.  Certainly the design document (P0142R0) contains examples of non-inline functions.

If your larger suggestion is that we need more tutorial materials, I fully agree with that.


Well, I would like to see some larger-scale examples. The above document does contain some more.

I've just now put a small set of libraries online at

 https://github.com/steveire/ModulesExperiments

It would help my understanding hugely if you could help port it to VS modules. I just tried for a while and couldn't figure out how to make it work with the libraries. If you port the C++ code and tell me how the existing compile commands need to change and what additional commands need to be run, I'm sure we can make it work as a more-complete example, and figure out how to change CMake to hide the details.

 

Steve:

>> What does 'using Modules' look like for a shared library?

 

Now, if you do export a declaration both in the C++ sense (using the keyword ‘export’) and in the linker sense (e.g. __declspec(dllexport)), VC++ will take the step of automatically marking the symbol as ‘dllimport’ on the ‘import side’.  This relieves you from having to perform the confusing macro dance regarding when to put __declspec(dllexport) or __declspec(dllimport).  All of that evaporates.


Right. If we try to imagine a future where there is no preprocessor at all during C++ builds, something like that is needed.


 

Steve:

>> Imagine I have 40 classes in a library, which today each has a .cpp and a .h file and I want to support modules. I am required to write one ixx file for the entire library, so I copy the contents of the 40 .h files into the MyLib.ixx and I add a few export keywords.

 

I am a bit puzzled.  If your existing library has 40 classes in 40 modular headers, and that corresponds to the architecture you had in mind for your library, why would you want to smash them together into a single MyLib.ixx?


Given that a module corresponds to one ixx file, and given that I thought it could be reasonable to make the ixx file correspond to the complete source interface of a library, it seems that smashing together would be the outcome.

However, you're saying that in my above hypothetical library, one header must be modularized into one module. You seem to be saying it would be unreasonable and against the design of modules to do otherwise. Do I understand correctly?


 Apparently you have determined that you wanted a single module instead of 40 modules (which would mirror your existing 40 headers).  If that is the case, then why didn’t you have a super-header that included all of the 40 headers, which would have been the official interface of your library?  The question isn’t rhetorical; I’m trying to understand what has changed in your architecture to push you to do that.


I view this design through the prism of some Python experience. There it is not uncommon to have multiple classes in a single module:

import datetime

d = datetime.date(2016, 1, 12)
dt = datetime.datetime(2016, 1, 12)

datetime is a module
datetime.date is a type in that module
datetime.datetime is a type in that module


Some C++ libraries today *do* have a super-header which #includes the other headers in the module. For example, Qt provides module headers such as QtCore and QtGui, but they are largely discouraged. See for example

 http://stackoverflow.com/questions/4437598/doing-qt-includes-the-right-way


 

By the way, you can also have this:

 

    // file: MyLib.ixx

    module MyLib;

    #include "foo1.h"

    // …

    #include "foo40.h"


That means continuing to rely on the preprocessor. It would be nice to have a design which is independent of the preprocessor. However, the above would seem to be equivalent to super-headers today, yes, and your suggestion of one C++ module per class would mean not having to do that.



If you like the precept of one class per file.  You can even go further using module aggregation if you go by the precept of one class per module.


If I understand you correctly, this is the way modules are designed/intended to be used, right? That is, instead of creating the super-module above, Qt would create a C++ module for each class (which currently resides in its own header file), and users of the libraries would use something like

 import QtCore.QString
 import QtCore.QObject
 import QtGui.QWindow
 import QtWidgets.QPushButton

correct? And we would additionally use the 'module aggregation' feature you mention?

In his talk, Manuel Klimek describes scalability problems encountered by having so many small modules:

 https://www.youtube.com/watch?v=dHFNpBfemDI

In the end, they kept the design of having so many modules and optimized other areas to compensate, but there may be other reasonable approaches based on the idea of reducing the absolute number of modules.

However, you recommend that the latter is not an approach which modules are really designed for, right?


 

Steve:

>> Question: Is that a deliberate outcome of the choices made in designing Modules? What impact does it have on maintainability of the code?

 

Yes.  It helps maintenance by centralizing the definitions of helpers (shared by module units of the same module) in one place.


I'm not sure what you are referring to here.


 

Steve:

>> If it is deliberate, it is a very large change.

 

No, not really.  The actual issue here is the decision made to lump the contents of 40 headers together.  That isn’t required if it does not match the architecture you had in mind for your library.


Right. That was one of my misunderstandings I suppose.


 

Steve:

>> My buildsystem knows to invoke that command before attempting to compile any of my cpp files. If I have multiple libraries in the same build and there are dependencies between them, my buildsystem knows to generate the .ifc files in the correct order.

 

That is awesome!  Which version of CMake has this capability?


I think you misunderstood me here. I'll try to clarify:

Speaking generally, CMake/Makefiles/Ninja need to know the output files which will be created by any command that is executed, in order to know which command to execute to create a particular file, and to determine the correct order to generate them. This has been part of those systems for years. I could create 'custom target' rules today with CMake to generate appropriate Makefiles/Ninja files to invoke cl.exe to generate ifc files.

Note though that this requires knowing/hardcoding the output files which will be created by the command.


 

Steve:

>> Question: What is the impact of naming the module in the .ixx file? What synchronization is expected between the 'module M;' name and the name of the .ifc file generated? What if there was no 'module M;' syntax? Is there an alternate design without name redundancy?

 

There is no formal relationship between the pathname of the file containing a module interface unit and the module name itself.

“module M;” is what indicates to the compiler (as a matter of C++ semantics) that any declaration that follows is owned by the module M.  If that module declaration is missing, then there are no module semantics.

 

In the VC++ case, the name of the IFC file is settable by the user via the command-line option “-module:output”.  You can choose any name you want.  If you import a module M in your source, you need to specify an IFC file for the compiler to find the metadata for the interface of M.  You can do that via “-module:reference <pathname>”, where you specify a specific file, or via a directory search path with the option “-module:search <directory>”, in which case the compiler will try to associate the module M with a file M.ifc in that directory.  I recommend being explicit, e.g. using the “-module:reference” option. 


This is probably made clear elsewhere, but I've already forgotten. Do I need to transitively specify all modules? For example, if I want to use QtWidgets.QPushButton, do I need to specify module files in my buildsystem for QtWidgets.QWidget and QtCore.QObject (which are dependencies by inheritance)?

If I have a library or executable and I add a new user of QtWidgets.QLabel to one of the classes, do I have to change my buildsystem to add a module reference for that?

Or would I (as an upstream Qt maintainer), in a case like that, design the Qt modules system such that users specify directories instead, because it is more convenient and requires fewer changes to the buildsystem?

That ends up kind of similar to what we have with header files today. However, the buildsystem still needs to know what files get included, so that when the input header/module files change, the translation units depending on those headers/modules get recompiled. At least today GCC can output the used header files in a Makefile-compatible format (and Ninja can read those files too). I suppose compilers would need the same feature to output a list of used modules. Just something to think about as part of the impact of this.
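A buildsystem consuming that Makefile-format dependency output parses it roughly like this (a simplified sketch: it handles a single rule with backslash continuations, and ignores escaped spaces and multiple rules, which real depfiles can contain):

```python
def parse_depfile(text):
    """Parse a Makefile-style dependency file, as emitted by
    'gcc -MD', into (target, [prerequisites])."""
    # Join backslash-continued lines, then split target from prerequisites.
    joined = text.replace("\\\n", " ")
    target, _, deps = joined.partition(":")
    return target.strip(), deps.split()

example = "foo1.o: foo1.cpp foo1.h \\\n common.h\n"
print(parse_depfile(example))
# → ('foo1.o', ['foo1.cpp', 'foo1.h', 'common.h'])
```

If compilers emitted consumed IFC files in the same format, tools like Ninja could reuse their existing depfile machinery for module-level rebuilds.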

Also, one of the things Manuel Klimek mentioned in his talk is that they hit command-line length limits in the buildsystem due to specifying all of the module files.


Steve:

>> What is the impact of Modules on buildsystems?

 

This is a good question.

The Windows engineering team upgraded its existing build system to include infrastructure for using modules.  From what I understand, it was a smooth upgrade. 


What I would find interesting is whether they need to parse the 'module modulename;' content from the source files, or whether they always need to specify the ifc output file name in the buildsystem when processing the ixx file (and, if that is the case, what the value of the 'module modulename;' content is).

[Disclosure: I'm a Microsoft employee, but I'm not working on anything related to this. This is a side-interest in my capacity as a CMake maintainer and Qt maintainer and as a C++ developer generally]

Thanks,

Steve.

Manuel Klimek

Feb 6, 2017, 10:07:23 AM
to mod...@isocpp.org, Gabriel Dos Reis, C++ Modules, Daniel Jasper
I thought my message was that the problems with scalability are modules that are too large :)

Note also that my talk focused on distributed build systems, which in the non-modules world take advantage of the full denormalization of C++ builds via textual includes.

I've talked with folks from Apple, who have experience with C++-ish builds and modules (via Obj-C modules), and IIRC the problem was that they have to live with very large modules (due to Swift/Obj-C interoperability), which led to a large number of rebuilds that previously were not needed.

Also note that since that talk we've brought build times down significantly by automatically pruning unneeded modules (as exposed by the .d file of the modules build). Something like that would need to be implemented by CMake modules support for extra speed-ups for rebuilds.
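The pruning idea can be sketched as a filter over the explicit module references (hypothetical names; just an illustration of keeping only the references a previous compilation's .d file actually recorded):

```python
def prune_module_refs(candidate_refs, depfile_entries):
    """Keep only the module references whose files the compiler
    actually read, as recorded in a depfile from a previous build."""
    used = set(depfile_entries)
    return {name: path for name, path in candidate_refs.items() if path in used}

candidates = {"A": "out/A.ifc", "B": "out/B.ifc", "C": "out/C.ifc"}
recorded = ["main.cpp", "out/A.ifc", "out/C.ifc"]  # from the .d file
print(prune_module_refs(candidates, recorded))
# → {'A': 'out/A.ifc', 'C': 'out/C.ifc'}
```

The payoff is that editing module B no longer triggers a rebuild of a translation unit that was handed B's IFC on the command line but never actually imported it.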
 

Steve:

>> Question: Is that a deliberate outcome of the choices made in designing Modules? What impact does it have on maintainability of the code?

 

Yes.  It helps maintenance by centralizing the definitions of helpers (shared by module units of the same module) in one place.


I'm not sure what you are referring to here.


 

Steve:

>> If it is deliberate, it is a very large change.

 

No, not really.  The actual issue here is the decision made to lump the contents of 40 headers together.  That isn’t required if it does not match the architecture you had in mind for your library.


Right. That was one of my misunderstandings I suppose.


 

Steve:

>> My buildsystem knows to invoke that command before attempting to compile any of my cpp files. If I have multiple libraries in the same build and there are dependencies between them, my buildsystem knows to generate the .ifc files in the correct order.

 

That is awesome!  Which version of CMake has this capability?


I think you misunderstood me here. I'll try to clarify:

Speaking generally, CMake/Makefiles/Ninja need to know the output files which will be created by any command that is executed, in order to know which command to execute to create a particular file, and to determine the correct order to generate them. This has been part of those systems for years. I could create 'custom target' rules today with CMake to generate appropriate Makefiles/Ninja files to invoke cl.exe to generate ifc files.

Note though that this requires knowing/hardcoding the output files which will be created by the command.

Don't we do the exact same thing for object files today?
Similarly to "C++ source file generates object file", I'd expect we'll have "C++ module file generates module output file".
  

Steve:

>> Question: What is the impact of naming the module in the .ixx file? What synchronization is expected between the 'module M;' name and the name of the .ifc file generated? What if there was no 'module M;' syntax? Is there an alternate design without name redundancy?

 

There is no formal relationship between the pathname of the file containing a module interface unit and the module name itself.

“module M;” is what indicates to the compiler (as a matter of C++ semantics) that any declaration that follows is owned by the module M.  If that module declaration is missing, then there are no module semantics.

 

In the VC++ case, the name of the IFC file is settable by the user via the command-line option “-module:output”.  You can choose any name you want.  If you import a module M in your source, you need to specify an IFC file for the compiler to find the metadata for the interface of M.  You can do that via “-module:reference <pathname>”, where you specify a specific file, or via a directory search path with the option “-module:search <directory>”, in which case the compiler will try to associate the module M with a file M.ifc in that directory.  I recommend being explicit, e.g. using the “-module:reference” option. 


This is probably made clear elsewhere, but I've already forgotten. Do I need to transitively specify all modules? For example, if I want to use QtWidgets.QPushButton, do I need to specify module files in my buildsystem for QtWidgets.QWidget and QtCore.QObject (which are dependencies by inheritance)?
 

If I have a library or executable and I add a new user of QtWidgets.QLabel to one of the classes, do I have to change my buildsystem to add a module reference for that?

Or would I (as an upstream Qt maintainer), in a case like that, design the Qt modules system such that users specify directories instead, because it is more convenient and requires fewer changes to the buildsystem?

That ends up kind of similar to what we have with header files today. However, the buildsystem still needs to know what files get included, so that when the input header/module files change, the translation units depending on those headers/modules get recompiled. At least today GCC can output the used header files in a Makefile-compatible format (and Ninja can read those files too). I suppose compilers would need the same feature to output a list of used modules. Just something to think about as part of the impact of this.

clang for example already does that.
 
Also, one of the things Manuel Klimek mentioned in his talk is that they hit command-line length limits in the buildsystem due to specifying all of the module files.

That is a very clang specific issue, though, and really unrelated to standardization issues :)
 

Steve:

>> What is the impact of Modules on buildsystems?

 

This is a good question.

The Windows engineering team upgraded its existing build system to include infrastructure for using modules.  From what I understand, it was a smooth upgrade. 


What I would find interesting is whether they need to parse the 'module modulename;' content from the source files, or whether they always need to specify the ifc output file name in the buildsystem when processing the ixx file (and, if that is the case, what the value of the 'module modulename;' content is).

That is all very MS-specific terminology. Generally, I'd expect that build systems could, in a modules world, parse the C++ code that specifies the module and its dependencies (for example by using the compiler with a switch that makes it run in a reduced mode, similar to preprocessing, but without requiring transitive inputs) and build the dependency graph from that.
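As a rough illustration of what such a reduced-mode scan would have to extract (a sketch only: a real implementation must cope with comments, string literals, and preprocessor conditionals, which a bare regex cannot):

```python
import re

# Extract the module name and its imports from a module source file:
# the minimum a buildsystem needs to construct the dependency graph.
MODULE_RE = re.compile(r"^\s*(?:export\s+)?module\s+([\w.]+)\s*;", re.M)
IMPORT_RE = re.compile(r"^\s*import\s+([\w.]+)\s*;", re.M)

def scan(source):
    m = MODULE_RE.search(source)
    return (m.group(1) if m else None), IMPORT_RE.findall(source)

src = "module MyLib;\nimport MyOtherLib;\nimport Std.Core;\n"
print(scan(src))
# → ('MyLib', ['MyOtherLib', 'Std.Core'])
```

Running such a scan per source file, then feeding the resulting edges into the ordinary build graph, is one plausible way a tool could discover module dependencies without hardcoding them.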

Alternatively, at least for the foreseeable future, I'd expect us to still specify library level dependencies, and have the build system use those to trigger the module builds; in that case, your source code would need to match what's written in your code, but that's already the case today.

Finally, I'm not buying into the "modules only" world-view yet - while I think it has value to think about what will be in 10-15 years, at least until then we'll need build systems supporting a mixture of modules and textual inclusion, mainly due to a large OS ecosystem of libraries that have backwards-compatibility issues.
 
--
You received this message because you are subscribed to the Google Groups "SG2 - Modules" group.
To unsubscribe from this group and stop receiving emails from it, send an email to modules+u...@isocpp.org.
To post to this group, send email to mod...@isocpp.org.
Visit this group at https://groups.google.com/a/isocpp.org/group/modules/.

Johan Boulé

Feb 6, 2017, 2:43:33 PM
to SG2 - Modules
About the ability to add a __declspec(dllexport) attribute to an export clause, on second thought, this still requires replacing the attribute with a macro, unless the compiler accepts and ignores the attribute when building in non-pic mode (i.e with the intent of making a static lib). Generally, we can't make any assumption whether the user will build the code as a static or a shared lib. Both have to be supported through buildsystem switches.
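In code, that is the usual macro dance, driven from the buildsystem with something like -DMYLIB_BUILD or -DMYLIB_STATIC (the names here are my invention; every library spells them differently):

```cpp
// mylib_export.h -- sketch of the conventional export macro.
// MYLIB_STATIC: defined by the buildsystem for static builds (macro is empty).
// MYLIB_BUILD:  defined only while compiling the shared library itself.
#if defined(MYLIB_STATIC)
#  define MYLIB_API
#elif defined(_WIN32)
#  if defined(MYLIB_BUILD)
#    define MYLIB_API __declspec(dllexport)
#  else
#    define MYLIB_API __declspec(dllimport)
#  endif
#else
#  define MYLIB_API __attribute__((visibility("default")))
#endif

class MYLIB_API Widget {
public:
    int value() const { return 42; }
};
```

An export clause that could carry the attribute directly would remove the import side of this, but the static-vs-shared decision would still have to come from outside the source, as noted above.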

Regards

Gabriel Dos Reis

Feb 6, 2017, 2:49:56 PM
to mod...@isocpp.org
We should assume that compilers attempt to make sense of the switches they are passed. Some of these macros are attempts to fix compiler deficiencies; the complexity should be pushed back to where it belongs.

Stephen Kelly

Feb 6, 2017, 4:31:38 PM
to mod...@isocpp.org, Gabriel Dos Reis, C++ Modules, Daniel Jasper
On 02/06/2017 03:07 PM, 'Manuel Klimek' via SG2 - Modules wrote:
On Mon, Feb 6, 2017 at 12:55 AM Stephen Kelly <stev...@gmail.com> wrote:

In his talk, Manuel Klimek describes scalability problems encountered by having so many small modules:

 https://www.youtube.com/watch?v=dHFNpBfemDI

In the end, they kept the design of having so many modules and optimized other areas to compensate, but there may be other reasonable approaches based on the idea of reducing the absolute number of modules.

However, you recommend that the latter is not an approach which modules are really designed for, right?

I thought my message was that the problems with scalability are modules that are too large :)

I suppose I misremembered, sorry. I recall you mentioned several problems with different approaches, but it is not possible to search/skim the information in the video. I thought there was some problem of having to repeatedly cycle through all module files many times. Thanks for clarifying. Could you provide some more-raw information?

Nevertheless, I'm sure when people get their hands on this they will want to take many different approaches to solve the problems that occur, including along the spectrum from few large modules to many small ones.


Also note that since that talk we've brought build times down significantly by automatically pruning unneeded modules (as exposed by the .d file of the modules build). Something like that would need to be implemented by CMake modules support for extra speed-ups for rebuilds.

I think you did mention that in the talk already actually.


Note though that this requires knowing/hardcoding the output files which will be created by the command.

Don't we do the exact same thing for object files today?

I don't know. Is it? Do our source files contain the name of the object file to create? Or perhaps: does the source file contain information about how the linker should refer to the object file? I really don't know here. I'm no linker expert.

It seems odd/redundant/potentially problematic to me that the module name is specified in the source. I'm trying to convince someone to provide some rationale for that so that I can understand (and preferably help add modules to the git repo I posted so that we can experiment).


Similarly to "C++ source file generates object file", I'd expect we'll have "C++ module file generates module output file".

... "which must be referred to by tokens specified in the C++ module file". That is the part that I'm not aware of the precedent or rationale for, and I'm not sure if it's relevant to buildsystems. Maybe it's not relevant for them. I don't have the information.

 
At least today GCC can output the used header files in a Makefile compatible format (and Ninja can read those files too). I suppose they would need the same feature to output a list of used modules. Just something to think about as part of the impact of this.

clang for example already does that.

Interesting. As far as I was aware, clang doesn't support the 'import' syntax at all yet. Is the feature of generating the list of imported modules documented on

 https://clang.llvm.org/docs/Modules.html

or elsewhere yet?


 
Also, one of the things Manuel Klimek mentioned in his talk is that they hit command line length limits in the buildsystem due to specifying all of the module files.

That is a very clang specific issue, though, and really unrelated to standardization issues :)

We're not discussing standardization anyway (there's nothing standardized about buildsystems, compiler switches, or file name conventions).

If, as Gaby recommends, all used modules should be specified on the compile command line with MSVC, why do you call it a clang issue? Am I missing something here?

 

Steve:

>> What is the impact of Modules on buildsystems?

 

This is a good question.

The Windows engineering team upgraded its existing build system to include infrastructure for using modules.  From what I understand, it was a smooth upgrade. 


What I would find interesting is whether they need to parse the 'module modulename;' content from the source files, or whether they always need to specify the ifc output file name in the buildsystem when processing the ixx file (and if that is the case - what the value of the 'module modulename;' content is).

That is all very MS specific terminology.

What is? I don't know what part of the quote you are referring to. Parts of it are from the Modules TS, and I don't think file extensions are so relevant? (I'm making a guess about what your remark means)


Generally, I'd expect that build systems would, in a modules world, be able to parse the C++ code that specifies the module and its dependencies (for example, by using the compiler with a switch that runs it in a reduced mode, similar to preprocessing, but without requiring transitive inputs) and build the dependency graph from that.

This is the kind of input I was looking for with this thread. This is why I tried to ask a very open question, so that people like yourself can share the experience you have with modules, and your conclusions about how the buildsystem and compiler drivers are affected.


Alternatively, at least for the foreseeable future, I'd expect us to still specify library level dependencies, and have the build system use those to trigger the module builds;

Can you give an example?


in that case, your source code would need to match what's written in your code, but that's already the case today.

I'm having trouble parsing "your source code would need to match what's written in your code". What do you mean?

Thanks,

Steve.

Richard Smith

Feb 6, 2017, 4:45:31 PM
to mod...@isocpp.org, Gabriel Dos Reis, C++ Modules, Daniel Jasper
On 6 February 2017 at 13:31, Stephen Kelly <stev...@gmail.com> wrote:
On 02/06/2017 03:07 PM, 'Manuel Klimek' via SG2 - Modules wrote:
On Mon, Feb 6, 2017 at 12:55 AM Stephen Kelly <stev...@gmail.com> wrote:

In his talk, Manuel Klimek describes scalability problems encountered by having so many small modules:

 https://www.youtube.com/watch?v=dHFNpBfemDI

In the end, they kept the design of having so many modules and optimized other areas to compensate, but there may be other reasonable approaches based on the idea of reducing the absolute number of modules.

However, you recommend that the latter is not an approach which modules are really designed for, right?

I thought my message was that the problems with scalability are modules that are too large :)

I suppose I misremembered, sorry. I recall you mentioned several problems with different approaches, but it is not possible to search/skim the information in the video. I thought there was some problem of having to repeatedly cycle through all module files many times. Thanks for clarifying. Could you provide some more-raw information?

Nevertheless, I'm sure when people get their hands on this they will want to take many different approaches to solve the problems that occur, including along the spectrum from few large modules to many small ones.

This doesn't seem fundamentally different from people wanting anything on the spectrum from a large number of small header files to a small number of huge header files today. And as with that choice, one of the things they may want to consider is the effect on their build performance when different parts of their codebase change (larger headers or modules will typically mean that changes to that interface cause more downstream targets to recompile).
Also note that since that talk we've brought build times down significantly by automatically pruning unneeded modules (as exposed by the .d file of the modules build). Something like that would need to be implemented by CMake modules support for extra speed-ups for rebuilds.

I think you did mention that in the talk already actually.

Note though that this requires knowing/hardcoding the output files which will be created by the command.

Don't we do the exact same thing for object files today?

I don't know. Is it? Do our source files contain the name of the object file to create? Or perhaps: does the source file contain information about how the linker should refer to the object file? I really don't know here. I'm no linker expert.

It seems odd/redundant/potentially problematic to me that the module name is specified in the source. I'm trying to convince someone to provide some rationale for that so that I can understand (and preferably help add modules to the git repo I posted so that we can experiment).

Are you assuming that the name of the module output file would need to be in some way related to the name of the module in the source? I don't see any reason to assume that. If you want to build module Foo in bar.cppm to baz.pcm, I would expect that to work, just as if you wanted to build a definition of class Foo in bar.cpp to baz.o (although I would question the wisdom of some of those choices).
Similarly to "C++ source file generates object file", I'd expect we'll have "C++ module file generates module output file".

... "which must be referred to by tokens specified in the C++ module file". That is the part that I'm not aware of the precedent or rationale for, and I'm not sure if it's relevant to buildsystems. Maybe it's not relevant for them. I don't have the information.
 
At least today GCC can output the used header files in a Makefile compatible format (and Ninja can read those files too). I suppose they would need the same feature to output a list of used modules. Just something to think about as part of the impact of this.

clang for example already does that.

Interesting. As far as I was aware, clang doesn't support the 'import' syntax at all yet.

It does, under -fmodules-ts, but using a #include to import a module via a module map also writes the module dependency to the .d file.
 
Is the feature of generating the list of imported modules documented on

 https://clang.llvm.org/docs/Modules.html

or elsewhere yet?

It's not exactly a new feature, it's just that the dependency file generator is modules-aware.
 

Also, one of the things Manuel Klimek mentioned in his talk is that they hit command line length limits in the buildsystem due to specifying all of the module files.

That is a very clang specific issue, though, and really unrelated to standardization issues :)

We're not discussing standardization anyway (there's nothing standardized about buildsystems, compiler switches, or file name conventions).

If, as Gaby recommends, all used modules should be specified on the compile command line with MSVC, why do you call it a clang issue? Am I missing something here?

Some implementations (clang included) support the ability to pass command-line arguments via a "response file" instead of as actual command-line arguments. Command-line argument length limits vary between operating systems. An implementation could support a way for a module file to suggest the location of another module file if it's not explicitly specified (and clang supports such a mechanism).

Steve:

>> What is the impact of Modules on buildsystems?

 

This is a good question.

The Windows engineering team upgraded its existing build system to include infrastructure for using modules.  From what I understand, it was a smooth upgrade. 


What I would find interesting is whether they need to parse the 'module modulename;' content from the source files, or whether they always need to specify the ifc output file name in the buildsystem when processing the ixx file (and if that is the case - what the value of the 'module modulename;' content is).

That is all very MS specific terminology.

What is? I don't know what part of the quote you are referring to. Parts of it are from the Modules TS, and I don't think file extensions are so relevant? (I'm making a guess about what your remark means)

Generally, I'd expect that build systems would, in a modules world, be able to parse the C++ code that specifies the module and its dependencies (for example, by using the compiler with a switch that runs it in a reduced mode, similar to preprocessing, but without requiring transitive inputs) and build the dependency graph from that.

This is the kind of input I was looking for with this thread. This is why I tried to ask a very open question, so that people like yourself can share the experience you have with modules, and your conclusions about how the buildsystem and compiler drivers are affected.

Alternatively, at least for the foreseeable future, I'd expect us to still specify library level dependencies, and have the build system use those to trigger the module builds;

Can you give an example?

in that case, your source code would need to match what's written in your code, but that's already the case today.

I'm having trouble parsing "your source code would need to match what's written in your code". What do you mean?

I would guess that's meant to say "your build files would need to match what's written in your code".

Stephen Kelly

Feb 6, 2017, 6:04:19 PM
to mod...@isocpp.org, Gabriel Dos Reis, C++ Modules, Daniel Jasper
On 02/06/2017 09:45 PM, 'Richard Smith' via SG2 - Modules wrote:
On 6 February 2017 at 13:31, Stephen Kelly <stev...@gmail.com> wrote:
On 02/06/2017 03:07 PM, 'Manuel Klimek' via SG2 - Modules wrote:
On Mon, Feb 6, 2017 at 12:55 AM Stephen Kelly <stev...@gmail.com> wrote:

In his talk, Manuel Klimek describes scalability problems encountered by having so many small modules:

 https://www.youtube.com/watch?v=dHFNpBfemDI

In the end, they kept the design of having so many modules and optimized other areas to compensate, but there may be other reasonable approaches based on the idea of reducing the absolute number of modules.

However, you recommend that the latter is not an approach which modules are really designed for, right?

I thought my message was that the problems with scalability are modules that are too large :)

I suppose I misremembered, sorry. I recall you mentioned several problems with different approaches, but it is not possible to search/skim the information in the video. I thought there was some problem of having to repeatedly cycle through all module files many times. Thanks for clarifying. Could you provide some more-raw information?

Nevertheless, I'm sure when people get their hands on this they will want to take many different approaches to solve the problems that occur, including along the spectrum from few large modules to many small ones.

This doesn't seem fundamentally different from people wanting anything on the spectrum from a large number of small header files to a small number of huge header files today

I suppose. This is a topic because it is not clear to me what approach Qt should take to modules, and what impact that has on the buildsystems of Qt users, for example. I was more specific about that in a previous email, including the requirement to make a change in the buildsystem every time I add 'import QPushButton' to my application's cpp file. See my previous email for more.

Manuel mentioned in his talk that Google requires specifying all headers in use anyway, so I guess it's no big change for you, but many projects don't do that. If that is needed, I wonder if it would harm adoption. Manuel mentioned that perhaps compilers could learn to parse the import statements and present that information to the buildsystem, but that also requires implementation by buildsystem implementors, which becomes the next adoption bottleneck.

Really, this whole thread for me is about trying to find out what impediments there are to C++ modules being successful, and what assumptions are being made about how tooling will make them successful. I know modules have been made to work for some large codebases, but I don't know how that will generalize to the rest of the world.

The more I think about it, the more I think one module per class wouldn't work well for Qt. To prevent requiring the user to specify the path to all module files that they use in their buildsystem, we would probably make Qt simply put all module files from, say, all classes in QtWidgets on your compile line if you use QtWidgets. Then you could 'import QPushButton' without changing the buildsystem, but there would be no advantage to having one C++ module per class. One C++ module per Qt library would be the only thing to make sense instead I think.

Perhaps build performance is the thing that would cause it to swing the other way, I don't know.

Then the question remains how to specify such a C++ module for a Qt library in a maintainable way.

A separate issue is that, as compiled modules will probably not be distributable, the buildsystem will have to compile all the module files it depends on. I don't have any sense of how long it would take to compile module files for all classes in QtCore QtGui and QtWidgets (or QtCore QtGui QtQml and QtQuick) just to build your first hello world Qt program.


And as with that choice, one of the things they may want to consider is the effect on their build performance when different parts of their codebase change (larger headers or modules will typically mean that changes to that interface cause more downstream targets to recompile).
Also note that since that talk we've brought build times down significantly by automatically pruning unneeded modules (as exposed by the .d file of the modules build). Something like that would need to be implemented by CMake modules support for extra speed-ups for rebuilds.

I think you did mention that in the talk already actually.

Note though that this requires knowing/hardcoding the output files which will be created by the command.

Don't we do the exact same thing for object files today?

I don't know. Is it? Do our source files contain the name of the object file to create? Or perhaps: does the source file contain information about how the linker should refer to the object file? I really don't know here. I'm no linker expert.

It seems odd/redundant/potentially problematic to me that the module name is specified in the source. I'm trying to convince someone to provide some rationale for that so that I can understand (and preferably help add modules to the git repo I posted so that we can experiment).

Are you assuming that the name of the module output file would need to be in some way related to the name of the module in the source? I don't see any reason to assume that. If you want to build module Foo in bar.cppm to baz.pcm, I would expect that to work, just as if you wanted to build a definition of class Foo in bar.cpp to baz.o (although I would question the wisdom of some of those choices).

Ah, ok - that's the analogy.

That's why all the compiled module files need to be specified on the compile line of any translation unit using them. I'm assuming all files in the transitive closure need to be specified. I suppose that transitive closure would be computed by the buildsystem and mostly hidden from the user (as the particular names of object files generally are today). At least in the blog at

 https://blogs.msdn.microsoft.com/vcblog/2015/12/03/c-modules-in-vs-2015-update-1/

there is an example of using a search directory and not specifying the module file name on the compile line. The 'import M' in bar.cpp still works. That can only work if either some correlation from modulename to filename is assumed by the compiler, or the compiler simply pre-loads all modules in that directory regardless of filename and makes the modulenames available.

If that could be expected to work, CMake could compile module files into some internal directory within the build directory and simply pass the path to it to all compiles (CMake would still need to know somehow which module files to compile in that way though).
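A trivial model of the name-derived lookup that would make the search-directory approach work (purely illustrative; the blog post doesn't document MSVC's actual probing rules):

```shell
# The compiler sees 'import M;', derives a candidate file name, and probes
# each search directory in order until it finds the compiled module.
cd "$(mktemp -d)"
mkdir -p other build/ifc
: > build/ifc/M.ifc          # produced earlier by compiling M's interface
name=M
for dir in other build/ifc; do
  if [ -f "$dir/$name.ifc" ]; then
    echo "resolved import $name -> $dir/$name.ifc"
    break
  fi
done
# prints: resolved import M -> build/ifc/M.ifc
```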

Also, one of the things Manuel Klimek mentioned in his talk is that they hit command line length limits in the buildsystem due to specifying all of the module files.

That is a very clang specific issue, though, and really unrelated to standardization issues :)
If, as Gaby recommends, all used modules should be specified on the compile command line with MSVC, why do you call it a clang issue? Am I missing something here?

Some implementations (clang included) support the ability to pass command-line arguments via a "response file" instead of as actual command-line arguments. Command-line argument length limits vary between operating systems. An implementation could support a way for a module file to suggest the location of another module file if it's not explicitly specified (and clang supports such a mechanism).

I still don't see anything 'very clang-specific' about this, so I guess I'm still missing something :). Doesn't seem important though.

Thanks,

Steve.

Manuel Klimek

Feb 7, 2017, 4:33:14 AM
to mod...@isocpp.org, Gabriel Dos Reis, C++ Modules, Daniel Jasper
On Tue, Feb 7, 2017 at 12:04 AM Stephen Kelly <stev...@gmail.com> wrote:
On 02/06/2017 09:45 PM, 'Richard Smith' via SG2 - Modules wrote:
On 6 February 2017 at 13:31, Stephen Kelly <stev...@gmail.com> wrote:
On 02/06/2017 03:07 PM, 'Manuel Klimek' via SG2 - Modules wrote:
On Mon, Feb 6, 2017 at 12:55 AM Stephen Kelly <stev...@gmail.com> wrote:

In his talk, Manuel Klimek describes scalability problems encountered by having so many small modules:

 https://www.youtube.com/watch?v=dHFNpBfemDI

In the end, they kept the design of having so many modules and optimized other areas to compensate, but there may be other reasonable approaches based on the idea of reducing the absolute number of modules.

However, you recommend that the latter is not an approach which modules are really designed for, right?

I thought my message was that the problems with scalability are modules that are too large :)

I suppose I misremembered, sorry. I recall you mentioned several problems with different approaches, but it is not possible to search/skim the information in the video. I thought there was some problem of having to repeatedly cycle through all module files many times. Thanks for clarifying. Could you provide some more-raw information?

Nevertheless, I'm sure when people get their hands on this they will want to take many different approaches to solve the problems that occur, including along the spectrum from few large modules to many small ones.

This doesn't seem fundamentally different from people wanting anything on the spectrum from a large number of small header files to a small number of huge header files today

I suppose. This is a topic because it is not clear to me what approach Qt should take to modules, and what impact that has on the buildsystems of Qt users, for example. I was more specific about that in a previous email, including the requirement to make a change in the buildsystem every time I add 'import QPushButton' to my application's cpp file. See my previous email for more.

Manuel mentioned in his talk that Google requires specifying all headers in use anyway, so I guess it's no big change for you, but many projects don't do that. If that is needed, I wonder if it would harm adoption. Manuel mentioned that perhaps compilers could learn to parse the import statements and present that information to the buildsystem, but that also requires implementation by buildsystem implementors, which becomes the next adoption bottleneck.

Really, this whole thread for me is about trying to find out what impediments there are to C++ modules being successful, and what assumptions are being made about how tooling will make them successful. I know modules have been made to work for some large codebases, but I don't know how that will generalize to the rest of the world.

The more I think about it, the more I think one module per class wouldn't work well for Qt. To prevent requiring the user to specify the path to all module files that they use in their buildsystem, we would probably make Qt simply put all module files from, say, all classes in QtWidgets on your compile line if you use QtWidgets. Then you could 'import QPushButton' without changing the buildsystem, but there would be no advantage to having one C++ module per class. One C++ module per Qt library would be the only thing to make sense instead I think.

Perhaps build performance is the thing that would cause it to swing the other way, I don't know.

Then the question remains how to specify such a C++ module for a Qt library in a maintainable way.

A separate issue is that, as compiled modules will probably not be distributable, the buildsystem will have to compile all the module files it depends on. I don't have any sense of how long it would take to compile module files for all classes in QtCore QtGui and QtWidgets (or QtCore QtGui QtQml and QtQuick) just to build your first hello world Qt program.

I think it's too early to really say how Qt should be laid out. It'll depend a lot on how specific compilers will need modules to be presented.

For example, we went back-and-forth multiple times between specifying all transitive modules on the command line (less non-parallelizable work for the build system) vs only handing in top-level modules (that is, not specifying modules on the command line that are in the transitive dependencies of a different module in the set of transitively used modules).
Currently, we are back to only specifying top-level modules, as that means clang can figure out which modules are actually used and writes only the used modules to the .d file, and we can use that information to prune the dependency graph of builds depending on that module (yea, it's somewhat complex, and we actually do an include-scanning step where we try to figure out which headers, and thus modules, are reachable at all from a source file).

A different consideration is that, for large continuously integrated code bases, the rebuilds triggered by a header change in a core library are much more of a problem than for a user of Qt, for example, who will probably stick with a single version for a longer while.

 
And as with that choice, one of the things they may want to consider is the effect on their build performance when different parts of their codebase change (larger headers or modules will typically mean that changes to that interface cause more downstream targets to recompile).
Also note that since that talk we've brought build times down significantly by automatically pruning unneeded modules (as exposed by the .d file of the modules build). Something like that would need to be implemented by CMake modules support for extra speed-ups for rebuilds.

I think you did mention that in the talk already actually.

Note though that this requires knowing/hardcoding the output files which will be created by the command.

Don't we do the exact same thing for object files today?

I don't know. Is it? Do our source files contain the name of the object file to create? Or perhaps: does the source file contain information about how the linker should refer to the object file? I really don't know here. I'm no linker expert.

It seems odd/redundant/potentially problematic to me that the module name is specified in the source. I'm trying to convince someone to provide some rationale for that so that I can understand (and preferably help add modules to the git repo I posted so that we can experiment).

Are you assuming that the name of the module output file would need to be in some way related to the name of the module in the source? I don't see any reason to assume that. If you want to build module Foo in bar.cppm to baz.pcm, I would expect that to work, just as if you wanted to build a definition of class Foo in bar.cpp to baz.o (although I would question the wisdom of some of those choices).

Ah, ok - that's the analogy.

That's why all the compiled module files need to be specified on the compile line of any translation unit using them. I'm assuming all files in the transitive closure need to be specified. I suppose that transitive closure would be computed by the buildsystem and mostly hidden from the user (as the particular names of object files generally are today). At least in the blog at

 https://blogs.msdn.microsoft.com/vcblog/2015/12/03/c-modules-in-vs-2015-update-1/

there is an example of using a search directory and not specifying the module file name on the compile line. The 'import M' in bar.cpp still works. That can only work if either some correlation from modulename to filename is assumed by the compiler, or the compiler simply pre-loads all modules in that directory regardless of filename and makes the modulenames available.

If that could be expected to work, CMake could compile module files into some internal directory within the build directory and simply pass the path to it to all compiles (CMake would still need to know somehow which module files to compile in that way though).

I think simple modules support in CMake could look like this:
If you add a C++ library, you can specify interface h1.hm h2.hm h3.hm (no idea what module files will be named :), which would build a module out of these headers / C++ module definitions.
The module would be passed to all libraries in the reverse transitive dependency closure.
For clang, for example, we could also currently pass (for backwards compatibility) h.cppmap in the interface, which would only be needed for the transitional period, and compile a module out of plain C++ headers.

That way, you could get incremental compilation benefits early, and we could get real world experience on modules builds for projects that are happy to be early testers.
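Spelled out as CMake code, that might look roughly like this (entirely hypothetical syntax; no such signature existed in CMake at the time of writing, and MODULE_INTERFACE is an invented keyword):

```cmake
# Invented syntax: declare which sources form the module interface of a target.
add_library(MyLib foo1.cpp foo2.cpp)
target_sources(MyLib PUBLIC MODULE_INTERFACE h1.hm h2.hm h3.hm)

# CMake would compile the interface files to the compiler's module format and
# pass the results to everything in the reverse transitive dependency closure:
add_executable(App main.cpp)
target_link_libraries(App PRIVATE MyLib)   # App's compiles see MyLib's modules
```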
 

Also, one of the things Manuel Klimek mentioned in his talk is that they hit command line length limits in the buildsystem due to specifying all of the module files.

That is a very clang specific issue, though, and really unrelated to standardization issues :)
If, as Gaby recommends, all used modules should be specified on the compile command line with MSVC, why do you call it a clang issue? Am I missing something here?

Some implementations (clang included) support the ability to pass command-line arguments via a "response file" instead of as actual command-line arguments. Command-line argument length limits vary between operating systems. An implementation could support a way for a module file to suggest the location of another module file if it's not explicitly specified (and clang supports such a mechanism).

I still don't see anything 'very clang-specific' about this, so I guess I'm still missing something :). Doesn't seem important though.

It's clang-specific whether you'll need to pass all transitive modules.
It's very system specific what the max command line length is :)

Stephen Kelly

unread,
Feb 7, 2017, 7:16:00 PM2/7/17
to mod...@isocpp.org, Gabriel Dos Reis, C++ Modules, Daniel Jasper
On 02/07/2017 09:32 AM, 'Manuel Klimek' via SG2 - Modules wrote:
On Tue, Feb 7, 2017 at 12:04 AM Stephen Kelly <stev...@gmail.com> wrote:
A separate issue is that, as compiled modules will probably not be distributable, the buildsystem will have to compile all the module files it depends on. I don't have any sense of how long it would take to compile module files for all classes in QtCore QtGui and QtWidgets (or QtCore QtGui QtQml and QtQuick) just to build your first hello world Qt program.

I think it's too early to really say how Qt should be laid out. It'll depend a lot on how specific compilers will need modules to be presented.

What the interface to modules is, and how libraries such as (but not limited to) Qt, their users, and buildsystems will make use of them, seem quite interdependent to me.

It is also interdependent with buildsystems. The Ninja buildsystem cannot currently build Fortran code because the way Fortran modules are specified is not compatible with how Ninja currently works:

 https://groups.google.com/forum/#!topic/ninja-build/tPOcu5EWXio

I don't know how fixable that is, but I think that's an important conversation, and one which is inseparable from the design of modules.
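To make the 'scanning mode' idea concrete, here is a deliberately naive Python sketch of extracting the provided and imported module names from a source file, so a build system can discover dependency edges before full compilation (the capability Ninja lacked for Fortran). A real scanner must run the preprocessor and skip comments and string literals; this sketch does neither, and the exact module syntax is that of the current proposal, which may still change.

```python
import re

# 'export module M;' (or 'module M;') names the module a file provides;
# 'import X;' names a module it consumes.
_PROVIDES = re.compile(r'^\s*(?:export\s+)?module\s+([\w.:]+)\s*;', re.M)
_IMPORTS = re.compile(r'^\s*import\s+([\w.:]+)\s*;', re.M)

def scan(source):
    """Return (provided module or None, list of imported modules)."""
    provides = _PROVIDES.findall(source)
    return (provides[0] if provides else None, _IMPORTS.findall(source))
```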

I think, rather than the question in the email title here, a better open question is:


 What needs to happen in tooling and popular libraries for C++ modules to become a success?


That can include questions such as: the impact of naming a module in the source; the features Manuel mentioned would need to be added to all compilers, such as an import/module/export scanning mode; buildsystems being ported to use such a mode; the extra steps buildsystems may have to take explicitly where today 'compile and link' gets you most of the way, and how those will be exposed; what people who maintain the buildsystem for their project have to do; what the transition looks like; etc.

Currently, from what I read on reddit at least, people seem to think that C++ modules will be added to the standard, magic will happen, and then they will achieve nirvana. What needs to be done to make that true?

The modules proposal seems more interdependent with tooling and how libraries maintain/define their interface than any other C++ feature I'm aware of since C++11.


For example, we went back-and-forth multiple times between specifying all transitive modules on the command line (less non-parallelizable work for the build system) vs only handing in top-level modules (that is, not specifying modules on the command line that are in the transitive dependencies of a different module in the set of transitively used modules).

Not specifying all transitive (public) dependencies on the command line will only work if all module files are in well-known locations or the same location, right?

IOW, if Lib1 depends (publicly) on Lib2 and Exe depends on Lib1, you specify only the full path to the module file for Lib1 when compiling Exe.cpp. How is the module file for Lib2 found if it is in some other random location? Is the path hardcoded in the Lib1 module file?


Currently, we are back to only specifying top-level modules, as that means clang can figure out which modules are actually used, writes only the used modules to the .d file, and we can use that information to prune the dependency graph of builds depending on that module (yea, it's somewhat complex, and we actually do an include-scanning step where we try to figure out which headers, and thus modules, are reachable at all from a source file).

Whatever complexity you hit in your buildsystem with modules, every other buildsystem will eventually hit too.
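The difference between the two strategies described above can be sketched in a few lines of Python, using a hypothetical Lib1/Lib2 graph: with the 'all transitive modules' approach the build system computes and passes the full closure, while with 'top-level only' it passes just Lib1 and the compiler must locate Lib2 itself (via a search path, an embedded path, or similar).

```python
def transitive_closure(roots, deps):
    """All modules reachable from 'roots' through the dependency
    graph 'deps' -- what a build system must compute and pass if the
    compiler wants every module file on the command line."""
    seen, stack = set(), list(roots)
    while stack:
        m = stack.pop()
        if m not in seen:
            seen.add(m)
            stack.extend(deps.get(m, []))
    return seen

# Hypothetical graph: Exe imports Lib1, and Lib1 publicly depends on Lib2.
deps = {"Lib1": ["Lib2"], "Lib2": []}

# Strategy 1: pass the full transitive closure when compiling Exe.cpp.
full = transitive_closure(["Lib1"], deps)

# Strategy 2: pass only the top-level import; the compiler must then
# find Lib2's module file on its own.
top_level_only = {"Lib1"}
```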


A different consideration is that for large continuously integrated code bases, the rebuilds triggered by a header change in a core library are much more of a problem than for a user of Qt, for example, who will probably stick with a single version for a longer while.

That depends on whether you are a developer working on Qt :).

Anyway, that's just an example. Most large codebases I've seen have the code separated into multiple libraries, not unlike the way Qt is structured. They will want to use modules and will hit the same issues we hit with Qt and in the same way.

 
If that could be expected to work, CMake could compile module files into some internal directory within the build directory and simply pass the path to it to all compiles (CMake would still need to know somehow which module files to compile in that way though).

I think simple modules support in CMake could look like this:
If you add a C++ library, you can specify interface h1.hm h2.hm h3.hm (no idea what module files will be called :), which would build a module out of these headers / C++ module definitions.
The module will be passed to all libraries in the reverse transitive dependency closure.
For clang, for example, we could also currently pass h.cppmap in the interface (for backwards compatibility), which will only be needed for the transitional period, and compile a module out of plain C++ headers.

You use the singular 'a module' for a library, whereas elsewhere in the discussion the recommendation was to maintain one module per class in the library. I'm having trouble juggling all of the different options and the impact they have on libraries and buildsystems.

What you describe above is similar to what I had in mind at the beginning which is something akin to one module per library.


That way, you could get incremental compilation benefits early, and we could get real world experience on modules builds for projects that are happy to be early testers.

I'm sure there are many willing. I'm sure that will bring up even more questions too.
 
Thanks,

Steve.

Richard Smith

unread,
Feb 7, 2017, 7:32:49 PM2/7/17
to mod...@isocpp.org, Gabriel Dos Reis, C++ Modules, Daniel Jasper
On 7 February 2017 at 16:15, Stephen Kelly <stev...@gmail.com> wrote:
On 02/07/2017 09:32 AM, 'Manuel Klimek' via SG2 - Modules wrote:
On Tue, Feb 7, 2017 at 12:04 AM Stephen Kelly <stev...@gmail.com> wrote:
A separate issue is that, as compiled modules will probably not be distributable, the buildsystem will have to compile all the module files it depends on. I don't have any sense of how long it would take to compile module files for all classes in QtCore QtGui and QtWidgets (or QtCore QtGui QtQml and QtQuick) just to build your first hello world Qt program.

I think it's too early to really say how Qt should be laid out. It'll depend a lot on how specific compilers will need modules to be presented.

What the interface to modules is, and how libraries such as (but not limited to) Qt, their users, and buildsystems will make use of them, seem quite interdependent to me.

It is also interdependent with buildsystems. The Ninja buildsystem cannot currently build Fortran code because the way Fortran modules are specified is not compatible with how Ninja currently works:

 https://groups.google.com/forum/#!topic/ninja-build/tPOcu5EWXio

I don't know how fixable that is, but I think that's an important conversation, and one which is inseparable from the design of modules.

I think, rather than the question in the email title here, a better open question is:


 What needs to happen in tooling and popular libraries for C++ modules to become a success?


That can include questions such as: the impact of naming a module in the source; the features Manuel mentioned would need to be added to all compilers, such as an import/module/export scanning mode; buildsystems being ported to use such a mode; the extra steps buildsystems may have to take explicitly where today 'compile and link' gets you most of the way, and how those will be exposed; what people who maintain the buildsystem for their project have to do; what the transition looks like; etc.

Currently, from what I read on reddit at least, people seem to think that C++ modules will be added to the standard, magic will happen, and then they will achieve nirvana. What needs to be done to make that true?

The modules proposal seems more interdependent with tooling and how libraries maintain/define their interface than any other C++ feature I'm aware of since C++11.

For example, we went back-and-forth multiple times between specifying all transitive modules on the command line (less non-parallelizable work for the build system) vs only handing in top-level modules (that is, not specifying modules on the command line that are in the transitive dependencies of a different module in the set of transitively used modules).

Not specifying all transitive (public) dependencies on the command line will only work if all module files are in well-known locations or the same location, right?

Yes; Clang's primary approach here is to store the relative path from the output module to its direct dependencies (to allow the entire module subtree to be relocated).
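A rough Python sketch of that relocation scheme (the file names and paths are made up, and clang's actual module format is of course binary): the build embeds paths to direct dependencies relative to the module itself, so moving the whole tree keeps them resolvable.

```python
import os

def embed_dependency_paths(module_path, dep_paths):
    """Store, inside a compiled module, the paths of its direct
    dependencies *relative* to the module itself, so the whole
    output tree can later be moved as a unit."""
    base = os.path.dirname(module_path)
    return [os.path.relpath(p, base) for p in dep_paths]

def resolve_dependencies(module_path, stored_relative):
    """At import time, turn the stored relative paths back into
    absolute ones, anchored at wherever the module now lives."""
    base = os.path.dirname(module_path)
    return [os.path.normpath(os.path.join(base, r)) for r in stored_relative]

# The build writes Lib1's module, recording Lib2's location relative to
# it; the whole tree is then relocated from /build to /install.
rel = embed_dependency_paths("/build/lib1/Lib1.pcm", ["/build/lib2/Lib2.pcm"])
moved = resolve_dependencies("/install/lib1/Lib1.pcm", rel)
```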
 
IOW, if Lib1 depends (publicly) on Lib2 and Exe depends on Lib1, you specify only the full path to the module file for Lib1 when compiling Exe.cpp. How is the module file for Lib2 found if it is in some other random location? Is the path hardcoded in the Lib1 module file?

Currently, we are back to only specifying top-level modules, as that means clang can figure out which modules are actually used, writes only the used modules to the .d file, and we can use that information to prune the dependency graph of builds depending on that module (yea, it's somewhat complex, and we actually do an include-scanning step where we try to figure out which headers, and thus modules, are reachable at all from a source file).

Whatever complexity you hit in your buildsystem with modules, every other buildsystem will eventually hit too.

A different consideration is that for large continuously integrated code bases, the rebuilds triggered by a header change in a core library are much more of a problem than for a user of Qt, for example, who will probably stick with a single version for a longer while.

That depends on whether you are a developer working on Qt :).

Anyway, that's just an example. Most large codebases I've seen have the code separated into multiple libraries, not unlike the way Qt is structured. They will want to use modules and will hit the same issues we hit with Qt and in the same way.
 
If that could be expected to work, CMake could compile module files into some internal directory within the build directory and simply pass the path to it to all compiles (CMake would still need to know somehow which module files to compile in that way though).

I think simple modules support in CMake could look like this:
If you add a C++ library, you can specify interface h1.hm h2.hm h3.hm (no idea what module files will be called :), which would build a module out of these headers / C++ module definitions.
The module will be passed to all libraries in the reverse transitive dependency closure.
For clang, for example, we could also currently pass h.cppmap in the interface (for backwards compatibility), which will only be needed for the transitional period, and compile a module out of plain C++ headers.

You use the singular 'a module' for a library, whereas elsewhere in the discussion the recommendation was to maintain one module per class in the library. I'm having trouble juggling all of the different options and the impact they have on libraries and buildsystems.

What you describe above is similar to what I had in mind at the beginning which is something akin to one module per library.

That way, you could get incremental compilation benefits early, and we could get real world experience on modules builds for projects that are happy to be early testers.

I'm sure there are many willing. I'm sure that will bring up even more questions too.
 
Thanks,

Steve.

Manuel Klimek

unread,
Feb 8, 2017, 6:47:25 AM2/8/17
to mod...@isocpp.org, Gabriel Dos Reis, C++ Modules, Daniel Jasper
On Wed, Feb 8, 2017 at 1:16 AM Stephen Kelly <stev...@gmail.com> wrote:
On 02/07/2017 09:32 AM, 'Manuel Klimek' via SG2 - Modules wrote:
On Tue, Feb 7, 2017 at 12:04 AM Stephen Kelly <stev...@gmail.com> wrote:
A separate issue is that, as compiled modules will probably not be distributable, the buildsystem will have to compile all the module files it depends on. I don't have any sense of how long it would take to compile module files for all classes in QtCore QtGui and QtWidgets (or QtCore QtGui QtQml and QtQuick) just to build your first hello world Qt program.

I think it's too early to really say how Qt should be laid out. It'll depend a lot on how specific compilers will need modules to be presented.

What the interface to modules is, and how libraries such as (but not limited to) Qt, their users, and buildsystems will make use of them, seem quite interdependent to me.

It is also interdependent with buildsystems. The Ninja buildsystem cannot currently build Fortran code because the way Fortran modules are specified is not compatible with how Ninja currently works:

 https://groups.google.com/forum/#!topic/ninja-build/tPOcu5EWXio

I don't know how fixable that is, but I think that's an important conversation, and one which is inseparable from the design of modules.

I think, rather than the question in the email title here, a better open question is:


 What needs to happen in tooling and popular libraries for C++ modules to become a success?


That can include questions such as: the impact of naming a module in the source; the features Manuel mentioned would need to be added to all compilers, such as an import/module/export scanning mode; buildsystems being ported to use such a mode; the extra steps buildsystems may have to take explicitly where today 'compile and link' gets you most of the way, and how those will be exposed; what people who maintain the buildsystem for their project have to do; what the transition looks like; etc.

Currently, from what I read on reddit at least, people seem to think that C++ modules will be added to the standard, magic will happen, and then they will achieve nirvana. What needs to be done to make that true?

The modules proposal seems more interdependent with tooling and how libraries maintain/define their interface than any other C++ feature I'm aware of since C++11.

If I'm not mistaken, the modules TS currently still has open discussion points like "will modules support macros", "will modules support legacy #includes", etc. I expect that we'll only be able to answer your questions fully once that is settled. If you want to have influence on this, you'll probably need to get involved in the standardization process :)

For example, we went back-and-forth multiple times between specifying all transitive modules on the command line (less non-parallelizable work for the build system) vs only handing in top-level modules (that is, not specifying modules on the command line that are in the transitive dependencies of a different module in the set of transitively used modules).

Not specifying all transitive (public) dependencies on the command line will only work if all module files are in well-known locations or the same location, right?

IOW, if Lib1 depends (publicly) on Lib2 and Exe depends on Lib1, you specify only the full path to the module file for Lib1 when compiling Exe.cpp. How is the module file for Lib2 found if it is in some other random location? Is the path hardcoded in the Lib1 module file?


Currently, we are back to only specifying top-level modules, as that means clang can figure out which modules are actually used, writes only the used modules to the .d file, and we can use that information to prune the dependency graph of builds depending on that module (yea, it's somewhat complex, and we actually do an include-scanning step where we try to figure out which headers, and thus modules, are reachable at all from a source file).

Whatever complexity you hit in your buildsystem with modules, every other buildsystem will eventually hit too.

I don't think that's necessarily true, because C++ modules might still end up being somewhat different from how Clang modules work today.

For example, if the only way to get modules is to rewrite your code bottom-up, keeping a #include tree on the side for legacy reasons, most of the problems we hit by declaring current code as modular might not arise, as folks would naturally avoid them.
 
A different consideration is that for large continuously integrated code bases, the rebuilds triggered by a header change in a core library are much more of a problem than for a user of Qt, for example, who will probably stick with a single version for a longer while.

That depends on whether you are a developer working on Qt :).

Anyway, that's just an example. Most large codebases I've seen have the code separated into multiple libraries, not unlike the way Qt is structured. They will want to use modules and will hit the same issues we hit with Qt and in the same way.

Correct. Fundamentally, how you split up your library into modules will be the decision of the library owner, and I think module systems of other languages have shown us that all modules implementations will basically face the same problems - when you put large interfaces into a single unit, and require that to be available for builds in reverse dependencies, you'll get more rebuilds, for the trade-off of more convenient imports.
The old "maximize cohesion, minimize coupling" rule applies.
 
If that could be expected to work, CMake could compile module files into some internal directory within the build directory and simply pass the path to it to all compiles (CMake would still need to know somehow which module files to compile in that way though).

I think simple modules support in CMake could look like this:
If you add a C++ library, you can specify interface h1.hm h2.hm h3.hm (no idea what module files will be called :), which would build a module out of these headers / C++ module definitions.
The module will be passed to all libraries in the reverse transitive dependency closure.
For clang, for example, we could also currently pass h.cppmap in the interface (for backwards compatibility), which will only be needed for the transitional period, and compile a module out of plain C++ headers.

You use the singular 'a module' for a library, whereas elsewhere in the discussion the recommendation was to maintain one module per class in the library. I'm having trouble juggling all of the different options and the impact they have on libraries and buildsystems.

What you describe above is similar to what I had in mind at the beginning which is something akin to one module per library.

Yea, but I'll also argue that you should just have a library per .cc file on larger projects, and that for smaller projects it doesn't matter :)
 
That way, you could get incremental compilation benefits early, and we could get real world experience on modules builds for projects that are happy to be early testers.

I'm sure there are many willing. I'm sure that will bring up even more questions too.
 
Thanks,

Steve.

Stephen Kelly

unread,
Feb 8, 2017, 7:16:47 PM2/8/17
to mod...@isocpp.org, Gabriel Dos Reis, C++ Modules, Daniel Jasper
On 02/08/2017 11:47 AM, 'Manuel Klimek' via SG2 - Modules wrote:
On Wed, Feb 8, 2017 at 1:16 AM Stephen Kelly <stev...@gmail.com> wrote:
The modules proposal seems more interdependent with tooling and how libraries maintain/define their interface than any other C++ feature I'm aware of since C++11.

If I'm not mistaken, the modules TS currently still has open discussion points like "will modules support macros", "will modules support legacy #includes", etc. I expect that we'll only be able to answer your questions fully once that is settled. If you want to have influence on this, you'll probably need to get involved in the standardization process :)

AFAIK, that's what I'm doing by engaging in this thread :).

Thanks,

Steve.
