Re-reading dep files for auto-generated files while building

Maxim Kalaev

Jun 17, 2012, 5:10:46 PM

to ninja-build

Hi,
I am trying to port a legacy home-grown build system to ninja.

The feature I am struggling to address is calculating dependencies for
auto-generated files.
There are auto-generated .c and .h files, where these .c and .h files
may include other automatically generated .h files.

The current system re-evaluates a target's dependencies once all of its
inputs have been built; if the dependency list has changed, the new
inputs are built, and so on.
It seems that ninja reads dep files only once, while building the
graph, and doesn't re-read them as targets are built.

A toy example would be this manifest:

rule LD
  command = gcc $in -o $out

rule CC
  depfile = $out.d
  command = gcc -c $in -o $out

rule GENH
  command = echo $code > $out

rule GENC
  command = echo $code > $out && gcc -MM -MG $out -MF auto.o.d -MT auto.o

build auto: LD auto.o

build auto.o: CC auto.c

build auto.c: GENC
  code = '#include "auto1.h"' "\n" 'int main() { return R; }' "\n"

build auto1.h: GENH
  code = '#define R 1' "\n"

default auto

It attempts to generate a dep file signaling that auto.o depends on
auto1.h, but the dep file is created too late. The build passes if I
run ninja a second time.

Reloading dep files during the build should not be expensive, since it
is only done for targets that are rebuilt. It could also be optimized
by checking the mtime of the dep files and reloading only those that
have changed.
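
For illustration, the mtime check could be sketched in Python roughly as follows. This is not ninja's code: `parse_depfile` and `DepfileCache` are hypothetical names, and the parser assumes a single-target, Makefile-style dep file with no colons or escaped spaces in paths.

```python
import os

def parse_depfile(text):
    """Parse 'target: dep1 dep2 \\<newline> dep3' into (target, [deps])."""
    text = text.replace("\\\n", " ")        # join continuation lines
    target, _, deps = text.partition(":")
    return target.strip(), deps.split()

class DepfileCache:
    """Re-parse a dep file only when its mtime differs from the last load."""

    def __init__(self):
        self._seen = {}        # path -> (mtime_ns, parsed result)
        self.parse_count = 0   # how many real parses happened

    def load(self, path):
        mtime = os.stat(path).st_mtime_ns
        cached = self._seen.get(path)
        if cached is not None and cached[0] == mtime:
            return cached[1]   # unchanged on disk: reuse the parsed result
        with open(path) as f:
            parsed = parse_depfile(f.read())
        self.parse_count += 1
        self._seen[path] = (mtime, parsed)
        return parsed
```

Calling `load()` on the same unchanged file twice parses it only once; a rewritten file with a new mtime is parsed again.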

How bad would it be to call RecomputeDirty() from somewhere in the
Builder::Build() main loop?
Any other suggestions?

Thanks,
Maxim

Nico Weber

Jun 17, 2012, 9:21:42 PM

to ninja...@googlegroups.com

Could you make your generator rules write a stamp file as a side effect
(one each for cc and h), and have the CC step for the generated source
take an implicit dependency (
http://martine.github.com/ninja/manual.html#_build_dependencies ) on
these two stamp files?
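
One way this arrangement might look, sketched as a small Python generator that produces the ninja snippet. The CC rule and the file names come from the manifest earlier in the thread; the `STAMP` rule and the `gen_h.stamp` / `gen_c.stamp` names are assumptions. It is also a variation on the suggestion: the stamps are written by separate edges rather than as a side effect of the generator rules.

```python
STAMP_RULE = "rule STAMP\n  command = touch $out\n"

def stamp_edge(generated, stamp):
    """Edge that touches `stamp` once all `generated` files are built."""
    return f"build {stamp}: STAMP {' '.join(generated)}\n"

def cc_edge(obj, src, stamps):
    """CC edge with implicit dependencies ('|') on the stamp files."""
    return f"build {obj}: CC {src} | {' '.join(stamps)}\n"

manifest = (
    STAMP_RULE
    + stamp_edge(["auto1.h"], "gen_h.stamp")
    + stamp_edge(["auto.c"], "gen_c.stamp")
    + cc_edge("auto.o", "auto.c", ["gen_h.stamp", "gen_c.stamp"])
)
print(manifest)
```

Because auto.o now depends on both stamps, ninja schedules every generator edge before the compile step, whatever the dep file says.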

It seems that relying on gcc to generate the depfile won't work: since
it doesn't know it needs to generate auto1.h first, the `#include
"auto1.h"` line will cause a compilation error the first time round anyway.

Nico

Evan Martin

Jun 17, 2012, 9:31:11 PM

to ninja...@googlegroups.com

On Sun, Jun 17, 2012 at 2:10 PM, Maxim Kalaev <maxim...@gmail.com> wrote:

> The feature I am struggling to address is calculating dependencies for
> auto-generated files.
> There are auto-generated .c and .h files, where these .c and .h files
> may include other automatically generated .h files.

Are the ultimate dependencies always a DAG, or is it possible for there to be loops?

I think at that point we've already committed to a Plan so even if new
dirty files are discovered they won't be built.
(It is possible to *drop* files from an existing Plan via the restat attribute.)
Of course, the code could always be adjusted, but I think it will be
more complicated.

> Any other suggestions?

At what point do you know about the generated file interdependencies?
One approach is to regenerate the .ninja files whenever the
dependencies change.

For example, when you're about to emit this snippet of your build file:

build auto.c: GENC
  code = '#include "auto1.h"' "\n" 'int main() { return R; }'

You could scan the "code" variable for includes and add auto1.h as an
implicit or order-only dependency.

build auto.c: GENC || auto1.h
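
A minimal sketch of that scan, assuming the manifest is produced by a Python script. `scan_includes` and `emit_gen_edge` are hypothetical helpers, and only quoted `#include "..."` forms that name a known generated header are picked up.

```python
import re

INCLUDE_RE = re.compile(r'#include\s+"([^"]+)"')

def scan_includes(code):
    """Return the header names from quoted #include directives in `code`."""
    return INCLUDE_RE.findall(code)

def emit_gen_edge(out, rule, code, generated_headers):
    """Emit a build edge with order-only deps on included generated headers."""
    deps = [h for h in scan_includes(code) if h in generated_headers]
    line = f"build {out}: {rule}"
    if deps:
        line += " || " + " ".join(deps)
    return f"{line}\n  code = {code}\n"

code = """'#include "auto1.h"' "\\n" 'int main() { return R; }'"""
print(emit_gen_edge("auto.c", "GENC", code, {"auto1.h"}))
```

Headers that are not in `generated_headers` are ignored, so ordinary source headers do not end up as order-only dependencies.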

Maxim Kalaev

Jun 18, 2012, 1:22:05 AM

to ninja...@googlegroups.com

> Could you make your generator rules write a stamp file as side effect
> (one for cc and h each), and have the CC step for the generated source
> have an implicit dependency (

In other words, the proposal is to add a "barrier" between the auto-generation of all files and all (or some, if I am lucky) compilations?
One barrier probably won't be sufficient, as some 'generator tools' are built as part of the tree and include some of the generated files (like ~error_codes.h), but two should be enough.
I've thought about this alternative: it would not be optimal in terms of parallelism, and it is a bit ugly, but it may be the easiest path.

Another approach would be a trick known from the 'make' world:
write a shell script (build.sh) that runs ninja in a loop, retrying when ninja finishes with an error if and only if a '.restart' file is present.
Each compilation rule checks whether the appropriate dep file was rebuilt and, if it was, touches .restart and kills ninja.
This would work as well, with its own drawbacks.
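
The retry loop could be sketched like this, in Python rather than shell. The `.restart` marker is the one described above; `build_with_restarts`, `run_ninja` and the `max_restarts` cap are assumptions added for illustration.

```python
import os
import subprocess

def run_ninja():
    """Run ninja in the current directory and return its exit status."""
    return subprocess.call(["ninja"])

def build_with_restarts(run, restart_marker=".restart", max_restarts=50):
    """Re-run the build while it fails AND the restart marker is present."""
    status = 1
    for _ in range(max_restarts + 1):
        status = run()
        if status == 0:
            return 0                    # build succeeded
        if not os.path.exists(restart_marker):
            return status               # genuine failure: do not retry
        os.remove(restart_marker)       # deps changed mid-build: retry
    return status                       # gave up after too many restarts
```

A wrapper script would call `build_with_restarts(run_ninja)` and exit with the returned status. The cap guards against rules that request a restart on every run.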

> It seems that relying on gcc to generate the depfile won't work: since
> it doesn't know it needs to generate auto1.h first, the `#include
> "auto1.h"` line will cause a compilation error the first time round anyway.

Oh, it may work. Take a look at the '-MG' flag.

Thanks for the prompt reply!

Maxim Kalaev

Jun 18, 2012, 2:31:23 AM

to ninja...@googlegroups.com

On Monday, June 18, 2012 4:31:11 AM UTC+3, Evan Martin wrote:

> Are the ultimate dependencies always a DAG, or is it possible for there to be loops?

They always form a DAG, unless there are problems in the build rules (otherwise it would not be possible to compile the project, right?).

I can try making the change, if you can help me get started:
- What problems do you foresee with _adding_ new tasks to the Plan?
- Let's say we add a 'rescan-deps' attribute to targets, telling ninja to rescan a target's dependencies if they changed (limiting the number of rescans per target, to detect problems in the rules).

A 'rescan-deps' attribute would limit the scope of regressions that the change could introduce, at the expense of complicating the interface (which is probably a bad tradeoff in the long term).

Another problem to be addressed is _when_ to regenerate the dependencies file. See below:

>> Any other suggestions?
>
> At what point do you know about the generated file interdependencies?

It would be best to regenerate the dependencies _each time_ I _approach_ compilation of a target.
At this point I know the name of the target (to put before the ':' in the dep file), and this is a single place that is evaluated repeatedly (as long as the target stays dirty).
Could this be done by using 'rescan-deps' to specify a command or a target that regenerates the dependencies? Or is it possible to achieve this somehow with the phony rules available today?

It is possible to generate the dependencies when the .c files are generated, as I did in my example, but the drawback is that this doesn't address nesting of automatically generated include files (e.g., the .c is generated once, so only the first level of not-yet-existing .h files is discovered). Otherwise, the generator of each .h file would have to generate deps for all .o files that potentially depend on this .h, which is a mess.
Creating a rule to generate the dep file won't work for the same reason: it will only be built once.

> One approach is to regenerate the .ninja files whenever the
> dependencies change.

Yeah, this is the 'retry all on deps change' approach, which is not so great: unless there is some 'barrier' that allows building as much as possible before retrying, in the worst case it can restart after each file, and we have ~360 generated .h files and ~650 .c files. Also, it's far easier and faster to regenerate a dep file than the .ninja files.

P.S.
Thanks for your time and my respects to the project!
You've chosen your goals wisely and created a really good build system. Simple and beautiful.

Maxim Kalaev

Jun 28, 2012, 7:01:48 AM

to ninja...@googlegroups.com

I'd like to continue the discussion on this topic assuming someone is interested.

Following the discussion above, I've tried the approach of setting the build order so that targets with a .h extension are built before object files.
This didn't work well, because some of the .h files are generated using tools which are themselves built. Thus, I can't make all .o files depend on all .h files, as this creates a loop.
Then I tried to classify the .o files into objects which are for 'tools' and all the rest, which can depend on all .h files. This didn't work either, because the objects of some tools depended on headers generated by other tools.

Given that, I realized I needed to change ninja to re-read dep files during the build, and it did help (fixing that took about the same time as the above and was more fun, actually).
The patch is here: https://github.com/maximuska/ninja/commit/2cbfe92ad2a1f61c83c791417ea20ec8b30b12bc.
It works like this:
Pick a ready edge, re-read its dep file, update its input dependencies, re-evaluate dirtiness, and (re)add all dependent edges to the Plan.
Then check AllInputsReady(); if it fails, simply go pick another (ready) edge from the Plan.
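
The loop described above can be illustrated with a toy Python model. This is not the patch itself: `build`, `targets` and `depfile_writes` are hypothetical, dirtiness is ignored (everything is treated as dirty), and dep files are modelled as a dict that is filled in as a side effect of building a target.

```python
def build(targets, depfile_writes, want):
    """Toy model of re-reading dep files while building.

    targets:        name -> inputs declared in the manifest
    depfile_writes: name -> {target: deps}, the dep files written as a
                    side effect of building `name`
    Returns the order in which targets were built.
    """
    inputs = {t: set(deps) for t, deps in targets.items()}
    depfiles = {}                 # dep files currently "on disk"
    plan, built, order = set(), set(), []

    def add(t):
        if t not in inputs:       # plain source file, assumed to exist
            built.add(t)
        elif t not in built and t not in plan:
            plan.add(t)
            for dep in inputs[t]:
                add(dep)

    add(want)
    while plan:
        ready = sorted(t for t in plan if inputs[t] <= built)
        if not ready:
            raise RuntimeError("no ready edge: cycle or missing rule")
        t = ready[0]
        # Re-read the dep file and merge any newly discovered inputs.
        discovered = set(depfiles.get(t, ())) - inputs[t]
        inputs[t] |= discovered
        for dep in discovered:
            add(dep)
        if not inputs[t] <= built:
            continue              # no longer ready: pick another edge
        plan.remove(t)
        built.add(t)
        order.append(t)
        depfiles.update(depfile_writes.get(t, {}))
    return order

targets = {"auto": ["auto.o"], "auto.o": ["auto.c"], "auto.c": [], "auto1.h": []}
writes = {"auto.c": {"auto.o": ["auto.c", "auto1.h"]}}
print(build(targets, writes, "auto"))
```

With the toy manifest from the start of the thread, auto1.h is discovered when auto.o's dep file is re-read, so it is added to the plan and built before auto.o in a single run.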

Some of the operations in "RecomputeDirty()" may be redundant in this scenario, but I am not yet comfortable enough with the code to decide.

Overall, it works for me now, but I still have to keep some artificial ordering on building .h files:

With the patch, a dep file can be read multiple times (basically, at most 'number of dependency changes' times per target). This is good, as it allows discovering automatically generated header files layer by layer (when there are multiple layers), but there is a problem with _building_ the deps. What I did was create an artificial rule with a '$out.d' target that has the same dependencies as '$out', and make the $out target depend on it. This rebuilds the deps just before the .o is built. The advantage is that deps are rebuilt as a task, without blocking the main loop. The problem is that edges are only built once, by design, so this dep rule discovers only the first layer of included headers, which is almost good enough but still requires adding some manual ordering rules.

It would be trivial to add a 'deps_evaluation_command' to be executed on the main loop before the rule's 'command', just before re-reading the dependencies, but this would block and is obviously not optimal.
Do you have any elegant suggestions to do this properly?

Maxim Kalaev

Jul 29, 2012, 4:49:38 AM

to ninja...@googlegroups.com

I've implemented the dynamic dependency reload feature we discussed in this thread. I'd like to ask you to review it and give your opinions.

The implementation itself is pretty self-contained, IMO; I've put a lot of thought into keeping it that way.
https://github.com/maximuska/ninja/commit/f2abd83d27b38a8a45c623c2c6ee1dd128a909cc

I understand, though, that the application of this feature is pretty limited, as the right thing to do in typical projects (and the more efficient one, since you don't have to build deps separately) is what Nico proposed: build all auto-generated files before starting any compilation. It seems that currently the only customers are myself and the Fortran guys. This might also be used for generating the sub-ninja files discussed recently, though I am not sure this is the right size of hammer.

I've used the following test while developing it: https://github.com/maximuska/ninja/commit/8edb1004381ecb1011a2ace731c3acc97ab0af6f, will change it to ninja's test framework if the feature itself is interesting for upstreaming.

Qingning

Sep 6, 2012, 6:47:22 AM

to ninja...@googlegroups.com

On Thursday, June 28, 2012 12:01:48 PM UTC+1, Maxim Kalaev wrote:

> I'd like to continue the discussion on this topic assuming someone is interested.
>
> Following the discussion above, I've tried the approach of setting the build order so that targets with a .h extension are built before object files.
> This didn't work well, because some of the .h files are generated using tools which are themselves built. Thus, I can't make all .o files depend on all .h files, as this creates a loop.
> Then I tried to classify the .o files into objects which are for 'tools' and all the rest, which can depend on all .h files. This didn't work either, because the objects of some tools depended on headers generated by other tools.

Surely you have a very complicated use case :)

If you have one tool whose build depends on generated files, all object files for this tool should have an order-only dependency on all the generated files used to compile them. I assume you know all the object files and all the cpp files for this tool. I also assume you have high-level knowledge of all header files required for this tool, for two reasons: (1) typically the required headers match the library dependencies, and (2) in your case particularly, you need that knowledge anyway to avoid circular dependencies between building tools and running tools.

Then you can apply this pattern to any other tools as well as your main program.
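
As a rough sketch of this pattern, again assuming a Python script emits the manifest: `emit_tool_objects` is a hypothetical helper, the CC rule is the one from the original manifest, and the `errgen` object names are made up for the example (error_codes.h is the header mentioned earlier in the thread).

```python
def emit_tool_objects(objects, generated_headers):
    """Emit CC edges for one tool, order-only on its generated headers.

    objects:           object file -> source file, for a single tool
    generated_headers: generated headers this tool's sources may include
    """
    order_only = " ".join(sorted(generated_headers))
    lines = []
    for obj, src in sorted(objects.items()):
        line = f"build {obj}: CC {src}"
        if order_only:
            line += f" || {order_only}"
        lines.append(line)
    return "\n".join(lines) + "\n"

print(emit_tool_objects(
    {"errgen/main.o": "errgen/main.c", "errgen/util.o": "errgen/util.c"},
    {"error_codes.h"},
))
```

Calling the helper once per tool, each with its own header set, keeps a tool's objects from depending on headers that the same tool generates, which is what would create a loop.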