In this email I'll try to explain one of the oddities of make (which
some CMake-based build systems rely on), and why we can't currently
express this in Ninja. I'll also propose how we could extend Ninja
to support this behaviour.
(Warning, long essay ahead.)
In the LLVM project we maintain a code generator called tblgen,
the purpose of which is to generate a number of header files
containing various metadata. From time to time, tblgen and its
dependent libraries will be modified, causing a rebuild of tblgen.
Strictly speaking, we should now rebuild the header files, and all of
their reverse dependencies, even if the generated file did not change.
This is suboptimal, because the reverse dependencies constitute
every object file built from a source file that transitively includes
one of the tblgen generated files (which is >50% of object files in
the build).
To avoid this problem, we cause tblgen to write its output to a
temporary file, and use a utility to copy the temporary file over the
target file only if the temporary file is different from the target.
In makefile terms, it looks something like this:
-----
all: outputuser.o
tblgen: tblgen.cpp
touch tblgen
output.inc.tmp: tblgen
touch output.inc.tmp
output.inc: output.inc.tmp
if cmp output.inc.tmp output.inc ; then : ; else cp output.inc.tmp output.inc ; fi
outputuser.o: outputuser.cpp output.inc
touch outputuser.o
-----
Note what happens during an incremental build where tblgen is the
only dirty file, but its output file output.inc.tmp does not change
relative to output.inc. make initially schedules a rebuild of tblgen,
output.inc.tmp, output.inc and outputuser.o. After output.inc has
been rebuilt, its timestamp remains the same as before the build.
Before make begins to rebuild outputuser.o, it will re-evaluate the
dirty state of outputuser.o based on the timestamps of its inputs
(i.e. it will stat them again). Because outputuser.cpp and output.inc
are both older than outputuser.o, make doesn't rebuild it after all,
despite it being initially scheduled for a build.
The behaviour is different in Ninja, which operates in two phases:
the scheduling of the build and the build itself. Like make, Ninja
will schedule a rebuild of tblgen, output.inc.tmp, output.inc and
outputuser.o. Unlike make, it will rebuild targetuser.o, because
it does not re-stat inputs during the build. The key observation
here is that unlike make, Ninja currently provides no mechanism for
pruning the scheduled build graph during a build using a build rule.
What I propose for Ninja is that we implement this pruning behaviour
in a similar way to make, but only for specific rules with a special
variable set on the rule. We can call this variable "restat"
(suggestions for better names are welcome). If this variable is
present on a rule, Ninja will, after executing the rule command,
re-stat each output file to obtain its modification time. If the
modification time is unchanged from when Ninja initially stat'ed the
file before starting the build, Ninja will mark that output file as
clean, and recursively for each reverse dependency of the output file,
recompute its dirty status.
As an improvement over what make does, Ninja then stores the current
timestamp in the build log entry associated with the output file.
This timestamp will be treated by future invocations of Ninja as the
output file's modification time instead of the output file's actual
modification time for the purpose of deciding whether it is dirty
(but not whether its reverse dependencies are dirty).
To give an example of how this would look, here is the above makefile
translated to Ninja:
-----
rule touch
command = touch $out
rule cpifdiff
command = if cmp $in $out ; then : ; else cp $in $out ; fi
restat = true
build tblgen: touch tblgen.cpp
build output.inc.tmp: touch tblgen
build output.inc: cpifdiff output.inc.tmp
build outputuser.o: touch outputuser.cpp output.inc
default outputuser.o
-----
Now consider what happens when Ninja is asked at timestamp 3 to rebuild
the default target (outputuser.o) where tblgen.cpp has timestamp 2 and
all other files have timestamp 1. Ninja will schedule a rebuild of
tblgen, output.inc.tmp, output.inc and outputuser.o. Again, suppose
that the contents of output.inc.tmp are equal to output.inc when built.
So after output.inc has been rebuilt, it still has a timestamp 1.
Ninja will notice that the timestamp is the same as at the start of
the build, and will mark output.inc as clean. This is propagated
through to outputuser.o, which is also marked clean. So no further
rebuilding is needed. Ninja will also associate the timestamp 3 with
output.inc in the build log.
Suppose that Ninja is invoked again immediately afterwards. The build
planner compares output.inc's timestamp in the build log, 3, against
the modification time of output.inc.tmp, also 3, so it is marked
as clean. It then compares output.inc's actual modification time,
1, against outputuser.o's modification time, also 1, so it is also
marked as clean, and this is a no-op build.
There's a small UI issue here in that the total number of files to be
rebuilt is unknown during the course of a rebuild until all restat
edges are done. There are a number of options for presenting the
build status until this happens. I can think of:
1) Show the maximum number of files that could be rebuilt at the
current time, and allow this number to drop.
1a) Same as 1, but prioritise the restat edges in order to show a
correct total to the user as quickly as possible.
2) Keep the total constant, and treat any skipped outputs as completed.
3) Display a question mark in place of the total until all restat
edges are done.
3a) Same as 3, but prioritise the restat edges.
I'm leaning towards 1 at the moment, unless the prioritisation turns
out to be easy, in which case I'd go for 3a.
Thanks for reading... thoughts?
Thanks,
--
Peter
On Wed, Sep 7, 2011 at 6:17 PM, Peter Collingbourne <pe...@pcc.me.uk> wrote:
> In the LLVM project we maintain a code generator called tblgen,
> the purpose of which is to generate a number of header files
> containing various metadata. From time to time, tblgen and its
> dependent libraries will be modified, causing a rebuild of tblgen.
> Strictly speaking, we should now rebuild the header files, and all of
> their reverse dependencies, even if the generated file did not change.
> This is suboptimal, because the reverse dependencies constitute
> every object file built from a source file that transitively includes
> one of the tblgen generated files (which is >50% of object files in
> the build).
>
> To avoid this problem, we cause tblgen to write its output to a
> temporary file, and use a utility to copy the temporary file over the
> target file only if the temporary file is different from the target.
Chrome has a similar pattern in it in a few places.
In fact, the gyp Ninja generator still has an ugly hack in it to work
around some of the consequences:
http://codesearch.google.com/codesearch#OAMlx_jo-ck/src/tools/gyp/pylib/gyp/generator/ninja.py&exact_package=chromium&q=ninja&type=cs&l=462
Let me summarize those needs here just to summarize the design space.
1) Chrome has XML source files containing translations of strings.
From these we generate C++ headers and packed data files used in
shipping. The program that generates these is rather slow, so it'd be
nice to not run it unnecessarily; the resulting headers are used in
many places, so it's nice to not rebuild them needlessly; but
thankfully the inputs very rarely change.
2) Chrome has a lastchange script that attempt to insert the current
SVN/git revision into a header file to be used in the about box.
As written, the intent of the build system is that the lastchange
script always runs and computes the current SVN revision, but it
conditionally writes its output file so that it doesn't trigger extra
rebuilds.
> In makefile terms, it looks something like this:
>
> -----
> all: outputuser.o
>
> tblgen: tblgen.cpp
> touch tblgen
>
> output.inc.tmp: tblgen
> touch output.inc.tmp
>
> output.inc: output.inc.tmp
> if cmp output.inc.tmp output.inc ; then : ; else cp output.inc.tmp output.inc ; fi
>
> outputuser.o: outputuser.cpp output.inc
> touch outputuser.o
> -----
>
> Note what happens during an incremental build where tblgen is the
> only dirty file, but its output file output.inc.tmp does not change
> relative to output.inc. make initially schedules a rebuild of tblgen,
> output.inc.tmp, output.inc and outputuser.o. After output.inc has
> been rebuilt, its timestamp remains the same as before the build.
> Before make begins to rebuild outputuser.o, it will re-evaluate the
> dirty state of outputuser.o based on the timestamps of its inputs
> (i.e. it will stat them again). Because outputuser.cpp and output.inc
> are both older than outputuser.o, make doesn't rebuild it after all,
> despite it being initially scheduled for a build.
Does this mean that the "cmp" step will continue to run in the future
for every build?
It seems output.inc.tmp and output.inc have the same content but
output.inc.tmp has a newer timestamp.
> The behaviour is different in Ninja, which operates in two phases:
> the scheduling of the build and the build itself. Like make, Ninja
> will schedule a rebuild of tblgen, output.inc.tmp, output.inc and
> outputuser.o. Unlike make, it will rebuild targetuser.o, because
> it does not re-stat inputs during the build. The key observation
> here is that unlike make, Ninja currently provides no mechanism for
> pruning the scheduled build graph during a build using a build rule.
>
> What I propose for Ninja is that we implement this pruning behaviour
> in a similar way to make, but only for specific rules with a special
> variable set on the rule. We can call this variable "restat"
> (suggestions for better names are welcome).
Alternative proposal, just for comparison's sake: always restat.
Pro:
- simpler build files and less conceptual overhead
Con:
- need to stat files while building. Will that be expensive? Since
the command likely just wrote the file, it'll probably be in the
kernel's cache at least.
> If this variable is
> present on a rule, Ninja will, after executing the rule command,
> re-stat each output file to obtain its modification time. If the
> modification time is unchanged from when Ninja initially stat'ed the
> file before starting the build, Ninja will mark that output file as
> clean, and recursively for each reverse dependency of the output file,
> recompute its dirty status.
For some history, part of the motivation of the two-pass approach was
that I wanted to support inotify.
Supporting that would require dirty status to propagate from the files
out to the targets, which is what you also would like here.
But it turned out we didn't need inotify to be fast enough, so the
code related to that isn't in the tree anymore (it didn't quite work
anyway).
It turned out to be surprisingly subtle to get everything right; see
Edge::RecomputeDirty, which does the traversal in the other direction
and relies on whether the file has been stat'ed to track nodes already
visited in the graph. Just thinking out loud, but perhaps it won't be
so bad if, when propagating dirty stat in the other direction like in
your proposal, we rely on the fact that RecomputeDirty has already
been called...
> As an improvement over what make does, Ninja then stores the current
> timestamp in the build log entry associated with the output file.
> This timestamp will be treated by future invocations of Ninja as the
> output file's modification time instead of the output file's actual
> modification time for the purpose of deciding whether it is dirty
> (but not whether its reverse dependencies are dirty).
>
> To give an example of how this would look, here is the above makefile
> translated to Ninja:
>
> -----
> rule touch
> command = touch $out
>
> rule cpifdiff
> command = if cmp $in $out ; then : ; else cp $in $out ; fi
> restat = true
>
> build tblgen: touch tblgen.cpp
> build output.inc.tmp: touch tblgen
> build output.inc: cpifdiff output.inc.tmp
> build outputuser.o: touch outputuser.cpp output.inc
>
> default outputuser.o
> -----
I wonder if there is something smarter or simpler we can do here -- if
the pattern of "this command may produce the same output as before" is
the reason for the restat stuff maybe we can automate some of that?
For Chrome's case, the source gyp file doesn't express whether the
command is one of the special cases where we want to do special
handling of the outputs. (I can only assume MSVS and Xcode do the
restat behavior already, because we successfully use those; but on the
other hand, I've heard MSVS grinds for ~30s for a no-change build so
perhaps it is needless rerunning rules.)
> There's a small UI issue here in that the total number of files to be
> rebuilt is unknown during the course of a rebuild until all restat
> edges are done. There are a number of options for presenting the
> build status until this happens. I can think of:
>
> 1) Show the maximum number of files that could be rebuilt at the
> current time, and allow this number to drop.
> 1a) Same as 1, but prioritise the restat edges in order to show a
> correct total to the user as quickly as possible.
> 2) Keep the total constant, and treat any skipped outputs as completed.
> 3) Display a question mark in place of the total until all restat
> edges are done.
> 3a) Same as 3, but prioritise the restat edges.
>
> I'm leaning towards 1 at the moment, unless the prioritisation turns
> out to be easy, in which case I'd go for 3a.
1 sounds ok to me. I'd like to change the UI for showing pending
state anyway, so I'm open to options.
I had a thread within Google about it and saw many different
proposals, but none made me completely happy.
I even got this patch:
https://github.com/martine/ninja/pull/88
The issue I have is that I typically build with -j100 (we are spoiled,
I know) which means right now, with the logic that the first number is
the number of completed tasks, we print out 100 file names before the
number goes up from zero.
One idea that might converge with your proposal is something like:
[edges started / edges remaining]
Where the right number counts down to 0 as the build progresses.
With that sort of presentation, if we reduce the number of edges
remaining due to restats the number will just jump down, which will
feel the same as commands that execute quickly.
I'm not sure if that's acceptable for your case, though -- if this
header really is used everywhere, the initial estimate of edges might
be a very large number that jumps down by a thousand or so immediately
after a restat rule.
It seems that restat should be sufficient to handle both of those cases.
In Make, that is correct. The intention of the below part of the
proposal...
> > As an improvement over what make does, Ninja then stores the current
> > timestamp in the build log entry associated with the output file.
> > This timestamp will be treated by future invocations of Ninja as the
> > output file's modification time instead of the output file's actual
> > modification time for the purpose of deciding whether it is dirty
> > (but not whether its reverse dependencies are dirty).
...is to avoid running "cmp" every time in Ninja. Which is even
better if "cmp" is actually the code generator.
Actually, I've been rethinking this part of the proposal. Due to clock
skew it may be best to store the latest timestamp of any input file in
the build log instead.
> > What I propose for Ninja is that we implement this pruning behaviour
> > in a similar way to make, but only for specific rules with a special
> > variable set on the rule. �We can call this variable "restat"
> > (suggestions for better names are welcome).
>
> Alternative proposal, just for comparison's sake: always restat.
>
> Pro:
> - simpler build files and less conceptual overhead
>
> Con:
> - need to stat files while building. Will that be expensive? Since
> the command likely just wrote the file, it'll probably be in the
> kernel's cache at least.
Intuitively, the time taken to rebuild the file should dwarf the cost
of stat'ing, but I don't know if this is true on all operating systems
(particularly Windows) though.
This also brings up a related question: should phony rules restat?
The intuitive answer seems "no", but there are some special cases.
For example, the KDE build system uses CMake in such a way that one
can't see which of a set of dependencies (conditionally) generates
a source file. So we have to output things like:
generated_file.cpp: phony some_rule1 some_rule2 some_lib1.a some_lib2.a
Of course I'd like Ninja to restat generated_file.cpp.
Perhaps the right thing to do is to restat by default (except for the
phony rule), and allow both rules and build statements to override
this.
> For some history, part of the motivation of the two-pass approach was
> that I wanted to support inotify.
> Supporting that would require dirty status to propagate from the files
> out to the targets, which is what you also would like here.
> But it turned out we didn't need inotify to be fast enough, so the
> code related to that isn't in the tree anymore (it didn't quite work
> anyway).
>
> It turned out to be surprisingly subtle to get everything right; see
> Edge::RecomputeDirty, which does the traversal in the other direction
> and relies on whether the file has been stat'ed to track nodes already
> visited in the graph. Just thinking out loud, but perhaps it won't be
> so bad if, when propagating dirty stat in the other direction like in
> your proposal, we rely on the fact that RecomputeDirty has already
> been called...
I've been playing with an implementation of this, and yes, it depends
on RecomputeDirty having already run. The main change I needed to
make to the graph data structure was to store in each edge the total
number of dirty inputs at the start of execution (since the dirty flags
are cleared during the build, I needed a way of knowing when all inputs
had been cleaned by a restat as opposed to a rebuild).
My branch is here, if you would like to take a look. I'll turn it into a
pull request once I'm more comfortable with it.
http://github.com/pcc/ninja/tree/restat
> I wonder if there is something smarter or simpler we can do here -- if
> the pattern of "this command may produce the same output as before" is
> the reason for the restat stuff maybe we can automate some of that?
We may want to go one step beyond restat and add another option for
comparing not only the timestamp before and after but also the
file contents. This wouldn't be much use for CMake though, since
many existing CMake-based build systems are already using their own
mechanism for the conditional copy (for example, by building it into
the code generator itself).
> For Chrome's case, the source gyp file doesn't express whether the
> command is one of the special cases where we want to do special
> handling of the outputs.
CMake doesn't express this either. Everything that isn't a standard
compiler or linker command is lumped into the category of "custom
commands". It looks like I'll have to make sure all custom commands
restat.
> (I can only assume MSVS and Xcode do the
> restat behavior already, because we successfully use those; but on the
> other hand, I've heard MSVS grinds for ~30s for a no-change build so
> perhaps it is needless rerunning rules.)
I've heard from people on the LLVM mailing list that MSVS can keep
rebuilding everything if tblgen changes, so I guess it doesn't restat.
Actually I like the sound of that proposal. It seems that no matter
which way we go (other than something like my 3), we'd still get
jumps on one of the numbers. And if we make restat the default,
3 doesn't seem very appealing.
Thanks,
--
Peter
Yes, it should work in any scenario where the build command makes a
rewrite decision based on only the dependencies and output file.
Thanks,
--
Peter
Can you paste the error message?
Hi Fran,
This feature was not implemented in the restat branch. The new
implementation of restat rules, which features a new internal design,
is in my outputs-ready branch, and it implements this feature (see
commit 05b8f8e158):
https://github.com/pcc/ninja/tree/outputs-ready
Thanks,
--
Peter
If we do this, we would need to think carefully about the interaction
between the restat and hardlink features to avoid introducing subtle
bugs.
> I also agree that restat is a good idea. It would also be awesome if
> we could automate the check for sameness. Perhaps with a tag like
> "revert_if_same = true"; if present, then ninja will md5-checksum the
> file before starting the build, and after finishing, and if checksums
> are the same, would retouch it with the previous timestamp, prune the
> tree, and update the timestamp in the log like Peter described. I have
> two immediate use cases, both related to incorporating information
> about the current build environment (svn info, gcc + os version) into
> the built files.
Is there any reason why you cannot use something like the cpifdiff
rule mentioned in my earlier email to do this? Personally I would be
wary of building functionality into Ninja when it can be implemented
externally.
> How stable is the outputs_ready branch? And what are the prospects for
> getting this stuff into the mainstream ninja?
I have been dogfooding the outputs-ready branch for almost a month now
for my own development using the CMake Ninja generator, and I have
also been recommending it to others who are trying the generator,
which relies on the functionality in outputs-ready. In that time I
have not encountered any bugs or received any bug reports specifically
relating to outputs-ready. The generator is now passing all tests
in the CMake test suite, so I will be almost ready to submit it to
CMake upstream once outputs-ready is merged upstream.
Unless any issues with the design are raised, I will rebase
outputs-ready on top of Evan's master branch once generator rules
have been merged (outputs-ready contains an earlier version of the
generator rule implementation which would need to be removed) and
send him a pull request.
Thanks,
--
Peter
It's now done: https://github.com/martine/ninja/pull/125
The first two changes look fine to me. Perhaps you can prepare a pull
request for Evan with those changes? They don't look like anything
related to my branch, so you can probably base them on Evan's master.
Thanks,
--
Peter