Proposal: restat rules

127 views
Skip to first unread message

Peter Collingbourne

unread,
Sep 7, 2011, 9:17:18 PM9/7/11
to ninja...@googlegroups.com
Hi,

In this email I'll try to explain one of the oddities of make (which
some CMake-based build systems rely on), and why we can't currently
express this in Ninja. I'll also propose how we could extend Ninja
to support this behaviour.

(Warning, long essay ahead.)

In the LLVM project we maintain a code generator called tblgen,
the purpose of which is to generate a number of header files
containing various metadata. From time to time, tblgen and its
dependent libraries will be modified, causing a rebuild of tblgen.
Strictly speaking, we should now rebuild the header files, and all of
their reverse dependencies, even if the generated file did not change.
This is suboptimal, because the reverse dependencies constitute
every object file built from a source file that transitively includes
one of the tblgen generated files (which is >50% of object files in
the build).

To avoid this problem, we cause tblgen to write its output to a
temporary file, and use a utility to copy the temporary file over the
target file only if the temporary file is different from the target.
In makefile terms, it looks something like this:

-----
all: outputuser.o

tblgen: tblgen.cpp
touch tblgen

output.inc.tmp: tblgen
touch output.inc.tmp

output.inc: output.inc.tmp
if cmp output.inc.tmp output.inc ; then : ; else cp output.inc.tmp output.inc ; fi

outputuser.o: outputuser.cpp output.inc
touch outputuser.o
-----

Note what happens during an incremental build where tblgen is the
only dirty file, but its output file output.inc.tmp does not change
relative to output.inc. make initially schedules a rebuild of tblgen,
output.inc.tmp, output.inc and outputuser.o. After output.inc has
been rebuilt, its timestamp remains the same as before the build.
Before make begins to rebuild outputuser.o, it will re-evaluate the
dirty state of outputuser.o based on the timestamps of its inputs
(i.e. it will stat them again). Because outputuser.cpp and output.inc
are both older than outputuser.o, make doesn't rebuild it after all,
despite it being initially scheduled for a build.

The behaviour is different in Ninja, which operates in two phases:
the scheduling of the build and the build itself. Like make, Ninja
will schedule a rebuild of tblgen, output.inc.tmp, output.inc and
outputuser.o. Unlike make, it will rebuild targetuser.o, because
it does not re-stat inputs during the build. The key observation
here is that unlike make, Ninja currently provides no mechanism for
pruning the scheduled build graph during a build using a build rule.

What I propose for Ninja is that we implement this pruning behaviour
in a similar way to make, but only for specific rules with a special
variable set on the rule. We can call this variable "restat"
(suggestions for better names are welcome). If this variable is
present on a rule, Ninja will, after executing the rule command,
re-stat each output file to obtain its modification time. If the
modification time is unchanged from when Ninja initially stat'ed the
file before starting the build, Ninja will mark that output file as
clean, and recursively for each reverse dependency of the output file,
recompute its dirty status.

As an improvement over what make does, Ninja then stores the current
timestamp in the build log entry associated with the output file.
This timestamp will be treated by future invocations of Ninja as the
output file's modification time instead of the output file's actual
modification time for the purpose of deciding whether it is dirty
(but not whether its reverse dependencies are dirty).

To give an example of how this would look, here is the above makefile
translated to Ninja:

-----
rule touch
command = touch $out

rule cpifdiff
command = if cmp $in $out ; then : ; else cp $in $out ; fi
restat = true

build tblgen: touch tblgen.cpp
build output.inc.tmp: touch tblgen
build output.inc: cpifdiff output.inc.tmp
build outputuser.o: touch outputuser.cpp output.inc

default outputuser.o
-----

Now consider what happens when Ninja is asked at timestamp 3 to rebuild
the default target (outputuser.o) where tblgen.cpp has timestamp 2 and
all other files have timestamp 1. Ninja will schedule a rebuild of
tblgen, output.inc.tmp, output.inc and outputuser.o. Again, suppose
that the contents of output.inc.tmp are equal to output.inc when built.
So after output.inc has been rebuilt, it still has a timestamp 1.
Ninja will notice that the timestamp is the same as at the start of
the build, and will mark output.inc as clean. This is propagated
through to outputuser.o, which is also marked clean. So no further
rebuilding is needed. Ninja will also associate the timestamp 3 with
output.inc in the build log.

Suppose that Ninja is invoked again immediately afterwards. The build
planner compares output.inc's timestamp in the build log, 3, against
the modification time of output.inc.tmp, also 3, so it is marked
as clean. It then compares output.inc's actual modification time,
1, against outputuser.o's modification time, also 1, so it is also
marked as clean, and this is a no-op build.

There's a small UI issue here in that the total number of files to be
rebuilt is unknown during the course of a rebuild until all restat
edges are done. There are a number of options for presenting the
build status until this happens. I can think of:

1) Show the maximum number of files that could be rebuilt at the
current time, and allow this number to drop.
1a) Same as 1, but prioritise the restat edges in order to show a
correct total to the user as quickly as possible.
2) Keep the total constant, and treat any skipped outputs as completed.
3) Display a question mark in place of the total until all restat
edges are done.
3a) Same as 3, but prioritise the restat edges.

I'm leaning towards 1 at the moment, unless the prioritisation turns
out to be easy, in which case I'd go for 3a.

Thanks for reading... thoughts?

Thanks,
--
Peter

Evan Martin

unread,
Sep 7, 2011, 11:34:50 PM9/7/11
to ninja...@googlegroups.com
First, let me say that I'm totally willing to adapt Ninja to these
sorts of needs. I have often worried that its semantics were too
close to gyp, and adapting it for another project is a great way to
keep it general.

On Wed, Sep 7, 2011 at 6:17 PM, Peter Collingbourne <pe...@pcc.me.uk> wrote:
> In the LLVM project we maintain a code generator called tblgen,
> the purpose of which is to generate a number of header files
> containing various metadata.  From time to time, tblgen and its
> dependent libraries will be modified, causing a rebuild of tblgen.
> Strictly speaking, we should now rebuild the header files, and all of
> their reverse dependencies, even if the generated file did not change.
> This is suboptimal, because the reverse dependencies constitute
> every object file built from a source file that transitively includes
> one of the tblgen generated files (which is >50% of object files in
> the build).
>
> To avoid this problem, we cause tblgen to write its output to a
> temporary file, and use a utility to copy the temporary file over the
> target file only if the temporary file is different from the target.

Chrome has a similar pattern in it in a few places.

In fact, the gyp Ninja generator still has an ugly hack in it to work
around some of the consequences:
http://codesearch.google.com/codesearch#OAMlx_jo-ck/src/tools/gyp/pylib/gyp/generator/ninja.py&exact_package=chromium&q=ninja&type=cs&l=462

Let me summarize those needs here just to summarize the design space.
1) Chrome has XML source files containing translations of strings.
From these we generate C++ headers and packed data files used in
shipping. The program that generates these is rather slow, so it'd be
nice to not run it unnecessarily; the resulting headers are used in
many places, so it's nice to not rebuild them needlessly; but
thankfully the inputs very rarely change.

2) Chrome has a lastchange script that attempt to insert the current
SVN/git revision into a header file to be used in the about box.
As written, the intent of the build system is that the lastchange
script always runs and computes the current SVN revision, but it
conditionally writes its output file so that it doesn't trigger extra
rebuilds.

> In makefile terms, it looks something like this:
>
> -----
> all: outputuser.o
>
> tblgen: tblgen.cpp
>        touch tblgen
>
> output.inc.tmp: tblgen
>        touch output.inc.tmp
>
> output.inc: output.inc.tmp
>        if cmp output.inc.tmp output.inc ; then : ; else cp output.inc.tmp output.inc ; fi
>
> outputuser.o: outputuser.cpp output.inc
>        touch outputuser.o
> -----
>
> Note what happens during an incremental build where tblgen is the
> only dirty file, but its output file output.inc.tmp does not change
> relative to output.inc.  make initially schedules a rebuild of tblgen,
> output.inc.tmp, output.inc and outputuser.o.  After output.inc has
> been rebuilt, its timestamp remains the same as before the build.
> Before make begins to rebuild outputuser.o, it will re-evaluate the
> dirty state of outputuser.o based on the timestamps of its inputs
> (i.e. it will stat them again).  Because outputuser.cpp and output.inc
> are both older than outputuser.o, make doesn't rebuild it after all,
> despite it being initially scheduled for a build.

Does this mean that the "cmp" step will continue to run in the future
for every build?
It seems output.inc.tmp and output.inc have the same content but
output.inc.tmp has a newer timestamp.

> The behaviour is different in Ninja, which operates in two phases:
> the scheduling of the build and the build itself.  Like make, Ninja
> will schedule a rebuild of tblgen, output.inc.tmp, output.inc and
> outputuser.o.  Unlike make, it will rebuild targetuser.o, because
> it does not re-stat inputs during the build.  The key observation
> here is that unlike make, Ninja currently provides no mechanism for
> pruning the scheduled build graph during a build using a build rule.
>
> What I propose for Ninja is that we implement this pruning behaviour
> in a similar way to make, but only for specific rules with a special
> variable set on the rule.  We can call this variable "restat"
> (suggestions for better names are welcome).

Alternative proposal, just for comparison's sake: always restat.

Pro:
- simpler build files and less conceptual overhead

Con:
- need to stat files while building. Will that be expensive? Since
the command likely just wrote the file, it'll probably be in the
kernel's cache at least.

>  If this variable is
> present on a rule, Ninja will, after executing the rule command,
> re-stat each output file to obtain its modification time.  If the
> modification time is unchanged from when Ninja initially stat'ed the
> file before starting the build, Ninja will mark that output file as
> clean, and recursively for each reverse dependency of the output file,
> recompute its dirty status.

For some history, part of the motivation of the two-pass approach was
that I wanted to support inotify.
Supporting that would require dirty status to propagate from the files
out to the targets, which is what you also would like here.
But it turned out we didn't need inotify to be fast enough, so the
code related to that isn't in the tree anymore (it didn't quite work
anyway).

It turned out to be surprisingly subtle to get everything right; see
Edge::RecomputeDirty, which does the traversal in the other direction
and relies on whether the file has been stat'ed to track nodes already
visited in the graph. Just thinking out loud, but perhaps it won't be
so bad if, when propagating dirty stat in the other direction like in
your proposal, we rely on the fact that RecomputeDirty has already
been called...

> As an improvement over what make does, Ninja then stores the current
> timestamp in the build log entry associated with the output file.
> This timestamp will be treated by future invocations of Ninja as the
> output file's modification time instead of the output file's actual
> modification time for the purpose of deciding whether it is dirty
> (but not whether its reverse dependencies are dirty).
>
> To give an example of how this would look, here is the above makefile
> translated to Ninja:
>
> -----
> rule touch
>  command = touch $out
>
> rule cpifdiff
>  command = if cmp $in $out ; then : ; else cp $in $out ; fi
>  restat = true
>
> build tblgen: touch tblgen.cpp
> build output.inc.tmp: touch tblgen
> build output.inc: cpifdiff output.inc.tmp
> build outputuser.o: touch outputuser.cpp output.inc
>
> default outputuser.o
> -----

I wonder if there is something smarter or simpler we can do here -- if
the pattern of "this command may produce the same output as before" is
the reason for the restat stuff maybe we can automate some of that?
For Chrome's case, the source gyp file doesn't express whether the
command is one of the special cases where we want to do special
handling of the outputs. (I can only assume MSVS and Xcode do the
restat behavior already, because we successfully use those; but on the
other hand, I've heard MSVS grinds for ~30s for a no-change build so
perhaps it is needless rerunning rules.)

> There's a small UI issue here in that the total number of files to be
> rebuilt is unknown during the course of a rebuild until all restat
> edges are done.  There are a number of options for presenting the
> build status until this happens.  I can think of:
>
> 1) Show the maximum number of files that could be rebuilt at the
>   current time, and allow this number to drop.
>   1a) Same as 1, but prioritise the restat edges in order to show a
>       correct total to the user as quickly as possible.
> 2) Keep the total constant, and treat any skipped outputs as completed.
> 3) Display a question mark in place of the total until all restat
>   edges are done.
>   3a) Same as 3, but prioritise the restat edges.
>
> I'm leaning towards 1 at the moment, unless the prioritisation turns
> out to be easy, in which case I'd go for 3a.

1 sounds ok to me. I'd like to change the UI for showing pending
state anyway, so I'm open to options.

I had a thread within Google about it and saw many different
proposals, but none made me completely happy.
I even got this patch:
https://github.com/martine/ninja/pull/88

The issue I have is that I typically build with -j100 (we are spoiled,
I know) which means right now, with the logic that the first number is
the number of completed tasks, we print out 100 file names before the
number goes up from zero.

One idea that might converge with your proposal is something like:
[edges started / edges remaining]
Where the right number counts down to 0 as the build progresses.
With that sort of presentation, if we reduce the number of edges
remaining due to restats the number will just jump down, which will
feel the same as commands that execute quickly.

I'm not sure if that's acceptable for your case, though -- if this
header really is used everywhere, the initial estimate of edges might
be a very large number that jumps down by a thousand or so immediately
after a restat rule.

Qingning

unread,
Sep 8, 2011, 5:03:26 PM9/8/11
to ninja-build
[not quoting the messages because they are too long]

This change would be a huge improvment to us as well. It would save us
a lot of unnecessary build time if ninja can trim the build action
tree if previous actions do not update their output files.

With the current proposal, I wonder whether it is absolutely necessary
to have two copies of the files and run cmp on them. I suppose it
should also work if the code generating tool can compare the file
content and only update the output file if it is to be different?

Thanks
Qingning

Peter Collingbourne

unread,
Sep 11, 2011, 9:58:19 PM9/11/11
to ninja...@googlegroups.com

It seems that restat should be sufficient to handle both of those cases.

In Make, that is correct. The intention of the below part of the
proposal...

> > As an improvement over what make does, Ninja then stores the current
> > timestamp in the build log entry associated with the output file.
> > This timestamp will be treated by future invocations of Ninja as the
> > output file's modification time instead of the output file's actual
> > modification time for the purpose of deciding whether it is dirty
> > (but not whether its reverse dependencies are dirty).

...is to avoid running "cmp" every time in Ninja. Which is even
better if "cmp" is actually the code generator.

Actually, I've been rethinking this part of the proposal. Due to clock
skew it may be best to store the latest timestamp of any input file in
the build log instead.

> > What I propose for Ninja is that we implement this pruning behaviour
> > in a similar way to make, but only for specific rules with a special
> > variable set on the rule. �We can call this variable "restat"
> > (suggestions for better names are welcome).
>
> Alternative proposal, just for comparison's sake: always restat.
>
> Pro:
> - simpler build files and less conceptual overhead
>
> Con:
> - need to stat files while building. Will that be expensive? Since
> the command likely just wrote the file, it'll probably be in the
> kernel's cache at least.

Intuitively, the time taken to rebuild the file should dwarf the cost
of stat'ing, but I don't know if this is true on all operating systems
(particularly Windows) though.

This also brings up a related question: should phony rules restat?
The intuitive answer seems "no", but there are some special cases.
For example, the KDE build system uses CMake in such a way that one
can't see which of a set of dependencies (conditionally) generates
a source file. So we have to output things like:

generated_file.cpp: phony some_rule1 some_rule2 some_lib1.a some_lib2.a

Of course I'd like Ninja to restat generated_file.cpp.

Perhaps the right thing to do is to restat by default (except for the
phony rule), and allow both rules and build statements to override
this.

> For some history, part of the motivation of the two-pass approach was
> that I wanted to support inotify.
> Supporting that would require dirty status to propagate from the files
> out to the targets, which is what you also would like here.
> But it turned out we didn't need inotify to be fast enough, so the
> code related to that isn't in the tree anymore (it didn't quite work
> anyway).
>
> It turned out to be surprisingly subtle to get everything right; see
> Edge::RecomputeDirty, which does the traversal in the other direction
> and relies on whether the file has been stat'ed to track nodes already
> visited in the graph. Just thinking out loud, but perhaps it won't be
> so bad if, when propagating dirty stat in the other direction like in
> your proposal, we rely on the fact that RecomputeDirty has already
> been called...

I've been playing with an implementation of this, and yes, it depends
on RecomputeDirty having already run. The main change I needed to
make to the graph data structure was to store in each edge the total
number of dirty inputs at the start of execution (since the dirty flags
are cleared during the build, I needed a way of knowing when all inputs
had been cleaned by a restat as opposed to a rebuild).

My branch is here, if you would like to take a look. I'll turn it into a
pull request once I'm more comfortable with it.
http://github.com/pcc/ninja/tree/restat

> I wonder if there is something smarter or simpler we can do here -- if
> the pattern of "this command may produce the same output as before" is
> the reason for the restat stuff maybe we can automate some of that?

We may want to go one step beyond restat and add another option for
comparing not only the timestamp before and after but also the
file contents. This wouldn't be much use for CMake though, since
many existing CMake-based build systems are already using their own
mechanism for the conditional copy (for example, by building it into
the code generator itself).

> For Chrome's case, the source gyp file doesn't express whether the
> command is one of the special cases where we want to do special
> handling of the outputs.

CMake doesn't express this either. Everything that isn't a standard
compiler or linker command is lumped into the category of "custom
commands". It looks like I'll have to make sure all custom commands
restat.

> (I can only assume MSVS and Xcode do the
> restat behavior already, because we successfully use those; but on the
> other hand, I've heard MSVS grinds for ~30s for a no-change build so
> perhaps it is needless rerunning rules.)

I've heard from people on the LLVM mailing list that MSVS can keep
rebuilding everything if tblgen changes, so I guess it doesn't restat.

Actually I like the sound of that proposal. It seems that no matter
which way we go (other than something like my 3), we'd still get
jumps on one of the numbers. And if we make restat the default,
3 doesn't seem very appealing.

Thanks,
--
Peter

Peter Collingbourne

unread,
Sep 11, 2011, 10:01:34 PM9/11/11
to ninja...@googlegroups.com
On Thu, Sep 08, 2011 at 02:03:26PM -0700, Qingning wrote:
> With the current proposal, I wonder whether it is absolutely necessary
> to have two copies of the files and run cmp on them. I suppose it
> should also work if the code generating tool can compare the file
> content and only update the output file if it is to be different?

Yes, it should work in any scenario where the build command makes a
rewrite decision based on only the dependencies and output file.

Thanks,
--
Peter

Frances

unread,
Sep 14, 2011, 12:52:46 PM9/14/11
to ninja-build
BTW - I can't get the restat branch to build out of the box using
VS2005.
I believe the hash_map template parameters differ between gcc an VC
See http://www.gamedev.net/topic/405995-hash_map-header/

Evan Martin

unread,
Sep 14, 2011, 2:50:04 PM9/14/11
to ninja...@googlegroups.com

Can you paste the error message?

Frances

unread,
Sep 15, 2011, 11:42:35 AM9/15/11
to ninja-build


On Sep 14, 7:50 pm, Evan Martin <mart...@danga.com> wrote:
> On Wed, Sep 14, 2011 at 9:52 AM, Frances <frances.buonte...@gmail.com> wrote:
> > BTW - I can't get the restat branch to build out of the box using
> > VS2005.
> > I believe the hash_map template parameters differ between gcc an VC
> > Seehttp://www.gamedev.net/topic/405995-hash_map-header/
>
> Can you paste the error message?

The essence is
2>C:\Program Files\Microsoft Visual Studio 8\VC\include\xhash(127) :
error C2039: 'bucket_size' : is not a member of 'StatCache::KeyHash'
2> c:\src\ninja_stat\patched\src\stat_cache.h(47) : see
declaration of 'StatCache::KeyHash'

2>C:\Program Files\Microsoft Visual Studio 8\VC\include\xhash(128) :
error C2039: 'min_buckets' : is not a member of 'StatCache::KeyHash'
2> c:\src\ninja_stat\patched\src\stat_cache.h(47) : see
declaration of 'StatCache::KeyHash'


2>c:\src\ninja_stat\patched\src\stat_cache.h(51) : error C2976:
'stdext::hash_map' : too few template arguments
2> C:\Program Files\Microsoft Visual Studio 8\VC\include
\hash_map(89) : see declaration of 'stdext::hash_map'

MS uses the third (defaulted) template param for hash_map as a policy
for both the key and the comparison.

Fran.

Frances

unread,
Oct 14, 2011, 10:07:48 AM10/14/11
to ninja-build


On Sep 8, 2:17 am, Peter Collingbourne <pe...@pcc.me.uk> wrote:
<snip>

> Suppose that Ninja is invoked again immediately afterwards.  The build
> planner compares output.inc's timestamp in the build log,
<snip>

Just to clarify - I can't see the timestamp being taken from the build
log in the branch
http://github.com/pcc/ninja/tree/restat

Surely, the build log doesn't contain the actual start time, only the
milliseconds from start of the ninja run to end of that task, so you'd
need an extra line in the log file specifying the actual start time in
order to implement this.

Please let me know if I've missed something.

Fran.

Peter Collingbourne

unread,
Oct 14, 2011, 11:38:23 AM10/14/11
to ninja...@googlegroups.com

Hi Fran,

This feature was not implemented in the restat branch. The new
implementation of restat rules, which features a new internal design,
is in my outputs-ready branch, and it implements this feature (see
commit 05b8f8e158):

https://github.com/pcc/ninja/tree/outputs-ready

Thanks,
--
Peter

Dmitry

unread,
Oct 14, 2011, 5:44:12 PM10/14/11
to ninja-build

On Sep 7, 9:17 pm, Peter Collingbourne <pe...@pcc.me.uk> wrote:
>
> As an improvement over what make does, Ninja then stores the current
> timestamp in the build log entry associated with the output file.
> This timestamp will be treated by future invocations of Ninja as the
> output file's modification time instead of the output file's actual
> modification time for the purpose of deciding whether it is dirty
> (but not whether its reverse dependencies are dirty).

Just wanted to point out that this is also almost exactly what I was
suggesting doing for supporting hardlinks (both for cache support, or
just as a build rule). If a file is built by creating a new hardlink
to an existing file, there is no separate timestamp associated with
it, and storing one in ninja's log would be a great solution.

I also agree that restat is a good idea. It would also be awesome if
we could automate the check for sameness. Perhaps with a tag like
"revert_if_same = true"; if present, then ninja will md5-checksum the
file before starting the build, and after finishing, and if checksums
are the same, would retouch it with the previous timestamp, prune the
tree, and update the timestamp in the log like Peter described. I have
two immediate use cases, both related to incorporating information
about the current build environment (svn info, gcc + os version) into
the built files.

How stable is the outputs_ready branch? And what are the prospects for
getting this stuff into the mainstream ninja?

Dmitry

Peter Collingbourne

unread,
Oct 16, 2011, 8:51:47 PM10/16/11
to ninja...@googlegroups.com
On Fri, Oct 14, 2011 at 02:44:12PM -0700, Dmitry wrote:
>
> On Sep 7, 9:17�pm, Peter Collingbourne <pe...@pcc.me.uk> wrote:
> >
> > As an improvement over what make does, Ninja then stores the current
> > timestamp in the build log entry associated with the output file.
> > This timestamp will be treated by future invocations of Ninja as the
> > output file's modification time instead of the output file's actual
> > modification time for the purpose of deciding whether it is dirty
> > (but not whether its reverse dependencies are dirty).
>
> Just wanted to point out that this is also almost exactly what I was
> suggesting doing for supporting hardlinks (both for cache support, or
> just as a build rule). If a file is built by creating a new hardlink
> to an existing file, there is no separate timestamp associated with
> it, and storing one in ninja's log would be a great solution.

If we do this, we would need to think carefully about the interaction
between the restat and hardlink features to avoid introducing subtle
bugs.

> I also agree that restat is a good idea. It would also be awesome if
> we could automate the check for sameness. Perhaps with a tag like
> "revert_if_same = true"; if present, then ninja will md5-checksum the
> file before starting the build, and after finishing, and if checksums
> are the same, would retouch it with the previous timestamp, prune the
> tree, and update the timestamp in the log like Peter described. I have
> two immediate use cases, both related to incorporating information
> about the current build environment (svn info, gcc + os version) into
> the built files.

Is there any reason why you cannot use something like the cpifdiff
rule mentioned in my earlier email to do this? Personally I would be
wary of building functionality into Ninja when it can be implemented
externally.

> How stable is the outputs_ready branch? And what are the prospects for
> getting this stuff into the mainstream ninja?

I have been dogfooding the outputs-ready branch for almost a month now
for my own development using the CMake Ninja generator, and I have
also been recommending it to others who are trying the generator,
which relies on the functionality in outputs-ready. In that time I
have not encountered any bugs or received any bug reports specifically
relating to outputs-ready. The generator is now passing all tests
in the CMake test suite, so I will be almost ready to submit it to
CMake upstream once outputs-ready is merged upstream.

Unless any issues with the design are raised, I will rebase
outputs-ready on top of Evan's master branch once generator rules
have been merged (outputs-ready contains an earlier version of the
generator rule implementation which would need to be removed) and
send him a pull request.

Thanks,
--
Peter

Dmitry

unread,
Oct 17, 2011, 12:02:37 AM10/17/11
to ninja-build


On Oct 16, 8:51 pm, Peter Collingbourne <pe...@pcc.me.uk> wrote:
> On Fri, Oct 14, 2011 at 02:44:12PM -0700, Dmitry wrote:
>
> > On Sep 7, 9:17 pm, Peter Collingbourne <pe...@pcc.me.uk> wrote:
>
> > > As an improvement over what make does, Ninja then stores the current
> > > timestamp in the build log entry associated with the output file.
> > > This timestamp will be treated by future invocations of Ninja as the
> > > output file's modification time instead of the output file's actual
> > > modification time for the purpose of deciding whether it is dirty
> > > (but not whether its reverse dependencies are dirty).
>
> > Just wanted to point out that this is also almost exactly what I was
> > suggesting doing for supporting hardlinks (both for cache support, or
> > just as a build rule). If a file is built by creating a new hardlink
> > to an existing file, there is no separate timestamp associated with
> > it, and storing one in ninja's log would be a great solution.
>
> If we do this, we would need to think carefully about the interaction
> between the restat and hardlink features to avoid introducing subtle
> bugs.

True... I guess there are cases when you wouldn't want restat behavior
when creating a hard-link to a potentially old file. That's a good
reason for having the "restat = true" variable, rather than have
restat for everything. But the extra complexity (in the need to
understand these details) is unfortunate.

> > I also agree that restat is a good idea. It would also be awesome if
> > we could automate the check for sameness. Perhaps with a tag like
> > "revert_if_same = true"; if present, then ninja will md5-checksum the
> > file before starting the build, and after finishing, and if checksums
> > are the same, would retouch it with the previous timestamp, prune the
> > tree, and update the timestamp in the log like Peter described. I have
> > two immediate use cases, both related to incorporating information
> > about the current build environment (svn info, gcc + os version) into
> > the built files.
>
> Is there any reason why you cannot use something like the cpifdiff
> rule mentioned in my earlier email to do this?  Personally I would be
> wary of building functionality into Ninja when it can be implemented
> externally.

It can be. It seems a useful abstraction, but you are right, it
probably doesn't belong inside ninja itself.

Dmitry

Frances

unread,
Oct 17, 2011, 6:37:45 AM10/17/11
to ninja-build
> Peter- Hide quoted text -
>
> - Show quoted text -

Thanks Peter.
By the way, when I got this to build on Windows (still having problems
with the hash_map, which I should sort a patch out for), I get tests
failing -

[ FAILED ] BuildWithLogTest.RestatTest
[ FAILED ] MakefileParser.Basic
[ FAILED ] MakefileParser.Continuation

The MakefileParsers are trying to use an index into the temporary
strings passed to MakefileParser, so I see this
src/parsers_test.cc(499): error: Value of: parser.out_.AsString()
Actual: "\0\0\0\0d/ninja.o"
Expected: "build/ninja.o"
[ FAILED ] MakefileParser.Basic (0 ms)
...
src/parsers_test.cc(527): error: Value of: parser.out_.AsString()
Actual: "8\xFC*\x1o"
Expected: "foo.o"


I haven't got to the bottom of the restat test problem yet. It says
src/build_test.cc(666): error: Value of: builder_.AlreadyUpToDate()
Actual: false
Expected: true
src/build_test.cc(679): error: Value of: commands_ran_.size()
Actual: 1
Expected: 2u
Which is: 2


Cheers,
Fran.

Frances

unread,
Oct 18, 2011, 5:14:34 AM10/18/11
to ninja-build
> Fran.- Hide quoted text -
>
> - Show quoted text -

Hi again Peter,

I figured out what happened to BuildWithLogTest.RestatTest - sorry
about the false alarm. It is fine - one of our patches sneaked an
if(!log_file_) return;
at the top of the BuildLog::recordCommand, which caused this failure.

The other tests seem to pass on Linux, but not on Windows, but they
look like they are indexing into an anonymous temporary that has gone
out of scope, via the StringPiece. Making the strings sent to the
parer.Parse in the test stack variables instead of temporaries sorts
this.

Fran.

Frances

unread,
Oct 20, 2011, 4:25:50 AM10/20/11
to ninja-build


On Oct 17, 1:51 am, Peter Collingbourne <pe...@pcc.me.uk> wrote:
> On Fri, Oct 14, 2011 at 02:44:12PM -0700, Dmitry wrote:
>
<snip>
>
> Unless any issues with the design are raised, I will rebase
> outputs-ready on top of Evan's master branch once generator rules
> have been merged (outputs-ready contains an earlier version of the
> generator rule implementation which would need to be removed) and
> send him a pull request.
>


Hi Peter,

I've been testing this branch using Microsoft's Visual studio. It
works really well, so I wonder when you were planning on making this
into a pull request?

I had a couple of problems on windows.
1. test.cc needs #include <alogirthm> to use std::find in
VirtualFileSystem::RemoveFile

2. I also changed TEST(MakefileParser, Basic)
+ string input("build/ninja.o: ninja.cc ninja.h eval_env.h
manifest_parser.h\n");
+ EXPECT_TRUE(parser.Parse(input,
- EXPECT_TRUE(parser.Parse(
- EXPECT_TRUE(parser.Parse(
- "build/ninja.o: ninja.cc ninja.h eval_env.h manifest_parser.h\n",

The StringPiece indexes into the temporary string, which shows up on
Windows, but doesn't seem to on Linux.

TEST(MakefileParser, Continuation) has a similar problem
+ string input(
+"foo.o: \\\n"
+" bar.h baz.h\n");
+ EXPECT_TRUE(parser.Parse(input,
- EXPECT_TRUE(parser.Parse(
-"foo.o: \\\n"
-" bar.h baz.h\n",
&err));

3. The hash key has a problem, as mentioned earlier. As a quick work
round I left the microsoft version using a hash_map of a <string,
FileStat*> in stat_cahe.h
#ifdef _MSC_VER
typedef hash_map<std::string, FileStat*> Paths;
#else
typedef ExternalStringHashMap<FileStat*>::Type Paths;
#endif
and left the ExternalStringHash code in hash_map.h inside an #ifndef
_MSC_VER.

Cheers,
Fran.

Peter Collingbourne

unread,
Oct 23, 2011, 8:52:53 PM10/23/11
to ninja...@googlegroups.com
On Thu, Oct 20, 2011 at 01:25:50AM -0700, Frances wrote:
> I've been testing this branch using Microsoft's Visual studio. It
> works really well, so I wonder when you were planning on making this
> into a pull request?

It's now done: https://github.com/martine/ninja/pull/125

The first two changes look fine to me. Perhaps you can prepare a pull
request for Evan with those changes? They don't look like anything
related to my branch, so you can probably base them on Evan's master.

Thanks,
--
Peter

Frances

unread,
Nov 11, 2011, 10:05:19 AM11/11/11
to ninja-build
> Peter- Hide quoted text -
>
> - Show quoted text -

Just wondering about the dry run option. It now seems to clean all the
outputs because of the Builder::FinishEdge code check for the file's
mtime, which won't change for a dry run. I suspect a dry run should
not invoke CleanNode.

Fran.
Reply all
Reply to author
Forward
0 new messages