.d files on Windows again

Scott Graham

Feb 15, 2012, 12:56:13 AM
to ninja...@googlegroups.com
Hi,

I'd like to work on getting .d files baked on Windows. I had a brief look at the deplist branch which seems like the best candidate for this right now. deplist is not currently mergeable with master (looks like it just needs a little love in src/graph.cc in LoadDepFile/List).

I think feedback on including deplist was relatively positive, and it's an (optional) superset of the existing functionality that could also improve performance. Are there objections or outstanding issues to resolve? Or is it just a matter of testing and doing the work?

thanks, scott

Scott Graham

Feb 15, 2012, 2:00:26 AM
to ninja...@googlegroups.com
Here's the deplist branch with master merged (I got a little confused on the merge, but I think it's correct), as well as a few fixes for Windows, bitrot, etc.


Does merging this seem like an OK idea?

Evan Martin

Feb 15, 2012, 11:09:36 AM
to ninja...@googlegroups.com
On Tue, Feb 14, 2012 at 9:56 PM, Scott Graham <sco...@chromium.org> wrote:

I had hesitated because I had thought others had patches that
implemented the /showIncludes parsing, and I wanted to see how things
had worked for them before I just landed my own implementation. To be
honest, it's probably my fault that I didn't respond to the mails. I
always fear I'm losing patches that way. :(

Scott Graham

Feb 15, 2012, 1:37:24 PM
to ninja...@googlegroups.com
OK, I agree it's probably good to experiment a little. I'll try using the deplist branch for a while, and see how it goes.

The only non-superficial thing that seems worth thinking about is Petr's suggestion of combining .d files into bigger clumps. Maybe we should consider what .build syntax for that would look like, and then maybe the file format to support it.

Scott Graham

Feb 15, 2012, 2:05:24 PM
to ninja...@googlegroups.com
On Wed, Feb 15, 2012 at 10:37 AM, Scott Graham <sco...@chromium.org> wrote:
> On Wed, Feb 15, 2012 at 8:09 AM, Evan Martin <mar...@danga.com> wrote:
>> On Tue, Feb 14, 2012 at 9:56 PM, Scott Graham <sco...@chromium.org> wrote:
>> > I'd like to work on getting .d files baked on Windows. I had a brief look at
>> > the deplist branch which seems like the best candidate for this right now.
>> > deplist is not currently mergeable with master (looks like it just needs a
>> > little love in src/graph.cc in LoadDepFile/List).
>> >
>> > Feedback was relatively positive I think for including deplist, and it's an
>> > (optional) superset of functionality while potentially improving
>> > performance. Are there objections or outstanding issues to resolve? Or is it
>> > just a matter of testing and doing the work?
>>
>> I had hesitated because I had thought others had patches that
>> implemented the /ShowIncludes parsing, and I wanted to see how things
>> had worked for them before I just landed my own implementation.  To be
>> honest, it's probably my fault that I didn't respond to the mails.  I
>> always fear I'm losing patches that way.  :(
>
> OK, I agree it's probably good to experiment a little. I'll try using the
> deplist branch for a while, and see how it goes.

So, on actually trying this in a .build file (heh, novel idea!) it turns out it doesn't work so well.

Specifically, the | syntax doesn't work because commands are launched with CreateProcess, which doesn't involve a shell.

I can put cmd /c everywhere for now, but for something that's going to be used on every invocation of the compiler, it seems like avoiding that would be better. I'd prefer to switch it around and have it look something more like:

ninja-deplist-helper -f cl -o a.dep -- cl /nologo /showIncludes /c a.c ...

Does that seem too ugly?
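A minimal sketch of how a helper invoked this way could separate its own flags from the compiler command it is about to launch. Only the `-f`, `-o` and `--` conventions come from the proposal above; `HelperArgs` and `ParseHelperArgs` are hypothetical names, and argv is assumed to exclude the program name.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Parsed form of a (hypothetical) helper command line such as:
//   -f cl -o a.dep -- cl /nologo /showIncludes /c a.c
struct HelperArgs {
  std::string format;                // value of -f, e.g. "cl"
  std::string output;                // value of -o, e.g. "a.dep"
  std::vector<std::string> command;  // the compiler command after "--"
};

// Splits argv (without the program name) at the first "--". Returns
// std::nullopt for an unknown flag, a flag missing its value, a missing
// "--", or an empty command.
std::optional<HelperArgs> ParseHelperArgs(const std::vector<std::string>& argv) {
  HelperArgs args;
  size_t i = 0;
  for (; i < argv.size() && argv[i] != "--"; ++i) {
    const std::string& flag = argv[i];
    if ((flag != "-f" && flag != "-o") || i + 1 >= argv.size())
      return std::nullopt;
    std::string& target = (flag == "-f") ? args.format : args.output;
    target = argv[++i];  // consume the flag's value
  }
  if (i == argv.size()) return std::nullopt;  // no "--" separator
  args.command.assign(argv.begin() + i + 1, argv.end());
  if (args.command.empty()) return std::nullopt;
  return args;
}
```

Because the helper receives the compiler command as separate arguments, it can start cl directly, with no shell (and no `cmd /c`) in between.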

I'd also be tempted to have it suppress some of cl's noisy messages, like the echo of filename.cc, "Generating code", and so on, so that we get one-line output as on other platforms.
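A sketch of the output filtering such a helper could do, assuming cl's English-locale `Note: including file:` prefix for /showIncludes lines and assuming the echoed line is exactly the source file's base name. `ClOutput` and `FilterClOutput` are hypothetical; other noise lines (such as "Generating code") could be matched the same way.

```cpp
#include <cassert>
#include <string>
#include <vector>

// cl.exe's stdout split into dependency info and lines worth showing.
struct ClOutput {
  std::vector<std::string> includes;     // paths reported by /showIncludes
  std::vector<std::string> passthrough;  // everything else (warnings, errors)
};

// Assumes the English-locale prefix; localized compilers print a
// translated prefix, which this sketch does not handle.
const std::string kIncludePrefix = "Note: including file:";

ClOutput FilterClOutput(const std::vector<std::string>& lines,
                        const std::string& source_basename) {
  ClOutput out;
  for (const std::string& line : lines) {
    if (line.compare(0, kIncludePrefix.size(), kIncludePrefix) == 0) {
      // The path is indented with spaces that show include nesting depth.
      size_t start = line.find_first_not_of(' ', kIncludePrefix.size());
      if (start != std::string::npos)
        out.includes.push_back(line.substr(start));
    } else if (line == source_basename) {
      // cl echoes the name of the file it is compiling; drop that line.
    } else {
      out.passthrough.push_back(line);
    }
  }
  return out;
}
```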


Evan Martin

Feb 15, 2012, 8:04:04 PM
to ninja...@googlegroups.com
On Wed, Feb 15, 2012 at 11:05 AM, Scott Graham <sco...@chromium.org> wrote:
>> OK, I agree it's probably good to experiment a little. I'll try using the
>> deplist branch for a while, and see how it goes.

Some broad thoughts on this branch:

I agree that it needs experimentation. If we end up concatenating .d
files, which sounds necessary for Windows, we'll need a different file
format. Since the primary point of the new format is to make Windows
work better, it'd be better to figure that out before committing to a
path.

For example another option is to use a Real Database for storing
dependency info (e.g. the equivalent of ninja-deplist-helper could all
talk to some central file). SQLite is the obvious one, and Tony
suggested that leveldb would make sense:
http://code.google.com/p/leveldb/ .

I was hoping someone who has more experience on Windows would be able
to say "We tried X, Y, Z, and concluded that Y is the best way
forward." Perhaps that person will be you. :)

> Specifically | syntax doesn't work because CreateProcess is used to launch
> which doesn't implicate a shell.
>
> I can put cmd /c everywhere for now, but for something that's going to be
> used on every invocation of the compiler, it seems like avoiding that would
> be better.

Playing devil's advocate, is it a real problem? I had figured that,
even if Windows process startup is 50ms, 50ms is still small relative
to the time of a compile. And since this happens as part of the
compile step, it's both not in the critical path and it parallelizes.

> I'd prefer to switch it around and have it look something more
> like:
>
> ninja-deplist-helper -f cl -o a.dep -- cl /nologo /showIncludes /c a.c ...
>
> Does that seem too ugly?

That seems fine to me. I had avoided it because interprocess
communication on Windows is a mystery to me.

Another alternative is that Ninja itself could expose some endpoint
(named pipe on Windows or Unix domain socket on Linux) that the
subprocesses could stream into. Going back to the database idea, this
would allow central management of dependency data.

> I'd also be tempted to have it suppress some of cl's noisy messages too,
> like echoing filename.cc, and "Generating code" and so on, so that we can
> get one line output as on other platforms.

It's not obvious to me how much of this is Ninja's domain and how much
of it is up to the downstream project. In general I like as little
output as possible, but maybe others like a lot (I was surprised to
learn that I had to argue that Chrome's Mac build logs should be
under 50MB per build).

In all, I welcome your experimentation and insight. And I think your
gyp-related changes for Chrome will be mostly independent of all of
this anyway so those are not wasted time.

Petr Wolf

Feb 16, 2012, 5:27:36 AM
to ninja-build
Hi all,

a couple of points:

* we have a working prototype of .d file concatenation. It needs some
updates, though. It will be published as a pull request and opened for
comments next week or the week after.

* we also have a patch to ninja which adds support for reading
compressed files (using zlib). This will also be published as a
proposal and open for comments soon. Stay tuned.

* some numbers about win32 performance using these two were discussed
in the "Building ninja with MSVC" thread. See
http://groups.google.com/group/ninja-build/browse_thread/thread/3742dcda61914707

* creating the depfiles for cl.exe is a completely different problem.
For that we've used a wrapper around cl.exe which adds /showIncludes to
its command line and parses its output, producing a gcc-like depfile.
That is not part of ninja at all.

* the same goes for the tool that does the actual concatenation and/or
compression of the .d files. That's a separate tool, outside ninja.
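The last step of a wrapper like the one Petr describes, turning collected include paths into a gcc-like depfile, might look roughly like this. `MakeDepfile` and `EscapeDepPath` are hypothetical names, and rewriting backslashes as forward slashes is a simplifying assumption made here, not something stated in the thread.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Escapes one path for a Makefile-style depfile. Assumption: backslashes
// are rewritten as '/', and spaces are escaped with a backslash.
std::string EscapeDepPath(const std::string& path) {
  std::string out;
  for (char c : path) {
    if (c == '\\') out += '/';
    else if (c == ' ') out += "\\ ";
    else out += c;
  }
  return out;
}

// Produces "target: source dep1 dep2 ..." (one dep per continuation line),
// skipping repeated includes while keeping first-seen order.
std::string MakeDepfile(const std::string& target, const std::string& source,
                        const std::vector<std::string>& includes) {
  std::string out = EscapeDepPath(target) + ": " + EscapeDepPath(source);
  std::set<std::string> seen;
  for (const std::string& inc : includes) {
    if (seen.insert(inc).second) out += " \\\n  " + EscapeDepPath(inc);
  }
  out += "\n";
  return out;
}
```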

Petr

On Feb 16, 2:04 am, Evan Martin <mart...@danga.com> wrote:

Scott Graham

Feb 16, 2012, 12:10:37 PM
to ninja...@googlegroups.com
Overall, sounds good to me. Just a few random comments inline.

On Wed, Feb 15, 2012 at 5:04 PM, Evan Martin <mar...@danga.com> wrote:

> For example another option is to use a Real Database for storing
> dependency info (e.g. the equivalent of ninja-deplist-helper could all
> talk to some central file).  SQLite is the obvious one, and Tony
> suggested that leveldb would make sense:
> http://code.google.com/p/leveldb/ .

Thinking about this a bit this morning, I think my current preferred approach is just to be dumb (or maybe I'm just Real Database-averse).

c:\src\chromium\src>dir *.cc /s/b | wc -l
10599

Guessing 200 includes per .cc, even if stored as fixed width records of _MAX_PATH:

c:\src\chromium\src>python -c print(10599*200*256)
542668800

So, for (probably less than) 500MB, a simple daemon using ReadDirectoryChangesW or the USN change journal could keep _stat info always up to date and available for incremental builds. For me, 500MB for instabuild would be a good tradeoff, but some people might find that ugly, or too heavy for some hardware or projects, I guess.

Seems simple enough though, that going Real DB isn't really required unless there's other benefits or other uses for it.

Pushing that a bit further, it could even push the mtimes through the edges of the build graph as it updates them, so there'd be no "thinking" to do at all when ninja is invoked: just look up the list of command lines that need to run to bring a particular target up to date.

It's sort of cheating of course, but hey, this is Windows. Cheating is the only way to win. Of course, it requires some sort of IPC or synchronization and adds to the overall complexity, so it's certainly not free.

 
>> I can put cmd /c everywhere for now, but for something that's going to be
>> used on every invocation of the compiler, it seems like avoiding that would
>> be better.
>
> Playing devil's advocate, is it a real problem?  I had figured that,
> even if Windows process startup is 50ms, 50ms is still small relative
> to the time of a compile.  And since this happens as part of the
> compile step, it's both not in the critical path and it parallelizes.

You're right, it's probably not worth worrying about. The only other reason I try to avoid cmd is that its quoting behaviour sucks (see "multiple commands" here http://www.microsoft.com/resources/documentation/windows/xp/all/proddocs/en-us/cmd.mspx?mfr=true). For example, as soon as /D"blah" gets added to the command line, someone's previously working cmd /c line might break unless they know the intricacies of what's going on. The set of people writing .build files is probably small and knowledgeable enough that it's not a big deal, though.

And, all that said, it'll be quite a while before I have a big project (i.e. Chromium) working well enough to experiment with improving the speed of dependency loading, so this is probably all premature for the time being anyway.

scott

Scott Graham

Feb 16, 2012, 12:13:59 PM
to ninja...@googlegroups.com
That sounds great Petr, looking forward to seeing your patches.

While I agree that keeping .d, /showIncludes, etc. as separate tools is nicer for a lot of reasons, if we're able to improve speed by conflating the concerns, it might be worth considering.

Scott Graham

Mar 1, 2012, 5:17:44 PM
to ninja...@googlegroups.com
On Wed, Feb 15, 2012 at 5:04 PM, Evan Martin <mar...@danga.com> wrote:

>> Specifically | syntax doesn't work because CreateProcess is used to launch
>> which doesn't implicate a shell.
>>
>> I can put cmd /c everywhere for now, but for something that's going to be
>> used on every invocation of the compiler, it seems like avoiding that would
>> be better.
>
> Playing devil's advocate, is it a real problem?  I had figured that,
> even if Windows process startup is 50ms, 50ms is still small relative
> to the time of a compile.  And since this happens as part of the
> compile step, it's both not in the critical path and it parallelizes.
>
>> I'd prefer to switch it around and have it look something more
>> like:
>>
>> ninja-deplist-helper -f cl -o a.dep -- cl /nologo /showIncludes /c a.c ...
>>
>> Does that seem too ugly?
>
> That seems fine to me.  I had avoided it because interprocess
> communication on Windows is a mystery to me.


I didn't realize this until just now, but it turns out there's another problem with using "cl ... | ninja-deplist-helper ...". Annoyingly enough, that always returns the exit code of ninja-deplist-helper, rather than cl, even if cl fails.
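Having the helper launch the compiler itself, as proposed earlier, avoids this, because the helper can return the child's status after consuming its output. A minimal POSIX sketch using `popen`/`pclose` (which themselves go through `/bin/sh`); a Windows version would presumably use `_popen`/`_pclose` or `CreateProcess`, which is not shown or tested here. `RunAndCapture` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <sys/wait.h>

// Runs `command`, appends everything it writes to stdout to `output`, and
// returns the command's own exit code (not the helper's). Returns -1 if the
// command could not be started or did not exit normally.
int RunAndCapture(const std::string& command, std::string* output) {
  FILE* pipe = popen(command.c_str(), "r");
  if (!pipe) return -1;
  char buf[4096];
  size_t n;
  while ((n = fread(buf, 1, sizeof(buf), pipe)) > 0) output->append(buf, n);
  int status = pclose(pipe);
  if (status == -1 || !WIFEXITED(status)) return -1;
  return WEXITSTATUS(status);
}
```

The helper's `main` would then parse the captured text, write the depfile, and return this code, so a failing compile still fails the build edge.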

Nico Weber

Mar 1, 2012, 5:19:24 PM
to ninja...@googlegroups.com

In bash, you could do `cl ... > >(ninja-deplist-helper ...)` to always
get the exit code of cl (ninja-deplist-helper probably doesn't fail?
:-) ). Maybe cmd has something similar?

Nico

Scott Graham

Mar 2, 2012, 3:57:43 PM
to ninja...@googlegroups.com
I got the main Chromium .dll to build on Windows using ninja. That's not the whole project, but it's the primary & largest target, so should be pretty representative.

It looks like the times are similar to previously reported times for large projects. Deps file loading completely dominates when cold; deps and stat take similar shares of the time when hot. This is using the "deplist" branch, which makes each depfile a simple binary file rather than a text file requiring parsing, but still uses one dep file per obj file.

One thing I've noticed is that, compared to a Linux machine with identical specs, it seems easier to "lose" your hot cache state and get worse performance on a subsequent incremental build. So it may be worth second-guessing the system cache with a customized one. Ideally, 6s is also a little longer than I'd like startup to take.

Data below is from "ninja -d stats chrome.dll". (The hash load looks odd, but I'm not clear on what it's really reporting, and it doesn't seem to be very relevant to the profile anyway.)

COLD:

metric                  count   total (ms)      avg (us)
.ninja parse            633     912.3           1441.3
canonicalize str        106245  0.4             0.0
canonicalize path       106245  0.3             0.0
lookup node             2867662 28.0            0.0
node stat               36863   9195.2          249.4
deplist load            8245    202746.9        24590.3

path->node hash load 11447.25 (45789 entries / 4 buckets)


HOT:

metric                  count   total (ms)      avg (us)
.ninja parse            633     906.6           1432.2
canonicalize str        106245  0.4             0.0
canonicalize path       106245  0.3             0.0
lookup node             2867580 12.9            0.0
node stat               36863   3009.5          81.6
deplist load            8245    2970.1          360.2

path->node hash load 11447.25 (45789 entries / 4 buckets)
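The "hash load" line is simply entries divided by buckets (45789 / 4 = 11447.25). For comparison, a standard `std::unordered_map` rehashes automatically so that its load factor stays at or below `max_load_factor()`, which defaults to 1.0, so a table that is growing normally would not show a load in the thousands. Why the stats report 4 buckets is not established in this thread; this sketch (hypothetical `HashLoad` and `LoadAfterInserting` helpers) only illustrates the arithmetic and the standard-library invariant.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// "Hash load" as printed in the stats above: entries divided by buckets.
double HashLoad(size_t entries, size_t buckets) {
  return buckets == 0 ? 0.0 : static_cast<double>(entries) / buckets;
}

// Inserts `n` distinct path-like keys into a standard unordered_map and
// returns its resulting load. The map rehashes as it grows so that
// load_factor() never exceeds max_load_factor() (1.0 by default).
double LoadAfterInserting(size_t n) {
  std::unordered_map<std::string, int> map;
  for (size_t i = 0; i < n; ++i) map["path/" + std::to_string(i)] = 0;
  assert(map.load_factor() <= map.max_load_factor());
  return HashLoad(map.size(), map.bucket_count());
}
```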


scott

On Thu, Feb 16, 2012 at 2:27 AM, Petr Wolf <petr...@gmail.com> wrote:

Scott Graham

Mar 3, 2012, 3:33:19 PM
to ninja...@googlegroups.com
More on windows deps files and loading times for those interested. All tests/times are on Chromium's main DLL.

I wrote a dependency database to hold the equivalent of .d files. ninja and the deps-parsing helper both access it via shared memory: https://github.com/sgraham/ninja/blob/deplist/src/dep_database.cc.

This cut the hot-cache deps load time from just over 3000ms to 2400ms (and it's much, much better with a cold cache). With that change, only about 60ms of the 2400ms is spent getting the raw deps off disk and parsing them into StringPiece vectors.
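The actual layout lives in dep_database.cc (linked above) and is not reproduced here. As a rough illustration of why such a database can be cheap to read, this sketch uses a hypothetical length-prefixed record format in one contiguous buffer (the kind of buffer that could live in shared memory) and reads each record back as `std::string_view`s pointing into it, standing in for ninja's StringPiece, with no per-path copies. `AppendRecord` and `ReadRecord` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical record layout, one per output:
//   uint32 path_count, then path_count x (uint32 length, bytes)
// The first path is the output; the rest are its dependencies. Integers are
// stored in native byte order (writer and reader on the same machine).
void AppendRecord(std::vector<char>* blob, const std::vector<std::string>& paths) {
  auto put_u32 = [blob](uint32_t v) {
    const char* p = reinterpret_cast<const char*>(&v);
    blob->insert(blob->end(), p, p + sizeof(v));
  };
  put_u32(static_cast<uint32_t>(paths.size()));
  for (const std::string& s : paths) {
    put_u32(static_cast<uint32_t>(s.size()));
    blob->insert(blob->end(), s.begin(), s.end());
  }
}

// Reads the record at *offset as views into `blob` (no string copies) and
// advances *offset past it. Bounds checks are omitted for brevity.
std::vector<std::string_view> ReadRecord(const std::vector<char>& blob, size_t* offset) {
  auto get_u32 = [&]() {
    uint32_t v;
    std::memcpy(&v, blob.data() + *offset, sizeof(v));
    *offset += sizeof(v);
    return v;
  };
  uint32_t count = get_u32();
  std::vector<std::string_view> paths;
  paths.reserve(count);
  for (uint32_t i = 0; i < count; ++i) {
    uint32_t len = get_u32();
    paths.emplace_back(blob.data() + *offset, len);
    *offset += len;
  }
  return paths;
}
```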

Of the remaining 2.4s, 2.3s is (was) here:

  {
    METRIC_RECORD("depdb add in-edges");  // times this enclosing scope
    // Add all its in-edges.
    for (vector<StringPiece>::iterator i = deps.begin();
         i != deps.end(); ++i, ++implicit_dep) {
      *implicit_dep = GetDepNode(state, *i);
    }
  }

I had been building ninja with VS 2008. I just tried VS 2010 instead, and that improved that block from 2.3s to 1.6s, a nice easy improvement I should have thought of before.

The deplist branch also didn't have the improved (and maybe fixed?) version of adding in-edges that uses StringPiece directly rather than converting to string. Merging that brought it down further to 1.2s.
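The idea behind looking nodes up by StringPiece can be sketched with `std::string_view` standing in for StringPiece: the map's keys are views into strings owned by the nodes themselves, so a lookup for an existing path allocates nothing and only a miss copies the characters. `State`, `Node` and `GetNode` here are hypothetical simplifications, not ninja's actual classes.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <string_view>
#include <unordered_map>
#include <utility>
#include <vector>

struct Node {
  explicit Node(std::string p) : path(std::move(p)) {}
  std::string path;
};

// Hypothetical stand-in for ninja's State. Keys in `paths` view the owning
// Node's `path`; nodes are heap-allocated so those views stay valid.
struct State {
  std::vector<std::unique_ptr<Node>> nodes;
  std::unordered_map<std::string_view, Node*> paths;
};

// Returns the node for `path`, creating it only on a miss. A hit does no
// allocation; only a miss copies the characters into a std::string.
Node* GetNode(State* state, std::string_view path) {
  auto it = state->paths.find(path);
  if (it != state->paths.end()) return it->second;
  state->nodes.push_back(std::make_unique<Node>(std::string(path)));
  Node* node = state->nodes.back().get();
  state->paths.emplace(node->path, node);  // key views node->path's buffer
  return node;
}
```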

Removing a few METRIC_RECORDs that I'd put into heavily called functions cut another 150ms (to ~1s).
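Part of why timers in hot functions are costly: a scoped metric reads the clock on entry and again on exit, and with the roughly 2.9 million "lookup node" calls in the stats above, even a small fixed cost per call adds up. A minimal RAII sketch of the idea, with hypothetical `Metric` and `ScopedMetric` types built on `std::chrono::steady_clock` (not ninja's actual metrics code):

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <string>

// Accumulated timing for one named code region.
struct Metric {
  std::string name;
  int count = 0;
  int64_t total_us = 0;
};

// Reads the clock when constructed and again when destroyed, adding the
// elapsed time to `metric`. Those two clock reads are the per-call overhead.
class ScopedMetric {
 public:
  explicit ScopedMetric(Metric* metric)
      : metric_(metric), start_(std::chrono::steady_clock::now()) {}
  ~ScopedMetric() {
    auto elapsed = std::chrono::steady_clock::now() - start_;
    metric_->count++;
    metric_->total_us +=
        std::chrono::duration_cast<std::chrono::microseconds>(elapsed).count();
  }
  ScopedMetric(const ScopedMetric&) = delete;
  ScopedMetric& operator=(const ScopedMetric&) = delete;

 private:
  Metric* metric_;
  std::chrono::steady_clock::time_point start_;
};
```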

The Linux version is only 880ms though, and that's loading loose .d files! Sigh.

Further investigations to do:
- The Linux build uses "gcc -MMD", whereas "cl /showIncludes" is analogous to "gcc -MD". That certainly makes for extra _stat'ing, but perhaps the extra hash lookups here are significant too. I'm going to look at this next, as stat'ing is now the biggest contributor to startup time by about 4x anyway.
- Perhaps try a memory allocator other than the default VS one
- Possibly a different hash_map or work on the compare func (e.g. https://github.com/paulhodge/EASTL/blob/master/include/EASTL/hash_map.h). Not sure if that'd be worth the effort though.
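For the first item, one way to approximate `-MMD` on top of /showIncludes is to drop any include that lives under a known system directory. cl does not mark system headers in its output, so the directory list is an assumption the wrapper would have to be given (for example, derived from the INCLUDE environment variable). `FoldPath` and `DropSystemHeaders` are hypothetical names, and the case folding only handles ASCII.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Lower-cases ASCII letters and turns '/' into '\' so that prefix checks
// ignore case and slash direction.
std::string FoldPath(std::string p) {
  for (char& c : p) {
    if (c == '/') c = '\\';
    else c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
  }
  return p;
}

// Keeps only includes that are not under any of `system_dirs`, roughly
// what gcc -MMD does by omitting system headers.
std::vector<std::string> DropSystemHeaders(
    const std::vector<std::string>& includes,
    const std::vector<std::string>& system_dirs) {
  std::vector<std::string> folded_dirs;
  for (const std::string& d : system_dirs) {
    std::string f = FoldPath(d);
    if (!f.empty() && f.back() != '\\') f += '\\';  // match whole components
    folded_dirs.push_back(f);
  }
  std::vector<std::string> kept;
  for (const std::string& inc : includes) {
    std::string f = FoldPath(inc);
    bool is_system = std::any_of(
        folded_dirs.begin(), folded_dirs.end(),
        [&](const std::string& d) { return f.compare(0, d.size(), d) == 0; });
    if (!is_system) kept.push_back(inc);
  }
  return kept;
}
```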

scott

Simon B.

Mar 4, 2012, 1:51:19 PM
to ninja-build
Nice work!
One way to avoid stat() would be for ninja to ask an external process (an
oracle) whether a file has changed.

It could be a small code change in ninja:
- Ask the oracle before each stat call:
  1) if the oracle says yes/no, you're done
  2) on "maybe" (oracle broken or whatever), just go ahead and stat()

This would open the door to different types of oracles.
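The yes/no/maybe protocol above could look roughly like this on ninja's side. `OracleAnswer`, `Oracle`, `StatFn` and `IsDirty` are hypothetical, and comparing against a single recorded mtime is a simplification of how ninja actually decides dirtiness.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>

// Answer from a hypothetical change oracle about one file.
enum class OracleAnswer { kChanged, kUnchanged, kMaybe };

using Oracle = std::function<OracleAnswer(const std::string& path)>;
using StatFn = std::function<int64_t(const std::string& path)>;  // -> mtime

// Asks the oracle first; only on "maybe" (or when no oracle is configured)
// does it fall back to a real stat and compare against the recorded mtime.
bool IsDirty(const std::string& path, int64_t recorded_mtime,
             const Oracle& oracle, const StatFn& stat_fn) {
  switch (oracle ? oracle(path) : OracleAnswer::kMaybe) {
    case OracleAnswer::kChanged:   return true;
    case OracleAnswer::kUnchanged: return false;
    case OracleAnswer::kMaybe:     break;
  }
  return stat_fn(path) != recorded_mtime;
}
```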

On Windows it is cheap to keep track of files as they change and store
them in a hash table. I have C# code somewhere doing that, though with a
separate GUI thread. (It used to crash a lot, but Microsoft seems to have
patched something, since it has now worked flawlessly for several
months. Also, beware that a single File-Save in your editor will
generate several events.)

The oracle would need to know when ninja completes its build target,
so as to know when to clean the hash table. To avoid race conditions
(start ninja, edit a file, start ninja again) the oracle would only
clean away entries that can be presumed to have been compiled already.
Oracle event loop:
* On file change: add to hashtable and/or clear stale-bit for the
entry
* On interrogation: mark any found entries as stale
* On ninja all done signal: clear away stale entries.
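The three rules above can be sketched as a small table keyed by path, using a hypothetical `ChangeTable` class. Because a fresh change clears the stale bit, a file edited while ninja is running survives the "all done" cleanup and is reported again on the next build.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

class ChangeTable {
 public:
  // On file change: add the entry, or clear its stale bit if present.
  void OnFileChanged(const std::string& path) { stale_[path] = false; }

  // On interrogation: report whether `path` has a pending change, and mark
  // a found entry as stale (answered, but not yet known to be built).
  bool Query(const std::string& path) {
    auto it = stale_.find(path);
    if (it == stale_.end()) return false;
    it->second = true;
    return true;
  }

  // On ninja's "all done" signal: drop only the stale entries.
  void OnBuildDone() {
    for (auto it = stale_.begin(); it != stale_.end();) {
      if (it->second) it = stale_.erase(it);
      else ++it;
    }
  }

  size_t size() const { return stale_.size(); }

 private:
  std::unordered_map<std::string, bool> stale_;  // path -> stale bit
};
```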

(The older idea to keep ninja running continuously is probably harder
and may make ninja slower for single-hit usage.)

On Mar 3, 9:33 pm, Scott Graham <scot...@chromium.org> wrote:
[...]
> Removing a few METRIC_RECORDs that I'd put into heavily called functions
> cut another 150ms. (-> ~1s).
>
> The Linux version is only 880ms though, and that's loading loose .d files!
[...]
> - Perhaps try a memory allocator other than the default VS one
> - Possibly a different hash_map or work on the compare func

Nico Weber

Mar 4, 2012, 4:46:31 PM
to ninja...@googlegroups.com
Hi Scott,

On Sat, Mar 3, 2012 at 12:33 PM, Scott Graham <sco...@chromium.org> wrote:
> More on windows deps files and loading times for those interested. All
> tests/times are on Chromium's main DLL.
>
> I wrote a dependency database to hold the equivalent of .d files. ninja and
> deps parser helper use it via shared
> memory: https://github.com/sgraham/ninja/blob/deplist/src/dep_database.cc.
>
> This cut the hot cache deps load time from just over 3000ms to 2400ms (and
> way way better for cold cache). With that change, only about 60ms of 2400ms
> is getting the raw deps off of disk and parsing them into StringPiece
> vectors.
>
> Of the remaining 2.4s, 2.3s is (was) here:
>
>   { METRIC_RECORD("depdb add in-edges");
>   // Add all its in-edges.
>   for (vector<StringPiece>::iterator i = deps.begin();
>        i != deps.end(); ++i, ++implicit_dep) {
>     *implicit_dep = GetDepNode(state, *i);
>   }
>   }
>
> I had been building ninja with VS 2008. I just tried with VS2010 instead,
> and that improved that block from 2.3 to 1.6s which is a nice easy
> improvement I should have thought of before.
>
> The deplist branch also didn't have the improved (and maybe fixed?) version
> of adding in-edges that uses StringPiece directly rather than converting to
> string. Merging that improved performance further down to 1.2s.
>
> Removing a few METRIC_RECORDs that I'd put into heavily called functions cut
> another 150ms. (-> ~1s).

the way I read this is that switching from msvc2008 to 2010, rebasing
the deplist branch on top of trunk, and removing a few METRIC_RECORDs
brings warm cache perf from 3s to 1.6s already. Did I get that right?

Thanks,
Nico

Scott Graham

Mar 4, 2012, 5:45:45 PM
to ninja...@googlegroups.com
Yup, probably pretty close, assuming the other things that have changed don't impact the performance of the rest.

Scott Graham

Mar 4, 2012, 6:00:56 PM
to ninja...@googlegroups.com
On Sun, Mar 4, 2012 at 10:51 AM, Simon B. <simon....@gmail.com> wrote:
> Nice work!
> One way to avoid stat() is if ninja could ask an external process (an
> oracle) whether a file has changed.


Thanks for the suggestion. This was my original plan too, but it does have some complexity. I looked into using the ChangeJournal to keep an external source of mtimes up to date.

The problem with this is (I believe) the restats. Because both ReadDirectoryChangesW and the ChangeJournal are asynchronous and may lag slightly behind stat'ing the file directly, an update might not arrive until after ninja expects the new time to be visible.

So, I think the cache is only useful for initial mtimes. It's not terrible though since the initial set of stat'ing is really the most annoying as it's on the critical path for startup (vs. the restats which are distributed and done in parallel with compiles and so on).

I tried reading the MFT timestamps using FindFirstFile/FindNextFile (which doesn't require touching each individual file, because the directory listing contains the last write time). This is much faster: it took about 500ms to get all the timestamps for my entire source tree, vs. 4000ms to stat the individual files involved in the build (even though that's a lot fewer files).

It's slightly complex to get the right set of directories to walk, and it requires that all the file paths are "strongly" normalized (drive letter case, path case, slash direction, relative-ness, normalizing /showIncludes output, etc.); otherwise the directory-walking code will have a slightly different path than the one in the Node. So I don't have a fully working prototype or timings for this approach yet, because I need to clean up our generator and the /showIncludes parser so that the paths they both output are "cleaner".
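A sketch of the kind of "strong" normalization described above, so that the generator's paths, /showIncludes paths, and paths built during a directory walk compare equal. `NormalizeWinPath` is hypothetical; it handles only paths of the form `C:\...` or paths relative to `cwd` (not drive-relative, rooted-without-drive, or UNC paths), and it lowercases ASCII only, which does not exactly match NTFS case rules.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Makes `path` absolute against `cwd`, uses '\' separators, lower-cases
// ASCII, and resolves "." and ".." components (never popping the drive).
std::string NormalizeWinPath(const std::string& path, const std::string& cwd) {
  std::string p = path;
  bool absolute = p.size() >= 2 && p[1] == ':';
  if (!absolute) p = cwd + "\\" + p;
  std::vector<std::string> parts;  // parts[0] is the drive, e.g. "c:"
  std::string cur;
  auto flush = [&]() {
    if (cur == "..") {
      if (parts.size() > 1) parts.pop_back();
    } else if (!cur.empty() && cur != ".") {
      parts.push_back(cur);
    }
    cur.clear();
  };
  for (char c : p) {
    if (c == '/' || c == '\\') flush();
    else cur += static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
  }
  flush();
  std::string out;
  for (size_t i = 0; i < parts.size(); ++i) {
    if (i) out += '\\';
    out += parts[i];
  }
  return out;
}
```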

scott

Philip Craig

Mar 5, 2012, 4:40:10 AM
to ninja...@googlegroups.com
Would this Oracle be a persistent process (daemon), hooked up to monitor file-system changes, using inotify or similar?

Evan Martin

Mar 5, 2012, 11:12:05 AM
to ninja...@googlegroups.com
On Sun, Mar 4, 2012 at 10:51 AM, Simon B. <simon....@gmail.com> wrote:
> Nice work!
> One way to avoid stat() is if ninja could ask an external process (an
> oracle) whether a file has changed.

For what it's worth, the dependency graph in Ninja has arrows going
"both ways": both from inputs to outputs and from outputs to inputs.
This was because I originally thought I would need to use an
oracle process like this.

Currently, when you ask Ninja to build an output, we follow the arrows
back to the inputs, stat'ing everything along the way.
But conceptually (and older code even did this) once you recognized
some input was dirty you could recursively mark every dependent output
as also dirty.

pcc even added logic similar to this to handle the "restat" case,
where we think an output is dirty, but once we run the dependent
command we discover it didn't change the output, which requires us to
do the same "forward" (from input->output) traversal of the dependency
graph marking files as clean.

All of this is a long way of saying that it shouldn't be *too* hard,
if you had a process (or even just subcomponent of ninja) that fed to
ninja "hey, this file has changed" events, for ninja to dynamically
track exactly what needs to be built at all times. (That was in fact
how I had originally envisioned it would work.)
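A minimal sketch of that forward traversal: given a "this file has changed" event, walk the input-to-output arrows and mark every reachable output dirty, without any stat() calls. `Graph` and its members are hypothetical simplifications, not ninja's actual State/Edge types.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

struct Graph {
  // Arrows from each file to the outputs built from it (input -> output).
  std::unordered_map<std::string, std::vector<std::string>> dependents;
  std::unordered_set<std::string> dirty;

  void AddEdge(const std::string& input, const std::string& output) {
    dependents[input].push_back(output);
  }

  // Marks every output reachable from `path` as dirty, visiting each once.
  void OnFileChanged(const std::string& path) {
    std::vector<std::string> stack{path};
    while (!stack.empty()) {
      std::string cur = stack.back();
      stack.pop_back();
      auto it = dependents.find(cur);
      if (it == dependents.end()) continue;
      for (const std::string& out : it->second) {
        if (dirty.insert(out).second) stack.push_back(out);
      }
    }
  }
};
```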

Thiago Farina

Mar 5, 2012, 7:22:29 AM
to ninja...@googlegroups.com
On Mon, Mar 5, 2012 at 6:40 AM, Philip Craig <phi...@pobox.com> wrote:
> Would this Oracle be a persistent process (daemon), hooked up to monitor
> file-system changes, using inotify or similar?
>
Doesn't tup do this?

--
Thiago

Nicolas Desprès

Mar 5, 2012, 11:19:00 AM
to ninja...@googlegroups.com
AFAIK, yes. It has a daemon that keeps track of the dependency graph in an SQLite DB, and the daemon can be notified in several ways:
- inotify
- IDE
- LD_PRELOAD on open
- and conceptually any other reliable way.

--
Nicolas Desprès

Elazar Leibovich

Mar 5, 2012, 11:33:35 AM
to ninja...@googlegroups.com
2012/3/5 Nicolas Desprès <nicolas...@gmail.com>

> On Mon, Mar 5, 2012 at 1:22 PM, Thiago Farina <tfa...@chromium.org> wrote:
>> On Mon, Mar 5, 2012 at 6:40 AM, Philip Craig <phi...@pobox.com> wrote:
>> > Would this Oracle be a persistent process (daemon), hooked up to monitor
>> > file-system changes, using inotify or similar?
>> >
>> Doesn't tup do this?
>
> AFAIK yes. It has a daemon that keep track of the dependency graph in a sqlite DB and these daemon can be notify by several way
 
I don't think it has a daemon running all the time. The dependencies are saved in the SQLite DB and are later recalculated when running tup upd, or when running tup monitor.

Petr Wolf

Mar 19, 2012, 11:45:52 AM
to ninja...@googlegroups.com
Hi all,
 
I've just published a pull request with the support for concatenated depfiles.
 
Please see https://github.com/martine/ninja/pull/258 and comment on it, if you're interested
 
Regards
Petr

Bill Hoffman

Apr 6, 2012, 1:40:05 PM
to ninja...@googlegroups.com
On 3/19/2012 11:45 AM, Petr Wolf wrote:
> Hi all,
> I've just published a pull request with the support for concatenated
> depfiles.
> Please see https://github.com/martine/ninja/pull/258 and comment on it,
> if you're interested
> Regards
> Petr

Any chance of this getting merged? ninja is sort of useless on Windows
without dependency information... :)

Thanks.

-Bill

Philip Craig

Apr 11, 2012, 6:15:35 AM
to ninja...@googlegroups.com
Yes please to merging this.