My approach to porting Ninja to Windows and other compilers (like cl.exe)
would be to use CMake. We could bootstrap it using one of the
existing generators and then use the upcoming Ninja generator to test
Ninja by building it with itself.
Compiling Ninja with cl.exe would help to port it to Windows since it
is much easier to debug a Windows program using Visual Studio than
using the mingw toolchain.
Cheers,
--
Nicolas Desprès
It's always easiest for me to pull a branch that is a series of
commits so I can cherry-pick the good ones.
If it's easier for you, I'm happy to do that work.
Your commits currently don't have an author on them:
Author: unknown <Fran@Fran-PC.(none)>
I'd like to give you proper attribution, in the form of
Firstname Lastname <email@address>
> bootstrap.py doesn't work as it stands, if you want to invoke cl
> instead of g++.
Yes, I see. A separate file is ok for now, but I think this code will
probably rot over time.
Some other ideas:
- add more logic to bootstrap.py to make it work with cl.exe
- require users to download a ninja.exe for the first-time bootstrap
- require some additional build system like cmake
Of those, perhaps the second one isn't so bad... ?
I really do think that the easiest way to solve this issue is to use
cmake. It already knows all the options to use with the given
compiler. Although the Ninja generator is not yet merged into CMake's
next branch, we can still start porting Ninja using a Makefile-based
build system.
Cheers,
--
Nicolas Desprès
I think the most recent change I made was on the order of 5%
improvement in CPU-bound code.
That number is still 2-3x what I'd like it to be.
I will profile it soon!
On Tue, Jan 3, 2012 at 5:12 PM, Evan Martin <mar...@danga.com> wrote:

29% is in RealDiskInterface::Stat, so switching to FindFirst/Next to make the stat'ing closer to O(#dirs) rather than O(#files) might be productive. The bigger hammer would be using the USN Journal to get to O(#changes-since-last-run), but that's a bit more work.

I didn't confirm... when it errors out with the "sed.sh missing..." has it stat'd everything? If it's only done some of the work then stat may completely dominate the runtime.
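To illustrate the O(#dirs) idea, here is a minimal Python sketch, not Ninja's actual code. It lists each directory once and answers later timestamp queries from memory. The class name `DirStatCache` and its `dirs_listed` counter are hypothetical. The sketch relies on `os.scandir`, which on Windows is built on the FindFirstFile/FindNextFile family and can usually return file times without an extra system call per file. Name matching here is exact, which is a simplification for a case-insensitive file system.

```python
import os


class DirStatCache:
    """Answers mtime queries by listing each directory at most once."""

    def __init__(self):
        self._dirs = {}        # directory path -> {file name: mtime}
        self.dirs_listed = 0   # how many directory listings were performed

    def _load_dir(self, dirname):
        entries = {}
        try:
            with os.scandir(dirname) as it:
                for entry in it:
                    try:
                        entries[entry.name] = entry.stat().st_mtime
                    except OSError:
                        continue  # e.g. a broken symlink: treat as missing
        except OSError:
            pass  # missing directory: every file in it reports as missing
        self.dirs_listed += 1
        return entries

    def mtime(self, path):
        """Return the mtime of path, or None if the file does not exist."""
        dirname, name = os.path.split(os.path.abspath(path))
        if dirname not in self._dirs:
            self._dirs[dirname] = self._load_dir(dirname)
        return self._dirs[dirname].get(name)
```

With this shape, stat'ing a thousand files in one directory costs one listing rather than a thousand separate calls. The trade-off is that a directory with many files unrelated to the build is still enumerated in full.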
Hi all,
the experience from our Win32 project (roughly the size of Chromium ~ 600MB of depfiles) led to the following improvements.
During the build
1] group the .d files into larger .D files (simple concatenation), per "component". There are about 300 components in the project
2] compress the .D files into .D.gz
On ninja startup
1] decompress the .D.gz files (if the depfiles attribute points to such)
2] read the individual .d files from it (in memory)
This has delivered roughly a 5-10x speedup of ninja startup. Still, it takes tens of seconds or even minutes to start (before the first action is fired), depending on the PC and the state of the cache.
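The scheme above can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the code used in that project: the function names `bundle_depfiles` and `load_bundle` are hypothetical, the .d files are assumed to be in Makefile style ("output: dep dep ..." with backslash line continuations), and paths containing spaces are not handled.

```python
import gzip


def bundle_depfiles(depfile_paths, bundle_path):
    """Concatenate individual .d files into one gzip-compressed bundle."""
    with gzip.open(bundle_path, "wt", encoding="utf-8") as out:
        for path in depfile_paths:
            with open(path, encoding="utf-8") as f:
                text = f.read()
            # Make sure one file's last rule does not run into the next.
            out.write(text if text.endswith("\n") else text + "\n")


def load_bundle(bundle_path):
    """Decompress a bundle in memory and return {output: [dependencies]}."""
    with gzip.open(bundle_path, "rt", encoding="utf-8") as f:
        text = f.read()
    # Join backslash-continued lines so each rule sits on a single line.
    text = text.replace("\\\n", " ")
    deps = {}
    for line in text.splitlines():
        # Split on colon+space so a drive letter like "C:\" is not a separator.
        target, sep, rest = line.partition(": ")
        if sep:
            deps.setdefault(target.strip(), []).extend(rest.split())
    return deps
```

The point of the design is that startup opens one file per component instead of one per object file, which matters most on a cold disk where each open is expensive.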
See also two "hotspot" analyses from VTune, using "pure" ninja and individual .d files (i.e. none of the above-listed improvements). One is on a cold disk, the other on a hot one (after running the same thing a couple of times). Clearly, the reading of depfiles (tsopen_nolock) dominates on a cold disk.
Note that VTune works via sampling in this mode, checking the currently active function every 10ms. Very fast methods might therefore escape the metric.
Thanks for experimenting with this and for providing the profiles,
this is awesome!
When you run cl.exe, do you pass multiple files to it? I was
wondering whether we could build source files in batches (like your
components) and generate the .d file one-to-one with those batches.
Compression is an interesting way to work around a slow file system,
but you trade off CPU for it. Do you know how much additional CPU
time it costs?
I am still really sad and sorry it takes so long to start. I haven't
put nearly as much effort into cold startup behavior because I was
focused on the edit-compile cycle, where everything is hot. What is
the fastest time you've found? If Linux Chrome is ~1s, multiple
minutes would indicate something is more than 60x slower on Windows.
> See also two "hotspot" analysis from VTune, using "pure" ninja and
> individual .d files (i.e. none of the above-listed improvements). One is on
> a cold disk, the other from a hot (after running the same thing a couple of
> times). Clearly, the reading of depfiles (tsopen_nolock) dominates on a cold
> disk.
>
>
> Note that VTune works via sampling in this mode, checking the currently
> active function every 10ms. Very fast methods might therefore escape the
> metric.
Awesome, thanks!
From a glance at the hot profile, it seems that while stat/read
definitely dominate, a lot of additional time is spent across many
other places. I recognize a few of them (like Tokenizer::PeekToken)
as places I've since optimized on trunk, but even without these
optimizations I'm still a little surprised at the numbers.
For example the third-highest entry is hashing a string, and the
fourth is lower_bound() on a hash (which I believe is part of the MSVC
hash_map<> implementation). Even calls to malloc in your hot trace
sum to more than the total time spent on Ninja in my tests. I wonder
if this points at differences in our hardware, which means some of my
recent micro-optimizations may pay off. (For example I have a branch
on my laptop that has eliminated a lot of string copies, which should
save on basic memory operations, and saves an additional 5-10% or so
off of Linux Ninja trunk.)
But in any case, those hash functions together only add up to a
quarter of the time spent in stat/read. You might first start with
cherry-picking (or manually implementing, it is really small) this
patch into your branch to see how much of a benefit it has:
https://github.com/martine/ninja/commit/93c78361e30e33f950eef754742b236251e2c81e
By aggregation, you mean writing out multiple outputs' worth of
dependencies into a single file?
I think using the deplist format for that is probably a good idea, as
the depfile approach is pretty tailored to mirror gcc's output. We'll
need to extend the format to support more than one file.
In either the deplist or depfile approach, where did you specify the
path to this file such that ninja could load it? I guess you might
have needed some syntax extension?
Wow, on my train ride in I thought more about your prior mail and
concluded more or less exactly all of what you just wrote: we should
support this in depfiles and we should cache a loaded depfile so that
we can extract info from it for all build edges that reference it.
So yes, sounds good. :)
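The caching idea could look roughly like the sketch below. It is a hypothetical Python illustration, not Ninja's implementation: `DepfileCache`, its `loads` counter, and the `parse` callable (assumed to map a depfile path to a dictionary of output to dependency list) are all names invented here.

```python
class DepfileCache:
    """Parses each depfile once and serves every build edge that references it."""

    def __init__(self, parse):
        self._parse = parse   # assumed callable: path -> {output: [deps]}
        self._loaded = {}     # depfile path -> parsed contents
        self.loads = 0        # number of times a depfile was actually parsed

    def deps_for(self, depfile_path, output):
        """Return the dependencies recorded for output in depfile_path."""
        if depfile_path not in self._loaded:
            self._loaded[depfile_path] = self._parse(depfile_path)
            self.loads += 1
        return self._loaded[depfile_path].get(output, [])
```

If several hundred edges point at the same aggregated depfile, only the first lookup pays for reading and parsing it; the rest are dictionary lookups.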
I will hold off on landing the deplist code until I'm certain we need
it. I can adjust the format then.
> Some other ideas:
...
> - require some additional build system like cmake
I have a cmake file [1] which works on Windows and Linux.
It isn't in sync with the compiler flags of configure.py,
but if this one CMakeLists.txt could be added to ninja/misc,
I will bring it up to date and test it with Windows (msvc/mingw), Linux, and Mac.
Peter
[1]
https://github.com/syntheticpp/ninja/blob/martine-cmake/misc/CMakeLists.txt
Well, CMake has a different generator for each of them.
> At any rate, the decision comes down to:
> * either maintain both the python script and the cmake
> * or pick one or the other, and thus require either cmake or python in order
> to do a ninja build (on Windows or Linux or mingw).
I agree with your analysis. However, I think even if we choose cmake
we will never totally get rid of the python dependency because there
are some helper scripts like misc/measure.py that won't be translated
to cmake script.
Cheers,
--
Nicolas Desprès
I've tested it with VS2010; the other VS versions will not be a problem.
>
> At any rate, the decision comes down to:
> * either maintain both the python script and the cmake
> * or pick one or the other, and thus require either cmake or python in
> order to do a ninja build (on Windows or Linux or mingw).
>
I would add the cmake file only as an option alongside the officially
supported python-based build system.
The python script is OK for bootstrapping. But the problem with the
python script is that it cannot generate project files
for any IDE.
Peter
but it doesn't build the full project, and I'm not sure why. If I have it print the commands it's running, here's a sample:

cmd.exe /c cd C:\general\build\ninja\sack\debug_solution\core && cmake -G Ninja M:/sack/cmake_all/.. -DCMAKE_BUILD_TYPE=debug -DCMAKE_INSTALL_PREFIX=C:/general/build/ninja/sack/debug_out/core -DBUILD_MONOLITHIC=ON -DNEED_FREETYPE=1 -DNEED_JPEG=1 -DNEED_PNG=1 -DNEED_ZLIB=1 && c:\tools\unix\ninja.exe install

But the last step, 'ninja.exe install', doesn't seem to do anything. I added an '&& echo test' before ninja, and it does get that far. If I change the arguments to 'ninja.exe -? install' then ninja logs its usage and exits. I tried adding '-v -n -d stats -d explain' arguments, but I get no output from ninja.

I'll probably come up with a few cmake scripts that simply replicate the issue... at least try to