build.ninja serialization


Shinichiro Hamaji

Jan 7, 2016, 11:12:59 AM
to ninja...@googlegroups.com
Hi,

Recently, I was looking at ninja's NULL build performance, which is
already very fast. However, for the Android build, I feel ninja's
performance is still slightly slower than I want it to be. For
example, for AOSP, it takes 3 or 4 seconds to do nothing: ninja spends
~80% of that time, and kati (a make-to-ninja translator used in Android)
spends the other ~20% checking whether it needs to re-generate build.ninja.

I think introducing a binary build.ninja has been discussed from time to
time. I created a very experimental patch to see if the idea helps:


I think the result is fairly good for the Android build. It seems we can
cut the incremental build time for Android roughly in half. I'm
guessing this optimization may be beneficial even for smaller projects if
you don't have a beefy workstation.

Does this sound good? If the maintainers are OK with this, I'll complete
the implementation and send a PR. The current implementation is just for
this experiment, and I expect I'll need to rewrite almost
everything. The idea I have in mind is similar to Python's
.pyc files for .py files: ninja tries to open a file like
.ninja_state.build.ninja (or .ninja_state.<.ninja specified by the -f flag>).
If this file is newer than build.ninja and ninja successfully
deserializes the data, it can skip the parse step. Ninja writes
this file out whenever parsing happens.
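As a rough sketch, the staleness check could look like this (hypothetical helper name and paths; ninja's real code would also have to handle -f and error reporting, and, as noted later in the thread, an mtime check against only the top-level manifest is not airtight when a generator touches just a subninja file):

```cpp
#include <sys/stat.h>
#include <sys/types.h>
#include <utime.h>
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper, not ninja's actual API: decide whether the
// serialized state file can be loaded instead of re-parsing the text
// manifest, using the same mtime rule that .pyc files use.
static bool CanUseSerializedState(const std::string& manifest_path,
                                  const std::string& state_path) {
  struct stat manifest_st, state_st;
  if (stat(manifest_path.c_str(), &manifest_st) != 0)
    return false;  // no manifest; let the normal error path complain
  if (stat(state_path.c_str(), &state_st) != 0)
    return false;  // no cache yet; parse the text manifest and write one
  // The cache is only trustworthy if it is at least as new as the manifest.
  return state_st.st_mtime >= manifest_st.st_mtime;
}
```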

There are two further optimization ideas. I'm not sure if they are good
ones; they would be more intrusive to ninja's design than just
introducing a serialization format, but let me share them:

- Current implementation deserializes all data at ninja's start-up. I
think we can make NULL build even faster by delaying deserialization
of BindingEnv. Maybe we can introduce something like
SerializedBindingEnv, but I'm not sure.
- I'm also wondering if introducing interned strings to ninja helps. We
could use offsets to a serialized string pool instead of std::string
as keys of BindingEnv.
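A minimal sketch of the interning idea (a hypothetical StringPool type, not ninja code): all identifiers live in one pool, and BindingEnv-style maps would be keyed by a 32-bit offset into the pool, so keys serialize and compare as plain integers.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

// Illustrative string pool: Intern() returns a stable offset, and the
// whole pool serializes as a single blob of NUL-terminated strings.
struct StringPool {
  std::string data;                               // all interned strings, concatenated
  std::unordered_map<std::string, uint32_t> ids;  // string -> offset (builder side only)

  uint32_t Intern(const std::string& s) {
    std::unordered_map<std::string, uint32_t>::iterator it = ids.find(s);
    if (it != ids.end()) return it->second;
    uint32_t off = static_cast<uint32_t>(data.size());
    data += s;
    data += '\0';  // NUL-terminate so Get() can hand back a C string
    ids.insert(std::make_pair(s, off));
    return off;
  }
  const char* Get(uint32_t off) const { return data.c_str() + off; }
};
```

A deserializer would read `data` back as one chunk and keep using the same offsets, skipping per-string allocation entirely.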

FWIW, I have tried/considered some other approaches. Here is my memo:


Suggestions are welcomed, thanks!

Nico Weber

Jan 7, 2016, 11:16:50 AM
to Shinichiro Hamaji, ninja-build
How much faster is the binary manifest?

I think it's possible to make the text-based parser 2-3x faster, so unless it's much faster than that, improving the text parser performance might be a better first step.

If it _is_ much faster, this sounds like a good approach.

--
You received this message because you are subscribed to the Google Groups "ninja-build" group.
To unsubscribe from this group and stop receiving emails from it, send an email to ninja-build...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Shinichiro Hamaji

Jan 7, 2016, 12:45:55 PM
to Nico Weber, ninja-build
Current implementation is ~3.5x faster for Android builds (numbers are in https://github.com/shinh/ninja/commit/d21e548309876dbfed6533c81d5151c1255f144a). I'm guessing if we can delay loading BindingEnv as I wrote (you don't need "command = xxx" until you actually decide to build the target), its win would be larger.

Evan Martin

Jan 7, 2016, 12:47:17 PM
to Shinichiro Hamaji, ninja-build
I really like the idea of optimizing software by doing less work.

All of these performance problems are simplified if the input build
files are smaller, so any change we can make to make the input
build.ninja files small would solve performance problems without
needing to write any tricky software. I appreciate that kati is
already very smart though!

> Does this sound good? If maintainers are OK with this, I'll complete
> the implementation and send a PR. Current implementation is just for
> this experiment and I guess I'll need to rewrite almost
> everything. The implementation idea in my mind is similar to Python's
> .pyc files for .py files. Ninja tries to open a file like
> .ninja_state.build.ninja (or .ninja_state.<.ninja specified by -f flag>).
> If this file is newer than the build.ninja and ninja successfully
> deserializes the data, it can skip the parse step. Ninja will output
> this file when parsing happens.

I think Nico's advice (that we should ensure parsing time is really
the problem) is good.

But if a binary format is necessary, what do you think about defining
a binary file format and having kati output it directly?
The original philosophy of ninja is that the generator program does as
much work as possible, so having ninja parse a text file to produce a
binary file and then read the binary file later seems like extra work.

> There are two further optimization ideas. I'm not sure if they are good
> ideas. They would be more intrusive to ninja's design than just
> introducing serialization format, but let me share:
>
> - Current implementation deserializes all data at ninja's start-up. I
> think we can make NULL build even faster by delaying deserialization
> of BindingEnv. Maybe we can introduce something like
> SerializedBindingEnv, but I'm not sure.

Can you describe this idea more? Don't you need the full build graph
to determine what to run?

> - I'm also wondering if introducing interned strings to ninja helps. We
> could use offsets to a serialized string pool instead of std::string
> as keys of BindingEnv.

Do lookups in BindingEnv show up in profiles?
Can you paste a profile just so I can follow along in the code to
understand this problem better?

...I probably should try making a kati+android build so I can better
understand all of this, hmm.

Robert Iannucci

Jan 7, 2016, 1:31:49 PM
to Evan Martin, Shinichiro Hamaji, ninja-build

(I would just like to interject that if a binary protocol is warranted, protobuf might be a perfect way to define the format, especially if two different codebases are communicating via it. *goes back to lurk mode*)


Ben Boeckel

Jan 7, 2016, 1:43:20 PM
to Robert Iannucci, Evan Martin, Shinichiro Hamaji, ninja-build
On Thu, Jan 07, 2016 at 18:31:38 +0000, Robert Iannucci wrote:
> (I would just like to interject that if a binary protocol is warranted,
> protobuf might be a perfect way to define the format, especially if two
> different codebase are communicating via it. *goes back to lurk mode*)

Also, ninja should be able to spit back out the build.ninja format for
it so that it can be inspected for debugging purposes.

--Ben

Shinichiro Hamaji

Jan 8, 2016, 3:06:15 AM
to Evan Martin, ninja-build
On Fri, Jan 8, 2016 at 2:47 AM, Evan Martin <mar...@danga.com> wrote:
I really like the idea of optimizing software by doing less work.

All of these performance problems are simplified if the input build
files are smaller, so any change we can make to make the input
build.ninja files small would solve performance problems without
needing to write any tricky software.  I appreciate that kati is
already very smart though!

Yeah, build.ninja generated by kati is huge. Currently, kati generates a rule for each non-phony target. I have tried a few easy-to-implement approaches to reduce the size of build.ninja, but there seemed to be no low-hanging fruit. I still have an idea, but I'm not sure if it's feasible yet. I decided to hack on ninja first because it's easier, and I thought improving ninja could also benefit other projects, while kati will eventually be replaced by Soong anyway.

Speaking of Soong: Soong will generate a much smarter build.ninja. But the Android platform is still a larger project than Chromium; a real Android target has ~4.4x as many build targets as Chromium. So I'm guessing optimizing ninja's parsing would make a visible difference even if we reduce the size of build.ninja.



> Does this sound good? If maintainers are OK with this, I'll complete
> the implementation and send a PR. Current implementation is just for
> this experiment and I guess I'll need to rewrite almost
> everything. The implementation idea in my mind is similar to Python's
> .pyc files for .py files. Ninja tries to open a file like
> .ninja_state.build.ninja (or .ninja_state.<.ninja specified by -f flag>).
> If this file is newer than the build.ninja and ninja successfully
> deserializes the data, it can skip the parse step. Ninja will output
> this file when parsing happens.

I think Nico's advice (that we should ensure parsing time is really
the problem) is good.

Understood. Nico, could you share your ideas for improving the text parser? I think the ".ninja parse" phase actually does two tasks: ninja both parses build.ninja and constructs the build graph at this stage. I'm guessing we need to optimize both of them to achieve a 2x speedup.


But if a binary format is necessary, what do you think about defining
a binary file format and having kati output it directly?
The original philosophy of ninja is that the generator program does as
much work as possible, so having ninja parse a text file to produce a
binary file and then read the binary file later seems like extra work.

I'd prefer to have everything in ninja because:

1. We can arbitrarily change the format in the future for further improvements if we want. We'd be able to pre-calculate all command lines before serialization, though I'm not sure if this is effective.
2. I'd like to keep the text-format build.ninja, as it's extremely useful for debugging dependencies.
3. The overhead is small: serialization and deserialization each take ~30% of the parsing time.

I want the binary format to be an internal one, similar to .ninja_deps, not an external format like build.ninja. But I'm OK with defining a binary format.
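A record stream in that spirit might look like the sketch below (an illustrative layout only, not the actual .ninja_deps format, which has its own header and record types): each record is a 4-byte little-endian length followed by the payload, so the reader never has to lex text.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Append one length-prefixed record to the output buffer.
static void WriteRecord(std::vector<char>* out, const std::string& payload) {
  uint32_t len = static_cast<uint32_t>(payload.size());
  char hdr[4];
  memcpy(hdr, &len, 4);  // assumes a little-endian host, as .ninja_deps also does
  out->insert(out->end(), hdr, hdr + 4);
  out->insert(out->end(), payload.begin(), payload.end());
}

// Read the next record; returning false on truncation lets the caller
// fall back to re-parsing the text manifest.
static bool ReadRecord(const std::vector<char>& in, size_t* pos,
                       std::string* payload) {
  if (*pos + 4 > in.size()) return false;  // truncated header or stream exhausted
  uint32_t len;
  memcpy(&len, &in[*pos], 4);
  *pos += 4;
  if (*pos + len > in.size()) return false;  // truncated payload
  payload->assign(&in[*pos], len);
  *pos += len;
  return true;
}
```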


> There are two further optimization ideas. I'm not sure if they are good
> ideas. They would be more intrusive to ninja's design than just
> introducing serialization format, but let me share:
>
> - Current implementation deserializes all data at ninja's start-up. I
> think we can make NULL build even faster by delaying deserialization
> of BindingEnv. Maybe we can introduce something like
> SerializedBindingEnv, but I'm not sure.

Can you describe this idea more?  Don't you need the full build graph
to determine what to run?

We do need the full build graph, but we don't need "command = something" until ninja decides to execute the command. I've just realized this idea is not good, as it's very kati-specific; delaying loading of this info wouldn't help for non-kati build.ninja files. Command lines make up 65% of the build.ninja generated by kati, but only 0.24% of Chromium's.


> - I'm also wondering if introducing interned strings to ninja helps. We
> could use offsets to a serialized string pool instead of std::string
> as keys of BindingEnv.

Do lookups in BindingEnv show up in profiles?
Can you paste a profile just so I can follow along in the code to
understand this problem better?

Sorry, BindingEnv seems to have been a bad example, but I do see some map<string/StringPiece, xxx>-related functions in the profiler output:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 25.69      0.28     0.28  1525214     0.00     0.00  Lexer::ReadEvalString(EvalString*, bool, std::string*)
 11.93      0.41     0.13   215052     0.00     0.00  __gnu_cxx::hashtable<std::pair<StringPiece const, Node*>, StringPiece, __gnu_cxx::hash<StringPiece>, std::_Select1st<std::pair<StringPiece const, Node*> >, std::equal_to<StringPiece>, std::allocator<Node*> >::resize(unsigned long)
 10.09      0.52     0.11  1041394     0.00     0.00  State::LookupNode(StringPiece) const
  8.26      0.61     0.09   992133     0.00     0.00  CanonicalizePath(char*, unsigned long*, unsigned int*, std::string*)
  6.42      0.68     0.07   215052     0.00     0.00  EditDistance(StringPiece const&, StringPiece const&, bool, int)
  6.42      0.75     0.07   176078     0.00     0.00  std::_Rb_tree<std::string, std::pair<std::string const, Rule const*>, std::_Select1st<std::pair<std::string const, Rule const*> >, std::less<std::string>, std::allocator<std::pair<std::string const, Rule const*> > >::find(std::string const&)
  4.59      0.80     0.05        1    50.00    50.00  std::_Rb_tree<std::string, std::pair<std::string const, Rule const*>, std::_Select1st<std::pair<std::string const, Rule const*> >, std::less<std::string>, std::allocator<std::pair<std::string const, Rule const*> > >::_M_erase(std::_Rb_tree_node<std::pair<std::string const, Rule const*> >*)
  2.75      0.83     0.03 18106806     0.00     0.00  EvalString::AddText(StringPiece)
  2.75      0.86     0.03   356780     0.00     0.00  std::vector<EvalString, std::allocator<EvalString> >::_M_insert_aux(__gnu_cxx::__normal_iterator<EvalString*, std::vector<EvalString, std::allocator<EvalString> > >, EvalString const&)
  2.75      0.89     0.03    94292     0.00     0.00  BindingEnv::LookupRule(std::string const&)
  2.75      0.92     0.03        1    30.00    30.00  __gnu_cxx::hashtable<std::pair<StringPiece const, Node*>, StringPiece, __gnu_cxx::hash<StringPiece>, std::_Select1st<std::pair<StringPiece const, Node*> >, std::equal_to<StringPiece>, std::allocator<Node*> >::clear()
  1.83      0.94     0.02  6051326     0.00     0.00  (anonymous namespace)::HighResTimer()
  1.83      0.96     0.02  3025663     0.00     0.00  ScopedMetric::~ScopedMetric()
  0.92      0.97     0.01  3025663     0.00     0.00  ScopedMetric::ScopedMetric(Metric*)


...I probably should try making a kati+android build so I can better
understand all of this, hmm.

You'd be amused by the fact ninja can happily handle a project which is 3x or 4x larger than Chromium!

Evan Martin

Jan 8, 2016, 12:18:10 PM
to Shinichiro Hamaji, ninja-build
> 6.42 0.68 0.07 215052 0.00 0.00
> EditDistance(StringPiece const&, StringPiece const&, bool, int)

Did something go wrong in the profile backtrace? Why does
EditDistance show up at all?

Shinichiro Hamaji

Jan 9, 2016, 12:31:53 AM
to Evan Martin, ninja-build
Recent AOSP always runs at least one step, so I was specifying an unknown target when I wanted to check load time only (which is why EditDistance shows up: ninja computes "did you mean" suggestions for the unknown target name).

Nico Weber

Jan 9, 2016, 8:09:21 PM
to Shinichiro Hamaji, Evan Martin, ninja-build
On Fri, Jan 8, 2016 at 3:05 AM, 'Shinichiro Hamaji' via ninja-build <ninja...@googlegroups.com> wrote:
Understood. Nico, could you share your idea to improve the text parser? I think the ".ninja parse" phase actually does two tasks. Ninja parses build.ninja and constructs build graph at this stage. I'm guessing we need to optimize both of them to achieve 2x speedup.

https://groups.google.com/forum/#!msg/ninja-build/5ioF9byt3uw/Duyd7i84lz0J has a writeup. https://github.com/nico/ninja/commits/parse/bench0_sse_tok.cc?page=1 (all three pages) are how I added more and more stuff and measured how much each step made things slower. https://goo.gl/photos/XCuFkyPQLZXmMcmf9 is a short summary (possibly too short to make much sense to others).

I didn't get around to doing more with that yet. The most visible artifact from that experiment is that you can run `ninja manifest_parser_perftest && ./manifest_parser_perftest` to generate large-ish synthetic, realistic-looking .ninja files and measure how long parsing them takes. (The manifest generation is done by `misc/write_fake_manifests.py`.)
 

Shinichiro Hamaji

Jan 12, 2016, 6:46:09 AM
to Nico Weber, Evan Martin, ninja-build
Thanks for pointers!

I have tried your bench0_sse_tok.cc. I also tweaked my deserialization code to reduce unnecessary string copies (I optimized only the hottest path for kati-generated build.ninja, so this isn't very effective for Chromium's).


Here's a summary:

                     parse time (s) (*2)  Nico's "bench" (s) (*3)  deserialization time (s)  deserialization v2 (s) (*4)
Chromium             0.328                0.143                    0.112                     0.092
AOSP (aosp_arm)      2.017                1.028                    0.684                     0.418
Real Android target  3.467                1.715                    1.068                     0.627

(*2) Nico told me "-d stats" is incorrect for multiple subninjas, so I added a new METRIC_RECORD in ninja.cc which is recorded only once

Your code is actually much faster than the current parser, but still slower than the binary cache. The deserialization is 2.46x faster than Nico's parser for AOSP and 1.55x faster for Chromium. I'm not sure whether this difference counts as large, but I think I can make the deserialization even faster.

That said, I do understand you don't like having two manifest parsers. Hmm.


Nico Weber

Jan 12, 2016, 10:13:21 AM
to Shinichiro Hamaji, Evan Martin, ninja-build
Thank you for collecting this data! 0.6s is certainly much faster than 1.7s, so binary manifests (sadly [1]) look very interesting. It does make things more complicated, though -- I wish the parser were faster, but speeding it up another 2.5x seems unlikely.

How fast are complete empty Android builds with your patch (i.e. not just parsing, but also stat()ing, deps loading, and so on)? (I'm asking to find out if even larger reorganizations might be necessary.)

How long does serialization take? Someone told me they had a 1GB manifest file (that I used to benchmark my parser changes a bit back then) -- I suppose serializing large manifests would take a while?

In general, I prefer it if programs are consistently fast, not sometimes slow and sometimes fast -- so I agree with Evan that it'd be nice to define a binary manifest format, so that build.ninja could be in either text or binary format. (I don't think we should depend on protobuf, though. I do think having a "dump binary manifest as textual manifest" thingy would be useful.)

For the serialization format: have you tried having a string table at the front and then having strings only be references into that table? That might make deserialization even faster. (It's probably not easily doable with ninja's current internal graph structure -- but if the text parser hashed strings, it might become feasible. It'd also require programs that generate ninja files to not just stream out data, but that might be ok. But maybe it's not worth it.)

We could do the .ninja_state thing in addition to that (if build.ninja is in text format, write a binary version of it to .ninja_state and use that if available, like you say), but I'm not sure it's worth the complications. (I'm not sure if comparing mtime on build.ninja and .ninja_state is good enough: A generator might only touch a subninja file if the toplevel manifest doesn't change; and so on.) On the other hand, if we did this we could not expose the binary format at first and iterate a bit on it until we're happy with it, before we make it available as public API.

I'm not sure what the desired end state would be: If we ever do a 2.0, would we drop support for textual manifests? I sometimes create ninja files from shell scripts, and creating binary files that way is more difficult. But having a slow and a fast format is opposed to ninja's "fast and simple" philosophy -- I suppose we could have a separate tool that converts textual manifests to binary manifests that scripts (and generators?) could call, and ninja 2.0 itself would only support binary manifest files (?)

Lee Winter

Jan 12, 2016, 10:52:50 AM
to Nico Weber, Shinichiro Hamaji, Evan Martin, ninja-build
On Tue, Jan 12, 2016 at 10:13 AM, Nico Weber <tha...@chromium.org> wrote:

I suppose we could have a separate tool that converts textual manifests to binary manifests that scripts (and generators?) could call, and ninja 2.0 itself would only support binary manifest files (?)

Now you are talking.  With a separate tool, ninja would have a human-readable intermediate format and an optimized (probably binary, but not necessarily) format.  The build would expect an optimized file in exactly the same way that a librarian or linker expects machine/object code instead of intermediate assembly language.

Yes this is more complicated.  However, the generator/parser ratio (reader/writer ratio) justifies the additional complexity.  With a separate tool the implementation gains many degrees of freedom at only a small price in convenience.

However, as a stepping-stone approach, it may be useful to have ninja itself call the optimizer tool if ninja's input is text.  Temporarily that eliminates almost all of the user inconvenience.  The tool can be externalized once the performance advantages are clear and compelling.

This sounds like a very useful improvement.

Lee Winter
Nashua, New Hampshire
United States of America

Peter Collingbourne

Jan 12, 2016, 2:47:54 PM
to Shinichiro Hamaji, ninja...@googlegroups.com
Hi all,

I wanted to throw out one idea for what the binary serialization format
could look like.

It could literally just be the memory region backing the build graph
(allocated using a bump pointer allocator) dumped out to disk.

This way, "deserialization" would just be a call to mmap.

Using bump pointer allocation everywhere could be rather intrusive, but I
suspect that it could also help speed things up in the non-serialized case
as well, so it may be worth doing anyway.

To ensure that pointers in the memory region match up correctly between
runs, we can use mmap with MAP_FIXED to request a fixed address for the
memory region. If for whatever reason we cannot allocate at the fixed address
(unlikely, but could happen if we're running under a sanitizer or something),
we can always load the build manifests from text files.

We would obviously need to be careful about loading the memory only into the
exact same binary that we saved it from, and avoid pointers from our memory
region into program memory (e.g. vtable pointers), but that shouldn't be
too hard.
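A toy round-trip of the idea (illustrative and POSIX-specific, not ninja code): build a tiny graph inside one anonymous mapping, copy the raw bytes out as the "file", then map the same address range again with MAP_FIXED so the internal pointers work with no deserialization step. In this sketch MAP_FIXED is safe only because the range was just unmapped by us; real code would need the fixed-address reservation strategy described above.

```cpp
#include <sys/mman.h>
#include <cassert>
#include <cstdlib>
#include <cstring>

struct Node { Node* next; int value; };

// Returns true if an in-region pointer survives the save/restore cycle.
static bool RoundTripAtFixedAddress() {
  const size_t kSize = 1 << 20;
  // "First run": let the kernel pick the region and remember its address.
  char* base = static_cast<char*>(mmap(NULL, kSize, PROT_READ | PROT_WRITE,
                                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0));
  if (base == MAP_FAILED) return false;

  // Bump-allocate two nodes; a->next is a raw pointer into the region.
  Node* a = reinterpret_cast<Node*>(base);
  Node* b = reinterpret_cast<Node*>(base + sizeof(Node));
  b->next = NULL; b->value = 42;
  a->next = b; a->value = 1;

  // "Serialize": the raw region is the file contents.
  char* file = static_cast<char*>(malloc(kSize));
  memcpy(file, base, kSize);
  munmap(base, kSize);

  // "Second run": remap at the recorded address and copy the bytes back.
  char* base2 = static_cast<char*>(
      mmap(base, kSize, PROT_READ | PROT_WRITE,
           MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0));
  if (base2 == MAP_FAILED) { free(file); return false; }
  memcpy(base2, file, kSize);
  free(file);

  // No fixups, no parsing: the pointer is usable immediately.
  bool ok = reinterpret_cast<Node*>(base2)->next->value == 42;
  munmap(base2, kSize);
  return ok;
}
```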

Peter

--
Peter

Nico Weber

Jan 12, 2016, 2:49:43 PM
to Peter Collingbourne, Shinichiro Hamaji, ninja-build
On Tue, Jan 12, 2016 at 2:47 PM, Peter Collingbourne <pe...@pcc.me.uk> wrote:
Hi all,

I wanted to throw out one idea for what the binary serialization format
could look like.

It could literally just be the memory region backing the build graph
(allocated using a bump pointer allocator) dumped out to disk.

That's not a format that we could expect generators to write though (?)

Peter Collingbourne

Jan 12, 2016, 2:52:34 PM
to Nico Weber, Shinichiro Hamaji, ninja-build
On Tue, Jan 12, 2016 at 02:49:42PM -0500, Nico Weber wrote:
> On Tue, Jan 12, 2016 at 2:47 PM, Peter Collingbourne <pe...@pcc.me.uk>
> wrote:
>
> > Hi all,
> >
> > I wanted to throw out one idea for what the binary serialization format
> > could look like.
> >
> > It could literally just be the memory region backing the build graph
> > (allocated using a bump pointer allocator) dumped out to disk.
> >
>
> That's not a format that we could expect generators to write though (?)

True. This assumes that we stick with text as the user-facing format, which
is my personal preference, but I know that opinions differ.

Thanks,
--
Peter

Shinichiro Hamaji

Jan 13, 2016, 7:12:31 AM
to Nico Weber, Evan Martin, ninja-build
On Wed, Jan 13, 2016 at 12:13 AM, Nico Weber <tha...@chromium.org> wrote:
Thank you for collecting this data! 0.6s is certainly much faster than 1.7s, so binary manifests (sadly [1]) look very interesting. It does make things more complicated, though -- I wish the parser would be faster, but getting speeding it up another 2.5x seems unlikely.

How fast are complete empty Android builds with your patch (i.e. not just parsing, but also stat()ing, deps loading, and so on)? (I'm asking to find out if even larger reorganizations might be necessary.)

For a real Android target build on my workstation, ~6 secs at best. Ninja spends ~4.7 secs and kati spends 1 sec or so. Ninja's parse time is ~3.5 secs out of ~4.7 secs.


How long does serialization take? Someone told me they had a 1GB manifest file (that I used to benchmark my parser changes a bit back then) -- I suppose serializing large manifests would take a while?

The current implementation takes ~1 sec. That could be negligible considering parsing takes 3 or 4 secs and the ninja generator spends even more.


In general, I prefer if programs are consistently fast, not sometimes slow and sometimes fast -- so I agree with Evan it'd be nice to define a binary manifest file, so that build.ninja could either be in either text or binary format. (I don't think we should depend on protobuf, though. I do think having a "dump binary manifest as textual manifest" thingy would be useful.)

For the serialization format: Have you tried having a string table at the front and then have strings only be references to that string table? That might make deserialization even faster (but it's probably not easily doable with ninja's current internal graph structure -- but if the text parser hashed up strings, it might become feasible. It'd also require programs that generate ninja to not just stream out data, but that might be ok. But maybe it's not worth it.

Yeah, I think having such a table would be nice, but it requires some intrusive changes to ninja's existing code. The "deserialization v2" I mentioned introduced one such intrusive change (note that I added EvalString::buf_; see how this field is used in EvalString::AddText: https://github.com/shinh/ninja/commit/21476208ee5efcccda224bb3be3129710231c549#diff-d6c0a60608a92a7888b97380e5e3746dR58), which is similar to the string table idea but applied only to EvalString. Unfortunately, applying this technique everywhere would require a major refactoring of ninja's code, so I agree. I'm not sure if it's worth doing for all data structures in ninja, but it may be good to do for a few selected structures.

If we want to do the string table approach, we may want to convert all std::strings in most data structures to StringPiece (or IdentifierInfo in Nico's experimental code, which would be even better). I've created an experimental patch which changes all std::string to StringPiece in ninja's data structures (https://github.com/shinh/ninja/commit/d49c8601d809ffe459f3b3b3381e273362621cef). As this has fewer string copies, I thought this patch could make parsing faster, but unfortunately it didn't improve the parser performance.


We could do the .ninja_state thing in addition to that (if build.ninja is in text format, write a binary version of it to .ninja_state and use that if available, like you say), but I'm not sure it's worth the complications. (I'm not sure if comparing mtime on build.ninja and .ninja_state is good enough: A generator might only touch a subninja file if the toplevel manifest doesn't change; and so on.) On the other hand, if we did this we could not expose the binary format at first and iterate a bit on it until we're happy with it, before we make it available as public API. 

I'm not sure what the desired end state would be: If we ever do a 2.0, would we drop support for textual manifests? I sometimes create ninja files from shell scripts, and creating binary files that way is more difficult. But having a slow and a fast format is opposed to ninja's "fast and simple" philosophy -- I suppose we could have a separate tool that converts textual manifests to binary manifests that scripts (and generators?) could call, and ninja 2.0 itself would only support binary manifest files (?)

I understand introducing .ninja_state complicates ninja's code. Maybe we could add "ninja -t compile" or something first, without an automatic conversion from text to binary? During development of the binary format, we wouldn't specify the format of the compilation output, but "ninja -t compile" would produce a file which can be used as ninja's input. The generator would be responsible for re-compiling the binary whenever a user updates build.ninja or the ninja binary itself; a different ninja version may have a different binary format. If the binary format seems to work nicely, we could eventually define a stable binary format and optionally deprecate the text format.

Peter Collingbourne

Jan 14, 2016, 3:16:46 AM
to Shinichiro Hamaji, ninja...@googlegroups.com
On Tue, Jan 12, 2016 at 11:47:58AM -0800, Peter Collingbourne wrote:

I couldn't resist hacking up an implementation of this idea. It's available at
https://github.com/pcc/ninja/tree/alloc

In my local Chromium tree I measured a decrease in no-op build time for
the "chrome" target from (100-run mean) 0.89 to 0.62 seconds, or a 44%
improvement. This doesn't seem to be too much of an improvement over
Shinichiro's implementation (my local measurement is 0.68 seconds, or a 31%
improvement), so I'm undecided on whether this is worth the added complexity
over a more conventional serialization. The implementation isn't fully
optimized though, and that might help a bit.

I'm a little curious what the numbers look like for Android. Going off of
Shinichiro's figures it seems like this could help a lot more there. Maybe
if it isn't too hard I could ask someone with an Android checkout to try
out this version of Ninja.

(Instructions:

"ninja -Y FILE target" to save state to FILE,
"ninja -X FILE target" to load state from FILE).

Thanks,
--
Peter

Shinichiro Hamaji

unread,
Jan 14, 2016, 5:23:31 AM1/14/16
to Peter Collingbourne, ninja-build
Thanks for the experiment! This is very interesting.

I tried your code with a small modification, attached at the bottom of this email: your branch seemed to be missing mblock.cc, and 1GB was too small for Android.

NULL build performances for a real Android target:

         NULL build (sec)  save state (sec)  saved file size (MB)
Base     4.657             0                 0
Mine     2.286             1.807             374
Peter's  1.668             5.781             469

Peter's code completely eliminated parsing/deserialization time. The downside is that the saved state file is big (note the input ninja file size is 446MB) and saving the state takes more time. My feeling is that we'll get comfortable performance with conventional serialization, especially if we introduce a string table as Nico suggested, but the instant parsing is still tempting me :) I'm also thinking a server-mode ninja would be cool, but that's off-topic.

diff --git a/src/mblock.cc b/src/mblock.cc
new file mode 100644
index 0000000..7bfbbfd
--- /dev/null
+++ b/src/mblock.cc
@@ -0,0 +1,3 @@
+#include "mblock.h"
+
+mblock *cur_mb;
diff --git a/src/ninja.cc b/src/ninja.cc
index a113cca..8e9f226 100644
--- a/src/ninja.cc
+++ b/src/ninja.cc
@@ -184,36 +184,41 @@ struct NinjaMain : public BuildLogUser {
   void SaveState(const char *path);
 };
 
+static const size_t kStateStart = 4ULL * 1024 * 1024 * 1024;
+static const size_t kStateSize = 4ULL * 1024 * 1024 * 1024;
+
 void NinjaMain::NewState() {
-  void* mem = mmap(reinterpret_cast<void*>(1 * 1024 * 1024 * 1024),
-                   1 * 1024 * 1024 * 1024, PROT_READ | PROT_WRITE,
+  void* mem = mmap(reinterpret_cast<void*>(kStateStart),
+                   kStateSize, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
   mb_ = new (mem) mblock(static_cast<char*>(mem) + sizeof(mblock),
-                         static_cast<char*>(mem) + 1024 * 1024 * 1024);
+                         static_cast<char*>(mem) + kStateSize);
   cur_mb = mb_;
   state_ = new (*mb_) State(mb_);
 }
 
 void NinjaMain::LoadState(const char *path) {
+  METRIC_RECORD("load state");
   int fd = open(path, O_RDONLY);
   off_t size = lseek(fd, 0, SEEK_END);
-  mmap(reinterpret_cast<void*>(1 * 1024 * 1024 * 1024), size,
+  mmap(reinterpret_cast<void*>(kStateStart), size,
        PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_FIXED, fd, 0);
-  mb_ = reinterpret_cast<mblock*>(1 * 1024 * 1024 * 1024);
+  mb_ = reinterpret_cast<mblock*>(kStateStart);
   cur_mb = mb_;
-  state_ = reinterpret_cast<State*>(1 * 1024 * 1024 * 1024 + sizeof(mblock));
+  state_ = reinterpret_cast<State*>(kStateStart + sizeof(mblock));
 
-  mmap(reinterpret_cast<void*>(1 * 1024 * 1024 * 1024 + size),
-       1 * 1024 * 1024 * 1024 - size, PROT_READ | PROT_WRITE,
+  mmap(reinterpret_cast<void*>(kStateStart + size),
+       kStateSize - size, PROT_READ | PROT_WRITE,
        MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
 }
 
 void NinjaMain::SaveState(const char* path) {
+  METRIC_RECORD("save state");
   FILE* dump = fopen(path, "w");
   size_t size =
-      reinterpret_cast<uintptr_t>(mb_->begin) - 1024 * 1024 * 1024;
+      reinterpret_cast<uintptr_t>(mb_->begin) - kStateStart;
   size = (size + 4095) & -4096;
-  fwrite(reinterpret_cast<void*>(1024 * 1024 * 1024), size, 1, dump);
+  fwrite(reinterpret_cast<void*>(kStateStart), size, 1, dump);
   fclose(dump);
 }
 

Evan Martin

unread,
Jan 14, 2016, 10:54:03 AM1/14/16
to Shinichiro Hamaji, Peter Collingbourne, ninja-build
On Thu, Jan 14, 2016 at 2:23 AM, 'Shinichiro Hamaji' via ninja-build <ninja...@googlegroups.com> wrote:
         NULL build (sec)  save state (sec)  saved file size (MB)
Base     4.657             0                 0
Mine     2.286             1.807             374
Peter's  1.668             5.781             469

Peter's code completely eliminated parsing/deserialization time. The downside is that the saved state file is big (note the input ninja file size is 446MB) and saving the state takes more time.

That is a surprise -- why is the in-memory representation larger than the source text?  Variable expansion, I guess? 

Shinichiro Hamaji

unread,
Jan 19, 2016, 3:29:12 AM1/19/16
to Evan Martin, Peter Collingbourne, ninja-build
Yeah, I think so. Peter's implementation hooks all allocations, so all intermediate data gets written out as well.

Not sure if Nico and Evan are OK with introducing binary manifest, but I've sent a PR: https://github.com/ninja-build/ninja/pull/1093

This version doesn't modify ninja's data structures yet, so it is slightly slower than the code in my local repository, but there's already a visible improvement. Further optimizations could be done in separate changes if introducing a binary manifest is OK.

Thanks!

Nico Weber

unread,
Jan 19, 2016, 9:51:37 PM1/19/16
to Shinichiro Hamaji, Evan Martin, Peter Collingbourne, ninja-build
I think it's a good change in principle (I haven't looked at the code yet). I'm very behind in reviewing ninja patches. I'll try to catch up soon.
