scons performance


MarkCallaghan

Feb 25, 2014, 10:04:08 AM
to mongo...@googlegroups.com
After a successful build rerunning "scons all" takes a long time -- 130 seconds. Am I doing something wrong? For context, "make all" takes ~8 seconds to tell me nothing needs to be recompiled with MySQL 5.6. This is with scons versions 2.0, 2.1 and 2.2 compiling MongoDB 2.6. I don't have numbers for MongoDB 2.5 but it was also slow.

Running scons --nostrip --full --cxx=${BASE_CXX} --cc=${BASE_CC}  --allocator=tcmalloc --opt=on --mongod-concurrency-level=db -j 10 all
takes ...
5 seconds to reach -> scons: done reading SConscript files.
10 seconds to reach -> Generating build/linux2/normal/buildinfo.cpp
130 seconds to reach -> scons: `all' is up to date.

Leif Walsh

Feb 25, 2014, 10:13:20 AM
to mongo...@googlegroups.com
I have experienced the same problem with scons.  We recently replaced it with cmake for tokumx; with the ninja generator it is night and day for incremental builds.


--
Cheers,
Leif

Eric Milkie

Feb 25, 2014, 10:58:10 AM
to mongo...@googlegroups.com
Due to the design of scons, the startup time of a build can be quite slow.
You can speed up your build by only building the targets you need instead of "all", which builds a lot of unit testing targets that you may or may not be utilizing.
Some common targets are:
core (builds mongod, mongos, mongo shell)
tools (builds the mongo support tools)

FWIW, "scons all" takes less than 8 seconds on my machine to tell me that nothing needs to be recompiled, so your milage may vary.
As with many build systems, it's important to choose a proper -j concurrency value as well, to make sure your machine is heavily utilized but not overloaded.

MARK CALLAGHAN

Feb 25, 2014, 12:50:28 PM
to mongo...@googlegroups.com
What magic gets an 8 second "scons all"? Concurrency doesn't help when there is nothing to be remade. When there is nothing to do "scons all" takes 96 and 130 seconds on my two servers. One at work with CentOS, Python 2.6.6, scons 2.2 and the other not at work with Fedora 19, Python 2.7.5, scons 2.3. Both have modern x86. With nothing to do "scons mongod" takes 34 and 43 seconds.


--
Mark Callaghan
mdca...@gmail.com

Eric Milkie

Feb 25, 2014, 1:31:08 PM
to mongo...@googlegroups.com
Hi Mark,
I misspoke about the 8 seconds, sorry for that.
For the 2.4 version branch, on a typical dev machine, "scons mongod" can run with everything already up-to-date in about 18 seconds.  In the master branch, it's basically double that time, due to the numerous new object and library targets we've added for the next version.
The build slowness is bothering us too.  We've done some preliminary investigations for this already and will be taking a closer look at solutions in the next release cycle along with other build improvements, starting next month.  In the meantime, to speed up my personal development, I've been using .a archives and .o object files as direct targets when invoking scons, which allows it to skip much of the dependency calculations it has to do for end targets like "mongod", but this is an annoying workaround to use routinely.
-Eric

Balint Szente

Feb 26, 2014, 2:42:45 AM
to mongo...@googlegroups.com
I experienced this behavior of SCons with other projects as well, but
*only on Windows*.

The startup overhead for SCons on Windows is huge on the second run. It is
somehow related to the ".sconsign.dblite" file. For example, for the
same project (at my work) on Linux the SCons startup is very fast and
the .sconsign.dblite is ~4MB, but the very same project on Windows has
a huge .sconsign.dblite, around 120-130MB, with very slow startup
after the first run.

If you delete the ".sconsign.dblite" file, you will experience a very
fast startup, but all the files will be recompiled.

In conclusion: SCons is painfully slow on Windows at creating the
target Python objects *after* the first run, and this is related to
the .sconsign.dblite file, which is huge on Windows in comparison to
Linux. It is not MongoDB's fault.

But I did not investigate the cause of this odd behavior.

dl9...@darc.de

Apr 14, 2014, 5:03:59 PM
to mongo...@googlegroups.com
Hi there,

I'm involved with SCons development and have carried out a lot of performance analysis for this build system over the last few months (see http://www.scons.org/wiki/WhySconsIsNotSlow ), so I'd like to chime in on this topic a little bit. I hope you don't mind me sneaking in just like that. ;)


On Tuesday, February 25, 2014 at 4:58:10 PM UTC+1, Eric Milkie wrote:
Due to the design of scons the startup time of a build can be quite slow.

You might be right...but my experience so far is that whenever someone claims that SCons' design is bad, he's wrong about it. Slow build and update times are often the result of misunderstanding how SCons works internally, and of following good ol' "make"-based practices that just don't fit.
I had a short look at your current build setup for the 2.6.0 version (downloaded as tarball).
You're using a single build environment, and flooding the CPPPATH variable with all the include dirs that might ever be required. Since SCons tries to resolve all implicit dependencies by scanning C/C++ sources (and the included headers, recursively), this results in the following:
For every #include in each single source file, SCons tries to find the requested headers by searching through the full CPPPATH list, even when it is compiling a self-contained library. That's mainly where your current overhead comes from: you're actually forcing SCons to scan for headers in all the wrong places.
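
To put rough numbers behind that, here is a toy sketch of the lookup (my own illustration, not SCons' actual scanner code):

# Toy model of the header lookup the C scanner performs for every #include;
# an illustration of why a long CPPPATH hurts, not SCons' real code.
import os

def find_header(include_name, cpppath):
    """Probe each CPPPATH directory in order until the header is found."""
    for directory in cpppath:            # roughly one filesystem probe per dir
        candidate = os.path.join(directory, include_name)
        if os.path.isfile(candidate):
            return candidate
    return None

# With N source files, M includes per file and K entries in CPPPATH, the worst
# case is on the order of N * M * K probes per update. Trimming CPPPATH on a
# per-library basis shrinks K, which is exactly the saving described above.
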
On my system (Ubuntu 12.04 LTS) a full update with "scons all" takes about 4 minutes. But when I tell SCons that no dependencies have changed, with "scons --max-drift=1 --implicit-deps-unchanged all", this boils down to about 25 seconds!

You can speed up your build by only building the targets you need instead of "all", which builds a lot of unit testing targets that you may or may not be utilizing.
Some common targets are:
core  (builds mongod, mongos, mongo shell)
tools (builds the mongo support tools)

Selecting targets helps a little, especially for a clean build. But you could significantly reduce your update times by trying to build the single libraries/programs with a minimal CPPPATH subset.
 
FWIW, "scons all" takes less than 8 seconds on my machine to tell me that nothing needs to be recompiled, so your milage may vary.  
As with many build systems, it's important to choose a proper -j concurrency value as well, to make sure your machine is heavily utilized but not overloaded.

Again, useful for clean builds...but the "-j" option has no effect on updates, sorry. And a switch to using the "timestamp" decider (just like "make") won't help either...

If you want to go about restructuring your current build descriptions and need some help with it, please come to our user ML and ask your questions there. We'll be more than happy to help you out and speed things up for mongodb development.

Best regards,

Dirk Baechle
 

Andy Schwerin

Apr 16, 2014, 10:12:09 AM
to mongo...@googlegroups.com
Dirk,

You assert that our current overhead comes from forcing SCons to search for headers in the "wrong places".  But your proof simply involves skipping all dependency scanning. Before we re-engineer our build system to cut down the CPPPATH for some compiles, I'd like to know if you've profiled the implicit dependency scanning code to see where the time goes.  Your argument implies that you believe it goes to stat-ing files to check for their existence, rather than time spent reading and parsing those files.  Do you have profiles or measurements that back up that claim, or a suggestion for tools we might use to verify it?

Thanks, by the way, for investigating.  We are very interested in build performance.

-Andy



dl9...@darc.de

Apr 17, 2014, 7:06:52 PM
to mongo...@googlegroups.com
Hi Andy,

and thanks for your reply. It's not only the scanning/stat-ing of files that affects performance in your case.

What I did with mongodb so far was:

1.) Looking at the output of the individual compile and linker commands, as displayed on stdout during a "clean" build.
2.) Inspecting the top-level SConstruct and some of the library SConscripts, to check how you set up things like Environments and VariantDirs.
3.) Running an SCons update for mongodb with the "--debug=memoizer" option, which shows some statistics about internal caching hits/misses:

Memoizer (memory cache) hits and misses:
       5387 hits      73 misses    Base.Rfindalldirs()
       1562 hits      32 misses    Base._gsm()
    3319048 hits    8275 misses    Base._save_str()
          0 hits       0 misses    Base.rentry()
      20330 hits   19414 misses    Base.stat()
      17289 hits    1236 misses    BuilderBase.get_src_builders()
      15791 hits    1236 misses    BuilderBase.subst_src_suffixes()
      11131 hits     281 misses    Dir.get_all_rdirs()
          0 hits       4 misses    Dir.rel_path()
      52142 hits    5867 misses    Dir.srcdir_find_file()
     238533 hits     293 misses    Dir.srcdir_list()
       7774 hits    3246 misses    Executor.get_build_env()
       4593 hits    1578 misses    Executor.get_contents()
          0 hits    1558 misses    Executor.get_unignored_sources()
     217028 hits   10844 misses    File.exists()
     534399 hits  172351 misses    File.get_found_includes()
       6461 hits    4938 misses    File.get_size()
      34507 hits    5034 misses    File.get_stored_info()
      16328 hits    4992 misses    File.get_timestamp()
     198274 hits   15475 misses    File.rfile()
     725133 hits   15802 misses    FileFinder.find_file()
          0 hits   13050 misses    Node._children_get()
    1303888 hits    6465 misses    Node.get_build_env()
       5963 hits      13 misses    PathListCache.PathList()

Quite a lot of find_includes (resulting in stat() calls) there, for a project with roughly 3500 header and 1500 source files, don't you think? ;)

4.) Patched the SCons sources locally, such that no MD5 computation would be performed internally on the file's contents, only on the file's size converted to a string. With the MD5 work out of the way, I ran another SCons update...which reduced the total update time from 82s to 55s. The difference seems to be based on the fact that all your main programs, "mongo*" and such, have sizes of 300-350MB...so MD5ing the total of 23GB that gets created by a build takes a little longer. ;) Do you really have to link your main programs against all these individual object files? This is definitely another point to investigate.

5.) Then I profiled an update run of the patched SCons version, to check where the remaining 55s are spent. You'll find the SVG attached, which again shows the large number of calls to the find_include() and get_found_includes() methods. Both are called more than 700000 times, which stresses the build significantly.
I mentioned a Wiki page in my last post, please check it out if you like. It has pointers to the repos where I archive all the test scripts and setups that I used for profiling SCons. I'm basically using cProfile for this, and then converting the profile results (pstats format) into DOT files, showing the call graph.
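
For reference, a minimal version of that workflow could look like this (assuming SCons' --profile option and the standard pstats module; the file name is just a placeholder):

# Collect a cProfile trace of an update run first, e.g.:
#   scons --profile=scons_update.prof all
# ...then inspect the hottest calls with the standard pstats module.
import pstats

stats = pstats.Stats('scons_update.prof')      # placeholder file name
stats.sort_stats('cumulative').print_stats(20)

# Entries like get_found_includes() / find_file() dominating the cumulative
# time point straight at the implicit dependency scan discussed above.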

I hope this is convincing enough for you to maybe have another close look at your current build setup.

Dirk

P.S.: I don't want to prove anything, and you aren't doing anything wrong in your current build. But if you're after a little more speed, I definitely see some potential there. The actual restructuring is rather straightforward, and can be done folder by folder. You'd have to clone your current main environment in each SConscript file:

Import('env')
localEnv = env.Clone()

, then append only the required CPPPATH entries for the local Libraries/Programs. You can keep the setup of all the other environment variables, like paths or compiler options such as "-pipe -O3", in the main "env". To be able to switch between a "yaml" library provided by the system's package manager and your local copy, you could define its path as a variable, env['YAMLCPPP'] = ..., in your global setup. In the local SConscript you could then reference it by

localEnv.Append(CPPPATH=['$YAMLCPPP'])

Just as a first idea...untested, and off the top of my head.
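
Pulling those fragments together, a per-library setup might look roughly like this (still only a sketch; the file names, library name and 'YAMLCPPP' path are placeholders, not the actual MongoDB build files):

# SConstruct (global setup); placeholder paths and names
env = Environment(CXXFLAGS=['-pipe', '-O3'])
env['YAMLCPPP'] = '#/src/third_party/yaml-cpp/include'   # or the system path
Export('env')
SConscript('src/some_lib/SConscript')

# src/some_lib/SConscript: clone the env, append only what this library needs
Import('env')
localEnv = env.Clone()
localEnv.Append(CPPPATH=['$YAMLCPPP'])
localEnv.StaticLibrary('some_lib', ['some_lib.cpp'])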

mongodb_update.svg

Andy Schwerin

Apr 18, 2014, 11:24:58 AM
to mongo...@googlegroups.com
Thanks, Dirk.  Build system performance is a frequent topic of conversation among MongoDB devs, and this will give us a good jumping off point.


Andrew Morrow

Jun 19, 2014, 4:39:18 PM
to mongo...@googlegroups.com

Hi Dirk -

I wanted to let you know that I've been experimenting with some of the ideas you outlined above. All of my experiments were based on building just the 'mongod' target, with 'scons -jN --dbg=on ./mongod' or some close variant, as this is the most frequently built target.

Taking your advice on reducing the CPPPATH did work to some degree. On my linux machine, a no-op rebuild of mongod with 'fat' include paths takes about 36 seconds when using -j24 with 24 cores. When reduced to 'thin' include paths on an as-needed-per-library basis, that drops to about 31 seconds. That is nice, but not a big gain.

I also investigated your suggestion about not using monolithic linking. With a little hacking of our SCons environment, I was able to build with dylibs on OS X. Here, I found that this made no difference to the time to do a -j10 (on a 4 core machine) no-op rebuild of mongod. This makes sense to me: SCons still needs to check objects in either the graph mongod -> [.a] -> [.o], or mongod -> [.dylib] -> [.os]. So the total number of bytes checksummed doesn't really change much.

One thing to note is that during all no-op rebuilds, it appears that only one core is ever used. Given that, I think the biggest gain we could get here would be if the -jN value allowed N threads to parallelize the dependency checking work. Most of my machine sits idle during a no-op rebuild.

Thanks,
Andrew

dl9...@darc.de

Jun 20, 2014, 7:23:36 AM
to mongo...@googlegroups.com
Hi Andrew,

and thanks for the update.


On Thursday, June 19, 2014 at 10:39:18 PM UTC+2, acm wrote:

Hi Dirk -

I wanted to let you know that I've been experimenting with some of the ideas you outlined above. All of my experiments were based on building just the 'mongod' target, with 'scons -jN --dbg=on ./mongod' or some close variant, as this is the most frequently built target.

 
Is this on a publicly accessible branch somewhere? I'd really like to have a second look at your latest changes.
 
Taking your advice on reducing the CPPPATH did work to some degree. On my linux machine, a no-op rebuild of mongod with 'fat' include paths takes about 36 seconds when using -j24 with 24 cores. When reduced to 'thin' include paths on an as-needed-per-library basis, that drops to about 31 seconds. That is nice, but not a big gain.


When discussing performance I'd like to clearly distinguish between "clean builds" (from scratch) and "zero builds" (during actual development, when everything is built and no files, or only some, have changed). For the first case, which you seem to be referring to, it doesn't make sense to compare absolute runtimes in my opinion. I'd rather look at the ratio of "time spent on actual compiling/building" vs. "total time", and try to get that closer to 100% (80% or up is very good already...depending a little on the structure of the project). It's not enough to say "this build took 36s with SCons"; you actually have to add "and x seconds were overhead, where nothing was built" to make an objective statement.

This is different for "zero builds", where the user is interested in having the build system return as fast as possible. In this case all files are built already, so the actual compile commands don't contribute to the total time. Here it's okay to directly compare the results of "/usr/bin/time"....
 
[...]

One thing to note is that during all no-op rebuilds, it appears that only one core is ever used. Given that, I think the biggest gain we could get here would be if the -jN value allowed N threads to parallelize the dependency checking work. Most of my machine sits idle during a no-op rebuild.


This isn't easy to change, it's a problem related to the Python GIL. It can't be circumvented unless you cut away basic parts of SCons' dependency checking...and then you can't use it anymore. ;)

So, the approach still has to be "make your -j1 build fast" instead of "hide the fact that your build isn't properly set up by using more and more cores". ;)
Sorry if this sounds over-simplified; it's not meant as an offense.

Finally, if you'd like to work on this actively, I'm here and ready to help out. Just create a branch, and point me to it...I'll hack away a little and see if I can come up with a few pull requests. I'd really like to keep you developers (and your users) convinced that SCons was a good choice as build system.

Regards,

Dirk

Andy Schwerin

Jun 20, 2014, 8:23:00 AM
to mongo...@googlegroups.com
On Fri, Jun 20, 2014 at 7:23 AM, <dl9...@darc.de> wrote:
Hi Andrew,

and thanks for the update.

On Thursday, June 19, 2014 at 10:39:18 PM UTC+2, acm wrote:

Hi Dirk -

I wanted to let you know that I've been experimenting with some of the ideas you outlined above. All of my experiments were based on building just the 'mongod' target, with 'scons -jN --dbg=on ./mongod' or some close variant, as this is the most frequently built target.

 
Is this on a publicly accessible branch somewhere? I'd really like to have a second look at your latest changes.
 
Taking your advice on reducing the CPPPATH did work to some degree. On my linux machine, a no-op rebuild of mongod with 'fat' include paths takes about 36 seconds when using -j24 with 24 cores. When reduced to 'thin' include paths on an as-needed-per-library basis, that drops to about 31 seconds. That is nice, but not a big gain.


When discussing performance I'd like to clearly distinguish between "clean builds" (from scratch), and "zero builds" (during actual development, when everything is built, and no or only some files have changed). For the first case, that you seem to be referring to, it doesn't make sense to compare absolute runtimes in my opinion. I'd rather look at the ratio of "time spent on actual compiling/building" vs. "total time", and try to get that closer to 100% (80% or up, are very good already...depending a little on the structure of the project). It's not enough to say "this build took 36s with SCons", you actually have to add "and x seconds were overhead, where nothing was built" to make an objective statement.

I believe Andrew meant "zero build" when he wrote "no-op build".  It was a build that ultimately determined that no work needed to be done, which is the case you expressed interest in.

I have noticed that different developers in our organization see somewhat widely varying times for their zero-builds, though I've never investigated the root cause.  It's often on similar hardware, but different devs tend to run different Linux distros, and with my luck, probably different versions of SCons and the other tools.  I usually see zero builds of the same product take 10-20 seconds, while Andrew and some others regularly see 30+ seconds.  Even at 10-20 seconds, we'd certainly be interested in shaving down the zero-build time.
 

This is different for "zero builds", where the user is interested in having the build system return as fast as possible. In this case all files are built already, so the actual compile commands don't contribute to the total time. Here it's okay to directly compare the results of "/usr/bin/time"....
 
[...]

One thing to note is that during all no-op rebuilds, it appears that only one core is ever used. Given that, I think the biggest gain we could get here would be if the -jN value allowed N threads to parallelize the dependency checking work. Most of my machine sits idle during a no-op rebuild.


This isn't easy to change, it's a problem related to the Python GIL. It can't be circumvented unless you cut away basic parts of SCons' dependency checking...and then you can't use it anymore. ;)

So, the approach still has to be "make your -j1 build fast" instead of "shadow that your build isn't properly setup, by using more and more cores". ;)
Sorry if this sounds over-simplified, it's not meant as an offense.

Finally, if you'd like to work on this actively, I'm here and ready to help out. Just create a branch, and point me to it...I'll hack away a little and see if I can come up with a few pull requests. I'd really like to keep you developers (and your users) convinced that SCons was a good choice as build system.

Regards,

Dirk

Andrew Morrow

Jun 20, 2014, 10:48:45 AM
to mongo...@googlegroups.com
On Fri, Jun 20, 2014 at 8:22 AM, Andy Schwerin <schw...@mongodb.com> wrote:



On Fri, Jun 20, 2014 at 7:23 AM, <dl9...@darc.de> wrote:
Hi Andrew,

and thanks for the update.

On Thursday, June 19, 2014 at 10:39:18 PM UTC+2, acm wrote:

Hi Dirk -

I wanted to let you know that I've been experimenting with some of the ideas you outlined above. All of my experiments were based on building just the 'mongod' target, with 'scons -jN --dbg=on ./mongod' or some close variant, as this is the most frequently built target.

 
Is this on a publicly accessible branch somewhere? I'd really like to have a second look at your latest changes.


Currently generates a "two different environments" warning that I've not yet sorted out, but otherwise everything should build OK. Warning: I feel entitled to rebase and force push to my personal branches.

 
 
Taking your advice on reducing the CPPPATH did work to some degree. On my linux machine, a no-op rebuild of mongod with 'fat' include paths takes about 36 seconds when using -j24 with 24 cores. When reduced to 'thin' include paths on an as-needed-per-library basis, that drops to about 31 seconds. That is nice, but not a big gain.


When discussing performance I'd like to clearly distinguish between "clean builds" (from scratch), and "zero builds" (during actual development, when everything is built, and no or only some files have changed). For the first case, that you seem to be referring to, it doesn't make sense to compare absolute runtimes in my opinion. I'd rather look at the ratio of "time spent on actual compiling/building" vs. "total time", and try to get that closer to 100% (80% or up, are very good already...depending a little on the structure of the project). It's not enough to say "this build took 36s with SCons", you actually have to add "and x seconds were overhead, where nothing was built" to make an objective statement.

I believe Andrew meant "zero build" when he wrote "no-op build".  It was a build that ultimately determined that no work needed to be done, which is the case you expressed interest in.

Definitely. Here are the numbers for a complete rebuild and for a "no-op build" or "zero build" on my machine, first run done without the 'less-includes' changes, second run done with. The SCons invocation on a 24 core machine with an SSD:

> time scons --silent --dbg=on -j24 all

//// "fat includes":

// full build
real 6m32.574s
user 117m22.574s
sys 9m42.272s

// no-op rebuild
real 1m54.708s
user 1m48.571s
sys 0m5.998s


//// "lean includes":
// full build
real 6m26.181s
user 117m28.063s
sys 9m36.597s

// no-op rebuild
real 1m47.248s
user 1m41.066s
sys 0m6.071s

These times look worse than in the earlier discussion because here I'm building every binary target (unit tests, server binaries, tools, etc.).

 

I have noticed that different developers in our organization see somewhat widely varying times for their zero-builds, though I've never investigated the root cause.  It's often on similar hardware, but different devs tend to run different Linux distros, and with my luck, probably different versions of SCons and the other tools.  I usually see zero builds of the same product take 10-20 seconds, while Andrew and some others regularly see 30+ seconds.  Even at 10-20 seconds, we'd certainly be interested in shaving down the zero-build time.

When I strace scons during a no-op rebuild, I see it doing tons of IO, presumably MD5ing. Maybe just different quality IO subsystems?

 
 

This is different for "zero builds", where the user is interested in having the build system return as fast as possible. In this case all files are built already, so the actual compile commands don't contribute to the total time. Here it's okay to directly compare the results of "/usr/bin/time"....
 
[...]

One thing to note is that during all no-op rebuilds, it appears that only one core is ever used. Given that, I think the biggest gain we could get here would be if the -jN value allowed N threads to parallelize the dependency checking work. Most of my machine sits idle during a no-op rebuild.


This isn't easy to change, it's a problem related to the Python GIL. It can't be circumvented unless you cut away basic parts of SCons' dependency checking...and then you can't use it anymore. ;)

I'd like to understand this better. 

 

So, the approach still has to be "make your -j1 build fast" instead of "shadow that your build isn't properly setup, by using more and more cores". ;)
Sorry if this sounds over-simplified, it's not meant as an offense.
No offense taken. Obviously, making our -j1 builds as fast as possible can only be a good thing. However, I still contend that once you have done that, it should be possible to capitalize on the available cores in all phases of the build, not just when calling out to external tools.

 

Finally, if you'd like to work on this actively, I'm here and ready to help out. Just create a branch, and point me to it...I'll hack away a little and see if I can come up with a few pull requests. I'd really like to keep you developers (and your users) convinced that SCons was a good choice as build system.

We are definitely interested. We have a substantial backlog of things that we would like to see improved, both in our use of SCons, and in SCons itself. Personally, my view is that an excellent build system pays dividends in developer productivity. However, we have found that many of our efforts to improve SCons or our use of it end in failure, making it hard to justify investing further developer time. Feel free to ping me off-list if you would like to discuss those aspects, as that is really more scons-dev than mongodb-dev.


 

dl9...@darc.de

Jun 20, 2014, 5:01:03 PM
to mongo...@googlegroups.com


On Friday, June 20, 2014 at 4:48:45 PM UTC+2, acm wrote:



On Fri, Jun 20, 2014 at 8:22 AM, Andy Schwerin <schw...@mongodb.com> wrote:



On Fri, Jun 20, 2014 at 7:23 AM, <dl9...@darc.de> wrote:
Hi Andrew,

and thanks for the update.

On Thursday, June 19, 2014 at 10:39:18 PM UTC+2, acm wrote:

Hi Dirk -

I wanted to let you know that I've been experimenting with some of the ideas you outlined above. All of my experiments were based on building just the 'mongod' target, with 'scons -jN --dbg=on ./mongod' or some close variant, as this is the most frequently built target.

 
Is this on a publicly accessible branch somewhere? I'd really like to have a second look at your latest changes.



I had a first look at your changes, which got build times for a no-op update of "mongodb" down to 31 seconds (from 36s) for you. When calling "scons --dbg=on --tree=derived ./mongodb", I see that the target depends on 881 *.o files. But your patches in

src/mongo/db/fts/SConscript
src/mongo/db/sorter/SConscript
src/mongo/util/options_parser/SConscript
src/third_party/SConscript
src/third_party/s2/SConscript
src/third_party/v8-3.25/SConscript
src/third_party/v8/SConscript

affects only 187 of those object files. So, from my angle following this strategy further appears to make sense (simple extrapolation: 36.0s - (881.0/187.0 * 5.0s) = 12.44s).
 
 
 

This is different for "zero builds", where the user is interested in having the build system return as fast as possible. In this case all files are built already, so the actual compile commands don't contribute to the total time. Here it's okay to directly compare the results of "/usr/bin/time"....
 
[...]

One thing to note is that during all no-op rebuilds, it appears that only one core is ever used. Given that, I think the biggest gain we could get here would be if the -jN value allowed N threads to parallelize the dependency checking work. Most of my machine sits idle during a no-op rebuild.


This isn't easy to change, it's a problem related to the Python GIL. It can't be circumvented unless you cut away basic parts of SCons' dependency checking...and then you can't use it anymore. ;)

I'd like to understand this better. 

 

This would probably take reading several design documents/mail threads and also inspecting parts of the source code. If you really want to go down that road, please write to the SCons user mailing list. The core developers can then take you on a tour through the Taskmaster. ;)

Tyler Brock

Jun 20, 2014, 6:18:53 PM
to mongo...@googlegroups.com
There is also this page which I found super interesting:


It's mostly exploratory but there are some good ideas there.

Andrew Morrow

Jun 21, 2014, 11:50:45 AM
to mongo...@googlegroups.com

Is this on a publicly accessible branch somewhere? I'd really like to have a second look at your latest changes.



I had a first look at your changes, that got build times for a no-op update of "mongodb" down to 31 seconds (from 36s) for you. When calling "scons --dbg=on --tree=derived ./mongodb", I see that the target depends on 881 *.o files. But your patches in

src/mongo/db/fts/SConscript
src/mongo/db/sorter/SConscript
src/mongo/util/options_parser/SConscript
src/third_party/SConscript
src/third_party/s2/SConscript
src/third_party/v8-3.25/SConscript
src/third_party/v8/SConscript

affects only 187 of those object files. So, from my angle following this strategy further appears to make sense (simple extrapolation: 36.0s - (881.0/187.0 * 5.0s) = 12.44s).

I think I disagree with your assessment of how many .o builds were affected. Without my patch, the CPPPATH for every .o contains all of the include paths for all of the third party libraries. My patch reverses this, defaulting to a more or less empty CPPPATH, and adds items back on an as-needed basis.

So, for instance, it used to be that every .o we built had the v8 includes on CPPPATH:


Now only a handful of files do:


Some libraries, like boost, are systemic so cannot be scoped more narrowly.

Similarly, most of the libraries in third_party previously had the CPPPATH set up to see into the src/mongo directory for headers. After my change, that is no longer true:



 
 

This isn't easy to change, it's a problem related to the Python GIL. It can't be circumvented unless you cut away basic parts of SCons' dependency checking...and then you can't use it anymore. ;)

I'd like to understand this better. 

 

This would probably take reading several design documents/mail threads and also inspecting parts of the source code. If you really want to go down that road, please write to the SCons user mailing list. The core developers can then take you on a tour through the Taskmaster. ;)

Fair enough.

dl9...@darc.de

Jun 23, 2014, 4:38:15 PM
to mongo...@googlegroups.com


On Saturday, June 21, 2014 at 5:50:45 PM UTC+2, acm wrote:



Is this on a publicly accessible branch somewhere? I'd really like to have a second look at your latest changes.



I had a first look at your changes, that got build times for a no-op update of "mongodb" down to 31 seconds (from 36s) for you. When calling "scons --dbg=on --tree=derived ./mongodb", I see that the target depends on 881 *.o files. But your patches in

src/mongo/db/fts/SConscript
src/mongo/db/sorter/SConscript
src/mongo/util/options_parser/SConscript
src/third_party/SConscript
src/third_party/s2/SConscript
src/third_party/v8-3.25/SConscript
src/third_party/v8/SConscript

affects only 187 of those object files. So, from my angle following this strategy further appears to make sense (simple extrapolation: 36.0s - (881.0/187.0 * 5.0s) = 12.44s).

I think I disagree with your assessment of how many .o builds were affected. Without my patch, the CPPPATH for every .o contains all of the include paths for all of the third party libraries. My patch reverses this, defaulting to a more or less empty CPPPATH, and adds items back on an as-needed basis.


Based on your patch, I started some further experiments and profiling. One major problem with your build is that you seem to add the "-ggdb" option...even when compiling the release version. This leads to very large object and archive files (*.o/*.a) and means a lot of additional work when checking for changed contents. I wrote a small Python script that calls "md5sum" for each *.o/*.a file, and for an update (no changes) build of

/usr/bin/time scons --debug=time --dbg=on
...
scons
: done building targets.
Total build time: 68.442516 seconds
Total SConscript file execution time: 7.191091 seconds
Total SCons execution time: 61.221189 seconds
Total command execution time: 0.030236 seconds
35.96user 2.91system 1:10.33elapsed 55%CPU (0avgtext+0avgdata 201360maxresident)k
5936416inputs+21048outputs (40major+71637minor)pagefaults 0swaps

the actual computation of all the checksums takes 39.93s.
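
The script itself isn't attached, but a rough equivalent is easy to reconstruct (a sketch that uses Python's hashlib instead of shelling out to md5sum; 'build' stands in for the actual build output directory):

# Checksum every *.o/*.a under the build directory and report how long that
# alone takes, to isolate the MD5 cost from the rest of the update.
import hashlib
import os
import time

start = time.time()
total_bytes = 0
for root, _, files in os.walk('build'):            # placeholder build dir
    for name in files:
        if name.endswith(('.o', '.a')):
            path = os.path.join(root, name)
            with open(path, 'rb') as f:
                hashlib.md5(f.read()).hexdigest()
            total_bytes += os.path.getsize(path)

print('checksummed %.1f MB in %.2f s' % (total_bytes / 1e6, time.time() - start))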

By removing the "-ggdb" option I could reduce the time for a "/usr/bin/time scons --debug=time ./mongod" update build from

Total build time: 65.365845 seconds
Total SConscript file execution time: 7.664901 seconds
Total SCons execution time: 57.687046 seconds
Total command execution time: 0.013898 seconds
31.08user 2.89system 1:07.31elapsed 50%CPU (0avgtext+0avgdata 198140maxresident)k
5993552inputs+27120outputs (47major+76141minor)pagefaults 0swaps

to

Total build time: 26.096024 seconds
Total SConscript file execution time: 5.535025 seconds
Total SCons execution time: 20.560151 seconds
Total command execution time: 0.000848 seconds
22.40user 0.76system 0:26.79elapsed 86%CPU (0avgtext+0avgdata 198112maxresident)k
53672inputs+27120outputs (11major+78172minor)pagefaults 0swaps


As a next step, I trimmed this down even further by removing the third-party libraries (especially Boost, with its large number of intertwined headers) from CPPPATH, as described in http://stackoverflow.com/questions/2441047/how-do-i-set-scons-system-include-path . New update times were

Total build time: 11.067789 seconds
Total SConscript file execution time: 3.529675 seconds
Total SCons execution time: 7.537188 seconds
Total command execution time: 0.000926 seconds
10.87user 0.42system 0:11.31elapsed 99%CPU (0avgtext+0avgdata 116648maxresident)k
0inputs+11648outputs (0major+53041minor)pagefaults 0swaps


and when adding the options "--max-drift=1 --implicit-deps-unchanged" this yields

Total build time: 7.829032 seconds
Total SConscript file execution time: 3.462231 seconds
Total SCons execution time: 4.365900 seconds
Total command execution time: 0.000901 seconds
7.74user 0.30system 0:08.06elapsed 99%CPU (0avgtext+0avgdata 113588maxresident)k
0inputs+11648outputs (0major+51085minor)pagefaults 0swaps


The remaining 4.3s for the build would be quite acceptable, I think. The 3.4s for reading all the SConscript files are mainly spent on running the Configure steps over and over again. This effort could be reduced by adding some code that writes an options.py file with the detected values on the first call, and simply reads it back in on following runs. Several projects are doing this successfully; see the UserGuide ( http://www.scons.org/doc/production/HTML/scons-user.html ), sect. "Reading Build Variables From a File".
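
A minimal sketch of that pattern (the cached check and file name are placeholders; see the linked UserGuide section for the full mechanism):

# SConstruct fragment: cache a Configure result in build_options.py so that
# later runs can skip the check. Sketch only, not the MongoDB build.
vars = Variables('build_options.py')
vars.Add('HAVE_SOME_LIB', 'cached result of a configure check', '')
env = Environment(variables=vars)

if env['HAVE_SOME_LIB'] == '':            # not cached yet, run the check once
    conf = Configure(env)
    env['HAVE_SOME_LIB'] = 'yes' if conf.CheckLib('somelib') else 'no'
    env = conf.Finish()
    vars.Save('build_options.py', env)    # later runs read the cached value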

The changes I listed above are the points where you could gain some speed...now it's up to you to decide where you want to lose some security. ;)
Find attached the diff for removing the third-party libs from CPPPATH, and adding them to CXXFLAGS directly instead (Linux only).
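
The attached patch isn't reproduced here, but the core idea looks roughly like this (a sketch for a GCC-style compiler on Linux; the boost path is a placeholder):

# SConscript fragment: keep third-party headers off CPPPATH so SCons never
# scans them, and hand the directory straight to the compiler instead.
env = Environment()

# Before: SCons scans every boost header it can reach.
#   env.Append(CPPPATH=['#/src/third_party/boost'])

# After: the compiler still finds the headers, SCons doesn't track them.
env.Append(CXXFLAGS=['-isystem', Dir('#/src/third_party/boost').abspath])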


Best regards,

Dirk

mongodb_speedup.patch

Andrew Morrow

Jun 23, 2014, 7:52:32 PM
to mongo...@googlegroups.com
On Mon, Jun 23, 2014 at 4:38 PM, <dl9...@darc.de> wrote:



Based on your patch, I started some further experiments and profilings.

Thanks! We do appreciate your help with this.
Unfortunately, this suggestion doesn't really help us. Quick re-build cycles are most important during the edit, compile, run, debug cycle - exactly when you need the debug info. Our release builds are never incremental so we don't care too much about startup time there.

I experimented a bit with using debug fission (-gsplit-dwarf) and/or assembler and link-time debug compression with -Wa,--compress-debug-sections and -Wl,--compress-debug-sections=zlib to reduce the amount of data to be checksummed. That seemed like a promising avenue, but unfortunately all I succeeded in doing was segfaulting ld.gold a lot. Maybe later?
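
For anyone picking this up later, the experiment boils down to something like this on the SCons side (an illustrative sketch only; whether the toolchain survives it is exactly the open question above):

# Sketch: split the DWARF data out of the objects and compress what remains,
# so there is less data for SCons to checksum on each update.
env = Environment()
env.Append(CCFLAGS=['-gsplit-dwarf', '-Wa,--compress-debug-sections'])
env.Append(LINKFLAGS=['-Wl,--compress-debug-sections=zlib'])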

In any event, I don't think turning off debug symbols is an option for us.
 
. In a next step, I trimmed this down even further by removing the third-party libraries (especially Boost, with its large number of intertwined headers) from CPPPATH as described in http://stackoverflow.com/questions/2441047/how-do-i-set-scons-system-include-path . New update times were

Total build time: 11.067789 seconds
Total SConscript file execution time: 3.529675 seconds
Total SCons execution time: 7.537188 seconds
Total command execution time: 0.000926 seconds
10.87user 0.42system 0:11.31elapsed 99%CPU (0avgtext+0avgdata 116648maxresident)k
0inputs+11648outputs (0major+53041minor)pagefaults 0swaps



I can confirm that this does speed things up. In some quick tests, by about 30%. I do have some reservations about it though, because we do sometimes commit changes to the files in third_party, and it would be unclear as a developer when I needed to handle that case as a result of a 'git pull'. We would definitely need a way to disable this feature for when people are actively working in the third_party directory. So we would need some sort of --third-party-dependency-checking=['loose', 'strict'] flag and then plumb that through to add either to CPPPATH or CXXFLAGS as appropriate.

In addition, it introduces a portability nit, in that CPPPATH knows how to inject the correct compiler flag to add something to the search path, where CXXFLAGS doesn't.

On the other hand, if this was guarded behind a flag, is that of any advantage over just saying --implicit-cache?
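
For illustration, such a flag could be plumbed through with SCons' standard AddOption/GetOption along these lines (the flag name and paths are hypothetical, not something that exists in the build today):

# Hypothetical --third-party-dependency-checking flag: 'strict' keeps the
# third-party headers on CPPPATH (full scanning), 'loose' moves them to raw
# compiler flags so SCons skips them. Paths are placeholders.
AddOption('--third-party-dependency-checking',
          dest='tp_dep_checking',
          type='choice', choices=['loose', 'strict'],
          nargs=1, action='store', default='strict',
          help='dependency checking mode for src/third_party headers')

env = Environment()
third_party_includes = [Dir('#/src/third_party/boost').abspath]

if GetOption('tp_dep_checking') == 'strict':
    env.Append(CPPPATH=third_party_includes)
else:
    for d in third_party_includes:
        env.Append(CXXFLAGS=['-isystem', d])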

 
and when adding the options "--max-drift=1 --implicit-deps-unchanged" this yields

Total build time: 7.829032 seconds
Total SConscript file execution time: 3.462231 seconds
Total SCons execution time: 4.365900 seconds
Total command execution time: 0.000901 seconds
7.74user 0.30system 0:08.06elapsed 99%CPU (0avgtext+0avgdata 113588maxresident)k
0inputs+11648outputs (0major+51085minor)pagefaults 0swaps


The remaining 4.3s for the build would be quite acceptable, I think. The 3.4s for reading all the SConscript files are mainly spent on running the Configure steps over and over again. This effort

For me at least, the second run prints '(cached)' for almost all configure checks, so I don't think much time is being spent on the configure step on rebuilds.
 
could get reduced by adding some code that writes an options.py file with the detected values at the first call, and simply reads it back in following runs. Several projects are doing this successfully, see UserGuide ( http://www.scons.org/doc/production/HTML/scons-user.html ), sect. "Reading Build Variables From a File" .
 
The changes I listed above are the points where you could gain some speed...now it's up to you to decide where you want to lose some security. ;)
Find attached the diff for removing the third-party libs from CPPPATH, and adding them to CXXFLAGS directly instead (Linux only).

We've had, for a good while, a build flag --build-fast-and-loose which does the following:

if has_option('build-fast-and-loose'):
    env.Decider('MD5-timestamp')
    env.SetOption('max_drift', 1)
    env.SourceCode('.', None)
 
This does seem to accelerate things somewhat, but for whatever reason it doesn't seem to get much use in practice. Perhaps we should consider making it the default for non-release builds. I'd be interested in your thoughts on the real-world risks of the above settings for developer builds. We already forbid using it for release builds.


Overall, here is where things are for zero (a.k.a no-op) builds. Using the 'less-includes' branch (as is, using CPPPATH, not CXXFLAGS):

> scons --debug=time --dbg=on -j24 ./mongod
Total build time: 31.137041 seconds
Total SConscript file execution time: 4.179627 seconds
Total SCons execution time: 26.932774 seconds
Total command execution time: 0.024640 seconds


Adding --build-fast-and-loose:

> scons --debug=time --dbg=on -j24 ./mongod --build-fast-and-loose
Total build time: 22.553228 seconds
Total SConscript file execution time: 4.205896 seconds
Total SCons execution time: 18.322773 seconds
Total command execution time: 0.024559 seconds


Adding --implicit-cache:

> scons --debug=time --dbg=on -j24 ./mongod --build-fast-and-loose --implicit-cache
Total build time: 17.671432 seconds
Total SConscript file execution time: 4.205080 seconds
Total SCons execution time: 13.442400 seconds
Total command execution time: 0.023952 seconds

That is good progress from my initial starting point of around 35 seconds. Any thoughts on how to get insight into what that 13 seconds of SCons execution time is doing? With --build-fast-and-loose and --implicit-cache in play, I'd expect basically no MD5'ing or dependency scanning, so 13 seconds seems pretty high.

Thanks again for all of your help,
Andrew


dl9...@darc.de

Jun 24, 2014, 12:46:16 PM
to mongo...@googlegroups.com


On Tuesday, June 24, 2014 at 1:52:32 AM UTC+2, acm wrote:



On Mon, Jun 23, 2014 at 4:38 PM, <dl9...@darc.de> wrote:



Based on your patch, I started some further experiments and profilings.

Thanks! We do appreciate your help with this.


You're very welcome.
 
 
One major problem with your build is, that you seem to add the "-ggdb" option...even when compiling the release version. This leads to very large object and archive files (*.o/*.a) and means a lot of additional work when checking for changed contents. [...]

Unfortunately, this suggestion doesn't really help us. Quick re-build cycles are most important during the edit, compile, run, debug cycle - exactly when you need the debug info. Our release builds are never incremental so we don't care too much about startup time there.


That's okay...and it's completely your choice. But if you want to check all your objects/archives for content-based changes, it takes time.
 
[...]
I can confirm that this does speed things up. In some quick tests, by about 30%. I do have some reservations about it though, because we do sometimes commit changes to the files in third_party, and it would be unclear as a developer when I needed to handle that case as a result of a 'git pull'. We would definitely need a way to disable this feature for when people are actively working in the third_party directory. So we would need some sort of --third-party-dependency-checking=['loose', 'strict'] flag and then plumb that through to add either to CPPPATH or CXXFLAGS as appropriate.

In addition, it introduces a portability nit, in that CPPPATH knows how to inject the correct compiler flag to add something to the search path, where CXXFLAGS doesn't.

On the other hand, if this was guarded behind a flag, is that of any advantage over just saying --implicit-cache?


By appending Boost and the other third-party libs to CPPPATH you're adding a lot of children to each object/archive file, about 200-500 per target (!). This lets the dependency tree grow very fast, which stresses the whole build and not only the parts where "--implicit-cache" gets into play directly. So yes, there is a difference between the two approaches I'd say.
 
 
and when adding the options "--max-drift=1 --implicit-deps-unchanged" this yields 
[...]
 
The remaining 4.3s for the build would be quite acceptable, I think. The 3.4s for reading all the SConscript files are mainly spent on running the Configure steps over and over again. This effort

For me at least, the second run prints '(cached)' for almost all configure checks, so I don't think much time is being spent on the configure step on rebuilds.

Sure, I just wanted to mention it as an additional option...
 

We've had, for a good while, a build flag --build-fast-and-loose which does the following:

if has_option('build-fast-and-loose'):
    env.Decider('MD5-timestamp')
    env.SetOption('max_drift', 1)
    env.SourceCode('.', None)
 
This does seem to accelerate things somewhat, but for whatever reason it doesn't seem to get much use in practice. Perhaps we should consider making it the default for non-release builds. I'd be interested in your thoughts on the real-world risks of the above settings for developer builds. We already forbid using it for release builds.


The SourceCode() method is deprecated now, so it shouldn't have any negative/positive effect whatsoever in the latest SCons versions 2.x and above. Switching to a different Decider function usually doesn't save you any runtime. It's a common misconception that an env.Decider("timestamp") call would turn off MD5 checksum computation completely. In the background, SCons *always* computes the content signature and keeps it up-to-date in the signature database, such that it is always available. Setting a different Decider just means that a different combination of the available infos (timestamp, file length, content checksum) is used to determine whether a file has changed.
There is one exception, and this is where the "--max-drift" parameter comes to the rescue. If the file's content signature from the database is older than "max-drift" seconds, it is regarded as "trustworthy" and won't get recomputed. So, unless you use SCons' CacheDir features, where you might get trapped by time differences between all the connected remote machines, and you only work on a single local filesystem, you can safely set "--max-drift=1".
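
To make the mechanism concrete: the Decider is just a callable that gets handed the stored node info (timestamp, size, content signature), so the combination of checks is entirely up to the build; as noted above, though, the signatures are computed and stored either way. A sketch along the lines of the user guide example (not something the MongoDB build uses today):

# Trust the timestamp first; only compare stored content signatures when the
# timestamp actually moved, so a bare "touch" does not force a rebuild.
def timestamp_then_content(dependency, target, prev_ni):
    if prev_ni is None or not hasattr(prev_ni, 'csig'):
        return True                  # no stored info yet, treat as changed
    if dependency.get_timestamp() == prev_ni.timestamp:
        return False                 # timestamp unchanged, assume content too
    return dependency.get_csig() != prev_ni.csig

env = Environment()
env.Decider(timestamp_then_content)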
 

Overall, here is where things are for zero (a.k.a no-op) builds. Using the 'less-includes' branch (as is, using CPPPATH, not CXXFLAGS):
[...]
> scons --debug=time --dbg=on -j24 ./mongod --build-fast-and-loose --implicit-cache
Total build time: 17.671432 seconds
Total SConscript file execution time: 4.205080 seconds
Total SCons execution time: 13.442400 seconds
Total command execution time: 0.023952 seconds

That is good progress from my initial starting point of around 35 seconds. Any thoughts on how to get insight into what that 13 seconds of SCons execution time is doing? With --build-fast-and-loose and --implicit-cache in play, I'd expect basically no MD5'ing or dependency scanning, so 13 seconds seems pretty high.

This would actually require another profiling run to be sure, but my first bet would be the env.subst() method, which is responsible for all the variable substitution and expansion. On top of all the successful changes, you could try my "fastcpp" Tool (see https://bitbucket.org/dirkbaechle/scons_fastcpp ) and see if it saves you a few more cycles. Please take the disclaimer in the README seriously...

Dirk
