Dynamic -j thread allocation


Richard Geary

Jul 19, 2012, 1:54:47 PM
to ninja...@googlegroups.com
Ninja has the -j setting to decide how parallel the build process is, but it's a static decision. My build spawns lots of lightweight jobs initially, then very memory-hungry processes later. I'm running on a slice of a 6-CPU machine. As a result, the fastest way to build is to start with -j16, then Ctrl-C the build when it starts to run out of memory and page to disk, and restart with -j2. I also have to watch out that I don't hog the CPU resources for other users.

Could there be a dynamic solution for this? Perhaps you could specify how much CPU & memory you want to use, and Ninja would allocate the appropriate number of threads dynamically? Also, the previous timing information in the logs might provide a hint on how long each build target will take to run. You could extend the log with peak memory requirements.

Scott Graham

Jul 19, 2012, 2:02:44 PM
to ninja...@googlegroups.com
There's a -l option (that's an L) that I think was trying to do something like that (Linux only, and maybe Mac).

Evan Martin

Jul 19, 2012, 2:42:13 PM
to ninja...@googlegroups.com
One idea we'd tossed around is that you could annotate a given build
rule with some "metric" attribute that Ninja would attempt to balance
without needing to know what the metric is. For example, if your
build is memory-constrained you could tag each build rule with an
estimation of how much memory that rule would take, and then you could
tell Ninja to run as many jobs as possible such that the sum of the
metric is below some specified limit.

In some sense that's just a generalization of the current -j behavior
where each job has an implied metric value of 1.

But maybe all of what I just described is too generic and not
pragmatic enough; maybe the only metrics real projects care about are
memory and CPU. I know in Chrome's case we couldn't run multiple
links at the same time due to memory constraints...
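The metric-budget idea could be sketched roughly as follows (a hypothetical Python sketch, not Ninja code; the class and method names are made up): each job carries a metric value, and a new job may start only while the running jobs' metrics stay within the limit. With every metric equal to 1 this reduces to plain -j.

```python
class MetricScheduler:
    """Sketch of a generic 'metric budget' job gate."""

    def __init__(self, limit):
        self.limit = limit   # e.g. a memory budget in GB
        self.in_use = 0.0    # sum of metrics of running jobs

    def can_start(self, metric):
        # Always allow one job so a single huge task cannot deadlock the build.
        return self.in_use == 0 or self.in_use + metric <= self.limit

    def start(self, metric):
        assert self.can_start(metric)
        self.in_use += metric

    def finish(self, metric):
        self.in_use -= metric


sched = MetricScheduler(limit=4)   # pretend: 4 GB memory budget
sched.start(3.0)                   # a 3 GB link step
assert not sched.can_start(2.0)    # 3 + 2 would exceed the budget
sched.finish(3.0)
assert sched.can_start(2.0)
```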

On Thu, Jul 19, 2012 at 10:54 AM, Richard Geary <richar...@gmail.com> wrote:

Nico Weber

Jul 19, 2012, 4:20:36 PM
to ninja...@googlegroups.com
On Thu, Jul 19, 2012 at 11:42 AM, Evan Martin <mar...@danga.com> wrote:
> One idea we'd tossed around is that you could annotate a given build
> rule with some "metric" attribute that Ninja would attempt to balance
> without needing to know what the metric is. For example, if your
> build is memory-constrained you could tag each build rule with an
> estimation of how much memory that rule would take, and then you could
> tell Ninja to run as many jobs as possible such that the sum of the
> metric is below some specified limit.
>
> In some sense that's just a generalization of the current -j behavior
> where each job has an implied metric value of 1.
>
> But maybe all of what I just described is too generic and not
> pragmatic enough; maybe the only metrics real projects care about are
> memory and CPU. I know in Chrome's case we couldn't run multiple
> links at the same time due to memory constraints...

We see this issue in chromium-land with compiler processes too: when ninja
(or make) builds all the V8DerivedSourcesNN.cpp files, normal
computers get very slow.

I think it'd be useful if ninja wouldn't start new processes if it
notices the system is low on memory and at least 1 ninja child is
running. I don't know how feasible it is to detect "low on memory"
though.

Nico

Nicolas Desprès

Jul 20, 2012, 2:33:43 AM
to ninja...@googlegroups.com
It also works on Mac. It uses getloadavg(3). There is a "todo" for Windows. I don't know if it takes memory into account. It reduces the number of jobs running, down to 1 if necessary, until the load average is less than the given limit. You can see its effect by adding the number of running jobs to your ninja status, like this: export NINJA_STATUS="[%u/%r/%f] " (see http://martine.github.com/ninja/manual.html#_environment_variables). If that is not enough, you can still nice(1) ninja.
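As a rough illustration of what the -l check boils down to (a Python sketch, not the actual C++ implementation; the function name is made up), it compares the most recent getloadavg(3) sample against the user's limit:

```python
import os

def below_load_limit(max_load):
    # getloadavg(3) returns the 1-, 5-, and 15-minute load averages;
    # the -l style check only looks at the first (1-minute) sample.
    load_1min, _, _ = os.getloadavg()
    return load_1min < max_load

# e.g. only start another job while the 1-minute load stays under 6:
# if below_load_limit(6.0): start_next_job()
```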

Cheers,
Nico
--
Nicolas Desprès

Nicolas Desprès

Jul 20, 2012, 2:40:36 AM
to ninja...@googlegroups.com
On Thu, Jul 19, 2012 at 8:42 PM, Evan Martin <mar...@danga.com> wrote:
One idea we'd tossed around is that you could annotate a given build
rule with some "metric" attribute that Ninja would attempt to balance
without needing to know what the metric is.  For example, if your
build is memory-constrained you could tag each build rule with an
estimation of how much memory that rule would take, and then you could
tell Ninja to run as many jobs as possible such that the sum of the
metric is below some specified limit.

Memory constraints would be easy to implement on all platforms, contrary to the load average limit.
 

In some sense that's just a generalization of the current -j behavior
where each job has an implied metric value of 1.

But maybe all of what I just described is too generic and not
pragmatic enough; maybe the only metrics real projects care about are
memory and CPU.  I know in Chrome's case we couldn't run multiple
links at the same time due to memory constraints...

It would be interesting to see if -l works in this case.
 

On Thu, Jul 19, 2012 at 10:54 AM, Richard Geary <richar...@gmail.com> wrote:
> Ninja has the -j setting to decide how parallel the build process is, but
> it's a static decision. My build spawns lots of lightweight builds
> initially, then very memory-hungry processes later. I'm running on a slice
> of a 6 CPU machine. The outcome is the fastest way to build is to set -j16
> to start, then Ctrl-C the build when it starts to run out of memory & page
> to disk, then switch to -j2. I also have to watch out that I don't hog the
> CPU resources for other users.
>
> Could there be a dynamic solution for this? Perhaps specify how much CPU &
> memory you want to use and Ninja allocates the appropriate number of threads
> dynamically? Also the previous timing information in the logs might provide
> a hint on how to long each build target will take to run. You could extend
> the log with peak memory requirements.
>



--
Nicolas Desprès

Nicolas Desprès

Jul 20, 2012, 2:45:41 AM
to ninja...@googlegroups.com
On Thu, Jul 19, 2012 at 10:20 PM, Nico Weber <tha...@chromium.org> wrote:
On Thu, Jul 19, 2012 at 11:42 AM, Evan Martin <mar...@danga.com> wrote:
> One idea we'd tossed around is that you could annotate a given build
> rule with some "metric" attribute that Ninja would attempt to balance
> without needing to know what the metric is.  For example, if your
> build is memory-constrained you could tag each build rule with an
> estimation of how much memory that rule would take, and then you could
> tell Ninja to run as many jobs as possible such that the sum of the
> metric is below some specified limit.
>
> In some sense that's just a generalization of the current -j behavior
> where each job has an implied metric value of 1.
>
> But maybe all of what I just described is too generic and not
> pragmatic enough; maybe the only metrics real projects care about are
> memory and CPU.  I know in Chrome's case we couldn't run multiple
> links at the same time due to memory constraints...

We see this issue in chromium-land with compiler processes too: When ninja
(or make) builds all the V8DerivedSourcesNN.cpp files, normal
computers get very slow.

I think it'd be useful if ninja wouldn't start new processes if it
notices the system is low on memory and at least 1 ninja child is
running. I don't know how feasible it is to detect "low on memory"
though.

What about thresholding the ratio of free memory over the load average?

From getloadavg(3):
     The getloadavg() function returns the number of processes in the
     system run queue averaged over various periods of time. [...]

-Nico 

Nico

>
> On Thu, Jul 19, 2012 at 10:54 AM, Richard Geary <richar...@gmail.com> wrote:
>> Ninja has the -j setting to decide how parallel the build process is, but
>> it's a static decision. My build spawns lots of lightweight builds
>> initially, then very memory-hungry processes later. I'm running on a slice
>> of a 6 CPU machine. The outcome is the fastest way to build is to set -j16
>> to start, then Ctrl-C the build when it starts to run out of memory & page
>> to disk, then switch to -j2. I also have to watch out that I don't hog the
>> CPU resources for other users.
>>
>> Could there be a dynamic solution for this? Perhaps specify how much CPU &
>> memory you want to use and Ninja allocates the appropriate number of threads
>> dynamically? Also the previous timing information in the logs might provide
>> a hint on how to long each build target will take to run. You could extend
>> the log with peak memory requirements.
>>



--
Nicolas Desprès

Philip Craig

Jul 20, 2012, 4:55:26 AM
to ninja...@googlegroups.com
We also see this issue in our builds -- on a busy machine that is doing something else memory intensive, our ninja builds can fail to fork new processes due to low memory on the (Linux) machine. We haven't solved it.

Evan Martin

Jul 20, 2012, 10:02:21 AM
to ninja...@googlegroups.com
On Fri, Jul 20, 2012 at 1:55 AM, Philip Craig <phi...@pobox.com> wrote:
> We also see this issue in our builds -- on a busy machine that is doing
> something else memory intensive, our ninja builds can fail to fork new
> processes due to low memory on the (Linux) machine. We haven't solved it.

Does Ninja fail gracefully? I wonder if we should do something like
"wait ten seconds and try again" if fork fails.
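That "wait and retry" idea could look something like this (a hypothetical Python sketch; `spawn_with_retry` and its parameters are made-up names, and Ninja itself would do this around its fork/exec path in C++):

```python
import errno
import time

def spawn_with_retry(spawn, retries=3, delay=10.0):
    # If spawning a child fails with ENOMEM, sleep and try again
    # instead of aborting the whole build with a fatal error.
    for attempt in range(retries):
        try:
            return spawn()
        except OSError as e:
            if e.errno != errno.ENOMEM or attempt == retries - 1:
                raise  # a different error, or out of retries
            time.sleep(delay)
```

A real implementation would probably also want to stop launching *new* jobs while any spawn is in this retry loop, since running jobs freeing memory is what makes the retry succeed.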

Philip Craig

Jul 20, 2012, 10:34:02 AM
to ninja...@googlegroups.com, mar...@danga.com
It's not very graceful. Here is a typical line:
ninja: FATAL: fork: Cannot allocate memory 

iannucci

Aug 28, 2012, 5:49:28 PM
to ninja...@googlegroups.com, mar...@danga.com
Yeah something along these lines would be awesome...

Perhaps a metrics approach with some solid default implementations (e.g. a memory metric proportional to the size of the inputs, a CPU metric based on an estimated expected load, "no more than X of this rule at a time", etc.) could work. It keeps the ninja-side implementation clean, but addresses the practical concerns.

I'd love to help implement this.

Richard Geary

Aug 29, 2012, 3:46:21 AM
to ninja...@googlegroups.com, mar...@danga.com
We need this dynamic thread allocation feature for our build; I was thinking of starting on it next week. Metric weightings would be a very useful addition. Size of the inputs might be a good proxy for memory usage, but I don't know how closely it would track it. Some tools may require memory of 2x or more the input file size, or have a quadratic dependence on it.

It should be possible to monitor the CPU, memory & disk usage of each build edge as it runs. I presume I can store this in the build log, although it would require extending the log format. The goal would be to tell ninja how much of each resource you want it to use, and it would allocate work optimally. There are more "unknowns" that could go wrong with this method, though, so I'd still want to implement the metrics allocation.
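Per-edge resource measurement could be prototyped with getrusage(2). This Python sketch (hypothetical names, not Ninja code) reports the children's peak RSS after running one command, with the caveat that RUSAGE_CHILDREN's ru_maxrss is a high-water mark across *all* waited-for children, not a per-command value, so a real implementation would need per-child accounting (e.g. wait4):

```python
import resource
import subprocess
import sys

def run_and_measure(cmd):
    # Run one build edge to completion, then read the children's
    # peak resident set size.  Units differ: kilobytes on Linux,
    # bytes on macOS -- a log format would have to normalize this.
    subprocess.run(cmd, check=True)
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    return usage.ru_maxrss

peak = run_and_measure([sys.executable, "-c", "pass"])
assert peak > 0
```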

Nico Weber

Aug 29, 2012, 10:16:23 AM
to ninja...@googlegroups.com, mar...@danga.com
For what it's worth, I prototyped free-RAM-guided thread counts in https://github.com/nico/ninja/compare/memlimit a while ago for OS X, but it didn't really go anywhere. (It works well, but it has the free-memory threshold hardcoded at 100 MB and it's specific to RAM. I guess CPU is already handled by the existing -l mode, but doing this for disk I/O is tricky. On the plus side, it doesn't need to store anything anywhere.)
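On Linux the same free-RAM gate could be approximated by reading /proc/meminfo (a hypothetical Python sketch; the prototype linked above is OS X-specific, and the 100 MB floor below mirrors the hardcoded value it uses):

```python
MIN_FREE = 100 * 1024 * 1024  # the prototype's hardcoded 100 MB floor

def free_ram_bytes():
    # Linux-only: MemAvailable is the kernel's estimate of memory
    # available for new workloads without swapping.
    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith("MemAvailable:"):
                return int(line.split()[1]) * 1024  # value is in kB
    return None  # field missing (pre-3.14 kernels)

def may_start_new_job(running_jobs):
    free = free_ram_bytes()
    # Always keep at least one job running so the build makes progress.
    return running_jobs == 0 or free is None or free >= MIN_FREE
```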

Nico

Maxim Kalaev

Aug 29, 2012, 4:45:07 PM
to ninja...@googlegroups.com, mar...@danga.com
On Wednesday, August 29, 2012 5:16:24 PM UTC+3, Nico Weber wrote:
For what it's worth, I prototyped free-ram-guided thread counts in https://github.com/nico/ninja/compare/memlimit a while ago for OS X, but it didn't really go anywhere. (It works well, but it has 100 MB as free memory hardcoded and it's specific to ram. I guess cpu is already handled by the existing -l mode, but doing this for disk i/o is tricky. On the plus side, it doesn't need to store anything anywhere.)
I must admit that I've tried using the '-l' flag to balance between CPU-intensive and IO-intensive build phases, but it didn't work well for me.
The problem is that the load seems to be measured over a large window compared to the time it takes to complete a single job.
As a result, I see that ninja reacts too late to load changes: it spawns jobs like crazy for 5 seconds, then the load hits the skies, then it waits until jobs finish, and then it "rests" for several seconds running only 1 job in parallel.
Has anyone used this option successfully?

Nicolas Desprès

Sep 27, 2012, 4:26:18 AM
to ninja...@googlegroups.com, mar...@danga.com
The -l flag implementation relies on getloadavg(3), which makes it simple, but it shares the drawbacks of getloadavg(3).

The current implementation uses the first sample returned by getloadavg(3), representing the average over the last 1 minute. So yes, it does not work over short time windows. It is also not a preventive implementation, so nothing happens until the limit is reached.

I wrote it because it is useful on machines running a continuous build system. Those machines often run full builds, and in that context it works well enough.

For incremental builds started manually, I think it is not the best approach. A preventive one would probably be better. Would that be easy to implement?

--
Nicolas Desprès
