Why does Bazel burn a core (or more)?

chris....@gmail.com

May 3, 2019, 3:22:28 PM
to bazel-discuss
I use Bazel to build a large-ish C++ codebase on a machine with 8 hyper-threaded cores (i.e., 4 physical cores). Anecdotally, Bazel itself seems to burn at least one hyper-threaded core throughout most of the build. This does not make sense to me. Is this expected?

Worth noting: I have distcc hacked into the toolchain for lack of a better remote-execution solution. To make effective use of it, I need to lie to Bazel about the local resources (i.e., I tell it there are something like 80 cores instead of the 8 I actually have locally).

I'm happy to investigate and do some profiling--I'm just not sure what the most effective approach is for digging into this.

Thanks,

Chris

Julio Merino

May 3, 2019, 3:25:59 PM
to chris....@gmail.com, bazel-discuss
Lying about the number of cores in --local_cpu_resources will indeed cause Bazel to use more resources than it should. Can you reproduce the significant CPU usage if you do not do that?

--
You received this message because you are subscribed to the Google Groups "bazel-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bazel-discus...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/bazel-discuss/ade7ac13-cdbf-44f1-8f71-0db60f9bebdc%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


--
Julio Merino / go/jmmv / Blaze

chris....@gmail.com

May 3, 2019, 6:00:03 PM
to bazel-discuss
On Friday, May 3, 2019 at 2:25:59 PM UTC-5, Julio Merino wrote:
> Lying about the number of cores in --local_cpu_resources will indeed cause Bazel to use more resources than it should. Can you reproduce the significant CPU usage if you do not do that?

I will try that out. The problem is that the wall clock times for builds are significantly longer when leaving local resources at their default values, so it is not really an effective solution. At least that has been my experience; I have not tried this recently.

Is Bazel remote execution a real thing for C++ yet? Whenever I look into it I'm left somewhat confused about the state of things.

chris....@gmail.com

May 4, 2019, 8:25:14 AM
to bazel-discuss
On Friday, May 3, 2019 at 5:00:03 PM UTC-5, chris....@gmail.com wrote:
> On Friday, May 3, 2019 at 2:25:59 PM UTC-5, Julio Merino wrote:
> > Lying about the number of cores in --local_cpu_resources will indeed cause Bazel to use more resources than it should. Can you reproduce the significant CPU usage if you do not do that?
>
> I will try that out.

Letting Bazel decide what resources are available does seem to reduce the load from Bazel itself, but it is still surprisingly busy. For example, during a one-minute window of the build, Bazel ran about 190 subcommands, about 90% of them compiling C++ and the remainder linking shared objects. ps(1) shows that Bazel consumed 28 seconds of CPU time during this period, so it is now burning only about half a core. That still seems quite high.

The above was still distributing to distcc, so the local cores were otherwise not very busy during that test. I wondered whether Bazel would consume less CPU if GCC ran locally. I tried this and got similar results: 136 subcommands (all but a handful of them GCC) in 60 seconds, and ps(1) shows Bazel used 25 seconds of CPU.

Is 150 ms of CPU time per subcommand typical of Bazel? I'm currently running 0.23.2, but I think this has been characteristic of my Bazel builds over many releases (i.e., it is not a recent regression in either Bazel or my build configuration).
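
As a rough check of the figures above, the per-subcommand cost and the fraction of a core in use can be worked out directly. This is only a sketch of the arithmetic; the function names are made up for illustration, and the inputs are the approximate numbers quoted in this message.

```python
# Figures quoted above: (Bazel CPU seconds, subcommands run) per 60 s window.
WINDOW_SECONDS = 60.0
RUNS = {
    "distcc": (28.0, 190),
    "local gcc": (25.0, 136),
}


def per_subcommand_ms(cpu_seconds, subcommands):
    """Average Bazel CPU time per subcommand, in milliseconds."""
    return 1000.0 * cpu_seconds / subcommands


def core_fraction(cpu_seconds, window_seconds=WINDOW_SECONDS):
    """Fraction of one core kept busy over the sampling window."""
    return cpu_seconds / window_seconds


for name, (cpu, count) in RUNS.items():
    print(f"{name}: {per_subcommand_ms(cpu, count):.0f} ms/subcommand, "
          f"{core_fraction(cpu):.2f} cores")
# distcc: 147 ms/subcommand, 0.47 cores
# local gcc: 184 ms/subcommand, 0.42 cores
```

So "150 ms per subcommand" and "half a core" are both reasonable roundings of the distcc run, and the local run comes out somewhat higher per subcommand.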

Thanks,

Chris

Paul Johnston

May 4, 2019, 10:09:39 AM
to bazel-discuss
I'm not personally a C++ developer but I can say that the abseil-cpp repository is informally used as the "hello world" example for remote execution.

Austin Schuh

May 4, 2019, 2:56:58 PM
to Paul Johnston, bazel-discuss
Bazel checksums all the outputs.  This takes a surprising amount of CPU.

https://docs.bazel.build/versions/master/skylark/performance.html also has good information on how to enable profiling to see what Bazel is doing.

chris....@gmail.com

May 5, 2019, 3:18:21 PM
to bazel-discuss
On Saturday, May 4, 2019 at 1:56:58 PM UTC-5, Austin Schuh wrote:
> Bazel checksums all the outputs.  This takes a surprising amount of CPU.

That's a good point, though it does not seem like it should account for much of the overhead I'm seeing. I just pointed find(1) at a portion of my Bazel cache to look for object files and ran them all through sha1sum(1). The input was 9500+ files totaling more than 8GB. Computing SHA1 sums for all of these took about 1.5 seconds of CPU time. I would expect Bazel was computing checksums for only a few hundred files when I was seeing it use 20+ seconds of CPU.
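
For anyone who wants to repeat this kind of measurement without find(1) and sha1sum(1), here is a minimal sketch using Python's hashlib. It is not how Bazel computes digests; the helper names, the `.o` suffix filter, and the directory argument are assumptions made for illustration, and the CPU time it reports includes the directory walk and Python overhead.

```python
import hashlib
import os
import sys
import time


def hash_file(path, algorithm="sha256", chunk_size=1 << 20):
    """Hash one file in fixed-size chunks and return the hex digest."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def cpu_seconds_to_hash(root, algorithm="sha256", suffix=".o"):
    """Hash every file under root whose name ends with suffix.

    Returns (file_count, total_bytes, cpu_seconds), where cpu_seconds is
    this process's CPU time (user + system), not wall-clock time.
    """
    count = 0
    total_bytes = 0
    start = time.process_time()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(suffix):
                path = os.path.join(dirpath, name)
                hash_file(path, algorithm)
                count += 1
                total_bytes += os.path.getsize(path)
    return count, total_bytes, time.process_time() - start


if __name__ == "__main__":
    # Hypothetical usage: pass a directory from your own Bazel output tree.
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for algorithm in ("sha1", "sha256"):
        files, size, cpu = cpu_seconds_to_hash(root, algorithm)
        print(f"{algorithm}: {files} files, {size} bytes, {cpu:.2f} s CPU")
```

Running it twice on the same directory helps separate hashing cost from cold-cache disk reads, since the second pass is served mostly from the page cache.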

> https://docs.bazel.build/versions/master/skylark/performance.html has good information about how to enable profiling as well to see what bazel is doing.

Thanks for the tip. I have looked at some of this before when digging into other aspects of build performance. I'll try to use this to dig into this question sometime later this week.

Thanks again!

Chris

Austin Schuh

May 5, 2019, 6:58:21 PM
to chris....@gmail.com, bazel-discuss
On Sun, May 5, 2019 at 12:18 PM <chris....@gmail.com> wrote:
> On Saturday, May 4, 2019 at 1:56:58 PM UTC-5, Austin Schuh wrote:
> > Bazel checksums all the outputs.  This takes a surprising amount of CPU.
>
> That's a good point, though it does not seem like it should account for much of the overhead I'm seeing.  I just pointed find(1) at a portion of my Bazel cache to look for object files and ran them all through sha1sum(1).  The input was 9500+ files totaling more than 8GB.  Computing SHA1 sums for each of these took about 1.5 seconds CPU time.  I would expect Bazel was computing checksums for only a few hundred files when I was seeing it use 20+ seconds of CPU.

I found a 3.1G file to play with.

sha1sum was 10 sec
sha256sum was 25 sec

Bazel uses sha256, which looks like it's 2.5 times slower than sha1. It also uses the Java implementation, not the native sha1sum binary you used. My understanding from previous emails is that the Java version is significantly slower (~5x). That puts it at around 20 seconds of CPU for your numbers.
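
Spelled out, the estimate multiplies the measured sha1sum time by the two slowdown factors. The sketch below only reproduces that arithmetic; the function name is made up, and the 5x Java factor is a recollection rather than a measurement.

```python
def estimated_bazel_digest_cpu(sha1_cpu_s, sha256_vs_sha1, java_vs_native):
    """Scale a measured sha1sum CPU time by the two assumed slowdown factors."""
    return sha1_cpu_s * sha256_vs_sha1 * java_vs_native


# 1.5 s: sha1sum over the 9500+ object files reported earlier in the thread.
# 25 / 10: sha256sum vs. sha1sum timings on the 3.1G file above.
# 5.0: the recalled (unverified) Java-vs-native slowdown.
estimate = estimated_bazel_digest_cpu(1.5, 25.0 / 10.0, 5.0)
print(f"{estimate:.2f} s")  # 18.75 s
```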

There's a --experimental_multi_threaded_digest flag which allows Bazel to compute checksums in parallel if you are getting bottlenecked.

(It's apparently pretty easy to convert the Java checksum in use today to use C/C++, and patches are welcome. It's too far down on my list to get done in a reasonable amount of time.)

Austin

chris....@gmail.com

May 5, 2019, 7:35:29 PM
to bazel-discuss
On Sunday, May 5, 2019 at 5:58:21 PM UTC-5, Austin Schuh wrote:
> On Sun, May 5, 2019 at 12:18 PM <chris....@gmail.com> wrote:
> > On Saturday, May 4, 2019 at 1:56:58 PM UTC-5, Austin Schuh wrote:
> > > Bazel checksums all the outputs.  This takes a surprising amount of CPU.
> >
> > That's a good point, though it does not seem like it should account for much of the overhead I'm seeing.  I just pointed find(1) at a portion of my Bazel cache to look for object files and ran them all through sha1sum(1).  The input was 9500+ files totaling more than 8GB.  Computing SHA1 sums for each of these took about 1.5 seconds CPU time.  I would expect Bazel was computing checksums for only a few hundred files when I was seeing it use 20+ seconds of CPU.
>
> I found a 3.1G file to play with.
>
> sha1sum was 10 sec
> sha256sum was 25 sec
>
> Bazel uses sha256, which looks like it's 2.5 times slower than sha1.  It also uses the Java versions, not the C++ version you used.  My understanding from previous emails is that the Java version is significantly slower (~5x).  That puts it at around 20 seconds of CPU for your numbers.

I found a 2.8G file and compared sha1sum and sha256sum:

sha1sum: 2.73s
sha256sum: 5.97s

I'm not sure why running it across a bunch of smaller files totaling 8G was faster; maybe I made a mistake somewhere. Anyway, sha256sum taking a bit over 2x as long as sha1sum lines up pretty closely with your 2.5x figure.
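
One way to see the discrepancy is to convert each measurement into throughput. This is just a back-of-the-envelope sketch: it treats "G" as gigabytes, uses 8 as a lower bound for the "more than 8GB" run, and the function name is made up for illustration.

```python
def throughput_gb_per_s(gigabytes, cpu_seconds):
    """Data hashed per CPU second, in GB/s."""
    return gigabytes / cpu_seconds


many_small_sha1 = throughput_gb_per_s(8.0, 1.5)    # 9500+ object files
one_file_sha1 = throughput_gb_per_s(2.8, 2.73)     # single 2.8G file
one_file_sha256 = throughput_gb_per_s(2.8, 5.97)   # same file, sha256sum
sha256_vs_sha1 = 5.97 / 2.73

print(f"sha1, many files: {many_small_sha1:.2f} GB/s")    # 5.33 GB/s
print(f"sha1, one file:   {one_file_sha1:.2f} GB/s")      # 1.03 GB/s
print(f"sha256, one file: {one_file_sha256:.2f} GB/s")    # 0.47 GB/s
print(f"sha256 / sha1:    {sha256_vs_sha1:.2f}x")         # 2.19x
```

The many-files run works out to roughly five times the single-file sha1 throughput, which supports the suspicion that the earlier 1.5 second figure did not cover all of the data.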

Seems like the overhead could largely be the checksumming. Thanks for taking the time to explain this.

Regards,

Chris

Philipp Wollermann

May 6, 2019, 8:22:24 AM
to Austin Schuh, chris....@gmail.com, bazel-discuss
> Bazel uses sha256, which looks like it's 2.5 times slower than sha1.  It also uses the Java versions, not the C++ version you used.  My understanding from previous emails is that the Java version is significantly slower (~5x).  That puts it at around 20 seconds of CPU for your numbers.

I also thought so and considered writing a JNI extension for SHA256 computations for Bazel, but apparently recent versions of OpenJDK support the native SHA Extensions in CPUs, so this should already be as fast as it gets (at least on CPUs that support it):

$ java -XX:+UnlockDiagnosticVMOptions -XX:+PrintFlagsFinal -version
     bool UseSHA                                   = true                                     {product} {default}
     bool UseSHA1Intrinsics                        = false                                 {diagnostic} {default}
     bool UseSHA256Intrinsics                      = true                                  {diagnostic} {default}
     bool UseSHA512Intrinsics                      = true                                  {diagnostic} {default}

Or am I missing something here?
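
To check the same flags on another machine or JDK, the output of that command can be filtered with a few lines of Python. This is a minimal sketch: `parse_sha_flags` is a made-up helper, the regular expression assumes the `type name = value {kind}` layout shown above (also allowing `:=`), and the sample text is copied from this message rather than produced by running java.

```python
import re

# Matches lines such as:
#   bool UseSHA256Intrinsics    = true    {diagnostic} {default}
FLAG_LINE = re.compile(r"^\s*(\w+)\s+(\w+)\s+:?=\s+(\S*)\s+\{")

SAMPLE = """\
     bool UseSHA                                   = true                                     {product} {default}
     bool UseSHA1Intrinsics                        = false                                 {diagnostic} {default}
     bool UseSHA256Intrinsics                      = true                                  {diagnostic} {default}
     bool UseSHA512Intrinsics                      = true                                  {diagnostic} {default}
"""


def parse_sha_flags(text):
    """Return {flag_name: bool} for every UseSHA* flag found in text."""
    flags = {}
    for line in text.splitlines():
        match = FLAG_LINE.match(line)
        if match and match.group(2).startswith("UseSHA"):
            flags[match.group(2)] = match.group(3) == "true"
    return flags


print(parse_sha_flags(SAMPLE))
# {'UseSHA': True, 'UseSHA1Intrinsics': False, 'UseSHA256Intrinsics': True, 'UseSHA512Intrinsics': True}
```

Feeding it the captured standard output of the java command above, instead of SAMPLE, would show whether the JVM in use has the SHA-256 intrinsic enabled.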

--
Philipp Wollermann | Software Engineer | phi...@google.com
Google Germany GmbH | Erika-Mann-Straße 33 | 80636 München


Austin Schuh

May 7, 2019, 12:35:10 PM
to Philipp Wollermann, chris....@gmail.com, bazel-discuss
Ah, I'm mixing up performance issues in my head.  https://github.com/bazelbuild/bazel/issues/6123 is what I was thinking of.