Bazel slow with sandboxing enabled

Austin Schuh

Dec 22, 2015, 6:02:13 PM
to bazel-discuss
Bazel performs very well with --spawn_strategy=local, but poorly with sandboxing for C++. I have a dual-socket, 16-core desktop with a PCIe solid-state drive. The hardware is fast.

I have the following in my bazelrc file:  build --local_resources 50000,16,1000

top shows a bunch of sandbox tasks that together add up to only about one CPU's worth of load, and occasionally some other tasks that run faster.

iotop shows that when Bazel is reading from disk from Java, I get 600% CPU usage or more, and 100+ MB/s of disk activity.

atop shows that while it is building, we are limited by the filesystem. The relevant line (whitespace condensed) is:

LVM | oot-root--lv | busy 94% | read 1 | write 13307 | KiB/r 4 | KiB/w 9 | MBr/s 0.00 | MBw/s 12.85 | avq 52.62 | avio 0.70 ms |

For a solid-state drive, writing only 12.8 MB/s while 94% busy is pretty bad.
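
A quick way to see why those numbers point at latency rather than bandwidth: atop's avio is busy time divided by the number of requests, so busy% is roughly (requests per second) x avio. The sketch below redoes that arithmetic from the quoted line. The one assumption (not stated in the thread) is that atop was sampling at its default 10-second interval, so "write 13307" covers 10 s; the function name is illustrative.

```python
# Back-of-the-envelope check of the atop LVM line quoted above.
# ASSUMPTION: atop used its default 10 s sample interval, so the write counter
# covers 10 seconds. All other inputs are copied from the atop line.

INTERVAL_S = 10.0    # assumed atop sample interval
WRITES = 13307       # "write 13307" (reads were 1, so they are ignored)
KIB_PER_WRITE = 9    # "KiB/w 9" (displayed rounded)
AVIO_MS = 0.70       # "avio 0.70 ms": average busy time per request


def disk_stats(writes, interval_s, kib_per_write, avio_ms):
    """Return (writes per second, MiB/s written, estimated busy fraction)."""
    iops = writes / interval_s
    mib_per_s = iops * kib_per_write / 1024
    # busy time = number of requests * average service time per request
    busy = iops * avio_ms / 1000
    return iops, mib_per_s, busy


if __name__ == "__main__":
    iops, mib_s, busy = disk_stats(WRITES, INTERVAL_S, KIB_PER_WRITE, AVIO_MS)
    print(f"{iops:.0f} writes/s, {mib_s:.1f} MiB/s, ~{busy:.0%} busy")
```

With these inputs it reports about 1331 writes/s, roughly 11.7 MiB/s and ~93% busy, which lines up with atop's own 94% and is close to its MBw/s figure. That consistency suggests the device is saturated by a large number of small (~9 KiB) writes that each take 0.7 ms, i.e. it is bound by operation count, which would fit metadata-heavy work rather than bulk data transfer.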

The build takes significantly longer when running sandboxed. I am using a compiler unpacked from a .tar.gz into the sandbox, together with a custom crosstool.

Any ideas on what is wrong?  Is there anything I can collect to help debug this?

Thanks,
  Austin

Austin Schuh

Dec 29, 2015, 11:48:33 PM
to bazel-discuss
I did some perf benchmarking, and it is spending a lot of time in SetupDirectories.

I noticed that it was faster on another machine than on mine, and narrowed the difference down to the filesystem: I'm using XFS, and the other box is using EXT4. I tried building on a hard drive formatted with EXT4, and the build was faster on the hard drive than on the NVMe drive. It looks like Bazel is doing something that XFS isn't happy with but EXT4 is.
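
To compare filesystems independently of Bazel, one option is to time a burst of small metadata operations on each of them. The sketch below is built on an assumption: that sandbox setup cost is dominated by creating and deleting many directories and empty files (placeholders that bind mounts can target). It does not perform bind mounts (those need root), so it cannot reproduce mount-related lock contention; the function name and default sizes are hypothetical.

```python
import os
import shutil
import tempfile
import time


def metadata_churn(root, n_dirs=100, files_per_dir=20):
    """Create, then delete, n_dirs directories holding files_per_dir empty files each.

    Work happens in a fresh temporary directory under `root`, so pass a path on
    the filesystem you want to measure. Returns (elapsed_seconds, ops), where
    ops counts the mkdir/create/unlink/rmdir calls that were timed.
    """
    work = tempfile.mkdtemp(prefix="churn-", dir=root)
    ops = 0
    try:
        start = time.perf_counter()
        for d in range(n_dirs):
            dpath = os.path.join(work, f"d{d}")
            os.mkdir(dpath)
            ops += 1
            for f in range(files_per_dir):
                # Empty file, similar to a target a sandbox could bind-mount onto (assumption).
                with open(os.path.join(dpath, f"f{f}"), "w"):
                    pass
                ops += 1
        for d in range(n_dirs):
            dpath = os.path.join(work, f"d{d}")
            for f in range(files_per_dir):
                os.unlink(os.path.join(dpath, f"f{f}"))
                ops += 1
            os.rmdir(dpath)
            ops += 1
        elapsed = time.perf_counter() - start
    finally:
        # Remove the scratch directory even if something above failed part-way.
        shutil.rmtree(work, ignore_errors=True)
    return elapsed, ops
```

Calling it once with a directory on the XFS volume and once with a directory on the EXT4 volume (for example metadata_churn("/mnt/xfs") and metadata_churn("/mnt/ext4"), hypothetical mount points) and comparing the elapsed times gives a rough, Bazel-free signal of whether metadata churn alone is slower on one of them.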

Austin

Lukács T. Berki

Jan 4, 2016, 5:18:59 AM
to Austin Schuh, bazel-discuss, Philipp Wollermann
+Philipp, the owner of sandboxing

--
You received this message because you are subscribed to the Google Groups "bazel-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bazel-discus...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/bazel-discuss/CABsbf%3DHTVL_g_9MaY5b_7deGEnazSpcCqDCL7r3EKRCufVJJxw%40mail.gmail.com.

For more options, visit https://groups.google.com/d/optout.



--
Lukács T. Berki | Software Engineer | lbe...@google.com | 

Google Germany GmbH | Maximillianstr. 11-15 | 80539 München | Germany | Geschäftsführer: Matthew Scott Sucherman, Paul Terence Manicle | Registergericht und -nummer: Hamburg, HRB 86891

Austin Schuh

Jan 5, 2016, 1:55:26 PM
to Lukács T. Berki, bazel-discuss, Philipp Wollermann
Hi Philipp,

I did some lockstat recording during a build on XFS, and I see significant contention on the filesystem locks.  I'm not sure the kernel guys are expecting that level of contention.  Attached is a trace during a point in the build where there were a lot of namespace-sandbox processes being started.  I don't have a trace right now of EXT4.

Austin
lockreport_xfs.txt

Philipp Wollermann

Jan 5, 2016, 2:44:04 PM
to Austin Schuh, Lukács T. Berki, bazel-discuss
Thanks so much for investigating and collecting the data, Austin!

I'll check this out and see how we can improve performance on XFS.

Austin Schuh

Jan 5, 2016, 2:50:10 PM
to Philipp Wollermann, Lukács T. Berki, bazel-discuss
Thanks, Philipp!

I just switched the root filesystem on the machine from XFS to EXT4, and I'm seeing better performance, but there is still a bottleneck somewhere in the sandbox creation step. There are sections of the build with a lot of cheap(ish) operations (compiling lots of simple C files), and therefore a lot of sandbox creation/deletion steps.

My build uses versions of GCC and Clang that are added to the sandbox through a new_http_archive and then a custom cross-tool. I'm not sure whether that contributes to the issue because of the large number of files that need to be mapped into the sandbox.

Austin

Pedro Kiefer

Jan 5, 2016, 4:15:20 PM
to Austin Schuh, Philipp Wollermann, Lukács T. Berki, bazel-discuss
I've also noticed a slowdown with the latest release of Bazel, and I too use a custom cross-tool (Linaro arm-linux-gnueabi). I don't have any data like Austin, just my perception that my project takes way longer to build. The build did get more reliable: sometimes I used to get a broken build because some dependency didn't recompile when it should have, and I haven't seen that anymore. Maybe that was my lack of knowledge of how to write a proper BUILD file, but the newer version of Bazel has stricter rules for building C++ code, which is great!

If we could cache the mounts for the cross-tool files across namespace-sandbox processes we would have a nice improvement on speed. But I don't have the slightest clue on how to implement that.

Pedro

Kamal Marhubi

Jan 5, 2016, 4:48:59 PM
to Pedro Kiefer, Austin Schuh, Philipp Wollermann, Lukács T. Berki, bazel-discuss
On Tue, Jan 5, 2016 at 4:15 PM Pedro Kiefer <pki...@gmail.com> wrote:
> If we could cache the mounts for the cross-tool files across namespace-sandbox processes we would have a nice improvement on speed. But I don't have the slightest clue on how to implement that.

This ought to be possible if the Bazel server is willing to keep a long-running child with those files mounted in a mount namespace. Sandbox processes could then setns(2) into that namespace before unsharing the user namespace. Not clear if this is a great idea, but it should be possible!

Brian Silverman

Jan 5, 2016, 7:31:53 PM
to Kamal Marhubi, Pedro Kiefer, Austin Schuh, Philipp Wollermann, Lukács T. Berki, bazel-discuss
I doubt there are many actions which share completely identical mount sets. Each C++ compile action (for example) only has its .cc source file mounted, which makes them all slightly different. I think it would technically work to keep long-running children with the common parts set up and then create sub-namespaces for each action to set up the differences, but that gets really complicated to implement.
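
To make the point about near-identical mount sets concrete, the sketch below splits per-action mount lists into the part every action shares and the per-action remainder. The function name and toy data are hypothetical. It covers only the bookkeeping; the namespace mechanics (keeping a pre-populated mount namespace alive and entering it with setns(2) before adding per-action mounts) require privileges and are not shown.

```python
def split_common(mount_sets):
    """Split per-action mount sets into a shared base and per-action extras.

    mount_sets: dict mapping an action name to the set of paths it mounts.
    Returns (base, extras): base is mounted by every action, and
    extras[action] is what that action needs on top of the base.
    """
    sets = list(mount_sets.values())
    base = set.intersection(*sets) if sets else set()
    extras = {action: paths - base for action, paths in mount_sets.items()}
    return base, extras
```

With a toolchain of a thousand headers and one .cc file per compile action, the shared base holds almost everything, yet no two actions have identical mount sets, which is why a cached namespace could only be reused as a base layer with per-action differences on top.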

An optimization which seems easier to implement would be to bind-mount whole directory trees rather than doing each file individually when possible. Using downloaded compilers results in a lot of things like glob(['usr/include/**/*']) being included in the sandbox for each C++ compile action, which means thousands of mounts that could be simplified to just a single one (of the top-level directory) without changing what ends up mounted. I think Bazel already has the information to decide when there's a complete directory tree present, so it shouldn't slow down single-core builds.
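
A minimal sketch of the directory-collapsing idea, assuming the caller knows every file that exists under the archive root (for example, the extracted new_http_archive) and that the action's inputs are a subset of those files. The function name is hypothetical and this is not Bazel's implementation. A directory qualifies when every file beneath it is an input; each input is then covered by its topmost qualifying ancestor. Empty directories and symlinks are ignored here, which a real implementation would need to consider, since mounting a directory exposes those too.

```python
from collections import defaultdict
from posixpath import dirname


def collapse_mounts(inputs, all_files):
    """Replace per-file mounts with one mount per fully included directory.

    inputs: file paths the action needs (e.g. the expansion of a glob).
    all_files: every file that exists on disk under the same root.
    Returns a sorted list of paths (directories or files) to bind-mount.
    """
    inputs = set(inputs)
    # For every ancestor directory, count the files below it on disk and
    # how many of those are requested inputs.
    total = defaultdict(int)
    wanted = defaultdict(int)
    for path in all_files:
        d = dirname(path)
        while d:
            total[d] += 1
            if path in inputs:
                wanted[d] += 1
            d = dirname(d)
    complete = {d for d in total if wanted[d] == total[d]}

    mounts = set()
    for path in inputs:
        # Walk upward; the last complete directory seen is the topmost one.
        top = None
        d = dirname(path)
        while d:
            if d in complete:
                top = d
            d = dirname(d)
        mounts.add(top if top is not None else path)
    return sorted(mounts)
```

For a glob like glob(['usr/include/**/*']) over a complete include tree, the thousands of per-file entries collapse to the single entry usr/include, while a partially selected directory such as usr/lib keeps its individual file mounts.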

At some point, it might even perform better to just build up the non-system parts of the sandbox (everything other than /bin, /usr/bin, etc.) in a shmfs by copying...
