High CPU usage for any command

383 views
Skip to first unread message

Gunnar Wagenknecht

unread,
Apr 3, 2022, 8:54:14 PM4/3/22
to bazel-discuss
Greetings,

I'm analyzing a performance problem in our Bazel environment. I did lots of optimization to move to_list operations out of analysis phase into the execution phase. However, there is one problem that remains and I wondering if this is due to the amount of files in our project.

Basically, every command invocation takes at least a minute. This makes Bazel's shell tab completion almost impossible to use.

With profiling it looks like the minute is spend entirely in fsvc.getDirtyKeys. This is on a relativ fast M1 14" MacBook Pro. Anyone knows what could be causing this or if it's just something we have to live with?

-Gunnar

prof-1.gz

Lars Clausen

unread,
Apr 4, 2022, 5:19:07 AM4/4/22
to Gunnar Wagenknecht, bazel-discuss
That definitely shouldn't be the case. I doubt it's due to your number of files, internally we have huge numbers of files, and I have never seen fsvc.dirtyKeys take that long. Some seconds, yes, but not a minute. What kind of file system are you on?

-Lars

--
You received this message because you are subscribed to the Google Groups "bazel-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bazel-discus...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/bazel-discuss/F561EFBF-00CC-4AF9-A9F4-5A9098227946%40wagenknecht.org.


--
Gunnar Wagenknecht
gun...@wagenknecht.org, http://guw.io/



--
You received this message because you are subscribed to the Google Groups "bazel-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bazel-discus...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/bazel-discuss/F561EFBF-00CC-4AF9-A9F4-5A9098227946%40wagenknecht.org.

--

Lars Clausen

Software Engineer

lar...@google.com

Google Germany GmbH

Erika-Mann-Straße 33

80636 München


Geschäftsführer: Paul Manicle, Liana Sebastian

Registergericht und -nummer: Hamburg, HRB 86891

Sitz der Gesellschaft: Hamburg


Diese E-Mail ist vertraulich. Falls Sie diese fälschlicherweise erhalten haben sollten, leiten Sie diese bitte nicht an jemand anderes weiter, löschen Sie alle Kopien und Anhänge davon und lassen Sie mich bitte wissen, dass die E-Mail an die falsche Person gesendet wurde. 

     

This e-mail is confidential. If you received this communication by mistake, please don't forward it to anyone else, please erase all copies and attachments, and please let me know that it has gone to the wrong person.


Gunnar Wagenknecht

unread,
Apr 4, 2022, 10:39:27 AM4/4/22
to Lars Clausen, bazel-discuss
Hi Lars,

It's encrypted APFS in Monterey 12.3.1. MacBook Pro (14-inch, 2021) Apple M1 Pro.
It's with the AARCH64 version of Zulu 11 JDK.

Is there any profiling/debugging I can do? 


-Gunnar

-- 

Lars Clausen

unread,
Apr 4, 2022, 11:35:27 AM4/4/22
to Gunnar Wagenknecht, bazel-discuss
There's some output in the server log, look for lines with "com.google.devtools.build.lib.skyframe.FilesystemValueChecker.lambda$getDirtyValues", that gives the basic time and number of files. For building Bazel on an M1 MacMini, I see building Bazel takes 85ms to check 267 nodes (872 scanned). How many nodes are we talking about for you?

There's an option `--experimental_fsvc_threads` for the number of threads used for this scanning, it defaults to 200, you could try changing that.

You can have Bazel wait for a debugger to attach by passing the startup argument `--host_jvm_debug`. The operative method is `src/main/java/com/google/devtools/build/lib/skyframe/FilesystemValueChecker.java`

-Lars

Gunnar Wagenknecht

unread,
Apr 4, 2022, 12:57:37 PM4/4/22
to Lars Clausen, bazel-discuss
On Apr 4, 2022, at 08:35, Lars Clausen <lar...@google.com> wrote:

There's some output in the server log, look for lines with "com.google.devtools.build.lib.skyframe.FilesystemValueChecker.lambda$getDirtyValues", that gives the basic time and number of files. For building Bazel on an M1 MacMini, I see building Bazel takes 85ms to check 267 nodes (872 scanned). How many nodes are we talking about for you?

Here are some from today:
Spent 813666 nanoseconds checking 2 filesystem nodes (24 scanned)
Spent 59221656458 nanoseconds checking 2983617 filesystem nodes (9131026 scanned)
Spent 18310125 nanoseconds checking 802 filesystem nodes (802 scanned)

So it is scanning quite a lot (huge mono-repo).


There's an option `--experimental_fsvc_threads` for the number of threads used for this scanning, it defaults to 200, you could try changing that.

Given the M1 Pro has 10 Cores, I'm not sure if more threads will help. I can try, though. Looking at the CPU profile the system part is pretty high.

Gunnar Wagenknecht

unread,
Apr 5, 2022, 11:48:46 AM4/5/22
to Lars Clausen, bazel-discuss
Here is some more data from today. It looks like the --experimental_fsvc_threads is making a woerse.

Interestingly enough, the CPU spinning does not happen when the server was not running. But then the execution takes a lot more time because Bazel needs to load all the packages first (7700 packages loaded, 566531 targets configured).

# first invocation when server was not running
Spent 909291 nanoseconds checking 2 filesystem nodes (24 scanned)

# common --experimental_fsvc_threads=500 in .bazelrc
Spent 58956030667 nanoseconds checking 2983617 filesystem nodes (9131026 scanned)
Spent 44046167 nanoseconds checking 802 filesystem nodes (802 scanned)

# common --experimental_fsvc_threads=1000 in .bazelrc
Spent 74890550041 nanoseconds checking 2983617 filesystem nodes (9131081 scanned)
Spent 77621041 nanoseconds checking 802 filesystem nodes (802 scanned)

In all those cases, kernel_task went to >=80% CPU utilization for the duration getDirtyValues is running. The Java process of the Bazel server peaked at ~800% during that time (probably because it's maxing out all of the eight regular cores of the M1)

I then tried the opposite direction because I found this:

But that didn't help either. It made the execution time go up. On the plus side, the kernel_task only stayed below 30% CPU utilization, which helped keeping the system usable.

# common --experimental_fsvc_threads=2 in .bazelrc
Spent 403631042667 nanoseconds checking 2983617 filesystem nodes (9131026 scanned)
Spent 15301750 nanoseconds checking 802 filesystem nodes (802 scanned)

# common --experimental_fsvc_threads=100 in .bazelrc
Spent 58712551292 nanoseconds checking 2983617 filesystem nodes (9131081 scanned)
Spent 12946709 nanoseconds checking 802 filesystem nodes (802 scanned)


So the 200 default seems to look like the sweat spot on Mac. Also, looking at the number It is a lot of filesystem nodes to go through. Probably it's really a lot. 

That leads me to a follow up question. I was under the impression that the Bazel server process would do these checks in the background and monitor the file system for changes. Thus, seeing it doing the scanning of every single command invocation, without having done any change on the file system at all is worrying me. Is there a technical nee to do this scanning on every single command? Is there a GitHub issue with some pending optimization/spike in that area?


-Gunnar

-- 
Gunnar Wagenknecht
gun...@wagenknecht.org, http://guw.io/


Gunnar Wagenknecht

unread,
Apr 5, 2022, 12:38:56 PM4/5/22
to Lars Clausen, bazel-discuss
For comparison, here is one from Linux (also Bazel 5.1). Ext4 file system.

# Linux, ext4
Spent 6778660622 nanoseconds checking 2924232 filesystem nodes (8992203 scanned)

We also see such high times on older Intel based Macs.

Gunnar Wagenknecht

unread,
Apr 8, 2022, 6:02:57 AM4/8/22
to Lars Clausen, bazel-discuss
We made progress with using the --watchfs option. This enables file system watching on macOS. It does reduce the startup time significantly for us.

The only downside it that this seems to have an impact on performance of larger builds (especially full builds). fseventsd is becoming a cpu consumer of up to 80% during large builds.

-Gunnar

--
Gunnar Wagenknecht
gun...@wagenknecht.org, http://guw.io/



Lars Clausen

unread,
Apr 25, 2022, 3:10:54 AM4/25/22
to Gunnar Wagenknecht, Janak Ramakrishnan, bazel-discuss

We hit a similar and reproducible case internally, and there's some work in that area now, e.g. https://github.com/bazelbuild/bazel/commit/73f587992435cac4dd8f8ef0490224960797b513

Janak, you can probably say more about what your changes might do for Bazel users.

-Lars

Janak Ramakrishnan

unread,
Apr 25, 2022, 11:00:45 AM4/25/22
to Lars Clausen, Gunnar Wagenknecht, bazel-discuss, Alex Jurkowski
I'm no longer working on Bazel, but that work is/was not intended to help the situation in this thread, it's targeted at enabling digests to be provided via different channels besides unix xattrs (for an internal-only use case).

--watchfs is the canonical solution for slow filesystem checking. I don't know why fseventsd would get overloaded during a full build. I think Bazel is only supposed to watch the source tree. I would try out --noexperimental_check_output_files and --noexperimental_check_external_repository_files as well, just to see. But it sounds like either Bazel is incorrectly watching the output tree (Bazel bug: maybe via convenience symlinks?) or you have a weird source tree that contains the output tree via a symlink, or your build is writing to your source tree.

Gunnar Wagenknecht

unread,
Apr 25, 2022, 1:29:50 PM4/25/22
to Janak Ramakrishnan, Lars Clausen, bazel-discuss, Alex Jurkowski

> On Apr 25, 2022, at 17:00, Janak Ramakrishnan <jan...@google.com> wrote:
> --watchfs is the canonical solution for slow filesystem checking. I don't know why fseventsd would get overloaded during a full build. I think Bazel is only supposed to watch the source tree. I would try out --noexperimental_check_output_files and --noexperimental_check_external_repository_files as well, just to see. But it sounds like either Bazel is incorrectly watching the output tree (Bazel bug: maybe via convenience symlinks?) or you have a weird source tree that contains the output tree via a symlink, or your build is writing to your source tree.

I didn't know writing to the source tree is possible. :) FWIW, the --noexperimental_check_output_files is something that we also settled on and the seems to have mitigated the issue.

-Gunnar

Lars Clausen

unread,
Jun 10, 2022, 4:45:38 AM6/10/22
to Gunnar Wagenknecht, Janak Ramakrishnan, bazel-discuss, Alex Jurkowski
Writing to the source tree from Bazel is possible, but a very bad idea generally. It's impossible (or at least should be) when running in a sandbox.

-Lars

--

Reply all
Reply to author
Forward
0 new messages