How does Bazel know a file changed?


duane....@gmail.com

Feb 15, 2018, 10:13:36 AM
to bazel-discuss
Hopefully this isn't too simple a question to ask on this forum. I'm trying to convince my team to switch to Bazel and I'd like to know more about it.


I read somewhere that Google internally has a FUSE system that reads/writes data to their cloud, and that as files are written, it creates a hash (or content digest) for those files. And that Bazel can quickly read those digests rather than read entire files during a build.

How does it work for Linux users using Bazel outside of Google that have our own standalone networks without clouds or anything? Would we have to put our source in a special bazel file system or something? I assume Bazel doesn't read and hash every file every time to figure out what to rebuild. Seems like that would be slower than merely checking timestamps like Make. So how does that work, and is there some special kernel setting we have to turn on to have Bazel do fast incremental builds?

Austin Schuh

Feb 15, 2018, 1:58:12 PM
to duane....@gmail.com, bazel-discuss
Take a look at the --watchfs flag.

Austin

--
You received this message because you are subscribed to the Google Groups "bazel-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bazel-discus...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/bazel-discuss/61c9d419-3ced-4625-bd52-db301d2d2406%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

duane....@gmail.com

Feb 15, 2018, 3:15:48 PM
to bazel-discuss
Thanks for the response!

Is this something that needs to be set the first time? (So that the bazel server would be watching the FS during development?) If I don't set that ever, do a lot of builds, and then decide to set --watchfs, then would that do anything the first time? Does it only speed things up on subsequent builds?

Austin Schuh

Feb 15, 2018, 6:56:33 PM
to duane....@gmail.com, bazel-discuss
We put it in //tools:bazel.rc to enable it by default for our users.

Austin


Brian Silverman

Feb 15, 2018, 7:49:28 PM
to Austin Schuh, duane....@gmail.com, bazel-discuss
--watchfs is a startup option, so if you change it the bazel server process will be restarted. Restarting the server means re-parsing all the BUILD files etc., so it's very unhelpful for fast incremental compilation. Therefore, you probably want to always use it, which makes bazel.rc a good place for it. That said, the cache (of previously built outputs) does persist across server restarts and --watchfs/--nowatchfs, so changing it is still less expensive than starting from scratch.

If you don't use --watchfs, then bazel looks at checksums of the file contents to decide if they've changed. From what I've seen in the open-source code, Google's internal FUSE filesystem basically caches this checksum in an extended attribute (and can return it without pulling the file's whole contents across the network). I think there's some level of caching the checksums if the mtime of the file doesn't change; looking around just now it appears that DigestUtils keeps a bounded-size cache in memory of (path, inode, mtime, size) -> MD5. This means a running server process usually avoids re-checksumming all the input files each time, because it only needs to actually re-read the ones whose mtime changed. --watchfs lets it avoid even calling stat on all the input files because it knows which ones changed. However, that doesn't help on the first build after the server process is started, because it still needs to re-checksum all the files.
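
To make the caching idea above concrete, here is a rough Python sketch of a bounded (path, inode, mtime, size) -> MD5 cache. It is only an illustration of the technique, not Bazel's actual DigestUtils code: the DigestCache class, its max_entries parameter, the LRU eviction policy, and the file_reads counter are all made up for this example.

```python
import hashlib
import os
from collections import OrderedDict


class DigestCache:
    """Bounded in-memory cache: (path, inode, mtime, size) -> MD5 hex digest.

    Hypothetical illustration only; names and eviction policy are assumptions.
    """

    def __init__(self, max_entries=10000):
        self._entries = OrderedDict()
        self._max_entries = max_entries
        self.file_reads = 0  # how many times file contents were actually read

    def digest(self, path):
        # One cheap stat call tells us whether the cached digest is still valid.
        st = os.stat(path)
        key = (path, st.st_ino, st.st_mtime_ns, st.st_size)
        cached = self._entries.get(key)
        if cached is not None:
            self._entries.move_to_end(key)  # mark as recently used
            return cached
        # Cache miss: the metadata changed (or the file is new), so re-read it.
        with open(path, "rb") as f:
            value = hashlib.md5(f.read()).hexdigest()
        self.file_reads += 1
        self._entries[key] = value
        if len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)  # evict the least recently used
        return value
```

Calling digest() twice on an unchanged file reads its contents only once; the second call costs a single stat. A freshly started process has an empty cache, which matches the point above that the first build after a server restart still has to checksum everything.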

Something to keep in mind when comparing to Make is that bazel avoids re-parsing all the BUILD files and building the action graph from scratch for each incremental build. With large codebases, that can take a long time. Bazel also does that in parallel (potentially with build and/or test actions even), unlike Make. Bazel also avoids loading unnecessary parts of the action graph into memory at all, which also makes a big difference at even moderate scale.

Depending on your hardware, --experimental_multi_threaded_digest might also help speed things up. It's an option because apparently that slows things down a lot on hard disks, but my experience with SSDs is enabling it speeds things up significantly, especially with lots of CPU cores. Even with --watchfs, that still helps for calculating checksums of action outputs in parallel.
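
For a feel of what digesting files on multiple threads looks like, here is a small Python sketch. It is not how Bazel implements --experimental_multi_threaded_digest; the md5_of_file and digest_all helpers and the threads parameter are hypothetical names for illustration. With threads=1 the files are read one after another, which is gentler on a spinning disk; with more threads the reads overlap, which tends to suit SSDs.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def md5_of_file(path, chunk_size=1 << 20):
    """Hash one file in chunks so large files are not loaded into memory at once."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def digest_all(paths, threads=1):
    """Return {path: md5 hex digest}.

    threads=1 hashes files sequentially; a larger value lets several
    files be read and hashed concurrently.
    """
    paths = list(paths)
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return dict(zip(paths, pool.map(md5_of_file, paths)))
```

Both settings produce the same digests; only the wall-clock time differs, so it is worth timing both on your own hardware before picking one.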
