How to test remote caching


Rob Figueiredo

Nov 6, 2017, 6:34:08 PM
to bazel-discuss
Hi all,

I'm trying to set up remote caching for my team, but in my tests I have found that my workstation can use remote cached results while my coworker's workstation cannot. I imagine that some input to the build process is specific to each machine and is causing our cache keys to differ. Can anyone provide guidance on how I might investigate further to see what is preventing reuse?

Here are more details:

(0) Our workstations run Mac OS X. Mine is on 10.11.6, and the coworkers' machines I've tested with are also on 10.11.*.

(1) tools/bazel.rc

build --experimental_multi_threaded_digest=true
build --experimental_external_repositories=true
build --output_filter='^//((?!(thirdparty|thirdparty/play/framework):(twitter4j-core|play|aws-java-sdk|axiom)).)*$'

# Pass --config=cache to use these options
startup --host_jvm_args=-Dbazel.DigestFunction=sha256
build:cache --remote_rest_cache=http://inf-baz01.office.yext.com/
build:cache --experimental_strict_action_env
build:cache --experimental_remote_spawn_cache
build:cache --verbose_failures

(2) The remote cache is using this library:

(3) I am testing with this process:

$ rm -rf bazel-out/darwin_x86_64-fastbuild/bin/src
$ bazel build --config=cache //src/com/yext/util/...

On my machine, runs with --config=cache finish in about 1s, a huge difference from runs without --config=cache.
When I ask my coworker to run the same command, his machine apparently uses none of the cached results and recompiles both the protobufs and the Java libraries.

Any ideas what the issue could be or how I may investigate further?

Thank you!
Rob

Alpha Lam

Nov 6, 2017, 6:46:16 PM
to Rob Figueiredo, bazel-discuss
Do your build rules use local system toolchains and system libraries, and are they different on the two machines?

If the toolchains and system libs are different, then their digests will be different and hence the cache has no effect.
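
To make the point about digests concrete, here is a toy model in Python of how a content-addressed cache key could be derived. It is only an illustrative sketch, not Bazel's actual implementation: the function name `toy_action_key`, the fields it hashes, and the sample digests are all assumptions made for this example. The idea is that the key is a hash over everything the action depends on, so a single differing tool or library digest yields a different key and therefore a cache miss.

```python
import hashlib


def toy_action_key(command, env, input_digests, output_paths):
    """Return a hex digest over a toy description of a build action.

    Hypothetical model for illustration only; Bazel's real key
    derivation is more involved than this text encoding.
    """
    h = hashlib.sha256()
    for arg in command:
        h.update(b"arg:" + arg.encode() + b"\0")
    for name in sorted(env):
        h.update(b"env:" + name.encode() + b"=" + env[name].encode() + b"\0")
    for path in sorted(input_digests):
        h.update(b"in:" + path.encode() + b"=" + input_digests[path].encode() + b"\0")
    for path in sorted(output_paths):
        h.update(b"out:" + path.encode() + b"\0")
    return h.hexdigest()


# Two machines run the same command on the same source file, but the
# compiler binary differs (made-up content digests "digest-1"/"digest-2").
workstation = toy_action_key(
    ["javac", "Foo.java"], {"PATH": "/usr/bin"},
    {"Foo.java": "aaa", "tools/javac": "digest-1"}, ["Foo.class"])
laptop = toy_action_key(
    ["javac", "Foo.java"], {"PATH": "/usr/bin"},
    {"Foo.java": "aaa", "tools/javac": "digest-2"}, ["Foo.class"])

print(workstation == laptop)  # False: the keys differ, so the lookup misses
```

In this model any difference in a tool, environment variable, input file, or output path changes the key, which is why machine-specific inputs defeat sharing.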

Alpha

--
You received this message because you are subscribed to the Google Groups "bazel-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bazel-discuss+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/bazel-discuss/44d47445-d589-4ddf-881b-6629f2f611b6%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Rob Figueiredo

Nov 6, 2017, 7:05:17 PM
to Alpha Lam, bazel-discuss
Ah yes, sorry to omit that information. Those are all java_library and java_proto_library targets (via pubref/rules_protobuf).

Do you think that the version of Xcode that we have installed would prevent java_library outputs from being shared? I can find out and test with the same version.

I will try to develop a minimal reproduction that I can share on github.

Thank you for your reply

Marcel Hlopko

Nov 14, 2017, 6:24:17 PM
to Rob Figueiredo, buc...@google.com, Alpha Lam, bazel-discuss


Jakob Buchgraber

Nov 16, 2017, 4:27:21 AM
to bazel-discuss
Hi Rob,

I would be interested in a reproducer too. A simple hello world using java_proto_library or java_library should be enough.
Additionally, there might be something different in the setup of the two machines, or you might be running different Bazel versions?

Furthermore, it would be interesting to learn whether your two builds genuinely produce different output artifacts.
A simple way to verify this is to
 1) Clear the remote cache.
 2) Build on your workstation.
 3) Query the size of the remote cache directory using e.g. du -s /cache/directory
 4) Build on your coworker's workstation.
 5) Query the size of the remote cache directory using e.g. du -s /cache/directory

If the sizes returned in 3) and 5) are the same, it's a bug in Bazel or the remote cache. If 5) is only slightly bigger
than 3) it's likely a bug in the action cache. If 5) is roughly twice the size of 3), then your two builds are genuinely different.
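
The rule of thumb above can be written down as a small Python helper. This is only a sketch: the function name, the 10% `tolerance` threshold for "slightly bigger", and the extra "in between" case are assumptions added for illustration, and the whole procedure presumes that the second build did not simply get cache hits for everything.

```python
def interpret_cache_growth(size_after_first, size_after_second, tolerance=0.10):
    """Classify remote cache growth between the two builds.

    Sizes may be in any consistent unit (e.g. the blocks printed by du -s).
    `tolerance` is a hypothetical threshold; tune it for your cache.
    """
    if size_after_first <= 0:
        raise ValueError("the first build should have populated the cache")
    growth = (size_after_second - size_after_first) / size_after_first
    if growth <= 0:
        return "same size: possible bug in Bazel or the remote cache"
    if growth <= tolerance:
        return "slightly bigger: likely an action cache problem"
    if growth >= 1.0 - tolerance:
        return "roughly double: the two builds are genuinely different"
    return "in between: some actions are shared, others are not"


# Illustrative numbers only.
print(interpret_cache_growth(1000, 1000))
print(interpret_cache_growth(1000, 1050))
print(interpret_cache_growth(1000, 2000))
```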

P.S.: The cache is also available as a docker image https://hub.docker.com/r/buchgr/bazel-remote-cache/

Best,
Jakob

Rob Figueiredo

Nov 17, 2017, 7:16:39 PM
to Jakob Buchgraber, bazel-discuss
Hi Jakob,
We are definitely running the same Bazel versions. The machines are not generally set up identically, but this might be the incentive to start ensuring that. I will work on getting a minimal reproduction. In the meantime, here is the result of the trial following your instructions. It demonstrates that something is being shared across machines, but most artifacts are not. Looking at one jar in particular, it is identical on both machines but was not served from the cache. After this trial I began upgrading both machines to High Sierra and will see whether that allows them to reuse more from the cache.
Thanks for helping!
Rob


==============

Workstation (OSX 10.11.6, LLVM 8.0.0, clang 800.0.42.1)
Laptop (OSX 10.12, LLVM 8.1.0, clang 802.0.42)

## Build of util/...

On my workstation
$ rm -rf bazel-out/darwin_x86_64-fastbuild/bin/src
$ bazel build --config=cache src/com/yext/util/...
(3m15s, remote cache is 28M)

On my laptop
$ rm -rf bazel-out/darwin_x86_64-fastbuild/bin/src
$ bazel build --config=cache src/com/yext/util/...
(2m55s, remote cache is 39M)

Doing this in the reverse order, the workstation build was a bit faster at 2m30s. So, something's getting cached, just not most of it.

## Build of one package, util/time

Workstation

(1) Clean build, with remote cache empty

$ rm -rf bazel-out/darwin_x86_64-fastbuild/bin/src
$ bazel build --config=cache src/com/yext/util/time/...
INFO: Analysed target //src/com/yext/util/time:time.
INFO: Found 1 target...
Target //src/com/yext/util/time:time up-to-date:
  bazel-bin/src/com/yext/util/time/libtime.jar
INFO: Elapsed time: 3.671s, Critical Path: 3.55s
INFO: Build completed successfully, 27 total actions

(2) Clean build, with remote cache full

$ rm -rf bazel-out/darwin_x86_64-fastbuild/bin/src
$ bazel build --config=cache src/com/yext/util/time/...
INFO: Analysed target //src/com/yext/util/time:time.
INFO: Found 1 target...
Target //src/com/yext/util/time:time up-to-date:
  bazel-bin/src/com/yext/util/time/libtime.jar
INFO: Elapsed time: 0.192s, Critical Path: 0.07s
INFO: Build completed successfully, 27 total actions

(3) Take fingerprint of jar, check cache size

$ ls -l bazel-bin/src/com/yext/util/time/libtime.jar
-r-xr-xr-x  1 robfig  wheel  34765 Nov 17 17:25 bazel-bin/src/com/yext/util/time/libtime.jar
$ md5 bazel-bin/src/com/yext/util/time/libtime.jar
MD5 (bazel-bin/src/com/yext/util/time/libtime.jar) = 3487c7462f54c55249338eba990fb685

Cache size: 876K

Laptop

(4) Clean build, with remote cache full from workstation - 3.026s

Cache size: 936K

There is no difference in the JARs; same MD5
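
For comparing artifacts from two machines without relying on the md5 command-line tool, a short Python sketch using the standard hashlib module could look like the following. The helper names and the example paths in the usage note are hypothetical.

```python
import hashlib


def file_digest(path, algorithm="md5", chunk_size=1 << 20):
    """Return the hex digest of a file, reading it in chunks."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def same_content(path_a, path_b, algorithm="sha256"):
    """Return True if both files hash to the same digest."""
    return file_digest(path_a, algorithm) == file_digest(path_b, algorithm)
```

After copying the laptop's jar next to the workstation's, a call such as same_content("workstation/libtime.jar", "laptop/libtime.jar") would confirm whether the outputs are byte-identical even though the cache lookup missed.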





Rob Figueiredo

Nov 19, 2017, 1:16:43 PM
to bazel-discuss
Hi Jakob,

I got the artifacts to be shared across machines! I had configured both machines with the same versions of macOS, Xcode, and Bazel, but it did not work until I replaced the Homebrew-installed Bazel with the distribution from GitHub. Although both Homebrew installs reported the same version, their build timestamps were different. Both said the latest version was installed when running "brew upgrade bazel", yet they did not have the same binary.

Workstation:
Build label: 0.7.0-homebrew
Build target: bazel-out/darwin_x86_64-opt/bin/src/main/java/com/google/devtools/build/lib/bazel/BazelServer_deploy.jar
Build time: Sun Nov 12 03:09:44 2017 (1510456184)
Build timestamp: 1510456184
Build timestamp as int: 1510456184

Laptop:
Build label: 0.7.0-homebrew
Build target: bazel-out/darwin_x86_64-opt/bin/src/main/java/com/google/devtools/build/lib/bazel/BazelServer_deploy.jar
Build time: Sun Nov 12 03:11:31 2017 (1510456291)
Build timestamp: 1510456291
Build timestamp as int: 1510456291

I'll see if that resolves all my issues.
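
A difference like the one above is easy to miss by eye. The following Python sketch parses "Key: value" output of the kind shown (the helper names are made up for this example) and reports only the fields that differ between two machines.

```python
def parse_version_info(text):
    """Parse 'Key: value' lines into a dict, splitting on the first colon."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def differing_fields(text_a, text_b):
    """Return {field: (value_a, value_b)} for every field that differs."""
    a, b = parse_version_info(text_a), parse_version_info(text_b)
    return {k: (a.get(k), b.get(k))
            for k in sorted(set(a) | set(b)) if a.get(k) != b.get(k)}


# Values copied from the two outputs above.
workstation = """Build label: 0.7.0-homebrew
Build time: Sun Nov 12 03:09:44 2017 (1510456184)
Build timestamp: 1510456184"""
laptop = """Build label: 0.7.0-homebrew
Build time: Sun Nov 12 03:11:31 2017 (1510456291)
Build timestamp: 1510456291"""

for field, values in differing_fields(workstation, laptop).items():
    print(field, values)
```

Here the build label matches and only the build time and timestamp are reported, which is the signal that the two binaries were built separately.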

Thanks,
Rob

Rob Figueiredo

Nov 20, 2017, 7:05:43 PM
to Jakob Buchgraber, bazel-discuss
Hi Jakob,
I unfortunately did find one coworker who has the identical Bazel version and the same versions of OS X and Xcode, but who was not sharing my artifacts. I did notice that his output was under bazel-out/local-fastbuild, while mine and other coworkers' was under bazel-out/darwin_x86_64-fastbuild. Do you think that could be relevant? I searched but couldn't see why that would differ, considering we have nearly identical setups.
Thanks,
Rob

George Gensure

Nov 20, 2017, 7:52:49 PM
to Rob Figueiredo, Jakob Buchgraber, bazel-discuss
That will indeed cause misses, as the output filenames are part of the action definition.

I've put up a branch which represents some of my work for tooling on this front.  It adds:

  --[no]remote_cache_log_all_actions (a boolean; default: "false")
    Log action definition for remote action cache. Only functional with remote
    logging
  --remote_cache_log_filename (a string; default: see description)
    Log remote cache interactions to this file. Each entry contains the
    following: 'ACTION,KEY,ID', ACTION is the request type, KEY is the cache's
    identifier, ID is a context description identifier for the request.
  --[no]remote_cache_log_missed_actions (a boolean; default: "false")
    Log action definition with remote action cache misses. Only functional with
    remote logging

The action logging can be extremely verbose in large input trees, but has been instrumental in discovering differences for us between 'blessed' build environments and degraded ones.


-George


Rob Figueiredo

Nov 20, 2017, 7:56:51 PM
to George Gensure, Jakob Buchgraber, bazel-discuss
Very interesting, I will use that for testing. Thank you!

In the meantime, do you know how the bazel-out subdirectory name is generated? Why would one be local-fastbuild vs darwin_x86_64-fastbuild?

Thank you,
Rob

George Gensure

Nov 20, 2017, 9:54:21 PM
to Rob Figueiredo, Jakob Buchgraber, bazel-discuss
TL;DR: the string is "-".join([toolchain_identifier, compilation_mode]); have your colleague compare his configs/.bazelrc with yours.

The real definition of this is extremely convoluted. It is defined in the documentation as:
          bazel-out/                      <== All actual output of the build is under here: outputPath
            local_linux-fastbuild/        <== one subdirectory per unique target BuildConfiguration instance;
                                              this is currently encoded
I implore a bazel expert to elaborate on this, but I can say that I get my guaranteed toolchain_identifier in place by providing a 'default_toolchain' in my CROSSTOOL, specified with --crosstool_top. The build config appears to be populated by my loading this crosstool with no compiler/config specified.
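
Spelled out as runnable Python, the TL;DR expression gives the two directory names seen in this thread. This is a simplified model, not Bazel's real logic; the function name is made up, and the toolchain identifiers "darwin_x86_64" and "local" are inferred from the directory names reported earlier.

```python
def output_dir_name(toolchain_identifier, compilation_mode):
    """Simplified model of the bazel-out subdirectory name, per the TL;DR."""
    return "-".join([toolchain_identifier, compilation_mode])


print(output_dir_name("darwin_x86_64", "fastbuild"))  # darwin_x86_64-fastbuild
print(output_dir_name("local", "fastbuild"))          # local-fastbuild
```

Because output paths are part of the action definition, two machines that resolve different toolchain identifiers end up with different output paths and therefore cannot share cache entries.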

-George

Jakob Buchgraber

Nov 21, 2017, 6:52:53 AM
to bazel-discuss
That's great, George. We need to add the --remote_cache_log_all_actions functionality to Bazel. It seems most useful for debugging such issues. Would you want to polish the code and open a PR? Otherwise, I can take care of it.

We will need to sort the actions so that they are in a stable format and two logs are easily comparable using e.g. diff.
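
As a rough sketch of the "sort, then diff" idea, the Python snippet below sorts the entries of two logs and produces a unified diff with the standard difflib module. The sample entries follow the 'ACTION,KEY,ID' shape described for the proposed log file, but the concrete values ("GET", "key1", "ctx") are invented for illustration, as is the function name.

```python
import difflib


def diff_cache_logs(log_a, log_b):
    """Sort the non-empty entries of two logs and return unified-diff lines."""
    a = sorted(line for line in log_a.splitlines() if line.strip())
    b = sorted(line for line in log_b.splitlines() if line.strip())
    return list(difflib.unified_diff(a, b, "machine_a", "machine_b", lineterm=""))


# Hypothetical log contents; the entry order differs between machines.
machine_a = "GET,key2,ctx\nGET,key1,ctx\n"
machine_b = "GET,key1,ctx\nGET,key3,ctx\n"

for line in diff_cache_logs(machine_a, machine_b):
    print(line)
```

Sorting first means that entries logged in a different order on each machine do not show up as spurious differences; only keys present on one side remain in the diff.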

Best,
Jakob


George Gensure

Nov 21, 2017, 7:09:51 AM
to Jakob Buchgraber, bazel-discuss
I can get that into a PR.

-George


Marcel Hlopko

Nov 22, 2017, 9:07:46 AM
to bazel-discuss
Hi Rob,

I think the reason the output directory is not the same is that you don't have the same toolchains. Bazel probably decided that your machine has a working Xcode (so it used a toolchain that supports objc), but your coworker's doesn't (so Bazel used a C++-only toolchain). Xcode is a funny beast: your coworker might just need to run Xcode once, install extensions, accept the license, and then force Bazel to reconfigure by running bazel clean --expunge. In any case, the toolchain identifier will no longer be part of the output directory in 0.8.0. You can check whether that helps by running with --noexperimental_toolchain_id_in_output_directory. Hope it helps.

alpha....@gmail.com

Nov 27, 2017, 11:43:44 AM
to bazel-discuss
I need to analyze the build cache hit rate as well. Would it be better to write the data to the profile and use bazel analyze-profile to inspect it?

Alpha


Jakob Buchgraber

Nov 27, 2017, 12:04:16 PM
to alpha....@gmail.com, bazel-discuss
I think there are two different things:
1) Statistics about a build: detailed in the profiler, and simplified in the UI after each build.
2) Understanding why one is not getting cache hits, and being able to deep-dive on that.

We need both.

Best,
Jakob

 


Rob Figueiredo

Nov 27, 2017, 12:25:25 PM
to Marcel Hlopko, bazel-discuss
Hi Marcel,
This information is exactly what I was looking for! Thank you.
Much appreciated,
Rob


alpha....@gmail.com

Nov 29, 2017, 3:19:05 AM
to bazel-discuss
Bazel's profiler is extremely useful and a good place to put both kinds of information. You can have both by recording the input digests and action protobufs as events. I made a small change to also put remote cache tasks in the profiler. I then export the raw dump to be viewed in Chrome tracing. Please see the attached screenshot of the profile for building Bazel.

In my opinion, with a proper visualization tool (e.g. Chrome tracing, which Buck also exports to), finding and debugging the uncached actions is much simpler. This has other benefits:

1. Task hierarchy is preserved. I can dig deeper by grepping for the parent task id in the raw dump.
2. It helps find other bottlenecks. Sometimes uncached actions are not the bottleneck.
3. Standardizing on one format makes analysis easier.

Apart from this discussion, I think an analyze-profile dump format for Chrome tracing (JSON) would be very useful.
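
For readers unfamiliar with the format, here is a minimal Python sketch of what a Chrome tracing (Trace Event Format) JSON export could look like. It is not the change described above: the function name, the task tuples, and the task names are invented for illustration. It uses "complete" events (phase "X"), whose timestamps and durations are given in microseconds.

```python
import json


def to_chrome_trace(tasks):
    """Convert (name, start_us, duration_us, thread_id) tuples to trace JSON."""
    events = [
        {"name": name, "ph": "X", "ts": start_us, "dur": duration_us,
         "pid": 1, "tid": thread_id}
        for name, start_us, duration_us, thread_id in tasks
    ]
    return json.dumps({"traceEvents": events}, indent=2)


# Hypothetical tasks: a remote cache lookup followed by a local compile.
tasks = [
    ("remote cache lookup", 0, 1500, 1),
    ("javac //foo:bar", 1500, 40000, 1),
]
print(to_chrome_trace(tasks))
```

Saving the printed JSON to a file and loading it in chrome://tracing should show the two tasks as bars on one thread's timeline.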
bazel tracing.png

Jakob Buchgraber

Nov 29, 2017, 3:25:26 AM
to alpha....@gmail.com, bazel-discuss
Hi Alpha,

I didn't know about Chrome's tracing feature. That sounds great! Can you contribute this change to Bazel?

Best,
Jakob



alpha....@gmail.com

Dec 1, 2017, 12:32:10 AM
to bazel-discuss
I'll give it a try. Chrome tracing is one of its best features. :)

Jakob Buchgraber

Jan 16, 2018, 12:50:36 PM
to alpha....@gmail.com, ola...@google.com, ago...@google.com, bazel-discuss

Hi Alpha,

I just wanted to let you know that Ola and Alexandra are also planning on doing similar work. Maybe you guys could collaborate :).

Cheers,
Jakob




Ian O'Connell

Jan 25, 2018, 9:22:08 AM
to Alpha Lam, bazel-discuss
Hi Alpha,

Do you have a fork with your changes to support the Chrome profiling somewhere? I would love to run some of my builds through it to trace some odd misses for the AC (action cache).

Thanks,

Ian.
