remote caching basics


Michael Knight

Oct 22, 2016, 6:08:02 PM
to bazel-discuss
Hi,

I've been reading over some documents about remote caching, but I have a few questions that I'm still struggling to find answers for. What do people see as the primary goal of remote caching? Is the intention a shared cache for multiple users? Or is it to distribute your build products for faster remote execution/actions? Both? Something else?


Is there more I should be looking at?

thanks

Alpha Lam

Oct 24, 2016, 1:27:01 PM
to Michael Knight, bazel-discuss
From Google's perspective, it's to distribute build actions and allow remote execution to share the build artifacts. So the focus of their design is remote execution.

I think for most users without the resources to set up a build farm it makes more sense to be a shared cache for multiple users, or a shared cache between the CI bots and users.

Alpha

--
You received this message because you are subscribed to the Google Groups "bazel-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bazel-discuss+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/bazel-discuss/9f77ac30-194a-4830-a363-1c440e99f6f3%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Eric Burnett

Oct 24, 2016, 1:59:54 PM
to Alpha Lam, Michael Knight, bazel-discuss
To expand on Alpha's point a bit, just a shared cache (without execution) can be quite helpful, but requires a fairly specific set of circumstances:
  • Multiple builds being run that would hit a remote cache, but not already have results locally. E.g. clean builds can get value from a shared cache, but incremental ones less so.
  • Toolchains that either emit machine-independent results, or multiple machines having the same key characteristics.
  • Keying such that cache values that shouldn't be reused won't be, but still allow for cache hits. E.g. object files from the wrong version of gcc should probably not be picked up.
So example environments where this can be useful:
  1. Sharing between CI machines that are all running the exact same hardware and software. This one is easy - almost anything that can be cached locally should be able to be safely cached remotely too.
  2. Sharing between users, IF you know they're running the same hw/sw, OR you're sharing the results from tools where mis-versioning won't have a negative effect, OR you have good enough keying to pick up only the transferable actions from the graph.
I generally assume that this is going to be surprisingly difficult for most people, and that getting this right gets you most of the way to being able to do remote execution anyways. But I've definitely heard from folk (including Alpha) who have environments that allow them to usefully capitalize on shared caches, so I'm not going to say that it cannot be done :).
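
A minimal sketch of the keying idea above, assuming a hypothetical `action_key` helper (this is not Bazel's actual key format; the parameter names and the toolchain/platform strings are made up for illustration). The point is only that anything able to change the output, such as the compiler version, goes into the key:

```python
import hashlib


def action_key(command, input_digests, toolchain_id, platform_id):
    """Return a hex cache key covering everything assumed to affect the output.

    All parameter names are hypothetical; this is not Bazel's real key format.
    """
    h = hashlib.sha256()

    def add(field):
        # Length-prefix each field so adjacent fields cannot run together.
        data = field.encode("utf-8")
        h.update(len(data).to_bytes(8, "big"))
        h.update(data)

    for arg in command:
        add(arg)
    add("--inputs--")
    # Sort inputs so the key does not depend on dict ordering.
    for path in sorted(input_digests):
        add(path)
        add(input_digests[path])
    add(toolchain_id)
    add(platform_id)
    return h.hexdigest()


cmd = ["gcc", "-c", "foo.c", "-o", "foo.o"]
inputs = {"foo.c": "abc123", "foo.h": "def456"}

key_a = action_key(cmd, inputs, "gcc-5.4.0", "linux-x86_64")
key_b = action_key(cmd, inputs, "gcc-5.4.0", "linux-x86_64")
key_c = action_key(cmd, inputs, "gcc-4.9.3", "linux-x86_64")

print(key_a == key_b)  # same toolchain and inputs: a cache hit is possible
print(key_a == key_c)  # different gcc version: the keys differ, so no reuse
```

With a key like this, two machines only share a result when they agree on the command, the input contents, the toolchain, and the platform.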

Couple other misc things to think about:
  • Using bazel well can often have better impact directly than relying on a remote cache. E.g. if you can arrange it so you do incremental builds instead of clean builds (in your CI setup for example), I'd expect that to bring you most of the benefits of a remote cache at a lot less effort.
  • Toolchains can have surprising coupling to the machines they run on, making results non-portable. E.g. C++ debug info often embeds absolute paths.
  • ...though some tool results will be trivially cacheable. E.g. if you had an action in bazel to minify pngs, I'd be surprised if you couldn't trivially cache the results of that. But that'll be specific to your particular workload, and what actions within it are both cacheable and expensive.
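
To illustrate the absolute-path point, here is a toy sketch. `fake_compile` is a hypothetical stand-in for a real compiler, and the workspace paths are invented; it only shows why an output that embeds the working directory stops being byte-identical across machines:

```python
import hashlib


def fake_compile(source, workdir, embed_paths):
    """Toy stand-in for a compiler (hypothetical, not a real toolchain).

    The "object file" is the source bytes, plus the absolute working
    directory when embed_paths is set, mimicking debug info.
    """
    out = source.encode("utf-8")
    if embed_paths:
        out += b"\n#debug-dir " + workdir.encode("utf-8")
    return out


def digest(data):
    # Content hash, as a cache would use to compare outputs.
    return hashlib.sha256(data).hexdigest()


src = "int main() { return 0; }"

# Two users building the same source in different checkouts.
alice_dbg = digest(fake_compile(src, "/home/alice/ws", embed_paths=True))
bob_dbg = digest(fake_compile(src, "/home/bob/ws", embed_paths=True))
alice_rel = digest(fake_compile(src, "/home/alice/ws", embed_paths=False))
bob_rel = digest(fake_compile(src, "/home/bob/ws", embed_paths=False))

print(alice_dbg == bob_dbg)  # outputs differ once the path is embedded
print(alice_rel == bob_rel)  # outputs match when no path is embedded
```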
Cheers,
Eric

Alpha Lam

Oct 24, 2016, 2:14:15 PM
to Eric Burnett, Michael Knight, bazel-discuss
It is worth mentioning that base image sandboxing can help with this issue a bit: http://www.bazel.io/docs/designs/2016/06/02/sandboxing.html#base-images

If system dependencies and toolchains are provided by the base image, then users essentially have an identical build environment. Of course there will be some corner cases that might break, and this only works with certain OSes.

Alpha

Dan Fabulich

Nov 2, 2016, 2:21:47 PM
to bazel-discuss, ericb...@google.com, mad...@gmail.com
Is the base image feature something that actually exists today?

Alpha Lam

Nov 2, 2016, 2:25:17 PM
to Dan Fabulich, bazel-discuss, Eric Burnett, Michael Knight
No, I don't think so.
