remote_http_cache + experimental_local_disk_cache


James Judd

May 9, 2018, 11:49:44 PM
to bazel-discuss
Hi everyone,

We are setting up caching (which has been awesome so far!) and ran into something we didn't expect.

We were hoping that when remote_http_cache and experimental_local_disk_cache are used together, both remote cache hits and misses would populate the local disk cache. In our testing, when remote_http_cache is enabled, neither remote cache hits nor misses populate the local cache. Is this intended behavior or a bug? I can file an issue on GitHub if it is a bug.

Best,
James

Klaus Aehlig

May 11, 2018, 5:03:55 AM
to James Judd, bazel-discuss

Hi James,

> We were hoping that when remote_http_cache and
> experimental_local_disk_cache are used together, that both remote cache
> hits and misses would populate the local disk cache.

I'm not sure which cache you are talking about.

- There is the disk_cache, formerly known as experimental_local_disk_cache, which,
however, is a cache for actions (in the sense of the action graph). Any downloads
by workspace rules conceptually happen before analysis, so they are not actions in
that sense.

- There is the repository_cache, now enabled by default and shared between
workspaces. It caches, on local disk, files downloaded by Bazel via
HTTP. A cache hit, however, can only happen if a sha256 sum for the file
to be downloaded is specified (the cache key is the checksum). If a file downloaded
by Bazel at head (regardless of whether a remote cache hit or miss happened) does not
end up in that cache, this is indeed a bug.
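To make the repository_cache point concrete, here is a minimal Python sketch of a download cache keyed by SHA-256. It is not Bazel's implementation: the class `RepositoryCacheSketch` and the `fetch` callback (standing in for the HTTP download) are hypothetical. It shows why a hit is only possible when the checksum is specified up front, while every download still ends up in the cache.

```python
import hashlib
from pathlib import Path
from typing import Callable, Optional


class RepositoryCacheSketch:
    """Hypothetical content-addressed download cache; not Bazel's implementation."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def _entry(self, sha256: str) -> Path:
        # The file's SHA-256 checksum is the cache key.
        return self.root / sha256

    def download(self, url: str, fetch: Callable[[str], bytes],
                 sha256: Optional[str] = None) -> bytes:
        # A lookup is only possible when the caller specifies the checksum.
        if sha256 is not None and self._entry(sha256).exists():
            return self._entry(sha256).read_bytes()

        data = fetch(url)  # miss, or no checksum given: go to the network
        actual = hashlib.sha256(data).hexdigest()
        if sha256 is not None and actual != sha256:
            raise ValueError(f"checksum mismatch for {url}")
        # Every successful download is stored, so a later request that
        # does specify the checksum can be served from disk.
        self._entry(actual).write_bytes(data)
        return data
```

Calling `download(url, fetch)` twice without a checksum fetches twice; once the caller passes `sha256=...`, the stored copy is returned without calling `fetch`.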

Best,
Klaus

--
Klaus Aehlig
Google Germany GmbH, Erika-Mann-Str. 33, 80636 Muenchen
Register court and number: Hamburg, HRB 86891
Registered office: Hamburg
Managing Directors: Paul Terence Manicle, Halimah DeLaine Prado

Philipp Wollermann

May 12, 2018, 1:11:44 PM
to Klaus Aehlig, ja...@lucidchart.com, bazel-discuss
On Fri, May 11, 2018 at 11:03 AM 'Klaus Aehlig' via bazel-discuss <bazel-...@googlegroups.com> wrote:
>> We were hoping that when remote_http_cache and
>> experimental_local_disk_cache are used together, that both remote cache
>> hits and misses would populate the local disk cache.
>
> I'm not sure which cache you are talking about.
>
> - There is the disk_cache, [...]
>
> - There is the repository_cache, [...]

Klaus: Considering that James wrote "remote_http_cache and experimental_local_disk_cache" and that both are action caches, it's probably the first?
Note that the patch that renamed "--experimental_local_disk_cache" to "--disk_cache" is not yet available in any Bazel release.

James: I don't think we have tested or currently support running with two action caches, but I agree that this sounds like a nice feature and how it should work intuitively. Could you please file a GitHub issue and assign it to @buchgr?

Cheers,
Philipp

George Gensure

May 12, 2018, 1:29:05 PM
to Philipp Wollermann, Klaus Aehlig, James Judd, bazel-discuss
FWIW, we will soon be releasing a local disk cache service, based on Buildfarm, that wraps Bazel execution and can proxy and write through to a gRPC, Hazelcast, or HTTP action cache, as well as proxy remote execution.

-George

--
You received this message because you are subscribed to the Google Groups "bazel-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bazel-discus...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/bazel-discuss/CA%2BAhZoik9q4oJdfJQ68BE4nRx8y4%3DXVsqK2yOgGebD6%2Bw%3DjJLw%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Jakob Buchgraber

May 13, 2018, 7:46:27 AM
to Philipp Wollermann, Klaus Aehlig, ja...@lucidchart.com, bazel-discuss
Hi,

answers inline.

We are planning on introducing a single --cache=(file|http|grpc)://server.com:port flag to replace all the existing caching flags. Such an interface would not be compatible with two caches. I can see that this would be nice, but it would also require a lot of refactoring, and I wonder whether it wouldn't be smarter to move this complexity outside of Bazel, into a local cache server. In fact, we have plans for https://github.com/buchgr/bazel-remote to support this.
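The proposed `--cache=(file|http|grpc)://server.com:port` flag was only a proposal at this point. Here is a minimal Python sketch, under that assumption, of how a single URL-valued flag could select a backend by scheme; `parse_cache_flag` and `CacheConfig` are hypothetical names. It also illustrates the incompatibility mentioned above: one flag value yields exactly one backend, so there is nowhere to express a second cache.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

SUPPORTED_SCHEMES = {"file", "http", "grpc"}


@dataclass(frozen=True)
class CacheConfig:
    scheme: str    # which backend implementation to use
    location: str  # host:port for network caches, a path for file caches


def parse_cache_flag(value: str) -> CacheConfig:
    """Parse a hypothetical --cache=<scheme>://<location> flag value."""
    parsed = urlparse(value)
    if parsed.scheme not in SUPPORTED_SCHEMES:
        raise ValueError(f"unsupported cache scheme: {parsed.scheme!r}")
    if parsed.scheme == "file":
        # file:///var/cache/bazel has an empty netloc; the path is the location.
        location = parsed.netloc + parsed.path
    else:
        location = parsed.netloc
    if not location:
        raise ValueError(f"missing cache location in {value!r}")
    return CacheConfig(parsed.scheme, location)
```

For example, `parse_cache_flag("grpc://server.com:8980")` yields a single gRPC configuration; chaining a disk cache in front of it would need a second flag or a richer value syntax.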

Thoughts?

Best,
Jakob

ittai zeidman

May 13, 2018, 11:38:04 PM
to bazel-discuss
Yes actually.
This sounds horrible.
For us the file cache is meant to fix problems with local development and switching branches.
Remote cache (grpc) is meant to allow utilizing what the CI already built.
We need this to be a chain.

Jakob Buchgraber

May 14, 2018, 5:05:08 AM
to ittai zeidman, bazel-discuss
Hi Ittai,


What are you referring to? Also, I would love to hear your thoughts about the --cache flag.

Best,
Jakob
 

ittai zeidman

May 14, 2018, 9:54:36 AM
to Jakob Buchgraber, bazel-discuss
Hi Jakob,
First, let me apologize for the tone of my previous mail; it was too harsh and wasn't called for. I'm sorry.

I was indeed referring to your proposal of having a unified flag.
The use case I have is as follows. A developer works on their machine and wants:
- A remote cache (gRPC), to make use of what CI has already built.
- A disk cache for switching between branches (maybe even for sharing between different workspaces).

This means they need a chain: try the disk cache first and, on a miss there, fall back to the remote cache.
The key point to remember is that in this use case the developer doesn't use remote execution, so their work doesn't exist in the remote cache.
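The chain described above can be sketched in Python as a read-through lookup. This illustrates the desired behavior only and is not Bazel code; the `ActionCache` protocol, the `InMemoryCache` stand-in, and `lookup` are hypothetical. One assumption goes beyond the message: a remote hit is also copied into the disk cache (the backfill asked for at the start of the thread), so the next lookup is served locally.

```python
from typing import Dict, Optional, Protocol


class ActionCache(Protocol):
    def get(self, key: str) -> Optional[bytes]: ...
    def put(self, key: str, value: bytes) -> None: ...


class InMemoryCache:
    """Stand-in for a real disk or gRPC cache, for illustration only."""

    def __init__(self) -> None:
        self.entries: Dict[str, bytes] = {}

    def get(self, key: str) -> Optional[bytes]:
        return self.entries.get(key)

    def put(self, key: str, value: bytes) -> None:
        self.entries[key] = value


def lookup(key: str, disk: ActionCache, remote: ActionCache) -> Optional[bytes]:
    """Try the disk cache first, then fall back to the remote cache."""
    value = disk.get(key)
    if value is not None:
        return value
    value = remote.get(key)
    if value is not None:
        # Backfill the disk cache so switching back to this branch later
        # is served locally, even offline.
        disk.put(key, value)
    return value
```

A `None` result means both tiers missed and the action has to be executed locally.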

Jakob Buchgraber

May 14, 2018, 4:20:28 PM
to ittai zeidman, bazel-discuss
Hi ittai,

answers inline.

On Mon, May 14, 2018 at 3:54 PM ittai zeidman <itt...@gmail.com> wrote:
> [...]
> This means they need a chain where they try to use the disk cache and if there's a miss there they fallback to the remote cache.
> The key point to remember is that in this use-case the developer doesn't use remote execution and so their work doesn't exist in the remote cache.


Understood. So I was wondering whether this behavior really needs to be in Bazel or whether it would be sufficient to have a local caching daemon on each developer machine that implements the described behavior, with the daemon acting as a remote cache to Bazel.

Thanks for your input!

- Jakob

ittai zeidman

May 14, 2018, 4:34:01 PM
to Jakob Buchgraber, bazel-discuss
Right, forgot to comment on that.
It sounds like that adds yet more complexity to the setup which I’d rather avoid.
I’m not 100% dead set against it but it just makes the entry barrier higher.

Eric Burnett

May 14, 2018, 5:25:13 PM
to bazel-discuss
My 2c: a local caching daemon sounds like a good direction if most users of Bazel with remote caching/execution don't need this, but if ~everyone is going to leverage a local cache on top of the remote cache, then baking the functionality into Bazel proper is a nice simplification (as Ittai desires). They're not mutually exclusive, though: starting with a caching daemon to prove out the value and iterate on the details, and then (if desired) baking the logic back into Bazel itself, seems like a fine progression to me.

I'll also note that there are many variants of this that one might want, potentially including all of: a "remote" cache (a persistent cache in a service), an "office" cache (a local multi-user cache, for network proximity), a "local" cache (disk), and fallback+backfill through several of these.

George: One concern I have is that a cache service by itself would not be compatible with the BuildEventStream Bazel can put out, as it would point URIs at the local caching proxy instead of the "canonical" path (see https://github.com/bazelbuild/bazel/blob/master/src/main/java/com/google/devtools/build/lib/remote/RemoteModule.java#L66-L67). I expect the right answer is that the URI prefix should be overridable via another Bazel flag (yay!), with the appropriate canonical path passed in explicitly whenever a proxy is being used, so that the build event stream contains references as if the proxy were not being used. Not sure if this is on your radar yet or not.
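A minimal Python sketch of the rewrite being suggested: URIs that point at the local proxy are mapped to the canonical host before they appear in the build event stream. The `bytestream://<host>/blobs/<hash>/<size>` shape in the example, the host names, and the `canonicalize_uri` helper are assumptions for illustration; no such override flag is defined in this thread.

```python
from urllib.parse import urlsplit, urlunsplit


def canonicalize_uri(uri: str, proxy_netloc: str, canonical_netloc: str) -> str:
    """Rewrite a URI pointing at a local caching proxy to the canonical host.

    URIs for any other host are returned unchanged.
    """
    parts = urlsplit(uri)
    if parts.netloc != proxy_netloc:
        return uri
    # Keep scheme, path, query, and fragment; swap only host:port.
    return urlunsplit(parts._replace(netloc=canonical_netloc))
```

For example, with a proxy at `localhost:8080` and a canonical cache at `cache.example.com`, `bytestream://localhost:8080/blobs/abc/5` becomes `bytestream://cache.example.com/blobs/abc/5`.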

--Eric

Robert Gay

May 14, 2018, 5:27:42 PM
to ittai zeidman, Jakob Buchgraber, bazel-discuss
My $0.02 (for whatever it's worth): I really, really like being able to switch back and forth between branches locally, or add a new `git worktree` off some previously built revision, without having to rebuild the world from that point again or download hundreds of megabytes of jars from a remote cache. I don't need this behavior 100% of the time, but when I do need it (e.g., on a train working primarily offline), it saves me quite a few minutes each time. I may be in the minority, but having a local action cache by default just seems "right" to me. (I've actually had coworkers get confused when they realized that Bazel didn't have this behavior natively.)

That being said, for me in particular the local action cache only provides significant value in a relatively small percentage of my day: namely, when I'm on spotty internet or when I want to create a new branch off some existing work. If it requires significant refactoring to implement, I could survive with an external local process doing the caching, though I agree w/ Ittai that for most folks the barrier to entry of running a second, persistent service locally is high enough that they probably won't use it.
--

ROBERT GAY | REDFIN | PRINCIPAL ENGINEER

rober...@redfin.com

James Judd

May 14, 2018, 5:54:05 PM
to Robert Gay, ittai zeidman, Jakob Buchgraber, bazel-discuss
At Lucid we use the local cache for a similar purpose: speeding up switching branches. We use CI to populate a remote cache in S3. A caching proxy is set up in the office to make accessing that cache faster. 

To work around the remote cache not populating the local cache, we set up a local nginx proxy that stores remote cache hits locally. Here's a quickly hacked-together flowchart showing our first pass at a workaround for this.


Based on the responses, it seems like this is a common use case. It would be nice if Bazel supported this by default.

Best,
James


Jakob Buchgraber

May 16, 2018, 8:46:21 AM
to James Judd, rober...@redfin.com, ittai zeidman, bazel-discuss
Hi,

thanks all for your input. I have thought about it a bit more and I agree that it makes sense for Bazel to support this scenario.
So the proposal would be to have two flags: --disk_cache and --remote_cache. I think the sane behavior would be for Bazel
to first try to read from the disk cache and then from the remote cache. For writing, Bazel would populate both caches
concurrently by default. If the --noremote_upload_local_results flag is specified, it would only write to the disk cache. One can't
use the disk cache in read-only mode.
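The proposed read/write policy can be sketched in Python. `CombinedCache` and `DictCache` are hypothetical names, and `upload_local_results` only mirrors the meaning of `--remote_upload_local_results` (whose negation is `--noremote_upload_local_results`); this is an illustration of the proposal, not Bazel's code. The proposal doesn't say whether remote hits are copied into the disk cache, so this sketch leaves that out.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Optional


class DictCache:
    """In-memory stand-in for a disk or remote cache (illustration only)."""

    def __init__(self) -> None:
        self.entries: Dict[str, bytes] = {}

    def get(self, key: str) -> Optional[bytes]:
        return self.entries.get(key)

    def put(self, key: str, value: bytes) -> None:
        self.entries[key] = value


class CombinedCache:
    """Hypothetical sketch of the proposed --disk_cache + --remote_cache behavior."""

    def __init__(self, disk: DictCache, remote: DictCache,
                 upload_local_results: bool = True) -> None:
        self.disk = disk
        self.remote = remote
        # False corresponds to passing --noremote_upload_local_results.
        self.upload_local_results = upload_local_results

    def get(self, key: str) -> Optional[bytes]:
        # Read order from the proposal: disk cache first, then remote cache.
        value = self.disk.get(key)
        return value if value is not None else self.remote.get(key)

    def put(self, key: str, value: bytes) -> None:
        # The disk cache is always written; it has no read-only mode.
        if not self.upload_local_results:
            self.disk.put(key, value)
            return
        # By default both caches are populated concurrently.
        with ThreadPoolExecutor(max_workers=2) as pool:
            futures = [pool.submit(self.disk.put, key, value),
                       pool.submit(self.remote.put, key, value)]
            for future in futures:
                future.result()  # re-raise any write error
```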

I still think there's value in having a local daemon, as described above: in addition to what Bazel can do, a daemon can upload files to a remote cache asynchronously. This can be a *major* performance improvement when using remote caching for incremental
builds. We probably could not, and would not want to, implement such asynchronous upload behavior in Bazel.
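The asynchronous-upload idea can be illustrated with a small Python sketch. `AsyncUploadingCache` and its `upload` callback are hypothetical; a real daemon would also speak a cache protocol to Bazel and persist entries to disk. What it shows is the performance argument: the caller only waits for the local write, while remote uploads drain on a background thread.

```python
import queue
import threading
from typing import Callable, Dict, List, Optional, Tuple


class AsyncUploadingCache:
    """Hypothetical core of a local cache daemon: local writes are synchronous,
    remote uploads are drained by a background thread."""

    def __init__(self, upload: Callable[[str, bytes], None]) -> None:
        self.local: Dict[str, bytes] = {}
        self.failed: List[Tuple[str, Exception]] = []
        self._upload = upload
        self._pending: "queue.Queue[Optional[Tuple[str, bytes]]]" = queue.Queue()
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def put(self, key: str, value: bytes) -> None:
        self.local[key] = value          # the build waits only for this...
        self._pending.put((key, value))  # ...the remote upload happens later

    def get(self, key: str) -> Optional[bytes]:
        return self.local.get(key)

    def _drain(self) -> None:
        while True:
            item = self._pending.get()
            if item is None:  # shutdown sentinel
                self._pending.task_done()
                return
            try:
                self._upload(*item)
            except Exception as exc:  # record the failure and keep draining
                self.failed.append((item[0], exc))
            finally:
                self._pending.task_done()

    def close(self) -> None:
        """Wait until queued uploads have been attempted, then stop the worker."""
        self._pending.join()
        self._pending.put(None)
        self._worker.join()
```

Because a single worker drains a FIFO queue, uploads happen in the order entries were written; a failed upload is recorded in `failed` without blocking the build.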

Best,
Jakob

Jakob Buchgraber
Software Engineer

Google Germany GmbH
Erika-Mann-Straße 33
80636 München

Managing Directors: Paul Manicle, Halimah DeLaine Prado
Register court and number: Hamburg, HRB 86891
Registered office: Hamburg

This e-mail is confidential. If you received this communication by mistake, please don't forward it to anyone else, please erase all copies and attachments, and please let me know that it has gone to the wrong person.

ittai zeidman

May 16, 2018, 9:08:54 AM
to Jakob Buchgraber, James Judd, rober...@redfin.com, bazel-discuss
Sounds good to me.
Question: would the disk cache replace the workspace cache?

Jakob Buchgraber

May 16, 2018, 9:21:53 AM
to ittai zeidman, James Judd, rober...@redfin.com, bazel-discuss
Hi ittai,

No. It currently is, and will continue to be, in addition to the workspace cache.

Best,
Jakob

nox...@gmail.com

Aug 12, 2018, 9:07:37 PM
to bazel-discuss