Debugging action cache differences where they are not expected

glin...@gmail.com

May 20, 2016, 5:34:59 PM
to bazel-discuss
We are trying to get bazel caching support up and running (awesome work btw!).

However, we are hitting a strange issue: files that shouldn't need to be rebuilt are getting rebuilt across different machines.

Digging into it, we dumped the action cache on both machines (bazel dump --action_cache) and compared the actionKey, digestKey, and dependencies for one of the files showing this behavior. Sure enough, the actionKey and digestKey are different, yet all the dependencies are the same. The dependencies are all in the source tree (or included via new_http_archive), so I'm at a bit of a loss as to how we could be getting differences here.

Any pointers for debugging this sort of issue?

Thanks!

Janak Ramakrishnan

May 20, 2016, 7:02:01 PM
to glin...@gmail.com, bazel-discuss
The action key is computed based (roughly) on the command line that
will be executed -- it's actually independent of the inputs. Is it
possible that your different machines have different configurations,
like versions of the compiler, locations of libraries, etc., that are
preventing caching? And can you clarify: are you sharing the cache
across multiple machines, or are you seeing unexpected rebuilds even
within a single machine?
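
To illustrate why a command-line difference alone can change the key, here is a toy sketch. It is not Bazel's actual algorithm: the function name toy_action_key and the example command lines are made up, and the only assumption carried over from the thread is that the key is derived (roughly) from the command line rather than from the input contents.

```python
import hashlib


def toy_action_key(argv):
    """Hash a command line into a hex key. Illustration only, NOT Bazel's real algorithm."""
    h = hashlib.sha256()
    for arg in argv:
        h.update(arg.encode("utf-8"))
        h.update(b"\0")  # separator so ["ab", "c"] and ["a", "bc"] hash differently
    return h.hexdigest()


# Hypothetical command lines: same source file, but the compiler lives at a different path.
machine_a = ["/usr/bin/gcc", "-c", "foo.cc", "-o", "foo.o"]
machine_b = ["/usr/local/bin/gcc", "-c", "foo.cc", "-o", "foo.o"]

# The source file is identical on both machines, yet the toy keys differ.
print(toy_action_key(machine_a) == toy_action_key(machine_b))  # prints False
```

In this toy model, any path or flag that differs between machines yields a different key even when every input file is byte-for-byte identical.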
> --
> You received this message because you are subscribed to the Google Groups "bazel-discuss" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to bazel-discus...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/bazel-discuss/3d1c43fd-fc3d-42b7-bf91-0da55e4f322a%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

glin...@gmail.com

May 20, 2016, 7:07:51 PM
to bazel-discuss, glin...@gmail.com
Thanks for the explanation! Is there any way to get the command line that it's using to generate that hash?

It's two different machines with the same base OS and libraries, although they might have been installed at slightly different times. All the paths should be the same.

On the same machine, it behaves as expected with the remote cache (i.e., clean, then rebuild just downloads cached items instead of rebuilding them).

The test we tried was clearing the cache, building on one machine, then building on another machine. We saw it download some items, but it also built some.

Janak Ramakrishnan

May 20, 2016, 7:10:39 PM
to glin...@gmail.com, bazel-discuss
If you run with -s (--subcommands) then Bazel will print out the
command line of every action it actually executes. So doing it on a
build where you don't expect to see any actions executed should be
helpful. Just to be clear, you're building on one machine, then
copying the entire installation to another machine, and building
there?
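
Once the -s output from each machine has been saved to a text file, the two can be compared mechanically. Below is a minimal sketch, assuming each log has already been read into a list of lines. The function name diff_command_logs and the sample command lines are made up for illustration and are not real Bazel output.

```python
import difflib


def diff_command_logs(lines_a, lines_b, name_a="machine_a", name_b="machine_b"):
    """Return unified-diff lines between two captured command-line logs."""
    return list(
        difflib.unified_diff(
            lines_a, lines_b, fromfile=name_a, tofile=name_b, lineterm=""
        )
    )


# Made-up command lines for illustration only; not real Bazel output.
log_a = [
    "gcc -isystem /usr/include -c foo.cc -o foo.o",
    "gcc -c bar.cc -o bar.o",
]
log_b = [
    "gcc -isystem /usr/local/include -c foo.cc -o foo.o",
    "gcc -c bar.cc -o bar.o",
]

for line in diff_command_logs(log_a, log_b):
    print(line)
```

For real logs, each list could be loaded with open(path).read().splitlines(). Lines marked with - or + in the result are the command lines that differ between the two machines.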

glin...@gmail.com

May 20, 2016, 7:26:40 PM
to bazel-discuss, glin...@gmail.com
Ok, will compare those two. Interestingly I tried on another computer in the meantime, and it did cache the target properly.

We are just cloning the source tree onto two different machines, then doing a build.

glin...@gmail.com

May 20, 2016, 7:51:40 PM
to bazel-discuss, glin...@gmail.com
Slightly different issue now: we have a target where the actionKeys are the same, but the digestKeys are different. Again, it uses only source in the repository. Does Bazel hash the compiler binary, libc, etc. when determining the digestKey?

Janak Ramakrishnan

May 21, 2016, 1:46:48 PM
to Gary Linscott, bazel-discuss
I think it may depend on what your Bazel setup is. Someone else on the
Bazel team might be able to say more. Kristina or Damien? I'm still a
little confused by your question. Is it just that on a certain
machine, you're seeing unexpected rebuilds? Can we ignore the multiple
machines in your setup?

glin...@gmail.com

May 22, 2016, 1:05:36 PM
to bazel-discuss, glin...@gmail.com
We aren't seeing any spurious rebuilds on the same machine. It's only when we try to share cached objects across machines that things that should be in the cache get rebuilt.

We tracked it down to bazel hashing the system include headers, which makes sense. Some folks have slightly different versions of those installed, since we didn't all install them at the same time. So we need to make that all consistent to get good usage of the cache.
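
For anyone hitting the same thing, here is a rough sketch of one way to find which headers differ between two machines: hash every file under the include directory on each machine, then compare the two manifests. The function names hash_tree and differing_files are made up, and /usr/include is only an assumed location. Adjust it for your toolchain.

```python
import hashlib
import os


def hash_tree(root):
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.isfile(path):
                continue  # skip broken symlinks and special files
            with open(path, "rb") as f:
                digest = hashlib.sha256(f.read()).hexdigest()
            digests[os.path.relpath(path, root)] = digest
    return digests


def differing_files(manifest_a, manifest_b):
    """Return sorted paths that are missing on one side or have different digests."""
    all_paths = set(manifest_a) | set(manifest_b)
    return sorted(p for p in all_paths if manifest_a.get(p) != manifest_b.get(p))
```

One way to use it: run hash_tree("/usr/include") on each machine, save the result (for example as JSON), and pass both manifests to differing_files on one machine to list the headers that need to be brought in sync.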

Janak Ramakrishnan

May 22, 2016, 1:16:20 PM
to Gary Linscott, bazel-discuss
You may be interested in the distributed caching prototype @hhclam has
been working on (along with remote execution):
https://github.com/bazelbuild/bazel/blob/master/src/main/java/com/google/devtools/build/lib/remote/MemcacheActionCache.java

glin...@gmail.com

May 22, 2016, 3:55:06 PM
to bazel-discuss, glin...@gmail.com
Definitely, that's what we are using :). Fantastic work. One thing that's a bit limiting, though, is the use of Hazelcast for caching. Currently there appears to be no built-in expiry in the cache, so it fills up and overflows memory fairly quickly. We worked around that by setting up a machine with the hard disk allocated to swap.

Janak Ramakrishnan

May 22, 2016, 6:41:10 PM
to Gary Linscott, bazel-discuss
Aha, now I finally understand your setup :) missed it in the initial
email. Great to hear the caching is already useful! It's an area of
very active development, so please file issues/open discussions/send
patches if there are changes that will help.

Alpha Lam

May 22, 2016, 11:05:08 PM
to glin...@gmail.com, bazel-discuss
You can configure the Hazelcast server to evict entries.

See the documentation here:

You need to run Hazelcast with -Dhazelcast.config=<path to the hazelcast.xml>. The default configuration file is here: https://github.com/hazelcast/hazelcast/blob/master/hazelcast/src/main/resources/hazelcast-default.xml. Change it to suit your needs.

The reason for choosing Hazelcast is its ease of setup.


Austin Schuh

May 22, 2016, 11:40:18 PM
to glin...@gmail.com, bazel-discuss
Consider building your own CROSSTOOL file and providing the compiler yourself. We fetch the compiler with a new_http_archive, and then configure the CROSSTOOL file to use it. This controls the compiler and headers completely, and should get rid of this type of issue. I think Brian Silverman put something up on the wiki.

Austin


glin...@gmail.com

May 23, 2016, 1:18:58 AM
to bazel-discuss, glin...@gmail.com
Fantastic! Not sure how I missed that, thanks for the pointer. Great work on the caching infrastructure as well :). It will be a huge timesaver for us.