Remote build vs bazel stamping

Artem Navrotskiy

Sep 18, 2022, 3:43:44 PM
to Remote Execution APIs Working Group
I have come to the conclusion that stamping is incompatible with the current remote build protocol.

In discussing the issue, I was advised to write here.

Related issues:

What is stamping?
Stamping lets you embed information such as "this executable file can be built from revision XXX" into an artifact. This information is not supposed to affect the caching key.

Controlled by bazel option: https://bazel.build/docs/user-manual#stamp

It works quite simply:
  • a file, volatile-status.txt, is generated at the start of the build;
  • build actions can use this file, but it does not affect the caching key (that is, the build pretends that it never changes).
Unfortunately, stamping only works this way with a local build.

How does stamping behave when building remotely?
In the case of a remote build, as well as with the disk cache, the behavior is as follows:
  • an Action is created whose InputRootDigest is a hash over all input files, including volatile-status.txt;
  • the hash of the Action is used as the caching key;
  • as a result, the caching key depends on volatile-status.txt;
  • by default, volatile-status.txt contains BUILD_TIMESTAMP;
  • so any Action with stamping is guaranteed to get a cache miss.
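
The chain above can be illustrated with a small model. This is only a sketch under simplifying assumptions: the real protocol hashes serialized protobuf messages and a Merkle tree of directories, while the helpers `digest`, `input_root_digest`, and `action_key` below are hypothetical stand-ins that hash plain strings.

```python
import hashlib


def digest(data: bytes) -> str:
    """Return a hex SHA-256 digest, standing in for a REAPI Digest."""
    return hashlib.sha256(data).hexdigest()


def input_root_digest(files: dict[str, bytes]) -> str:
    """Hash all input files by path and content (simplified Merkle tree)."""
    entries = [f"{path}:{digest(content)}" for path, content in sorted(files.items())]
    return digest("\n".join(entries).encode())


def action_key(command: str, files: dict[str, bytes]) -> str:
    """Model the cache key: the digest of an Action that embeds the input root."""
    return digest(f"{digest(command.encode())}|{input_root_digest(files)}".encode())


inputs = {"main.c": b"int main(void) { return 0; }\n"}
first = action_key("link", {**inputs, "volatile-status.txt": b"BUILD_TIMESTAMP 1663515824\n"})
second = action_key("link", {**inputs, "volatile-status.txt": b"BUILD_TIMESTAMP 1663515825\n"})
print(first == second)  # False: a new timestamp yields a new key, hence a cache miss
```

Because the timestamp is part of the hashed input tree, two otherwise identical builds never share a key in this model.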

I tried using a caching key that is not the hash of the Action (namely, the hash of an Action whose InputRootDigest is computed without volatile-status.txt): this works with the disk cache, but not with bazel-buildfarm.

It looks like remote build requires the cache key to equal the hash of the Action.

How can I get around this problem in the current situation?
Don't use stamping
One workaround for this issue is to not use stamping.

But it's not as easy as it seems. This does not help if some Action declares a dependency on volatile-status.txt and can generate different artifacts from otherwise identical input data.

That is, a third-party dependency can cause cache misses for some Action even when stamping is formally turned off.

Allow a caching key not equal to the hash of the Action
This is not a very good idea, as it makes poisoning the cache trivial.

Do not use remote build for Actions with stamping
This method works for me. Proof of concept implementation: https://github.com/bazelbuild/bazel/pull/16240

Executing the Action locally sidesteps the need for the remote side to accept a caching key that is not equal to the hash of the Action.

Unfortunately, this solution:
  • does not allow all steps to be performed remotely;
  • requires the client to be able to write to the cache, which allows the cache to be poisoned.

But this change is local, and in my case good enough.

Extend the remote build protocol
Add the ability to pass additional data along with an Action, so that some files do not affect the caching key.

This approach solves the cache poisoning problem, but the change is too far-reaching for me to make on my own.
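
As a rough illustration of what such an extension might compute, here is a sketch. It is not part of the current protocol: `split_inputs` and `extended_action_key` are hypothetical names, and the hashing is simplified to plain strings. The point is that the key is still derived from the request itself, only with the declared volatile files left out, so the client does not get to pick an arbitrary key.

```python
import hashlib


def digest(data: bytes) -> str:
    """Return a hex SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def tree_digest(files: dict[str, bytes]) -> str:
    """Hash files by path and content (simplified stand-in for a Merkle tree)."""
    entries = [f"{path}:{digest(content)}" for path, content in sorted(files.items())]
    return digest("\n".join(entries).encode())


def split_inputs(files: dict[str, bytes], volatile_paths: set[str]):
    """Separate inputs that feed the key from side-channel (volatile) inputs."""
    keyed = {p: c for p, c in files.items() if p not in volatile_paths}
    side = {p: c for p, c in files.items() if p in volatile_paths}
    return keyed, side


def extended_action_key(command: str, files: dict[str, bytes], volatile_paths: set[str]) -> str:
    """Hypothetical key: hash the command and only the non-volatile inputs."""
    keyed, _side = split_inputs(files, volatile_paths)
    return digest(f"{digest(command.encode())}|{tree_digest(keyed)}".encode())


volatile = {"volatile-status.txt"}
src = {"main.c": b"int main(void) { return 0; }\n"}
a = extended_action_key("link", {**src, "volatile-status.txt": b"BUILD_TIMESTAMP 1\n"}, volatile)
b = extended_action_key("link", {**src, "volatile-status.txt": b"BUILD_TIMESTAMP 2\n"}, volatile)
print(a == b)  # True: the volatile file no longer changes the key
```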

Fabian Meumertzheim

Sep 18, 2022, 5:50:57 PM
to Artem Navrotskiy, Remote Execution APIs Working Group
Such an extension to the remote execution API would also be required to realize some of the goals of https://github.com/bazelbuild/bazel/issues/6526, such as cache hits for Java compilation across different platforms. Just mentioning this here to show that the benefits would potentially extend beyond stamping support.

John Millikin

Sep 18, 2022, 8:55:15 PM
to Artem Navrotskiy, Remote Execution APIs Working Group
I think this is a client-side issue (in Bazel), rather than an issue with the
remote APIs or remote caching.

Reading the linked issue, the sequence is something like:

1. bazel build //some/target --stamp --remote_cache
2. bazel clean
3. bazel build //some/target --stamp --remote_cache

You're expecting the second `build` to return the cached result of the first
build. The complaint is that Bazel is requesting rebuilds from the remote
executor that you think it shouldn't be requesting.


There's two problems here:

1. Bazel's local state contains more data than the remote/disk cache, including
data about the freshness of `volatile-status.txt`. Since this data exists
only in Bazel's local state, when you ran `bazel clean`, that data was
deleted. You were expecting it to be preserved.

2. Bazel doesn't have a way to do a "partial clean", where it deletes the data
you think it should delete, and preserves data (like volatile cache keys)
that you think it should preserve.

Since the behavior you want to change is in the client (Bazel), that seems like
it would also be the best place to investigate making changes.


Three example ideas on how to solve this on the client side:

1. Add support to Bazel for a "volatile cache", where Bazel can store data on
stamp freshness for volatile data. This would be a local path that doesn't
get cleared by `bazel clean`.

2. As part of your build, have a local-only uncached action that reads in
`volatile-status.txt` and outputs a stabilized version. This action could
do this via non-hermetic reading/writing to a local directory. Then the
stabilized volatile status gets used as an input file to the "real" action,
which executes remotely with existing cache logic.

3. Instead of using volatile status, compute equivalent data from a stable
source. For example the build timestamp might be computed from the commit
time of the revision being built, rather than the true system time.
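
Idea 3 might be sketched as a script passed to Bazel via `--workspace_status_command`. Bazel routes keys that start with `STABLE_` into stable-status.txt, so they take part in cache keys and change only when the commit changes. The key names `STABLE_GIT_COMMIT` and `STABLE_COMMIT_TIMESTAMP` are hypothetical, and the sketch assumes the workspace is a git checkout.

```python
import subprocess


def read_commit_info() -> tuple[str, int]:
    """Return (commit hash, commit time as Unix epoch) of HEAD via git."""
    out = subprocess.run(
        ["git", "log", "-1", "--format=%H %ct"],
        check=True, capture_output=True, text=True,
    ).stdout
    commit, epoch = out.split()
    return commit, int(epoch)


def status_lines(commit: str, commit_epoch: int) -> list[str]:
    """Format "KEY value" lines the way a workspace status command prints them."""
    return [
        f"STABLE_GIT_COMMIT {commit}",  # hypothetical key name
        f"STABLE_COMMIT_TIMESTAMP {commit_epoch}",  # hypothetical key name
    ]


# Demo with fixed sample values; a real script would pass *read_commit_info().
for line in status_lines("0123abcd", 1663515824):
    print(line)
```

Note that with this approach every new commit still invalidates the stamped actions, since the commit hash is part of the key.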


Are any of these applicable to your use case?

Artem Navrotskiy

Sep 19, 2022, 1:33:54 AM
to John Millikin, Remote Execution APIs Working Group


On Mon, Sep 19, 2022 at 03:55, John Millikin <jmil...@gmail.com> wrote:
I think this is a client-side issue (in Bazel), rather than an issue with the
remote APIs or remote caching.

No. I debugged Bazel, and it looks like stamping cannot be implemented with the current Action: the AC key is the hash of the Action, and the Action includes the volatile stamping files.


Reading the linked issue, the sequence is something like:

  1. bazel build //some/target --stamp --remote_cache
  2. bazel clean
  3. bazel build //some/target --stamp --remote_cache

You're expecting the second `build` to return the cached result of the first
build. The complaint is that Bazel is requesting rebuilds from the remote 
executor that you think it shouldn't be requesting.

This example emulates builds from different hosts without setting up an additional remote build farm.
I expect that volatile stamping is not part of the caching key at this point, so that the build result is looked up in the remote cache and there is a cache hit.
 


There's two problems here:

1. Bazel's local state contains more data than the remote/disk cache, including
   data about the freshness of `volatile-status.txt`. Since this data exists
   only in Bazel's local state, when you ran `bazel clean`, that data was
   deleted. You were expecting it to be preserved.

The local and remote caches do behave differently, but that is not the issue here. One can simply assume that there is no local cache.
 

2. Bazel doesn't have a way to do a "partial clean", where it deletes the data
   you think it should delete, and preserves data (like volatile cache keys)
   that you think it should preserve.

The clean was only used to emulate running on a different host. The hosts share nothing but the remote cache.
 

Since the behavior you want to change is in the client (Bazel), that seems like
it would also be the best place to investigate making changes.


There are two issues on this topic. Both stalled on the need for changes to the remote execution API.
 

Three example ideas on how to solve this on the client side:

1. Add support to Bazel for a "volatile cache", where Bazel can store data on
   stamp freshness for volatile data. This would be a local path that doesn't
   get cleared by `bazel clean`.

This does not solve the problem of building from multiple hosts.
 

2. As part of your build, have a local-only uncached action that reads in
   `volatile-status.txt` and outputs a stabilized version. This action could
   do this via non-hermetic reading/writing to a local directory. Then the
   stabilized volatile status gets used as an input file to the "real" action,
   which executes remotely with existing cache logic.

This does not solve the problem of building from multiple hosts.

In theory, stamping could be moved to a separate service, but that is a heavyweight solution:
  - it requires a separate step that converts the input data plus the volatile data into stamping data;
  - all third-party build libraries (rules_go, rules_docker, rules_pkg, ...) would need to be edited;
  - the solution is specific to each customer.
 

3. Instead of using volatile status, compute equivalent data from a stable
   source. For example the build timestamp might be computed from the commit
   time of the revision being built, rather than the true system time.


I want to achieve two things:
  - rebuild only what a commit actually changed;
  - record which commit a given artifact can be built from.

Stamping is ideal for this purpose.

Eric Burnett

Sep 20, 2022, 2:50:30 PM
to Artem Navrotskiy, John Millikin, Remote Execution APIs Working Group
I think there is a misunderstanding, or perhaps disagreement, on how stamping is supposed to work. As designed in bazel, I'd say it is by design that you don't get a remote cache hit when the volatile-status.txt file changes. The 'volatile' split is helpful to make interactive/iterative development faster, but otherwise, getting a new stamped build later is kind of expected to rerun the remote stamped actions - consider the presence of BUILD_USER / BUILD_HOST in stable-status.txt, for example.

If you want to simply override the BUILD_TIMESTAMP that goes into the volatile-status.txt file so it doesn't churn your remote cache hits unduly, I think it's something like `SOURCE_DATE_EPOCH=123 bazel build ...` . https://reproducible-builds.org/docs/source-date-epoch/ has tips on how you can pick a value for it that aligns with your repo, or you could just set it to 0 or similar if you don't need the information.
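
A minimal sketch of that invocation from a script, assuming Bazel picks up `SOURCE_DATE_EPOCH` from its environment as described above. The helper names `bazel_env` and `run_stamped_build` are hypothetical, and the build function is defined but not called here.

```python
import os
import subprocess


def bazel_env(epoch: int, base_env=None) -> dict:
    """Copy the environment and pin SOURCE_DATE_EPOCH to a fixed value."""
    env = dict(os.environ if base_env is None else base_env)
    env["SOURCE_DATE_EPOCH"] = str(epoch)
    return env


def run_stamped_build(target: str, epoch: int) -> None:
    """Invoke Bazel with stamping enabled and the pinned epoch."""
    subprocess.run(["bazel", "build", "--stamp", target], env=bazel_env(epoch), check=True)


print(bazel_env(0, base_env={})["SOURCE_DATE_EPOCH"])  # 0
```

Passing the commit time of the revision as `epoch` keeps the timestamp meaningful while making it identical across hosts that build the same commit.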

If what you specifically want to do is get a value embedded into the stamped binaries, but not cache invalidate on it - embed some commit the same artifact can be built from, but avoid rebuilding if another commit produces the same binary - then you're correct, that cannot be accomplished in the REAPI today. We've avoided any such feature thus far on purity/correctness grounds, but I can see your argument. Still, I suspect it'll be an uphill battle for you - e.g. other bazel users probably do want the correct CL number and build timestamp to be embedded in their binaries, so you'd be requesting a REAPI feature to support this sort of side-channel non-invalidating metadata, plus a bazel change to optionally mark the volatile-status.txt to be passed this way. I doubt that's going to be too high of a priority, so you may want to look into what workarounds you're comfortable with.

Also, one other note - this only affects the stamped actions themselves. E.g. for C++, only the link actions do actual stamping; all the object file compilation will be cached regardless. This should hopefully only be a small fraction of the overall builds, and maybe tolerable to re-run. But still, stamping is pretty expensive, agreed - at Google we try to only do it for artifacts we want to actually persist and run in production; the majority of builds have it turned off. 

A final option is to simply not embed the info in the binary - if all you want is to track what commit a build can be reproduced from, you could consider building the binary unstamped, but passing around a file alongside it with provenance information so you have it handy when you want to do the reverse lookup later. 
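
One possible shape for such a side file, as a sketch only: a small JSON record keyed by the content hash of the unstamped binary. The function name `provenance_record` and the field names are hypothetical, not an established format.

```python
import hashlib
import json


def provenance_record(binary: bytes, commit: str) -> str:
    """Build a JSON record linking a binary's content hash to a commit."""
    record = {"sha256": hashlib.sha256(binary).hexdigest(), "commit": commit}
    return json.dumps(record, sort_keys=True, indent=2)


# The same unstamped binary built from two commits keeps the same hash,
# so the reverse lookup can go through the hash rather than the binary itself.
first = json.loads(provenance_record(b"\x7fELF...", "commit-a"))
second = json.loads(provenance_record(b"\x7fELF...", "commit-b"))
print(first["sha256"] == second["sha256"])  # True
```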

Cheers,
Eric

Artem Navrotskiy

Sep 20, 2022, 4:13:45 PM
to Eric Burnett, John Millikin, Remote Execution APIs Working Group


On Tue, Sep 20, 2022 at 21:50, Eric Burnett <ericb...@google.com> wrote:
I think there is a misunderstanding, or perhaps disagreement, on how stamping is supposed to work. As designed in bazel, I'd say it is by design that you don't get a remote cache hit when the volatile-status.txt file changes. The 'volatile' split is helpful to make interactive/iterative development faster, but otherwise, getting a new stamped build later is kind of expected to rerun the remote stamped actions - consider the presence of BUILD_USER / BUILD_HOST in stable-status.txt, for example.

If you want to simply override the BUILD_TIMESTAMP that goes into the volatile-status.txt file so it doesn't churn your remote cache hits unduly, I think it's something like `SOURCE_DATE_EPOCH=123 bazel build ...` . https://reproducible-builds.org/docs/source-date-epoch/ has tips on how you can pick a value for it that aligns with your repo, or you could just set it to 0 or similar if you don't need the information.

If what you specifically want to do is get a value embedded into the stamped binaries, but not cache invalidate on it - embed some commit the same artifact can be built from, but avoid rebuilding if another commit produces the same binary - then you're correct, that cannot be accomplished in the REAPI today. We've avoided any such feature thus far on purity/correctness grounds, but I can see your argument. Still, I suspect it'll be an uphill battle for you - e.g. other bazel users probably do want the correct CL number and build timestamp to be embedded in their binaries, so you'd be requesting a REAPI feature to support this sort of side-channel non-invalidating metadata, plus a bazel change to optionally mark the volatile-status.txt to be passed this way. I doubt that's going to be too high of a priority, so you may want to look into what workarounds you're comfortable with.

This is my case: I want to embed some commit the same artifact can be built from and avoid rebuilding if another commit produces the same binary.
I understand that there is no quick and easy solution. But if the problem is not reported, then it will never be solved.

For now, I have worked around the problem with two behavior changes (https://github.com/bazelbuild/bazel/pull/16240):
  • exclude volatile-status.txt from the cache key;
  • build all actions that use volatile files locally (but with remote caching).
This solution is dirty, but it solves my problem.