xcode version and cache poisoning

o...@wix.com

Apr 23, 2019, 7:16:55 AM
to bazel-discuss
We have a team of about 100 developers sharing the same remote cache (read/write).
All of us use Macs, but we don't enforce a single OS version or Xcode version.

Recently I got the following error:
ERROR: /private/var/tmp/_bazel_ors/69a4269ef9a625e3cd84076e6996a3e0/external/com_google_protobuf/BUILD:354:1: C++ compilation of rule '@com_google_protobuf//:protoc' failed: I/O exception during sandboxed execution: xcrun failed with code 1.
This most likely indicates that SDK version [10.10] for platform [MacOSX] is unsupported for the target version of xcode.
Process exited with status 1
stdout
: stderr: xcodebuild: error: SDK "macosx10.10" cannot be located.


When I ran bazel clean and disabled the cache this problem was resolved.

I think something is not hermetic in my builds around cpp / xcode but I'm not sure what.

Two things that might solve it:
1. (preferred) Defining remote tools so that Bazel would download them and use them instead of the OS's built-in tools (similar to the remote JDK).
2. Making sure that the Xcode version changes the cache key, so that two users with different Xcode versions wouldn't share outputs.

How do you solve this where you work?

Philipp Wollermann

Apr 23, 2019, 9:47:02 AM
to Or Shachar, Jakob Buchgraber, Eric Burnett, bazel-discuss
Hi Or,

We encountered the same problem on Bazel's CI in the last few days. Bazel does not correctly take the Xcode version, or even the path of the Xcode installation, into account when building the cache key. This results in cache poisoning: you get cache hits for artifacts built with a different Xcode version.

The solution is to manually build an additional cache key that you put into every action's cache key. For example, you could add the OS version, the Xcode version, the path to Xcode and some other information that you feel makes a difference for your builds and then hash everything with SHA256.
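
A minimal sketch of such a key, assuming Python 3.7+ is available on the build machines. The helper names (`run`, `cache_silo_key`, `host_details`) are hypothetical, and the three commands are just one plausible way to collect the details listed above; add whatever else affects your builds.

```python
import hashlib
import subprocess


def run(cmd):
    """Return the trimmed stdout of a command, or "" if it cannot be run."""
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
        return result.stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        return ""


def cache_silo_key(parts):
    """Hash an ordered list of host details into one SHA-256 hex digest."""
    digest = hashlib.sha256()
    for part in parts:
        data = part.encode("utf-8")
        # Length-prefix each part so ["ab", "c"] and ["a", "bc"] hash differently.
        digest.update(str(len(data)).encode("ascii") + b":" + data)
    return digest.hexdigest()


def host_details():
    """Collect macOS details (assumption: these three are enough for your builds)."""
    return [
        run(["sw_vers", "-productVersion"]),  # OS version
        run(["xcodebuild", "-version"]),      # Xcode version and build number
        run(["xcode-select", "-p"]),          # path of the active Xcode
    ]


if __name__ == "__main__":
    print(cache_silo_key(host_details()))
```

A wrapper script could export the printed digest as `$CACHE_KEY` before invoking Bazel.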

Now we have to put this SHA256 digest into every action's cache key. We first tried to solve this using the --host_platform_remote_properties_override flag in this way:

--host_platform_remote_properties_override=properties:{name:"platform" value:"$CACHE_KEY"}
or the other form:
--host_platform_remote_properties_override=properties:{name:"cache-silo-key" value:"$CACHE_KEY"}

We tried this because we found a lot of examples in docs and in other people's wrappers inside Google that did it, but it turns out that it doesn't actually work. The semantics of the flag silently changed a few releases ago so that it no longer overrides anything (despite still being called "[...]_override"). It now only provides a default, and even that seems to be mostly ignored and overridden by some automatic platform detection that Bazel does.

The "correct" way is thus to separate your cache into silos by using a prefix in the URL:

or
--remote_http_cache=http://192.168.0.1/$CACHE_KEY for a local HTTP cache
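
As a small sketch of how a wrapper might assemble that flag: the function name is hypothetical, and the only Bazel flag used is the `--remote_http_cache` shown above. Whether the server actually separates entries by path prefix depends on the cache server.

```python
def siloed_http_cache_flag(base_url, cache_key):
    """Build a --remote_http_cache flag whose URL path ends in the silo key."""
    # Reject keys that would silently change the URL structure.
    if not cache_key or "/" in cache_key:
        raise ValueError("cache key must be a non-empty string without '/'")
    return "--remote_http_cache=" + base_url.rstrip("/") + "/" + cache_key


if __name__ == "__main__":
    print(siloed_http_cache_flag("http://192.168.0.1", "0123abcd"))
```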

Note though that bazel-remote currently does not support this - the prefix is silently ignored and the cache is not separated. We thus had to switch to using nginx for our local cache server for the Mac cluster, which can be configured to support this. I don't know about RBE and if it supports URL prefixing.

Cc'ing @Jakob Buchgraber and @Eric Burnett for further ideas on how to solve this. Maybe there's a better way than using URL prefixes that I'm not aware of.

Cheers,
Philipp


--
You received this message because you are subscribed to the Google Groups "bazel-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bazel-discus...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/bazel-discuss/b9ef5a9a-2a22-449f-977b-b8f593439cbf%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


--
Philipp Wollermann | Software Engineer | phi...@google.com
Google Germany GmbH | Erika-Mann-Straße 33 | 80636 München

Managing Directors: Paul Manicle, Halimah DeLaine Prado
Registration court and number: Hamburg, HRB 86891
Registered office: Hamburg

Eric Burnett

Apr 23, 2019, 11:10:52 AM
to Philipp Wollermann, Alexandra Goultiaeva, Or Shachar, Jakob Buchgraber, bazel-discuss, John Cater
On Tue, Apr 23, 2019 at 9:46 AM Philipp Wollermann <phi...@google.com> wrote:
Hi Or,

we encountered the same problem on Bazel's CI in the last days. Bazel does not correctly take the Xcode version or even the path of the Xcode installation into account when building the cache key. This results in cache poisoning and getting cache hits for artifacts built with a different Xcode version.

The solution is to manually build an additional cache key that you put into every action's cache key. For example, you could add the OS version, the Xcode version, the path to Xcode and some other information that you feel makes a difference for your builds and then hash everything with SHA256.

Now we have to put this SHA256 digest into every action's cache key. We first tried to solve this using the --host_platform_remote_properties_override flag in this way:

--host_platform_remote_properties_override=properties:{name:"platform" value:"$CACHE_KEY"}
or the other form:
--host_platform_remote_properties_override=properties:{name:"cache-silo-key" value:"$CACHE_KEY"}

We tried this, because we found a lot of examples inside docs and other people's wrappers inside Google that did this - but it turns out, this doesn't actually work. The semantics of the flag silently changed a few releases ago to no longer override anything (despite the flag still being called "[...]_override" ...) - it now only provides a default, however that seems to be mostly ignored and overridden by some automatic platform detection that Bazel does.

I'm surprised this didn't work - it's true that the new semantics are to provide defaults only, but if you're using remote caching (rather than remote execution) I'd expect there to be no manually-specified remote execution properties in the default platform, and so these defaults would be used.

E.g. https://github.com/bazelbuild/bazel/blob/master/tools/platforms/BUILD#L90 does not specify remote_execution_properties anywhere. If you do have a manually-specified platform somewhere used instead of this, that's the place to make the change; if you don't, possibly there's a bug in how the flag is applied now?

(I'll note that I'm most familiar with the gRPC caching; if there's anything different with the remote http cache, I don't know the details).

+Alexandra Goultiaeva, FYI since this flag (and its new name, "remote_default_platform_properties") seems to come up a lot in issues. It seems it's still a point of confusion that could maybe be made easier to debug.
 

The "correct" way is thus to separate your cache into silos by using a prefix in the URL:

or
--remote_http_cache=http://192.168.0.1/$CACHE_KEY for a local HTTP cache

Note though that bazel-remote currently does not support this - the prefix is silently ignored and the cache is not separated. We thus had to switch to using nginx for our local cache server for the Mac cluster, which can be configured to support this. I don't know about RBE and if it supports URL prefixing.

Cc'ing @Jakob Buchgraber and @Eric Burnett for further ideas on how to solve this. Maybe there's a better way than using URL prefixes that I'm not aware of.

I'm not sure why URL prefixes would be the "correct" way to solve this - bazel is in full control of the action keying, and so anything it could inject into the URL it should be able to inject into the key directly. If the current set of bazel flags aren't letting users express their keying easily enough, I'd prefer to fix the flags than to add another place to control it (and one that's independent of the Platform logic bazel is standardizing on).

For other ideas, mostly I'd err towards making it easy to specify and obvious what's going to be used when. We could also look into more auto-detection so e.g. xcode versions for mac toolchains are picked up automatically, but I'm leery of that direction - if bazel auto-detected some but not all relevant host details, the breakages would just become more rare and hard to debug. So for this to be useful we'd at least have to ensure that we minimally knew how to capture all relevant details about a given toolchain (the xcode toolchain, in this case), if not everything else, which may be too deep a rabbit hole in practice.

Jakob Buchgraber

Apr 23, 2019, 11:16:03 AM
to Or Shachar, bazel-discuss
Hi Or,

If you are using platforms, you need to define a different execution platform for every Xcode version that's being used
and have your users set the correct platform. In each platform you then specify a remote_execution_properties
attribute that includes the Xcode version.

If you are not using platforms, just set the following flag

--remote_default_platform_properties="properties:{name:'xcode-version' value:'$(xcodebuild -version)'}"

This will ensure that the xcode version is included in the remote cache key.

Best,
Jakob

Jakob Buchgraber

Software Engineer


Google Germany GmbH

Erika-Mann-Straße 33

80636 München


On Tue, Apr 23, 2019 at 1:16 PM ors via bazel-discuss <bazel-...@googlegroups.com> wrote:
We have a team of ~ 100 developers sharing the same remote cache (read/write).
Each of us is using Mac but we don't force same OS version / xcode version policy.

Recently I got the following error:
ERROR: /private/var/tmp/_bazel_ors/69a4269ef9a625e3cd84076e6996a3e0/external/com_google_protobuf/BUILD:354:1: C++ compilation of rule '@com_google_protobuf//:protoc' failed: I/O exception during sandboxed execution: xcrun failed with code 1.
This most likely indicates that SDK version [10.10] for platform [MacOSX] is unsupported for the target version of xcode.
Process exited with status 1
stdout
: stderr: xcodebuild: error: SDK "macosx10.10" cannot be located.


When I ran bazel clean and disabled the cache this problem was resolved.

I think something is not hermetic in my builds around cpp / xcode but I'm not sure what.

Two possible things that may solve it is:
1. (preferred) Defining remote tools so that bazel would download them and use them instead of the OS built in tools (similar to remote JDK)

Ideally, 
 
2. Making sure that the xcode version changes the cache key so two users using different xcode wouldn't share outputs.

Have you tried specifying?


 

How do you solve it in your place?

--

Philipp Wollermann

Apr 23, 2019, 11:38:51 AM
to Eric Burnett, Alexandra Goultiaeva, Or Shachar, Jakob Buchgraber, bazel-discuss, John Cater
On Tue, Apr 23, 2019 at 5:10 PM Eric Burnett <ericb...@google.com> wrote:
On Tue, Apr 23, 2019 at 9:46 AM Philipp Wollermann <phi...@google.com> wrote:
I'm surprised this didn't work - it's true that the new semantics are to provide defaults only, but if you're using remote caching (rather than remote execution) I'd expect there to be no manually-specified remote execution properties in the default platform, and so these defaults would be used.

I don't know much about the platforms stuff in Bazel.

We do have a few "platform(...)" definitions in Bazel's BUILD file: https://github.com/bazelbuild/bazel/blob/master/BUILD#L136
And here's a call to register_execution_platforms(...): https://github.com/bazelbuild/bazel/blob/master/WORKSPACE#L465

However, we don't set any flags that tell Bazel that it should use these platforms, for example here's a typical "Bazel presubmit on macOS" command line:

bazel build --show_progress_rate_limit=5 --curses=yes --color=yes --verbose_failures --keep_going --jobs=36 --announce_rc --experimental_multi_threaded_digest --experimental_repository_cache_hardlinks --disk_cache= --sandbox_writable_path=/var/tmp/_bazel_buildkite/cache/repos/v1 --test_env=REPOSITORY_CACHE=/var/tmp/_bazel_buildkite/cache/repos/v1 --remote_timeout=60 --remote_max_connections=200 --remote_http_cache=http://100.107.67.248/6966eb210b1628bacfdbb6fb82c61b1232cc936558cdc383e90e715842c5672c --apple_platform_type=macos --noincompatible_strict_action_env //src:bazel //src:bazel_jdk_minimal

Does that mean we don't use any platform and thus it should work? Then why didn't it work?
 
The "correct" way is thus to separate your cache into silos by using a prefix in the URL:

or
--remote_http_cache=http://192.168.0.1/$CACHE_KEY for a local HTTP cache

Note though that bazel-remote currently does not support this - the prefix is silently ignored and the cache is not separated. We thus had to switch to using nginx for our local cache server for the Mac cluster, which can be configured to support this. I don't know about RBE and if it supports URL prefixing.

Cc'ing @Jakob Buchgraber and @Eric Burnett for further ideas on how to solve this. Maybe there's a better way than using URL prefixes that I'm not aware of.

I'm not sure why URL prefixes would be the "correct" way to solve this - bazel is in full control of the action keying, and so anything it could inject into the URL it should be able to inject into the key directly. If the current set of bazel flags aren't letting users express their keying easily enough, I'd prefer to fix the flags than to add another place to control it (and one that's independent of the Platform logic bazel is standardizing on).

Correct from an infrastructure perspective, not from a "this is how I always wanted Bazel to work" perspective:

There was a flag that we were supposed to use for exactly this case, it stopped working and there is no clear, working replacement.
Thus, on the infra side, we have to invent our own solution to this problem to deal with whatever bugs or unintuitive behavior Bazel has here.

The current solution uses URL prefixes. It's not beautiful, but it works, until we figure out how to tell Bazel purely from the command-line that it should put these additional things into its action cache keys no matter what.

Cheers,
Philipp

Jakob Buchgraber

Apr 23, 2019, 12:02:11 PM
to Philipp Wollermann, Eric Burnett, Alexandra Goultiaeva, Or Shachar, bazel-discuss, John Cater
On Tue, Apr 23, 2019 at 5:38 PM Philipp Wollermann <phi...@google.com> wrote:
bazel build --show_progress_rate_limit=5 --curses=yes --color=yes --verbose_failures --keep_going --jobs=36 --announce_rc --experimental_multi_threaded_digest --experimental_repository_cache_hardlinks --disk_cache= --sandbox_writable_path=/var/tmp/_bazel_buildkite/cache/repos/v1 --test_env=REPOSITORY_CACHE=/var/tmp/_bazel_buildkite/cache/repos/v1 --remote_timeout=60 --remote_max_connections=200 --remote_http_cache=http://100.107.67.248/6966eb210b1628bacfdbb6fb82c61b1232cc936558cdc383e90e715842c5672c --apple_platform_type=macos --noincompatible_strict_action_env //src:bazel //src:bazel_jdk_minimal

Does that mean we don't use any platform and thus it should work? Then why didn't it work?

Correct. In the absence of a platform that sets remote_execution_properties adding --remote_default_platform_properties to this line should just work. If it doesn't we need to investigate.

Eric Burnett

Apr 23, 2019, 12:03:10 PM
to Philipp Wollermann, Alexandra Goultiaeva, Or Shachar, Jakob Buchgraber, bazel-discuss, John Cater
On Tue, Apr 23, 2019 at 11:38 AM Philipp Wollermann <phi...@google.com> wrote:
On Tue, Apr 23, 2019 at 5:10 PM Eric Burnett <ericb...@google.com> wrote:
On Tue, Apr 23, 2019 at 9:46 AM Philipp Wollermann <phi...@google.com> wrote:
I'm surprised this didn't work - it's true that the new semantics are to provide defaults only, but if you're using remote caching (rather than remote execution) I'd expect there to be no manually-specified remote execution properties in the default platform, and so these defaults would be used.

I don't know much about the platforms stuff in Bazel.

We do have a few "platform(...)" definitions in Bazel's BUILD file: https://github.com/bazelbuild/bazel/blob/master/BUILD#L136
And here's a call to register_execution_platforms(...): https://github.com/bazelbuild/bazel/blob/master/WORKSPACE#L465

However, we don't set any flags that tell Bazel that it should use these platforms, for example here's a typical "Bazel presubmit on macOS" command line:

bazel build --show_progress_rate_limit=5 --curses=yes --color=yes --verbose_failures --keep_going --jobs=36 --announce_rc --experimental_multi_threaded_digest --experimental_repository_cache_hardlinks --disk_cache= --sandbox_writable_path=/var/tmp/_bazel_buildkite/cache/repos/v1 --test_env=REPOSITORY_CACHE=/var/tmp/_bazel_buildkite/cache/repos/v1 --remote_timeout=60 --remote_max_connections=200 --remote_http_cache=http://100.107.67.248/6966eb210b1628bacfdbb6fb82c61b1232cc936558cdc383e90e715842c5672c --apple_platform_type=macos --noincompatible_strict_action_env //src:bazel //src:bazel_jdk_minimal

Does that mean we don't use any platform and thus it should work? Then why didn't it work?

I think it should have worked with this, yes. +Jakob Buchgraber on whether remote property defaults work properly with --remote_http_cache, +John Cater on whether there's anything special about mac rules and how they interact with Platforms still.

FWIW, last I checked this still worked with mac with gRPC caching, but it's possible that our cache-only users haven't upgraded to a new enough version of bazel yet so it's also broken there and I just don't know it yet.
 
 
The "correct" way is thus to separate your cache into silos by using a prefix in the URL:

or
--remote_http_cache=http://192.168.0.1/$CACHE_KEY for a local HTTP cache

Note though that bazel-remote currently does not support this - the prefix is silently ignored and the cache is not separated. We thus had to switch to using nginx for our local cache server for the Mac cluster, which can be configured to support this. I don't know about RBE and if it supports URL prefixing.

Cc'ing @Jakob Buchgraber and @Eric Burnett for further ideas on how to solve this. Maybe there's a better way than using URL prefixes that I'm not aware of.

I'm not sure why URL prefixes would be the "correct" way to solve this - bazel is in full control of the action keying, and so anything it could inject into the URL it should be able to inject into the key directly. If the current set of bazel flags aren't letting users express their keying easily enough, I'd prefer to fix the flags than to add another place to control it (and one that's independent of the Platform logic bazel is standardizing on).

Correct from an infrastructure perspective, not from a "this is how I always wanted Bazel to work" perspective:

There was a flag that we were supposed to use for exactly this case, it stopped working and there is no clear, working replacement.
Thus, on the infra side, we have to invent our own solution to this problem to deal with whatever bugs or unintuitive behavior Bazel has here.

The current solution uses URL prefixes. It's not beautiful, but it works, until we figure out how to tell Bazel purely from the command-line that it should put these additional things into its action cache keys no matter what.

Gotcha. Then to answer the question you asked: no, RBE does not support URL prefixing, and cannot easily be made to (gRPC controls the URLs themselves). We'd need hackery on our side and in Bazel to approximate something like it, which wouldn't land any quicker than just fixing the flags in Bazel. (Though I'm also not yet sure whether they're not working correctly for the gRPC API, or just for HTTP caching.)
 

Cheers,
Philipp

Philipp Wollermann

Apr 23, 2019, 12:29:56 PM
to Eric Burnett, Alexandra Goultiaeva, Or Shachar, Jakob Buchgraber, bazel-discuss, John Cater
Thanks a lot for confirming that the flag should work and the explanations, Eric and Jakob!

I'll run some tests tomorrow to see whether I can repro the cache poisoning locally on my Mac and if yes, will try to figure out why it's happening.

John Cater

Apr 23, 2019, 12:39:51 PM
to Philipp Wollermann, Eric Burnett, Alexandra Goultiaeva, Or Shachar, Jakob Buchgraber, bazel-discuss
Sorry for coming into this late. I can clarify some of the points on platforms:

1) The flag "--remote_default_platform_properties" (and the old version "--host_platform_remote_properties_override") have always been ways to set the remote_execution_properties for platforms that don't have their own. The original flag only overrode the host platform, and only if it didn't declare its own remote_execution_properties. The current flag applies to any execution platform which does not declare its own remote_execution_properties.

2) MacOS rules do not do any special checking of the execution platform.

3) If the Bazel tests are calling "register_execution_platforms()" in the WORKSPACE (directly or in a macro), then execution platforms are being considered before the host platform. Any remote_execution_properties on those platforms will be used directly.

4) Using a separate execution platform per xcode version is probably not the right way to manage this: there's no way except directly editing the WORKSPACE (or by changing the "--extra_execution_platforms" flag) to change which execution platform is used. Instead, we should make sure that the xcode version is reflected in the action's command and/or inputs, so that changing the xcode version causes the action key to change.

Philipp Wollermann

Apr 23, 2019, 1:59:16 PM
to John Cater, Eric Burnett, Alexandra Goultiaeva, Or Shachar, Jakob Buchgraber, bazel-discuss
On Tue, Apr 23, 2019 at 6:39 PM John Cater <jca...@google.com> wrote: 
3) If the Bazel tests are calling "register_execution_platforms()" in the WORKSPACE (directly or in a macro), then execution platforms are being considered before the host platform. Any remote_execution_properties on those platforms will be used directly.

I don't know what anything in this sentence means. Let's break this down:

"Bazel tests are calling "register_execution_platforms()" in the WORKSPACE":
How can tests "call" something in the WORKSPACE file?
From my point of view, there's this line in our WORKSPACE file: https://github.com/bazelbuild/bazel/blob/master/WORKSPACE#L465
Which - to me - looks like that register_execution_platforms thing is always called (because it's in the WORKSPACE file without any logic surrounding it!).
Does that mean it's always in effect? Does that mean we now "have a platform" (as opposed to that state where we "don't have a platform" and thus the --remote_default_platform_properties flag takes effect) when I just run "bazel build" without any remote execution?

"execution platforms are being considered before the host platform":
What does "a platform is being considered before the host platform" mean?

"remote_execution_properties on those platforms will be used directly":
What does "used directly" mean?

Sorry for all the questions, but the whole concept somehow isn't really accessible to me.
 
4) Using a separate execution platform per xcode version is probably not the right way to manage this: there's no way except directly editing the WORKSPACE (or by changing the "--extra_execution_platforms" flag) to change which execution platform is used. Instead, we should make sure that the xcode version is reflected in the action's command and/or inputs, so that changing the xcode version causes the action key to change.

Yes, absolutely. I talked with Marcel about this last week. He knows and it's very bad that we don't have this, because it also means that Bazel will simply crash or produce incorrect results when a user upgrades Xcode without manually running "bazel clean --expunge" afterwards.

Cheers,
Philipp

itt...@wix.com

Apr 23, 2019, 4:23:28 PM
to bazel-discuss
Maybe it's just me, but it sounds to me like it's much easier to mess Bazel up than I'd like it to be.
This is getting us into "slow and incorrect" territory.

Solving the apple rules is great, seeing if we can package their dependencies as external dependencies (like remotejdk) would probably be even better but I think that the question which troubles us is:
What is the principled answer? How can I know that all inputs are indeed declared? Otherwise I'm at risk.

Philipp Wollermann

Apr 24, 2019, 9:29:31 AM
to Eric Burnett, Alexandra Goultiaeva, Or Shachar, Jakob Buchgraber, bazel-discuss, John Cater
Follow up: I cannot reproduce this on my Mac.

I tried various things like building with different Xcode paths, same path but different Xcode version, ... but it worked fine. :|
It even worked fine without specifying the remote_default_platform_properties flag at all, even though that's not supposed to work yet. No idea.

I'll try to switch Bazel CI over to use the --remote_default_platform_properties= flag again instead of URL prefixes - let's see what happens.

Cheers,
Philipp

Philipp Wollermann

Apr 24, 2019, 9:37:34 AM
to Ittai Zeidman, bazel-discuss
On Tue, Apr 23, 2019 at 10:23 PM ittaiz via bazel-discuss <bazel-...@googlegroups.com> wrote:
Maybe it's just me, but it sounds to me like it's much easier to mess Bazel up than I'd like it to be.
This is getting us into "slow and incorrect" territory.

Solving the apple rules is great, seeing if we can package their dependencies as external dependencies (like remotejdk) would probably be even better but I think that the question which troubles us is:
What is the principled answer? How can I know that all inputs are indeed declared? Otherwise I'm at risk.

I don't think you can know that all inputs are declared. Anything could influence your build - for example, we recently had a test that always failed on Bazel CI, but no one was able to reproduce it on a local workstation. Even when we used the exact same host OS, Docker container, Git repository and command-line.

After some hours of debugging, it turned out that the test relied on readdir() returning entries in a consistent order, but the Bazel CI machines use ZFS as the Docker backing filesystem and ZFS's order of returning entries is not deterministic, because it depends on the internal state of certain data structures at the time of file creation.
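
To illustrate the class of bug with a hypothetical helper (not the actual Bazel test): code that depends on directory order can sort the entries itself instead of relying on whatever order the filesystem returns.

```python
import os


def stable_listing(directory):
    """Return directory entries sorted by name, independent of filesystem order."""
    # os.listdir() makes no ordering promise; sorting removes the dependence.
    return sorted(os.listdir(directory))


if __name__ == "__main__":
    print(stable_listing("."))
```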

My colleague Klaus then sent me a link to a great collection of "classic reproducibility breakers": https://github.com/bmwiedemann/theunreproduciblepackage

I think the only way to be somewhat sure that your builds are really deterministic is to actually build everything twice and compare the outputs.
If you find any differences, you have to investigate why the tools did not produce the same output and tweak them until they do.
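
A rough sketch of that "build twice and compare" check, assuming the outputs of the two builds have been copied into two separate directories. The function names and the directory arguments are hypothetical; nothing here is a Bazel API.

```python
import hashlib
import os
import sys


def tree_digests(root):
    """Map each file's path (relative to root) to the SHA-256 of its contents."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            with open(path, "rb") as handle:
                digest = hashlib.sha256(handle.read()).hexdigest()
            digests[os.path.relpath(path, root)] = digest
    return digests


def differing_files(first_root, second_root):
    """Return sorted relative paths that are missing from one tree or differ in content."""
    first = tree_digests(first_root)
    second = tree_digests(second_root)
    all_paths = first.keys() | second.keys()
    return sorted(p for p in all_paths if first.get(p) != second.get(p))


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python compare_outputs.py <outputs-of-build-1> <outputs-of-build-2>
    for path in differing_files(sys.argv[1], sys.argv[2]):
        print(path)
```

Any path it prints is a starting point for investigating which tool produced non-reproducible output.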

If anyone has a better idea, let's discuss. :)

Cheers,
Philipp

John Cater

Apr 24, 2019, 10:19:58 AM
to Philipp Wollermann, Ittai Zeidman, bazel-discuss
Jumping back a few messages to answer Philipp's questions:

"Bazel tests are calling "register_execution_platforms()" in the WORKSPACE":
How can tests "call" something in the WORKSPACE file?
From my point of view, there's this line in our WORKSPACE file: https://github.com/bazelbuild/bazel/blob/master/WORKSPACE#L465
Which - to me - looks like that register_execution_platforms thing is always called (because it's in the WORKSPACE file without any logic surrounding it!).
Does that mean it's always in effect? Does that mean we now "have a platform" (as opposed to that state where we "don't have a platform" and thus the --remote_default_platform_properties flag takes effect) when I just run "bazel build" without any remote execution?

You're correct: I was being lazy with my terminology, which I need to avoid.

Every configured target has a single execution platform. Every action then inherits that execution platform from the configured target. How is that execution platform chosen? During toolchain resolution, Bazel looks at the list of available execution platforms (this is global to the build), the list of available toolchain implementations (this is also global to the build), the requested toolchain types (this is specific to the actual target being configured), and the target platform (this is specific to the actual target being configured). Bazel then picks the execution platform and toolchain implementations that match the requested toolchain types and target platform.

The list of "available execution platforms" is:
1) Any platforms specified by the "--extra_execution_platforms" flag.
2) Any platforms registered via "register_execution_platforms" calls in WORKSPACE (possibly via a macro).
3) The host platform.

This list is considered in order, so any platform that is found before the host platform will be used before the host platform.

When an action is sent to the remote execution endpoint, the remote_execution_properties from the execution platform is also sent, so that the RE endpoint can determine what to use for the execution. If the execution platform does not specify remote_execution_properties, then the value of "--remote_default_platform_properties" is used.
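
The ordering and fallback described above can be modelled roughly as follows. This is a toy model for illustration only, not Bazel's implementation: the `Platform` class, the `compatible` field (a stand-in for real constraint and toolchain matching), and the function names are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Platform:
    name: str
    compatible: bool = True  # stand-in for toolchain resolution
    remote_execution_properties: Optional[str] = None


def available_execution_platforms(extra, registered, host):
    """Flag entries first, then WORKSPACE registrations, then the host platform."""
    return list(extra) + list(registered) + [host]


def pick_execution_platform(candidates):
    """Return the first compatible platform in list order."""
    for platform in candidates:
        if platform.compatible:
            return platform
    raise LookupError("no compatible execution platform")


def effective_properties(platform, remote_default_platform_properties):
    """Use the platform's own properties if it declares any, else the flag value."""
    if platform.remote_execution_properties is not None:
        return platform.remote_execution_properties
    return remote_default_platform_properties
```

Under this model, a platform registered in the WORKSPACE that declares its own remote_execution_properties is picked before the host platform, and the flag value is never consulted.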

Because of the current split between platforms and execution strategies, platforms cannot directly specify that they are for remote execution. I would love to work on a way to bridge this gap, so that some platforms can be marked as being for remote execution, some for local, some define a sandbox, and so on. This will take a lot of thought, and so far I only have hand-wavy ideas.

"execution platforms are being considered before the host platform":
What does "a platform is being considered before the host platform" mean?

I hope my previous answer has clarified this, but: the host platform is the last possible execution platform to be considered, and acts as a fallback.

"remote_execution_properties on those platforms will be used directly":
What does "used directly" mean?

I also hope my answer clarified this, let me know if it didn't.



Ittai Zeidman

Apr 25, 2019, 4:59:04 AM
to John Cater, Philipp Wollermann, bazel-discuss
Thanks John,
This detailed response is helpful (for me at least).
Philipp,
If Bazel strives to have an empty PATH, won't this drive rule maintainers to explicitly define their inputs?

It just sounds like the only sane answer for a correct, reproducible build system, apart from managing everyone's computers and making them identical.

jga...@butterflynetinc.com

Apr 30, 2019, 10:59:53 AM
to bazel-discuss
Jakob, I was testing this out on Bazel 0.24.1, and I don't think this works as written.

λ ./bazel-debug build //host/Applications/ImagingSandbox --remote_default_platform_properties="properties:{name:'xcode-version' value:'$(xcodebuild -version)'}"
INFO: Invocation ID: d01f4b12-38a7-4eb8-a144-ddc2af42cb46
INFO: Analysed target //host/Applications/ImagingSandbox:ImagingSandbox (0 packages loaded, 0 targets configured).
INFO: Found 1 target...
ERROR: /Users/jgavris/code/software/host/Applications/ImagingSandbox/BUILD:4:1: Executing genrule //host/Applications/ImagingSandbox:python-version failed: Failed to parse --remote_default_platform_properties properties:{name:'xcode-version' value:'Xcode 10.1
Build version 10B61'}: 1:40: String missing ending quote.
Target //host/Applications/ImagingSandbox:ImagingSandbox failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 1.517s, Critical Path: 0.02s
INFO: 0 processes.



Jakob Buchgraber

May 2, 2019, 5:20:24 AM
to Jason Gavris, bazel-discuss
Hi Jason,

It's a bit subtle, as the string passed to the --remote_default_platform_properties flag is a text protobuf. I believe you need to reverse your quoting:

--remote_default_platform_properties='properties:{name:"xcode-version" value:"foo"}'

I opened https://github.com/bazelbuild/bazel/issues/8220 to make this simpler.
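
A sketch of building that flag from a script, with double quotes inside the text protobuf. The function name is hypothetical. The error output earlier in the thread shows the `xcodebuild -version` value spanning two lines, so this sketch also collapses whitespace, on the assumption that a raw newline inside the quoted string is not acceptable to the parser.

```python
import shlex


def platform_properties_flag(name, value):
    """Build --remote_default_platform_properties with a text-proto payload."""
    # Collapse newlines and repeated whitespace into single spaces.
    one_line = " ".join(value.split())
    # Escape backslashes and double quotes for the double-quoted string literal.
    escaped = one_line.replace("\\", "\\\\").replace('"', '\\"')
    payload = 'properties:{name:"%s" value:"%s"}' % (name, escaped)
    return "--remote_default_platform_properties=" + payload


if __name__ == "__main__":
    flag = platform_properties_flag("xcode-version", "Xcode 10.1\nBuild version 10B61")
    # shlex.quote() wraps the argument in single quotes for pasting into a shell.
    print(shlex.quote(flag))
```

When Bazel is launched through an argument list rather than a shell, the unquoted `flag` string can be passed as a single argument.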

Best,
Jakob

jga...@butterflynetinc.com

May 2, 2019, 2:38:32 PM
to bazel-discuss
Perfect, that didn't throw an error. But I guess these properties are not included in the action key for pure remote caching (without execution). I ended up just hashing the output of `gcc -v` and adding that as an ACTION_ENV.
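
A sketch of that workaround, assuming "ACTION_ENV" refers to Bazel's `--action_env=NAME=VALUE` flag. The variable name `TOOLCHAIN_FINGERPRINT` and the function names are hypothetical, not what was actually used here.

```python
import hashlib
import subprocess


def compiler_fingerprint(cmd=("gcc", "-v")):
    """Hash everything the command prints, with stderr merged into stdout."""
    # `gcc -v` writes its version details to stderr, so capture both streams.
    result = subprocess.run(list(cmd), stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    return hashlib.sha256(result.stdout).hexdigest()


def action_env_flag(fingerprint, variable="TOOLCHAIN_FINGERPRINT"):
    """Format the fingerprint as an --action_env flag."""
    return "--action_env=%s=%s" % (variable, fingerprint)


if __name__ == "__main__":
    # Raises FileNotFoundError if gcc is not on PATH.
    print(action_env_flag(compiler_fingerprint()))
```

Because the variable becomes part of each action's environment, a different compiler version yields a different value and therefore different action keys.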

Jakob Buchgraber

May 2, 2019, 2:45:12 PM
to Jason Gavris, bazel-discuss
They are included for remote caching too. If that's not happening and you don't have a platform defined yet then that's a bug.
