[Proposal] Optional Toolchains

133 views
Skip to first unread message

John Cater

unread,
Jan 24, 2022, 9:20:54 AM1/24/22
to bazel-dev
It's been a requested feature for a while, so I'm looking at adding the capability for optional toolchains in toolchain resolution, instead of the current mandatory system.

There are two possible approaches outlined, please take a look and let me know your preferences if you are interesting in writing rules that use toolchains.


Thanks!
John Cater

John Cater

unread,
Jan 24, 2022, 3:24:53 PM1/24/22
to bazel-dev, Greg Estren, Tom Rybka, bazel-...@googlegroups.com
Adding both +Greg Estren and +Tom Rybka as domain reviewers.

Also sending this to the Rules Authors SIG, since you will certainly be interested.

Thanks!
John Cater

Alex Eagle

unread,
Jan 24, 2022, 4:06:41 PM1/24/22
to bazel-dev
I like how the second option makes using toolchains in a rule more similar to using providers. An attribute can declare that a provider is required from targets given as values, in which case it can be de-referenced without any checking, otherwise the None value indicates a provider that isn't present on the target. The advantage of the `mandatory` attribute is that bazel and other tooling can detect mistakes statically without having to evaluate the rule implementation.
-Alex

Greg Estren

unread,
Jan 25, 2022, 5:17:15 PM1/25/22
to John Cater, bazel-dev, Tom Rybka, bazel-...@googlegroups.com
Review comments:
  1. Performance question: missing mandatory toolchains, if they still exist, would still be reported before dependencies are computed? i.e. here? In other words, is performance loss possible due to "missing toolchain" errors triggering only *after* the transitive subgraph is constructed?

  2. For Changes to Toolchain Resolution Logic, the subtle implication is platforms matching the same # of toolchains revert to priority-order selection. While you say as much, could you explicitly call out the most generic logic:

    selectedPlatform = filter(registeredPlatforms, hasAllMandatoryToolchains).sort(numOptionalToolchainsMatches).secondarySort(priorityOrder).chooseFirst() 

    I'd also define, briefly, what "priority" means.

  3. What would be the default for mandatory?

  4. For the second option, why do rule writers need to check the Bazel version? Wouldn't that outright fail for old Bazels since config.toolchain_type wouldn't resolve?



Tony Aiuto

unread,
Jan 26, 2022, 12:16:59 AM1/26/22
to Greg Estren, John Cater, bazel-dev, Tom Rybka, bazel-...@googlegroups.com
Option 2 is the right choice by this reasoning:
  • Assume option 1, where all toolchains are optional
  • Assume there is a rule for which there a mandatory toolchain and two optional ones.
  • If we look at execution platforms and find 1 platform that provides the mandatory toolchain, but another that provides both of the optional toolchains, how will we know to correctly pick the one that only has 1 toolchain match?
That said.  I think we could incrementally get there with a hybrid approach.
  • Quickly move to "all toolchains are optional". Rule authors should start today towards migrating rules to fail if the toolchain is missing. Getting ahead of the bazel implementation.
  • For the cases where a rule only has one toolchain, the old API and change to rules is sufficient.
  • We could add a flag "--incompatible_all_toolchains_are_optional" that turns on this behavior. Declare a fixed time before the day it becomes true and then removed.
  • We continue to work through use cases of mixed mandatory/optional toolchains w.r.t. execution platform priority.
What I would like to see is that we can move forward on allowing optionality in the most common case (single toolchain) soon, without getting bogged down in API for mandatory-ness vs. execution platforms.

Question: I am presuming the proposal does not mandate that all rule declarations change. That is, under option 2, should
  toolchain_types = ["//toolchain:type1", "//toolchain:type2"],
continue to work?  I think it has too, with the assumption that each is optional (if --incompatible_all_toolchains_are_optional is set)






--
You received this message because you are subscribed to the Google Groups "bazel-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bazel-dev+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/bazel-dev/CABdTCQVR17JOUU7pd9oV-9qqmaFmzsBWZ5z5LukpxDODhkhPMw%40mail.gmail.com.

John Cater

unread,
Jan 26, 2022, 10:27:42 AM1/26/22
to bazel-dev
Thanks for the comments. I've put up a PR (https://github.com/bazelbuild/proposals/pull/244) to clarify some points, mostly around the meaning of "priority" and clarifying what happens for missing mandatory toolchains, as well as fixing the formatting to clarify the structure.

To respond to further questions:
If a mandatory toolchain cannot be found, analysis will end immediately, as in the current implementation. One consequence of this is that a generic error message is used: with optional toolchains, a rule can report a custom error message that may be more meaningful to users. On the other hand, the entire dependency tree will still be analyzed, so there may be some tradeoffs there.

Currently, under both API proposals, toolchains default to optional (with both the base-label syntax or the richer API, when `mandatory` isn't specified). This seems to be to be in keeping with how attributes handle the `mandatory` keyword.

I'm reluctant to use an incompatible flag, just because unwinding and removing those always seems to be more work than I expect, but using an incompatible flag to control the default is a promising idea. One issue is that it would be global, for all toolchain types, so if one set of rules is updated and another isn't, users may be confused when they apply the flag and builds begin to fail.

And finally, Greg, with the new API rules will need to check the Bazel version in order to be backwards compatible. Any rule that wants to use the new API with Bazel 6.0+, but also be compatible with Bazel 5.0, will need to do a version check when defining rules or else have to branch their rules. Based on my experiences with enabling the toolchain transition, rules authors prefer to have a single version of their rules that can work with any supported Bazel version.

Rule authors, please correct me if this is not correct.

Thanks!
John

Tony Aiuto

unread,
Jan 26, 2022, 11:07:05 AM1/26/22
to John Cater, bazel-dev
I'm not a fan of adding the flag either.  Let's say we don't. That leaves two behaviors for rules that are still using the older API.
  • toolchains are mandatory.
    • You can't have optional unless you use the newer API.
    • Old rules will continue to work as before
    • No surprises
    • but you can not make use of optional toolchains until the new API is available
  • toolchains are optional
    • A rule author can take advantage of optionality today, ahead of any API change.
    • When the new API is available, they should switch to the new API.
    • If we can not find a toolchain, some rules will fail hard in rule evaluation rather than before evaluation.
    • But, in the new API, mandatory is the default. I think there would be some confusion.
Then, the question to the group is how bad would the change in failure mode be? My suspicion is that it is not a problem.

We could test the unflagged behavior on a large code base(tm) very quickly to see if we get terrible problems.
OTOH, the value of being able to move rules earlier than config.toolchain_type() being available it is limited. If we can implement that quickly, we might as well go for that.

Another thought.  in toolchain_type, what if we used optional (default False) rather than mandatory. That has the property that most people won't have to specify mandatory or not, and the optionality (the new feature) is explicitly enabled with a "True".



John Cater

unread,
Jan 26, 2022, 2:57:49 PM1/26/22
to bazel-dev
I'm leaning towards the following:

1. We add the new API, but it's still possible to just declare toolchain types as a string/label
2. If you're using the legacy string/label form, the toolchain dependency is mandatory. Otherwise, you can use the API to clarify.
3. I lean towards the default for the `config.toolchain_type` API to not be mandatory, but I'm open to arguments about this.

This lets existing toolchains continue to work and migrate as Bazel releases happen. One concern is that we will probably migrate to use this sooner inside Google than out, because it will be at least a Bazel release or two before the API is out, so internal rule authors may need to be careful about releases.

Greg Estren

unread,
Jan 26, 2022, 5:18:56 PM1/26/22
to John Cater, bazel-dev
Sounds like there's a consensus on Option 2, with these benefits:
  • handles ambiguities like "platform A supports mandatory toolchain 1", "platform B supports optional toolchains 2 and 3" (we want to choose platform A even though it matches less toolchains, because what it matches is more important) Option A isn't expressive enough for this
  • use cases aren't just theoretical: rules really do have conditional logic paths with different toolchain needs.  The GitHub issue behind this proposal mentions some.
  • explicit behavior, easier for tooling, static analysis to understand
  • preserves today's performance" missing mandatory toolchains fail without having to wait forever for a huge build graph to load. This may sound silly but I think it's a legitimate issue when debugging rule & toolchain definitions.
  • easier migration path: rule writers can opt in at their leisure because there's an explicit API to do so 
It is a complication of the Starlark API, which always needs a compelling argument. I think the above are compelling. Plus, the patterns it adds are standard patterns (rule attributes look similar), so it's not conceptual complication.

Alex mentioned upthread this is nicely similar to how rule attributes depend on aspects. Furthermore, attributes themselves have a mandatory attribute (which defaults to false). For that reason I suggest re-using mandatory for greater consistency and simplicity. Defaulted to whichever default is most reasonable for a generic toolchain. John - one nice thing about a True default is provides clear messaging when users get it wrong. If a rule writer completely misses this part of the documentation and it defaults to false, when the toolchain is missing it'll be an obscure interpreter error (since they didn't think to check for it). If it defaults to true, when the toolchain is missing Bazel will actively explain the error. Then the user can take that feedback and alter their logic as needed.

I agree that the legacy API should mean "mandatory". This preserves 100% semantic parity with today. It's just more work and confusion for everyone if semantics change out from underneath them.

John Cater

unread,
Jan 27, 2022, 12:34:59 PM1/27/22
to bazel-dev
Thanks everyone. I've updated the proposal (https://github.com/bazelbuild/proposals/blob/main/designs/2022-01-21-optional-toolchains.md) to clarify the selected API option and to make sure the defaults are clearly specified.

Please take another look and let me know if there are other questions, I'd like to finalize this in the next few days.

Ivo Ristovski List

unread,
Jan 31, 2022, 6:39:10 AM1/31/22
to bazel-dev
My 2 cents. A while ago I was thinking of an alternative design just never got to propose it.

In Java there is a dummy toolchain, which is there so that toolchain resolution doesn't fail and at the same time to report an error when it's actually being used. The ugly piece of code is here https://github.com/bazelbuild/bazel/blob/master/tools/jdk/local_java_repository.bzl#L194

An alternative design to currently proposed one would be to extend `toolchain_type` rule with an `error` attribute. Toolchain resolution of that type (when the attribute is set) should not fail. An when the toolchain gets actually used it should result in the error that is on the attribute.

I think this kind of solution is a bit more declarative than imperative (comparing to current proposal). Caveat is how to format the error message, or what placeholders to provide in the message (or maybe using selects for the message does everything we need already?)

I don't know about other use cases, but I think it covers Java's use case and Go's cgo use case and without much more effort by the rule authors.

WDYT?

John Cater

unread,
Jan 31, 2022, 9:40:44 AM1/31/22
to bazel-dev
Unfortunately, the dummy toolchain in any form does not work in the presence of multiple execution platforms, due mostly to the simplistic notion of "priority" that toolchain resolution uses (and discussed here).

The core motivation of this is to remove the dummy CC toolchain, which is causing us some pain inside Google.

Imagine this scenario:
1. You have two pools of remote workers, one linux-based and one not (we'll say darwin for convenience).
  a. You have to register them in some order: either ":linux_worker, :darwin_worker", or ":darwin_worker, :linux_worker"
2. You have three toolchains: linux, darwin, and dummy.
  a. The ordering here is easier, let's assume the order is ":linux_toolchain, :darwin_toolchain, :dummy_toolchain".
  b. The linux and darwin toolchains have exec compatibility set to match the linux and darwin execution platforms.
  c. The dummy toolchain has an empty execution compatibility, and is therefore compatible with anything.
  d. The dummy toolchain is still needed, because some users are building for a different target architecture that hasn't yet updated to toolchains.
3. No cross-compilation is set up: all linux builds target linux and all darwin builds target darwin.
  a. This is a simplification but the issue still arises if this isn't true.

When you attempt to build a linux binary, Bazel tries to find the best set of execution platform and toolchains:
1. Execution platform :linux_worker would use toolchain :linux_toolchain
2. Execution platform :darwin_worker would use toolchain :dummy_toolchain, because :darwin_toolchain isn't appropriate.

Since :linux_worker is higher priority, it is selected, no problem.

When you attempt to build a darwin binary, however:
1. Execution platform :linux_worker would use toolchain :dummy_toolchain, because :linux_toolchain isn't appropriate
2. Execution platform :darwin_worker would use toolchain :darwin_toolchain.

Since :linux_worker is higher priority, it is chosen, and then your build cannot continue (with a hard crash currently, and an error using your proposal).

If you swap the priority of :linux_worker and :darwin_worker, you swap the analysis and one case still cannot select a valid toolchain, even though one exists.

This would be solved with mandatory toolchains if you could remove the dummy toolchain entirely: then the incorrect execution platform would be rejected as inappropriate, and the correct one would be used. Unfortunately, because there are still cases where toolchains can't be used (Apple rules and Android rules both use CC toolchains, and are currently not completely ready for toolchain resolution), we need the dummy toolchain to prevent errors in (unused) toolchain resolution from stopping builds.

Ivo Ristovski List

unread,
Feb 1, 2022, 1:18:01 AM2/1/22
to John Cater, bazel-dev
It looks like there's an agreement we need to remove dummy toolchains.

One thing that might be missing is that Java toolchainisation uses the dummy toolchain to report an error message when the local JDK is missing. There were long threads of discussions about this, and the result of them is that by default (zero-configuration) we need to depend on the local JDK (if we don't, then upgrading Bazel could break builds) and we need to report an error with directions when it's missing (otherwise users loose time figuring out the issue).

I think only with the "optional toolchains" proposal, we can't remove the dummy toolchain from Java. But if the dummies remain, it will potentially break resolution of other rules that consider Java toolchain as optional (I don't believe there is anybody, but we can generalize this example).

I can see how optional toolchains are useful for Java. CC toolchain would be optional in Java. And I believe there were complaints about dependency to CC, where people are developing pure Java. Not only useful, they are necessary - without extra information the resolution can't decide which toolchain is more important to Java and having CC and no Java toolchain would result in problems.

It seems to me that we need both ingredients to actually remove dummy toolchains: optional toolchains and the error message. You can think about the error message on the toolchain_type as a "virtual dummy" that has the lowest priority.

I've got one naive question. Do we really need to execute each target on only one execution platform? It seems the execution platform could be a property of an action, instead the whole rule. Is complexity the reason this is not true? In the case of Java rules, running javac on one system and gcc on another doesn't sound like a problem; but running Go compiler and Cc linker on different execution platforms might be a problem in practice.


You received this message because you are subscribed to a topic in the Google Groups "bazel-dev" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/bazel-dev/PdRF3Lmrln8/unsubscribe.
To unsubscribe from this group and all its topics, send an email to bazel-dev+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/bazel-dev/3237d5ca-fa38-44fb-a6f5-4e2ca2cf342an%40googlegroups.com.


--

Ivo List | Software Engineer | il...@google.com | 

Greg Estren

unread,
Feb 1, 2022, 10:50:24 AM2/1/22
to Ivo Ristovski List, John Cater, bazel-dev
This proposal LGTM as reviewer Ivo: if you still consider your comments blocking please let me know.

Re: Java, are you saying that the dummy toolchain produces an error with details that are specific to that toolchain? i.e. what's the qualitative difference vs. having Java logic declare it as optional and triggering the error in the Java logic itself? 

That's a good point that dummy toolchains won't mix well with rules that consider them optional. To recap, the main challenge I'm hearing you express about Java toolchains is that when they're missing you want something more descriptive then the generic "toolchain resolution failed" message? It's true that declaring it optional is an unintuitive workaround for that. But if John's proposal adds a config.toolchain_type dict with a "mandatory" attribute, why couldn't it also have a "custom_error" attribute?



John Cater

unread,
Feb 1, 2022, 11:04:00 AM2/1/22
to bazel-dev
Ivo: This proposal allows rule authors to opt into optional toolchains: Java can remain as-is until you're ready to change.

If you want custom errors with Java toolchains, one way is to enable optional toolchains, remove the dummy, and have the rule implementation emit an appropriate error when the toolchain isn't present. Just because Bazel considers the toolchain to be optional doesn't mean that the rule has to proceed.

Also, we do have the capability for different actions to have different execution groups and different toolchains. Take a look at the docs for execution groups, which does exactly that.

Ivo Ristovski List

unread,
Feb 1, 2022, 1:44:39 PM2/1/22
to John Cater, bazel-dev
Hey all, 

 Ivo: if you still consider your comments blocking please let me know.
No, I don't consider my comments as blockers for the proposal. LGTM from my side.
 
Re: Java, are you saying that the dummy toolchain produces an error with details that are specific to that toolchain?
Yes. Specific to toolchain type and the configuration for which it was resolved.

i.e. what's the qualitative difference vs. having Java logic declare it as optional and triggering the error in the Java logic itself? 
 
If you want custom errors with Java toolchains, one way is to enable optional toolchains, remove the dummy, and have the rule implementation emit an appropriate error when the toolchain isn't present. Just because Bazel considers the toolchain to be optional doesn't mean that the rule has to proceed.

I don't think this would work. The problem with making both Java and CC toolchain optional in Java rules is: the resolution criteria (in the proposal it says "satisfies the most toolchain types") could match an execution platform with only CC, even in cases where there was a second execution platform with only Java toolchain.

 But if John's proposal adds a config.toolchain_type dict with a "mandatory" attribute, why couldn't it also have a "custom_error" attribute?

This could help, although I'm not sure it is enough - so far there is some customization in the message, which might be easier to do on a rule's attribute. I'm not sure if it's helpful to the user, saying something general "Java runtime toolchain couldn't be resolved.", without being able to say this is for target platform or for execution platform, or that the failure is due to JAVA_HOME setting.

Also, we do have the capability for different actions to have different execution groups and different toolchains. Take a look at the docs for execution groups, which does exactly that.

Nice, I didn't know about this one. So instead of saying `toolchains = ["//java", "//cc"]`, I could say 
 ``
exec_groups = {
      "java": exec_group(toolchains = [config.toolchain_type("//java", mandatory = False)]),
      "cc": exec_group(toolchains = [config.toolchain_type("//cc", mandatory = False)]),
   }
```
And if I don't get a Java toolchain, does this mean there isn't any? This would make it possible to print the error message from the rule's implementation. It's more imperative and perhaps needs a hack to detect target/exec configuration, but it works.


John Cater

unread,
Feb 1, 2022, 2:56:52 PM2/1/22
to bazel-dev
On Tuesday, February 1, 2022 at 1:44:39 PM UTC-5 Ivo Ristovski List wrote:
If you want custom errors with Java toolchains, one way is to enable optional toolchains, remove the dummy, and have the rule implementation emit an appropriate error when the toolchain isn't present. Just because Bazel considers the toolchain to be optional doesn't mean that the rule has to proceed.

I don't think this would work. The problem with making both Java and CC toolchain optional in Java rules is: the resolution criteria (in the proposal it says "satisfies the most toolchain types") could match an execution platform with only CC, even in cases where there was a second execution platform with only Java toolchain.

Right, we'd need a better way to handle this. Using execution groups (as you mention below) might be better. This proposal doesn't address questions like "I really want toolchain foo but maybe also bar if it's not a problem", and I'm not sure how (API-wise) we'd add a way to specify that in a future API.
 
 But if John's proposal adds a config.toolchain_type dict with a "mandatory" attribute, why couldn't it also have a "custom_error" attribute?

This could help, although I'm not sure it is enough - so far there is some customization in the message, which might be easier to do on a rule's attribute. I'm not sure if it's helpful to the user, saying something general "Java runtime toolchain couldn't be resolved.", without being able to say this is for target platform or for execution platform, or that the failure is due to JAVA_HOME setting.

Yes, the current error message is very blunt (which is why the Java toolchain has its own handling). Adding a per-rule message helps a bit but I'm not sure how much we'd want to parameterize it, and attaching an entire starlark function to create an error is definitely overkill.
 
Also, we do have the capability for different actions to have different execution groups and different toolchains. Take a look at the docs for execution groups, which does exactly that.

Nice, I didn't know about this one. So instead of saying `toolchains = ["//java", "//cc"]`, I could say 
 ``
exec_groups = {
      "java": exec_group(toolchains = [config.toolchain_type("//java", mandatory = False)]),
      "cc": exec_group(toolchains = [config.toolchain_type("//cc", mandatory = False)]),
   }
```
And if I don't get a Java toolchain, does this mean there isn't any? This would make it possible to print the error message from the rule's implementation. It's more imperative and perhaps needs a hack to detect target/exec configuration, but it works.

Right, to get back to your earlier situation: you really want a Java toolchain, and if possible a CC. This configuration would mean that each is resolved separately (and possibly with different execution platforms), so you could emit the correct error messages (or just fall back to not supporting native, if CC is missing) as needed.

One concern is the "possibly different execution platforms" part, though: if the best Java toolchain is for Linux, and the only CC toolchain is for Darwin, is that acceptable? There's no easy way in the rule implementation to check that (although since you can check properties of the CC toolchain it should be obvious).

Either way, I think the current proposal is still a step in the right direction. If Java needs more flexibility I think we can continue to iterate on the capabilities in future work.

John
 

Ivo Ristovski List

unread,
Feb 2, 2022, 12:18:01 PM2/2/22
to John Cater, bazel-dev
I have some second thoughts on this topic. Sorry, for that, but I was basing my previous argument on Java rules to have one toolchain mandatory and the second one optional. And that this is the information toolchain resolution needs.

If they are both optional and resolved separately, this gives almost no information to toolchain resolution. It looks like it's there just for the sake of error handling and this seems wrong to me. Wrong because, from Java rules perspective I can't produce a good error message for CC toolchain and wrong because other rules that need Java toolchain (like android or kotlin rules for example) won't report the same error as Java rules.

Thus, it seems Java rules don't represent a proper use case for the optional toolchains anymore. Other rules might, though.


--
You received this message because you are subscribed to a topic in the Google Groups "bazel-dev" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/bazel-dev/PdRF3Lmrln8/unsubscribe.
To unsubscribe from this group and all its topics, send an email to bazel-dev+...@googlegroups.com.
Reply all
Reply to author
Forward
0 new messages