--
You received this message because you are subscribed to the Google Groups "bazel-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bazel-dev+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/bazel-dev/CALR1tuM8heuoy%2Btds3B%3DMwkL1w42zSx7qP9MLKk4AVQO4-1Xjg%40mail.gmail.com.
1. Does RunfilesLibraryInfo work well?AFAICT it is an optimization so that the mapping file is not written if there is no runtime library to read it anyway and so that not the full repository mapping of the whole transitive closure needs to be provided?
I think the goal is noble, but it raises all sorts of questions: for example, what if the runfiles library is hidden by an abstraction in the user code (e.g. library R depends on the runfiles library) and then the main binary B depends on the runfiles library abstraction R and an unrelated library X and B passes a label from X to R and then R passes that to the actual runfiles library? Will the repository mapping data for X and its transitive dependencies be in the mapping file?
2. How do indirect dependencies work?It is possible that the mapping for an indirect dependency is required to resolve runfiles paths properly, for example, in this case:main repo, a/BUILD:sh_binary(name="a", data=["@repo1//b:data"])repo1-0.1, b/BUILD:filegroup(name="b", data=["@repo2//c:data"])repo2-0.2, c/BUILD:filegroup(name="c", data=["c"])In this case, AFAIU the only file in the runfiles tree will be repo2-0.2/c/c. Is this case handled correctly? It's not obvious to me because the actual file depends on the canonical repository for the apparent repository "repo2" in "repo1-0.1".
3. Can we assume that the runfiles library is always used?I don't have data as to whether this is the case (at Google, it's not widely used but then again, we don't have external repositories, either) If not, that's obviously a problem but theoretically, it's possible to extend the runfiles tree to contain paths like $CONTAINING_REPO/$APPARENT_REPO/$REPO_PATH. It would probably blow up its size quite a bit, though.
Thanks for the great questions. I do believe that some of them are new, so I will provide detailed answers inline.1. Does RunfilesLibraryInfo work well?AFAICT it is an optimization so that the mapping file is not written if there is no runtime library to read it anyway and so that not the full repository mapping of the whole transitive closure needs to be provided?While it's true that RunfilesLibraryInfo is an optimization and not a crucial ingredient of the proposal, it does serve a second purpose: It is meant to be used by rulesets to selectively register additional actions needed to provide a constant containing the current canonical repository name. The C++ rules can simply add a local_define, which isn't costly, but the Java rules have to generate a Java file and compile it. Without a concept similar to RunfilesLibraryInfo, the latter would add two actions for every Java target, which may be prohibitive.I think the goal is noble, but it raises all sorts of questions: for example, what if the runfiles library is hidden by an abstraction in the user code (e.g. library R depends on the runfiles library) and then the main binary B depends on the runfiles library abstraction R and an unrelated library X and B passes a label from X to R and then R passes that to the actual runfiles library? Will the repository mapping data for X and its transitive dependencies be in the mapping file?All cases of this I have encountered in real life were of one of the following two types:1. R is a general purpose wrapper around a runfiles library meant to be used by other repositories (e.g. a Scala or Clojure wrapper around the Java runfiles library). In this case, it wouldn't seem out of place to demand R to advertise RunfilesLibraryInfo itself - in fact, by that point, R more or less is a runfiles library.2. R is a convenience wrapper around a runfiles library meant to be used only in its own repository (e.g. it implements a custom lookup procedure for the runfiles manifest and/or stores its canonical repository name so that Rlocation calls don't have to specify it). In this case, R wouldn't have to be marked as a runfiles library since all users of R are located in the same repository and thus share a repository mapping with it. This mapping will be included in the manifest since R itself is tracked as a user of a runfiles library.The RunfilesLibraryInfo approach would only become somewhat confusing for users that wrap a runfiles library and pass labels from multiple different repositories to it - but I wonder whether there is any abstraction of this kind that shouldn't be considered a full-fledged runfiles library, for which it would again be reasonable to demand that it advertises RunfilesLibraryInfo.
2. How do indirect dependencies work?It is possible that the mapping for an indirect dependency is required to resolve runfiles paths properly, for example, in this case:main repo, a/BUILD:sh_binary(name="a", data=["@repo1//b:data"])repo1-0.1, b/BUILD:filegroup(name="b", data=["@repo2//c:data"])repo2-0.2, c/BUILD:filegroup(name="c", data=["c"])In this case, AFAIU the only file in the runfiles tree will be repo2-0.2/c/c. Is this case handled correctly? It's not obvious to me because the actual file depends on the canonical repository for the apparent repository "repo2" in "repo1-0.1".This is indeed a tricky situation that I don't think has any good answer other than labeling this a "runfiles strict deps violation" and discouraging it. With Bzlmod, there purposefully is no way to refer to repo2-0.2/c from the main repo without declaring a dependency on and thus a repo mapping entry for the repo2 module (assuming for simplicity that all repos in the example are actually Bazel modules). Even with a full repository mapping not filtered by RunfilesLibraryInfo, looking up repo2-0.2/c from the main repo would require specifying the apparent repo name "repo2" relative to the canonical repo name "repo1-0.1" - but how would the main repo target //:a get access to the dynamic canonical repo name "repo1-0.1" in the first place?I think that the only solution that fits the spirit of Bzlmod's strict deps approach thus is to make repo2-0.2 a proper dependency of the main repo and data depend on @repo2//:c directly from //:a.
3. Can we assume that the runfiles library is always used?I don't have data as to whether this is the case (at Google, it's not widely used but then again, we don't have external repositories, either) If not, that's obviously a problem but theoretically, it's possible to extend the runfiles tree to contain paths like $CONTAINING_REPO/$APPARENT_REPO/$REPO_PATH. It would probably blow up its size quite a bit, though.We can't assume that it is, just like many rulesets today rely on the runfiles symlink tree being present and thus break on Windows with symlinks disabled. From my point of view, it makes sense to require rulesets to either a) use a runfiles library or b) compute runfiles path in Starlark and pass them around to tools via different means (arguments, environment variables). By having a canonical API for runfiles lookups, future changes to the layout of runfiles (see e.g. https://github.com/bazelbuild/bazel/pull/14676) can be implemented without breaking rulesets. Since Bzlmod will necessarily require changes, this may be the right time to migrate everyone to an actual API and away from implementation details.
On Fri, Jul 22, 2022 at 4:34 PM Fabian Meumertzheim <fab...@meumertzhe.im> wrote:Hi,--I submitted a draft of a proposal on how to locate runfiles with Bzlmod and fix Stardoc along the way (https://github.com/bazelbuild/proposals/pull/269).The proposal entails changes to both Bazel itself as well as the runfiles libraries.I'm looking forward to comments either in this mail thread or on the GitHub PR.I'm especially interested in the feedback of ruleset maintainers, be it rules shipped with Bazel or external, on Part 4 of the proposal.Fabian
You received this message because you are subscribed to the Google Groups "bazel-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bazel-dev+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/bazel-dev/CALR1tuM8heuoy%2Btds3B%3DMwkL1w42zSx7qP9MLKk4AVQO4-1Xjg%40mail.gmail.com.
--Lukács T. Berki | Software Engineer | lbe...@google.com |Google Germany GmbH | Erika-Mann-Str. 33 | 80636 München | Germany | Geschäftsführer: Paul Manicle, Halimah DeLaine Prado | Registergericht und -nummer: Hamburg, HRB 86891
Do I understand correctly that your main argument is that the caller of the runfiles library needs to pass in the canonical repository name in order to do the runfiles resolution and RunfilesLibraryInfo is the only acceptably convenient way to inject that without undue pain?
Unfortunately, as far as I can tell, that cannot be determined in a language-agnostic way because it has to be done in one of the following ways:
- Walking the call stack at runtime
- Declaring a symbol that always has the right value (the current local_define / inlined static constant proposal)
- Asking people to hard-wire it in the code (it's not as bad as it appears, because AFAIU the canonical repository name is already encoded in the MODULE.bazel file of the repository)
It feels weird to add this feature to Bazel just for the sake of Java. Have you thought about how this could work for Python (or Rust, etc.)? If it's absolutely necessary, sure, go ahead, but if it's just a Java thing, let's find an approach that doesn't require widening the interface of Bazel just for a single programming language.
(my gut feeling is that it's odd to make a configured target do things differently based on whether any of its direct dependencies has a marker provider but let's not make decisions based on gut feelings if we can help)
FWIW, I think the more general issue this case uncovers is that runfiles is not really a good abstraction because the input to the path resolution is an "apparent runfiles path" whereas everywhere else, paths are resolved from labels. Concretely, you are totally right that :a should not even know about the existence @repo2 but there is currently no way for a binary to ask at runtime "what files does @repo1//b:data contain?". If your argument is that this use case is already broken currently and this change does not make it any more broken, I could get behind that :)
3. Can we assume that the runfiles library is always used?I don't have data as to whether this is the case (at Google, it's not widely used but then again, we don't have external repositories, either) If not, that's obviously a problem but theoretically, it's possible to extend the runfiles tree to contain paths like $CONTAINING_REPO/$APPARENT_REPO/$REPO_PATH. It would probably blow up its size quite a bit, though.We can't assume that it is, just like many rulesets today rely on the runfiles symlink tree being present and thus break on Windows with symlinks disabled. From my point of view, it makes sense to require rulesets to either a) use a runfiles library or b) compute runfiles path in Starlark and pass them around to tools via different means (arguments, environment variables). By having a canonical API for runfiles lookups, future changes to the layout of runfiles (see e.g. https://github.com/bazelbuild/bazel/pull/14676) can be implemented without breaking rulesets. Since Bzlmod will necessarily require changes, this may be the right time to migrate everyone to an actual API and away from implementation details.I think (b) would give rise to a menagerie of ad-hoc mechanisms, all awful, so if it really needs to be, it should be (a). If your argument is that we can't support runfiles everywhere (because Windows) and that we should therefore mandate the use of a library, it would be a big change that I emotionally like because frankly, runfiles trees are a lot of baggage and have caused a lot of bugs and pain so I'd love to see them go away. However, they are *way* too widely used at Google for us to be able to remove the support code within a reasonable time frame and they are just too convenient for shell and Python so we'll have to pay the price anyway, which means that if we can make them work halfway well in Bazel, it's worth keeping them.
I'm somewhat inclined to think a bit more about runfiles trees, but I'll avoid the temptation because coupling that change to bzlmod would make the success of both less likely.
On Tue, Sep 6, 2022 at 9:22 AM 'Lukács T. Berki' via bazel-dev <baze...@googlegroups.com> wrote:Do I understand correctly that your main argument is that the caller of the runfiles library needs to pass in the canonical repository name in order to do the runfiles resolution and RunfilesLibraryInfo is the only acceptably convenient way to inject that without undue pain?Yes, that is the most important part. although I would say that I find the trimmed repository mapping manifests rather important (but not strictly necessary) as well.Unfortunately, as far as I can tell, that cannot be determined in a language-agnostic way because it has to be done in one of the following ways:
- Walking the call stack at runtime
- Declaring a symbol that always has the right value (the current local_define / inlined static constant proposal)
- Asking people to hard-wire it in the code (it's not as bad as it appears, because AFAIU the canonical repository name is already encoded in the MODULE.bazel file of the repository)
It's crucial to note that 3. can't work: The canonical repository name is not a function of the information in MODULE.bazel, but rather depends on the outcome of the Minimal Version Selection algorithm. A module could of course duplicate its "version" parameter of the "module" directive into a Starlark constant and inject that into code that looks up runfiles, but even that would break if the root module declares an override for the module. Since overrides use different canonical repository names (not including the version) and a module can't determine whether it has been overridden, the canonical repository name must be assumed to be fully dynamic. This essentially leaves only option 1 and 2.
It feels weird to add this feature to Bazel just for the sake of Java. Have you thought about how this could work for Python (or Rust, etc.)? If it's absolutely necessary, sure, go ahead, but if it's just a Java thing, let's find an approach that doesn't require widening the interface of Bazel just for a single programming language.IMO it's the other way round: This feature is mostly there for the sake of all the other languages out there that I don't even know of and that may have to employ any number of arcane (and costly) hacks to embed the canonical repository name into their compiled representation. For example, as far as I know, Rust doesn't have textual macros like C/C++ and would thus likely also require code generation comparable to the current approach for Java. By providing these rulesets with a canonical, cross-ruleset indicator of runfiles library usage (e.g., someone might use the C++ runfiles library from Rust via FFI, which RunfilesLibraryInfo could handle gracefully), we put them in the best position to limit the fallout of this change to the smallest number of targets necessary.
(my gut feeling is that it's odd to make a configured target do things differently based on whether any of its direct dependencies has a marker provider but let's not make decisions based on gut feelings if we can help)The following perspective has allowed my gut to make peace with this, maybe it helps you too :-)Assume for the moment that Bazel internally merged all repositories into a single one with a dynamic name and thus the canonical repository name we needed to inject would be a whole-build constant. In that case, we would offer this value as e.g. a Java constant provided by the Java runfiles library and, crucially, the UX of that would be the same as it is with the proposal: The configured target would only be able to use the constant if it depends on the Java runfiles library directly due to strict deps. Granted, the value would behave differently, but since the users of the runfiles library aren't expected to do anything with the value other than pass it back to the runfiles library, that shouldn't matter.
FWIW, I think the more general issue this case uncovers is that runfiles is not really a good abstraction because the input to the path resolution is an "apparent runfiles path" whereas everywhere else, paths are resolved from labels. Concretely, you are totally right that :a should not even know about the existence @repo2 but there is currently no way for a binary to ask at runtime "what files does @repo1//b:data contain?". If your argument is that this use case is already broken currently and this change does not make it any more broken, I could get behind that :)Yes, the missing link between labels and runfiles paths already makes runfiles hard to use today. I worked on rules_runfiles (https://github.com/fmeum/rules_runfiles) specifically to prototype a solution to this that along the way also solves the Bzlmod repo mapping problem, but found the UX to be too different from what people are used to to force it on everyone using Bzlmod instead of this proposal.
3. Can we assume that the runfiles library is always used?I don't have data as to whether this is the case (at Google, it's not widely used but then again, we don't have external repositories, either) If not, that's obviously a problem but theoretically, it's possible to extend the runfiles tree to contain paths like $CONTAINING_REPO/$APPARENT_REPO/$REPO_PATH. It would probably blow up its size quite a bit, though.We can't assume that it is, just like many rulesets today rely on the runfiles symlink tree being present and thus break on Windows with symlinks disabled. From my point of view, it makes sense to require rulesets to either a) use a runfiles library or b) compute runfiles path in Starlark and pass them around to tools via different means (arguments, environment variables). By having a canonical API for runfiles lookups, future changes to the layout of runfiles (see e.g. https://github.com/bazelbuild/bazel/pull/14676) can be implemented without breaking rulesets. Since Bzlmod will necessarily require changes, this may be the right time to migrate everyone to an actual API and away from implementation details.I think (b) would give rise to a menagerie of ad-hoc mechanisms, all awful, so if it really needs to be, it should be (a). If your argument is that we can't support runfiles everywhere (because Windows) and that we should therefore mandate the use of a library, it would be a big change that I emotionally like because frankly, runfiles trees are a lot of baggage and have caused a lot of bugs and pain so I'd love to see them go away. However, they are *way* too widely used at Google for us to be able to remove the support code within a reasonable time frame and they are just too convenient for shell and Python so we'll have to pay the price anyway, which means that if we can make them work halfway well in Bazel, it's worth keeping them.In the simple scenario where one is only using runfiles with symlinks enabled *and* all lookups happen in the root module, I would say it's totally fine not to use runfiles libraries. I assume that this covers a large number of use cases (including all of Google) and I think it's important to preserve that UX (at least via explicit opt-in). But for all cases that are more complicated than that, I would like everyone to rely on a common API.
I'm somewhat inclined to think a bit more about runfiles trees, but I'll avoid the temptation because coupling that change to bzlmod would make the success of both less likely.If you are interested, we could start a separate effort/discussion to make runfiles lookups more "Bazely" long-term.
> Wait, then I have grossly misunderstood how bzlmod works: I thought that the canonical name of a module is only a function of its own MODULE.bazel flie and it's only the apparent repository -> canonical repository mapping that depends on the version selection algorithm. IOW, if the MODULE.bazel file contains module(name="foo", version="0.1") the code in that module will have foo-0.1 as its canonical repository name. Am I wrong here?This is indeed incorrect. As Fabian said, the final canonical repository name would be `foo~$VERSION` where $VERSION is the resolved version of `foo`, which may be higher than 0.1. Since we only want one version of `foo` in our codebase, and if someone else requests `foo@0.2`, we will choose version 0.2 per Minimal Version Selection.
--
You received this message because you are subscribed to the Google Groups "bazel-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bazel-dev+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/bazel-dev/CAOu%2B0LVWWj0-6XvMsZrHG%3D2vmqT-BQR8rB5aqdYeJZVcDSaABQ%40mail.gmail.com.
> To make sure I understand correctly: then even though the source code says that it's foo-0.1, the canonical repository name will be foo-0.2 for the 0.1 sources?What do you mean by "the source code"? There is no source code of foo@0.1 anywhere, as we don't even fetch it. We'd only fetch foo@0.2 sources (when, for example, someone asks to load `@foo//something`).
Ah sorry, I misread your comment. You said `module(name="foo",version="0.1")`, not `bazel_dep(name="foo",version="0.1")`.If you're talking about the scenario where the main repo / root module is this foo@0.1, then the name/version of it has no bearing on its own canonical repo name, which is *always* just the empty string. (cf. the label "@//something/in/the/main:repo")
If I understand correctly, historically runfiles for the main repo have been placed in the subdirectory named after the workspace name (declared in the WORKSPACE file). We're not changing that as of yet, and changing it isn't a huge deal anyway since we'd just update the runfiles library. (Obviously this hinges on everyone converting to use the runfiles library as opposed to creating their own runfile paths, but I sort of just took that for granted)
To view this discussion on the web visit https://groups.google.com/d/msgid/bazel-dev/CAOu%2B0LUdrRLoWDks9C61VEp_GYYWkyVSef_%3DpzpDGdf1TgQr_Q%40mail.gmail.com.
I checked transitivePackagesForPackageRootResolution and it seems to reliably contain the information we need. It is cleared only when the analysis cache is cleared, which I assume won't happen during the build.