[Proposal] Locating runfiles with Bzlmod

134 views
Skip to first unread message

Fabian Meumertzheim

unread,
Jul 22, 2022, 10:34:50 AM7/22/22
to bazel-dev
Hi,

I submitted a draft of a proposal on how to locate runfiles with Bzlmod and fix Stardoc along the way (https://github.com/bazelbuild/proposals/pull/269). 

The proposal entails changes to both Bazel itself as well as the runfiles libraries.

I'm looking forward to comments either in this mail thread or on the GitHub PR.

I'm especially interested in the feedback of ruleset maintainers, be it rules shipped with Bazel or external, on Part 4 of the proposal.

Fabian

Lukács T. Berki

unread,
Sep 5, 2022, 11:04:14 AM9/5/22
to Fabian Meumertzheim, bazel-dev, Xudong Yang
I belatedly read this design doc (first day back at work!) and I think the general principle of embedding the information about repository mappings into the runfiles manifest in some form is quite reasonable. I have a few questions, though. I hope y'all have already touched upon all this and I just need pointers to the pertinent conversations :)

1. Does RunfilesLibraryInfo work well? 
AFAICT it is an optimization so that the mapping file is not written if there is no runtime library to read it anyway and so that not the full repository mapping of the whole transitive closure needs to be provided? I think the goal is noble, but it raises all sorts of questions: for example, what if the runfiles library is hidden by an abstraction in the user code (e.g. library R depends on the runfiles library) and then the main binary B depends on the runfiles library abstraction R and an unrelated library X and B passes a label from X to R and then R passes that to the actual runfiles library? Will the repository mapping data for X and its transitive dependencies be in the mapping file?

2. How do indirect dependencies work?
It is possible that the mapping for an indirect dependency is required to resolve runfiles paths properly, for example, in this case:

main repo, a/BUILD:
sh_binary(name="a", data=["@repo1//b:data"])

repo1-0.1, b/BUILD:
filegroup(name="b", data=["@repo2//c:data"])

repo2-0.2, c/BUILD:
filegroup(name="c", data=["c"])

In this case, AFAIU the only file in the runfiles tree will be repo2-0.2/c/c. Is this case handled correctly? It's not obvious to me because the actual file depends on the canonical repository for the apparent repository "repo2" in "repo1-0.1".

3. Can we assume that the runfiles library is always used?
I don't have data as to whether this is the case (at Google, it's not widely used but then again, we don't have external repositories, either) If not, that's obviously a problem but theoretically, it's possible to extend the runfiles tree to contain paths like $CONTAINING_REPO/$APPARENT_REPO/$REPO_PATH. It would probably blow up its size quite a bit, though.


--
You received this message because you are subscribed to the Google Groups "bazel-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bazel-dev+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/bazel-dev/CALR1tuM8heuoy%2Btds3B%3DMwkL1w42zSx7qP9MLKk4AVQO4-1Xjg%40mail.gmail.com.


--
Lukács T. Berki | Software Engineer | lbe...@google.com | 

Google Germany GmbH | Erika-Mann-Str. 33  | 80636 München | Germany | Geschäftsführer: Paul Manicle, Halimah DeLaine Prado | Registergericht und -nummer: Hamburg, HRB 86891

Fabian Meumertzheim

unread,
Sep 5, 2022, 2:47:17 PM9/5/22
to Lukács T. Berki, bazel-dev, Xudong Yang
Thanks for the great questions. I do believe that some of them are new, so I will provide detailed answers inline.

1. Does RunfilesLibraryInfo work well? 
AFAICT it is an optimization so that the mapping file is not written if there is no runtime library to read it anyway and so that not the full repository mapping of the whole transitive closure needs to be provided?
 
While it's true that RunfilesLibraryInfo is an optimization and not a crucial ingredient of the proposal, it does serve a second purpose: It is meant to be used by rulesets to selectively register additional actions needed to provide a constant containing the current canonical repository name. The C++ rules can simply add a local_define, which isn't costly, but the Java rules have to generate a Java file and compile it. Without a concept similar to RunfilesLibraryInfo, the latter would add two actions for every Java target, which may be prohibitive.

I think the goal is noble, but it raises all sorts of questions: for example, what if the runfiles library is hidden by an abstraction in the user code (e.g. library R depends on the runfiles library) and then the main binary B depends on the runfiles library abstraction R and an unrelated library X and B passes a label from X to R and then R passes that to the actual runfiles library? Will the repository mapping data for X and its transitive dependencies be in the mapping file?

All cases of this I have encountered in real life were of one of the following two types:

1. R is a general purpose wrapper around a runfiles library meant to be used by other repositories (e.g. a Scala or Clojure wrapper around the Java runfiles library). In this case, it wouldn't seem out of place to demand R to advertise RunfilesLibraryInfo itself - in fact, by that point, R more or less is a runfiles library.
2. R is a convenience wrapper around a runfiles library meant to be used only in its own repository (e.g. it implements a custom lookup procedure for the runfiles manifest and/or stores its canonical repository name so that Rlocation calls don't have to specify it). In this case, R wouldn't have to be marked as a runfiles library since all users of R are located in the same repository and thus share a repository mapping with it. This mapping will be included in the manifest since R itself is tracked as a user of a runfiles library.

The RunfilesLibraryInfo approach would only become somewhat confusing for users that wrap a runfiles library and pass labels from multiple different repositories to it - but I wonder whether there is any abstraction of this kind that shouldn't be considered a full-fledged runfiles library, for which it would again be reasonable to demand that it advertises RunfilesLibraryInfo.

2. How do indirect dependencies work?
It is possible that the mapping for an indirect dependency is required to resolve runfiles paths properly, for example, in this case:

main repo, a/BUILD:
sh_binary(name="a", data=["@repo1//b:data"])

repo1-0.1, b/BUILD:
filegroup(name="b", data=["@repo2//c:data"])

repo2-0.2, c/BUILD:
filegroup(name="c", data=["c"])

In this case, AFAIU the only file in the runfiles tree will be repo2-0.2/c/c. Is this case handled correctly? It's not obvious to me because the actual file depends on the canonical repository for the apparent repository "repo2" in "repo1-0.1".

This is indeed a tricky situation that I don't think has any good answer other than labeling this a "runfiles strict deps violation" and discouraging it. With Bzlmod, there purposefully is no way to refer to repo2-0.2/c from the main repo without declaring a dependency on and thus a repo mapping entry for the repo2 module (assuming for simplicity that all repos in the example are actually Bazel modules). Even with a full repository mapping not filtered by RunfilesLibraryInfo, looking up repo2-0.2/c from the main repo would require specifying the apparent repo name "repo2" relative to the canonical repo name "repo1-0.1" - but how would the main repo target //:a get access to the dynamic canonical repo name "repo1-0.1" in the first place?

I think that the only solution that fits the spirit of Bzlmod's strict deps approach thus is to make repo2-0.2 a proper dependency of the main repo and data depend on @repo2//:c directly from //:a.

3. Can we assume that the runfiles library is always used?
I don't have data as to whether this is the case (at Google, it's not widely used but then again, we don't have external repositories, either) If not, that's obviously a problem but theoretically, it's possible to extend the runfiles tree to contain paths like $CONTAINING_REPO/$APPARENT_REPO/$REPO_PATH. It would probably blow up its size quite a bit, though.

We can't assume that it is, just like many rulesets today rely on the runfiles symlink tree being present and thus break on Windows with symlinks disabled. From my point of view, it makes sense to require rulesets to either a) use a runfiles library or b) compute runfiles path in Starlark and pass them around to tools via different means (arguments, environment variables). By having a canonical API for runfiles lookups, future changes to the layout of runfiles (see e.g. https://github.com/bazelbuild/bazel/pull/14676) can be implemented without breaking rulesets. Since Bzlmod will necessarily require changes, this may be the right time to migrate everyone to an actual API and away from implementation details.

Lukács T. Berki

unread,
Sep 6, 2022, 3:22:14 AM9/6/22
to Fabian Meumertzheim, bazel-dev, Xudong Yang
On Mon, Sep 5, 2022 at 8:47 PM Fabian Meumertzheim <fab...@meumertzhe.im> wrote:
Thanks for the great questions. I do believe that some of them are new, so I will provide detailed answers inline.

1. Does RunfilesLibraryInfo work well? 
AFAICT it is an optimization so that the mapping file is not written if there is no runtime library to read it anyway and so that not the full repository mapping of the whole transitive closure needs to be provided?
 
While it's true that RunfilesLibraryInfo is an optimization and not a crucial ingredient of the proposal, it does serve a second purpose: It is meant to be used by rulesets to selectively register additional actions needed to provide a constant containing the current canonical repository name. The C++ rules can simply add a local_define, which isn't costly, but the Java rules have to generate a Java file and compile it. Without a concept similar to RunfilesLibraryInfo, the latter would add two actions for every Java target, which may be prohibitive.

I think the goal is noble, but it raises all sorts of questions: for example, what if the runfiles library is hidden by an abstraction in the user code (e.g. library R depends on the runfiles library) and then the main binary B depends on the runfiles library abstraction R and an unrelated library X and B passes a label from X to R and then R passes that to the actual runfiles library? Will the repository mapping data for X and its transitive dependencies be in the mapping file?

All cases of this I have encountered in real life were of one of the following two types:

1. R is a general purpose wrapper around a runfiles library meant to be used by other repositories (e.g. a Scala or Clojure wrapper around the Java runfiles library). In this case, it wouldn't seem out of place to demand R to advertise RunfilesLibraryInfo itself - in fact, by that point, R more or less is a runfiles library.
2. R is a convenience wrapper around a runfiles library meant to be used only in its own repository (e.g. it implements a custom lookup procedure for the runfiles manifest and/or stores its canonical repository name so that Rlocation calls don't have to specify it). In this case, R wouldn't have to be marked as a runfiles library since all users of R are located in the same repository and thus share a repository mapping with it. This mapping will be included in the manifest since R itself is tracked as a user of a runfiles library.

The RunfilesLibraryInfo approach would only become somewhat confusing for users that wrap a runfiles library and pass labels from multiple different repositories to it - but I wonder whether there is any abstraction of this kind that shouldn't be considered a full-fledged runfiles library, for which it would again be reasonable to demand that it advertises RunfilesLibraryInfo.
Do I understand correctly that your main argument is that the caller of the runfiles library needs to pass in the canonical repository name in order to do the runfiles resolution and RunfilesLibraryInfo is the only acceptably convenient way to inject that without undue pain? Unfortunately, as far as I can tell, that cannot be determined in a language-agnostic way because it has to be done in one of the following ways:
  1. Walking the call stack at runtime
  2. Declaring a symbol that always has the right value (the current local_define / inlined static constant proposal)
  3. Asking people to hard-wire it in the code (it's not as bad as it appears, because AFAIU the canonical repository name is already encoded in the MODULE.bazel file of the repository)
It feels weird to add this feature to Bazel just for the sake of Java. Have you thought about how this could work for Python (or Rust, etc.)? If it's absolutely necessary, sure, go ahead, but if it's just a Java thing, let's find an approach that doesn't require widening the interface of Bazel just for a single programming language. 

(my gut feeling is that it's odd to make a configured target do things differently based on whether any of its direct dependencies has a marker provider but let's not make decisions based on gut feelings if we can help)


 

2. How do indirect dependencies work?
It is possible that the mapping for an indirect dependency is required to resolve runfiles paths properly, for example, in this case:

main repo, a/BUILD:
sh_binary(name="a", data=["@repo1//b:data"])

repo1-0.1, b/BUILD:
filegroup(name="b", data=["@repo2//c:data"])

repo2-0.2, c/BUILD:
filegroup(name="c", data=["c"])

In this case, AFAIU the only file in the runfiles tree will be repo2-0.2/c/c. Is this case handled correctly? It's not obvious to me because the actual file depends on the canonical repository for the apparent repository "repo2" in "repo1-0.1".

This is indeed a tricky situation that I don't think has any good answer other than labeling this a "runfiles strict deps violation" and discouraging it. With Bzlmod, there purposefully is no way to refer to repo2-0.2/c from the main repo without declaring a dependency on and thus a repo mapping entry for the repo2 module (assuming for simplicity that all repos in the example are actually Bazel modules). Even with a full repository mapping not filtered by RunfilesLibraryInfo, looking up repo2-0.2/c from the main repo would require specifying the apparent repo name "repo2" relative to the canonical repo name "repo1-0.1" - but how would the main repo target //:a get access to the dynamic canonical repo name "repo1-0.1" in the first place?

I think that the only solution that fits the spirit of Bzlmod's strict deps approach thus is to make repo2-0.2 a proper dependency of the main repo and data depend on @repo2//:c directly from //:a.
FWIW, I think the more general issue this case uncovers is that runfiles is not really a good abstraction because the input to the path resolution is an "apparent runfiles path" whereas everywhere else, paths are resolved from labels. Concretely, you are totally right that :a should not even know about the existence @repo2 but there is currently no way for a binary to ask at runtime "what files does @repo1//b:data contain?". If your argument is that this use case is already broken currently and this change does not make it any more broken, I could get behind that :)


3. Can we assume that the runfiles library is always used?
I don't have data as to whether this is the case (at Google, it's not widely used but then again, we don't have external repositories, either) If not, that's obviously a problem but theoretically, it's possible to extend the runfiles tree to contain paths like $CONTAINING_REPO/$APPARENT_REPO/$REPO_PATH. It would probably blow up its size quite a bit, though.

We can't assume that it is, just like many rulesets today rely on the runfiles symlink tree being present and thus break on Windows with symlinks disabled. From my point of view, it makes sense to require rulesets to either a) use a runfiles library or b) compute runfiles path in Starlark and pass them around to tools via different means (arguments, environment variables). By having a canonical API for runfiles lookups, future changes to the layout of runfiles (see e.g. https://github.com/bazelbuild/bazel/pull/14676) can be implemented without breaking rulesets. Since Bzlmod will necessarily require changes, this may be the right time to migrate everyone to an actual API and away from implementation details.
I think (b) would give rise to a menagerie of ad-hoc mechanisms, all awful, so if it really needs to be, it should be (a). If your argument is that we can't support runfiles everywhere (because Windows) and that we should therefore mandate the use of a library, it would be a big change that I emotionally like because frankly, runfiles trees are a lot of baggage and have caused a lot of bugs and pain so I'd love to see them go away. However, they are *way* too widely used at Google for us to be able to remove the support code within a reasonable time frame and they are just too convenient for shell and Python so we'll have to pay the price anyway, which means that if we can make them work halfway well in Bazel, it's worth keeping them.

I'm somewhat inclined to think a bit more about runfiles trees, but I'll avoid the temptation because coupling that change to bzlmod would make the success of both less likely.




On Fri, Jul 22, 2022 at 4:34 PM Fabian Meumertzheim <fab...@meumertzhe.im> wrote:
Hi,

I submitted a draft of a proposal on how to locate runfiles with Bzlmod and fix Stardoc along the way (https://github.com/bazelbuild/proposals/pull/269). 

The proposal entails changes to both Bazel itself as well as the runfiles libraries.

I'm looking forward to comments either in this mail thread or on the GitHub PR.

I'm especially interested in the feedback of ruleset maintainers, be it rules shipped with Bazel or external, on Part 4 of the proposal.

Fabian

--
You received this message because you are subscribed to the Google Groups "bazel-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bazel-dev+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/bazel-dev/CALR1tuM8heuoy%2Btds3B%3DMwkL1w42zSx7qP9MLKk4AVQO4-1Xjg%40mail.gmail.com.


--
Lukács T. Berki | Software Engineer | lbe...@google.com | 

Google Germany GmbH | Erika-Mann-Str. 33  | 80636 München | Germany | Geschäftsführer: Paul Manicle, Halimah DeLaine Prado | Registergericht und -nummer: Hamburg, HRB 86891

Fabian Meumertzheim

unread,
Sep 6, 2022, 4:52:12 AM9/6/22
to Lukács T. Berki, bazel-dev, Xudong Yang
On Tue, Sep 6, 2022 at 9:22 AM 'Lukács T. Berki' via bazel-dev <baze...@googlegroups.com> wrote:
Do I understand correctly that your main argument is that the caller of the runfiles library needs to pass in the canonical repository name in order to do the runfiles resolution and RunfilesLibraryInfo is the only acceptably convenient way to inject that without undue pain?

Yes, that is the most important part. although I would say that I find the trimmed repository mapping manifests rather important (but not strictly necessary) as well.
 
Unfortunately, as far as I can tell, that cannot be determined in a language-agnostic way because it has to be done in one of the following ways:
  1. Walking the call stack at runtime
  2. Declaring a symbol that always has the right value (the current local_define / inlined static constant proposal)
  3. Asking people to hard-wire it in the code (it's not as bad as it appears, because AFAIU the canonical repository name is already encoded in the MODULE.bazel file of the repository)
It's crucial to note that 3. can't work: The canonical repository name is not a function of the information in MODULE.bazel, but rather depends on the outcome of the Minimal Version Selection algorithm. A module could of course duplicate its "version" parameter of the "module" directive into a Starlark constant and inject that into code that looks up runfiles, but even that would break if the root module declares an override for the module. Since overrides use different canonical repository names (not including the version) and a module can't determine whether it has been overridden, the canonical repository name must be assumed to be fully dynamic. This essentially leaves only option 1 and 2.
 
It feels weird to add this feature to Bazel just for the sake of Java. Have you thought about how this could work for Python (or Rust, etc.)? If it's absolutely necessary, sure, go ahead, but if it's just a Java thing, let's find an approach that doesn't require widening the interface of Bazel just for a single programming language. 
 
IMO it's the other way round: This feature is mostly there for the sake of all the other languages out there that I don't even know of and that may have to employ any number of arcane (and costly) hacks to embed the canonical repository name into their compiled representation. For example, as far as I know, Rust doesn't have textual macros like C/C++ and would thus likely also require code generation comparable to the current approach for Java. By providing these rulesets with a canonical, cross-ruleset indicator of runfiles library usage (e.g., someone might use the C++ runfiles library from Rust via FFI, which RunfilesLibraryInfo could handle gracefully), we put them in the best position to limit the fallout of this change to the smallest number of targets necessary.


(my gut feeling is that it's odd to make a configured target do things differently based on whether any of its direct dependencies has a marker provider but let's not make decisions based on gut feelings if we can help)
The following perspective has allowed my gut to make peace with this, maybe it helps you too :-)
Assume for the moment that Bazel internally merged all repositories into a single one with a dynamic name and thus the canonical repository name we needed to inject would be a whole-build constant. In that case, we would offer this value as e.g. a Java constant provided by the Java runfiles library and, crucially, the UX of that would be the same as it is with the proposal: The configured target would only be able to use the constant if it depends on the Java runfiles library directly due to strict deps. Granted, the value would behave differently, but since the users of the runfiles library aren't expected to do anything with the value other than pass it back to the runfiles library, that shouldn't matter.

FWIW, I think the more general issue this case uncovers is that runfiles is not really a good abstraction because the input to the path resolution is an "apparent runfiles path" whereas everywhere else, paths are resolved from labels. Concretely, you are totally right that :a should not even know about the existence @repo2 but there is currently no way for a binary to ask at runtime "what files does @repo1//b:data contain?". If your argument is that this use case is already broken currently and this change does not make it any more broken, I could get behind that :)

Yes, the missing link between labels and runfiles paths already makes runfiles hard to use today. I worked on rules_runfiles (https://github.com/fmeum/rules_runfiles) specifically to prototype a solution to this that along the way also solves the Bzlmod repo mapping problem, but found the UX to be too different from what people are used to to force it on everyone using Bzlmod instead of this proposal. 



3. Can we assume that the runfiles library is always used?
I don't have data as to whether this is the case (at Google, it's not widely used but then again, we don't have external repositories, either) If not, that's obviously a problem but theoretically, it's possible to extend the runfiles tree to contain paths like $CONTAINING_REPO/$APPARENT_REPO/$REPO_PATH. It would probably blow up its size quite a bit, though.

We can't assume that it is, just like many rulesets today rely on the runfiles symlink tree being present and thus break on Windows with symlinks disabled. From my point of view, it makes sense to require rulesets to either a) use a runfiles library or b) compute runfiles path in Starlark and pass them around to tools via different means (arguments, environment variables). By having a canonical API for runfiles lookups, future changes to the layout of runfiles (see e.g. https://github.com/bazelbuild/bazel/pull/14676) can be implemented without breaking rulesets. Since Bzlmod will necessarily require changes, this may be the right time to migrate everyone to an actual API and away from implementation details.
I think (b) would give rise to a menagerie of ad-hoc mechanisms, all awful, so if it really needs to be, it should be (a). If your argument is that we can't support runfiles everywhere (because Windows) and that we should therefore mandate the use of a library, it would be a big change that I emotionally like because frankly, runfiles trees are a lot of baggage and have caused a lot of bugs and pain so I'd love to see them go away. However, they are *way* too widely used at Google for us to be able to remove the support code within a reasonable time frame and they are just too convenient for shell and Python so we'll have to pay the price anyway, which means that if we can make them work halfway well in Bazel, it's worth keeping them.

In the simple scenario where one is only using runfiles with symlinks enabled *and* all lookups happen in the root module, I would say it's totally fine not to use runfiles libraries. I assume that this covers a large number of use cases (including all of Google) and I think it's important to preserve that UX (at least via explicit opt-in). But for all cases that are more complicated than that, I would like everyone to rely on a common API. 

I'm somewhat inclined to think a bit more about runfiles trees, but I'll avoid the temptation because coupling that change to bzlmod would make the success of both less likely.
 
If you are interested, we could start a separate effort/discussion to make runfiles lookups more "Bazely" long-term. 

Lukács T. Berki

unread,
Sep 6, 2022, 11:27:36 AM9/6/22
to Fabian Meumertzheim, bazel-dev, Xudong Yang
On Tue, Sep 6, 2022 at 10:52 AM Fabian Meumertzheim <fab...@meumertzhe.im> wrote:


On Tue, Sep 6, 2022 at 9:22 AM 'Lukács T. Berki' via bazel-dev <baze...@googlegroups.com> wrote:
Do I understand correctly that your main argument is that the caller of the runfiles library needs to pass in the canonical repository name in order to do the runfiles resolution and RunfilesLibraryInfo is the only acceptably convenient way to inject that without undue pain?

Yes, that is the most important part. although I would say that I find the trimmed repository mapping manifests rather important (but not strictly necessary) as well.
 
Unfortunately, as far as I can tell, that cannot be determined in a language-agnostic way because it has to be done in one of the following ways:
  1. Walking the call stack at runtime
  2. Declaring a symbol that always has the right value (the current local_define / inlined static constant proposal)
  3. Asking people to hard-wire it in the code (it's not as bad as it appears, because AFAIU the canonical repository name is already encoded in the MODULE.bazel file of the repository)
It's crucial to note that 3. can't work: The canonical repository name is not a function of the information in MODULE.bazel, but rather depends on the outcome of the Minimal Version Selection algorithm. A module could of course duplicate its "version" parameter of the "module" directive into a Starlark constant and inject that into code that looks up runfiles, but even that would break if the root module declares an override for the module. Since overrides use different canonical repository names (not including the version) and a module can't determine whether it has been overridden, the canonical repository name must be assumed to be fully dynamic. This essentially leaves only option 1 and 2.
Wait, then I have grossly misunderstood how bzlmod works: I thought that the canonical name of a module is only a function of its own MODULE.bazel flie and it's only the apparent repository -> canonical repository mapping that depends on the version selection algorithm. IOW, if the MODULE.bazel file contains module(name="foo", version="0.1") the code in that module will have foo-0.1 as its canonical repository name. Am I wrong here?

This is crucial because then hard-coding the repository name in runfiles queries (either by just repeating "foo-0.1" or exposing some symbol in Bazel that means "the canonical name of the repository I am in") becomes possible.

 
It feels weird to add this feature to Bazel just for the sake of Java. Have you thought about how this could work for Python (or Rust, etc.)? If it's absolutely necessary, sure, go ahead, but if it's just a Java thing, let's find an approach that doesn't require widening the interface of Bazel just for a single programming language. 
 
IMO it's the other way round: This feature is mostly there for the sake of all the other languages out there that I don't even know of and that may have to employ any number of arcane (and costly) hacks to embed the canonical repository name into their compiled representation. For example, as far as I know, Rust doesn't have textual macros like C/C++ and would thus likely also require code generation comparable to the current approach for Java. By providing these rulesets with a canonical, cross-ruleset indicator of runfiles library usage (e.g., someone might use the C++ runfiles library from Rust via FFI, which RunfilesLibraryInfo could handle gracefully), we put them in the best position to limit the fallout of this change to the smallest number of targets necessary.
I'm confused. If a language can't put the canonical repository name into their compiled representation, how can they construct the proper calls to the runfiles library? I thought the idea was that code in repository foo-0.1 will always pass the string "foo-0.1" as the canonical repository name to the runfiles library.



(my gut feeling is that it's odd to make a configured target do things differently based on whether any of its direct dependencies has a marker provider but let's not make decisions based on gut feelings if we can help)
The following perspective has allowed my gut to make peace with this, maybe it helps you too :-)
Assume for the moment that Bazel internally merged all repositories into a single one with a dynamic name and thus the canonical repository name we needed to inject would be a whole-build constant. In that case, we would offer this value as e.g. a Java constant provided by the Java runfiles library and, crucially, the UX of that would be the same as it is with the proposal: The configured target would only be able to use the constant if it depends on the Java runfiles library directly due to strict deps. Granted, the value would behave differently, but since the users of the runfiles library aren't expected to do anything with the value other than pass it back to the runfiles library, that shouldn't matter.
FWIW, "aren't expected to do anything with the value other than pass it back" is a good argument to try not to have this value at all, at least to the extent practicable. It looks like that local_define can paper over that with C++, Java can use (in theory, again) call stack walking and then that's a strictly superior user experience than having to pass a mysterious symbol to the runfiles library.


FWIW, I think the more general issue this case uncovers is that runfiles is not really a good abstraction because the input to the path resolution is an "apparent runfiles path" whereas everywhere else, paths are resolved from labels. Concretely, you are totally right that :a should not even know about the existence @repo2 but there is currently no way for a binary to ask at runtime "what files does @repo1//b:data contain?". If your argument is that this use case is already broken currently and this change does not make it any more broken, I could get behind that :)

Yes, the missing link between labels and runfiles paths already makes runfiles hard to use today. I worked on rules_runfiles (https://github.com/fmeum/rules_runfiles) specifically to prototype a solution to this that along the way also solves the Bzlmod repo mapping problem, but found the UX to be too different from what people are used to to force it on everyone using Bzlmod instead of this proposal. 
Agreed, although I could also be sold on ripping off this particular band-aid with the migration to bzlmod; runfiles are painful enough already.




3. Can we assume that the runfiles library is always used?
I don't have data as to whether this is the case (at Google, it's not widely used but then again, we don't have external repositories, either) If not, that's obviously a problem but theoretically, it's possible to extend the runfiles tree to contain paths like $CONTAINING_REPO/$APPARENT_REPO/$REPO_PATH. It would probably blow up its size quite a bit, though.

We can't assume that it is, just like many rulesets today rely on the runfiles symlink tree being present and thus break on Windows with symlinks disabled. From my point of view, it makes sense to require rulesets to either a) use a runfiles library or b) compute runfiles path in Starlark and pass them around to tools via different means (arguments, environment variables). By having a canonical API for runfiles lookups, future changes to the layout of runfiles (see e.g. https://github.com/bazelbuild/bazel/pull/14676) can be implemented without breaking rulesets. Since Bzlmod will necessarily require changes, this may be the right time to migrate everyone to an actual API and away from implementation details.
I think (b) would give rise to a menagerie of ad-hoc mechanisms, all awful, so if it really needs to be, it should be (a). If your argument is that we can't support runfiles everywhere (because Windows) and that we should therefore mandate the use of a library, it would be a big change that I emotionally like because frankly, runfiles trees are a lot of baggage and have caused a lot of bugs and pain so I'd love to see them go away. However, they are *way* too widely used at Google for us to be able to remove the support code within a reasonable time frame and they are just too convenient for shell and Python so we'll have to pay the price anyway, which means that if we can make them work halfway well in Bazel, it's worth keeping them.

In the simple scenario where one is only using runfiles with symlinks enabled *and* all lookups happen in the root module, I would say it's totally fine not to use runfiles libraries. I assume that this covers a large number of use cases (including all of Google) and I think it's important to preserve that UX (at least via explicit opt-in). But for all cases that are more complicated than that, I would like everyone to rely on a common API. 
Have you checked whether Python and shell rules will work with this particular change in the "simple" scenario? They rely on a particular layout of the runfiles tree for including source code from dependencies (Python, by importing, shell, by just executing them) and it looks like they may be affected? 


I'm somewhat inclined to think a bit more about runfiles trees, but I'll avoid the temptation because coupling that change to bzlmod would make the success of both less likely.
 
If you are interested, we could start a separate effort/discussion to make runfiles lookups more "Bazely" long-term. 
With a low priority? Given the inertia at Google, I don't see a way to significantly change how runfiles work there and then the code supporting it must stay in Bazel. 

Xudong Yang

unread,
Sep 6, 2022, 11:36:18 AM9/6/22
to Lukács T. Berki, Fabian Meumertzheim, bazel-dev
> Wait, then I have grossly misunderstood how bzlmod works: I thought that the canonical name of a module is only a function of its own MODULE.bazel flie and it's only the apparent repository -> canonical repository mapping that depends on the version selection algorithm. IOW, if the MODULE.bazel file contains module(name="foo", version="0.1") the code in that module will have foo-0.1 as its canonical repository name. Am I wrong here?

This is indeed incorrect. As Fabian said, the final canonical repository name would be `foo~$VERSION` where $VERSION is the resolved version of `foo`, which may be higher than 0.1. Since we only want one version of `foo` in our codebase, and if someone else requests `foo@0.2`, we will choose version 0.2 per Minimal Version Selection.

"  YÁNG Xùdōng (杨旭东)
#  SURNAME Givenname
=  Software Engineer
@  Google Munich

Lukács T. Berki

unread,
Sep 6, 2022, 11:40:24 AM9/6/22
to Xudong Yang, Fabian Meumertzheim, bazel-dev
On Tue, Sep 6, 2022 at 5:36 PM Xudong Yang <w...@google.com> wrote:
> Wait, then I have grossly misunderstood how bzlmod works: I thought that the canonical name of a module is only a function of its own MODULE.bazel flie and it's only the apparent repository -> canonical repository mapping that depends on the version selection algorithm. IOW, if the MODULE.bazel file contains module(name="foo", version="0.1") the code in that module will have foo-0.1 as its canonical repository name. Am I wrong here?

This is indeed incorrect. As Fabian said, the final canonical repository name would be `foo~$VERSION` where $VERSION is the resolved version of `foo`, which may be higher than 0.1. Since we only want one version of `foo` in our codebase, and if someone else requests `foo@0.2`, we will choose version 0.2 per Minimal Version Selection.
To make sure I understand correctly: then even though the source code says that it's foo-0.1, the canonical repository name will be foo-0.2 for the 0.1 sources?

Xudong Yang

unread,
Sep 6, 2022, 11:44:00 AM9/6/22
to Lukács T. Berki, Fabian Meumertzheim, bazel-dev
> To make sure I understand correctly: then even though the source code says that it's foo-0.1, the canonical repository name will be foo-0.2 for the 0.1 sources?

What do you mean by "the source code"? There is no source code of foo@0.1 anywhere, as we don't even fetch it. We'd only fetch foo@0.2 sources (when, for example, someone asks to load `@foo//something`).

"  YÁNG Xùdōng (杨旭东)
#  SURNAME Givenname
=  Software Engineer
@  Google Munich

--
You received this message because you are subscribed to the Google Groups "bazel-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bazel-dev+...@googlegroups.com.

Lukács T. Berki

unread,
Sep 6, 2022, 11:46:54 AM9/6/22
to Xudong Yang, Fabian Meumertzheim, bazel-dev
On Tue, Sep 6, 2022 at 5:44 PM Xudong Yang <w...@google.com> wrote:
> To make sure I understand correctly: then even though the source code says that it's foo-0.1, the canonical repository name will be foo-0.2 for the 0.1 sources?

What do you mean by "the source code"? There is no source code of foo@0.1 anywhere, as we don't even fetch it. We'd only fetch foo@0.2 sources (when, for example, someone asks to load `@foo//something`).
Ok, then it looks like my initial assessment was correct. Let me try again. Maybe I worded it wrong.

Is it true that when a single module says module(name="foo", version="0.1") the canonical repository name of that module will always be "foo-0.1"? It looks like both you and Fabian are saying that if a module requests "foo-0.1" as a dependency, it might end up being "foo-0.2" instead, which is perfectly reasonable, but doesn't affect this use case because the RunfilesLibraryInfo symbol is required to know the name of the *current* module and not its dependencies.

Xudong Yang

unread,
Sep 6, 2022, 11:57:25 AM9/6/22
to Lukács T. Berki, Fabian Meumertzheim, bazel-dev
Ah sorry, I misread your comment. You said `module(name="foo",version="0.1")`, not `bazel_dep(name="foo",version="0.1")`.

If you're talking about the scenario where the main repo / root module is this foo@0.1, then the name/version of it has no bearing on its own canonical repo name, which is *always* just the empty string. (cf. the label "@//something/in/the/main:repo")

If I understand correctly, historically runfiles for the main repo have been placed in the subdirectory named after the workspace name (declared in the WORKSPACE file). We're not changing that as of yet, and changing it isn't a huge deal anyway since we'd just update the runfiles library. (Obviously this hinges on everyone converting to use the runfiles library as opposed to creating their own runfile paths, but I sort of just took that for granted)


"  YÁNG Xùdōng (杨旭东)
#  SURNAME Givenname
=  Software Engineer
@  Google Munich

Lukács T. Berki

unread,
Sep 7, 2022, 4:46:55 AM9/7/22
to Xudong Yang, Fabian Meumertzheim, bazel-dev
On Tue, Sep 6, 2022 at 5:57 PM Xudong Yang <w...@google.com> wrote:
Ah sorry, I misread your comment. You said `module(name="foo",version="0.1")`, not `bazel_dep(name="foo",version="0.1")`.

If you're talking about the scenario where the main repo / root module is this foo@0.1, then the name/version of it has no bearing on its own canonical repo name, which is *always* just the empty string. (cf. the label "@//something/in/the/main:repo")
I'm talking about every repo in the build, transitive dependencies included (I in fact didn't think about the main repo, it's in fact a special case) 

If I understand correctly, historically runfiles for the main repo have been placed in the subdirectory named after the workspace name (declared in the WORKSPACE file). We're not changing that as of yet, and changing it isn't a huge deal anyway since we'd just update the runfiles library. (Obviously this hinges on everyone converting to use the runfiles library as opposed to creating their own runfile paths, but I sort of just took that for granted) 
Weelll, this implies that the layout of the runfiles directory is an implementation detail. I would love that to be the case, but I do know for a fact that the shell and Python rules provided by Google directly access the runfiles tree to load the code in dependencies. So at least those two things would need to be checked before changing anything in runfiles. 

Lukács T. Berki

unread,
Sep 7, 2022, 9:22:17 AM9/7/22
to Xudong Yang, Fabian Meumertzheim, bazel-dev
A summary of our video call.

I insisted that if we want to get bzlmod out of experimental state, it's necessary that all rule sets endorsed by the Bazel team (in practice, this means those that are or were implemented in Java) work reasonably well with it. I'll be flexible on the "reasonably" part, but at the very least I'd like binaries spanning multiple repositories to work as well as without bzlmod. I know of two cases where this is doubtful:
  1. Python and its import semantics
  2. Shell and the convention to just execute scripts based on their runfiles paths
If they don't work at the moment, that's also fine, then bzlmod doesn't make the situation worse.

Fabian and Xudong convinced me that it's not trivial for Java code to know at runtime what repository it is built from; I thought it was doable by embedding a .class -> canonical repository name in the .jar files and then using Thread.getStackTrace()[0].getClassName() but they disagreed. I said that adding an extra action for each Java library is too much overhead and recommended tweaking JavaBuilder to make use of static final inlining instead of doing it in separate actions.

I also insisted that we try going without RunfilesLibraryInfo; Fabian did have a point that every use case I could think of can be covered by adding RunfilesLibraryInfo to just the right libraries, but that's manual labor and as such, error-prone. My line of thinking is that Bazel should be correct first, simple second and fast third and thus until it's proven that this symbol is necessary, let's try to live without it. I hope that we can find a way to put only those repo mappings into the runfiles repo mapping that is actually necessary for resolving the runfiles; Fabian and Xudong are right that if we need to put every apparent repository for every repository in the transitive closure of binary, that's probably a gross overestimate and would harm incrementality.

Fabian also lamented the difficulty of getting the set of transitive packages for a binary (for the purposes of computing the runfiles repo mapping). I pointed him towards RuleConfiguredTargetValue.transitivePackagesForPackageRootResolution ; since then, I checked and it looks like it's indeed maintained in Bazel.

So, TODO:
  1. Verify whether our Python and shell rules work with bzlmod. If they don't, fix them.
  2. Check whether either the stack trace method or the JavaBuilder method works for figuring out the canonical repository a particular class comes from at runtime in Java (during the meeting, they convinced me that getStackTrace() does not work, but its Javadoc makes me think that it does)
  3. Figure out a way to minimize the repo mappings in runfiles without RunfilesLibraryInfo
Xudong, Fabian, is this a fair summary?

4. transitivePackagesForPackageRootResolution (seems like we do store it)


Xudong Yang

unread,
Sep 7, 2022, 9:49:30 AM9/7/22
to Lukács T. Berki, Xudong Yang, Fabian Meumertzheim, bazel-dev
> Verify whether our Python and shell rules work with bzlmod. If they don't, fix them.
They do work; Python imports don't contain the repo name whatsoever (apparent or canonical), so that part's fine. And shell `source()` calls should already be using rlocation due to Windows; and even if they're not, they can be converted to use rlocation.


Fabian Meumertzheim

unread,
Sep 7, 2022, 10:15:52 AM9/7/22
to Lukács T. Berki, Xudong Yang, bazel-dev
Thanks for the summary, I think it's accurate. My additions:

2. While a global class name -> repo name mapping could theoretically work, it would necessarily be imperfect: With custom class loaders, it is quite possible that different repositories contribute the same class. This mapping would also break with shading unless the shading process is aware of the mapping. JavaBuilder should work fairly reliably though - it's mostly a more efficient implementation of the existing Starlark trickery. I will investigate this further.
3. The additional pruning afforded by RunfilesLibraryInfo converges to 0 as more and more targets use runfiles - that by itself makes me think that if we don't need RunfilesLibraryInfo to guard additional actions (see 2), we don't need it at all. Even without it, we at least have the following property: The repo mapping manifest for a target T only changes when the apparent repo name that a repo in the transitive closure of T assigns to a repo contributing runfiles to T changes. RunfilesLibraryInfo would help only if there were a very popular target with runfiles (e.g. a dynamic language's standard library that references data files) that is dependend on by lots of targets that don't use runfiles themselves - hard to quantify and most likely unnecessary. I will update the proposal.
4. I will have to check whether this is reliably non-null at the places where we actually use it - it is subject to some Skyframe "cleaning" that I don't grasp.

Fabian Meumertzheim

unread,
Sep 7, 2022, 10:59:49 AM9/7/22
to Lukács T. Berki, Xudong Yang, bazel-dev
I checked transitivePackagesForPackageRootResolution and it seems to reliably contain the information we need. It is cleared only when the analysis cache is cleared, which I assume won't happen during the build.

@Lukács T. Berki Could you provide some hints for how to funnel that information into e.g. RunfilesSupport?

Lukács T. Berki

unread,
Sep 14, 2022, 3:21:30 AM9/14/22
to Fabian Meumertzheim, Xudong Yang, bazel-dev, Nathan Harmata, Ivo Ristovski List
+Nathan (for historical context on transitivePackagesForPackageRootResolution) and Ivo (due to increasing the power of RuleContext and its interplay with Starlarkification)

On Wed, Sep 7, 2022 at 4:59 PM Fabian Meumertzheim <fab...@meumertzhe.im> wrote:
I checked transitivePackagesForPackageRootResolution and it seems to reliably contain the information we need. It is cleared only when the analysis cache is cleared, which I assume won't happen during the build.
Yay! Adding Nathan in case he thinks it's a bad idea to rely on transitivePackagesForPackageRootResolution, but I don't expect any trouble; it looks like the ability to not track it was added in 9e1d285e as a Google-internal optimization and we conveniently never need that optimization and have external repositories at the same time.

So y'all have decided to go with the simplest "everything in the transitive closure" approach?

What I would do is simple workmanlike plumbing: RunfilesSupport.create() has a reference to RuleContext and where RuleContext is created (in ConfiguredTargetFactory.createAspect() and createRule()), transitivePackagesForRootResolution is available a few stack frames up: either in AspectFunction or ConfiguredTargetFunction (there are few more stack frames in between in the latter case). So just plumb the nested set to RuleContext.Builder, store it in RuleContext and Bob's your uncle.

I would be more worried about this approach is we had more rules implemented in Java because the implementation of rules shouldn't need to know the set of transitive packages, but since we are rapidly moving away from them, I don't mind too much. Give the new method a scary name ("transitivePackagesForRepositoryMappingInRunfilesManifests"), a polite but firm comment and you're good.

Also, don't forget to rename transitivePackagesForPackageRootResolution since then it won't be for package root resolution anymore.
Reply all
Reply to author
Forward
0 new messages