Challenges with external dependencies


Andrew Suffield

Jul 14, 2020, 3:51:21 PM
to external-deps
Caveat: these are my problems. They happen to be fairly widespread, but they aren't the only problems in the world, they're just the ones I've fallen over at scale.

  1. Cross-repo version management. Various forms of this exist, notably https://github.com/bazelbuild/bazel-federation. None of them are as polished as I would like, and there's a collection of interesting problems around "how to get users to update to newer versions?" and "how to test that a new release is compatible with existing users?" which sound an awful lot like wanting a scalable, git-friendly variation on rosie and TAP. It's also a fairly manual process to update dependencies in these things, and we could use a better solution here (we might have an intern trying to make progress on that last puzzle at the moment).
  2. Nontrivial logic in WORKSPACE routinely falls foul of the byzantine, not-precisely-starlark language used here - the order-sensitivity is confounding to tools like buildozer, and the top-level chunking isn't powerful enough to allow the creation of interesting macros. An example problem here would be "how do I selectively define repos based on platform?", and the available solutions today are to either rewrite every repository_rule to include this feature, or to have a repository_rule which generates a .bzl file with another macro in, and then call it - which isn't unreasonable, until you realise that this can only be done from the top-level WORKSPACE file, and not within a macro. https://github.com/bazelbuild/proposals/blob/master/designs/2019-01-14-delayed-load.md was one attempt at untangling this puzzle, but we never got it off the ground. I'm going to stick a flag in the ground and claim that the ideal state would be for WORKSPACE to be processed like any other BUILD file, with the only difference being that repository_rules can be called and rules cannot. That means both the chunking and the order-sensitivity need to go, and the whole concept of namespacing in //external needs to be rethought.
  3. Building without internet access. I know this sounds annoying, but it's a real problem (and hey, blaze works without internet access, so it's not unprecedented). The constraints I'm usually operating under are that there is some magic black-box system which downloads and blesses bytes as safe for use, and all downloads have to be sent via this system. We can't use http proxies for this, because all URLs are https these days and the magic black-box needs to introspect the content. The remote asset API might work here, if bazel implemented it.
  4. A better story on Windows support. It's really hairy in repository_rules, and in practice we always end up with a chain of if statements on repository_ctx.os and then writing everything twice. I can't even use repository_ctx.symlink because it creates junctions instead of symlinks, and those have unfortunate limitations, like not being able to target a network filesystem from a local one.
I can go into more detail on any of these, if there's anything surprising in here.
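
To make the workaround in item 2 concrete, here is a minimal sketch of a repository_rule that generates a .bzl file containing a macro, which WORKSPACE then loads and calls. The file name platform_repos.bzl, the rule name platform_repos_config, the repository name "toolchain", and the third_party/ paths are all hypothetical, chosen only for illustration.

```starlark
# platform_repos.bzl (hypothetical file at the workspace root)

def _platform_repos_impl(repository_ctx):
    # repository_ctx.os.name is the lower-cased OS name,
    # e.g. "linux", "mac os x", "windows 10".
    os_name = repository_ctx.os.name
    if os_name.startswith("windows"):
        path = "third_party/toolchain_windows"
    elif os_name.startswith("mac os"):
        path = "third_party/toolchain_macos"
    else:
        path = "third_party/toolchain_linux"

    # An empty BUILD file makes the generated repository root a package,
    # so that repos.bzl can be loaded from it.
    repository_ctx.file("BUILD.bazel", "")

    # Generate a macro that defines only the repository for this platform.
    repository_ctx.file("repos.bzl", "\n".join([
        "def platform_repos():",
        "    native.local_repository(",
        '        name = "toolchain",',
        '        path = "%s",' % path,
        "    )",
        "",
    ]))

platform_repos_config = repository_rule(
    implementation = _platform_repos_impl,
)

# WORKSPACE (top level)
#
# load("//:platform_repos.bzl", "platform_repos_config")
#
# platform_repos_config(name = "platform_repos_config")
#
# # This load has to sit at the top level of WORKSPACE. A load statement
# # cannot appear inside a macro, so the pattern cannot be wrapped up
# # and reused from another macro.
# load("@platform_repos_config//:repos.bzl", "platform_repos")
#
# platform_repos()
```

The second load is the sticking point: it depends on a repository defined earlier in the same file, which only works because WORKSPACE is evaluated in chunks split at load statements.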

Tony Aiuto

Jul 14, 2020, 10:22:57 PM
to Andrew Suffield, external-deps
Thanks for these thoughts.

On Tue, Jul 14, 2020 at 3:51 PM Andrew Suffield <asuf...@gmail.com> wrote:
> Caveat: these are my problems. They happen to be fairly widespread, but they aren't the only problems in the world, they're just the ones I've fallen over at scale.
>   1. Cross-repo version management. Various forms of this exist, notably https://github.com/bazelbuild/bazel-federation. None of them are as polished as I would like, and there's a collection of interesting problems around "how to get users to update to newer versions?" and "how to test that a new release is compatible with existing users?" which sound an awful lot like wanting a scalable, git-friendly variation on rosie and TAP. It's also a fairly manual process to update dependencies in these things, and we could use a better solution here (we might have an intern trying to make progress on that last puzzle at the moment).
>   2. Nontrivial logic in WORKSPACE routinely falls foul of the byzantine, not-precisely-starlark language used here - the order-sensitivity is confounding to tools like buildozer, and the top-level chunking isn't powerful enough to allow the creation of interesting macros. An example problem here would be "how do I selectively define repos based on platform?", and the available solutions today are to either rewrite every repository_rule to include this feature, or to have a repository_rule which generates a .bzl file with another macro in, and then call it - which isn't unreasonable, until you realise that this can only be done from the top-level WORKSPACE file, and not within a macro. https://github.com/bazelbuild/proposals/blob/master/designs/2019-01-14-delayed-load.md was one attempt at untangling this puzzle, but we never got it off the ground. I'm going to stick a flag in the ground and claim that the ideal state would be for WORKSPACE to be processed like any other BUILD file, with the only difference being that repository_rules can be called and rules cannot. That means both the chunking and the order-sensitivity need to go, and the whole concept of namespacing in //external needs to be rethought.
The way I think of this is that WORKSPACE should either be procedural, or declarative, but not a little of both. 
>   3. Building without internet access. I know this sounds annoying, but it's a real problem (and hey, blaze works without internet access, so it's not unprecedented). The constraints I'm usually operating under are that there is some magic black-box system which downloads and blesses bytes as safe for use, and all downloads have to be sent via this system. We can't use http proxies for this, because all URLs are https these days and the magic black-box needs to introspect the content. The remote asset API might work here, if bazel implemented it.
There are a few issues buried here, depending on what "building without internet access" is taken to mean:
  1. I just want to build offline: I need to run bazel in a mode that downloads and caches everything once. Bonus points for dealing with --config so I only download the tools needed for the targets I work on.
  2. I want to know my dependencies and save them in a private repository. Like the above, but you'll put the archives in source control.
  3. I want to use *my* version of some repositories. For example, you might have a patched bazel-skylib, and you want that used regardless of whatever version any other repository asks for in its dependencies.
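
As a rough illustration of styles 2 and 3 with what WORKSPACE offers today, the sketch below pins a patched bazel-skylib fetched from an internal mirror. The mirror URL, the patch label, and the checksum are placeholders, not real values; http_archive and its patches / patch_args attributes are the standard ones from @bazel_tools.

```starlark
# WORKSPACE
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

# Declared before any other dependency macro is called. Macros that guard
# their declarations (for example with `maybe` from
# @bazel_tools//tools/build_defs/repo:utils.bzl) will then skip their own
# bazel_skylib and use this one. An unguarded macro may still redefine it.
http_archive(
    name = "bazel_skylib",
    # Hypothetical internal mirror holding a vetted copy of the archive.
    urls = ["https://mirror.example.com/bazel-skylib/bazel-skylib-1.0.2.tar.gz"],
    # Placeholder: fill in the checksum of the mirrored archive.
    sha256 = "<sha256 of the mirrored archive>",
    # Hypothetical label for a local patch file.
    patches = ["//third_party/patches:bazel_skylib.patch"],
    patch_args = ["-p1"],
)
```

For style 1, the existing --repository_cache and --distdir flags cover part of the ground, and --override_repository=bazel_skylib=/some/local/path is another way to force a local copy for style 3, though none of these addresses the per-config download problem.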

Andrew Suffield

Jul 15, 2020, 7:40:08 AM
to Tony Aiuto, external-deps
On Wed, Jul 15, 2020 at 3:22 AM 'Tony Aiuto' via external-deps <extern...@bazel.build> wrote:
> The way I think of this is that WORKSPACE should either be procedural, or declarative, but not a little of both.

That sounds reasonable. Unless somebody has a good reason why not, I'm going to claim that procedural code belongs in repository_rule implementation functions. The only real barrier to this is the current byzantine semantics of load statements and redefined repository names. I could sketch out a solution that works for me, but I'll wait a bit to get other people's problems. (Happy to join a working group on this)
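
One way to read "procedural code belongs in repository_rule implementation functions" is sketched below: the platform branching lives in the implementation, and the call site in WORKSPACE is a plain declaration. The rule name platform_archive and its attributes are hypothetical. It also shows the chain of if statements on repository_ctx.os mentioned earlier in the thread.

```starlark
# platform_archive.bzl (hypothetical)

def _platform_archive_impl(repository_ctx):
    # Procedural part: choose an archive based on the host OS.
    # repository_ctx.os.name is lower-cased, e.g. "windows 10", "mac os x".
    os_name = repository_ctx.os.name
    if os_name.startswith("windows"):
        key = "windows"
    elif os_name.startswith("mac os"):
        key = "macos"
    else:
        key = "linux"

    if key not in repository_ctx.attr.urls:
        fail("no archive configured for platform: " + key)

    repository_ctx.download_and_extract(
        url = repository_ctx.attr.urls[key],
        # An empty checksum skips verification; real use should set one.
        sha256 = repository_ctx.attr.sha256s.get(key, ""),
    )

    # Expose the extracted files through a single public target.
    repository_ctx.file("BUILD.bazel", "\n".join([
        "filegroup(",
        '    name = "files",',
        '    srcs = glob(["**"]),',
        '    visibility = ["//visibility:public"],',
        ")",
        "",
    ]))

platform_archive = repository_rule(
    implementation = _platform_archive_impl,
    attrs = {
        # Both dicts are keyed by "linux", "macos", or "windows".
        "urls": attr.string_dict(mandatory = True),
        "sha256s": attr.string_dict(),
    },
)
```

The cost is the one already noted: every rule that needs platform awareness has to carry this logic itself.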
 
> >   3. Building without internet access. I know this sounds annoying, but it's a real problem (and hey, blaze works without internet access, so it's not unprecedented). The constraints I'm usually operating under are that there is some magic black-box system which downloads and blesses bytes as safe for use, and all downloads have to be sent via this system. We can't use http proxies for this, because all URLs are https these days and the magic black-box needs to introspect the content. The remote asset API might work here, if bazel implemented it.
> There are a few issues buried here, depending on what "building without internet access" is taken to mean:
>   1. I just want to build offline: I need to run bazel in a mode that downloads and caches everything once. Bonus points for dealing with --config so I only download the tools needed for the targets I work on.
>   2. I want to know my dependencies and save them in a private repository. Like the above, but you'll put the archives in source control.
>   3. I want to use *my* version of some repositories. For example, you might have a patched bazel-skylib, and you want that used regardless of whatever version any other repository asks for in its dependencies.

My problem is a mix of 2 and 3, but I'm handling 3 in the bazel-federation style and injecting patches there. My form of 2 is a policy tangle about going through a set process for fetching from the internet, but I have a google3-analog: assume you wanted to run this in google prod, so all your http downloads and git clones have to go via HOPE. It's not quite powerful enough to run git (or at least, it wasn't when I worked on it) so you end up needing a contraption like: the "git clone" command runs on the public cloud side with access to the internet, the result of this is packed into a tarball, which can be fetched back to where bazel runs using the download service.

I'm currently tackling this with some terribly manual hackery and github's ability to spit out tarballs of any git revision, but it's fragile. Better would be if the repository_rule implementation could be run remotely, with an API that I could proxy via the download service.
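
For readers unfamiliar with the GitHub tarball trick, a minimal sketch follows. It replaces a git clone with an http_archive pointing at the archive GitHub generates for a specific commit, so the fetch is a single HTTPS download that a download service can handle. The organisation, repository, commit, and checksum are placeholders.

```starlark
# WORKSPACE
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

# Placeholder: substitute the full 40-character commit hash.
EXAMPLE_COMMIT = "<commit>"

http_archive(
    name = "example_repo",
    # GitHub serves a tarball of any revision at this URL pattern.
    urls = ["https://github.com/example-org/example-repo/archive/%s.tar.gz" % EXAMPLE_COMMIT],
    # The tarball unpacks into a single directory named <repo>-<commit>.
    strip_prefix = "example-repo-%s" % EXAMPLE_COMMIT,
    # Placeholder: fill in the checksum of the downloaded tarball.
    sha256 = "<sha256 of the tarball>",
)
```

This only works for repositories hosted somewhere that can produce such archives, which is part of why the approach is fragile.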

Xudong Yang

Sep 21, 2020, 3:23:47 AM
to external-deps, asuf...@gmail.com, external-deps, Tony Aiuto
> That sounds reasonable. Unless somebody has a good reason why not, I'm going to claim that procedural code belongs in repository_rule implementation functions. The only real barrier to this is the current byzantine semantics of load statements and redefined repository names. I could sketch out a solution that works for me, but I'll wait a bit to get other people's problems. (Happy to join a working group on this)

In the design we're currently working on, we want to introduce another file called MODULE.bazel where users can specify direct dependencies in a declarative manner. The WORKSPACE file would be reduced to a machine-generated, load()-less list of repos. I'm curious what the sketch you had in mind looks like (since well, nobody else has really shared their problems...)
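
For a sense of what a declarative list of direct dependencies can look like, here is a small sketch using the module() and bazel_dep() functions from the form in which MODULE.bazel later shipped in Bazel. The design under discussion at the time of this message may have differed in its details, and the module name and versions here are illustrative only.

```starlark
# MODULE.bazel
module(
    name = "my_project",  # hypothetical module name
    version = "0.1.0",
)

# Direct dependencies only: no load() statements and no ordering concerns.
bazel_dep(name = "bazel_skylib", version = "1.0.3")
```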
