Problems with WORKSPACE

Andrew Suffield

Jul 18, 2020, 5:14:08 PM
to external-deps
Diving deeper on one of the threads in my earlier mail, there's a set of recurring problems with WORKSPACE (as a system for declaring dependencies) which are worth enumerating explicitly. For this email, I'm going to try to avoid describing possible solutions, just the problems:
  1. The chunking and order-sensitivity are almost, but not quite, unusable. It is technically possible to get it to do whatever you want, but in practice it's confusing and complicated and hard to get right for any non-trivial scenarios. I'm going to claim that if we believe declarative is the right way to go (and I do), then any solution needs to satisfy two properties: load() statements can be sorted to the top of the file like other starlark code (which implies deferred loading), and duplicate definitions of repository names are an error.
  2. The idea of a single global namespace for repository names sounds appealing, but is not an effective solution in all scenarios. The underlying motivation behind it sounding appealing is the scenario where you are building a single binary which has diamonds in its dependency graph: your binary depends on repos A and B, both of which depend on repo C, and you need to pick a single version of C so that they can all be linked together. This scenario is common and does need solving. To explain why a single global namespace is not the right solution, I have a compelling counterexample (which is an actual problem I was fighting with recently): instead of building a binary that depends on libraries from repos A and B, I am building a container image which includes binaries from repos A and B, and the repo C which they both depend on is protobuf or something like it. In this scenario, not only is there no reason to use the same version of protobuf in both binaries, but doing so is actively counterproductive, because now two unrelated binaries need to be source-level compatible at all times: if I update the API used in one of them, I also have to update all of the others simultaneously. Once we bring in deep dependency nesting and libraries with unstable APIs like prometheus and kubernetes, this becomes a substantial challenge. repo_mapping is an effective workaround, but it raises an obvious question: why aren't we just using repo_mapping for everything?
  3. There's still too much magic in how cache keys work. Whenever I'm doing something complicated with repository rules, I end up thinking that I really want to return an explicit cache key.
  4. A lot of features have been made easy for humans and hard for computers, which is a problem when I'm trying to automate running builds and updating WORKSPACE. It's really hard to tell the difference between "the network download timed out and should be retried" and "the sha256 of the downloaded file did not match" without understanding the log messages. It's hard to automatically populate fields like sha256 and shallow_since, and the useful hints only go into human-readable log messages.
(Solving all of this while maintaining backwards compatibility might be infeasible; I suspect we'll end up needing a viable migration path to something that works differently.)
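To make the chunking problem in point 1 concrete, here is a minimal sketch (repo names and URLs are made up). The load() of deps.bzl cannot be hoisted above the http_archive that fetches it, so the file cannot be sorted like ordinary Starlark code:

```starlark
# WORKSPACE -- illustrative sketch only; repo name and URL are hypothetical.
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

http_archive(
    name = "rules_foo",
    urls = ["https://example.com/rules_foo.tar.gz"],
    sha256 = "0000000000000000000000000000000000000000000000000000000000000000",
)

# This load() is order-sensitive: it reads a file that only exists once the
# http_archive above has been evaluated, so it cannot move to the top of
# the file, and each such fetch/load pair forms another chunk.
load("@rules_foo//:deps.bzl", "rules_foo_dependencies")

rules_foo_dependencies()
```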
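On point 4, a tool that automates these updates wants machine-readable failure modes rather than log text. A minimal sketch of what that could look like, outside Bazel (the helper names are mine, not part of any Bazel API): transient network errors are retried, while a checksum mismatch is surfaced as a distinct exception type that the tool can hard-fail on.

```python
import hashlib
import urllib.error
import urllib.request


class ChecksumMismatch(Exception):
    """Downloaded artifact does not match the pinned sha256; retrying won't help."""


def sha256_of(data: bytes) -> str:
    """Hex digest in the form http_archive's sha256 attribute expects."""
    return hashlib.sha256(data).hexdigest()


def fetch_and_verify(url: str, expected_sha256: str, retries: int = 3) -> bytes:
    """Fetch `url`, retrying only transient network failures.

    Hypothetical automation helper: unlike parsing human-readable log
    messages, the two failure modes are distinct exception types, so a
    tool can retry one and treat the other as a hard failure.
    """
    last_err = None
    for _ in range(retries):
        try:
            with urllib.request.urlopen(url, timeout=30) as resp:
                data = resp.read()
        except urllib.error.URLError as err:
            last_err = err  # network trouble: worth retrying
            continue
        actual = sha256_of(data)
        if actual != expected_sha256:
            raise ChecksumMismatch(f"got {actual}, want {expected_sha256}")
        return data
    raise last_err
```

A pinning tool would call fetch_and_verify once per archive and copy the verified digest into the sha256 field, rather than scraping it out of a failed build's log.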

Tony Aiuto

Jul 18, 2020, 5:50:00 PM
to Andrew Suffield, external-deps
Backwards compatibility should not be a requirement. If Bazel 5.0 introduced an entirely new workspace format it would just be another migration. 


Xudong Yang

Sep 21, 2020, 3:17:07 AM
to external-deps, Tony Aiuto, external-deps, asuf...@gmail.com
Digging up an old thread a bit:

re 2. If I understand correctly, you're requesting "splitting the diamond" to be supported. It does sound like a reasonable request (and I think technically possible to set up with today's WORKSPACE, with repo_mapping like you said). My question is -- do you have any thoughts regarding what we should do in the case where the user does try to link the two different versions together into one binary? Presumably they would get a cryptic linker error (or worse, a runtime error in Java or Python, IIUC), but maybe that's fine since they did "ask" for this by using the repo_mapping feature. In a way, we're saying that you can use repo_mapping to split the diamond, but be prepared to deal with weird errors. Is that what you had in mind?
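For reference, the repo_mapping setup under discussion might look like the following sketch (repo names and URLs are hypothetical). Each app resolves @com_google_protobuf to its own pinned copy, so the diamond is split deliberately:

```starlark
# WORKSPACE -- illustrative sketch only; repo names and URLs are made up.
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

http_archive(
    name = "protobuf_v3_11",
    urls = ["https://example.com/protobuf-3.11.tar.gz"],
)

http_archive(
    name = "protobuf_v3_12",
    urls = ["https://example.com/protobuf-3.12.tar.gz"],
)

http_archive(
    name = "app_a",
    urls = ["https://example.com/app_a.tar.gz"],
    # Inside @app_a, "@com_google_protobuf" resolves to @protobuf_v3_11.
    repo_mapping = {"@com_google_protobuf": "@protobuf_v3_11"},
)

http_archive(
    name = "app_b",
    urls = ["https://example.com/app_b.tar.gz"],
    repo_mapping = {"@com_google_protobuf": "@protobuf_v3_12"},
)
```

Nothing in this setup stops a target from depending on both @app_a and @app_b at once, which is where the cryptic linker or runtime errors mentioned above would come from.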

re 3. Could you elaborate a bit? What cache keys are you referring to?
