Bazel Package Manager Design Ideas: Bazel modules and how they should be distributed



Yun Peng


Sep 22, 2020, 11:11:37 AM

to bazel-dev, bazel-discuss, Xudong Yang, Philipp Wollermann, Sven Tiffe

Hi Bazel community,

In order to improve the external dependency management experience in Bazel, Xudong, Philipp and I are still working on a Bazel package manager design. We'll share the design doc as soon as it's in shape, but first I want to share the basic ideas and ask for your opinions on some important design decisions.

(This email is a polished version of the original post in the external-deps group; we want to increase its visibility and gather opinions from a broader audience. Please read this email and fill out the survey mentioned at the end.)

We want to introduce a new way to declare external dependencies and move the responsibility for managing them from Bazel to a new dependency management tool. The design has the following parts:

Bazel Module and MODULE.bazel: A Bazel module is a collection of available versions of a Bazel project. The project declares its dependencies on other Bazel modules in a MODULE.bazel file. Unlike with the WORKSPACE file, users only need to declare their direct dependencies.
The Bazel dependency management tool: You can use this tool to add, remove, upgrade, and query your dependencies. It resolves your dependencies transitively by reading the MODULE.bazel files and makes the required external sources available to the Bazel build.
Custom module rules: Like today's custom repository rules (e.g. rules_jvm_external), "module rules" will be supported by the new tool to pull dependencies from non-Bazel registries such as Maven.
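To make "users only need to declare direct dependencies" concrete, here is a minimal sketch of how a tool could resolve the full set from MODULE.bazel files. The in-memory MODULE_FILES dict stands in for parsed MODULE.bazel files, and the rule of picking the highest version anyone requests (in the style of Go's Minimal Version Selection) is our assumption for illustration, not something this design has settled.

```python
from collections import deque

# Hypothetical stand-in for parsed MODULE.bazel files:
# (module name, version) -> {direct dependency name: requested version}.
MODULE_FILES = {
    ("my_project", "1.0"): {"rules_cc": "0.1", "abseil": "2.0"},
    ("abseil", "2.0"): {"rules_cc": "0.2"},
    ("rules_cc", "0.1"): {},
    ("rules_cc", "0.2"): {},
}


def version_key(version):
    # Compare dotted versions numerically, so "0.10" sorts after "0.9".
    return tuple(int(part) for part in version.split("."))


def resolve(root):
    """Return {module name: selected version} for the transitive closure of root."""
    # 1) Visit every (name, version) reachable through MODULE.bazel files.
    seen = {root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for dep in MODULE_FILES[node].items():
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    # 2) For each module name, keep the highest version anyone asked for.
    selected = {}
    for name, version in seen:
        if name not in selected or version_key(version) > version_key(selected[name]):
            selected[name] = version
    return selected
```

Here my_project only declares rules_cc 0.1 and abseil 2.0, yet resolve(("my_project", "1.0")) selects rules_cc 0.2 because abseil asks for it. The user never had to write that transitive requirement down.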

There are many more design details, which we'll share in the design doc later.

The question I want to discuss in this thread is how users should publish their projects as Bazel modules. Our design introduces the concept of a Bazel registry: an index of Bazel modules in a format that the dependency management tool understands. The essential information a Bazel registry should contain is the available versions of each module, the MODULE.bazel file of each version, and the URL of the source blob of each version. We plan to implement a Bazel registry as a GitHub repository, similar to crates.io-index. For more flexibility, a git repository with version tags can be interpreted as a mini Bazel registry that contains only one Bazel module. Note that, unlike some registries, a Bazel registry is not a running service, which reduces maintenance cost.
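As a rough illustration of "the essential information a registry should contain", here is an in-memory sketch of one registry. The class and field names, the placeholder MODULE.bazel contents, and the example.com URLs are all hypothetical; the real on-disk layout of the GitHub-hosted registry is not specified here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModuleVersion:
    module_file: str  # contents of this version's MODULE.bazel file
    source_url: str   # URL of this version's source blob


# Hypothetical registry contents: module name -> version -> entry.
REGISTRY = {
    "rules_cc": {
        "0.2": ModuleVersion("# MODULE.bazel of rules_cc 0.2 (placeholder)",
                             "https://example.com/rules_cc-0.2.tar.gz"),
        "0.1": ModuleVersion("# MODULE.bazel of rules_cc 0.1 (placeholder)",
                             "https://example.com/rules_cc-0.1.tar.gz"),
    },
}


def available_versions(registry, name):
    """List the versions a registry offers for `name`, oldest first."""
    return sorted(registry.get(name, {}),
                  key=lambda v: tuple(int(p) for p in v.split(".")))


def lookup(registry, name, version):
    """Return the MODULE.bazel contents and source URL for one module version."""
    try:
        return registry[name][version]
    except KeyError:
        raise LookupError(f"{name}@{version} is not in this registry") from None
```

Because everything the tool needs is static data, a registry like this can be served straight from a git repository without running any service.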

We have the following ideas of what Bazel registries should look like in the new world, but the community's opinions on this are very important for making the decision.

Bazel Central Registry

Like the Maven Central Repo or crates.io, we would create a central registry for hosting Bazel modules. This is where all users should publish their projects to make them available to others. While this would be the main source of Bazel external dependencies, third-party Bazel registries will also be supported for cases where a project cannot be in the central registry (e.g. internal libraries published inside a company). But in most cases, users just have to specify the module name and version of their direct dependencies, and our tool will know how to pull them from the central registry.

Pros

It's easy for users to find and declare a dependency: module name + version, that's it.
The central registry can store patch files that can't be upstreamed for some reason (e.g. adding BUILD files to a non-Bazel project), and these can be shared with all Bazel users.
Bazel modules are reviewed before being checked into the registry, which ensures their license validity and security.
It's possible to calculate the dependents of a module, which makes compatibility checks easier when a new version comes out.
No module name conflicts, because each module name can appear only once in the registry.
The transitive dependency closure of any given module can be precomputed, saving a lot of HTTP downloads at dependency resolution time.
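Two of the pros above, computing a module's dependents and precomputing transitive closures, follow directly from having the whole dependency graph in one place. A minimal sketch, assuming a hypothetical name-level graph (versions omitted for brevity):

```python
from collections import defaultdict, deque

# Hypothetical name-level dependency graph of modules in the central registry.
DEPS = {
    "rules_cc": [],
    "abseil": ["rules_cc"],
    "protobuf": ["abseil", "rules_cc"],
    "grpc": ["protobuf", "abseil"],
}


def reverse_edges(deps):
    """Map each module to the modules that depend on it directly."""
    dependents = defaultdict(set)
    for module, direct in deps.items():
        for dep in direct:
            dependents[dep].add(module)
    return dependents


def transitive(graph, start):
    """All modules reachable from `start` in `graph`, excluding `start` itself."""
    seen, queue = set(), deque([start])
    while queue:
        for nxt in graph.get(queue.popleft(), ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen
```

transitive(reverse_edges(DEPS), "abseil") lists every module to re-test when abseil releases a new version, and {m: transitive(DEPS, m) for m in DEPS} is the kind of closure table the registry could precompute to save downloads at resolution time.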

Cons

Users probably have to figure out a way to get their dependencies into the central registry in the first place, especially in the initial phase.
Very likely a huge maintenance cost that's nearly impossible for a three-person team to deal with. Whether this approach is viable really depends on how much we can collaborate with the community.

Bazel Official Registry + Community Maintained Third Party Registries

The Bazel team will host a registry for official Bazel rules, Starlark libraries, and other important Bazel related projects (kind of like the Bazel Federation). Other interest groups can host their own Bazel registries. For example, the Bazel C++ community can host a third party registry for releasing C++ projects as Bazel modules. Note that one Bazel module in a registry may have to depend on a module in another registry. For example, a library in the C++ Bazel registry may have to depend on rules_cc in the official Bazel registry. With this approach, users have to specify not only the module name and version of their direct dependencies, but also a list of registries that provide all the Bazel modules in their transitive dependencies.
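With several registries, the tool has to find which one provides each module. The sketch below searches a user-supplied list of registries and refuses ambiguous names instead of silently taking the first match. That policy, the function name, and the example registries are assumptions made for illustration.

```python
# Hypothetical registries: (registry name, {module name: [versions]}).
OFFICIAL = ("official", {"rules_cc": ["0.1", "0.2"]})
CPP_COMMUNITY = ("cpp-community", {"abseil": ["2.0"]})


def find_module(registries, name):
    """Return (registry name, versions) for `name` from the user's registry list."""
    hits = [(reg_name, index[name]) for reg_name, index in registries if name in index]
    if not hits:
        # The user forgot to list the registry that provides this module.
        raise LookupError(f"module {name!r} not found in any configured registry")
    if len(hits) > 1:
        # Same name in two registries: exactly the conflict listed under the cons below.
        owners = ", ".join(reg for reg, _ in hits)
        raise ValueError(f"module {name!r} is provided by multiple registries: {owners}")
    return hits[0]
```

For a C++ library that depends on rules_cc, the user would pass [OFFICIAL, CPP_COMMUNITY]; leaving out OFFICIAL makes the rules_cc lookup fail, which is the "users have to add its required registries" cost.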

Pros

The first three pros of the Bazel Central Registry solution.
Maintenance cost is spread across the community.
Each interest group can have full control of their registry.

Cons

The first con of the Bazel Central Registry solution.
The same module name might be used in multiple registries, which could cause conflicts. Mitigation: we can require users to use reverse internet domain names as module names (these are already recommended for repository names).
When adding a new dependency, users have to make sure they also add its required registries. This list can grow as the number of registries in the ecosystem grows.
For some multi-language projects, it's not clear which registry they should go into.

Decentralized

In a decentralized world, we think the best way is to distribute Bazel modules as git repositories with version tags. We can still have Bazel registries, but they will not be the main sources for pulling Bazel dependencies. When users declare dependencies, the source (a git repo or a Bazel registry) of a module should also be specified along with the module name and version.
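If a git repository with version tags acts as a mini registry, the tool has to turn tag names (e.g. the output of `git tag`) into a list of available versions. A minimal sketch, assuming a "vMAJOR.MINOR.PATCH" tag convention, which is our assumption since no convention has been chosen:

```python
import re

# Assumed tag convention: "v" followed by MAJOR.MINOR.PATCH, nothing else.
TAG_PATTERN = re.compile(r"^v(\d+)\.(\d+)\.(\d+)$")


def versions_from_tags(tags):
    """Return the release versions encoded in git tag names, oldest first."""
    versions = []
    for tag in tags:
        match = TAG_PATTERN.match(tag)
        if match:  # ignore tags like "nightly" or "v1.2.0-rc1"
            versions.append(tuple(int(g) for g in match.groups()))
    # Sort numerically so 1.10.0 comes after 1.2.0.
    return [".".join(map(str, v)) for v in sorted(versions)]
```

Under this scheme, "publishing" a new version really is just pushing a new matching tag; the trade-off is that any tag the repository owner later deletes or moves changes what downstream users see.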

Pros

Low maintenance cost for the Bazel registry: even if one exists, it should be very small.
Easier for users to "publish" their projects. Just make a new version tag.

Cons

If one git repo changes (goes offline or moves), it could transitively break many downstream projects. Mitigation: we can use a mirror to ensure that whatever was available stays available and unchanged.
Module name conflicts are much more likely, e.g. 1) different projects accidentally use the same module name, or 2) the same module is hosted in different git repos (perhaps because it was cloned). In the first case, we can distinguish modules by URL and use repo_remapping to mitigate the conflict, but in the second case there could still be conflicts at link time.
For projects not already using Bazel, this means the corresponding Bazel module (with Bazel BUILD files) has to be created and hosted by a third party.
Compared to the solutions where registries are the main source, this approach offers weaker security guarantees.
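Distinguishing modules by URL, as in the first mitigation above, starts with spotting names that point at more than one source. A minimal sketch, assuming the tool has collected (module name, source URL) pairs from every MODULE.bazel file it read; the function name and example URLs are hypothetical.

```python
from collections import defaultdict


def find_name_conflicts(declared):
    """Return {module name: [source URLs]} for names declared with more than one URL.

    declared: iterable of (module_name, source_url) pairs from all MODULE.bazel files.
    """
    urls_by_name = defaultdict(set)
    for name, url in declared:
        urls_by_name[name].add(url)
    return {name: sorted(urls) for name, urls in urls_by_name.items() if len(urls) > 1}


declared = [
    ("zlib", "https://example.com/a/zlib"),
    ("zlib", "https://example.com/b/zlib"),
    ("rules_cc", "https://example.com/rules_cc"),
    ("rules_cc", "https://example.com/rules_cc"),  # same URL twice is not a conflict
]
```

What this check cannot tell is whether two URLs hold different projects (case 1) or copies of the same project (case 2). Remapping the names apart is only safe in the first case; in the second, both copies end up in the build, which is where the link-time conflicts come from.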

As you can see, each solution has its pros and cons. Overall, I think the Bazel Central Registry approach may provide the best user experience and create a more unified ecosystem, but it definitely requires a lot of effort from both the Bazel team and the community.

Please tell us which approach you think is best. You can reply to this thread or give us more detailed feedback by filling out this form.

Cheers,

Yun Peng

bran...@google.com


Sep 22, 2020, 6:01:41 PM

to bazel-dev

I think the three registry proposals sound somewhat equivalent. If you have the ability for one registry to source another (i.e., include / incorporate / inherit from it), then depending on multiple registries is the same as depending on your own custom registry that has a few include lines. If you have a decentralized model, that sounds the same as depending on many registries where each registry is 1:1 with a module.
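The "one registry includes another" idea can be sketched as flattening an include graph into a single lookup order before resolution starts. The registry names, the INCLUDES mapping, and the depth-first ordering are illustrative assumptions; the visited set simply keeps an include cycle from looping forever.

```python
def flatten(root, includes):
    """Return every registry reachable from `root` via includes, each once, depth-first."""
    order, seen = [], set()

    def visit(registry):
        if registry in seen:  # already listed (or an include cycle)
            return
        seen.add(registry)
        order.append(registry)
        for child in includes.get(registry, []):
            visit(child)

    visit(root)
    return order


# Hypothetical include graph: a company registry that pulls in two public ones.
INCLUDES = {
    "my-company": ["cpp-community", "official"],
    "cpp-community": ["official"],
    "official": [],
}
```

flatten("my-company", INCLUDES) yields ["my-company", "cpp-community", "official"], so depending on several registries looks the same as depending on one custom registry with a few include lines. A decentralized setup is the degenerate case where every registry in the list holds exactly one module.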

It seems like an important principle here is that registries (and transitively included registries) are all fetched and fully known before version selection and dependency fetching begin.

Also, @some_module_name must be globally unique regardless of how many registries there are, or else we run into problems trying to even interpret another repo's MODULE.bazel file. Yet that doesn't stop a registry from choosing a different URL (mirror) to send its fetches to. The registry could also choose which module rule is applicable -- the equivalent of picking among local_repository, git_repository, or http_archive depending on how your company's local mirrors are set up. That is, there's no need for @some_module_name's own MODULE.bazel file to preselect what mechanism you use to retrieve its source.
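The point that the registry, not the module's own MODULE.bazel, picks the fetch mechanism can be sketched as a per-registry policy function. The local-path overrides and mirror prefix table are hypothetical inputs; "http_archive" and "local_repository" are real Bazel repository rule names, but here they are only strings describing the choice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FetchSpec:
    rule: str      # Bazel repository rule to use, e.g. "http_archive"
    location: str  # URL or local path handed to that rule


def fetch_spec(name, upstream_url, mirrors, local_paths):
    """Decide how this registry fetches `name`; the module itself has no say."""
    # A company registry might point some modules at local checkouts.
    if name in local_paths:
        return FetchSpec("local_repository", local_paths[name])
    # Otherwise rewrite the upstream URL to a mirror when a prefix matches.
    for prefix, mirror in mirrors.items():
        if upstream_url.startswith(prefix):
            return FetchSpec("http_archive", mirror + upstream_url[len(prefix):])
    return FetchSpec("http_archive", upstream_url)


# Hypothetical mirror table for one registry.
MIRRORS = {"https://github.com/": "https://mirror.example.com/github/"}
```

Two registries could return different FetchSpecs for the same @some_module_name while agreeing on its name and version, which is why the name must stay globally unique even though the fetch location need not be.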
