Hi Bazel community,
In order to improve the external dependencies management experience in Bazel, Xudong, Philipp and I are still working on a Bazel package manager design. We'll share the design doc as soon as it's in shape, but first I want to share the basic ideas and ask for your opinions on some important design decisions.
(This email is a polished version of the original post in the external-deps group. We want to increase its visibility and gather opinions from a broader audience. Please read this email and fill in the survey mentioned at the end.)
We want to introduce a new way to declare external dependencies and move the responsibility of managing dependencies from Bazel to a new dependencies management tool. The design has the following parts:
- Bazel Module and MODULE.bazel: A Bazel module is a collection of available versions of a Bazel project. The project declares its dependencies on other Bazel modules in a MODULE.bazel file. Unlike the WORKSPACE file, users only need to declare their direct dependencies.
- The Bazel dependencies management tool: You can use this tool to add, remove, upgrade, and query your dependencies. It will resolve your dependencies transitively by reading the MODULE.bazel files and make the required external sources available for the Bazel build.
- Custom module rules: Like today's custom repo rules (e.g. rules_jvm_external), "module rules" will be supported by the new tool to pull dependencies from non-Bazel registries such as Maven.
There are a lot of design details; we'll share them in the design doc later.
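To make "declare only direct dependencies, resolve transitively" concrete, here is a minimal Python sketch. It is not the tool's actual algorithm: the in-memory `DIRECT_DEPS` table stands in for reading MODULE.bazel files, the module names other than rules_cc are made up, and version selection is ignored entirely.

```python
from collections import deque

# Hypothetical stand-in for the MODULE.bazel files the tool would read.
# Each module lists only its *direct* dependencies.
DIRECT_DEPS = {
    "my_app": ["rules_cc", "lib_a"],
    "lib_a": ["lib_b", "rules_cc"],
    "lib_b": [],
    "rules_cc": [],
}


def resolve_transitive(root, direct_deps):
    """Return every module reachable from `root`, in breadth-first order."""
    resolved = []
    queue = deque(direct_deps[root])
    while queue:
        name = queue.popleft()
        if name in resolved:
            continue  # already visited through another path
        resolved.append(name)
        queue.extend(direct_deps.get(name, []))
    return resolved


closure = resolve_transitive("my_app", DIRECT_DEPS)
print(closure)
```

Here `my_app` never mentions `lib_b`, yet it ends up in the resolved set because `lib_a` declares it.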
The question I want to discuss in this thread is how a user publishes their project as a Bazel module. In our design, we have the concept of a Bazel registry. It is basically an index of Bazel modules in a form that the dependencies management tool can understand. The essential information a Bazel registry should contain is the available versions of a module, the MODULE.bazel file of each version, and the URL of the source blob of each version. We plan to implement a Bazel registry as a GitHub repository, similar to crates.io-index. To be more flexible, a git repository with version tags can be interpreted as a mini Bazel registry that contains only one Bazel module. Note that, unlike some registries, a Bazel registry is not a running service, which reduces maintenance cost.
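As a rough illustration of the information a registry holds, the following Python sketch models the index as a nested mapping. The class and field names are assumptions made for this example, and the example.com URLs are placeholders; the real on-disk layout is still to be defined in the design doc.

```python
from dataclasses import dataclass, field


@dataclass
class ModuleVersion:
    module_file: str  # contents of this version's MODULE.bazel
    source_url: str   # URL of the source blob for this version


@dataclass
class Registry:
    # module name -> version string -> ModuleVersion
    modules: dict = field(default_factory=dict)

    def add(self, name, version, module_file, source_url):
        entry = ModuleVersion(module_file, source_url)
        self.modules.setdefault(name, {})[version] = entry

    def versions(self, name):
        """Available versions of a module, in the order they were added."""
        return list(self.modules.get(name, {}))

    def lookup(self, name, version):
        return self.modules[name][version]


registry = Registry()
registry.add("rules_cc", "0.1.0", "# placeholder MODULE.bazel contents",
             "https://example.com/rules_cc-0.1.0.tar.gz")
registry.add("rules_cc", "0.2.0", "# placeholder MODULE.bazel contents",
             "https://example.com/rules_cc-0.2.0.tar.gz")
print(registry.versions("rules_cc"))
print(registry.lookup("rules_cc", "0.2.0").source_url)
```

Because the index is plain data, it can live in a git repository and be read without any running service.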
We have the following ideas for what Bazel registries should look like in the new world, but we think the community's opinions on this are very important for making the decision.
- Bazel Central Registry
Like the Maven Central Repo or crates.io, we create a central registry for hosting Bazel modules. This is where all users should publish their projects in order to make them available to others. While this is the main source of Bazel external dependencies, third-party Bazel registries will also be supported for cases where a project cannot be in the central registry (e.g. publishing internal libraries inside a company). But in most cases, users just have to specify the module name and version of their direct dependencies, and our tool will know how to pull them from the central registry.
Pros
- It's easy for users to find and declare a dependency: module name + version, that's it.
- In the central registry, we can store patch files that cannot be upstreamed for some reason (e.g. patches that add BUILD files to a non-Bazel project), and these can be shared with all Bazel users.
- The Bazel modules are reviewed before checking into the registry, which ensures their license validity and security.
- It's possible to calculate the dependents of a module, so compatibility checks are easier when a new version comes out.
- No module name conflict because the same module name can only appear once in the registry.
- The transitive dependency closure of any given module can be precomputed, saving a lot of HTTP downloads at dependency resolution time.
Cons
- Users probably have to figure out a way to get their dependencies into the central registry in the first place, especially in the initial phase.
- Very likely a huge maintenance cost that's nearly impossible for a three-person team to deal with. Whether this approach is viable really depends on how much we can collaborate with the community.
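The "calculate the dependents of a module" point above can be sketched as follows, assuming the central registry exposes every module's direct dependencies in one place. The module names and the `DIRECT_DEPS` table are hypothetical.

```python
from collections import defaultdict

# Hypothetical view of the whole central registry: module -> direct dependencies.
DIRECT_DEPS = {
    "app": ["lib_a"],
    "lib_c": ["lib_b"],
    "lib_b": ["lib_a"],
    "lib_a": [],
}


def dependents_index(direct_deps):
    """Invert the dependency edges: module -> modules that depend on it directly."""
    index = defaultdict(set)
    for module, deps in direct_deps.items():
        for dep in deps:
            index[dep].add(module)
    return index


def transitive_dependents(module, index):
    """All modules that would be affected by a new version of `module`."""
    affected = set()
    stack = [module]
    while stack:
        current = stack.pop()
        for parent in index.get(current, ()):
            if parent not in affected:
                affected.add(parent)
                stack.append(parent)
    return affected


index = dependents_index(DIRECT_DEPS)
print(sorted(transitive_dependents("lib_a", index)))
```

With a single registry this inversion can be computed from one index; with many independent sources there is no single place to compute it from.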
- Bazel Official Registry + Community Maintained Third Party Registries
The Bazel team will host a registry for official Bazel rules, Starlark libraries, and other important Bazel related projects (kind of like the Bazel Federation). Other interest groups can host their own Bazel registries. For example, the Bazel C++ community can host a third party registry for releasing C++ projects as Bazel modules. Note that one Bazel module in a registry may have to depend on a module in another registry. For example, a library in the C++ Bazel registry may have to depend on rules_cc in the official Bazel registry. With this approach, users have to specify not only the module name and version of their direct dependencies, but also a list of registries that provide all the Bazel modules in their transitive dependencies.
Pros
- The first three points of pros of the Bazel central registry solution.
- Maintenance cost is spread across the community.
- Each interest group can have full control of their registry.
Cons
- The first point of cons of the Bazel central registry solution.
- The same module name might be used in multiple registries, which could cause a conflict. Mitigation: we can require users to use a reversed internet domain as the module name (this is already recommended for repo names).
- When adding a new dependency, users have to make sure they also add its required registries. This list can grow as the number of registries in the ecosystem grows.
- For some multi-language projects, it's not clear which registry they should go into.
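To illustrate the lookup and name-conflict concerns with multiple registries, here is a small Python sketch. The registry names and most module names are invented, and treating a name provided by two registries as an error is only one possible policy, not a decision from the design.

```python
# Hypothetical registries a user has listed: registry name -> modules it provides.
REGISTRIES = {
    "bazel_official": {"rules_cc"},
    "cpp_community": {"fmt_lib", "json_lib"},
    "other_community": {"json_lib"},
}


def find_providers(module, registries):
    """Names of all listed registries that provide `module`."""
    return [name for name, modules in registries.items() if module in modules]


def locate(module, registries):
    """Return the single registry providing `module`, or raise if none or several do."""
    providers = find_providers(module, registries)
    if not providers:
        raise LookupError(f"{module} not found in any listed registry")
    if len(providers) > 1:
        raise LookupError(f"{module} is ambiguous: provided by {providers}")
    return providers[0]


print(locate("rules_cc", REGISTRIES))
print(find_providers("json_lib", REGISTRIES))
```

A missing registry in the user's list shows up as the "not found" case, and a reused module name shows up as the "ambiguous" case.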
- Decentralized Bazel Modules
In a decentralized world, we think the best way is to distribute Bazel modules as git repositories with version tags. We can still have Bazel registries, but they will not be the main sources for pulling Bazel dependencies. When users declare dependencies, the source (a git repo or a Bazel registry) of a module should also be specified along with the module name and version.
Pros
- Low maintenance cost for the Bazel registry, because even if it exists, its size should be very small.
- Easier for users to "publish" their projects: just make a new version tag.
Cons
- If one git repo changes (goes offline or is moved), it could transitively break many downstream projects. Mitigation: we can use a mirror to ensure that what was available is always available and unchanged.
- We have a much higher chance of module name conflicts, e.g. 1) different projects accidentally use the same module name, or 2) the same module is hosted in different git repos (perhaps due to cloning). In the first case, we can distinguish modules by URL and use repo_remapping to mitigate, but in the second case, there could still be conflicts at link time.
- For projects not already using Bazel, this means the corresponding Bazel module (with Bazel BUILD files) has to be created and hosted by a third party.
- Compared to the solutions that use registries as the main source, this approach offers fewer security guarantees.
As you can see, each solution has its pros and cons. Overall, I think the Bazel Central Registry approach may provide the best user experience and create a more unified ecosystem, but it definitely requires a lot of effort from both the Bazel team and the community.
Please tell us which approach you think is best. You can reply to this thread or provide us with more detailed information by filling out this form.
Cheers,
Yun Peng