State of BUILD file generation?

Gregg Donovan
Apr 16, 2021, 11:35:22 AM
to bazel-discuss

Hello! I'm interested in the current state and future roadmap of BUILD file generation. We have an existing Bazel monorepo with Java, Scala, Python, Go, JavaScript, and Protobuf. We've grown weary of manually maintaining BUILD files, especially during large refactorings, where the double accounting of, say, Java imports and `deps` entries can be time-consuming and error-prone.

The ideal user interface for us would be something that runs in the background, creating, updating, and editing BUILD files and eliminating 90+% of the manual maintenance. Certain actions, like changing visibility and adding runtime dependencies, would still require manual updates to BUILD files, but mechanical changes, like mapping a new import to a first- or third-party dependency, would be automated. The second-best user interface would be a manual generate command that must be run whenever a change requires BUILD file modifications.
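
The mechanical part, mapping an import to a first- or third-party target, can be sketched as a longest-prefix lookup from Java package names to Bazel labels. This is a minimal sketch, not any existing tool's implementation: the `PACKAGE_INDEX` contents and all label names are hypothetical, and a real tool would build the index by scanning the repo's sources and the third-party lock file.

```python
from __future__ import annotations

import re

# Hypothetical index from Java package prefixes to Bazel labels. A real tool would
# build this by scanning first-party sources and the third-party lock file.
PACKAGE_INDEX = {
    "com.example.search": "//src/main/java/com/example/search",
    "com.example.search.index": "//src/main/java/com/example/search/index",
    "com.google.common": "@maven//:com_google_guava_guava",
}

# Matches `import a.b.C;`, `import a.b.*;` and `import static a.b.C.m;`.
IMPORT_RE = re.compile(r"^\s*import\s+(?:static\s+)?([\w.]+)(?:\.\*)?\s*;", re.MULTILINE)


def java_imports(source: str) -> list[str]:
    """Return the fully qualified names imported by a Java source file."""
    return IMPORT_RE.findall(source)


def resolve(name: str, index: dict[str, str]) -> str | None:
    """Map a qualified name to a label using the longest matching package prefix."""
    parts = name.split(".")
    for i in range(len(parts), 0, -1):
        prefix = ".".join(parts[:i])
        if prefix in index:
            return index[prefix]
    return None  # e.g. JDK classes, which need no deps entry


def deps_for(source: str, index: dict[str, str]) -> list[str]:
    """The sorted, de-duplicated deps implied by a file's imports."""
    labels = {resolve(n, index) for n in java_imports(source)}
    labels.discard(None)
    return sorted(labels)
```

A static import such as `import static com.example.search.Query.parse;` resolves the same way, because the lookup walks back through progressively shorter prefixes until one is found in the index.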

Before we write our own custom tool for solving this, we're curious to know:

  • How have others solved the problem?

  • Is there a plan to have a standard mechanism within the Bazel project for BUILD file generation? 

  • If not, what's the best foundation or set of patterns to build on? 


Going through some of the prior art, there is:


Any pointers would be much appreciated! Thanks.


Gregg Donovan

Senior Staff Software Engineer, Etsy.com


Gregg Reynolds
Apr 16, 2021, 5:06:47 PM
to Gregg Donovan, bazel-discuss

What a coincidence, I (the other Gregg) am working on something like that right now, for OCaml/Coq projects.

The major problem for OCaml/Coq is dependency discovery.  There are a couple of tools (ocamldep, codept) that can analyze code and emit dependency info, but that only goes so far, since the language does not fully define the relation between "modules" (which are language constructs), files, and build products (e.g. archive files that may contain module deps).  To complicate things further, preprocessing is common - lex/yacc stuff, but also several other kinds of pp. Dependency discovery can only happen after preprocessing.
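
A first step for a generator is to turn the analyzer's output into per-file dependency lists. The sketch below assumes the `file: Module1 Module2` line format printed by `ocamldep -modules`, and relies on the OCaml convention that `foo.ml` defines module `Foo`. It deliberately ignores the harder cases described above (modules packed into archives, preprocessed sources); unresolved names such as `List` or `Printf` are simply dropped as external.

```python
from __future__ import annotations

from pathlib import PurePosixPath


def parse_ocamldep_modules(output: str) -> dict[str, list[str]]:
    """Parse lines of the form 'src/foo.ml: Bar Baz' into {file: [modules]}."""
    deps = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        # Module names never contain ':', so split on the last one.
        path, _, mods = line.rpartition(":")
        deps[path.strip()] = mods.split()
    return deps


def module_name(path: str) -> str:
    """OCaml module name: the file stem with its first letter capitalized."""
    stem = PurePosixPath(path).stem
    return stem[:1].upper() + stem[1:]


def local_deps(output: str) -> dict[str, list[str]]:
    """Keep only dependencies that resolve to another source file in the output."""
    deps = parse_ocamldep_modules(output)
    by_module = {module_name(p): p for p in deps if p.endswith(".ml")}
    return {
        src: sorted(by_module[m] for m in mods if m in by_module and by_module[m] != src)
        for src, mods in deps.items()
    }
```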

The most popular build tool, Dune, does dependency discovery (by running ocamldep etc.) at build time. This obviously will not work for Bazel, which needs the dependency graph declared in BUILD files before any build action runs. So the first version of my OCaml rules (https://obazl.github.io/docs_obazl/; caveat: a little outdated, under heavy development) just ignored that problem and left it up to the developer to get the BUILD.bazel files in order by whatever means. This is not really a sustainable model, as you have discovered.

So I'm writing a tool in C (called 'obazl') that analyzes the deps and generates the BUILD.bazel files, fast. One tricky bit is that the dep analysis tool (codept) needs to know where to look to resolve deps, which means I have to discover the Bazel repos in the project and pass them to the tool. Currently I do this by using the C function `popen` to run `bazel info output_base`, then appending `external/` to the result to get the repos directory, which I can enumerate. Another tricky bit is that the OCaml/Coq packages in those repos must be bazelized, so I've had to write a separate "bootstrap" tool to configure them. That means creating a directory structure with symlinks (in this version, the Bazel repos piggy-back on a local installation of the OCaml/Coq packages) and generating BUILD.bazel files with "import" rules that expose the precompiled resources.
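
The repo-discovery step (running `bazel info output_base` and enumerating `external/`) looks roughly like this. The tool described above does it in C with `popen`; this sketch swaps in Python's `subprocess.run` for brevity. Note that a repository directory only appears under `external/` once Bazel has fetched it, and the directory also holds non-directory bookkeeping files, which are skipped here.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def output_base(workspace: Path) -> Path:
    """Ask Bazel for the workspace's output base (requires `bazel` on PATH)."""
    out = subprocess.run(
        ["bazel", "info", "output_base"],
        cwd=workspace, check=True, capture_output=True, text=True,
    )
    return Path(out.stdout.strip())


def external_repos(base: Path) -> dict[str, Path]:
    """Map repo name -> directory for every fetched repo under <output_base>/external."""
    ext = base / "external"
    if not ext.is_dir():
        return {}
    # Only directories are repos; marker files and the like are ignored.
    return {p.name: p for p in sorted(ext.iterdir()) if p.is_dir()}
```

In use, `external_repos(output_base(Path(".")))` would yield the search roots to pass to the dependency analyzer.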

So the bootstrap tool is run by a custom repository rule using `repository_ctx.execute`, resulting in a repo containing build files and targets suitable for use as dependencies by the project build files. That happens automatically when a build is run. The project build files have to be generated by running the `obazl` tool by hand. I've still got lots of details to nail down, but so far the results are very encouraging. The bootstrap tool configures the repo with about 170 build files and 7500+ symlinks in about a second, and there's room for optimization. The `obazl` tool is also fast, although unlike the bootstrap tool it runs the dependency analyzer, which does a lot more work and may take a few seconds.
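
The symlink-and-generate step of such a bootstrap tool might look like the following Python sketch. It assumes precompiled artifacts can be recognized by file extension; the `ocaml_import` rule and its `load` path are hypothetical placeholders for whatever import rule the OCaml rules actually provide.

```python
from __future__ import annotations

from pathlib import Path

# Extensions treated as precompiled OCaml artifacts (an assumption for this sketch).
PRECOMPILED = {".cma", ".cmxa", ".cmi", ".cmx", ".a"}

# Hypothetical rule and .bzl path; substitute the real import rule.
LOAD_LINE = 'load("//build_defs:import.bzl", "ocaml_import")'


def bootstrap_package(install_dir: Path, repo_dir: Path, name: str) -> Path:
    """Symlink precompiled files into repo_dir/name and write a BUILD.bazel for them."""
    pkg = repo_dir / name
    pkg.mkdir(parents=True, exist_ok=True)
    files = sorted(p.name for p in install_dir.iterdir()
                   if p.is_file() and p.suffix in PRECOMPILED)
    for f in files:
        link = pkg / f
        link.unlink(missing_ok=True)  # keep re-runs idempotent
        link.symlink_to(install_dir / f)
    srcs = "".join(f'        "{f}",\n' for f in files)
    build = (
        f"{LOAD_LINE}\n\n"
        f"ocaml_import(\n"
        f'    name = "{name}",\n'
        f"    srcs = [\n{srcs}    ],\n"
        f'    visibility = ["//visibility:public"],\n'
        f")\n"
    )
    (pkg / "BUILD.bazel").write_text(build)
    return pkg
```

Symlinking rather than copying keeps the step fast, which fits the roughly one-second figure above for thousands of links.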

I also wrote a primitive Proof of Concept implementation of a "live update" tool, using fswatch. It watches the source tree, and whenever a file changes it regenerates the relevant build file. Almost instantly, at least in a tiny test project.  Actually I wrote that first, with just shell scripting. It was so fast I decided to try implementing a complete solution, in C for speed and portability.  The core logic is pretty close to done, but I have yet to decide how best to handle the case where the user has hand-tuned build files that should not be overwritten. I'm toying with the idea of a custom config file (maybe using Lua), but I suppose in the end there is no getting around the need to parse and update BUILD files. I believe that's what Gazelle does.
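
A portable approximation of that watch loop, without `fswatch`, is to poll modification times and regenerate only the directories whose sources changed, skipping BUILD files the user has marked as hand-maintained. This sketch assumes one package per directory; the `# obazl: manual` marker and the `regenerate` callback are hypothetical. (Gazelle's finer-grained alternative is its `# keep` comment, which protects individual rules and attribute values.)

```python
from __future__ import annotations

import time
from pathlib import Path
from typing import Callable

# Hypothetical opt-out marker: a BUILD file containing this text is treated as
# hand-maintained and is never regenerated.
MANUAL_MARKER = "# obazl: manual"
BUILD_NAMES = ("BUILD.bazel", "BUILD")


def snapshot(root: Path, suffixes: set[str]) -> dict[Path, int]:
    """Record the modification time (ns) of every source file under root."""
    mtimes = {}
    for p in root.rglob("*"):
        if p.suffix in suffixes and p.is_file():
            try:
                mtimes[p] = p.stat().st_mtime_ns
            except FileNotFoundError:
                pass  # deleted between listing and stat; next poll sees it as removed
    return mtimes


def changed_dirs(before: dict[Path, int], after: dict[Path, int]) -> set[Path]:
    """Directories (assumed to be packages) with added, removed, or modified sources."""
    paths = before.keys() | after.keys()
    return {p.parent for p in paths if before.get(p) != after.get(p)}


def is_manual(pkg: Path) -> bool:
    """True if the package's BUILD file carries the opt-out marker."""
    return any(
        (pkg / name).is_file() and MANUAL_MARKER in (pkg / name).read_text()
        for name in BUILD_NAMES
    )


def watch(root: Path, suffixes: set[str], regenerate: Callable[[Path], None],
          interval: float = 0.5) -> None:
    """Poll forever, calling regenerate(pkg) once per changed, non-manual package."""
    prev = snapshot(root, suffixes)
    while True:
        time.sleep(interval)
        cur = snapshot(root, suffixes)
        for pkg in sorted(changed_dirs(prev, cur)):
            if pkg.is_dir() and not is_manual(pkg):
                regenerate(pkg)
        prev = cur
```

For example, `watch(Path("src"), {".ml", ".mli", ".mll", ".mly"}, regenerate=print)` would just print each package that needs a new BUILD file. Polling trades a little latency for portability; an event-based watcher such as `fswatch` reacts faster on large trees.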

I don't really relish the idea of writing a Bazel build file parser in C, although I've made a start using re2c and lemon.  But such a tool would be useful to any Bazel tool dev so maybe somebody out there would be interested in collaborating.

Hope that gives you some ideas at least.  Once my tools are done they'll be open source.  One possibility I'm considering is to implement the application logic (i.e. what to do with the analysis data, what to emit) in Lua, so that users could customize the output.  With a little more work it might be possible to turn it into a general tool for bazel project analysis and code generation.

Gregg (R)

Alex Eagle
Apr 18, 2021, 11:28:32 PM
to bazel-discuss
Hey Gregg, great timing :)

I worked on the BUILD file generators a bit at Google (mostly the TypeScript one; I had an intern work on open-sourcing ts_auto_deps, which has since been archived). The big lesson coming out of Google was that this isn't something Bazel core is likely to ever support, and the ecosystem is very fragmented (even internally, many languages developed their own).

Now at aspect.dev we are planning to jump into this with gusto. Our engineer Thulio wrote a Gazelle plugin for python which we are working to open-source, and Matt Mackay wrote bzlgen that you linked to - so we have a lot of experience here and are planning to do JS/TS next.

Gazelle is indeed a capable driver. Extensions don't always have to be written in Go. Our Python one uses a long-lived subprocess in a python interpreter to parse files. https://github.com/bazelbuild/bazel-gazelle/issues/938 is a proposal to make a more generic interface for subprocess plugins, and Paul Johnston wrote a POC using a gRPC transport. After looking around the ecosystem I'm pretty confident that Gazelle solves the problem best. It has the advantage that users only need to know how to configure and interact with one top-level tool.
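
The long-lived-subprocess idea can be sketched as follows: the Go extension starts one Python interpreter, writes one JSON request per line to its stdin, and reads one JSON reply per line from its stdout, so interpreter startup is paid once rather than once per file. The newline-delimited JSON protocol and its field names are assumptions for illustration, not the protocol used by the aspect.dev plugin or proposed in issue 938.

```python
from __future__ import annotations

import ast
import json
import sys


def imports_of(source: str) -> list[str]:
    """Top-level module names imported by a Python source string."""
    mods = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            mods.update(a.name.split(".")[0] for a in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Relative imports (level > 0) resolve inside the package and are skipped.
            mods.add(node.module.split(".")[0])
    return sorted(mods)


def handle(line: str) -> str:
    """Answer {"path", "source"} with {"path", "imports"} or {"path", "error"}."""
    req = json.loads(line)
    try:
        resp = {"path": req["path"], "imports": imports_of(req["source"])}
    except SyntaxError as e:
        resp = {"path": req["path"], "error": str(e)}
    return json.dumps(resp)


def serve() -> None:
    """Read one JSON request per line from stdin until EOF; reply one line each."""
    for line in sys.stdin:
        if line.strip():
            sys.stdout.write(handle(line) + "\n")
            sys.stdout.flush()  # the Go side blocks on each reply, so flush per line
```

The plugin's entry point would simply call `serve()`; mapping the returned module names to Bazel labels stays on the Gazelle side, in its resolve phase.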

As for when to run the generator, it's true that the typical workflow of "show the user an opaque error, they are puzzled, their teammate says did-you-remember-to-run-gazelle" is pretty bad. One solution is for a "watch mode" tool like iBazel to scrape the output messages from tools, looking for a hint like "run this tool to auto-fix", which it then runs before re-running the build. This just results in a slower-than-normal incremental rebuild with no disruption to the developer workflow, so I think it's a good approach and could be generalized to a capable tools/bazel wrapper. Another approach I've seen used at Google is for the editor's Bazel plugin to know how to re-run the generator at the Right Time, though I don't recall how it determines when that is.
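
The scrape-and-fix loop could be sketched like this. The `AUTOFIX:` line format is a hypothetical convention (not iBazel's actual configuration or message format), the runner is injectable so the logic can be exercised without Bazel, and the number of fix-and-rebuild rounds is capped so a fix that doesn't resolve the error cannot loop forever.

```python
from __future__ import annotations

import re
import shlex
import subprocess
from typing import Callable

# Hypothetical hint format a tool might print; real tools word this differently.
HINT_RE = re.compile(r"^AUTOFIX: (.+)$", re.MULTILINE)


def run(cmd: list[str]) -> tuple[int, str]:
    """Run a command, returning its exit code and combined stdout/stderr."""
    p = subprocess.run(cmd, capture_output=True, text=True)
    return p.returncode, p.stdout + p.stderr


def fix_commands(output: str) -> list[list[str]]:
    """Extract unique auto-fix commands, in order of appearance, from build output."""
    cmds: list[list[str]] = []
    for hint in HINT_RE.findall(output):
        argv = shlex.split(hint)
        if argv not in cmds:
            cmds.append(argv)
    return cmds


def build_with_autofix(build: list[str],
                       runner: Callable[[list[str]], tuple[int, str]] = run,
                       max_rounds: int = 3) -> int:
    """Build; on failure with fix hints, apply them and rebuild (bounded)."""
    code, out = runner(build)
    for _ in range(max_rounds):
        fixes = fix_commands(out)
        if code == 0 or not fixes:
            break
        for fix in fixes:
            runner(fix)
        code, out = runner(build)
    return code
```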

-Alex

Yuval Kaplan
Apr 19, 2021, 6:56:45 PM
to bazel-discuss
For our migration to Bazel, we wrote a wee Python tool to generate BUILD files for C++ code. We've since added a module for Python code too. It analyses sources to find dependencies, runs `bazel query` (or uses configuration) to figure out how they may be satisfied, and emits Buildozer commands to fix BUILD files. By default, it asks before applying any changes, using an interactive interface like `git add -p` (it is in fact based on that same tool, with an added option to see a bit more context). That's usually just what we need: the Buildozer commands don't usually affect manually edited fields, and the tool runs very fast, at least fast enough that nobody seems to mind that the process isn't fully automatic.
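
The last step of such a pipeline, turning "what the sources need" into Buildozer edits, might look like this. It is a sketch, not Yuval's tool: the header-to-label mapping is a hard-coded dict standing in for the `bazel query` lookup or configuration, and all labels are hypothetical. It only ever removes deps it knows how to map, so hand-added entries such as runtime deps are left alone.

```python
from __future__ import annotations

import re
import shlex

# Quoted includes only; angle-bracket (system) includes are ignored.
INCLUDE_RE = re.compile(r'^\s*#\s*include\s+"([^"]+)"', re.MULTILINE)


def includes(source: str) -> set[str]:
    """Header paths pulled in with #include "..."."""
    return set(INCLUDE_RE.findall(source))


def buildozer_commands(target: str, sources: list[str], current_deps: set[str],
                       header_to_label: dict[str, str]) -> list[str]:
    """Emit buildozer commands that make `target`'s deps match its includes."""
    wanted = set()
    for src in sources:
        for hdr in includes(src):
            label = header_to_label.get(hdr)
            if label and label != target:
                wanted.add(label)
    # Only deps the mapping knows about are candidates for removal.
    removable = current_deps & set(header_to_label.values())
    cmds = []
    for label in sorted(wanted - current_deps):
        cmds.append(f"buildozer {shlex.quote('add deps ' + label)} {shlex.quote(target)}")
    for label in sorted(removable - wanted):
        cmds.append(f"buildozer {shlex.quote('remove deps ' + label)} {shlex.quote(target)}")
    return cmds
```

Emitting commands rather than editing files directly is what makes the `git add -p`-style review possible: each line can be shown, accepted, or skipped before it runs.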

Yuval

David Bakin
Apr 19, 2021, 10:59:36 PM
to bazel-discuss
I'm a little surprised at this discussion because I thought one of the premises of having everything explicit in BUILD files with a functional-ish build system was that automagically generating dependencies during the build, even if done with compiler output, is fragile and detrimental to correctness and repeatability. And, thus, the pain of maintaining BUILD files is just one of the things that's a given with Bazel, against which you have all the other benefits. Yuval mentions doing this kind of generation to facilitate a migration, which is very reasonable, but I get the impression the rest of this discussion is going in the direction of auto-generating BUILD files all the time.

Purpose of this post: To find out if my understanding of this is correct, vis-à-vis Bazel.  Thanks! -- David

Gregg Reynolds
Apr 20, 2021, 8:00:32 AM
to David Bakin, bazel-discuss

It's unavoidable.  Dependency management and build (i.e. construction) management are two different things, even if most or at least many build tools glom them together.

The problem is that you do not always know what the depgraph is until build time, and during development the depgraph may change.  As a simple example, imagine any kind of preprocessing. In an OCaml project you might have .mll files that must be processed by ocamllex, and .mly files to be processed by ocamlyacc.  In a Coq project you might have .mlg (grammar) files that must be preprocessed by the 'coqpp' tool. There's a whole class of "PPX" (preprocessor extension) tasks that must run to generate source files to be compiled.  In general you might have all kinds of source code generation processes, and you only discover the dependencies of the preprocessing output files after you do the preprocessing.
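
The ordering problem can be made concrete for the two standard OCaml preprocessors: `ocamllex` turns `foo.mll` into `foo.ml`, and `ocamlyacc` turns `foo.mly` into `foo.ml` and `foo.mli`. A generator therefore has to compute the post-preprocessing file set, run the preprocessors, and only then hand those files to the dependency analyzer. This sketch covers only the first of those steps; PPX rewriters and other generators would need their own entries and are not modeled.

```python
from __future__ import annotations

from pathlib import PurePosixPath

# Outputs of the standard OCaml preprocessors, by input extension:
# ocamllex foo.mll -> foo.ml; ocamlyacc foo.mly -> foo.ml and foo.mli.
PP_OUTPUTS = {".mll": [".ml"], ".mly": [".ml", ".mli"]}


def generated_sources(srcs: list[str]) -> dict[str, list[str]]:
    """Map each preprocessed input to the source files it will generate."""
    out = {}
    for s in srcs:
        p = PurePosixPath(s)
        if p.suffix in PP_OUTPUTS:
            out[s] = [str(p.with_suffix(ext)) for ext in PP_OUTPUTS[p.suffix]]
    return out


def compile_inputs(srcs: list[str]) -> list[str]:
    """Files the dependency analyzer must see: hand-written ones plus generated ones."""
    gen = generated_sources(srcs)
    result = {s for s in srcs if s not in gen}
    for outs in gen.values():
        result.update(outs)
    return sorted(result)
```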

Of course if the dependency structure of your code is 100% stable, then you can do the preprocessing once.  But that's unrealistic; during development, devs will add and remove dependencies all the time, so one way or another the build files must be edited. In principle, if you change a file that must be preprocessed you can probably figure out which deps you are adding or removing and you can edit the build files accordingly by hand.  But if that can be automated, so much the better.

HTH

Gregg R
