Generate dependencies?


Magnus Andersson

Aug 28, 2017, 9:56:06 AM
to bazel-discuss
Hi,

Today we are using a GNU Make-based build system for a large code base. Our users write build configuration files, which are pretty similar to the Bazel BUILD files. The build system generates GNU Makefiles from these configuration files.

There are lots of custom tools in our build chain. Here is an example of a GNU Make target with a custom tool:

build/generated.c: model.xml
	/path/to/some-funny-script.sh -o build/generated.c model.xml


This target will only be rebuilt if file "model.xml" is updated. But the target might read configuration files, and additional files that model.xml refers to. Script "some-funny-script.sh" might be developed by another team at another site. For a user it is hard to maintain a complete list of all dependencies, to make sure that it is rebuilt properly.

To solve this problem we have written a tool called "depgen", which is based on strace. If a command is prefixed with "depgen ", the tool will trace all file accesses and write the paths to all read and written files to a ".d" makefile fragment, similar to the ".d" files that "gcc -M" produces. When rebuilding, make will parse the ".d" file and rebuild if any read or written file has been updated:

-include build/generated.c.d
build/generated.c: model.xml
	depgen /path/to/some-funny-script.sh -o build/generated.c model.xml
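
A rough sketch of the idea behind a depgen-like tool, in Python. The real depgen is not shown in this thread, so everything here is an assumption: the function names, the subset of strace log lines handled (for example `openat(AT_FDCWD, "model.xml", O_RDONLY) = 3`), the choice to skip failed calls, and the choice to leave the target out of its own prerequisite list.

```python
import re

# Matches open()/openat() lines from an strace log, for example:
#   openat(AT_FDCWD, "model.xml", O_RDONLY) = 3
#   1234 open("cfg/tool.conf", O_RDONLY) = 4
# Assumption: only this subset of strace's output format is handled.
OPEN_RE = re.compile(
    r'^(?:\[pid\s+\d+\]\s+|\d+\s+)?open(?:at)?\('
    r'(?:[^,"]+,\s*)?"([^"]+)",.*\)\s*=\s*(-?\d+)'
)


def accessed_files(strace_lines):
    """Return the sorted set of paths that were opened successfully."""
    paths = set()
    for line in strace_lines:
        match = OPEN_RE.match(line)
        # A negative return value means the call failed (e.g. ENOENT).
        if match and int(match.group(2)) >= 0:
            paths.add(match.group(1))
    return sorted(paths)


def make_fragment(target, paths):
    """Render a ".d" fragment listing paths as prerequisites of target."""
    deps = [p for p in paths if p != target]
    lines = ["%s: %s" % (target, " ".join(deps))]
    # Empty rules keep make from failing if a listed file is later deleted.
    lines += ["%s:" % p for p in deps]
    return "\n".join(lines) + "\n"
```

Calling `make_fragment("build/generated.c", accessed_files(log_lines))` would give the text to store in `build/generated.c.d`.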

If we migrate to Bazel, we need to write custom macros and rules for our custom tools. Is it possible to generate dependencies in a custom rule, in a similar way as we do with "depgen" in our GNU Makefiles?

BR,
Magnus

Marcel Hlopko

Aug 30, 2017, 3:05:31 AM
to Magnus Andersson, bazel-discuss
Hi Magnus,

So the first run of the command depends on everything? And the .d file generated by depgen is there only as a performance optimization?



--
You received this message because you are subscribed to the Google Groups "bazel-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bazel-discus...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/bazel-discuss/fdcd22b6-80c5-4b12-bfe8-1205a34283b6%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
--
-- 
Marcel Hlopko | Software Engineer | hlo...@google.com | 

Google Germany GmbH | Erika-Mann-Str. 33  | 80636 München | Germany | Geschäftsführer: Geschäftsführer: Paul Manicle, Halimah DeLaine Prado | Registergericht und -nummer: Hamburg, HRB 86891

Magnus Andersson

Aug 30, 2017, 9:08:00 AM
to bazel-discuss, magnus.a...@gmail.com
Hi Marcel,

In the first build, make only sees the explicit dependency on "model.xml", but builds the target since file "build/generated.c" does not exist.

In the next build, make sees both the dependency on "model.xml" and all other dependencies in the generated ".d" file, and rebuilds if any of these dependencies have been updated.

The purpose is to avoid maintaining a list of all dependencies manually. Instead the target (via "depgen" or "gcc -M") generates a list of all dependencies. This makes sure that targets are rebuilt properly, but makes builds somewhat slower since more file timestamps need to be checked.
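
The rebuild decision described above can be sketched in Python as a timestamp comparison. This is a simplification of what make does, not its actual algorithm: `parse_fragment` and `needs_rebuild` are made-up names, only single-line "target: dep dep" rules are handled, and prerequisites that no longer exist are simply ignored.

```python
import os


def parse_fragment(text):
    """Map each target in a simple ".d" fragment to its prerequisite list."""
    deps = {}
    for line in text.splitlines():
        if ":" not in line or line.startswith("\t"):
            continue  # skip blank lines and recipe lines
        target, _, rest = line.partition(":")
        deps.setdefault(target.strip(), []).extend(rest.split())
    return deps


def needs_rebuild(target, prerequisites):
    """True if target is missing or older than any existing prerequisite."""
    if not os.path.exists(target):
        return True  # first build: the target does not exist yet
    built = os.path.getmtime(target)
    return any(
        os.path.exists(p) and os.path.getmtime(p) > built
        for p in prerequisites
    )
```

The longer the prerequisite list, the more timestamps have to be checked, which is the slowdown mentioned above.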

I guess my question can be rephrased to "In a custom rule, can dependencies be generated or does the user need to list all input files in the "srcs" argument?"

BR,
Magnus

Marcel Hlopko

Sep 4, 2017, 3:50:40 AM
to Magnus Andersson, bazel-discuss
Sorry for not being clearer. In the first run, make has access to everything, and since some-funny-script.sh is maintained by some other team, it's possible that it will access everything. That's what I meant by my question. But I think you answered my clumsy question anyway :)

So I think what you want cannot be done easily in Bazel. There are multiple problems:
* You cannot run depgen in the analysis phase (where dependencies are calculated). That phase is completely functional and isolated: it cannot run commands or read files, it can only create the action graph with computed dependencies.
* We have .d file parsing and handling implemented, but only for C++ rules, and it happens behind the scenes; you cannot provide your own custom .d file.
* You could use a repository_rule to create .d files and produce BUILD files based on them. I assume it will be nontrivial, but this approach might be worth investigating.

But from my experience, maintaining dependencies explicitly is not a bottleneck at all, and you'd get all these nice features Bazel has. Can you maybe elaborate on why this is a problem? Do you have too many dependencies? Or do other teams not want to transition to Bazel?



Magnus Andersson

Sep 4, 2017, 8:06:11 AM
to bazel-discuss, magnus.a...@gmail.com
Yep, the problem is that we have too many dependencies. We have complex code generators in our build chain. They read a model file, which might include tens or hundreds of other model files. As a user it is hard to maintain a list of all these dependencies.

(You can compare it to C++ compilation rules. There a .cpp file includes .h files transitively. Here a model file includes model fragments transitively. As a user it is just as hard maintaining the list of all model fragment files, as maintaining a list of all included .h files.)
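
To make the comparison concrete, here is a small Python sketch of what "transitively" means for the model files. The include graph is passed in as a plain dictionary because the model file format is not described in this thread; the function name and the file names used with it are invented for illustration.

```python
def transitive_includes(root, includes):
    """Return every file reachable from root via includes, excluding root.

    includes maps a file name to the files it includes directly.
    """
    seen = set()
    stack = [root]
    while stack:
        current = stack.pop()
        for child in includes.get(current, ()):
            # Skip files already visited so include cycles terminate.
            if child not in seen and child != root:
                seen.add(child)
                stack.append(child)
    return sorted(seen)
```

With tens or hundreds of fragments, the result of this walk is the list a user would otherwise have to keep up to date by hand.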

Depgen is a very convenient feature. When we generate our makefiles, we just need to prefix our build recipes with "depgen " to make sure that the target will be rebuilt if any dependency is updated.

Note that we don't necessarily need to use .d files in Bazel. If dependencies can be added programmatically in some other way, it should work as well.

BR,
Magnus

Marcel Hlopko

Sep 4, 2017, 8:16:15 AM
to Magnus Andersson, bazel-discuss, lbe...@google.com, dsl...@google.com, dmar...@google.com
Let's summon the elders, since I'm out of ideas here :) +Lukács T. Berki +Dmitry Lomov +Damien Martin-Guillerez



Damien Martin-Guillerez

Sep 4, 2017, 8:41:12 AM
to Marcel Hlopko, Magnus Andersson, bazel-discuss, lbe...@google.com, dsl...@google.com
"* you cannot run dep-gen in the analysis phase (where dependencies are calculated), that phase is completely functional and isolated, it cannot run commands, it cannot read files, it can only create the action graph with computed dependencies."

With a Skylark remote repository you can actually do that; we use a hack like that to do integration tests in Bazel. You could run depgen as a step in a remote repository that depends only on the model file, then have it generate a bunch of BUILD files that declare the correct dependencies, so you don't rebuild everything every time.

However, if you have a lot of calls to depgen, that might be tricky.
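
The "generate a bunch of BUILD files" step could look roughly like the Python sketch below, which renders a genrule whose srcs list every discovered input. This is only an illustration under assumptions: `render_genrule` is a made-up helper, the tool label in the usage note is hypothetical, and how a repository rule would run depgen and write the resulting file is not shown.

```python
def render_genrule(name, out, main_src, extra_srcs, tool):
    """Render a Bazel genrule whose srcs list every discovered input."""
    # Keep the main source first, then the other inputs in a stable order.
    srcs = [main_src] + sorted(set(extra_srcs) - {main_src})
    src_lines = "".join('        "%s",\n' % s for s in srcs)
    return (
        "genrule(\n"
        '    name = "%s",\n'
        "    srcs = [\n%s    ],\n"
        '    outs = ["%s"],\n'
        '    tools = ["%s"],\n'
        '    cmd = "$(location %s) -o $@ $(location %s)",\n'
        ")\n"
    ) % (name, src_lines, out, tool, tool, main_src)
```

For example, `render_genrule("generated", "generated.c", "model.xml", ["a.xml", "b.xml"], "//tools:some-funny-script")` returns BUILD text in which all three XML files are declared inputs of the rule.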

Lukács T. Berki

Sep 4, 2017, 12:41:44 PM
to Damien Martin-Guillerez, Marcel Hlopko, Magnus Andersson, bazel-discuss, dsl...@google.com
I don't have anything smarter to say than what Damien said -- it's possible to beat remote repositories into implementing this, but it won't be pretty. Logic like .d file parsing is not exposed to Skylark, and the reason is that if you discover inputs as your action runs, that action cannot be sandboxed or executed remotely. In fact, even for C++, .d files should only narrow the set of dependencies, never extend it. At least that's the general idea, but looking at the code, it appears that .d files *are* able to extend that set :(

--
Lukács T. Berki | Software Engineer | lbe...@google.com | 

Google Germany GmbH | Erika-Mann-Str. 33  | 80636 München | Germany | Geschäftsführer: Paul Manicle, Halimah DeLaine Prado | Registergericht und -nummer: Hamburg, HRB 86891

Magnus Andersson

Sep 4, 2017, 1:20:56 PM
to bazel-discuss, dmar...@google.com, hlo...@google.com, magnus.a...@gmail.com, dsl...@google.com
Hmm, interesting. The ability to rebuild targets if a dependency is updated, without the need to list all dependencies manually, is a core feature in our build system and premium make implementations like ClearCase clearmake and ElectricCloud emake. (They use their own filesystem to audit the build and detect all file accesses.) I expected to find a clever solution to this problem in Bazel as well.

But if listing all .cpp and .h files in cc_library(...) works for C++ code, I suppose it should work for our model files as well. I have to give this a thought, I have not wrapped my mind around this idea yet :)

Austin Schuh

Sep 6, 2017, 3:12:57 AM
to Magnus Andersson, bazel-discuss, dmar...@google.com, hlo...@google.com, dsl...@google.com
Consider drawing parallels to how something like golang build support works.  The gazelle tool analyzes the .go files, and updates the BUILD files to match the build graph and dependency graph contained therein.

You should be able to do something similar.  Either teach the model building rule to verify that the dependency set provided matches the dependency set that it expected (and print out what changes are needed), or write a tool which auto-generates the BUILD file.  It may make sense to do both.  That'll both let you automate the work required to create the BUILD files, and will make sure they don't decay.  gazelle (or buildifier, I can't remember) has support for programmatically modifying BUILD files, which solves part of this problem.
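
The verification idea above (compare the declared dependency set with the set the tool actually used, and print the changes needed) might be sketched like this in Python. `check_declared_inputs` is an invented name and the message wording is arbitrary; how the discovered set is obtained (for example from a depgen-style trace) is left open.

```python
def check_declared_inputs(declared, discovered):
    """Compare declared inputs with the files a tool actually read.

    Returns a list of human-readable messages; an empty list means the
    declared set matches the discovered set.
    """
    declared, discovered = set(declared), set(discovered)
    messages = ["add to srcs: %s" % p for p in sorted(discovered - declared)]
    messages += ["remove from srcs: %s" % p for p in sorted(declared - discovered)]
    return messages
```

A rule or test could fail when the returned list is non-empty, which would keep hand-written or generated BUILD files from drifting.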

There's a huge amount of complexity and CPU time in something like emake spent on discovering the build graph automatically. Bazel instead chooses to enforce that all rules only use the dependencies that were specified. This keeps the build graph in sync with what dependencies are being used, which is the problem that emake and ClearCase are trying to solve. I find that the more constrained semantics of Bazel (in terms of dependencies) make it much easier to reason about the dependency graph, and also provide tools to query it.

Austin