Obtaining input and output locations in run_binary() rules


Ajith Ramanathan

May 26, 2021, 12:31:32 PM
to bazel-discuss
Hi.

I'm porting a build to Bazel that must run on both Windows and Linux.  One step in the build (that is currently done manually, and that I wish to automate) is to generate some C++ code from some specifications.  Specifically, we have several specification files and for each we produce a C++ header and source pair using a custom tool.  One thing to note is that I have a naming convention such that foo.data produces foo.generated.hpp and foo.generated.cpp.

In my build, I have a cc_binary() rule generating the tool, and then I use run_binary() to execute the tool.  The *.data files are specified in a filegroup().  My first attempt at the build was this:

cc_binary(name = "tool",  ... )

filegroup(
  name = "data",
  # Let's say we have foo.data and bar.data.
  srcs = glob(["*.data"]),  
)

run_binary(
  name = "generate_code_from_data",
  srcs = [":data"],
  outs = ["foo.generated.hpp",
          "foo.generated.cpp",
          "bar.generated.hpp",
          "bar.generated.cpp"],
  args = ["$(locations :data)"],
  # run_binary() takes a single label here, not a list.
  tool = ":tool",
)

This didn't quite work.  The tool ran fine, but it placed the outputs in locations that Bazel was not expecting, and I ended up with error messages like "output 'path/to/data/foo.generated.hpp' was not created".

My next attempt changed the generate_code_from_data rule to pass the input and output locations explicitly (only args differs):

run_binary(
  name = "generate_code_from_data",
  srcs = [":data"],
  outs = ["foo.generated.hpp",
          "foo.generated.cpp",
          "bar.generated.hpp",
          "bar.generated.cpp"],
  args = ["--in=$(locations :data)",
          "--out=$(location foo.generated.hpp)",
          "--out=$(location foo.generated.cpp)",
          "--out=$(location bar.generated.hpp)",
          "--out=$(location bar.generated.cpp)"],
  tool = ":tool",
)

The tool then uses the naming convention to match inputs and outputs.  This feels a little awkward because:
1) There is an asymmetry in the way inputs and outputs are defined.  I tried using the make variables that genrule() understands, but they don't seem to work or I have the wrong syntax (for example, --out=$OUTS and --out=$(location OUTS) didn't seem to work).
2) I have to do some parameter matching to pair input and output paths.

So I changed it to a set of rules, one for each data file:

filegroup(name = "foo_data", srcs = ["foo.data"])

run_binary(
  name = "generate_foo_code_from_foo_data",
  srcs = [":foo_data"],
  outs = ["foo.generated.hpp",
          "foo.generated.cpp"],
  args = ["--in=$(locations :foo_data)",
          "--hppout=$(location foo.generated.hpp)",
          "--cppout=$(location foo.generated.cpp)"],
  tool = ":tool",
)

and similarly for bar.data.  I suppose I could simplify with list comprehensions:

[filegroup(name = "%s_data" % b,
           srcs = [":%s.data" % b]) for b in ["foo", "bar"]]

and so on.
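
Spelled out in full, the list-comprehension variant might look roughly like the sketch below. It assumes bazel-skylib is available as @bazel_skylib and that the tool accepts the --in, --hppout and --cppout flags used above. It also references each *.data file directly instead of going through a per-file filegroup(), and the target names are only illustrative.

```starlark
load("@bazel_skylib//rules:run_binary.bzl", "run_binary")

# One run_binary() target per specification file, relying on the
# foo.data -> foo.generated.{hpp,cpp} naming convention.
[
    run_binary(
        name = "generate_%s_code" % base,
        srcs = ["%s.data" % base],
        outs = [
            "%s.generated.hpp" % base,
            "%s.generated.cpp" % base,
        ],
        args = [
            "--in=$(location %s.data)" % base,
            "--hppout=$(location %s.generated.hpp)" % base,
            "--cppout=$(location %s.generated.cpp)" % base,
        ],
        tool = ":tool",
    )
    for base in ["foo", "bar"]
]
```

With this shape, each input is paired with its outputs in the rule itself, so the tool no longer has to match parameters by name.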

I have two questions:
1) Is the approach I'm taking (with or without the list comprehensions) the most natural way to express this build step in Bazel?  My first attempt feels pretty unnatural.
2) One thing I need to extract is the path to the generated hpp relative to the workspace root.  $(location foo.generated.hpp) produces something like bazel-out/k8-fastbuild/bin/path/to/data/foo.generated.hpp.  Is there some make variable that run_binary() understands that I could pass in, or should I just bake the path in manually (either in code or in the rule) as I know the directory structure?

Herrmann, Andreas

May 27, 2021, 4:25:06 AM
to Ajith Ramanathan, bazel-discuss
> Is there some make variable that run_binary() understands that I could pass in

Looking at the implementation of run_binary, it doesn't expand make variables, only locations: https://github.com/bazelbuild/bazel-skylib/blob/c6f6b5425b232baf5caecc3aae31d49d63ddec03/rules/run_binary.bzl#L29-L30
So, make variables are not available.

> One thing I need to extract is the path to the generated hpp relative to the workspace root.

Bazel provides different forms of location expansion, including $(execpath ...) and $(rootpath ...). The latter should give you this path relative to the workspace root, without the bazel-out/... prefix.
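
As a rough sketch of how the two forms could be combined in the per-file rule: $(execpath ...) is used where the tool needs to read or write a file, and $(rootpath ...) where it only needs the workspace-relative name. The --include-name flag is hypothetical and stands in for however the tool would receive that path; the other flags are the ones from the original post.

```starlark
load("@bazel_skylib//rules:run_binary.bzl", "run_binary")

run_binary(
    name = "generate_foo_code_from_foo_data",
    srcs = ["foo.data"],
    outs = [
        "foo.generated.hpp",
        "foo.generated.cpp",
    ],
    args = [
        # Paths the tool actually opens: relative to the execution root,
        # e.g. bazel-out/k8-fastbuild/bin/path/to/data/foo.generated.hpp.
        "--in=$(execpath foo.data)",
        "--hppout=$(execpath foo.generated.hpp)",
        "--cppout=$(execpath foo.generated.cpp)",
        # Hypothetical flag: the header's path without the bazel-out
        # prefix, expected to look like path/to/data/foo.generated.hpp.
        "--include-name=$(rootpath foo.generated.hpp)",
    ],
    tool = ":tool",
)
```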

If this gets too unwieldy, it may be worth writing a custom rule for this task instead of using an existing one like run_binary.

Inside a rule implementation you have access to input and output files as File objects. The short_path attribute should give you this path relative to the workspace root.
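
A minimal sketch of such a rule is below, not a definitive implementation. It assumes the --in, --hppout and --cppout flags from the original post; the rule name codegen, the file name codegen.bzl, the mnemonic and the --include-name flag are all hypothetical.

```starlark
# codegen.bzl (hypothetical file name)

def _codegen_impl(ctx):
    src = ctx.file.src

    # foo.data -> foo, following the naming convention from the post.
    base = src.basename[:-len(".data")]
    hpp = ctx.actions.declare_file(base + ".generated.hpp")
    cpp = ctx.actions.declare_file(base + ".generated.cpp")

    ctx.actions.run(
        executable = ctx.executable.tool,
        inputs = [src],
        outputs = [hpp, cpp],
        arguments = [
            # File.path is relative to the execution root.
            "--in=" + src.path,
            "--hppout=" + hpp.path,
            "--cppout=" + cpp.path,
            # Hypothetical flag: File.short_path omits the bazel-out prefix.
            "--include-name=" + hpp.short_path,
        ],
        mnemonic = "CodegenFromData",
    )
    return [DefaultInfo(files = depset([hpp, cpp]))]

codegen = rule(
    implementation = _codegen_impl,
    attrs = {
        "src": attr.label(allow_single_file = [".data"], mandatory = True),
        "tool": attr.label(executable = True, cfg = "exec", mandatory = True),
    },
)
```

A BUILD file would then load codegen from this file and declare one target per specification, for example codegen(name = "foo_code", src = "foo.data", tool = ":tool").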

Best, Andreas


Ajith Ramanathan

May 27, 2021, 8:02:07 PM
to Herrmann, Andreas, bazel-discuss
Thank you for the very helpful response.

This isn't a task that occurs frequently enough for me to think about
a custom rule, but I'll keep that advice in mind!