How to express targets post-processing with Bazel?


Konstantin

Jun 11, 2022, 11:47:49 PM
to bazel-discuss

Hello,

I am struggling to express a particular build behavior and am looking for leads on how it can be done with Bazel.

Our build produces many binaries, which can be requested in any combination. Some of those binaries require post-processing, for which we have a custom rule.

First problem: how to make sure that particular targets get post-processed when they are requested or are a dependency of something requested?

Next level of complexity: binaries are statically assigned to groups, and post-processing must handle all binaries in a group in one shot.

Example of the problem: A and B are assigned to one group. When either of them (or both) is requested, or either of them is a dependency of the requested target, we need to post-process both of them together, even if only one was requested.

We can implement a rule such as "postprocess_group_AB" which would depend on both A and B, but how do we get it executed when only A (for instance) is requested?

Any ideas?

Thank you!

Konstantin

Alex Humesky

Jun 14, 2022, 12:06:33 AM
to Konstantin, bazel-discuss
For the first problem, do you mean something like this:

post_process(name = "post_processed_bin", src = "bin")
cc_binary(name = "bin")

and you want to make sure that other targets can depend only on "post_processed_bin" and not on "bin" by itself?

This is a problem we've run into internally, and I'm not sure that we have a direct solution. One way to address this is with visibility:

post_process(name = "post_processed_bin", src = "bin")
cc_binary(name = "bin", visibility = ["//visibility:private"])

Then nothing outside the package can depend on "bin", and other packages have to depend on "post_processed_bin". (Really, you might then want to name the post-processed target "bin" and the other one "_bin" or something like that.)

This would require that the binaries are in separate packages. And things within the package can still depend on "bin". And visibility is only about dependencies, you can always build any target from the command line.

We've also talked about having "internal" targets that are private to macros, so that you could declare those two targets in a macro, and only allow the post processed one to be accessible outside the macro. That hasn't been implemented though.
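Until something like that exists, the visibility approach can be approximated with an ordinary macro. The following is a minimal sketch, not a confirmed implementation: the macro name post_processed_cc_binary, the file name post_process.bzl, and the single-label "src" attribute of post_process are assumptions for illustration, and newer Bazel versions may require loading cc_binary from rules_cc instead of using native.cc_binary.

```starlark
# post_processed_binary.bzl (hypothetical file name)
load(":post_process.bzl", "post_process")  # assumed location of the custom rule

def post_processed_cc_binary(name, visibility = None, **kwargs):
    """Declares a private raw binary plus a public post-processed target."""
    raw_name = "_" + name

    # The raw binary is private, so other packages cannot depend on it.
    native.cc_binary(
        name = raw_name,
        visibility = ["//visibility:private"],
        **kwargs
    )

    # The post-processed target takes the public name and visibility.
    post_process(
        name = name,
        src = ":" + raw_name,
        visibility = visibility,
    )
```

As noted above, this only restricts dependencies from other packages: targets in the same package can still depend on the raw binary, and it can still be built directly from the command line.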

For the 2nd question, it's a little unclear to me exactly how things are set up.

Is it something like this:

post_process(name = "group_1", srcs = ["bin_a", "bin_b"])
cc_binary(name = "bin_a")
cc_binary(name = "bin_b")

That is, by "statically assigned to groups", do you mean "the groupings are encoded in the build files"?

And does post processing depend on the content of bin_a and bin_b? Or is it more like they need to be signed with the same key (i.e., some third piece of information)? (I assume the former in the example below.)

And in "When either of them (or both) are requested or either of them is a dependency of the target requested", by requested do you mean "requested to be built on the command line"? Any target can always be requested on the command line, so I'm not sure there's a way to prevent that. I recall that you may have a lot of transitions, so one safeguard you could put in is to add a flag where if it's not set, an action fails, and set that flag in your transitions. You could still set that flag from the command line, but then you'd have to know what you're doing. It would also require that you add that to your rule, and if you're using cc_library or another rule set, then that doesn't work so well.
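One way the flag-based safeguard could look is sketched below. It is only an illustration under stated assumptions: the flag label //build_flags:via_group, the file name guard.bzl, and the rule name "guarded" are all hypothetical, and the flag is assumed to be a bool_flag from bazel_skylib. The rule registers an action that fails unless the flag is set, and the transition is what would set it on the path through the group target.

```starlark
# guard.bzl (hypothetical file; all names below are illustrative)
load("@bazel_skylib//rules:common_settings.bzl", "BuildSettingInfo")

# Assumed to be declared in //build_flags/BUILD as:
#   bool_flag(name = "via_group", build_setting_default = False)
_FLAG = "//build_flags:via_group"

def _set_flag_impl(settings, attr):
    # Every target reached through this transition sees the flag as True.
    return {_FLAG: True}

via_group_transition = transition(
    implementation = _set_flag_impl,
    inputs = [],
    outputs = [_FLAG],
)

def _guarded_impl(ctx):
    out = ctx.actions.declare_file(ctx.label.name + ".guarded")
    if ctx.attr._via_group[BuildSettingInfo].value:
        # Flag set (normally by the transition): pass the input through.
        ctx.actions.symlink(output = out, target_file = ctx.file.src)
    else:
        # Flag not set: register an action that always fails.
        ctx.actions.run_shell(
            outputs = [out],
            command = "echo '%s must be built through its group' >&2; exit 1" % ctx.label,
        )
    return [DefaultInfo(files = depset([out]))]

guarded = rule(
    implementation = _guarded_impl,
    attrs = {
        "src": attr.label(allow_single_file = True, mandatory = True),
        "_via_group": attr.label(default = _FLAG),
    },
)
```

The group rule would then attach via_group_transition to its dependency attribute (for example attr.label_list(cfg = via_group_transition)); depending on the Bazel version, that rule may also need the function-transition allowlist attribute.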

For "post-process them both together even if only one was requested", does this mean that even if the targets are grouped, you need to be able to depend on individual members of that group? I think you can use attr.output_list (and a macro) to create labels for outputs of rules, like this:

defs.bzl:
def _post_process_impl(ctx):
  # Intermediate file shared by every per-file action. It is prefixed with the
  # target name so that two post_process targets in one package do not collide.
  total_lines = ctx.actions.declare_file(ctx.label.name + "_total_lines")
  ctx.actions.run_shell(
    inputs = ctx.files.srcs,
    outputs = [total_lines],
    command = "cat %s | wc -l > %s" % (" ".join([s.path for s in ctx.files.srcs]), total_lines.path),
  )

  for src, output in zip(ctx.files.srcs, ctx.outputs.outs):
    ctx.actions.run_shell(
      inputs = [src, total_lines],
      outputs = [output],
      # processing depends on the content of all inputs
      command = "cp {src} {out} && cat {total_lines} >> {out}".format(
          src = src.path, out = output.path, total_lines = total_lines.path),
    )

  return [DefaultInfo(files = depset(ctx.outputs.outs))]

_post_process = rule(
  implementation = _post_process_impl,
  attrs = {
    "srcs": attr.label_list(allow_files = True),
    "outs": attr.output_list(),
  },
)

def post_process(name, srcs):
  outs = ["%s_processed" % src for src in srcs]  # order matters here for zip() above
  _post_process(
    name = name,
    srcs = srcs,
    outs = outs,
  )
BUILD:
load(":defs.bzl", "post_process")

post_process(
  name = "post",
  srcs = ["file1", "file2"],
)

genrule(
  name = "pkg1",
  srcs = [":file1_processed"],
  outs = ["pkg1.zip"],
  tools = ["@bazel_tools//tools/zip:zipper"],
  cmd = "$(location @bazel_tools//tools/zip:zipper) c $@ $<",
)

genrule(
  name = "pkg2",
  srcs = [":file2_processed"],
  outs = ["pkg2.zip"],
  tools = ["@bazel_tools//tools/zip:zipper"],
  cmd = "$(location @bazel_tools//tools/zip:zipper) c $@ $<",
)
file1:
line 1
line 2
file2:
line a
line b
line c


$ bazel build pkg1 pkg2
INFO: Analyzed 2 targets (7 packages loaded, 21 targets configured).
INFO: Found 2 targets...
INFO: Elapsed time: 0.322s, Critical Path: 0.05s
INFO: 6 processes: 1 internal, 5 linux-sandbox.
INFO: Build completed successfully, 6 total actions

$ zcat bazel-bin/pkg1.zip
line 1
line 2
5

$ zcat bazel-bin/pkg2.zip
line a
line b
line c
5



Konstantin

Jun 14, 2022, 3:58:49 PM
to bazel-discuss

A trivial example of target post-processing would be copying the produced binaries to a designated output folder. For example:

               bazel build t1 t2 t3

should not only build the targets t1, t2 and t3 and their dependencies, but also invoke post-processing (in some form) to copy the binaries of t1, t2 and t3 to the designated folder, and the same must happen for all of their dependencies.

The best I can think of would be to use an aspect from the command line, i.e.

               bazel build t1 t2 t3 --aspects postproc.bzl%postproc_aspect

and let the aspect post-process all given targets and their dependencies.
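A rough sketch of what such an aspect might look like follows. It is not a tested PoC: the provider name PostprocInfo, the output group name postproc_inputs, and the assumption that dependencies are reachable through an attribute called "deps" holding a list of targets are all illustrative. The sketch only collects the files of each target and its dependencies; the actual copy action is left out.

```starlark
# postproc.bzl (file and aspect names taken from the command line above)

# Hypothetical provider carrying the files collected so far.
PostprocInfo = provider(fields = {"files": "depset of files to post-process"})

def _postproc_aspect_impl(target, ctx):
    # Gather what the aspect already collected on the dependencies.
    transitive = [
        dep[PostprocInfo].files
        for dep in getattr(ctx.rule.attr, "deps", [])
        if PostprocInfo in dep
    ]

    # Add the default outputs of the target the aspect is visiting.
    files = depset(transitive = [target[DefaultInfo].files] + transitive)
    return [
        PostprocInfo(files = files),
        # Exposed as an output group so it can be requested at build time.
        OutputGroupInfo(postproc_inputs = files),
    ]

postproc_aspect = aspect(
    implementation = _postproc_aspect_impl,
    attr_aspects = ["deps"],  # propagate along "deps" edges
)
```

Assuming postproc.bzl sits in the workspace root package, the collected files could be requested with something like: bazel build t1 t2 t3 --aspects //:postproc.bzl%postproc_aspect --output_groups=+postproc_inputs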

This may work (I have not implemented a PoC yet), but unfortunately it falls apart when Part 2 kicks in: for some targets post-processing is more complex, and for some targets it is interconnected. For example:

Requirement: targets t1 and t2 must always be post-processed together, in one shot.

If neither target is requested then no post-processing is necessary.

If either (or both) of the targets is requested, or if one (or both) of them is a dependency of something requested, then we need to invoke the post-processing action on both t1 and t2, even if only one is needed.

-- by "statically assigned to groups", do you mean "the groupings are encoded in the build files"?

Yes, it is known upfront and never changes.

-- And does post processing depend on the content of bin_a and bin_b?

Yes. It calculates a combined hash of all binaries in one post-processing group.
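For a combined hash like this, the attr.output_list idea from the earlier reply could be collapsed into a single action, so that requesting any one processed output runs the step for the whole group. This is only a sketch under assumptions: the rule name group_post_process is made up, the real post-processing is replaced by a plain copy, and sha256sum is assumed to be available where actions execute.

```starlark
# group_post_process.bzl (hypothetical file and rule names)

def _group_post_process_impl(ctx):
    srcs = ctx.files.srcs
    outs = ctx.outputs.outs
    if not srcs or len(srcs) != len(outs):
        fail("srcs must be non-empty and match outs in length")

    group_hash = ctx.actions.declare_file(ctx.label.name + ".sha256")

    # Stand-in for the real processing: copy each binary to its output.
    copies = " && ".join([
        "cp '%s' '%s'" % (s.path, o.path)
        for s, o in zip(srcs, outs)
    ])

    # One action produces every output, so building any single output
    # forces the whole group to be processed together.
    ctx.actions.run_shell(
        inputs = srcs,
        outputs = outs + [group_hash],
        command = "cat %s | sha256sum | cut -d' ' -f1 > '%s' && %s" % (
            " ".join(["'%s'" % s.path for s in srcs]),
            group_hash.path,
            copies,
        ),
    )
    return [DefaultInfo(files = depset(outs + [group_hash]))]

group_post_process = rule(
    implementation = _group_post_process_impl,
    attrs = {
        "srcs": attr.label_list(allow_files = True),
        "outs": attr.output_list(),
    },
)
```

With a declaration such as group_post_process(name = "group_AB", srcs = [":A", ":B"], outs = ["A_processed", "B_processed"]), a target that depends only on ":A_processed" would still cause both A and B to be built and hashed together. It does not stop anyone from depending on or building ":A" directly.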

 Konstantin
