Is there a way to deliberately ignore changes in particular sources?


Konstantin

Sep 11, 2021, 10:29:46 PM
to bazel-discuss
I understand this question is rather "heretical" given Bazel's whole hermeticity paradigm, but I still have to ask: is there a way to exclude particular C++ header files from the dirtiness check, so that changes to those files would NOT cause rebuilds of the dependent targets?

Let me explain why I am asking. We have a huge C++ codebase which uses lots of enums. Those enums are defined in (generated) header files, and those headers are included pretty much everywhere, so any change to the enums causes a full rebuild.

From experience we know that although those enum headers may be updated multiple times a day, in the majority of cases the change only adds new item(s) to an enum list. Enums rarely get deleted or shuffled. If we can detect that a particular enum header only had something added, we can skip compiling all the existing code which relies on the existing enums. Only new code which uses the new enums needs to be compiled, and it would be, because it lives in new or updated sources.
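
To make the "only something got added" check concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the function names (enum_items, is_append_only) are hypothetical, the parsing is a simplified regex that handles plain `enum` / `enum class` declarations without macros, nested braces, or anonymous enums, and it deliberately errs on the side of reporting "not append-only".

```python
import re


def enum_items(header_text):
    """Map each named enum in the header to its enumerators, in order.

    Simplified on purpose (an assumption of this sketch): comments are
    stripped with a regex; macros, nested braces and anonymous enums are
    not handled.
    """
    text = re.sub(r"//[^\n]*|/\*.*?\*/", "", header_text, flags=re.S)
    pattern = r"\benum\s+(?:class\s+|struct\s+)?(\w+)[^{;]*\{([^}]*)\}"
    result = {}
    for match in re.finditer(pattern, text):
        name, body = match.group(1), match.group(2)
        # Normalize whitespace so "A = 1" and "A  =  1" compare equal.
        items = [" ".join(item.split()) for item in body.split(",")]
        result[name] = [item for item in items if item]
    return result


def is_append_only(old_header, new_header):
    """True if every old enum still starts with exactly its old enumerators."""
    old, new = enum_items(old_header), enum_items(new_header)
    for name, old_list in old.items():
        new_list = new.get(name)
        if new_list is None or new_list[:len(old_list)] != old_list:
            return False  # enum removed, or items deleted/reordered/changed
    return True
```

A check like this would have to run outside of the compile actions themselves (for example in whatever step regenerates the headers), since it needs both the old and the new version of the header.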

This is why I am looking for a way to (conditionally) exclude some source files (headers) from triggering a rebuild of the targets that depend on them when they change. I understand it is ugly as sin, but it gives an incremental build speedup we cannot pass up.

Any advice would be highly appreciated!
Konstantin 

Alex Humesky

Sep 17, 2021, 7:20:45 PM
to Konstantin, bazel-discuss
I think some of the pieces are available in bazel, but maybe not a complete solution.

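Looking around, there is the unused_inputs_list parameter of ctx.actions.run(). It names a file in which the action lists the inputs it did not actually use, and bazel then stops treating changes to those inputs as a reason to rerun the action.
(I used the same underlying mechanisms at a much lower level, i.e. far away from Starlark, here: https://github.com/bazelbuild/bazel/commit/226ad7f1e2ca26bb41d4bd7ee6440fb73a564add)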

With that, you could create a rule which copies a file, and then ignores subsequent changes to that file:

BUILD:

load(":defs.bzl", "ignore_changes")

sh_binary(
  name = "copy",
  srcs = ["cp.sh"],
  visibility = ["//visibility:public"],
)

ignore_changes(
  name = "header",
  src = "main_changes_ignored.h",
  out = "main.h"
)

cc_binary(
  name = "main",
  srcs = [
    "main.cc",
    "main.h",
  ],
)

defs.bzl:

def _ignore_changes_impl(ctx):
  # File in which the action reports the inputs it did not need.
  unused_inputs_file = ctx.actions.declare_file(ctx.label.name + ".unused_inputs")
  src = ctx.files.src[0]
  ctx.actions.run(
    inputs = [src],
    # Inputs listed in this file are ignored when bazel decides whether to rerun the action.
    unused_inputs_list = unused_inputs_file,
    outputs = [ctx.outputs.out, unused_inputs_file],
    executable = ctx.executable._copy,
    # cp.sh receives: source, destination, unused inputs file.
    arguments = [
      ctx.actions.args()
        .add(src).add(ctx.outputs.out).add(unused_inputs_file)],
  )
  return [DefaultInfo(files = depset([ctx.outputs.out]))]

ignore_changes = rule(
  implementation = _ignore_changes_impl,
  attrs = {
    "src": attr.label(mandatory = True, allow_files = True),
    "out": attr.output(mandatory = True),
    "_copy": attr.label(default = "//:copy", cfg = "exec", executable = True),
  }
)


main.cc:

#include <stdio.h>
#include "main.h"

int main() {
  printf("foo is %d\n", foo);
  return 0;
}


main_changes_ignored.h:

#define foo 1

cp.sh:

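#!/bin/bash
# Copy the source header ($1) to the declared output ($2).
cp "$1" "$2"
# Report the source as unused ($3 is the unused inputs file), so later changes to it are ignored.
echo "$1" > "$3"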


On the first build, main will print "foo is 1". Changing the header file afterwards will not result in main being rebuilt. Deleting the intermediate output (bazel-bin/main.h) will cause it to rebuild:

$ bazel run main
Starting local Bazel server and connecting to it...
INFO: Analyzed target //:main (17 packages loaded, 85 targets configured).
INFO: Found 1 target...
Target //:main up-to-date:
  bazel-bin/main
INFO: Elapsed time: 4.100s, Critical Path: 0.11s
INFO: 10 processes: 7 internal, 3 linux-sandbox.
INFO: Build completed successfully, 10 total actions
INFO: Build completed successfully, 10 total actions
foo is 1

$ echo "#define foo 2" > main_changes_ignored.h

$ bazel run main
INFO: Analyzed target //:main (0 packages loaded, 0 targets configured).
INFO: Found 1 target...
Target //:main up-to-date:
  bazel-bin/main
INFO: Elapsed time: 0.154s, Critical Path: 0.00s
INFO: 1 process: 1 internal.
INFO: Build completed successfully, 1 total action
INFO: Build completed successfully, 1 total action
foo is 1

$ rm -f bazel-bin/main.h

$ bazel run main
INFO: Analyzed target //:main (0 packages loaded, 0 targets configured).
INFO: Found 1 target...
Target //:main up-to-date:
  bazel-bin/main
INFO: Elapsed time: 0.171s, Critical Path: 0.06s
INFO: 4 processes: 1 internal, 3 linux-sandbox.
INFO: Build completed successfully, 4 total actions
INFO: Build completed successfully, 4 total actions
foo is 2


The downside to this, of course, is that you have to know to manually delete the output file to get bazel to rebuild it and the downstream actions, and you have to know when to do that. I'm also not sure offhand how this would interact with remote caching and remote execution (if I had to guess, I'd say unexpected or undesirable things would happen).

This also doesn't sound like it solves the whole problem, because it sounds like you're after something more programmatic / automatic. What seems to be missing here is bazel having some idea of the previous state of the file (i.e. actions aren't told what changed). Two things come to mind here:
1) workers, which could store state, and
2) the workspace status command, which could query against the checked-in state of the file and send that information to an action via stamping

Those are pretty hacky though, and the considerations above about remote caching and remote execution might apply here too.
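
As a rough illustration of option 2, a workspace status command is a program that prints one "KEY value" pair per line, and keys starting with STABLE_ end up in the stable status file that stamped actions can read. The sketch below only builds those lines; the key names and the status_lines function are hypothetical, and fetching the checked-in text (for example with something like `git show HEAD:<path>`) is an assumption left out of the code.

```python
import hashlib


def status_lines(checked_in_text, working_text):
    """Build "KEY value" lines describing the header's checked-in state.

    Both key names are made up for this sketch; only the "KEY value" line
    format and the STABLE_ prefix are Bazel conventions.
    """
    base_digest = hashlib.sha256(checked_in_text.encode("utf-8")).hexdigest()
    modified = int(checked_in_text != working_text)
    return [
        "STABLE_ENUM_HEADER_BASE_SHA256 " + base_digest,
        "STABLE_ENUM_HEADER_MODIFIED %d" % modified,
    ]
```

A wrapper script would print these lines to stdout and be passed to bazel via --workspace_status_command. Whether the resulting stamping behaves well with remote caching is exactly the open question mentioned above.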

A more bazel-friendly approach might be to break the enums up into separate files as granularly as possible, and have other targets depend on the minimum set of enums they need (of course, I don't know the details here, whether this would work or not).


Konstantin

Sep 18, 2021, 12:35:25 AM
to bazel-discuss
Hi Alex, I was sure that if somebody was going to respond to this, it would probably be you! :-)

Great idea about unused_inputs_list! As a matter of fact, we are already using it to implement code generation where the list of inputs is not known before the generator runs. We first "oversubscribe" by globbing the whole tree where the inputs may come from, and then, after the code is generated, we "unsubscribe" from all the files in that glob which turned out not to be inputs. We do it with unused_inputs_list and it works like a charm. Too bad it did not occur to me to use unused_inputs_list for ignoring input changes when necessary.
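
The "oversubscribe, then unsubscribe" step can be sketched roughly as follows. This is not our actual generator, just a minimal illustration: the function names are hypothetical, and it assumes the unused inputs file is a plain list of paths, one per line.

```python
import os


def unused_inputs(candidate_inputs, used_inputs):
    """Return the candidates (the "oversubscribed" glob) that were never read."""
    used = {os.path.normpath(p) for p in used_inputs}
    return [p for p in candidate_inputs if os.path.normpath(p) not in used]


def write_unused_inputs_file(unused, path):
    """Write one unused input path per line to the file given to unused_inputs_list."""
    with open(path, "w") as f:
        f.writelines(p + "\n" for p in unused)
```

The generator would record which files it actually opened, call unused_inputs() with the full glob, and write the result to the file declared as unused_inputs_list in the rule.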

I understand your example with the "ignore_changes" copy operation, and I believe I can reshape it into an actual solution for our problem!

Thanks a lot!
Konstantin

Alex Humesky

Sep 20, 2021, 4:37:34 PM
to Konstantin, bazel-discuss