How to prevent expensive rule from running for queries?

Konstantin

Dec 20, 2021, 3:43:44 PM
to bazel-discuss
We have one particularly expensive-to-execute rule and only want it to run when it is explicitly requested. For the "build" command, the "manual" tag does the trick: the rule is not run by the build even when "..." is requested.

The problem is that we also use cquery, and it appears that, no matter what, that expensive rule runs as part of every cquery!

What can we do to prevent execution of the expensive rule during cquery?

Thank you!
Konstantin

Alexandre Rostovtsev

Dec 20, 2021, 4:53:56 PM
to Konstantin, bazel-discuss
Are you saying that the rule is expensive to load/analyze, so that `bazel cquery` itself bogs down?

If that is the case, the best solution is to fix the rule's implementation: it probably has an unnecessary quadratic inefficiency in its Starlark code somewhere. The classic example is repeatedly concatenating lists of dependencies/paths instead of using a depset.

Alternatively, you might use `except` (also known as `-`) in the query expression; see https://docs.bazel.build/versions/main/query.html#set-operations

--
You received this message because you are subscribed to the Google Groups "bazel-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bazel-discus...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/bazel-discuss/51c18350-88c2-4322-8281-a033b12ff693n%40googlegroups.com.

Alex Humesky

Dec 20, 2021, 5:41:38 PM
to Konstantin, bazel-discuss
The first thing that might come to mind is to subtract those targets, but I'm pretty sure having a target in the query (either directly or through a target pattern) will cause it to be analyzed in cquery:

fast_slow.bzl:

def _fast(ctx):
  print("this is fast")
  return []

fast = rule(
  implementation = _fast,
)

def _slow(ctx):
  print("this is slow")
  return []

slow = rule(
  implementation = _slow,
)


BUILD.bazel:

load(":fast_slow.bzl", "fast", "slow")
fast(name = "fast")
fast(name = "fast2")
slow(name = "slow", tags = ["manual"])

slow(name = "slow2", tags = ["manual"])

then:

$ bazel clean ; bazel cquery "... - attr(tags, '.*manual.*', ...)"
INFO: Starting clean (this may take a while). Consider using --async if the clean takes more than several minutes.
DEBUG: /usr/local/google/home/ahumesky/bazel-workspaces/read_file_repo_rule/fast_slow.bzl:4:8: this is fast
DEBUG: /usr/local/google/home/ahumesky/bazel-workspaces/read_file_repo_rule/fast_slow.bzl:4:8: this is fast
DEBUG: /usr/local/google/home/ahumesky/bazel-workspaces/read_file_repo_rule/fast_slow.bzl:12:8: this is slow
DEBUG: /usr/local/google/home/ahumesky/bazel-workspaces/read_file_repo_rule/fast_slow.bzl:12:8: this is slow
INFO: Analyzed 4 targets (4 packages loaded, 9 targets configured).
INFO: Found 4 targets...
//:fast (96d6638)
//:fast2 (96d6638)
INFO: Elapsed time: 0.269s
INFO: 0 processes.
INFO: Build completed successfully, 0 total actions

//:slow and //:slow2 are removed from the results as expected, but their implementation function still runs. The same happens when naming the targets explicitly:

$ bazel clean ; bazel cquery "... - //:slow - //:slow2"
INFO: Starting clean (this may take a while). Consider using --async if the clean takes more than several minutes.
DEBUG: /usr/local/google/home/ahumesky/bazel-workspaces/read_file_repo_rule/fast_slow.bzl:4:8: this is fast
DEBUG: /usr/local/google/home/ahumesky/bazel-workspaces/read_file_repo_rule/fast_slow.bzl:4:8: this is fast
DEBUG: /usr/local/google/home/ahumesky/bazel-workspaces/read_file_repo_rule/fast_slow.bzl:12:8: this is slow
DEBUG: /usr/local/google/home/ahumesky/bazel-workspaces/read_file_repo_rule/fast_slow.bzl:12:8: this is slow
INFO: Analyzed 4 targets (4 packages loaded, 9 targets configured).
INFO: Found 4 targets...
//:fast (96d6638)
//:fast2 (96d6638)
INFO: Elapsed time: 0.277s
INFO: 0 processes.
INFO: Build completed successfully, 0 total actions
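
To check from the command line whether the expensive implementation actually ran for a given cquery expression, one can count the DEBUG lines that Starlark print() writes, as in the two transcripts above. A minimal sketch, assuming bash and `bazel` on PATH; `count_impl_runs` is a hypothetical helper name. It runs `bazel clean` first, mirroring the commands above, because a cached analysis result may mean the implementation (and its print) does not run again; that also throws away the analysis cache, so it is only meant for diagnosis.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count how often a marker string printed by a rule's
# implementation function (e.g. print("this is slow") in fast_slow.bzl)
# appears while running `bazel cquery` on the given expression.
count_impl_runs() {
  local marker="$1" expr="$2"
  # Start from a clean state, as in the transcripts above; otherwise a cached
  # analysis result may mean the implementation (and its print) does not rerun.
  bazel clean >/dev/null 2>&1
  # Bazel reports DEBUG lines on stderr, so route stderr into the pipe and
  # discard stdout (the query results themselves). grep -c prints 0 and exits
  # nonzero when nothing matches; "|| true" keeps the function's status 0.
  bazel cquery "$expr" 2>&1 >/dev/null | grep -c -F -- "$marker" || true
}
```

For example, `count_impl_runs 'this is slow' "... - //:slow - //:slow2"` counts the "this is slow" lines; in the second transcript above there are two of them, even though neither slow target appears in the result.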

One way to approach this is to take advantage of the difference between bazel query and bazel cquery: bazel query only loads BUILD files and does not run any rule implementation functions, whereas cquery loads BUILD files and also runs those functions. So you might be able to run the query through bazel query first, to remove the expensive targets, and then pass the flat list of remaining targets to cquery:

$ TARGETS=$(bazel query "... - attr(tags, '.*manual.*', ...)")
Loading: 0 packages loaded
Loading: 1 packages loaded
Loading: 1 packages loaded


$ echo $TARGETS
//:fast2 //:fast

$ echo $TARGETS | tr ' ' '+'
//:fast2+//:fast

$ bazel cquery $(echo $TARGETS | tr ' ' '+')
DEBUG: /usr/local/google/home/ahumesky/bazel-workspaces/read_file_repo_rule/fast_slow.bzl:4:8: this is fast
DEBUG: /usr/local/google/home/ahumesky/bazel-workspaces/read_file_repo_rule/fast_slow.bzl:4:8: this is fast
INFO: Analyzed 2 targets (4 packages loaded, 7 targets configured).
INFO: Found 2 targets...
//:fast2 (96d6638)
//:fast (96d6638)
INFO: Elapsed time: 0.219s
INFO: 0 processes.
INFO: Build completed successfully, 0 total actions


This might not be perfect, though: there may be cases where it won't work, e.g. because your query needs to go through select()s.
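
The query-then-cquery workaround above can be wrapped in a small shell helper. A minimal sketch, assuming bash and labels without whitespace; `join_targets` and `cquery_without_manual` are hypothetical names, and the manual-tag filter is the same one used in the commands above. The select() caveat still applies, and a remaining target that depends on a manual target will still cause that target to be analyzed.

```shell
#!/usr/bin/env bash
# Hypothetical helpers wrapping the query-then-cquery workaround.

# join_targets: read labels one per line on stdin and print them as a single
# cquery union expression, e.g. "//:fast2 + //:fast". Blank lines are skipped.
join_targets() {
  local joined="" label
  # The "|| [ -n ... ]" keeps a final label that lacks a trailing newline.
  while IFS= read -r label || [ -n "$label" ]; do
    [ -z "$label" ] && continue
    if [ -z "$joined" ]; then
      joined="$label"
    else
      joined="$joined + $label"
    fi
  done
  printf '%s\n' "$joined"
}

# cquery_without_manual [PATTERN] [CQUERY_ARGS...]
# Step 1: `bazel query` (loading only, no rule implementations run) drops
#         targets tagged "manual" from PATTERN (default: ...).
# Step 2: the surviving labels go to `bazel cquery` as an explicit union, so
#         manual targets are not analyzed unless a remaining target needs them.
cquery_without_manual() {
  local pattern="${1:-...}"
  if [ $# -gt 0 ]; then shift; fi
  local targets
  targets=$(bazel query "$pattern - attr(tags, '.*manual.*', $pattern)" | join_targets)
  if [ -z "$targets" ]; then
    echo "cquery_without_manual: no targets left after filtering" >&2
    return 1
  fi
  # Any remaining arguments are forwarded to cquery unchanged.
  bazel cquery "$targets" "$@"
}
```

Joining the one-label-per-line output of bazel query with " + " avoids relying on word splitting of an unquoted $TARGETS and the `tr` step; the resulting expression is the same union that `//:fast2+//:fast` expresses.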

Alex Humesky

Dec 20, 2021, 7:06:04 PM
to Konstantin, bazel-discuss, Alexandre Rostovtsev
For some reason, arostovtsev's email went to my spam folder. Yes, check that solution first.

Konstantin

Dec 20, 2021, 7:29:30 PM
to bazel-discuss
Alexandre, the rule we have trouble with produces compile_commands.json for a project with about 500 binaries, so it is naturally heavy and not much can be done about it. The specific problem is that after we introduced that rule, every cquery started executing it and, as a result, runs much longer. This is a side effect of the mere existence of that rule, and that is what I am looking to alleviate.

I also tried your idea with the `except` clause, but I observed what Alex already noticed above: the `except` clause does remove the rule from the query output, but it does not prevent its implementation function from running, so it does not really help for our purpose.

Alex's idea of using `query` for pre-filtering seems to work well, so we will go with it until we see a better idea.

Thank you Alexandre and Alex!
Konstantin

Dan Cohn

Dec 20, 2021, 9:49:55 PM
to bazel-discuss
I've found that even the simplest queries require loading and analyzing all rules imported via the WORKSPACE. This can be time-consuming when first executing a query. I'm not sure if this is what you're talking about.

A workaround we use is to "truncate" the WORKSPACE to the minimum set of imports required to successfully complete the query when run with the --keep_going option. Unlike with builds, Bazel doesn't seem to be very good at figuring out which rules to load when performing queries. It assumes that the whole WORKSPACE should be loaded to guarantee a valid query result. This is definitely not the case most of the time.

I'm interested to know if there are any other ways to speed up initial queries in a new workspace, although I realize this may be different from what Konstantin's looking for.