How does Bazel decide how many tests to run in parallel?


Erik Kuefler

Oct 7, 2015, 8:45:29 PM
to bazel-...@googlegroups.com
Looking at the output from Bazel's profiling on my build machine, I notice that Bazel happily runs at least a dozen or so of my "small" unit tests in parallel, but will only run a few of my "large" tests at once (the others show "action resource lock"). My "large" tests here are Selenium browser tests that run on a remote grid, so although they take a long time, rely on an external resource, and can be flaky, they consume very few resources on the build machine and it's safe to run many of them in parallel.

What's the best way to express a test that takes a long time but consumes few resources? I could set its size to "small" with a timeout of "long", but that feels kind of weird in this case. I'm also confused because the user manual says "Bazel uses the size only to determine a default timeout", but it looks to me like it's also using the size as a hint for resource estimation.

Han-Wen Nienhuys

Oct 8, 2015, 2:25:24 AM
to Erik Kuefler, bazel-...@googlegroups.com
You are right. The resources are different, and listed here,


I would expect that most tests get blocked on the CPU usage, which is roughly constant for all sizes.

You can manually set the available resources using --local_resources.

--
Han-Wen Nienhuys
Google Munich
han...@google.com

Erik Kuefler

Oct 12, 2015, 12:24:54 AM
to Han-Wen Nienhuys, bazel-...@googlegroups.com
Ah, interesting. Based on those values I'm guessing that I'm actually getting limited on IO capacity, since it sets the IO estimate to 0 for small tests and 0.1 for large tests, and it appears to cap me at exactly 10 simultaneous large tests. The manual also says IO capacity is always set to 1.0, so it makes sense that it's inferring the right values for memory and CPU on my beefy build machine but missing IO. I'll try using --local_resources to turn it up. Thanks!
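
[Editor's sketch] The arithmetic behind that guess can be written out in a few lines of Python. This is only an illustration of the reasoning in this thread, not Bazel's actual scheduler: the per-size IO figures (0 and 0.1) and the default IO capacity of 1.0 are the ones quoted above, while the function names, the RAM and CPU numbers in the usage lines, and the comma-separated RAM,CPU,IO flag form (the form documented for Bazel releases of that era, as far as I know) are assumptions. Later Bazel releases changed how this flag is written, so check the documentation for your version.

```python
import math

# Per-test IO estimate by test size, as reported in this thread:
# small tests request 0, large tests request 0.1.
IO_DEMAND = {"small": 0.0, "large": 0.1}


def max_parallel(available: float, demand_per_test: float) -> float:
    """Number of tests that fit at once when one resource is the bottleneck."""
    if demand_per_test <= 0:
        return math.inf  # this resource never limits the test
    # Round the ratio before flooring so float error cannot drop a slot.
    return math.floor(round(available / demand_per_test, 9))


def local_resources_flag(ram_mb: int, cpu_cores: float, io: float) -> str:
    """Format the flag in the RAM,CPU,IO form (assumed legacy syntax)."""
    return f"--local_resources={ram_mb},{cpu_cores},{io}"


# Default IO capacity of 1.0 gives the cap of 10 large tests seen above.
print(max_parallel(1.0, IO_DEMAND["large"]))   # 10
# Raising IO capacity to 4.0 would allow 40, if nothing else is limiting.
print(max_parallel(4.0, IO_DEMAND["large"]))   # 40
# Hypothetical machine: 16384 MB RAM, 8 cores, IO capacity raised to 4.0.
print(local_resources_flag(16384, 8.0, 4.0))   # --local_resources=16384,8.0,4.0
```

Small tests request no IO, so under this model the IO capacity never holds them back, which matches a dozen or more of them running at once while the large tests stop at 10.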

Han-Wen Nienhuys

Oct 12, 2015, 2:48:29 AM
to Erik Kuefler, bazel-...@googlegroups.com
FWIW, the IO limit is a historical artifact from the time we had rotating disks, and doesn't make much sense if you have an SSD.

(I tried removing it a year or so ago and unfortunately found out that we still have rotating disks in our workstations at Google.)