Build times for OpenXLA


Oscar Hernandez

May 19, 2023, 3:22:45 PM
to OpenXLA Discuss
How long does it usually take to build OpenXLA on a single-CPU system (e.g., 32+ cores)? What is your experience so far?

----
I have been trying to build OpenXLA on a ThunderX2 ARM system for the last two days. I suspect that I'm overwhelming my local file system (NFS). Notice how compiling specific source files is taking more than 10K seconds...
I'm planning to try it out next with a VASP filesystem and an Ampere Altra (80 cores).

[13,930 / 16,704] 223 actions running
    Compiling xla/python/callback.cc; 10494s processwrapper-sandbox
    Compiling xla/python/sharding.cc; 10424s processwrapper-sandbox
    Compiling xla/python/py_values.cc; 10411s processwrapper-sandbox
    Compiling xla/python/py_compile_only_client.cc; 10405s processwrapper-sandbox
    Compiling xla/python/py_client.cc; 10383s processwrapper-sandbox
    Compiling xla/python/py_array.cc; 10288s processwrapper-sandbox
    Compiling xla/python/py_buffer.cc; 10230s processwrapper-sandbox ...




Geoffrey Martin-Noble

May 19, 2023, 3:34:29 PM
to Oscar Hernandez, OpenXLA Discuss
On the general theme of us being bad at naming things and people consequently being confused, I'd like to clarify that "building OpenXLA" is a bit of a category error: "OpenXLA" is a *project* and there are multiple repositories in it, containing many different things you could build.

Based on your logs, I'm assuming that you're talking about doing a Bazel build of https://github.com/openxla/xla. I assume you're following https://github.com/openxla/xla/blob/main/docs/developer_guide.md? Some of us have noticed in the past that Bazel can end up quite IO-bound when the CPU count overwhelms the IO, due to the way it uses symlinks for sandboxing. That experience was based on an HDD and 96 cores, but I imagine that NFS and 32 cores might hit the same problem.

A trick to get around this is to stick the Bazel sandbox in a ramdisk/tmpfs. The writes there are extremely short-lived, so it shouldn't need to be particularly big. To do this, you can add `--sandbox_base=/dev/shm` (or some other tmpfs) to your build. Note that if you're building within Docker, as those instructions appear to suggest, you may need to mount a host tmpfs or mount one on the fly with Docker (https://docs.docker.com/storage/tmpfs/). If that setting fixes things for you, you may want to add it to your bazelrc (either globally in ~/.bazelrc or for the specific repo). Hope that helps.
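For example (a minimal sketch; the target pattern and sizes here are illustrative, not prescriptive):

    # Point Bazel's sandbox scratch space at a tmpfs, so the short-lived
    # sandbox writes hit RAM instead of NFS
    bazel build --sandbox_base=/dev/shm //xla/...

    # Or persist it in ~/.bazelrc (or the repo's .bazelrc):
    # build --sandbox_base=/dev/shm

    # Inside Docker, /dev/shm defaults to only 64 MB; enlarge it when
    # starting the container, e.g.:
    docker run --shm-size=4g ...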

Cheers,
Geoffrey


Oscar Hernandez

May 19, 2023, 3:55:10 PM
to OpenXLA Discuss, Geoffrey Martin-Noble, OpenXLA Discuss, Oscar Hernandez
Geoffrey, thanks for the pointers. Correct: I'm doing a Bazel build of https://github.com/openxla/xla, and I was following the instructions here: https://github.com/openxla/xla/blob/main/docs/build_from_source.md, using a Docker container via podman. I will try your suggestion about the ramdisk/tmpfs and report back if it solves the problem.

Oscar

Oscar Hernandez

May 19, 2023, 9:01:01 PM
to OpenXLA Discuss, Oscar Hernandez, Geoffrey Martin-Noble, OpenXLA Discuss
It worked, and it was fast. 
    INFO: Elapsed time: 6439.681s, Critical Path: 2572.93s
    INFO: 29980 processes: 9170 internal, 2 local, 20808 processwrapper-sandbox.
    INFO: Build completed successfully, 29980 total actions

Geoffrey Martin-Noble

May 19, 2023, 9:22:27 PM
to Oscar Hernandez, OpenXLA Discuss
Glad that helped :-)

Although if I'm reading that right, I wouldn't call almost 2 hours fast... (I think that's the curse of depending on LLVM, unfortunately)

For rebuilds, I would also strongly recommend setting up a disk cache: https://bazel.build/remote/caching#disk-cache. Again, if you're doing this in Docker, you'll need to mount a consistent directory for the cache.
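For example (a sketch; the cache path is arbitrary, and Bazel expands `~`):

    # In ~/.bazelrc: reuse compiled action outputs across clean rebuilds
    build --disk_cache=~/.cache/xla-bazel-disk-cache

    # With docker/podman, bind-mount the same host directory on every run
    # so the cache persists, e.g.:
    podman run -v "$HOME/.cache/xla-bazel-disk-cache:/root/.cache/xla-bazel-disk-cache" ...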

Mehdi AMINI

May 20, 2023, 4:40:36 AM
to Geoffrey Martin-Noble, Oscar Hernandez, OpenXLA Discuss
On Fri, May 19, 2023 at 6:22 PM 'Geoffrey Martin-Noble' via OpenXLA Discuss <openxla...@openxla.org> wrote:
> Glad that helped :-)
>
> Although if I'm reading that right, I wouldn't call almost 2 hours fast... (I think that's the curse of depending on LLVM, unfortunately)

Actually, there is more to this, I think. Here is a spreadsheet of the build actions (compiles and links) from the recommended build invocation in the developer guide: https://docs.google.com/spreadsheets/d/1uOTHseY7tc8AEREcTufaSZuP4ONVhLsA/edit?usp=sharing&ouid=101423881461419382626&rtpof=true&sd=true
At a high level, the four big pieces of the build (time in minutes):

1) xla 434.68
2) llvm 274.52
3) mlir 122.92
4) external/nccl_archive 113.30

Some learnings; the three most impactful and obvious low-hanging fruits:

1) Bazel compiles most files three times (!!): once for the "host config" and twice otherwise (why? Some dynamic config in Bazel splitting the build graph?). There used to be a flag to disable the separate "host" config; I don't know if it still works, but the doc could be updated for this.
2) 82 min are spent building XLA tests even though I may never run them; there has to be a way to avoid building tests unless we request testing?
3) We build too many LLVM targets; why doesn't the configure step also allow the user to select targets? 31 min of my build are spent on the AMDGPU backend...

-- 
Mehdi

 

Jacques Pienaar

May 20, 2023, 9:34:43 AM
to Mehdi AMINI, Geoffrey Martin-Noble, Oscar Hernandez, OpenXLA Discuss


On Sat, May 20, 2023, 1:40 AM Mehdi AMINI <joke...@gmail.com> wrote:
> [...]
>
> Some learnings; the three most impactful and obvious low-hanging fruits:
>
> 1) Bazel compiles most files three times (!!): once for the "host config" and twice otherwise (why? Some dynamic config in Bazel splitting the build graph?). There used to be a flag to disable the separate "host" config; I don't know if it still works, but the doc could be updated for this.

If I recall correctly, this flag is being removed soon, and is already gone in the internal version.

> 2) 82 min are spent building XLA tests even though I may never run them; there has to be a way to avoid building tests unless we request testing?

Unfortunately, since the tests are (almost all) binaries, building them is what you are requesting when you use `...`. The best option may be to point folks at a narrower build target, or one that excludes the test paths (except that here the tests and libraries are intermingled). One could combine bazel query with bazel build to filter out test targets. One could file an issue for the infra team here.
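For instance, a rough sketch of the query-then-build filtering (the target pattern here is illustrative):

    # Build everything under //xla/... except targets bazel classifies as tests
    bazel query '//xla/... except tests(//xla/...)' | xargs bazel build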

> 3) We build too many LLVM targets; why doesn't the configure step also allow the user to select targets? 31 min of my build are spent on the AMDGPU backend...

This is where the CMake builds really make life easier...

Stella Laurenzo

May 20, 2023, 11:49:58 AM
to Mehdi AMINI, Geoffrey Martin-Noble, Oscar Hernandez, OpenXLA Discuss
On Sat, May 20, 2023, 1:40 AM Mehdi AMINI <joke...@gmail.com> wrote:
> [...]
>
> Some learnings; the three most impactful and obvious low-hanging fruits:
>
> 1) Bazel compiles most files three times (!!): once for the "host config" and twice otherwise (why? Some dynamic config in Bazel splitting the build graph?). There used to be a flag to disable the separate "host" config; I don't know if it still works, but the doc could be updated for this.

I've never understood why things get done the way they do with Bazel. The ergonomics are really quite bad for substantial projects. As near as I can tell, through many interactions, the devs don't use it for serious development outside of Google's internal flows (which differ entirely based on a lot of these ergonomics). Removing the separate host-config flag is but one thing I've commented on about this.

A couple of weeks ago, I did stumble across this: https://github.com/openxla/iree/pull/13471 (more information in the linked issue). I think this accounts for the double non-host build. I also thought I saw some evidence of better caching between host/non configs too but I didn't study it closely.

Not sure if it will help /xla, because the testing builds so many cc binaries (the projects I applied this to have a more OSS-friendly testing setup). But I suspect that using lld and static linking will outperform the cost of double building and linking shared objects for the tests.

I *think* you could still get shared-object test linking and not double builds with just the PIC flag, but I didn't verify that, as we didn't need it.


> 2) 82 min are spent building XLA tests even though I may never run them; there has to be a way to avoid building tests unless we request testing?
>
> 3) We build too many LLVM targets; why doesn't the configure step also allow the user to select targets? 31 min of my build are spent on the AMDGPU backend...

This is why, outside of /xla, we don't use Bazel for serious dev/deployment flows involving the compiler. The build files are monolithic in a few places and hard to keep tidy. Still, 31 min for the AMDGPU target does not match my experience (later machines, I guess).

Geoffrey Martin-Noble

May 20, 2023, 12:13:26 PM
to Jacques Pienaar, Mehdi AMINI, Oscar Hernandez, OpenXLA Discuss
Yeah, sorry, I didn't mean to say that 2 hours was inevitable with LLVM. More that building LLVM changes your expectations of a "fast" build.


On Sat, May 20, 2023, 06:34 Jacques Pienaar <jpie...@google.com> wrote:
> On Sat, May 20, 2023, 1:40 AM Mehdi AMINI <joke...@gmail.com> wrote:
>> [...]
>> 1) Bazel compiles most files three times (!!): once for the "host config" and twice otherwise (why? Some dynamic config in Bazel splitting the build graph?). There used to be a flag to disable the separate "host" config; I don't know if it still works, but the doc could be updated for this.

There are other things going on here with PIC. https://github.com/openxla/iree/pull/13471 on the IREE side just disabled this "feature" that forces everything to build at least one of those extra times. There are a lot more details in the linked issue.


> If I recall correctly, this flag is being removed soon, and is already gone in the internal version.

Indeed 😥

I think something only gets built in the host config if it's used as a host tool, so I think this indicates that something is also using heavyweight built-from-source tools to perform build actions. It could be possible to fix or exclude those.


>> 2) 82 min are spent building XLA tests even though I may never run them; there has to be a way to avoid building tests unless we request testing?
>
> Unfortunately, since the tests are (almost all) binaries, building them is what you are requesting when you use `...`. The best option may be to point folks at a narrower build target, or one that excludes the test paths (except that here the tests and libraries are intermingled). One could combine bazel query with bazel build to filter out test targets. One could file an issue for the infra team here.

Indeed, the user has in this case requested to build tests. `--build_tag_filters` I think would also help here, before breaking into bazel query. But I also think it would be useful to examine the use case. Why would you want to build *everything* but not tests? It seems like in this case you are probably just interested in some specific binaries, no?
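As a sketch, tag filtering looks like this, though it only helps where test targets actually carry a distinguishing tag (the tag name below is hypothetical; check the BUILD files for the tags actually in use):

    # Skip anything tagged "gpu" while building the rest
    bazel build --build_tag_filters=-gpu //xla/...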


>> 3) We build too many LLVM targets; why doesn't the configure step also allow the user to select targets? 31 min of my build are spent on the AMDGPU backend...
>
> This is where the CMake builds really make life easier...

The bazel configuration allows you to specify which targets to include when configuring LLVM: https://github.com/llvm/llvm-project/blob/main/utils/bazel/configure.bzl#L173

I think it's just at the repository config level and doesn't expose per-build flags though (we could fix that 🙂)

Stella Laurenzo

May 20, 2023, 12:15:10 PM
to Mehdi AMINI, Geoffrey Martin-Noble, Oscar Hernandez, OpenXLA Discuss


On Sat, May 20, 2023, 8:49 AM Stella Laurenzo <stellar...@gmail.com> wrote:
> On Sat, May 20, 2023, 1:40 AM Mehdi AMINI <joke...@gmail.com> wrote:
>> [...]
>> 1) Bazel compiles most files three times (!!): once for the "host config" and twice otherwise (why? Some dynamic config in Bazel splitting the build graph?). There used to be a flag to disable the separate "host" config; I don't know if it still works, but the doc could be updated for this.
>
> I've never understood why things get done the way they do with Bazel. The ergonomics are really quite bad for substantial projects. As near as I can tell, through many interactions, the devs don't use it for serious development outside of Google's internal flows (which differ entirely based on a lot of these ergonomics). Removing the separate host-config flag is but one thing I've commented on about this.

I *think* that with Bazel 6, if you manage to get the host and target flags lined up, it will still fork into two execution configs, but I would expect them to cache properly and not rebuild.

Of course, bazel makes it really hard to ensure those line up, in my experience (and actively encourages them not to).

Mehdi AMINI

May 21, 2023, 3:38:48 PM
to Geoffrey Martin-Noble, Jacques Pienaar, Oscar Hernandez, OpenXLA Discuss
On Sat, May 20, 2023 at 9:13 AM Geoffrey Martin-Noble <gc...@google.com> wrote:
> Yeah, sorry, I didn't mean to say that 2 hours was inevitable with LLVM. More that building LLVM changes your expectations of a "fast" build.

Sure: I wanted to debunk a bit the idea that "most of the time is spent in LLVM" when it is actually XLA code itself :)
Finding the inefficiencies of Bazel was not expected, even if it should have been: I had spent time aligning the host/target configs on TensorFlow, and I sped up the TensorFlow CI drastically by disabling the separate host config (sad to hear the flag is going away).


> On Sat, May 20, 2023, 06:34 Jacques Pienaar <jpie...@google.com> wrote:
>> [...]
>
> I think something only gets built in the host config if it's used as a host tool, so I think this indicates that something is also using heavyweight built-from-source tools to perform build actions. It could be possible to fix or exclude those.

TableGen for ODS, but also PDLL and LinalgDSL, are "heavy" host tools in practice, I think.
That said, I'm not sure why the LLVM backends are needed! I need to refresh my `bazel cquery` fu...
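For example, something like this should show why a backend gets pulled in (the AMDGPU label is a guess; the actual target name in the Bazel overlay may differ):

    # Print one dependency path from the XLA build into the AMDGPU backend
    bazel cquery 'somepath(//xla/..., @llvm-project//llvm:AMDGPUCodeGen)'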
 


>>> 2) 82 min are spent building XLA tests even though I may never run them; there has to be a way to avoid building tests unless we request testing?
>>
>> Unfortunately, since the tests are (almost all) binaries, building them is what you are requesting when you use `...`. [...]
>
> Indeed, the user has in this case requested to build tests. `--build_tag_filters` I think would also help here, before breaking into bazel query. But I also think it would be useful to examine the use case. Why would you want to build *everything* but not tests? It seems like in this case you are probably just interested in some specific binaries, no?

Probably; I'm just applying the recipes from the documentation about "how to build XLA" :)

 

>>> 3) We build too many LLVM targets; why doesn't the configure step also allow the user to select targets? 31 min of my build are spent on the AMDGPU backend...
>>
>> This is where the CMake builds really make life easier...
>
> The Bazel configuration allows you to specify which targets to include when configuring LLVM: https://github.com/llvm/llvm-project/blob/main/utils/bazel/configure.bzl#L173
>
> I think it's just at the repository config level and doesn't expose per-build flags though (we could fix that 🙂)

Right, we don't build all the targets; XLA configures them here: https://github.com/openxla/xla/blob/main/third_party/llvm/setup.bzl#L6

We already have to run a ./configure script before invoking Bazel (I haven't looked in detail at what it does):

    TF_NEED_CUDA=1 ./configure

So it seems natural to me that the configure script could tweak the list of backends to build?
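Something like this, purely hypothetically (the LLVM_TARGETS variable does not exist today):

    # Hypothetical configure knob for trimming LLVM backends, in the
    # spirit of TF_NEED_CUDA
    LLVM_TARGETS="X86;AArch64" TF_NEED_CUDA=1 ./configure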

Overall, it just seems like CMake+Ninja+ccache is the optimal solution for fast and customized builds; that's a bit sad considering the promises of Bazel!

Cheers,

-- 
Mehdi

Oscar Hernandez

May 21, 2023, 4:24:57 PM
to OpenXLA Discuss, Mehdi AMINI, Jacques Pienaar, Oscar Hernandez, OpenXLA Discuss, Geoffrey Martin-Noble
I do appreciate the discussions happening here. I like the idea of further customizing builds via configure options, such as filtering tests or targets, to reduce compilation times.

We should also consider mentioning in the developer's guide (https://github.com/openxla/xla/blob/main/docs/developer_guide.md) that using `--sandbox_base=<memory dir>` can significantly speed up the build.
For me, it reduced the time from approximately 48 hours to about 2 hours. What a difference!

Oscar

Stella Laurenzo

May 21, 2023, 4:36:46 PM
to Oscar Hernandez, OpenXLA Discuss, Mehdi AMINI, Jacques Pienaar, Geoffrey Martin-Noble


On Sun, May 21, 2023, 1:25 PM Oscar Hernandez <keyle...@gmail.com> wrote:
> I do appreciate the discussions happening here. I like the idea of further customizing builds via configure options, such as filtering tests or targets, to reduce compilation times.
>
> We should also consider mentioning in the developer's guide (https://github.com/openxla/xla/blob/main/docs/developer_guide.md) that using `--sandbox_base=<memory dir>` can significantly speed up the build.
> For me, it reduced the time from approximately 48 hours to about 2 hours. What a difference!

+1 to documenting recommended options very explicitly. One of the problems with non-trivial Bazel setups is that the defaults can almost never be counted on, and there isn't enough usage to have a good corpus of resources or traditional wisdom on Stack Overflow and such. Apart from the host-config thing (about which I am quite cross that the devs didn't even know why it is critical, much less that it should be the trivial default), most things do have a way to be made better. It is often hard/convoluted, but possible...

New backends and projects in OpenXLA adopt CMake as the go-to, but there is still a lot of Bazel that is very hard to avoid, and it will be with us for a long time even if we decided to have a different policy overall. The effort to improve the situation is definitely not wasted...

Geoffrey Martin-Noble

May 22, 2023, 5:04:05 PM
to Stella Laurenzo, Oscar Hernandez, OpenXLA Discuss, Mehdi AMINI, Jacques Pienaar
I think part of the issue is that we're usually loath to take on documenting another project, or to get too far into the non-standard weeds in getting-started guides. It's a tricky balance to strike. We can't just make `--sandbox_base=/dev/shm` the default, for instance, because it's system-specific :-/ This may actually be more important to document than we'd realized: it appears to be caused by a big mismatch between core count and disk speed. Previously we'd only seen it on Google dev machines with a ton of cores, so we'd thought it perhaps a more niche issue, but we hadn't considered a moderate number of cores and a file system even slower than an HDD :-D

Configure scripts can indeed help with this, and we're starting to build ours out more for IREE, including e.g. devcontainer setup.

Regarding building `...`, I think it's reasonable that the getting-started guide says to do this, and I don't think it should explain in detail how Bazel targets are selected (the Bazel documentation covers this just fine). If there's some good candidate binary for smoke-testing the build, then that would work too, but generally "build everything not manual" is a pretty standard starting point.