Enabling HLO op profiling for CPU

Muneeb Anwar

Sep 26, 2024, 10:25:31 AM

to openxla...@openxla.org, munee...@huawei.com, guillermo...@huawei.com

Hi,

I'm interested in enabling profiling of HLO ops on CPU. This is already supported for GPU under `xla/service/gpu` (with profiler tests) but isn't available on the CPU side. The goal is to enable profiling tests and profiler runs for benchmarking HLO ops when compiling and running on CPU.

I've already started digging into this, but I would like to know where in the code I should be making changes and whether there are any pitfalls I should be aware of. Also, is anyone else working on the same thing?

best,

Muneeb

Eugene Zhulenev

Sep 26, 2024, 5:58:43 PM

to Muneeb Anwar, openxla...@openxla.org, munee...@huawei.com, guillermo...@huawei.com

Hi Muneeb,

What do you mean by "profiling HLO ops"? Do you need to look at the compiled assembly with perf? Get the wall time for HLO ops? Or something different? FWIW, perf just works:

1. perf record -k 1 -g -o /tmp/perf.data -- $command

2. perf inject -j -i /tmp/perf.data -o /tmp/perf.data.jit

3. perf report -i /tmp/perf.data.jit
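
The three steps above could be wrapped in a small helper. This is a minimal Python sketch, not part of XLA: the names `perf_jit_commands` and `run_perf_jit` are hypothetical, and it assumes `perf` is on the PATH and that the workload emits JIT symbols.

```python
import shlex
import subprocess


def perf_jit_commands(command, data="/tmp/perf.data"):
    """Return the record/inject/report perf invocations as argument lists."""
    jit_data = data + ".jit"
    return [
        # -k 1 selects CLOCK_MONOTONIC, -g records call graphs.
        ["perf", "record", "-k", "1", "-g", "-o", data, "--", *command],
        # -j merges the JIT-emitted symbol information into the profile.
        ["perf", "inject", "-j", "-i", data, "-o", jit_data],
        # -i points the report at the injected profile.
        ["perf", "report", "-i", jit_data],
    ]


def run_perf_jit(command, data="/tmp/perf.data"):
    """Run the three perf steps in order, stopping at the first failure."""
    for cmd in perf_jit_commands(command, data):
        print("+", shlex.join(cmd))
        subprocess.run(cmd, check=True)
```

Calling `run_perf_jit(["./my_benchmark"])` (with a placeholder binary name) would record, inject, and open the report in sequence.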

Eugene

Md Faijul Amin

Sep 26, 2024, 7:09:11 PM

to OpenXLA Discuss, Eugene Zhulenev, openxla...@openxla.org, munee...@huawei.com, guillermo...@huawei.com, Muneeb Anwar

Hi Muneeb and Eugene,

I think that in order to inject JIT symbols into perf, we need to change a config setting (#define LLVM_USE_PERF 1) in the llvm-project repo (https://github.com/llvm/llvm-project/blob/main/utils/bazel/llvm-project-overlay/llvm/include/llvm/Config/llvm-config.h#L74). Otherwise, llvm::JITEventListener::createPerfJITEventListener (https://github.com/openxla/xla/blob/main/xla/service/cpu/simple_orc_jit.cc#L365) returns nullptr. I would also like to know whether there is another way to configure this (e.g., at build time or through an environment variable). So far I have been using perf by manually changing llvm-config.h.

Thanks,

Amin

Muneeb Anwar

Sep 27, 2024, 6:41:26 AM

to Md Faijul Amin, OpenXLA Discuss, Eugene Zhulenev, munee...@huawei.com, guillermo...@huawei.com

Hi,

I can make the perf profiling work. That's not an issue.

I'm talking about the following profiling tool in the source tree:

xla/service/gpu/model/hlo_op_profiler.cc

It is used to profile the timing of each executed HLO op.

The following tests

xla/service/gpu/model/hlo_op_profiler_test.cc

xla/service/gpu/model/hlo_op_profiler_run.cc

can be run with TensorFlow using Bazel, for example `bazel test -c opt //xla/service/gpu/model:hlo_op_profiler_test`

The test creates, compiles, and runs (times) microbenchmarks for the specified HLO ops and emits the execution timing of each op.
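
To illustrate the general shape of such a microbenchmark loop, here is a minimal Python sketch. It does not use XLA at all: the `OPS` table maps op names to plain `math` functions as hypothetical stand-ins for compiled HLO modules, and `time_op` / `profile_ops` are illustrative names, not functions from the XLA source tree.

```python
import math
import statistics
import time


def time_op(fn, arg, warmup=3, repeats=20, inner=1000):
    """Return the median time per call of fn(arg), in nanoseconds."""
    # Warm-up calls so one-time costs do not skew the samples.
    for _ in range(warmup):
        fn(arg)
    samples = []
    for _ in range(repeats):
        start = time.perf_counter_ns()
        for _ in range(inner):
            fn(arg)
        samples.append((time.perf_counter_ns() - start) / inner)
    return statistics.median(samples)


# Hypothetical stand-ins: a real harness would build and compile
# one HLO module per op instead of calling math functions.
OPS = {"exponential": math.exp, "sqrt": math.sqrt, "tanh": math.tanh}


def profile_ops(ops=OPS, arg=0.5):
    """Time every op in the table and return {op name: median ns per call}."""
    return {name: time_op(fn, arg) for name, fn in ops.items()}
```

Iterating over `profile_ops().items()` and printing each entry gives a per-op timing table, which is roughly the kind of output meant here.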

This kind of HLO op profiling exists only for GPU; XLA has no equivalent support for CPU.

I want similar profiling and performance testing support for CPU.

Best,

Muneeb

Eugene Zhulenev

Sep 27, 2024, 11:14:18 AM

to Muneeb Anwar, Md Faijul Amin, OpenXLA Discuss, munee...@huawei.com, guillermo...@huawei.com

To make this work you’d need to make changes to xla/backends/cpu/runtime/thunk_executor.h.

This is something we plan to work on, but we don’t have any concrete timelines.

Eugene
