
Enabling HLO op profiling for CPU


Muneeb Anwar

Sep 26, 2024, 10:25:31 AM
to openxla...@openxla.org, munee...@huawei.com, guillermo...@huawei.com
Hi,
I'm interested in enabling profiling of HLO ops on CPU. This is already supported for GPU under `xla/service/gpu` (with profiler tests) but isn't available on the CPU side. The goal is to enable profiling tests and profiler runs for benchmarking HLO ops when compiling and running on CPU.

While I'm already delving into this, I'd like to know where in the code I should be tinkering and whether there are any pitfalls I should be aware of. Also, is anyone else working on the same thing?

best,
Muneeb

Eugene Zhulenev

Sep 26, 2024, 5:58:43 PM
to Muneeb Anwar, openxla...@openxla.org, munee...@huawei.com, guillermo...@huawei.com
Hi Muneeb,

What do you mean by "profiling HLO ops"? Do you need to look at the compiled assembly with perf? Get wall times for HLO ops? Or something different? FWIW, perf just works:

1. perf record -k 1 -g -o /tmp/perf.data -- $command
2. perf inject -j -i /tmp/perf.data -o /tmp/perf.data.jit
3. perf report -i /tmp/perf.data.jit

Eugene

--
You received this message because you are subscribed to the Google Groups "OpenXLA Discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to openxla-discu...@openxla.org.
To view this discussion on the web visit https://groups.google.com/a/openxla.org/d/msgid/openxla-discuss/CAHjT0%2B7n9wwCzpbYM-n2VkatFM1WZc0yTswuZM6_sToGOzZLWg%40mail.gmail.com.
For more options, visit https://groups.google.com/a/openxla.org/d/optout.

Md Faijul Amin

Sep 26, 2024, 7:09:11 PM
to OpenXLA Discuss, Eugene Zhulenev, openxla...@openxla.org, munee...@huawei.com, guillermo...@huawei.com, Muneeb Anwar
Hi Muneeb and Eugene,

I think that in order to inject JIT symbols into perf, we need to change a config define (`#define LLVM_USE_PERF 1`) in the llvm-project repo (https://github.com/llvm/llvm-project/blob/main/utils/bazel/llvm-project-overlay/llvm/include/llvm/Config/llvm-config.h#L74). Otherwise, `llvm::JITEventListener::createPerfJITEventListener` (https://github.com/openxla/xla/blob/main/xla/service/cpu/simple_orc_jit.cc#L365) returns nullptr. I'm also interested to know whether there is another way to configure this (e.g., during the build or through an environment variable). I have been using perf by manually changing llvm-config.h.
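
For reference, the manual edit described above amounts to flipping one define in the Bazel overlay's llvm-config.h; this is a sketch of the change, and the surrounding text in the real file may differ:

```c
/* utils/bazel/llvm-project-overlay/llvm/include/llvm/Config/llvm-config.h */

/* Default: perf JIT support is compiled out, so
   llvm::JITEventListener::createPerfJITEventListener() returns nullptr. */
/* #undef LLVM_USE_PERF */

/* Manual change: compile in the PerfJITEventListener so JIT symbols
   can be injected into perf via `perf inject -j`. */
#define LLVM_USE_PERF 1
```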

Thanks,
Amin

Muneeb Anwar

Sep 27, 2024, 6:41:26 AM
to Md Faijul Amin, OpenXLA Discuss, Eugene Zhulenev, munee...@huawei.com, guillermo...@huawei.com
Hi,
I can make perf profiling work; that's not the issue.

I'm talking about the following profiling tool in the source tree:

xla/service/gpu/model/hlo_op_profiler.cc

which is used to profile the timing of each executed HLO op.

The accompanying targets
xla/service/gpu/model/hlo_op_profiler_test.cc
xla/service/gpu/model/hlo_op_profiler_run.cc

can be run with Bazel, e.g. `bazel test -c opt //xla/service/gpu/model:hlo_op_profiler_test`.

The test creates, compiles, and runs (times) microbenchmarks for the specified HLO ops and emits execution-timing info for each op.

This profiling of HLO ops only exists for GPU; there is no such support in XLA for CPU.

I want similar profiling and performance-testing support for CPU.

Best,
Muneeb

Eugene Zhulenev

Sep 27, 2024, 11:14:18 AM
to Muneeb Anwar, Md Faijul Amin, OpenXLA Discuss, munee...@huawei.com, guillermo...@huawei.com
To make this work you’d need to make changes to `xla/backends/cpu/runtime/thunk_executor.h`.

This is something we plan to work on, but we don’t have any concrete timelines.

Eugene 