Hi,
I can get perf profiling to work; that's not the issue.
I'm talking about the following profiling tool in the source tree:
xla/service/gpu/model/hlo_op_profiler.cc
It is used to profile the execution time of each HLO op.
The following tests
xla/service/gpu/model/hlo_op_profiler_test.cc
xla/service/gpu/model/hlo_op_profiler_run.cc
can be run with TensorFlow using bazel, for example `bazel test -c opt //xla/service/gpu/model:hlo_op_profiler_test`
The test creates, compiles, and runs (times) microbenchmarks for the specified HLO ops and emits the execution timing of each op.
This profiling of HLO ops exists only for GPU; XLA has no equivalent support for CPU.
I would like similar profiling and performance-testing support for CPU.
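To make the request concrete, here is a minimal sketch of the kind of per-op timing I mean, in plain C++ with no XLA dependencies. `MedianRuntimeNs` and `ElementwiseAdd` are hypothetical names used only for illustration, and the hand-written loop merely stands in for a compiled HLO op; none of this is existing XLA API.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper (not part of XLA): runs `op` on the host CPU and
// returns the median wall-clock duration of a single call, in nanoseconds.
inline double MedianRuntimeNs(const std::function<void()>& op,
                              int warmup_runs, int timed_runs) {
  assert(timed_runs > 0);

  // Untimed warm-up runs so one-time costs (page faults, cold caches)
  // do not end up in the measurements.
  for (int i = 0; i < warmup_runs; ++i) op();

  std::vector<double> samples;
  samples.reserve(static_cast<std::size_t>(timed_runs));
  for (int i = 0; i < timed_runs; ++i) {
    const auto start = std::chrono::steady_clock::now();
    op();
    const auto stop = std::chrono::steady_clock::now();
    samples.push_back(
        std::chrono::duration<double, std::nano>(stop - start).count());
  }

  // The median is less sensitive to scheduler noise than the mean.
  const std::size_t mid = samples.size() / 2;
  std::nth_element(samples.begin(), samples.begin() + mid, samples.end());
  return samples[mid];
}

// Stand-in for a compiled elementwise HLO op: out[i] = a[i] + b[i].
// Assumes a and b are at least as long as out.
inline void ElementwiseAdd(const std::vector<float>& a,
                           const std::vector<float>& b,
                           std::vector<float>& out) {
  for (std::size_t i = 0; i < out.size(); ++i) out[i] = a[i] + b[i];
}
```

A call such as `MedianRuntimeNs([&] { ElementwiseAdd(a, b, out); }, 3, 20)` would give one timing per op and shape. The part I am missing on CPU is the piece that builds and compiles the HLO for each op, which the GPU profiler already does.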
Best,
Muneeb