
Inserting custom-calls inside HLOComputation for customized runtime instrumentation


Muneeb Anwar

Dec 5, 2024, 9:32:14 AM
to OpenXLA Discuss, munee...@huawei.com, guillermo...@huawei.com
Hi,
I'm trying to rewrite an HLOModule to insert custom-calls inside each HLOComputation. The goal is to inject custom instrumentation calls (such as performance-counter profiling) at the beginning and end of each HLOComputation, so that we can gather CPU profiling data at HLOComputation granularity.
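To make the goal concrete, here is a minimal sketch of what the runtime side of such instrumentation could look like. It is a toy, not XLA code: `ComputationProfiler`, `Begin`, and `End` are hypothetical names, and wall-clock time from `std::chrono::steady_clock` stands in for real performance counters. The custom-call at the start of a computation would map to `Begin(name)` and the one at the end to `End(name)`.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <string>
#include <unordered_map>

// Per-computation accumulator (hypothetical; not part of XLA).
struct ComputationStats {
  std::int64_t calls = 0;
  std::int64_t total_ns = 0;
  std::chrono::steady_clock::time_point started{};
  bool running = false;
};

// Hypothetical runtime-side profiler. Wall-clock time stands in for
// hardware performance counters.
class ComputationProfiler {
 public:
  // Would be reached from the custom-call at the start of a computation.
  void Begin(const std::string& computation) {
    ComputationStats& s = stats_[computation];
    assert(!s.running && "Begin called twice without End");
    s.running = true;
    s.started = std::chrono::steady_clock::now();
  }

  // Would be reached from the custom-call at the end of a computation.
  // Returns false if there is no matching Begin.
  bool End(const std::string& computation) {
    const auto now = std::chrono::steady_clock::now();
    auto it = stats_.find(computation);
    if (it == stats_.end() || !it->second.running) return false;
    ComputationStats& s = it->second;
    s.running = false;
    s.calls += 1;
    s.total_ns +=
        std::chrono::duration_cast<std::chrono::nanoseconds>(now - s.started)
            .count();
    return true;
  }

  // Returns nullptr for a computation that was never profiled.
  const ComputationStats* Get(const std::string& computation) const {
    auto it = stats_.find(computation);
    return it == stats_.end() ? nullptr : &it->second;
  }

 private:
  std::unordered_map<std::string, ComputationStats> stats_;
};
```

Keying the statistics by computation name keeps one accumulator per HLOComputation even when the same computation (for example, a while-loop body) runs many times.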

I have successfully registered the runtime symbol names for the custom calls in xla/service/cpu/cpu_runtime.xxx as well as in simple_orc_jit, so that the symbols are resolved at compile time and also found at runtime. I have tested this and know it works.
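A minimal sketch of the idea behind that registration, assuming C-linkage entry points so that the name used as the custom-call target matches the host function's symbol exactly. `my_instrumentation_begin`, `my_instrumentation_end`, and `SymbolTable` are hypothetical; the actual registration code in `simple_orc_jit` is not reproduced here.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

namespace {
// Observable side effects so the sketch can be checked.
int g_begin_calls = 0;
int g_end_calls = 0;
}  // namespace

// Hypothetical instrumentation entry points. extern "C" avoids C++ name
// mangling, so the symbol name is exactly the string registered below.
extern "C" void my_instrumentation_begin(const char* /*computation*/) {
  ++g_begin_calls;
}
extern "C" void my_instrumentation_end(const char* /*computation*/) {
  ++g_end_calls;
}

using InstrumentationFn = void (*)(const char*);

// Toy stand-in for a JIT symbol table: maps symbol names to host
// function addresses so a call emitted by name can be bound at runtime.
class SymbolTable {
 public:
  void Register(const std::string& name, InstrumentationFn fn) {
    const bool inserted = table_.emplace(name, fn).second;
    assert(inserted && "symbol registered twice");
    (void)inserted;  // silence unused-variable warnings under NDEBUG
  }

  // Returns nullptr for an unknown (unresolved) symbol.
  InstrumentationFn Lookup(const std::string& name) const {
    auto it = table_.find(name);
    return it == table_.end() ? nullptr : it->second;
  }

 private:
  std::unordered_map<std::string, InstrumentationFn> table_;
};
```

A lookup that returns `nullptr` corresponds to an unresolved symbol, which is the failure the registration is meant to prevent.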

The problem is in the HLO rewriting. I'm trying to insert custom-calls into each HLOComputation of the HLOModule at the compilation stage, inside CpuCompiler::RunBackend(...), by walking over each HLOComputation, inserting a custom-call once, and making it the ROOT.
We also make the new ROOT instruction take the inputs of the old ROOT as its inputs, so that the results of the HLOComputation do not change.
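The rewrite described above can be illustrated on a toy IR. This is a minimal sketch under stated assumptions: `Instr`, `Computation`, and `InsertInstrumentationRoot` are hypothetical stand-ins, not XLA's `HloInstruction`/`HloComputation` API, and shapes are opaque strings. It adds a custom-call that takes the old ROOT's operands, gives it the old ROOT's shape, and makes it the new ROOT.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy IR (hypothetical), NOT XLA's HloInstruction/HloComputation.
struct Instr {
  std::string opcode;            // e.g. "parameter", "add", "custom-call"
  std::string shape;             // opaque shape string, e.g. "f32[4]"
  std::vector<Instr*> operands;  // non-owning pointers into Computation::instrs
  std::string target;            // custom-call target; empty for other ops
};

struct Computation {
  std::vector<std::unique_ptr<Instr>> instrs;  // owns every instruction
  Instr* root = nullptr;

  // Takes ownership of a new instruction and returns a stable pointer to it.
  Instr* Add(Instr instr) {
    instrs.push_back(std::make_unique<Instr>(std::move(instr)));
    return instrs.back().get();
  }
};

// Adds a custom-call that reuses the old ROOT's operands and shape, then
// makes it the new ROOT. The old ROOT is left in place but unreferenced.
Instr* InsertInstrumentationRoot(Computation& c, const std::string& target) {
  Instr* old_root = c.root;
  assert(old_root != nullptr && "computation has no ROOT");
  Instr* call = c.Add(
      Instr{"custom-call", old_root->shape, old_root->operands, target});
  c.root = call;
  return call;
}
```

In this toy model the old ROOT stays in the instruction list but is no longer reachable from the new ROOT, so the value it computed is only preserved if the custom-call itself produces that same value.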

But doing so makes compilation segfault in TuplePointsToAnalysis::PopulateDefinedBuffersAndAliases(...), specifically in the call to GatherBuffersDefinedByInstruction(...) for the inserted custom-call HLOInstruction.

Inserting the new custom-call HLOInstructions into the HLOComputations seems to corrupt the structure of the HLOModule.

Could you please give me any pointers on where such a rewrite can be done correctly, or on what I might be missing?

Best,
Muneeb