XLA GPU MLIR Codegen


Md Faijul Amin

Feb 6, 2025, 11:27:52 PM
to OpenXLA Discuss
Hi,

I watched the "XLA GPU MLIR Codegen" talk presented by Alexander Belyaev at the 2024 OpenXLA Fall Dev Lab. It included an example of dumping the MLIR passes:

run_hlo_module \
  --platform=CUDA --xla_disable_all_hlo_passes \
  --reference_platform="" --v=5 /tmp/gelu.hlo


However, --v=5 does not work, and I am unable to see the MLIR pass dumps. Could anyone help me get them?

Thanks,
Amin

Sayce Falk

Feb 7, 2025, 1:04:05 PM
to Md Faijul Amin, Alexander Belyaev, OpenXLA Discuss
Good questions, Amin. + @Alexander Belyaev to help answer!


Alexander Belyaev

Feb 7, 2025, 2:45:16 PM
to OpenXLA Discuss, Sayce Falk, OpenXLA Discuss, Md Faijul Amin, Alexander Belyaev

Hi Amin,

Does the command not work at all, or does it run the module but you see no logs? What version are you using? The MLIR emitters were enabled by default only at the end of October. It could be that the compilation goes through the legacy emitters, which were removed in October/November. There used to be a flag, --xla_gpu_mlir_emitter_level=4, that was necessary to enable the new emitters.
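
If you are on an older checkout, a minimal sketch of forcing the new emitters (assuming the flag is still accepted on the run_hlo_module command line, like the other debug options above):

# sketch: on an older checkout, force the new MLIR emitters
run_hlo_module \
  --platform=CUDA --xla_disable_all_hlo_passes \
  --xla_gpu_mlir_emitter_level=4 \
  --reference_platform="" /tmp/gelu.hlo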

Alex

Md Faijul Amin

Feb 7, 2025, 2:55:52 PM
to OpenXLA Discuss, Alexander Belyaev, Sayce Falk, OpenXLA Discuss, Md Faijul Amin
Hi Alex,

The command works if I remove --v=5, but it does not show any logs. I am using the XLA main branch. I can see some logs with TF_CPP_VMODULE=emitter_base=5, though.
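
For reference, the full invocation I used to get those logs looks roughly like this (emitter_base=5 is just the vlog level I picked for that module, not an official setting):

# sketch: module-level vlogging for the MLIR emitter base
TF_CPP_VMODULE=emitter_base=5 \
run_hlo_module \
  --platform=CUDA --xla_disable_all_hlo_passes \
  --reference_platform="" /tmp/gelu.hlo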

~Amin

Alexander Belyaev

Feb 7, 2025, 2:59:55 PM
to OpenXLA Discuss, Md Faijul Amin, Alexander Belyaev, Sayce Falk, OpenXLA Discuss
Nice! So, do you see the IR dumps after every pass now?

Md Faijul Amin

Feb 7, 2025, 3:08:19 PM
to OpenXLA Discuss, Alexander Belyaev, Md Faijul Amin, Sayce Falk, OpenXLA Discuss

Alexander Belyaev

Feb 7, 2025, 3:13:22 PM
to OpenXLA Discuss, Md Faijul Amin, Alexander Belyaev, OpenXLA Discuss
But that's just a translation to LLVM IR; there are no MLIR passes left.

Md Faijul Amin

Feb 7, 2025, 3:17:08 PM
to OpenXLA Discuss, Alexander Belyaev, Md Faijul Amin, OpenXLA Discuss
I see, thanks for the explanation. Is there any way to see the steps or logs for that translation to LLVM IR?

Alexander Belyaev

Feb 7, 2025, 4:02:08 PM
to OpenXLA Discuss, Alexander Belyaev, OpenXLA Discuss
You can dump the LLVM module after the translation. But you are probably interested not in that, but in the passes that lower LLVM -> PTX. You can find those in, for example, xla/service/gpu/nvptx_compiler.cc.

You can also try to dump the different modules/stages by adding XLA_FLAGS=--xla_dump_to=/tmp/myfolder, as described in https://openxla.org/xla/tools.
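
For example, a sketch reusing the HLO file from earlier in the thread:

# --xla_dump_to is documented at https://openxla.org/xla/tools
XLA_FLAGS=--xla_dump_to=/tmp/myfolder \
run_hlo_module \
  --platform=CUDA --xla_disable_all_hlo_passes \
  --reference_platform="" /tmp/gelu.hlo

/tmp/myfolder should then contain the HLO before and after the passes, and for the GPU backend typically the LLVM IR and PTX as well.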

Md Faijul Amin

Feb 11, 2025, 1:09:55 AM
to OpenXLA Discuss, Alexander Belyaev, OpenXLA Discuss
Hi Alex,
Is it possible to set a breakpoint at cudaMalloc/cudaMemcpy for a loop fusion while running with run_hlo_module? I am trying to understand where the host-to-device memory transfer code is located when running under run_hlo_module.

Thanks,
Amin

Alexander Belyaev

Feb 11, 2025, 5:50:15 AM
to Md Faijul Amin, OpenXLA Discuss
Hi Amin,

I think this logic is in xla/stream_executor/gpu.
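
If you want to catch those calls in a debugger, a sketch with gdb (note that the stream executor may allocate through the CUDA driver API, so a breakpoint on cuMemAlloc may fire instead of the runtime cudaMalloc, and the exported symbol may be versioned):

gdb --args run_hlo_module \
  --platform=CUDA --xla_disable_all_hlo_passes \
  --reference_platform="" /tmp/gelu.hlo
(gdb) break cudaMalloc    # runtime API entry point, if used
(gdb) break cuMemAlloc    # driver API; exact symbol may be cuMemAlloc_v2
(gdb) run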
--
Alexander Belyaev

Md Faijul Amin

Feb 11, 2025, 11:16:34 AM
to OpenXLA Discuss, Alexander Belyaev, OpenXLA Discuss, Md Faijul Amin
Thank you very much, Alex, for the pointers to the GPU memory allocations. I found the CUDA allocations in https://github.com/openxla/xla/blob/main/xla/stream_executor/cuda/cuda_executor.cc
This has been very useful.

Alexander Belyaev

Feb 11, 2025, 11:26:50 AM
to Md Faijul Amin, OpenXLA Discuss
We can also chat over VC if that is more convenient for you.
