Very bad performance when xla_gpu_enable_triton_gemm=true on MI210/MI300X


Steeve Morin

Sep 10, 2024, 6:38:12 PM
to OpenXLA Discuss
Hello,

We've seen that enabling xla_gpu_enable_triton_gemm severely (roughly 50-100x) degrades performance on AMD CDNA GPUs (MI series).
For instance, Llama 3 8B runs at about 1 tok/s with Triton enabled vs. ~66 tok/s with it disabled.

Since it's enabled by default, we were wondering if we did something wrong.

Also, we've seen that NVIDIA recommends disabling it on their GPUs as well: https://github.com/NVIDIA/JAX-Toolbox?tab=readme-ov-file#environment-variables, although we haven't seen any difference on NVIDIA so far.

Should we just go ahead and disable it altogether?
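
For reference, a minimal sketch of how one might disable it (assuming a JAX client for illustration; JAX reads XLA_FLAGS from the environment, and the flag name is the one from the subject line):

    import os

    # Append to any existing XLA_FLAGS rather than clobbering them.
    os.environ["XLA_FLAGS"] = (
        os.environ.get("XLA_FLAGS", "") + " --xla_gpu_enable_triton_gemm=false"
    ).strip()

    # This must happen before the first JAX call initializes the XLA backend.
    import jax
    import jax.numpy as jnp

    x = jnp.ones((4096, 4096))
    print((x @ x).block_until_ready()[0, 0])  # GEMMs now lower without Triton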

Thank you,
Steeve

George Karpenkov

Sep 11, 2024, 10:54:20 AM
to Steeve Morin, Chao Chen, OpenXLA Discuss
We've never tuned Triton tilings for AMD GPUs, and we've never debugged their performance. I'd check which exact kernels are slow in the timeline and potentially file bugs against the Triton AMD backend.
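
For example, something along these lines (a rough sketch, assuming a JAX client) captures a trace you can open in TensorBoard's profiler or in Perfetto to see which kernels dominate:

    import jax
    import jax.numpy as jnp

    x = jnp.ones((4096, 4096))

    # Writes a profile under /tmp/xla-trace; inspect it with the TensorBoard
    # profile plugin (or pass create_perfetto_link=True for a Perfetto link).
    with jax.profiler.trace("/tmp/xla-trace"):
        (x @ x).block_until_ready()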

+@Chao Chen can someone from AMD comment on Triton-on-AMD performance? 


> Also, we've seen that NVIDIA recommends disabling it on their GPUs as well: https://github.com/NVIDIA/JAX-Toolbox?tab=readme-ov-file#environment-variables, although we haven't seen any difference on NVIDIA so far.

That's a different question; I think we've mostly aligned with NVIDIA on removing that recommendation, so it should no longer be valid.


Jay Furmanek

Sep 11, 2024, 11:24:39 AM
to OpenXLA Discuss, George Karpenkov, OpenXLA Discuss, Steeve Morin, Chao Chen
Hi guys,
We have the Triton GEMM rewriter enabled for AMD, but we don't yet have the auto-tuner enabled (we're working on that).
That xla_gpu_enable_triton_gemm flag should probably be off by default for AMD until we finish that work.

@Steeve
Which Llama 3 model are you using? We can use it to verify that Triton helps there before re-enabling the flag once the tuner is in.
Is this JAX, or are you using Torch-XLA? Also, which GPU are you using? Triton likely won't ever be enabled for anything older than MI200.
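
If it's JAX, a quick check like this (hypothetical snippet; the exact device_kind strings vary by ROCm build) shows what XLA sees:

    import jax

    print(jax.default_backend())  # expect "gpu" on a ROCm build
    for d in jax.devices():
        print(d.device_kind)      # e.g. "AMD Instinct MI210" or an MI300X variant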

George Karpenkov

Sep 11, 2024, 12:02:42 PM
to Jay Furmanek, OpenXLA Discuss, Steeve Morin, Chao Chen
Ah, that's not surprising then; without an autotuner the performance will be quite bad.

Steeve Morin

Sep 11, 2024, 1:11:10 PM
to OpenXLA Discuss, George Karpenkov, OpenXLA Discuss, Steeve Morin, Chao Chen, Jay Furmanek
Thank you all!

We're using our own Llama 3 implementation built on our framework, ZML, which is based on Zig, StableHLO/XLA, and Bazel.
We'd love to give you folks access to the repo; send me your GitHub handles if you're interested.

More context:
https://x.com/steeve/status/1819005278467015044?s=46&t=IF2rTl8IrJJyT2pzj-XmBQ

Quick side question: we usually run compilations in parallel (in different threads); does that impact autotuning?

Steeve

George Karpenkov

Sep 12, 2024, 4:26:00 AM
to Steeve Morin, OpenXLA Discuss, Chao Chen, Jay Furmanek
> Quick side question: we usually run compilations in parallel (in different threads); does that impact autotuning?

No, there's a mutex. Don't use more than one process for a single GPU, though (that probably wouldn't work anyway due to the BFC allocator).
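
To make that concrete, a hypothetical sketch of what compiling from several threads in one process might look like (XLA serializes the autotuner internally, so the threads don't interfere):

    from concurrent.futures import ThreadPoolExecutor

    import jax
    import jax.numpy as jnp

    def make_fn(scale):
        def f(x):
            return (x @ x) * scale
        return jax.jit(f)

    x = jnp.ones((1024, 1024))
    fns = [make_fn(float(s)) for s in range(4)]

    # Ahead-of-time compilation from worker threads; each call may trigger
    # autotuning, which XLA guards with a mutex.
    with ThreadPoolExecutor(max_workers=4) as pool:
        compiled = list(pool.map(lambda f: f.lower(x).compile(), fns))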