
Status of new typed FFI API


Corentin Godeau

Jul 11, 2024, 10:33:04 AM
to OpenXLA Discuss
Hi everyone,

For the past few months we have been working on an ML framework leveraging StableHLO and XLA.

Recently, we started digging into the custom calls topic and found out about the upcoming typed FFI API. It seems that progress is being made, but is there an ETA/roadmap for that feature?

In parallel, we've been patching StableHLO and XLA internally to run some tests and see what was possible.

Without too much work, we got custom calls working on CPU and GPU with the new typed FFI, but some features are not working yet: accessing user data and allocating/freeing device memory both trigger a segfault. This was to be expected given that we had to patch things ourselves, but we just wanted to be sure we are doing things right and understand how it will eventually work.
Do we agree that the expected way is to pass user data via the ExecuteContext through PJRT and then retrieve it in the custom call through the XLA_FFI_ExecutionContext_Get callback?

Thanks a lot, and keep up the good work!

Peter Hawkins

Jul 11, 2024, 10:37:56 AM
to Corentin Godeau, Eugene Zhulenev, Dan Foreman-Mackey, OpenXLA Discuss


Eugene Zhulenev

Jul 11, 2024, 12:21:16 PM
to Peter Hawkins, Corentin Godeau, Dan Foreman-Mackey, OpenXLA Discuss
On Thu, Jul 11, 2024 at 7:37 AM Peter Hawkins <phaw...@google.com> wrote:

On Thu, Jul 11, 2024 at 10:33 AM Corentin Godeau <core...@zml.ai> wrote:
Hi everyone,


Hi Corentin,

It's almost ready; it just needs a few finishing touches to fix the remaining bugs. We are working on migrating existing custom calls to XLA FFI and fixing issues as we find them. However, most of the new APIs (execution context, memory allocator) are not used in any of the old custom calls, so we have very sparse test coverage for the new features.
 
For the past few months we have been working on an ML framework leveraging StableHLO and XLA.

Recently, we started digging into the custom calls topic and found out about the upcoming typed FFI API. It seems that progress is being made, but is there an ETA/roadmap for that feature?

In parallel, we've been patching StableHLO and XLA internally to run some tests and see what was possible.

Without too much work, we got custom calls working on CPU and GPU with the new typed FFI, but some features are not working yet: accessing user data and allocating/freeing device memory both trigger a segfault. This was to be expected given that we had to patch things ourselves, but we just wanted to be sure we are doing things right and understand how it will eventually work.

Do you have minimal reproducers for the bugs? I suspect that we forgot to set some of the pointers and you end up dereferencing a nullptr somewhere. Also, what is the backend? GPU or CPU? I don't think it was ever implemented for CPU, because on CPU you can use a regular new/delete, but on GPU we make sure that custom calls have access to the underlying BFCAllocator, which shares a memory pool with XLA itself.
 
Do we agree that the expected way is to pass user data via the ExecuteContext through PJRT and then retrieve it in the custom call through the XLA_FFI_ExecutionContext_Get callback?

Yes, kind of... you are not supposed to ever touch the XLA_FFI_Xyz C APIs directly; instead, always use the C++ wrappers. There is one ExecutionContext example here: https://github.com/openxla/xla/blob/main/xla/service/gpu/custom_call_test.cc#L724-L756
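[Editor's note: to make the lookup shape concrete, here is a toy model of a type-keyed execution context. This is NOT the real XLA API — names like `ToyExecutionContext` are illustrative only; real code should bind user data through the `xla::ffi` C++ wrappers shown in the linked test. The sketch just shows the underlying pattern: one opaque user-data object stored per C++ type and retrieved by type.]

```cpp
#include <memory>
#include <typeindex>
#include <unordered_map>

// Toy model (not the XLA API): an execution context storing one opaque
// user-data object per C++ type, looked up by std::type_index.
class ToyExecutionContext {
 public:
  // Type-erase the payload; shared_ptr<void> keeps the correct deleter.
  template <typename T>
  void Insert(std::shared_ptr<T> data) {
    data_[std::type_index(typeid(T))] = std::move(data);
  }

  // Returns nullptr if no payload of type T was ever inserted — the
  // analogue of the null execution_context segfault discussed above.
  template <typename T>
  T* Lookup() const {
    auto it = data_.find(std::type_index(typeid(T)));
    return it == data_.end() ? nullptr : static_cast<T*>(it->second.get());
  }

 private:
  std::unordered_map<std::type_index, std::shared_ptr<void>> data_;
};

// Example user-data payload a custom call might want to see.
struct MyUserData {
  int value;
};
```

In the real API the framework does the `Lookup` for you when the handler is bound with the C++ wrappers, which is why a missing/unset context surfaces as a crash inside the FFI glue rather than in user code.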

Thanks,
Eugene

Corentin Godeau

Jul 11, 2024, 1:38:44 PM
to Eugene Zhulenev, Peter Hawkins, Dan Foreman-Mackey, OpenXLA Discuss
Thanks for the answer!


Do you have minimal reproducers for the bugs?

Not really, unfortunately. The reason is that we built our framework on top of PJRT (we are not using XLA directly), and it's currently in a private repository. I don't know if it would be of any help, but maybe I could create an example with the version of XLA + the patches we used to build the PJRT plugins?

I suspect that we forgot to set some of the pointers and you end up dereferencing a nullptr somewhere
That's exactly what's happening. I tracked down the code, and from what I can tell it segfaults at xla/ffi/ffi_api.cc#L354 when trying to access the user data and at xla/ffi/ffi_api.cc#L382 when trying to allocate memory.

The issue with user data is that args->ctx->execution_context is null, so it might not be set correctly when running an executable through PJRT. As for allocation, you're right: it is indeed working correctly on GPU; I spoke too soon.

Also, what is the backend? GPU or CPU?
Both, actually. We might be wrong, but we noticed that the code to register new FFI handlers is backend-agnostic. In short, we applied the patch in openxla/xla#13419 (+ some nitpicks to make the FFI C API headers proper C headers) and it worked.

I don't think it was ever implemented for CPU, because for CPU you can use a regular new/delete
Sure, that makes sense; we just wanted to try things and see what worked and what didn't.

Yes, kind of... you are not supposed to ever touch XLA_FFI_Xyz C APIs, and instead always use C++ wrappers
I'm not sure that's an option for us since we leverage PJRT, no? Also, the handler signature's only parameter is an `XLA_FFI_CallFrame`, so I think we need to interact with the C API at some point.
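[Editor's note: both points are compatible — the C++ binding machinery itself emits a function with the C call-frame signature, so PJRT calls through the C ABI while user code stays in C++. The sketch below is a toy model of that adapter pattern; `ToyCallFrame`, `ToyError`, and `AdapterHandler` are illustrative stand-ins, not real XLA FFI types (in XLA, the handler-definition macro plays the adapter role: it decodes the call frame and invokes the typed C++ callable).]

```cpp
// Toy sketch of how a typed C++ handler sits behind a C call-frame
// signature. Stand-ins for XLA_FFI_Error / XLA_FFI_CallFrame:
struct ToyError {
  int code;
};
struct ToyCallFrame {
  int arg;
};

// The C-ABI signature the runtime (here: PJRT) actually calls through.
using ToyHandlerFn = ToyError* (*)(ToyCallFrame*);

// The "typed" handler users write against decoded arguments.
static int TypedHandler(int arg) { return arg * 2; }

// Observable side effect so the adapter can be exercised below.
static int g_last_result = 0;

// The generated adapter: decode the frame, call the typed handler.
// Returning nullptr signals success (as in the real C API).
extern "C" ToyError* AdapterHandler(ToyCallFrame* frame) {
  g_last_result = TypedHandler(frame->arg);
  return nullptr;
}
```

So the call frame is still the ABI boundary, but only the generated adapter ever parses it; the framework-facing code registers `AdapterHandler` (in XLA, the symbol the definition macro produces) rather than decoding the frame by hand.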

Thanks for the help, let me know if you need more details!

Cheers,
Corentin