
Status of new typed FFI API


Corentin Godeau

Jul 11, 2024, 10:33:04 AM
to OpenXLA Discuss
Hi everyone,

For the past few months we have been working on an ML framework leveraging StableHLO and XLA.

Recently, we started digging into the custom calls topic and found out about the upcoming typed FFI API. It seems that progress is being made, but is there an ETA/roadmap for that feature?

In parallel, we've been patching StableHLO and XLA internally to run some tests and see what was possible.

Without too much work, we got custom calls working on CPU and GPU with the new typed FFI, but some features are not working yet: accessing user data and allocating/freeing device memory both trigger a segfault. This was to be expected given that we had to patch things ourselves, but we just wanted to be sure we are doing things right and understand how it will eventually work.
Do we agree that the expected way is to pass user data via the ExecuteContext through PJRT and then retrieve it in the custom call through the XLA_FFI_ExecutionContext_Get callback?

Thanks a lot, and keep up the good work!

Peter Hawkins

Jul 11, 2024, 10:37:56 AM
to Corentin Godeau, Eugene Zhulenev, Dan Foreman-Mackey, OpenXLA Discuss


Eugene Zhulenev

Jul 11, 2024, 12:21:16 PM
to Peter Hawkins, Corentin Godeau, Dan Foreman-Mackey, OpenXLA Discuss
On Thu, Jul 11, 2024 at 7:37 AM Peter Hawkins <phaw...@google.com> wrote:

On Thu, Jul 11, 2024 at 10:33 AM Corentin Godeau <core...@zml.ai> wrote:
Hi everyone,


Hi Corentin,

It's almost ready; it just needs a few finishing touches to fix the remaining bugs. We are working on migrating existing custom calls to XLA FFI and fixing issues as we find them. However, most of the new APIs (execution context, memory allocator) are not used in any of the old custom calls, so we have very sparse test coverage for the new features.
 
For the past few months we have been working on an ML framework leveraging StableHLO and XLA.

Recently, we started digging into the custom calls topic and found out about the upcoming typed FFI API. It seems that progress is being made, but is there an ETA/roadmap for that feature?

In parallel, we've been patching StableHLO and XLA internally to run some tests and see what was possible.

Without too much work, we got custom calls working on CPU and GPU with the new typed FFI, but some features are not working yet: accessing user data and allocating/freeing device memory both trigger a segfault. This was to be expected given that we had to patch things ourselves, but we just wanted to be sure we are doing things right and understand how it will eventually work.

Do you have minimal reproducers for the bugs? I suspect that we forgot to set some of the pointers and you end up dereferencing a nullptr somewhere. Also, what is the backend? GPU or CPU? I don't think it was ever implemented for CPU, because on CPU you can use a regular new/delete, but on GPU we make sure that custom calls have access to the underlying BFCAllocator, which shares a memory pool with XLA itself.
 
Do we agree that the expected way is to pass user data via the ExecuteContext through PJRT and then retrieve it in the custom call through the XLA_FFI_ExecutionContext_Get callback?

Yes, kind of... you are not supposed to ever touch the XLA_FFI_Xyz C APIs directly; instead, always use the C++ wrappers. There is one ExecutionContext example here: https://github.com/openxla/xla/blob/main/xla/service/gpu/custom_call_test.cc#L724-L756
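[Editor's note: to make the lookup shape concrete, here is a toy model of a type-keyed execution context. This is NOT the real XLA API — names like `ToyExecutionContext` are illustrative only; real code should bind user data through the `xla::ffi` C++ wrappers shown in the linked test. The sketch just shows the underlying pattern: one opaque user-data object stored per C++ type and retrieved by type.]

```cpp
#include <memory>
#include <typeindex>
#include <unordered_map>

// Toy model (not the XLA API): an execution context storing one opaque
// user-data object per C++ type, looked up by std::type_index.
class ToyExecutionContext {
 public:
  // Type-erase the payload; shared_ptr<void> keeps the correct deleter.
  template <typename T>
  void Insert(std::shared_ptr<T> data) {
    data_[std::type_index(typeid(T))] = std::move(data);
  }

  // Returns nullptr if no payload of type T was ever inserted — the
  // analogue of the null execution_context segfault discussed above.
  template <typename T>
  T* Lookup() const {
    auto it = data_.find(std::type_index(typeid(T)));
    return it == data_.end() ? nullptr : static_cast<T*>(it->second.get());
  }

 private:
  std::unordered_map<std::type_index, std::shared_ptr<void>> data_;
};

// Example user-data payload a custom call might want to see.
struct MyUserData {
  int value;
};
```

In the real API the framework does the `Lookup` for you when the handler is bound with the C++ wrappers, which is why a missing/unset context surfaces as a crash inside the FFI glue rather than in user code.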

Thanks,
Eugene

Corentin Godeau

Jul 11, 2024, 1:38:44 PM
to Eugene Zhulenev, Peter Hawkins, Dan Foreman-Mackey, OpenXLA Discuss
Thanks for the answer!


Do you have minimal reproducers for the bugs?

Not really, unfortunately. The reason is that we built our framework on top of PJRT (we are not using XLA directly), and it's currently in a private repository. I don't know if it would be of any help, but maybe I could create an example with the version of XLA + the patches we used to build the PJRT plugins?

I suspect that we forgot to set some of the pointers and you end up dereferencing a nullptr somewhere
That's exactly what's happening. I tracked down the code, and from what I can tell it segfaults at xla/ffi/ffi_api.cc#L354 when trying to access the user data and at xla/ffi/ffi_api.cc#L382 when trying to allocate memory.

The issue with user data is that args->ctx->execution_context is null, so it might not be set correctly when running an executable through PJRT. As for allocation, you're right: it is indeed working correctly on GPU; I spoke too soon.

Also, what is the backend? GPU or CPU?
Both, actually. We might be wrong, but we noticed that the code to register new FFI handlers is backend-agnostic. In short, we applied the patch in openxla/xla#13419 (+ some nitpicks to make the FFI C API headers proper C headers) and it worked.

I don't think it was ever implemented for CPU, because for CPU you can use a regular new/delete
Sure, that makes sense; we just wanted to try things and see what worked and what didn't.

Yes, kind of... you are not supposed to ever touch XLA_FFI_Xyz C APIs, and instead always use C++ wrappers
I'm not sure that's an option for us since we leverage PJRT, no? Also, the handler signature's only parameter is an `XLA_FFI_CallFrame`, so I think we need to interact with the C API at some point.
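[Editor's note: both points are compatible — the C++ binding machinery itself emits a function with the C call-frame signature, so PJRT calls through the C ABI while user code stays in C++. The sketch below is a toy model of that adapter pattern; `ToyCallFrame`, `ToyError`, and `AdapterHandler` are illustrative stand-ins, not real XLA FFI types (in XLA, the handler-definition macro plays the adapter role: it decodes the call frame and invokes the typed C++ callable).]

```cpp
// Toy sketch of how a typed C++ handler sits behind a C call-frame
// signature. Stand-ins for XLA_FFI_Error / XLA_FFI_CallFrame:
struct ToyError {
  int code;
};
struct ToyCallFrame {
  int arg;
};

// The C-ABI signature the runtime (here: PJRT) actually calls through.
using ToyHandlerFn = ToyError* (*)(ToyCallFrame*);

// The "typed" handler users write against decoded arguments.
static int TypedHandler(int arg) { return arg * 2; }

// Observable side effect so the adapter can be exercised below.
static int g_last_result = 0;

// The generated adapter: decode the frame, call the typed handler.
// Returning nullptr signals success (as in the real C API).
extern "C" ToyError* AdapterHandler(ToyCallFrame* frame) {
  g_last_result = TypedHandler(frame->arg);
  return nullptr;
}
```

So the call frame is still the ABI boundary, but only the generated adapter ever parses it; the framework-facing code registers `AdapterHandler` (in XLA, the symbol the definition macro produces) rather than decoding the frame by hand.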

Thanks for the help, let me know if you need more details!

Cheers,
Corentin