PJRT for Custom Accelerator

Milin Bhade

Oct 14, 2025, 10:44:11 PM
to OpenXLA Discuss
Hi, I’m working on a PJRT plugin for a custom accelerator and want to enable availability-aware, cost-driven partitioning of an XLA/HLO module across GPU, CPU, and the custom accelerator:

If only the CPU and the custom accelerator are available, run the module on those.

If a GPU is present and used, automatically identify HLO subgraphs that are better offloaded to the accelerator, and compile/run them there alongside the GPU (see the placement sketch after this list).
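
For context, here is roughly the placement behavior I mean, as a minimal JAX-on-PJRT sketch. "myaccel" is a placeholder for whatever platform name the plugin registers; note that each jitted computation lands on a single backend here, and nothing splits one HLO module:

import jax
import jax.numpy as jnp

cpu = jax.devices("cpu")[0]
try:
    accel = jax.devices("myaccel")[0]  # placeholder plugin platform name
except RuntimeError:
    accel = cpu                        # plugin not loaded: fall back to CPU

# Committed inputs pin each jitted computation to one backend.
x = jax.device_put(jnp.ones((1024, 1024)), accel)
y = jax.jit(lambda a: a @ a.T)(x)              # compiled and run on accel
z = jax.jit(jnp.tanh)(jax.device_put(y, cpu))  # transferred, then run on CPU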

Questions:

Does XLA currently support multi-backend HLO partitioning/placement (i.e., splitting one HLO module across different backend types)?

Can a PJRT plugin expose device cost/constraints or otherwise influence partitioning during HLO-level compilation?

If not, what’s the recommended approach: implement an XLA pass to consume cost info, or build an orchestration layer that partitions the model and invokes multiple PJRT clients/executables? Which option is more realistic today? (A sketch of the orchestration option follows this list.)
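
To make the orchestration-layer option concrete, here is a minimal sketch of what I picture: partition the model into stages, pick a device per stage from a cost table, and move tensors between backends explicitly. All names are hypothetical ("myaccel" stands in for the plugin's platform string) and the cost numbers are made up; a real orchestrator would profile or query the plugin for them:

import jax
import jax.numpy as jnp

def first_device(platform):
    try:
        return jax.devices(platform)[0]
    except RuntimeError:
        return None  # backend not available in this process

devices = {p: d for p in ("cpu", "gpu", "myaccel")
           if (d := first_device(p)) is not None}

# Made-up per-(stage, platform) cost estimates.
COST = {"matmul":    {"cpu": 9.0, "gpu": 2.0, "myaccel": 1.0},
        "pointwise": {"cpu": 1.0, "gpu": 1.5, "myaccel": 3.0}}

def place(stage):
    # Cheapest platform among those actually present.
    return devices[min(devices, key=lambda p: COST[stage][p])]

stages = [("matmul", jax.jit(lambda a: a @ a.T)),
          ("pointwise", jax.jit(lambda a: jnp.tanh(a) + 1.0))]

x = jnp.ones((512, 512))
for name, fn in stages:
    x = fn(jax.device_put(x, place(name)))  # explicit cross-backend move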

I can prototype either an XLA/HLO pass or an external orchestrator, and am looking for pointers, existing examples, or caveats.
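
For either prototype, I assume the starting point is getting the lowered IR in hand without executing, e.g.:

import jax
import jax.numpy as jnp

lowered = jax.jit(lambda a: jnp.tanh(a @ a.T)).lower(jnp.ones((64, 64)))
print(lowered.compiler_ir(dialect="stablehlo"))  # MLIR module text to analyze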

Thanks.