Hey,
By customized ops, do you mean TF ops, or do you want to add a custom XLA HLO op? If the former, you can already register a kernel that does symbolic expansion today; if the latter, then no, not in XLA.
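For the former case, a minimal sketch of such a symbolic-expansion kernel might look like this (the "MyScale" op and its doubling behavior are made up for illustration, and it assumes a matching REGISTER_OP("MyScale") already exists on the TF side):

#include "tensorflow/compiler/tf2xla/xla_op_kernel.h"
#include "tensorflow/compiler/tf2xla/xla_op_registry.h"
#include "tensorflow/compiler/xla/client/lib/constants.h"
#include "tensorflow/compiler/xla/client/xla_builder.h"

namespace tensorflow {

// Hypothetical op "MyScale": expand it into existing HLO (out = x * 2)
// instead of introducing a new HLO instruction.
class MyScaleOp : public XlaOpKernel {
 public:
  explicit MyScaleOp(OpKernelConstruction* ctx) : XlaOpKernel(ctx) {}
  void Compile(XlaOpKernelContext* ctx) override {
    xla::XlaOp x = ctx->Input(0);
    ctx->SetOutput(0, xla::Mul(x, xla::ScalarLike(x, 2)));
  }
};

REGISTER_XLA_OP(Name("MyScale"), MyScaleOp);

}  // namespace tensorflow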
--
Best,
Jacques

On Wed, Jun 3, 2020, 4:45 AM Wenxi Zhu <zhuwen...@gmail.com> wrote:

Hi,

Is there any way to have a plugin mechanism, so users can define and run customized ops in XLA, just like its counterpart in non-XLA TensorFlow (https://www.tensorflow.org/guide/create_op)?

I suppose APIs such as REGISTER_XLA_OP() and the Thunk-related structures could be exported to help users define the HLO codegen and thunk implementations for their customized ops? I'm wondering if there is a plan, because currently XLA itself is not even included in libtensorflow_framework.so.

And if the answer is yes, would you accept a pull request with the work?

Thank you!
Wenxi
On 4 Jun 2020, at 17:00, 朱文熙 (Wenxi Zhu) <zhuwen...@gmail.com> wrote:
Sanjoy, thank you for the reply!

I've read through the CustomCall documentation, but I don't think it can fulfill my requirement. Currently it looks more like a hack to me than a fully functional plugin mechanism, because:

1. The custom-call code that users write has to be added to the TensorFlow source tree and compiled with TensorFlow, while official TensorFlow plugins can be compiled separately from TensorFlow and loaded by it at runtime. (I haven't tried writing a custom call yet, so please correct me if I'm wrong.)
2. "Custom call doesn't know the dimensions of buffers it operates over" — that's what I read on its doc page.

It looks like the custom-call thunk is not a "first-class citizen" thunk, so it lacks some basic functionality? Are there other differences between the custom-call thunk and the "first-class citizen" thunks, such that there would be trouble writing a serious custom-call implementation? I don't know.

I'm working on enabling a custom GPU op (somewhat like Horovod's allreduce/allgather operations, but with some major differences) to run in XLA. The op is used in a model from Tencent AI Lab and will be deployed in Tencent's datacenters for large-scale multi-machine training. I definitely don't want to hack the existing XLA, because maintaining a modified TensorFlow that has diverged from the master branch is high-maintenance, especially in a datacenter. So I'm looking for a plugin mechanism in XLA, but have had no luck.
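From the doc page, the GPU custom-call path looks roughly like the sketch below (the "my_allreduce" target name and the trivial copy body are placeholders of mine; note that the callback sees only raw device pointers, so dimensions have to travel in the opaque string — exactly the limitation quoted in point 2):

#include <cstdlib>
#include <string>

#include "cuda.h"
#include "cuda_runtime_api.h"
#include "tensorflow/compiler/xla/service/custom_call_target_registry.h"

// Placeholder body: just copies the input buffer to the output buffer;
// a real implementation would launch the collective here. Any shape
// information has to be decoded from the opaque string attached when
// the CustomCall HLO was built.
void MyAllreduce(CUstream stream, void** buffers, const char* opaque,
                 size_t opaque_len) {
  // Assume (for illustration only) the opaque string is the element count.
  std::string meta(opaque, opaque_len);
  size_t num_elems = std::strtoull(meta.c_str(), nullptr, 10);
  cudaMemcpyAsync(buffers[1], buffers[0], num_elems * sizeof(float),
                  cudaMemcpyDeviceToDevice,
                  reinterpret_cast<cudaStream_t>(stream));
}

XLA_REGISTER_CUSTOM_CALL_TARGET_WITH_SYM("my_allreduce", MyAllreduce, "CUDA");

On the build side this would be emitted with xla::CustomCall(builder, "my_allreduce", {x}, out_shape, opaque), so that opaque payload is the only channel for passing dimensions through.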
Sanjoy,

Thanks for the explanations! That totally makes sense to me, especially the "custom call" part.

Just curious: is there any specific reason why XLA can't have a plugin mechanism? The ordinary TensorFlow distribution already provides REGISTER_OP() and REGISTER_KERNEL_BUILDER() to let users create their own op and the corresponding kernel implementation, so it would make sense for XLA to export similar APIs for creating the HLO codegen and thunk implementation of a user-customized op, I think.

Coming to my situation: my worry is that the op I'm implementing is not general and mature enough to be added to XLA. I understand there wouldn't be too much trouble in adding a new HLO and a corresponding xla::Thunk, but after that, every time our DL scientists change the op's behavior, I would have to change the HLO/Thunk implementations and upstream them to the community, which is troublesome in my opinion.

That's why I'm calling for a plugin or extension mechanism, where users can provide their own HLO codegen and thunk implementations for a customized op. With this approach there would be no need to change the XLA source code and recompile it every time; the plugin would be compiled and maintained separately from the main TensorFlow/XLA source code and loaded at runtime. That is a much more flexible and productive way, in my view, and I believe it would benefit a broader spectrum of users.

I'm willing to put in the effort to introduce the plugin/extension mechanism to XLA. Actually, I'm about to create an RFC for further discussion in the community (assuming it isn't against XLA's roadmap, you know). Would you like to be my sponsor, or connect me with anyone who is suitable for that?
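For comparison, here is the ordinary TensorFlow plugin path I mean: the op and kernel below (the "MyAllreduce" name and pass-through body are illustrative) can be compiled into a standalone .so against the TF headers and loaded at runtime with tf.load_op_library(), with no change to the TensorFlow source tree:

#include "tensorflow/core/framework/op.h"
#include "tensorflow/core/framework/op_kernel.h"
#include "tensorflow/core/framework/shape_inference.h"

using namespace tensorflow;

REGISTER_OP("MyAllreduce")
    .Input("x: float")
    .Output("y: float")
    .SetShapeFn([](shape_inference::InferenceContext* c) {
      c->set_output(0, c->input(0));  // output shape == input shape
      return Status::OK();
    });

class MyAllreduceOp : public OpKernel {
 public:
  explicit MyAllreduceOp(OpKernelConstruction* ctx) : OpKernel(ctx) {}
  void Compute(OpKernelContext* ctx) override {
    // Pass-through placeholder; a real kernel would run the collective.
    ctx->set_output(0, ctx->input(0));
  }
};

REGISTER_KERNEL_BUILDER(Name("MyAllreduce").Device(DEVICE_CPU), MyAllreduceOp);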
On Mon, Jun 8, 2020 at 4:14 AM Wenxi Zhu <zhuwen...@gmail.com> wrote:

> Just curious: is there any specific reason why XLA can't have a plugin mechanism? [...] it would make sense for XLA to export similar APIs for creating the HLO codegen and thunk implementation of a user-customized op, I think.

That's exactly the custom-call mechanism (but, as you said, there may be room for improvement).

> I'm willing to put in the effort to introduce the plugin/extension mechanism to XLA. [...] Would you like to be my sponsor, or connect me with anyone who is suitable for that?

We are in the process of incrementally porting parts of XLA to the MLIR compiler infra, so this partly depends on your timeline. If you need something in the next 1-2 quarters, then IMO improving XLA's custom-call support makes sense. We can also use your input about what doesn't work well with XLA's custom-call HLO to inform our choices as we move to MLIR.
Thank you, Chris. Very helpful.

You mentioned the design choices about portability and consistency across devices; I totally agree with that. That's why I don't think adding a new HLO/Thunk is the best solution for my situation, although it's probably the fastest way to get my work done.

That's also why I believe a plugin mechanism is suitable and necessary for XLA. It is a non-invasive way to extend XLA's capability: developers would not need to hack the existing XLA source code and thus maintain a modified TensorFlow distribution themselves. Plugin development would also be much easier, since it need not take portability or consistency into as much consideration; a custom op with only one device implementation (such as GPU) is appropriate, because users of these plugins know exactly what device/platform they're running on and simply install the appropriate plugin. That's my thinking about the plugin mechanism.

If I understand correctly, the design of the XLA/MLIR plugin you're working on is still at a very early stage, probably purely proof-of-concept (you mentioned it's all code), with no working prototype yet? But there's definitely a plan, and we're marching toward the target, right?