CustomCalls and subcomputations


George Pawelczak

Jul 16, 2019, 5:57:17 PM
to XLA development
Hi all,

We have been using custom calls extensively in our backend. Given how (relatively) easy it is to lower a Python function into XLA, we are wondering what the community's opinion is on allowing custom calls to have subcomputations associated with them. One use case in our backend would be a custom call which, similarly to an instruction such as reduction, has a computation associated with it that tells our backend how to combine values together. This computation could be a user-defined Python function which is lowered into XLA when generating the custom call, optimised at the HLO level, and then lowered in our backend, which takes the subcomputation into account.
This approach still allows us to optimise such computations, whereas if we passed that information in the backend config proto, this would not be possible.

To achieve this I have extended the custom call instruction to take a computation, modified the accessors, etc. I've placed an (arbitrary) restriction that the return value of the subcomputation needs to have the same shape as the output of the custom call instruction (the default subcomputation broadcasts zero to the right shapes, handling tuples). This can be changed, though, probably to just return a scalar zero. It is then completely up to the backend whether to use that subcomputation or not, so CPU and GPU should not be affected apart from the HloModule having these extra computations.
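
To make the idea concrete, here is a minimal toy model in plain Python. It is only a sketch and does not use any XLA API: `Computation`, `CustomCall`, `lower_with_subcomputation`, `lower_ignoring_subcomputation` and the target name `my_backend_reduce` are all hypothetical names made up for illustration, and a shape is reduced to a tuple of dimensions. It shows the shape restriction described above, plus one backend that uses the attached computation and one that ignores it.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple

Shape = Tuple[int, ...]  # toy stand-in for an XLA shape: dimensions only


@dataclass
class Computation:
    """Hypothetical stand-in for an HLO subcomputation."""
    name: str
    fn: Callable[[float, float], float]  # combines two scalar values
    result_shape: Shape


@dataclass
class CustomCall:
    """Hypothetical stand-in for a custom call instruction."""
    target: str
    output_shape: Shape
    subcomputation: Optional[Computation] = None

    def verify(self) -> None:
        # The (arbitrary) restriction from the post: the subcomputation's
        # result shape must equal the custom call's output shape.
        sub = self.subcomputation
        if sub is not None and sub.result_shape != self.output_shape:
            raise ValueError(
                f"{sub.name} returns {sub.result_shape}, "
                f"expected {self.output_shape}")


def lower_with_subcomputation(call: CustomCall, values: Sequence[float]) -> float:
    """A backend that opts in: folds the values with the attached computation."""
    call.verify()
    if call.subcomputation is None:
        raise ValueError("this backend needs a subcomputation")
    if not values:
        raise ValueError("need at least one value to combine")
    result = values[0]
    for v in values[1:]:
        result = call.subcomputation.fn(result, v)
    return result


def lower_ignoring_subcomputation(call: CustomCall) -> str:
    """A backend that opts out: dispatches on the target name alone."""
    return f"call {call.target}"


# A user-defined combiner (here simply max) attached to a custom call.
combine_max = Computation("combine_max", max, result_shape=())
call = CustomCall("my_backend_reduce", output_shape=(), subcomputation=combine_max)
```

With this model, `lower_with_subcomputation(call, [1.0, 5.0, 3.0])` folds the values with `max`, while `lower_ignoring_subcomputation(call)` never looks at the attached computation, which is the sense in which other backends would be unaffected.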

The changes I have made thus far are here:
https://github.com/georgepaw/tensorflow/commit/3b6f6c6ddad88d80dfa6534d6f11cfe4f5429d37
They are mainly to the tests (I haven't targeted the GPU backend yet).

If we were to proceed with this, I wonder whether a custom call subcomputation should only have the custom call as its call site. Unlike a fused computation, however, it could still be optimised.

DavidN and I have been wondering whether this is the right approach to take, or whether there are any suggestions for alternatives. We are hoping to upstream these changes to minimise merge conflicts with our fork.

Cheers,

George

Sanjoy Das

Aug 15, 2019, 11:56:05 AM
to George Pawelczak, XLA development
Hi George,

Sorry for the late reply.

On Tue, Jul 16, 2019 at 2:57 PM George Pawelczak <grzpaw...@gmail.com> wrote:
> We have been using custom calls extensively in our backend. Given how (relatively) easy it is to lower a Python function into XLA, we are wondering what the community's opinion is on allowing custom calls to have subcomputations associated with them. One use case in our backend would be a custom call which, similarly to an instruction such as reduction, has a computation associated with it that tells our backend how to combine values together. This computation could be a user-defined Python function which is lowered into XLA when generating the custom call, optimised at the HLO level, and then lowered in our backend, which takes the subcomputation into account.

What does this CustomCall really do? Can it be expressed as HLOs? Is
it reasonable to introduce a new HLO if existing HLOs can't express
its semantics?

We want to discourage XLA frontends from generating custom calls since
custom calls are not "self describing".

-- Sanjoy

> The changes I have made thus far are here:
> https://github.com/georgepaw/tensorflow/commit/3b6f6c6ddad88d80dfa6534d6f11cfe4f5429d37

George Karpenkov

Oct 2, 2020, 10:20:52 PM
to George Pawelczak, XLA development
Hi George,

One year later I find myself needing this, so I might write something close to your patch. Are you still using it? One thing I'm confused about is why you are creating "default" computations instead of just leaving them empty.

George


George Karpenkov

Oct 2, 2020, 10:29:09 PM
to George Pawelczak, XLA development
Actually, never mind: this functionality was already added in https://github.com/tensorflow/tensorflow/commit/5fc54e21530b89048a39bf83983a1f94befbc71b.