On Thu, Aug 1, 2019 at 8:16 AM Alexandre Passos <
apa...@google.com> wrote:
> On Thu, Aug 1, 2019 at 3:43 AM Artem Artemiev <
i...@artemav.com> wrote:
>>
>> I'm a bit confused now: can I do this type of optimization by introducing a new XLA op, or is it just not trivial (impossible)? :)
>>
>>> XLA currently isn't very extensible in the way you'd like to extend it
>>
>>
>> +Alexandre Passos, what did you have in mind? Why do you think it will be hard (impossible) to implement?
>
> Implementing new atomic operations in XLA is really difficult given its current design, and a PR adding one is unlikely to be accepted, as XLA intends HLO to be a closed set of operations the compiler is deeply aware of (so it'd involve, among other things, changes to the Google-private TPU backend to make it work). There is no easy extension point to add a new operation that just does a thing.
We do allow "custom calls" in HLO.
We really don't encourage generating them from the "frontend" (i.e. the
TF/XLA bridge), and Alex is right that a PR doing this would get
pushback. However, backends are free to generate custom calls when
lowering. For instance, in XLA GPU we use custom calls to represent
backwards convolutions. Backwards convolutions are first lowered (by
the TF/XLA bridge) into a pad/reverse/convolution sequence (I don't
remember the exact details) that's mathematically equivalent to a
backwards conv. They're later pattern-matched by
cudnn_conv_rewriter.cc into custom calls with targets like
"__cudnn$convBackwardInput", which do the whole sequence in "one
step"*, faster and with less memory.
* I'm counting a single call to cudnn as "one step"
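To make the "backends generate custom calls when lowering" part
concrete, here is a minimal sketch of the replacement step using the
public HLO APIs. This is not the actual cudnn_conv_rewriter.cc code,
and "__myBackend$fusedOp" is a made-up custom-call target that a real
backend's emitter would have to know how to codegen:

  #include "tensorflow/compiler/xla/service/hlo_computation.h"
  #include "tensorflow/compiler/xla/service/hlo_instruction.h"

  // Swap an already-matched root instruction for a single custom call
  // with the same shape and operands.
  xla::Status LowerToCustomCall(xla::HloComputation* computation,
                                xla::HloInstruction* root) {
    std::unique_ptr<xla::HloInstruction> call =
        xla::HloInstruction::CreateCustomCall(
            root->shape(), root->operands(),
            /*custom_call_target=*/"__myBackend$fusedOp");
    // Rewires root's users to the custom call and removes root.
    return computation->ReplaceWithNewInstruction(root, std::move(call));
  }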
In the example you gave, are f: X * X -> M and g: M -> V specific
functions? If so, I think our handling of cudnn backwards
convolutions would be a good fit for what you're trying to do. Most
of the machinery is in cudnn_conv_rewriter.cc.
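The recognition step can be sketched roughly like this (again, very
simplified; the real pass also checks window dimensions, padding,
feature counts, and so on before rewriting):

  #include "tensorflow/compiler/xla/service/hlo_instruction.h"
  #include "tensorflow/compiler/xla/service/hlo_opcode.h"

  // Heuristic sketch: a convolution whose filter operand is a reverse
  // looks like the sequence the bridge emits for a backward-input conv.
  bool LooksLikeBackwardInputConv(const xla::HloInstruction* instr) {
    return instr->opcode() == xla::HloOpcode::kConvolution &&
           instr->operand(1)->opcode() == xla::HloOpcode::kReverse;
  }

A pass would walk computation->MakeInstructionPostOrder(), apply a
predicate like this, and then do a ReplaceWithNewInstruction rewrite
like the one sketched above.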
If you want to do this more generally (i.e. f and g are general
functions that you know nothing about), then you'll probably have to
use our more general fusion machinery. This is split across several
optimization passes; see all files with "fusion" in their names in
tensorflow/compiler/xla/service/*.
Please feel free to ask additional questions if you have any.
> Similarly, while the thing you want can be implemented as a fusion, you cannot as a user currently teach XLA how to do new fusions. Either your fusion is expressible in terms of HLO (so you go from a set of HLOs to another set of HLOs) or it needs to be separately implemented for each backend.
Backends can implement custom fusions, like the
"__cudnn$convBackwardInput" custom calls I mentioned above.
We _also_ have a way to generically say "do this sequence of N HLO
ops, but in a single step", and we use that representation extensively
as well.
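That generic representation is the kFusion instruction. A minimal
sketch (the choice of ops here is illustrative, not from a real
pipeline):

  #include "tensorflow/compiler/xla/service/hlo_computation.h"
  #include "tensorflow/compiler/xla/service/hlo_instruction.h"

  // Assuming `exp` computes exp(mul(a, b)) and `mul` is its producer,
  // fold both into one kFusion instruction (root listed first) that
  // backends then emit as a single kernel.
  xla::HloInstruction* FuseMulExp(xla::HloComputation* computation,
                                  xla::HloInstruction* exp,
                                  xla::HloInstruction* mul) {
    return computation->CreateFusionInstruction(
        /*instructions_to_fuse=*/{exp, mul},
        xla::HloInstruction::FusionKind::kLoop);
  }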
-- Sanjoy