Maintaining Gelu operation from Keras in XLA


Noor, Abdul Rafae

Jun 7, 2022, 12:33:22 PM
to xla...@googlegroups.com

Hello,

 

I am working on an LLVM-based backend which lowers XLA operations to LLVM IR. As part of this backend I would like to lower the GELU operation from Keras while retaining, at the XLA level, the fact that the computation is a GELU. Since GELU is not an XLA operation, it is decomposed into its component operations, as shown here:

 

https://github.com/tensorflow/tensorflow/blob/49d605dab04d9e07441153b3a0d7a2beb2db7127/tensorflow/python/ops/nn_ops.py#L3668

 

While the individual operations exist in XLA, I would like to retain the fact that this computation is a GELU rather than have it decomposed. What would be the best approach to retaining (or recovering) this information? I can think of two possible ways:

 

  1. Use a pattern matcher, built on the utilities in https://github.com/tensorflow/tensorflow/blob/master/tensorflow/compiler/xla/service/pattern_matcher.h (a sketch follows this list). However, I'm concerned that the size of the pattern may make this case difficult to identify: the HLO output of a simple sequential model with a gelu activation is at least dozens of lines.
  2. Lower the gelu operation into an XLA CustomCall operation (https://www.tensorflow.org/xla/operation_semantics#customcall) with a custom GELU declaration at the point where gelu is currently decomposed (https://github.com/tensorflow/tensorflow/blob/49d605dab04d9e07441153b3a0d7a2beb2db7127/tensorflow/python/ops/nn_ops.py#L3668), and then handle the lowering of this custom call the way it is done in https://github.com/tensorflow/tensorflow/blob/49d605dab04d9e07441153b3a0d7a2beb2db7127/tensorflow/compiler/xla/service/cpu/ir_emitter.cc#L2420. This may be the better option, since retaining information is easier than trying to recover it after decomposition. However, I was unable to find any code that generates the CustomCall XLA operation at that point (the point where gelu is decomposed into multiple simpler operations).
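
To make option 1 a bit more concrete, here is a rough sketch of what a matcher for the tanh-approximated form might look like using the m:: helpers from pattern_matcher.h. The function name is mine, and the real HLO may associate the multiplies differently or omit/insert broadcasts around the scalar constants, so an actual matcher would need several variants of this shape:

    #include "tensorflow/compiler/xla/service/pattern_matcher.h"

    namespace m = xla::match;

    // Sketch only: recognize the tanh-approximated GELU,
    //   0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3))),
    // rooted at `instr`, capturing the input in `x`.
    bool IsTanhGelu(xla::HloInstruction* instr, xla::HloInstruction** x) {
      return xla::Match(
          instr,
          m::MultiplyAnyOrder(
              m::MultiplyAnyOrder(m::Op(x),
                                  m::Broadcast(m::ConstantScalar(0.5))),
              m::AddAnyOrder(m::Broadcast(m::ConstantScalar(1.0)),
                             // The tanh argument is itself several ops;
                             // left unconstrained here for brevity.
                             m::Tanh(m::Op()))));
    }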

 

Best Regards,

Rafae

George Karpenkov

Jun 7, 2022, 4:45:26 PM
to Noor, Abdul Rafae, Justin Lebar, xla...@googlegroups.com
+Justin Lebar  (Note: external mailing list)

Hi Rafae,

This is an excellent question!

Indeed, you've correctly outlined your options: either pattern-match in XLA (attempt a "highering" operation), or create a CustomCall in a tf2xla kernel.
Unfortunately, neither of them is great: CustomCall will not work on other backends, and "highering" is inherently fragile.
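
(For concreteness, the builder side of the CustomCall route might look roughly like the following; the "__mybackend$gelu" target string and the helper name are made up, and whatever backend consumes the HLO has to dispatch on that same string in its HandleCustomCall:)

    #include "tensorflow/compiler/xla/client/xla_builder.h"

    // Illustrative target name; any string works as long as the backend
    // that consumes the HLO recognizes the same one.
    constexpr char kGeluTarget[] = "__mybackend$gelu";

    // Emit one opaque custom call instead of letting gelu be decomposed
    // into its component elementwise ops.
    xla::XlaOp EmitGeluCustomCall(xla::XlaBuilder* builder, xla::XlaOp x,
                                  const xla::Shape& result_shape) {
      return xla::CustomCall(builder, kGeluTarget, /*operands=*/{x},
                             /*shape=*/result_shape);
    }

On the backend side, the handler analogous to IrEmitter::HandleCustomCall checks custom_call->custom_call_target() against the same string and emits the specialized lowering there; a backend that doesn't know the string will fail on the op, which is exactly why this doesn't carry over to other backends.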

Adding Justin, who was recently solving almost exactly the same problem.


Justin Lebar

Jun 7, 2022, 5:05:57 PM
to George Karpenkov, Noor, Abdul Rafae, xla...@googlegroups.com
I don't think this is too hard to pattern-match -- we already do much "worse" pattern-matching. But gelu is also elementwise and pretty simple, so I am curious why XLA's native codegen isn't good enough for you.

Noor, Abdul Rafae

Jun 7, 2022, 5:19:26 PM
to jle...@waymo.com, George Karpenkov, xla...@googlegroups.com

Hello all,

I'm attaching a small snippet of code showing the model in Keras, as well as the HLO file for the gelu kernel. Could you share or point out where some of the more complex pattern matching is done in XLA? I can implement a pattern for the code sequence shown, but I'm not sure whether it would capture many cases in practice.

 

With regard to XLA's native codegen, it is more than sufficient; however, as part of our requirements we would like to retain some higher-level operation information (gelu is one such example) and then do lower-level codegen in a separate phase later on. For the regular case we would decompose the operations much as XLA does, but where specialized hardware support is available, having this information can be useful.

 

Regards,

Rafae

 

 

[Attachment: the Keras model snippet and the HLO dump for the gelu kernel, included as an image]

Justin Lebar

Jun 7, 2022, 5:24:03 PM
to Noor, Abdul Rafae, George Karpenkov, xla...@googlegroups.com
cudnn_fused_conv_rewriter.cc has some pretty complicated matchers.


> With regard to XLA's native codegen, it is more than sufficient; however, as part of our requirements we would like to retain some higher-level operation information (gelu is one such example) and then do lower-level codegen in a separate phase later on.

FWIW, I think there's still an xyproblem.info situation here, in that it's not clear to me why you actually want to do this.

Noor, Abdul Rafae

Jun 7, 2022, 5:38:17 PM
to jle...@waymo.com, George Karpenkov, xla...@googlegroups.com

Hey Justin, I apologize if the intention is not clear. We're working on an internal project doing low-level code generation for a number of emerging tensor ISAs, and we are trying to use the XLA graph as a frontend to that system. Some of the constructs in our system overlap with XLA operations (e.g. a dot product in XLA can map to a dot product in our system), but our operations are not identical to XLA's (hence we support only a restricted set of XLA operations). We map the XLA operations accordingly, and the code generation is done outside of TensorFlow, so we are not using the LLVM lowering currently present in XLA for these operations. Does that help give some idea of what I'm attempting to do?

Justin Lebar

Jun 7, 2022, 7:30:53 PM
to Noor, Abdul Rafae, George Karpenkov, xla...@googlegroups.com
Thanks, that does help.

I am a little confused about how, in the system you describe, you're still using LLVM. (From the original email: "I am working on an LLVM-based backend which lowers XLA operations to LLVM IR.") If you have a chip that supports an ISA that is a reasonable match for LLVM IR, I am surprised that you also need to pattern-match gelu.

But I guess I will find out more when I read the paper!  (Please send a link to this list when it's ready!)