Dynamically loadable XLA plugin


Avijit Chakraborty

Mar 8, 2018, 2:53:26 PM
to XLA development
We would like to propose a dynamically loadable plugin for XLA. The key idea is very similar to adding a new Op to TensorFlow (https://www.tensorflow.org/extend/adding_an_op). In this scheme, the plugin(s) will be discovered and loaded by TensorFlow at startup. Computation placement etc. will work as usual; it's just that the plugin's functionality will be loaded dynamically.
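To make the loading step concrete, here is a minimal sketch of what startup discovery could look like, modeled on how TensorFlow already dlopen()s custom-op libraries. The RegisterXlaPlugin hook and the search-path plumbing are hypothetical, not part of the proposal:

    // Minimal sketch (C++): dlopen each candidate plugin DSO and invoke a
    // hypothetical registration hook it exports.
    #include <dlfcn.h>
    #include <iostream>
    #include <string>
    #include <vector>

    void LoadXlaPlugins(const std::vector<std::string>& plugin_paths) {
      for (const std::string& path : plugin_paths) {
        void* handle = dlopen(path.c_str(), RTLD_NOW | RTLD_LOCAL);
        if (handle == nullptr) {
          std::cerr << "Skipping XLA plugin " << path << ": " << dlerror() << "\n";
          continue;
        }
        // Hypothetical hook; once it runs, the plugin's device takes part in
        // computation placement like any built-in device.
        using RegisterFn = void (*)();
        auto register_fn =
            reinterpret_cast<RegisterFn>(dlsym(handle, "RegisterXlaPlugin"));
        if (register_fn != nullptr) register_fn();
      }
    }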

Comments/thoughts?

Thanks,

Avijit
xla-plugin-proposal.pdf

Justin Lebar

Mar 8, 2018, 5:04:29 PM
to Avijit Chakraborty, XLA development
We've had some off-list discussion, perhaps you could summarize it here as a starting point?


Avijit Chakraborty

Mar 8, 2018, 5:38:46 PM
to XLA development
Justin brought up an important point about the ABI compatibility issue in this scheme. 

Since the plugin DSO interface is a C++ API, the plugin DSO must be built with the same compiler and the same options as TensorFlow itself. Additionally, for every TF binary release, the plugin DSO needs to be rebuilt and released by the provider of the plugin. This could be a non-trivial task for the plugin provider and, if not done right, could be a frustrating experience for the end user.

While I acknowledge that this is a problem that needs to be properly addressed by the plugin provider, this proposal offers the following advantages from a plugin author's point of view:

1. The bridge code, i.e., the code that takes an HLO graph and provides a backend for the plugin device, is isolated from the rest of TensorFlow.

2. The bridge code can be developed outside of the TensorFlow source tree. This allows developers to focus on implementing the bridge and the backend without having to spend too much time understanding and keeping up with the TF build system and the rest of the codebase.

3. The bridge code doesn't need to be upstreamed, which means implementation work that is not related to TensorFlow doesn't have to go through the TF PR process (saving time for TF PR reviewers).

4. At the same time, this scheme still uses all the useful HLO optimizations (hence the need for a C++ API) and other features of the TF framework, which would continue to benefit plugin developers.

Thoughts/comments?

Bjarke Roune

Mar 9, 2018, 12:53:40 PM
to XLA development
Have you considered an approach based on one of these two options?

 1) Distribute your own XLA that exposes the same protobuffer and service interface that base XLA does. Then you just need TensorFlow to connect to your custom XLA service. (It's not clear to me how similar this is to what you're proposing.)

 2) Add a trivial XLA backend that just records the HLO graph as a protobuffer and passes it on to your binary-blob backend (potentially with no prior processing). We already have code to serialize HLO graphs as protobuffers, so that part is taken care of.
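A minimal sketch of option 2's hand-off, assuming the HloModule::ToProto() serialization path that already exists in the tree; the plugin_compile entry point is hypothetical:

    // Serialize the HLO graph and hand the raw bytes to the out-of-tree backend.
    #include <cstdint>
    #include <string>

    #include "tensorflow/compiler/xla/service/hlo_module.h"

    void HandOffToPlugin(const xla::HloModule& module) {
      std::string bytes;
      module.ToProto().SerializeToString(&bytes);
      // Hypothetical C entry point exported by the plugin DSO:
      // plugin_compile(reinterpret_cast<const uint8_t*>(bytes.data()), bytes.size());
    }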

The XLA protobuffer definitions are NOT guaranteed to be stable over time, so this is NOT a panacea for version compatibility. Justin was pointing out that ABI compatibility can be broken even at minor revisions, like security fixes. The protobuffer definitions are very unlikely to change with minor revisions, reducing these concerns and the frequency with which custom backend developers would have to release new binaries.

I think option 2 is a good choice. It keeps you in sync with upstream for changes that are not specific to your backend, reducing the risk of accumulating deviations from base XLA over time. Deviating is always tempting, but it can become difficult to deal with if you later wish to pull large changes down from base XLA, which you most likely will want to do.

Bjarke

Avijit Chakraborty

Mar 11, 2018, 11:36:44 PM
to XLA development
So, for solution #2, once the plugin receives the protobuf binary blob, it may need to run HLO-level optimizations, traverse the HLO graph, generate backend code, and so on. For the HLO-related operations, I assume that the plugin will be using various HLO classes from the TensorFlow codebase (i.e., it will use the TensorFlow include files and will need to link against libtensorflow_framework.so).

If the TensorFlow codebase has changed (due to a security update, for example), don't we need to recompile the plugin DSO?


Bjarke Roune

Mar 12, 2018, 1:27:38 AM
to Avijit Chakraborty, XLA development
That approach would involve shipping your binary with whatever HLO passes you need included as well as anything else you need from XLA (e.g. the HloInstruction class). If the passes are updated, you wouldn't then receive the updates without updating your binary, but other than that it would be fine, as long as the proto definitions and the hand-off API/ABI don't change.


Avijit Chakraborty

Mar 17, 2018, 6:51:03 PM
to XLA development
Bjarke's idea makes sense. Just to be sure that I understand it correctly, here are the key points as I understand them:

1. We will define a C interface that uses only POD types.
2. For transferring HLO graphs, we will serialize the graph to a protobuf binary on the TF side and deserialize it on the plugin side.
3. For graph execution, we will define a C API that uses memory pointers to the actual data, plus a data structure such as Shape (serialized as a protobuf) that describes the data layout.

The plugin implementation will use various TF and XLA classes as needed but will statically link them (i.e., it won't need libtensorflow_framework.so at runtime). A sketch of what such a plugin boundary could look like is below.
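Purely to illustrate the shape of such an interface, here is a hypothetical boundary header in which only POD types and serialized protobufs cross the C boundary; every name in it is invented for this sketch:

    // Hypothetical plugin-boundary header: POD types and serialized protos only.
    #include <stddef.h>
    #include <stdint.h>

    #ifdef __cplusplus
    extern "C" {
    #endif

    // Compile a serialized xla.HloModuleProto; returns an opaque executable
    // handle, or NULL on failure.
    void* plugin_compile(const uint8_t* hlo_module_proto, size_t proto_len);

    // Execute a compiled handle. Each argument is a raw buffer paired with a
    // serialized Shape proto describing its layout.
    int plugin_execute(void* executable,
                       const void* const* arg_buffers,
                       const uint8_t* const* arg_shape_protos,
                       const size_t* arg_shape_proto_lens,
                       size_t num_args,
                       void* result_buffer);

    #ifdef __cplusplus
    }
    #endif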

If this looks ok, then we will send out an API header file for review in this group.

Thanks,

Avijit 


Bjarke Roune

Mar 19, 2018, 12:19:42 PM
to XLA development
That sounds good to me, though feel free to argue for other options; this is only one of the ways to do it. Something more that you'll have to think about is how you would prefer the interaction with the device to work beyond handing it an XLA graph to execute. There are StreamExecutor and XLA interfaces for this, which may or may not be sufficient for your use case.


On Saturday, March 17, 2018 at 3:51:03 PM UTC-7, Avijit Chakraborty wrote:
> Bjarke's idea makes sense. Just to be sure that I understand it correctly, here are the key points as I understand them:
>
> 1. We will define a C interface that uses only POD types.
> 2. For transferring HLO graphs, we will serialize the graph to a protobuf binary on the TF side and deserialize it on the plugin side.

A quick note on terminology: We think of TF and XLA as separate entities, so to me "the TF side" would mean something in TF that is outside of XLA, while here I think you are not making that distinction and are reasoning "the XLA side is part of TF, so that's the TF side."

Justin Lebar

Mar 19, 2018, 4:19:12 PM
to Bjarke Roune, XLA development
Another thing to consider: The proto API you'll be using doesn't have backwards or forwards compatibility guarantees.  I think you're ok with that -- certainly failure should be less catastrophic than with an ABI mismatch, but you should think about how you're going to test this on your side so you know when you have to rebuild, and how you're going to communicate this to your customers.

Also you should think about how you're going to avoid ODR violations when you have your own copy of XLA classes.  Wrap everything in an inline namespace?  Hide all your symbols so they don't conflict?
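To make the two tactics concrete, here is a sketch of how a plugin's private copy of XLA code could be protected; the namespace name and exported function are made up:

    // 1) Wrap the vendored XLA copy in an inline namespace so its mangled
    //    symbol names can never collide with the ones in libtensorflow:
    namespace xla {
    inline namespace my_plugin_v1 {  // hypothetical versioned namespace
    // ... the plugin's private copies of XLA classes go here ...
    }  // inline namespace my_plugin_v1
    }  // namespace xla

    // 2) Additionally, build the DSO with -fvisibility=hidden and mark only
    //    the C entry points as visible, so internal symbols don't participate
    //    in dynamic symbol resolution at all:
    extern "C" __attribute__((visibility("default")))
    bool xla_plugin_init() { return true; }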


Chakraborty, Avijit

Mar 19, 2018, 4:29:13 PM
to Justin Lebar, Bjarke Roune, XLA development

These are very good points, Bjarke and Justin. I really appreciate your feedback. I will add these points to the API doc I will send out for review.

 

Thanks,

 

Avijit

Avijit Chakraborty

Mar 30, 2018, 11:14:28 AM
to XLA development
Do any of the TensorFlow binary distributions (i.e., the Python wheels) include XLA, or must anyone who intends to use XLA build it from source?



Avijit Chakraborty

Apr 3, 2018, 7:32:34 PM
to XLA development

I have scoped out the changes needed to support a full C API for the plugin, based on the comments from Justin and Bjarke. I am becoming more and more skeptical about putting in the effort to develop a C API to support backward compatibility. The main reason for a C API is ABI compatibility. However, the XLA plugin implementation needs to use a number of XLA, protobuf, and TensorFlow classes (e.g., StreamExecutor). Since these classes are going to change across releases (as they have in the past), I don't see any easy way to maintain backward compatibility.

As Justin mentioned: "The proto API you'll be using doesn't have backwards or forwards compatibility guarantees.  I think you're ok with that -- certainly failure should be less catastrophic than with an ABI mismatch."

He also mentioned: "Also you should think about how you're going to avoid ODR violations when you have your own copy of XLA classes.  Wrap everything in an inline namespace?  Hide all your symbols so they don't conflict?"

So, if we define the goal as "avoid catastrophic failure due to ABI mismatch, and report errors gracefully," then it's achievable. Here's a proposal that outlines the scheme:

 

1. The plugin will implement a C API function that receives version information from XLA, specifically:

        - the value returned by tf_git_version()

        - the value returned by tf_compiler_version()

        - the value returned by tf_cxx11_abi_flag()

2. This C API function will return a boolean indicating:

        - true: the plugin is compatible and loading can proceed;

        - false: the plugin is not compatible (or another error occurred) and loading cannot continue. In this case the plugin will also emit a LOG(WARNING) message, and the caller on the XLA side will skip the rest of the plugin initialization sequence; the remaining TensorFlow initialization will finish normally. At runtime, any TensorFlow Python script referring to this plugin device will get an error indicating that no such device was found, but there will be no crash.
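A sketch of such a check function as the proposal describes it; the function name is hypothetical, and the TF_* macros are assumed to be baked in at plugin build time from the TF tree the plugin was compiled against:

    #include <cstring>

    // Placeholders; a real build would pass these via -D from the TF build.
    #ifndef TF_GIT_VERSION
    #define TF_GIT_VERSION "unknown"
    #endif
    #ifndef TF_COMPILER_VERSION
    #define TF_COMPILER_VERSION "unknown"
    #endif
    #ifndef TF_CXX11_ABI_FLAG
    #define TF_CXX11_ABI_FLAG 0
    #endif

    // Returns true iff the plugin was built against the same TF the caller is
    // running; on mismatch the caller skips the rest of plugin initialization.
    extern "C" bool xla_plugin_is_compatible(const char* git_version,
                                             const char* compiler_version,
                                             int cxx11_abi_flag) {
      return std::strcmp(git_version, TF_GIT_VERSION) == 0 &&
             std::strcmp(compiler_version, TF_COMPILER_VERSION) == 0 &&
             cxx11_abi_flag == TF_CXX11_ABI_FLAG;
    }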

 

We recently open-sourced nGraph and an XLA plugin using a similar scheme on GitHub:

A. A modified TensorFlow that has the dynamically loadable plugin capability: https://github.com/NervanaSystems/ngraph-tensorflow

B. The nGraph library: https://github.com/NervanaSystems/ngraph

C. The nGraph XLA plugin: https://github.com/NervanaSystems/ngraph-tensorflow-bridge

In the ngraph-tensorflow repo, we defined the following (a sketch of the general shape follows this list):

  1. The plugin API consists of one C function that returns a C structure containing function pointers that the plugin adapter on the XLA side will call.
  2. The plugin adapter on the XLA side calls the Init method at startup and checks the return value. If the plugin returns false, the plugin adapter skips the rest of the device initialization; TensorFlow starts up normally, but the device is not available, which is what we want.
  3. We just need to add the version-information-based check to this Init(…) function.
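Illustrative shape of that single-entry-point scheme; the struct fields and names here are invented for the sketch and are not the actual ngraph-tensorflow API:

    #include <stddef.h>

    extern "C" {

    typedef struct XlaPluginApi {
      // Version/compatibility handshake; returning false aborts device init.
      bool (*init)(const char* tf_git_version, int cxx11_abi_flag);
      // Compile a serialized HloModuleProto; returns an opaque executable handle.
      void* (*compile)(const void* hlo_proto, size_t hlo_proto_size);
      // Run a compiled executable over raw argument buffers.
      bool (*execute)(void* executable, void** args, size_t num_args,
                      void* result);
    } XlaPluginApi;

    // The one C function the plugin DSO exports; the XLA-side adapter calls
    // this first and then works through the returned function pointers.
    const XlaPluginApi* xla_plugin_get_api();

    }  // extern "C"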

 

Please let us know whether this works or whether there are any corner cases we have missed.

 



Justin Lebar

Apr 4, 2018, 12:16:13 PM
to Avijit Chakraborty, XLA development
>  the XLA plugin implementation needs to use a number of XLA, Protobuf and TensorFlow classes (e.g., StreamExecutor).

Just to make sure I understand your argument, it is that the plugin API [0] needs to pass not just protobufs between it and XLA, but also other XLA and TensorFlow types?

>  Plugin will implement a C API function that will receive version information from XLA as follows:
>       - value returned by tf_git_version()
>       - value returned by tf_compiler_version()
>       - value returned by tf_cxx11_abi_flag()

And all of these have to match exactly in order to load the plugin?

In fact I think the version information needs to go from the plugin to TensorFlow, not the other way around.  TensorFlow cannot trust the plugin to make the right decision here; it should be TF that says "your plugin is not compatible".

One problem even with this approach is that in order to call the "info" function on the plugin, we still have to link with the shared library at build time or dlopen it at runtime.  Either way will cause us to run the library's global initializers.  If there is a mismatch between the XLA ABI and the code in those global initializers, we will have a catastrophic failure.

One way to work around this would be for TF to check that the shared library has no static initializers/destructors before loading it.  But I'm not sure you can or would want to guarantee this.
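For what it's worth, here is a Linux/ELF-specific sketch of such a check: scan the DSO's section headers for non-empty .init_array/.ctors sections before dlopen'ing it. This is only a heuristic (64-bit ELF only, edge cases ignored), and even a trivial DSO often carries a compiler-emitted .init_array entry, which illustrates why guaranteeing "no static initializers" is hard:

    #include <elf.h>
    #include <cstring>
    #include <fstream>
    #include <string>
    #include <vector>

    // Returns true if the DSO appears to have static initializers (or if we
    // cannot tell); the caller would then refuse to dlopen it.
    bool AppearsToHaveStaticInitializers(const std::string& path) {
      std::ifstream f(path, std::ios::binary);
      Elf64_Ehdr ehdr;
      if (!f.read(reinterpret_cast<char*>(&ehdr), sizeof(ehdr)) ||
          std::memcmp(ehdr.e_ident, ELFMAG, SELFMAG) != 0) {
        return true;  // Unreadable or not ELF: be conservative.
      }

      std::vector<Elf64_Shdr> shdrs(ehdr.e_shnum);
      f.seekg(ehdr.e_shoff);
      f.read(reinterpret_cast<char*>(shdrs.data()),
             shdrs.size() * sizeof(Elf64_Shdr));

      // Section names live in the section-header string table.
      const Elf64_Shdr& shstr = shdrs[ehdr.e_shstrndx];
      std::vector<char> names(shstr.sh_size);
      f.seekg(shstr.sh_offset);
      f.read(names.data(), names.size());

      for (const Elf64_Shdr& s : shdrs) {
        const char* name = names.data() + s.sh_name;
        if ((std::strcmp(name, ".init_array") == 0 ||
             std::strcmp(name, ".ctors") == 0) &&
            s.sh_size > 0) {
          return true;  // Non-empty initializer section found.
        }
      }
      return false;
    }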

So if we don't do that, then I guess the other alternative is to provide a manifest along with your shared library.

Another consideration is: There are two ways to use one of these plugins.  You can either get a precompiled binary and link with it somehow, or you can build from source and have it available natively.  We should consider these two APIs together so that we can see if we should change one to make it more similar to the other.



Chakraborty, Avijit

Apr 4, 2018, 7:37:53 PM
to Justin Lebar, XLA development

Thanks, Justin, for your feedback. Please see my responses below.

 

> Just to make sure I understand your argument, it is that the plugin API [0] needs to pass not just protobufs between it and XLA, but also other XLA and TensorFlow types?

 

[Avijit] The C plugin API would pass the protobuf and memory pointers. But the plugin implementation needs to use various TF & XLA data types and classes internally (e.g., for traversing the HLO graph, unpacking input data, packing output, etc.).
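For illustration, a sketch of the plugin side: rebuild an HloModule from the bytes handed across the C boundary, then walk it with the usual classes. This assumes the plugin statically links the XLA sources that define these classes; the exact CreateFromProto/config signatures vary across TF versions, so treat it as a sketch rather than a drop-in:

    #include <cstddef>
    #include <memory>

    #include "tensorflow/compiler/xla/service/hlo_module.h"

    std::unique_ptr<xla::HloModule> DeserializeHlo(const void* data, size_t size) {
      xla::HloModuleProto proto;
      if (!proto.ParseFromArray(data, static_cast<int>(size))) return nullptr;

      // Rebuild the module config from the entry computation's program shape.
      xla::HloModuleConfig config(proto.program_shape());
      auto module_or = xla::HloModule::CreateFromProto(proto, config);
      if (!module_or.ok()) return nullptr;
      return module_or.ConsumeValueOrDie();
    }

From here the plugin can run its own HLO passes and walk module->entry_computation()->instructions() to emit backend code.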

 

>  Plugin will implement a C API function that will receive version information from XLA as follows:

>       - value returned by tf_git_version()

>       - value returned by tf_compiler_version()

>       - value returned by tf_cxx11_abi_flag()

 

> And all of these have to match exactly in order to load the plugin?

 

[Avijit] The cxx11_abi and compiler_version need to match exactly. The git_version may be a bit more flexible: if the plugin was built against a specific TF git hash, and no later TF version changed the TF/XLA classes the plugin uses, then the plugin could still be loaded. Of course, that's a bit of a slippery slope, so initially we could mandate an exact match and see how it goes. After all, it's the responsibility of the plugin developers to release a newer version of the plugin to be used with a newer version of TF.

 

> In fact I think the version information needs to go from the plugin to TensorFlow, not the other way around. TensorFlow cannot trust the plugin to make the right decision here; it should be TF that says "your plugin is not compatible".

 

[Avijit] Yes, that makes better sense. In that case TensorFlow will query the plugin for the information above and decide whether to proceed with the load or fail. The plugin will simply report which specific git hash of TF it was compiled against.

 

> One problem even with this approach is that in order to call the "info" function on the plugin, we still have to link with the shared library at build time or dlopen it at runtime. Either way will cause us to run the library's global initializers. If there is a mismatch between the XLA ABI and the code in those global initializers, we will have a catastrophic failure.
>
> One way to work around this would be for TF to check that the shared library has no static initializers/destructors before loading it. But I'm not sure you can or would want to guarantee this.
>
> So if we don't do that, then I guess the other alternative is to provide a manifest along with your shared library.

 

[Avijit] A manifest file sounds like a better idea. In addition to the version information, other plugin-specific information (such as the device priority or a resource directory) could also be read from this manifest by both TensorFlow and the plugin itself as needed. A sketch of what the TensorFlow-side manifest check might look like is below.
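A hypothetical sketch of that check, assuming a simple key=value manifest shipped next to the plugin DSO; the manifest keys and the loader function are invented, while tf_git_version() and friends are the TF-provided version queries mentioned earlier in the thread:

    #include <dlfcn.h>
    #include <fstream>
    #include <map>
    #include <string>

    extern "C" {
    // Provided by TensorFlow (generated version_info.cc).
    const char* tf_git_version();
    const char* tf_compiler_version();
    int tf_cxx11_abi_flag();
    }

    std::map<std::string, std::string> ReadManifest(const std::string& path) {
      std::map<std::string, std::string> kv;
      std::ifstream f(path);
      std::string line;
      while (std::getline(f, line)) {
        auto eq = line.find('=');
        if (eq != std::string::npos)
          kv[line.substr(0, eq)] = line.substr(eq + 1);
      }
      return kv;
    }

    // Refuses to dlopen (and hence to run the DSO's global initializers)
    // unless every recorded version matches the running TF exactly.
    void* LoadPluginIfCompatible(const std::string& dso,
                                 const std::string& manifest) {
      auto kv = ReadManifest(manifest);
      if (kv["tf_git_version"] != tf_git_version() ||
          kv["tf_compiler_version"] != tf_compiler_version() ||
          kv["tf_cxx11_abi_flag"] != std::to_string(tf_cxx11_abi_flag()))
        return nullptr;
      return dlopen(dso.c_str(), RTLD_NOW | RTLD_LOCAL);
    }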

 

> Another consideration is: There are two ways to use one of these plugins. You can either get a precompiled binary and link with it somehow, or you can build from source and have it available natively. We should consider these two APIs together so that we can see if we should change one to make it more similar to the other.

 

[Avijit] The proposed API is derived from the old Executor example and provides a simplified set of function calls to compile and execute the graph. At a high level, both of these APIs need to provide the same set of functionality. The approach we are proposing is an additional (and more flexible) way to add support for new XLA devices; it is not intended to replace the other approach.

 

Justin Lebar

Apr 5, 2018, 3:01:08 PM
to Chakraborty, Avijit, XLA development
If we're going to have a manifest, I wonder if the shared library should simply register itself upon being dlopen'ed, the same as in the in-source build.  There would be no need to add any new API (?).
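A sketch of that self-registration pattern: a global object whose constructor runs during dlopen(), mirroring how in-tree backends register themselves (the registry call is hypothetical). Note the tension with the earlier point about global initializers: this is exactly the code that would run before any compatibility check could happen.

    namespace {

    struct PluginRegistrar {
      PluginRegistrar() {
        // e.g. xla::BackendRegistry::Register("my_plugin_device", ...);
        // (hypothetical registry call)
      }
    };

    // Constructed when the DSO is dlopen'ed, before any dlsym() lookup.
    PluginRegistrar registrar;

    }  // namespace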

Bjarke Roune

Apr 9, 2018, 5:42:16 PM
to Justin Lebar, Chakraborty, Avijit, XLA development
On Wed, Apr 4, 2018, Chakraborty, Avijit wrote:
> The C plugin API would pass the protobuf and memory pointers. But the plugin implementation needs to use various TF & XLA data types and classes internally (e.g., for traversing the HLO graph, unpacking input data, packing output, etc.).

Is it a problem to compile in and distribute these as part of your plugin?
 


Chakraborty, Avijit

Apr 9, 2018, 6:03:30 PM
to Bjarke Roune, Justin Lebar, XLA development

 

 

> Is it a problem to compile in and distribute these as part of your plugin?

No, that's not a problem; in fact the XLA plugin will compile in the necessary TF and XLA classes. But the issue is that if the definitions of these classes have changed, that would result in a problem.

 

Bjarke Roune

Apr 11, 2018, 12:32:00 PM
to XLA development


On Monday, April 9, 2018, Avijit Chakraborty wrote:
> No, that's not a problem; in fact the XLA plugin will compile in the necessary TF and XLA classes. But the issue is that if the definitions of these classes have changed, that would result in a problem.

Sorry if I'm being dense here, but why is that a problem? As long as those classes do not appear in the interface of your plugin, that should be fine, shouldn't it? Or are you saying that those classes have to appear in the interface?
