--
You received this message because you are subscribed to the Google Groups "XLA development" group.
To unsubscribe from this group and stop receiving emails from it, send an email to xla-dev+u...@googlegroups.com.
To post to this group, send email to xla...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/xla-dev/53f06032-db88-42ee-bcff-2267936a2ab6%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Bjarke's idea makes sense. Just to be sure that I understand this correctly, here are the key points as I understand them:
1. We will define a C interface with POD types.
2. For transferring the HLO graphs, we will use the protobuf binary format to serialize the graph on the TF side and deserialize it on the plugin side.
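A minimal sketch of what such a POD-only boundary could look like; all names here (`SerializedProto`, `plugin_receive_hlo`) are illustrative assumptions, not an actual TF/XLA API. The idea is that the HLO graph crosses the boundary as an opaque byte buffer holding the serialized protobuf, so no C++ types appear in the interface:

```c
#include <stddef.h>
#include <stdint.h>

/* POD struct carrying a serialized protobuf (e.g. an HloModuleProto)
 * across the C boundary. */
typedef struct {
  const uint8_t* data;
  size_t size;
} SerializedProto;

/* Stand-in for the plugin-side entry point: here we only validate the
 * buffer; a real plugin would parse the proto with its own copy of the
 * protobuf runtime. Returns 0 on success, -1 on error. */
int plugin_receive_hlo(SerializedProto p) {
  if (p.data == NULL || p.size == 0) return -1;  /* reject empty input */
  return 0;                                      /* success */
}
```

Because only `data`/`size` cross the boundary, the TF side and the plugin side are free to link different protobuf versions, at the cost of the (weaker) proto compatibility guarantees discussed below.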
These are very good points Bjarke and Justin. Really appreciate your feedback. I will add these points as part of the API doc I will send out for review.
Thanks,
I have scoped out the changes needed to support a full C API for the plugin based on the comments from Justin and Bjarke. I am becoming more and more skeptical about putting in the effort to develop a C API to support backward compatibility. The main reason for a C API is ABI compatibility. However, the XLA plugin implementation needs to use a number of XLA, protobuf, and TensorFlow classes (e.g., StreamExecutor). Since these classes are going to change across releases (as they have in the past), I don't see any easy way to maintain backward compatibility.
As Justin mentioned:
“The proto API you'll be using doesn't have backwards or forwards compatibility guarantees. I think you're ok with that -- certainly failure should be less catastrophic than with an ABI mismatch”
He also mentioned: “Also you should think about how you're going to avoid ODR violations when you have your own copy of XLA classes. Wrap everything in an inline namespace? Hide all your symbols so they don't conflict?”
So, if we define the goal as "avoid catastrophic failure due to an ABI mismatch, with graceful error reporting," then it's achievable. Here's a proposal that outlines the scheme:
1. The plugin will implement a C API function that will receive the following version information from XLA:
- value returned by tf_git_version()
- value returned by tf_compiler_version()
- value returned by tf_cxx11_abi_flag()
2. This C API function will return a boolean indicating:
- True: the plugin is compatible and loading can proceed.
- False: the plugin is not compatible (or another error occurred) and loading cannot continue. In this case a LOG(WARNING) message will also be emitted. The caller on the XLA side will skip the rest of the plugin initialization sequence, while the remaining TensorFlow initialization will finish. At runtime, any TensorFlow Python script referring to this plugin device will get an error indicating that no such device was found, but this will not result in a crash.
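As a hedged sketch of how the check in steps 1 and 2 might look on the plugin side; the function name, the baked-in version strings, and the exact-match policy are illustrative assumptions, not an existing API:

```c
#include <string.h>

/* Values captured when the plugin was compiled (illustrative). */
#define PLUGIN_TF_GIT_VERSION "v1.8.0-0-gabc123"
#define PLUGIN_TF_COMPILER_VERSION "gcc-5.4.0"
#define PLUGIN_TF_CXX11_ABI_FLAG 1

/* Compatibility check: TF passes its own build identifiers and the
 * plugin compares them with the values it was built against.
 * Returns 1 (compatible) or 0 (incompatible); per the proposal, the
 * initial policy is an exact match on all three values. */
int XlaPluginIsCompatible(const char* tf_git_version,
                          const char* tf_compiler_version,
                          int tf_cxx11_abi_flag) {
  return strcmp(tf_git_version, PLUGIN_TF_GIT_VERSION) == 0 &&
         strcmp(tf_compiler_version, PLUGIN_TF_COMPILER_VERSION) == 0 &&
         tf_cxx11_abi_flag == PLUGIN_TF_CXX11_ABI_FLAG;
}
```

As discussed later in the thread, the direction of this check matters: TF, not the plugin, should make the final load/no-load decision.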
We recently open sourced nGraph and an XLA plugin using a similar scheme on GitHub:
A. Modified TensorFlow that has the dynamically loadable plugin capability: https://github.com/NervanaSystems/ngraph-tensorflow
B. The nGraph library: https://github.com/NervanaSystems/ngraph
C. The nGraph XLA plugin: https://github.com/NervanaSystems/ngraph-tensorflow-bridge
In the ngraph-tensorflow repo, we defined the following:
Please let us know whether this works or there are any corner cases we missed.
Thanks Justin for your feedback. Please see my responses below.
From: Justin Lebar <jle...@google.com>
Date: Wednesday, April 4, 2018 at 9:16 AM
To: "Chakraborty, Avijit" <avijit.ch...@intel.com>
Cc: XLA development <xla...@googlegroups.com>
Subject: Re: [xla-dev] Re: Dynamically loadable XLA plugin
Just to make sure I understand your argument, it is that the plugin API [0] needs to pass not just protobufs between it and XLA, but also other XLA and TensorFlow types?
[Avijit] The C plugin API would pass the protobuf and memory pointers. But the plugin implementation needs to use various TF & XLA data types and classes (e.g., traversing the HLO graph, unpacking input data, packing output etc.).
> Plugin will implement a C API function that will receive version information from XLA as follows:
> - value returned by tf_git_version()
> - value returned by tf_compiler_version()
> - value returned by tf_cxx11_abi_flag()
And all of these have to match exactly in order to load the plugin?
[Avijit] The cxx11_abi and compiler_version need to match exactly. The git_version may be a bit flexible: if the plugin was built against a specific TF git hash and a later version of TF made no changes to the relevant TF/XLA classes used by the plugin, then the plugin can still be loaded. Of course, it's a bit of a slippery slope, so initially we could mandate an exact match and see how it goes. After all, it's the responsibility of the plugin developers to release a newer version of the plugin to be used with a newer version of TF.
In fact I think the version information needs to go from the plugin to TensorFlow, not the other way around. TensorFlow cannot trust the plugin to make the right decision here; it should be TF that says "your plugin is not compatible".
[Avijit] Yes, that makes better sense. In that case TensorFlow will query the plugin for the information above and decide whether to proceed with the load or fail. The plugin will simply report which TF git hash it was compiled against.
One problem even with this approach is that in order to call the "info" function on the plugin, we still have to link with the shared library at build time or dlopen it at runtime. Either way will cause us to run the library's global initializers. If there is a mismatch between the XLA ABI and the code in those global initializers, we will have a catastrophic failure.
One way to work around this would be for TF to check that the shared library has no static initializers/destructors before loading it. But I'm not sure you can or would want to guarantee this.
So if we don't do that, then I guess the other alternative is to provide a manifest along with your shared library.
[Avijit] A manifest file sounds like a better idea. In addition to the version information, other plugin specific information (such as the device priority, or a resource directory etc.) can also be read from this manifest by both TensorFlow and the plugin itself as needed.
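A minimal sketch of the manifest idea under discussion; the file format, the keys, and the `manifest_get` helper are all hypothetical. TensorFlow would read a plain-text manifest shipped next to the plugin's shared library and compare versions before calling dlopen, so on a mismatch the plugin's global initializers are never run:

```c
#include <string.h>

/* Hypothetical manifest format, one key=value pair per line, e.g.:
 *
 *   tf_git_version=v1.8.0-0-gabc123
 *   tf_cxx11_abi_flag=1
 *   device_priority=210
 */

/* Looks up `key` in a manifest buffer of key=value lines; copies the
 * value into `out` and returns 1 if found, 0 otherwise. */
int manifest_get(const char* manifest, const char* key,
                 char* out, size_t out_size) {
  const char* p = manifest;
  size_t klen = strlen(key);
  while (p && *p) {
    const char* eol = strchr(p, '\n');
    size_t linelen = eol ? (size_t)(eol - p) : strlen(p);
    if (linelen > klen + 1 && strncmp(p, key, klen) == 0 && p[klen] == '=') {
      size_t vlen = linelen - klen - 1;
      if (vlen >= out_size) vlen = out_size - 1;  /* truncate to fit */
      memcpy(out, p + klen + 1, vlen);
      out[vlen] = '\0';
      return 1;
    }
    p = eol ? eol + 1 : NULL;  /* advance to the next line */
  }
  return 0;
}
```

TF would then compare the extracted `tf_git_version` value against its own `tf_git_version()` and skip dlopen entirely on a mismatch, which sidesteps the global-initializer hazard Justin raised.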
Another consideration is: There are two ways to use one of these plugins. You can either get a precompiled binary and link with it somehow, or you can build from source and have it available natively. We should consider these two APIs together so that we can see if we should change one to make it more similar to the other.
[Avijit] The proposed API is derived from the old Executor example and provides a simplified set of function calls to compile and execute the graph. At a high level, both of these APIs need to provide the same set of functionality. The approach we are proposing is an additional (and more flexible) way to add support for new XLA devices; it is not intended to replace the other approach.
If we're going to have a manifest, I wonder if the shared library should simply register itself upon being dlopen'ed, the same as in the in-source build. There would be no need to add any new API (?).
From: 'Bjarke Roune' via XLA development <xla...@googlegroups.com>
Reply-To: Bjarke Roune <bro...@google.com>
Date: Monday, April 9, 2018 at 2:42 PM
To: Justin Lebar <jle...@google.com>
Cc: "Chakraborty, Avijit" <avijit.ch...@intel.com>, XLA development <xla...@googlegroups.com>
Subject: Re: [xla-dev] Re: Dynamically loadable XLA plugin
>Is it a problem to compile in and distribute these as part of your plugin?
No, that's not a problem. In fact the XLA plugin will compile in the necessary TF and XLA classes. The issue is that if the definitions of these classes have changed, that would result in a problem.