Understanding HLO IR dumps for an example CNN architecture


Hashim Sharif

Apr 19, 2018, 11:59:15 PM
to XLA development

Hi, All,

I was analyzing the HLO IR for the CNN built in this tutorial: https://www.tensorflow.org/tutorials/layers. Using TF_XLA_FLAGS=--xla_generate_hlo_text_to=$output_dir dumps a number of files under $output_dir; however, each file seems to contain only a fraction of the total network operations. My questions are:
  • How are the HLO modules linked to/calling each other?
  • Are the trained parameters (weights) included as part of the HLO modules? If not, how can they be extracted?
  • Is it possible to generate an HLO IR file that includes all the CNN operations?

Guidance is appreciated.

Hashim Sharif

Apr 20, 2018, 3:44:47 PM
to XLA development

To be more specific, if I compile the following TensorFlow code:

import tensorflow as tf

a = tf.placeholder(tf.int16)
b = tf.placeholder(tf.int16)

jit_scope = tf.contrib.compiler.jit.experimental_jit_scope  # Using JIT compilation
with jit_scope():
    add = tf.add(a, b)
    mul = tf.multiply(add, b)

with tf.Session() as sess:
    # Run every operation with variable input
    print("Addition with variables: %i" % sess.run(add, feed_dict={a: 2, b: 3}))
    print("Multiplication with variables: %i" % sess.run(mul, feed_dict={a: 2, b: 3}))

Note that tf.multiply uses the result of tf.add. For tf.add and tf.multiply I observe two different HLO modules, and each module contains an entry point. Isn't there a representation of a data-flow edge across these modules that captures the data dependency - in this case, an edge between the add and multiply modules?

Justin Lebar

Apr 20, 2018, 4:53:16 PM
to Hashim Sharif, XLA development
Hi, Hashim.

The example code you've given uses int16. I expect that phawkin's response
to your other thread is relevant:
https://groups.google.com/d/msg/xla-dev/eTg8IEFGOvs/hWWoREOSCwAJ.

-Justin

Hashim Sharif

Apr 20, 2018, 7:12:08 PM
to XLA development

Hi, Justin,

I did use int32 (apologies for the error in the posted code) - I fixed it after Peter's response. I am able to generate the HLO IR files, but I cannot make sense of how the HLO modules are generated from the TensorFlow constructs. I would have expected a single HLO IR file containing all the HLO code, but the XLA compiler emits a number of these modules. How can one extract the dataflow relations between these modules? For instance, if the result of the *add* module feeds into the *multiply* module, is there a data-flow graph that connects the output of the *add* operation to the subsequent *multiply* operation? I also tried emitting the Google protobuf output, but again two separate protobuf modules are generated (one per operation).

Justin Lebar

Apr 20, 2018, 7:48:19 PM
to Hashim Sharif, XLA development
> I cannot make sense of how the HLO modules are generated from the TensorFlow constructs. I would have expected a single HLO IR file containing all the HLO code, but the XLA compiler emits a number of these modules. How can one extract the dataflow relations between these modules? For instance, if the result of the *add* module feeds into the *multiply* module, is there a data-flow graph that connects the output of the *add* operation to the subsequent *multiply* operation? I also tried emitting the Google protobuf output, but again two separate protobuf modules are generated (one per operation).

This is a big gap in our tooling; most of us cannot make sense of it either. :)

If you use standard TF tools like tensorboard, you'll be able to see the
XLA clusters in your TF graph and their relation to one another.
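
For example (a rough, untested sketch; the log directory is arbitrary), you can write the graph out for TensorBoard like so:

import tensorflow as tf

a = tf.constant(1.0, name="a")
b = tf.add(a, a, name="b")

with tf.Session() as sess:
    # Write the graph so TensorBoard can render it:
    #   tensorboard --logdir=/tmp/xla_graph_logs
    writer = tf.summary.FileWriter("/tmp/xla_graph_logs", sess.graph)
    writer.close()

Keep in mind this writes the graph as you constructed it in Python, so depending on when TF rewrites the graph it may be a pre-clustering view.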

But as far as *why* they got clustered that way, it's much harder to say.
I'm surprised that you're getting two clusters for this trivial program.
It's possible you need to move `a` and `b` into the jit scope.
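
Something like this is what I have in mind (a minimal, untested sketch):

import tensorflow as tf

jit_scope = tf.contrib.compiler.jit.experimental_jit_scope

with jit_scope():
    # Create the placeholders inside the scope too, so the whole
    # expression is a candidate for a single XLA cluster.
    a = tf.placeholder(tf.int32)
    b = tf.placeholder(tf.int32)
    add = tf.add(a, b)
    mul = tf.multiply(add, b)

with tf.Session() as sess:
    print(sess.run(mul, feed_dict={a: 2, b: 3}))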

Hashim Sharif

Apr 21, 2018, 12:54:44 AM
to XLA development

Thanks for your response, Justin. I tried moving the variable declarations into the scope, and that doesn't make a difference in the HLO IR output.

Allow me to explain our usage scenario and what we are trying to accomplish with the XLA compiler. As part of our work, we want to build a compiler frontend from HLO IR to our own compiler IR, which we use for backend code generation. In order to translate from HLO IR, we need access to the full HLO graph with all the computations and the dataflow relations across them. Since you mention using TensorBoard for understanding the relationship among the HLO clusters, I assume there is an underlying representation (as data structures) of the XLA graph. If we are to analyze the XLA graph for building our frontend, are there parts of the XLA code you would suggest modifying/leveraging?

-Hashim

Justin Lebar

Apr 29, 2018, 12:11:30 PM
to Hashim Sharif, XLA development
> Since you mention using TensorBoard for understanding the relationship among the HLO clusters, I assume there is an underlying representation (as data structures) of the XLA graph.

There is: the TensorFlow graph. The XLA computations get embedded (auto-clustered) into the underlying TF graph. That is, a TF graph may contain multiple XLA clusters. These clusters are connected just like any other TF ops.

It sounds like you may want to write TensorFlow code such that it's all
guaranteed to be compiled into XLA, so you can analyze the whole thing
using XLA. This is not really possible at the moment, but it's something
we're actively working on.

Hashim Sharif

May 12, 2018, 3:03:55 AM
to XLA development

Thanks Justin,

When you mention the TensorFlow graph, are you referring to tf.Graph, as defined here: https://www.tensorflow.org/api_docs/python/tf/Graph#properties?

Regarding the conversion of TF graphs to XLA graphs, I am confused about the exact workflow. Specifically, the interaction between the Python classes and the corresponding C++ implementation for XLA seems unclear. Under tensorflow/compiler/tf2xla, I can see routines that convert higher-level TF ops to lower-level XLA ops (please correct me if I am wrong); however, I cannot figure out from where they are invoked and how these routines embed XLA operations into tf.Graph. If tf.Graph is a graph of tf.Operation objects, are XLA operations also represented as tf.Operation objects? How are the ops clustered into XLA clusters?

-Hashim

Justin Lebar

May 15, 2018, 5:39:17 PM
to Hashim Sharif, XLA development
> When you mention the TensorFlow graph, are you referring to tf.Graph, as defined here: https://www.tensorflow.org/api_docs/python/tf/Graph#properties?

Yes, exactly.

> Under tensorflow/compiler/tf2xla, I can see routines that convert higher-level TF ops to lower-level XLA ops (please correct me if I am wrong); however, I cannot figure out from where they are invoked and how these routines embed XLA operations into tf.Graph. If tf.Graph is a graph of tf.Operation objects, are XLA operations also represented as tf.Operation objects?

Yes. The op name is XlaLaunchOp.

> How are the ops clustered into XLA clusters?

I believe this occurs in mark_for_compilation_pass.cc.

-Justin

Hashim Sharif

May 15, 2018, 8:11:09 PM
to XLA development


> Yes.  The op name is XlaLaunchOp.

For testing, I am using the mnist_softmax_xla.py script, which builds a simple TF computation and enables XLA compilation in the JIT session. After invoking sess.run(graph_output), I printed the operation names (using the name field of tf.Operation) for the complete graph. However, I do not see any operations named "XlaLaunchOp". Does tf.Graph include the XLA ops directly, and if so, how can one invoke the XLA passes to embed the XLA ops into tf.Graph?

-Hashim

Justin Lebar

May 15, 2018, 11:20:28 PM
to Hashim Sharif, XLA development
>  Does tf.Graph include the XLA ops directly, and if so how can one invoke the XLA passes to embed the XLA ops as part of tf.Graph?

It should, yes.  If they're not there, I think that probably means you're not using XLA in the end.

See my previous email:

> It sounds like you may want to write TensorFlow code such that it's all guaranteed to be compiled into XLA, so you can analyze the whole thing using XLA.  This is not really possible at the moment, but it's something we're actively working on.

It's also possible that XLA isn't linked into your program or something. I don't know; without seeing your code and concrete steps to reproduce, it's very hard to say.


Hashim Sharif

May 16, 2018, 1:11:53 AM
to Justin Lebar, XLA development

> It's also possible that XLA isn't linked into your program or something. I don't know; without seeing your code and concrete steps to reproduce, it's very hard to say.

Attaching the mnist_softmax.py source I am using - augmented with simple routines to print the TF ops in the tf.Graph. I am invoking the script as follows:

TF_XLA_FLAGS=--xla_generate_hlo_text_to=output_dir python mnist_softmax.py

Interestingly, I do see "dumping module" messages from hlo_graph_dumper.cc, and I do see the HLO IR output files under the specified output directory, so I would assume that XLA is running. It is likely that I am incorrectly extracting the TF operations from the TF graph. Any pointers are appreciated.

-Hashim

mnist_softmax.py

Hashim Sharif

May 16, 2018, 1:22:02 AM
to Justin Lebar, XLA development
Sent the wrong file earlier - attaching again. The function call to printTfGraphOps() prints the TF operations given the final output operation of the TF graph - in the mnist_softmax example, "y", computed as tf.matmul(x, w) + b.

-Hashim
mnist_softmax.py

Justin Lebar

May 16, 2018, 1:36:01 AM
to Hashim Sharif, XLA development
> Interestingly, I do see "dumping module" messages from hlo_graph_dumper.cc, and I do see the HLO IR output files under the specified output directory, so I would assume that XLA is running.

That is a safe assumption!

I'd guess that however you're getting the TF graph, it must be from before clustering happens.
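
One way to check (a rough, untested sketch, assuming TF 1.x; whether anything actually gets clustered also depends on your build and devices) is to ask the runtime for the partition graphs it actually executed, instead of walking the Python-level tf.Graph:

import numpy as np
import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 784])
w = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
y = tf.matmul(x, w) + b

config = tf.ConfigProto()
config.graph_options.optimizer_options.global_jit_level = tf.OptimizerOptions.ON_1

# Ask the runtime to hand back the graphs it actually ran.
run_options = tf.RunOptions(output_partition_graphs=True)
run_metadata = tf.RunMetadata()

with tf.Session(config=config) as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(y, feed_dict={x: np.zeros((1, 784), np.float32)},
             options=run_options, run_metadata=run_metadata)

# These are the rewritten, per-device GraphDefs; if clustering kicked in,
# the XLA launch node(s) should appear here rather than in the Python graph.
for graph_def in run_metadata.partition_graphs:
    for node in graph_def.node:
        print(node.op, node.name)

(The exact op-type string of the launch node may differ between versions, so just eyeball the output.)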
On Tue, May 15, 2018 at 10:22 PM Hashim Sharif <hashim....@gmail.com>
an email to xla-dev+u...@googlegroups.com.

Hashim Sharif

May 16, 2018, 3:11:07 AM
to Justin Lebar, XLA development

> I'd guess that however you're getting the TF graph, it must be from before clustering happens.

I run sess.run(computation) before extracting the tf.Operation objects from the tf.Graph. Is the clustering done at sess.run, at sess.close, or at a different program point? A slimmed-down version of my code is as follows:


def traverseInputOps(op):
  # Recursively walk the input edges of the graph, printing each op's name.
  print("op.name = ", op.name)
  for i in range(len(op.inputs)):
    input_op = op.inputs[i].op
    traverseInputOps(input_op)


def printTfGraphOps(output_op):
  traverseInputOps(output_op)


def main():
  # Defining the TF graph
  x = tf.placeholder(tf.float32, [None, 784])
  w = tf.Variable(tf.zeros([784, 10]))
  b = tf.Variable(tf.zeros([10]))
  y = tf.matmul(x, w) + b

  # Enabling XLA compilation in the Session
  config = tf.ConfigProto()
  jit_level = 0
  if FLAGS.xla:
    # Turns on XLA JIT compilation.
    jit_level = tf.OptimizerOptions.ON_1

  config.graph_options.optimizer_options.global_jit_level = jit_level
  run_metadata = tf.RunMetadata()
  sess = tf.Session(config=config)
  tf.global_variables_initializer().run(session=sess)

  # Invoking the TF graph computations in the session (XLA enabled);
  # FLAGS, mnist, and y_ come from the full mnist_softmax script.
  sess.run(y, feed_dict={x: mnist.test.images, y_: mnist.test.labels})
  printTfGraphOps(y.op)
  sess.close()

Justin Lebar

May 16, 2018, 1:03:38 PM
to Hashim Sharif, XLA development
I'm not sure what's going on for you (honestly I'm not much of a TF expert), but in my debugging I've done something like:

TF_XLA_FLAGS="--xla_hlo_profile --tf_xla_clustering_debug --tf_dump_graph_prefix=/tmp/foo" <my binary> --vmodule=xla_compiler=2

and this has dumped some useful results.

Alex Park

May 16, 2018, 1:59:29 PM
to XLA development
You might find this method I am using for separating the graph optimization passes from actual execution useful...


This produces a SessionModule proto, which encapsulates the training operation (essentially by building the XlaLaunch op and compiling it).

The variables are inputs to the entry computation, and the outputs are the updated state of the variables, so the XLA representation still remains "pure", since buffer allocation and assignment happen outside of the function and can be implementation-specific.
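
For example (a rough, untested sketch; the dump directory is a placeholder), you can see those entry parameters directly in the HLO text dumps, which is also where the weights show up - as arguments rather than embedded constants:

import glob
import re

# Scan HLO text dumps (e.g. produced with --xla_generate_hlo_text_to=output_dir)
# and list each module's parameter instructions.
for path in sorted(glob.glob("output_dir/*")):
    with open(path) as f:
        text = f.read()
    params = re.findall(r"%\S+ = \S+ parameter\(\d+\)", text)
    if params:
        print(path)
        for p in params:
            print("  " + p)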

Hashim Sharif

May 17, 2018, 2:18:37 AM
to XLA development

Hi, Alex,

Thanks for pointing that out.

> This produces a SessionModule proto, which encapsulates the training operation (essentially by building the XlaLaunch op and compiling it).

Essentially, what information does a SessionModule proto (returned by XlaExtract) contain? Does it contain the XLA graph? For our purposes, we need to extract the data-flow graph of XLA operations and the associated tensor values (for constant tensors such as weights learned during training). Any pointers are appreciated.

-Hashim

Hashim Sharif

May 17, 2018, 3:04:07 AM
to XLA development


> The variables are inputs to the entry computation, and the outputs are the updated state of the variables, so the XLA representation still remains "pure", since buffer allocation and assignment happen outside of the function and can be implementation-specific.

Are you suggesting that all variables in the TF graph (for instance, all weights in the different layers of a DNN) are passed to the XLA entry computation at launch? Also, I am unaware of what an "XlaLaunchOp" is. Could you provide more context?

Hashim Sharif

May 17, 2018, 4:08:41 AM
to XLA development

Hi, Justin,
   
> I believe this occurs in mark_for_compilation_pass.cc.

If the clustering is done in the XLA backend, do the XLA ops necessarily show up as part of the Python interface to tf.Graph? I can't seem to find Python classes that embed the XLA operations as part of tf.Graph. Is it possible that XLA operations are only added to an underlying representation of the TF graph that is not available via the Python TF API?

-Hashim

Alex Park

May 17, 2018, 1:14:53 PM
to XLA development

On Thursday, May 17, 2018 at 12:04:07 AM UTC-7, Hashim Sharif wrote:


> > The variables are inputs to the entry computation, and the outputs are the updated state of the variables, so the XLA representation still remains "pure", since buffer allocation and assignment happen outside of the function and can be implementation-specific.
>
> Are you suggesting that all variables in the TF graph (for instance, all weights in the different layers of a DNN) are passed to the XLA entry computation at launch? Also, I am unaware of what an "XlaLaunchOp" is. Could you provide more context?


The XLA SessionModule is an HLO representation of the computation that describes the dataflow and operations. The data itself (e.g. weights and inputs) is not part of that representation except as arguments. It is not necessarily passed to the computation at "launch", but rather at execution time.

The XlaLaunchOp is something that is created when JIT-compiling portions of the TensorFlow graph.

During TF graph execution, if JIT compilation is on, graph optimization passes are run on the TF graph that mark ops as XLA-compilable; those ops are then encapsulated into a graph partition that is turned into a single node with op type XlaLaunchOp. This launch op can then be compiled using XlaCompiler.



 

Hashim Sharif

May 17, 2018, 6:52:17 PM
to XLA development


Thanks Alex,

> During TF graph execution, if JIT compilation is on, graph optimization passes are run on the TF graph that mark ops as XLA-compilable; those ops are then encapsulated into a graph partition that is turned into a single node with op type XlaLaunchOp.

I am not sure what a graph partition is. Is it a subgraph of XLA computations that replaces a TF operation - for instance, a pooling operation in TF being replaced by lower-level XLA ops? When JIT compilation is enabled, are these ops added as tf.Operation objects?


> This launch op can then be compiled using XlaCompiler.

Is it the case that the TF ops are only marked in tf.Graph and the actual mapping from TF ops to XLA ops is done later in the pipeline?

Earlier I was trying to view the XLA operations/computations as part of tf.Graph, but even after invoking the TF graph (with JIT enabled) via sess.run, the tf.Graph (in Python) did not include any XLA-specific operations. Is this an unfair expectation?

-Hashim