Graph mode in TF 2.0


Jonas Eschle

May 22, 2019, 6:27:07 AM
to Discuss
Hi everyone,

so far, my understanding was that eager will be the default in 2.0 and that graph-mode functionality like tf.Session is just moved somewhere less visible. But the more I read, the more it looks to me as if graph mode will be dropped entirely and it won't be possible to build a graph as in TF 1.x at all.

We've built a library (zfit, and there are dozens more internal ones) that does likelihood fits using TensorFlow as a "math backend", the way it was meant to be used, in contrast to other approaches that focus purely on deep learning. It depends on a lot of crucial graph features, for example:

- our PDFs are factories that stitch together a graph, so they are expensive Python code meant to be run once, not hundreds of times. Converting pieces to tf.functions means converting a lot of small pieces, which significantly hampers execution time since a lot of work is shared between different functions, for example combining the outputs of several PDFs into one graph and doing something with the result. This becomes more or less inefficient now and would again require the user to use GradientTape mostly manually.
- since the logic to build a model includes thousands of lines of Python code with Python-level logic, converting the whole PDF to a tf.function is impossible.
- gradients can be taken with respect to any variable (in TF 1.x). This allows users to create any combination with parameters as input, and zfit takes the gradient with respect to them. Since a GradientTape is now required, this would be left to the user to get right, which is quite an extra burden and likely to break, since sometimes second derivatives are also required. Doable, but very error prone.
- Estimators are meant to be the replacement for graph-based training loops. Unfortunately, they do not fit our use case, since we don't use layers or training metrics but simply low-level TF functions. In short, they do not work well for anything that is not exactly a deep-learning model.

Eager execution brings nearly zero benefit to us, since before, with a "global" session and sess.run, the actual computation could be run at any time if needed. This freedom seems to be gone with TF 2.0.

TL;DR: we heavily rely on and love graph mode. It allows us to build a complex factory in Python that produces a highly efficient graph. Is this possibility really gone in TensorFlow 2.0?

Best,
Jonas

Alexandre Passos

May 22, 2019, 11:33:13 AM
to Jonas Eschle, Discuss
You can still use tf.compat.v1.Graph and tf.compat.v1.Session while we work around these issues. The underlying run-a-graph functionality in TF is not going away.
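For example, a minimal sketch of that v1-style workflow running inside TF 2 (the placeholder and loss here are arbitrary stand-ins, not zfit code):

import tensorflow as tf

graph = tf.compat.v1.Graph()
with graph.as_default():
    # Build the graph once, v1-style; nothing executes yet.
    x = tf.compat.v1.placeholder(tf.float32, shape=[None])
    loss = tf.reduce_sum(tf.square(x))

with tf.compat.v1.Session(graph=graph) as sess:
    # Run any subset of the graph on demand, with feed_dict as before.
    print(sess.run(loss, feed_dict={x: [1.0, 2.0, 3.0]}))  # 14.0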

You can also use tf.compat.v1.wrap_function to build a graph and then prune and manipulate it like session.run allows you to do but still run it conveniently from a tf2 context while being able to do nice things like differentiating it.
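A hedged sketch of the wrap_function route (the trivial body is just for illustration):

import tensorflow as tf

def build(x):
    # stand-in for a v1-style graph-building function
    return x * 2.0

wrapped = tf.compat.v1.wrap_function(
    build, signature=[tf.TensorSpec(shape=[], dtype=tf.float32)])
print(wrapped(tf.constant(3.0)))  # 6.0

# prune() yields a callable for an arbitrary feed/fetch subset of the graph,
# much like choosing what to pass to session.run:
pruned = wrapped.prune(feeds=wrapped.graph.inputs,
                       fetches=wrapped.graph.outputs)
print(pruned(tf.constant(4.0)))   # 8.0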

That said, I don't understand why tf.function is not a good fit for you. Functions can call other functions, and if they do so you can amortize the graph building cost while not paying ~anything in terms of performance overhead since we inline most functions before execution.
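As a hedged sketch of what that could look like (gauss_logpdf and nll are made-up stand-ins for zfit pieces):

import tensorflow as tf

@tf.function
def gauss_logpdf(x, mu, sigma):
    # One small, reusable piece; traced once per input signature.
    return -0.5 * tf.square((x - mu) / sigma) - tf.math.log(sigma)

@tf.function
def nll(x, mu, sigma):
    # The outer function calls the piece; the inner graph is inlined,
    # so composing functions adds essentially no call overhead.
    return -tf.reduce_sum(gauss_logpdf(x, mu, sigma))

x = tf.constant([0.5, 1.5])
print(nll(x, tf.constant(1.0), tf.constant(2.0)))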

Also with tf2 we're improving the 2nd and higher order derivative situation in TF quite a bit (see how control flow v2 is twice differentiable, for example, which is ~impossible to do in control flow v1).
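For example, a second derivative via nested tapes (which control flow v2 also supports through tf.cond/tf.while_loop):

import tensorflow as tf

x = tf.Variable(3.0)
with tf.GradientTape() as outer:
    with tf.GradientTape() as inner:
        y = x ** 3
    dy_dx = inner.gradient(y, x)       # 3 * x**2 = 27.0
d2y_dx2 = outer.gradient(dy_dx, x)     # 6 * x    = 18.0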

I understand that porting your code to use tf.function and GradientTape is a lot of work, and I don't expect you to do this now. Just run the TF2 converter script on your codebase to make it work with future versions of TF, then see whether the new features in tf.function, control flow v2, etc. appeal to you, and switch to them piecemeal. I expect that once you've looked at it from an incremental point of view, and not from an "OMG everything is going away!" perspective, you'll find benefits in small code rewrites that adopt the new idioms.



--
 - Alex

Jonas Eschle

Aug 14, 2019, 3:09:23 PM
to Discuss
Thanks a lot for that (sorry, I missed the notification of your response)!

Before I answer, just to be clear: I am a huge fan of TensorFlow and of the way you're progressing. Switching to TF 2.0, including the work that has to be done in our framework, is not in question for us, and I am very glad about all the improvements! Better to break earlier than later.

And things like the second-derivative improvements are just great for us!

The whole code basically builds one huge loss and contains a lot of Python logic. So we do not have "several computationally heavy parts" but a lot of small ones, and we can simply use tf.function to wrap the loss building. Two things are needed, though:
- critical: the retracing and caching have to be transparent and must not rely on object identity (as they currently do?). We basically always use wrapped objects that contain tensors and logic, and they are mutable. So we need a way of forcing a retrace if an object has changed. Is there any way to do that?
- caching of values: while TF seems great at caching values within a run, and some constants between runs as well, we would need a way to cache values that are "not constant", e.g. that depend on a variable. Tracking the validity of the cache can be done in our framework, but we would need a way to tell TF whether to cache certain values or not. How it can be done now: with a feed_dict, we can override the value of a node with the value we cached. Is something similar possible?

Thanks a lot for your efforts!
Jonas


Toby Boyd

Aug 14, 2019, 3:18:41 PM
to Jonas Eschle, Discuss
Not at all overriding Alex's comment. I only converted our ResNet50 model, and other than fixing a few places where we used contrib, I was able to make it work very easily with TF 2.0 using compat.v1. This is a pretty cool part of the TF 2.0 move: if you are not using contrib (or can find how to map it to a new method or to tf.addons), then you can move to TF 2.0 without major changes. Here is a link to the ResNet50 code; I doubt it is directly useful, but I run this on TF 2.0 every day to ensure it still works.

Toby


Alexandre Passos

Aug 14, 2019, 3:25:10 PM
to Jonas Eschle, Discuss
On Wed, Aug 14, 2019 at 12:09 PM Jonas Eschle <jonas....@cern.ch> wrote:
The whole code basically builds one huge loss and contains a lot of python logic. Therefore, we do not have "several computational heavy parts", but a lot of small ones, so we can simply us tf.function to basically wrap the loss building. Two things are though needed:
- critical: the retracing and caching has to be transparent and not rely on the object identity (as currently is?). We basically always use wrapped objects that contain tensors and logic, mutable. So we need a way of forcing a retracing if an object is different. Is there any way to do that?

Making a new tf.function object forces retracing; using function.get_concrete_function will block all future retracing. Combine both to retrace at will.
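A minimal sketch of that combination ('compute' is an arbitrary stand-in):

import tensorflow as tf

def compute(x):
    return x * 2.0

fn = tf.function(compute)
# Freeze one trace: a concrete function never retraces.
frozen = fn.get_concrete_function(tf.TensorSpec(shape=[None], dtype=tf.float32))

# Force a retrace: a brand-new tf.function starts with an empty trace cache.
fn = tf.function(compute)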
 
- caching of values: while TF seems great at caching values within a run, and some constants between runs as well, we would need a way to cache values that are "not constant", e.g. that depend on a variable. Tracking the validity of the cache can be done in our framework, but we would need a way to tell TF whether to cache certain values or not. How it can be done now: with a feed_dict, we can override the value of a node with the value we cached. Is something similar possible?

Not really; just make the "cached" values arguments to the function, I think.
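A hedged sketch of what passing a cached value as an argument could look like (the names are made up, not zfit's API):

import tensorflow as tf

@tf.function
def loss(params, normalization):
    # 'normalization' plays the role of the cached node previously overridden
    # via feed_dict; here it simply arrives as an argument.
    return tf.reduce_sum(tf.square(params)) / normalization

params = tf.Variable([1.0, 2.0])
norm = tf.constant(4.0)    # pretend this came from the framework's cache
print(loss(params, norm))  # 1.25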
 


--
 - Alex

Jonas Eschle

Aug 23, 2019, 3:45:01 PM
to Discuss


On Wednesday, August 14, 2019 at 9:25:10 PM UTC+2, Alexandre Passos wrote:

Making a new tf.function object forces retracing; using function.get_concrete_function will block all future retracing. Combine both to retrace at will.
Sure, but I think TF already has an internal mechanism to do exactly that. Doing it manually is equivalent to what we already do with the tensors retrieved from a function (i.e. we implement our own "tensor cache", while tf.function has its own built in). Is there no way to hook into that system?
 
Not really; just make the "cached" values arguments to the function, I think.
Hm, not so great, given that there are a lot of user-provided functions in between whose signatures should of course be kept clean, but I'll try; maybe there is a way around it. Anyway, behind the scenes a tf.function runs a Session with feed_dicts somewhere, I assume (I was not able to trace it down fully)? I would also be fine with customizing "Function" and adding hooks there for the execution; would you consider that a safe and stable API? We would need that somehow anyway, but do we need to wrap Function, or is inheritance stable?

Thanks again!
Jonas


 

Alexandre Passos

Aug 23, 2019, 3:47:29 PM
to Jonas Eschle, Discuss
On Fri, Aug 23, 2019 at 12:45 PM Jonas Eschle <jonas....@cern.ch> wrote:


On Wednesday, August 14, 2019 at 9:25:10 PM UTC+2, Alexandre Passos wrote:

Making a new tf.function object forces retracing; using function.get_concrete_function will block all future retracing. Combine both to retrace at will.
Sure, but I think TF already has an internal mechanism to do exactly that. Doing it manually is equivalent to what we already do with the tensors retrieved from a function (i.e. we implement our own "tensor cache", while tf.function has its own built in). Is there no way to hook into that system?

Not with public APIs now, because we want to be able to change that system.
 
 
Not really; just make the "cached" values arguments to the function, I think.
Hm, not so great, given that there are a lot of user-provided functions in between whose signatures should of course be kept clean, but I'll try; maybe there is a way around it. Anyway, behind the scenes a tf.function runs a Session with feed_dicts somewhere, I assume (I was not able to trace it down fully)? I would also be fine with customizing "Function" and adding hooks there for the execution; would you consider that a safe and stable API? We would need that somehow anyway, but do we need to wrap Function, or is inheritance stable?

Actually, there's no session/placeholder logic in tf.function; it uses PartitionedCall or other function-calling ops to execute the function graph containing the body. This works both in TF1 graph mode and in TF2 eager mode.
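One hedged way to see this in action (the exact op type may vary by TF version and by whether the function is stateful):

import tensorflow as tf

@tf.function
def f(x):
    return x * 2.0

with tf.Graph().as_default():  # v1-style graph building
    x = tf.compat.v1.placeholder(tf.float32, shape=[])
    y = f(x)                   # emits a function-call op, no feed/placeholder tricks
    print(y.op.type)           # e.g. 'PartitionedCall' or 'StatefulPartitionedCall'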

--
 - Alex

Martin Wicke

Aug 23, 2019, 7:30:16 PM
to Jonas Eschle, Discuss
You can continue to use graph mode using compat.v1.Session etc. This will be supported for all 2.x releases. So your initial impression is correct: they are just more hidden.

We believe you could likely refactor your library to make good use of functions, but I understand that may require a lot of reengineering. 

TensorFlow 2.* will not force you to do that. 


Jonas Eschle

Aug 29, 2019, 11:40:06 AM
to Discuss
 
@Alex 
Not with public APIs now, because we want to be able to change that system.
Makes sense 

Actually, there's no session/placeholder logic in tf.function; it uses PartitionedCall or other function-calling ops to execute the function graph containing the body. This works both in TF1 graph mode and in TF2 eager mode.
Hm, I see. Side question: is there any place where I can read up more on PartitionedCall and such? I find myself desperately going through source code trying to learn about the ideas and mechanics. Do you have any "internal documentation" or similar? I'd love to understand more of the technicalities, but learning from the code alone is rather infeasible...


Jonas Eschle

Aug 29, 2019, 11:40:26 AM
to Discuss


@Martin
You can continue to use graph mode using compat.v1.Session etc. This will be supported for all 2.x releases.
So what about 3.x+ releases? I guess that's not in your planning horizon yet, but I'm still asking in case you have a clue ;)

We believe you could likely refactor your library to make good use of functions, but I understand that may require a lot of reengineering. 
I've looked into that heavily and am very open to reengineering. The problem is that we basically need:
- a tf.function wrap of our whole loss, which is equivalent to building the model in graph mode
- applying tf.function to subroutines (and tf.function's caching in general) is of no real help to us; we already have this kind of "primitive" (object-ID based) caching. We need something more advanced, though: global state can change, our functions are not per se idempotent, and function arguments are classes (mutable). Since (referring to @Alex's answer) there is no direct way of hooking into the system, it seems as if we gain nothing and need our own caching system anyway.

We are currently converting the library on a branch, pushing blindly for a true TF 2.0 version of it, but the result is rather equivalent and just needs more wrapping (e.g. wrapping tf.function). Impossible: no. But there seems to be no benefit at all, rather the opposite.

The problem is maybe that we are using TF as a mathematical HPC library ("such as NumPy") and are building a high-level library that wraps a lot of TF behind the scenes, so controlling the execution directly instead of "hiding" it inside tf.function is no extra work, actually a benefit. A more sophisticated caching system in tf.function would be the game changer, though, I think (including the possibility of global state). I am aware that this may not be needed by a lot of users, so I estimate the chances are small. What's your prediction on that?

Alexandre Passos

Aug 29, 2019, 11:43:23 AM
to Jonas Eschle, Paige Bailey, Discuss
https://www.youtube.com/watch?v=MSXouZPyTrc&t=10s (also check out the other Inside TensorFlow talks)

Sadly the slides are not public; I wonder if we can link them on YouTube? +Paige Bailey, do you know?

--
 - Alex

Alexandre Passos

Aug 29, 2019, 11:46:27 AM
to Jonas Eschle, Discuss
On Thu, Aug 29, 2019 at 8:40 AM Jonas Eschle <jonas....@cern.ch> wrote:


@Martin
You can continue to use graph mode using compat.v1.Session etc. This will be supported for all 2.x releases.
So what about 3.x+ releases? I guess that's not in your planning horizon yet, but I'm still asking in case you have a clue ;)

Ideally not by 3.x releases (but that's still a ways out).
 

We believe you could likely refactor your library to make good use of functions, but I understand that may require a lot of reengineering. 
I've looked into that heavily and am very open to reengineering. The problem is that we basically need:
- a tf.function wrap of our whole loss, which is equivalent to building the model in graph mode
- applying tf.function to subroutines (and tf.function's caching in general) is of no real help to us; we already have this kind of "primitive" (object-ID based) caching. We need something more advanced, though: global state can change, our functions are not per se idempotent, and function arguments are classes (mutable). Since (referring to @Alex's answer) there is no direct way of hooking into the system, it seems as if we gain nothing and need our own caching system anyway.

TBH I'd like to improve the tf.function caching story; I'm just wary of doing it so close to the public release. As long as we find a backwards-compatible way of doing so, we'll do it.
 

We are currently converting the library on a branch, pushing blindly for a true TF 2.0 version of it, but the result is rather equivalent and just needs more wrapping (e.g. wrapping tf.function). Impossible: no. But there seems to be no benefit at all, rather the opposite.

The problem is maybe that we are using TF as a mathematical HPC library ("such as NumPy") and are building a high-level library that wraps a lot of TF behind the scenes, so controlling the execution directly instead of "hiding" it inside tf.function is no extra work, actually a benefit. A more sophisticated caching system in tf.function would be the game changer, though, I think (including the possibility of global state). I am aware that this may not be needed by a lot of users, so I estimate the chances are small. What's your prediction on that?

There are a few parts of TF that want an API to register a callback that gets called when executing tf.functions; the callback returns some state (potentially global?) that should be used as a cache key, and tf.function will make sure to retrace when that value changes. I'm still a little unsure how best to implement this, so it has been on the back burner while we fix high-priority TF2 bugs, but it will likely happen soon.
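In the meantime, a hedged sketch of how a library could emulate such a cache key in user code today (no public hook exists; this simply keys whole tf.function objects on framework-provided state):

import tensorflow as tf

_trace_cache = {}

def traced(fn, state_key):
    # One tf.function per state key; a new key means a brand-new tf.function,
    # i.e. a forced retrace. 'state_key' is whatever the framework derives
    # from its (possibly global, mutable) state.
    if state_key not in _trace_cache:
        _trace_cache[state_key] = tf.function(fn)
    return _trace_cache[state_key]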

--
 - Alex

Jonas Eschle

Aug 29, 2019, 12:03:17 PM
to Discuss

Thanks a lot for the link, I'll have a look!
About the non-public slides: I see and of course understand, though it's still annoying. Since we are researchers pursuing purely academic goals, without any economic interest or ties, and actively contributing to TF (though limited to the parts we "understand" without docs), do you know if there would be a possibility of accessing certain docs under e.g. an NDA or similar? If so, you may want to let me know non-publicly at jonas.eschleATcern.ch


TBH I'd like to improve the tf.function caching story; I'm just wary of doing it so close to the public release. As long as we find a backwards-compatible way of doing so, we'll do it.
Sounds good! 

There are a few parts of TF that want an API to register a callback that gets called when executing tf.functions; the callback returns some state (potentially global?) that should be used as a cache key, and tf.function will make sure to retrace when that value changes. I'm still a little unsure how best to implement this, so it has been on the back burner while we fix high-priority TF2 bugs, but it will likely happen soon.
Yep, a custom cache key is the crucial point; that would be great!

Martin Wicke

Aug 29, 2019, 12:09:06 PM
to Jonas Eschle, Discuss
On Thu, Aug 29, 2019 at 9:03 AM Jonas Eschle <jonas....@cern.ch> wrote:
About the non-public slides: I see and of course understand, though it's still annoying. Since we are researchers pursuing purely academic goals, without any economic interest or ties, and actively contributing to TF (though limited to the parts we "understand" without docs), do you know if there would be a possibility of accessing certain docs under e.g. an NDA or similar? If so, you may want to let me know non-publicly at jonas.eschleATcern.ch

I think this is a pure misunderstanding: there is nothing non-public. What Alex was saying is that he's hesitant to mess with how tf.function works so close to a very public large release. We do not want to release a broken 2.0, so any big change in behavior at this point is very risky.

After 2.0, we'll be restricted to backwards compatible changes. But if there are good options, I am sure we'll find a backwards compatible way in which to use those.

Jonas Eschle

Sep 6, 2019, 12:05:32 PM
to Discuss
Hi Alex, Martin,

first, thanks a lot for the "TensorFlow internals" playlist on YouTube; it's exactly what I was looking for (for the last few years) and it helps a lot (@alex, good talks!). I'd encourage you to put more of those out (though I also understand the editing work that goes with it).

Coming back to my main issue: caching in TF 2.0 (which we did via tf.Session). I think we found a nice way; I am interested in your opinion, though:

Caching with tf.cond and two tf.Variables, one for the cached value and one indicating whether it is cached. The value handed out is not the actual value but a tf.cond that, depending on the flag, either calculates 'func' and caches the result or, if already cached, returns the cached value:
cache = tf.Variable(0.0)   # cached value, initially empty (concrete initializer for illustration)
flag = tf.Variable(False)  # is the cache valid?

def actual_func():
    val = func()           # 'func' is the expensive computation to be cached
    cache.assign(val)
    flag.assign(True)
    return val

# tf.cond takes callables for both branches:
value = tf.cond(flag, lambda: cache.read_value(), actual_func)

Invalidating the cache is simple. My small test runs yield very reasonable performance (comparable to feed_dict). Do you have any concerns about or opinions on this way of doing things? (We would need to cache at most ~100 values, not millions.)
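For concreteness, a self-contained toy version of the pattern (a dummy 'func' stands in for the expensive computation; just a sketch):

import tensorflow as tf

cache = tf.Variable(0.0)
flag = tf.Variable(False)

def func():
    return tf.reduce_sum(tf.square(tf.range(3, dtype=tf.float32)))  # 0+1+4 = 5.0

def compute_and_cache():
    val = func()
    cache.assign(val)
    flag.assign(True)
    return val

@tf.function
def value():
    return tf.cond(flag, lambda: cache.read_value(), compute_and_cache)

print(value())      # computes and fills the cache: 5.0
print(value())      # served from the cache: 5.0
flag.assign(False)  # invalidate; the next call recomputes and re-caches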


Alexandre Passos

Sep 6, 2019, 1:10:52 PM
to Jonas Eschle, Discuss
This sounds like the idiomatic way to do things. If there's a performance problem I think we might be able to use XLA to inline the cond.
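A hedged sketch of what that might look like (tf.function's experimental_compile flag appeared in TF releases shortly after this thread and was later renamed jit_compile; the false branch is a stand-in for the real compute-and-cache function):

import tensorflow as tf

flag = tf.Variable(False)
cache = tf.Variable(0.0)

@tf.function(experimental_compile=True)  # ask XLA to compile and fuse the cond
def cached_value():
    # pure stand-in branch; XLA restricts side effects inside compiled conds
    return tf.cond(flag, lambda: cache.read_value(), lambda: tf.constant(0.0))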



--
 - Alex