Why is the TensorFlow backend so slow?

tc

Apr 15, 2016, 10:57:58 AM
to Keras-users
Hi,

I tried running some of the keras/examples scripts with both backends, and found that the TensorFlow backend is much slower than the Theano backend. Here is the rough time per epoch (in seconds) that I got:

mnist_cnn: TF 30, TH 5
imdb_lstm: TF 250, TH 60

I don't know if anyone else has seen similar behavior with the two backends. If so, why is the TensorFlow backend so slow?
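For reference, here is roughly how I switch backends and time a single epoch. This is a minimal sketch, not the examples themselves: the KERAS_BACKEND environment variable overrides the "backend" field in ~/.keras/keras.json, and nb_epoch is the Keras 1.x argument name.

    import os
    os.environ['KERAS_BACKEND'] = 'tensorflow'  # or 'theano'; must be set before importing keras

    import time
    import numpy as np
    from keras.models import Sequential
    from keras.layers import Dense

    # Tiny stand-in model and data just to show the timing loop; the numbers
    # above come from the stock keras/examples scripts (mnist_cnn, imdb_lstm).
    X = np.random.rand(1000, 100).astype('float32')
    y = np.random.randint(0, 2, size=(1000, 1))

    model = Sequential()
    model.add(Dense(64, input_dim=100, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(optimizer='adam', loss='binary_crossentropy')

    start = time.time()
    model.fit(X, y, batch_size=32, nb_epoch=1, verbose=0)
    print('seconds per epoch: %.1f' % (time.time() - start))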

J Rao

Apr 15, 2016, 12:07:59 PM
to keras...@googlegroups.com
I got similar results; you can search for "Performance discrepancy between
theano and tensorflow" to see them. I have no idea why this is
happening. What is your TF version? I was using 0.6; I'm not sure whether the
latest version has improvements.

tc

Apr 15, 2016, 12:57:06 PM
to Keras-users
I also used 0.6. I've just updated to 0.8rc0, and performance improved (though it's still slower):

mnist_cnn: TF 12, TH 5
imdb_lstm: TF 90, TH 60

I am still not sure where the overhead is. I also wonder whether using the TensorFlow optimizers and other native ops directly would be faster.

François Chollet

Apr 15, 2016, 1:56:18 PM
to tc, Keras-users
I have benchmarked using native TF optimizers vs. the Keras optimizers running on the TF backend, and it turns out there are small differences, but in most cases (including the 2 scripts you mention here) the Keras optimizers are actually faster than the native TF optimizers. The differences are quite small, though.

tc

Apr 15, 2016, 2:02:28 PM
to Keras-users, iamti...@gmail.com
So does this mean that TensorFlow is currently slower than Theano on a single GPU? If so, do you have any idea why? Is it because Theano's compilation does more optimization than TensorFlow's (which would also explain why TensorFlow compiles faster)?

François Chollet

Apr 15, 2016, 2:03:21 PM
to tc, Keras-users
> So does this mean that TensorFlow is currently slower than Theano on a single GPU? If so, do you have any idea why? Is it because Theano's compilation does more optimization than TensorFlow's (which would also explain why TensorFlow compiles faster)?

Yes and yes.

Dan Becker

Apr 16, 2016, 8:37:25 AM
to Keras-users, iamti...@gmail.com
In my experience, the TensorFlow backend is faster for convolutions on 'tf' format data (row, column, channel) than on 'th' format data (channel, row, column).

Similarly, the Theano backend is faster for convolutions on 'th' format data than on 'tf' format data.

So a 'fair' benchmark would convert the data for each backend to the format it is fastest on, e.g. using
np.swapaxes(data, 1, 3)

and then specify the appropriate dim_ordering argument to any convolutional layers in your model.

The default value of dim_ordering favors Theano. Though, as others have mentioned, Theano is still faster in an apples-to-apples comparison.
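For example, for mnist_cnn-style data this looks roughly like the following. A sketch, assuming the data is loaded in 'th' layout as (samples, 1, 28, 28); note that np.swapaxes needs the array as its first argument, and that np.transpose with an explicit axis order preserves rows/columns, whereas swapaxes(1, 3) also swaps them.

    import numpy as np
    from keras.models import Sequential
    from keras.layers import Convolution2D

    # Data in Theano layout: (samples, channels, rows, cols)
    X_train = np.random.rand(128, 1, 28, 28).astype('float32')

    # For the TensorFlow backend, convert once, up front, to
    # (samples, rows, cols, channels) and tell the conv layers about it.
    X_train = np.transpose(X_train, (0, 2, 3, 1))

    model = Sequential()
    model.add(Convolution2D(32, 3, 3, border_mode='same',
                            dim_ordering='tf',            # matches the data layout
                            input_shape=X_train.shape[1:]))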

tc

Apr 16, 2016, 10:40:00 AM
to Keras-users, iamti...@gmail.com
Using dim_ordering='tf' in Keras (the default is 'th') did bring the time per epoch for the mnist_cnn example down to 7s, but that is still slower than Theano's 4-5s. On the other hand, using 'tf' ordering with Theano pushes its time per epoch beyond 10s. So the input ordering does indeed make a huge difference.

There might be other implementation tips and tricks that would help; it would be great if they were summarized somewhere.

varga....@gmail.com

Apr 18, 2016, 12:45:10 PM
to Keras-users, iamti...@gmail.com
In the neighboring thread https://groups.google.com/forum/#!topic/keras-users/BB_JPmtwa9s I realized that by default, TF does all this shuffling just to obey the TH memory layout. I'm a bit biased towards Theano, but it's still weird seeing TF treated as such a second-class citizen. And it's not even really worth the trouble, because the kernel-flip semantics differ anyway, so I can't transfer serialized models between TF and TH, as explained by François in that thread.

Wouldn't it make sense if all backends used their own preferred dim_ordering and flip direction, both for activations and for kernels, and the only place to bring them under the same Keras tent were weight serialization (and the backend API, obviously)? The serialization file layout could then be backend-agnostic. (And the choice would not be especially important: it could be decided between the two internal layouts by flipping a fair coin.)

I understand that Keras does not really have weight serialization, just dumping of backend-specific internal memory representations, so we would need to write brand-new code for my proposal to happen. But isn't the lack of real weight serialization a limitation anyway? It's probably just my unfamiliarity, but I don't immediately see anything in the current codebase that forces us down this path.

Maybe something important escaped my attention and the above is infeasible. Maybe for regulars of the keras-users list this is already old hat, something often discussed that's only an annoying distraction by now. If that is the case, I'd be grateful for pointers to the relevant previous discussions.

Thanks,
Dániel

Dan Becker

Apr 18, 2016, 3:09:26 PM
to varga....@gmail.com, Keras-users, iamti...@gmail.com
dim_ordering is a statement about the format in which you provide the data to the model, so it needs to be specified by the user. If "all backends used their own preferred dim_ordering", you would frequently be convolving over unintended dimensions.

I believe there are now functions to convert a kernel between the TH and TF formats (here is the commit; it probably arose in response to that thread). So it should now be possible to serialize a model for one backend, reload it, and convert it to the other.
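Conceptually the conversion is something like the sketch below. This is illustrative only, not the actual Keras helper: it assumes TH kernels are stored as (out_channels, in_channels, rows, cols), TF kernels as (rows, cols, in_channels, out_channels), and that the spatial flip accounts for Theano performing true convolution while TF performs cross-correlation. For real conversions, use the helper from that commit.

    import numpy as np

    def th_kernel_to_tf(kernel_th):
        # (out_ch, in_ch, rows, cols) -> (rows, cols, in_ch, out_ch)
        kernel_tf = np.transpose(kernel_th, (2, 3, 1, 0))
        # Flip the spatial axes: Theano's conv2d flips the kernel, TF's does not.
        return kernel_tf[::-1, ::-1, :, :]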

varga....@gmail.com

Apr 18, 2016, 6:58:56 PM
to Keras-users, varga....@gmail.com, iamti...@gmail.com


On Monday, April 18, 2016 at 9:09:26 PM UTC+2, Dan Becker wrote:
> dim_ordering is a statement about the format in which you provide the data to the model, so it needs to be specified by the user. If "all backends used their own preferred dim_ordering", you would frequently be convolving over unintended dimensions.

That's a good point, thanks. My proposal is flawed, but I still believe there must be a better way than paying these 3 transposes at every single layer, on the default TF codepath:
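From memory, it is roughly the following (a paraphrase of what the backend does for dim_ordering='th', not the verbatim Keras source, and the helper name is made up):

    import tensorflow as tf

    def conv2d_th_ordering_on_tf(x, kernel, strides=(1, 1), border_mode='valid'):
        # x arrives as (N, channels, rows, cols); tf.nn.conv2d wants (N, rows, cols, channels)
        x = tf.transpose(x, (0, 2, 3, 1))                    # transpose #1
        kernel = tf.transpose(kernel, (2, 3, 1, 0))          # transpose #2: TH kernel -> TF kernel
        out = tf.nn.conv2d(x, kernel, (1,) + strides + (1,), border_mode.upper())
        return tf.transpose(out, (0, 3, 1, 2))               # transpose #3: back to TH layout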

Can we envision an API with the following constraints?
- The same code can run, with no changes, under both backends.
- Regardless of which backend is used, no useless transpose penalty is paid between adjacent conv layers.

I think I do have a solution now, but I also expect it to have other unforeseen issues, and I expect it to be hugely unpopular. Here it is anyway; let's call it a thought experiment. I'm interested in your counterpoints, but I'm not seriously proposing this:

We'd have a convolutional subclass of the Tensor class that augments it with a dim_ordering member variable. The convolutional classes and functions (the ones that currently take a dim_ordering argument) would use this subclass for feature maps and kernels. Such a function would work like this:
- Obviously, it takes its dim_ordering from the subclass's member variable instead of a function argument; all dim_ordering function arguments go away.
- It can take a tensor with any dim_ordering as input,
- but if it encounters a tensor in the other backend's preferred layout, the first thing it does is transpose it into its own backend's preferred shape. From then on, everything happens in the internally preferred representation.

This would shrink and streamline the current codebase. Some 200 occurrences of the dim_ordering argument could be removed, adding a trivial convert_to_the_current_backends_preferred_dim_ordering() helper function instead. As far as I can see, it could solve the flipping order serialization issue as well. But of course it brings its own issues, some I foresee, some I don't. So as I said, I don't expect it to be a net improvement in usability, and I'm not seriously arguing for it.
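In rough code, the wrapper I have in mind would be something like this (purely hypothetical names, not an actual Keras class; the only backend call assumed is keras.backend.permute_dimensions):

    from keras import backend as K

    class ConvTensor(object):
        """Hypothetical wrapper pairing a backend tensor with its dim_ordering."""
        def __init__(self, tensor, dim_ordering):
            self.tensor = tensor
            self.dim_ordering = dim_ordering  # 'th' or 'tf'

    def to_preferred(x, preferred):
        # Transpose only when the layout doesn't already match the backend's preference.
        if x.dim_ordering == preferred:
            return x
        if preferred == 'tf':   # (N, ch, rows, cols) -> (N, rows, cols, ch)
            pattern = (0, 2, 3, 1)
        else:                   # (N, rows, cols, ch) -> (N, ch, rows, cols)
            pattern = (0, 3, 1, 2)
        return ConvTensor(K.permute_dimensions(x.tensor, pattern), preferred)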


> I believe there are now functions to convert a kernel between the TH and TF formats (here is the commit; it probably arose in response to that thread). So it should now be possible to serialize a model for one backend, reload it, and convert it to the other.

Sure, it's possible, and I'm doing exactly this right now. (Thank you, Francois, I really appreciate the patch.) I was just looking for a more consistent, less error-prone way of doing it. Maybe there is none?

D.
 

François Chollet

Apr 18, 2016, 7:02:53 PM
to varga....@gmail.com, Keras-users, Ting Chen
Look, how about this:

- if you are using TF, you use dim_ordering="tf".
- if you are using Theano, you use dim_ordering="th".

Then your code and your weights are still portable (modulo the convolution kernel conversion you have to do on your weights), and there are no unnecessary operations.

It's simple enough. It's the reason we have this argument in the first place.
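In practice it's just one variable at the top of your script, something like this (a sketch; the flag is something you set yourself to match whichever backend is configured):

    import numpy as np
    from keras.models import Sequential
    from keras.layers import Convolution2D, Flatten, Dense

    USE_TENSORFLOW = True                          # flip this when you switch backends
    dim_ordering = 'tf' if USE_TENSORFLOW else 'th'

    X_train = np.random.rand(128, 1, 28, 28).astype('float32')   # stored in TH layout
    if dim_ordering == 'tf':
        X_train = np.transpose(X_train, (0, 2, 3, 1))            # one-time conversion

    model = Sequential()
    model.add(Convolution2D(32, 3, 3, dim_ordering=dim_ordering,
                            input_shape=X_train.shape[1:]))
    model.add(Flatten())
    model.add(Dense(10, activation='softmax'))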

varga....@gmail.com

Apr 18, 2016, 7:24:13 PM
to Keras-users, varga....@gmail.com, iamti...@gmail.com
Sure, I don't claim that passing dim_ordering=dim_ordering to all my layer constructors and convolution functions, and flipping the kernels manually, is really that painful. In my original post I hoped for a relatively easy improvement over that. Dan showed me that it's not as easy as I imagined, and as I said, the rest was just a thought experiment.

Dan Becker

Apr 18, 2016, 7:29:39 PM
to varga....@gmail.com, Keras-users, iamti...@gmail.com
Daniel,

If your data is in the wrong dim_ordering for your backend, you can avoid paying those transpose costs repeatedly by transposing the data once before calling fit (e.g. with np.swapaxes(data, 1, 3)).

Then you can specify a dim_ordering that matches your backend, and it's smooth sailing from there.
Dan

varga....@gmail.com

Apr 19, 2016, 4:37:21 AM
to Keras-users, varga....@gmail.com, iamti...@gmail.com


Yes, that's exactly what Francois wrote above as well, if I understand both of you correctly. You've already convinced me with your previous post that there's probably no better solution than that. I agree that it's an acceptable solution. But please understand that it's still an imperfect solution, from an API design perspective.

Basically, the point I feel you underappreciate is that with Keras there's no such thing as "my preferred backend". Personally, I use TF to get my code correct, then switch to TH and pay the ~4-minute upfront compile cost for the better amortized runtime, so I don't even really care about TF's slowness. But there's a far more important use case than mine:

Library writers and gitxiv releasers of novel network topologies would like to release code that people can use effectively on both backends. These people are forced to pass dim_ordering=dim_ordering to every relevant constructor and function, and either release two weight files or write their own serialization. Then their code is cloned and altered by less experienced people, and if those people use TF and forget even one of those dim_ordering=dim_ordering arguments, their code will mysteriously fail.

Okay, I'm tapping out before I get boxed in as the guy obsessed with backend-agnostic code. :) As I said, I don't think the current solution is that bad. I just wanted to (1) make sure we aren't missing an obvious improvement over it, and (2) make sure we're on the same page about its costs and benefits.

D.


jasonz...@gmail.com

Apr 20, 2016, 7:35:38 PM
to Keras-users, varga....@gmail.com, iamti...@gmail.com
I ran some timing tests with cifar10_cnn.py (without data augmentation) on a GTX 980 GPU.

I am running Keras 1.0.1, Theano 0.8.1, TensorFlow 0.8, CUDA 7.5, and cuDNN v4.

Here are the timings:
Theano backend: 10 seconds per epoch
TensorFlow backend: 40 seconds per epoch
TensorFlow backend (with dim_ordering='tf'): 20 seconds per epoch

Even with the 'tf' dim_ordering, the TensorFlow backend is 2x slower than Theano. The good news is that with TensorFlow you don't have to spend about 2-3 minutes compiling the model, but in the long run Theano is still faster.


ma...@plpp.de

Feb 14, 2017, 10:19:39 AM
to Keras-users, varga....@gmail.com, iamti...@gmail.com, jasonz...@gmail.com
I'm having trouble finding any comparisons from the past few weeks or months. Has anything changed regarding performance since April last year?