Shape mismatch on GPU but not CPU

paulse...@gmail.com

Nov 20, 2018, 9:18:17 PM
to Keras-users
Hi, I've got a fairly complex seq-to-seq network that runs perfectly on CPU, but when I try to run it with tensorflow-gpu 1.5 + CUDA 9.0 as the backend instead of regular tensorflow 1.5 (I'm on version 1.5 because of some dependency issues), I end up with an Incompatible shapes error when it tries to compute sparse categorical accuracy:

[Screenshot: the Incompatible shapes error raised during sparse categorical accuracy]


Both my labels and generated output should be shape [128, X]: the batch size by the length of the sequences in the current batch. Because of batch randomization, X varies, and in this example it was 2. As you can see, the first tensor (which should be the labels, as produced by a keras Sequence) almost certainly contains the correct values (its size is always the product 128*X); it has just been flattened by some part of the GPU code path. Remember, this runs without error on CPU.

Any idea how I can prevent this, or, failing that, flatten the second tensor to match? A Flatten() layer doesn't apply to the batch dimension, so that's no good, and I don't think Reshape() can affect the batch dimension either.
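For concreteness, here's a sketch of what I mean by flattening both sides before comparing. NumPy stands in for the Keras backend here (in a real custom metric it would be K.flatten / K.cast on backend tensors), and the function name and toy data are mine, not from the actual model:

```python
import numpy as np

def flat_sparse_categorical_accuracy(y_true, y_pred):
    """Shape-tolerant sparse categorical accuracy (sketch).

    Flattens the integer labels and the class-index view of the
    predictions before comparing, so a [128, X] vs [256] mismatch
    (same number of elements) no longer matters.
    """
    labels = np.reshape(y_true, (-1,))              # e.g. [128, 2] -> [256]
    classes = np.argmax(y_pred, axis=-1).reshape(-1)
    return float(np.mean(labels == classes))

# Toy check: batch of 2 sequences of length 3, 4 classes.
y_true = np.array([[0, 1, 2], [3, 0, 1]])
y_pred = np.eye(4)[y_true]  # one-hot rows = perfect predictions
print(flat_sparse_categorical_accuracy(y_true, y_pred))  # 1.0
```

The point is that the result is the same whether the labels arrive as [2, 3] or already flattened to [6], since only the element count and order matter after the reshape.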

Maybe upgrading tensorflow would do it? I'd rather not have to do that because it will have a bit of a ripple effect, but if that's the only idea I guess I'll have to. 

Thanks for any help!

Daπid

Nov 21, 2018, 9:03:17 AM
to paulse...@gmail.com, keras...@googlegroups.com
That sounds like a bug. I would be hesitant to add a hack without good testing until you have it nailed down; for all you know, the tensors are all garbled. My first step would be to upgrade TF and see if the problem goes away; it should be mostly backwards compatible.


/David.


paulse...@gmail.com

Nov 29, 2018, 9:40:41 AM
to Keras-users
I am now using the most recent versions of tensorflow and CUDA + cuDNN and I'm getting the same error. Any other thoughts out there? I set up a fresh conda environment using "conda create --name tf_gpu tensorflow-gpu", then "conda install keras" on top of that.

So it must be something in my code, right? I'm following the example in the keras documentation for multi_gpu_model, and I'm able to run a less complex architecture that way. If I had to guess, it has something to do with using fit_generator() on a keras Sequence rather than just fit(). Should I be splitting up my Sequence in some different way to run on multiple GPUs? Again, this all works fine on CPU or on a single GPU.
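For reference, my understanding is that multi_gpu_model splits each incoming batch along the first axis, one slice per GPU, so whatever the Sequence yields needs an explicit batch dimension and a batch size divisible by the GPU count. A NumPy sketch of that splitting (the helper name is mine, not a Keras API):

```python
import numpy as np

def split_for_gpus(batch, n_gpus):
    """Mimic how a batch is sharded across replicas: slice along
    axis 0 into n_gpus equal pieces."""
    assert batch.shape[0] % n_gpus == 0, "batch size must divide evenly"
    return np.split(batch, n_gpus, axis=0)

x = np.zeros((128, 2))  # [batch, seq_len], as in my case
shards = split_for_gpus(x, 2)
print([s.shape for s in shards])  # [(64, 2), (64, 2)]
```

If the labels were somehow flattened to [256] before this split, each replica's label shard would no longer line up with its [64, 2] input shard, which would produce exactly the kind of mismatch I'm seeing.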