Hi, I've got a fairly complex seq-to-seq network that runs perfectly on CPU, but when I run it with tensorflow-gpu 1.5 + CUDA 9.0 as the backend instead of regular tensorflow 1.5 (I'm pinned to 1.5 for dependency reasons), I get an Incompatible shapes error when it tries to compute sparse categorical accuracy:

Both my labels and generated output should be shape [128, X]: the batch size by the length of the sequences in the current batch. Because of batch randomization, X varies; in this example it was 2. As you can see, the first tensor (which should be the labels, as produced by a Keras Sequence) almost certainly contains the correct values, since its size is always the product 128*X; it has just been flattened by some component of the GPU path. Remember, this runs without error on CPU.
Any idea how I can prevent this, or, failing that, flatten the second tensor to match? A Flatten() layer doesn't apply to the batch dimension, so that's no good, and I don't think Reshape() can affect the batch dimension either.
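One workaround I've been considering is a custom metric that reshapes both tensors itself before comparing, since backend-level reshapes (unlike the Flatten()/Reshape() layers) aren't restricted from touching the batch dimension. Here's a NumPy sketch of the logic I have in mind; the actual Keras metric would do the same thing with keras.backend ops (K.reshape, K.argmax), and the function name is just mine:

```python
import numpy as np

def flat_sparse_categorical_accuracy(y_true, y_pred):
    # y_true: integer labels, nominally [batch, seq_len] but possibly
    #         arriving flattened as [batch * seq_len] on the GPU path
    # y_pred: predicted distributions, [batch, seq_len, vocab]
    vocab = y_pred.shape[-1]
    y_true = y_true.reshape(-1)          # -> [batch * seq_len], a no-op if already flat
    y_pred = y_pred.reshape(-1, vocab)   # -> [batch * seq_len, vocab]
    # Compare the argmax of each distribution against the label
    return np.mean(y_pred.argmax(axis=-1) == y_true)
```

Because both tensors are collapsed to a flat sequence-position axis, it shouldn't matter whether the labels show up as [128, X] or [128*X].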
Maybe upgrading TensorFlow would fix it? I'd rather not, since that will have a bit of a ripple effect on my other dependencies, but if that's the only option I guess I'll have to.
Thanks for any help!