This error occurs inside the function dot_product_attention. I don't understand how that's possible: it is all t2t code, and the tensors all have static shape [None, 8, None, None, 4]. Has anyone else run into this problem before?
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument: In[0] and In[1] must have compatible batch dimensions: [1,8,1288,4608,4] vs. [1,8,1250,4,256]
	 [[{{node text_recognition/parallel_4_4/text_recognition/text_recognition/body/encoder_layer_0/local_2d_self_att/self_attention/local_self_attention_2d/local_2d/einsum/MatMul}}]]
  (1) Invalid argument: In[0] and In[1] must have compatible batch dimensions: [1,8,1288,4608,4] vs. [1,8,1250,4,256]
	 [[{{node text_recognition/parallel_4_4/text_recognition/text_recognition/body/encoder_layer_0/local_2d_self_att/self_attention/local_self_attention_2d/local_2d/einsum/MatMul}}]]
	 [[training/gradients/AddN_232/_6519]]
File "/home/ssingh/src/tensor2tensor/tensor2tensor/layers/common_attention.py", line 1576, in dot_product_attention
  logits = tf.einsum("...kd,...qd->...qk", k, q)
Tensor shapes printed out from the function:
in dot_product_attention, shapes(q,k,v)=([None, 8, None, None, 4], [None, 8, None, None, 4], [None, 8, None, None, 4])
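For context, the einsum equation "...kd,...qd->...qk" requires that the leading "..." dimensions broadcast and that the shared d dimension match between k and q. The runtime shapes in the error ([..., 4608, 4] vs [..., 4, 256]) violate both, which suggests the tensors were reshaped inconsistently upstream even though their static shapes agree. A minimal sketch of the contract, using NumPy's einsum as a stand-in for tf.einsum (same contraction semantics; the bad shapes are scaled-down stand-ins for the reported ones, not the actual t2t tensors):

```python
import numpy as np

# Shapes follow [batch, heads, length, depth]; the sizes are made up.
B, H, D = 1, 8, 4
k = np.zeros((B, H, 1288, D))   # keys:    [B, H, k_len, d]
q = np.zeros((B, H, 1250, D))   # queries: [B, H, q_len, d]

# Matching leading dims and matching d: this contracts fine,
# producing attention logits of shape [B, H, q_len, k_len].
logits = np.einsum("...kd,...qd->...qk", k, q)
print(logits.shape)  # (1, 8, 1250, 1288)

# Scaled-down stand-ins for the failing runtime shapes
# [1,8,1288,4608,4] vs [1,8,1250,4,256]: the "..." dims disagree
# (1288 vs 1250 can't broadcast) and d disagrees (4 vs 256),
# so the contraction is rejected.
k_bad = np.zeros((B, H, 12, 46, 4))
q_bad = np.zeros((B, H, 10, 4, 25))
try:
    np.einsum("...kd,...qd->...qk", k_bad, q_bad)
except ValueError:
    print("einsum rejected the incompatible shapes")
```

This is consistent with the TF error message: "In[0] and In[1] must have compatible batch dimensions" refers to the "..." dims that the einsum-lowered batched MatMul treats as batch.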