Compiler/xla error for Transformer model on Colab TPU

Wojtek Czarnowski

Jul 15, 2020, 11:12:22 AM

to Swift for TensorFlow

When trying to train a Transformer model on Colab with a TPU, I get this error:

2020-07-15 14:57:51.319334: F tensorflow/compiler/xla/xla_client/xla_util.cc:90] Invalid argument: From /job:tpu_worker/replica:0/task:0: Computation requires more parameters (333) than supported (limit 237).

Hmmm, my model runs nicely on a GPU with the X10 backend.

Here is the notebook that crashes the XLA compiler:

https://github.com/wojtekcz/language2motion/blob/colab-tpu-error/notebooks/Motion2lang-Training/TrainMotion2langColabTPU.ipynb

Best, Wojtek

Brad Larson

Jul 16, 2020, 7:05:09 PM

to Swift for TensorFlow

TPUs have unique characteristics as accelerators, and it may be that whatever calculation is being traced here ends up outside of what the XLA compiler can support when targeting them. I've created a tracking issue on swift-models that references your reproducer case. Thanks for providing that.

This sounds like a model design issue: our Transformer model should be able to run on TPUs, so perhaps something is wrong in its implementation or in one of the layers within it. Others with more TPU experience might be able to chime in with additional pointers.
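
For anyone trying to get a rough sense of how a model compares against the limit in the error above, one option might be to count the tensors the model holds. The sketch below is only an illustration: `TinyModel` is a hypothetical stand-in (not the Transformer from the notebook), and it assumes that the "parameters" XLA reports are at least loosely related to the number of model tensors. The traced computation may well include other inputs too (optimizer state, batch data), so treat the count as a lower bound at best.

```swift
import TensorFlow

// Hypothetical stand-in model; the Transformer from the notebook is not reproduced here.
struct TinyModel: Layer {
    var dense1 = Dense<Float>(inputSize: 4, outputSize: 8, activation: relu)
    var dense2 = Dense<Float>(inputSize: 8, outputSize: 2)

    @differentiable
    func callAsFunction(_ input: Tensor<Float>) -> Tensor<Float> {
        return input.sequenced(through: dense1, dense2)
    }
}

let model = TinyModel()

// Layers are KeyPathIterable, so every nested Tensor<Float> stored property
// (weights and biases) can be enumerated through key paths.
let tensorKeyPaths = model.recursivelyAllWritableKeyPaths(to: Tensor<Float>.self)

// Assumption: this count only approximates what XLA calls "parameters".
print("Float tensors in model: \(tensorKeyPaths.count)")

// Total number of scalars across those tensors, for reference.
let scalarCount = tensorKeyPaths.reduce(0) { $0 + model[keyPath: $1].scalarCount }
print("Total scalars: \(scalarCount)")
```

Running the same enumeration on the real model would show whether its tensor count alone is already close to or above the reported limit of 237.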