Compiler/xla error for Transformer model on Colab TPU


Wojtek Czarnowski

Jul 15, 2020, 11:12:22 AM
to Swift for TensorFlow
When trying to train the Transformer model on Colab with a TPU, I get this error:

2020-07-15 14:57:51.319334: F tensorflow/compiler/xla/xla_client/xla_util.cc:90] Invalid argument: From /job:tpu_worker/replica:0/task:0: Computation requires more parameters (333) than supported (limit 237).

Hmm, my model runs fine on GPU with the X10 backend.
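For context, here is a minimal sketch of how a model is placed on an X10 device in Swift for TensorFlow. The `Dense` layer is a hypothetical stand-in for the Transformer in the notebook; `Device.defaultXLA` and `move(to:)` are from the public X10 API:

```swift
import TensorFlow

// Pick an accelerator-backed X10 device; on Colab this resolves to the
// attached GPU or TPU depending on the runtime type.
let device = Device.defaultXLA

// Hypothetical small model standing in for the notebook's Transformer.
var model = Dense<Float>(inputSize: 16, outputSize: 4)

// Move the model's parameters to the X10 device.
model.move(to: device)

// Inputs must live on the same device as the model.
let input = Tensor<Float>(randomNormal: [8, 16], on: device)
let output = model(input)
```

With this setup the same code path runs on GPU or TPU; only the device that `defaultXLA` resolves to changes.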

Here is a notebook that crashes the XLA compiler:

Best, Wojtek

Brad Larson

Jul 16, 2020, 7:05:09 PM
to Swift for TensorFlow
TPUs have unique characteristics as accelerators, and it may be that whatever calculation is being traced here ends up outside of what the XLA compiler can support when targeting them. I've created a tracking issue on swift-models that references your reproducer case, thanks for providing that.

This sounds like a model implementation issue: our Transformer model should be compatible with running on TPUs, so perhaps something is wrong in its implementation or in one of the layers within it. Others with more TPU experience might be able to chime in with additional pointers.

Thanks for reporting the issue.