TPUs have unique characteristics as accelerators, and the calculation being traced here may fall outside what the XLA compiler can support when targeting them. I've created
a tracking issue on swift-models that references your reproducer case; thanks for providing that.
This sounds like a model design issue: our Transformer model should be compatible with running on TPUs, so perhaps something is wrong in its implementation or in one of the layers within it. Others with more TPU experience might be able to chime in with additional pointers.
Thanks for reporting the issue.