Hi guys-
How much work would it be to add support for IBM Power, especially Power 10? The Power 10 architecture has the Matrix Math Accelerator (MMA) instructions, which, according to IBM, should let it keep up with GPUs for certain workloads. It would be interesting to see whether JAX+XLA could leverage these and what the performance delta would be.
The documentation makes it sound like it shouldn't be terribly difficult, given that there is already an LLVM backend for Power 10. I assume there is more to the story, however: XLA would likely need some awareness of the MMA instructions, and it would need to emit code that the LLVM backend can lower to them.
Thanks in advance for any pointers.