Can anyone give me any hints as to why XLA-JIT has better performance on the CPU backend?
I ran TensorFlow with and without XLA-JIT (manual mode) on the MNIST benchmark on a single CPU. With XLA-JIT I get a 13.6x speedup over TensorFlow without XLA-JIT, which is quite significant, so I decided to find out what is going on under the hood.
Since operation fusion is often mentioned as an advantage of XLA-JIT, I naturally thought this technique might be the reason behind the speedup, so I read through the source code and found that the fusion procedure is roughly like this (please correct me if anything is wrong):
Considering how large the performance improvement is, I think there must be something more that I am missing or mistaken about. May I have your advice?
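
To make the question concrete, here is a minimal sketch of the general idea of operation fusion as I understand it. It is plain Python, not XLA's actual fusion pass or generated code, and the function names `unfused` and `fused` are made up for illustration. It only shows the difference between running each elementwise op as its own loop with a temporary buffer and running all of them in a single loop.

```python
# Conceptual sketch only: plain Python, NOT XLA's fusion pass.
# `unfused` and `fused` are hypothetical names used for illustration.

def unfused(x, w, b):
    # Each op runs as its own loop and materializes a full temporary list,
    # roughly like separate kernels writing intermediate buffers.
    t1 = [xi * w for xi in x]            # multiply
    t2 = [ti + b for ti in t1]           # add bias
    return [max(ti, 0.0) for ti in t2]   # ReLU

def fused(x, w, b):
    # One loop computes multiply, add, and ReLU per element, so no
    # intermediate list is allocated between the ops.
    return [max(xi * w + b, 0.0) for xi in x]

if __name__ == "__main__":
    data = [-2.0, -0.5, 0.0, 1.5, 3.0]
    # Both versions compute the same result; only the loop structure differs.
    assert unfused(data, 2.0, 1.0) == fused(data, 2.0, 1.0)
    print(fused(data, 2.0, 1.0))
```

The fused version saves memory traffic and per-op overhead, but on its own that kind of saving seems unlikely to account for a 13.6x difference, which is why I suspect something else is involved.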
--
You received this message because you are subscribed to the Google Groups "XLA development" group.
To unsubscribe from this group and stop receiving emails from it, send an email to xla-dev+u...@googlegroups.com.
To post to this group, send email to xla...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/xla-dev/3d63f1f6-81f8-4f69-8101-56932cdfe2c0%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.