Hi Harsh - good (somewhat loaded) question. The IREE team and the TFRT team are "friends", and internally (to Google) we collaborate extensively, especially in several key areas of the stack (which the publicly presented TFRT diagrams don't have the resolution to call out):
- TensorFlow->XLA MLIR based lowerings
- MLIR/LinAlg development
- XLA HLO->LinAlg
- GPU codegen
- CPU codegen
- Core MLIR development
In all of those cases, we are tightly coupled, and we expect that these components will dominate in all future states of the more compiler-based side of the infra. Subjectively, TFRT has been prioritizing getting the eager pipeline on large systems plumbed well, while we (IREE) have been focused on more whole-program/compiler-based approaches, specifically for deployment form factors that are resource constrained and latency sensitive. These are not mutually exclusive, and in the fullness of time, I expect these paths will cross. There is also a general acknowledgment by all sides of the work that different scales can often require different investments, and the important thing is that we intersect the most strongly at the core infrastructure level. From that perspective, TFRT-of-today and IREE are targeting different scales of use cases, and if there is a correspondingly strong reason in the future to move them closer together or further apart, we'll do that when the picture becomes clearer.
The integration question that we have our eye on the most is when it will be the right time to layer something like IREE under TFRT, in the same way that the TPU driver will layer. The critical things to evaluate when deciding this are: a) have we moved the needle enough on the Vulkan/GPU side in IREE such that there is value in having such a compiler/backend for TFRT, and b) are TFRT and its integration with TensorFlow mature enough to take on such a new backend. The answer to both of those right now is "no", but that is more a question of evaluating "when" than "if".
The key here is that I think, especially for resource-constrained and embedded use cases, using IREE directly without layering through TFRT will be advantageous. There are a number of load-bearing points in how each layer interfaces up the stack that will be important for such things at different scales. There are plenty of proofs needed from our side to make that position real, however. As Sean said when he raced me to the response, the two do operate at different layers, and we are committed to maintaining usable layers at the different levels of the stack.