RFC: Integrate torch-mlir into IREE

Stella Laurenzo

Sep 4, 2023, 10:36:44 PM
to iree-discuss
Hi folks - I would like to reach a consensus that we should take some form of dependency on torch-mlir and integrate its dialects and conversion pipelines directly into the IREE compiler.

For the past two months, we (Nod) have been executing on a proof of concept that folks may have seen: SHARK-Turbine. This project has as a primary goal "taming" the interface between PyTorch and IREE, providing a tight integration with all PyTorch modalities (torch.compile, eager execution, and AOT approaches). Further, it has been evaluating how close we are to the torch-mlir vision of rebasing most of the integration onto upstream PyTorch concepts, centered primarily on Dynamo.

Current Status
I'm happy to say that as of last week, we passed a milestone where we consider this proof of concept to be a success, and it is now time to determine what to do with it and position it for maximum effect. Namely, we have determined that:
  • Interop at the FX/Python level for graph extraction is viable without bridging through any of the legacy modalities (TorchScript, etc). Proving this matters for integration because it allows a much simplified approach to something like torch-mlir, requiring no native-code dependency on PyTorch itself in order to bridge to a compiler backend. The entirety of the graph-level interop is now in one Python file, and it bridges directly via MLIR/IREE's Python API.
  • The Dynamo export path to FX is ready to replace the legacy (and trickier to integrate) approaches to whole-graph extraction. In addition, Dynamo's native dynamic shape support mates well with IREE's own model, and it appears ready to handle challenging models (we have been testing with inference-optimized LLMs).
  • Implementing a low-level export interface like iree-jax is now possible directly on PyTorch, producing similar levels of capability for assembling complicated programs/exports.
  • I have been syncing a subset of torch-mlir (just the dialects and conversions) against IREE's LLVM version for O(months), and I have not hit *any* integration hurdles for this subset (with the exception of one case of needing to tweak a warning flag). The integration surface is quite small (see here and here). I expect it could be further simplified as we begin to shed the legacy, pre-Dynamo paths in torch-mlir proper.
  • We considered interop via StableHLO and may consider it further in the future, but in a survey, it was not suitable for any of the workloads which our customers care about -- primarily due to immaturity with respect to custom ops, type support, extensibility, and convoluted conversion paths requiring third party dependencies that do not appear factored for our use. The StableHLO ecosystem continues to evolve, and we are free to make project-level decisions to embrace it more as the situation improves.
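For readers unfamiliar with the Dynamo handshake the points above rely on: torch.compile hands a captured FX GraphModule and example inputs to a backend callable, which returns a compiled callable that replaces the captured graph. The sketch below models just that contract in plain Python with stand-in types (`FakeGraphModule`, `iree_like_backend`, and `fake_torch_compile` are illustrative names, not SHARK-Turbine code), since the real path requires a PyTorch install:

```python
from typing import Callable, List, Sequence

class FakeGraphModule:
    """Stand-in for torch.fx.GraphModule: an ordered list of toy ops."""
    def __init__(self, ops: Sequence[str]):
        self.ops = list(ops)

def iree_like_backend(gm: FakeGraphModule,
                      example_inputs: List[float]) -> Callable[..., float]:
    """Toy backend: 'compiles' the op list into a Python callable.

    A real backend would instead import the FX graph into the torch
    dialect via MLIR's Python bindings and hand it to the compiler.
    """
    def compiled(*args: float) -> float:
        acc = args[0]
        for op in gm.ops:
            if op == "add":
                acc = acc + args[1]
            elif op == "mul":
                acc = acc * args[1]
        return acc
    return compiled

def fake_torch_compile(gm: FakeGraphModule, backend) -> Callable[..., float]:
    """Simulates the torch.compile side: capture a graph, defer to a backend."""
    example_inputs = [1.0, 1.0]
    return backend(gm, example_inputs)

# Models f(x, y) = (x + y) * y for the toy op set above.
f = fake_torch_compile(FakeGraphModule(["add", "mul"]), iree_like_backend)
print(f(2.0, 3.0))  # → 15.0
```

The design point is that the backend sees only the graph and example inputs; no TorchScript or native bridging code is involved, which is why the real interop fits in one Python file.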
Benefits
We are happy to continue evolving the native compiler integration within the SHARK-Turbine project, but splitting the compiler in this way forks the ecosystem. Minimally, it creates a lot of redundant work around CI; more importantly, it forces us into an untenable integration-testing story: either we base integration tests on imports to internal dialects that are unstable, or we accept that we can't offer artifact stability across dependency versions. Ideally, I would like a robust integration test suite in upstream IREE that includes generated IR from PyTorch, testing a variety of models and modalities. Basing this on the torch and surrounding dialects, which we control and can handle upgrades for, gives us a reasonable path forward.

Approach
The SHARK-Turbine prototype already factors the PyTorch frontend as a compiler plugin. It should be a simple matter of adding it to a directory like compiler/plugins/fe/pytorch (parallel to the compiler/plugins/target tree) and setting some CMake flags. We would also need to add a submodule dependency on torch-mlir. We can consider other methods of managing this dependency over time, but this seems low enough overhead and in line with how we manage StableHLO, Jax's native frontend layer.
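IREE's actual plugin mechanism is C++/CMake, but the shape of the idea — a frontend registering itself by name and being activated only when enabled at configure time — can be sketched in a few lines of Python (all names here, `PluginRegistry` included, are illustrative stand-ins, not IREE's real API):

```python
from typing import Callable, Dict, List, Set

class PluginRegistry:
    """Toy registry: frontends register by name; only enabled ones activate."""
    def __init__(self) -> None:
        self._plugins: Dict[str, Callable[[], str]] = {}

    def register(self, name: str, activate: Callable[[], str]) -> None:
        self._plugins[name] = activate

    def activate(self, enabled: Set[str]) -> List[str]:
        # Deterministic order; names not built into this binary are ignored.
        return [self._plugins[n]() for n in sorted(enabled) if n in self._plugins]

registry = PluginRegistry()
# A PyTorch frontend plugin (per the RFC, living under
# compiler/plugins/fe/pytorch) would register its dialects/conversions here.
registry.register("pytorch", lambda: "torch dialects + conversions loaded")
registry.register("stablehlo", lambda: "stablehlo dialects loaded")

print(registry.activate({"pytorch"}))  # → ['torch dialects + conversions loaded']
```

The point of the pattern is that the core compiler stays frontend-agnostic: enabling or disabling the PyTorch path is a build-configuration decision, not a code change.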

Doing this now also has the benefit that the code is mostly build scaffolding and has only had commits from a small set of individuals who have all signed the OpenXLA CLA -- meaning that it is in a state where it can easily be contributed directly. We are planning many enhancements to this layer of the system, and it will become increasingly hard to keep this in a clean state with respect to authorship.

We would continue to develop SHARK-Turbine, depending directly on the compiler's APIs rather than on a private build with our plugin enabled.

Making this contribution would also eliminate the in-tree fork of the TMTensor dialect and supporting infrastructure, since, to the extent those are still needed, they are drawn directly from their source and included as part of the plugin.

There are parallel discussions to be had on the torch-mlir side about simplifying the codebase in a post-Dynamo world. These are orthogonal concerns to this RFC, since the approach used already presumes that state and excludes the legacy pieces of the project.

Timeline
If there are no objections, I would like to make this contribution this week and simplify the state of the world so that we can contribute more tests to upstream.

Comments?
- Stella

Stanley Winata

Sep 4, 2023, 11:25:41 PM
to iree-discuss
Hi Stella,

Thanks for the work on SHARK-Turbine! Very happy to see PyTorch, the current de facto standard ML framework, getting better support. Super excited for the next steps of this project! Let me know how I can help and pitch in to accelerate the development cycle.

Best,
Stanley

Anush Elangovan

Sep 5, 2023, 12:28:00 AM
to Stanley Winata, iree-discuss
Having seen a few features like Flash Attention, GPTQ, etc. threaded through the different abstractions: a heavy +1 to tight integration for PyTorch via Torch-MLIR in IREE. Having supported Torch-MLIR and SHARK/IREE with multiple customers, this would bring much tighter integration and feature velocity for end customers. Also, the CI infra can be simplified and moved upstream from SHARK, so this work can benefit everyone using IREE with PyTorch (not just SHARK downstream).

Thank you for the push Stella. 



Ben Vanik

Sep 5, 2023, 12:49:41 PM
to Anush Elangovan, Stanley Winata, iree-discuss
As someone not familiar with the tangle of projects, what does this actually look like mechanically?
We have third_party/torch-mlir-dialects - is that going to be replaced with third_party/torch-mlir instead and that's it?
What other deps does torch-mlir have that may cause issues with our cross-compilation/split compiler/runtime cmake/etc?
What implications are there around llvm-project/mlir-hlo integrations given that torch-mlir also has a dep on it? Would we be locked into having a globally synchronized submodule tree through downstream users, us, mlir-hlo, and torch-mlir on llvm-project?

Stella Laurenzo

Sep 5, 2023, 1:11:19 PM
to Ben Vanik, Anush Elangovan, Stanley Winata, iree-discuss
On Tue, Sep 5, 2023 at 9:49 AM Ben Vanik <b...@nod-labs.com> wrote:
As someone not familiar with the tangle of projects, what does this actually look like mechanically?
We have third_party/torch-mlir-dialects - is that going to be replaced with third_party/torch-mlir instead and that's it?
What other deps does torch-mlir have that may cause issues with our cross-compilation/split compiler/runtime cmake/etc?
What implications are there around llvm-project/mlir-hlo integrations given that torch-mlir also has a dep on it? Would we be locked into having a globally synchronized submodule tree through downstream users, us, mlir-hlo, and torch-mlir on llvm-project?

  • We would drop torch-mlir-dialects (it is a manually copied subset of torch-mlir anyway and is superseded by this). I would unwire the bespoke plumbing for all of this, as it would just be subsumed by depending on the full compiler plugin.
  • The part of torch-mlir that we depend on is just the torch, torch_conversion and (currently) tm_tensor dialects and a bag of conversion pipelines to take them down to linalg and IREE dialects.
  • The parts of torch-mlir that have exotic dependencies (vs compiler dialects/transforms) are already segregated and not included in this integration. So no deps beyond llvm-project itself.
  • Upstream torch-mlir will be refactored soon to completely remove the mlir-hlo dep, but it is already optional and not activated by our integration.
  • What we are actually depending on is very similar to stablehlo and has light dependencies on llvm-project/mlir itself (basically core IR/ and Linalg deps).


Julian Walker

Sep 5, 2023, 1:16:44 PM
to Stella Laurenzo, Ben Vanik, Anush Elangovan, Stanley Winata, iree-discuss
Thanks Stella and team for the exploration and work here! An additional mechanics question to add to Ben's list: are there expected testing and benchmarking needs in the IREE repo to support this integration?

Also, a more product-focused question: what e2e scenarios will this unlock for customers of the IREE repo on server and mobile, and which out-of-tree projects would be needed? I'm just trying to develop a mental model of the interop between the components here. Thanks!

Stella Laurenzo

Sep 5, 2023, 1:25:03 PM
to Julian Walker, Ben Vanik, Anush Elangovan, Stanley Winata, iree-discuss
On Tue, Sep 5, 2023 at 10:16 AM Julian Walker <juli...@google.com> wrote:
Thanks Stella and team for the exploration and work here! An additional mechanics question to add to Ben's list: are there expected testing and benchmarking needs in the IREE repo to support this integration?

I think that this work is a pre-requisite for further building out the in-tree regression suite for new workloads from PyTorch. I'm primarily interested in making sure that we build out the pkgci regression suite for various hardware based on using some of this: https://github.com/openxla/iree/tree/main/experimental/regression_suite

I don't think there are any testing/benchmarking needs going into this integration; rather, it enables more testing flows to move from downstream to upstream.

Also, a more product-focused question: what e2e scenarios will this unlock for customers of the IREE repo on server and mobile, and which out-of-tree projects would be needed? I'm just trying to develop a mental model of the interop between the components here. Thanks!

This is needed for tight integration with native PyTorch features, including newer Dynamo interactions, quantization, and experimental features that are driving LLM development. More importantly, I think (since we can already do all of that downstream), it makes upstream capable of compiling/testing these flows, which will be a big improvement: currently, a large amount of our time is spent triaging downstream failures which should have been gated upstream.

This would just be a submodule dep on torch-mlir, replacing the existing/ad-hoc torch-mlir-dialects snapshot. It only depends on a small subset (in terms of build/dep complexity) of torch-mlir, providing the dialects/conversions.

Stella Laurenzo

Sep 5, 2023, 1:33:12 PM
to Julian Walker, Ben Vanik, Anush Elangovan, Stanley Winata, iree-discuss
On Tue, Sep 5, 2023 at 10:24 AM Stella Laurenzo <ste...@nod-labs.com> wrote:
[quoted text of the previous message trimmed]

Another cluster of product features currently suffers from having these boundaries drawn in a weird place: more user/framework control over advanced export flows, including weight externalization, training flows, etc. We need to be able to optimize these quickly and provide a unified experience for PyTorch, and removing these boundaries will let us target IREE features directly from PyTorch (versus the current situation, where we reduce everything to a lowest common denominator). We estimate that some of these paper cuts add up to 30-40% overhead in runtime latency for current LLM inference models, and the inability to target IREE-specific features is driving workflows that are very intensive in compile time and memory overhead. We need to address all of those at the source to meet requirements.

Ben Vanik

Sep 5, 2023, 5:33:40 PM
to Stella Laurenzo, Julian Walker, Anush Elangovan, Stanley Winata, iree-discuss
Sounds great - thanks for clarifying which parts of the project we'd be depending on. I imagine there are things (if not already present) that could be added to torch-mlir's CMake to disable the behavior we don't need (beyond what EXCLUDE_FROM_ALL will do).

Stella Laurenzo

Sep 5, 2023, 6:10:15 PM
to Ben Vanik, Julian Walker, Anush Elangovan, Stanley Winata, iree-discuss
I've already pared that down to the minimum and used our CMake setup to be more precise about it. After setting this up, I'll be working in torch-mlir to add a bit more isolation (i.e. making the TOSA conversions optional/separate and possibly pruning some of the hoops being jumped through for tm_tensor, now that we don't need it as an IREE backdoor).

Also, I will not be implementing Bazel support for any of this as that is unmaintained upstream.

FYI - Removing the tm_tensor subset from IREE drops 2.9kLOC from the repo.
