PyTorch Frontend for IREE


Stella Laurenzo

Sep 7, 2023, 3:34:20 PM
to iree-discuss
Hi folks --

Over the last couple of months, we (on the Nod side) have been running an experiment to see if it was a good time to embrace PyTorch 2 as a real frontend to IREE. For a bit of history, IREE started with TensorFlow/TFLite as its primary frontend and then, due to organizational affiliation, also moved on to Jax. PyTorch support always went through a bit of an "air gap" via the torch-mlir project and was never well integrated. Many of the core devs worked on both, but there were always partitions in the code.

We've been working to change this via a new project that we are calling Turbine. Last week, we crossed the threshold to get real HF/LLaMa models working through this path (and we got to a 90% pass rate on the test suite we were tracking), using the native dynamic shapes support that Dynamo provides (and which matches IREE's model). This indicated to us that the experiment is over and we should aim to invest here and make it real.

You've already seen some of this emerging from the RFC to integrate torch-mlir into IREE (now landed), project work on the #mai-tai channel, and IREE Python API enhancements over the last few months. With the torch-mlir integration landed, we have been able to drop the custom compiler/runtime build of IREE that was making the project unwieldy: Turbine is now a pure Python project which bridges to both torch.compile and torch.[_dynamo.]export while also providing a low-level iree-jax look-alike programming model for exporting complicated programs. With the enhancements to both PyTorch's and IREE's Python APIs, we have been able to remove the complicated native-code dependencies that dominated prior approaches.
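To make the torch.compile side of that bridge concrete, here is a minimal sketch of a custom Dynamo backend. It is purely illustrative (the names `inspecting_backend` and `f` are invented, and Turbine's real backend imports the captured FX graph into MLIR rather than running it eagerly as this sketch does):

```python
import torch

def inspecting_backend(gm: torch.fx.GraphModule, example_inputs):
    # A Dynamo backend receives the captured FX GraphModule plus example
    # inputs; a frontend like Turbine would import this graph into MLIR
    # at this point. This sketch just reports the graph size and hands
    # back a callable that executes the graph eagerly.
    print(f"captured {len(gm.graph.nodes)} FX nodes")
    return gm.forward

@torch.compile(backend=inspecting_backend)
def f(x):
    return torch.relu(x) + 1.0

out = f(torch.randn(4))  # first call triggers capture + backend compile
```

The key design point is that the backend is just a Python callable, which is what lets Turbine stay a pure Python project.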

This isn't just about rebasing on new tech; it is an essential step toward unlocking several features:
  • More efficient handling and optimization of weights, especially for giant models.
  • Implementation of more complicated inference techniques that require additional low-level programming of the compiler/runtime.
  • Efficient, async eager execution and memory management on all backends.
  • Enhanced integration of training-based workflows.
  • Tight integration with quantization approaches.
We've had a lively internal thread at Nod for a while on some of this, and we will be moving our conversation to the #pytorch channel on Discord. I wanted to provide this context for what is going on so that people can follow along. It is early work still but proceeding quickly.

- Stella




Ramiro Leal-Cavazos

Sep 7, 2023, 4:17:09 PM
to iree-discuss
This is really exciting! Having a fully supported Dynamo path will simplify the Torch-MLIR codebase immensely and make the user experience much better.

I have a few questions:

I noticed that Turbine uses the Python-based FX importer from Torch-MLIR with a few changes/fixes. Is the plan to upstream these?

Is there an example of the dynamic shape support? From a quick glance at the importer, it seems that the tensor shapes used when creating the MLIR are obtained from the `shape` property of type `torch.Size` in the `TensorMetadata`. I'm not sure this would result in dynamic shapes in MLIR, since one needs to generate a question mark character for the dynamic dimension.

Is there an example of running Llama e2e? I noticed there is a test for Llama, but it is marked as `expectedFailure`.

Looking forward to future updates,
Ramiro

Stella Laurenzo

Sep 7, 2023, 4:30:17 PM
to Ramiro Leal-Cavazos, Arham Khan, Daniel Garvey, KwangKyun Kim (Bruce), iree-discuss
On Thu, Sep 7, 2023 at 1:17 PM 'Ramiro Leal-Cavazos' via iree-discuss <iree-d...@googlegroups.com> wrote:
This is really exciting! Having a fully supported Dynamo path will simplify the Torch-MLIR codebase immensely and it will make the user experience much better.

I have a few questions:

I noticed that Turbine uses the Python-based FX importer from Torch-MLIR with a few changes/fixes. Is the plan to upstream these?

It started as a total rewrite, since it changes a few underlying assumptions. But as these things go, a fair bit of DNA got swapped. In my mind, I wanted to see how it went and then include it in a proposal to rework torch-mlir's Python surface area (as part of a general cleanup to extricate all of the old TorchScript bits, etc.). It is going to grow some more IREE-specific stuff, but I'm trying to add those as policy objects that let import be customized for different use cases.

It does use the MLIR Python API a bit more conservatively than torch-mlir does, primarily because I wanted to avoid deep layers of Python bindings. It also uses `iree.compiler.ir` directly. I figure there are ways to segregate those decisions so the importer can be shared between the projects, but I wanted to get a look at the end state before doing that.
 

Is there an example of the dynamic shape support? From a quick glance at the importer, it seems that the tensor shapes used when creating the MLIR are obtained from the `shape` property of type `torch.Size` in the `TensorMetadata`. I'm not sure this would result in dynamic shapes in MLIR, since one needs to generate a question mark character for the dynamic dimension.

@Arham Khan has been working on that and I need to double check where it landed. We saw it work once last week, but we've been scrambling a bit :)
 

Is there an example of running Llama e2e? I noticed there is a test for Llama, but it is marked as `expectedFailure`.

This was also in the "it worked once" with unspecified patches category...

We've been splitting the work between getting a torch.exportable version that is inference efficient and buffing up the importer enough to handle it. @Daniel Garvey and I have largely been working on the former, and @Arham Khan and @KwangKyun Kim (Bruce) the latter. We haven't completely met in the middle yet for general use but are trying to get there this week.
 

Looking forward to future updates,

It goes without saying that we'd love your help; a big part of this was coming out of stealth on this work and making a new watering hole that we can all pitch in to.
 
Ramiro

--
You received this message because you are subscribed to the Google Groups "iree-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to iree-discuss...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/iree-discuss/b2df5ea4-c2f3-4023-8717-c726b4e75275n%40googlegroups.com.

Sean Silva

Sep 7, 2023, 6:15:01 PM
to Stella Laurenzo, iree-discuss
Really excited to see this integration reaching the level of polish it deserves! Onward!!

-- Sean Silva


Sambhav Jain

Sep 13, 2023, 8:48:22 PM
to iree-discuss
Hi Stella et al,
This is an exciting direction indeed! 

At Cruise we're using the TorchScript frontend in Torch-MLIR quite heavily at the moment, but we're also interested in exploring some aspects of TorchDynamo (e.g. dynamic shape support, custom op/type support) to better position ourselves for the switchover to the new frontend. Is Turbine something we would need to take a dependency on (in addition to, or instead of, Torch-MLIR)?

Also, could you or Ramiro share some comments on the timeline for deprecating TorchScript support in Torch-MLIR? Even if we deploy some of our newer workloads using the experimental Dynamo frontend, we would want the legacy TS mode to stay alive for most of our existing production use cases, at least for the foreseeable future. Is that something we can continue to rely on?

Best,
Sambhav

Stella Laurenzo

Sep 14, 2023, 1:41:47 PM
to Sambhav Jain, iree-discuss
On Wed, Sep 13, 2023 at 5:48 PM 'Sambhav Jain' via iree-discuss <iree-d...@googlegroups.com> wrote:
Hi Stella et al,
This is an exciting direction indeed! 

At Cruise we're using the TorchScript frontend in Torch-MLIR quite heavily at the moment, but we're also interested in exploring some aspects of TorchDynamo (e.g. dynamic shape support, custom op/type support) to better position ourselves for the switchover to the new frontend. Is Turbine something we would need to take a dependency on (in addition to, or instead of, Torch-MLIR)?

Turbine is a leaf project with a hard dependency on IREE (and, transitively, torch-mlir, since IREE depends on that). There are things in there that we will look at upstreaming at an appropriate time, but we're developing it without targeting the lowest common denominator the way torch-mlir does. Specifically, we are trying to keep the FxImporter pristine and configured through policy objects, as I expect that it will make sense to share that piece more broadly at some point.

The rest of it is tightly integrated with IREE's runtime (for eager execution) and IREE's beyond-linalg IR (for AOT) and uses IREE features as needed to meet our requirements. If the AOT side works out, it may eventually be possible to consider a generalization of that (it targets building the multi-module programs we need for various deployment scenarios, versus the single-function style of FxImporter).

The dividing line currently is that Turbine (via IREE) is using the torch-mlir dialects and the torch-to-linalg conversion pipelines but is not pulling in any of the Python side. We are also building out a standalone test suite that is based on having a Dynamo backend.

It's a community discussion to be had, but as we move to a PT2/Dynamo approach, I'm not going to be advocating for torch-mlir remaining a standalone full-stack project like it is today. I expect that we will peel it back to the MLIR/C++ dialects/conversions, some lightweight Python utilities (like the FxImporter) that anyone can use to build a backend, and some testing conventions that are hopefully rooted in how PyTorch itself does things versus being its own island universe. We'll need to see how things take shape...

So to answer your question: unless you are based on IREE, you wouldn't take a dependency on Turbine. We'd be happy to look at upstreaming valuable bits of Turbine into torch-mlir as they mature or as a need arises. But it is still pretty early days... probably at least a couple of months to get all of our stuff rebased.


Also, could you or Ramiro share some comments on the timeline for deprecating TorchScript support in Torch-MLIR? Even if we deploy some of our newer workloads using the experimental Dynamo frontend, we would want the legacy TS mode to stay alive for most of our existing production use cases, at least for the foreseeable future. Is that something we can continue to rely on?

I think it would be appropriate to call it deprecated now, as that seems to be the de facto policy in PyTorch proper. However, we have no concrete timeline for removal. As long as it is useful and people are using it, we're certainly not going to delete it. In the worst case, if maintaining it comes at odds with Dynamo-ifying the codebase, I expect we'd take a branching strategy and call what's there "pt1" or something so it could be used as-is for as long as needed.
 

Best,
Sambhav



