Support for outfeeds


José Valim

May 8, 2024, 4:21:21 PM
to iree-discuss
Hi everyone, thank you for your work on IREE.

Are there plans to support stablehlo.outfeed? We use it to stream tokens on Whisper and other text generation models; otherwise we wait and get all tokens at once. So I am wondering if we should disable the streaming mode when using IREE or if you folks typically tackle these problems differently.

Thanks!

Jacques Pienaar

May 8, 2024, 4:25:20 PM
to José Valim, iree-discuss
Hey José,

@Natasha Kononenko was looking into this area for a very similar reason recently (she is currently traveling so may be slow to respond).

Do you also use infeed on the host side, or just a one-sided outfeed with custom hooks on the host side?

Thanks,

Jacques


--
You received this message because you are subscribed to the Google Groups "iree-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to iree-discuss...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/iree-discuss/9d9216e2-10e4-4675-93af-f95f9864b3een%40googlegroups.com.

ste...@laurenzo.org

May 8, 2024, 5:08:58 PM
to iree-discuss
(resending because I got a bounce - apologies if duplicates)

The question would be what it lowers to. In the work I’ve done, we’ve typically done that in a different way, but if you’re coming from the stablehlo ecosystem, it makes some sense to use the programming model it has for that kind of comms.

 

If it were me trying to map this to IREE, I would lower it to a call into a VM module. Then you provide an implementation of the module at runtime or in the tools to do whatever I/O is called for. When I’ve done this in the past, I’ve often just implemented the prototype module in Python for testing (example: https://github.com/iree-org/iree/tree/main/samples/py_custom_module), but it isn’t very hard to write one in C (and then you just pass the SO to the runtime to load it and bind dynamically).

 

If going that route, you would add a pass to the stablehlo input pipeline which lowers the infeed/outfeed ops to util.calls of declared util.func functions like `stablehlo_rt.infeed` and `stablehlo_rt.outfeed`. Then the runtime will late-bind those to a `stablehlo_rt` module. If there are any of those extern declarations in the module (i.e. as a result of any surviving calls to them), then the runtime would require the module to be present. In this way, the compiler transforms just need to decide on the stub that represents the functionality.
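
To make the late-binding idea concrete, here is a toy model in plain Python. It is not IREE's API: `resolve_imports`, `run_program`, and the dictionary-based "modules" are hypothetical stand-ins. It only illustrates that the compiled program knows the stub names, that the implementation is supplied at load time, and that loading fails if a surviving import has no provider.

```python
from typing import Callable, Dict, Iterable

# Toy model only: a "module" is a dict of function name -> callable.
Module = Dict[str, Callable]


def resolve_imports(required: Iterable[str],
                    modules: Dict[str, Module]) -> Dict[str, Callable]:
    """Bind each 'module.function' import to a callable, or fail at load time."""
    bound = {}
    for name in required:
        module_name, func_name = name.split(".", 1)
        module = modules.get(module_name)
        if module is None or func_name not in module:
            raise LookupError(
                f"unresolved import '{name}': register a '{module_name}' module")
        bound[name] = module[func_name]
    return bound


def run_program(imports: Dict[str, Callable]) -> None:
    """Stands in for compiled code: it only knows the stub name."""
    outfeed = imports["stablehlo_rt.outfeed"]
    for token in ("Hello", ",", " world"):
        outfeed(token)


# The embedder decides what outfeed actually does; here it appends to a list.
received = []
stablehlo_rt = {"outfeed": received.append}

imports = resolve_imports(["stablehlo_rt.outfeed"],
                          {"stablehlo_rt": stablehlo_rt})
run_program(imports)
```

A program with no surviving infeed/outfeed calls would have an empty import list, so nothing would need to be registered.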

 

There are various advanced things that can be done from there, especially with respect to concurrency, but that’s the basic idea.


Jacques Pienaar

May 8, 2024, 5:18:03 PM
to ste...@laurenzo.org, iree-discuss
See https://github.com/iree-org/iree/tree/main/samples/custom_module/basic for a non-Python one (there are some other variants there too). For the other plumbing Stella mentions, it doesn't matter whether the module is written in Python or not.

-- Jacques

Stella Laurenzo

May 8, 2024, 11:55:52 PM
to José Valim, iree-discuss
Generally for what? Consider that the entire device HAL is just a VM module that the command-line tools happen to create and preload (but for APIs, even that has to be created). The compiler is just generating code that calls into it, same as you would do here.

Having named stubs for infeed/outfeed is just a way of being able to compile generically while providing the behavior at runtime.

Of the systems I know that use infeed/outfeed style comms, none of them share any implementation details, so having a way to provide such a module at runtime gives everyone a chance to implement what they actually need. Maybe there's a common version for testing (i.e. write to a file, etc.). As an example, the PJRT plug-in could provide an implementation that bridged to its API, while a standalone case could literally just wrap a deque that you can pull results from in another thread or something.
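
The standalone "wrap a deque" case might look roughly like the sketch below. Everything here is an assumption for illustration: `OutfeedSink` and `fake_generation` are hypothetical names, and the producer thread stands in for a running program calling the outfeed stub. Nothing in it touches IREE.

```python
import threading
from collections import deque


class OutfeedSink:
    """Hypothetical host-side outfeed target backed by a deque."""

    _CLOSED = object()  # sentinel marking the end of the stream

    def __init__(self):
        self._items = deque()
        self._ready = threading.Condition()

    def outfeed(self, value):
        """Called by the producer for every value it wants to stream out."""
        with self._ready:
            self._items.append(value)
            self._ready.notify()

    def close(self):
        self.outfeed(self._CLOSED)

    def __iter__(self):
        """Yield values as they arrive, blocking while the deque is empty."""
        while True:
            with self._ready:
                while not self._items:
                    self._ready.wait()
                value = self._items.popleft()
            if value is self._CLOSED:
                return
            yield value


def fake_generation(sink):
    """Stands in for a model invocation that emits tokens one at a time."""
    for token in ("stream", "ing", " works"):
        sink.outfeed(token)
    sink.close()


sink = OutfeedSink()
producer = threading.Thread(target=fake_generation, args=(sink,))
producer.start()
streamed = list(sink)  # the consumer receives tokens as they are produced
producer.join()
```

In a real streaming setup the consumer would handle each token inside the loop instead of collecting them into a list.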





On Wed, May 8, 2024, 2:42 PM José Valim <jose....@dashbit.co> wrote:

If it were me trying to map this to IREE, I would lower it to a call into a VM module. Then you provide an implementation of the module at runtime or in the tools to do whatever I/O is called for.

I wonder if this is a good way to solve the problem generally? IREE could automatically lower the operation but I would be required to pass a VM module that it would invoke? So there is less plumbing required from the developer side.

José Valim

May 9, 2024, 3:59:08 AM
to Stella Laurenzo, iree-discuss
Hi Stella,

Thank you. I am not familiar with IREE internals. My understanding (which may be wrong) from your first reply was that I'd need two steps: 1. lower the stablehlo.outfeed to something, and 2. implement the relevant VM API. My suggestion was to do 1 automatically (assuming it is indeed not yet done) and let the user only have to deal with the stub. :)

However, at the moment we are also using IREE via PJRT, so tackling that subproblem would be sufficient for us, and users of JAX/XLA would have it working out of the box.

Sorry for the confusion!


Jacques Pienaar

May 9, 2024, 8:57:55 AM
to José Valim, Stella Laurenzo, iree-discuss
I had typed this yesterday but forgot to send it, and you have since answered what I was going to ask. I will add it anyway:

José, by "generally" do you mean having the pass lower to VM calls unconditionally, as Stella suggested, assuming they'll be registered but leaving registration up to the "user"? (User as in a user of IREE building a compiler / runtime / solution.)

(Which your response before my question already answered 🙂)

When I added these to XLA back in the day, we started with a single blocking queue. That part was simple; the communication between device and host was definitely specialized. Having the above pass on by default would reduce effort for the user, although only by taking care of the easier part, but at least it provides a "fill in the blanks" spot for the rest. The caveat is that the error experience from forgetting to register these would be rather bad (although perhaps we could have an op that verifies at runtime whether given modules are loaded and reports a user-defined error if not, or have a config option at compile time).

Basically, I'm not sure whether providing a full example with infeed & outfeed support would be as useful as this, given that the rest still needs to be packaged up and deployed. The end user of a packaged solution should indeed be unaware.

-- Jacques 


José Valim

May 9, 2024, 9:15:37 PM
to Jacques Pienaar, iree-discuss
Thank you for the reply, there is no rush and that helps us plan accordingly.

We don't use infeeds extensively for now (the only use case we have can be easily worked around) but we were considering using it to implement things like ZeRO-Offload (i.e. to implement the host_callback that JAX provides).
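
One way to picture a host callback built from an outfeed/infeed pair is sketched below with two blocking queues. This is an illustration only, not how JAX, XLA, or IREE implement it: `host_loop`, `device_program`, and the queue names are hypothetical, and the "device" is just the main Python thread.

```python
import queue
import threading

# Assumed transport: one bounded blocking queue per direction.
outfeed_q = queue.Queue(maxsize=1)  # device -> host
infeed_q = queue.Queue(maxsize=1)   # host -> device
STOP = object()                     # sentinel telling the host loop to exit


def host_loop(callback):
    """Serve callbacks until the device side signals completion."""
    while True:
        value = outfeed_q.get()
        if value is STOP:
            return
        infeed_q.put(callback(value))


def device_program(inputs):
    """Stands in for compiled code that pairs each outfeed with an infeed."""
    results = []
    for x in inputs:
        outfeed_q.put(x)                # hand the value to the host
        results.append(infeed_q.get())  # block until the host replies
    outfeed_q.put(STOP)
    return results


host = threading.Thread(target=host_loop, args=(lambda x: x * x,))
host.start()
results = device_program([1, 2, 3])
host.join()
```

The bounded queues give simple backpressure: neither side can run more than one message ahead of the other.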