Adding torch-mlir end-to-end tests to IREE CI

Sean Silva

May 2, 2022, 9:21:35 AM5/2/22
to iree-discuss
I have been maintaining a nightly run of the IREE nightly pip package vs the Torch-MLIR E2E tests in the iree-torch repo (https://github.com/google/iree-torch). In the last 3 weeks it has caught 2 legitimate regressions:


I would like to move these tests coreward to prevent regressions, as it has started to be a maintenance burden to file these bugs, XFAIL the tests, and then un-XFAIL them once the fix lands. Almost all the tests boil down to very small MLIR snippets (the IR in the bugs above is unreduced).

I don't think it is advisable to actually depend on pulling down Torch-MLIR itself for the testing (especially as new tests get added to the test suite). I propose the following:

1. We teach iree-torch how to spit out a snapshot of standalone iree-compile/iree-run-mlir invocations from the test suite. (this removes the dependence on Torch-MLIR)
2. We add an IREE CI job (presubmit ideally) that bulk runs those commandline invocations (should be pretty quick).
3. We add to the build cop rotation the weekly task to regenerate the test snapshot and update the XFAILs / file any bugs identified by new tests.
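The runner in step 2 could be sketched as a small script that executes each snapshotted invocation and checks the result against the XFAIL list. This is a minimal sketch under stated assumptions: the snapshot format (test name mapped to a standalone argv list) and the injectable `run` hook are illustrative, not an existing iree-torch feature.

```python
# Sketch of the bulk runner from step 2: execute a snapshot of standalone
# test invocations (e.g. iree-run-mlir command lines) and compare outcomes
# against an XFAIL list. The snapshot format is a hypothetical assumption.
import subprocess


def run_snapshot(commands, xfails, run=subprocess.run):
    """Run each snapshotted command; return (unexpected_failures, unexpected_passes).

    `commands` maps a test name to its argv list; `xfails` is the set of
    test names expected to fail. `run` is injectable so the bookkeeping
    logic can be tested without the real binaries installed.
    """
    unexpected_failures, unexpected_passes = [], []
    for name, argv in commands.items():
        ok = run(argv).returncode == 0
        if ok and name in xfails:
            unexpected_passes.append(name)    # fix landed: time to un-XFAIL
        elif not ok and name not in xfails:
            unexpected_failures.append(name)  # a new regression to file
    return unexpected_failures, unexpected_passes
```

Keeping the process launcher injectable also makes it easy to dry-run the XFAIL bookkeeping in CI before wiring up the actual tools.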

What do folks think? I'm happy to do this once we agree on the end state we want.

-- Sean Silva

Scott Todd

May 2, 2022, 1:36:27 PM5/2/22
to Sean Silva, iree-discuss
I'm all for adding more test coverage :D

Agreed about the dependency layering. We should be steering towards the core repo not depending on high level ML frameworks. Until we have a stable ml_program dialect and maybe some notion of dialect versioning, this will be bumpy though.

Having instructions (or a script) for regenerating test files will be helpful. I'm not sure where the best place is for that responsibility though. We can start with the build cop rotation, but it might also make sense to fold it into updating LLVM/MLIR. Say a core MLIR change needs updates in the test files, we could 
  • Bump MLIR in Torch-MLIR/iree-torch then regenerate the test files (yet another thing to do prior to updating IREE, this won't scale as we include even more frontends)
  • Disable the tests until Torch-MLIR/iree-torch update
  • Update the test files manually
Even if someone on the build cop rotation regenerates the test files weekly, we could still hit cases like that when updating LLVM/MLIR (and the process for running that is separate from the build cop rotation).


Robert Suderman

May 2, 2022, 1:47:01 PM5/2/22
to Scott Todd, Sean Silva, iree-discuss
It would be valuable to get a consistent nightly for IREE's pip dependencies in general. It doesn't look like a new release has been generated in the last two weeks, which is becoming an issue for the iree-jax integration work. I have been trying to solve the same integration issues: JAX updates jaxlib in an ad hoc way, sometimes overwriting the newest release. Normally this would be fine, except it pulls in MHLO changes which may make it incompatible with the pinned version of the IREE compiler.

Mahesh Ravishankar

May 2, 2022, 2:09:03 PM5/2/22
to Robert Suderman, Scott Todd, Sean Silva, iree-discuss
A lot of the terminology here is hard for me to follow. The thing that sticks out to me is the (circular?) dependency between the IREE version and the LLVM version used for the tests. I'd really like to avoid adding more overhead to the already heavy LLVM bump process, so making this a postsubmit thing rather than a presubmit thing is preferable for me. So far the bump has been done by compiler folks in IREE, and navigating multiple infrastructure pieces is harder (at least for people especially challenged in such things, like me). Postsubmit fixes for any failures should be easy given that they are really small tests.



--
Mahesh

Sean Silva

May 3, 2022, 5:57:55 AM5/3/22
to Mahesh Ravishankar, Robert Suderman, Scott Todd, iree-discuss
This just caught another regression btw: https://github.com/google/iree/issues/9036

-- Sean Silva

Sean Silva

May 3, 2022, 6:36:12 AM5/3/22
to Mahesh Ravishankar, Robert Suderman, Scott Todd, iree-discuss

Geoffrey Martin-Noble

May 10, 2022, 4:56:18 PM5/10/22
to Sean Silva, Mahesh Ravishankar, Robert Suderman, Scott Todd, iree-discuss
The thing you're describing, where we have frontends produce artifacts that are usable as standalone test artifacts, has been the vision for frontend testing for a while, but has remained unimplemented. I think we generally agree that it's the best way to decouple things. We'd really like those artifacts to be in some stable interchange format, though. Every time we've done something that requires someone manually updating artifacts in an unstable format, it has been quite painful, and we've fought hard to get rid of those processes. I agree with Mahesh that piling this onto an already over-burdened LLVM integrate process is not a great solution, and I'm also concerned about potential cyclic dependencies in the LLVM bump (doesn't torch-mlir wait for IREE to bump?). At some point we're going to need a more comprehensive solution for frontend testing with version skew. Does PyTorch have any stable model format we could use? What I'd been planning to do when thinking about this for TF (which we no longer intend to pursue, I believe) was to generate the artifacts from Python as part of the build.
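The "generate the artifacts from Python as part of the build" idea could look roughly like the following. Everything here is a hypothetical sketch: the directory layout, file names, and metadata schema are assumptions for illustration, not an existing torch-mlir or IREE convention.

```python
# Hypothetical sketch: serialize each e2e test case to a self-contained
# artifact directory (input MLIR + run metadata + golden outputs) that a
# runner can consume without any frontend installed. The layout and schema
# are illustrative assumptions, not an existing API.
import json
from pathlib import Path


def emit_test_artifact(out_dir, name, mlir_text, entry_fn, expected):
    """Write one standalone test artifact and return its directory."""
    case = Path(out_dir) / name
    case.mkdir(parents=True, exist_ok=True)
    # The compiled-from-frontend IR, frozen at artifact-generation time.
    (case / "input.mlir").write_text(mlir_text)
    # Enough metadata for a generic runner: what to invoke and what to expect.
    (case / "meta.json").write_text(json.dumps({
        "entry_function": entry_fn,
        "expected": expected,  # e.g. flattened golden outputs
    }))
    return case
```

A runner in the IREE repo would then only need to walk these directories, with no dependency on the frontend that produced them.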

I would rather pull down the torch-mlir repo each time than add another arcane updating process somewhere. This would then have to be postsubmit-only and be treated like the samples builds and fixed up after the fact in the case of version skew.

Sean Silva

May 11, 2022, 10:31:59 AM5/11/22
to Geoffrey Martin-Noble, Mahesh Ravishankar, Robert Suderman, Scott Todd, iree-discuss
That seems to be a different opinion than Stella's in another thread -- Stella, what are your thoughts?

-- Sean Silva

Sean Silva

May 16, 2022, 5:46:12 AM5/16/22
to Geoffrey Martin-Noble, Stella Laurenzo, Mahesh Ravishankar, Robert Suderman, Scott Todd, iree-discuss
Ping, @Stella Laurenzo 

-- Sean Silva

Stella Laurenzo

May 16, 2022, 10:31:42 PM5/16/22
to Sean Silva, Geoffrey Martin-Noble, Mahesh Ravishankar, Robert Suderman, Scott Todd, iree-discuss
We can't solve everything at once. I would rather have a test setup that starts to provide some signal and breaks occasionally/needs to be regenerated when APIs break vs the current situation... where we don't test. In practice, solutions to version skew never get fixed until there is some motivation to do so. Observing the skew ourselves and collecting some data on the impact helps make the case and motivates the investments.

I don't think that putting this on the LLVM bump is the right thing to do -- at least without significant soak time and investments to make it stable. The last time we did this, with tflite, we went for many, many months where the team working on that owned failures. The same is called for here. I would like to see the data to make a better decision, and that involves building it and observing, potentially for an extended period of time before we add blocking activities.

Sean Silva

May 20, 2022, 9:17:52 AM5/20/22
to Stella Laurenzo, Geoffrey Martin-Noble, Mahesh Ravishankar, Robert Suderman, Scott Todd, iree-discuss
On Tue, May 17, 2022 at 4:31 AM Stella Laurenzo <laur...@google.com> wrote:
We can't solve everything at once. I would rather have a test setup that starts to provide some signal and breaks occasionally/needs to be regenerated when APIs break vs the current situation... where we don't test. In practice, solutions to version skew never get fixed until there is some motivation to do so. Observing the skew ourselves and collecting some data on the impact helps make the case and motivates the investments.

I don't think that putting this on the LLVM bump is the right thing to do -- at least without significant soak time and investments to make it stable. The last time we did this, with tflite, we went for many, many months where the team working on that owned failures.

In this context, do you mean that the Torch-MLIR team should own that test and fix it when it breaks?

-- Sean Silva

Stella Laurenzo

May 20, 2022, 11:39:00 AM5/20/22
to Sean Silva, Geoffrey Martin-Noble, Mahesh Ravishankar, Robert Suderman, Scott Todd, iree-discuss
I think so, at least to get initial mileage on it. But my main requirement is that it is not blocking a presubmit until it is stable, and that may involve more work than just setting things up (i.e. there is not a stable op set here -- should that be fixed?).

I suspect this is how it will go: we will set this up and have relatively low-frequency breaks due to API mismatches. We observe that and collectively work to fix it over ~months. This is likely of overall benefit to IREE itself, and the main part of the IREE team would be motivated to help keep it running. In the very initial stages though (i.e. working out scripts, automation, documenting processes, etc.), these things tend to have a lot of sharp edges, and it is traditional for the author to be responsible for keeping them working while the mechanism stabilizes and the knowledge is transferred.