RFC: Create a new iree-org/iree-test-suites repository

Scott Todd

Aug 7, 2024, 6:02:45 PM8/7/24
to iree-discuss
Overview

I propose creating a new repository in iree-org, tentatively named "iree-test-suites", and then migrating parts of existing test suites into it from their current homes (detailed under "Specific test suites" below).

Organizing these test suites into a standalone repository will draw a clear line around the core project build system and its unit/integration tests, and establish a common location for test suites to be developed within iree-org.

Organization / scope

I'm imagining that each top-level directory will be a self-contained test suite, possibly with some shared utilities (test runners, environment setup scripts, cache/file management tools, etc.):

iree-test-suites/
  attention/
  convolution/
  matmul/
  onnx-ops/
  onnx-models/
  stablehlo-ops/
  stablehlo-models/
  tensorflow-models/
  tflite-models/

Or we could nest by category:

iree-test-suites/
  frameworks/
    onnx/
      ops/
      models/
    tensorflow/
    tflite/
  generic/
    attention/
    matmul/
    convolution/

Guidelines for test suite definitions

We can define some ground rules that test suites should aspire to follow, but some rule bending and organic growth is expected. Critically, this repository will be disconnected from the core iree-org/iree repository by construction and thus will impose no direct burden on the build system(s) of the core project. Suggested rules:
  1. Input files are sourced from third-party public hosts. Previously, we mirrored files to the iree-model-artifacts GCS bucket for iree-org/iree, and nod-ai/SHARK-TestSuite uses a mix of public and private Azure storage accounts; I want to avoid both of those storage options for public test suites. We may be able to secure some cloud hosting as a member project in the LF AI & Data Foundation, but that would be a shared resource to treat carefully. Where possible, input files should come from upstream test suites.
  2. Persistent test runners may use local caches for large files. Depending on how large the files are (e.g. 70b / 400b parameter LLMs), we could have several different groups of persistent runners with different caches pre-populated.
  3. Test suites can only depend on public IREE APIs offered through release package artifacts (e.g. the `iree-compile` tool bundled with the `iree-compiler` pypi package). Any tools like `iree-e2e-matmul-test` must be built locally in the test suites repository using downstream build systems (e.g. CMake).
  4. Committing generated test files into the repository directly or via Git LFS is encouraged, within reason. See https://docs.github.com/en/repositories/creating-and-managing-repositories/repository-limits and https://docs.github.com/en/repositories/working-with-files/managing-large-files/about-git-large-file-storage for technical limits. If the repository gets unmanageably large, we can create a new one - not a luxury we really have with the core iree-org/iree repo :)
  5. Tests should aim to use common tools like `iree-compile` and `iree-run-module` whenever possible, or at least generate reproducer commands compatible with those tools, so project developers can use the upstream native tools to directly debug test issues, run benchmarks, and profile test models/cases. For example: if significant scripting or model development is needed in Python, that Python should generate artifacts (model.mlir, input.npy, output.npy) that can be processed using native tools. (A minimal sketch follows this list.)
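
To make rules 3 and 5 concrete, here is a minimal sketch of what a self-contained test case could look like, assuming only the release packages are installed (e.g. `pip install iree-compiler iree-runtime`). The file names and exact flags are illustrative, not a final design:

import subprocess

def test_matmul_cpu(tmp_path):
    # Compile a checked-in model.mlir using the release iree-compile tool.
    vmfb = tmp_path / "model.vmfb"
    subprocess.run(
        ["iree-compile", "model.mlir",
         "--iree-hal-target-backends=llvm-cpu", "-o", str(vmfb)],
        check=True,
    )
    # Run the compiled module and compare against a golden output file.
    subprocess.run(
        ["iree-run-module", f"--module={vmfb}", "--device=local-task",
         "--function=main", "--input=@input_0.npy",
         "--expected_output=@output_0.npy"],
        check=True,
    )

A nice property of this shape is that the same two commands double as copy-pasteable reproducers when a test fails.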
The core IREE repository can run tests from the test suite on pull requests, pushes to main, on a nightly schedule, etc. as desired, depending on the size of the test suites and on runner availability. Configuring tests can get complicated when crossing repositories, so this will need some careful design work. The ONNX tests have gone through a few iterations already.

Specific test suites

I expect the precise details here will be reviewed step by step once the repository is created. Here's what I have visibility into now:

ONNX

As first announced on this list (https://groups.google.com/g/iree-discuss/c/-WSup4WZ0Xw/m/6ynRIgGeAAAJ) and documented at https://iree.dev/developers/general/testing-guide/#external-test-suite, we converted the upstream ONNX "node" tests into a pytest project: https://github.com/nod-ai/SHARK-TestSuite/tree/main/iree_tests. Those tests have been running in iree-org/iree as part of https://github.com/iree-org/iree/blob/main/.github/workflows/pkgci_regression_test.yml, using "config files" here: https://github.com/iree-org/iree/tree/main/build_tools/pkgci/external_test_suite.

I would like to lift parts of that directory to this new repository and later draw from the upstream ONNX Model Zoo (https://github.com/onnx/models) as well - see https://github.com/nod-ai/SHARK-TestSuite/issues/275.

TensorFlow/TFLite/StableHLO

We have tests and benchmarks for TensorFlow, TensorFlow Lite, and StableHLO ops and models scattered across a few places right now.
Matmul/convolution/attention

These tests live under https://github.com/iree-org/iree/tree/main/tests/e2e, use binaries from https://github.com/iree-org/iree/tree/main/tools/testing/e2e, and use https://github.com/iree-org/iree/blob/main/build_tools/cmake/iree_e2e_generated_runner_test.cmake (and the matching Bazel function/file).

I'd like to prototype a restructuring here, modeled after how the ONNX tests were set up (a rough sketch follows the list):

* Have generator scripts produce .mlir files and other artifacts
* Check those generated files in to the repository
* Add a test runner (pytest / CTest / Bazel / etc.) that runs test cases derived from those generated files, compiler options, and runtime options
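
As a rough sketch of the first two steps, a generator might stamp out parameter sweeps like this (the shapes, naming scheme, and IR are illustrative, not the final generator):

# Hypothetical generator sketch: emit one checked-in .mlir file per shape.
SHAPES = [(4, 8, 16), (128, 128, 128), (1024, 1024, 1024)]

TEMPLATE = """func.func @matmul_{m}x{n}x{k}_f32(
    %lhs: tensor<{m}x{k}xf32>, %rhs: tensor<{k}x{n}xf32>) -> tensor<{m}x{n}xf32> {{
  %zero = arith.constant 0.0 : f32
  %init = tensor.empty() : tensor<{m}x{n}xf32>
  %fill = linalg.fill ins(%zero : f32) outs(%init : tensor<{m}x{n}xf32>) -> tensor<{m}x{n}xf32>
  %result = linalg.matmul
      ins(%lhs, %rhs : tensor<{m}x{k}xf32>, tensor<{k}x{n}xf32>)
      outs(%fill : tensor<{m}x{n}xf32>) -> tensor<{m}x{n}xf32>
  return %result : tensor<{m}x{n}xf32>
}}
"""

for m, n, k in SHAPES:
    with open(f"matmul_{m}x{n}x{k}_f32.mlir", "w") as f:
        f.write(TEMPLATE.format(m=m, n=n, k=k))

A test runner could then parametrize over the generated files crossed with a matrix of compiler and runtime flag configurations.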

Risks - cross repository changes

Splitting across repositories will make it harder to make atomic changes to code and tests. I think that's generally healthy in this case though - we should be testing the stable APIs of the core project, and this will add friction to changing interfaces, relying on internal compiler flags, or directly authoring unstable IR.

Beyond tests - benchmarks

Once we have an organized test suite, I think we should build benchmarks on top of the tests. I'm not sure at this time whether that would be an extra layer on top of the test suite repository or a separate repository altogether.

Implementation plan
  1. Create iree-org/iree-test-suites repository with essentials (README.md, LICENSE)
  2. Lift ONNX test suite from https://github.com/nod-ai/SHARK-TestSuite/tree/main/iree_tests
  3. Lift matmul/convolution test suites as-is
  4. Prototype refactoring matmul/convolution test suites
I could also do that restructuring in a user repository before moving it into iree-org, if that would be useful.

Scott Todd

Aug 7, 2024, 6:51:02 PM8/7/24
to iree-discuss
Hanhan and Ben asked on Discord about the tests in folders like https://github.com/iree-org/iree/tree/main/tests/e2e/tensor_ops. I had some ideas on how to improve those in https://github.com/iree-org/iree/issues/17868. Most of the ideas there are orthogonal to which repository the tests live in. Those tests use hand-authored MLIR from both upstream and IREE dialects, so cross-repository changes would be tricky.

It is very useful for project developers to have some e2e test suite directly integrated into the core project so they can test their changes locally with minimal extra configuration. However, the current setup of those 'check' tests is tightly integrated into the CMake and Bazel build systems in ways that make them awkward to support in continuous integration workflows and when cross-compiling (see the IREE_HOST_BIN_DIR CMake variable and the iree-test-deps CMake target, for example). Furthermore, our current CMake/CTest setup does not allow for expressive target filtering across compiler+runtime+flag configurations or for marking tests as expected to fail - only passing or disabled. (A rough pytest sketch of the expected-failure part follows.)
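
For contrast, a pytest-based runner can express that directly. Here's a rough sketch, loosely modeled on the external test suite's config files; the file name and field names here are hypothetical:

# Hypothetical conftest.py sketch: mark tests listed in a per-configuration
# JSON file as expected failures instead of disabling them outright.
import json
import pytest

with open("configs/cpu_llvm_task.json") as f:
    CONFIG = json.load(f)

def pytest_collection_modifyitems(config, items):
    expected_failures = set(CONFIG.get("expected_run_failures", []))
    for item in items:
        if item.name in expected_failures:
            # strict=True reports an unexpected pass, so stale entries in
            # the config file get noticed and cleaned up.
            item.add_marker(
                pytest.mark.xfail(strict=True, reason="listed in config file")
            )

Tests marked this way still run on every configuration, so fixes surface automatically instead of staying silently disabled.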

For those tests, a good rule of thumb could be "the dialects in the MLIR repository + IREE's dialects in LinalgExt are tested in-tree, and framework ops like ONNX and StableHLO are tested out-of-tree".

Once we have a dedicated home for test suites and iterate on the ergonomics more, we can hopefully share any process improvements between repos.

Scott Todd

Aug 8, 2024, 3:00:07 PM8/8/24
to iree-discuss
Surya has also been developing a kernel benchmarking suite at https://github.com/nod-ai/rocm-gemm-benchmark/. I could see that project, or at least parts of it, folding into this new test suites repository. Looking at directories like https://github.com/nod-ai/rocm-gemm-benchmark/tree/main/kernels/mlir, that sort of "generate test cases and commit them" approach matches how I'd like to handle these problem/parameter/size/data-type sweeps. Having a common repository to test correctness and build benchmarks would make it easier to share that work with the community and run it across IREE's full support matrix.

Stella Laurenzo

Aug 8, 2024, 3:21:30 PM8/8/24
to Scott Todd, iree-discuss
Big +1. Go for it. I'd like the cost of adding novel test setups to not be dominated by "mkdir" and having a single repo that we can pin for this stuff makes a lot of sense.

Scott Todd

Aug 9, 2024, 2:22:18 PM8/9/24
to iree-discuss
Repository created: https://github.com/iree-org/iree-test-suites. I'll start setting up directories and forking/moving code.