## Overview

I propose creating a new repository in iree-org, tentatively named "iree-test-suites", and then migrating parts of existing test suites into that repository from these sources:
Organizing these test suites into a standalone repository will draw a clear line around the core project's build system and its unit/integration tests, and it will establish a common location for test suites to be developed within iree-org.
## Organization / scope

I'm imagining that each top-level directory will be a self-contained test suite, possibly with some shared utilities (test runners, environment setup scripts, cache/file management tools, etc.):
```
iree-test-suites/
  attention/
  convolution/
  matmul/
  onnx-ops/
  onnx-models/
  stablehlo-ops/
  stablehlo-models/
  tensorflow-models/
  tflite-models/
```
Or we could nest by category:
```
iree-test-suites/
  frameworks/
    onnx/
      ops/
      models/
    tensorflow/
    tflite/
  generic/
    attention/
    matmul/
    convolution/
```
## Guidelines for test suite definitions

We can define some ground rules that test suites should aspire to follow, but some rule bending and organic growth is expected. Critically, this repository will be disconnected from the core iree-org/iree repository by construction, so it will impose no direct burden on the build system(s) of the core project. Suggested rules:
- Input files are sourced from third-party public hosts. We had previously been mirroring files to the iree-model-artifacts GCS bucket in iree-org/iree, and nod-ai/SHARK-TestSuite uses a mix of public and private Azure storage accounts; I want to avoid both of those storage options for public test suites. We may be able to secure some cloud hosting as a member project in the LF AI & Data Foundation, but that will be a shared resource to treat carefully. Possible sources, leaning on upstream test suites as much as possible:
- Persistent test runners may use local caches for large files. Depending on how large the files are (e.g. 70b / 400b parameter LLMs), we could have several different groups of persistent runners with different caches pre-populated.
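To make the caching idea concrete, here is a minimal sketch of the kind of helper a runner could use, assuming we key cache entries by source URL; the function name and layout are hypothetical, not an existing utility:

```python
import hashlib
import shutil
import urllib.request
from pathlib import Path

# Hypothetical helper: fetch a remote test input through a persistent
# local cache so repeated runs on the same runner skip the download.
def fetch_cached(url: str, cache_dir: Path) -> Path:
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Key cache entries by a hash of the URL so files with the same
    # name from different sources never collide.
    key = hashlib.sha256(url.encode()).hexdigest()[:16]
    cached = cache_dir / f"{key}-{Path(url).name}"
    if not cached.exists():
        with urllib.request.urlopen(url) as src, open(cached, "wb") as dst:
            shutil.copyfileobj(src, dst)
    return cached
```

Groups of runners with different pre-populated cache directories would then amount to pointing `cache_dir` at different persistent volumes.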
- Test suites can only depend on public IREE APIs offered through release package artifacts (e.g. the `iree-compile` tool bundled with the `iree-compiler` PyPI package). Any tools like `iree-e2e-matmul-test` must be built locally in the test suites repository using downstream build systems (e.g. CMake).
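One way a runner could honor this rule is to resolve released tools from the environment rather than from a source build. A minimal sketch, assuming the release packages place executables like `iree-compile` on `PATH` (the helper name is hypothetical):

```python
import shutil
from typing import Optional

# Sketch: resolve a released IREE tool from the environment. Test
# suites never build the compiler from source; they only use tools
# shipped in release package artifacts and installed on PATH.
def find_release_tool(name: str) -> Optional[str]:
    """Return the path to a released IREE tool, or None if absent."""
    return shutil.which(name)

# A runner can then skip (rather than fail) when the tool is missing,
# e.g. on a machine without the release packages installed:
#   if find_release_tool("iree-compile") is None:
#       pytest.skip("iree-compiler release package not installed")
```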
- Generated test files are encouraged to be committed into the repository directly or using Git LFS, within reason. See https://docs.github.com/en/repositories/creating-and-managing-repositories/repository-limits and https://docs.github.com/en/repositories/working-with-files/managing-large-files/about-git-large-file-storage for technical limits. If the repository gets unmanageably large, we can create a new one - not a luxury we really have with the core iree-org/iree repo :)
- Tests should aim to use common tools like `iree-compile` and `iree-run-module` whenever possible, or at least generate reproducer commands compatible with those tools, so project developers can use the upstream native tools to directly debug test issues, run benchmarks, and profile test models/cases. For example: if significant scripting or model development is needed in Python, that Python should generate artifacts (model.mlir, input.npy, output.npy) that can be processed using native tools.
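As an illustration of that rule, a Python generator might emit the artifacts plus a ready-made reproducer command line. This is a sketch: the file layout and the specific flags shown are illustrative choices, not an agreed convention:

```python
from pathlib import Path

# Sketch: a generator writes native-tool-consumable artifacts and a
# reproducer command so developers can debug without the Python layer.
# The compile/run flags below are illustrative examples only.
def write_test_case(case_dir: Path) -> str:
    case_dir.mkdir(parents=True, exist_ok=True)
    mlir = """\
func.func @add(%arg0: tensor<4xf32>, %arg1: tensor<4xf32>) -> tensor<4xf32> {
  %0 = arith.addf %arg0, %arg1 : tensor<4xf32>
  return %0 : tensor<4xf32>
}
"""
    (case_dir / "model.mlir").write_text(mlir)
    repro = (
        f"iree-compile {case_dir}/model.mlir -o {case_dir}/model.vmfb "
        "--iree-hal-target-backends=llvm-cpu && "
        f"iree-run-module --module={case_dir}/model.vmfb --function=add "
        '--input="4xf32=1 2 3 4" --input="4xf32=5 6 7 8"'
    )
    (case_dir / "repro.sh").write_text(repro + "\n")
    return repro
```

Anyone triaging a failure can then run `repro.sh` directly with the release tools, with no test harness in the loop.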
The core IREE repository can run tests from the test suite on pull requests, pushes to main, nightly, etc. as desired, depending on the size of the test suites, and depending on runner availability. Configuration of tests can be somewhat complicated when crossing repositories, so this will need some careful design work. The ONNX tests have gone through a few iterations already.
## Specific test suites

I expect the precise details here will be reviewed step by step once the repository is created. Here's what I have visibility into now:
### ONNX

First announced on this list at https://groups.google.com/g/iree-discuss/c/-WSup4WZ0Xw/m/6ynRIgGeAAAJ and documented at https://iree.dev/developers/general/testing-guide/#external-test-suite, we converted the upstream ONNX "node" tests into a pytest project at https://github.com/nod-ai/SHARK-TestSuite/tree/main/iree_tests. Those tests have been running in iree-org/iree as part of https://github.com/iree-org/iree/blob/main/.github/workflows/pkgci_regression_test.yml using "config files" at https://github.com/iree-org/iree/tree/main/build_tools/pkgci/external_test_suite.
I would like to lift parts of that directory into this new repository and later draw from the upstream ONNX Model Zoo (https://github.com/onnx/models) as well - see https://github.com/nod-ai/SHARK-TestSuite/issues/275.
### TensorFlow/TFLite/StableHLO

We have tests and benchmarks for TensorFlow, TensorFlow Lite, and StableHLO ops and models scattered in a few places right now:
### Matmul/convolution/attention

These tests live under https://github.com/iree-org/iree/tree/main/tests/e2e, use binaries from https://github.com/iree-org/iree/tree/main/tools/testing/e2e, and use https://github.com/iree-org/iree/blob/main/build_tools/cmake/iree_e2e_generated_runner_test.cmake (and the matching Bazel function/file).
I'd like to prototype a restructuring here modeled after how the ONNX tests were set up:
* Have generator scripts produce .mlir files and other artifacts
* Check those generated files in to the repository
* Add a test runner (pytest / CTest / Bazel / etc.) that runs test cases derived from those generated files, compiler options, and runtime options
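The third step could look something like the sketch below, assuming a pytest-style runner and a layout where each checked-in case is a directory holding a generated `model.mlir` plus an optional sidecar flags file; the helper name, file names, and schema are hypothetical:

```python
from pathlib import Path

# Sketch: discover checked-in generated test cases so a runner can
# parametrize over them. A "case" is any directory containing a
# model.mlir; compile flags live in an optional sidecar text file.
def collect_test_cases(root: Path) -> list:
    cases = []
    for mlir in sorted(root.rglob("model.mlir")):
        case_dir = mlir.parent
        flags_file = case_dir / "compile_flags.txt"
        cases.append({
            "name": str(case_dir.relative_to(root)),
            "mlir": mlir,
            "compile_flags": (flags_file.read_text().split()
                              if flags_file.exists() else []),
        })
    return cases

# With pytest, the suite could then do something like:
#   @pytest.mark.parametrize("case", collect_test_cases(SUITE_ROOT),
#                            ids=lambda c: c["name"])
#   def test_compile_and_run(case): ...
```

Keeping discovery separate from execution like this would let the same checked-in artifacts feed pytest, CTest, or Bazel runners without regenerating anything.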
## Risks - cross-repository changes
Splitting across repositories will make it harder to make atomic changes to code and tests. I think that's generally healthy in this case though - we should be testing the stable APIs of the core project, and this will add friction to changing interfaces, relying on internal compiler flags, or directly authoring unstable IR.
## Beyond tests - benchmarks

Once we have an organized test suite, I think we should build benchmarks on top of the tests. I'm not sure at this time whether that would be an extra layer on top of the test suite repository or a separate repository altogether.
## Implementation plan

- Create iree-org/iree-test-suites repository with essentials (README.md, LICENSE)
- Lift ONNX test suite from https://github.com/nod-ai/SHARK-TestSuite/tree/main/iree_tests
- Lift matmul/convolution test suites as-is
- Prototype refactoring matmul/convolution test suites
I could also do that restructuring in a user repository before moving it into iree-org, if that would be useful.