Sorry I totally forgot to respond to this! I'm +1 on graduating this and on the staged approach you're taking.
First, a high-level point that is perhaps more of a question: are we committing to putting a sustained effort into this as a target for IREE? We've had some cases in the past of prototypy things that were introduced but then bitrotted. As a litmus test, if we did some big migration, would this backend get migrated to work, or would it be disabled to be "fixed at some point"? My impression from the work you've been doing is that we are indeed trying to make this first-class (subject to limitations around weird toolchain stuff), but I just want to double check that we are committed to that here.
I like keeping WASM, at least for testing that we can do on a CPU without a browser (which would be required for the other drivers IIUC). I'd also suggest we start with the emscripten build on postsubmit only. I know it's already had postsubmit CI on Buildkite for a bit now, but I'd like to make sure everything's stable on GitHub Actions before we add it to presubmits. When adding tests, let's make sure to get unit tests running on emscripten before we add any e2e model tests or browser shenanigans.
Overall this LGTM. I think we should have the external drivers conversation eventually, and wouldn't object if we decided to have it with this in particular (especially if dependency stuff starts to get complicated), but I'm fine keeping it in-tree for now.
Regarding on-demand dependencies: can we make these as not-magic as possible? Is this the sort of thing that needs to be pinned to a really specific version, or could it work with an existing install on a dev's device? Do we want to support a range of versions? In the latter cases I'd prefer we use an existing install, pin some specific version on the CI, and give devs a script they can run to install a known-good version. If we do fetch anything as part of the build system, please make sure that it is only fetched once, cleans up after itself, etc.
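To make the "use an existing install, but check it against a known-good range" idea concrete, here is a rough sketch of the kind of check I have in mind. Everything specific in it is an assumption for illustration: the version bounds are made up, the function names are hypothetical, and it assumes the toolchain in question is `emcc` and that its `--version` output contains an `X.Y.Z` string somewhere.

```python
import re
import shutil
import subprocess

# Hypothetical known-good range; real values would match whatever CI pins.
MIN_VERSION = (3, 1, 0)
MAX_VERSION = (3, 1, 99)


def parse_version(text):
    """Return the first X.Y.Z triple found in text as a tuple of ints, or None."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    return tuple(int(g) for g in match.groups()) if match else None


def is_supported(version):
    """True if version is a tuple inside the (assumed) supported range."""
    return version is not None and MIN_VERSION <= version <= MAX_VERSION


def find_installed_version(tool="emcc"):
    """Look for an existing install on PATH and ask it for its version."""
    path = shutil.which(tool)
    if path is None:
        return None
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=False
    )
    return parse_version(result.stdout)


if __name__ == "__main__":
    version = find_installed_version()
    if is_supported(version):
        print(f"Using existing install: {version}")
    else:
        # Nothing is downloaded here; the dev is pointed at the install script.
        print(f"Found {version}; run the install script for a known-good version")
```

The point of the shape is that nothing gets fetched implicitly: the build only inspects what is already on the machine, and installing a known-good version stays an explicit step the dev runs themselves.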
These latter things are all basically implementation details though. If we indeed think the experiment has proved what it needed to, then I'm on board with graduating it in the measured fashion you describe.