Hello everyone,
* Switch various in-tree tests/benchmarks over to cuda2 and address any issues that come up
* Once cuda2 is stable, delete cuda and rename cuda2 to cuda
I'll post updates on the above as things progress.
cuda2 improves on many aspects of the current cuda implementation, and we believe it should be a strict improvement, especially regarding async behavior and graph usage. It's needed for the long-term direction of IREE, and having a solid foundation now is important. Hopefully this won't be too disruptive, but please let us know if you have any questions or run into issues. :)