Hello everyone,
We were recently testing some of XLA's parallelism features and ran into some issues running replicated executables across different devices (in our case a GeForce RTX 2060 and a GeForce RTX 2070).
Digging into the source, we noticed that XLA validates that an executable is compatible with a device by just checking the name of the device against the name of the device it was compiled for. We were able to adjust this check and get everything running just fine, but have some follow-on questions.
We understand that an executable compiled for one device would run best on that device, but why does it need to be exactly the same device? Our understanding was that as long as the architectures are the same, the generated executable should run without issue. What are the implications of making this check less restrictive?
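To make the question concrete, here is a rough Python sketch of the two kinds of check we have in mind. It is illustrative only: `DeviceInfo`, `strict_compatible`, and `relaxed_compatible` are hypothetical names, not XLA's actual types or code and not our exact patch. Comparing compute capability is just one assumed way of expressing "same architecture"; both cards here are Turing parts with compute capability 7.5.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DeviceInfo:
    """Hypothetical stand-in for a device description (not an XLA type)."""
    name: str
    compute_capability: Tuple[int, int]  # (major, minor)


def strict_compatible(compiled_for: DeviceInfo, target: DeviceInfo) -> bool:
    # Name-only comparison, as we understand the current behavior.
    return compiled_for.name == target.name


def relaxed_compatible(compiled_for: DeviceInfo, target: DeviceInfo) -> bool:
    # Assumed relaxation: accept any device with the same compute capability.
    return compiled_for.compute_capability == target.compute_capability


rtx_2060 = DeviceInfo("GeForce RTX 2060", (7, 5))
rtx_2070 = DeviceInfo("GeForce RTX 2070", (7, 5))

print(strict_compatible(rtx_2060, rtx_2070))   # names differ
print(relaxed_compatible(rtx_2060, rtx_2070))  # same compute capability
```

The sketch ignores other differences between the two cards, such as core count or memory size, and whether those matter to a compiled executable is part of what we are asking.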