> Could you please expand above what it is you are testing? The description says "OpenCL (native driver)" vs "Slang => SPIR-V". Where does ANGLE fit in this?
> I presume the latter (slang) means you are using Vulkan compute shaders, i.e. unrelated to ANGLE.
Exactly, I'm not experimenting with ANGLE.
I'm asking in the ANGLE community as one of your goals is to port any API to Vulkan Compute, so I was wondering whether you had encountered such precision issues.
> Does "OpenCL (native driver)" mean you are testing ANGLE with it's OpenCL backend? We do currently implement OpenCL with two implementations, one that forwards to the native driver (i.e. the OpenCL backend) and one that implements it over Vulkan (i.e. the Vulkan backend). Both are works in progress FWIW. So how are you exactly using ANGLE?
I would be curious to know how ANGLE CL with the OpenCL backend compares with ANGLE CL with the Vulkan Compute backend on NVIDIA hardware.
If they behave like what I'm observing, their results should differ.
Let me explain the exact test I'm running:
I'm running an optimization (gradient descent) made of 4000 sequential kernels, where the result of each kernel is fed to the next one.
What I'm seeing is that:
- VK (Adreno 830) converges at Step 4000; Loss: -0.8004418
- VK (NV RTX 3090) converges at Step 4000; Loss: -0.6021383 <= this is wildly different!
- CPU (Desktop) converges at Step 4000; Loss: -0.79429066
- CL (Adreno 830) Step 4000; Loss: -0.79901075
- CL (NV RTX 3090) Step 4000; Loss: -0.79938555
As the code and the SPIR-V fed to each backend are the same (apart from the CPU; for OpenCL I have the exact same kernels),
the culprit, to me, is the driver. I've read online that the NV driver applies aggressive floating-point optimizations that are allowed by the SPIR-V spec, which would make sense given that the principal use case for Vulkan Compute is real-time graphics (hence performance over accuracy).
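
To illustrate the kind of float optimization I have in mind, here is a minimal sketch of one candidate, multiply-add contraction (FMA). This is an assumption on my part: I don't know which transformation the driver actually applies. The helper names (`f32`, `mul_add_unfused`, `mul_add_fused`) are hypothetical, and float32 is emulated on the CPU by rounding through `struct`.

```python
import struct


def f32(x: float) -> float:
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def mul_add_unfused(a: float, b: float, c: float) -> float:
    """a * b + c with a float32 rounding after the multiply and after the add."""
    return f32(f32(a * b) + c)


def mul_add_fused(a: float, b: float, c: float) -> float:
    """a * b + c with a single rounding at the end, approximating an FMA.

    The product of two float32 values is exact in float64, so only the final
    rounding to float32 matters for the inputs used below.
    """
    return f32(a * b + c)


# Inputs chosen so that a * b falls exactly halfway between two float32 values.
a = f32(1.0 + 2.0 ** -12)
b = f32(1.0 + 2.0 ** -12)
c = -1.0

unfused = mul_add_unfused(a, b, c)
fused = mul_add_fused(a, b, c)

# The two results differ by 2**-24, i.e. in the last bits of the float32 result.
print(unfused, fused, fused - unfused)
```

A single operation only differs in its last bits, but with 4000 kernels each consuming the previous result, I can imagine such differences compounding. Whether this alone explains the gap I'm seeing is exactly what I'm unsure about.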
I tried decorating the SPIR-V ops in several ways, but the driver seems to simply ignore the decorations and the loss keeps converging to -0.6021383.
That said, is this something you've experienced in ANGLE? Could you suggest anything?