matrix multiplication optimization for different GPUs

Evgeny Demidov

May 14, 2019, 5:26:40 AM
to WebGL Dev List
Cedric Nugteren has optimized OpenCL kernels for Nvidia GPUs (although they are 2 times slower than the cuBLAS kernels, which are written in GPU assembler). One can use similar shaders 6,7 in webgl2-compute, WebGPU, Vulkan: https://www.ibiblio.org/e-notes/webgl/gpu/mul/sgemm.htm
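
For readers who have not seen these kernels: a minimal CPU sketch of the blocking (tiling) idea that tuned matrix multiplication kernels generally build on. This is not the code of shaders 6,7. The function names and the `tile` parameter are made up for illustration, and square row-major matrices are assumed.

```typescript
// Reference: plain triple loop over row-major n x n matrices.
function matmulNaive(a: Float32Array, b: Float32Array, n: number): Float32Array {
  const c = new Float32Array(n * n);
  for (let i = 0; i < n; i++) {
    for (let j = 0; j < n; j++) {
      let sum = 0;
      for (let k = 0; k < n; k++) sum += a[i * n + k] * b[k * n + j];
      c[i * n + j] = sum;
    }
  }
  return c;
}

// Same product, computed tile by tile. On a GPU each (i0, j0) tile would map
// to one workgroup and the k0 loop would stage sub-blocks in fast local memory.
// `tile` is the kind of parameter that needs tuning per GPU.
function matmulTiled(a: Float32Array, b: Float32Array, n: number, tile: number): Float32Array {
  const c = new Float32Array(n * n);
  for (let i0 = 0; i0 < n; i0 += tile) {
    for (let j0 = 0; j0 < n; j0 += tile) {
      for (let k0 = 0; k0 < n; k0 += tile) {
        const iMax = Math.min(i0 + tile, n);
        const jMax = Math.min(j0 + tile, n);
        const kMax = Math.min(k0 + tile, n);
        for (let i = i0; i < iMax; i++) {
          for (let j = j0; j < jMax; j++) {
            let sum = c[i * n + j]; // partial result from earlier k-tiles
            for (let k = k0; k < kMax; k++) sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
          }
        }
      }
    }
  }
  return c;
}
```

The tiled version does the same arithmetic in a different order, so with general floating-point inputs the two results can differ in the last bits.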

Unfortunately, Cedric makes an interesting observation in "clBlas on AMD GPUs": these kernels are not optimal for AMD GPUs. In "Inside clBlas" he made one more shader, 11, for the Radeon R9 280X (2550/1960, i.e. about 1.3 times faster than shader 7). On an Nvidia GPU it is 1370/830, i.e. about 1.65 times slower.
Kai Ninomiya wrote to me: "When I ran the sgemm6b shader *in C++ with Dawn*, even ignoring that the input needed to be transposed, I saw pretty good results but they didn't quite reach our shader tuned for my hardware. I never got a chance to tune sgemm6b for my hardware" (maybe he needs to use sgemm7b, which is the fastest on AMD).

As Jiajia wrote, on Intel GPUs shaders 6,7 need parameter tuning with D3D drivers (they are fast with the OpenGL backend). I do not have any numbers (just Cedric's data for OpenCL).

Surprisingly, I get high performance for shaders 6,7 on my small GPUs (GT 710 and AMD A6-5200 APU). I do not have any webgl2-compute test results on modern mainstream GPUs (too numerous :)

I am not sure whether a browser can detect the GPU type, for privacy reasons. All that may be important for ML in WebGL and WebGPU. Maybe a prompt to choose the shader type?

Evgeny

Ken Russell

May 17, 2019, 8:05:25 PM
to WebGL Dev List
It would be great if the GPU, or at least the browser, could expose some parameters - rather than just the GPU device name (OpenGL renderer string) - which would capture all of the information needed to help compute kernels run at peak performance on that architecture.

Perhaps the web community can help by analyzing some of these shaders tuned for various architectures and seeing if there are any commonalities: for example, this GPU prefers its matrices to be passed in column-major order, this one in row-major order, etc.
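
To make the layout question concrete, here is a small sketch of the two storage orders and of the transpose that converts between them (the step Kai mentions having to do on the input). The helper names are hypothetical and not taken from any of the shaders discussed.

```typescript
// Flat-buffer index of element (row, col) in a matrix stored row by row.
function rowMajorIndex(row: number, col: number, cols: number): number {
  return row * cols + col;
}

// Flat-buffer index of element (row, col) in a matrix stored column by column.
function colMajorIndex(row: number, col: number, rows: number): number {
  return col * rows + row;
}

// Transpose a row-major rows x cols matrix. The returned buffer is the
// row-major cols x rows transpose, which is the same sequence of numbers
// as the column-major storage of the original matrix.
function transpose(m: Float32Array, rows: number, cols: number): Float32Array {
  const t = new Float32Array(rows * cols);
  for (let r = 0; r < rows; r++) {
    for (let c = 0; c < cols; c++) {
      t[colMajorIndex(r, c, rows)] = m[rowMajorIndex(r, c, cols)];
    }
  }
  return t;
}
```

A kernel tuned for one order can be fed data in the other by running such a transpose first, at the cost of an extra pass over the matrix.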

Several WebGL implementations support this extension:

which provides the description of the GPU, but only as a string.
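
As a rough sketch of what a page could do with only such a string: classify the vendor by substring and fall back to asking the user when the string is not recognized. The matching rules, the `GpuVendor` type and the shader labels are assumptions for illustration, and the vendor-to-shader mapping only mirrors the observations earlier in this thread. It is not a tested recommendation.

```typescript
type GpuVendor = "nvidia" | "amd" | "intel" | "unknown";

// Guess the vendor from a renderer description string.
// Substring matching is fragile; real strings vary by browser and driver.
function classifyRenderer(renderer: string): GpuVendor {
  const s = renderer.toLowerCase();
  if (s.includes("nvidia") || s.includes("geforce")) return "nvidia";
  if (s.includes("amd") || s.includes("radeon")) return "amd";
  if (s.includes("intel")) return "intel";
  return "unknown";
}

// Hypothetical mapping from vendor to a shader label.
// "prompt" means: show the user a chooser instead of guessing.
function chooseShader(vendor: GpuVendor): "shader7" | "shader11" | "prompt" {
  switch (vendor) {
    case "nvidia":
      return "shader7"; // shaders 6,7 were tuned on Nvidia hardware
    case "amd":
      return "shader11"; // reported faster than shader 7 on the R9 280X
    case "intel":
      return "shader7"; // reported to need parameter tuning on D3D
    default:
      return "prompt";
  }
}
```

For example, `chooseShader(classifyRenderer("AMD Radeon R9 280X"))` gives `"shader11"`, while an unrecognized string gives `"prompt"`.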

-Ken

