Taking Raul’s statement to heart, I did some more digging to see if I could use Apple’s Metal Performance Shaders (MPS) API within J. As a proof of concept I used the FFI available in the ‘dll’ library.
The first problem with using Apple’s Metal API, for someone who isn’t an Apple developer, is that Apple likes to steer you toward Swift. Some Metal APIs can be used directly from C/C++, but not the MPS API; it is only available to Objective-C. It turns out, though, that if you wrap it in a typical C-style function call, it is accessible to J via J’s FFI facility.
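To illustrate the shape of such a wrapper: a minimal sketch, assuming a flat C entry point of the kind J’s FFI can bind. The function name and signature here are hypothetical, not the actual symbol from the repo; the real Objective-C body would hand these buffers to MPS, while a naive CPU loop stands in here so the interface is runnable anywhere.

```c
#include <stddef.h>

/* Hypothetical C-callable wrapper: C = A (m x k) times B (k x n),
   row-major float32 buffers. In the real wrapper the body would be
   Objective-C driving MPSMatrixMultiplication; this naive loop is a
   stand-in that shows the C ABI a J FFI declaration would bind to. */
void mps_matmul(const float *A, const float *B, float *C,
                size_t m, size_t k, size_t n)
{
    for (size_t i = 0; i < m; i++)
        for (size_t j = 0; j < n; j++) {
            float acc = 0.0f;
            for (size_t p = 0; p < k; p++)
                acc += A[i * k + p] * B[p * n + j];
            C[i * n + j] = acc;
        }
}
```

The point is that J’s ‘dll’ FFI only needs a plain C symbol taking pointers and integers; all the Objective-C machinery can stay hidden behind it.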
There are some limitations. The MPS API only allows a maximum of 8000 x 8000 for matrix multiplication, and it only uses 4-byte floats (aka float32). This differs from J, which uses double as its underlying floating-point representation. The Apple BLAS library also uses doubles, which makes it easier to integrate directly into the J source code, as Elijah Stone has done.
I have a confession to make. Not being a Swift or Objective-C programmer, I took an example Swift program, pulled out the salient features I needed, and asked ChatGPT to translate that into Objective-C. It didn’t take much hand modification from there to get the MPS version of matrix product up and running and tested from a C calling program. I also asked ChatGPT to write some J test code and the FFI declarations, but it failed miserably and I had to do that myself.
If you have an M2 machine and are interested, I made it installable into J via the ‘install’ command from GitHub:
install 'github:tmcguirefl/JmpsMP@main'
load 'tmcguirefl/mpsMP/mpsmp.ijs'
then run: testMP 0
Here are my results, with impressive timing on the MPS multiply:
Regular J matrix product
1018.25 1013.55 999.638 1000.76 1015.25 1032.3 1006.25 1022.34 1019.53 1014.35
Now with Apple Metal Performance Shader API
1018.25 1013.55 999.638 1000.76 1015.25 1032.3 1006.25 1022.33 1019.53 1014.35
Standard J MP : 6.08373 6.71108e7
MPS MP : 0.066262 5.45267e8
Yes, that appears to be roughly a 90x speed-up. So I guess we know why GPU processing is favored for neural network / AI work.
All the code is available on GitHub, so I won’t reproduce it here except for the links:
You might be the best person to work out the details of that kind of
implementation.
--
Raul
On Wed, May 29, 2024 at 8:14 AM Thomas McGuire <
tmcgu...@gmail.com> wrote: