The CPU backend of ArrayFire was implemented as a fallback for the CUDA/OpenCL
backends. That said, we are evaluating ways to improve the performance of
the CPU backend.

I am also curious why you do not want to use the OpenCL backend to perform CPU
operations. We have been discussing how to improve CPU performance, but it looks
like that would mostly be an exercise in repeating the OpenCL implementations, with
extra complexity in the binary. This is not a problem for libraries like Eigen
because they are header-only, so the code is compiled with the user's own compiler
flags for the user's target machine, whereas ArrayFire is a precompiled library. To
support all architectures, we would need to ship AVX, SSE, and non-vectorized
versions of each function in the library and choose among them at runtime, which
requires additional dispatch logic for every function. The OpenCL backend avoids
this because its kernels are compiled at runtime, so they are optimized for the
architecture they actually run on.
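To make the "additional logic for each function" concrete, here is a minimal sketch of per-function runtime dispatch in a precompiled library. It assumes x86-64 and GCC or Clang (`__attribute__((target(...)))` and `__builtin_cpu_supports` are compiler extensions, not standard C++). The function names (`scale_floats`, `scale_avx2`, etc.) are hypothetical and are not ArrayFire's actual code.

```cpp
#include <cstddef>

// Hypothetical library function: out[i] = a * x[i].
// A precompiled library has to ship one copy per instruction set.

// Non-vectorized fallback: safe on any x86-64 CPU.
static void scale_scalar(const float* x, float* out, std::size_t n, float a) {
    for (std::size_t i = 0; i < n; ++i) out[i] = a * x[i];
}

// Same loop; the compiler may use SSE4.2 instructions in this function only.
__attribute__((target("sse4.2")))
static void scale_sse(const float* x, float* out, std::size_t n, float a) {
    for (std::size_t i = 0; i < n; ++i) out[i] = a * x[i];
}

// Same loop; the compiler may use AVX2 instructions in this function only.
__attribute__((target("avx2")))
static void scale_avx2(const float* x, float* out, std::size_t n, float a) {
    for (std::size_t i = 0; i < n; ++i) out[i] = a * x[i];
}

using scale_fn = void (*)(const float*, float*, std::size_t, float);

// The extra dispatch logic: query the CPU once and pick the best variant.
static scale_fn select_scale() {
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2")) return scale_avx2;
    if (__builtin_cpu_supports("sse4.2")) return scale_sse;
    return scale_scalar;
}

// Public entry point: the selection runs once, on first call.
void scale_floats(const float* x, float* out, std::size_t n, float a) {
    static const scale_fn impl = select_scale();
    impl(x, out, n, a);
}
```

Every exported function would need its own set of variants plus a selector like this. A runtime compiler such as an OpenCL driver instead builds a single kernel source for whatever CPU it finds.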
Umar