Dear Kenta,
On Mon, Oct 02, 2017 at 12:01:40PM +0900, Kenta Murata wrote:
> What is the reason why ArrayFire is 24x slower than RbCUDA?
ArrayFire is an abstraction of array computing. It represents the
generic case, even though it contains its own kernels (I think). Much
of the speed gain from going straight to CUDA probably comes from
removing an interaction layer (and its buffers), as well as from how
the data is organized and fed to the underlying architecture.
If you think about file systems on top of each other you get the idea.
Like running ext4 inside a VM on top of a Linux box running btrfs. Each
layer slows things down a little more because the two take different
approaches.
That is why today's VMs have 'pass-through' file systems and why
containers are interesting (they share the host kernel, so there is no
extra layer to buffer through). RbCUDA would be a pass-through, for
those that have CUDA support, obviously. Prasun should always take
care to offer users both options.
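To make the layering point a bit more concrete, here is a toy sketch
in Python. It is not ArrayFire or RbCUDA code and it does not touch a
GPU; the names kernel, pass_through and layered are made up for
illustration. It only assumes that each abstraction layer re-buffers
the data once before handing it down, which is my guess at the kind of
overhead involved, not a measurement of either library.

```python
import time


def kernel(buf):
    # Stand-in for the real computation (hypothetical, not a CUDA kernel).
    return sum(buf)


def pass_through(buf):
    # Hands the caller's buffer straight to the kernel: no extra copies.
    return kernel(buf), 0


def layered(buf, layers=3):
    # Assumption: every abstraction layer copies the data into its own
    # buffer before passing it to the layer below.
    copies = 0
    for _ in range(layers):
        buf = list(buf)  # one extra copy per layer
        copies += 1
    return kernel(buf), copies


def timed(fn, *args):
    # Returns the function's result together with the elapsed wall time.
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    data = list(range(1_000_000))
    (r1, c1), t1 = timed(pass_through, data)
    (r2, c2), t2 = timed(layered, data)
    print(f"pass-through: result={r1}, copies={c1}, {t1:.4f}s")
    print(f"layered:      result={r2}, copies={c2}, {t2:.4f}s")
```

Both paths return the same answer; the layered one just pays for the
copies on the way down. Real numbers will depend on the machine, so
treat the timings as a rough indication only.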
Pj.