Improve GPU computing infrastructure for Ruby.

Prasun Anand

Sep 23, 2017, 4:27:58 AM

to SciRuby Development

Hi,

I am working on ArrayFire examples that solve real problems.

I would also like to announce a new pet project of mine, "RbCUDA". It will provide better performance and control over GPU hardware.

Repo: https://github.com/prasunanand/rbcuda

The main objectives of RbCUDA are:

1. Map all of CUDA into Ruby

2. Ready-made on-GPU linear algebra, reduction, and scan using the cuBLAS, cuMath, and cuSolver libraries.

3. Random number generation using cuRAND

4. Near-zero wrapping overhead.

5. CUDA profiler for Ruby.

In the near future:

6. Fast Fourier transforms (cuFFT)

7. Parallel primitives and data structures (Thrust)

8. Image processing (NVIDIA Performance Primitives Library).
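For readers unfamiliar with the terms in objective 2, here is a minimal CPU-only sketch of what "reduction" and "scan" compute. It uses plain Ruby arrays and is not RbCUDA code; the variable names are illustrative only.

```ruby
values = [3, 1, 4, 1, 5]

# Reduction: fold every element into a single value (here, a sum).
total = values.reduce(0) { |acc, x| acc + x }

# Inclusive scan (prefix sum): element i holds the sum of values[0..i].
running = 0
inclusive = values.map { |x| running += x }

# Exclusive scan: element i holds the sum of values[0...i].
exclusive = [0] + inclusive[0...-1]

puts "reduction:      #{total}"
puts "inclusive scan: #{inclusive.inspect}"
puts "exclusive scan: #{exclusive.inspect}"
```

A GPU library would compute the same results in parallel; the sketch only pins down the expected output.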

This project will help us lay the foundations for a dynamic neural network framework and other libraries in pure Ruby.

You can expect a blog series on RbCUDA and ArrayFire examples in the coming weeks :) .

Regards,

Prasun

Prasun Anand

Sep 29, 2017, 4:38:42 PM

to SciRuby Development

Hi,

I have been able to run matrix multiplication on RbCUDA successfully.

It's 24 times faster than ArrayFire :) .

Benchmark code:

https://github.com/prasunanand/rbcuda/blob/master/examples/matmul.rb

The syntax is very Ruby-like.

An interesting blog post coming soon.
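For anyone who wants a CPU baseline to compare against, here is a rough timing harness. It is a sketch, not the linked matmul.rb: it uses only Ruby's standard `benchmark` and `matrix` libraries, and the helper name `average_time` and the 64x64 size are assumptions chosen for illustration.

```ruby
require 'benchmark'
require 'matrix'

# Average wall-clock seconds per call over `runs` repetitions,
# after `warmup` untimed calls. (Hypothetical helper, not part of RbCUDA.)
def average_time(runs: 100, warmup: 5)
  warmup.times { yield }
  total = Benchmark.realtime { runs.times { yield } }
  total / runs
end

n = 64 # illustrative size only
a = Matrix.build(n, n) { rand }
b = Matrix.build(n, n) { rand }

cpu_seconds = average_time(runs: 10) { a * b }
puts format('CPU Matrix#* (%dx%d): %.6fs per call', n, n, cpu_seconds)
```

Averaging over several runs matters when a single call takes only microseconds, because one measurement is dominated by timer noise.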

Regards,

Prasun

Kenta Murata

Oct 1, 2017, 11:02:24 PM

to SciRuby Development

What is the reason why ArrayFire is 24x slower than RbCUDA?

Regards,
Kenta Murata


Prasun Anand

Oct 2, 2017, 1:52:27 AM

to SciRuby Development

Hi,

The matrix multiplication takes 0.000017s with RbCUDA on my machine.

ArrayFire takes 0.000424s. The plain C code takes 0.000013s for the same calculation.

The time difference between RbCUDA and the plain C code is due to IO.

ArrayFire is slower than RbCUDA because it has additional checkpoints in its code, which slow it down.

Hence, RbCUDA has an overhead of only 0.000004s over plain C, making it a highly efficient maths library for Ruby.
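The arithmetic behind these figures can be checked with a few lines of Ruby. The three timings are the ones quoted above; the variable names are only for illustration.

```ruby
rbcuda_s    = 0.000017 # RbCUDA, seconds
arrayfire_s = 0.000424 # ArrayFire, seconds
plain_c_s   = 0.000013 # plain C, seconds

# Wrapping overhead of RbCUDA relative to plain C.
overhead_s = rbcuda_s - plain_c_s

# How many times longer ArrayFire takes than RbCUDA.
speedup = arrayfire_s / rbcuda_s

puts format('overhead: %.6fs', overhead_s) # about 0.000004s
puts format('speedup:  %.1fx', speedup)    # about 24.9x
```

The ratio comes out at roughly 24.9, which matches the "24 times faster" figure quoted earlier in the thread.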

Regards,

Prasun

Pjotr Prins

Oct 2, 2017, 2:05:41 AM

to sciru...@googlegroups.com

Dear Kenta,

On Mon, Oct 02, 2017 at 12:01:40PM +0900, Kenta Murata wrote:
> What is the reason why ArrayFire is 24x slower than RbCUDA?

ArrayFire is an abstraction of array computing. It represents a
generic case even though it contains its own kernels (I think). Much
of the speed gain going straight to CUDA is probably from removing an
interaction layer (and buffers) as well as how the data is organized
and fed to the underlying architecture.

If you think about file systems on top of each other you get the idea.
Like running ext4 inside a VM on top of a Linux box running btrfs. It
slows things down incrementally because they are different approaches.

That is why today's VMs have 'pass-through' file systems and why
containers are interesting (they don't buffer). RbCUDA would be a
pass-through. For those that have CUDA support, obviously. Prasun
should take care always to have both options for users.
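
One way to keep both options open is a small dispatch layer that falls
back to the CPU when CUDA is not present. The following is a minimal
sketch under stated assumptions: that the library loads with
`require 'rbcuda'` (not confirmed in this thread), and that the caller
passes the GPU routine in as a lambda. `MatMul` and `gpu_impl` are
hypothetical names, and no actual RbCUDA call is shown.

```ruby
require 'matrix'

module MatMul
  # Returns :gpu if the (assumed) 'rbcuda' library can be loaded, else :cpu.
  def self.backend
    @backend ||= begin
      require 'rbcuda' # assumption: library name as used in `require`
      :gpu
    rescue LoadError
      :cpu
    end
  end

  # Multiplies two matrices. `gpu_impl` is a hypothetical hook: a callable
  # taking (a, b) that the caller supplies for the GPU path.
  def self.multiply(a, b, gpu_impl: nil)
    if backend == :gpu && gpu_impl
      gpu_impl.call(a, b)
    else
      a * b # pure-Ruby fallback from the standard library
    end
  end
end

a = Matrix[[1, 2], [3, 4]]
b = Matrix[[5, 6], [7, 8]]
product = MatMul.multiply(a, b) # CPU path: Matrix[[19, 22], [43, 50]]
p product
```

Users without CUDA hardware still get a working result, and the
pass-through path is taken only when it is actually available.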

Pj.
