JRuby port of Nmatrix etc.


Pjotr Prins

unread,
Mar 1, 2016, 11:07:25 AM3/1/16
to sciru...@googlegroups.com
I do hope we get some interest in a JRuby port of NMatrix etc., so we
can run natively on the JVM. We could start with a pure Ruby
implementation of what is currently in C.

Please, students: I agree the other projects are very important, but
this one has real general impact, because the fastest Ruby today is
JRuby - and it is only getting faster. Think speedups of 10x and more.

A pure Ruby SciRuby would have a long-term impact.

Pj.

John Woods

unread,
Mar 1, 2016, 11:33:48 AM3/1/16
to sciru...@googlegroups.com
Pjotr, do we know that NMatrix isn't working currently with JRuby? Just curious.

I do think we should have the backend implemented in both Java and C, but I think pure Ruby is going to be really, really slow.

--
You received this message because you are subscribed to the Google Groups "SciRuby Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sciruby-dev...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

John Woods

unread,
Mar 1, 2016, 11:38:56 AM3/1/16
to sciru...@googlegroups.com

Pjotr Prins

unread,
Mar 1, 2016, 12:11:45 PM3/1/16
to sciru...@googlegroups.com
On Tue, Mar 01, 2016 at 04:33:35PM +0000, John Woods wrote:
> I do think we should have the backend implemented in both Java and C, but
> I think pure Ruby is going to be really, really slow.

I meant pure Ruby as a starting premise. Thereafter, point
optimization and multi-threading become feasible. I encourage students
to start with Ruby.

Pj.

John Woods

unread,
Mar 1, 2016, 3:56:25 PM3/1/16
to sciru...@googlegroups.com
It's going to be very difficult to implement dtypes in pure Ruby. Also, some of the storage-type-specific code will be tough to reproduce.

Prasun Anand

unread,
Mar 1, 2016, 5:15:48 PM3/1/16
to SciRuby Development
I would like to know how we can start implementing it. I was trying to build a wrapper over MDArray, but now it seems that re-implementing the current backend of NMatrix in Java is the way to go.

The current NMatrix uses LAPACK, whereas MDArray uses Parallel Colt for multi-threading support. A Java LAPACK[0] is also available.

Can you suggest what the roadmap for this project would look like?

Prasun.

[0] http://icl.cs.utk.edu/f2j/

Pjotr Prins

unread,
Mar 1, 2016, 5:30:14 PM3/1/16
to sciru...@googlegroups.com
On Tue, Mar 01, 2016 at 02:15:47PM -0800, Prasun Anand wrote:
> I would like to know how we can start implementing it. Currently , I was
> trying to build a wrapper over mdarray. But now, it seems that
> re-implementation of the current backend of Nmatrix in java is the way to
> go.
> The current nmatrix uses lapack, however mdarray uses parallel colt for
> multi-thread support. Also Java-lapack[0] is available.
> Can you suggest me how the roadmap for this project would look like ?
> Prasun.
> [0][1]http://icl.cs.utk.edu/f2j/

I am not too concerned about the HOW - and I am sure we are open to
suggestions. In my experience most good things happen by trial and
error, so a road map is a bit too much to ask for.

I like the f2j thing. I am not convinced John is right that we can't
map dtypes to unboxed JVM types.

What do you say?

Pj.

Prasun Anand

unread,
Mar 1, 2016, 6:03:24 PM3/1/16
to SciRuby Development
I don't think it is tough to map dtypes to unboxed JVM types. While going through the MDArray tests, I came across dtypes that have been mapped to JVM types without trouble. MDArray uses the NetCDF API[0] for this purpose; its factory methods are helpful here for JVM storage.

I am not very familiar with this package yet, but it will surely help.

Also, I think we can take a lot of inspiration from MDArray.
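
For illustration only (this is not code from MDArray or the gist), JRuby can already allocate unboxed JVM primitive arrays directly, so a dtype-to-storage mapping might be sketched like this:

require 'java'

# Map NMatrix-style dtype symbols to JVM primitive types (an illustrative
# subset; the real NMatrix dtype list is longer).
DTYPE_TO_JVM = {
  :int32   => Java::int,
  :int64   => Java::long,
  :float32 => Java::float,
  :float64 => Java::double
}

# Allocate an unboxed primitive array on the JVM for the given dtype.
def allocate_store(dtype, size)
  DTYPE_TO_JVM.fetch(dtype)[size].new   # e.g. Java::double[16].new => double[16]
end

store = allocate_store(:float64, 16)
store[0] = 3.14
puts store[0]    # stored and read back as a primitive double, no boxing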

Prasun.

Pjotr Prins

unread,
Mar 2, 2016, 2:35:29 AM3/2/16
to sciru...@googlegroups.com, hea...@headius.com

On Tue, Mar 01, 2016 at 03:03:24PM -0800, Prasun Anand wrote:
> I don't think it is tough to map dtype to unboxed jvm types. While going
> through mdarray tests , i came across dtypes which have been easily mapped
> to jvm types .
> Mdarray uses NetCDF API[0] for this purpose.The factory method is helpful
> here for jvm storage.
> I am not well-familiarized with this package currently. But this will
> surely help.
> Also, I think we can take a lot of inspiration from mdarray.
> Prasun.

Dear Prasun (Cc'ing headius, leader of the JRuby project),

If you want to apply for this project it may be an idea to write a
quick proof of concept using a dtype that can run on the JVM. That
would help convince John and help qualify you for participation in one
go :)

Getting into GSoC is very competitive, and we ask students to do some
work during the application period. This would be a perfect task.

Also, see if we need dtypes at all. At this point I favour a pure Ruby
implementation first - as a baseline - and in the next stage we can use
whatever is available on the JVM to make things fast, including
unboxed dtypes. It would be good to get some benchmarks.
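
For illustration (not code from the thread), a pure Ruby baseline of
this kind is only a few lines and runs unchanged on MRI and JRuby, so
the same script doubles as a benchmark:

require 'benchmark'

# Naive pure-Ruby dense matrix multiply: a is n x k, b is k x m.
def matmul(a, b)
  k = b.length
  m = b[0].length
  a.map do |row|
    (0...m).map do |j|
      s = 0
      k.times { |x| s += row[x] * b[x][j] }
      s
    end
  end
end

n = 200
a = Array.new(n) { Array.new(n) { rand } }
b = Array.new(n) { Array.new(n) { rand } }

puts Benchmark.measure { matmul(a, b) }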

What many people do not realise at this point is that JRuby compiles
to byte code, making it fast, but also does heavy optimizations before
and after generating byte code (well the latter is done by the JVM).
See one talk by headius I attended at FOSDEM:

https://fosdem.org/2016/schedule/event/jruby9000/attachments/slides/1264/export/events/attachments/jruby9000/slides/1264/Optimizing_JRuby.pdf

This is where JRuby shines, and it explains the great speed improvements
that you don't see in MRI. Note that even inline eval gets compiled
and optimized these days; that would have saved me some metaprogramming
tricks in some of my gems. And we get good multi-threading
support thrown in on top. JRuby+Truffle is also very interesting; we may look
at their FFI support. John et al., see

http://chrisseaton.com/rubytruffle/javaone15/guilt-free-ruby-on-the-jvm.pdf

Anyway, people know I like to have GSoC projects that leapfrog the
competition, rather than duplicating what is in SciPy/NumPy etc. JRuby
is now *much* faster than Python, so we should start to think about
leveraging that power.

Pj.

Sameer Deshmukh

unread,
Mar 2, 2016, 2:58:06 AM3/2/16
to SciRuby Development
How about using the MDArray Java source code and incorporating it into NMatrix as a Java backend?

It can be done in a similar fashion to Nokogiri, which has support for both C and Java backends. Any Java-specific optimizations can apply to the Java codebase only, and we'll have similar plugin naming schemes for both the C and Java implementations. That way users need not be aware of whether they're using the C or the Java backend unless they want some advanced functionality, like an interface to a specific library.

Mind you, we'll also need a plugin interface for the Java implementation, similar to the current NMatrix plugin interface.
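
A hypothetical sketch of that Nokogiri-style dispatch (the file names
below are placeholders, not the actual gem layout):

# Pick a backend at require time, the way Nokogiri selects its C or Java
# implementation. 'nmatrix/java_ext' and 'nmatrix/c_ext' are invented names.
if RUBY_PLATFORM == 'java'
  require 'nmatrix/java_ext'   # JRuby: Java backend packaged as a jar
else
  require 'nmatrix/c_ext'      # MRI: compiled C extension (.so/.bundle)
end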

Sameer Deshmukh

unread,
Mar 2, 2016, 5:47:09 AM3/2/16
to SciRuby Development
An added advantage of having an array implementation in both Java and C, with a uniform Ruby interface under the moniker of 'nmatrix', is that we could better engage with the Ruby web community, possibly convincing them to use numerical arrays in web development stacks and database handles.

Rodrigo Botafogo

unread,
Mar 2, 2016, 4:58:20 PM3/2/16
to SciRuby Development
Just my 2c: I know of two very mature Java array and linear algebra libraries: Apache Commons Math and Colt/Parallel Colt. I think it is better to use a mature library than to try to migrate NMatrix (not that there is anything wrong with it!). MDMatrix (part of the MDArray gem) is based on Parallel Colt, which is considered one of the fastest Java linear algebra libraries, although it has not seen much maintenance lately. Commons Math seems to have a larger community behind it. So, my suggestion is to use either of those libraries and add an NMatrix-compatible interface on top. Wrapping MDMatrix is probably faster, since it already provides templates for the integration with Colt.

Sameer Deshmukh

unread,
Mar 6, 2016, 1:16:09 PM3/6/16
to SciRuby Development
JRuby is immensely interested in such projects!


Maybe we can work with them?

Sameer Deshmukh

unread,
Mar 6, 2016, 1:17:51 PM3/6/16
to SciRuby Development
Pjotr, John - how about adding NMatrix to the list of gems there?

Sameer Deshmukh

unread,
Mar 6, 2016, 1:20:41 PM3/6/16
to SciRuby Development
An idea for this project could be to first create similar-looking C and Java extensions, and then use FFI to call them directly from Ruby. I think it would reduce the amount of scaffolding code we would need, and also keep the interfaces sane. I haven't thought this through fully, though - there might be some edge cases where FFI would fail in the case of NMatrix.
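
A hypothetical illustration of that FFI idea, using the ffi gem API
(which JRuby also supports); the library and function names below are
invented, not an existing NMatrix interface:

require 'ffi'

module NMatrixFFI
  extend FFI::Library
  ffi_lib 'nmatrix_core'   # a single shared native core, loadable from MRI or JRuby
  # Hypothetical dense multiply: C = A * B for row-major doubles.
  attach_function :nm_dense_gemm,
                  [:pointer, :pointer, :pointer, :int, :int, :int], :void
end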

John Woods

unread,
Mar 7, 2016, 1:29:58 PM3/7/16
to sciru...@googlegroups.com
By all means, please add NMatrix and list me as a mentor. We should also mention MDArray.


Rodrigo Botafogo

unread,
Mar 8, 2016, 8:48:20 AM3/8/16
to SciRuby Development
Unfortunately I'm too busy to be a mentor, but I'm always available to answer questions by e-mail if anyone wants to do anything with MDArray.

Prasun Anand

unread,
Mar 10, 2016, 6:29:01 PM3/10/16
to SciRuby Development, hea...@headius.com
Hi,
I wrote a Java program to implement NMatrix dtypes (https://gist.github.com/prasunanand/696db86607f64ee9c16a). I used the enumerated data types as in the C core of NMatrix.

Since generics in Java are just syntactic sugar, there is no way to implement NMatrix the way the current codebase does (with templates) by using generics, due to type erasure. For now, generics are used only loosely in my code. In the future we also have the option of using singleton methods if we still hit issues implementing dtypes.

Next step: I will profile the code by calling an arithmetic function on the NMatrix that I built. I also had a look at Apache Commons Math, which Rodrigo suggested. I will try to implement some basic arithmetic functions provided by the library.


Prasun

Pjotr Prins

unread,
Mar 11, 2016, 1:46:54 AM3/11/16
to sciru...@googlegroups.com, hea...@headius.com
Dear Prasun,

On Thu, Mar 10, 2016 at 03:29:01PM -0800, Prasun Anand wrote:
> Hi
> I wrote a java program to implement nmatrix dtypes(
> [1]https://gist.github.com/prasunanand/696db86607f64ee9c16a). I used the
> enumerated data types as in the c core of nmatrix.
> Since, Generics in java are just syntactic sugar. So, there is no scope of
> implementing nmatrix as in the current codebase(use of templates) by using
> generics( due to type erasure).
> Currently , Generics has been loosely implement in my code. Also,in future
> we have an option of using singleton methods if we still get any issues
> implementing dtypes.
> Next step: I would be profiling the code by calling an arithmetic function
> on the nmatrix that I built.  Also, I had a look at Apache Common Maths
> which Rodrigo suggested. I will try to implement some basic arithmetic
> functions provided by the library.

Great start. It would be good to have some functional Ruby code and
run a performance analysis. You can add those results to your
proposal.

Pj.

Charles Nutter

unread,
Mar 11, 2016, 9:35:10 AM3/11/16
to SciRuby Development, hea...@headius.com
Sounds like you guys have the situation in hand. I am willing and able to help with JRuby-specifics: how to write an extension, how to load it and build Ruby code around it, etc. Keep me posted...I'd really like to see this happen!

- Charlie

Prasun Anand

unread,
Mar 15, 2016, 6:21:35 PM3/15/16
to SciRuby Development, hea...@headius.com
Hi,

I have implemented matrix addition, subtraction and multiplication in my NMatrix code using Apache Commons Math. The code is here: https://github.com/prasunanand/jnmatrix

I have also benchmarked the program; the results are here: https://gist.github.com/prasunanand/2ccfa69803dafd995a04

Next step: I will implement decomposition algorithms[0] to solve linear equations.

[0] https://commons.apache.org/proper/commons-math/javadocs/api-3.6/index.html
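
For context, the kind of Commons Math call being benchmarked looks
roughly like this from JRuby (an illustrative sketch, not code from the
jnmatrix repository; the jar name/path is an assumption):

require 'java'
require 'commons-math3-3.6.1.jar'   # assumed jar; adjust to your local path

java_import 'org.apache.commons.math3.linear.Array2DRowRealMatrix'

def random_matrix(n)
  m = Array2DRowRealMatrix.new(n, n)                     # n x n matrix of zeros
  n.times { |i| n.times { |j| m.setEntry(i, j, rand) } }
  m
end

a = random_matrix(100)
b = random_matrix(100)

sum     = a.add(b)        # element-wise addition
diff    = a.subtract(b)   # element-wise subtraction
product = a.multiply(b)   # matrix multiplication
puts product.getEntry(0, 0)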

Regards
Prasun

Pjotr Prins

unread,
Mar 15, 2016, 10:05:51 PM3/15/16
to sciru...@googlegroups.com, hea...@headius.com
Thanks Prasun,

Good job.

So, I gather that matrix addition and subtraction with JRuby is already faster
than Ruby 2.2.1, but multiplication is ~9x slower. Not bad for a first
attempt :).

Rather than moving on to new algorithms I would like you to see if we
can make this multiplication a bit faster. Maybe Headius can give some
advice?

Also, the benchmarks could use a little graph so they are easier to
interpret. You can use these figures in your application! So, put in
the effort.

For me, Prasun's work is proof that we can make a SciRuby port work using
equivalent JVM structures. Does anyone have any comments?

Pj.

On Tue, Mar 15, 2016 at 03:21:35PM -0700, Prasun Anand wrote:
> Hi
> I have implemented matrix addition, subtraction and multiplication in my
> Nmatrix code using Apache Maths Commons. The code is
> here [1]https://github.com/prasunanand/jnmatrix  .
> I have also benchmarked the program and the results are
> here[2] https://gist.github.com/prasunanand/2ccfa69803dafd995a04  .
> Next step: I will be implementing Decomposition Algorithms[0] to solve
> Linear Equations,
> Regards
> Prasun
> [0] [3]https://commons.apache.org/proper/commons-math/javadocs/api-3.6/index.html
>
> On Friday, March 11, 2016 at 12:16:54 PM UTC+5:30, Pjotr Prins wrote:
>
> Dear Prasun,
>
> On Thu, Mar 10, 2016 at 03:29:01PM -0800, Prasun Anand wrote:
> >    Hi
> >    I wrote a java program to implement nmatrix dtypes(
> >    [1][4]https://gist.github.com/prasunanand/696db86607f64ee9c16a). I used the
> >    enumerated data types as in the c core of nmatrix.
> >    Since, Generics in java are just syntactic sugar. So, there is no scope of
> >    implementing nmatrix as in the current codebase(use of templates) by using
> >    generics( due to type erasure).
> >    Currently , Generics has been loosely implement in my code. Also,in future
> >    we have an option of using singleton methods if we still get any issues
> >    implementing dtypes.
> >    Next step: I would be profiling the code by calling an arithmetic function
> >    on the nmatrix that I built.  Also, I had a look at Apache Common Maths
> >    which Rodrigo suggested. I will try to implement some basic arithmetic
> >    functions provided by the library.
>
> Great start. Would be good to have some functional Ruby code and
> execute performance analysis. You can add those results to your
> proposal.
>
> Pj.
>
>
> References
>
> Visible links
> 1. https://github.com/prasunanand/jnmatrix
> 2. https://gist.github.com/prasunanand/2ccfa69803dafd995a04
> 3. https://commons.apache.org/proper/commons-math/javadocs/api-3.6/index.html
> 4. https://gist.github.com/prasunanand/696db86607f64ee9c16a



Francesco Strozzi

unread,
Mar 18, 2016, 3:08:24 PM3/18/16
to sciru...@googlegroups.com, hea...@headius.com
Hi guys,
This looks great and very promising; I'm looking forward to seeing student applications on the topic. The Apache Commons Math and Colt libraries are a good starting point. I agree with Pjotr: let's start by looking at basic operations and improving them where possible before moving on to complex algorithms.

Fra




Prasun Anand

unread,
Mar 20, 2016, 5:35:07 AM3/20/16
to SciRuby Development

Regarding making matrix multiplication a bit faster, I am not sure what the right question to ask @headius is, because it's Commons Math that is slow at matrix multiplication. I have profiled the benchmark program for arithmetic operations on 1,000,000 elements:


E:\jnmatrix\benchmarking>jruby --profile elements1,000,000.rb
Profiling enabled; ^C shutdown will now dump profile info
#<Java::Nmatrix::Jnmatrix:0x6d7b4f4c>
#<Java::Nmatrix::Jnmatrix:0x3108bc>
Benchmarking for Addition
0.327000 0.000000 0.327000 ( 0.327373)
Benchmarking for Subtraction
0.895000 0.000000 0.895000 ( 0.894524)
Benchmarking for Multiplication
9.604000 0.000000 9.604000 ( 9.612699)

main profile results:
Total time: 18.37

total self children calls method
----------------------------------------------------------------
10.84 0.00 10.84 6 Benchmark.measure
9.61 0.00 9.61 1 Object#multiply
9.47 9.47 0.00 1 Java::Nmatrix::Linear#matrixMultiplicationFunction
5.48 0.04 5.44 53 Kernel.require
4.95 0.00 4.95 8 Kernel.require
1.83 0.00 1.83 148 Class#new
1.64 0.67 0.97 2 Array#initialize
1.29 1.29 0.00 6 Java::Nmatrix::Linear#matrixCreation
0.97 0.87 0.10 2000000 Kernel.rand
0.89 0.00 0.89 1 Object#subtract
0.75 0.08 0.68 27 Kernel.load
0.33 0.00 0.33 1 Object#add
0.23 0.01 0.22 58 Array#each
0.22 0.00 0.22 1 Gem::Specification.load_defaults
0.22 0.00 0.22 1 Gem::Specification.each_spec
0.22 0.00 0.22 1 Gem::Specification.each_gemspec
0.17 0.00 0.17 10 Gem::Specification.load
0.13 0.00 0.13 2 Nmatrix#initialize
0.13 0.00 0.13 5 ConcreteJavaProxy.new
0.13 0.00 0.13 5 ConcreteJavaProxy#initialize
0.13 0.13 0.00 2 Java::Nmatrix::Jnmatrix#__jcreate!
0.10 0.10 0.00 2000000 Fixnum#-
0.08 0.08 0.00 10 IO.read
0.08 0.02 0.06 12 Kernel.eval
0.06 0.00 0.06 10 Gem::Specification#initialize
0.04 0.00 0.04 10 Gem.register_default_spec
0.04 0.04 0.00 1 Java::Nmatrix::Linear#matrixAdditionFunction
0.03 0.01 0.03 3 Object#java_import
0.03 0.03 0.00 1 Java::Nmatrix::Linear#matrixSubtractionFunction
0.03 0.00 0.03 27 Array#map
0.03 0.00 0.03 1 JRuby.runtime
0.02 0.00 0.02 30 Gem::Specification#add_development_dependency
0.02 0.01 0.01 33 Gem::Specification#add_dependency_with_type
0.02 0.02 0.00 64 String#=~
0.02 0.02 0.00 12 JavaUtilities.get_proxy_or_package_under_package
0.02 0.02 0.00 970 String#sub
0.02 0.02 0.00 1 Time.now
0.02 0.00 0.02 9 IO#puts
0.02 0.00 0.02 8 Kernel.puts
0.02 0.02 0.00 3 JavaUtilities.get_proxy_class
0.02 0.00 0.01 1 Gem.win_platform?
0.02 0.01 0.00 1 JRuby.reference0
0.01 0.00 0.01 1 Enumerable.find
0.01 0.00 0.01 5 Java::Java.method_missing
0.01 0.01 0.00 15 IO#write
0.01 0.00 0.01 2 Java::OrgJruby::RubyBasicObject#getRuntime
0.01 0.00 0.01 48 Gem::Requirement#initialize
0.01 0.00 0.01 33 Gem::Dependency#initialize
0.01 0.00 0.01 10 Gem::Specification#files
0.01 0.00 0.01 48 Array#map!

I also tried matrix multiplication directly in Java and got similar results (9.6 seconds for matrix multiplication). What do you suggest?

When I initially reached out to Rodrigo about creating an MDArray wrapper for NMatrix, he suggested I implement multi-threading by writing "the parallel code as a block (closure) in java 8. Then use a library such as Aparapi (or any other) that supports GPU to allow matrix operations on the GPU."
Should I try to work on it?

I have the following algorithm in mind for multiplying an M*N matrix A by an N*P matrix B:
start M threads ==> thread i computes Result[i][], i.e. row i of A (1*N) times B (N*P)
return Result
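
A rough sketch of that row-per-thread scheme in plain Ruby (for
illustration only; this is not code from the jnmatrix repository). On
JRuby these are real JVM threads, so they can run on multiple cores:

# a is M x N, b is N x P; thread i fills row i of the M x P result.
def parallel_matmul(a, b)
  rows, inner, cols = a.length, b.length, b[0].length
  result = Array.new(rows)
  (0...rows).map { |i|
    Thread.new do
      row = Array.new(cols, 0)
      cols.times { |j| inner.times { |k| row[j] += a[i][k] * b[k][j] } }
      result[i] = row
    end
  }.each(&:join)
  result
end

a = Array.new(4) { Array.new(3) { rand } }
b = Array.new(3) { Array.new(5) { rand } }
p parallel_matmul(a, b)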

Rodrigo Botafogo

unread,
Mar 21, 2016, 9:47:05 AM3/21/16
to SciRuby Development
Prasun,

I guess that if you are getting similar results for plain Java multiplication and Commons Math, then this is probably a problem with Java itself and not the library. Maybe a Java expert (Headius being one of them) could give hints as to why matrix multiplication in Java is so much slower and have some ideas on how to improve matters.

May I suggest that you also benchmark MDArray/MDMatrix? MDArray uses plain Java multiplication and MDMatrix uses Parallel Colt. With MDMatrix we should see some improvement in multiplication, but limited by the number of cores in your computer. Parallel Colt is known to be one of the fastest linear algebra libraries in Java, so maybe they use some techniques to improve multiplication. If you get speedups of more than the number of cores, that would be a place to look.

Finally, regarding the GPU, I think this is a great project; however, as I told you, I have no idea what kind of roadblocks you'll find along the way. This might not be an ideal project for GSoC, unless you find a knowledgeable mentor who can help you define your work schedule.

Good luck,

Prasun Anand

unread,
Mar 21, 2016, 11:39:52 AM3/21/16
to sciru...@googlegroups.com
Rodrigo,

I was able to improve the performance drastically just by changing the data container provided by Commons Math. Now I can multiply two 1000*1000 matrices in 0.896 seconds, compared to 9.5 seconds, and that is before the JVM has even warmed up :=). I solved the issue. I am working on the JRuby implementation now.
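
The post does not say which container was switched to; for what it is
worth, Commons Math ships several RealMatrix implementations, e.g.
BlockRealMatrix, which stores its data in cache-friendly blocks and is
aimed at large dense matrices. A minimal sketch, assuming the
commons-math3 jar is on the load path:

require 'java'
require 'commons-math3-3.6.1.jar'   # assumed jar name

java_import 'org.apache.commons.math3.linear.BlockRealMatrix'

m = BlockRealMatrix.new(1000, 1000)   # blocked storage instead of one big double[][]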

JRuby is indeed much faster than MRI. I will share the code soon.

When the JVM has warmed up, it is even faster.

I am still working on my timeline for GSOC. 

Thanks and Regards
Prasun



Pjotr Prins

unread,
Mar 21, 2016, 1:14:38 PM3/21/16
to sciru...@googlegroups.com
Cool :)

On Mon, Mar 21, 2016 at 09:09:11PM +0530, Prasun Anand wrote:
> Rodrigo,
> I was able to improve the performance drastically by just changing the
> data container provided by Commons-Math. Now I can multiply 1000*1000
> matrix in 0.896 seconds compared to 9.5 seconds, just when the JVM has not
> even warmed up :=). I solved the issue. I am working on JRuby
> implementation now.
> JRuby is indeed very faster than MRI. I will share the code soon.
> When the JVM has warmed up, it is even faster.
> I am still working on my timeline for GSOC. 
> Thanks and Regards
> Prasun

Prasun Anand

unread,
Mar 21, 2016, 1:22:17 PM3/21/16
to SciRuby Development, hea...@headius.com
Hi,
I have benchmarked NMatrix on Ruby versus JRuby.
Overall, JRuby is faster than MRI even though I have not accounted for the JVM warmup time needed for optimisation; after warmup, the computation speed increases even more for JRuby.

The charts below are plotted from the results.
I have pushed the new code to git.
[benchmark charts attached as inline images]

John Woods

unread,
Mar 21, 2016, 5:00:18 PM3/21/16
to SciRuby Development, hea...@headius.com
Is that also true when using nmatrix-atlas? I'm curious to see how JRuby compares to the ATLAS implementation.


Prasun Anand

unread,
Mar 21, 2016, 6:22:20 PM3/21/16
to SciRuby Development, hea...@headius.com
Hi John,

NMatrix-ATLAS is amazingly faster than my JRuby implementation of NMatrix :).

I have benchmarked it. The results are here: https://gist.github.com/prasunanand/bd9cb2c4e45d625b4bfc

I have plotted the charts (attached below).

JRuby is slightly faster for addition and subtraction, but NMatrix-ATLAS is the clear winner.
I found this article, http://blog.mikiobraun.de/2009/04/some-benchmark-numbers-for-jblas.html, which supports my benchmarking. The only thing that could compete with NMatrix-ATLAS is a backend based on jblas (http://jblas.org/). I can implement it, but I am not sure how long it will take me.
Should I add it to my proposal?
What do you suggest?

Regards
Prasun
[benchmark charts attached as inline images]

Pjotr Prins

unread,
Mar 21, 2016, 7:00:49 PM3/21/16
to sciru...@googlegroups.com, hea...@headius.com
On Mon, Mar 21, 2016 at 03:22:20PM -0700, Prasun Anand wrote:
> Hi John
> NMatrix-ATLAS is amazingly faster than my JRuby Implementation for Nmatrix
> :) .

:)

> I have benchmarked it . The results are
> here [1]https://gist.github.com/prasunanand/bd9cb2c4e45d625b4bfc
> I have plotted the charts
> Jruby is slightly faster in addition and subtraction. But Nmatrix-ATLAS is
> the clear winner. 
> I found this
> article [2]http://blog.mikiobraun.de/2009/04/some-benchmark-numbers-for-jblas.html that
> supports the benchmarking. The only thing that can compete with
> NMatrix-ATLAS is NMatrix JBlas([3]http://jblas.org/). I can implement it
> but I am not sure how long it will take me.
> Should I add it to my proposal?

yes.

Pj.

Prasun Anand

unread,
Mar 21, 2016, 9:10:46 PM3/21/16
to SciRuby Development, hea...@headius.com
I have implemented jblas-backed addition, subtraction and matrix multiplication, and I have benchmarked the results.
NMatrix-JBLAS is close to NMatrix-ATLAS, but NMatrix-ATLAS is still the clear winner.
The charts are attached below.
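
For reference, the jblas calls being benchmarked look roughly like this
from JRuby (an illustrative sketch, not the actual jnmatrix code; the
jar name/version is an assumption):

require 'java'
require 'jblas-1.2.4.jar'   # assumed jar; adjust to your local path

java_import 'org.jblas.DoubleMatrix'

a = DoubleMatrix.rand(1000, 1000)
b = DoubleMatrix.rand(1000, 1000)

sum     = a.add(b)    # element-wise addition
diff    = a.sub(b)    # element-wise subtraction
product = a.mmul(b)   # matrix multiplication, delegated to native BLAS
puts product.get(0, 0)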


Any suggestions on how we can still improve it?

Prasun
[benchmark charts attached as inline images]

John Woods

unread,
Mar 21, 2016, 9:43:29 PM3/21/16
to sciru...@googlegroups.com, hea...@headius.com
Those numbers make me skeptical of some of the others. ATLAS shouldn't change the results for addition/subtraction. It is good to know, though, that matrix multiplication is working well.

What would be the advantages and disadvantages of adding jblas support?


Charles Oliver Nutter

unread,
Mar 21, 2016, 10:54:26 PM3/21/16
to Prasun Anand, SciRuby Development

Hello!

I do not have my machine handy but wanted to offer some suggestions.

I'm not familiar with the algorithms being used here so I can't comment on that. However I can suggest some ways to investigate perf.

Firstly, --profile will only report timings for methods called from Ruby. If you pass --sample you can enable the JVM sampling profiler, which will let us see in more detail where any bottlenecks are. If that doesn't look conclusive we can also try to turn on instrumented (full timing) profiling with a few different JVM flags.

We may also want to see if there's any allocation happening along the hot paths, since that would be a big hit to perf. I wouldn't expect matrix-library authors to make that mistake, though.

I'm not sure I saw warmed up results, and that could change many things too. The fact that JRuby's performance is pretty linear makes me suspect that JIT is not kicking in, and I would generally be surprised to see it kick in for subsecond benchmarks. Pass -Xjit.logging=true to show JRuby JIT and -J-XX:+PrintCompilation to see JVM JIT. We would want to see the hottest methods (e.g. the top items in a sampled profile) getting jitted by JRuby (if Ruby) or JVM (if Java or JRuby-jitted Ruby).

Check GC with -J-XX:+PrintGCDetails.
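
Putting those flags together against the benchmark script from earlier
in the thread, the invocations would look like:

jruby --sample elements1,000,000.rb                  # JVM sampling profiler
jruby -Xjit.logging=true elements1,000,000.rb        # log JRuby JIT activity
jruby -J-XX:+PrintCompilation elements1,000,000.rb   # log JVM JIT compilations
jruby -J-XX:+PrintGCDetails elements1,000,000.rb     # log GC activity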

- Charlie (mobile)

On Mar 22, 2016 06:40, "Prasun Anand" <prasunan...@gmail.com> wrote:
>
> I have implemented jblas for addition, subtraction and matrix multiplication. I have benchmarked the results.
> NMatrix-JBLAS is close to NMatrix-ATLAS but still NMatrix-ATLAS is the clear winner.
> The charts are
>
>

Sameer Deshmukh

unread,
Mar 22, 2016, 4:54:32 AM3/22/16
to SciRuby Development, prasunan...@gmail.com
Prasun,

Can you post the code you used for profiling and plotting?

Prasun Anand

unread,
Mar 22, 2016, 5:15:38 AM3/22/16
to SciRuby Development

https://github.com/prasunanand/jnmatrix
Pjotr Prins

unread,
Mar 22, 2016, 8:55:04 AM3/22/16
to sciru...@googlegroups.com
I think headius has a good point that you should use larger matrices
for testing and see when the JIT kicks in. It needs warming up, i.e.
a routine needs to get called (say) a thousand times. That is also when
performance starts to matter (30 seconds instead of 3 minutes makes a
difference).
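
A minimal sketch of that warm-up-then-measure pattern ('work' is just a
stand-in for whichever jnmatrix routine is being timed):

require 'benchmark'

def work(a, b)
  a.each_index { |i| a[i] * b[i] }   # placeholder for the real matrix operation
end

a = Array.new(100_000) { rand }
b = Array.new(100_000) { rand }

1000.times { work(a, b) }                              # warm up: let the JIT compile the hot path
puts Benchmark.measure { 1000.times { work(a, b) } }   # time only the warmed-up runs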

Pj.

On Tue, Mar 22, 2016 at 02:15:38AM -0700, Prasun Anand wrote:
> https://github.com/prasunanand/jnmatrix
>



Prasun Anand

unread,
Mar 22, 2016, 9:23:31 AM3/22/16
to SciRuby Development, prasunan...@gmail.com
Charles

I agree that we didn't see warmed-up results; I wasn't aware of how to improve JIT warmup.
I will follow these guidelines and share the results.

Thanks for your suggestions and feedback
Prasun

On Tuesday, March 22, 2016 at 8:24:26 AM UTC+5:30, Charles Nutter wrote:

Sameer Deshmukh

unread,
Jun 2, 2016, 4:00:48 PM6/2/16
to SciRuby Development, prasunan...@gmail.com
Hey Prasun,

Can you elaborate on how exactly you plotted those graphs for benchmarking libraries in the above posts? Did you write any code for it or did you have to manually pick values and put them in the graph?