-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
On 05.01.2015 03:34, Mike Anderson wrote:
> Very cool stuff!
Like yours! I wish nurokit were EPL-licensed, then I could have had a
look at it and tried to include it there. Do libraries like this have
high commercial value? I thought the knowledge needed to apply them
and tune them to the problem was still the more expensive part, which
is why I picked the EPL. Also, GPL and EPL don't seem to be
compatible according to my research (which is a pity, because I like
the GPL).
>
> I notice that you are specialising the RBM to a specific matrix
> implementation (Clatrix / JBlas) in the file "jblas.clj". Are you
> sure you need to do that? Part of the beauty of core.matrix is
> that you should be able to write your algorithms in an
> implementation-independent manner and still get the performance
> benefits of the optimised implementation when you need it.
I started with core.matrix operations and clatrix and then tried to
eliminate all overhead showing up in the VisualVM sampling profiler.
In my experiments the protocol overhead in the inner loop of
`cond-prob-batch` was roughly 10%, but I am not sure whether I did
something wrong. In the meantime I have benchmarked my cryptographic
hash function, which also uses protocols, and sometimes I saw
protocol overhead and sometimes not. Maybe it was related to tiered
compilation, with the JIT sometimes not optimizing the call, but that
is only a guess.
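A minimal sketch of how such dispatch overhead could be checked in
isolation, using only clojure.core. All names here (`Scalable`,
`Weight`, `scale-direct`, `time-ns`, `compare-dispatch`) are
hypothetical and not taken from boltzmann. Naive wall-clock timing
like this is sensitive to JIT warm-up and dead-code elimination, so
the numbers are only a rough indication.

```clojure
;; Hypothetical micro-benchmark: protocol call vs. plain function call.
(defprotocol Scalable
  (scale [this k] "Return this value scaled by k."))

(defrecord Weight [w]
  Scalable
  (scale [_ k] (* w k)))

(defn scale-direct
  "Same arithmetic as `scale`, but without protocol dispatch."
  [^Weight x k]
  (* (.w x) k))

(defn time-ns
  "Call the zero-argument fn f n times; return elapsed nanoseconds."
  [f n]
  (let [start (System/nanoTime)]
    (dotimes [_ n] (f))
    (- (System/nanoTime) start)))

(defn compare-dispatch
  "Time n protocol calls and n direct calls after one warm-up pass each."
  [n]
  (let [x (->Weight 0.5)]
    ;; Warm-up passes so the JIT has a chance to compile both paths.
    (time-ns #(scale x 2.0) n)
    (time-ns #(scale-direct x 2.0) n)
    {:protocol-ns (time-ns #(scale x 2.0) n)
     :direct-ns   (time-ns #(scale-direct x 2.0) n)}))
```

Calling `(compare-dispatch 10000000)` a few times in a row shows how
much the two timings fluctuate between runs. A statistics-aware
harness such as Criterium would give more trustworthy numbers than
this sketch.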
If you replace all the jBlas method calls with core.matrix fns in
`cond-prob-batch` (3), which is quick to do, do you see a performance
difference?
I really like core.matrix, and in general sound, lightweight
protocols with separate implementations. Yesterday, for instance, I
found an improved fork of clj-hdf5 that implements some of the
core.matrix protocols, and I fixed it to read double matrices for me.
Potentially this even allows reading tensors bigger than memory in
parts. (1) So I didn't want to inline jBlas, but really use
core.matrix. The internal inlining seemed to be a compromise, since
it still allows using clatrix when dealing with the jblas
implementation (otherwise it would just be a mini-batch
implementation).
For deep learning, the most interesting addition would be GPU support
in core.matrix for typical BLAS routines, e.g. with jCuBLAS or clBLAS,
but I just haven't been able to start work on this yet. You then have
to be very careful not to access some memory, but if this could work
with core.matrix protocols it would be a major win.
For me, boltzmann's CPU version trains at 1/3 to 1/4 of the speed of
Theano on the CPU (which in turn runs at 1/5 of the speed of its GPU
version on my older gaming laptop). Theano uses a symbolic compute
graph modelled after Python's numpy API and then emits it either for
CPU or GPU (including some numeric optimizations). I guess my jBlas
backend is still slower than theirs... netlib-java (2) recommends
building a custom version of ATLAS (for Ubuntu here). Do you have
experience with this? I probably should do this for clatrix (and also
for numpy).
>
> For example, the core.matrix protocols (mmul, add!, add,
> inner-product, transpose etc.) should all call the right Clatrix
> implementation without any noticeable loss of performance (if they
> don't that's an implementation issue in Clatrix... would be good
> to unearth these!).
Indeed! I also missed outer-product, which I have implemented for
jBlas, because at some point it was taking most of the time. It
seemed to fall back on a default core.matrix implementation,
including conversion to the default types.
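To illustrate the idea (a sketch only, not necessarily how it is done
in boltzmann): for two column vectors the outer product is just an
n x 1 by 1 x m matrix multiply, which jBlas can compute on its own
`DoubleMatrix` type without any conversion. The helper names
`col-vec` and `outer` are hypothetical, and the snippet assumes
jblas is on the classpath.

```clojure
(ns outer-sketch
  (:import [org.jblas DoubleMatrix]))

(defn col-vec
  "Build an n x 1 jBlas column vector from a seq of numbers."
  ^DoubleMatrix [xs]
  (DoubleMatrix. ^doubles (double-array xs)))

(defn outer
  "Outer product x * y^T of two column vectors, computed by jBlas."
  ^DoubleMatrix [^DoubleMatrix x ^DoubleMatrix y]
  ;; (n x 1) mmul (1 x m) yields the n x m outer product.
  (.mmul x (.transpose y)))
```

For example, `(outer (col-vec [1 2]) (col-vec [3 4 5]))` yields a
2 x 3 `DoubleMatrix` that stays in jBlas's own representation.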
>
> If the core.matrix API is insufficient to implement what you need,
> then I'd love to get issues / PRs (either for core.matrix or
> Clatrix).
Ok. Maybe you can verify that you don't see a significant performance
difference between the clatrix and the jblas version of
`cond-prob-batch`. Then I can remove the inlining, and the rest should
be patchable into clatrix.
Christian
(1)
https://github.com/ghubber/clj-hdf5/tree/develop
(2)
https://github.com/fommil/netlib-java/
(3)
https://github.com/ghubber/boltzmann/blob/master/src/boltzmann/jblas.clj#L33
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAEBAgAGBQJUqvPJAAoJEKel+aujRZMkNyEIAKTUqhsZOWI+17Fk9eZCkvLj
0geoshCHdX0K1A6ZmIGblFuRZ+DuJ6fiP/cO95IxRDfkXnK+cm/FIAJAXxz+U5PB
4+cl6x9x86C8VLL7MwrTR0woiP8sSHmnrbGpeefoj5KFBD03GQ0g0P/5ONFIeYPc
4MNOvFIja8EiHmFph2rOgBXvM3WWtbaibSeRbYkAVyq7jZ7D8sHcmM43Ycg+S0kM
Gfweuc3dzWAShxK8WKOazBiu7T4IPwHHIMZgNiPYNK5jFV6C1NIUrUpyU+fkWbTB
Tz7gm4l8i0zpX/M7yfa2l8r6Hgq6B0wtGXivSeurXyJnLDyHvWKvbzUJzvMACec=
=3TgC
-----END PGP SIGNATURE-----