ANN: boltzmann 0.1.1 - a deep-learning library


Christian Weilbach

Jan 4, 2015, 6:07:11 PM
to clo...@googlegroups.com, numerica...@googlegroups.com

Hi all,

From the README:

This library is supposed to implement Boltzmann Machines, Autoencoders
and related deep-learning techniques. Each implementation should have
both a clean, high-level mathematical formulation of its algorithms
(with core.matrix) and, if possible, an optimized and benchmarked
version of the core routines for production use. This is to make it
easier for new users or potential contributors to learn, to implement
algorithms from papers or other languages, and then to tune them for
performance if needed.

This repository is supposed to cover techniques building on Restricted
Boltzmann Machines, such as Deep Belief Networks, Deep Boltzmann
Machines and temporal extensions thereof, as well as Autoencoders
(which I am not familiar enough with yet). Classical back-propagation
is also often used to fine-tune deep models in a supervised fashion,
so the networks should support it as well.
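
As a taste of the high-level style this refers to, here is a minimal
sketch (not the library's actual API) of an RBM's hidden-unit
conditional probability, p(h=1|v) = sigmoid(W v + b), written against
core.matrix:

(require '[clojure.core.matrix :as m])

(defn sigmoid
  "Element-wise logistic function."
  [x]
  (m/emap #(/ 1.0 (+ 1.0 (Math/exp (- %)))) x))

(defn cond-prob-hidden
  "Probability of each hidden unit being active given a visible vector v.
  `weights` is a (hidden x visible) matrix, `h-bias` a hidden bias vector."
  [weights h-bias v]
  (sigmoid (m/add (m/mmul weights v) h-bias)))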



I haven't built deep belief networks out of it myself yet, but this
should be fairly straightforward. Combining it with the usual linear
classifiers (logistic regression, SVM) at the top layer can also be
explored. If somebody has interest in or experience with implementing
standard backpropagation, go ahead and open a pull request :-).

Christian

Christian Weilbach

Jan 4, 2015, 6:09:19 PM
to numerica...@googlegroups.com

I should add the actual link as well ;-):

https://github.com/ghubber/boltzmann

On 05.01.2015 00:07, Christian Weilbach wrote:
> Hi all,

Shriphani Palakodety

Jan 4, 2015, 6:23:56 PM
to numerica...@googlegroups.com
Very interesting. Two questions:

Have you compared this to deeplearning4j?

Any particular reason for the restriction to energy-based models? Theano, for example, allows me to specify computations ranging from deep nets to RBMs and so on.

Cheers,
Shriphani



Mike Anderson

Jan 4, 2015, 9:34:41 PM
to numerica...@googlegroups.com, clo...@googlegroups.com
Very cool stuff!

I notice that you are specialising the RBM to a specific matrix implementation (Clatrix / JBlas) in the file "jblas.clj". Are you sure you need to do that? Part of the beauty of core.matrix is that you should be able to write your algorithms in an implementation-independent manner and still get the performance benefits of the optimised implementation when you need it.

For example, the core.matrix protocols (mmul, add!, add, inner-product, transpose etc.) should all call the right Clatrix implementation without any noticeable loss of performance (if they don't, that's an implementation issue in Clatrix... it would be good to unearth these!).

If the core.matrix API is insufficient to implement what you need, then I'd love to get issues / PRs (either for core.matrix or Clatrix).
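
As a rough sketch of the implementation-independent style I mean
(hypothetical values, assuming the clatrix dependency is on the
classpath): the algorithm only touches core.matrix functions, and the
backing implementation is chosen separately:

(require '[clojure.core.matrix :as m])

(m/set-current-implementation :clatrix)

(let [w (m/matrix [[0.1 0.2] [0.3 0.4]]) ; created by the current implementation
      v (m/matrix [1.0 2.0])]
  (m/mmul w v)) ; protocol dispatch lands in the Clatrix/JBlas code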

Christian Weilbach

Jan 5, 2015, 3:27:55 PM
to clo...@googlegroups.com, numerica...@googlegroups.com

On 05.01.2015 03:34, Mike Anderson wrote:
> Very cool stuff!

Like yours! I wish nurokit were EPL-licensed, then I could have had a
look at it and tried to include it there. Do libraries like this have
high commercial value? I thought the knowledge to apply them and tune
them to the problem at hand is still the more expensive part, which is
why I picked the EPL. Also, according to my research the GPL and EPL
don't seem to be compatible (which is a pity, because I like the GPL).

>
> I notice that you are specialising the RBM to a specific matrix
> implementation (Clatrix / JBlas) in the file "jblas.clj". Are you
> sure you need to do that? Part of the beauty of core.matrix is
> that you should be able to write your algorithms in an
> implementation-independent manner and still get the performance
> benefits of the optimised implementation when you need it.

I started with core.matrix operations and Clatrix and then tried to
eliminate all the overhead that showed up in the VisualVM sampling
profiler. In my experiments the protocol overhead in the inner loop of
`cond-prob-batch` was something like 10%, but I am not sure whether I
did something wrong. In the meantime I have benchmarked my
cryptographic hash function, which also uses protocols, and sometimes
I saw protocol overhead and sometimes not; maybe it was related to
tiered compilation and the JIT sometimes not optimizing it, but that
is only a guess.

If you replace all the jBlas method calls with core.matrix fns in
`cond-prob-batch` (3), which is quick to do, do you see a performance
difference?
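
For reference, roughly what I mean, as a sketch rather than the actual
code behind (3): the same batched step, sigmoid(batch * W^T + bias),
once through jBlas directly and once only through core.matrix:

(require '[clojure.core.matrix :as m])
(import '[org.jblas DoubleMatrix MatrixFunctions])

;; jBlas only: all arguments are DoubleMatrix instances, `bias` a
;; 1 x hidden row vector that is added to every row of the batch.
(defn cond-prob-batch-jblas [^DoubleMatrix w ^DoubleMatrix bias ^DoubleMatrix batch]
  (let [z (.addRowVector (.mmul batch (.transpose w)) bias)
        e (.add (MatrixFunctions/exp (.neg z)) 1.0)]
    ;; sigmoid: 1 / (1 + exp(-z)), element-wise
    (.div (DoubleMatrix/ones (.getRows e) (.getColumns e)) e)))

;; the same computation expressed only with core.matrix functions;
;; here `bias` is a plain vector of length hidden, broadcast over rows.
(defn cond-prob-batch-cm [w bias batch]
  (m/emap #(/ 1.0 (+ 1.0 (Math/exp (- %))))
          (m/add (m/mmul batch (m/transpose w)) bias)))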

I really like core.matrix, and in general sound, light protocols with
separate implementations. Yesterday, for instance, I found an improved
fork of clj-hdf5 which implements some of the core.matrix protocols,
and I fixed it to read double matrices for me; potentially this even
allows reading tensors bigger than memory in parts. (1) So I didn't
want to inline jBlas, but really use core.matrix. The internal
inlining seemed like an acceptable compromise, since it still allows
using Clatrix when dealing with the jBlas implementation (otherwise it
would just be a mini-batch implementation).

For deep learning the most interesting thing would be GPU support in
core.matrix for the typical BLAS routines, e.g. with jCuBLAS or
clBLAS, but I haven't been able to start work on this yet. You then
have to be very careful about which memory you access, but if this
could work through core.matrix protocols it would be a major win.

boltzmann's CPU version runs for me at 1/3 to 1/4 of the training
speed of Theano (which in turn is about 1/5 of its GPU version on my
older gaming laptop). Theano uses a symbolic compute graph modelled
after Python's numpy API and then emits it either to CPU or GPU
(including some numeric optimizations). I guess my jBlas backend is
still slower than theirs... netlib-java (2) recommends building a
custom version of ATLAS (for Ubuntu in my case); do you have
experience with this? I should probably do this for Clatrix (and also
for numpy).


>
> For example, the core.matrix protocols (mmul, add!, add,
> inner-product, transpose etc.) should all call the right Clatrix
> implementation without any noticeable loss of performance (if they
> don't that's an implementation issue in Clatrix... would be good
> to unearth these!).

Indeed! I also missed outer-product, which I have now implemented for
jBlas, as at some point it was taking most of the time, seemingly
falling back on a default core.matrix implementation including a
conversion to default types.
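
For the two-vector case the trick is just to let BLAS do the work via
a matrix multiply; a rough sketch of the idea (not the exact code that
ended up in boltzmann):

(import '[org.jblas DoubleMatrix])

;; outer product of two column vectors a (n x 1) and b (m x 1): a * b^T,
;; i.e. an (n x 1) times (1 x m) matrix multiplication
(defn outer-product-jblas [^DoubleMatrix a ^DoubleMatrix b]
  (.mmul a (.transpose b)))

;; (outer-product-jblas (DoubleMatrix. (double-array [1 2 3]))
;;                      (DoubleMatrix. (double-array [4 5])))
;; => 3x2 DoubleMatrix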

>
> If the core.matrix API is insufficient to implement what you need,
> then I'd love to get issues / PRs (either for core.matrix or
> Clatrix).

OK. Maybe you can verify that you don't see a significant performance
difference between the Clatrix and the jBlas versions of
cond-prob-batch; then I can remove the inlining, and the rest should
be patchable into Clatrix.

Christian

(1) https://github.com/ghubber/clj-hdf5/tree/develop
(2) https://github.com/fommil/netlib-java/
(3)
https://github.com/ghubber/boltzmann/blob/master/src/boltzmann/jblas.clj#L33



Mike Anderson

Jan 5, 2015, 11:04:54 PM
to numerica...@googlegroups.com, clo...@googlegroups.com
On Tuesday, 6 January 2015 04:27:55 UTC+8, Christian Weilbach wrote:

> On 05.01.2015 03:34, Mike Anderson wrote:
>> Very cool stuff!
>
> Like yours! I wish nurokit were EPL-licensed, then I could have had a
> look at it and tried to include it there. Do libraries like this have
> high commercial value? I thought the knowledge to apply them and tune
> them to the problem at hand is still the more expensive part, which
> is why I picked the EPL. Also, according to my research the GPL and
> EPL don't seem to be compatible (which is a pity, because I like the
> GPL).

I think there isn't much commercial value in the library itself - there are many free libraries for machine learning that work just fine. Nobody with enough skill to use the library is going to pay you for something they can get for free.

The commercial value is all around:
- Building a solution that solves a business problem *using* the library
- Integrating with other applications/services (Clojure shines here because of the JVM ecosystem)
- Professional services / consulting

 

>>
>> I notice that you are specialising the RBM to a specific matrix
>> implementation (Clatrix / JBlas) in the file "jblas.clj". Are you
>> sure you need to do that? Part of the beauty of core.matrix is
>> that you should be able to write your algorithms in an
>> implementation-independent manner and still get the performance
>> benefits of the optimised implementation when you need it.

> I started with core.matrix operations and Clatrix and then tried to
> eliminate all the overhead that showed up in the VisualVM sampling
> profiler. In my experiments the protocol overhead in the inner loop
> of `cond-prob-batch` was something like 10%, but I am not sure
> whether I did something wrong. In the meantime I have benchmarked my
> cryptographic hash function, which also uses protocols, and sometimes
> I saw protocol overhead and sometimes not; maybe it was related to
> tiered compilation and the JIT sometimes not optimizing it, but that
> is only a guess.

10% protocol overhead sounds like you must be doing quite a lot of protocol calls.

The usual trick to minimise this is to ensure that a single protocol call does a lot of work (i.e. work on whole arrays at a time rather than individual elements). If you do that, then the protocol overhead should be negligible.
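
A contrived illustration of the difference (hypothetical helpers, not
from any particular library):

(require '[clojure.core.matrix :as m])

;; one protocol call per element: dispatch cost dominates for small ops
(defn scale-per-element [a factor]
  (m/matrix (for [i (range (m/row-count a))]
              (for [j (range (m/column-count a))]
                (* factor (m/mget a i j))))))

;; a single whole-array protocol call: dispatch cost is amortised away
(defn scale-whole-array [a factor]
  (m/scale a factor))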
 

> If you replace all the jBlas method calls with core.matrix fns in
> `cond-prob-batch` (3), which is quick to do, do you see a performance
> difference?
>
> I really like core.matrix, and in general sound, light protocols with
> separate implementations. Yesterday, for instance, I found an
> improved fork of clj-hdf5 which implements some of the core.matrix
> protocols, and I fixed it to read double matrices for me; potentially
> this even allows reading tensors bigger than memory in parts. (1) So
> I didn't want to inline jBlas, but really use core.matrix. The
> internal inlining seemed like an acceptable compromise, since it
> still allows using Clatrix when dealing with the jBlas implementation
> (otherwise it would just be a mini-batch implementation).

> For deep learning the most interesting thing would be GPU support in
> core.matrix for the typical BLAS routines, e.g. with jCuBLAS or
> clBLAS, but I haven't been able to start work on this yet. You then
> have to be very careful about which memory you access, but if this
> could work through core.matrix protocols it would be a major win.

It should certainly be possible to wrap GPU matrix support in a core.matrix implementation, indeed I think there have been a couple of "proof of concept" attempts already.

I personally have in the back of my mind a GPU-accelerated extension to Vectorz (i.e. GPU-backed subclasses of AMatrix and AVector), using something like jCuBLAS. Then the full core.matrix support would come for free via vectorz-clj. Would possibly be the easiest way to get comprehensive GPU array programming support in Clojure.
 
> boltzmann's CPU version runs for me at 1/3 to 1/4 of the training
> speed of Theano (which in turn is about 1/5 of its GPU version on my
> older gaming laptop). Theano uses a symbolic compute graph modelled
> after Python's numpy API and then emits it either to CPU or GPU
> (including some numeric optimizations). I guess my jBlas backend is
> still slower than theirs... netlib-java (2) recommends building a
> custom version of ATLAS (for Ubuntu in my case); do you have
> experience with this? I should probably do this for Clatrix (and also
> for numpy).

Not really - I generally do pure-JVM stuff (vectorz-clj etc.). 

Would be interested to see how vectorz-clj stacks up against Clatrix / BLAS if you get an opportunity to benchmark this (matrix multiplication is probably worse since BLAS shines there, but most other operations I believe are much faster with vectorz). vectorz-clj has definitely had far more optimisation work than Clatrix.
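
If you do get around to it, something along these lines with criterium
would already be informative (just a sketch, assuming both
implementations are on the classpath):

(require '[clojure.core.matrix :as m]
         '[criterium.core :refer [quick-bench]])

(defn bench-mmul
  "Benchmark an n x n matrix multiplication under the given core.matrix
  implementation keyword, e.g. :vectorz or :clatrix."
  [impl n]
  (m/set-current-implementation impl)
  (let [a (m/matrix (repeatedly n #(repeatedly n rand)))
        b (m/matrix (repeatedly n #(repeatedly n rand)))]
    (quick-bench (m/mmul a b))))

;; (bench-mmul :vectorz 512)
;; (bench-mmul :clatrix 512)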
 


>>
>> For example, the core.matrix protocols (mmul, add!, add,
>> inner-product, transpose etc.) should all call the right Clatrix
>> implementation without any noticeable loss of performance (if they
>> don't that's an implementation issue in Clatrix... would be good
>> to unearth these!).

> Indeed! I also missed outer-product, which I have now implemented for
> jBlas, as at some point it was taking most of the time, seemingly
> falling back on a default core.matrix implementation including a
> conversion to default types.

outer-product is tricky because the results require higher-dimensional arrays - which JBlas sadly doesn't support. outer-product is another operation that I think is much better in vectorz-clj.

white...@polyc0l0r.net

Jan 7, 2015, 2:28:23 PM
to numerica...@googlegroups.com


On Monday, 5 January 2015 00:23:56 UTC+1, Shriphani Palakodety wrote:
> Very interesting. Two questions:
>
> Have you compared this to deeplearning4j?

I had a brief look at deeplearning4j a few months ago. But if frameworks make little sense in Clojure in general, then IMO they make even less sense for machine learning. The Theano examples repository on deeplearning.net, for instance, shares only a little code between examples (in the ones I looked into, only the RBM parts of higher-level networks), and Theano itself is only an optimized numpy-like library (a bit like core.matrix). So from a pure user perspective I don't think frameworks help you integrate machine learning nicely into your application, even if there are easy examples and APIs to get you started (as with enclog, for instance). You can of course cherry-pick features from Java libraries; I have used Clojure NLP wrappers around Java code like that and it worked nicely. In Python you also often have libraries for specific techniques which you can then integrate into some bigger design, but Clojure isn't there yet IMO. Incanter, for instance, still feels too heavy for me. First, some protocols like core.matrix need to be factored out and shared.

The problem with deep learning is that you quickly need to tune the algorithm, either to improve quality or to make it fast, especially since you need a lot of data and a lot of parameters (read: big matrices). And that is where Clojure really shines, IMO. I have worked mostly in Python, but it is no match for Clojure's expressiveness when it comes to math and defining algorithms, combined with the possibility to reach down the stack and use jblas/... directly (though Python has much better libraries). Something like Theano, which still feels alien in Python because it builds a symbolic compute graph, would be natural in Clojure's s-expressions, which can then be optimized numerically as a separate step (instead of opaquely inside the compute pipeline). Building it in Java, like deeplearning4j does, gives you no advantages IMO: Java is too slow for big matrix multiplications (so you need to call jblas/... anyway), its unwieldy syntax gives you no mathematical notation for functions, and it gives you no live coding. Having said all that, you can go ahead and implement/extend boltzmann's very few protocols with deeplearning4j; I would really appreciate improving on them and making it possible to use state-of-the-art implementations, e.g. of backpropagation. I just think pure Clojure implementations of ML algorithms have advantages over Java ones, because they can be hacked on more easily.
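
To make the s-expression point concrete, here is a toy sketch (purely
hypothetical, neither boltzmann nor Theano code): the "program" is
plain Clojure data, so it can be inspected and rewritten before it is
turned into numeric code:

(require '[clojure.core.matrix :as m])

(def graph '(sigmoid (add (mmul W v) b))) ; the compute graph as data

(defn compile-node
  "Naively compile a graph node into a function of an environment map."
  [node]
  (cond
    (symbol? node) (fn [env] (get env node))
    (seq? node) (let [[op & args] node
                      compiled (mapv compile-node args)
                      f ({'add  m/add
                          'mmul m/mmul
                          'sigmoid (fn [x] (m/emap #(/ 1.0 (+ 1.0 (Math/exp (- %)))) x))} op)]
                  (fn [env] (apply f (map #(% env) compiled))))
    :else (constantly node)))

;; ((compile-node graph) {'W (m/matrix [[1 2] [3 4]]) 'v [1 0] 'b [0.1 0.2]})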
 

> Any particular reason for the restriction to energy-based models? Theano, for example, allows me to specify computations ranging from deep nets to RBMs and so on.

Not necessarily, but for the same reason as above: I don't want to cover everything that can be considered "deep" (since that just means multiple layers of models), but rather models which behave like a Boltzmann machine, in general typical neural networks with a sigmoid activation function. I know the algorithms for these a bit and think that common protocols can be factored out for them without killing performance or bloating them with unrelated implementation details. If you think something useful and concrete is excluded by this scope, then I will fix it.
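
As a purely hypothetical sketch of what I mean by such common
protocols (these are not boltzmann's actual protocol names), something
this small would already let a core.matrix-, jBlas- or
deeplearning4j-backed layer plug in:

(defprotocol PStochasticLayer
  (up [layer v]
    "p(h|v): hidden activation probabilities for a batch of visible states v.")
  (down [layer h]
    "p(v|h): visible activation probabilities for a batch of hidden states h.")
  (parameters [layer]
    "The layer's weights and biases, e.g. as a map."))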
 


Cheers,
Christian

Christian Weilbach

Feb 10, 2015, 10:06:29 AM
to clo...@googlegroups.com, numerica...@googlegroups.com

Hey,

On 06.01.2015 05:04, Mike Anderson wrote:
> On Tuesday, 6 January 2015 04:27:55 UTC+8, Christian Weilbach
> wrote:
>>
> On 05.01.2015 03:34, Mike Anderson wrote:
>>>> Very cool stuff!
>
> Like yours! I wish nurokit was EPLed, then I could have had a look
> at it and try to include it there. Have libraries like this high
> commercial value? I thought the knowledge to apply them and tune
> them to the problem is still more expensive, this is why I picked
> EPL. Also GPL and EPL don't seem to be compatible due to my
> recherche (which is a pity, because I like the GPL).
>
>
>> I think there isn't much commercial value in the library itself -
>> there are many free libraries for machine learning that work just
>> fine. Nobody with enough skill to use the library is going to pay
>> you for something they can get for free.
>
>> The commercial value is all around : - Building a solution that
>> solves a business problem *using* the library - Integrating with
>> other applications/services (Clojure shines here because of the
>> JVM ecosystem) - Professional services / consulting

OK, thanks. I'd like to move in this direction.

>
>
>
>
>>>>
>>>> I notice that you are specialising the RBM to a specific
>>>> matrix implementation (Clatrix / JBlas) in the file
>>>> "jblas.clj". Are you sure you need to do that? Part of the
>>>> beauty of core.matrix is that you should be able to write
>>>> your algorithms in an implementation-independent manner and
>>>> still get the performance benefits of the optimised
>>>> implementation when you need it.
>
> I started with core.matrix operations and clatrix and then tried to
> eliminate all overhead showing up in the VisualVM sampling
> profiler. In my experiments the protocol overhead in this inner
> loop in `cond-prob-batch` was something like 10% or so, but I am
> not sure whether I did something wrong. In the mean time I have
> benchmarked my cryptographic hash function, which also uses
> protocols, and sometimes I have seen protocol overhead and
> sometimes not, maybe it was related to tiered compilation and the
> JIT sometimes not optimizing it, but this is only guessing.
>
>
>> 10% protocol overhead sounds like you must be doing quite a lot
>> of protocol calls.
>
>> The usual trick to minimise this is to ensure that a single
>> protocol call does a lot of work (i.e. work on whole arrays at a
>> time rather than individual elements). If you do that, then the
>> protocol overhead should be negligible.

I only do a matrix multiplication and an element-wise calculation of
the sigmoid activation:
https://github.com/ghubber/boltzmann/blob/master/src/boltzmann/jblas.clj#L40
I have not done any inlining without a profiler showing significant
performance benefits, but I can recheck at some point.

Cool. Maybe we could also just wrap the NDArray library of
deeplearning4j; then we could wrap their API and use an industry-level
deep-learning solution, as Shriphani Palakodety suggested. While I
still don't think it is nice to implement machine-learning algorithms
as giant frameworks in Java, and I'd prefer to have them hackable in
Clojure, it makes sense to start from some state of the art. I also
don't see enough drive behind Clojure ML libraries to make them
compete with Java ones at the moment.

OK, but I wanted to use one library without copying matrices between
vectorz-clj and Clatrix if possible. I only need the 2-dimensional
outer-product expansion.


Christian