Today Google Summer of Code finally ends. NDArray is pretty stable now and will be the default implementation used by core.matrix in the upcoming release. It was an epic journey through bugs, unexpected slowness and weird macros, but now I can proudly state that in some cases NDArray can be faster than NumPy by a factor of two: [1].
Well, not always. It's hard to consistently beat highly optimized, vectorized native code on the JVM :) But we have a secret sauce here in Clojure: macros. You can find the code I've benchmarked here: [2]. What's going on? Using macros, we can fully eliminate the cost of allocating intermediate matrices (or the function call machinery, in the case of map-like functions) when doing element-wise operations. Thanks to macros, the code is expanded into a highly optimized loop right inside the user's code, hiding the messy details of NDArray and Clojure's type hinting from the user while exposing the necessary information to the JVM. Moreover, this works on a bunch of NDArrays, too: [3]. More details can be found in NDArray's documentation (courtesy of Marginalia): [4].
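To make the idea concrete, here is a minimal sketch of the technique, not NDArray's actual code. The macro name `emap-into!` and its binding syntax are hypothetical, and plain `double` arrays stand in for NDArray's internal storage. The macro expands an element-wise expression over several arrays into a single type-hinted loop at the call site, so no intermediate array is allocated and no function is called per element.

```clojure
;; Hypothetical macro for illustration only; this is NOT NDArray's API.
;; Plain double arrays are assumed in place of NDArray's storage.
(defmacro emap-into!
  "Evaluates `expr` once per index and stores the result in `out`.
  `bindings` is a vector of symbol/array pairs; inside `expr` each
  symbol refers to the current element of its array."
  [out bindings expr]
  (let [pairs (partition 2 bindings)
        syms  (map first pairs)
        arrs  (map second pairs)
        gsyms (repeatedly (count pairs) #(gensym "arr"))
        o     (gensym "out")
        i     (gensym "i")]
    `(let [~o (doubles ~out)
           ;; casting with `doubles` lets the compiler see double[] and
           ;; emit primitive array access without reflection
           ~@(mapcat (fn [g a] [g `(doubles ~a)]) gsyms arrs)]
       (dotimes [~i (alength ~o)]
         (let [~@(mapcat (fn [s g] [s `(aget ~g ~i)]) syms gsyms)]
           (aset ~o ~i (double ~expr))))
       ~o)))

;; Usage: computes 2*a + b element-wise in one pass, writing into `out`.
(let [a   (double-array [1 2 3])
      b   (double-array [10 20 30])
      out (double-array 3)]
  (vec (emap-into! out [x a y b] (+ (* 2.0 x) y))))
;; => [12.0 24.0 36.0]
```

A function-based version would have to build a temporary array for `(* 2.0 x)` or call a closure for every element. Here the expression is spliced straight into the loop body, which is what gives the JVM a chance to optimize it.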
Overall, this project was a very interesting experience for me. I hope others will find it useful :)
Cheers!
Dmitry