I have been working with John on improving machine learning packages.
As part of this effort, I rewrote the k-means algorithm in Clustering.jl.
It is now substantially faster (100x to 200x) than before. In a benchmark on my MacBook Pro, it takes 0.5 seconds to cluster 10,000 samples (each of dimension 100) into 50 clusters, about 0.01 seconds per iteration.
Several key modifications make it this fast:
1. Switch from row-major to column-major storage, which is more cache-friendly with respect to Julia's memory layout (Julia arrays are column-major).
2. Use the Distance.jl package to evaluate pairwise distances; internally it uses BLAS-3 routines for speed.
3. Remember which clusters were affected during re-assignment, so as to reduce the computation in the ensuing updates of centers and distances.
4. Reuse memory carefully, which substantially reduces re-allocation of arrays at each iteration.
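To illustrate point 1, here is a small Python/NumPy sketch (not the actual Julia code; the array and its shape are made up for the example) of why storing samples as columns of a column-major matrix helps: each sample then occupies one contiguous run of memory, which matches how Julia lays out its arrays and makes per-sample scans cache-friendly.

```python
import numpy as np

# Column-major (Fortran-order) storage with samples as columns: each
# sample is then a contiguous block of memory, which is the layout Julia
# uses natively. Scanning one sample touches consecutive addresses.
d, n = 100, 10000
X = np.asfortranarray(np.random.rand(d, n))  # d x n, samples are columns

# A single sample (one column) is a contiguous slice of memory:
col = X[:, 0]
assert col.flags['C_CONTIGUOUS']  # stride-1 access over one sample
```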
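The BLAS-3 idea behind point 2 can be sketched as follows (a Python/NumPy illustration of the underlying identity, not the Distance.jl code; `pairwise_sqdist` is a hypothetical name). Squared Euclidean distances decompose as ||x - c||^2 = ||x||^2 - 2 x'c + ||c||^2, so the dominant cost collapses into a single matrix multiply (GEMM):

```python
import numpy as np

def pairwise_sqdist(X, C):
    """Squared Euclidean distances between the columns of X (d x n)
    and the columns of C (d x k), returned as an n x k matrix.

    Uses the identity ||x - c||^2 = ||x||^2 - 2 x'c + ||c||^2, so the
    dominant cost, X.T @ C, is a single BLAS-3 (GEMM) call instead of
    n*k separate vector-difference computations."""
    x2 = np.sum(X * X, axis=0)                    # ||x||^2, shape (n,)
    c2 = np.sum(C * C, axis=0)                    # ||c||^2, shape (k,)
    D = x2[:, None] - 2.0 * (X.T @ C) + c2[None, :]
    np.maximum(D, 0.0, out=D)                     # clamp round-off negatives
    return D
```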
In terms of functionality, it now supports more options. Refer to the README of Clustering.jl for details.
There will be updates to several machine learning packages (e.g. Clustering.jl, kNN.jl, Classification.jl, SVM.jl) in the coming weeks.