Best SVD algorithm for scalability?

222 views
Skip to first unread message

Tim Lahey

unread,
Oct 5, 2013, 7:33:18 PM10/5/13
to s-space-re...@googlegroups.com
Hi,

Right now, I'm using SVDLIBC but I'm worried that writing to a temp file and calling a command-line program isn't the most scalable. The pure Java options for S-Space right now appear to be, COLT and JAMA (although JAMA isn't available right now since NIST is down because of the shutdown). COLT hasn't been developed since 2004 it looks like (according to their changelog), although there is PColt which is somewhat newer (2010).

According the the Java Matrix Benchmark,

http://code.google.com/p/java-matrix-benchmark/

There's a lot of different libraries and there's at least one more not listed there, la4j,

http://la4j.org

According to the benchmark, it looks like ojAlgo,

http://ojalgo.org

is the best one at the moment (although la4j isn't listed). Are any of the other Java matrix libraries being investigated for use in S-Space? Can PColt (Parallel COLT) be used as a replacement for COLT?

Any direction you could provide would be a great help.

Thanks,

Tim Lahey.

---
Tim Lahey, Ph.D.
Post-Doctoral Fellow
Systems Design Engineering
University of Waterloo

David Jurgens

unread,
Oct 6, 2013, 4:51:49 AM10/6/13
to s-space-re...@googlegroups.com
Hi Tim,

  The big issue with the SVD libraries is whether they support a thin/truncated SVD.   Last time we checked, neither COLT nor JAMA supported this.  The matrix that LSA uses is very sparse and only the first K singular values are needed.  JAMA and COLT will compute all of the singular values (i.e., the full U,S,V decomposition) which runs out of memory quickly.  There used to be a Java port of SVDLIBC named SVDLIBJ, which we used heavily.  However, we discovered that the implementation was flawed and produced incorrect singular values at times, so we had to remove it from our dependencies.  At the moment, SVDLIBC is still the fastest way to compute the type of SVD that LSA requires.

  I definitely share your sentiment that it would be really nice to have a pure Java solution though.  

  Thanks,
  David



--
You received this message because you are subscribed to the Google Groups "Semantic Space Research - Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email to s-space-research...@googlegroups.com.
To post to this group, send email to s-space-re...@googlegroups.com.
Visit this group at http://groups.google.com/group/s-space-research-dev.
For more options, visit https://groups.google.com/groups/opt_out.

Tim Lahey

unread,
Oct 6, 2013, 11:10:09 AM10/6/13
to s-space-re...@googlegroups.com
Hi David,

I've been doing some digging. At the moment, there appear to be two options for sparse SVD with the first K singular values in Java.

Decomposer - https://code.google.com/p/decomposer/

and I think one could use netlib-java to do it as it wraps ARPACK. I'm not sure about the accuracy of either of these solutions, though. MTJ wraps netlib-java, but incompletely as it uses ARPACK only for sparse symmetric matrices.

The guy who wrote Decomposer talked about porting his SVD code to the Apache Commons math library, but I can't see that he's done that yet.

Thanks for the clarification,

Tim.

---
Tim Lahey, Ph.D.
Post-Doctoral Researcher
Systems Design Engineering
University of Waterloo

David Jurgens

unread,
Oct 6, 2013, 1:08:44 PM10/6/13
to s-space-re...@googlegroups.com
We looked at both at one point.  Decomposer seems very promising, but it hasn't been updated in several years and it's not entirely clear how to adapt their code to fit the matrices we use.  
Reply all
Reply to author
Forward
0 new messages