troubles encountered using S-Space

Sky Cliff

Jul 3, 2014, 9:09:29 AM
to s-space-users, s-space-research-dev
Dear developers,
My name is Qingwen Liu, and I am a Ph.D. candidate at the University of Science and Technology of China.
Thank you for providing these free and powerful semantic space builders.
I ran into some trouble when using the LSA algorithm. A detailed description follows.

1. Running command:
  • java -Xmx20g -cp ~/java_workspace/S-Space/target/sspace-2.0.4-jar-with-dependencies.jar edu.ucla.sspace.mains.LSAMain -d lsaCorpus.sent.filtered -o binary -p edu.ucla.sspace.matrix.TfIdfTransform -F exclude=msscdStopwords.txt -v -n 300 holmos.sent.filtered.full.sspace
2. Data description:
  • The input file "lsaCorpus.sent.filtered" contains about 1.4 million docs and the vocabulary has 5658 words, i.e. the size of the matrix is 5658 * 1392446 before the SVD process.
3. The exceptions I encountered:

Report bugs to <b...@octave.org> (but first, please read
http://www.octave.org/bugs.html to learn how to write a helpful report).

For information about changes from previous versions, type `news'.


Jul 02, 2014 2:15:19 PM edu.ucla.sspace.matrix.factorization.SingularValueDecompositionOctave factorize
FINE: Octave svds exit status: 0
Jul 02, 2014 2:15:19 PM edu.ucla.sspace.matrix.MatrixIO readDenseTextMatrix
FINE: reading in text matrix with 0 rows and -1 cols
Jul 02, 2014 2:15:19 PM edu.ucla.sspace.matrix.MatrixIO readDenseTextMatrix
FINE: reading in text matrix with 0 rows and -1 cols
Exception in thread "main" java.lang.IllegalArgumentException: dimensions must be positive
at edu.ucla.sspace.matrix.OnDiskMatrix.<init>(OnDiskMatrix.java:98)
at edu.ucla.sspace.matrix.Matrices.create(Matrices.java:216)
at edu.ucla.sspace.matrix.MatrixIO.readDenseTextMatrix(MatrixIO.java:929)
at edu.ucla.sspace.matrix.MatrixIO.readMatrix(MatrixIO.java:795)
at edu.ucla.sspace.matrix.MatrixIO.readMatrix(MatrixIO.java:762)
at edu.ucla.sspace.matrix.factorization.SingularValueDecompositionOctave.factorize(SingularValueDecompositionOctave.java:137)
at edu.ucla.sspace.lsa.LatentSemanticAnalysis.processSpace(LatentSemanticAnalysis.java:457)
at edu.ucla.sspace.mains.GenericMain.processDocumentsAndSpace(GenericMain.java:514)
at edu.ucla.sspace.mains.GenericMain.run(GenericMain.java:443)
at edu.ucla.sspace.mains.LSAMain.main(LSAMain.java:167)

4. My question:
  • With a smaller corpus of 0.8 million docs, the LSA algorithm works fine, but once the corpus size exceeds 1 million docs, the exceptions above appear. What can I do to solve this problem?
  • It seems the SVD algorithm crashes when the input corpus is somewhat large. The default SVD algorithm is Octave, and I want to try another one, but when I use the command option "-S SVDLIBC" (or any option other than OCTAVE), I get the "not installed" exception below. Once I have installed this library, how can I configure it to work with the S-Space package?
Exception in thread "main" java.lang.IllegalStateException: Use of this class requires the SVDLIBC command line program, which is either not installed on this system or is not available to be executed from the command line.  Check that your PATH settings are correct or see http://tedlab.mit.edu/~dr/SVDLIBC/ to download and install the program.
at edu.ucla.sspace.matrix.factorization.SingularValueDecompositionLibC.factorize(SingularValueDecompositionLibC.java:84)
at edu.ucla.sspace.lsa.LatentSemanticAnalysis.processSpace(LatentSemanticAnalysis.java:457)
at edu.ucla.sspace.mains.GenericMain.processDocumentsAndSpace(GenericMain.java:514)
at edu.ucla.sspace.mains.GenericMain.run(GenericMain.java:443)
at edu.ucla.sspace.mains.LSAMain.main(LSAMain.java:167)



emiel...@gmail.com

Aug 7, 2014, 5:17:16 AM
to s-spac...@googlegroups.com, s-space-re...@googlegroups.com, 3163...@qq.com
Did you add the SVDLIBC command-line program to your PATH? See here for how to do this: http://unix.stackexchange.com/questions/26047/how-to-correctly-add-a-path-to-path

David Jurgens

Aug 7, 2014, 10:16:49 AM
to s-spac...@googlegroups.com
Hi Qingwen,

  The problem you're experiencing is due to the matrix size limitation in Octave.  If you try to load a sparse matrix that exceeds these bounds, the loading step fails, giving that error.  We should probably check for this in the code beforehand, since we know the matrix size and could report a more intelligent error. :)

  For SVDLIBC, you don't need to configure it specifically for our software.  All that's needed is that the "svd" program is callable from the command line.  What platform are you running on?  If you're on some Unix variant, the directory containing "svd" just needs to be in your PATH environment variable.
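
  On a bash-like shell, that could look roughly like the sketch below. The directory name ~/SVDLIBC (and the SVDLIBC_DIR variable) is only an assumption about where the "svd" executable ended up after building; substitute wherever it actually lives on your machine.

```shell
# Assumption: the "svd" executable built from SVDLIBC sits in this directory.
# SVDLIBC_DIR is a hypothetical variable name used only for this sketch.
SVDLIBC_DIR="${SVDLIBC_DIR:-$HOME/SVDLIBC}"

# Prepend the directory to PATH unless it is already listed there.
case ":$PATH:" in
  *":$SVDLIBC_DIR:"*) ;;
  *) PATH="$SVDLIBC_DIR:$PATH" ;;
esac
export PATH

# Check whether the shell can now resolve "svd" by name.
if command -v svd >/dev/null 2>&1; then
  echo "svd found at: $(command -v svd)"
else
  echo "svd not found on PATH; check SVDLIBC_DIR" >&2
fi
```

  If the check reports that svd was found, rerunning the same LSAMain command from that shell with "-S SVDLIBC" should no longer hit the IllegalStateException. To make the change permanent, the PATH lines would go into your shell startup file.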

  Thanks,
  David




