Comment on LatentSemanticAnalysis in airhead-research

airhead-...@googlecode.com

Aug 15, 2010, 8:02:49 PM
to s-space-re...@googlegroups.com
Comment by markklein60:

I downloaded lsa.jar, created a corpus, and tried to run it, but got the
following error:


Mark-Kleins-Computer-3:Downloads markklein$ java -jar lsa.jar -d E-3JOYFY-52.txt my-lsa-output.sspace
Aug 15, 2010 7:58:00 PM edu.ucla.sspace.lsa.LatentSemanticAnalysis processSpace
INFO: performing log-entropy transform
Aug 15, 2010 7:58:00 PM edu.ucla.sspace.lsa.LatentSemanticAnalysis processSpace
INFO: reducing to 300 dimensions
java.lang.UnsupportedOperationException: No SVD algorithms are available
    at edu.ucla.sspace.matrix.SVD.svd(SVD.java:446)
    at edu.ucla.sspace.lsa.LatentSemanticAnalysis.processSpace(LatentSemanticAnalysis.java:494)
    at edu.ucla.sspace.mains.GenericMain.run(GenericMain.java:417)
    at edu.ucla.sspace.mains.LSAMain.main(LSAMain.java:147)

What can I do to get it to work?

Mark Klein
MIT Center for Collective Intelligence
m_k...@mit.edu


For more information:
http://code.google.com/p/airhead-research/wiki/LatentSemanticAnalysis

Chad Furman

Aug 15, 2010, 9:43:30 PM
to s-space-re...@googlegroups.com, m_k...@mit.edu
Hey Mark,

I cc'd you in case you don't get this when I post to the list. I
had the exact same problem.

Try:

java -jar lsa.jar -S SVDLIBJ -d E-3JOYFY-52.txt my-lsa-output.sspace

I think SVDLIBJ is bundled with lsa.jar, so you shouldn't have an issue
with that.

Good Luck,
Chad

David Jurgens

Aug 15, 2010, 11:25:38 PM
to s-space-re...@googlegroups.com, m_k...@mit.edu
Hey Mark and Chad,

  The error you're seeing is due to a bug that was fixed in Revision 1050.  The SVDLIBJ code path wasn't present, so LSA wouldn't default to it.  I'll update the .jar file momentarily so it doesn't cause any more problems.  Please let us know if you run into any further issues.

  Thanks,
  David


--
You received this message because you are subscribed to the Google Groups "Semantic Space Research - Development" group.
To post to this group, send email to s-space-re...@googlegroups.com.
To unsubscribe from this group, send email to s-space-research...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/s-space-research-dev?hl=en.


airhead-...@googlecode.com

Aug 18, 2010, 12:06:00 PM
to s-space-re...@googlegroups.com
Comment by sujanaraj.jyothi:

Hi, I get the same error message. What is the next step if this error is
encountered? Thanks. SJ

airhead-...@googlecode.com

Aug 18, 2010, 12:47:30 PM
to s-space-re...@googlegroups.com
Comment by David.Jurgens:

Please email s-space-re...@googlegroups.com and we'll start
debugging from there. I believe the bug reported earlier is fixed in the
trunk and in the current lsa.jar file, so we'll have to see what's causing
your error.

airhead-...@googlecode.com

Sep 29, 2010, 3:18:03 PM
to s-space-re...@googlegroups.com
Comment by sushinoms:

I'm also getting an error message and it's really frustrating me. I've read
this article over and over again and can't seem to find where I went
wrong. Could I get a little help please? Thanks. http://www.cabartutors.com

airhead-...@googlecode.com

Feb 6, 2011, 5:33:50 AM
to s-space-re...@googlegroups.com
Comment by kontakti...@gmail.com:

How long should the conversion from MATLAB to SVDLIBC format take?
My computer has been running for 48 hours already, processing 450k files (2 KB each).

airhead-...@googlecode.com

Feb 7, 2011, 12:45:27 PM
to s-space-re...@googlegroups.com
Comment by David.Ju...@gmail.com:

The conversion process should be fast, even for very large matrices.
However, the SVD step may take some time. 48 hours seems like a very long
time for either one given the file sizes you listed. Could you email our
dev mailing list (s-space-re...@googlegroups.com) with more
information? We should be able to figure out why the program would be
taking so long.

airhead-...@googlecode.com

Feb 14, 2011, 3:08:29 PM
to s-space-re...@googlegroups.com
Comment by kontakti...@gmail.com:

I sent the exception to the dev list, but haven't received an answer
(IndexOutOfBounds in class MatrixIO, line 558).
I was surprised to get it after 2 days of SVD conversion.

Keith Stevens

Feb 14, 2011, 5:44:12 PM
to s-space-re...@googlegroups.com
Hello,

I'm not sure we received this exception report.  Can you resend the full exception list and some details about how this problem occurred?

Thanks!
--Keith

airhead-...@googlecode.com

Mar 8, 2011, 8:04:43 AM
to s-space-re...@googlegroups.com
Comment by elenh_...@hotmail.com:

I have the same problem -- conversion from Matlab double values to SVDLIBC
float values has been running for 2 days already. My dataset contains 260K
documents (1GB in total). Is there any way we can fix that?

airhead-...@googlecode.com

Mar 19, 2011, 11:18:22 AM
to s-space-re...@googlegroups.com
Comment by marjan.s...@gmail.com:

Hi,
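Can we build a document classifier based on the LatentSemanticAnalysis class?
Something similar to:
{{{
// Assumes `documents` is a String[] whose last element is the target document
// and `p` is a java.util.Properties passed to LSA (assumed to be configured
// so that the document space is retained).
LatentSemanticAnalysis lsa = new LatentSemanticAnalysis();
for (String document : documents)
    lsa.processDocument(new BufferedReader(
        new StringReader(document.toLowerCase())));
lsa.processSpace(p);

// Find the document most similar to the target (last) document.
double similarity = Double.NEGATIVE_INFINITY;
int simIndex = -1;
DoubleVector targetDocument = lsa.getDocumentVector(documents.length - 1);
for (int i = 0; i < documents.length - 1; i++) {
    double sim = Similarity.getSimilarity(Similarity.SimType.COSINE,
                                          lsa.getDocumentVector(i),
                                          targetDocument);
    if (sim > similarity) {
        similarity = sim;
        simIndex = i;
    }
}
System.out.println(String.format("Target document:'%s'",
                                 documents[documents.length - 1]));
// Guard against the case where there is nothing to compare against.
if (simIndex >= 0)
    System.out.println(String.format("Similar document found:'%s'",
                                     documents[simIndex]));
}}}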
The problem with the above approach is that I don’t want to recalculate the
Semantic Space all the time. Rather, the Semantic Space should be
calculated for some training set of documents and after that new documents
should be compared with the existing document vectors. Is that possible
with the current code?

airhead-...@googlecode.com

Mar 19, 2011, 8:46:23 PM
to s-space-re...@googlegroups.com
Comment by David.Ju...@gmail.com:

Hi, it's _almost_ possible with the current code, but you'd need to do some
more work. Specifically, the new document you'd like to compare exists in
a different geometric space than the document vectors in the LSI/LSA
space. (During `processSpace`, the SVD step is mapping your set of
documents to a different basis). To compare a new document, you have to
project its document vector into the space where your reference set
exists. This actually isn't too difficult to do. (See the
[http://en.wikipedia.org/wiki/Latent_semantic_analysis#Derivation Wikipedia
page] for the full math details.)

Assuming LSA used the SVD to convert your term-document matrix into `U`,
`S`, and `V` matrices (U is the word space and V is the document space),
you'd need to perform the following multiplication: U^-1^ * S^T^ * d,
where d is your new document vector. Currently, the only unimplemented
part of this procedure in the S-Space Package is finding the matrix inverse
of U. To get your use case to work we'd need to implement matrix inversion
and then add a few bits to LSA to save the required matrices.

Also, note that this projection doesn't change the original LSI space; so
any new vectors that you compute will not affect your training set. If
you're going to be comparing a lot of new documents, it may make sense to
recompute the LSI space periodically. Also, the projection won't include
information for any new terms in the new documents. If your test documents
all start referencing terms that aren't in the training set, then
recomputing the space also seems necessary.

airhead-...@googlecode.com

Mar 20, 2011, 2:28:50 AM
to s-space-re...@googlegroups.com
Comment by marjan.s...@gmail.com:

Hi David,

I understand. The space (SVD) must be recalculated eventually. For a document
classifier this should happen on demand: for example, if we flag some document
as important (as in Google Priority Inbox), the space will be
recalculated for the upcoming classifications. It would be nice if S-Space
had out-of-the-box support for this scenario:

1. Calculate the LSA Space
2. Serialize the LSA Space on disk (or output stream)
3. De-serialize the LSA Space from disk for document classification (Space
read only operation)
4. Classify document(s)
5. Serialize the LSA Space on disk

Steps 2-5 repeat until a new document is flagged as a semantic
“governor”, at which point the process restarts from step 1.

Nice work! I was reading through the code. It is well written and easy to
follow.

Thanks,

Marjan

airhead-...@googlecode.com

Mar 20, 2011, 3:58:41 PM
to s-space-re...@googlegroups.com
Comment by marjan.s...@gmail.com:

I’m just curious: is there any specific reason for not using the existing
class edu.ucla.sspace.common.DocumentVectorBuilder for classification?
All we need is save/load support for the space’s document
vectors, the same way we save/load the term vectors. The vector produced by
the builder (the query) would be compared with each document vector in the
space to find a classification match.

airhead-...@googlecode.com

Mar 21, 2011, 2:21:47 PM
to s-space-re...@googlegroups.com
Comment by David.Ju...@gmail.com:

I actually do have support for serializing the LSA Space to disk in a
branch somewhere. Another person had requested it, so we were looking at
the best way to do it. (There are a few issues with how we treat the
document space, since its matrix might exist on disk.) I'll see if I can
clean those up soon and add them in. This still doesn't resolve the
missing matrix inversion functionality for step 4, but it's a step in that
direction.

As for the second question, it's definitely possible to use the
DocumentVectorBuilder (DVB) for what you're trying to do. However, given
that LSA has an explicit document representation, I think it would make
more sense to use that. Also, I'm not certain we've really tested how well
the DVB works for classification or clustering. The DVB has a very naive
treatment of what it means to represent a document, where it just sums the
vectors of the document's tokens. The summation ignores all aspects of
compositionality, word order, and collocations. If you decide to try using
it, we'd love to hear what you discover.

Also, it's probably easiest to continue this discussion on the developer
mailing list: s-space-re...@googlegroups.com.
Thanks,
David
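
As an illustration of the "just sums the vectors of the document's tokens" idea described above, here is a minimal sketch in plain Java. It does not use the S-Space API: the class name, the `Map<String, double[]>` of term vectors, and the whitespace tokenizer are all assumptions made for the example, not how DocumentVectorBuilder is actually implemented.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class AdditiveDocumentVector {

    /** Sums the vectors of all known tokens; unknown tokens contribute nothing. */
    static double[] sumTokenVectors(String text, Map<String, double[]> termVectors, int dims) {
        double[] doc = new double[dims];
        for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            double[] v = termVectors.get(token);
            if (v == null)
                continue;
            for (int i = 0; i < dims; i++)
                doc[i] += v[i];
        }
        return doc;
    }

    /** Cosine similarity; returns 0 when either vector has zero length. */
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return (normA == 0 || normB == 0) ? 0 : dot / Math.sqrt(normA * normB);
    }

    public static void main(String[] args) {
        // Toy 2-dimensional "semantic space" (hypothetical values).
        Map<String, double[]> termVectors = new HashMap<>();
        termVectors.put("cat", new double[] {1, 0});
        termVectors.put("dog", new double[] {0, 1});

        double[] doc = sumTokenVectors("cat dog cat", termVectors, 2);
        System.out.println(cosine(doc, termVectors.get("cat")));
    }
}
```

Because the sum ignores word order, "cat dog cat" and "cat cat dog" produce the same vector, which is the limitation pointed out above.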

airhead-...@googlecode.com

Apr 7, 2011, 8:20:15 PM
to s-space-re...@googlegroups.com
Comment by jetpropu...@gmail.com:

I get the following error:

java -d64 -Xms2G -Xmx7G -jar lsa-1.3.jar -d all.txt -n 300 my-lsa-output-300.sspace
Apr 7, 2011 6:17:00 PM edu.ucla.sspace.lsa.LatentSemanticAnalysis processSpace
INFO: performing log-entropy transform
Apr 7, 2011 6:17:03 PM edu.ucla.sspace.lsa.LatentSemanticAnalysis processSpace
INFO: reducing to 300 dimensions
Apr 7, 2011 6:17:04 PM edu.ucla.sspace.matrix.MatrixIO matlabToSvdlibcSparseBinary
INFO: Converting from Matlab double values to SVDLIBC float values; possible loss of precision
Apr 7, 2011 6:17:24 PM edu.ucla.sspace.matrix.SVD svd
SEVERE: convertFormat
java.io.EOFException
    at java.io.DataInputStream.readInt(DataInputStream.java:375)
    at edu.ucla.sspace.matrix.SvdlibjDriver.readToSMat(SvdlibjDriver.java:263)
    at edu.ucla.sspace.matrix.SvdlibjDriver.svd(SvdlibjDriver.java:96)
    at edu.ucla.sspace.matrix.SVD.svd(SVD.java:400)
    at edu.ucla.sspace.lsa.LatentSemanticAnalysis.processSpace(LatentSemanticAnalysis.java:528)
    at edu.ucla.sspace.mains.GenericMain.processDocumentsAndSpace(GenericMain.java:496)
    at edu.ucla.sspace.mains.GenericMain.run(GenericMain.java:429)
    at edu.ucla.sspace.mains.LSAMain.main(LSAMain.java:147)
java.lang.UnsupportedOperationException: Unknown algorithm: ANY
    at edu.ucla.sspace.matrix.SVD.svd(SVD.java:409)
    at edu.ucla.sspace.lsa.LatentSemanticAnalysis.processSpace(LatentSemanticAnalysis.java:528)
    at edu.ucla.sspace.mains.GenericMain.processDocumentsAndSpace(GenericMain.java:496)
    at edu.ucla.sspace.mains.GenericMain.run(GenericMain.java:429)
    at edu.ucla.sspace.mains.LSAMain.main(LSAMain.java:147)

Any ideas?

airhead-...@googlecode.com

May 12, 2011, 6:17:06 AM
to s-space-re...@googlegroups.com
Comment by DuoT...@gmail.com:

Is there any way to set retainDocumentSpace to "true" during the launching
of lsa-1.3.jar?

airhead-...@googlecode.com

Aug 12, 2011, 11:27:10 AM
to s-space-re...@googlegroups.com
Comment by vivekvar...@gmail.com:

I have created the lsa space using the lsa.jar . Now I'm able to get the
similarity between any two words, and also the number of neighbors for a
given word using COSINE distance measure. How do I get similarity between a
sentence and a word in the given lsa space, or similarity between document
and a word? Please do the needful.

airhead-...@googlecode.com

Dec 20, 2011, 12:47:11 PM
to s-space-re...@googlegroups.com
Comment by internet...@hotmail.com:

Hello there, I am new to this and never worked on LSA before.

I want to use LSA to compare sentences for semantic similarities and wanted
to know if lsa-1.3.jar can be used for this purpose.

I have downloaded it and seen that it works with documents only, and I
really can't understand what the output means or how to convert it into
a more readable format.
Just to test, I gave it a document with some sentences and specified the
output as a text file. The output was incomprehensible; how do I read it?

Can someone tell me if it is possible to find similarities between sentences
and get how similar they are as output?

airhead-...@googlecode.com

Jan 20, 2012, 3:36:26 AM
to s-space-re...@googlegroups.com
Comment by zongnans...@gmail.com:

Hi, I want to use the library to compute the co-occurrence of terms across
many docs. Which class should I use? Sorry, there are no examples, so it's a
little bit difficult.

airhead-...@googlecode.com

Jan 20, 2012, 4:03:43 AM
to s-space-re...@googlegroups.com
Comment by David.Ju...@gmail.com:

If you want to compute the co-occurrence, you'll want to use the
GenericWordSpace class, which records word co-occurrences as features.
There's not a .jar file for running that class, but we could probably get
one up and running easily if you don't want to write the code. Please
email the development team at s-space-re...@googlegroups.com and we
can talk more there.

Just to note, LSA doesn't directly record word co-occurrences (only
document co-occurrences), so you probably don't want to use the lsa.jar for
what you're describing.
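
To make the distinction concrete, here is a small sketch of window-based word co-occurrence counting in plain Java. It is only an illustration of the idea, not the GenericWordSpace implementation; the class name, the symmetric counting, and the `window` parameter are assumptions chosen for the example.

```java
import java.util.HashMap;
import java.util.Map;

public class CooccurrenceSketch {

    /**
     * Counts how often each pair of tokens appears within `window` positions
     * of each other. Counts are symmetric: (a, b) and (b, a) are both stored.
     */
    static Map<String, Map<String, Integer>> count(String[] tokens, int window) {
        Map<String, Map<String, Integer>> counts = new HashMap<>();
        for (int i = 0; i < tokens.length; i++) {
            int end = Math.min(tokens.length - 1, i + window);
            for (int j = i + 1; j <= end; j++) {
                increment(counts, tokens[i], tokens[j]);
                increment(counts, tokens[j], tokens[i]);
            }
        }
        return counts;
    }

    private static void increment(Map<String, Map<String, Integer>> counts,
                                  String word, String neighbor) {
        counts.computeIfAbsent(word, k -> new HashMap<>())
              .merge(neighbor, 1, Integer::sum);
    }

    public static void main(String[] args) {
        String[] tokens = "the cat sat on the mat".split(" ");
        // Print the co-occurrence counts for "the" with a window of 2 tokens.
        System.out.println(count(tokens, 2).get("the"));
    }
}
```

An LSA term-document matrix, by contrast, only records which documents a word appears in, so two words are related there only through the documents they share.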

airhead-...@googlecode.com

Jan 22, 2012, 4:23:34 PM
to s-space-re...@googlegroups.com
Comment by rive...@gmail.com:

Greetings - In running LSA using the following command:

`java -jar lsa.jar -f lexis_monthly_corpus-filelist.txt -n 300 -t 1 -o text -v ./`

I find the following message in the output:

"FINE: Finished writing matrix in MATLAB_SPARSE format with 60 columns"

I'm wondering how I can get the full 300-dimension space written to the
file, as appears to happen with COALS and BEAGLE. The 60-D matrix appears to
get written regardless of output type for any n > 60.

Any ideas?

airhead-...@googlecode.com

Mar 20, 2012, 11:53:55 AM
to s-space-re...@googlegroups.com
Comment by kleinmar...@gmail.com:

I am trying to have the LSA output go to a text, rather than binary, file.
When I run the following command line:

java -jar lsa-1.3.jar -d E-3O6ICQ-61.txt –o TEXT output.txt


the output is in a binary file named "-o". What am I doing wrong?

Thanks,

Mark

-------------------------------
Mark Klein
Principal Research Scientist
MIT Center for Collective Intelligence
http://cci.mit.edu/klein/

airhead-...@googlecode.com

Apr 20, 2012, 6:14:47 AM
to s-space-re...@googlegroups.com
Comment by elef...@gmail.com:

Hi,
I have installed the S-Space package and tried it out on a small test file,
but I get the following error. Any ideas what's going wrong?
Thanks!


java -Xmx8g -jar ./bin/lsa.jar -d ./bin/test.txt -S SVDLIBJ ../Data/Train/Fr/LSA_out
20-apr-2012 12:10:24 edu.ucla.sspace.util.LoggerUtil info
INFO: performing log-entropy transform
20-apr-2012 12:10:24 edu.ucla.sspace.util.LoggerUtil info
INFO: reducing to 300 dimensions
20-apr-2012 12:10:24 edu.ucla.sspace.matrix.MatrixIO matlabToSvdlibcSparseBinary
INFO: Converting from Matlab double values to SVDLIBC float values; possible loss of precision
java.lang.NumberFormatException: For input string: "1,584963"
    at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1222)
    at java.lang.Double.valueOf(Double.java:475)
    at edu.ucla.sspace.matrix.MatrixIO.matlabToSvdlibcSparseBinary(MatrixIO.java:589)
    at edu.ucla.sspace.matrix.MatrixIO.convertFormat(MatrixIO.java:259)
    at edu.ucla.sspace.matrix.MatrixIO.convertFormat(MatrixIO.java:220)
    at edu.ucla.sspace.matrix.SvdlibjDriver.svd(SvdlibjDriver.java:95)
    at edu.ucla.sspace.matrix.SVD.svd(SVD.java:343)
    at edu.ucla.sspace.lsa.LatentSemanticAnalysis.processSpace(LatentSemanticAnalysis.java:350)
    at edu.ucla.sspace.mains.GenericMain.processDocumentsAndSpace(GenericMain.java:508)
    at edu.ucla.sspace.mains.GenericMain.run(GenericMain.java:432)
    at edu.ucla.sspace.mains.LSAMain.main(LSAMain.java:147)

David Jurgens

Apr 20, 2012, 10:58:36 AM
to s-space-re...@googlegroups.com
It looks like the number is being formatted with a comma as the decimal marker, rather than a period.  The matrix writing should be automatic, so I doubt this is something you're doing specifically.  Do you know if you're using a language locale that would produce the comma?  There could be a bug where the numbers are read back without taking the correct Locale into account, but I'm not sure at the moment.

  Thanks,
  David
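
The locale effect suspected above is easy to reproduce with the standard library alone. The sketch below is not S-Space code; it only shows that a number formatted under a comma-decimal locale (German is used here purely as an example) cannot be read back by `Double.parseDouble`, while locale-neutral formatting round-trips.

```java
import java.util.Locale;

public class LocaleDecimalDemo {

    /** Formats a value with six decimals using the given locale's decimal separator. */
    static String format(Locale locale, double value) {
        return String.format(locale, "%f", value);
    }

    /** Returns true if Double.parseDouble accepts the text. */
    static boolean parses(String text) {
        try {
            Double.parseDouble(text);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String german = format(Locale.GERMANY, 1.584963);  // comma as decimal separator
        String neutral = format(Locale.ROOT, 1.584963);    // period as decimal separator
        System.out.println(german + " parses: " + parses(german));
        System.out.println(neutral + " parses: " + parses(neutral));
    }
}
```

If this is indeed the cause, one workaround to try (an assumption, not something confirmed in this thread) is to start the JVM with an English locale, for example by adding `-Duser.language=en -Duser.country=US` to the `java` command line.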

airhead-...@googlecode.com

Jun 9, 2012, 6:03:58 PM
to s-space-re...@googlegroups.com
Comment by shiva.p....@gmail.com:

I have a theoretical question regarding document space. Here it goes.


Step 1: In edu.ucla.sspace.lsa.LatentSemanticAnalysis
if "retainDocumentSpace" is true then we can recover the new reduced
dimension document vectors.
Let's say I start with 100 documents and do LSA and recover the document
vector for the 0th document using the "retained" document space as has been
implemented in edu.ucla.sspace.lsa.LatentSemanticAnalysis.


Step 2: Now I follow the steps noted in one of the comments for the same
document 0 (with the original term frequency and not the reduced dimension
vector) and the s-space calculated in step 1.

Comment by project member David.Ju...@gmail.com, Mar 19, 2011
...Assuming LSA used the SVD to convert your term-document matrix into U, S,
and V matrices (U is the word space and V is the document space), you'd
need to perform the following multiplication: U^-1^ * S^T^ * d, where d is
your new document vector. Currently, the only unimplemented part of this
procedure in the S-Space Package is finding the matrix inverse of U. To get
your use case to work we'd need to implement matrix inversion and then add a
few bits to LSA to save the required matrices...

So for same 0th document I can calculate the reduced dimension vector as
noted in the comment above.


My question is: it turns out that the document vector in the new reduced
dimension space for document 0 as extracted in step 1 is different from the
one in step 2.

Is this expected?
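
For readers comparing the two vectors, a short derivation may help. It uses only standard SVD algebra and the S^-1 * U^T * d form of the projection that is suggested later in this thread; nothing here is taken from the S-Space source, and how the retained document space is scaled internally is not verified.

```latex
% A is the weighted (e.g. log-entropy transformed) term-document matrix,
% U_k holds the first k left singular vectors, S_k the top k singular values,
% a_j is column j of A, and v_j^{(k)} is the first k entries of row j of V.
\begin{align*}
A &= U S V^{T}, \qquad U_k^{T} U = \begin{bmatrix} I_k & 0 \end{bmatrix} \\
U_k^{T} a_j &= U_k^{T} U S V^{T} e_j = S_k \, v_j^{(k)} \\
S_k^{-1} U_k^{T} a_j &= v_j^{(k)}
\end{align*}
```

So, for a training document, the projection should reproduce that document's row of V only when d is the same weighted column that went into the SVD. A mismatch would be expected if d holds raw term frequencies, if the U^-1^ * S^T^ form is used instead, or if the retained document vectors are stored with a different scaling (for example multiplied by S), which is an assumption worth checking against the code.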

john.l...@gmail.com

Jul 2, 2012, 9:33:55 PM
to s-space-re...@googlegroups.com, codesite...@google.com
Hi David,

I am also interested in using s-space's LSA for doc similarity, so I need to load a pre-trained sspace and convert new documents into the low-dimensional space for comparison. Has any progress been made in the codebase on implementing these remaining steps? As far as I can tell, there has not been (aside from the related item of persisting the documentSpace, which was done).  So in order to compute U^-1^ * S^T^ * d  I still need to:
1. Obtain the singularValues (S) stored in the SVD class (e.g. SingularValueDecompositionLibC), by downcasting the reducer to its implementing class, and then persist it in the LatentSemanticAnalysis class in the same way the documentSpace is persisted.
2. Integrate my own Matrix inversion (I’m sure they abound, but any recommendations?)

Does that sound about right?
Thanks!!

David Jurgens

Jul 9, 2012, 1:23:45 PM
to s-space-re...@googlegroups.com
Hi John,

  Sorry for the delay; somehow this email got stuck in our spam filter until I noticed it today.

> I am also interested in using s-space's LSA for doc similarity, so I need to load a pre-trained sspace and convert new documents into the low-dimensional space for comparison. Has any progress been made in the codebase on implementing these remaining steps? As far as I can tell, there has not been (aside from the related item of persisting the documentSpace, which was done).

I actually think I might still have this code lying around somewhere in a branch.  Let me see if I can find it and, if it looks decent, get it checked in.

> So in order to compute U^-1^ * S^T^ * d  I still need to:

Actually, I think it's S^-1 * U^T * d

> 1. Obtain the singularValues (S) stored in the SVD class (e.g. SingularValueDecompositionLibC), by downcasting the reducer to its implementing class, and then persist it in the LatentSemanticAnalysis class in the same way the documentSpace is persisted.

Yes, you'll need to store the singular values.

> 2. Integrate my own Matrix inversion (I’m sure they abound, but any recommendations?)

If it's the S matrix that needs to be inverted, I think you're in luck: it's a diagonal matrix, so the inverse is obtained by taking 1/val for each diagonal element.

> Does that sound about right?
> Thanks!!

  Thanks,
  David
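
The S^-1 * U^T * d projection discussed above can be sketched with plain arrays as follows. This is a minimal illustration, not S-Space code: the class and method names are hypothetical, `u` is assumed to be a dense terms-by-k matrix, `s` the k nonzero singular values, and `d` a term vector that has already been weighted the same way as the training matrix.

```java
import java.util.Arrays;

public class FoldInSketch {

    /**
     * Projects a term-space document vector into the k-dimensional space:
     * result[j] = (column j of u dotted with d) / s[j].
     * Dividing by s[j] is the inverse of the diagonal matrix S.
     */
    static double[] foldIn(double[][] u, double[] s, double[] d) {
        int terms = u.length;
        int k = s.length;
        double[] projected = new double[k];
        for (int j = 0; j < k; j++) {
            double dot = 0;
            for (int i = 0; i < terms; i++)
                dot += u[i][j] * d[i];
            projected[j] = dot / s[j];
        }
        return projected;
    }

    public static void main(String[] args) {
        // Toy values: 3 terms, 2 retained dimensions.
        double[][] u = {{1, 0}, {0, 1}, {0, 0}};
        double[] s = {2, 4};
        double[] d = {6, 8, 5};
        System.out.println(Arrays.toString(foldIn(u, s, d)));  // prints [3.0, 2.0]
    }
}
```

Terms that were not in the training vocabulary have no row in U, so they simply drop out of the projection, as noted earlier in the thread.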
 

airhead-...@googlecode.com

Aug 7, 2012, 5:25:44 AM
to s-space-re...@googlegroups.com
Comment by mcmarco...@ymail.com:

Hi,

I get the following error when using these program arguments in Eclipse:

-d "/home/marco/test.txt" "/home/marco/lsa.txt"

Error:

07.08.2012 10:43:00 edu.ucla.sspace.common.GenericTermDocumentVectorSpace processSpace
INFO: performing log-entropy transform
07.08.2012 10:43:00 edu.ucla.sspace.matrix.LogEntropyTransform$LogEntropyGlobalTransform <init>
INFO: Computing the total row counts
07.08.2012 10:43:00 edu.ucla.sspace.matrix.LogEntropyTransform$LogEntropyGlobalTransform <init>
INFO: Computing the entropy of each row
07.08.2012 10:43:00 edu.ucla.sspace.matrix.LogEntropyTransform$LogEntropyGlobalTransform <init>
INFO: Scaling the entropy of the rows
07.08.2012 10:43:00 edu.ucla.sspace.lsa.LatentSemanticAnalysis processSpace
INFO: reducing to 300 dimensions
07.08.2012 10:43:00 edu.ucla.sspace.matrix.factorization.SingularValueDecompositionLibC factorize
SCHWERWIEGEND: SVDLIBC
java.io.IOException: Cannot run program "svd": java.io.IOException: error=2, No such file or directory
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:475)
    at java.lang.Runtime.exec(Runtime.java:610)
    at java.lang.Runtime.exec(Runtime.java:448)
    at java.lang.Runtime.exec(Runtime.java:345)
    at edu.ucla.sspace.matrix.factorization.SingularValueDecompositionLibC.factorize(SingularValueDecompositionLibC.java:134)
    at edu.ucla.sspace.lsa.LatentSemanticAnalysis.processSpace(LatentSemanticAnalysis.java:360)
    at edu.ucla.sspace.mains.GenericMain.processDocumentsAndSpace(GenericMain.java:513)
    at edu.ucla.sspace.mains.GenericMain.run(GenericMain.java:437)
    at edu.ucla.sspace.mains.LSAMain.main(LSAMain.java:166)
Caused by: java.io.IOException: java.io.IOException: error=2, No such file or directory
    at java.lang.UNIXProcess.<init>(UNIXProcess.java:164)
    at java.lang.ProcessImpl.start(ProcessImpl.java:81)
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:468)
    ... 8 more
Exception in thread "main" java.lang.NullPointerException
    at edu.ucla.sspace.matrix.factorization.SingularValueDecompositionLibC.dataClasses(SingularValueDecompositionLibC.java:198)
    at edu.ucla.sspace.lsa.LatentSemanticAnalysis.processSpace(LatentSemanticAnalysis.java:362)
    at edu.ucla.sspace.mains.GenericMain.processDocumentsAndSpace(GenericMain.java:513)
    at edu.ucla.sspace.mains.GenericMain.run(GenericMain.java:437)
    at edu.ucla.sspace.mains.LSAMain.main(LSAMain.java:166)

Any suggestions on how to fix the problem?

Best

Marco

airhead-...@googlecode.com

Aug 7, 2012, 6:57:30 AM
to s-space-re...@googlegroups.com
Comment by Fozzie...@gmail.com:

First, I would suggest using the latest code base located at
https://github.com/fozziethebeat/S-Space. Second, you will have to install
svdlibc found at http://tedlab.mit.edu/~dr/SVDLIBC/.
https://github.com/fozziethebeat/S-Space/wiki/SingularValueDecomposition
has more information on alternative SVD options such as Matlab or Octave if
you have those.
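
The `Cannot run program "svd"` message in the trace above means the JVM could not find an executable with that name. A quick way to check what the JVM sees is a small PATH scan like the sketch below. It is a generic diagnostic in plain Java, not part of S-Space; the class name is made up, and on Windows the file name would presumably need an `.exe` suffix.

```java
import java.io.File;

public class SvdOnPathCheck {

    /** Returns true if an executable file named `program` exists in any PATH entry. */
    static boolean isOnPath(String program, String pathValue) {
        if (pathValue == null || pathValue.isEmpty())
            return false;
        for (String dir : pathValue.split(File.pathSeparator)) {
            if (dir.isEmpty())
                continue;
            File candidate = new File(dir, program);
            if (candidate.isFile() && candidate.canExecute())
                return true;
        }
        return false;
    }

    public static void main(String[] args) {
        // Reports whether the JVM's own environment can see an "svd" binary.
        System.out.println("svd on PATH: " + isOnPath("svd", System.getenv("PATH")));
    }
}
```

Running this from the same launch configuration as LSA is the useful part: a program started from an IDE may see a different PATH than an interactive shell does.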

airhead-...@googlecode.com

Aug 7, 2012, 7:41:33 AM
to s-space-re...@googlegroups.com
Comment by mcmarco...@ymail.com:

Hi

thanks for your reply. The website of SVDLIBC is currently not reachable.
Will try an alternative..

Best

Marco

airhead-...@googlecode.com

Aug 10, 2012, 7:22:59 AM
to s-space-re...@googlegroups.com
Comment by mcmarco...@ymail.com:

Hi,

I tried to run it with COLT and Jama, but without success. I added the
path to the classpath, but running with -d "/home/marco/test.txt" -S
COLT "/home/marco/lsa.txt" outputs the same error mentioned above.
The SVDLIBC website is still not reachable.

Any suggestions?

airhead-...@googlecode.com

Aug 23, 2012, 12:17:22 AM
to s-space-re...@googlegroups.com
Comment by thale...@gmail.com:

Hi marco,

take a look at: https://github.com/lucasmaystre/svdlibc

Best regards,

Thales F. Costa

airhead-...@googlecode.com

Jun 25, 2014, 5:39:18 AM
to s-space-re...@googlegroups.com
Comment by dmosc...@gmail.com:

The above description reads: "LSA should work right out of the box with no
configuring and no external software", but the code itself seems to
contradict this assertion:

    private SVD() { }

    /**
     * Returns the fastest {@link MatrixFactorization} implementation of
     * Singular Value Decomposition available, or {@code null} if no
     * implementation is available.
     */
    public static SingularValueDecomposition getFastestAvailableFactorization() {
        if (isSVDLIBCavailable())
            return new SingularValueDecompositionLibC();
        if (isMatlabAvailable())
            return new SingularValueDecompositionMatlab();
        if (isOctaveAvailable())
            return new SingularValueDecompositionOctave();
        throw new UnsupportedOperationException("Cannot find a valid SVD implementation");
    }

Or have I missed something?

David M.

David Jurgens

Jun 25, 2014, 10:05:12 AM
to s-space-re...@googlegroups.com
Hi David,

  This statement is unfortunately no longer true.  Previously, we used SVDLIBJ to perform the SVD.  However, we eventually discovered that its code had a bug which produced incorrect decompositions.  Therefore we had to revert the code to use the other SVD libraries, which are unfortunately not in Java.

  So for the moment, you'll need some SVD library installed for LSA to work correctly.

  Thanks,
  David 


Fabian Zehner

Jun 25, 2014, 1:34:01 PM
to s-space-re...@googlegroups.com
Hi David,

sorry for jumping into this conversation. It touches on the issue of using external libraries, which I raised some months ago without hearing back from you (https://groups.google.com/forum/?hl=de#!topic/s-space-users/hfHxQ3o9-pk).
For Windows users, invoking some external libraries via exec() as implemented in your package does not work. I described the bug, which can be easily corrected, in the post linked above. Alternatively, if you could give me a hint about a workaround I would be really pleased, because I currently cannot run my LSAs on larger corpora because of JAMA's poor performance.

Thanks a lot for your great work on this package, I really appreciate it a lot!

Best regards,
Fabian

David Jurgens

Jul 4, 2014, 6:43:56 AM
to s-space-re...@googlegroups.com
Hi Fabian,  

  Sorry for the delay.  The change you requested was actually committed to the GitHub repository three months ago.  If you get the latest version, it should be fixed and (hopefully) the exec() call should no longer block on Windows.  Please let me know if you still have issues.

  Thanks,
  David