I downloaded lsa.jar, created a corpus, and tried to run it, but got the
following error:
Mark-Kleins-Computer-3:Downloads markklein$ java -jar lsa.jar -d E-3JOYFY-52.txt my-lsa-output.sspace
Aug 15, 2010 7:58:00 PM edu.ucla.sspace.lsa.LatentSemanticAnalysis processSpace
INFO: performing log-entropy transform
Aug 15, 2010 7:58:00 PM edu.ucla.sspace.lsa.LatentSemanticAnalysis processSpace
INFO: reducing to 300 dimensions
java.lang.UnsupportedOperationException: No SVD algorithms are available
    at edu.ucla.sspace.matrix.SVD.svd(SVD.java:446)
    at edu.ucla.sspace.lsa.LatentSemanticAnalysis.processSpace(LatentSemanticAnalysis.java:494)
    at edu.ucla.sspace.mains.GenericMain.run(GenericMain.java:417)
    at edu.ucla.sspace.mains.LSAMain.main(LSAMain.java:147)
What can I do to get it to work?
Mark Klein
MIT Center for Collective Intelligence
m_k...@mit.edu
For more information:
http://code.google.com/p/airhead-research/wiki/LatentSemanticAnalysis
I cc'd you in case you don't get this when I post to the list. I
had the exact same problem.
Try:
java -jar lsa.jar -S SVDLIBJ -d E-3JOYFY-52.txt my-lsa-output.sspace
I think SVDLIBJ is bundled with lsa.jar, so you shouldn't have an issue
with that.
Good Luck,
Chad
--
You received this message because you are subscribed to the Google Groups "Semantic Space Research - Development" group.
To post to this group, send email to s-space-re...@googlegroups.com.
To unsubscribe from this group, send email to s-space-research...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/s-space-research-dev?hl=en.
Hi, I get the same error message. What is the next step to take when this
error is encountered? Thanks. SJ
Please email s-space-re...@googlegroups.com and we'll start
debugging from there. I believe the bug reported earlier is fixed in the
trunk and in the current lsa.jar file, so we'll have to see what's causing
your error.
I'm also getting an error message and it's really frustrating me. I've read
over this article over and over again and can't seem to find where I went
wrong. Could I get a little help please? Thanks. http://www.cabartutors.com
How long should the conversion from MATLAB to SVDLIBC take?
My computer has been running for 48 hours already, processing 450k files (2 KB each).
The conversion process should be fast, even for very large matrices.
However, the SVD step may take some time. 48 hours seems like a very long
time for either one given the file sizes you listed. Could you email our
dev mailing list (s-space-re...@googlegroups.com) with more
information? We should be able to figure out why the program would be
taking so long.
I sent the exception to the dev list but haven't received an answer
(IndexOutofBounds in class MatrixIO, row 558).
I was surprised to hit it after 2 days of SVD conversion.
I have the same problem -- conversion from Matlab double values to SVDLIBC
float values has been running for 2 days already. My dataset contains 260K
documents (1GB in total). Is there any way we can fix that?
Hi, it's _almost_ possible with the current code, but you'd need to do some
more work. Specifically, the new document you'd like to compare exists in
a different geometric space than the document vectors in the LSI/LSA
space. (During `processSpace`, the SVD step is mapping your set of
documents to a different basis). To compare a new document, you have to
project its document vector into the space where your reference set
exists. This actually isn't too difficult to do. (See the
[http://en.wikipedia.org/wiki/Latent_semantic_analysis#Derivation Wikipedia
page] for the full math details.)
Assuming LSA used the SVD to convert your term-document matrix into `U`,
`S`, and `V` matrices (U is the word space and V is the document space),
you'd need to perform the following multiplication: U^-1^ * S^T^ * d,
where d is your new document vector. Currently, the only unimplemented
part of this procedure in the S-Space Package is finding the matrix inverse
of U. To get your use case to work we'd need to implement matrix inversion
and then add a few bits to LSA to save the required matrices.
Also, note that this projection doesn't change the original LSI space; so
any new vectors that you compute will not affect your training set. If
you're going to be comparing a lot of new documents, it may make sense to
recompute the LSI space periodically. Also, the projection won't include
information for any new terms in the new documents. If your test documents
all start referencing terms that aren't in the training set, then
recomputing the space also seems necessary.
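For reference, the derivation on the Wikipedia page linked above writes the fold-in as Σ^-1^ * U^T^ * d. The columns of U are orthonormal, so U^T^ plays the role of the inverse, and Σ is diagonal, so inverting it only means taking the reciprocal of each singular value. If that form applies to your case, no general matrix inversion would be needed. Below is a minimal plain-Java sketch, assuming you have already obtained U (terms x k) and the k singular values as arrays. The class and method names are hypothetical and nothing here calls the S-Space API.

```java
import java.util.Arrays;

// Hypothetical helper, not part of the S-Space Package.
final class FoldInSketch {

    /**
     * Projects a term-space document vector d into the k-dimensional LSA
     * space as sigma^-1 * U^T * d.
     *
     * u     : terms x k matrix of left singular vectors (assumed input)
     * sigma : the k singular values (assumed non-zero)
     * d     : the new document's term vector, weighted the same way as
     *         the training documents
     */
    static double[] foldIn(double[][] u, double[] sigma, double[] d) {
        int k = sigma.length;
        double[] projected = new double[k];
        for (int j = 0; j < k; j++) {
            double dot = 0.0;
            for (int t = 0; t < d.length; t++) {
                dot += u[t][j] * d[t];      // component j of U^T * d
            }
            projected[j] = dot / sigma[j];  // inverse of a diagonal matrix
        }
        return projected;
    }

    public static void main(String[] args) {
        // Toy example: 3 terms, 2 dimensions, orthonormal columns in U.
        double[][] u = { {1.0, 0.0}, {0.0, 1.0}, {0.0, 0.0} };
        double[] sigma = { 2.0, 4.0 };
        double[] d = { 2.0, 4.0, 5.0 };
        System.out.println(Arrays.toString(foldIn(u, sigma, d))); // prints [1.0, 1.0]
    }
}
```

The third term has no weight in either dimension of the toy U, so it drops out of the projection. The same thing happens to any term that is missing from the training vocabulary.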
I actually do have support for serializing the LSA space to disk in a
branch somewhere. Another person had requested it, so we were looking at
the best way to do it. (There are a few issues with how we treat the
document space, since its matrix might exist on disk.) I'll see if I can
clean those up soon and add them in. This still doesn't resolve the
missing matrix inversion functionality for step 4, but it's a step in that
direction.
As for the second question, it's definitely possible to use the
DocumentVectorBuilder (DVB) for what you're trying to do. However, given
that LSA has an explicit document representation, I think it would make
more sense to use that. Also, I'm not certain we've really tested how well
the DVB works for classification or clustering. The DVB has a very naive
treatment of what it means to represent a document, where it just sums the
vectors of the document's tokens. The summation ignores all aspects of
compositionality, word order, and collocations. If you decide to try using
it, we'd love to hear what you discover.
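The summation described above can be sketched in plain Java. This is only an illustration of the idea (sum the word vectors of a text's tokens, then compare with cosine similarity), not the DocumentVectorBuilder itself. The class name, the method names, and the `Map<String, double[]>` representation of the word space are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration; does not use the S-Space API.
final class SummedDocumentVector {

    /** Sums the vectors of all known tokens; unknown tokens are skipped. */
    static double[] sumTokens(String text, Map<String, double[]> wordVectors, int dims) {
        double[] doc = new double[dims];
        for (String token : text.toLowerCase().split("\\s+")) {
            double[] v = wordVectors.get(token);
            if (v == null) {
                continue; // token not in the word space
            }
            for (int i = 0; i < dims; i++) {
                doc[i] += v[i];
            }
        }
        return doc;
    }

    /** Cosine similarity; returns 0 when either vector is all zeros. */
    static double cosine(double[] a, double[] b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / Math.sqrt(normA * normB);
    }

    public static void main(String[] args) {
        // Toy two-dimensional word space.
        Map<String, double[]> space = new HashMap<>();
        space.put("cat", new double[] { 1.0, 0.0 });
        space.put("dog", new double[] { 0.0, 1.0 });

        double[] sentence = sumTokens("Cat dog unicorn", space, 2);
        System.out.println(cosine(sentence, space.get("cat")));
    }
}
```

Because the sentence vector is a plain sum, "cat dog" and "dog cat" get identical vectors, which is the loss of word order mentioned above.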
Also, it's probably easiest to continue this discussion on the developer
mailing list: s-space-re...@googlegroups.com.
Thanks,
David
I get the following error:
java -d64 -Xms2G -Xmx7G -jar lsa-1.3.jar -d all.txt -n 300 my-lsa-output-300.sspace
Apr 7, 2011 6:17:00 PM edu.ucla.sspace.lsa.LatentSemanticAnalysis processSpace
INFO: performing log-entropy transform
Apr 7, 2011 6:17:03 PM edu.ucla.sspace.lsa.LatentSemanticAnalysis processSpace
INFO: reducing to 300 dimensions
Apr 7, 2011 6:17:04 PM edu.ucla.sspace.matrix.MatrixIO matlabToSvdlibcSparseBinary
INFO: Converting from Matlab double values to SVDLIBC float values; possible loss of precision
Apr 7, 2011 6:17:24 PM edu.ucla.sspace.matrix.SVD svd
SEVERE: convertFormat
java.io.EOFException
    at java.io.DataInputStream.readInt(DataInputStream.java:375)
    at edu.ucla.sspace.matrix.SvdlibjDriver.readToSMat(SvdlibjDriver.java:263)
    at edu.ucla.sspace.matrix.SvdlibjDriver.svd(SvdlibjDriver.java:96)
    at edu.ucla.sspace.matrix.SVD.svd(SVD.java:400)
    at edu.ucla.sspace.lsa.LatentSemanticAnalysis.processSpace(LatentSemanticAnalysis.java:528)
    at edu.ucla.sspace.mains.GenericMain.processDocumentsAndSpace(GenericMain.java:496)
    at edu.ucla.sspace.mains.GenericMain.run(GenericMain.java:429)
    at edu.ucla.sspace.mains.LSAMain.main(LSAMain.java:147)
java.lang.UnsupportedOperationException: Unknown algorithm: ANY
    at edu.ucla.sspace.matrix.SVD.svd(SVD.java:409)
    at edu.ucla.sspace.lsa.LatentSemanticAnalysis.processSpace(LatentSemanticAnalysis.java:528)
    at edu.ucla.sspace.mains.GenericMain.processDocumentsAndSpace(GenericMain.java:496)
    at edu.ucla.sspace.mains.GenericMain.run(GenericMain.java:429)
    at edu.ucla.sspace.mains.LSAMain.main(LSAMain.java:147)
Any ideas?
Is there any way to set retainDocumentSpace to "true" during the launching
of lsa-1.3.jar?
I have created the LSA space using lsa.jar. Now I'm able to get the
similarity between any two words, and also the number of neighbors for a
given word using COSINE distance measure. How do I get similarity between a
sentence and a word in the given lsa space, or similarity between document
and a word? Please do the needful.
Hello there, I am new to this and have never worked with LSA before.
I want to use LSA to compare sentences for semantic similarity and wanted
to know if lsa-1.3.jar can be used for this purpose.
I have downloaded it and seen that it works with documents only, and I
really can't understand what the output means or how to convert it into
a more readable format.
Just to test, I supplied a document with some sentences and specified the
output as a text file. The output was incomprehensible. How do I read the
output?
Can someone tell me if it is possible to find similarities between sentences
and get how similar they are as output?
Hi, I want to use the library to compute the co-occurrence of terms across
many documents. Which class should I use? Sorry to ask, but there are no
examples, so it's a little bit difficult.
If you're wanting to compute the co-occurrence, you'll want to use the
GenericWordSpace class, which records word co-occurrences as features.
There's not a .jar file for running that class, but we could probably get
one up and running easily if you don't want to write the code. Please
email the development team at s-space-re...@googlegroups.com and we
can talk more there.
Just to note, LSA doesn't directly record word co-occurrences (only
document co-occurrences), so you probably don't want to use the lsa.jar for
what you're describing.
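To make "word co-occurrences as features" concrete, here is a minimal plain-Java sketch of windowed co-occurrence counting. It is not the GenericWordSpace implementation and does not call the S-Space API. The class name, the method name, and the symmetric window parameter are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of co-occurrence counting.
final class CooccurrenceSketch {

    /**
     * Counts, for every token, how often each other token appears within
     * `window` positions to its left or right.
     */
    static Map<String, Map<String, Integer>> count(String[] tokens, int window) {
        Map<String, Map<String, Integer>> counts = new HashMap<>();
        for (int i = 0; i < tokens.length; i++) {
            int from = Math.max(0, i - window);
            int to = Math.min(tokens.length - 1, i + window);
            for (int j = from; j <= to; j++) {
                if (j == i) {
                    continue; // a token is not its own neighbor
                }
                counts.computeIfAbsent(tokens[i], k -> new HashMap<>())
                      .merge(tokens[j], 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] tokens = "the cat sat on the mat".split(" ");
        Map<String, Map<String, Integer>> counts = count(tokens, 1);
        System.out.println(counts.get("cat").get("sat")); // prints 1
    }
}
```

Each row of the resulting map is a sparse feature vector for one word, so two words can be compared by the similarity of their rows.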
Greetings - In running LSA using the following command:
`java -jar lsa.jar -f lexis_monthly_corpus-filelist.txt -n 300 -t 1 -o text -v ./`
I find the following message in the output:
"FINE: Finished writing matrix in MATLAB_SPARSE format with 60 columns"
I'm wondering how I can get the full 300-dimension space written to the
file, as appears to happen for COALS and BEAGLE. The 60-D matrix appears to
get written regardless of output type for any n > 60.
Any ideas?
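One possible explanation, offered as an assumption because nothing in this thread confirms it: if the file list names only 60 documents, the term-document matrix has 60 columns, and an SVD cannot produce more dimensions than the smaller side of the matrix. In symbols, with m terms and n documents:

```latex
X \in \mathbb{R}^{m \times n}, \qquad \operatorname{rank}(X) \le \min(m, n)
```

With n = 60, at most 60 singular values can be non-zero, so a request for 300 dimensions could not be met from that corpus. COALS and BEAGLE build their vectors differently, so they would not be bound by the document count in the same way.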
Hi,
I have installed the S-Space package and tried it out on a small test file,
but I get the following error. Any ideas what's going wrong?
Tx!!
java -Xmx8g -jar ./bin/lsa.jar -d ./bin/test.txt -S SVDLIBJ ../Data/Train/Fr/LSA_out
20-apr-2012 12:10:24 edu.ucla.sspace.util.LoggerUtil info
INFO: performing log-entropy transform
20-apr-2012 12:10:24 edu.ucla.sspace.util.LoggerUtil info
INFO: reducing to 300 dimensions
20-apr-2012 12:10:24 edu.ucla.sspace.matrix.MatrixIO matlabToSvdlibcSparseBinary
INFO: Converting from Matlab double values to SVDLIBC float values; possible loss of precision
java.lang.NumberFormatException: For input string: "1,584963"
    at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1222)
    at java.lang.Double.valueOf(Double.java:475)
    at edu.ucla.sspace.matrix.MatrixIO.matlabToSvdlibcSparseBinary(MatrixIO.java:589)
    at edu.ucla.sspace.matrix.MatrixIO.convertFormat(MatrixIO.java:259)
    at edu.ucla.sspace.matrix.MatrixIO.convertFormat(MatrixIO.java:220)
    at edu.ucla.sspace.matrix.SvdlibjDriver.svd(SvdlibjDriver.java:95)
    at edu.ucla.sspace.matrix.SVD.svd(SVD.java:343)
    at edu.ucla.sspace.lsa.LatentSemanticAnalysis.processSpace(LatentSemanticAnalysis.java:350)
    at edu.ucla.sspace.mains.GenericMain.processDocumentsAndSpace(GenericMain.java:508)
    at edu.ucla.sspace.mains.GenericMain.run(GenericMain.java:432)
    at edu.ucla.sspace.mains.LSAMain.main(LSAMain.java:147)
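The input string "1,584963" uses a decimal comma, which `Double.valueOf` does not accept. A plausible cause, stated here as an assumption since the thread does not confirm it, is that the intermediate matrix file was written with locale-sensitive formatting on a machine whose default locale uses a comma as the decimal separator. The sketch below only demonstrates that Java behavior. The class and method names are hypothetical and it does not touch S-Space code.

```java
import java.util.Locale;

// Hypothetical demo of locale-dependent number formatting.
final class DecimalCommaDemo {

    /** Formats a value with %f under the given locale. */
    static String format(Locale locale, double value) {
        return String.format(locale, "%f", value);
    }

    /** Returns true if Double.parseDouble accepts the string. */
    static boolean parses(String s) {
        try {
            Double.parseDouble(s);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumes the JDK ships Dutch locale data, as standard JDKs do.
        String dutch = format(Locale.forLanguageTag("nl-NL"), 1.584963);
        String us = format(Locale.US, 1.584963);
        System.out.println(dutch + " parses: " + parses(dutch));
        System.out.println(us + " parses: " + parses(us));
    }
}
```

If this is the cause, one thing to try is forcing an English locale for the run by adding the standard JVM system properties `-Duser.language=en -Duser.country=US` before `-jar` on the command line.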
I am also interested in using S-Space's LSA for document similarity, so I need to load a pre-trained sspace and convert new documents into the low-dimensional space for comparison. Has any progress been made in the codebase on implementing these remaining steps? As far as I can tell there has not, aside from the related work of persisting the documentSpace.
So in order to compute U^-1^ * S^T^ * d I still need to:
1. Obtain the singularValues (S) stored in the SVD class (e.g. SingularValueDecompositionLibC), by downcasting the reducer to its implementing class, and then persist it in the LatentSemanticAnalysis class in the same way the documentSpace is persisted.
2. Integrate my own Matrix inversion (I’m sure they abound, but any recommendations?)
Does that sound about right? Thanks!!