Similarity measures in SemanticSpaceExplorer all show NaN


rocketman

Mar 29, 2011, 9:28:24 PM
to S-Space Package Users
I checked out and built HEAD on Win7/Cygwin and ran:

/cygdrive/c/opt/java/jdk1.6.0_23/bin/java -server -Xmx2g -jar bin/lsa.jar -d YahooFinance.load.txt -o binary -v .
/cygdrive/c/opt/java/jdk1.6.0_23/bin/java -server -Xmx2g -cp classes/ edu.ucla.sspace.tools.SemanticSpaceExplorer

but I get results like the following:

> load lsa-semantic-space.sspace
> gn software
SilkTest NaN
DLK NaN
Bible; NaN
(EBT). NaN
stationery; NaN
Electricidade NaN
non-specifically NaN
accident, NaN
SYK NaN
$3.2 NaN
> gn drugs
SilkTest NaN
Bible; NaN
(EBT). NaN
stationery; NaN
Electricidade NaN
non-specifically NaN
accident, NaN
SYK NaN
$3.2 NaN
10,100 NaN

Any ideas what might be wrong?

Thanks,
Paul

David Jurgens

Apr 4, 2011, 4:44:51 PM
to s-spac...@googlegroups.com
Hi Paul,

  That behavior certainly looks odd.  From what I've seen, I suspect the vectors in the .sspace file contain NaN values, which is why you're seeing NaN similarities.  However, I'm not sure why the vectors would have those values.  Since you're using the HEAD version, it should be selecting SVDLIBJ for the SVD operation, which should give stable results.  I just tested the trunk on our end and it seems to produce reasonable results, so the problem may be specific to your corpus or your system.  How big is your corpus?  If we can't figure out the issue by email, would you be willing to run a custom LSA jar with lots of debug output so we can track what's going on internally?

  Thanks,
  David
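To illustrate why NaN values in the stored vectors would produce NaN for every neighbor, here is a minimal, self-contained Java sketch. It is not S-Space's own similarity code: the class name `NanSimilarityDemo`, the `cosine` and `hasNaN` helpers, and the toy vectors are all hypothetical. It shows that NaN propagates through floating-point arithmetic, so a single NaN component makes the whole cosine score NaN, and that an all-zero vector also yields NaN (0/0). Because every comparison with NaN is false, any ranking of neighbors by such scores carries no meaning.

```java
public class NanSimilarityDemo {

    /** Plain cosine similarity of two equal-length dense vectors (illustrative only). */
    static double cosine(double[] a, double[] b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];      // any NaN component makes dot NaN
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        // For an all-zero vector this is 0 / 0, which is also NaN.
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** True if any component is NaN; x == Double.NaN is always false, so use isNaN. */
    static boolean hasNaN(double[] v) {
        for (double x : v) {
            if (Double.isNaN(x)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        double[] software = {0.8, 0.1, 0.3};
        double[] drugs = {0.2, Double.NaN, 0.5};  // hypothetical corrupted dimension
        double[] silk = {0.7, 0.2, 0.4};
        double[] empty = {0.0, 0.0, 0.0};         // hypothetical all-zero vector

        System.out.println(cosine(software, silk));   // a finite value
        System.out.println(cosine(software, drugs));  // prints NaN
        System.out.println(cosine(software, empty));  // prints NaN
        System.out.println(hasNaN(drugs));            // prints true
    }
}
```

Running a check like `hasNaN` over the vectors loaded from a .sspace file would be one way to confirm whether the stored vectors, rather than the explorer, are the source of the NaN scores.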