Cosine Similarity


wiem lahbib

Mar 24, 2015, 3:42:54 PM
to s-spac...@googlegroups.com
Hello,

I have already built my semantic space using the LSA .jar.
Now I want to use the cosine similarity function to calculate the distance between words and documents. How can I do it? Is there a cosine similarity .jar file?
Thank you

w. Lahbib 

David Jurgens

Mar 24, 2015, 4:00:04 PM
to s-spac...@googlegroups.com
Hi Wiem,

  Do you want to compute the similarity interactively, or are you trying to compute the similarity within some program (e.g., you want to read in a file with word pairs and print the similarity of each)?

  Thanks,
  David

--

---
You received this message because you are subscribed to the Google Groups "S-Space Package Users" group.

wiem lahbib

Mar 25, 2015, 6:15:28 AM
to s-spac...@googlegroups.com
Hello,
It is about calculating the distance between a chosen word and different documents, to find out which document the word is closest to. So I think the most appropriate approach is computing the similarity between pairs of terms.
Thanks

wiem lahbib

Mar 26, 2015, 7:08:51 AM
to s-spac...@googlegroups.com

Hello,
Well, to be more specific, I have a group of terms, and for each one I want to find its most similar words. From what I have read in a previous discussion, the WordComparator class can do this.
So, given a semantic space (from lsa.jar) and a word, what should I do next? Please, I would like some technical details:
-Should I create a main?
-Can I run WordComparator from the command prompt?
I am really confused.
Thank you

David Jurgens

Mar 26, 2015, 12:08:40 PM
to s-spac...@googlegroups.com
Hi Wiem,

 
Well, to be more specific, I have a group of terms, and for each one I want to find its most similar words. From what I have read in a previous discussion, the WordComparator class can do this.
 
So, given a semantic space (from lsa.jar) and a word, what should I do next? Please, I would like some technical details:
-Should I create a main?

I think in your case, it would be easier to create a new class with a main() function that (1) loads the semantic space you created, (2) loads the words you want to find, and (3) iterates over the words you want to compare to find the most similar.  You can do #3 with something like this (in pseudocode)

for (String word : words) {
  // Look up the query word's vector once, outside the inner loop.
  DoubleVector v1 = semanticSpace.getVector(word);
  double highestSimilarity = -1;
  String mostSimilarWord = null;
  for (String otherWord : wordsToCompare) {
    DoubleVector v2 = semanticSpace.getVector(otherWord);
    double similarity = Similarity.cosineSimilarity(v1, v2);
    // Keep track of the best match seen so far.
    if (similarity > highestSimilarity) {
      highestSimilarity = similarity;
      mostSimilarWord = otherWord;
    }
  }
  // Report once per word, after every candidate has been compared.
  System.out.printf("Most similar word to %s is %s (similarity: %f)%n",
                    word, mostSimilarWord, highestSimilarity);
}
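
For reference, here is a minimal, library-free sketch of what the cosine similarity step computes and how the "keep the best match" search works. It uses plain double[] arrays instead of S-Space vector types, and the class name CosineSketch, the method names, and the toy three-dimensional vectors are all made up for illustration; they are not part of the S-Space package.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class for illustration only; not part of S-Space.
public class CosineSketch {

    /** Cosine of the angle between a and b: dot(a, b) / (|a| * |b|). */
    public static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        // Convention chosen here (an assumption): a zero vector has similarity 0.
        if (normA == 0 || normB == 0) {
            return 0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Returns the candidate key most similar to query, or null if there are none. */
    public static String mostSimilar(double[] query, Map<String, double[]> candidates) {
        String best = null;
        double bestSimilarity = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, double[]> entry : candidates.entrySet()) {
            double similarity = cosine(query, entry.getValue());
            if (similarity > bestSimilarity) {
                bestSimilarity = similarity;
                best = entry.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Toy vectors standing in for rows of a semantic space.
        Map<String, double[]> space = new LinkedHashMap<>();
        space.put("cat", new double[] {1.0, 0.2, 0.0});
        space.put("car", new double[] {0.0, 0.1, 1.0});
        double[] dog = {0.9, 0.3, 0.1};
        System.out.println("Most similar to dog: " + mostSimilar(dog, space));
    }
}
```

The same search applies whether the candidates are word vectors or document vectors, as long as both live in the same space and have the same length.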
 
-Can I run WordComparator from the command prompt?

I don't think this is necessary.  WordComparator is designed more for cases where you need to compute all pair-wise similarities.  It sounds like in your case you would be computing fewer comparisons, so WordComparator might actually be a bit slower due to the pre-computation it does to speed up bulk operations.
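
To illustrate the general trade-off between up-front work and per-comparison cost, here is a small sketch of one common kind of pre-computation: scaling every vector to unit length once, so that each later cosine similarity is just a dot product. This is only an assumption about the sort of work a bulk comparator might do; this thread does not describe what WordComparator does internally, and the class NormalizedSpaceSketch and its methods are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of pre-computation; not how S-Space is known to work.
public class NormalizedSpaceSketch {

    /** Returns a unit-length copy of v (or an unchanged copy if v is all zeros). */
    public static double[] normalize(double[] v) {
        double sumOfSquares = 0;
        for (double x : v) {
            sumOfSquares += x * x;
        }
        double[] result = v.clone();
        if (sumOfSquares == 0) {
            return result;
        }
        double norm = Math.sqrt(sumOfSquares);
        for (int i = 0; i < result.length; i++) {
            result[i] /= norm;
        }
        return result;
    }

    /** One-time pass over the whole space: the up-front cost. */
    public static Map<String, double[]> normalizeAll(Map<String, double[]> space) {
        Map<String, double[]> normalized = new HashMap<>();
        for (Map.Entry<String, double[]> entry : space.entrySet()) {
            normalized.put(entry.getKey(), normalize(entry.getValue()));
        }
        return normalized;
    }

    /** For unit-length vectors, the dot product equals the cosine similarity. */
    public static double dot(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }
}
```

The one-time pass pays off when every pair is compared, but for a handful of lookups it touches many vectors that are never used, which is why a direct loop can be the faster choice for a small set of terms.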

  I hope this helps and please let us know if you have more questions.

  Thanks,
  David 



wiem lahbib

Mar 27, 2015, 6:05:18 AM
to s-spac...@googlegroups.com
Hi David,

Thank you so much for your help. I will create a class with a main method and let you know if it works.

wiem

