about gensim word2vec most_similar() function

sam

Jan 18, 2016, 7:58:44 PM
to gensim
I'm using the most_similar() method as below to get all the words similar to a given word:

similar = model.most_similar('apple', topn=sizeofdict)  # list of (word, score) pairs

AFAIK, what this does is calculate the cosine similarity between the given word and all the other words in the dictionary. When I inspect the words and scores, I can see words with negative scores further down the list. What does this mean? Are they words that have the opposite meaning to the given word?

Also, if it's using cosine similarity, how does it get a negative value? Cosine similarity varies between 0 and 1 for two documents.

Radim Řehůřek

Jan 26, 2016, 1:12:02 AM
to gensim
Hello sam,

Yes, you could interpret negative scores as "opposite meaning". Sometimes, for specific applications of similarity, people take the absolute value of the cosine score instead. This can be interpreted as "I want to treat opposites as similar".
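
As a rough sketch of the absolute-value idea, the snippet below re-ranks a list of (word, score) pairs by magnitude. The pairs have the shape that most_similar() returns, but the words and numbers are invented for illustration and do not come from a real model.

```python
# Hypothetical (word, score) pairs in the shape most_similar() returns;
# the words and scores are made up for this example.
results = [("pear", 0.71), ("fruit", 0.64), ("tuesday", 0.02), ("granite", -0.35)]

# Rank by magnitude so that strongly negative scores count as "similar".
by_magnitude = sorted(results, key=lambda pair: abs(pair[1]), reverse=True)
top_words = [word for word, _ in by_magnitude]
```

With this ordering, a word scoring -0.35 ranks above one scoring 0.02, which is the "treat opposites as similar" behaviour described above.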

Cosine similarity has a range of [-1, 1] (not [0, 1]).
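
A minimal sketch of where the two ranges come from, in plain Python rather than gensim's own code. The vectors are made up. The assumption is that the 0 to 1 range in the question comes from count-style document vectors, which have no negative components, whereas word2vec vectors can have negative components.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical word-count vectors: no negative components,
# so the dot product, and hence the score, cannot go below 0.
doc_a = [1, 2, 0]
doc_b = [0, 1, 3]
count_score = cosine_similarity(doc_a, doc_b)

# Hypothetical dense embeddings: components may be negative,
# so the score can be negative as well.
vec_a = [0.5, -1.0, 0.25]
vec_b = [-0.5, 1.0, 0.5]
dense_score = cosine_similarity(vec_a, vec_b)

# A vector compared with its own negation sits at the bottom of the range.
opposite_score = cosine_similarity(vec_a, [-x for x in vec_a])
```

Here count_score lands between 0 and 1, dense_score is negative, and opposite_score is -1 up to floating-point rounding.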

HTH,
Radim