Word2Vec: Efficient way to find most similar word to a vector?

mot...@gmail.com

Feb 20, 2015, 4:37:17 PM

to gen...@googlegroups.com

Hello Radim and everyone!

First of all, thanks a lot for your awesome gensim! I use it a lot and it's really well developed.

Second, I'm currently working with word2vec. I'm generating vectors that are supposed to approximate the vectors of words in my word2vec model, and for each generated vector I need to find the words whose vectors are most similar to it.

Currently, I calculate the cosine similarity between each vector I generate and every word vector in the model, then choose the word with the highest score, which is really slow. Can anyone think of a more efficient way to do this?

Cheers,

Mo

Anh Le

Nov 12, 2015, 8:25:59 PM

to gensim

Here you go: https://github.com/piskvorky/gensim/issues/527

Radim Řehůřek

Nov 12, 2015, 10:02:56 PM

to gensim

In addition to Anh's proposal (giving up on exact results and using an approximate "most similar" algorithm), you can also optimize your current approach. How many words do you have in your word2vec model? How exactly do you compute the cosine similarity?
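
As a rough illustration of what optimizing the exact, brute-force approach could look like: if the word vectors fit in memory as a NumPy matrix with one row per word, the rows can be normalized to unit length once, after which the cosine similarity against every word is a single matrix product rather than a Python loop. This is only a sketch under that assumption; `normalize_rows`, `most_similar_words`, and the toy vocabulary are hypothetical names for illustration, not part of the gensim API.

```python
import numpy as np

def normalize_rows(matrix):
    """Return a copy of `matrix` with every row scaled to unit L2 length."""
    matrix = np.asarray(matrix, dtype=np.float32)
    norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # leave all-zero rows as they are (no division by zero)
    return matrix / norms

def most_similar_words(queries, unit_vectors, vocab, topn=1):
    """For each query vector, return its `topn` (word, cosine) pairs, best first.

    `unit_vectors` must already be row-normalized (see `normalize_rows`),
    and `vocab[i]` must be the word stored in row i.
    """
    queries = normalize_rows(np.atleast_2d(queries))
    # With unit-length rows on both sides, the dot product IS the cosine.
    sims = queries @ unit_vectors.T            # shape: (n_queries, n_words)
    topn = min(topn, len(vocab))
    # argpartition picks the topn candidates without sorting the whole row.
    candidates = np.argpartition(-sims, topn - 1, axis=1)[:, :topn]
    results = []
    for row, idx in zip(sims, candidates):
        ordered = idx[np.argsort(-row[idx])]   # sort only the topn candidates
        results.append([(vocab[i], float(row[i])) for i in ordered])
    return results

# Toy data standing in for a real model (hypothetical values).
vocab = ["king", "queen", "apple"]
vectors = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
unit_vectors = normalize_rows(vectors)         # do this once, up front

# Several generated vectors can be scored in one call.
generated = [[1.0, 0.0], [0.0, 2.0]]
for hits in most_similar_words(generated, unit_vectors, vocab, topn=2):
    print(hits)
```

Passing all generated vectors at once turns the work into one matrix-matrix product, which NumPy hands off to its underlying linear algebra routines; that is usually where most of the speedup over a per-word loop comes from. The results stay exact, unlike the approximate approach.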