Gensim Word2Vec: Vocab Size and getting word by index?


Kevin L

Nov 30, 2015, 5:06:29 AM
to gensim
Hi, this may be too simple a question, but I couldn't find the answer for Gensim.

I have loaded a trained model. How do I get the vocab size of this model and a word by index?

Until now I used code from the Word2Vec Python Implementation like this:

max = model.vocab.size -1 #max vocab size

model.similarity(model.vocab[(randint(0,max))],model.vocab[(randint(0,max))]) #similarity between pair of random words

Well, that doesn't work in Gensim. What is the equivalent?

Thanks for the help.

Kevin L

Nov 30, 2015, 7:44:49 AM
to gensim
I found this solution:
max = len(model.vocab) -1
wordVocab = [k for (k, v) in model.vocab.iteritems()]
model.similarity(wordVocab[randint(0,max)],wordVocab[randint(0,max)])
It works, even if it is maybe not the smartest solution.

Gordon Mohr

Nov 30, 2015, 2:02:57 PM
to gensim
That's fine! But `model.index2word` is already like your `wordVocab`, and goes from an int index to a string token (word). And those strings can then be used as key indexes into the `model.vocab` dictionary. 

So your code could also just be:
 
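    from random import randint  # randint(a, b) includes both endpoints

    max = len(model.vocab) - 1  # highest valid index into model.index2word
    model.similarity(model.index2word[randint(0, max)], model.index2word[randint(0, max)])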

- Gordon
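
For anyone who wants to try the pattern from this thread without a trained model at hand, here is a minimal sketch. It assumes only what the posts above describe: an `index2word` list that maps an int index to a word, a `vocab` dict keyed by word, and a `similarity(word_a, word_b)` method. `ToyModel` and `random_pair_similarity` are hypothetical names made up for this illustration and are not part of Gensim; the dummy similarity score is likewise invented. Attribute names on a real model may differ between Gensim versions, so check the version you have installed.

```python
import random


def random_pair_similarity(index2word, similarity, rng=random):
    """Pick two random vocabulary words by index; return them with their similarity."""
    if not index2word:
        raise ValueError("vocabulary is empty")
    # randrange(n) yields 0..n-1, so no manual "- 1" is needed.
    i = rng.randrange(len(index2word))
    j = rng.randrange(len(index2word))
    word_a, word_b = index2word[i], index2word[j]
    return word_a, word_b, similarity(word_a, word_b)


class ToyModel:
    """Hypothetical stand-in for a loaded model; NOT a Gensim class."""

    def __init__(self, words):
        self.index2word = list(words)  # int index -> word
        self.vocab = {w: i for i, w in enumerate(self.index2word)}  # word -> index

    def similarity(self, a, b):
        # Dummy score for illustration only: 1.0 for identical words, else 0.0.
        return 1.0 if a == b else 0.0


model = ToyModel(["king", "queen", "apple"])
vocab_size = len(model.vocab)  # vocabulary size
word_a, word_b, score = random_pair_similarity(model.index2word, model.similarity)
print(vocab_size, word_a, word_b, score)
```

With a real loaded model, the same helper could be called as `random_pair_similarity(model.index2word, model.similarity)`, assuming the model exposes those two attributes as described above.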