Gensim Word2Vec: Vocab Size and getting word by index?

Kevin L

Nov 30, 2015, 5:06:29 AM

to gensim

Hi, this may be too simple a question, but I couldn't find the answer in Gensim.

I have loaded a trained model. How do I get the vocab size of this model and a word by index?

Until now I used code from the Word2Vec Python Implementation like this:

max = model.vocab.size -1 #max vocab size

model.similarity(model.vocab[(randint(0,max))],model.vocab[(randint(0,max))]) #similarity between pair of random words

Well, that doesn't work in Gensim. What is the equivalent?

Thanks for help.

Kevin L

Nov 30, 2015, 7:44:49 AM

to gensim

I found this solution:

from random import randint

max = len(model.vocab) - 1  # highest valid index
wordVocab = list(model.vocab)  # vocabulary words (the dict's keys)
model.similarity(wordVocab[randint(0, max)], wordVocab[randint(0, max)])

It works, even if it is maybe not the smartest solution.

Gordon Mohr

Nov 30, 2015, 2:02:57 PM

to gensim

That's fine! But `model.index2word` is already like your `wordVocab`, and goes from an int index to a string token (word). And those strings can then be used as key indexes into the `model.vocab` dictionary.

So your code could also just be:

max = len(model.vocab) - 1

model.similarity(model.index2word[randint(0, max)], model.index2word[randint(0, max)])

- Gordon
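
For anyone who wants to try the idea without a trained model at hand, here is a minimal sketch of the same index-based sampling. The helper name `random_word_pair` and the four-word stand-in list are assumptions made for illustration only; with a loaded model you would pass `model.index2word` as described above. In later Gensim releases (4.x) the equivalent list is, as far as I know, `model.wv.index_to_key`, so check the attribute names for your version.

```python
import random


def random_word_pair(index2word, rng=random):
    """Pick two random words from an index-to-word list (hypothetical helper)."""
    # len(index2word) is the vocabulary size; valid indexes run from 0 to size - 1.
    last = len(index2word) - 1
    first = index2word[rng.randint(0, last)]
    second = index2word[rng.randint(0, last)]
    return first, second


# Stand-in vocabulary so the sketch runs without Gensim installed.
# With a loaded model, pass model.index2word instead.
words = ["king", "queen", "apple", "orange"]
w1, w2 = random_word_pair(words, random.Random(0))
print(len(words), w1, w2)

# With a model, the pair can then be compared:
# model.similarity(w1, w2)
```

Passing a seeded `random.Random` instance keeps the sampling reproducible, which helps when comparing similarity scores across runs.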
