Hi there,
Word2vec model (Gensim 0.13.3)
I've been using gensim's word2vec implementation for almost a year. I trained my most recent word2vec model with gensim 0.13.3 and saved it in binary format using save_word2vec_format().
Updating to Gensim 2.0.0
I recently updated my system to gensim 2.0.0 and started using the KeyedVectors class to load my word embeddings and use them as a simple dictionary, as usual.
These days I'm researching optimisations for my neural networks, and I started looking for ways to handle OOV (out-of-vocabulary) words; until now I have just been using random word embeddings for them. I considered gensim's similarity functions applied to the context words of the OOV word, but they don't look like such a good idea when I look at some specific cases and their printouts.
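For context, my current fallback is roughly the following. This is a minimal sketch, not my real code: `lookup_with_random_fallback` is a hypothetical helper name, a plain dict stands in for the loaded KeyedVectors object, and the uniform range of -0.25 to 0.25 is just an assumption.

```python
import random


def lookup_with_random_fallback(vectors, word, dim, rng=None):
    """Return the stored vector for `word`, or a random vector if it is OOV.

    `vectors` is any dict-like mapping from word to a list of floats
    (an assumption standing in for a loaded KeyedVectors object).
    """
    if word in vectors:
        return vectors[word]
    rng = rng or random
    # Hypothetical choice: small uniform noise for unknown words.
    return [rng.uniform(-0.25, 0.25) for _ in range(dim)]


toy_vectors = {"greece": [0.1, 0.2, 0.3]}
known = lookup_with_random_fallback(toy_vectors, "greece", dim=3)
unknown = lookup_with_random_fallback(toy_vectors, "atlantis", dim=3)
```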
Looking at the actual code of the Word2Vec and KeyedVectors classes, I found a possible solution (a function) in the Word2Vec class that looks much more reasonable. It's called predict_output_word(self, context_words_list, topn=10).
So I thought I could call this function with the previous/next 2 words of my OOV word and get the most probable center word.
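To make the idea concrete, this is a minimal sketch of what I have in mind. `context_window` and `guess_oov_word` are hypothetical helpers of my own, and `predict` is assumed to be any callable with the same signature as predict_output_word(context_words_list, topn=10); a stand-in function is used here so the snippet runs without a trained model.

```python
def context_window(tokens, index, size=2):
    """Return up to `size` tokens on each side of tokens[index], excluding it."""
    left = tokens[max(0, index - size):index]
    right = tokens[index + 1:index + 1 + size]
    return left + right


def guess_oov_word(tokens, index, predict, size=2, topn=10):
    """Pass the context of the OOV token at `index` to `predict`.

    `predict` is assumed to accept (context_words_list, topn=...) and to
    return a list of (word, probability) pairs.
    """
    context = context_window(tokens, index, size)
    return predict(context, topn=topn)


def fake_predict(context_words_list, topn=10):
    """Stand-in for a real model: echoes the context with dummy scores."""
    return [(word, 1.0 / len(context_words_list)) for word in context_words_list][:topn]


sentence = ["i", "flew", "to", "UNKNOWN", "last", "summer"]
candidates = guess_oov_word(sentence, 3, fake_predict, size=2, topn=3)
```

With a full Word2Vec model object, I would pass its predict_output_word method as `predict` instead of `fake_predict`.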
The problem is that this function belongs to the Word2Vec class, while in gensim 2.0.0 we have to load word2vec models using the KeyedVectors class, so we cannot call it directly.
Can you think of any possible solution to this? Maybe it is obvious and I'm just missing something...
Considering normalized word vectors
While I was looking at the gensim code, I also found the function word_vec(word, use_norm=False). Instead of using KeyedVectors as a simple dictionary, one can call this function to get normalized word vectors, which according to the literature give better results in some cases.
If you load your word2vec model with load_word2vec_format() and then call word_vec('greece', use_norm=True), you get an error saying that self.syn0norm is NoneType.
Does this happen only with word2vec models trained using older versions of gensim, where syn0norm probably did not exist, or is it an actual bug (or a todo, whatever)?
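For reference, this is what I understand by a normalized vector: the raw vector divided by its L2 norm. Below is a minimal sketch of computing it by hand, assuming the raw vector is available as a plain list of floats (`unit_normalize` is a hypothetical helper, not part of gensim):

```python
import math


def unit_normalize(vector):
    """Return `vector` scaled to unit L2 length.

    A zero vector has no direction, so it is returned unchanged (as a copy).
    """
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)
    return [x / norm for x in vector]


raw = [3.0, 4.0]
normalized = unit_normalize(raw)
length = math.sqrt(sum(x * x for x in normalized))
```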
I could probably resolve all of this on my own by coding these functionalities myself, but that is not the point; I would prefer to follow the gensim library API.
Thanks for any answer!