Hi!
I have a list of words and their embeddings as a numpy array, and I want to import them into a gensim Word2Vec model.
Since the embeddings of a gensim model can be overwritten directly, I wrote model.wv.syn0 = my_embeddings.
However, it is not as easy to replace the vocabulary keys with a new list of strings while keeping each word paired with its embedding. First, I tried to change the values of the vocabulary items, but I did not succeed. Then I decided to pass the list of words to the model as input and follow whatever order gensim produced. (I also cannot work out what kind of sorting it uses: I get "Leap years" < "four" < "aria", even though each word appears only once.)
So I reordered the embeddings to follow the order of model.wv.vocab, only to discover that model.wv.syn0 has a different order than model.wv.vocab.
I would expect the following condition to be True for all i, but it is not:
np.array_equal(model.wv[list(model.wv.vocab.keys())[i]], model.wv.syn0[i])
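To make the expectation concrete, here is a minimal sketch of the same check with plain numpy and no gensim involved. The names `words`, `vectors`, and `word_to_row` are hypothetical stand-ins for my word list, for syn0, and for whatever word-to-row mapping the model keeps internally. It only illustrates that the check holds when the key order matches the row order, and fails as soon as the keys come back in any other order.

```python
import numpy as np

# Hypothetical toy data standing in for my word list and embeddings.
words = ["Leap years", "four", "aria"]
vectors = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])

# Explicit word -> row mapping, fixed when the pairs are created.
word_to_row = {word: row for row, word in enumerate(words)}

def row_matches(word, order):
    """True if the row at the word's position in `order` is the word's own row."""
    return np.array_equal(vectors[word_to_row[word]], vectors[order.index(word)])

# Keys in the same order as the rows: every word lines up with its row.
same_order = list(words)
all_match = all(row_matches(w, same_order) for w in words)

# Keys in some other order (here: sorted) no longer line up with the rows.
other_order = sorted(words)
mismatches = [w for w in words if not row_matches(w, other_order)]
```

In this toy case `all_match` is True, while `mismatches` lists the words whose position among the keys differs from their row, which is the situation I seem to be in.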
1. Is there an easier way to import a list of words and a numpy array into a gensim Word2Vec model?
2. Why are the orders of syn0 and vocab different, and how can I deal with it?
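Regarding question 1, one route I am considering is to write the pairs to the plain-text word2vec format and load that file, instead of assigning to syn0. Below is a minimal sketch of the writer, assuming the usual layout: a header line with the vocabulary size and the dimensionality, then one word and its values per line. The helper name `save_word2vec_text` is hypothetical, and replacing spaces with underscores is my own assumption, since the format is whitespace-separated and "Leap years" would otherwise be split in two.

```python
import io
import numpy as np

def save_word2vec_text(words, vectors, stream):
    """Write (word, vector) pairs in word2vec text format, keeping the row order."""
    vectors = np.asarray(vectors, dtype=float)
    if vectors.ndim != 2 or len(words) != vectors.shape[0]:
        raise ValueError("need a 2-D array with exactly one row per word")
    # Header: number of words, then vector dimensionality.
    stream.write("%d %d\n" % vectors.shape)
    for word, vec in zip(words, vectors):
        # Assumption: tokens must not contain whitespace in this format.
        token = word.replace(" ", "_")
        values = " ".join(repr(float(x)) for x in vec)
        stream.write("%s %s\n" % (token, values))

# Hypothetical toy data; an in-memory buffer stands in for a real file.
words = ["Leap years", "four", "aria"]
vectors = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
buffer = io.StringIO()
save_word2vec_text(words, vectors, buffer)
lines = buffer.getvalue().splitlines()
```

As far as I know, a file written this way can be read with gensim's `KeyedVectors.load_word2vec_format(path, binary=False)`, but whether that fits my use case is part of what I am asking.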
Thank you,
Debora