Problem loading Google's pre-trained word2vec model with gensim 3.2.0


Yuchen Zhang

Jan 9, 2018, 3:38:28 AM
to gensim
Hi All,
I'm trying to retrain the Google News word2vec model with my own additional corpora. My gensim version is 3.2.0.
However, I found that in gensim 3.2.0 'Word2Vec.load_word2vec_format()' is deprecated, and 'KeyedVectors.load_word2vec_format()' does not support continued training. So I used 'Word2Vec.load()' in my code instead, but it raises an error that I can't fix. Can someone help, please?

The code is:
import gensim.models as gmodels
google_model = gmodels.Word2Vec.load('./model/GoogleNews-vectors-negative300.bin.gz')

The error is:
Traceback (most recent call last):
  File "D:/5 update_word2vec_model/update_word2vec_model.py", line 28, in <module>
    google_model = gmodels.Word2Vec.load('./model/GoogleNews-vectors-negative300.bin.gz')
  File "D:\Program Files\Python\Python35\lib\site-packages\gensim\models\word2vec.py", line 1569, in load
    model = super(Word2Vec, cls).load(*args, **kwargs)
  File "D:\Program Files\Python\Python35\lib\site-packages\gensim\utils.py", line 281, in load
    obj = unpickle(fname)
  File "D:\Program Files\Python\Python35\lib\site-packages\gensim\utils.py", line 933, in unpickle
    return _pickle.load(f, encoding='latin1')
_pickle.UnpicklingError: invalid load key, '3'.

I tried both the compressed file "GoogleNews-vectors-negative300.bin.gz" and the uncompressed file "GoogleNews-vectors-negative300.bin", but the error is the same.

Ivan Menshikh

Jan 10, 2018, 2:48:38 AM
to gensim
Hi Yuchen,
your code should look like this:

from gensim.models import KeyedVectors
model = KeyedVectors.load_word2vec_format('./model/GoogleNews-vectors-negative300.bin.gz', binary=True)

Alternatively, you can use gensim-data for this; take a look at the example.

Yuchen Zhang

Jan 11, 2018, 5:31:26 AM
to gensim
Hi Ivan,
Thank you for replying. 
A model loaded through 'KeyedVectors.load_word2vec_format()' doesn't support continued training, right?
What if I not only want to load the model but also need to retrain/update it with my own additional corpus? What should I do?

Ivan Menshikh

Jan 11, 2018, 11:32:08 PM
to gensim
Yes, you are right. This file doesn't contain everything needed for training; it contains only the word vectors.
For your case, train your own Word2Vec model and call model.save(...) afterwards. That saves the full model, so you can continue training after Word2Vec.load(...).

FYI: almost all the files I have found on the Internet contain only word vectors (which don't support continued training); full models are shared very rarely.

Yuchen Zhang

Jan 13, 2018, 7:38:25 PM
to gensim
Thank you, Ivan. It helps!

Biswadeep

May 16, 2018, 7:04:04 AM
to gensim
Thanks! This helped me out a lot in my project!

Cheers!

akash kandpal

Jul 19, 2018, 5:55:10 AM
to gensim
@Ivan Menshikh Sir, I have a pre-trained binary file and I want to train it further on my corpus.

Approach I tried:
I extracted a text file from the .bin file I had, used it as the word2vec file when loading, trained the model further on my own corpus, and saved it (I used intersect_word2vec_format for this). However, the model performs badly on words that are in the pre-trained .bin file. I have attached the script I used.

What approach should I take so that my model performs well on words from both the pre-trained file and my corpus?
general_finance_merged3.py