Function "_minimize_model" in word2vec.py

Chinmaya Pancholi

Mar 11, 2017, 12:21:41 PM
to gensim
The function "_minimize_model" in the file word2vec.py does not change the model at all if it is called like this:

model._minimize_model(save_syn1=True, save_syn1neg=True, save_syn0_lockf=True)

So, ideally, one should still be able to train the model AFTER making this call. However, we always set

self.model_trimmed_post_training = True

in "_minimize_model", so the model can NOT be trained again, whatever values we pass for the parameters. Isn't this a bug? Shouldn't we first check whether any training parameters are actually being discarded before setting self.model_trimmed_post_training = True?

 

Gordon Mohr

Mar 11, 2017, 3:23:17 PM
to gensim
Yes, that seems wrong. 

This method is a bit arcane in how it works: it tries to provide a simple way to save some memory, but once you get into all the particulars and parameter options, the user still needs to understand the model internals. (And in that case, they could null out or delete the properties they're sure they don't need themselves, without resorting to a convenience method.)

Also, with the introduction of KeyedVectors, the sensible thing to do has changed. If you truly just need read-only/compare-only access to the previously-trained vectors, it'd be best to retain the KeyedVectors instance but discard the model entirely (rather than mangle the model to some smaller-but-now-not-really-usable state). 

So either incremental fixes or a total rethink of this helper would be welcome.

- Gordon

Chinmaya Pancholi

Mar 11, 2017, 3:36:08 PM
to gensim
I think I now understand why this function was created in the first place. Off the cuff, one naive fix I can think of (while keeping such a function) is to explicitly check whether all three parameters passed are True. If they are, the model stays unchanged, so we should not set 'model_trimmed_post_training' to True, and the model could still be trained afterwards.
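A minimal sketch of that incremental fix, under stated assumptions: `ToyWord2Vec` is a hypothetical stand-in, not gensim's actual Word2Vec code. Only the attribute names `syn1`, `syn1neg`, `syn0_lockf` and the `model_trimmed_post_training` flag come from this thread; the placeholder arrays, the `train` stub, and its error message are invented for illustration.

```python
class ToyWord2Vec:
    """Hypothetical stand-in for gensim's Word2Vec (not its real code)."""

    def __init__(self):
        # Placeholder values standing in for the real training arrays.
        self.syn1 = [0.0]
        self.syn1neg = [0.0]
        self.syn0_lockf = [1.0]
        self.model_trimmed_post_training = False

    def _minimize_model(self, save_syn1=False, save_syn1neg=False, save_syn0_lockf=False):
        keep = {"syn1": save_syn1, "syn1neg": save_syn1neg, "syn0_lockf": save_syn0_lockf}
        # Delete every training array the caller did not ask to keep.
        for name, save in keep.items():
            if not save and hasattr(self, name):
                delattr(self, name)
        # Proposed fix: if all three save_* flags are True, nothing was
        # discarded, so leave the model trainable.
        if not all(keep.values()):
            self.model_trimmed_post_training = True

    def train(self):
        # Hypothetical stub: refuse to train once the model has been trimmed.
        if self.model_trimmed_post_training:
            raise RuntimeError("training state was discarded by _minimize_model")
        return "trained"
```

With all three flags True the call becomes a no-op and `train()` still works; with the default arguments the arrays are deleted and the flag is set, exactly as before.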
But it would be better to give some more serious thought to the problem you mentioned above.
Should I start by opening an issue for this on GitHub? I think it would be good to make the problem known so that others can chip in with their suggestions as well. :)

Lev Konstantinovskiy

Mar 11, 2017, 8:05:22 PM
to gensim
Hi Chinmaya,

A PR with your proposed incremental fix (not setting the trimmed flag when nothing is discarded) would be very welcome.

A deprecation warning should also be added: this method will be deprecated in the future, as KeyedVectors is the right way to go for word2vec (though not for doc2vec).
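For the deprecation warning, Python's standard `warnings` module is the usual tool. A minimal sketch, again on a hypothetical `ToyWord2Vec` stand-in rather than gensim's real class; the warning text is an assumption, and its pointer to `model.wv` reflects where gensim's Word2Vec exposes its KeyedVectors instance.

```python
import warnings


class ToyWord2Vec:
    """Hypothetical stand-in for gensim's Word2Vec (not its real code)."""

    def _minimize_model(self, save_syn1=False, save_syn1neg=False, save_syn0_lockf=False):
        # Warn before doing anything; stacklevel=2 attributes the warning
        # to the caller's line rather than to this method.
        warnings.warn(
            "_minimize_model is deprecated; keep the KeyedVectors instance "
            "(model.wv) and discard the full model instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        # The existing trimming logic would follow here.
```

One design note: Python hides `DeprecationWarning` by default unless it is triggered directly by code in `__main__`, so some projects use `FutureWarning`, which is always shown, for deprecations aimed at end users.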

Regards
Lev
