Doc2vec - how to access the L2 norm vectors

Michel Fombellida

Aug 26, 2017, 8:03:31 AM
to gensim
Hi there,

I have a trained model in a file, and I load it:

    model = Doc2Vec.load(file)

I then generate the L2-normalized vectors:

    model.init_sims(replace=False)

I can then access a document vector with this command:

    vector = model.docvecs['label']

But how do I access the L2-normalized vector for the same 'label'?


Thanks in advance.

Gordon Mohr

Aug 26, 2017, 12:18:08 PM
to gensim
Note that in the top `model` object, only the word-vectors are stored. And so `model.init_sims()` only precalculates their unit-norms.

To pre-calculate the docvecs unit-norms, you'd use `model.docvecs.init_sims()`. 

For the word-vectors, the object that stores them has a utility method with an optional `use_norm` parameter for requesting the normed vector. For example:

    model.wv.word_vec('apple', use_norm=True)

There's not a similar utility method on the `model.docvecs` object. (There probably should be.) 
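
Until such a method exists, the unit-normed vector can also be computed by hand from the raw one, since it is just the raw vector divided by its L2 norm. A minimal sketch, assuming NumPy is available; `unit_norm` is a hypothetical helper name, not part of gensim, and the small array stands in for the result of `model.docvecs['label']`:

```python
import numpy as np

def unit_norm(vector):
    """Return a copy of `vector` scaled to have L2 norm 1 (hypothetical helper)."""
    vector = np.asarray(vector, dtype=np.float32)
    norm = np.linalg.norm(vector)
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return vector / norm

# Stand-in for the raw doc-vector; in the setup above this would be
# model.docvecs['label'].
raw = np.array([3.0, 4.0], dtype=np.float32)
normed = unit_norm(raw)
```

This leaves the model untouched and does not depend on `init_sims()` having been called.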

You'd need to look in the `model.docvecs.doctag_syn0norm` property, at the right int index, to directly access the unit-normed vector. You would want to copy the tag-index-lookup approach at...
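
The general shape of that lookup can be sketched without gensim: a matrix of unit-normed rows plus a mapping from tag to row index. Everything here is a stand-in for illustration only. `raw_matrix`, `tag_to_index`, and `normed_vector` are hypothetical names, and the real tag-to-index logic inside `model.docvecs` may differ from a plain dict:

```python
import numpy as np

# Hypothetical stand-ins: one raw doc-vector per row, and a tag -> row mapping.
raw_matrix = np.array([[3.0, 4.0],
                       [1.0, 0.0],
                       [0.0, 2.0]], dtype=np.float32)
tag_to_index = {"doc_a": 0, "doc_b": 1, "doc_c": 2}

# Row-wise L2 normalization, which is the kind of array a precalculated
# unit-norm matrix would hold.
normed_matrix = raw_matrix / np.linalg.norm(raw_matrix, axis=1, keepdims=True)

def normed_vector(tag):
    """Look up the unit-normed row for `tag` via its integer index."""
    return normed_matrix[tag_to_index[tag]]
```

For example, `normed_vector("doc_a")` returns the first row of `normed_matrix`.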


(The `model.docvecs.most_similar()` and related operations that traditionally operate based on cosine-distances already make use of the normed vectors before calculating their results.)
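
The reason normed vectors are useful for those operations is that the cosine similarity of two vectors equals the plain dot product of their unit-normed versions. A small sketch of that identity, assuming NumPy; the vectors and the `cosine_similarity` helper are made up for illustration and are not gensim code:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity computed directly from two raw vectors."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Made-up example vectors.
a = np.array([1.0, 2.0, 2.0])
b = np.array([2.0, 0.0, 1.0])

# Normalize once up front, then similarity is just a dot product.
a_unit = a / np.linalg.norm(a)
b_unit = b / np.linalg.norm(b)
via_unit = float(np.dot(a_unit, b_unit))
```

Both `cosine_similarity(a, b)` and `via_unit` give the same value, up to floating-point rounding.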

- Gordon