most similar words in the vocabulary with Doc2Vec similarity

Praveen049

May 5, 2019, 5:00:38 AM

to Gensim

Hi

I have implemented document similarity with Doc2Vec.

My question is:

Once I have identified the most similar documents for a given input document, is there a way to use the word vectors in the model to find which words in the input document are most similar to the words in those most similar documents?

Cheers

Gordon Mohr

May 6, 2019, 8:01:39 PM

to Gensim

[corrected & expanded reply]

If you have a `Doc2Vec` model `d2v_model`, then `d2v_model.docvecs.most_similar()` will return the doc-vector tags (lookup keys) that are most-similar. With those, you can look up those doc-vectors.

And if you've used a `Doc2Vec` mode that also trains word-vectors (for example PV-DM, `dm=1`, or PV-DBOW with `dbow_words=1`), then `d2v_model.wv.most_similar()` can return the words in the model closest to some query doc-vectors or words.
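A rough sketch of those two lookups, under some assumptions: the helper name `similar_docs_and_words` is made up for illustration, `tokens` is the input document already split into words the same way as the training data, and the model was trained in a mode that also trains word-vectors. Newer gensim releases expose the doc-vectors as `dv` rather than `docvecs`, so the sketch tries both.

```python
def similar_docs_and_words(d2v_model, tokens, topn=5):
    """Hypothetical helper: nearest doc tags and nearest words for one document.

    Assumes d2v_model is a trained gensim Doc2Vec model and tokens is a
    list of preprocessed words from the input document.
    """
    # Infer a doc-vector for the (possibly unseen) input document.
    query_vec = d2v_model.infer_vector(tokens)

    # The doc-vector store is `docvecs` in older gensim and `dv` in newer gensim.
    doc_vectors = getattr(d2v_model, "dv", None)
    if doc_vectors is None:
        doc_vectors = d2v_model.docvecs

    # (tag, similarity) pairs for the closest training documents.
    similar_docs = doc_vectors.most_similar(positive=[query_vec], topn=topn)

    # (word, similarity) pairs for the vocabulary words closest to the doc-vector.
    # Only meaningful if word-vectors were trained alongside the doc-vectors.
    nearby_words = d2v_model.wv.most_similar(positive=[query_vec], topn=topn)

    return similar_docs, nearby_words
```

Note that `nearby_words` is drawn from the whole vocabulary, not from the words of any particular document.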

But the model retains no list of the individual words originally in each document – you'd have to do any comparisons based on individual-words-in-one-doc, to individual-words-in-another-doc, separately, in your own code.
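One way that separate comparison might look, as a minimal sketch rather than anything built into gensim: keep your own token lists for each document, then compare every word in one document against every word in the other using the trained word-vectors. The names `cosine` and `closest_word_pairs` are invented for this example, and `word_vectors` is assumed to be any word-to-vector lookup supporting `in` and `[]` (a plain dict here; `d2v_model.wv` should behave the same way). Words shared by both documents are skipped, since they would trivially score 1.0.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def closest_word_pairs(word_vectors, doc_a_tokens, doc_b_tokens, topn=5):
    """Return the topn (word_a, word_b, similarity) pairs across two documents.

    Words missing from word_vectors are ignored.
    """
    words_a = sorted({w for w in doc_a_tokens if w in word_vectors})
    words_b = sorted({w for w in doc_b_tokens if w in word_vectors})

    pairs = [
        (a, b, cosine(word_vectors[a], word_vectors[b]))
        for a in words_a
        for b in words_b
        if a != b  # identical words are trivially similar, so skip them
    ]
    # Highest similarity first.
    pairs.sort(key=lambda pair: pair[2], reverse=True)
    return pairs[:topn]
```

With a real model this could be called as `closest_word_pairs(d2v_model.wv, input_tokens, similar_doc_tokens)`, where both token lists come from your own copy of the corpus. The cost grows with the product of the two vocabularies, which is usually fine for a handful of documents.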

- Gordon

Praveen049

May 8, 2019, 4:10:47 AM

to Gensim

Thanks Gordon for the response.
