How can I get the most similar words to a given vector?

Matúš Pikuliak

Apr 1, 2016, 11:21:11 AM
to gensim
Hi, I am trying to find the most similar words in a trained vector space using the Python gensim library. However, I don't want to pass a word to the most_similar method, but my own vector. Some conditionals on the ndarray type in this method make me think this is possible even in the current implementation. I am trying to make it work with these lines:

vector = [..., ..., ...] # This is list I created
model.most_similar([np.array(vector)])

However there is an error:

  File "/usr/local/lib/python2.7/dist-packages/gensim/models/word2vec.py", line 1198, in most_similar
    mean.append(weight * word)
TypeError: unsupported operand type(s) for *: 'float' and 'module'

Is it possible to make it work? Thanks :)

Pouya

Apr 4, 2016, 12:38:20 PM
to gensim
You first need to infer your vector:
myVec = model.infer_vector(doc_words, alpha=0.1, min_alpha=0.0001, steps=5)

Infer a vector for given post-bulk training document.

Document should be a list of (word) tokens.

Then you need to find the most similar docvec:

d2v_model.docvecs.most_similar([myVec])
most_similar(positive=[], negative=[], topn=10, clip_start=0, clip_end=None)

Find the top-N most similar docvecs known from training. Positive docs contribute positively towards the similarity, negative docs negatively.

This method computes cosine similarity between a simple mean of the projection weight vectors of the given docs. Docs may be specified as vectors, integer indexes of trained docvecs, or if the documents were originally presented with string tags, by the corresponding tags.

The ‘clip_start’ and ‘clip_end’ allow limiting results to a particular contiguous range of the underlying doctag_syn0norm vectors. (This may be useful if the ordering there was chosen to be significant, such as more popular tag IDs in lower indexes.)
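To make the ranking step concrete, here is a minimal library-free sketch of a cosine-similarity lookup for a raw query vector. It is not gensim's implementation; the helper names (cosine, most_similar_to_vector) and the toy vectors are hypothetical and only stand in for vectors taken from a trained model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar_to_vector(query, vectors, topn=10):
    """Rank the entries of `vectors` (label -> vector) by cosine similarity to `query`."""
    scored = [(label, cosine(query, vec)) for label, vec in vectors.items()]
    # Highest similarity first, then keep only the top-N entries.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]


# Hypothetical toy data standing in for trained word or document vectors.
toy_vectors = {
    "king": [0.9, 0.1, 0.0],
    "queen": [0.8, 0.3, 0.1],
    "apple": [0.0, 0.2, 0.9],
}
query = [1.0, 0.2, 0.0]  # your own vector, not tied to any known word

ranking = most_similar_to_vector(query, toy_vectors, topn=2)
for label, score in ranking:
    print(label, round(score, 4))
```

With real models you would normally let the library do this on its normalized vector matrix, which is much faster than a Python loop; the sketch only shows what the returned scores mean.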
