How to use `most_similar_to_given`

emmanuel chappat
May 22, 2019, 11:52:52 AM
to Gensim
Hi guys,

I am retrieving a list of word embeddings, and I would like to run a `most_similar` query over them given another embedding (i.e. the query).

I stumbled on the `most_similar_to_given` API, which I tried to use like this:

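```
import numpy as np
import gensim

some_vector_list = np.array([[1, 5], [-2, -3], [12, 43]])
query = np.array([2, 3])
# Note: this calls the method on the class itself, not on an instance
gensim.models.keyedvectors.WordEmbeddingsKeyedVectors.most_similar_to_given(entity1=query, entities_list=some_vector_list)
```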

However, this throws a `most_similar_to_given() missing 1 required positional argument: 'self'` error.

Is there a recommended way to go about this use case?

Thanks a lot

Gordon Mohr
May 22, 2019, 1:21:11 PM
to Gensim
`most_similar_to_given()` is an instance method; it must be called on a specific preexisting instance of `KeyedVectors`, not on the class. For example, if `my_vecs` is some `KeyedVectors` instance, `query_key` is the key of your query vector, and `some_vector_keys` are the lookup keys for vectors already in it (*not* the vectors themselves), then:

    my_vecs.most_similar_to_given(entity1=query_key, entities_list=some_vector_keys)
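
To see why the original call failed, here is a plain-Python sketch (no gensim needed) built around a hypothetical `ToyVectors` class. Calling an instance method through an instance lets Python fill in `self` automatically; calling it through the class with only keyword arguments leaves `self` unfilled, which raises the same kind of `TypeError` reported above. The exact wording of the message varies slightly between Python versions.

```python
class ToyVectors:
    """Hypothetical stand-in for a KeyedVectors-like class."""

    def __init__(self, vectors):
        # mapping of key -> vector, stored on the instance
        self.vectors = vectors

    def lookup(self, key):
        # instance method: it needs `self` to reach the stored vectors
        return self.vectors[key]


# Calling through an instance works: Python passes `toy` as `self`.
toy = ToyVectors({"a": [1, 5]})
print(toy.lookup("a"))  # prints [1, 5]

# Calling through the class with keyword arguments leaves `self` unbound.
error_message = None
try:
    ToyVectors.lookup(key="a")
except TypeError as err:
    error_message = str(err)
print(error_message)
```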

If you want to build your own instance, you may be able to use something like the `add()` method (https://radimrehurek.com/gensim/models/keyedvectors.html#gensim.models.keyedvectors.Word2VecKeyedVectors.add), for example with your existing `some_vector_list`, plus a new list of the keys you want to be able to look them up by:

    from gensim.models.keyedvectors import Word2VecKeyedVectors

    my_vecs = Word2VecKeyedVectors(vector_size=2)
    my_vecs.add(some_vector_keys, some_vector_list)
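
As a rough illustration of what the keyed lookup amounts to, here is a NumPy-only sketch. It assumes a plain `dict` as the key-to-vector store and uses a hypothetical helper name, `most_similar_to_given_sketch`; it is not gensim's implementation, only the same idea: fetch each key's vector and return the candidate whose vector has the highest cosine similarity to the vector stored under `entity1`.

```python
import numpy as np


def cosine(u, v):
    # cosine similarity: dot product divided by the product of the norms
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def most_similar_to_given_sketch(store, entity1, entities_list):
    """Hypothetical helper: return the key in `entities_list` whose vector
    is most cosine-similar to the vector stored under `entity1`."""
    query = store[entity1]
    scores = [cosine(query, store[key]) for key in entities_list]
    return entities_list[int(np.argmax(scores))]


# Same example vectors as in this thread, keyed by made-up string IDs.
keys = ['a', 'b', 'c']
some_vector_list = np.array([[1, 5], [-2, -3], [12, 43]], dtype=float)
store = dict(zip(keys, some_vector_list))

print(most_similar_to_given_sketch(store, 'a', ['b', 'c']))  # prints c
```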

(Of course, once you have this subset, you can just use a normal `most_similar()`; there's no need for the `most_similar_to_given()` variant.)

- Gordon

emmanuel chappat
May 23, 2019, 12:30:26 AM
to Gensim
Hi Gordon,

Thanks, I was able to add the vectors manually like you've suggested:

```
import numpy as np
from gensim.models.keyedvectors import Word2VecKeyedVectors

model = Word2VecKeyedVectors(vector_size=2)
some_vector_list = np.array([[1, 5], [-2, -3], [12, 43]])
model.add(['a', 'b', 'c'], some_vector_list)
```

This, however, won't work for my use case, because:
1. I don't have string IDs for the embeddings (I could work around that by generating unique strings, but that doesn't seem ideal).
2. The most-similar query is based on a given embedding vector, not a string key.

Do you know of any helper function in Gensim that, given a vector, computes its cosine similarity against a list of N vectors?

Gordon Mohr
May 23, 2019, 2:58:11 PM
to Gensim
There's no utility function for exactly what you want, but you can model your own code after gensim's source for `most_similar()`:
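
A minimal NumPy sketch in that spirit, assuming candidates are identified by their row index (which sidesteps the need to invent string IDs) and using a hypothetical helper name, `cosine_top_n`. It is not gensim code: it unit-normalizes the candidate rows and the query, so a single matrix-vector product yields every cosine similarity, then sorts those similarities from highest to lowest. It assumes neither the query nor any row is all zeros.

```python
import numpy as np


def cosine_top_n(query, vectors, topn=3):
    """Rank rows of `vectors` by cosine similarity to `query`.

    Returns a list of (row_index, similarity) pairs, best first.
    Assumes neither the query nor any row is all zeros.
    """
    vectors = np.asarray(vectors, dtype=float)
    query = np.asarray(query, dtype=float)
    # normalize every row, and the query, to unit length
    unit_vectors = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    unit_query = query / np.linalg.norm(query)
    # one matrix-vector product gives all N cosine similarities at once
    sims = unit_vectors @ unit_query
    # row indices ordered from most to least similar, truncated to topn
    order = np.argsort(-sims)[:topn]
    return [(int(i), float(sims[i])) for i in order]


some_vector_list = np.array([[1, 5], [-2, -3], [12, 43]])
query = np.array([2, 3])
print(cosine_top_n(query, some_vector_list, topn=2))  # row 2 ranks first, then row 0
```

Returning indices keeps the results tied to positions in the original array, so no key strings are needed at all.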


- Gordon

emmanuel chappat
May 24, 2019, 10:48:36 AM
to Gensim
Got it, thanks Gordon.