How to use `most_similar_to_given`

emmanuel chappat

May 22, 2019, 11:52:52 AM

to Gensim

Hi guys,

I am retrieving a list of word embeddings, and I would like to run a `most_similar` over them given another embedding (i.e. the query).

I've stumbled on the `most_similar_to_given` API, which I've tried to use like this:

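```
import numpy as np
import gensim
some_vector_list = np.array([[1, 5], [-2, -3], [12, 43]])
query = np.array([2, 3])
# Called on the class itself, not on a KeyedVectors instance:
gensim.models.keyedvectors.WordEmbeddingsKeyedVectors.most_similar_to_given(entity1=query, entities_list=some_vector_list)
```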

However, this throws a `most_similar_to_given() missing 1 required positional argument: 'self'` error.

Is there a recommended way to go about this use case?

Thanks a lot

Gordon Mohr

May 22, 2019, 1:21:11 PM

to Gensim

`most_similar_to_given()` is an instance method; it must be called on a specific preexisting instance of `KeyedVectors`, not on the class. For example, if `my_vecs` is some `KeyedVectors` instance, `query_key` is the lookup key of your query vector, and `some_vector_keys` are the lookup keys for vectors already in it (*not* the vectors themselves), then:

```
my_vecs.most_similar_to_given(entity1=query_key, entities_list=some_vector_keys)
```
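To illustrate why calling the method on the class fails, here is a minimal sketch using a hypothetical `ToyVectors` class (not gensim's `KeyedVectors`; only the method and parameter names mirror the ones above). Like the real method, it looks up *keys* in the instance's own data, so it needs `self`; calling it on the class reproduces the same kind of `TypeError`.

```python
import math


class ToyVectors:
    """Hypothetical key -> vector store, NOT gensim's KeyedVectors."""

    def __init__(self, vectors):
        # dict mapping each key to a list of numbers, owned by this instance
        self.vectors = vectors

    @staticmethod
    def _cosine(u, v):
        # cosine similarity = dot product / (norm(u) * norm(v))
        dot = sum(a * b for a, b in zip(u, v))
        norm_u = math.sqrt(sum(a * a for a in u))
        norm_v = math.sqrt(sum(b * b for b in v))
        return dot / (norm_u * norm_v)

    def most_similar_to_given(self, entity1, entities_list):
        # Both arguments are keys; the vectors come from this instance (self).
        query = self.vectors[entity1]
        return max(entities_list, key=lambda k: self._cosine(query, self.vectors[k]))


kv = ToyVectors({"q": [2, 3], "a": [1, 5], "b": [-2, -3], "c": [12, 43]})
print(kv.most_similar_to_given("q", ["a", "b", "c"]))  # c

try:
    # Called on the class: no instance is bound to `self`.
    ToyVectors.most_similar_to_given(entity1="q", entities_list=["a", "b", "c"])
except TypeError as err:
    print(err)  # ... missing 1 required positional argument: 'self'
```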

If you want to build your own instance, you may be able to use something like the `add()` method (https://radimrehurek.com/gensim/models/keyedvectors.html#gensim.models.keyedvectors.Word2VecKeyedVectors.add), for example, with your existing `some_vector_list`, but also some new list of the keys you want to be able to look up:

```
from gensim.models.keyedvectors import Word2VecKeyedVectors
my_vecs = Word2VecKeyedVectors(vector_size=2)
my_vecs.add(some_vector_keys, some_vector_list)
```

(Of course once you have this subset, you can just use a normal `most_similar()` – no need for the `most_similar_to_given()` variant.)

- Gordon

emmanuel chappat

May 23, 2019, 12:30:26 AM

to Gensim

Hi Gordon,

Thanks, I was able to add the vectors manually like you've suggested:

```
import numpy as np
from gensim.models.keyedvectors import Word2VecKeyedVectors

model = Word2VecKeyedVectors(vector_size=2)
some_vector_list = np.array([[1, 5], [-2, -3], [12, 43]])
model.add(['a', 'b', 'c'], some_vector_list)
```

This, however, won't work for my use case, because:

1. I don't have string IDs for the embeddings (I could get around that by generating unique strings, but that doesn't seem ideal).

2. My most-similar query is based on a given embedding vector, not a string key.

Do you know of any helper function in the Gensim code that, given a vector, computes its cosine similarity against a list of N vectors?
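For reference, this is the quantity being asked for: the cosine similarity between a query vector $\mathbf{q}$ and each of $N$ candidate vectors $\mathbf{v}_i$, worked through for the example vectors from the first message, with $\mathbf{q} = (2, 3)$.

$$
\cos(\mathbf{q}, \mathbf{v}_i) = \frac{\mathbf{q} \cdot \mathbf{v}_i}{\lVert \mathbf{q} \rVert \, \lVert \mathbf{v}_i \rVert}, \qquad i = 1, \dots, N
$$

$$
\begin{aligned}
\cos(\mathbf{q}, (1, 5)) &= \frac{17}{\sqrt{13}\,\sqrt{26}} \approx 0.925 \\
\cos(\mathbf{q}, (-2, -3)) &= \frac{-13}{\sqrt{13}\,\sqrt{13}} = -1 \\
\cos(\mathbf{q}, (12, 43)) &= \frac{153}{\sqrt{13}\,\sqrt{1993}} \approx 0.951
\end{aligned}
$$

So $(12, 43)$ ranks first, $(1, 5)$ second, and $(-2, -3)$, which points in exactly the opposite direction, last.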

Gordon Mohr

May 23, 2019, 2:58:11 PM

to Gensim

There's no utility function for exactly what you want, but you can model your own code after gensim's source for `most_similar()`:

https://github.com/RaRe-Technologies/gensim/blob/8741d1c674e505a2268bc9ef08e916f5c8a7e403/gensim/models/keyedvectors.py#L490
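A minimal sketch of that idea in plain NumPy, assuming the candidates are rows of a 2-D array with no all-zero rows. `most_similar_by_vector` is a hypothetical helper, not part of gensim's API: it unit-normalizes the query and the candidates so that one matrix-vector product yields all N cosine similarities, then sorts them. It returns row indices, so no string keys are needed.

```python
import numpy as np


def most_similar_by_vector(query, vectors, topn=10):
    """Rank rows of `vectors` by cosine similarity to `query`.

    Hypothetical helper (not gensim's API). Assumes no all-zero rows.
    Returns a list of (row_index, similarity) pairs, most similar first.
    """
    query = np.asarray(query, dtype=np.float64)
    vectors = np.asarray(vectors, dtype=np.float64)

    # Unit-normalize the query and each candidate row, so that a plain
    # dot product equals cosine similarity.
    query_unit = query / np.linalg.norm(query)
    vectors_unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)

    # One matrix-vector product gives all N similarities at once.
    sims = vectors_unit @ query_unit

    # Row indices of the `topn` largest similarities, highest first.
    topn = min(topn, len(sims))
    best = np.argsort(-sims)[:topn]
    return [(int(i), float(sims[i])) for i in best]


some_vector_list = np.array([[1, 5], [-2, -3], [12, 43]])
query = np.array([2, 3])
print(most_similar_by_vector(query, some_vector_list, topn=3))
```

Unlike `most_similar()`, nothing is excluded from the results, since the query is a raw vector rather than a key in the set. For very large N, `np.argpartition` can select the top `topn` candidates without fully sorting all similarities.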

- Gordon

emmanuel chappat

May 24, 2019, 10:48:36 AM

to Gensim

Got it, thanks Gordon.
