averaging word vectors in gensim

Donna Hoffman

Jan 30, 2018, 12:15:58 PM

to gensim

Does gensim have a method that could give (weighted) averages of the word2vec vectors for each phrase/sentence? That would be great, especially if weights were tf-idf or idf.

Gordon Mohr

Jan 30, 2018, 1:05:51 PM

to gensim

There's no such single method in Word2VecKeyedVectors.

For its own purposes KeyedVectors does a simple (non-weighted) average of multiple vectors as a one-liner inside a few of its other methods. Some examples:

https://github.com/RaRe-Technologies/gensim/blob/1f357a7c4db27ea9c946dbc6942d82b00815a55e/gensim/models/keyedvectors.py#L510

https://github.com/RaRe-Technologies/gensim/blob/1f357a7c4db27ea9c946dbc6942d82b00815a55e/gensim/models/keyedvectors.py#L736

https://github.com/RaRe-Technologies/gensim/blob/1f357a7c4db27ea9c946dbc6942d82b00815a55e/gensim/models/keyedvectors.py#L853
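For illustration only, here is a minimal sketch of that kind of plain (non-weighted) average. It is not the gensim code at the links above. The helper name `average_vector` and the toy dictionary are made up for this example; the dictionary stands in for a trained KeyedVectors object, on the assumption that both support `word in vectors` and `vectors[word]`.

```python
import numpy as np

def average_vector(tokens, vectors):
    """Return the plain mean of the vectors of all known tokens, or None."""
    # Skip tokens that have no vector (out-of-vocabulary words).
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return None
    # Element-wise mean across the stacked word vectors.
    return np.mean(known, axis=0)

# Toy 2-d vectors standing in for a trained model's vectors (hypothetical data).
toy_vectors = {
    "good": np.array([1.0, 0.0]),
    "movie": np.array([0.0, 1.0]),
}

sentence_vec = average_vector(["good", "movie", "unseen"], toy_vectors)
```

Returning None for a sentence with no known words is just one possible choice; a zero vector or an exception may suit other applications better.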

- Gordon

Donna Hoffman

Jan 30, 2018, 1:10:15 PM

to gensim

Wow, Gordon, this is great! And that code actually addresses another question I had but hadn't posted yet. Thanks! :)

Radim Řehůřek

Jan 31, 2018, 1:03:38 PM

to gensim

Let me add that we've had good experience with weighted averaging too, for example with weights set to IDF (inverse document frequency) as a simple proxy for "term importance". Other, more domain-, thesaurus-, or NER-specific weightings are worth considering too, depending on your application's goals.
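A rough sketch of IDF-weighted averaging, under stated assumptions: the IDF formula used here, log(N / document frequency), is only one common variant; the helper names `idf_weights` and `weighted_average_vector` and the toy data are hypothetical; and the plain dictionary again stands in for a trained KeyedVectors object.

```python
import math
from collections import Counter

import numpy as np

def idf_weights(tokenized_docs):
    """Map each word to log(N / df), where df counts documents containing it."""
    n_docs = len(tokenized_docs)
    doc_freq = Counter()
    for doc in tokenized_docs:
        doc_freq.update(set(doc))  # count each word at most once per document
    return {word: math.log(n_docs / df) for word, df in doc_freq.items()}

def weighted_average_vector(tokens, vectors, weights):
    """Weighted mean of the vectors of known tokens; None if nothing usable."""
    # Keep only tokens that have both a vector and a positive weight.
    pairs = [(vectors[t], weights[t]) for t in tokens
             if t in vectors and weights.get(t, 0.0) > 0.0]
    if not pairs:
        return None
    vecs, ws = zip(*pairs)
    return np.average(vecs, axis=0, weights=ws)

# Toy corpus and 2-d vectors (hypothetical data).
docs = [["good", "movie"], ["bad", "movie"], ["good", "plot"]]
toy_vectors = {
    "bad": np.array([1.0, 0.0]),
    "movie": np.array([0.0, 1.0]),
}

idf = idf_weights(docs)
vec = weighted_average_vector(["bad", "movie"], toy_vectors, idf)
```

With this formula a word that occurs in every document gets weight 0 and is skipped, so the rarer word "bad" pulls the result toward its own vector.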

-rr

Ivan Menshikh

Jan 31, 2018, 11:31:14 PM

to gensim

Hello Donna,

I also highly recommend looking at this paper (page 4, algorithm 1); IMO this is one of the best ways to combine word vectors.

Radim Řehůřek

Feb 1, 2018, 4:12:55 AM

to gensim

Nice paper Ivan! Do we have this algo in Gensim (or any plans)?

-rr

Donna Hoffman

Feb 1, 2018, 12:40:32 PM

to gensim

Very nice paper! Thanks for the pointer.

Ivan Menshikh

Feb 2, 2018, 4:27:54 AM

to gensim

If we extend this idea, it could probably be a simple, nice, and useful incubator project (I have been thinking about it for a long time).

Ivan Menshikh

Feb 6, 2018, 11:45:54 PM

to gensim

Update: I created a feature request (it seems I already have a student for this task).
