averaging word vectors in gensim


Donna Hoffman

Jan 30, 2018, 12:15:58 PM
to gensim
Does gensim have a method that could give (weighted) averages of the word2vec vectors for each phrase/sentence? That would be great, especially if weights were tf-idf or idf.

Gordon Mohr

Jan 30, 2018, 1:05:51 PM
to gensim

Donna Hoffman

Jan 30, 2018, 1:10:15 PM
to gensim
Wow, Gordon, this is great! And that code actually addresses another question I had but hadn't posted yet. Thanks!  :)

Radim Řehůřek

Jan 31, 2018, 1:03:38 PM
to gensim
Let me add that we've had good experience with weighted averaging too, for example with IDF (inverse document frequency) weights as a simple proxy for "term importance". Other, more domain-, thesaurus-, or NER-specific weightings are worth considering too, depending on your application's goals.

-rr
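
For readers who want to try the IDF-weighted averaging described above, here is a minimal sketch. It is not the code posted earlier in this thread and not a built-in gensim method. The helper names `compute_idf` and `weighted_average`, the smoothed IDF formula, and the toy two-dimensional vectors are all assumptions made for illustration. The `vectors` argument can be any mapping that supports `token in vectors` and `vectors[token]`; a trained gensim `KeyedVectors` object should work the same way, but the sketch uses a plain dict so that it runs without a trained model.

```python
import math
from collections import Counter

import numpy as np


def compute_idf(tokenized_docs):
    """Smoothed IDF per token: log((1 + N) / (1 + df)) + 1 (one common variant)."""
    n_docs = len(tokenized_docs)
    # Count each token once per document to get document frequencies.
    doc_freq = Counter(tok for doc in tokenized_docs for tok in set(doc))
    return {tok: math.log((1 + n_docs) / (1 + df)) + 1.0
            for tok, df in doc_freq.items()}


def weighted_average(tokens, vectors, weights=None, default_weight=1.0):
    """Weighted mean of the vectors of all tokens found in `vectors`.

    Tokens without a vector are skipped. With `weights=None` this is a
    plain mean. Returns None if no token has a vector or all weights are 0.
    """
    known = [tok for tok in tokens if tok in vectors]
    if not known:
        return None
    matrix = np.array([vectors[tok] for tok in known], dtype=float)
    if weights is None:
        w = np.ones(len(known))
    else:
        w = np.array([weights.get(tok, default_weight) for tok in known],
                     dtype=float)
    if w.sum() == 0:
        return None
    return np.average(matrix, axis=0, weights=w)


# Toy corpus and hypothetical 2-d "word vectors" standing in for a real model.
docs = [
    ["cats", "chase", "mice"],
    ["dogs", "chase", "cats"],
    ["mice", "chase", "cheese"],
]
toy_vectors = {
    "cats": np.array([1.0, 0.0]),
    "chase": np.array([0.0, 1.0]),
    "mice": np.array([1.0, 1.0]),
}

idf = compute_idf(docs)
plain_vec = weighted_average(["cats", "chase", "mice"], toy_vectors)
sentence_vec = weighted_average(["cats", "chase", "mice", "quickly"],
                                toy_vectors, weights=idf)
```

Because "chase" occurs in every toy document, it gets the lowest IDF and pulls the sentence vector less than "cats" and "mice" do. Passing tf-idf weights instead only requires building a different `weights` dict per sentence.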

Ivan Menshikh

Jan 31, 2018, 11:31:14 PM
to gensim
Hello Donna,

I also highly recommend looking at this paper (page 4, Algorithm 1). IMO it is one of the best ways to combine word vectors.

Radim Řehůřek

Feb 1, 2018, 4:12:55 AM
to gensim
Nice paper, Ivan! Do we have this algorithm in Gensim (or any plans to add it)?

-rr

Donna Hoffman

Feb 1, 2018, 12:40:32 PM
to gensim
Very nice paper! Thanks for the pointer.

Ivan Menshikh

Feb 2, 2018, 4:27:54 AM
to gensim
If we extend this idea, it could probably become a simple, nice, and useful incubator project (I have been thinking about it for a long time).

Ivan Menshikh

Feb 6, 2018, 11:45:54 PM
to gensim
UPD: I created a feature request (it seems I already have a student for this task).