word2vec support for phrases


Amir H. Jadidinejad

Sep 3, 2014, 4:16:39 AM
to gen...@googlegroups.com
Hi All,

I'm playing with word2vec in gensim to learn more about distributed representation models, and it's interesting. First, I want to thank all the contributors.

I want to calculate the semantic relatedness between two phrases or sentences. What is the easiest way?

I can download the GoogleNews pre-trained model and calculate word similarity, but what about phrases or sentences?
Specifically, how can I map a phrase to an appropriate vector using the word2vec package?

I have seen the following papers in this field, but I am really looking for an environment in which I can evaluate and learn these models in practice:
  • Distributed Representations of Sentences and Documents
  • Distributed Representations of Words and Phrases and their Compositionality
Any comments and suggestions are welcome.

igor.b...@ucdconnect.ie

Sep 4, 2014, 10:50:37 AM
to gen...@googlegroups.com
In the original code, there's a word2phrase preprocessing step you can run, which essentially extracts the most common bigrams from the text. "language model" becomes "language_model", and training is then performed as usual. You'll now have a vector for the phrase "language model".
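
To make the preprocessing step concrete, here is a minimal pure-Python sketch of the idea. It is not the original word2phrase code: the function name `merge_bigrams`, the toy corpus, and the parameter values are all made up for illustration, and the score (discounted bigram count divided by the product of the unigram counts, scaled by corpus size) follows my understanding of how that tool ranks bigrams.

```python
from collections import Counter

def merge_bigrams(sentences, min_count=1, threshold=3.0):
    """Join high-scoring adjacent word pairs with '_' (illustrative sketch)."""
    unigrams = Counter(w for s in sentences for w in s)
    bigrams = Counter(pair for s in sentences for pair in zip(s, s[1:]))
    total = sum(unigrams.values())

    def score(a, b):
        # Discounted co-occurrence count, normalised by the unigram counts.
        return (bigrams[(a, b)] - min_count) / (unigrams[a] * unigrams[b]) * total

    merged = []
    for s in sentences:
        out, i = [], 0
        while i < len(s):
            # Greedy left-to-right pass: merge a pair if it scores above threshold.
            if i + 1 < len(s) and score(s[i], s[i + 1]) > threshold:
                out.append(s[i] + "_" + s[i + 1])
                i += 2
            else:
                out.append(s[i])
                i += 1
        merged.append(out)
    return merged

# Toy corpus, assumed for illustration only.
corpus = [
    "a language model assigns probabilities to text".split(),
    "we train a language model on news".split(),
    "the language model is small".split(),
    "a model of speech".split(),
]
phrased = merge_bigrams(corpus)
print(phrased[0])
```

The merged token lists can then be fed to word2vec training like any other tokenised corpus, so "language_model" gets its own vector. On a real corpus the threshold and minimum count would need tuning.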

Sentences can be a bit tricky. Depending on their length, you can get far with just an element-wise sum of the word vectors: instead of comparing individual words, you compare the summed word vectors of each sentence. This works well for short sentences, but breaks down for longer ones.
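
As a rough sketch of the summing approach: the snippet below uses tiny made-up 3-dimensional vectors in a hypothetical `vectors` dictionary rather than a trained model, and compares the summed sentence vectors with cosine similarity (my choice of measure here, since it is what word similarity is usually computed with).

```python
import math

# Hypothetical 3-dimensional word vectors, invented for illustration only;
# a trained model would supply a few hundred dimensions per word.
vectors = {
    "cats":   [0.9, 0.1, 0.0],
    "dogs":   [0.8, 0.2, 0.1],
    "chase":  [0.1, 0.9, 0.0],
    "mice":   [0.7, 0.0, 0.2],
    "stocks": [0.0, 0.1, 0.9],
    "fell":   [0.1, 0.2, 0.8],
}

def sentence_vector(tokens):
    """Element-wise sum of the vectors of all in-vocabulary tokens."""
    dims = len(next(iter(vectors.values())))
    total = [0.0] * dims
    for token in tokens:
        if token in vectors:  # skip out-of-vocabulary words
            total = [x + y for x, y in zip(total, vectors[token])]
    return total

def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

a = sentence_vector("cats chase mice".split())
b = sentence_vector("dogs chase cats".split())
c = sentence_vector("stocks fell".split())

print(cosine(a, b))  # two sentences about animals
print(cosine(a, c))  # unrelated topics
```

Because the sum ignores word order and every extra word adds noise, longer sentences tend to drift toward similar vectors, which is one way to see why this breaks down as length grows.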

Adam Smith

Sep 4, 2014, 6:08:01 PM
to gen...@googlegroups.com
Looks like the first implementation of `phrase2vec` was recently released on github: https://github.com/zseymour/phrase2vec/commits/master

Adam

Amir H. Jadidinejad

Sep 4, 2014, 9:16:04 PM
to gen...@googlegroups.com
Dear Friends,

What's the difference between phrase2vec and word2phrase?
Unfortunately, there is no documentation from a practical point of view.

Thanks.
Amir

Radim Řehůřek

Sep 5, 2014, 3:27:50 AM
to gen...@googlegroups.com
On Friday, September 5, 2014 12:08:01 AM UTC+2, Adam Smith wrote:
Looks like the first implementation of `phrase2vec` was recently released on github: https://github.com/zseymour/phrase2vec/commits/master

It was also ported to gensim some time ago, but the pull request is not finished: 

Your help is welcome!

Radim

Ramya Y S

Jul 10, 2017, 7:22:39 AM
to gensim
Hi! 
Were you able to compare phrases successfully? Also, which is better for this, word2vec or doc2vec? Appreciate a quick reply!

Thanks and Regards