Showing phrase vectors using t-SNE - is that possible?

Lior Magen

Apr 6, 2017, 6:51:40 AM
to gensim
I would like to plot a graph that shows the relative distance between, for example, the words "sugar" and "sugar_free", where "sugar_free" is a token produced by the Phrases class.

When I use t-SNE to visualize a Word2Vec model it works perfectly, and the locations and semi-clusters it creates make sense, but when I run it on Phrases output it becomes one big mess.

An example of the messy plot that I'm getting with Phrases:

And the plot that I get with Word2Vec (I've circled similar words that are located next to each other, as they should be):

So what am I missing here? 


Gordon Mohr

Apr 6, 2017, 2:13:29 PM
to gensim
What do you mean by "when I run it [t-SNE] on a Phrases"? 

Phrases is typically just a preprocessing step, which combines some of a corpus's tokens into multi-word tokens. When Word2Vec training is then run on that, those tokens, along with all the surviving single-word tokens, get trained word-vectors. You'd still run t-SNE on those Word2Vec results. 
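To make the preprocessing role concrete, here is a minimal sketch in plain Python. It is a hand-rolled stand-in, not gensim's Phrases: the helper `merge_phrases` and the fixed `phrase_pairs` set are hypothetical, whereas the real class decides which pairs to merge from corpus statistics. The point is only that the merged token ends up in the vocabulary next to the surviving single-word tokens, and both would then get vectors from the same Word2Vec training run.

```python
# Hand-rolled stand-in for the phrase-merging step (not gensim's scoring logic).

def merge_phrases(tokens, phrase_pairs, delimiter="_"):
    """Join adjacent tokens that form a known pair into one token."""
    merged = []
    i = 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) in phrase_pairs:
            merged.append(tokens[i] + delimiter + tokens[i + 1])
            i += 2  # skip both halves of the merged pair
        else:
            merged.append(tokens[i])
            i += 1
    return merged


# Assumed toy data: in practice the pairs would be learned from the corpus.
phrase_pairs = {("sugar", "free")}
corpus = [
    ["i", "like", "sugar", "free", "gum"],
    ["too", "much", "sugar", "is", "bad"],
]

merged_corpus = [merge_phrases(sentence, phrase_pairs) for sentence in corpus]
vocabulary = sorted({token for sentence in merged_corpus for token in sentence})

print(merged_corpus)
print(vocabulary)
```

Both "sugar" and "sugar_free" appear in `vocabulary`, so a model trained on `merged_corpus` would learn a separate vector for each.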

If that's what you're doing, are you sure the model based on a Phrases-changed corpus is getting as much training effort as the unigram corpus? (The lack of any structure almost suggests the token-vectors are still at their randomly-initialized locations.) Perhaps also the change in vocabulary after Phrases-application requires more tuning of t-SNE's metaparameters?

- Gordon

er.pra...@gmail.com

Apr 6, 2017, 9:19:40 PM
to gensim
Hello,

I suspect you are passing the Phrases output directly to Word2Vec. In that case Word2Vec won't be able to train on it, because what you are passing is a one-shot generator, while Word2Vec needs an iterable it can go over more than once. See this thread for more info: https://groups.google.com/forum/#!topic/gensim/XWQ8fPMFSi0
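The difference can be shown in plain Python, without gensim. This is only a sketch: `sentence_generator`, `RestartableCorpus` and `count_per_pass` are hypothetical names made up for the illustration. Word2Vec scans the corpus once to build the vocabulary and then again for each training pass, so a source that is empty on the second pass leaves nothing to train on, which would fit vectors that look randomly placed.

```python
# One-shot generator vs. restartable iterable (plain Python, no gensim needed).

def sentence_generator(sentences):
    """Hypothetical one-shot source: yields each tokenized sentence once."""
    for sentence in sentences:
        yield sentence


class RestartableCorpus:
    """Hypothetical wrapper: every call to __iter__ starts a fresh pass."""

    def __init__(self, sentences):
        self.sentences = sentences

    def __iter__(self):
        for sentence in self.sentences:
            yield sentence


def count_per_pass(corpus, passes=2):
    """Count how many sentences each successive pass over `corpus` sees."""
    return [sum(1 for _ in corpus) for _ in range(passes)]


toy = [["sugar_free", "gum"], ["sugar", "cane"], ["sugar_free", "soda"]]

generator_counts = count_per_pass(sentence_generator(toy))
restartable_counts = count_per_pass(RestartableCorpus(toy))

print(generator_counts)    # the second pass over the generator sees nothing
print(restartable_counts)  # every pass sees the full corpus
```

Wrapping the phrase-transformed corpus in an object whose `__iter__` restarts the stream (or materializing it as a list, if it fits in memory) gives every pass the full data.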

Lior Magen

Apr 9, 2017, 5:14:36 AM
to gensim
Perfectly resolved my issue, thank you so much.