What do you mean by "when I run it [t-SNE] on a Phrases"?
Phrases is typically just a preprocessing step, which combines some of a corpus's tokens into multi-word tokens. When Word2Vec training is then run on that transformed corpus, those combined tokens, along with all the surviving single-word tokens, get trained word-vectors. You'd still run t-SNE on those Word2Vec results, as sketched below.
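Roughly, the usual pipeline looks like the following minimal sketch. It assumes a toy `sentences` corpus (a list of token lists) and gensim-4-style parameter names; the actual values (`min_count`, `threshold`, `vector_size`, `perplexity`, etc.) are placeholders you'd tune for a real corpus:

```python
from gensim.models import Word2Vec
from gensim.models.phrases import Phrases, Phraser
from sklearn.manifold import TSNE

# Illustrative toy corpus: a list of already-tokenized sentences.
sentences = [["new", "york", "is", "big"], ["machine", "learning", "is", "fun"]]

# 1. Phrases only rewrites the corpus, merging frequent bigrams such as
#    "new york" into single tokens like "new_york".
phrases = Phrases(sentences, min_count=1, threshold=1)
bigram = Phraser(phrases)
phrased_corpus = [bigram[sent] for sent in sentences]

# 2. Word2Vec then trains vectors for whatever tokens survive --
#    the merged multi-word tokens and the remaining unigrams alike.
model = Word2Vec(phrased_corpus, vector_size=100, min_count=1, epochs=50)

# 3. t-SNE runs on those trained Word2Vec vectors, not on Phrases itself.
vectors = model.wv[model.wv.index_to_key]
coords = TSNE(n_components=2, perplexity=2).fit_transform(vectors)
```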
If that's what you're doing, are you sure the model based on the Phrases-transformed corpus is getting as much training effort as the unigram model? (The lack of any structure almost suggests the token-vectors are still at their randomly-initialized locations.) Perhaps the change in vocabulary after applying Phrases also requires more tuning of t-SNE's metaparameters?
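A rough way to check both suspicions, continuing from the sketch above and assuming a realistically-sized model rather than that toy corpus: undertrained vectors tend to give nonsensical nearest neighbors, and t-SNE output is quite sensitive to perplexity, so it's worth a quick sweep.

```python
from sklearn.manifold import TSNE

# Sanity-check training: gibberish neighbors for a common token hint
# that the vectors haven't moved far from their random initialization.
probe = model.wv.index_to_key[0]
print(model.wv.most_similar(probe))

# Sweep t-SNE perplexity (it must stay below the number of vectors)
# and inspect each projection to see whether structure emerges.
for perplexity in (5, 15, 30):
    coords = TSNE(n_components=2, perplexity=perplexity,
                  init="pca", random_state=0).fit_transform(vectors)
    # plot/inspect `coords` for each perplexity value
```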
- Gordon