What do you mean by "when I run it [t-SNE] on a Phrases"?
Phrases is typically just a preprocessing step, which combines some of a corpus's tokens into multi-word tokens. When Word2Vec training is then run on that transformed corpus, those combined tokens, along with all the surviving single-word tokens, get trained word-vectors. You'd still run t-SNE on those Word2Vec results, as sketched below.
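Roughly, the usual pipeline looks like the following minimal sketch. It assumes a toy `sentences` corpus (a list of token lists) and gensim-4-style parameter names; the actual values (`min_count`, `threshold`, `vector_size`, `perplexity`, etc.) are placeholders you'd tune for a real corpus:

```python
from gensim.models import Word2Vec
from gensim.models.phrases import Phrases, Phraser
from sklearn.manifold import TSNE

# Illustrative toy corpus: a list of already-tokenized sentences.
sentences = [["new", "york", "is", "big"], ["machine", "learning", "is", "fun"]]

# 1. Phrases only rewrites the corpus, merging frequent bigrams such as
#    "new york" into single tokens like "new_york".
phrases = Phrases(sentences, min_count=1, threshold=1)
bigram = Phraser(phrases)
phrased_corpus = [bigram[sent] for sent in sentences]

# 2. Word2Vec then trains vectors for whatever tokens survive --
#    the merged multi-word tokens and the remaining unigrams alike.
model = Word2Vec(phrased_corpus, vector_size=100, min_count=1, epochs=50)

# 3. t-SNE runs on those trained Word2Vec vectors, not on Phrases itself.
vectors = model.wv[model.wv.index_to_key]
coords = TSNE(n_components=2, perplexity=2).fit_transform(vectors)
```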
If that's what you're doing, are you sure the model based on the Phrases-transformed corpus is getting as much training effort as the unigram model? (The lack of any structure almost suggests the token-vectors are still at their randomly-initialized locations.) Perhaps the change in vocabulary after applying Phrases also requires more tuning of t-SNE's metaparameters?
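A rough way to check both suspicions, continuing from the sketch above and assuming a realistically-sized model rather than that toy corpus: undertrained vectors tend to give nonsensical nearest neighbors, and t-SNE output is quite sensitive to perplexity, so it's worth a quick sweep.

```python
from sklearn.manifold import TSNE

# Sanity-check training: gibberish neighbors for a common token hint
# that the vectors haven't moved far from their random initialization.
probe = model.wv.index_to_key[0]
print(model.wv.most_similar(probe))

# Sweep t-SNE perplexity (it must stay below the number of vectors)
# and inspect each projection to see whether structure emerges.
for perplexity in (5, 15, 30):
    coords = TSNE(n_components=2, perplexity=perplexity,
                  init="pca", random_state=0).fit_transform(vectors)
    # plot/inspect `coords` for each perplexity value
```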
- Gordon