If I understand the topic2vec paper properly, they're doing LDA first, completely independent of word-vectors. At the end of that process, they have LDA topics, and scores indicating words most associated with certain topics.
They then do a word2vec training, on the same texts, but every place where word2vec would normally attempt to predict a word, they *also* try to predict the LDA topic most associated with the word.
So let's say a training sentence was "The cat leapt to the branch", and that 'cat' was found most-associated with LDA 'topic_3', 'leapt' with 'topic_27', and 'branch' with 'topic_11'. Their word2vec training then works roughly as if, instead of training only on the original raw sentence "The cat leapt to the branch", it trained on all the alternate/expanded variants:
The cat leapt to the branch
The topic_3 leapt to the branch
The cat topic_27 to the branch
The cat leapt to the topic_11
(This may not be exactly right, but it's the same gist, so you might get similar results via a mere preprocessing step on the corpus, without modifying the word2vec code. The Yahoo queryCategorizr paper also seems to be doing something very similar.)
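That preprocessing step could be as simple as the following sketch. (The `word_to_topic` mapping here is a hypothetical stand-in for whatever word-to-top-topic association the LDA step produced.)

```python
# Sketch: for each token that has a most-associated LDA topic, emit an extra
# copy of the sentence with that token replaced by its topic pseudoword.
# The mapping below is hypothetical, for illustration only.
word_to_topic = {'cat': 'topic_3', 'leapt': 'topic_27', 'branch': 'topic_11'}

def expand_sentence(tokens, word_to_topic):
    variants = [list(tokens)]  # always keep the original sentence
    for i, tok in enumerate(tokens):
        topic = word_to_topic.get(tok)
        if topic is not None:
            variant = list(tokens)
            variant[i] = topic  # swap one word for its topic pseudoword
            variants.append(variant)
    return variants

sentence = ['the', 'cat', 'leapt', 'to', 'the', 'branch']
for v in expand_sentence(sentence, word_to_topic):
    print(' '.join(v))
```

Feeding all the emitted variants to an unmodified word2vec training loop would then interleave topic pseudowords into the same contexts as the words themselves.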
At the end of the process, all the `topic_#` pseudowords wind up with word-vectors, and their relative distances to each other and to ordinary words may be useful in the same ways word-vectors are.
A Doc2Vec-like way to approximate the same effect would be to supply the top LDA topics as extra tags on TaggedDocument examples. So instead of a vanilla Doc2Vec example like:
TaggedDocument(tags=['doc_7'], words=['the', 'cat', 'leapt', 'to', 'the', 'branch'])
...you'd have instead...
TaggedDocument(tags=['doc_7', 'topic_3', 'topic_27', 'topic_11'], words=['the', 'cat', 'leapt', 'to', 'the', 'branch'])
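Constructing such examples is straightforward; here's a sketch using a stand-in namedtuple (gensim's own TaggedDocument is essentially the same namedtuple, so in real use you'd import it from `gensim.models.doc2vec` instead; the doc-to-topics mapping is hypothetical):

```python
from collections import namedtuple

# Stand-in for gensim's TaggedDocument, which is essentially this namedtuple;
# swap in `from gensim.models.doc2vec import TaggedDocument` in real use.
TaggedDocument = namedtuple('TaggedDocument', ['words', 'tags'])

def make_example(doc_id, tokens, top_topics):
    # Combine the usual per-document tag with pseudoword tags for the
    # document's top LDA topics (topic ids here are illustrative).
    tags = [doc_id] + ['topic_%d' % t for t in top_topics]
    return TaggedDocument(words=tokens, tags=tags)

example = make_example('doc_7', ['the', 'cat', 'leapt', 'to', 'the', 'branch'], [3, 27, 11])
print(example.tags)  # ['doc_7', 'topic_3', 'topic_27', 'topic_11']
```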
Hopefully, the various 'topic_#' tags should again arrange in some useful constellation. If using a Doc2Vec mode that co-trains words in the same vector-space (`dm=1` or `dm=0, dbow_words=1`), they might also arrange in a way that renders them comparable to words. And per the claims of the topic2vec paper, those words closest to topic-vectors might be more helpful in understanding the subtle differences between LDA topics.
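Once such a model is trained, finding the words nearest each topic-vector is just a cosine-similarity ranking over the shared vector-space; with gensim you'd typically call `model.wv.most_similar('topic_3')`, which does roughly the following (toy 2-d vectors stand in for a trained model's real vectors):

```python
import math

def cosine(u, v):
    # Plain cosine similarity between two same-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy vectors standing in for a trained model's word/topic vectors.
vectors = {
    'topic_3': [1.0, 0.1],
    'cat':     [0.9, 0.2],
    'branch':  [0.1, 1.0],
}

def nearest(key, vectors, topn=2):
    # Rank every other entry by cosine similarity to the given pseudoword.
    sims = [(k, cosine(vectors[key], v)) for k, v in vectors.items() if k != key]
    return sorted(sims, key=lambda kv: -kv[1])[:topn]

print(nearest('topic_3', vectors))  # 'cat' ranks closest to 'topic_3'
```

In this toy setup 'cat' lands nearest 'topic_3', which is the kind of topic-to-word neighbor listing the topic2vec paper suggests is more interpretable than LDA's raw per-topic word probabilities.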
Not sure any of these approaches would offer great results – just that they're plausible, given what's suggested by the topic2vec and queryCategorizr results.
- Gordon