Hi Lev,
It seems that, regardless of how I set the document-topic prior (alpha), after manually setting the topic-word prior (eta) to a non-uniform, in fact highly peaked, distribution over some hand-picked tokens (6 topics, with 40-125 higher-weighted tokens in each), the perplexity reported by the INFO-level logging while fitting LDA simply oscillates in a very regular pattern in the best of cases, and in the worst cases actually drops. Despite the strongly peaked topic-word prior, the inferred topics still reshuffle the seed words across all topics. This is frustrating, because the complete reshuffling defeats the purpose of trying to guide the topic inference via the priors. I would be happy to accept that the inference over the data set simply prefers this and overpowers the priors, but with the perplexity behaving as erratically as it is, I am instead suspicious that the optimization is not converging at all.
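
For concreteness, here is a stripped-down sketch of how I am building the prior (gensim's LdaModel, with eta passed as a num_topics x num_terms array; the corpus, seed lists and weight values below are just placeholders standing in for my real ones):

    import logging
    import numpy as np
    from gensim.corpora import Dictionary
    from gensim.models import LdaModel

    # INFO-level logging is where I read the perplexity estimates off
    logging.basicConfig(format="%(asctime)s : %(levelname)s : %(message)s",
                        level=logging.INFO)

    # Placeholder corpus and seed lists -- my real setup has 6 topics
    # with 40-125 seed tokens each.
    tokenized_docs = [["price", "market", "stock", "goal"],
                      ["goal", "match", "league", "price"]]
    seed_words = [["price", "market", "stock"],   # topic 0 (placeholder)
                  ["goal", "match", "league"]]    # topic 1 (placeholder), etc.

    num_topics = 6
    dictionary = Dictionary(tokenized_docs)
    corpus = [dictionary.doc2bow(doc) for doc in tokenized_docs]

    # Highly peaked eta: small base weight everywhere, a much larger
    # weight on each topic's seed tokens.
    base_weight, seed_weight = 0.01, 10.0   # placeholder values
    eta = np.full((num_topics, len(dictionary)), base_weight)
    for k, words in enumerate(seed_words):
        for w in words:
            if w in dictionary.token2id:
                eta[k, dictionary.token2id[w]] = seed_weight

    lda = LdaModel(corpus, num_topics=num_topics, id2word=dictionary,
                   alpha="symmetric", eta=eta, passes=20, eval_every=1)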
Do you have any intuition/experience/guidance on how to do this better, or what the issue might be?
I suspect a numerical stability issue, so I will try making the priors smoother/less peaked, though my fear is that with very multi-modal priors (as is the case here) the variational inference may be failing altogether. Or is this concern misplaced?
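
By "smoother" I mean shrinking the seed-to-base ratio in the eta construction above, e.g. something along these lines (values again placeholders):

    # Same construction as above, but with a much milder boost for seed tokens
    base_weight, seed_weight = 0.1, 1.0   # roughly 10x instead of roughly 1000x
    eta = np.full((num_topics, len(dictionary)), base_weight)
    for k, words in enumerate(seed_words):
        for w in words:
            if w in dictionary.token2id:
                eta[k, dictionary.token2id[w]] = seed_weight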
Kind regards,
Ilan