I've seen a few posts here discussing the situation where perplexity counter-intuitively increases as the number of topics increases, but I haven't seen an explanation of why this would occur. Is anyone able to explain why this happens?
I am currently running batch LDA on a very large corpus (1 million documents) with short documents (20-100 words each), and finding that per-word perplexity on a holdout sample of 10,000 documents increases as the number of topics increases (tested topic counts: 1-10, 20, 50, 100, 200, 500). I have tried this with both a symmetric alpha and an auto-tuned alpha, and the situation persists.
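For reference, this is how I understand the per-word perplexity to be derived from the held-out log-likelihood bound; a minimal sketch (the function name and the toy numbers are mine, not from any library):

```python
import math

def per_word_perplexity(log_likelihood_bound, num_tokens):
    """Per-word perplexity from a held-out log-likelihood (lower) bound.

    log_likelihood_bound: variational lower bound on the log-likelihood
        (natural log) of the holdout set, e.g. as reported by an LDA
        implementation's bound/evaluation routine.
    num_tokens: total number of word tokens in the holdout documents.

    Lower perplexity = better fit, so a rise with more topics means the
    bound per token is getting worse.
    """
    return math.exp(-log_likelihood_bound / num_tokens)

# Toy numbers for illustration only:
# -250000 / 50000 = -5, so perplexity = exp(5) ~ 148.4
print(per_word_perplexity(-250000.0, 50000))
```

So if the bound per token worsens as topics are added (e.g. because the variational approximation gets looser, or the extra topic parameters overfit short documents), the perplexity goes up even when the topics look good.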
What's interesting is that the topics themselves tend to be quite intuitive (and stable when tweaking the seed), so I feel the model is squeezing something meaningful out of the data.
Many thanks
Tom