I've seen a few posts here discussing the situation where perplexity counter-intuitively increases as the number of topics increases, but I haven't seen an explanation of why this would occur. Is anyone able to explain why this happens?
I am currently running batch LDA on a very large corpus (1 million documents) with short documents (20-100 words each), and finding that per-word perplexity on a holdout sample of 10,000 documents increases as the number of topics increases (tested topic counts: 1-10, 20, 50, 100, 200, 500). I have tried this with both a symmetric alpha and an auto-tuned alpha, and the situation persists.
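For reference, this is how I understand the per-word perplexity to be derived from the held-out log-likelihood bound; a minimal sketch (the function name and the toy numbers are mine, not from any library):

```python
import math

def per_word_perplexity(log_likelihood_bound, num_tokens):
    """Per-word perplexity from a held-out log-likelihood (lower) bound.

    log_likelihood_bound: variational lower bound on the log-likelihood
        (natural log) of the holdout set, e.g. as reported by an LDA
        implementation's bound/evaluation routine.
    num_tokens: total number of word tokens in the holdout documents.

    Lower perplexity = better fit, so a rise with more topics means the
    bound per token is getting worse.
    """
    return math.exp(-log_likelihood_bound / num_tokens)

# Toy numbers for illustration only:
# -250000 / 50000 = -5, so perplexity = exp(5) ~ 148.4
print(per_word_perplexity(-250000.0, 50000))
```

So if the bound per token worsens as topics are added (e.g. because the variational approximation gets looser, or the extra topic parameters overfit short documents), the perplexity goes up even when the topics look good.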
What's interesting is that the topics themselves tend to be quite intuitive (and stable when tweaking the seed), so I feel the model is squeezing something meaningful out of the data.
Many thanks
Tom