I'm doing topic modelling on a corpus using the Gensim LDA implementation. When I compare perplexity across different numbers of topics, I observe that as the number of topics increases from 5 to 60, the likelihood on both the training and test sets increases, but the amount of increase is very small:
| num_topics | Likelihood_Train | Likelihood_Test |
|---|---|---|
| 5 | -229377021.9 | -58513103.75 |
| 10 | -224322296.5 | -57476512.33 |
| 20 | -219480128.3 | -56550233.29 |
| 30 | -217518306.7 | -56260050.09 |
| 40 | -215907408.8 | -56081111.45 |
| 50 | -214815993.7 | -55982093.45 |
| 60 | -213963838.1 | -55885838.52 |
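To put "very small" into numbers, here is a rough sketch that computes the relative change between consecutive rows of the likelihood values listed above. It is plain Python arithmetic on those numbers; the helper names (`relative_gains`, `total_gain`) are made up for this post and are not part of Gensim.

```python
# Likelihood values copied from the measurements above.
num_topics = [5, 10, 20, 30, 40, 50, 60]
train_ll = [-229377021.9, -224322296.5, -219480128.3, -217518306.7,
            -215907408.8, -214815993.7, -213963838.1]
test_ll = [-58513103.75, -57476512.33, -56550233.29, -56260050.09,
           -56081111.45, -55982093.45, -55885838.52]


def relative_gains(values):
    """Percentage improvement of each value over the one before it."""
    return [100.0 * (cur - prev) / abs(prev)
            for prev, cur in zip(values, values[1:])]


def total_gain(values):
    """Percentage improvement of the last value over the first."""
    return 100.0 * (values[-1] - values[0]) / abs(values[0])


# Step-by-step change for each increase in the number of topics.
steps = zip(num_topics[1:], relative_gains(train_ll), relative_gains(test_ll))
for k, train_step, test_step in steps:
    print(f"{k:>2} topics: train {train_step:+.2f}%  test {test_step:+.2f}%")

# Overall change from 5 to 60 topics.
print(f"total: train {total_gain(train_ll):+.2f}%  test {total_gain(test_ll):+.2f}%")
```

Only relative changes are shown because the training and test sets differ in size, so their raw likelihoods are not directly comparable to each other.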
I'm not sure how I should interpret this. I'm setting iterations and gamma_threshold to 500 and 0.00001 respectively, which are more extreme than their defaults, so that I can be sure the model converges properly.
Ryan