Hey Romain,
The answer is that the number of topics estimated is whatever you set the higher-level truncation to (the T parameter of HdpModel). Each topic has an associated weight (alpha). Each alpha lies between zero and one, and the alphas sum to one across topics. To get a sense of how many topics are actually "used", look at the value of alpha for each topic: if it is very small, the topic is unlikely to exist in the data. You can easily get these values from the model's hdp_to_lda() method, which returns a tuple whose first element is the vector of alpha values.
The key, I discovered, is to think about it like a regular hierarchical linear model: it will always estimate a weight for each topic. The non-parametric trick is that if you estimate a sufficiently large number of topics, the values of alpha don't change much when you re-estimate the model with a different number of topics. The model doesn't directly estimate the number of topics in the data.
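One way to see why a large enough truncation makes the weights insensitive to T is the truncated stick-breaking construction, which (as far as I can tell from the gensim source) is how hdp_to_lda turns the model's stick variables into alpha before scaling by the alpha parameter (1 by default). This is a numpy-only sketch with made-up stick proportions drawn from Beta(1, 1), not fitted values; it only shows that the leading weights don't depend on where you truncate, and that the mass lumped into the last topic becomes negligible.

```python
import numpy as np

def truncated_stick_breaking(sticks):
    """Turn T-1 stick proportions in (0, 1) into T weights that sum to one.

    Weight i is the fraction sticks[i] of whatever stick is left after the
    first i breaks; the last weight takes the remainder.
    """
    weights = np.empty(len(sticks) + 1)
    left = 1.0
    for i, v in enumerate(sticks):
        weights[i] = v * left
        left -= weights[i]
    weights[-1] = left  # leftover mass assigned to the final topic
    return weights

rng = np.random.default_rng(0)
# Hypothetical stick proportions, standing in for the model's fitted sticks.
sticks = rng.beta(1.0, 1.0, size=299)

small = truncated_stick_breaking(sticks[:49])  # truncation T = 50
large = truncated_stick_breaking(sticks)       # truncation T = 300

print(small.sum(), large.sum())               # both sum to (about) one
print(np.abs(small[:49] - large[:49]).max())  # leading weights are identical
print(small[-1])                              # tiny remainder at T = 50
```

In a real refit the sticks themselves are re-estimated, so the weights won't match exactly, but the same logic explains why they stabilize once T is comfortably larger than the number of topics the data supports.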
Hope that helps,
Bill
--
You received this message because you are subscribed to a topic in the Google Groups "gensim" group.