how to check the number of topics used by HDP

William MacMillan

Dec 14, 2012, 6:17:54 PM

to gen...@googlegroups.com

Subject says it all, basically.

I'm looking over the source code puzzling over how to check the number of topics that HDP ends up using. Looking over the paper and the code, it looks like the parameter K defines the threshold of the number of groups. I'd like to figure out how many groups are actually used then. Thoughts? I'm fairly new to this class of model, so maybe I'm missing something obvious.

Thanks!

b

Radim Řehůřek

Dec 18, 2012, 7:37:12 AM

to gensim

Hi William,

have a look at the `optimal_ordering` method of HdpModel. Other than
that, sorry, I have no pointers -- I never used HDP and I'm not
familiar with that model or its pitfalls.

And apologies for the long delay in responding -- I was hoping
Jonathan (the author of the HDP code) would bite here :-)

Radim

William MacMillan

Dec 20, 2012, 2:27:52 PM

to gen...@googlegroups.com

Thanks for the help again, Radim. I'll put up a post as this project comes to fruition. Your code has been invaluable.

QSheng Fu

Jan 16, 2013, 3:59:06 AM

to gen...@googlegroups.com

Hi, did you solve this problem? I have the same question, so any help would be appreciated. Thanks!

On Saturday, December 15, 2012 at 7:17:54 AM UTC+8, William MacMillan wrote:

romain deveaud

Mar 4, 2013, 4:54:20 AM

to gen...@googlegroups.com

Hi everyone,

I'm stuck on the same problem: I would like to know how many topics HDP ends up choosing.

Does anyone have an idea?

Thanks!

romain

William MacMillan

Mar 4, 2013, 8:27:36 AM

to gen...@googlegroups.com

Hey Romain,

The answer is that the model always estimates as many topics as you set the higher-level truncation threshold to. Each topic has an associated weight (alpha). Each alpha lies between zero and one, and the weights sum to one across topics. To get a sense of how many topics are actually "used", look at the value of alpha for each topic: if it is very small, the topic is unlikely to be present in the data. You can get these values from the hdp_to_lda method of the model, which returns a tuple whose first element is a vector of all the alpha values.
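
As a rough sketch of that check (not anything built into gensim): the helper below counts the topics whose share of the total weight reaches a cutoff. The name count_used_topics and the 1% cutoff are assumptions made up for illustration, and the toy list stands in for the alpha vector you would get from a trained model with something like `alpha, beta = hdp.hdp_to_lda()`.

```python
def count_used_topics(alpha, min_weight=0.01):
    """Count topics whose share of the total weight is at least min_weight.

    alpha: sequence of non-negative topic weights (hypothetical input here;
           in practice the first element returned by hdp.hdp_to_lda()).
    min_weight: arbitrary cutoff, chosen for illustration only.
    """
    total = float(sum(alpha))
    if total <= 0:
        raise ValueError("alpha must have a positive sum")
    # Normalize so the cutoff is a fraction of the total weight,
    # even if the weights do not sum exactly to one.
    shares = [a / total for a in alpha]
    return sum(1 for s in shares if s >= min_weight)


# Toy weights standing in for a real alpha vector (made-up numbers).
toy_alpha = [0.50, 0.30, 0.15, 0.04, 0.006, 0.003, 0.001]
n_used = count_used_topics(toy_alpha, min_weight=0.01)
print(n_used)
```

The cutoff is a judgment call; it is worth trying a few values and checking that the count is not overly sensitive to the choice.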

The trick, I discovered, is to think about it like a regular hierarchical linear model: it will always estimate a weight for each topic. The non-parametric part is that, once you allow a sufficiently large number of topics, the values of alpha don't change much if you re-estimate the model with a different number of topics. The model doesn't directly estimate the number of topics in the data.

Hope that helps,

Bill
