Difference between lda.state.get_lambda() and lda.expElogbeta?


satarupa guha

Dec 3, 2014, 00:40:06
To gen...@googlegroups.com
Hello,

I want to get the word-topic probability matrix for all words in the corpus. I looked at lda.show_topic() as well, but as far as I understood, it returns word-topic probabilities for only the top few words. I need them for all the words, and having them in the form of a matrix would help. So I thought lda.state.get_lambda() or lda.expElogbeta would be preferable, having read the discussions here and here. Both return matrices of the same dimensions, but the probability values (if I can call them that, although they are not normalized between 0 and 1 in the case of lda.state.get_lambda()) differ between the two matrices.
Which one should be used and when?

Thanks,
Satarupa

Radim Řehůřek

Dec 3, 2014, 05:46:12
To gen...@googlegroups.com
Hello Satarupa,

you can use the `get_lambda()` method and normalize each row (= topic) to sum to 1 (= a probability distribution).
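For example, a minimal sketch including the normalization (assuming an already trained gensim LdaModel named `lda`; the variable names are just illustrative):

    # raw variational parameters lambda: numpy array, shape (num_topics, num_terms)
    lam = lda.state.get_lambda()

    # normalize each row (= topic) so it sums to 1, giving p(word | topic)
    topic_word = lam / lam.sum(axis=1, keepdims=True)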

Best,
Radim

satarupa guha

Dec 3, 2014, 08:07:17
To gen...@googlegroups.com
Thanks for such a quick reply!

-Satarupa




--
Satarupa Guha
MS by Research, CSE
IIIT Hyderabad

Myrthe van Dieijen

Feb 14, 2017, 09:18:23
To gensim
Hi Radim,

I seem to have emailed you privately, so I'll post my question in this thread as well; sorry for the inconvenience. I've been working with the LDA model myself and came across this thread and the two threads Satarupa mentioned in her first post. There is still something I don't quite understand, though.

If I understand you correctly, lda.expElogbeta gives the posterior distribution; that is, it gives the topic-word distribution for each topic, p(word|topic). lda.state.get_lambda() has the same dimensions and also gives weights, but needs to be normalized in order to get the topic-word distribution for each topic.

To compare the two methods, I computed the row sums of the lda.expElogbeta matrix and of the normalized lda.state.get_lambda() matrix. I assumed they would be the same, but the row sums of the normalized lda.state.get_lambda() matrix are all exactly 1 (as expected), whereas the row sums of lda.expElogbeta are all around 0.99. Could you explain what the difference between the two methods is? Initially I thought they were the same, which made me wonder why you'd recommend that Satarupa use lda.state.get_lambda() and normalize, instead of lda.expElogbeta (which would be a lot quicker), to get the topic-word distributions.
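Concretely, this is the comparison I ran (a rough sketch, assuming a trained gensim LdaModel named `lda`):

    # raw lambda, shape (num_topics, num_terms), normalized per row (= topic)
    lam = lda.state.get_lambda()
    lam_norm = lam / lam.sum(axis=1, keepdims=True)

    print(lam_norm.sum(axis=1))          # every row sums to exactly 1
    print(lda.expElogbeta.sum(axis=1))   # every row sums to roughly 0.99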

Many thanks in advance for the clarification!

Myrthe