Seeded LDA | guided LDA | using eta to guide LDA


ra...@qedata.io

Sep 17, 2018, 7:21:15 PM
to Gensim
Hi there, I have been trying to use eta to create a topic-word matrix with values that guide the LDA process.

My expectation is that LDA will converge using the eta I set, so that my topics and their order will be known in advance.

For example: if I seed:
[0][word_index] then know 0 is topic 1
[1][word_index] then know 0 is topic 2
[2][word_index] then know 0 is topic 3
[3][word_index] then know 0 is topic 4

When I take the top 10 words for each topic, the topic order is scrambled. The words seem to be clustered correctly, but the topics come out in the wrong order:

topic 1 is at index 3
topic 2 is at index 0
etc.
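One way to check whether the scrambling comes from the model or from a later step is to read the top words straight from the topic-word matrix, where row k is topic index k. gensim's `LdaModel.get_topics()` returns such a `(num_topics, num_terms)` array. The helper `top_words_by_index` below is a hypothetical sketch that works on any such matrix plus an id-to-word lookup (a list here; a gensim `Dictionary` also supports `dictionary[id]`).

```python
import numpy as np

def top_words_by_index(topic_word, id2word, topn=10):
    """Hypothetical helper: {topic_index: [top words]} from a
    (num_topics, num_terms) matrix, keeping the row order as the topic order."""
    topic_word = np.asarray(topic_word, dtype=float)
    result = {}
    for k, row in enumerate(topic_word):
        # indices of the topn largest weights, highest first
        top_ids = np.argsort(row)[::-1][:topn]
        result[k] = [id2word[int(i)] for i in top_ids]
    return result

# Toy matrix standing in for lda.get_topics(); row 0 is topic index 0, etc.
toy = [[0.6, 0.3, 0.1],
       [0.1, 0.2, 0.7]]
vocab = ["goal", "match", "vote"]
top = top_words_by_index(toy, vocab, topn=2)
print(top)
```

Because nothing is sorted across rows, the keys here always follow the model's own topic indices.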


Am I setting eta wrong?

Ultimately, I want to use gensim the way this GuidedLDA implementation works:
https://github.com/vi3k6i5/GuidedLDA/blob/master/guidedlda/guidedlda.py

Please help

ra...@qedata.io

unread,
Sep 17, 2018, 7:54:03 PM9/17/18
to Gensim
Correction:


For example: if I seed:
[0][word_index] then index 0 is topic 1
[1][word_index] then index 1 is topic 2
[2][word_index] then index 2 is topic 3
[3][word_index] then index 3 is topic 4
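A minimal sketch of that seeding layout, assuming eta is passed to gensim's `LdaModel` as a `(num_topics, num_terms)` array in which row k is the prior for topic k. `build_seeded_eta`, the seed words, and the `boost` value of 1000 are all hypothetical choices for illustration; the base value of `1/num_topics` matches gensim's default symmetric eta.

```python
import numpy as np

def build_seeded_eta(token2id, seed_topics, num_topics, base=None, boost=1000.0):
    """Hypothetical helper: build a (num_topics, num_terms) eta matrix.

    seed_topics maps a topic index to a list of seed words; those words get
    `boost` in that topic's row, everything else gets `base`.
    Assumes token ids are 0..len(token2id)-1, as in a gensim Dictionary.
    """
    num_terms = len(token2id)
    if base is None:
        base = 1.0 / num_topics  # same value gensim uses for eta="symmetric"
    eta = np.full((num_topics, num_terms), base, dtype=float)
    for topic, words in seed_topics.items():
        for word in words:
            if word in token2id:  # silently skip words not in the dictionary
                eta[topic, token2id[word]] = boost
    return eta

# Toy vocabulary; with gensim this would be dictionary.token2id
token2id = {"goal": 0, "match": 1, "vote": 2, "election": 3}
seeds = {0: ["goal", "match"], 1: ["vote", "election"]}
eta = build_seeded_eta(token2id, seeds, num_topics=2)
print(eta)
```

The resulting array would then be handed to the model, e.g. `LdaModel(corpus, id2word=dictionary, num_topics=2, eta=eta)`, so that row 0 of eta is the prior for topic id 0 in the trained model.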

ra...@qedata.io

unread,
Nov 9, 2018, 3:49:17 PM11/9/18
to Gensim
Solution: pyLDAvis was reordering the topics in the dataframe it returns, so its topic numbers did not match the model's topic indices.
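As far as I know, pyLDAvis's `prepare()` has a `sort_topics` option that defaults to True and orders topics by their share of the corpus tokens, numbering them from 1; passing `sort_topics=False` is meant to keep the original order. The sketch below is an assumed re-implementation of that default ordering (not pyLDAvis code), useful for mapping a displayed topic number back to the model's index. Tie-breaking may differ from pyLDAvis.

```python
import numpy as np

def prevalence_order(doc_topic_dists, doc_lengths):
    """Assumed mimic of pyLDAvis's default topic sort: model topic indices
    ordered by token-weighted topic frequency, largest first."""
    theta = np.asarray(doc_topic_dists, dtype=float)   # (num_docs, num_topics)
    lengths = np.asarray(doc_lengths, dtype=float)     # (num_docs,)
    topic_freq = (theta * lengths[:, None]).sum(axis=0)
    return np.argsort(-topic_freq, kind="stable")

# Toy data: 2 documents, 3 topics
theta = [[0.1, 0.2, 0.7],
         [0.5, 0.3, 0.2]]
order = prevalence_order(theta, doc_lengths=[10, 30])

# Displayed topic n (1-based) corresponds to model topic index order[n - 1]
for n, idx in enumerate(order, start=1):
    print(f"displayed topic {n} -> model index {idx}")
```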

Other problems with eta remain. Specifically, eta values between 0 and 1 do not help; they need to be multiplied by a larger factor to get the expected results.
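A rough way to see why small values do little: in gensim's variational LDA, the topic-word parameters are, as I understand it, eta plus the expected word counts, and `get_topics()` normalizes each row. So eta behaves like pseudo-counts added to the data. The sketch below is a simplification that holds the expected counts fixed (in a real fit they also depend on eta through the E-step); the counts and the seed value of 500 are made up.

```python
import numpy as np

def topic_word_estimate(eta_row, expected_counts):
    """Simplified M-step view for ONE topic: normalize (eta + expected counts).
    Assumption: expected counts are fixed, ignoring eta's effect on the E-step."""
    lam = np.asarray(eta_row, dtype=float) + np.asarray(expected_counts, dtype=float)
    return lam / lam.sum()

# Hypothetical expected counts for one topic over a 3-word vocabulary;
# word 0 is the seed word.
counts = [50, 300, 650]

p_default = topic_word_estimate([0.2, 0.2, 0.2], counts)[0]  # no seeding
p_small = topic_word_estimate([0.8, 0.2, 0.2], counts)[0]    # seed value below 1
p_scaled = topic_word_estimate([500, 0.2, 0.2], counts)[0]   # seed value multiplied up

print(f"default={p_default:.4f} small={p_small:.4f} scaled={p_scaled:.4f}")
```

A seed value below 1 is swamped by hundreds of real counts, while a multiplied value is large enough to shift the topic.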

On Monday, September 17, 2018 at 7:21:15 PM UTC-4, ra...@qedata.io wrote:

simon mackenzie

Mar 10, 2019, 1:00:44 PM
to Gensim
If you got this working, could you please explain how? I assumed eta is a topics × words matrix of probabilities, so for 5 topics the default is 0.2. Then, to seed "sport" as topic 0, I set the "football" column to [.8, .05, .05, .05, .05].

This did not produce any better results than the unguided model. If I set the random seed, the results are identical.
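A worked check of that setup, assuming gensim's default symmetric eta of 1/num_topics per entry (0.2 for 5 topics, as stated above). Each row of eta is a separate Dirichlet prior over the vocabulary for one topic, so making a column sum to 1 across topics has no special meaning; what matters is the value relative to the rest of its row. The vocabulary size of 10,000 and the word id 42 are hypothetical.

```python
import numpy as np

num_topics, num_terms = 5, 10_000
football = 42  # hypothetical id of "football" in the dictionary

# gensim's default symmetric eta: 1/num_topics for every (topic, word) entry
eta = np.full((num_topics, num_terms), 1.0 / num_topics)

# The seeding described in the post: one column set to values summing to 1
eta[:, football] = [0.8, 0.05, 0.05, 0.05, 0.05]

# Row 0 is the prior for topic 0, so normalize within the row, not the column
row0_total = eta[0].sum()
prior_mean_seeded = eta[0, football] / row0_total
prior_mean_default = 1.0 / num_terms

print(f"pseudo-counts in row 0: {row0_total:.1f}")
print(f"prior mean for football in topic 0: {prior_mean_seeded:.6f} "
      f"vs default {prior_mean_default:.6f}")
```

The seed adds only 0.6 pseudo-counts for "football" against roughly 2000 in the row (plus the real word counts), which fits with the earlier observation in this thread that values below 1 need a multiplicative factor.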