Question about the output file

Alejandra Lorenzo

Mar 5, 2012, 7:55:07 AM

to pl...@googlegroups.com

Hi,

I'm new to PLDA and would like to apply it to a huge corpus (like Gigaword) to obtain synonyms. I tried your code, and it's the fastest LDA implementation I have found.

But, I don't know exactly how to obtain the theta and phi parameters of LDA.

I see that the output of the procedure is a model_plda.txt file, where each line is the topic distribution of a word: the first element is the word, and the rest are its occurrence counts within each topic.

My first question: I see decimal numbers among these counts. Is this because the occurrence counts are averaged across machines?

My second question: do you think I could get theta and phi from the output of PLDA? I can roughly imagine how to do it for phi, but I don't see how to do it for theta, since the output file contains no information about the documents. Should I use the infer procedure?
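For reference, the phi part of the question can be sketched as follows. This is only a minimal sketch, assuming the file format is exactly as described above (one word per line, followed by whitespace-separated per-topic counts). The function names and the smoothing value `beta` are hypothetical and are not taken from PLDA.

```python
def load_word_topic_counts(lines):
    """Parse lines of the assumed form: word count_0 count_1 ... count_{K-1}."""
    counts = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        counts[parts[0]] = [float(x) for x in parts[1:]]
    return counts


def estimate_phi(counts, beta=0.01):
    """Return phi[k][word] = P(word | topic k).

    beta is a smoothing pseudo-count (an assumed value, not read from PLDA).
    """
    words = sorted(counts)
    num_topics = len(next(iter(counts.values())))
    phi = []
    for k in range(num_topics):
        # Normalise each topic's column of counts over the whole vocabulary.
        total = sum(counts[w][k] for w in words) + beta * len(words)
        phi.append({w: (counts[w][k] + beta) / total for w in words})
    return phi


# Made-up sample lines standing in for model_plda.txt.
sample = ["apple 3.0 0.0", "banana 1.0 0.5", "car 0.0 4.5"]
phi = estimate_phi(load_word_topic_counts(sample), beta=0.01)
```

With a real model file, the sample list would be replaced by something like `open("model_plda.txt", encoding="utf-8")`. Each `phi[k]` then sums to 1 over the vocabulary.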

Thanks a lot for your answer.

Alejandra.

Arturo Sánchez Correa

Jul 19, 2013, 7:16:25 AM

to pl...@googlegroups.com

I have the same concerns about the output of PLDA. Did you manage to find out how to get information about the documents at all?

It's been a long time since you asked the question.

任宏达

Oct 6, 2016, 10:30:25 PM

to PLDA

I have the same questions here.

I can get a Topic*Word matrix (from PLDA), and I only have a Doc*Word matrix. I really don't know how to get the Doc*Topic matrix.

I tried two approaches. First, I ran infer (provided by PLDA) on the training dataset, which turned out to be a very bad idea and gave messy results. Second, I added up all the word-topic distributions, which gives a very different result from stm in R. stm in R can provide this Doc*Topic matrix, but unfortunately it cannot handle a very large number of topics.
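One generic alternative to simply adding up word-topic distributions is an EM "fold-in": keep the Topic*Word matrix fixed and re-estimate only the topic mixture of each document. The sketch below illustrates that idea under stated assumptions. It is not PLDA's infer procedure, and `fold_in_theta`, `alpha`, and the toy `phi` values are hypothetical.

```python
def fold_in_theta(doc_counts, phi, alpha=0.1, iterations=50):
    """Estimate one document's topic mixture with phi held fixed.

    doc_counts: {word: count} for the document (one row of a Doc*Word matrix).
    phi: list of {word: P(word | topic)} dicts, one per topic.
    alpha: smoothing pseudo-count (an assumed value, not taken from PLDA).
    """
    num_topics = len(phi)
    # Ignore words the model has never seen.
    known = {w: n for w, n in doc_counts.items() if w in phi[0]}
    theta = [1.0 / num_topics] * num_topics
    if sum(known.values()) == 0:
        return theta  # nothing to learn from; stay uniform
    for _ in range(iterations):
        expected = [0.0] * num_topics
        for word, n in known.items():
            # E-step: share this word's count among topics.
            weights = [theta[k] * phi[k][word] for k in range(num_topics)]
            norm = sum(weights)
            if norm == 0.0:
                continue
            for k in range(num_topics):
                expected[k] += n * weights[k] / norm
        # M-step: renormalise the smoothed expected counts.
        denom = sum(expected) + alpha * num_topics
        theta = [(expected[k] + alpha) / denom for k in range(num_topics)]
    return theta


# Toy Topic*Word matrix with made-up probabilities.
phi = [
    {"apple": 0.7, "banana": 0.29, "car": 0.01},
    {"apple": 0.01, "banana": 0.09, "car": 0.9},
]
theta = fold_in_theta({"apple": 4, "banana": 2}, phi)
```

Running this over every row of the Doc*Word matrix would give one Doc*Topic row per document. Whether the result matches stm in R or PLDA's own inference is not something this sketch guarantees.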

I am very anxious about this problem and hope someone can answer it for me.

On Friday, July 19, 2013 at 7:16:25 PM UTC+8, Arturo Sánchez Correa wrote:
