Question about the output file

Alejandra Lorenzo

Mar 5, 2012, 7:55:07 AM
to pl...@googlegroups.com
Hi,
I'm new to PLDA and would like to apply it to a very large corpus (such as Gigaword) to obtain synonyms. I tried your code, and it is the fastest LDA implementation I have found.
However, I don't know exactly how to obtain the theta and phi parameters of LDA.
I see that the output of the procedure is a model_plda.txt file, where each line gives the topic distribution of a word: the first element is the word, and the remaining elements are its occurrence counts within each topic.
My first question: I see decimal numbers among these counts. Is this because the occurrences are averaged across machines?
My second question: do you think I could get theta and phi from the output of PLDA? I can roughly imagine how to do it for phi, but I don't see how to do it for theta, since the output file contains no information about the documents. Should I use the infer procedure?
Thanks a lot for your answer.
Alejandra.
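
For reference, the phi part of the question can be sketched as follows. This is only an illustration, not PLDA code: it assumes the model file is whitespace-separated in the layout described above (a word followed by one count per topic), and the function names and the smoothing value `beta` are hypothetical. Each topic's column of counts is normalized so that it sums to 1 over the vocabulary.

```python
def parse_model_lines(lines):
    """Parse lines shaped like '<word> <count_topic_0> <count_topic_1> ...'.

    The whitespace-separated layout is an assumption based on the
    description of model_plda.txt in this thread.
    """
    vocab, counts = [], []
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vocab.append(parts[0])
        counts.append([float(x) for x in parts[1:]])
    return vocab, counts


def estimate_phi(counts, beta=0.01):
    """Return phi[k][w] = (n_wk + beta) / (n_k + V * beta).

    counts[w][k] is the (possibly fractional) count of word w in topic k.
    beta is a smoothing value chosen for illustration only.
    """
    num_words = len(counts)
    num_topics = len(counts[0])
    topic_totals = [sum(row[k] for row in counts) for k in range(num_topics)]
    return [
        [(counts[w][k] + beta) / (topic_totals[k] + num_words * beta)
         for w in range(num_words)]
        for k in range(num_topics)
    ]


# Toy input standing in for a real model file.
vocab, counts = parse_model_lines(["apple 3 1", "banana 1 3"])
phi = estimate_phi(counts, beta=0.0)
```

With a real file, the list of lines would come from `open("model_plda.txt")`. Theta cannot be recovered this way, because the file holds no per-document counts.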

Arturo Sánchez Correa

Jul 19, 2013, 7:16:25 AM
to pl...@googlegroups.com
I have the same concerns about the output of PLDA. Did you manage to find out how to get information about the documents at all?

It's been a long time since you asked the question, 

任宏达

Oct 6, 2016, 10:30:25 PM
to PLDA
I have the same question.

PLDA gives me a Topic*Word matrix, and I have my own Doc*Word matrix, but I really don't know how to get the Doc*Topic matrix.

I tried two approaches. First, I ran infer (provided by PLDA) on the training dataset, which turned out to be a very bad idea and gave messy results. Second, I added up all the word-topic distributions, which gave a result very different from that of stm in R. stm in R can provide the Doc*Topic matrix, but unfortunately it cannot handle a very large number of topics.

I am very anxious about this problem and hope someone can answer it.
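
To make the Doc*Topic question concrete, here is a minimal sketch of one generic way to estimate a document's topic proportions from a fixed Topic*Word matrix and that document's row of the Doc*Word matrix. It is an EM-style "fold-in" estimate with additive smoothing. It is not PLDA's infer procedure and not what stm does; the function name, `alpha`, and the iteration count are hypothetical choices for illustration.

```python
def fold_in_theta(doc_counts, phi, alpha=0.1, iterations=50):
    """Estimate topic proportions for one document with phi held fixed.

    doc_counts: dict mapping word index -> count of that word in the document.
    phi: phi[k][w] = probability of word w under topic k.
    alpha: smoothing value (an illustrative assumption).
    """
    num_topics = len(phi)
    theta = [1.0 / num_topics] * num_topics
    for _ in range(iterations):
        expected = [0.0] * num_topics
        for w, n in doc_counts.items():
            # Responsibility of each topic for word w under the current theta.
            weights = [theta[k] * phi[k][w] for k in range(num_topics)]
            norm = sum(weights)
            if norm == 0.0:
                continue  # word has zero probability under every topic
            for k in range(num_topics):
                expected[k] += n * weights[k] / norm
        # Re-estimate theta from the expected topic counts, with smoothing.
        denom = sum(expected) + num_topics * alpha
        theta = [(expected[k] + alpha) / denom for k in range(num_topics)]
    return theta


# Toy example: two topics, two words, a document using only word 0.
toy_phi = [[0.9, 0.1], [0.1, 0.9]]
toy_theta = fold_in_theta({0: 10}, toy_phi)
```

Running this once per row of the Doc*Word matrix would give one row of a Doc*Topic matrix. Whether the result matches what a sampler-based inference produces depends on the model and the smoothing, so it is best treated as a rough estimate.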

On Friday, July 19, 2013 at 7:16:25 PM UTC+8, Arturo Sánchez Correa wrote: