Negative log perplexity in gensim ldamodel

Guthrie Govan

2018/08/20 17:52:37
To: Gensim
I'm using gensim's LdaModel in Python to generate topic models for my corpus. To evaluate the model and tune its hyperparameters, I plan to use log_perplexity as the evaluation metric.

However, computing log_perplexity (using the predefined LdaModel.log_perplexity function) on the training corpus, as well as on the test corpus, returns a negative value (~ -6). I'm a little confused about whether negative values for log perplexity make sense and, if they do, how to decide which log perplexity value is better. Should I try to minimize the magnitude of the log perplexity?

These are the parameters I'm using while training (a rough sketch of the full call is below):
    num_topics = 50
    alpha = 0.02
    eta = 0.02
    iterations = 100
    passes = 10
Other optional parameters are default

Training corpus details:
    Number of documents: ~30,000
    Vocabulary size (after removing stop words, verbs, adjectives, etc.): ~35,000
    Median document size (after removing stop words, etc.): ~50 tokens

Gensim Python library version: 3.4.0
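
For reference, this is roughly the full call I'm making (a minimal sketch; corpus, dictionary, and test_corpus stand in for my actual bag-of-words data and Dictionary):

    from gensim.models import LdaModel

    # corpus / test_corpus: lists of bag-of-words documents,
    # dictionary: the gensim Dictionary built from the training documents
    lda = LdaModel(
        corpus=corpus,
        id2word=dictionary,
        num_topics=50,
        alpha=0.02,
        eta=0.02,
        iterations=100,
        passes=10,
    )

    # per-word bound on the training and test corpora; both come out around -6
    print(lda.log_perplexity(corpus))
    print(lda.log_perplexity(test_corpus))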

Thanks!

Armaan Bhullar

2018/09/06 6:51:01
To: Gensim
Hey Govan, the negative sign is just because it's the logarithm of a probability. What gensim's log_perplexity returns is basically the per-word log of the generative probability of that sample (or chunk of the sample), and since a probability is at most 1, its logarithm is at most 0. The generative probability should be as high as possible for a good model, and since log(x) is monotonically increasing in x, the value gensim returns should also be as high as possible. So in your case, -6 is better than -7, for example.
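
To make that concrete, here's a tiny sketch using your example values (as far as I know, gensim's own log output reports the perplexity estimate as 2 ** (-bound), so a higher per-word bound means a lower, i.e. better, perplexity):

    import numpy as np

    # per-word bounds as returned by LdaModel.log_perplexity
    bound_a, bound_b = -6.0, -7.0

    # gensim logs the perplexity estimate as 2 ** (-bound)
    print(np.exp2(-bound_a))  # 64.0
    print(np.exp2(-bound_b))  # 128.0 -> higher perplexity, worse model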