Negative log perplexity in gensim ldamodel


Guthrie Govan

Aug 20, 2018, 5:52:37 PM
to Gensim
I'm using gensim's LdaModel in Python to generate topic models for my corpus. To evaluate my model and tune the hyper-parameters, I plan to use log_perplexity as the evaluation metric.

However, computing log_perplexity (using the predefined LdaModel.log_perplexity function) on the training corpus (as well as on the test corpus) returns a negative value (~ -6). I'm a little confused: do negative values for log perplexity make sense, and if they do, how do I decide which log perplexity value is better? Should I try to minimize the magnitude of the log perplexity?

Following are the parameters I'm using while training -
    num_topics = 50
    alpha = 0.02
    eta = 0.02
    iterations = 100
    passes = 10
All other optional parameters are left at their defaults.

Training corpus details - 
    Number of documents ~ 30,000
    Vocabulary size (after removing stop words, verbs, adjectives, etc.) ~ 35,000
    Median document size (after removing stop words, etc.) ~ 50

Gensim python library version - 3.4.0

Thanks!

Armaan Bhullar

Sep 6, 2018, 6:51:01 AM
to Gensim
Hey Govan, the negative sign is just because it's the logarithm of a number less than 1. What log_perplexity returns is the per-word likelihood bound of the sample (or chunk of the sample), i.e. the log of a generative probability, and that should be as high as possible. Since log(x) is monotonically increasing in x, the value gensim returns should also be high (close to zero) for a good model. So in your case, "-6" is better than "-7", for example.
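To make the sign relationship concrete (the two bound values here are just the examples from above): the perplexity gensim reports in its logs is 2 ** (-bound), so a higher (less negative) bound corresponds to a lower perplexity, which is what you want.

```python
# Example bounds, as returned by LdaModel.log_perplexity
bound_a = -6.0
bound_b = -7.0

# Perplexity is 2 raised to the negated bound
perplexity_a = 2 ** -bound_a   # 64.0
perplexity_b = 2 ** -bound_b   # 128.0

# The model with bound -6 has the lower perplexity, so it is the better one
```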