Negative log perplexity in gensim ldamodel


Guthrie Govan

Aug 20, 2018, 5:52:37 PM
to Gensim
I'm using gensim's LdaModel in Python to generate topic models for my corpus. To evaluate my model and tune the hyper-parameters, I plan to use log_perplexity as the evaluation metric.

However, computing log_perplexity (using the predefined LdaModel.log_perplexity function) on the training corpus (as well as on the test corpus) returns a negative value (~ -6). I'm a little confused: do negative values for log perplexity make sense, and if they do, how do I decide which log perplexity value is better? Should I try to minimize the magnitude of the log perplexity?

Following are the parameters I'm using while training -
    num_topics = 50
    alpha = 0.02
    eta = 0.02
    iterations = 100
    passes = 10
All other optional parameters are left at their defaults.

Training corpus details - 
    Number of documents ~ 30,000
    Vocabulary size (after removing stop words, verbs, adjectives, etc.) ~ 35,000
    Median document size (after removing stop words, etc.) ~ 50

Gensim python library version - 3.4.0

Thanks!

Armaan Bhullar

Sep 6, 2018, 6:51:01 AM
to Gensim
Hey Govan, the negative sign is just because it's the logarithm of a number less than 1. What log_perplexity returns is the per-word likelihood bound of the sample (or chunk of the sample), i.e. the log of a generative probability, and that should be as high as possible. Since log(x) is monotonically increasing in x, the value gensim returns should also be high (close to zero) for a good model. So in your case, "-6" is better than "-7", for example.
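To make the sign relationship concrete (the two bound values here are just the examples from above): the perplexity gensim reports in its logs is 2 ** (-bound), so a higher (less negative) bound corresponds to a lower perplexity, which is what you want.

```python
# Example bounds, as returned by LdaModel.log_perplexity
bound_a = -6.0
bound_b = -7.0

# Perplexity is 2 raised to the negated bound
perplexity_a = 2 ** -bound_a   # 64.0
perplexity_b = 2 ** -bound_b   # 128.0

# The model with bound -6 has the lower perplexity, so it is the better one
```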