LDA perplexity calculation

Kyle Jensen

Jul 1, 2013, 4:20:51 PM
to gen...@googlegroups.com
Hi All, I'd sincerely appreciate some guidance on perplexity calculations for LDA models I'm building.  My problem is as follows.

Background:
* Dictionary has 100k terms (i.e., relatively large)
* Corpus has a few million documents, which are divided into roughly equal classes A, B, & C.
* Documents in class A are more similar to B than C, based on our human understanding.
* I built an LDA model of class A documents and get intuitively sensible results when looking at the document-topic and topic-word distributions.

My problem:
Given the class-A model, I'd expect a document drawn from class C to have a higher perplexity than a document from class A or class B.  However, this is not what I see.  Instead, they all appear to be the same.  That is, the `bound()` method of the LDA model gives me approximately the same large, negative number for documents drawn from any class.

So, I'm embarrassed to ask: am I correct that the `bound()` method is giving me the perplexity?  And is there an alternative way for me to estimate the likelihood that a model would produce a given document?

Thanks!
Kyle

Radim Řehůřek

Jul 2, 2013, 6:24:35 AM
to gen...@googlegroups.com
Hello Kyle,
don't be embarrassed :) `bound` can indeed be used to give an estimate of perplexity. But the unit here is "corpus" -- comparing scores for different inputs (different corpora) is comparing apples to oranges. To compare scores across different corpora, normalize (divide) the estimated bound by the number of words in the input.

For example, for per-word perplexity: numpy.exp2(-model.bound(corpus) / sum(cnt for document in corpus for _, cnt in document))
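
Spelled out as a small, runnable sketch (the `model` and `corpus` names are placeholders for a trained gensim LdaModel and a bag-of-words corpus built with the model's dictionary):

```python
import numpy as np

def per_word_perplexity(model, corpus):
    # `corpus` is a bag-of-words corpus: a list of documents, each a list of
    # (token_id, count) pairs built with the dictionary the model was trained on.
    total_words = sum(cnt for document in corpus for _, cnt in document)
    # Normalize the corpus-level bound by the word count before exponentiating:
    # per-word perplexity = 2 ** (-bound / number_of_words).
    return np.exp2(-model.bound(corpus) / total_words)

# Usage (hypothetical names): the normalized scores can be compared across
# held-out documents drawn from different classes.
# print(per_word_perplexity(lda_model, held_out_class_a))
# print(per_word_perplexity(lda_model, held_out_class_c))
```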

HTH,
Radim


 


Kyle Jensen

Jul 5, 2013, 2:42:20 PM
to gen...@googlegroups.com
Radim -

As ever, thanks so much for your timely and lucid answer.  I sincerely appreciate it.

Kyle

Kyle Jensen

Jul 9, 2013, 3:04:26 PM
to gen...@googlegroups.com
Radim - 

As you may have noticed in my previous post, I'm looking for a metric that will tell me whether an individual document is well described by a particular topic model.  As an alternative to the perplexity calculation, I have been computing the KL divergence between the actual word distribution of a document and the word distribution implied by its inferred topic distribution.

I noticed, based on a few tests, that this works well.  May I ask your thoughts on that strategy?

Thanks,
Kyle

Radim Řehůřek

Jul 14, 2013, 9:27:25 AM
to gen...@googlegroups.com
Hello Kyle,


On Tuesday, July 9, 2013 9:04:26 PM UTC+2, Kyle Jensen wrote:
Radim - 

As you may have noticed in my previous post, I'm looking for a metric that will tell me whether an individual document is well described by a particular topic model.  As an alternative to the perplexity calculation, I have been computing the KL divergence between the actual word distribution of a document and the word distribution implied by its inferred topic distribution.


But these two distributions are not compatible -- one is over a sample space of cardinality |dictionary|, the other of cardinality |num_topics|. Or perhaps I misunderstand what you mean here :)

-rr

Kyle Jensen

Jul 16, 2013, 11:04:16 AM
to gen...@googlegroups.com
Radim --

Sorry, I worded that poorly.  What I meant was the following.

I have a document that I have not seen previously, and I infer a topic distribution for it using a previously built LDA model.  That topic distribution implies a particular word distribution.  (I.e., the distribution of words that would be produced if I generated an arbitrarily long document with the same topic distribution.)  So, now I have two word distributions: the actual word distribution of the document, and the distribution implied by the inferred topic distribution.

I've been measuring the KL divergence between these two word distributions as a metric of how well that single document can be described by the topic model.  Anecdotally (non-parametric eye-ball test), this works for me.  I'd appreciate your thoughts!

- Kyle
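
A minimal sketch of the comparison Kyle describes, using the present-day gensim LdaModel API (`get_document_topics` and `get_topics`); the variable names and the epsilon smoothing are illustrative assumptions:

```python
import numpy as np

def document_kl(model, bow, eps=1e-12):
    # `model` is a trained gensim LdaModel; `bow` is a single bag-of-words
    # document, i.e. a list of (token_id, count) pairs built with the
    # dictionary used to train the model.

    # Empirical word distribution of the document (eps-smoothed so the
    # KL divergence stays finite).
    p = np.full(model.num_terms, eps)
    for token_id, cnt in bow:
        p[token_id] += cnt
    p /= p.sum()

    # Inferred topic distribution for this document.
    theta = np.full(model.num_topics, eps)
    for topic_id, prob in model.get_document_topics(bow, minimum_probability=0.0):
        theta[topic_id] = prob
    theta /= theta.sum()

    # Word distribution implied by the topic mixture: theta @ beta, where
    # beta is the (num_topics x num_terms) topic-word matrix.
    q = np.maximum(theta @ model.get_topics(), eps)
    q /= q.sum()

    # KL(p || q): lower values mean the model's implied word distribution
    # explains the document's actual words better.
    return float(np.sum(p * np.log(p / q)))

# Usage (hypothetical names):
# score = document_kl(lda_model, dictionary.doc2bow(tokens))
```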