Issue with likelihood values: LDA with ADVI using stan

Ankur Brahmbhatt

Apr 27, 2016, 1:14:35 PM
to Stan users mailing list, Trevor Bonjour
We tried running LDA with mean-field ADVI in CmdStan.
We are using the LDA model given in the user manual.

We tried running it for a toy data set, with 25 documents and 5 vocab words and 2 topics.
We also tried it with a dataset of 1500 documents with 9000 vocab words and 5 topics.

However, in both cases, we get a likelihood of 0 for all thousand samples. We do get values for ELBO, but under lp__ all values are 0.
Q1. Any idea why we are getting 0 values for likelihood?

Q2. With stan-ADVI we get an ELBO value of around -1100000 for the 1500 document dataset, whereas with the classical LDA implementation using VI(mean-field), we get a log likelihood value of around -890000. Is this an expected behaviour?

Q3. Is there a way of changing the output format in Stan?

Thanks.

Bob Carpenter

Apr 27, 2016, 1:37:49 PM
to stan-...@googlegroups.com, Trevor Bonjour
NOTE: ADVI is still an experimental algorithm in that we're
still working on getting the code right and understanding what
the algorithm does.

How do you compute the likelihood? The lp__ value returned
is *not* the likelihood (or even the joint or posterior density),
even in Stan's MCMC. To compound the confusion, I believe ADVI
just hacked the lp__ value to be 0 everywhere (and I believe the
fix for it is in for Stan 2.10).

No, the output formats are fixed, but RStan's pretty flexible
in letting you manipulate output.

- Bob
> --
> You received this message because you are subscribed to the Google Groups "Stan users mailing list" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to stan-users+...@googlegroups.com.
> To post to this group, send email to stan-...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

Alp Kucukelbir

Apr 28, 2016, 3:38:08 PM
to Stan users mailing list, bonjour...@gmail.com
Q1. Any idea why we are getting 0 values for likelihood?

ADVI does not compute the log probability. Instead it evaluates the ELBO, which you can store using the diagnostic output file.
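As a minimal sketch of pulling the ELBO trace out of such a diagnostic file: the column layout assumed here (iteration, elapsed seconds, ELBO, with optional `#` comment lines and an optional header row) is an assumption to check against your own file, the function name `read_elbo_trace` is hypothetical, and the sample values are made up for illustration.

```python
import csv
import io


def read_elbo_trace(lines):
    """Return [(iteration, seconds, elbo), ...] from diagnostic-file lines.

    Assumes three comma-separated columns: iteration, elapsed time, ELBO.
    """
    trace = []
    for row in csv.reader(lines):
        # Skip blank lines and '#'-prefixed comment lines.
        if not row or row[0].lstrip().startswith("#"):
            continue
        try:
            iteration, seconds, elbo = (float(v) for v in row[:3])
        except ValueError:
            # Non-numeric or short row, e.g. a header line: ignore it.
            continue
        trace.append((int(iteration), seconds, elbo))
    return trace


# Made-up file contents, for illustration only.
sample = io.StringIO(
    "# hypothetical comment line\n"
    "iter,time_in_seconds,ELBO\n"
    "100,0.5,-1500000.0\n"
    "200,1.1,-1200000.0\n"
)
trace = read_elbo_trace(sample)
print(trace[-1])
```

With a real file you would pass an open file handle instead of the `io.StringIO` stand-in.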
 
Q2. With stan-ADVI we get an ELBO value of around -1100000 for the 1500 document dataset, whereas with the classical LDA implementation using VI(mean-field), we get a log likelihood value of around -890000. Is this an expected behaviour?

Yes. ADVI uses a general variational approximation. Details are here (http://www.proditus.com/papers/KucukelbirTran_2016.pdf).

I suspect what you call "classical LDA" uses a different variational approximation, custom tailored for LDA. 

Thus, the ELBOs (at convergence) are different by definition.
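To make the point about family-dependent ELBOs concrete, here is a toy sketch that has nothing to do with LDA itself. It assumes a one-observation conjugate model (mu ~ Normal(0, 1), x | mu ~ Normal(mu, 1)), chosen only because both the log evidence and the ELBO of a Gaussian q have closed forms. The function names are illustrative.

```python
import math

LOG_2PI = math.log(2.0 * math.pi)


def log_evidence(x):
    # Integrating out mu gives x ~ Normal(mean 0, variance 2).
    return -0.5 * math.log(2.0 * math.pi * 2.0) - x * x / 4.0


def elbo(x, m, s):
    """ELBO for q(mu) = Normal(m, s^2) under the toy model above."""
    # E_q[log p(x | mu)] with unit observation variance.
    expected_log_lik = -0.5 * LOG_2PI - 0.5 * ((x - m) ** 2 + s ** 2)
    # E_q[log p(mu)] with a standard normal prior.
    expected_log_prior = -0.5 * LOG_2PI - 0.5 * (m ** 2 + s ** 2)
    # Entropy of a Gaussian with standard deviation s.
    entropy = 0.5 * (LOG_2PI + 1.0 + 2.0 * math.log(s))
    return expected_log_lik + expected_log_prior + entropy


x = 1.0
# Family 1: any Gaussian. Its optimum is the exact posterior Normal(x/2, 1/2).
flexible = elbo(x, m=x / 2, s=math.sqrt(0.5))
# Family 2: Gaussians with s fixed to 1. The best mean is still x/2.
restricted = elbo(x, m=x / 2, s=1.0)

print(log_evidence(x), flexible, restricted)
```

The flexible family reaches the log evidence exactly, while the restricted family converges to a strictly lower ELBO on the same data. Two implementations that optimize over different families can therefore report different values without either being wrong.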

Final comment: ADVI is not the best tool for fitting LDA and there are many excellent packages out there that implement fantastic algorithms for LDA specifically.

Cheers,
Alp

Ankur Brahmbhatt

Apr 28, 2016, 9:36:11 PM
to stan-...@googlegroups.com, Trevor Bonjour
Thanks a lot, Alp and Bob, for your valuable input. Alp, I read the paper, and my understanding is that ADVI uses a Gaussian as the variational distribution. Would it be reasonable to use some other distribution in ADVI, for experimental purposes, to implement LDA? If so, what would be a suitable distribution for LDA, and do you know whether this has been tried before? Thanks again.

Bob Carpenter

Apr 28, 2016, 10:03:34 PM
to stan-...@googlegroups.com, Trevor Bonjour
There's a rather substantial literature on variational approximations
to LDA, starting with the original LDA paper by Blei et al. In Stan,
you'd need to code it all in C++. Or you could use something like
the LDA implementation in Vowpal Wabbit, which Matt Hoffman coded,
if you just want scalable LDA.

- Bob

Dustin Tran

Apr 29, 2016, 12:38:02 AM
to stan-...@googlegroups.com, Trevor Bonjour
Matt and I were thinking about doing something like BUGS for variational inference (specifically, VIBES), so that we could take advantage of the graphical model structure. This is useful in the case described here, where you want to specify a variational approximation whose mean-field factors have the same exponential family form as the conditionally conjugate posterior. More importantly, it also gives a faster and more stable algorithm.

This would have to be done outside Stan, of course, because Stan (as far as I know) only has access to the “model” via the log joint density on a fixed data set and its gradient. I think expressive variational families are a solved problem even in this setting, though; see my papers with Rajesh and Dave. There’s also a useful connection to ADVI, in that the choice of distribution doesn’t matter so long as you use the right transformation from constrained to unconstrained (or unconstrained to unconstrained).
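The role of the transformation can be sketched as follows. A Gaussian placed on an unconstrained variable zeta implies, through the chosen transform, a non-Gaussian family on the constrained parameter. This sketch assumes a single positive parameter theta with zeta = log(theta); it is an illustration only, not a description of the transform Stan applies to LDA's simplex-valued parameters, and the function names are hypothetical.

```python
import math
import random


def log_normal_pdf(z, m, s):
    """Log density of Normal(m, s^2) evaluated at z."""
    return -0.5 * math.log(2.0 * math.pi) - math.log(s) - 0.5 * ((z - m) / s) ** 2


def log_q_theta(theta, m, s):
    """Log density implied on theta > 0 by a Gaussian on zeta = log(theta)."""
    # Change of variables: |d zeta / d theta| = 1 / theta.
    return log_normal_pdf(math.log(theta), m, s) - math.log(theta)


def sample_theta(m, s, rng):
    # Draw in unconstrained space, then map back to the constrained space.
    return math.exp(rng.gauss(m, s))


def integrate_density(m, s, upper=50.0, steps=200_000):
    """Midpoint-rule check that the implied density integrates to about 1."""
    h = upper / steps
    return h * sum(math.exp(log_q_theta((i + 0.5) * h, m, s)) for i in range(steps))


rng = random.Random(0)
draws = [sample_theta(0.0, 0.5, rng) for _ in range(1000)]
print(min(draws) > 0.0, round(integrate_density(0.0, 0.5), 3))
```

Every draw respects the positivity constraint, and the implied density on theta is log-normal rather than Gaussian. Swapping in a different transform would change the implied family while the unconstrained approximation stays Gaussian.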

Dustin