Dirichlet process mixture of product multinomial distributions

Joe King

Dec 6, 2013, 5:58:52 PM
to stan-...@googlegroups.com
Hello all

I am starting to work on an implementation of the nonparametric Bayesian model for high-dimensional multivariate categorical data described in:

David B. Dunson & Chuanhua Xing (2009) Nonparametric Bayes Modeling of Multivariate Categorical Data, Journal of the American Statistical Association, 104:487, 1042-1051, DOI: 10.1198/jasa.2009.tm08439

Abstract:
Modeling of multivariate unordered categorical (nominal) data is a challenging problem, particularly in high dimensions and cases in which one wishes to avoid strong assumptions about the dependence structure. Commonly used approaches rely on the incorporation of latent Gaussian random variables or parametric latent class models. The goal of this article is to develop a nonparametric Bayes approach, which defines a prior with full support on the space of distributions for multiple unordered categorical variables. This support condition ensures that we are not restricting the dependence structure a priori. We show this can be accomplished through a Dirichlet process mixture of product multinomial distributions, which is also a convenient form for posterior computation. Methods for nonparametric testing of violations of independence are proposed, and the methods are applied to model positional dependence within transcription factor binding motifs.

I would like to know if there are any similar examples that could help guide me, and also whether to expect any particular problems with this kind of model using Stan.

Thanks
JK

Bob Carpenter

Dec 6, 2013, 6:15:01 PM
to stan-...@googlegroups.com
Stan has no mechanism for sampling discrete (categorical) parameters, but
this may not be an issue for you if the categorical part is just data,
which we can handle.

We also don't implement Dirichlet processes or any other Bayesian
non-parametric method that potentially relies on varying numbers
of parameters (such as Bayesian additive regression trees, aka BART).
Our samplers assume a fixed parameter space and although we'd like
to be able to do more, we don't have any plans to expand this part
of Stan any time soon.

Sometimes you can approximate a Dirichlet process with a simple
Dirichlet of high enough dimension. Don't know if that'll work in
this problem or not.
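
A minimal sketch of that finite approximation, assuming a symmetric Dirichlet(alpha/K, ..., alpha/K) prior on K mixture weights. The function name and the values of alpha and K are purely illustrative; nothing here comes from the paper or from Stan itself.

```python
import random

def sample_finite_dirichlet_weights(alpha, K, rng):
    """Draw K mixture weights from a symmetric Dirichlet(alpha/K, ..., alpha/K)."""
    # A Dirichlet draw is a vector of independent Gamma(shape, 1) draws, normalized.
    gammas = [rng.gammavariate(alpha / K, 1.0) for _ in range(K)]
    total = sum(gammas)
    return [g / total for g in gammas]

rng = random.Random(0)
dirichlet_weights = sample_finite_dirichlet_weights(alpha=1.0, K=20, rng=rng)
```

With a small alpha/K most of the mass tends to land on a few components, which is the behaviour the approximation is meant to mimic. How large K has to be for a given problem is left open here.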

And any mixture model is going to run into label-switching problems,
which you need to be aware of when looking at things like the
R-hat convergence diagnostics, which no longer work. As far as I
know, it's an open research problem to measure convergence in these
settings.

Another huge problem is that these mixture models can be very multi-modal,
which is also something Stan's not good at sampling through (and no system
is good at sampling through if the combinatorics are bad enough, as they are
in, say, latent Dirichlet allocation).

- Bob

Joe King

Dec 7, 2013, 3:46:51 AM
to stan-...@googlegroups.com
Dear Bob

Thank you for this information and advice.

May I ask your advice about which modelling package you would recommend for fitting such a model?

Thanks again
JK

Andrew Gelman

Dec 7, 2013, 12:21:31 PM
to stan-...@googlegroups.com, Yajuan Si
Hi, Joe.  Yajuan has implemented Gaussian processes in Stan, which is not quite the same thing but has some similarities.
A

Bob Carpenter

Dec 7, 2013, 1:52:32 PM
to stan-...@googlegroups.com
If the data's not too huge and you can approximate the
required Dirichlet process(es) with a fairly low dimensional
Dirichlet, then you could use Stan.

If not, I'd suggest asking Dunson and Xing what they used for
the paper.

- Bob


sophie

Dec 8, 2013, 2:09:43 PM
to stan-...@googlegroups.com, gel...@stat.columbia.edu
Hi Joe,

I am pretty familiar with this model because my thesis builds on it to handle missing data for many categorical variables (http://jeb.sagepub.com/content/38/5/499.full?keytype=ref&siteid=spjeb&ijkey=jYj38jOWx2Qns). I have not implemented it in Stan because updating the number of latent classes is slow. But I believe it is doable using the truncated stick-breaking representation of the DP. Dunson and Xing lay out the steps using slice sampling in the paper; the model can also be fit with approximate or exact blocked Gibbs sampling. I can provide the references if you need them.
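
A minimal sketch of the truncated stick-breaking weights, assuming the usual construction V_k ~ Beta(1, alpha) with the last stick fraction set to 1 so that the K weights sum to one. The function name and the values of alpha and K are illustrative assumptions, not taken from the paper.

```python
import random

def truncated_stick_breaking(alpha, K, rng):
    """Return K mixture weights from a stick-breaking construction truncated at K."""
    weights = []
    remaining = 1.0  # length of the stick not yet assigned to any component
    for _ in range(K - 1):
        v = rng.betavariate(1.0, alpha)  # V_k ~ Beta(1, alpha)
        weights.append(v * remaining)
        remaining *= 1.0 - v
    weights.append(remaining)  # the last component takes what is left (V_K = 1)
    return weights

rng = random.Random(0)
stick_weights = truncated_stick_breaking(alpha=1.0, K=20, rng=rng)
```

Because K is fixed in advance, the parameter space has a fixed size, which is what a sampler with a fixed parameter space needs.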

Thanks!
Yajuan

Janne Sinkkonen

Dec 9, 2013, 3:36:10 AM
to stan-...@googlegroups.com, gel...@stat.columbia.edu
Looks like I have implemented the same model in a commercial setting. Collapsed Gibbs is very easy with this model, but getting it to converge to a good mode is harder. One way is to start with a small number of data samples, then gradually add more while the chain is running. This is a bit analogous to tempering: initially, with only a few data points, the chain is able to switch modes easily.
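
A rough sketch of what such a collapsed Gibbs sampler with gradually added data could look like. It assumes a Chinese-restaurant-process view of the DP with concentration alpha and a symmetric Dir(beta) prior on each variable's category probabilities. The function names, the batch schedule, and the toy data are hypothetical; this is not the commercial implementation mentioned above.

```python
import random

def collapsed_gibbs_sweep(data, z, levels, alpha, beta, rng):
    """One collapsed Gibbs sweep over the cluster labels z (modified in place).

    data[i][j] is the 0-based category of variable j for observation i;
    levels[j] is the number of categories of variable j.
    """
    n, p = len(data), len(levels)
    # Build cluster sizes and per-variable category counts from the current labels.
    sizes, counts = {}, {}
    for i in range(n):
        k = z[i]
        if k not in sizes:
            sizes[k] = 0
            counts[k] = [[0] * d for d in levels]
        sizes[k] += 1
        for j in range(p):
            counts[k][j][data[i][j]] += 1
    for i in range(n):
        # Remove observation i from its current cluster.
        k_old = z[i]
        sizes[k_old] -= 1
        for j in range(p):
            counts[k_old][j][data[i][j]] -= 1
        if sizes[k_old] == 0:
            del sizes[k_old], counts[k_old]
        # Unnormalized probabilities for each existing cluster.
        labels = list(sizes)
        probs = []
        for k in labels:
            w = float(sizes[k])
            for j in range(p):
                w *= (counts[k][j][data[i][j]] + beta) / (sizes[k] + levels[j] * beta)
            probs.append(w)
        # A brand-new cluster: prior predictive is uniform over each variable's levels.
        w_new = alpha
        for j in range(p):
            w_new /= levels[j]
        labels.append(max(labels, default=-1) + 1)  # a label not currently in use
        probs.append(w_new)
        k = rng.choices(labels, weights=probs)[0]
        if k not in sizes:
            sizes[k] = 0
            counts[k] = [[0] * d for d in levels]
        sizes[k] += 1
        for j in range(p):
            counts[k][j][data[i][j]] += 1
        z[i] = k
    return z

def fit_incrementally(data, levels, alpha, beta, batch, sweeps, rng):
    """Grow the data set in batches, running a few sweeps after each addition."""
    z = []
    n = len(data)
    for m in range(batch, n + batch, batch):
        m = min(m, n)
        z.extend([0] * (m - len(z)))  # newcomers start in cluster 0, then get resampled
        for _ in range(sweeps):
            collapsed_gibbs_sweep(data[:m], z, levels, alpha, beta, rng)
    return z

rng = random.Random(1)
toy = [[0, 0, 0]] * 6 + [[1, 1, 2]] * 6  # made-up data: two obvious groups
rng.shuffle(toy)
cluster_labels = fit_incrementally(toy, levels=[2, 2, 3], alpha=1.0, beta=0.5,
                                   batch=4, sweeps=5, rng=rng)
```

The sketch keeps alpha and beta fixed; sampling or optimizing them would need extra steps not shown here.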

Hyperparameter optimization or sampling (the DP parameter and the prior parameters of the multinomials, i.e., \beta in Dir(\beta)) is important for good performance, especially with sparse data.

Yes, this should be doable with Stan if one marginalizes over the categorical variables. Look at the LDA example in the Stan manual. The code will look a bit ugly, because you have to increment the log-likelihood manually (there is no sampling statement of the type ~ sum(…)). Probably the best approach is truncated stick-breaking, as someone already suggested. (The alternative is a finite Dirichlet.)
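
To make the marginalization concrete, here is a minimal sketch of the per-observation log-likelihood with the latent class summed out, log p(x) = log sum_k w_k prod_j psi[k][j][x_j], computed stably with a log-sum-exp. It is written in plain Python rather than Stan, and the names and toy numbers are illustrative assumptions. In a Stan program the analogous quantity would be added to the log density by hand, using Stan's log_sum_exp function.

```python
import math

def log_sum_exp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def marginal_log_lik(x, log_weights, log_probs):
    """Log-likelihood of one observation with the class label marginalized out.

    x[j] is the 0-based category of variable j, log_weights[k] is log w_k,
    and log_probs[k][j][c] is the log probability of category c for
    variable j in class k.
    """
    terms = []
    for k, lw in enumerate(log_weights):
        terms.append(lw + sum(log_probs[k][j][c] for j, c in enumerate(x)))
    return log_sum_exp(terms)

# Toy example: two classes, two binary variables (all numbers are made up).
toy_weights = [0.5, 0.5]
toy_probs = [
    [[0.9, 0.1], [0.8, 0.2]],  # class 0
    [[0.2, 0.8], [0.3, 0.7]],  # class 1
]
log_w = [math.log(w) for w in toy_weights]
log_psi = [[[math.log(q) for q in var] for var in cls] for cls in toy_probs]
ll = marginal_log_lik((0, 0), log_w, log_psi)
# exp(ll) equals 0.5 * 0.9 * 0.8 + 0.5 * 0.2 * 0.3 = 0.39
```

The full data log-likelihood is the sum of this quantity over observations.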

Let us know if you succeed with Stan. I'm also interested in trying these models in Stan.