Ben: What do you mean by dicey? And how large? What's
the problem?
Chris: see some suggestions inline below
- Bob
P.S. This list is shutting down soon and moving to Discourse. See:
mc-stan.org/community
> On May 19, 2017, at 5:13 AM, Chris <wel...@gmail.com> wrote:
>
> Hi,
>
> I am trying to implement a mixture model of Negative Binomial regression models. I want to use the Dirichlet Process prior to estimate the number of mixture components.
> I found this thread (https://groups.google.com/forum/#!topic/stan-users/d-R7J6MldkQ) with example code for a hierarchical model, and together with the mixture model example in the docs, I came up with this code:
>
>
> library(rstan)
>
> model_string <-"
> data {
> int<lower=1> K; // number of mixture components
> int<lower=1> N; // number of data points
> int<lower=1> M; //the number of columns in the model matrix
> matrix[N,M] X; //the model matrix
> int y[N]; // observations
> real<lower=0> alpha0; // Dirichlet prior concentration
> }
> transformed data {
> vector<lower=0>[K] alpha0_vec;
This can just be
vector<lower=0>[K] alpha0_vec = rep_vector(alpha0, K);
> for (k in 1:K){
> alpha0_vec[k] = alpha0; //symmetric dirichlet prior
> }
> }
> parameters {
> simplex[K] theta; // mixing proportions
> vector[M] betas[K];
> real<lower=0> sigma[K]; // neg. binomial precision (phi) per component
> }
> model {
> theta ~ dirichlet(alpha0_vec);
> sigma ~ lognormal(0, 2);
> for (k in 1:K)
> betas[k] ~ normal(0, 10);
> for (n in 1:N) {
> real lps[K];
> for (k in 1:K)
> lps[k] = log(theta[k]) + neg_binomial_2_lpmf(y[n] | X[n] * betas[k], sigma[k]);
> target += log_sum_exp(lps);
> }
> }"
I don't know what the predictors in X[n] are, but it might
make sense to have a hierarchical prior for the betas, either
all together or by predictor m in 1:M (see the sketch below).
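For the by-predictor version, something like this, showing only the
pieces that change relative to your program; mu_beta and tau_beta are
hyperparameter names I'm making up, not anything from your model:

parameters {
  vector[M] mu_beta;            // hierarchical mean for each predictor
  vector<lower=0>[M] tau_beta;  // hierarchical scale for each predictor
  vector[M] betas[K];
  ...
}
model {
  mu_beta ~ normal(0, 5);
  tau_beta ~ normal(0, 2.5);    // half-normal given the lower bound
  for (k in 1:K)
    betas[k] ~ normal(mu_beta, tau_beta);  // replaces normal(0, 10)
  ...
}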
Otherwise, this looks fine as a negative binomial mixture.
You're combining two hard-to-estimate pieces: negative binomial
overdispersion and mixture models. Michael Betancourt recently wrote
a case study on mixtures illustrating some of the difficulties (see
mc-stan.org/docs).
If you run this for longer and the n_eff values go up proportionally, you should be OK.
You'll only find one mode, of course.
>
> Is this model specification correct?
>
> When looking at the thetas, theta[1] has a value of 1.0, while the other thetas are zero. Does that mean the optimal number of components is 1?
There's no optimization per se; what you're seeing are the posterior
means of the mixing proportions. You could print theta to more decimal
places and maybe see that theta[1] is not precisely 1.0.
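In rstan, assuming your stanfit object is called fit:

print(fit, pars = "theta", digits_summary = 4)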
> Actually that would be true here, as the data are generated from a simple NegBin.
Well, that's reassuring at least.
You could also use a stronger prior (it wouldn't approximate a DP, but
it may save some components from ending up with no data).
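Concretely, that just means passing a bigger alpha0 in your data list
on the R side; the list entries here only mirror your data block and
the value is illustrative:

stan_data <- list(K = K, N = N, M = M, X = X, y = y, alpha0 = 5)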
> How do I interpret the betas? Are the betas for components other than the first meaningless?!
You could run this out to more decimal places. Probably not exactly meaningless,
but only making a weak contribution. That essentially means you're using very
little data to train the components, so their posteriors should look a lot like
your priors.
> The true sigma is 1.2 (called phi in the data generation) and the estimated sigma for component 1 is 1.19, so that is perfect.
> The estimated betas are way off the truth, however.
If you generated everything from one negative binomial, that's
to be expected. Are the betas for the first component that's being
used way off?
> Another question: how can I add ordering to avoid label switching? In the docs it is achieved with the line
> ordered[K] mu; // locations of mixture components
> but how can I do it here?
You'd have to write your own ordered simplex transform, which
is a bit tricky. Alternatively, you could order a single coefficient
across components, like an intercept (see the sketch below). Again,
I recommend Michael's case study to get some more understanding of
how mixtures work.
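Roughly, the ordered-intercept version; beta0 is a name I'm
introducing here, and this assumes X doesn't already contain an
intercept column. I've also switched to neg_binomial_2_log_lpmf so
the linear predictor is on the log scale and can be any real number
(which would also get rid of the location-parameter exceptions you
report below):

parameters {
  ordered[K] beta0;      // ordered intercepts pin down component labels
  vector[M] betas[K];    // remaining coefficients, unordered
  ...
}
model {
  ...
  for (n in 1:N) {
    real lps[K];
    for (k in 1:K)
      lps[k] = log(theta[k])
               + neg_binomial_2_log_lpmf(y[n] | beta0[k] + X[n] * betas[k],
                                         sigma[k]);
    target += log_sum_exp(lps);
  }
}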
> I also get tons of errors like this, but I assume I can safely ignore them?
>
> The following numerical problems occurred the indicated number of times on chain 1
>                                                                                          count
> Exception thrown at line 31: neg_binomial_2_lpmf: Precision parameter is inf, but must be finite!   2
> Exception thrown at line 31: neg_binomial_2_lpmf: Location parameter is -0.000138117, but must be > 0!   1
> Exception thrown at line 31: neg_binomial_2_lpmf: Location parameter is -0.00013909, but must be > 0!   1
> ...
You should try upping the target acceptance rate (adapt_delta = 0.95 or 0.99 or something)
and maybe starting with a lower stepsize and running more iterations.
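In rstan that looks something like this (the names just follow this
thread, and the numbers are starting points to tune):

fit <- stan(model_code = model_string, data = stan_data,
            chains = 4, iter = 4000,
            control = list(adapt_delta = 0.99, stepsize = 0.01))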
- Bob