Can someone help me understand why Jags converges and Stan doesn't in my neg. binomial regression?

kris2...@gmail.com

unread,

Jun 13, 2013, 7:52:51 PM6/13/13

to stan-...@googlegroups.com

Hi everyone,

I've just started using (R)stan and started so by building a toy model to do negative binomial regression. I set some parameters,simulated values and then used these as data. My goal was to code the model in Jags and Stan and do a comparison in terms of inference and run time. So, my jags model ran just fine and converged within 20000 iterations. However, the stan model did not converge at all in the same number of iterations. I'm wondering if I'm doing something wrong. Could someone familiar with both jags and stan see if these are the same models? I think they are, but again maybe I've misinterpreted something. By the way, I'm using the negative binomial parameterized as a Gamma mixture of Poissons. For the overdispersion parameter, I'm using a prior which is a compound gamma distribution.

Simulation of data:
n <- 200
gradient <- runif(n,5,1200)
beta1 <- -0.00221
beta0 <- 1.9571
theta <- 8
taxon_ct <- rnbinom(n,size = theta, mu=exp(beta0 + beta1*gradient))

JAGS Model:
model_jags <- function() {
#Likelihood
for (i in 1:n){     #loop around sites
    r[i] ~ dgamma(theta,theta)
    log(mu[i]) <- beta0 + beta1*gradient[i]
    mustar[i] <- r[i]*mu[i]
    taxon_ct[i] ~ dpois(mustar[i])
}
#Priors
beta0 ~ dnorm(0, 0.001)
beta1 ~ dnorm(0, 0.001)
theta ~ dgamma(0.01,h)
h ~ dgamma(0.01,0.01)
}

####SUMMARY OF OUTPUT

1. Empirical mean and standard deviation for each variable,
   plus standard error of the mean:

           Mean        SD  Naive SE Time-series SE
beta0  2.004388 0.0837506 1.184e-03      1.240e-03
beta1 -0.002347 0.0001813 2.564e-06      2.682e-06
h      0.002291 0.0158177 2.237e-04      2.337e-04
theta  9.999899 5.8335050 8.250e-02      2.127e-01

2. Quantiles for each variable:

            2.5%        25%        50%        75%    97.5%
beta0  1.837e+00  1.949e+00  2.005e+00  2.061e+00  2.16714
beta1 -2.711e-03 -2.468e-03 -2.346e-03 -2.224e-03 -0.00199
h      1.901e-78  2.951e-32  8.719e-17  2.593e-08  0.02096
theta  4.213e+00  6.516e+00  8.421e+00  1.154e+01 25.64662

Potential scale reduction factors:

      Point est. Upper C.I.
beta0       1.00       1.00
beta1       1.00       1.00
h           1.01       1.01
theta       1.01       1.02

Multivariate psrf

1

STAN model:
test_code <- '
data {
int<lower=0> n; //number of sites
real gradient[n]; //gradient value at each site
int<lower=0> taxon_ct[n]; //count of taxon at each site
}
parameters {
real beta1;
real beta0;
real<lower=0> theta;
real<lower=0> h;
real<lower=0> r[n];
}
transformed parameters{
real<lower=0> mustar[n];
for (i in 1:n) {
    mustar[i] <- r[i]*exp(beta0 + beta1*gradient[i]);
    }
}
model {
for (i in 1:n){
    taxon_ct[i] ~ poisson(mustar[i]);
    }
beta0 ~ normal(0,sqrt(1000));
beta1 ~ normal(0,sqrt(1000));
theta ~ gamma(0.01,h);
h ~ gamma(0.01,0.01);
}
'
###OUTPUT

                      mean       se_mean            sd           2.5%            25%            50%            75%         97.5%
beta1        -3.000000e-01  3.000000e-01  4.000000e-01  -6.000000e-01  -6.000000e-01  -6.000000e-01  -3.000000e-01  4.000000e-01
beta0        -3.500000e+00  2.440000e+01  3.450000e+01  -6.290000e+01  -1.510000e+01   1.100000e+01   2.370000e+01  2.420000e+01
theta         1.000000e-01  1.000000e-01  2.000000e-01   0.000000e+00   0.000000e+00   0.000000e+00   1.000000e-01  4.000000e-01
h             8.300000e+00  1.010000e+01  1.430000e+01   0.000000e+00   0.000000e+00   1.000000e-01   7.400000e+00  3.450000e+01

 n_eff          Rhat
beta1           2  1.402600e+03
beta0           2  9.070000e+01
theta           2  5.373000e+02
h               2  2.960000e+01

Andrew Gelman

unread,

Jun 13, 2013, 8:11:30 PM6/13/13

to stan-...@googlegroups.com

Hi, unless I'm missing something, your parameter r[i] has a proper distribution in the Jags model and a flat distribution in your Stan model.

A

--
You received this message because you are subscribed to the Google Groups "stan users mailing list" group.
To unsubscribe from this group and stop receiving emails from it, send an email to stan-users+...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

kris2...@gmail.com

unread,

Jun 13, 2013, 9:15:18 PM6/13/13

to stan-...@googlegroups.com, gel...@stat.columbia.edu

Thanks for picking up that silly error on my part. However, when I added the same distribution I had in Jags (i.e., r[i] ~ gamma(theta,theta)) ,

test_code <- '
data {
int<lower=0> n; //number of sites
real gradient[n]; //gradient value at each site
int<lower=0> taxon_ct[n]; //count of taxon at each site
}
parameters {
real beta1;
real beta0;
real<lower=0> theta;
real<lower=0> h;
real<lower=0> r[n];
}
transformed parameters{
real<lower=0> mustar[n];
for (i in 1:n) {
mustar[i] <- r[i]*exp(beta0 + beta1*gradient[i]);
}
}
model {
for (i in 1:n){

r[i] ~ gamma(theta,theta);

taxon_ct[i] ~ poisson(mustar[i]);
}
beta0 ~ normal(0,sqrt(1000));
beta1 ~ normal(0,sqrt(1000));
theta ~ gamma(0.01,h);
h ~ gamma(0.01,0.01);
}
'

I began getting this (which can be ignored?)

Informational Message: The parameter state is about to be Metropolis rejected due to the following underlying, non-fatal (really) issue (and please ignore that what comes next might say 'error'): Error in function stan::prob::gamma_log(N4stan5agrad3varE): Inverse scale parameter is 0:0, but must be > 0!
If the problem persists across multiple draws, you might have a problem with an initial state or a gradient somewhere.
 If the problem does not persist, the resulting samples will still be drawn from the posterior.

I'm still also having issues with convergence:

            n_eff         Rhat
beta1           2 1.746000e+03
beta0           2 2.060000e+01
theta          16 1.100000e+00
h               2 2.200000e+01

Thanks for any help you can provide.

Bob Carpenter

unread,

Jun 14, 2013, 4:41:44 PM6/14/13

to stan-...@googlegroups.com

On June 13, 2013 09:15:18 PM EDT, kris2...@gmail.com wrote:

> Thanks for picking up that silly error on my part. However, when I added
> the same distribution I had in Jags (i.e., r[i] ~ gamma(theta,theta)) ,
>
>
> test_code <- '
> data {
> int<lower=0> n; //number of sites
> real gradient[n]; //gradient value at each site
> int<lower=0> taxon_ct[n]; //count of taxon at each site
> }
> parameters {
> real beta1;
> real beta0;
> real<lower=0> theta;
> real<lower=0> h;
> real<lower=0> r[n];
> }
> transformed parameters{
> real<lower=0> mustar[n];
> for (i in 1:n) {
> mustar[i] <- r[i]*exp(beta0 + beta1*gradient[i]);
> }
> }
> model {
> for (i in 1:n){
> r[i] ~ gamma(theta,theta);
> taxon_ct[i] ~ poisson(mustar[i]);

I'd suggest here using poisson_log, with

transformed parameters {
...

for (i in 1:n)

mustar[i] <- log(r[i]) + beta0 + beta1*gradient[i];
...

model {
for (i in 1:n)

taxon_ct[i] ~ poisson_log(mustar[i]);

If you don't want mustar[i] written out, you can do this all
inside poisson_log, using:

for (i in 1:n)

taxon_ct[i] ~ poisson_log(log(r[i]) + beta0 + beta1 * gradient[i]);

In fact, you can go one step further in efficiency and use
vectorization,

taxon_ct ~ poisson_log(log(r) + beta0 + beta1 * gradient);

but to do that, you need to define the relevant real arrays as vectors
instead, e.g.,

vector[n] gradient;

vector[n] r;

> }

> beta0 ~ normal(0,sqrt(1000));
> beta1 ~ normal(0,sqrt(1000));
> theta ~ gamma(0.01,h);
> h ~ gamma(0.01,0.01);
> }
> '
> I began getting this (which can be ignored?)
>
> Informational Message: The parameter state is about to be Metropolis
> rejected due to the following underlying, non-fatal (really) issue
> (and please ignore that what comes next might say 'error'): Error in
> function stan::prob::gamma_log(N4stan5agrad3varE): Inverse scale
> parameter is 0:0, but must be > 0!
> If the problem persists across multiple draws, you might have a
> problem with an initial state or a gradient somewhere.
> If the problem does not persist, the resulting samples will still be
> drawn from the posterior.

Yes, these can be ignored as long as they don't persist.
It's an HMC proposal that because of the discretization of
the Hamiltonian we need for computation runs outside of the
support of some constrained variables.

It can also be a sign of a problem with the model if you don't
want these parameters near 0.

> I'm still also having issues with convergence:
>
> n_eff Rhat
> beta1 2 1.746000e+03
> beta0 2 2.060000e+01
> theta 16 1.100000e+00
> h 2 2.200000e+01
>
>
> Thanks for any help you can provide.

I think there may be an identifiabilty problem
with the

log(r[i]) + beta0

term because both r[i] and beta0 are parameters, so
there seems to be additive non-identifiability.

- Bob

kris2...@gmail.com

unread,

Jun 14, 2013, 6:00:36 PM6/14/13

to stan-...@googlegroups.com, ca...@alias-i.com

Hi Bob,

Thanks for this! I was keeping my code transparent to me step-by-step before vectorizing for efficiency, but perhaps the changes will help. I'll definitely try your suggestions and post what I find.

Anyway, not sure that non-identifiability is at work here since this is a pretty standard way to parameterize the negative binomial distribution (unless I'm doing that incorrectly)-- especially since the identical model has no problems in jags at converging to the parameters I used to simulate the toy dataset -- in only 5000 iterations. As a sidenote, I did try coding the stan model directly through the negative binomial distribution (something that jags and bugs can't do) before I switched to the Gamma mixture of Poissons. That was not working either so I switched to something I was used to.

I'll keep you posted!
Kris

kris2...@gmail.com

unread,

Jun 14, 2013, 10:02:32 PM6/14/13

to stan-...@googlegroups.com, ca...@alias-i.com

Hi Bob,

Using the log-likelihood solved the problem! Thanks!

Kris

On Friday, June 14, 2013 4:41:44 PM UTC-4, Bob Carpenter wrote:

Reply all

Reply to author

Forward