From JAGS to Stan: hierarchical logistic regression stuck at the first iteration


martina...@gmail.com

Apr 4, 2015, 2:27:25 AM
to stan-...@googlegroups.com
Dear community, I am trying to translate a model I have been running in JAGS to Stan, hoping for better performance. I have already worked with Stan for one-level linear models, but this is my first time writing a hierarchical model in Stan, and I am having trouble: while the JAGS version works well (though with unimpressive performance), the Stan model doesn't actually run. It stays stuck at the first iteration.

The specification I want to fit is a hierarchical logistic regression in which the first-level intercept and the slopes b1 and b2 vary across the second-level groups, while the other first-level slopes do not.

For clarity and simplicity, I attach both the JAGS and Stan code I've been working with rather than inserting them in this message. In both files, N stands for the total number of cases at the first level, and G for the total number of groups at the second level.

Any insights would be much appreciated.

Thanks in advance,

Martina Andrews
myJAGSmodel.txt
mySTANmodel.txt

Bob Carpenter

Apr 4, 2015, 3:29:41 AM
to stan-...@googlegroups.com
Those aren't the same models. In the JAGS model you use
a uniform(0,100) prior on the standard deviations whereas
in the Stan model you're using gamma(0.001,0.001).

I'm assuming you really mean to put this gamma prior on
precision (inverse square std dev), not on the std dev
parameter, because that would follow BUGS practice.

In our manual chapter on regression, you can find our own
recommended priors for these parameters. You probably
want tighter priors on the b parameters and something
like cauchy(0, 2.5) on the sd parameters. You have the
parameters properly constrained, so that shouldn't be a problem.
And you're using the bernoulli_logit, which is great.

To speed up Stan, there are two more things you can do.
One is to vectorize the priors, rewriting

for (g in 1:G) {
  alpha[g] ~ normal(alpha_hat[g], sigma_alpha);
  b1[g] ~ normal(b1_hat[g], sigma_b1);
  b2[g] ~ normal(b2_hat[g], sigma_b2);
  alpha_hat[g] ~ normal(0, 100);
  b1_hat[g] ~ normal(0, 100);
  b2_hat[g] ~ normal(0, 100);
}

as

alpha ~ normal(alpha_hat, sigma_alpha);
b1 ~ normal(b1_hat, sigma_b1);
b2 ~ normal(b2_hat, sigma_b2);
alpha_hat ~ normal(0, 100);
b1_hat ~ normal(0, 100);
b2_hat ~ normal(0, 100);

I'd also recommend changing the names --- "_hat" is typical
notation for point estimates, not hierarchical priors.
We prefer something like:

b1 ~ normal(mu_b1, sigma_b1);

so you're already halfway there!
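Concretely, combining the vectorization, the renaming, and the cauchy priors suggested above, the group-level part of the model block might look like this sketch (the mu_* names are just the suggested convention, not your actual variables):

```stan
// Sketch only: the vectorized priors above with mu_* names and
// the recommended cauchy scale priors substituted in.
alpha ~ normal(mu_alpha, sigma_alpha);
b1 ~ normal(mu_b1, sigma_b1);
b2 ~ normal(mu_b2, sigma_b2);
mu_alpha ~ normal(0, 100);
mu_b1 ~ normal(0, 100);
mu_b2 ~ normal(0, 100);
sigma_alpha ~ cauchy(0, 2.5);
sigma_b1 ~ cauchy(0, 2.5);
sigma_b2 ~ cauchy(0, 2.5);
```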

Otherwise everything looks OK. Are you trying to hand-initialize
the parameters? If so, that may be another issue. I'd try to let
Stan just do it automatically.

If you still have problems, please send us the data (or simulated
data) and the revised model.

- Bob
> --
> You received this message because you are subscribed to the Google Groups "Stan users mailing list" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to stan-users+...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
> <myJAGSmodel.txt><mySTANmodel.txt>

martina...@gmail.com

Apr 4, 2015, 3:52:42 PM
to stan-...@googlegroups.com
Dear Bob, thank you so much for the detailed reply. I will revise the model shortly, but I have a few reactions/questions first, if you don't mind:

On Saturday, April 4, 2015 at 4:29:41 AM UTC-3, Bob Carpenter wrote:
Those aren't the same models.  In the JAGS model you use
a uniform(0,100) prior on the standard deviations whereas
in the Stan model you're using gamma(0.001,0.001).
I'm assuming you really mean to put this gamma prior on
precision (inverse square std dev), not on the std dev
parameter, because that would follow BUGS practice.

In our manual chapter on regression, you can find our own
recommended priors for these parameters.  You probably
want tighter priors on the b parameters and something
like cauchy(0, 2.5) on the sd parameters.  You have the
parameters properly constrained, so that shouldn't be a problem.
And you're using the bernoulli_logit, which is great.


Oops, my mistake. I was changing all the sigmas to uniform but forgot to do so in the Stan version. I will use the Cauchy instead; it seems more reasonable, and in Stan it is a direct implementation of Gelman (2006), which is nice. I am not sure what a safe scale is for tightening the beta priors, though; I will experiment a bit to see whether anything changes regarding speed.

To speed up Stan, there are two more things you can do.
One is to vectorize the priors, rewriting

 for (g in 1:G) {
    alpha[g] ~ normal(alpha_hat[g],sigma_alpha);
    b1[g] ~ normal(b1_hat[g],sigma_b1);
    b2[g] ~ normal(b2_hat[g],sigma_b2);
    alpha_hat[g] ~ normal(0, 100);
    b1_hat[g] ~ normal(0, 100);
    b2_hat[g] ~ normal(0, 100);
  }

as

  alpha ~ normal(alpha_hat, sigma_alpha);
  b1 ~ normal(b1_hat, sigma_b1);
  b2 ~ normal(b2_hat, sigma_b2);
  alpha_hat ~ normal(0, 100);
  b1_hat ~ normal(0, 100);
  b2_hat ~ normal(0, 100);


Quick question: I suppose there is no way to do the same thing with the for (n in 1:N) loop in a hierarchical model, i.e. to vectorize the first-level equation, since some parameters must be indexed by the second-level index, as in b1[group[n]]. Is that right?

I'd also recommend changing the names --- "_hat" is typical
notation used for point estimates, not hierarchical priors.
We prefer something like:

  b1 ~ normal(mu_b1, sigma_b1);

so you're already halfway there!


Good point. Changed.

Otherwise everything looks OK.  Are you trying to hand-initialize
the parameters?  If so, that may be another issue.  I'd try to let
Stan just do it automatically.


I am already letting Stan choose the initial values. However, your suggestion raises a question: is that better practice, efficiency-wise, than passing hard initial values (for instance, rough previous estimates +/- 2 sd in some chains, zeros in another chain, etc.)?

If you still have problems, please send us the data (or simulated
data) and the revised model.

- Bob


Thanks! I will give it a try and come back with model and data in a few hours if the problem persists.

Martina

Bob Carpenter

Apr 4, 2015, 7:47:41 PM
to stan-...@googlegroups.com

> On Apr 5, 2015, at 6:52 AM, martina...@gmail.com wrote:

...

> Quick question: I suppose there is no way to do the same thing [vectorize] with the for (n in 1:N) loop in a hierarchical model, i.e. to vectorize the first-level equation, since some parameters must be indexed by the second-level index, as in b1[group[n]]. Is that right?

Correct --- but I'm working on it. It should land right after
ragged arrays --- maybe in six months to a year.
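Once that lands, the first-level likelihood could look something like this sketch, using multiple indexing with the group membership array; x1, x2, x3 and the non-varying slope b3 are placeholders, not your actual variable names:

```stan
// Sketch of a fully vectorized likelihood via multiple indexing
// (not yet available at the time of writing). group is an int
// array of length N mapping observations to groups; x1, x2, x3
// are vector[N] predictors; b3 is a non-varying slope.
y ~ bernoulli_logit(alpha[group]
                    + b1[group] .* x1
                    + b2[group] .* x2
                    + b3 * x3);
```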

...

> I am already letting Stan choose the initial values. However, your suggestion raises a question: is that better practice, efficiency-wise, than passing hard initial values (for instance, rough previous estimates +/- 2 sd in some chains, zeros in another chain, etc.)?

It can help convergence, but if you independently do
+/- 2sd marginally, it might not help in models with
strong correlations.

The default init is uniform(-2, 2) on the unconstrained
scale, but lowering the 2 to (-1, 1) or even (-epsilon, epsilon)
for a small epsilon, or even (0, 0), can help convergence in
models with very poorly behaved tails.

- Bob


martina...@gmail.com

Apr 5, 2015, 8:17:22 PM
to stan-...@googlegroups.com
Thank you for the feedback. It is great to know that a vectorized version of the lower level of hierarchical models is on the way.

In any case, I did vectorize the upper level as you suggested and adopted all your other recommendations. My model still gets stuck at the first iteration, even after half an hour.

As requested, I am sending the revised version of the model, the R code used to run it, and the data (which is loaded by readRDS in the R script). Any help will be very much appreciated.

Best,

Martina
mydata.file
mySTANmodel_v02.R

Bob Carpenter

Apr 5, 2015, 8:56:41 PM
to stan-...@googlegroups.com
Did you try simulating data according to the model and then
fitting that? We recommend always doing that before trying to
fit real data.

We'd also generally recommend starting with a small model that
does fit and gradually increasing its complexity. Then you can
diagnose where things go wrong. It's hard to debug a large
model that doesn't work.

When you say the model gets stuck in the first iteration, what is
happening? And since you haven't set refresh=1, you can't be sure it's
really the first iteration that's the problem.

One thing you can do to try to debug is add print() statements to
track the values of variables.
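For example, a print() statement in the model block reports values at every log-density evaluation (parameter names in this sketch are illustrative, not from your model):

```stan
model {
  // print() fires at every log-density evaluation, so expect
  // voluminous output; useful for spotting NaN or runaway values.
  print("sigma_alpha = ", sigma_alpha);
  alpha ~ normal(mu_alpha, sigma_alpha);
  // ... rest of the model unchanged ...
}
```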

How much memory do you have on your machine? Is it possible that it
tries to swap to disk and isn't technically stuck, but just going very
slowly?

Are you letting Stan do the initialization? If not, I'd try that
first.

- Bob

Andrew Gelman

Apr 5, 2015, 8:57:20 PM
to stan-...@googlegroups.com
This is another reason it will be good to have the R and Python wrappers that do graphical models in Stan.

martina...@gmail.com

Apr 6, 2015, 2:26:47 AM
to stan-...@googlegroups.com
Thanks for the reply!

Actually, it turns out that after a few tries (and yes, letting Stan do the initialization with the "random" option set), the model is now running. However, running 2000 iterations with 4 chains has already taken 8 hours on an i7 3.2 GHz personal computer with 16 GB RAM, and it has barely finished the first chain. That's not reasonable, and I suspect something is wrong, because the JAGS version (the one I sent in the original message) takes 6 hours to run 500k iterations on 4 chains. The same model with MCMChlogit (if I managed to mimic the model there; I am not sure) takes an amount of time similar to JAGS's.

I can't see anything particularly problematic in the model specification that would lead to such slow behavior specifically in Stan. Any ideas?

Best,

Martina

Bob Carpenter

Apr 6, 2015, 3:27:42 AM
to stan-...@googlegroups.com
Have you tried to fit with data simulated from the model? If not, could
you do that to make sure both programs can recover the proper parameter
values?

It's hard to debug a big program that's not working, whereas it's much
easier to build up a simpler program that works and see where it fails.

The Stan and JAGS models are still different in terms of priors and
constraints, but I don't see any reason the Stan model should be so slow
relative to the JAGS model unless the model is extremely poorly specified
(statistically, not computationally). In some cases where the posterior
is improper or nearly improper, Gibbs will merrily iterate without
actually sampling from the posterior.

Could you let Stan run fewer iterations and see what the diagnostic
parameters are, such as tree depth and acceptance rate? A high treedepth
(near or pegged at 10) indicates a problem in the posterior.
In that case, it's likely JAGS isn't behaving properly either, just
running quick iterations. Did the draws from JAGS look like they converged
in terms of R-hat and n_eff?

Given that the variable names are different, are you sure you're
running the same data and sizes of data?

- Bob

Bob Carpenter

Apr 6, 2015, 3:49:42 AM
to stan-...@googlegroups.com
I just simulated the data and found your problem.
Your hierarchical prior parameters are vectors rather than
scalars. This leaves the model non-identified, so Stan
will march all over the "posterior", whereas Gibbs
will move in a random walk, making it seem as if it is
sampling from the posterior.
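In sketch form (using the b1 naming from earlier in the thread rather than the actual attached files), the difference is:

```stan
// Sketch (data block omitted): vector-valued hyperprior
// locations break identification; the scalar version fixes it.
parameters {
  vector[G] b1;
  // vector[G] mu_b1;  // WRONG: G free locations for G effects
  real mu_b1;          // FIX: one location shared across groups
  real<lower=0> sigma_b1;
}
model {
  b1 ~ normal(mu_b1, sigma_b1);
  mu_b1 ~ normal(0, 100);
  sigma_b1 ~ cauchy(0, 2.5);
}
```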

There's a more thorough discussion in the problematic posteriors
section of the manual with a very simple example involving normal
location (mean) parameters.
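That manual example is essentially the following (sketched from the manual's discussion, not from the attached files): only the sum of the two locations is informed by the data, so without priors on the individual parameters the posterior is improper.

```stan
// Minimal non-identified model: the likelihood constrains only
// lambda1 + lambda2, leaving one direction of the parameter
// space completely flat.
data {
  int<lower=0> N;
  vector[N] y;
}
parameters {
  real lambda1;
  real lambda2;
  real<lower=0> sigma;
}
model {
  y ~ normal(lambda1 + lambda2, sigma);
}
```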

When I fix that problem and simulate data with N = 1000 and G=10,
Stan recovers the simulated parameters within their 95% intervals
in roughly 40 seconds using default settings.

I'm attaching both my simulator and the revised model.

- Bob

martina-sim.R
martina.stan

Andrew Gelman

Apr 6, 2015, 4:18:14 PM
to stan-...@googlegroups.com
Excellent! Another Stan win.

Lionel Henry

Apr 6, 2015, 5:27:09 PM
to stan-...@googlegroups.com, gel...@stat.columbia.edu

Andrew Gelman wrote:
This is another reason it will be good to have the R and Python wrappers that do graphical models in Stan.

Hello,

Is there such an effort going on right now? I'm asking because I'm also working on an R wrapper for Stan (and hopefully JAGS and blme, but my focus is currently on Stan).

Lionel

Andrew Gelman

Apr 6, 2015, 6:12:11 PM
to Lionel Henry, stan-...@googlegroups.com
Hi, I know of no such effort.  The idea would be to have a graphical modeling language (which would necessarily be less expressive than Stan’s language) where, given the model and data, the model could be translated into Stan.

Blme is another story.  For that there’s no real need (at least, I hope not), in that we are busy writing something that will translate lm/glm/bayesglm/lme/blme models into Stan.

A

Andrew Gelman

Apr 6, 2015, 6:36:22 PM
to Lionel Henry, stan-...@googlegroups.com
Indeed, Stan does not yet have MML but I hope it won’t be too far in the future!

> On Apr 6, 2015, at 6:34 PM, Lionel Henry <lione...@gmail.com> wrote:
>
>
>> Le 7 avr. 2015 à 00:12, Andrew Gelman <gel...@stat.columbia.edu> a écrit :
>>
>> Hi, I know of no such effort.
>
> Great, it'd have been sad to have duplicated efforts.
>
>
>> Blme is another story. For that there’s no real need (at least, I hope not), in that we are busy writing something that will translate lm/glm/bayesglm/lme/blme models into Stan.
>
> Do you mean a translation from a classical R formula to Stan
> code? I was talking about fitting a given model spec with blme(),
> which is the opposite operation. That may be less useful once
> Stan has MML estimation, but one of my aims is to make it easy to
> fit the same model with different software and parametrizations, as
> a way of detecting problems in the fitting procedure.
>
> But I will also provide R formula translation as I expect it will
> be trivial to implement on top of my package.
>
> I should have a working first version in two or three weeks.
>
> Lionel


Lionel Henry

Apr 6, 2015, 7:15:44 PM
to Andrew Gelman, stan-...@googlegroups.com

> Le 7 avr. 2015 à 00:12, Andrew Gelman <gel...@stat.columbia.edu> a écrit :
>
> Hi, I know of no such effort.

Great, it'd have been sad to have duplicated efforts.


> Blme is another story. For that there’s no real need (at least, I hope not), in that we are busy writing something that will translate lm/glm/bayesglm/lme/blme models into Stan.

Bob Carpenter

Apr 6, 2015, 8:21:43 PM
to stan-...@googlegroups.com
[changed topic]

Yes, there is such an effort going on, but not by us:

NIMBLE

http://r-nimble.org

I haven't been keeping up with what they're doing, but they
are building a graphical modeling language embedded in R that
they'll then use to translate to systems such as BUGS, JAGS,
Stan, and others.

Their goal is to support flexible reasoning over the model graph,
and they should be able to do the kinds of things Andrew wants.
Without help, I doubt they'll focus much on Stan and the
various densities and functions that only Stan supports.

- Bob

Martina Andrews

May 30, 2015, 7:08:14 PM
to stan-...@googlegroups.com
Dear Professor Carpenter,

Sorry for taking so long to reply after your kind effort of simulating the data for me. Thanks for that, by the way; it was really helpful for understanding some details of the model.

I had to travel unexpectedly, but I have now been able to come back to this and to my dear Stan model. I tried your fix for my hierarchical prior, along with the other tips you kindly gave about vectorization. Your code is really fast on the simulated data, and applied to my data without any modification (it matches the data well), it in fact doubles the speed of my estimation compared to previous attempts.

However, it still takes around 4 hours to run 2000 iterations on 4 chains. For most parameters n_eff is still below 100 and there is, of course, still evidence of non-convergence, so I have to run more iterations. That is why I am struggling to make this at least a bit faster: at the current rate, it would take some 4.5 days to reach around 50k absolute (non-effective) iterations. I also tried centering the variables, but nothing changed much. Do you have any additional ideas about why the model you sent me would perform so slowly in Stan?

P.S.: this is a model without correlation between the random effects; when I tried specifications with correlated random effects, using both Wishart and Lewandowski (LKJ) priors, nothing changed much either.
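(For reference, the correlated-effects attempt with the LKJ prior was along these lines; the sketch below uses illustrative names rather than my actual code, and omits the data and likelihood:)

```stan
// Sketch of correlated group-level effects with an LKJ prior.
parameters {
  vector[3] beta[G];       // per-group (intercept, b1, b2)
  vector[3] mu;            // population means
  vector<lower=0>[3] tau;  // scales
  corr_matrix[3] Omega;    // correlation matrix
}
model {
  tau ~ cauchy(0, 2.5);
  Omega ~ lkj_corr(2);
  beta ~ multi_normal(mu, quad_form_diag(Omega, tau));
}
```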

Thanks again for all your help.

Martina



Bob Carpenter

May 31, 2015, 5:39:57 PM
to stan-...@googlegroups.com
Could you send your current model? If it's running much
more slowly with real data than with simulated data of the same
size, it probably means the real data doesn't match the model
specification as well.

- Bob