Slow performance for logistic regression with random effects

Sean J Taylor

Sep 6, 2012, 1:24:02 PM
to stan-...@googlegroups.com
I've been benchmarking Stan against JAGS for a simple logistic regression.  I have 2 binary independent variables and one binary outcome, with 100K observations and 1500 groups.  For each model I am running 1000 iterations and 4 chains on a MacBook Pro with a 2.4 GHz processor.

For the plain logistic regression (no random effects), it takes about 6 minutes in Stan and 7 minutes in JAGS.  Not a bad speedup.

When I add random effects with a hyperparameter on the variance (tau ~ inv_gamma(0.001, 0.001), u ~ normal(0, sqrt(tau)), where u is a 1500-element vector), things slow down considerably.  Stan takes almost an hour to fit this model, whereas JAGS takes about 10 minutes.

Am I doing something horribly wrong here?  I've attached my Stan and BUGS model files.
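
For quick reference, the model looks roughly like this (a condensed sketch of the structure; the variable names below are made up, and the attached files have the actual code):

data {
  int<lower=1> N;                  // ~100K observations
  int<lower=1> J;                  // ~1500 groups
  int<lower=0,upper=1> y[N];       // binary outcome
  int<lower=0,upper=1> x1[N];      // binary predictor 1
  int<lower=0,upper=1> x2[N];      // binary predictor 2
  int<lower=1,upper=J> group[N];   // group index
}
parameters {
  real b0;
  real b1;
  real b2;
  real<lower=0> tau;               // random-effect variance
  vector[J] u;                     // group random effects
}
model {
  b0 ~ normal(0, 100);
  b1 ~ normal(0, 100);
  b2 ~ normal(0, 100);
  tau ~ inv_gamma(0.001, 0.001);
  u ~ normal(0, sqrt(tau));
  for (n in 1:N)
    y[n] ~ bernoulli(inv_logit(b0 + b1 * x1[n] + b2 * x2[n] + u[group[n]]));
}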

Thanks!

Sean
upvote.viewer.bugs
upvote.viewer.stan

Bob Carpenter

Sep 6, 2012, 1:42:38 PM
to stan-...@googlegroups.com


On 9/6/12 1:24 PM, Sean J Taylor wrote:
> I've been benchmarking Stan against JAGS for a simple logistic regression. I have 2 binary independent variables and
> one binary outcome with 100K observations and 1500 groups. For each model I am running 1000 iterations and 4 chains on
> a Macbook Pro with 2.4 Ghz processor.
>
> For a logistic regression, it takes about 6 minutes in Stan, 7 minutes in JAGS. Not a bad speedup.

JAGS has a GLM module that can speed up some regressions.
I'm not sure if your model falls in that category or not.

> When I add random effects with a hyper parameter on the variance (tau ~ inv_gamma(0.001, 0.001), u ~ normal(0,
> sqrt(tau)), u is a 1500 element vector), things slow down considerably. Stan takes almost an hour to fit this model,
> whereas JAGS takes about 10 minutes.

Some questions:

1. Were the effective sample sizes similar when
you were done?

2. What optimization level are you compiling under?

3. If you're not running under R, do you know what the
tuning parameters look like after adaptation and what
the tree depth is for NUTS?

4. Does it take Stan a long time to converge to the
area of the final answer with a random starting point, or
does it take a long time to sample once it gets there?

> Am I doing something horribly wrong here? I've attached my stan and bugs model files.

Not that I can see. You have the right constraints
on the variance parameter tau.

One simplification you can make is to use the logit-scaled
Bernoulli, replacing

z ~ bernoulli(inv_logit(theta));

with

z ~ bernoulli_logit(theta);

It should be both faster and more arithmetically stable.

You might also try a slightly more informative prior on
the regression coefficients than normal(0,100); or you
could just drop the prior altogether -- the normal(0,100)
prior won't have much effect anyway. But it also won't
lead to a big speedup.
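
For example, something along the lines of

beta ~ normal(0, 5);

where the name and the scale are just illustrative; anything weakly
informative on the logit scale for binary predictors would be in the
right ballpark.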

Why do you have an inverse gamma prior on the variance tau?
That could be part of the problem, as the u are not very
constrained here.

See:

http://www.stat.columbia.edu/~gelman/research/published/taumain.pdf
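
For example, one option discussed there is to put the prior on the
standard deviation rather than the variance (the name sigma_u and the
scale below are just illustrative):

parameters {
  ...
  real<lower=0> sigma_u;
}
model {
  ...
  sigma_u ~ cauchy(0, 2.5);   // half-Cauchy, given the lower=0 constraint
  u ~ normal(0, sigma_u);
  ...
}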

- Bob

Matt Hoffman

Sep 6, 2012, 1:52:42 PM
to stan-...@googlegroups.com
Hi Sean,

Another tip that I've found useful: Instead of saying

model {
  ...
  tau ~ inv_gamma(0.001, 0.001);
  u ~ normal(0, sqrt(tau));
  ...
}

I'm confident you'll get faster convergence if you use this equivalent model:

transformed parameters {
  real u[J];
  for (j in 1:J)
    u[j] <- u_z[j] * sqrt(tau);
}
model {
  ...
  tau ~ inv_gamma(0.001, 0.001);
  u_z ~ normal(0, 1);
  ...
}

The reason is that this breaks the strong prior dependence between
u and tau and reduces the range of the distribution of posterior log
probabilities. Of course, this kind of reparameterization might make
JAGS faster too, for all I know!
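
(I left the declaration out of the sketch above: u_z would be declared
in the parameters block in place of u, something like

parameters {
  ...
  real u_z[J];
  ...
}

since u is now a transformed parameter rather than a parameter.)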

Best,
Matt

Sean J Taylor

Sep 6, 2012, 3:14:19 PM
to stan-...@googlegroups.com
Hey Bob,

Thanks for the quick reply. I'm running all of these using rstan, so I'm letting R take care of compiling and tuning parameters.

1. n_eff reported by summary(stan.fit) is ~1000 for every variable, except the constant, where it's ~40.
2. Unsure what rstan is doing here.
3. Again, I'm just using defaults.  Not sure if there's something smarter I could be doing.
4. Looking at the traceplots, the chains seem to converge quite quickly (within the first hundred iterations).

I just swapped in the bernoulli_logit function, tightened up the priors on my betas, and replaced the tau prior with uniform(0, 10), but there was no significant speedup.

I'm trying out Matt's reparameterization right now and will report the results here in a bit.

Sean

Bob Carpenter

Sep 6, 2012, 3:27:09 PM
to stan-...@googlegroups.com
There's a note in the install instructions on how to
set the compiler flag to -O3 for more optimization.
It makes a big difference, even compared to the default
-O2 in R's build.

On 9/6/12 3:14 PM, Sean J Taylor wrote:
> Hey Bob,
>
> Thanks for the quick reply. I'm running all of these using rstan, so I'm letting R take care of compiling and tuning
> parameters.
>
> 1. n_eff reported by summary(stan.fit) is ~1000 for every variable, except the constant, where it's ~40.

Right, so this means the constant is harder to sample.
You have more effective samples than you probably need
for the other variables.

Any idea of the effective sample size under JAGS?
You can get it using R2jags (which was itself based
on R2WinBUGS and in turn formed the basis for RStan;
having said that, we compute ESS and R-hat a bit differently
in RStan).

What matters for speed is n_eff / wall time.

If you converge very quickly, you should only need to do that
much warmup (or a bit more to be safe) in subsequent runs.

> 2. Unsure what rstan is doing here.

Where? The manual explains how we calculate ESS and
R-hat in Stan.

> 3. Again, I'm just using defaults. Not sure if there's something smarter I could be doing.

You can get faster convergence with reasonable initialization.
But it still makes sense to have dispersed starting points
to make sure the MCMC is working.

> 4. Looking at the traceplots, the chains seems to converge quite quickly (within the first hundred iterations).

That's good news.

> I just swapped in the bernoulli_logit function, tightened up the priors on my betas, and replaced the tau prior with
> uniform(0, 10), but there was no significant speedup.

The bernoulli_logit's faster, but even the basic bernoulli's
pretty fast. The big advantage is increased arithmetic
stability as it gets around potential overflow and underflow
in applying exp() to the linear predictor.

>
> I'm trying out Matt's reparameterization right now, will report the results here in a bit.

Thanks! We need to put more of these tips in the doc.

- Bob

Jiqiang Guo

Sep 6, 2012, 3:36:36 PM
to stan-...@googlegroups.com
The adapted tuning parameters and the tree depth can be obtained in R using the functions get_sampler_params and get_adaptation_info.

--
Jiqiang

Andrew Gelman

Sep 6, 2012, 5:03:41 PM
to stan-...@googlegroups.com
In any case, I think we should see what we can do to get hierarchical logistic regression going faster. Maybe vectorization of the Bernoulli and Matt's separating parameterization will help? I hope we can be faster than BUGS/JAGS; after all, it's not like Gibbs is doing anything helpful or computationally efficient in these models.
A

Matt Hoffman

Sep 6, 2012, 7:06:57 PM
to stan-...@googlegroups.com
JAGS is using a specialized sampler that exploits second-order
information, but yes. It'd be good to be competitive.

Matt