stan's slow optimization


Andrew Gelman

May 25, 2013, 9:31:41 AM
to stan...@googlegroups.com
Hi all. I'm still reflecting upon the fact, reported by Bob, that Stan-optimize is at least an order of magnitude slower than glm for simple logistic regression. What bothers me is that glm uses iteratively reweighted least squares, which is itself not supposed to be a very fast algorithm. My impression is that the fastest logistic regression algorithms solve the optimization problem directly, without wasting their time on least squares.
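For concreteness, the objective such a direct method minimizes is just the logistic negative log likelihood, optimized with its analytic gradient. A minimal sketch in R (the names are mine; X is the design matrix, y the 0/1 response):

    ## Objective and gradient a direct optimizer (Newton, BFGS, ...) works with.
    nll <- function(beta, X, y) {
      eta <- drop(X %*% beta)
      -sum(y * eta - log1p(exp(eta)))    # negative log likelihood
    }
    nll_grad <- function(beta, X, y) {
      p_hat <- plogis(drop(X %*% beta))  # fitted probabilities
      -drop(crossprod(X, y - p_hat))     # gradient is -X'(y - p_hat)
    }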
I can see that Stan will not be the fastest thing out there--after all, it was not designed for optimization--but should we try to understand why it is so slow? I'm thinking about this with regard to my plan to have Stan be a replacement for lmer/glmer. If we're so slow at basic optimization, I'm worried about these more complicated algorithms.
A

Marcus Brubaker

May 26, 2013, 11:25:45 AM
to stan...@googlegroups.com
Actually, the speed of the optimizer that Bob reported is no longer an issue.  It was due to an old, known-to-be-slow algorithm.  I tested the BFGS-based optimizer that I implemented recently and found that it's about the same speed as glm.  (That was on one problem; I haven't done the more thorough testing Bob did, varying the data and parameter sizes, but I expect the results to generalize.)

The BFGS optimizer still has a few minor issues, which I'm resolving right now, but I hope to make it the default optimizer for the next release.

Cheers,
Marcus







Ben Goodrich

May 26, 2013, 11:55:56 AM
to stan...@googlegroups.com, gel...@stat.columbia.edu
I wouldn't be surprised if BFGS optimization in Stan were always slower than a specialized optimization algorithm for a particular problem where analytic gradients are available. But if researchers are only interested in the mode, then lmer / glmer are probably adequate. For Stan to add value here, I think you need to convince interested people that MCMC would be better than mode finding (which shouldn't be too hard) and to implement an interface to those models that is as easy as glmer's (which is what I have been pushing for with an rstanarm package that someone like Alexej could work on).

Ben

Marcus Brubaker

May 26, 2013, 1:53:05 PM
to stan...@googlegroups.com
I actually think the same arguments that support Stan over BUGS/JAGS in the MCMC context apply to the mode-finding context.  Yes, a specialized algorithm for a specific class of models should always be faster than a general-purpose algorithm, but then you're stuck with that specific model.

At any rate, with BFGS I think we're basically competitive with glm on mode-finding speed.  We probably require a lot more memory, which may hurt with enough parameters and/or data, but if that comes up it wouldn't be too much work to implement L-BFGS.
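(Rough arithmetic, to make the memory point concrete: BFGS carries a dense p-by-p approximation to the inverse Hessian, so at p = 10,000 parameters that's 10^8 doubles, on the order of 800 MB, whereas L-BFGS with the usual 5-10 stored correction pairs keeps only about 2 * 10 * p doubles, a couple of megabytes.)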

Cheers,
Marcus



Ben Goodrich

May 26, 2013, 2:09:31 PM
to stan...@googlegroups.com
If we want to do something closer to an apples-to-apples comparison, it should be against the BFGS implementation in optim(), rather than the IRLS algorithm that glm() uses. But I already assume that BFGS in C++ is faster than optim()'s C implementation, which calls back into R to evaluate the function and gradient.
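Something like the following is what I have in mind (a sketch with simulated data; the negative log likelihood and gradient are written out by hand, and glm.fit() is called directly to skip the formula overhead):

    ## Apples-to-apples sketch: optim()'s BFGS vs. glm()'s IRLS on the same data.
    set.seed(1)
    n <- 1e5; p <- 20
    X <- cbind(1, matrix(rnorm(n * (p - 1)), n, p - 1))  # design matrix
    y <- rbinom(n, 1, plogis(drop(X %*% rnorm(p))))      # simulated 0/1 response

    nll <- function(b) { eta <- drop(X %*% b); -sum(y * eta - log1p(exp(eta))) }
    gr  <- function(b) -drop(crossprod(X, y - plogis(drop(X %*% b))))

    system.time(fit_bfgs <- optim(rep(0, p), nll, gr, method = "BFGS"))
    system.time(fit_irls <- glm.fit(X, y, family = binomial()))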

Ben

Bob Carpenter

May 27, 2013, 11:45:49 AM
to stan...@googlegroups.com
Right -- I hadn't realized we hadn't put the faster
optimizer in yet (I should've looked at the code).

For Stan 2.0, we should be FASTER than glm() and lm(),
at least for larger problems.

- Bob

Andrew Gelman

May 27, 2013, 5:41:16 PM
to stan...@googlegroups.com
Good news. Once it's in, I'll give it a try.
A

Bob Carpenter

May 27, 2013, 6:53:18 PM
to stan...@googlegroups.com
Does "least squares" imply a particular algorithm for
optimization? I was under the impression it defined an
objective function that is then optimized.

- Bob

Andrew Gelman

May 27, 2013, 6:56:42 PM
to stan...@googlegroups.com
I think of the "least squares" algorithm as being any of the non-iterative solutions of the least-squares problem.

Logistic regression is of course not a least-squares problem at all; there are various iterative least squares algorithms that solve logistic regression by starting with a guess, linearizing the derivative of the log likelihood at that guess, doing weighted least squares, linearizing around the new guess, and so on. This algorithm is very natural (to statisticians) but is slow (for big problems) and can be unstable (even for small problems, as we discussed once on the blog).
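In code, that recipe looks roughly like this (a sketch in R, not glm()'s actual implementation; X is the design matrix, y the 0/1 response):

    ## Iteratively reweighted least squares for logistic regression:
    ## linearize at the current guess, solve a weighted least-squares
    ## problem, move to the new guess, repeat.
    irls <- function(X, y, tol = 1e-8, maxit = 25) {
      beta <- rep(0, ncol(X))
      for (it in seq_len(maxit)) {
        eta <- drop(X %*% beta)
        mu  <- plogis(eta)          # current fitted probabilities
        w   <- mu * (1 - mu)        # weights; these go to 0 for fitted values
                                    # near 0 or 1, one source of the instability
        z   <- eta + (y - mu) / w   # working response from the linearization
        beta_new <- drop(solve(crossprod(X, w * X), crossprod(X, w * z)))
        if (max(abs(beta_new - beta)) < tol) return(beta_new)
        beta <- beta_new
      }
      beta
    }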

A