Speeding Up Multinomial Logistic Regressions in Stan

Avi Feller

unread,

Apr 19, 2013, 10:39:37 AM4/19/13

to stan-...@googlegroups.com

Hi all,

I'm working on a problem that will eventually require a slightly complicated multilevel multinomial logistic regression model; so I'm naturally hoping to fit this model in Stan. Unfortunately, even for very straightforward multinomial logistic regression models, Stan seems to be dramatically slower than some of the alternatives, especially the MCMCmnl function in MCMCpack.

For example, I've fit a simple multinomial logit regression using an example data set from the MCMCpack package (the "Nethvote" dataset). Running the code from the manual (literally copy-and-paste), the Stan model takes about 20 minutes for a single chain of 1,000 iterations on my laptop (a fairly new MacBook Pro). By contrast, corresponding MCMCpack function takes about 3 seconds for 1,000 iterations, and about 45 seconds for the recommended 100,000 iterations. Of course, I realize that the Stan sampler can be slow in certain situations---especially relative to the M-H draws that MCMCpack uses---but I didn't expect a several order-of-magnitude difference between the run times. (Note that I've heard reports from several others about how slow multinomial logit regression in Stan can be.) Finally, I've also tried inputing the covariates as vectors rather than matrices, but didn't find a marked increase in speed.

I've attached the example code, which includes the "sessionInfo()" output. Hopefully there is an obvious fix to this problem, as it'd be great to use Stan for the more complex multinomial model that I'd like to fit!

Thanks in advance for the help.

Best,

Avi Feller

stan_multinom_code.R

Marcus Brubaker

unread,

Apr 19, 2013, 11:29:10 AM4/19/13

to stan-...@googlegroups.com

There are a couple things to keep in mind.

1) The example models in the manual are often written with
expositional clarity in mind, not speed. There is a section on
improving performance through vectorization which you should check
out. That model can be made to run a fair bit faster, particularly
for large amounts of data/variables. (We should probably provide a
library of "clear" and "optimized" examples for cases like this...)
If you need help vectorizing things, let us know.

2) MCMCmnl uses RW Metropolis and/or slice sampling. As a rule, Stan
will require far fewer samples than MCMCmnl to achieve a certain level
of accuracy. Or, to put it another way, 1 iteration of Stan is worth
a lot more than one iteration of MCMCmnl.

3) Stan may be be beaten by specially designed packages which target
very specific forms of models. This is the nature of the tradeoff
between generality and speed. That said, if you take 2) into account
when comparing results, Stan can still sometimes beat out specialized
packages.

Cheers,
Marcus

> --
> You received this message because you are subscribed to the Google Groups
> "stan users mailing list" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to stan-users+...@googlegroups.com.
> For more options, visit https://groups.google.com/groups/opt_out.
>
>

Michael Betancourt

unread,

Apr 19, 2013, 11:51:50 AM4/19/13

to stan-...@googlegroups.com

> 2) MCMCmnl uses RW Metropolis and/or slice sampling. As a rule, Stan
> will require far fewer samples than MCMCmnl to achieve a certain level
> of accuracy. Or, to put it another way, 1 iteration of Stan is worth
> a lot more than one iteration of MCMCmnl.

Exactly. Can you calculate the ESS for each method, in particular ESS / time?

Avi Feller

unread,

Apr 19, 2013, 12:46:56 PM4/19/13

to stan-...@googlegroups.com

Marcus and Michael (et al.),

First, thank you both for the prompt and thoughtful responses. I had already been thinking about Marcus's second and third points, but I definitely appreciate the need to reiterate them---they're important to keep in mind!

Michael's suggestion to check ESS is a good one. However, it only confirms my previous suspicions: the ESS for the MCMCpack estimates are generally between 11,000-12,000 (based on 100,000 iterations) and took about 50 seconds to estimate; the corresponding ESS for estimates from a single 1,000 iteration chain from Stan are between 250 and 300, but took about 22 minutes (~1350 seconds). So ESS/time is extremely unfavorable to Stan using the code from the manual.

Of course, I do understand that the MCMCmnl function was specifically tailored to this problem, but this still seems like a fairly dramatic asymmetry.

As for point #1, I had previously tried vectorizing/optimizing the Stan code but kept hitting roadblocks (to be fair, I'm fairly new to Stan so maybe I'm just doing a poor job!). Specifically, it appears as if the "softmax" can only take vectors, rather than matrices---in other words, Stan can only pass vectors of linear predictors to the softmax function one observation at a time. I had considered implementing something more complicated to work around this issue, but decided it made more sense to just email the Stan list directly.

In short, I'd very much welcome your suggestions for speeding up the example multinomial logit code. And again, I hope I'm just missing something else obvious (it certainly wouldn't be the first time).

Many thanks,

Avi

--
You received this message because you are subscribed to a topic in the Google Groups "stan users mailing list" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/stan-users/bqx6Rz2tDls/unsubscribe?hl=en.
To unsubscribe from this group and all its topics, send an email to stan-users+...@googlegroups.com.

Bob Carpenter

unread,

Apr 19, 2013, 1:46:17 PM4/19/13

to stan-...@googlegroups.com

Thanks for the report.

Have you checked the two models produce the same answers
(within MCMC error)? If MCMCmnl is working for you,
there's really no reason to switch to Stan if you don't
need Stan's modeling flexibility.

We would, of course, like to make Stan faster.

Would you mind sharing your data? We're collecting
benchmark problems. Or at least telling
us the sizes and if the predictors were standardized and
how much correlation there was among the predictors?

You're right that there's no vectorization of categorical
or of sofmax yet. But that's not the key to speeding this
model up.

As to the specialization of the code, what we really need
to do is write a multi_logit distribution so that your model likelihood:

for (n in 1:N)
y[n] ~ categorical(softmax(beta * x[n]));

can be written as:

y ~ multi_logit(beta, x);

For simple multinomial regression, all the derivatives take
very simple analytic forms for the likelihood and coding
this function this way would be as much as 20 times faster
than what's there now. Which would leave us only a factor
of 20 or so slower :-)

I'll put it on the to-do list. This is the kind of
thing we really want to make faster, so it's a fairly
high priority, and it's independent of all of our other
work, so we should be able to get to it right away.
(In fact, Marcus just put in a bunch of improvements like
this for Gaussian process models so as to compute their
covariance matrices faster and remove redundant solver
calls.)

Your model as is could be sped up by declaring x as a matrix:

and transforming what you have:

for (k in 1:K)
for (d in 1:D)
beta[k,d] ~ normal(0,5);
for (n in 1:N)
y[n] ~ categorical(softmax(beta * x[n]));

to first declare x as a matrix:

matrix[N,K] x;

and then redefining the model to:

model {
vector[N] beta_x;
beta_x <- beta * (5 * x);
for (k in 1:K)
beta_raw[k] ~ normal(0,1);
for (n in 1:N)
y[n] ~ categorical(beta_x[n]);
}

It'd be even faster if you scaled the data
instead of multiplying x by 5 each time in this
loop. You could either do that with the inputs or
in a transformed data block. But it should be pretty
fast because 5 and x do not involve parameters (why
I grouped the expression so oddly).

- Bob

On 4/19/13 10:39 AM, Avi Feller wrote:
> Hi all,
>
> I'm working on a problem that will eventually require a slightly complicated multilevel multinomial logistic regression
> model; so I'm naturally hoping to fit this model in Stan. Unfortunately, even for very straightforward multinomial
> logistic regression models, Stan seems to be dramatically slower than some of the alternatives, especially the MCMCmnl
> function in MCMCpack.
>
> For example, I've fit a simple multinomial logit regression using an example data set from the MCMCpack package (the
> "Nethvote" dataset). Running the code from the manual (literally copy-and-paste), the Stan model takes about 20 minutes
> for a single chain of 1,000 iterations on my laptop (a fairly new MacBook Pro). By contrast, corresponding MCMCpack

> function takes about 3 /seconds/ for 1,000 iterations, and about 45 seconds for the recommended 100,000 iterations. Of

> course, I realize that the Stan sampler can be slow in certain situations---especially relative to the M-H draws that
> MCMCpack uses---but I didn't expect a several order-of-magnitude difference between the run times. (Note that I've heard
> reports from several others about how slow multinomial logit regression in Stan can be.) Finally, I've also tried
> inputing the covariates as vectors rather than matrices, but didn't find a marked increase in speed.
>
> I've attached the example code, which includes the "sessionInfo()" output. Hopefully there is an obvious fix to this
> problem, as it'd be great to use Stan for the more complex multinomial model that I'd like to fit!
>
> Thanks in advance for the help.
>
> Best,
> Avi Feller
>

> --

> You received this message because you are subscribed to the Google Groups "stan users mailing list" group.

> To unsubscribe from this group and stop receiving emails from it, send an email to stan-users+...@googlegroups.com.

Avi Feller

unread,

Apr 19, 2013, 2:06:23 PM4/19/13

to stan-...@googlegroups.com

Bob,

Thank you for this. You make some excellent suggestions in your post. The truth is that I'm not in a particular rush on the analysis, so it seems like I should just be patient and wait for the next update of Stan. Specifically, it sounds like adding a "multi_logit" function is the best and most obvious next step.

As for your questions about the data, I'm happy to go into more detail, but it's probably more convenient for you to load the data set directly---I ran all of these examples using a built-in data set in the "MCMCpack" package, the "Nethvote" data set, where the outcome of interest is vote choice for one of four political parties in a 1989 Dutch survey. As you can see in the code from my original post, I did minimal/no pre-processing and have no doubt that that would also speed up Stan in this case.

Finally, you're right that if I were only interested in a simple model like this, MCMCmnl would be fine. However, my actual research question involves a more complex multilevel model, which MCMCmnl can't readily accomodate (note that the actual data set is restricted access). Hence hoping for a speed improvement in Stan.

Let me know if I can help clarify any more of these questions---perhaps off-list as we now seem to have converged on the problem at hand.

Thanks again for all the help.

Best,

Avi

To unsubscribe from this group and stop receiving emails from it, send an email to stan-users+unsubscribe@googlegroups.com.

For more options, visit https://groups.google.com/groups/opt_out.

--

You received this message because you are subscribed to a topic in the Google Groups "stan users mailing list" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/stan-users/bqx6Rz2tDls/unsubscribe?hl=en.

To unsubscribe from this group and all its topics, send an email to stan-users+unsubscribe@googlegroups.com.

Jack Tanner

unread,

Apr 19, 2013, 2:24:11 PM4/19/13

to stan-...@googlegroups.com

Bob, I'm awfully confused by your model redefinition. Why the
multiplication by 5? How is beta_raw related to anything? Could you walk
through this redefinition? Thanks in advance.

Bob Carpenter

unread,

Apr 19, 2013, 2:35:32 PM4/19/13

to stan-...@googlegroups.com

On 4/19/13 2:24 PM, Jack Tanner wrote:
> Bob, I'm awfully confused by your model redefinition. Why the multiplication by 5? How is beta_raw related to anything?
> Could you walk through this redefinition? Thanks in advance.

There's a section in the manual on reparameterization that should help.

In short, if beta is a parameter, then the statement:

beta ~ normal(0,5);

is equivalent to declaring beta_raw a parameter and
defining beta in terms of beta_raw by:

beta_raw ~ normal(0,1);
beta <- beta_raw * 5;

So you make beta_raw the parameter, then scale it by 5 when
using it. You can define beta as above (you'll need to make it
a local variable, or you can make it a transformed parameter if
you want to print it). Or you can just fold
it into the expression where it's used, which I did. Note
that:

beta * x[n] == beta_raw * 5 * x[n]

Associating (5 * x[n]) is faster because there's no
parameter involved.

- Bob

Bob Carpenter

unread,

Apr 19, 2013, 2:37:06 PM4/19/13

to stan-...@googlegroups.com

Thanks for the pointer to data -- it should help
us in benchmarking.

I hope we can get multi_logit into the next release.
Definitely the best bet if you want to use more complex models.

- Bob

> stan-users+unsubscribe@__googlegroups.com <mailto:stan-users%2Bunsu...@googlegroups.com>.
>
> For more options, visit https://groups.google.com/__groups/opt_out <https://groups.google.com/groups/opt_out>.

>
>
>
> --
> You received this message because you are subscribed to a topic in the Google Groups "stan users mailing list" group.

> To unsubscribe from this topic, visit https://groups.google.com/d/__topic/stan-users/bqx6Rz2tDls/__unsubscribe?hl=en
> <https://groups.google.com/d/topic/stan-users/bqx6Rz2tDls/unsubscribe?hl=en>.
> To unsubscribe from this group and all its topics, send an email to stan-users+unsubscribe@__googlegroups.com
> <mailto:stan-users%2Bunsu...@googlegroups.com>.
> For more options, visit https://groups.google.com/__groups/opt_out <https://groups.google.com/groups/opt_out>.

>
>
>
> --
> You received this message because you are subscribed to the Google Groups "stan users mailing list" group.

> To unsubscribe from this group and stop receiving emails from it, send an email to stan-users+...@googlegroups.com.

Tom Eagle

unread,

Apr 19, 2013, 2:55:56 PM4/19/13

to stan-...@googlegroups.com

Avi,

I have a small version of a vectorized multi-level poisson model. It could be translated into a logistic regression pretty easily. See the attached. There are two files: a stan code file and the R program that called it. Let me know if you want the data. Note I also use multiple processors to run each chain to speed things up…

Tom Eagle

--
You received this message because you are subscribed to the Google Groups "stan users mailing list" group.
To unsubscribe from this group and stop receiving emails from it, send an email to stan-users+...@googlegroups.com.

Poisson RSTAN TE 6.zip

Bob Carpenter

unread,

Apr 19, 2013, 3:53:59 PM4/19/13

to stan-...@googlegroups.com

On 4/19/13 2:55 PM, Tom Eagle wrote:
> Avi,
>
> I have a small version of a vectorized multi-level poisson model. It could be translated into a logistic regression
> pretty easily. See the attached. There are two files: a stan code file and the R program that called it. Let me know
> if you want the data.

Thanks. We always love examples.

You're right that the same GLM form could be used
for Poisson and logistic regression, but the multi-logit's
a bit trickier because we haven't vectorized as much of
it in the language or provided efficiently parameterized
probability functions.

> Note I also use multiple processors to run each chain to speed things upï¿½

That should help any system. But to do a fair comparison,
you'd have to be sure MCMCmnl isn't using multiple cores
already.

- Bob

Andrew Gelman

unread,

Apr 19, 2013, 5:26:53 PM4/19/13

to stan-...@googlegroups.com

When the dust clears on this example (so that the stan code is vectorized etc), we should run it by Malecki, because he knows all about MCMCpack.
A

On Apr 19, 2013, at 3:53 PM, Bob Carpenter wrote:

>
>
> On 4/19/13 2:55 PM, Tom Eagle wrote:
>> Avi,
>>
>> I have a small version of a vectorized multi-level poisson model. It could be translated into a logistic regression
>> pretty easily. See the attached. There are two files: a stan code file and the R program that called it. Let me know
>> if you want the data.
>
> Thanks. We always love examples.
>
> You're right that the same GLM form could be used
> for Poisson and logistic regression, but the multi-logit's
> a bit trickier because we haven't vectorized as much of
> it in the language or provided efficiently parameterized
> probability functions.
>

> > Note I also use multiple processors to run each chain to speed things up…

>
> That should help any system. But to do a fair comparison,
> you'd have to be sure MCMCmnl isn't using multiple cores
> already.
>
> - Bob
>

Reply all

Reply to author

Forward