Thanks for the report.
Have you checked that the two models produce the same answers
(within MCMC error)? If MCMCmnl is working for you and you
don't need Stan's modeling flexibility, there's really no
reason to switch to Stan.
We would, of course, like to make Stan faster.
Would you mind sharing your data? We're collecting
benchmark problems. If not, could you at least tell
us the sizes, whether the predictors were standardized, and
how much correlation there was among the predictors?
You're right that there's no vectorization of categorical
or of softmax yet. But that's not the key to speeding this
model up.
As to the specialization of the code, what we really need
to do is write a multi_logit distribution so that your model likelihood:

    for (n in 1:N)
      y[n] ~ categorical(softmax(beta * x[n]));

can be written as:

    y ~ multi_logit(beta, x);

For simple multinomial regression, all the derivatives of the
likelihood take very simple analytic forms, and coding the
function that way could be as much as 20 times faster
than what's there now. That would still leave us a factor
of 20 or so slower than MCMCmnl :-)
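
To spell out what "very simple analytic forms" means, here is the standard softmax-regression gradient. The notation is an assumption made for this sketch: x_n is the D-vector of predictors for observation n, beta is K x D, and p_{n,k} is the softmax probability of category k.

```latex
% Assumed notation: x_n \in R^D, \beta \in R^{K \times D}, y_n \in \{1, \dots, K\}
\eta_n = \beta x_n, \qquad
p_{n,k} = \frac{\exp(\eta_{n,k})}{\sum_{j=1}^{K} \exp(\eta_{n,j})}

\log p(y \mid x, \beta)
  = \sum_{n=1}^{N} \Bigl( \eta_{n,y_n} - \log \sum_{j=1}^{K} \exp(\eta_{n,j}) \Bigr)

\frac{\partial}{\partial \beta_{k,d}} \log p(y \mid x, \beta)
  = \sum_{n=1}^{N} \bigl( \mathbb{1}[y_n = k] - p_{n,k} \bigr) \, x_{n,d}
```

Each gradient entry is an "observed minus expected" indicator weighted by a predictor, so the whole gradient is one matrix product and needs no per-operation automatic differentiation.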
I'll put it on the to-do list. This is the kind of
thing we really want to make faster, so it's a fairly
high priority, and it's independent of all of our other
work, so we should be able to get to it right away.
(In fact, Marcus just put in a bunch of improvements like
this for Gaussian process models so as to compute their
covariance matrices faster and remove redundant solver
calls.)
Your model as is could be sped up by transforming what you have:

    for (k in 1:K)
      for (d in 1:D)
        beta[k,d] ~ normal(0,5);
    for (n in 1:N)
      y[n] ~ categorical(softmax(beta * x[n]));

to first declare x as a matrix with one column per observation:

    matrix[D,N] x;

then declare the coefficients on the unit scale as a parameter
matrix[K,D] beta_raw, and redefine the model as:

    model {
      matrix[K,N] beta_x;
      beta_x <- beta_raw * (5 * x);   // same as (5 * beta_raw) * x
      for (k in 1:K)
        beta_raw[k] ~ normal(0,1);
      for (n in 1:N)
        y[n] ~ categorical(softmax(col(beta_x, n)));
    }

It'd be even faster if you scaled the data once
instead of multiplying x by 5 every time the model
block is evaluated. You could do that either with the
inputs or in a transformed data block. Even as written it
should be pretty fast, because 5 and x do not involve
parameters, so their product needs no derivatives (which is
why I grouped the expression so oddly).
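
A rough sketch of the transformed data route, not run against your data. It assumes x is read in as a D x N matrix (one column per observation) and that the coefficients are a K x D parameter called beta_raw. The data block declarations are a guess at your setup, and x_scaled is a made-up name for illustration.

```stan
data {
  int<lower=2> K;              // number of outcome categories
  int<lower=1> D;              // number of predictors
  int<lower=0> N;              // number of observations
  int<lower=1,upper=K> y[N];   // observed category for each observation
  matrix[D,N] x;               // predictors, one column per observation (assumed layout)
}
transformed data {
  matrix[D,N] x_scaled;        // hypothetical name
  x_scaled <- 5 * x;           // computed once, when the data are read
}
parameters {
  matrix[K,D] beta_raw;        // coefficients on the unit scale
}
model {
  matrix[K,N] beta_x;
  beta_x <- beta_raw * x_scaled;   // one matrix product per evaluation
  for (k in 1:K)
    beta_raw[k] ~ normal(0,1);     // implies 5 * beta_raw ~ normal(0,5)
  for (n in 1:N)
    y[n] ~ categorical(softmax(col(beta_x, n)));
}
```

Assignment is written `<-` to match the code above. Stan releases that spell assignment as `=` would need that substitution.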
- Bob
On 4/19/13 10:39 AM, Avi Feller wrote:
> Hi all,
>
> I'm working on a problem that will eventually require a slightly complicated multilevel multinomial logistic regression
> model; so I'm naturally hoping to fit this model in Stan. Unfortunately, even for very straightforward multinomial
> logistic regression models, Stan seems to be dramatically slower than some of the alternatives, especially the MCMCmnl
> function in MCMCpack.
>
> For example, I've fit a simple multinomial logit regression using an example data set from the MCMCpack package (the
> "Nethvote" dataset). Running the code from the manual (literally copy-and-paste), the Stan model takes about 20 minutes
> for a single chain of 1,000 iterations on my laptop (a fairly new MacBook Pro). By contrast, the corresponding MCMCpack
> function takes about 3 /seconds/ for 1,000 iterations, and about 45 seconds for the recommended 100,000 iterations. Of
> course, I realize that the Stan sampler can be slow in certain situations---especially relative to the M-H draws that
> MCMCpack uses---but I didn't expect a difference of several orders of magnitude between the run times. (Note that I've heard
> reports from several others about how slow multinomial logit regression in Stan can be.) Finally, I've also tried
> inputting the covariates as vectors rather than matrices, but didn't find a marked increase in speed.
>
> I've attached the example code, which includes the "sessionInfo()" output. Hopefully there is an obvious fix to this
> problem, as it'd be great to use Stan for the more complex multinomial model that I'd like to fit!
>
> Thanks in advance for the help.
>
> Best,
> Avi Feller
>