On 7/18/13 6:19 PM, Ben Goodrich wrote:
> On Thursday, July 18, 2013 8:30:09 AM UTC-4, Avi Feller wrote:
>
> I'm particularly interested in ways of speeding up the current "transformed parameters" section, which seems to be
> the slowest part. In a usual MCMC, this part is executed via data augmentation, which is obviously impossible in Stan.
You can tell this will be the slow part because it's N * K
exp() evals, plus N size-K sums.
...
Your code looks pretty clean (if oddly R-like in spacing
to my eyes; pardon my reversion to C++ style).
Using a sum() of an explicitly calculated vector should work.
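For example (a sketch only, with log_p_vec assumed to be the vector of per-observation log probabilities from your model; lp__ is the accumulated log probability):

```stan
transformed parameters {
  vector[N] log_p_vec;
  // ... fill in log_p_vec[n] for each observation n ...
}
model {
  lp__ <- lp__ + sum(log_p_vec);  // one big sum beats N separate increments
}
```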
You can make this faster:
for (k in 1:(K-2))
  for (d in 1:D)
    gamma[k,d] ~ cauchy(0, 2.5);
by vectorizing. Do the same for the normal prior on beta.
for (k in 1:(K-2))
  gamma[k] ~ cauchy(0, 2.5);
And this is most efficient as an array of vectors (what
you have for beta, but not gamma). We'll eventually vectorize
this further so you can just write gamma ~ cauchy(0, 2.5),
at which point the most efficient form will be a matrix. (Sorry
it's a moving target for efficiency.)
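Concretely, a sketch of the array-of-vectors version (sizes assumed to match your model):

```stan
parameters {
  vector[D] gamma[K-2];   // array of K-2 vectors, each of size D
}
model {
  for (k in 1:(K-2))
    gamma[k] ~ cauchy(0, 2.5);   // vectorized over all D entries at once
}
```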
> it should be a little faster if you use the softmax() function. What you have here relies on the
> auto-differentiation mechanism to do the derivatives, while most named functions in Stan have analytic derivatives. The
> auto-differentiation requires memory allocations and so forth.
I'm afraid softmax() isn't one of them, so you're unlikely
to see any speedup at all.
But it is built in, so it should be a little less error prone
than coding your own softmax in the model.
We should add faster auto-diff for softmax --- I was too lazy to
compute and code the derivatives when I coded it initially.
And you're actually taking the log of a softmax, which is
another op that could itself be sped up and made more arithmetically
stable. Do you really need the normalization supplied by softmax?
We only need probabilities up to a proportion, and it looks like you
get the same denominator in each branch of the if either way.
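In the meantime, if theta[n] is softmax(eta[n]) for some vector of unnormalized log weights (the name eta is assumed here, not taken from your model), the log can be computed without ever exponentiating and then re-logging, since log(softmax(eta))[k] = eta[k] - log_sum_exp(eta):

```stan
// stable log of a softmax: no exp() followed by log()
for (k in 1:K)
  log_theta[n,k] <- eta[n,k] - log_sum_exp(eta[n]);
```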
For simplicity, you could pull this common term out of the branches:
    if (cat[n] == 1)
      log_p_vec[n] <- log(theta[n,1])
                      + normal_log(y[n], beta[1] * x[n], sigma[1]);
Note that you can put the plus sign on the next line, unlike in
R; this conforms to standard math typesetting practice, which
aims at making formulas easier to scan.
...
    log_p_vec[n] <- log(theta[n,cat[n]])
                    + normal_log(y[n], beta[cat[n]] * x[n], sigma[cat[n]]);
    if (cat[n] == 3)
      log_p_vec[n] <- log_sum_exp(log_p_vec[n],
                                  ... other big term ...);
    else if (cat[n] == 4)
      log_p_vec[n] <- log_sum_exp(log_p_vec[n],
                                  ... other term ...);
And given the form, you could combine the cat[n] == 3 and cat[n] == 4
cases. But this might not generalize to your richer model.
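That is, if the two extra terms really differ only in the category index, the branches collapse into one (a sketch; the elided term stays whatever it is in your model):

```stan
if (cat[n] == 3 || cat[n] == 4)
  log_p_vec[n] <- log_sum_exp(log_p_vec[n],
                              ... extra term indexed by cat[n] ...);
```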
- Bob