Are you running optimized code (-O3)? Are you running
from R or from the command line? Either way, how is the
memory on the machine doing overall? It can be a problem
in general if you get close to swapping and a big problem
if you start swapping (paging memory in and out to disk).
I'm afraid the simple answer is yes, bigger models
take longer to sample. The time that it takes to
calculate a log probability and gradient
for a model with a fixed set of parameters and priors
will be proportional to the data size plus a constant
for the work done for the priors. Now if the data's
much larger than the set of parameters, the data size
should dominate the calculations.
The issue is with the posterior geometry. In some
cases, more data can make the problem faster by more
tightly constraining the posterior. In other cases, the
hierarchical structure can cause the sampler to get stuck
in regions of low hierarchical variance (see the discussion
of Neal's funnel in the manual).
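For reference, here's what that pathology looks like in minimal form; this is a sketch of the funnel from the manual, not your model (the dimensions and priors here are illustrative):

```stan
parameters {
  real log_tau;        // group-level log scale
  vector[9] theta;     // group-level effects
}
model {
  log_tau ~ normal(0, 3);
  // As log_tau drops, theta is squeezed into the funnel's neck,
  // which the sampler has trouble entering and leaving.
  theta ~ normal(0, exp(log_tau / 2));
}
```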
The code's pretty tight as you wrote it. The only suggestions
I have for improving its speed are to:
1. Further vectorize this operation:
   for (d in 1:D) {
     for (s in 1:S) {
       beta[s,d] <- mu[d] + e_beta[s,d] * sigma[d];  # "Matt trick" beta
     }
   }

to

   for (s in 1:S)
     beta[s] <- mu + e_beta[s] .* sigma;

The ".*" is elementwise multiplication, as in MATLAB. (This
assumes mu, sigma, and the rows beta[s] and e_beta[s] are all
vectors of length D.)
This may not even be a noticeable speed improvement, but
it simplifies the model.
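For context, here's a minimal sketch of the declarations the vectorized form assumes; the block structure and the normal(0,1) prior on e_beta are assumptions about the rest of your model:

```stan
parameters {
  vector[D] mu;
  vector<lower=0>[D] sigma;
  vector[D] e_beta[S];                   // standardized per-subject effects
}
transformed parameters {
  vector[D] beta[S];
  for (s in 1:S)
    beta[s] <- mu + e_beta[s] .* sigma;  // "Matt trick" scaling
}
model {
  for (s in 1:S)
    e_beta[s] ~ normal(0, 1);            // implies beta[s] ~ normal(mu, sigma)
}
```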
2. Reparameterize your gamma prior:
sigma ~ gamma(1,0.5);
which is equivalently
sigma ~ exponential(0.5);
in Stan's notation. This won't save you much time, if any,
because the gamma implementation is clever enough not to
recompute the Gamma function for a fixed alpha. But from there
you can go one step further:
   parameters {
     vector<lower=exp(0),upper=exp(20)>[D] sigma_raw;
     ...
   transformed parameters {
     vector<lower=0,upper=20>[D] sigma;
     for (d in 1:D) sigma[d] <- log(sigma_raw[d]);
     ...
   model {
     sigma_raw ~ uniform(exp(0),exp(20));
     ...
This probably also won't make much of a difference because sigma
isn't a very long vector. It might help with the geometry, though.
3. Convert x and beta to types:
vector[D] x[N];
vector[D] beta[S];
That is, make them arrays of vectors. The reason is that pulling
a vector member out of an array doesn't require any copying or
allocation in the loops. Calling row(x,n) has to do two bad
things: access the values of x in non-local order (because
matrices are stored column-major) and create an intermediate
row-vector data structure to hold the results.
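To make the difference concrete, here's a sketch with x as an array of vectors; the y declaration and the likelihood line are stand-ins for whatever your model actually computes:

```stan
data {
  int<lower=1> N;
  int<lower=1> D;
  vector[D] x[N];                // x[n] is a reference into the array: no copy
  int<lower=0,upper=1> y[N];
}
parameters {
  vector[D] beta;
}
model {
  for (n in 1:N)
    // With a matrix, row(x,n) would gather a row out of column-major
    // storage into a fresh row_vector on every iteration.
    y[n] ~ bernoulli_logit(dot_product(x[n], beta));
}
```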
4. Switch to a logistic regression from a probit. The
cumulative distribution function Phi() is a lot less efficient
than the inverse logit.
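Concretely, that's replacing the first of these lines with the second, where eta[n] is a stand-in for whatever linear predictor your model computes:

```stan
y[n] ~ bernoulli(Phi(eta[n]));   // probit link via Phi(): expensive
y[n] ~ bernoulli_logit(eta[n]);  // logit link: cheaper, more stable in the tails
```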
5. Aggregate the data in y and replace the Bernoulli
with a binomial. There's an example of this in the manual's
section on optimization. It looks possible because x appears
to have many repeated rows, and we know ss[n] only takes on
three values for the three subjects.
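A sketch of what that aggregation might look like; J, trials, and successes are hypothetical names for the collapsed data, and the linear predictor is again a stand-in:

```stan
data {
  int<lower=1> J;              // number of unique (x row, subject) patterns
  int<lower=1> D;
  vector[D] x[J];
  int<lower=1,upper=3> ss[J];
  int<lower=0> trials[J];      // how many observations share pattern j
  int<lower=0> successes[J];   // how many of those had y = 1
}
...
model {
  for (j in 1:J)
    successes[j] ~ binomial(trials[j],
                            inv_logit(dot_product(x[j], beta[ss[j]])));
}
```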
- Bob