On Thu, 2013-09-12 at 22:57 -0400, Bob Carpenter wrote:
> On 9/12/13 8:34 PM, Ross Boylan wrote:
> > Summary: While estimating a model memory use increased steadily, resulting in a dead job or unresponsive system around
> > 8G. Anything I can do about that?
>
> Running from the command line, models should allocate
> memory before the first iteration and never need more
> memory than the amount used after the first iteration.
> If it's going up otherwise, it may be a signal of a memory
> leak.
Would it be any different if run from R?
>
> Your parameter is a J x K matrix and you'll need enough memory in
> R to store the number of samples times a J x K matrix times 8 bytes.
> At minimum (not accounting for R overhead), this will require
>
> 2000 iterations/chain
> * 3 chains
> * 8 bytes/double
> * 1 double/parameter
> * J * K parameters/iteration
>
> = 2000 * 3 * 8 * J * K bytes
= 720,000 bytes << 8 GB
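(Quick check in R, assuming J = 3 for my covariate dummies and K = 5 for the
outcome categories:

  iters <- 2000; chains <- 3; J <- 3; K <- 5
  iters * chains * 8 * J * K    # 720000 bytes, well under 1 MB

so the saved draws for beta by themselves should be tiny.)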
>
> If that's a lot less than 8 GB, it may be the sign of a problem.
>
> > I ran a model for 200 iterations and 1 chain; it finished in about 65 seconds.
> > > system.time(r <- stan(model_code = stan_model, data=stanDat,
> > + iter=200, chains=1))
>
> Can you do that for 4 chains and if so, does it produce enough
> effective samples? If so, you may not need to run longer.
Effective sample sizes for the 1 chain ranged from 10 to 22; even
multiplied by 4 that's pretty low. Plus 100 burn-in iterations is not much
either, though Rhat wasn't bad (1.1 or less).
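(For reference, this is where I'm reading those numbers from; I'm assuming
summary() on the fit is the right place to look:

  s <- summary(r)$summary       # pooled summary across chains
  s[, c("n_eff", "Rhat")]       # effective sample size and Rhat per parameter

print(r) shows the same columns.)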
>
> If the memory's too big or there's too much correlation in the
> samples, you can thin the output, which should cut down on memory
> usage.
That would help if the problem were saving the results, but it appears to be
something else.
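(If it does turn out to be the saved samples, I take it the change would just
be something like

  system.time(r2 <- stan(fit=r, data=stanDat, iter=2000, chains=3, thin=10))

which keeps every 10th draw. But as I said, the memory growth seems to happen
while sampling, not when collecting the results.)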
>
> > Then I used the model compiled at the previous step to do something more:
> > system.time(r2 <- stan(fit=r, data=stanDat, iter=2000, chains=3))
> >
> > That ran for a few minutes and then I got "Process R killed".
> >
> > I rebuilt at debug level, without optimization, as suggested in the FAQ. The job took steadily increasing amounts
> > of memory. I killed it when it was just over 8G and my system became unresponsive.
>
> I have no idea what this does w.r.t. R, but just at the C++
> level, the optimization level should have only a minimal impact on
> memory usage.
I have no reason to think it made any difference; I just didn't look at
the memory use in the earlier run. And the fact that the debug run was
slower made it easier to watch the progress.
I did this partly because the code I copied had it and partly because it
seemed semi-reasonable for the parameters I was estimating. Even a
value as big as 2 or 3 would be surprising.
>
> > for(n in 1:N) {
> > beta_x <- (x[n] * beta)';
> > pEthnic[n] ~ categorical(softmax(beta_x));
> > }
> > }
>
> Stan 2 (the current dev branch) will let you do this:
>
> pEthnic[n] ~ categorical_logit(beta_x);
Cool; I'll try it out. I might as well get some goodies with the risk.
Because of the earlier thread I looked a little in the recent commits
and in the feature/ branches (just the branch names, not the contents)
for this feature, but didn't see it.
>
> It should be a bit more efficient than categorical(softmax(beta_x)).
>
>
> It'll also be more efficient if the covariate type is:
>
> row_vector[J] x[N]; // covariates
Will this affect the order (or type) of the x data I pass in?
>
> Then there's no memory allocation for x[n].
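(To make sure I follow, here's roughly what I think the revised program would
look like. The data block is my guess at the declarations, since my post only
showed the sampling loop, and I've left any prior on beta out of the sketch:

  stan_model2 <- "
  data {
    int<lower=1> N;                   // observations
    int<lower=1> J;                   // covariate dummies (3 for me)
    int<lower=2> K;                   // outcome categories (5 for me)
    row_vector[J] x[N];               // one row_vector of covariates per observation
    int<lower=1,upper=K> pEthnic[N];  // observed category
  }
  parameters {
    matrix[J, K] beta;
  }
  model {
    for (n in 1:N)
      pEthnic[n] ~ categorical_logit((x[n] * beta)');
  }
  "

On the R side I'd then pass x as an N x J matrix, e.g.
stanDat <- list(N=N, J=3, K=5, x=xMatrix, pEthnic=pEthnic) -- is that the
right shape, or does declaring x as an array of row_vectors change what
rstan expects?)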
>
>
> > The model has a number of oddities, at least when viewed from my frequentist experience:
> > 1. no intercept
> > 2. overdetermined outcomes. The outcome is one of 5 categories; all 5 are modeled.
> > 3. overdetermined covariates. Each observation is one of 3 categories; each category has a dummy variable.
> > Other example models I looked at seemed similarly over-determined, so I thought maybe this was OK (though the example I
> > copied from did have an intercept).
>
> None of this should matter for memory usage.
That was an implicit question about whether this would cause problems in
general. Also, I figured that if it made things go ape it might have something
to do with the memory problem.
>
> > I based this on the "Speeding Up Multinomial Logistic Regressions in Stan" thread started April 19. I tried to follow
> > Bob's advice about using vector and matrix arithmetic when possible, but my efforts to imitate the code revisions he
> > gave were not successful. The manual suggests (x[n] * beta)' is not the most efficient way to go. (I skipped the stuff
> > about beta_raw to simplify).
>
> If you have specific questions here, we're happy to help.
I was wondering what the optimal way to set things up was; I think you
answered that with the suggestion about x. Though I'm a little
confused, since in the earlier thread you suggested making x a matrix.
>
> > Any tips or advice would be great.
> >
> > Running under R 2.15.1, Debian wheezy amd64 bit. This is without any of my efforts to use system boost or eigen
> > headers. I refreshed rstan (and the stan subproject) and built the tarball from the upper level makefile. Then I did
> > an R CMD INSTALL on that tarball, using a local path for installation.
>
> Probably better if you stick to our headers for now.
I figured that was safer, as well as being the easiest thing to do.
>
> Presumably the R CMD INSTALL is configured with optimization, too.
I didn't do anything special, but the compile commands generated by the
makefile had -O3.
Next steps? valgrind (likely really slow)? Running via standalone Stan
rather than R? For the latter, is there a simple way to write the data out
from R in the necessary format?
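(From a quick look at the rstan help, stan_rdump() sounds like it's meant for
exactly that -- assuming my rstan version has it and I'm reading the docs
right, something like

  library(rstan)
  # stan_rdump takes the names of objects, so first promote the elements
  # of stanDat to top-level objects, then dump them in the format the
  # standalone stan binary reads
  list2env(stanDat, envir = globalenv())
  stan_rdump(names(stanDat), file = "model.data.R")

would do it?)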
>
> >
> > Ross Boylan
> >
> > P.S. Time for the initial 200 iterations was 65 seconds with optimization, 763 seconds without optimization. Wow!
>
> We built the code to be optimized --- that is, we didn't
> try to build code that would be efficient if it wasn't
> compiled with optimization.
>
> - Bob
>
P.S. I was surprised to see in the manual that MSVC can't handle the
templates. Template handling was a weak spot in MSVC when I used it about a
decade ago, but I would have expected them to fix it by now. Maybe C#
is getting all the love.