Nested Tapes for Efficient Memory Allocation

Abe Solberg

Dec 2, 2025, 10:48:52 AM
to TMB Users
Hi All,

I am trying to run an ecosystem model with random effects in RTMB and keep running into either std::bad_alloc errors or "caught bus error" crashes with cause 'non-existent physical address' (the latter when TapeConfig is set with matmult="compact").

Similar to this user, my memory spikes during the start of optimization and then goes back down to normal levels before failing. 

The model runs fine with fixed effects only, but as soon as I add random effects I get these errors. I've also tried a more powerful machine, but that did not resolve the problem. It does not seem to make any difference how many years of data I include; I always get the same errors. The model will run with random effects when I fit it with tmbstan, although it is quite time consuming.

Because the model is essentially a large loop where the values in time t impact the values in time t+1 etc., I expect that I could probably rewrite my function to include nested tapes in a way that would more efficiently use memory, but after having tried and failed many times I don't think I'm clever enough to figure this out. 

I've attached a simple example of the model here using prepacked data from the mizer package (again, thanks to the authors of that package), and I'd be grateful for any tips/directions forward. 

As a note, I'm not specifically concerned about any of the probability functions here, as the model I'm working on is doing something somewhat different than I'm doing here, but this seemed like the easiest way to reproduce the error.

Thanks,
Abe


simple_mizer.R

Kasper Kristensen

Dec 3, 2025, 7:49:49 AM
to TMB Users
Your model lacks sparsity because it's not initially formulated as a state space model. It may seem like a natural strategy to try to make the current implementation run faster. However, that's not gonna do any good, because the model has structural issues. Going forward I'd recommend thinking carefully about what the state vector should be (natural space or frequency domain?) in order to minimize the number of FFT transformations. Make sure to connect states via
X(t) ~ Distribution(f(X(t-1)))
and make sure that all further state calculations only access one 'X(t)' at a time. It may be useful to start with a small toy example and build up, rather than starting with the full mizer model...
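
To make the sparsity point concrete, here is a minimal toy sketch (not the mizer model). It assumes the RTMB package is installed and uses a simulated random walk; the names nll_state, nll_innov and the simulated y are hypothetical and only serve the illustration. Version (a) treats the states X(t) as the random effects and links them as X(t) ~ Normal(X(t-1), sd), so each likelihood term touches at most two neighbouring states. Version (b) treats the shocks as the random effects and rebuilds the states in a loop, so every observation depends on all earlier shocks.

```r
library(RTMB)

set.seed(1)
n <- 30                                                # toy number of time steps
y <- cumsum(rnorm(n, sd = 0.5)) + rnorm(n, sd = 0.3)   # simulated observations

## (a) State-space form: the random effects are the states X(t) themselves
nll_state <- function(p) {
  x <- p$x
  sd_proc <- exp(p$log_sd_proc)
  sd_obs  <- exp(p$log_sd_obs)
  ## X(t) ~ Normal(f(X(t-1)), sd_proc), with f the identity in this toy
  ans <- -sum(dnorm(x[-1], x[-n], sd_proc, log = TRUE))
  ## Each observation accesses exactly one state
  ans - sum(dnorm(y, x, sd_obs, log = TRUE))
}

## (b) Innovation form: the random effects are the shocks eps(t)
nll_innov <- function(p) {
  eps <- p$eps
  sd_proc <- exp(p$log_sd_proc)
  sd_obs  <- exp(p$log_sd_obs)
  x <- eps
  for (t in 2:n) x[t] <- x[t - 1] + eps[t]   # state depends on ALL earlier shocks
  ans <- -sum(dnorm(eps, 0, sd_proc, log = TRUE))
  ans - sum(dnorm(y, x, sd_obs, log = TRUE))
}

fixed <- list(log_sd_proc = 0, log_sd_obs = 0)
obj_state <- MakeADFun(nll_state, c(list(x = rep(0, n)), fixed),
                       random = "x", silent = TRUE)
obj_innov <- MakeADFun(nll_innov, c(list(eps = rep(0, n)), fixed),
                       random = "eps", silent = TRUE)

## Hessian with respect to the random effects, converted to dense for counting
h_state <- as.matrix(obj_state$env$spHess(random = TRUE))
h_innov <- as.matrix(obj_innov$env$spHess(random = TRUE))

sum(h_state != 0)   # grows linearly with n (banded pattern)
sum(h_innov != 0)   # grows quadratically with n (dense pattern)
```

Both versions describe the same random walk, but only (a) gives the Laplace approximation a banded Hessian to exploit. Replacing as.matrix() with Matrix::image() shows the two patterns side by side.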

How to diagnose model issues:

## Normally not needed, but here we set it to not run out of memory
TMB::config(tmbad.sparse_hessian_compress=TRUE, DLL="RTMB")
## Build model object as before
obj <- MakeADFun(function(p) do.call(simple_mizer, p), log_pars, map = map, random = c('rdd', 'effort'))

## Build the sparse hessian tape - takes a LONG time
## 1208.890 seconds for only 379 random effects!
system.time(obj$env$spHess)

## Check the sparsity pattern
h <- obj$env$spHess(random=TRUE)
Matrix::image(h) ## Dense! - not what you want

## What's on the gradient tape? ~21000 Fourier transforms (yikes!):
TMB:::op_table(obj$env$ADGrad)
## FFT         10518
## iFFT        10518
If each of the 379 random effects links to all of these FFTs, that means roughly 8 million FFTs (379 × 21,036) on the sparse Hessian tape! That's exactly what you need to avoid.
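
As a quick sanity check on that order of magnitude, the worst-case count can be reproduced in plain R from the numbers reported above. This is only a back-of-the-envelope sketch; it assumes every random effect really does link to every transform, and the variable names are made up for the illustration.

```r
n_random <- 379              # random effects reported for this model
n_fft    <- 10518 + 10518    # FFT + iFFT operators counted by op_table on the gradient tape

n_total <- n_random * n_fft  # worst case: every random effect links to every transform
n_total
## 7972644
```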