nimble and memory use


Henry Scharf

Aug 8, 2021, 7:56:19 PM
to nimble-users
Hello nimble users and developers,

Thanks very much for all your hard work building and maintaining nimble. Recently, I've been trying to conduct a large simulation study that involves refitting the same model to various datasets. The model is somewhat complex and a little slow to build/compile, but it behaves reasonably well for a single run on one dataset. However, when I try to repeatedly fit the model for the simulation study, R rapidly grows in its memory usage until it overwhelms my system's resources.

I've created what I think is a minimal working example of the phenomenon I'm encountering. The first loop below aligns with my understanding of how R's memory use works (i.e., removing variables and running garbage collection releases memory). The second loop shows the confusing behavior I'm encountering with nimble. I would expect memory usage to stay flat as in the first case, but it seems to grow linearly with each iteration. I'd be grateful for any insight.

Thanks,
Henry

library(pryr)

## Baseline: allocate a large vector each iteration, then remove everything
## except the tracking vector and garbage-collect. Memory use should stay
## flat across iterations.
mem_1 <- rep(NA, 10)
for (i in 1:length(mem_1)) {
  a <- rnorm(1e7)
  mem_1[i] <- mem_used()
  var_list <- ls(all.names = TRUE)
  rm(list = var_list[-which(var_list == "mem_1")])
  gc()
}
plot(mem_1)

## The puzzling case: build, compile, and run a small nimble model each
## iteration, then remove everything except the tracking vector and
## garbage-collect. Memory use grows roughly linearly instead of staying flat.
mem_2 <- rep(NA, 10)
for (i in 1:length(mem_2)) {
  library(nimble)
  code <- nimbleCode({
    mu ~ dnorm(mean = 0, sd = 1)
    for (i in 1:N) {
      y[i] ~ dnorm(mean = mu, sd = 1)
    }
  })
  constants <- list(N = 1e3)
  data <- list(y = rnorm(constants$N))
  model <- nimbleModel(code = code, constants = constants, data = data)
  Cmodel <- compileNimble(model)
  conf <- configureMCMC(model = model)
  mcmc <- buildMCMC(conf = conf)
  Cmcmc <- compileNimble(mcmc, project = model)
  Cmcmc$run(niter = 1e3)
  samples <- as.matrix(Cmcmc$mvSamples)
  mem_2[i] <- mem_used()
  var_list <- ls(all.names = TRUE)
  rm(list = var_list[-which(var_list == "mem_2")])
  gc()
}
plot(mem_2)

Perry de Valpine

Aug 9, 2021, 9:02:51 PM
to Henry Scharf, nimble-users
Hi Henry,

This is a tricky issue. We have built two features to try to reduce memory use, but it still seems that objects evade R's garbage collection. This may be because we have many reference class objects and environments, which can create closed loops of referenced objects. The two features are:

  nimbleOptions(clearNimbleFunctionsAfterCompiling = TRUE) # This can modestly reduce memory use

  nimble:::clearCompiled(model) # This attempts to clear all compiled content for the project related to model and to unload the on-the-fly compiled shared library used for it.

However, I tried both of these and they don't resolve the issue you're reporting. It's something we'll have to look into more; we have worked on this in the past, but this doesn't look like good behavior.

Here are some potential workarounds.

Within each loop, you could use system2() to call Rscript and launch a self-contained process; a sketch follows. This would make sense if you really need to do the full nimble building and compilation each time.
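
A minimal sketch of that idea, assuming a hypothetical driver script fit_one.R that builds, compiles, and fits the model for one dataset (selected by a command-line argument) and saves its results to disk:

## "fit_one.R" is a hypothetical script name. Each call launches a fresh R
## process, and all memory held by nimble's compiled objects is released
## when that process exits, so the parent session stays flat.
for (i in 1:10) {
  system2("Rscript", args = c("fit_one.R", i))
}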

If you are really re-using the same model structure, you could build and compile just once and then simply re-assign data values.  e.g.
Cmodel$y <- some_other_values
then re-run your already-compiled MCMC.
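
A minimal sketch of that reuse pattern, assuming the objects from your original example (constants, Cmodel, Cmcmc) have already been built and compiled once, outside the loop:

for (i in 1:10) {
  Cmodel$y <- rnorm(constants$N)        # swap in dataset i
  Cmodel$calculate()                    # refresh log-probabilities for the new data
  Cmcmc$run(niter = 1e3, reset = TRUE)  # reset = TRUE starts a fresh chain
  samples <- as.matrix(Cmcmc$mvSamples)
  ## ... summarize or save `samples` for dataset i ...
}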

If your simulations need models of different sizes, or different choices of which nodes are data (observed vs. unobserved), it gets trickier, but it is still possible to build and compile just once and re-use those objects. For example, you can configure, build, and compile a full set of samplers for all nodes (including, atypically, data nodes) and then control the sampler order in a particular run of the MCMC to include some samplers and omit others. That way, you can treat some nodes as data and others as unobserved on a run-by-run basis. If that sounds like something you need and it is too imprecisely described here, please holler again and we can go into more detail.
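
A configuration-time sketch of the first part of that idea (the sampler type and the dropped indices are illustrative, and run-by-run control of the execution order is the part that would need the more detailed recipe):

conf <- configureMCMC(model)
## Atypically, add samplers for the data nodes too, so the compiled MCMC
## carries a sampler for every stochastic node:
for (node in model$getNodeNames(dataOnly = TRUE))
  conf$addSampler(target = node, type = "RW")
conf$printSamplers()  # inspect the full sampler list
## Omit selected samplers for a given scenario by pruning the execution order:
ord <- conf$getSamplerExecutionOrder()
conf$setSamplerExecutionOrder(ord[-c(1, 2)])  # dropped indices are illustrative
mcmc <- buildMCMC(conf)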

Will one of those approaches help you?

-Perry


Henry Scharf

Aug 10, 2021, 7:16:14 PM
to nimble-users
Hi Perry,

Thanks for the quick reply and helpful suggestions. I think the system2() + Rscript workaround will work for me. Memory use during some preliminary runs using system2() looked good.
Not having to rebuild and compile is tempting, but the data structures for the various simulation scenarios are pretty intricate and I'm reluctant to kick that nest.

Best,
Henry

gesta...@gmail.com

Aug 25, 2022, 11:21:18 AM
to nimble-users
Perry or Henry,

I know this thread is a year old now, but do either of you have a very simple code example demonstrating this system2() + Rscript approach for repeatedly building and compiling nimble models when the data structure and model constants are changing from one simulation to the next?

Thanks,
Glenn


Chris Paciorek

Apr 15, 2023, 2:28:48 PM
to Keith Lau, nimble-users
I think using Rscript could be a good strategy. You'll likely want to write a file of R code (say, `run_chain.R`) that implements a single chain and saves the results in a chain-specific file.
The R file should take an ID as an argument, using syntax like:

  args <- commandArgs(TRUE)
  chainID <- as.numeric(args[1])

Then, in your main R script, you can use parLapply (as seen in our parallelization example) to invoke a function that uses system2 to run the file with the chain's ID:

run_MCMC_allcode <- function(seed) {
  ## system2 takes the command and its arguments separately
  system2("Rscript", args = c("run_chain.R", seed))
}
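
A minimal sketch of how the pieces might fit together; the model inside run_chain.R is a stand-in for your own, and the cluster size is illustrative:

## --- run_chain.R (hypothetical contents; substitute your own model) ---
args <- commandArgs(TRUE)
chainID <- as.numeric(args[1])
library(nimble)
set.seed(chainID)
code <- nimbleCode({ mu ~ dnorm(mean = 0, sd = 1) })
model <- nimbleModel(code, inits = list(mu = 0))
Cmodel <- compileNimble(model)
Cmcmc <- compileNimble(buildMCMC(configureMCMC(model)), project = model)
Cmcmc$run(niter = 1e3)
samples <- as.matrix(Cmcmc$mvSamples)
saveRDS(samples, paste0("samples_chain_", chainID, ".rds"))

## --- main script: launch the chains in parallel, one subprocess each ---
library(parallel)
cl <- makeCluster(4)
parLapply(cl, 1:4, run_MCMC_allcode)
stopCluster(cl)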

Let me know if any of that is not clear.

It's clunky, I know. We are working on the memory limitations as part of an overhaul of the compiler and model-building systems, but that's a long-term project.

You could also experiment with using `nimble:::clearCompiled()` to clear out compiled objects that might be holding onto a lot of memory, but when I played a bit with that for a simple case it didn't seem to be effective at freeing up memory, so I'm not sure how well it would work for you.

-chris

On Tue, Apr 11, 2023 at 3:05 PM Keith Lau <genw...@gmail.com> wrote:
Dear all, 

I am also interested in how to free redundant memory when computing in parallel. I ran 50 chains in parallel with foreach. When it returned the 50 summary objects (each small in memory), I found that the main R session's memory use grew a great deal (almost 256 GB). It seems a lot of redundant memory is returned from each session (I'm not sure). I wonder whether there is any way to solve this memory issue. I saw there is a system2() + Rscript approach, but I would like to see an example of how to do it.

Thanks a lot!
Keith

On Thursday, August 25, 2022 at 11:21:18 PM UTC+8, gesta...@gmail.com wrote: