Hello everyone,
I am running a very time- and memory-demanding CMR model and I need some help in trying to save some time (so that I may finish my PhD in less than 10 years lol).
Basically, the model I’m running can take up to 5-6 days to build and compile, and then runs at around 2,000 iterations per 24 h. It is a complex model with many individual- and time-based random effects, so I’ll likely need over 100k iterations for the chains to converge, which means each chain will take at least 1.5-2 months to run.
So one issue I have is that if my model crashes (as it did this weekend when the cluster I was running it on went down for maintenance), I’d like to be able to resume it from where it left off rather than start again from scratch. I’ve been saving my samples along the way using something like
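# first chunk: start the chain and store the samples
mcmc$run(niter=5000, nburnin=0, thin=5)
run1<-as.matrix(mcmc$mvSamples)
save(run1, file='run1.RData')
rm(run1)
# next chunk: continue the same chain (reset=FALSE) but clear the stored samples (resetMV=TRUE)
mcmc$run(niter=5000, nburnin=0, thin=5, reset=FALSE, resetMV=TRUE)
run2<-as.matrix(mcmc$mvSamples)
save(run2, file='run2.RData')
rm(run2)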
and so on…
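Written as a loop, the pattern above might look roughly like the sketch below. The helper names run_in_chunks and read_chunks are hypothetical (they are not part of nimble), and the code only assumes that the object passed in exposes $run(niter, reset, resetMV) and $mvSamples the way a compiled nimble MCMC does. It uses saveRDS() rather than save() so that each chunk can be read back under any name.

```r
# Hypothetical helper (not a nimble function): run an MCMC in chunks and
# write each chunk of samples to its own file.
# Assumption: `mcmc` exposes $run(niter, reset, resetMV, ...) and $mvSamples.
run_in_chunks <- function(mcmc, n_chunks, chunk_iter,
                          out_dir = ".", prefix = "run", ...) {
  files <- character(n_chunks)
  for (i in seq_len(n_chunks)) {
    # The first chunk starts the chain; later chunks continue it (reset = FALSE).
    # resetMV = TRUE clears the stored samples so memory use stays flat.
    mcmc$run(niter = chunk_iter, reset = (i == 1), resetMV = TRUE, ...)
    files[i] <- file.path(out_dir, sprintf("%s%03d.rds", prefix, i))
    saveRDS(as.matrix(mcmc$mvSamples), files[i])
  }
  files
}

# Hypothetical helper: stack the saved chunks back into one matrix, in order.
read_chunks <- function(files) {
  do.call(rbind, lapply(files, readRDS))
}
```

Extra arguments such as thin = 5 would be passed through `...` to $run. Note that this only covers the in-session case: it does not help once the R session itself is lost.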
This allows me to save some memory by removing samples after each run while still extending the chains. My problem is that when the model crashes, I need to start over. So I have 2 questions.
1. Is it possible to save some parts of the model building process (i.e. model graph, links between objects created by nimble, anything else that could be saved) and store them in an object that could be re-loaded later, so as to save at least some of the (5-day) model building time? I have a hunch this won’t work given what I’ve read, but I’m asking anyway just in case.
2. Is there a way to restart a chain from where a chain from a past model ended, so as to simply extend the chains from a previous model? I’ve run some tests using mcmc$run(reset=F), and also some using the end points of a previous run of the same model as the initial values for the stochastic nodes (this also seems to be what happens when using mcmc$run(reset=F), am I wrong?). With these tests I found that the chains initialized with the last samples obtained in a previous run do not seem to behave the same way (at least in the first iterations) as chains continued with mcmc$run(reset=F) (see an example in the attached graphs). I imagine, then, that the sampler functions must store some other values, beyond the values of the nodes, that are used when running the function with reset=F? Section 7.5 of the nimble manual also hints that other values in the sampler functions are reset when using the reset=T option. I am wondering whether it is possible to access this data, save it after each run, and feed it to a freshly built model (for instance after a crash), so that I could simply extend the chains I had already run and saved, as I would with the reset=F option. Is there any way to do this? Otherwise I must start over after each crash, and given the time it takes for my model to run, I’m not sure that is going to be an option.
Thanks a lot for any help with this, and let me know if some things are unclear,
Cheers,
Fred L.
--
You received this message because you are subscribed to the Google Groups "nimble-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to nimble-users...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/nimble-users/9667bc36-419f-4217-b79d-e3d45bfece59n%40googlegroups.com.