Pausing and Restarting MCMC iterations / NIMBLE Efficiency


wenn...@gmail.com

Jan 22, 2018, 5:35:50 PM
to nimble-users
Hi everyone,

I am currently fitting some Bayesian models on a computing cluster and, since the models are too big, the MCMC iterations cannot be finished within the maximum allowed wall time (2 weeks). My question is: is there a way for me to save the finished MCMC iterations and then, in another job, restart the MCMC algorithm where it left off? Or, more generally, does anyone have suggestions for speeding up the MCMC/model building process?

PS: My models have around 5,000 to 20,000 parameters and my data set is about 1400 × 900. However, I don't think it's the model/data set that is causing this slowness; I suspect NIMBLE is taking too much memory during this process. But my intuition might be wrong here... Any thoughts on this, anyone?

Thanks,
Wenna

Daniel Turek

Jan 22, 2018, 9:13:30 PM
to wenn...@gmail.com, nimble-users
Wenna, thanks for your interest in NIMBLE, and for sending the question. I'm not an expert on the space usage, or on whether that's the problem, but I can help you with restarting the MCMC from "where it left off". This can be done by restoring the final model parameter values into the model object, and also restoring the "state" variables internal to each sampler function within the MCMC. This is a little tricky, but it can certainly be done. Then you'd restart the MCMC using the argument "reset = FALSE", so it wouldn't reset the internal state of the sampling algorithms; instead they'd pick up from where they left off. I'm happy to help you through this process, if you want.
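To illustrate the idea, here is a minimal base-R sketch. It is a toy analogue, not NIMBLE's API: a hypothetical adaptive random-walk Metropolis sampler whose "state" (current value, proposal scale, acceptance counters) lives in a plain list. The names `log_target`, `init_state`, and `run_chain` are made up for this example. Resuming from the saved list, rather than from fresh defaults, plays the role of restoring sampler state and running with reset = FALSE.

```r
# Toy illustration only: a hypothetical adaptive random-walk Metropolis
# sampler standing in for one NIMBLE sampler. This is NOT NIMBLE's API.

# Log density of the toy target (standard normal)
log_target <- function(x) dnorm(x, log = TRUE)

# Fresh sampler state: current value, proposal scale, adaptation counters
init_state <- function(x0 = 0) {
  list(x = x0, scale = 1, n_accept = 0, n_iter = 0)
}

# Run n iterations starting from `state`; return the samples and final state.
# Every 50 iterations the proposal scale adapts toward ~44% acceptance.
run_chain <- function(state, n) {
  samples <- numeric(n)
  for (i in seq_len(n)) {
    prop <- rnorm(1, mean = state$x, sd = state$scale)
    if (log(runif(1)) < log_target(prop) - log_target(state$x)) {
      state$x <- prop
      state$n_accept <- state$n_accept + 1
    }
    state$n_iter <- state$n_iter + 1
    if (state$n_iter %% 50 == 0) {
      state$scale <- state$scale * exp(state$n_accept / 50 - 0.44)
      state$n_accept <- 0
    }
    samples[i] <- state$x
  }
  list(samples = samples, state = state)
}

# Hypothetical file standing in for the saved state between cluster jobs
state_file <- tempfile(fileext = ".rds")

# Reference: one uninterrupted run of 2000 iterations
set.seed(1)
long_run <- run_chain(init_state(), 2000)

# The same run split in two: save the full sampler state after part 1 ...
set.seed(1)
part1 <- run_chain(init_state(), 1000)
saveRDS(part1$state, state_file)

# ... then resume from it (value AND adaptation state), not from init_state()
part2 <- run_chain(readRDS(state_file), 1000)

identical(c(part1$samples, part2$samples), long_run$samples)  # TRUE
```

Both halves run in the same R session here, so R's random number stream simply continues between them; across separate jobs the RNG state must be carried over as well. Resuming part 2 from `init_state(part1$state$x)` instead would keep the last value but discard the adapted `scale`, so the continued samples would generally differ.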

Thanks again for your interest in NIMBLE.  Cheers,

Daniel


--
You received this message because you are subscribed to the Google Groups "nimble-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to nimble-users+unsubscribe@googlegroups.com.
To post to this group, send email to nimble...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/nimble-users/51412599-109b-48cf-9a9e-be00b7b239d9%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

wenn...@gmail.com

Jan 23, 2018, 12:27:43 PM
to nimble-users
Hi Daniel,

Thanks for your quick response! It's great news to know that the MCMC chain can restart from where it left off! I found the "reset = FALSE" argument in the NIMBLE user manual, but can you provide more details on how to save the final model parameter values and restore the "state" variables?

Many thanks!!

Wenna



Perry de Valpine

Jan 23, 2018, 3:21:42 PM
to wenn...@gmail.com, nimble-users
Hi Wenna,

We can follow up on how to work with the model and samplers to save model variables and internal sampler parameters. But let me respond about model building time and memory use. We are working on making these steps faster, lighter, and reusable, but unfortunately, for now, they do need to be redone in each R session. Both costs come largely from working through R. However, we sometimes find that rewriting the model can yield faster building times and potentially faster sampling. This can involve declaring fewer vector nodes instead of many scalar nodes, writing user-defined distributions that integrate over some random effects, or other considerations. If you want to share your model with us (off-list, if you prefer), we can take a look and see if we have any suggestions.
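As a rough sketch of what "fewer vector nodes instead of many scalar nodes" can look like, assume a simple linear regression with hypothetical names `y`, `X`, `beta`, `sigma`, `N`, and `p`. The first version creates N scalar deterministic nodes `mu[i]`; the second declares the linear predictor as a single vector node. Whether this speeds up building or sampling for a particular model is something to check case by case.

```r
library(nimble)

# Scalar form: N separate deterministic nodes mu[1], ..., mu[N]
code_scalar <- nimbleCode({
  for (j in 1:p) {
    beta[j] ~ dnorm(0, sd = 10)
  }
  sigma ~ dunif(0, 10)
  for (i in 1:N) {
    mu[i] <- inprod(X[i, 1:p], beta[1:p])
    y[i] ~ dnorm(mu[i], sd = sigma)
  }
})

# Vectorized form: a single deterministic node mu[1:N]
code_vector <- nimbleCode({
  for (j in 1:p) {
    beta[j] ~ dnorm(0, sd = 10)
  }
  sigma ~ dunif(0, 10)
  mu[1:N] <- (X[1:N, 1:p] %*% beta[1:p])[1:N, 1]
  for (i in 1:N) {
    y[i] ~ dnorm(mu[i], sd = sigma)
  }
})
```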

-Perry



Daniel Turek

Jan 23, 2018, 10:59:10 PM
to wenn...@gmail.com, nimble-users, Perry de Valpine
Wenna, I've put together an instructional document on Restarting a NIMBLE MCMC.

Hopefully it's entirely clear, and gets you going with no problems.  Let me know if you have any questions.

Good luck!

Daniel

wenn...@gmail.com

Jan 24, 2018, 5:53:06 PM
to nimble-users
Hi Daniel,

Thank you so much for sharing this document! I tried the sample code, but when I restored the MCMC in a new R session, the first iteration of all variables, as well as all iterations of beta[2], were "NA." However, when I ran a single MCMC with 20,000 iterations, the beta[2] samples in the second half (iterations 10,001 to 20,000) were not "NA." I have attached my results below. Do you know what went wrong here?

Thanks,
Wenna

#Pausing and restarting MCMC
> samples[10000,] #last iteration of the first MCMC
   beta[1]    beta[2]         mu      sigma 
1.38199465 0.27802192 2.71661321 0.05292085 
> head(samples_continued) #first 6 iterations after restarting the MCMC
      beta[1] beta[2]       mu      sigma
[1,]       NA      NA       NA         NA
[2,] 1.346531      NA 2.751533 0.09538531
[3,] 1.346531      NA 2.720952 0.07220712
[4,] 1.346531      NA 2.699330 0.08764595
[5,] 1.346531      NA 2.710693 0.09458339
[6,] 1.346531      NA 2.705708 0.29522208

#running single MCMC with 20,000 iterations (no pausing and restarting)
> samples[9999:10006,]
      beta[1]   beta[2]       mu      sigma
[1,] 1.381995 0.2288096 2.741263 0.05120657
[2,] 1.381995 0.2780219 2.716613 0.05292085
[3,] 1.381995 0.2220797 2.700665 0.04997873
[4,] 1.381995 0.2433144 2.690273 0.08012158
[5,] 1.381995 0.2410102 2.669353 0.07388723
[6,] 1.381995 0.2388559 2.696224 0.04560240
[7,] 1.331974 0.2388559 2.694611 0.03243129
[8,] 1.329267 0.2388559 2.670744 0.07833757


Wenna Xi

Jan 25, 2018, 5:08:45 PM
to Daniel Turek, nimble-users

Hi Daniel,

Thanks again for the document. It is working now, but I still have two questions:

  1. When I ran your code, the samples I got after restarting the MCMC were different from yours. I'm not sure whether that's because you didn't actually use the seed number you posted (in which case I'm not worried) or because something else is going on (see question 2).
> head(samples_continued)
      beta[1]   beta[2]       mu      sigma
[1,] 1.346531 0.2780219 2.751533 0.06776875
[2,] 1.373949 0.2780219 2.708912 0.05720470
[3,] 1.368100 0.2780219 2.731302 0.07306114
[4,] 1.397031 0.2482833 2.710105 0.06642219
[5,] 1.417554 0.2482833 2.691461 0.05077806
[6,] 1.355736 0.2482833 2.701397 0.04253383

  2. I wanted to see whether pausing and restarting the MCMC gives the same results as running one long MCMC chain, so I also ran the MCMC chain for 20,000 iterations and compared the results. The first half of the samples (1 – 10,000) were the same, but the second half (10,001 – 20,000) were different. My understanding was that pausing and restarting the MCMC would give the same posterior samples as running the MCMC without the pause. I'm okay if this is because the results change when the starting values change, even with the same seed number, but I will be worried if, for the adaptive samplers, we are not restarting the MCMC from the last saved proposals (I think saving the state variables is meant to preserve the last proposals, but I'm not sure about this). Do you have any comments on this issue?
> set.seed(0)
> Cmcmc$run(20000)
|-------------|-------------|-------------|-------------|
|-------------------------------------------------------|
NULL
> 
> samples <- as.matrix(Cmcmc$mvSamples)
> samples[9995:10005,]
       beta[1]   beta[2]       mu      sigma
 [1,] 1.394315 0.2578996 2.709307 0.03237251
 [2,] 1.399039 0.2502140 2.724426 0.05516978
 [3,] 1.400995 0.2494896 2.728632 0.03437194
 [4,] 1.400995 0.2494896 2.719170 0.03925602
 [5,] 1.381995 0.2288096 2.741263 0.05120657
 [6,] 1.381995 0.2780219 2.716613 0.05292085
 [7,] 1.381995 0.2220797 2.700665 0.04997873
 [8,] 1.381995 0.2433144 2.690273 0.08012158
 [9,] 1.381995 0.2410102 2.669353 0.07388723
[10,] 1.381995 0.2388559 2.696224 0.04560240
[11,] 1.331974 0.2388559 2.694611 0.03243129

I apologize if I sound too nitpicky here; I just wanted to understand the mechanism behind this pausing and restarting thing better before I actually use it.

Thanks,

Wenna

Daniel Turek

Jan 26, 2018, 3:59:41 PM
to wenn...@gmail.com, nimble-users
Thanks for bringing this to my attention, Wenna. A few minor details were overlooked. The root cause is that the "reset = FALSE" option of the MCMC was designed to continue an MCMC that had already been run, so when it is used in this alternate way (resuming from restored values and sampler states), a few of the initialization steps were missed.

I've updated the Restarting a NIMBLE MCMC document. There's one functional change, in a new code block near the end, which restores things into the compiled MCMC. There's now also output at the end showing the continued samples and the correct behaviour. It should work without any problems now.

Cheers!
Daniel


Daniel Turek

Jan 26, 2018, 10:14:03 PM
to Wenna Xi, nimble-users
Wenna, thanks again for raising these issues.  There were two problems going on.

(1) R Markdown seems to do some funny things with R's random number seed between code chunks. I think I've fixed that now, and it would explain why you didn't get the same results I showed. I believe (hope) that your results will now agree with what's shown in Restarting NIMBLE MCMCs.

(2) The reason the "first run + restarted run" didn't agree with the "one long run" was that we weren't being entirely careful about restoring R's random number seed to where it left off after the first run. So I also modified the document to save R's RNG seed after the first MCMC run and restore it before beginning the second run. This now truly produces exactly the same samples as "one long run". The modified document also performs "one long run" at the end and shows that the results agree.
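A minimal base-R sketch of the RNG fix described in (2): capture `.Random.seed` right after the first run, save it to disk with the other restart information, and restore it in the new session before continuing. Here `rnorm()` draws stand in for the MCMC runs, and the file location is hypothetical.

```r
# Hypothetical location for the saved RNG state (in practice, a file on the cluster)
seed_file <- tempfile(fileext = ".rds")

# Job 1: first "run", then capture and save R's RNG state immediately
set.seed(0)
first_run <- rnorm(5)                 # stand-in for the first Cmcmc$run(...)
saveRDS(.Random.seed, seed_file)

# Job 2: a new session would start with some other RNG state (mimicked here)
set.seed(12345)
# Restore the saved state; .Random.seed must live in the global environment
assign(".Random.seed", readRDS(seed_file), envir = globalenv())
second_run <- rnorm(5)                # stand-in for the restarted run

# Reference: one uninterrupted "run" from the same seed
set.seed(0)
one_long_run <- rnorm(10)

identical(c(first_run, second_run), one_long_run)  # TRUE
```

Without the `assign()` line, `second_run` would come from the `set.seed(12345)` stream and would not match the second half of `one_long_run`.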

I hope this finally closes the books on this.  But thanks for double-checking everything, to make sure it works.

Cheers,
Daniel

Wenna Xi

Jan 28, 2018, 12:47:00 AM
to Daniel Turek, nimble-users

It’s working perfectly now! Thank you so much Daniel!!!
