running same Nimble model multiple times (with different data)

Christel FAES

Apr 24, 2020, 8:24:40 AM
to nimble-users
Hi, 

I have a (probably basic) question about running a Nimble model several times but with different data.  

I use the following steps to run a model in Nimble:
- create a Nimble model via nimbleModel(),
- configure and build the MCMC via configureMCMC() and buildMCMC(),
- compile both via compileNimble(),
- and run the model via runMCMC().

For every new data set (with the same constants), I rerun all of these steps. The compilation takes quite some time.
Therefore, I was wondering whether it is possible to rerun the model without redoing the C++ compilation?

Kind regards,
Christel

Daniel Turek

Apr 24, 2020, 10:27:31 AM
to Christel FAES, nimble-users
Christel, basically the functionality you're looking for is model$setData(newDataList).  If you do this using the compiled model object (Cmodel), then the compiled MCMC (Cmcmc) will operate using the new data.  A very simple example is given below.

library(nimble)

N <- 10
y.first <- 1:N

code <- nimbleCode({
    mu ~ dnorm(0, sd = 1000)
    sigma ~ dunif(0, 1000)
    for(i in 1:N) {
        y[i] ~ dnorm(mu, sd = sigma)
    }
})
constants <- list(N = N)
data <- list(y = y.first)
inits <- list(mu = 0, sigma = 1)

Rmodel <- nimbleModel(code, constants, data, inits)
Rmodel$calculate()

conf <- configureMCMC(Rmodel)
Rmcmc <- buildMCMC(conf)
Cmodel <- compileNimble(Rmodel)
Cmcmc <- compileNimble(Rmcmc, project = Rmodel)

samples.first <- runMCMC(Cmcmc, 10000)
samplesSummary(samples.first)

y.second <- 2*(1:N)
Cmodel$setData(y = y.second)

samples.second <- runMCMC(Cmcmc, 10000)
samplesSummary(samples.second)
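
The same pattern extends to any number of datasets. The following is a minimal sketch, assuming every dataset has the same length N and contains no missing values; the `datasets` list and the `summaries` object are hypothetical names used only for illustration. The model and MCMC are compiled once, and each pass through the loop only swaps the data and resets the initial values.

```r
library(nimble)

N <- 10
code <- nimbleCode({
    mu ~ dnorm(0, sd = 1000)
    sigma ~ dunif(0, 1000)
    for(i in 1:N) {
        y[i] ~ dnorm(mu, sd = sigma)
    }
})

# Hypothetical datasets: same length N, no NA values
datasets <- list(first = 1:N, second = 2 * (1:N), third = (1:N)^2)
inits <- list(mu = 0, sigma = 1)

Rmodel <- nimbleModel(code, constants = list(N = N),
                      data = list(y = datasets[[1]]), inits = inits)
Rmcmc <- buildMCMC(configureMCMC(Rmodel))

# Compile the model and the MCMC once
Cmodel <- compileNimble(Rmodel)
Cmcmc <- compileNimble(Rmcmc, project = Rmodel)

summaries <- lapply(datasets, function(y.new) {
    # Swap in the new data on the compiled model; no recompilation needed
    Cmodel$setData(list(y = y.new))
    # Restart each run from the same initial values
    samples <- runMCMC(Cmcmc, niter = 10000, inits = inits)
    samplesSummary(samples)
})
```

Passing `inits` to runMCMC() keeps each run from starting at the final state of the previous dataset's chain.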

Perry de Valpine

Apr 24, 2020, 10:42:53 AM
to Daniel Turek, Christel FAES, nimble-users
An alternative is to directly change the values of the data nodes:

Cmodel$y <- new_y



Zoey Werbin

Apr 26, 2021, 9:05:31 AM
to nimble-users
About the solutions proposed here: what is the difference between using the setData() function and changing the data value nodes directly? Specifically: I want to run datasets with different patterns of NA (missing values), while avoiding re-compilation. Would either approach work for this?

Thanks very much, and I can post this as a new question if preferred.
Zoey

Daniel Turek

Apr 26, 2021, 10:04:40 AM
to Zoey Werbin, nimble-users
Zoey, thanks for your question.  Short answers:

Changing the value of (data or non-data) nodes in the model will do only that: change the value of those particular nodes. It does not affect which nodes are considered NA missing values (and hence are updated by the MCMC) and which are held constant as fixed "data" values.

Using the setData() method does the above (changes the node values) and also sets / resets the boolean flags internal to the model that indicate which model nodes are flagged as observations (and hence are fixed).  However, as far as building / compiling an MCMC algorithm goes, these flags are *only* inspected at the time of calling configureMCMC(model) (or, if you pass the model object directly to buildMCMC, at the time of calling buildMCMC(model)).  At that point the flags determine which model nodes will undergo MCMC sampling: all stochastic model nodes which are *not* flagged as being fixed data observations.

Finally, getting back to your question: if the locations of the NA (missing) values change between datasets, then repeating the steps of configureMCMC(), buildMCMC(), and compileNimble() will be necessary, since the internals of the MCMC algorithm itself (which model nodes are being updated) will have changed.  Neither of the above options (changing node values, or using setData) will change how an already built and compiled MCMC operates.
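
To see the data flags in action, here is a small sketch on an uncompiled toy model. The model and the two hypothetical datasets are assumptions made for illustration. It uses isData() to inspect the flags before and after resetData() / setData(), and printSamplers() to show that each call to configureMCMC() picks its samplers from the flags as they stand at that moment.

```r
library(nimble)

code <- nimbleCode({
    mu ~ dnorm(0, sd = 1000)
    for(i in 1:N) {
        y[i] ~ dnorm(mu, sd = 1)
    }
})

# Hypothetical dataset in which y[2] is missing
Rmodel <- nimbleModel(code, constants = list(N = 4),
                      data = list(y = c(1, NA, 3, 4)),
                      inits = list(mu = 0))

# One logical flag per y[i]; the NA position is not flagged as data
flags.before <- Rmodel$isData("y")
print(flags.before)

# Samplers are assigned to the stochastic nodes not flagged as data
conf <- configureMCMC(Rmodel)
conf$printSamplers()

# Move the missing value to y[4]: clear the old flags, then set new data
Rmodel$resetData()
Rmodel$setData(list(y = c(1, 2, 3, NA)))
flags.after <- Rmodel$isData("y")
print(flags.after)

# A fresh configuration reflects the new flags; the earlier `conf` does not
conf2 <- configureMCMC(Rmodel)
conf2$printSamplers()
```

An MCMC built from `conf` would still sample y[2] and leave y[4] alone, which is why a change in the NA pattern calls for a new configuration.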

I hope this helps.  Let me know if it's still unclear.


Perry de Valpine

Apr 26, 2021, 10:19:36 AM
to Daniel Turek, Zoey Werbin, nimble-users
Hi Zoey,

Thanks for the question.  It's a good one.  Oh, Daniel just gave a good reply as well!  I have another suggestion for you that might work nicely, but let me walk through a bit more of what is happening.

The typical steps are nimbleModel -> configureMCMC -> buildMCMC -> compile model and MCMC -> runMCMC.  

buildMCMC will do configureMCMC if you skip that step.  All are done inside of nimbleMCMC.

At configureMCMC, the samplers are set up based on which nodes are currently tagged as data in the model.  Hence, if you use setData to change what is tagged as data, you will generally need to re-create and re-compile the MCMC. (buildMCMC also uses the data tags in some ways, but in any case it comes after configureMCMC.)

I think you should be able to do this without recompiling the *model* again, which is some help.  That is, you should be able to do:

m <- nimbleModel(...)
cm <- compileNimble(m)                          # compile the model once
MCMC1 <- buildMCMC(m)
cMCMC1 <- compileNimble(MCMC1, project = m)
## change what is data in m using resetData and setData, checking with isData.
MCMC2 <- buildMCMC(m)                           # new MCMC built from the updated data flags
cMCMC2 <- compileNimble(MCMC2, project = m)     # Possibly use resetFunctions = TRUE

I'd suggest trying this in a toy model first to be sure it all works and does what you want.

If you have a limited set of cases and the above doesn't work, you should be able to hold off on compiling the MCMCs and compile them together:
cMCMClist <- compileNimble(MCMC1, MCMC2, project = m)

Here is another option that should work in many cases and I think would achieve what you want.  You could make the model with NAs anywhere you'll ever need them.  The MCMC configuration object (returned by configureMCMC) will have methods printSamplers() or getSamplers() that will show you the order in which they are listed.  You'll need that.  Specifically you'll need to see which samplers are for the NAs, so you can omit them when you want to treat those nodes as data.  Then go through compiling that MCMC.  Then inspect and modify the sampler execution order list:

cMCMC$samplerExecutionOrderFromConfPlusTwoZeros

This will be a vector of integers from 1 to the number of samplers, with two 0s at the end for internal purposes.  Always keep those two zeros.

You can modify that as you please and assign it back into cMCMC$samplerExecutionOrderFromConfPlusTwoZeros.   If you omit a sampler that you don't want, it will not be used.

For example:

samplerToSkip <- 101 ## Suppose sampler 101 has target node y[1] and you want y[1] treated as data rather than sampled as an NA
execOrder <- cMCMC$samplerExecutionOrderFromConfPlusTwoZeros
## Drop that sampler by value; the two trailing zeros are kept
cMCMC$samplerExecutionOrderFromConfPlusTwoZeros <- execOrder[execOrder != samplerToSkip]

Be sure to put the data values you want into the compiled model:
cm$y[1] <- data_for_y1
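
When several samplers need to be skipped at once, the bookkeeping can be wrapped in a small function. The sketch below uses a hypothetical helper, `dropSamplers()`, which is not part of nimble. It assumes the execution-order vector has the layout described above (sampler indices followed by two zeros) and removes samplers by value, so it still works if the vector has already been modified.

```r
# Hypothetical helper (not part of nimble): remove the given sampler indices
# from an execution-order vector while keeping the two trailing zeros.
dropSamplers <- function(execOrder, samplersToSkip) {
    n <- length(execOrder)
    # The vector must end in the two zeros used for internal purposes
    stopifnot(n >= 2, all(execOrder[(n - 1):n] == 0))
    samplerPart <- execOrder[seq_len(n - 2)]
    c(samplerPart[!(samplerPart %in% samplersToSkip)], 0, 0)
}

# Plain-vector illustration: five samplers, skipping samplers 2 and 4
execOrder <- c(1:5, 0, 0)
newOrder <- dropSamplers(execOrder, c(2, 4))
print(newOrder)

# With a compiled MCMC (cMCMC, as above), usage would look like:
# cMCMC$samplerExecutionOrderFromConfPlusTwoZeros <-
#     dropSamplers(cMCMC$samplerExecutionOrderFromConfPlusTwoZeros, 101)
```

Keeping a copy of the original vector before the first modification makes it easy to restore all samplers for the next dataset.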

Let us know if it works or brings up more questions.

-Perry
