modeling with imputed data

65 views
Skip to first unread message

Brook Milligan

unread,
Apr 20, 2025, 4:16:51 PM4/20/25
to nimble-users
I am trying to infer parameters of a model that requires imputing some of the data. I think this means something like the following

x ~ f(theta)
x <- rf(phi)

That is, the model equation where x is distributed as f(), but also where x is generated anew each MCMC iteration from the simulation function rf().

Attached is a minimal example of this. Needless to say, this does not work and nimble complains with the following and quits.

[Warning] Model includes deterministic assignment using '<-' of the result of a density ('d') or simulation ('r') calculation. This is likely not what you intended

I am hoping this is possible, but that I am misunderstanding how to code this.

I would greatly appreciate input on how a node can simultaneously be data for paremeter inference and have a value imputed from some distribution during the MCMC.

Thanks a lot.

Cheers,
Brook


minimal_example.R

Daniel Turek

unread,
Apr 21, 2025, 7:14:12 AM4/21/25
to Brook Milligan, nimble-users
Brook, you have the right idea here, the only difference is for imputed nodes, you only need to prescribe their prior distribution using the dxxx() function, or using your notation from above:

x ~ f(theta)

But, then do *not* assign "x" as data.  You may wish to provide a valid initial value for x, however.  The MCMC will then identify these "predictive", or "imputed" stochastic nodes, and will impute values for them during the course of MCMC sampling.  This approach can be done using built-in distributions, such as the "dnorm" distribution as in your script.  Or, it can also work with "custom written" distributions, which you may yourself implement as nimbleFunctions.  If that's your goal, we can discuss how to do that.

Another question - is your goal to have "mu" and "sigma" be inferred from data?  That's the suggestion, when they have prior distributions.  However, when you provide values for "Mu" and "Sigma" as constants, it fixes their values.  I guess I'm confused if there are multiple "mu" and "Mu" terms, and same for "sigma" and "Sigma" in your script.

It also occurs to me that you most likely want to infer "mu" and "sigma" from some actual data values for x, and then impute new values.  I've modified your script to instead have 100 *data* values called "x" (generated from a normal(mu=3, sigma=2) distribution), which will inform the inferences for "mu" and "sigma".  Then, also impute 100 new values, into a new variable called xp.  I also changed all the "Mu" and "Sigma" in the model to be lowercase.

See the modified code below.

Cheers,
Daniel


library(nimble)

code <- nimbleCode({

    # infer (mu,sigma) from x
    for (i in 1:n) {
        x[i] ~ dnorm(mean=mu,sd=sigma)
    }

    # augment unobserved x with samples
    for (i in 1:n) {
        xp[i] ~ dnorm(mean=mu,sd=sigma)
    }

    # priors
    mu ~ dunif(-10,10)
    sigma ~ dunif(0,10)
})

Mu <- 0
Sigma <- 1

n <- 100
x <- rnorm(n, 3, 2)

MCMC_chains <- 1
MCMC_iterations <- 1000

constants <- list(n=n)   #####,Mu=Mu,Sigma=Sigma)
data <- list(x=x)
initial_values <- list(mu=Mu,sigma=Sigma, xp = rep(0,n))    ## added initial values for xp

mcmc.out <- nimbleMCMC(code=code,
                       constants=constants,
                       data=data,
                       inits=initial_values,
                       nchains=MCMC_chains,
                       niter=MCMC_iterations,
                       summary=TRUE,
                       monitors=c("mu","sigma", "xp"))     ## added a monitor for x


--
You received this message because you are subscribed to the Google Groups "nimble-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to nimble-users...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/nimble-users/8F3F8726-29E8-490F-804A-B329C49241C1%40nmsu.edu.


--
You received this message because you are subscribed to the Google Groups "nimble-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to nimble-users...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/nimble-users/8F3F8726-29E8-490F-804A-B329C49241C1%40nmsu.edu.

Brook Milligan

unread,
Apr 21, 2025, 12:56:12 PM4/21/25
to Daniel Turek, nimble-users
Hi Daniel,

Thanks for responding and for taking the time to mess with my script.

Likely, the bigger picture motivation does not come through clearly in my minimal example. I’ll try to explain better and contrast with your update.

Your modified example is designed around the following graph:

data <-- parameters --> imputed data

That is, infer the parameters from the data and use their inferred values to impute new data. That makes perfect sense, but is not what I am trying to do.

Instead, I am trying to represent the following graph:

data <- parameters -> imputed data <- imputation parameters

That is, infer the parameters from both actual data and imputed data when the distribution of the imputed subset is known. The script I sent lacked the data part of that graph because it was focusing on the problem of defining a part of the graph for inference and a part for imputation that overlap. In my script, mu/sigma are the parameters to infer, whereas Mu/Sigma are the known parameters for the imputation.

Is this possible in Nimble?

Thanks again for your help.

Cheers,
Brook

Daniel Turek

unread,
Apr 22, 2025, 6:17:35 AM4/22/25
to Brook Milligan, nimble-users
Brook, I understand your question better now, and also why you had the mu/Mu and sigma/Sigma terms in your original model.  Apologies for the misunderstanding!

Yes, what you're seeking to do is certainly possible in nimble, although it's not natively supported using out-of-the-box syntax and distributions, but would rather require some customization of distributions, to have the parameters be inferred by both the data and the imputed data.  At the most basic level, this could be done using a custom distribution for the parameters, which would accept both the data and the imputed data as arguments (or equally well, having one of those be the "x" variate in the custom distribution), and then computing the log-density of the parameters using all of these terms.

Does that make sense?  Does this sound like a helpful path forward for you?  If you'd prefer to share more specific details of your desired model or data off list, that's fine also, and I'm sure we could figure it out.

Cheers,
Daniel

Perry de Valpine

unread,
Apr 22, 2025, 10:51:03 AM4/22/25
to Daniel Turek, Brook Milligan, nimble-users
Hi Brook,

I may not be following entirely, but it looks like I gave an approach to do this in a thread a few years ago. I went to check on it, and... it looks like you had asked the same or similar question on that thread back then. Is that right? If so, did the trick I provided make sense and work for you? That is a way to do what you are asking. Here is the old thread. Is the current situation different?

HTH
Perry


Brook Milligan

unread,
Apr 22, 2025, 10:54:41 PM4/22/25
to Daniel Turek, nimble-users
Hi Daniel,

OK, I think I understand, but I’m still trying to wrap my head around these special distributions and dummy variables. I’m not quite there.

However, I have attached a new minimal example that tries to calculate the likelihood for each dummy term by first generating the corresponding imputed value and second calculating its likelihood. It seems that I have to wrap the R versions of the called functions in nimbleRcall, but even that is not working. I feel like I am chasing errors that I have no clue about.

The attached script gives the following error for me:

Error: In sizeAssignAfterRecursing: 'rNorm' is not available or its output type is unknown. This may occur if a user-defined function name is the same as the name of a function in a package that `nimble` uses.
This occurred for: model_x[getNodeFunctionIndexedInfo(ARG1_INDEXEDNODEINFO__,1)] <<- rNorm(=1,mean=model_mu[1],sd=model_sigma[1],Mean=0,SD=1)
This was part of the call: { model_x[getNodeFunctionIndexedInfo(ARG1_INDEXEDNODEINFO__,1)] <<- rNorm(=1,mean=model_mu[1],sd=model_sigma[1],Mean=0,SD=1) }
In addition: Warning messages:
1: In dunif(model$mu[1], min = -0, max = 0, log = 1) : NaNs produced
2: In dunif(model$mu[1], min = -0, max = 0, log = 1) : NaNs produced
3: In dunif(model$mu[1], min = -0, max = 0, log = 1) : NaNs produced
Execution halted

I have no identifier rNorm, so I’m lost.

I thought this was the path you were suggesting, but clearly I’m not understanding completely. Any suggestions are welcome.

Thanks a lot.

Cheers,
Brook

> To view this discussion visit https://groups.google.com/d/msgid/nimble-users/CAKbe0hpsGX2fRmkxvkgKOQavn67beyuXLMnyfanb2VxVzCDLOA%40mail.gmail.com.


minimal_example.R

Daniel Turek

unread,
Apr 28, 2025, 1:40:48 PM4/28/25
to Brook Milligan, nimble-users
Brook, I've modified your script to operate the way you were aiming for.  That's a good use of the dummy values for the x[] array, that's the right idea.  Also, there's no need for the nimbleRcalls, because you're only making calls to (otherwise compilable) functions dnorm and rnorm.  So that helped simplify things.

That said, I'm not certain that the random generation of y belongs in the dNorm function, as you had it.  It might belong inside the model, instead.  Please take a look, and modify it as you see fit.  In any case, the code below operates, and implements what you were aiming for in your script.

Cheers,
Daniel



library(nimble)

dNorm <- nimbleFunction(
    run = function(x = double(), mean = double(), sd = double(), Mean = double(), SD = double(), log=integer(0, default = 0)) {
        ####y <- Rrnorm(mean=Mean,sd=SD)
        y <- rnorm(1, Mean, SD)
        ####pdf <- Rdnorm(y,mean=mean,sd=sd,log=log)
        lp <- dnorm(y, mean, sd, log = 1)
        returnType(double(0))
        return(lp)
    })

code <- nimbleCode({
    for(i in 1:n) {
        x[i] ~ dNorm(mean=mu, sd=sigma, Mean=Mu, SD=Sigma)
    }
    # priors
    mu ~ dunif(-10, 10)    ## dunif(-10*Mu, 10*Mu)
    sigma ~ dunif(0, 10*Sigma)

})

Mu <- 0
Sigma <- 1
n <- 10
x <- rep(0,n)

constants <- list(n=n, Mu=Mu, Sigma=Sigma)
data <- list(x=x)
inits <- list(mu=Mu, sigma=Sigma)


mcmc.out <- nimbleMCMC(code=code,
                       constants=constants,
                       data=data,
                       inits=inits,
                       niter=1000,
                       summary=TRUE,
                       monitors=c("mu","sigma"))


Brook Milligan

unread,
May 7, 2025, 2:14:52 PM5/7/25
to Daniel Turek, nimble-users
Hi Daniel,

Thanks for the update to the script. Perhaps we are running different versions of Nimble, as your script will not run for me; I am using 1.3.0, which I believe to be the most recent release. I get the following messages:

Defining model
[Note] Registering 'dNorm' as a distribution based on its use in BUGS code. If you make changes to the nimbleFunctions for the distribution, you must call 'deregisterDistributions' before using the distribution in BUGS code for those changes to take effect.
[Warning] Random generation function for dNorm is not available. NIMBLE is generating a placeholder function, rNorm, that will invoke an error if an algorithm needs to simulate from this distribution. Some algorithms (such as random-walk Metropolis MCMC sampling) will work without the ability to simulate from the distribution. If simulation is needed, provide a nimbleFunction (with no setup code) to do it.
Building model
Setting data and initial values
Running calculate on model
[Note] Any error reports that follow may simply reflect missing values in model variables.
Checking model sizes and dimensions
Checking model calculations
Compiling
[Note] This may take a minute.
[Note] Use 'showCompilerOutput = TRUE' to see C++ compilation details.
Error: In sizeAssignAfterRecursing: 'rNorm' is not available or its output type is unknown. This may occur if a user-defined function name is the same as the name of a function in a package that `nimble` uses.
This occurred for: model_x[getNodeFunctionIndexedInfo(ARG1_INDEXEDNODEINFO__,1)] <<- rNorm(=1,mean=model_mu[1],sd=model_sigma[1],Mean=0,SD=1)
This was part of the call: {
model_x[getNodeFunctionIndexedInfo(ARG1_INDEXEDNODEINFO__,1)] <<- rNorm(=1,mean=model_mu[1],sd=model_sigma[1],Mean=0,SD=1)
}
Execution halted

Any ideas?

Cheers,
Brook

Daniel Turek

unread,
May 9, 2025, 10:13:19 AM5/9/25
to Brook Milligan, nimble-users
Brook, thanks for following up.

Yes, you're right.  When you go straight into nimbleMCMC, and require the on-the-fly generation of the random generation function (rNorm) for your custom distribution, it does create this function for you, but in a place where nimbleMCMC can locate it.  We'll look into this and fix this.

For now, can you just add the line of code shown below (in blue), and I think this script should work for you.

Cheers,
Daniel


Rmodel <- nimbleModel(code, constants, data, inits)


mcmc.out <- nimbleMCMC(code=code,
                       constants=constants,
                       data=data,
                       inits=inits,
                       niter=1000,
                       summary=TRUE,
                       monitors=c("mu","sigma"))

Brook Milligan

unread,
May 22, 2025, 3:10:53 PM5/22/25
to Daniel Turek, nimble-users
Hi Daniel,

I’m getting back to this. Thanks for your input.

As I understand it, nimbleMCMC() compiles the model, but cannot find the compiled code; hence the need to call nimbleModel() as well. Is that correct? I have run lots of other models with just nimbleMCMC(), so I don’t really understand why this case is different. Perhaps because of the extra NIMBLE function definition (which I normally do not use)?

In any case, the code you sent last does in fact run. Thanks a lot for that. However, it does not generate the type of MCMC samples I would expect (perhaps my expectations are wrong, though). The samplers act as if they are rejecting long stretches of proposals.

Perhaps you can take a look at the model code (attached) and make sure it should be sampling correctly. I have added some output stuff, including a PDF file with trace plots, etc., to help you visualize this. I hope it just runs and the problem is obvious.

Thanks a lot for your help. I look forward to learning how to do this correctly.

Cheers,
Brook
> > > > To unsubscribe from this group and stop receiving emails from it, send an email tonimble-user...@googlegroups.com.
> > > > To view this discussion visit https://groups.google.com/d/msgid/nimble-users/8F3F8726-29E8-490F-804A-B329C49241C1%40nmsu.edu.
> > > >
> > > >
> > > > --
> > > > You received this message because you are subscribed to the Google Groups "nimble-users" group.
> > > > To unsubscribe from this group and stop receiving emails from it, send an email tonimble-user...@googlegroups.com.
> > > > To view this discussion visit https://groups.google.com/d/msgid/nimble-users/8F3F8726-29E8-490F-804A-B329C49241C1%40nmsu.edu.
> > >
> > >
> > > --
> > > You received this message because you are subscribed to the Google Groups "nimble-users" group.
> > > To unsubscribe from this group and stop receiving emails from it, send an email tonimble-user...@googlegroups.com.
minimal_example-5.R
Reply all
Reply to author
Forward
0 new messages