dmnorm does not impute missing data?


Matthijs Hollanders

Nov 28, 2022, 11:50:14 PM
to nimble-users
Hi all,

I was playing around with some models today and ran into an issue. It appears that with a multivariate normal likelihood for data, any missing values in the data do not get imputed. Is this known behaviour? Attached is a script with a reproducible example.

Thanks!

Matt
dmnorm.R

PierGianLuca

Nov 29, 2022, 3:18:36 AM
to Matthijs Hollanders, nimble...@googlegroups.com
Hi Matt,

Yes, it's mentioned in the manual, § 6.1.1.3: "A node following a multivariate distribution must be either entirely observed or entirely missing".

I don't know how much this restriction affects what you plan to do. In my case, working with kernel mixtures, I use a product of gaussians instead of a multivariate gaussian (that is, spherical kernels), so I can deal with missing single components.
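Luca's product-of-gaussians alternative can be sketched in NIMBLE as follows (a minimal illustration, not the original script; `mu`, `sigma`, and `n_obs` are assumed names):

```r
library(nimble)

# Spherical-kernel formulation: each component gets its own univariate
# normal, so a missing y[j, i] is a scalar stochastic node that NIMBLE
# will impute automatically, unlike an element of a dmnorm node.
sphericalCode <- nimbleCode({
  for (i in 1:n_obs) {
    for (j in 1:4) {
      y[j, i] ~ dnorm(mu[j], sd = sigma[j])
    }
  }
})
```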

Cheers,
Luca

Chris Paciorek

Nov 29, 2022, 10:46:18 AM
to Matthijs Hollanders, nimble...@googlegroups.com


Perry de Valpine

Nov 29, 2022, 12:27:58 PM
to paci...@stat.berkeley.edu, Matthijs Hollanders, nimble...@googlegroups.com
Another work-around: I believe it should be possible to manually assign scalar samplers to the missing dimensions.  Nothing clever such as expressions for conditional distributions or conjugate relationships will be used, but the dimensions will be sampled correctly.  Chris, correct me if I'm wrong on this.
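A hedged sketch of that workaround, assuming a built `nimbleModel` called `model` with a multivariate data node `y` (the node names, and whether scalar samplers on elements of a multivariate node behave as hoped, are assumptions to be verified, per the caveat above):

```r
# Sketch: find the scalar elements of y that are NA and assign each a
# univariate random-walk sampler in the MCMC configuration.
conf <- configureMCMC(model)
yElements <- model$expandNodeNames("y", returnScalarComponents = TRUE)
missingElements <- yElements[is.na(values(model, yElements))]
for (node in missingElements) {
  conf$addSampler(target = node, type = "RW")
}
mcmc <- buildMCMC(conf)
```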
Perry



Perry de Valpine

Dec 1, 2022, 1:25:26 PM
to Matthijs Hollanders, nimble-users
That's an interesting idea.  Let us know if it works and seems to mix ok.  You might be able to skip the Y[j,i] declarations for the missing observations, and let them be imputed just in y_new.  I might not be seeing it clearly.
Perry


On Tue, Nov 29, 2022 at 1:06 PM Matthijs Hollanders <matthijs....@gmail.com> wrote:
Hi everyone,

Thanks for the quick responses. I wonder if I've found another solution to this as well. The idea is to create a new response variable y_new that's very close to the observed y data matrix, and to model y_new with a noncentered parameterization (not possible for the data directly, because data nodes must be declared with a "~ d()" statement). The observed y is then tied to y_new with a very tight normal, which does impute the missing values. The modified code is this:

for (i in 1:n_obs) {
  # multivariate likelihood (noncentered parameterization)
  y_new[1:4,i] <- mu[1:4] + diag(sigma[1:4]) %*% t(chol[1:4,1:4]) %*% z[1:4,i]
  for (j in 1:4) {
    z[j,i] ~ dnorm(0, 1)                    # standard normal z-scores
    y[j,i] ~ dnorm(y_new[j,i], sd = 0.001)  # trick: tight link of observed y to y_new
  } # j
} # i

Attached is a script showing that, after running the model, there's no missingness in y_new, and that the observed y and y_new are essentially the same values.

Curious about any feedback on this!

Matt

Chris Paciorek

Dec 2, 2022, 11:20:47 AM
to Matthijs Hollanders, nimble-users
Taking the sd = 0.001 and letting it go to zero would, I think, give you something equivalent to the true model, but the z's would probably not mix well. It might also be hard to publish something based on that approach, as a reviewer would probably ask why you don't just implement the actual model based on standard MVN calculations.

Perry's suggestion to manually add samplers for the missing elements should work in terms of implementation, but it may
need a block sampler to get decent mixing, given the correlation amongst the missing elements of y.
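For example, a block random-walk sampler over the missing elements could be assigned like this (a sketch; the indices are hypothetical and would in practice come from which(is.na(y), arr.ind = TRUE)):

```r
# Sketch: jointly sample the missing elements of an observation with a
# block random-walk sampler to cope with their posterior correlation.
conf <- configureMCMC(model)
conf$addSampler(target = c("y[2, 7]", "y[3, 7]"),  # hypothetical missing entries
                type = "RW_block")
```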

-Chris


