dmnorm does not impute missing data?


Matthijs Hollanders

Nov 28, 2022, 11:50:14 PM
to nimble-users
Hi all,

I was playing around with some models today and ran into an issue. It appears that with a multivariate normal likelihood for data, any missing values in the data do not get imputed. Is this known behaviour? Attached is a script with a reproducible example.

Thanks!

Matt
dmnorm.R

PierGianLuca

Nov 29, 2022, 3:18:36 AM
to Matthijs Hollanders, nimble...@googlegroups.com
Hi Matt,

Yes, it's mentioned in the manual, § 6.1.1.3: "A node following a multivariate distribution must be either entirely observed or entirely missing".

I don't know how much this restriction affects what you plan to do. In my case, working with kernel mixtures, I use a product of gaussians instead of a multivariate gaussian (that is, spherical kernels), so I can deal with missing single components.
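Luca's product-of-gaussians alternative can be sketched in NIMBLE as follows (a minimal illustration, not the original script; `mu`, `sigma`, and `n_obs` are assumed names):

```r
library(nimble)

# Spherical-kernel formulation: each component gets its own univariate
# normal, so a missing y[j, i] is a scalar stochastic node that NIMBLE
# will impute automatically, unlike an element of a dmnorm node.
sphericalCode <- nimbleCode({
  for (i in 1:n_obs) {
    for (j in 1:4) {
      y[j, i] ~ dnorm(mu[j], sd = sigma[j])
    }
  }
})
```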

Cheers,
Luca

Chris Paciorek

Nov 29, 2022, 10:46:18 AM
to Matthijs Hollanders, nimble...@googlegroups.com


Perry de Valpine

Nov 29, 2022, 12:27:58 PM
to paci...@stat.berkeley.edu, Matthijs Hollanders, nimble...@googlegroups.com
Another work-around: I believe it should be possible to manually assign scalar samplers to the missing dimensions.  Nothing clever such as expressions for conditional distributions or conjugate relationships will be used, but the dimensions will be sampled correctly.  Chris, correct me if I'm wrong on this.
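A hedged sketch of that workaround, assuming a built `nimbleModel` called `model` with a multivariate data node `y` (the node names, and whether scalar samplers on elements of a multivariate node behave as hoped, are assumptions to be verified, per the caveat above):

```r
# Sketch: find the scalar elements of y that are NA and assign each a
# univariate random-walk sampler in the MCMC configuration.
conf <- configureMCMC(model)
yElements <- model$expandNodeNames("y", returnScalarComponents = TRUE)
missingElements <- yElements[is.na(values(model, yElements))]
for (node in missingElements) {
  conf$addSampler(target = node, type = "RW")
}
mcmc <- buildMCMC(conf)
```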
Perry



Perry de Valpine

Dec 1, 2022, 1:25:26 PM
to Matthijs Hollanders, nimble-users
That's an interesting idea.  Let us know if it works and seems to mix ok.  You might be able to skip the Y[j,i] declarations for the missing observations, and let them be imputed just in y_new.  I might not be seeing it clearly.
Perry


On Tue, Nov 29, 2022 at 1:06 PM Matthijs Hollanders <matthijs....@gmail.com> wrote:
Hi everyone,

Thanks for the quick responses. I wonder if I've found another solution to this as well. The idea is to create a new response variable y_new that's very close to the observed y data matrix, and to model y_new with a noncentered parameterization (not possible for the data directly, because data nodes must be declared with a "~ d()" statement). The observed y is then tied to y_new with a very tight normal, which does impute the missing values. The modified code is this:

for (i in 1:n_obs) {
  # multivariate likelihood (noncentered parameterization)
  y_new[1:4,i] <- mu[1:4] + diag(sigma[1:4]) %*% t(chol[1:4,1:4]) %*% z[1:4,i]
  for (j in 1:4) {
    z[j,i] ~ dnorm(0, 1)                    # standard normal z-scores
    y[j,i] ~ dnorm(y_new[j,i], sd = 0.001)  # trick: tight link of observed y to y_new
  } # j
} # i

Attached is a script showing that, after running the model, there's no missingness in y_new, and that the observed y and y_new are essentially the same values.

Curious about any feedback on this!

Matt

Chris Paciorek

Dec 2, 2022, 11:20:47 AM
to Matthijs Hollanders, nimble-users
Taking the sd = 0.001 and letting it go to zero would, I think, give you something equivalent to the true model, but the z's would probably not mix well. It might also be hard to publish something based on that approach, as a reviewer would probably ask why you don't just implement the actual model based on standard MVN calculations.

Perry's suggestion to manually add samplers for the missing elements should work in terms of implementation, but it may
need a block sampler to get decent mixing, given the correlation amongst the missing elements of y.
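For example, a block random-walk sampler over the missing elements could be assigned like this (a sketch; the indices are hypothetical and would in practice come from which(is.na(y), arr.ind = TRUE)):

```r
# Sketch: jointly sample the missing elements of an observation with a
# block random-walk sampler to cope with their posterior correlation.
conf <- configureMCMC(model)
conf$addSampler(target = c("y[2, 7]", "y[3, 7]"),  # hypothetical missing entries
                type = "RW_block")
```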

-Chris


