MCMC with many missing values

Matthijs Hollanders

Aug 30, 2021, 11:03:36 PM
to nimble-users
Hello all,

I'm working on a model that incorporates multiple levels of state uncertainty in multistate models. In our case, we collect multiple samples from individuals and perform multiple diagnostic runs on each sample to detect a pathogen. I use the pathogen loads -- estimated from the positive diagnostic runs, from positive samples, from positive individuals only -- to estimate individual pathogen loads, which then serve as a predictor of survival. I'm willing to share code, but I think the question I have is more general.

Because of the nature of these data, the vast majority of my 4-D array (individual, survey, sample, diagnostic run) is NA. As I've always understood it, MCMC will just sample from the posteriors to impute these NAs. However, I notice that the MCMC really struggles to converge for the parameters that describe the overall, individual, and sample pathogen loads. I'm modeling as follows:

Pathogen load on individual i, survey t, sample k, and diagnostic run l gets modeled as:

y[i,t,k,l] ~ dnorm(sample[i,t,k], sigma1)

Then the sample as:

sample[i,t,k] ~ dnorm(individual[i,t], sigma2) 

and the individual as

individual[i,t] ~ dnorm(overall, sigma3)

Is there any fundamental reason why it's difficult for the MCMC to converge when there are many NAs? 

Also, a somewhat related question: when I don't include individual[i,t] and sample[i,t,k] as predictors, individual[i,t] and sample[i,t,k] get assigned conjugate samplers. When I do include them as predictors in other functions, they get assigned RW samplers. When I try to assign a conjugate sampler manually, I get an error. Is there any reason why those two can't have conjugate samplers when they're used in another function?

Happy to share code off-list. :)

Kind regards,
Matt

Chris Paciorek

Sep 2, 2021, 8:22:21 PM
to Matthijs Hollanders, nimble-users
Hi Matt,

In the MCMC, NIMBLE cycles across all the NA data elements and all the parameters as part of one iteration, treating the NA data elements as additional parameters to be sampled. Because the missing data values depend on the parameters (and vice versa), there will be additional dependence in the MCMC that wouldn't be present if there were no missing data. In principle, one could sample parameters in a way that is not conditional on the current values of the missing data elements and then simply sample from the predictive distributions of the missing data elements at the end of each iteration so that they are consistent with the current parameters.

We've had some recent discussions amongst the NIMBLE developers about setting up sampling of missing data in this way, but it's not yet implemented. Another possibility is that you could "flatten" your data so that you have a single y vector that contains only the observed data. You'll then need to set up some objects that map each observation to the i,t,k indexing that corresponds to that observation, so that in the model code you have something like:

y[j] ~ dnorm(sample[i[j],t[j],k[j]], sigma1)

That should avoid all the sampling of the missing values (which you could do post hoc if you desired) and remove the dependencies I mention above.
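
A minimal sketch of the flattening step in R, under some assumptions: the array y4d, its dimensions, and its values are made up for illustration, and sigma1 is treated as a standard deviation (hence sd = sigma1). Only the likelihood loop is shown; the sample, individual, and overall levels would stay as in the original model.

```r
library(nimble)

# Toy stand-in for the real data: 3 individuals x 2 surveys x 2 samples x 2 runs,
# mostly NA. All dimensions and values are hypothetical.
y4d <- array(NA_real_, dim = c(3, 2, 2, 2))
y4d[1, 1, 1, 1] <- 2.3
y4d[1, 1, 1, 2] <- 2.1
y4d[2, 2, 1, 1] <- 4.0
y4d[3, 1, 2, 2] <- 3.2

# One row per observed cell; the columns hold the (i, t, k, l) positions.
idx <- which(!is.na(y4d), arr.ind = TRUE)

constants <- list(
  nObs = nrow(idx),
  i = unname(idx[, 1]),  # individual for observation j
  t = unname(idx[, 2]),  # survey for observation j
  k = unname(idx[, 3])   # sample for observation j
)

# Observed values only, in the same order as the rows of idx.
data <- list(y = y4d[idx])

# Likelihood over the observed values; no NA nodes are created for y.
# Assumes sigma1 is a standard deviation; a bare second argument to
# dnorm would be read as a precision.
code <- nimbleCode({
  for (j in 1:nObs) {
    y[j] ~ dnorm(sample[i[j], t[j], k[j]], sd = sigma1)
  }
})
```

The constants and data lists would then be passed to nimbleModel together with the rest of the model code. The diagnostic-run index l is not needed in the model, because runs from the same sample share the same mean.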

As far as conjugacy goes, we should detect conjugacy for individual[i,t] if the dependents of individual[i,t] satisfy the rules for conjugacy. If you can show us enough of the code to see all the dependents, I can probably tell you why NIMBLE is not detecting conjugacy. From the code you show, I would think we should detect conjugacy, but I'm not actually sure what you mean by "when I don't include individual[i,t] and sample[i,t,k] as predictors".
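
One way to look at the dependents and the detected conjugacy of a node is sketched below. Both toy models are hypothetical and much smaller than the real one. The second model is only a guess at what "used as a predictor" might look like: a logit-linked Bernoulli survival term with a made-up coefficient of 0.5. That kind of dependent would not satisfy the normal-normal conjugacy rules.

```r
library(nimble)

# Toy model A: the latent load only feeds a normal observation.
codeA <- nimbleCode({
  overall ~ dnorm(0, sd = 10)
  for (i in 1:2) {
    individual[i] ~ dnorm(overall, sd = 1)
    y[i] ~ dnorm(individual[i], sd = 1)
  }
})

# Toy model B: the same latent load also enters a logit-linked
# Bernoulli term (hypothetical survival component).
codeB <- nimbleCode({
  overall ~ dnorm(0, sd = 10)
  for (i in 1:2) {
    individual[i] ~ dnorm(overall, sd = 1)
    y[i] ~ dnorm(individual[i], sd = 1)
    logit(phi[i]) <- 0.5 * individual[i]
    z[i] ~ dbern(phi[i])
  }
})

inits <- list(overall = 0, individual = c(0, 0))
modelA <- nimbleModel(codeA, data = list(y = c(0.5, 1.5)), inits = inits)
modelB <- nimbleModel(codeB, data = list(y = c(0.5, 1.5), z = c(1, 0)),
                      inits = inits)

# Nodes that depend on individual[1], excluding the node itself.
depsA <- modelA$getDependencies("individual[1]", self = FALSE)
depsB <- modelB$getDependencies("individual[1]", self = FALSE)

# A list describing the conjugate relationship, empty if none is detected.
conjA <- modelA$checkConjugacy("individual[1]")
conjB <- modelB$checkConjugacy("individual[1]")

# Samplers that the default MCMC configuration assigns to these nodes.
confB <- configureMCMC(modelB)
confB$printSamplers("individual")
```

Comparing depsA with depsB shows which extra dependents the predictor use introduces, and those are the nodes to check against the conjugacy rules.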

Side note, you have "sigma1" which seems to indicate you are thinking of a standard deviation, but the default parameterization has the second parameter of dnorm as the precision.
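
A small sketch of the difference, using two hypothetical one-node models: a precision of 4 corresponds to a standard deviation of 0.5, so the two declarations below should give the same log density for the same data value.

```r
library(nimble)

# Second positional argument of dnorm in model code: precision.
m_prec <- nimbleModel(nimbleCode({
  y ~ dnorm(0, 4)
}), data = list(y = 1))

# Named alternative parameterization: standard deviation.
m_sd <- nimbleModel(nimbleCode({
  y ~ dnorm(0, sd = 0.5)
}), data = list(y = 1))

# Log density of y under each declaration.
lp_prec <- m_prec$calculate("y")
lp_sd <- m_sd$calculate("y")

# Reference value from R's own dnorm, which takes a standard deviation.
lp_ref <- stats::dnorm(1, mean = 0, sd = 0.5, log = TRUE)
```

If sigma1, sigma2, and sigma3 are meant as standard deviations, writing them as sd = sigma1 and so on in the model code makes that explicit.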

-Chris
