Sampling missing data


Pedro Cardoso

Jul 17, 2025, 2:33:42 PM
to nimble-users
Hi everyone,

Thank you for all your work on the nimble package. It has been very useful.

About 2 years ago, I coded a model that takes continuous variables and builds a Dirichlet process mixture model (DPMM), using the stick-breaking function and a multivariate normal distribution for each cluster. At the time, I noticed that the default behaviour in the presence of missing data was to sample missing values in rows where all variables were missing, but not in rows with partial missingness (only some variables missing). To overcome this, I coded custom samplers (conditional random walk) that sample rows with full missingness from the corresponding cluster's multivariate normal distribution and, for rows with partial missingness, sample the missing values from that same distribution conditional on the available information. However, these custom samplers have stopped working. I wonder whether something has changed in how the package handles missing data that I'm not aware of.

I've looked into the nimble package versions and found that my code works with version 1.0.1 (the version available at the time), but stops working from version 1.1.0 onwards.

I've also coded a reprex that contains the code to test this problem and the two custom samplers.

Thank you for any help.

Reprex explanation:
- Dataset with 2 variables
- Missingness: 1 row with X1 missing, 1 row with X2 missing, 1 row with X1/X2 missing
- Build model
- Run MCMC
- Check if values were sampled (they are only for the row with both X1 and X2 missing)
- Replace the samplers
- Run MCMC
- Check if values were sampled (they are all sampled in v1.0.1).

Attachments: sampler_conditional_RW_block.R, sampler_conditional_RW.R, reprex.R

Perry de Valpine

Jul 23, 2025, 8:03:55 AM
to Pedro Cardoso, nimble-users
Pedro,
Thanks for the question and very clear reproducible example.
This was not entirely obvious, but the relevant items from the 1.1.0 release NEWS (see posts on r-nimble.org or the complete set in inst/NEWS.md on our GitHub repo) are:

- `configureMCMC` will no longer assign samplers to data nodes, even if
  the `nodes` argument includes data nodes (PR #1407).
 
- Add new argument `allowData` to the `addSampler` method of MCMC
  configuration objects, with default value `FALSE`.  When `TRUE`,
  samplers can be assigned to operate on data nodes (PR #1407).
 
These changes were intended to provide better fences around data nodes to avoid accidental MCMC sampling if a user is manually changing an MCMC configuration. In your case, the nodes x_cont_miss[1, ] and x_cont_miss[2, ] are vector nodes (following dmnorm) each with one provided element and one missing element. A node is considered all data or all non-data, so these are considered data nodes, and even the missing elements are considered data. For example, model$isData("x_cont_miss[1, 1]") is TRUE. If you add "allowData=TRUE" when you call config$addSampler, it works.
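
A minimal sketch of the behaviour described above, assuming nimble 1.1.0 or later. The toy model, the variable names (x, mu, P) and the use of the built-in "RW" sampler are illustrative assumptions; they are not the DPMM or the custom conditional samplers from the reprex.

```r
library(nimble)

# Hypothetical toy model, not the DPMM from the reprex:
# each row x[i, 1:2] is a single bivariate normal node.
code <- nimbleCode({
  for (i in 1:3) {
    x[i, 1:2] ~ dmnorm(mu[1:2], prec = P[1:2, 1:2])
  }
})

# Rows 1 and 2 are partially observed; row 3 is fully missing.
x <- matrix(c(0.3, NA,
              NA,  1.2,
              NA,  NA), nrow = 3, byrow = TRUE)

model <- nimbleModel(code,
                     data  = list(x = x),
                     inits = list(mu = c(0, 0), P = diag(2)))

# Data status is tracked per node, not per element, so the missing
# element of a partially observed row is reported as data as well.
model$isData("x[1, 2]")  # missing element of a partially observed row
model$isData("x[3, 1]")  # element of the fully missing row

conf <- configureMCMC(model, print = FALSE)
n_before <- length(conf$getSamplers())

# "RW" is only a stand-in for a custom conditional sampler.
# Per the NEWS items above, addSampler does not assign samplers to
# data nodes unless allowData = TRUE is given.
conf$addSampler(target = "x[1, 2]", type = "RW", allowData = TRUE)
n_after <- length(conf$getSamplers())

conf$printSamplers()
```

The configuration can then be passed to buildMCMC and compiled as usual. Comparing n_before and n_after, with and without allowData = TRUE, is one way to see whether the sampler was actually added.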

Note that the block sampler you provide is not assigned because zmiss[3] and x_cont_miss[3,] are all sampled by a posterior predictive sampler. It shows the target as vmiss[3], and in its setup code finds dependencies and should in that way include x_cont_miss[3, ]. I think you are aware of this from following the second if() condition of your config modifications.

(Note also that a variable with multiple nodes can contain a mix of data and non-data nodes. For example, some x[i, ] could be data and some non-data, when each x[i, ] follows a dmnorm and thus is a separate node from x[j, ] for i != j. It's just that the elements of a non-scalar node aren't tracked as a mix of data and non-data.)

One could debate the nuances and use cases here, although I don't think it's a bug. I do think we should consider emitting a message when addSampler bails out because of not sampling data nodes. I'll file an issue on that.

HTH
Perry



Pedro Cardoso

Jul 24, 2025, 3:58:56 PM
to nimble-users
Hi Perry,

Thank you so much for the clear explanation.

Adding the "bail out" message to addSampler would definitely be useful. Thank you for filing an issue on this.