Unstable WAIC and multivariate data nodes


Eliot Boulaire

Jan 24, 2025, 4:43:20 AM
to nimble-users

Hi nimble community,

I’m currently working on a large life-cycle model and have encountered an issue that I’ve seen mentioned in this group before:

[Warning] There are 131 individual pWAIC values that are greater than 0.4. This may indicate that the WAIC estimate is unstable (Vehtari et al., 2017), at least in cases without grouping of data nodes or multivariate data nodes.

To address this, I’ve already run the model with an extended burn-in phase and a large number of iterations, suspecting that poor convergence might be the root of the problem. However, the warning persists.
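For reference, here are the per-node quantities behind this warning in the standard formulation (Vehtari et al., 2017), with S posterior draws θ^(s) and data nodes y_1, ..., y_n. As far as I understand, nimble's default (conditional) WAIC follows this form and reports it on the deviance scale, but treat that as an assumption and check the nimble documentation:

$$
\widehat{\mathrm{lppd}} = \sum_{i=1}^{n} \log\left(\frac{1}{S}\sum_{s=1}^{S} p\big(y_i \mid \theta^{(s)}\big)\right),
\qquad
\hat{p}_{\mathrm{WAIC}} = \sum_{i=1}^{n} \hat{p}_{\mathrm{WAIC},i},
\qquad
\hat{p}_{\mathrm{WAIC},i} = \operatorname{Var}_{s=1}^{S}\left[\log p\big(y_i \mid \theta^{(s)}\big)\right],
$$

$$
\mathrm{WAIC} = -2\left(\widehat{\mathrm{lppd}} - \hat{p}_{\mathrm{WAIC}}\right).
$$

The warning counts the data nodes whose individual term \(\hat{p}_{\mathrm{WAIC},i}\) exceeds 0.4.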

My first question: is there a way to pinpoint the specific data nodes responsible for the pWAIC values exceeding 0.4? The getWAICdetails() function doesn’t seem to provide this information, but it would be very helpful for debugging.
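Independently of any nimble feature, the pointwise terms can be computed by hand if you have the per-draw log-likelihood of every data node. Below is a minimal Python sketch (standard library only). It assumes you have already exported an S x N matrix of log p(y_i | θ^(s)) values (rows = posterior draws, columns = data nodes) plus the matching node names; how to extract that matrix from nimble is not shown, and the names `pointwise_waic`, `loglik` and `node_names` are hypothetical.

```python
import math
from statistics import variance


def log_mean_exp(values):
    """Numerically stable log(mean(exp(values)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


def pointwise_waic(loglik, node_names, threshold=0.4):
    """Per-node WAIC terms from a matrix of pointwise log-likelihoods.

    loglik: list of S rows (posterior draws), each a list of N values
            log p(y_i | theta^(s)), one per data node (hypothetical layout).
    node_names: list of N labels such as "Scales[3, 7]".
    Returns (waic, lppd, p_waic, flagged), where flagged lists
    (name, p_waic_i) for every node whose pointwise pWAIC exceeds threshold.
    """
    lppd = 0.0
    p_waic = 0.0
    flagged = []
    for i, name in enumerate(node_names):
        column = [row[i] for row in loglik]  # log-likelihood of node i in each draw
        lppd_i = log_mean_exp(column)        # log of the posterior-mean likelihood
        p_waic_i = variance(column)          # sample variance across draws
        lppd += lppd_i
        p_waic += p_waic_i
        if p_waic_i > threshold:
            flagged.append((name, p_waic_i))
    waic = -2.0 * (lppd - p_waic)            # deviance scale
    return waic, lppd, p_waic, flagged


# Toy example: 4 draws, 3 data nodes (made-up numbers, for illustration only).
toy_loglik = [
    [-1.0, -2.0, -0.5],
    [-1.1, -0.5, -0.5],
    [-0.9, -3.0, -0.5],
    [-1.0, -1.5, -0.5],
]
toy_names = ["Eff[1]", "Scales[1, 1]", "Scales[2, 1]"]
waic, lppd, p_waic, flagged = pointwise_waic(toy_loglik, toy_names)
print(flagged)
```

With these toy numbers only Scales[1, 1] is flagged: its log-likelihood swings widely across draws, giving a pointwise pWAIC of 3.25/3 ≈ 1.08.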


To help debug this, here’s an overview of the types of data nodes in my model:

  1. Abundance:
for (t in 1:C) {
  Eff[t] ~ dlnorm(meanlog = Log_Mu[t], sdlog = Log_Sd[t])
}
- Eff[t] is an abundance estimate derived from a capture-mark-recapture (CMR) model.
- Log_Sd[t] represents the uncertainty in that abundance estimate (on the log scale) and is supplied as a constant.

  2. Sex ratio:

for (t in 1:C) {
  Nsex[t, 1:2] ~ dmulti(size = Tot[t], prob = Psex[t, 1:2])
}
- Nsex[t, 1:2] is the number of individuals of each sex at time t.
- Tot[t] is the total number of individuals, treated as a constant.

  3. Size structure:

for (t in 1:C) {
  for (l in 1:L) {
    Scales[l, t] ~ dnorm(mean = Mu[t], sd = Sd[t])
  }
}

- Scales[l, t] represents individual scale lengths over time.
- These values are assumed to follow a normal size distribution defined by Mu[t] and Sd[t].

I suspect the size structure nodes might be causing the issue due to their multivariate nature. From what I’ve read, grouping multivariate nodes for WAIC calculation could be a solution.

My second question: would grouping multivariate nodes be a good approach here, or should I abandon WAIC entirely and use another metric, such as PSIS-LOO, as recommended in the literature?


Thanks in advance for your insights and advice!

Have a great day,
Eliot B.

Chris Paciorek

Jan 25, 2025, 1:59:27 PM
to Eliot Boulaire, nimble-users
Hi Eliot,

That's a good point about reporting more informatively which nodes are associated with the large pWAIC values. I've created a branch on GitHub that has a version of nimble which prints that information out. The printout is not the cleanest (it first tells you the names of the data nodes, then prints out indices for the nodes with pWAIC > 0.4), but it should give you the info you are looking for.

remotes::install_github("nimble-dev/nimble", ref = "report-waic-nodes", subdir = "packages/nimble")

As for your question about grouping multivariate nodes: although I was one of the people who wrote this functionality, I don't actually have a good statistical sense of when grouping is a good approach versus using another metric. My only vague comment is that if data nodes are correlated (conditionally on the parameters that define their likelihood), it makes sense to group them, since the WAIC calculation sums the individual WAIC contributions of each data node, which might not be the right thing to do when there is dependence across data nodes.
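To make the grouping idea concrete: with grouping, each group of data nodes contributes a single term to lppd and pWAIC, computed from the group's joint log-likelihood in each posterior draw, instead of one term per node. Below is a minimal Python sketch (standard library only), not nimble's implementation. It assumes an S x N matrix of per-draw log-likelihoods (rows = draws, columns = data nodes) and takes a group's joint log-likelihood to be the sum of its members' values, which holds when each member's log-density is evaluated conditionally on its parents. The names `grouped_waic` and `groups` are hypothetical.

```python
import math
from statistics import variance


def log_mean_exp(values):
    """Numerically stable log(mean(exp(values)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


def grouped_waic(loglik, groups):
    """WAIC where each group of data nodes is treated as one observation.

    loglik: S rows (posterior draws) x N columns (data nodes) of log-likelihoods.
    groups: dict mapping a group label to the list of column indices it contains
            (hypothetical layout, e.g. all Scales[, t] nodes for one t).
    Returns (waic, per_group), where per_group maps each label to its pWAIC term.
    """
    lppd = 0.0
    p_waic = 0.0
    per_group = {}
    for label, cols in groups.items():
        # Joint log-likelihood of the whole group in each draw.
        joint = [sum(row[c] for c in cols) for row in loglik]
        p_waic_g = variance(joint)  # one variance term for the whole group
        per_group[label] = p_waic_g
        lppd += log_mean_exp(joint)
        p_waic += p_waic_g
    return -2.0 * (lppd - p_waic), per_group


# Toy example: 4 draws, 3 data nodes (made-up numbers, for illustration only).
toy_loglik = [
    [-1.0, -2.0, -0.5],
    [-1.1, -0.5, -0.5],
    [-0.9, -3.0, -0.5],
    [-1.0, -1.5, -0.5],
]
_, singletons = grouped_waic(toy_loglik, {"a": [0], "b": [1], "c": [2]})
_, one_group = grouped_waic(toy_loglik, {"all": [0, 1, 2]})
print(sum(singletons.values()), one_group["all"])
```

With these toy numbers, summing the three per-node terms gives pWAIC ≈ 1.09, while treating the three nodes as a single group gives ≈ 0.92. Grouping changes both the count of terms checked against the 0.4 threshold and their sizes, because the variance of a sum is not the sum of the variances when the members' log-likelihoods covary across draws.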

-chris
