I'll start with a short question: Where can I go to learn more about which samplers perform best in various situations? I've found a few papers at the edge of my comprehension, but I'd love a broad overview, particularly for models with highly correlated posteriors. I'd like to become more of a nimble power-user who can pick optimal samplers for any model I write, but I'm at a loss as to where to start. Please feel free to answer only that question, but I'll elaborate on what is going on in my current model for future readers.
The motivating issue is an Integrated Population Model (IPM) that converges in JAGS but that I cannot get to converge in nimble. The exact behavior depends on which sampler I use. When I let the defaults stand, almost all of the nodes use the RW sampler. With that sampler, many of the nodes stick on a value, either from the very start or after exploring the posterior for a while (watching a functioning chain flat-line is fascinating). Similar patterns appear when I switch the samplers to the log scale or set reflective = TRUE.

I've tried the RW_block sampler as well, but I'm not at all certain that I'm choosing my groupings correctly (guidance on when/how to group would be fantastic; I'm currently just grouping temporally sequential variables together, like abundance at each time step). The RW_block sampler hasn't given significantly different results than the RW sampler. I've also tried the slice sampler, both with onlySlice = TRUE and by assigning slice samplers to just the sticking nodes. Every version of the slice sampler results in nodes wandering off into crazy space (like 10^8 for a value that should be between 0 and 1000).
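For concreteness, here is roughly what I've been doing to swap samplers (just a sketch; "model" and the node names are stand-ins for my actual model):

library(nimble)

conf <- configureMCMC(model)   # defaults: mostly RW samplers for my model
conf$printSamplers()           # inspect what was assigned

# variant 1: RW on the log scale (or reflective at the bounds)
conf$removeSamplers("phi")
conf$addSampler(target = "phi", type = "RW", control = list(log = TRUE))
# conf$addSampler(target = "phi", type = "RW", control = list(reflective = TRUE))

# variant 2: block temporally sequential abundances together
conf$removeSamplers("N")
conf$addSampler(target = c("N[1]", "N[2]", "N[3]"), type = "RW_block")

# variant 3: slice sampling, everywhere or only on the sticking nodes
# conf <- configureMCMC(model, onlySlice = TRUE)
conf$addSampler(target = "N[4]", type = "slice")

mcmc  <- buildMCMC(conf)
cmcmc <- compileNimble(mcmc, project = model)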
For a bit more about the model: it has a removal component where a known number of individuals is subtracted at each time step. That led to many headaches when I was originally troubleshooting it in JAGS, but I was able to solve them with normal approximations of binomials and heavy use of truncation to the "minimum number alive" at various steps. All that to say, the model has a very narrow space of reasonable, highly correlated posteriors.
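To illustrate that structure, here is a heavily simplified fragment of the kind of thing I mean (removals, minAlive, phi, and nYears are stand-ins for my actual data and parameters):

library(nimble)

ipmFragment <- nimbleCode({
  for (t in 1:(nYears - 1)) {
    # expected survivors after the known removals are subtracted
    mu[t] <- (N[t] - removals[t]) * phi
    # normal approximation to the binomial survival process,
    # truncated below at the minimum number known alive
    N[t + 1] ~ T(dnorm(mu[t], sd = sqrt(mu[t] * (1 - phi))),
                 minAlive[t + 1], Inf)
  }
})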
At this point I could run it all in JAGS, but I want to be able to do all of my work in nimble going forward (I'd also prefer not to wait weeks for every run). With that in mind, a few specific questions I have are:
1. How do the various nimble samplers work compared to those in JAGS?
2. When do various samplers work better/worse?
3. Are there any plans to resume maintaining the autoBlock functionality?
4. Is there a way to see which elements of a single parameter (e.g. N[]) are assigned to each sampler, as opposed to the posterior predictive sampler? (See the first sketch after this list.)
5. When/how should I block?
6. Could "sticking" be a result of the adaptive RW? Should I change it to non-adaptive, manually set step lengths or other parameters to unstick nodes? My thinking here is that the "reasonable" values are very tightly clustered, so if it fails to find them at a high enough rate, it might start looking farther and farther out, continuing to decrease the acceptance rate to functionally 0.
7. If a node is stuck, is there some obscenely large number of iterations that will unstick it?
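On question 4, the closest I've found is printing or extracting the sampler assignments from the MCMC configuration and matching targets by name (again a sketch; I'm not sure this is the intended approach):

conf <- configureMCMC(model)
conf$printSamplers("N")   # print the samplers assigned to elements of N

# or programmatically: pull out each sampler's type and target;
# I believe posterior predictive nodes show up with type
# "posterior_predictive"
samplers <- conf$getSamplers()
types    <- sapply(samplers, function(s) s$name)
targets  <- sapply(samplers, function(s) paste(s$target, collapse = ", "))
data.frame(type = types, target = targets)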
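And on question 6, this is what I'd try for a manual, non-adaptive RW if that's a sensible direction (the node is hypothetical and the scale value is a pure guess I'd tune by hand):

conf$removeSamplers("N[5]")
conf$addSampler(target = "N[5]", type = "RW",
                control = list(adaptive = FALSE, scale = 0.1))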
Sorry for the tome, but thank you so much for reading this far! Mostly, I am at a loss for resources for continued self-education.
Cheers,
Kenneth