Hi Matt, Qing, and Marc,
Matt, thanks for the blog post. I'm sure many will find it useful. I just want to chime in with a few brief perspectives from a nimble developer.
Early versions of nimbleHMC were painfully slow. Now it should be much better. Of course performance will vary. I've seen our HMC be faster than Stan on some problems and slower on others. We haven't had time for a thorough comparison, and there are some problems where we'll be slow. It may be helpful to know that the marginal distributions in nimbleEcology (occupancy, N-mixture, HMM, etc.) were updated to be compatible with nimble's automatic differentiation (AD), which allows them to be used in HMC.
It is understandable that the Stan team is all in on HMC. It is a very good method, and some of their biggest arguments are about how its performance scales up to large data sets better than other methods. However, it is not necessarily universally the best method on all real problems, and not all real problems are so large that other methods won't be relevant. For example, I have seen Pólya-gamma samplers outperform HMC for occupancy models in spOccupancy or nimble (which does now have Pólya-gamma). The way I look at HMC is that in the computational cost tradeoff between iterations and mixing, it spends a lot of computation on each iteration in order to achieve good mixing per iteration. Other samplers can look worse *per iteration* in a trace plot but generate iterations much more rapidly, so we always like to look at effective sample size *per computation time* to really compare methods. Some of the comparisons we've published can be found on the Documentation tab at
r-nimble.org. Of particular interest may be Ponisio et al. (2020), where we illustrated that marginalization is not always worth the computational cost; it is another kind of tradeoff between computation and mixing. N-mixture models have very costly marginalizations, so it can sometimes be more efficient (per unit time, not per iteration) to sample them without marginalizing. (That paper was before we had nimbleHMC.)
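To make the "ESS per computation time" point concrete, here is a minimal sketch (in Python with NumPy, purely for illustration, not nimble code) that estimates effective sample size with a simple variant of Geyer's initial positive sequence estimator and contrasts a well-mixed chain with a sticky one:

```python
import numpy as np

def ess(chain):
    """Estimate effective sample size using a simple variant of
    Geyer's initial positive sequence estimator: accumulate
    autocorrelations until an adjacent pair sums to <= 0."""
    x = np.asarray(chain, dtype=float)
    n = len(x)
    x = x - x.mean()
    # Autocovariance at all lags, normalized to autocorrelation.
    acov = np.array([np.dot(x[:n - k], x[k:]) / n for k in range(n)])
    rho = acov / acov[0]
    tau = 1.0  # integrated autocorrelation time; 1.0 for iid draws
    for k in range(1, n - 1, 2):
        pair = rho[k] + rho[k + 1]
        if pair <= 0:
            break
        tau += 2.0 * pair
    return n / tau

rng = np.random.default_rng(1)
iid = rng.normal(size=4000)   # stand-in for a well-mixed chain
ar = np.empty(4000)           # stand-in for a sticky chain: AR(1), lag-1 corr 0.9
ar[0] = 0.0
for t in range(1, 4000):
    ar[t] = 0.9 * ar[t - 1] + rng.normal()

# To compare samplers fairly, divide each sampler's ESS by its
# wall-clock time: ESS per second, not ESS per iteration.
print(ess(iid), ess(ar))  # the iid chain yields far higher ESS
```

The same number of iterations from the two chains carries very different amounts of information; dividing each ESS by the sampler's wall-clock time is what lets a cheap-but-sticky sampler compete with an expensive-but-well-mixing one.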
Regarding the point you relayed from the Stan team about not having a "principled" way to use HMC together with other samplers (though I'm not sure whether the comment was specifically about samplers for discrete latent states), that is curious. I would like to point out that Radford Neal, the inventor of HMC, wrote in his chapter in the
Handbook of MCMC about how he likes to use HMC on some parts of a model and other samplers on other parts. On the other hand, we've explored some of that and haven't found that it necessarily or automatically pays off, but I'm just saying it is feasible and valid. Maybe the point from the Stan team was about discrete samplers specifically, but I don't see an additional issue there (beyond the fact that HMC can't handle discrete latent states at all). Or maybe the idea was that HMC's tuning -- what happens during "warmup" -- may not work as well when other parts of a model are being sampled in other ways (which would just mean it is less efficient, not wrong). In any case there is a lot of art and pragmatism to MCMC, so my two cents would be to take comments like the one you relayed with a grain of salt.
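For anyone curious what mixing samplers looks like in practice, here is a rough sketch of the nimble workflow on a toy model. Treat it as a sketch from memory, not a tested script; in particular the nimbleHMC call (`addHMC`) and argument names should be checked against the package documentation:

```r
library(nimble)
library(nimbleHMC)

code <- nimbleCode({
  beta ~ dnorm(0, sd = 10)
  sigma ~ dunif(0, 5)
  for (i in 1:N) {
    y[i] ~ dnorm(beta * x[i], sd = sigma)
  }
})

## buildDerivs = TRUE enables nimble's AD, which HMC needs
model <- nimbleModel(code,
                     constants = list(N = 10, x = rnorm(10)),
                     data = list(y = rnorm(10)),
                     inits = list(beta = 0, sigma = 1),
                     buildDerivs = TRUE)

conf <- configureMCMC(model)   # default samplers for everything
conf$removeSamplers("beta")
addHMC(conf, target = "beta")  # HMC on beta only; sigma keeps
                               # its default sampler
mcmc <- buildMCMC(conf)
## then compileNimble() and runMCMC() as usual
```

The point is simply that the sampler assignment is per-node, so HMC and other samplers coexist in one MCMC configuration.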
For folks reading this who aren't familiar, I'd like to highlight two big differences between nimble and Stan: (i) nimble represents a model as a directed acyclic graph, which gives algorithms much more control and hence supports many different kinds of algorithms, while Stan represents a model as one big calculation; and (ii) nimble is an extensible platform on which users can write new algorithms from R, which makes it well suited to methodological experimentation. We have some exciting developments under way, so I hope you'll stay tuned. We're always keen for feedback and to provide support (we try to be quick, though sometimes it's hard), mostly on the nimble-users mailing list.
Best wishes,
Perry