Estimating 'long-term' occupancy over multiple seasons, and effects of covariates on this

Will Langdon

Apr 10, 2024, 10:54:58 AM

to unmarked

Question

I think I need to use occupancy modelling for an upcoming project, but I am very new to it, so apologies for a rather lengthy question that shows quite a poor grip of the basics! I'm hoping for a steer on the right way to approach my question, which is, on the face of it, quite simple: I want to understand what environmental conditions drive the occurrence of a butterfly species. To do this, I have a very large number of butterfly records gathered across the species' range over a decade. These can be structured into visits with detection/non-detection, based on the species list for each combination of site and date.

Problem

However, it is a species with quite a dynamic population at large spatial scales. It has a core range where it is more or less resident, and where it should be detected with a certain amount of effort in most or all years. There is then a peripheral range that receives immigration from the core range, in some years quite a lot, resulting in many observations there. For this reason, I don't think a single-season occupancy model fitted to a single year is appropriate: in a year with lots of immigration, I would get high predicted occupancy probabilities in the periphery, where populations are unlikely to persist beyond the wave of immigration. The individuals sighted there are just immigrants that do not usually succeed in establishing themselves.

Because of this, it seems like I need an approach that takes a longer view, over several seasons with and without immigration, and identifies the core range as areas where it occurs consistently across seasons, the periphery as ones where it pops up occasionally, and then completely unoccupied areas as well. I’m after advice on the best tool for doing this.

Possible Solutions?

I wondered whether a dynamic occupancy model in unmarked might be the right approach. My current understanding is that I could use this to understand how environmental conditions affect the probability of occurrence at sites in the first year of my study period, and then colonisation or extinction after that. I guess the effects of environmental conditions on these parameters will capture what I'm after in separate pieces, particularly the effects on extinction probability. Core range sites will be ones that, once occupied, have a low probability of extinction. Peripheral range sites will be ones with a high probability of extinction, whether they are occupied in the initial time step (which should be unlikely) or get colonised during a period of immigration. Basically, conditions that are associated with high initial occupancy and colonisation, and in particular low extinction, indicate the core range.

But I wasn't sure this was quite what I wanted. Although the process underlying the core/periphery distinction is one of extinction and colonisation, I am not sure that these are really what I'm interested in; it is more their combined effects. Is there a way to summarise these separate effects in a dynamic occupancy model, to understand which sites have a higher probability of occupancy across (in each year of) the study period? From Ken Kellner's handy tutorial here, it seems that I could get close to this by applying projected() to my model with mean = FALSE, to get a matrix of the expected occupancy probability at each site in each year. But is there a way to get the mean of this, with associated confidence intervals, for the study period? That seems a nice way to show the core/peripheral range during the study period. And is there then a way to summarise the effects of environmental conditions on these mean probabilities, via their effects on the probabilities of initial occupancy, colonisation and extinction? Or is that not possible, or actually harder to interpret than the effects of environmental conditions on the parameters the model already estimates, with good a priori arguments against doing it?
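To make the "mean of projected occupancy" idea concrete, here is a minimal sketch of the arithmetic in plain Python (not unmarked code). It assumes the usual dynamic-occupancy recursion, in which next year's expected occupancy is last year's occupancy times persistence plus the unoccupied fraction times colonisation, and it assumes colonisation and extinction are constant across years. The function names and parameter values are made up for illustration, and the sketch says nothing about confidence intervals, which would need something like a bootstrap over the fitted model.

```python
def project_occupancy(psi1, gamma, eps, n_years):
    """Expected occupancy probability at one site for years 1..n_years.

    psi1  : initial occupancy probability (hypothetical value)
    gamma : colonisation probability, assumed constant over years
    eps   : extinction probability, assumed constant over years
    """
    psi = [psi1]
    for _ in range(n_years - 1):
        prev = psi[-1]
        # occupied sites persist with prob (1 - eps); empty ones are colonised with prob gamma
        psi.append(prev * (1 - eps) + (1 - prev) * gamma)
    return psi


def mean_occupancy(psi_by_year):
    """Average of the yearly occupancy probabilities over the study period."""
    return sum(psi_by_year) / len(psi_by_year)


# Two illustrative sites: a "core" site (low extinction) and a "peripheral" one (high extinction)
core = project_occupancy(psi1=0.9, gamma=0.3, eps=0.05, n_years=10)
periphery = project_occupancy(psi1=0.1, gamma=0.05, eps=0.6, n_years=10)

print(round(mean_occupancy(core), 3), round(mean_occupancy(periphery), 3))
```

Under these made-up values the core site averages a much higher occupancy than the peripheral one, which is the kind of contrast a map of mean projected occupancy would show.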

The other approach that seems to get suggested here quite often, for situations where people have data from multiple seasons and are either not interested in estimating colonisation/extinction or don't have enough data to do so, is a stacked model with a random effect for site. I see people suggesting that these can be implemented in unmarked with a bit of clever data structuring (and can site-level random effects now be used, as mentioned here, see also here, here?), and it also looks like they can be implemented in a Bayesian framework in ubms.

Summary

What approach would people advise? Should I just use a dynamic occupancy model in the normal way, and focus on the effects of environmental conditions on the extinction parameter? Could I use a dynamic occupancy model to get some kind of long-term occupancy estimate, and the effects of environmental conditions on it? Or would a stacked model be best? Are the occupancy probabilities for sites (and the effects of environmental conditions on them) generated by these more akin to what I am after? In the answer here, Dan Linden says that with 'stacking (and depending on pooling), you are now estimating the probability of occurrence across space and time as opposed to just space', which does sound pretty much like what I'm after.

John Clare

Apr 10, 2024, 1:06:07 PM

to unma...@googlegroups.com

Hi Will,

I don't think I followed everything you wrote, but here is one view:

--the dynamic and stacked models can be viewed as nested. One type of stacked model might look like psi_{i, year}~Environment_{i, year}. A dynamic model with colonization and persistence processes just extends this...maybe something like psi_{i, year}~Environment_{i, year}*z_{i, year-1}, where there's an interaction between the previous state and the environment. Put another way, in a stacked model where psi_{i, year}~Environment_{i, year}, the probability of extinction between (e.g.) year 1 and year 2 is just 1-psi_{i, 2}, and the probabilities of colonization and persistence are each psi_{i, 2}.

--Lots of papers take the view of "I'm not interested in the interaction and just want to make inference about the additive effect of the environment on z/psi". I think this is one of those inference/prediction trade-offs where it's helpful to choose an objective first. If just interested in estimating those particular effects, then it's ok to estimate them. On the other hand, estimates of z/psi probably vary across models to some degree, and presumably, the model that "predicts" best (via IC or cross validation) provides the best estimates of psi_{i, year} or the decoded z_{i, year}. So if interested in predictions/maps, then some sort of model selection/averaging process might be better.

--It's straightforward to compare or model average different types of stacked/"temporally autologistic"/dynamic models. For IC, the kicker is that one has to formulate them so there's a consistent/comparable likelihood (e.g., write the stacked model as a type of hidden Markov model). Easy to code directly in NIMBLE or Stan or something. I'm not sure, but I think in unmarked, one would have to hack the colext function. I don't think it would require substantial reworking? Another alternative might be E-SURGE (Olivier Gimenez et al. have a paper about this), or maybe MARK or something.
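The nesting in the first point can be checked numerically. Below is a minimal sketch in plain Python (not unmarked code; the function name and the value of psi_2 are made up for illustration). It assumes the standard dynamic update, occupancy next year = previous occupancy times persistence plus (1 - previous occupancy) times colonization, and plugs in the stacked-model special case where colonization = persistence = psi_2.

```python
def next_occupancy(psi_prev, gamma, eps):
    """One step of the dynamic occupancy recursion.

    gamma : colonization probability
    eps   : extinction probability (persistence is 1 - eps)
    """
    return psi_prev * (1 - eps) + (1 - psi_prev) * gamma


psi_2 = 0.4  # hypothetical year-2 occupancy implied by the environment

# Stacked special case: colonization = persistence = psi_2, so extinction = 1 - psi_2.
results = [
    next_occupancy(psi_1, gamma=psi_2, eps=1 - psi_2)
    for psi_1 in (0.0, 0.25, 1.0)
]

# Year-2 occupancy equals psi_2 (up to floating-point rounding) for every psi_1,
# i.e. the previous state carries no information.
print([round(r, 12) for r in results])
```

Whatever the year-1 state, the year-2 occupancy comes out as psi_2, which is what "no interaction with the previous state" means; letting colonization and persistence differ is what turns this into the dynamic model.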

HTH, and maybe others have better/clearer thoughts,

John

Will Langdon

Apr 11, 2024, 5:36:32 AM

to unmarked

Thanks a lot for taking the time to reply in such detail, John. Is there anything in particular that I didn't make clear, or can add more detail to?

I think my situation is the one you describe in the second bullet point: I'm not too focused on understanding interactions between years, more the effects of the environment on occupancy across years. Is this a case for the stacked model then? But are you cautioning that this might not provide the most accurate predictions if I want to use the model to make predictions about future range dynamics or similar?

Cheers,

Will

John Clare

Apr 11, 2024, 1:02:39 PM

to unmarked

Hi Will,

I don't know what particular model will fit/predict best here. Mostly, I'm just saying this doesn't have to be a binary choice between a stacked model and some sort of dynamic model, any more than one has to choose between using 3 predictors vs. 4, or one count distribution over another. You can totally make the decision to choose one a priori, but these can also be compared in different ways if there's no strong inclination one way or another.

Similarly, say you set up a "stacked" style unmarkedFrameOccu where there's one predictor like elevation and a column for year. A model like occu~1+elevation kind of estimates an average spatial effect of elevation over time very painlessly. But you could also estimate or derive something similar (maybe with a little more work) if you used a different model, be it occu~1+elevation+elevation|year, something with gamma/epsilon terms, etc.
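As a rough picture of what the "stacked" layout means, here is a minimal sketch in plain Python (not unmarked code; the function name, data values and dictionary keys are all hypothetical). Each site-by-year combination becomes its own row, carrying the site's elevation, a year column, and that year's detection history.

```python
def stack_site_years(detections, elevation):
    """Turn detections[site][year] = list of 0/1 visits into one row per site-year."""
    rows = []
    for site, site_years in enumerate(detections):
        for year, visits in enumerate(site_years):
            rows.append({
                "site": site,                   # kept so a site-level random effect is possible
                "year": year,                   # the "column for year"
                "elevation": elevation[site],   # site covariate repeated for each year
                "y": list(visits),              # detection history within that year
            })
    return rows


# Two sites, two years, three visits per year (made-up data)
detections = [
    [[1, 0, 1], [0, 0, 1]],
    [[0, 0, 0], [0, 1, 0]],
]
elevation = [120.0, 480.0]

stacked = stack_site_years(detections, elevation)
for row in stacked:
    print(row)
```

Rows from the same site are not independent of each other, which is why the site identifier is carried along: it is what a site-level random effect would be keyed on.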

Cheers,

John
