Question
I think I need to use occupancy modelling for an upcoming project but am very new to it, so apologies for a rather lengthy question that shows quite a poor grip of the basics! I'm hoping for a steer on the right way to approach my question. On the face of it, this is quite simple: I am interested in understanding what environmental conditions drive the occurrence of a butterfly species. To do this, I have a very large number of butterfly records gathered across the species’ range over a decade. These can be structured into visits with detection/non-detection, based on the species list for each combination of site and date.
Problem
However, it is a species with quite a dynamic population at large spatial scales. It has a core range where it is more or less resident and should be detected with a certain amount of effort in most or all years. There is then a peripheral range into which there is immigration from the core range; in some years there is quite a lot, resulting in many observations there. For this reason, I don't think taking a single year and fitting a single-season occupancy model is going to be appropriate. If it’s a year with lots of immigration, I will get high predicted probabilities of occupancy in the periphery of the range, where populations are unlikely to persist beyond the wave of immigration: the individuals sighted there are just immigrants that do not usually succeed in establishing themselves.
Because of this, it seems I need an approach that takes a longer view, over several seasons with and without immigration, and identifies the core range as areas where the species occurs consistently across seasons, the periphery as areas where it pops up occasionally, and completely unoccupied areas as well. I’m after advice on the best tool for doing this.
Possible Solutions?
I wondered whether a dynamic occupancy model in unmarked might be the right approach. My current understanding is that I could use this to understand how environmental conditions affect the probability of occurrence at sites in the first year of my study period, and then the probabilities of colonisation and extinction after that. I guess the effects of environmental conditions on these parameters will capture what I’m after in separate pieces, particularly the effects on extinction probability. Core range sites will be ones that, once occupied, have a low probability of extinction. Peripheral range sites will be ones with a high probability of extinction, whether they are occupied in the initial time step (which should be unlikely) or get colonised during a period of immigration. Basically, conditions associated with high initial occupancy and colonisation and, in particular, low extinction indicate the core range.
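To make the "low extinction indicates core range" idea concrete, here is a minimal sketch of the long-run (equilibrium) occupancy implied by a colonisation probability and an extinction probability. It is written in Python purely to show the arithmetic and is not unmarked code. It assumes both probabilities are constant over time, which will not hold in years with waves of immigration, and the function name and parameter values are made up for illustration.

```python
def equilibrium_occupancy(gamma, eps):
    """Long-run occupancy probability under constant dynamics.

    gamma: colonisation probability (unoccupied -> occupied)
    eps:   extinction probability (occupied -> unoccupied)
    Assumes gamma and eps do not change between years.
    """
    if gamma + eps == 0:
        raise ValueError("gamma and eps cannot both be zero")
    return gamma / (gamma + eps)


# Hypothetical values, for illustration only
core_site = equilibrium_occupancy(gamma=0.3, eps=0.05)
peripheral_site = equilibrium_occupancy(gamma=0.1, eps=0.7)
```

With these invented values the core-like site settles at a much higher occupancy than the peripheral-like one, which is the kind of combined summary I have in mind.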
But I wasn’t sure this was quite what I wanted. Although the process underlying the core/periphery pattern is one of extinction and colonisation, I am not sure those are really what I’m interested in, more their combined effects. Is there a way to summarise these separate effects in a dynamic occupancy model, to understand which sites have a higher probability of occupancy across (or in each year of) the study period? From Ken Kellner’s handy tutorial here, it seems that I could get close to this by applying projected() to my model with mean = FALSE, to get a matrix of the expected occupancy probability at each site in each year. But is there a way to get the mean of this over the study period, with associated confidence intervals? That seems a nice way to show the core/peripheral range during the study period. And is there then a way to summarise the effects of environmental conditions on these mean probabilities, via their effects on the probabilities of initial occupancy, colonisation and extinction? Or is that not possible, or harder to interpret than the effects of environmental conditions on the parameters the model already estimates, so that there are good a priori arguments against doing it?
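As I understand it, the projected occupancy in each year follows from the initial occupancy, colonisation and extinction probabilities by a simple recursion, and the summary I am asking about is just the average of that trajectory over years. Below is a minimal sketch of that arithmetic in Python (not unmarked code). It assumes constant colonisation and extinction for a single site, the function names and parameter values are hypothetical, and it says nothing about how to get confidence intervals, which is the part I am unsure of.

```python
def project_occupancy(psi1, gamma, eps, n_years):
    """Expected occupancy probability in each year for one site.

    psi1:  initial occupancy probability (year 1)
    gamma: colonisation probability (held constant here)
    eps:   extinction probability (held constant here)
    """
    psi = [psi1]
    for _ in range(n_years - 1):
        prev = psi[-1]
        # Occupied sites persist with prob (1 - eps);
        # unoccupied sites are colonised with prob gamma.
        psi.append(prev * (1.0 - eps) + (1.0 - prev) * gamma)
    return psi


def mean_occupancy(psi_by_year):
    """Average occupancy probability over the study period."""
    return sum(psi_by_year) / len(psi_by_year)


# Hypothetical parameter values, for illustration only
core = project_occupancy(psi1=0.9, gamma=0.3, eps=0.05, n_years=10)
periphery = project_occupancy(psi1=0.05, gamma=0.1, eps=0.7, n_years=10)
core_mean = mean_occupancy(core)
periphery_mean = mean_occupancy(periphery)
```

In a fitted model the three probabilities would each depend on site covariates, so the effect of a covariate on the mean would pass through all three at once, which is why I am unsure how interpretable it would be.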
The other approach that seems to get suggested here quite often, for situations where people have data from multiple seasons and are either not interested in estimating colonisation/extinction or don’t have enough data to do so, is a stacked model with a random effect for site. I see people suggesting that these can be implemented in unmarked with a bit of clever data structuring (and can site-level random effects now be used, as mentioned here, see also here, here?), and it also looks like they can be implemented in a Bayesian framework in ubms.
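My reading of the "clever data structuring" is that each site-year combination becomes its own row of the detection history, with the site identifier kept as a column so it can serve as the grouping variable for the random effect. Here is a rough sketch of that reshaping in Python, just to check I have the layout right. The record format, function name and example values are all made up, and missing visits are padded with None where R would use NA.

```python
from collections import defaultdict


def stack_site_years(records, max_visits):
    """Reshape visit records into one detection history per site-year.

    records:    iterable of (site, year, detected) tuples, in visit order
    max_visits: number of visit columns in the stacked table
    """
    histories = defaultdict(list)
    for site, year, detected in records:
        histories[(site, year)].append(int(detected))

    stacked = []
    for (site, year), y in sorted(histories.items()):
        y = y[:max_visits]  # drop visits beyond the chosen maximum
        padded = y + [None] * (max_visits - len(y))  # None = no visit
        stacked.append({"site": site, "year": year, "y": padded})
    return stacked


# Hypothetical records, for illustration only
records = [
    ("A", 2015, 0), ("A", 2015, 1),
    ("A", 2016, 1),
    ("B", 2015, 0), ("B", 2015, 0), ("B", 2015, 0),
]
stacked = stack_site_years(records, max_visits=3)
```

Site "A" then contributes two rows (2015 and 2016), which is why the random effect for site seems to be needed to account for the non-independence.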
Summary
What approach would people advise? Should I just use a dynamic occupancy model in the normal way and focus on the effects of environmental conditions on the extinction parameter? Could I use a dynamic occupancy model to get some kind of long-term occupancy estimate and the effects of environmental conditions on it? Or would a stacked model be best: are the occupancy probabilities for sites (and the effects of environmental conditions on them) generated by these more akin to what I am after? In the answer here, Dan Linden says that with ‘stacking (and depending on pooling), you are now estimating the probability of occurrence across space and time as opposed to just space’, which does sound pretty much like what I’m after.