Detection formula suggestion

arianna vicari

May 30, 2025, 5:14:05 PM

to spOccupancy and spAbundance users

Hello everybody!

I am very much stuck on my analysis, and would greatly appreciate any input from you.

I am working on camera trap data from Senegal, comparing the distribution of warthogs between two areas, one in a national park and one inhabited by people. I am therefore performing my analyses separately.

So far I have been using the non-spatial models and trying to find suitable formulas for them.

For the inhabited area, I managed to find a decently satisfying model, with a WAIC relatively significant from the ecological point of view, and Bayesian p-value within range.

For the wilder area, I am struggling a lot. I cannot manage to converge a model with significant Bayesian p-value, let alone a significant occupancy formula.

I am therefore asking all of you for suggestions on how to approach this issue, in particular with the detection formula. So far I have tried the following covariates: (1 | days), (1 | site_id), scale(days), site_id. All of them, in every possible combination, give me a Bayesian p-value < 0.1, both with a naive occupancy formula and with significant ecological covariates.

In my head it makes sense, given how widely distributed the species is, that no model is meaningful, but I have the feeling that I am missing something and would like to try a few more options.

Maybe the time of the day might influence the detection? Maybe something else... If you have any insight on this, I would be deeply grateful.

Have a nice day,

Arianna Vicari

Marc Kéry

Jun 3, 2025, 5:13:21 AM

to spOccupancy and spAbundance users

Dear Arianna, and hello Jeff,

A couple of comments here and a question for Jeff.

First of all, I am confused by much of what Arianna says:

· What do you mean by "a WAIC relatively significant from the ecological point of view" ? The WAIC is a measure of how well the model would predict a new data set that is similar to your data set. It is used for helping you to decide whether you should take one or the other model for learning about the process that generated your data. But it cannot be significant and it does not have anything to do with ecology.

· Then, by "Bayesian p-value within range" you probably mean that this measure of goodness of fit is not more extreme than 0.05 – 0.95 ?

· But then you say "a model with significant Bayesian p-value" … what do you mean by this ? The thing with a goodness of fit (GoF) test is just that one hopes that it ends up NOT significant. So, what you seem to suggest is a problem is actually what we all pray for when we conduct a GoF test.

· And what is a "significant occupancy formula" ? I presume you mean a model with terms in the occupancy submodel that have 95% Bayesian credible intervals that do not contain 0 and hence can be considered as being "significant" in an analogy to the non-Bayesian version of a significance test based on a 95% confidence interval ? --- But there is no guarantee that any of your covariates is significant. Perhaps you are unlucky and your covariates simply are not strongly related to spatial variation in occupancy ? So, I don't see what the "issue" is you are talking about.

· Then, you say that you have problems with GoF for the wilder study area, with p-values < 0.1. So how bad are the p-values ? For better or (rather for) worse, people often take 0.05 as a threshold for a significant GoF test and so you might get away with yours.

· A GoF test such as the one you conducted yields a single-number answer to the question of "Does my model fit the data?". A much better approach, and one which may help you recognize how you can improve your model, is to ask "Where does my model not fit?" For this, you can look at the residuals of your model (see here). You can plot them in space and against any covariate you can come up with. This may perhaps show that most of the lack of fit identified by the GoF test is due to certain patterns in the data that your current model does not explain in a satisfactory way. Then, you can perhaps explain them in a new model with a suitable additional covariate.
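To make the Bayesian p-value in the points above concrete, here is a minimal base-R sketch. It assumes you already have, for each posterior draw, one discrepancy value for the observed data and one for data replicated from the model. The vectors below are simulated stand-ins with hypothetical names (to my understanding, the `fit.y` and `fit.y.rep` components returned by `ppcOcc()` in spOccupancy play these two roles, but please check the package documentation).

```r
# Bayesian p-value: proportion of posterior draws in which the discrepancy
# of the replicated data exceeds the discrepancy of the observed data.
bayes_p_value <- function(fit_obs, fit_rep) {
  stopifnot(length(fit_obs) == length(fit_rep))
  mean(fit_rep > fit_obs)
}

set.seed(1)
n_draws <- 3000

# Simulated stand-ins (NOT real model output), one value per posterior draw.
fit_rep     <- rchisq(n_draws, df = 40)       # discrepancy, replicated data
fit_obs_ok  <- rchisq(n_draws, df = 40)       # observed data, adequate model
fit_obs_bad <- rchisq(n_draws, df = 40) + 40  # observed data, poorly fitting model

p_ok  <- bayes_p_value(fit_obs_ok, fit_rep)   # expected to be near 0.5
p_bad <- bayes_p_value(fit_obs_bad, fit_rep)  # expected to be near 0

# Flag values more extreme than the thresholds discussed in this thread.
is_extreme <- function(p, lower = 0.05, upper = 0.95) p < lower || p > upper
```

A value near 0.5 means the observed data look like data the model could have produced. A value near 0 or 1 means they do not, which is the situation to worry about.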

Finally, and this contains (at the bottom) also a question to Jeff: Normally, I'd fit a single model to both of your data sets and accommodate the differences between the wild and non-wild areas by adding a factor 'Area' in the analysis that codes for this contrast. If you fit a model with a main effect of 'Area' and with its interactions with all covariates, then you get exactly the same analysis as when you fit the same models to both data sets separately. But the advantage is that you can directly compare the parameters to see whether they differ between the two areas. And if they don't, then you can drop the associated term in the model and in this way share the information across the areas and get more efficient estimates.
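The equivalence between one model with 'Area' interactions and two separate fits can be checked on a toy example. The sketch below uses an ordinary logistic regression from base R as a stand-in (it is not an occupancy model, and the data, the covariate `x` and the area labels are all simulated assumptions), but the same logic applies to the occupancy and detection formulas.

```r
set.seed(42)
n <- 300

# Simulated data: a binary response in two areas with different
# intercepts and slopes for a single covariate x.
dat <- data.frame(
  area = factor(rep(c("park", "inhabited"), each = n),
                levels = c("park", "inhabited")),
  x = rnorm(2 * n)
)
lin_pred <- ifelse(dat$area == "park", 0.5 + 1.0 * dat$x, -0.3 + 0.2 * dat$x)
dat$y <- rbinom(2 * n, size = 1, prob = plogis(lin_pred))

ctrl <- glm.control(epsilon = 1e-12)

# One joint model with a main effect of area and its interaction with x ...
joint <- glm(y ~ area * x, family = binomial, data = dat, control = ctrl)

# ... versus the same model fitted to each area separately.
park      <- glm(y ~ x, family = binomial, control = ctrl,
                 data = subset(dat, area == "park"))
inhabited <- glm(y ~ x, family = binomial, control = ctrl,
                 data = subset(dat, area == "inhabited"))

# Recover the area-specific intercept and slope from the joint model.
b <- coef(joint)
joint_park      <- c(b[["(Intercept)"]], b[["x"]])
joint_inhabited <- c(b[["(Intercept)"]] + b[["areainhabited"]],
                     b[["x"]] + b[["areainhabited:x"]])

rbind(joint_park, separate_park = unname(coef(park)))
rbind(joint_inhabited, separate_inhabited = unname(coef(inhabited)))
```

In the joint model, the `areainhabited` and `areainhabited:x` terms are the differences between the areas, so their uncertainty intervals tell you directly whether the two areas differ.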

This is so for the occupancy and the detection formula, but not for the spatial model, which would then still be assumed to be identical for both areas. This may not make sense (e.g., occupancy may be correlated over a wider distance in one than in the other area). So, might it not be useful to add the option to spOccupancy to do some limited modeling of the spatial part of a model as well ?

Good luck and thanks --- Marc

--
You received this message because you are subscribed to the Google Groups "spOccupancy and spAbundance users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to spocc-spabund-u...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/spocc-spabund-users/f0e8e414-ce38-40a6-a851-4813fdcedcacn%40googlegroups.com.

--

______________________________________________________________

Marc Kéry
Tel. ++41 41 462 97 93
marc...@vogelwarte.ch
www.vogelwarte.ch

Swiss Ornithological Institute | Seerose 1 | CH-6204 Sempach | Switzerland
______________________________________________________________

*** Hierarchical modeling in ecology ***

https://www.mbr-pwrc.usgs.gov/pubanalysis/roylebook/

arianna vicari

Jun 3, 2025, 6:11:05 AM

to spOccupancy and spAbundance users

Dear Marc and Jeff,

First of all, I want to sincerely thank you for your detailed reply. I also want to apologize for my confusing previous email, mainly because I was confused myself.

I have been working with the dataset for almost two months, and only yesterday I found out I had an error in my matrix. With that corrected, my Bayesian p-value is in the range of 0-0.008.

As this is my first time approaching modeling (I am a Master's student), my way of conducting it was simply to try all the possible combinations of covariates until I had a satisfying result from the statistical and ecological points of view. I wonder if there is a more efficient way to compare models using the dredge() function from the MuMIn package?

I really appreciate your input regarding merging the two datasets — this is actually a direction I’ve been considering as well. In particular, I’m interested in extending my model to include both study areas and the space in between them (see the attached map).

One point I’m unsure about is how to classify the ‘Area’ variable in this case. If we’re including the transition zone between the two areas, how should we define or code the ‘Area’ factor? Should it remain binary, or would a three-level classification (e.g., wild / transitional / non-wild) make more sense?

I’m also looking forward to Jeff’s thoughts, especially on your point about potential differences in spatial autocorrelation between the areas. Perhaps the distance between the areas plays a role? In my case, as you can see from the map, they are relatively close—around 100 km apart—but this is definitely something I’m very interested in exploring further.

Thanks again for the insights!

Arianna Vicari

Screenshot 2025-06-03 115155.png

Marc Kéry

Jun 5, 2025, 4:41:28 AM

to spOccupancy and spAbundance users

Dear Arianna,

re. Goodness of fit: so, yes, the Bayesian p-value suggests the model does not fit your data very well. As mentioned earlier, investigating patterns in the residuals may give you a clue about ways in which you could improve your model and hopefully get it to pass a further test.

Then, re. variable selection: the two most typical goals of a model are its use for inference (e.g., understanding which covariates "affect" the response) and prediction (e.g., producing a species distribution map). For the former goal, I would advise against any type of dredging. Rather, just fit a single, large model and then describe that. Typically, you will have to go through a small number of cycles of goodness of fit or another model assessment, followed by targeted modifications of the model. This view of scientific inference from a statistical model is proposed for instance by https://wildlife.onlinelibrary.wiley.com/doi/10.1002/jwmg.891 and https://arxiv.org/abs/2011.01808 and also in Chapter 18 on model building, checking and selection in Kery and Kellner (2024).

Re. merging the two data sets in a single analysis: strictly, from a statistical perspective, with just two study areas you cannot say anything about effects of wild versus non-wild, since you don't have any replication: n = 1 for each type of habitat. However, you could get some form of replication if you can obtain a measure of wild versus non-wild that varies not between the two study sites, but within each study site. For instance, you could quantify some measure of non-wild-ness within some suitable buffer distance around each trap location and then you'd be all fine.
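One way to build such a within-area measure is sketched below in base R. Everything here is a hypothetical illustration: the coordinates are simulated, "number of settlements within 5 km" is just one possible measure of non-wild-ness, and the sketch assumes projected coordinates in km so that Euclidean distances are meaningful.

```r
# Count how many points (e.g. settlements) fall within `radius` of each site.
# `sites` and `points` are two-column matrices of projected x/y coordinates.
count_within <- function(sites, points, radius) {
  dx <- outer(sites[, 1], points[, 1], "-")  # sites in rows, points in columns
  dy <- outer(sites[, 2], points[, 2], "-")
  rowSums(sqrt(dx^2 + dy^2) <= radius)
}

set.seed(7)
# Simulated camera-trap and settlement locations (km), for illustration only.
cams        <- cbind(x = runif(20, 0, 30), y = runif(20, 0, 30))
settlements <- cbind(x = runif(50, 0, 30), y = runif(50, 0, 30))

# Site-level covariate: number of settlements within a 5 km buffer.
human_5km <- count_within(cams, settlements, radius = 5)

# This could then enter the occupancy covariates, e.g. as scale(human_5km).
occ_covs <- data.frame(human_5km = human_5km)
```

The buffer distance is a modelling choice. Trying two or three radii that are plausible for the species is more defensible than picking one arbitrarily.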

Best regards --- Marc

Jeffrey Doser

Jun 5, 2025, 5:37:12 AM

to spOccupancy and spAbundance users

Hi Arianna and Marc,

Thanks for the discussion. I agree with what Marc has said regarding assessment of residuals and taking a closer look at the output of the Bayesian posterior predictive check to get a more detailed assessment of why the model is not fitting well.

In regards to fitting a spatial model with these data, fitting such a model would likely present some difficulties due to the large "gap" between the data points in the two regions. As Marc mentioned, fitting a spatial model would include a spatial random effect for each site, which is determined by a spatial decay and a spatial variance parameter that are assumed to be the same across the entirety of the data set. When you have a situation like you have here with two distinct groups of points, fitting such a model can lead to difficulties as the model might "flip-flop" between trying to explain spatial autocorrelation on the large scale (between the two areas) versus the small scale (among points within an area). If you want to try and fit a spatial model, you would likely need to put a pretty restrictive prior on the spatial decay parameter, likely restricting it to only be able to explain autocorrelation at a finer scale (e.g., within an area) as opposed to across areas. You can see this article for some suggestions on how to go about doing that (among other things).
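As a rough illustration of what a restrictive prior on the spatial decay parameter could look like, the base-R sketch below derives uniform prior bounds from within-area distances only. It assumes an exponential correlation function, for which the correlation drops to about 0.05 at a distance of 3 / phi, and it uses simulated coordinates in km. The `phi.unif` element is, as far as I know, the name spOccupancy uses for this prior in the `priors` list, but the bounds themselves are only one possible choice.

```r
set.seed(3)

# Simulated coordinates (km): two clusters of 25 sites roughly 100 km apart.
area_a <- cbind(runif(25, 0, 20), runif(25, 0, 20))
area_b <- cbind(runif(25, 100, 120), runif(25, 0, 20))
coords <- rbind(area_a, area_b)
area   <- rep(c("a", "b"), each = 25)

# Pairwise distances, split into within-area and between-area pairs.
d         <- as.matrix(dist(coords))
same_area <- outer(area, area, "==")
within_d  <- d[same_area & upper.tri(d)]
between_d <- d[!same_area]

# Exponential correlation exp(-phi * distance) is about 0.05 at 3 / phi,
# so 3 / phi is the "effective range". Bounding it by the within-area
# distances keeps the spatial effect from spanning the gap between areas.
phi_lower <- 3 / max(within_d)  # largest effective range = largest within-area distance
phi_upper <- 3 / min(within_d)  # smallest effective range = smallest within-area distance

priors <- list(phi.unif = c(phi_lower, phi_upper))

# Implied effective ranges (km) compared with the gap between the areas.
c(max_range = 3 / phi_lower, min_range = 3 / phi_upper, gap = min(between_d))
```

An even tighter upper limit on the effective range (a larger `phi_lower`) may be needed in practice if the model still drifts towards the between-area scale.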

As Marc points out, there is perhaps the potential that the spatial autocorrelation has different characteristics between the two areas. In theory, one could estimate different spatial decay and variance parameters for the two different regions such that they could be allowed to have different patterns. Of course, one would need to have a suitable number of data points in each of the different regions. Or, there are also more advanced spatial covariance functions that could allow for more complex patterns like this when estimating the spatial random effects. Unfortunately, neither of those is possible in spOccupancy given its current structure, but I've added it to a list of potential things to incorporate at some point down the road.

Kind regards,

Jeff

--

Jeffrey W. Doser, Ph.D.

Assistant Professor

Department of Forestry and Environmental Resources

North Carolina State University

Statistical Ecology and Forest Science Lab

Pronouns: he/him/his

arianna vicari

Jun 12, 2025, 3:36:34 AM

to spOccupancy and spAbundance users

Dear Jeff, dear Marc,

Thank you so much for the useful and thorough answers you gave me.

I would therefore like to ask, based on your previous comments, whether I should consider applying the single-species spatially varying coefficient occupancy model.

I am thinking about including the wilderness factor as "distance from the protected area", so that even the transitional area, where I do not have any sampling points, will be included.

Otherwise, another idea is to fit the simpler single-species occupancy model over the whole region, and then to blank out the transitional area, where I would expect a higher SD. But I am not so sure about this idea, mainly due to what Marc pointed out (varying spatial correlation).

As a last resort, I would use the non-spatial model, which would remove the problem of the differences in spatial autocorrelation, but would lower the accuracy of my predictions.

I did try to run null models (det and occ formula = ~1) for each of them, just to explore their WAICs, and the lowest one was for the non-spatial model. I do not think it is meaningful, and I will need to run a few more models to check it; I only did so to explore the general idea.

Looking forward to your valuable answers, I wish you a good day.

Kind regards,

Arianna Vicari

Jeffrey Doser

Jun 17, 2025, 11:37:25 AM

to arianna vicari, spOccupancy and spAbundance users

Hi Arianna,

That's an interesting thought. I think you could use svcPGOcc() to get at the suggestion Marc had before to fit a different spatial random effect for the two regions. To do this, you would want to include two dummy variables in your occ.covs list, such that one of the variables takes value 1 for one study area and 0 for the other, and vice versa for the other variable. If you called these two variables "region1" and "region2", you could then set occ.formula to something like

occ.formula = ~ factor(region1) + factor(region2) - 1 + ...

where the "..." indicates other covariates you had in your model. Then if you set

svc.cols = c(1, 2)

then the result would in theory fit a different spatial random effect for the two regions. I haven't tried that so I'm not sure if it would work in practice, but it could be worth a try.
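Before fitting, it may be worth checking which design-matrix columns the formula actually produces, since svc.cols = c(1, 2) refers to columns by position. The base-R sketch below uses `model.matrix()` on a tiny made-up `occ_covs` data frame (`elev` is a hypothetical extra covariate). In base R, wrapping both 0/1 dummies in factor() without an intercept yields an extra, redundant column, whereas entering them as plain numeric variables yields exactly one column per region. Whether svcPGOcc() builds its design matrix in the same way is an assumption to verify.

```r
# Made-up site covariates: 3 sites in each region plus one other covariate.
occ_covs <- data.frame(
  region1 = rep(c(1, 0), each = 3),
  region2 = rep(c(0, 1), each = 3),
  elev    = c(0.2, -1.1, 0.5, 1.3, -0.4, 0.0)
)

# Dummies entered as numeric variables: one column per region, then elev.
X_num <- model.matrix(~ region1 + region2 - 1 + elev, data = occ_covs)
colnames(X_num)

# Dummies wrapped in factor(): the first factor gets a column for each of
# its levels and the second adds one more, which duplicates an existing column.
X_fac <- model.matrix(~ factor(region1) + factor(region2) - 1 + elev,
                      data = occ_covs)
colnames(X_fac)

# Compare the number of columns with the rank of each design matrix.
c(ncol_num = ncol(X_num), rank_num = qr(X_num)$rank,
  ncol_fac = ncol(X_fac), rank_fac = qr(X_fac)$rank)
```

In both versions the first two columns are region indicators, so svc.cols = c(1, 2) would still pick out one column per region. The numeric version simply avoids carrying a redundant fixed effect.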

Jeff
