Major confusion/discrepancy in modelled + predicted occupancy values

王英龍

Oct 28, 2024, 4:57:10 AM

to spOccupancy and spAbundance users

Hi all,

I've found a recurring issue that has left me stumped - I need your help!

I've been modelling occupancy and then running the predict() function on 600 randomly generated points, using values of the same occurrence covariates that are in my models.

I have done this for multiple seasons with single season models, and am now trying a multi-season model.

I have around 35 camera trap sites for the input data, with the occurrence covariate values extracted from my environmental rasters. I have then been using the predict() function because I want to generate a map of modelled occupancy for a given season across our camera trap grids. To avoid a patchy raster from using only the 25 camera trap locations, I randomly generated 600 points within the grid and extracted the occurrence covariate values at each of these points; these form the X.0 matrix for the predict() function. The prediction covariates fed into predict() are exactly the same for every season. The only difference is the occupancy model, which is fit to a somewhat different set of camera trap sites (not every site is present in every dataset, because some cameras failed, etc.) and to different detection histories.

My issue is that the season with the lowest modelled occupancy (e.g. 0.47 for Spring 2024, compared with 0.65 for Summer 2024) ends up with the highest predicted occupancy (e.g. 0.8) once I run its model through predict(), while the seasons modelled as having higher average occupancy get lower predicted values. This has now happened with two separate camera trap grids, and I am totally stumped.

Why might this be? Has anyone else had this issue?

I am happy to provide code if needed - just let me know what specifically you might want to see.

All the best,

Jamie

Jeffrey Doser

Oct 29, 2024, 10:14:46 AM

to 王英龍, spOccupancy and spAbundance users

Hi Jamie,

Thanks for the note. The model-estimated values of psi and z take into account the observed data at those locations (i.e., since the models assume no false positives, if a species is ever detected at a site then that site is known to be occupied). When using the predict() function, no such information is available, so depending on how well the model is able to predict occupancy probability, there may be some differences between values of psi/z at locations used to fit the model and psi/z values predicted at nearby locations. Of course, it also depends on the covariates, and more generally on how well the model fits and how reliable predictions in new parts of the study area may or may not be. It's not super clear to me from your description where the problem might lie, so if you could send me the code/data needed to fit the model and do the prediction (i.e., to generate the contrasting results you are finding), I can try and take a look to see what is going on.
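To illustrate the point about detections with a simplified case (this is a textbook calculation, not how spOccupancy computes things internally): assume a site with occupancy probability psi, a constant per-survey detection probability p, and K surveys. If the species is detected at least once, z = 1 with certainty. If it is never detected, Bayes' rule gives P(z = 1 | no detections) = psi (1 - p)^K / (psi (1 - p)^K + 1 - psi). At a new prediction location there are no surveys, so only psi is available. A minimal base R sketch, where `cond_occ()` is a hypothetical helper:

```r
# Probability that a site is occupied given its detection history,
# assuming a constant detection probability p over K surveys and no
# false positives. Hypothetical helper, not part of spOccupancy.
cond_occ <- function(psi, p, K, detected) {
  if (detected) return(1)      # any detection means the site is occupied
  miss <- psi * (1 - p)^K      # occupied, but missed on every survey
  miss / (miss + (1 - psi))    # Bayes' rule
}

psi <- 0.6
cond_occ(psi, p = 0.3, K = 5, detected = TRUE)   # 1
cond_occ(psi, p = 0.3, K = 5, detected = FALSE)  # approx. 0.201
psi                                              # new location: no data, so psi only
```

So, under these assumptions, a site-level map built from the fitted model can differ noticeably from predicted psi at nearby points, even when the covariates are identical.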

Cheers,

Jeff

Jeffrey W. Doser, Ph.D.

Assistant Professor

Department of Forestry and Environmental Resources

North Carolina State University

Statistical Ecology and Forest Science Lab

Pronouns: he/him/his

--
You received this message because you are subscribed to the Google Groups "spOccupancy and spAbundance users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to spocc-spabund-u...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/spocc-spabund-users/317ee04b-475e-4a92-82a2-9f920dc2b4a1n%40googlegroups.com.

Message has been deleted

王英龍

Nov 1, 2024, 9:44:31 AM

to spOccupancy and spAbundance users

Hi Jeff,

Thanks for the offer to help. I've just emailed you some example data and code that should show what my problem is.

In the meantime, I realised today that I can make a map of the modelled psi (from the original spOccupancy models rather than from predict()), which was what I really wanted all along. I thought the only way I could do that was via the prediction function. Since I was following the walkthroughs on the spOccupancy website and only saw maps in the prediction section, I did not realise I could make maps by putting the modelled psi values for each site in a data frame with the x and y coordinates.

I imagine others may not realise this either, so I would definitely recommend highlighting in the walkthrough examples on the spOccupancy website that maps of modelled occupancy at each sampling site can be made by building a data frame with the mean psi for each site (apply(model$psi.samples, 2, mean)) along with the sites' x and y coordinates.
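A minimal sketch of that idea in base R. Here `psi.samples` is simulated as a stand-in for `model$psi.samples` (posterior samples in rows, sites in columns, as for a single-season model) and `coords` stands in for the site coordinates used to fit the model; both are hypothetical. For multi-season models, `psi.samples` may have an extra time dimension, in which case something like `apply(psi.samples, c(2, 3), mean)` would be needed instead.

```r
# Simulated stand-ins for model$psi.samples and the site coordinates.
set.seed(1)
n.samples <- 1000
J <- 35
psi.samples <- matrix(runif(n.samples * J, 0.3, 0.9), nrow = n.samples, ncol = J)
coords <- cbind(x = runif(J, 0, 5000), y = runif(J, 0, 5000))

# One row per site: coordinates plus posterior mean and 95% interval of psi.
psi.df <- data.frame(
  x = coords[, "x"],
  y = coords[, "y"],
  psi.mean = apply(psi.samples, 2, mean),
  psi.low  = apply(psi.samples, 2, quantile, probs = 0.025),
  psi.high = apply(psi.samples, 2, quantile, probs = 0.975)
)
head(psi.df)

# Quick site-level map: darker points have higher posterior mean occupancy.
plot(psi.df$x, psi.df$y, pch = 19, col = gray(1 - psi.df$psi.mean),
     xlab = "x", ylab = "y")
```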

Otherwise, I imagine others will, like me, think that the only way to make a map showing occupancy is by using the predict() function and workflows on the website, when this may not be exactly what we want to show.

All the best and looking forward to hearing your thoughts on why I'm seeing such differences between modelled and predicted psi,

Jamie

Jeffrey Doser

Nov 16, 2024, 6:05:38 AM

to spOccupancy and spAbundance users

Just to circle back to this in case someone else encounters this problem: I believe the issue here was that the covariates were not standardized when doing prediction. If you standardize covariates when fitting the model, the covariates supplied to "X.0" in the predict() function need to be standardized by the same values (the means and standard deviations of the data used to fit the model). An example of how to do this is in the intro spOccupancy vignette.
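A small base R sketch of why this matters, using made-up elevation values and hypothetical logit-scale coefficients. `standardize_like()` is a hypothetical helper that centres and scales new values by the statistics of the fitting data. Standardizing the prediction points by their own mean and SD puts them on a different scale, and leaving them unstandardized pushes predicted occupancy towards 0 or 1.

```r
# Made-up covariate values at sampled sites (fit) and prediction points.
fit.elev  <- c(120, 340, 560, 410, 290, 760, 505, 230)
pred.elev <- c(100, 250, 400, 550, 700, 850)

# Hypothetical helper: centre and scale `new` using the mean/SD of `ref`.
standardize_like <- function(new, ref) {
  (new - mean(ref)) / sd(ref)
}

elev.std.right <- standardize_like(pred.elev, fit.elev)  # same scale as fitting data
elev.std.wrong <- as.vector(scale(pred.elev))            # prediction data's own mean/SD

# Hypothetical coefficients on the logit scale (intercept, slope per SD).
beta <- c(0.5, 1.2)
psi.right <- plogis(beta[1] + beta[2] * elev.std.right)
psi.raw   <- plogis(beta[1] + beta[2] * pred.elev)  # unstandardized: all near 1
```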

Also thanks Jamie for the useful suggestion on highlighting how one can use "psi.samples" in a vignette. That's a great idea!

Jeff

Matthew Hyde

Nov 7, 2025, 5:20:16 PM

to spOccupancy and spAbundance users

Hi Jeff,

I was looking through the forum to see whether anyone had encountered an issue similar to mine, since I am also getting high predicted values across my study area. I'm estimating occupancy for southern tamandua across 269 sites, and then I want to create a range map for the province. I fit the model, but when I predict, the minimum predicted occupancy is 0.53 and the mean is 0.89. I standardized my covariates, though some of them are dummy variables for habitat types. I've pasted my code below; I was wondering whether there's something wrong with my model or with the way I'm predicting with the random effect. Any help would be much appreciated.

```r
# model

## formula
T.occ <- ~ (1|ID) + scale(fordist) + scale(forcov) + scale(hfp) + scale(roads) +
  palma + savannah + forest + mosaic + veg_sec + rice

## model
Tout2sp <- spPGOcc(occ.formula = T.occ,
                   det.formula = det.formula,
                   data = TASP,
                   inits = TTI.inits,
                   n.batch = 100,
                   batch.length = 1000,
                   priors = TTI.priors,
                   cov.model = Tcov.model,
                   NNGP = TRUE,
                   n.neighbors = 15,
                   tuning = TTI.tuning,
                   n.burn = 20000,
                   n.thin = 20,
                   n.chains = 3,
                   n.omp.threads = 3,
                   verbose = TRUE,
                   n.report = 1000)

summary(Tout2sp)

## predict out
coords.0 <- as.matrix(RAS[, c('x', 'y')])

## covariates from the final model, standardized with the mean and SD of the
## data the model was fit to (TASP), which is what scale() in T.occ used
fordist.pred <- (RAS$fordist - mean(TASP$occ.covs[, 'fordist'])) / sd(TASP$occ.covs[, 'fordist'])
forcov.pred <- (RAS$forcov - mean(TASP$occ.covs[, 'forcov'])) / sd(TASP$occ.covs[, 'forcov'])
hfp.pred <- (RAS$hfp - mean(TASP$occ.covs[, 'hfp'])) / sd(TASP$occ.covs[, 'hfp'])
roads.pred <- (RAS$roads - mean(TASP$occ.covs[, 'roads'])) / sd(TASP$occ.covs[, 'roads'])
## habitat dummy variables enter the model unscaled
pal.pred <- RAS$palma
sav.pred <- RAS$savannah
for.pred <- RAS$forest
mos.pred <- RAS$mosaic
veg.pred <- RAS$veg_sec
rice.pred <- RAS$rice

# These are the new intercept and covariate data. The columns must follow
# the order in which the covariates appear in T.occ.
X.0 <- cbind(1,
             fordist.pred,
             forcov.pred,
             hfp.pred,
             roads.pred,
             pal.pred,
             sav.pred,
             for.pred,
             mos.pred,
             veg.pred,
             rice.pred)

Tout.pred.sp <- predict(Tout2sp,
                        X.0,
                        coords.0,
                        ignore.RE = TRUE, # ignore the (1|ID) random effect at new locations
                        verbose = TRUE)
```
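One way to reduce the risk of X.0 columns ending up in a different order from the fitted coefficients is to build X.0 with base R's `model.matrix()` from a fixed-effects-only formula. A minimal sketch with hypothetical values: the covariates are assumed to be already standardized with the fitting data's means and SDs (so no `scale()` in the formula), and the `(1|ID)` term is left out because `ignore.RE = TRUE` is used. The resulting column order can then be compared by eye with the coefficient order printed by `summary()`.

```r
# Hypothetical prediction covariates, already standardized with the
# fitting data's means/SDs; habitat dummies stay as 0/1.
pred.df <- data.frame(
  fordist = c(-0.5, 0.2, 1.1),
  forcov  = c(0.8, -0.3, 0.0),
  palma   = c(0, 1, 0),
  forest  = c(1, 0, 0)
)

# Fixed-effect part of the occupancy formula, in the same order as in the
# fitted model, without scale() and without the random effect.
fixed.formula <- ~ fordist + forcov + palma + forest

# model.matrix() adds the intercept column and keeps the formula's order.
X.0 <- model.matrix(fixed.formula, data = pred.df)
colnames(X.0)
```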

Jeffrey Doser

Nov 17, 2025, 3:19:24 PM

to Matthew Hyde, spOccupancy and spAbundance users

Hi Matt,

Apologies for the delay. I just took a look at your code, and the only thing that stands out to me is that you seem to have included an unstructured site-level random effect, (1|ID). If ID is truly the ID of each individual site, then the model is overparameterized. An unstructured site-level random effect on occupancy is not identifiable; if a site-level random effect is desired, it needs to have some structure (e.g., spatial structure). With the way your model appears to be set up, you are including both an unstructured site-level random effect and a spatial random effect (since you're using spPGOcc), so I'm guessing that could be causing some of the discrepancies.
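A quick base R check of whether a grouping variable is effectively site-level, i.e. has as many distinct levels as there are sites. `is_site_level()`, `site.id`, and `region` are hypothetical names for illustration; in practice the ID column from the model's occupancy covariates would be passed in. If the check returns TRUE, one option consistent with the advice above is to drop `(1|ID)` and rely on the spatial random effect alone.

```r
# Hypothetical helper: TRUE if the grouping variable has one level per site.
is_site_level <- function(group, n.sites) {
  length(unique(group)) == n.sites
}

n.sites <- 6
site.id <- 1:6                              # one level per site
region  <- c("A", "A", "B", "B", "C", "C")  # several sites share each level

is_site_level(site.id, n.sites)  # TRUE: unstructured (1|ID) duplicates the site-level spatial effect
is_site_level(region, n.sites)   # FALSE: levels are shared across sites

# Occupancy formula with the site-level (1|ID) term removed.
T.occ.noRE <- ~ scale(fordist) + scale(forcov) + scale(hfp) + scale(roads) +
  palma + savannah + forest + mosaic + veg_sec + rice
```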

If the ID random effect is something different, then it's not immediately clear to me from the code what is going on. In that case you can send me the data to take a look at and I'll try to gauge what's going on.

Kind regards,

Jeff
