spOccupancy - calculating detection, prediction pixel number/size


王英龍

Sep 10, 2024, 2:29:56 AM
to spOccupancy and spAbundance users
Hi Jeff,

I'm a big fan of spOccupancy and how it takes spatial autocorrelation into account. I've been on the 'accounting for autocorrelation' hype train since I started using ctmm last year for my MSc thesis. Thanks for making this package!

I've been using spOccupancy for the past few months for work here in Taiwan to produce occupancy estimates of some of the endangered animals here (camera trap data). I've used the examples provided on the spOccupancy website, alongside a lecture of yours I found online, to teach myself the package (along with ChatGPT, as I imagine most of us do nowadays).

For now I have two questions (and will probably have more with time!)

1) How does the predict() function determine the size of the prediction raster and the number of pixels? E.g., in the attached screenshot (AWB_2023_SPR_B), the prediction was run using just the actual camera trap sites and the occurrence covariates at those sites (no new points randomly selected with occurrence covariates calculated/extracted; just predicting at the camera trap sites using our model input data). With this data we have 40 camera trap sites in the grid, but there are more than 40 pixels in the prediction raster. So, my colleagues and I are wondering: how does spOccupancy's prediction function determine the size and number of its raster pixels? Is this something that can be altered (e.g., if we wanted our prediction to be at the scale of forty 1 km² pixels)? I mainly want to understand what to tell my colleagues (and to understand it myself!), but if this is something that can be tweaked, that would be super.

2) Is there a simple way to calculate detection probabilities for each replicate/season from a spOccupancy model? I see that in R packages like unmarked there are simple functions to calculate detection probability, just as there are for occurrence probability, but I haven't found a clear and easy way to calculate detection probabilities with spOccupancy. I'm wondering if I've colossally missed something while learning this package, or if a function/small code snippet for calculating detection probabilities is on the to-do list for spOccupancy!

Based on the spOccupancy examples, there is a clear way to calculate mean occupancy probability, e.g.:
psi.mean_out <- apply(out$psi.samples, 2, mean)
mean(psi.mean_out)
gives mean occupancy for the entire season (single-season model; in my example we have 3 replicates).
The same code applied to a single-species multi-season model gives the average occupancy for the entire multi-season model, while replacing the 2 with a 3 gives the summarised occupancy for each season of the multi-season model.
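Concretely, for a multi-season model the margin just changes (a sketch; out.multi is a placeholder name for a fitted multi-season model object, with psi.samples of dimension samples x sites x seasons):

mean(apply(out.multi$psi.samples, 2, mean))   # average occupancy across the whole model
apply(out.multi$psi.samples, 3, mean)         # mean occupancy for each season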

Since I couldn't see a clear way of calculating detection probabilities with spOccupancy, I devised the following code using ChatGPT (see below). I ran the same data through unmarked and got fairly similar occupancy estimates; the detection estimates were still roughly similar in their patterns, but definitely not as close as the occupancy estimates.

Cheers!

Jamie Bolam


# Calculating detection probability per replicate
# now with multiple detection covariates
# Extract alpha samples for detection coefficients
out_awb_2023_aut_B_sp_exp
#  det.formula = ~scale(DC1) + I(scale(DC1^2)) +
# scale(DC2) + I(scale(DC2^2)) + scale(DC3) + I(scale(DC3)^2) +
#   scale(DC4) + I(scale(DC4)^2)

alpha_samples_awb_2023_aut_B <- out_awb_2023_aut_B_sp_exp$alpha.samples

# Get detection covariates from the data structure
DC1_covariates_awb_2023_aut_B <- data.awb_2023_aut_B$det.covs$DC1
DC2_covariates_awb_2023_aut_B <- data.awb_2023_aut_B$det.covs$DC2
DC3_covariates_awb_2023_aut_B <- data.awb_2023_aut_B$det.covs$DC3
DC4_covariates_awb_2023_aut_B <- data.awb_2023_aut_B$det.covs$DC4

alpha_samples_awb_2023_aut_B
head(alpha_samples_awb_2023_aut_B)

# Extract dimensions
num_samples_awb_2023_aut_B <- nrow(alpha_samples_awb_2023_aut_B)  # Number of posterior samples
num_sites_awb_2023_aut_B <- dim(DC1_covariates_awb_2023_aut_B)[1]  # Number of sites
num_replicates_awb_2023_aut_B <- dim(DC1_covariates_awb_2023_aut_B)[2]  # Number of replicates

str(DC1_covariates_awb_2023_aut_B)

# Initialize array to store detection probabilities
detection_probs_awb_2023_aut_B <- array(NA, dim = c(num_samples_awb_2023_aut_B,
                                                       num_sites_awb_2023_aut_B,
                                                       num_replicates_awb_2023_aut_B))
detection_probs_awb_2023_aut_B

# Loop through each posterior sample
for (s in 1:num_samples_awb_2023_aut_B) {
  # Get the s-th set of detection coefficients
  alpha_s_awb_2023_aut_B <- alpha_samples_awb_2023_aut_B[s, ]
 
  # Compute detection probability for each site, season, and replicate
  for (site in 1:num_sites_awb_2023_aut_B){
    for (rep in 1:num_replicates_awb_2023_aut_B) {
      # Extract covariate values for this site, season, and replicate
      DC1_value_awb_2023_aut_B <- DC1_covariates_awb_2023_aut_B[site, rep]
      DC2_value_awb_2023_aut_B <- DC2_covariates_awb_2023_aut_B[site, rep]
      DC3_value_awb_2023_aut_B <- DC3_covariates_awb_2023_aut_B[site, rep]
      DC4_value_awb_2023_aut_B <- DC4_covariates_awb_2023_aut_B[site, rep]
     
      # Skip if any covariate values contain NAs
      if (is.na(DC1_value_awb_2023_aut_B) || is.na(DC2_value_awb_2023_aut_B) ||
          is.na(DC3_value_awb_2023_aut_B) || is.na(DC4_value_awb_2023_aut_B)) next
     
      # Compute the linear predictor (eta)
      # Note: for this det.formula, alpha has 9 entries (intercept plus linear and
      # quadratic terms for DC1-DC4); the covariate values used here must be
      # standardized the same way as scale() in the model formula
      eta_awb_2023_aut_B <- (alpha_s_awb_2023_aut_B[1]) +
        (alpha_s_awb_2023_aut_B[2] * DC1_value_awb_2023_aut_B) +     # DC1
        (alpha_s_awb_2023_aut_B[3] * (DC1_value_awb_2023_aut_B^2)) + # DC1^2
        (alpha_s_awb_2023_aut_B[4] * DC2_value_awb_2023_aut_B) +     # DC2
        (alpha_s_awb_2023_aut_B[5] * (DC2_value_awb_2023_aut_B^2)) + # DC2^2
        (alpha_s_awb_2023_aut_B[6] * DC3_value_awb_2023_aut_B) +     # DC3
        (alpha_s_awb_2023_aut_B[7] * (DC3_value_awb_2023_aut_B^2)) + # DC3^2
        (alpha_s_awb_2023_aut_B[8] * DC4_value_awb_2023_aut_B) +     # DC4
        (alpha_s_awb_2023_aut_B[9] * (DC4_value_awb_2023_aut_B^2))   # DC4^2
     
      # Compute detection probability
      detection_probs_awb_2023_aut_B[s, site, rep] <- 1 / (1 + exp(-eta_awb_2023_aut_B))
    }
  }
}

detection_probs_awb_2023_aut_B
# summarize the detection probabilities, e.g., mean across samples
mean_detection_probs_awb_2023_aut_B <- apply(detection_probs_awb_2023_aut_B, c(2,3), mean, na.rm = TRUE)
mean_detection_probs_awb_2023_aut_B

str(detection_probs_awb_2023_aut_B)
class(detection_probs_awb_2023_aut_B)

# Calculate average detection probability per replicate
avg_detection_probs_per_replicate_awb_2023_aut_B <- apply(mean_detection_probs_awb_2023_aut_B, 2, function(x) mean(x, na.rm = TRUE))

# Print the result
avg_detection_probs_per_replicate_awb_2023_aut_B
# 0.7483477 0.7483494 0.7483454

data.awb_2023_aut_B$y


[Attachment: AWB_2023_SPR_B_screenshot.png]

Jeffrey Doser

Sep 10, 2024, 9:38:58 AM
to 王英龍, spOccupancy and spAbundance users
Hi Jamie, 

Thanks for the note! Here are some thoughts on your two questions: 
  1. spOccupancy does not change the scale at which occupancy is estimated when predicting. Most occupancy models (like those fit in spOccupancy) are scale dependent, in the sense that estimates of occupancy probability are inherently tied to the scale at which the data were collected, even when predicting. When predicting in spOccupancy, you supply a set of coordinates; if you generate those coordinates from a raster, they will likely be the cell centers. The resulting estimates of occupancy probability, however, are still on the scale of the data. In other words, if you assume each camera trap samples something like a 200 m square area around it, the predictions would still be "probability of occupancy within a 200 m square area at the new prediction location", regardless of whether the prediction coordinates were generated from a raster with different-sized grid cells. The scale of the occupancy probability is determined solely by how the data were generated and the interpretation you can make for the species you're working with. The only thing that really matters for prediction is the coordinates you give to the predict() function: they determine the spatial locations at which occupancy is predicted, and the occupancy probability is interpreted just as in the original data set (as if you had a camera centered on each new prediction location). Predicting occupancy at different scales is not something spOccupancy can do, but perhaps you can check out this lovely paper by Koshkina et al. on "scale invariant" occupancy models: https://besjournals.onlinelibrary.wiley.com/doi/full/10.1111/2041-210X.12738
  2. Yes, there are two ways you can go about doing this. If you want detection probabilities at the sites/replicates/seasons that were in the model, use the fitted() function (they will be stored in the "p.samples" element of the returned list). Just run fitted(out), where "out" is the model object. If you want to predict detection probability at new locations, there is a "type" argument in the predict() functions that lets you specify that you want detection probability at the new locations/seasons; you would then supply detection covariates to predict(). A minimal sketch of both approaches is below. It is very interesting, though, to hear that ChatGPT was able to give a fairly reasonable solution!
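A sketch of both approaches (object names like out and X.p.0 are placeholders, and the detection design matrix must match the model's det.formula):

fit.out <- fitted(out)             # out is the fitted spOccupancy model object
p.samples <- fit.out$p.samples     # posterior samples of detection probability
apply(p.samples, c(2, 3), mean)    # posterior mean per site x replicate

# For new locations/covariate values (single-season model assumed here):
# out.p.pred <- predict(out, X.p.0, type = 'detection')
# str(out.p.pred$p.0.samples)      # posterior detection probabilities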
Hope that helps!

Jeff




--
Jeffrey W. Doser, Ph.D.
Assistant Professor
Department of Forestry and Environmental Resources
North Carolina State University
Pronouns: he/him/his

王英龍

Sep 11, 2024, 5:13:28 AM
to spOccupancy and spAbundance users
Hi Jeff,

Thank you very much for the explanation.

Regarding calculating detection probabilities at the sites/replicates/seasons in the model, the fitted() function worked very well,
e.g. calculating the detection probability for each replicate with:
apply(fit_out$p.samples, 3, mean)

However, I noticed that for some of the sites/replicates, the detection probability came up as NA. I thought this was because I had some NA values in my detection covariates (in the same places where I had NAs in my detection-nondetection (y) data). E.g., if one covariate is worktime (hours a camera trap was operating that replicate) and a certain camera trap was not at a site for replicate 1 but was there during replicates 2 and 3, the blank spot is currently listed as NA in my y data as well as in the worktime detection covariate data.
So, I changed the NAs in the worktime detection covariate to 0, re-ran my model and the fitted() detection probability code, but it still came up as NA for the sites/replicates where any NAs were present.
In other words, I cannot calculate detection probability for the sites/replicates with any NA values (even if, for a given replicate, 39/40 of the camera traps were active and only 1 was not yet deployed, the estimated det. prob. will be NA).
Is there any workaround for this? My models have so far worked perfectly fine with the NAs in my y data, so I was quite surprised to run into this. I've been under the impression that spOccupancy can handle NAs in the y data fine - am I mistaken? Should they all be removed before analysis? If that were the case, I would be missing out on valuable data. I don't want to replace the NAs with 0, since we can neither confirm nor deny that an observation would have taken place at a site/replicate where the camera was not yet deployed.

Would an appropriate quick and easy workaround be to find the mean of the det probs for each site/replicate, excluding the NAs? 

Regarding the spOccupancy scale of prediction:
Thank you for your thorough explanation. Just to clarify, the above screenshot shows predictions based solely on the data used in our model (no additional points or occurrence covariates provided). Each camera trap was placed roughly (fieldwork/terrain permitting) in the centre of a 1 km grid cell. Of course, the camera trap itself cannot sample the entire 1 km² cell, so I do not expect the prediction output raster to be at 1 km² resolution.
I simply followed the code in the prediction section here (https://doserlab.com/files/spoccupancy-web/articles/modelfitting#prediction), but since I was having trouble visualising the data in ggplot, I turned the stars object into a raster and plotted it in leaflet. I am still wondering why there are more than 40 pixels for the camera-trap-only data, considering only 40 xy coordinates were provided to the model.

While I'm at it, do you have any idea why I am having issues visualising my prediction dataset in ggplot? When I ran your example data I got the graphs fine, and I hadn't changed anything besides my model and input data, so it should have worked. But my plot comes out looking very sparse, even though when I convert it to a raster and plot it in leaflet it looks fine (e.g., the screenshot attached to my first post).
For reference, attached is what my ggplot looks like when I use just the camera trap point dataset, along with one where I predict for 200 points randomly generated within the shapefile for the camera trap grid, and one for 600 randomly generated points (see attached).

I wonder if you could please look at my code and help me figure out what is going wrong? I am totally stumped! 

Cheers,

Jamie


# Produce a species distribution map (posterior predictive means of occupancy)
plot.dat_2023_spr_B <- data.frame(x = coords.0.2023_spr_B[,1],
                                         y = coords.0.2023_spr_B[,2],
                                         mean.psi = apply(out.sp.pred_2023_spr_B$psi.0.samples, 2, mean),
                                         sd.psi = apply(out.sp.pred_2023_spr_B$psi.0.samples, 2, sd))
dat.stars_2023_spr_B <- st_as_stars(plot.dat_2023_spr_B, dims = c('x', 'y'))
ggplot() +
  geom_stars(data = dat.stars_2023_spr_B, aes(x = x, y = y, fill = mean.psi)) +
  scale_fill_viridis_c(na.value = 'transparent') +
  labs(x = 'Easting', y = 'Northing', fill = '',
       title = 'Mean occurrence probability - Spring 2023') +
  theme_bw()

I always get this warning and I'm unsure why. I changed my coords.0 object into a dataframe and called x = coords.0.2023_spr_B_df$x etc., but got the same message:
Warning message: In ggplot2::geom_rect(mapping = mapping, data = data, ...): Ignoring unknown aesthetics: x and y
[Attachments: Occ_2023_SPR__ggplot.png, Occ_2023_SPR__ggplot_200points.png, Occ_2023_SPR__ggplot_600points.png]

Jeffrey Doser

Sep 13, 2024, 4:04:44 PM
to 王英龍, spOccupancy and spAbundance users
Hi Jamie, 

spOccupancy can certainly handle missing values in the detection-nondetection data. Regarding the detection probabilities, the functions return detection estimates for each iteration, site, and replicate combination, with NA values for the site/replicate combinations where there are no data. So yes, if you want an average detection probability at a site, just add na.rm = TRUE to your call to apply() and it should work. Of course, keep in mind that the average will then be taken over different numbers of replicates at sites that were sampled unevenly. The alternative is to use the predict() function to predict detection probability at each site for a given value of the covariates.
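For example, something along these lines (out is a placeholder for your fitted model object):

p.samples <- fitted(out)$p.samples
apply(p.samples, 2, mean, na.rm = TRUE)   # mean detection probability per site
apply(p.samples, 3, mean, na.rm = TRUE)   # mean detection probability per replicate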

For the prediction, if you supply only 40 coordinates to the predict() function, the resulting object will only contain predictions for those 40 locations. So my guess is that something is off in the post-processing code; perhaps take a look there to make sure you are summarizing over the proper dimensions when trying to get site-level estimates, etc.
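A quick sanity check might be to compare the dimensions of the prediction samples with the number of coordinates you supplied (names are placeholders; single-season spatial model assumed):

# out.pred <- predict(out, X.0, coords.0)
dim(out.pred$psi.0.samples)   # should be n.samples x 40 for 40 prediction coordinates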

I have encountered that problem before when using ggplot() with the approach from the vignettes, but I don't have a great answer as to what's going on. It isn't related to the spOccupancy objects; rather, it seems to be something to do with the number of points you're trying to plot, and depending on how close those points are to each other it can cause the ggplot call to fail. You could try searching online for others plotting stars objects in ggplot with a similar problem.
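As one possible workaround, you could try plotting the prediction data frame directly instead of going through stars; a sketch using the plot.dat_2023_spr_B object from your earlier code:

library(ggplot2)
# Plot posterior mean occupancy as points; geom_tile() may work for regularly gridded coordinates
ggplot(plot.dat_2023_spr_B, aes(x = x, y = y, color = mean.psi)) +
  geom_point(size = 3) +
  scale_color_viridis_c(na.value = 'transparent') +
  labs(x = 'Easting', y = 'Northing', color = '',
       title = 'Mean occurrence probability - Spring 2023') +
  theme_bw()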

Cheers,

Jeff
