Abnormally high standard error for "forest" covariate in site occupancy models of ferals ungulates in Martinique

85 views
Skip to first unread message

Constant Sépari

unread,
Aug 30, 2025, 2:44:42 PM (5 days ago) Aug 30
to unmarked

Hello everyone,

I am currently working on site occupancy models for feral ungulates in Martinique and I am having some issues with a particular covariate : "forest". That covariate describes the presence/absence in both Pitons and Pelée massifs. As you can see on the ploteffect, "forestPitons" has a really wide SE compared to "forestPelee".

I really do not understand why there is such a high SE knowing that almost all sites on the "Pitons" massif count no detection at all for the feral pig and that there are approximately 30 occasions for pig detection in total on this same massif.

The ploteffect I attached comes from the model below ; s1 is the unmarkedframe (for the feral pig only) I also attached :  https://we.tl/t-uFwZJ3vz2C
massif_o<-occu(~ sentier+veg_bas+rough ~ forest, s1)

However, this covariate seems to be very interesting in my model because it has the lowest AIC among the others covariates.

Many thanks.

 ploteffet_forest_cochon.JPG

Ken Kellner

unread,
Aug 30, 2025, 9:39:08 PM (5 days ago) Aug 30
to unmarked
Thanks for including your data. The problem is (as you point out) that you have 0 detections at sites where forest = "Pitons".

Without any detections at these sites, it is not going to be possible to get an occupancy estimate for them with maximum likelihood; you have something like quasi-complete separation. That's why the model returns such a large SE for the Pitons estimate, and I would not use this model.

I don't think there's an easy solution to this problem in an occupancy modeling framework. Technically you could consider penalized likelihood or Bayesian approaches, but with literally 0 detections, I'd consider dropping the Pitons sites from the analysis, and instead separately state that you had no detections at these sites, which strongly implies very low or 0 occupancy (or extremely low detection probability).

Ken

Jim Baldwin

unread,
Aug 31, 2025, 1:48:45 AM (5 days ago) Aug 31
to unma...@googlegroups.com
I'm going to take my chances and quibble about "...it is not going to be possible to get an occupancy estimate for them with maximum likelihood".  There is a unique maximum likelihood estimate for all parameters in that model.  The estimate of the parameter labeled as forestPitons should be minus infinity which when added to the occupancy intercept corresponds to an estimated occupancy probability of zero.  The downside is that there is no associated estimate of precision because that value is on the border of the parameter space.

Also, if the Pitons sites are removed from the analysis, the maximum likelihood estimates of all of the other parameters remains the same as the original analysis so there is no need for another analysis.  If a starting value for the forestPitons parameter is say -25, then the resulting standard error displayed is even larger and the parameter estimate ends up being -25.  (I assume that occu uses double-precision arithmetic as starting values much smaller than -25 cause numerical issues.)

Jim


--
*** Three hierarchical modeling email lists ***
(1) unmarked (this list): for questions specific to the R package unmarked
(2) SCR: for design and Bayesian or non-bayesian analysis of spatial capture-recapture
(3) HMecology: for everything else, especially material covered in the books by Royle & Dorazio (2008), Kéry & Schaub (2012), Kéry & Royle (2016, 2021) and Schaub & Kéry (2022)
---
You received this message because you are subscribed to the Google Groups "unmarked" group.
To unsubscribe from this group and stop receiving emails from it, send an email to unmarked+u...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/unmarked/e6ebfb9c-fcbe-4a9e-a12b-1493bda9d18an%40googlegroups.com.

Jim Baldwin

unread,
Sep 1, 2025, 2:59:56 AM (4 days ago) Sep 1
to unma...@googlegroups.com
One possibility for obtaining a confidence interval for the occupancy probability for forest = Pitons is to use the profile likelihood method.  Below is some very crude R code which obtains an approximate 95% confidence interval being (0, 0.028).  Among other assumptions, this assumes that the detection model for forest = Pelee applies to forest = Pitons.  I'm not a wildlife biologist so I don't know how reasonable that assumption might be.

Assuming that the R data has been downloaded and the unmarkedFrameOccu is named s1, here is the code:

# Extract data from the unmarkedOccuFrame
  detections <- rowSums(s1@y, na.rm=TRUE)
  visits <- rowSums(!is.na(s1@y))
  forest <- as.numeric(s1@siteCovs$forest) - 1
  veg_bas <- s1@siteCovs$veg_bas
  sentier <-  as.numeric(s1@siteCovs$sentier) - 1
  rough <- s1@siteCovs$rough
  names(detections) <- NULL
  names(visits) <- NULL

# Run model that gives the initial estimates
  results <- occu(~ sentier + veg_bas + rough ~ 0 + forest, s1)

# Define the log likelihood function
logL <- function(x, psiPitons) {
   b0 <- x[1]
   b1 <- x[2]
   b2 <- x[3]
   b3 <- x[4]
   psiPelee <- x[5] # Associated with forest =  Pelee
   # Log of the likelihood
   p <- 1 - 1/(1 + exp(b0 + b1*sentier + b2*veg_bas + b3*rough))
   temp <- 0
   for (i in 1:length(rough)) {
       if (forest[i] == 0) {
          psi <- psiPelee
       } else {
          psi <- psiPitons
       }
       if (detections[i] == 0) {
          if (psi > 0) temp <- temp + log(1 - psi + psi*(1-p[i])^visits[i])
       } else {
          temp <- temp + log(psi) + detections[i]*log(p[i]) +
            (visits[i] - detections[i])*log(1 - p[i])
       }
   }
   temp
}

# Search for values of psiPitons (not the logit)
  psiPitons.grid <- seq(0, 1, length.out = 1001)
  logL.values <- numeric(length(psiPitons.grid))

  for (i in seq_along(psiPitons.grid)) {
    psiPitonsx <- psiPitons.grid[i]
    logL.values[i] <- optim(
      par = c(0.267, -0.865, -0.357, -1.637, 0.34),
      fn = logL,
      psiPitons = psiPitonsx,
      control = list(fnscale = -1),
      method = "L-BFGS-B",
      lower = c(-Inf, -Inf, -Inf, -Inf, 0),
      upper = c(Inf, Inf, Inf, Inf, 1))$value
  }

# Profile likelihood drop
  delta <- base.value - logL.values
  psiPitons.values <- psiPitons.grid[delta < 1.92]
  conf.int <- range(psiPitons.values)


I've directly estimated the occupancy probability rather than the logit of the occupancy probability.

Jim




Ken Kellner

unread,
Sep 1, 2025, 8:17:25 AM (4 days ago) Sep 1
to unmarked
Thanks Jim! I was too confident in my assessment it wasn't possible to get a useful max likelihood estimate/CI in this case. I guess I'd still be a little hesitant about making conclusions based on no detections, but that might be based more on gut feeling than evidence.

Ken

Jim Baldwin

unread,
Sep 1, 2025, 4:35:35 PM (3 days ago) Sep 1
to unma...@googlegroups.com
Ken:  I think we're in agreement in that it's the assumptions we are able to assume rather than the assumptions we are willing to assume.

Jim


Constant Sépari

unread,
Sep 2, 2025, 9:34:58 AM (3 days ago) Sep 2
to unmarked
Hello Mr Ken Kellner and Mr Jim Baldwin, thank you very much for your answers and the several solutions you are proposing to my issue. It really helps to better understand how an unmarked frame works. 

Despite the lack of time, my colleagues and I are still working on the enhancement of our code on R. Putting aside the high SE value for "forestPitons",  we are encountering an issue on the mb.gof.test from Mackenzie and Bailey showing very low c-hat values (=~0.1). I am not sure to understand well the meaning of an underdispersed model and especially how to deal with it. Can it comes from a particular covariate or does it signifies that the model is overconfigured with too many covariates ? Or are there others reasons ? 

I joined the covariates combinations ending up with low c-hat values. I assume "forest" is not straightly linked to this issue as we managed to have decent c-hat values using it in other combinations.

Here it is the values coming from the mb.gof.test in the same order than the covariates combinations I joined : 

#mb.gof.test(m_sc1, nsim = 2000)# p.value = 0.98, c-hat = 0.12

#mb.gof.test(m_sc2, nsim = 2000)# p.value = 0.96, c-hat = 0.11

#mb.gof.test(m_sc3, nsim = 2000)# p.value = 0.97, c-hat = 0.16


Thank you very much again for your assistance

Constant


Capture_unmarked.JPG
Reply all
Reply to author
Forward
0 new messages