Big citizen science dynamic occupancy models and thinning effect


Thomas Duchesne

Nov 20, 2025, 10:59:16 AM
to hmecology: Hierarchical Modeling in Ecology

Dear ecological modelers,

 

I am fitting dynamic occupancy models to citizen science data using the jagsUI package.

As I am working with restructured citizen science data, I have many sites and detection events over 12 years (about 10,000 sites with 10 detection events per year). Of course, I also have a lot of NAs in my multi-dimensional observation matrix. To speed up the process, I vectorized the matrix to avoid having to update missing values, as suggested in the AHM2 book (chapter 4). My model therefore works with the "long" format.
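The reshaping looks roughly like this (a simplified sketch; `obs_array` is just a placeholder name for my site x survey x year observation array):

      # Sketch: flatten the 3-D observation array into "long" vectors, keeping only
      # the non-missing cells so JAGS never has to update missing y values
      # (obs_array is a placeholder name for the site x survey x year matrix)
      keep     <- which(!is.na(obs_array), arr.ind = TRUE)  # indices of observed cells
      y_vector <- obs_array[keep]                           # observed detections (0/1)
      site_idx <- keep[, 1]                                 # site index of each observation
      year_idx <- keep[, 3]                                 # year index of each observation
      Nobs     <- length(y_vector)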


My model is quite "complex", with:

  • fixed effects on detection probability: date (linear and quadratic), observer experience, and list length type
  • random year effects on p, phi and gamma to account for annual variation in the parameters around a mean
  • a random site effect on p to account for variation in detection probability at specific sites that are particularly monitored by professional field workers

Here is a sample of the model developed (a single-species model, but run in series for each species). I use relatively long chains (30,000 iterations, burn-in of 15,000 and thinning of 1 on 8 chains). I have a machine with 24 CPU cores, so running 8 chains is not a problem at all, I guess.

 

For some species, I did not reach convergence with these MCMC settings (visual inspection of traceplots and Rhat >= 1.1 for some parameters). That is why I check the Rhat and n.eff values and extend the chains in increments of 5,000 iterations (4 times at most) for species that have not converged.

 

In some papers and in chapter 4 of AHM2, I read about high thinning values (25-150). I thought that increasing the thinning value is not recommended (if memory is not an issue). In my understanding of thinning, increasing it does not fix bad mixing and it reduces the n.eff.

In my specific case, I did not include the z parameter or eps_p_site in the monitored parameters. This greatly reduces memory use, which is why I can use nt = 1 on my 128 GB RAM machine. However, I wonder why such thinning values appear in the literature. I suppose this results from the monitoring of massive latent structures such as z. Why is it useful to monitor this specific z parameter if I just focus on the global occupancy trend? Can I just focus on the parameters monitored in the "params_vec" vector?
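A rough back-of-the-envelope check (assuming roughly the numbers above: 10,000 sites, 12 years, 120,000 retained draws) shows why monitoring z blows up memory:

      # Rough memory needed just to store posterior draws of z (assumed numbers:
      # ~10,000 sites, 12 years, 120,000 retained draws, 8 bytes per value)
      n_site <- 10000; n_year <- 12; n_draws <- 120000
      (n_site * n_year * n_draws * 8) / 1024^3   # ~107 GB for z alone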

 


      bdata_vec <- list(
        y = y_vector,
        site = site_idx,
        year = year_idx,
        list_type = list_type_vector,
        julian_date = julian_date_vector,
        exp_obs = exp_obs_vector,
        nsite = nsites,
        nyear = nyears,
        Nobs = Nobs,
        nListTypes = length(levels(as.factor(list_type_vector)))
      )
     
     
      cat(file = "dynocc_random_effects_fixed.txt", "
model {
  # Priors for initial occupancy
  psi1 ~ dunif(0, 1)

  # Global means for survival (phi), colonization (gamma), and detection (p)
  mu_phi ~ dunif(0, 1)
  mu_gamma ~ dunif(0, 1)
  mu_p ~ dunif(0, 1)

  # Random year effects
  for (t in 1:(nyear - 1)) {
    eps_phi[t] ~ dnorm(0, tau_phi)
    eps_gamma[t] ~ dnorm(0, tau_gamma)
  }

  for (t in 1:nyear) {
    eps_p[t] ~ dnorm(0, tau_p)
  }

  # Random site effects on detection
  for (i in 1:nsite) {
    eps_p_site[i] ~ dnorm(0, tau_p_site)
  }

  # Hyperpriors for year and site variance
  sd_phi ~ dunif(0, 5)
  tau_phi <- pow(sd_phi, -2)

  sd_gamma ~ dunif(0, 5)
  tau_gamma <- pow(sd_gamma, -2)

  sd_p ~ dunif(0, 5)
  tau_p <- pow(sd_p, -2)

  # Hyperprior for site-level detection variability
  sd_p_site ~ dunif(0, 5)
  tau_p_site <- pow(sd_p_site, -2)

  # Dynamic occupancy model (ecological process)
  for (t in 1:(nyear - 1)) {
    logit(phi[t]) <- logit(mu_phi) + eps_phi[t]
    logit(gamma[t]) <- logit(mu_gamma) + eps_gamma[t]
  }

  for (i in 1:nsite) {
    z[i, 1] ~ dbern(psi1)
    for (t in 2:nyear) {
      z[i, t] ~ dbern(z[i, t - 1] * phi[t - 1] + (1 - z[i, t - 1]) * gamma[t - 1])
    }
  }

  # Detection model with site + year random effects + observer experience covariate
  beta1 ~ dunif(-5, 5)
  beta2 ~ dunif(-5, 5)
  # List type effects (categorical)
  for (k in 2:nListTypes) {
    beta_list[k] ~ dnorm(0, 0.04)
  }
  beta_list[1] <- 0

  # Coefficients for observer experience levels 2, 3, 4 (level 1 as baseline)
  for (k in 2:4) {
    beta_exp[k] ~ dunif(-5, 5)
  }

  for (n in 1:Nobs) {
    # Indicator variables for observer experience categories
    exp_eff[n] <- equals(exp_obs[n], 2) * beta_exp[2] +
                  equals(exp_obs[n], 3) * beta_exp[3] +
                  equals(exp_obs[n], 4) * beta_exp[4]

    logit(p[n]) <- logit(mu_p) +
                   eps_p[year[n]] +
                   eps_p_site[site[n]] +    # site-level random effect
                   beta1 * julian_date[n] +
                   beta2 * pow(julian_date[n], 2) +
                   beta_list[list_type[n]] +
                   exp_eff[n]               # NEW: observer experience effect

    y[n] ~ dbern(z[site[n], year[n]] * p[n])
  }


  # Derived quantities
  psi[1] <- psi1
  psi.fs[1] <- sum(z[1:nsite, 1]) / nsite

  for (t in 2:nyear) {
    psi[t] <- psi[t - 1] * phi[t - 1] + (1 - psi[t - 1]) * gamma[t - 1]
    psi.fs[t] <- sum(z[1:nsite, t]) / nsite
  }
}
")
     

      params_vec <- c("psi", "psi.fs", "phi", "gamma", "mu_phi", "mu_gamma", "mu_p", "sd_phi", "sd_gamma", "sd_p", "beta1", "beta2", "beta_list", "beta_exp", "eps_p", "sd_p_site")

     

     

     

      # MCMC settings

      na <- 2000

      ni <- 30000

      nb <- 15000

      nt <- 1

      nc <- 8

     

      stored_samples <- (ni - nb) / nt * nc   # retained draws per parameter (not an effective sample size)

      print(paste("Stored samples per parameter:", stored_samples)) # 120000

     

      tinitial<-Sys.time()

 

      z_init <- apply(int_array, c(1, 3), function(x) {

        if(all(is.na(x))) return(0L)

        return(as.integer(max(x, na.rm=TRUE)))

      })

     

      inits_list <- replicate(nc, list(z = z_init), simplify = FALSE)

     

      out_vec <- jags(bdata_vec, inits=inits_list, params_vec, "dynocc_random_effects_fixed.txt",

                      n.adapt = na, n.chains = nc, n.thin = nt,

                      n.iter = ni, n.burnin = nb, parallel = TRUE)

     

     

      ### Extraction of model parameters ####

      mod_summary<-as.data.frame(out_vec$summary)

      mod_summary<-cbind(param=rownames(mod_summary),mod_summary)

     

 

      # check the Rhat values to increment supplementary iterations

 

      # split parameter names like "psi[3]" into the base name and the index
      mod_summary$i     <- gsub("\\[|\\]", "", sub("^[^\\[]*", "", mod_summary$param))
      mod_summary$param <- sub("\\[.*$", "", mod_summary$param)

      levels(as.factor(mod_summary$param))

 

      p_keep <- c("psi", "psi.fs", "mu_phi", "mu_gamma", "mu_p", "sd_phi", "sd_gamma", "sd_p", "beta_list", "beta1", "beta2", "beta_exp", "eps_p", "sd_p_site")

 

      conv_check<-mod_summary[mod_summary$param %in% p_keep,]

 

      ext <- 1
      max_extension <- 4

      while ((max(conv_check$Rhat, na.rm = TRUE) > 1.1 | min(conv_check$n.eff, na.rm = TRUE) <= 400) && ext <= max_extension)
      {
        print(paste0("extension increment ", ext, " of a maximum of four, due to Rhat ",
                     round(max(conv_check$Rhat, na.rm = TRUE), 2),
                     " and n.eff ", min(conv_check$n.eff, na.rm = TRUE)))

        out_vec <- update(out_vec, n.iter = 5000, parallel = TRUE)

        # refresh the summary so the convergence check uses the extended chains
        mod_summary <- as.data.frame(out_vec$summary)
        mod_summary <- cbind(param = rownames(mod_summary), mod_summary)
        mod_summary$param <- sub("\\[.*$", "", mod_summary$param)
        conv_check <- mod_summary[mod_summary$param %in% p_keep, ]

        ext <- ext + 1
      }

 

      mod_summary<-as.data.frame(out_vec$summary)

      mod_summary<-cbind(param=rownames(mod_summary),mod_summary)

 


Thanks,

 

Thomas Duchesne

Natagora study department

Belgium

 

Marc Kery

Nov 20, 2025, 11:13:09 AM
to Thomas Duchesne, hmecology: Hierarchical Modeling in Ecology
Dear Thomas,

I always thin because disk space is always an issue for me. While I *believe* it is true that, in some sense, thinning doesn't improve mixing, it is still the case that when one runs very long chains and thins possibly quite strongly, the traceplots may look better mixed and I believe even Rhat may go down... Colleagues: please correct me if you think I am wrong.

Then, of course, you don't need to monitor the latent variables (z, other random effects), unless you're interested in their values. If you're interested in the global occupancy trend, then, yes, by all means, have a look at that only.

Best regards  -- Marc





Perry de Valpine

Nov 20, 2025, 11:42:52 AM
to Marc Kery, Thomas Duchesne, hmecology: Hierarchical Modeling in Ecology
Dear Thomas,

Regarding the issue of monitoring very large latent state or random effects objects, nimble provides the option of a second set of monitors with a different thinning interval. This allows you to monitor the very large objects separately with a larger thinning interval in order to manage the total memory use. If you want to try switching to nimble from jags code, you can find a blog post on how to do so linked from our documentation page at r-nimble.org.
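In nimble code that could look roughly like this (a sketch; `model` here stands for a nimble model built from equivalent BUGS code):

library(nimble)
# Sketch: two sets of monitors with different thinning intervals
# ("model" is assumed to be a nimbleModel built from equivalent code)
conf <- configureMCMC(model,
                      monitors  = c("mu_phi", "mu_gamma", "mu_p", "psi"),  # small nodes, thin = 1
                      monitors2 = "z",                                     # huge latent matrix
                      thin  = 1,
                      thin2 = 50)                                          # store far fewer z draws
mcmc <- buildMCMC(conf)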

I agree with Marc that thinning is a common and reasonable thing to do. I think you are correct that technically there is always a loss of information, but in practical situations with slow mixing from complex models, the loss of information is often nothing to worry about. A subtlety in favor of thinning is that some methods to estimate effective sample size (e.g. coda::effectiveSize) will not give a good (valid) estimate for very slowly mixed chains. That is because it works by estimating an AR process and puts an upper limit on the number of lags, which can be inadequate for slowly mixed chains. (At least, that used to be the case, but I haven't checked in a while). I am not sure if that also impacts estimation of R-hat, in relation to Marc's comment on that.

I would also say that 30000 iterations is not really a huge number in the context of MCMC for large models. I may not be following your comment about thinning and mixing. If you have computing resources, it is reasonable to run for much (10x-100x) longer and thin.

You could also try the marginalized version of dynamic occupancy models in the nimbleEcology package. This is the kind of model where (for a single species) in Ponisio et al. 2020 we found that marginalizing was too costly to improve MCMC efficiency compared to sampling the site latent states, so I am not jumping to recommend it. However, it could be more advantageous in other problems, perhaps yours, so I am just pointing out the option.

HTH
Perry

 

Thomas Duchesne

Nov 21, 2025, 8:40:25 AM
to hmecology: Hierarchical Modeling in Ecology
Many thanks,

I think both of you "converge" with my understanding of thinning and its effect.
I think the use of thinning can indeed make Rhat look closer to one, but "artificially", as it reduces the within-chain autocorrelation. If I follow my instincts, I would say that it is probably a sampling artifact, reducing Rhat artificially, and not a solution to mixing problems. I also noted this in my preliminary analyses: thinning always causes a loss of information, but this loss is less significant in the case of slow mixing of complex models.

In my specific case, I only focus on global trends, but I still want to keep an eye on the random effects (site and year). I don't really want to store the latent states z. I guess that if my machine has sufficient memory, thinning is not necessary (I may be wrong). Maybe I will try to run longer chains with a small thinning (nt = 4) to check the reported Rhat/n.eff and compare. I initially didn't want to run very long chains because the computation time rapidly gets massive with this model, and I reach convergence for the majority of my species with 30,000 iterations, 8 chains and thinning = 1.
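For comparison, a setting that keeps the same number of stored draws, but taken from longer, thinned chains, would be something like this (just a sketch of what I might try):

# Sketch: longer chains with thinning, same 120,000 stored draws as before
ni <- 75000                 # iterations per chain
nb <- 15000                 # burn-in
nt <- 4                     # thinning interval
nc <- 8                     # chains
(ni - nb) / nt * nc         # 120,000 stored draws, each less autocorrelated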
If you have any other tips on how to choose the best MCMC settings, please do not hesitate to share them. Feedback is precious.

Thank you for the nimble option. I will take a look at the documentation.

Best regards,

Thomas

Marc Kery

Nov 21, 2025, 9:31:45 AM
to Thomas Duchesne, hmecology: Hierarchical Modeling in Ecology
Dear Thomas, Perry,

Perry: thanks for your excellent and informative mail!

Thomas: at the big risk of sounding like an idiot, I feel that you may be overthinking the whole issue: to a good approximation, one may say that everybody is thinning their Markov chains.

Best regards  --- Marc



Matthijs Hollanders

Nov 22, 2025, 3:47:47 PM
to Marc Kery, Thomas Duchesne, hmecology: Hierarchical Modeling in Ecology
FWIW, I think you may be able to run more, shorter chains without thinning if you use non-centered random effects. Instead of:

for (i in 1:I) {
  x[i] ~ dnorm(0, tau)
}

where tau is the precision, do:

for (i in 1:I) {
  x[i] <- sigma * z[i]
  z[i] ~ dnorm(0, 1)
}

where sigma is the SD.
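Applied to the site random effect on detection in your model, for example, that could look like this (a sketch keeping your dunif(0, 5) prior on the SD; eps_p_site_raw is just a new name for the standardized deviates):

# Sketch: non-centered version of the site random effect on detection
for (i in 1:nsite) {
  eps_p_site_raw[i] ~ dnorm(0, 1)                 # standard-normal deviate
  eps_p_site[i] <- sd_p_site * eps_p_site_raw[i]  # scale by the SD
}
sd_p_site ~ dunif(0, 5)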

Perry de Valpine

Nov 24, 2025, 5:01:23 AM
to Matthijs Hollanders, Marc Kery, Thomas Duchesne, hmecology: Hierarchical Modeling in Ecology
Thanks for suggesting this, Matthijs. FWIW, nimble provides a "noncentered" sampler that will take your first way of writing it (potentially also fully centered on a non-zero mean) and sample it as if it had been written the second way. This allows following the suggestion of Yu and Meng (2011) to sample in both parameterizations within the same MCMC so that one or the other can work better on a given problem (because there is at least some work in the Statistics literature saying that whether centered or non-centered parameterizations produce better mixing can depend on the model, data and algorithm). If that is of interest, it is described more in nimble's help(samplers). 
-Perry
