Dear ecological modelers,
I am performing dynamic occupancy models using citizen science data using jagUI package.
As I am working with restructured citizen science data, I have a lot of sites, detection events for 12 years (about 10000 sites with 10 detection events per year). Of course I also have a lot of NA in my multi-dimentional matrix (observation matrix). To speed up process, I vectorized matrix to avoid having to update missing values as suggested in AHM2 book (chapter 4). My model therfore works with "long" format.
My model is quite "complex" with :
Here is an sample of the model developed (single-species model but executed in series for each species). I use relative long chains (30000 iterations, burn in value set to 15000 and thinning of 1 on 8 chains). I got a machine with 24 core on my CPU so running 8 chains is not a problem at all - i gess).
For some species, I didn't reach convergence with these MCMC settings (visual inspection of traceplots and Rhat>=1.1 for some parameters). That’s why I check the Rhat and n.eff values to continue chains by incrementing 5,000 iterations (max 4 times) for species that have not converged.
In some paper and in the chapter 4 of AHM2, I can read high thinning values (25-150). I thought that increasing thinning value is not recommended (if the memory is not an issue). In my conception of the thinning effect, increasing thinning does not fix bad mixing and reduce the n.eff.
In my specific case, I did not include z parameter as well as eps_p_site in the monitored parameters. This greatly reduce the memory use that’s why I can use nt =1 on my 128Gb RAM machine. However, I wonder why such thinning values can be seen in the literature. I suppose this results from the monitoring of massive latent structures such as z. Why is it useful to monitor this specific z parameter If I just focus on global occupancy trend ? Can I just focus on parameters monitored in the “params_vec” vector ?
bdata_vec <- list(
y = y_vector,
site = site_idx,
year = year_idx,
list_type = list_type_vector,
julian_date = julian_date_vector,
exp_obs = exp_obs_vector,
nsite = nsites,
nyear = nyears,
Nobs = Nobs,
nListTypes =
length(levels(as.factor(list_type_vector)))
)
cat(file = "dynocc_random_effects_fixed.txt",
"
model {
# Priors for initial occupancy
psi1 ~ dunif(0, 1)
# Global means for survival (phi), colonization (gamma), and detection
(p)
mu_phi ~ dunif(0, 1)
mu_gamma ~ dunif(0, 1)
mu_p ~ dunif(0, 1)
# Random year effects
for (t in 1:(nyear - 1)) {
eps_phi[t] ~ dnorm(0, tau_phi)
eps_gamma[t] ~ dnorm(0, tau_gamma)
}
for (t in 1:nyear) {
eps_p[t] ~ dnorm(0, tau_p)
}
# Random site effects on detection
for (i in 1:nsite) {
eps_p_site[i] ~ dnorm(0, tau_p_site)
}
# Hyperpriors for year and site variance
sd_phi ~ dunif(0, 5)
tau_phi <- pow(sd_phi, -2)
sd_gamma ~ dunif(0, 5)
tau_gamma <- pow(sd_gamma, -2)
sd_p ~ dunif(0, 5)
tau_p <- pow(sd_p, -2)
# Hyperprior for site-level detection variability
sd_p_site ~ dunif(0, 5)
tau_p_site <- pow(sd_p_site, -2)
# Dynamic occupancy model (ecological process)
for (t in 1:(nyear - 1)) {
logit(phi[t]) <- logit(mu_phi) + eps_phi[t]
logit(gamma[t]) <- logit(mu_gamma) + eps_gamma[t]
}
for (i in 1:nsite) {
z[i, 1] ~ dbern(psi1)
for (t in 2:nyear) {
z[i, t] ~ dbern(z[i, t - 1] * phi[t - 1] + (1 - z[i, t -
1]) * gamma[t - 1])
}
}
# Detection model with site + year random effects + observer experience
covariate
beta1 ~ dunif(-5, 5)
beta2 ~ dunif(-5, 5)
# List type effects (categorical)
for (k in 2:nListTypes) {
beta_list[k] ~ dnorm(0, 0.04)
}
beta_list[1] <- 0
# Coefficients for observer experience levels 2, 3, 4 (level 1 as
baseline)
for (k in 2:4) {
beta_exp[k] ~ dunif(-5, 5)
}
for (n in 1:Nobs) {
# Indicator variables for observer experience categories
exp_eff[n] <- equals(exp_obs[n], 2) * beta_exp[2] +
equals(exp_obs[n], 3) * beta_exp[3] +
equals(exp_obs[n], 4) * beta_exp[4]
logit(p[n]) <- logit(mu_p) +
eps_p[year[n]] +
eps_p_site[site[n]] + # site-level random effect
beta1 *
julian_date[n] +
beta2 *
pow(julian_date[n], 2) +
beta_list[list_type[n]] +
exp_eff[n]
# NEW: observer experience
effect
y[n] ~ dbern(z[site[n], year[n]] * p[n])
}
# Derived quantities
psi[1] <- psi1
psi.fs[1] <- sum(z[1:nsite, 1]) / nsite
for (t in 2:nyear) {
psi[t] <- psi[t - 1] * phi[t - 1] + (1 - psi[t - 1]) * gamma[t
- 1]
psi.fs[t] <- sum(z[1:nsite, t]) / nsite
}
}
")
params_vec <- c("psi", "psi.fs", "phi", "gamma", "mu_phi", "mu_gamma", "mu_p", "sd_phi", "sd_gamma", "sd_p", "beta1", "beta2","beta_list", "beta_exp", "eps_p","sd_p_site")
# MCMC settings
na <- 2000
ni <- 30000
nb <- 15000
nt <- 1
nc <- 8
effective_samples <- (ni - nb) / nt * nc
print(paste("Stored samples per parameter:", effective_samples)) # 120000
tinitial<-Sys.time()
z_init <- apply(int_array, c(1, 3), function(x) {
if(all(is.na(x))) return(0L)
return(as.integer(max(x, na.rm=TRUE)))
})
inits_list <- replicate(nc, list(z = z_init), simplify = FALSE)
out_vec <- jags(bdata_vec, inits=inits_list, params_vec, "dynocc_random_effects_fixed.txt",
n.adapt = na, n.chains = nc, n.thin = nt,
n.iter = ni, n.burnin = nb, parallel = TRUE)
### Extraction of model parameters ####
mod_summary<-as.data.frame(out_vec$summary)
mod_summary<-cbind(param=rownames(mod_summary),mod_summary)
# check the Rhat values to increment supplementary iterations
mod_summary[c("param", "i")] <- do.call(rbind, strsplit(mod_summary$param, "[",2))
levels(as.factor(mod_summary$param))
p_keep<-c("psi","psi.fs","mu_phi","mu_gamma","mu_p","sd_phi","sd_gamma","sd_p", "beta_list_length","beta1","beta2","beta_exp","eps_p", "sd_p_site")
conv_check<-mod_summary[mod_summary$param %in% p_keep,]
ext<-1
max_extention<-4
if((max(conv_check$Rhat,na.rm=T)>1.1 | min(conv_check$n.eff)<=400) && ext<=max_extention)
{
print(paste0("extention increment ",ext, " on four maximum due to Rhat ", round(max(conv_check$Rhat,na.rm=T),2)," and n.eff ",min(conv_check$n.eff)))
out_vec <- update(out_vec, n.iter = 5000, parallel = TRUE)
ext<-ext+1
}
mod_summary<-as.data.frame(out_vec$summary)
mod_summary<-cbind(param=rownames(mod_summary),mod_summary)
Thanks,
Thomas Duchesne
Natagora study department
Belgium
To view this discussion visit https://groups.google.com/d/msgid/hmecology/ZR0P278MB08698C192110F8287CA8088BEBD4A%40ZR0P278MB0869.CHEP278.PROD.OUTLOOK.COM.
To view this discussion visit https://groups.google.com/d/msgid/hmecology/ZR0P278MB0869A6F19F9ED5C924232EA1EBD5A%40ZR0P278MB0869.CHEP278.PROD.OUTLOOK.COM.
To view this discussion visit https://groups.google.com/d/msgid/hmecology/CAKMRE2uJkFJKvp98nY5Jn2_K2Si%3D4tsD596hBo4iWLRzxd0k-w%40mail.gmail.com.