Convergence and sampling diagnostics in multi-species latent factor spatial model


Aaron Skinner

Feb 6, 2026, 12:43:42 AM
to spOccupancy and spAbundance users

Hi Jeff and all, 

I am running a multi-species latent factor spatial model with the 100 most abundant species in my system (more info on study system & goals). Overall the models seem to be running decently well, but there are several parameters that have Rhats greater than 1.1 or effective sample sizes < 100 (about 10% total). I've attached a Word document with some screenshots of model output, figures, tables, and the model specification. 

The parameters that are converging poorly are mostly the intercepts for both alpha and beta, as well as several of the beta estimates. Unfortunately, the community-level parameters also don't seem to be sampling very well, and the trace plots for the factor loadings show poor convergence for many species.

So I have a few questions about what I might do to improve convergence, sampling efficiency, and GOF. You give at least three suggestions in your 'convergence issues' vignette: 1) consider the order of the species, 2) use more factors, or 3) fit one chain. I’m curious whether you think any of these might be particularly helpful in my case, and how you would approach them in practice.

1) Species order: In the vignette you recommend ordering species based on underlying biology (e.g., functional guilds). How should goodness of fit factor into this decision? My goal is inference on the full bird community, so I’d like biologically meaningful structure to be represented. However, ordering more widespread species (those observed at more sites and across more ecoregions) first consistently improves WAIC. I recognize WAIC reflects predictive performance, but it also loosely tracks model fit (ideally I’d rely on PPCs, though I’m currently limited by RAM). With respect to the recommendation from Carvalho et al. (2008) discussed in the convergence vignette, I’ve included code and output in the Word document—does my interpretation look correct?

2) Number of latent factors: Would you prioritize goodness of fit here, biological interpretability, or some balance of the two? Given that my main interest is community response to land-use change, should factor structure reflect functional guilds, shared responses to disturbance, or something else?

Finally, outside of the three suggested strategies, I’m wondering whether tuning parameters or initial values might help improve sampling efficiency. Acceptance rates are close to the target (~0.43), and tuning values stabilize above 2 by the first report (n.report = 100). I’ve included example output in the document.

Thanks very much for any insight you all might have.
Aaron, PhD Candidate, UBC

Convergence sampling diagnostics spOccupancy.docx

Jeffrey Doser

Feb 17, 2026, 4:01:57 PM
to spOccupancy and spAbundance users
Hi Aaron, 

Sorry for the delay. Have you been able to successfully fit a model with msPGOcc()? This is what I would first suggest doing as this model is substantially simpler and would let you better narrow down how/if you can achieve convergence of the model you currently have specified. Here are some other thoughts: 
  • It's not clear to me how long you've run the model for based on the document you shared (n.batch = 200 it seems, but I don't know what you set the batch.length to be). These models can require hundreds of thousands of MCMC iterations to achieve convergence, so if you're running it for a lot less than 100,000 I'm not too surprised it hasn't converged. 
  • I may be misinterpreting the density plot of the factor loadings that you are showing in the Word document, but is that showing the density of the actual values of the factor loadings? If so, there is certainly an identifiability problem in the model, since the values of the factor loadings span an extremely wide range (-5000 to 5000). I'm not exactly sure what the plot is showing, though. 
  • You should be cautious in interpreting WAIC for models that are far from convergence. 
  • I don't remember the exact approach Carvalho et al. recommend, but generally you should look at which species has the maximum mean value of a given factor loading, set that species to the corresponding order in the y-matrix, and then do that for all the factors. 
  • It might be worthwhile to try to fit a model with fewer factors just to see if you can achieve convergence of the model. 
  • These are complex models, so there is no single best approach for setting the number of latent factors. I would recommend choosing the number of factors based on biology, while also considering potential computational limitations. You can more formally assess the number of factors with a series of WAIC comparisons, but that is probably not the best approach if you're really focused on inference. If you have far too many factors, you would see effectively no "significant" loadings for certain factors (the latter ones). Conversely, if you have far too few factors, you would likely see many significant loadings for all factors in your model, which may signal additional residual spatial structure you're not accounting for. 
  • Tuning parameters won't really have any influence if acceptance rates are close to 0.43 by the end of the MCMC chain. 
  • Initial values could help with convergence of the factor loadings. The factor loadings can be hard to identify, so sometimes more restrictions need to be placed on the initial values. You can do this by manually specifying the initial values for lambda based on a preliminary run. Then, you would want to add a small amount of noise to those values across different MCMC chains so that the chains don't start at exactly the same values. To do that, you would need to run the chains separately by setting n.chains = 1 and running that script three different times with different initial values. Setting n.chains > 1 won't allow you to control the initial values of lambda (unless you fixed them at exactly the same value, which I don't recommend). 
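A minimal sketch of that jittering step (the helper name is mine, not part of spOccupancy, and the constraint handling assumes the usual identifiability restrictions on lambda, i.e., upper triangle fixed at 0 and diagonal fixed at 1):

```r
# Sketch: jitter posterior-mean loadings from a preliminary run to build
# per-chain initial values. `jitter_lambda` is a hypothetical helper.
set.seed(1)  # use a different seed in each chain's script
jitter_lambda <- function(lambda.mean, sd = 0.1) {
  lambda.init <- lambda.mean + rnorm(length(lambda.mean), 0, sd)
  # Re-impose the identifiability constraints on the loadings matrix:
  # upper triangle fixed at 0, leading diagonal fixed at 1.
  lambda.init[upper.tri(lambda.init)] <- 0
  diag(lambda.init) <- 1
  lambda.init
}

# Toy example: 4 species x 2 factors of posterior means from a preliminary run.
lam.mean <- matrix(c(1, 0.4, -0.2, 0.7, 0, 1, 0.3, -0.5), ncol = 2)
lam.init <- jitter_lambda(lam.mean)
# Then pass via inits = list(lambda = lam.init, ...) to the model-fitting
# call with n.chains = 1, repeating with a different seed for each chain.
```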
Jeff

Aaron Skinner

Feb 27, 2026, 12:09:29 AM
to Jeffrey Doser, spOccupancy and spAbundance users
Hi Jeff,

Thanks for your thoughts. I’ve continued playing around to try to achieve better convergence. As I mentioned, I’ve split the dataset into two parts to try to improve identifiability, which has helped. The results and questions below pertain to the forest dataset, estimating a model with 7 latent factors and 213 species (this is the model I’m struggling with more). Model parameters were:
n.burn = 20000, n.thin = 10, n.batch = 1600, batch.length = 25, n.chains = 3
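For reference, here is how those settings translate into iteration and draw counts under spOccupancy's batch-based parameterization (a quick arithmetic sketch):

```r
# Sketch: iteration and posterior-draw counts implied by the settings above.
n.batch <- 1600; batch.length <- 25
n.burn <- 20000; n.thin <- 10; n.chains <- 3

n.iter <- n.batch * batch.length      # 40,000 MCMC iterations per chain
n.save <- (n.iter - n.burn) / n.thin  # 2,000 saved samples per chain
n.post <- n.save * n.chains           # 6,000 total posterior draws
```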

Convergence
You’re correct that increasing the burn-in and the overall number of samples has greatly improved convergence (Rhats & ESS), although some of the community-level parameters still have ESS < 100 and high Rhats (particularly the occupancy variance terms and the occupancy random effect for point count cluster). See 'Convergence plot' in the attached Word doc.

Factor loadings plot 
Yes, I had made a mistake previously. The 'Factor loadings plot' in the attached Word doc shows the mean factor loading across 6000 draws for each species X factor combination. In your previous email you mentioned that the latter factors wouldn’t be "significant" if there were too many factors. All the factor loadings are centered around zero (see below), but what do you mean regarding ‘significance’? 

Goodness of fit 
GOF seems mostly OK at the site level, at least for the community as a whole (Freeman-Tukey Bayesian p-value = 0.384), but the 'per replicate' GOF is poor (Bayesian p-value < 0.1). 60% of the GOF tests 'failed' overall (site & replicate), mostly underpredicting (p < 0.1), and even 47% of the species 'failed' the site-level GOF test. I’ve reproduced the figures from your introductory vignette for site and replicate (see 'Goodness of fit plots'). This suggests that, at the community level, the model adequately represents variation in occurrence and detection probability across space, but fails to adequately represent variation in detection probability across the different replicate surveys. Looking at the sites that are most poorly predicted, I’ve noticed that most of them come from a single data collector, which gives me some indication of what might be going on.
  • The replicate-level GOF PPC seems difficult to link back to specific dates or surveys. Is there a recommended way to diagnose which replicate-level processes are driving poor fit?
  • If my primary goal is inference on community-level occupancy covariate effects (e.g., canopy height), but not replicate-specific detection processes, would this model be defensible for that purpose? Or does the replicate-level lack of fit suggest that occupancy inference may still be biased?
  • Similarly, it wouldn’t really be recommended to do anything species-specific with this model (e.g., predict functional diversity) since so many of the species don’t have adequate GOF?
  • In very large communities, is this degree of lack of fit typical? At this point, I’m trying to distinguish between relatively minor misspecification (e.g., missing detection covariates) versus something more structural in the model formulation or data structure.
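For context, here is a sketch of how I've been breaking the community-level PPC down by species (assuming the ppcOcc() result stores the fit statistics as draws x species matrices named fit.y and fit.y.rep; check str() on the object, since those names are my assumption, and synthetic data stand in for a fitted model here):

```r
# Sketch: per-species Bayesian p-values from posterior-predictive fit statistics.
# fit.y = statistic on observed data, fit.y.rep = statistic on replicated data,
# each assumed to be a draws x species matrix from a multi-species ppcOcc() call.
set.seed(42)
n.draws <- 1000; n.sp <- 5
fit.y     <- matrix(rgamma(n.draws * n.sp, shape = 10), n.draws, n.sp)
fit.y.rep <- matrix(rgamma(n.draws * n.sp, shape = 10), n.draws, n.sp)

bpv.sp <- colMeans(fit.y.rep > fit.y)     # one Bayesian p-value per species
# Flag species with extreme values (poor fit in either direction):
poor.fit <- which(bpv.sp < 0.1 | bpv.sp > 0.9)
```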
Thanks,
Aaron
He/him/his

spOccupancy_follow-up_questions.docx

Jeffrey Doser

Mar 12, 2026, 12:26:51 PM
to Aaron Skinner, spOccupancy and spAbundance users
Hi Aaron, 

Some thoughts below: 
  • Convergence: 40,000 iterations may simply not be enough to run the model. Things are fairly close to convergence, which suggests to me there is no underlying identifiability problem, and rather that you just need to run the model longer. It is not unusual to have to run these models for more than 100,000 iterations. 
  • Significance: by significant, I just mean that a large portion of the posterior distribution does not overlap zero. A rough check would be to see if any 95% credible intervals overlap zero. Another check would be to look at the probability an effect is positive. If all of those values are near 0.5, then things are very tightly centered around zero. If you have some very close to 1 or some very close to 0, then those could be considered "significant". 
  • GoF
    • I don't have any recommended way of how to do this. It is one of those things that is highly specific to each data set. 
    • This all depends on your willingness to accept that the species-level estimates may be inaccurate. The accuracy of the community-level parameters of course depends on how well the model characterizes species-specific patterns. If the model does a terrible job predicting estimates for a large number of species, then your community-level estimates may not be very reliable. Whether or not you can defend the model also depends on what other approaches you could use to answer your question. If there are not many alternatives, then perhaps you might be more willing to accept the potential model mis-specification. 
    • If a large number of species-specific models are showing poor goodness-of-fit tests, then I would be hesitant to draw strong conclusions about species-specific quantities. Of course, it might be worthwhile to explore including additional covariates in the model to try to explain the extra heterogeneity in detection probability that appears to be present. When there is a large number of rare species in a data set, GoF tests will often return poor results because it is very difficult to generate good estimates for those species, particularly if their detection probability is very low. 
    • As mentioned above, yes, the lack of fit is common when there are large numbers of rare species in the analysis. My guess is that this is primarily related to detection heterogeneity that is unaccounted for in the model, which I would urge you to explore a bit more and see if you can accommodate with additional covariates or random effect structures. 
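The two quick "significance" checks described above can be sketched as follows (synthetic draws stand in for one species-by-factor loading from the model):

```r
# Sketch: two rough "significance" checks on a vector of posterior draws
# for a single factor loading (synthetic stand-in for real MCMC output).
set.seed(7)
draws <- rnorm(4000, mean = 0.3, sd = 0.4)

ci <- quantile(draws, probs = c(0.025, 0.975))  # does the 95% CI overlap zero?
p.pos <- mean(draws > 0)                        # probability the loading is positive
# p.pos near 0.5 -> tightly centered on zero;
# p.pos near 0 or 1 -> could be considered "significant".
```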
Jeff
--
Jeffrey W. Doser, Ph.D.
Assistant Professor
Department of Forestry and Environmental Resources
North Carolina State University
Pronouns: he/him/his

Aaron Skinner

Mar 26, 2026, 12:28:56 AM
to Jeffrey Doser, spOccupancy and spAbundance users
Hi Jeff, 


Convergence: I had run 120,000 iterations total, but it sounds like you mean 100,000+ iterations per chain? Increasing the number of samples drawn from the posterior did help with convergence; good suggestion!

n.burn = 40000, n.thin = 10, n.batch = 4000, batch.length = 25, n.chains = 3
Significance: I’m sorry, I’m still confused about the ‘significance’ of the factor loadings. I understand what you mean regarding intervals crossing zero or the probability an effect is positive, but all of my factors are centered around zero (if I’m interpreting the model output correctly). The lambda.samples correspond to the factor loadings, right? You can see here that the factor loadings all have density curves that broadly overlap zero.
Captura de pantalla 2026-03-25 a la(s) 21.27.14.png

GOF: Regarding the goodness-of-fit issue, I had made a mistake and a portion of the detection data had been scrambled, so now I’m seeing improved GOF for the replicates as well (Bayesian p = 0.459). 

Thanks,
Aaron
He/him/his

Jeffrey Doser

Mar 30, 2026, 5:24:13 AM
to Aaron Skinner, spOccupancy and spAbundance users
Hi Aaron, 

It's not particularly clear to me what you are plotting in that figure. Each species has its own distinct set of lambda (factor loadings) for each of the factors, so I'm not sure what the density plots are that you're showing, since there is only one density curve for each factor. Either you're showing it for one species, or you're summarizing the factors across species, which is not what I was suggesting. What I was saying was more akin to making a plot like the one you showed for each species, where each curve shows the posterior distribution for the factor loading for that given species. If all of them across all species are very close to 0, then that suggests little support for inclusion of the factors in the model (or a limited ability to estimate them from the data). Making all those plots would be cumbersome, so you could also just look at a posterior summary of the factor loadings. Something like summary(out$lambda.samples) gives a quick summary and 95% CI for each of the species-specific factor loadings. If you quickly scan that, you should be able to gauge whether there are any "significant" effects or not. 
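A minimal sketch of that scan (a synthetic matrix stands in for out$lambda.samples, which in a real fit is a coda mcmc object whose columns are the species-by-factor loadings; the column naming here is illustrative only):

```r
# Sketch: flag species-specific factor loadings whose 95% CI excludes zero.
# Synthetic draws x loadings matrix stands in for out$lambda.samples.
set.seed(3)
lambda.samples <- matrix(
  rnorm(4000 * 6, sd = 0.5), nrow = 4000,
  dimnames = list(NULL, paste0("sp", rep(1:3, 2), "-f", rep(1:2, each = 3)))
)

# 2 x (species x factor) matrix of lower and upper 95% CI bounds.
ci <- apply(lambda.samples, 2, quantile, probs = c(0.025, 0.975))
# Loadings whose entire 95% CI sits above or below zero.
sig <- colnames(lambda.samples)[ci[1, ] > 0 | ci[2, ] < 0]
sig
```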

Jeff