Convergence and sampling diagnostics in multi-species latent factor spatial model


Aaron Skinner

Feb 6, 2026, 12:43:42 AM
to spOccupancy and spAbundance users

Hi Jeff and all, 

I am running a multi-species latent factor spatial model with the 100 most abundant species in my system (more info on study system & goals). Overall the models seem to be running decently well, but there are several parameters that have Rhats greater than 1.1 or effective sample sizes < 100 (about 10% total). I've attached a Word document with some screenshots of model output, figures, tables, and the model specification. 

The parameters that are converging poorly are mostly the intercepts for both alpha and beta, as well as several of the beta estimates. Unfortunately, the community-level parameters also don't seem to be sampling very well, and the trace plots for the factor loadings show poor convergence for many species.

So I have a few questions about what I might do to improve convergence, sampling efficiency, and GOF. You give at least three suggestions in your 'convergence issues' vignette: 1) consider the order of the species, 2) use more factors, or 3) fit one chain. I’m curious whether you think any of these might be particularly helpful in my case, and how you would approach them in practice.

1) Species order: In the vignette you recommend ordering species based on underlying biology (e.g., functional guilds). How should goodness of fit factor into this decision? My goal is inference on the full bird community, so I’d like biologically meaningful structure to be represented. However, ordering more widespread species (those observed at more sites and across more ecoregions) first consistently improves WAIC. I recognize WAIC reflects predictive performance, but it also loosely tracks model fit (ideally I’d rely on PPCs, though I’m currently limited by RAM). With respect to the recommendation from Carvalho et al. (2008) discussed in the convergence vignette, I’ve included code and output in the Word document—does my interpretation look correct?

2) Number of latent factors: Would you prioritize goodness of fit here, biological interpretability, or some balance of the two? Given that my main interest is community response to land-use change, should factor structure reflect functional guilds, shared responses to disturbance, or something else?

Finally, outside of the three suggested strategies, I’m wondering whether tuning parameters or initial values might help improve sampling efficiency. Acceptance rates are close to the target (~0.43), and tuning values stabilize above 2 by the first report (n.report = 100). I’ve included example output in the document.

Thanks very much for any insight you all might have.
Aaron, PhD Candidate, UBC

Convergence sampling diagnostics spOccupancy.docx

Jeffrey Doser

Feb 17, 2026, 4:01:57 PM
to spOccupancy and spAbundance users
Hi Aaron, 

Sorry for the delay. Have you been able to successfully fit a model with msPGOcc()? This is what I would first suggest doing as this model is substantially simpler and would let you better narrow down how/if you can achieve convergence of the model you currently have specified. Here are some other thoughts: 
  • It's not clear to me how long you've run the model for based on the document you shared (n.batch = 200 it seems, but I don't know what you set the batch.length to be). These models can require hundreds of thousands of MCMC iterations to achieve convergence, so if you're running it for a lot less than 100,000 I'm not too surprised it hasn't converged. 
  • I may be misinterpreting the density plot of the factor loadings that you are showing in the Word document, but is that showing the density of the actual values of the factor loadings? If so, there is certainly an identifiability problem in the model, since the values of the factor loadings span an extremely wide range (-5000 to 5000). I'm not exactly sure what the plot is showing, though. 
  • You should be cautious in interpreting WAIC for models that are far from convergence. 
  • I don't remember the exact approach Carvalho et al. recommend, but generally you should look at which species has the maximum mean value of a given factor loading, set that species to the corresponding order in the y-matrix, and then do that for all the factors. 
  • It might be worthwhile to try to fit a model with fewer factors just to see if you can achieve convergence of the model. 
  • These are complex models, so there is no single best approach for setting the number of latent factors. I would recommend choosing the number of factors based on biology, while also considering potential computational limitations. You can more formally assess the number of factors with a series of WAIC comparisons, but that is probably not the best approach if you're really focused on inference. If you have far too many factors, you would see effectively no "significant" loadings for certain factors (the latter ones). Conversely, if you have far too few factors, you would likely see many significant loadings for all factors in your model, which may signal additional residual spatial structure you're not accounting for. 
  • Tuning parameters won't really have any influence if acceptance rates are close to 0.43 by the end of the MCMC chain. 
  • Initial values could help with convergence of the factor loadings. The factor loadings can be hard to identify, so sometimes more restrictions need to be placed on the initial values. You can do this by manually specifying the initial values for lambda based on a preliminary run. Then, you would want to add a small amount of noise to those values across different MCMC chains so that the chains don't start at exactly the same values. To do that, you would need to run the chains separately by setting n.chains = 1 and running that script three different times with different initial values. Setting n.chains > 1 won't allow you to control the initial values of lambda (unless you fixed them at exactly the same value, which I don't recommend). 
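A minimal sketch of that jittering step (the helper name is mine, not part of spOccupancy, and the constraint handling assumes the usual identifiability restrictions on lambda, i.e., upper triangle fixed at 0 and diagonal fixed at 1):

```r
# Sketch: jitter posterior-mean loadings from a preliminary run to build
# per-chain initial values. `jitter_lambda` is a hypothetical helper.
set.seed(1)  # use a different seed in each chain's script
jitter_lambda <- function(lambda.mean, sd = 0.1) {
  lambda.init <- lambda.mean + rnorm(length(lambda.mean), 0, sd)
  # Re-impose the identifiability constraints on the loadings matrix:
  # upper triangle fixed at 0, leading diagonal fixed at 1.
  lambda.init[upper.tri(lambda.init)] <- 0
  diag(lambda.init) <- 1
  lambda.init
}

# Toy example: 4 species x 2 factors of posterior means from a preliminary run.
lam.mean <- matrix(c(1, 0.4, -0.2, 0.7, 0, 1, 0.3, -0.5), ncol = 2)
lam.init <- jitter_lambda(lam.mean)
# Then pass via inits = list(lambda = lam.init, ...) to the model-fitting
# call with n.chains = 1, repeating with a different seed for each chain.
```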
Jeff

Aaron Skinner

Feb 27, 2026, 12:09:29 AM
to Jeffrey Doser, spOccupancy and spAbundance users
Hi Jeff,

Thanks for your thoughts. I’ve continued playing around to try to achieve better convergence. As I mentioned, I’ve split the dataset into two parts to try to improve identifiability, which has helped. The results and questions below pertain to the forest dataset, estimating a model with 7 latent factors and 213 species (this is the model I’m struggling with more). Model parameters were:
n.burn = 20000, n.thin = 10, n.batch = 1600, batch.length = 25, n.chains = 3
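For reference, here is how those settings translate into iteration and draw counts under spOccupancy's batch-based parameterization (a quick arithmetic sketch):

```r
# Sketch: iteration and posterior-draw counts implied by the settings above.
n.batch <- 1600; batch.length <- 25
n.burn <- 20000; n.thin <- 10; n.chains <- 3

n.iter <- n.batch * batch.length      # 40,000 MCMC iterations per chain
n.save <- (n.iter - n.burn) / n.thin  # 2,000 saved samples per chain
n.post <- n.save * n.chains           # 6,000 total posterior draws
```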

Convergence
You’re correct that increasing the burn-in and the overall number of samples has greatly improved convergence (Rhats & ESS), although some of the community-level parameters still have ESS < 100 and high Rhats (particularly the occupancy variance terms and the occupancy random effect for point count cluster). See 'Convergence plot' in the attached Word doc.

Factor loadings plot 
Yes, I had made a mistake previously. The 'Factor loadings plot' in the attached Word doc shows the mean factor loading across 6000 draws for each species X factor combination. In your previous email you mentioned that the latter factors wouldn’t be "significant" if there were too many factors. All the factor loadings are centered around zero (see below), but what do you mean regarding ‘significance’? 

Goodness of fit 
GOF seems mostly OK at the site level, at least for the community as a whole (Freeman-Tukey Bayesian p-value = 0.384), but the 'per replicate' GOF is poor (Bayesian p-value < 0.1). 60% of the GOF tests 'failed' overall (site & replicate), mostly underpredicting (p < 0.1), and even 47% of the species 'failed' the site-level GOF test. I’ve reproduced the figures from your introductory vignette for site and replicate (see 'Goodness of fit plots'). This suggests that, at the community level, the model adequately represents variation in occurrence and detection probability across space, but fails to adequately represent variation in detection probability across the different replicate surveys. Looking at the sites that are most poorly predicted, I’ve noticed that most of them come from a single data collector, which gives me some indication of what might be going on.
  • The replicate-level GOF PPC seems difficult to link back to specific dates or surveys. Is there a recommended way to diagnose which replicate-level processes are driving poor fit?
  • If my primary goal is inference on community-level occupancy covariate effects (e.g., canopy height), but not replicate-specific detection processes, would this model be defensible for that purpose? Or does the replicate-level lack of fit suggest that occupancy inference may still be biased?
  • Similarly, it wouldn’t really be recommended to do anything species-specific with this model (e.g., predict functional diversity) since so many of the species don’t have adequate GOF?
  • In very large communities, is this degree of lack of fit typical? At this point, I’m trying to distinguish between relatively minor misspecification (e.g., missing detection covariates) versus something more structural in the model formulation or data structure.
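For context, here is a sketch of how I've been breaking the community-level PPC down by species (assuming the ppcOcc() result stores the fit statistics as draws x species matrices named fit.y and fit.y.rep; check str() on the object, since those names are my assumption, and synthetic data stand in for a fitted model here):

```r
# Sketch: per-species Bayesian p-values from posterior-predictive fit statistics.
# fit.y = statistic on observed data, fit.y.rep = statistic on replicated data,
# each assumed to be a draws x species matrix from a multi-species ppcOcc() call.
set.seed(42)
n.draws <- 1000; n.sp <- 5
fit.y     <- matrix(rgamma(n.draws * n.sp, shape = 10), n.draws, n.sp)
fit.y.rep <- matrix(rgamma(n.draws * n.sp, shape = 10), n.draws, n.sp)

bpv.sp <- colMeans(fit.y.rep > fit.y)     # one Bayesian p-value per species
# Flag species with extreme values (poor fit in either direction):
poor.fit <- which(bpv.sp < 0.1 | bpv.sp > 0.9)
```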
Thanks,
Aaron
He/him/his

spOccupancy_follow-up_questions.docx

Jeffrey Doser

Mar 12, 2026, 12:26:51 PM
to Aaron Skinner, spOccupancy and spAbundance users
Hi Aaron, 

Some thoughts below: 
  • Convergence: 40,000 iterations may simply not be enough to run the model. Things are fairly close to convergence, which suggests to me there is no underlying identifiability problem, and rather that you just need to run the model longer. It is not unusual to have to run these models for more than 100,000 iterations. 
  • Significance: by significant, I just mean that a large portion of the posterior distribution does not overlap zero. A rough check would be to see if any 95% credible intervals overlap zero. Another check would be to look at the probability an effect is positive. If all of those values are near 0.5, then things are very tightly centered around zero. If you have some very close to 1 or some very close to 0, then those could be considered "significant". 
  • GoF
    • I don't have any recommended way of how to do this. It is one of those things that is highly specific to each data set. 
    • This all depends on your willingness to accept that the species-level estimates may be inaccurate. The accuracy of the community-level parameters of course depends on how well the model characterizes species-specific patterns. If the model does a terrible job predicting estimates for a large number of species, then your community-level estimates may not be very reliable. Whether or not you can defend the model also depends on what other approaches you could use to answer your question. If there are not many alternatives, then perhaps you might be more willing to accept the potential model mis-specification. 
    • If a large number of species-specific models are showing poor goodness-of-fit tests, then I would be hesitant to draw strong conclusions about species-specific quantities. Of course, it might be worthwhile to explore including additional covariates in the model to try to explain the extra heterogeneity in detection probability that appears to be present. When there is a large number of rare species in a data set, GoF tests will often return poor results because it is very difficult to generate good estimates for those species, particularly if their detection probability is very low. 
    • As mentioned above, yes, the lack of fit is common when there are large numbers of rare species in the analysis. My guess is that this is primarily related to detection heterogeneity that is unaccounted for in the model, which I would urge you to explore a bit more and see if you can accommodate with additional covariates or random effect structures. 
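The two quick "significance" checks described above can be sketched as follows (synthetic draws stand in for one species-by-factor loading from the model):

```r
# Sketch: two rough "significance" checks on a vector of posterior draws
# for a single factor loading (synthetic stand-in for real MCMC output).
set.seed(7)
draws <- rnorm(4000, mean = 0.3, sd = 0.4)

ci <- quantile(draws, probs = c(0.025, 0.975))  # does the 95% CI overlap zero?
p.pos <- mean(draws > 0)                        # probability the loading is positive
# p.pos near 0.5 -> tightly centered on zero;
# p.pos near 0 or 1 -> could be considered "significant".
```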
Jeff
--
Jeffrey W. Doser, Ph.D.
Assistant Professor
Department of Forestry and Environmental Resources
North Carolina State University
Pronouns: he/him/his

Aaron Skinner

Mar 26, 2026, 12:28:56 AM
to Jeffrey Doser, spOccupancy and spAbundance users
Hi Jeff, 


Convergence: I had run 120,000 iterations total, but it sounds like you mean 100,000+ iterations per chain? Increasing the number of samples drawn from the posterior did help with convergence; good suggestion!

n.burn = 40000, n.thin = 10, n.batch = 4000, batch.length = 25, n.chains = 3
Significance: I’m sorry, I’m still confused about the ‘significance’ of the factor loadings. I understand what you mean regarding intervals crossing zero or the probability an effect is positive, but all of my factors are centered around zero (if I’m interpreting the model output correctly). The lambda.samples correspond to the factor loadings, right? You can see here that the factor loadings all have density curves that broadly overlap zero.
Captura de pantalla 2026-03-25 a la(s) 21.27.14.png

GOF: Regarding the goodness-of-fit issue, I had made a mistake and a portion of the detection data had been scrambled, so now I’m seeing improved GOF for the replicates as well (Bayesian p = 0.459). 

Thanks,
Aaron
He/him/his

Jeffrey Doser

Mar 30, 2026, 5:24:13 AM
to Aaron Skinner, spOccupancy and spAbundance users
Hi Aaron, 

It's not particularly clear to me what you are plotting in that figure. Each species has its own distinct set of lambda (factor loadings) for each of the factors, so I'm not sure what the density plots are that you're showing, since there is only one density curve for each factor. Either you're showing it for one species, or you're summarizing the factors across species, which is not what I was suggesting. What I was saying was more akin to making a plot like the one you showed for each species, where each curve shows the posterior distribution for the factor loading for that given species. If all of them across all species are very close to 0, then that suggests little support for inclusion of the factors in the model (or a limited ability to estimate them from the data). Making all those plots would be cumbersome, so you could also just look at a posterior summary of the factor loadings. Something like summary(out$lambda.samples) gives a quick summary and 95% CI for each of the species-specific factor loadings. If you quickly scan that, you should be able to gauge whether there are any "significant" effects or not. 
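A minimal sketch of that scan (a synthetic matrix stands in for out$lambda.samples, which in a real fit is a coda mcmc object whose columns are the species-by-factor loadings; the column naming here is illustrative only):

```r
# Sketch: flag species-specific factor loadings whose 95% CI excludes zero.
# Synthetic draws x loadings matrix stands in for out$lambda.samples.
set.seed(3)
lambda.samples <- matrix(
  rnorm(4000 * 6, sd = 0.5), nrow = 4000,
  dimnames = list(NULL, paste0("sp", rep(1:3, 2), "-f", rep(1:2, each = 3)))
)

# 2 x (species x factor) matrix of lower and upper 95% CI bounds.
ci <- apply(lambda.samples, 2, quantile, probs = c(0.025, 0.975))
# Loadings whose entire 95% CI sits above or below zero.
sig <- colnames(lambda.samples)[ci[1, ] > 0 | ci[2, ] < 0]
sig
```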

Jeff