Good convergence but poor GOF

Juan Gallego Zamorano

Jul 30, 2024, 8:48:03 AM
to spOccupancy and spAbundance users
Hi Jeff, 

Great that you have created this group! And congrats on the packages!! They are very useful and easy to apply so far. I'm fitting distance models with spAbundance for line transects with only one visit to estimate bird abundances. 

My model seems to converge properly, with R-hat values close to 1 and high ESS for all parameters (attachment 1). However, the GOF check has a p-value of 0, which indicates a very poor fit of the model to the data (attachment 2). I tried the NULL model and the same thing happened, so I added variables that I thought could be relevant for the species. One important thing is that the data are zero-inflated (~70% of transects have true 0s). Do you have any tips for improving the GOF, or for checking what would make the model fit properly?

Thanks in advance!!

Juan

Screenshot 2024-07-30 143923.png
Screenshot 2024-07-30 144437.png

Jeffrey Doser

Jul 30, 2024, 3:05:19 PM
to Juan Gallego Zamorano, spOccupancy and spAbundance users
Hi Juan,

Thanks for the note! Glad to hear you've found the packages easy to use so far. 

Convergence and GoF testing are two separate concepts. Convergence assesses whether the Bayesian MCMC algorithm gets to a point where you can reliably interpret the parameters and feel confident that the algorithm is working properly. GoF assesses whether or not your model is a good representation of the data at hand. It is not uncommon to have a converged model with a bad Bayesian p-value, or a model that has a good Bayesian p-value but hasn't converged. 

Your thought of adding covariates to the model that you think will influence abundance/detection probability is definitely the right idea to improve the model fit. A low Bayesian p-value indicates that there is more dispersion in the data than the model predicts. Your intuition is certainly correct that this problem can often arise when there is zero-inflation in the data, or when there is unexplained spatial variation in abundance and/or detection probability that the model is not currently accounting for. Unfortunately, there is not currently a zero-inflated Poisson distribution to fit in spAbundance (it's on my to-do list to eventually add). However, the package does support a Negative Binomial distribution (set family = 'NB') in the model, which could potentially soak up some of that unexplained variation. Alternatively, you could explore fitting a spatial model with spDS(). This fits the same model as DS(), but also attempts to account for spatial autocorrelation with a spatial random effect. If there is spatial structure in the additional variation in the data that your current model is not accounting for, this could help improve the model fit. So, I would suggest you first try the Negative Binomial distribution and, if that doesn't help, explore fitting a spatial model with spDS(). 
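
To make the dispersion point concrete, here is a minimal sketch in plain Python (not spAbundance code). The counts are made up for illustration: 20 hypothetical transects, 70% of them zeros. It compares the variance-to-mean ratio and the share of zeros with what a Poisson distribution with the same mean would imply.

```python
import math
from statistics import mean, variance

# Hypothetical counts for 20 transects (14 zeros = 70%); not real data.
counts = [0] * 14 + [1, 2, 2, 3, 4, 6]

avg = mean(counts)
# Sample variance divided by the mean; roughly 1 for Poisson-distributed data.
dispersion = variance(counts) / avg
observed_zeros = counts.count(0) / len(counts)
# Probability of a zero count under a Poisson distribution with the same mean.
poisson_zeros = math.exp(-avg)

print(f"mean = {avg:.2f}, variance/mean = {dispersion:.2f}")
print(f"zeros: observed {observed_zeros:.2f}, Poisson expects {poisson_zeros:.2f}")
```

For these made-up counts the ratio is well above 1 and there are more zeros than a Poisson model would expect, which is the kind of mismatch that can pull a Bayesian p-value toward 0.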

In addition to trying those two things, it could be useful to dig more into the results of the posterior predictive check by making some visualizations of the results from ppcAbund(). Check out this section in the intro spOccupancy vignette which shows how to make some plots from the resulting model objects from a single-species occupancy model. Although that is for an occupancy model, the exact same code should work for the distance sampling model (except you use ppcAbund instead of ppcOcc). 
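
As a rough illustration of what the posterior predictive check summarizes, the sketch below computes a Bayesian p-value in plain Python. It assumes the p-value is the proportion of posterior samples in which the fit statistic for the replicated data exceeds the fit statistic for the observed data. The function names and numbers are hypothetical and are not the output of ppcAbund().

```python
import math

def freeman_tukey(counts, expected):
    """Freeman-Tukey discrepancy between counts and their expected values."""
    return sum((math.sqrt(y) - math.sqrt(mu)) ** 2 for y, mu in zip(counts, expected))

def bayesian_p_value(fit_obs, fit_rep):
    """Share of posterior samples where the replicate discrepancy exceeds the observed one."""
    if len(fit_obs) != len(fit_rep):
        raise ValueError("fit_obs and fit_rep need one value per posterior sample")
    exceed = sum(1 for obs, rep in zip(fit_obs, fit_rep) if rep > obs)
    return exceed / len(fit_obs)

# Hypothetical fit statistics for four posterior samples (illustration only).
fit_obs = [12.0, 11.5, 13.2, 12.8]
poor_fit_rep = [6.1, 5.9, 7.0, 6.4]      # replicates always less dispersed than the data
good_fit_rep = [12.5, 11.0, 13.9, 12.1]  # replicates scatter around the observed values

print(bayesian_p_value(fit_obs, poor_fit_rep))  # 0.0
print(bayesian_p_value(fit_obs, good_fit_rep))  # 0.5
```

Plotting the two sets of fit statistics against each other, or looking at which sites contribute most to the observed discrepancy, is usually more informative than the single number.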

Hope that helps!

Jeff

--
Jeffrey W. Doser, Ph.D.
Assistant Professor
Department of Forestry and Environmental Resources
North Carolina State University
Pronouns: he/him/his

Juan Gallego Zamorano

Jul 30, 2024, 3:34:59 PM
to spOccupancy and spAbundance users
Hi Jeff,

Thanks for your clear answer! Good to know that this is not uncommon; I'll keep it in mind.

As for your suggestions, I had already thought about those two :) The NB didn't help at all, so I decided to go the spDS() route, so far without success. I'm constantly getting this error: Error in spDS(abund.formula = abund.formula, det.formula = det.formula, : c++ error: dpotrf failed. There is not much about it (at least according to my search), but I suspect that it is related to the phi parameter and my coordinates. I tried the prior in your vignette of c(3 / max.dist, 3 / min.dist), and I also tried leaving the prior unspecified, but I kept getting the error. I guess I need to keep playing to find the right prior for it. Are you familiar with that error?
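
For background on the error message: dpotrf is the LAPACK routine that computes a Cholesky factorization, which requires a symmetric positive definite matrix. The sketch below, in plain Python, shows one plausible way such a factorization can fail: two sites sharing the same coordinates make an exponential covariance matrix singular. This is an illustration under assumptions (exponential covariance, hypothetical coordinates, an arbitrary tolerance), not the code spDS() runs.

```python
import math

def exp_cov(coords, phi, sigma_sq=1.0):
    """Exponential covariance matrix: sigma_sq * exp(-phi * distance)."""
    return [[sigma_sq * math.exp(-phi * math.dist(a, b)) for b in coords] for a in coords]

def cholesky(matrix, tol=1e-12):
    """Lower-triangular Cholesky factor; raises ValueError if a pivot is not positive.

    The tolerance is an assumption of this sketch so that rounding noise around
    zero is treated as a failure.
    """
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                pivot = matrix[i][i] - s
                if pivot <= tol:
                    raise ValueError("matrix is not positive definite")
                lower[i][j] = math.sqrt(pivot)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower

def is_positive_definite(matrix):
    try:
        cholesky(matrix)
        return True
    except ValueError:
        return False

unique_sites = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
duplicated_sites = [(0.0, 0.0), (1.0, 0.0), (1.0, 0.0)]  # last two sites coincide

print(is_positive_definite(exp_cov(unique_sites, phi=1.0)))      # True
print(is_positive_definite(exp_cov(duplicated_sites, phi=1.0)))  # False
```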

Thanks a lot for sharing the vignette, I will definitely dive into the results more to understand what's going on!

Best regards,

Juan

Jeffrey Doser

Jul 31, 2024, 6:08:27 AM
to Juan Gallego Zamorano, spOccupancy and spAbundance users
Hi Juan,

Bummer that the NB model didn't help. Apologies for the very cryptic error with the spatial model; I should at some point make that a bit more user-friendly. That error indicates that something is going wrong when the model updates the spatial random effects. This can happen for a variety of reasons, but most commonly it is some sort of problem with the spatial coordinates ("coords" in the data list). First, make sure that you have specified the coordinates in a projected coordinate system (i.e., meters or km or something like that) as opposed to latitude/longitude values. Second, make sure that each site (i.e., each of your distance sampling transects/point counts) has its own unique set of spatial coordinates. If you have checked both of those things, it could also be due to the spatial configuration of the sites in your data set. If you have highly clustered data such that there are some locations that are very close together and other locations that are very far apart, you may need to put a more informative prior on the spatial decay parameter phi. This usually involves changing the upper bound of the uniform prior from "3/minimum distance" to something slightly more informative. I give some guidance on how to change the prior on phi in this document here (again in the context of occupancy models, but it all still applies). If none of that helps, feel free to send me your script and data off-list and I can try to see what's going on.
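
The checks described above can be sketched in plain Python as follows. The coordinates and helper names are hypothetical, the longitude/latitude test is only a crude range heuristic, and the bounds follow the 3 / max.dist and 3 / min.dist rule mentioned earlier in the thread; none of this is spAbundance code.

```python
import math
from itertools import combinations

def looks_like_lon_lat(coords):
    """Crude heuristic: every value falls inside valid longitude/latitude ranges."""
    return all(-180 <= x <= 180 and -90 <= y <= 90 for x, y in coords)

def duplicated_sites(coords):
    """Indices of sites whose coordinates repeat an earlier site."""
    seen, dups = set(), []
    for i, xy in enumerate(coords):
        if xy in seen:
            dups.append(i)
        seen.add(xy)
    return dups

def phi_uniform_bounds(coords):
    """Default-style uniform bounds for phi: (3 / max distance, 3 / min distance)."""
    dists = [math.dist(a, b) for a, b in combinations(coords, 2)]
    if min(dists) == 0:
        raise ValueError("duplicate coordinates: minimum distance is zero")
    return 3 / max(dists), 3 / min(dists)

# Hypothetical projected coordinates in meters (pairwise distances 300, 400, 500 m).
coords = [(500000.0, 4649776.0), (500300.0, 4649776.0), (500000.0, 4650176.0)]

print(looks_like_lon_lat(coords))   # False: values are far outside degree ranges
print(duplicated_sites(coords))     # []
lower, upper = phi_uniform_bounds(coords)
print(lower, upper)
```

With an exponential correlation function, the correlation at distance 3 / phi is exp(-3), roughly 0.05, which is why these bounds are expressed as 3 divided by a distance. Lowering the upper bound corresponds to ruling out very short effective ranges.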

Cheers,

Jeff

Juan Gallego Zamorano

Jul 31, 2024, 7:13:45 AM
to spOccupancy and spAbundance users
Hi Jeff,

After some careful checks of the coords, I found that when I calculated the centroids of the transects, the order was changed, so the coordinates didn't match the observations, and I guess that affected the fitting. Now it seems to work, and the first tests show a better GOF, with a p-value around 0.5. It is now running with just a non-informative prior for phi, but I'm going to check your document and see if I can improve it so it runs faster. Thanks a lot again for your help!
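
One way to guard against this kind of mismatch is to align coordinates to the observations by transect ID rather than by row position. A minimal sketch in plain Python, with hypothetical IDs and centroids:

```python
def align_coords(site_ids, centroids_by_id):
    """Return centroids in the same order as site_ids, failing loudly on gaps."""
    missing = [site for site in site_ids if site not in centroids_by_id]
    if missing:
        raise KeyError(f"no centroid for sites: {missing}")
    return [centroids_by_id[site] for site in site_ids]

# Hypothetical transect IDs in the order used for the count data.
site_ids = ["T1", "T2", "T3"]

# Centroids as they might come back from a spatial operation, in a different order.
centroids_by_id = {
    "T3": (500000.0, 4650176.0),
    "T1": (500000.0, 4649776.0),
    "T2": (500300.0, 4649776.0),
}

coords = align_coords(site_ids, centroids_by_id)
print(coords)  # rows now follow T1, T2, T3
```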

Cheers,

Juan

katie spencer

Nov 28, 2024, 8:46:39 AM
to spOccupancy and spAbundance users
Hi, 

Thanks for the great package! I have a query relating to this thread: I have many zeros in my dataset (rare species) but good spatial and temporal replicate coverage with my camera traps. As expected, I am not getting much of a signal from covariates. Is there a way to account for zero-inflation? I tried family = "NB", but I am using the spOccupancy package rather than spAbundance, and I get an error that this isn't part of the model setup. Any help appreciated, thank you! 

Katie 

Jeffrey Doser

Nov 30, 2024, 8:28:28 AM
to spOccupancy and spAbundance users
Hi Katie,

Thanks for the kind words. The negative binomial distribution is for count data, so that's not applicable for fitting occupancy models with detection-nondetection data with the types of models in spOccupancy. An occupancy model by definition is a zero-inflated binomial GLM (i.e., the detection model is effectively accounting for more 0s than expected because of false negatives). If you are finding a not-so-great fit of your model that you think is related to a large number of 0s that your covariates are not accounting for, you could try two things: 
  • Fit a spatial model (if you aren't already). The spatial random effect can do a good job of soaking up additional variation in the distribution of your species that your covariates aren't accounting for. 
  • Attempt to put a site-level random effect in detection probability if you think there is heterogeneity in detection probability that the covariates in your model are not accounting for. 
However, both of those suggestions are fairly "data hungry" and they may be difficult to estimate if you don't have a whole lot of detections. With very few detections of a species, it can be quite difficult to get a model to fit well unless you have very good covariates. 
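
To illustrate why an occupancy model already accommodates extra zeros, the sketch below computes the probability of an all-zero detection history under a basic single-season occupancy model with constant occupancy probability psi and detection probability p. The parameter values are hypothetical, and the function is plain Python rather than anything from spOccupancy.

```python
def prob_all_zero(psi, p, n_visits):
    """P(no detections at a site) = (1 - psi) + psi * (1 - p) ** n_visits."""
    if not (0 <= psi <= 1 and 0 <= p <= 1) or n_visits < 1:
        raise ValueError("psi and p must be probabilities and n_visits at least 1")
    unoccupied = 1 - psi                   # structural zeros: species absent
    missed = psi * (1 - p) ** n_visits     # false negatives: present but never detected
    return unoccupied + missed

# Hypothetical rare species: 30% of sites occupied, 20% detection per visit, 4 visits.
print(round(prob_all_zero(psi=0.3, p=0.2, n_visits=4), 5))  # 0.82288
```

Under these made-up values, about 82% of sites would be expected to show no detections at all, without any additional zero-inflation term.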

Hope that helps,

Jeff
