overdispersion - "good fit / bad prediction dilemma"


Walter Di Nicola

Jun 27, 2025, 9:48:13 AM
to unmarked

Hello! 

I am working on an N-mixture model and I selected a model whose predictions closely align with our expectations and with what we see in the field. I based my model selection mainly on the ecological relevance of my variables, but I also considered the c-hat. However, looking at the c-hat of this model (= 2.69), there seem to be some overdispersion issues. I therefore tried to change the distribution, and while the negative binomial definitely improves the c-hat, it produces predictions that are ecologically very unrealistic.
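Roughly, what I did looks like this (the data and covariate names are just placeholders, not my actual variables; I obtained c-hat from a parametric-bootstrap goodness-of-fit test):

library(unmarked)
library(AICcmodavg)

## Counts: sites x repeat visits, plus site and observation covariates
umf <- unmarkedFramePCount(y = counts,
                           siteCovs = data.frame(habitat = habitat, elev = elev),
                           obsCovs = list(effort = effort))

## Poisson N-mixture model: detection ~ effort, abundance ~ habitat + elev
fm_p  <- pcount(~ effort ~ habitat + elev, data = umf, mixture = "P",  K = 150)

## Same structure with a negative binomial mixture
fm_nb <- pcount(~ effort ~ habitat + elev, data = umf, mixture = "NB", K = 150)

## Parametric-bootstrap GOF test; returns an estimate of c-hat
gof_p <- Nmix.gof.test(fm_p, nsim = 100)
gof_p$c.hat.est   # this is where c-hat = 2.69 comes from for the Poisson model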

I read in the literature about the “good fit / bad prediction dilemma”, and in particular from Kéry & Royle (2015) I understood that such models can sometimes be statistically imperfect (i.e. overdispersed) but still provide adequate predictions.

As I am mainly interested in prediction accuracy rather than statistical inference, can I accept a degree of overdispersion and interpret it as unstructured noise in the data? And can I use that as a justification for selecting this model?

Thank you very much for your help!

Walter 

Jeffrey Royle

Jun 27, 2025, 6:33:23 PM
to unma...@googlegroups.com
hi Walter,
I don't think that's a huge amount of lack of fit. One idea is that you can adjust the uncertainty of your predictions and estimates using the quasi-likelihood adjustment for overdispersion; see AHM1, p. 378:
"However, as we have argued in Chapter 6, in general, if a
fitting overdispersed model predicts unrealistic abundance, then we may settle for the poor fitting
model and some kind of a quasi-likelihood adjustment for lack of fit (“lack-of-fit ratio”) to inflate
the standard errors associated with parameter estimates (see Section 6.9; Chapter 5 in Cooch and
White, 2014; Section 12.3 in Ke´ry and Schaub, 2012). This may be an adequate solution if we only
care about assessing the significance of some covariate effects or account for the additional
uncertainty induced by such “overdispersion” when making predictions (see below)."
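In practice the adjustment just inflates the SEs by sqrt(c-hat); with AICcmodavg that can look something like this (a sketch, with object names as placeholders):

library(AICcmodavg)

c.hat <- 2.69   # estimated overdispersion from the GOF bootstrap

## Parameter estimates reported with SEs and CIs inflated by sqrt(c-hat)
summaryOD(fm_p, c.hat = c.hat)

## Equivalently, by hand: quasi-likelihood SE = sqrt(c-hat) * naive SE
se_naive <- sqrt(diag(vcov(fm_p)))
se_adj   <- sqrt(c.hat) * se_naive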

Another idea that is not widely used in the unmarked universe is to quit worrying about fit altogether and just pick models based on predictive ability. You can use cross-validation, for example, to compute a measure of predictive quality for each model and simply rank them. Of course, this is what people do in the ML/AI universe, where no one is ever concerned with whether models fit or not. I don't fully understand why that is, but it seems to be the case.
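If you want to try that, recent versions of unmarked have a crossVal() method (a sketch; the number of folds is arbitrary):

## K-fold cross-validation; compare RMSE/MAE across the candidate models
cv_p  <- crossVal(fm_p,  method = "Kfold", folds = 5)
cv_nb <- crossVal(fm_nb, method = "Kfold", folds = 5)
cv_p
cv_nb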

Finally, and I kind of hate to say this, but if your lack of fit is mainly driven by a small number of observations ("outliers"), then you might be able to justify omitting those from the model fitting.
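For example, you could screen for sites that dominate a chi-square-type discrepancy (a rough sketch only, not a formal procedure):

## Per-site contribution to a chi-square-type discrepancy
obs   <- getY(umf)                    # observed counts (sites x visits)
expct <- fitted(fm_p)                 # expected counts under the fitted model
chi_site <- rowSums((obs - expct)^2 / (expct + 0.5), na.rm = TRUE)

## Inspect the most extreme sites, then refit without them if that is defensible
head(sort(chi_site, decreasing = TRUE))
drop_sites <- which(chi_site > quantile(chi_site, 0.99))
fm_p_trim  <- pcount(~ effort ~ habitat + elev,
                     data = umf[-drop_sites, ], mixture = "P", K = 150)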

regards
andy




Walter Di Nicola

Jun 30, 2025, 6:32:01 AM
to unmarked
Hi Andy, 

Thank you so much for your quick and detailed answer. I like the option of adjusting the uncertainty using the quasi-likelihood approach - it seems to work well for my data.

I wanted to briefly summarise my model selection process, and if you have a moment, I would really appreciate your feedback to make sure I did everything correctly.

I first selected a subset of models based on the ecological relevance of the variables. Within this subset, I chose the best model using QAICc, in order to account for overdispersion in the model selection step. For inference, I adjusted my SEs (and thus the p-values) using the quasi-likelihood approach. For prediction, I also accounted for overdispersion using the modavgPred() function, following the recommendation in Kéry & Royle (2015).
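In code, the sequence looks roughly like this (placeholder model names; c-hat comes from the GOF bootstrap mentioned above):

library(AICcmodavg)

cand  <- list(m1 = fm1, m2 = fm2, m3 = fm3)   # ecologically plausible candidates
c.hat <- 2.69

## QAICc model selection table (c.hat > 1 switches AICc to QAICc)
aictab(cand.set = cand, c.hat = c.hat)

## Overdispersion-adjusted SEs / p-values for the selected model
summaryOD(cand$m1, c.hat = c.hat)

## Model-averaged abundance predictions with SEs inflated by sqrt(c-hat)
pred <- modavgPred(cand.set = cand, newdata = newdat,
                   parm.type = "lambda", type = "response", c.hat = c.hat)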

This way, I believe I have addressed the moderate level of overdispersion in both the inference and prediction stages of the analysis.  

Do you think this would be considered good practice?

Thanks a lot again!

Walter 