I am modeling the abundance of Culex mosquitoes (vectors of West Nile virus) in southern Spain using INLA. The dataset covers two years of sampling, with repeated mosquito counts at spatial locations. The response variable is the number of mosquitoes captured per sampling event.
The data are highly overdispersed and zero-heavy: many locations have zero counts (especially outside rice-growing areas), while counts at some sites exceed 4,000. Ecologically, mosquito presence and abundance are strongly linked to rice fields, which explains the large spatial heterogeneity and the structural zeros. I am fitting a spatio-temporal model with an SPDE spatial random field and a seasonal effect, using a formulation such as:
ns(DOY, knots = c(120, 200, 280)) for seasonality
f(spatial, model = spde) for spatial structure
Zero-inflated negative binomial (zeroinflatednbinomial1) as the response distribution
Although the model converges, the fitted results still show very strong overdispersion, and the estimated zero-inflation probability is only ~6%, which suggests that few of the observed zeros are attributed to the zero-inflated component.
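To make the zero-inflation question concrete, here is a minimal base-R sketch of how a type-1 zero-inflated negative binomial splits the probability of a zero between the zero-inflation component and the negative binomial itself. The mean (mu = 50) and size (0.1) are hypothetical values chosen only for illustration, not estimates from my model; only the zero-inflation probability of 0.06 echoes the ~6% from my fit. The helper names nb_zero_prob and zinb1_zero_prob are also my own.

```r
# Probability of a zero under a plain negative binomial with mean mu and
# size k: P(Y = 0) = (k / (k + mu))^k.
nb_zero_prob <- function(mu, size) dnbinom(0, size = size, mu = mu)

# Probability of a zero under a type-1 zero-inflated NB (mixture form):
# P(Y = 0) = pi0 + (1 - pi0) * P_NB(0).
zinb1_zero_prob <- function(pi0, mu, size) {
  pi0 + (1 - pi0) * nb_zero_prob(mu, size)
}

mu   <- 50    # hypothetical mean count (assumption)
size <- 0.1   # hypothetical small size, i.e. strong overdispersion (assumption)
pi0  <- 0.06  # zero-inflation probability of about 6%, as in my fit

p_nb    <- nb_zero_prob(mu, size)          # zeros generated by the NB alone
p_total <- zinb1_zero_prob(pi0, mu, size)  # all zeros under the mixture
share_from_zi <- pi0 / p_total             # fraction of zeros due to inflation

print(round(c(nb_zero = p_nb, total_zero = p_total, share_zi = share_from_zi), 3))
```

With a size parameter this small, the negative binomial alone places more than half of its probability mass at zero, so a small zero-inflation probability is not necessarily at odds with many observed zeros.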
My questions are:
Is this behavior expected in highly heterogeneous ecological abundance data like this?
How should I interpret a small estimated zero-inflation probability in the presence of many observed zeros?
Would alternative strategies (e.g. standard negative binomial, hurdle models, additional random effects, or different seasonal structures) be more appropriate in this context?
Any advice or suggestions would be greatly appreciated.
Thank you.
Thanks for the helpful suggestions so far, @Håvard and Bob O'Hara.
Based on the discussion, I implemented an alternative two-process (hurdle-type) approach in INLA: first modeling presence–absence (N > 0) using a binomial likelihood with an SPDE spatial field, and then modeling the positive counts with a negative binomial model that includes climate covariates, seasonality, and spatial structure. Overall abundance is obtained as
E(N) = P(presence) × E(N | presence)
that is, the expected mosquito abundance is the probability that mosquitoes are present multiplied by the expected number of mosquitoes given that they are present. This avoids using an explicit zero-inflation parameter. The approach appears to handle structural zeros better than the zero-inflated NB or 0poisson models, although I still observe mild residual overdispersion, and PIT diagnostics indicate that some high counts are under-predicted.
I also experimented with the 0poisson likelihood, which seems to improve overall fit, but I am unsure how to interpret its treatment of zeros and whether it is appropriate when ecological zeros arise from both habitat unsuitability and stochastic abundance variation.
Does this hurdle-type strategy seem reasonable for highly aggregated mosquito abundance data, or would you recommend further refinements?
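For reference, a minimal base-R sketch of how the two parts combine into an expected abundance. All numbers are hypothetical placeholders for three sites, not output from my models. The sketch also shows one point worth checking: if the count part is treated as a zero-truncated negative binomial, the conditional mean is mu / (1 - P_NB(0)) rather than mu itself, and whether that correction applies depends on how the positive-count model was actually fitted.

```r
# Hypothetical fitted values at three sites (assumptions, not model output).
p_presence <- c(0.05, 0.40, 0.95)  # P(N > 0) from the binomial part
mu_nb      <- c(2, 30, 800)        # NB mean parameter from the count part
size       <- 0.5                  # hypothetical NB size parameter

# Plain product, treating the NB mean as E(N | N > 0).
expected_plain <- p_presence * mu_nb

# Variant assuming a zero-truncated NB for the positive counts:
# E(N | N > 0) = mu / (1 - P_NB(0)).
p0 <- dnbinom(0, size = size, mu = mu_nb)
mean_given_presence <- mu_nb / (1 - p0)
expected_truncated <- p_presence * mean_given_presence

print(data.frame(p_presence, mu_nb, expected_plain, expected_truncated))
```

The two versions differ most where the NB mean is small, because that is where an untruncated negative binomial would itself put substantial mass at zero.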