INLA_MODELING ABUNDANCE OF MOSQUITOES

Shirin Taheri

Feb 10, 2026, 2:23:13 PM
to R-inla discussion group
Dear all, 

I am modeling the abundance of Culex mosquitoes (vectors of West Nile virus) in southern Spain using INLA. The dataset covers two years of sampling, with repeated mosquito counts at spatial locations. The response variable is the number of mosquitoes captured per sampling event.

The data are highly overdispersed and zero-heavy: many locations have zero counts (especially outside rice-growing areas), while in some sites mosquito abundance reaches values above 4,000. Ecologically, mosquito presence and abundance are strongly linked to rice fields, which explains the large spatial heterogeneity and structural zeros. I am fitting a spatio-temporal model using an SPDE spatial random field and seasonal effects, with a formulation such as:

  • ns(DOY, knots = c(120, 200, 280)) for seasonality

  • f(spatial, model = spde) for spatial structure

  • Zero-inflated negative binomial (zeroinflatednbinomial1) as the response distribution

Although the model converges, I still observe very strong overdispersion in the fitted results, and the estimated zero-inflation parameter suggests that only ~6% of the zeros are explained by the zero-inflated component.

My questions are:

  1. Is this behavior expected in highly heterogeneous ecological abundance data like this?

  2. How should I interpret a small estimated zero-inflation probability in the presence of many observed zeros?

  3. Would alternative strategies (e.g. standard negative binomial, hurdle models, additional random effects, or different seasonal structures) be more appropriate in this context?

Any advice or suggestions would be greatly appreciated.

Thank you.

Helpdesk (Haavard Rue)

Feb 11, 2026, 3:54:50 AM
to Shirin Taheri, R-inla discussion group
Hi there,

It's hard to tell in general. If the overdispersion is 'too high' it might be a
sign that the model does not fit very well, and it chose to absorb model-error
into this term.

In general, I would consider this model

inla.doc("0poisson")

as it allows for its own model in the overdispersion. I know, this is not
neg.binomial, but neg.binomial is essentially just poisson + iid term for each
observation, so ...
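
The "poisson + iid term" remark can be written out as below. This is the standard Poisson-Gamma mixture representation, not anything specific to INLA; the symbols $\mu_i$, $k$, $\varepsilon_i$ and $\eta_i = \log \mu_i$ (the linear predictor) are notation introduced here for illustration.

```latex
\begin{align*}
y_i \mid \varepsilon_i &\sim \mathrm{Poisson}(\mu_i \varepsilon_i), \qquad
\varepsilon_i \overset{\text{iid}}{\sim} \mathrm{Gamma}(\text{shape} = k,\ \text{rate} = k) \\
\Rightarrow \quad y_i &\sim \mathrm{NegBin}(\mu_i, k), \qquad
\mathrm{E}(y_i) = \mu_i, \quad \mathrm{Var}(y_i) = \mu_i + \mu_i^2 / k \\
\log(\mu_i \varepsilon_i) &= \eta_i + \log \varepsilon_i
\end{align*}
```

On the log scale the Gamma multiplier is an iid term added to the linear predictor for each observation. Replacing $\log \varepsilon_i$ by an iid Gaussian term gives the Poisson log-normal model, which is similar in spirit but not identical to the negative binomial.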

It seems that if you let the prob(zero) depend on covariates you might
progress a little; let us know how this goes. If you still have issues, let us
know and we take it from there.

Best
Havard




--
Håvard Rue
he...@r-inla.org

Bob O'Hara

Feb 11, 2026, 7:27:02 AM
to Shirin Taheri, R-inla discussion group
To answer your questions:
1. It is usual to have horribly high overdispersion in ecological data. To the point where a Gamma distribution with a shape parameter close to zero is typical, and mosquitoes have exactly the right life history to do this.
2. A small inflation probability with lots of zeroes is because you have typical ecological data - a Gamma with a shape parameter below 1 has a lot of zeroes, so it’s difficult to distinguish between this and a zero inflated model. Both zero inflation and overdispersion give rise to more zeroes, so it’s often not worth having both in the model.
3. The alternative strategy of just using a negative binomial is the one I would start with (or a Poisson log normal: essentially a Poisson with a random effect on each observation): it’s simpler. You can then test whether you are correctly predicting the number of zeroes: if you aren’t predicting enough then you should follow Håvard’s advice, and make sure the Poisson part has random effects, so you can add overdispersion there.
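
To make points 2 and 3 concrete, here is a minimal base-R sketch. It does not use the mosquito data: the mean `mu`, the dispersion `size` (the shape of the mixing Gamma) and the sample size `n` are illustrative assumptions, and the "observed" counts are simulated stand-ins. It shows how many zeros a plain negative binomial produces when the shape is well below 1, and one simple way to compare the observed number of zeros with what the model predicts.

```r
# Illustrative values only (assumptions, not estimates from the mosquito data).
mu   <- 100   # mean count
size <- 0.1   # NB dispersion, i.e. shape of the mixing Gamma

# Probability of a zero under a negative binomial with mean mu:
# P(Y = 0) = (size / (size + mu))^size
p0_formula <- (size / (size + mu))^size
p0_dnbinom <- dnbinom(0, size = size, mu = mu)

# A Poisson with the same mean gives essentially no zeros.
p0_poisson <- dpois(0, lambda = mu)

# Simulation-based zero check: compare the observed number of zeros with the
# spread of zero counts in data sets simulated from the model.
set.seed(1)
n         <- 500
y_obs     <- rnbinom(n, size = size, mu = mu)  # stand-in for observed counts
zeros_sim <- replicate(1000, sum(rnbinom(n, size = size, mu = mu) == 0))
zero_interval <- quantile(zeros_sim, c(0.025, 0.975))

print(c(formula = p0_formula, dnbinom = p0_dnbinom, poisson = p0_poisson))
print(sum(y_obs == 0))
print(zero_interval)
```

With these values about half of the counts are zero even though the mean is 100, with no zero-inflation component at all. For a fitted model, the simulated data sets would instead come from the posterior predictive distribution; if the observed number of zeros falls well above the simulated interval, the model is not predicting enough zeros.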

Good luck. 

Bob




Shirin Taheri

Feb 16, 2026, 4:07:15 AM
to R-inla discussion group
Dear all,

Thanks for the helpful suggestions so far @Håvard & Bob O'Hara,
Based on the discussion, I implemented an alternative two-process (hurdle-type) approach in INLA: first modeling presence–absence (N > 0) using a binomial likelihood with an SPDE spatial field, and then modeling positive counts with a negative binomial model including climate covariates, seasonality, and spatial structure. Overall abundance is obtained as

E(N) = P(presence) × E(N | presence)

that is, the expected mosquito abundance is the probability that mosquitoes are present multiplied by the expected number of mosquitoes given that they are present. This avoids using an explicit zero-inflation parameter. This approach appears to handle structural zeros better than zero-inflated NB models or 0poisson, although I still observe mild residual overdispersion, and PIT diagnostics indicate that some high counts are under-predicted.
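
As a small illustration of how the two parts combine, here is a base-R sketch. The helper names `hurdle_mean` and `truncated_nb_mean` are hypothetical, and the numbers are made up rather than taken from the fitted models. The second helper covers the case where the positive part is a zero-truncated negative binomial, for which E(N | N > 0) = mu / (1 - P(Y = 0)); whether that matches the model actually fitted to the positive counts is an assumption.

```r
# Hypothetical helper: combine the two parts of a hurdle model into E(N).
# p_presence : P(N > 0) from the binomial part
# mean_pos   : E(N | N > 0) from the positive-count part
hurdle_mean <- function(p_presence, mean_pos) {
  stopifnot(all(p_presence >= 0 & p_presence <= 1), all(mean_pos >= 0))
  p_presence * mean_pos
}

# Hypothetical helper: mean of a zero-truncated NB whose untruncated mean is
# mu and whose dispersion is size, i.e. E(N | N > 0) = mu / (1 - P(Y = 0)).
truncated_nb_mean <- function(mu, size) {
  p0 <- dnbinom(0, size = size, mu = mu)
  mu / (1 - p0)
}

# Illustrative numbers only (assumptions, not fitted values):
# an unsuitable site, an intermediate site, and a rice-field site.
p_presence <- c(0.05, 0.60, 0.95)
mean_pos   <- truncated_nb_mean(mu = c(2, 50, 800), size = 0.5)
expected_n <- hurdle_mean(p_presence, mean_pos)
print(expected_n)
```

When the two parts are fitted as separate models, uncertainty in E(N) is probably best obtained by multiplying posterior samples of the two components rather than plugging in point estimates.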

I also experimented with the 0Poisson likelihood, which seems to improve overall fit, but I am unsure how to interpret its treatment of zeros and whether it is appropriate when ecological zeros arise from both habitat unsuitability and stochastic abundance variation. Does this hurdle-type strategy seem reasonable for highly aggregated mosquito abundance data, or would you recommend further refinements? 

Thank you so much.
All the best,

Shirin 

Bob O'Hara

Feb 16, 2026, 12:01:33 PM
to R-inla discussion group
In general I'm not a fan of hurdle models for this sort of data, but if it works, it works.

I also would be more worried if you didn't end up with overdispersion: it's absolutely typical with this data. From what you've described I wouldn't be too worried - if under-predicting the high counts is a worry, try without any zero inflation, and see if that gives you a happier model (where happiness is orthogonal to overdispersion).

Bob