Conceptual question on N-Mixture models with data augmentation

Facundo Palacio

Mar 19, 2025, 9:52:04 AM

to hmecology: Hierarchical Modeling in Ecology

Hi all,

I have a conceptual question regarding N-mixture models for a model I’m currently trying to fit, and I’d appreciate your insight. As you will see, this is not a typical occupancy model approach.

Study Design
I sampled a set of plants in each of 10 populations, recording the number of birds foraging on each plant. In addition, at each population I recorded the abundance of all frugivore species known to feed on this plant in the area (i.e., the regional pool). For simplicity, I started with a single population (the populations are several km apart, so I may treat them as independent).

Model objective & structure
My goal is to model species-specific abundances for both the observed foraging species and the undetected frugivores present in the regional pool. The model is structured as follows:

Latent frugivore abundance N[i]: Modeled as a negative binomial.
Mean abundance lambda[i]: Depends on the size of the regional species pool, log_pool[i].
Detection process y[i, j]: Binomial, conditional on N[i].

To account for unobserved frugivores, I incorporated "zero data" for species not detected foraging (akin to data augmentation). However, their latent abundances are informed by their regional abundance. I treated individual plants as the repeated visits, as my focus is on describing the species composition at the population level. This makes sense to me because birds are mobile within the area and may forage on a plant at a time when I was not observing it.
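
To make the structure above concrete, here is a minimal simulation sketch in Python (standard library only). It is not the fitted model: the values of alpha, beta, the dispersion r, a shared detection probability p, and the pool sizes are all made up for illustration. Species whose rows come out as all zeros play the role of the augmented "zero data".

```python
import math
import random

def rpois(rng, lam):
    """Draw one Poisson(lam) variate (Knuth's method; fine for small lam)."""
    if lam <= 0:
        return 0
    limit, k, prod = math.exp(-lam), 0, 1.0
    while True:
        prod *= rng.random()
        if prod <= limit:
            return k
        k += 1

def rbinom(rng, n, p):
    """Draw one Binomial(n, p) variate as a sum of Bernoulli trials."""
    return sum(rng.random() < p for _ in range(n))

def simulate(log_pool, n_plants, alpha, beta, r, p, seed=1):
    """Simulate latent abundances N[i] and counts y[i][j] for one population."""
    rng = random.Random(seed)
    N, y = [], []
    for lp in log_pool:
        lam = math.exp(alpha + beta * lp)        # mean abundance from pool size
        # Negative binomial written as a gamma-Poisson mixture (mean lam, size r)
        mu = lam * rng.gammavariate(r, 1.0 / r)
        n_i = rpois(rng, mu)
        N.append(n_i)
        # Each plant is treated as a repeated visit with detection probability p
        y.append([rbinom(rng, n_i, p) for _ in range(n_plants)])
    return N, y

# Hypothetical regional pool sizes for six species
log_pool = [math.log(x) for x in (2, 5, 10, 20, 40, 80)]
N, y = simulate(log_pool, n_plants=8, alpha=-1.0, beta=0.8, r=2.0, p=0.2)

# Species never seen foraging: these rows look like the augmented zero data
never_seen = [i for i, row in enumerate(y) if sum(row) == 0]
print("latent N:", N)
print("species with all-zero rows:", never_seen)
```

Comparing `never_seen` with `N` shows that an all-zero row can come from a species with N = 0 or from a species that was present but missed on every plant, which is the ambiguity the model has to resolve.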

Key questions
Does this approach make theoretical sense? I’m unsure if standard data augmentation is appropriate here, as it typically assumes species identity is exchangeable, whereas my ultimate goal is to characterize species composition explicitly. Given that my latent abundances are species-specific and linked to the regional pool, does this approach align with the conceptual foundations of N-mixture models? Or should I consider an alternative formulation?

I’d greatly appreciate your thoughts on this.

Best,
Facundo

Marc Kery

Mar 30, 2025, 7:03:35 AM

to hmecology: Hierarchical Modeling in Ecology

Dear Facundo,

People don't seem to have been overly enthusiastic in replying to your question, so here are a couple of comments:

I find this a fascinating modeling application. It will probably require building a highly customized model, which may not really fall into any clear category such as an occupancy or an Nmix model.
As always, when building a model, it pays to isolate the essential things first and then add in complexity step by step until you're at the desired model. In your case, complexity that I might initially ignore is the following: multiple populations (start with a single one, which is what you already do) and the NegBin (start with Poisson abundance and only once you understand that and see that it works, go to the NegBin).
I thought first that you should also initially ignore detection error, but then it seems this is an essential part of your goals. Hence, you can probably not gain anything by ignoring it. On the other hand, it might possibly help to demote the counts to detection/nondetection and thus model presence/absence of a species rather than abundance, but perhaps also not ...
The conceptual nugget of your problem seems to be that you have a species pool containing the set of potential consumers in the area and then you have some sort of selection process which leads to a set of species that comprises the actual consumers. Obviously, you must have some "thinning" or other type of process that "transforms" the abundance of a species in the potential pool to its abundance in the actual consumer pool. Thus, a rate parameter might link the abundance of a species in the two sets, and this rate measures the preference that a species gives to that particular plant.
Perhaps you might put a suitable abundance distribution on the species in the species pool (start with a Poisson), and then model the expected abundance in the set of actual consumers as something like lambda[i, actual] <- rho[i] * lambda[i, potential], where rho is the preference parameter. And then you have N[i, actual] ~ Poisson(lambda[i, actual]) (and perhaps the rho should be put on the realized abundance rather than on its expectation, i.e., on N rather than lambda?)
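
The two options mentioned here (rho on the expectation versus rho on the realized abundance) can be compared with a small simulation. This is only a sketch with made-up values for lambda[i, potential] and rho. A known property of the Poisson is that binomial thinning of a Poisson count is again Poisson with mean rho * lambda, so the two versions agree marginally. The difference is that thinning ties N[i, actual] to the realized N[i, potential], so it can never exceed it.

```python
import math
import random

def rpois(rng, lam):
    """Draw one Poisson(lam) variate (Knuth's method; fine for small lam)."""
    if lam <= 0:
        return 0
    limit, k, prod = math.exp(-lam), 0, 1.0
    while True:
        prod *= rng.random()
        if prod <= limit:
            return k
        k += 1

def rbinom(rng, n, p):
    """Draw one Binomial(n, p) variate as a sum of Bernoulli trials."""
    return sum(rng.random() < p for _ in range(n))

def compare(lam_potential, rho, n_draws=20000, seed=42):
    """Return the mean of N_actual under both formulations, plus a bound check."""
    rng = random.Random(seed)
    total_expectation, total_realized = 0, 0
    thinned_never_exceeds = True
    for _ in range(n_draws):
        # Option 1: rho scales the expectation
        total_expectation += rpois(rng, rho * lam_potential)
        # Option 2: rho thins the realized potential abundance
        n_potential = rpois(rng, lam_potential)
        n_thinned = rbinom(rng, n_potential, rho)
        total_realized += n_thinned
        thinned_never_exceeds = thinned_never_exceeds and n_thinned <= n_potential
    return (total_expectation / n_draws,
            total_realized / n_draws,
            thinned_never_exceeds)

# Hypothetical values: lambda_potential = 8, rho = 0.3, so both means target 2.4
mean_on_lambda, mean_on_N, bounded = compare(lam_potential=8.0, rho=0.3)
print(mean_on_lambda, mean_on_N, bounded)
```

If the regional-pool abundance is itself observed, the thinning version is the one that respects that observed value, which may be a reason to prefer it.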
One has to be careful with the interpretation and the modeling of the two abundances: do they represent actual abundance or just frequency of use? (The data collected for the actual users, especially, seem to me more likely to fall into the latter category.)
And then, you have the issue that you might overlook a potential consumer in the data for the actual consumers, which is a key thing in your modeling, and hence, you really need detection probability in there. So, yes, why not express the counts of a species (i) over replicated plants (j) as in an Nmix model, i.e., as

C[i, actual, j] ~ Binomial(w[i] * N[i, actual], p[i])

where w[i] is an indicator for a species that is a member of the consumer pool on that plant species. (Plus, the w[i] might perhaps go in higher up in the model, i.e., as N[i, actual] ~ Poisson(w[i] * rho[i] * lambda[i, potential])?)
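
One consequence of this formulation can be worked out in closed form for a species that was never detected. With N ~ Poisson(lambda) and independent binomial counts on J plants, the probability that a true consumer (w = 1) yields all-zero counts is exp(-lambda * (1 - (1 - p)^J)). Bayes' rule then gives the probability that such a species is nevertheless a consumer. The sketch below assumes a prior membership probability psi, which is a hypothetical name not defined in the message above, and all numeric values are illustrative.

```python
import math

def prob_all_zero_given_member(lam, p, n_plants):
    """P(all counts are 0 | w = 1) with N ~ Poisson(lam), C[j] ~ Binomial(N, p)."""
    # Probability that a single individual is missed on every plant
    miss_all = (1.0 - p) ** n_plants
    # Poisson probability generating function evaluated at miss_all
    return math.exp(-lam * (1.0 - miss_all))

def prob_member_given_all_zero(psi, lam, p, n_plants):
    """P(w = 1 | all counts are 0), with psi = prior P(w = 1) (hypothetical)."""
    q = prob_all_zero_given_member(lam, p, n_plants)
    # If w = 0 then N = 0 and all-zero counts occur with probability 1
    return psi * q / (psi * q + (1.0 - psi))

# Illustrative values: a 50% prior chance of being a consumer, low detection
for n_plants in (2, 10, 30):
    print(n_plants, round(prob_member_given_all_zero(0.5, 2.0, 0.1, n_plants), 3))
```

The probability shrinks as more plants are surveyed without a detection, which is how the replicated plants carry the information about w[i] for the undetected species.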

Ah, well, so much for some ramblings on a sunny Sunday afternoon ..... hope this is somewhat intelligible and at least a little bit useful. Apologies if it is not...

Best regards ----- Marc


Facundo Palacio

Apr 4, 2025, 7:31:37 AM

to hmecology: Hierarchical Modeling in Ecology

Dear Marc,

I really appreciate your feedback. I’ll do my best to incorporate your suggestions and propose a potential model for you to review, if possible. As you rightly pointed out, my response variable is not abundance, but consumption. I wouldn’t frame it in terms of preference either, since high consumption might simply reflect a greater number of birds in the area—which is precisely what I’m aiming to capture in the model. I also have data on plant traits that are known to influence consumption, and while I haven’t included them yet, I’m considering adding them in a later stage of the analysis.

Thanks again,

Facundo

Facundo Palacio

Apr 29, 2025, 10:49:36 AM

to hmecology: Hierarchical Modeling in Ecology

Dear Marc,

I’ve drafted an initial version of the model. Based on your feedback, I used the degree of frugivory (measured as the proportion of fruit in the diet) as the parameter rho, instead of a direct measure of "preference", which is much harder to quantify. Accordingly, I made a slight modification to your approach: the indicator variable w[i] now depends on the degree of frugivory (i.e., a bird that eats more fruit is more likely to be a consumer of this plant species). The model is:

model {
  for (i in 1:n_species) {
    # Regional pool abundance (log_pool[i] is supplied as data)
    lambda_potential[i] <- exp(alpha + beta * log_pool[i])

    # Indicator variable (whether species i actually consumes or not)
    # rho[i] is the degree of frugivory, supplied as data
    w[i] ~ dbern(omega * rho[i])  # w depends on frugivory degree

    # Actual expected abundance (after selection)
    lambda_actual[i] <- w[i] * lambda_potential[i]

    # Realized abundance
    N_actual[i] ~ dpois(lambda_actual[i])

    # Detection model over replicated plants
    for (j in 1:n_plants) {
      C[i, j] ~ dbin(p[i], N_actual[i])
    }

    # Detection probability
    p[i] ~ dunif(0, 1)
  }

  # Hyperpriors
  omega ~ dunif(0, 1)  # Baseline probability of consumption
  alpha ~ dnorm(0, 0.1) T(-10, 10)
  beta ~ dnorm(0, 0.1) T(-10, 10)
}
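
As a rough check on what this model implies for a single species, the latent N_actual[i] and w[i] can be summed out numerically. The Python sketch below is not part of the JAGS code: the function name, the truncation bound n_max, and the numeric values are assumptions made for illustration, and psi stands for omega * rho[i].

```python
import math

def log_pois(n, lam):
    """Log of the Poisson(lam) probability of n (requires lam > 0)."""
    return n * math.log(lam) - lam - math.lgamma(n + 1)

def binom_pmf(c, n, p):
    """Binomial(n, p) probability of c detections."""
    return math.comb(n, c) * p ** c * (1.0 - p) ** (n - c)

def marginal_likelihood(counts, lam_potential, p, psi, n_max=200):
    """Likelihood of one species' counts over plants, with N and w summed out.

    psi plays the role of omega * rho[i]; n_max truncates the sum over N.
    """
    # w = 1 branch: N ~ Poisson(lam_potential), summed from max(counts) upward
    member = 0.0
    for n in range(max(counts), n_max + 1):
        term = math.exp(log_pois(n, lam_potential))
        for c in counts:
            term *= binom_pmf(c, n, p)
        member += term
    # w = 0 branch: N is forced to 0, so only all-zero counts are possible
    non_member = 1.0 if max(counts) == 0 else 0.0
    return psi * member + (1.0 - psi) * non_member

# Illustrative values only
never_detected = marginal_likelihood([0, 0, 0, 0], lam_potential=3.0, p=0.2, psi=0.4)
detected = marginal_likelihood([1, 0, 2, 0], lam_potential=3.0, p=0.2, psi=0.4)
print(never_detected, detected)
```

For a species with at least one detection the w = 0 branch contributes nothing, so w[i] is effectively fixed at 1. For a never-detected species both branches contribute, and the split between them depends on omega * rho[i], lambda_potential[i], p[i], and the number of plants.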

I'd strongly appreciate your thoughts, as I'm not a statistician.

Best,

Facu
