Re: [r-inla] Double hurdle model

41 views
Skip to first unread message

Håvard Rue

unread,
Feb 27, 2025, 3:05:18 AM2/27/25
to Ruby Ji, R-inla discussion group
this is possible only if one of p_1 and p_2, and one of c1 and c2, are simple,
like only contain fixed effects

even at this point, it will require new implementation to be done


On Wed, 2025-02-26 at 19:01 -0800, Ruby Ji wrote:
> Dear INLA group,
>
> I have a question regarding a double hurdle model that I am developing. I plan
> to fit a hurdle model for the “small” and “large” classes separately. My
> available data include observations of the total count (small + large) as well
> as additional but not a lot of composition information. I have a few questions
> concerning the likelihood formulation for the total count.
>
> The structure of my double hurdle model is as follows:
>
> For class “small”:
> • Probability part:
>   logit(p₁) = β₁₁ + year₁₁ + spatial₁₁
> • Positive count part:
>   log(c₁) = β₁₂ + year₁₂ + spatial₁₂
>
> For class “large”:
> • Probability part:
>   logit(p₂) = β₂₁ + year₂₁ + spatial₂₁
> • Positive count part:
>   log(c₂) = β₂₂ + year₂₂ + spatial₂₂
>
> Likelihood:
> • For the encounter process, I model the binary indicator (e,
> which takes the value 0 or 1) as:
>   e ~ dbinomial(1 − (1 − p₁) × (1 − p₂))
>                      # where p₁ and p₂ are the encounter probabilities for
> “small” and “large,” respectively.
>
> • For the positive counts, I assume:
>   y ~ dpoisson(exp(c₁) + exp(c₂))
>                      # Here, exp(c₁) and exp(c₂) represent the expected
> positive counts for each class.
>
> My main question is about the positive count likelihood:
> Is it appropriate to simply model the total positive count as the sum of the
> expected counts from the two classes (i.e., exp(c₁) + exp(c₂))? I am concerned
> that this approach might not fully capture potential differences in the
> contribution of each class to the overall count, and I would appreciate any
> suggestions or references regarding this formulation.
>
> Additionally, could you please advise if INLA is capable of fitting such a
> model?
>
> Thank you very much for your help.
>
> Best regards,
> Ruby
> --
> You received this message because you are subscribed to the Google Groups "R-
> inla discussion group" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to r-inla-discussion...@googlegroups.com.
> To view this discussion, visit
> https://groups.google.com/d/msgid/r-inla-discussion-group/93c2d034-ef97-46cb-8090-a7f0e18fbdcdn%40googlegroups.com
> .

--
Håvard Rue
hr...@r-inla.org

Finn Lindgren

unread,
Feb 27, 2025, 3:50:35 AM2/27/25
to R-inla discussion group, Ruby Ji
Hi,

This model can in principle be written as a nonlinear inlabru model,
via predictors
eta_encounter = qlogis(1-(1-plogis(eta_c1))*(1-plogis(eta_c2)))
eta_positive = log(exp(c1)+exp(c2))

However, as written, there is no way to identify the relative
contributions from "small" and "large", making the model
nonidentifiable.
Do you have any observations that distinguish between "small" and
"large"? Without that, I don't think one can get useful inference from
it, partly due to multimodality, but also the fundamental
nonidentifiability.

Treating a total count as y ~ Poisson(exp(c1)+exp(c2)) is appropriate
if the "small" count is y1 ~ Poisson(exp(c1)) and the "large" count is
y2 ~ Poisson(exp(c2)), since the sum of two (independent) Poisson
variables is Poisson.
The hurdle model then needs a truncated poisson model, "nzpoisson",
not a full Poisson model.

Finn
> You received this message because you are subscribed to the Google Groups "R-inla discussion group" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to r-inla-discussion...@googlegroups.com.
> To view this discussion, visit https://groups.google.com/d/msgid/r-inla-discussion-group/0f5ce4af8a73a8f1f2d2c0c43381b9527051446a.camel%40r-inla.org.



--
Finn Lindgren
email: finn.l...@gmail.com

rujia bi

unread,
Feb 27, 2025, 5:28:21 AM2/27/25
to Finn Lindgren, R-inla discussion group
I have length frequency information for a subset of data. Initially, I considered fitting multiple models (encounter and count for each length class, encounter and count for the rest combined data) and sharing some information among them within INLA.

I found a mistake in my previous equations.
The total positive count y should follow nzpoisson(c1 + c2), since I used a log-link in the length-specific positive count sub-models.

I am still confused about whether I should use y ~ nzpoisson((p1 * c1 + p2 * c2) / (1 - (1-p1)*(1-p2)) to better reflect the expected total positive count given that at lease one individual was encountered.

Following this, if the positive catch is in weight rather than count, can I assume that y ~ lognormal(log((p1 * c1 + p2 * c2) / (1 - (1-p1)*(1-p2)), sigma)? I know that the sum of two lognormal variables is a complex distribution. If I don’t simply do a sum, is it possible to use lognormal? Data in weight provided more length frequency information.

Apologies if my questions seem basic; I’m not a statistician but am eager to learn.

> On Feb 27, 2025, at 12:50 AM, Finn Lindgren <finn.l...@gmail.com> wrote:
>
> Hi,

Finn Lindgren

unread,
Feb 27, 2025, 6:06:43 AM2/27/25
to rujia bi, R-inla discussion group
The idea in the hurdle model construction is that the positive counts
are modelled conditionally on there being at least one, and to not
scale the parameter by 1/(1-(1-p1)*(1-p2)).
But I see the point about whether the combined model actually has a
nzpoisson structure at all.

If y1 and y2 are the small and large counts (from 0 and upwards), we have
p_y1(x) ~ I(x=0) (1-p1) + p1 * I(x>0) p_nzp(x;c1) (i.e. a hurdle model
with 0/1 and nzpoisson)
p_y2(x) ~ I(x=0) (1-p2) + p2 * I(x>0) p_nzp(x;c2)

The question is then what distribution the sum y = y1+y2 has, and
whether that can be written on the same hurdle form.

For the sum of two independent random variables, we get
p_y(z) = \sum_{x=0}^z p_y1(x) p_y2(z-x)

I suspect this doesn't result in an expression of the form
I(z=0) (1-py) + py * I(z>0) p_nzp(z;cy)
which would be required for the model to be the true combined result
of the building blocks y1 and y2.

But a good approximation might be had by working out what the expected
value of y is, and then choosing a regular hurdle model where the
parameter is chosen to give that overall expectation.

E(y1+y2) = p1 * c1 / (1 - exp(-c1)) + p2 * c2 / (1 - exp(-c2))
and
E(y) = py * cy / (1-exp(-cy))

and since the nonzero probability is known to be py = 1-(1-p1)*(1-p2),
that gives a nonlinear equation to solve for cy to make the
expectations equal, and log(cy) could then be used as the predictor
expression in a regular hurdle model.
How good this approximation might be I don't know (and it's possible
that your suggestion is close to this as well).
I'm also not sure how practical this would be to implement and use!

Finn

rujia bi

unread,
Feb 27, 2025, 7:03:08 AM2/27/25
to Finn Lindgren, R-inla discussion group
Thank you! These are really helpful!
Reply all
Reply to author
Forward
0 new messages