Re: [r-inla] Likelihoods for a hurdle model

1,402 views
Skip to first unread message

Haakon Bakka

unread,
Jul 30, 2017, 7:53:25 AM7/30/17
to Ruby Ji, R-inla discussion group
I am not sure if your use of zeroinflatedpoisson0 is correct.

Usually for Hurdle models: Where there are 0's the "poisson-part" has an NA observation. The reasoning is that you do not know what it would have been if it had made the "hurdle" of crossing zero.

So I believe you should exchange your zeroinflatedpoisson for a poisson, or a poisson where the zeroes are removed.

PS. You could end up with similar results through different approaches, but estimating the zero-probability will get you in trouble if you have "NA and positive data".

Kind regards,
Haakon Bakka


On 26 July 2017 at 22:57, Ruby Ji <crysta...@gmail.com> wrote:
Hi all,

I have a catch data that are counts characterized by many zeros. I am planing to build a hurdle model including two processes: one generating the zeros and one generating the positive catch.  A binomial probability model governs the binary outcome of whether the catch rate is zero or positive. The conditional distribution of the positive catch rates is governed by a zero-truncated poisson model.

Is it correct to assign family=c(’binomial’, ’zeroinflatedpoisson0’)  for the two processes?

Thanks,
Ruby

--
You received this message because you are subscribed to the Google Groups "R-inla discussion group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to r-inla-discussion-group+unsub...@googlegroups.com.
To post to this group, send email to r-inla-discussion-group@googlegroups.com.
Visit this group at https://groups.google.com/group/r-inla-discussion-group.
For more options, visit https://groups.google.com/d/optout.

Rujia Bi

unread,
Jul 30, 2017, 10:10:29 AM7/30/17
to Haakon Bakka, R-inla discussion group
The "poisson-part" in my model only includes positive catch, no "NA". There is an advanced example in the file of zeroinflated poisson likelihood, which also uses "bionomial" and "zeroinflatedpoisson0" together. So I think my model setting should be ok. 

However, I got a DIC="Inf". There were many "Inf" in the local DIC of "binary-part”. I can't find the reason.

Sent from my iPhone

Alain Zuur

unread,
Jul 30, 2017, 10:23:18 AM7/30/17
to R-inla discussion group, crysta...@gmail.com


On Sunday, July 30, 2017 at 12:53:25 PM UTC+1, Haakon Bakka wrote:
I am not sure if your use of zeroinflatedpoisson0 is correct.

Usually for Hurdle models: Where there are 0's the "poisson-part" has an NA observation. The reasoning is that you do not know what it would have been if it had made the "hurdle" of crossing zero.

Hello...

It would either be:

family = "zeroinflatedpoisson0"

for a Poisson hurdle model (which means that the binary part is modelled as a function of an intercept...with no covariates..and the count part as a function of covariates).....or

family=c(’binomial’, ’zerotruncatedPoisson’) 

where the first column of the response variable consists of 0 and 1, and the second column of the response variable contains only the non-zero data  (and the rest is set to NA). The problem is that INLA does not have a zero truncated Poisson distribution....so
zerotruncatedPoisson will not work. Hence......I am not sure how you can put covariates in the binary part of the model.

There are examples in the INLA literature in which this is done for continuous data (rainfall)....but in that example a gamma distribution is used for the non-zero data...and the gamma distribution is already zero truncated. So..there is works.

It would be nice if the zero truncated Poisson/NB distributions could be added to INLA....or if someone can tell me that I am wrong.

Kind regards,

Alain Zuur
www.highstat.com

 

So I believe you should exchange your zeroinflatedpoisson for a poisson, or a poisson where the zeroes are removed.

PS. You could end up with similar results through different approaches, but estimating the zero-probability will get you in trouble if you have "NA and positive data".

Kind regards,
Haakon Bakka

On 26 July 2017 at 22:57, Ruby Ji <crysta...@gmail.com> wrote:
Hi all,

I have a catch data that are counts characterized by many zeros. I am planing to build a hurdle model including two processes: one generating the zeros and one generating the positive catch.  A binomial probability model governs the binary outcome of whether the catch rate is zero or positive. The conditional distribution of the positive catch rates is governed by a zero-truncated poisson model.

Is it correct to assign family=c(’binomial’, ’zeroinflatedpoisson0’)  for the two processes?

Thanks,
Ruby

--
You received this message because you are subscribed to the Google Groups "R-inla discussion group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to r-inla-discussion-group+unsub...@googlegroups.com.
To post to this group, send email to r-inla-disc...@googlegroups.com.

Rujia Bi

unread,
Jul 30, 2017, 1:52:31 PM7/30/17
to Alain Zuur, R-inla discussion group
Agree.

Sent from my iPhone

Haakon Bakka

unread,
Aug 2, 2017, 6:47:25 AM8/2/17
to Rujia Bi, Alain Zuur, R-inla discussion group
I think the problem with estimating the zero probability in the zeroinflatedpoisson0 is that you are already inferring zero-probabilities in the bernoulli likelihood. So this parameter may not be identifiable.

Can you try zeroinflatedpoisson0, but fixing the zero probability parameter, e.g. to the empirical estimate? (Take care of the internal parametrisation).

Kind regards,
Haakon



To unsubscribe from this group and stop receiving emails from it, send an email to r-inla-discussion-group+unsubscr...@googlegroups.com.

To post to this group, send email to r-inla-disc...@googlegroups.com.
Visit this group at https://groups.google.com/group/r-inla-discussion-group.
For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "R-inla discussion group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to r-inla-discussion-group+unsub...@googlegroups.com.
To post to this group, send email to r-inla-discussion-group@googlegroups.com.

Rujia Bi

unread,
Aug 2, 2017, 9:28:36 AM8/2/17
to Haakon Bakka, Alain Zuur, R-inla discussion group
Sorry, I can't understand... If use zeroinflatedpoisson0, how can I add covariates on the binary part?

Thierry Onkelinx

unread,
Aug 2, 2017, 9:31:10 AM8/2/17
to Rujia Bi, Haakon Bakka, Alain Zuur, R-inla discussion group
You can't. All zero inflated distributions in INLA assume a single zero inflation parameter (an intercept only model for the zero inflation part).

ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium

To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey

To post to this group, send email to r-inla-discussion-group@googlegroups.com.
Visit this group at https://groups.google.com/group/r-inla-discussion-group.
For more options, visit https://groups.google.com/d/optout.

Rujia Bi

unread,
Aug 2, 2017, 9:36:25 AM8/2/17
to Thierry Onkelinx, Haakon Bakka, Alain Zuur, R-inla discussion group
It is the reason why I did not use zeroinflatedpoisson0...

Elias T. Krainski

unread,
Aug 2, 2017, 9:42:00 AM8/2/17
to r-inla-disc...@googlegroups.com

On 02/08/2017 10:28, Rujia Bi wrote:
Sorry, I can't understand... If use zeroinflatedpoisson0, how can I add covariates on the binary part?

You can model both parts with covariates + random effects + parmetric non-linear effects in INLA having or not shared components (if not, model separated). Haakon give you insight on this. Basically, you can build one linear predictor for the zeros and another for the positives. You may follow an example for the zero-gamma case in the INLA book.
Elias

Highland Statistics Ltd

unread,
Aug 2, 2017, 9:47:48 AM8/2/17
to Rujia Bi, Thierry Onkelinx, Haakon Bakka, R-inla discussion group



On 02/08/2017 14:36, Rujia Bi wrote:
It is the reason why I did not use zeroinflatedpoisson0...
You can try and ask the INLA team very friendly whether they can add the zero truncated Poisson distribution to R-INLA. Its likelihood function is nearly the same as that of the Poisson....except for a 1 / (1 - exp(-mu_i)) term. Assuming that the (spatial) random effects in the Bernoulli part and the (spatial) random effects in the zero truncated Poisson part are independent of each other, then you/R-INLA can estimate the parameters of the two components separately....and once you have the two parts of the hurdle model, you can easily work out the mean and the variance of the actual hurdle model.

Alain
-- 

Dr. Alain F. Zuur



Author of:
1. Beginner's Guide to Spatial, Temporal and Spatial-Temporal Ecological Data Analysis with R-INLA. (2017). 
2. Beginner's Guide to Zero-Inflated Models with R (2016).
3. Beginner's Guide to Data Exploration and Visualisation with R (2015).
4. Beginner's Guide to GAMM with R (2014).
5. Beginner's Guide to GLM and GLMM with R (2013).
6. Beginner's Guide to GAM with R (2012). 
7. Zero Inflated Models and GLMM with R (2012).
8. A Beginner's Guide to R (2009).
9. Mixed effects models and extensions in ecology with R (2009).
10. Analysing Ecological Data (2007).

Highland Statistics Ltd.
9 St Clair Wynd
UK - AB41 6DZ Newburgh
Tel:   0044 1358 788177
Email: high...@highstat.com
URL:   www.highstat.com

Thierry Onkelinx

unread,
Aug 2, 2017, 9:50:20 AM8/2/17
to Elias T. Krainski, R-inla discussion group
So zeroinflated.poisson.0 with \theta fixed to a small value (e.g. -10) is equivalent to a zero trunctaded poisson?

Prob(y | . . .) = p × 1[y=0] + (1 − p) × Poisson(y | y > 0)
p = exp(\theta)/(1 + exp(\theta))

\theta = -10
p = 0.0005
Prob(y | . . .) = 0.0005 × 1[y=0] + 0.99995 × Poisson(y | y > 0)

ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium

To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey

--

Elias T. Krainski

unread,
Aug 2, 2017, 10:11:20 AM8/2/17
to Thierry Onkelinx, R-inla discussion group

bigger \theta gives bigger probability of zeros, see inla.doc('zeroinflatedpoisson0')

Elias T. Krainski

unread,
Aug 2, 2017, 10:13:04 AM8/2/17
to r-inla-disc...@googlegroups.com

On 02/08/2017 10:41, Elias T. Krainski wrote:
> You can model both parts with covariates + random effects + parmetric
> non-linear effects in INLA having or not shared components (if not,
> model separated).
One can find a simple example attached

hurdlepoisson.R

Alain Zuur

unread,
Aug 2, 2017, 10:31:06 AM8/2/17
to R-inla discussion group
Clever.

I'm lazy here.....do you have something equally clever for a zeroinflated.poisson.1....so that a ZIP model can be fitted in which covariates are used in both components??

Alain

Thierry Onkelinx

unread,
Aug 2, 2017, 10:39:23 AM8/2/17
to Alain Zuur, R-inla discussion group
I would think that it would be sufficient to copy to covariates.

x.zero <- ifelse(is.na(count), x, NA)
x.count <- ifelse(is.na(count), NA, x)

cbind(zero, count) ~ x.zero + x.count

ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium

To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey

Thierry Onkelinx

unread,
Aug 2, 2017, 10:42:05 AM8/2/17
to Elias T. Krainski, R-inla discussion group
Yes. But the probability of zero's should be very low to approximate a zero truncated poisson. Hence you want a low \theta.

ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium

To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey

Elias T. Krainski

unread,
Aug 2, 2017, 11:14:07 AM8/2/17
to r-inla-disc...@googlegroups.com
On 02/08/2017 11:31, Alain Zuur wrote:
>
> I'm lazy here.....do you have something equally clever for a
> zeroinflated.poisson.1....so that a ZIP model can be fitted in which
> covariates are used in both components??
To implement this is actually easier as you just play with 'binomial'
and 'poisson' in the end. However it will becomes a bit messy to
identify which effect do affect which part. Following the previous
example (updated and attached) you can see what happens with 'x2'.

Elias
hurdlepoisson1.R

Thierry Onkelinx

unread,
Aug 2, 2017, 11:41:25 AM8/2/17
to Elias T. Krainski, R-inla discussion group
I played a bit with your example. I've used dplyr for the data wrangling which makes it IMHO more readable. 

More importantly I wanted to investigate the effect of the prior of the zero inflated part. It turns out that it has fixing it at 10 or -10 yields the same parameter estimates for the count process, the same fitted values and the same WAIC. I am not sure if that is what we would expect...
Using the default prior yields a \theta estimate of -5.8 (-8.2, -4.1). The parameter estimates for the count process are equivalent (differences <1e-7 for the mean, <1e-5 for the quantiles). Differences in mean fitted values were < 1e-4. The waic was, to my surprise, much lower.

ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium

To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey

hurdlepoisson2.R

Alain Zuur

unread,
Aug 6, 2017, 12:55:41 PM8/6/17
to R-inla discussion group


Elias.....thanks for the code. It is now clear to me how to fit a hurdle model (zeroinflatedpoisson0) in INLA in which covariates are used in both parts of the model.

Now to the zeroinflatedpoisson1. If I understand your R code well then that is just executing a model with a Bernoulli and Poisson likelihood function. And the estimated parameters are identical to those obtained by seperate GLMs. But that does not make it a zeroinflatedpoisson1. So...how would you do a zeroinflatedpoisson1 model in INLA such that covariates can be used in both components of the ZIP?

Kind regards,

Alain
 

 
Elias
Message has been deleted
Message has been deleted

Udani

unread,
Jan 11, 2020, 2:16:35 AM1/11/20
to R-inla discussion group
If I use Occurrance for zero inflation part and the counts with (positive , 0, and NA), 

using zeroinflatedpoisson1 and zeroinflatedpoisson0, does this give the correct results?

n.y = n.z = n

z <- rep(0,n)
z[which(count > 0 | count == 0)]<-1

y = count

tmp =rep(NA, 4 * n)
Y = matrix(tmp, ncol = 2)
Y[1:n, 1] = z
Y[(n + 1):(2*n), 2] = y

mu.z = rep(1:0, c(n, n))
mu.y = rep(0:1, c(n, n))

x.z <- c(x, rep(NA, n))
x.y <- c(rep(NA, n), x)

data1 = list(Y = Y, mu.z = mu.z, mu.y = mu.y, x.y=x.y, x.z=x.z)

fit1 <- inla(Y ~ 0 + mu.z + mu.y + x.z + x.y,
             data = data1, family = c("binomial", "zeroinflatednbinomial0"), 
             control.family = list(list(link = "logit"), list(link = "log")),
             control.compute = list(dic = TRUE,cpo = TRUE, po = TRUE)) 

fit2 <- inla(Y ~ 0 + mu.z + mu.y + x.z + x.y,
             data = data1, family = c("binomial", "zeroinflatednbinomial1"), 
             control.family = list(list(link = "logit"), list(link = "log")),
             control.compute = list(dic = TRUE,cpo = TRUE, po = TRUE)) 

Please help!!!!!

Thank you in advance.

Helpdesk

unread,
Jan 12, 2020, 7:41:29 AM1/12/20
to Udani, R-inla discussion group
try to simulate fake data from a simplified model (like with one
covariate), and check that you get the true values back (for large
enough data)
> --
> You received this message because you are subscribed to the Google
> Groups "R-inla discussion group" group.
> To unsubscribe from this group and stop receiving emails from it,
> send an email to
> r-inla-discussion...@googlegroups.com.
> To view this discussion on the web, visit
> https://groups.google.com/d/msgid/r-inla-discussion-group/d8b7bcbb-ad77-4b8f-a7aa-690de72c9ead%40googlegroups.com
> .

--
Håvard Rue
Helpdesk
he...@r-inla.org

Udani Wijewardhana

unread,
Jan 12, 2020, 1:13:18 PM1/12/20
to R-inla discussion group, he...@r-inla.org
Noted. Thank you very much


From: Helpdesk <he...@r-inla.org>
Sent: Sunday, January 12, 2020 11:41:33 PM
To: Udani <udaniwij...@gmail.com>; R-inla discussion group <r-inla-disc...@googlegroups.com>
Subject: Re: [r-inla] Likelihoods for a hurdle model
 
Reply all
Reply to author
Forward
0 new messages