Groups keyboard shortcuts have been updated
Dismiss
See shortcuts

Predictions for zeroinflated models

1,414 views
Skip to first unread message

thierry....@gmail.com

unread,
Jan 3, 2013, 5:41:41 AM1/3/13
to r-inla-disc...@googlegroups.com
Dear all,

I have count data with overdispersion, zero-inflation and missing data. I'd like to fit a model to this data to impute the missing data. Below is a reproducible example.

My questions:
1) does this give me predictions on the original scale of the data?
2) some of the marginal distributions go below zero. Does that indicate that these data points have a high probability of being zero?
3) Any suggestions to get random counts based on the marginal distributions?

Best regards,

Thierry

library(INLA)
library(ggplot2)
set.seed(12345)
nLocation <- 40
nYear <- 20
Intercept <- 3
sdYear <- 0.5
sdLocation <- 3
sdNoise <- 1
probZero <- 0.25
probMissing <- 0.2

dataset <- expand.grid(location = seq_len(nLocation), year = seq_len(nYear))
dataset$eta <- with(dataset, Intercept + rnorm(nYear, mean = 0, sd = sdYear)[year] + rnorm(nLocation, mean = 0, sd = sdLocation)[dataset$location] + rnorm(nrow(dataset), mean = 0, sd = sdNoise))
dataset$y <- rpois(nrow(dataset), lambda = exp(dataset$eta))
dataset$nonzero <- rbinom(nrow(dataset), size = 1, prob = 1 - probZero)
dataset$missing <- rbinom(nrow(dataset), size = 1, prob = probMissing)
dataset$y0 <- with(dataset, ifelse(missing == 1, NA, y * dataset$nonzero))
ggplot(dataset, aes(x = year, y = y0, group = location)) + geom_line()

link <- rep(NA, nrow(dataset))
link[which(is.na(dataset$y0))] <- 1

model <- inla(y0 ~ f(year, model = "iid") + f(location, model = "iid"), family = "zeroinflatednbinomial1", data = dataset, control.predictor = list(compute = TRUE),  control.compute = list(dic = TRUE))
plot(model)
summary(model)
which(model$summary.fitted.values$mean < 0)
plot(model$marginals.fitted.values[[which.min(model$summary.fitted.values$mean)]], type = "l")

dataset$prediction <- model$summary.fitted.values$mean
ggplot(dataset, aes(x = year, y = prediction)) + geom_line() + geom_point(aes(y = y0)) + facet_wrap(~location, scales = "free_y")

INLA help

unread,
Jan 3, 2013, 6:26:54 AM1/3/13
to thierry....@gmail.com, r-inla-disc...@googlegroups.com
On Thu, 2013-01-03 at 02:41 -0800, thierry....@gmail.com wrote:
> Dear all,
>
> I have count data with overdispersion, zero-inflation and missing
> data. I'd like to fit a model to this data to impute the missing data.
> Below is a reproducible example.
>
> My questions:
> 1) does this give me predictions on the original scale of the data?
> 2) some of the marginal distributions go below zero. Does that
> indicate that these data points have a high probability of being zero?
> 3) Any suggestions to get random counts based on the marginal
> distributions?


check if this code runs better...

you forgot to add the argument 'link = link' in control.predictor. the
fitted values are now on the correct scale. this fixed #1 and #2.

#3: Don't understand...

Best,
Håvard



--
INLA help <he...@r-inla.org>
R-INLA
runme.R

thierry....@gmail.com

unread,
Jan 3, 2013, 7:29:46 AM1/3/13
to r-inla-disc...@googlegroups.com, thierry....@gmail.com, he...@r-inla.org
Dear Håvard,

Thank you for your fast reply.

Meanwhile I found the inla.rmarginal() function which almost does what I want for my question #3. It returns continuous values while I expect integers, like rpois() and rbinom() , since I use a negative binomial family. Would it be ok to round the output of inla.rmarginal()? Round it to the nearest integer or round it downwards?

Best regards,

Thierry

Op donderdag 3 januari 2013 12:26:54 UTC+1 schreef help help het volgende:

Finn Lindgren

unread,
Jan 3, 2013, 8:37:17 AM1/3/13
to thierry....@gmail.com, r-inla-disc...@googlegroups.com, thierry....@gmail.com, he...@r-inla.org
On 3 jan 2013, at 13:29, thierry....@gmail.com wrote:
Meanwhile I found the inla.rmarginal() function which almost does what I want for my question #3. It returns continuous values while I expect integers, like rpois() and rbinom() , since I use a negative binomial family. Would it be ok to round the output of inla.rmarginal()? Round it to the nearest integer or round it downwards?

No, the values you get are not negbin samples, but rather samples from the posterior for the expectation/location _parameter_ of the negbin model.  If you only need to sample a single data point you could thus feed the rmarginal value into a sampler for the negbin distribution to get a sampled data point from the posterior predictive data distribution.

However, if you need to jointly sample several data points it gets much more complicated to do correctly, since that requires joint samples from the posterior of the latent field, and expert knowledge is likely needed.  Also, I'm not sure those simulated values would be appropriate for imputation, which to me seems a very non-Bayesian thing to do, since Bayesian calculations automatically integrate out the uncertainty of the "missing" values.

Finn

thierry....@gmail.com

unread,
Jan 3, 2013, 9:22:13 AM1/3/13
to r-inla-disc...@googlegroups.com, he...@r-inla.org
Dear Finn,

Thanks for the clarification on the interpretation of the marginal values.

We need to report estimates for the totals: for each year the sum of the counts over all locations. Therefore we would like to impute the missing data.

Is it doable to get joint posterior densities out of INLA?

Best regards,

Thierry, how is more skilled in the frequentist stuff than the bayesian stuff...

Op donderdag 3 januari 2013 14:37:17 UTC+1 schreef Finn Lindgren het volgende:

INLA help

unread,
Jan 3, 2013, 9:36:55 AM1/3/13
to thierry....@gmail.com, r-inla-disc...@googlegroups.com
On Thu, 2013-01-03 at 06:22 -0800, thierry....@gmail.com wrote:
> Dear Finn,
>
> Thanks for the clarification on the interpretation of the marginal
> values.
>
> We need to report estimates for the totals: for each year the sum of
> the counts over all locations. Therefore we would like to impute the
> missing data.
>
> Is it doable to get joint posterior densities out of INLA?

partly. well, you'll get the gaussian part with its correlations.
depending on size of the 'imputations' there are essentially two ways
out:

for all sites with missing observations, sample the observations from
each site using the estimated zero-probability (from the zero-inflated
model) and sample from the linear predictor (inla.rmarginal()) and then
sample the missing obervations. sum it all up and display. this
strategy neglect any dependency within the linear predictors, but at
the same time, this dependency is reduced as you sample new data. you
can compute the posterior correlation matrix for all those linear
predictors, using 'linear combinations'; see the FAQ. then you have to
sample all the linear predictors jointly, but I'm not convinced that
you'll see a big effect...

there are more complicated alternatives as well, which are more
accurate, but this gets very involved and I do not think they will lead
to very different results.

Best
H

Finn Lindgren

unread,
Jan 3, 2013, 9:32:16 AM1/3/13
to r-inla-disc...@googlegroups.com
On 03/01/13 15:22, thierry....@gmail.com wrote:
> We need to report estimates for the totals: for each year the sum of
> the counts over all locations. Therefore we would like to impute the
> missing data.

Aha, I see. In the Bayesian setting, what you want is the "posterior
predictive distribution for the total counts". In the case of Gaussian
data that wouldn't involve imputation at all, since one can then
explicitly calculate the distribution of the total. In your case I also
wouldn't call it "imputation", since it involves simulating multiple
samples of the joint posterior (so that one gets both a point estimate
and an uncertainty measure, which is typically the whole point of
running inla...).

> Is it doable to get joint posterior densities out of INLA?

Densities, no. Joint samples, yes, but not trivial. Since it involves
new experimental features I'll email you off-list about it...

/Finn

Dong Liang

unread,
Jun 17, 2015, 5:43:20 PM6/17/15
to r-inla-disc...@googlegroups.com
Dear group,

I have similar questions with predictive distribution from a zero-inflated likelihood.

My goal is to estimate the total from a sample using a SPDE model. So I need to spatially interpolate over a fine grid, and ideally at the data scale, and use linear combination to estimate the total.

Please check attached my problematic codes.

My questions are:

1) With the link=1 option, is the "summary.fitted.values"  prediction before applying zero-inflation, or after zero-inflation?

2) If it is before zero-inflation, the total would depends on the zero inflation probability, and the probability enters non-linearly, so linear combination could not be directly applied. is that right?

3) I also tried the experimental sample facility, to generate predictive distribution manually, but the results do not look right, could you please check the codes and correct error(s)?

Regards
dong
inla_question_4.R

Dong Liang

unread,
Jul 1, 2015, 11:19:26 AM7/1/15
to r-inla-disc...@googlegroups.com
Dear all,

I have also similar count data with zero inflation, and would like to fit spatial model.

I have question regarding the "summary.fitted.values" from a type 0 zero inflated poisson model.

Please see attached a simulated example, it seems that the "summary.fitted.values" does not account for zero-inflation,

## predictive distribution from zero inflated Poisson model
require(INLA)
## simulation from the Type 0 zero inflated poisson model
n=500
p = 0.5
z <- rbinom(n, size=1, prob=p)
## simulate from unconditional data y0 ~ Poi(e^eta)
## where eta=x\beta+phi
a = 1
b = 1
x = rnorm(n)
library(geoR)
phi = grf(500,cov.pars=c(1,0.2))
eta = a + b*x + phi$data/3
lambda = exp(eta)
y0 = rpois(n, lambda = lambda)
## combine the zero inflation process with unconditional data y0
y <- y0
y[z==1] <- 0
## create a SPDE model
m1  <- inla.mesh.2d(phi$coords,max.edge=c(0.4,1),cutoff=0.1)
A <- inla.spde.make.A(m1, loc=phi$coords)
spde <- inla.spde2.matern(m1, alpha=2)
mesh.index <- inla.spde.make.index(name="field", n.spde=spde$n.spde)
stk.dat <- inla.stack(data=list(y=y), A=list(A,1), tag="est",
                        effects=list(c(mesh.index,list(Intercept=1)),
                                     list(x=x)))
## fit the zeroinflation model
formula = y ~ -1+Intercept+ x + f(field,model=spde)
result0 = inla(formula, family = "zeroinflatedpoisson0", data = inla.stack.data(stk.dat),
               control.compute=list(config=TRUE),
               control.predictor=list(A=inla.stack.A(stk.dat),link=1))
library(lattice)
xyplot(result0$summary.fitted.values$mean[1:500]~y|ifelse(y>0,"non-zero","zero"),xlab="data",ylab="predited")

If so, can the "sample" facility be used for fitting this model?


## try the sample facility to get the posterior predictive distribution
samples <- inla.posterior.sample(99,result0)

pred <- function(l)
{
  #mu  <- l$latent["Intercept.1",1] + l$latent["x.1",1] * x
  eta <- l$latent[1:500,1] ## extract the linear predictor in log scale
  z <- rbinom(500,size=1,1-l$hyperpar[1]) ## extract the zero inflation probability
  lambda  <-  exp(eta)   ## apply the link function
  y <- rpois(500,lambda=lambda) ## posterior sample of non-zero inflated data
  #print(p)
  z *y  ## apply zero inflation
}

ypred <- sapply(samples,pred) ## generate posterior predictive distribution

m0 <- apply(ypred,1,median)
xyplot(m0~y|ifelse(y>0,"non-zero","zero"),xlab="data",ylab="predited")



Thanks
Dong 

Håvard Rue

unread,
Jul 5, 2015, 3:57:59 AM7/5/15
to Dong Liang, r-inla-disc...@googlegroups.com
On Wed, 2015-07-01 at 08:19 -0700, Dong Liang wrote:
> Dear all,
>
> I have also similar count data with zero inflation, and would like to
> fit spatial model.
>
> I have question regarding the "summary.fitted.values" from a type 0
> zero inflated poisson model.
>
> Please see attached a simulated example, it seems that the
> "summary.fitted.values" does not account for zero-inflation,

summary.fitted.values is just the mapping of the inverse-link applied
to the linear predictor, like

eta[i]

vrs

exp(eta[i])

so it is not always an appropriate predictor regarding future data
-values.

for the zeroinflated models, if you want to predict data-values, you
have to do this yourself, maybe easiest using simulation
(inla.posterior.sample()), or simply ignoring the dependence between
the linear predictor and the zero-prob parameter, and just pretend
their independent.

Best
H


--
Håvard Rue
Department of Mathematical Sciences
Norwegian University of Science and Technology
N-7491 Trondheim, Norway
Voice: +47-7359-3533 URL : http://www.math.ntnu.no/~hrue
Mobile: +47-9260-0021 Email: havar...@math.ntnu.no

R-INLA: www.r-inla.org

Reply all
Reply to author
Forward
0 new messages