1,295 views

Skip to first unread message

Jan 3, 2013, 5:41:41 AM1/3/13

to r-inla-disc...@googlegroups.com

Dear all,

I have count data with overdispersion, zero-inflation and missing data. I'd like to fit a model to this data to impute the missing data. Below is a reproducible example.

My questions:

1) does this give me predictions on the original scale of the data?

2) some of the marginal distributions go below zero. Does that indicate that these data points have a high probability of being zero?

3) Any suggestions to get random counts based on the marginal distributions?

Best regards,

Thierry

library(INLA)

library(ggplot2)

set.seed(12345)

nLocation <- 40

nYear <- 20

Intercept <- 3

sdYear <- 0.5

sdLocation <- 3

sdNoise <- 1

probZero <- 0.25

probMissing <- 0.2

dataset <- expand.grid(location = seq_len(nLocation), year = seq_len(nYear))

dataset$eta <- with(dataset, Intercept + rnorm(nYear, mean = 0, sd = sdYear)[year] + rnorm(nLocation, mean = 0, sd = sdLocation)[dataset$location] + rnorm(nrow(dataset), mean = 0, sd = sdNoise))

dataset$y <- rpois(nrow(dataset), lambda = exp(dataset$eta))

dataset$nonzero <- rbinom(nrow(dataset), size = 1, prob = 1 - probZero)

dataset$missing <- rbinom(nrow(dataset), size = 1, prob = probMissing)

dataset$y0 <- with(dataset, ifelse(missing == 1, NA, y * dataset$nonzero))

ggplot(dataset, aes(x = year, y = y0, group = location)) + geom_line()

link <- rep(NA, nrow(dataset))

link[which(is.na(dataset$y0))] <- 1

model <- inla(y0 ~ f(year, model = "iid") + f(location, model = "iid"), family = "zeroinflatednbinomial1", data = dataset, control.predictor = list(compute = TRUE), control.compute = list(dic = TRUE))

plot(model)

summary(model)

which(model$summary.fitted.values$mean < 0)

plot(model$marginals.fitted.values[[which.min(model$summary.fitted.values$mean)]], type = "l")

dataset$prediction <- model$summary.fitted.values$mean

ggplot(dataset, aes(x = year, y = prediction)) + geom_line() + geom_point(aes(y = y0)) + facet_wrap(~location, scales = "free_y")

I have count data with overdispersion, zero-inflation and missing data. I'd like to fit a model to this data to impute the missing data. Below is a reproducible example.

My questions:

1) does this give me predictions on the original scale of the data?

2) some of the marginal distributions go below zero. Does that indicate that these data points have a high probability of being zero?

3) Any suggestions to get random counts based on the marginal distributions?

Best regards,

Thierry

library(INLA)

library(ggplot2)

set.seed(12345)

nLocation <- 40

nYear <- 20

Intercept <- 3

sdYear <- 0.5

sdLocation <- 3

sdNoise <- 1

probZero <- 0.25

probMissing <- 0.2

dataset <- expand.grid(location = seq_len(nLocation), year = seq_len(nYear))

dataset$eta <- with(dataset, Intercept + rnorm(nYear, mean = 0, sd = sdYear)[year] + rnorm(nLocation, mean = 0, sd = sdLocation)[dataset$location] + rnorm(nrow(dataset), mean = 0, sd = sdNoise))

dataset$y <- rpois(nrow(dataset), lambda = exp(dataset$eta))

dataset$nonzero <- rbinom(nrow(dataset), size = 1, prob = 1 - probZero)

dataset$missing <- rbinom(nrow(dataset), size = 1, prob = probMissing)

dataset$y0 <- with(dataset, ifelse(missing == 1, NA, y * dataset$nonzero))

ggplot(dataset, aes(x = year, y = y0, group = location)) + geom_line()

link <- rep(NA, nrow(dataset))

link[which(is.na(dataset$y0))] <- 1

model <- inla(y0 ~ f(year, model = "iid") + f(location, model = "iid"), family = "zeroinflatednbinomial1", data = dataset, control.predictor = list(compute = TRUE), control.compute = list(dic = TRUE))

plot(model)

summary(model)

which(model$summary.fitted.values$mean < 0)

plot(model$marginals.fitted.values[[which.min(model$summary.fitted.values$mean)]], type = "l")

dataset$prediction <- model$summary.fitted.values$mean

ggplot(dataset, aes(x = year, y = prediction)) + geom_line() + geom_point(aes(y = y0)) + facet_wrap(~location, scales = "free_y")

Jan 3, 2013, 6:26:54 AM1/3/13

to thierry....@gmail.com, r-inla-disc...@googlegroups.com

On Thu, 2013-01-03 at 02:41 -0800, thierry....@gmail.com wrote:

> Dear all,

>

> I have count data with overdispersion, zero-inflation and missing

> data. I'd like to fit a model to this data to impute the missing data.

> Below is a reproducible example.

>

> My questions:

> 1) does this give me predictions on the original scale of the data?

> 2) some of the marginal distributions go below zero. Does that

> indicate that these data points have a high probability of being zero?

> 3) Any suggestions to get random counts based on the marginal

> distributions?

check if this code runs better...
> Dear all,

>

> I have count data with overdispersion, zero-inflation and missing

> data. I'd like to fit a model to this data to impute the missing data.

> Below is a reproducible example.

>

> My questions:

> 1) does this give me predictions on the original scale of the data?

> 2) some of the marginal distributions go below zero. Does that

> indicate that these data points have a high probability of being zero?

> 3) Any suggestions to get random counts based on the marginal

> distributions?

you forgot to add the argument 'link = link' in control.predictor. the

fitted values are now on the correct scale. this fixed #1 and #2.

#3: Don't understand...

Best,

Håvard

--

INLA help <he...@r-inla.org>

R-INLA

Jan 3, 2013, 7:29:46 AM1/3/13

to r-inla-disc...@googlegroups.com, thierry....@gmail.com, he...@r-inla.org

Dear Håvard,

Thank you for your fast reply.

Meanwhile I found the inla.rmarginal() function which almost does what I want for my question #3. It returns continuous values while I expect integers, like rpois() and rbinom() , since I use a negative binomial family. Would it be ok to round the output of inla.rmarginal()? Round it to the nearest integer or round it downwards?

Best regards,

Thierry

Op donderdag 3 januari 2013 12:26:54 UTC+1 schreef help help het volgende:

Thank you for your fast reply.

Meanwhile I found the inla.rmarginal() function which almost does what I want for my question #3. It returns continuous values while I expect integers, like rpois() and rbinom() , since I use a negative binomial family. Would it be ok to round the output of inla.rmarginal()? Round it to the nearest integer or round it downwards?

Best regards,

Thierry

Op donderdag 3 januari 2013 12:26:54 UTC+1 schreef help help het volgende:

Jan 3, 2013, 8:37:17 AM1/3/13

to thierry....@gmail.com, r-inla-disc...@googlegroups.com, thierry....@gmail.com, he...@r-inla.org

On 3 jan 2013, at 13:29, thierry....@gmail.com wrote:

Meanwhile I found the inla.rmarginal() function which almost does what I want for my question #3. It returns continuous values while I expect integers, like rpois() and rbinom() , since I use a negative binomial family. Would it be ok to round the output of inla.rmarginal()? Round it to the nearest integer or round it downwards?

No, the values you get are not negbin samples, but rather samples from the posterior for the expectation/location _parameter_ of the negbin model. If you only need to sample a single data point you could thus feed the rmarginal value into a sampler for the negbin distribution to get a sampled data point from the posterior predictive data distribution.

However, if you need to jointly sample several data points it gets much more complicated to do correctly, since that requires joint samples from the posterior of the latent field, and expert knowledge is likely needed. Also, I'm not sure those simulated values would be appropriate for imputation, which to me seems a very non-Bayesian thing to do, since Bayesian calculations automatically integrate out the uncertainty of the "missing" values.

Finn

Jan 3, 2013, 9:22:13 AM1/3/13

to r-inla-disc...@googlegroups.com, he...@r-inla.org

Dear Finn,

Thanks for the clarification on the interpretation of the marginal values.

We need to report estimates for the totals: for each year the sum of the counts over all locations. Therefore we would like to impute the missing data.

Is it doable to get joint posterior densities out of INLA?

Best regards,

Thierry, how is more skilled in the frequentist stuff than the bayesian stuff...

Op donderdag 3 januari 2013 14:37:17 UTC+1 schreef Finn Lindgren het volgende:

Thanks for the clarification on the interpretation of the marginal values.

We need to report estimates for the totals: for each year the sum of the counts over all locations. Therefore we would like to impute the missing data.

Is it doable to get joint posterior densities out of INLA?

Best regards,

Thierry, how is more skilled in the frequentist stuff than the bayesian stuff...

Op donderdag 3 januari 2013 14:37:17 UTC+1 schreef Finn Lindgren het volgende:

Jan 3, 2013, 9:36:55 AM1/3/13

to thierry....@gmail.com, r-inla-disc...@googlegroups.com

On Thu, 2013-01-03 at 06:22 -0800, thierry....@gmail.com wrote:

> Dear Finn,

>

> Thanks for the clarification on the interpretation of the marginal

> values.

>

> We need to report estimates for the totals: for each year the sum of

> the counts over all locations. Therefore we would like to impute the

> missing data.

>

> Is it doable to get joint posterior densities out of INLA?

partly. well, you'll get the gaussian part with its correlations.
> Dear Finn,

>

> Thanks for the clarification on the interpretation of the marginal

> values.

>

> We need to report estimates for the totals: for each year the sum of

> the counts over all locations. Therefore we would like to impute the

> missing data.

>

> Is it doable to get joint posterior densities out of INLA?

depending on size of the 'imputations' there are essentially two ways

out:

for all sites with missing observations, sample the observations from

each site using the estimated zero-probability (from the zero-inflated

model) and sample from the linear predictor (inla.rmarginal()) and then

sample the missing obervations. sum it all up and display. this

strategy neglect any dependency within the linear predictors, but at

the same time, this dependency is reduced as you sample new data. you

can compute the posterior correlation matrix for all those linear

predictors, using 'linear combinations'; see the FAQ. then you have to

sample all the linear predictors jointly, but I'm not convinced that

you'll see a big effect...

there are more complicated alternatives as well, which are more

accurate, but this gets very involved and I do not think they will lead

to very different results.

Best

H

Jan 3, 2013, 9:32:16 AM1/3/13

to r-inla-disc...@googlegroups.com

On 03/01/13 15:22, thierry....@gmail.com wrote:

> We need to report estimates for the totals: for each year the sum of

> the counts over all locations. Therefore we would like to impute the

> missing data.

Aha, I see. In the Bayesian setting, what you want is the "posterior
> We need to report estimates for the totals: for each year the sum of

> the counts over all locations. Therefore we would like to impute the

> missing data.

predictive distribution for the total counts". In the case of Gaussian

data that wouldn't involve imputation at all, since one can then

explicitly calculate the distribution of the total. In your case I also

wouldn't call it "imputation", since it involves simulating multiple

samples of the joint posterior (so that one gets both a point estimate

and an uncertainty measure, which is typically the whole point of

running inla...).

> Is it doable to get joint posterior densities out of INLA?

new experimental features I'll email you off-list about it...

/Finn

Jun 17, 2015, 5:43:20 PM6/17/15

to r-inla-disc...@googlegroups.com

Dear group,

I have similar questions with predictive distribution from a zero-inflated likelihood.

My goal is to estimate the total from a sample using a SPDE model. So I need to spatially interpolate over a fine grid, and ideally at the data scale, and use linear combination to estimate the total.

Please check attached my problematic codes.

My questions are:

1) With the link=1 option, is the "summary.fitted.values" prediction before applying zero-inflation, or after zero-inflation?

2) If it is before zero-inflation, the total would depends on the zero inflation probability, and the probability enters non-linearly, so linear combination could not be directly applied. is that right?

3) I also tried the experimental sample facility, to generate predictive distribution manually, but the results do not look right, could you please check the codes and correct error(s)?

Regards

dong

Jul 1, 2015, 11:19:26 AM7/1/15

to r-inla-disc...@googlegroups.com

Dear all,

I have also similar count data with zero inflation, and would like to fit spatial model.

I have question regarding the "summary.fitted.values" from a type 0 zero inflated poisson model.

Please see attached a simulated example, it seems that the "summary.fitted.values" does not account for zero-inflation,

## predictive distribution from zero inflated Poisson model

require(INLA)

## simulation from the Type 0 zero inflated poisson model

n=500

## simulation from the Type 0 zero inflated poisson model

n=500

p = 0.5

z <- rbinom(n, size=1, prob=p)

z <- rbinom(n, size=1, prob=p)

## simulate from unconditional data y0 ~ Poi(e^eta)

## where eta=x\beta+phi

a = 1

b = 1

x = rnorm(n)

library(geoR)

phi = grf(500,cov.pars=c(1,0.2))

eta = a + b*x + phi$data/3

lambda = exp(eta)

y0 = rpois(n, lambda = lambda)

## where eta=x\beta+phi

a = 1

b = 1

x = rnorm(n)

library(geoR)

phi = grf(500,cov.pars=c(1,0.2))

eta = a + b*x + phi$data/3

lambda = exp(eta)

y0 = rpois(n, lambda = lambda)

## combine the zero inflation process with unconditional data y0

y <- y0

y[z==1] <- 0

y <- y0

y[z==1] <- 0

## create a SPDE model

m1 <- inla.mesh.2d(phi$coords,max.edge=c(0.4,1),cutoff=0.1)

A <- inla.spde.make.A(m1, loc=phi$coords)

spde <- inla.spde2.matern(m1, alpha=2)

mesh.index <- inla.spde.make.index(name="field", n.spde=spde$n.spde)

stk.dat <- inla.stack(data=list(y=y), A=list(A,1), tag="est",

effects=list(c(mesh.index,list(Intercept=1)),

list(x=x)))

## fit the zeroinflation model

formula = y ~ -1+Intercept+ x + f(field,model=spde)

result0 = inla(formula, family = "zeroinflatedpoisson0", data = inla.stack.data(stk.dat),

control.compute=list(config=TRUE),

control.predictor=list(A=inla.stack.A(stk.dat),link=1))

m1 <- inla.mesh.2d(phi$coords,max.edge=c(0.4,1),cutoff=0.1)

A <- inla.spde.make.A(m1, loc=phi$coords)

spde <- inla.spde2.matern(m1, alpha=2)

mesh.index <- inla.spde.make.index(name="field", n.spde=spde$n.spde)

stk.dat <- inla.stack(data=list(y=y), A=list(A,1), tag="est",

effects=list(c(mesh.index,list(Intercept=1)),

list(x=x)))

## fit the zeroinflation model

formula = y ~ -1+Intercept+ x + f(field,model=spde)

result0 = inla(formula, family = "zeroinflatedpoisson0", data = inla.stack.data(stk.dat),

control.compute=list(config=TRUE),

control.predictor=list(A=inla.stack.A(stk.dat),link=1))

library(lattice)

xyplot(result0$summary.fitted.values$mean[1:500]~y|ifelse(y>0,"non-zero","zero"),xlab="data",ylab="predited")

xyplot(result0$summary.fitted.values$mean[1:500]~y|ifelse(y>0,"non-zero","zero"),xlab="data",ylab="predited")

If so, can the "sample" facility be used for fitting this model?

## try the sample facility to get the posterior predictive distribution

samples <- inla.posterior.sample(99,result0)pred <- function(l)

{

#mu <- l$latent["Intercept.1",1] + l$latent["x.1",1] * x

eta <- l$latent[1:500,1] ## extract the linear predictor in log scale

z <- rbinom(500,size=1,1-l$hyperpar[1]) ## extract the zero inflation probability

lambda <- exp(eta) ## apply the link function

y <- rpois(500,lambda=lambda) ## posterior sample of non-zero inflated data

#print(p)

z *y ## apply zero inflation

}ypred <- sapply(samples,pred) ## generate posterior predictive distribution

m0 <- apply(ypred,1,median)

xyplot(m0~y|ifelse(y>0,"non-zero","zero"),xlab="data",ylab="predited")

Thanks

Dong

Jul 5, 2015, 3:57:59 AM7/5/15

to Dong Liang, r-inla-disc...@googlegroups.com

On Wed, 2015-07-01 at 08:19 -0700, Dong Liang wrote:

> Dear all,

>

> I have also similar count data with zero inflation, and would like to

> fit spatial model.

>

> I have question regarding the "summary.fitted.values" from a type 0

> zero inflated poisson model.

>

> Please see attached a simulated example, it seems that the

> "summary.fitted.values" does not account for zero-inflation,

summary.fitted.values is just the mapping of the inverse-link applied
> Dear all,

>

> I have also similar count data with zero inflation, and would like to

> fit spatial model.

>

> I have question regarding the "summary.fitted.values" from a type 0

> zero inflated poisson model.

>

> Please see attached a simulated example, it seems that the

> "summary.fitted.values" does not account for zero-inflation,

to the linear predictor, like

eta[i]

vrs

exp(eta[i])

so it is not always an appropriate predictor regarding future data

-values.

for the zeroinflated models, if you want to predict data-values, you

have to do this yourself, maybe easiest using simulation

(inla.posterior.sample()), or simply ignoring the dependence between

the linear predictor and the zero-prob parameter, and just pretend

their independent.

Best

H

--

Håvard Rue

Department of Mathematical Sciences

Norwegian University of Science and Technology

N-7491 Trondheim, Norway

Voice: +47-7359-3533 URL : http://www.math.ntnu.no/~hrue

Mobile: +47-9260-0021 Email: havar...@math.ntnu.no

R-INLA: www.r-inla.org

Reply all

Reply to author

Forward

0 new messages

Search

Clear search

Close search

Google apps

Main menu