Different outputs with E and offset arguments in nbinomial

1,506 views
Skip to first unread message

Víctor Hugo Gutiérrez Vélez

unread,
Nov 14, 2014, 11:35:07 AM11/14/14
to r-inla-disc...@googlegroups.com
Dear inla people,

This is a follow up from my previous post.

I am running a negative binomial model which tries to account for differences in area in districts. I am triying to understand whty I am obtaining widely different results when I use E=district_area vs. offset = log(district_area).

When using E=district_area, the non-linear effect of one of the covariates seems to be overfitted (see emod.pdf). This is not the problem when using offset=log(district_area) (seeoffsetmod.pdf). DIC values are also widely different 62060 vs. 15659.

Shouldn't the results be the same given that, as the documentation says: "E represents known constant and log(E) is the offset of η"?

In both cases the verbose shows me this kind of message:
            file: smtp-taucs.c  hgid: 718682c6287f  date: Wed Nov 05 12:49:49 2014 +0100
            Function: GMRFLib_build_sparse_matrix_TAUCS(), Line: 712, Thread: 0
            Variable evaluates to NAN or INF. idx=(1,1). I will try to fix it...
            file: smtp-taucs.c  hgid: 718682c6287f  date: Wed Nov 05 12:49:49 2014

Here are the model specifications:

formula=occurrences~f(inla.group(cov1), model="rw2") + cov2 +  f(district_id, model="bym", graph=g)

e.cheap.mod= inla(formula, data=Data, family="nbinomial", verbose=TRUE, control.fixed=list(prec.intercept=0.1, expand.factor.strategy='inla'), control.inla=list(strategy="gaussian", int.strategy="eb"),  E=district_area)

offset.cheap.mod= inla(formula, data=Data, family="nbinomial", verbose=TRUE, control.fixed=list(prec.intercept=0.1, expand.factor.strategy='inla'), control.inla=list(strategy="gaussian", int.strategy="eb"),  offset=log(district_area))

Thanks for any help,

Victor.

offsetmod.pdf
emod.pdf

Víctor Hugo Gutiérrez Vélez

unread,
Nov 14, 2014, 1:29:23 PM11/14/14
to r-inla-disc...@googlegroups.com

Sorry, I believe my previous post wasn’t clear enough.

If in negative binomial responses, the mean is linked to the linear predictor by

μ = E exp(η)

Then
log(μ) = log(E) + η

Where log (E) is the offset term.

Then why when I run a model as:
r = inla(y ~ 1 +cov + f(u, model="bym"), family= "poisson", data = data.frame(y, cov, u), E = E)

I am getting widely different results than when I run the model as below? Shouldn’t they yield similar results?

r = inla(y ~ 1 +cov + f(u, model="bym"), family= "poisson", data = data.frame(y, cov, u), offset= log(E))

I hope I explained myself better.

Victor.

Håvard Rue

unread,
Nov 16, 2014, 8:28:09 AM11/16/14
to Víctor Hugo Gutiérrez Vélez, r-inla-disc...@googlegroups.com
this is strange, but might be related to the error message, saying that
you get an overflow at the first linear predictor.

can you run this testprogram to verify?

Thanks,
H



n = 100
E = runif(n)
x = runif(n)
eta = E*exp(x)
y = rpois(n, eta)

r = inla(y ~ 1 + x,
data = data.frame(y, x, E),
family = "nbinomial",
control.inla = list(tolerance = 1e-9),
E=E)

rr = inla(y ~ offset(log(E)) + 1 + x,
data = data.frame(y, x, E),
control.inla = list(tolerance = 1e-9),
family = "nbinomial")

rrr = inla(y ~ 1 + x,
data = data.frame(y, x, E),
family = "nbinomial",
control.inla = list(tolerance = 1e-9),
offset=log(E))

print(cbind(r = r$summary.fixed$mean,
rr = rr$summary.fixed$mean,
rrr = rrr$summary.fixed$mean))

print(cbind(r = r$summary.fixed$sd,
rr = rr$summary.fixed$sd,
rrr = rrr$summary.fixed$sd))

print(cbind(r = r$mlik,
rr = rr$mlik,
rrr = rrr$mlik))




--
Håvard Rue
Department of Mathematical Sciences
Norwegian University of Science and Technology
N-7491 Trondheim, Norway
Voice: +47-7359-3533 URL : http://www.math.ntnu.no/~hrue
Mobile: +47-9260-0021 Email: havar...@math.ntnu.no

R-INLA: www.r-inla.org

Reply all
Reply to author
Forward
0 new messages