Intercept type in spatial model provides different estimates of fixed effects

317 views
Skip to first unread message

Marie-Christine Rufener

unread,
Sep 20, 2022, 6:13:05 AM9/20/22
to R-inla discussion group

Hi all,

 I’m currently testing a set of poisson/nb models, some of which include a spatial effect and others not (for comparison purposes). For your record, my model includes both continuous and factor variables as fixed-effect terms.


While following the extensive (and very helpful) INLA material available online, I noticed that one has to explicitly remove the Intercept term when using a spde model (and add it in the inla.stack effects). This is not the case when running a ‘simple’ model without spatial random effect, whereby the formula uses R’s internal intercept specification.

 

Why does one have to make such distinction explicitly? I spent quite some time reading many INLA material, but couldn’t find any palatable answer to that.

 

That said, I also noticed that estimates of fixed effects changed remarkably between a spatial model that removes the intercept explicitly from R’s standard model formula (see mod1 below), and one that uses R’s internal formula intercept (see mod2 below). The most drastic difference happens for the factor variable, where in the first case the first level of the first factor isn’t even absorbed by the intercept, whereas the one from the second factor variable is. This also leads to estimates with very high standard deviations (see summary of mod1 below).

 

Why is this odd behavior happening? I’m certainly missing some important link, but can't figure out why. In either case, it would be very nice to understand this aspect of INLA better, including why the first factor level is not being absorbed when using INLA standard intercept approach for spde models.

 

Below I include part of my code and the summary of the two models to understand my issue.The model I’m showing has two continuous (offNadirAngle & PopDens) and two factor variables (Image_resolution (three levels) & Cloud_presence(two levels)) as fixed effects.

 

Thank you in advance very much for your time and guidance on this issue.

 

-Marie

 

#################    CODE    #################

 

 

## Estimation stack

#~~~~~~~~~~~~~~~~

 

stk.est <- inla.stack(

  data = list(Ncars = df$Ncars),

  A = list(A, 1),

  effects = list(list(spatial = 1:spde$n.spde),

                 data.frame(Intercept = 1,

                            Image_resolution = df$Image_resolution,

                            offNadirAngle = df$offNadirAngle,

                            Cloud_presence = df$Cloud_presence,

                            Effort = df$SQKM,

                            PopDens = df$Worldpop_dens)),

  tag = 'estimation')

 

 

## Models

#~~~~~~~~

 

# model 1: Uses the intercept specified in the estimation stack (stk.est), with R's internal intercept explicityl removed

mod1 <- inla(Ncars ~ 0 + Intercept + Image_resolution + Cloud_presence + offNadirAngle + log1p(PopDens) + f(spatial, model = spde)+ offset(log(Effort)),

             family="poisson",

             control.compute = list(dic=T, waic=T, cpo=T),

             data = inla.stack.data(stk.est),

             control.predictor = list(A = inla.stack.A(stk.est))) 

 

 

# model 2: Uses R's internal intercept in the model formula

mod2 <- inla(Ncars ~ Image_resolution + Cloud_presence + offNadirAngle + log1p(PopDens) + f(spatial, model = spde)+ offset(log(Effort)),

             family="poisson",

             control.compute = list(dic=T, waic=T, cpo=T),

             data = inla.stack.data(stk.est),

             control.predictor = list(A = inla.stack.A(stk.est))) 

 

 

## Model summaries

#~~~~~~~~~~~~~~~~~~

 

### For mod1

 

summary(mod1)

 

Time used:

    Pre = 2.58, Running = 13.2, Post = 0.0677, Total = 15.8

Fixed effects:

                      mean     sd 0.025quant 0.5quant 0.975quant   mode kld

Intercept           -0.784 15.813    -31.796   -0.784     30.227 -0.784   0

Image_resolution0.3  0.271 15.811    -30.737    0.271     31.280  0.271   0

Image_resolution0.4 -0.261 15.811    -31.269   -0.261     30.748 -0.261   0

Image_resolution0.5 -0.811 15.811    -31.819   -0.811     30.198 -0.811   0

Cloud_presenceNo     0.050  0.001      0.048    0.050      0.051  0.050   0

offNadirAngle       -0.013  0.000     -0.013   -0.013     -0.013 -0.013   0

log1p(PopDens)       0.726  0.046      0.636    0.726      0.817  0.726   0

 

Random effects:

  Name   Model

    spatial SPDE2 model

 

Model hyperparameters:

                    mean    sd 0.025quant 0.5quant 0.975quant  mode

Theta1 for spatial -2.19 0.314      -2.85    -2.17      -1.62 -2.09

Theta2 for spatial  1.69 0.289       1.17     1.67       2.30  1.60

 

Deviance Information Criterion (DIC) ...............: 3298720.68

Deviance Information Criterion (DIC, saturated) ....: 3303741.23

Effective number of parameters .....................: 3873.00

 

Watanabe-Akaike information criterion (WAIC) ...: 642107.11

Effective number of parameters .................: 292598.15

 

Marginal log-Likelihood:  -1695542.82

CPO, PIT is computed

Posterior summaries for the linear predictor and the fitted values are computed

(Posterior marginals needs also 'control.compute=list(return.marginals.predictor=TRUE)')

 

 

 

### For mod2

 

summary(mod2)

 

Time used:

    Pre = 2.56, Running = 8.89, Post = 0.0609, Total = 11.5

Fixed effects:

                      mean    sd 0.025quant 0.5quant 0.975quant   mode kld

(Intercept)         -0.258 0.150     -0.553   -0.258      0.036 -0.257   0

Image_resolution0.4 -0.532 0.001     -0.534   -0.532     -0.531 -0.532   0

Image_resolution0.5 -1.082 0.001     -1.084   -1.082     -1.080 -1.082   0

Cloud_presenceNo     0.050 0.001      0.048    0.050      0.051  0.050   0

offNadirAngle       -0.013 0.000     -0.013   -0.013     -0.013 -0.013   0

log1p(PopDens)       0.727 0.046      0.637    0.727      0.817  0.727   0

 

Random effects:

  Name   Model

    spatial SPDE2 model

 

Model hyperparameters:

                    mean    sd 0.025quant 0.5quant 0.975quant  mode

Theta1 for spatial -2.16 0.302      -2.79    -2.14      -1.61 -2.08

Theta2 for spatial  1.67 0.284       1.15     1.65       2.26  1.59

 

Deviance Information Criterion (DIC) ...............: 3298720.68

Deviance Information Criterion (DIC, saturated) ....: 3303741.23

Effective number of parameters .....................: 3873.00

 

Watanabe-Akaike information criterion (WAIC) ...: 642107.08

Effective number of parameters .................: 292598.14

 

Marginal log-Likelihood:  -1695538.46

CPO, PIT is computed

Posterior summaries for the linear predictor and the fitted values are computed

(Posterior marginals needs also 'control.compute=list(return.marginals.predictor=TRUE)')

Finn Lindgren

unread,
Sep 27, 2022, 4:52:13 AM9/27/22
to Marie-Christine Rufener, R-inla discussion group
[The original message had been flagged as spam]

The problem is that inla.stack() can't do anything clever with factor
inputs (due to the complicated ways in which the R handling of
"factor" variables work), so all levels will be used.
Regular R formula handling will detect the extra degree of freedom in
~ 1 + Image_resolution
and remove the first factor level, but with inla.stack and ~ 0 +
Intercept + Image_resolution that doesn't work.
A workaround is to create the two needed 0/1 dummy variables yourself,
e.g. by extracting the "0.4" and "0.5" columns from
model.matrix(~ 0 + Image_resolution, data = data)
adding them to your data object, so you can then use them directly in
the inla formula.
This is likely the simplest solution in your case, since there are
only three levels in the factor, and it's just doing what the normal
lm() behaviour would do.

Another alternative is to use the inlabru package interface instead,
that has a model component type that can handle specification of both
factors and interaction terms; anything that
MatrixModels::model.Matrix() can handle:
For your model, something like this:

library(inlabru)
fit <- bru(Ncars ~ 0 + fixed(~ Intercept + Image_resolution +
Cloud_presence + offNadirAngle + log1p(Worldpop_dens), model =
"fixed") +
spatial(cbind(easting, northing), model = spde)+ offset(log(SQKM)),
family="poisson",
data = df,
options =list(control.compute = list(dic=T, waic=T, cpo=T))

Note the absence of any inla.stack calls; that's all handled internally.
The "cbind(easting, northing)" part depends on how your
data/coordinates are stored. If it's an sp object,
spatial(coordinates, model=spde)
should work. For sf, we're currently implementing full support, but I
think something like
spatial(st_coordinates(.data.), model=spde)
would work already.

Finn

On Tue, 20 Sept 2022 at 11:13, Marie-Christine Rufener
> --
> You received this message because you are subscribed to the Google Groups "R-inla discussion group" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to r-inla-discussion...@googlegroups.com.
> To view this discussion on the web, visit https://groups.google.com/d/msgid/r-inla-discussion-group/b2e8fdb0-641c-4408-83b4-1694eba1c88an%40googlegroups.com.



--
Finn Lindgren
email: finn.l...@gmail.com

Marie-Christine Rufener

unread,
Sep 28, 2022, 1:46:34 AM9/28/22
to R-inla discussion group
Dear Finn,

Many thanks for your insightful explanation.
It certainly clarified my doubts I was having on all this.

I have one additional question, though:
When you say that I could manually create the dummy variables and insert them in my data object/inla formula, 
do I still need to keep the actual Image_resolution variable, and add the last two dummy levels on top of that (modA)? Or do simply add a dummy variable for each level (modB)?
See short example below:

######################
## Add dummy variable for each factor level (df will be specified in the stk.est object)
df$Img03 <- model.matrix(~ 0 + Image_resolution, data = df)[,1]
df$Img04 <- model.matrix(~ 0 + Image_resolution, data = df)[,2]
df$Img05 <- model.matrix(~ 0 + Image_resolution, data = df)[,3]

modA <- inla(Ncars ~ 0 + Intercept + Image_resolution + Img04 + Img05 + Cloud_presence + offNadirAngle + log1p(PopDens) + f(spatial, model = spde)+ offset(log(Effort)),
family="poisson",
control.compute = list(dic=T, waic=T, cpo=T),
data = inla.stack.data(stk.est),
control.predictor = list(A = inla.stack.A(stk.est))

modB <- inla(Ncars ~ 0 + Intercept + Img03 + Img04 + Img05 + Cloud_presence + offNadirAngle + log1p(PopDens) + f(spatial, model = spde)+ offset(log(Effort)),
family="poisson",
control.compute = list(dic=T, waic=T, cpo=T),
data = inla.stack.data(stk.est),
control.predictor = list(A = inla.stack.A(stk.est))

######################

Kind Regards,
Marie

Finn Lindgren

unread,
Sep 28, 2022, 10:46:04 AM9/28/22
to Marie-Christine Rufener, R-inla discussion group
It's neither of the two, but rather this:

Ncars ~ 0 + Intercept + Img04 + Img05 + Cloud_presence + ...

One can either use all three levels and remove the intercept, or keep
the intercept and use only 2 of the 3 factor levels.
If there are several factor variables, it's common to include an
intercept and remove the first level of each factor variable.

Finn

On Wed, 28 Sept 2022 at 06:46, Marie-Christine Rufener
> To view this discussion on the web, visit https://groups.google.com/d/msgid/r-inla-discussion-group/1229322a-d2b5-4481-8db7-75fe69578091n%40googlegroups.com.

Marie-Christine Rufener

unread,
Sep 29, 2022, 12:59:15 AM9/29/22
to R-inla discussion group
Ok, that makes sense.

I will run a couple of models today with this new configuration.
Thanks for the valuable hints.

Best,
Marie

Reply all
Reply to author
Forward
0 new messages