How to interpret inla outputs that include categorical variables?

1,573 views
Skip to first unread message

krtmc...@gmail.com

unread,
Sep 28, 2018, 11:08:57 AM9/28/18
to R-inla discussion group
Hi,
I’d like some help figuring out how to interpret the output from inla for categorical variables. 

 
The suggestion was as follows: “To get the behaviour you describe, you can instead use an f(...,model=“iid”, constr=TRUE) random effects model and impose a sum to zero constrain there (and optionally fix the precision parameter, to mimic the behaviour of a regular factor covariate).”

I would like to know how this this differs from using “control.fixed=list(expand.factor.strategy="inla")”.
It gives the output below. How should I interpret this? The 95% credible intervals for the most important levels of the categorical variable do not contain zero? How is this determined. i.e. was it compared to the mean of all of the levels? Also, how different is this interpretation when compared with using f(...,model=“iid”, constr=TRUE)? Thanks for your help.


Kurt

######For single variable soil..,
stack.est = inla.stack(data = list(AGB=est.data), A = list(A.est, 1),
effects = list(c(field.indices), data.frame (Intercept=1, soil=biomass4$DOMSOI_12, Plot = inla.group(biomass4$Plot))),
tag="est")

formula <- AGB ~ -1 + Intercept + soil +  f(Plot, model="rw1")+  f(field, model=spde)

family = 'gaussian'
control.family = list(hyper = list(prec = list(
  prior = "pc.prec", fixed = FALSE, param = c(prior.median.gaus.sd,0.5))))


#-- Call INLA and get results --#
mod =   inla(formula,
             data=inla.stack.data(stack.est),
             family=family,
             control.predictor=list(A=inla.stack.A(stack.est), compute=TRUE),
  control.fixed=list(expand.factor.strategy="inla"),
             control.compute=list(cpo=TRUE, dic=TRUE,   waic = TRUE, config=TRUE),
             keep=FALSE, verbose=TRUE)

mod$dic$dic
print(summary(mod))


Fixed effects:
mean sd 0.025quant 0.5quant 0.975quant  mode
Intercept 202.5691 8.5827 185.7158 202.5694 219.4049 202.5709
Ah 61.4027 20.4614 21.1749 61.4197 101.499 61.4557
Ao 6.5099 14.4237 -21.8095 6.5086 34.8105 6.5071
Ap 53.892 21.0617 12.4911 53.9072 95.1706 53.9394
Bd -5.5688 25.786 -56.1909 -5.5717 45.0225 -5.5752
Be 2.6193 29.3545 -55.0186 2.6199 60.2011 2.6235
Bh 9.5713 24.52 -38.5799 9.5733 57.6671 9.5794
E -23.0918 25.8093 -73.7437 -23.1 27.5594 -23.1144
Fh 10.5801 29.2085 -46.7717 10.5811 67.8739 10.5856
Fo 45.6997 14.8441 16.5363 45.7046 74.8089 45.7157
Fr 16.9426 29.2151 -40.4254 16.9447 74.2465 16.9513
Fx 69.1909 11.6517 46.305 69.193 92.0438 69.1983
Gd 85.9835 14.3183 57.8385 85.9933 114.0484 86.0142
Ge 7.2266 29.2058 -50.1182 7.227 64.5167 7.2303
Gm -11.5483 27.3186 -65.1754 -11.5524 42.052 -11.5582
I 4.9352 12.4293 -19.4662 4.9334 29.3243 4.9307
Je -27.2367 20.4151 -67.2947 -27.2467 12.8391 -27.2648
Kl -31.0783 19.3554 -69.0542 -31.0888 6.9207 -31.1083
Lc -99.5886 19.9491 -138.6698 -99.6196 -60.3725 -99.6803
Lf 16.633 29.2146 -40.7339 16.6351 73.9359 16.6418
Lo -46.6099 24.5689 -94.8031 -46.6263 1.6298 -46.6572
Nd 3.0787 20.4036 -36.9839 3.078 43.1079 3.0785
O -12.457 29.2103 -69.8002 -12.4601 44.8512 -12.4639
Od 20.3532 23.4661 -25.7392 20.3587 66.373 20.3716
Tm -22.655 29.2237 -80.0193 -22.6599 34.6842 -22.6673
Tv 26.6877 29.2302 -30.7151 26.6915 84.0166 26.7018
WR 28.6334 19.3648 -9.4104 28.6398 66.6068 28.6542
Zg 12.4649 20.4067 -27.6122 12.4672 52.4927 12.4733
Random effects:
Name      Model
 Plot   RW1 model 
field   SPDE2 model 
Model hyperparameters:
                                             mean       sd 0.025quant  0.5quant 0.975quant      mode
Precision for the Gaussian observations    0.0002   0.0000     0.0002    0.0002     0.0002    0.0002
Precision for Plot                        53.1149  14.0537    32.1695   50.8144    86.8022   46.3932
Range for field                         2181.8808 973.7679   855.1236 1993.7392  4608.8956 1665.5552
Stdev for field                            0.7496   0.4021     0.2514    0.6583     1.7860    0.5124
Expected number of effective parameters(std dev): 13.25(0.2999)
Number of equivalent replicates : 24.45 
Deviance Information Criterion (DIC) ...: 3740.01
Effective number of parameters .........: 13.68
Watanabe-Akaike information criterion (WAIC) ...: 3743.41
Effective number of parameters .................: 16.00
Marginal log-Likelihood:  -1932.28 
CPO and PIT are computed
Posterior marginals for linear predictor and fitted values computed

krtmc...@gmail.com

unread,
Nov 1, 2018, 6:34:15 PM11/1/18
to R-inla discussion group
Can anyone help??

Helpdesk

unread,
Nov 3, 2018, 5:33:33 AM11/3/18
to krtmc...@gmail.com, R-inla discussion group
the issue about 'factors' is that in the freq viewpoint, the common
approach is to chose a reference category, and then everything else is
relative to that one, this is what happen with 'lm()' and relatives,

> model.matrix(~1+x, data=data.frame(x=as.factor(c(1,2,3))))
(Intercept) x2 x3
1 1 0 0
2 1 1 0
3 1 0 1
attr(,"assign")
[1] 0 1 1
attr(,"contrasts")
attr(,"contrasts")$x
[1] "contr.treatment"


from a Bayesian point of view, this leads to a non-exchangable prior
for 'x', as the marginal variance depends on the choice of reference
category. As an alternative, you can use the iid which sums to zero,

f(x, model="iid", constr=T)

so that the average effect of 'x' is zero.

this one, explains a third strategy

http://www.r-inla.org/faq#TOC-How-does-inla-deal-with-NA-s-in-the-data-argument-

by setting the expand.factor.strategy="inla", then you'll get a
'singular' expandsion without the reference group, but then only
contrasts are likelihood identifiable. see the refence for details.


there is no single best solution here...

Best
H
> > .4049 202.5709
> --
> You received this message because you are subscribed to the Google
> Groups "R-inla discussion group" group.
> To unsubscribe from this group and stop receiving emails from it,
> send an email to r-inla-discussion...@googlegroups.com
> .
> To post to this group, send email to
> r-inla-disc...@googlegroups.com.
> Visit this group at
> https://groups.google.com/group/r-inla-discussion-group.
> For more options, visit https://groups.google.com/d/optout.

--
Håvard Rue
Helpdesk
he...@r-inla.org

krtmc...@gmail.com

unread,
Nov 5, 2018, 9:20:01 AM11/5/18
to R-inla discussion group
Thanks a million! 

Kurt

Brian

unread,
Jan 24, 2025, 3:22:16 AM1/24/25
to R-inla discussion group
Hello, does this mean that when using f(x, model ="iid", constructed = T),  the effects of 'x' levels are interpreted relative to the overall average effect of the outcome? Specifically, does it imply that the average 'y' value in x1/x2 is higher or lower compared to the average y?

Helpdesk (Haavard Rue)

unread,
Jan 24, 2025, 6:30:01 AM1/24/25
to Brian, R-inla discussion group

I guess you mean f(x, model ="iid", constr=TRUE)

constr=T will say that the sum (or mean), of the iid effects are set to zero,
hence the iid effects are centered. this would ensure, f.ex. that a model

1 + f(x, model="iid", constr=TRUE)

has no confounding between the intercept and iid terms
he...@r-inla.org

Brian

unread,
Jan 24, 2025, 7:06:49 AM1/24/25
to R-inla discussion group
Thank you for the clarification! Just to confirm, with constr=TRUE, the random effects for x (where x is a factor with several levels) capture the deviations of y for each level of x from the overall average (the intercept)?

Helpdesk (Haavard Rue)

unread,
Jan 24, 2025, 2:57:13 PM1/24/25
to Brian, R-inla discussion group
On Fri, 2025-01-24 at 04:06 -0800, Brian wrote:
> Thank you for the clarification! Just to confirm, with constr=TRUE, the random
> effects for x (where x is a factor with several levels) capture the deviations
> of y for each level of x from the overall average (the intercept)?

yes

n=10
> x=as.factor(1:10)
> y=rnorm(n)
> r=inla(y~1 + f(idx,model="iid",constr=TRUE), data=data.frame(y,idx=1:n, x),
family="stdnormal")
> r$summary.fixed
mean sd 0.025quant 0.5quant 0.975quant
(Intercept) 0.3337736279 0.3162254153 -0.2863130495 0.3337736279 0.9538603054
mode kld
(Intercept) 0.3337736279 5.526824971e-11
> r$summary.random$idx$mean
[1] -1.279748355e-04 -1.140560885e-04 8.522459669e-05 1.652662371e-04
[5] -1.344403152e-04 -3.720065591e-05 5.959069135e-05 9.553867382e-05
[9] -8.215610038e-05 9.021291741e-05
> mean(r$summary.random$idx$mean)
[1] 5.120967894e-10

Brian

unread,
Jan 25, 2025, 4:26:23 AM1/25/25
to R-inla discussion group
Thank you!
Reply all
Reply to author
Forward
0 new messages