Good morning.
I ran the sample scripts for the spatial lag model and also applied them to my own dataset. The problem is that the predicted values look too good to be trusted: R^2 is always very close to 1. Please see the sample script for the spatial lag model below:
> require(INLA)
> require(spdep)
> data(boston)
> n <- nrow(boston.c)
> boston.c$idx <- 1:n
> lw <- nb2listw(boston.soi)
> W <- as(as_dgRMatrix_listw(lw), "CsparseMatrix")
> f1 <- log(CMEDV) ~ CRIM + ZN + INDUS
> mmatrix <- model.matrix(f1, boston.c)
> zero.variance = list(prec=list(initial = 25, fixed=TRUE))
> e = eigenw(lw)
> re.idx = which(abs(Im(e)) < 1e-6)
> rho.max = 1/max(Re(e[re.idx]))
> rho.min = 1/min(Re(e[re.idx]))
> rho = mean(c(rho.min, rho.max))
> betaprec <- .0001
> Q.beta = Diagonal(n=ncol(mmatrix), betaprec)
> hyper = list(
+ prec = list(
+ prior = "loggamma",
+ param = c(0.01, 0.01)),
+ rho = list(
+ initial=0,
+ prior = "logitbeta",
+ param = c(1,1)))
> ## Fit model
> slmm1 <- inla( log(CMEDV) ~ -1 +
+ f(idx, model="slm",
+ args.slm=list(
+ rho.min = rho.min,
+ rho.max = rho.max,
+ W=W,
+ X=mmatrix,
+ Q.beta=Q.beta),
+ hyper=hyper),
+ data=boston.c, family="gaussian",
+ control.predictor=list(compute=TRUE),
+ control.family = list(hyper=zero.variance),
+ control.compute=list(dic=TRUE, cpo=TRUE)
+ )
> fit.slm <- slmm1$summary.fitted.values[1]
> cor(fit.slm, log(boston.c$CMEDV))^2
[,1]
mean 1
By comparison, the R^2 values of the maximum-likelihood spatial lag model (lagsarlm) and of the ordinary linear regression are only about 0.77 and 0.42:
## spatial lag model
> m2 <- lagsarlm(f1, boston.c, lw)
> cor(predict(m2), log(boston.c$CMEDV))^2
[1] 0.7734959
## linear regression
> summary(lm1 <- lm(f1, data=boston.c))
Call:
lm(formula = f1, data = boston.c)
Residuals:
     Min       1Q   Median       3Q      Max
-0.92270 -0.18576 -0.02711  0.15459  1.13667
Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  3.2965438  0.0348381  94.625  < 2e-16 ***
CRIM        -0.0176848  0.0017688  -9.998  < 2e-16 ***
ZN           0.0019488  0.0007048   2.765   0.0059 **
INDUS       -0.0197747  0.0025694  -7.696 7.49e-14 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.3123 on 502 degrees of freedom
Multiple R-squared: 0.4184, Adjusted R-squared: 0.4149
F-statistic: 120.4 on 3 and 502 DF, p-value: < 2.2e-16
I also tried the spatial error, spatial lag, and spatial Durbin models from the INLABMA package. All of them give R^2 close to 1, no matter which dataset or likelihood I use. For my own dataset, R^2 is around 0.16 with a binomial regression fitted with the glm function, but it jumps to 0.9998 with the INLA spatial lag model. Could you explain why the predicted values are so good with the INLA spatial models? Do you think these predicted values can be trusted?
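To be explicit about what I mean by R^2 in all of the comparisons above: it is the squared Pearson correlation between fitted and observed values. Below is a minimal sketch of that computation on simulated data. The helper name `r2_cor` is my own (hypothetical, not part of INLA or spdep), and the simulated `x` and `y` are only for illustration.

```r
# Hypothetical helper: R^2 as the squared correlation between
# fitted and observed values (the measure used in this post).
r2_cor <- function(fitted, observed) {
  cor(as.numeric(fitted), as.numeric(observed))^2
}

# Simulated data, for illustration only.
set.seed(1)
x <- rnorm(100)
y <- 1 + 2 * x + rnorm(100)
fit <- lm(y ~ x)

# For a linear model with an intercept, this agrees with the
# "Multiple R-squared" reported by summary().
r2_cor(fitted(fit), y)
all.equal(r2_cor(fitted(fit), y), summary(fit)$r.squared)
```

For the INLA fit, the same helper could be applied to the posterior mean column, for example `r2_cor(slmm1$summary.fitted.values$mean, log(boston.c$CMEDV))`.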
Sincerely,
Anna