Questions about the perfect predicted values

510 views
Skip to first unread message

Anna

unread,
Jan 29, 2017, 10:52:18 AM1/29/17
to R-inla discussion group
Dear INLA developers,

Good morning.

I tried the sample scripts of the spatial lag model and implemented these scripts to my dataset as well. The problem is the predicted values are too good to be trusted because R^2 values are always very close to 1. Please look at the sample scripts of the spatial lag model below:

> require(INLA)
> require(spdep)
> data(boston)
> n <- nrow(boston.c)
> boston.c$idx <- 1:n
> lw <- nb2listw(boston.soi)
> W <- as(as_dgRMatrix_listw(lw), "CsparseMatrix")
> f1 <- log(CMEDV) ~ CRIM + ZN + INDUS
> mmatrix <- model.matrix(f1, boston.c)
> zero.variance = list(prec=list(initial = 25, fixed=TRUE))
> e = eigenw(lw)
> re.idx = which(abs(Im(e)) < 1e-6)
> rho.max = 1/max(Re(e[re.idx]))
> rho.min = 1/min(Re(e[re.idx]))
> rho = mean(c(rho.min, rho.max))
> betaprec <- .0001
> Q.beta = Diagonal(n=ncol(mmatrix), betaprec)
> hyper = list(
+   prec = list(
+     prior = "loggamma",
+     param = c(0.01, 0.01)),
+   rho = list(
+     initial=0,
+     prior = "logitbeta",
+     param = c(1,1)))## Fit model
> slmm1 <- inla( log(CMEDV) ~ -1 +
+                  f(idx, model="slm",
+                    args.slm=list(
+                      rho.min = rho.min,
+                      rho.max = rho.max,
+                      W=W,
+                      X=mmatrix,
+                      Q.beta=Q.beta),
+                    hyper=hyper),
+                data=boston.c, family="gaussian",
+                control.predictor=list(compute=T),
+                control.family = list(hyper=zero.variance),
+                control.compute=list(dic=TRUE, cpo=TRUE)
+ )
> fit.slm <- slmm1$summary.fitted.values[1]
> cor(fit.slm, log(boston.c$CMEDV))^2
     [,1]
mean    1  

If we look at R^2 of the regular spatial lag model and the linear regression model, we can see R^2 values are only 0.77 and 0.41.
## spatial lag model
> m2 <- lagsarlm(f1, boston.c, lw)
> cor(predict(m2), log(boston.c$CMEDV))^2
[1] 0.7734959

## linear regression
> summary(lm1 <- lm(f1, data=boston.c))

Call:
lm(formula = f1, data = boston.c)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.92270 -0.18576 -0.02711  0.15459  1.13667 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  3.2965438  0.0348381  94.625  < 2e-16 ***
CRIM        -0.0176848  0.0017688  -9.998  < 2e-16 ***
ZN           0.0019488  0.0007048   2.765   0.0059 ** 
INDUS       -0.0197747  0.0025694  -7.696 7.49e-14 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.3123 on 502 degrees of freedom
Multiple R-squared:  0.4184, Adjusted R-squared:  0.4149 
F-statistic: 120.4 on 3 and 502 DF,  p-value: < 2.2e-16

I tried the spatial error model, spatial lag model, and spatial Durbin model from INLABMA package as well, all of them have R^2 close to 1 no matter which dataset and which likehoods I used. My dataset has R^2 around 0.16 when I used GLM function with binomial regression model. However, the R^2 value increases dramatically to 0.9998 if I use INLA spatial lag model. Could you explain why the predicted values are so good when using INLA spatial models? Do you think these predicted values can be trusted?

Sincerely,
Anna 

Håvard Rue

unread,
Jan 30, 2017, 12:11:49 AM1/30/17
to Anna, R-inla discussion group
Hi,

the issue is that the slm model moves the whole model, including the
likelihood, into the latent field, and hence the 'predicted' and
'fitted' values one normally use, does not make sense. You'll also see
from the code that the gaussian response has essetially zero variance.
you have to use the values from within the slm model itself. VIRGILIO
GOMEZ RUBIO <Virgili...@uclm.es> knows details here...

Best,
H
> -- 
> You received this message because you are subscribed to the Google
> Groups "R-inla discussion group" group.
> To unsubscribe from this group and stop receiving emails from it,
> send an email to r-inla-discussion...@googlegroups.com
> .
> To post to this group, send email to r-inla-discussion-group@googlegr
> oups.com.
> Visit this group at https://groups.google.com/group/r-inla-discussion
> -group.
> For more options, visit https://groups.google.com/d/optout.

--
Håvard Rue
Department of Mathematical Sciences
Norwegian University of Science and Technology
N-7491 Trondheim, Norway
Voice: +47-7359-3533 URL : http://www.math.ntnu.no/~hrue
Mobile: +47-9260-0021 Email: havar...@math.ntnu.no

R-INLA: www.r-inla.org

Virgilio Gómez Rubio

unread,
Jan 30, 2017, 6:03:43 PM1/30/17
to Havard Rue, Anna, R-inla discussion group
Dear Anna,

In addition to what Havard has said, we are aware of this issue. Roger Bivand reported that  some time ago and I am trying to find out what is going on. Please, note that the slm latent effect is still marked as experimental for a reason...

In principle, for Gaussian models the estimates of the different parameters are very accurate, so I think that an alternative way to obtain the fitted values is to compute a linear combination on the parameters using the actual values of the covariates. Perhaps Havard can correct me here.

For non-Gaussian models we still need to conduct more tests as I am not sure we are using the same link function as other implementations...

I am developing an R package that includes some JAGS code to fit some Spatial Econometrics models to carry out all these comparissons. It is slow but I am using it to compare INLA and MCMC. You can find it here:


There is a doc (still under development) comparing results with INLA and MCMC under SEjags/inst/doc.

You can install the whole package using:

install_github("becarioprecario/SEjags")

Hope this helps.

Virgilio





> .
> To post to this group, send email to r-inla-discussion-group@googlegr
> oups.com.
> Visit this group at https://groups.google.com/group/r-inla-discussion
> -group.
> For more options, visit https://groups.google.com/d/optout.

--
Håvard Rue
Department of Mathematical Sciences
Norwegian University of Science and Technology
N-7491 Trondheim, Norway
Voice:  +47-7359-3533    URL  : http://www.math.ntnu.no/~hrue
Mobile: +47-9260-0021    Email: havar...@math.ntnu.no

R-INLA: www.r-inla.org
--
You received this message because you are subscribed to the Google Groups "R-inla discussion group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to r-inla-discussion-group+unsub...@googlegroups.com.
To post to this group, send an email to r-inla-discussion-group@googlegroups.com.

Anna

unread,
Jan 30, 2017, 7:22:14 PM1/30/17
to R-inla discussion group, hr...@r-inla.org, puxiao...@gmail.com
Dear Dr. Gómez-Rubio,

Thanks for your reply and details explanation. I tried to compute the linear combination of the parameters of the slm model,
but the fitted values are very different from the observed values. Could you show me an example?

Sincerely,
Anna 
To post to this group, send an email to r-inla-disc...@googlegroups.com.

Virgilio Gómez Rubio

unread,
Jan 31, 2017, 6:28:09 AM1/31/17
to Anna, R-inla discussion group, Havard Rue
Hi Anna,

please, check the vignette in the SEjags package. You can install the package and browse it doing this:

devtools::install_github("becarioprecario/SEjags")
library(SEjags)
browseVignettes("SEjags")

Note that, at the moment, the documentation is incomplete and it may change, but it will show you how to compute the fitted values using linear combinations. Also, it seems as if the marginals computed with INLA and MCMC are the same but that the summary statistics are not.

Best,

Virgilio

KC

unread,
Aug 13, 2019, 2:18:14 PM8/13/19
to R-inla discussion group
 I noticed that this post from ~ two years ago as I am having similar problems to those described by Anna. 

I'm trying to run some land cost data through SLM, SDM and SEM models. While the models all converge successfully I am struggling to obtain predictions from the SEM model for un-modeled areas, which seems to provide the best fit for the data. As per suggestions in a couple vignettes, I included the relevant covariates for the locations where I wanted to make predictions and provided response values of NAs for the model. However, using the semm1$summary.fitted.values$mean code just seems to spit out predictions that are identical to the known values. I know one suggestion in this post was to look at some jags code for linear combinations. Have any other solutions been uncovered since that time? Either way, is there example code somewhere that I could use to learn a bit more about how to do this? The previous post mentions code in a jags GitHub location but I couldn't seem to find a relevant code line there either (although I could very well just not be looking for the right thing). 
Input appreciated.
Thanks,
Kaylan

Virgilio Gómez-Rubio

unread,
Aug 14, 2019, 1:09:55 PM8/14/19
to KC, R-inla discussion group
Hi,



I'm trying to run some land cost data through SLM, SDM and SEM models. While the models all converge successfully I am struggling to obtain predictions from the SEM model for un-modeled areas, which seems to provide the best fit for the data. As per suggestions in a couple vignettes, I included the relevant covariates for the locations where I wanted to make predictions and provided response values of NAs for the model. However, using the semm1$summary.fitted.values$mean code just seems to spit out predictions that are identical to the known values.

This is actually a ‘feature’ of how the model is defined as the variance of the Gaussian likelihood is close to zero.

I know one suggestion in this post was to look at some jags code for linear combinations. Have any other solutions been uncovered since that time?

This is something that we need to check in detail and probably address it in the paper where we describe the slm latent effect...

Either way, is there example code somewhere that I could use to learn a bit more about how to do this? The previous post mentions code in a jags GitHub location but I couldn't seem to find a relevant code line there either (although I could very well just not be looking for the right thing). 

The package is now called SEMCMC (as it includes the possibility of fitting models with Stan in addition to JAGS). Hence, this code should work:

devtools::install_github("becarioprecario/SEMCMC")
library(SEMCMC)

The vignette does not seem to be available using vignette() but you can browse it in the local installation or directly on GitHub.

Best,

Virgilio
Reply all
Reply to author
Forward
0 new messages