Hi all:
I am a bit uncertain about how the prediction and mapping inla performs across the entire gridded map (of Spain, say e.g.) when there are no data available for covariates for non-sampled grids. I am using the terminology grid very loosely to mean the entire map of mainland Spain. I went through the Spain's pollution PM25 example and was able to produce the maps for three years (2015/16/17). This was for a constant-only model.
https://www.paulamoraga.com/book-geospatial/sec-geostatisticaldataexamplest.html
I then added a Time trend covariate and reran the model which gave me the maps again. I was not quite sure how it did the prediction without the Time trend available for the entire region.
Sample of codes
###Squared grid
bb <- st_bbox(m) $ m is border of mainland Spain
x <- seq(bb$xmin - 1, bb$xmax + 1, length.out = 50)
y <- seq(bb$ymin - 1, bb$ymax + 1, length.out = 50)
dp <- as.matrix(expand.grid(x, y)) # plot(dp); this gridded Spain will be repeated and stacked three times later
plot(dp, asp = 1)
plot(m) # prints three maps for three years.
##Select grids within Spain only
#Then, we keep only the locations that lie within the map of Spain (Figure 10.10).
p <- st_as_sf(data.frame(x = dp[, 1], y = dp[, 2]),
coords = c("x", "y")
)
st_crs(p) <- st_crs(25830)
ind <- st_intersects(m, p)
dp <- dp[ind[[1]], ]
plot(dp, asp = 1)
####
stk.e <- inla.stack(
tag = "est",
data = list(y = d$value),
A = list(1, A),
effects = list(data.frame(b0 = rep(1, nrow(d)), Time = d$Time), s = indexs)
)
stk.p <- inla.stack(
tag = "pred",
data = list(y = NA), # poll data for other non-station places are not available...
A = list(1, Ap),
effects = list(data.frame(b0 = rep(1, nrow(dp),Time=d$Time)), s = indexs)
)
stk.full <- inla.stack(stk.e, stk.p)
rprior <- list(theta = list(prior = "pccor1", param = c(0, 0.9)))
#The formula is defined as
formula <- y ~ 0 + b0 + Time+ f(s,
model = spde, group = s.group,
control.group = list(model = "ar1", hyper = rprior)
)
#I then went through the rest of the plotting codes to get the three predicted maps for three years just like the one for the constant-only model.
index <- inla.stack.index(stack = stk.full, tag = "pred")$data
res$summary.fitted.values
dp <- data.frame(dp)
names(dp) <- c("x", "y", "time")
dp$pred_mean <- res$summary.fitted.values[index, "mean"]
dp$pred_ll <- res$summary.fitted.values[index, "0.025quant"]
dp$pred_ul <- res$summary.fitted.values[index, "0.975quant"]
library(reshape2)
dpm <- melt(dp,
id.vars = c("x", "y", "time"),
measure.vars = c("pred_mean", "pred_ll", "pred_ul")
)
head(dpm)
ggplot(m) + geom_sf() + coord_sf(datum = NA) +
geom_tile(data = dpm, aes(x = x, y = y, fill = value)) +
labs(x = "", y = "") +
facet_wrap(variable ~ time) +
scale_fill_viridis("PM2.5") +
theme_bw()
########################################################
Q: I was not quite sure how it knows the value of the time trend for those non-sampled stations. For example, a more realistic model may look like this:
PM2.5 (it)= a + b * TimeTrend(t) + c* NoOfFactories(it) + f (etc...)
where i = stations, t = time.
In this case, I have NoOfFactories around the station.
Could I still predict those other areas of Spain without the covariates available to me? Or should I assume that most examples I see in INla books for constant-only model may not be applicable for the model with covariates, especially to predict the entre region. I have seen example of elevation raster image being used as a covariate which makes sense as these elevations can be extracted for the entire region. I am interested in the model like above.
I would appreciate any tips you may provide.