Dear Finn, Håvard & INLA users,
1. How do you handle offsets in LGCP models applied to areal outcomes with continuous/smooth covariates?
My current offset is the total population raster on a 1 km^2 grid, used within the LGCP model as shown in the code below.
I am not sure whether I should modify the integration weights in some way, or transform the offset (total population) before using it in the LGCP.
Any insights on best practices here would be appreciated (please also cover the case where the offset is at the areal level while the covariates are at pixel level).
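My understanding so far (a toy check in base R, no INLA involved, all values made up) is that adding log(population) to the linear predictor should be algebraically the same as folding the population into the integration weights, which is why I am unsure which of the two to modify:

```r
# Toy check: for an aggregated Poisson intensity,
#   sum_i w_i * pop_i * exp(eta_i) == sum_i w_i * exp(eta_i + log(pop_i))
set.seed(1)
w   <- runif(10)          # integration weights as from fm_int()
pop <- runif(10, 50, 500) # hypothetical population at integration points
eta <- rnorm(10)          # latent field + covariates at the same points

lambda_offset  <- sum(w * exp(eta + log(pop)))  # offset in the predictor
lambda_weights <- sum((w * pop) * exp(eta))     # offset folded into weights
all.equal(lambda_offset, lambda_weights)        # TRUE
```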
# --- Build mesh
mesh <- fm_mesh_2d_inla(
  boundary = bnd,             # the study area boundary
  max.edge = c(0.01, 0.5),
  cutoff = 0.01 / 5,
  crs = fm_crs("EPSG:4326")
)
plot(mesh)
# --- The study area is small, so we use priors on a realistic scale
matern <- inla.spde2.pcmatern(
  mesh,
  prior.range = c(0.05, 0.5), # P(range < 0.05 deg ~ 5.5 km) = 0.5
  prior.sigma = c(1, 0.01)    # P(sigma > 1) = 0.01
)
# --- Define the intermediate zone integration scheme
ips <- fm_int(mesh, samplers = outcome)
agg <- bru_mapper_logsumexp(rescale = FALSE, n_block = nrow(outcome))
dim(ips)
# --- Extract raster covariates at integration points
# Convert integration points to SpatVector for terra::extract
ips_sv <- vect(st_as_sf(ips))
# Extract all raster layers (checked: no missing values at the integration points)
cov_vals <- terra::extract(
  covariates_stack, ips_sv,
  ID = FALSE, touches = TRUE, small = TRUE
)
summary(cov_vals)
# --- Attach covariate columns to ips
ips <- cbind(ips, cov_vals)
# --- Scale continuous covariates
for (v in c("covariate_1", "covariate_2", "covariate_3")) {
  if (v %in% names(ips)) {
    ips[[v]] <- as.numeric(scale(ips[[v]]))  # drop the matrix wrapper scale() returns
  }
}
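One caveat I ran into while scaling (base R behaviour, unrelated to INLA): scale() returns a one-column matrix carrying the centring/scaling attributes rather than a plain numeric vector, so I coerce back to numeric, keeping the attributes around in case the same scaling must be reapplied to prediction covariates:

```r
# scale() returns a 1-column matrix with "scaled:center"/"scaled:scale"
# attributes, not a plain numeric vector
x <- c(2, 4, 6)
s <- scale(x)
is.matrix(s)              # TRUE
attr(s, "scaled:center")  # 4 (the mean)
attr(s, "scaled:scale")   # 2 (the sd)
as.numeric(s)             # c(-1, 0, 1), plain numeric
```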
# --- Define model components and formula
cmp <- ~
  Intercept(1) +
  myoffset(log(total_population), model = "offset") +
  b_1(covariate_1, model = "linear") +
  b_2(covariate_2, model = "linear") +
  b_3(covariate_3, model = "linear") +
  xi(main = geometry, model = matern)

fml <- incidence ~ ibm_eval(
  agg,
  input = list(weights = weight, block = .block),
  state = Intercept + b_1 + b_2 + b_3 + myoffset + xi
)
# --- Construct the likelihood
lik <- bru_obs(
  formula = fml,
  family = "poisson",
  data = ips,
  response_data = outcome,
  allow_combine = TRUE
)
# --- Fit the model
fit <- bru(
  components = cmp,
  lik,
  options = list(
    control.compute = list(dic = TRUE, waic = TRUE, cpo = TRUE),
    verbose = FALSE,
    bru_initial = list(
      Intercept = -7,
      b_1 = -1,
      b_2 = 1.0,
      b_3 = 0.1
    ),
    safe = TRUE,
    bru_verbose = 2,
    bru_max_iter = 20,
    control.inla = list(int.strategy = "eb", strategy = "laplace")
  )
)
fit
fit0 <- fit  # keep the initial fit before rerunning
fit <- bru_rerun(fit0)
# saveRDS(fit, 'model/fit.rds')
# --- Areal-unit predictions using the logsumexp mapper
fml_latent <- ~ ibm_eval(
  agg,
  input = list(weights = weight, block = .block),
  state = (Intercept + b_1 + b_2 + b_3 + xi),
  log = TRUE
)
areal_pred <- predict(
  fit,
  newdata = ips,
  formula = fml_latent,
  n.samples = 1e3,
  num.threads = 7
)
areal_pred <- bind_cols(outcome, areal_pred)  # one row per areal unit
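To make sure I understand what ibm_eval() on the logsumexp mapper is computing here, this is my mental model as a base-R toy (made-up numbers): each block collects its integration points and returns the log of the weighted sum of exp(state).

```r
# For each block b: log( sum_{i in b} weight_i * exp(state_i) )
state  <- c(0.1, 0.2, 0.3, 0.4)  # latent predictor at integration points
weight <- c(1, 2, 1, 2)          # integration weights
block  <- c(1, 1, 2, 2)          # which areal unit each point belongs to
log_lambda <- tapply(weight * exp(state), block, function(x) log(sum(x)))
log_lambda  # one log-intensity per areal unit
```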
2. Secondly, during model fitting I get the verbose output below; is anything in it indicative of problems (e.g. poor convergence)?
bru: Preprocessing
iinla: Iteration 1 [max:20]
iinla: Step rescaling: 83.6% (norm0 = 512, norm1 = 62.67, norm01 = 512)
iinla: Iteration 2 [max:20]
iinla: Step rescaling: 66.2% (norm0 = 55.81, norm1 = 47.13, norm01 = 88.93)
iinla: Max deviation from previous: 1430% of SD, and line search is active
[stop if: <10% and line search inactive]
iinla: Iteration 3 [max:20]
iinla: Step rescaling: 9.02% (norm0 = 5.163, norm1 = 35.69, norm01 = 36.52)
iinla: Max deviation from previous: 2190% of SD, and line search is active
[stop if: <10% and line search inactive]
iinla: Iteration 4 [max:20]
iinla: Step rescaling: 2.72% (norm0 = 2.006, norm1 = 34.78, norm01 = 35.3)
iinla: Max deviation from previous: 773% of SD, and line search is active
[stop if: <10% and line search inactive]
..........
----
3. Could you comment on considerations when fitting LGCPs to areal data where some areal units are nested within others, e.g. counts observed both for regions and for their sub-regions, with every sub-region fully contained in a region? Will standard fitting be okay for such datasets, or should I make adjustments?
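Part of my worry is double counting: if the region counts are just the sums of their sub-region counts and both levels enter as independent Poisson observations, every event is effectively observed twice (toy numbers):

```r
sub_counts    <- c(3, 5, 2, 7)  # hypothetical sub-region counts
region_id     <- c(1, 1, 2, 2)  # which region each sub-region lies in
region_counts <- tapply(sub_counts, region_id, sum)  # region totals: 8, 9
sum(sub_counts) + sum(region_counts)  # 34: each event contributes twice
```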
Thanks.