Fitting LGCPs in inlabru on areal (spatial) data


Stanley Sayianka

Apr 2, 2025, 8:02:42 AM
to R-inla discussion group
Greetings,

I have browsed the comprehensive tutorials on LGCPs in the inlabru vignettes online, but I have not found examples of fitting an LGCP when only areal data are available (I understand that LGCPs are meant for point processes, and that separate models exist for areal count data).

Since my data are only available at an aggregated areal resolution, of the form e.g.

    ward      pop    pop_density  cases
  1 area 1  23669     1811.0612     50
  2 area 2  25467      547.5481     12
  3 area 3  37690     1292.2057      8
  4 area 4  36906      533.8474      2
  5 area 5  31060      574.9972      1
  6 area 7  49156      472.3919      3

I first "uncounted" the dataset, i.e. for observation 1 I replicated the row 50 times, and so on, using `tidyr::uncount()`, ending up with an sp dataset whose geometry is the area centroids.
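For concreteness, the uncount step can be sketched like this (toy data, not the real wards; assumes the tidyr package is installed):

```r
# Toy sketch of the uncount step (hypothetical data, not the real wards)
library(tidyr)

df <- data.frame(ward = c("area 1", "area 2"), cases = c(3, 1))
uncount(df, cases)
# one row per case: "area 1" repeated 3 times, "area 2" once
```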

My current approach has been to use the area centroids as the point geometry and to proceed with the usual inlabru workflow for LGCP models. Is there anything I should be cautious about in this case?

Any warnings/directions will be appreciated.

 

Finn Lindgren

Apr 2, 2025, 8:38:09 AM
to R-inla discussion group
Hi Stanley,

the aggregated areal data case is discussed in https://arxiv.org/abs/2407.00791 and https://arxiv.org/abs/2502.10584
where we show how to properly account for intensity integration over subregions.

The first of the above manuscripts will get an updated version soon, and I will likely also merge the feature/aggregate branch of inlabru on GitHub soon, which adds direct support for these models and makes them easier to implement.

I will also discuss such models at https://centreforstatistics.maths.ed.ac.uk/events/upcoming-events/inlabru

Finn

--
You received this message because you are subscribed to the Google Groups "R-inla discussion group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to r-inla-discussion...@googlegroups.com.
To view this discussion, visit https://groups.google.com/d/msgid/r-inla-discussion-group/775974b6-eac8-46d3-ab08-dc94b27d70can%40googlegroups.com.


--
Finn Lindgren
email: finn.l...@gmail.com

Stanley Sayianka

Apr 9, 2025, 5:22:22 AM
to R-inla discussion group
Thanks for the response and resources Finn,
The two papers will take some time for me to read and understand.

I will reach back in case I have questions.

Stanley Sayianka

Mar 4, 2026, 8:46:58 AM
to R-inla discussion group
Dear Finn, Håvard & INLA users,

1. How do you handle offsets in LGCP models applied to areal outcomes with continuous/smooth covariates?
I currently use the total population raster on a 1 km^2 grid as my offset, within the LGCP model as shown in the code below.
I am not sure whether I should modify the integration weights in any way, or transform the total-population offset before using it in the LGCP.
Any insights on best practice would be appreciated (including the case where the offset is at the areal level while the covariates are at pixel level).


# --- Load packages and build the mesh
library(INLA)
library(inlabru)
library(fmesher)
library(sf)
library(terra)

mesh <- fm_mesh_2d_inla(
  boundary = bnd, # the boundary
  max.edge = c(.01, .5),  
  cutoff   = (1/5)*.01,
  crs      = fm_crs("EPSG:4326")
)
plot(mesh)

# --- The study area is small, so we use priors realistic for its scale
matern <- inla.spde2.pcmatern(
  mesh,
  prior.range = c(0.05, 0.5),   # P(range < 0.05 deg (~5.5 km)) = 0.5
  prior.sigma = c(1,   0.01)    # P(σ > 1) = 0.01
)

# --- Define the intermediate zone integration scheme
ips <- fm_int(mesh, samplers = outcome)
agg <- bru_mapper_logsumexp(rescale = FALSE, n_block = nrow(outcome))
dim(ips)

# --- Extract raster covariates at integration points

# Convert integration points to SpatVector for terra::extract
ips_sv <- vect(st_as_sf(ips))

# Extract all raster layers - there are indeed no missing values
cov_vals <- terra::extract(covariates_stack, ips_sv, ID = FALSE, touches = TRUE, small = TRUE)
summary(cov_vals)

# --- Attach covariate columns to ips
ips <- cbind(ips, cov_vals)

# --- Scale continuous covariates
for (v in c("covariate_1", "covariate_2", "covariate_3")) {
  if (v %in% names(ips)) {
    ips[[v]] <- as.numeric(scale(ips[[v]]))  # drop the matrix attributes scale() adds
  }
}

# --- Define model components and formula
cmp <- ~
  Intercept(1) +
  myoffset(log(total_population), model = "offset") +
  b_1(covariate_1,  model = "linear") +
  b_2(covariate_2,  model = "linear") +
  b_3(covariate_3, model = "linear") +
  xi(main = geometry, model = matern)

fml <- incidence ~ ibm_eval(
  agg,
  input = list(weights = weight, block = .block),
  state = Intercept + b_1 + b_2 + b_3 + myoffset + xi
)

# --- Construct the likelihood
lik <- bru_obs(
  formula = fml,
  family = "poisson",
  data = ips,
  response_data = outcome,
  allow_combine = TRUE
)

# --- Fit the model
fit <- bru(
  components = cmp,
  lik,
  options = list(
    control.compute = list(dic = TRUE, waic = TRUE, cpo = TRUE),
    verbose = FALSE,

    bru_initial = list(Intercept = -7,
                       b_1 = -1,
                       b_2 = 1.0,
                       b_3 = .1),

    safe = TRUE,
    bru_verbose = 2,
    bru_max_iter = 20,
    control.inla = list(int.strategy = "eb", strategy = "laplace")
  )
)
fit

# keep the initial fit (fit0), then rerun from its result to refine convergence
fit0 <- fit
fit <- bru_rerun(fit0)
# saveRDS(fit, 'model/fit.rds')


# --- Areal-unit predictions using the logsumexp mapper

fml_latent <- ~ ibm_eval(
  agg,
  input = list(weights = weight, block = .block),
  state = (Intercept + b_1 + b_2 + b_3 + xi),
  log = TRUE
)

areal_pred <- predict(
  fit,
  newdata = ips,
  formula = fml_latent,
  n.samples = 1e3,
  num.threads = 7
)

areal_pred <- bind_cols(outcome, areal_pred)



2. Secondly, in model fitting, when I get the verbose output below, is there anything indicative of problems?

bru: Preprocessing
iinla: Iteration 1 [max:20]
iinla: Step rescaling: 83.6% (norm0 = 512, norm1 = 62.67, norm01 = 512)
iinla: Iteration 2 [max:20]
iinla: Step rescaling: 66.2% (norm0 = 55.81, norm1 = 47.13, norm01 = 88.93)
iinla: Max deviation from previous: 1430% of SD, and line search is active
       [stop if: <10% and line search inactive]
iinla: Iteration 3 [max:20]
iinla: Step rescaling: 9.02% (norm0 = 5.163, norm1 = 35.69, norm01 = 36.52)
iinla: Max deviation from previous: 2190% of SD, and line search is active
       [stop if: <10% and line search inactive]
iinla: Iteration 4 [max:20]
iinla: Step rescaling: 2.72% (norm0 = 2.006, norm1 = 34.78, norm01 = 35.3)
iinla: Max deviation from previous: 773% of SD, and line search is active
       [stop if: <10% and line search inactive]
..........
----

3. Could you comment on considerations when fitting LGCPs to areal data where some areal units are nested within others, e.g. data on both regions and sub-regions, where every sub-region is contained in one of the regions? Will standard fitting be okay for such datasets, or should I make adjustments?


Thanks.

Finn Lindgren

Mar 4, 2026, 9:27:42 AM
to Stanley Sayianka, R-inla discussion group
Your code implements an aggregated count model, not a point process, but in both cases I would recommend doing what you do, i.e. including the scaling as an offset in the predictor, as it then automatically adapts to different aggregations, etc.

You can also avoid the manual ibm_eval() call:

fml <- incidence ~ Intercept + b_1 + b_2 + b_3 + myoffset + xi
bru_obs(fml, ..., aggregate = "logsumexp")

The default parameters of the aggregate feature match what you were using with ibm_eval().
This approach will make it possible for inlabru to speed up the calculations in the future, whereas your manual ibm_eval() call may prevent that.

High-resolution covariates require the integration scheme to match that resolution, even when the SPDE mesh is coarser. Use
  domain = list(geometry = fm_subdivide(mesh, ...))
with a suitable subdivision argument.
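For example (a sketch, assuming the `mesh` and `outcome` objects from the code earlier in this thread; n = 2 is a hypothetical choice, to be matched to the covariate resolution):

```r
# Sketch: refine the integration domain beyond the mesh resolution,
# assuming `mesh` and `outcome` exist as in the earlier code.
ips <- fm_int(
  domain   = list(geometry = fm_subdivide(mesh, n = 2)),
  samplers = outcome
)
```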
If you have aggregated covariate data that really should be on a fine scale, you may need to actually model the covariate.
See e.g. https://onlinelibrary.wiley.com/doi/full/10.1002/env.70078

2. Use bru_convergence_plot() which is usually pretty clear about how well the convergence works.
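Assuming `fit` is the bru() result from your code above, this is just:

```r
# Diagnostic plot of the iinla iterations (assumes `fit` from bru() above)
bru_convergence_plot(fit)
```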

3. Highly problem specific; you may need to design your model in a clear hierarchical way.

Finn

Finn Lindgren

Mar 4, 2026, 9:32:09 AM
to Stanley Sayianka, R-inla discussion group
One big problem with your covariate-evaluation approach is that it requires manual evaluation when doing prediction.
Instead, I would recommend defining the components so they're automatically evaluated:

myoffset(log(eval_spatial(covariates_stack, .data., layer = "total_population")), model = "offset") +
  b_1(covariates_stack, main_layer = "covariate_1", model = "linear") +
  b_2(covariates_stack, main_layer = "covariate_2", model = "linear") +
  b_3(covariates_stack, main_layer = "covariate_3", model = "linear")

This is assuming "total_population" has been stored in units of "population per unit area".

The multiple terra extractions this implies will of course be a bit slower, but everything will then work automatically in calls to predict(), making the user-side code much easier to write and maintain.
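With components defined this way, prediction only needs geometry. A sketch, where `pred_points` is a hypothetical sf object of prediction locations (not an object from the thread):

```r
# Sketch: predict directly at new locations; covariates are evaluated
# automatically through the component definitions (`pred_points` hypothetical).
pred <- predict(
  fit,
  newdata   = pred_points,
  formula   = ~ exp(Intercept + b_1 + b_2 + b_3 + myoffset + xi),
  n.samples = 500
)
```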

Finn

Stanley Sayianka

Mar 4, 2026, 12:25:50 PM
to R-inla discussion group
Thank you for the response.

A follow-up: if my covariates are on a 1 km^2 grid and my mesh has approximately max.edge inner ~0.5 km and outer ~55 km, what exactly informs the parameter n in: domain = list(geometry = fm_subdivide(mesh, n = xxx, delaunay = F))?

I am reading up on the paper on aggregated covariate data.
Thank you for sharing it.

S.


Finn Lindgren

Mar 4, 2026, 12:50:37 PM
to Stanley Sayianka, R-inla discussion group
The "n" specifies how many extra points to add on each edge. So if you want my_length as the resolution, you need max.edge / (n + 1) <= my_length, i.e. n + 1 >= max.edge / my_length, e.g. n = ceiling(max.edge / my_length) - 1, where max.edge is the setting used when creating the original mesh for the relevant part of the mesh, i.e. the "inner" part.

But if your inner max.edge is already smaller than the covariate resolution, you don't really need the subdivision; the rough formula above will give you n = 0.
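As a worked example with hypothetical numbers (an inner max.edge of 0.01 degrees, roughly 1.1 km, and a ~1 km covariate grid, about 0.009 degrees):

```r
# Worked example with hypothetical numbers
max_edge  <- 0.01    # inner max.edge used when building the mesh (degrees)
my_length <- 0.009   # desired integration resolution (~1 km in degrees)
n <- ceiling(max_edge / my_length) - 1
n  # 1 extra point per edge; n = 0 would mean no subdivision is needed
```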

Finn 

Finn Lindgren

Mar 4, 2026, 12:58:51 PM
to Stanley Sayianka, R-inla discussion group
Also, don't specify the delaunay argument. The default is what it is for a reason and should only be overridden in specific circumstances; use the default. (And never abbreviate TRUE/FALSE.)
Finn

Stanley Sayianka

Mar 4, 2026, 1:16:09 PM
to R-inla discussion group
Thank you, Finn, for the advice.
It is clear now.
