Fitting LGCP model to penguin point locations

31 views
Skip to first unread message

Alexandra Strang

unread,
Sep 6, 2025, 8:59:50 AM (3 days ago) Sep 6
to R-inla discussion group
Hi everyone!

As part of my PhD, I have been developing a point process model to relate the location of penguin nests to fine scale spatial covariates such as guano area and terrain (slope, aspect, etc.). Before including any covariates, I built an LGCP model that just accounted for spatial autocorrelation among points and predicted an abundance that was close to my input data (about 200,000 points) and the intensity plot looked good too.

However, when I try to include covariates (with a finer subdivided mesh), the predictions became very large and implausible (millions or infinite penguins!) and sometimes causes the model to crash. The intercept and fixed-effects slopes are also very strange. Each of my covariates are included as rasters at 2 x 2 m resolution. The guano area raster is the proportion of a pixel covered in guano, 1 being fully covered. The guano area is very spatially defined where most of the points (but not all) are contained within it. I have also standardised each covariate to have a mean of 0 and a standard deviation of 1 and ensured they have the same crs (EPSG:3031). I have been using an HPC server to run my r script which has restricted me to R 4.3.2-foss-2023a, with INLA 23.04.24 and inlabru 2.13.0.9008. I have included my code below, including mesh construction, SPDE priors, and fitting LGCP models. Happy to provide more information, my email is: alexandr...@pg.canterbury.ac.nz

Very new to this method and any advice would be greatly appreciated!
Alexandra

# load packages library(sf) library(fmesher) library(ggplot2) library(INLA) library(ggpubr) library(inlabru) # for lgcp() library(dplyr) library(purrr) library(tidyr) library(terra) # for rasters # read in points from xy csv Crozier_xy <- read.csv("Crozier_Points_2020_3031.csv") # convert to sf object and check CRS sf_Crozier <- st_as_sf(Crozier_xy, coords = c("x", "y"), crs = 3031) st_crs(sf_Crozier) # import Crozier coastline boundary sf_Crozier_boundary <- st_read("Crozier_boundary.shp") # transform shapefile so it has the right crs code sf_Crozier_boundary <- st_transform(sf_Crozier_boundary, crs = st_crs(sf_Crozier)) st_crs(sf_Crozier_boundary) # mesh parameters # this is 1/15 study size using x range # x range is longer than y here Crozier_max.edge <- diff(range(st_coordinates(sf_Crozier)[,1]))/(3*5) print(Crozier_max.edge) # ~150 metres # expand outer layer Crozier_bound.outer = diff(range(st_coordinates(sf_Crozier)[,1]))/5 print(Crozier_bound.outer) # ~450 metres # create Crozier mesh Crozier_mesh <- fm_mesh_2d(boundary = sf_Crozier_boundary, max.edge = c(1,5)*Crozier_max.edge, offset = c(Crozier_max.edge, Crozier_bound.outer), cutoff = Crozier_max.edge/10, crs = st_crs(sf_Crozier)) print(Crozier_mesh$n) # null LGCP # define the SPDE priors (matern) matern <- inla.spde2.pcmatern(mesh = Crozier_mesh, prior.range = c(250, 0.5), # distance decay in metres prior.sigma = c(1, 0.5)) # amount of spatial variation # spatial SPDE effect and Intercept null_cmp <- geometry ~ Intercept(1) + mySmooth(geometry, model = matern) # random effect # import guano area shapefile for samplers area sf_Crozier_guano <- st_read("Crozier_2020_1_3031_guano.shp") # ensure shapefile has right crs code
sf_Crozier_guano <- st_transform(sf_Crozier_guano, crs = st_crs(sf_Crozier))

# add 100 metre buffer sf_Crozier_guano_buffered <- st_buffer(sf_Crozier_guano, dist = 100) sf_Crozier_guano_buffered <- st_sf(geometry = st_union(sf_Crozier_guano_buffered)) crs(sf_Crozier_guano_buffered) # fit null lgcp model null_model <- lgcp(null_cmp, # formula data = sf_Crozier, # locations samplers = sf_Crozier_guano_buffered, domain = list(geometry = Crozier_mesh), # mesh options = list( control.inla = list(verbose = TRUE), control.compute = list(dic = TRUE, waic = TRUE, cpo = TRUE) ) ) # takes less than ~2 mins to run summary(null_model) # predicting intensity # predict the spatial intensity surface # need to sum over prediction grid # make prediction grid as sf points grid_pts <- fm_pixels( Crozier_mesh, format = "sf", mask = sf_Crozier_guano_buffered
) # predict intensity per m2 lambda <- predict( null_model, grid_pts, # use this instead of fm_pixels ~ exp(mySmooth + Intercept) ) # cell spacing from the point coordinates coords <- st_coordinates(grid_pts) dx <- median(diff(sort(unique(coords[,1])))) dy <- median(diff(sort(unique(coords[,2])))) cell_area <- dx * dy # m2 per pixel # multiply and sum total_pred <- sum(lambda$mean * cell_area) print(total_pred) # Build LGCP model with covarites # use continuous guano raster percent_guano_raster <- rast("CrozierGuano_2m.tif") # change guano crs to match crs(percent_guano_raster) <- "EPSG:3031" crs(percent_guano_raster) res(percent_guano_raster) # 2m x 2m # 2m terrain rasters clipped by mesh slope_raster <- rast("Crozier_terrain_mesh/Cape_Crozier_slope.tif") aspect_raster <- rast("Crozier_terrain_mesh/Cape_Crozier_aspect_corrected.tif") roughness_raster <- rast("Crozier_terrain_mesh/Cape_Crozier_Roughness.tif") TRI_raster <- rast("Crozier_terrain_mesh/Cape_Crozier_TRI.tif") # change corrected aspect crs to match crs(aspect_raster) <- "EPSG:3031" crs(aspect_raster) # match guano area extent to terrain rasters percent_guano_raster <- extend(percent_guano_raster, slope_raster) percent_guano_raster[values(percent_guano_raster) > 1] <- 0 percent_guano_raster[is.na(percent_guano_raster)] <- 0 # scale (mean of 0 sd of 1) standardize <- function(r) { m <- global(r, fun = "mean", na.rm = TRUE)[1, 1] s <- global(r, fun = "sd", na.rm = TRUE)[1, 1] (r - m) / s } # apply to continuous variables percent_guano_raster <- standardize(percent_guano_raster) slope_raster <- standardize(slope_raster) aspect_raster <- standardize(aspect_raster) roughness_raster <- standardize(roughness_raster) TRI_raster <- standardize(TRI_raster) # check for misalignment covariate_plot <- c(percent_guano_raster, slope_raster, aspect_raster, roughness_raster, TRI_raster) plot(covariate_plot) # check for collinearity between covariates cov_stack <- c(percent_guano_raster, slope_raster, aspect_raster, roughness_raster, TRI_raster) cov_values <- as.data.frame(cov_stack, na.rm = TRUE) cor(cov_values) # subdivide mesh # splits triangles into subtriangles mesh_sub <- fm_subdivide(Crozier_mesh,3) # try 1/9 of max.edge print(mesh_sub$n) # update model formula to include covariates matern <- inla.spde2.pcmatern(mesh = mesh_sub, prior.range = c(250, 0.5), # distance decay in metres prior.sigma = c(1, 0.5)) # amount of spatial variation # trial Full_cmp <- geometry ~ Intercept(1) + percentguano(percent_guano_raster, model = "linear") + slope(slope_raster, model = "linear") + mySmooth(geometry, model = matern) # random effect Full_model <- lgcp(Full_cmp, # formula data = sf_Crozier, # locations samplers = sf_Crozier_guano_buffered, # sample area domain = list(geometry = mesh_sub), # mesh options = list( control.inla = list(verbose = TRUE), control.compute = list(dic = TRUE, waic = TRUE, cpo = TRUE)) ) summary(Full_model) # need to sum over prediction grid # make prediction grid as sf points grid_pts <- fm_pixels( mesh_sub, format = "sf", mask = sf_Crozier_guano_buffered ) # predict intensity per m2 lambda <- predict( Full_model, grid_pts, # use this instead of fm_pixels ~ exp(mySmooth + Intercept + percentguano + slope) ) # cell spacing from the point coordinates coords <- st_coordinates(grid_pts) dx <- median(diff(sort(unique(coords[,1])))) dy <- median(diff(sort(unique(coords[,2])))) print(dx) print(dy) cell_area <- dx * dy # m2 per pixel print(cell_area) # multiply and sum total_pred <- sum(lambda$mean * cell_area) print(total_pred) # terrain only Terrain_cmp <- geometry ~ Intercept(1) + slope(slope_raster, model = "linear") + aspect(aspect_raster, model = "linear") + mySmooth(geometry, model = matern) # random effect Terrain_model <- lgcp(Full_cmp, # formula data = sf_Crozier, # locations samplers = sf_Crozier_guano_buffered, # sample area domain = list(geometry = mesh_sub), # mesh options = list( control.inla = list(verbose = TRUE), control.compute = list(dic = TRUE, waic = TRUE, cpo = TRUE)) ) summary(Terrain_model) # need to sum over prediction grid # make prediction grid as sf points grid_pts <- fm_pixels( mesh_sub, format = "sf", mask = sf_Crozier_guano_buffered ) # predict intensity per m2 lambda <- predict( Terrain_model, grid_pts, # use this instead of fm_pixels ~ exp(mySmooth + Intercept + slope + aspect) ) # cell spacing from the point coordinates coords <- st_coordinates(grid_pts) dx <- median(diff(sort(unique(coords[,1])))) dy <- median(diff(sort(unique(coords[,2])))) print(dx) print(dy) cell_area <- dx * dy # m2 per pixel print(cell_area) # multiply and sum total_pred <- sum(lambda$mean * cell_area) print(total_pred)

Helpdesk (Haavard Rue)

unread,
Sep 7, 2025, 8:59:02 AM (2 days ago) Sep 7
to Alexandra Strang, R-inla discussion group
please upgrade to R-4.5 and the most recent testing version of R-INLA and
inlabru

in the HPC world, I would just compile and install R-4.5 myself, locally. then
you can upgrade R-INLA and inlabru
> --
> You received this message because you are subscribed to the Google Groups "R-
> inla discussion group" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to r-inla-discussion...@googlegroups.com.
> To view this discussion, visit
> https://groups.google.com/d/msgid/r-inla-discussion-group/c44590c2-7e02-4bcc-a2ec-129c0d856650n%40googlegroups.com
> .

--
Håvard Rue
he...@r-inla.org

Alexandra Strang

unread,
12:23 AM (16 hours ago) 12:23 AM
to R-inla discussion group
Thanks, Håvard! 

I have now upgraded R and the INLA and inlabru packages.

I went back and rechecked my null/ no covariates model, which still predicts a similar abundance to my input points, however I am now getting these error messages:

***[0] thread_num[0] warning *** iterative process seems to diverge, 'vb.correction' is aborted *** Please (re-)consider your model, priors, confounding, etc. ***[2] thread_num[2] warning *** iterative process seems to diverge, 'vb.correction' is aborted ***[2] thread_num[2] warning *** max_correction = 25.01 >= 25.00, 'vb.correction' is aborted *** You can change the emergency value (current value=25.00) by *** 'control.inla=list(control.vb=list(emergency=...))' *** Please (re-)consider your model, priors, confounding, etc.

The model output looks okay, my CRS is in metres, however the DIC values are negative and I am unsure what this means:

inlabru version: 2.13.0 INLA version: 25.09.04 Components: Intercept: main = linear(1), group = exchangeable(1L), replicate = iid(1L), NULL mySmooth: main = spde(geometry), group = exchangeable(1L), replicate = iid(1L), NULL Observation models: Family: 'cp' Tag: <No tag> Data class: 'sf', 'tbl_df', 'tbl', 'data.frame' Response class: 'numeric' Predictor: geometry ~ . Additive/Linear: TRUE/TRUE Used components: effects[Intercept, mySmooth], latent[] Time used: Pre = 8.08, Running = 4.87, Post = 0.167, Total = 13.1 Fixed effects: mean sd 0.025quant 0.5quant 0.975quant mode kld Intercept -12.478 5.488 -26.657 -11.794 -3.886 -11.117 0.007 Random effects: Name Model mySmooth SPDE2 model Model hyperparameters: mean sd 0.025quant 0.5quant 0.975quant mode Range for mySmooth 1036.62 352.13 540.04 973.55 1906.56 853.39 Stdev for mySmooth 8.52 2.63 4.72 8.08 14.93 7.21 Deviance Information Criterion (DIC) ...............: -6755964.79 Deviance Information Criterion (DIC, saturated) ....: -6755965.27 Effective number of parameters .....................: -6756786.30 Watanabe-Akaike information criterion (WAIC) ...: 34991.17 Effective number of parameters .................: 11530.87 Marginal log-Likelihood: -3379648.01 CPO, PIT is computed Posterior summaries for the linear predictor and the fitted values are computed (Posterior marginals needs also 'control.compute=list(return.marginals.predictor=TRUE)')

When adding the covariates in I receive different error messages:
*** WARNING *** GMRFLib_2order_approx: rescue NAN/INF values in logl for idx=0 *** WARNING *** GMRFLib_2order_approx: reset counter for 334 NAN/INF values in logl *** WARNING *** GMRFLib_2order_approx: rescue NAN/INF values in logl for idx=0

My points (penguin nests) and main covariate (guano cover) are very spatially defined, I wonder if this could be causing zero inflation?

Any advice would be greatly appreciated!
Alexandra 

Finn Lindgren

unread,
3:52 AM (13 hours ago) 3:52 AM
to R-inla discussion group
Hi Alexandra,

the implementation of DIC and WAIC in R-INLA isn't well defined for how the point process observation model likelihoods are implemented, so I would not rely on those (they will generally "point in the right direction", but don't fully follow how DIC and WAIC would need to defined for such models; e.g., WAIC has a link to leave-one-out cross validation, but for point processes one has to conceptually leave out a region of space, and that's not what happens in the implementation; we are working on alternatives).

(Note; negative values in DIC doesn't in itself mean anything; DIC relates to -2 times the log-likelihood, and since it is _not_ a count model with probabilities, but rather a general likelihood function, correct values can be both positive and negative; it's the difference in DIC between two different models that is relevant.)

Having a crs in metres is OK if the study region is small; as you mention 2x2 metre cells, presumably you're not modelling on the scale of a whole country.

But if the points are highly clustered in a way that is not directly explained by the covariates, the random field might not be fully able to handle both the peaks and the areas of low intensity.
You can try fit2 <- bru_rerun(fit) to see if the VB-messages still appear, or if the additional optimisation solves it; check how much the parameter estimates change between fit and fit2; if they are similar (relative to the posterior uncertainty), and the estimated field loks ok, I wouldn't worry too much.

Finn

You received this message because you are subscribed to the Google Groups "R-inla discussion group" group.

To unsubscribe from this group and stop receiving emails from it, send an email to r-inla-discussion...@googlegroups.com.


--
Finn Lindgren
email: finn.l...@gmail.com
Reply all
Reply to author
Forward
0 new messages