Clarification on INLA's "bym2" Functionality with Multiple Records per Spatial Unit

29 views
Skip to first unread message

Angela Fanelli

unread,
Jun 6, 2025, 2:48:44 AM6/6/25
to R-inla discussion group
Hello,

I am reaching out to seek clarification on how the "bym2" function in INLA handles adjacency matrices when dealing with multiple records per spatial unit, such as repeated measurements over time or replicates due to a third factor.

Specifically, I want to ensure that the spatial adjacency matrix is correctly aligned with the order of spatial units in my dataset. I have attempted to build spatio-temporal models with and without replicates, but the fitted values do not seem to make sense, leading me to suspect that there might be an issue with how the adjacency matrix is being constructed.

To illustrate my concerns, I have attached a simulation that includes the code I use to fit the INLA model, along with the options I have specified. I would appreciate it if you could review this code and provide guidance on whether there are any potential issues with how I am building the model.

My primary concern is whether the adjacency matrix automatically recognizes the correct row for each spatial unit based on the index. I have provided two examples: a simple one with repeated records over time and a second one with a replicated factor that I would like to use within the random effect.

It's worth noting that I do not have complete information for all locations and have discarded some due to missing covariates. However, I have taken care to only consider the locations included in the model fitting and prediction, and I have built the graph accordingly.

I would be grateful for any clarification or guidance you can provide on how INLA's "bym2" function handles adjacency matrices in these scenarios. Please let me know if you require any additional information or context.

Thank you for your time and assistance.

Best regards




# Simulation

# Load the required packages
libraries <- c("tidyverse","dplyr","lubridate", "purrr","readxl","ISOweek","data.table",
               "corrplot", "ggsci","flextable",
               "doFuture","foreach","jsonlite","httr",
               "glue",  "ggplot2",  "gridExtra","classInt",
               "ggridges","colorspace","giscoR","eurostat","scales",
               "INLA", "tsModel", "splines","dlnm",
               "sf","tmap","basemaps","terra", "sp", "spdep", "pROC")

# Install and load required packages
for (lib in libraries) {
  if (!(lib %in% installed.packages())) {
    if (lib == "INLA") {
      install.packages("INLA", repos = c(getOption("repos"), INLA = "https://inla.r-inla-download.org/R/stable"), dep = TRUE)
    } else {
      install.packages(lib)
    }
  }
  library(lib, character.only = TRUE)
}


# Get all nuts code
nuts3<-gisco_get_nuts(nuts_level = 3,year=2021, epsg = 3035)

# Create a dataframe with all combinations of nuts IDs, months and years
df <- expand.grid(
  nuts_id = nuts3$NUTS_ID,
  month = 1:12,
  year = 2010:2023,
  KEEP.OUT.ATTRS=FALSE
)

# Simulate Y outcome variable following a Poisson distribution
df$Y <- rpois(nrow(df), lambda = 30)

# Introduce some NA values for Y, but only for specific NUTS3
# Sample 100 unique nuts IDs
sample_nuts_ids <- sample(unique(nuts3$NUTS_ID), size = 100, replace = FALSE)

df$Y[df$nuts_id %in% sample_nuts_ids] <- NA # introduce NA values for specific NUTS3 regions

# Sample 3 unique nuts IDs to delete from the entire dataframe because you do not have records of covariates
sample_nuts_ids2 <- sample(unique(nuts3$NUTS_ID), size = 3, replace = FALSE)

# Exclude the ids from the original data
df <- df |>
  filter(!nuts_id %in% sample_nuts_ids2)

# Simulate TMEAN temperature in Kelvin (range: 257.1252 - 305.0120)
df$TMEAN <- runif(nrow(df), min = 257.1252, max = 305.0120) # temperature in Kelvin

# Relocate some columns and create indexes for the graph  parameters
df<-df |>
  mutate(nuts_index=as.integer(factor(nuts_id)))|>
  relocate(nuts_index,.before=nuts_id) |> arrange(year,month,nuts_index)


# Create the graph

# Reorder the rows of nuts3 shapefile based on the order of NUTS_ID in data
nuts3 <- nuts3 |>  arrange(match(NUTS_ID, df$nuts_id))
head(nuts3)
head(df)

# Create the adj matrix
nuts3_nb <- poly2nb(as_Spatial(nuts3$geometry))
nb2INLA("Input/Data_model/graph", nuts3_nb)

g <- inla.read.graph(filename = "Input/Data_model/graph")


# Define the model formula

alpha <- 0.01
u <- 1

precision_prior <- list(prec = list(prior = "pc.prec", param = c(u, alpha)))

# Define the formula
formula <- Y ~ 1 +f(month, model = "rw1",cyclic = TRUE, constr = TRUE, scale.model = TRUE, hyper = precision_prior) +
  f(nuts_index, model = "bym2",  graph = g, scale.model = TRUE, hyper = precision_prior) +log(TMEAN)


# Run the model

# Model to assess the relationships between the outcome and covariates  
model <- inla(
  # formula, data and model family
  formula = formula, data = df, family="poisson",
 
  # configure inla approx
  control.inla = list(strategy = 'adaptive'),
 
  # items to compute
  control.compute = list(dic = TRUE, waic = TRUE, config = FALSE,
                         cpo = TRUE,return.marginals = TRUE),
 
  # fixed effects calibration
  control.fixed = list(correlation.matrix = TRUE,
                       prec.intercept = 1, # precision 1
                       prec = 1), # weakly regularising on fixed effects (sd of 1)
 
  # save predicted values on response scale
  control.predictor = list(link = 1, compute = TRUE),
 
  # Verbose
  verbose = TRUE)

# Look at the predictions
df$fitted<-model$summary.fitted.values[,c("mean")]

head(df)

# Let's add a layer of complexity using replicate within the formula

rm(list = ls())
# Get all nuts code
nuts3<-gisco_get_nuts(nuts_level = 3,year=2021, epsg = 3035)

# Create a dataframe with all combinations of nuts IDs, months and years
df <- expand.grid(
  nuts_id = nuts3$NUTS_ID,
  month = 1:12,
  year = 2010:2023,
  replicate_factor=1:3,
  KEEP.OUT.ATTRS=FALSE
)

# Simulate Y outcome variable following a Poisson distribution
df$Y <- rpois(nrow(df), lambda = 30)

# Introduce some NA values for Y, but only for specific NUTS3
# Sample 100 unique nuts IDs
sample_nuts_ids <- sample(unique(nuts3$NUTS_ID), size = 100, replace = FALSE)

df$Y[df$nuts_id %in% sample_nuts_ids] <- NA # introduce NA values for specific NUTS3 regions

# Sample 3 unique nuts IDs to delete from the entire dataframe because you do not have records of covariates
sample_nuts_ids2 <- sample(unique(nuts3$NUTS_ID), size = 3, replace = FALSE)

# Exclude the ids from the original data
df <- df |>
  filter(!nuts_id %in% sample_nuts_ids2)

# Simulate TMEAN temperature in Kelvin (range: 257.1252 - 305.0120)
df$TMEAN <- runif(nrow(df), min = 257.1252, max = 305.0120) # temperature in Kelvin

# Relocate some columns and create indexes for the graph  parameters
df<-df |>
  mutate(nuts_index=as.integer(factor(nuts_id)))|>
  relocate(nuts_index,.before=nuts_id) |> arrange(year,month,nuts_index)


# Create the graph

# Reorder the rows of nuts3 shapefile based on the order of NUTS_ID in data
nuts3 <- nuts3 |>  arrange(match(NUTS_ID, df$nuts_id))
head(nuts3)
head(df)

# Create the adj matrix
nuts3_nb <- poly2nb(as_Spatial(nuts3$geometry))
nb2INLA("Input/Data_model/graph", nuts3_nb)

g <- inla.read.graph(filename = "Input/Data_model/graph")


# Define the model formula

alpha <- 0.01
u <- 1

precision_prior <- list(prec = list(prior = "pc.prec", param = c(u, alpha)))

# Define the formula with the replicate
formula <- Y ~ 1 +f(month, replicate=replicate_factor, model = "rw1",cyclic = TRUE, constr = TRUE, scale.model = TRUE, hyper = precision_prior) +
  f(nuts_index, model = "bym2",  graph = g, scale.model = TRUE, hyper = precision_prior) +log(TMEAN)


# Run the model

# Model to assess the relationships between the outcome and covariates  
model <- inla(
  # formula, data and model family
  formula = formula, data = df, family="poisson",
 
  # configure inla approx
  control.inla = list(strategy = 'adaptive'),
 
  # items to compute
  control.compute = list(dic = TRUE, waic = TRUE, config = FALSE,
                         cpo = TRUE,return.marginals = TRUE),
 
  # fixed effects calibration
  control.fixed = list(correlation.matrix = TRUE,
                       prec.intercept = 1, # precision 1
                       prec = 1), # weakly regularising on fixed effects (sd of 1)
 
  # save predicted values on response scale
  control.predictor = list(link = 1, compute = TRUE),
 
  # Verbose
  verbose = TRUE)

# Look at the predictions
df$fitted<-model$summary.fitted.values[,c("mean")]

head(df)

Helpdesk (Haavard Rue)

unread,
Jun 7, 2025, 4:53:40 AM6/7/25
to Angela Fanelli, R-inla discussion group
On Thu, 2025-06-05 at 23:48 -0700, Angela Fanelli wrote:
> Specifically, I want to ensure that the spatial adjacency matrix is correctly
> aligned with the order of spatial units in my dataset. I have attempted to
> build spatio-temporal models with and without replicates, but the fitted
> values do not seem to make sense, leading me to suspect that there might be an
> issue with how the adjacency matrix is being constructed.
>

to check this, set all data to be 0, except one region, which you set to 100,
say. then you should see in the results the largerst value of the bym2 at the
correct region, otherwise there is a missmatch.



--
Håvard Rue
he...@r-inla.org

Helpdesk (Haavard Rue)

unread,
Jun 7, 2025, 5:27:33 AM6/7/25
to Angela Fanelli, R-inla discussion group
I see nuts3_nb has an attribute called 'region.id', which is in somewhat random
order. I guess this means that graph index is not eq to region.id

rid = as.integer(sub("ID", "", attr(nuts3_nb, 'region.id')))
gid = 1:graph$n

so rid[idx] gives the region.id to graph index idx, I presume. then you can
also have the inverse.

is this what you're asking for?
Reply all
Reply to author
Forward
0 new messages