Clarification on INLA's "bym2" Functionality with Multiple Records per Spatial Unit


Angela Fanelli

Jun 6, 2025, 2:36:55 AM6/6/25
to r-inla-disc...@googlegroups.com, Haavard Rue, Finn Lindgren
Hello,

I am writing to ask how the "bym2" model in R-INLA handles the adjacency graph when there are multiple records per spatial unit, such as repeated measurements over time or replicates induced by a third factor.

Specifically, I want to make sure that the spatial adjacency graph is correctly aligned with the ordering of spatial units in my dataset. I have tried fitting spatio-temporal models with and without replicates, but the fitted values do not look plausible, which makes me suspect a problem in how the adjacency graph is constructed.

To illustrate my concerns, I have attached a simulation that includes the code I use to fit the INLA model, along with the options I have specified. I would appreciate it if you could review this code and provide guidance on whether there are any potential issues with how I am building the model.

My primary concern is whether INLA automatically matches each spatial unit to the correct node of the graph via the index variable. I have provided two examples: a simple one with repeated records over time, and a second one with a replicated factor that I would like to use within the random effect.
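To make my concern concrete, here is a minimal, self-contained sketch (toy IDs, not my actual data) of the behaviour I am relying on: as.integer(factor(.)) numbers units by the sorted order of their labels, regardless of row order, so the polygon object must be sorted the same way before building the neighbour list.

```r
# Toy illustration (not the real data): as.integer(factor(.)) assigns
# indices by the alphabetical order of the labels, not by row order
ids <- c("IT01", "AT01", "IT01", "DE01")   # repeated, unsorted records
idx <- as.integer(factor(ids))

stopifnot(identical(levels(factor(ids)), sort(unique(ids))))  # AT01, DE01, IT01
stopifnot(identical(idx, c(3L, 1L, 3L, 2L)))
# => row i of the polygon object must correspond to sort(unique(ids))[i]
```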

It is worth noting that I do not have complete information for all locations and have discarded some because of missing covariates. However, I have taken care to include only the retained locations in model fitting and prediction, and I have built the graph accordingly.

I would be grateful for any clarification or guidance you can provide on how INLA's "bym2" function handles adjacency matrices in these scenarios. Please let me know if you require any additional information or context.

Thank you for your time and assistance.

Best regards




# Simulation

# Load the required packages
libraries <- c("tidyverse","dplyr","lubridate", "purrr","readxl","ISOweek","data.table",
               "corrplot", "ggsci","flextable",
               "doFuture","foreach","jsonlite","httr",
               "glue",  "ggplot2",  "gridExtra","classInt",
               "ggridges","colorspace","giscoR","eurostat","scales",
               "INLA", "tsModel", "splines","dlnm",
               "sf","tmap","basemaps","terra", "sp", "spdep", "pROC")

# Install (if needed) and load the required packages
# Note: installed.packages() returns a matrix, so membership is tested
# against its row names
for (lib in libraries) {
  if (!lib %in% rownames(installed.packages())) {
    if (lib == "INLA") {
      install.packages("INLA",
                       repos = c(getOption("repos"),
                                 INLA = "https://inla.r-inla-download.org/R/stable"),
                       dep = TRUE)
    } else {
      install.packages(lib)
    }
  }
  library(lib, character.only = TRUE)
}


# Get all NUTS-3 codes and geometries
nuts3 <- gisco_get_nuts(nuts_level = 3, year = 2021, epsg = 3035)

# Create a dataframe with all combinations of nuts IDs, months and years
df <- expand.grid(
  nuts_id = nuts3$NUTS_ID,
  month = 1:12,
  year = 2010:2023,
  KEEP.OUT.ATTRS=FALSE
)

# Simulate Y outcome variable following a Poisson distribution
df$Y <- rpois(nrow(df), lambda = 30)

# Introduce some NA values for Y, but only for specific NUTS3
# Sample 100 unique nuts IDs
sample_nuts_ids <- sample(unique(nuts3$NUTS_ID), size = 100, replace = FALSE)

df$Y[df$nuts_id %in% sample_nuts_ids] <- NA # introduce NA values for specific NUTS3 regions

# Sample 3 unique nuts IDs to delete from the entire dataframe because you do not have records of covariates
sample_nuts_ids2 <- sample(unique(nuts3$NUTS_ID), size = 3, replace = FALSE)

# Exclude the ids from the original data
df <- df |>
  filter(!nuts_id %in% sample_nuts_ids2)

# Simulate TMEAN temperature in Kelvin (range: 257.1252 - 305.0120)
df$TMEAN <- runif(nrow(df), min = 257.1252, max = 305.0120) # temperature in Kelvin

# Create an integer index for the graph and reorder columns/rows
df <- df |>
  mutate(nuts_index = as.integer(factor(nuts_id))) |>
  relocate(nuts_index, .before = nuts_id) |>
  arrange(year, month, nuts_index)


# Create the graph

# Keep only the locations retained in df, then order the polygons to match
# the index: as.integer(factor(nuts_id)) numbers units alphabetically, so
# sorting by NUTS_ID aligns polygon i with nuts_index i
nuts3 <- nuts3 |>
  filter(NUTS_ID %in% df$nuts_id) |>
  arrange(NUTS_ID)
head(nuts3)
head(df)

# Create the neighbour structure; poly2nb() accepts sf objects directly
nuts3_nb <- poly2nb(nuts3)
nb2INLA("Input/Data_model/graph", nuts3_nb)

g <- inla.read.graph(filename = "Input/Data_model/graph")
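After reading the graph I also run a sanity check along these lines (shown here with mock stand-ins for `df` and `g`, since the real objects require INLA and the shapefiles): the graph must have exactly one node per spatial unit, and the bym2 index must cover 1..g$n without gaps.

```r
# Mock stand-ins for the script's `df` and `g` (inla.read.graph() returns
# a list whose element n is the number of nodes)
df_mock <- data.frame(nuts_index = rep(1:5, each = 12))
g_mock  <- list(n = 5L)

# one graph node per spatial unit, indices 1..n with no gaps
stopifnot(g_mock$n == length(unique(df_mock$nuts_index)))
stopifnot(all(sort(unique(df_mock$nuts_index)) == seq_len(g_mock$n)))
```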


# Define the model formula

alpha <- 0.01
u <- 1

precision_prior <- list(prec = list(prior = "pc.prec", param = c(u, alpha)))

# Define the formula
formula <- Y ~ 1 +
  f(month, model = "rw1", cyclic = TRUE, constr = TRUE,
    scale.model = TRUE, hyper = precision_prior) +
  f(nuts_index, model = "bym2", graph = g,
    scale.model = TRUE, hyper = precision_prior) +
  log(TMEAN)


# Run the model

# Model to assess the relationships between the outcome and covariates  
model <- inla(
  # formula, data and model family
  formula = formula, data = df, family="poisson",
 
  # configure inla approx
  control.inla = list(strategy = 'adaptive'),
 
  # items to compute
  control.compute = list(dic = TRUE, waic = TRUE, config = FALSE,
                         cpo = TRUE,return.marginals = TRUE),
 
  # fixed effects calibration
  control.fixed = list(correlation.matrix = TRUE,
                       prec.intercept = 1, # precision 1
                       prec = 1), # weakly regularising on fixed effects (sd of 1)
 
  # save predicted values on response scale
  control.predictor = list(link = 1, compute = TRUE),
 
  # Verbose
  verbose = TRUE)

# Look at the predictions
df$fitted <- model$summary.fitted.values[, "mean"]

head(df)

# Let's add a layer of complexity using replicate within the formula

rm(list = ls())
# Get all NUTS-3 codes and geometries
nuts3 <- gisco_get_nuts(nuts_level = 3, year = 2021, epsg = 3035)

# Create a dataframe with all combinations of nuts IDs, months and years
df <- expand.grid(
  nuts_id = nuts3$NUTS_ID,
  month = 1:12,
  year = 2010:2023,
  replicate_factor=1:3,
  KEEP.OUT.ATTRS=FALSE
)

# Simulate Y outcome variable following a Poisson distribution
df$Y <- rpois(nrow(df), lambda = 30)

# Introduce some NA values for Y, but only for specific NUTS3
# Sample 100 unique nuts IDs
sample_nuts_ids <- sample(unique(nuts3$NUTS_ID), size = 100, replace = FALSE)

df$Y[df$nuts_id %in% sample_nuts_ids] <- NA # introduce NA values for specific NUTS3 regions

# Sample 3 unique nuts IDs to delete from the entire dataframe because you do not have records of covariates
sample_nuts_ids2 <- sample(unique(nuts3$NUTS_ID), size = 3, replace = FALSE)

# Exclude the ids from the original data
df <- df |>
  filter(!nuts_id %in% sample_nuts_ids2)

# Simulate TMEAN temperature in Kelvin (range: 257.1252 - 305.0120)
df$TMEAN <- runif(nrow(df), min = 257.1252, max = 305.0120) # temperature in Kelvin

# Create an integer index for the graph and reorder columns/rows
df <- df |>
  mutate(nuts_index = as.integer(factor(nuts_id))) |>
  relocate(nuts_index, .before = nuts_id) |>
  arrange(year, month, nuts_index)


# Create the graph

# Keep only the locations retained in df, then order the polygons to match
# the index: as.integer(factor(nuts_id)) numbers units alphabetically, so
# sorting by NUTS_ID aligns polygon i with nuts_index i
nuts3 <- nuts3 |>
  filter(NUTS_ID %in% df$nuts_id) |>
  arrange(NUTS_ID)
head(nuts3)
head(df)

# Create the neighbour structure; poly2nb() accepts sf objects directly
nuts3_nb <- poly2nb(nuts3)
nb2INLA("Input/Data_model/graph", nuts3_nb)

g <- inla.read.graph(filename = "Input/Data_model/graph")


# Define the model formula

alpha <- 0.01
u <- 1

precision_prior <- list(prec = list(prior = "pc.prec", param = c(u, alpha)))

# Define the formula with the replicate
formula <- Y ~ 1 +
  f(month, replicate = replicate_factor, model = "rw1", cyclic = TRUE,
    constr = TRUE, scale.model = TRUE, hyper = precision_prior) +
  f(nuts_index, model = "bym2", graph = g,
    scale.model = TRUE, hyper = precision_prior) +
  log(TMEAN)
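Since, as I understand it, `replicate` expects consecutive integer indices starting at 1, I also check the replicate variable before fitting (toy labels here, not my data; in the simulation replicate_factor is already 1:3):

```r
# Toy check: converting arbitrary replicate labels to an index gives
# consecutive integers 1..R, which is what f(..., replicate = ...) expects
rep_labels <- c("A", "B", "A", "C", "B")
rep_idx <- as.integer(factor(rep_labels))
stopifnot(identical(sort(unique(rep_idx)), 1:3))
```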


# Run the model

# Model to assess the relationships between the outcome and covariates  
model <- inla(
  # formula, data and model family
  formula = formula, data = df, family="poisson",
 
  # configure inla approx
  control.inla = list(strategy = 'adaptive'),
 
  # items to compute
  control.compute = list(dic = TRUE, waic = TRUE, config = FALSE,
                         cpo = TRUE,return.marginals = TRUE),
 
  # fixed effects calibration
  control.fixed = list(correlation.matrix = TRUE,
                       prec.intercept = 1, # precision 1
                       prec = 1), # weakly regularising on fixed effects (sd of 1)
 
  # save predicted values on response scale
  control.predictor = list(link = 1, compute = TRUE),
 
  # Verbose
  verbose = TRUE)

# Look at the predictions
df$fitted <- model$summary.fitted.values[, "mean"]

head(df)



--

Angela Fanelli










