Dear INLA/bru community,
I’m working on an integrated marine model that combines:
Presence-only (PO) data (citizen science data that covers a narrow region with many presences).
Presence–absence (PA) data (collected in a broader survey across a larger region with few presences).
I’m using inlabru (via the wrapper package PointedSDMs) and am confused about how best to define integration points (IPS) or samplers when these two data types have different spatial extents. In particular:
My PO data exist only in a small corridor, so I’d like the IPS to cover just that corridor, reflecting the narrow spatial extent of the sampling effort.
My PA data spans a larger region and is modeled as a binomial likelihood.
Normally, I pass a single polygon sampler to inlabru for the presence-only data to define the integration domain. But if the PA data cover a larger area, how should I incorporate that extra region without artificially generating IPS for the PO dataset outside its corridor?
Should I instead use a large integration domain that covers both datasets spatially, and then rely on a “bias term” or offset to handle the narrower PO coverage? Is that approach recommended, or is it better to give each dataset its own sampler polygon, so that the PO data are integrated only where they were actually collected and the PA data “know” about their broader region?
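To make the question concrete, this is roughly the joint log-likelihood I have in mind (ignoring time), assuming a log-Gaussian Cox process for the PO data and a Bernoulli likelihood with a complementary log-log link for the PA data; the link is my assumption, not something I have confirmed PointedSDMs uses. Here f is the shared spatial field, b an optional PO bias term, g the covariate effects, and u_j, w_j the integration points and weights. Only the PO term contains an integral, so, as far as I understand it, only the PO data need a sampler; the PA term is evaluated at the survey sites alone.

```latex
\begin{aligned}
\log L &=
  \underbrace{\sum_{i=1}^{n_{\mathrm{PO}}} \eta_{\mathrm{PO}}(s_i)
  - \int_{D_{\mathrm{PO}}} \exp\{\eta_{\mathrm{PO}}(s)\}\,\mathrm{d}s}_{\text{PO: point process on the corridor } D_{\mathrm{PO}}} \\
 &\quad + \underbrace{\sum_{k=1}^{n_{\mathrm{PA}}} \Big[ y_k \log p_k + (1 - y_k)\log(1 - p_k) \Big]}_{\text{PA: Bernoulli at the survey sites}}, \\
p_k &= 1 - \exp\{-\exp(\eta_{\mathrm{PA}}(s_k))\}, \\
\eta_{\mathrm{PO}}(s) &= \beta_{\mathrm{PO}} + f(s) + b(s) + g(x(s)), \qquad
\eta_{\mathrm{PA}}(s) = \beta_{\mathrm{PA}} + f(s) + g(x(s)), \\
\int_{D_{\mathrm{PO}}} \exp\{\eta_{\mathrm{PO}}(s)\}\,\mathrm{d}s &\approx \sum_{j=1}^{J} w_j \exp\{\eta_{\mathrm{PO}}(u_j)\}, \qquad u_j \in D_{\mathrm{PO}}.
\end{aligned}
```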
Finally, if I want a finer integration scheme (a higher density of IPS) in the PO region but a coarser one elsewhere, is there a standard inlabru approach for multi-resolution integration, or do I need to piece that together manually?
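As I understand it, inlabru places integration points according to the mesh, so a mesh refined inside the corridor may already give a finer scheme there; I have not verified that. The base R toy below only illustrates what “piecing it together manually” would mean, and is not inlabru’s API: a coarse grid over a hypothetical 10 x 10 domain, a finer grid over a hypothetical corridor [0, 2] x [0, 10], coarse cells inside the corridor dropped, and weights equal to cell areas so that they still sum to the domain area.

```r
# Toy multi-resolution integration scheme (hypothetical domain and corridor).
# Each point sits at a grid-cell centre and carries that cell's area as weight.
make_grid <- function(xlim, ylim, h) {
  xs <- seq(xlim[1] + h / 2, xlim[2] - h / 2, by = h)  # cell centres in x
  ys <- seq(ylim[1] + h / 2, ylim[2] - h / 2, by = h)  # cell centres in y
  pts <- expand.grid(x = xs, y = ys)
  pts$weight <- h^2                                    # area represented by each point
  pts
}

coarse <- make_grid(c(0, 10), c(0, 10), h = 1)     # coarse grid over the whole domain
fine   <- make_grid(c(0, 2),  c(0, 10), h = 0.25)  # fine grid over the corridor only

# Drop coarse cells lying inside the corridor (cell centres with x < 2)
coarse <- coarse[coarse$x > 2, ]

ips_all <- rbind(transform(coarse, region = "outside"),
                 transform(fine,   region = "corridor"))

# A PO-only scheme keeps just the corridor points
ips_po <- ips_all[ips_all$region == "corridor", ]

# Approximate integral of exp(eta) given eta evaluated at the points
integrate_intensity <- function(ips, eta) sum(ips$weight * exp(eta))

# With eta = 0 the integral is simply the area covered by the points
integrate_intensity(ips_po, eta = rep(0, nrow(ips_po)))
```

The same weights-times-intensity sum is what the PO integral term approximates; only the placement and weights of the points change between resolutions.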
Any advice or examples on specifying IPS/samplers for integrated models with different spatial coverage would be greatly appreciated!
Thanks so much,
Moritz
--
You received this message because you are subscribed to the Google Groups "R-inla discussion group" group.
To view this discussion, visit https://groups.google.com/d/msgid/r-inla-discussion-group/6d202841-1b71-4e93-9a08-8832cb4df66bn%40googlegroups.com.
Hi Finn,
Thank you so much for your detailed explanation and suggestions! I only started exploring INLA and inlabru recently, and the advice from you and others on the forum has been invaluable in getting my integrated model running.
I ended up following the approach you recommended: keeping the PO and PA observations as separate likelihood contributions and providing an integration domain only for the presence-only data within its known sampling corridor.
At the bottom of this message is a summary of my current model setup and output. I would really appreciate it if you could take a quick look and let me know whether everything seems sensible at first glance.
Some notes:
I am modelling a shared spatial field with AR1 temporal grouping across 12 months.
The PO dataset (monthly resolution) uses a separate bias field component (po_sf_biasField) that also has 12 monthly groups with an AR1 correlation structure.
The PA dataset covers only 1 of those months.
The environmental covariates (temperature, bathymetry, slope) each have their own 1D SPDE model component with PC priors. Temperature varies across months, and I managed to link the month-specific values dynamically to the PO data (both presences and IPS).
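For the month-specific temperature, the linking step amounts to a lookup by (location, month). A minimal base R sketch, assuming the covariate is stored as a hypothetical cell-by-month matrix `temp_by_month` and that each presence or integration point carries a hypothetical `cell` index and its `month`; in practice the cell index would come from a spatial extraction step, which is left out here.

```r
# Hypothetical monthly covariate: rows = spatial cells, columns = months 1..12
set.seed(1)
n_cells <- 5
temp_by_month <- matrix(rnorm(n_cells * 12), nrow = n_cells, ncol = 12)

# Hypothetical points (presences or IPS), each with a cell index and a month
obs_points <- data.frame(
  cell  = c(1, 3, 3, 5),
  month = c(1, 1, 7, 12)
)

# Indexing a matrix with a two-column (row, column) matrix returns one value
# per row of the index, i.e. the temperature of each point's cell in its month
obs_points$temperature <- temp_by_month[cbind(obs_points$cell, obs_points$month)]
```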
Shared Spatial Random Field + Bias Field
Does my use of a single shared_spatial field, grouped by month and modeled with an AR1 correlation, seem appropriate when the PO data span 12 months but the PA data cover only a single month? Currently, the PA data are treated as one “slice” (month = X) of that same AR1 structure. I’m wondering whether this is standard practice.
I’ve also introduced a separate po_sf_biasField component to capture sampling bias in the PO data. This bias field follows the PO data’s monthly grouping, with the same AR1 structure as the shared field. Is that a sensible way to account for possible differences in search effort over space and time in a presence-only survey, or would you suggest a different strategy?
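Because both grouped fields use an AR1 model across months, one way to read the GroupRho estimates is through the implied correlation between monthly slices, which is rho^k at a lag of k months. A small base R sketch using the posterior means from the model output at the end of this message (0.847 for shared_spatial, 0.999 for po_sf_biasField); these are plug-in values, not posterior summaries of the correlations themselves.

```r
# Implied correlation between monthly slices of an AR1-grouped field:
# cor(month t, month t + k) = rho^k
ar1_lag_cor <- function(rho, max_lag = 11) {
  lags <- 0:max_lag
  setNames(rho^lags, paste0("lag", lags))
}

rho_shared <- 0.847   # posterior mean of GroupRho for shared_spatial
rho_bias   <- 0.999   # posterior mean of GroupRho for po_sf_biasField

# One row per field, one column per lag (0 to 11 months)
round(rbind(shared = ar1_lag_cor(rho_shared),
            bias   = ar1_lag_cor(rho_bias)), 3)
```

With a GroupRho of 0.999, the bias-field slices stay almost perfectly correlated even 11 months apart, so that component behaves close to a single time-constant bias field.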
Question on priors
Currently, I’m using PC priors for the SPDE components (a 2D SPDE for the SRF and 1D SPDEs for the covariates). For the SRF, these are roughly guided by domain knowledge about the species’ home range (for example, a prior expectation that the spatial range is on the order of tens of kilometers). However, I’m still unsure how best to refine these priors. Do you typically lean more on prior domain knowledge (“the species’ core habitat area is about X km, so the prior range scale should be around X”), or do you adjust them iteratively based on posterior checks?
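For reference, my understanding of what the PC prior arguments of `INLA::inla.spde2.pcmatern()` encode, for a Matérn field with range ρ and marginal standard deviation σ on a d-dimensional domain (d = 2 for the SRF, d = 1 for the covariate effects). The densities are the standard PC prior construction for the Matérn range and standard deviation, written out here by me rather than copied from the INLA documentation:

```latex
\begin{aligned}
&\texttt{prior.range = c}(\rho_0, p_\rho): && P(\rho < \rho_0) = p_\rho, \\
&\texttt{prior.sigma = c}(\sigma_0, p_\sigma): && P(\sigma > \sigma_0) = p_\sigma, \\
&\pi(\rho) = \tfrac{d}{2}\,\lambda_\rho\,\rho^{-1-d/2}\exp\!\big(-\lambda_\rho\,\rho^{-d/2}\big), && \lambda_\rho = -\log(p_\rho)\,\rho_0^{d/2}, \\
&\pi(\sigma) = \lambda_\sigma \exp(-\lambda_\sigma\,\sigma), && \lambda_\sigma = -\log(p_\sigma)/\sigma_0.
\end{aligned}
```

So `prior.range = c(0.2, 0.05)` in the covariate components places 5% prior mass on ranges below 0.2, in the units of the covariate.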
Model components
~ -1 +
  # Shared spatial field, grouped by month with AR1 correlation between months
  shared_spatial(main = geometry, model = shared_field, group = month,
                 ngroup = 12, control.group = list(model = "ar1")) +
  # Dataset-specific intercepts
  po_sf_intercept(1) +
  pa_sf_intercept(1) +
  # Smooth covariate effects: a 1D SPDE over each covariate's value range, PC priors
  #   prior.range = c(0.2, 0.05): P(range < 0.2) = 0.05
  #   prior.sigma = c(0.5, 0.05): P(sigma > 0.5) = 0.05
  #   constr = TRUE: integrate-to-zero constraint, keeping each effect
  #   identifiable alongside the intercepts
  temperature(main = temperature, model = INLA::inla.spde2.pcmatern(
    mesh = fmesher::fm_mesh_1d(
      loc = seq(-1.33369767665863, 2.07408857345581, length.out = 20),
      boundary = "free"
    ),
    alpha = 2,
    prior.range = c(0.2, 0.05),
    prior.sigma = c(0.5, 0.05),
    constr = TRUE)
  ) +
  bathymetry(main = bathymetry, model = INLA::inla.spde2.pcmatern(
    mesh = fmesher::fm_mesh_1d(
      loc = seq(-1.14557325839996, 2.52323031425476, length.out = 20),
      boundary = "free"
    ),
    alpha = 2,
    prior.range = c(0.2, 0.05),
    prior.sigma = c(0.5, 0.05),
    constr = TRUE)
  ) +
  # PO-specific sampling-bias field, with the same monthly AR1 grouping
  po_sf_biasField(main = geometry, model = po_sf_bias_field, group = month,
                  ngroup = 12, control.group = list(model = "ar1")) +
  slope(main = slope, model = INLA::inla.spde2.pcmatern(
    mesh = fmesher::fm_mesh_1d(
      loc = seq(-0.944889545440674, 9.53582763671875, length.out = 20),
      boundary = "free"
    ),
    alpha = 2,
    prior.range = c(0.2, 0.05),
    prior.sigma = c(0.5, 0.05),
    constr = TRUE)
  )
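All three covariate components share `prior.range = c(0.2, 0.05)`, although the covariates span quite different value ranges. A quick base R check I could run: the spacing of the 20 knots in each 1D mesh (taken from the `seq()` limits above) and the prior probability that the range falls below one knot spacing. The CDF used here, P(range < r) = exp(-lambda r^(-d/2)) with d = 1, is my own derivation from the PC prior definition, not an INLA function.

```r
# Value ranges used for the 1D covariate meshes (20 equally spaced knots each)
cov_limits <- list(
  temperature = c(-1.33369767665863, 2.07408857345581),
  bathymetry  = c(-1.14557325839996, 2.52323031425476),
  slope       = c(-0.944889545440674, 9.53582763671875)
)
knot_spacing <- sapply(cov_limits, function(lim) diff(lim) / (20 - 1))

# PC prior on the Matern range in d dimensions:
# P(range < r) = exp(-lambda * r^(-d/2)), with lambda = -log(p0) * r0^(d/2)
pc_range_cdf <- function(r, r0, p0, d = 1) {
  lambda <- -log(p0) * r0^(d / 2)
  exp(-lambda * r^(-d / 2))
}

# Prior probability that the range is shorter than one knot spacing,
# under prior.range = c(0.2, 0.05) as used in all three components
p_below_spacing <- pc_range_cdf(knot_spacing, r0 = 0.2, p0 = 0.05)
round(rbind(knot_spacing, p_below_spacing), 3)
```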
Model output
Summary of 'modISDM' object:
inlabru version: 2.12.0
INLA version: 24.12.11

Types of data modelled:
po_sf  Present only
pa_sf  Present absence

Fixed effects:
                       mean         sd 0.025quant   0.5quant 0.975quant       mode
po_sf_intercept      -6.664      1.230     -9.075     -6.664     -4.254     -6.664
pa_sf_intercept      -4.040      0.392     -4.809     -4.040     -3.271     -4.040

Random effects:
Name             Model
shared_spatial   SPDE2 model
temperature      SPDE2 model
po_sf_biasField  SPDE2 model
bathymetry       SPDE2 model
slope            SPDE2 model

Model hyperparameters:
                                   mean         sd 0.025quant   0.5quant 0.975quant       mode
Range for shared_spatial         47.111     18.682     20.222     43.917     92.523     38.109
Stdev for shared_spatial          0.476      0.161      0.236      0.450      0.861      0.404
GroupRho for shared_spatial       0.847      0.090      0.615      0.867      0.960      0.905
Range for temperature             2.968      1.197      1.331      2.739      5.957      2.333
Stdev for temperature             0.514      0.161      0.275      0.488      0.903      0.439
Theta1 for po_sf_biasField       -1.289      0.627     -2.556     -1.278     -0.088     -1.230
Theta2 for po_sf_biasField       -1.969      0.445     -2.827     -1.975     -1.076     -2.001
GroupRho for po_sf_biasField      0.999      0.001      0.998      0.999      1.000      1.000
Range for bathymetry              1.277      0.602      0.483      1.154      2.797      0.943
Stdev for bathymetry              0.766      0.232      0.407      0.734      1.313      0.674
Range for slope                   9.490      9.607      1.404      6.671     34.877      3.471
Stdev for slope                   0.319      0.159      0.113      0.286      0.722      0.229

DIC: -41289.11, WAIC: 5992.56, Marg. log-likelihood: -26784.69
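The bias field is reported with Theta1/Theta2 rather than Range/Stdev, which I assume means `po_sf_bias_field` was built with a non-PC parameterisation. If, and only if, it used `INLA::inla.spde2.matern()` with its default settings in 2D with alpha = 2, then Theta1 = log(tau) and Theta2 = log(kappa), and the plug-in conversion below maps the posterior means to a range and standard deviation. This is an assumption about how the field was constructed, and plugging in posterior means gives only a crude point summary, not the posterior mean of the range or standard deviation.

```r
# Plug-in conversion of internal SPDE parameters to range and marginal SD,
# ASSUMING inla.spde2.matern() defaults: theta1 = log(tau), theta2 = log(kappa).
# Matern smoothness nu = alpha - d/2; range = sqrt(8 nu) / kappa;
# variance = Gamma(nu) / (Gamma(alpha) (4 pi)^(d/2) kappa^(2 nu) tau^2)
matern_from_theta <- function(theta1, theta2, d = 2, alpha = 2) {
  tau   <- exp(theta1)
  kappa <- exp(theta2)
  nu    <- alpha - d / 2
  range <- sqrt(8 * nu) / kappa
  sigma <- sqrt(gamma(nu) /
                  (gamma(alpha) * (4 * pi)^(d / 2) * kappa^(2 * nu) * tau^2))
  c(range = range, sigma = sigma)
}

# Posterior means reported for po_sf_biasField
matern_from_theta(theta1 = -1.289, theta2 = -1.969)
```

If the assumption holds, the range would be in the same units as the mesh coordinates, which makes it directly comparable with "Range for shared_spatial".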
Everything seems to run smoothly, and the estimates appear reasonable. I would love to hear if you see any red flags or have any additional suggestions regarding my setup.
Many thanks again!
Moritz