Newbie Questions


Becky Hansis-O'Neill

May 5, 2026, 11:28:25 AM
to R-inla discussion group
I am a PhD student working on some ecological modelling for a study with an opportunistic sampling design.

We have overall survey sites and polygons representing the transects we actually walked within each site. The transects have equal sampling effort. The survey sites are scattered across the state, sometimes hundreds of km apart.

Each transect is only 15m wide. The shortest is probably 30m long and the longest is a couple hundred meters. There are 217 observations across all the sites (which is not very many at all). Green = survey site, Blue = transects.

[Attachment: Capture.JPG]


I am using INLA-SPDE to capture the spatial field. Once I do that, many of my covariate signals get absorbed. The covariates are all resampled to the same resolution and standardized.

I can't help thinking two things.

1. There may not be a good way to make a mesh for my scattered tiny transects over hundreds of km (see below). Adding a boundary breaks the model.
[Attachment: mesh_noboundary.JPG]


2. By including the spatial field and washing out my covariates, I am losing some ecologically relevant conclusions. My covariates, for example, are forest canopy cover, observations of non-target species, cos_aspect, and sin_aspect. I can see why they get muddled up into the spatial field.

As a newbie, I was wondering whether my approach and concerns seem reasonable? And secondly, whether any experienced folks are thinking, "no no no, this is absolutely the wrong approach for these data." No one in my dept is really working on this and, given my small data set, it would be easy to draw misleading conclusions.


# ----------------------------
# SPDE
# ----------------------------
spde <- inla.spde2.pcmatern(
  mesh        = mesh,
  prior.range = c(10000, 0.5),  # PC prior: P(range < 10000 map units) = 0.5
  prior.sigma = c(2, 0.01)      # PC prior: P(sigma > 2) = 0.01
)

spatial_index <- inla.spde.make.index(
  name   = "spatial",
  n.spde = spde$n.spde
)

# Projector matrix mapping mesh nodes to the observation locations
A <- inla.spde.make.A(mesh, loc = coords)

# ----------------------------
# FORMULA
# ----------------------------
formula_spatial <- y ~ -1 +
  intercept +
  offset(log_area) +
  sin_aspect +
  cos_aspect +
  canopy_2023 +
  herpobs_500m +
  hii +
  f(spatial, model = spde)
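For context, these pieces are typically tied together with an inla.stack before calling inla(). A minimal sketch, where `dat` and its columns are hypothetical placeholders for the observation data frame (the family matches the zero-inflated negative binomial reported in the summary output):

```r
# Sketch only: `dat` is a hypothetical data frame of responses and covariates.
stk <- inla.stack(
  data    = list(y = dat$y),
  A       = list(A, 1),            # A for the spatial field, 1 for fixed effects
  effects = list(
    spatial_index,                 # SPDE index from inla.spde.make.index()
    data.frame(
      intercept    = 1,            # explicit intercept (the formula uses -1)
      log_area     = dat$log_area,
      sin_aspect   = dat$sin_aspect,
      cos_aspect   = dat$cos_aspect,
      canopy_2023  = dat$canopy_2023,
      herpobs_500m = dat$herpobs_500m,
      hii          = dat$hii
    )
  )
)

fit <- inla(
  formula_spatial,
  family            = "zeroinflatednbinomial1",
  data              = inla.stack.data(stk),
  control.predictor = list(A = inla.stack.A(stk), compute = TRUE),
  control.compute   = list(dic = TRUE, waic = TRUE, cpo = TRUE)
)
```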

Time used:
    Pre = 0.973, Running = 3.66, Post = 0.185, Total = 4.81
Fixed effects:
                mean    sd 0.025quant 0.5quant 0.975quant    mode kld
intercept    -10.497 1.025    -12.573  -10.478     -8.534 -10.479   0
sin_aspect    -0.126 0.178     -0.483   -0.124      0.218  -0.124   0
cos_aspect    -0.653 0.523     -1.674   -0.656      0.382  -0.656   0
canopy_2023    0.223 0.745     -1.215    0.214      1.710   0.214   0
herpobs_500m  -0.016 0.024     -0.060   -0.017      0.037  -0.021   0
hii            0.499 0.648     -0.732    0.475      1.862   0.475   0

Random effects:
  Name    Model
    spatial SPDE2 model

Model hyperparameters:
                                                              mean       sd 0.025quant 0.5quant 0.975quant     mode
size for nbinomial_1 zero-inflated observations           1653.378 1.81e+04      3.860  113.372   1.04e+04    6.686
zero-probability parameter for zero-inflated nbinomial_1     0.065 8.00e-02      0.002    0.036   2.98e-01    0.004
Range for spatial                                         2928.950 1.64e+03   1025.503 2530.062   7.24e+03 1923.618
Stdev for spatial                                            1.416 3.85e-01      0.817    1.364   2.32e+00    1.263

Deviance Information Criterion (DIC) ...............: 191.25
Deviance Information Criterion (DIC, saturated) ....: -2630.50
Effective number of parameters .....................: 22.99

Watanabe-Akaike information criterion (WAIC) ...: 191.03
Effective number of parameters .................: 18.21

Marginal log-Likelihood: -132.34
CPO, PIT is computed
Posterior summaries for the linear predictor and the fitted values are computed
(Posterior marginals needs also 'control.compute=list(return.marginals.predictor=TRUE)')

Finn Lindgren

May 5, 2026, 12:20:44 PM
to Becky Hansis-O'Neill, R-inla discussion group
Hi Becky,

yes, that mesh won't be very useful.
How to fix it depends on a few things:
1. Do you need the spde model to capture large scale differences between the distant groups of transects, or within-local-site variability?
2. Do you have point-referenced count data, or do you have point pattern data? (I can't quite tell from the summary output if you're using some home-made point pattern implementation)

For 1, to capture large scale behaviour only, you can force the mesh to ignore the fine detail at the observation sites, e.g. by making "cutoff" larger than the size of the transect clusters.
For modelling within-local-site variability instead, use fm_nonconvex_hull() to construct a multi-polygon boundary that encircles each site separately, and use that as the mesh boundary.
(It's not clear to me precisely what you meant by "adding a boundary"; the specifics of that can be extremely important!)
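Both options might be sketched with fmesher along these lines (all object names and numeric values are illustrative placeholders, not recommendations; tune them to the actual cluster sizes and map units):

```r
library(fmesher)

# Option 1: large-scale field only. A cutoff larger than a transect
# cluster collapses each cluster to (roughly) a single mesh node.
mesh_coarse <- fm_mesh_2d_inla(
  loc      = transect_coords,   # hypothetical point locations
  cutoff   = 5000,              # larger than the cluster diameter (map units)
  max.edge = c(20000, 100000)
)

# Option 2: within-site variability. A nonconvex hull around the points
# yields a multi-polygon boundary encircling each site separately.
bnd <- fm_nonconvex_hull(transect_coords, convex = 500)
mesh_local <- fm_mesh_2d_inla(
  boundary = bnd,
  max.edge = c(100, 500),
  cutoff   = 20
)
```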

For 2, if you do have point pattern data (presumably defined by your plotted polygons), please switch to the inlabru interface that has a proper implementation of the Poisson point process model.

In general, to make model specification and problem solving easier (in particular if you have point pattern data, but also generally), please use the inlabru interface. It has direct support for point patterns and, more widely, a much easier way to specify spatial models (and also supports almost all other inla models). I am nowadays increasingly reluctant to help debug raw-inla() implementations of spde and point process models, as we've spent a lot of effort to implement a better interface (i.e. inlabru) that gives much shorter and easier-to-read user-side code...

See https://inlabru-org.github.io/inlabru/ for examples (most examples are for line transect distance sampling; fully searched polygon transects are easier)
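A minimal inlabru LGCP call in this spirit might look like the following (`obs_sf`, `transects_sf`, and the covariate name are placeholders; the formula syntax assumes a recent inlabru with sf support):

```r
library(inlabru)

matern <- inla.spde2.pcmatern(mesh,
  prior.range = c(100, 0.5),   # illustrative PC priors
  prior.sigma = c(2, 0.01)
)

# Points `obs_sf` observed inside fully searched polygons `transects_sf`
fit <- lgcp(
  geometry ~ canopy_2023 + field(geometry, model = matern) + Intercept(1),
  data     = obs_sf,
  samplers = transects_sf,
  domain   = list(geometry = mesh)
)
summary(fit)
```

The samplers argument restricts the point-process likelihood to the searched polygons, so unsearched areas carry no false absence information.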

Finn


To view this discussion, visit https://groups.google.com/d/msgid/r-inla-discussion-group/6a4b4069-d422-4b35-82fb-c2060a215dbfn%40googlegroups.com.


--
Finn Lindgren
email: finn.l...@gmail.com

Becky Hansis-O'Neill

May 5, 2026, 12:50:17 PM
to Finn Lindgren, R-inla discussion group
Thanks Finn! You're a lifesaver and I appreciate the patient/kind response. I actually do have an inlabru version I can go back to. I have been struggling a little bit to understand the best way to look at my point data: counts per transect area or a point process. I know they answer different questions. Ideally, I want to 1. understand how my covariates influence density and 2. create a map of density in my habitat polygons (green) to use in a population viability model. This is something I can spend more time reading about; I need to understand it for my dissertation, so I don't expect in-depth explanations here.

1. Do you need the spde model to capture large scale differences between the distant groups of transects, or within-local-site variability? Local only.  
2. Do you have point-referenced count data, or do you have point pattern data? (I can't quite tell from the summary output if you're using some home-made point pattern implementation). In this version of the model I am trying to use point-referenced count data within transects (blue), but I am not sure that's the best way to answer my question. The area offset is the green polygon area. Because my transects (blue) represent complete sampling effort, I am concerned I may be leaving out important information about where the species does NOT occur if I use point pattern data. This has led to some weirdness and may represent a fundamental misunderstanding I have about these modelling approaches.

Here is what I meant by boundary, but the point is moot I think. I need to approach the mesh in a very different way.
[Attachment: boundarynoboundarymesh.JPG]
A clearer picture of my data. 

Green = habitat boundary/study site; Blue = transect polygons; Dots = observations
I have been using the blue as my study sites, but I would like to extrapolate those results as a predictive model in the green polygons. 
[Attachment: pointsintransects.JPG]

I will work on implementing the new multi-polygon mesh and moving back to the inlabru interface before posting more questions.

Finn Lindgren

May 5, 2026, 4:49:05 PM
to Becky Hansis-O'Neill, R-inla discussion group
Hi,

ok, I see, then I'd try to first set the model up to handle one of the study sites (for ease of debugging), and then add the others.
For each study site, I'd build a mesh covering the green polygon; use fm_nonconvex_hull(the_green_polygon, the_extension_amount) (or rather one inner and one outer extension).
Then you have a choice for the observation model; either
1. treat the observations as counts aggregated over each blue polygon; the "aggregate" feature can help with that,
   ips <- fm_int(domain = list(geometry = fm_subdivide(mesh, number_of_extra_points_per_edge)), samplers = the_blue_polygons)
   obs <- bru_obs(..., family = "poisson", aggregate = "logsumexp", response_data = the_count_df, data = ips)
2. or treat them as point patterns,
   ips <- fm_int(domain = list(geometry = fm_subdivide(mesh, number_of_extra_points_per_edge)), samplers = the_blue_polygons)
   obs <- bru_obs(..., family = "cp", data = the_point_sf, ips = ips)

In both cases the above pseudo-code ignores details regarding how and where to evaluate spatial covariates, but I wrote it in such a way that you can pre-evaluate the covariates for the locations in ips (and the_point_sf, for the second version) before calling bru_obs. That would remove the need for a global covariate raster, so should work also when extending to multiple green polygons (in fact, the code above should work as intended if the_green_polygon is the union of the different sites).

For posterior prediction on the green polygons, use predict() with fm_pixels(), separately for each green polygon.
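That prediction step might be sketched as follows (assuming a fitted inlabru model `fit` with components named `Intercept`, `canopy_2023`, and `field`; all names here are placeholders):

```r
# A pixel grid clipped to one green polygon
pxl <- fm_pixels(mesh, dims = c(100, 100), mask = one_green_polygon)

# Posterior intensity on the response scale; `pred` is `pxl` augmented
# with mean, sd, and quantile columns per pixel, ready for mapping
pred <- predict(fit, pxl, ~ exp(Intercept + canopy_2023 + field))
```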

Finn