Manual creation of integration points for distance sampling model

30 views
Skip to first unread message

Josh Cullen

unread,
Jan 12, 2026, 10:43:52 AMJan 12
to R-inla discussion group
Hi,

I'm attempting to fit a distance sampling model in {inlabru} as a thinned inhomogeneous Poisson point process, but using points where environmental values have already been extracted for both the animal sightings and the transect segments. I'm trying to use this approach instead of the method of supplying the transects as {sf} LINESTRINGS and a raster stack since I would need to provide 30 years of daily rasters, which I expect would prove computationally challenging. So it seems much simpler to just extract all environmental values prior to model fitting.

While I've built and fitted a model where the marginal effects for the environmental covariates make sense, i'm not sure if the way I'm generating the integration points is correct because I wasn't able to find a similar example available from this Google Group, Github repos, or the {inlabru} website. Any feedback as to whether the approach I'm using is correct would be much appreciated. (Relevant sample code below).

Thanks,
     Josh


```
# Define half-Normal detection function
log_hn_ls <- function(distance, lsig) {
  # lsig is the linear predictor for log(sigma)
  -0.5 * (distance / exp(lsig))^2
}

# Reformat transect data and scale covars
surv_sf2 <- surv_sf |>
  st_transform(proj) |>
  mutate(across(.cols = c(mld, sss, sst, eke, bathym),
                .fns = ~{scale(.x) |>
                    c()},
                .names = "{.col}_s"))

# Scale covars for distance sampling model
dat_sights3 <- dat_sights2 |>
  mutate(bathym_s = (bathym - mean(surv_sf2$bathym)) / sd(surv_sf2$bathym),
         mld_s = (mld - mean(surv_sf2$mld)) / sd(surv_sf2$mld),
         sss_s = (sss - mean(surv_sf2$sss)) / sd(surv_sf2$sss),
         sst_s = (sst - mean(surv_sf2$sst)) / sd(surv_sf2$sst),
         eke_s = (eke - mean(surv_sf2$eke)) / sd(surv_sf2$eke))

# Define 1D distance mesh
W <- max(dat_sights3$distance)  #truncation distance

mesh_dist <- fm_mesh_1d(
  seq(0, W, length.out = 5),
  degree = 2
)



### Generate integration points on subdivided mesh ###
mesh2 <- mesh
mesh2$loc <- cbind(mesh2$loc, 0)  #needs 3rd column to work
mesh_ips <- fm_subdivide(mesh2, n = 1)

surv_ips <- fm_int(domain = mesh_ips,
                   samplers = surv_sf2 |>
                     dplyr::select(segnum, transect, date, dist, eke_s, mld_s, sss_s,
                                   sst_s, bathym_s),
                   int.args = list(method = "stable", nsub = NULL)
                                   ) |>
  rename(weight_spatial = weight)

# Add back in environ covars by associated row ID
surv_ips2 <- surv_ips |>
  left_join(surv_sf2 |>
              st_drop_geometry() |>
              mutate(row_id = row_number()),
            by = c(".block" = "row_id"))


### Create cross-product of IPs across the detection distance domain ###

# Create detection distance ips
dist_ips <- fmesher::fm_int(mesh_dist, name = "distance") |>
  rename(weight_dist = weight) |>
  dplyr::select(distance, weight_dist)

# Perform a "Cross Join" with spatial points (cross-product of all rows from surv_ips w/ dist_ips)
surv_ips_full <- surv_ips2 |>
  st_drop_geometry() |>
  crossing(dist_ips) |> # This multiplies rows: (92k spatial) * (9 distance)
  st_as_sf(geometry = rep(st_geometry(surv_ips2), each = nrow(dist_ips))) |>  #add spatial component back

  mutate(weight = weight_spatial * weight_dist)  #(Weight of Transect Segment) * (Weight in 'distance' dimension)



### Define model components
cmp6 <- ~
  ## Intensity model
  Int_lambda(1) +

  # Functional GP over SST (1-D SPDE on covariate value)
  gp_sst(
    main   = sst_s,                  # numeric covariate at IPs
    model  = spde1d[["sst_s"]],
    mapper = bru_mapper_fmesher(covmesh1d[["sst_s"]])
  ) +
  # Functional GP over bathymetry
  gp_bathym(
    main   = bathym_s,
    model  = spde1d[["bathym_s"]],
    mapper = bru_mapper_fmesher(covmesh1d[["bathym_s"]])
  ) +
  # Functional GP over MLD
  gp_mld(
    main   = mld_s,
    model  = spde1d[["mld_s"]],
    mapper = bru_mapper_fmesher(covmesh1d[["mld_s"]])
  ) +
  # Functional GP over SSS
  gp_sss(
    main   = sss_s,
    model  = spde1d[["sss_s"]],
    mapper = bru_mapper_fmesher(covmesh1d[["sss_s"]])
  ) +
  # Functional GP over EKE
  gp_eke(
    main   = eke_s,
    model  = spde1d[["eke_s"]],
    mapper = bru_mapper_fmesher(covmesh1d[["eke_s"]])
  ) +
 
  ## Distance sampling model
  lsig0(1, model = "linear", mean.linear = 0, prec.linear = 1) +
  GRF(main = geometry, model = matern)


# Define likelihood for distance sampling model
lik_ds <- bru_obs(
  family = "cp",
  data   = dat_sights3 |>
    st_as_sf(coords = c("lon","lat"), crs = 4326) |>
    st_transform(proj),
  ips = surv_ips_full,
  domain = list(distance = mesh_dist,
                geometry = mesh),
  formula = distance + geometry ~ Int_lambda + gp_sst + gp_bathym + gp_mld + gp_sss + gp_eke +
    GRF +
    log_hn_ls(distance,  lsig0) +
   log(2)
    )


# Fit model
fit5 <- bru(
  components = cmp6,
  lik_ds,
  options = list(
    control.inla  = list(int.strategy = "eb", strategy = "adaptive"),
    control.compute = list(openmp.strategy = "huge"),
    inla.mode = "experimental",
    verbose = TRUE,
    bru_verbose = 4
  )
)
```

Finn Lindgren

unread,
Jan 13, 2026, 7:39:54 AMJan 13
to Josh Cullen, R-inla discussion group
Can you please repost this as a discussion entry on the inlabru gihub?
I have a bunch of comments to make, but formatting code replies in gmail is hopeless.

In short, at least with the development versions of inlabru and fmesher, I don't think you need to do the integration point construction manually, as some bugs that made it difficult have been fixed (including the fm_subdivide 3rd column coordinate issue you had to work around).

Assuming the observation data and transect objects both include all the needed covariate values, you shouldn't need to do anything special:

In the example below, the sampler includes additional columns "id" and "covar" with the transect ids and per-transect covariate values.
The component definitions can then directly refer to covar as input, if the observation data also contains those (as inlabru should now keep those around; I'm not sure if the cran version does it, and if you notice an issue with the development/r-universe version, let me know so I can fix it before soon submitting the new versions to CRAN). The component inputs can also call functions, such as a function to take the "id" value and compute/extract the relevant covariate values.

> fm_int(list(space=1:4),tibble(space=list(c(1,4),2:4),id=c(1,2),covar=c(pi, 10)))
# A tibble: 5 × 6
  space    id covar weight .block .block_origin[,"space"]
  <int> <dbl> <dbl>  <dbl>  <int>                   <int>
1     1     1  3.14      1      1                       1
2     4     1  3.14      1      1                       1
3     2     2 10         1      2                       2
4     3     2 10         1      2                       2
5     4     2 10         1      2                       2


Finn

--
You received this message because you are subscribed to the Google Groups "R-inla discussion group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to r-inla-discussion...@googlegroups.com.
To view this discussion, visit https://groups.google.com/d/msgid/r-inla-discussion-group/2560eef3-a30f-4157-ac80-ec212dd23609n%40googlegroups.com.


--
Finn Lindgren
email: finn.l...@gmail.com

Finn Lindgren

unread,
Jan 13, 2026, 8:04:21 AMJan 13
to Josh Cullen, R-inla discussion group
Here's what it should look like, with the observation data having the same covariate values as the matching transects:

components = ~ 0 + ... + covar(covar, model = "linear")
or just
components = ~0 +... + covar
or for the function method,
components = ~ 0 + ... + covar(extract_covar_from_transect(id), model = "linear")

> bru_obs(space + distance ~ covar,
    data=tibble(space=c(1,2,3,4),distance=c(2,2,1,1),id=c(1,2,2,1),covar=c(pi,10,10,pi)),
    domain=list(space=1:4,distance=0:2),
    samplers=tibble(space=list(c(1,4),2:4),id=c(1,2),covar=c(pi, 10)),
    family="cp")$data
# A tibble: 19 × 9
   space distance    id covar .block BRU_aggregate BRU_point_weights weight .block_origin[,1]  [,2]
   <dbl>    <dbl> <dbl> <dbl>  <int> <lgl>                     <dbl>  <dbl>             <int> <int>
 1     1        2     1  3.14      1 TRUE                          1     NA                NA    NA
 2     2        2     2 10         1 TRUE                          1     NA                NA    NA
 3     3        1     2 10         1 TRUE                          1     NA                NA    NA
 4     4        1     1  3.14      1 TRUE                          1     NA                NA    NA
 5     1        0     1  3.14      1 FALSE                         0      1                 1     1
 6     4        0     1  3.14      1 FALSE                         0      1                 1     1
 7     1        1     1  3.14      1 FALSE                         0      1                 1     1
 8     4        1     1  3.14      1 FALSE                         0      1                 1     1
 9     1        2     1  3.14      1 FALSE                         0      1                 1     1
10     4        2     1  3.14      1 FALSE                         0      1                 1     1
11     2        0     2 10         2 FALSE                         0      1                 2     1
12     3        0     2 10         2 FALSE                         0      1                 2     1
13     4        0     2 10         2 FALSE                         0      1                 2     1
14     2        1     2 10         2 FALSE                         0      1                 2     1
15     3        1     2 10         2 FALSE                         0      1                 2     1
16     4        1     2 10         2 FALSE                         0      1                 2     1
17     2        2     2 10         2 FALSE                         0      1                 2     1
18     3        2     2 10         2 FALSE                         0      1                 2     1
19     4        2     2 10         2 FALSE                         0      1                 2     1


Finn Lindgren

unread,
Jan 13, 2026, 8:06:32 AMJan 13
to Josh Cullen, R-inla discussion group
Note: the .block information generated by the likelihood construction is likely to change in the future, as it will need to match the observed points to their corresponding transects, but the .block info will also be used for cross-validation constructions, so may not always match the transect i (as it indeed already doesn't!)

Josh Cullen

unread,
Jan 13, 2026, 11:04:30 AMJan 13
to R-inla discussion group
Hi Finn,

Thanks for your reply! I'll repost this in the Discussion for the {inlabru} GH repo. However, I'll note that while I can replicate the behavior from this code that you posted, this doesn't work as expected when providing an {sf} LINESTRING object as a sampler to fm_int().

Best,
  Josh

Finn Lindgren

unread,
Jan 13, 2026, 11:16:32 AMJan 13
to R-inla discussion group
Please post details of the failure for LINESTRING on the github item so I can debug/fix.
Finn

Reply all
Reply to author
Forward
0 new messages