distance sampling with random effects

197 views
Skip to first unread message

Tim Meehan

unread,
Dec 13, 2023, 3:27:07 PM12/13/23
to R-inla discussion group
hello all:

i am fitting distance functions to data using inlabru and the 
following example: 

in the distance models there is a shape parameter. my distance data has 
multiple species and i am wondering if i can make my shape parameter 
a random effect that varies by species, or otherwise a fixed effect by species. 
some species have very few observations (i.e., 1 observation), hence 
shrinkage toward a mean would be nice.

any advice for how to do that? here is some reproducible code for a single
species model and then a dataset with 3 species if it is helpful to demonstrate,

Thank you!

# libraries
library(inlabru)
library(fmesher)
library(INLA)
library(ggplot2)
library(dplyr)
library(sf)

# data
mexdolphin <- mexdolphin_sf # from example
mexdolphin2 <- mexdolphin$points %>% mutate(species_idx=1) %>% # add more species
  bind_rows(sample_frac(mexdolphin$points %>% st_drop_geometry(), size=.5) %>% mutate(species_idx=2)) %>%
  bind_rows(sample_frac(mexdolphin$points %>% st_drop_geometry(), size=.25) %>% mutate(species_idx=3))

# distance function
hn <- function(distance, sigma) {
  exp(-0.5 * (distance / sigma)^2)
}
sigma <- function(x) {
  bru_forward_transformation(qexp, x, rate = 1 / 8)
}

# model
formula <- distance ~ log(hn(distance, sigma(sigma_theta))) + Intercept
cmp <- ~ sigma_theta(1, prec.linear = 1) + Intercept(1)
dfit <- lgcp(
  components = cmp,
  mexdolphin$points,
  domain = list(distance = fm_mesh_1d(seq(0, 8, length.out = 30))),
  formula = formula,
  options = list(bru_initial = list(sigma_theta = 1, Intercept = 3))
)

# prediction
distdf <- data.frame(distance = seq(0, 8, length.out = 100))
detfun <- predict(dfit, distdf, ~ hn(distance, sigma(sigma_theta)))
plot(detfun)
summary(detfun) # this how to get average detecion probability?

# plot
hnline <- data.frame(distance = detfun$distance, p = detfun$mean,
                     lower = detfun$q0.025, upper = detfun$q0.975)
wts <- diff(hnline$distance)
wts[1] <- wts[1] / 2
wts <- c(wts, wts[1])
hnarea <- sum(wts * hnline$p)
n <- length(mexdolphin$points$distance)
scale <- n / hnarea
hnline$En <- hnline$p * scale
hnline$En.lower <- hnline$lower * scale
hnline$En.upper <- hnline$upper * scale
dlines <- rbind(
  cbind(hnline, model = "Half-normal")
)
ggplot(data.frame(mexdolphin$points)) +
  geom_histogram(aes(x = distance), breaks = seq(0, 8, length.out = 9),
                 alpha = 0.3) +
  geom_point(aes(x = distance), y = 0.2, shape = "|", size = 3) +
  geom_line(data = dlines, aes(x = distance, y = En, group = model,
                               col = model))

# three species plot
ggplot(data.frame(mexdolphin2)) +
  geom_histogram(aes(x = distance), breaks = seq(0, 8, length.out = 9),
                 alpha = 0.3) +
  geom_point(aes(x = distance), y = 0.2, shape = "|", size = 3) +
  facet_wrap(~species_idx)

Finn Lindgren

unread,
Dec 13, 2023, 5:28:19 PM12/13/23
to R-inla discussion group
Hi Tim,

there are two issues to deal with here:
1. How to model the individual species densities.
2. How to construct a "sigma" parameter model that handles
low-prevalence species in a robust way.

For 1., the simplest option is when the same spatial pattern is
assumed for all species, with just a separate Intercept for each
species, but this part can be made much more complex...
In your example code, you're however not modelling the spatial
distribution at all, so you only need separate intercepts to account
for the different relative abundance;
Intercept(species_idx, model = "iid", hyper=list(prec=list(...)))

For 2., one can view the model as a "marked point process, where the
mark influences the detectability". This type of model can be
constructed as a point process on the "product space", which
in your example would be "distance x species". Using the new
"marginal" feature, and without attempting to "borrow strength"
between species:

formula <- distance + species_idx ~ log(hn(distance, sigma)) + Intercept
cmp <- ~
sigma(species_idx,
model = "iid",
hyper=list(prec=list(initial=0, fixed = TRUE)),
marginal = bru_mapper_marginal(qexp, rate = 1 / 8)) +
Intercept(species_idx, model = "iid",
hyper=list(prec=list(initial = -2, fixed = TRUE)))
dfit <- lgcp(
components = cmp,
mexdolphin$points,
domain = list(
distance = fm_mesh_1d(seq(0, 8, length.out = 30)),
species_idx = sort(unique(mexdolphin$points$species_idx))),
formula = formula,
options = list(bru_initial = list(sigma = rep(1, n_species),
Intercept = rep(3, n_species))))


To borrow strength between the sigma parameters, perhaps this?

cmp <- ~
sigma_common(1, prec.linear = 1, marginal =
bru_mapper_marginal(qexp, rate = 1 / 8)) +
sigma_log_factor(species_idx,
model = "iid",
hyper=list(prec=list(initial=0, fixed = FALSE)),
constr = TRUE # For identifiability
) + ...

formula <- distance + species_idx ~ log(hn(distance,
sigma_common*exp(sigma_log_factor))) + Intercept


Finn
> --
> You received this message because you are subscribed to the Google Groups "R-inla discussion group" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to r-inla-discussion...@googlegroups.com.
> To view this discussion on the web, visit https://groups.google.com/d/msgid/r-inla-discussion-group/d99ea391-ba31-4c95-b040-b590f3fa213fn%40googlegroups.com.



--
Finn Lindgren
email: finn.l...@gmail.com

Tim Meehan

unread,
Dec 13, 2023, 6:34:44 PM12/13/23
to R-inla discussion group
Thanks, Finn! You are right, in my case, I am not including a spatial component as I am only interested in a species-specific distance function, which will give me a species-specific average detection probability, which I will use to correct observed species-specific counts outside of modeling. I know it's a clunky way to correct the counts, but there are reasons for this approach for this project. Looking forward to trying your code tomorrow!

Tim Meehan

unread,
Dec 15, 2023, 2:55:23 PM12/15/23
to R-inla discussion group
Hi Finn, All,

Finn, I tried the code you sent and got the first model with random intercepts to work after installing the latest inlabru version. The second model with the sigma_log_factor component, I'm not so sure. The model computes but when I try to predict and plot, I get funky plots. That could be because I specified model options wrong or that I am giving predict() the wrong formula. If you you wouldn't mind looking over the latest script, below, I'd appreciate it. Once it is for-sure working I will add the final script here for others. Or I am happy to work it into the original vignette if you think it would be helpful. Doing a multispecies distance function like this is a common objective for ecologists.

Thanks! 
Tim

# libraries
library(inlabru) # >= v 2.10 !!!

library(fmesher)
library(INLA)
library(ggplot2)
library(dplyr)
library(sf)

# data
mexdolphin <- mexdolphin_sf # from example
mexdolphin2 <- mexdolphin$points %>% mutate(species_idx=1) %>% # add more species
  bind_rows(sample_frac(mexdolphin$points %>% st_drop_geometry(), size=.5) %>% mutate(species_idx=2)) %>%
  bind_rows(sample_frac(mexdolphin$points %>% st_drop_geometry(), size=.25) %>% mutate(species_idx=3))

# distance function
hn <- function(distance, sigma) {
  exp(-0.5 * (distance / sigma)^2)
}
sigma <- function(x) {
  bru_forward_transformation(qexp, x, rate = 1 / 8)
}

# 1 species model 1

formula <- distance ~ log(hn(distance, sigma(sigma_theta))) + Intercept
cmp <- ~ sigma_theta(1, prec.linear = 1) + Intercept(1)
dfit <- lgcp(
  components = cmp,
  mexdolphin$points,
  domain = list(distance = fm_mesh_1d(seq(0, 8, length.out = 30))),
  formula = formula,
  options = list(bru_initial = list(sigma_theta = 1, Intercept = 3))
)

# prediction model 1

distdf <- data.frame(distance = seq(0, 8, length.out = 100))
detfun <- predict(dfit, distdf, ~ hn(distance, sigma(sigma_theta)))
plot(detfun)
summary(detfun)

# 1 species plot

hnline <- data.frame(distance = detfun$distance, p = detfun$mean,
                     lower = detfun$q0.025, upper = detfun$q0.975)
wts <- diff(hnline$distance)
wts[1] <- wts[1] / 2
wts <- c(wts, wts[1])
hnarea <- sum(wts * hnline$p)
n <- length(mexdolphin$points$distance)
scale <- n / hnarea
hnline$En <- hnline$p * scale
hnline$En.lower <- hnline$lower * scale
hnline$En.upper <- hnline$upper * scale
dlines <- rbind(
  cbind(hnline, model = "Half-normal")
)
ggplot(data.frame(mexdolphin$points)) +
  geom_histogram(aes(x = distance), breaks = seq(0, 8, length.out = 9),
                 alpha = 0.3) +
  geom_point(aes(x = distance), y = 0.2, shape = "|", size = 3) +
  geom_line(data = dlines, aes(x = distance, y = En, group = model,
                               col = model))

# initial three species plot

ggplot(data.frame(mexdolphin2)) +
  geom_histogram(aes(x = distance), breaks = seq(0, 8, length.out = 9),
                 alpha = 0.3) +
  geom_point(aes(x = distance), y = 0.2, shape = "|", size = 3) +
  facet_wrap(~species_idx)

# 3 species model 1
n_species <- length(unique(mexdolphin2$species_idx))
formula1 <- distance + species_idx ~ log(hn(distance, sigma)) + Intercept
cmp1 <- ~

  sigma(species_idx,
        model = "iid",
        hyper=list(prec=list(initial=0, fixed = TRUE)),
        marginal = bru_mapper_marginal(qexp, rate = 1 / 8)) +
  Intercept(species_idx, model = "iid",
            hyper=list(prec=list(initial = -2, fixed = TRUE)))
dfit1 <- lgcp(
  components = cmp1,
  mexdolphin2,

  domain = list(
    distance = fm_mesh_1d(seq(0, 8, length.out = 30)),
    species_idx = sort(unique(mexdolphin2$species_idx))),
  formula = formula1,

  options = list(bru_initial = list(sigma = rep(1, n_species),
                                    Intercept = rep(3, n_species))))

# 3 spp model 1 preds
distdf <- expand.grid(distance = seq(0, 8, length.out = 100),
                      species_idx=1:3)
detfun <- predict(dfit1, distdf, ~ hn(distance, sigma))
plot(detfun) + facet_wrap(~species_idx)
detfun %>% group_by(species_idx) %>%
  summarise(mean_p_per_species=mean(mean))

# 3 species model 2
formula2 <- distance + species_idx ~ log(hn(distance,
                              sigma_common*exp(sigma_log_factor))) + Intercept
cmp2 <- ~

  sigma_common(1, prec.linear = 1, marginal =
                 bru_mapper_marginal(qexp, rate = 1 / 8)) +
  sigma_log_factor(species_idx,
                   model = "iid",
                   hyper=list(prec=list(initial=0, fixed = FALSE)),
                   constr = TRUE) +
  Intercept(species_idx, model = "iid",
            hyper=list(prec=list(initial = -2, fixed = TRUE)))
dfit2 <- lgcp(
  components = cmp2,
  mexdolphin2,

  domain = list(
    distance = fm_mesh_1d(seq(0, 8, length.out = 30)),
    species_idx = sort(unique(mexdolphin2$species_idx))),
  formula = formula2,
  options = list(bru_initial = list(sigma_common = 1,
                                    sigma_log_factor = rep(1, n_species),
                                    Intercept = rep(3, n_species))))

# 3 spp model 2 preds
distdf <- expand.grid(distance = seq(0, 8, length.out = 100),
                      species_idx=1:3)
detfun <- predict(dfit2, distdf, ~log(hn(distance,
                                       sigma_common*exp(sigma_log_factor))))
plot(detfun) + facet_wrap(~species_idx)
detfun %>% group_by(species_idx) %>%
  summarise(mean_p_per_species=mean(mean))

Finn Lindgren

unread,
Dec 15, 2023, 3:40:40 PM12/15/23
to Tim Meehan, R-inla discussion group
A few bugs in your code... Corrected version of the second model below.
In short
- The initial values for sigma_log_factor need to respect the
sum-to-zero constraint. Starting them at zero works fine.
- You were predicting log(hn()) at the end, instead of hn().
I think that's all I needed to change; the results I get are now
similar for dfit1 and dfit2, with slightly narrower uncertainty bands
for dfit2, indicating that it did indeed "borrow strength" between the
species.
You should also set the parameters of the sigma_log_factor hyper
specification instead of relying on the defaults; who knows what they
mean in this context...

The bru_convergence_plot() figure indicates some wobbliness in the
convergence of the parameters for the least abundant species; not sure
if/how to improve that except by tightening the convergence detection
tolerance a bit; it is stable and has decreasing step lengths, but
oscillates around what would likely be the true optimum. The
oscillation is well within the posterior uncertainty bands though.

Finn


formula2 <- distance + species_idx ~ log(hn(distance,

sigma_common*exp(sigma_log_factor))) + Intercept
cmp2 <- ~
sigma_common(1, prec.linear = 1, marginal =
bru_mapper_marginal(qexp, rate = 1 / 8)) +
sigma_log_factor(species_idx,
model = "iid",
hyper=list(prec=list(initial=0,
fixed = FALSE)),
constr = TRUE) +
Intercept(species_idx, model = "iid",
hyper=list(prec=list(initial = -2, fixed = TRUE)))
dfit2 <- lgcp(
components = cmp2,
mexdolphin2,
domain = list(
distance = fm_mesh_1d(seq(0, 8, length.out = 30)),
species_idx = sort(unique(mexdolphin2$species_idx))),
formula = formula2,
options = list(bru_initial = list(sigma_common = 1,
sigma_log_factor = rep(0, n_species),
Intercept = rep(3, n_species))))

# 3 spp model 2 preds
distdf <- expand.grid(distance = seq(0, 8, length.out = 100),
species_idx=1:3)
detfun <- predict(dfit2, distdf, ~hn(distance,
sigma_common*exp(sigma_log_factor)))
plot(detfun) + facet_wrap(~species_idx)
detfun %>% group_by(species_idx) %>%
summarise(mean_p_per_species=mean(mean))

> To view this discussion on the web, visit https://groups.google.com/d/msgid/r-inla-discussion-group/092af797-64a5-491a-9b87-d6e78fd5628en%40googlegroups.com.

Finn Lindgren

unread,
Dec 15, 2023, 3:43:31 PM12/15/23
to Tim Meehan, R-inla discussion group
A vignette on this would be helpful; it's too much to put into the
full distance sampling vignette, so starting with a separate vignette
with just the detection modelling would be preferable, I think, since
the full spatial model aspect requires multivariate spatial modelling,
making it more complicated.

Finn

Tim Meehan

unread,
Dec 15, 2023, 5:30:52 PM12/15/23
to R-inla discussion group
Excellent! I'll send along a vignette in a month or so. In the meantime here is some rough code that works in case it is useful to others.

# 3 species model 1 --- random species intercepts
# 3 spp model 1 plot

hnline <- data.frame(distance = detfun$distance, p = detfun$mean,
                     lower = detfun$q0.025, upper = detfun$q0.975,
                     species_idx=detfun$species_idx)
scale_for_plots <- function(obs_dat=mexdolphin2, pred_dat=hnline){
  wts <- diff(pred_dat[,"distance"])

  wts[1] <- wts[1] / 2
  wts <- c(wts, wts[1])
  hnarea <- sum(wts * pred_dat[, "p"])
  n <- length(obs_dat$distance)

  scale <- n / hnarea
  pred_dat$En <- pred_dat[, "p"] * scale
  pred_dat$En.lower <- pred_dat[, "lower"] * scale
  pred_dat$En.upper <- pred_dat[, "upper"] * scale
  return(pred_dat)
}
out_df <- c()
for(i in 1:n_species){
  out_i <- scale_for_plots(obs_dat=mexdolphin2[mexdolphin2$species_idx==i, ],
                           pred_dat=hnline[hnline$species_idx==i, ])
  out_df <- rbind(out_df, out_i)
}
plot_dat <- out_df

ggplot(data.frame(mexdolphin2)) +
  geom_histogram(aes(x = distance),
                 breaks = seq(0, 8, length.out = 9),
                 alpha = 0.3) +
  geom_point(aes(x = distance), y = 0.2, shape = "|", size = 3) +
  geom_ribbon(data = plot_dat,
              aes(x = distance, ymin = En.lower, ymax=En.upper, fill=factor(species_idx)),
              alpha=0.3) +
  geom_line(data = plot_dat, aes(x = distance, y = En)) +
  facet_wrap(~species_idx)

# 3 species model 2 --- random intercepts and shape parameters
# 3 spp model 2 plot

hnline <- data.frame(distance = detfun$distance, p = detfun$mean,
                     lower = detfun$q0.025, upper = detfun$q0.975,
                     species_idx=detfun$species_idx)
scale_for_plots <- function(obs_dat=mexdolphin2, pred_dat=hnline){
  wts <- diff(pred_dat[,"distance"])

  wts[1] <- wts[1] / 2
  wts <- c(wts, wts[1])
  hnarea <- sum(wts * pred_dat[, "p"])
  n <- length(obs_dat$distance)

  scale <- n / hnarea
  pred_dat$En <- pred_dat[, "p"] * scale
  pred_dat$En.lower <- pred_dat[, "lower"] * scale
  pred_dat$En.upper <- pred_dat[, "upper"] * scale
  return(pred_dat)
}
out_df <- c()
for(i in 1:n_species){
  out_i <- scale_for_plots(obs_dat=mexdolphin2[mexdolphin2$species_idx==i, ],
                  pred_dat=hnline[hnline$species_idx==i, ])
  out_df <- rbind(out_df, out_i)
}
plot_dat <- out_df

ggplot(data.frame(mexdolphin2)) +
  geom_histogram(aes(x = distance),
                 breaks = seq(0, 8, length.out = 9),
                 alpha = 0.3) +
  geom_point(aes(x = distance), y = 0.2, shape = "|", size = 3) +
  geom_ribbon(data = plot_dat,
              aes(x = distance, ymin = En.lower, ymax=En.upper, fill=factor(species_idx)),
              alpha=0.3) +
  geom_line(data = plot_dat, aes(x = distance, y = En)) +
  facet_wrap(~species_idx) 

Tim Meehan

unread,
Dec 18, 2023, 12:11:03 PM12/18/23
to R-inla discussion group
One more question: If one wanted to add another nuisance variable, say effort, to the distance detection function, would you so it like the code below? The code runs but the fit for effort is perfect which makes me think there is something funky.

# 3 species model 2
formula2 <- distance + species_idx ~ log(hn(distance,
                              sigma_common*exp(sigma_log_factor))) + Intercept + effort

cmp2 <- ~
  sigma_common(1, prec.linear = 1, marginal =
                 bru_mapper_marginal(qexp, rate = 1 / 8)) +
  sigma_log_factor(species_idx,
                   model = "iid",
                   hyper=list(prec=list(initial=0,
                                        fixed = FALSE)),
                   constr = TRUE) +
  effort(Effort, model="rw2", scale.model=T) +

  Intercept(species_idx, model = "iid",
            hyper=list(prec=list(initial = -2, fixed = TRUE)))
 
dfit2 <- lgcp(
  components = cmp2,
  mexdolphin2,
  domain = list(
    distance = fm_mesh_1d(seq(0, 8, length.out = 30)),
    species_idx = sort(unique(mexdolphin2$species_idx))),
  formula = formula2,
  options = list(bru_initial = list(sigma_common = 1,
                                    sigma_log_factor = rep(0, n_species),
                                    Intercept = rep(3, n_species))))

Finn Lindgren

unread,
Dec 18, 2023, 6:22:30 PM12/18/23
to Tim Meehan, R-inla discussion group
The correct code for this in the case of spatial modelling depends on how the effort varies. If it’s a function of location, the component input needs to be given in such a way that inlabru can evaluate it for arbitrary locations; you cannot supply precomputed values, but should instead e.g. store it as a terra raster.

But since you don’t have space in your model it’s less clear what the intended model is meant to be. Every model component in the point process must be a function of the domain variables, I.e. here distance and species_idx. So effort would need to be a function of distance and species_idx _only_. How to code that depends on the storage…  what does the effort depend on? If it depends on location, you need to either make a spatial model, or potentially make “effort” a dimension of the point process?  I’ll need more details to give an accurate answer.
Finn

On 18 Dec 2023, at 18:11, Tim Meehan <tme...@gmail.com> wrote:

One more question: If one wanted to add another nuisance variable, say effort, to the distance detection function, would you so it like the code below? The code runs but the fit for effort is perfect which makes me think there is something funky.
Reply all
Reply to author
Forward
0 new messages