estimate mean and 95% CrI with linear combinations

154 views
Skip to first unread message

Ben Marcek

unread,
Aug 16, 2022, 11:52:12 AM8/16/22
to R-inla discussion group
Hi all,
I am new to using R-INLA and I'm trying to understand how to use the inla.make.lincombs() function to obtain the mean and 95% credible intervals for different levels of a covariate. In this case, my response is catch-per-effort and I have 2 categorical covariates, gear (2 levels; EF and DT) and habitat (3 levels; Trib, MC, and SC) in my model. For simplicity, I'll focus only on positive catches here without random effects or a spatial component, but this will be part of a barrier hurdle model. Here is what I've done so far. 

# create the gamma response
SVC2$pos <- ifelse(SVC2$CPUE_Ped_Hr > 0, SVC2$CPUE_Ped_Hr, NA)

# create the model matrix
Xm <- model.matrix(~ fGear + fHabitat, data = SVC2)
X <- data.frame(fGearDT = Xm[, 2],
                fHabitatMC = Xm[, 3],
                fHabitatSC = Xm[, 4])

# create the stack
stack.gamma1 <- inla.stack(tag = 'GammaFit1',
                           data = list(pos = SVC2$pos),
                           A = list(1, 1),
                           effects = list(Intercept = rep(1, nrow(SVC2)),
                                          X = as.data.frame(X)))

# and the formula
fglm1 <- formula(pos ~ -1 + Intercept + fGearDT + fHabitatMC + fHabitatSC)

#use inla.make.lincombs() to specify the contrast of interest
lcb <- inla.make.lincombs("Intercept" = -1, 'fGearDT' = 1,
                          "fHabitatMC" = 0, "fHabitatSC" = 0)

#run the model
glm1 <- inla(fglm1,
             family = 'gamma',
             lincomb = lcb,
             data = inla.stack.data(stack.gamma1),
             control.compute = list(dic = T,
                                    waic = T),
             control.predictor = list(A = inla.stack.A(stack.gamma1),
                                      link = 1))

summary(glm1)

Fixed effects:

             mean    sd 0.025quant 0.5quant 0.975quant   mode kld

Intercept   4.277 0.123      4.041    4.275      4.526  4.270   0

fGearDT     0.863 0.140      0.587    0.864      1.138  0.864   0

fHabitatMC -0.289 0.151     -0.583   -0.289      0.010 -0.291   0

fHabitatSC -0.313 0.213     -0.717   -0.318      0.121 -0.328   0


The summary indicates an important gear effect, but no effect of habitat on catch-per-effort.

What I'm uncertain of is the appropriate way to obtain the mean catch-per-effort and credible intervals for each gear (EF and DT). It appears that I can use 

glm1$marginals.fixed$Intercept

to get the posterior distribution of the Intercept (i.e., gear = EF and habitat = Trib) and 

glm1$marginals.fixed$fGearDT

to get the posterior for the difference between DT and EF in Trib habitat.

Am I understanding that correctly? If so, is there a way to separate the gear effects from the Trib habitat? Is there another way to derive the mean and 95% credible intervals for visualizing differences in catch-per-effort by gear?

As I move forward with this, I would also like to examine if there are spatial changes in the posterior distributions by adding the random effect of site or the SPDE into the linear combinations. How might I go about doing that?

I greatly appreciate any advice/suggestions/corrections to my understanding of this issue.

Thank you,
Ben

Tim Meehan

unread,
Aug 16, 2022, 2:43:20 PM8/16/22
to R-inla discussion group
Hi Ben. I haven't really figured out inla.make.lincombs() either, so I'll let someone else address that. But something you might not know if you are new to INLA, you can always add rows to your dataset with NA for the response variable and INLA will fill in predicted values with summaries. So you can specify whatever combinations of the gear and habitat variables you want, put in an NA for the response, add those rows to the data, and extract the predictions from the results object. If posterior summaries are all you want for the predictions then you are all set. If you want full samples of the posteriors, you can use inla.posterior.sample().

Ben Marcek

unread,
Aug 16, 2022, 3:28:14 PM8/16/22
to R-inla discussion group
Hi Tim,
Thanks for the suggestion. I had seen that as a possibility but I haven't figured out where to add rows to the data. Would this be as part of the original data set (SVC2 in my case) where I would specify 2 rows (1 for each gear type) and then NA for all other columns to get the gear-specific comparison, or as part of the X data frame in which case it's a bit more complicated since the intercept is not specified within that data frame and a 0 would have to be added for the intercept when creating the stack? I might be overthinking/misunderstanding so let me know if you have any information.
Thanks again,
Ben

Finn Lindgren

unread,
Aug 16, 2022, 3:39:25 PM8/16/22
to Ben Marcek, R-inla discussion group
Hi,

For what Tim mentions, I would suggest adding a separate stack for such output; you then only need to include the variables that are actually involved in the calculations, and don’t need to set other covariates to NA; inla.stack will do that for you.

But for more complex combinations, I nowadays much prefer posterior sampling; that gives you full control over what you calculate, and no special lincomb syntax is needed.

Finn

On 16 Aug 2022, at 20:28, Ben Marcek <ben.m...@gmail.com> wrote:

Hi Tim,
--
You received this message because you are subscribed to the Google Groups "R-inla discussion group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to r-inla-discussion...@googlegroups.com.
To view this discussion on the web, visit https://groups.google.com/d/msgid/r-inla-discussion-group/b62619a9-b9de-4764-b7b7-8e6ca7d75e7bn%40googlegroups.com.

Tim Meehan

unread,
Aug 16, 2022, 5:39:44 PM8/16/22
to R-inla discussion group
Hi Ben,

Sounds like you are headed toward a spatial model, and that it will morph into a spatial hurdle model. For both those reasons, you probably want to keep working with estimation and prediction stacks, like Finn says, and also think about how to code this in inlabru as the predict() function makes predictions a lot more automagic. But, just in case you want some quick info on generating predictions using a simple inla model like the one you are starting with, here is some code to get you started.

# libraries
library(dplyr); library(ggplot2); library(INLA); library(inlabru)

# sample size
n.gear <- 2; n.hab <- 3; n.sample <- 30; n <- n.gear * n.sample

# factor levels
gear <- gl(n = n.gear, k = n.sample, length = n)
habitat <- gl(n = n.hab, k = n.sample / n.hab, length = n)

# effects
baseline <- 40    # intercept
gear.effects <- -10 # gear effect
habitat.effects <- c(5, 10)    # habitat effects
all.effects <- c(baseline, gear.effects, habitat.effects)

# error
sigma <- 3
eps <- rnorm(n, 0, sigma)

# model matrix
X <- as.matrix(model.matrix(~ gear+habitat) ); X

# response
pos <- as.numeric(as.matrix(X) %*% as.matrix(all.effects) + eps)

# all data in a dataframe
d1 <- data.frame(pos=pos, intercept=1) %>%
  bind_cols(as.data.frame(X) %>% select(-1)) %>%
  mutate(gear=gear, habitat=habitat)

# plot it
d1 %>%
  ggplot() + geom_point(aes(x=habitat, y=pos)) + facet_wrap(~gear)

# add some prediction rows to the input data just to show how it works
d2 <- d1 %>% mutate(pos=NA) %>% distinct()
d1 <- d1 %>% rbind(d2)

# inla model
in1 <- inla(pos ~ gear + habitat,
           data=d1,
           control.compute=list(config=T),
           control.predict=list(compute=T))
summary(in1)

# get inla posterior sample
s1 <- inla.posterior.sample(100, in1)
row.names(s1[[1]]$latent)

# pull out samples for the prediction rows 61-66
pred_samps <- t(inla.posterior.sample.eval("Predictor", s1)[61:66,])
head(pred_samps)

# pull out samples for the effects. then you can do any
# kind of prediction you want using the linear predictor
all_samps <- data.frame(
  int_samps=as.numeric(inla.posterior.sample.eval("(Intercept)", s1)),
  gear2_samps=as.numeric(inla.posterior.sample.eval("gear2", s1)),
  habitat2_samps=as.numeric(inla.posterior.sample.eval("habitat2", s1)),
  habitat3_samps=as.numeric(inla.posterior.sample.eval("habitat3", s1)))
head(all_samps)

# or use an inlabru model
br1 <- bru(pos ~ gear2 + habitat2 + habitat3,
            data=d1)
summary(br1)

# and get inlabru predictions
d1 %>%
  select(gear2) %>%
  distinct() %>%
  predict(object = br1, formula = ~ gear2)
d1 %>%
  select(habitat2, habitat3) %>%
  distinct() %>%
  predict(object = br1, formula = ~ habitat2 + habitat3) 

Ben Marcek

unread,
Aug 17, 2022, 10:08:26 AM8/17/22
to R-inla discussion group
Tim and Finn,
Thank you for the information, this was extremely helpful! Now that I have a better sense for it, it seems that using the posterior sampling is a reasonably easy way to get the information I wanted. As a follow up question, how would I incorporate a random effect into the posterior sampling? I'm curious about this since I will be moving towards more complex models in the near future.
Thanks again for your help,
Ben

Finn Lindgren

unread,
Aug 17, 2022, 11:05:50 AM8/17/22
to Ben Marcek, R-inla discussion group
Hi, that depends a bit on the type of random effect, and whether it’s for an observed point or an unobserved point.
For iid effects and observed points, the estimated random effect value is included in the model, so no special trick is needed.
For iid effects for new point, I.e. actual prediction/extrapolation/interpolation, you can sample from it using the sampled value of the precision parameter. You’ll need to look in the sampling output to find the name of the specific parameter.
For spde effects, no special handling is needed, as the whole defined domain is part of the model.
For other structured random effects, the only simple way is to make sure that all locations you want to predict for are part of the estimated model (which is what using a “prediction stack” can be used for).

Finn

On 17 Aug 2022, at 15:08, Ben Marcek <ben.m...@gmail.com> wrote:

Tim and Finn,

Ben Marcek

unread,
Aug 17, 2022, 1:22:42 PM8/17/22
to R-inla discussion group
Thanks, Finn.
That gives me the information I needed. I was most interested in the iid effects for an observed point and the spde effects.
Ben

Finn Lindgren

unread,
Aug 17, 2022, 1:49:47 PM8/17/22
to Ben Marcek, R-inla discussion group
When using the interface in the inlabru package, there’s a special method for sampling new values for an iid model component in that way that is done automatically. From what I recall it requires an option to be set somewhere; will need to look at the documentation for inlabru::generate/predict where it talks about the _eval() feature.

Finn

On 17 Aug 2022, at 18:22, Ben Marcek <ben.m...@gmail.com> wrote:

Thanks, Finn.
Reply all
Reply to author
Forward
0 new messages