Posterior predictive intervals from generate()/predict() — observation noise not included?


Peter Dorey

Apr 22, 2026, 12:32:40 PM
to R-inla discussion group

Hi all,

I've been using inlabru for Bayesian spatial population modelling and have run into a question about how prediction intervals should be constructed. I'd appreciate any guidance on whether my understanding is correct and what the recommended approach is.

When I use generate() or predict() to produce intervals at held-out locations for cross-validation, the resulting 95% intervals achieve substantially less than 95% coverage against observed data. After some investigation, I believe this is because the samples returned by generate() represent the posterior of the linear predictor: they are samples of E[y|θ], the conditional mean, and do not include the observation-level noise from the likelihood.

For a Gaussian example this is straightforward to verify: the 95% quantiles from generate() match those from summary.fitted.values, and both undercover by a wide margin. I've attached a minimal reproducible example below that demonstrates this.

My current workaround is to include the observation noise directly inside the generate() formula, which seems to work well and has the nice property of jointly sampling the precision hyperparameter:


# Without observation noise — credible intervals for E[y]
generate(fit, newdata = df_new,
         formula = ~ Intercept + x1 + x2 + group_re,
         n.samples = 2000)

# With observation noise — posterior predictive intervals for y_new
generate(fit, newdata = df_new,
         formula = ~ rnorm(
           n    = length(Intercept),
           mean = Intercept + x1 + x2 + group_re,
           sd   = 1 / sqrt(Precision_for_the_Gaussian_observations)
         ),
         n.samples = 2000)

The first gives ~17% coverage; the second gives ~95%.
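For the Gaussian case, the size of the undercoverage can be anticipated analytically. This is a minimal sketch, not inlabru output: it assumes the posterior for the predictor is roughly N(m, s_eta^2) and well centred, so that a new observation deviates from the interval centre with variance s_eta^2 + sigma_obs^2; the function name and the illustrative value s_eta = 0.11 are my own choices.

```r
# Approximate coverage of a 95% credible interval for the latent mean,
# evaluated against a new noisy observation y_new ~ N(eta, sigma_obs^2).
# Assumes posterior eta | data ~ N(m, s_eta^2), centred on the truth.
coverage_latent_interval <- function(s_eta, sigma_obs, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)
  # y_new - m ~ N(0, s_eta^2 + sigma_obs^2); interval half-width is z * s_eta
  2 * pnorm(z * s_eta / sqrt(s_eta^2 + sigma_obs^2)) - 1
}

coverage_latent_interval(s_eta = 0.11, sigma_obs = 1)  # ~0.17
coverage_latent_interval(s_eta = 0.11, sigma_obs = 0)  # 0.95: no obs noise
```

With a predictor posterior SD around 0.1 and observation SD 1, the latent-mean interval is simply an order of magnitude too narrow for the data, which matches the ~17% I observe.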

My questions:

  1. Is my understanding correct that generate(), predict(), and summary.fitted.values all return intervals for the latent mean rather than for a new observation? Is this the intended behaviour?
  2. Is the rnorm()-inside-the-formula approach a reasonable way to obtain posterior predictive samples, or is there a cleaner built-in mechanism I'm missing?
  3. More broadly, is there any plan to add something like a type = "response" or include.observation.noise = TRUE argument to predict()/generate()? This seems like a common source of confusion, particularly for cross-validation, where the natural quantity to evaluate against is the observed data, not the latent mean.

I realise this is documented in the INLA literature (Martino & Riebler 2019 note that "adding the observational noise to the fitted values leads to the predictive distributions"), but in practice it's easy to miss, especially since the standard cross-validation metrics (bias, RMSE, correlation) all depend on the posterior mean and are identical regardless of which interval type is used; only coverage reveals the problem. I also haven't seen this mentioned much in the tutorials and teaching literature around INLA.

Minimal reproducible example below. Thanks for any guidance.

Peter


###############################################################################

library(inlabru)
library(matrixStats)

set.seed(2026)

# ── 1. Simulate data ────────────────────────────────────────────────────────

n         <- 1000
n_groups  <- 25
beta_0    <- 1.0
beta_1    <- 0.5
beta_2    <- -0.5
sigma_re  <- 0.1
sigma_obs <- 1.0       # observation-level SD (true precision = 1)

group <- sample(1:n_groups, n, replace = TRUE)
x1    <- rnorm(n)
x2    <- rnorm(n)
re    <- rnorm(n_groups, 0, sigma_re)

eta   <- beta_0 + beta_1 * x1 + beta_2 * x2 + re[group]
y     <- rnorm(n, mean = eta, sd = sigma_obs)

# ── 2. Hold out 20% ────────────────────────────────────────────────────────

holdout <- sample(1:n, round(0.2 * n))
y_true  <- y[holdout]

y_train <- y
y_train[holdout] <- NA

df <- data.frame(y = y_train, x1 = x1, x2 = x2, group = group)

# ── 3. Fit model ───────────────────────────────────────────────────────────

cmp <- y ~ Intercept(1) +
  x1(x1, model = "linear") +
  x2(x2, model = "linear") +
  group_re(group, model = "iid",
           hyper = list(prec = list(prior = "pc.prec", param = c(0.5, 0.01))))

fit <- bru(components = cmp, data = df, family = "gaussian",
           options = list(control.compute = list(config = TRUE),
                          verbose = FALSE))

# ── 4. Approach A: generate() without observation noise ─────────────────────

samples_A <- generate(fit, newdata = df[holdout, ],
                      formula = ~ Intercept + x1 + x2 + group_re,
                      n.samples = 2000)

q_A   <- rowQuantiles(samples_A, probs = c(0.025, 0.975))
cov_A <- mean(y_true >= q_A[, 1] & y_true <= q_A[, 2])

cov_A

#  predict() also produces the same intervals, since it samples from
#  the same latent posterior as generate().

fitted_holdout <- fit$summary.fitted.values[holdout, ]
cov_fitted     <- mean(y_true >= fitted_holdout$`0.025quant` &
                         y_true <= fitted_holdout$`0.975quant`)

## Results: the "generate() without noise" and "fitted values" approaches
## give approximately equal coverage, both far below the nominal 95%.
cov_A
cov_fitted


# ── 5. Approach B: generate() WITH observation noise ────────────────────────


samples_B <- generate(fit, newdata = df[holdout, ],
                      formula = ~ rnorm(
n    = length(Intercept),  # one draw per holdout row, not per sample
                        mean = Intercept + x1 + x2 + group_re,
                        sd   = 1 / sqrt(Precision_for_the_Gaussian_observations)
                      ),
                      n.samples = 2000)

q_B   <- rowQuantiles(samples_B, probs = c(0.025, 0.975))
cov_B <- mean(y_true >= q_B[, 1] & y_true <= q_B[, 2])


cov_B

# Nominal target: 95%

Finn Lindgren

Apr 22, 2026, 6:34:38 PM
to Peter Dorey, R-inla discussion group
Hi Peter,

yes, this is a fundamental concept that apparently gets glossed over in teaching.

One always has to be precise about which quantity one is computing the distribution for. The models in inla have two components:
1. A model for a predictor quantity, \eta, relating latent variables to \eta
2. A model for how the response variables relate to \eta
2. A model for how response variables relate to \eta

Usually there is also "1.5", a link function g(), relating a linear predictor \eta in some nonlinear way to a property of the response variable distribution, usually a location parameter.

The linear predictor summary is for the posterior distribution of \eta, i.e. posterior prediction of \eta.
The "fitted.values" summary is for the posterior distribution of g^{-1}(\eta), i.e. posterior prediction of g^{-1}(\eta).
To get posterior prediction for the response variables themselves, one has to combine the \eta distribution with the response distribution, like you did with generate/predict with "\eta + rnorm(..., sd =...)", which is the most generally applicable approach for this.
Quantile summaries of that posterior predictive distribution for the response variables give predictive intervals for the responses.
For Poisson responses, one can (conceptually) use "rpois(..., lambda = exp(\eta))" as the predictor expression for generate/predict to access the posterior response variable distribution at new locations.
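The Poisson case can be sketched in base R, independently of inlabru. Here `eta_samples` stands in for the matrix of posterior predictor draws that generate() would return (locations in rows, samples in columns); the values are simulated for illustration only.

```r
# Sketch: turning posterior samples of a log-linear predictor eta into
# posterior predictive samples for Poisson responses.
set.seed(1)
# Hypothetical stand-in for generate() output: 5 locations x 1000 draws
eta_samples <- matrix(rnorm(5 * 1000, mean = 1, sd = 0.1), nrow = 5)

# Pair each eta draw with one rpois() draw; rpois() recycles lambda
# element-wise, so the matrix shape is preserved
y_pred <- matrix(
  rpois(length(eta_samples), lambda = exp(eta_samples)),
  nrow = nrow(eta_samples)
)

# 95% posterior predictive intervals per location
t(apply(y_pred, 1, quantile, probs = c(0.025, 0.975)))
```

In inlabru terms this corresponds to a formula along the lines of `~ rpois(length(Intercept), lambda = exp(Intercept + ...))`, analogous to the Gaussian rnorm() construction above.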

Note: in frequentist statistics, one can usually be lazy and imprecise, since there is no such thing as a frequentist prediction interval for \eta: \eta is "deterministic, unknown", so one has to resort to confidence intervals instead, and only response variables involve prediction intervals. That way "confidence interval" vs "prediction interval" implicitly specifies which quantity is being targeted.
However, in the Bayesian setting, almost everything is a random variable, so everything can be associated with a prediction interval, specifically both \eta and the responses, and one is _forced_ to be more precise. I wish educators were clearer about this (it is, or should be, obvious for those with a grounding in probability theory, but that isn't the case in practice...)

For an example of how one can sometimes do more efficient calculation/estimation of posterior response level distributions, see e.g.
https://inlabru-org.github.io/inlabru/articles/2d_lgcp_distancesampling.html#hazard-rate-detection-function
where the end of the section uses a Rao-Blackwellisation technique to compute the full posterior probability mass function for the total number of animals, without the extra Monte Carlo error induced by full response level sampling.
Instead, it uses that f(x|y) = \int f(x|y,\theta) f(\theta|y) d\theta, so the Monte Carlo variability is only from the sampling of the posterior distribution of \theta, and not x directly.
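The same Rao-Blackwellisation idea applies directly to the Gaussian case in this thread: instead of adding rnorm() noise to each sample, average the known conditional CDF over posterior draws, so the only Monte Carlo error comes from the (\eta, \theta) sampling. A base-R sketch, where `eta_draws` and `sigma_draws` are hypothetical posterior samples standing in for generate() output:

```r
# Rao-Blackwellised predictive CDF for a Gaussian likelihood:
# F(x) = E_{eta,sigma | y}[ pnorm((x - eta) / sigma) ],
# estimated by averaging over posterior draws rather than sampling y_new.
set.seed(1)
n_samp      <- 2000
eta_draws   <- rnorm(n_samp, mean = 1, sd = 0.1)  # posterior draws of eta
sigma_draws <- rep(1, n_samp)                     # posterior draws of obs SD

pred_cdf <- function(x) {
  sapply(x, function(xi) mean(pnorm((xi - eta_draws) / sigma_draws)))
}

pred_cdf(c(-1, 1, 3))  # smooth; MC error only from the (eta, sigma) draws
```

Predictive quantiles can then be read off by inverting pred_cdf numerically (e.g. with uniroot()), with noticeably less Monte Carlo wobble than quantiles of rnorm()-augmented samples at the same n.samples.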

Finally, there have been attempts at implementing automated tools for response level posterior sampling from inla estimates, but even though each individual model tends to have a simple, straightforward solution, it has proven more difficult to design a system that could work for the full range of inla/inlabru models. It remains very much a "would be nice to solve/implement in the future" item.

Best wishes,
Finn



--
Finn Lindgren
email: finn.l...@gmail.com