# Poor prediction interval coverage probability (PCIP) using INLA

25 views

### Pierre Roudier

May 10, 2022, 12:50:20 AMMay 10
to R-inla discussion group
Hi all,

I am using INLA to derive predictions + uncertainty estimates.

I find that the prediction interval coverage  probability (PCIP, ie the percentage of actual observations falling within the prediction interval) is generally poor.

Here is an example I have used in the past on an unrelated note with Finn Lindgren (thanks to him for the development of this example):

```
library(dplyr)
library(INLA)

# devtools::install_github("julianfaraway/faraway")
data(meatspec, package = "faraway")

# Scaling from the original example; numerically unstable without it,
# at least with the default prior
scaley <- 100

# To be able to control scopes, encapsulate the data and avoid global variables.
the_data <- list(
X = matrix(1, nrow = nrow(meatspec), ncol = 1),
# Z is the matrix of predictors (in this case, a matrix of reflectance values)
Z = as.matrix(meatspec[,which(colnames(meatspec) != "fat")]),
# The outcome is the meat fat content
orig_outcome = meatspec\$fat / scaley,
outcome = meatspec\$fat / scaley
)

# Setting some to NA to check performance on held out samples
the_data\$outcome[173:215] <- NA

# INLA model, based on https://julianfaraway.github.io/brinla/examples/ridge.html#ridge-regression-using-bayes
# with modifications from Finn
mod_inla <- inla(
formula = outcome ~ -1 + X +  f(psi, model = "z", Z = Z, hyper = list()),
data = c(the_data,
list(psi = seq_len(nrow(meatspec)))),
control.predictor = list(compute = TRUE)
)
```

The PICP is then computed as follows:

```
# PICP on held-out samples for the 95% PI

mod_inla\$summary.fitted.values[173:215,] %>%
bind_cols(obs = the_data\$orig_outcome[173:215]) %>%
mutate(obs_in_pi = obs >= `0.025quant` & obs <= `0.975quant`) %>%
summarise(picp = sum(obs_in_pi) / n())
```

I am getting pcip = 0.51, while the expected value for the prediction interval used in this example is 0.95.

Any reason for this systematic underestimation of the prediction interval?

Kind regards,

Pierre

### Pierre Roudier

May 10, 2022, 6:22:31 PMMay 10
to R-inla discussion group
Below a comparison with a similar approach, using {mgcv}.

It is giving a PICP value of 0.98, much closer to the 0.95 corresponding to the 95% prediction interval chosen in this example.

```
library(mgcv)
library(gratia)

# devtools::install_github("julianfaraway/faraway")
data(meatspec, package = "faraway")

the_data <- list(
X = matrix(1, nrow = nrow(meatspec), ncol = 1),
# Z is the matrix of predictors (in this case, a matrix of reflectance values)
Z = as.matrix(meatspec[,which(colnames(meatspec) != "fat")]),
# The outcome is the meat fat content
orig_outcome = meatspec\$fat / scaley,
outcome = meatspec\$fat / scaley
)

fm_mgcv <- as.formula(
paste0(
"fat ~ ",
paste0("s(", colnames(the_data\$Z), ", bs = 'cr', k = 3)", collapse = " + ")
)
)

mod_mgcv <- gam(
formula = fm_mgcv,
data = data.frame(fat = the_data\$outcome, the_data\$Z),
family = betar(),
method = 'REML'
)

sim_outcome <- scaley * gratia:::simulate.gam(
mod_mgcv,
1000,
newdata = data.frame(the_data\$Z)
)

res_mgcv <- data.frame(
obs = scaley * the_data\$outcome,
t(apply(sim_outcome, 1, quantile, probs = c(0.025, 0.5, 0.975)))
)

# PICP
res_mgcv %>%
mutate(obs_in_pi = obs >= `X2.5.` & obs <= `X97.5.`) %>%
summarise(picp = sum(obs_in_pi) / n())
```