Model selection

Yalemzewod Gelaw

Nov 28, 2022, 5:17:52 AM
to r-inla-disc...@googlegroups.com
Hi INLA Support group,

I fitted four different Poisson models for spatio-temporal mapping of malaria at the district level. I was wondering if someone can share:
1) Which metric is best for choosing the best-fitting model (DIC, WAIC, CPO, PIT)?
2) What does it mean for a model to be overfitted, and how do I check whether a Poisson model is overfitted?

Thank you, 
Yalem


--


Best regards,
_______________________
Yalemzewod A Gelaw
Postdoctoral Research Officer 
Telethon Kids Institute, Perth, Western Australia
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Helpdesk (Haavard Rue)

Nov 28, 2022, 5:38:18 AM
to Yalemzewod Gelaw, r-inla-disc...@googlegroups.com
I would also add https://arxiv.org/abs/2210.04482 to the list...
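
For reference, a minimal sketch of how these criteria are requested at fit time (the formula `f` and data `d` are placeholders; the gcpo option may need a recent testing build of R-INLA, so check the documentation for your installed version):

# request DIC, WAIC, CPO/PIT, and group CPO when fitting
result <- inla(f, family = "poisson", data = d,
               control.compute = list(dic = TRUE, waic = TRUE, cpo = TRUE,
                                      control.gcpo = list(enable = TRUE)))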


On Mon, 2022-11-28 at 18:17 +0800, Yalemzewod Gelaw wrote:
> Hi INLA Support group,
>
> I fitted four different Poisson models for spatio-temporal mapping of
> malaria at the district level. I was wondering if someone can share:
> 1) Which metric is best for choosing the best-fitting model (DIC, WAIC,
> CPO, PIT)?
> 2) What does it mean for a model to be overfitted, and how do I check
> whether a Poisson model is overfitted?
>
> Thank you, 
> Yalem
>
>

--
Håvard Rue
he...@r-inla.org

Tim Meehan

Dec 9, 2022, 2:35:53 PM
to R-inla discussion group
Hi Yalem,

I'm not an expert, but I hope this is helpful.

Best,
Tim

### From my reading and asking others:

## WAIC is 'better' than DIC. You get WAIC using the following.
# Theoretically, models that better balance predictive accuracy and parsimony
# (i.e., are not overfit) have lower scores.
inla_waic <- inla_result$waic$waic
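# DIC can be pulled out the same way if you want it next to WAIC (a sketch;
# both require control.compute = list(dic = TRUE, waic = TRUE) in the inla() call).
inla_dic <- inla_result$dic$dic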

## Another model-ranking criterion, which reflects how well the predictions
## match the data, is based on the conditional predictive ordinate (CPO).
# Better models have lower scores with this particular formulation. This score
# gets tricky when CPO is not well estimated, i.e., when it fails INLA's
# internal quality check or is NA.
cpo_vec <- inla_result$cpo$cpo
inla_bpic <- as.numeric(-2 * sum(log(cpo_vec), na.rm = FALSE))
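# Before trusting this score, it can help to check INLA's per-observation CPO
# quality flags (a sketch; entries of inla_result$cpo$failure greater than zero
# mark observations where the CPO estimate is unreliable).
sum(inla_result$cpo$failure > 0, na.rm = TRUE)  # count of suspect observations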

# If that score comes out NA because some CPO values are NA, try the group CPO
# (gcpo) from the experimental/testing version of INLA.
gcpo_vec <- inla_result$gcpo$gcpo
inla_bpic <- as.numeric(-2 * sum(log(gcpo_vec), na.rm = FALSE))
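# Note: gcpo (the leave-group-out CPO from the paper Haavard linked) must be
# enabled when fitting; in recent builds this looks something like
# control.compute = list(control.gcpo = list(enable = TRUE)), but check the
# documentation for your installed version.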

### Hopefully the two criteria above agree on a model. But also:

## Judge the fit of your model with the following.
# Libraries
library(inlatools)
library(ggplot2)
library(cowplot)
library(dplyr)

# Diagnostic plots
dp1 <- dispersion_check(inla_result) %>% plot()
dp2 <- fast_distribution_check(inla_result) %>% plot()
dp3 <- ggplot(data.frame(x=inla_result$cpo$pit)) + geom_histogram(aes(x=x))
dp4 <- ggplot(data.frame(predicted=inla_result$summary.fitted.values$mean,
       observed=data_set$y), aes(x=observed, y=predicted)) +
       geom_hex() +  # geom_hex() requires the hexbin package to be installed
       geom_abline(intercept=0, slope=1)
plot_grid(dp1, dp2, dp3, dp4)
# For dp3, ideally the histogram shows a uniform-ish distribution.
# For dp4, ideally the mass is along the 1:1 line.
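# For dp1 and dp2 (a reading based on the inlatools vignette; details may vary
# by version): the observed dispersion / count frequencies should fall within
# the envelope simulated from the fitted model.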

# Hopefully there is a reasonable correlation between observed and mean predicted
cor(inla_result$summary.fitted.values$mean, data_set$y, use="complete.obs")
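
# An error summary on the response scale can complement the correlation (a
# sketch; assumes counts in data_set$y and the fitted means used above).
sqrt(mean((inla_result$summary.fitted.values$mean - data_set$y)^2,
          na.rm = TRUE))  # RMSE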
