Model selection

Yalemzewod Gelaw

Nov 28, 2022, 5:17:52 AM
to r-inla-disc...@googlegroups.com
Hi INLA Support group,

I fitted four different Poisson models for spatio-temporal mapping of malaria at the district level. I was wondering if someone can share:
1) Which metric is best for choosing the best-fitting model (DIC, WAIC, CPO, PIT)?
2) What does it mean for a model to be overfitted, and how do I check whether a Poisson model is overfitted?

Thank you, 
Yalem


--


Best regards,
_______________________
Yalemzewod A Gelaw
Postdoctoral Research Officer 
Telethon Kids Institute, Perth, Western Australia
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

Helpdesk (Haavard Rue)

Nov 28, 2022, 5:38:18 AM
to Yalemzewod Gelaw, r-inla-disc...@googlegroups.com
I would also add https://arxiv.org/abs/2210.04482 to the list...
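
For reference, a minimal sketch of how these criteria are requested at fit time (the formula `f` and data `d` are placeholders; the gcpo option may need a recent testing build of R-INLA, so check the documentation for your installed version):

# request DIC, WAIC, CPO/PIT, and group CPO when fitting
result <- inla(f, family = "poisson", data = d,
               control.compute = list(dic = TRUE, waic = TRUE, cpo = TRUE,
                                      control.gcpo = list(enable = TRUE)))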


On Mon, 2022-11-28 at 18:17 +0800, Yalemzewod Gelaw wrote:
> Hi INLA Support group,
>
> I fitted four different Poisson models for spatio-temporal mapping of
> malaria at the district level. I was wondering if someone can share:
> 1) Which metric is best for choosing the best-fitting model (DIC, WAIC,
> CPO, PIT)?
> 2) What does it mean for a model to be overfitted, and how do I check
> whether a Poisson model is overfitted?
>
> Thank you, 
> Yalem
>
>

--
Håvard Rue
he...@r-inla.org

Tim Meehan

Dec 9, 2022, 2:35:53 PM
to R-inla discussion group
Hi Yalem,

I'm not an expert, but I hope this is helpful.

Best,
Tim

### From my reading and asking others:

## WAIC is 'better' than DIC. You get WAIC using the following.
# Theoretically, models that better balance predictive accuracy and parsimony
# (i.e., are not overfit) have lower scores.
inla_waic <- inla_result$waic$waic
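# DIC can be pulled out the same way if you want it next to WAIC (a sketch;
# both require control.compute = list(dic = TRUE, waic = TRUE) in the inla() call).
inla_dic <- inla_result$dic$dic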

## Another model-ranking criterion, which reflects how well the predictions
## match the data, is based on the conditional predictive ordinate (CPO).
# Better models have lower scores with this particular formulation. This score
# gets tricky when CPO is not well estimated, i.e., when it fails INLA's
# internal quality check or is NA.
cpo_vec <- inla_result$cpo$cpo
inla_bpic <- as.numeric(-2 * sum(log(cpo_vec), na.rm = FALSE))
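# Before trusting this score, it can help to check INLA's per-observation CPO
# quality flags (a sketch; entries of inla_result$cpo$failure greater than zero
# mark observations where the CPO estimate is unreliable).
sum(inla_result$cpo$failure > 0, na.rm = TRUE)  # count of suspect observations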

# If that score comes out NA because some CPO values are NA, try the group CPO
# (gcpo) from the experimental/testing version of INLA.
gcpo_vec <- inla_result$gcpo$gcpo
inla_bpic <- as.numeric(-2 * sum(log(gcpo_vec), na.rm = FALSE))
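# Note: gcpo (the leave-group-out CPO from the paper Haavard linked) must be
# enabled when fitting; in recent builds this looks something like
# control.compute = list(control.gcpo = list(enable = TRUE)), but check the
# documentation for your installed version.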

### Hopefully the two criteria above agree on a model. But also:

## Judge the fit of your model with the following.
# Libraries
library(inlatools)
library(ggplot2)
library(cowplot)
library(dplyr)

# Diagnostic plots
dp1 <- dispersion_check(inla_result) %>% plot()
dp2 <- fast_distribution_check(inla_result) %>% plot()
dp3 <- ggplot(data.frame(x=inla_result$cpo$pit)) + geom_histogram(aes(x=x))
dp4 <- ggplot(data.frame(predicted=inla_result$summary.fitted.values$mean,
       observed=data_set$y), aes(x=observed, y=predicted)) +
       geom_hex() +  # geom_hex() requires the hexbin package to be installed
       geom_abline(intercept=0, slope=1)
plot_grid(dp1, dp2, dp3, dp4)
# For dp3, ideally the histogram shows a uniform-ish distribution.
# For dp4, ideally the mass is along the 1:1 line.
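# For dp1 and dp2 (a reading based on the inlatools vignette; details may vary
# by version): the observed dispersion / count frequencies should fall within
# the envelope simulated from the fitted model.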

# Hopefully there is a reasonable correlation between observed and mean predicted
cor(inla_result$summary.fitted.values$mean, data_set$y, use="complete.obs")
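
# An error summary on the response scale can complement the correlation (a
# sketch; assumes counts in data_set$y and the fitted means used above).
sqrt(mean((inla_result$summary.fitted.values$mean - data_set$y)^2,
          na.rm = TRUE))  # RMSE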
