Dear R-INLA community
I am doing spatial-temporal disease mapping for predicting count data. The prediction in more recent years (2023) looks not good (see graph below), compared to 2021. I guess this is because many of my covariates are 100% missing (not reported yet by the open data provider) in recent years (2022 & 2023), and the intercept (-11.851) made everything down while these missing covariates are regarded as 0 in INLA.
My main question is, does anyone think missing covariates imputation or joint model will be a good solution (Virgilio Gómez-Rubio et .al 2029. Missing data analysis and imputation via latent Gaussian Markov random fields) in this case?
About my model: The response variable is a count (suspicious polio cases, i.e. Non-Polio AFP) and the covariates are numerical and manually scaled already. The dataset has been aggregated by month and by district (admin 2 level) across three countries. I used negative binomial and bym2 for spatial and ar1 for year and grouped by month. I fit the all the years with the response variable between 2021 and 2023 given NA.
In the meantime, I am other issue that I am still struggling/reading / trying:
Apologies for this long post. I will appreciate if you spot any issues here. Any recommendations are welcome! Thanks for developing this such beautiful INLA method!
Wish you have a great day!!!
Kai
###################CODE##################
#model
pc_prior <- list(prec = list(prior="pc.prec",
param = c(3* sd(df_all_gis_scale$no_npafp),0.01)))
temp_formula_prediction <- as.formula(
no_npafp_predicted ~ 1+conflict_total_events + sia + rain_r1h_avg +
interval_incidence_province + stool_adequacy + google_trend +
vegetation_anomaly + c_influenza + c_report_measles + c_financing_gghegge +
c_wash_hygiene + c_infant_mortality + c_development_gdp +
c_vac_coverage_polio + c_temperature +
f(index_district, model = "bym2", graph = graph_neighbour_list, scale.model = TRUE) +
f(index_year, model = "iid", group = month, control.group = list(model = "ar1", scale.model = TRUE), hyper = pc_prior))#index year ranges from1:6; month ranges from 1 to 12. (non-cyclic)
#fit
res <- inla(data = df_all_gis_scale, E = population, family = "nbinomial",
control.predictor = list(compute = TRUE, link=1), control.compute = list(dic=TRUE,waic=TRUE,return.marginals.predictor=TRUE,cpo=TRUE),
formula = temp_formula_prediction ) #response variable between 2021 and 2023 has been replaced by NA
#the fit summary is attached.
The predicted count data = res$summary.fitted.values[,"mean"] * df_all_gis_scale$population # as I saw somewhere that you need to transform it back by multiplying population when you did a offset, right???