Hi Finn,
Thanks so much for your help. I ran some models on the mosquito data in R-INLA using hierarchical random effects: one for year (model="iid") and one for grid ID (model="bym"). We had few points where multiple traps were placed at the same location during the same year, so we excluded those points rather than creating an additional random effect for trap location.
I am now evaluating the fit of the top models. I followed the suggestions in the book "Spatial and Spatio-temporal Bayesian Models with R-INLA" and used methods based on the predictive distribution, including cross-validation and posterior predictive checks. Our dataset consists of 193,651 records, and the sum of the CPO failure values was 306. Considering how large the dataset is, should we still be concerned about these failures? Would you recommend other cross-validation methods, such as leaving out 20% of the data and then using that portion to test the predictive ability of the model?
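In case it helps to make the hold-out question concrete, here is a minimal sketch in base R of the 20% split I have in mind. The data frame `dat` and the columns `year` and `count` are placeholders filled with simulated values, not our real data, and the INLA fit itself is left out. The idea of masking the held-out responses with NA so that the model predicts them is my assumption about how this would be done, so please correct me if that is wrong.

```r
# Toy stand-in for the trap data; `dat`, `year`, and `count` are placeholder names.
set.seed(1)
n <- 1000
dat <- data.frame(
  year  = sample(2010:2015, n, replace = TRUE),
  count = rpois(n, lambda = 2)
)

# Randomly pick 20% of the rows as the test set (sampling without replacement).
test_idx <- sample(seq_len(n), size = round(0.2 * n))

# Keep the observed counts for later comparison, then mask them with NA
# in the copy of the data that would be passed to the model.
held_out <- dat$count[test_idx]
dat_fit <- dat
dat_fit$count[test_idx] <- NA

# After fitting a model to `dat_fit` (not shown), the predictions for the rows
# in `test_idx` would be compared against `held_out`.
```

With our real data, the split might need to be stratified by year or grid cell rather than fully random, but I am not sure which is more appropriate.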
Also, there were many zero values in the dataset. When I plotted a histogram of the posterior predictive p-values, it was not uniform: the values clustered at the low and high ends with few in between, indicating that the model does not fit the data well. Do you have any suggestions in this case? I tried running it with negative binomial, Poisson, and zero-inflated Poisson likelihoods. However, I am unsure what the difference is between zeroinflatedpoisson0, zeroinflatedpoisson1, and zeroinflatedpoisson3.
I am also considering that the model may fit poorly because we are missing an important covariate, such as the weather conditions on the night of trapping. This could also explain why some traps had zero counts, especially if there was high wind, rain, or colder temperatures on the night the trap was set out.
It was also suggested that we could use the median of the abundance estimates, instead of the mean, to assess model validity. However, I am unsure of the code to do this in INLA. Would this involve transforming the posterior means into medians after they are calculated in R-INLA?
Sincerely,
Kristin