I've got a hurdle model with the following form. It's a dataset of 347 areal units with case counts and a lot of 0s. My goal, in addition to finding significant covariates, is to see whether the spatial heterogeneity of this disease is explained by these variables. So I'd like to predict both unadjusted (except for population) and adjusted (for many variables) onto the study area to see if the prediction is changed by the adjustment.
This is the unadjusted model:
fit1<-brm(bf(cases~ s(long, lat) + POP2010,
hu~ s(long,lat) + POP2010),
control=list(adapt_delta=0.99),
data=bg, family = hurdle_negbinomial())If I predict from this to a grid, using fitted(fit1, grid), what am I getting in the prediction?Is there a way to separate the cases from the odds of a 0?
This is what the data look like:
> table(bg$cases)
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 17 19 20 22 24 25 26 52
177 54 31 24 18 6 8 1 6 1 6 2 1 1 1 1 1 2 1 1 1 1 1 1
The model summary looks like this:
> summary(fit1)
Family: hurdle_negbinomial(log)
Formula: cases ~ s(long, lat) + POP2010
hu ~ s(long, lat) + POP2010
...
Smooth Terms:
Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sds(slonglat_1) 2.42 0.65 1.39 3.92 1218 1
sds(hu_slonglat_1) 1.86 1.00 0.29 4.14 783 1
Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
Intercept -0.08 0.18 -0.45 0.25 4000 1
hu_Intercept -0.10 0.14 -0.39 0.18 4000 1
POP2010 0.46 0.08 0.31 0.61 4000 1
slonglat_1 0.05 0.51 -0.98 1.07 1907 1
slonglat_2 -0.23 0.40 -1.01 0.60 1493 1
hu_POP2010 -0.38 0.15 -0.70 -0.09 4000 1
hu_slonglat_1 -1.08 0.58 -2.25 0.12 2650 1
hu_slonglat_2 -0.06 0.35 -0.75 0.65 2529 1
When I predict to a grid (~5000 points) covering the area, I get this range for the estimate and error, which is very large but I'm not sure what's actually being predicted:
> pred1<-cbind(grid[,2:3], fitted(fit1, grid))
> range(pred1$Estimate)
[1] 0.1159221 200.5955257
> range(pred1$Est.Error)
[1] 0.05017637 143.20620066