prediction from hurdle model

Paul Lantos

Sep 22, 2017, 9:05:02 PM

to brms-users

I've got a hurdle model with the following form. It's a dataset of 347 areal units with case counts and a lot of 0s. My goal, in addition to finding significant covariates, is to see whether the spatial heterogeneity of this disease is explained by these variables. So I'd like to predict both unadjusted (except for population) and adjusted (for many variables) onto the study area to see if the prediction is changed by the adjustment.

This is the unadjusted model:

fit1<-brm(bf(cases~ s(long, lat) + POP2010,
             hu~ s(long,lat) + POP2010),
          control=list(adapt_delta=0.99),
          data=bg, family = hurdle_negbinomial())

If I predict from this to a grid, using fitted(fit1, grid), what am I getting in the prediction? Is there a way to separate the case counts from the odds of a 0?

This is what the data look like:

> table(bg$cases)

  0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  17  19  20  22  24  25  26  52 
177  54  31  24  18   6   8   1   6   1   6   2   1   1   1   1   1   2   1   1   1   1   1   1 

The model summary looks like this:
> summary(fit1)
 Family: hurdle_negbinomial(log) 
Formula: cases ~ s(long, lat) + POP2010 
         hu ~ s(long, lat) + POP2010

...
 
Smooth Terms: 
                   Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sds(slonglat_1)        2.42      0.65     1.39     3.92       1218    1
sds(hu_slonglat_1)     1.86      1.00     0.29     4.14        783    1

Population-Level Effects: 
              Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
Intercept        -0.08      0.18    -0.45     0.25       4000    1
hu_Intercept     -0.10      0.14    -0.39     0.18       4000    1
POP2010           0.46      0.08     0.31     0.61       4000    1
slonglat_1        0.05      0.51    -0.98     1.07       1907    1
slonglat_2       -0.23      0.40    -1.01     0.60       1493    1
hu_POP2010       -0.38      0.15    -0.70    -0.09       4000    1
hu_slonglat_1    -1.08      0.58    -2.25     0.12       2650    1
hu_slonglat_2    -0.06      0.35    -0.75     0.65       2529    1

When I predict to a grid (~5000 points) covering the area, I get this range for the estimate and error, which is very large but I'm not sure what's actually being predicted:

> pred1<-cbind(grid[,2:3], fitted(fit1, grid))
> range(pred1$Estimate)
[1]   0.1159221 200.5955257
> range(pred1$Est.Error)
[1]   0.05017637 143.20620066

Paul Buerkner

Sep 23, 2017, 2:52:08 AM

to Paul Lantos, brms-users

fitted() returns the predicted mean of the response variable; that is, it combines the predictions of the zero and non-zero parts.

If you want to look at the splines only, use marginal_smooths().

If you just want predictions for the zeros, go for fitted(fit, dpar = "hu").
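
To make "combines the zero and non-zero parts" concrete, here is a minimal base R sketch of the mean of a hurdle negative binomial. The function name `hurdle_nb_mean` and the parameter values are illustrative assumptions, not part of brms and not estimates from fit1. The formula is the standard one for a hurdle model with a zero-truncated count part; whether a given brms version applies the truncation adjustment inside fitted() is worth checking against its documentation.

```r
# Mean of a hurdle negative binomial response, in base R.
# hu    : probability of a zero (the hurdle part)
# mu    : mean of the untruncated negative binomial (count part)
# shape : negative binomial shape (dispersion) parameter
hurdle_nb_mean <- function(mu, shape, hu) {
  p0 <- dnbinom(0, size = shape, mu = mu)  # P(Y = 0) before truncation
  (1 - hu) * mu / (1 - p0)                 # P(non-zero) * mean of zero-truncated NB
}

# Illustrative values only (assumptions, not taken from the fitted model)
hurdle_nb_mean(mu = 2, shape = 1, hu = 0.5)
```

With a fitted brms model, the two ingredients can presumably be pulled out per grid point with fitted(fit1, newdata = grid, dpar = "hu") and fitted(fit1, newdata = grid, dpar = "mu"), which keeps the zero probability and the count part separate.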

--
You received this message because you are subscribed to the Google Groups "brms-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to brms-users+unsubscribe@googlegroups.com.
To post to this group, send email to brms-...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/brms-users/d32b938d-7a9b-4fc4-aaa1-eaac07ff32d8%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Paul Lantos

Sep 23, 2017, 7:11:11 AM

to Paul Buerkner, brms-users

Thanks. So are the predictions of the combined model on the scale of the initial data (case counts)? If so, I'm trying to understand why the predictions range from 0.1 to 200 while the initial data range from 0 to 52.

Paul Buerkner

Sep 23, 2017, 8:36:37 AM

to Paul Lantos, brms-users

Yes, they are on the original scale (counts in your case). The predictions do not necessarily have to match the original range, and I recommend pp_check to graphically investigate the model fit. You might, for instance, use pp_check(fit, type = "rootogram").
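
For readers without the fitted model at hand, the comparison a rootogram draws can be sketched in base R: observed frequencies against frequencies expected under a hurdle negative binomial, both on the square-root scale. The observed counts are rebuilt from the table(bg$cases) output in the first post. The helper `d_hurdle_nb` and the values `mu_hat` and `shape_hat` are hypothetical placeholders, not estimates from fit1, and pp_check works from posterior draws rather than from a single parameter set.

```r
# Observed counts, rebuilt from the table(bg$cases) output in the first post
counts <- c(0:15, 17, 19, 20, 22, 24, 25, 26, 52)
freq   <- c(177, 54, 31, 24, 18, 6, 8, 1, 6, 1, 6, 2, 1, 1, 1, 1,
            1, 2, 1, 1, 1, 1, 1, 1)
cases  <- rep(counts, times = freq)

# Probability mass function of a hurdle negative binomial
d_hurdle_nb <- function(y, mu, shape, hu) {
  p0  <- dnbinom(0, size = shape, mu = mu)   # P(0) before truncation
  pos <- (1 - hu) * dnbinom(y, size = shape, mu = mu) / (1 - p0)
  ifelse(y == 0, hu, pos)
}

# Placeholder parameters (assumptions, NOT estimates from fit1)
hu_hat    <- mean(cases == 0)  # observed share of zeros
mu_hat    <- 3
shape_hat <- 0.5

y_grid   <- 0:max(cases)
observed <- tabulate(cases + 1, nbins = length(y_grid))
expected <- length(cases) * d_hurdle_nb(y_grid, mu_hat, shape_hat, hu_hat)

# A rootogram compares these two columns on the square-root scale
head(data.frame(y = y_grid,
                sqrt_observed = sqrt(observed),
                sqrt_expected = sqrt(expected)))
```

Large gaps between the two columns in the tail would be one sign that the count part puts too much mass on very high values.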


Paul Lantos

Sep 23, 2017, 12:18:04 PM

to brms-users

OK, makes sense. This is the pp_check plot (I truncated the x axis at 200, but in the original it goes to 2000).

[Inline image 1: pp_check plot]
