Does 'predict' with a fitted GLM use the offset?

jordan....@gmail.com

Nov 17, 2021, 12:10:29 PM

to pystatsmodels

Hello,

I've fit a model with offsets from a different model like so:

import numpy as np
import patsy
import statsmodels.api as sm

# df_d1 is the poster's DataFrame (not shown in the thread)
offset_formula = "cm_pure_premium ~ new_auto_m_score - 1"
y, x = patsy.dmatrices(offset_formula, df_d1, return_type='matrix')

weight_factor = np.array(df_d1['comp_eu'])
offset_factor = np.array(df_d1['offset_factor'])

model_d_m = sm.GLM(y, x, family=sm.families.Poisson(),
                   freq_weights=weight_factor,
                   offset=offset_factor).fit(scale="x2")

I've tested this and the offset is working correctly.

When I run 'model_d_m.predict(x)', can anyone confirm whether the prediction includes the offset? Or is the offset only used in the fit?

josef...@gmail.com

Nov 17, 2021, 12:45:41 PM

to pystatsmodels

It's a bit tricky, and we had some bugs in this.

If exog is not passed to predict, then all model arrays (exog, offset, exposure, ...) are used.

If exog is user-provided in predict, then
if offset is also provided, it is used (similarly for the other extra arrays in different models/families);
if offset is NOT provided, it defaults to 0.

So
'model_d_m.predict(x)' will not use the offset (it sets offset=0).
'model_d_m.predict(x_predict, offset=offset_predict)' will use offset_predict as the offset.
'model_d_m.predict()' uses the in-sample arrays for exog and offset from the model.
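
To see what offset=0 means for the numbers, here is a minimal sketch with plain NumPy. It assumes the default log link of the Poisson family, so the predicted mean is exp(x @ beta + offset). The values of x, beta and offset are made up for illustration and are not from the model above.

```python
import numpy as np

# Hypothetical values for illustration only; not taken from the thread.
x = np.array([[0.5], [1.0], [2.0]])          # single-column design matrix
beta = np.array([0.3])                       # assumed fitted coefficient
offset = np.log(np.array([2.0, 1.0, 0.5]))   # offset on the linear-predictor (log) scale

# Log link: mean = exp(linear predictor)
mu_with_offset = np.exp(x @ beta + offset)   # offset passed explicitly
mu_offset_zero = np.exp(x @ beta)            # what offset=0 amounts to

# Leaving out the offset divides each prediction by exp(offset)
ratio = mu_with_offset / mu_offset_zero
print(ratio)
```

Under that assumption, predictions made without the offset are off by a factor of exp(offset) per row, which is why the offset has to be passed again whenever exog is passed.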


josef...@gmail.com

Nov 17, 2021, 12:47:25 PM

to pystatsmodels

It's always best to verify.

E.g., these two should differ:
'model_d_m.predict(x_predict, offset=offset_predict)'
'model_d_m.predict(x_predict, offset=1 + offset_predict)'

These two should be the same if exog and offset are the model data:
'model_d_m.predict(exog[:5], offset=offset[:5])'
'model_d_m.predict()[:5]'

but both should differ from
'model_d_m.predict(exog[:5])'
