Weights and Offsets do not seem to move the model results


jordan....@gmail.com

unread,
Sep 21, 2021, 2:19:44 PM
to pystatsmodels
Hello, 

I have run a Tweedie multivariate GLM three times: once without weights or offsets, once with weights only, and once with offsets only.  The results are exactly the same every time.

Is that normal?

josef...@gmail.com

unread,
Sep 21, 2021, 2:45:50 PM
to pystatsmodels
It's not normal in the general case.
Any non-zero offset should at least change the constant in params.

Which weights: var_weights or freq_weights?

If they are non-constant, then params should change.

Prediction might not change much, e.g. if the model adjusts to compensate for the offset.

 


Jordan Howell

unread,
Sep 21, 2021, 3:02:56 PM
to pystat...@googlegroups.com
I just used "weights".  But it should probably be freq_weights. 



--
Respectfully,

Jordan Howell
253-266-8088

josef...@gmail.com

unread,
Sep 21, 2021, 3:03:41 PM
to pystatsmodels
On Tue, Sep 21, 2021 at 2:45 PM <josef...@gmail.com> wrote:


On Tue, Sep 21, 2021 at 2:19 PM jordan....@gmail.com <jordan....@gmail.com> wrote:
Hello, 

I have run a Tweedie multivariate GLM three times: once without weights or offsets, once with weights only, and once with offsets only.  The results are exactly the same every time.

Is that normal?

It's not normal in the general case.
Any non-zero offset should at least change the constant in params.

Which weights: var_weights or freq_weights?

If they are non-constant, then params should change.

Prediction might not change much, e.g. if the model adjusts to compensate for the offset.

Most of GLM, and essentially all estimation with IRLS, is family independent.
It's unlikely that there is a bug in regular cases.

So most likely it's something specific to your data or to what you are doing.
There can always be problems with corner cases.
Did your fit converge?

Jordan Howell

unread,
Sep 21, 2021, 3:04:10 PM
to pystat...@googlegroups.com
Yep. That was the issue.

josef...@gmail.com

unread,
Sep 21, 2021, 3:05:53 PM
to pystatsmodels
On Tue, Sep 21, 2021 at 3:02 PM Jordan Howell <jordan....@gmail.com> wrote:
I just used "weights".  But it should probably be freq_weights. 

weights will just be swallowed by **kwargs and not do anything.

We still don't have a proper check in the models for which kwargs are allowed.


 

jordan....@gmail.com

unread,
Sep 21, 2021, 3:18:08 PM
to pystatsmodels
Is using the offset argument the same as adding the offset column into the formula?  So instead of:

target ~ var1 * var2 * var3

I do:

target ~ var1 * var2 * var3 + offset_factor

josef...@gmail.com

unread,
Sep 21, 2021, 3:23:07 PM
to pystatsmodels
On Tue, Sep 21, 2021 at 3:18 PM jordan....@gmail.com <jordan....@gmail.com> wrote:
Is using the offset argument the same as adding the offset column into the formula?  So instead of:

target ~ var1 * var2 * var3

I do:

target ~ var1 * var2 * var3 + offset_factor

You need to use GLM(..., offset=my_offset_factor); then it will be included as in your second expression.

If you add it to the design matrix, exog, then its coefficient will not be fixed to 1.
It would get an estimated parameter just like all other explanatory variables.



 

Jordan Howell

unread,
Sep 21, 2021, 3:28:41 PM
to pystat...@googlegroups.com
OK.  I'm getting the exact same coefficients whether I include the offset or take it out.  The offset is derived from multiple coefficients, multiplied together, from a previous model.  I'm not sure why it's not changing the resulting parameters.

josef...@gmail.com

unread,
Sep 21, 2021, 3:39:59 PM
to pystatsmodels
On Tue, Sep 21, 2021 at 3:28 PM Jordan Howell <jordan....@gmail.com> wrote:
OK.  I'm getting the exact same coefficients whether I include the offset or take it out.  The offset is derived from multiple coefficients, multiplied together, from a previous model.  I'm not sure why it's not changing the resulting parameters.

Is it close to perfectly collinear?
E.g. run OLS(offset_factor, exog_in_glm)
and see whether the R-squared is close to 1 and the residual scale is close to zero.
Close to perfect collinearity could be a reason that it doesn't have any effect.
With perfect collinearity, the algorithm will find an "arbitrary" solution, where "arbitrary" is defined by `pinv`.

And to check that you are using offset correctly:

offset2 = offset_factor + s * np.random.randn(len(offset_factor))

use `s` large enough compared to the magnitude of values in offset_factor.



 

josef...@gmail.com

unread,
Sep 21, 2021, 3:46:11 PM
to pystatsmodels
On Tue, Sep 21, 2021 at 3:39 PM <josef...@gmail.com> wrote:


On Tue, Sep 21, 2021 at 3:28 PM Jordan Howell <jordan....@gmail.com> wrote:
OK.  I'm getting the exact same coefficients whether I include the offset or take it out.  The offset is derived from multiple coefficients, multiplied together, from a previous model.  I'm not sure why it's not changing the resulting parameters.

Is it close to perfectly collinear?
E.g. run OLS(offset_factor, exog_in_glm)
and see whether the R-squared is close to 1 and the residual scale is close to zero.
Close to perfect collinearity could be a reason that it doesn't have any effect.
With perfect collinearity, the algorithm will find an "arbitrary" solution, where "arbitrary" is defined by `pinv`.

Does the previous model use the same explanatory variables as the current model, or a subset of them?
Then any linear combination would have to be perfectly collinear.

You need to have at least one extra variable in the previous model.
(No formal proof, but by analogy to similar two-stage estimation problems.)

josef...@gmail.com

unread,
Sep 26, 2021, 3:43:49 PM
to pystatsmodels
On Tue, Sep 21, 2021 at 3:05 PM <josef...@gmail.com> wrote:


On Tue, Sep 21, 2021 at 3:02 PM Jordan Howell <jordan....@gmail.com> wrote:
I just used "weights".  But it should probably be freq_weights. 

weights will just be swallowed by **kwargs and not do anything.

We still don't have a proper check in the models for which kwargs are allowed.

I added the invalid kwarg check to several models.
This will issue a ValueWarning in the upcoming release 0.13.

It's a bit tricky because classes in the hierarchy use different valid kwargs.

Josef