Offset is throwing NaN/Inf errors when there are none in the data

jordan....@gmail.com

Nov 1, 2021, 12:34:41 PM

to pystatsmodels

Hello,

When I add my offset, I get an error stating: ValueError: NaN, inf or invalid value detected in weights, estimation infeasible.

My weights are fine because the model runs without the offset in there (with the frequency weights).

I checked my offset factor for infinity and nan values and none are there. The min value is 0.002618184435718673 and the max value is 1418.904980670142. Could the wide range have something to do with it?

I tried to use the log of the offset factor, and it runs, but the parameters do not change from the model without offsets.

Has anyone had the same issue, or does anyone know how to fix it?

josef...@gmail.com

Nov 1, 2021, 12:58:57 PM

to pystatsmodels

No, not enough information.

A reproducible example would be best; at a minimum, show which model you are using and the code for it.

Models with exp in the inverse link function easily overflow.

np.exp(1418.9)
<ipython-input-47-54d2408a5e55>:1: RuntimeWarning: overflow encountered in exp
inf

The linear prediction part from the exog would need to compensate to get finite values.
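A minimal sketch of the overflow described above, assuming float64 arithmetic and a log link (so the mean is exp(linear predictor + offset)). The two offset values are the minimum and maximum reported earlier in the thread; the variable names are illustrative only.

```python
import numpy as np

# Largest argument for which np.exp stays finite in float64 (roughly 709.78).
max_arg = np.log(np.finfo(np.float64).max)

# Smallest and largest offset values reported in the thread.
offset_factor = np.array([0.002618184435718673, 1418.904980670142])

# Passing the raw factor as the offset puts it directly inside exp().
with np.errstate(over="ignore"):
    mean_raw = np.exp(offset_factor)

# Passing log(factor) keeps the implied multiplier equal to the factor itself.
mean_logged = np.exp(np.log(offset_factor))

print(max_arg)
print(np.isfinite(mean_raw))
print(mean_logged)
```

With the raw factor, the rest of the linear predictor would have to be strongly negative for the largest rows to keep the mean finite.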

I haven't checked whether offset is well handled when computing start_params in various models.

Josef

You received this message because you are subscribed to the Google Groups "pystatsmodels" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pystatsmodel...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/pystatsmodels/4b4c285f-63a0-4e96-913d-7d817b589c9fn%40googlegroups.com.

Jordan Howell

Nov 1, 2021, 1:33:58 PM

to pystat...@googlegroups.com

When I run the following, it works.

y,x = patsy.dmatrices(formula, df, return_type = 'matrix')

weight_factor = np.array(df[df['x1'].isna() == False]['weight'])

offset_factor = np.array(df[df['x1'].isna() == False]['offset'])
offset_factor_l = np.log(offset_factor)

model = sm.GLM(y, x, family = sm.families.Poisson(), freq_weights=weight_factor).fit(scale="x2")

When I add the offset as follows, I get the NaN/Inf error about the weights.

y,x = patsy.dmatrices(formula, df, return_type = 'matrix')

weight_factor = np.array(df[df['x1'].isna() == False]['weight'])

offset_factor = np.array(df[df['x1'].isna() == False]['offset'])
offset_factor_l = np.log(offset_factor)

model = sm.GLM(y, x, family = sm.families.Poisson(), freq_weights=weight_factor,
               offset = offset_factor).fit(scale="x2")

When I run it with the log of the offset, it runs without error, but it doesn't give a different answer than running without the offset.

y,x = patsy.dmatrices(formula, df, return_type = 'matrix')

weight_factor = np.array(df[df['x1'].isna() == False]['weight'])

offset_factor = np.array(df[df['x1'].isna() == False]['offset'])
offset_factor_l = np.log(offset_factor)

model = sm.GLM(y, x, family = sm.families.Poisson(), freq_weights=weight_factor,
               offset = offset_factor_l).fit(scale="x2")

The overall goal is to fix the coefficients of all variables but one (x). x is then replaced with a value from a different data source to see what kind of lift that value (x1) brings compared to the original (x).

Does that example make more sense?

--

Respectfully,

Jordan Howell

josef...@gmail.com

Nov 1, 2021, 2:39:00 PM

to pystatsmodels

Did you compute the offset correctly as part of the linear predictor, i.e. offset = x_not1 dot params_not1?

As a check that your steps work, you can do the same thing but use the first dataset again instead of the second dataset.

Then the estimated coefficient of x1 should be the same in the offset model as in the original model.

There shouldn't be an overflow problem in the offset model, at least close to the MLE params.

Do the two datasets have a similar range of values, or does x in the second dataset have some much larger values?

Josef

Jordan Howell

Nov 1, 2021, 2:43:22 PM

to pystat...@googlegroups.com

It is the same dataset throughout. The params in the offset are exp(xnot*coefficient). The total offset factor is the conglomerate of the previous model.

Offset = (x1*coefficient)*(x2*coefficient)*(xN*coefficient)

Jordan

josef...@gmail.com

Nov 1, 2021, 2:49:14 PM

to pystatsmodels

The offset is added to the linear predictor and replaces part of it, so it needs to be additive:

Offset = (x1*coefficient) + (x2*coefficient) + (xN*coefficient)
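A small numeric sketch of the additive form, assuming a log link. The arrays `X_fixed` and `b_fixed` are hypothetical stand-ins for the columns being held fixed and their previously estimated coefficients. It also shows that if the earlier model is expressed as multiplicative factors exp(x*coefficient), the log of their product is the same additive offset.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical columns whose coefficients are held fixed, and hypothetical
# coefficients taken from a previous fit.
X_fixed = rng.normal(size=(5, 3))
b_fixed = np.array([0.3, -0.2, 0.1])

# Additive offset on the linear-predictor scale: sum_j x_j * b_j.
offset = X_fixed @ b_fixed

# Multiplicative factor on the mean scale: prod_j exp(x_j * b_j).
factor = np.exp(X_fixed * b_fixed).prod(axis=1)

# Taking the log of the combined factor recovers the additive offset.
print(np.max(np.abs(np.log(factor) - offset)))
```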

What I meant by the original dataset is the dataset that you used to estimate the `coefficients` that you use in the offset model.

Jordan Howell

Nov 1, 2021, 2:51:08 PM

to pystat...@googlegroups.com

Yes, it's the original dataset with the new variable appended.

josef...@gmail.com

Nov 1, 2021, 2:59:44 PM

to pystatsmodels

When you add the offset, are you removing the other variables that the offset is replacing?

Maybe you should make up a simple example to see how it works.

If offset is just replacing part of the x effects, then there should be no overflow problem because it worked in the full model.

(I wrote unit tests like that to check offset.)

The only problem could come from very bad starting values in the optimization.

josef

Jordan Howell

Nov 1, 2021, 3:05:00 PM

to pystat...@googlegroups.com

Yes. The only variable in the model when using the offset is the new variable. Can you send me the unit test?

josef...@gmail.com

Nov 1, 2021, 3:22:41 PM

to pystatsmodels

I don't remember in which unit test I used this. That might have been 8 to 10 years ago, and our unit test code is huge and not well organized.

"offset" is much too common to do a code search for it (around 1500 search matches in all of statsmodels).

Josef

Jordan Howell

Nov 1, 2021, 3:24:07 PM

to pystat...@googlegroups.com

Understood. I'll try to come up with something. Thanks for all the support; as always, it's great.

Jordan Howell

Nov 1, 2021, 3:50:32 PM

to pystat...@googlegroups.com

OK. I ran a unit test with random data: I fit a model with two variables, then set an offset for x2 and took x2 out of the model, and got the same coefficient for x1. That tells me the offset works fine and sends me back to the drawing board to find what's wrong with my data.

Thank you for that idea.

josef...@gmail.com

Nov 2, 2021, 2:02:56 PM

to pystatsmodels

I found a case where I used it.

For profile confidence interval computation, we replace the relevant x variable by an offset of x times a given coefficient.

https://github.com/statsmodels/statsmodels/pull/1791/files#diff-26f241547a107c18fc024b9aae392d35a71ceba90e551ce4ce1d4aac9325bb06R76

Also, fit_constrained uses the same basic idea but in a more general version.

Josef
