Offset is throwing Nan's/Inf when there are none there

jordan....@gmail.com

Nov 1, 2021, 12:34:41 PM
to pystatsmodels
Hello,

When I add my offset, I get an error stating: ValueError: NaN, inf or invalid value detected in weights, estimation infeasible.

My weights are fine because the model runs without the offset in there (with the frequency weights).  

I checked my offset factor for infinity and nan values and none are there.  The min value is 0.002618184435718673 and the max value is 1418.904980670142.  Could the wide range have something to do with it?  

I tried to use the log of the offset factor, and it runs, but the parameters do not change from the model without offsets.  

Has anyone had the same issue or know how to fix it?

josef...@gmail.com

Nov 1, 2021, 12:58:57 PM
to pystatsmodels
No, not enough information.
A reproducible example would be best; showing which model you use and the code for it is the minimum.

Models with exp in the inverse link function easily overflow.
<ipython-input-47-54d2408a5e55>:1: RuntimeWarning: overflow encountered in exp
  np.exp(1418.9)
inf

The linear prediction part from the exog would need to compensate to get finite values.
I haven't checked whether offset is well handled when computing start_params in various models.
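
A minimal sketch of the overflow described above, using only the standard library instead of NumPy. With a log link the Poisson mean is exp(linear predictor + offset), so an offset passed on the raw scale is exponentiated. The value 1418.904980670142 is the maximum reported in the first message; the linear predictor value 0.5 is made up for illustration.

```python
import math

raw_offset = 1418.904980670142  # max offset value reported in the first message
eta = 0.5                       # made-up linear predictor value (assumption)

# Log link: mean = exp(eta + offset). A raw-scale offset overflows a float.
try:
    mu_raw = math.exp(eta + raw_offset)
    overflowed = False
except OverflowError:
    mu_raw = math.inf
    overflowed = True

# On the log scale the same factor stays finite.
mu_log = math.exp(eta + math.log(raw_offset))

print(overflowed, mu_log)
```

This only illustrates why a raw-scale offset can produce inf; whether it is the cause for a particular dataset still needs to be checked against the actual model.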

Josef

 
You received this message because you are subscribed to the Google Groups "pystatsmodels" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pystatsmodel...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/pystatsmodels/4b4c285f-63a0-4e96-913d-7d817b589c9fn%40googlegroups.com.

Jordan Howell

Nov 1, 2021, 1:33:58 PM
to pystat...@googlegroups.com
When I run the following, it works. 

import numpy as np
import patsy
import statsmodels.api as sm

# formula and df are defined elsewhere (not shown)
y, x = patsy.dmatrices(formula, df, return_type='matrix')

# keep only the rows where x1 is not missing
mask = df['x1'].notna()
weight_factor = df.loc[mask, 'weight'].to_numpy()
offset_factor = df.loc[mask, 'offset'].to_numpy()
offset_factor_l = np.log(offset_factor)

model = sm.GLM(y, x, family=sm.families.Poisson(),
               freq_weights=weight_factor).fit(scale="x2")


When I add the offset as follows, I get the NaN/inf-in-weights error.

y, x = patsy.dmatrices(formula, df, return_type='matrix')

# keep only the rows where x1 is not missing
mask = df['x1'].notna()
weight_factor = df.loc[mask, 'weight'].to_numpy()
offset_factor = df.loc[mask, 'offset'].to_numpy()
offset_factor_l = np.log(offset_factor)

# offset passed on the raw scale
model = sm.GLM(y, x, family=sm.families.Poisson(),
               freq_weights=weight_factor,
               offset=offset_factor).fit(scale="x2")

When I run it with the log of the offset, it runs without error, but it doesn't give a different answer than running without the offset.

y, x = patsy.dmatrices(formula, df, return_type='matrix')

# keep only the rows where x1 is not missing
mask = df['x1'].notna()
weight_factor = df.loc[mask, 'weight'].to_numpy()
offset_factor = df.loc[mask, 'offset'].to_numpy()
offset_factor_l = np.log(offset_factor)

# offset passed on the log scale
model = sm.GLM(y, x, family=sm.families.Poisson(),
               freq_weights=weight_factor,
               offset=offset_factor_l).fit(scale="x2")

The overall goal is to hold the coefficients of all variables fixed except one (x).  x is then replaced with a value from a different data source to see what type of lift that value (x1) brings compared to the original (x).

Does that example make more sense? 



--
Respectfully,

Jordan Howell
253-266-8088

josef...@gmail.com

Nov 1, 2021, 2:39:00 PM
to pystatsmodels
On Mon, Nov 1, 2021 at 1:33 PM Jordan Howell <jordan....@gmail.com> wrote:
The overall goal is to hold the coefficients of all variables fixed except one (x).  x is then replaced with a value from a different data source to see what type of lift that value (x1) brings compared to the original (x).

Does that example make more sense? 

Did you compute the offset correctly as part of the linear predictor, i.e. offset = x_not1 dot params_not1?

As a check that your steps work, you can do the same thing but use the first dataset again instead of the second dataset.
Then the estimated coefficient of x1 should be the same in the offset model as in the original model.
There shouldn't be an overflow problem in the offset model, at least close to the MLE params.

Do the two datasets have a similar range of values, or does x in the second dataset have some much larger values?

Josef

 

Jordan Howell

Nov 1, 2021, 2:43:22 PM
to pystat...@googlegroups.com
It is the same dataset throughout. The params in the offset are exp(xnot*coefficient). The total offset factor is the combination of the factors from the previous model.

Offset = (x1*coefficient)*(x2*coefficient)*(xN*coefficient)

Jordan


josef...@gmail.com

Nov 1, 2021, 2:49:14 PM
to pystatsmodels
On Mon, Nov 1, 2021 at 2:43 PM Jordan Howell <jordan....@gmail.com> wrote:
It is the same dataset throughout. The params in the offset are exp(xnot*coefficient). The total offset factor is the combination of the factors from the previous model.

Offset = (x1*coefficient)*(x2*coefficient)*(xN*coefficient)

The offset is added to the linear predictor and replaces part of it, so it needs to be additive:
Offset = (x1*coefficient) + (x2*coefficient) + (xN*coefficient)

By the original dataset I meant the dataset that you used to estimate the `coefficients` that you use in the offset model.
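
A small sketch of the additive form, using only the standard library. The per-row values and coefficients below are hypothetical, not taken from the thread. It shows that multiplying the mean-scale factors exp(x_i * b_i) and then taking the log gives the same number as summing x_i * b_i, which is the quantity that belongs in the offset.

```python
import math

# Hypothetical values for one row and previously estimated coefficients (assumptions).
x = [1.2, 0.0, 3.5]
coef = [0.4, -0.7, 0.1]

# Multiplicative factors on the mean scale: exp(x_i * b_i).
factors = [math.exp(xi * bi) for xi, bi in zip(x, coef)]
mean_scale_factor = math.prod(factors)

# The offset enters the linear predictor, so it is the sum of x_i * b_i ...
offset = sum(xi * bi for xi, bi in zip(x, coef))

# ... which equals the log of the product of the mean-scale factors.
print(offset, math.log(mean_scale_factor))
```

So a factor that was built by multiplying exponentiated terms would go into the model as its log, not as the raw product.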
 

Jordan Howell

Nov 1, 2021, 2:51:08 PM
to pystat...@googlegroups.com
Yes it's the original data set with the new variable appended on.

josef...@gmail.com

Nov 1, 2021, 2:59:44 PM
to pystatsmodels
When you add the offset, are you removing the other variables that the offset is replacing?

Maybe you should make up a simple example to see how it works.

If offset is just replacing part of the x effects, then there should be no overflow problem because it worked in the full model.
(I wrote a unit test like that to check offset.)
The only problem could come from very bad starting values in the optimization.

josef


Jordan Howell

Nov 1, 2021, 3:05:00 PM
to pystat...@googlegroups.com
Yes. The only variable in the model when using the offset is the new variable. Can you send me the unit test?


josef...@gmail.com

Nov 1, 2021, 3:22:41 PM
to pystatsmodels
I don't remember in which unit test I used this. That might be 8 to 10 years ago, and our unit test code is huge and not well organized.
"offset" is much too common to do a code search for it (around 1500 search matches in all of statsmodels).

Josef


Jordan Howell

Nov 1, 2021, 3:24:07 PM
to pystat...@googlegroups.com
Understood. I'll try to come up with something. Thanks for all the support; as always, it's great.

Jordan Howell

Nov 1, 2021, 3:50:32 PM
to pystat...@googlegroups.com
Ok. I ran a unit test with random data: I fit a model with 2 variables, then set an offset for x2 and took x2 out of the model, and got the same coefficient for x1. That tells me it works fine and sends me back to the drawing board to find what's wrong with my data.

Thank you for that idea. 

josef...@gmail.com

Nov 2, 2021, 2:02:56 PM
to pystatsmodels
I found a case where I used it.

For profile confidence interval computation, we replace the relevant x variable by an offset of x times a given coefficient.

Also, fit_constrained uses the same basic idea but in a more general version.

Josef
