Getting a ValueError: NaN, inf or invalid value detected in weights, estimation infeasible. when not using weights

Jordan Howell

Mar 9, 2020, 10:27:56 AM

to pystatsmodels

Hello,

I'm running a GLM and getting a `ValueError: NaN, inf or invalid value detected in weights, estimation infeasible.` error when I'm not passing a weights argument. Is there a reason for this?

josef...@gmail.com

Mar 9, 2020, 10:58:35 AM

to pystatsmodels

On Mon, Mar 9, 2020 at 10:27 AM Jordan Howell <jordan....@gmail.com> wrote:

> Hello,
>
> I'm running a GLM and getting a `ValueError: NaN, inf or invalid value detected in weights, estimation infeasible.` error when I'm not passing a weights argument. Is there a reason for this?

The default optimizer is IRLS, which runs a weighted least squares (WLS) regression internally. The weights in the error message are the ones IRLS computes itself as part of the algorithm, not user-supplied weights.

This problem can happen when some values are at the boundaries, e.g. perfect prediction in GLM Binomial.

That's the equivalent of having a zero-variance problem at an observation, which can result in division by zero and infs, or in nans if there is, for example, a sqrt.
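To make the zero-variance point concrete, here is a minimal NumPy sketch of one IRLS step for a Poisson family with log link, using the textbook formulas for the weights and working response. It is not statsmodels' actual code (in particular it omits any clipping), and the function name and sample values are made up for illustration.

```python
import numpy as np

def poisson_log_irls_terms(eta, y):
    """Textbook IRLS quantities for a Poisson family with log link (no clipping)."""
    with np.errstate(all="ignore"):            # silence the warnings we want to provoke
        mu = np.exp(eta)                       # inverse of the log link
        dlink = 1.0 / mu                       # derivative of the link, g'(mu) = 1/mu
        variance = mu                          # Poisson variance function, V(mu) = mu
        weights = 1.0 / (dlink**2 * variance)  # equals mu whenever mu > 0
        z = eta + (y - mu) * dlink             # working response
    return mu, weights, z

# The last linear predictor is so negative that exp() underflows to exactly 0.
eta = np.array([0.5, -5.0, -800.0])
y = np.array([2.0, 0.0, 0.0])
mu, weights, z = poisson_log_irls_terms(eta, y)

print("mu:             ", mu)
print("weights finite? ", np.isfinite(weights))
print("response finite?", np.isfinite(z))
```

The observation whose mean underflows to zero ends up with a non-finite weight and working response, which is the kind of value the error message complains about even though no weights were passed by the user.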

We should already have limits and clipping in place for most of those cases in GLM.

What's your model, family, link?

Maybe there is a boundary condition for which we missed the clipping.

GLM allows using a link that doesn't constrain the values to be in the range of the distribution. Those cases should produce a warning.

These cases can be useful, but will not always work.

Josef

--
You received this message because you are subscribed to the Google Groups "pystatsmodels" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pystatsmodel...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/pystatsmodels/fa11323b-73af-4aac-b272-aba7e18e49d1%40googlegroups.com.

Jordan Howell

Mar 9, 2020, 11:39:29 AM

to pystat...@googlegroups.com

GLM model, Poisson family with log link.

Jordan

josef...@gmail.com

Mar 9, 2020, 11:57:19 AM

to pystatsmodels

On Mon, Mar 9, 2020 at 11:39 AM Jordan Howell <jordan....@gmail.com> wrote:

> Glm model, poisson family with log link.

Do you have cases where the prediction might be zero?

I think that would be the only corner case for poisson and log link.

Also, did you check that your data doesn't have NaNs?
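A quick way to run that check is sketched below with plain NumPy. The function name is hypothetical, and it assumes endog and exog can be converted to float arrays. It also counts infs, since the error message mentions them too.

```python
import numpy as np

def count_nonfinite(endog, exog):
    """Count NaN/inf entries in the response and in each design-matrix column."""
    endog = np.asarray(endog, dtype=float)
    exog = np.asarray(exog, dtype=float)
    return {
        "endog_nonfinite": int((~np.isfinite(endog)).sum()),
        "exog_nonfinite_by_column": (~np.isfinite(exog)).sum(axis=0).tolist(),
    }

# Toy data: one NaN in the response, one inf in the second regressor.
y = [0.0, 1.0, np.nan]
X = [[1.0, 0.0],
     [1.0, np.inf],
     [1.0, 2.0]]
report = count_nonfinite(y, X)
print(report)
```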

Josef

Jordan Howell

Mar 9, 2020, 12:04:13 PM

to pystat...@googlegroups.com

Oh yeah. It’s a book of insurance claims so there are more zeros than not. No NaNs though.

Jordan

josef...@gmail.com

Mar 9, 2020, 12:22:13 PM

to pystatsmodels

On Mon, Mar 9, 2020 at 12:04 PM Jordan Howell <jordan....@gmail.com> wrote:

> Oh yeah. It’s a book of insurance claims so there are more zeros than not. No NaNs though.

If there are some categorical/dummy/discrete variables for which all endog values are zero, then we shouldn't be able to estimate the Poisson parameter for those: the fit pushes mu toward 0 for those observations, and mu=0 will cause numerical problems.
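One way to look for such variables is sketched below. It assumes the dummies are available as a 0/1 array with one column per level. The function name, column names, and data are hypothetical.

```python
import numpy as np

def all_zero_groups(endog, dummies, names):
    """Return names of dummy columns whose member rows all have endog == 0."""
    endog = np.asarray(endog, dtype=float)
    dummies = np.asarray(dummies, dtype=bool)
    flagged = []
    for j, name in enumerate(names):
        in_group = dummies[:, j]
        # Flag non-empty groups in which no observation has a nonzero response.
        if in_group.any() and not endog[in_group].any():
            flagged.append(name)
    return flagged

# Toy claims data: region_a and region_c never have a claim.
claims = [0, 0, 3, 1, 0]
region_dummies = [[1, 0, 0],
                  [1, 0, 0],
                  [0, 1, 0],
                  [0, 1, 0],
                  [0, 0, 1]]
flagged = all_zero_groups(claims, region_dummies, ["region_a", "region_b", "region_c"])
print(flagged)
```

Any level flagged this way is a candidate for merging with another level or dropping before the model is fit.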

So far this has been mainly a theoretical case; I have never looked at an actual example.

https://github.com/statsmodels/statsmodels/issues/1512

You could try the simplest, most robust optimizer, `method='nm'` (Nelder-Mead), and see if it converges to something.

Then you can inspect cases where fittedvalues are zero or close to zero.

Assuming my guess is right.
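A minimal sketch of that suggestion follows, assuming `endog` and `exog` are the arrays already passed to GLM. The helper names, the `maxiter` value, and the `tol` threshold are illustrative choices, not recommendations from the thread.

```python
import numpy as np

def fit_poisson_nm(endog, exog, maxiter=5000):
    """Fit a Poisson GLM with Nelder-Mead instead of the default IRLS."""
    import statsmodels.api as sm  # local import: only this helper needs statsmodels
    model = sm.GLM(endog, exog, family=sm.families.Poisson())
    return model.fit(method="nm", maxiter=maxiter)

def near_zero_rows(fitted, tol=1e-8):
    """Return indices of observations whose fitted mean is below tol."""
    fitted = np.asarray(fitted, dtype=float)
    return np.flatnonzero(fitted < tol)

# Stand-in fitted values; with a real fit this would be results.fittedvalues.
example_fitted = [1.2, 1e-12, 0.0, 0.3]
suspect = near_zero_rows(example_fitted)
print(suspect)
```

With real data the two helpers would be chained as `near_zero_rows(fit_poisson_nm(endog, exog).fittedvalues)`, and the rows it returns are the ones to compare against the dummy columns.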

Josef

CCRPC engineer

Apr 22, 2020, 4:13:12 PM

to pystatsmodels

Hi Jordan,

Were you able to fix this problem? I have very dispersed data with tons of zeros and a very sparse X matrix (many dummy variables converted from categorical variables), and I am facing the same issue. I'd appreciate your help!

Thanks!
