nan deviance in binomial model

Rachel Melamed

Oct 7, 2016, 12:21:29 PM
to pystatsmodels
Hello,

I'm running a binomial regression, and I can't figure out why the fitted model's deviance is "nan":

import statsmodels.api as sm

Model = sm.GLM(endog=success_fail, exog=design, family=sm.families.Binomial()).fit()

Model.summary()
                 Generalized Linear Model Regression Results
==============================================================================
Dep. Variable:     ['success', 'fail']   No. Observations:                15351
Model:                             GLM   Df Residuals:                    14832
Model Family:                 Binomial   Df Model:                          518
Link Function:                   logit   Scale:                             1.0
Method:                           IRLS   Log-Likelihood:                    nan
Date:                 Fri, 07 Oct 2016   Deviance:                          nan
Time:                         11:13:53   Pearson chi2:                 1.89e+04
No. Iterations:                     20
==============================================================================

This is especially puzzling because when I calculate the deviance myself (as below), I get not nan but 19645.24, which seems reasonable. I am running similar models on slightly different data, and none of them have this problem, so I can't tell where the nan is coming from. I wonder if it is influencing the fit?

    import numpy as np

    # Manual binomial deviance: 2 * sum of observed * log(observed/expected),
    # with 1e-200 added to guard against log(0).
    pi = Model.fittedvalues
    success = np.array(success_fail['success'])
    fail = np.array(success_fail['fail'])
    n = np.array(success_fail.sum(axis=1))

    success_dev = success * np.log(success / (pi * n) + 1e-200)
    fail_dev = fail * np.log(fail / ((1 - pi) * n) + 1e-200)
    print(2 * (success_dev.sum() + fail_dev.sum()))

Thank you,
Rachel

josef...@gmail.com

Oct 7, 2016, 12:30:46 PM
to pystatsmodels
What version of statsmodels are you using?

Right now I don't have a guess about what's going on. The binomial deviance also uses the 1e-200 trick to avoid the corner problem, but maybe some corner case is not handled properly.
What are the min and max of your fittedvalues (i.e., mu)?

Deviance is not directly used in the estimation, so in itself it doesn't affect the other results. However, it could be that there is an underlying problem with the data that affects the deviance and independently also affects other results.


Josef

Rachel Melamed

Oct 7, 2016, 1:58:29 PM
to pystatsmodels
Thank you for the reply. 
Version: 0.6.1
Min: 2.649e-10
Max: 1.0

There is one observation where mu = 1.0, and it has [success, fail] = [1, 0].
Do you think that is the problem?
Thanks again!

Rachel Melamed

Oct 7, 2016, 2:21:50 PM
to pystatsmodels
Ah, I took out that one point and there is no more nan. OK, I guess that answers my question, sort of.
Thanks

josef...@gmail.com

Oct 7, 2016, 2:42:10 PM
to pystatsmodels
On Fri, Oct 7, 2016 at 1:58 PM, Rachel Melamed <rachel....@gmail.com> wrote:
Thank you for the reply. 
Version: 0.6.1
Min: 2.649e-10
Max: 1.0

There is one observation where mu = 1.0, and it has [success, fail] = [1, 0].
Do you think that is the problem?

Yes. When I looked at the expression in the deviance, it looked like a parenthesis is placed wrongly, so that it doesn't avoid a division by zero if mu (or pi) == 1.

Out of curiosity: how can you get a mu = 1.0?
That should theoretically only be possible if a parameter goes to infinity.
We might need to review this more for numerical precision at the 0 and 1 boundaries.

(Aside: the perfect separation check tests whether all predicted probabilities are essentially 0 or 1, so a few 0s or 1s are not ruled out.
The variance goes to zero at mu = 0 or 1, but that is handled by clipping in the varfuncs.)
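The clipping is along these lines (an illustrative sketch, not the actual statsmodels source):

    import numpy as np

    # Sketch of boundary clipping in the binomial variance function:
    # keep mu*(1 - mu) strictly positive by pulling mu away from 0 and 1.
    # The eps value here is an assumption, not the statsmodels constant.
    def binomial_variance(mu, eps=1e-10):
        p = np.clip(mu, eps, 1 - eps)
        return p * (1 - p)

    print(binomial_variance(np.array([0.0, 0.5, 1.0])))  # ~1e-10, 0.25, ~1e-10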


So this looks like a bug in the Binomial family for this corner case with mu = 1.
I don't know whether it's possible to come up with a simple test case; the immediate bugfix itself is just shifting a parenthesis, plus looking for the same problem in loglike.
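Roughly, the corner case looks like this (an illustrative sketch, not the actual statsmodels source):

    import numpy as np

    # One observation with [success, fail] = [1, 0] fitted perfectly:
    endog = np.array([1.0])  # observed proportion of successes
    mu = np.array([1.0])     # fitted probability

    # Guard outside the ratio: the 0/0 happens before 1e-200 can help.
    broken = (1 - endog) * np.log((1 - endog) / (1 - mu) + 1e-200)
    # Guard inside the ratio: the division stays finite.
    fixed = (1 - endog) * np.log((1 - endog + 1e-200) / (1 - mu + 1e-200))

    print(broken, fixed)  # [nan] [0.]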

Thanks for reporting; corner-case problems are difficult to catch.

Josef

josef...@gmail.com

Oct 7, 2016, 2:45:47 PM
to pystatsmodels
On Fri, Oct 7, 2016 at 2:21 PM, Rachel Melamed <rachel....@gmail.com> wrote:
Ah, I took out that one point and there is no more nan. OK, I guess that answers my question, sort of.

A follow-up to this: did the parameters change by more than the 3rd decimal or so?

I'm trying to figure out whether observations like this have a strong influence on the results, because their weight in the estimation (IRLS) should be pretty large. A check could look like the sketch below.
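For example (a sketch reusing the names from the first message; model_drop is a hypothetical refit with the mu == 1.0 row removed):

    import numpy as np
    import statsmodels.api as sm

    # Sketch: refit without the offending observation and compare
    # coefficients. `Model`, `success_fail`, and `design` are from the
    # first message; the mask below assumes row-aligned containers.
    keep = np.asarray(Model.fittedvalues) < 1.0
    model_drop = sm.GLM(endog=success_fail[keep], exog=design[keep],
                        family=sm.families.Binomial()).fit()
    diff = np.abs(Model.params - model_drop.params)
    print(diff.max(), (diff > 1e-3).sum())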

Josef

Rachel Melamed

Oct 7, 2016, 3:17:43 PM
to pystatsmodels
14 of the parameters changed by more than 10**-3, but none of them changed by more than 10**-2. All of my predictors are categorical, and the affected parameters are for very sparsely observed categories.
Rachel

Rachel Melamed

Oct 7, 2016, 3:21:08 PM
to pystatsmodels
I'm happy to send you the data if it will help.

josef...@gmail.com

Oct 7, 2016, 3:49:09 PM
to pystatsmodels
On Fri, Oct 7, 2016 at 3:21 PM, Rachel Melamed <rachel....@gmail.com> wrote:
I'm happy to send you the data if it will help.

Thanks, but that's not necessary; I found an old test case from when we fixed the mu = 0 corner case.

There is also already a PR for this; I don't know why it got ignored (nor why I didn't remember it at all).


The analysis of influential observations and outlier-robust estimation in GLM still has to wait, at least until we have the necessary supporting infrastructure. Reasonably reliable identification of partial separation (perfect prediction for a subset of observations) is also still missing, and I don't know much about the details there.

Josef

Rachel Melamed

Oct 12, 2016, 11:04:56 AM
to pystatsmodels
Hi Josef,
I'm not sure if this is the same issue, but I ran into a case where the parameter estimation totally failed, the deviance is again nan, and the Pearson chi2 value is extremely high:

==============================================================================
Dep. Variable:     ['success', 'fail']   No. Observations:                16069
Model:                             GLM   Df Residuals:                    15481
Model Family:                 Binomial   Df Model:                          587
Link Function:                   logit   Scale:                             1.0
Method:                           IRLS   Log-Likelihood:                    nan
Date:                 Wed, 12 Oct 2016   Deviance:                          nan
Time:                         09:55:30   Pearson chi2:                 2.58e+20
No. Iterations:                     33
==============================================================================

In this case, the mu's range all the way from 0 to 1. Maybe it is due to the sparseness of my "successes": of 16069 "groups", 6000 have zero successes. I plotted the distribution below. Just letting you know; I'm running 1000 of these regressions and only a few have this problem.
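For reference, a quick check of how many fitted values sit at the boundaries (a sketch, using the fitted model from above):

    import numpy as np

    # Sketch: count fitted probabilities that are numerically 0 or 1;
    # the 1e-10 tolerance is an arbitrary choice.
    mu = np.asarray(Model.fittedvalues)
    print((mu <= 1e-10).sum(), (mu >= 1 - 1e-10).sum())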
Thanks,
Rachel

josef...@gmail.com

Oct 16, 2016, 9:13:57 PM
to pystatsmodels
Sorry for not replying, I just saw this; I got caught up for a few days with the US presidential election. :(

Are your "groups" observations or explanatory variables?
What do you mean with "the parameter estimation totally failed"? How did it fail?

My guess is that you ran into some partial perfect prediction problem again, and maybe, as a consequence, some singular or near-singular matrices. One quick check for the latter is sketched below.
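For the singularity question (a sketch, assuming `design` from the first message):

    import numpy as np

    # Sketch: a rank deficit in the design matrix suggests unidentified
    # parameters (e.g., from perfectly collinear categorical dummies).
    X = np.asarray(design, dtype=float)
    print(np.linalg.matrix_rank(X), X.shape[1])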

Two opposite solutions:

1) The parameters are not identified or are infinite; we cannot estimate them, and we should raise an exception in those cases.

2) When we don't want an exception and just want "some" solution, we can penalize the parameters. The penalization (like a Bayesian prior) would force the parameters to be identified (by the prior information) if they are not identified by the data. If the data has enough information, then adding "a little bit of prior information" would not have any noticeable effect.
We now have glmnet-style penalized estimation; however, I guess it would be much faster for this kind of problem to have a proper Ridge-type penalization in the standard optimization. Unfortunately, that is not yet available in statsmodels. A penalized fit would look something like the sketch below.
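For example (a sketch, assuming a statsmodels version that provides GLM.fit_regularized; the parameter values are arbitrary illustrations):

    import statsmodels.api as sm

    # Sketch: penalized GLM fit. alpha sets the penalty strength;
    # L1_wt=0 selects a pure L2 (Ridge-type) penalty.
    model = sm.GLM(endog=success_fail, exog=design,
                   family=sm.families.Binomial())
    result_pen = model.fit_regularized(alpha=1e-4, L1_wt=0.0)
    print(result_pen.params)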

For deciding between 1) and 2):
What is the purpose of the 1000 regressions? What kind of solution would help in cases where the data is "messed up"?

Rachel Melamed

Oct 20, 2016, 4:38:16 PM
to pystatsmodels
Better not to think about the election when you have nice Python packages to think about!
The groups are settings of the explanatory variables. The parameters are not inf, but some of them are enormous (10**36), and obviously the fit to the data is very bad; that's what I mean by "failed". If I add a couple more parameters to the model, the fit is fine (in terms of deviance, etc.). I will definitely add some analysis with other packages for comparison.

The 1000 regressions are to model 1000 different/independent output variables.  
Thanks!