"SVD did not converge" for GLM - no NaNs


R S

Jan 14, 2015, 4:45:01 PM1/14/15
to pystat...@googlegroups.com
Hey,

I'm experiencing the following issue:

In [552]: glm_binom = sm.GLM(endog, exog, family=sm.families.Binomial())                                                                                                                        

In [553]: glm_binom.fit()
---------------------------------------------------------------------------
LinAlgError                               Traceback (most recent call last)
<ipython-input-553-814e0115c842> in <module>()
----> 1 glm_binom.fit()

....
    101 def get_linalg_error_extobj(callback):

LinAlgError: SVD did not converge

I have seen that this is a common issue when there are NaNs, but that is not the case here. The data is attached for debugging. I'm not sure how to proceed. Thanks!
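
For reference, a minimal version of the check I mean (a sketch, assuming the attached files load with numpy.loadtxt):

import numpy as np

exog = np.loadtxt("exog.txt")
endog = np.loadtxt("endog.txt")

# Both should print True if the inputs are clean (no NaNs, no infs).
print(np.isfinite(exog).all())
print(np.isfinite(endog).all())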

R

exog.txt
endog.txt

josef...@gmail.com

Jan 14, 2015, 5:27:40 PM1/14/15
to pystatsmodels
It's still possible that the logit transformation introduces NaNs during the optimization.
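
A minimal sketch (plain numpy, not statsmodels internals) of how the logit link blows up at the boundary:

import numpy as np

mu = np.array([0.0, 0.5, 1.0])  # fitted probabilities hitting the boundary
with np.errstate(divide="ignore"):
    eta = np.log(mu / (1.0 - mu))  # logit link
print(eta)  # [-inf   0.  inf]

Once an inf like that enters the IRLS working weights, the SVD in the weighted least squares step can fail even though the input data itself contains no NaNs.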

Which statsmodels version are you using? IIRC we had a change for a corner case like this recently.


If I read your data into pandas correctly, I don't get the SVD failure; the fit finishes, but the results look a bit strange.
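
Roughly how I loaded the files (a guess, assuming whitespace-separated text; if the delimiter or header guess is wrong here, that could explain the difference):

import pandas as pd
import statsmodels.api as sm

exog = pd.read_csv("exog.txt", delim_whitespace=True, header=None)
endog = pd.read_csv("endog.txt", delim_whitespace=True, header=None)

# endog has two columns (successes, failures), hence "[0, 1]" below
res = sm.GLM(endog, exog, family=sm.families.Binomial()).fit()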

>>> print(res.summary())
                 Generalized Linear Model Regression Results                  
==============================================================================
Dep. Variable:                 [0, 1]   No. Observations:                   36
Model:                            GLM   Df Residuals:                       26
Model Family:                Binomial   Df Model:                            9
Link Function:                  logit   Scale:                             1.0
Method:                          IRLS   Log-Likelihood:                    nan
Date:                Wed, 14 Jan 2015   Deviance:                          nan
Time:                        17:13:01   Pearson chi2:                 1.98e+18
No. Iterations:                    13                                        
==============================================================================
                 coef    std err          z      P>|z|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
const       2.545e+14   2.44e+07   1.04e+07      0.000      2.54e+14  2.54e+14
x1          4.829e+14   8.76e+05   5.52e+08      0.000      4.83e+14  4.83e+14
x2         -3.812e+14   2.75e+06  -1.39e+08      0.000     -3.81e+14 -3.81e+14
x3         -1.647e+13   3.86e+04  -4.27e+08      0.000     -1.65e+13 -1.65e+13
x4          1.631e+12   5.22e+04   3.12e+07      0.000      1.63e+12  1.63e+12
x5          8.522e+12   7.05e+04   1.21e+08      0.000      8.52e+12  8.52e+12
x6          1.235e+11    423.555   2.92e+08      0.000      1.23e+11  1.23e+11
x7          1.021e+11    344.035   2.97e+08      0.000      1.02e+11  1.02e+11
x8         -1.074e+11    782.263  -1.37e+08      0.000     -1.07e+11 -1.07e+11
x9         -5.571e+10    577.962  -9.64e+07      0.000     -5.57e+10 -5.57e+10
==============================================================================



It looks like a perfect prediction case. We warn or raise in discrete Logit, but I guess we don't have a check for it in GLM. I don't know whether Binomial with counts can have a perfect prediction problem, though; I've never heard of it.

>>> res.fittedvalues.values
array([ 0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.,
        0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
        0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.])
>>> res.model.endog
array([ 0.        ,  0.        ,  0.05645161,  0.        ,  0.0546875 ,
        0.        ,  0.00234742,  0.        ,  0.        ,  0.        ,
        0.        ,  0.        ,  0.        ,  0.        ,  0.00409836,
        0.        ,  0.        ,  0.01744186,  0.04268293,  0.        ,
        0.03846154,  0.5       ,  0.        ,  0.04545455,  0.02325581,
        0.        ,  0.        ,  0.        ,  0.01639344,  0.        ,
        0.        ,  0.        ,  0.        ,  0.        ,  0.        ,  0.        ])

Or something else is strange in this case.
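
A hand-rolled separation check in that spirit (a sketch; discrete Logit does something similar internally, GLM currently does not):

import numpy as np

# Fitted probabilities pinned to the 0/1 boundary for every observation
# are a symptom of (quasi-)perfect separation.
fitted = np.asarray(res.fittedvalues)
pinned = (fitted < 1e-10) | (fitted > 1 - 1e-10)
print(pinned.all())  # True here: every fitted value sits on the boundary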


Josef




R S

Jan 15, 2015, 7:34:00 AM1/15/15
to pystat...@googlegroups.com
I was using version 0.5.0 (the default in Anaconda). I updated to 0.6.1, ran into this issue, downgraded scipy to 0.14, and then it ran without crashing.
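
In case it helps, the versions in play can be confirmed with:

import scipy
import statsmodels

print(statsmodels.__version__)  # 0.6.1 after the update
print(scipy.__version__)        # 0.14.x after the downgrade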
Thanks!

josef...@gmail.com

Jan 15, 2015, 8:15:06 AM1/15/15
to pystatsmodels
On Thu, Jan 15, 2015 at 7:34 AM, R S <reg...@gmail.com> wrote:
> I was using version 0.5.0 (which is the default in anaconda). I updated to 0.6.1, ran into this issue, downgraded scipy to 0.14, and it ran without crashing.

Do you get the same or similar numbers (parameter estimates and so on) as I did?

My impression is still that those numbers are "useless" and we should find out which corner case this is hitting.
Maybe the parameters are not identified, there are convergence problems, or something else.
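
One quick diagnostic for the identification side would be the condition number of the design matrix (a sketch, assuming exog is still the loaded array):

import numpy as np

# A huge condition number means near-collinear columns, i.e. the
# parameters are barely identified and IRLS can produce wild estimates.
print(np.linalg.cond(exog))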

Josef

R S

Jan 15, 2015, 8:20:22 AM1/15/15
to pystat...@googlegroups.com
This is what I'm getting:

(14:54:04) In [10]: exog = loadtxt("exog.txt")                                                                                                            
(15:17:11) In [11]: endog = loadtxt("endog.txt")

(15:17:14) In [12]: glm_binom = sm.GLM(endog, exog, family=sm.families.Binomial())                                                                                                           
(15:17:45) In [13]: print glm_binom.fit().summary()                                                                                                                                             
                 Generalized Linear Model Regression Results                  
==============================================================================
Dep. Variable:           ['y1', 'y2']   No. Observations:                   36
Model:                            GLM   Df Residuals:                       26
Model Family:                Binomial   Df Model:                            9
Link Function:                  logit   Scale:                             1.0
Method:                          IRLS   Log-Likelihood:                -28.227
Date:                Thu, 15 Jan 2015   Deviance:                       26.371
Time:                        15:18:02   Pearson chi2:                     3.19
No. Iterations:                    20                                         
==============================================================================
                 coef    std err          z      P>|z|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
const        208.4345    603.534      0.345      0.730      -974.469  1391.339
x1             4.9253     12.208      0.403      0.687       -19.001    28.852
x2           -29.0834     81.505     -0.357      0.721      -188.830   130.664
x3            -0.0937      0.338     -0.277      0.782        -0.756     0.569
x4            -0.1737      0.372     -0.466      0.641        -0.904     0.556
x5             1.1037      2.992      0.369      0.712        -4.761     6.969
x6            -0.0007      0.001     -0.575      0.565        -0.003     0.002
x7             0.0044      0.010      0.433      0.665        -0.016     0.024
x8            -0.0006      0.006     -0.105      0.916        -0.011     0.010
x9            -0.0122      0.032     -0.378      0.705        -0.076     0.051
==============================================================================

It looks much better...

josef...@gmail.com

Jan 15, 2015, 8:25:18 AM1/15/15
to pystatsmodels
Yes, that looks much better, and the numbers look reasonable.
I guess I messed up in my pandas data handling.

Thanks for the feedback.

Josef

Skipper Seabold

Jan 15, 2015, 8:48:50 AM1/15/15
to pystat...@googlegroups.com

FWIW, I also got wild numbers in the solution when I tried this on master yesterday.

Skipper
