Re: [pystatsmodels] Need help by how to deal with warnings from OLSInfluence()


josef...@gmail.com

Oct 1, 2017, 9:15:44 AM
to pystatsmodels


On Sun, Oct 1, 2017 at 8:13 AM, Henrik Eckermann <henrik.ec...@gmail.com> wrote:
Hey guys,

I am still a beginner when it comes to data analysis, especially in Python. I am practicing linear regression. The DV is continuous; the IVs are:
1. gender (either 0 or 1)
2. DISTRESS (scale from 1-5)
3. FEAR (scale from 1-5)
4. Interaction between FEAR and gender

I want to look at some influence statistics and therefore used the following command:
test = OLSInfluence(results).summary_frame()

If I do this without the interaction term, there is no problem, but when the interaction term is included, I get the warnings shown below.

There are several different warnings. I wonder: what can and should I do now? Can I use the values that are returned despite the warnings?

/Users/henrikeckermann/anaconda3/lib/python3.6/site-packages/statsmodels/stats/outliers_influence.py:309: RuntimeWarning: invalid value encountered in sqrt
  return self.results.resid / sigma / np.sqrt(1 - hii)
/Users/henrikeckermann/anaconda3/lib/python3.6/site-packages/scipy/stats/_distn_infrastructure.py:879: RuntimeWarning: invalid value encountered in greater
  return (self.a < x) & (x < self.b)
/Users/henrikeckermann/anaconda3/lib/python3.6/site-packages/scipy/stats/_distn_infrastructure.py:879: RuntimeWarning: invalid value encountered in less
  return (self.a < x) & (x < self.b)
/Users/henrikeckermann/anaconda3/lib/python3.6/site-packages/scipy/stats/_distn_infrastructure.py:1818: RuntimeWarning: invalid value encountered in less_equal
  cond2 = cond0 & (x <= self.a)
/Users/henrikeckermann/anaconda3/lib/python3.6/site-packages/statsmodels/stats/outliers_influence.py:323: RuntimeWarning: invalid value encountered in sqrt
  dffits_ = self.resid_studentized_internal * np.sqrt(hii / (1 - hii))
/Users/henrikeckermann/anaconda3/lib/python3.6/site-packages/statsmodels/stats/outliers_influence.py:352: RuntimeWarning: invalid value encountered in sqrt
  dffits_ = self.resid_studentized_external * np.sqrt(hii / (1 - hii))


Hi Henrik,

You need to look at your `OLSInfluence(results).summary_frame()` dataframe and see whether there are infs or nans for some observations. I guess there are.
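A minimal sketch of that check, assuming pandas and numpy are available. The helper name `nonfinite_rows` is made up for illustration, and the toy frame only imitates a few column names of the summary frame with invented numbers:

```python
import numpy as np
import pandas as pd


def nonfinite_rows(frame: pd.DataFrame) -> pd.DataFrame:
    """Return the rows of `frame` that contain at least one nan or +/-inf."""
    numeric = frame.select_dtypes(include="number")
    bad = ~np.isfinite(numeric).all(axis=1)
    return frame.loc[bad]


# Toy stand-in for OLSInfluence(results).summary_frame();
# the values are invented, not taken from Henrik's data.
toy = pd.DataFrame(
    {
        "hat_diag": [0.20, 1.00, 0.35],
        "student_resid": [0.5, np.nan, -1.2],
        "dffits": [0.25, np.inf, -0.88],
    }
)

flagged = nonfinite_rows(toy)
print(flagged)
```

With a fitted model, the same helper could be called as `nonfinite_rows(OLSInfluence(results).summary_frame())` to see which observations are affected.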

Statsmodels, either directly or indirectly through calls to numpy or scipy, issues many warnings in intermediate stages that often don't affect the final results. We haven't cleaned up enough in those cases.

However, in this case the warnings are for the final result, AFAICS.

Two possible reasons:

- Singular design:
  It could be that including the interaction term produces some perfectly correlated columns. Check the condition number and the warning message in print(results.summary()) to see if this is the case. The warning message at the end of the summary should show the smallest eigenvalue if it is close to zero.

- Perfect prediction:
  I guess that's less likely in this case, but it often shows up as a problem in other models like Logit. To check this you can look at the residuals directly, e.g. `np.min(np.abs(results.resid))` or some pandas equivalent like describe.
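Both checks can also be sketched on the design matrix itself with plain numpy. The data below are synthetic and only borrow the variable names from the question; `design_report` is a hypothetical helper, and the redundant column is added on purpose to show what a singular design looks like:

```python
import numpy as np

n = 40
i = np.arange(n)
const = np.ones(n)
gender = (i % 2).astype(float)               # 0/1 dummy
fear = (i % 5 + 1).astype(float)             # scale 1-5
distress = ((i // 2) % 5 + 1).astype(float)  # scale 1-5

# Full-rank design: constant, main effects, one interaction column.
X_ok = np.column_stack([const, distress, gender, fear, gender * fear])
# Singular design: the extra column is an exact sum of two existing columns.
X_bad = np.column_stack([X_ok, distress + fear])


def design_report(X):
    """Rank, condition number and smallest eigenvalue of X'X."""
    eigvals = np.linalg.eigvalsh(X.T @ X)
    return {
        "n_cols": X.shape[1],
        "rank": int(np.linalg.matrix_rank(X)),
        "cond": float(np.linalg.cond(X)),
        "min_eig": float(eigvals.min()),
    }


report_ok = design_report(X_ok)
report_bad = design_report(X_bad)
print(report_ok)
print(report_bad)

# Perfect-prediction check: the smallest absolute residual of a least-squares fit.
y = X_ok @ np.array([1.0, 0.5, -0.3, 0.8, 0.2]) + 0.5 * np.sin(i)
beta, *_ = np.linalg.lstsq(X_ok, y, rcond=None)
resid = y - X_ok @ beta
min_abs_resid = np.min(np.abs(resid))
print(min_abs_resid)
```

A rank below the number of columns, together with a very large condition number, points to the singular-design case. On a fitted statsmodels OLS result, `results.condition_number` and `results.eigenvals` expose comparable quantities.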

Getting a test case for this would be helpful to investigate. Can you construct an example with data that you can make public?
Otherwise, it might be possible to construct an artificial example once we have figured out the source of the warnings.

It looks like `1 - hii` is negative for some observations, so that np.sqrt(1 - hii) returns nan, but I don't know right now what would cause this. (hii is the diagonal of the hat matrix.)
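As an illustration only, not a diagnosis of the case above: the sketch below computes the hat diagonal with numpy for a small synthetic design in which one observation has its own dummy column. Such an observation has leverage 1 mathematically, so `1 - hii` is zero up to rounding error and can come out slightly negative in floating point. The helper name `hat_diag` and the tolerance are assumptions made for this example.

```python
import numpy as np


def hat_diag(X):
    """Diagonal of the hat matrix H = X pinv(X), without forming all of H."""
    return np.einsum("ij,ji->i", X, np.linalg.pinv(X))


n = 6
const = np.ones(n)
x = np.arange(n, dtype=float)
lonely = np.zeros(n)
lonely[-1] = 1.0  # dummy that is 1 for the last observation only

X = np.column_stack([const, x, lonely])
hii = hat_diag(X)

# Observations whose leverage is numerically 1 (tolerance chosen arbitrarily).
suspect = np.flatnonzero(1.0 - hii < 1e-8)
print(hii)
print(suspect)
```

For such observations, dividing by `np.sqrt(1 - hii)` yields inf or nan, which would then show up in the influence measures.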

Josef
 

Thanks for any help!
Henrik

Henrik Eckermann

Oct 1, 2017, 9:20:16 AM
to pystat...@googlegroups.com
Josef,

thank you very much. I deleted the topic just when you wanted to answer.

I made the following mistake:

I wrote: 'ANXIETY ~ DISTRESS + GENDER + FEAR + INTERACTION'

When I do not calculate the interaction myself beforehand and instead just write:
 'ANXIETY ~ DISTRESS + GENDER*FEAR'

everything worked fine.

Thank you all the same, Josef. Your answer helped me to understand the problem and will also be helpful in the future!
Henrik

josef...@gmail.com

Oct 1, 2017, 9:29:11 AM
to pystatsmodels
On Sun, Oct 1, 2017 at 9:20 AM, Henrik Eckermann <henrik.ec...@gmail.com> wrote:
Josef,

thank you very much. I deleted the topic just when you wanted to answer.

I made the following mistake:

I wrote: 'ANXIETY ~ DISTRESS + GENDER + FEAR + INTERACTION'

When I do not calculate the interaction myself beforehand and instead just write:
 'ANXIETY ~ DISTRESS + GENDER*FEAR'

everything worked fine.

Good, that most likely means that you had a singular design matrix and ran into the "dummy variable trap".
If a hand-built interaction term doesn't drop enough reference columns, then it is perfectly collinear with the constant and the main effects.
patsy drops the required reference columns automatically when it builds the interaction from the formula.
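The thread does not show how INTERACTION was computed, so the following is only one hypothetical way to end up in the trap: building one FEAR column per gender level while also keeping the FEAR main effect. The data are synthetic and the column names are invented for the sketch.

```python
import numpy as np

n = 20
i = np.arange(n)
const = np.ones(n)
gender = (i % 2).astype(float)     # 0/1 dummy
fear = (i % 5 + 1).astype(float)   # scale 1-5

# Hand-built interaction with one column per gender level.
fear_g0 = fear * (gender == 0)
fear_g1 = fear * (gender == 1)

# Trap: fear == fear_g0 + fear_g1, so the five columns are linearly dependent.
X_trap = np.column_stack([const, gender, fear, fear_g0, fear_g1])

# Reference coding: keep only the column for gender == 1.
X_ref = np.column_stack([const, gender, fear, fear_g1])

rank_trap = int(np.linalg.matrix_rank(X_trap))
rank_ref = int(np.linalg.matrix_rank(X_ref))
print(rank_trap, X_trap.shape[1])
print(rank_ref, X_ref.shape[1])
```

In the formula interface, `GENDER*FEAR` expands to `GENDER + FEAR + GENDER:FEAR`, which corresponds to the reference-coded design here when both variables are numeric.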

However, patsy does not remove columns for empty cells, so it's still possible to run into problems with patsy if some interaction cells (combinations of levels of the categoricals) don't have any observations.
This is worked around in the parameter estimation in OLS but might cause problems in some post-estimation results. (I never checked those cases.)

Josef