VIF results


Hector Plata

Jan 13, 2016, 3:43:11 PM
to pystatsmodels
Hi, I'm new to Python and the statsmodels library. I'm currently trying to replicate a project I did for my econometrics class (for which I used Stata) with statsmodels. I have a problem when calculating the VIF with statsmodels.stats.outliers_influence.variance_inflation_factor: the results are not the same as, or even close to, the ones obtained with Stata. The code I used is shown below. The variable vifResults stores the VIF from the statsmodels function, which is the value that diverges from Stata's; the variable vif computes it in a "rudimentary" way, which turns out to be the more accurate of the two.

vif_stata = 2.49
vifResults = 1.51828167883
vif = 2.48617180511

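import statsmodels.formula.api as smf
import statsmodels.stats.outliers_influence as oi

# VIF using the statsmodels function (vifVar is a pandas DataFrame, i a column index);
# .values replaces as_matrix(), which newer pandas versions no longer provide
vifResults = oi.variance_inflation_factor(vifVar.values, i)

# VIF using the OLS R-squared ('formula' stands for the actual formula string)
vif = 1 / (1 - smf.ols(formula='formula', data=vifVar).fit().rsquared)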


I also want to mention that this is not an index problem: the formula passed to ols is consistent with the index 'i' passed to variance_inflation_factor.

I apologize for my English, since it is not my first language, and I appreciate any help that can shed some light on why I'm getting different values.

josef...@gmail.com

Jan 13, 2016, 3:48:10 PM
to pystatsmodels
I need to look at this again later.

My quick guess:
There is a "problem" with the constant, because I assumed that the vif
uses the exog array of a model (the design matrix).

Try to add a constant (column of ones) to vifVar, and see if the
results then match.

"Standalone" vif with explicit constant handling is in a PR but not
yet merged, AFAIR.

Josef

Hector Plata

Jan 13, 2016, 4:09:12 PM
to pystatsmodels
Thanks for the quick response. You were right about the vifVar DataFrame missing the constant (a column of 1's); adding it resolves my problem!
I didn't quite get what "design matrix" meant when I read the documentation, and that's what led to my error. Thanks again!

Babak Ghalebi

Apr 28, 2016, 7:04:01 PM
to pystatsmodels
After facing a similar problem myself, I looked into the source code of "variance_inflation_factor" and found the issue: the OLS function doesn't include an intercept term by default. Therefore the regression that is run in "variance_inflation_factor" is without an intercept term which will boost the R-squared which consequently will boost up the VIF. The source code of "variance_inflation_factor" should be edited to include an intercept term in the OLS function call.

-Babak 
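
The effect of the missing intercept can be illustrated with a NumPy-only sketch. This is not the statsmodels source: the helper vif_from_aux_regression and the synthetic data are hypothetical. It assumes that, when the auxiliary regression has no intercept, the R-squared is computed against the uncentered total sum of squares, which is how statsmodels reports R-squared for models without a constant.

```python
import numpy as np

def vif_from_aux_regression(exog, idx, add_intercept):
    """VIF of column `idx`, from regressing it on the remaining columns.

    Hypothetical helper for illustration only.
    """
    exog = np.asarray(exog, dtype=float)
    y = exog[:, idx]
    X = np.delete(exog, idx, axis=1)
    if add_intercept:
        X = np.column_stack([np.ones(len(y)), X])

    # Least-squares fit of the auxiliary regression.
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    ssr = resid @ resid

    # Centered total sum of squares with an intercept, uncentered without one.
    if add_intercept:
        tss = np.sum((y - y.mean()) ** 2)
    else:
        tss = y @ y

    r_squared = 1.0 - ssr / tss
    return 1.0 / (1.0 - r_squared)

# Synthetic data with nonzero means (made up for illustration).
rng = np.random.default_rng(1)
n = 500
a = rng.normal(loc=3.0, size=n)
b = 0.5 * a + rng.normal(loc=1.0, size=n)
c = rng.normal(loc=-4.0, size=n)
data = np.column_stack([a, b, c])

vif_centered = vif_from_aux_regression(data, 0, add_intercept=True)
vif_uncentered = vif_from_aux_regression(data, 0, add_intercept=False)
print("with intercept:", vif_centered)
print("without intercept:", vif_uncentered)
```

The two values generally differ, and the direction of the difference depends on the data: in the first post of this thread, the value computed without the constant was the lower one.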

josef...@gmail.com

Apr 28, 2016, 7:08:27 PM
to pystatsmodels
On Thu, Apr 28, 2016 at 7:03 PM, Babak Ghalebi <babak....@gmail.com> wrote:
After facing a similar problem myself, I looked into the source code of "variance_inflation_factor" and found the issue: the OLS function doesn't include an intercept term by default. Therefore the regression that is run in "variance_inflation_factor" is without an intercept term which will boost the R-squared which consequently will boost up the VIF. The source code of "variance_inflation_factor" should be edited to include an intercept term in the OLS function call.

Hi Babak,

with a link to the PR that is supposed to add the related enhancements with new functions.

IIRC, I want to keep the behavior of the current variance_inflation_factor for use with regression models.

Josef