Issue with variance inflation factor function

Alex Wu

Aug 21, 2017, 8:40:55 AM
to pystatsmodels
Hi there,

Thanks for all the work that goes into providing the statsmodels package for Python. While using it in our teaching sessions on regression models, we spotted a discrepancy that we hope you can fix.

The problem comes from the different default settings in statsmodels.regression.linear_model.OLS and in statsmodels.ols (the formula interface, statsmodels.formula.api.ols).

The first one, which the variance inflation factor function (statsmodels.stats.outliers_influence.variance_inflation_factor) also uses, fits no intercept by default, while the second one includes an intercept by default. This causes confusion, because the VIF function gives a different result from a step-by-step calculation with statsmodels.ols.

One possible fix for the VIF function is:

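import numpy as np
import statsmodels.api as sm

def vif_RG(exog, exog_idx):
    # VIF of column exog_idx, with an intercept in the auxiliary regression
    k_vars = exog.shape[1]
    x_i = exog[:, exog_idx]
    # Select every column except the one being examined
    mask = np.arange(k_vars) != exog_idx
    x_noti = exog[:, mask]
    # Add the intercept that sm.OLS does not include by default
    x_noti = sm.add_constant(x_noti)
    r_squared_i = sm.OLS(x_i, x_noti).fit().rsquared
    vif = 1. / (1. - r_squared_i)
    return vif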


Could you give us some feedback on this once you have decided whether to change it or take some other action?


Many thanks,

Alex

josef...@gmail.com

Aug 21, 2017, 9:13:52 AM
to pystatsmodels
This is intended.
The function was written together with the other outlier and influence measures as a post-estimation diagnostic for the specific regression problem at hand. If the user changes the design matrix, then the VIF will change.

e.g.

However, over time I realized that VIF is also used as a standalone multicollinearity measure, either before deciding on a specific design matrix or as a measure for the underlying data.

VIF then needs to work as a standalone function and have options for demeaning, scaling, or standardizing. I wrote a PR for this; I don't remember its status, but it would be helpful if you could review it and comment on whether it provides what you need.
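For reference, a standalone, standardized VIF can be sketched with NumPy alone. This is not the code from the PR; the function name vif_from_correlation is hypothetical, and the sketch relies on the standard identity that, for regressions with an intercept, the VIFs are the diagonal of the inverse correlation matrix of the regressors.

```python
import numpy as np

def vif_from_correlation(data):
    """VIFs for the columns of `data` (no constant column expected).

    Hypothetical helper: uses the diagonal of the inverse correlation
    matrix, which implicitly demeans and scales every column.
    """
    corr = np.corrcoef(np.asarray(data, dtype=float), rowvar=False)
    return np.diag(np.linalg.inv(corr))

# Made-up data: x2 depends on x1, x3 is independent of both.
rng = np.random.default_rng(2)
x1 = rng.normal(10.0, 2.0, 500)
x2 = 0.5 * x1 + rng.normal(0.0, 1.0, 500)
x3 = rng.normal(-3.0, 1.0, 500)
data = np.column_stack([x1, x2, x3])

vifs = vif_from_correlation(data)
print(vifs)
```

Because the correlation matrix is unchanged when a column is shifted or rescaled by a positive factor, the result does not depend on whether the data were demeaned or standardized beforehand.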


Josef


 

