LASSO and ridge regression in statsmodels


Josh Wasserstein

Sep 5, 2014, 6:38:24 AM
to pystat...@googlegroups.com
Hi,

I searched but could not find any references to LASSO or ridge regression in statsmodels. Are they not currently included? If so, is it by design (e.g. sklearn includes it) or for other reasons (time)?

Is there any way to plug these models from sklearn into statsmodels and get the same level of analysis of the results?

Thanks again for this great package!

Josh

Kevin Sheppard

Sep 5, 2014, 7:11:00 AM
to pystat...@googlegroups.com
Look at scikit-learn.

josef...@gmail.com

Sep 5, 2014, 8:08:48 AM
to pystatsmodels
On Fri, Sep 5, 2014 at 7:11 AM, Kevin Sheppard <kevin.k....@gmail.com> wrote:
Look at scikit-learn.




On Friday, September 5, 2014 11:38:24 AM UTC+1, Josh Wasserstein wrote:
Hi,

I searched but could not find any references to LASSO or ridge regression in statsmodels. Are they not currently included? If so, is it by design (e.g. sklearn includes it) or for other reasons (time)?


Our policy has been largely to leave this to scikit-learn. That said, statsmodels has had fit_regularized for discrete models for quite some time now; those mostly cover models that scikit-learn does not.
scikit-learn has a lot more of the heavy-duty regularized methods (with compiled packages and Cython extensions) that we will not get in statsmodels.

Kerby has recently started to add L1-regularized fit to other models.

I have a PR that implements a generalized version of Ridge regression that allows shrinkage to a full prior. (A bit similar to Bayesian multilevel models.)
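The idea of shrinking toward a full prior (rather than toward zero, as in ordinary ridge) can be sketched in closed form. This is only an illustration of the concept, not the PR's implementation; the penalty weight lam and prior mean b0 are made up for the example:

```python
# Sketch of generalized ridge: minimize ||y - Xb||^2 + lam * ||b - b0||^2,
# which shrinks the estimate toward a prior mean b0 instead of zero.
# Illustration only, not the statsmodels PR's actual code.
import numpy as np

rng = np.random.default_rng(1)
n, k = 100, 3
X = rng.normal(size=(n, k))
beta_true = np.array([2.0, -1.0, 0.5])
y = X @ beta_true + rng.normal(scale=0.5, size=n)

lam = 10.0
b0 = np.array([1.0, 0.0, 0.0])  # hypothetical prior mean

# Closed-form solution of the penalized least-squares problem:
beta_hat = np.linalg.solve(X.T @ X + lam * np.eye(k),
                           X.T @ y + lam * b0)
print(beta_hat)
```

As lam grows, beta_hat is pulled all the way to b0, which is what makes this a bit similar to the partial pooling in Bayesian multilevel models.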


One thing that I'm still uncertain about is whether the standard errors are really reliable under L1 penalization if we want to do statistical inference on the model.
(L2 is easier because the penalty is differentiable and the standard theory applies.)


We also had discussions in various github issues, and I expect that we will have some more general coverage of penalized models in the near future.


Is there any way to plug these models from sklearn into statsmodels and get the same level of analysis of the results?

If you do "feature selection" with scikit-learn, you can then plug the reduced model into statsmodels; however, the results statistics (covariance of parameters and inference) will not take the feature selection or data mining into account.


One use case that I'd like to cover is when we have, say, five explanatory variables whose estimated parameters we are interested in, plus hundreds of "confounders": explanatory variables that might have an effect but that we are not really interested in themselves.

Josef

Kerby Shedden

Sep 5, 2014, 2:58:57 PM
to pystat...@googlegroups.com
Here are two examples of the lasso for logistic regression (needs current statsmodels master):

http://nbviewer.ipython.org/urls/umich.box.com/shared/static/ck0n67gt1sxaiwj9bp2c.ipynb

http://nbviewer.ipython.org/urls/umich.box.com/shared/static/az63gav7ly7y7jbxe9zd.ipynb

In master, RegressionModel.fit_regularized fits a linear model using the lasso.  It should work with OLS and GLS but GLS is untested.  Unfortunately no example at present.

Kerby Shedden

Sep 8, 2014, 7:19:42 AM
to pystat...@googlegroups.com
Here is a notebook demonstrating the lasso for linear models:

http://nbviewer.ipython.org/urls/umich.box.com/shared/static/rg4sbfag376a5ffbhs47.ipynb

Josh Wasserstein

Sep 8, 2014, 8:09:19 AM
to pystat...@googlegroups.com
Great resources. Thank you Kerby!

Josh