Linear model search with BIC


Warren Kretzschmar

Oct 30, 2016, 3:25:38 PM
to pystatsmodels
Hi,
Has anyone implemented a model search based on information criteria like AIC or BIC in statsmodels? Something along the lines of stepAIC in R?

Cheers,
Warren

josef...@gmail.com

Oct 30, 2016, 3:53:46 PM
to pystatsmodels

We also use brute-force methods in some models, mainly in tsa for lag selection, e.g. for adfuller, ARMA, VAR, ...

One problem is that stepwise regression has the reputation of not working very well, and then it's difficult to get excited about it. (For example, it doesn't stay high on my personal priorities for long enough to go beyond some experiments and helper functions.)
There are some computational speedups possible, especially for the linear model, that are currently not available, AFAIK.

For everything except small problems or small predefined sequences of models, as in tsa, I would use adaptive LASSO, SCAD, or similar penalized estimators.

A related issue for stepwise regression and similar methods is how to handle variables that span multiple columns, e.g. categorical variables, interactions, or polynomials.

What's the size of your model selection, and do you have categorical variables?

Josef

josef...@gmail.com

Oct 30, 2016, 4:42:36 PM
to pystatsmodels

Based on a Google search, I found an old branch of mine (which I had completely forgotten about; there is no PR) that tries to use the sweep algorithm and sequential QR to speed up the search.


:(
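
To show the idea behind the sweep approach, here is a minimal numpy sketch. It is not the code from that branch. It assumes the textbook sweep operator applied to the cross-product matrix of `[X, y]`: after sweeping on a set of columns, the bottom-right element is the residual sum of squares of that model, and the RSS after adding any further column can be read off without refitting. BIC is computed only up to an additive constant, which does not affect comparisons.

```python
import numpy as np

def sweep(A, k):
    """Sweep the symmetric matrix A on pivot k (returns a new matrix)."""
    d = A[k, k]
    row = A[k, :] / d
    col = A[:, k].copy()
    B = A - np.outer(col, row)
    B[k, :] = row
    B[:, k] = col / d
    B[k, k] = -1.0 / d
    return B

def forward_sweep_bic(X, y):
    """Forward search by BIC; column 0 of X is a constant and is always kept."""
    n, p = X.shape
    Z = np.column_stack([X, y])
    A = sweep(Z.T @ Z, 0)                       # cross-products, intercept swept in
    selected = [0]
    bic = n * np.log(A[p, p] / n) + np.log(n)   # BIC up to an additive constant
    while len(selected) < p:
        rest = [j for j in range(p) if j not in selected]
        # RSS after adding column j, read off the swept matrix (no refit).
        rss_new = {j: A[p, p] - A[j, p] ** 2 / A[j, j] for j in rest}
        j = min(rss_new, key=rss_new.get)
        bic_new = n * np.log(rss_new[j] / n) + (len(selected) + 1) * np.log(n)
        if bic_new >= bic:
            break
        A, bic = sweep(A, j), bic_new
        selected.append(j)
    return selected, A[p, p]

# Simulated data (illustrative): columns 1 and 4 matter.
rng = np.random.default_rng(3)
n = 200
X = np.column_stack([np.ones(n), rng.standard_normal((n, 6))])
y = 1.0 + 2.0 * X[:, 1] - 1.5 * X[:, 4] + rng.standard_normal(n)
selected, rss = forward_sweep_bic(X, y)

# Cross-check the swept RSS against an ordinary least-squares fit.
beta, *_ = np.linalg.lstsq(X[:, selected], y, rcond=None)
rss_check = np.sum((y - X[:, selected] @ beta) ** 2)
print(selected, rss, rss_check)
```

Working on cross-products is fast but less numerically stable than QR when the columns are nearly collinear, which is presumably where a sequential QR variant would come in.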

There are other efficient exact algorithms, for example for selecting the best subset of k variables, but AFAIR those are also limited to just a few variables (e.g. best 5 out of 15).
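
For a sense of scale, here is a minimal sketch of best-k subset selection by plain enumeration. This is the naive baseline, not one of the efficient exact algorithms; the function name `best_k_subset` and the simulated data are made up for illustration. The number of fits grows as "p choose k", for example 3003 fits for the best 5 out of 15.

```python
import itertools
import math
import numpy as np

def best_k_subset(X, y, k):
    """Return the k columns of X with the smallest RSS (intercept always included)."""
    n, p = X.shape
    const = np.ones((n, 1))
    best_rss, best_cols = np.inf, None
    for cols in itertools.combinations(range(p), k):
        Xs = np.hstack([const, X[:, list(cols)]])
        beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
        rss = float(np.sum((y - Xs @ beta) ** 2))
        if rss < best_rss:
            best_rss, best_cols = rss, cols
    return best_cols, best_rss

# Simulated data (illustrative): columns 0, 5 and 7 matter.
rng = np.random.default_rng(4)
n, p, k = 100, 10, 3
X = rng.standard_normal((n, p))
y = X[:, 0] - 2.0 * X[:, 5] + 1.5 * X[:, 7] + rng.standard_normal(n)

print("subsets to fit:", math.comb(p, k))
cols, rss = best_k_subset(X, y, k)
print(cols, rss)
```

Since all subsets of the same size have the same penalty term, comparing RSS within a fixed k is equivalent to comparing AIC or BIC; the criteria only matter when choosing between different k.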

Josef