fit a model to data using ols with constraints


bsdz

Oct 31, 2013, 7:49:42 AM
to pystat...@googlegroups.com
Is it possible to fit a model to a dataframe where the coefficients satisfy certain constraints?

For example, say in the model: 
mod = sm.ols(formula='Lottery ~ Literacy + Wealth + Region', data=df)
Is it possible to constrain the coefficients of Literacy and Wealth to be greater than zero, say, while those of Region are negative?

Thanks
Blair

Skipper Seabold

Oct 31, 2013, 7:56:30 AM
to pystat...@googlegroups.com
Not yet in an easy and general way (e.g., non-linear constraints, inequalities), though this is something we're currently working on improving and should be part of a PR soon.

That said, you can do linear constraints only with the experimental restricted least squares code in 

 
Skipper
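
For reference, linear equality constraints of the form R b = q have a textbook closed-form solution. The sketch below uses plain NumPy and is not the experimental statsmodels code mentioned above; the function name, the simulated data, and the example restriction (two equal slopes) are all assumptions made for illustration.

```python
import numpy as np

def restricted_least_squares(X, y, R, q):
    """Least squares estimate of b subject to the linear equality R @ b = q."""
    XtX_inv = np.linalg.inv(X.T @ X)
    b_ols = XtX_inv @ X.T @ y  # unrestricted OLS estimate
    # Shift the OLS estimate so that the restriction holds exactly.
    middle = np.linalg.inv(R @ XtX_inv @ R.T)
    return b_ols - XtX_inv @ R.T @ middle @ (R @ b_ols - q)

# Simulated data (illustrative only): intercept plus two regressors.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), rng.normal(size=(50, 2))])
y = X @ np.array([1.0, 2.0, 3.0]) + rng.normal(scale=0.1, size=50)

# Hypothetical restriction: the two slope coefficients must be equal.
R = np.array([[0.0, 1.0, -1.0]])
q = np.array([0.0])
b_rls = restricted_least_squares(X, y, R, q)
print(b_rls)
```

This handles equalities only; it does not help with the sign (inequality) constraints asked about in the original question.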

josef...@gmail.com

Oct 31, 2013, 8:10:00 AM
to pystatsmodels
No, inequality constraints are not supported at all.
SciPy has scipy.optimize.nnls (non-negative least squares), which could be used to find the parameters.
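
A minimal sketch of how nnls might be used for mixed sign constraints: since nnls forces every coefficient to be non-negative, a column whose coefficient should be non-positive can be negated before the fit and its estimate sign-flipped afterwards. The data here are simulated, Region is treated as a numeric stand-in purely for illustration, and no intercept is included.

```python
import numpy as np
from scipy.optimize import nnls

# Simulated stand-ins for the three regressors (illustrative only).
rng = np.random.default_rng(0)
n = 200
literacy = rng.normal(size=n)
wealth = rng.normal(size=n)
region = rng.normal(size=n)
y = 0.5 * literacy + 1.5 * wealth - 2.0 * region + rng.normal(scale=0.1, size=n)

# +1: coefficient must be >= 0, -1: coefficient must be <= 0.
signs = np.array([1.0, 1.0, -1.0])
X = np.column_stack([literacy, wealth, region])

# Fit with the negated column, then map the estimates back to the original signs.
coef_nonneg, residual_norm = nnls(X * signs, y)
coef = coef_nonneg * signs
print(coef)
```

Adding a plain column of ones would force the intercept to be non-negative as well; one way around that is to include both a +1 and a -1 column and take the difference of their coefficients. As noted in this thread, nnls returns point estimates only, not standard errors.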


Current linear models, OLS, ..., only use linear algebra and no iterative solvers.

One of the main reasons we haven't yet started on inequality constraints that may be binding is that the resulting statistics, i.e. the covariance/uncertainty of the parameter estimates, are not standard, and neither I nor anyone else has yet tried to figure out and implement the details.

(We do use inequality constraints in the optimization of some non-linear models through reparameterization, but there they are not supposed to be binding if the model works correctly.)
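
To illustrate what reparameterization means here, a generic sketch (not the statsmodels code): a coefficient that must stay positive is written as beta = exp(theta), and an unconstrained optimizer searches over theta. The one-regressor model and the simulated data are assumptions for illustration.

```python
import numpy as np
from scipy.optimize import minimize

# Simulated data with a positive true slope (illustrative only).
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2.0 * x + rng.normal(scale=0.1, size=100)

def ssr(theta):
    """Sum of squared residuals as a function of the unconstrained theta."""
    beta = np.exp(theta[0])  # beta > 0 for any real theta
    resid = y - beta * x
    return resid @ resid

res = minimize(ssr, x0=np.array([0.0]))
beta_hat = np.exp(res.x[0])
print(beta_hat)
```

If the unconstrained estimate were negative, theta would be pushed toward minus infinity and the optimizer would not settle on an interior solution, which is the binding case that is not supposed to occur.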

Josef




josef...@gmail.com

Oct 31, 2013, 8:52:32 AM
to pystatsmodels

In this case you could just try the 4 possibilities for constraints directly

Wealth         Region
not binding    not binding
not binding    binding (coefficient = 0, drop variable)
binding        not binding
binding        binding

and pick the one with the lowest SSR (sum of squared residuals) among the ones where the inequalities hold.

The results will be conditional on the chosen model and will not take the search over the constraints into account.
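
The search over binding and non-binding combinations could be sketched roughly as follows, using plain NumPy least squares. The function name, the sign encoding, and the simulated data are hypothetical; a "binding" constraint is handled by dropping the column, which fixes its coefficient at zero.

```python
import itertools
import numpy as np

def fit_with_sign_constraints(X, y, signs):
    """signs[j] is +1 (coef >= 0), -1 (coef <= 0) or 0 (unconstrained)."""
    k = X.shape[1]
    constrained = [j for j in range(k) if signs[j] != 0]
    best_ssr, best_coef = np.inf, None
    # Every subset of the constrained columns is one candidate set of
    # binding constraints (2**m candidates for m constrained columns).
    for r in range(len(constrained) + 1):
        for dropped in itertools.combinations(constrained, r):
            keep = [j for j in range(k) if j not in dropped]
            coef = np.zeros(k)
            if keep:
                coef[keep] = np.linalg.lstsq(X[:, keep], y, rcond=None)[0]
            # Skip candidates whose remaining coefficients have the wrong sign.
            if any(signs[j] * coef[j] < 0 for j in constrained):
                continue
            resid = y - X @ coef
            ssr = resid @ resid
            if ssr < best_ssr:
                best_ssr, best_coef = ssr, coef
    return best_coef, best_ssr

# Simulated data: the third coefficient is truly positive, so a "<= 0"
# constraint on it should end up binding.
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0, 0.5]) + rng.normal(scale=0.1, size=n)
coef, ssr = fit_with_sign_constraints(X, y, signs=[0, 1, -1])
print(coef, ssr)
```

The number of candidate fits doubles with each constrained coefficient, so this is practical only for a handful of constraints. Any standard errors computed from the selected fit carry the caveat above: they are conditional on the chosen model.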

Josef

josef...@gmail.com

Oct 31, 2013, 12:00:07 PM
to pystatsmodels

I don't think it would be difficult to subclass regression models (or a higher level model) to inherit the data and formula handling and to get the parameter estimates with scipy's nnls.
However, most of the results statistics will not be correct in this case.

GAUSS has been supporting inequality constraints for a long time, but when I read their description a few years ago, I decided that I'm not interested enough to try to implement it.

In the end, it might not be too difficult to implement, but without a background in this it will take some time to figure out what's going on and what should be done.



I would gladly learn enough about this to review a pull request!

Josef

bsdz

Oct 31, 2013, 1:15:18 PM
to pystat...@googlegroups.com
Thanks for everyone's responses. I'll take a look at the beta-version least squares approach and the binding/non-binding approach. Failing that I may just try R directly or link through with Rpy2. Thanks again.

josef...@gmail.com

Oct 31, 2013, 3:28:15 PM
to pystatsmodels
On Thu, Oct 31, 2013 at 7:56 AM, Skipper Seabold <jsse...@gmail.com> wrote:
Not yet in an easy and general way (e.g., non-linear constraints, inequalities), though this is something we're currently working on improving and should be part of a PR soon.

Skipper,
Do you have something in the works?
I don't even see an issue for adding RLS.

I'm changing the LikelihoodModelResults.f_test to a wald_test, but I don't see anything about user-defined constraints and LM/score tests (except in the GEE PR).

Josef