Specifying a Constant in Statsmodels Linear Regression?


Max Song

unread,
Oct 23, 2014, 2:27:01 PM10/23/14
to pystat...@googlegroups.com
I want to use the statsmodels.regression.linear_model.OLS
package to do a prediction, but with a specified constant. 

Currently, I can specify the presence of a constant with an argument:

(from docs: http://statsmodels.sourceforge.net/devel/generated/statsmodels.regression.linear_model.OLS.html)

class statsmodels.regression.linear_model.OLS(endog, exog=None, missing='none', hasconst=None), where **hasconst** is a boolean.

What I want to do is explicitly specify a constant C, and then fit a linear regression model around it. From that OLS I want to generate a <RegressionResults class instance> and then access all its attributes, like resid, etc.

My current, suboptimal workaround would be to specify the OLS without a constant, subtract the constant from the Y values, and create a custom object that wraps both the specified constant and the constant-free OLS, so that every call to fit or predict first subtracts the constant from the Y variable and then uses the prediction.

Thanks! 

PS. I put this question up on SO as well, so if anyone has an answer, the rest of the world can benefit from the wisdom too:

Nathaniel Smith

unread,
Oct 23, 2014, 2:38:16 PM10/23/14
to pystatsmodels

Just as a note on related API design work: in R, a standard feature supported by most model-fitting functions is that you can pass a kwarg offset= to specify an a-priori constant "intercept".

josef...@gmail.com

unread,
Oct 23, 2014, 2:48:23 PM10/23/14
to pystatsmodels
On Thu, Oct 23, 2014 at 2:27 PM, Max Song <maxso...@gmail.com> wrote:
> I want to use the statsmodels.regression.linear_model.OLS
> package to do a prediction, but with a specified constant.
>
> Currently, I can specify the presence of a constant with an argument:
>
> (from docs:
> http://statsmodels.sourceforge.net/devel/generated/statsmodels.regression.linear_model.OLS.html)
>
> class statsmodels.regression.linear_model.OLS(endog, exog=None,
> missing='none', hasconst=None), where **hasconst** is a boolean.
>
> What I want to do is specify explicitly a constant C, and then fit a linear
> regression model around it. From using that OLS, I want to generate a
> <RegressionResults class instance> and then access all the attributes like
> resid, etc.
>
> A current suboptimal work around would be to specify the OLS without a
> constant, subtract the constant from the Y-values, and create a custom
> object that wraps both the specified constant and OLS w/o constant, every
> time I want to do predict or fit, to first subtract the constant from the Y
> variables, and then use the prediction.

That's essentially the only, or best, way to do it.
A small advantage: this way you don't need to add the constant to the
X in prediction.

It's not currently directly supported.
It is available with `fit_constrained` for models that currently
define offset. However, in that case we handle the general
non-homogeneous restriction `R params = q`.

We don't have offset defined for all models, and I thought it wouldn't
be necessary for the linear models.

I don't know if it will be better to support offset in linear models
(several code changes for little benefit) or as part of the general
restricted linear model (more computational work because we might not
take advantage of the special case that only the constant is fixed).

Josef

josef...@gmail.com

unread,
Oct 23, 2014, 3:11:26 PM10/23/14
to pystatsmodels
To clarify this a bit: fit_constrained for restriction `R params =
q` is on my todo list for 0.7 for the linear models and at least for
the Logit and Probit in discrete (GLM and count models already have
it). (*)
All new non-linear/non-normal models are supposed to support `offset`.

I doubt it's worth the complication to handle offset in the linear model;
just use endog - offset.


(*) this is only for constraints that can be obtained by transforming
endog and exog.
General constraints including those on shape parameters require a
different kind of implementation.

Josef

Max Song

unread,
Oct 23, 2014, 4:43:10 PM10/23/14
to pystat...@googlegroups.com
Josef, 

Thank you for the quick response and also the clarification - glad to hear that the offset is being built into GLM. 
I will use endog - offset in this case. 

Best, 
Max

Kerby Shedden

unread,
Oct 23, 2014, 7:13:42 PM10/23/14
to pystat...@googlegroups.com
My generic implementation of coordinate descent for fitting L1/L2 penalized regressions requires an offset to be able to work generically across all models.  So in my branch containing this work (not yet PR'd) I have added the offset to linear models.

Max Song

unread,
Oct 24, 2014, 1:00:39 AM10/24/14
to pystat...@googlegroups.com
Nice! Is there a repo of this somewhere online that I can clone? I'm having trouble with the wrapping approach mentioned before, as the rest of my functions rely on it being specifically a RegressionResults object. 

Max Song

unread,
Oct 24, 2014, 1:08:34 AM10/24/14
to pystat...@googlegroups.com
Hi Kerby, 

I looked through the statsmodels source and found this for class OLS, which does not seem to support offsets. Maybe it's in a superclass? Or written somewhere else? 

Thanks! 

class OLS(WLS):
    __doc__ = """
    A simple ordinary least squares model.

    %(params)s
    %(extra_params)s

    Attributes
    ----------
    weights : scalar
        Has an attribute weights = array(1.0) due to inheritance from WLS.

    See Also
    --------
    GLS

    Examples
    --------
    >>> import numpy as np
    >>> import statsmodels.api as sm
    >>> Y = [1, 3, 4, 5, 2, 3, 4]
    >>> X = range(1, 8)
    >>> X = sm.add_constant(X)
    >>> model = sm.OLS(Y, X)
    >>> results = model.fit()
    >>> results.params
    array([ 2.14285714,  0.25      ])
    >>> results.tvalues
    array([ 1.87867287,  0.98019606])
    >>> print(results.t_test([1, 0]))
    <T test: effect=array([ 2.14285714]), sd=array([[ 1.14062282]]), t=array([[ 1.87867287]]), p=array([[ 0.05953974]]), df_denom=5>
    >>> print(results.f_test(np.identity(2)))
    <F test: F=array([[ 19.46078431]]), p=[[ 0.00437251]], df_denom=5, df_num=2>

    Notes
    -----
    No constant is added by the model unless you are using formulas.
    """ % {'params': base._model_params_doc,
           'extra_params': base._missing_param_doc + base._extra_param_doc}

    # TODO: change example to use datasets. This was the point of datasets!
    def __init__(self, endog, exog=None, missing='none', hasconst=None,
                 **kwargs):
        super(OLS, self).__init__(endog, exog, missing=missing,
                                  hasconst=hasconst, **kwargs)
        if "weights" in self._init_keys:
            self._init_keys.remove("weights")

    def loglike(self, params):
        """
        The likelihood function for the classical OLS model.

        Parameters
        ----------
        params : array-like
            The coefficients with which to estimate the log-likelihood.

        Returns
        -------
        The concentrated likelihood function evaluated at params.
        """
        nobs2 = self.nobs / 2.0
        return -nobs2*np.log(2*np.pi) - nobs2*np.log(1/(2*nobs2) *
                np.dot(np.transpose(self.endog - np.dot(self.exog, params)),
                       (self.endog - np.dot(self.exog, params)))) - nobs2

    def whiten(self, Y):
        """
        OLS model whitener does nothing: returns Y.
        """
        return Y


Kerby Shedden

unread,
Oct 24, 2014, 9:18:39 AM10/24/14
to pystat...@googlegroups.com
'extra_params' : base._missing_param_doc + base._extra_param_doc
...

josef...@gmail.com

unread,
Oct 24, 2014, 10:43:34 PM10/24/14
to pystatsmodels
On Fri, Oct 24, 2014 at 9:18 AM, Kerby Shedden <kshe...@umich.edu> wrote:

There might be more difficulties in linear models, and I don't know yet how they will work out.

related to a constant as offset:
OLS is pretty careful about constant detection and adjusting the results depending on the presence of a constant, e.g. in rsquared.
(pseudo rsquared or lr_null in other models always compare to the model with a constant independently of whether one is present in exog or not.)
Does subtracting a constant offset mean we should use centered tss or uncentered tss?

Looking at the source of RegressionResults, it might not be as bad as I initially thought in terms of attributes and methods that need to be adjusted; it looks like we use predict instead of a hardcoded exog dot params in most places (in contrast to some other models).

Josef