Linear Regression without a constant term

Vivek Saxena

Mar 24, 2010, 8:03:05 AM
Hi,

Is it possible to perform a linear regression in MATLAB with no constant term?

I have data for 9 regressors and I have to fit a multiple linear regression model of Y (the response) on these 9 regressors without an intercept. That is,

Y = x_1*gamma_1 + x_2*gamma_2 + ..... + x_9*gamma_9 + epsilon

I noticed that regstats automatically appends a column of 1s to the X matrix (corresponding to the 0th regression coefficient being the intercept in the usual formulation), whereas regress assumes that the input X matrix already has such a structure. The documentation states that regress will produce an incorrect model if the constant term is not present.

Thanks

Cheers
Vivek.

Peter Perkins

Mar 24, 2010, 8:18:35 AM
On 3/24/2010 8:03 AM, Vivek Saxena wrote:
> I noticed that regstats automatically appends a column of 1s to the X
> matrix (corresponding to the 0th regression coefficient being the
> intercept in the usual formulation),

It's true that by passing in 'linear' to REGSTATS, you do get an intercept term, but you can specify any model you want using a terms matrix. In your case, you want a linear term for each of 9 predictors, no intercept or interactions, and no higher-order terms, so the terms matrix is just eye(9).
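A minimal sketch of that call, assuming the Statistics Toolbox is installed. The data are synthetic and the names X, y, gammaTrue, gammaHat and gammaSE are made up for illustration; only 'beta' and 'covb' are requested so that no intercept-dependent statistic is computed.

```matlab
% Synthetic data: 9 predictors, true model has no intercept (illustrative only)
rng(0);                               % reproducible random numbers
n = 50;
X = randn(n, 9);
gammaTrue = (1:9)';
y = X*gammaTrue + 0.1*randn(n, 1);

% eye(9) as the terms matrix: one linear term per predictor, no constant
s = regstats(y, X, eye(9), {'beta', 'covb'});

gammaHat = s.beta;                    % 9-by-1, one coefficient per column of X
gammaSE  = sqrt(diag(s.covb));        % standard errors from the coefficient covariance
```

Because the design matrix here is just X, gammaHat should agree with X\y up to rounding.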


> whereas regress assumes that the
> input X matrix already has such a structure. The documentation states
> that regress will produce an incorrect model if the constant term is not
> present.

I think you're referring to this:

X should include a column of ones so that the model contains a constant
term. The F statistic and p value are computed under the assumption
that the model contains a constant term, and they are not correct for
models without a constant. The R-square value is one minus the ratio of
the error sum of squares to the total sum of squares. This value can
be negative for models without a constant, which indicates that the
model is not appropriate for the data.

The model itself, i.e., the estimated coefficients and their CIs, are estimated correctly when the model does not include an intercept. It's only the F statistic and the R^2 that become invalid when there's no intercept. Both of these goodness-of-fit statistics assume that the model y = constant + error is a special case of the model you're fitting, and if there's no intercept, it isn't.
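A small hand-made illustration of why the usual R^2 stops being meaningful without an intercept. This is only a sketch in base MATLAB with invented numbers: y hovers around 10 while x is centered on zero, so a line through the origin fits far worse than the constant-only model that R^2 compares against.

```matlab
% Invented data: y is roughly constant at 10, x is centered on zero
x = [-2; -1; 0; 1; 2];
y = [10; 11; 9; 10; 10];

b   = x \ y;                          % least-squares slope, no intercept
sse = sum((y - x*b).^2);              % error sum of squares of the no-intercept fit
sst = sum((y - mean(y)).^2);          % total sum of squares about the mean
r2  = 1 - sse/sst;                    % the usual R^2 formula, here strongly negative
```

The slope itself is still the correct least-squares estimate for the no-intercept model; it is only the comparison against the mean that breaks down.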

Another possibility is to use LSCOV.
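For instance, a sketch with synthetic data (the names and the true coefficients are assumptions for illustration). LSCOV is part of base MATLAB and also returns standard errors and the mean squared error:

```matlab
% Synthetic data with three predictors and no intercept (illustrative only)
rng(1);
n = 40;
X = randn(n, 3);
y = X*[2; -1; 0.5] + 0.2*randn(n, 1);

% Coefficients, their standard errors, and the mean squared error
[b, se, mse] = lscov(X, y);

bBackslash = X \ y;                   % same coefficients via backslash
```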

Hope this helps.

Jos (10584)

Mar 24, 2010, 8:22:07 AM
"Vivek Saxena" <maveric...@yahoo.com> wrote in message <hocv1p$gep$1...@fred.mathworks.com>...

Construct a regression matrix without a column of ones. Example:

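% data: two predictors and a response with known coefficients, no intercept
x1 = cumsum(rand(1,10)) ;
x2 = cumsum(rand(size(x1))) ;
CF = [20 50] ;                        % true coefficients
y = CF(1) * x1 + CF(2) * x2 + randn(size(x1))/10 ;

% engine: regression matrix without a column of ones, solved by least squares
M = [x1(:) x2(:)] ;
fittedCF = M \ y(:)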

hth
Jos

Torsten Hennig

Mar 24, 2010, 8:32:51 AM

Say you have measurements
(x_1)_i,...,(x_9)_i, y_i (i=1,...,n).
Define a matrix A with n rows and 9 columns by
A(i,j) = (x_j)_i (j=1,...,9; i=1,...,n).
Define a vector b by
b(i) = y_i (i=1,...,n).
Then the MATLAB command
gamma = A\b
gives your regression coefficients gamma_j.

Best wishes
Torsten.

Vivek Saxena

Mar 24, 2010, 8:43:08 AM
Peter Perkins <Peter....@MathRemoveThisWorks.com> wrote in message <hocvur$15d$1...@fred.mathworks.com>...

>
> The model itself, i.e., the estimated coefficients and their CIs, are estimated correctly when the model does not include an intercept. It's only the F statistic and the R^2 that become invalid when there's no intercept. Both of these goodness-of-fit statistics assume that the model y = constant + error is a special case of the model you're fitting, and if there's no intercept, it isn't.

Thanks for your reply Peter. Usually when multicollinearity is to be detected and removed, one begins with a unit length model (centered and scaled), which contains no constant term. [At least that is what we have been taught.] Does MATLAB include a command for standardizing the regression model?

Also, if the design matrix input to REGSTATS is of the form [x11, x12, ...; x21, x22, ...], how does REGSTATS know whether or not a constant term exists? You say that the estimated coefficients and their CIs are estimated correctly even when the model does not include an intercept. But, the models are entirely different in the two cases. How do I know that beta(1) is not an intercept, but the regression coefficient for x1?

Vivek Saxena

Mar 24, 2010, 9:16:05 AM
Torsten Hennig <Torsten...@umsicht.fhg.de> wrote in message <897885874.433029.12694...@gallium.mathforum.org>...

> Say you have measurements
> (x_1)_i,...,(x_9)_i, y_i (i=1,...,n).
> Define a matrix A with n rows and 9 columns by
> A(i,j) = (x_j)_i (j=1,...,9 ; i=1,...,n))
> Define a vector b by
> b(i) = y_i (i=1,...,n).
> Then the MATLAB command
> gamma = A\b
> gives your regression coefficients gamma_j.
>
> Best wishes
> Torsten.

Torsten, that is not correct. The regression coefficients are solutions to the least-squares equations, not A^-1 b. The latter approach is simply not applicable because of the presence of error in each measurement (statistical, not deterministic).

Torsten Hennig

Mar 24, 2010, 9:35:10 AM
> Torsten Hennig <Torsten...@umsicht.fhg.de> wrote in message <897885874.433029.1269434001451.JavaMail.root@gallium.mathforum.org>...
> > Say you have measurements
> > (x_1)_i,...,(x_9)_i, y_i (i=1,...,n).
> > Define a matrix A with n rows and 9 columns by
> > A(i,j) = (x_j)_i (j=1,...,9 ; i=1,...,n))
> > Define a vector b by
> > b(i) = y_i (i=1,...,n).
> > Then the MATLAB command
> > gamma = A\b
> > gives your regression coefficients gamma_j.
> >
> > Best wishes
> > Torsten.
>
> Torsten, that is not correct. The regression
> coefficients are solutions to the least square
> equation, not A^-1b. The latter approach is simply
> not applicable because of the presence of error in
> each measurement (statistical, not deterministic).

gamma = A\b
is the least-squares solution to the (overdetermined)
linear system A*gamma = b.
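A quick check of this in base MATLAB, as a sketch with synthetic data (the true coefficients [1; 2; 3] are an assumption for illustration). For a full-rank overdetermined A, backslash should match both the normal-equations solution and the pseudoinverse solution up to rounding:

```matlab
% Overdetermined system: 20 equations, 3 unknowns, noisy right-hand side
rng(2);
A = randn(20, 3);
b = A*[1; 2; 3] + 0.05*randn(20, 1);

g1 = A \ b;                           % mldivide: least-squares solution
g2 = (A'*A) \ (A'*b);                 % normal equations
g3 = pinv(A)*b;                       % pseudoinverse
```

Only when A is square and nonsingular does A\b coincide with inv(A)*b.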

Best wishes
Torsten.

Vivek Saxena

Mar 24, 2010, 9:52:05 AM
Torsten Hennig <Torsten...@umsicht.fhg.de> wrote in message <911486177.433479.12694...@gallium.mathforum.org>...
> > Torsten Hennig <Torsten...@umsicht.fhg.de> wrote

> gamma = A\b
> is the least-squares solution to the (overdetermined)
> linear system A*gamma = b.
>
> Best wishes
> Torsten.

Oh, isn't it just A^-1 b? Hmm, I didn't know. Thanks for pointing that out. I use A\b for A^-1 b because MATLAB warns me if I use inv(A)*b. I didn't know it's the least-squares solution.

dpb

Mar 24, 2010, 10:34:36 AM
Vivek Saxena wrote:
...

> Oh, isn't it just A^-1 b? Hmm, I didn't know. ...

doc mldivide

--

Peter Perkins

Mar 24, 2010, 2:34:17 PM
On 3/24/2010 8:43 AM, Vivek Saxena wrote:

> Thanks for your reply Peter. Usually when multicollinearity is to be
> detected and removed, one begins with a unit length model (centered and
> scaled), which contains no constant term. [At least that is what we have
> been taught.] Does MATLAB include a command for standardizing the
> regression model?

I don't know about "usually", but you can certainly call ZSCORE on your data before fitting the regression. RIDGE, which does ridge regression, does this automatically for you, but functions like REGRESS do not.
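A sketch of both kinds of scaling in base MATLAB, with synthetic data and made-up variable names. Z is what ZSCORE would return (that function needs the Statistics Toolbox); W is the "unit length" scaling mentioned in the question, for which W'*W is the correlation matrix of the predictors.

```matlab
% Synthetic predictors with different means and spreads (illustrative only)
rng(3);
X = bsxfun(@plus, [5 -2 10], bsxfun(@times, [1 4 0.5], randn(30, 3)));

Xc = bsxfun(@minus, X, mean(X, 1));              % center each column
Z  = bsxfun(@rdivide, Xc, std(X, 0, 1));         % z-scores: mean 0, std 1 per column
W  = bsxfun(@rdivide, Xc, sqrt(sum(Xc.^2, 1)));  % unit-length scaling: each column has norm 1

R = W'*W;                                        % correlation matrix of the columns of X
```

bsxfun is used so the sketch also runs on releases without implicit expansion.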


> Also, if the design matrix input to REGSTATS is of the form [x11, x12,
> ...; x21, x22, ...], how does REGSTATS know whether or not a constant
> term exists? You say that the estimated coefficients and their CIs are
> estimated correctly even when the model does not include an intercept.
> But, the models are entirely different in the two cases. How do I know
> that beta(1) is not an intercept, but the regression coefficient for x1?

Because REGSTATS does not take a design matrix as an input. It takes a data matrix, and it's the third input that determines how that is turned into a design matrix.

>> help regstats
REGSTATS Regression diagnostics for linear models.
[snip]
The optional input MODEL specifies how the design matrix is created
from DATA. The design matrix is the matrix of term values for each
observation. MODEL can be any of the following strings:

    'linear'          Constant and linear terms (the default)
    'interaction'     Constant, linear, and interaction terms
    'quadratic'       Constant, linear, interaction, and squared terms
    'purequadratic'   Constant, linear, and squared terms

Alternatively, MODEL can be a matrix of model terms accepted by the
X2FX function. See X2FX for a description of this matrix and for
a description of the order in which terms appear. You can use this
matrix to specify other models including ones without a constant term.
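To see what the MODEL argument does to the design matrix, here is a sketch that calls X2FX directly (Statistics Toolbox assumed; the 3-by-2 data matrix is invented):

```matlab
% A tiny data matrix: 3 observations of 2 predictors
X = [1 2; 3 4; 5 6];

Dlinear = x2fx(X, 'linear');          % constant plus linear terms: [ones(3,1) X]
Dnoint  = x2fx(X, eye(2));            % one linear term per predictor, no constant: just X
```

So with eye(p) as the terms matrix, the first coefficient belongs to x1, not to an intercept.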

Renwen Lin

Sep 11, 2012, 4:12:08 AM
mdl = LinearModel.fit(X,y,'Intercept',false);
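In later releases of the Statistics Toolbox the same fit is usually written with fitlm, which accepts the same 'Intercept' option. A sketch with synthetic data, assuming a release that provides fitlm (the variable names and true coefficients are made up):

```matlab
% Synthetic data with two predictors and no intercept (illustrative only)
rng(4);
X = randn(60, 2);
y = X*[3; -2] + 0.1*randn(60, 1);

mdl  = fitlm(X, y, 'Intercept', false);   % linear model without a constant term
bHat = mdl.Coefficients.Estimate;         % 2-by-1 vector of fitted coefficients
```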


"Vivek Saxena" wrote in message <hocv1p$gep$1...@fred.mathworks.com>...

Greg Heath

Sep 18, 2012, 11:30:21 PM
"Vivek Saxena" wrote in message <hocv1p$gep$1...@fred.mathworks.com>...
Remove the mean from Y and X and use backslash. The resulting constant coefficient will be negligible.
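One caveat worth checking on a toy example: fitting centered data gives the slopes of the model with an intercept, which is generally not the same as forcing a fit through the origin on the raw data. A sketch in base MATLAB with an invented exact line (intercept 10, slope 2):

```matlab
% Exact line y = 10 + 2*x, no noise (illustrative only)
x = (1:5)';
y = 10 + 2*x;

bOrigin   = x \ y;                               % no intercept, raw data: 260/55, about 4.73
bFull     = [ones(5,1) x] \ y;                   % intercept and slope: [10; 2]
bCentered = (x - mean(x)) \ (y - mean(y));       % slope after removing the means: 2
```

Which of these is wanted depends on whether the model is truly meant to pass through the origin in the original units.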

If you use regstats or regress, the resulting R^2 and other summary statistics are not applicable.

Hope this helps.

Greg