Covariance in predictors

Adam Clark

Nov 28, 2023, 3:54:29 PM
to Manuele Bazzichetto, bayesianm...@googlegroups.com
The cases with Sigma as a scalar are, to my knowledge, the only ones I've run into myself in proofs - expanding Sigma to a matrix makes sense, though I honestly have no idea at all how that actually impacts the rest of the assumptions/interpretations of the regression.

But, thanks for the derivation. That's honestly also one I've never seen before. Perhaps you'd like to share it next week in class ;)? You can also help teach me what the Gauss-Markov theorem is - that's also new for me O.o.

In any case, I'm forwarding this to the class email list, and have added you to the group.

Thanks again!
Adam

On Tue, Nov 28, 2023 at 7:21 PM Manuele Bazzichetto <manuele.b...@gmail.com> wrote:
Indeed, one of the assumptions of linear regression is the absence of super strong correlation among the predictors, as such correlation makes the model coefficients not uniquely identifiable. In the extreme case of a multiple regression with 2 predictors that are perfectly correlated (correlation coefficient = 1, so the predictors are basically the same thing), you would not be able to take the inverse of X'X (the matrix is singular), which is needed to solve the equations for the regression coefficients (and also to derive their covariance matrix).
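
For instance, here is a quick R sketch of that extreme case (simulated toy data, made-up variable names), where one predictor is an exact multiple of the other:

set.seed(1)
n  <- 100
x1 <- rnorm(n)
x2 <- 2 * x1                  # correlation with x1 is exactly 1
y  <- 1 + x1 + rnorm(n)

X <- cbind(1, x1, x2)         # model matrix with intercept
qr(crossprod(X))$rank         # rank 2 < 3 columns, so X'X is singular
# solve(crossprod(X))         # would fail: system is computationally singular

coef(lm(y ~ x1 + x2))         # lm() drops x2 and reports NA for it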

Nonetheless, the lack of super strong correlation is not among the assumptions of the Gauss-Markov theorem. What makes the OLS estimators BLUE is the assumption that the errors are independently distributed with mean 0 and constant variance Sigma^2. As far as I know, the only place where you assume a covariance matrix with 0 off-diagonals is the errors' covariance matrix (at least in the 'simplest' linear regression settings).

Under the assumptions of errors' independence (0 off-diagonal elements of the covariance matrix) and homoscedasticity (the diagonal of the covariance matrix is populated by the same Sigma^2 value), the n x n covariance matrix of the error term can be written as Sigma^2 * I, where I is the identity matrix. Basically, these assumptions are the key to ending up with the covariance matrix of the coefficients' estimators being Sigma^2 * (X'X)^-1.
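
As a sanity check, this is easy to verify numerically in R (again simulated data; 'fit' and the variable names are just placeholders):

set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 0.5 * x1 - 0.3 * x2 + rnorm(n)

fit <- lm(y ~ x1 + x2)
X   <- model.matrix(fit)
s2  <- sum(residuals(fit)^2) / (n - ncol(X))   # estimate of Sigma^2

all.equal(unname(vcov(fit)), unname(s2 * solve(crossprod(X))))   # TRUE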

I am attaching a paper where I wrote down how to derive the covariance matrix of the coefficients' estimators (assuming iid errors). This should make it clearer why we need the covariance matrix of the error term to be Sigma^2 * I, while we don't need to assume the predictors are orthogonal to derive it. You can think of X'X as something super close to the covariance matrix of the predictors (although, strictly speaking, it is the regressors we are talking about). Indeed, centering the columns of X (i.e., centering the predictors) and scaling the elements of the resulting X'X by n would give you an estimate of the covariance matrix of the predictors.
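
To see that link numerically (simulated predictors, arbitrary names):

set.seed(1)
n  <- 500
P  <- cbind(x1 = rnorm(n), x2 = rnorm(n, sd = 2))
Pc <- scale(P, center = TRUE, scale = FALSE)   # center the columns

crossprod(Pc) / n       # (X'X)/n for the centered predictors
cov(P) * (n - 1) / n    # the same thing, via the usual covariance estimate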

Replacing Sigma^2 by its estimator to derive the covariance matrix of the coefficients implies the strong assumption that the errors in the population are iid (but that's what we actually assume in linear regression!). That's why people came up with things like sandwich estimators or feasible GLS, which are 'softer' versions of techniques like GLS, mixed models and so on, where you make different assumptions on the errors' covariance matrix (GLS, with no random effects) or on the covariance matrix of the response (mixed models).
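
For instance (just a sketch on simulated data; it assumes the sandwich and lmtest packages are installed), heteroscedasticity-robust standard errors can be obtained from the same lm fit:

library(sandwich)
library(lmtest)

set.seed(1)
n <- 300
x <- rnorm(n)
y <- 1 + x + rnorm(n, sd = exp(0.5 * x))          # heteroscedastic errors

fit <- lm(y ~ x)
coeftest(fit)                                     # usual Sigma^2 * (X'X)^-1 standard errors
coeftest(fit, vcov = vcovHC(fit, type = "HC3"))   # heteroscedasticity-robust (sandwich) SEs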

I think that this way of computing the covariance matrix of the coefficients' estimators extends to any form of relationship between the response and the predictors (as long as we are still in the world of models that are linear in the parameters, of course). Fitting polynomials does not constrain the off-diagonals of the covariance matrix of the estimators to 0 (unless orthogonal polynomial terms are used, e.g. poly(x, 2, raw = FALSE)).
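
A quick check of the polynomial point (simulated toy data):

set.seed(1)
x <- runif(200)
y <- 1 + x + x^2 + rnorm(200, sd = 0.2)

round(cov2cor(vcov(lm(y ~ poly(x, 2, raw = TRUE)))), 2)   # clearly non-zero off-diagonals
round(cov2cor(vcov(lm(y ~ poly(x, 2)))), 2)               # (near-)zero off-diagonals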

Hope this long e-mail helps, and, if not, I am happy to discuss this further. However, never forget I am 100% self-educated on stats, and everything I do/say can easily be overturned by a true statistician.

And please, feel free to share our thoughts with the group. Actually, it would be great if you could invite me to the chat; it seems like a cool space to talk about stats 😊

Have a nice afternoon,

Manuele

[Attachment: IMG_9809.jpeg]

On 28 Nov 2023, at 18:09, Adam Clark <adam....@gmail.com> wrote:

For sure - it really was super helpful. Among other things, if I'd left the model totally linear, I'd never have been able to show the result that I wanted to show, so you very much saved the day.

This is super helpful, though. I honestly had no idea at all about the second step (replacing Sigma^2 with its estimator). For some reason, I'd assumed that the off-diagonal elements of Sigma were fixed at zero by assumption in OLS (and so, that the variance inflation factor was just an indicator of how badly you were wrong by making this assumption). Though clearly not - if R implements it in lm, then you are almost certainly right that this is just the default way of doing it. Maybe there was a time back in the bad old days before SVD that you had to assume zero off-diagonals? In any case, I'll have to look a bit into the OLS assumptions again - maybe collinearity in predictors is allowed if they are strictly linear?

Do you mind if I forward this to the group chat? I'm guessing this will be useful context for everyone else too!

On Tue, Nov 28, 2023 at 5:38 PM Manuele Bazzichetto <manuele.b...@gmail.com> wrote:
I am happy that my comment was helpful 😊

Anyway, just to write down what I was talking about before (I am not always able to express myself properly): in linear regression, the covariance matrix of the coefficients' estimators (assuming independent, homoscedastic errors) is:

Sigma^2 * (X'X)^-1,

where Sigma^2 is the error variance and X is the model matrix (with rank equal to p, i.e. the intercept, if included, plus the regressors). '^-1' means 'inverse'.

The variances of the coefficients' estimators are the diagonal elements of the above matrix. These variances get inflated when the off-diagonal elements of the matrix are not 0, i.e. when the regressors are correlated.
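
For example (a toy simulation, variable names made up), compare two fits with the same error variance but different predictor correlation:

set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2_indep <- rnorm(n)
x2_corr  <- 0.95 * x1 + sqrt(1 - 0.95^2) * rnorm(n)   # cor(x1, x2_corr) ~ 0.95

y1 <- 1 + x1 + x2_indep + rnorm(n)
y2 <- 1 + x1 + x2_corr  + rnorm(n)

diag(vcov(lm(y1 ~ x1 + x2_indep)))   # small coefficient variances
diag(vcov(lm(y2 ~ x1 + x2_corr)))    # inflated variances for x1 and x2_corr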

An estimator of the covariance matrix of the regression coefficients (estimators) is derived by replacing Sigma^2 by its estimator (the residual variance, i.e. the Residual Sum of Squares/(n - p)). Uff, that's one of the reasons why I prefer Bayesian stats: you don't have to repeat the word 'estimator' 100 times 😂

To double-check that these are the variances actually reported in the lm summary, you can compare sqrt(diag(vcov(lm_object))) with the coefficients’ standard errors reported in the model summary.
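
For instance, using R's built-in iris data as a stand-in for lm_object:

fit <- lm(Sepal.Length ~ Sepal.Width + Petal.Length, data = iris)

sqrt(diag(vcov(fit)))                # SEs from the covariance matrix
coef(summary(fit))[, "Std. Error"]   # SEs reported by summary(fit)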

Thanks again!

Looking forward to the next class!

Manuele
 

On 28 Nov 2023, at 17:06, Adam Clark <adam....@gmail.com> wrote:

For sure!! Will be great to have you next week.

And thanks very much for your comment at the end of the class! That was super helpful, and honestly very new and useful for me. I'd honestly thought that including effects of covariance was something that was only done in the "newer" functions like gls, lme, etc - i.e. that it was assumed to be zero for the lm function.

Among other things, your comment helped make the simulated example clearer - I've now added it to the webpage, with a little comment noting your point about linear systems.

On Tue, Nov 28, 2023 at 5:03 PM Manuele Bazzichetto <manuele.b...@gmail.com> wrote:
Hi Adam, 

I'd be interested in following the last class next Tuesday, if that's ok with you.

Also, thanks a lot again for letting me participate. Your classes are simply amazing and are helping me a lot!

Have a nice afternoon,

Manuele

--
Adam Thomas Clark
Asst. Professor
Karl-Franzens-Universität Graz
Institut für Biologie