Indeed, among the assumptions of linear regression there is the absence of perfect (or near-perfect) collinearity among the predictors, as this makes the model coefficients not uniquely identifiable. In the extreme case of a multiple regression with 2 predictors that are perfectly correlated (correlation coefficient = 1, so the predictors are basically the same thing), you would not be able to take the inverse of X'X (the matrix is singular), which is needed to solve the equations for the regression coefficients (and also to derive their covariance matrix).
Nonetheless, the lack of strong correlation is not among the assumptions of the Gauss-Markov theorem. What makes the OLS estimators BLUE is the assumption that the errors are uncorrelated, with mean 0 and constant variance Sigma^2. As far as I can tell, the only place where you assume a covariance matrix with 0 off-diagonals is the errors' covariance matrix (at least in the 'simplest' linear regression settings).
Under the assumptions of uncorrelated errors (0 off-diagonal elements of the covariance matrix) and homoscedasticity (the diagonal of the covariance matrix is populated by the same value Sigma^2), the n x n covariance matrix of the error term can be written as Sigma^2 * I, where I is the identity matrix. Basically, these assumptions are the key to ending up with the covariance matrix of the coefficients' estimators being Sigma^2 * (X'X)^-1.
I am attaching a paper where I wrote down how to derive the covariance matrix of the coefficients' estimators (assuming iid errors). This should make it clearer why we need the covariance matrix of the error term to be Sigma^2 * I, while we don't need to assume that the predictors are orthogonal to derive it. You can think of X'X as something very close to the covariance matrix of the predictors (although, strictly speaking, it involves the regressors). Indeed, centering the columns of X (i.e. centering the predictors) and scaling the elements of the resulting X'X by n would give you an estimate of the covariance matrix of the predictors.
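To make the centering-and-scaling remark concrete, a quick numpy sketch (the two correlated predictors are simulated, just for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
# Two made-up correlated predictors (true correlation 0.5)
X = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]], size=n)

Xc = X - X.mean(axis=0)   # centre each column of X
S = Xc.T @ Xc / n         # scaled cross-product

# np.cov with ddof=0 also divides by n, so the two agree
print(np.allclose(S, np.cov(X, rowvar=False, ddof=0)))  # True
```

So the off-diagonals of X'X really do encode how correlated the regressors are, which is exactly what drives the variance inflation.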
Replacing Sigma^2 by its estimator to derive the covariance matrix of the coefficients implies the strong assumption that the errors in the population are iid (but that's what we actually assume in linear regression!). That's why people came up with things like sandwich estimators or feasible GLS, which are a 'softer' version of techniques like GLS, mixed models and so on, where you make different assumptions on the errors' covariance matrix (GLS with no random effects) or on the covariance matrix of the response (mixed models).
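A sketch of what a sandwich estimator does, again in numpy rather than R (the simulated data and the HC0 form are just for illustration; in R the sandwich package's vcovHC does the real job):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
# Heteroscedastic errors on purpose: the noise level grows with |x|
y = 1.0 + 2.0 * x + rng.normal(scale=0.5 + np.abs(x), size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y
resid = y - X @ beta

# Classical estimator, which trusts the Sigma^2 * I assumption
sigma2 = resid @ resid / (n - X.shape[1])
vcov_classical = sigma2 * XtX_inv

# HC0 sandwich: (X'X)^-1 * [X' diag(e_i^2) X] * (X'X)^-1
meat = X.T @ (X * resid[:, None] ** 2)
vcov_sandwich = XtX_inv @ meat @ XtX_inv

print(np.sqrt(np.diag(vcov_classical)))
print(np.sqrt(np.diag(vcov_sandwich)))  # often larger under this heteroscedasticity
```

The 'bread' (X'X)^-1 is the same as before; only the 'meat' in the middle changes when you relax the assumptions on the errors.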
I think that this way of computing the covariance matrix of the coefficients' estimators extends to any form of relationship between the response and the predictors (of course, as long as we are still in the world of models that are linear in the parameters). Fitting polynomials does not constrain the off-diagonals of the covariance matrix of the estimators to 0 (except if orthogonal polynomial terms are used, e.g. poly(x, 2, raw = F)).
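A quick numpy illustration of the polynomial point (the QR step here is just one convenient way to orthogonalise the columns; R's poly() uses a similar idea, and the simulated data are made up):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
x = rng.uniform(-1, 1, size=n)
y = 1.0 + x + x**2 + rng.normal(scale=0.1, size=n)

# Raw polynomial design: columns 1, x, x^2 (like poly(x, 2, raw = TRUE))
X_raw = np.column_stack([np.ones(n), x, x**2])

# Orthonormal columns spanning the same space, via QR
Q, _ = np.linalg.qr(X_raw)

def vcov(X, y):
    """Sigma^2_hat * (X'X)^-1 with Sigma^2_hat = RSS / (n - p)."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    return sigma2 * XtX_inv

V_raw = vcov(X_raw, y)
V_orth = vcov(Q, y)

def max_offdiag(M):
    return np.abs(M - np.diag(np.diag(M))).max()

print(max_offdiag(V_raw))   # clearly non-zero: 1, x and x^2 are correlated
print(max_offdiag(V_orth))  # essentially zero: Q'Q = I, so vcov is diagonal
```

Both designs give the same fitted values; orthogonalising just rotates the coefficients so that their estimators are uncorrelated.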
Hope this long e-mail helps, and, if not, I am happy to discuss this further. However, never forget that I am 100% self-educated on stats, and everything I do/say can easily be overturned by a true statistician.
And please, feel free to share our thoughts with the group chat. Actually, it would be great if you could invite me to that chat. It seems like a cool space to talk about stats 😊
Have a nice afternoon,
Manuele
![IMG_9809.jpeg](https://groups.google.com/group/bayesianmodellingug/attach/24e121848c1c2/IMG_9809.jpeg?part=0.1&view=1)
For sure - it really was super helpful. Among other things, if I'd left the model totally linear, I'd never have been able to show the result that I wanted to show, so you very much saved the day.
This is super helpful, though. I honestly had no idea about the second step (replacing Sigma^2 with its estimator). For some reason, I'd assumed that the off-diagonal elements of the coefficients' covariance matrix were fixed at zero by assumption in OLS (and so that the variance inflation factor was just an indicator of how badly wrong you were in making this assumption). Though clearly not: if R implements it in lm, then you are almost certainly right that this is just the default way of doing it. Maybe there was a time, back in the bad old days before SVD, when you had to assume zero off-diagonals? In any case, I'll have to look a bit into the OLS assumptions again - maybe collinearity in predictors is allowed if they are strictly linear?
Do you mind if I forward this to the group chat? I'm guessing this will be useful context for everyone else too!
I am happy that my comment was helpful 😊
Anyway, just to write down what I was talking about before (I am not always able to express myself properly), in linear regression the covariance matrix of the coefficients' estimators (assuming independence and homoscedasticity) is:
Sigma^2 * (X'X)^-1,
where Sigma^2 is the error variance and X is the model matrix (with rank equal to p, the number of coefficients: the intercept - if included - plus the regressors). '^-1' means 'matrix inverse'.
The variances of the coefficients' estimators are the diagonal elements of the above matrix. These variances get inflated when the off-diagonal elements of X'X are not 0, i.e. when the regressors are correlated.
An estimator of the covariance matrix of the regression coefficients (well, of their estimators) is obtained by replacing Sigma^2 with its estimator, the residual variance: Residual Sum of Squares / (n - p). Uff, that's one of the reasons why I prefer Bayesian stats. You don't have to repeat the word 'estimator' 100 times 😂
To double-check that these are the variances actually reported in the lm summary, you can compare sqrt(diag(vcov(lm_object))) with the coefficients’ standard errors reported in the model summary.
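Since I can't run R here, a numpy sketch of the same check (the simulated numbers are made up): compute the plug-in estimator by hand, and then confirm by Monte Carlo that Sigma^2 * (X'X)^-1 really is the covariance of the estimators for a fixed X:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, sigma = 100, 2, 2.0
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])   # model matrix: intercept + one regressor
beta_true = np.array([1.0, 3.0])
XtX_inv = np.linalg.inv(X.T @ X)

# Plug-in estimator, as vcov() reports: Sigma^2_hat = RSS / (n - p)
y = X @ beta_true + rng.normal(scale=sigma, size=n)
beta_hat = XtX_inv @ X.T @ y
rss = np.sum((y - X @ beta_hat) ** 2)
V_hat = (rss / (n - p)) * XtX_inv
se_hat = np.sqrt(np.diag(V_hat))       # the standard errors a model summary shows

# Monte Carlo check: the spread of beta-hat over many error draws
# should match the theoretical Sigma^2 * (X'X)^-1 for this fixed X.
V_theory = sigma**2 * XtX_inv
betas = np.array([XtX_inv @ X.T @ (X @ beta_true + rng.normal(scale=sigma, size=n))
                  for _ in range(5000)])
V_mc = np.cov(betas, rowvar=False)

print(V_theory)
print(V_mc)  # close to V_theory
```

In R the one-liner comparison stays as above: sqrt(diag(vcov(lm_object))) versus the Std. Error column of summary(lm_object).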
Thanks again!
Looking forward to the next class!
Manuele
For sure!! Will be great to have you next week.
And thanks very much for your comment at the end of the class! That was super helpful, and honestly very new and useful for me. I'd honestly thought that accounting for error covariance was something only done in the "newer" functions like gls, lme, etc., i.e. that it was assumed to be zero in the lm function.
Among other things, your comment helped make the simulated example clearer - I've now added it to the webpage, with a little comment noting your point about linear systems.
Hi Adam,
I'd be interested in following the last class next Tuesday, if that's ok with you.
Also, thanks a lot again for letting me participate. Your classes are simply amazing and are helping me a lot!
Have a nice afternoon,
Manuele