Dear Rona,
I am not sure if I will be able to convey everything regarding how to deal with
multicollinearity, but hopefully I can give you some good pointers.
First, you need to look at how "bad" your
multicollinearity is. You can go about this in several different ways: Variance Inflation Factor (VIF), Tolerance (the inverse of VIF), Condition Index, eigenvalue analysis, etc.
Let's say you use VIF, as it is by far the most popular method (even though not the most comprehensive one; there are methods by
Belsley, Kuh and Welsch
that go much more in depth, as described in their book Regression Diagnostics: Identifying Influential Data and Sources of Collinearity, which is a useful resource, albeit a rather advanced one).
VIF quantifies how much the variance of an estimated regression
coefficient is inflated because your predictors are correlated. As a rule of thumb, a VIF greater than 5 indicates a worrisome amount of
collinearity.
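In case it helps to see it concretely, here is a minimal sketch of computing VIF with plain NumPy, on made-up data where two predictors are deliberately near-duplicates (statsmodels has a ready-made variance_inflation_factor function that does the same thing):

```python
import numpy as np

def vif(X):
    """VIF for each column of X (n_samples x n_predictors).

    VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing
    column j on all the other columns (with an intercept).
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])  # add intercept
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        r2 = 1.0 - resid.var() / y.var()
        vifs[j] = 1.0 / (1.0 - r2)
    return vifs

# Synthetic example: two highly correlated predictors plus an independent one
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + 0.05 * rng.normal(size=200)   # nearly a copy of x1
x3 = rng.normal(size=200)
X = np.column_stack([x1, x2, x3])
print(vif(X))  # first two VIFs are large, third is near 1
```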
If feasible, the easiest method of reducing
multicollinearity is increasing your sample size: a larger dataset usually means more variation in the predictors, and thus often a weaker dependency between them.
As you noted, any regularization technique like Ridge (L2) or Lasso (L1) regression would mitigate the multicollinearity, but due to the nature of regularization it poses a challenge when it comes to hypothesis testing. One way around this problem (while still using Ridge or Lasso regression) is to use
bootstrap methods to estimate the distribution of the ridge/lasso regression coefficients: resample your data over and over again and fit a
ridge/lasso model to each sample. This way you build up a distribution
of each coefficient and can then perform hypothesis testing based on this
newly obtained empirical distribution.
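The resampling loop above can be sketched in a few lines; this is a toy version on synthetic data, assuming scikit-learn is available (the alpha value and the number of resamples are arbitrary choices, not recommendations):

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic data with two collinear predictors
rng = np.random.default_rng(1)
n = 300
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)          # collinear with x1
X = np.column_stack([x1, x2])
y = 1.0 * x1 + 0.5 * x2 + rng.normal(size=n)

# Refit ridge on bootstrap resamples to build an empirical
# distribution of each coefficient
n_boot = 500
coefs = np.empty((n_boot, X.shape[1]))
for b in range(n_boot):
    idx = rng.integers(0, n, size=n)        # resample rows with replacement
    model = Ridge(alpha=1.0).fit(X[idx], y[idx])
    coefs[b] = model.coef_

# 95% percentile intervals from the bootstrap distribution
lo, hi = np.percentile(coefs, [2.5, 97.5], axis=0)
for j in range(X.shape[1]):
    print(f"beta_{j + 1}: [{lo[j]:.2f}, {hi[j]:.2f}]")
```

An interval that excludes zero is then evidence the corresponding predictor matters, in the same spirit as a significance test.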
Another way to deal with this issue is to use principal component analysis - this is a
dimensionality reduction technique that transforms your
correlated variables into a set of uncorrelated principal components.
You can then use these components as the independent variables in your
regression model. This will help with your
multicollinearity issue, but the new components can be hard to interpret, as they no longer correspond directly to your original predictors.
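A minimal principal component regression sketch, again on made-up data and assuming scikit-learn; note that the component scores are uncorrelated by construction, which is exactly what removes the collinearity:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

# Synthetic data: x1 and x2 are strongly correlated
rng = np.random.default_rng(2)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
y = x1 + x3 + rng.normal(scale=0.5, size=n)

# Regress y on 2 principal components instead of the raw predictors
pcr = make_pipeline(PCA(n_components=2), LinearRegression())
pcr.fit(X, y)

# The component scores are mutually uncorrelated
Z = pcr.named_steps["pca"].transform(X)
print(np.corrcoef(Z.T)[0, 1])  # essentially zero
```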
Similar to PCA, and in your case maybe more applicable, is partial least squares (PLS) regression. It works much like PCA, except that
the components are chosen to maximize their covariance with the response variable, so they are built to be predictive rather than merely to capture variance in the predictors.
There are some other advanced statistical techniques that, if I'm not mistaken, are discussed in the book I mentioned, but that I am not too familiar with.
Lastly, even though I hope no one from the statistics department is reading this part, it's worth mentioning that in applied work, moving from strict hypothesis
testing to the estimation and interpretation of effect sizes can be a valid way forward in some cases. For example, ridge regression produces biased estimates, but potentially better behaved, more understandable, and
more generalizable ones. Reporting and discussing these estimates, along with their limitations, can be a valid approach
in many research situations.
Hope this helps,
Mihovil