I am doing some calculations on different matrices (mainly in logistic regression) and I commonly get the error "Matrix is singular", after which I have to go back and remove the correlated variables. My question here is: what would you consider a "highly" correlated matrix? Is there a threshold value of correlation that represents this word? For example, if a variable were 0.97 correlated with another one, is that "high" enough to make a matrix singular?
A square matrix is singular, that is, its determinant is zero, if it contains rows or columns which are proportionally interrelated; in other words, one or more of its rows (columns) is exactly expressible as a linear combination of some or all of its other rows (columns), the combination being without a constant term.
What must multivariate data look like in order for its correlation or covariance matrix to be a singular matrix as described above? It is when there are linear interdependencies among the variables. If some variable is an exact linear combination of the other variables, with a constant term allowed, the correlation and covariance matrices of the variables will be singular. The dependency observed in such a matrix between its columns is actually the same dependency as the dependency between the variables in the data, observed after the variables have been centered (their means brought to 0) or standardized (if we mean the correlation rather than the covariance matrix).
Some frequent particular situations when the correlation/covariance matrix of variables is singular: (1) the number of variables is equal to or greater than the number of cases; (2) two or more variables sum up to a constant; (3) two variables are identical or differ merely in mean (level) or variance (scale).
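To see the linear-dependency criterion in action, here is a small sketch with made-up data (the numbers are purely illustrative) in which the third variable is an exact linear combination of the first two, constant term included; both the covariance and the correlation matrix then come out singular:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up data: two free variables plus one exact linear combination
# of them (a constant term is allowed, as noted above).
x1 = rng.normal(size=100)
x2 = rng.normal(size=100)
x3 = 3.0 + 2.0 * x1 - 0.5 * x2           # exact linear dependency

X = np.column_stack([x1, x2, x3])
cov = np.cov(X, rowvar=False)
corr = np.corrcoef(X, rowvar=False)

print(np.linalg.det(cov))                 # ~0 (machine precision): singular
print(np.linalg.det(corr))                # ~0 as well
print(np.linalg.eigvalsh(corr))           # smallest eigenvalue is ~0
```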
Also, duplicating observations in a dataset will lead the matrix towards singularity. The more times you clone a case, the closer singularity is. So, when doing some sort of imputation of missing values, it is always beneficial (from both a statistical and a mathematical point of view) to add some noise to the imputed data.
From a geometrical point of view, singularity is (multi)collinearity (or "complanarity"): variables displayed as vectors (arrows) in space lie in a space of dimensionality lesser than the number of variables - in a reduced space. (That dimensionality is known as the rank of the matrix; it is equal to the number of non-zero eigenvalues of the matrix.)
In a more distant, or "transcendental", geometrical view, singularity or zero-definiteness (the presence of a zero eigenvalue) is the bending point between positive definiteness and non-positive definiteness of a matrix. When some of the vector-variables (which the correlation/covariance matrix represents) "go beyond" lying even in the reduced Euclidean space - so that they can no longer "converge in" or "perfectly span" Euclidean space - non-positive definiteness appears, i.e. some eigenvalues of the correlation matrix become negative. (See about non-positive definite, aka non-Gramian, matrices here.) A non-positive definite matrix is also "ill-conditioned" for some kinds of statistical analysis.
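These eigenvalue facts are easy to check numerically. The two matrices below are illustrative, not taken from any real data: the first is a legitimate correlation matrix, the second contains pairwise values that could never all hold at once, and so it is non-Gramian.

```python
import numpy as np

# A legitimate (Gramian) correlation matrix: all eigenvalues are non-negative,
# and the rank equals the number of non-zero eigenvalues.
R_ok = np.array([[1.0, 0.8, 0.8],
                 [0.8, 1.0, 0.8],
                 [0.8, 0.8, 1.0]])
print(np.linalg.eigvalsh(R_ok))          # all positive: positive definite, rank 3

# Pairwise "correlations" that cannot all come from one real data set:
# one eigenvalue turns out negative, i.e. the matrix is non-positive
# definite (non-Gramian).
R_bad = np.array([[ 1.0, 0.9, -0.9],
                  [ 0.9, 1.0,  0.9],
                  [-0.9, 0.9,  1.0]])
print(np.linalg.eigvalsh(R_bad))         # contains a negative eigenvalue
```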
The picture below shows a regression situation with completely collinear predictors. $X_1$ and $X_2$ correlate perfectly, and therefore these two vectors coincide and form a line, a 1-dimensional space. This is a reduced space. Mathematically, though, plane X must exist in order to solve a regression with two predictors - but that plane is not defined anymore, alas. Fortunately, if we drop either one of the two collinear predictors from the analysis, the regression is simply solved, because a one-predictor regression needs only a one-dimensional predictor space. We see the prediction $Y'$ and the error $e$ of that (one-predictor) regression drawn in the picture. There exist other approaches as well, besides dropping variables, to get rid of collinearity.
The final picture below displays a situation with nearly collinear predictors. This situation is different, a bit more complex, and nastier. $X_1$ and $X_2$ (both shown again in blue) correlate tightly and hence almost coincide. But there is still a tiny angle between them, and because of that non-zero angle, plane X is defined (this plane in the picture looks like the plane in the first picture). So, mathematically, there is no problem solving the regression. The problem that arises here is a statistical one.
Usually we do regression to make inferences about the R-square and the coefficients in the population. From sample to sample, the data vary a bit. So, if we took another sample, the juxtaposition of the two predictor vectors would change slightly, which is normal. What is not "normal" is that under near collinearity this leads to devastating consequences. Imagine that $X_1$ deviated just a little downwards, beyond plane X, as shown by the grey vector. Because the angle between the two predictors was so small, the plane X that passes through $X_2$ and through that drifted $X_1$ will drastically diverge from the old plane X. Thus, because $X_1$ and $X_2$ are so highly correlated, we expect a very different plane X in different samples from the same population. As plane X differs, the predictions, R-square, residuals, coefficients - everything becomes different, too. This is well seen in the picture, where plane X swung by some 40 degrees. In a situation like that, the estimates (coefficients, R-square, etc.) are very unreliable, a fact which is expressed by their huge standard errors. In contrast, with predictors far from collinear, the estimates are reliable because the space spanned by the predictors is robust to those sampling fluctuations of the data.
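To see this sampling instability numerically rather than geometrically, here is a small simulation sketch of my own (made-up data, plain least squares): the coefficient of $X_1$ bounces around far more from sample to sample when the two predictors are nearly collinear.

```python
import numpy as np

rng = np.random.default_rng(1)

def coef_spread(rho, n=50, reps=2000):
    """Std. dev. across samples of the OLS coefficient of x1 when
    corr(x1, x2) is (approximately) rho."""
    b1 = []
    for _ in range(reps):
        x1 = rng.normal(size=n)
        x2 = rho * x1 + np.sqrt(1.0 - rho**2) * rng.normal(size=n)
        y = x1 + x2 + rng.normal(size=n)
        X = np.column_stack([np.ones(n), x1, x2])
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        b1.append(coef[1])
    return np.std(b1)

print(coef_spread(rho=0.10))   # modest sample-to-sample variability
print(coef_spread(rho=0.99))   # several times larger under near-collinearity
```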
Even a high correlation between two variables, if it is below 1, doesn't necessarily make the whole correlation matrix singular; whether it does depends on the rest of the correlations as well, as the small numerical sketch below illustrates.
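The two matrices here are made up purely for demonstration: the first contains a 0.97 pairwise correlation yet is comfortably nonsingular, while the second, with different remaining correlations, is very nearly singular.

```python
import numpy as np

# A 0.97 pairwise correlation, but the third variable is only weakly related
# to both: the matrix is nonsingular (clearly positive determinant and eigenvalues).
R1 = np.array([[1.00, 0.97, 0.20],
               [0.97, 1.00, 0.20],
               [0.20, 0.20, 1.00]])
print(np.linalg.det(R1), np.linalg.eigvalsh(R1).min())

# The same 0.97, but now the remaining correlations make the third variable
# almost a linear combination of the first two: the determinant collapses
# towards zero and the matrix is nearly singular.
R2 = np.array([[1.00, 0.97, 0.99],
               [0.97, 1.00, 0.99],
               [0.99, 0.99, 1.00]])
print(np.linalg.det(R2), np.linalg.eigvalsh(R2).min())
```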
Statistical data analyses, such as regressions, incorporate special indices and tools to detect collinearity strong enough to consider dropping some of the variables or cases from the analysis, or to undertake other remedial measures. Please search (including this site) for "collinearity diagnostics", "multicollinearity", "singularity/collinearity tolerance", "condition indices", "variance decomposition proportions", "variance inflation factors (VIF)".
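As one example of such diagnostics, VIFs can be computed directly; here is a short sketch on made-up data with two nearly collinear predictors, assuming statsmodels is available:

```python
import numpy as np
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(2)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.1, size=200)   # nearly collinear with x1
x3 = rng.normal(size=200)

# Design matrix including an intercept column.
X = np.column_stack([np.ones(200), x1, x2, x3])
for i in range(1, X.shape[1]):              # skip the intercept itself
    print(i, variance_inflation_factor(X, i))
# x1 and x2 get very large VIFs (roughly 1/(1 - R^2) of each on the others),
# while x3 stays close to 1.
```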
I'm working with Gaussian process regression. I am currently testing different covariance functions and compositions to see what type of data they could describe best. I made my own implementation in Java.
Are there methods or hints for regularizing the matrices? Or can that be done by using other values or ranges as inputs? Maybe the introduction of error terms would help as well? I get the most problems with integer $x$ inputs to the Brownian motion covariance function $k(x,x') = \min(x,x')$. When I use it, the matrix is always singular.
If all covariance functions give you a singular matrix, it could be that some of your data points are identical, which gives two identical rows/columns in the matrix. To regularise the matrix, just add a ridge on the principal diagonal (as in ridge regression), which is used in Gaussian process regression as a noise term.
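A minimal numpy sketch of that advice, assuming the Brownian-motion kernel from the question (the helper name `brownian_kernel` and the jitter value are my own choices):

```python
import numpy as np

def brownian_kernel(x):
    """Brownian-motion covariance k(x, x') = min(x, x')."""
    x = np.asarray(x, dtype=float)
    return np.minimum.outer(x, x)

x = np.array([1, 2, 2, 3, 5])               # note the duplicated input x = 2
K = brownian_kernel(x)
# K has two identical rows/columns, so it is exactly singular and
# np.linalg.cholesky(K) would raise LinAlgError.

# Add a small ridge ("jitter") on the principal diagonal; in GP regression
# this plays the role of the noise variance.
sigma2 = 1e-6
L = np.linalg.cholesky(K + sigma2 * np.eye(len(x)))
print(L.shape)                              # the factorization now succeeds
```

How large the ridge should be is a trade-off: larger values are numerically safer but correspond to assuming more observation noise.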
Note that using a composition of covariance functions or an additive combination can lead to over-fitting of the marginal likelihood in evidence-based model selection, due to the increased number of hyper-parameters, and so can give worse results than a more basic covariance function, even though the basic covariance function is less suitable for modelling the data.
I am trying to do an estimation by kriging with gstat, but I can never complete it because of an issue with the covariance matrix. I never get estimates at the locations I want, because they are all skipped. I get the following warning message for each location:
Actually, there were duplicate locations in the abun dataset (found with zerodist(abun)), not in the grid on which I wanted to krige the estimates. After getting rid of the duplicates, the kriging worked fine.
I am trying to do a factor analysis (PCA) on a large data set. It has 181 columns and 163 rows; the values are between -6 and 8. I have started the factor analysis, and the eigenvalues and scree plot look good. Below all the factors it says: "Warning: the Correlation matrix is not positive definite". Can someone help me understand what it means?
I try to perform the analysis using Maximum Likelihood and Common Factor Analysis with a Varimax rotation method. But I get an error message saying: "Maximum likelihood method requires a nonsingular correlation matrix. Switching to principal components method may fix the problem."
which expresses component $X_1$ as a linear polynomial of the other components $X_i$. We conclude that a random vector $X$ is singular if and only if one of its components is a linear polynomial of the other components. In this sense, a singular covariance matrix indicates that at least one component of a random vector is extraneous.
If one component of $X$ is a linear polynomial of the rest, then all realizations of $X$ must fall in a plane within $\mathbb{R}^n$. The random vector $X$ can be thought of as an $m$-dimensional random vector sitting in a plane within $\mathbb{R}^n$, where $m < n$, as illustrated for $X$ in Exhibit 3.6.
If a random vector $X$ is singular, but the plane it sits in is not aligned with the coordinate system of $\mathbb{R}^n$, we may not immediately realize that it is singular from its covariance matrix $\Sigma$. A simple test for singularity is to calculate the determinant $|\Sigma|$ of the covariance matrix. If this equals 0, $X$ is singular. Once we know that $X$ is singular, we can apply a change of variables to eliminate extraneous components $X_i$ and transform $X$ into an equivalent $m$-dimensional random vector $Y$, $m < n$. Intuitively, the change of variables rotates the plane the realizations of $X$ sit in so that it aligns with the coordinate system of $\mathbb{R}^n$. Such a change of variables is obtained with a linear polynomial of $X$.
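One concrete way to carry out such a test and change of variables numerically (a sketch of my own, using an eigendecomposition of $\Sigma$ rather than whatever exact polynomial the source has in mind):

```python
import numpy as np

rng = np.random.default_rng(3)

# Made-up 3-dimensional random vector whose third component is a linear
# polynomial of the first two, so its covariance matrix is singular.
Z = rng.normal(size=(1000, 2))
X = np.column_stack([Z[:, 0], Z[:, 1], 1.0 + 2.0 * Z[:, 0] - Z[:, 1]])

Sigma = np.cov(X, rowvar=False)
print(np.linalg.det(Sigma))                 # ~0: X is singular

# Rotate onto the eigenvectors of Sigma and keep only the directions with
# non-zero eigenvalues (up to a small tolerance): an equivalent
# m-dimensional random vector Y remains.
eigval, eigvec = np.linalg.eigh(Sigma)
keep = eigval > 1e-10
Y = (X - X.mean(axis=0)) @ eigvec[:, keep]
print(Y.shape)                              # (1000, 2), i.e. m = 2
```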