Hi all,
I've run into a case where it'd be convenient to be able to compute
the Cholesky decomposition of a semi-definite matrix. (It's a
covariance matrix computed from not-enough samples, so it's positive
semi-definite but rank-deficient.) As any schoolchild knows, the
Cholesky is well defined for such cases, but apparently you shouldn't
trust schoolchildren, because the standard implementations blow up if
you try:
In [155]: np.linalg.cholesky([[1, 0], [0, 0]])
LinAlgError: Matrix is not positive definite - Cholesky
decomposition cannot be computed
Is there an easy way to do this?
-- Nathaniel
_______________________________________________
SciPy-User mailing list
SciPy...@scipy.org
http://mail.scipy.org/mailman/listinfo/scipy-user
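[Editor's note: one workaround for the failing call above is to build a Cholesky-like factor from the eigendecomposition, which tolerates rank deficiency. This is a hedged sketch, not part of the original thread; `semidefinite_cholesky` is a made-up helper name, and the factor it returns is not triangular, just a matrix L with L @ L.T equal to the input.]

```python
import numpy as np

def semidefinite_cholesky(a, tol=0.0):
    """Return L such that L @ L.T == a, for a symmetric PSD matrix a.

    Works on rank-deficient inputs, unlike np.linalg.cholesky, by
    diagonalizing and clipping tiny negative eigenvalues (which arise
    from round-off) up to zero.
    """
    w, v = np.linalg.eigh(a)
    w = np.clip(w, tol, None)       # clip round-off negatives to zero
    return v * np.sqrt(w)           # scale each eigenvector column

a = np.array([[1.0, 0.0], [0.0, 0.0]])  # the matrix from the question
L = semidefinite_cholesky(a)
print(np.allclose(L @ L.T, a))
```

Note that because L is not lower-triangular, this is a substitute for the Cholesky factor only where any square root of the matrix will do; a true triangular factor of a semi-definite matrix requires a pivoted Cholesky (LAPACK's dpstrf).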
> How bout a slight regularization,
Indeed, from a statistics point of view, a non-positive-definite
covariance matrix is simply an estimation error. As a descriptive
statistic it may look good, but it does not contain much information
about the population covariance. The easiest solution is to regularize it.
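[Editor's note: for concreteness, the simplest regularization of this kind shrinks the sample covariance toward a scaled identity. A minimal sketch, with the shrinkage amount `alpha` picked by hand rather than by the oracle heuristics the reply goes on to describe:]

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 10))        # fewer samples than dimensions
S = np.cov(X, rowvar=False)             # rank-deficient sample covariance

alpha = 0.1                             # shrinkage amount, to be tuned
mu = np.trace(S) / S.shape[0]           # average variance of the features
S_shrunk = (1 - alpha) * S + alpha * mu * np.eye(S.shape[0])

# Every eigenvalue is now at least alpha * mu > 0, so the shrunk
# matrix is strictly positive definite and Cholesky succeeds:
L = np.linalg.cholesky(S_shrunk)
```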
The good news is that there exist good heuristics (approximate oracles)
to find the optimal regularization parameter. If you assume that your
underlying data is Gaussian, the Oracle Approximating Shrinkage (OAS) is
optimal. If you don't want that assumption, the Ledoit-Wolf shrinkage
works great. In practice they are often similar, but LW tends to
under-penalize compared to OAS if you have little data compared to the
dimension of your dataset.
More good news: you can find Python implementations of these
heuristics in the scikit-learn:
https://github.com/scikit-learn/scikit-learn/blob/master/scikits/learn/covariance/shrunk_covariance_.py
There you will find references to the papers if you are interested, and
you can simply copy the functions out of the scikit-learn if you don't
want to depend on it.
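[Editor's note: in current scikit-learn the `scikits.learn` path above has moved to `sklearn.covariance`, which exposes both estimators as classes. Assuming a recent release, usage looks roughly like:]

```python
import numpy as np
from sklearn.covariance import LedoitWolf, OAS

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 10))        # n_samples < n_features

lw = LedoitWolf().fit(X)                # Ledoit-Wolf shrinkage
oas = OAS().fit(X)                      # Oracle Approximating Shrinkage

# Both return a well-conditioned shrunk covariance estimate, so the
# Cholesky decomposition that failed on the raw estimate now succeeds:
np.linalg.cholesky(lw.covariance_)
np.linalg.cholesky(oas.covariance_)
```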
HTH,
Gael