import numpy as np
import scipy as sp
import scipy.linalg
from scipy.linalg.fblas import dger, dgemm  # in newer SciPy these live in scipy.linalg.blas
def mean_cov(X):
    n,p = X.shape
    m = X.mean(axis=0)
    # covariance matrix with correction for rounding error
    # S = (cx'*cx - (scx'*scx/n))/(n-1)
    # Am Stat 1983, vol 37: 242-247.
    cx = X - m
    scx = cx.sum(axis=0)
    scx_op = dger(-1.0/n, scx, scx)   # rank-1 update: -scx*scx'/n
    S = dgemm(1.0, cx.T, cx.T, beta=1.0,
              c=scx_op, trans_a=0, trans_b=1, overwrite_c=1)   # cx'*cx + scx_op
    S[:] *= 1.0/(n-1)
    return m, S.T
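For reference, the same corrected formula in plain NumPy looks like this (a minimal sketch; np.dot and np.outer stand in for dgemm and dger, at the cost of an extra temporary):

def mean_cov_numpy(X):
    # same corrected two-pass formula, without the BLAS wrappers
    n, p = X.shape
    m = X.mean(axis=0)
    cx = X - m                # deviations from the column means
    scx = cx.sum(axis=0)      # residual column sums (zero up to rounding)
    S = (np.dot(cx.T, cx) - np.outer(scx, scx)/n) / (n - 1.0)
    return m, S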
Let's time this a couple of times against NumPy:
if __name__ == '__main__':
    from time import clock  # removed in Python 3.8; use time.perf_counter today
    n,p = 20000,1000
    X = 2*np.random.randn(n,p) + 5
    t0 = clock()
    m,S = X.mean(axis=0), np.cov(X, rowvar=False, bias=0)
    t1 = clock()
    print t1-t0
    t0 = clock()
    m,S = mean_cov(X)
    t1 = clock()
    print t1-t0
L:\>meancov.py
7.39771102515
2.24604790004
L:\>meancov.py
16.1079984658
2.21100101726
That speaks for itself :-D
One important lesson from this: it does not help to write a function
like cov(X) in C when we have access to optimized BLAS from Python. So
let this serve as a warning against reaching for C instead of Python.
If we don't want to correct for rounding error, dger drops out and the
function simply becomes:
def mean_cov(X):
    n,p = X.shape
    m = X.mean(axis=0)
    cx = X - m
    S = dgemm(1./(n-1), cx.T, cx.T, trans_a=0, trans_b=1)   # cx'*cx/(n-1)
    return m, S.T
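A quick sanity check against NumPy's own routine (a small sketch; the random data and default tolerances are arbitrary):

X = np.random.randn(100, 5)
m, S = mean_cov(X)
assert np.allclose(m, X.mean(axis=0))
assert np.allclose(S, np.cov(X, rowvar=False))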
Sturla
P.S. I should perhaps mention that I use SciPy and NumPy linked with
MKL. Looking at Windows' task manager, it seems both np.cov and my code
saturate all four cores.
After plowing through the NumPy SVN, I located np.cov here:
http://svn.scipy.org/svn/numpy/trunk/numpy/ma/extras.py
It seems numpy.cov is written in Python too. But it uses np.dot instead
of dgemm (why is that less efficient?), deals with masked arrays, and
forms more temporary arrays. That's why it's slower. The biggest
contributor is probably dgemm vs. np.dot.
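Sketching what np.cov essentially computes (my paraphrase, not the actual NumPy source; the real function also handles rowvar, bias and masked arrays):

def cov_essence(X):
    # observations in rows, variables in columns (i.e. rowvar=False)
    n = X.shape[0]
    xm = X - X.mean(axis=0)            # temporary deviation matrix
    return np.dot(xm.T, xm) / (n - 1.0)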
And np.cov does not correct for rounding errors (which I think it should).
Sturla
I don't have MKL, so just using one core:
$ python mean_cov.py
5.09
4.73
If I switch from
n,p = 20000,1000
to
n,p = 2000,1000
I get:
0.51
0.48
A slightly faster version for small arrays (using dot instead of mean and sum):
def mean_cov2(X):
    n,p = X.shape
    one = np.empty(n)
    one.fill(1.0 / n)
    m = np.dot(one, X)          # mean as a matrix-vector product
    cx = X - m
    scx = np.dot(one, cx)       # column means of the residuals
    scx *= n                    # rescale to column sums
    scx_op = dger(-1.0/n, scx, scx)
    S = dgemm(1.0, cx.T, cx.T, beta=1.0,
              c=scx_op, trans_a=0, trans_b=1, overwrite_c=1)
    S[:] *= 1.0/(n-1)
    return m, S.T
gives
$ python mean_cov.py
0.51
0.48
0.45
But does that compromise the rounding-error correction?
(Pdb) ((S0 - S)*(S0 - S)).sum()
2.6895813860036798e-28
(Pdb) ((S2 - S)*(S2 - S)).sum()
2.331973369765596e-27
where S0 is np.cov, S is mean_cov, and S2 is mean_cov2.
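The same check expressed as a relative Frobenius-norm error puts the numbers in perspective (a sketch, with S0, S and S2 as defined above):

def rel_err(A, B):
    # relative Frobenius-norm distance between two covariance estimates
    return np.sqrt(((A - B)**2).sum() / (B**2).sum())

print(rel_err(S0, S))    # np.cov vs. mean_cov
print(rel_err(S2, S))    # mean_cov2 vs. mean_cov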
You can get NumPy/SciPy binaries built against MKL here:
http://www.lfd.uci.edu/~gohlke/pythonlibs/
http://www.enthought.com/products/epd.php
I'm on Linux. But I am curious about the speed-ups you get. Is there
an array size below which your parallel version is slower?
Then comes the annoying part, which is building and installing NumPy
and SciPy.