https://github.com/kwgoodman/bottleneck/issues/18
Timings of prototype function with (1000, 1000) array:
>> from bottleneck import ss_2d_float64_axis0
>> from bottleneck import ss_2d_float64_axis1
>> from scipy.stats import ss
>>
>> a = np.random.rand(1000,1000)
>>
>> timeit ss(a, 0)
100 loops, best of 3: 10.7 ms per loop
>> timeit ss(a, 1)
100 loops, best of 3: 3.03 ms per loop
>>
>> timeit ss_2d_float64_axis0(a)
100 loops, best of 3: 8.79 ms per loop
>> timeit ss_2d_float64_axis1(a)
1000 loops, best of 3: 1.17 ms per loop
Timing for (100,100) array:
>> a = np.random.rand(100,100)
>> timeit ss(a, 1)
10000 loops, best of 3: 44.8 us per loop
>> timeit ss_2d_float64_axis1(a)
100000 loops, best of 3: 12.2 us per loop
Timing for (1000,10) array:
>> a = np.random.rand(1000,10)
>> timeit ss(a, 1)
10000 loops, best of 3: 148 us per loop
>> timeit ss_2d_float64_axis1(a)
100000 loops, best of 3: 12.8 us per loop
For large 1d arrays np.dot is faster:
>> a = np.random.rand(1000)
>> timeit ss_1d_float64_axisNone(a)
1000000 loops, best of 3: 1.77 us per loop
>> timeit np.dot(a, a)
1000000 loops, best of 3: 1.6 us per loop
But not for small 1d arrays:
>> a = np.random.rand(10)
>> timeit ss_1d_float64_axisNone(a)
1000000 loops, best of 3: 696 ns per loop
>> timeit np.dot(a, a)
1000000 loops, best of 3: 801 ns per loop
Prototype code based on bn.nansum:
@cython.boundscheck(False)
@cython.wraparound(False)
def ss_1d_float64_axisNone(np.ndarray[np.float64_t, ndim=1] a):
"SS of 1d array with dtype=float64 along axis=None ignoring NaNs."
cdef np.float64_t asum = 0, ai
cdef Py_ssize_t i0
cdef np.npy_intp *dim
dim = PyArray_DIMS(a)
cdef int n0 = dim[0]
for i0 in range(n0):
ai = a[i0]
asum += ai * ai
return np.float64(asum)
@cython.boundscheck(False)
@cython.wraparound(False)
def ss_2d_float64_axis0(np.ndarray[np.float64_t, ndim=2] a):
"Mean of 2d array with dtype=float64 along axis=0 ignoring NaNs."
cdef np.float64_t asum = 0, ai
cdef Py_ssize_t i0, i1
cdef np.npy_intp *dim
dim = PyArray_DIMS(a)
cdef int n0 = dim[0]
cdef int n1 = dim[1]
cdef np.npy_intp *dims = [n1]
cdef np.ndarray[np.float64_t, ndim=1] y = PyArray_EMPTY(1, dims,
NPY_float64, 0)
for i1 in range(n1):
asum = 0
for i0 in range(n0):
ai = a[i0, i1]
asum += ai * ai
y[i1] = asum
return y