I think I am done with a complete rewrite of bottleneck.
Let me know if the rewrite branch works for you. I'd like to merge it into master soon.
All unit tests pass for me on:
- Ubuntu 14.04 64-bit with Python 2.7.6
- Ubuntu 14.04 64-bit with Python 3.4.0
(Note that numpy 1.9.1 is required.)
I do not see any warnings when the unit tests run on Python 2.7.6. (Can someone check whether there are warnings in Python 3?)
The rewrite branch is here:
https://github.com/kwgoodman/bottleneck/tree/rewrite

Tentative release notes and benchmarks are below:
This release is a complete rewrite of Bottleneck.
**Faster**
- Builds 15 times faster
- Function-call overhead cut in half, a big speedup for small input arrays
- Arbitrary ndim input arrays accelerated; previously only 1d, 2d, and 3d
- bn.nanrankdata is twice as fast for float input arrays
- bn.move_max, bn.move_min are faster for int input arrays
- No speed penalty for reducing along all axes when input is Fortran ordered
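One general way to accept input arrays of any ndim is to move the reduction axis to the end, treat every remaining index as a row, and run a 1d kernel on each row. The sketch below illustrates that idea in plain NumPy; `reduce_any_ndim` and `nansum_1d` are hypothetical names, and this is not Bottleneck's C implementation. For axis=None it visits elements in memory order (`ravel(order="K")`), which is one way a reduction over all axes can avoid copying contiguous Fortran-ordered input.

```python
import numpy as np


def nansum_1d(row):
    """Hypothetical 1d kernel: sum the non-NaN values of a 1d array."""
    total = 0.0
    for x in row:
        if x == x:  # NaN is the only value that is not equal to itself
            total += x
    return total


def reduce_any_ndim(func_1d, a, axis=None):
    """Apply a 1d reduction along `axis` of an array with any ndim.

    With axis=None every element is visited in memory order (order="K"),
    so a contiguous Fortran-ordered input is viewed rather than copied.
    """
    a = np.asarray(a, dtype=np.float64)
    if axis is None:
        return func_1d(a.ravel(order="K"))
    # Move the reduction axis last; every other index combination is a row.
    moved = np.moveaxis(a, axis, -1)
    out_shape = moved.shape[:-1]
    rows = moved.reshape(-1, moved.shape[-1])
    out = np.array([func_1d(row) for row in rows], dtype=np.float64)
    return out.reshape(out_shape)


a = np.arange(24, dtype=np.float64).reshape(2, 3, 4)
a[0, 0, 0] = np.nan
print(reduce_any_ndim(nansum_1d, a, axis=1))
```

The same wrapper works unchanged for 1d, 2d, 3d or higher-dimensional input, which is the point of the design: only the 1d kernel has to be fast.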
**Smaller**
- Compiled binaries 14.1 times smaller
- Source tarball 4.7 times smaller
- 9.8 times less C code
- 4.3 times less Cython code
- 3.7 times less Python code
**Beware**
- Requires numpy 1.9.1
- Single API, e.g., bn.nansum instead of both bn.nansum and nansum_2d_float64_axis0
- On 64-bit systems bn.nansum(int32) returns int32 instead of int64
- Reducing over all axes returns, e.g., 6.0; previously np.float64(6.0)
- bn.ss() now has default axis=None instead of axis=0
- bn.nn() is no longer in bottleneck
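Two of the changes above, the int32 result of bn.nansum on int32 input and the new axis=None default of bn.ss(), can be illustrated with plain NumPy. This is a sketch, not Bottleneck code: `ss_reference` is a hypothetical stand-in for a sum of squares, and the int32 lines assume that an int32 result means the sum is also accumulated in int32, which can wrap around on overflow.

```python
import numpy as np


def ss_reference(a, axis=None):
    """Hypothetical sum of squares with the new default axis=None."""
    a = np.asarray(a)
    return np.sum(a * a, axis=axis)


a = np.array([[1.0, 2.0], [3.0, 4.0]])
print(ss_reference(a))          # all axes: 1 + 4 + 9 + 16 = 30.0
print(ss_reference(a, axis=0))  # old default, per column: [10. 20.]

# Assumption: an int32 accumulator silently wraps where int64 would not.
x = np.array([2**31 - 1, 1], dtype=np.int32)
print(np.sum(x, dtype=np.int32))  # wraps to -2147483648
print(np.sum(x, dtype=np.int64))  # 2147483648
```

Code that relied on the old default of bn.ss() would need to pass axis=0 explicitly.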
**min_count**
- Previous releases had moving window function pairs: move_sum, move_nansum
- This release only has half of the pairs: move_sum
- Instead a new input parameter, min_count, has been added
- min_count=None is the same as the old move_sum; min_count=1 is the same as the old move_nansum
- If the number of non-NaN values in a window is less than min_count, NaN is assigned to that window
- Exception: move_median does not take min_count as input
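The min_count rule can be written out as a slow, pure-NumPy reference for 1d input. This is a sketch, not Bottleneck's implementation: `move_sum_reference` is a hypothetical name, it assumes a trailing window ending at the current element, and it assumes min_count=None means min_count=window (so any NaN in a full window, or a partial window at the start, gives NaN).

```python
import numpy as np


def move_sum_reference(a, window, min_count=None):
    """Hypothetical slow reference for a trailing moving-window sum.

    Output element i sums the non-NaN values in a[i-window+1 : i+1].
    If fewer than min_count of them are non-NaN, the output is NaN.
    min_count=None is treated as min_count=window.
    """
    a = np.asarray(a, dtype=np.float64)
    if a.ndim != 1:
        raise ValueError("this sketch only handles 1d input")
    if min_count is None:
        min_count = window
    if not 1 <= min_count <= window:
        raise ValueError("min_count must be between 1 and window")
    out = np.empty_like(a)
    for i in range(a.size):
        win = a[max(0, i - window + 1): i + 1]
        valid = win[~np.isnan(win)]
        out[i] = valid.sum() if valid.size >= min_count else np.nan
    return out


a = np.array([1.0, 2.0, np.nan, 4.0, 5.0])
print(move_sum_reference(a, 2))               # like the old move_sum
print(move_sum_reference(a, 2, min_count=1))  # like the old move_nansum
```

With window=2 on [1, 2, nan, 4, 5], min_count=None gives [nan, 3, nan, nan, 9] and min_count=1 gives [1, 3, 2, 4, 9].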
**Bug Fixes**
- Can now install bottleneck with pip even if numpy is not already installed
- bn.move_max, bn.move_min now return float32 for float32 input
- Bug prevention: add unit tests for 0d input arrays
Bottleneck performance benchmark
Bottleneck 1.0.0dev
Numpy (np) 1.9.1
Speed is NumPy time divided by Bottleneck time
NaN means approximately one-third of the values are NaN; float64 and axis=-1 are used
| Function | no NaN (10,) | no NaN (1000,1000) | NaN (10,) | NaN (1000,1000) |
|---|---|---|---|---|
| nansum | 36.9 | 4.0 | 37.2 | 9.3 |
| nanmean | 137.2 | 5.2 | 138.6 | 10.3 |
| nanstd | 243.0 | 4.2 | 243.6 | 8.5 |
| nanvar | 209.2 | 4.2 | 208.7 | 8.5 |
| nanmin | 30.9 | 1.1 | 30.7 | 1.7 |
| nanmax | 30.6 | 1.1 | 30.4 | 2.9 |
| median | 36.3 | 0.8 | 38.6 | 0.9 |
| nanmedian | 49.1 | 2.9 | 55.8 | 6.8 |
| ss | 13.7 | 3.5 | 13.8 | 3.5 |
| nanargmin | 58.4 | 4.3 | 55.1 | 7.4 |
| nanargmax | 58.7 | 4.3 | 59.2 | 9.4 |
| anynan | 12.2 | 1.0 | 12.7 | 87.6 |
| allnan | 12.9 | 104.7 | 12.8 | 97.7 |
| rankdata | 44.2 | 1.4 | 44.6 | 2.1 |
| nanrankdata | 56.5 | 26.8 | 50.8 | 39.7 |
| partsort | 5.7 | 0.9 | 5.9 | 1.1 |
| argpartsort | 2.8 | 0.7 | 2.9 | 0.4 |
| replace | 11.0 | 1.2 | 11.1 | 1.2 |
| move_sum | 288.2 | 119.4 | 287.9 | 332.8 |
| move_mean | 723.2 | 95.1 | 728.3 | 415.1 |
| move_std | 1159.7 | 56.1 | 1243.9 | 758.0 |
| move_min | 204.2 | 21.3 | 208.0 | 54.2 |
| move_max | 229.8 | 21.7 | 236.2 | 123.9 |
| move_median | 464.4 | 43.4 | 451.8 | 206.0 |
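Each speed figure above is NumPy time divided by Bottleneck time. A ratio of that kind can be measured with the standard-library timeit module; `speed_ratio` is a hypothetical helper, not the code that produced the table, and the commented-out comparison assumes bottleneck is installed.

```python
import timeit

import numpy as np


def speed_ratio(numpy_func, other_func, arr, repeat=3, number=10):
    """Return the best time of numpy_func divided by the best time of other_func.

    A value above 1 means other_func ran faster than numpy_func on arr.
    """
    t_numpy = min(timeit.repeat(lambda: numpy_func(arr), repeat=repeat, number=number))
    t_other = min(timeit.repeat(lambda: other_func(arr), repeat=repeat, number=number))
    return t_numpy / t_other


# Input in the spirit of the "NaN" columns: float64, roughly one-third NaNs.
rng = np.random.default_rng(0)
a = rng.random((1000, 1000))
a[a < 1.0 / 3.0] = np.nan

# Assumes bottleneck is installed:
# import bottleneck as bn
# print(speed_ratio(lambda x: np.nansum(x, axis=-1),
#                   lambda x: bn.nansum(x, axis=-1), a))
```

Taking the minimum over several repeats reduces the influence of other processes on the timing.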