This is a bug fix release.
The low-level functions nanstd_3d_int32_axis1 and
nanstd_3d_int64_axis1, called by bottleneck.nanstd(), wrote beyond the
memory owned by the output array if both arr.shape[1] == 0 and
arr.shape[0] > arr.shape[2], where arr is the input array.
Thanks to Christoph Gohlke for finding an example to demonstrate the bug.
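To make the trigger condition concrete, here is a minimal pure-Python sketch. The helper names are hypothetical and not part of Bottleneck. It only evaluates the shape condition stated above and the output shape of a reduction along axis 1; it does not call the low-level functions.

```python
def axis1_output_shape(shape):
    """Shape of the result when a 3d array is reduced along axis 1."""
    n0, n1, n2 = shape
    return (n0, n2)


def matches_reported_condition(shape):
    """True when arr.shape[1] == 0 and arr.shape[0] > arr.shape[2]."""
    n0, n1, n2 = shape
    return n1 == 0 and n0 > n2


# A few example input shapes (chosen for illustration only).
for shape in [(3, 0, 2), (2, 0, 3), (3, 1, 2)]:
    print(shape, axis1_output_shape(shape), matches_reported_condition(shape))
```

Only the first shape, (3, 0, 2), satisfies both parts of the condition.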
download: http://pypi.python.org/pypi/Bottleneck
docs: http://berkeleyanalytics.com/bottleneck
code: http://github.com/kwgoodman/bottleneck
mailing list: http://groups.google.com/group/bottle-neck
mailing list 2: http://mail.scipy.org/mailman/listinfo/scipy-user
How embarrassing! The same bug in nanstd() that was fixed in 0.4.1
exists in nanvar(). Thank you, Christoph, for pointing that out. Fixed
in Bottleneck 0.4.2.
> Any interest adding a "min_periods" argument to the moving window
> functions in bottleneck?
Each moving window function in Bottleneck has a NaN version and a
non-NaN version: move_nanmean() and move_mean(), for example.
Pandas has one version, but you can adjust min_periods to get
either the NaN or non-NaN behavior or anything in between. That's
clever.
The rest of Bottleneck uses the NaN and non-NaN naming, for example,
nanmedian and median. I think it is simpler to stick with that, since
it makes it easier to discover what Bottleneck can do. It is much
harder to explain that the functionality lives in a parameter that most
users haven't seen before. But let me think about it. It would be useful.
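As a rough illustration of the min_periods idea, here is a pure-Python sketch of a moving mean. It is neither the Bottleneck nor the pandas implementation. The function name and the handling of the leading partial windows are assumptions made for this example.

```python
import math


def moving_mean(values, window, min_periods=None):
    """Moving mean that emits NaN unless enough non-NaN values are in the window.

    Hypothetical helper: min_periods defaults to the window size.
    """
    if min_periods is None:
        min_periods = window
    if min_periods < 1:
        raise ValueError("min_periods must be at least 1")
    out = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        # Keep only the non-NaN observations in the current window.
        valid = [v for v in values[start:i + 1] if not math.isnan(v)]
        if len(valid) >= min_periods:
            out.append(sum(valid) / len(valid))
        else:
            out.append(math.nan)
    return out


data = [1.0, math.nan, 3.0, 4.0]
strict = moving_mean(data, window=2, min_periods=2)   # any NaN in the window gives NaN
lenient = moving_mean(data, window=2, min_periods=1)  # NaNs are skipped when possible
print(strict)
print(lenient)
```

With min_periods equal to the window size the result behaves like a non-NaN moving mean, and with min_periods=1 it behaves like a NaN-skipping one. Values in between give the intermediate behavior mentioned above.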
> One random question. Any idea on the long import time:
>
> $ time python -c "import bottleneck"
>
> real 0m0.712s
> user 0m0.546s
> sys 0m0.114s
> $ time python -c "import numpy"
>
> real 0m0.142s
> user 0m0.090s
> sys 0m0.049s
> $ time python -c "import scipy"
>
> real 0m0.201s
> user 0m0.132s
> sys 0m0.066s
Bottleneck has many low-level functions, for example,
median_2d_float64_axis0, median_2d_float64_axis1,
median_2d_int32_axis0, and so on. Maybe that explains it? But scipy
has a lot of functions too, so I don't know.
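To get a feel for how quickly the number of low-level functions grows, here is a small sketch that enumerates names following the pattern seen above (function, number of dimensions, dtype, axis). The lists of functions, dimensions, and dtypes are assumptions for illustration; the real sets in Bottleneck may differ.

```python
from itertools import product

# Hypothetical lists for illustration; the real sets in Bottleneck may differ.
functions = ["median", "nanstd", "nanvar"]
ndims = [1, 2, 3]
dtypes = ["int32", "int64", "float64"]


def low_level_names(functions, ndims, dtypes):
    """Build one name per (function, ndim, dtype, axis) combination."""
    names = []
    for func, ndim, dtype in product(functions, ndims, dtypes):
        for axis in range(ndim):
            names.append(f"{func}_{ndim}d_{dtype}_axis{axis}")
    return names


names = low_level_names(functions, ndims, dtypes)
print(len(names))
print(names[:3])
```

Even with only three functions and three dtypes, this yields dozens of names.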
Bottleneck imports are now 3x faster. I switched to a lazy import of
scipy (Bottleneck rarely uses scipy).
Before:
$ time python -c "import bottleneck"
real 0m0.196s
user 0m0.150s
sys 0m0.040s
After:
$ time python -c "import bottleneck"
real 0m0.061s
user 0m0.010s
sys 0m0.050s
Does adding Bottleneck to your package increase the import time by
0.06 seconds? No, not if your package imports numpy:
$ time python -c "import numpy; import bottleneck"
real 0m0.060s
user 0m0.020s
sys 0m0.030s
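The shell timings above can be approximated from Python as well. This is a minimal sketch, not the method used for the numbers in this thread: the helper name is hypothetical, it measures wall-clock time only, and the measurement includes interpreter startup, just as `time python -c ...` does.

```python
import subprocess
import sys
import time


def time_import(module_name, repeats=3):
    """Best wall-clock time, in seconds, to import a module in a fresh interpreter."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        # A new interpreter per run, so nothing is cached in sys.modules.
        subprocess.run([sys.executable, "-c", f"import {module_name}"], check=True)
        best = min(best, time.perf_counter() - start)
    return best


# "json" is used here only because it ships with Python.
print(f"import json: {time_import('json'):.3f}s")
```

Replace "json" with "bottleneck" or "numpy" if those packages are installed.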
I used this pattern for lazy imports:
email = None

def parse_email():
    global email
    if email is None:
        import email
which I found here:
http://wiki.python.org/moin/PythonSpeed/PerformanceTips#Import_Statement_Overhead
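Here is a runnable variant of the same pattern, using the standard library module decimal as a stand-in for a rarely used dependency. The function name is hypothetical. This is a sketch of the technique, not the code used in Bottleneck.

```python
decimal = None  # placeholder; rebound to the module on first use


def to_decimal(text):
    """Convert text to a Decimal, importing the decimal module only when needed."""
    global decimal
    if decimal is None:
        # Because of the global declaration, this import rebinds the
        # module-level name "decimal" from None to the module object.
        import decimal
    return decimal.Decimal(text)


print(to_decimal("1.5") + to_decimal("2.25"))
```

The first call pays the import cost, and later calls only pay for the `is None` check.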
Thanks, Wes, for the report.